ETCSL Manual
The Electronic Text Corpus of Sumerian Literature (ETCSL) consists of 394 literary compositions in transliterated Sumerian and their English translations. The sources for the compositions date to the late third and early second millennia BCE and come from ancient Mesopotamia (modern Iraq).
Some of the compositions have only one source; others have many, from which a modern edition, referred to as a composite text, has been compiled. The primary sources are clay tablets, which are often broken or damaged in some way. Thus, the compositions are in many cases incompletely preserved, and there may be several lacunae of varying size in a particular composite text.
Many of the sources come from a school setting and were written by apprentice scribes who may not have been fluent in the Sumerian language. All of these factors represent a great challenge not only to the modern scholars editing the compositions, but also to the encoding framework.
The compositions are divided into seven categories: ancient literary catalogues (not translated), narrative and mythological compositions, royal praise poetry and compositions with a historical background, literary letters and letter-prayers, hymns and cult songs, other literature, and proverb collections. The categories are numbered 0–6 and the numbers form part of the corpus file names.
The corpus is encoded according to the TEI guidelines and the corpus files are in ASCII format.
1.2 Transliteration and translation
The Sumerian sources have been transliterated using Roman characters, with a few additional characters and symbols (see 2.8). To achieve a fluent English prose translation, several lines of transliterated Sumerian usually correspond to one English paragraph, as shown next. The id and corresp attributes of the <l> and <p> tags link the transliteration and translation.
<l n="11" id="c111.11" corresp="t111.p3"> dilmun&ki;-a uga&mucen; gu3-gu3 nu-mu-ni-be2</l>
<l n="12" id="c111.12" corresp="t111.p3"> dar&mucen;-e gu3 dar&mucen;-re nu-mu-ni-ib-be2</l>
<l n="13" id="c111.13" corresp="t111.p3"> ur-gu-la saj jic nu-ub-ra-ra</l>
<l n="14" id="c111.14" corresp="t111.p3"> ur-bar-ra-ke4 sila4 nu-ub-kar-re</l>
<l n="15" id="c111.15" corresp="t111.p3"> ur-gir15 mac2 gam-gam nu-ub-zu</l>
<l n="16" id="c111.16" corresp="t111.p3"> cah2 ce gu7-gu7-e nu-ub-zu</l>
<p id="t111.p3" n="11-16" corresp="c111.11">In <w type="GN">Dilmun</w> the raven was not yet cawing, the partridge not cackling. The lion did not slay, the wolf was not carrying off lambs, the dog had not been taught to make kids curl up, the pig had not learned that grain was to be eaten.</p>
The corpus is freely available on the Internet and the XML source files can be downloaded from the Oxford Text Archive. When referring to or citing the corpus, please use the following form of citation:
Black, J.A., Cunningham, G., Ebeling, J., Flückiger-Hawker, E., Robson, E., Taylor, J., and Zólyomi, G., The Electronic Text Corpus of Sumerian Literature (http://etcsl.orinst.ox.ac.uk/), Oxford 1998–2006.
The ETCSL project would like to thank Miguel Civil and Steve Tinney, whose contributions were fundamental to the shape and success of the project. Miguel's unpublished catalogue of Sumerian literature, the product of decades of brilliant and painstaking work, lies at the heart of the corpus. Steve's wisdom and expertise were crucial throughout, from the earliest planning stages to the final lemmatisation.
We are also extremely grateful to all those who have contributed source material to the project: Bendt Alster, Antoine Cavigneaux, Miguel Civil, Gertrud Farber, Andrew George, Geerd Haayer, Bram Jagersma, Joachim Krecher, Marie-Christine Ludwig, Piotr Michalowski, Martha Roth, Yitschak Sefati, Steve Tinney, Herman Vanstiphout, Niek Veldhuis, Konrad Volk, Christopher Walker, Claus Wilcke, and Annette Zgoll. Many other colleagues have given helpful advice, in particular Pascal Attinger, Cale Johnson, Alan Millard, and Nicholas Postgate. Four research students made essential contributions to the project: Ronan Head, Anne Löhnert, Naoko Ohgama, and Faimon Roberts.
The ETCSL is in broad agreement with version P4 of the TEI guidelines (http://www.tei-c.org/Guidelines2/index.html). In the few cases where the coding of the ETCSL differs from the guidelines, this has been done for practical as well as historical reasons. The modifications have been incorporated into the TEI DTD according to the procedure described in chapter 29 of the guidelines, Modifying and Customizing the TEI DTD.
In principle, any tag, including its attributes, from the following TEI tag sets can be used in an ETCSL document: TEI.corpus, TEI.prose, TEI.XML, TEI.analysis, TEI.linking and TEI.transcr, provided that it does not clash with the ETCSL-specific modifications.
2.2 Transliterating cuneiform writing
The cuneiform sign inventory consists of more than 500 signs, encoding values which usually have the form V, CV or VC (V = vowel, C = consonant); CVC sequences are often written CV-VC.
A sign can be used logographically or phonographically. For example, used logographically the sign 'A' has the value 'a' and a meaning 'water'; used phonographically it has the same value but may encode the locative case marker or an interjection meaning 'Alas!'.
In addition, the same sign was often used to encode words of a similar meaning and/or of a similar sound. Consequently one sign can have more than one value. The sign 'IM', for example, can be read 'iškur' (the name of a wind god), 'tum9' (wind), 'im' (thunderstorm), or 'ni2' (self), depending on the context.
So far as we can judge, Sumerian seems to include a large number of homophones, that is words written differently, and with a different meaning, but with the same pronunciation. To distinguish between these homophones, subscript numerals are used in our alphabetic transliteration system. For example, in 'gu' (thread) the lack of a subscript numeral indicates that the word was written with the sign referred to as 'GU'; in 'gu2' (neck), the numeral indicates that the word was written with the sign referred to as 'GU2'; and in 'gu3' (voice), the numeral indicates that the word was written with the sign referred to as 'KA'; such sequences can continue with 4, 5, etc. (An older convention was to use an acute accent rather than the numeral 2, and a grave accent rather than the numeral 3, that is 'gú' for 'gu2' and 'gù' for 'gu3'.) Our convention is to display the numerals in subscript; however, plain numerals or the entities &s1;, &s2;, etc. are used in the XML version of the data. See further 2.8 Special characters and entities.
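Converting the ASCII index numerals used in the XML into subscripts for display can be sketched in a few lines of Python. This is a minimal illustration, not the ETCSL's own display code: it assumes that an &sN; entity stands for subscript N, and that any run of digits directly after a letter is a sign index.

```python
import re

# Unicode subscript digits U+2080..U+2089, used for the display form of indices.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def display_form(translit: str) -> str:
    """Convert index numerals in an ASCII transliteration to subscripts.

    Handles the entity spelling (gu&s3;) and the plain-numeral spelling (gu3).
    Digits that do not directly follow a letter, such as the numeral in
    2-kam-ma, are left untouched.
    """
    # Entity form first, so the "s" inside "&s3;" is never read as a sign value.
    text = re.sub(r"&s(\d+);", lambda m: m.group(1).translate(SUBSCRIPT_DIGITS), translit)
    # Plain-numeral form: digits right after any letter (incl. š, ĝ) are an index.
    return re.sub(r"(?<=[^\W\d_])\d+", lambda m: m.group(0).translate(SUBSCRIPT_DIGITS), text)


print(display_form("gu3-gu3 nu-mu-ni-be2"))  # gu₃-gu₃ nu-mu-ni-be₂
```

Indices written with x, as in tubax, are not converted by this sketch.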
In a few instances no general conventions have been established for correlating a reading to a particular sign. In such cases the sign is specified in brackets after the reading. If the value is already associated with another sign, the following convention is used: 'tubax(TUG2)'. However, if the value is not associated with another sign, the convention is as follows: 'unud(UNU3)'. The capitalised sequences are referred to as sign names. They are also used unbracketed in transliteration when context cannot distinguish between the various possible readings of a sign. For example, the sign 'IM' is transliterated 'IM' when the appropriate reading cannot be identified. When the sign itself cannot be identified it is transliterated as 'X'. Some signs have complex names; for example, 'NUNtenû' indicates that the sign in question is written with the sign 'NUN' placed at an angle (tenû being the Akkadian technical term).
In addition to instances such as 'unud(UNU3)' in which the bracketed sign name specifies how the word was written, brackets have a further function in transliteration, being used to qualify the presence of a sign in the source. In these cases, the contents of the brackets specify the qualified sign, in lower case if the appropriate reading can be suggested, e.g. '(tum9)'; in capitals if not, e.g. '(IM)'; and as '(X)' if no suggestion can be made.
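The bracket and capitalisation conventions above are regular enough to classify single tokens mechanically. The following Python sketch does this with regular expressions; the patterns SIGN_NAME and READING are simplifying assumptions (they ignore determinatives and entities, for example) rather than a complete grammar of ETCSL transliteration.

```python
import re

# Simplified patterns assumed for this sketch only:
# sign names start with a capital (IM, UNU3, NUNtenû);
# readings are lower case with an optional index (gu3, tum9, tubax).
SIGN_NAME = r"[A-Z][A-Z0-9]*[a-zû]*"
READING = r"[a-zšĝḫ]+(?:\d+|x)?"


def classify(token: str) -> str:
    """Label one transliterated sign token according to the conventions above."""
    if token == "X":
        return "unidentified sign"
    if token == "(X)":
        return "qualified sign, no suggestion"
    if re.fullmatch(rf"\({SIGN_NAME}\)", token):
        return "qualified sign, sign name only"
    if re.fullmatch(rf"\({READING}\)", token):
        return "qualified sign, reading suggested"
    if re.fullmatch(rf"{READING}\({SIGN_NAME}\)", token):
        return "reading with explicit sign name"
    if re.fullmatch(SIGN_NAME, token):
        return "sign name, reading undetermined"
    if re.fullmatch(READING, token):
        return "reading"
    return "unrecognised"


for t in ["unud(UNU3)", "tubax(TUG2)", "IM", "(tum9)", "(IM)", "(X)", "X", "gu3"]:
    print(t, "->", classify(t))
```

The literal checks for X and (X) come first because X would otherwise also match the sign-name pattern.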
The following conventions have been used to encode special sign configurations:
Symbol | Example | Explanation |
. | &d;ŠAR.ŠAR.GABA | the sign sequence forms one word(stem) |
/ | ŠIR.LU.KUŠ/ŠIR.LU.KUŠ | the first sign (sequence) is written above the second (sequence) |
: | gal:simug | the pronunciation is the reverse of the written sequence |
+ | ERIN2+KISIM5 | the two signs form a ligature |
× | KA×A | the second sign (sequence) is written within or conjoined to the first sign (sequence) |
(inverted) | AN.NISABA.(inverted)AN.NISABA | the second sign (sequence) is written inverted in relation to the first sign (sequence) |
Since these symbols have special functions, they are not used with their usual meaning of full stop, etc. in the body of the transliterations. The only exception is when they occur inside English notes.
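Because each symbol in the table has a single function, a sign configuration can be split into its components with one regular expression. The Python sketch below (split_configuration is a hypothetical helper) returns each component together with its relation to the preceding one. It treats the notation as flat, so a grouping such as ŠIR.LU.KUŠ/ŠIR.LU.KUŠ is not rebuilt into two sequences, and it assumes that an (inverted) marker applies to every component after it.

```python
import re

# Relation of a component to the one before it, following the table above.
OPERATORS = {
    ".": "same word (stem) as previous",
    "/": "written below previous",
    ":": "pronounced before previous",
    "+": "ligature with previous",
    "×": "written within or conjoined to previous",
}
INVERTED = "(inverted)"


def split_configuration(spec: str):
    """Split a sign configuration into (component, relation, inverted) triples."""
    tokens = re.split(r"([./:+×])", spec)  # the capturing group keeps the operators
    components, relation, inverted = [], None, False
    for i, tok in enumerate(tokens):
        if i % 2:  # odd positions hold the captured operator characters
            relation = OPERATORS[tok]
            continue
        if tok.startswith(INVERTED):
            # Assumption: the marker covers this and every later component.
            inverted = True
            tok = tok[len(INVERTED):]
        components.append((tok, relation, inverted))
    return components


print(split_configuration("KA×A"))
# [('KA', None, False), ('A', 'written within or conjoined to previous', False)]
```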
Hyphens in transliteration have several functions.
For more details on hyphenation, see the Hyphenation principles (pdf file).
2.3 The overall structure of the corpus
The structure is basically the same for the transliterations and the translations. The transliterations are, however, much more richly annotated. A bibliography document, which is separate from the transliterations and the translations, differs in structure from the other documents, but is also in agreement with the TEI guidelines. The source of the bibliography document is a FileMaker Pro database.
All documents start and end with the <TEI.2> document tag.
<TEI.2 id="c.0.1.1">
…
</TEI.2>
Each document has a header element (<teiHeader>) and a text element (<text>). All headers are in English; text is in Sumerian for the transliterations and in English for the translations. The bibliography document contains no data except what is in its header, that is the <text> element of the document is empty.
The value of the id attribute of the <TEI.2> document tag uniquely identifies a composition within the corpus. The identifier starts with lower case 'c' for the transliterations and lower case 't' for the translations. In the bibliography document, the id attribute value is simply 'etcsl-bibliography'.
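A small Python helper can recover the document type and category from a document id. It assumes, based on the ids shown in this manual, that the first number after the prefix is the category number (0 to 6) from the list of categories at the start of the manual, and that translation ids mirror transliteration ids (e.g. t.1.2.1); parse_doc_id is a hypothetical name.

```python
# Category numbers as listed at the start of this manual. Assumption: the first
# number after the c/t prefix of a document id is this category number.
CATEGORIES = {
    0: "ancient literary catalogues",
    1: "narrative and mythological compositions",
    2: "royal praise poetry and compositions with a historical background",
    3: "literary letters and letter-prayers",
    4: "hymns and cult songs",
    5: "other literature",
    6: "proverb collections",
}
KINDS = {"c": "transliteration", "t": "translation"}


def parse_doc_id(doc_id: str) -> dict:
    """Describe a <TEI.2> id such as 'c.2.4.1.1'."""
    if doc_id == "etcsl-bibliography":
        return {"kind": "bibliography"}
    prefix, *numbers = doc_id.split(".")
    if prefix not in KINDS or not numbers:
        raise ValueError(f"not an ETCSL document id: {doc_id!r}")
    return {
        "kind": KINDS[prefix],
        "composition": ".".join(numbers),
        "category": CATEGORIES[int(numbers[0])],
    }


print(parse_doc_id("c.2.4.1.1"))
```

Element-level ids such as c111.11 have no dot after the prefix and are rejected with a ValueError.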
When there are several versions of the same composition in one TEI document, the <group> tag is used and the <text> elements within the <group> tag are numbered.
<TEI.2 id="c.2.4.1.1">
<teiHeader lang="eng">
…
</teiHeader>
<text lang="sux">
<group>
<text n="1">
…
</text>
<text n="2">
…
</text>
</group>
</text>
</TEI.2>
The <group> tag is also used to group together various sources containing proverbs.
When we have only one version of a composition in a TEI document, there is no need for the <group> tag, and the <text> element is not numbered.
<TEI.2 id="c.1.2.1">
<teiHeader lang="eng">
…
</teiHeader>
<text lang="sux">
…
</text>
</TEI.2>
To make it possible to process all the documents together, transliterations as well as translations, an outer wrapper in the form of a <teiCorpus.2> element is used. This acts as a container and makes it easy, for instance, to validate the whole corpus using an XML validation program.
<teiCorpus.2 id="ETCSL">
<TEI.2 id="c.0.1.1">
…
</TEI.2>
…
<TEI.2 id="c.0.1.2">
…
</TEI.2>
</teiCorpus.2>
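Because all documents share one wrapper, a script can enumerate them in a single pass. This sketch uses Python's standard xml.etree.ElementTree on a toy corpus; the ids c.0.1.1 and t.0.1.1 are illustrative. Real ETCSL files contain entities such as &d; that are declared in the DTD, so they would need a DTD-aware parser or an entity-substitution step before ElementTree could read them.

```python
import xml.etree.ElementTree as ET

# A toy corpus in the shape shown above (no entities, so no DTD is needed).
CORPUS = """<teiCorpus.2 id="ETCSL">
<teiHeader type="corpus"/>
<TEI.2 id="c.0.1.1"><teiHeader lang="eng"/><text lang="sux"><body/></text></TEI.2>
<TEI.2 id="t.0.1.1"><teiHeader lang="eng"/><text lang="eng"><body/></text></TEI.2>
</teiCorpus.2>"""


def list_documents(corpus_xml: str):
    """Return (id, text language) for every <TEI.2> document in the wrapper."""
    root = ET.fromstring(corpus_xml)
    return [
        (doc.get("id"), doc.find("text").get("lang"))
        for doc in root
        if doc.tag == "TEI.2"  # skips the corpus-level <teiHeader>
    ]


print(list_documents(CORPUS))  # [('c.0.1.1', 'sux'), ('t.0.1.1', 'eng')]
```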
The corpus header declares certain id values that are used as targets or anchors in the individual documents, e.g. the language codes (sux, eng, etc.) referred to by the lang attributes. Apart from these unique ids and the name and address of the distributor, the corpus header is empty.
<teiHeader type="corpus">
<fileDesc>
<titleStmt>
<title></title>
</titleStmt>
<publicationStmt><distributor>The Electronic Text Corpus of Sumerian Literature</distributor>
<address><addrLine>The Oriental Institute, University of Oxford, Pusey Lane, Oxford OX1 2LE, UK, email etcsl@orinst.ox.ac.uk, http://etcsl.orinst.ox.ac.uk</addrLine></address>
<date>2000</date></publicationStmt>
<notesStmt>
<note id="scribe"/>
<note id="editor"/>
<note id="x">dummy line/paragraph id</note>
</notesStmt>
<sourceDesc>
<biblStruct>
<monogr>
<author></author>
<title></title>
<imprint>
<pubPlace></pubPlace>
<publisher></publisher>
<date></date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<langUsage>
<language id="sux"/>
<language id="akk"/>
<language id="eng"/>
<language id="dut"/>
<language id="ger"/>
<language id="fre"/>
<language id="fra"/>
</langUsage>
<handList>
<hand id="GC"/>
<hand id="NN"/>
</handList>
<textClass>
<classCode>
<sic id="scribes"/>
<sic id="editors"/>
</classCode>
</textClass>
</profileDesc>
</teiHeader>
The document header contains more information than the corpus header, but it also has a very simple structure. It consists of two main parts only, the file description, <fileDesc>, and the revision description, <revisionDesc>.
The <fileDesc> consists of the <titleStmt>, <publicationStmt>, <seriesStmt>, and <sourceDesc>. In the transliterations, the <sourceDesc> contains two lists of sources: <listBibl type="secondary"> for secondary literature and <listBibl type="tablets"> for the tablets. Each secondary source has an id (e.g. b729) which corresponds to an id in the bibliography document. The <sourceDesc> is occasionally preceded by a <notesStmt>, which provides further information about the text part of the document.
The <revisionDesc> contains a list of all the revisions the document has gone through, with date of revision, name of person responsible, and purpose of revision. A detailed discussion of the use of the elements found in the <teiHeader> can be found in chapter 5, The TEI Header, of the TEI guidelines.
<teiHeader lang="eng">
<fileDesc>
<titleStmt>
<title>Letter from &C;ulgi (?) to Arad&g;u about troops -- a composite transliteration</title>
<respStmt><name>Jeremy Black</name><resp>Project Director and Editor</resp></respStmt>
<respStmt><name>Gábor Zólyomi</name><resp>Editor</resp></respStmt>
<respStmt><name>Graham Cunningham</name><resp>Editor</resp></respStmt>
<respStmt><name>Esther Flückiger-Hawker</name><resp>Editor</resp></respStmt>
<respStmt><name>Eleanor Robson</name><resp>Technical Developer</resp></respStmt>
<respStmt><name>Jarle Ebeling</name><resp>Technical Developer</resp></respStmt>
</titleStmt>
<publicationStmt>
<distributor>The Electronic Text Corpus of Sumerian Literature
</distributor><address><addrLine>The Oriental Institute, University of Oxford, Pusey Lane, Oxford OX1 2LE, UK, email etcsl@orinst.ox.ac.uk, http://etcsl.orinst.ox.ac.uk</addrLine></address>
<date>2001</date>
</publicationStmt>
<seriesStmt><p>The Electronic Text Corpus of Sumerian Literature aims to make accessible over 400 literary works composed in the Sumerian language in ancient Mesopotamia during the late third and early second millennia BC. The works are in Sumerian composite transliteration and English prose translation with bibliographical information on each composition.</p></seriesStmt>
<sourceDesc>
<listBibl type="secondary">
<bibl><xref from="b729">Michalowski 1976</xref><biblScope type="page">184-185</biblScope><biblScope type="contents">score transliteration, commentary</biblScope></bibl>
<bibl><xref from="b732">Huber 1998</xref><biblScope type="page">36</biblScope><biblScope type="contents">commentary</biblScope></bibl>
<bibl><xref from="b309">Krecher 1996b</xref><biblScope type="contents">composite text, translation</biblScope></bibl>
</listBibl>
<listBibl type="tablets">
<bibl><biblScope type="museumNumber">Ni 9703 (ISET 2 120) c 2'-9'</biblScope><biblScope type="composLines">1-8</biblScope></bibl>
</listBibl>
</sourceDesc>
</fileDesc>
<revisionDesc>
<change><date>03.v.2001</date><respStmt><name>EFH</name><resp>editor</resp></respStmt> <item>standardisation</item></change>
<change><date>09.viii.2001</date><respStmt><name>JAB</name><resp>editor</resp></respStmt> <item>proofreading</item></change>
<change><date>11.viii.2001</date><respStmt><name>GZ</name><resp>editor</resp></respStmt> <item>SGML tagging</item></change>
<change><date>21.ix.2001</date><respStmt><name>ER</name><resp>editor</resp></respStmt> <item>proofreading SGML</item></change>
<change><date>01.vi.2003</date><respStmt><name>GC/JE</name><resp>editor/technical developer</resp></respStmt> <item>XML/TEI conversion</item></change>
</revisionDesc>
</teiHeader>
The translations have a similar header to the transliterations except that there are no bibliographic references.
The header of the bibliography document differs from the headers of the transliterations and the translations. It contains all the works cited in the transliteration documents, e.g. Huber 1998 (see above). Each work is encoded by a <biblStruct> element.
<biblStruct id="b732" lang="fre" type="unpublished work">
<analytic>
<author>Huber, Fabienne</author>
<title>Etude sur l'Authenticité de la Correspondence Royale d'Ur</title>
</analytic>
<monogr>
<imprint>
<publisher>University of Geneva</publisher>
<date value="2003">1998</date>
</imprint>
</monogr>
<note>Mémoire de licence</note>
<note>lic.phil.</note>
</biblStruct>
Another example is
<biblStruct id="b226" lang="fre" type="article">
<analytic>
<author>Cavigneaux, A.</author>
<author>F.N.H. Al-Rawi</author>
<title>Gilgameš et Taureau de Ciel (šul-mè-kam) (Textes de Tell Haddad IV)</title>
</analytic>
<monogr>
<title>Revue d'Assyriologie</title>
<imprint>
<pubPlace>Paris</pubPlace>
<date>1993</date>
</imprint>
<biblScope type="vol">87</biblScope>
<biblScope type="issue">2</biblScope>
<biblScope type="page">97-129</biblScope>
<biblScope type="plate">I-III</biblScope>
</monogr>
</biblStruct>
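The link between a header reference and a bibliography entry is a plain id match between the from attribute of <xref> and the id attribute of <biblStruct>. A minimal Python sketch, using trimmed copies of the examples above; the <listBibl> wrapper around the bibliography entry is an assumption made only so that the fragment parses, and resolve_secondary is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpts of the header and bibliography examples above.
HEADER = """<listBibl type="secondary">
<bibl><xref from="b732">Huber 1998</xref><biblScope type="page">36</biblScope></bibl>
</listBibl>"""
BIBLIOGRAPHY = """<listBibl>
<biblStruct id="b732" lang="fre" type="unpublished work">
<analytic><author>Huber, Fabienne</author></analytic>
</biblStruct>
</listBibl>"""


def resolve_secondary(header_xml: str, bibliography_xml: str):
    """Pair each secondary reference with the first author of its <biblStruct>."""
    entries = {b.get("id"): b for b in ET.fromstring(bibliography_xml).iter("biblStruct")}
    resolved = []
    for xref in ET.fromstring(header_xml).iter("xref"):
        entry = entries.get(xref.get("from"))
        author = entry.findtext(".//author") if entry is not None else None
        resolved.append((xref.text, xref.get("from"), author))
    return resolved


print(resolve_secondary(HEADER, BIBLIOGRAPHY))  # [('Huber 1998', 'b732', 'Huber, Fabienne')]
```

If no entry matches, the author is returned as None rather than raising an error.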
2.5.1 The <text>, <body>, <div1>, <lg>, <trailer> and <head> tags
Each <text> element contains a <body> element, except for a <text> element that contains a <group>. The <body> can be subdivided into segments and/or line groups if necessary.
A composition is divided into segments when it has a lacuna containing an unknown number of lines. In accordance with the TEI guidelines, the segments are encoded as <div1>. The example below, in which there are several versions of the same composition, shows a common structure. The versions (<text>) are numbered using Arabic numerals, while capital letters are used to number the <div1> level. The <head> element is used to tag the heading of the relevant version. For display in HTML, segments have headings which are generated from the attribute values of the <div1> tags.
<group>
<text n="1">
<body>
<head lang="eng">A version of unknown provenance, supplemented from Nippur mss.</head>
<div1 type="segment" n="A">
<l n="1">…</l>
<l n="2">…</l>
…
</div1>
…
</body>
</text>
<text n="2">
…
</text>
</group>
The line group tag, <lg>, is used to group lines functioning as a formal unit, that is, individual proverbs (the first example) and specified sections of hymns (the second example). Neither unit has a <head> tag. However, for display in HTML, individual proverbs have headings which are generated from the attribute values of their <lg> tags.
<text lang="sux">
<body>
<div1 type="segment" n="A">
<lg type="proverb" n="21.a1">
<l n="1">…</l>
<l n="2">…</l>
…
</lg>
</div1>
…
</body>
</text>
Line groups within hymns typically end with a <trailer> tag specifying the Sumerian technical term for the unit (referred to as a rubric).
<text lang="sux">
<body>
<lg n="A" type="kirugu">
<l n="1">…</l>
<l n="2">…</l>
…
<trailer place="rubric" type="kirugu">
<l n="21">…</l>
</trailer>
</lg>
…
</body>
</text>
The <trailer> tag is also used to specify the Sumerian technical term for particular types of composition (referred to as subscripts).
<text lang="sux">
<body>
<l n="1">…</l>
<l n="2">…</l>
…
<trailer place="subscript" type="cir-gida">
<l n="208">…</l>
</trailer>
</body>
</text>
The translations have a parallel structure to the transliterations, except that they contain no lines, only paragraphs (<p>). Several transliterated lines usually correspond to a single paragraph in the translation.
<text lang="eng">
<body>
<div1 type="segment" n="A">
<lg type="proverb" n="14.1">
<p>…</p>
…
</lg>
</div1>
</body>
</text>
Another tag which has an important structuring function is <addSpan/>. This element is described next together with the <altGrp>, <alt/>, <linkGrp>, and <link/> tags.
2.5.2 The <addSpan/>, <anchor/>, <altGrp>, <alt/>, <linkGrp> and <link/> tags
Since we are often dealing with composite editions compiled from several sources, it has been necessary to devise a way to tag differences between the sources over the same span of text. Such a span can range from a single word within a line to a large number of consecutive lines.
One source or a minority of the sources may contain additional words or lines not found in the rest of the sources. These additional words or lines are then encoded using the code pair <addSpan/> and <anchor/>. In addition, a note stating that what follows occurs in one or a minority of the sources is included.
<l n="254">cu-kal-le-tud-da
<addSpan to="c133.v9"/>
<note lang="eng" target="c133.v9">1 ms. adds:</note>
dili-ni
<anchor id="c133.v9"/>
ni2-ta ba-ab-tur-tur-re
</l>
Here, one manuscript (1 ms.) adds the word 'dili-ni', which is not found in the other sources. The <addSpan/> tag marks the start of the addition, and the <anchor/> tag marks the end of it. Since such additions can be nested, it is necessary to uniquely identify exactly where this addition begins and where it ends. This is done by having the value of the to attribute of the <addSpan/> tag match the value of the id attribute of the <anchor/> tag.
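The matching of to and id values can be sketched in Python for the simple case shown above, where the addition lies inside a single <l>. The function name additions is hypothetical. It keeps a dictionary of open spans so that nested additions are handled, but it only collects text that sits directly inside the <l> (not, for instance, text inside <w> elements); additions spanning several lines would need a walk over the whole <body>.

```python
import xml.etree.ElementTree as ET

LINE = """<l n="254">cu-kal-le-tud-da
<addSpan to="c133.v9"/>
<note lang="eng" target="c133.v9">1 ms. adds:</note>
dili-ni
<anchor id="c133.v9"/>
ni2-ta ba-ab-tur-tur-re
</l>"""


def additions(line_xml: str) -> dict:
    """Map each addSpan/anchor id inside one <l> to the text it encloses."""
    line = ET.fromstring(line_xml)
    open_spans = {}  # span id -> text fragments collected so far (allows nesting)
    found = {}
    for child in line:
        if child.tag == "addSpan":
            open_spans[child.get("to")] = []
        elif child.tag == "anchor" and child.get("id") in open_spans:
            fragments = open_spans.pop(child.get("id"))
            found[child.get("id")] = " ".join("".join(fragments).split())
        # The tail is the Sumerian text after a child element. A <note>'s own
        # content is editorial and never collected; only its tail is.
        if child.tail:
            for fragments in open_spans.values():
                fragments.append(child.tail)
    return found


print(additions(LINE))  # {'c133.v9': 'dili-ni'}
```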
In the next example, a line of text has been added. Note the numbering of the lines in this case. The line number of the additional line is 204A. When there are many additional lines, or the additions are nested, e.g. where there are lines added within other lines, the numbering of the additional lines becomes quite elaborate and both upper and lower case letters are used.
<l n="204">jiri3-ni gakkul zabar-ra im-ma-an-cu2-cu2</l>
<addSpan to="c151.v3"/>
<note lang="eng" target="c151.v3">1 ms. adds 1 line:</note>
<l n="204A">cu-si-ni bulug &jic;tackarin mu-na-an-bur12-bur12-re</l>
<anchor id="c151.v3"/>
<l n="205">ackud2-bi i3 he-nun-na-ka cu ga-am3-ma-ni-ib-ur3</l>
A second use of the <addSpan/> and <anchor/> tags is in cases where we have two or more alternatives (variants) within the sources. To separate this type from the type where one or more words or lines are simply added, a different wording of the note is used. In the following example, one source has a one-line alternative to the other source(s).
<addSpan to="c4061.v7"/>
<l n="29">i3-ne-ec2 lu2 lu2-ra a-na an-na-an-dug4</l>
<l n="30">lu2-ulu3 lu2-ra AC a-na an-na-an-tah</l>
<anchor id="c4061.v7"/>
<addSpan to="c4061.v8"/>
<note lang="eng" target="c4061.v7">instead of lines 29-30, 1 ms. has:</note>
<l n="A">i3-ne-ec2 &d;utu ud ne-a</l>
<anchor id="c4061.v8"/>
<altGrp><alt targType="addSpan" targets="c4061.v7 c4061.v8"/></altGrp>
To group the variants explicitly, the <altGrp> and <alt/> tags are used. The value of the targets attribute of the <alt/> tag links the variants by including the values of the to/id attributes of the <addSpan/>/<anchor/> tags.
The <linkGrp> and <link/> tags perform a similar function to the <altGrp> and <alt/> tags, but they group or link elements other than variants, e.g. line groups that cross segment boundaries. Line groups are referred to here as sections (sec) for historical reasons.
<lg id="c2512.sec1" n="C" type="kirugu">
…
</lg>
<lg id="c2512.sec2" n="C" type="kirugu">
…
</lg>
<linkGrp><link targType="section" targets="c2512.sec1 c2512.sec2"/></linkGrp>
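Both <alt/> and <link/> carry their targets as a whitespace-separated list of ids, so the grouped items can be recovered by splitting that attribute. A Python sketch over the two fragments shown above; the enclosing <div> is hypothetical and serves only to make one parseable document, and target_groups is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical <div> wrapper holding the <altGrp> and <linkGrp> examples above.
FRAGMENT = """<div>
<altGrp><alt targType="addSpan" targets="c4061.v7 c4061.v8"/></altGrp>
<linkGrp><link targType="section" targets="c2512.sec1 c2512.sec2"/></linkGrp>
</div>"""


def target_groups(xml_text: str):
    """List (tag, targType, [ids]) for every <alt/> and <link/> in document order."""
    return [
        (elem.tag, elem.get("targType"), elem.get("targets").split())
        for elem in ET.fromstring(xml_text).iter()
        if elem.tag in ("alt", "link")
    ]


for tag, kind, ids in target_groups(FRAGMENT):
    print(tag, kind, ids)
```

Each id can then be looked up against the to/id attributes of <addSpan/>/<anchor/> or the id of an <lg>.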
The <l> tag in the transliterations is the most important structuring device and corresponds to a line of cuneiform as found on a clay tablet. In many cases it will correspond to a sentence in the form of a verb and its arguments, but not always. Since it is very difficult to translate line by line from Sumerian into English, the <p> element, which is only found in the translations, contains a coherent English prose translation of several Sumerian lines. This means that there is a one-to-many relationship between an English paragraph (<p>) and the corresponding Sumerian lines. Several Sumerian lines thus have links to the same English paragraph, as shown in the next example.
Sumerian original:
<l n="9" id="c151.9" corresp="t151.p2">ur-saj-me-en je26-e iri-ju10-ce3 ga-jen</l>
<l n="10" id="c151.10" corresp="t151.p2">je26-e iri-ju10-ce3 ga-jen a-a-ju10-ce3 ga-an-ci-jen</l>
<l n="11" id="c151.11" corresp="t151.p2">&d;suen-me-en je26-e iri-ju10-ce3 ga-jen</l>
<l n="12" id="c151.12" corresp="t151.p2">je26-e iri-ju10-ce3 ga-jen a-a-ju10-ce3 ga-an-ci-jen</l>
<l n="13" id="c151.13" corresp="t151.p2">a-a-ju10 &d;en-lil2-la2-ce3 ga-an-ci-jen</l>
<l n="14" id="c151.14" corresp="t151.p2">je26-e iri-ju10-ce3 ga-jen ama-ju10-ce3 ga-an-ci-jen</l>
<l n="15" id="c151.15" corresp="t151.p2">ama-ju10 &d;nin-lil2-la2-ce3 ga-an-ci-jen</l>
<l n="16" id="c151.16" corresp="t151.p2">a-a-ju10-ce3 ga-an-ci-jen</l>
English prose translation:
<p id="t151.p2" n="9-16" corresp="c151.9">I, the hero, will set off for my city. I will set off for my city, I will set off to my father. I, Suen, will set off for my city. I will set off for my city, I will set off to my father. I will set off to my father Enlil. I will set off for my city, I will set off to my mother. I will set off to my mother Ninlil. I will set off to my father.</p>
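The explicit linking is achieved by paired id and corresp attributes: the corresp attribute of each <l> holds the id of its English paragraph, and the corresp attribute of the <p> holds the id of the first Sumerian line of the span. The n attribute of the <p> tag encodes the span of Sumerian lines corresponding to the English paragraph.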
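The many-to-one alignment can be rebuilt from these attributes. The following Python sketch, with the hypothetical function align and only lines 9 and 10 of the example for brevity, groups line numbers under the paragraph their corresp attribute points to.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TRANSLIT = """<body>
<l n="9" id="c151.9" corresp="t151.p2">ur-saj-me-en je26-e iri-ju10-ce3 ga-jen</l>
<l n="10" id="c151.10" corresp="t151.p2">je26-e iri-ju10-ce3 ga-jen a-a-ju10-ce3 ga-an-ci-jen</l>
</body>"""
TRANSLATION = """<body>
<p id="t151.p2" n="9-16" corresp="c151.9">I, the hero, will set off for my city. ...</p>
</body>"""


def align(translit_xml: str, translation_xml: str) -> dict:
    """Map each paragraph id to its line span and the lines that point at it."""
    lines_by_paragraph = defaultdict(list)
    for line in ET.fromstring(translit_xml).iter("l"):
        lines_by_paragraph[line.get("corresp")].append(line.get("n"))
    return {
        p.get("id"): {
            "span": p.get("n"),              # declared span, e.g. "9-16"
            "first_line": p.get("corresp"),  # id of the first Sumerian line
            "lines": lines_by_paragraph.get(p.get("id"), []),
        }
        for p in ET.fromstring(translation_xml).iter("p")
    }


print(align(TRANSLIT, TRANSLATION)["t151.p2"]["lines"])  # ['9', '10']
```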
In a handful of cases, we have had to introduce dummy lines and paragraphs to be in accordance with the TEI guidelines. In these cases, the n attribute has the value "x".
To hold grammatical and other types of information about Sumerian words, the <w> element is used. The content of the <w> element is comparable to an orthographic word in, e.g., Modern English. All recognisable Sumerian words have lemma, pos (part of speech), and label (a short English gloss) attributes.
<w form="cul" lemma="cul" pos="N" label="young man">cul</w>
<w form="he2-tum2" lemma="de6" pos="V" label="to carry">he2-tum2</w>
The form attribute of the <w> tag has been introduced to simplify the recognition of the word form, which may have various tags embedded in it.
<w form="na-an-dug4" lemma="dug4" pos="V" label="to say">na<suppliedEnd/>-<damage/>an<damageEnd/>-dug4</w>
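Because the form, lemma, pos and label attributes carry the essential information, a script can read words without reassembling text around embedded milestone tags. A Python sketch over the three <w> examples above, wrapped in an <l> so that they parse together (the empty milestone tags need no matching start tags for the XML to be well-formed); words is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The three <w> examples above, wrapped in an <l> element.
LINE = """<l>
<w form="cul" lemma="cul" pos="N" label="young man">cul</w>
<w form="he2-tum2" lemma="de6" pos="V" label="to carry">he2-tum2</w>
<w form="na-an-dug4" lemma="dug4" pos="V" label="to say">na<suppliedEnd/>-<damage/>an<damageEnd/>-dug4</w>
</l>"""


def words(xml_text: str):
    """Yield (form, lemma, pos, label) for every <w>, ignoring embedded tags."""
    for w in ET.fromstring(xml_text).iter("w"):
        yield w.get("form"), w.get("lemma"), w.get("pos"), w.get("label")


print(Counter(pos for _, _, pos, _ in words(LINE)))  # Counter({'V': 2, 'N': 1})
print([form for form, *_ in words(LINE)])  # ['cul', 'he2-tum2', 'na-an-dug4']
```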
The following is a list of possible attributes (some of these are not part of the standard TEI.2 DTD):
The pos and type attributes can have the following values.
Part-of-speech information
Pos value | Part of speech | Type values | Example/Comment |
AJ | adjective | | kug, gal-an-zu |
AV | adverb | | a-da-al, ga-nam, me-a |
C | conjunction | | tukum-bi, u3 |
I | interjection | | e, u-u8-a |
N | noun | ideophone, DN, EN, GN, MN, ON, PN, RN, SN, TN, WN | The upper case values are types of proper nouns. Common nouns are not marked as such. Example of ideophone: dum-dam |
NEG | negator | | nu |
NU | numeral | cardinal, fraction, ordinal | 1, 1/2, 2-kam-ma |
PD | pronoun/determiner | demonstrative, indefinite, interrogative, nominal-relative, personal, reflexive | re, na-me, a-ba, za, ni2-bi |
V | verb | | |
The tags described in this section mainly occur within the <l> element in the transliterations or the <p> element in the translations.
Many of the tags involve the coding of some kind of damage to a cuneiform source or a qualification of the identification of a sign. Representing sources accurately in transliteration requires an elaborate coding scheme, which in some cases deviates from the TEI guidelines.
2.6.1 The <damage/> and <damageEnd/> tags
Damaged text is bounded by the <damage/> and <damageEnd/> tags.
<l n="29" id="c012.29">ucum huc an-<damage/>na<damageEnd/></l>
Cuneiform signs were, of course, never joined with hyphens. Hyphens are a convention of transliteration, used to combine signs into units approximating words in modern European languages, including clitics and affixes. The hyphen is therefore never part of the damage, even though it may occur between the two tags. Damage is not indicated in the translations.
2.6.2 The <supplied/> and <suppliedEnd/> tags
Missing text that has been supplied by the editors is bounded by the <supplied/> and <suppliedEnd/> tags.
<l n="15" id="c135.B.15" corresp="t135.p4"><supplied/>inim<suppliedEnd/> <damage/>kug<damageEnd/> …</l>
When the missing text cannot be supplied, an upper case X is used to indicate that one cuneiform sign is missing, while the entity &X; indicates that a sequence of more than two signs is missing.
<l n="13" id="c135.B.13" corresp="t135.p4"><supplied/>X X<suppliedEnd/> zi &jic;gu-za-ja2 nam-mah-ja2 mu-un-pad3</l>
Upper case X is also used to indicate an unreadable sign, i.e. where there is a sign, but we cannot decipher it.
<l n="1" id="c135.C.1" corresp="t135.p6">X X <supplied/>&X;<suppliedEnd/></l>
Supplied text is not marked as such in the translations. However, missing or untranslatable text is indicated by &X;&X;. &X; displays as '…' in the HTML version.
When one or more lines of text are missing, a different tag is used (see 2.7.2).
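Since <damage/>, <supplied/> and their end tags are empty milestones, rendering a line means walking its text in document order and emitting a mark at each milestone. The Python sketch below does this; the bracket characters are an assumption borrowed from common Assyriological practice (square brackets for restorations, half brackets for damaged signs) and are not necessarily what the ETCSL HTML uses. Only the expansion of &X; to '…' follows this manual.

```python
import xml.etree.ElementTree as ET

# Display marks assumed for this sketch only.
MARKS = {"supplied": "[", "suppliedEnd": "]", "damage": "⸢", "damageEnd": "⸣"}


def render(line_xml: str) -> str:
    """Flatten one <l> to plain text, turning milestone tags into bracket marks."""
    # &X; is expanded to an ellipsis up front, since no DTD is loaded here.
    line = ET.fromstring(line_xml.replace("&X;", "…"))
    out = []

    def walk(elem):
        out.append(MARKS.get(elem.tag, ""))
        if elem.text:
            out.append(elem.text)
        for child in elem:
            if child.tag != "note":  # editorial notes are not part of the line
                walk(child)
            if child.tail:
                out.append(child.tail)

    walk(line)
    return "".join(out)


print(render('<l n="29" id="c012.29">ucum huc an-<damage/>na<damageEnd/></l>'))
# ucum huc an-⸢na⸣
```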
2.6.3 The <corr>, <corr/> and <corrEnd/> tags
The <corr> tag has three variants. The first variant follows the TEI guidelines: the corrected form is the content of the element, and the form found in the source is the value of a sic attribute.
<l n="16" id="c40825.16" corresp="t40825.p2"><corr sic="iti6">mul</corr> e2-ba mi-ni-di4-di4-la2-gin7</l>
Since the value of the sic attribute can be damaged or queried (see below), we have had to introduce a few entities to capture this. In the following example, the value of the sic attribute is queried.
<l n="77" id="c40833.77" corresp="t40833.p9">isin-ja2 udu-zu he2-em-mi-<corr sic="&qryb;lu&qrye;">gu7</corr></l>
The two other variants of the <corr/> tag have empty start tags with corresponding <corrEnd/> tags. The only difference between the two is the value of the resp attribute, which can be either "scrAdd" or "scrOm": the first marks text wrongly added by the scribe and corrected by the editors, the second text omitted by the scribe.
<l n="8" id="c0203.8">di4-<corr resp="scrOm"/>di4<corrEnd/>-la2 ud-da</l>
<l n="10" id="c0211.10">&d;en-lil2 su3-ra2-<corr resp="scrAdd"/>DU<corrEnd/>-<damage/>ce3<damageEnd/></l>
In these cases, there is no sic attribute.
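The three variants can be read back into two texts: the line as the scribe wrote it and the line as edited. The Python sketch below does this under stated assumptions: <corr sic="..."> follows TEI P4, where sic holds the form in the source and the element content holds the correction; the function name readings is hypothetical; and the test lines omit the &d; entity so that they parse without the DTD.

```python
import re
import xml.etree.ElementTree as ET


def _join(parts):
    # Dropping or restoring a sign can leave a doubled hyphen ("su3-ra2--ce3").
    return re.sub(r"-{2,}", "-", "".join(parts))


def readings(line_xml: str):
    """Return (as_written, edited) text for one <l> containing <corr> markup.

    Assumption: in <corr sic="..."> the sic value is the form in the source
    and the element content is the editors' correction (TEI P4 semantics).
    """
    written, edited = [], []
    mode = None  # "scrAdd" or "scrOm" between <corr/> and <corrEnd/>

    def emit(text):
        if mode != "scrOm":   # an omitted sign was never written by the scribe
            written.append(text)
        if mode != "scrAdd":  # a superfluous sign is dropped from the edition
            edited.append(text)

    def walk(elem):
        nonlocal mode
        for child in elem:
            if child.tag == "corr" and child.get("sic") is not None:
                written.append(child.get("sic"))
                edited.append("".join(child.itertext()))
            elif child.tag == "corr":
                mode = child.get("resp")
            elif child.tag == "corrEnd":
                mode = None
            elif child.tag != "note":  # notes are editorial, skip their content
                if child.text:
                    emit(child.text)
                walk(child)
            if child.tail:
                emit(child.tail)

    line = ET.fromstring(line_xml)
    if line.text:
        emit(line.text)
    walk(line)
    return _join(written), _join(edited)


print(readings('<l n="8" id="c0203.8">di4-<corr resp="scrOm"/>di4<corrEnd/>-la2 ud-da</l>'))
# ('di4-la2 ud-da', 'di4-di4-la2 ud-da')
```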
The <unclear> tag is used to indicate that the interpretation of a sign is unclear or uncertain. The type of uncertainty is encoded by the cert attribute, which can have one of two values, "qry" or "cor". The first expresses doubt about whether what is written has been correctly identified. The second indicates that the sign identified is irregularly written.
<l n="701" id="c162.701" corresp="t162.p78"><unclear cert="qry">ucum</unclear> ceg11 gi4 mu-ni-UD-a-ac</l>
<l n="307" id="c217.307" corresp="t217.p39">&jic;NE-ha-<unclear cert="cor">an</unclear> mu-ra-ta-ed3-de3</l>
The <foreign> tag is most commonly found in the <title> element of the header and in the translation text, both of which have English as the default language.
<title>A <foreign lang="sux">&c;ir-nam&c;ub</foreign> to Nanna for Ur-Namma (Ur-Namma F)</title>
<p id="t225.p10" n="21-33" corresp="c225.D.B.21">Its good <foreign lang="sux">udug</foreign> deities went away, its <foreign lang="sux">lama</foreign> deities ran off. … </p>
The tag is also found in English language contexts within the transliteration text, e.g. in notes. Furthermore, it is used in a few cases where Akkadian is interspersed with the Sumerian text.
Literary Sumerian sometimes contains words in a variety of Sumerian termed Emesal. Emesal words and phrases are tagged <distinct type="emesal">.
<l n="164" id="c144.164" corresp="t144.p18">ki-nu2 <distinct type="emesal">ze2-ze2-ba</distinct> in-nin-ra</l>
2.6.7 The <term> and <gloss> tags
Words or phrases in literary Sumerian are occasionally provided with a pronunciation guide or an Akkadian translation. To handle such glosses, the <term> and <gloss> tags are used. The word or phrase to be glossed is tagged with the <term> element, while the gloss itself is encoded by the <gloss> tag. The gloss can precede as well as follow its term.
<l>nin9 e-ju10 kug &d;nin-<term id="c113.t1">isin2</term><gloss lang="sux" target="c113.t1">si</gloss>-na-ke4</l>
<l>su-a ka5-a su-a-ri ur-cub5 udu-kur-ra ab2-za-za <gloss lang="sux" target="c122.t1">ugu</gloss><term id="c122.t1">ugu4</term>-bi</l>
Note that in these cases, where the gloss is Sumerian (lang="sux"), there is no space between the end tag of one element and the start tag of the other (</term><gloss> and </gloss><term>), and that the two elements are linked using the id and target attributes. With Akkadian and other glosses, there IS a whitespace character between the term and the gloss.
<l><term id="c1815.t1">nitah saj-dili</term> <gloss lang="akk" target="c1815.t1">NITAH.ME.EC sag-di-lu-u2</gloss> je26-e-gin7 ak</l>
<l><term id="c111.t1">dili-ni</term> <gloss lang="uncertain" target="c111.t1">TAR</gloss> jectug2-ge tuku-a &d;nin-tud ama kalam-ma-ce3</l>
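The id/target linking and the spacing convention can be checked mechanically. The Python sketch below (function names hypothetical; entities masked as in the earlier sketches) pairs each <gloss> with the <term> whose id equals its target, whether the gloss precedes or follows the term, and reports whether whitespace separates them. It assumes every target points to a <term> in the same line.

```python
import re
import xml.etree.ElementTree as ET


def mask_entities(xml: str) -> str:
    # Hypothetical helper: &name; -> {name}, since etcsl-sux.ent is not loaded here.
    return re.sub(r"&(\w+);", r"{\1}", xml)


def term_gloss_pairs(line_xml: str) -> list[tuple[str, str, str, bool]]:
    """Pair each <gloss> with the <term> whose id equals the gloss's target.

    Returns (term text, gloss text, gloss lang, whitespace between the two).
    """
    root = ET.fromstring(mask_entities(line_xml))
    order = list(root.iter())  # elements in document order
    terms = {el.get("id"): el for el in order if el.tag == "term"}
    pairs = []
    for gloss in (el for el in order if el.tag == "gloss"):
        term = terms[gloss.get("target")]
        # The gloss may precede or follow its term; any whitespace between
        # them is the .tail of whichever element comes first.
        first = term if order.index(term) < order.index(gloss) else gloss
        spaced = (first.tail or "")[:1].isspace()
        pairs.append(("".join(term.itertext()), "".join(gloss.itertext()),
                      gloss.get("lang", ""), spaced))
    return pairs


sux = ('<l>nin9 e-ju10 kug &d;nin-<term id="c113.t1">isin2</term>'
       '<gloss lang="sux" target="c113.t1">si</gloss>-na-ke4</l>')
akk = ('<l><term id="c1815.t1">nitah saj-dili</term> '
       '<gloss lang="akk" target="c1815.t1">NITAH.ME.EC sag-di-lu-u2</gloss> je26-e-gin7 ak</l>')
print(term_gloss_pairs(sux))  # [('isin2', 'si', 'sux', False)]
print(term_gloss_pairs(akk))  # [('nitah saj-dili', 'NITAH.ME.EC sag-di-lu-u2', 'akk', True)]
```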
2.6.8 The <ref>, <xref> and <ptr> tags
The function of the <ref> tag is to link a word or other part of the document to a note within the same document. This has been done sparingly in the transliterations but frequently in the translations, where explanatory notes are more often required.
<l> … uzu-zu-um <ref id="c134.r1">X</ref> <note id="c134.n1" lang="eng" target="c134.r1">possibly an erasure</note> i3-gu7</l>
<p> … Enlil approached the man of the <ref id="t121.r500">Id-kura</ref> <note target="t121.r500">river of the underworld</note>, the man-eating river. … </p>
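To illustrate the linking, the short Python sketch below (the function name is hypothetical) maps each <ref> id to the referenced text and the text of the <note> whose target attribute points to it. It assumes the input contains no ETCSL entities; otherwise they would need masking first.

```python
import xml.etree.ElementTree as ET


def notes_by_ref(xml: str) -> dict[str, tuple[str, str]]:
    """Map each <ref> id to (referenced text, text of the <note> targeting it)."""
    root = ET.fromstring(xml)
    refs = {r.get("id"): "".join(r.itertext()) for r in root.iter("ref")}
    return {n.get("target"): (refs[n.get("target")], "".join(n.itertext()))
            for n in root.iter("note") if n.get("target") in refs}


p = ('<p>Enlil approached the man of the <ref id="t121.r500">Id-kura</ref> '
     '<note target="t121.r500">river of the underworld</note>, the man-eating river.</p>')
print(notes_by_ref(p))  # {'t121.r500': ('Id-kura', 'river of the underworld')}
```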
The <xref> tag is used to refer to another ETCSL document and, in most cases, to a particular point in that document. The tag is found inside the <bibl> element in the transliteration header and as part of notes (see the <note> tag described below) in the transliteration and translation text.
<bibl><xref from="b266">Civil 1989c</xref><biblScope type="contents">composite text</biblScope></bibl>
<l> … ba-ni-in-dag <note lang="eng">Cited in <xref doc="c.0.2.01" from="c0201.39">OB catalogue from Nibru, at Philadelphia, 0.2.01, line 39</xref>; … </note></l>
For bibliography references, such as the <xref> in the <bibl> example above (which has no doc attribute), the default target document is the file bibliography.xml.
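A minimal Python sketch of how a processing script might resolve an <xref> target, assuming (as the examples suggest) that an <xref> without a doc attribute is a bibliography reference. The names resolve_xref and XrefTarget are hypothetical, and the sketch does not guess how a doc value such as "c.0.2.01" maps to a file name.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

# Per the manual, bibliography references point into bibliography.xml by default.
DEFAULT_DOC = "bibliography.xml"


class XrefTarget(NamedTuple):
    doc: str              # target document
    point: Optional[str]  # id of the point within it (the from attribute), if any


def resolve_xref(xref: ET.Element) -> XrefTarget:
    """Assumption: an <xref> without a doc attribute is a bibliography reference."""
    return XrefTarget(xref.get("doc", DEFAULT_DOC), xref.get("from"))


bib = ET.fromstring('<xref from="b266">Civil 1989c</xref>')
cat = ET.fromstring('<xref doc="c.0.2.01" from="c0201.39">OB catalogue from Nibru</xref>')
print(resolve_xref(bib))  # XrefTarget(doc='bibliography.xml', point='b266')
print(resolve_xref(cat))  # XrefTarget(doc='c.0.2.01', point='c0201.39')
```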
The <ptr> tag is used in the transliteration to refer to a line in another version within the same group.
<l n="142" id="c122.1.A.142" corresp="t122.p10">2-na-ne-ne dul6-ta NE.<damage/>NE<damageEnd/>-en-ze2-en ud-da-ta tu-ud-en-ze2-en</l>
…
<l n="1" id="c122.2.B.1" corresp="t122.p14"><ptr targType="l" target="c122.1.A.142"/><supplied/>&X;<suppliedEnd/> <damage/>du7<damageEnd/>-da zal-zal-le-ze2-en <damage/>mu<damageEnd/> X <supplied/>&X;<suppliedEnd/></l>
In principle, a note can be inserted almost anywhere in an ETCSL document. In practice, notes most commonly occur either between lines of text or inside them, as can be seen in 2.5.2 and 2.6.8. Below is a shortened example of a more general use of the <note> tag.
<div1>
<note lang="eng">It is possible that this fragment does not belong to the same composition</note>
<l>X <supplied/>&X;<suppliedEnd/></l>
…
</div1>
The <gap/> tag is used to include a special kind of note. It is used in both transliterations and translations when one or more lines (or an unknown number of lines) are missing, and it often occurs at the beginning or at the end of a text segment.
<div1>
…
<l>an-ur2 an-pa sa-par3 <supplied/>&X;<suppliedEnd/></l>
<gap extent="unknown no. of lines missing"/>
</div1>
The <q> tag is the only tag besides <p> that occurs in the translations only. It is used to indicate direct speech and can have two attributes, who and toWhom, the latter of which is not part of the TEI.2 DTD.
<p id="t113.p9" n="84-85" corresp="c113.84"><q who="Anuna" toWhom="Enki">Praise be to <w type="DN">Enki</w>, the much-praised lord who controls all the arts and crafts, who takes decisions!</q></p>
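As a sketch of how the who and toWhom attributes might be used, the Python function below (name hypothetical) lists speaker, addressee and spoken text for each <q> in a translation paragraph. Note that ElementTree does not validate against any DTD, so the non-TEI toWhom attribute is read like any other; a missing attribute comes back as None.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def speeches(p_xml: str) -> list[tuple[Optional[str], Optional[str], str]]:
    """Return (who, toWhom, spoken text) for each <q> in a translation paragraph."""
    root = ET.fromstring(p_xml)
    # itertext() includes text inside nested tags such as <w type="DN">.
    return [(q.get("who"), q.get("toWhom"), "".join(q.itertext()))
            for q in root.iter("q")]


p = ('<p id="t113.p9" n="84-85" corresp="c113.84"><q who="Anuna" toWhom="Enki">'
     'Praise be to <w type="DN">Enki</w>, the much-praised lord who controls all '
     'the arts and crafts, who takes decisions!</q></p>')
who, to_whom, text = speeches(p)[0]
print(who, "->", to_whom)  # Anuna -> Enki
```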
2.8 Special characters and entities
Transliterated Sumerian contains three characters that are not part of the conventional Roman alphabet: g̃ (g with tilde), ḫ (h with breve below) and š (shin, i.e. s with caron), together with their upper-case equivalents G̃, Ḫ and Š. Depending on where in a document they occur, these characters are encoded either as the plain letters j/J, h/H and c/C or as the entities &j;/&J;, &h;/&H; and &c;/&C;. The plain letters are used within the body of the transliterations, while the entities are used in the translations, in the headers of the transliterations and in other non-Sumerian-language contexts, e.g. notes.
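The Python sketch below (function name hypothetical) converts the text content of a transliteration line from the plain-letter encoding to Unicode. It is meant for text nodes only, not raw markup, since attribute values such as corresp="c111.11" also contain these letters; entity references like &d; or &jic; are skipped, and digits are left as they are. Writing g̃ as g plus a combining tilde (U+0303) follows the description above and is an assumption about the preferred form.

```python
import re

# Plain letters that stand for special characters in the body of ETCSL
# transliterations. g̃ is written as g + combining tilde (U+0303).
BODY_MAP = str.maketrans({
    "j": "g\u0303", "J": "G\u0303",  # g̃ / G̃
    "h": "\u1e2b", "H": "\u1e2a",    # ḫ / Ḫ (h with breve below)
    "c": "\u0161", "C": "\u0160",    # š / Š (s with caron)
})

# Entity references such as &d; or &jic; are left untouched.
ENTITY = re.compile(r"(&\w+;)")


def body_to_unicode(text: str) -> str:
    """Convert the text content of a transliteration line to Unicode."""
    parts = ENTITY.split(text)  # the capturing group keeps the entities in the list
    return "".join(p if ENTITY.fullmatch(p) else p.translate(BODY_MAP) for p in parts)


print(body_to_unicode("cah2 ce gu7-gu7-e nu-ub-zu"))  # šaḫ2 še gu7-gu7-e nu-ub-zu
print(body_to_unicode("&jic;NE-ha-an"))                # &jic;NE-ḫa-an
```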
Apart from the special Sumerian characters, the documents also contain Akkadian, which has a number of accented vowels and three special consonants: ṣ (sadhe), ṭ (teth) and ’ (aleph). Most of the accented vowels are covered by the ISOLat1 and ISOLat2 entity sets, while sadhe, teth and aleph are encoded using the entities &s;/&S;, &t;/&T; and &aleph;. All entities specific to ETCSL documents are listed in the file "etcsl-sux.ent", which is referenced during XML parsing.
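For non-Sumerian contexts, a script outside a validating XML toolchain might decode these entities itself. The Python sketch below (names hypothetical) replaces the Sumerian and Akkadian character entities named above with Unicode and leaves every other entity, such as &d; or &s0;, unchanged. Rendering aleph as ’ (U+2019, as printed in this manual) is an assumption; ʾ (U+02BE) is another common choice.

```python
import re

# ETCSL character entities for non-Sumerian contexts, mapped to Unicode.
# The aleph glyph is an assumption (U+2019 here; U+02BE is an alternative).
ENTITY_MAP = {
    "j": "g\u0303", "J": "G\u0303",  # g̃ / G̃
    "h": "\u1e2b", "H": "\u1e2a",    # ḫ / Ḫ
    "c": "\u0161", "C": "\u0160",    # š / Š
    "s": "\u1e63", "S": "\u1e62",    # ṣ / Ṣ (sadhe)
    "t": "\u1e6d", "T": "\u1e6c",    # ṭ / Ṭ (teth)
    "aleph": "\u2019",               # ’ (aleph)
}


def decode_entities(text: str) -> str:
    """Replace known entities; others (e.g. &d;, &s0;) are left unchanged."""
    return re.sub(r"&(\w+);", lambda m: ENTITY_MAP.get(m.group(1), m.group(0)), text)


print(decode_entities("A &c;ir-nam&c;ub to Nanna"))  # A šir-namšub to Nanna
```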
In addition to the character entities mentioned above, the ETCSL entity set contains a number of other entities, e.g. &hr;, &plus;, &times; (×) and the so-called determinatives. The &hr; entity is used to encode a horizontal line often found on the clay tablets. The &times; entity is part of the ISOnum entity set and encodes the multiplication sign. The &plus; entity is used to encode addition. Together with '.', '/', '+' and ':', which are not encoded as entities, &times; has a special meaning when combining two or more cuneiform signs (see 2.2 Transliterating cuneiform writing).
Determinatives, which occur either before or after a cuneiform sign, are (thought to be) unspoken signs which indicate the semantic set to which a noun belongs. Examples are &d; placed before names of deities and &ki; used after place names.
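Because determinatives are encoded as entities, they are easy to separate from the rest of a line. The Python sketch below (names hypothetical) removes determinative entities and reports which ones it found, which might be useful, for instance, when producing a form closer to what was spoken. The set of determinative names is an assumption: a partial list taken from examples in this manual (&d;, &ki;, &mucen;, &jic;), not the full ETCSL inventory. Other entities, such as &X;, are kept.

```python
import re

# Hypothetical, partial list: determinative entities that occur in this
# manual's examples (&d; for deities, &ki; for places, plus &mucen; and &jic;).
DETERMINATIVES = {"d", "ki", "mucen", "jic"}
ENTITY = re.compile(r"&(\w+);")


def split_determinatives(text: str) -> tuple[str, list[str]]:
    """Return the text with determinatives removed, and the determinatives found."""
    found: list[str] = []

    def drop(m: re.Match) -> str:
        name = m.group(1)
        if name in DETERMINATIVES:
            found.append(name)
            return ""          # determinatives are (thought to be) unspoken
        return m.group(0)      # other entities, e.g. &X;, are kept

    return ENTITY.sub(drop, text), found


print(split_determinatives("dilmun&ki;-a uga&mucen; gu3-gu3 nu-mu-ni-be2"))
# ('dilmun-a uga gu3-gu3 nu-mu-ni-be2', ['ki', 'mucen'])
```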
Finally, the entities &s0;, &s1;, etc. to &s9; are used to mark subscript numbers in non-Sumerian-language contexts (see 2.2 Transliterating cuneiform writing). The headers also contain entities such as &amp; for '&'.
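The subscript entities map directly onto the contiguous Unicode subscript digits U+2080 to U+2089, as the Python sketch below shows (function name hypothetical). It assumes that a multi-digit subscript is written as one entity per digit; &s; (sadhe) is not affected because the pattern requires a digit.

```python
import re


def decode_subscripts(text: str) -> str:
    """Replace &s0; ... &s9; with Unicode subscript digits U+2080 ... U+2089.

    Assumption: a multi-digit subscript is written as one entity per digit.
    """
    return re.sub(r"&s([0-9]);", lambda m: chr(0x2080 + int(m.group(1))), text)


print(decode_subscripts("gu&s3; ur-gir&s1;&s5;"))  # gu₃ ur-gir₁₅
```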
© Copyright 2003, 2004, 2005, 2006 The ETCSL project, Faculty of Oriental Studies, University of Oxford