filename: \TEXTS\SWIFT\CONVENT

Input Conventions for CCRH Texts

Our programs assume plain ASCII input, but how an individual prepares a text is of no concern to us. Our one caution is that many word-processing programs make use of "other" ASCII characters which are not displayed and which will interfere with the running of our software. For preparing text files, we recommend KEDIT, the text editor from Mansfield Software Group.

The difficulty with preparing any text for processing is coming to understand that a literal representation of one's text is inadequate, for a variety of reasons. In reading, we make many distinctions which we do not believe the software should have to make. For example, the closing single quotation mark often looks just like an apostrophe, the period marking an abbreviation looks just like the period marking the end of a sentence, and a hyphen at the end of a printed line may or may not be part of the spelling of the whole word. Thus, for proper processing of a text, we require that such ambiguities be resolved, unless a user of our software is willing to live with a surprising amount of "junk" in his or her output.

Similarly, there are various kinds of text units, from lines of poetry, to stanzas and paragraphs, to printed pages, to the parts of the various characters in a play, which need better identification than the simple indentation, etc., that characterizes our texts as printed. We also recognize that texts will be put to uses other than serving as data for concordance generation, such as the retrieval of specific portions of a number of different texts which are then combined and further processed. (One might, for example, want to make a concordance to the part of a single character in a play, to that same character's parts in several plays, as with Falstaff, or to the parts of the same type of character in all of Shakespeare's plays.)
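Because our programs assume plain ASCII, and word processors may embed characters that are not displayed, it can be worth scanning a file before processing it. The following is a minimal Python sketch, not part of CCRH's software. The function name `find_non_ascii` is hypothetical, and the sketch assumes that tab, line feed, and carriage return are acceptable while every other control byte, DEL (127), and any byte above it should be reported.

```python
import sys

# Assumption: tab (9), line feed (10), and carriage return (13) are acceptable;
# every other control byte, DEL (127), and every byte above 127 is reported.
ALLOWED_CONTROLS = {9, 10, 13}


def find_non_ascii(data):
    """Return (line, column, byte) for each suspicious byte in `data` (bytes)."""
    problems = []
    for line_no, line in enumerate(data.splitlines(keepends=True), start=1):
        for col, byte in enumerate(line, start=1):
            if byte in ALLOWED_CONTROLS:
                continue
            if byte < 32 or byte > 126:
                problems.append((line_no, col, byte))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python checkascii.py MYTEXT.TXT   (script name is hypothetical)
    with open(sys.argv[1], "rb") as f:
        for line_no, col, byte in find_non_ascii(f.read()):
            print(f"line {line_no}, column {col}: byte 0x{byte:02X}")
```

Run against a file, it prints one line per suspicious byte; no output means the file is printable ASCII under these assumptions.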
Novice users of CCRH's facilities can rarely envision all the uses to which they will put their texts (often there is a single goal in mind, such as the production of a concordance), but our experience is that almost all users of our facilities grow with experience, and what we have set up permits a certain amount of this growth without the penalty of having to redo one's earlier work of inputting a text, or of editing it too heavily. Before actually beginning to input a text, everyone, however experienced, should spend some time reflecting, not just upon what he or she expects from our facilities, but very specifically upon the text to be input. The direct representation of a text in standard ASCII involves a modern-English bias, in that "other characters," such as those in Middle English, will have to be coded somehow, a certain nuisance in keyboarding. Likewise, such a representation involves a prose bias, or a standard-verse bias, in that any "odd" spacing or the like will also have to be coded. Although our software makes few demands concerning which code is to be used to represent which textual feature, it is in everyone's best interest to follow the general practices of the other users of our facilities. On a practical level, this means that if your option files are just like everyone else's, there is less risk of error in setting them up to process your text. On another level, the professional level, adhering to a common standard makes it possible for others to make use of your text files and for you to make use of theirs, a significant advantage that is often not appreciated until later in one's project. In addition, virtually EVERYONE encounters some idiosyncratic textual feature which requires an addition to the following "standard" conventions. If one is "doing one's own thing," representing that textual feature without upsetting ANY of the assumptions already made can become a real problem.
The following conventions are tried and true (more or less). Every text has "units," and most texts have titles and even subtitles. Whether titles are treated as part of the text to be processed is up to the individual, but if they are, there is often some clutter involved. So in writing about representing texts, I want to set titles and subtitles aside for the present and begin with "units." The most obvious unit is the text as a whole. We require that the beginning of a text be marked with a "label," typed on a separate line. For the purposes of our software, this label is identified by \L in the first two spaces. What follows the \L is optional; it is intended as identifying information in a concordance, or perhaps even as information by which to retrieve texts. There are smaller units which should also be identified, and these can be considered to refer either to the printing of one's physical text (e.g., the pages of one's book) or to textual segments (e.g., paragraphs, speakers' parts, or stanzas). Labels which indicate a new page number are marked by a \P, while those which indicate a textual segment are marked by a \S. With both kinds of labels, the identifying information which follows the \L label must either be repeated or adapted to convey the information appropriate to the new unit. Following ALL labels, you should supply an explicit line number, because there is so much diversity in practice. The line number is marked by a \N, followed immediately by the appropriate line number, a blank space, and the line of text which has that line number. There is a limit to how long a line can be (about 2,000 characters), but in practice lines run to 50, 60, or 75 characters. DO NOT LIMIT YOUR LINE LENGTH BY THE WIDTH OF THE SCREEN ON YOUR COMPUTER. Successive lines will then have line numbers supplied until there is another label.
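The label and line-number rules above can be illustrated with a short Python sketch. It is not CCRH's implementation: the function name `number_lines` is hypothetical, only numeric \N values are handled, and it assumes that a text line appearing after a label but before any \N line is an error, since every label is to be followed by a line number.

```python
import re

# \N, the line number, one blank space, then the text of that line.
NUMBERED = re.compile(r"\\N(\d+) (.*)")
# Text, page, and segment labels are identified by their first two characters.
LABEL_FLAGS = ("\\L", "\\P", "\\S")


def number_lines(lines):
    """Yield (label, line_number, text) for each line of text.

    `label` is the most recent \\L, \\P, or \\S line; line numbers come
    from \\N and then increase by one until the next label.
    """
    label = None
    current = None  # unknown until a \N line is seen
    for raw in lines:
        if raw[:2] in LABEL_FLAGS:
            label = raw
            current = None  # assumption: every label must be followed by a \N line
            continue
        match = NUMBERED.match(raw)
        if match:
            current = int(match.group(1))
            text = match.group(2)
        elif current is None:
            raise ValueError(f"no line number after label: {raw!r}")
        else:
            current += 1
            text = raw
        yield label, current, text
```

Because labels reset the count, a page label followed by `\N40` restarts numbering at 40 regardless of where the previous unit left off.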
If there is some oddity in the lineation of your text, you can impose any line number you please at any point by beginning that line with a \N plus the line number. You can also force a, b, c, ... lineation by beginning the text line with the \N followed immediately by an "a"; that line would then be line "a," the following line "b," and so on. You can likewise force alphabetic "suffixes" by typing the \N plus the line number plus the appropriate letter, so that \N2301a would be line 2301a and would be followed by 2301b.

In keyboarding your text, begin with the first character, observing shifts in case, font, etc. In what follows, it is assumed that upper- and lower-case Roman letters will be typed accordingly, as will the standard punctuation marks, WITH THE EXCEPTIONS OF double quotation marks, single quotation marks, dashes, ellipses, parentheses, brackets, and braces. Since many of these can be used inside as well as outside of words, or have distinct opening and closing forms on a laser printer, special conventions must be used to distinguish them. In the following list, only lower case is illustrated.

WHAT'S IN YOUR TEXT                 TYPE THIS
a with an acute accent              a+'
a with a grave accent               a+`
a with a circumflex                 a+^
a with an umlaut                    a+:
ae ligature                         a+e
c cedilla                           c+,
c "with a line" (ME mss.)           c+1
d "with a tail" (ME mss.)           d+,
raised e                            ^e^
e with an acute accent              e+'
e with a grave accent               e+`
e with a circumflex                 e+^
f "with a line" (ME mss.)           f+1
ff typeset as a character           f+f
double f (ME mss.)                  \f
                                    |f
fi typeset as a character           f+i
double f plus i                     \f+i
fl typeset as a character           f+l
double f plus l                     \f+l
ffi typeset as a character          f+f+i
g "with a line" (ME mss.)           g+1
crossed h (ME mss.)                 h+-
raised i                            ^i^
i with a circumflex                 i+^
                                    |i
old-style uppercase I               \J
k "with a line" (ME mss.)           k+1
                                    l+l
m "with a tail" (ME mss.)           m+)
n "with a tail" (ME mss.)           n+)
n with a macron                     n+-
o with a circumflex                 o+^
o with an umlaut                    o+:
oe ligature                         o+e
p with a macron                     p+-
r with a "tail"                     r+)
raised t                            ^t^
t with a "line"                     t+1
thorn                               t+h
u with a "tail"                     u+)
u with a macron                     u+-
u with a circumflex                 u+^
u with an umlaut                    u+:
yogh                                y+o
z with a "tail" (MHG)               z+,
dash                                -+-
ellipsis                            ...
spaced ellipsis                     .+.+.
comma in numerals                   \,
period in abbreviations             \.
decimal point                       |.
left internal bracket               \[
right internal bracket              \]
hyphen at end of printed line       \-
hyphen at e.o.l. (to disappear)     |-
left double quotation mark          "[
right double quotation mark         "]
ampersand = "et"                    \&
                                    \/
                                    \`
                                    \'
                                    |'
                                    |#
forced blank in text                \b
larger forced blank in text         \B
no blank at end of line             |
word terminator (leaves no space)   \T
Roman type (default)                \r (continues until changed)
Bold Roman                          \R (continues until changed)
Italic                              \i (continues until changed)
Bold Italic                         \I (continues until changed)
Font size flag (default=0)          \s

Various text files will require titles, some to be centered, and/or "comments," such as contextual or bibliographical information. The demands of printing a formatted text in a conventional way, and of concording that text and formatting the concordance, introduce a level of ambiguity here, which is often best resolved with some editing if a file is to be used for both purposes. A "comment" is flagged by a \C at the beginning of the line; this flag may also be used for left-justified titles in texts that are to be formatted. A "centered comment" is similarly flagged by a |C.
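The imposed and lettered line numbers described earlier (\N2301a followed by 2301b, or \Na followed by b, c, ...) can be sketched the same way. This is a hypothetical Python illustration, not CCRH's code: `lineate` and `advance` are invented names, unnumbered text before any \N is assumed to start at line 1, and since the conventions do not say what follows "z", the sketch simply raises an error there.

```python
import re

# \N, an optional number, an optional lower-case letter, one blank, then text.
NUMBERED = re.compile(r"\\N(\d*)([a-z]?) (.*)")


def advance(number, letter):
    """Return the (number, letter) identifier of the next line."""
    if letter:
        if letter == "z":
            # The conventions do not say what follows "z"; this sketch stops.
            raise ValueError("lettered lineation ran past z")
        return number, chr(ord(letter) + 1)
    return number + 1, ""


def lineate(lines):
    """Yield (line_id, text), honoring \\N2301a and \\Na style numbering."""
    number, letter = 0, ""  # assumption: unnumbered text starts at line 1
    for raw in lines:
        match = NUMBERED.match(raw)
        if match and (match.group(1) or match.group(2)):
            # An imposed \N: a number, a letter, or both.
            number = int(match.group(1)) if match.group(1) else None
            letter = match.group(2)
            text = match.group(3)
        else:
            number, letter = advance(number, letter)
            text = raw
        prefix = "" if number is None else str(number)
        yield prefix + letter, text
```

Once a letter is in force, it is the letter that advances; an imposed plain \N number switches back to ordinary numeric counting.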
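The character codes in the table are keyboarding conventions, not display forms. As a rough illustration of how a program might read them, the Python sketch below maps a handful of codes to Unicode characters. The choice of Unicode targets (for example, U+021D for yogh and the U+FB00 to U+FB03 presentation forms for the ff, fi, fl, and ffi ligatures), the function name `decode`, and the particular subset are all assumptions of this sketch, which also assumes the codes never occur as ordinary text. Longer codes are tried first so that f+f+i is not read as f+f followed by "+i".

```python
import re

# Hypothetical subset of the codes, mapped to Unicode for display only;
# the Unicode targets are this sketch's choice, not part of the conventions.
CODES = {
    "a+'": "\u00e1",    # a with an acute accent
    "a+`": "\u00e0",    # a with a grave accent
    "a+^": "\u00e2",    # a with a circumflex
    "a+:": "\u00e4",    # a with an umlaut
    "a+e": "\u00e6",    # ae ligature
    "c+,": "\u00e7",    # c cedilla
    "e+'": "\u00e9",    # e with an acute accent
    "o+e": "\u0153",    # oe ligature
    "t+h": "\u00fe",    # thorn
    "y+o": "\u021d",    # yogh
    "f+f": "\ufb00",    # ff typeset as a character
    "f+i": "\ufb01",    # fi typeset as a character
    "f+l": "\ufb02",    # fl typeset as a character
    "f+f+i": "\ufb03",  # ffi typeset as a character
    '"[': "\u201c",     # left double quotation mark
    '"]': "\u201d",     # right double quotation mark
}

# Longest codes first, so f+f+i wins over f+f at the same position.
PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(CODES, key=len, reverse=True))
)


def decode(text):
    """Replace every known code in `text` with its Unicode character."""
    return PATTERN.sub(lambda m: CODES[m.group(0)], text)
```

For example, `decode("of+f+ice")` yields "o\ufb03ce", the word "office" with the ffi ligature; a complete version would need a decision for every row of the table, including those whose meaning is not stated.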