filename: \TEXTS\SWIFT\CONVENT


                      Input Conventions for CCRH Texts

     What our programs assume is plain ASCII, but how an individual prepares
a text is of no concern. Our caution is that many Word Processing programs
make use of "other" ASCII characters which are not displayed and which will
interfere with the running of our software. For preparing text-files, we
recommend KEDIT, the text editor of the Mansfield Software Group.
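
     A file can be checked for such hidden characters before it is submitted.
What follows is only a minimal sketch in Python, not a part of our software. The
function name find_non_plain_ascii is hypothetical, and so is the decision to
accept tabs and line endings along with the printable characters; adjust the
allowed set to your own situation.

```python
def find_non_plain_ascii(data: bytes):
    """Return (line, column, byte_value) for each byte that is not
    printable ASCII, a tab, or a line ending."""
    # Assumption: "plain ASCII" means 0x20-0x7E plus tab, LF and CR.
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    hits = []
    line, col = 1, 0
    for value in data:
        col += 1
        if value not in allowed:
            hits.append((line, col, value))
        if value == 0x0A:  # a line feed starts the next line
            line += 1
            col = 0
    return hits


# Invented sample: a high-bit byte and a stray control character.
sample = b"a plain line\nsoft\x8dreturn and a stray \x1a\n"
for line, col, value in find_non_plain_ascii(sample):
    print(f"line {line}, column {col}: byte 0x{value:02X}")
```

To check a real file, read it in binary mode (open(name, "rb").read()) and pass
the result to the function; an empty list means nothing suspicious was found.
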
     The difficulty with preparing any text for processing is coming to
understand that a literal representation of one's text is inadequate for a
variety of reasons. In reading, we make many distinctions which we do not
believe should have to be made by the software. For example, often the closing
single quotation mark looks just like an apostrophe, the period marking an
abbreviation looks just like the period marking the end of a sentence, and
a hyphen at the end of a printed line of text may or may not be a part of the
spelling of the whole word. Thus, for proper processing of a text, we require
that such ambiguities be resolved unless a user of our software wants to live
with a surprising amount of "junk" in his or her output.
     Similarly, there are various kinds of text units, from lines of poetry,
to stanzas and paragraphs, to printed pages, to the various parts of the
characters in a play which need better identification than the simple
indentation, etc., which characterizes our texts as printed. We also recognize
that texts will be put to other uses than serving as so much data for
concordance-generation, uses such as the retrieval of specific portions of
a number of different texts which are then combined and further processed. (One
might, for example, want to make a concordance to the part of a single
character in a play, or a concordance to that same character's parts (e.g.,
Falstaff) in several plays, or to the parts of the same type of character in
all of Shakespeare's plays.) Novice users of CCRH's facilities rarely can
envision all the uses to which they will put their texts--often there is a
single goal in mind such as the production of a concordance--but our experience
is that almost all users of our facilities grow with their experience, and
what we have set up permits a certain amount of this growth without the penalty
of having to re-do one's earlier work of inputting a text or editing it too
heavily.
     Prior to actually beginning to input a text, everyone, however experienced,
should spend some time reflecting, not just upon what he or she expects from
our facilities, but very specifically about the text to be input. The direct
representation of a text in standard ASCII involves a modern English language
bias in that "other characters," such as those in Middle English, will have to
be somehow coded, a certain nuisance in the keyboarding. Likewise, it
involves a prose bias, or standard verse bias, in that any "odd" spacing or
the like will also have to be coded. Although our software makes few demands
concerning what code is to be used to represent what textual feature, it is in
everyone's best interest to follow the general practices of the other users of
our facilities. On a practical level, that means that there is less risk of
error in setting up option files in order to process your text if your option
files are to be just like everyone else's. On another level, the professional
level, having adhered to a certain standard makes it possible to allow others
to make use of your text files and for you to make use of theirs, a significant
advantage that is often not appreciated until later in one's project. In
addition, virtually EVERYONE encounters some idiosyncratic textual feature which
requires an addition to the following "standard" conventions. If one is "doing
one's own thing," how to represent that textual feature without upsetting ANY of
the assumptions that have been made can become a real problem. The following
conventions are tried-and-true (more or less).
     Every text has "units," and most texts have titles and even sub-titles.
Whether titles are defined as a part of the text to be processed is up to the
individual, but if they are, there is often some clutter involved. So in
writing about representing texts, I want to set titles and sub-titles aside for
the present. I want to begin with "units."


     The most obvious unit is the text as a whole. We require that the beginning
of a text be marked with a "label" which is typed on a separate line. For the
purposes of our software this label is identified by \L in the first two
spaces. What follows the \L is optional, intended to be identifying information
in a concordance, or perhaps even information to retrieve texts by.
     There are smaller units which also should be identified, and these can
be considered to refer either to the printing of one's physical text--e.g.,
pages in one's book--or to textual segments (e.g., paragraphs, speakers' parts,
or stanzas). Labels which indicate a new page number are marked by a \P while
those which indicate a textual segment are marked by a \S. With both kinds of
labels, the identifying information which follows the \L label must either
be repeated or adapted to convey the new information appropriate to the new
unit.
     Following ALL labels, you should supply an indication of line number
because there is so much diversity in practice. The line number is marked by
a \N which is followed immediately by the appropriate line number, a blank
space, and the line of text which has that line number. There is a limit to how
long a line can be--about 2,000 characters--but in practice it's 50, or 60, or
75. DO NOT LIMIT YOUR LINE LENGTH BY THE WIDTH OF THE SCREEN ON YOUR COMPUTER.
Successive lines will have line numbers supplied until there is another label.
If there is some oddity in the lineation of your text, you can impose any
line number you please at any point by beginning that line with a \N plus the
line number. You can also impose a,b,c,... lineation by beginning the text
line with the \N followed immediately by an "a". Thus that line would be line
"a," the following line "b," etc. You can also force alphabetic "suffixes," by
typing
the \N plus the line number plus the appropriate alphabetic character, so that
\N2301a would be line 2301a and would be followed by 2301b.
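
     The labelling and line-numbering rules above can be sketched in a few lines
of Python. This is an illustration of the rules as described here, not our
actual software. The names number_lines and next_id are hypothetical, and
several details are assumptions: that empty lines are skipped, that a text line
arriving after a label but before any \N is an error, and that suffixes never
run past "z". The sample labels are invented.

```python
import re

LABEL_FLAGS = ("\\L", "\\P", "\\S")
# \N, optional digits, optional one-letter suffix, one blank, then the text.
N_FLAG = re.compile(r"\\N(\d*)([a-z]?) (.*)")


def next_id(number, suffix):
    """Advance a line identifier: 2301a -> 2301b, 7 -> 8, a -> b."""
    if suffix:
        return number, chr(ord(suffix) + 1)  # running past "z" is not handled
    return number + 1, suffix


def number_lines(lines):
    """Return (label_info, line_id, text) for every text line."""
    label_info = None
    current = None  # (number or None, suffix); None until a \N is seen
    numbered = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: empty lines carry no text
        if line[:2] in LABEL_FLAGS:
            label_info = line[2:].strip()
            current = None  # a new label calls for a fresh \N
            continue
        match = N_FLAG.fullmatch(line)
        if match:
            digits, suffix, text = match.groups()
            if not digits and not suffix:
                raise ValueError(f"\\N without a line number: {line!r}")
            current = (int(digits) if digits else None, suffix)
        elif current is None:
            raise ValueError(f"text line before any \\N: {line!r}")
        else:
            current = next_id(*current)
            text = line
        number, suffix = current
        line_id = f"{'' if number is None else number}{suffix}"
        numbered.append((label_info, line_id, text))
    return numbered


sample = [
    r"\L Hypothetical Play, Act 1",
    r"\N1 The first line of the act.",
    "The second line, numbered automatically.",
    r"\S Hypothetical Play, Act 1, Falstaff",
    r"\N2301a A half-line forced to 2301a.",
    "Its other half, which becomes 2301b.",
]
for label, line_id, text in number_lines(sample):
    print(f"{line_id:>6}  {label}: {text}")
```

Note that the \S label in the sample repeats and adapts the information given
after the \L, as required above, so that each text line can be traced to its
unit from the most recent label alone.
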
     In keyboarding your text, begin with the first character, observing shifts
in case, font, etc. In what follows, it is assumed that upper- and lower-case
Roman letters will be typed accordingly, as will the standard punctuation marks,
WITH THE EXCEPTIONS OF double quotation marks, single quotation marks, dashes,
ellipses, parentheses, brackets, and braces. Since many of these can be used
inside as well as outside of words or have opening and closing forms on a
laser-printer, special conventions must be used to distinguish them.
     In the following list, lower-case only is illustrated.

      WHAT'S IN YOUR TEXT                   TYPE THIS
   a with an acute accent                   a+'
   a with a grave accent                    a+`
   a with a circumflex                      a+^
   a with an umlaut                         a+:
   ae ligature                              a+e
   c cedilla                                c+,
   c "with a line" (ME mss.)                c+1
   d "with a tail" (ME mss.)                d+,
   raised e                                 ^e^
   e with an acute accent                   e+'
   e with a grave accent                    e+`
   e with a circumflex                      e+^
   f "with a line" (ME mss.)                f+1
   ff typeset as a character                f+f
   double f (ME mss.)                       \f
                                            |f
   fi typeset as a character                f+i
   double f plus i                          \f+i
   fl typeset as a character                f+l
   double f plus l                          \f+l
   ffi typeset as a character               f+f+i
   g "with a line" (ME mss)                 g+1
   crossed h (ME mss)                       h+-
   raised i                                 ^i^
   i with a circumflex                      i+^
                                            |i
   old-style uppercase I                    \J
   k "with a line" (ME mss)                 k+1
                                            l+l


   m "with a tail" (ME mss)                 m+)
   n "with a tail" (ME mss)                 n+)
   n with a macron                          n+-
   o with a circumflex                      o+^
   o umlaut                                 o+:
   oe ligature                              o+e
   p with macron                            p+-
   r with a "tail"                          r+)
   raised t                                 ^t^
   t with a "line"                          t+1
   thorn                                    t+h
   u with a "tail"                          u+)
   u with a macron                          u+-
   u with a circumflex                      u+^
   u umlaut                                 u+:
   yogh                                     y+o
   z with a "tail" (MHG)                    z+,

   dash                                     -+-
   ellipsis                                 ...
   spaced ellipsis                          .+.+.
   comma in numerals                        \,
   period in abbreviations                  \.
   decimal point                            |.
   left internal bracket                    \[
   right internal bracket                   \]
   hyphen at end of printed line            \-
   hyphen at e.o.l. (to disappear)          |-
   left double quotation mark               "[
   right double quotation mark              "]
   ampersand = "et"                         \&
                                            \/
                                            \`
                                            \'
                                            |'
                                            |#
   forced blank in text                     \b
   larger forced blank in text              \B


   no blank at end of line                  |
   word terminator (leaves no space)        \T

   Roman type (default)                     \r (continues until changed)
   Bold Roman                               \R      "
   Italic                                   \i      "
   Bold Italic                              \I      "
   Font size flag (default=0)               \s
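
     To show how codes of this kind might be expanded for display, here is a
minimal sketch in Python that covers only a handful of rows from the list above.
It is not our software: the table CODES, the function decode, and the choice of
Unicode characters as output are all assumptions made for illustration. Only
lower-case forms are handled, and the manuscript characters, font flags, and
the backslash and bar codes are left out.

```python
import re

# A few rows from the list above, mapped to Unicode (an assumed target).
CODES = {
    "a+'": "á", "a+`": "à", "a+^": "â", "a+:": "ä", "a+e": "æ",
    "c+,": "ç",
    "e+'": "é", "e+`": "è", "e+^": "ê",
    "i+^": "î",
    "o+^": "ô", "o+:": "ö", "o+e": "œ",
    "t+h": "þ",
    "u+^": "û", "u+:": "ü",
}

# One alternation over all codes, longest first in case longer codes are added.
PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(CODES, key=len, reverse=True))
)


def decode(text: str) -> str:
    """Replace every code found in CODES by its character."""
    return PATTERN.sub(lambda match: CODES[match.group(0)], text)


print(decode("cre+`me, fac+,ade, t+he"))
```

Keeping the codes in one table means that an idiosyncratic addition to the
conventions needs only a new entry, not a change to the function.
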

Various text files will require titles, some to be centered, and/or "comments,"
such as contextual or bibliographical information. The demands of printing a
formatted text in a conventional way and of concording that and formatting the
concordance introduce a level of ambiguity here, often resolved best with some
editing if a file is to be used for both purposes.
     A "comment" is to be flagged by a \C at the beginning of the line; this may
also be used for left-justified titles in to-be-formatted texts. A "centered
comment" is to be similarly flagged by a |C.
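
     The two comment flags can be recognized as sketched below in Python. This
is an illustration only: the names classify and render are hypothetical, as are
the 72-column width used for centering and the stripping of blanks that follow
the flag.

```python
def classify(line):
    """Return (kind, body) for one line of a text file."""
    if line.startswith("\\C"):
        return "comment", line[2:].strip()
    if line.startswith("|C"):
        return "centered comment", line[2:].strip()
    return "text", line


def render(line, width=72):
    """Center a |C line within `width` columns; return other bodies as is."""
    kind, body = classify(line)
    if kind == "centered comment":
        return body.center(width).rstrip()
    return body


# Invented sample lines.
for line in ["|C A Hypothetical Title", r"\C Copy-text: first edition", "Text."]:
    print(render(line))
```
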