Implementation of the ESIS file parser

From:

Importing SGML data into CELLAR by means of architectural forms
Gary F. Simons
Summer Institute of Linguistics

Last revised: 12 November 1997


Contents

  1. Overview
  2. Sample of ESIS input
  3. An introduction to CELLAR's sublanguage for defining parsers
  4. The main recursive function
  5. The supporting code


1. Overview

This document describes the implementation of the parser that converts an SGML document (which has been mapped onto CELLAR architectural forms) into objects in the CELLAR database.  (See the main paper for an explanation of the mapping process.) The input is actually the ESIS file output by the SGML parser. After briefly introducing the sublanguage used to define parsers in CELLAR, the document gives the source code for the main parsing function, which recursively handles one SGML element at a time.  Finally, it presents all the smaller supporting functions.

2. Sample of ESIS input

The output of the mapping performed by the nsgmls parser is not actually in SGML format.  Rather it is a simple text representation of the architectural document's Element Structure Information Set. (See http://www.sil.org/sgml/topics.html#esis for information about ESIS.)  It is a file in this ESIS format that the CELLAR parser reads.  See http://jclark.com/sp/sgmlsout.htm for an explanation of the nsgmls output format.

For instance, here is the beginning of the ESIS output file for the CriticalText example (see complete file).  Note that attribute lines for attributes with IMPLIED values have been omitted to improve readability.

ACLASS CDATA CriticalText
APCDATACLASS CDATA String
(OBJECT
(IGNORE
(IGNORE
ACONTENTATTR CDATA title
APCDATACLASS CDATA String
(ATTR
-2 Clement, chapter 7
)ATTR
ACONTENTATTR CDATA authorities
APCDATACLASS CDATA String
(ATTR
ACLASS CDATA Manuscript
ACONTENTATTR CDATA description
APCDATACLASS CDATA String
AID TOKEN A
AATTRNAME CDATA siglum
AATTRVALUE CDATA A
AATTRTYPE CDATA String
(OBJECT
-Codex Alexandrinus\n
ACONTENTATTR CDATA source
APCDATACLASS CDATA String
(ATTR
-A Greek uncial of the fifth century.  Housed in the British \nMuseum.  Published in:  The Codex Alexandrinus in reduced photographic\n facsimile, with an introduction by F. G. Kenyon, London 1909.
)ATTR
)OBJECT
...

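The first character of each line is a command character that identifies the kind of information on the line:

(    start of an element (followed by the element name)
)    end of an element (followed by the element name)
-    character data
A    an attribute of the next element to start

The A command has three fields: attribute name, type, and value. In character data, \n represents a line end.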
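
As an illustration of this line format, the following Python sketch splits one ESIS line into its command character and payload. It is not part of the CELLAR implementation; the function name parse_esis_line is hypothetical, the sketch assumes non-empty lines, and it handles only the \n escape (the other escapes defined by the nsgmls output format are ignored).

```python
def parse_esis_line(line):
    """Split one non-empty ESIS line into (command, payload)."""
    command, rest = line[0], line[1:]
    if command == "A":
        # Attribute line: name, type, and (unless IMPLIED) a value.
        parts = rest.split(" ", 2)
        name, attr_type = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else None
        return ("A", (name, attr_type, value))
    if command == "-":
        # Character data: the two characters \n stand for a line end.
        return ("-", rest.replace("\\n", "\n"))
    # "(" and ")" carry the element name; other commands pass through.
    return (command, rest)


print(parse_esis_line("ACLASS CDATA CriticalText"))
print(parse_esis_line("(OBJECT"))
```

Note that the attribute name comes back without the leading A, so the line ACLASS CDATA CriticalText yields the name CLASS.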

3. An introduction to CELLAR's sublanguage for defining parsers

CELLAR's model of class definitions for objects [RST93] includes not only the attributes which store information, but also:

The programming language built into CELLAR has sublanguages for each of the above purposes. The full documentation for these is given in the books CELLAR Programmer's Tutorial and CELLAR Programmer's Reference in the infobase file \LIBRARY\WRKLNK.NFO on the LinguaLinks CD-ROM [SIL97].

The function that converts the data in the ESIS file to the equivalent structure of objects in CELLAR is implemented as a parser. A parser definition has the following form:

parser-name( parameter-list ) -->
   local-variable-declarations
   pattern

The body of the parser is a regular-expression-like pattern.  The following metacharacters are used to express a pattern:

{    }	encloses a set of alternative patterns
(    )	groups a sequence of patterns into a single pattern
?	an optional pattern
*	the pattern may match zero or more times
+	the pattern must match one or more times

The following are some of the primitive patterns out of which complex patterns can be built:

'xxx'	  match a literal xxx
nl        match a newline (i.e. line end)
blank     match a span of spaces, tabs, and newlines
c.n	  invoke parser named n for class c
v:=p      set variable v to the value returned by parsing pattern p

For instance, the following pattern assumes that test and action have been declared as local variables.  It repeatedly matches one of three alternatives: a line beginning with 'if' (in which case it puts the contents of the line into test), a line beginning with 'do' (in which case it puts the contents of the line into action), or any other line (in which case it does nothing with the contents).  String.upToNL is a built-in parser for class String which matches (and returns) all the characters up to (but not including) the next newline:

*{ ( 'if' blank test:=String.upToNL nl )
   ( 'do' blank action:=String.upToNL nl ) 
   ( String.upToNL nl )
 }
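
For readers more familiar with general-purpose languages, the behavior of this pattern can be approximated in Python as follows. This is only a rough sketch: the function name scan is hypothetical, and the sketch works line by line, so unlike the CELLAR blank pattern it does not let the whitespace after 'if' or 'do' extend across a newline.

```python
import re


def scan(text):
    """Approximate the *{ ... } pattern: keep the last 'if' and 'do' lines."""
    test = action = None
    for line in text.splitlines():
        m = re.match(r"(if|do)[ \t]+(.*)", line)
        if m and m.group(1) == "if":
            test = m.group(2)       # test := rest of the line
        elif m:
            action = m.group(2)     # action := rest of the line
        # Any other line is consumed and its contents are ignored.
    return test, action


print(scan("if x > 1\nsome other line\ndo print x\n"))
```

Because the pattern repeats, a later 'if' or 'do' line overwrites the value stored by an earlier one.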

There are two special pseudo-patterns that provide an interface to the sublanguage for expressing actions and queries:

do(    )  perform the enclosed action (which may return a value)
test(  )  treat the enclosed boolean query as a pattern that succeeds or fails

Some key constructs of that sublanguage are:

^v         get value of the local variable or parameter named v
!g         get value of the global variable named g
a of e     get attribute a of each object returned by expression e
do m to e  execute method m on each object returned by expression e
[ ]        create a sequence of objects
m over s   execute method m over the sequence s as a whole

This brief introduction to the programming language should make it possible to read the source code which follows.

4. The main recursive function

The heart of the implementation is a parser named ESISelement which is called recursively for each element and each instance of character data in the ESIS file.  The loadESISfile method (which is defined below) opens the ESIS file and sets up the parameters for the top-level call to the ESISelement parser.

The complete source code of the ESISelement parser follows, with commentary inserted between the major sections of the code.  At the beginning is a header comment which summarizes the effect of the parser and the functions of its parameters:

/* ESISelement : a ParserDefn on Object ****************************************
Description:
    Reads one element out of an ESIS (i.e. parsed SGML) file
    that has been mapped onto CELLAR architectural forms and builds
    the CELLAR object that corresponds to it (along with everything it
    recursively owns).

Parameters: 
    currentObject -- This is the object that is currently being built. 
        It will be the owner of the object that this call will create.
    currentAttr -- This is the attribute of the currentObject that new  
        content is currently being added to.  If the value of this 
        parameter is DISCARD, then the potential value is discarded.
    idTable -- This is the table of SGML ID to CELLAR object associations.
    currentEncoding -- This is the encoding that strings in element 
        content should be put into.
    pcdataClass -- The class of basic object to create for PCDATA.
    textBefore, textAfter, textBetween -- Strings to add before, after, 
        and between PCDATA that become Strings.

Side Effects:
    This parser does not return an object. Rather it modifies
    the currentObject, which was passed by reference.

************************************************************************************************/

Next come the declarations of the parser name, the parameters, and the local variables.

ESISelement ( currentObject   : Object    default is lit. missing,
              currentAttr     : String    default is lit. missing,
              idTable         : IdTable   default is lit. missing,
              currentEncoding : Encoding  default is lit. missing,
              pcdataClass     : String    default is 'String',
              textBetween     : String    default is  lit. missing,
              textBefore      : String    default is lit. missing,
              textAfter       : String    default is lit. missing  ) -->

var void, cdata, newObject, newObject2,
    class, contentAttr, parentAttr, id, attrName, attrValue, attrType,
    class2, contentAttr2, attrName2, attrValue2, attrType2

The first action taken in the parser is to read all the architectural attributes for the next element; in the ESIS file these are in A lines.  This is a large repeatable alternatives pattern.  On each repetition, one parenthesized pattern matches. Since the asterisk operator makes it optional, it simply falls through if there are no attributes at all; this is the case when the next thing in the ESIS file is a line of data (i.e. beginning with the hyphen (-) code).  In the ESIS file, an attribute value of IMPLIED is given when the SGML document specifies no value for an architectural attribute; the parser must explicitly match these lines in order to consume them.  For an attribute that allows a conditional expression, the value is read by using the String.ESISevalExpr parser (which evaluates the expression as it is being parsed and returns just the value appropriate for the current context); otherwise, the value is simply copied from the file by using the String.upToNL parser (nl = newline).

/* First, read the attributes */
*{ ('ACLASS CDATA ' class:=String.ESISevalExpr(^currentObject) nl)
   ('ACONTENTATTR IMPLIED' nl)
   ('ACONTENTATTR CDATA ' contentAttr:=String.ESISevalExpr(^currentObject) nl)
   ('APARENTATTR IMPLIED' nl)
   ('APARENTATTR CDATA '  parentAttr := String.ESISevalExpr(^currentObject) nl
         do( if exists of ^parentAttr then currentAttr:=^parentAttr)   )
   ('APCDATACLASS IMPLIED' nl)
   ('APCDATACLASS CDATA ' pcdataClass:=String.ESISevalExpr(^currentObject) nl)
   ('AATTRNAME IMPLIED' nl)
   ('AATTRNAME CDATA ' attrName:=String.ESISevalExpr(^currentObject) nl)
   ('AATTRVALUE IMPLIED' nl)
   ('AATTRVALUE CDATA ' attrValue:=String.upToNL nl)
   ('AATTRTYPE IMPLIED' nl)
   ('AATTRTYPE CDATA ' attrType:=String.ESISevalExpr(^currentObject) nl)
   ('AID IMPLIED' nl)
   ('AID TOKEN ' id:=String.upToNL nl)
   ('AENCODING IMPLIED' nl)
   ('AENCODING CDATA ' cdata:=String.ESISevalExpr(^currentObject) nl
        do( currentEncoding:= encodingWithCode(^cdata) of !Configuration ) )

   ('ACLASS2 CDATA ' class2:=String.ESISevalExpr(^currentObject) nl)
   ('ACONTENTATTR2 IMPLIED' nl)
   ('ACONTENTATTR2 CDATA ' contentAttr2:=String.ESISevalExpr(^currentObject) nl)
   ('AATTRNAME2 IMPLIED' nl)
   ('AATTRNAME2 CDATA ' attrName2:=String.ESISevalExpr(^currentObject) nl)
   ('AATTRVALUE2 IMPLIED' nl)
   ('AATTRVALUE2 CDATA ' attrValue2:=String.upToNL nl)
   ('AATTRTYPE2 IMPLIED' nl)
   ('AATTRTYPE2 CDATA ' attrType2:=String.ESISevalExpr(^currentObject) nl)

   ('ATEXTBEFORE IMPLIED' nl)
   ('ATEXTBEFORE CDATA ' textBefore:=String.ESISevalExpr(^currentObject) nl)
   ('ATEXTAFTER IMPLIED' nl)
   ('ATEXTAFTER CDATA ' textAfter:=String.ESISevalExpr(^currentObject) nl)
   ('ATEXTBETWEEN IMPLIED' nl)
   ('ATEXTBETWEEN CDATA ' textBetween:=String.ESISevalExpr(^currentObject) nl)

   ('A' void:=String.upToNL nl /* Ignore anything else */ ) }

The remainder of the parser is a large alternatives pattern. There are five alternatives: one for PCDATA and the others for the four architectural forms--IGNORE, ATTR, OBJECT, and DOUBLE.

{ /*Then perform the action that corresponds to the architectural form */

When the next line of the ESIS file is a line of character data, we first test if the currentAttr is set to 'DISCARD'; if so, we do nothing more, which has the effect of discarding the data. Otherwise, we must add the data to the current attribute of the current object.  There are two if statements.   The first converts the string of data from the ESIS file into the appropriate CELLAR object.  In the case of String and Text, a special ESIS parser (defined below) is used that handles the line break (\n) codes and sets the encoding appropriately. Furthermore, if we are building a String, the textBefore, textAfter, and textBetween strings are concatenated to it. The second if statement handles putting the data into the currentAttr of the currentObject.  If the data item is a String or Text and the target attribute can store only a single value, then the new data is concatenated to the end of the existing attribute value; otherwise, the new value is just appended to the attribute (which overwrites an atomic attribute and adds another value to a sequence attribute).

  /* On DATA: add an object of pcdataClass to the currently open attr  */
  ('-' cdata:=String.upToNL nl
       do( if (^currentAttr ~= 'DISCARD') then
           begin
             if (^pcdataClass = 'String')
                then newObject := join over [
                   if storesValue(^currentAttr) of ^currentObject then ^textBetween,
                   ^textBefore,
                   parse ^cdata using String.ESIS( ^currentEncoding ),
                   ^textAfter ]
                else if (^pcdataClass = 'Text')
                   then newObject := parse ^cdata using Text.ESIS( ^currentEncoding )
                   else newObject := parse ^cdata using Action(!Action(^pcdataClass)).default 
             if ( accepts(^newObject) of !Text  and
                  atomic of attrDefnFor(^currentAttr) of class of ^currentObject )
                then set Action( ^currentAttr) of ^currentObject to
                   join over [ Action( ^currentAttr) of ^currentObject, ^newObject ]
                else append ^newObject to Action(^currentAttr) of ^currentObject
           end )
        )
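
The String case above can be pictured with a short Python sketch. It is a simplification under stated assumptions: the function name add_pcdata is hypothetical, the target attribute is assumed to be atomic and to hold a plain string, encodings are ignored, and the \n codes are turned into spaces as the String.ESIS parser is described as doing.

```python
def add_pcdata(existing, cdata, text_before="", text_after="", text_between=""):
    """Return the new value of an atomic String attribute after adding cdata."""
    pieces = [
        text_between if existing else "",  # only when a value is already stored
        text_before,
        cdata.replace("\\n", " "),         # a String may not contain a line break
        text_after,
    ]
    return existing + "".join(pieces)


value = add_pcdata("", "first", "[", "]", "; ")
value = add_pcdata(value, "second", "[", "]", "; ")
print(value)
```

The textBetween string is added only when the attribute already stores a value, which is what the storesValue test in the CELLAR code checks.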

When the next line of the ESIS file is an IGNORE element, we don't do anything at this level.  We make a recursive call to parse zero or more embedded ESIS elements, and then match the end tag for the IGNORE.

   /* On IGNORE: recurse as though this element wasn't there */
   ( '(IGNORE' nl
     *Object.ESISelement( ^currentObject, ^currentAttr, ^idTable, ^currentEncoding,
                          ^pcdataClass, ^textBetween, ^textBefore, ^textAfter )
     ')IGNORE' nl
     )

When the next line of the ESIS file is an ATTR element, we simply recurse with the value given for the contentAttr architectural attribute as the new value of the currentAttr parameter. Since textBefore and textAfter are not architectural attributes of ATTR, no value is given for these parameters in the recursive call. This has the effect of supplying the default values which are declared to be "missing" (or empty).

   /* On ATTR: recurse with new contentAttr */
   ( '(ATTR' nl
     *Object.ESISelement( ^currentObject, ^contentAttr, ^idTable, ^currentEncoding,
                          ^pcdataClass, ^textBetween )
     ')ATTR' nl
     )
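
When the next line of the ESIS file is an OBJECT element, we recurse with a new value of the currentAttr parameter as for ATTR. But first, we also create a new object to pass as a new value of currentObject.  The creation of the new object proceeds in five steps:

  1. Create a new object of the class named by the class architectural attribute.
  2. Unless the currentAttr is 'DISCARD', append the new object to the currentAttr of the currentObject.
  3. If an id was given, add the association between that ID and the new object to the idTable.
  4. Set the attribute specified by attrName, attrType, and attrValue on the new object.
  5. Set the attribute specified by attrName2, attrType2, and attrValue2 on the new object.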

When the currentAttr is specified as 'DISCARD', the new object does not get added to the current object.  Although everything else happens, including the recursive building of the new object, the effect is that it gets discarded since it never gets added to the current object.

   /* On OBJECT: create it, add to currentAttr, and recurse */
   ( '(OBJECT' nl
     do( begin
           newObject := create of !Action(^class)
           if (^currentAttr ~= 'DISCARD')
              then append ^newObject to Action(^currentAttr) of ^currentObject
           if exists of ^id then do add(^id, ^newObject) to ^idTable
           do ESISattribute( ^attrName, ^attrType, ^attrValue, ^idTable) to ^newObject
           do ESISattribute( ^attrName2, ^attrType2, ^attrValue2, ^idTable) to ^newObject
         end )
     *Object.ESISelement( ^newObject, ^contentAttr, ^idTable, ^currentEncoding,
                          ^pcdataClass, ^textBetween )
     ')OBJECT' nl
     )

The behavior for the final alternative, the DOUBLE element, is similar to that for OBJECT.  The difference is that two objects are created.  The first object is put into the currentAttr of the currentObject, the second object is put into the contentAttr of the first object, and embedded content when the parser recurses is put into the contentAttr2 of the second object.  Note that attrName, attrType, and attrValue apply to the first object, while attrName2, attrType2, and attrValue2 apply to the second object.  Note too that assigning an ID in this construction is not yet supported; the plan is to provide architectural support for assigning it to either the first object or the second object.

   /* On DOUBLE: create both objects and recurse */
   ( '(DOUBLE' nl
     do( begin
           newObject := create of !Action(^class)
           if (^currentAttr ~= 'DISCARD')
              then append ^newObject to Action(^currentAttr) of ^currentObject
           do ESISattribute( ^attrName, ^attrType, ^attrValue, ^idTable) to ^newObject
           newObject2 := create of !Action(^class2)
           if (^contentAttr ~= 'DISCARD')
              then append ^newObject2 to Action(^contentAttr) of ^newObject
           do ESISattribute( ^attrName2, ^attrType2, ^attrValue2, ^idTable) to ^newObject2
         end )
     *Object.ESISelement( ^newObject2, ^contentAttr2, ^idTable, ^currentEncoding, 
                          ^pcdataClass, ^textBetween )
     ')DOUBLE' nl
     )
   }
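
The overall shape of this recursion can be sketched in Python. The sketch below is not the CELLAR implementation: objects are modeled as plain dictionaries, the helper names (new_object, append_value, parse_element) are hypothetical, and conditional expressions, encodings, IDs, the attrName family of attributes, and the DOUBLE form are all left out. Every attribute is also treated as a sequence, so the concatenation into atomic attributes is not modeled.

```python
def new_object(class_name):
    """Hypothetical stand-in for a CELLAR object: a class name plus attributes."""
    return {"class": class_name, "attrs": {}}


def append_value(obj, attr, value):
    """Append value to the named attribute unless it is being discarded."""
    if attr != "DISCARD":
        obj["attrs"].setdefault(attr, []).append(value)


def parse_element(lines, pos, current_obj, current_attr):
    """Consume one element or data line at lines[pos]; return the next position."""
    arch = {}
    # First, read the architectural attributes (A lines) of the next element.
    while lines[pos].startswith("A"):
        name, _type, *value = lines[pos][1:].split(" ", 2)
        if value:                      # IMPLIED attributes carry no value
            arch[name] = value[0]
        pos += 1

    line = lines[pos]
    if line.startswith("-"):           # character data
        append_value(current_obj, current_attr, line[1:].replace("\\n", " "))
        return pos + 1
    if not line.startswith("("):
        raise ValueError("unexpected line: " + line)

    form = line[1:]                    # element name after "("
    pos += 1
    if form == "IGNORE":               # recurse as though the element were absent
        child_obj, child_attr = current_obj, current_attr
    elif form == "ATTR":               # same object, new target attribute
        child_obj, child_attr = current_obj, arch.get("CONTENTATTR")
    elif form == "OBJECT":             # new object, added to the current attribute
        child_obj = new_object(arch.get("CLASS"))
        append_value(current_obj, current_attr, child_obj)
        child_attr = arch.get("CONTENTATTR")
    else:
        raise ValueError("unsupported architectural form: " + form)

    while lines[pos] != ")" + form:    # zero or more embedded elements
        pos = parse_element(lines, pos, child_obj, child_attr)
    return pos + 1                     # step past the end tag


# A complete miniature input, abbreviated from the sample in section 2.
lines = [
    "ACLASS CDATA CriticalText",
    "(OBJECT",
    "(IGNORE",
    "ACONTENTATTR CDATA title",
    "(ATTR",
    "-2 Clement, chapter 7",
    ")ATTR",
    ")IGNORE",
    ")OBJECT",
]
root = new_object("RootFolder")
end = parse_element(lines, 0, root, "contents")
print(root)
```

As in the CELLAR parser, nothing is returned to the caller except the advance through the input; the result is the modification made to the object passed in.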

5. The supporting code

This section gives the source code for all the smaller parsers and methods that support the main parser.

5.1 The method for loading an ESIS file

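The ESISelement parser cannot be called on an ESIS file directly.  Rather, it needs a driver function to set up the parameters and then call it.  The function is named loadESISfile and it is a method defined on the RootFolder.  When executed, this method does the following:

  1. Gets the path name of the ESIS file to load by calling getFilePathName on the System.
  2. Creates a new, empty IdTable.
  3. Parses the file with the ESISelement parser, passing the RootFolder itself as the currentObject, "contents" as the currentAttr, the new IdTable, and the default encoding of the Configuration.
  4. Matches the optional C line and any trailing blank space at the end of the file.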

The source code is as follows:

/* loadESISfile  : a MethodDefn on RootFolder **********************************
Description: 
   Reads an ESIS (parsed SGML) file that is mapped onto CELLAR
   architectural forms, builds the corresponding CELLAR object, and
   appends it to the contents of the RootFolder

Returns:  Nothing

Side Effects:  Adds new item to end of contents of RootFolder

************************************************************************************************/
loadESISfile : means
   begin
      var file, idTable
      file := do getFilePathName ( "ESIS file to load" ) to !System
      idTable := create of !IdTable
      parse ^file using ( Object.ESISelement( self, "contents", ^idTable,
                                              defaultEncoding of !Configuration )
                          /* The file may have a C at end to signal that
                             it was an SGML conforming document */
                          ?'C' ?blank )
   end

5.2 The parser for evaluating conditional expressions

The value of an architectural attribute may be a conditional expression.  (The syntax of conditional expressions is reviewed in the header comment in the parser definition below.)  When the CELLAR parser reads the value of an architectural attribute, it does so with the String parser named ESISevalExpr. This parser implements the logic which evaluates the conditional expression. The only condition that can be tested in these conditional expressions is the class of the object currently being built.  Thus the parser is passed that object in a parameter named currentObject.  The source code is as follows:

/* ESISevalExpr : a ParserDefn on String ****************************************
Description:
    Evaluates a guarded expression in ESIS file to determine
    appropriate class or attribute name for this context.

        guarded-expression ::=  guarded-case* otherwise-case
        guarded-case ::= "if" current-class  target-value
        otherwise-case  ::=  target-value
        target-value ::=  quoted-string  |  cellar-name  |  "MISSING"

    The guarded-cases are tested in order.  If current-class is the
    class of the current object, then that target-value is returned.
    Otherwise, the value for the otherwise-case is returned.
    A target-value of "MISSING" returns nothing

    Examples:
        heading
        if CaptionedChunk caption  heading
        if CaptionedChunk caption  if Article titleField  heading
         
Parameters:
    currentObject -- The object currently being built

************************************************************************************************/
ESISevalExpr( currentObject : Object  ) -->
   var current, target, void

   { /* If there's an "if", process a guarded-case */
      ( 'if' blank  current:=String.cellarName blank
         target:={ 'MISSING'   String.doubleQuoted
                  String.singleQuoted   String.cellarName} blank
         /* If this current-class matches the current object, return target-value */
         { ( test( (name of class of ^currentObject = ^current) )
             void:=String.upToNL do( ^target ) )
         /* Else recurse to eval the rest of the expression */
         String.ESISevalExpr( ^currentObject )
         }
       )
      /* Otherwise, return the otherwise-case  */
       { 'MISSING'   String.doubleQuoted
          String.singleQuoted   String.cellarName }
   }

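At the top level, this parser is a pattern with two alternatives:

  1. A guarded case, which begins with the literal 'if'.
  2. The otherwise case, which consists of a target value alone.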

When there is a guarded case, the name of the current class in the expression and the corresponding target value are read by the parser.  If the current class just read is the same as the name of the class of object currently being built (currentObject), then the parser throws away the rest of the conditional expression (i.e. everything to the end of the line) and returns the target value.  Otherwise, the parser calls itself recursively to process the remainder of the expression.

The target value of a guarded case or of the otherwise case is matched by a pattern with four alternatives.  The target value may be the literal string 'MISSING', in which case no value is actually returned.  It may be a double-quoted string or it may be a single-quoted string; in either case, the appropriate string parser is called to return the value of the string between the quotes. Finally, the value may be a string that is not quoted; this is parsed by the cellarName parser.  Note, however, that the latter allows no punctuation or spaces in the string; thus a quoted string must be used when punctuation or spaces are required in the value.
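
The evaluation logic can be restated as a small Python sketch. It is an illustration only: the names tokenize and eval_guarded are hypothetical, the current class is passed as a plain string rather than taken from an object, a regular expression stands in for the cellarName and quoted-string parsers, and a returned None plays the role of "no value".

```python
import re

# A double-quoted string, a single-quoted string, or a bare (unquoted) name.
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')


def tokenize(expr):
    """Return (text, quoted) pairs for each token of the expression."""
    tokens = []
    for m in TOKEN.finditer(expr):
        dq, sq, bare = m.groups()
        if bare is not None:
            tokens.append((bare, False))
        else:
            tokens.append((dq if dq is not None else sq, True))
    return tokens


def eval_guarded(expr, current_class):
    """Evaluate guarded-case* otherwise-case for the given class name."""
    def value(token):
        text, quoted = token
        return None if (text == "MISSING" and not quoted) else text

    tokens = tokenize(expr)
    i = 0
    # Test each guarded case ("if" class target) in order.
    while tokens[i] == ("if", False):
        if tokens[i + 1][0] == current_class:
            return value(tokens[i + 2])   # the rest of the line is ignored
        i += 3
    return value(tokens[i])               # the otherwise case


expr = "if CaptionedChunk caption  if Article titleField  heading"
print(eval_guarded(expr, "Article"))
```

The loop replaces the recursive call in the CELLAR parser; both walk the guarded cases from left to right and stop at the first one whose class matches.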

5.3 The method for setting an attribute

In the CELLAR architecture, attributes of the CELLAR object are defined by giving values to the architectural attributes attrName, attrType, and attrValue, or attrName2, attrType2, and attrValue2.  When the CELLAR object is constructed, the attribute is actually set by calling the ESISattribute method of class Object.  It has four parameters.  The first three--attrName, attrType, and attrValue--are for passing in the values of the architectural attributes; the fourth, idTable, is for passing in the current table of ID-to-object associations so that IDREFs can be handled. The source code is as follows:

/* ESISattribute : a MethodDefn on Object ***********************************
Description:
    Sets an attribute of self following the specification in an ESIS
    (i.e. parsed SGML) file that has been mapped onto CELLAR architectural
    forms.  If it is a forward reference (IDREF) to an ID, then the 
    attribute value is not actually set, but an unresolved reference 
    record is set up which will set the attribute when the ID is finally 
    encountered.  If no attribute name is passed in, the method does nothing.

Parameters: 
    attrName -- The name of the attribute to set
    attrType -- The type of object to put into the attribute.  It is either
       the name of a CELLAR class, or the keyword IDREF to indicate that
       it is a reference to another object or IDREFS for multiple references
    attrValue -- A string to convert into the attribute value (which is 
       the ID of the target element for IDREF, or IDs separated by spaces 
       for IDREFS)
    idTable -- The table of associations from IDs to objects

Returns: nothing

************************************************************************************************/
ESISattribute( attrName : String, attrType : String,
               attrValue : String, idTable : IdTable ) : means
   if exists of ^attrName then
   begin
      var newValue
      if ( ^attrType = 'IDREF' )
         then /* If it is a backward reference, a value is returned.
                 If it is a forward reference, no value is returned but an
                 IdUnresolved record is added to the IdTable */
              newValue := do find( ^attrValue, self, ^attrName) to ^idTable
      else if ( ^attrType = 'IDREFS' )
         then newValue :=
                 perform( { |item| do find( ^item, self, ^attrName) to ^idTable } )
                 of parse ^attrValue using *( String.upToBlank ?blank)
      else if ( ( ^attrType = 'String' ) or  ( ^attrType = 'Text' ) )
         then newValue := ^attrValue
         else newValue := parse ^attrValue using Action(!Action(^attrType)).default 
      append ^newValue to Action(^attrName) of self
   end

The next section explains how the idTable works to resolve IDREF and IDREFS values.

5.4 The table of ID-to-object associations

The ID-to-object associations are handled by a set of three classes.  (Their definitions are found in the conceptualModel attribute of the SGML97 DomainModel.)  The main class is IdTable which has a contents consisting of a sequence of IdAssociations sorted by IDs:

class IdTable has
   contents: seq of IdAssociation sorted by 'id'

An IdAssociation has two main attributes: an id string which is an element ID from the SGML document and a reference (i.e. pointer) to the object in CELLAR which corresponds to the SGML element.  For the case where the IDREF in the SGML document is a forward reference (that is, the target ID has not yet been encountered in the SGML document), we have the special case of an unresolved association.  For every reference to an ID before the corresponding object is known, the IdAssociation stores an IdUnresolved in its unresolved attribute:

class IdAssociation has
   id:         String
   object:     refers to Object
   unresolved: seq of IdUnresolved

Each IdUnresolved object records the fact that the object pointed to in the source attribute has an unresolved reference to the ID given in the owning IdAssociation.  When the ID is finally encountered, the reference to the target object will be set in the attribute named attr of the source object:

class IdUnresolved has
   attr:   String
   source: refers to Object

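The IdTable is passed to the main parsing function as a parameter.  It is accessed in two situations:

  1. When an attribute of type IDREF or IDREFS is set (in the ESISattribute method), the find method is called to look up the object for each ID.
  2. When an OBJECT element has an id, the add method is called to record the association between that ID and the new object.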

The find method has three parameters.  The first is the id that is being looked up.  The other two, the CELLAR object (source) from which the reference originates and the name of the attribute (attr) of that object in which that reference will be stored, must be passed in for the case in which this is actually a forward reference.  When this is a backward reference, the method simply returns the object that corresponds to the ID.  When it is a forward reference, there are two possible cases: either this is the first reference to the given ID and both an IdAssociation and an IdUnresolved must be set up, or this ID has already been referenced so we only need to add another IdUnresolved.  The code is as follows:

/* find : a MethodDefn on IdTable ****************************************
Description: 
    Finds an ID in the table and returns its associated object.  If there
    is no object yet associated with the ID, an unresolved forward 
    reference is recorded in the table (which is fixed up when the ID is
    later defined).

Parameters: 
    id -- The ID to look up in the table
    source -- The object from which the reference to the ID is originating
    attr -- The attribute of the source object in which the reference will be stored

Returns:
     If this is a backward reference, the associated object is returned.
     If this is a forward reference, nothing is returned.

Side Effects: 
     If this is a forward reference, the table is modified to record an
     unresolved forward reference that is resolved later when the ID is
     defined.

************************************************************************************************/
find( id : String, source : Object, attr : String  ) : Object means
   begin
      var assoc, forward
      assoc := find( ^id, 'contents' ) of self
      if isMissing of ^assoc  then
         begin
            /* This is a first-time forward reference. Create an association. */
            assoc := create of !IdAssociation
            set id of ^assoc to ^id
            append ^assoc to contents of self
         end
      if exists of object of ^assoc
         then /* This is a backward reference. Return object */
              object of ^assoc  
         else /* This is a forward reference.  Set up the info needed to 
                 resolve it later when the ID is finally encountered. */
            begin
              forward := create of !IdUnresolved
              set attr of ^forward to ^attr
              set source of ^forward to ^source
              append ^forward to unresolved of ^assoc
            end
   end

The add method is called when an ID is encountered in the ESIS file.  It has two parameters: the id being declared and the CELLAR object to be associated with it (which happens to be the object currently under construction). The method must first see if it already has an association for the given ID.  If it does, then there were forward references to this ID; the appropriate action is to call the resolve method on the IdAssociation (see below) to set its object and resolve all the pending IdUnresolved forward references to this object.  If there is no association already, then the ID has not yet been referenced and we can simply create a new IdAssociation.  The code is as follows:

/* add : a MethodDefn on IdTable ****************************************
Description: 
   Adds a new ID-to-object association to the IdTable.  If the ID is  
   already in the table, it means that an IDREF was encountered before 
   the ID itself. In this case, the prior references are also cleaned up.

Parameters: 
   id -- The SGML ID
   object -- The corresponding CELLAR object

Returns: Nothing

Side Effects: Goes back and fixes pending forward references to this ID.

************************************************************************************************/
add( id : String, object : Object ) :  means
   begin
      var assoc
      assoc := find( ^id, 'contents' ) of self
      if exists of ^assoc
      then 
         /* This ID has already been referred to.
            Resolve the pending forward references. */
         do resolve( ^object) to ^assoc
      else 
         /* This ID hasn't been referred to yet.  Add the association. */
         begin
            assoc := create of !IdAssociation
            set id of ^assoc to ^id
            set object of ^assoc to ^object
            append ^assoc to contents of self
         end
   end

The resolve method on IdAssociation has one parameter, the object to which the pending forward references are to be resolved.  The method first sets the object attribute of the association to this target object.  Then it goes through the unresolved references one at a time and sets the attr of the source object of each IdUnresolved to the object passed in as parameter.  (This is done in the perform action which performs the embedded lambda action once for each value in the unresolved attribute; the current value is assigned to the ref parameter of the lambda action.)  The unresolved list is then set to empty since there are now no pending forward references.  The code is as follows:

/* resolve : a MethodDefn on IdAssociation ****************************************
Description: 
    Set the object associated with the ID already stored in this 
    IdAssociation to be the object passed in as the parameter.  Then 
    resolve all the pending forward references to this ID.

Parameters: 
    object -- The object to be associated with the ID already stored in
              the id attribute

Returns:  Nothing 

Side Effects:
    Changes all source objects noted in the IdUnresolved records stored 
    in the unresolved attribute to now point to the associated object 
    that has just been passed in as parameter.

Assumptions:
    Assumes that the object attribute is not yet set and that the
    unresolved attribute is set.

************************************************************************************************/
resolve( object : Object ) : means
   begin
      /* Set the association */
      if isMissing of my object
         then set object of self to ^object
         else /* This should not happen */
              do printDebug( 'Double association for' ) to my id
      if isMissing of my unresolved
         then /* This, too, should not happen */
              do printDebug( 'No unresolved references for' ) to my id
      /* Resolve the pending forward references */
      perform( { | ref |
                set Action( attr of ^ref) of source of ^ref to ^object
               } ) of my unresolved
      set unresolved of self to lit. missing
   end
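
For readers unfamiliar with CELLAR, the interplay between add and resolve can be approximated in Python.  This is only a minimal sketch under stated assumptions, not the CELLAR implementation: the class names mirror the ones used above, but the refer method is hypothetical (it stands in for whatever code records a reference in the actual system, which is not shown in this section), a Python dictionary replaces the find over contents, and None plays the role of a missing value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class IdUnresolved:
    source: Any  # the object that holds the pending reference
    attr: str    # name of the attribute on source that is to be filled in


@dataclass
class IdAssociation:
    sgml_id: str
    obj: Optional[Any] = None  # None stands in for CELLAR's "missing"
    unresolved: List[IdUnresolved] = field(default_factory=list)

    def resolve(self, target: Any) -> None:
        # Set the association, reporting the cases that should not happen.
        if self.obj is None:
            self.obj = target
        else:
            print("Double association for", self.sgml_id)
        if not self.unresolved:
            print("No unresolved references for", self.sgml_id)
        # Point every pending forward reference at the target object.
        for ref in self.unresolved:
            setattr(ref.source, ref.attr, target)
        self.unresolved = []


class IdTable:
    def __init__(self) -> None:
        self.contents: Dict[str, IdAssociation] = {}

    def add(self, sgml_id: str, obj: Any) -> None:
        assoc = self.contents.get(sgml_id)
        if assoc is not None:
            # The ID was referred to before it was defined.
            assoc.resolve(obj)
        else:
            self.contents[sgml_id] = IdAssociation(sgml_id, obj)

    def refer(self, sgml_id: str, source: Any, attr: str) -> None:
        # Hypothetical helper, assumed for illustration only.
        assoc = self.contents.setdefault(sgml_id, IdAssociation(sgml_id))
        if assoc.obj is not None:
            setattr(source, attr, assoc.obj)  # ID already known
        else:
            assoc.unresolved.append(IdUnresolved(source, attr))
```

In this sketch, a call to refer for an ID that has not yet been seen leaves the source object untouched and records an IdUnresolved; the later call to add for that ID then fills in the attribute on every recorded source.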

5.5 The parsers for converting CDATA into CELLAR Strings and Texts

When the main parser encounters a CDATA item, in a line of the ESIS file beginning with - (hyphen), it must convert the CDATA string in the ESIS file to a CELLAR object.  The pcdataClass parameter names the target CELLAR class.  When the target class is String, the String.ESIS parser is called to make the conversion.  The main task of this parser is to handle the \n codes in the ESIS representation of the CDATA.  Whenever the original SGML document had a line break in the content of an element with a declared content type of PCDATA or CDATA, the ESIS representation encodes that line break as the two-character sequence \n (a backslash followed by the letter n).  A String in CELLAR is not allowed to contain a line break; thus each occurrence of \n must be converted to a space.  The following parser performs this task:

/*  ESIS : a ParserDefn on String ****************************************

Description:
     Converts string data from an ESIS file to a CELLAR string.
     Converts \n (newline) in the string to space.

Parameters:
     encoding -- Specifies the encoding for the resulting string

************************************************************************************************/
ESIS( encoding : Encoding default is lit. missing ) -->

   var string, substrings

   substrings := ( ?String.upTo('\n')
                   *( '\n' do( ' ' )  ?String.upTo('\n')  )
                 )
  
   do( begin
          string := create of !String
          set basicValue of ^string to ^substrings
          set encoding of ^string to ^encoding
          ^string
       end )

In the above parser, the substrings variable ends up holding a sequence of all the substrings returned by the String.upTo('\n') parsers.  A match of a literal string does not return a value, so the code '\n' do(' ') is what returns a literal space whenever the \n sequence is matched.  Later, in set basicValue of ^string to ^substrings, all of the substrings are automatically concatenated as they are placed into the basicValue of the String.  Note that this parser also performs a second function: it sets the encoding of the String to the one specified in the encoding parameter.
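
The way the sequence of substrings is built up and then concatenated can be imitated in Python.  The following is a rough sketch rather than a translation of the CELLAR parser: the function names are invented for illustration, the encoding parameter is left out, and, like the parser above, it treats every backslash-n pair as a line-break code without considering any other escape sequences that may occur in an ESIS file.

```python
from typing import List

ESIS_NEWLINE = "\\n"  # the two characters backslash and n, not a real line break


def esis_substrings(data: str) -> List[str]:
    """Collect the pieces much as the substrings variable does: the text up
    to each \\n code, with a literal space standing in for the code itself."""
    parts = data.split(ESIS_NEWLINE)
    pieces = [parts[0]]
    for part in parts[1:]:
        pieces.append(" ")   # value returned in place of the matched \n
        pieces.append(part)  # text up to the next \n (possibly empty)
    return pieces


def esis_to_string(data: str) -> str:
    """Concatenate the pieces, as happens when basicValue is set."""
    return "".join(esis_substrings(data))


print(esis_to_string("line one\\nline two"))  # prints: line one line two
```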

A Text object in CELLAR is allowed to have line breaks.  Thus, a pcdataClass of Text is specified when one wants the data to preserve the line breaks that are in the original SGML document.  The Text.ESIS parser differs from String.ESIS only in that the value substituted when \n is matched is a literal line break rather than a literal space.

/*   ESIS : a ParserDefn on Text ****************************************

Description:
     Converts string data from an ESIS file to a CELLAR Text.
     Converts \n (newline) in the string to a newline.

Parameters:
     encoding -- Specifies the encoding for the resulting string

************************************************************************************************/
ESIS( encoding : Encoding default is lit. missing ) -->

   var string, substrings

   substrings :=  ( ?String.upTo('\n')
                    *( '\n' do( '
' )  ?String.upTo('\n')  )
                  )

   do( begin
          string := create of !String
          set basicValue of ^string to ^substrings
          set encoding of ^string to ^encoding
          ^string
       end )
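
The effect of choosing one pcdataClass or the other can be summarized in a few lines of Python.  This is an illustrative sketch only: the lookup table and the function name are assumptions made for the example and do not correspond to anything in the CELLAR source, and the encoding parameter is again omitted.

```python
# Hypothetical lookup: target class name -> value substituted for each \n code.
SUBSTITUTE = {"String": " ", "Text": "\n"}


def convert_cdata(data: str, pcdata_class: str) -> str:
    """Replace each backslash-n code according to the target class."""
    return data.replace("\\n", SUBSTITUTE[pcdata_class])


sample = "first line\\nsecond line"
as_string = convert_cdata(sample, "String")  # line break becomes a space
as_text = convert_cdata(sample, "Text")      # line break is preserved
```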


Document date: 12-Nov-1997