A Conceptual Introduction
to
FieldWorks Language Explorer Stage 1 Morpho-phonology
H. Andrew Black

6-March-2006

1 Introduction

Morphology is the study of word forms. Morphological parsers are computational tools that automatically produce a morphological analysis for a given word form. Such tools have proven to be quite useful as spelling checkers, as morphological grammar checkers, in producing interlinear text and in adaptation of a text from one related language to another. This document is designed to help the reader do morphological parsing using the approach allowed by Stage 1 of the FieldWorks Language Explorer parser.

The purpose of this documentation is to provide an introduction to the key concepts and notions in the FieldWorks Language Explorer approach to morphological parsing. It is divided into two main sections: morphotactics and morphophonemics. The first has to do with controlling which morphemes can co-occur with which other morphemes within a well-formed word. The second has to do with controlling the phonological shape of individual morphemes. (There is one other main section that deals with some issues related to lexical entries.)

Please note that the mechanisms described here are the ones available for Stage 1, the first, rather simple-minded (linguistically speaking) instantiation of FieldWorks Language Explorer. Later stages will provide much more power and many more capabilities.[1] The main reason we have stages in the FieldWorks Language Explorer development project is to avoid trying to develop tools that meet all the user interface challenges in one fell swoop. Doing that would be quite a daunting task, and it would take a long time before any product could be released. Instead, we are staging the development to handle the basic items first. Then we'll add more and more as we go along.

1.1 Key Issues

We begin by addressing some of the key issues that any general morphological parser must face. Before we can tell the computer what to do, we need to understand what is going on linguistically. What kinds of language phenomena must such a computational tool be able to handle if it will indeed be a general tool?

1.1.1 Inflection

Many, if not most, languages inflect verbs and/or nouns. Consider the nominal Orizaba Nahuatl forms shown in (1) and the verbal ones shown in (2).[2]

(1)
nokal
no-kal
1SgPoss-house
‘my house’
mokal
mo-kal
2SgPoss-house
‘your(sg) house’
ikal
i-kal
3SgPoss-house
‘his/her/its house’
tokal
to-kal
1PlPoss-house
‘our house’
amokal
amo-kal
2PlPoss-house
‘your(pl) house’
inkal
in-kal
3PlPoss-house
‘their house’
 
nokalvan
no-kal-van
1SgPoss-house-PlPoss
‘my houses’
mokalvan
mo-kal-van
2SgPoss-house-PlPoss
‘your(sg) houses’
ikalvan
i-kal-van
3SgPoss-house-PlPoss
‘his/her/its houses’
tokalvan
to-kal-van
1PlPoss-house-PlPoss
‘our houses’
amokalvan
amo-kal-van
2PlPoss-house-PlPoss
‘your(pl) houses’
inkalvan
in-kal-van
3PlPoss-house-PlPoss
‘their houses’
(2)
nimiki
ni-miki
1SgSubj-to.die
‘I die’
timiki
ti-miki
2SgSubj-to.die
‘you(sg) die’
miki
0-miki
3SgSubj-to.die
‘he/she/it dies’
timikih
ti-miki-h
1PlSubj-to.die-Pl
‘we die’
anmikih
an-miki-h
2PlSubj-to.die-Pl
‘you(pl) die’
mikih
0-miki-h
3PlSubj-to.die-Pl
‘they die’

Notice how each possessed noun in (1) has at least a possessor prefix. Certain nouns require this possessor inflection. Similarly the verbs in (2) require subject markers (with the possible exception of 3rd person). A morphological parser must account for such inflectional items.

1.1.2 Derivation

Consider the English forms[3] in (3). What is happening here? How do you get a dumb computer to “understand” these forms correctly?

(3)
a. institute
b. institution
c. institutional
d. institutionalize
e. institutionalization
f. institutionalizational
g. institutionalizationally

In (3a) institute is a verb root (e.g. We need to institute some changes around here.). By adding the suffix ‑ion as in (3b), the word is changed to a noun. The suffix ‑al can be added to a noun stem to change it to an adjective, as in (3c). The suffix ‑ize changes an adjective into a verb (3d). Further category changes occur with the addition of each suffix in (3e-g). From this English example, we have seen that the computer needs to be able to distinguish between roots and suffixes, with each one restricted as to what category it attaches to and what category it changes the stem to. (Note, for example, that the suffix ‑ly cannot be added to either a verb stem or a noun stem: *institutely, *institutionly.)
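
To make the category bookkeeping concrete, here is a minimal Python sketch of the idea. It is not how FieldWorks Language Explorer stores affixes; the table name SUFFIXES and the function derive are hypothetical, and the categories given for ‑ation and ‑ly are our own assumptions, since the text above spells out only ‑ion, ‑al and ‑ize.

```python
# Hypothetical suffix table: suffix -> (category it attaches to, category it yields).
# The entries for "ation" and "ly" are assumptions, not taken from the text.
SUFFIXES = {
    "ion": ("V", "N"),
    "al": ("N", "Adj"),
    "ize": ("Adj", "V"),
    "ation": ("V", "N"),
    "ly": ("Adj", "Adv"),
}

def derive(root_category, suffixes):
    """Return the category of the derived word, or None if some suffix
    is attached to a stem of the wrong category."""
    category = root_category
    for suffix in suffixes:
        attaches_to, yields = SUFFIXES[suffix]
        if category != attaches_to:
            return None
        category = yields
    return category

print(derive("V", ["ion", "al", "ize"]))  # institutionalize -> V
print(derive("V", ["ly"]))                # *institutely -> None
print(derive("V", ["ion", "ly"]))         # *institutionly -> None
```

Each suffix is checked against the category of the stem built so far, which is why the starred forms are rejected.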

A Huallaga Quechua example showing similar category changes along with various types of verbal and nominal affixes is given in (4). The verb root meaning ‘to see’ has the imperfective aspect marker added, followed by the first person object marker, yielding ‘to see me.’ The addition of the nominalizer changes the form to a noun meaning ‘seeing me.’ The noun form can now be possessed by the second person possessive marker and then the purpose marker may optionally follow, finally giving ‘in order that you might be seeing me.’[4]

(4)
rikaykaamaanaykipaq
rika-yka:-ma:-na-yki-paq
to.see-Imp-1Obj-Nom-2Pos-Pur
‘in order that you might be seeing me’

A morphological parser must account for such derivational items.

1.1.3 Ambiguity

Ambiguity is also apparent in (3a), since institute can be either a verb, as above, or a noun, as in Australian Institute of Marine and Power Engineers. Note that there are different types of ambiguity in natural language as well. For example, the word bank (among other things) can mean either the side of a river or a building that holds money. With either meaning, bank is a noun.

Now consider the following word:

(5)
a.
cooks
cook-s
person.who.prepares.food-PL
b.
cooks
cook-s
to.prepare.food-3SgPres

Note that cooks is ambiguous not only in the root meaning but also as to the suffix: the -s is a nominal plural morpheme in (5a) but a verbal third person singular present tense morpheme in (5b).

A morphological parser must be able to deal with the fact that individual words can legitimately be ambiguous. That is, a morphological parser must be able to discover and report all possible analyses of a word form. In many cases, the ambiguity is eliminated when the word is seen in context, so ideally a morphological parser is used in the context of computational tools that look beyond a single word.
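
As a rough illustration of "report all possible analyses," the following Python sketch uses a toy lexicon built from (5). The names ROOTS, SUFFIXES and analyses are hypothetical, and a real parser would of course not be limited to a single suffix.

```python
# Toy lexicon; forms and glosses follow example (5).
ROOTS = [("cook", "N", "person.who.prepares.food"),
         ("cook", "V", "to.prepare.food")]
SUFFIXES = [("s", "N", "PL"), ("s", "V", "3SgPres")]

def analyses(word):
    """Return every analysis in which the suffix attaches to a root
    of the category it requires."""
    results = []
    for root, root_category, root_gloss in ROOTS:
        if word == root:
            results.append(root_gloss)
        for suffix, attaches_to, suffix_gloss in SUFFIXES:
            if word == root + suffix and root_category == attaches_to:
                results.append(f"{root_gloss}-{suffix_gloss}")
    return results

print(analyses("cooks"))
```

Both analyses of cooks are returned; choosing between them is left to whatever tool looks at the surrounding context.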

1.1.4 Epenthesis

There are still other types of challenges for morphological parsing. For example, consider the Caquinte word in (6):[5]

(6)
itsavetakohitiro
i-tsave-(t)-ako-hi-(t)-i-ro
3M-tell- -DAT-PAS- -NF-3FO
‘she is told about’

The (t) that appears in two places on the second line (which shows the word broken into morphemes) is not really a morpheme at all. Instead, each (t) is an epenthetic consonant added to serve as the onset of a syllable. Caquinte allows neither vowel clusters nor syllables without onsets (in this part of the verb), so whenever two vowels come together at a morpheme break, an epenthetic t is inserted. A morphological parser needs to be able to correctly account for forms that include epenthetic segments inserted to preserve syllable structure.
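
The insertion rule just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each letter is treated as one segment, the vowels are taken to be a, e, i, o, u, and the function name attach_suffixes is hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for this sketch

def attach_suffixes(stem, suffixes, epenthetic="t"):
    """Add suffixes one at a time, inserting the epenthetic consonant
    whenever two vowels would meet at a morpheme break."""
    word = stem
    for suffix in suffixes:
        if word[-1] in VOWELS and suffix[0] in VOWELS:
            word += epenthetic
        word += suffix
    return word

# Prefix i- plus root tsave, followed by the suffixes of (6).
print(attach_suffixes("itsave", ["ako", "hi", "i", "ro"]))  # itsavetakohitiro
```

A parser has to run this logic in reverse: it must accept a t at a vowel-vowel morpheme break without treating it as a morpheme.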

1.1.5 Discontinuous Morphemes

Now consider the Caquinte form in (7), which is the same word as in (6), but changed to future tense:

(7)
intsavetacojitero
i-n-tsave-(t)-ako-hi-(t)-e-ro
3M-FUT-tell- -DAT-PAS- -F-3FO
‘she will be told about’

What is the challenge here? The future tense is realized as a discontinuous morpheme: it is composed of the prefix n‑ and the suffix ‑e. The computer must be able to check these noncontiguous parts of the word to correctly analyze the future tense in Caquinte; one part cannot be present without the other.

1.1.6 Infixation

The Tagalog forms (from Spencer (1991:12-13)) in (8) illustrate another challenge:

(8)
a. sulat ‘to write or writing (infinitive form)’
b. sumulat ‘to write (with actor focus)’
c. sinulat ‘to write (with object focus)’

What is happening here? This is a case of infixation, where the root sulat splits into two parts so that one of the focus morphemes, ‑um‑ or ‑in‑, can be inserted. A parser must correctly recognize the root even though it is broken apart by the infix.
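
One simple way to picture the task is sketched below in Python. It assumes, for illustration only, that the infix always sits right after the first consonant of the root and that the lexicon contains just the root sulat; the names ROOTS, INFIXES and analyze are hypothetical.

```python
ROOTS = {"sulat"}
INFIXES = {"um": "actor focus", "in": "object focus"}

def analyze(word):
    """Return (root, infix) pairs; infix is None for a bare root."""
    results = []
    if word in ROOTS:
        results.append((word, None))
    for infix in INFIXES:
        # Assumed position: immediately after the first consonant of the root.
        if word[1:1 + len(infix)] == infix:
            root = word[0] + word[1 + len(infix):]
            if root in ROOTS:
                results.append((root, infix))
    return results

print(analyze("sumulat"))  # [('sulat', 'um')]
print(analyze("sinulat"))  # [('sulat', 'in')]
```

The candidate root is only accepted if it is found in the lexicon, so a word that merely happens to contain um or in is not misanalyzed.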

1.1.7 Reduplication

Look at the additional Tagalog forms in (9) to determine how the imperfective aspect is marked:

(9)
a. susulat ‘to write (imperfective)’
b. magpasulat ‘to make someone write (perfective)’
c. magpapasulat ‘to make someone write (imperfective)’

We know from (8a) that sulat means ‘to write’. So in (9a) it appears that the imperfective marker is su, but we cannot tell if it is a prefix or an infix without looking at other forms. In example (9b) the causative ‘to make someone’ is the prefix pa‑. The mag‑ is what some call the actor focus or actor voice morpheme. But the imperfective of this causative form is not *sumagpasulat, *magsupasulat, nor *magpasusulat as we would expect from either prefixing or infixing su. Instead, we have magpapasulat in (9c) where it is clear that the marker for imperfective is the extra pa. The correct analysis is therefore that imperfective aspect is marked in Tagalog by reduplicating either the first syllable of the stem or the initial consonant and vowel of the first syllable of the stem.

A morphological parser must be able to recognize reduplication within a word form.
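
A minimal Python sketch of spotting such a copy follows. It only looks for a consonant-vowel pair that is immediately repeated, treats each letter as one segment, and assumes the vowels a, e, i, o, u; the function name find_cv_reduplication is hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for this sketch

def find_cv_reduplication(word):
    """Return (preceding material, copy, stem) for every consonant-vowel
    sequence that is immediately followed by an identical sequence."""
    found = []
    for i in range(len(word) - 3):
        copy = word[i:i + 2]
        is_cv = copy[0] not in VOWELS and copy[1] in VOWELS
        if is_cv and word[i + 2:i + 4] == copy:
            found.append((word[:i], copy, word[i + 2:]))
    return found

print(find_cv_reduplication("susulat"))       # [('', 'su', 'sulat')]
print(find_cv_reduplication("magpapasulat"))  # [('mag', 'pa', 'pasulat')]
print(find_cv_reduplication("magpasulat"))    # []
```

Note that the copy has no fixed shape of its own (su in one word, pa in another), which is what sets reduplication apart from ordinary affixes.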

1.1.8 Root and Pattern Morphology

Semitic languages pose a special challenge with their root and pattern morphology. These languages have roots composed of three consonants, as exemplified in the Silt'i data in (10), where ‘buy’ is the root wkb. The aspect markers are composed of vowel patterns that fit between or around the root consonants, such as the a-a vowel pattern indicating the perfective aspect shown in (10). The parser needs to be able to find the root consonants and corresponding vowels of the aspect, even though they are intermingled in the surface form of the word.[6]

(10)
wakaba
a-a-wkb-a
Perf-buy-3rdSgPerf
‘he bought’
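
The intermingling of root and pattern in (10) can be sketched in Python as follows. This is an illustration only: it assumes one pattern vowel after each of the first root consonants and treats every letter as one segment; the function names interdigitate and split_stem are hypothetical.

```python
def interdigitate(root, vowels):
    """Place one pattern vowel after each root consonant until the
    pattern vowels run out."""
    out = []
    for i, consonant in enumerate(root):
        out.append(consonant)
        if i < len(vowels):
            out.append(vowels[i])
    return "".join(out)

def split_stem(stem, vowel_set="aeiou"):
    """Separate a surface stem into its root consonants and vowel pattern."""
    root = "".join(ch for ch in stem if ch not in vowel_set)
    pattern = "-".join(ch for ch in stem if ch in vowel_set)
    return root, pattern

stem = interdigitate("wkb", "aa")
print(stem + "a")        # wakaba (stem plus the suffix -a)
print(split_stem(stem))  # ('wkb', 'a-a')
```

The second function is the direction a parser needs: given the surface stem, recover the root consonants and the vowel pattern separately.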

1.1.9 Metathesis

Now study the following Caquinte word.

(11)
ihikekehai
i-hi-k-e-kea-hi
3M-to.think.mistakenly-PROG-NF-FOC-NEG
‘he thought mistakenly’

What change takes place at the juncture between the final two morphemes? Notice that where one might expect the sequence keahi, what surfaces is kehai, where the h and a switch positions.[7] Such a transposition of phonemes is called metathesis. Furthermore, notice that the metathesis process in (11) crosses morpheme boundaries.

Such data imply that a morphological parser must be able to correctly identify morphemes even when some segments within the morphemes may have switched positions.
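
One naive way to picture this is sketched below in Python: try every swap of two neighboring segments in the surface form and see whether the result matches a form built by simply concatenating morphemes. Each letter is treated as one segment, and the function names are hypothetical; this is not a description of how any particular parser does it.

```python
def adjacent_swaps(surface):
    """Yield every string obtained by swapping one pair of neighboring segments."""
    for i in range(len(surface) - 1):
        chars = list(surface)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        yield "".join(chars)

def undo_metathesis(surface, underlying_forms):
    """Return the known underlying forms that are one swap away from the surface form."""
    return sorted(c for c in set(adjacent_swaps(surface)) if c in underlying_forms)

# The morphemes of (11), simply concatenated.
underlying = {"".join(["i", "hi", "k", "e", "kea", "hi"])}
print(undo_metathesis("ihikekehai", underlying))  # ['ihikekeahi']
```

The swap that succeeds is the one that undoes the h/a transposition across the final morpheme boundary.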

1.1.10 Morphemes that May Be Null

For a final challenge, consider these Caquinte forms (you do not need to understand all the morpheme glosses here; just concentrate on the initial subject prefixes):

(12)
a.
anehero
a-0-neh-e-ro
1I-FUT-see-F-3FO
‘we will see her’
b.
okeekake
o-keek-ak-e
3F-dig-PERF-NF
‘she had dug’
c.
oasanomahakemparime
0-0-o-(a)-sano-maha-k-e-Npa-ri-me
1I-FUT-eat- -VERI.M-VERI-PROG-F-R-3MO-CNTR
3F-FUT-eat- -VERI.M-VERI-PROG-F-R-3MO-CNTR
‘we/she will not really be eating it’

What is the problem with the subject prefixes? In (12a) we see that the first person inclusive subject marker is a‑, and in (12b) the third person feminine subject marker is o‑. Yet, in (12c), the gloss shows ambiguity between ‘we’ and ‘she’ as the subject, and both of these are represented as null. This is because both subject prefixes are vowels and the stem in (12c) is vowel-initial, yielding two vowels together. Recall from (6) that Caquinte generally does not allow vowel clusters, and therefore adds an epenthetic ‑t‑ when necessary to avoid such clusters. It turns out that epenthesis is only used in the suffixes. Within the prefixes, the initial vowel of a cluster deletes, causing the ambiguity seen in (12c).

This means that a morphological parser must be able to identify a morpheme even when the morpheme has no overt segments.
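
The following Python sketch shows how the deletion creates the ambiguity. It assumes the vowels a, e, i, o, u, treats everything after the subject prefix as an unanalyzed stem, and uses the hypothetical names attach_prefix and possible_subjects.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for this sketch
SUBJECT_PREFIXES = {"a": "1I", "o": "3F"}

def attach_prefix(prefix, stem):
    """Delete the prefix vowel when it would form a cluster with a
    vowel-initial stem."""
    if prefix[-1] in VOWELS and stem[0] in VOWELS:
        prefix = prefix[:-1]
    return prefix + stem

def possible_subjects(surface, stem):
    """Return the subject glosses whose prefix could yield the surface form."""
    return sorted(gloss for prefix, gloss in SUBJECT_PREFIXES.items()
                  if attach_prefix(prefix, stem) == surface)

print(possible_subjects("okeekake", "keekake"))  # ['3F']
print(possible_subjects("oasanomahakemparime", "oasanomahakemparime"))  # ['1I', '3F']
```

With a consonant-initial stem only one prefix fits, but with the vowel-initial stem of (12c) both prefixes surface as nothing at all.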

1.2 Tasks for any Morphological Parser

Given the challenges of morphological parsing exemplified in the preceding section, how can a computer program go about analyzing words into their constituent morphemes? Let's say that the task of a morphological parser is to take a form like itsavetakohitiro from (6) above and break it into its constituent morphemes, as on the second and third lines of (6).

What are some of the things our parser is going to have to know and what are some of the things that it is going to have to do?

Things the parser needs to KNOW:

Things the parser needs to DO:

Clearly, properly using and controlling the constraints is the major task in implementing a parser for a given language. Since a morphological parser must model linguistic reality, it is a good idea to use constraints that model appropriate linguistic notions. Two major concepts for morphology are morphotactics and morphophonemics. Morphotactics deal with what morphemes can co-occur with what other morphemes. Morphophonemics deal with what shape a given morpheme will have in various phonological and morphological environments. The next two major sections outline the constraints available with the Stage 1 FieldWorks Language Explorer parser and how to use them.

2 Morphotactics

Morphotactics has to do with controlling the order of the morphemes in a well-formed word and controlling which morphemes can co-occur with which other morphemes. As examples of the former, one would not expect to find a prefix at the end of a word or a suffix at the beginning of a word. As an example of the latter, while one would expect a tense affix to appear with a verb root in a verbal word, one would not expect a tense affix to show up on a pronoun. The morphotactic mechanisms described in this section delineate what one can do within the FieldWorks Language Explorer model to control such things. The idea is to use the morphotactic mechanisms to correctly describe the facts of the language and thereby not only provide correct parses, but also rule out false parses.

By the way, correctly describing the facts of the language also provides the basis for a grammatical description, something that FieldWorks Language Explorer provides. By making a correct description of the facts we can both generate a description that people can read to learn about the language and we can feed the information to a parser that can put our description to work checking spellings, adapting to other languages, and verifying the fit of our description.

Note that for words which consist solely of a single morpheme, there are no special morphotactic considerations. One merely adds appropriate lexical entries for these and ensures that the morpheme type of the allomorph(s)[9] in the entry is(are) set to a root or stem type.

This section has four major sub-sections. The first deals with handling affixation to stems (section 2.1). The second deals with stem compounding (section 2.2). The third discusses issues related to clitics (section 2.3). The fourth is for those cases where the parser is producing parses that are incorrect, but the Stage 1 mechanisms do not allow any other way to eliminate the false parses (section 2.4).

2.1 Affixation

This section discusses issues relating to adding affixes to stems. Linguists typically divide affixes into two major categories: inflectional and derivational. Therefore, FieldWorks Language Explorer allows you to declare a given affix as being either inflectional or derivational. In the process of analyzing a language, however, sometimes one does not yet know whether a given affix is inflectional or derivational. There are certain affixes which are truly difficult to classify in this fashion. For this reason, FieldWorks Language Explorer also allows you to label a given affix as being unclassified with respect to inflection and derivation. As you study the language more, you should eventually figure out whether such affixes are inflectional or derivational and then you can change their status from being unclassified to the appropriate one.

2.1.1 Unclassified Affixes

You can label an affix as “unclassified” when you do not know if it is derivational or inflectional. Please understand, though, that when you do this, the affix is relatively unconstrained as to where it can appear. As a result, the FieldWorks Language Explorer parser may return a number of incorrect parses for some word forms which happen to contain a sequence of characters that match one or more allomorphs of an unclassified affix. One partial solution to this is to indicate the category of the stem to which the affix may attach. The best solution, of course, is to classify the affix as being either inflectional or derivational so it will only show up where it should.

2.1.2 Inflectional Affixes

Inflectional affixes typically reflect what some call “grammatical meaning.” These are things like person, number, case, gender, tense, aspect, etc. One can also typically create a paradigm of word forms with the various inflectional categories as labels on the chart.[10]

2.1.2.1 Simple Example

For example, consider the information for a possessed noun in Orizaba Nahuatl given in (1) above, but this time displayed in a different fashion:

(13)
house singular possessed noun
1st Person Singular Possessive nokal
2nd Person Singular Possessive mokal
3rd Person Singular Possessive ikal
1st Person Plural Possessive tokal
2nd Person Plural Possessive amokal
3rd Person Plural Possessive inkal

What are the inflectional affixes here? Given that every form has the sequence kal, it appears that there are six possessor prefixes which occur before the noun stem. Similar paradigms for other singular possessed nouns would show the same situation (ignoring any morphophonology). Therefore we could posit that the singular possessed noun has an inflectional template that consists of a possessor prefix followed by the stem. We could diagram this as in (14).

(14)
Possessor Stem
no- ‘1SgPoss’
mo- ‘2SgPoss’
i- ‘3SgPoss’
to- ‘1PlPoss’
amo- ‘2PlPoss’
in- ‘3PlPoss’

Now consider the plural possessed noun data from (1) above, but displayed in a similar fashion to (13).

(15)
house plural possessed noun
1st Person Singular Possessive nokalvan
2nd Person Singular Possessive mokalvan
3rd Person Singular Possessive ikalvan
1st Person Plural Possessive tokalvan
2nd Person Plural Possessive amokalvan
3rd Person Plural Possessive inkalvan

What are the inflectional affixes here? Notice that there is the same stem (kal) and the same set of six possessor prefixes as in (13). In addition, there is a plural suffix ‑van. Similar paradigms for other plural possessed nouns would show the same situation (ignoring any morphophonology). Therefore we could posit that the plural possessed noun has an inflectional template that consists of a possessor prefix followed by the stem which, in turn, is followed by a plural suffix. Since plural is an instance of the notion of number, we could diagram this as an inflectional template as shown in (16).

(16)
Possessor          Stem    Number
no- ‘1SgPoss’              ‑van ‘Plural’
mo- ‘2SgPoss’
i- ‘3SgPoss’
to- ‘1PlPoss’
amo- ‘2PlPoss’
in- ‘3PlPoss’

Notice what we have described here: for a particular category (possessed noun), we have an inflectional template with one prefix slot (for possessor) and one suffix slot (for number). The possessor slot can be filled by any of the inflectional prefixes listed in (16). The number slot can be filled by the plural suffix.

2.1.2.2 Optional Affix Slots

Now you may well have noticed that there is a potential problem here with the template in (16). If we treat each slot in the template as being obligatory, then the template says we must have a number suffix in order for the template to be satisfied. This means that a possessed singular noun will not meet the requirements of this template because it does not have a suffix in the number slot. It turns out that FieldWorks Language Explorer actually does treat each slot as being obligatory unless it is overtly marked as being optional.

What can we do about this? There are at least three options available within the FieldWorks Language Explorer approach:

  1. Treat the Number slot as being optional so that for the singular case, there would not be any suffix in the Number slot.
  2. Create two distinct templates for the possessed noun category: one for singular and one for plural.
  3. Create a null singular number suffix which could then satisfy the requirement of something being in the Number slot.

Which of these three should we use? Options 1 and 2 will effectively give the same result, although option 1 is definitely simpler. Following the general principle known as Occam's Razor,[11] option 1 is thus better.

Option 3 requires us to posit a null suffix and some argue that if an affix is always null (as it would be here) then what we really have is a default feature: unless there is an overt number suffix, assume that the number is singular. While Stage 1 of FieldWorks Language Explorer does not allow us to mark such default features, later stages of FieldWorks Language Explorer will.

Therefore, from a long term perspective, we recommend following option 1.

This means that to model this inflectional template, we will need to do the following:

(17)
  1. Create or at least make sure we have a possessed noun category.
  2. Create an inflectional template within the possessed noun category.
  3. Give that template one prefix slot (for possessor).
  4. Give that template one suffix slot (for number).
  5. Mark the number suffix slot as optional.
  6. For the possessor prefix slot, put in it the six possessor prefixes listed in the first column of (16). If these possessor prefixes do not already exist, then we need to create the lexical entries and mark them as inflectional.
  7. For the number suffix slot, put the plural suffix in it. If the number suffix does not already exist, then create the lexical entry for it and mark it as inflectional.

Once we have done this, we will have successfully set up the inflectional morphotactics for possessed nominals in Orizaba Nahuatl.
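
The effect of the steps in (17) can be sketched in Python as follows. This only illustrates the logic of an obligatory prefix slot plus an optional suffix slot; it is not the FieldWorks Language Explorer data model, the names are hypothetical, and morphophonology is ignored.

```python
POSSESSORS = {"no": "1SgPoss", "mo": "2SgPoss", "i": "3SgPoss",
              "to": "1PlPoss", "amo": "2PlPoss", "in": "3PlPoss"}
NUMBER = {"van": "PlPoss"}

def parse_possessed_noun(word, stem="kal", gloss="house"):
    """Possessor slot is obligatory; Number slot is optional."""
    results = []
    for prefix, prefix_gloss in POSSESSORS.items():
        if not word.startswith(prefix + stem):
            continue
        rest = word[len(prefix) + len(stem):]
        if rest == "":                    # optional Number slot left empty
            results.append(f"{prefix_gloss}-{gloss}")
        elif rest in NUMBER:              # Number slot filled
            results.append(f"{prefix_gloss}-{gloss}-{NUMBER[rest]}")
    return results

print(parse_possessed_noun("nokal"))     # ['1SgPoss-house']
print(parse_possessed_noun("inkalvan"))  # ['3PlPoss-house-PlPoss']
print(parse_possessed_noun("kalvan"))    # []
```

Because the possessor slot is not optional, kalvan receives no parse from this template, while both the singular and plural possessed forms do.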

2.1.2.3 Multiple Templates

In the previous section we suggested that using optional affix slots in a template was a good choice for handling Orizaba Nahuatl nominal possession. Since we noted that within the FieldWorks Language Explorer approach, one could add more than one template to a category, one might wonder when it would be appropriate to choose such an option.

Orizaba Nahuatl happens to provide such a case. Consider the information for an intransitive, present tense verb given in (2) above, but this time displayed in a fashion more conducive to our purposes here:

(18)
to.die, present tense    1st Person Subject    2nd Person Subject    3rd Person Subject
Singular                 nimiki                timiki                miki
Plural                   timikih               anmikih               mikih

What are the inflectional affixes here? At least under one analysis, there are four subject prefixes and a plural suffix. Third person subject is the default or is null. Similarly, singular number is the default or null.

Where do these inflectional affixes appear? Notice that all the subject ones appear just before the stem and that the plural suffix appears right after the stem. Similar paradigms for other intransitive verbs would show the same situation (ignoring any morphophonology). Therefore we could posit that the present tense, intransitive verb has an inflectional template that consists of a subject inflectional affix followed by the stem which is followed by a number inflectional suffix. We could diagram this as in (19).

(19)
Subject            Stem    Number
ni- ‘1SgSubj’              -h ‘Pl’
ti- ‘2SgSubj’
ti- ‘1PlSubj’
an- ‘2PlSubj’

At first glance, this is very much like what we saw for possessed nominals in example (16) above. We might think initially that we can do exactly what we did for possessed nominals and merely mark the Number slot as optional for these intransitive verbs. If we were to do that, however, notice what would happen for a form like timiki which is supposed to only mean ‘you(sg.) die.’ Because the Number slot would be optional, the FieldWorks Language Explorer parser would allow a parse of 1PlSubj-to.die as well (this, of course, is because both 2SgSubj and 1PlSubj have the same shape: ti‑). At this point, we would have nothing to prevent this incorrect parse.[12]

To eliminate this problem (as well as to eliminate the possibility of the parser allowing a parse for an ill-formed word such as *anmiki), we can create two inflectional templates: one for singular and one for plural. The singular one will be like this:

(20)
Subject Stem
ni- ‘1SgSubj’
ti- ‘2SgSubj’

The plural one will be like this:

(21)
Subject            Stem    Number
ti- ‘1PlSubj’              -h ‘Pl’
an- ‘2PlSubj’

Notice how this method places the singular subject markers in the singular template and puts the plural subject markers in the plural template. This way we force the presence of the plural suffix for the plural subject prefixes.

What needs to be done to handle the 3rd person cases? We will need to mark the subject slot as optional in both templates in order to allow for the 3rd person cases.

This means that to model these inflectional templates, we will need to do the following:

(22)
  1. Create or at least make sure we have an intransitive verb category.
  2. Create two inflectional templates within the intransitive verb category:
    1. For the singular template:
      1. Give it one prefix slot (for singular subject).
      2. Mark this slot as optional.
      3. Put the 1SgSubj‑ and 2SgSubj‑ prefixes in this slot. If these prefixes do not already exist, create them and mark them as inflectional.
    2. For the plural template:
      1. Give it one prefix slot (for plural subject).
      2. Mark this slot as optional.
      3. Put the 1PlSubj‑ and 2PlSubj‑ prefixes in this slot. If these prefixes do not already exist, create them and mark them as inflectional.
      4. Give it a required suffix slot (for number).
      5. Put the ‑Pl suffix in this slot. If this suffix does not already exist, create it and mark it as inflectional.
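
To see why two templates help, the following Python sketch compares the single template of (19) (with an optional Number slot) against the pair of templates in (20) and (21). The representation of a template as a dictionary and the function name parse are hypothetical; this is a sketch of the logic only.

```python
def parse(word, stem, stem_gloss, templates):
    """Return every gloss string licensed by at least one template."""
    results = []
    for template in templates:
        prefixes = list(template["prefixes"])
        suffixes = list(template["suffixes"])
        if template["prefix_optional"]:
            prefixes.append(("", None))   # the slot may stay empty
        if template["suffix_optional"]:
            suffixes.append(("", None))
        for prefix, prefix_gloss in prefixes:
            for suffix, suffix_gloss in suffixes:
                if word == prefix + stem + suffix:
                    parts = [prefix_gloss, stem_gloss, suffix_gloss]
                    results.append("-".join(p for p in parts if p))
    return results

SG_SUBJ = [("ni", "1SgSubj"), ("ti", "2SgSubj")]
PL_SUBJ = [("ti", "1PlSubj"), ("an", "2PlSubj")]
PLURAL = [("h", "Pl")]

# Rejected design (19): one template whose Number slot is optional.
one_template = [dict(prefixes=SG_SUBJ + PL_SUBJ, prefix_optional=True,
                     suffixes=PLURAL, suffix_optional=True)]
# Design (20)/(21): the singular template has no Number slot (modelled here as
# an empty optional slot); the plural template requires the Number slot.
two_templates = [
    dict(prefixes=SG_SUBJ, prefix_optional=True, suffixes=[], suffix_optional=True),
    dict(prefixes=PL_SUBJ, prefix_optional=True, suffixes=PLURAL, suffix_optional=False),
]

print(parse("timiki", "miki", "to.die", one_template))   # ['2SgSubj-to.die', '1PlSubj-to.die']
print(parse("timiki", "miki", "to.die", two_templates))  # ['2SgSubj-to.die']
print(parse("anmiki", "miki", "to.die", two_templates))  # []
```

With the two templates, timiki gets only the second person singular parse and the ill-formed *anmiki gets none, while the third person forms miki and mikih still parse because the subject slot is optional in both templates.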

2.1.2.4 Discontinuous Morpheme

In section 1.1.5 above, we noted that in Caquinte, the future tense is realized as a discontinuous morpheme: it is composed of the prefix n‑ and the suffix ‑e. We repeat the example here:

(23)
intsavetacojitero (=7)
i-n-tsave-(t)-ako-hi-(t)-e-ro
3M-FUT-tell- -DAT-PAS- -F-3FO
‘she will be told about’

How do we fulfill this requirement that both the future prefix and future suffix appear? One way is to create a future tense inflectional template which has both the prefix and the suffix required. The template might look like this:

(24)
Subject             Future       Stem    Future     Object
no- ‘1Subj’         N- ‘FUT’             -e ‘F’     -na ‘1Obj’
a- ‘1InclSubj’                                      -ahi ‘1InclObj’
pi- ‘2Subj’                                         -Npi ‘2Obj’
i- ‘3MascSubj’                                      -ri ‘3MascObj’
o- ‘3FemSubj’                                       -ro ‘3FemObj’
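
The way such a template enforces the discontinuous morpheme can be sketched in Python. Here a word is represented simply as a list of glosses, with STEM standing in for the whole stem. Which slots are optional is our assumption for the sake of the sketch, except that both Future slots are required, as stated above. The names FUTURE_TEMPLATE and fits_template are hypothetical.

```python
# Slot order follows (24). Each entry: (slot name, allowed glosses, required?).
# Optionality of Subject and Object is an assumption made for this sketch.
FUTURE_TEMPLATE = [
    ("Subject", {"1Subj", "1InclSubj", "2Subj", "3MascSubj", "3FemSubj"}, False),
    ("FuturePrefix", {"FUT"}, True),
    ("Stem", {"STEM"}, True),
    ("FutureSuffix", {"F"}, True),
    ("Object", {"1Obj", "1InclObj", "2Obj", "3MascObj", "3FemObj"}, False),
]

def fits_template(glosses, template):
    """Walk the slots left to right; fail if a required slot has no filler
    or if any gloss is left over at the end."""
    position = 0
    for _name, fillers, required in template:
        if position < len(glosses) and glosses[position] in fillers:
            position += 1
        elif required:
            return False
    return position == len(glosses)

print(fits_template(["3MascSubj", "FUT", "STEM", "F", "3FemObj"], FUTURE_TEMPLATE))  # True
print(fits_template(["3MascSubj", "STEM", "F", "3FemObj"], FUTURE_TEMPLATE))         # False
print(fits_template(["3MascSubj", "FUT", "STEM", "3FemObj"], FUTURE_TEMPLATE))       # False
```

Because both Future slots are required, an analysis with only one half of the future morpheme never satisfies the template.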

2.1.2.5 Inflection and Categories Considerations

The categories in FieldWorks Language Explorer are organized in a hierarchical fashion. For example, one can have a major category of verb and then nest other verb types underneath it (e.g. intransitive verb, transitive verb, etc.). One can even nest other types under these if one so wishes (e.g. one might put bitransitive verb under transitive verb).

The exact hierarchy one uses can make a difference for how FieldWorks Language Explorer handles the inflectional templates and their slots. When one defines the slots for a given category, those slots may be used in any template for this category and any of its nested categories. The same is true for templates. You may well need to keep this in mind as you design your category hierarchy.
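
The inheritance just described can be pictured with a small Python sketch. The class Category, its method available_slots, and the slot names are all hypothetical; the sketch only mirrors the statement that slots defined on a category are usable in its nested categories.

```python
class Category:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own_slots = []

    def available_slots(self):
        """Slots defined on this category plus those of every ancestor."""
        inherited = self.parent.available_slots() if self.parent else []
        return inherited + self.own_slots

verb = Category("verb")
transitive = Category("transitive verb", parent=verb)
bitransitive = Category("bitransitive verb", parent=transitive)

verb.own_slots.append("Subject")       # usable by every kind of verb
transitive.own_slots.append("Object")  # usable by transitive and bitransitive verbs

print(verb.available_slots())          # ['Subject']
print(bitransitive.available_slots())  # ['Subject', 'Object']
```

A slot defined high in the hierarchy is shared widely, while one defined on a nested category stays local to it and its own subtypes.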

2.1.2.6 Inflection Classes

Now consider the Yalálag Zapotec data given in (25) and (26):[13]

(25)
a.
utecho
u-te-cho
Fut-to.pass(trans)-1PlIncl
b.
u:ke'nia'cho
u-:ke'nia'-cho
Fut-to.limp(intrans)-1PlIncl
(26)
a.
:techo
:-te-cho
Fut-to.pass(intrans)-1PlIncl
b.
:ti:pla':chcho
:-ti:pla':ch-cho
Fut-to.encourage(trans)-1PlIncl

What is the phonological shape of the Future marker? It appears to be u‑ in (25) but the “fortifier” segment/feature :‑ (i.e. a colon) in (26). Notice that there do not appear to be any phonological reasons for the different allomorphs. In fact, the stem has the same phonological shape in (25a) and in (26a).[14] This problem is not isolated to these pairs of forms; it turns out that verb roots in general divide into two groups, those that take the u‑ future and those that take the :‑ future.

How do we handle this kind of allomorphy when the choice of allomorphs is motivated not by the phonological environment but by the choice of the lexical root? The FieldWorks Language Explorer approach is to use inflection classes. An inflection class is “a set of lexemes whose members each have the same type of inflectional forms” (Aronoff 1994:64). Inflection classes correspond to the traditional idea of declension classes or conjugation classes. For Yalálag Zapotec, we would create two inflection classes within the verb category (so that they apply to all verbs, not just one particular subtype of verb). One class would be for roots that select the u‑ allomorph and the other would be for those that take the “fortifier” :‑ allomorph.

This means that to model these inflection classes, we will need to do the following:

(27)
  1. Create two inflection classes within the verb category.
  2. Create the future inflectional prefix and within it
    1. Create the u‑ allomorph and tag it as belonging to the first inflection class.
    2. Create the “fortifier” :‑ allomorph and tag it as belonging to the second inflection class.
  3. For each verb root, tag it as belonging to either the first or the second inflection class, whichever is correct for that verb.
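
The effect of the tagging in (27) can be sketched in Python. The class labels class1 and class2 are placeholders of our own, the lexicon holds only the two roots of (25a) and (26a), and the function name parse_future is hypothetical.

```python
# Each future allomorph is tagged with the inflection classes it serves.
FUTURE_ALLOMORPHS = [("u", {"class1"}), (":", {"class2"})]
# Each root is tagged with its own inflection class.
ROOTS = [
    ("te", "to.pass(trans)", "class1"),
    ("te", "to.pass(intrans)", "class2"),
]

def parse_future(word, suffix="cho", suffix_gloss="1PlIncl"):
    """Accept an analysis only when the root's class is one the allomorph is tagged for."""
    results = []
    for allomorph, classes in FUTURE_ALLOMORPHS:
        for root, gloss, inflection_class in ROOTS:
            if word == allomorph + root + suffix and inflection_class in classes:
                results.append(f"Fut-{gloss}-{suffix_gloss}")
    return results

print(parse_future("utecho"))  # ['Fut-to.pass(trans)-1PlIncl']
print(parse_future(":techo"))  # ['Fut-to.pass(intrans)-1PlIncl']
```

Without the class check, each word would receive both analyses, since the two roots have the same phonological shape.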

Now consider the following Latin data, which also illustrate the use of inflection classes.[15]

(28)
Declension    Citation Form    Gloss      Dative Plural
I             causa            reason     caus-is
II            annus            year       ann-is
III           civis            citizen    civ-ibus
IV            manus            hand       man-ibus
V             dies             day        di-ebus

Note that while there are five distinct declensions in Latin, there are only three forms for the dative plural: ‑is, ‑ibus, and ‑ebus. In particular, notice that ‑is is used for both declension class I and II and, similarly, ‑ibus is used for both declension class III and IV. So to model this Latin data in FieldWorks Language Explorer, we will need to do the following:[16]

(29)
  1. Create five inflection classes within the noun category.
  2. Create the dative plural inflectional suffix and within it
    1. Create the ‑is allomorph and tag it as belonging to both the first and second inflection classes.
    2. Create the ‑ibus allomorph and tag it as belonging to both the third and fourth inflection classes.
    3. Create the ‑ebus allomorph and tag it as belonging to the fifth inflection class.
  3. For each noun root, tag it as belonging to the appropriate inflection class, whichever is correct for that noun.
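
The steps in (29) can be sketched in Python, this time in the generating direction. One allomorph may be tagged for more than one inflection class; the stems and classes come from (28), while the names DATIVE_PLURAL, NOUN_STEMS and dative_plural are hypothetical.

```python
# Each allomorph of the dative plural suffix, with the classes it is tagged for.
DATIVE_PLURAL = [("is", {"I", "II"}), ("ibus", {"III", "IV"}), ("ebus", {"V"})]
# Each noun stem, tagged with its inflection class.
NOUN_STEMS = {"caus": "I", "ann": "II", "civ": "III", "man": "IV", "di": "V"}

def dative_plural(stem):
    """Pick the allomorph whose class tags include the stem's class."""
    inflection_class = NOUN_STEMS[stem]
    for allomorph, classes in DATIVE_PLURAL:
        if inflection_class in classes:
            return stem + allomorph
    raise ValueError(f"no dative plural allomorph for class {inflection_class}")

for stem in NOUN_STEMS:
    print(dative_plural(stem))
```

Five classes are still needed on the stems even though only three allomorphs exist here, presumably because other suffixes distinguish the classes that share a dative plural.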

2.1.2.7 Inflection Features

Another mechanism offered by FieldWorks Language Explorer can be illustrated by the Spanish noun data given in (30) below:

(30)
a.
casa
kas-a
house-Feminine
b.
caso
kas-o
case-Masculine
c.
casita
kas-it-a
house-Diminutive-Feminine
d.
casito
kas-it-o
case-Diminutive-Masculine

Notice that the main difference between these nouns is the gender suffix. If the ‑a ‘Feminine’ suffix is used, then the cas root means ‘house’. On the other hand, if the ‑o ‘Masculine’ suffix is used, then the cas root means ‘case’.

For a human, it is not necessarily difficult to keep these facts straight, but for a morphological parser, we need some way to prevent it from thinking that casa has the masculine root cas that means ‘case’. Similarly we need a way to keep the parser from thinking that caso has the feminine root cas that means ‘house’. That is, we need a way to prevent the parser from giving “analyses” such as the ones shown in (31), where the asterisk (*) indicates that the analysis is incorrect.

(31)
a.
casa
kas-a
*case-Feminine
b.
caso
kas-o
*house-Masculine

With the FieldWorks Language Explorer parser we use inflection features to deal with this issue. Inflection features are typically characteristics of a morpheme that play a role in the inflection of a word and/or play a role in the syntax (such as agreement within a noun phrase or agreement between a verbal affix and the noun phrase it agrees with). Note that if you use the Morphological Glossing Assistant tool for glossing inflectional affixes, then FieldWorks Language Explorer will automatically add some inflection features for you.

Coming back to the Spanish data in (30) and (31) above, how exactly does one use inflection features to rule out incorrect parses such as the ones in (31)? The problem here is that there is a mismatch between the gender of the root and the gender of the affix. If we can mark the root for the correct gender and also mark the suffixes for the gender they agree with, then the FieldWorks Language Explorer parser will only produce the correct parses.
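
To make this idea concrete, here is a minimal Python sketch of how matching gender features can filter parses. It is only an illustration and not the way the FieldWorks Language Explorer parser is actually implemented. The names Morpheme, unify, and parse are hypothetical; the forms kas, a, and o and their glosses come from (30).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Morpheme:
    form: str
    gloss: str
    features: Dict[str, str] = field(default_factory=dict)


def unify(a: Dict[str, str], b: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Merge two feature sets; return None if a shared feature has different values."""
    merged = dict(a)
    for name, value in b.items():
        if name in merged and merged[name] != value:
            return None
        merged[name] = value
    return merged


# Two homophonous roots, each marked for its own gender (data from (30)).
ROOTS = [
    Morpheme("kas", "house", {"Gender": "Feminine"}),
    Morpheme("kas", "case", {"Gender": "Masculine"}),
]
# Gender suffixes, each marked for the gender it agrees with.
SUFFIXES = [
    Morpheme("a", "Feminine", {"Gender": "Feminine"}),
    Morpheme("o", "Masculine", {"Gender": "Masculine"}),
]


def parse(word: str) -> List[str]:
    """Return the glosses of all root+suffix analyses whose features are compatible."""
    analyses = []
    for root in ROOTS:
        for suffix in SUFFIXES:
            if root.form + suffix.form != word:
                continue  # the shapes do not match the word
            if unify(root.features, suffix.features) is None:
                continue  # gender mismatch, as in (31)
            analyses.append(f"{root.gloss}-{suffix.gloss}")
    return analyses
```

With these entries, parse("kasa") yields only house-Feminine and parse("kaso") yields only case-Masculine. The analyses in (31) are rejected because the Gender values of the root and the suffix differ.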

Note that when a noun belongs to one noun class and also carries a possessive affix that has a different noun class, we must be careful to keep the two noun classes from clashing with each other. If we merely use a feature of “Class” for both the noun and the possessive affix, then the values will differ and the parser will not analyze the word. Instead, we need to use separate noun agreement and possessor agreement complex features. Within each of these complex features, we use the “Class” feature and its values. In this way, not only does the parser correctly analyze the word, it also has the correct features marked for eventual syntactic analysis.
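
The difference between a single flat “Class” feature and two complex features can be sketched in Python as follows. This is an illustration under stated assumptions, not the parser's actual data model: the function unify, the feature names NounAgreement and PossessorAgreement, and the class values 3 and 5 are all made up for the example.

```python
from typing import Any, Dict, Optional

Features = Dict[str, Any]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two (possibly nested) feature structures; None signals a clash."""
    merged = dict(a)
    for name, value in b.items():
        if name not in merged:
            merged[name] = value
        elif isinstance(merged[name], dict) and isinstance(value, dict):
            # Both sides are complex features: unify their contents.
            sub = unify(merged[name], value)
            if sub is None:
                return None
            merged[name] = sub
        elif merged[name] != value:
            return None  # same feature, different values
    return merged


# Flat approach: noun and possessive affix both use a bare "Class" feature.
flat_noun = {"Class": "3"}
flat_possessor = {"Class": "5"}

# Complex-feature approach: each "Class" lives inside its own complex feature.
nested_noun = {"NounAgreement": {"Class": "3"}}
nested_possessor = {"PossessorAgreement": {"Class": "5"}}

flat_result = unify(flat_noun, flat_possessor)
nested_result = unify(nested_noun, nested_possessor)
```

Here flat_result is None because the two “Class” values clash, whereas nested_result keeps both values, each under its own complex feature.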

How does one create and use an inflection feature in FieldWorks Language Explorer?

(32)
  1. Determine the inflection feature involved, including its type,[17] name, and possible values.
    1. Try using the Inflection Feature Catalog[18] to see if the feature is already in the catalog. If so, add the feature via the catalog (it's much easier this way).
    2. If the feature is not in the catalog, then
      1. If the feature type does not yet exist, add it to the feature types.
      2. Create the feature and its values in the features section.
  2. For each category which will use the feature, add the feature to the category's set of inflectable features.
  3. For each root needing a feature, add the feature and its appropriate value to the stem's grammatical function.
  4. For each inflectional affix needing a feature, add the feature and its appropriate value to the inflectional affix's grammatical function.

Many languages will use one or more of the inflection features listed in the chart shown in (33) below.

(33)
Feature Type  Feature Name  Sample Values
Agreement     Person        1st, 2nd, 3rd
              Number        Singular, Dual, Plural
              Gender        Masculine, Feminine, Neuter
              Class         1, 2, ..., 20 (or by shape or other classification system)
              Animacy       Animate, Inanimate
              Case          Nominative, Accusative, Dative, Locative, Genitive, Ergative, Absolutive
Default       Aspect        Completive, Continuative, Habitual, Perfective, Progressive, Stative
              Tense         Past, Present, Future
              Mood          Declarative, Imperative, Interrogative, Irrealis, Realis

These are just some examples. Your language may use these or may need others. You may want to check with a linguistic consultant who is familiar with your language family for ideas as to which inflection features are appropriate for your language. Or you may prefer to add features only when you find a need for them, such as when the FieldWorks Language Explorer parser gives incorrect parses for forms.

The Spanish data illustrates how we can use gender inflection features to rule out incorrect parses when a gender affix shows up incorrectly on a root. Some possible situations where inflection features could play a similar role in ruling out incorrect parses include those shown in (34).

(34)
Situation                                                                     Possible Inflection Features to use
Gender mismatch between affix and stem                                        Gender agreement features
Noun class mismatch between affix and stem                                    Noun class agreement features
Animacy mismatch between affix and stem                                       Animacy agreement features
Two or more aspect markers showing on a verb, when there should only be one   Aspect features
Two or more tense markers showing on a verb, when there should only be one    Tense features

2.1.2.8 Inflection Classes versus Inflection Features

When modeling a given language, one may well wonder if a given phenomenon should be handled by inflection classes or by inflection features. Here are some guidelines to help one decide:

Look at the various affixes involved.

If they ...                                                        then use ...
have no semantic differences (i.e. have the same meaning),         inflection class
  have non-phonologically motivated shape differences,
  and are not involved in (syntactic) agreement
have semantic differences (i.e. actually have different meaning)   inflection features
are involved in (syntactic) agreement                              inflection features
are really declension classes or conjugation classes               inflection classes
are noun classes or gender                                         inflection features

2.1.3 Derivational Affixes

Derivational affixes typically reflect what some call “lexical meaning.” They go on a stem to produce a new stem. The new stem may then be inflected (if the category of the new stem has inflection). Derivational affixes often change syntactic category. See Bickford (1998:135ff) for more on this.

2.1.3.1 Major Category-changing Derivational Affixes

The English data from example (3) is repeated below with more information:

(35)
Form                      Derivational Affix  Category
institute                 (none)              verb
institution               -ion                noun
institutional             -al                 adjective
institutionalize          -ize                verb
institutionalization      -ation              noun
institutionalizational    -al                 adjective
institutionalizationally  -ly                 adverb

What do we have here? We have five derivational suffixes, each of which changes the major category of the resulting stem. Recall that these suffixes only go on stems of a certain category. For example, the ‑al suffix only goes on noun stems. It does not go on other stems (*institutal, *institutionalal, and *quicklyal). These affixes are summarized in (36) below.

(36)
Form   “from category”  “to category”  Gloss
ion    verb             noun           Nominalizer
al     noun             adjective      Adjectivizer
ize    adjective        verb           Verbalizer
ation  verb             noun           Nominalizer2
ly     adjective        adverb         Adverbializer

How do we model these category changing affixes in FieldWorks Language Explorer? We need to do the following:

(37)
  1. Add each affix as a lexical entry and mark it as being derivational.
  2. For the “from category” piece of information, use the category of the stem to which this affix attaches (see section 2.1.3.6 for more on this).
  3. For the “to category” piece of information, use the category of the stem that results when this affix is attached (see section 2.1.3.6 for more on this).
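
The way the “from category” and “to category” information constrains derivation can be sketched in Python. This is a simplified illustration, not FieldWorks Language Explorer's implementation: the names DerivationalAffix and derive are hypothetical, the affix data is taken from (36), and spelling changes (such as institute + ion) are ignored since only categories are tracked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DerivationalAffix:
    form: str
    from_category: str  # category of the stem the affix attaches to
    to_category: str    # category of the stem that results
    gloss: str


# The five suffixes summarized in (36).
AFFIXES = [
    DerivationalAffix("ion", "verb", "noun", "Nominalizer"),
    DerivationalAffix("al", "noun", "adjective", "Adjectivizer"),
    DerivationalAffix("ize", "adjective", "verb", "Verbalizer"),
    DerivationalAffix("ation", "verb", "noun", "Nominalizer2"),
    DerivationalAffix("ly", "adjective", "adverb", "Adverbializer"),
]
BY_FORM = {affix.form: affix for affix in AFFIXES}


def derive(root_category: str, suffix_forms: List[str]) -> Optional[str]:
    """Attach the suffixes in order; return the final category, or None if blocked."""
    category = root_category
    for form in suffix_forms:
        affix = BY_FORM[form]
        if affix.from_category != category:
            return None  # the affix does not attach to stems of this category
        category = affix.to_category
    return category


# institute (verb) ... institutionalizationally
full_chain = derive("verb", ["ion", "al", "ize", "ation", "al", "ly"])
# *institutal: -al directly on a verb stem
blocked = derive("verb", ["al"])
```

In this sketch full_chain comes out as "adverb", matching the last row of (35), while blocked is None because ‑al requires a noun stem.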

2.1.3.2 Sub-category-changing Derivational Affixes

Now consider the pairs of data in (38)-(40) from Turkish:[19]

(38)
a.
Çocuğu yıkadı
Çocuğ-u yıka-dı
child-Acc wash-Past
‘(S)he washed the child’
b.
Çocuk yıkandı
Çocuk yıka-n-dı
child wash-Pass-Past
‘The child was washed’
(39)
a.
Bu işi yapmaya başlıyorlar
Bu iş-i yap-ma-ya başl-ıyor-lar
this work-Acc do-Inf-Dat begin-Prog-3pl
‘They are beginning to do this work’
b.
Bu iş yapılmaya başlanıyor
Bu iş yap-ıl-ma-ya başla-n-ıyor
this work do-Pass-Inf-Dat begin-Pass-Prog
‘This work is beginning to be done’
(40)
a.
O adamlar sigara içiyor
O adam-lar sigara iç-iyor
Those man-Pl cigarette drink-Prog
‘Those men are smoking cigarettes’
b.
Sigara içilmez
Sigara iç-il-mez
cigarette(s) drink-Pass-Neg
‘Cigarettes are not smoked here’ (= no smoking)

What is the key difference in each pair? It is the addition of the passive morpheme. Notice how the number of arguments changes from two (subject and object) to one (just subject) with the addition of the passive.

Is passive, then, a category changing derivational affix? While it does not change major category (i.e. it does not change a verb into a noun, say) it does change a transitive verb into an intransitive verb. That is, passive is a case where the sub-category is changed. Many languages have other such sub-category changing derivational affixes such as causatives, applicatives, and transitivizers. As far as FieldWorks Language Explorer is concerned, these are category changing derivational affixes since the result of the derivation produces a different sub-category that potentially requires a different inflectional template to complete the word form.

How do we model these sub-category changing affixes in FieldWorks Language Explorer? We need to do the following:

(41)
  1. Add each affix as a lexical entry and mark it as being derivational.
  2. For the “from category” piece of information, use the (sub-)category of the stem to which this affix attaches (see section 2.1.3.6 for more on this).
  3. For the “to category” piece of information, use the (sub-)category of the stem that results when this affix is attached (see section 2.1.3.6 for more on this).

2.1.3.3 Non-category-changing Derivational Affixes

Now consider the following Yalálag Zapotec data:[21]

(42)
a.
:xopcho
:-xop-cho
Fut-to.drag-1PlIncl
b.
waxopcho
w-a-xop-cho
Fut-Rep-to.drag-1PlIncl
(43)
a.
uchi:chcho
u-chi:ch-cho
Fut-to.laugh-1PlIncl
b.
wachi:chcho
w-a-chi:ch-cho
Fut-Rep-to.laugh-1PlIncl

The addition of the repetitive prefix does not change either the major category or the sub-category of the words in (42)-(43). One might wonder, then, if the repetitive in Yalálag Zapotec is actually an inflectional prefix. The evidence that it is derivational is that it actually changes the inflection class of the resulting stem. In (42a) the stem is inflection class 2 (because it takes the “fortifier” :‑ allomorph of the future prefix). After the a‑ repetitive prefix is added in (42b), the resulting stem uses the inflection class 1 allomorph of future (u/w‑).

How do we model these non-category changing affixes in FieldWorks Language Explorer? We need to do the following:

(44)
  1. Add each affix as a lexical entry and mark it as being derivational.
  2. For the “from category” piece of information, use the category of the stem to which this affix attaches (see section 2.1.3.6 for more on this).
  3. For the “to category” piece of information, use the same category as for the “from category”.

Notice that in this case the from‑ and to‑ categories will be the same, but we do need to deal with the change in inflection class. This leads us to the next topic below.

2.1.3.4 Inflection Class and Derivational Affixes

If the language you are studying has inflection classes (see section 2.1.2.6), then what happens when derivational affixes are attached? Does the inflection class of the stem stay the same or does it change?

2.1.3.4.1 Inflection Class May Change

As we saw from the Yalálag Zapotec data in 2.1.3.3, the inflection class can indeed change. How do we model this? In addition to what we've done for the categories, we need to do the following:

(45)
  1. Also indicate the resulting inflection class in the “to inflection class” piece of information.

2.1.3.4.2 Inflection Class Does Not Change

There are cases, though, where a derivational affix is attached and it does not change the inflection class of the resulting stem. For example, consider the following data from Atzingo Popoloca:[22]

(46)
a.
tjanchia
t-janchi-a
Pres-to.ask-1aSgSubjAct
b.
tjáncháhā
t-jánchá-h-ā
Pres-to.ask-Apl-1aSgSubjAct
(47)
a.
níncaon
0-nínkaon
Pres-to.get.angry
b.
níncaconhen
0-nínkakon-hen
Pres-to.get.angry-Apl

The applicative suffix Apl adds an argument to the verb, but it does not change the inflection class of the resulting stem. The root in (46) belongs to inflection class 1 and so takes the t‑ allomorph of the present tense morpheme. Adding the applicative does not change this (46b). Similarly, the root in (47) belongs to inflection class 2 and so takes a null allomorph of the present tense. Once again, adding the applicative does not change the inflection class of the resulting stem (47b).

To model this in FieldWorks Language Explorer, one does the following:

(48)
  1. Merely leave the “to inflection class” information blank in the lexical entry for the appropriate affix.
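
The contrast between a filled-in and a blank “to inflection class” can be sketched in Python as follows. This is a minimal illustration and not the actual FieldWorks Language Explorer data model: the names Stem, DerivationalAffix, and attach are hypothetical, and a blank “to inflection class” is represented here by None. The glosses and inflection classes follow the Yalálag Zapotec and Atzingo Popoloca examples discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stem:
    gloss: str
    category: str
    inflection_class: int


@dataclass
class DerivationalAffix:
    gloss: str
    from_category: str
    to_category: str
    is_prefix: bool = False
    to_inflection_class: Optional[int] = None  # None = left blank


def attach(stem: Stem, affix: DerivationalAffix) -> Stem:
    """Build the derived stem; a blank "to inflection class" keeps the stem's class."""
    if stem.category != affix.from_category:
        raise ValueError("affix does not attach to stems of this category")
    if affix.to_inflection_class is None:
        new_class = stem.inflection_class
    else:
        new_class = affix.to_inflection_class
    if affix.is_prefix:
        gloss = f"{affix.gloss}-{stem.gloss}"
    else:
        gloss = f"{stem.gloss}-{affix.gloss}"
    return Stem(gloss, affix.to_category, new_class)


# Yalálag Zapotec: the repetitive prefix yields an inflection class 1 stem.
repetitive = DerivationalAffix("Rep", "verb", "verb", is_prefix=True,
                               to_inflection_class=1)
drag = Stem("to.drag", "verb", 2)

# Atzingo Popoloca: the applicative leaves the inflection class alone.
applicative = DerivationalAffix("Apl", "verb", "verb")
ask = Stem("to.ask", "verb", 1)
get_angry = Stem("to.get.angry", "verb", 2)

rep_drag = attach(drag, repetitive)
apl_ask = attach(ask, applicative)
apl_angry = attach(get_angry, applicative)
```

In this sketch rep_drag ends up in inflection class 1 even though its root is class 2, while apl_ask and apl_angry keep the classes of their roots (1 and 2 respectively).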

2.1.3.5 Inflection Features and Derivational Affixes

If the language you are studying has inflection features (see section 2.1.2.7), then what happens when derivational affixes are attached to a stem with, say, agreement features? Or what happens when a derivational affix changes the category of the stem to a category that has agreement features? For example, consider the Spanish data in (49) and (50):[23]

(49)
a.
apretar
apret-ar
to.press-Infinitive
b.
apretón
apret-ón
to.press-Nominalizer (= pressure)
(50)
a.
trasquilar
traskil-ar
to.shear-Infinitive
b.
trasquilón
traskil-ón
to.shear-Nominalizer (= clipping of wool)

Here we have a verb (e.g. apretar) and a noun derived from that verb (e.g. apretón). Recall from section 2.1.2.7 that Spanish nouns are marked for gender (masculine or feminine). While Spanish verbs are not marked for gender, a noun derived from a verb will have gender. In the case of the ‑ón derivational suffix, the resulting noun has masculine gender. To properly model this, we would need to indicate that the resulting noun has this gender.

How does one mark a derivational affix for inflection features in FieldWorks Language Explorer?

(51)
  1. Determine the inflection feature involved, including its type, name, and possible values.
    1. If the feature type does not yet exist, add it to the feature types.
    2. If the feature and its values do not yet exist, add them in the features section.
  2. For each category which will use the feature, add the feature to the category's set of inflectable features (if it's not already listed).
  3. For each derivational affix needing a feature:
    1. If the derivational affix requires the stem to have such a feature, add the feature and its appropriate value to the derivational affix analysis “From” features.
    2. If the stem that results from adding the derivational affix has such a feature, add the feature and its appropriate value to the derivational affix analysis “To” features.
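
The role of the “From” and “To” features can be sketched in Python. This is a simplified illustration rather than the parser's actual behavior: the names Stem, DerivationalAffix, and attach are hypothetical, and the sketch assumes that the derived stem carries only the affix's “To” features (whether and how the original stem's own features carry over is not modeled here).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stem:
    form: str
    category: str
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class DerivationalAffix:
    form: str
    from_category: str
    to_category: str
    from_features: Dict[str, str] = field(default_factory=dict)  # required of the stem
    to_features: Dict[str, str] = field(default_factory=dict)    # given to the result


def attach(stem: Stem, affix: DerivationalAffix) -> Optional[Stem]:
    """Return the derived stem, or None if the category or "From" features do not match."""
    if stem.category != affix.from_category:
        return None
    for name, value in affix.from_features.items():
        if stem.features.get(name) != value:
            return None
    # Assumption: the result carries exactly the affix's "To" features.
    return Stem(stem.form + affix.form, affix.to_category, dict(affix.to_features))


# Spanish -ón from (49)-(50): verb to noun, and the resulting noun is masculine.
on_suffix = DerivationalAffix("ón", "verb", "noun",
                              to_features={"Gender": "Masculine"})
apret = Stem("apret", "verb")
apreton = attach(apret, on_suffix)
```

Here apreton is a noun stem marked Gender: Masculine, so a later gender check on the noun has a value to compare against even though the verb root apret carried no gender at all.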

2.1.3.6 Category-changing Derivational Affixes and Category Organization

As we noted in section 2.1.2.5, the categories in FieldWorks Language Explorer are organized in a hierarchical fashion.

The exact hierarchy one uses can make a difference for how FieldWorks Language Explorer handles the categories of derivational affixes. When one indicates the “from category”