The Linguist's Shoebox
Integrated data management and analysis for the field linguist
Tip
The Shoebox parser works according to the longest-match principle.
In Shoebox, the parser eliminates much of the potential ambiguity by selecting the parse that "cuts off" the longest affix. If there is more than one way that the parser could divide a word into pieces, the longest affix can "win" over the longest root. Depending on the morphology of the particular language you are analyzing, the way that the Shoebox parser uses "greedy" matching can work for you or against you. Here are two potential parsing problems:
-
A valid parse causes another valid parse to be eliminated.
In English, does is ambiguous. It could be the third person singular of the verb do or the plural of the noun doe. Because -es is longer than -s, the Shoebox parser keeps do -es and eliminates doe -s.
-
An invalid parse causes the valid parse to be eliminated.
In English, hopes is the third person singular of the verb hope. However, there is an invalid parse that consists of hop and -es. Because the longer affix -es wins over the longer root hope, the parser selects hop -es and the valid parse hope -s is lost.
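The two problems above can be reproduced with a small sketch. This is a simplified model of longest-affix selection, not Shoebox's actual implementation: the toy lexicon (ROOTS, SUFFIXES) and the function names candidate_parses and longest_affix_parses are hypothetical and exist only for illustration.

```python
# Toy lexicon (assumed for illustration; not taken from a real Shoebox database).
ROOTS = {"do", "doe", "hop", "hope"}
SUFFIXES = {"s", "es"}


def candidate_parses(word):
    """Return every (root, suffix) split whose two parts are both in the toy lexicon."""
    candidates = []
    for i in range(1, len(word)):
        root, suffix = word[:i], word[i:]
        if root in ROOTS and suffix in SUFFIXES:
            candidates.append((root, suffix))
    return candidates


def longest_affix_parses(word):
    """Keep only the candidates that cut off the longest suffix."""
    candidates = candidate_parses(word)
    if not candidates:
        return []
    longest = max(len(suffix) for _, suffix in candidates)
    return [(root, suffix) for root, suffix in candidates if len(suffix) == longest]


# Problem 1: a valid parse (doe -s) is eliminated by another valid parse.
print(candidate_parses("does"))       # both splits are found
print(longest_affix_parses("does"))   # only the -es split survives

# Problem 2: the valid parse (hope -s) is eliminated by an invalid one.
print(longest_affix_parses("hopes"))
```

In this model the root length plays no part in the selection, which is why the shorter root hop can displace hope.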
-
You can create a desired ambiguity by entering two forms of equal length.
\lx do
\a does
\u do -s

\lx doe
\a does
\u doe -s
-
You can eliminate an invalid parse by entering a longer valid form.
\lx hope
\a hopes
\u hope -s
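The effect of these entries can be sketched as follows. The sketch assumes that a whole-word alternate form (\a) counts as the longest possible match and so outranks any affix cut, and that two alternate forms of equal length tie, so both underlying forms (\u) are kept. The RECORDS list, the toy ROOTS and SUFFIXES sets, and the parse function are hypothetical names for illustration, not Shoebox's own data structures.

```python
# Lexicon records mirroring the entries above (field names follow the markers).
RECORDS = [
    {"lx": "do", "a": "does", "u": "do -s"},
    {"lx": "doe", "a": "does", "u": "doe -s"},
    {"lx": "hope", "a": "hopes", "u": "hope -s"},
]

# Toy root and suffix inventory used by the fallback (assumed for illustration).
ROOTS = {"do", "doe", "hop", "hope"}
SUFFIXES = {"s", "es"}


def parse(word):
    """Return the parses kept under a simplified longest-match rule."""
    # A whole-word alternate form is the longest match available,
    # so its underlying form wins; equal-length entries all survive.
    whole_word = [rec["u"] for rec in RECORDS if rec["a"] == word]
    if whole_word:
        return whole_word

    # Otherwise fall back to cutting off the longest suffix.
    candidates = [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in ROOTS and word[i:] in SUFFIXES
    ]
    if not candidates:
        return []
    longest = max(len(suffix) for _, suffix in candidates)
    return [f"{root} -{suffix}" for root, suffix in candidates if len(suffix) == longest]


print(parse("does"))   # both underlying forms are kept
print(parse("hopes"))  # the invalid hop -es parse no longer appears
print(parse("hops"))   # words without an alternate form still use affix cutting
```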
For more information, read Parsing with Shoebox and pages 247-250 of the Shoebox Tutorial.
Index of tips:
alternate forms;
longest-match principle;
parsing;
underlying forms
