Information structure research presented at recent workshop

(December 2012) Linguists with a special interest in the study of information structure recently met in the Netherlands for a workshop at the Max Planck Institute for Psycholinguistics. The theme of the second annual workshop organized by the Syntax, Typology and Information Structure Group was “Categories of Information Structure across Languages.” An SIL linguist was among those who presented research at the workshop.

The order and method in which information is presented in different languages are of interest to many descriptive and theoretical linguists. (For example, how does a listener keep track of the participants in a narrative or identify which information is meant to be understood as most important?) Researchers differ on whether some features related to information structure may actually be linguistic universals, that is, found in all languages. Whether or not particular structures are common across languages, it’s clear that the act of communication requires that speakers organize information in some way that can be understood by listeners.

At the workshop, SIL linguist Erwin Komen and colleague Dr. Bettelou Los of Radboud University, Nijmegen, presented a paper entitled “Information state categories based on the pentaset.” The paper describes findings from a study that Komen and Los have conducted as part of Komen’s PhD research, with contributions from Professor Ans van Kemenade; the three work together in the Language and Transition Stages group of Radboud’s English Language Department. In the study they presented, Komen and Los examine the referential categories of a clause’s constituents (sentences are analyzed in their discourse context, such as a story that is being told).

Komen and Los hypothesize that referential categories can be grouped into a set of five members, termed the “pentaset”:

1. Identity: a noun phrase refers to exactly the same mental entity as its antecedent, something already present in the mental model being made in the hearer/reader’s mind. (In the example below, ‘he’ in the second sentence refers back to ‘Jack’ from the first sentence.)

Once upon a time there was a boy named Jack.
It was evening when he saw a nice little restaurant.

2. Inferred information: the referent of the noun phrase can be inferred from an antecedent already available in the hearer/reader’s mind. (A person familiar with restaurants will know that a restaurant has an owner, so ‘the owner’ can be inferred from ‘a nice little restaurant’.)

It was evening when he saw a nice little restaurant.
As soon as he came in, the owner approached him.

3. Assumed information: the noun phrase has an antecedent that is readily understood from the situation or general knowledge, but is not in the text. (‘Sun’ is available in the hearer/reader’s mind from general knowledge of the world.)

The door was wide open, and the sun shone straight at them.

4. Inert information: the noun phrase does not link back to anything, and nothing in the following text can refer back to it.

The man stared at him, and Jack was desperately looking for words of wisdom.

5. New information: the noun phrase does not link back to anything mentioned already or extra-textually available, so a new mental entity is created for it in the model that is being built up step by step in the hearer/reader’s mind. (The hearer/reader will understand that ‘a boy’ and ‘a restaurant’ are new entities being introduced in the discourse.)

Once upon a time there was a boy named Jack.
It was evening when he saw a nice little restaurant.


Using software developed by Komen, the group is tagging words and phrases in collections of historical texts with the pentaset categories. The combination of grammatical, antecedent and pentaset information makes it possible to derive other existing sets and seems to allow “higher order” information structure notions (such as the focus domain) to be determined. For this reason, Komen and Los posit the pentaset as a basic and universal ingredient of language, although more research is needed to validate such a claim.
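For readers who think in code, the five categories can be read as a small decision procedure. The Python sketch below is only an illustration of the definitions given above and is not the software developed by Komen. The `NounPhrase` fields and the `classify` function are hypothetical names, the feature values are filled in by hand rather than detected automatically, and treating ‘words of wisdom’ as the inert noun phrase is our reading of the example sentence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Pentaset(Enum):
    IDENTITY = "Identity"
    INFERRED = "Inferred"
    ASSUMED = "Assumed"
    INERT = "Inert"
    NEW = "New"


@dataclass
class NounPhrase:
    """Hand-annotated noun phrase (all fields are illustrative assumptions)."""
    text: str
    antecedent: Optional[str] = None  # earlier noun phrase in the text that this one links to
    same_entity: bool = False         # True if it denotes exactly the antecedent's entity
    known_from_world: bool = False    # antecedent comes from the situation or general knowledge
    can_be_referred_to: bool = True   # False if later text cannot refer back to it


def classify(np: NounPhrase) -> Pentaset:
    """Map the hand-annotated features onto one of the five categories."""
    if np.antecedent is not None:
        # Linked to something in the text: same entity, or only inferable from it.
        return Pentaset.IDENTITY if np.same_entity else Pentaset.INFERRED
    if np.known_from_world:
        return Pentaset.ASSUMED
    if not np.can_be_referred_to:
        return Pentaset.INERT
    return Pentaset.NEW


# Noun phrases taken from the example sentences in the article.
examples = [
    NounPhrase("he", antecedent="Jack", same_entity=True),
    NounPhrase("the owner", antecedent="a nice little restaurant"),
    NounPhrase("the sun", known_from_world=True),
    NounPhrase("words of wisdom", can_be_referred_to=False),  # assumed inert
    NounPhrase("a boy"),
]
labels = [classify(np).value for np in examples]

for np, label in zip(examples, labels):
    print(f"{np.text}: {label}")
```

In actual annotation, deciding whether a noun phrase has an antecedent or can be picked up later is the hard part; the sketch takes those judgments as given and shows only how they would map onto the five labels.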

Other presentations at the workshop covered various aspects of information structure across languages, and several explored specific languages or language families, including Korean, German, Dutch, Basque, the Finno-Ugric (Uralic) languages and signed languages.
