Language identification and IT: addressing problems of linguistic diversity on a global scale

Constable, Peter and Gary Simons

Many processes used within information technology need to be customized to work for specific languages. For this purpose, systems of tags are needed to identify the language in which information is expressed. Various systems exist and are commonly used, but each covers only a small portion of the languages used in the world today, while technologies are being applied to an increasingly diverse range of languages that goes well beyond those already covered. Several other problems further limit the ability of these systems to cope with these expanding needs. This paper examines five specific problem areas in existing tagging systems for language identification and proposes a particular solution that covers all the world's languages while addressing all five problems.

Computer programs
web development
RFC 1766
linguistic diversity
language identification
ISO 639
internationalization (I18N)
information technology (IT)
