Standards

Language Technology developers actively help develop standards for language data. Working with others in the industry, they are making progress on the following standards:

LIFT — Lexicon Interchange FormaT, an XML format for lexical information (dictionaries). LIFT allows data to move between programs such as WeSay, FLEx and Lexique Pro.
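
As a rough illustration of what interchange through an XML lexicon format looks like, the sketch below reads headwords and glosses from a small LIFT-style fragment using Python's standard library. The sample data is invented and heavily simplified, and the element names (entry, lexical-unit, form, text, sense, gloss) are assumptions that should be checked against the LIFT specification before use with real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified LIFT-style fragment (illustration only).
SAMPLE = """<lift version="0.13">
  <entry id="kitabu">
    <lexical-unit>
      <form lang="swh"><text>kitabu</text></form>
    </lexical-unit>
    <sense>
      <gloss lang="en"><text>book</text></gloss>
    </sense>
  </entry>
</lift>"""


def read_entries(xml_text):
    """Return a list of (headword, [glosses]) pairs from a LIFT-style document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        # The headword is assumed to live in lexical-unit/form/text.
        headword = entry.findtext("lexical-unit/form/text", default="")
        # Each sense may carry one or more glosses.
        glosses = [g.findtext("text", default="")
                   for g in entry.findall("sense/gloss")]
        entries.append((headword, glosses))
    return entries


entries = read_entries(SAMPLE)
print(entries)
```

Because the data is plain XML, any program that agrees on the same element structure can read or write it, which is what makes movement between tools possible.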

Flextext — an XML format for interlinear data, to allow movement of data between programs such as SayMore, FLEx and ELAN.

Unicode — an industry-wide character set encoding standard designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. Closely related to ISO/IEC 10646.
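
To make the idea of a character set encoding concrete, the following minimal sketch uses Python's standard library to show the code point, the standard character name, and the UTF-8 bytes for a few characters. The helper name describe is hypothetical and exists only for this illustration.

```python
import unicodedata


def describe(ch):
    """Return (code point, character name, UTF-8 bytes as hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    utf8_hex = ch.encode("utf-8").hex()
    return code_point, name, utf8_hex


# Escapes are used so the example does not depend on the file's own encoding.
for ch in ("a", "\u00e9", "\u4e2d"):
    print(describe(ch))
```

The code point identifies the character itself, while UTF-8 is one of several ways to serialize that code point as bytes.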

ISO 15924 — the International Organization for Standardization's registry of scripts. Each script is identified by a name and four-letter code. The current version of the standard includes 156 scripts.

ISO 639-3 — the International Organization for Standardization's registry of the languages of the world. It comprises living languages taken from SIL's Ethnologue, as well as extinct, ancient, reconstructed, and artificial languages. The current registry includes over 7000 languages, each identified by a unique three-letter code.
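
The four-letter script codes and three-letter language codes described above have a simple shape that software can check before looking a code up. The sketch below is a shape check only: it assumes the usual written convention of title case for script codes (for example Latn) and lowercase for language codes (for example eng), and it does not confirm that a code is actually assigned in either registry. The function names are hypothetical.

```python
import re

# Shape checks only; neither pattern consults the actual registries.
SCRIPT_CODE = re.compile(r"[A-Z][a-z]{3}")  # ISO 15924 style, e.g. "Latn"
LANGUAGE_CODE = re.compile(r"[a-z]{3}")     # ISO 639-3 style, e.g. "eng"


def looks_like_script_code(code):
    """True if the string has the shape of a four-letter script code."""
    return SCRIPT_CODE.fullmatch(code) is not None


def looks_like_language_code(code):
    """True if the string has the shape of a three-letter language code."""
    return LANGUAGE_CODE.fullmatch(code) is not None


print(looks_like_script_code("Latn"), looks_like_language_code("eng"))
```

A real application would follow this check with a lookup in the published code tables, since a well-formed string may still be unassigned.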