Linguistic Data Resources on the Internet
A topically organized list of language data resources on the Internet.
Texts
Electronic Text Centers
- Center for Electronic Texts in the Humanities (CETH)
- CTI Centre for Textual Studies at Oxford University
- Directory of Electronic Text Centers Worldwide
- Electronic Text Center at UVirginia combines an on-line archive of thousands of SGML-encoded electronic texts (some of which are publicly available) with a library-based Center housing hardware and software suitable for the creation and analysis of text
- The Electronic Text (Etext) Pages
- Electronic Text Service at Columbia University
- LETRS: Library Electronic Texts Resource Service
- The Survey of English Usage
Digital Libraries
- Cornell Digital Library, Prototype
- Digital Library Initiative at UIUC
- Stanford University Digital Libraries Project
- UC Berkeley Digital Library Project
- University of Michigan Digital Library Project
- University of Virginia Electronic Text Library
Text Collections
- Aboriginal Studies Electronic Data Archive
- Alex: A Catalog of Electronic Texts
- AMALGAM, Automatic Mapping Among Lexico-Grammatical Annotation Models
- ARTFL Project, University of Chicago, Project for American and French Research on the Treasury of the French Language
- British National Corpus, corpora page.
- CCALAS, Centre for Computer Analysis of Language and Speech
- CCAT: Classical Studies and Religious Studies at UPenn
- CELT: Corpus of Electronic Texts, contemporary and historical Irish documents
- Corpus Cyrillo-Methodianum Helsingiense, an electronic corpus of Old Church Slavonic texts
- Dante Project
- The Data Archive at the Univ. of Essex, computer-readable data in the social sciences and humanities
- ECI Multilingual Corpus
- Goteborg Language Bank of Swedish
- ICAME (Text Corpora) via Web
- International Corpus of English
- IPL Reading Room Public Online Texts
- Japanese Text Initiative
- The Labyrinth (medieval studies)
- Linguistic Data Consortium
- Literature, Electronic Books and Journals Directory via Rice Univ.
- Online Book Initiative e-texts
- On-line Books Page
- On-line books FAQ
- Oxford Text Archive (OTA)
- Penn-Helsinki Parsed Corpus of Middle English, a database of 510,000 words of syntactically parsed Middle English text for use by historical linguists
- Perseus Project, Classical Greek texts both in Greek and in English translation
- Philosophy Etexts
- Project Gutenberg
- Project Libellus (Classics)
- Spanish corpora
- UMich Humanities Text Initiative
- WWW-to-PAT Gateway: exploiting an SGML-aware system through the Web
Dictionaries, Lexica, and Lexical Resources
Indexes and General
- A Web of On-line Dictionaries
- Electronically Available Dictionaries and Corpora
- Language Dictionaries and Translators
- Language Representation Database
- Lexicography e-mail discussion list
- Lexicool.com, Directory of Translation Dictionaries
- List of Dictionaries
- Online Language Dictionaries and Translators
- Special Interest Group on the Lexicon of the ACL
Collections
- CHILDES, Child Language Data Exchange System
- EDICTA: Early Dictionaries
- Lexica from CLR (Consortium for Lexical Research)
- Lexica available from the UMich Linguistics Archive
- The Moby lexicon project (word lists, part-of-speech, thesaurus, etc.)
- travlang's Translating Dictionaries (German, Dutch, French, Spanish, Danish, Portuguese, etc.)
- Wordlists via Oxford
Individual Resources
- Jeffrey's Japanese/English Dictionary Server.
- ARIES Natural Language Tools, a lexical platform for the Spanish language
- ARTFL Project Reference Collection French, English, and South Asia dictionaries
- BioTech's Biotechnology Dictionary
- COBUILD English Dictionary
- COMLEX Syntax, a monolingual English Dictionary consisting of 38,000 head words intended for use in natural language processing
- Turkish-English dictionary
- CoreLex, systematic polysemy and underspecification
- EURODICAUTOM, a database of official and technical terms
- English-Urdu Dictionary
- English verb index from English Verb Classes and Alternations, by Beth Levin (download file: evca.zip [28K])
- English Wordlists via SIL
- English wordlist with part-of-speech tags (download file: keiras.zip [51K])
- Gamilaraay Dictionary (Australian indigenous language)
- The Kamusi Project / Internet Living Swahili Dictionary Project, Yale University, Martin Benjamin, General Editor
- LOGOS: Translations, Deja Vu, and Dictionary
- Perseus Project, Greek and Latin lexica
- Roget's Thesaurus version 1.02. Provided by MICRA Inc and the Gutenberg Project
- Spanish wordlist, 90,000+ entries (download file: span-lex.zip [261K])
- The Survey of English Usage, based at University College London
- Thesaurus Linguae Latinae
- Thesaurus Linguae Graecae
- Visual Thesaurus displays interrelationships between words and meanings as spatial maps.
