The Linguistic Data Consortium (LDC) creates, collects, and distributes speech and text databases, lexicons, and other resources for research and development. LDC is an open consortium of universities, companies, and government research laboratories, hosted by the University of Pennsylvania. It was founded in 1992 with a grant from the Advanced Research Projects Agency (ARPA) and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation.
Corpus type codes used in LDC listings:
"T" indicates a text corpus
"S" indicates a speech audio corpus
"V" indicates a video corpus
"L" indicates a lexicon
To access downloadable corpora published from 2016 to the present:
All MIT users can find the available corpora listed by year on the LDC site. MIT Libraries’ access to LDC corpora is limited to those published from 2016 on. Corpora from 2016-present that are distributed on CD or DVD can be accessed by individual title through the Library’s Catalog.