Linguistics: Linguistic Data Consortium

This is the Linguistics subject guide

Accessing Linguistic Data Consortium via MIT Libraries

The Linguistic Data Consortium (LDC) creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. LDC is an open consortium of universities, companies and government research laboratories, hosted by the University of Pennsylvania. LDC was founded in 1992 with a grant from the Advanced Research Projects Agency (ARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation.

What is included in LDC

Corpus types included in LDC: 

"T" indicates a text corpus
"S" indicates a speech audio corpus
"V" indicates a video corpus
"L" indicates a lexicon
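
As an illustration only, the type codes above can be looked up programmatically. The sketch below assumes that an LDC catalog ID has the form "LDC" + publication year + type letter + serial number (for example "LDC2016T10", used here purely as a sample string); that layout, the two-digit-year handling, and the helper name describe_corpus_id are assumptions and are not taken from this guide.

```python
import re

# Type codes exactly as listed in this guide.
CORPUS_TYPES = {
    "T": "text corpus",
    "S": "speech audio corpus",
    "V": "video corpus",
    "L": "lexicon",
}

# Assumed ID layout: "LDC" + year (4 or 2 digits) + type letter + serial number.
_ID_PATTERN = re.compile(r"^LDC(\d{4}|\d{2})([A-Z])(\d+)$")


def describe_corpus_id(catalog_id):
    """Return (year, type description) for an ID in the assumed layout."""
    match = _ID_PATTERN.match(catalog_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized catalog ID: {catalog_id!r}")
    year_text, code, _serial = match.groups()
    # Assumption: two-digit years belong to the 1900s.
    year = int(year_text) if len(year_text) == 4 else 1900 + int(year_text)
    return year, CORPUS_TYPES.get(code, "unknown type")


if __name__ == "__main__":
    year, kind = describe_corpus_id("LDC2016T10")
    print(year, kind)
```

A helper like this could also be used to check whether a corpus falls in the 2016-present range by comparing the returned year against 2016.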

Linguistic Data Consortium

To access corpora from 2016-present that are available for download: 

  1. First, register with LDC using your MIT email address and your current department or lab.
    1. Under "Organization", look for "Massachusetts Institute of Technology - MIT - Libraries" and select that option from the resulting drop-down menu. Do not use an abbreviated form (e.g. "MIT"); type the whole string above to reach the correct account.
    2. The Libraries will approve your account within one business day. (If you don't receive an email, check your spam filter.)
  2. After your account is approved, log in to LDC and go to the downloads tab under "your account options" to access data.
    1. The datasets that MIT Libraries has access to will be listed.
    2. If you need a corpus that is only available by hard drive or requires a separate license before download, please email the Libraries. Please note that if you need to use a dataset that requires a special license signed by the individual user, your name may be visible to other MIT LDC users.
    3. If you need a pre-2016 corpus that we don’t have subscription access to, please fill out this form. Please note that typical delivery time is 5-7 business days – this can vary based on corpora cost and license(s), so some may take longer!

For all MIT users, available corpora are listed by year on the LDC site (MIT Libraries’ access to LDC corpora is limited to corpora published from 2016 on). Note that the corpora available as CDs or DVDs from 2016-present can be accessed by individual title through the Library’s Catalog.