LibGuides: Resources and Tools for Computational Research: Descriptions

Alpha Vantage Stock API

What it does: The Alpha Vantage Stock API Service offers pre-processed and normalized finance and economic data for stocks, ETFs, mutual funds, foreign exchange rates, financial reports from SEC filings, and over 50 derived technical indicators

How it’s accessed: API calls are made using any web-enabled client (e.g. a web browser) to make an HTTP GET request to an appropriate URL. API users can use the programming language of their choice

Result format: JSON, CSV

How to register: A free API key can be obtained here

Limitations: Each free API key allows up to 500 API calls per day by default. Please reach out to support@alphavantage.co if a higher rate limit is needed

Contact for technical questions: support@alphavantage.co

For more information: Please refer to the official documentation and the supplementary stock API review article for a technical integration guide and financial modeling best practices
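The HTTP GET access described above can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: the endpoint URL, the parameter names (function, symbol, datatype, apikey), and the TIME_SERIES_DAILY function name are assumptions that should be checked against the official documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm against the official Alpha Vantage documentation.
BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(function, symbol, api_key, datatype="json"):
    """Return the GET URL for one API call (no network access happens here)."""
    params = {
        "function": function,  # e.g. "TIME_SERIES_DAILY" (assumed function name)
        "symbol": symbol,
        "datatype": datatype,  # "json" or "csv", the two result formats listed above
        "apikey": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_json(url, timeout=30):
    """Issue the HTTP GET request and decode a JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json(build_query_url("TIME_SERIES_DAILY", "IBM", "YOUR_KEY")) would issue one request, which counts against the daily limit of the free key.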

American Association for the Advancement of Science (AAAS) articles

What it includes: MIT-subscribed and open access content published by AAAS

How it's accessed: Content may be downloaded for TDM directly from the AAAS online platform for local storage and analysis

Limitations: Downloading must be limited to a “reasonable rate and speed,” and users must comply with the terms in Annex A here. Subscribed content is limited to MIT users and walk-in users physically present at MIT

Contact for technical questions and more information: http://www.sciencemag.org/subscribe/institutional-license-agreement

American Archive of Public Broadcasting API

What it includes: Digitized public radio and television programs and their metadata records available from the AAPB Online Reading Room

How it’s accessed: By API, see here

Access restrictions: None

Limitations: Some volume limitations may apply

For more information: https://github.com/WGBH/AAPB2#api

American Physical Society (APS) articles

What it includes: MIT-subscribed and open access journals published by APS

How it’s accessed: TDM access can be arranged by request, MIT users should contact textmine@mit.edu

Access restrictions: Subscribed content limited to MIT users

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

arXiv Preprint Server

What it does: Gives programmatic access to all of the arXiv articles, metadata, and search interface via bulk metadata access and bulk full text access.

How it’s accessed: OAI-PMH, API, and RSS for metadata access and various cloud options for the full-text access.

How to register: Free to use, no registration or API key required

Limitations: No limitation, but see the Play nice and Consider the impact sections here.

Contact for technical questions: arXiv Google Group

For more information: arXiv Bulk Data Access
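As an illustration of the API route for metadata access, the Python sketch below builds a search URL and pulls titles out of an Atom response. The endpoint and the search_query, start, and max_results parameter names are assumptions based on the public arXiv API; OAI-PMH, RSS, and the bulk full-text options are not covered here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint for the arXiv metadata API; verify in the arXiv documentation.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds


def build_search_url(query, start=0, max_results=10):
    """Return the GET URL for one page of search results."""
    params = {"search_query": query, "start": start, "max_results": max_results}
    return ARXIV_API + "?" + urlencode(params)


def parse_titles(atom_xml):
    """Extract entry titles from an Atom feed string, collapsing whitespace."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(ATOM + "title", default="").split())
        for entry in root.iter(ATOM + "entry")
    ]
```

Keeping the parser separate from the request makes it easy to space out calls, in line with the Play nice guidance mentioned above.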

BioMed Central Journal articles

What it does: Provides access to open access content published by BMC

How it’s accessed: Through the SpringerNature Open Access API; text and data mining is also handled through SpringerNature

Result format: variety of different output formats, including XML and JSON

How to register: Registration to the developer portal required.

Limitations: No stated limitations for BMC content

Contact for technical questions: supportapi@springernature.com

For more information: BMC's site on Indexing, archiving and access to data

Brill Academic Publishers articles

What it includes: MIT-subscribed and open access content published by Brill

How it’s accessed: Content may be downloaded for TDM

Access restrictions: Subscribed content limited to MIT users

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

Caselaw Access Project API

What it does: Provides queryable access to all published US court decisions

How it’s accessed: In-browser API viewer or RESTful interface, also available as bulk download

Result format: structured XML, presentation HTML, or plain text

How to register: Most queries do not require registration, some jurisdictions with access restrictions require a free API key

Limitations: Full text of cases limited to 500 cases per person per day, unless otherwise authorized. More on access limits here

Contact for technical questions: https://case.law/api/#problems

For more information: https://case.law/

Chronicling America

Coverage: US newspapers from 1789-1924

How it’s accessed: Accessible by API or bulk download

Access restrictions: none, no registration or API key required

Limitations: none

Contact for technical questions: help page

For more information: http://chroniclingamerica.loc.gov/

Congress.gov

What it does: Congress.gov shares its application programming interface (API) to provide computational access to accurate and structured congressional data including bills, amendments, summaries, members, the Congressional Record, committee reports, nominations, treaties, and House Communications. Over time we will be adding other collections such as hearing transcripts and Senate Communications.

How it’s accessed: To use the API you must first get an API key.

Result format: Congress.gov API is a REST API and presents data in a hierarchical browse format with responses provided in XML or JSON. The XML format is the default for the API.

How to register: Free to register, API key required

Limitations: See the GitHub page for limits.

Contact for technical questions: Library of Congress Ask a Librarian service.

For more information: see congress.gov's GitHub page for documentation, user guides, a change log that details changes to the API, and opportunities for feedback
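Because XML is the default response format, a request that wants JSON has to say so. The sketch below shows one way to build such a URL in Python. The API root and the format, limit, offset, and api_key parameter names are assumptions drawn from the public Congress.gov API documentation, and "bill/118" is only an illustrative path.

```python
from urllib.parse import urlencode

# Assumed API root; see the Congress.gov GitHub documentation for the current one.
API_ROOT = "https://api.congress.gov/v3"


def build_url(path, api_key, fmt="json", limit=20, offset=0):
    """Return a request URL for a browse path such as "bill/118"."""
    params = {
        "format": fmt,       # "json" or "xml"; XML is the API default
        "limit": limit,      # number of items per page
        "offset": offset,    # starting position for paging
        "api_key": api_key,  # required key obtained at registration
    }
    return f"{API_ROOT}/{path.strip('/')}?{urlencode(params)}"
```

Stepping offset by limit walks through the hierarchical browse lists one page at a time.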

Constellate

What it does: Constellate is a text and data analytics service from JSTOR and Portico that lets users build datasets from multiple content sources, then analyze and visualize those datasets. MIT users can download up to 50,000 documents.

How it’s accessed: All MIT users can access Constellate via the access request form.

Result format: Dataset files may be downloaded in CSV or JSON. For all documents in Constellate, you may download bibliographic metadata, unigrams, bigrams, and full-text. For any content which is not rights restricted (e.g. Chronicling America, Reveal Digital, or early JSTOR content) your dataset files will contain the full-text.

You may read more about

Contact for technical questions: If you have access to Constellate and need technical help, or need additional (typically more than 50,000) document downloads, please contact constellate@ithaka.org.

For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.

CORE

What it does: gives programmatic access to metadata and full-text of millions of OA research papers

How it’s accessed: Through API or bulk data download. See an overview of CORE services for more information

How to register: Free to use, API key required, register for API key at https://core.ac.uk/api-keys/register

Limitations: Quota applied for query volume, details at https://core.ac.uk/services#api

Contact for technical questions: theteam@core.ac.uk

For more information: https://core.ac.uk/services

CrossRef DOI Registry Agency

What it does: Allows access to metadata records for over 75 million scholarly works that have CrossRef DOIs, covering around 5000 publishers. Can be used for text and data mining, checking against funder mandates, and to obtain metadata in a variety of representations.

How it’s accessed: General search interfaces and various APIs

Result format: JSON, Text, and XML

How to register: No registration required

Limitations: No stated data use limitations; may be limited by publisher participation

Contact for technical questions: support@crossref.org

For more information: https://www.crossref.org/documentation/retrieve-metadata/
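A metadata lookup against the REST API might look like the Python sketch below. The /works endpoint, the query, rows, and mailto parameters, and the message/items layout of the JSON response are assumptions based on the public CrossRef REST API; the linked documentation is the authority.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the CrossRef REST API.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_works_url(query, rows=5, mailto=None):
    """Return a search URL; mailto optionally identifies the caller to CrossRef."""
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto
    return CROSSREF_WORKS + "?" + urlencode(params)


def extract_dois(response_text):
    """Pull the DOI of each work out of a JSON response body."""
    payload = json.loads(response_text)
    return [item["DOI"] for item in payload["message"]["items"]]
```

No key is involved, which matches the registration note above.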

CQ Press content

What it includes: American government, politics, history, public policy, and current affairs, 1923-present

How it’s accessed: Content may be downloaded for TDM

Access restrictions: Limited to MIT users, alumni, and walk-in users physically present at MIT

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

Dataverse

What it does: data available through Dataverse is available for TDM, including content from the Harvard Dataverse Network, MIT Libraries-purchased data, and data deposited in other Dataverse Network repositories

How they’re accessed: Data may be downloaded for local analysis, or users may use a Dataverse API

Result format: DDI XML and JSON for partial records

How to register: Access to restricted data sets requires approval by data owners. To access MIT Libraries-purchased data, log in to Dataverse by selecting Massachusetts Institute of Technology and using your certificates or Touchstone.

Limitations: No limitations on public data set downloads after agreeing to terms of use. No limitations on restricted data set downloads after access is granted by data owners

Contact for technical questions: dvn_support@help.hmdc.harvard.edu; Questions can also be posted in https://groups.google.com/forum/#!forum/dataverse-community

For more information: http://guides.dataverse.org/en/4.6/api/ and http://guides.dataverse.org/en/4.6/user/index.html

Dewey Data

Coverage: The dataset vendors accessible through Dewey Data include:

SafeGraph: POI and aggregated transaction patterns
Advan Research: Aggregated foot-traffic
Context Analytics: Twitter social sentiment
Similarweb: Website traffic and search keywords
BrightQuery: Private company data
Global Wireless Solutions: Mobile app engagement data
PDI Technologies (formerly Skupos): U.S. point-of-sale data

A full listing of the dataset vendors can be found here. Please read the terms and conditions below before using this database.

How it’s accessed: https://www.deweydata.io/

Users must create an account with their MIT email address to access Dewey Data. Click the Get started button and then the Sign-up option to create an account. Dewey Data will then upgrade the account for full access.

Access restrictions: None

Limitations: None

Digital Public Library of America (DPLA) metadata

What it does: Allows programmatic access to metadata in DPLA collections, including partner data from Harvard, New York Public Library, ARTstor, and others

How it’s accessed: DPLA metadata is accessible by API or as zipped JSON files for bulk download

Result format: Structured JSON-LD objects

How to register: Free to use; API key must be requested with information here: https://pro.dp.la/developers/policies#get-a-key

Limitations: No stated limitations

Contact for technical questions: codex@dp.la

Digital Theatre Plus (DT+) content

What it includes: MIT-subscribed and open access content published by Digital Theatre

How it’s accessed: Contact textmine@mit.edu

Access restrictions: Subscribed content limited to MIT users

Limitations: Non-commercial use only, some rate limits and restrictions may apply

For more information: textmine@mit.edu

Dimensions Analytics API

What it does: Dimensions Analytics API enables users to perform analytics on the Dimensions Analytics database, which contains publications, awarded grants, patents, datasets, clinical trials and policy documents to deliver an unparalleled look at research globally.

How it’s accessed: Access is provided through an API key. MIT Libraries has licensed 10 keys for the community. MIT users must register with Dimensions Analytics prior to requesting a key. Please use this form to request access.

Result format: The Dimensions Analytics API uses a custom, domain-specific language called the Dimensions Search Language (DSL) to return results in JSON. Additionally, there is a Google Sheets Connector that enables queries as well as an extensive API Lab with reusable Python notebooks.

Contact for technical questions: Dimensions Analytics has extensive support and documentation. Go to the Support menu at the top right of the Dimensions Analytics web app. If you need more assistance, please contact support@dimensions.ai.

For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.

Early English Books Online Text Creation Partnership (EEBO-TCP)

Coverage: the EEBO TCP Phase I corpus: books printed in England, Ireland, Scotland, Wales and British North America and works in English printed elsewhere from 1473–1700

How it’s accessed: full-text access and search tools available to all via the University of Michigan EEBO-TCP site, downloadable full-text files available here, HTML, ePUB, and TEI P5 XML copies available through the Oxford Text Archive, and tiff files of page images available to MIT users here

Access restrictions: text versions of Phase 1 content are openly available for public use, page images and Phase II text limited to subscribing institutions

Limitations: no limitations on openly available data, access via ProQuest subject to terms of use

Contact for technical questions: University of Michigan EEBO help

For more information: http://quod.lib.umich.edu/e/eebogroup/

Eighteenth Century Collections Online Text Creation Partnership (ECCO-TCP)

Coverage: English-language and foreign-language titles printed in the United Kingdom during the 18th century, along with thousands of important works from the Americas (the ECCO-TCP corpus)

How it’s accessed: Multiple ways to access, listed here

Access restrictions: none

Limitations: none

Contact for technical questions: http://www.textcreationpartnership.org/contact/

For more information: http://www.textcreationpartnership.org/tcp-ecco/

Electrochemical Society

What it includes: MIT-subscribed Electrochemical Society publications

How it’s accessed: TDM access can be arranged on a per-project basis, MIT users can contact textmine@mit.edu to inquire

Access restrictions: Subscribed content limited to MIT users

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

Emerald Publishing

What it includes: MIT-subscribed and open access content available from Emerald Publishing

How it’s accessed: Accessible via CrossRef’s TDM service; users should inform support@emeraldinsight.com of the IP address that will be used for mining prior to beginning to avoid an IP block

Access restrictions: Subscribed content limited to MIT users

Limitations: Non-commercial use only; subject to terms in the TDM license

Contact for technical questions: support@emeraldinsight.com

For more information: https://www.emerald.com/insight/site-policies

Europeana data, metadata and annotation

What it is: Wide variety of European content, selected openly available content listed here

How they’re accessed: searchable by web interface or various APIs

Result format: Varies by API

How to register: https://pro.europeana.eu/pages/get-api

Limitations: See Terms of Use for varying items

Contact for technical questions: api@europeana.eu or API Google groups page

For more information: https://pro.europeana.eu/page/apis

Evans Early American Imprint Collection Text Creation Partnership (Evans-TCP)

Coverage: 6,000 of the most frequently studied books from the Evans Early American Imprints Collection

How it’s accessed: Evans-TCP web interface

Access restrictions: none

Limitations: none

Contact for technical questions: http://www.textcreationpartnership.org/contact/

For more information: http://quod.lib.umich.edu/e/evans/

Google Books

Coverage: Large corpus of > 25 million scanned books from libraries and publishers, including foreign language corpora

How it’s accessed: Multiple ways to access, including third party tools: search via Google Books web interface, Ngram Viewer, BYU Google Books viewer, Culturomics Bookworm Viewer

Access restrictions: none

Limitations: TDM output limited to snippet view for in-copyright works

Contact for technical questions: Google Books help

For more information: https://books.google.com/intl/en/googlebooks/about/

HathiTrust Digital Library

What it is: Two large corpora of scanned works available for download: a non-Google corpus of >550,000 primarily English-language public domain volumes published prior to 1923, and a Google-digitized corpus of >4.8 million public domain volumes in a wide variety of languages, subjects, and dates (see visualizations of coverage); custom datasets also available

How it’s accessed: All content available for search via web interface, content also available for computational analysis through the HathiTrust Research Center, as datasets to download or via various APIs

Result format: varies depending on query method

How to register: Two methods of access: via a Web client, requiring authentication (users who are not members of a HathiTrust partner institution must sign up for a University of Michigan “Friend” Account), or programmatically using an access key that can be obtained at http://babel.hathitrust.org/cgi/kgs/request

Limitations: All text is searchable, but web search output results are limited for in-copyright works; no restrictions on download of non-Google public domain corpus, download of Google-digitized corpus is restricted to participating institutions. Check the limitations for downloaded corpora and other policies.

Contact for technical questions: feedback@issues.hathitrust.org, https://www.hathitrust.org/feedback

For more information: https://www.hathitrust.org/data_api and https://www.hathitrust.org/datasets

HeinOnline

What it includes: Full-text legal history collection from 1700-present including legal journals, books, world constitutions, treaties, US Supreme Court reports, US Code, Statutes at Large, Code of Federal Regulations, Congressional Record, presidential papers, Foreign Relations of the United States, federal agency reports and records, Philippine law collection, resources for researching legislative histories, and 5th-7th editions of Leiter’s “National Survey of State Laws”

How it’s accessed: TDM access can be arranged on a per-project basis, MIT users can contact textmine@mit.edu to inquire

Access restrictions: Subscribed content limited to MIT users

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

ICSD (Inorganic Crystal Structure Database) API

What it includes: ICSD (Inorganic Crystal Structure Database) provides the world's largest database for completely identified inorganic crystal structures, including minerals, metals, alloys and metal-organic. The first records date back to 1913. Currently, around 12,000 new structures are added every year.

The curated data collection includes:

How it’s accessed: MIT affiliates have access to ICSD API service via our site license. To request access to the ICSD API, please email icsd-api@mit.edu and include the following information in your message.

Your project title and a brief description
Estimated duration of API access needed
Your MIT email address

Each user will need to review and agree to the Terms and Conditions of ICSD API from ICSD.

Access restrictions: The MIT community (MIT faculty, students, post-docs & other researchers, and staff) are authorized to use ICSD for academic research purposes only.

The data can only be used for academic research and cannot be used to “calculate powder pattern collections for material identification or quantitation, or to make derivative databases for the aforementioned purposes”.
The dataset cannot be shared with anyone else outside your project.
The dataset needs to be deleted when you finish your project or when our license expires.

Limitations: We have a limited number of seats for ICSD API access. They will be allocated to users on a first-come, first-served basis with an expiration date based on individual project needs. If you need an extension of the access, please email icsd-api@mit.edu.

For more information: textmine@mit.edu

IEEE Xplore API

What it does: Provides flexible query and retrieval of metadata records for more than 4 million documents comprising IEEE journals, conference proceedings, and technical standards

How it’s accessed: HTTP requests using structured URL queries

Result format: JSON, XML

How to register: Follow the steps at https://developer.ieee.org/getting_started

Limitations: Maximum of 200 results may be retrieved in a single query. A query term can only contain a maximum of 10 words

Contact for technical questions: onlinesupport@ieee.org

For more information: https://developer.ieee.org/
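The two limits above (200 results per query, 10 words per query term) shape how a larger result set has to be collected: in pages. The Python sketch below generates the page URLs. The endpoint and the apikey, querytext, format, max_records, and start_record parameter names are assumptions, as is the 1-based start_record; check them against the developer documentation before use.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm at https://developer.ieee.org/.
SEARCH_URL = "https://ieeexploreapi.ieee.org/api/v1/search/articles"
MAX_RECORDS = 200  # per-query ceiling stated in the limitations above
MAX_WORDS = 10     # per-query-term word ceiling stated above


def page_urls(query_text, api_key, total_wanted):
    """Return one URL per page needed to collect total_wanted records."""
    if len(query_text.split()) > MAX_WORDS:
        raise ValueError(f"query term exceeds {MAX_WORDS} words")
    urls = []
    start = 1  # assumed: record numbering starts at 1
    while start <= total_wanted:
        size = min(MAX_RECORDS, total_wanted - start + 1)
        params = {
            "apikey": api_key,
            "querytext": query_text,
            "format": "json",
            "max_records": size,
            "start_record": start,
        }
        urls.append(SEARCH_URL + "?" + urlencode(params))
        start += size
    return urls
```

For example, asking for 450 records yields three URLs covering records 1-200, 201-400, and 401-450.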

Internet Archive eBooks and Texts

Coverage: Over 11 million fully accessible books and texts

How it’s accessed: Searchable by web interface, with multiple download formats for individual works; instructions for a method for bulk download here

Access restrictions: none

Limitations: No stated technical limitations; subject to terms of use

Contact for technical questions: info@archive.org

For more information: https://archive.org/details/texts

Institute of Physics (IOP)

What it includes: MIT-subscribed and open access journals published by IOP

How it’s accessed: TDM access can be arranged by request, MIT users should contact textmine@mit.edu

Access restrictions: Subscribed content limited to MIT users

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

JSTOR Data for Research

What it does: Not a true API, but allows computational analysis and selection of JSTOR’s scholarly journal and primary resource collections. Includes tools for faceted searching and filtering, text analysis, topic modeling, data extraction, and visualization

How it’s accessed: Web interface

Result format: CSV, varies depending on tool used

How to register: Free to access, registration is required to obtain results; no institutional affiliation required

Limitations: Datasets are capped by default at 1,000 articles; users seeking larger results are asked to contact JSTOR Data for Research

Contact for technical questions: support@jstor.org, http://about.jstor.org/contact

For more information: http://about.jstor.org/service/data-for-research

The Lens API

What it does: The Lens API provides programmatic access to Patent and Scholarly Works meta records. The Lens meta records of patents and scholarly works are metadata aggregated from various sources with persistent identifiers from the original data sources, and normalized with provenance maintained. All MIT affiliates have access via the Institutional User API Plans, which include the following rates and volumes for each user.

Scholarly Institutional User API Plan
● 5,000 requests per month
● 500 records per request
● 10 requests per minute

Patent Institutional User API Plan
● 5,000 requests per month
● 100 records per request
● 10 requests per minute

In addition, MIT Libraries has a small number of seats for high-volume API access via the Institutional Toolkit (ITK) Plan.

How it’s accessed: All MIT affiliates can access the API via the request access form.

Result format: JSON. The REST API provides programmatic access to the full corpus of Lens scholarly works and patents. The Lens API documentation includes details of request structure, searchable fields, code examples, and response fields. You may also use their Swagger UI for query development.

Contact for technical questions: If you have access to the API and need technical help, please create an issue on The Lens GitHub repo or use the feedback form on their support page at https://docs.api.lens.org/support.html. You may also contact support@lens.org.

For more information: Contact lib-comptool@mit.edu with questions about our subscription and the use of this tool.
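To stay within the 10 requests per minute allowed by the Institutional User API Plans, a client can pace itself. The Python sketch below is one possible sliding-window limiter. It is not part of the Lens API, and the RateLimiter class name and its parameters are hypothetical. It does not track the monthly request quota.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls calls in any window of period seconds."""

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._stamps = deque()   # times of the calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop calls that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.period - (now - self._stamps[0]))
```

Calling limiter.wait() immediately before each API request keeps a loop of requests under the per-minute rate.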

LexisNexis Academic

Coverage: news, business, and legal resources

How it’s accessed: Content may be downloaded for TDM via the web interface

Access restrictions: Access limited to MIT users

Limitations: Content may be downloaded from search results in batches up to the batch limit allowed by the platform, and must be deleted after 90 days; scripting of batch downloads is not permitted

For more information: textmine@mit.edu

Library of Congress APIs

What they do: Multiple APIs available to download bibliographic data and search Library of Congress digital collections, including images, public radio and television, and historic newspapers

How they’re accessed: Varies by API used, more information available here

Result format: Varies by API used

How to register: Free to use, most APIs do not require an API key

Limitations: Not specified, varies by API used

Contact for technical questions: https://labs.loc.gov/lc-for-robots/

For more information: https://labs.loc.gov/lc-for-robots/

Linguistic Data Consortium (LDC)

What it does: The Linguistic Data Consortium (LDC) creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes.

How it’s accessed: All MIT users can access LDC. Access information is listed at this libguide: https://libguides.mit.edu/ldc

Result format: Most text data is released as plain text or in XML format. LDC prefers simple and shallow formatting to make it easy for an automatic process to use, ignore or remove the markup tagging.

Contact for technical questions: If you have access to the computational tool and need technical help, please contact ldc-lib@mit.edu

For more information: Contact ldc-lib@mit.edu with questions about the use of this tool.

music21

Coverage: Corpus of encoded public domain and openly licensed musical compositions; full list available here: http://mit.edu/music21/doc/about/referenceCorpus.html

How it’s accessed: Install the music21 toolkit and access via Python

Access restrictions: None

Limitations: None

Contact for technical questions: http://groups.google.com/group/music21list

For more information: http://mit.edu/music21/

Nature Blogs API

What it does: Blog tracking and indexing service; tracks Nature blogs and other third-party science blogs

How it’s accessed: RESTful interface, queries are made as HTTP GET requests

Result format: Default is JSON, some queries return Atom/RSS

How to register: Free to register, API key no longer required as of 2013

Limitations: 2 calls per second; 5,000 calls per day; RSS results are limited to 100 items maximum

Contact for technical questions: developers@nature.com

For more information: http://www.nature.com/developers/documentation/api-references/blogs-api/

Nature OpenSearch API

What it does: Bibliographic search service for Nature content

How it’s accessed: REST API with two interfaces: 1) OpenSearch standard interface using keyword searches; 2) SRU search interface using CQL structured queries

Result format: RSS, JSON, ATOM, SRU XML, TURTLE, depending on interface used

How to register: Free to register, API key no longer required as of 2013

Limitations: Results served in pages of 25 records. Additional records can be retrieved by paging through the result set. The page size can be varied and is capped at 100 records

Contact for technical questions: developers@nature.com

For more information: http://www.nature.com/developers/documentation/api-references/opensearch-api/

Nature articles

What it includes: MIT-subscribed and open access publications by Nature

How it’s accessed: Content may be downloaded directly from the Nature online platform, including by automated download

Access restrictions: Limited to MIT users and research collaborators

Limitations: Download of content should not exceed 1 document per second; TDM rights are limited to non-commercial use

For more information: textmine@mit.edu

NCBI Developer Portal

What it is: The NCBI developer portal contains a variety of resources such as APIs, software libraries, and datasets that can be downloaded and accessed for computational use.

Limitation: No limitations, but see their data use policy and usage guidelines

Contact for technical support: contact the NCBI support center

New York Times

Coverage: metadata and some content from New York Times articles 1851-present

How it’s accessed: Multiple APIs are available for different uses, full list here

Access restrictions: Free to access with registration and acceptance of terms of use

Limitations: Noncommercial use only, and users must agree to terms of use; API calls limited to 1,000 calls per day, and 5 calls per second

Contact for technical questions: code@nytimes.com

For more information: http://developer.nytimes.com/

OECD Data APIs

What they do: Allows programmatic access to a selection of OECD datasets

How they’re accessed: two RESTful APIs available for queries in SDMX-JSON or SDMX-ML formats

Result format: JSON, XML

How to register: No registration required

Limitations: 1 million data points; not all OECD datasets are covered

Contact for technical questions: OECDdotStat@oecd.org

For more information: https://data.oecd.org/api/

ORCID API

What it does: Queries and searches the ORCID researcher identifier system and obtains researcher profile data

How it’s accessed: RESTful interface

Result format: HTML, XML, or JSON

How to register: Two options: 1) Users can access the Public API, which only returns data marked as “public”; 2) Become an ORCID member to receive API credentials: see here

Limitations: Data retrieved through Public API is limited

Contact for technical questions: https://orcid.org/help/contact-us

For more information: https://orcid.org/organizations/integrators/API

Oxford Text Archive

Coverage: full catalog and selected corpora here

How it’s accessed: Searchable by web interface, with multiple download formats for individual works; curated corpora also available

Access restrictions: Most texts are free to use; some subject to depositor restrictions

Limitations: User must agree to user agreement

Contact for technical questions: ota@it.ox.ac.uk

For more information: http://ota.ox.ac.uk/

Oxford University Press (OUP) Publications

What it includes: MIT-subscribed and open access content published by Oxford University Press (OUP)

How it’s accessed: Content may be downloaded for TDM

Access restrictions: MIT subscribed content limited to MIT users

Limitations: Noncommercial use only

For more information: textmine@mit.edu

PLOS APIs and bulk download

What it is: Every PLOS article, including all Articles and Front Matter, can be bulk downloaded, and data about PLOS articles or the articles themselves can be accessed through APIs.

How it’s accessed: Bulk download and APIs

Limitations: No stated technical limitations; content under CC-BY license, bulk data does not include figures or supplemental data.

Access restrictions: No restrictions. See API display policy

For technical support: Join the PLOS API developers group.

Project Gutenberg

Coverage: >53,000 books, primarily in the public domain (pre-1923)

How it’s accessed: Individual works are downloadable in multiple formats from the Project Gutenberg website; bulk downloading is permitted via mirroring or wget, more information available here

Access restrictions: none

Limitations: Most works are available for use without restriction, but in-copyright works may have individual restrictions; users must agree to terms of use

Contact for technical questions: Contact information

For more information: http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages

Royal Society of Chemistry articles

What it includes: MIT-subscribed and open access content published by the Royal Society of Chemistry

How it’s accessed: TDM access can be arranged by request, MIT users should contact textmine@mit.edu

Access restrictions: MIT subscribed content limited to MIT users

Limitations: Noncommercial use only

For more information: textmine@mit.edu

Sage Publications

What it includes: MIT-subscribed and open access content published by Sage

How it’s accessed: Content may be downloaded for TDM

Access restrictions: MIT subscribed content limited to MIT users

Limitations: Noncommercial use only

For more information: textmine@mit.edu

STAT!Ref OpenSearch API

What it does: Bibliographic search service for displaying STAT!Ref results on a website.

How it’s accessed: OpenSearch specifications

Result format: RSS, ATOM, HTML

How to register: Free to register for users at a subscribing institution

Limitations: Limits exist but are not specified; high-volume users should contact STAT!Ref

Contact for technical questions: support@statref.com

For more information: http://online.statref.com/Resources/StatRefOpenSearch.aspx

ScienceDirect APIs

What they do: Multiple APIs available for different use cases, including text mining of full-text content, search widgets, displaying journal or book level data, federated searching, and indexing

How they’re accessed: varies, depending on use case

Result format: varies, depending on use case

How to register: Free to register at https://dev.elsevier.com/. MIT users should register via institutional login

Limitations: Vary by API; some API results are limited to the MIT Libraries' article entitlements.

Contact for technical questions: integrationsupport@elsevier.com

For more information: https://dev.elsevier.com/; https://dev.elsevier.com/sd_apis.html

Smithsonian Astrophysical Observatory/NASA Astrophysics Data System API

Coverage: Three bibliographic databases covering publications in astronomy and astrophysics, physics, and all arXiv e-prints

How it’s accessed: Available through the ADS API; users must request a token

Access restrictions: none

Limitations: Systematic downloading of content is prohibited except through the provided API, which is subject to rate and scope limits. Researchers seeking to regularly download and store copies of API results should contact ADS first. See terms of use.

Contact for technical questions: adshelp@cfa.harvard.edu

For more information: https://ui.adsabs.harvard.edu/about/ and the full API documentation
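
As a minimal sketch of token-based access, the code below prepares (but does not send) a search request to the ADS API. The endpoint path, the q, fl, and rows parameters, and the Bearer authorization scheme are assumptions drawn from general knowledge of the ADS API rather than from this entry; confirm them in the full API documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed search endpoint; confirm against the ADS API documentation.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_request(token, query, fields=("bibcode", "title"), rows=10):
    """Prepare a GET request carrying the API token in the Authorization header."""
    params = urlencode({"q": query, "fl": ",".join(fields), "rows": rows})
    return Request(
        f"{ADS_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

The prepared request can be sent with urllib.request.urlopen. Keep request volume within the rate and scope limits noted above.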

Springer journal articles

What it includes: MIT-subscribed and open access content on the SpringerLink platform

How it’s accessed: Content may be downloaded for TDM directly from SpringerLink, and downloading may be automated for that purpose; Springer APIs may be used to identify desired content for download.

Access restrictions: Subscribed content limited to MIT users. To access Springer content that the MIT Libraries subscribes to, researchers need to be within MIT’s IP range. If on campus, please connect to the MIT Secure network; if off campus, please connect to the MIT VPN.

Limitations: Non-commercial use only; users should adhere to the Springer TDM policy

Contact for technical questions: Eddie.Bates@springernature.com, support.api@springer.com

For more information: https://www.springer.com/gp/rights-permissions/springer-s-text-and-data-mining-policy/29056

Springer APIs

What they do: Multiple APIs providing access to Springer metadata and open access content

How they’re accessed: RESTful interface, using structured URL requests

Result format: XML and JSON

How to register: Free to register, API key required and provided after registration.

Limitations: A single query returns at most 100 results for metadata queries, or 20 results for open access queries

Contact for technical questions: support.api@springer.com

For more information: https://dev.springer.com/; https://dev.springer.com/docs; https://dev.springer.com/restfuloperations
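
Because of the per-query caps, larger result sets have to be fetched in pages. The sketch below generates one request URL per page. The caps come from the limits listed above; the base URL, the path layout, and the parameter names (q, s for a 1-based start index, p for page size, api_key) are assumptions to verify against the Springer API documentation.

```python
from urllib.parse import urlencode

# Per-query caps taken from the limits listed above.
MAX_PAGE_SIZE = {"metadata": 100, "openaccess": 20}

# Assumed base URL and parameter names; verify in the API documentation.
BASE_URL = "https://api.springernature.com"


def page_urls(collection, query, total, api_key):
    """Yield one request URL per page needed to cover `total` records."""
    size = MAX_PAGE_SIZE[collection]
    for start in range(1, total + 1, size):  # `s` is assumed to be 1-based
        params = urlencode(
            {"q": query, "s": start, "p": size, "api_key": api_key}
        )
        yield f"{BASE_URL}/{collection}/json?{params}"
```

For example, covering 250 metadata records takes three requests, starting at records 1, 101, and 201.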

Taylor & Francis articles

What it includes: MIT-subscribed and open access journals published by Taylor & Francis

How it’s accessed: TDM access can be arranged by request; MIT users should contact textmine@mit.edu

Access restrictions: Subscribed content limited to MIT users

Limitations: Some rate limits and restrictions may apply

For more information: textmine@mit.edu

TDM Studio

What it does: ProQuest’s TDM Studio is a text and data mining solution for research, teaching and learning and allows MIT users to analyze the content from eligible MIT Libraries ProQuest subscriptions. Please check the eligible content file for details and the date last updated. If you need a more recent version, please contact lib-comptool@mit.edu

How it’s accessed: All MIT-affiliated users can access TDM Studio via the request access form.

Result format: Dataset files of metadata and derived data can be downloaded in CSV format. Full-text content export is not allowed.

Contact for technical questions: If you have access to the API and need technical help, please contact TDMStudio@clarivate.com.

For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.

UN Comtrade APIs

What they do: Allow access to data on International Merchandise Trade Statistics (IMTS) and the work of the International Merchandise Trade Statistics Section (IMTSS) of the United Nations Statistics Division

How they’re accessed: Some services in REST, some in SOAP

Result format: XML or CSV, depending on service

How to register: Comtrade Web Services requires IP authentication, and users must have a site license account. Access to metadata and data availability information is not restricted

Limitations: Depending on access rights, the following data can be obtained: Comtrade Data, Tariff Line Data, Total Trade, Annual Totals, Processed Data or Original Data. The last three are restricted to data exchange between the UN and OECD.

Contact for technical questions: comtrade@un.org

For more information: https://comtrade.un.org/ws/

Wall Street Journal Historical Archive

What it includes: Wall Street Journal content 1889-1934

How it’s accessed: Downloadable in XML format from Dataverse

Access restrictions: Limited to MIT users

Limitations: None

Contact for technical questions: dvn_support@help.hmdc.harvard.edu; Questions can also be posted in https://groups.google.com/forum/#!forum/dataverse-community

For more information: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XAUHMH

Web of Science Lite

What it does: Allows text- and data-mining access to content in Web of Science Lite

How it’s accessed: Accessible via Clarivate’s Developers Portal

Result format: JSON or XML

How to register: Must be part of a subscribing institution to have full text access. MIT users must set up an individual account at Clarivate’s Developers Portal https://developer.clarivate.com and fill out a form with their name, email address, and an optional description of the project.

Limitations: Maximum number of tokens per user: 1. Maximum number of requests per second: 2.

Users may use the API to access the Data Fields in accordance with the applicable License Level, in each case as permitted by your subscription.

If a user is using Web of Science data in an article or presentation they must appropriately cite and credit Clarivate Analytics as the source.

Contact for technical questions: Contact support link here

For more information: https://developer.clarivate.com/
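
One way to stay under the limit of 2 requests per second is to space calls out on the client side. The sketch below is a generic throttle, not part of any Clarivate library; the class name and its parameters are hypothetical.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` calls start per second."""

    def __init__(self, rate=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate  # 0.5 s between calls at 2 per second
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time the previous call was allowed to start

    def wait(self):
        """Block until the next call may start, then record its start time."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Call wait() immediately before each API request. The clock and sleep arguments exist only so the behavior can be tested without real delays.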

Web of Science API Expanded

What it does: The Web of Science™ API Expanded supports rich searching across the Web of Science to retrieve full item-level metadata from an expanded list of fields, including times cited counts, contributor addresses/affiliations and funding data. Additional operations support the ability to discover related records as well as cited references and citing items. The API offers full customization and flexibility for researchers who want to build more sophisticated queries, but requires some technical skill and coding ability to get up and running. MIT Libraries’ subscription is limited to 1 million record downloads for the whole MIT community.

How it’s accessed: Access is provided through an API key and the WoS Developer Portal (registration required). Because the number of downloads per year for the MIT community is limited, requests for access need to be reviewed. Please use this form to request access.

Result format: REST-based programmatic access to Web of Science™ documents, with JSON or XML output

Contact for technical questions: If you have access to the API and need technical help, please contact clarivate.customersupport@clarivate.com or use the contact form.

For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.

Wiley Text and Data Mining

What it does: Allows text- and data-mining access to content in the Wiley Online Library

How it’s accessed: Accessible via CrossRef’s TDM service; RESTful interface

Result format: JSON

How to register: Must be part of a subscribing institution to have full text access. Users will encounter a click-through agreement and will receive a Client API Token, which is needed when requesting full text of articles

Limitations: rate-limits implemented through CrossRef rate-limiting headers, exact limitations not specified

Contact for technical questions: TDM@wiley.com; labs@crossref.org for support using the CrossRef TDM service

For more information: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining

World Bank APIs

What they do: Provide access to World Bank statistical databases, indicators, projects, and loans, credits, financial statements and other data related to financial operations

How they’re accessed: Three RESTful APIs available to provide access to different datasets: Indicators (time series data), Projects (data on the World Bank’s operations), Finances (World Bank financial data)

Result format: XML, JSON, RDF, and Atom, depending on specific API used

How to register: Free to use, no registration or API key required

Limitations: Request volume limits are unspecified, but should be “reasonable”

Contact for technical questions: data@worldbank.org or “Contact support” link here

For more information: https://datahelpdesk.worldbank.org/knowledgebase/topics/125589
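
As an example of the Indicators API, the sketch below builds a request URL and extracts a year-to-value series from a response body. The base URL, path layout, query parameters, and the two-element response shape (paging metadata followed by a list of observations) are assumptions based on general knowledge of the API; the sample body is hand-written for illustration and is not real data.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; confirm against the World Bank API documentation.
BASE = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, date_range=None, per_page=50):
    """Build an Indicators API URL that requests JSON output."""
    params = {"format": "json", "per_page": per_page}
    if date_range:
        params["date"] = date_range  # e.g. "2019:2020"
    return f"{BASE}/country/{country}/indicator/{indicator}?{urlencode(params)}"


def parse_series(body):
    """Return {year: value} from a response body, skipping missing values."""
    _meta, rows = json.loads(body)
    return {
        row["date"]: row["value"]
        for row in rows or []
        if row["value"] is not None
    }


# Hand-written sample in the assumed response shape, not real data.
SAMPLE = (
    '[{"page": 1, "pages": 1}, '
    '[{"date": "2020", "value": 1.5}, {"date": "2019", "value": null}]]'
)
```

Calling parse_series(SAMPLE) keeps only the 2020 observation, since the 2019 value is missing.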

World Digital Library

Coverage: Primary source materials from many cultures and countries, representing over 100 different languages

How it’s accessed: Multiple access methods supported, see: http://api.wdl.org/

Access restrictions: None

Limitations: None

For more information: http://api.wdl.org/; https://labs.loc.gov/lc-for-robots/