What it does: The Alpha Vantage Stock API Service offers pre-processed and normalized finance and economic data for stocks, ETFs, mutual funds, foreign exchange rates, financial reports from SEC filings, and over 50 derived technical indicators
How it’s accessed: Any web-enabled client (e.g. a web browser) can call the API by making an HTTP GET request to the appropriate URL, so users can work in the programming language of their choice
Result format: JSON, CSV
How to register: A free API key can be obtained here
Limitations: Each free API key allows up to 500 API calls per day by default. Please reach out to support@alphavantage.co if a higher rate limit is needed
Contact for technical questions: support@alphavantage.co
For more information: Please refer to the official documentation and the supplementary stock API review article for technical integration guide and financial modeling best practices
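As a minimal sketch of the HTTP GET access described above, the following Python builds a request URL and keeps the network call in a separate function. The endpoint `https://www.alphavantage.co/query` and the `function`, `symbol`, `datatype`, and `apikey` parameters are assumptions based on Alpha Vantage's public documentation and should be checked there; `YOUR_KEY` is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the official documentation.
BASE_URL = "https://www.alphavantage.co/query"


def build_daily_url(symbol, api_key, datatype="json"):
    """Return the GET URL for a daily time series request (JSON or CSV)."""
    if datatype not in ("json", "csv"):
        raise ValueError("datatype must be 'json' or 'csv'")
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "datatype": datatype,
        "apikey": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_json(url):
    """Send the GET request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


url = build_daily_url("IBM", "YOUR_KEY")
print(url)
```

Each call to `fetch_json` counts against the daily limit of the key, so it is worth caching responses locally.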
What it includes: MIT-subscribed and open access content published by AAAS
How it's accessed: Content may be downloaded for TDM directly from the AAAS online platform for local storage and analysis
Limitations: Downloading must be limited to a “reasonable rate and speed,” and users must comply with the terms in Annex A here. Subscribed content is limited to MIT users and walk-in users physically present at MIT
Contact for technical questions and more information: http://www.sciencemag.org/subscribe/institutional-license-agreement
What it includes: Digitized public radio and television programs and their metadata records available from the AAPB Online Reading Room
How it’s accessed: By API, see here
Access restrictions: None
Limitations: Some volume limitations may apply
For more information: https://github.com/WGBH/AAPB2#api
What it includes: MIT-subscribed and open access publications by ACS
How it’s accessed: Content is delivered for local storage and analysis; users may use tools of their choice for analysis
Access restrictions: Limited to MIT users and their research collaborators, who must agree to and sign an agreement with ACS; to begin, contact textmine@mit.edu
Limitations: No limitations on volume, but users will need to provide information on the specific content they would like to mine (journal title and date range, or a list of DOIs)
For more information: textmine@mit.edu
What it includes: MIT-subscribed and open access journals published by APS
How it’s accessed: TDM access can be arranged by request; MIT users should contact textmine@mit.edu
Access restrictions: Subscribed content limited to MIT users
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it does: Gives programmatic access to all of the arXiv articles, metadata, and search interface via bulk metadata access and bulk full text access.
How it’s accessed: OAI-PMH, API, and RSS for metadata access and various cloud options for the full-text access.
How to register: Free to use, no registration or API key required
Limitations: No limitation, but see the Play nice and Consider the impact sections here.
Contact for technical questions: arXiv Google Group
For more information: arXiv Bulk Data Access
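The API route for metadata can be sketched as follows. The endpoint `http://export.arxiv.org/api/query` and the `search_query`, `start`, and `max_results` parameters are assumptions drawn from the arXiv API documentation, and `SAMPLE_FEED` is a made-up Atom fragment standing in for a real response so that the parsing step can run offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # assumed endpoint
ATOM = {"atom": "http://www.w3.org/2005/Atom"}   # Atom XML namespace


def build_query_url(search_query, start=0, max_results=10):
    """Build a metadata query URL; results come back as an Atom feed."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return ARXIV_API + "?" + urlencode(params)


def parse_titles(atom_xml):
    """Return the entry titles from an Atom feed, with whitespace collapsed."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM):
        raw = entry.findtext("atom:title", default="", namespaces=ATOM)
        titles.append(" ".join(raw.split()))
    return titles


# Hypothetical feed fragment, not real arXiv data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First  example
    title</title></entry>
  <entry><title>Second example title</title></entry>
</feed>"""

print(build_query_url("all:electron", max_results=5))
print(parse_titles(SAMPLE_FEED))
```

For anything beyond small queries, the bulk access options listed above are the better fit.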
What it does: Provides access to open access content published by BMC
How it’s accessed: Through the SpringerNature Open Access API; text and data mining is handled through SpringerNature
Result format: variety of different output formats, including XML and JSON
How to register: Registration to the developer portal required.
Limitations: No stated limitations for BMC content
Contact for technical questions: supportapi@springernature.com
For more information: BMC's site on Indexing, archiving and access to data
What it includes: MIT-subscribed and open access content published by Brill
How it’s accessed: Content may be downloaded for TDM
Access restrictions: Subscribed content limited to MIT users
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it does: Provides queryable access to all published US court decisions
How it’s accessed: In-browser API viewer or RESTful interface, also available as bulk download
Result format: structured XML, presentation HTML, or plain text
How to register: Most queries do not require registration; some jurisdictions with access restrictions require a free API key
Limitations: Full text of cases limited to 500 cases per person per day, unless otherwise authorized. More on access limits here
Contact for technical questions: https://case.law/api/#problems
For more information: https://case.law/
Coverage: US newspapers from 1789-1924
How it’s accessed: Accessible by API or bulk download
Access restrictions: none, no registration or API key required
Limitations: none
Contact for technical questions: help page
For more information: http://chroniclingamerica.loc.gov/
What it does: Congress.gov shares its application programming interface (API) to provide computational access to accurate and structured congressional data including bills, amendments, summaries, members, the Congressional Record, committee reports, nominations, treaties, and House Communications. Over time we will be adding other collections such as hearing transcripts and Senate Communications.
How it’s accessed: To use the API you must first get an API key.
Result format: Congress.gov API is a REST API and presents data in a hierarchical browse format with responses provided in XML or JSON. The XML format is the default for the API.
How to register: Free to register, API key required
Limitations: See Github page for limits.
Contact for technical questions: Library of Congress Ask a Librarian service.
For more information: see congress.gov's GitHub page for documentation, user guides, a change log that details changes to the API, and opportunities for feedback
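Because XML is the default, a client that wants JSON has to ask for it explicitly. The sketch below shows one way to do that. The base URL `https://api.congress.gov/v3` and the `format`, `limit`, `offset`, and `api_key` query parameters are assumptions to confirm against the GitHub documentation; `YOUR_KEY` is a placeholder.

```python
from urllib.parse import urlencode

CONGRESS_API = "https://api.congress.gov/v3"  # assumed base URL


def build_endpoint_url(endpoint, api_key, fmt="json", limit=20, offset=0):
    """Build a GET URL that overrides the XML default with an explicit format."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    params = {"format": fmt, "limit": limit, "offset": offset, "api_key": api_key}
    return f"{CONGRESS_API}/{endpoint.strip('/')}?{urlencode(params)}"


print(build_endpoint_url("bill", "YOUR_KEY"))
```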
What it does: Constellate is a text and data analytics service from JSTOR and Portico that lets users build datasets from multiple content sources, then analyze and visualize them. MIT users can download up to 50,000 documents.
How it’s accessed: All MIT users can access Constellate via the access request form.
Result format: Dataset files may be downloaded in CSV or JSON. For all documents in Constellate, you may download bibliographic metadata, unigrams, bigrams, and full-text. For any content which is not rights restricted (e.g. Chronicling America, Reveal Digital, or early JSTOR content) your dataset files will contain the full-text.
You may read more about
Contact for technical questions: If you have access to Constellate and need technical help, or need additional (typically more than 50,000) document downloads, please contact constellate@ithaka.org.
For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.
What it does: gives programmatic access to metadata and full-text of millions of OA research papers
How it’s accessed: Through API or bulk data download. See an overview of CORE services for more information
How to register: Free to use, API key required, register for API key at https://core.ac.uk/api-keys/register
Limitations: Quota applied for query volume, details at https://core.ac.uk/services#api
Contact for technical questions: theteam@core.ac.uk
For more information: https://core.ac.uk/services
What it does: Allows access to metadata records for over 75 million scholarly works that have CrossRef DOIs, covering around 5000 publishers. Can be used for text and data mining, checking against funder mandates, and to obtain metadata in a variety of representations.
How it’s accessed: General search interfaces and various APIs
Result format: JSON, Text, and XML
How to register: No registration required
Limitations: No data use stated limitations; may be limited by publisher participation
Contact for technical questions: support@crossref.org
For more information: https://www.crossref.org/documentation/retrieve-metadata/
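A minimal sketch of a metadata lookup, assuming the `https://api.crossref.org/works` endpoint with `query`, `rows`, and `mailto` parameters, and a JSON response whose records sit under `message.items` with `title` given as a list. `SAMPLE` is an invented payload that mimics that shape so the parsing can run without a network call.

```python
import json
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org/works"  # assumed endpoint


def build_works_url(query, rows=5, mailto=None):
    """Build a works search URL; mailto optionally identifies the caller."""
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto
    return CROSSREF_API + "?" + urlencode(params)


def extract_dois_and_titles(payload):
    """Return (DOI, first title) pairs; records without a title get ''."""
    items = payload.get("message", {}).get("items", [])
    return [(it.get("DOI"), (it.get("title") or [""])[0]) for it in items]


# Hypothetical response body, not real Crossref data.
SAMPLE = json.loads("""
{"status": "ok",
 "message": {"items": [
   {"DOI": "10.5555/example.1", "title": ["An example title"]},
   {"DOI": "10.5555/example.2"}
 ]}}
""")

print(build_works_url("text mining", rows=2))
print(extract_dois_and_titles(SAMPLE))
```

Missing fields are common in practice because metadata coverage depends on what each publisher deposits, which is why the parser tolerates absent titles.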
What it includes: American government, politics, history, public policy, and current affairs, 1923-present
How it’s accessed: Content may be downloaded for TDM
Access restrictions: Limited to MIT users, alumni, and walk-in users physically present at MIT
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it does: Makes data in Dataverse available for TDM, including content from the Harvard Dataverse Network, MIT Libraries-purchased data, and data deposited in other Dataverse Network repositories
How they’re accessed: Data may be downloaded for local analysis, or users may use a Dataverse API
Result format: DDI XML and JSON for partial records
How to register: Access to restricted data sets requires approval by data owners. To access MIT Libraries-purchased data, log in to Dataverse by selecting Massachusetts Institute of Technology and using your certificates or Touchstone.
Limitations: No limitations on public data set downloads after agreeing to terms of use. No limitations on restricted data set downloads after access is granted by data owners
Contact for technical questions: dvn_support@help.hmdc.harvard.edu; Questions can also be posted in https://groups.google.com/forum/#!forum/dataverse-community
For more information: http://guides.dataverse.org/en/4.6/api/ and http://guides.dataverse.org/en/4.6/user/index.html
Coverage: The dataset vendors accessible through Dewey Data include:
SafeGraph- POI and aggregated transaction patterns
Advan Research-Aggregated foot-traffic
Context Analytics-Twitter social sentiment
Similarweb-Website traffic and search keywords
BrightQuery-Private company data
Global Wireless Solutions-Mobile app engagement data
PDI Technologies (formerly Skupos)-U.S. point-of-sale data
A full listing of the dataset vendors can be found here. Please read the terms and conditions below before using this database.
How it’s accessed: https://www.deweydata.io/
Users must create an account with their MIT email to access Dewey Data. Click on the Get started button and then the Sign-up option to create the account. Dewey Data will then upgrade the account for full access.
Access restrictions: None
Limitations: None
What it does: Allows programmatic access to metadata in DPLA collections, including partner data from Harvard, New York Public Library, ARTstor, and others
How it’s accessed: DPLA metadata is accessible by API or as zipped JSON files for bulk download
Result format: Structured JSON-LD objects
How to register: Free to use; API key must be requested with information here: https://dp.la/info/developers/codex/policies/#get-a-key
Limitations: No stated limitations
Contact for technical questions: codex@dp.la; Users can also submit issues to DPLA’s Issue Tracker
For more information: http://dp.la/info/developers/codex/
What it includes: MIT-subscribed and open access content published by Digital Theatre
How it’s accessed: Contact textmine@mit.edu
Access restrictions: Subscribed content limited to MIT users
Limitations: Non-commercial use only, some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it does: The Dimensions Analytics API enables users to perform analytics on the Dimensions Analytics database, which contains publications, awarded grants, patents, datasets, clinical trials, and policy documents, to deliver an unparalleled look at research globally.
How it’s accessed: Access is provided through an API key. MIT Libraries has licensed 10 keys for the community. MIT users must register with Dimensions Analytics prior to requesting a key. Please use this form to request access.
Result format: The Dimensions Analytics API uses a custom, domain-specific language called the Dimensions Search Language (DSL) to return results in JSON. Additionally, there is a Google Sheets Connector that enables queries as well as an extensive API Lab with reusable Python notebooks.
Contact for technical questions: Dimensions Analytics has extensive support and documentation. Go to the Support menu at the top right of Dimensions Analytics web app. If you need more assistance, please contact support@dimensions.ai.
For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.
Coverage: the EEBO TCP Phase I corpus: books printed in England, Ireland, Scotland, Wales and British North America and works in English printed elsewhere from 1473–1700
How it’s accessed: full-text access and search tools available to all via the University of Michigan EEBO-TCP site, downloadable full-text files available here, HTML, ePUB, and TEI P5 XML copies available through the Oxford Text Archive, and tiff files of page images available to MIT users here
Access restrictions: text versions of Phase 1 content are openly available for public use, page images and Phase II text limited to subscribing institutions
Limitations: no limitations on openly available data, access via ProQuest subject to terms of use
Contact for technical questions: University of Michigan EEBO help
For more information: http://quod.lib.umich.edu/e/eebogroup/
Coverage: English-language and foreign-language titles printed in the United Kingdom during the 18th century, along with thousands of important works from the Americas (the ECCO-TCP corpus)
How it’s accessed: Multiple ways to access, listed here
Access restrictions: none
Limitations: none
Contact for technical questions: http://www.textcreationpartnership.org/contact/
For more information: http://www.textcreationpartnership.org/tcp-ecco/
What it includes: MIT-subscribed Electrochemical Society publications
How it’s accessed: TDM access can be arranged on a per-project basis; MIT users can contact textmine@mit.edu to inquire
Access restrictions: Subscribed content limited to MIT users
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it includes: MIT-subscribed and open access content available from Emerald Publishing
How it’s accessed: Accessible via CrossRef’s TDM service; users should inform support@emeraldinsight.com of the IP address that will be used for mining prior to beginning to avoid an IP block
Access restrictions: Subscribed content limited to MIT users
Limitations: Non-commercial use only; subject to terms in the TDM license
Contact for technical questions: support@emeraldinsight.com
For more information: http://www.emeraldinsight.com/page/tdm; http://www.emeraldinsight.com/page/tdmfaqs
What it is: Wide variety of European content, selected openly available content listed here
How they’re accessed: searchable by web interface or various APIs
Result format: Varies by API
How to register: https://pro.europeana.eu/pages/get-api
Limitations: See Terms of Use for varying items
Contact for technical questions: api@europeana.eu or API Google groups page
For more information: https://pro.europeana.eu/page/apis
Coverage: 6,000 of the most frequently studied books from the Evans Early American Imprints Collection
How it’s accessed: Evans-TCP web interface
Access restrictions: none
Limitations: none
Contact for technical questions: http://www.textcreationpartnership.org/contact/
For more information: http://quod.lib.umich.edu/e/evans/
Coverage: Large corpus of > 25 million scanned books from libraries and publishers, including foreign language corpora
How it’s accessed: Multiple ways to access, including third party tools: search via Google Books web interface, Ngram Viewer, BYU Google Books viewer, Culturomics Bookworm Viewer
Access restrictions: none
Limitations: TDM output limited to snippet view for in-copyright works
Contact for technical questions: Google Books help
For more information: https://books.google.com/intl/en/googlebooks/about/
What it is: Two large corpora of scanned works available for download: a non-Google corpus of >550,000 primarily English-language public domain volumes published prior to 1923, and a Google-digitized corpus of >4.8 million public domain volumes in a wide variety of languages, subjects, and dates (see visualizations of coverage); custom datasets also available
How it’s accessed: All content available for search via web interface, content also available for computational analysis through the HathiTrust Research Center, as datasets to download or via various APIs
Result format: varies depending on query method
How to register: Two methods of access: via a Web client, requiring authentication (users who are not members of a HathiTrust partner institution must sign up for a University of Michigan “Friend” Account), or programmatically using an access key that can be obtained at http://babel.hathitrust.org/cgi/kgs/request
Limitations: All text is searchable, but web search output results are limited for in-copyright works; no restrictions on download of non-Google public domain corpus, download of Google-digitized corpus is restricted to participating institutions. Check the limitations for downloaded corpora and other policies.
Contact for technical questions: feedback@issues.hathitrust.org, https://www.hathitrust.org/feedback
For more information: https://www.hathitrust.org/data_api and https://www.hathitrust.org/datasets
What it includes: Full-text legal history collection from 1700-present including legal journals, books, world constitutions, treaties, US Supreme Court reports, US Code, Statutes at Large, Code of Federal Regulations, Congressional Record, presidential papers, Foreign Relations of the United States, federal agency reports and records, Philippine law collection, resources for researching legislative histories, and 5th-7th editions of Leiter’s “National Survey of State Laws”
How it’s accessed: TDM access can be arranged on a per-project basis; MIT users can contact textmine@mit.edu to inquire
Access restrictions: Subscribed content limited to MIT users
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it includes: ICSD (Inorganic Crystal Structure Database) provides the world's largest database for completely identified inorganic crystal structures, including minerals, metals, alloys and metal-organic. The first records date back to 1913. Currently, around 12,000 new structures are added every year.
The curated data collection includes:
How it’s accessed: MIT affiliates have access to ICSD API service via our site license. To request access to the ICSD API, please email icsd-api@mit.edu and include the following information in your message.
Each user will need to review and agree to the ICSD API Terms and Conditions.
Access restrictions: The MIT community (MIT faculty, students, post-docs & other researchers, and staff) are authorized to use ICSD for academic research purposes only.
Limitations: We have a limited number of seats for ICSD API access. They will be allocated to users on a first-come, first-served basis with an expiration date based on individual project needs. If you need an extension of access, please email icsd-api@mit.edu.
For more information: textmine@mit.edu
What it does: Provides flexible query and retrieval of metadata records for more than 4 million documents comprising IEEE journals, conference proceedings, and technical standards
How it’s accessed: HTTP requests using structured URL queries
Result format: JSON, XML
How to register: Follow the steps at https://developer.ieee.org/getting_started
Limitations: Maximum of 200 results may be retrieved in a single query. A query term can only contain a maximum of 10 words
Contact for technical questions: onlinesupport@ieee.org
For more information: https://developer.ieee.org/
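The two limits above (200 results per query, 10 words per query term) shape how a client has to page through results. The sketch below splits a request into pages and rejects over-long query terms. The endpoint URL and the `apikey`, `querytext`, `format`, `max_records`, and `start_record` parameter names are assumptions to verify on the developer site, as is the 1-based `start_record`.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm at https://developer.ieee.org/
IEEE_API = "https://ieeexploreapi.ieee.org/api/v1/search/articles"
MAX_RECORDS = 200  # per-query result cap stated above
MAX_WORDS = 10     # per-query-term word cap stated above


def page_urls(query, api_key, total_wanted):
    """Return one URL per page needed to collect total_wanted records."""
    if len(query.split()) > MAX_WORDS:
        raise ValueError(f"query term exceeds {MAX_WORDS} words")
    urls = []
    start = 1  # assumed to be 1-based
    while start <= total_wanted:
        size = min(MAX_RECORDS, total_wanted - start + 1)
        params = {
            "apikey": api_key,
            "querytext": query,
            "format": "json",
            "max_records": size,
            "start_record": start,
        }
        urls.append(IEEE_API + "?" + urlencode(params))
        start += size
    return urls


for u in page_urls("deep learning", "YOUR_KEY", 450):
    print(u)
```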
Coverage: Over 11 million fully accessible books and texts
How it’s accessed: Searchable by web interface, with multiple download formats for individual works; instructions for a method for bulk download here
Access restrictions: none
Limitations: No stated technical limitations; subject to terms of use
Contact for technical questions: info@archive.org
For more information: https://archive.org/details/texts
What it includes: MIT-subscribed and open access journals published by IOP
How it’s accessed: TDM access can be arranged by request; MIT users should contact textmine@mit.edu
Access restrictions: Subscribed content limited to MIT users
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it does: Not a true API, but allows computational analysis and selection of JSTOR’s scholarly journal and primary resource collections. Includes tools for faceted searching and filtering, text analysis, topic modeling, data extraction, and visualization
How it’s accessed: Web interface
Result format: CSV, varies depending on tool used
How to register: Free to access, registration is required to obtain results; no institutional affiliation required
Limitations: Datasets are capped by default at 1,000 articles; users seeking larger results are asked to contact JSTOR Data for Research
Contact for technical questions: support@jstor.org, http://about.jstor.org/contact
For more information: http://about.jstor.org/service/data-for-research
What it does: The Lens API provides programmatic access to Patent and Scholarly Works meta records. The Lens meta records of patents and scholarly works are metadata aggregated from various sources with persistent identifiers from the original data sources, and normalized with provenance maintained. All MIT affiliates have access via the Institutional User API Plans, which include the following rates and volumes for each user.
Scholarly Institutional User API Plan: 5,000 requests per month; 500 records per request; 10 requests per minute
Patent Institutional User API Plan: 5,000 requests per month; 100 records per request; 10 requests per minute
In addition, MIT Libraries has a small number of seats for high-volume API access via the Institutional Toolkit (ITK) Plan.
How it’s accessed: All MIT affiliates can access the API via the request access form.
Result format: The REST API provides programmatic access to the full corpus of Lens scholarly works and patents, with JSON output. The Lens API documentation includes details of request structure, searchable fields, code examples, and response fields. You may also use their Swagger UI for query development.
Contact for technical questions: If you have access to the API and need technical help, please create an issue on The Lens GitHub repo or use the feedback form on their support page at https://docs.api.lens.org/support.html. You may also contact support@lens.org.
For more information: Contact lib-comptool@mit.edu with questions about our subscription and the use of this tool.
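The plan limits above translate directly into how long a harvest takes and whether it fits in a month. The following sketch does that arithmetic. It assumes the first set of limits listed belongs to the Scholarly plan and the second to the Patent plan, and the `PLANS` table and function name are illustrative, not part of the Lens API.

```python
import math

# Limits as listed above; the plan-to-limits mapping is assumed from listing order.
PLANS = {
    "scholarly": {"records_per_request": 500, "requests_per_minute": 10, "requests_per_month": 5000},
    "patent": {"records_per_request": 100, "requests_per_minute": 10, "requests_per_month": 5000},
}


def plan_budget(plan, records_wanted):
    """Estimate request count, quota fit, and minimum duration for a harvest."""
    p = PLANS[plan]
    requests_needed = math.ceil(records_wanted / p["records_per_request"])
    return {
        "requests_needed": requests_needed,
        "fits_monthly_quota": requests_needed <= p["requests_per_month"],
        "min_minutes": requests_needed / p["requests_per_minute"],
        "seconds_between_requests": 60 / p["requests_per_minute"],
    }


print(plan_budget("scholarly", 120_000))
print(plan_budget("patent", 600_000))
```

Spacing requests at least `seconds_between_requests` apart keeps a script under the per-minute limit without needing to react to rejected calls.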
Coverage: news, business, and legal resources
How it’s accessed: Content may be downloaded for TDM via the web interface
Access restrictions: Access limited to MIT users
Limitations: Content may be downloaded from search results in batches up to the batch limit allowed by the platform, and must be deleted after 90 days; scripting of batch downloads is not permitted
For more information: textmine@mit.edu
What they do: Multiple APIs available to download bibliographic data and search Library of Congress digital collections, including images, public radio and television, and historic newspapers
How they’re accessed: Varies by API used, more information available here
Result format: Varies by API used
How to register: Free to use, most APIs do not require an API key
Limitations: Not specified, varies by API used
Contact for technical questions: https://labs.loc.gov/lc-for-robots/
For more information: https://labs.loc.gov/lc-for-robots/
What it does: The Linguistic Data Consortium (LDC) creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes.
How it’s accessed: All MIT users can access LDC. Access information is listed at this libguide: https://libguides.mit.edu/ldc
Result format: Most text data is released as plain text or in XML format. LDC prefers simple and shallow formatting to make it easy for an automatic process to use, ignore or remove the markup tagging.
Contact for technical questions: If you have access to the computational tool and need technical help, please contact ldc-lib@mit.edu
For more information: Contact ldc-lib@mit.edu with questions about the use of this tool.
Coverage: Corpus of encoded public domain and openly licensed musical compositions; full list available here: http://mit.edu/music21/doc/about/referenceCorpus.html
How it’s accessed: Install the music21 toolkit and access via Python
Access restrictions: None
Limitations: None
Contact for technical questions: http://groups.google.com/group/music21list
For more information: http://mit.edu/music21/
What it does: Blog tracking and indexing service; tracks Nature blogs and other third-party science blogs
How it’s accessed: RESTful interface, queries are made as HTTP GET requests
Result format: Default is JSON, some queries return Atom/RSS
How to register: Free to register, API key no longer required as of 2013
Limitations: 2 calls per second; 5,000 calls per day; RSS results are limited to 100 items maximum
Contact for technical questions: developers@nature.com
For more information: http://www.nature.com/developers/documentation/api-references/blogs-api/
What it does: Bibliographic search service for Nature content
How it’s accessed: REST API with two interfaces: 1) OpenSearch standard interface using keyword searches; 2) SRU search interface using CQL structured queries
Result format: RSS, JSON, ATOM, SRU XML, TURTLE, depending on interface used
How to register: Free to register, API key no longer required as of 2013
Limitations: Results served in pages of 25 records. Additional records can be retrieved by paging through the result set. The page size can be varied and is capped at 100 records
Contact for technical questions: developers@nature.com
For more information: http://www.nature.com/developers/documentation/api-references/opensearch-api/
What it includes: MIT-subscribed and open access publications by Nature
How it’s accessed: Content may be downloaded directly from the Nature online platform, including by automated download
Access restrictions: Limited to MIT users and research collaborators
Limitations: Download of content should not exceed 1 document per second; TDM rights are limited to non-commercial use
For more information: textmine@mit.edu
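One way to stay within the limit of 1 document per second during automated download is to throttle the loop that drives it. The generator below is a minimal sketch of that idea. The `clock` and `sleep` parameters exist only so the pacing can be tested without waiting, and `download` in the usage comment is a hypothetical function, not part of any Nature tooling.

```python
import time


def throttled(items, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than one per min_interval seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Usage sketch (download is a hypothetical function supplied by the user):
# for url in throttled(article_urls, min_interval=1.0):
#     download(url)
```

Time spent downloading counts toward the interval, so slow downloads are not delayed further.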
What it is: The NCBI developer portal contains a variety of resources, such as APIs, software libraries, and datasets, that can be downloaded and accessed for computational use.
Limitation: No limitations, but see their data use policy and usage guidelines
Contact for technical support: contact the NCBI support center
Coverage: metadata and some content from New York Times articles 1851-present
How it’s accessed: Multiple APIs are available for different uses, full list here
Access restrictions: Free to access with registration and acceptance of terms of use
Limitations: Noncommercial use only, and users must agree to terms of use; API calls limited to 1,000 calls per day, and 5 calls per second
Contact for technical questions: code@nytimes.com
For more information: http://developer.nytimes.com/
What they do: Allows programmatic access to a selection of OECD datasets
How they’re accessed: two RESTful APIs available for queries in SDMX-JSON or SDMX-ML formats
Result format: JSON, XML
How to register: No registration required
Limitations: 1 million data points; not all OECD datasets are covered
Contact for technical questions: OECDdotStat@oecd.org
For more information: https://data.oecd.org/api/
What it does: Queries and searches the ORCID researcher identifier system to obtain researcher profile data
How it’s accessed: RESTful interface
Result format: HTML, XML, or JSON
How to register: Two options: 1) Users can access the Public API, which only returns data marked as “public”; 2) Become an ORCID member to receive API credentials: see here
Limitations: Data retrieved through Public API is limited
Contact for technical questions: https://orcid.org/help/contact-us
For more information: https://orcid.org/organizations/integrators/API
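Before calling the Public API it helps to validate identifiers locally, since ORCID iDs carry a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below validates an iD and prepares, without sending, a request for its public record. The `https://pub.orcid.org/v3.0/<id>/record` URL pattern and the `Accept: application/json` header are assumptions to confirm in the ORCID documentation.

```python
import re
from urllib.request import Request

ORCID_PUBLIC_API = "https://pub.orcid.org/v3.0"  # assumed base URL and version


def orcid_check_digit(base15):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X layout and the trailing check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def build_record_request(orcid):
    """Return an unsent request for the public record, asking for JSON."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid}")
    return Request(f"{ORCID_PUBLIC_API}/{orcid}/record", headers={"Accept": "application/json"})


req = build_record_request("0000-0002-1825-0097")
print(req.full_url)
```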
Coverage: full catalog and selected corpora here
How it’s accessed: Searchable by web interface, with multiple download formats for individual works; curated corpora also available
Access restrictions: Most texts are free to use; some subject to depositor restrictions
Limitations: User must agree to user agreement
Contact for technical questions: ota@it.ox.ac.uk
For more information: http://ota.ox.ac.uk/
What it includes: MIT-subscribed and open access content published by Oxford University Press (OUP)
How it’s accessed: Content may be downloaded for TDM
Access restrictions: MIT subscribed content limited to MIT users
Limitations: Noncommercial use only
For more information: textmine@mit.edu
What it is: Every PLOS article, including all Articles and Front Matter, can be bulk downloaded, and data about PLOS articles or the articles themselves can be accessed through APIs.
How it’s accessed: Bulk download and APIs
Limitations: No stated technical limitations; content is under a CC-BY license; bulk data does not include figures or supplemental data.
Access restrictions: No restrictions. See API display policy
For technical support: Join the PLOS API developers group.
Coverage: >53,000 books, primarily in the public domain (pre-1923)
How it’s accessed: Individual works are downloadable in multiple formats from the Project Gutenberg website; bulk downloading is permitted via mirroring or wget, more information available here
Access restrictions: none
Limitations: Most works are available for use without restriction, but in-copyright works may have individual restrictions; users must agree to terms of use
Contact for technical questions: Contact information
For more information: http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages
What it includes: MIT-subscribed and open access content published by the Royal Society of Chemistry
How it’s accessed: TDM access can be arranged by request; MIT users should contact textmine@mit.edu
Access restrictions: MIT subscribed content limited to MIT users
Limitations: Noncommercial use only
For more information: textmine@mit.edu
What it includes: MIT-subscribed and open access content published by Sage
How it’s accessed: Content may be downloaded for TDM
Access restrictions: MIT subscribed content limited to MIT users
Limitations: Noncommercial use only
For more information: textmine@mit.edu
What it does: Bibliographic search service for displaying STAT!Ref results on a website.
How it’s accessed: OpenSearch specifications
Result format: RSS, ATOM, HTML
How to register: Free to register for users at a subscribing institution
Limitations: Limits exist but are not specified; high-volume users should contact STAT!Ref
Contact for technical questions: support@statref.com
For more information: http://online.statref.com/Resources/StatRefOpenSearch.aspx
What they do: Multiple APIs available for different use cases, including text mining of full-text content, search widgets, displaying journal or book level data, federated searching, and indexing
How they’re accessed: varies, depending on use case
Result format: varies, depending on use case
How to register: Free to register at https://dev.elsevier.com/. MIT users should register via institutional login
Limitations: varies, some API results are limited to MIT Libraries' article entitlements.
Contact for technical questions: integrationsupport@elsevier.com
For more information: https://dev.elsevier.com/; https://dev.elsevier.com/sd_apis.html
Coverage: Three bibliographic databases of publications in astronomy, astrophysics, physics, and all content included in the arXiv e-prints
How it’s accessed: available through ADS API, request a token
Access restrictions: none
Limitations: systematic downloading of content prohibited except through provided API, API subject to rate and scope limits; researchers seeking to regularly download and store copies of API results should contact ADS first. See terms of use.
Contact for technical questions: adshelp@cfa.harvard.edu
For more information: https://ui.adsabs.harvard.edu/about/ and the full API documentation
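As a minimal sketch of token-based access, the code below prepares (but does not send) a search request. It assumes the search endpoint, the `q`, `fl`, and `rows` parameters, and the bearer-token header as they appear in the public ADS API documentation; these should be checked against the full documentation before use. `YOUR_ADS_TOKEN` is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed search endpoint; verify against the ADS API documentation.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_request(token, query, fields=("bibcode", "title"), rows=10):
    """Prepare a search request carrying the API token; nothing is sent here."""
    params = urlencode({"q": query, "fl": ",".join(fields), "rows": rows})
    return Request(
        f"{ADS_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder token and query, for illustration only.
ads_request = build_ads_request("YOUR_ADS_TOKEN", "black holes")
```

Passing the prepared request to `urllib.request.urlopen` would perform the call, subject to the rate and scope limits noted above.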
What it includes: MIT-subscribed and open access content on the SpringerLink platform
How it’s accessed: Content may be downloaded for TDM directly from SpringerLink, and downloading may be automated for that purpose; Springer APIs may be used to identify desired content for download. Contact textmine@mit.edu for key instructions.
Access restrictions: Subscribed content limited to MIT users
Limitations: Non-commercial use only; users should adhere to the Springer TDM policy
Contact for technical questions: Eddie.Bates@springernature.com, support.api@springer.com
For more information: https://www.springer.com/gp/rights-permissions/springer-s-text-and-data-mining-policy/29056
What they do: Multiple APIs providing access to Springer metadata and open access content
How they’re accessed: RESTful interface, using structured URL requests
Result format: XML and JSON
How to register: Free to register; an API key is required and is provided after registration.
Limitations: A single query returns at most 100 results for metadata queries or 20 results for open access queries
Contact for technical questions: support.api@springer.com
For more information: https://dev.springer.com/; https://dev.springer.com/docs; https://dev.springer.com/restfuloperations
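Because a single query is capped at 100 metadata results or 20 open access results, larger result sets have to be fetched in pages. The sketch below plans the pages needed and builds one structured URL per page. The parameter names (`q`, `s`, `p`, `api_key`), the 1-based start index, and the base URL are assumptions made for illustration; the actual names and endpoints are given in the Springer API documentation.

```python
from urllib.parse import urlencode

# Page-size caps taken from the limits stated above.
MAX_PAGE_SIZE = {"metadata": 100, "openaccess": 20}


def plan_pages(total_wanted, kind="metadata", first_index=1):
    """Yield (start, size) pairs that cover total_wanted records within the cap."""
    cap = MAX_PAGE_SIZE[kind]
    fetched = 0
    while fetched < total_wanted:
        size = min(cap, total_wanted - fetched)
        yield first_index + fetched, size
        fetched += size


def build_query_url(base_url, api_key, query, start, size):
    """Build one paged request URL (parameter names are assumptions)."""
    params = {"q": query, "s": start, "p": size, "api_key": api_key}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL and key, for illustration only.
page_urls = [
    build_query_url("https://api.example.org/metadata/json", "YOUR_KEY", "genomics", s, p)
    for s, p in plan_pages(250, "metadata")
]
```

Requesting 250 metadata records this way produces three URLs, two full pages of 100 and a final page of 50.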
What it includes: MIT-subscribed and open access journals published by Taylor & Francis
How it’s accessed: TDM access can be arranged by request; MIT users should contact textmine@mit.edu
Access restrictions: Subscribed content limited to MIT users
Limitations: Some rate limits and restrictions may apply
For more information: textmine@mit.edu
What it does: ProQuest’s TDM Studio is a text and data mining solution for research, teaching and learning and allows MIT users to analyze the content from eligible MIT Libraries ProQuest subscriptions. Please check the eligible content file for details and the date last updated. If you need a more recent version, please contact lib-comptool@mit.edu
How it’s accessed: All MIT-affiliated users can access TDM Studio via the request access form.
Result format: Dataset files of metadata and derived data can be downloaded in CSV format. Full-text content export is not allowed.
Contact for technical questions: If you have access to TDM Studio and need technical help, please contact TDMStudio@clarivate.com.
For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.
What they do: Allow access to data on International Merchandise Trade Statistics (IMTS) and the work of the International Merchandise Trade Statistics Section (IMTSS) of the United Nations Statistics Division
How they’re accessed: Some services in REST, some in SOAP
Result format: XML or CSV, depending on service
How to register: Comtrade Web Services require IP authentication, and users must have a site license account; however, access to metadata and data availability information is not restricted
Limitations: Depending on access rights, the following data can be obtained: Comtrade Data, Tariff Line Data, Total Trade, Annual Totals, Processed Data or Original Data. The last three are restricted to data exchange between the UN and OECD.
Contact for technical questions: comtrade@un.org
For more information: https://comtrade.un.org/ws/
What it includes: Wall Street Journal content 1889-1934
How it’s accessed: Downloadable in XML format from Dataverse
Access restrictions: Limited to MIT users
Limitations: None
Contact for technical questions: dvn_support@help.hmdc.harvard.edu; Questions can also be posted in https://groups.google.com/forum/#!forum/dataverse-community
For more information: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XAUHMH
What it does: Allows text- and data-mining access to content in Web of Science Lite
How it’s accessed: Accessible via Clarivate’s Developers Portal
Result format: JSON or XML
How to register: Must be part of a subscribing institution to have full text access. MIT users must set up an individual account at Clarivate’s Developers Portal https://developer.clarivate.com and fill out a form with their name, email address, and an optional description of the project.
Limitations: Maximum number of tokens per user: 1. Maximum number of requests per second: 2.
Users may use the API to access the Data Fields in accordance with the applicable License Level, in each case as permitted by their subscription.
Users who include Web of Science data in an article or presentation must appropriately cite and credit Clarivate Analytics as the source.
Contact for technical questions: Contact support link here
For more information: https://developer.clarivate.com/
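One simple way to stay within the limit of two requests per second is to enforce a minimum gap between calls on the client side. The sketch below is a generic throttle, not part of any Clarivate library; the class name and its parameters are illustrative.

```python
import time


class Throttle:
    """Enforce a minimum gap of 1 / max_per_second seconds between calls."""

    def __init__(self, max_per_second=2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` immediately before each API request spaces requests at least half a second apart when `max_per_second` is 2. The `clock` and `sleep` arguments exist so the behaviour can be checked without real delays.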
What it does: The Web of Science™ API Expanded supports rich searching across the Web of Science to retrieve full item-level metadata from an expanded list of fields, including times-cited counts, contributor addresses/affiliations, and funding data. Additional operations support discovery of related records as well as cited references and citing items. The API offers full customization and flexibility for researchers who want to build more sophisticated queries, but requires some technical skill and coding ability to get up and running. MIT Libraries’ subscription includes 1 million record downloads per year for the MIT community.
How it’s accessed: Access is provided through an API key and the WoS Developer Portal (registration required). Because the MIT community has a limited number of downloads per year, requests for access are reviewed. Please use this form to request access.
Result format: JSON or XML, via REST-based programmatic access to Web of Science™ documents
Contact for technical questions: If you have access to the API and need technical help, please contact clarivate.customersupport@clarivate.com or use the contact form.
For more information: Contact lib-comptool@mit.edu with questions about the use of this tool.
What it does: Allows text- and data-mining access to content in the Wiley Online Library
How it’s accessed: Accessible via CrossRef’s TDM service; RESTful interface
Result format: JSON
How to register: Must be part of a subscribing institution to have full text access. Users will encounter a click-through agreement and will receive a Client API Token, which is needed when requesting full text of articles
Limitations: Rate limits are implemented through CrossRef rate-limiting headers; exact limits are not specified
Contact for technical questions: TDM@wiley.com; labs@crossref.org for support using the CrossRef TDM service
For more information: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining
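Since the exact limits are only communicated through response headers, a client can read those headers after each response and pause when the allowance is used up. The sketch below shows the idea. The header names and the assumption that the reset value is a UTC epoch time in seconds are illustrative guesses, not confirmed by this listing; check them against the current Wiley and CrossRef documentation and pass different names if needed.

```python
def seconds_to_wait(
    headers,
    now,
    remaining_header="CR-TDM-Rate-Limit-Remaining",  # assumed header name
    reset_header="CR-TDM-Rate-Limit-Reset",  # assumed header name
):
    """Return how long to pause before the next request, based on response headers.

    Assumes the reset header holds an epoch time in seconds (an assumption).
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = lowered.get(remaining_header.lower())
    reset = lowered.get(reset_header.lower())
    if remaining is None or reset is None:
        return 0.0  # headers absent: nothing to base a pause on
    if int(remaining) > 0:
        return 0.0  # allowance left in the current window
    return max(0.0, float(reset) - now)
```

In use, `now` would typically be `time.time()` and `headers` the header mapping of the most recent response.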
What they do: Provide access to World Bank statistical databases, indicators, and projects, as well as loans, credits, financial statements, and other data related to financial operations
How they’re accessed: Three RESTful APIs available to provide access to different datasets: Indicators (time series data), Projects (data on the World Bank’s operations), Finances (World Bank financial data)
Result format: XML, JSON, RDF, and Atom, depending on specific API used
How to register: Free to use, no registration or API key required
Limitations: Request volume limits are unspecified, but request volume should be “reasonable”
Contact for technical questions: data@worldbank.org or “Contact support” link here
For more information: https://datahelpdesk.worldbank.org/knowledgebase/topics/125589
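Because no key or registration is needed, a request to the Indicators API is just a structured URL. The sketch below builds one; the URL pattern and the `format`, `per_page`, and `date` parameters follow the Indicators API documentation as understood here and should be verified there. The country and indicator codes in the example are illustrative.

```python
from urllib.parse import urlencode

# Assumed base URL for version 2 of the API; verify in the documentation.
WORLD_BANK_BASE = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year=None, end_year=None, per_page=50):
    """Build an Indicators API URL for one country and one time series."""
    params = {"format": "json", "per_page": per_page}
    if start_year is not None and end_year is not None:
        params["date"] = f"{start_year}:{end_year}"  # year range, e.g. 2000:2010
    return f"{WORLD_BANK_BASE}/country/{country}/indicator/{indicator}?{urlencode(params)}"


# Example: total population for Brazil, 2000 to 2010 (codes are illustrative).
population_url = indicator_url("BR", "SP.POP.TOTL", 2000, 2010)
```

The URL can be opened in a browser or fetched with any HTTP client; changing `format` selects one of the other result formats where the specific API supports it.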
Coverage: Primary source materials from many cultures and countries, representing over 100 different languages
How it’s accessed: Multiple access methods supported, see: http://api.wdl.org/
Access restrictions: None
Limitations: None
For more information: http://api.wdl.org/; https://labs.loc.gov/lc-for-robots/