Benefits and Guidelines for Research Dataset Contributions to the Harvard Dataverse Network (DVN), developed and supported by the Harvard-MIT Data Center/IQSS
The Harvard Dataverse Network is available to MIT researchers to store and share final versions of the data that they create or compile. The repository accepts data in all disciplines and formats, although it is primarily oriented toward quantitative data. Any researcher or group can create a section for their data, called a “Dataverse”, which houses their datasets.
- Centralized professional storage with distributed control and recognition
- One’s Dataverse can be branded (e.g., to have the look of your web site)
- Provides formal citations to data with a persistent identifier (DOI)
- Versioning: can save and provide stable access to old versions of a study
- Study templates: can create templates for study descriptions in one’s collections
- The study description (based on the DDI metadata standard) has a field for linking data to related publications
- Users can perform online subsetting and statistical analysis of data sets in supported formats
- Roles can be assigned to participants in a deposit workflow, giving them varying levels of oversight of the collections
- Depositors are responsible for ensuring that their data complies with standards for intellectual property rights and safeguarding of research subject confidentiality (e.g., only submitting de-identified data, without direct or indirect identifiers)
- Depositors are responsible for adequate description of the dataset so that other researchers can find, understand and use the data. This can be accomplished through the study description and the uploading of additional documentation that fully describes the methodology of data collection and the contents of the data file(s).
- Policies for access can be set at the Dataverse, dataset, or file level
- One can enable data access to the world, MIT, and/or particular named individuals (with a Dataverse account)
- Full support: For data files deposited in certain tabular formats (Stata, R, SPSS, CSV + control card, or tab-delimited + DDI metadata), the system provides the following features:
- Converts to a preservation format (primary data and variable metadata)
- Calculates a universal numerical fingerprint (UNF)
- Enables downloading in multiple additional formats (tab-delimited, Splus, Stata, and R)
- Users can download a customized subset of the data, generate summary statistics online, and apply Zelig (R) statistical methods
- Users can search variable metadata
- All other formats are accepted and can be downloaded in the original format with no additional features
- The system accepts files of up to 2 GB each; if you need to upload individual files larger than 2 GB, contact the DVN support team at firstname.lastname@example.org to discuss whether larger files may be feasible for your project.
- At the moment, there is no limit to the number of files the system can accept (policy subject to change).
- Online storage and backup are provided for all formats
- As a function of the DVN software, automatic media migration is performed only on files in the full support formats listed above, reformatting materials as necessary to avoid format obsolescence.
- See also the full DVN Data Backup & Preservation Terms.
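Before depositing, it can help to check which files are likely to receive full support and which exceed the per-file size limit. The following Python sketch is one way to do that locally; it is not part of the DVN software. The extension list is a hypothetical mapping chosen for illustration (the guidelines name formats, not file extensions), "2 GB" is assumed to mean 2 * 1024**3 bytes, and the function names are invented for this example.

```python
from pathlib import Path

# Assumption: "2 GB" is read as 2 * 1024**3 bytes; the guidelines do not say
# whether the limit is binary or decimal.
MAX_FILE_BYTES = 2 * 1024**3

# Assumption: a hypothetical mapping from common file extensions to the
# full support formats (Stata, SPSS, R, CSV, tab-delimited). The DVN system
# itself decides what it recognizes; CSV also needs a control card and
# tab-delimited files need DDI metadata, so a match is only a candidate.
FULL_SUPPORT_EXTENSIONS = {".dta", ".sav", ".por", ".rdata", ".csv", ".tab"}


def check_file(name: str, size_bytes: int) -> dict:
    """Report whether a file is a full support candidate and whether it is too large."""
    extension = Path(name).suffix.lower()
    return {
        "name": name,
        "full_support_candidate": extension in FULL_SUPPORT_EXTENSIONS,
        "over_size_limit": size_bytes > MAX_FILE_BYTES,
    }


def check_directory(directory: str) -> list:
    """Run check_file on every regular file directly inside a directory."""
    return [
        check_file(path.name, path.stat().st_size)
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    ]
```

Calling `check_directory("my_study")` on a local folder (a placeholder name) returns one report per file. Any file flagged as over the limit would need to be discussed with the DVN support team before upload, and files that are not full support candidates would still be accepted but offered for download only in their original format.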
To get started: