
Tools for Understanding Digital Files
Tags: archives, digital, preservation, tools

Guide to resources and archival concepts discussed during IAPril presentation by Kari Smith, Digital Archivist, MIT Institute Archives and Special Collections. 27 April 2012.
Last Updated: Jul 25, 2013

About the IAPril Presentation

On April 27, 2012, Kari Smith, Digital Archivist at the MIT Institute Archives, gave the presentation
"Is it what it is? Tools for Understanding your Digital Files."

During this session, Smith introduced the audience to a variety of software tools used in the Institute Archives to understand digital files being added to the archival collections. These tools support long-term access to digital material by helping us know what the digital files are when we receive them, detect any changes over time, and determine how to make them available in the future.

She briefly discussed how these tools fit into the workflows for digital content that are being developed for use in the Institute Archives and Special Collections department of the MIT Libraries.

Categories of tools that were reviewed include:  file format characterization, fixity, packaging, metadata extraction, conversion / normalization, disk imaging, and metadata embedding.
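
To make the first of these categories concrete: format identification tools commonly work by matching the leading bytes of a file against a table of known signatures ("magic numbers"). The following is a minimal sketch of that idea only. The signature table, the labels, and the function names `identify_bytes` and `identify_file` are assumptions made for illustration, and this is not the signature set or the method of any tool listed in this guide.

```python
# Hypothetical signature table: a few well-known leading byte sequences.
# Real identification tools use far larger, centrally maintained registries.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX, XLSX, ODT)"),
]


def identify_bytes(header: bytes) -> str:
    """Return the label of the first signature that the header starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify them by signature."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

For example, `identify_bytes(b"%PDF-1.4")` returns `"PDF document"`, regardless of what the file extension claims. A file whose leading bytes match nothing in the table is reported as `"unknown"` rather than guessed at.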


Archives Concepts

Recommended Reading:  "The Archivist's Perspective: Knowledge and Values," Chapter 3 from Understanding Archives & Manuscripts, James M. O'Toole & Richard J. Cox, 2006. 

Available for course reading at:

Tools Demonstrated or Discussed

  • TeraCopy
    TeraCopy is designed to copy and move files at the maximum possible speed. It skips bad files during the copying process, and then displays them at the end of the transfer so that you can see which ones need attention. TeraCopy can automatically check the copied files for errors by calculating their CRC checksum values.
  • ClamAV
  • FTK Imager (Forensics Toolkit)
    Download available at:
  • Karen's Directory Printer
  • DROID (Digital Record Object Identification)
    DROID (Digital Record Object Identification) is a software tool developed by The National Archives to perform automated batch identification of file formats. Developed by its Digital Preservation Department as part of its broader digital preservation activities, DROID is designed to meet the fundamental requirement of any digital repository to be able to identify the precise format of all stored digital objects, and to link that identification to a central registry of technical information about that format and its dependencies.
  • Manifest Maker
    Manifest Maker is free and open source software developed by the National Archives of Australia. Manifest Maker produces a tab-separated file that comprises:
    media identifier; path and filename of each data object in the transfer; checksum of each data object; checksum algorithm used; item number to which the data object belongs.
  • Curator's Workbench (for Archives)
    Developed at UNC Chapel Hill. Its features fall into three loosely overlapping categories: capture, rearrangement, and description of digital files.
  • Data Accessioner (Duke University)
    The Duke DataAccessioner was built to give technical services staff a simple graphical interface for migrating data off disks and onto a file server for basic preservation, further appraisal, arrangement, and description. It also provides a way to integrate common metadata tools at the time of migration rather than after the fact.
  • ExifTool
    ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. [MAC & PC]
  • Metadata Extraction Tool
    The Metadata Extraction Tool was developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats such as PDF documents, image files, sound files, Microsoft Office documents, and many others. (WIN & UNIX)
  • JHOVE
    JSTOR and the Harvard University Library are collaborating on a project to develop an extensible framework for format validation: JHOVE (pronounced "jove"), the JSTOR/Harvard Object Validation Environment. JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • NARA File Analyzer
    Evaluates an entire file directory, performing filename validation, file size statistical analysis, checksum calculation, and file type extraction, and generates a summary report of the analysis results for each file.
  • NARA Video Frame Analyzer
    Automates and standardizes the quality control checks on digitized video files and analyzes the video frame-level metadata generated by the digitization process.
  • XENA (XML Electronic Normalising for Archives)
    Xena is free and open source software developed by the National Archives of Australia to aid in the long-term preservation of digital records. Xena is an acronym meaning XML Electronic Normalising for Archives. Xena software aids digital preservation by performing two important tasks: 1. detecting the file formats of digital objects, and 2. converting digital objects into open formats for preservation.
  • Adobe Bridge
  • QuickView Plus
    Viewing tool from AvantStar that lets you access information in over 300 Windows, Macintosh, Internet, and DOS formats from virtually any source — e-mail attachments, the Web, file servers, and more. *requires license purchase
  • Archivematica
    Archivematica is a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects. Developed by Artefactual.

    Archivematica uses a micro-services design pattern to provide an integrated suite of software tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model. Users monitor and control the micro-services via a web-based dashboard. Archivematica uses METS, PREMIS, Dublin Core and other best practice metadata standards. Archivematica implements media type preservation plans based on an analysis of the significant characteristics of file formats.
  • Digital Preservation Software Platform
    The Digital Preservation Software Platform (DPSP) is free and open source software developed by the National Archives of Australia. The DPSP is a collection of software applications which support the goal of digital preservation. The DPSP comprises:
    Xena - Xena stands for XML Electronic Normalising for Archives. Xena converts digital files to standards based, open formats.
    Digital Preservation Recorder (DPR) - DPR handles bulk preservation of digital files via an automated workflow.
    Checksum Checker - Checksum Checker is a piece of software that is used to monitor the contents of a digital archive for data loss or corruption.
    Manifest Maker - Manifest Maker produces a tab-separated list of digital files in a specified location. The manifest includes the checksum, path and filename of each digital file.
  • UUID (Wikipedia article)
    A universally unique identifier (UUID) is an identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE).

    The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. In this context the word unique should be taken to mean "practically unique" rather than "guaranteed unique". Since the identifiers have a finite size it is possible for two differing items to share the same identifier. The identifier size and generation process need to be selected so as to make this sufficiently improbable in practice. Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never be unintentionally created by anyone to identify something else. Information labeled with UUIDs can therefore be later combined into a single database without needing to resolve identifier (ID) conflicts.
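
Several of the entries above (Manifest Maker, Checksum Checker, UUID) describe pieces that combine naturally: a tab-separated manifest of checksums, a later re-check for loss or corruption, and a practically unique identifier for the transfer. The sketch below illustrates that combination under stated assumptions. It is not the code or the exact output format of any tool listed here; the function names, the choice of SHA-256, the use of a random UUID as the media identifier, and the column order are all hypothetical, and the item number column mentioned for Manifest Maker is omitted.

```python
import csv
import hashlib
import uuid
from pathlib import Path

ALGORITHM = "sha256"  # assumption: any algorithm supported by hashlib would do


def file_checksum(path, algorithm=ALGORITHM, chunk_size=65536):
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, media_id=None):
    """Return rows of (media identifier, relative path, checksum, algorithm)."""
    root = Path(root)
    # Assumption: a random UUID stands in for the media identifier.
    media_id = media_id or str(uuid.uuid4())
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel_path = path.relative_to(root).as_posix()
        rows.append((media_id, rel_path, file_checksum(path), ALGORITHM))
    return rows


def write_manifest(rows, out_path):
    """Write the rows as a tab-separated file, one data object per line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def changed_files(root, rows):
    """Return relative paths that are missing or whose checksum no longer matches."""
    root = Path(root)
    changed = []
    for _media_id, rel_path, checksum, algorithm in rows:
        target = root / rel_path
        if not target.is_file() or file_checksum(target, algorithm) != checksum:
            changed.append(rel_path)
    return changed
```

A typical use would be to call `build_manifest` when a transfer is received, store the result with `write_manifest` somewhere outside the transfer directory (so the manifest does not list itself on a later run), and call `changed_files` periodically. An empty list means every file still matches its recorded checksum.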

About the Presenter

Kari Smith
Contact Info
Institute Archives & Special Collections
Bldg 14N-118
Tel: 617-253-5690

Text licensed under Creative Commons, unless otherwise noted. All other media: all rights reserved unless otherwise noted.