Digitization: Technical standards

(Based on UTL Digitization Services Digitization Standards & Specifications, 02 February 2017; credit: @Anna Lamphear)

UT Libraries Digitization Services adheres to international standards and best practices for digitization, including those of the Federal Agencies Digital Guidelines Initiative (FADGI), and produces a range of high-quality archival and access derivatives for our faculty, students, staff, visiting researchers and the public.

Visit http://www.digitizationguidelines.gov/ for information about the FADGI guidelines.

Digitization products

Digitally reformatting source media typically results in at least three types of product file: an Archival file, a Production file (Production copy), and Derivative files (access copies). Depending on the type of source media, other derivative files can be produced.

Archival file

also Reference image, Preservation file; see FADGI glossary page: http://www.digitizationguidelines.gov/term.php?term=archivalmasterfile

Typically represents the "original" product of the digitization process, with minimal editing or conversion: no cropping, no color correction, no down-sampling, no stitching.

Archival files are vaulted to UTL's tape archive for long-term preservation.

Archival files use file formats that are not expected to become obsolete in the long term.

Per FADGI guidelines, the recommended file formats for images are TIFF or JPEG 2000 (both with lossless compression). Storing unadjusted or camera RAW images is not recommended since the raw/unadjusted image data is often a less accurate visual representation than output adjusted or optimized for representation accuracy (FADGI Technical Guidelines for Digitizing Cultural Heritage Materials, 2016, p. 63).
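To illustrate how the lossless requirement could be checked, here is a minimal sketch that uses only the Python standard library to read the Compression tag (259) from the first image directory of a classic TIFF file. The function names and the grouping of tag values into lossless and lossy are assumptions made for illustration; they are not part of the FADGI guidelines or of UTL's tooling. JPEG 2000 and BigTIFF files are not handled.

```python
import struct

# Compression tag (259) values from TIFF 6.0 and the common Deflate extension.
LOSSLESS = {1: "uncompressed", 5: "LZW", 8: "Deflate", 32773: "PackBits"}
# JPEG-in-TIFF is almost always lossy, so it is treated as lossy here (assumption).
LOSSY = {6: "JPEG (old-style)", 7: "JPEG"}


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag value of the first IFD in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, _type, _count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT value is stored left-justified in the value field.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return value
    return 1  # TIFF 6.0 default when the tag is absent: no compression


def is_lossless_tiff(data: bytes) -> bool:
    """True only for known lossless values; unknown values are not trusted."""
    return tiff_compression(data) in LOSSLESS
```

A caller would typically pass the first few kilobytes of a file, or the whole file for small images, for example `is_lossless_tiff(open(path, "rb").read())`. The first image directory can sit anywhere in the file, so a truncated read may miss it.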

Archival files are not generally available through the DAMS; instead, the Production copy is ingested into the DAMS as the OBJ datastream. In some cases, the Archival file and the Production copy can be identical. If there is a specific use case for storing Archival files in the DAMS, they can be added as an additional datastream (ARCHIVAL_FILE).
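The datastream assignment described here can be sketched as a small mapping. This is a hypothetical helper, not part of the DAMS: the function name `plan_datastreams` and its parameters are assumptions, and only the datastream IDs OBJ and ARCHIVAL_FILE come from the text above.

```python
from pathlib import Path
from typing import Dict, Optional


def plan_datastreams(
    production: Path,
    archival: Optional[Path] = None,
    keep_archival_in_dams: bool = False,
) -> Dict[str, Path]:
    """Map datastream IDs to the files that would be ingested."""
    # The Production copy always becomes the OBJ datastream.
    streams = {"OBJ": production}
    # The Archival file is added only when a specific use case calls for it.
    if keep_archival_in_dams and archival is not None:
        streams["ARCHIVAL_FILE"] = archival
    return streams
```

By default, only the Production copy is planned for ingest. The Archival file appears in the result only when the caller opts in explicitly.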

Production file

also Publication copy, Mezzanine file; see FADGI glossary page: http://www.digitizationguidelines.gov/term.php?term=productionmasterfile

Digital copy of content after initial processing that typically does not result in significant loss of quality vs. the Archival file. In some cases, the creation of a Production file will be a necessary step to achieve a valid digital representation of content, e.g. by stitching image segments of an oversized physical original. Other examples of processes resulting in a Production file include down-sampling, color correction, cropping or mild compression (e.g., video material).

Edits are optional. Depending on the requirements of the individual repository, Production files can be identical to the Archival files; in that case, no separate copy is necessary.
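One way to decide whether a separate Production copy is needed is to compare checksums of the two files. The sketch below assumes SHA-256 and uses hypothetical function names; the source does not prescribe a checksum algorithm or a particular workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_separate_production_copy(archival: Path, production: Path) -> bool:
    """True when the Production file differs byte-for-byte from the Archival file."""
    return sha256_of(archival) != sha256_of(production)
```

If the function returns False, the two files are byte-identical and a single stored copy can serve both roles.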

Production files are vaulted to UTL's tape archive for long-term preservation.

Recommended file formats for images are uncompressed TIFF, or TIFF or JPEG 2000 with lossless compression.

Usually, the Production file is used for ingesting into the DAMS.

Derivative file

also Access copy, Access derivative; see FADGI glossary page: http://www.digitizationguidelines.gov/term.php?term=derivativefile

Any file derived from the Production file through further editing, transformation, or content analysis and extraction. Examples are lower-resolution, compressed representations of content (JPEG images, MP3 audio files), OCR-extracted text (e.g., hOCR, ALTO files), XML documents describing the structure and content (e.g., METS, TEI), and PDF documents. Also includes transcripts/captions (e.g., as text documents or WebVTT files).

Derivative files are vaulted to UTL's tape archive only if the derivative process cannot be reproduced automatically or deterministically. Lower-resolution derivative image files can easily be recreated from a Production copy and do not need to be vaulted to tape. Uncorrected OCR results can also be recreated from images, potentially at better quality as time passes. Manually corrected OCR text is the result of intellectual labor and should be preserved. Similarly, METS and TEI representations of documents typically involve manual intervention and should be kept. PDF documents can be regenerated if they are created through an automatic process from images, METS XML and OCR results.
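The vaulting rule above reduces to a single question: can the derivative be regenerated automatically from files that are already preserved? A minimal sketch follows; the `Derivative` class and `should_vault` function are hypothetical names, and the example entries simply restate the cases from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivative:
    name: str
    # True if the file can be regenerated automatically from preserved files.
    automatically_reproducible: bool


def should_vault(derivative: Derivative) -> bool:
    """Vault only derivatives that cannot be recreated automatically."""
    return not derivative.automatically_reproducible


EXAMPLES = [
    Derivative("lower-resolution JPEG", True),
    Derivative("uncorrected OCR text", True),
    Derivative("manually corrected OCR text", False),
    Derivative("METS/TEI with manual intervention", False),
    Derivative("PDF generated automatically from images, METS and OCR", True),
]

to_vault = [d.name for d in EXAMPLES if should_vault(d)]
```

With these example entries, `to_vault` contains only the manually corrected OCR text and the METS/TEI documents.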

Some of the Derivative files are generated automatically upon ingest into the DAMS, e.g., lower-resolution JPEG images, thumbnail images and MP3 audio files. Other derivatives can be created outside of the DAMS and added to the digital representation of an asset as an additional datastream.

Ancillary files

Includes, for instance, carrier photographs or other documentation about the analog carrier(s) that is necessary or useful for using the material.

Attachments

digitization_specifications-2019-11-13.docx (Microsoft Word Document), modified Mar 08, 2022 by MM Hanke

digitization_specifications-2019-11-13.pdf (PDF File), modified Mar 08, 2022 by MM Hanke

digitization_specifications-2017-02-02.docx (Microsoft Word Document), modified Mar 08, 2022 by MM Hanke