Content models

To store, manage and disseminate digitized physical assets appropriately, the DAMS uses content models for different classes of media, such as image content, multi-page image content (books and serial issues), audio content and video content. Depending on the content model, the DAMS may require different metadata and shows different web forms for entering and editing metadata in the DAMS user interface. Some automatic processes that run when a digital asset is ingested into the DAMS, e.g. the creation of derivative media files or full text, also depend on the content model.

The following sections list the content models currently available in the DAMS, together with the file types the DAMS supports for each content model. With the exception of the Binary content model, the DAMS can create derivatives for the supported file types and provides appropriate viewers within the DAMS user interface.

AUDIO Content model

The following list contains the allowed file extensions that are accepted during a manual ingest of audio files into the DAMS. The file extension can be indicative of a coding format (e.g. MP3) and/or a container format (e.g. MP4). Irrespective of the container format used, audio data can be encoded in different ways, using different coding formats/encoders/codecs and encoder settings. Mozilla provides a good introduction to commonly used media container formats and encoders/decoders: https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Containers.

Because of the large number of coding formats for digital audio, it is highly recommended to consult with Digitization Services before you ingest audio files. Transcoding audio files may be advisable to increase compatibility.

AUDIO TRANSCRIPT REQUIRED

Audio content that is intended for publication MUST be accompanied by a textual representation (transcript).

Audio content published to the Collections Portal MUST meet web accessibility guidelines at the time of publication, as mandated by UT policy, state and federal law. Not complying with accessibility requirements can create a significant legal risk for UT Libraries. See https://www.w3.org/WAI/media/av/transcripts/ for information about audio transcripts as a means to meet web accessibility standards. See A/V Media: Create Captions and Transcripts for a guide to creating audio transcripts.

In the UTL DAMS, transcripts MUST be plain text and will be stored in a datastream labelled TRANSCRIPT. If the audio content intended for publishing does not contain language, an empty transcript might be required.
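The transcript requirement can be checked before publication with a few lines of code. The following is a minimal sketch, not part of the DAMS: the function name and the dictionary of datastream labels to raw bytes are hypothetical, and UTF-8 is an assumed encoding for "plain text" that this page does not specify.

```python
def audio_publication_problems(datastreams: dict[str, bytes]) -> list[str]:
    """Return the problems that would block publication (empty list means none found).

    `datastreams` maps datastream labels (e.g. "OBJ", "TRANSCRIPT") to raw bytes.
    This structure is hypothetical and only serves the illustration.
    """
    problems = []
    transcript = datastreams.get("TRANSCRIPT")
    if transcript is None:
        problems.append("missing TRANSCRIPT datastream")
        return problems
    # An empty transcript is accepted: it may be required for audio without language.
    try:
        transcript.decode("utf-8")  # assumption: plain text means UTF-8 text
    except UnicodeDecodeError:
        problems.append("TRANSCRIPT is not UTF-8 plain text")
    return problems
```

An object that has only an OBJ datastream is reported as missing its transcript, while an object with an empty TRANSCRIPT datastream passes the check.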

Currently, the DAMS does not ingest audio files other than WAV as intended.

  • Manual ingest
    • M4A files are stored with an incorrect MIME type (application/octet-stream), no proxy MP3 file is generated and no preview is available
    • MP4 files are stored with an incorrect MIME type (video/mp4), no proxy MP3 file is generated and no preview is available
    • WMA files: no proxy MP3 file is generated and no preview is available
    • AU files: no proxy MP3 file is generated and no preview is available
  • Batch ingest
    • M4A files: no proxy MP3 file is generated, DAMS media player plays OBJ datastream
    • MP4 files are stored with an incorrect MIME type (video/mp4), an MP4 proxy file with type video/mp4 is generated and used for preview, the thumbnail/file icon will incorrectly indicate video content
    • WMA files are stored with an incorrect MIME type (video/mp4), an MP4 proxy file with type video/mp4 is generated and used for preview, the thumbnail/file icon will incorrectly indicate video content
    • AU files: no proxy MP3 file is generated and no preview is available

Please contact the DAMS Management team (click here to submit a DAMS service request) or Digitization Services if you want to ingest audio content in one of the formats mentioned above.
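The known issues listed above can be turned into a simple lookup for pre-ingest checks. This is only a sketch: the table restates the list above, and the names KNOWN_AUDIO_INGEST_ISSUES and known_issue are hypothetical. A result of None means only that no issue is listed in this section, not that the ingest is guaranteed to work.

```python
from pathlib import PurePath

# (ingest method, file extension) -> issue, restated from the list above.
KNOWN_AUDIO_INGEST_ISSUES = {
    ("manual", "m4a"): "stored as application/octet-stream; no proxy MP3; no preview",
    ("manual", "mp4"): "stored as video/mp4; no proxy MP3; no preview",
    ("manual", "wma"): "no proxy MP3; no preview",
    ("manual", "au"): "no proxy MP3; no preview",
    ("batch", "m4a"): "no proxy MP3; media player plays OBJ datastream",
    ("batch", "mp4"): "stored as video/mp4; video/mp4 proxy used for preview; icon indicates video",
    ("batch", "wma"): "stored as video/mp4; video/mp4 proxy used for preview; icon indicates video",
    ("batch", "au"): "no proxy MP3; no preview",
}


def known_issue(ingest_method: str, filename: str):
    """Return the listed issue for this ingest method and file, or None if none is listed."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return KNOWN_AUDIO_INGEST_ISSUES.get((ingest_method, extension))
```

For example, known_issue("manual", "Interview.M4A") returns the entry for manually ingested M4A files, because the extension is compared case-insensitively.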

  • m4a
  • mp3
  • wav
  • mp4
  • au
  • wma

Upon ingest, the DAMS will automatically create a compressed MP3 audio file (datastream PROXY_MP3) that is used for previewing the audio content in the DAMS and for publishing to the Collections Portal. MP3 audio files published to the Collections Portal are playable in the AV player app through progressive download. End users will be able to download the entire published MP3 audio file with comparatively little effort. If you need to restrict downloading of the entire MP3 audio file, for instance for contractual reasons, you must create an MP4 derivative file from your audio data and ingest it into a datastream labeled PROXY_MP4. MP4 derivative files are made available to the Collections Portal AV player as streaming audio. The current streaming media service does not allow for encryption or other methods of digital rights management, and it is the responsibility of each collection's curator to ensure that the dissemination methods available for content comply with legal or contractual obligations.
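One possible way to create such an MP4 derivative is with the ffmpeg command-line tool. The sketch below only builds the command. The choice of ffmpeg, the AAC encoder and the 128k bitrate are assumptions made for illustration and are not settings prescribed by the DAMS, so consult Digitization Services for recommended encoder settings. The function name is hypothetical.

```python
from pathlib import Path


def build_mp4_audio_command(source: str, bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg command that writes an audio-only MP4 next to the source file."""
    target = Path(source).with_suffix(".mp4")
    if target == Path(source):
        raise ValueError("source is already an .mp4 file; choose a different input")
    return [
        "ffmpeg",
        "-i", source,     # input file
        "-vn",            # drop any video stream
        "-c:a", "aac",    # assumed audio encoder
        "-b:a", bitrate,  # assumed audio bitrate
        str(target),
    ]
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is installed. The resulting file would then be ingested into the PROXY_MP4 datastream as described above.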

BASIC IMAGE Content model

The 'basic image' content model is being phased out, because assets using this content model cannot be published to the Collections Portal. Please contact the DAMS managers if you plan on ingesting primary object files with one of the file types listed below (click here to submit a DAMS service request).

  • jpg
  • png
  • gif

LARGE IMAGE Content model

If adding map content, check the advice given on the following page: Adding map content

This content model is NOT suited for image assets that have been scanned from both sides (recto/verso). If you want to store and publish recto and verso scans of a single sheet, use the BOOK content model.

The DAMS currently is not able to perform OCR on assets ingested with the LARGE IMAGE content model. Consult with the DAMS management team if you want to store OCR results with an image file using this content model.

This is the most commonly used content model for scanned images and text, e.g. maps, drawings and photographs. It allows exactly one image to be ingested into the OBJ datastream. Upon ingest, the DAMS will create a JPEG 2000 derivative image with lossless compression and store it in a datastream labeled JP2. The JP2 datastream is used for publishing assets to the Collections Portal and other endpoints.

VIDEO Content model

The following list contains the allowed file extensions that are accepted during a manual ingest of video files into the DAMS. The file extensions are typically indicative of container formats, which bundle video and audio data streams into a file. Irrespective of the container format used, audio and video data streams can be encoded in many different ways, using different coding formats/encoders/codecs and encoder settings. Mozilla provides a good introduction to commonly used media container formats and encoders/decoders: https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Containers.

Because video content can be encoded into digital formats in a vast number of ways, it is highly recommended to consult with Digitization Services before you ingest video files. Transcoding video files may be advisable to increase compatibility.

VIDEO CAPTIONS/TRANSCRIPT REQUIRED

Video content with an audio track containing spoken language MUST be accompanied by a time-coded textual representation (captions) if it is intended for publication. If the video contains visual information necessary to understand the content, a transcript describing visual information is REQUIRED.

Video content published to the Collections Portal MUST meet web accessibility guidelines at the time of publication, as mandated by UT policy, state and federal law. Not complying with accessibility requirements can create a significant legal risk for UT Libraries. See https://www.w3.org/WAI/media/av/captions/ for information about captions/subtitles as a means to meet web accessibility standards. See A/V Media: Create Captions and Transcripts for a guide to creating captions.

In the UTL DAMS, captions MUST be formatted as WebVTT and will be stored in a datastream labelled CAPTIONS. Transcripts MUST be plain text and will be stored in a datastream labelled TRANSCRIPT.
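The caption and transcript rules above can be expressed as a small pre-publication check. This is a sketch under stated assumptions, not a DAMS feature: the function names, the two boolean flags and the dictionary of datastream labels to raw bytes are hypothetical. The WebVTT test only looks at the file signature (UTF-8 text whose first line is "WEBVTT", optionally followed by a space or tab and header text) and does not validate the cues.

```python
def is_webvtt(data: bytes) -> bool:
    """Check the WebVTT file signature only; cue syntax is not validated."""
    try:
        text = data.decode("utf-8-sig")  # WebVTT is UTF-8; a leading BOM is allowed
    except UnicodeDecodeError:
        return False
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t"))


def video_publication_problems(
    datastreams: dict[str, bytes],
    has_spoken_language: bool,
    has_essential_visuals: bool,
) -> list[str]:
    """Return the problems that would block publication (empty list means none found)."""
    problems = []
    captions = datastreams.get("CAPTIONS")
    if has_spoken_language and captions is None:
        problems.append("missing CAPTIONS datastream")
    if captions is not None and not is_webvtt(captions):
        problems.append("CAPTIONS is not WebVTT")
    if has_essential_visuals and "TRANSCRIPT" not in datastreams:
        problems.append("missing TRANSCRIPT datastream")
    return problems
```

A caption file in another format, such as SubRip (SRT), fails the signature check and would need to be converted to WebVTT before ingest.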

  • mov
  • qt
  • mp4
  • m4v
  • avi
  • wmv
  • ogg
  • mkv

The DAMS will create a lower-resolution, compressed MP4 file (datastream MP4), which is used for previewing video content in the DAMS and for publishing to the Collections Portal. MP4 derivative files are made available to the Collections Portal AV player as streaming video. The current streaming media service does not allow for encryption or other methods of digital rights management, and it is the responsibility of each collection's curator to ensure that the dissemination methods available for content comply with legal or contractual obligations.

Paged content (book, publication, issue, page)

If adding map content, check the advice given on the following page: Adding map content

When we refer to paged content in the DAMS, we mean four different content models: PAGE, BOOK, ISSUE and PUBLICATION. Digital assets of these types can be organized hierarchically to recreate the structure of books and serials/continuing resources.

Permitted file types for paged content are:

  • tiff/tif
  • jp2
  • pdf*

*) Born-digital PDF documents should be ingested with the PDF content model workflow. If you plan on ingesting a PDF document that was created from a scanning process, e.g. a digitally reformatted book, please contact the DAMS managers first. It is strongly advised to ingest digitally reformatted content as a batch of high-resolution TIF/JP2 files. PDF documents created from scanned images might store image content with lossy compression, which can result in a visible reduction of image quality of individual pages when ingested into the DAMS.

PAGE Content model

Pages are single images representing an individual page. This content model is essentially equivalent to the LARGE IMAGE Content model, but does not require a Page to be associated with metadata. In addition, the DAMS offers the option to perform OCR on page images ingested with the paged content workflow.

Pages are grouped into Books or Issues.

BOOK Content model

Books contain Digital assets of the type PAGE. An asset of the type BOOK can aggregate OCR full text from child pages, if available. In addition, an aggregate PDF document containing all pages in the Book can be created.

ISSUE Content model

Issues contain Digital assets of the type PAGE. An asset of the type ISSUE can aggregate OCR full text from child pages, if available. In addition, an aggregate PDF document containing all pages in the Issue can be created.

Issues can be grouped into Publications.

PUBLICATION Content model

Publications are groupings of Digital assets of the type ISSUE (typically, but not limited to, continuing resources/serials). The DAMS user interface shows a calendar view based on the creation/issuance dates of the Issues grouped into a Publication.
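The relationships between the four paged content models can be illustrated with a small data model. This is only a sketch of the hierarchy described above, not the way the DAMS stores objects: all class, field and function names are hypothetical, and grouping Issues by month is a simplification of the calendar view.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Page:
    label: str
    ocr_text: Optional[str] = None  # None if OCR was not performed


@dataclass
class Book:
    title: str
    pages: list[Page] = field(default_factory=list)


@dataclass
class Issue:
    title: str
    issued: date
    pages: list[Page] = field(default_factory=list)


@dataclass
class Publication:
    title: str
    issues: list[Issue] = field(default_factory=list)


def aggregate_full_text(container) -> str:
    """Join the OCR text of all child pages of a Book or Issue, skipping pages without OCR."""
    return "\n".join(page.ocr_text for page in container.pages if page.ocr_text)


def issues_by_month(publication: Publication) -> dict:
    """Group Issue titles by (year, month) of issuance, similar in spirit to a calendar view."""
    calendar = defaultdict(list)
    for issue in publication.issues:
        calendar[(issue.issued.year, issue.issued.month)].append(issue.title)
    return dict(calendar)
```

In this model a Book and an Issue both hold Pages directly, while a Publication holds only Issues, which mirrors the grouping rules stated in the sections above.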

PDF Content model

Content ingested using the PDF content model currently cannot be published to the Collections Portal. Depending on the origin of the PDFs you would like to store in the DAMS, the Book or Publication content models might be more appropriate. Please contact the DAMS managers for a consultation (click here to submit a DAMS service request).

This content model is intended for born-digital PDF documents. If you plan on ingesting a PDF document that was created by a scanning process, e.g. a digitally reformatted book, please contact the DAMS managers (click here to submit a DAMS service request).

  • pdf

The DAMS will attempt to extract text from a PDF document that is ingested with the PDF Content model workflow. If successful, the extracted full text is stored in a datastream called FULL_TEXT.

BINARY Content model

Content ingested using the Binary content model currently cannot be published to the Collections Portal. Depending on the type of file(s), a different content model or a different storage location than the DAMS might be more appropriate. Please contact the DAMS managers for a consultation (click here to submit a DAMS service request).

Any other file types can be added to the DAMS, but cannot be directly viewed or played by users in the DAMS interface. Users can ingest such files using the BINARY content model. Alternatively, consider adding such files alongside a supported file type as an additional datastream. Examples: 

  • raw file
  • iso file
  • psd file 
  • docx file
  • etc.
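The file type lists on this page can be combined into a lookup that suggests candidate content models for a file. The following is a minimal sketch with hypothetical names. It covers only the content models for which this page lists file extensions, so LARGE IMAGE is omitted because no extensions are listed for it here. An extension match is only a starting point: as noted above, the right model also depends on the origin and intended use of the file.

```python
from pathlib import PurePath

# Extensions as listed in the sections above; "PAGED CONTENT" stands for
# the PAGE, BOOK, ISSUE and PUBLICATION content models.
ALLOWED_EXTENSIONS = {
    "AUDIO": {"m4a", "mp3", "wav", "mp4", "au", "wma"},
    "BASIC IMAGE": {"jpg", "png", "gif"},
    "VIDEO": {"mov", "qt", "mp4", "m4v", "avi", "wmv", "ogg", "mkv"},
    "PAGED CONTENT": {"tiff", "tif", "jp2", "pdf"},
    "PDF": {"pdf"},
}


def candidate_content_models(filename: str) -> list[str]:
    """Return the content models whose listed extensions match, or BINARY as a fallback."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    matches = [
        model
        for model, extensions in ALLOWED_EXTENSIONS.items()
        if extension in extensions
    ]
    return matches or ["BINARY"]
```

Some extensions are ambiguous: an .mp4 file matches both AUDIO and VIDEO, and a .pdf file matches both the paged content models and PDF, so the function returns every candidate rather than picking one.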