General

In the context of the DAMS, what we call a digital asset is essentially a bundle of different types of data, consisting for instance of a primary media file (publication/production file), either born digital or digitally reformatted, accompanying metadata, and any additional/secondary/derivative files. The DAMS stores these different types of data in so-called datastreams.

Content in the DAMS is modelled in broad classes, called Content Models. The Content Models determine which datastreams are required or permitted, and which datastreams are automatically generated as derivatives from media files ingested into the DAMS. The Content Model of an asset also determines which metadata form fields are available and which viewer is used in the DAMS GUI and on the Collections Portal. See the page Content Models for a detailed description of the available models.

Example

A preservation-grade scan of a single-page asset like a photograph or a map will have a resolution of somewhere between 400 and 4000 dpi. The image is usually stored uncompressed in a TIFF container, which results in a very large file. Such files are usually not well suited for making a digital image available to users. Instead, a derivative file is created, for instance a JPEG image with lower resolution and in compressed form. To aid visual search/browsing in the DAMS and in the Collections portal, an additional thumbnail version is created (also in JPEG form). Metadata stored with the image asset will include technical metadata (for instance about the file format in question, results of validity tests, a checksum, etc.), an access policy, as well as bibliographical or archival metadata describing the asset.
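To give a sense of what "very large" means here, the following sketch estimates the raw pixel data size of an uncompressed scan. The 8 x 10 inch original, the three colour channels and the 8 bits per channel are assumptions chosen for illustration, not DAMS requirements, and the estimate ignores TIFF header and metadata overhead.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Estimate the raw pixel data size in bytes of an uncompressed scan.

    width_in / height_in: physical size of the original in inches (assumed values).
    channels / bits_per_channel: assumed 24-bit RGB by default.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # Hypothetical 8 x 10 inch photograph at three scan resolutions.
    for dpi in (400, 600, 4000):
        size = uncompressed_image_bytes(8, 10, dpi)
        print(f"{dpi} dpi: {size / 1024**2:,.1f} MiB")
```

Under these assumptions a 600 dpi scan already comes to 86,400,000 bytes (roughly 82 MiB) of pixel data, which is why a smaller derivative is used for delivery.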

In the DAMS GUI the datastreams of an asset are represented as in the following image:

Simple vs. complex assets

Simple assets are assets that do not have constituent parts or "child" assets. Examples are individual photographs, audio files, movies, or single-page print material. In comparison, a complex asset contains constituent parts which have individual PIDs in the DAMS, e.g. the issues of a journal or the pages of a book.

Simple asset content models

In order to appropriately store, manage and disseminate digitized physical assets, the DAMS uses content models for different classes of media. The following content models are available for processing simple assets:

AUDIO
LARGE IMAGE
PDF
VIDEO
BINARY

See the documentation page on content models for details on each model and the file types supported.

Datastreams

User-provided

OBJ

REQUIRED

Primary media file (e.g. image, audio or video). The media file stored in the OBJ datastream is conventionally a production or publication file. The OBJ datastream is the source of any derivative made available for publication via the Collections portal.

The OBJ datastream for an asset should contain a publication/production file. Curators can decide which processing steps should be applied to an archival file to create a publication copy (for instance cropping, stitching or certain color corrections of images). Curators can also decide to use a publication copy that is virtually identical to the archival file. In any case, curators should consider that the OBJ datastream is currently the only source for media content to be published to the Collections portal.

Paged content/complex assets, publication type assets or collections do not have an OBJ datastream.

MODS

REQUIRED

Descriptive metadata about the asset, organized according to the Metadata Object Description Standard (MODS).

ARCHIVAL_FILE

OPTIONAL

Suggested datastream label for archival files ingested along with production/publication files in a tiered ingest.

OCR_CUSTOM, FULL_TEXT_CUSTOM

OPTIONAL

User-generated full-text, e.g. as the result of optical character recognition (OCR).

PDF

Derivative of the content represented by the asset in PDF format.

Typically provided by Digitization Services for assets which comprise multiple pages (paged content).

The PDF datastream is system-generated for Page assets that are part of a Book or Serial Issue. If you provide a PDF datastream for a Page-level asset, it will be overwritten when the creation of a PDF for the Book or Serial Issue is triggered through the DAMS GUI.

TRANSCRIPT

REQUIRED for Audio content

Textual representation of linguistic content in audio and video assets. Required for audio assets to be publishable. Optional for video assets.

Transcripts MUST be in plain text.

CAPTIONS

REQUIRED for Video content

Timed textual representation of linguistic content in audio and video assets. Required for video assets to be publishable.

Captions MUST be provided in WebVTT format.
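As an illustration of the WebVTT requirement, the sketch below builds a minimal caption file from a list of timed cues. The function names and the cue tuples are hypothetical; the sketch only covers the basic structure of the format (the `WEBVTT` header line followed by cues with `hh:mm:ss.ttt --> hh:mm:ss.ttt` timings) and not optional features such as cue settings or styling.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]  # mandatory header line
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Header and cues are separated by blank lines.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical cues for illustration only.
    print(build_webvtt([(0, 4, "Welcome to the reading room."),
                        (4.5, 9, "Today we look at a map from the collection.")]))
```

The resulting text can be saved with a `.vtt` extension and ingested as the CAPTIONS datastream.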

PROXY_MP4

Audio content can be provided as streaming media, which adds a limited technical hurdle against a simple download of a complete MP3 audio file. If you prefer to deliver audio content as streaming media, you need to externally create an MP4 derivative and ingest it into a datastream labeled PROXY_MP4. Please submit a DAMS support ticket for details on this step.
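The REQUIRED markers in this section can be summarized as a small pre-ingest check for simple assets. This is a sketch only: the function and constant names are hypothetical, the content model names are taken from the list of simple asset content models above, and the DAMS itself may enforce these rules differently. Paged content and collections, which have no OBJ datastream, are deliberately out of scope.

```python
# Datastreams marked REQUIRED for every simple asset.
REQUIRED_FOR_ALL = {"OBJ", "MODS"}

# Additional datastreams required for an asset to be publishable,
# keyed by content model name.
REQUIRED_TO_PUBLISH = {
    "AUDIO": {"TRANSCRIPT"},
    "VIDEO": {"CAPTIONS"},
}


def missing_datastreams(content_model, datastream_ids):
    """Return the required datastream IDs that are absent, sorted by name."""
    required = REQUIRED_FOR_ALL | REQUIRED_TO_PUBLISH.get(content_model, set())
    return sorted(required - set(datastream_ids))


if __name__ == "__main__":
    print(missing_datastreams("AUDIO", ["OBJ", "MODS"]))
    print(missing_datastreams("VIDEO", ["OBJ", "MODS", "CAPTIONS"]))
```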

System-generated

Depending on the ingest method (manual or batch) and the type of content that is ingested into the DAMS (see Content models), the DAMS will automatically create some datastreams.

The system-generated datastreams serve pre-defined functions in the DAMS and are managed by the system. If you want to add a custom datastream, do not use one of these IDs. Otherwise, the system might overwrite your custom data.

RELS-EXT

DAMS-specific metadata about the sub-collection an asset belongs to and about access permissions.

TECHMD

Technical metadata about the content of an asset's OBJ datastream, e.g. information about file format, compression algorithms, creation and modification dates.

POLICY

Metadata about the role-based access permissions to an asset inside the DAMS.

DC

Descriptive metadata about the asset in Dublin Core format, automatically derived from the MODS metadata provided by the user during ingest.

TN

Derivative image file: Thumbnail image.

JPG

Derivative image file: low-resolution dissemination copy.

JP2

Derivative image file: JPEG 2000 copy with lossless compression, for use in the Collections portal.

PROXY_MP3

The PROXY_MP3 datastream is automatically generated upon ingest from an OBJ datastream that contains WAV (waveform audio) data. If you publish an asset with a PROXY_MP3 datastream to the Collections portal, the file is made available to the embedded player as a progressive download, so users can download the complete MP3 file with relative ease. Alternatively, audio content can be provided as streaming media, which adds a limited technical hurdle against a simple download of the complete MP3 audio file. If you prefer to deliver audio content as streaming media, you need to externally create an MP4 derivative and ingest it into a datastream labeled PROXY_MP4. Please submit a DAMS support ticket for details on this step.

Derivative audio file with MPEG Audio Layer III encoding. Automatically generated upon ingest by the DAMS using lame, for use in the Collections portal.

lame is invoked with the following parameters: -V5 --vbr-new
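To reproduce a comparable MP3 derivative outside the DAMS, the parameters above can be assembled into a command line as sketched below. The file names are hypothetical, the placement of the input and output paths follows lame's usual `lame [options] infile outfile` form, and the long option is written with two dashes, which is how lame's command line spells it. The exact invocation used inside the DAMS may differ.

```python
import shlex


def build_lame_command(wav_path, mp3_path):
    """Assemble a lame command line using VBR quality 5 and the newer VBR routine."""
    return ["lame", "-V5", "--vbr-new", wav_path, mp3_path]


if __name__ == "__main__":
    command = build_lame_command("interview.wav", "interview.mp3")
    # Print the command for inspection; pass the list to subprocess.run()
    # to actually execute it on a machine where lame is installed.
    print(shlex.join(command))
```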

MP4

Derivative video file: MPEG-4 media container file, generated upon ingest by ffmpeg with H.264 video encoding and AAC audio encoding. Used in the Collections portal.

ffmpeg is invoked with the following parameters: -vcodec libx264 -preset medium -acodec aac -strict -2 -ab 128k -ac 2 -async 1 -movflags faststart
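The same parameters can be used to create a comparable MP4 derivative locally, for example to preview the result before ingest. In this sketch the file names are hypothetical and the positions of the input (`-i`) and output arguments are assumptions based on ffmpeg's general command form; only the encoding parameters come from this page.

```python
import shlex

# Encoding parameters as documented for the MP4 datastream.
FFMPEG_PARAMS = (
    "-vcodec libx264 -preset medium -acodec aac -strict -2 "
    "-ab 128k -ac 2 -async 1 -movflags faststart"
)


def build_ffmpeg_command(source_path, mp4_path):
    """Assemble an ffmpeg command line with the documented encoding parameters."""
    return ["ffmpeg", "-i", source_path, *shlex.split(FFMPEG_PARAMS), mp4_path]


if __name__ == "__main__":
    command = build_ffmpeg_command("lecture.mov", "lecture.mp4")
    # Print the command for inspection; pass the list to subprocess.run()
    # to actually execute it on a machine where ffmpeg is installed.
    print(shlex.join(command))
```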

OCR, FULL_TEXT, HOCR

Full text generated by the DAMS during ingest or when the OCR process is triggered in the DAMS GUI.

PDF

The PDF datastream is system-generated for Page assets that are part of a Book or Serial Issue. If you provide a PDF datastream for a Page-level asset, it will be overwritten when the creation of a PDF for the Book or Serial Issue is triggered through the DAMS GUI.

System-generated for Page assets when PDF creation is triggered through the DAMS GUI.

UTLDAMS_PDF

Derivative PDF container for paged content, generated through the DAMS GUI or upon ingest.

The system-generated PDF container for paged content is in almost all cases of lower quality than the PDF files provided by Digitization Services, which are typically ingested as the PDF datastream for book/issue-level assets.
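The advice at the start of this section, not to reuse system-generated IDs for custom datastreams, can be turned into a simple check before ingest. The sketch below collects the IDs listed in this section. The function name is hypothetical, and treating IDs as case-insensitive is an assumption made to err on the side of caution.

```python
# IDs of system-generated datastreams as listed in this section.
# PDF is included because the system overwrites it for Page-level assets.
SYSTEM_DATASTREAM_IDS = frozenset({
    "RELS-EXT", "TECHMD", "POLICY", "DC", "TN", "JPG", "JP2",
    "PROXY_MP3", "MP4", "OCR", "FULL_TEXT", "HOCR", "PDF", "UTLDAMS_PDF",
})


def is_safe_custom_id(datastream_id):
    """Return True if the ID does not collide with a system-generated datastream."""
    # Comparison ignores case (an assumption, see the lead-in above).
    return datastream_id.upper() not in SYSTEM_DATASTREAM_IDS


if __name__ == "__main__":
    for candidate in ("TN", "OCR_CUSTOM", "jp2", "FIELD_NOTES"):
        print(candidate, is_safe_custom_id(candidate))
```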

Paged content/complex assets

Digital assets can be designated upon ingest as either a parent or a child in a parent-child relationship. Digital assets that have child assets are referred to as complex assets. Currently, complex assets are used to model paged content, e.g. Books or Issues of publications/serials. Typically, a book-level asset and its pages are created using image files that are the result of a scanning process (digitally reformatted content).

Page-level assets

The individual pages of a book or serial issue are stored in the OBJ datastream of Page assets. These assets typically do not contain descriptive metadata. Page-level assets cannot be published to the Collections Portal individually, only as part of a Book or Issue.

Book/issue-level assets

A Book or Issue asset stores information about the Page assets the book or issue consists of, and their order. In addition, the Book or Serial Issue asset stores the descriptive metadata for the book or issue. Typically, Book or Serial Issue assets do not contain an OBJ datastream.
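The relationship between Book and Page assets can be pictured with a small data model. This is a conceptual sketch, not the DAMS storage format: the class names, field names and PIDs are hypothetical. It only illustrates that each page has its own PID and OBJ file, while the book holds the descriptive metadata and the page order but no OBJ of its own.

```python
from dataclasses import dataclass, field


@dataclass
class PageAsset:
    pid: str        # every page has its own PID
    obj_file: str   # page image stored in the page's OBJ datastream


@dataclass
class BookAsset:
    pid: str
    mods_title: str                             # stands in for the MODS metadata
    pages: list = field(default_factory=list)   # list order is the page order

    def add_page(self, page):
        """Append a page at the end of the book."""
        self.pages.append(page)

    def page_pids(self):
        """Return the PIDs of all pages in reading order."""
        return [page.pid for page in self.pages]


if __name__ == "__main__":
    book = BookAsset(pid="example:100", mods_title="Sample book")
    book.add_page(PageAsset(pid="example:101", obj_file="page_0001.tif"))
    book.add_page(PageAsset(pid="example:102", obj_file="page_0002.tif"))
    print(book.page_pids())
```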

Publication/series-level assets

Publication-level assets are intended to organize publication issues inside the DAMS. The DAMS GUI will show a calendar display to navigate publication issues based on creation/issuance month and year. Publication/series-level assets cannot be published to the Collections Portal and there is currently no calendar display available on the Collections Portal to browse serial issues.

Hierarchy of complex assets/paged content in the DAMS

Example

This example book asset contains no OBJ datastream, but MODS and DC datastreams with descriptive metadata about the book. The asset also contains a thumbnail image (typically taken from the cover or the first page), as well as a PDF file that has been derived from the page images contained in the book. It also has an OCR datastream which aggregates the OCR results from each page into one datastream.
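The aggregation of page-level OCR results into one book-level datastream can be sketched as a concatenation in page order. The function name, the input structure and the blank-line separator are assumptions for illustration; the DAMS may join page texts differently.

```python
def aggregate_ocr(page_texts):
    """Join page-level OCR text into one string, ordered by page sequence number.

    page_texts: iterable of (sequence_number, text) tuples.
    """
    ordered = sorted(page_texts, key=lambda item: item[0])
    # Separate pages with a blank line (assumed separator).
    return "\n\n".join(text.strip() for _, text in ordered)


if __name__ == "__main__":
    # Hypothetical OCR results, deliberately out of order.
    pages = [(2, "Chapter one begins here.\n"), (1, "Title page\n")]
    print(aggregate_ocr(pages))
```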

The DAMS GUI offers a special view of the page images that belong to the book:

Each page image is part of a Page asset, which has its own PID:

Anatomy of DAMS digital assets
