Anatomy of DAMS digital assets

General

In the context of the DAMS, a digital asset is essentially a bundle of different types of data. It consists, for instance, of a primary media file (publication/production file), either born digital or digitally reformatted, accompanying metadata, and any additional secondary or derivative files. The DAMS stores these different types of data in so-called datastreams.

Content in the DAMS is modelled in broad classes, called Content Models. The Content Models determine which datastreams are required or permitted, and which datastreams are automatically generated as derivatives from media files ingested into the DAMS. The Content Model of an asset also determines which metadata form fields are available, and which viewer is used in the DAMS GUI and on the Collections Portal. See the page Content Models for a detailed description of the available models.
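To make the relationship between assets, datastreams and content models concrete, the following Python sketch models an asset as a PID plus a dictionary of named datastreams, and a content model as sets of required, permitted and derived datastream IDs. The PID, the derivative datastream IDs JPG and TN, and the rules assigned to LARGE IMAGE are assumptions chosen for illustration only; the actual rules for each model are described on the Content Models page.

```python
from dataclasses import dataclass, field


@dataclass
class ContentModel:
    """Hypothetical content model: which datastreams an asset may or must have."""
    name: str
    required: set[str]
    permitted: set[str]
    derived: set[str]  # generated automatically by the DAMS, not uploaded


@dataclass
class Asset:
    """A digital asset: a PID plus a bundle of named datastreams."""
    pid: str
    model: ContentModel
    datastreams: dict[str, bytes] = field(default_factory=dict)


def validate(asset: Asset) -> list[str]:
    """Return a list of problems; an empty list means the asset fits its model."""
    problems = []
    present = set(asset.datastreams)
    for ds in sorted(asset.model.required - present):
        problems.append(f"missing required datastream {ds}")
    allowed = asset.model.required | asset.model.permitted | asset.model.derived
    for ds in sorted(present - allowed):
        problems.append(f"datastream {ds} not permitted by {asset.model.name}")
    return problems


# Illustrative rules only (assumed, not taken from the DAMS configuration).
LARGE_IMAGE = ContentModel(
    name="LARGE IMAGE",
    required={"OBJ", "MODS"},
    permitted={"DC"},
    derived={"JPG", "TN"},  # hypothetical derivative datastream IDs
)

asset = Asset(pid="example:1", model=LARGE_IMAGE,
              datastreams={"OBJ": b"...", "MODS": b"<mods/>", "AUDIO": b""})
print(validate(asset))  # ['datastream AUDIO not permitted by LARGE IMAGE']
```

Keeping derived datastreams in their own set reflects the distinction the text draws between datastreams a user supplies and those the DAMS generates.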

Example

A preservation-grade scan of a single-page asset such as a photograph or a map will typically have a resolution between 400 and 4000 dpi. The image is usually stored uncompressed in a TIFF container, which results in a very large file. Such files are not well suited for making a digital image available to users. Instead, a derivative file is created, for instance a compressed JPEG image at a lower resolution. To aid visual searching and browsing in the DAMS and on the Collections Portal, an additional thumbnail version is created (also as a JPEG). Metadata stored with the image asset will include technical metadata (for instance about the file format, results of validity tests, a checksum, etc.), an access policy, and bibliographic or archival metadata describing the asset.
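The arithmetic behind this example can be sketched with the Python standard library alone: how large an uncompressed scan gets, what pixel dimensions a downscaled derivative would have, and how a checksum for the technical metadata might be computed. The derivative sizes (2000 pixels on the long edge for the access JPEG, 200 for the thumbnail) and the use of SHA-256 are assumptions for illustration; the DAMS may use other sizes and checksum algorithms.

```python
import hashlib


def uncompressed_size(width_px: int, height_px: int,
                      channels: int = 3, bits_per_channel: int = 8) -> int:
    """Approximate size in bytes of uncompressed pixel data (file headers ignored)."""
    return width_px * height_px * channels * bits_per_channel // 8


def scaled_size(width_px: int, height_px: int, max_edge: int) -> tuple[int, int]:
    """Scale so the longer edge is at most max_edge, keeping the aspect ratio."""
    longest = max(width_px, height_px)
    if longest <= max_edge:
        return width_px, height_px
    factor = max_edge / longest
    return max(1, round(width_px * factor)), max(1, round(height_px * factor))


def sha256_checksum(data: bytes) -> str:
    """Hex-encoded SHA-256 digest, as might be recorded in technical metadata."""
    return hashlib.sha256(data).hexdigest()


# A 10 x 8 inch photograph scanned at 600 dpi, 8-bit RGB.
width, height = 10 * 600, 8 * 600           # 6000 x 4800 pixels
print(uncompressed_size(width, height))      # 86400000 bytes, about 86 MB
print(scaled_size(width, height, 2000))      # assumed access JPEG size: (2000, 1600)
print(scaled_size(width, height, 200))       # assumed thumbnail size: (200, 160)
```

Even at a modest 600 dpi a single uncompressed page approaches 100 MB, which is why a compressed derivative is served to users instead.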

In the DAMS GUI the datastreams of an asset are represented as in the following image:

Simple vs. complex assets

Simple assets are assets that do not have constituent parts or "child" assets. Examples are individual photographs, audio files, movies, or single-page print material. In contrast, a complex asset contains constituent parts which have individual PIDs in the DAMS, e.g. the issues of a journal or the pages of a book.

Simple asset content models

In order to appropriately store, manage and disseminate digitized physical assets, the DAMS uses content models for different classes of media. The following content models are available for processing simple assets:

  • AUDIO
  • LARGE IMAGE
  • PDF
  • VIDEO
  • BINARY

See the documentation page on content models for details on each model and the file types supported.
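As a rough illustration of how an ingested file might be routed to one of the simple-asset content models listed above, the following sketch picks a model from the file's MIME type. The mapping is a hypothetical assumption, not the DAMS's actual rule set; the authoritative list of supported file types per model is on the content models page.

```python
def simple_content_model(mime_type: str) -> str:
    """Pick a simple-asset content model for a file (hypothetical mapping)."""
    # Normalise e.g. "Application/PDF; charset=binary" to "application/pdf".
    mime_type = mime_type.lower().split(";")[0].strip()
    if mime_type.startswith("audio/"):
        return "AUDIO"
    if mime_type.startswith("video/"):
        return "VIDEO"
    if mime_type == "application/pdf":
        return "PDF"
    if mime_type in {"image/tiff", "image/jp2"}:  # assumed LARGE IMAGE inputs
        return "LARGE IMAGE"
    # Fallback (assumed): anything else goes to the generic BINARY model.
    return "BINARY"


for mime in ["audio/x-wav", "image/tiff", "Application/PDF", "application/zip"]:
    print(mime, "->", simple_content_model(mime))
```

Ending with a catch-all BINARY branch mirrors the role that model appears to play in the list: a place for files that fit none of the media-specific models.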

Paged content/complex assets

Digital assets can be designated upon ingest as either a parent or a child in a parent-child relationship. Digital assets that have child assets are referred to as complex assets. Currently, complex assets are used to model paged content, e.g. Books or Issues of publications/serials. Typically, a book-level asset and its pages are created using image files that are the result of a scanning process (digitally reformatted content).
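The parent-child relationship, and the resulting distinction between simple and complex assets, can be sketched as a small tree. The PIDs and model names below are hypothetical, and the sketch says nothing about how the DAMS actually stores the relationship.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Minimal parent-child model of assets; PIDs and model names are illustrative."""
    pid: str
    model: str
    # Excluded from repr/compare to avoid infinite recursion through the parent link.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> None:
        """Designate child as a constituent part of this asset."""
        child.parent = self
        self.children.append(child)

    @property
    def is_complex(self) -> bool:
        # A complex asset is one that has constituent (child) assets.
        return bool(self.children)


book = Node("book:1", "Book")
for n in range(1, 4):
    book.add_child(Node(f"page:{n}", "Page"))

print(book.is_complex)               # True
print(book.children[0].is_complex)   # False
```

Each Page node keeps its own PID, matching the description of complex assets whose parts are individually identified.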

Page-level assets

The individual pages of a book or serial issue are stored in the OBJ datastream of Page assets. These assets typically do not contain descriptive metadata. Page-level assets cannot be published to the Collections Portal individually, only as part of a Book or Issue.

Book/issue-level assets

A Book or Issue asset stores information about the Page assets that make up the book or issue, and their order. In addition, the Book or Issue asset stores the descriptive metadata for the book or issue. Typically, Book or Issue assets do not contain an OBJ datastream.
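One way to picture the page-order information held by a Book or Issue asset is a mapping from page PID to sequence number. The sketch below assumes that representation (the text does not say how the DAMS records it) and returns the pages in reading order, rejecting gaps or duplicate numbers.

```python
def ordered_pages(sequence: dict[str, int]) -> list[str]:
    """Return page PIDs in reading order, checking the numbering runs 1..n.

    `sequence` is a hypothetical mapping of page PID -> sequence number.
    """
    numbers = sorted(sequence.values())
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"page sequence is not contiguous from 1: {numbers}")
    return sorted(sequence, key=sequence.__getitem__)


book_pages = {"page:7": 2, "page:3": 1, "page:9": 3}
print(ordered_pages(book_pages))  # ['page:3', 'page:7', 'page:9']
```

Checking for a contiguous 1..n numbering catches a missing or duplicated page before the book is displayed.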

Publication/series-level assets

Publication-level assets are intended to organize publication issues inside the DAMS. The DAMS GUI shows a calendar display for navigating publication issues by creation/issuance month and year. Publication/series-level assets cannot be published to the Collections Portal, and there is currently no calendar display on the Collections Portal for browsing serial issues.
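The grouping behind a calendar display of this kind can be sketched as bucketing issues by year and then by month of issuance. The issue PIDs and dates are made up, and this is not the DAMS GUI's actual implementation.

```python
from collections import defaultdict
from datetime import date


def calendar_view(issues: dict[str, date]) -> dict[int, dict[int, list[str]]]:
    """Group issue PIDs by year, then month, in date order (illustrative only)."""
    calendar = defaultdict(lambda: defaultdict(list))
    for pid, issued in sorted(issues.items(), key=lambda item: item[1]):
        calendar[issued.year][issued.month].append(pid)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {year: dict(months) for year, months in calendar.items()}


issues = {
    "issue:3": date(1921, 2, 1),
    "issue:1": date(1921, 1, 1),
    "issue:2": date(1921, 1, 15),
    "issue:4": date(1922, 1, 1),
}
print(calendar_view(issues))
# {1921: {1: ['issue:1', 'issue:2'], 2: ['issue:3']}, 1922: {1: ['issue:4']}}
```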

Hierarchy of paged content/complex assets in the DAMS


Example

This example book asset contains no OBJ datastream, but MODS and DC datastreams with descriptive metadata about the book. The asset also contains a thumbnail image (typically taken from the cover or the first page), as well as a PDF file that has been derived from the page images contained in the book. It also has an OCR datastream which aggregates the OCR results from each page into one datastream.
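The aggregation of page-level OCR into a single book-level datastream can be sketched as joining each page's text in page order. The mapping of page PID to sequence number and the use of a form feed (`\f`) as page separator are assumptions for illustration; the text does not state how the DAMS formats the aggregated OCR.

```python
def aggregate_ocr(page_ocr: dict[str, str], sequence: dict[str, int]) -> str:
    """Join per-page OCR text into one book-level text, in page order.

    `sequence` maps page PID -> sequence number (hypothetical representation).
    Pages without OCR contribute an empty string so page breaks stay aligned.
    """
    ordered = sorted(sequence, key=sequence.__getitem__)
    return "\f".join(page_ocr.get(pid, "") for pid in ordered)


ocr = {"page:3": "Title page", "page:7": "Chapter 1"}
order = {"page:7": 2, "page:3": 1, "page:9": 3}
print(repr(aggregate_ocr(ocr, order)))  # 'Title page\x0cChapter 1\x0c'
```

Keeping an empty slot for pages without OCR means the n-th segment of the aggregate always corresponds to the n-th page.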

The DAMS GUI offers a special view of the page images that belong to the book:

Each page image is part of a Page asset, which has its own PID: