Batch ingest multiple datastreams per asset (Tiered ingest)

What this does

Tiered ingest lets you group all of the files that correspond to a simple asset's datastreams into one sub-directory per asset. This includes archival files, publication files, and other derivatives created outside of Islandora; the only exception is RELS-EXT.

Derivative files ingested with this method may be overwritten by the DAMS software.

The DAMS software determines the asset's content model from the file type (MIME type) of the primary media file, which is ingested into the OBJ datastream. Because some media file formats can hold different kinds of content, this can lead to unwanted results, particularly with AV content (e.g. an audio file being ingested with the Video content model). Consult with the DAMS Management team when planning your ingest project.

This tiered batch ingest method is NOT suitable for paged content (complex/compound assets with children). See Batch ingest complex assets (paged content) for instructions on how to ingest assets comprised of multiple pages.

Tiered ingest allows you to store additional files with a digital asset, and you can use this method to ingest externally created derivative datastreams (e.g. for streaming audio). See Content models for a breakdown of the expected datastreams per content model, and for information on which datastreams can be published, e.g. to the Collections Portal.

General information for batch ingest

The batch ingest process runs continuously, looking for newly queued batch jobs approximately every 5 minutes. You can add batch ingest jobs to the queue at any time.

Batch jobs are subject to the following size limits:

  • max. 100GB/batch job
  • max. 10GB/file
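
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the DAMS software: the function name check_batch_limits is hypothetical, and it assumes that 1 GB means 10^9 bytes, since this page does not say whether decimal or binary units are meant.

```python
import os

# Assumption: 1 GB = 10**9 bytes; the documentation does not specify the unit base.
GB = 10**9
MAX_BATCH_BYTES = 100 * GB
MAX_FILE_BYTES = 10 * GB


def check_batch_limits(batch_dir, max_batch=MAX_BATCH_BYTES, max_file=MAX_FILE_BYTES):
    """Return a list of problems; an empty list means the batch fits the limits."""
    problems = []
    total = 0
    for root, _dirs, files in os.walk(batch_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            total += size
            if size > max_file:
                problems.append(f"file too large: {path} ({size} bytes)")
    if total > max_batch:
        problems.append(f"batch too large: {total} bytes")
    return problems
```

Calling check_batch_limits("eid1234_example-batch-submission") on a staged batch job folder returns an empty list when both limits are met.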

Step 1: Stage files for batch ingest job

Organise files in a batch job folder, using subfolders if appropriate. Refer to the instructions/options listed below for preparing batch jobs.

Staging folder structure

  • All of the files you are ingesting as part of one asset will be staged in one directory per asset, as a sub-directory of a batch job folder.
  • Each sub-directory corresponds to one asset and must contain at least a manifest file for the key datastreams (datastreams.txt).
  • The batch job folder can contain just one asset folder, but the asset's files must still be nested in their own sub-directory.


Sample folder structure
eid1234_example-batch-submission/ (batch job folder)
├── asset1/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── primaryfile.tif
│   ├── anyarbitraryderivativefile.ext
│   ├── anyarbitrarycomponentfile.ext
│   └── anymediaphotographfile.ext
├── asset2_audio_example/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── audiofile.wav
│   ├── derivative_audiofile_for_streaming.mp4 (e.g. for creating PROXY_MP4 datastream, which is required for streaming audio)
│   └── audio_transcript.txt
└── asset3_video_example/
    ├── datastreams.txt
    ├── modsfile.xml
    ├── videofile.mp4
    ├── video_captions.vtt
    ├── video_transcript.txt
    └── page02_custom_ocr.txt
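
The staging rules above can be checked with a short script before upload. This is a minimal sketch under stated assumptions: both function names are hypothetical, and treating files placed directly in the batch job folder as a problem is an inference from the nesting rule rather than documented behaviour of the DAMS.

```python
from pathlib import Path


def find_assets_missing_manifest(batch_dir):
    """Return the names of asset sub-directories that lack datastreams.txt."""
    missing = []
    for entry in sorted(Path(batch_dir).iterdir()):
        if entry.is_dir() and not (entry / "datastreams.txt").is_file():
            missing.append(entry.name)
    return missing


def find_loose_files(batch_dir):
    """Return files sitting directly in the batch job folder (not nested per asset)."""
    # Assumption: loose files violate the one-sub-directory-per-asset rule.
    return [entry.name for entry in sorted(Path(batch_dir).iterdir()) if entry.is_file()]
```

Both functions return empty lists for a folder laid out like the sample structure.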

Step 2: Create datastreams.txt manifest

Each sub-directory in the batch job folder MUST contain a manifest file named datastreams.txt. The manifest file specifies the intended structure of the DAMS asset, for instance by pointing to the MODS XML file containing the metadata for the asset, or by specifying which additional datastreams should be created from staged files.

Each line of the manifest file contains an argument-value pair in the following format:

<ARGUMENT>==<VALUE> 

Use two equals signs (==) to separate each argument from its value.
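
To illustrate the line format, here is a minimal parsing sketch. The function name parse_manifest is hypothetical, and this is not the parser used by the DAMS. It also assumes that blank lines and lines starting with # are annotations to be skipped; whether the ingest process accepts such lines in a real manifest is not stated on this page.

```python
def parse_manifest(text):
    """Parse datastreams.txt content into a list of (argument, value) pairs."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' lines are annotations, not manifest entries.
        if not line or line.startswith("#"):
            continue
        argument, sep, value = line.partition("==")
        if not sep or not argument.strip() or not value.strip():
            raise ValueError(f"line {lineno}: expected <ARGUMENT>==<VALUE>, got {raw!r}")
        pairs.append((argument.strip(), value.strip()))
    return pairs
```

A line written with a single equals sign, such as OBJ=primaryfile.tif, raises a ValueError in this sketch.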

Manifest Arguments

Refer to Anatomy of DAMS digital assets and Content models for a list of allowed/expected datastreams per content model. Consult with the DAMS Management Team for use cases not covered by the datastreams listed in this documentation.

DO NOT use any of the Restricted Datastream IDs.

Sample manifests

Sample generic datastreams.txt manifest file
OBJ==primaryfile.ext
MODS==metadata.xml
# optional, if no MODS file is included, minimal metadata is automatically generated during ingest
PDF==custom.pdf
# optional
ARCHIVAL_FILE==originalversionof_primaryfile.ext
# optional, use for archival file (e.g. uncropped scan)
COMPONENT1==componentfile1.ext
COMPONENT2==componentfile2.ext
# optional, can for instance be used in cases where a primary image is stitched from multiple component images; increment for additional files in same directory
# DO NOT use for complex objects that can be modeled as paged content or Islandora component assets!
MEDIAPHOTOGRAPH==anymediaphotographfile.ext 
# optional, can be used for images documenting physical media, cases, covers, etc.; use MEDIAPHOTOGRAPH if there is one image only
MEDIAPHOTOGRAPH1==anymediaphotographfile.ext
MEDIAPHOTOGRAPH2==anymediaphotographfile.ext
# optional, can be used for images documenting physical media, cases, covers, etc.; increment for multiple images documenting the physical carrier(s)

Sample datastreams.txt manifest file for audio content
OBJ==audiofile.wav
MODS==metadata.xml
# optional, if no MODS file is included, minimal metadata is automatically generated during ingest
TRANSCRIPT==audiotranscript.txt
# Textual representation of linguistic content in audio and video assets. REQUIRED for audio assets to be publishable. Transcripts MUST be in plain text.
PROXY_MP4==audioderivative.mp4
# optional; audio content can be provided as streaming media, which adds a limited technical hurdle against a simple download of a complete MP3 audio file. If you prefer to deliver audio content as streaming media, you need to externally create an MP4 derivative and ingest it into a datastream labeled PROXY_MP4.

Sample datastreams.txt manifest file for video content
OBJ==videofile.mpg
MODS==metadata.xml
# optional, if no MODS file is included, minimal metadata is automatically generated during ingest
CAPTIONS==videocaptions.vtt
# Timed textual representation of linguistic content in audio and video assets. REQUIRED for video assets to be publishable. Captions MUST be provided in WebVTT format.
TRANSCRIPT==audiotranscript.txt
# optional; textual representation of linguistic content in audio and video assets. Transcripts MUST be in plain text.
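
The requirements noted in the sample manifests can be checked per asset folder before upload. The sketch below is illustrative only: the names read_manifest and check_asset and the "audio"/"video" labels are hypothetical, the publishing requirements are copied from the sample annotations above (TRANSCRIPT for audio, CAPTIONS for video), and UTF-8 encoding of the manifest is an assumption.

```python
from pathlib import Path

# Taken from the sample manifests; the "audio"/"video" keys are labels chosen for this sketch.
REQUIRED_FOR_PUBLISHING = {"audio": ["TRANSCRIPT"], "video": ["CAPTIONS"]}


def read_manifest(asset_dir):
    """Map datastream IDs to file names, skipping blank lines and '#' annotations."""
    manifest = {}
    text = (Path(asset_dir) / "datastreams.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            dsid, _, filename = line.partition("==")
            manifest[dsid.strip()] = filename.strip()
    return manifest


def check_asset(asset_dir, kind=None):
    """Return problems: listed files that are not staged, and datastreams
    that are missing but required for the asset to be publishable."""
    manifest = read_manifest(asset_dir)
    problems = [
        f"{dsid}: file not found: {filename}"
        for dsid, filename in manifest.items()
        if not (Path(asset_dir) / filename).is_file()
    ]
    for dsid in REQUIRED_FOR_PUBLISHING.get(kind, []):
        if dsid not in manifest:
            problems.append(f"{dsid}: required for {kind} assets to be publishable")
    return problems
```

For example, check_asset("asset2_audio_example", kind="audio") reports a missing TRANSCRIPT entry as well as any manifest line that points to a file not present in the folder.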

Step 3: Upload batch job to Jscape

Ensure you have a user account on the Jscape SFTP server by checking Stache, the UT secrets vault, for an entry named "<your name> JScape SFTP". Contact the UTL DAMS Management Team if you don't already have an account.

The Jscape web interface does not allow you to upload directories. We recommend using an SFTP client to connect and upload your batch submissions.

  1. Connect to Jscape in your SFTP client:

    Host: jscape.its.utexas.edu
    Port: 22
  2. Upload your batch job folder into the appropriate location in Jscape:

    1. TEST corresponds to running the batch on dams-t01-rh7.lib.utexas.edu; PROD corresponds to running it on dams.lib.utexas.edu.
    2. Place your batch job folder in the appropriate top-level collection folder within the INGEST folder
      Example: /DAMS/TEST/INGEST/utlmisc/my_batch_job_folder (for a batch upload to the miscellaneous collection on the DAMS test server).

      Any spaces in folder names must be represented by underscores (e.g. special_collection_1).


      We recommend naming your batch folder with your EID, a reference to the destination collection name in the DAMS, or anything else that will help you recognize the batch. For example, following the pattern <my eid>_<what I am ingesting>, the folder name would be mm63978_EnPatufet1908-1911.

  3. Go over to the DAMS interface and submit your batch job to queue it to be run (see steps below).
  4. Note: Your batch job folder will be removed from the Jscape server after seven days, whether or not you have run the batch. Back up your batch job folders in Box or on your local machine.
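
The naming and placement conventions from this step can be sketched as two small helpers. Both function names are hypothetical. The PROD path layout is an assumption that mirrors the TEST example (/DAMS/TEST/INGEST/...); only the TEST path is confirmed on this page.

```python
import posixpath
import re


def batch_folder_name(eid, description):
    """Build '<eid>_<description>' with runs of whitespace replaced by underscores."""
    name = f"{eid}_{description}".strip()
    return re.sub(r"\s+", "_", name)


def remote_ingest_path(environment, collection, folder_name):
    """Build the destination path on Jscape for a batch job folder."""
    # Assumption: PROD uses the same layout as the documented TEST example.
    if environment not in ("TEST", "PROD"):
        raise ValueError("environment must be 'TEST' or 'PROD'")
    if " " in collection or " " in folder_name:
        raise ValueError("spaces in folder names must be replaced by underscores")
    return posixpath.join("/DAMS", environment, "INGEST", collection, folder_name)
```

With the example values from this page, batch_folder_name("mm63978", "EnPatufet1908-1911") gives mm63978_EnPatufet1908-1911, and remote_ingest_path("TEST", "utlmisc", "my_batch_job_folder") gives /DAMS/TEST/INGEST/utlmisc/my_batch_job_folder.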

Step 4: Set up collection and submit form in DAMS interface

  1. Navigate to or create the target sub-collection to receive batch ingested files in DAMS

  2. Locate and copy the target sub-collection PID to clipboard (namespace:UUID, e.g. utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776)

  3. Navigate to the Batch Ingest form in the DAMS:
    1. Production system: https://dams.lib.utexas.edu/utdams/batch_queue
    2. Test system: https://dams-t01-rh7.lib.utexas.edu/utdams/batch_queue
  4. Select the DAMS Top-Level collection from the dropdown field
  5. Paste the PID of the target sub-collection into the form
  6. Enter the name of the folder on the Jscape SFTP server that contains the files to be ingested (e.g. mm63978_EnPatufet1908-1911)
  7. For batch ingest of simple assets and paged content: select the appropriate ingest type
  8. Click submit

The DAMS should indicate at the top of the form that the batch ingest job was queued.

You will get an email notification after your request for a batch ingest has been received and another notice once the batch ingest process has finished.
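
The PID format described in Step 4 (namespace:UUID) can be checked before pasting a PID into the form. This is a minimal sketch: the function name is_valid_pid is hypothetical, and the rule that the namespace is purely alphanumeric is an assumption based on the single example utlarch.

```python
import uuid


def is_valid_pid(pid):
    """Check that pid looks like '<namespace>:<UUID>'."""
    namespace, sep, identifier = pid.partition(":")
    # Assumption: namespaces are alphanumeric, as in the example 'utlarch'.
    if not sep or not namespace.isalnum():
        return False
    try:
        parsed = uuid.UUID(identifier)
    except ValueError:
        return False
    # Require the canonical hyphenated form shown in the example PID.
    return str(parsed) == identifier.lower()
```

The example PID utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776 passes this check, while a bare UUID without a namespace does not.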