General information for batch ingest

The batch ingest process runs continuously, looking for newly queued batch jobs approximately every 5 minutes. You can add batch ingest jobs to the queue at any time.

Batch jobs are subject to the following size limits:

- max. 100 GB per batch job
- max. 10 GB per file
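Before uploading, you can check a local batch job folder against these limits. The following is a minimal Python sketch, not a DAMS tool. It assumes that "GB" means 10^9 bytes; the DAMS may count in GiB (2^30 bytes), so leave some headroom. The function names `scan_job_folder` and `size_problems` are hypothetical.

```python
from pathlib import Path

# Assumption: "GB" means 10**9 bytes; the DAMS may count 2**30 bytes (GiB) instead.
GB = 10**9
MAX_JOB_BYTES = 100 * GB
MAX_FILE_BYTES = 10 * GB


def scan_job_folder(job_dir):
    """Map every file in the batch job folder (including subfolders) to its size in bytes."""
    return {str(p): p.stat().st_size for p in Path(job_dir).rglob("*") if p.is_file()}


def size_problems(file_sizes):
    """Return a list of size-limit violations; an empty list means the job fits."""
    problems = [
        f"{name}: {size} bytes exceeds the 10 GB per-file limit"
        for name, size in file_sizes.items()
        if size > MAX_FILE_BYTES
    ]
    total = sum(file_sizes.values())
    if total > MAX_JOB_BYTES:
        problems.append(f"batch job totals {total} bytes, exceeding the 100 GB limit")
    return problems
```

For example, `size_problems(scan_job_folder("my_batch_job_folder"))` returns an empty list when the folder is within both limits. The folder name here is illustrative.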

Step 1: Stage files for batch ingest job

Organize files in a batch job folder, using subfolders if appropriate. Refer to the staging options below when preparing batch jobs.

Staging options for simple assets

Option 1 - Media file objects only (no MODS XML files)

Place all media files directly into the batch job folder. The DAMS will create one DAMS asset per file, guessing the content model based on the file name extension. The media file will be stored as the asset's OBJ datastream, and the filename of each ingested media file will be used as a preliminary title in the DAMS.
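To preview how files in an object-only batch are likely to be treated, you can use a small lookup. This Python sketch is illustrative only. The extension table is a hypothetical assumption pieced together from the content models listed under Option 4. The `.mp4`/`.mov` entries and the BINARY fallback are guesses, and the DAMS's actual extension mapping may differ. The sketch also assumes that the full file name, including its extension, becomes the preliminary title.

```python
from pathlib import Path

# Hypothetical extension-to-content-model table; NOT the DAMS's documented mapping.
EXTENSION_TO_MODEL = {
    ".tif": "L-IMAGE",
    ".tiff": "L-IMAGE",
    ".jp2": "L-IMAGE",
    ".wav": "AUDIO",
    ".mp3": "AUDIO",
    ".mp4": "VIDEO",  # assumed video extension
    ".mov": "VIDEO",  # assumed video extension
    ".pdf": "PDF",
}


def preview_asset(filename):
    """Return (likely content model, preliminary title) for one media file."""
    path = Path(filename)
    # Assumed fallback: unknown extensions end up as BINARY.
    model = EXTENSION_TO_MODEL.get(path.suffix.lower(), "BINARY")
    # Assumption: the whole file name, extension included, becomes the title.
    return model, path.name
```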

There is currently NO function to batch ingest MODS XML files and attach them to previously ingested objects.

After ingesting assets with Option 1 (object only), you have the following options for adding metadata to the assets:

- Add metadata via DAMS GUI form, one object at a time
- Replace MODS XML datastreams, one object at a time
- Add or edit common metadata via batch edit - currently only a limited set of metadata elements can be edited in batch after ingest

Option 2 - Media file objects and MODS XML files

Place all media files directly into the batch job folder. Add one corresponding MODS XML file per media file. Each MODS XML document MUST be named with the filename of its corresponding media file object (without the extension), followed by the suffix _METADATA and the file name extension .xml:

<filename>.tif
<filename>_METADATA.xml

The DAMS will create one DAMS asset per media file, guessing the content model based on the file name extension. The media file will be stored as the asset's OBJ datastream, and the corresponding MODS XML document will be stored as the asset's MODS datastream.
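A missing or misnamed _METADATA.xml file is easy to overlook in a large batch. The following Python sketch lists media files that have no MODS XML partner, and MODS XML files that have no media file. It assumes that the suffix match is case-sensitive and that `<filename>` is the media file name with only its last extension removed (`Path.stem`). `unmatched_files` is a hypothetical helper, not part of the DAMS.

```python
from pathlib import Path

SUFFIX = "_METADATA.xml"


def unmatched_files(filenames):
    """Return (media files without a MODS XML file, MODS XML files without a media file)."""
    metadata_stems = {name[: -len(SUFFIX)] for name in filenames if name.endswith(SUFFIX)}
    media_files = [name for name in filenames if not name.endswith(SUFFIX)]
    media_stems = {Path(name).stem for name in media_files}
    # Media files whose stem has no matching <stem>_METADATA.xml
    missing_xml = sorted(name for name in media_files if Path(name).stem not in metadata_stems)
    # <stem>_METADATA.xml files whose stem matches no media file
    orphan_xml = sorted(stem + SUFFIX for stem in metadata_stems if stem not in media_stems)
    return missing_xml, orphan_xml
```

For a local folder, call it as `unmatched_files([p.name for p in Path("my_batch_job_folder").iterdir() if p.is_file()])`. Two empty lists mean that every file has a partner.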

Option 3 - Media file objects and common metadata MODS XML file

Place all media files directly into the batch job folder. Add one MODS XML document named common_METADATA.xml containing common metadata that will be used for all assets created from this batch.

The DAMS will create one DAMS asset per media file, guessing the content model based on the file name extension. The media file will be stored as the asset's OBJ datastream, and each asset will store a copy of the common metadata MODS XML document in its MODS datastream.
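If you generate common_METADATA.xml with a script instead of writing it by hand, you can use Python's standard `xml.etree.ElementTree`. The sketch below writes a skeleton MODS document in the MODS v3 namespace that contains only a title. It is a starting point under that assumption only. The elements your DAMS collection actually requires are not covered here, and `common_mods` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("", MODS_NS)  # serialize MODS as the default namespace


def common_mods(title):
    """Build a minimal MODS document holding only a title (a real record needs more)."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return ET.ElementTree(mods)
```

Run `common_mods("My collection title").write("common_METADATA.xml", encoding="UTF-8", xml_declaration=True)` inside the batch job folder so that the file sits next to the media files. The title string is a placeholder.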

Option 4 - XML only (create placeholder assets)

This option allows you to create placeholder DAMS assets that contain only metadata, without a media file object. Use Option 5, described below, to add media file objects to placeholder assets created with Option 4.

Option 4 does NOT allow you to add MODS XML metadata to DAMS assets that were previously ingested using Option 1 (object only). There is currently no method available to add MODS XML metadata in batch to DAMS assets that were previously ingested without metadata.

Place MODS XML documents directly into the batch job folder. The filenames of the XML documents MUST contain a suffix that indicates the Content Model the DAMS should use for creating placeholder assets. The Content Model specified when creating placeholder assets with this method CANNOT be changed after ingest.

The general pattern for naming MODS XML documents for this ingest method is <filename>==<CONTENTMODEL>.xml.

- The <filename> portion CANNOT contain spaces; use underscores (_) instead.
- It is recommended that the <filename> portion anticipate the filename of the media file object to be added later, but this is not required. Media file objects added later with Option 5 (see below) are matched to their respective placeholder assets using a manifest file and the placeholder asset PIDs.
- Use 2 (two) equal signs to connect the <filename> portion and the <CONTENTMODEL> portion of the XML document's filename.
- The <CONTENTMODEL> portion of the filename must be one of the following:
  - L-IMAGE (Large Image content model, for TIF or JPEG 2000 files; example filename: my_large_image23==L-IMAGE.xml)
  - AUDIO (Audio content model, typically for WAV or MP3 files; example filename: interview_part1==AUDIO.xml)
  - VIDEO (Video content model; example filename: episode012==VIDEO.xml)
  - PDF (PDF content model; example filename: my_pdf_file==PDF.xml)
  - BINARY (Binary content model; example filename: very_important_asset==BINARY.xml)
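Because the content model cannot be changed after ingest, it can help to build and check the placeholder file names in code. This Python sketch applies the rules listed above. It additionally assumes that the <filename> portion should not contain an equal sign, which keeps the `==` separator unambiguous. Both function names are hypothetical.

```python
import re

# Content models accepted for placeholder assets (Option 4)
CONTENT_MODELS = {"L-IMAGE", "AUDIO", "VIDEO", "PDF", "BINARY"}


def placeholder_xml_name(filename, content_model):
    """Build a <filename>==<CONTENTMODEL>.xml name, replacing spaces with underscores."""
    if content_model not in CONTENT_MODELS:
        raise ValueError(f"unknown content model: {content_model}")
    return f"{filename.replace(' ', '_')}=={content_model}.xml"


def parse_placeholder_xml_name(xml_name):
    """Split an Option 4 file name into (filename, content model), or raise ValueError."""
    # Assumption: no whitespace and no '=' inside the <filename> portion.
    match = re.fullmatch(r"([^\s=]+)==([A-Z-]+)\.xml", xml_name)
    if not match or match.group(2) not in CONTENT_MODELS:
        raise ValueError(f"not a valid placeholder XML name: {xml_name}")
    return match.group(1), match.group(2)
```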

The DAMS will create one DAMS asset per MODS XML file, using the content model specified in the filename. Each asset will store the metadata contained in the MODS XML file in its MODS datastream. The OBJ datastream of each asset will contain a placeholder media file.

Option 5 - Add media file objects to placeholder assets

This ingest method adds media file objects to their corresponding placeholder assets created with Option 4. It matches each media file object to its placeholder asset in the DAMS using the placeholder asset's PID. To obtain a list of all assets and their PIDs in a given DAMS sub-collection, use the DAMS asset report script (https://github.austin.utexas.edu/mmh4428/dams_user_tools/tree/main/reports) or the Advanced Search function in the DAMS GUI.

Place media file objects directly into the batch job folder. Add a manifest file named info.txt to the same folder; it tells the DAMS which media file belongs to which placeholder asset.

Each line of the info.txt manifest MUST have the form <PID without namespace identifier>===<filename.ext>:

- <PID without namespace identifier> is the part of a PID after the colon (the UUID), e.g. 9ebf6ac8-1823-4bf4-8398-654b54090776 for the PID utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776.
- Use 3 (three) equal signs to separate the UUID and the <filename.ext> portion.
- The <filename.ext> portion CANNOT contain spaces; use underscores (_) instead.

Sample info.txt
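The following minimal Python sketch writes an info.txt manifest from full PIDs (namespace:UUID) and media file names. It strips the namespace and joins the UUID and the file name with three equal signs. Writing the file as UTF-8 with Unix line endings is an assumption, not a documented DAMS requirement. `manifest_line` and `write_manifest` are hypothetical helpers.

```python
def manifest_line(pid, media_filename):
    """Format one info.txt line as <UUID>===<filename.ext> from a full namespace:UUID PID."""
    _namespace, sep, uuid_part = pid.strip().partition(":")
    if not sep or not uuid_part:
        raise ValueError(f"expected a PID like namespace:UUID, got {pid!r}")
    if " " in media_filename:
        raise ValueError(f"media file names cannot contain spaces: {media_filename!r}")
    return f"{uuid_part}==={media_filename}"


def write_manifest(pairs, path="info.txt"):
    """Write one line per (PID, media filename) pair; UTF-8 and Unix line endings are assumed."""
    with open(path, "w", encoding="utf-8", newline="\n") as manifest:
        for pid, media_filename in pairs:
            manifest.write(manifest_line(pid, media_filename) + "\n")
```

For example, `manifest_line("utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776", "my_large_image23.tif")` produces the line `9ebf6ac8-1823-4bf4-8398-654b54090776===my_large_image23.tif`. The media file name is illustrative.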

Step 2: Upload batch job to Jscape

Make sure you have a user account on the Jscape SFTP server by checking the UT secrets vault Stache for an entry named "<your name> JScape SFTP". Contact the UTL DAMS Management Team if you don't already have an account.

The Jscape web interface does not allow you to upload directories. We recommend using an SFTP client to connect and upload your batch submissions.

Connect to Jscape in your SFTP client using these settings:
```
Host: jscape.its.utexas.edu
Port: 22
```
Upload your batch job folder to the appropriate location in Jscape:
1. TEST corresponds to running the batch on dams-t01-rh7.lib.utexas.edu; PROD corresponds to running the batch on dams.lib.utexas.edu.
2. Place your batch job folder in the appropriate top-level collection folder within the INGEST folder.
  Example: /DAMS/TEST/INGEST/utlmisc/my_batch_job_folder (for a batch upload to the miscellaneous collection on the DAMS test server).

  Any spaces in folder names must be replaced with underscores (e.g. special_collection_1).

  We recommend naming your batch folder with your EID, a reference to the destination collection in the DAMS, or anything else that will help you recognize the batch. For example, following the pattern <your EID>_<what you are ingesting>, the folder name could be mm63978_EnPatufet1908-1911.
Then go to the DAMS interface and submit your batch job to queue it to be run (see Step 3 below).
Note: Your batch job folder will be removed from the Jscape server after seven days, whether or not you have run the batch. Back up your batch requests in Box or on your local machine.
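To avoid uploading to the wrong place, you can build the upload path with a script. The Python sketch below assembles the Jscape path from the environment, the top-level collection folder, and the batch folder name, and it rejects names that contain spaces. The sketch assumes that PROD uses the same /DAMS/<ENV>/INGEST/<collection>/ layout as the TEST example above. `ingest_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# TEST -> dams-t01-rh7.lib.utexas.edu, PROD -> dams.lib.utexas.edu
ENVIRONMENTS = {"TEST", "PROD"}


def ingest_path(environment, collection, batch_folder):
    """Return the Jscape location for a batch job folder, e.g. /DAMS/TEST/INGEST/utlmisc/<folder>."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"environment must be TEST or PROD, got {environment!r}")
    for name in (collection, batch_folder):
        if " " in name:
            raise ValueError(f"use underscores instead of spaces: {name!r}")
    # Assumption: PROD mirrors the TEST folder layout.
    return str(PurePosixPath("/DAMS", environment, "INGEST", collection, batch_folder))
```

For example, `ingest_path("TEST", "utlmisc", "mm63978_EnPatufet1908-1911")` gives the remote directory to create and upload into with your SFTP client.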

Step 3: Set up collection and submit form in DAMS interface

1. Navigate to (or create) the target sub-collection in the DAMS that will receive the batch-ingested files.
2. Locate the target sub-collection's PID (namespace:UUID, e.g. utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776) and copy it to the clipboard.
3. Navigate to the Batch Ingest form in the DAMS:
  - Production system: https://dams.lib.utexas.edu/utdams/batch_queue
  - Test system: https://dams-t01-rh7.lib.utexas.edu/utdams/batch_queue
4. Select the DAMS top-level collection from the dropdown field.
5. Paste the PID of the target sub-collection into the form.
6. Enter the name of the folder on the Jscape SFTP server that contains the files to be ingested (e.g. mm63978_EnPatufet1908-1911).
7. For batch ingest of simple assets or paged content, select the appropriate ingest type.
8. Click Submit.
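A PID that is truncated or pasted without its namespace is an easy mistake when filling in the form. This Python sketch checks that a copied value has the form namespace:UUID and that the part after the colon parses as a UUID with the standard `uuid` module. It only checks the format; it cannot tell whether the sub-collection exists in the DAMS. `check_collection_pid` is a hypothetical helper.

```python
import uuid


def check_collection_pid(pid):
    """Return the trimmed PID if it looks like namespace:UUID, else raise ValueError."""
    cleaned = pid.strip()
    namespace, sep, identifier = cleaned.partition(":")
    if not sep or not namespace:
        raise ValueError(f"PID must look like namespace:UUID, got {pid!r}")
    uuid.UUID(identifier)  # raises ValueError if the part after the colon is not a UUID
    return cleaned
```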

The DAMS should indicate at the top of the form that the batch ingest job was queued.

You will get an email notification after your request for a batch ingest has been received and another notice once the batch ingest process has finished.
