...

<ARGUMENT> Value Associated Purpose Accepted File Types Additional Notes

MODS

MODS XML file name

provide MODS metadata for an asset

xml

Can be used for publication/series-level assets, book and issue-level assets.

TN

thumbnail image file name

provide a thumbnail picture for an asset

png, jpg, jpeg

Can be used for publication/series-level assets, book and issue-level assets.

If no thumbnail is provided during batch ingest, the DAMS will copy the thumbnail image of the first page of the asset to the book/issue level asset.

LANG

three-letter language code

instruct the DAMS software to perform OCR for each page

N/A

Can be used for book/issue-level assets.

See page _Text extraction in DAMS for the list of languages for which the DAMS software supports OCR processing.

Note
The OCR software built into the DAMS provides unoptimized recognition results for a limited set of supported languages. Consult with Digitization Services about the external OCR software available for processing, which will yield better recognition results.

Info
If you specify a language not supported by the DAMS software, the asset will still be ingested but no OCR extraction will be performed.

PAGE<NUMBER>

name of the file with the page image

provide page content, in sequential order

tiff, tif, jp2

Can be used for book/issue-level assets.

Replace <NUMBER> with a number for each page that indicates the page's sequential order, for example:

Code Block
PAGE001==filename001.tif
PAGE002==filename002.tif
(etc.)

Pad the number with leading zeroes. The number of zeroes used for padding is up to you.
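For assets with many pages, the PAGE<NUMBER> lines can be generated instead of typed by hand. The following is a minimal sketch, assuming that the page image files sort into reading order by file name and that every manifest line uses the `ARGUMENT==value` syntax shown in the example above. The function name `page_manifest_lines` and the `min_width` parameter are illustrative and not part of the DAMS.

```python
def page_manifest_lines(filenames, min_width=3):
    """Return PAGE<NUMBER>==<file> lines for page images in sorted order.

    Hypothetical helper: assumes sorting by file name yields reading order.
    """
    pages = sorted(filenames)
    # Use one padding width for every page, wide enough for the last number.
    width = max(min_width, len(str(len(pages))))
    return [
        f"PAGE{number:0{width}d}=={name}"
        for number, name in enumerate(pages, start=1)
    ]


if __name__ == "__main__":
    for line in page_manifest_lines(["filename002.tif", "filename001.tif"]):
        print(line)
```

Note that plain name sorting places `page10.tif` before `page2.tif`, so this only works if the file names themselves are zero-padded.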

PAGE<NUMBER>_OCR_CUSTOM

name of externally generated OCR file for that page

allows you to provide your own OCR datastream for each page

txt

Can be used for book/issue-level assets.

PAGE<NUMBER>_<CUSTOM_DATASTREAM>

name of additional file

allows you to add custom datastreams to page-level assets

*

Replace <CUSTOM_DATASTREAM> with a datastream label. The label should correspond to one of the recommended datastream types listed on page Anatomy of DAMS digital assets.

If you wish to ingest additional files that do not match any of the listed datastream types, please contact the DAMS managers for a consultation by submitting a DAMS service request.

Warning

DO NOT use any of the Restricted Datastream IDs.

DO NOT use any of the system-generated datastream labels to ingest additional files, as they may be overwritten by the DAMS software.

FULL_TEXT_CUSTOM

name of text file with externally created full text (text extracted from PDF)

allows you to provide your own FULL_TEXT datastream for a book/issue

txt

Can be used for book/issue-level assets.

Note
Use only for assets where the primary source file is a PDF document and for full text produced with pdftotext. See page _Text extraction in DAMS for details on the different text extraction/recognition methods.

PDF

name of your PDF file

provide a PDF document for an asset

pdf

Can be used for book/issue-level assets.

Use to add an externally created PDF document to an asset.

Info

If no page images are specified in the manifest, the DAMS will render image files from the pages of the PDF document and use these images to create page-level assets.

For digitally reformatted (scanned) content, using a PDF as a source for creating page images is strongly discouraged, as the automatically created page images are almost invariably of lower quality than the original scan images. Contact the DAMS managers for a consultation by submitting a DAMS service request.

For born-digital content (for instance modern PDF ebooks or PDF documents directly exported from a word processor), other content models and ingest processes will be more appropriate. Contact the DAMS managers for a consultation by submitting a DAMS service request.

HOSTPUBLICATION

PID without namespace ID

Add issue(s) to publication

text

Can be used for book/issue-level assets.

Use to specify which publication/series-level asset an issue should be added to.

PID without namespace ID is the part of a PID after the colon (UUID), e.g. 9ebf6ac8-1823-4bf4-8398-654b54090776 for PID utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776.

HOSTISSUE

PID without namespace ID

Add pages to an issue

text

Can be used with sets of page images.

Use to specify which issue-level asset a set of page images should be added to.

PID without namespace ID is the part of a PID after the colon (UUID), e.g. 9ebf6ac8-1823-4bf4-8398-654b54090776 for PID utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776.

HOSTBOOK

PID without namespace ID

Add pages to a book

text

Can be used with sets of page images.

Use to specify which book-level asset a set of page images should be added to.

PID without namespace ID is the part of a PID after the colon (UUID), e.g. 9ebf6ac8-1823-4bf4-8398-654b54090776 for PID utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776.
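Deriving the value for HOSTPUBLICATION, HOSTISSUE or HOSTBOOK from a full PID amounts to splitting the PID at the colon. A minimal sketch follows; the function name `pid_without_namespace` is hypothetical, and the `HOSTPUBLICATION==value` line it prints assumes the same `ARGUMENT==value` manifest syntax as the PAGE example.

```python
def pid_without_namespace(pid):
    """Return the part of a PID after the colon, e.g. the UUID."""
    namespace, colon, local_id = pid.partition(":")
    # Reject input that is not of the form <namespace>:<id>.
    if not (namespace and colon and local_id):
        raise ValueError(f"expected '<namespace>:<id>', got {pid!r}")
    return local_id


if __name__ == "__main__":
    pid = "utlarch:9ebf6ac8-1823-4bf4-8398-654b54090776"
    print(f"HOSTPUBLICATION=={pid_without_namespace(pid)}")
```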

Folder naming conventions and folder hierarchy

...

Code Block

eid1234_example-batch-submission/ (batch job folder)
├── grapes_of_wrath_BOOK/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── book_level_custom_ocr.txt
│   ├── book_level_pdf.pdf
│   ├── page01.tif
│   └── page02.tif
├── wall_street_journal_PUBLICATION/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── wsj_jan_2016_ISSUE/
│   │   ├── datastreams.txt
│   │   ├── modsfile.xml
│   │   ├── page01.tif
│   │   └── page02.tif
│   └── wsj_feb_2016_ISSUE/
│       ├── datastreams.txt
│       ├── modsfile.xml
│       ├── page01.tif
│       └── page02.tif
├── ascii_art_monthly_july_2021_ISSUE/
│   ├── datastreams.txt
│   ├── modsfile.xml
│   ├── page01.tif
│   ├── page01_custom_ocr.txt
│   ├── page02.tif
│   └── page02_custom_ocr.txt
└── nyt_2020-11-04_PAGES/
    ├── datastreams.txt
    ├── issue_level_custom_ocr.txt
    ├── issue_level_pdf.pdf
    ├── page01.tif
    ├── page01_custom_ocr.txt
    ├── page02.tif
    └── page02_custom_ocr.txt
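Before uploading, it can help to check that every asset folder in the batch job folder contains a manifest. The sketch below assumes that asset folders are recognised by the suffixes that appear in the example tree (_BOOK, _PUBLICATION, _ISSUE, _PAGES) and that each of them needs a datastreams.txt file, since every folder in the example has one. The function name `folders_missing_manifest` is hypothetical, and this is not an official DAMS validation tool.

```python
from pathlib import Path

# Suffixes taken from the example tree; assumed, not an official list.
ASSET_SUFFIXES = ("_BOOK", "_PUBLICATION", "_ISSUE", "_PAGES")


def folders_missing_manifest(batch_dir, manifest_name="datastreams.txt"):
    """Return asset folders under batch_dir that lack a manifest file."""
    missing = []
    for path in sorted(Path(batch_dir).rglob("*")):
        # Only folders whose name ends in a known asset suffix are checked.
        if path.is_dir() and path.name.endswith(ASSET_SUFFIXES):
            if not (path / manifest_name).is_file():
                missing.append(path)
    return missing
```

Calling `folders_missing_manifest("eid1234_example-batch-submission")` on the example above would be expected to return an empty list, because every asset folder shown there contains a datastreams.txt file.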

Step 2: Upload batch job to Jscape

See the excerpt "Batch ingest upload" on page Batch ingest simple assets.

Step 3: Set up collection and submit form in DAMS interface

See the excerpt "batch ingest queue" on page Batch ingest simple assets.
