Minimum metadata requirements
To aid in the standardization and discoverability of all information packages within UTL’s digital archive, all SIPs must include a minimum amount of metadata. Metadata should be as granular and useful as possible. Metadata for an information package can describe individual files or folders of files. See “Filename” description for detailed information.
Digital Stewardship’s current digital preservation management system, Archivematica, recognizes metadata.csv files included in the “metadata” folder of the information package and transposes them into a METS file generated during the AIP creation process. Due to this process, all SIPs must include a metadata.csv file. To create this metadata.csv, download the template provided, and refer below to the definitions and standard vocabularies.
Download metadata.csv template
Please note that all metadata.csv files must be saved as a CSV UTF-8 file.
Do not put multiple values in one cell.
Repeated values must be placed in separate columns. See below an example of repeated values in separate columns:
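A minimal sketch of the rules above, using Python's standard csv module. It assumes that a repeated field is expressed by repeating its column header, so that each cell holds a single value. The SIP number 2024_0001 and the second title are hypothetical values chosen for illustration.

```python
import csv
import io

# Column headers. dc.identifier and dc.title each appear twice because
# every repeated value gets its own column (assumed convention).
header = ["filename", "dc.identifier", "dc.identifier", "dc.title", "dc.title"]

# One row describing a folder; the values are hypothetical.
row = [
    "objects/2024_0001_files",
    "2024_0001",
    "aaa-batch00248-2015-06-23",
    "Urban Innovations Group records",
    "Exhibition photographs",
]


def build_metadata_csv(header, rows):
    """Return CSV text with one value per cell."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_metadata_csv(header, [row])

# To save the result as a UTF-8 encoded file:
# with open("metadata.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

Spreadsheet applications usually offer an equivalent "CSV UTF-8" option when saving, which may be simpler than scripting for small SIPs.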
Field Definitions
The following are the minimum metadata requirements:
Filename / filename (required, non-repeatable)
Archivematica requires that each metadata.csv include a filename column as the first column of the spreadsheet. This value can be the file path to a folder or to a file, depending on the desired granularity of the description. For Archivematica to recognize the asset, the file path value must always start with objects/ rather than with the SIP number or folder name.
In the example below, the left side shows the SIP payload structure: the SIP parent folder (2021_0384) containing the main subfolders (2021_0384_derivatives, 2021_0384_files, and metadata). The right side shows the first column of the metadata.csv for this 2021_0384 SIP, where the value in the filename column is objects/2021_0384_files, indicating that the granularity of this description is at the SIP folder level.
If creating folder-level metadata, the filename path should point to the subfolders present in the SIP. If assigning metadata to more than one folder, additional rows can be created for each folder.
Example: objects/2024_0001_files
At least one subfolder in the SIP must have metadata assigned to it.
If creating file-level metadata, the filename path should point to the files present in the SIP. If assigning metadata to more than one file, additional rows can be created for each file.
Example: objects/2024_0001_files/txu-oclc-1389584595-src001.mp4
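The filename rule described above can be sketched as a pair of small helpers. This is an illustration only: the function names to_filename_value and is_valid_filename_value are hypothetical, and the sketch assumes paths are given relative to the SIP parent folder with forward slashes.

```python
from pathlib import PurePosixPath


def to_filename_value(sip_relative_path: str) -> str:
    """Build a filename value from a path relative to the SIP parent folder.

    Example input: "2024_0001_files/example.mp4" (forward slashes assumed).
    """
    parts = PurePosixPath(sip_relative_path).parts
    if not parts or parts[0] in ("/", "objects"):
        raise ValueError(
            f"expected a path relative to the SIP folder: {sip_relative_path!r}"
        )
    return str(PurePosixPath("objects", *parts))


def is_valid_filename_value(value: str) -> bool:
    """Check that a value starts with objects/ and names a folder or file."""
    return value.startswith("objects/") and len(value) > len("objects/")


folder_value = to_filename_value("2024_0001_files")
file_value = to_filename_value("2024_0001_files/txu-oclc-1389584595-src001.mp4")
```

A value such as 2021_0384/2021_0384_files would fail the check because it begins with the SIP number instead of objects/.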
Identifier / dc.identifier (required, repeatable)
Identifier values can include SIP numbers, bag-group-identifiers (see section on standard vocabularies for more information), OCLC numbers, call numbers, format call numbers, locally-assigned archival identifiers, or project names/IDs. At least one value must be present.
Example: aaa-batch00248-2015-06-23
Title / dc.title (required, repeatable)
Main standard title of the batch materials/collection. If both the collection title and the asset titles are included (or other subcategories such as folder titles), provide the collection title in the first title column. At least one value must be present.
Example: Urban Innovations Group records
Contributor / dc.contributor (recommended, repeatable)
Describes the owning location or repository that stewards the item. Values should be taken from the Contributor vocabulary and can be organizations and/or individuals. If an individual's name is used, write it in First Name Last Name order.
Example: Alexander Architectural Archives
Example: Nancy Sparrow
Description / dc.description (optional, repeatable)
Three types of summaries can be entered into this field: descriptive, processing, or versioning. Descriptive summaries should be short and describe the assets and/or collection. They can include publishers, creators, or details about the collections or projects the items belong to. Processing descriptions that go into more detail than the format or extent values can also be provided here, for example detailed born-digital processing information. Versioning information can also be provided, for example when recording the relationship between two related SIPs.
Please put each description into its own column and not all in the same cell.
Example: From the Urban Innovations Group collection, set of photos used for an exhibition. Original photos created by Charles Moore. All files are TIFFs.
Extent / dcterms.extent (optional, repeatable)
Numerical values indicating the number of assets in the SIP.
Example: 10 photographs, front and back scanned
Format / dcterms.medium (recommended, repeatable)
Values should be taken from the Format vocabulary. Describes the physical and/or digital format of the items in the SIP. Terms should be applied in their full form (including any parentheses).
Example: photographic materials
Rights / dc.rights (required, repeatable)
Curators are responsible for assessing the copyright/licensing status for collections of intellectual entities added to the digital archive. Stipulations from donor agreements must be recorded in the SIP metadata, in particular if they contractually restrict access to material.
Values should contain the following to describe the current copyright/licensing status of the SIP's contents: embargo dates, a rightsstatements.org or Creative Commons license, and a statement describing current status/permissions. Rights statement and Creative Commons license values should include both the URL and the name of the license as established in the rights vocabulary for the DAMS. Dates must be expressed in YYYY-MM-DD format.
Example: Items are in copyright until YYYY, held by the architect's estate.
Example: In Copyright. https://rightsstatements.org/page/InC/1.0/
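The YYYY-MM-DD requirement for dates can be checked before a rights value is entered. The following is a minimal sketch; the function name is_valid_date and the sample dates are hypothetical.

```python
import re
from datetime import datetime


def is_valid_date(text: str) -> bool:
    """Return True if text is a real calendar date written as YYYY-MM-DD."""
    # The pattern enforces zero-padded month and day, which strptime
    # alone would not require.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # Right shape, but not a real date (for example, February 30).
        return False
    return True


checks = {
    "2030-01-31": is_valid_date("2030-01-31"),
    "2030-1-31": is_valid_date("2030-1-31"),
    "01/31/2030": is_valid_date("01/31/2030"),
    "2030-02-30": is_valid_date("2030-02-30"),
}
```

Only the first sample passes; the others are rejected for missing zero padding, the wrong separator and order, and a non-existent calendar date, respectively.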
Archivematica can store very detailed information about the rights associated with an intellectual object in the archive. Stakeholders are strongly encouraged to create a rights.csv spreadsheet according to the following specifications and add it to the metadata subdirectory: https://www.archivematica.org/en/docs/archivematica-1.17/user-manual/transfer/import-metadata/#rights-csv
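The required fields defined in this section can be checked with a short script before a SIP is submitted. This is a sketch under stated assumptions, not an official validator: the function name check_metadata_csv is hypothetical, repeated fields are assumed to use repeated column headers, and every row is assumed to need at least one value for each required field.

```python
import csv
import io

# Fields marked "required" in the definitions above.
REQUIRED_COLUMNS = ("filename", "dc.identifier", "dc.title", "dc.rights")


def check_metadata_csv(csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows or not rows[0]:
        return ["file is empty or has no header row"]
    header, data = rows[0], rows[1:]
    problems = []

    if header[0] != "filename":
        problems.append("filename must be the first column")
    for column in REQUIRED_COLUMNS:
        if column not in header:
            problems.append(f"missing required column: {column}")

    filename_index = header.index("filename") if "filename" in header else None
    for number, row in enumerate(data, start=2):
        if not row:
            continue  # skip blank lines
        for column in REQUIRED_COLUMNS:
            if column not in header:
                continue  # already reported above
            # A repeatable field is satisfied if any of its columns has a value.
            values = [c.strip() for name, c in zip(header, row) if name == column]
            if not any(values):
                problems.append(f"row {number}: no value for {column}")
        if filename_index is not None and filename_index < len(row):
            if not row[filename_index].startswith("objects/"):
                problems.append(f"row {number}: filename must start with objects/")
    return problems


good = (
    "filename,dc.identifier,dc.title,dc.title,dc.rights\r\n"
    "objects/2024_0001_files,2024_0001,Urban Innovations Group records,,"
    "In Copyright. https://rightsstatements.org/page/InC/1.0/\r\n"
)
bad = "dc.title,filename\r\nSome title,2024_0001_files\r\n"

good_problems = check_metadata_csv(good)
bad_problems = check_metadata_csv(bad)
```

In this sketch the first sample produces no problems, while the second is flagged for column order, missing required columns, and a filename that does not start with objects/.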
Standard Vocabularies
Contributors: The contributor field can include the owning location, repository, organizations, or individuals that steward the item. To maintain standardization, these values should not include abbreviations. Use the standardized vocabulary list for common contributors.
Format: The format controlled vocabulary is derived from The Getty Research Institute’s Art and Architecture Thesaurus (AAT). A shortlist of 32 terms was created by Devon Murphy for the purposes of digital preservation.
Identifier (Bag-Group-Identifier): Bag-group-identifiers associate a related group of AIPs. These identifiers are typically assigned to a collection, subcollection, and/or project whose material is expected to generate multiple SIPs ingested into the digital archive. A project is only considered for a bag-group-identifier if it involves curatorial efforts and the material doesn’t already belong to a collection.
These identifiers generally follow a predetermined pattern that consists of the repository abbreviation, collection/project name, and, if relevant, subcollection name.
{repo}_{collection_name}
{repo}_{collection_name}_{subcollection_name}
{repo}_{collection_name}_{subcollection_name}_{sub-subcollection_name}
ailla_{ContributorName}_{CollectionOrLanguageName} (using camel caps)
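The patterns above can be sketched as small helper functions. The function names and sample values are hypothetical, and the handling of spaces in the AILLA form (capitalizing each word and removing spaces) is an assumption about what "camel caps" means here. Always check the lists of existing IDs before creating a new identifier.

```python
def bag_group_identifier(repo: str, collection: str, *subcollections: str) -> str:
    """Join the repository abbreviation, collection name, and any
    subcollection names with underscores."""
    parts = [part.strip() for part in (repo, collection, *subcollections)]
    if any(not part for part in parts):
        raise ValueError("identifier parts must not be empty")
    return "_".join(parts)


def to_camel_caps(name: str) -> str:
    """Capitalize the first letter of each word and remove spaces (assumed rule)."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


def ailla_identifier(contributor_name: str, collection_or_language_name: str) -> str:
    """Build an identifier in the ailla_{ContributorName}_{CollectionOrLanguageName} form."""
    return (
        f"ailla_{to_camel_caps(contributor_name)}"
        f"_{to_camel_caps(collection_or_language_name)}"
    )


# Hypothetical values for illustration only.
simple_id = bag_group_identifier("repo", "collection")
nested_id = bag_group_identifier("repo", "collection", "subcollection")
ailla_id = ailla_identifier("jane doe", "example language")
```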
Ideally these identifiers do not include information related to format, unless it is part of the collection’s title, and collection title abbreviations are avoided, unless they are pervasive and unique. Use the decision tree below to determine whether your SIP needs a bag-group-identifier. Refer to the lists of existing IDs divided by repository as needed.