4. Bagging

The Library of Congress’ BagIt spec outlines a number of relatively broad requirements for packaging collection materials, and several software tools can create bags (“bagging”). Bagging moves a folder of files into a “data” directory and places that data directory alongside four plaintext files:

  1. bag-info.txt, containing administrative and descriptive metadata for the bag
  2. bagit.txt, identifying the BagIt version used to create the bag and the character encoding of the tag files
  3. manifest-sha256.txt, listing every file in the data directory with its SHA-256 hash value
  4. tagmanifest-sha256.txt, listing the plaintext files created during bagging itself, with their corresponding hash values

These files allow bag creators and managers to quickly review the contents of a bag as well as verify file fixity. The bag-info.txt file requires manual input, but the other three files are created automatically during the bagging process.
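To make the layout above concrete, here is a minimal standard-library Python sketch that turns a folder into a bag in place. It is illustrative only and not a substitute for Bagger, pybagger, or bagit-python: the function names are hypothetical, it writes "BagIt-Version: 1.0" as an assumption, and it skips details real tools handle, such as Payload-Oxum and escaping unusual filenames.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(folder: Path, bag_info: dict) -> None:
    """Convert `folder` into a bag in place (hypothetical sketch)."""
    data_dir = folder / "data"

    # 1. Move the folder's existing contents into a new data/ directory.
    entries = list(folder.iterdir())
    data_dir.mkdir()
    for entry in entries:
        shutil.move(str(entry), str(data_dir / entry.name))

    # 2. bagit.txt: BagIt version and tag-file character encoding.
    (folder / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # 3. bag-info.txt: one "Label: value" line per metadata field.
    (folder / "bag-info.txt").write_text(
        "".join(f"{label}: {value}\n" for label, value in bag_info.items()),
        encoding="utf-8",
    )

    # 4. manifest-sha256.txt: "<hash>  <path relative to the bag root>" per payload file.
    payload = sorted(p for p in data_dir.rglob("*") if p.is_file())
    (folder / "manifest-sha256.txt").write_text(
        "".join(f"{sha256_of(p)}  {p.relative_to(folder).as_posix()}\n" for p in payload),
        encoding="utf-8",
    )

    # 5. tagmanifest-sha256.txt: hashes of the tag files written above.
    tag_files = ["bag-info.txt", "bagit.txt", "manifest-sha256.txt"]
    (folder / "tagmanifest-sha256.txt").write_text(
        "".join(f"{sha256_of(folder / name)}  {name}\n" for name in tag_files),
        encoding="utf-8",
    )
```

The tag manifest is written last because it has to hash the finished versions of the other three files.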

bag-info specification

The Library of Congress BagIt spec does not require any particular metadata fields in bag-info.txt files; those requirements are set by individual institutions. The UTL bag-info specification outlines which metadata fields are required for all bags submitted to tape. See the UTL bag-info specification page for a detailed explanation of each field.
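A quick pre-submission check of a bag-info.txt file could look like the Python sketch below. The REQUIRED_FIELDS list is a hypothetical placeholder (these labels are reserved BagIt labels, but they are not the UTL list); substitute the fields named on the UTL bag-info specification page. The parser follows the BagIt convention of "Label: value" lines, where a line starting with a space or tab continues the previous value.

```python
# Hypothetical subset of required fields; replace with the fields listed on
# the UTL bag-info specification page.
REQUIRED_FIELDS = ["Source-Organization", "External-Identifier", "Bagging-Date"]


def parse_bag_info(text: str) -> dict:
    """Parse bag-info.txt text into {label: [values]} (labels may repeat)."""
    fields: dict = {}
    label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and label is not None:
            # Continuation line: append it to the most recent value.
            fields[label][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"Malformed bag-info line: {line!r}")
        label = label.strip()
        fields.setdefault(label, []).append(value.strip())
    return fields


def missing_required(text: str, required=REQUIRED_FIELDS) -> list:
    """Return required labels that are absent or have only empty values."""
    fields = parse_bag_info(text)
    return [name for name in required if not any(fields.get(name, []))]
```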

Bagger

Bagger is a Java-based GUI bagging tool that is useful for creating a single bag at a time. Download the ZIP file here and extract the files somewhere easy to access, such as C:\bagger. On Windows, run “bagger.bat” in the “bin” folder. On Unix, run the “bagger” file (no extension) in the same folder.

When Bagger opens, click “Create New Bag”, select the “UTL” profile from the drop-down menu in the window that appears, and fill out each field marked with a red “R” (Required). Optionally fill out any fields not marked as required.

Next, click the green + button in the “Payload” section and navigate to the bag folder you created in step 1. Bagger will load the files into memory, which may take some time.

Once this process completes, click the “Save Bag As…” button. In the window that opens, select a folder where the completed bag will be saved, and change the Tag Manifest Algorithm and Payload Manifest Algorithm drop-down menu options to “sha256”. Leave “Holey Bag” unselected, and leave “Generate Tag Manifest” and “Generate Payload Manifest” selected.

Click OK to start the bag creation process, which may take some time. Once it completes, check the destination folder to confirm that the values you entered appear in the resulting bag-info.txt file.

pybagger

pybagger is a Python 3 script that lets you supply your own bag-info.txt file when bagging a directory, rather than entering values manually in a GUI as with Bagger. It requires the bagit-python library to be installed in your local Python environment. pybagger generates a UUID for the External-Identifier field, so you don’t need to create one yourself.

To run pybagger on the command line, enter your environment’s Python 3 command (typically “python” or “python3”) followed by the path to pybagger.py, then type “-d” followed by the path to the directory to be bagged, then “-b” followed by the path to the bag-info.txt file you’d like to use for the bag. For example:

The above command will create a bag at C:\bags\wbs_0004, and the resulting bag-info.txt file will be based on the contents of C:\bag-infos\wbs_0004.txt.
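The pybagger behavior described above (reading a supplied bag-info.txt file and adding a UUID External-Identifier before bagging) can be sketched with bagit-python’s make_bag function. This is not pybagger’s actual source: the helper names are hypothetical, the reader assumes simple one-line “Label: value” entries, and the sketch overwrites any External-Identifier already in the file, which pybagger may or may not do.

```python
import argparse
import uuid
from pathlib import Path


def load_bag_info(path: Path) -> dict:
    """Read simple 'Label: value' lines (no continuation lines) into a dict."""
    info = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            info[label.strip()] = value.strip()
    return info


def with_external_identifier(info: dict) -> dict:
    """Return a copy of info with a freshly generated UUID External-Identifier."""
    updated = dict(info)
    updated["External-Identifier"] = str(uuid.uuid4())
    return updated


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Bag a directory using a supplied bag-info file (sketch).")
    parser.add_argument("-d", required=True, help="directory to bag (converted in place)")
    parser.add_argument("-b", required=True, help="bag-info.txt file to use")
    args = parser.parse_args(argv)

    import bagit  # bagit-python; imported here so the helpers above work without it

    info = with_external_identifier(load_bag_info(Path(args.b)))
    # make_bag converts the directory into a bag in place, hashing with SHA-256.
    bagit.make_bag(args.d, bag_info=info, checksums=["sha256"])
```

To try it, call main(["-d", r"C:\bags\wbs_0004", "-b", r"C:\bag-infos\wbs_0004.txt"]) or add the usual if __name__ == "__main__": main() guard to run it from the command line.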

Run pybagger with the “-h” (help) option to display a description of the program and options.

Run pybagger with the “-u” (unpack) option to unpack a bag: the contents of the data directory are moved up to the top-level folder, and all plaintext files created during bagging are deleted.
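The unpack step can be sketched in standard-library Python as follows. This is a hypothetical equivalent, not pybagger’s code; it assumes the four SHA-256 tag files listed at the top of this section, and it renames data/ first so that a payload item that is itself named “data” cannot collide with it.

```python
import shutil
from pathlib import Path

# Tag files this sketch deletes (assumes SHA-256 manifests).
TAG_FILES = ["bag-info.txt", "bagit.txt", "manifest-sha256.txt", "tagmanifest-sha256.txt"]


def unpack_bag(bag: Path) -> None:
    """Move data/ contents up to the bag root, then delete data/ and the tag files."""
    data_dir = bag / "data"
    if not (bag / "bagit.txt").is_file() or not data_dir.is_dir():
        raise ValueError(f"{bag} does not look like a bag")

    # Delete the tag files first so payload files with the same names can move up.
    for name in TAG_FILES:
        (bag / name).unlink(missing_ok=True)

    # Temporary name (hypothetical) so a payload item called "data" cannot collide.
    staging = bag / "data.__unpacking__"
    data_dir.rename(staging)
    for entry in staging.iterdir():
        target = bag / entry.name
        if target.exists():
            raise FileExistsError(target)
        shutil.move(str(entry), str(target))
    staging.rmdir()
```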

See the pybagger GitHub page for more information.

batch_bagger

batch_bagger is similar to pybagger but bags multiple directories in succession. Rather than requiring a separate bag-info.txt file for each directory, batch_bagger works with a bag-info template file and a spreadsheet file. The bag-info template can contain any descriptive and structural metadata that applies to every folder in a batch, with placeholders for folder-level metadata written as keywords in double square brackets (e.g. [[creatorName]]).

For each individual folder, the values for these placeholder keywords are supplied via the user-provided spreadsheet (XLSX or CSV). The spreadsheet lists the folders to be bagged in the first column, with each subsequent column corresponding to a keyword enclosed in double brackets in the bag-info template. This makes it easier to create bag-info.txt files for a large number of bags in one sitting, especially when creating many bags from a single archival collection whose descriptions are mostly identical.
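The placeholder substitution described above can be sketched with a regular expression. This is an illustration under assumptions, not batch_bagger’s implementation: it assumes keywords contain only letters, digits, underscores, or hyphens, and it raises an error for a keyword with no supplied value rather than leaving the [[...]] text in the finished bag-info.txt.

```python
import re

# Matches [[keyword]]; keyword characters are assumed to be letters, digits, _ or -.
PLACEHOLDER = re.compile(r"\[\[([A-Za-z0-9_-]+)\]\]")


def fill_template(template: str, values: dict) -> str:
    """Replace each [[keyword]] in template with values[keyword].

    Raises KeyError for the first keyword that has no value, so an incomplete
    spreadsheet row fails loudly.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for placeholder [[{key}]]")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)
```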

For example:

This bag-info template is mostly made up of descriptive information that will apply to every bag in the W. B. Stephens Collection, but placeholder keywords on lines 9, 17, and 20 will be drawn from a corresponding spreadsheet.

This corresponding spreadsheet lists the names of the folders to be bagged, with the remaining column headers matching the placeholder keywords in the bag-info template. For each folder in column A, the values in columns B, C, and D will be substituted in place of the bag-info template’s placeholder keywords.
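Reading such a spreadsheet into per-folder keyword values can be sketched with Python’s csv module. This handles CSV only (XLSX would need a third-party library such as openpyxl), and it assumes the first row holds headers whose names after column A match the template keywords; the function name is hypothetical.

```python
import csv
import io


def rows_by_folder(csv_text: str) -> dict:
    """Map each folder name (column A) to {keyword: value} for the other columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keywords = [h.strip() for h in header[1:]]
    result = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank rows
        folder = row[0].strip()
        # Pad short rows so every keyword gets a value (possibly empty).
        values = row[1:] + [""] * (len(keywords) - len(row[1:]))
        result[folder] = dict(zip(keywords, (v.strip() for v in values)))
    return result
```

For a CSV saved from Excel, reading it with Path("batch.csv").read_text(encoding="utf-8-sig") drops the byte-order mark that would otherwise stick to the first header.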

Run batch_bagger with the “-h” (help) option to display a description of the program and options. Like pybagger, batch_bagger has a “-u” (unpack) option to unbag a list of directories.

See the batch_bagger GitHub page for more information.

Bag validation

Every bag you create must be validated before it can be written to tape. Validation confirms that bags are complete (i.e. that they are structured correctly and contain the necessary text files) as well as valid (i.e. that the manifest-sha256.txt and tagmanifest-sha256.txt files match the actual contents of the data directory and the bagging files, respectively). Validation recalculates the hash value of every file in the bag, so it takes about as long as bagging itself.
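The two checks described above can be sketched in standard-library Python. This is a simplified illustration, not what Bagger or bag_validator do internally: it only understands SHA-256 manifests, ignores Payload-Oxum, and returns a list of problems (empty when the bag passes).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(path: Path) -> dict:
    """Parse '<hash>  <relative path>' lines into {relative path: hash}."""
    entries = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            digest, rel = line.split(maxsplit=1)
            entries[rel.strip()] = digest.lower()
    return entries


def validate_bag(bag: Path) -> list:
    """Return a list of problems; an empty list means the bag checked out."""
    required = ("bagit.txt", "manifest-sha256.txt", "tagmanifest-sha256.txt")
    problems = [f"missing tag file: {n}" for n in required if not (bag / n).is_file()]
    if problems:
        return problems

    # Validity: every listed file exists and its recomputed hash matches.
    for manifest in ("manifest-sha256.txt", "tagmanifest-sha256.txt"):
        for rel, expected in read_manifest(bag / manifest).items():
            target = bag / rel
            if not target.is_file():
                problems.append(f"listed in {manifest} but missing: {rel}")
            elif sha256_of(target) != expected:
                problems.append(f"hash mismatch: {rel}")

    # Completeness: every payload file on disk is listed in the manifest.
    listed = set(read_manifest(bag / "manifest-sha256.txt"))
    on_disk = {p.relative_to(bag).as_posix() for p in (bag / "data").rglob("*") if p.is_file()}
    problems += [f"not listed in manifest-sha256.txt: {r}" for r in sorted(on_disk - listed)]
    return problems
```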

The Java-based Bagger program has a built-in validation tool. To use it, click "Open Existing Bag" from the top menu, and select a bag in the window that appears. It may take some time for larger bags to open. Once the bag has been opened, click "Validate Bag" on the top menu to begin the process of validating the manifest files. Note that opening a bag allows you to edit the contents of its bag-info.txt file, so be careful not to make any changes!

You can also use the bag_validator Python script to validate a bag or set of bags. By default, the script validates all bags in /dps/write_to_tape. To validate a single bag instead, use the -i option followed by the path to the bag; to validate a list of bags, use the -f option followed by the path to a text file that lists the location of each bag. Adding the -r option validates all bags in all subdirectories of a given folder.
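The option handling described above can be sketched as follows. This is a hypothetical reconstruction, not bag_validator’s code: how -i and -r combine is an assumption, a directory counts as a bag if it contains bagit.txt, and the actual check is delegated to bagit-python’s Bag.validate(), which raises bagit.BagError (or its subclass BagValidationError) on failure.

```python
import argparse
from pathlib import Path

DEFAULT_ROOT = Path("/dps/write_to_tape")


def find_bags(root: Path, recursive: bool = False) -> list:
    """Return root and its subdirectories that contain bagit.txt.

    Without recursion only root's immediate subdirectories are checked.
    """
    subdirs = root.rglob("*") if recursive else root.iterdir()
    candidates = [root] + [p for p in subdirs if p.is_dir()]
    return sorted(p for p in candidates if (p / "bagit.txt").is_file())


def bags_from_list(list_file: Path) -> list:
    """Read one bag path per non-blank line of a text file."""
    lines = list_file.read_text(encoding="utf-8").splitlines()
    return [Path(line.strip()) for line in lines if line.strip()]


def validate_all(bags) -> dict:
    """Validate each bag with bagit-python; map path -> error message or None."""
    import bagit  # bagit-python must be installed for this step

    results = {}
    for path in bags:
        try:
            bagit.Bag(str(path)).validate()
            results[path] = None
        except bagit.BagError as err:
            results[path] = str(err)
    return results


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Validate bags (sketch).")
    parser.add_argument("-i", type=Path, help="a single bag, or a folder of bags with -r")
    parser.add_argument("-f", type=Path, help="text file listing bag paths")
    parser.add_argument("-r", action="store_true", help="search all subdirectories")
    args = parser.parse_args(argv)

    if args.f:
        bags = bags_from_list(args.f)
    elif args.i and not args.r:
        bags = [args.i]
    else:
        bags = find_bags(args.i or DEFAULT_ROOT, recursive=args.r)

    for path, error in validate_all(bags).items():
        print(f"VALID: {path}" if error is None else f"INVALID: {path} ({error})")
```

Calling main([]) mirrors the default behavior of checking /dps/write_to_tape; add an if __name__ == "__main__": main() guard to run it as a script.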

Bags created with Bagger can be validated using bag_validator, and bags created with Python or other command-line BagIt tools can be validated using Bagger. Use whichever tool is most convenient for you.