bag-info.txt extractor

This Python script extracts values from bag-info.txt files for use in digital stewardship record keeping. It looks for all bag-info.txt files in a given root directory and outputs a CSV file at the top of that directory.
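
As a rough illustration of the search step, the sketch below walks a root directory and collects the path of every bag-info.txt file it finds. The function name find_bag_info_files is hypothetical and is not taken from the script itself; the actual script may locate files differently.

```python
import os


def find_bag_info_files(root_dir):
    """Return the sorted paths of every bag-info.txt found under root_dir.

    Hypothetical helper for illustration; not the script's own function.
    """
    matches = []
    # os.walk visits root_dir and every directory beneath it.
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if "bag-info.txt" in filenames:
            matches.append(os.path.join(dirpath, "bag-info.txt"))
    return sorted(matches)
```

Calling find_bag_info_files on the root directory gives the list of files to process; the output CSV would then be created in that same root directory.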

The script identifies multi-line values by checking for three spaces at the beginning of a line. This indentation is standard in Bagger output but may not be present in bag-info.txt files produced by the command-line or Python versions of BagIt.

When the script encounters a line beginning with a label (i.e., a line that does not begin with three spaces), it adds the previously concatenated value to a list and begins concatenating a new value.

The Bag-Size value is added to the list immediately when it is encountered (rather than when the next label is encountered) because Bagger always outputs this value last.

When the end of the file is reached, the list is written as a row to the output CSV file.
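
The parsing steps described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names extract_values and CONTINUATION_PREFIX are hypothetical, joining continuation lines with a single space is an assumption, and the final flush for files that do not end with Bag-Size is an added safety net that the description above does not mention.

```python
import csv
import io

# Three spaces at the start of a line mark the continuation of a value.
CONTINUATION_PREFIX = "   "


def extract_values(lines):
    """Collect the values from the lines of one bag-info.txt file."""
    values = []
    current = None  # value being concatenated, or None if nothing is pending
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        if line.startswith(CONTINUATION_PREFIX):
            # Continuation line: append to the pending value
            # (assumption: parts are joined with one space).
            if current is not None:
                current = current + " " + line.strip()
            continue
        # A new label starts here, so store the previous value first.
        if current is not None:
            values.append(current)
        label, _sep, value = line.partition(":")
        value = value.strip()
        if label.strip() == "Bag-Size":
            # Bagger writes Bag-Size last, so store it immediately.
            values.append(value)
            current = None
        else:
            current = value
    # Assumption: keep a trailing value if the file did not end with Bag-Size.
    if current is not None:
        values.append(current)
    return values


sample = [
    "Source-Organization: Example Library\n",
    "External-Description: A long description that\n",
    "   continues on a second line\n",
    "Bagging-Date: 2015-01-01\n",
    "Bag-Size: 1.2 MB\n",
]
row = extract_values(sample)

# Write the list as one CSV row (to an in-memory buffer for this example).
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
```

Splitting on the first colon only keeps values that themselves contain colons (such as URLs or timestamps) intact.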

Download the script from GitHub:

https://github.austin.utexas.edu/glib/bag-info_extractor