Using OpenRefine and xml_split to create DAMS MODS XML from spreadsheet

The following was adapted from the blog post "Converting Spreadsheets into MODSXML using Open Refine" by The Digital Scholarship Unit (DSU) at the UTSC Library. When archives metadata can be exported to spreadsheet, it is possible to use OpenRefine and xml_split to create a DAMS MODS XML for each record in the spreadsheet. The following is written for Mac Sierra. Windows 10 documentation forthcoming.

What you need:

OpenRefine v2.8 or later from GitHub.

Your metadata spreadsheet. For example, Architecture's scan aaa-batch00290.xlsx

A DAMS MODS record. For example, the James Riely Gordon Collection MODS Record.zip

What you do:

Make a customized MODS template based on a MODS record

Optionally, if you want to skip creating a customized MODS template, download sample MODS template here.

To make a customized MODS template, open the MODS record in your preferred text editor. Example screenshot.

Replace line 1 of your MODS record with these lines
<!-- For Prefix -->
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd">

<!-- For Row Template -->

Example screenshot.

For each variable value in your metadata, replace the fixed string value with a jsonize(value) expression matching the corresponding metadata field in your spreadsheet
{{jsonize(cells["title"].value).replace('"','')}}

Example screenshot.

Note, the expression is case sensitive, so the value in between the double quotes, must case-match the corresponding value in the metadata field. Example screenshot.

For each fixed value in your metadata spreadsheet, replace the corresponding string value in the MODS record. Example screenshot.

Delete the DAMS generated lines for identifier and relatedItem. Example screenshot.

Paste these lines to the bottom of your MODS record
<!-- Don't put anything in Row Separator -->
<!-- For Suffix -->
</modsCollection>

Example screenshot.

When edits are complete, save the file with a unique name distinguishing it as a DAMS MODS template. For example, dams-mods-template-repository-collection.xml.

Configure an OpenRefine project with imported metadata and a template, then export the results

Optionally, if you want to skip these steps, you can download an export file sample here.

Install OpenRefine to your applications folder per the usual steps for installing Mac Disk Image files. If you see the security dialog "OpenRefine can't be opened because it is from an unidentified developer", then launch security and privacy, and authenticate with your admin credentials to open the application anyway.

In OpenRefine, we'll want to import our metadata spreadsheet by clicking choose files. For example we'll open Architecture's spreadsheet, then click next to upload the data. Optionally, you can configure parsing options and preview the data before creating a project. In this example, configuring parsing options will be skipped. Click Create Project. Open Templating from the Export menu. The templating export dialog is split into a left-hand templating side and a right-hand preview side (screenshot). Replace the left-hand side Prefix, Row Template, and For Suffix content with the corresponding stanzas from the DAMS MODS template (screenshot). Verify the values in preview, editing the template strings or expressions as needed, then click export. Rename the exported file with .xml (screenshot).

Run xml_split to create a DAMS MODS XML for each record

Open Terminal on your Mac.

Login with admin credentials.
login adminusername
Install XML::Twig from CPAN
sudo cpan XML::Twig 

Press 'y' to install all tools. If you don't have a compiler installed on your system, you'll be prompted to get one. A compiler is required for the installation to be successful. Xcode is recommended, and was used in this documentation. If XML::Twig installation is successful, the last installation line will read "/usr/bin/make install  -- OK." Type the logout command to return to your user prompt.

Run xml_split on the DAMS MODS XML file
xml_split /path/to/your/openrefine-export-file.txt.xml

Hint, drag and drop from finder into your terminal to avoid having to type the path to the export file.

If your terminal returns an error like "not well-formed (invalid token) at line 151, column 130, byte 8952 at /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level/XML/Parser.pm line 187." It might be due to an illegal character. In our example, an illegal ampersand(&) is used in the string. Ampersands are escape characters in XML. To print literal ampersands, replace & with the HTML named entity &amp; Example screenshot.

When xml_split completes, you'll have a DAMS MODS XML for each record in your metadata spreadsheet. In our example, we get three records. These records are now ready for ingest into DAMS.