Metaman to Spreadsheet

The following procedure covers creating a master spreadsheet from a copy of Metaman records for use in migration and preservation purposes. A single Metaman record is actually composed of three records (work, item and custom) split among two XML files. This procedure attempts to automate or batch steps as much as possible. The output of these steps will also hopefully serve as a backup of Metaman work as the site is destined for retirement.

What you need (software versions used when making this wiki)

A PC (Windows 10 Version 1803, Build 17134.376) with FTP client (FileZilla 3.37.4), Bash subsystem (Ubuntu 18.04), XML/Text Editor (Oxygen XML Editor 16.0), and OpenRefine (3.0).

What you do

Create a copy of Metaman records using FTP

Open your FTP client. Connect to host sftp://legacy.lib.utexas.edu with your UNIX account credentials (screenshot.) If you don't have a UNIX account, Digital Stewardship can create a copy of the records for you. The remote location /home/metaman/bar/ contains Dublin Core and custom metadata in XML. Download a copy to a local location (screenshot.)

(Or download a metaman-sample.zip)

Join all XML into a single XML using Bash

Open your linux shell environment (screenshot.)

cd /path/to/your/files
ls -1 */*.xml | xargs -i cat {} > /path/to/your/files/master.xml

(Or download a sample-master.zip)

Remove/Replace extraneous strings using an XML editor

(A text editor like Notepad++ also works, but the ability to use XPath makes search and replace much easier and less prone to human error.)

Make a backup copy of master.xml. Open master.xml. In order for XPath to work we need to make this document well formed. This begins with removing all but the first XML declaration (replace to end, second occurrence of XML declaration (screenshot.) Then, add a top-level wrapping tag (i.e. <wrap></wrap> screenshot2) so XPath works, then save file. Select the first work string to replace, right-click and copy XPath, then CTRL+F (screenshot.) Repeat for all work tags. Then replace all item tags using the same method. Then replace all custom metadata tags using the same method, making sure to change the metadata element tag to <dcvalue element="example">. Here's a before and after example of all strings replaced. Replace all <work> with <mm>. Replace all </travel> with </mm>. Remove all <travel>. Remove all </work>. Remove all </item>. Remove all </dublin_core>. Remove all <dublin_core>. Remove all <item>. Replace all </metadata> with </dcvalue>. No need to remove all empty lines.

(Or download a sample-completed-master.zip)

Transpose key/value columns in OpenRefine

Export Spreadsheet and Project Configuration in OpenRefine

Modify export results in Excel. Insert table (this table has columnheaders). Format cells, enable text wrapping. Spot check results for accuracy against Metaman. Backup Metaman files, spreadsheet, and OpenRefine project export to UTBox.