This guide introduces the use of regular expressions for mass editing of metadata, in this case applied to XML finding aids. This workflow can be useful for making updates to fit TARO Best Practice guidelines or for remediating harmful language across multiple finding aids. It can be adapted for use in any XML editor (Notepad++, Oxygen, etc.). The examples below use Notepad++ and Oxygen.
Tip: For more information about regular expressions (regex), check out the Tools section below.
Tools
Workflow
Determine which set of finding aids you want to edit. These finding aids should be at the same directory level. Making edits across a single directory allows for easier quality control, because edited values and any errors should appear consistently across the items within it.
Which regular expressions to use depends on what edits need to be made to the finding aids. The examples below detail three scenarios:
This example scenario adds a new controlled access section to a set of finding aids. Open the Find window by pressing Ctrl + F or by using the "Find" tab in the top menu: "Search → Find → Find in Files" in Notepad++, or "Find in Files" in Oxygen. Choose the folder of files to edit in the "Directory" dialog box. This view can be seen in the screenshots below.
Next, determine where to insert the controlled access section. In this example, some finding aids have a related material section and some do not. Two find and replace actions will be needed in order to place the controlled access section after related material or user restrictions. Before implementing any find + replace actions, the following options should be chosen in the "Find in Files" menu.
First, use the "Find All" button in the "Find in Files" window to see all instances of the closing </relatedmaterial> or </userestrict> tag. More than one value can be searched for by placing | between the values. "Find All" returns a list of results, as seen in the screenshot below, indicating which line of each finding aid the content is on.
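The same alternation search can be reproduced outside the editor. The following is a minimal Python sketch, not part of the original workflow: the function name `find_closing_tags` is hypothetical, and it assumes the finding aids are UTF-8 `.xml` files sitting directly in one folder.

```python
import re
from pathlib import Path

# The | operator matches either closing tag, as in the editor search.
CLOSING_TAG = re.compile(r"</relatedmaterial>|</userestrict>")


def find_closing_tags(folder):
    """Return (file name, line number, matched tag) for every hit.

    Hypothetical helper: scans only *.xml files directly inside `folder`.
    """
    hits = []
    for path in sorted(Path(folder).glob("*.xml")):
        text = path.read_text(encoding="utf-8")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in CLOSING_TAG.finditer(line):
                hits.append((path.name, line_no, match.group()))
    return hits
```

Like "Find All", the result lists which file and line each tag sits on, which shows at a glance which finding aids have a related material section and which have only user restrictions.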
There are two finding aids that have only a user restrictions section and two that have a related material section. The find + replace action will need to be run twice, once for each of these variations. This can be accomplished by separating each set into its own folder, then using "Replace in Files" in the "Find in Files" menu to make changes to all finding aids in that folder (ensure the correct folder is selected in the "Directory" drop-down box).
Next, enter the <controlaccess> section and its related text and elements in the "Replace with" dialog box. In the replacement text, the </relatedmaterial> tag is placed first, to ensure the controlled access section appears after it. The <head> tag follows, indicating the specific kind of subject terms being added; in this example, they are geographic terms. The \n (newline) and \t (tab) regex syntax recreates the hierarchical structure needed in a finding aid. Getting the exact formatting may take some trial and error.
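As an illustration of how such a replacement string is put together, here is a small Python sketch using `re.sub`, which expands `\n` and `\t` in the replacement much as the editors do in regular expression mode. The heading text, the `<geogname>` value, and the one-tab indentation are placeholders chosen for this example, not the values used in the original finding aids.

```python
import re

FIND = r"</relatedmaterial>"

# The closing tag comes first so the new section lands after it.
# Heading text and place name below are hypothetical placeholders.
REPLACE = (
    r"</relatedmaterial>\n"
    r"\t<controlaccess>\n"
    r"\t\t<head>Geographic Names</head>\n"
    r"\t\t<geogname>Place name (hypothetical)</geogname>\n"
    r"\t</controlaccess>"
)

# A tiny stand-in for one finding aid.
sample = (
    "\t<relatedmaterial>\n"
    "\t\t<p>Related collection.</p>\n"
    "\t</relatedmaterial>\n"
    "\t<dsc>\n"
)

result = re.sub(FIND, REPLACE, sample)
print(result)
```

Printing the result makes the indentation visible, which helps when adjusting the number of `\t` characters to match the surrounding elements.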
Repeat this process for finding aids that have only a </userestrict> section.
In Oxygen, you can preview the results of the regular expression before running it on the folder of finding aids. Here, we can see that the changes are incorrectly being applied to the </userestrict> tag, not the </relatedmaterial> tag. The following screenshot shows the results of the find + replace regex action.
This example scenario adds content statements to a set of finding aids that already have a processing information section. Open the Find window by pressing Ctrl + F or by using the "Find" tab in the top menu: "Search → Find → Find in Files" in Notepad++, or "Find in Files" in Oxygen. Select the "Find in Files" tab and choose the folder of files to edit in the "Directory" dialog box. This view can be seen in the screenshot below.
Next, determine where to insert the content statement text. Following UT Libraries guidelines, this should be entered in the processing information section. In these example finding aids, the processing information section already contains a statement indicating who created the finding aid. The content statement will need to be placed after it and properly indented underneath the <processinfo> and <head> tags.
Before implementing any find + replace actions, the following options should be chosen in the "Find in Files" menu.
First, use the "Find All" button in the "Find in Files" window to see all instances of the closing </processinfo> tag. "Find All" returns a list of results, as seen in the screenshot below, indicating which line of each finding aid the content is on. This step shows how many finding aids have the section (for any that do not, another strategy will need to be used) and whether the tag is repeated in other sections. In this case, the tag is unique enough to ensure the content statement will be added only in that section.
Next, enter the content statement text and its related <p> (paragraph) tags in the "Replace with" dialog box. \t means "tab" and creates an indent. \n means "newline" and puts the </processinfo> tag on the following line. The replacement text indents the content statement twice, then creates a new line, and finally indents twice again to put the </processinfo> tag underneath the content statement.
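The pattern described above (two tabs, the statement, a newline, two tabs, the closing tag) can be sketched in Python as follows. The content statement wording and the sample finding aid text are placeholders for illustration only; substitute the statement required by your own guidelines.

```python
import re

FIND = r"</processinfo>"

# Two tabs, the statement, a newline, then two tabs and the closing tag.
# The statement text is a hypothetical placeholder.
REPLACE = r"\t\t<p>Placeholder content statement.</p>\n\t\t</processinfo>"

# Stand-in for a finding aid whose closing tag has no leading indentation.
sample = (
    "<processinfo>\n"
    "<head>Processing Information</head>\n"
    "<p>Finding aid created by staff.</p>\n"
    "</processinfo>\n"
)

result = re.sub(FIND, REPLACE, sample)
print(result)
```

Note that any whitespace already sitting in front of </processinfo> in the file is kept, so the tab count in the replacement may need adjusting to the existing indentation.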
In Oxygen, you can preview the results of the regular expression before running it on the folder of finding aids. Here we can see that the content statement is being added correctly after the processing statement. Select "Replace in Files" in the "Find in Files" menu to make this change to all finding aids in the folder. The following screenshot shows the results of the find + replace regex action.
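For editors or scripts without a built-in preview, a similar dry run can be approximated with a diff. This is a rough Python sketch using the standard `difflib` module; the function name `preview_replace` and the default file label are hypothetical, and nothing is written to disk.

```python
import difflib
import re


def preview_replace(text, pattern, replacement, name="finding_aid.xml"):
    """Return a unified diff of what a regex replace would change.

    Hypothetical helper: returns an empty string when nothing matches.
    """
    new_text = re.sub(pattern, replacement, text)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=name,
        tofile=name + " (after replace)",
    )
    return "".join(diff)


sample = "<processinfo>\n<p>Old statement.</p>\n</processinfo>\n"
print(preview_replace(sample, r"</processinfo>",
                      r"\t\t<p>New statement.</p>\n\t\t</processinfo>"))
```

Lines prefixed with `-` would be removed and lines prefixed with `+` would be added, so a misplaced insertion is visible before any file is changed.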
This example scenario edits a set of existing subject terms that contain harmful language. Open the Find window by pressing Ctrl + F or by using the "Find" tab in the top menu: "Search → Find → Find in Files" in Notepad++, or "Find in Files" in Oxygen. Select the "Find in Files" tab and choose the folder of files to edit in the "Directory" dialog box. This view can be seen in the screenshot below.
Next, examine where the subject terms you want to edit are located in the finding aids. Depending on the language, the terms may also appear in other parts of the finding aid. It can be helpful to leave "Match case" unchecked in order to catch terms with varied capitalization. In this case, the only instance of the term is in the subject terms section. If the term also appeared in parts of the finding aid that should not be edited, including the element tag in the search would confine the changes to the subject section.
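To illustrate how an element tag confines a case-insensitive replacement, here is a minimal Python sketch. The old and new terms are neutral placeholders, and the assumption that the term directly follows an opening <subject> tag (with any attributes) is made for this example only.

```python
import re

OLD_TERM = "Outdated term"    # hypothetical placeholder
NEW_TERM = "Preferred term"   # hypothetical placeholder; contains no backslashes

# Capture the opening <subject ...> tag so it can be written back unchanged.
# re.IGNORECASE mirrors leaving "Match case" unchecked.
PATTERN = re.compile(
    r"(<subject\b[^>]*>)" + re.escape(OLD_TERM),
    re.IGNORECASE,
)

sample = (
    "<p>Papers mention an outdated term in passing.</p>\n"
    '<subject source="lcsh">OUTDATED TERM--Texas</subject>\n'
)

# \g<1> re-inserts the captured opening tag ahead of the new term.
result = PATTERN.sub(r"\g<1>" + NEW_TERM, sample)
print(result)
```

Only the occurrence inside the subject element changes; the same words in the paragraph are left alone because they are not preceded by the <subject> tag.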
Select "Replace in Files" in the "Find in Files" menu to make this change to all finding aids in the folder.
In Oxygen, you can preview the results of the regular expression before running it on the folder of finding aids. Here we can see that the term is being successfully replaced. The following screenshot shows the results of the find and replace regex action.