OAI-PMH Documentation

Overview

This wiki page documents the OAI-PMH configurations created for the Blanton Museum of Art, Harry Ransom Center, and UT Libraries Collections Portal as part of the Art and Cultural Heritage Collective (ACHC) grant, funded by the Mellon Foundation. It covers Primo's OAI-PMH service, the Dublin Core mappings, and background on how those mappings were tested and developed. Information may change as the grant project progresses.



These mappings can be adapted or used as a guide by UT Libraries repositories that wish to contribute records to Primo. If interested, please contact Devon Murphy at devon.murphy@austin.utexas.edu.


OAI-PMH Basics


OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a protocol built around six requests, or "verbs," sent over HTTP (for example, from a web browser) to retrieve metadata records. Institutions acting as "service providers" can harvest and aggregate metadata from other sites, or "data providers," as long as those data providers maintain an OAI-PMH service (see the graphic below).
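Each OAI-PMH request is an HTTP GET whose `verb` parameter names one of the six operations. The minimal sketch below only builds request URLs to show that structure; the endpoint `https://example.org/oai` and the helper `build_request` are hypothetical and are not part of Primo or any partner system.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH specification.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Return an OAI-PMH request URL for one verb plus its arguments (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # Extra keyword arguments become query parameters, e.g. metadataPrefix or set.
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


# Hypothetical endpoint, used only for illustration.
BASE = "https://example.org/oai"

print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords", metadataPrefix="oai_dc"))
```

Because `from` is a Python keyword, a date-limited harvest would pass it as `**{"from": "2024-01-01"}`.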


OAI-PMH Overview


Primo has an OAI-PMH module that can accept records in various XML formats. To create a harvest, Primo needs basic information about the data provider (name, source data type, and metadata schema) and a stable link to its OAI-PMH endpoint. More information about Primo's OAI-PMH service can be found in the presentation slides below.
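The information a harvest needs can be treated as a small checklist to confirm with a data provider before setup. The dataclass below is a hypothetical sketch of that checklist, not Primo's configuration format; the field names and example values are illustrative assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class HarvestSource:
    """Hypothetical checklist of what is needed before a harvest can be created."""
    name: str              # data provider name
    source_data_type: str  # source data type (illustrative value below)
    metadata_schema: str   # metadataPrefix exposed by the endpoint, e.g. "oai_dc"
    endpoint: str          # stable base URL of the OAI-PMH endpoint

    def problems(self) -> list[str]:
        """Return missing or malformed fields; an empty list means the checklist is complete."""
        issues = []
        for field_name in ("name", "source_data_type", "metadata_schema", "endpoint"):
            if not getattr(self, field_name).strip():
                issues.append(f"missing {field_name}")
        parsed = urlparse(self.endpoint)
        # Only check URL shape when something was supplied.
        if self.endpoint.strip() and (parsed.scheme not in ("http", "https") or not parsed.netloc):
            issues.append("endpoint is not an absolute http(s) URL")
        return issues


ok = HarvestSource("Example Museum", "XML", "oai_dc", "https://example.org/oai")
bad = HarvestSource("Example Museum", "XML", "", "example.org/oai")
print(ok.problems())
print(bad.problems())
```

The URL check is deliberately shallow: it catches a relative or scheme-less link, but it cannot tell whether the endpoint is stable or actually responds.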



Current OAI-PMH Mappings


The spreadsheet below provides an overview of the main Dublin Core mapping and the three partner-specific mappings. Use the tabs underneath the spreadsheet to switch between mapping views.

OAI-PMH mappings for all three grant partners


Because DPLA requires Title, Rights, and Identifier (as well as the name of the repository), these fields are also required in our OAI-PMH service. Only one partner, the Harry Ransom Center, uses Qualified Dublin Core.
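A harvested simple Dublin Core (`oai_dc`) record can be checked for the required elements before it reaches Primo. The sketch below assumes the standard Dublin Core element namespace and checks only Title, Rights, and Identifier; where the repository name is recorded depends on each mapping, so it is left out. Qualified Dublin Core records, such as the Harry Ransom Center's, use the `http://purl.org/dc/terms/` namespace and would need an adjusted check. The sample record is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Required Dublin Core elements (repository name handled separately).
REQUIRED = ("title", "rights", "identifier")


def missing_required(record_xml: str) -> list[str]:
    """Return the required Dublin Core elements that are absent or empty."""
    root = ET.fromstring(record_xml)
    missing = []
    for element in REQUIRED:
        values = [
            e.text.strip()
            for e in root.iter(f"{{{DC_NS}}}{element}")
            if e.text and e.text.strip()
        ]
        if not values:
            missing.append(element)
    return missing


# Hypothetical record with no dc:rights element.
sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Untitled landscape</dc:title>
  <dc:identifier>https://example.org/objects/123</dc:identifier>
</oai_dc:dc>"""

print(missing_required(sample))
```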


Development


These mappings are based partly on DPLA (Digital Public Library of America) and TexHub harvesting requirements, and partly on the metadata fields shared by the three grant partners. Collections from other proposed institutions, such as the Visual Resources Center collection, were also analyzed. This approach created record parity across three diverse collections and prepared them for possible future aggregation into DPLA.

Mappings were tested in the Primo Sandbox environment, following the Sandbox refresh schedule. The main challenges came from data modeling in CONTENTdm, reliance on vendors to implement OAI-PMH services and settings, and Primo's resource types. In further detail:


  • CONTENTdm allows users to ingest multivalued fields into single cells. Although CONTENTdm can split these values for display, that split does not apply to OAI-PMH harvesting, which retrieves the stored value as-is. This becomes an issue for fields with variable lengths, because Primo has limited regular expression support.
  • Primo can split fields, but it has limited support for complex regular expressions. Multivalued fields with variable length or inconsistent formatting cannot be reliably split into separate fields.
  • Some partners used in-house content management systems maintained by their own IT departments, while others relied on vendor-supplied IT support. Vendor-supported setups could be challenging. For example, Gallery Systems provides only limited options for users to set up their OAI-PMH feeds, and any change to fields or mappings has to be handled by the vendor. Without direct control over this process, partners often faced long wait times and difficult communication.
  • Primo can sort and facet records by type (image, text, audio, etc.), reading these values from the dc:type field. If a value does not match the expected values in Primo's Resource Type mapping, the item is mapped as "Other." This became an issue because some partners did not use DCMI Type terms, or had multivalued fields that mixed DCMI and non-DCMI terms.
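The splitting problem described above can be illustrated with a plain delimiter split. This is a minimal Python illustration, not Primo's own normalization rules, and the cell values are hypothetical: a consistently delimited cell splits cleanly, while a cell with mixed delimiters, or commas that belong inside a value, does not.

```python
def split_multivalued(cell: str, delimiter: str = ";") -> list[str]:
    """Split one harvested cell on a single fixed delimiter, dropping empty parts."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


# Consistently delimited cell: a simple split recovers each value.
consistent = "Painting; Oil on canvas; Landscape"
print(split_multivalued(consistent))

# Inconsistent cell: mixed delimiters, and commas that are part of each name.
inconsistent = "Smith, John; Doe, Jane | Roe, Richard"
print(split_multivalued(inconsistent))       # the pipe-delimited pair stays joined
print(split_multivalued(inconsistent, ","))  # splitting on commas breaks the names
```

No single delimiter handles the second cell, which is why remediating the source data tends to be more reliable than splitting at harvest time.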


Takeaways

  • Encourage data providers to conduct metadata remediation before harvest implementation. After review of the harvested records, metadata remediation or remapping is recommended if errors persist. Primo has limited ability to edit fields.
  • Encourage data providers to investigate how their metadata fields are modeled (data value type, format, etc.). This modeling could have unexpected effects on harvest and display.
  • Migrations or changes in metadata practice on the data provider's end can affect the reliability or display of their harvested records.
  • How often the service provider refreshes harvested records varies. Primo updates frequently, but recent changes on the data provider's end may not appear immediately.
  • Combined fields can cause problems for record display and indexing.
  • Not all fields are displayed or harvested. Aggregated records are not meant to replace the original record, but instead serve as a portal to the original source. Focus on shared fields across contributing institutions.
  • Primo's type facet relies on DCMI Type terms for its internal mapping. If data providers want their resources to appear under the correct type in this facet, their dc:type field should use DCMI terms; otherwise, records are assigned the "Other" label.
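The "Other" fallback can be sketched as a lookup against the DCMI Type Vocabulary. The facet labels and the mapping below are hypothetical stand-ins for Primo's Resource Type mapping, which is configured within Primo itself. The sketch shows why a non-DCMI value, or an unsplit combined field, lands in "Other," and how a hypothetical split-then-match step would recover a type.

```python
# DCMI Type Vocabulary terms.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Hypothetical facet labels; the real mapping lives in Primo's configuration.
FACET_LABELS = {
    "Image": "Images", "StillImage": "Images", "MovingImage": "Video",
    "Sound": "Audio", "Text": "Text", "PhysicalObject": "Objects",
}


def resource_type(dc_type: str) -> str:
    """Map one whole dc:type value to a facet label, falling back to "Other"."""
    value = dc_type.strip()
    if value in DCMI_TYPES:
        return FACET_LABELS.get(value, "Other")
    return "Other"


def resource_type_from_cell(cell: str, delimiter: str = ";") -> str:
    """Hypothetical remediation: split a combined cell and use the first DCMI term found."""
    for part in cell.split(delimiter):
        if part.strip() in DCMI_TYPES:
            return resource_type(part)
    return "Other"


print(resource_type("StillImage"))
print(resource_type("Painting"))                       # not a DCMI term
print(resource_type("Painting; StillImage"))           # combined field, compared whole
print(resource_type_from_cell("Painting; StillImage"))
```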