DCMI 2014: Archival Fonds and Bonds Preconference

  • Dr. Ennis:

    • he encourages collaboration, thinks better answers will be arrived at together

  • Gavan McCarthy and Daniel Pitti

    • Expert Group on Archival Description (International Council on Archives)

      • Indigenous Australians - intergenerational transfer of knowledge

      • archivists have more in common with archaeologists, systems analysts, and forensic scientists than with librarians

      • because archives are fragments of evidence of what actually happened, describing them at the item level would leave many records beyond understanding

        • if you only look at the items, you are pulling them out of their context of meaning

      • economy is a factor

        • 10 human lifetimes in order to make sense of one human lifetime of activity - Galileo Archive

        • creating a complex graph might be ideal but not practical

        • hierarchy has been the answer in the past

    • ICA (Paris Headquarters) EGAD

      • 1990-2008

      • 2012-2016 Expert Group on Archival Description (inspired by the museum and library communities - FRBR and CIDOC CRM)

        • conceptual model for archival description

        • records in context (RIC)

    • Historical Context

      • since the mid-19th century, the cultural heritage community has been involved in the following:

        • reimagining description to account for new communication technologies (think of the conversation on the intersection between digital-library, item-level description and archival, aggregate-level description)

        • Trends:

          • separate components of description

          • more efficiently and more effectively use/recreate prevailing access tools

          • enable new tools based on recombination of existing standards

          • the 4 ICA standards reflect all of those trends

          • separation and new perspectives were not realized because the standards did not say exactly how it could or would be done - adoption was low and very few systems implemented these standards

    • Current Technologies

      • graph technologies: RDF, semantic web, linked open data

        • more expressive but harder to implement

        • opportunities: separation (discretization), recombining, interrelating, break down domain borders

    • So, the model that EGAD is working on is an attempt to take advantage of all of these trends and tools

      • 3 teams, 2 products

        • RIC Ontology (product)

          • high-level model of the world as perceived by archivists, implemented in OWL

          • foundation for interrelating the archival community with the library and museum communities (CIDOC CRM / FRBRoo)

          • high-level classes that include the high-level descriptive entities: agent, record, record set, function, mandate
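
          • A hypothetical OWL (RDF/XML) sketch of what declaring those high-level classes might look like; the model had not been published at this point, so the namespace, class names, and the "includes" property below are illustrative assumptions, not part of the talk:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">

  <owl:Ontology rdf:about="http://example.org/ric#"/>

  <!-- hypothetical high-level descriptive entities named in the talk -->
  <owl:Class rdf:about="http://example.org/ric#Agent"/>
  <owl:Class rdf:about="http://example.org/ric#Record"/>
  <owl:Class rdf:about="http://example.org/ric#RecordSet"/>
  <owl:Class rdf:about="http://example.org/ric#Function"/>
  <owl:Class rdf:about="http://example.org/ric#Mandate"/>

  <!-- illustrative relationship: a record set includes records -->
  <owl:ObjectProperty rdf:about="http://example.org/ric#includes">
    <rdfs:domain rdf:resource="http://example.org/ric#RecordSet"/>
    <rdfs:range rdf:resource="http://example.org/ric#Record"/>
  </owl:ObjectProperty>
</rdf:RDF>
```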

        • RIC Conceptual Model (product)

          • subset of ontology

          • heir to 4 ICA standards

          • high level descriptive entities: agent, record, record set, function, mandate

        • Principles and Terminology

          • ensure RIC is grounded in existing archival principles

          • make sure the terms are unambiguous

          • language translations

  • EAD3 (Michael Rush)

    • points of emphasis about the revisions:

      • achieving greater semantic and conceptual consistency (addresses the complaint of too many ways to do the same thing)

      • interoperability with other structured archival data, e.g., EAC-CPF

      • addressing international community

      • mindful that the new version will affect current users

    • change summary

      • replaced <eadheader> with <control> (more in line with ICA standards and emergence of EAC)

        • one major thing <control> doesn’t do is keep track of what <filedesc> does

        • children of <control>: <recordid>, <otherrecordid>, <representation> (HTML version of the XML record), <filedesc>, <maintenancestatus>, <maintenanceagency>, <languagedeclaration>, <conventiondeclaration>, <maintenancehistory>
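
        • A minimal sketch of an EAD3 <control> block assembled from the children listed above; the attribute names and example values are my assumptions for illustration and may differ from the final schema:

```xml
<control>
  <recordid>example-finding-aid-001</recordid>
  <otherrecordid localtype="legacy">old-id-42</otherrecordid>
  <!-- pointer to an HTML rendering of this XML record (example URL) -->
  <representation href="https://example.org/findingaids/example-001.html">HTML version</representation>
  <filedesc>
    <titlestmt>
      <titleproper>Guide to the Example Papers</titleproper>
    </titlestmt>
  </filedesc>
  <maintenancestatus value="new"/>
  <maintenanceagency>
    <agencyname>Example Repository</agencyname>
  </maintenanceagency>
  <languagedeclaration>
    <language langcode="eng">English</language>
    <script scriptcode="Latn">Latin</script>
  </languagedeclaration>
  <conventiondeclaration>
    <citation>DACS</citation>
  </conventiondeclaration>
  <maintenancehistory>
    <maintenanceevent>
      <eventtype value="created"/>
      <eventdatetime standarddatetime="2014-10-08">8 October 2014</eventdatetime>
      <agenttype value="human"/>
      <agent>Finding aid author</agent>
    </maintenanceevent>
  </maintenancehistory>
</control>
```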

      • modified the <did> elements

        • removed extended linking; only simple XML linking remains

        • <physdescstructured>

          • all the stuff in physdesc will be migrated straight to simple physdesc

          • future of physdescstructured

            • required attributes (coverage, type) and child elements (quantity, unittype, physfacet, dimensions, descriptivenote)

            • you can combine structured statements: if two statements each describe a part, you can describe the whole by linking them (see the sketch below)
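
            • A minimal sketch of two combined <physdescstructured> statements; the attribute and element names follow my reading of the EAD3 draft and may differ slightly from the final schema, and the values are invented:

```xml
<!-- two "part" statements that together describe the whole extent -->
<physdescstructured coverage="part" physdescstructuredtype="carrier">
  <quantity>10</quantity>
  <unittype>boxes</unittype>
  <dimensions>15 x 30 x 40 cm</dimensions>
</physdescstructured>
<physdescstructured coverage="part" physdescstructuredtype="carrier">
  <quantity>3</quantity>
  <unittype>oversize folders</unittype>
  <physfacet>photographs</physfacet>
</physdescstructured>
```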

      • adopted <relations> (key to staying in sync with ICA work)

        • supporting the use of linked data in archival description
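
        • A hedged sketch of the experimental <relations> element adopted above: the child elements follow the EAD3 draft, while the relationtype/href attribute names and the target URI are assumptions for illustration:

```xml
<relations>
  <relation relationtype="cpfrelation"
            href="http://socialarchive.iath.virginia.edu/ark:/99166/example">
    <relationentry>Briscoe, Dolph (related creator, example)</relationentry>
    <descriptivenote>
      <p>Links this description to an external authority record by URI.</p>
    </descriptivenote>
  </relation>
</relations>
```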

      • updated access term elements

      • disentangled descriptive elements (getting arrangement info out of the scope and content notes)

      • replaced <note> - new elements available for each formerly semantic use of note

      • made block elements more like HTML

      • updated <chronlist>

      • simplified linking elements

      • added support for multilingual description

      • streamlined mixed content models (only 3 mixed content models)

      • made global attribute changes

      • deprecated some elements from EAD2002 (meaning no migration path from EAD3 to EAD4 for those elements)

      • The hard parts:

        • folks wanted to adopt the camel case naming convention in EAC-CPF

        • ultimately kept long element names and no camelCase because changing would have been a pain

        • <relations> presented political challenges - people wondered whether relations data is appropriate for archival description, so it is considered an experimental element in EAD3, and they want people to use it

        • working towards convergence of EAD and EAC - and integrating documentation

        • we need a single super schema

        • governance - they are moving to a single group for both EAC and EAD

        • they are moving to an ongoing revision cycle instead of every 8 years

        • github repository

  • Europeana Data Model, Archival Hierarchy (Archives Portal of Europe and Europeana)

    • Archives Portal

      • the Archives Portal Europe Foundation is their shift toward sustainability

      • 30 countries in the Portal

      • 683 institutions

      • 1044 creator records

      • 335K EAD files

      • apeEAD for finding aids

      • apeEAC-CPF for records creators

      • EAG2012 for descriptions of the institutions themselves and their services

      • future: apeMETS and apeMETSRights for structured information on digital archival objects

      • decentralized responsibility for the use of standards

      • local and central tools for conversion and validation

      • local and central tools for the creation and editing of data

    • Europeana

      • CC0 license for all the metadata

      • europeana ecosystem

        • they work with aggregators, one aggregator per domain (national, regional, thematic, audiovisual, libraries, archives portal)

      • Challenge was to accommodate all that data in one data model

        • archive data was a challenge to integrate (DPLA still has not solved this problem; why aren’t they asking Europeana?)

        • ESE (item-centric, based on Dublin Core) trying to merge with EAD (aggregate-level, highly structured)

        • Europeana Data Model (EDM)

          • reuses several semantic web-based models

          • uses semantic web representation principles (RDF)

            • reuse and mix different vocabularies

            • preserve original data, allow for interoperability

          • When aggregators submit to Europeana, they send their data in an EDM bundle: a description of the original object (the cultural heritage object), a description of a digital representation of the original object, a description of the web resource, and descriptions of “contextual entities” (data about related entities, e.g., timespan, concept, etc.)

          • connecting archives to other domains (libraries, museums): create a semantic layer on top of cultural heritage objects

          • representing hierarchies in EDM

            • level of cultural heritage object: fonds, series, subseries, file, item (semantic structure of the object, but also hierarchies at the level of the agent - person within an organization)

            • level of web resource

          • The fonds and every lower level of an arrangement requires its own EDM bundle - the cultural heritage object might be the series, the web resource might be the landing page on the repository website, etc.

          • Once you have created the bundles for each level of arrangement, you start linking the bundles together to re-represent the hierarchy using “isPart”, “isParent”, “isNextInSequence”

          • You can express hierarchies at each level (the cultural heritage object, the web resource, etc.)
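
          • A minimal RDF/XML sketch of two EDM bundles linked into a hierarchy as described above; the class and property names (edm:ProvidedCHO, ore:Aggregation, dcterms:isPartOf, edm:isNextInSequence) are from published EDM, while the URIs and titles are invented:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:edm="http://www.europeana.eu/schemas/edm/"
         xmlns:ore="http://www.openarchives.org/ore/terms/">

  <!-- bundle for the parent level (e.g., a series) -->
  <edm:ProvidedCHO rdf:about="http://example.org/cho/series-1">
    <dc:title>Correspondence, 1900-1950 (series)</dc:title>
  </edm:ProvidedCHO>
  <ore:Aggregation rdf:about="http://example.org/aggregation/series-1">
    <edm:aggregatedCHO rdf:resource="http://example.org/cho/series-1"/>
    <edm:isShownAt rdf:resource="http://example.org/findingaid/series-1"/>
    <edm:dataProvider>Example Archive</edm:dataProvider>
    <edm:provider>Archives Portal Europe</edm:provider>
    <edm:rights rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </ore:Aggregation>

  <!-- bundle for a digitized item, linked up the hierarchy and to its sibling -->
  <edm:ProvidedCHO rdf:about="http://example.org/cho/item-42">
    <dc:title>Letter, 1923 (item)</dc:title>
    <dcterms:isPartOf rdf:resource="http://example.org/cho/series-1"/>
    <edm:isNextInSequence rdf:resource="http://example.org/cho/item-41"/>
  </edm:ProvidedCHO>
  <ore:Aggregation rdf:about="http://example.org/aggregation/item-42">
    <edm:aggregatedCHO rdf:resource="http://example.org/cho/item-42"/>
    <edm:isShownBy rdf:resource="http://example.org/images/item-42.jpg"/>
    <edm:dataProvider>Example Archive</edm:dataProvider>
    <edm:provider>Archives Portal Europe</edm:provider>
    <edm:rights rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </ore:Aggregation>
</rdf:RDF>
```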

        • The Archives Portal has stepped in to make it easier for repositories to create the EDM bundles for their data - a conversion tool for their finding aid data; they offer a full conversion or a minimal conversion (some partners cannot submit everything because once it’s in Europeana the metadata is licensed as CC0)

        • There are a few more granular access statements for digital archival objects and they are using various CC licensing options

          • METSRights in the future

        • Once the conversion is completed, the repository gets a report about what was added

        • They are only providing the part of the finding aid that has digitized objects - this is how they account for the item-level access issue: they provide items plus some contextual information to go with them

        • Remaining challenges for Europeana: representing hierarchies where one level is missing metadata: maybe you have an item digitized from the series and another from the sub-subseries, but nothing from the subseries

  • ArchivesSpace (Brad Westbrook)

    • Facts

      • possible integration between Islandora and ArchivesSpace

      • technical advisory group charged with import/export, architecture, migration issues, development documentation

      • user advisory council is its own thing - bug testing, user documentation, feature requests, product comparisons, help desk

    • Stack

      • JRuby, Sinatra, Ruby on Rails; jQuery, Twitter Bootstrap; JSON, MySQL, Derby, Solr; RSpec, Selenium, SimpleCov; Git, Ant

    • Interfaces

      • Public

      • Admin

      • Solr

      • API

    • Are they working on something like the conversion tool that Europeana built, which creates metadata for archival digital objects and their place in the hierarchy, matching the DPLA metadata profile? That would be cool. We could send those packages directly to the Portal and potentially have more control over what gets pushed from the Portal to DPLA

    • They have something like Drupal taxonomies so that a given institution has control over ways users can access the collections other than by series, subseries, etc.

    • Their physical storage module is in Collection Management/Locations?

    • Resource Records are more complex (presumably this means top-level record that gives access to the entire hierarchy)

    • Is there an internal validator for the EAD so that a check is done before it is exported?

    • Design objectives of the AS project:

      • integration of core archival functions (accession record creation, physical storage management, archival description, etc.)

      • automated encoding (you can export your data as MODS!?)

      • repurposing of data

      • flexible deployments

    • Still using EAD2002 export - will change in the next year

  • RAMP Project

    • archives and wikipedia

    • how about an iSchool editathon for adding archival collections citations to pages - sponsored by SAA Student Chapter

    • RAMP involved special collections, university archives, and metadata & cataloging

    • the application extracts data from the EAD and EAC-CPF XML

    • the pilot used the Cuban Theater Collection to test 1) how long it would take to create a page, 2) how well the application works, and 3) whether it actually increases traffic

    • the wiki draft created by the tool goes to the communications department, the curators also have an opportunity to review, and then they make the page live; then there is PR on Twitter or whatever social media platform is used by the repository

    • the selection criteria for choosing collections for Wikipedia pages were based on the completeness of the finding aids - the ones with the most biographical notes

    • so it looks like they had to add some things to the finding aid, like rights statements, that would be carried along during the conversion to the wiki draft

    • 1 hour creating each wiki page

    • Web traffic and wiki referrals - outcomes: (DLib, 2013)

      • fewer than 2,000 pageviews before Wikipedia links to collection material were added

      • after links were added, 13000

      • RAMP stats - Wikipedia accounted for 1% (a little over 1,200 page views) of referrals to the finding aid web views (Sept. 2013-2014)

      • the Wikipedia metadata added from finding aids was used to populate the Google knowledge panels that come up when you search for a name

      • a DBpedia URI comes with no extra effort beyond creating the Wikipedia page

      • good return on investment for wikipedia page creation

  • xEAC (pronounced “Zeke”): XForms (a W3C standard, embedded in XML) for EAC-CPF

    • repository dedicated to the study of coins

    • he works with archaeological data more than archives and library data

    • he is interested in using EAC-CPF for archaeological and archival resources

    • XForms interacts with REST services - you can push and pull (see the sketch a few bullets below)

    • XForms processor (Orbeon) - Islandora forms uses XForms, right?

    • Solr, eXist, SPARQL Endpoint

    • user interface, author/editor, converter for different standards to publish to various web services
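
    • A minimal XForms sketch of that push/pull pattern: an instance pulled from a RESTful URI and a submission that puts the edited record back. The URIs and the bound element path are invented, and EAC-CPF namespace handling is omitted for brevity; only the XForms constructs themselves are standard:

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>xEAC-style editor sketch</title>
    <xf:model>
      <!-- pull: load an EAC-CPF record from a RESTful URI (example URI) -->
      <xf:instance id="eac" src="http://example.org/xeac/records/briscoe1.xml"/>
      <!-- push: PUT the edited instance back to the same resource -->
      <xf:submission id="save"
                     resource="http://example.org/xeac/records/briscoe1.xml"
                     method="put" replace="none"/>
    </xf:model>
  </head>
  <body>
    <!-- form control bound to the authorized name inside the loaded record -->
    <xf:input ref="cpfDescription/identity/nameEntry/part">
      <xf:label>Authorized name</xf:label>
    </xf:input>
    <xf:submit submission="save">
      <xf:label>Save</xf:label>
    </xf:submit>
  </body>
</html>
```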

    • challenge for EAC-CPF: maintaining relationships between entities - making sure edits go both ways while only having to edit one record

    • The tool allows you to build your own ontology - but you can also upload OWL or RDF ontologies

    • you can query SNAC and VIAF - so that in your repository universe with your Dolph Briscoe, you can indicate that your Dolph Briscoe is the same as the Dolph Briscoe in SNAC

    • it can do lookups on DBPedia

    • pulls Getty for profession and genres

    • you are essentially embedding all existing URIs associated with parts of the record into the record
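
    • A hedged EAC-CPF fragment showing what embedding those external URIs might look like: the local record for Dolph Briscoe carries SNAC and VIAF links as sources. The ark and VIAF identifiers are placeholders, not real lookups:

```xml
<eac-cpf xmlns="urn:isbn:1-931666-33-4"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <control>
    <recordId>briscoe1</recordId>
    <maintenanceStatus>new</maintenanceStatus>
    <maintenanceAgency>
      <agencyName>Example Repository</agencyName>
    </maintenanceAgency>
    <maintenanceHistory>
      <maintenanceEvent>
        <eventType>created</eventType>
        <eventDateTime standardDateTime="2014-10-08"/>
        <agentType>human</agentType>
        <agent>xEAC editor</agent>
      </maintenanceEvent>
    </maintenanceHistory>
    <sources>
      <!-- external identifiers gathered from SNAC and VIAF lookups (placeholder URIs) -->
      <source xlink:type="simple"
              xlink:href="http://socialarchive.iath.virginia.edu/ark:/99166/placeholder"/>
      <source xlink:type="simple" xlink:href="http://viaf.org/viaf/placeholder"/>
    </sources>
  </control>
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Briscoe, Dolph</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>
```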

    • so the public interface is a map because it takes those location authorities and creates KML files

    • there is also a browse interface that is run by Solr

    • Linked Open Data principles

      • their finding aids are published using an XForms editor that publishes to an RDF triplestore

      • EAD, MODS (for photographs), TEI (document markup)

      • result: archival resources delivered through SPARQL - delivers, in a scalable way, an authority associated with all the records it has a relationship to

      • bio info can be extracted from the EAC record and displayed in the finding aid

      • git repository for xEAC, ead editor at blogspot

    • he only took the elements from EAC that would be useful in a linked data environment