DCMI 2014: Archival Fonds and Bonds Preconference

  • Dr. Ennis:

    • he encourages collaboration, thinks better answers will be arrived at together

  • Gavan McCarthy and Daniel Pitti

    • Expert Group on Archival Description (International Council on Archives)

      • Indigenous Australians - intergenerational transfer of knowledge

      • archivists have more in common with archaeologists, systems analysts, and forensic scientists than with librarians

      • because archives are fragments of evidence of what actually happened, describing them at the item level would leave many records beyond understanding

        • if you only look at the items, you are pulling them out of their context of meaning

      • economy is a factor

        • 10 human lifetimes in order to make sense of one human lifetime of activity - Galileo Archive

        • creating a complex graph might be ideal but not practical

        • hierarchy has been the answer in the past

    • ICA (Paris Headquarters) EGAD

      • 1990-2008

      • 2012-2016 Expert Group on Archival Description (inspired by the museum and library communities - FRBR and CIDOC CRM)

        • conceptual model for archival description

        • records in context (RIC)

    • Historical Context

      • since the mid-19th century, the cultural heritage community has been involved in the following:

        • reimagining description to account for new communication technologies (think of the conversation on the intersection between digital-library, item-level description and archival, aggregate-level description)

        • Trends:

          • separate components of description

          • more efficiently and more effectively use/recreate prevailing access tools

          • enable new tools based on recombination of existing standards

          • the 4 ICA standards reflect all of those trends

          • separation and new perspectives were not realized because the standards did not say exactly how it could or would be done - adoption was low and very few systems implemented these standards

    • Current Technologies

      • graph technologies: RDF, semantic web, linked open data

        • more expressive but harder to implement

        • opportunities: separation (discretization), recombining, interrelating, break down domain borders

    • So, the model that EGAD is working on is an attempt to take advantage of all of these trends and tools

      • 3 teams, 2 products

        • RIC Ontology (product)

          • high-level model of the world as perceived by archivists, implemented in OWL

          • foundation for interrelating the archival community with the library and museum communities (CIDOC CRM / FRBRoo)

          • high-level classes that include the high-level descriptive entities: agent, record, record set, function, mandate
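
          • A hypothetical OWL (RDF/XML) sketch of what declaring those high-level classes might look like; the model had not been published at this point, so the namespace, class names, and the "includes" property below are illustrative assumptions, not part of the talk:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">

  <owl:Ontology rdf:about="http://example.org/ric#"/>

  <!-- hypothetical high-level descriptive entities named in the talk -->
  <owl:Class rdf:about="http://example.org/ric#Agent"/>
  <owl:Class rdf:about="http://example.org/ric#Record"/>
  <owl:Class rdf:about="http://example.org/ric#RecordSet"/>
  <owl:Class rdf:about="http://example.org/ric#Function"/>
  <owl:Class rdf:about="http://example.org/ric#Mandate"/>

  <!-- illustrative relationship: a record set includes records -->
  <owl:ObjectProperty rdf:about="http://example.org/ric#includes">
    <rdfs:domain rdf:resource="http://example.org/ric#RecordSet"/>
    <rdfs:range rdf:resource="http://example.org/ric#Record"/>
  </owl:ObjectProperty>
</rdf:RDF>
```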

        • RIC Conceptual Model (product)

          • subset of ontology

          • heir to 4 ICA standards

          • high level descriptive entities: agent, record, record set, function, mandate

        • Principles and Terminology

          • ensure RIC is grounded in existing archival principles

          • make sure the terms are unambiguous

          • language translations

  • EAD3 (Michael Rush)

    • points of emphasis about the revisions:

      • achieving greater semantic and conceptual consistency (addresses the complaint of too many ways to do the same thing)

      • interoperability with other structured archival data, e.g., EAC-CPF

      • addressing international community

      • mindful that the new version will affect current users

    • change summary

      • replaced <eadheader> with <control> (more in line with ICA standards and emergence of EAC)

        • one major thing <control> doesn’t do is keep track of what <filedesc> does

        • children of <control>: <recordid>, <otherrecordid>, <representation> (HTML version of the XML record), <filedesc>, <maintenancestatus>, <maintenanceagency>, <languagedeclaration>, <conventiondeclaration>, <maintenancehistory>
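
        • A minimal sketch of an EAD3 <control> block assembled from the children listed above; the attribute names and example values are my assumptions for illustration and may differ from the final schema:

```xml
<control>
  <recordid>example-finding-aid-001</recordid>
  <otherrecordid localtype="legacy">old-id-42</otherrecordid>
  <!-- pointer to an HTML rendering of this XML record (example URL) -->
  <representation href="https://example.org/findingaids/example-001.html">HTML version</representation>
  <filedesc>
    <titlestmt>
      <titleproper>Guide to the Example Papers</titleproper>
    </titlestmt>
  </filedesc>
  <maintenancestatus value="new"/>
  <maintenanceagency>
    <agencyname>Example Repository</agencyname>
  </maintenanceagency>
  <languagedeclaration>
    <language langcode="eng">English</language>
    <script scriptcode="Latn">Latin</script>
  </languagedeclaration>
  <conventiondeclaration>
    <citation>DACS</citation>
  </conventiondeclaration>
  <maintenancehistory>
    <maintenanceevent>
      <eventtype value="created"/>
      <eventdatetime standarddatetime="2014-10-08">8 October 2014</eventdatetime>
      <agenttype value="human"/>
      <agent>Finding aid author</agent>
    </maintenanceevent>
  </maintenancehistory>
</control>
```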

      • modified the <did> elements

        • removed extended linking; only simple XML linking remains

        • <physdescstructured>

          • all the stuff in physdesc will be migrated straight to simple physdesc

          • future of physdescstructured

            • required attributes (coverage, type) and child elements (quantity, unittype, physfacet, dimensions, descriptivenote)

            • you can combine structured statements: if two statements each describe a part, you can describe the whole by linking them (see the sketch below)
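
            • A minimal sketch of two combined <physdescstructured> statements; the attribute and element names follow my reading of the EAD3 draft and may differ slightly from the final schema, and the values are invented:

```xml
<!-- two "part" statements that together describe the whole extent -->
<physdescstructured coverage="part" physdescstructuredtype="carrier">
  <quantity>10</quantity>
  <unittype>boxes</unittype>
  <dimensions>15 x 30 x 40 cm</dimensions>
</physdescstructured>
<physdescstructured coverage="part" physdescstructuredtype="carrier">
  <quantity>3</quantity>
  <unittype>oversize folders</unittype>
  <physfacet>photographs</physfacet>
</physdescstructured>
```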

      • adopted <relations> (key to staying in sync with ICA work)

        • supporting the use of linked data in archival description
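
        • A hedged sketch of the experimental <relations> element adopted above: the child elements follow the EAD3 draft, while the relationtype/href attribute names and the target URI are assumptions for illustration:

```xml
<relations>
  <relation relationtype="cpfrelation"
            href="http://socialarchive.iath.virginia.edu/ark:/99166/example">
    <relationentry>Briscoe, Dolph (related creator, example)</relationentry>
    <descriptivenote>
      <p>Links this description to an external authority record by URI.</p>
    </descriptivenote>
  </relation>
</relations>
```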

      • updated access term elements

      • disentangled descriptive elements (getting arrangement info out of the scope and content notes)

      • replaced <note> - new elements available for each formerly semantic use of note

      • made block elements more like HTML

      • updated <chronlist>

      • simplified linking elements

      • added support for multilingual description

      • streamlined mixed content models (only 3 mixed content models)

      • made global attribute changes

      • deprecated some elements from EAD2002 (meaning no migration path from EAD3 to EAD4 for those elements)

      • The hard parts:

        • folks wanted to adopt the camel case naming convention in EAC-CPF

        • ultimately kept long element names and no camelCase because changing would have been a pain

        • <relations> presented political challenges - people wondered whether relations data is appropriate for archival description, so it is considered an experimental element in EAD3, and they want people to use it

        • working towards convergence of EAD and EAC - and integrating documentation

        • we need a single super schema

        • governance - they are moving to a single group for both EAC and EAD

        • they are moving to an ongoing revision cycle instead of every 8 years

        • github repository

  • Europeana Data Model, Archival Hierarchy (Archives Portal of Europe and Europeana)

    • Archives Portal

      • the Archives Portal Europe Foundation is their shift toward sustainability

      • 30 countries in the Portal

      • 683 institutions

      • 1044 creator records

      • 335K EAD files

      • apeEAD for finding aids

      • apeEAC-CPF for records creators

      • EAG2012 for descriptions of the institutions themselves and their services

      • future: apeMETS and apeMETSRights for structured information on digital archival objects

      • decentralized responsibility for the use of standards

      • local and central tools for conversion and validation

      • local and central tools for the creation and editing of data

    • Europeana

      • CC0 license for all the metadata

      • europeana ecosystem

        • they work with aggregators, one aggregator per domain (national, regional, thematic, audiovisual, libraries, archives portal)

      • Challenge was to accommodate all that data in one data model

        • archive data was a challenge to integrate (DPLA still has not solved this problem; why aren’t they asking Europeana?)

        • ESE (item-centric, based on Dublin Core) trying to merge with EAD (aggregate-level, highly structured)

        • Europeana Data Model (EDM)

          • reuses several semantic web-based models

          • uses semantic web representation principles (RDF)

            • reuse and mix different vocabularies

            • preserve original data, allow for interoperability

          • When aggregators submit to Europeana, they send their data in an EDM bundle: a description of the original object (the cultural heritage object), a description of a digital representation of the original object, a description of the web resource, and descriptions of “contextual entities” (data about related entities, e.g., timespan, concept, etc.)

          • connecting archives to other domains (libraries, museums): create a semantic layer on top of cultural heritage objects

          • representing hierarchies in EDM

            • level of cultural heritage object: fonds, series, subseries, file, item (semantic structure of the object, but also hierarchies at the level of the agent - person within an organization)

            • level of web resource

          • The fonds and every lower level of an arrangement requires its own EDM bundle - the cultural heritage object might be the series, the web resource might be the landing page on the repository website, etc.

          • Once you have created the bundles for each level of arrangement, you start linking the bundles together to re-represent the hierarchy using “isPart”, “isParent”, “isNextInSequence”

          • You can express hierarchies at each level (the cultural heritage object, the web resource, etc.)
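
          • A minimal RDF/XML sketch of two EDM bundles linked into a hierarchy as described above; the class and property names (edm:ProvidedCHO, ore:Aggregation, dcterms:isPartOf, edm:isNextInSequence) are from published EDM, while the URIs and titles are invented:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:edm="http://www.europeana.eu/schemas/edm/"
         xmlns:ore="http://www.openarchives.org/ore/terms/">

  <!-- bundle for the parent level (e.g., a series) -->
  <edm:ProvidedCHO rdf:about="http://example.org/cho/series-1">
    <dc:title>Correspondence, 1900-1950 (series)</dc:title>
  </edm:ProvidedCHO>
  <ore:Aggregation rdf:about="http://example.org/aggregation/series-1">
    <edm:aggregatedCHO rdf:resource="http://example.org/cho/series-1"/>
    <edm:isShownAt rdf:resource="http://example.org/findingaid/series-1"/>
    <edm:dataProvider>Example Archive</edm:dataProvider>
    <edm:provider>Archives Portal Europe</edm:provider>
    <edm:rights rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </ore:Aggregation>

  <!-- bundle for a digitized item, linked up the hierarchy and to its sibling -->
  <edm:ProvidedCHO rdf:about="http://example.org/cho/item-42">
    <dc:title>Letter, 1923 (item)</dc:title>
    <dcterms:isPartOf rdf:resource="http://example.org/cho/series-1"/>
    <edm:isNextInSequence rdf:resource="http://example.org/cho/item-41"/>
  </edm:ProvidedCHO>
  <ore:Aggregation rdf:about="http://example.org/aggregation/item-42">
    <edm:aggregatedCHO rdf:resource="http://example.org/cho/item-42"/>
    <edm:isShownBy rdf:resource="http://example.org/images/item-42.jpg"/>
    <edm:dataProvider>Example Archive</edm:dataProvider>
    <edm:provider>Archives Portal Europe</edm:provider>
    <edm:rights rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </ore:Aggregation>
</rdf:RDF>
```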

        • The Archives Portal has stepped in to make it easier for repositories to create the EDM bundles for their data - a conversion tool for their finding aid data; they offer a full conversion or a minimal conversion (some partners cannot submit everything because once it’s in Europeana the metadata is licensed as CC0)

        • There are a few more granular access statements for digital archival objects and they are using various CC licensing options

          • METSRights in the future

        • Once the conversion is completed, the repository gets a report about what was added

        • They are only providing the part of the finding aid that has digitized objects - this is how they account for the item-level access issue: they provide items plus some contextual information to go with them

        • Remaining challenges for Europeana: representing hierarchies where one level is missing metadata: maybe you have an item digitized from the series and another from the sub-subseries, but nothing from the subseries

  • ArchivesSpace (Brad Westbrook)

    • Facts

      • possible integration between Islandora and ArchivesSpace

      • technical advisory group charged with import/export, architecture, migration issues, development documentation

      • user advisory council is its own thing - bug testing, user documentation, feature requests, product comparisons, help desk

    • Stack

      • JRuby, Sinatra, Ruby on Rails; jQuery, Twitter Bootstrap; JSON, MySQL, Derby, Solr; RSpec, Selenium, SimpleCov; Git, Ant

    • Interfaces

      • Public

      • Admin

      • Solr

      • API

    • Are they working on something like the conversion tool that Europeana built, which creates metadata for archival digital objects and their place in the hierarchy, matching the DPLA metadata profile? That would be cool. We could send those packages directly to the Portal and potentially have more control over what gets pushed from the Portal to DPLA

    • They have something like Drupal taxonomies so that a given institution has control over ways users can access the collections other than by series, subseries, etc.

    • Their physical storage module is in Collection Management/Locations?

    • Resource Records are more complex (presumably this means top-level record that gives access to the entire hierarchy)

    • Is there an internal validator for the EAD so that a check is done before it is exported?

    • Design objectives of the AS project:

      • integration of core archival functions (accession record creation, physical storage management, archival description, etc.)

      • automated encoding (you can export your data as MODS!?)

      • repurposing of data

      • flexible deployments

    • Still using EAD2002 export - will change in the next year

  • RAMP Project

    • archives and wikipedia

    • how about an iSchool editathon for adding archival collections citations to pages - sponsored by SAA Student Chapter

    • RAMP involved special collections, university archives, and metadata & cataloging

    • the application extracts data from the EAD and EAC-CPF XML

    • the pilot used the Cuban Theater Collection to test 1) how long it would take to create a page, 2) how well the application works, and 3) whether it actually increases traffic

    • the wiki draft created by the tool goes to the communications department, the curators also have an opportunity to review, and then they make the page live; then there is PR on Twitter or whatever social media platform is used by the repository

    • the selection criteria for choosing collections for Wikipedia pages were based on the completeness of the finding aids - the ones with the most biographical notes

    • so it looks like they had to add some things to the finding aid, like rights statements, that would be carried along during the conversion to the wiki draft

    • 1 hour creating each wiki page

    • Web traffic and wiki referrals - outcomes: (DLib, 2013)

      • fewer than 2,000 pageviews before Wikipedia links to collection material were added

      • after links were added, 13000

      • RAMP stats - Wikipedia accounted for 1% (a little over 1,200 page views) of referrals to the finding aid web views (Sept. 2013-2014)

      • the Wikipedia metadata added from finding aids was used to populate the Google knowledge panels that come up when you search for a name

      • a DBpedia URI comes with no extra effort beyond creating the Wikipedia page

      • good return on investment for wikipedia page creation

  • xEAC (pronounced “Zeke”): XForms (a W3C standard, embedded in XML) for EAC-CPF

    • repository dedicated to the study of coins

    • he works with archaeological data more than archives and library data

    • he is interested in using EAC-CPF for archaeological and archival resources

    • XForms interacts with REST services - you can push and pull (see the sketch a few bullets below)

    • XForms processor (Orbeon) - Islandora forms uses XForms, right?

    • Solr, eXist, SPARQL Endpoint

    • user interface, author/editor, converter for different standards to publish to various web services
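
    • A minimal XForms sketch of that push/pull pattern: an instance pulled from a RESTful URI and a submission that puts the edited record back. The URIs and the bound element path are invented, and EAC-CPF namespace handling is omitted for brevity; only the XForms constructs themselves are standard:

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>xEAC-style editor sketch</title>
    <xf:model>
      <!-- pull: load an EAC-CPF record from a RESTful URI (example URI) -->
      <xf:instance id="eac" src="http://example.org/xeac/records/briscoe1.xml"/>
      <!-- push: PUT the edited instance back to the same resource -->
      <xf:submission id="save"
                     resource="http://example.org/xeac/records/briscoe1.xml"
                     method="put" replace="none"/>
    </xf:model>
  </head>
  <body>
    <!-- form control bound to the authorized name inside the loaded record -->
    <xf:input ref="cpfDescription/identity/nameEntry/part">
      <xf:label>Authorized name</xf:label>
    </xf:input>
    <xf:submit submission="save">
      <xf:label>Save</xf:label>
    </xf:submit>
  </body>
</html>
```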

    • challenge for EAC-CPF: maintaining relationships between entities - making sure edits go both ways while only having to edit one record

    • The tool allows you to build your own ontology - but you can also upload OWL or RDF ontologies

    • you can query SNAC and VIAF - so that in your repository universe with your Dolph Briscoe, you can indicate that your Dolph Briscoe is the same as the Dolph Briscoe in SNAC

    • it can do lookups on DBPedia

    • pulls Getty for profession and genres

    • you are essentially embedding all existing URIs associated with parts of the record into the record
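
    • A hedged EAC-CPF fragment showing what embedding those external URIs might look like: the local record for Dolph Briscoe carries SNAC and VIAF links as sources. The ark and VIAF identifiers are placeholders, not real lookups:

```xml
<eac-cpf xmlns="urn:isbn:1-931666-33-4"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <control>
    <recordId>briscoe1</recordId>
    <maintenanceStatus>new</maintenanceStatus>
    <maintenanceAgency>
      <agencyName>Example Repository</agencyName>
    </maintenanceAgency>
    <maintenanceHistory>
      <maintenanceEvent>
        <eventType>created</eventType>
        <eventDateTime standardDateTime="2014-10-08"/>
        <agentType>human</agentType>
        <agent>xEAC editor</agent>
      </maintenanceEvent>
    </maintenanceHistory>
    <sources>
      <!-- external identifiers gathered from SNAC and VIAF lookups (placeholder URIs) -->
      <source xlink:type="simple"
              xlink:href="http://socialarchive.iath.virginia.edu/ark:/99166/placeholder"/>
      <source xlink:type="simple" xlink:href="http://viaf.org/viaf/placeholder"/>
    </sources>
  </control>
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Briscoe, Dolph</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>
```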

    • so the public interface is a map because it takes those location authorities and creates KML files

    • there is also a browse interface that is run by Solr

    • Linked Open Data principles

      • their finding aids are published using an XForms editor that publishes to an RDF triplestore

      • EAD, MODS (for photographs), TEI (document markup)

      • result: archival resources delivered through SPARQL - delivers, in a scalable way, an authority associated with all the records it has a relationship to

      • bio info can be extracted from the EAC record and displayed in the finding aid

      • git repository for xEAC, ead editor at blogspot

    • he only took the elements from EAC that would be useful in a linked data environment