2014-12-10 Meeting notes ADWG

Date

Attendees

Agenda

    • Mark Phillips will be visiting from University of North Texas to discuss:
      • Clarification about the Portal to Texas History - Does the Portal only accept materials that are related to Texas history? If so, how would we get non-Texas related materials to the DPLA?
      • Are certain subject areas/time periods etc. being more or less actively collected by portal? Are there recommendations on how we should prioritize digitization projects?
      • Does everything that gets submitted to the Portal go to DPLA - What are the selection criteria for pushing content to the DPLA?
      • Is there a way to link back to TARO for more contextual information?
      • Services that the Portal provides to repositories or plans to provide - roadmap
      • What can we do as repositories, interested in contributing or continuing to contribute to the Portal, to help our digital collection material (and its associated aggregate-level description) make its way to the DPLA? What might be some actionable steps for repositories to take towards meeting the digital libraries/item-level access model half-way, and vice versa?
    • Reports back to the group regarding: 
      • Northwest Digital Archive (Carla), 
      • Folks using ArchivesSpace (Jessi)
      • Responses from Dan on follow-up questions (Jessica)
      • UT Lib white paper on special collections needs (Paloma) --> Jennifer Hecker will join us around 4 and give a brief update
    • Report back on the results of the Doodle poll for next semester scheduling  
      • Possibly meeting next semester during the workday - still on Wednesday
      • Is bi-monthly too frequent?
      • We need to check for conflicts with other working groups on campus
    • Schedule/timeline so that people can see where the tasks are on the timeline for:
      • Preparing Spring 2015 Archival Description Management System proposal
      • Scheduling investigation of EAC-CPF
        • Creator of Xece (pronounced "Zeke") project
        • Yale/Harvard pilot (SAA 2013)
        • What does a phased approach to getting our repositories in a position to create authority records look like?

Discussion items

Item | Who | Notes
Introductions
  • Everyone
Mark 
  • What is the Portal - Overview
    • Portal is a repository that UNT has been running for 10 years (525,000 unique digital objects; 280 contributing partners; 6-7M item uses)
    • DPLA Hubs initiative - how to aggregate content to a central point; the goal was to reduce the number of relationships with content providers
    • DPLA Hubs
      • content hubs - 1 to 1 relationship between content owner and DPLA
        • Getty
        • Government Printing Office
        • HathiTrust
        • Smithsonian
      • service hubs - 1 to 1 relationship between service hub and DPLA, and many to 1 relationship between content owners and service hub
        • Mountain West Digital Library
        • Kentucky Digital Library
        • Portal to Texas History is the 6th largest hub in the country
    • DPLA only harvests metadata and does some enhancement
      • data normalization
      • geocoding
    • Parses dates in order to create timeline
    • Creates thesauri (e.g., all versions of "photograph")
    • In order to make your (descriptive) metadata part of DPLA, you have to place your metadata in the public domain; if it is not already, you have to deed it to the public domain
    • DPLA hosts no digital objects, only metadata
    • Portal had to go through all their existing 250 partner agreements and have them explicitly sign over their descriptive metadata to the public domain
    • Has anyone been reluctant to do that?
      • Yes, one was a museum but once they explained the implications, they accepted
        • museum culture versus library & archives culture
      • Yes, one didn't want to re-sign due to a desire to take collections offline anyway
      • Most people haven't had an issue with it
    • Scope of the Portal:
      • It used to be materials related to Texas history, but over time the scope expanded to collections that are physically held by institutions in Texas
      • They aren't incredibly explicit about expanding the scope but they may eventually publish that collection policy online
      • Q (Beth): You said that for people that don't have the capability to provide this access on their own - is there a preservation element to the Portal?
        • A (Mark): Yes, there is. But we provide more than one model. We can provide the preservation master copy of the object in order to create derivatives, and we host (redundant backup) for those that are interested in that. High-res scans go into the UNT digital repository. As standards change on the web, we have the preservation master and can create new formats for web derivatives without contacting the partners. For others, they may just want us to push their metadata up. There is a spectrum. Institutions get an access piece, a metadata admin interface, and storage of their preservation masters and derivatives in multiple locations.
      • Q (Beth): Sales pitch for DPLA was about usage statistics, will the Portal give us those statistics
        • The Portal has the notion of a "use" of a digital item - a series of interactions within 30 minutes by a single IP address (a session). The Portal aggregates this data daily and you can look at your statistics in the Portal - there is a dashboard for usage.
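A session-counting sketch of the "use" metric described above (a minimal illustration, not the Portal's actual implementation; the 30-minute window is interpreted here as a maximum gap between hits from the same IP, and the log format is assumed):

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def count_uses(events):
    """Count 'uses': runs of interactions by a single IP with no gap
    longer than 30 minutes between consecutive hits (one run = one session)."""
    uses = 0
    last_seen = {}  # ip -> timestamp of that IP's most recent hit
    for ip, ts in sorted(events, key=lambda e: e[1]):
        prev = last_seen.get(ip)
        if prev is None or ts - prev > SESSION_GAP:
            uses += 1  # a new session (a new "use") begins
        last_seen[ip] = ts
    return uses

events = [
    ("10.0.0.1", datetime(2014, 12, 10, 9, 0)),
    ("10.0.0.1", datetime(2014, 12, 10, 9, 10)),  # same session
    ("10.0.0.1", datetime(2014, 12, 10, 11, 0)),  # gap > 30 min: new session
    ("10.0.0.2", datetime(2014, 12, 10, 9, 5)),   # different IP: own session
]
print(count_uses(events))  # 3
```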
      • Q (Jessi): What is the scope for other hubs - does the DPLA see value in coordinating hubs thematically?
        • A (Mark): Not at this point, hubs accept a lot of different stuff. Even though the Portal's scope has broadened, they would not change the name because they have branding around that.
      • Q (Esther): What about contextual archival description for items in the Portal?
        • A (Mark): We can certainly add that - but it is on a collection by collection basis, they will not pull and parse TARO records
      • Q (Jessica): What about DPLA sucking up the contextual archival description?
        • A (Mark): They aren't doing that now, it isn't consistent; the notion of "collection" is problematic. So far, DPLA hasn't shared their plan for how they will do that in the future.
      • Q (Donna): Is DPLA a one-time harvest, or how often are the updates?
        • A (Mark) : They reharvest each month at which time they get new records, edits, etc.
      • Q (Donna): What about the Portal harvesting the records?
        • A (Mark): We don't harvest, we take the pub masters and the records (which could be in a legacy database) in order to bring people into the Portal
      • Q (Beth): We would like to find a way to make sure we are using standards that map easily to the requirements of the Portal.
        • A (Mark): Yes, good.
      • Q (Beth): We want metadata that is interoperable in many directions, what can we do on our end to make this easier and faster for everyone?
        • A (Mark): Just be consistent with the way you do things so that we can transform and map programmatically. And also documentation - be explicit about what your metadata elements mean. The challenge tends to be metadata dictated by XML records or a spreadsheet - they can be challenging either way. Their worldview is XML. We have a lot of collections where we ingest the collection without complete metadata and then make the metadata admin interface available to the repository so they can create item-level metadata. A better system for creating metadata (user experience) results in better-quality metadata. One nice feature of our system is that we have APIs for every collection; you can grab your data whenever you want. You can make modifications to your data whenever you want.
      • Q (Beth): Can you export the metadata?
        • A (Mark): Every partner and every collection has an OAI-PMH endpoint so you can harvest the metadata in a variety of formats. You can get your metadata in a programmatic way (and then you may have to map the profile).
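A minimal sketch of consuming an OAI-PMH ListRecords response like the ones these endpoints return (the sample XML and field choices are illustrative, not the Portal's actual output):

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written ListRecords response in the oai_dc format
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:texashistory.unt.edu:metapth1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def titles_from_listrecords(xml_text):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    out = []
    for rec in root.findall(".//oai:record", NS):
        ident = rec.findtext(".//oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        out.append((ident, title))
    return out

print(titles_from_listrecords(SAMPLE))
```

In practice the same parsing applies to the XML fetched from a live endpoint with a `verb=ListRecords&metadataPrefix=oai_dc` request.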
      • Other things the Portal is up to:
        • We are trying to track how long it takes to create metadata. They keep track of editing events (which doubles as provenance). You can look at improvements to the record.
          • This persona averages this much time per item, and this many items per day; then you can see the sessions users have with the items, and the repository can calculate the return on that investment.
          • We can roll back any change ever made to a metadata record.
          • You can start to do serious analytics about metadata creation for digital objects.
      • Q (Donna): What about ideas for subject headings - what has proven to be more useful?
        • A (Mark): We could do some analysis - if someone is willing to take a closer look at that we could look and see if people are coming in by subject. We'd like to start exposing relationships between objects across collections; items from the same geographic location; items from the same subject
      • Q (Amy): Can you think of ways that TARO and other cultural heritage institutions could be integrated?
        • There are cool mashups that could be done - the Handbook of Texas Online, etc. You have a lot of data about people, places, dates, etc. You could do some interesting things pulling in Wikipedia data. The biggest hurdle tends to be the level of description. DPLA's rule is that everything resolves to an item (not metadata about an item, which is why archival metadata is hard for DPLA). We are getting to the point where future system designs have to bake in relationships across repositories.
      • Q (Amy): On our finding aid, linking to items on the Portal - finding a way to get that content cross-referential
      • Q (Beth): EAC-CPF?
        • A (Mark): We have a system for authority records at UNT, and those records do end up in the Portal but we might contact someone that works with those records at UNT. When you create records and add a name, the name you select from the list has a resolvable URI that is the authority record and you can click to that authority record from the object-level metadata record.
      • Q (Jennifer): Is the authority system publicly available?
        • A (Mark): Yes, will send link
      • Mark: Extended Date/Time Format (EDTF) - there is a checker for the extended date format when you are creating metadata
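A toy validator for the simplest EDTF (level 0) calendar dates, in the spirit of the checker Mark describes (this covers only YYYY, YYYY-MM, and YYYY-MM-DD; the real EDTF specification is much broader):

```python
import re

# Matches EDTF level-0 calendar dates: YYYY, YYYY-MM, or YYYY-MM-DD
EDTF_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")

def is_simple_edtf_date(value):
    """Return True if value is a well-formed level-0 EDTF calendar date."""
    return bool(EDTF_DATE.match(value))

for v in ["1887", "1887-05", "1887-05-31", "1887-13", "May 1887"]:
    print(v, is_simple_edtf_date(v))
```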
      • Q (Esther): Do you have a sense of certain resource types being used more frequently?
        • A (Mark): 1800s who's-who books, city directories (the main user group is genealogists), newspapers, photographs, historic maps of Texas, maps more generally (some of the most publicized collections in DPLA are maps), soil surveys
      • Q (Beth): Are there any formats you don't take?
        • A (Mark): Not yet, but there are items we can provide better access to than others.
      • Q (Beth): Size limitations?
        • A (Mark) : Not yet.
      • Q (Jessica): As part of the TRAC process, are you creating documentation that articulates, in explicit terms, levels of service for different file formats?
        • A (Mark): We are going through a peer-reviewed TRAC process, and what we will end up with is a document that articulates that there are some things we can do more with than other formats.
      • Q (Amy): Funding and sustainability?
        • A (Mark): Most funding is through the University; TexTreasures grants are the extent of funding we receive from the state. We try to stay competitive with some grants. Our infrastructure is funded by UNT dollars, not grants. Grants go to content creation, not infrastructure. We have a good track record with provisioning for storage. They have a digital storage fund and plan for a 5-year use cycle: they take the amount they think they'll need and divide it by 5, so if the 5th year lands on a financially lean year, they have money to cover the expansion. They have an archival storage fund and are setting up their first digital library endowments. They have about 220TB now, with some TB waiting, and will double in size in the next year. People talk about what we are doing in a positive way at a very high level in the university - administrative support and goals align with the digital projects unit in a big way. The open questions are how we plan for growth in the library: at what level of the library do you need to set aside/provision funding for the service/infrastructure? At what level do you certify a repository - at what level do we scope it? At what level are we a trusted entity?
      • Q (Beth): How many staff do you have?
        • A (Mark): 15 full-time people and an army of 35-40 students; 4 programmers
      • Q (Jessica): Is there anything that is in the Portal that doesn't go to DPLA
        • A (Mark): The main thing is whether the repository is willing to sign on to the CC license for metadata.
      • Q (Jessica): What is your model for getting projects done?
        • A (Mark): Each unit is allotted student wage hours. They have a certain amount of flexibility where they can absorb projects, but for larger projects that maybe special collections would want to do, we would ask: How many students would you like to support? The unit then shifts that student-hour budget over to the Digital Libraries Unit, and they have re-upped every year for the last three years. When they re-up, people sometimes want to hand over more of those hours, or more units are interested in donating student hours. Managing is always the challenge: how many students can one person reasonably manage?

Action items

  •