2016-04-07 Meeting notes

Date

Apr 7, 2016

Attendees

  • @Jessica Wesley Meyerson

Agenda

Discussion items

Time

Item

Who

Notes

Round Robin

  • Abby:

    • Porter Olsen is coming to UT from UofM to do research using the Gabriel García Márquez Collection; this will provide concrete feedback on viable access policies for born-digital archival material

  • Laura:

    • Studies the preservation of Islamic cinema from the 1970s and 1980s - even though there is a state-run means for preserving film archives, the country restricts what foreigners can access, so there has not been a lot of oversight of the actual preservation strategies or efforts being employed

    • When officials in the new regime came in, post-1979, a lot of material was destroyed or damaged

    • Films are in the hands of both the official archives and private collectors who maintain and in some cases reformat them; some of those reformatted films are now popping up online on websites that are not strictly accessible in Iran

  • Shannon:

    • The ExMO Collection does not currently have a lot of digital material but is working on collection development in that area

  • Ashley:

    • Just received a large collection of floppy disks, mostly from the 1990s, from an anthropologist - the oral history transcripts are on those disks and a researcher is coming to use them; working on an access solution

    • the primary preservation storage approach is vaulting things to tape but the tape drive has failed and is being updated

  • Marianna:

    • Joined the data management committee - focused on institutional data

    • Documentum for faculty and staff

    • Enforces records management laws

    • Might go live next summer - August of 2017

    • Academic data is not included because of IP for students and faculty

  • Iraq invasion

    • So much data but not a lot of access; not clear on how to use it

    • Iraqi issue and Kurdistan issue

    • They have a lot of artifacts about the

  • Melanie:

    • Focused on metadata rather than preservation

    • Texas Conference on Digital Libraries

    • post-custodial model:

    • co-hosting a BoF on Fedora-based repositories to get folks to come in to town for the conference

  • Katie:

    • humanities librarian in architecture

    • looking at the preservation of architectural records

    • digital scholars in practice series

 

Ian Milligan

 

  • We need to

  • Web archive tool and development

  • Warcbase is a web archiving platform - it speeds up access to Wayback Machines; it's a way to analyze web archives

  • works really well on a Raspberry Pi, personal laptop or in a cluster

  • in a nutshell - it takes a WARC file and allows you to:

    • look at link structure

    • topic modeling

    • setting up the data

    • network graph

  • They started developing Warcbase 3 years ago - with Jimmy Lin in computer science

  • Research use cases:

    • IIPC - the focus up until the last few years has been about grabbing everything, without a lot of thought on access

    • there is a survey coming out and next week at IIPC they will present stats on why people are or are not using archives

    • Current researchers in LUNA, UK Archive people,

    • Michigan just ran a great conference on web archiving research

    • Basic sense of what researchers want:

      • you need more than the Wayback Machine - you have to know the URL, and you can only browse one page at a time

      • we need to scale up

      • while we need search, the search needs to be intelligible

  • Keyword query with no prioritization of results

  • The goal of Warcbase is to make things transparent because if

  • Collaboration between computer scientists and historians - it is working well because the team is half CS and half historians

  • Historians will say, "I would really like the power to do X" - they create a ticket and CS responds to the ticket - research questions are actually guiding the development

  • the importance of doing things open source

  • hackathon last month - about building a community that gains enough momentum

  • A lot of initial development required the researcher to dig in to the tool - what they are trying to do is use Jupyter notebooks (ideally they have someone to spin up the notebooks - Vagrant), point the web browser at it and then

  • goal of development - people use Jupyter to prototype what they want to do and then paste that code into the shell

  • publishing on it - working in an interdisciplinary group (librarians, historians, and CS) - the key to success - everybody wants to publish in different scholarly communities

    • technical stuff is ending up

    • librarian presented at Code4Lib and in their journal

    • arts and humanities computing

    • published an early web archiving piece in

    • trying to put something together for Digital Humanities Quarterly

    • he is working on what it was like being a kid building a website in the 1990s

    • everybody has to be happy - Jimmy needs a reason to collaborate, Ian does and the librarian does - tenure

  • training - they are running a pilot training workshop in Iceland; if that works they want to make that universal

  • one of the goals this summer, RAs that don't have

  • Software Carpentry workshop - Python, GitHub, etc.

  • relationship to the library:

    • will it continue to be run like an open source community or be hosted in the library - he prefers a consortial model for providing resources to support

    • not having faculty status for librarians at Waterloo means they don't have the bandwidth for research

  • what can we do to improve:

    • the most important thing is documentation - if you are setting up a collection, you need to have your seed list written down, along with why you decided to collect what you collect

    • the Canadian political parties collection was set up in 2005, but the librarian who set it up left no documentation, so if I publish, a peer reviewer will immediately ask why these sites and not those

    • beyond documentation, there is the debate between running Heritrix yourself or using Archive-It - Archive-It has a great community, but Nick has been running Heritrix by himself, which allows him to run it on an experimental basis

    • advertising is good - University of Toronto - you need to advocate and do outreach that we have this stuff - and if people want to use it, point them to the Warcbase workshop

    • social media is super important - the debate is whether you should use Archive-It or the Twitter API to download the data - intersection between

  • this summer, they used the SHINE program that the UK launched; it just provided faceted search - Columbia human rights also uses a faceted search engine

  • Wayback Machine - with simple Archive-It keyword search the results are overwhelming; if you have good metadata you can start faceting down (dates, languages, subjects) - auto-extracted metadata

    • they found that was cool and they got a lot of news coverage - people then go, this is cool, but it raises the question of something more sophisticated - the gateway to Warcbase

  • the issue is that the underlying data for websites is messy; you have to

  • he works with sociologists and network scholars and Matt Weber at Rutgers - when they look at web archives, he wants content and they want the graphs

    • social scientists want to look at networks, entities and things in a non-digital way

    • historical contribution: longitudinal dimension

    • CS: what you need now

    • historians are looking for change over time

  • in a dreamworld we would love to see other folks using it - so if we use it, we should be in contact with the team; open source stack

  • if we decide to

  • the fun of Archive-It and doing funky stuff with it

  • how researchers get access across institutional collections - they have been exploring with the UT legal team - what kind of MOU do we need to ask for WARC files and

  • U of Victoria and Alberta have been totally open to it - with the caveat of citing the source and not sharing the WARC files; when they create public-facing things, they allow the libraries to review them

  • when they finally get to the document, it will be cited as University of Toronto; he shared a sample MOU document (between donating university and the PI on the grant) - same with broader consortial project - agreement between Ian and the University of Alberta

  • Web Archives for Longitudinal Knowledge
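
As a rough illustration of the link-structure and network-graph analysis mentioned in Ian's overview, the sketch below collapses page-level links into a domain-level edge list. It is not Warcbase code: the function names and the sample (source URL, target URL) pairs are hypothetical, and it assumes the links have already been extracted from a WARC file by some other tool.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domain_link_counts(links):
    """Count links between distinct domains.

    `links` is an iterable of (source_url, target_url) pairs; links that stay
    within one domain, or whose domain cannot be parsed, are skipped.
    """
    counts = Counter()
    for source, target in links:
        src, dst = domain_of(source), domain_of(target)
        if src and dst and src != dst:
            counts[(src, dst)] += 1
    return counts


# Hypothetical sample data standing in for links pulled from an archive.
sample_links = [
    ("http://www.example.org/a", "http://example.com/x"),
    ("http://example.org/b", "http://www.example.com/y"),
    ("http://example.org/c", "http://example.org/d"),
    ("http://example.net/", "http://example.org/"),
]

# Print one weighted edge per line, e.g. for loading into a graph tool.
for (src, dst), n in sorted(domain_link_counts(sample_links).items()):
    print(f"{src} -> {dst}: {n}")
```

A weighted edge list like this is the kind of input a graph tool would need for the network view that the social scientists asked for; running the same aggregation per crawl date would be one way to look at change over time.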

Action items