| Round Robin | | - Abby:
- Porter Olsen is coming to UT from UofM to do research using the Gabriel García Márquez Collection; this will provide concrete feedback on viable access policies for born-digital archival material
- Laura:
- Studies the preservation of Islamic cinema from the 1970s and 1980s - even though there is a state-run means of preserving film archives, the country restricts what foreigners can access there, so there has not been a lot of oversight of the actual preservation strategies or efforts being employed
- Under officials in the new regime, post-1979, a lot of material was destroyed or damaged
- Films are in the hands of both the official archives and private collectors who maintain and in some cases reformat them; some of those reformatted films are now popping up online on websites that are not strictly accessible in Iran
- Shannon:
- ExMO Collection does not currently have a lot of digital material but is working on collection development in that area
- Ashley:
- Just received a large collection of floppy disks from an anthropologist, mostly from the 1990s - the oral history transcripts are on those disks and a researcher is coming to use them; working on an access solution
- the primary preservation storage approach is vaulting things to tape but the tape drive has failed and is being updated
- Marianna:
- Joined the data management committee - focused on institutional data
- Documentum for faculty and staff
- Enforces records management laws
- Might go live next summer - August of 2017
- Academic data is not included because of IP for students and faculty
- Iraq invasion
- So much data but not a lot of access; not clear how to use it
- Iraqi issue and Kurdistan issue
- They have a lot of artifacts about the
- Melanie:
- Focused on metadata rather than preservation
- Texas Conference on Digital Libraries
- post-custodial model:
- co-hosting a BoF on Fedora-based repositories to get folks to come into town for the conference
- Katie:
- humanities librarian in architecture
- looking at the preservation of architectural records
- digital scholars in practice series
|
| Ian Milligan | | - We need to
- Web archive tool and development
- Warcbase is a web archiving platform - speeds up access compared to Wayback machines; it's a way to analyze web archives
- works really well on a Raspberry Pi, personal laptop or in a cluster
- in a nutshell - it takes a WARC file and allows you to:
- look for link structure
- topic modeling
- setting up the data
- network graph
- They started developing Warcbase 3 years ago - with Jimmy Lin in computer science
- Research use cases:
- IIPC - the focus up until the last few years has been on grabbing everything, without a lot of thought about access
- there is a survey coming out and next week at IIPC they will present stats on why people are or are not using archives
- Current researchers in LUNA, UK Archive people,
- Michigan just ran a great conference on web archiving research
- Basic sense of what researchers want:
- you need more than the Wayback Machine - you have to know the URL, and you can only browse one page at a time
- we need to scale up
- while we need search, the search needs to be intelligible
- Keyword query with no prioritization of results
- The goal of Warcbase is to make things transparent because if
- Collaboration between computer scientists and historians - it is working well because the team is half CS and half historians
- Historians will say, "I would really like the power to do X" - they create a ticket, CS responds to the ticket, and they respond in turn - research questions are actually guiding the development
- the importance of doing things open source
- hackathon last month - about building a community that gains enough momentum
- A lot of initial development required the researcher to dig into the tool - what they are trying to do is use Jupyter notebooks (ideally they have someone to spin up the notebooks - Vagrant), point the web browser at it and then
- goal of development - people use Jupyter to prototype what they want to do and then paste that code into the shell
- publishing on it - working in an interdisciplinary group (librarians, historians, and CS) - the key to success - everybody wants to publish in different scholarly communities
- technical stuff is ending up
- librarian presented at Code4Lib and in their journal
- arts and humanities computing
- published an early web archiving piece in
- trying to put something together for Digital Humanities Quarterly
- he is working on what it was like being a kid building a website in the 1990s
- everybody has to be happy - Jimmy needs a reason to collaborate, Ian does and the librarian does - tenure
- training - they are running a pilot training workshop in Iceland; if that works they want to make that universal
- one of the goals this summer, RAs that don't have
- Software Carpentry workshop - Python, GitHub, etc.
- relationship to the library:
- will it continue to be run like an open source community or be hosted in the library - he prefers a consortial model for providing resources to support
- not having faculty status for librarians at Waterloo means they don't have the bandwidth for research
- what can we do to improve:
- the most important thing is documentation - if you are setting up a collection, you need to have your seed list written down, along with why you decided to collect what you collected
- the Canadian political parties collection was set up in 2005, but there was no documentation from the librarian that set it up, so if I publish, a peer reviewer will immediately ask why these sites and not those
- beyond documentation, the debate between using Heritrix yourself or Archive-It - Archive-It has a great community, but Nick has been doing Heritrix by himself and that allows him to run it on an experimental basis
- advertising is good - University of Toronto - you need to advocate and do outreach: we have this stuff if you want to use it, or point them to the Warcbase workshop
- social media is super important - the debate is whether you should use Archive-It or the Twitter API to download the data - intersection between
- this summer, they used the SHINE program that the UK launched; it just provided faceted search - Columbia human rights also uses a faceted search engine
- Wayback Machine - with simple Archive-It keyword search it is overwhelming; if you have good metadata you can start faceting down (dates, languages, subjects) - auto-extracted metadata
- they found that was cool and they got a lot of news coverage - people then go, this is cool, but it raises the need for something more sophisticated - the gateway to Warcbase
- the issue is that the underlying data for websites is messy; you have to
- he works with sociologists and network scholars, and Matt Weber at Rutgers - when they look at web archives, he wants content and they want the graphs
- social scientists want to look at networks, entities and things in a non-digital way
- historical contribution: longitudinal dimension
- CS: what you need now
- historians are looking for change over time
- in a dreamworld we would love to see other folks using it - so if we use it, we should be in contact with the team; open source stack
- if we decide to
- the fun of Archive-It and doing funky stuff with it
- how researchers get access across institutional collections - they have been exploring with the UT legal team - what kind of MOU do we need to ask for WARC files and
- U of Victoria and Alberta have been totally open to it - with the caveat of citations for the source and not sharing the warc files; when they create public facing things, they allow the libraries to review them
- when they finally get to the document, it will be cited as University of Toronto; he shared a sample MOU document (between donating university and the PI on the grant) - same with broader consortial project - agreement between Ian and the University of Alberta
- Web Archives for Longitudinal Knowledge
|