2021-07-28

Date

Attendees

  • Katie Pierce Meyer
  • Josh Conrad
  • Nicole C.
  • Melanie Cofield
  • Paloma Graciani Picardo
  • Mandy Ryan
  • Beth Dodd
  • Nancy Sparrow
  • Brenna Edwards
  • Mary Aycock
  • Michael Shensky
  • Stephanie Tiedeken
  • Brittney Washington
  • Shiela Torres-Blank

Recording

https://utexas.zoom.us/rec/share/uGzVO7JPag4WKnpDuQkNPapE0PmkfZC2EgjU-trPtfL9iQ-lK1euxDzQc3Q8Tco.fyAoN8F01f3MxS-P

Theme

  • Project-based work conducted by Josh Conrad, Architectural Collections GRA for Digital Initiatives, adding people and places (including geographic locations) to Wikidata based on representations in collections
  • Assessment discussion continued  

Agenda

Discussion items

Item | Who | Notes
Introductions, agenda overview | Katie | Notes by Beth Dodd
Presentation

Josh

Presentation by Josh Conrad on Wikidata work with Architecture collections

Josh’s work on the Buildings of Texas (BOT) project has been ongoing since its inception; however, he now needs to focus on his role as a PhD student in the School of Architecture. His goal has been to understand Architecture’s data and how to make it available. This ties in nicely with his dissertation on the history of data projects over time, specifically the evolution and future of the data practices of architectural historians and historic preservationists.

The Buildings of Texas (BOT) project has evolved and expanded beyond its formal name (based upon the archival collection of the same name). Josh’s February 2019 presentation to this group focused on the project plan and initial contributions to Wikidata. At that time, the plan included integration of data into Getty Vocabularies, which has not happened yet, as Wikidata is sufficient for now. Josh’s September 2019 presentation at the Digital Frontiers conference provided a database and data model, mapping places, events and people over time. Today’s presentation focuses on integrating BOT’s local authority data with Wikidata.

BOT project: What is it?  The first dataset was from the Buildings of Texas collection; however, the project has now grown beyond this collection and even Texas.  It has grown into mapping our collections via:


  • A GIS database of places (buildings, landscapes, sites, districts, neighborhoods), events (activities that “take place” at specific moments) and people (individuals or groups that contribute to or participate in events at places) that appear in UTL collections
  • A digital, interactive map of places (points, lines, polygons, 3D models…)
  • A gazetteer: a geographical dictionary or index used in conjunction with a map or atlas
  • A spatial finding aid for UTL collections: an access point, via a map, to what UTL holds
  • A linked data “authority file” that integrates Wikidata


Structure of the database:

Data sources: “collections as data”. Sources include the BOT and David Williams collections and the Alexander’s Drawings Database (a staff FileMaker Pro database whose records describe projects or sets of records, generally not at the item level).

Spatial finding aid: “mapping the collections”. Location/building points reveal all related sources of documentation up front. This potentially duplicates data, but everything is linked to the work.

Link by reconciliation in Wikidata (authority file). 

Wikidata as an authority file:

Pros:

  • Stable well-maintained database infrastructure
  • Flexible data schema
  • Robust reconciliation API via OpenRefine
  • All records pass notability requirements (“The entity must be notable, in the sense that it can be described using serious and publicly available references.”)
  • Data can be added and improved over time

Cons:

  • Data can be modified or erased against our wishes (though there is usually a valid reason and we are able to debate it in Wikidata forums if needed).
    • Melanie: does this mean we might want to continue maintaining our own local authority database?
    • Katie: this allows flexibility and reconciliation that can later be submitted to the Getty as a more stable source
    • Josh: there is a need for contributions and monitoring
  • Requires regular monitoring of changes to Wikidata items.
    • Beth: is there a feature that monitors, flags or notifies staff of changes?
    • Josh: API scripts are an option.
    • Nancy: are there notifications when records are taken down? The concern stems from a trend of serious contributions being taken down.
    • Paloma: see PCC work: Wikidata: WikiProject PCC Wikidata Pilot/Pilot Best Practices/Properties Helpful for Establishing Notability https://www.wikidata.org/wiki/Wikidata:WikiProject_PCC_Wikidata_Pilot/Pilot_Best_Practices/Properties_Helpful_for_Establishing_Notability 
    • Katie: getting it into the Getty then gets it into VIAF. Getty is open, but the process is not as straightforward. Linking to finding aids adds to notability, especially when they are online. Issues may remain in satisfying requirements for contributing creators; however, just getting it on Wikidata gets the process started and makes it available for others to contribute.
    • Josh: has had items taken down because he forgot to provide source documents for them (these were test items). He did not receive notice. Catching this would require active monitoring.
    • Paloma/Michael: how is monitoring done now? Michael: using 2 APIs.
      • Trying to programmatically monitor
      • SPARQL: find collections related to the Alexander, then try to find ALL of the things linked to the Alexander’s collections. Retrieve the list of all items, then use the Wikimedia API to retrieve what has been updated.
      • Tricky to identify those items that we did not edit. It’s a work in progress
      • This information is found in the Source
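The two-step monitoring approach described above (list items via SPARQL, then check recent revisions via the Wikimedia API) might look roughly like the following sketch. This is not the script discussed in the meeting: the function names, the User-Agent string, and the account list are hypothetical, and it assumes that contributed statements cite the collection through a "stated in" (P248) reference.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def items_citing_collection_query(collection_qid):
    """SPARQL text: every item with a statement whose reference is 'stated in' the collection."""
    return (
        "SELECT DISTINCT ?item WHERE { "
        "?item ?p ?statement . "
        "?statement prov:wasDerivedFrom ?ref . "
        f"?ref pr:P248 wd:{collection_qid} . "
        "}"
    )


def revisions_url(qid, limit=20):
    """MediaWiki API URL listing the most recent revisions of one item page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": qid,
        "rvprop": "user|timestamp|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode its JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "monitoring-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def edits_by_others(api_response, our_accounts):
    """Return (item, user, timestamp) for revisions not made by our own accounts."""
    found = []
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        for rev in page.get("revisions", []):
            if rev.get("user") not in our_accounts:
                found.append((page.get("title"), rev.get("user"), rev.get("timestamp")))
    return found
```

In use, the query text would be sent to SPARQL_ENDPOINT (as a `query` parameter with `format=json`), and each returned item ID passed through `fetch_json(revisions_url(qid))` and then `edits_by_others(...)` with the set of staff account names. This only surfaces edits to items that still exist; detecting deleted items would need a separate check.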


Workflow:

New data source: add it to the geodatabase. First, edit the data into the database structure using a template (in OpenRefine, adding the Wikidata ID), then reconcile with Wikidata and pull in coordinates (so that places are represented with the same geometry).
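As an illustration of the "pull in coordinates" step: the Wikidata Query Service returns coordinate location (P625) values as WKT literals with longitude first, for example Point(-100.5 31.25) (an illustrative value, not taken from the project). A minimal sketch of copying such a value onto a reconciled local record follows; the record fields and function names are hypothetical, not the actual geodatabase schema.

```python
import re

# Matches WKT points such as "Point(-100.5 31.25)": longitude first, then latitude.
_WKT_POINT = re.compile(
    r"^\s*Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (latitude, longitude) from a WKT 'Point(lon lat)' literal."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def attach_coordinates(record, wkt):
    """Return a copy of a local record with Wikidata's coordinates added."""
    lat, lon = parse_wkt_point(wkt)
    return {**record, "latitude": lat, "longitude": lon}
```

Returning a copy rather than modifying the record in place keeps the source data untouched, so a failed reconciliation can be rerun against the original rows.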

Data schemas: two are used in Wikidata, one for places and one for contributors. Records cover buildings, places, architects, artists….


Ex. demo: https://www.wikidata.org/wiki/Q107323101


Statistics to date (items added): places = 4,844; contributors = 1,423


PLACES

Places without a given name are labeled by type, e.g. House. Wikidata requires a description field, so the place type is combined with the address. The description is used for reconciliation.


Ex.

Label: City of San Angelo Municipal Pool. 

Description: Swimming pool at 18 E. Ave. A San Angelo, Texas
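The label and description pattern shown in this example could be generated along the following lines. This is a sketch only: the function and parameter names are hypothetical, and it assumes that unnamed places are labeled by their place type, as the "House" example suggests.

```python
def build_label(name, place_type):
    """Use the given name when there is one; otherwise fall back to the place type."""
    name = (name or "").strip()
    return name if name else place_type.strip()


def build_description(place_type, street_address, city, state):
    """Combine the place type with the address, following the pattern in the notes."""
    return f"{place_type.strip()} at {street_address.strip()} {city.strip()}, {state.strip()}"
```

With the values from the example, `build_description("Swimming pool", "18 E. Ave. A", "San Angelo", "Texas")` reproduces the description above, and `build_label(None, "House")` falls back to the place type.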


Statements

Note: use Stated In reference: Buildings of Texas Collection (which is a type of reference allowed). See Wikidata usage reference structure requirements. 


Buildings of Texas Collection:

Instance of: collection

Archives at: Alexander Architectural Archives


Instance of: adds the type of place

Country; Located in the administrative territorial entity

Coordinate location: this is a critical field for reconciliation by API


Events:

Note: a building can have multiple events over time with multiple participants.

Significant event: construction (a type of event, with the qualifier point in time); inception is set to the date of construction.

Participant: John G. Becker; object has role: architect

Street address: qualified (English)
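One way to express the event statements above for batch upload is QuickStatements V1 syntax: tab-separated item, property, value, then qualifier pairs, then reference pairs whose property starts with S. The sketch below generates such lines. It is not the workflow used in the project; the function names are hypothetical, and the property IDs (P793 significant event, P585 point in time, P710 participant, P3831 object has role, P248 stated in) are believed correct but should be checked on Wikidata before use.

```python
def year_value(year):
    """Format a year as a QuickStatements time value with year precision (/9)."""
    return f"+{year:04d}-01-01T00:00:00Z/9"


def statement_line(item, prop, value, qualifiers=(), stated_in=None):
    """Build one tab-separated QuickStatements V1 line."""
    parts = [item, prop, value]
    for q_prop, q_value in qualifiers:
        parts += [q_prop, q_value]
    if stated_in:
        # S248 is the reference form of "stated in" (P248).
        parts += ["S248", stated_in]
    return "\t".join(parts)


def construction_event_lines(item, event, year, participant, role, source):
    """Significant event qualified by point in time, plus participant qualified by role."""
    return [
        statement_line(item, "P793", event, [("P585", year_value(year))], source),
        statement_line(item, "P710", participant, [("P3831", role)], source),
    ]
```

For example, `construction_event_lines("Q107323101", "Q_EVENT", 1900, "Q_PERSON", "Q_ROLE", "Q_SOURCE")` yields two lines, where every `Q_…` value and the year are placeholders rather than real data.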


CONTRIBUTORS Creators:

Role


Statements

Instance of: human; reference Buildings of Texas (BOT)

Floruit: (date when a person was known to be active or alive, when birth or death not documented.  Use for all new contributor items that we add.)


Noticed that the record used to demonstrate the work has already been edited by Dzahsh (Josh), another person (who added Given Name), and the bot Edoderoobot.


Paloma: question on data modeling. There are different ways to say the same thing. Is there any community guidance on best practices for architectural data or buildings data? This is important for efficient SPARQL queries. Josh: yes, should duplicate for architects, even though it means duplication of data. Katie: still good to have the flexibility of Participant, because there are many participants/contributors in architectural practice.
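To illustrate the modeling question: an architect can be recorded with the dedicated architect property (P84) or as a participant (P710) qualified with object has role (P3831). A query that must catch both patterns needs a UNION, which is part of why consistent modeling (or deliberate duplication) matters for querying. The sketch below only builds the query text for the Wikidata Query Service; the function name is hypothetical, and the QID for the architect role must be supplied by the caller.

```python
def architects_query(building_qid, architect_role_qid):
    """SPARQL text finding a building's architects under either modeling pattern."""
    return f"""
SELECT DISTINCT ?person ?personLabel WHERE {{
  {{
    wd:{building_qid} wdt:P84 ?person .
  }} UNION {{
    wd:{building_qid} p:P710 ?statement .
    ?statement ps:P710 ?person ;
               pq:P3831 wd:{architect_role_qid} .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()
```

If every architect were recorded under both patterns, the first branch alone would be enough, which keeps queries shorter at the cost of duplicated data.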

MAP prototyping:

Data from each source should be represented in the side bar.

Queries by place, events


Q&A, comments:

Katie: Ethics of showing addresses, etc.? Beth: See SAA Design Records Section meeting, July 27, 2021

Beth: great recent presentations at LD4 and SAA that have been informative. Ex. use of Floruit, discussed at LD4

Beth: Wikibase- for those not quite ready for Wikidata. Paloma: described HRC’s work

Beth: Data structure is now evolving for efficiencies in data collecting and migration into different systems. Ex. the Alexander’s Drawings Database is prime for revisions (aligning labels, cleaning up fields, and reconciling data in 17,000 records). Katie and Josh are working on the next collection, Karl Kamrath.

Beth: Sustainability- importance of process documentation to hand off to the next wonderful GRA

Melanie: Assessment: positive impact and use. How to track specific changes. The interactive map application is another point at which to monitor usage, and how to track it. Josh: ArcGIS add-in? Where is Wikidata operationalized? As with the DAMS? How many times are the terms used?

Mandy: Islandora 8 taxonomies…. What is happening in the community- use cases, reconciliation with OpenRefine (source of choice), and ingesting (batch) for others to use.


Assessment

All

Please contribute to the Google Doc: https://docs.google.com/document/d/1yRzhBPx6BG2UETEei21DJ5Swc8GVRNN8eAr7ZvphEKw/edit

Some discussion already occurred during the presentation. The next meeting will continue the guided discussion from the Google Doc.

Looking ahead

All


Action items

  • contribute discussion topics and questions to the Assessment document noted above