Capstone 2016 - Special Collections Assessment - Aberle

This wiki page is used only to track the project.

Data Cleanup

Decisions Made:

  1. Multiple works bound together (one item number, different works): each work gets its own line, with the item record repeated on each line.

  2. Multiple titles given as translations: merged into a single cell (one work).

  3. Journals: if there is one item record for the entire series, the series gets one line in the data; if there is an item record for each journal volume, each item record is retained in the data. When possible (perhaps in the future), add journal volumes to the data record.

  4. Removed BAX provenance when it was not related to the Special Collections item. If we want to run an exercise based on our entire collection, we should run the provenance fields together with the title/item record.

  5. Removed records i127395842, i127395854, and i127395866. These works were withdrawn and await the BAX replacement for Butler's The Architecture of Sir Edwin Lutyens. The copies are technically no longer in Special Collections, and there is currently no access to the BAX copies, so the works should not be counted as part of the survey. There may be other withdrawn works in this survey, but I did not encounter any. These records were removed in copy 10.

  6. Normalized language, provenance, and location, and double-checked call numbers, in OpenRefine. Added language information based on titles whenever possible (reducing blank language cells from 228 to 26).

  7. Normalized authors: clustered variants, merged them where I could, and fixed diacritics. OpenRefine (OR) offers no good way to confirm that every variant was caught, but the data was improved.

  8. Publication information: removed the cataloger's uncertainty markers (square brackets and question marks), because the goal is only to be able to say that a date falls within a range.

  9. For roughly the first 800 records, I tried to edit both the place and the date of publication, moving the date into a separate cell and then cleaning up the punctuation so that OpenRefine would have a symbol to sort on. I realized this would be impossible to do for both fields across all records, so in the end I only moved the date over.

  10. I have not been able to edit titles or authors consistently. For records with language issues (Chinese, Japanese, Central European languages, and Russian were particularly problematic), I tried at least to fix the title and author so that they are legible.

  11. When two languages were listed (e.g., English/Spanish), I consolidated them into mul (multiple languages).

  12. Two records had "undetermined" in their language field; I removed it, so those fields are now blank (no language).

  13. Geography of publishers completed. If more than one location was listed for a publisher, I always selected the first place listed. I normalized the city names (not always to English; I was inconsistent about which spelling I used), added the states where applicable, and added the country. All this work was completed in OpenRefine.

  14. Dates: for the purposes of the capstone, reduced the complexity as follows.

    1. Original dates: ranges, single dates, unknown dates, partially known dates, and long (open) ranges such as 1925-.

    2. Date ranges: used the average of the range to represent it when the range falls between two quarters (see the quarter grouping below).

    3. Long (open) ranges: removed the range and kept the start year (1925- became 1925). Journals tend to be represented as either xxxx-, a complete range, or no dates at all.

    4. Partially known dates where the last two digits are unknown (e.g., 19--): placed in the first quarter of that century.

    5. Grouped dates into quarter centuries: 01-25, 26-50, 51-75, 76-00.

    6. Oldest publication date: 1568.
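The date rules in decision 14, together with the uncertainty stripping in decision 8, can be expressed as a small function. This is a minimal Python sketch of those rules, not the procedure actually used (the cleanup was done in OpenRefine and Excel). It assumes the date has already been moved into its own cell, that ranges use a plain hyphen, and that an average is rounded down; the function names `clean_uncertainty`, `representative_year`, and `quarter_label` are hypothetical.

```python
import re
from typing import Optional


def clean_uncertainty(raw: str) -> str:
    """Strip the cataloger's uncertainty markers: square brackets and '?'."""
    return re.sub(r"[\[\]?]", "", raw).strip()


def representative_year(raw: str) -> Optional[int]:
    """Reduce a publication date string to a single year, or None if unknown."""
    s = clean_uncertainty(raw)
    if m := re.fullmatch(r"(\d{2})--", s):
        # Partially known date such as 19--: place it in the first quarter of that century.
        return int(m.group(1)) * 100 + 1
    if m := re.fullmatch(r"(\d{4})-(\d{4})", s):
        # Closed range: use the average of the endpoints (assumption: rounded down).
        start, end = int(m.group(1)), int(m.group(2))
        return (start + end) // 2
    if m := re.fullmatch(r"(\d{4})-?", s):
        # Single year, or a long/open range such as 1925-: keep the start year.
        return int(m.group(1))
    return None  # unknown or unrecognized form


def quarter_label(year: int) -> str:
    """Quarter-century bucket using the 01-25, 26-50, 51-75, 76-00 convention."""
    first_of_century = (year - 1) // 100 * 100 + 1  # e.g. 1901 for 1901..2000
    quarter = (year - 1) % 100 // 25                # 0..3
    start = first_of_century + quarter * 25
    return f"{start}-{start + 24}"
```

For example, `[1925-1930?]` becomes 1927 and lands in 1926-1950, `[19--]` lands in 1901-1925, and the oldest date, 1568, lands in 1551-1575. Because a range that sits inside one quarter has its average in that same quarter, using the average for every closed range gives the same buckets as applying it only to ranges that straddle two quarters.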
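Decisions 11 to 13 are simple per-cell rules. The following hedged Python sketch restates the language rules and the "first place listed" rule. It assumes that multiple languages were written with a slash (as in English/Spanish), that an undetermined language appeared as either `und` or `undetermined`, and that multiple places of publication are separated by semicolons as in a MARC 260 $a; the function names are hypothetical. City-name normalization and the added states and countries are not covered.

```python
def normalize_language(value: str) -> str:
    """Language rules: two or more languages -> 'mul'; undetermined -> blank."""
    v = value.strip()
    if v.lower() in {"und", "undetermined"}:
        return ""  # undetermined language: leave the field blank
    # Assumption: multiple languages were recorded with a slash, e.g. "English/Spanish".
    parts = [p for p in v.split("/") if p.strip()]
    if len(parts) > 1:
        return "mul"
    return v


def first_place(value: str) -> str:
    """Keep only the first place of publication when several are listed."""
    # Assumption: places are separated by semicolons (e.g. "London ; New York").
    # Trailing ISBD punctuation such as " :" is trimmed from the kept place.
    return value.split(";")[0].strip(" :;,")
```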

 

Questions Remaining:

  1. Other libraries' provenance notes: Katie would retain them; Jessica would remove them.

  2. Withdrawn works that were supposed to be replaced with a gift copy (original item withdrawn, but new item not added); see Decisions Made #5.

Cell Issues

  1. When there are multiple copies of a work in the collection, the export cells become misaligned.

  2. When there are multiple items for a work, the export tends to duplicate author, title, subjects, language, and provenance. There is usually a single duplication regardless of the number of copies, but exceptions exist.

  3. Two bib records can have the same item records attached in the export.

    1. Because this happens inconsistently, I search for the other item number before adding it.

  4. When two different items (not journals) have the same call number, both works attach to a single item record; I have to add a new line with the item record and its information.

  5. The 590/790 notes apply to the whole record rather than to a single copy, so I have to work out which copy carries the provenance and attach it to that item.
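The duplicate-item problem in Cell Issues 3 could be checked mechanically instead of searching one item number at a time. This is a minimal Python sketch, assuming the export can be reduced to (bib record number, item record number) pairs; the function name and the sample record numbers are hypothetical and made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def items_on_multiple_bibs(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {item number: sorted bib numbers} for items exported under 2+ bib records."""
    bibs_by_item: Dict[str, Set[str]] = defaultdict(set)
    for bib, item in rows:
        bibs_by_item[item].add(bib)
    # Keep only items that appear under more than one distinct bib record.
    return {item: sorted(bibs) for item, bibs in bibs_by_item.items() if len(bibs) > 1}


# Hypothetical sample rows: i2000001 is attached to two different bib records.
sample = [("b1000001", "i2000001"), ("b1000002", "i2000001"), ("b1000003", "i2000002")]
print(items_on_multiple_bibs(sample))
```

Run as written, this prints `{'i2000001': ['b1000001', 'b1000002']}`. Grouping by item number (rather than by bib) is what makes a shared item surface once, with every bib record it was exported under.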

Record Issues

  1. Rowe and Moore: either the attribution is missing ("From the Library of...") or the wrong location is attached to the record. It would behoove us to run a search on Rowe and Moore to update the records.

  2. Works associated with i1362818x, American Architecture by Fiske Kimball. Noted that copy 4 is a Cret copy that was in ARG but is now missing. Copies 2 and 3 are in ARZSP. Double-checked two copies downstairs and one copy upstairs: none is the Cret copy. Removed the attribution, since we do not have a copy in SC attributed to Cret. We may want to run a search on the Cret library and transfer anything else from it that is in ARG into SC.

  3. Works associated with i120051187: copy 4 is attributed as a George Dahl copy, but there is no copy 4. There are 3 copies in SC. More copies exist under different records because they have different publication dates. Perhaps the attribution is attached to the wrong record? Removed the attribution in copy 10.

  4. Records i100902340, i100902352, and i100902364 are Moore copies; they have no title, publication information, or authors, only location, record number, and call number.

  5. Some records have poorly constructed titles, with incomplete words.

  6. Record i189709844 is in the catalog for SOA but has never shown up. Retained this record. Dan is looking into why this copy has not arrived.

  7. Some of the works are held with Holly; their records were retained.

 

*Versions of the data record will be saved daily. These versions are named ArchSpecCollMS5_[version number]. They can be found in Jessica Aberle's Box account: Capstone_Aberle>Working Spread Sheets -Melanie.

 

 

Version Milestones

Version ArchSpecColMS5_4: Cell alignment complete. The provenance fields are still a snapshot of how they exported. Fixing cell alignment did require some decisions about how provenance would be represented; nonetheless, all provenance issues are still apparent in this version. Provenance is the next aspect to tackle; I still need Sierra in order to verify item numbers against former book owners.

Version ArchSpecColMS5_10: Provenance verified. 590 and 790 notes combined into a single cell: Provenance.

Version ArchSpecColMS5_11: Record clean up in OpenRefine

Version ArchSpecColMS5_12: Exported the record from OpenRefine, made additional changes in Excel, and re-uploaded it as a new project in OR.

Version ArchSpecColMS5_13: Returned to Excel to clean cell data

Version ArchSpecColMS5_20: Dates cleaned and double-checked in OpenRefine; language refined again in OpenRefine (the last time language will be altered).

Version ArchSpecColMS5_22: Geography of place completed; still need to add geocoding for mapping purposes. 

Version ArchSpecColMS5_23: Publication dates normalized into quarter centuries.

 

 

Location of Documents

Associated documents can be found in Jessica Aberle's UTbox account. Upon completion of the project, all documents will be transferred to Katie Pierce Meyer, who will determine their archival home. Copies of the documents can also be found in Jessica Aberle's folder on the server. The project's work log is kept as a separate tab in Jessica Aberle's GRA work log.