Research access - transcription

Transcription

The primary objects in the transcription database are "textual units" as identified by the text detection algorithm. All metadata is recorded in the context of these units. 

 

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Identifier | _id | | This identifier uniquely identifies a textual unit. It serves as a mechanism for anonymizing identified textual units for crowdsourcing tasks. *Not all textual units may contain identifiable text. The algorithm generates several false positives. | 1 | | Generated using the bson.ObjectId() method. | 5710cda96a15476a01bea518 | |
| 2 | Textual unit information | textUnit | TextUnit | This property contains information about the characteristics of the textual unit image. | 1 | | | | |
| 3 | Feature information | features | Feature | This property contains information about identified features within textual units. Features are used to automatically detect characteristics of textual units or to select units for forwarding to transcription engines. | 0-1 | | | | |
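
The obligations in the table above can be checked mechanically. The following is a minimal sketch, not the project's actual validation code. It assumes that an obligation of "1" means the property is mandatory and "0-1" means it is optional, and that `_id` is handled in its 24-character hexadecimal string form. The function names `is_object_id_string` and `check_top_level` are hypothetical.

```python
import re
from typing import Any, Dict, List

# Textual form of a 12-byte ObjectId: exactly 24 hexadecimal characters.
OBJECT_ID_PATTERN = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id_string(value: Any) -> bool:
    """Return True if value looks like the hex string form of an ObjectId."""
    return isinstance(value, str) and OBJECT_ID_PATTERN.fullmatch(value) is not None


def check_top_level(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the top-level properties.

    Assumption: obligation "1" is read as mandatory and "0-1" as optional.
    """
    problems = []
    if not is_object_id_string(record.get("_id")):
        problems.append("_id must be a 24-character hexadecimal string")
    if "textUnit" not in record:
        problems.append("textUnit is mandatory (obligation 1)")
    # features has obligation 0-1, so its absence is not reported.
    return problems


# Record built from the example identifier in the table; textUnit is left empty here.
sample = {"_id": "5710cda96a15476a01bea518", "textUnit": {}}
print(check_top_level(sample))
```

An empty result means the record satisfies the top-level obligations under the stated reading.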

 

TextUnit

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Image file | textUnit.imageFile | | The image file associated with this textual unit. | 1 | | | | |
| 2 | Anonymized image file | textUnit.anonymizedImageFile | | The anonymized image file that corresponds to the value of textUnit.imageFile. | 1 | | | 5710cda96a15476a01bea518.jpg | One mechanism for anonymization is to use the system-generated id as the image file name. |
| 3 | Parent page id | textUnit.parentPageId | | The unique identifier assigned to the page that contains this textual unit. | 1 | | | | |
| 4 | Height | textUnit.height | | Height in pixels of the image containing this textual unit. | 1 | | | 360 | |
| 5 | Width | textUnit.width | | Width in pixels of the image containing this textual unit. | 1 | | | 468 | |
| 6 | X location | textUnit.locationX | | The x location of the top left corner of this image within the parent page. | 1 | | | 1289 | |
| 7 | Y location | textUnit.locationY | | The y location of the top left corner of this image within the parent page. | 1 | | | 5734 | |
| 8 | Word number | textUnit.numWord | | The word number as identified by the text unit detection algorithm. Currently, textual unit counting begins at 1 on each page and does not reset for new lines. Thus, all textual units on a page are numbered sequentially. | 1 | | | 29 | |
| 9 | Line number | textUnit.numLine | | The line number as identified by the text unit detection algorithm. Line counting begins at 1 on each page. | 1 | | | 468 | |
| 10 | File size | textUnit.fileSize | | The file size in bytes for this textual unit. | 1 | | | 23400 | |
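
Two conventions from the TextUnit properties can be illustrated in code: how the location and size fields describe a rectangle on the parent page, and how word and line numbers are assigned. This is a rough sketch under assumptions. It assumes pixel coordinates with x increasing to the right and y increasing downward from the page's top left corner, and the helper names `bounding_box` and `number_units` are hypothetical rather than part of the detection algorithm.

```python
from typing import Dict, List, Tuple


def bounding_box(text_unit: Dict[str, int]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in parent-page pixel coordinates.

    Assumption: y grows downward, so bottom = top + height.
    """
    left = text_unit["locationX"]
    top = text_unit["locationY"]
    return (left, top, left + text_unit["width"], top + text_unit["height"])


def number_units(lines: List[List[str]]) -> List[Tuple[int, int, str]]:
    """Assign (numLine, numWord, label) to every unit on a single page.

    numLine starts at 1 on the page. numWord also starts at 1 on the page
    and keeps counting across lines instead of resetting.
    """
    numbered = []
    num_word = 0
    for num_line, line in enumerate(lines, start=1):
        for label in line:
            num_word += 1
            numbered.append((num_line, num_word, label))
    return numbered


# Example values taken from the table above.
example_unit = {"locationX": 1289, "locationY": 5734, "width": 468, "height": 360}
print(bounding_box(example_unit))          # (1289, 5734, 1757, 6094)
print(number_units([["a", "b"], ["c"]]))   # [(1, 1, 'a'), (1, 2, 'b'), (2, 3, 'c')]
```

In the second call, the unit on line 2 receives word number 3, not 1, because word numbering continues across lines.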

Feature

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Images with width greater than 15 and less than 312 (exclusive) | textUnit.widthGT15LT312 | | This property records whether the width of this textual unit is greater than 15 and less than 312, with both bounds exclusive. The value is true if the image width is within this acceptable range and false otherwise. | 0-1 | True, False | | | |
| 2 | Anonymized image file | textUnit.anonymizedImageFile | | The anonymized image file that corresponds to the value of textUnit.imageFile. | 1 | | | 5710cda96a15476a01bea518.jpg | One mechanism for anonymization is to use the system-generated id as the image file name. |
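
The width feature is a simple range test with both bounds excluded. A minimal sketch follows; the function name `width_gt15_lt312` is hypothetical and only mirrors the property name.

```python
def width_gt15_lt312(width: int) -> bool:
    """Return True if width is strictly between 15 and 312 pixels."""
    return 15 < width < 312


# Boundary values are rejected because both bounds are exclusive.
for candidate in (15, 16, 311, 312, 468):
    print(candidate, width_gt15_lt312(candidate))
```

With the example width of 468 from the TextUnit properties, the feature is false, so that unit falls outside the acceptable range.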

Transcription

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Zooniverse data | textUnit.zooniverse | Zooniverse | All data for transcription of this textual unit via the Zooniverse infrastructure | 1 | | | | |
| 2 | TBA | textUnit.TBA | | A future infrastructure | 1 | | | | |


Zooniverse

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ready to send to Zooniverse | textUnit.readyToSend | | A value of true indicates that this textual unit can be sent to Zooniverse. | 0-1 | True, False | | | |
| 2 | Sent to Zooniverse | textUnit.sent | | A value of true indicates that this textual unit has already been added to a Zooniverse subject set. | 0-1 | True, False | | | |
| 3 | Received text from Zooniverse | textUnit.receivedData | | A value of true indicates that data for this textual unit has been retrieved from Zooniverse. | | | | | |
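
The three Zooniverse flags can be read together as a coarse processing state. The sketch below is illustrative only. It assumes that the flags are set in the order readyToSend, sent, receivedData, and that a missing flag is treated as false. The status labels and the function name `zooniverse_status` are hypothetical and not part of the schema.

```python
from typing import Dict


def zooniverse_status(flags: Dict[str, bool]) -> str:
    """Collapse the three Zooniverse flags into a single status label.

    Assumption: a missing flag counts as False, and the most advanced
    flag that is set determines the status.
    """
    if flags.get("receivedData", False):
        return "received"
    if flags.get("sent", False):
        return "sent"
    if flags.get("readyToSend", False):
        return "ready"
    return "pending"


print(zooniverse_status({"readyToSend": True}))
print(zooniverse_status({"readyToSend": True, "sent": True}))
```

Checking the most advanced flag first keeps the result stable even if earlier flags are left set after a unit moves on.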