Research access - transcription
Transcription
The primary objects in the transcription database are "textual units" as identified by the text detection algorithm. All metadata is recorded in the context of these units.
| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Identifier | _id | | This identifier uniquely identifies a textual unit. It serves as a mechanism for anonymizing identified textual units for crowdsourcing tasks. | 1 | | Generated using the bson.ObjectId() method. | 5710cda96a15476a01bea518 | Not all textual units may contain identifiable text. The algorithm generates several false positives. |
| 2 | Textual unit information | textUnit | TextUnit | This property contains information about the characteristics of the textual unit image. | 1 | ||||
| 3 | Feature information | features | Feature | This property contains information about identified features within textual units. Features are used to automatically detect characteristics of textual units or to select units for forwarding to transcription engines. | 0-1 | | | | |
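
To illustrate how the three top-level properties fit together, the following is a minimal sketch of one record as a Python dictionary. It reuses the example values from the tables. The plain hex string stands in for the value that bson.ObjectId() would generate, the helper name `missing_required` is hypothetical, and reading an obligation of "1" as "required" and "0-1" as "optional" is an assumption.

```python
# Hypothetical record for one textual unit, built from the example values
# in the tables. Not taken from the actual database.
record = {
    # In the database this would be a bson ObjectId; a hex string stands in here.
    "_id": "5710cda96a15476a01bea518",
    "textUnit": {
        "anonymizedImageFile": "5710cda96a15476a01bea518.jpg",
        "height": 360,
        "width": 468,
    },
    # "features" has obligation 0-1, so it is left out of this record.
}

# Assumption: obligation "1" means the property must be present.
REQUIRED_TOP_LEVEL = ("_id", "textUnit")


def missing_required(doc):
    """Return the required top-level properties that are absent from doc."""
    return [name for name in REQUIRED_TOP_LEVEL if name not in doc]


print(missing_required(record))
print(missing_required({"_id": record["_id"]}))
```

A check like this could run before a record is written, so that units lacking a `textUnit` entry are caught early.
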
TextUnit
| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Image file | textUnit.imageFile | | The image file associated with this textual unit. | 1 | | | | |
| 2 | Anonymized image file | textUnit.anonymizedImageFile | | The anonymized image file that corresponds to the value of textUnit.imageFile. | 1 | | | 5710cda96a15476a01bea518.jpg | One mechanism for anonymization is to use a system-generated ID as the image file name. |
| 3 | Parent page id | textUnit.parentPageId | | The unique identifier assigned to the page that contains this textual unit. | 1 | | | | |
| 4 | Height | textUnit.height | | Height, in pixels, of the image containing this textual unit. | 1 | | | 360 | |
| 5 | Width | textUnit.width | | Width, in pixels, of the image containing this textual unit. | 1 | | | 468 | |
| 6 | X location | textUnit.locationX | | The x location of the top left corner of this image within the parent page. | 1 | | | 1289 | |
| 7 | Y location | textUnit.locationY | | The y location of the top left corner of this image within the parent page. | 1 | | | 5734 | |
| 8 | Word number | textUnit.numWord | | The word number as identified by the text unit detection algorithm. Currently, textual unit counting begins at 1 on each page and does not reset for new lines, so all textual units on a page are numbered sequentially. | 1 | | | 29 | |
| 9 | Line number | textUnit.numLine | | The line number as identified by the text unit detection algorithm. Line counting begins at 1 on each page. | 1 | | | 468 | |
| 10 | File size | textUnit.fileSize | | The file size, in bytes, of this textual unit. | 1 | | | 23400 | |
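
The location and size properties together describe where a textual unit sits on its parent page. The sketch below derives a bounding box from them, using the example values from the table. It assumes a top-left page origin with y increasing downward and treats the right and bottom edges as exclusive; the names `Box` and `page_box` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box in parent-page pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def page_box(text_unit):
    """Derive the bounding box of a textual unit from its TextUnit properties.

    Assumes locationX/locationY give the top left corner and that
    width/height extend right and down from it.
    """
    left = text_unit["locationX"]
    top = text_unit["locationY"]
    return Box(left, top, left + text_unit["width"], top + text_unit["height"])


# Example values taken from the TextUnit table.
unit = {"locationX": 1289, "locationY": 5734, "width": 468, "height": 360}
print(page_box(unit))
```

With the table's example values this gives a box spanning x from 1289 to 1757 and y from 5734 to 6094.
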
Feature
| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Images with width greater than 15 and less than 312 (exclusive) | textUnit.widthGT15LT312 | | This property records whether the width of this textual unit is greater than 15 and less than 312 (both bounds exclusive). The value is true if the image width is within this acceptable range and false otherwise. | 0-1 | | | true, false | |
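
The width feature described above reduces to a single range check. The following is a minimal sketch of how it could be computed from `textUnit.width`; the function and constant names are hypothetical, and only the bounds 15 and 312 (both exclusive) come from the table.

```python
# Bounds from the Feature table; both are exclusive.
MIN_WIDTH = 15
MAX_WIDTH = 312


def width_gt15_lt312(width):
    """Return True when width is strictly between 15 and 312 pixels."""
    return MIN_WIDTH < width < MAX_WIDTH


# The example width from the TextUnit table (468) falls outside the range.
print(width_gt15_lt312(468))
print(width_gt15_lt312(100))
```

Because both bounds are exclusive, widths of exactly 15 or 312 pixels do not qualify.
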
Transcription
| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Zooniverse data | textUnit.zooniverse | Zooniverse | All data for transcription of this textual unit via the Zooniverse infrastructure | 1 | ||||
| 2 | TBA | textUnit.TBA | | A future infrastructure | 1 | | | | |
Zooniverse
| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ready to send to Zooniverse | textUnit.readyToSend | | A value of true indicates that this textual unit can be sent to Zooniverse. | 0-1 | | | true, false | |
| 2 | Sent to Zooniverse | textUnit.sent | | A value of true indicates that this textual unit has already been added to a Zooniverse subject set. | 0-1 | | | true, false | |
| 3 | Received text from Zooniverse | textUnit.receivedData | | A value of true indicates that data for this textual unit has been retrieved from Zooniverse. | | | | | |
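
Read together, the three flags describe how far a textual unit has progressed through Zooniverse. The sketch below collapses them into a single status. It assumes that an absent flag means false, and that the flags are set in the order ready, sent, received; the status labels and the function name `zooniverse_status` are hypothetical and not part of the schema.

```python
def zooniverse_status(flags):
    """Summarize the Zooniverse flags of one textual unit as a single label.

    flags is a dict that may contain readyToSend, sent, and receivedData.
    Missing flags are treated as false (an assumption, not stated in the schema).
    """
    if flags.get("receivedData"):
        return "received"
    if flags.get("sent"):
        return "sent"
    if flags.get("readyToSend"):
        return "ready"
    return "pending"


print(zooniverse_status({"readyToSend": True}))
print(zooniverse_status({"readyToSend": True, "sent": True}))
print(zooniverse_status({}))
```

Checking the latest stage first means a unit with all three flags set is reported once, at its furthest stage.
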