Research access - transcription

Transcription

The primary objects in the transcription database are "textual units" as identified by the text detection algorithm. All metadata is recorded in the context of these units. 

 

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Identifier | _id | | This identifier uniquely identifies a textual unit. It serves as a mechanism for anonymizing identified textual units for crowdsourcing tasks. Note: not all textual units may contain identifiable text. The algorithm generates several false positives. | 1 | Generated using the bson.ObjectId() method. | | 5710cda96a15476a01bea518 | |
| 2 | Textual unit information | textUnit | TextUnit | This property contains information about the characteristics of the textual unit image. | 1 | | | | |
| 3 | Feature information | features | Feature | This property contains information about identified features within textual units. Features are used to automatically detect characteristics of textual units or to select units for forwarding to transcription engines. | 0-1 | | | | |
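
The example identifier is a 24-character hexadecimal string, which is the textual form of a 12-byte BSON ObjectId. The following is a minimal sketch, using only the Python standard library, of how such a string could be validated and how the creation time stored in its first four bytes could be read. The function names are hypothetical and are not taken from the project; the identifiers themselves would still be generated with bson.ObjectId() as stated in the table.

```python
import re
from datetime import datetime, timezone

# An ObjectId rendered as text is exactly 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-f]{24}", re.IGNORECASE)


def is_valid_object_id(value: str) -> bool:
    """Return True if value looks like the hex form of an ObjectId."""
    return _OBJECT_ID_RE.fullmatch(value) is not None


def object_id_timestamp(value: str) -> datetime:
    """Return the creation time encoded in the first 4 bytes (8 hex digits)."""
    if not is_valid_object_id(value):
        raise ValueError(f"not a 24-character hex ObjectId: {value!r}")
    seconds_since_epoch = int(value[:8], 16)
    return datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)


if __name__ == "__main__":
    example_id = "5710cda96a15476a01bea518"  # example value from the table
    print(is_valid_object_id(example_id))
    print(object_id_timestamp(example_id).isoformat())
```

Because the leading bytes encode a timestamp, an id of this kind hides the page and position of a textual unit but still reveals roughly when the record was created.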

 

 

 

 

 

TextUnit

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Image file | textUnit.imageFile | | The image file associated with this textual unit. | 1 | | | | |
| 2 | Anonymized image file | textUnit.anonymizedImageFile | | The anonymized image file that corresponds to the value of textUnit.imageFile. | 1 | | | 5710cda96a15476a01bea518.jpg | One mechanism for anonymization is the use of a system-generated id as the image file name. |
| 3 | Parent page id | textUnit.parentPageId | | The unique identifier assigned to the page that contains this textual unit. | 1 | | | | |
| 4 | Height | textUnit.height | | Height in pixels of the image containing this textual unit. | 1 | | | 360 | |
| 5 | Width | textUnit.width | | Width in pixels of the image containing this textual unit. | 1 | | | 468 | |
| 6 | X location | textUnit.locationX | | The x location of the top left corner of this image within the parent page. | 1 | | | 1289 | |
| 7 | Y location | textUnit.locationY | | The y location of the top left corner of this image within the parent page. | 1 | | | 5734 | |
| 8 | Word number | textUnit.numWord | | The word number as identified by the text unit detection algorithm. Currently, textual unit counting begins at 1 on each page and does not reset for new lines. Thus, all textual units on a page are numbered sequentially. | 1 | | | 29 | |
| 9 | Line number | textUnit.numLine | | The line number as identified by the text unit detection algorithm. Line counting begins at 1 on each page. | 1 | | | 468 | |
| 10 | File size | textUnit.fileSize | | The file size in bytes for this textual unit. | 1 | | | 23400 | |
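
As an illustration of how these properties fit together, the sketch below models a textual unit as a Python dataclass and derives two values from it: the anonymized file name built from the system-generated id, and the bounding box of the unit within its parent page. The class, the helper names, and the values for imageFile and parentPageId are hypothetical placeholders; the remaining values come from the Example column. The sketch also assumes the usual image convention in which the origin is the top left corner of the page and y grows downward, which the table does not state.

```python
from dataclasses import dataclass
from typing import Tuple


def anonymized_file_name(unit_id: str, extension: str = "jpg") -> str:
    """Build an anonymized image file name from the system-generated id."""
    return f"{unit_id}.{extension}"


@dataclass(frozen=True)
class TextUnit:
    image_file: str             # textUnit.imageFile
    anonymized_image_file: str  # textUnit.anonymizedImageFile
    parent_page_id: str         # textUnit.parentPageId
    height: int                 # textUnit.height, in pixels
    width: int                  # textUnit.width, in pixels
    location_x: int             # textUnit.locationX, top left corner
    location_y: int             # textUnit.locationY, top left corner
    num_word: int               # textUnit.numWord, starts at 1 on each page
    num_line: int               # textUnit.numLine, starts at 1 on each page
    file_size: int              # textUnit.fileSize, in bytes

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) within the parent page."""
        return (
            self.location_x,
            self.location_y,
            self.location_x + self.width,
            self.location_y + self.height,
        )


unit = TextUnit(
    image_file="original-image.jpg",        # placeholder, not from the source
    anonymized_image_file=anonymized_file_name("5710cda96a15476a01bea518"),
    parent_page_id="example-page-id",       # placeholder, not from the source
    height=360,
    width=468,
    location_x=1289,
    location_y=5734,
    num_word=29,
    num_line=468,
    file_size=23400,
)

if __name__ == "__main__":
    print(unit.anonymized_image_file)  # 5710cda96a15476a01bea518.jpg
    print(unit.bounding_box())         # (1289, 5734, 1757, 6094)
```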

 

Feature

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Images with width greater than 15 and less than 312 (exclusive) | textUnit.widthGT15LT312 | | This property records whether the width of this textual unit is greater than 15 and less than 312 (both bounds exclusive). This property will be true if the image width is within the acceptable range, false otherwise. | 0-1 | True, False | | | |
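
The width check described for textUnit.widthGT15LT312 can be sketched as a single comparison with both bounds excluded. The function and constant names below are hypothetical; only the bounds 15 and 312 come from the table.

```python
# Bounds taken from the property definition; both are exclusive.
MIN_WIDTH_EXCLUSIVE = 15
MAX_WIDTH_EXCLUSIVE = 312


def width_gt15_lt312(width: int) -> bool:
    """Return True if width is strictly between 15 and 312."""
    return MIN_WIDTH_EXCLUSIVE < width < MAX_WIDTH_EXCLUSIVE


if __name__ == "__main__":
    for width in (15, 16, 311, 312, 468):
        print(width, width_gt15_lt312(width))
```

With these bounds, a unit that has the example width of 468 pixels falls outside the acceptable range, so the property would be false for it.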

 

 

 


Transcription

| S. No | Label | Property | Range | Usage | Obligation | Vocab schema | Syntax schema | Example | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Zooniverse data | textUnit.zooniverse | Zooniverse | All data for transcription of this textual unit via the Zooniverse infrastructure | 1 | | | | |
| 2 | TBA | textUnit.TBA | | A future infrastructure | 1 | | | | |

 

 

 

 



 

Zooniverse

S. No

Label

Property

Range

Usage

Obligation

Vocab schema

Syntax schema

Example

Comments

1

Ready to send to Zooniverse

textUnit.readyToSend

 

A value of true indicates that this textual unit can be sent to Zooniverse

0-1

 True, false

 

 

 

2

Sent to Zooniverse

textUnit.sent

 

A value of true indicates that this textual unit has already been added to a Zooniverse subject set

0-1

 

 True, false

 

 



3

Received text from Zooniverse

textUnit.receivedData

 

A value of true indicates that data for this textual unit has been retrieved from Zooniverse