4.1 Formatting transcription process (transcribers)
Connected speech formatting
If you are completing the transcription training process, ignore steps 1-4, since the folder will contain the samples and templates that you will work from.
Choose the folder that corresponds to the participant's study of enrollment (e.g., Clinical Trial, Observational)
Choose Spanish, Catalan, or English, depending on the sample language that you are formatting
Choose the "clipped audio of tasks" folder
Choose the task that you are formatting. For example, if it's the WAB picnic description, choose S3_PicnicScene_Picture Description.
Go to "Taskname_formatted_for_clan". For example, if it's the WAB picnic description, choose "2. PicnicScene_formatted_for_clan".
Once in that folder, you should see the template named according to the task. For example: CODE001_BACC001_PicnicScene_Spa_Timepoint_YYYYMMDD
This template helps you name the file and contains the headers that you will need for the transcription.
Copy and paste the Whisper transcription into this .cha file
File format for Connected Speech samples
You will need to fill out some fields in the headers of the file (headers are the first few lines in the document starting with @)
Preserve the format of the headers
There should be one tab after the colon in each header, so if it gets deleted, put a single tab back in
There are a specific number of "pipes" (this symbol: |) in the @ID header, so don't delete any
Don't add spaces in any of the fields where there are none in the template
Fill out the fields in the headers:
@Language
This should be included in the template according to the language you previously chose. If not, enter the language of the sample (in lowercase):
spa (Spanish)
cat (Catalan)
eng (English)
@Participant Code: Enter the participant's code here, e.g., BISE016. You will also need to enter the correct BACC### (not applicable for local participants; for a local participant, enter only the participant code). You can find this information in the Connected Speech Data Analysis Smartsheet.
@ID: Timepoint: Enter the study label (LRT, VISTA, or OBS) that matches the participant's code prefix (BILP or BISE). Also add the timepoint (e.g., pre, mid, post).
BISD (Bilingual Semantic Dementia), BILP (Bilingual Logopenic), BISE (Bilingual Speech Entrainment)
@Time Duration: Enter the start and end time of the sample
If you used a timer, enter 00:00:00 for the start
Preserve the format of the time as indicated
Note the timecode of the video at the onset of the first word a participant says after the clinician prompt (excluding any words you are omitting from the beginning, as discussed above)
Note the timecode at the offset of the last word the participant says on the script topic/picture description
If the clinician redirects or re-prompts during the probe, omit the duration of this from the total duration
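The duration arithmetic described above can be sketched in a few lines of Python. This is only an illustration, assuming timecodes are written as HH:MM:SS (as in the 00:00:00 example) and that each clinician redirect or re-prompt is noted as a start/end pair; the function names are hypothetical and are not part of CLAN.

```python
def to_seconds(timecode):
    """Convert an HH:MM:SS timecode to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_timecode(total_seconds):
    """Convert a number of seconds back to HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def sample_duration(start, end, interruptions=()):
    """Time from first-word onset to last-word offset, minus clinician re-prompts.

    `interruptions` is an assumed list of (start, end) timecode pairs, one per
    clinician redirect or re-prompt that should be left out of the total.
    """
    total = to_seconds(end) - to_seconds(start)
    for interruption_start, interruption_end in interruptions:
        total -= to_seconds(interruption_end) - to_seconds(interruption_start)
    return to_timecode(total)
```

For instance, a sample that runs from 00:00:05 to 00:01:35 with one re-prompt from 00:00:40 to 00:00:50 gives sample_duration("00:00:05", "00:01:35", [("00:00:40", "00:00:50")]), which returns "00:01:20".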
Name of script/sample
After the "comment" header, type the name of the script topic, or type the title of the discourse sample, e.g., "PicnicScene" for the WAB picnic description
@Comment: PicnicScene
@Comment: CatRescue
@Comment: ImportantEvent
After the Comment header, the transcription starts. Each utterance must begin with *PAR: followed by a single TAB and then the text (the wider tab spacing must be present). Do not use a space between *PAR: and the text.
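The header and *PAR: spacing rules above are easy to break by accident, so a quick check before opening the file in CLAN can help. The following is a minimal sketch, not an official CLAN tool: the function name is hypothetical, the expected pipe count is taken from the template rather than hard-coded, and it assumes speaker codes are uppercase letters (as in *PAR:).

```python
import re

# "@Header:" followed by exactly one tab and then content
HEADER_RE = re.compile(r"^@[^:\t]+:\t\S")
# "*PAR:" (or another uppercase speaker code) followed by exactly one tab and then text
SPEAKER_RE = re.compile(r"^\*[A-Z]+:\t\S")


def check_cha_lines(lines, template_id_pipes=None):
    """Return a list of (line_number, message) pairs for spacing problems."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Compare the number of pipes in @ID with the template's @ID line
        if line.startswith("@ID:") and template_id_pipes is not None:
            if line.count("|") != template_id_pipes:
                problems.append((number, "@ID pipe count differs from the template"))
        if line.startswith("@") and ":" in line:
            if not HEADER_RE.match(line):
                problems.append((number, "header needs exactly one tab after the colon"))
        elif line.startswith("*"):
            if not SPEAKER_RE.match(line):
                problems.append((number, "utterance needs one tab between the speaker code and the text"))
    return problems
```

To use it, read the template and the new file with open(path, encoding="utf-8").readlines(), count the pipes on the template's @ID line, and pass that count as template_id_pipes. An empty list means no spacing problems were found.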
File format for evaluación de lenguaje básica A and B samples
20240101 Dr.Santos: Two tasks:
1) Rename existing files:
For example, in the folder “B--SpeechFTLD_B> B--Picnic-description-transcripts> B--Utterances”, the file “T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo” should be renamed to “1892153_20210727_utterances”
In the folder “B--SpeechFTLD_B> B--Picnic-description-transcripts> B--Literal”, the file “T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo” should be renamed to “1892153_20210727_literal”
You can find the date of the speechFTLD_B evaluation (in which the picnic scene task was administered) in the Excel sheet “speechFTLD_B-taskcompletion-nps-demog-dx_20240129.xlsx” in column K called “SpeechFTLD_B”. This date is written in the file name in the format YYYYMMDD. This Excel sheet is in the folder “B--SpeechFTLD_B> B--Picnic-description-transcripts> Z--Task-completion-Demog-Dx-NPS”
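The renaming rule in task 1 can be sketched as a small helper. This is only an illustration: it assumes the participant ID is always the third underscore-separated field of the old name (inferred from the single example above), and the function name and the `kind` argument are hypothetical. The evaluation date still has to be looked up in column K of the Excel sheet.

```python
import re


def renamed_transcript(old_name, evaluation_date, kind):
    """Build the new file name from the old one.

    old_name: e.g. "T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo"
    evaluation_date: the SpeechFTLD_B date as YYYYMMDD
    kind: "utterances" or "literal", depending on the folder
    """
    if kind not in ("utterances", "literal"):
        raise ValueError(f"unknown transcript kind: {kind!r}")
    if not re.fullmatch(r"\d{8}", evaluation_date):
        raise ValueError("date must be written as YYYYMMDD")
    # Assumption: the ID is the run of digits in the third field of the old name
    match = re.match(r"^[^_]+_[^_]+_(\d+)_", old_name)
    if match is None:
        raise ValueError(f"cannot find a participant ID in {old_name!r}")
    return f"{match.group(1)}_{evaluation_date}_{kind}"
```

With the example file above, the date 20210727, and kind "utterances", this returns "1892153_20210727_utterances".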
2) Create new transcription files
Create picnic transcriptions (one literal transcript and another transcript separated into utterances) for the participants who completed the Sentence Production task (column N called “Sent_production”). Prioritize participants with the following dx: svPPA, lvPPA, nfPPA, ADtyp, and ADatyp-lang (see column “I” called “DX_synd” for dx).
The audio files of the picnic description are in the folder “B--SpeechFTLD_B> A--SpeechFTLD_B-audio-files”. The picnic files end with “_PICT.wav”
After creating the files and saving them in the appropriate folders, go to column BC called “UT_picnic_transcription” and write “yes” in the appropriate cell.
Filename Format for VISTA Samples
The transcription file should be saved in the following format: CODE###_BACC###_TaskName_Language_Timepoint_Date
For the Language, it should be Spa for Spanish, Cat for Catalan, or Eng for English. (Pending: revise existing Spanish transcriptions and change the naming from "Span" to "Spa".)
For the timepoint, it should be: Pre, Mid, Post, 6m, or 12m, followed by the number of the probe
You will find the date in the date of administration column in the Connected Speech Data Analysis Smartsheet. It is very important that you carefully follow the format of YYYYMMDD. https://app.smartsheet.com/sheets/q86RQPjgG33Q7c3PjxqG4P64c37JrJ3gjcfr5g71
NameofScriptORDiscourseTask
NameofScript: For VISTA, for example, the second script probe of the script "My Hobbies" during post-treatment for SE001 would be:
CODE###_BACC###_NameOfScript_Language_Timepoint_YYYYMMDD
For example: BISE018_BACC001_MyHobbies_Spa_Post_20240525
NameofDiscourseTask should be:
CODE###_BACC###_Taskname_Language_Timepoint_YYYYMMDD
BILP022_BACC002_PicnicScene_Cat_Pre_20230517.cha
BISE011_BACC004_CatRescue_Spa_Obs1_20230925.cha
!! Don't include spaces in the filename
For local participants (not Barcelona), follow this file naming format: CODE001_Taskname_Language_Timepoint_Date, where Language is Spa or Eng
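The naming rules above can be sketched as a small Python helper that assembles the file name and rejects the most common mistakes (spaces, a wrong language code, a date that is not YYYYMMDD). This is a hedged illustration, not a lab tool: the function name and its arguments are hypothetical, and the timepoint is not validated because the list of allowed values may grow.

```python
import re

LANGUAGES = {"Spa", "Cat", "Eng"}


def build_filename(code, task, language, timepoint, date, bacc=None):
    """Assemble CODE###_BACC###_TaskName_Language_Timepoint_YYYYMMDD.

    Leave `bacc` as None for local participants, whose file names
    have no BACC### part.
    """
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError("date must be written as YYYYMMDD")
    parts = [code, task, language, timepoint, date]
    if bacc is not None:
        parts.insert(1, bacc)
    for part in parts:
        # No part may be empty or contain spaces
        if not part or re.search(r"\s", part):
            raise ValueError(f"empty value or space in {part!r}")
    return "_".join(parts)
```

For example, build_filename("BISE018", "MyHobbies", "Spa", "Post", "20240525", bacc="BACC001") returns "BISE018_BACC001_MyHobbies_Spa_Post_20240525"; add the .cha extension when saving the file.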
Please let us know when this is complete, and update the Connected Speech Data Analysis Smartsheet.