4.1 Formatting transcription process (transcribers)

Connected speech formatting

If you are completing the transcription training process, skip steps 1-4. The folder you are given will already contain the samples and templates you will work from.

  1. Go to 101--Connected Speech_Data

  2. Choose the folder for the study in which the participant is enrolled (e.g., Clinical Trial, Observational)

  3. Choose Spanish, Catalan, or English, depending on the language of the sample you are formatting

  4. Choose clipped audio of tasks

  5. Choose the task that you are formatting. For example, if it's the WAB picnic description, choose S3_PicnicScene_Picture Description. 

  6. Go to the "Taskname_formatted_for_clan" folder. For example, for the WAB picnic description, choose "2. PicnicScene_formatted_for_clan".

  7. Once in that folder, you should see the template named according to the task. For example: CODE001_BACC001_PicnicScene_Spa_Timepoint_YYYYMMDD

  8. This template helps you name the file and contains the headers that you will need for the transcription.

  9. Copy and paste the Whisper transcription into this .cha file.
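If you prefer to locate the template from a script, steps 1-7 can be sketched with Python's pathlib. This is only a sketch: the intermediate folder names (especially "clipped audio of tasks") are assumptions taken from the wording of the steps, so check them against the shared drive before relying on it.

```python
from pathlib import Path

# Assumption: this literal folder name mirrors the wording of step 4;
# check it against the shared drive.
CLIPPED_AUDIO = "clipped audio of tasks"


def template_folder(root: Path, study: str, language: str,
                    task: str, formatted: str) -> Path:
    """Steps 1-6: root > study > language > clipped audio > task > formatted folder."""
    return root / study / language / CLIPPED_AUDIO / task / formatted


def find_templates(folder: Path) -> list[Path]:
    """Step 7: the .cha templates in the *_formatted_for_clan folder.

    Returns an empty list if the folder does not exist.
    """
    return sorted(folder.glob("*.cha"))
```

For example, `template_folder(Path("101--Connected Speech_Data"), "Clinical Trial", "Spanish", "S3_PicnicScene_Picture Description", "2. PicnicScene_formatted_for_clan")` builds the path to the picnic templates; `find_templates` then lists the .cha files there.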

File format for Connected Speech samples

  1. You will need to fill out some fields in the headers of the file (headers are the first few lines in the document starting with @)

  2. Preserve the format of the headers

    1. There should be one tab after the colon in each header, so if it gets deleted, put a single tab back in

    2. There are a specific number of "pipes" (this symbol: |) in the @ID header, so don't delete any

    3. Don't add spaces in any of the fields where there are none in the template

    4. Fill out the fields in the headers:

      1. @Language

        1. This should be included in the template according to the language you previously chose. If not, enter the language of the sample (in lowercase):

          1. spa (Spanish)

          2. cat (Catalan)

          3. eng (English)

      2. @Participant Code: Enter the participant's code here, e.g., BISE016. You will also need to enter the correct BACC### (not applicable to local participants; for a local participant, enter only the participant code). You can find this information in the Connected Speech Data Analysis Smartsheet.

      3. @ID (timepoint): Enter LRT, VISTA, or OBS according to the participant's code (LRT for BILP, VISTA for BISE). Also add the timepoint (e.g., pre, mid, post).

        BISD (Bilingual Semantic Dementia), BILP (Bilingual Logopenic), BISE (Bilingual Speech Entrainment)

      4. @Time Duration: Enter the start and end time of the sample.

        1. If you used a timer, enter 00:00:00 for the start.

        2. Preserve the time format as indicated in the template.

        3. For the start, note the timecode of the video at the onset of the first word the participant says after the clinician's prompt (excluding any words you are omitting from the beginning, as discussed above).

        4. For the end, note the timecode at the offset of the last word the participant says on the script topic or picture description.

        5. If the clinician redirects or re-prompts during the probe, exclude that time from the total duration.

    5. Name of script/sample

      1. After the @Comment header, type the name of the script topic or the title of the discourse sample, e.g., "PicnicScene" for the WAB picnic description:

        1. @Comment: PicnicScene

        2. @Comment: CatRescue

        3. @Comment: ImportantEvent

  3. The transcription starts after the @Comment header. Each utterance must begin with *PAR: followed by a single tab (the wider spacing) and then the text. Do not use a space between *PAR: and the text.
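The formatting rules above (one tab after each header colon, the same number of pipes in @ID as in the template, a tab after *PAR:) and the duration arithmetic can be checked with a small script. This is a sketch of the rules as written in this section, assuming timecodes in HH:MM:SS form as in 00:00:00; it is not a substitute for CLAN's own CHECK command.

```python
import re

# "@Name:" followed by exactly one tab and then the field value.
HEADER_RE = re.compile(r"^@[^:\t]+:\t\S")
# Utterance line: "*PAR:" followed by exactly one tab and then the text.
UTTERANCE_RE = re.compile(r"^\*PAR:\t\S")


def header_ok(line: str) -> bool:
    """True if a field header has a single tab (no spaces) after the colon."""
    return HEADER_RE.match(line) is not None


def utterance_ok(line: str) -> bool:
    """True if an utterance line starts with '*PAR:' plus one tab."""
    return UTTERANCE_RE.match(line) is not None


def id_pipes_ok(id_line: str, template_id_line: str) -> bool:
    """True if the @ID line keeps the same number of '|' as the template."""
    return id_line.count("|") == template_id_line.count("|")


def to_seconds(timecode: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_timecode(total: int) -> str:
    """Convert seconds back to 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def sample_duration(start: str, end: str, reprompts=()) -> int:
    """Seconds from first-word onset to last-word offset, minus any
    clinician re-prompt intervals given as (start, end) timecode pairs."""
    total = to_seconds(end) - to_seconds(start)
    for prompt_start, prompt_end in reprompts:
        total -= to_seconds(prompt_end) - to_seconds(prompt_start)
    return total
```

For example, a sample running from 00:00:05 to 00:02:05 with a re-prompt from 00:01:00 to 00:01:20 gives `to_timecode(sample_duration("00:00:05", "00:02:05", [("00:01:00", "00:01:20")]))`, which is "00:01:40".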

File format for evaluación de lenguaje básica (basic language assessment) A and B samples


20240101 Dr. Santos: Two tasks:
1) Rename existing files:
For example, in the folder “B--SpeechFTLD_B > B--Picnic-description-transcripts > B--Utterances”, the file “T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo” should be renamed to “1892153_20210727_utterances”.


In the folder “B--SpeechFTLD_B > B--Picnic-description-transcripts > B--Literal”, the file “T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo” should be renamed to “1892153_20210727_literal”.


You can find the date of the SpeechFTLD_B evaluation (in which the picnic scene task was administered) in column K, “SpeechFTLD_B”, of the Excel sheet “speechFTLD_B-taskcompletion-nps-demog-dx_20240129.xlsx”. Write this date in the file name in YYYYMMDD format. The Excel sheet is in the folder “B--SpeechFTLD_B > B--Picnic-description-transcripts > Z--Task-completion-Demog-Dx-NPS”.
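The renaming rule can be sketched as a small function. It assumes, based only on the single example above, that the participant ID is the third underscore-separated field of the old name. The `eval_dates` mapping (ID to evaluation date from column K) is hypothetical; load it however you prefer. The source examples show no extension, so the function returns only the new name stem.

```python
from datetime import date


def new_stem(old_name: str, eval_dates: dict[str, date], kind: str) -> str:
    """Return '<ID>_<YYYYMMDD>_<kind>' for an old picnic transcript name.

    Assumes the participant ID is the third '_'-separated field, as in
    'T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo'.
    `eval_dates` (hypothetical) maps ID -> SpeechFTLD_B evaluation date.
    """
    if kind not in ("utterances", "literal"):
        raise ValueError(f"kind must be 'utterances' or 'literal', not {kind!r}")
    participant_id = old_name.split("_")[2]
    evaluated = eval_dates[participant_id]
    # Dates are written as YYYYMMDD in the new name.
    return f"{participant_id}_{evaluated:%Y%m%d}_{kind}"
```

With the example above, `new_stem("T17A_DG1_1892153_tra_c-unit.TextGrid.rtf.chstr.longtr.flo", {"1892153": date(2021, 7, 27)}, "literal")` returns "1892153_20210727_literal".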


2) Create new transcription files:
Create picnic transcriptions (one literal transcript and one transcript segmented into utterances) for the participants who completed the Sentence Production task (column N, “Sent_production”). Prioritize participants with the following diagnoses: svPPA, lvPPA, nfPPA, ADtyp, and ADatyp-lang (see column I, “DX_synd”).


The audio files of the picnic description are in the folder “B--SpeechFTLD_B > A--SpeechFTLD_B-audio-files”. The picnic files end with “_PICT.wav”.
After creating the files and saving them in the appropriate folders, go to column BC, “UT_picnic_transcription”, and write “yes” in the appropriate cell.
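Choosing which participants to transcribe next can be sketched as a filter over the sheet's rows. This assumes the rows are available as dictionaries keyed by column name (for example from a CSV export read with `csv.DictReader`, or pandas `DataFrame.to_dict("records")`). How a completed Sentence Production task is recorded is not stated above, so `completed` is a labeled assumption.

```python
from pathlib import Path

# Diagnoses to prioritize (column I, "DX_synd"), as listed above.
PRIORITY_DX = {"svPPA", "lvPPA", "nfPPA", "ADtyp", "ADatyp-lang"}


def completed(value) -> bool:
    """Assumption: any non-empty cell other than "no" counts as completed."""
    return value is not None and str(value).strip().lower() not in ("", "no")


def transcription_queue(rows: list[dict]) -> list[dict]:
    """Rows that completed Sent_production and are not yet marked "yes" in
    UT_picnic_transcription, with priority diagnoses first."""
    todo = [
        row for row in rows
        if completed(row.get("Sent_production"))
        and str(row.get("UT_picnic_transcription", "")).strip().lower() != "yes"
    ]
    # False sorts before True, so priority diagnoses come first; sorted() is
    # stable, so the original sheet order is kept within each group.
    return sorted(todo, key=lambda row: row.get("DX_synd") not in PRIORITY_DX)


def picnic_audio(folder: Path) -> list[Path]:
    """Picnic description recordings end with "_PICT.wav"."""
    return sorted(folder.glob("*_PICT.wav"))
```

Ordering among the priority diagnoses is not specified above, so the sketch keeps them in sheet order rather than ranking them.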

Filename Format for VISTA Samples

  1. The transcription file should be saved in the following format: CODE###_BACC###_TaskName_Language_Timepoint_Date

    1. For the language, use Spa for Spanish, Cat for Catalan, or Eng for English. (Pending: revise the existing Spanish transcriptions and change "Span" to "Spa" in their filenames.)

    2. For the timepoint, use Pre, Mid, Post, 6m, or 12m, followed by the probe number

    3. Take the date from the date of administration column in the Connected Speech Data Analysis Smartsheet. It is very important that you follow the YYYYMMDD format exactly. https://app.smartsheet.com/sheets/q86RQPjgG33Q7c3PjxqG4P64c37JrJ3gjcfr5g71

    4. TaskName: the name of the script or of the discourse task

      1. Script name (VISTA): for example, the second probe of the script "My Hobbies" at post-treatment for SE001 would be:

        1. CODE###_BACC###_NameOfScript_Language_Timepoint_YYYYMMDD

        2. For example: BISE018_BACC001_MyHobbies_Spa_Post_20240525

      2. Discourse task name:

        1. CODE###_BACC###_Taskname_Language_Timepoint_YYYYMMDD

        2. For example: BILP022_BACC002_PicnicScene_Cat_Pre_20230517.cha

        3. For example: BISE011_BACC004_CatRescue_Spa_Obs1_20230925.cha

    5. !! Don't include spaces in the filename

    6. For local participants (not Barcelona), leave out the BACC### field and use this format: CODE001_Taskname_Language_Timepoint_Date, with Spa or Eng as the language.
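The naming rules above can be sketched as a builder plus a checker. The pattern is an assumption assembled from this section: participant codes are taken to be capital letters followed by three digits, "Obs" is accepted as a timepoint only because it appears in the examples, the probe number is optional, the BACC### field is optional for local participants, and the .cha extension is optional.

```python
import re
from datetime import date, datetime
from typing import Optional

# CODE###[_BACC###]_TaskName_Language_Timepoint_YYYYMMDD[.cha]
FILENAME_RE = re.compile(
    r"^(?P<code>[A-Z]+\d{3})"            # participant code, e.g. BISE018
    r"(?:_(?P<bacc>BACC\d{3}))?"         # left out for local participants
    r"_(?P<task>[A-Za-z0-9]+)"           # script or discourse task, no spaces
    r"_(?P<lang>Spa|Cat|Eng)"
    r"_(?P<timepoint>(?:Pre|Mid|Post|6m|12m|Obs)\d*)"  # optional probe number
    r"_(?P<date>\d{8})"                  # YYYYMMDD only
    r"(?:\.cha)?$"
)


def build_filename(code: str, task: str, lang: str, timepoint: str,
                   administered: date, bacc: Optional[str] = None) -> str:
    """Join the fields with underscores; spaces are removed from the task name."""
    fields = [code] + ([bacc] if bacc else [])
    fields += [task.replace(" ", ""), lang, timepoint, administered.strftime("%Y%m%d")]
    return "_".join(fields) + ".cha"


def filename_problems(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name looks right."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    match = FILENAME_RE.match(name)
    if match is None:
        problems.append("does not follow CODE###_BACC###_TaskName_Language_Timepoint_YYYYMMDD")
        return problems
    try:
        # Rejects impossible dates such as 20240231.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        problems.append(f"not a real date: {match['date']}")
    return problems
```

For example, `build_filename("BISE018", "My Hobbies", "Spa", "Post", date(2024, 5, 25), bacc="BACC001")` returns "BISE018_BACC001_MyHobbies_Spa_Post_20240525.cha". A six-digit date such as 230517 is flagged, because only the eight-digit YYYYMMDD form is accepted.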


Please let us know when this is complete, and update the Connected Speech Data Analysis Smartsheet.