VISTA Reliability - Blinding Procedure

Because our projects have multiple steps in which lab members must transcribe patients' speech, we have decided to use a system that takes advantage of work that has already been done. Checking the reliability of transcriptions normally requires comparing the transcriptions of two (or more) individuals and identifying any discrepancies between them. As we derive two independent pipelines for VISTA data (POM & Connected Speech), this would typically require 4 individuals transcribing patients' data (2 for POM and 2 for Connected Speech). However, we elected to use a reliability structure that requires only 3 individuals by using the transcription of one transcriber (whom we denote as Rater 2) for both POM and Connected Speech reliability. This minimizes the work required for assessing reliability.

For a diagram, please see here.

In order to use Rater 2’s transcriptions for both POM and Connected Speech, we need to ensure that we control for potential biases similarly across POM and Connected Speech. In other words, we need to be careful about the amount of information we give the Raters/transcribers for any given observation that they need to transcribe. Any information that must be withheld from the transcribers to prevent bias must be blinded. While the specific factors that are relevant for this process differ according to whether Reliability is for POM or Connected Speech (see Script Selection boxes below), here are some key factors:

Patient ID
Language (Spanish, Catalan)
Observation (pre1, pre2, post1, tx session, post2, 3mo, 6mo, 12mo)
Script (1, 2, 3, 4, 5, 6, 7, 8)
Script training status (trained, untrained)

Our goal is to harmonize the two VISTA Reliability procedures as much as possible. For blinding, there are some differences that need to be addressed. For POM, the SLP will always be Rater 1. As such, they will have access to the Patient ID, the language, the observation, the script and the script training status. However, for Connected Speech, both transcribers must be blinded to the observation time point of the audio. We have decided that the most straightforward way to unify these processes is to have VISTA POM Reliability use clipped audio samples (rather than session videos, as was done previously) that come directly from the current transcription pipeline, along with different transcripts to be used as the base for the transcribers' coding. For Transcriber 1, the Reliability Supervisor will provide the SLP’s POM transcript to use as a base. For Transcriber 2, the Reliability Supervisor will provide the Whisper transcript. This means that the clipping and Whisper teams will proceed as usual for the VISTA Connected Speech sessions, with the addition of the two treatment sessions needed for VISTA POM Reliability.

The Reliability Supervisor will achieve the above goals by blinding the materials needed for Reliability in the Box folders and linking to these blinded materials in the relevant SmartSheets. This means that both the audio and transcript files for the Raters to use will need to have a naming convention that hides the observation time point. VISTA POM and Connected Speech have different requirements for the number of transcriptions that must be checked and the number of possible observations that can be chosen. Read through the tabs in this table for a more in-depth explanation and overview of each procedure’s script selection and rationale.

Script selection for VISTA POM takes into account the language of the scripts and the observation time point. For VISTA POM, there are 6 total sessions. However, unlike with Connected Speech, Reliability for POM involves checking ALL scripts for each observation point. Also note that the Pre_1 and Post_1/Post_2 sessions are probed in one language per session (either Spanish OR Catalan/English, depending on the multilingual being probed), whereas the treatment sessions have scripts probed in both languages. The post-treatment session can be either Post_1 or Post_2.

Observation	Language
Pre_1	Spanish
Pre_1	Catalan/English
Tx_Phase1	Spa + Cat/Eng
Tx_Phase2	Spa + Cat/Eng
Post_1 OR Post_2	Spanish
Post_1 OR Post_2	Catalan/English

VISTA POM Reliability will use blinded SmartSheets as sources for materials. The Reliability Supervisor will populate the SmartSheet with the Box links to the blinded materials.

As shown above, VISTA POM Reliability requires that we choose two treatment sessions for Reliability. This means that we need to be careful in applying randomization to treatment session selection. There are 18 treatment sessions split across two phases (the first 9 treatment sessions in phase 1 and the last 9 in phase 2), so we can randomly select numbers from 1-18. We can use a Random Number Generator and generate numbers until we have one number that falls between 1-9 for Tx_Phase1 and one number that falls between 10-18 for Tx_Phase2.
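The selection of treatment sessions could be sketched in Python roughly as follows. This is only an illustration, not the lab's actual tool: the function name `pick_treatment_sessions` and the idea of seeding by Patient ID are assumptions. Instead of generating numbers from 1-18 until one lands in each phase, the sketch draws once from each phase's range directly, which gives each session within a phase the same chance of being picked.

```python
import random

# Session ranges taken from the description above.
PHASE1_SESSIONS = range(1, 10)   # treatment sessions 1-9
PHASE2_SESSIONS = range(10, 19)  # treatment sessions 10-18


def pick_treatment_sessions(rng=None):
    """Return one randomly chosen treatment session number per phase.

    `rng` is an optional random.Random instance (hypothetical parameter),
    so that a selection can be reproduced later if needed.
    """
    rng = rng or random.Random()
    return {
        "Tx_Phase1": rng.choice(PHASE1_SESSIONS),
        "Tx_Phase2": rng.choice(PHASE2_SESSIONS),
    }


if __name__ == "__main__":
    # Assumption: seeding with the Patient ID makes the draw repeatable.
    print(pick_treatment_sessions(random.Random("BISE004")))
```

Passing a seeded generator is optional; calling `pick_treatment_sessions()` with no argument gives a fresh random draw each time.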

Mid-Tx (between Tx_Phase1 and Tx_Phase2) will be a good time to do Reliability for the first two sessions (Pre_1 and Tx_Phase1).

Reliability for Connected Speech will use a different approach to randomization than VISTA POM. We have randomized which scripts will be selected by observation rather than by individual patient. For each observation, one trained script (1,2,3,5,6,7) and one untrained script (4,8) will be selected for each patient. For trained scripts, we rotate through so that all trained scripts are selected at least once. For untrained scripts, we alternate between script 4 and script 8. The Reliability Supervisor will need to blind different materials for each Transcriber. For both Transcribers, the relevant time point’s audio files must be blinded and linked. For Transcriber 1, the SLP’s POM transcript will need to be pasted into the SmartSheet. For Transcriber 2, the Whisper transcript will need to be blinded and linked in the SmartSheet.

		Observation
Script	Status	Pre_1	Pre_2	Post_1	Post_2	3moFU*	6moFU	12moFU
1	Trained
2	Trained
3	Trained
4	Untrained
5	Trained
6	Trained
7	Trained
8	Untrained

*Not all participants will have a 3-month follow-up.
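The rotation described for Connected Speech (one trained and one untrained script per observation) can be illustrated with a short Python sketch. The starting script and the order of rotation here are assumptions made for the example; the actual assignment is the one recorded in the table above, and the function name `rotate_scripts` is hypothetical.

```python
from itertools import cycle

# Observations and script groups as listed in the table above.
OBSERVATIONS = ["Pre_1", "Pre_2", "Post_1", "Post_2", "3moFU", "6moFU", "12moFU"]
TRAINED = [1, 2, 3, 5, 6, 7]
UNTRAINED = [4, 8]


def rotate_scripts(observations=OBSERVATIONS):
    """Pair each observation with one trained and one untrained script.

    Trained scripts are taken in rotation; untrained scripts alternate
    between 4 and 8. The starting point (script 1, script 4) is an
    assumption for illustration only.
    """
    trained = cycle(TRAINED)
    untrained = cycle(UNTRAINED)
    return {obs: (next(trained), next(untrained)) for obs in observations}


if __name__ == "__main__":
    for obs, (trained_script, untrained_script) in rotate_scripts().items():
        print(obs, trained_script, untrained_script)
```

With six or more observations, every trained script is selected at least once, which still holds for participants without a 3-month follow-up.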

For the naming convention, we simply remove the observation and the date from the clipped Data Raw files and replace them with the code below for the corresponding observation. Example: BISE004_Literatura_AU

Codename	Meaning
AU	Pre1
AD	Pre2
PU	Post1
PD	Post2
TE	3mo
SE	6mo
DD	12mo
LC	trained
BS	untrained
NN	treatment 1
WW	treatment 2
RR	treatment 3
UO	treatment 4
VI	treatment 5
II	treatment 6
VV	treatment 7
GG	treatment 8
NI	treatment 9
EE	treatment 10
VE	treatment 11
LE	treatment 12
TI	treatment 13
TU	treatment 14
TF	treatment 15
TX	treatment 16
TN	treatment 17
TH	treatment 18
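As a rough illustration of the naming convention, the code table above can be turned into a lookup in Python. The function names `blinded_name` and `unblind` are hypothetical, and the `PatientID_Script_Code` layout is inferred from the single example BISE004_Literatura_AU; the exact format of the raw file names is not specified here, so the sketch takes the pieces as separate arguments. How the trained/untrained codes (LC, BS) attach to a file name is not stated, so they are left out.

```python
# Observation codes copied from the table above.
CODES = {
    "Pre1": "AU", "Pre2": "AD", "Post1": "PU", "Post2": "PD",
    "3mo": "TE", "6mo": "SE", "12mo": "DD",
}

# Codes for treatment sessions 1-18, in order.
TREATMENT_CODES = [
    "NN", "WW", "RR", "UO", "VI", "II", "VV", "GG", "NI",
    "EE", "VE", "LE", "TI", "TU", "TF", "TX", "TN", "TH",
]
CODES.update(
    {f"treatment {n}": code for n, code in enumerate(TREATMENT_CODES, start=1)}
)


def blinded_name(patient_id, script_name, observation):
    """Build a blinded file stem, assuming the PatientID_Script_Code layout."""
    return f"{patient_id}_{script_name}_{CODES[observation]}"


def unblind(code):
    """Reverse lookup for the Reliability Supervisor: code -> observation."""
    reverse = {value: key for key, value in CODES.items()}
    return reverse[code]


if __name__ == "__main__":
    print(blinded_name("BISE004", "Literatura", "Pre1"))
    print(unblind("EE"))
```

An unknown observation label raises a `KeyError`, which is probably preferable to silently producing an unblinded or mislabeled file name.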

Reliability Meeting Notes

Page-specific notes:

July 15, 2025:

-Updated the page to use advanced tabs/tables.

March 7, 2025:
-All the scripts probed in the session will be checked, and the source of the sample changes from the video of the session to the individual audios of the scripts.
-We can randomly select the two treatment sessions for each patient.
-Pre1 and Post sessions will be probed in one language per session. The Tx sessions will include probes in both languages during each session.
-When we do this by language, only the Pre + Post observations/sessions will be language-specific because the Tx sessions include both languages.
-Mid Tx will be a good time to do Reliability for the first two observations.