Connected Speech SOP

Connected speech refers to natural, continuous spoken language produced in conversation or narrative, rather than isolated words or sentences. It provides rich acoustic and linguistic information on lexical retrieval, syntax, fluency, and speech rate, making it a sensitive measure of language impairment in PPA.

The administration and analysis of connected speech samples supports Aim 2 of the R01: Identify bilingualism factors associated with differential patterns of language impairment in Hispanics with PPA using metrics derived from connected speech.

Overview of the processes and this guide

Connected speech work involves several distinct processes. Our main communication takes place in the R01_CS_Data_Processing | Multilingual Aphasia and Dementia Research Lab | Microsoft Teams channel and on the wiki: https://cloud.wikis.utexas.edu/wiki/spaces/MADRWiki/pages/482182128

Process and guide

Current leads

Responsibilities of the lead

 

https://cloud.wikis.utexas.edu/wiki/x/7oBHI

Connected Speech supervisor

 

  • Oversees data accuracy and completeness across all Connected Speech and VISTA–Connected Speech SmartSheets and REDCap entries.

  • Supports coordination of data tracking, ensures missing data are identified and added, and assists with supervising data processes related to connected speech.

https://cloud.wikis.utexas.edu/wiki/x/7oBHI

Data processing lead: Jada Li

  • Tasks assigned by the Connected Speech supervisor

Administration, Recording and Saving Audios

https://cloud.wikis.utexas.edu/wiki/spaces/MADRWiki/pages/342033746

https://cloud.wikis.utexas.edu/wiki/spaces/MADRWiki/pages/134283924

Speech and Language Pathologists

DFT Sant Pau (Depanem): Jesús

  • Follows SOPs to ensure accurate administration, recording, and data-saving procedures

Clipping+Whisper team (Arely, Aaliyah, Carmen, Jimena C, Jimena P, Jada)

3.1 Clipping and Whisper Student Team | Multilingual Aphasia and Dementia Research Lab | Microsoft Teams

Meeting: Biweekly meeting with the student leads; any other student on the team who is in the lab is welcome to join. Thursdays at 3:30 pm: https://cloud.wikis.utexas.edu/wiki/x/8IO9H

https://cloud.wikis.utexas.edu/wiki/spaces/MADRWiki/pages/56197971

Arely Aguilar

  • Train new students on clipping

  • Supervise clipping status using the report

  • Supervise reclipping status using the reclipping reports

  • Make any changes to the Connected Speech SmartSheets or reports if indicated by the supervisor

https://cloud.wikis.utexas.edu/wiki/spaces/MADRWiki/pages/56197201

Aaliyah M Segura

  • Train new students on Whisper

  • Once a week, supervise pending samples to run (Picnic Scene or Important Event)

  • Run samples or post them in the Whisper & Clipping channel for the next student starting their shift

  • Ensure all samples requested via the channel are being run

  • Supervise Whisper status using the reports

  • Keep running Connected Speech samples in the priority order set on the “3. Whisper Transcription Process (Research Assistants)” page. Deadline goal: run all samples before December 5, 2025

Transcription team (Whendy, Helena, Jaume)

https://cloud.wikis.utexas.edu/wiki/spaces/MADRWiki/pages/56197560

 

4.2 Utterance segmentation process (transcribers)

 

6. Coding process (transcribers) (archived)

 

Data processing team

5. Language analysis process/overview

Project lead

 

6. Acoustic Derivations Guide

Project lead

 

7. Connected Speech/Transcription Reliability

Project lead

 

Detailed methodology

[Figure: detailed methodology diagram (image-20251031-170526.png). Source: AoA_2025_Grasso_Santos]

 

Training videos

Specific training videos for these processes: CS_Connected SpeechR01Protocol_Training Videos

Data structure overview

The connected speech folder structure, which explains how the files are saved in Box:

 

Pre-R01 Connected Speech samples

Before April 2024, all Connected Speech samples were recorded via Zoom. These samples are part of Connected Speech Data Raw, but the videos/audios are kept in a separate folder because they were recorded with a different procedure and in a different format.

The Pre-R01 tasks include:

  1. Per the October 1, 2025 meeting between Dr. Grasso and Sonia: for Pre-R01 samples, the Connected Speech SmartSheets currently list the first visit date of each timepoint as the administration date of the Connected Speech samples. We plan to record the exact administration date for every timepoint to have more accurate data, but for now we will use the dates extracted from the MADR participant SmartSheet for all timepoints except POST, since the MADR participant SmartSheet has no post timepoint. Jada (student RA) is currently adding the exact post-tx dates to the Connected Speech SmartSheets; once those are added for both Spanish and Catalan, we plan to start replacing the remaining approximate dates with exact ones.

Schematic of where the pre-R01 samples are saved:

 

Overview of Processes/Procedures

Can the following elements be included in the app? (Y/N/M = yes/no/maybe)

The general overview of processing procedures is as follows:

  1. Sample is recorded. YES

  2. Sample is saved in a folder on Box. NO, and maybe not necessary.

  3. Sample is clipped to ensure no clinician speech or background noise is included. MAYBE. This could probably happen OUTSIDE of the app, with an option to re-upload when finished so that the following steps happen automatically. Could diarization take care of this? Kesha, has it gotten any better? If so, clipping could happen in the app, with a choice of whether to apply diarization so the result is patient-only speech (see the diarization sketch after this list).

    1. Cody says there should be a way to clip within the app so that users can stay in the app.

      1. We still want an upload option for audios from older samples.

  4. a. Audio sample is run through the following steps:

    1. A script that cuts pauses from the start and end of samples (Kesha confirmed this is correct and is already in the Acoustic Pipeline). MAYBE. We would definitely like this included. Kesha, is silence currently detected via a specific dB level? The script takes in an audio sample and, for each audio segment, determines whether it is silent or not. Kesha will look into it. (A sketch of this step follows the list.)

    2. Audio is run through the Acoustic Pipeline (Kesha’s script). MAYBE. We would definitely like this included.

    3. https://www.youtube.com/watch?v=YxZ8cLGWDaE
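Regarding the diarization question in step 3: below is a minimal sketch of how patient-only clipping could work, using the open-source pyannote.audio diarization pipeline plus pydub. This is an illustration under stated assumptions, not the lab's implementation: the file names, the Hugging Face token placeholder, and the heuristic that the longest-talking speaker is the participant are all hypothetical.

```python
# Hypothetical sketch: keep participant-only speech via speaker diarization.
# Assumptions (not lab policy): file names, HF token, and the heuristic that
# the speaker with the most talk time is the participant.
from pyannote.audio import Pipeline
from pydub import AudioSegment

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # replace with a real Hugging Face token
)

diarization = pipeline("raw_sample.wav")

# Tally total speaking time per diarized speaker label.
totals = {}
for turn, _, speaker in diarization.itertracks(yield_label=True):
    totals[speaker] = totals.get(speaker, 0.0) + (turn.end - turn.start)

# Heuristic: the participant talks most; clinician prompts are brief.
participant = max(totals, key=totals.get)

# Concatenate only the participant's turns into a "clipped" file.
audio = AudioSegment.from_wav("raw_sample.wav")
clipped = AudioSegment.empty()
for turn, _, speaker in diarization.itertracks(yield_label=True):
    if speaker == participant:
        clipped += audio[int(turn.start * 1000):int(turn.end * 1000)]

clipped.export("clipped_sample.wav", format="wav")
```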
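For step 4.a.1, a hedged sketch of dB-threshold edge-silence trimming with pydub. The actual Acoustic Pipeline script may work differently; the -40 dBFS threshold, the default chunk size, and the file names are assumptions.

```python
# Sketch of trimming leading/trailing silence by a dB threshold.
# The -40 dBFS threshold is an assumption; the Acoustic Pipeline's
# actual detection logic may differ.
from pydub import AudioSegment
from pydub.silence import detect_leading_silence

def trim_edge_silence(in_path, out_path, threshold_dbfs=-40.0):
    audio = AudioSegment.from_file(in_path)
    # Milliseconds of silence at the start of the file
    lead_ms = detect_leading_silence(audio, silence_threshold=threshold_dbfs)
    # Reverse the audio to measure trailing silence the same way
    trail_ms = detect_leading_silence(audio.reverse(), silence_threshold=threshold_dbfs)
    trimmed = audio[lead_ms:len(audio) - trail_ms]
    trimmed.export(out_path, format="wav")
    return trimmed

trim_edge_silence("clipped_sample.wav", "clipped_sample_trimmed.wav")
```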

4. b. Audio sample goes through the following steps:

a. Processed through Whisper. MAYBE. Currently run on TACC (need to see if Kesha has a more verbatim model for Whisper yet). We would like this included, but it seems like the step that could get pricey; it would need to be available only to certain users (true of the steps above and below as well). A sketch of this step follows below.

b. Transcriber finalizes the Whisper transcript for CLAN (corrects and formats it). MAYBE?? This could probably happen OUTSIDE of the app, with an option to re-upload when finished so that the following steps happen automatically. (How hard do we think it would be to automate?)

c. Transcript is formatted for CLAN so we can derive specific features from CLAN’s system (a minimal CHAT example follows below).

d. Transcript is then stripped using a specific script to ensure that elements that can negatively influence the linguistic pipeline aren’t included. MAYBE. We would definitely like this included. A sketch of this step also follows below.

e. Linguistic features are extracted by the linguistic pipeline (Kesha’s script). MAYBE. We would definitely like this included.
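For step a, a minimal sketch using the open-source openai-whisper package; the lab's TACC setup may differ, and the model size, language code, and file name here are assumptions.

```python
# Sketch of the Whisper transcription step (assumed model/options).
import whisper

model = whisper.load_model("large-v3")  # smaller models ("base", "medium") are cheaper

result = model.transcribe(
    "clipped_sample_trimmed.wav",
    language="es",          # set per participant (e.g., "es" Spanish, "ca" Catalan)
    word_timestamps=True,   # timestamps help downstream pause/fluency checks
)

print(result["text"])
for seg in result["segments"]:
    print(f'{seg["start"]:.2f}-{seg["end"]:.2f}  {seg["text"]}')
```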
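For step c, a minimal illustration of the CHAT format that CLAN expects: headers on @-lines, participant speech on *-tiers, with a tab character after each colon. The tier code, @ID fields, and sentences are placeholders; the lab's CHAT conventions may differ.

```
@Begin
@Languages:	spa
@Participants:	PAR Participant
@ID:	spa|corpus|PAR|||||Participant|||
*PAR:	la niña está corriendo en el parque .
*PAR:	y el perro la sigue .
@End
```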
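And for step d, a hypothetical sketch of the stripping script. The SOP does not specify which elements are removed, so the CHAT markers stripped below (bracketed codes, fillers, shortening parentheses) are illustrative assumptions.

```python
# Hypothetical stripping step: remove CHAT annotation that could skew the
# linguistic pipeline. Which markers the lab actually strips is an assumption.
import re

def strip_chat_markers(text: str) -> str:
    text = re.sub(r"\[[^\]]*\]", "", text)    # bracketed codes, e.g. [//], [: word]
    text = re.sub(r"&[-=+]?\S+", "", text)    # fillers/events, e.g. &-um, &=laughs
    text = re.sub(r"\+\S+", "", text)         # linkers/terminators, e.g. +..., +//.
    text = re.sub(r"\((\w*)\)", r"\1", text)  # unfold shortenings: (be)cause -> because
    return re.sub(r"\s+", " ", text).strip()

stripped_lines = []
with open("transcript.cha", encoding="utf-8") as f:
    for line in f:
        if line.startswith("*PAR:"):          # keep participant tiers only
            body = line.split("\t", 1)[-1]    # text after the tier code
            stripped_lines.append(strip_chat_markers(body))

print("\n".join(stripped_lines))
```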

MUCH LATER: Graphs showing how the person did on each feature relative to controls (or others with the same diagnosis) on the same task.