9. Acoustic Derivations Guide


Section 1. Audio Data Quality Test

This protocol describes how to check audio quality prior to acoustic analysis, especially when there has been a deviation from the recording protocol; it also serves as a general sanity check.

image-20250911-200954.png
Acoustic Quality Check overview (last updated 20250911)

Overview: Connected Speech Detailed overview and Box folder structure

The relevant Zoom meeting recording with Dr. Fernando Llanos can be found on Box here: Zoom Recording Re Checking Audio Sample Quality with Dr. Fernando Llanos_20250218.mp4.

We want to check a subset of samples.

1. Access data for quality check (Data Quality Checks)

  1. Determine which audios need to be tested using:

    1. Smartsheets of Connected Speech Data Management

    2. SNR data in REDCap: Connected Speech Data Control_SNR

  2. Create a folder within the Data Quality Checks folder, naming it “Data Quality Check_YYYYMMDD” (the date on which the audios/videos were copied over).

  3. Copy the audios from the Connected Speech_Data folder to the newly created Data Quality Check_YYYYMMDD folder (copy, do not move!)

2. Check data quality via PRAAT

The following procedures describe how to check the quality of audio recordings along three dimensions (SNR, number of syllables, and silence/sounding evaluation). They serve two purposes: as a check on problematic or concerning files, and as a routine sanity check on a subset of samples from a specific project.

1. Signal-to-Noise Ratio (SNR)

  1. Open an audio file and visually inspect the waveform, especially the non-speech (silence) portions. Pay attention to two things: (1) the amplitude of the non-speech signal (see two acceptable examples in the screenshot below); (2) the spectrogram, which should look clean so that the formants are clearly visible. High-frequency noise (e.g., around 3 kHz) appears black in the spectrogram and can therefore affect formant measurements, whereas most ambient noise is “white” (see example below).

image-20250218-215746.png
Compare two non-speech signals: left is not ideal but still okay; right is okay.
image-20250218-222052.png
image-20250218-215028.png
There is some “black” noise in the spectrogram (around formants 2 and 3).
  2. Manually calculate the signal-to-noise ratio (SNR) by comparing a speech segment with a non-speech (silence) segment. (The SNR originally calculated by the clinician is here: Connected Speech Data Control_SNR.)

    1. Select one speech segment and extract its root-mean-square (RMS; a measure of loudness) value (RMS1);

    2. Select one non-speech segment of similar length and extract its RMS value (RMS2);

    3. Divide RMS1 by RMS2 to get the SNR expressed as a voltage ratio, SNRvoltage (see more in this video: Signal-to-Noise Ratio);

    4. Multiplying SNRvoltage by 100 expresses it as a percentage indicating how much of the speech signal is lost; based on this percentage, we can determine a threshold to evaluate other audio files (for the sample above, despite the noise, the SNR is still good and reliable for acoustic analysis).

    5. Additionally, the SNR can be expressed in dB using the formula SNRdB = 20 × log10(SNRvoltage). We consider a range of 20-40 dB acceptable.

When possible, use the sounding and silent periods marked in the text grids to make sure you are using appropriate time points for the SNR calculation. For an example, see below; a code sketch of the calculation follows the figure.

image-20250408-193057.png
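
For reference, here is a minimal Python sketch of the manual SNR calculation described above. This is not one of the pipeline scripts; the file name and segment boundaries are hypothetical placeholders you would replace with time points picked from your own inspection in PRAAT (ideally the sounding/silent periods from the text grid).

# Minimal sketch of the manual SNR calculation; the file name and the
# segment boundaries below are hypothetical placeholders.
import numpy as np
import soundfile as sf

def rms(segment):
    # Root-mean-square amplitude (a measure of loudness).
    return np.sqrt(np.mean(np.square(segment)))

audio, sr = sf.read("sample.wav")  # assumes a mono .wav file

speech = audio[int(2.0 * sr):int(3.0 * sr)]     # speech segment -> RMS1
silence = audio[int(10.0 * sr):int(11.0 * sr)]  # non-speech segment of similar length -> RMS2

snr_voltage = rms(speech) / rms(silence)  # SNR as a voltage ratio
snr_db = 20 * np.log10(snr_voltage)       # SNRdB = 20 x log10(SNRvoltage)

print(f"SNR (voltage ratio): {snr_voltage:.1f}")
print(f"SNR: {snr_db:.1f} dB")  # 20-40 dB is the range we consider acceptable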

2. The number of syllables

  1. Open the audio file of concern in PRAAT, select a 30-second speech segment, listen to it, and manually count the number of syllables.

  2. Estimate the number of syllables across the whole duration of the file and compare that with the value extracted from the script.

  3. An error rate of around 6% is acceptable, but anything higher than 10% is concerning (a quick way to compute the error rate is sketched below).
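
As a quick illustration of the error-rate calculation (the counts below are made up):

manual_count = 142  # syllables counted by hand, extrapolated to the whole file
script_count = 150  # syllables reported by the derivation script

error_rate = abs(script_count - manual_count) / manual_count * 100
print(f"error rate: {error_rate:.1f}%")  # ~6% acceptable; >10% concerning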

3. Silence and sounding

  1. Open an audio file and its associated text grid in PRAAT.

  2. Visually review (i.e., skim through the two bottom tracks in the picture above) whether the “silent” and “sounding” labels in the textgrid accurately match the audio segments, and note any discrepancies. This serves as a sanity check for background noise and buzzing issues (e.g., some non-speech signal might be labeled as sounding). A way to list the labeled intervals programmatically is sketched below.
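
If you prefer to list the labeled intervals programmatically rather than skim them in PRAAT, here is a minimal sketch using the textgrid Python package (the same one installed in Section 2, step 8). The file name is hypothetical, and your tier names may differ.

import textgrid  # pip install textgrid

tg = textgrid.TextGrid.fromFile("sample.TextGrid")  # hypothetical file name
for tier in tg:
    if not isinstance(tier, textgrid.IntervalTier):
        continue  # skip any point tiers
    for interval in tier:
        # Print each labeled span so "silent"/"sounding" intervals
        # can be eyeballed against the audio.
        print(f"{tier.name}: {interval.minTime:.2f}-{interval.maxTime:.2f} s  '{interval.mark}'")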

Some ideas for future projects:

  • Kesha may help create a script that calculates SNR across files and checks it against the threshold for decision-making on audio quality.

Below are some decisions made (documented by Sonia) during our meeting with Dr. Llanos.

  1. If the SNR is between 20 dB and 40 dB, we consider it acceptable and use the audio for acoustic analysis. Estimate the SNR and calculate the % loss for all files; for the first 20% of each file, estimate the SNR and measure by hand any of the metrics that we are using.
  2. For samples recorded in person with a sound card vs. samples recorded over Zoom, which acoustic measures can we derive reliably? Prosodic elements (pauses, duration of pauses, articulation rate): yes; anything related to amplitude or spectral properties: no.
  3. Make changes to the recording protocol if needed, and note the change in the running list of changes.
  4. Spectral properties and amplitude: differences related to the Audacity recording level (“nivel de grabación”).
@Sonia Marqués check when CS samples started to be recorded with the recording level (“nivel de grabación”) at 50% volume and take note in the running list of changes. Feb 19, 2025
@Sonia Marqués discuss in the SLP meeting whether they have been adjusting the knob a bit to the left (https://cloud.wikis.utexas.edu/wiki/x/yoFZAw) and find the email with Rosemary. SMK: I don’t remember that this was decided.

 

Section 2. Acoustic Derivations Pipeline Guide

 

1. Access TACC Account

Access the shared credentials to enter the TACC Analysis Portal:

Go to https://stache.utexas.edu/

Log in with your UT credentials

Click on “secret”

 

 

1.1

Access TACC Analysis Portal https://tap.tacc.utexas.edu/jobs/

 

Enter Dr. Grasso’s account information generated by Stache and log in.

 

The token changes every 30 seconds, so please type it promptly.

 

1.2

Submit a job on TACC by clicking on the dropdowns and selecting:

Lonestar 6

Jupyter Notebook

DBS23006

vm-small

Nodes 1; Tasks 1

Job name (can be anything; we will use AcousticDer_Practice in this tutorial guide)

Time limit (1 hour should be more than enough. However, if it’s your first time following this tutorial, you may want to give yourself more time to get through it – 2 hours should be fine)

Click on “submit”

*Note. If you're struggling to get nodes on the vm-small queue, I'd recommend trying the development queue. This applies to everything except transcription (where you should try gpu-a100-small, followed by gpu-a100-dev, followed by gpu-a100).

*Note. If you aren’t able to submit a job (see the error on the right as an example), follow the steps in this recording (Video Conferencing, Web Conferencing, Webinars, Screen Sharing) to clear the cache on TACC.

image-20240603-192512.png

image-20250220-230337.png

2. Enter Jupyter

If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps:

 

Click on “connect”

Click on “work”

Click on “acousticScripts”

 

If there are no available nodes (picture B), you will have to wait in a queue until one becomes available.

 

 

 

image-20240603-192617.png

 

image-20240603-192812.png

3. Set up folder

Click on “new”

Click on “folder”

Name the folder (in this guide, the name ACtrial will be used)

File and folder names cannot contain spaces or brackets

 

 

image-20240603-192948.png

 

4. Upload input files

Go into the folder you just created

Click on “upload” and upload the input files

If you cannot see the file you uploaded, click on “last modified” a couple of times. Sometimes it doesn’t update immediately.

 

 

5. Go to the terminal and navigate to acoustic script folder

Once the file has been uploaded, click on the New menu dropdown

Click on terminal

Once in the terminal, type cdw, press enter

Type cd acousticScripts, press enter

 

 

image-20240603-193051.png

 

6. Trim pauses at the start and end of audio files

(This script works for any type of audio file! It will output a mono-channel .wav file without pauses at the start and end.)

If your files are mono-channel .wav files that are already trimmed, skip to step 7. Otherwise, execute this step even if they are already in .wav format, as they may not be mono-channel.

 

Type conda activate racs, press enter

Type or copy the following command, then press enter:

python monoTrimAudioFiles.py ACinput/

Keep in mind that the red and green sections change depending on the name you gave your folder (with input files) and the name you will give your folder (with output files). The commands in purple always remain the same (See picture A).

Wait a few seconds (or minutes). You will know it’s done running when you see this at the bottom of the terminal (see picture B).

Then, type conda deactivate

Press enter (see picture C)

 

image-20240603-193200.png

The parts circled in red change depending on the name of your folder and the output folder.

ACinput/ is the name I gave my folder; your command will change depending on the name of your folder. Remember that the spelling must match exactly: if the name of the folder you uploaded is all lowercase, the command in the terminal must be all lowercase.

The output folder’s name is the input folder’s name followed by ‘_monoTrimmed’. For example, for the input folder ACinput/, the output folder will be named ACinput_monoTrimmed/. A sketch of what this step does is below.
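
For context, here is a hypothetical sketch of the kind of conversion and trimming this step performs; the actual monoTrimAudioFiles.py may use different tools and thresholds, so treat this as an illustration only.

import librosa
import soundfile as sf

# Load any audio format as a mono signal at its native sampling rate.
y, sr = librosa.load("ACinput/sample.wav", sr=None, mono=True)

# Trim leading and trailing silence; the 30 dB threshold is an assumption.
y_trimmed, _ = librosa.effects.trim(y, top_db=30)

# Write the result as a mono .wav into the _monoTrimmed output folder.
sf.write("ACinput_monoTrimmed/sample.wav", y_trimmed, sr)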

image-20240603-193228.png

 

 

7. Derive first acoustic parameters

Then, type conda activate textGridPauseSyllable

Then, enter the command below, press enter:

python derivePraatFeatsAndTextGrids_CAC.py ACinput_monoTrimmed/ ACderiveOut1 ACTextGrids/

Again, the sections in purple will remain the same while the parts in red, orange, and blue may change.

- In Example A, the folder name in red needs to match the output folder produced by step 6.

- The orange name will be added to the front of the output file (see the picture to the right).

- The name in blue can be whatever you decide. This will create a new folder for the textgrids to go.

The orange and blue naming process will be repeated in step 8, and the names need to match in both steps.

 

This generalized command does NOT need to be typed. It is shown to illustrate the role of each name in the command:

python derivePraatFeatsAndTextGrids_CAC.py inputDirectory/ outputName textgriddirectory/

 

If your files were originally .wav files and you didn’t need to do step 6, then the section in red will be the name you gave the folder where your input files are located.

If you encounter an error in this step, it is possible that a “hidden” file is causing it. The error will look like:

Sound not read from sound file “/work/09424/smgrasso1/ls6/acousticScripts/TRIMMED_FOLDER/.ipynb_checkpoints”.

To fix this, you need to delete this hidden file by typing in:

rm -rf TRIMMED_FOLDER/.ipynb_checkpoints

Then hit Enter.

If another issue arises in which you receive the error message “there is no module named ‘parselmouth’”, please type in:

pip install praat-parselmouth


Hit Enter.

Then, repeat the normal derivePraat… command of this step.
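
For context, praat-parselmouth is the Python interface to Praat that the script relies on. Below is a minimal sketch of the kind of measurement it enables; the features and parameters used in derivePraatFeatsAndTextGrids_CAC.py itself may differ, and the file name is hypothetical.

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("ACinput_monoTrimmed/sample.wav")  # hypothetical file
pitch = snd.to_pitch()          # Praat pitch analysis
intensity = snd.to_intensity()  # Praat intensity analysis

print("duration (s):", snd.get_total_duration())
print("mean pitch (Hz):", call(pitch, "Get mean", 0, 0, "Hertz"))
print("mean intensity (dB):", call(intensity, "Get mean", 0, 0, "energy"))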

 

image-20240603-193345.png

 

Note: the textgrid directory name (ACTextGrids in this guide) can change (green section only). This command creates a new directory; whatever you type, make sure it matches the final command (the long one at the end of step 7).

 

You should see this when the command is done running:

image-20240603-193443.png

Example of orange name for the PraatFeatsFromCitedWork.csv output. In this case, the command used had “ACderiveOut1” as the orange name.

image-20250312-222212.png

 

 

8. Derive second acoustic parameters

Now, type the following command, then press enter when done:

python derivePauseDurationStatsFromTextGrid_CAC.py ACTextGrids/
ACderiveout2

This command must use the exact name of the blue folder from step 7. The orange name will be used to name the output file from this step (see the example to the right).

Generalized command (does not need to be typed): python derivePauseDurationStatsFromTextGrid_CAC.py textgriddirectory/ outputName

Remember that the name in blue will change depending on what you typed in previous steps; it needs to match what you typed previously.

If another issue arises in which you receive the error message “there is no module named ‘textgrid’”, please type in:

pip install textgrid


Hit Enter.

Then, repeat the normal derivePause… command of this step.
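
For context, a hypothetical sketch of the kind of statistics this step derives from the text grids; the tier and label names are assumptions, and the actual script may compute these differently.

import statistics
import textgrid  # pip install textgrid

tg = textgrid.TextGrid.fromFile("ACTextGrids/sample.TextGrid")  # hypothetical file
pauses = [iv.maxTime - iv.minTime
          for tier in tg if isinstance(tier, textgrid.IntervalTier)
          for iv in tier if iv.mark == "silent"]  # "silent" label is an assumption

print("number of pauses:", len(pauses))
if len(pauses) > 1:
    print("mean pause duration (s):", statistics.mean(pauses))
    print("pause duration variability (SD, s):", statistics.stdev(pauses))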

 

image-20240603-193615.png

You will see this when it is done running:

image-20240603-193552.png

 

 

 

Example of orange name for PauseSyllableDurationStats.csv output. In this case, the command used had “ACderiveOut2” as the orange name.

image-20250312-223439.png

9. Download output files

Go back to the working directory. If the generated files have not appeared, click on “last modified”; sometimes it takes a minute to update.

You will see 2 output files. Check their boxes and click on download. You may need to download them one by one.

image-20240603-193649.png

 

10. Log out and end job

Go back to the terminal and log out by typing the command logout (see picture A)

Press enter

Go back to the original TACC page and click on “end job” (see picture B).

 

IT IS VERY IMPORTANT TO END THE JOB AS THE NODES ARE VERY LIMITED. THE JOB WILL KEEP RUNNING UNLESS YOU COMPLETE THIS STEP.

 

image-20240603-193720.png

 

image-20240603-193749.png

 

11. Merge the files

Once the files have been downloaded, merge the columns to create a single file.

In file A, you can see 7 features: # of syllables, speech rate, average syllable duration, articulation rate, speech-to-pause ratio, time, and # of pauses (picture A).

In file B, you can see two features: mean pause duration and variability of pause duration (picture B).

Manually copy and paste these two features into file A to form one spreadsheet with 9 features (see picture), or merge them programmatically (see the sketch below).

 

This is the final file that contains the acoustic derivations of the samples.
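
If you would rather merge the two files programmatically than copy-paste by hand, here is a sketch using pandas. The file names follow the orange-name convention from steps 7 and 8, and the shared “filename” column is an assumption, so check the actual CSV headers first.

import pandas as pd

feats = pd.read_csv("ACderiveOut1PraatFeatsFromCitedWork.csv")      # file A (7 features)
pauses = pd.read_csv("ACderiveOut2PauseSyllableDurationStats.csv")  # file B (2 features)

# Join the two tables on the shared sample-identifier column.
merged = feats.merge(pauses, on="filename")  # assumed key column; verify the header
merged.to_csv("acoustic_derivations_merged.csv", index=False)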



If you need to check the sounding/silent periods in the audio files, please save the .tgt text grid files from the blue folder (textgrid directory) that you used above in Steps 7 and 8.

 

image-20240603-194127.png

 

image-20240603-194141.png

 

image-20240603-194201.png