Acoustic Derivations Guide
Contents
- Section 1. Audio Data Quality Test
- Section 2. Acoustic Derivations Pipeline Guide
  - 1. Access TACC Account
  - 2. Enter Jupyter
  - 3. Set up folder
  - 4. Upload input files
  - 5. Go to the terminal and navigate to the acoustic script folder
  - 6. Trim pauses at the start and end of audio files
  - 7. Derive first acoustic parameters
  - 8. Derive second acoustic parameters
  - 9. Download output files
  - 10. Log out and end job
  - 11. Merge the files
Section 1. Audio Data Quality Test
This protocol describes how to check audio quality prior to acoustic analysis, especially when there is a deviation from the recording protocol; it also serves as a general sanity check.
Overview: Connected Speech Detailed overview and Box folder structure
The relevant Zoom meeting recording with Dr. Fernando Llanos can be found here:
Zoom Recording Re Checking Audio Sample Quality with Dr. Fernando Llanos_20250218.mp4
We want to check a subset of samples.
1. Access data for quality check (Data Quality Checks)
- Determine which audio files need to be tested using the SNR data in REDCap: Connected Speech Data Control_SNR.
- Create a folder within the Data Quality Checks folder, naming it "Data Quality Check_YYYYMMDD" (the date on which the audio/video files were copied over).
- Copy the audio files from the Connected Speech_Data folder to the newly created Data Quality Check_YYYYMMDD folder (copy, do not move!).
2. Check data quality via PRAAT
The following procedures describe checking the quality of audio recordings from three aspects (SNR, number of syllables, and silence/sounding evaluation). They serve two purposes: one as a check for problematic or concerning files, the other as a routine sanity check for a subset of samples from a specific project.
1. Signal-to-Noise Ratio (SNR)
Open an audio file and visually inspect the waveform, especially the non-speech silence parts. Two things to pay attention to: (1) the amplitude of the non-speech signal (see two acceptable examples in the screenshot below); (2) the spectrogram, where the formants should look clear. High-frequency noise in particular (e.g., around 3 kHz) appears black in the spectrogram and thus affects formant measurements, whereas most ambient noise is "white" (see example below).
Manually calculate the signal-to-noise ratio (SNR) by comparing a speech segment with a non-speech silence segment. (The SNR values originally calculated by the clinician are here: Connected Speech Data Control_SNR.)
- Select one speech signal segment and extract its root-mean-square (RMS; a measure of loudness) value: RMS1.
- Select one non-speech signal segment of similar length and extract its RMS value: RMS2.
- Divide RMS1 by RMS2 to obtain the SNR expressed in voltage: SNRvoltage (see more in this video: Signal-to-Noise Ratio). Multiplying SNRvoltage by 100 gives the percentage of speech signal lost; based on this percentage, we can determine a threshold to evaluate other audio files (for the sample above, despite the noise, the SNR is still good and reliable for acoustic analysis).
- SNR can also be expressed in dB using the formula SNRdB = 20 x log10(SNRvoltage). We consider 20-40 dB acceptable (see the sketch after this list).
- When possible, use the sounding and silent periods marked in the TextGrids to ensure that you are using appropriate time points for the SNR calculation. For example, see below.
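As a rough illustration of the manual calculation above, here is a minimal Python sketch. The file name and segment times are hypothetical; in practice, take the segment boundaries from the sounding/silent intervals in the TextGrid.

```python
# Minimal sketch: manual SNR check for one mono audio file.
# The file name and segment times below are hypothetical.
import numpy as np
import soundfile as sf

def rms(x):
    """Root-mean-square amplitude of a signal segment."""
    return np.sqrt(np.mean(np.square(x)))

audio, sr = sf.read("sample.wav")                # mono .wav file
speech = audio[int(10.0 * sr):int(12.0 * sr)]    # one speech segment
silence = audio[int(2.0 * sr):int(4.0 * sr)]     # one non-speech segment of similar length

snr_voltage = rms(speech) / rms(silence)         # SNR expressed in voltage
snr_db = 20 * np.log10(snr_voltage)              # SNRdB = 20 x log10(SNRvoltage)
print(f"SNR: {snr_db:.1f} dB (acceptable: 20-40 dB)")
```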
2. The number of syllables
Open the audio file of concern in PRAAT, select a 30-second speech segment, listen to it, and manually count the number of syllables.
Estimate the number of syllables across the whole duration of the file and compare that with the value extracted from the script.
An error rate of around 6% is acceptable, but anything higher than 10% would be concerning (see the sketch below).
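A minimal sketch of this comparison, using hypothetical counts:

```python
# Minimal sketch: compare a manual syllable count against the script's value.
# The counts below are hypothetical.
manual_count = 94    # syllables counted by hand, extrapolated to the whole file
script_count = 100   # syllables reported by the script

error_rate = 100 * abs(script_count - manual_count) / manual_count
print(f"error rate: {error_rate:.1f}%")  # ~6.4% here: acceptable; >10% is concerning
```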
3. Silence and sounding
Open an audio file and its associated TextGrid in PRAAT.
Visually review (i.e., skim through the two bottom tracks in the picture above) whether the "silent" and "sounding" labels in the TextGrid accurately match the audio segments, and pay attention to any discrepancies. This serves as a sanity check for background noise and buzzing sound issues (e.g., some non-speech signal might be labeled as sounding). See the sketch below for a text-based way to inspect the intervals.
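If you want to inspect the intervals as text rather than in the PRAAT GUI, a minimal sketch using the textgrid package (the same module the pipeline scripts depend on) might look like this; the file name is hypothetical, and the labels are assumed to be on the first tier:

```python
# Minimal sketch: print the silent/sounding intervals from a TextGrid so
# they can be compared against the waveform. The file name is hypothetical.
import textgrid

tg = textgrid.TextGrid.fromFile("sample.TextGrid")
tier = tg.tiers[0]  # assumes the silent/sounding labels are on the first tier
for interval in tier:
    print(f"{interval.minTime:7.2f}-{interval.maxTime:7.2f} s  {interval.mark}")
```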
Some ideas for future projects:
Kesha may help with creating a script to calculate SNR across files and check it against the threshold for decision-making on audio quality.
Below are some decisions made (documented by Sonia) during our meeting with Dr. Llanos.
- If the SNR is between 20 dB and 40 dB, we can consider it acceptable and use the audio for acoustic analysis. Estimate the SNR and calculate the % loss for all files; for the first 20% of each file, estimate the SNR and measure by hand any of the metrics that we are using.
- For samples recorded in person with a sound card vs. samples recorded over Zoom, which acoustic measures can we derive reliably? Prosodic elements (pauses, duration of pauses, articulation rate): yes. Anything related to amplitude or spectral properties: no.
- Make changes to the recording protocol if needed, and note the change in the running list of changes.
- Spectral properties and amplitude: for differences in Audacity level.
Section 2. Acoustic Derivations Pipeline Guide
1. Access TACC Account
Access the shared credentials to enter the TACC Analysis Portal:
- Go to https://stache.utexas.edu/
- Enter with your UT credentials
- Click on "secret"

1.1 Access the TACC Analysis Portal: https://tap.tacc.utexas.edu/jobs/
- Enter Dr. Grasso's account information generated by Stache and log in.
- The token changes every 30 seconds, so please type it ASAP.

1.2 Submit a job on TACC by clicking on the dropdowns and selecting:
- Lonestar 6
- Jupyter Notebook
- DBS23006
- vm-small
- Nodes 1; Tasks 1
- Job name (can be anything; we will use AcousticDer_Practice in this tutorial guide)
- Time limit (1 hour should be more than enough; however, if it's your first time following this tutorial, you may want to give yourself more time to get through it - 2 hours should be fine)
- Click on "submit"
*Note. If you're struggling to get nodes on the vm-small queue, I'd recommend trying the development queue. This applies to everything except transcription (where you should try gpu-a100-small, followed by gpu-a100-dev, followed by gpu-a100).
*Note. If you aren't able to submit a job (see the error on the right as an example), follow the steps in this recording (
2. Enter Jupyter
If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps:
- Click on "connect"
- Click on "work"
- Click on "acousticScripts"
If there are no available nodes (picture B), you will have to wait in a queue until one becomes available.
3. Set up folder
- Click on "new"
- Click on "folder"
- Name the folder (in this guide, the name ACtrial will be used)
- File names cannot contain spaces or brackets
4. Upload input files
- Go into the folder you just created
- Click on "upload" and upload the input files
- If you cannot see the file you uploaded, click on "last modified" a couple of times; sometimes it doesn't update immediately.
5. Go to the terminal and navigate to the acoustic script folder
- Once the file has been uploaded, click on the "New" menu dropdown
- Click on "terminal"
- Once in the terminal, type cdw and press enter
- Type cd acousticScripts and press enter
6. Trim pauses at the start and end of audio files
(This script works for any type of audio file! It will output a monochannel .wav file without pauses at the start and end.)
If your files are monochannel .wav files that are already trimmed, move on to step 7. Otherwise, execute this step even if they are in .wav format, as they may not be monochannel.
- Type conda activate racs and press enter
- Type or copy the following command, then press enter: python monoTrimAudioFiles.py ACinput/
- Keep in mind that the red and green sections change depending on the name you gave your folder (with input files) and the name you will give your folder (with output files). The commands in purple always remain the same (see picture A).
- Wait a few seconds (or minutes). You will know it's done running when you see this at the bottom of the terminal (see picture B).
- Then type conda deactivate and press enter (see picture C).
The parts circled in red change depending on the names of your input and output folders. ACinput/ is the name I gave my folder; your command will change depending on the name of yours. Remember that the spelling must match exactly: if the name of the folder you uploaded is all lowercase, the command in the terminal must be all lowercase. The output folder's name is the input folder's name followed by '_monoTrimmed'. For example, for the input folder ACinput/, the output folder will be named ACinput_monoTrimmed/.
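Purely as an illustration of what this step does conceptually (this is not the actual monoTrimAudioFiles.py, and the folder names and silence threshold below are assumptions), a "convert to mono and trim edge silences" sketch could look like this:

```python
# Minimal sketch of "convert to mono + trim leading/trailing silence".
# Illustration only, not the pipeline's monoTrimAudioFiles.py; folder
# names and the top_db threshold are assumptions.
import os
import librosa
import soundfile as sf

in_dir, out_dir = "ACinput", "ACinput_monoTrimmed"
os.makedirs(out_dir, exist_ok=True)

for name in os.listdir(in_dir):
    if not name.lower().endswith(".wav"):
        continue
    y, sr = librosa.load(os.path.join(in_dir, name), sr=None, mono=True)
    trimmed, _ = librosa.effects.trim(y, top_db=30)  # drop edge silences
    sf.write(os.path.join(out_dir, name), trimmed, sr)
```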
7. Derive first acoustic parameters
- Type conda activate textGridPauseSyllable and press enter
- Then enter the command below and press enter: python derivePraatFeatsAndTextGrids_CAC.py
- Again, the sections in purple remain the same, while the parts in red, orange, and blue may change.
- The orange name will be added to the front of the output file (see the picture to the right).
- The name in blue can be whatever you decide; it creates a new folder for the TextGrids to go into. The orange and blue names will be used again in step 8, and they need to match in both steps.
This generalized command does NOT need to be typed; it is only to show the role of each name in the command: python derivePraatFeatsAndTextGrids_CAC.py followed by the red (input folder), orange (output-file prefix), and blue (TextGrid folder) names.
If your files were originally .wav files and you didn't need to do step 6, the section in red will be the name you gave the folder where your input files are located.
If you encounter an error in this step, it is possible that a "hidden" file is causing it. The error will look like: Sound not read from sound file "/work/09424/smgrasso1/ls6/acousticScripts/TRIMMED_FOLDER/.ipynb_checkpoints". To remove the hidden file, run: rm -rf TRIMMED_FOLDER/.ipynb_checkpoints
If you receive the error message "there is no module named 'parselmouth'", type: pip install praat-parselmouth
Then repeat the normal derivePraat... command for this step.
Note: ACtrialTextGrids can change (green section only). This code creates a new directory; whatever you type, make sure it matches the final command (the long one at the end of step 7).
You should see this when the command is done running:
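For a sense of how these scripts use parselmouth under the hood (hence the praat-parselmouth dependency above), here is a minimal sketch that generates a silent/sounding TextGrid for one file with Praat's "To TextGrid (silences)" command. The file name and threshold values are illustrative guesses, not the values used by derivePraatFeatsAndTextGrids_CAC.py:

```python
# Minimal sketch: create a silent/sounding TextGrid for one audio file via
# parselmouth. Thresholds are illustrative, not the pipeline's settings.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sample.wav")
tg = call(snd, "To TextGrid (silences)",
          100,     # minimum pitch (Hz)
          0.0,     # time step (s); 0 = automatic
          -25.0,   # silence threshold (dB)
          0.1,     # minimum silent interval duration (s)
          0.1,     # minimum sounding interval duration (s)
          "silent", "sounding")  # interval labels
call(tg, "Save as text file", "sample.TextGrid")
```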
8. Derive second acoustic parameters
- Type the following command, then press enter: python derivePauseDurationStatsFromTextGrid_CAC.py ACTextGrids/
- This command must use the exact name of the blue folder from step 7. The orange name will be used to name the output file from this step (see the example to the right): PostTxSamples_NFV
- Generalized command: python derivePauseDurationStatsFromTextGrid_CAC.py textgriddirectory outputName
- Remember that the code in blue changes depending on what you typed in previous steps; it needs to match what you typed before.
- If you receive the error message "there is no module named 'textgrid'", type: pip install textgrid
- Then repeat the normal derivePause... command for this step.
You will see this when it is done running:
Example of the orange name in the PauseSyllableDurationStats.csv output. In this case, the command used "ACderiveOut2" as the orange name.
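To make the second derivation step concrete, here is a minimal sketch of the kind of statistics it reports (mean pause duration and variability of pause duration), computed from the "silent" intervals of one TextGrid. This is an illustration, not the actual derivePauseDurationStatsFromTextGrid_CAC.py; the file name and tier position are assumptions:

```python
# Minimal sketch: mean pause duration and its variability (SD) from the
# "silent" intervals of a TextGrid. Illustration only, not the pipeline script.
import statistics
import textgrid

tg = textgrid.TextGrid.fromFile("ACTextGrids/sample.TextGrid")
pauses = [iv.maxTime - iv.minTime
          for iv in tg.tiers[0] if iv.mark == "silent"]  # assumes first tier

print(f"mean pause duration: {statistics.mean(pauses):.3f} s")
print(f"pause duration variability (SD): {statistics.stdev(pauses):.3f} s")
```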
9. Download output files
- Go back to the working directory. If the generated files did not appear, click on "last modified"; sometimes it takes a minute to update.
- You will see 2 output files. Check their boxes and click on "download". You may need to download them one by one.
10. Log out and end job
- Go back to the terminal, log out by typing the command logout, and press enter (see picture A).
- Go back to the original TACC page and click on "end job" (see picture B).
IT IS VERY IMPORTANT TO END THE JOB, AS THE NODES ARE VERY LIMITED. THE JOB WILL KEEP RUNNING UNLESS YOU COMPLETE THIS STEP.
11. Merge the files
Once the files have been downloaded, merge the columns to create a single file. In file A, you can see 7 features: # of syllables, speech rate, average syllable duration, articulation rate, speech-to-pause ratio, time, and # of pauses (picture A). In file B, you can see 2 features: mean pause duration and variability of pause duration (picture B). Manually copy and paste these two features into file A to form one spreadsheet with 9 features (see picture), or use a short script as sketched below.
This is the final file that contains the acoustic derivations of the samples. If you need to check the sounding/silent periods in the audio files, please save the .tgt TextGrid files from the blue folder (the TextGrid directory) that you used in steps 7 and 8.
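If you would rather not copy columns by hand, a minimal pandas sketch along these lines could do the merge. The file names are hypothetical, and it assumes both CSVs list the samples in the same row order; if the files share a sample/filename column, merging on that column with pd.merge would be safer:

```python
# Minimal sketch: combine the two output CSVs into one 9-feature spreadsheet.
# File names are hypothetical; assumes both files list samples in the same
# row order.
import pandas as pd

feats = pd.read_csv("fileA_PraatFeats.csv")                   # file A: 7 features
pauses = pd.read_csv("fileB_PauseSyllableDurationStats.csv")  # file B: 2 features

merged = pd.concat([feats, pauses], axis=1)
merged.to_csv("acoustic_derivations_merged.csv", index=False)
```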