3. Whisper Transcription Process (Research Assistants)


How to Process Audio Files Through Whisper

What is Whisper? Whisper is an automatic speech recognition (ASR) system developed by OpenAI that converts speech in audio recordings into written text. It supports transcription in multiple languages and can also translate speech into English. Run on the Texas Advanced Computing Center (TACC), Whisper can process large sets of audio files and produce time-aligned transcriptions, which makes it useful for research, analysis, and data processing workflows.

Running Whisper_Video Tutorial.mov
Video Tutorial On How To Use Whisper

Clipping and reclipping should always be completed in the lab. If you must work remotely, it’s essential to follow these steps since we are handling patient data:

  • Connect through the UT VPN

  • Work in a private, secure location

1. Update Status of Whisper Processing in Smartsheet

Update the status of the Whisper transcriptions in the Smartsheet report that the files belong to, such as these:

Make sure to mark them depending on the step you are on:

  • “In progress” if you are about to run the files through Whisper

  • “Completed” once you have finished running them

“Pre-whisper” means that the file has already been transcribed manually before Whisper began to be used.

2. Check Audio File Names and Folder Location Before Running Whisper

Before processing your audio files with Whisper, verify that all file names are correct and consistent and that each file is saved in the correct folder. Proper naming helps avoid errors during transcription and keeps your workflow organized. Make sure that each audio file follows the expected naming convention (for example, speaker_date_topic.wav) and that there are no typos, missing extensions, or unsupported characters (such as spaces or special symbols).
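The checks described above can also be run automatically before uploading. The following is a minimal Python sketch, not part of the lab's official workflow: the function name check_file_name, the list of allowed extensions, and the rule "letters, digits, underscores, and hyphens only" are assumptions made for illustration and should be adjusted to the lab's actual naming convention.

```python
import re
from pathlib import Path

# Assumption: extensions the lab might use; edit this set as needed.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4", ".mov"}

# Assumption: only letters, digits, underscores, and hyphens are "safe".
SAFE_STEM = re.compile(r"^[A-Za-z0-9_-]+$")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in one file name (empty list means OK)."""
    problems = []
    path = Path(name)

    # 1. The file must have a recognized extension.
    if not path.suffix:
        problems.append("missing extension")
    elif path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unexpected extension {path.suffix}")

    # 2. Spaces are reported separately because they are the most common issue.
    if " " in name:
        problems.append("contains spaces")
    elif not SAFE_STEM.match(path.stem):
        problems.append("contains unsupported characters")

    return problems
```

For example, check_file_name("BISE016_post_1_Cat.wav") returns an empty list, while check_file_name("BISE016_post 1_Cat.wav") reports that the name contains spaces.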

3. Access TACC Account

Access the shared credentials on Stache to enter the TACC Analysis Portal:

  • Go to https://stache.utexas.edu/

  • Enter with your UT credentials

  • Click on "secret"

  • If you can't find it, message Dr. Grasso in your supervisor chat

3.1 Option A

USING the lab's laptop

Use the Mac laptop found in the main lab space. If it is not on the table, ask Sonia to retrieve it for you.

If you are accessing TACC for the first time, please go here: https://tap.tacc.utexas.edu/

If you have already logged into TACC before, please access the Analysis Portal here: https://tap.tacc.utexas.edu/jobs/
Enter Dr. Grasso's account information generated on Stache and log in.

3.1 Option B

USING your own device

If you are accessing TACC for the first time, you need a TACC token. Send a message on the channel requesting the TACC token, tagging the team and @Kesha Pugalenthi.

The TACC token is a number that is sent to us privately, and it expires. Please stay on the login page until you have entered the TACC token.

 

Please do not sign out of the TACC account after running Whisper on your device, so that this step is not needed the next time you run Whisper.

 

Request for TACC token

 

3.2 Submit a Job

Submit a job on TACC by selecting the following in the dropdowns:

  • Lonestar 6

  • Jupyter Notebook

  • DBS23006

  • gpu-a100-small

  • Nodes 1; Tasks 1

  • Job name (can be anything)

  • Time limit (2 hours is recommended when downloading several or lengthy audio files)

  • Click on "submit"

    *Note: If you cannot get a node on the gpu-a100-small queue, try gpu-a100-dev, and then gpu-a100.

  • You can check which queues are open here

4. Enter Jupyter

If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps:

  • Click on "connect"

  • Click on "work"

  • Click on whisperRuns


    If there are no available nodes (picture B), you will have to wait in the queue until one becomes available.


A

B

5. Set up Folder

  • Click on "new"

  • Click on "folder"

  • Name folder (for this example, the name Feb09Spanish_WAB will be used)


    This step is only needed when running several files at the same time.

6. Upload Audio Files

  • Go to the folder you created (Feb09Spanish_WAB)

  • Create another folder inside it with the name "input"

  • Go into the "input" folder and click on "upload" to upload the audio files. Do not leave spaces in audio file names (e.g., BISE016_post_1_Cat instead of BISE016_post 1_Cat), because the program will not find a file whose name contains spaces.

  • Remember that you must have downloaded and stored the audio files prior to this step

    If running only 1 file, upload your file directly to the whisperRuns folder.



-------------------------------------------------------------------------------------------------

7. Go to the Terminal

  • Once your files have been uploaded, click on the "new" dropdown menu

  • Click on terminal

8. Type the Command (If running several files at once)

Once you are in the terminal, follow these steps (if running several audio files):

  • Type cdw, press enter

  • Type cd whisperRuns, press enter

  • Type cd Feb09Spanish_WAB, press enter

  • Type conda activate runWhisper, press enter

  • Type command below, then press enter:



    whisper --model large-v3 --language Spanish --output_format txt --device cuda --word_timestamps True --hallucination_silence_threshold 2 --output_dir Feb09Output input/*




The parts of the command shown in red change depending on:

  • Language of the audio sample, in this case Spanish.

  • Feb09Output is the name that we are giving to the folder where the generated files will be stored. You can change this part of the command depending on the name you want to give the folder.

  • input is the name of the folder that we previously created (where our input audio files are saved). The spelling and capitalization must match exactly: if the name of the folder is all lowercase, it must also be all lowercase in the command.

 

 

8. Type the Command (If running a single file)

If running a single file, follow these steps:

  • Type cdw, press enter

  • Type cd whisperRuns, press enter

  • Type conda activate runWhisper, press enter

  • Type command below, then press enter:

whisper --model large-v3 --language Spanish --output_format txt --device cuda --hallucination_silence_threshold 8 --output_dir BISE004_Output BISE004_CatRescue.wav

The parts of the command shown in bold change depending on:

  • Language of the audio sample, in this case Spanish.

  • BISE004_Output is the name that we are giving to the folder where the output file will be stored. You can change this part of the command depending on the name you want to give the folder.

  • BISE004_CatRescue.wav is the name of the file that we previously uploaded. This part of the command changes depending on the name of the file being transcribed.

    The spelling and capitalization must match exactly: if the name of the file is all lowercase, it must also be all lowercase in the command.
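The batch command and the single-file command above share the same fixed options and differ only in a few slots. As an illustration of which parts vary, here is a minimal Python sketch that assembles the command text so it can be pasted into the terminal. The function name build_whisper_command and its parameters are hypothetical helpers invented for this example; the Whisper options themselves are copied from the two commands in this guide.

```python
import shlex


def build_whisper_command(language, output_dir, inputs, threshold,
                          word_timestamps=False):
    """Assemble the Whisper command string used in this guide.

    inputs is either a single file name (BISE004_CatRescue.wav) or a
    folder wildcard (input/*). Nothing is executed; the text is returned.
    """
    # Names with spaces are rejected, matching the no-spaces rule in the guide.
    for part in (output_dir, inputs):
        if " " in part:
            raise ValueError(f"remove spaces from {part!r} before running Whisper")

    # Fixed options, identical in both commands shown in this guide.
    args = ["whisper", "--model", "large-v3", "--language", language,
            "--output_format", "txt", "--device", "cuda"]

    # The batch command in this guide passes this option; the single-file one does not.
    if word_timestamps:
        args += ["--word_timestamps", "True"]

    args += ["--hallucination_silence_threshold", str(threshold),
             "--output_dir", output_dir]

    # The input is appended unquoted so the shell can still expand input/*.
    return shlex.join(args) + " " + inputs
```

Calling build_whisper_command("Spanish", "Feb09Output", "input/*", 2, word_timestamps=True) reproduces the batch command, and build_whisper_command("Spanish", "BISE004_Output", "BISE004_CatRescue.wav", 8) reproduces the single-file command.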

 

 

9. Whisper Running

Depending on the number of files, the run may take some time. You will see the transcription appear in real time, and you will know the run is finished when you see this at the bottom of the terminal (see picture).

10. Find Output Files

Once the files are finished running, follow the next steps:

  • Go back to the notebook

  • Click on the refresh button (if needed)

    If the generated files do not appear, click on "last modified".
    If single file, you will find the output folder you named in the command (BISE004_Output in the example) within the whisperRuns folder (see picture A).
    If multiple files, you will find the output files within the Feb09Spanish_WAB folder (see picture B). You should see a subfolder called Feb09Output where your transcriptions will be stored.

A




B

11. Download Output Files

  • Download your files

  • If there are several files, you may need to download them one by one

  • To download your folder of transcriptions as a single zip file instead, type the command below in the terminal, then press enter:

zip -r BISE004_Output.zip BISE004_Output

The parts of the command shown in bold change depending on:

  • BISE004_Output.zip is the name that we are giving to the output zip file. You can change this part of the command depending on the name you want to give the file.

  • BISE004_Output is the name of the folder with the transcripts
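If you prefer to create the zip file from a notebook cell instead of the terminal, the same result can be sketched in Python with the standard library. This is an optional alternative, not the lab's documented method; the function name zip_output_folder is hypothetical.

```python
import shutil
from pathlib import Path


def zip_output_folder(folder):
    """Create <folder>.zip next to the folder and return the zip file's path."""
    folder = Path(folder).resolve()
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} is not a folder")

    # root_dir/base_dir keep the folder name as the top level inside the
    # archive, similar to: zip -r BISE004_Output.zip BISE004_Output
    archive = shutil.make_archive(
        base_name=str(folder),         # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )
    return Path(archive)
```

For example, zip_output_folder("BISE004_Output") run from the whisperRuns folder would create BISE004_Output.zip in the same place.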

12. Clear Cache from TACC

Run the command below to clear the cache. This is an important step because TACC will not allow you to start a session in the future if the home directory exceeds 9 GB (the cache directory is inside the home directory).

  • Type rm -r ~/.cache/ in the terminal

  • Press enter
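To see how close the home directory is to the 9 GB limit before or after clearing the cache, a small Python sketch like the one below can be run in a notebook cell. It is an optional check, not part of the required steps; the function name directory_size_gb is hypothetical, and it assumes 1 GB = 10^9 bytes, which may differ slightly from how TACC measures the quota.

```python
import os

LIMIT_GB = 9  # home directory limit stated in this guide


def directory_size_gb(path):
    """Return the total size of the files under path, in GB (10**9 bytes)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so their targets are not counted.
            if os.path.islink(file_path):
                continue
            try:
                total_bytes += os.path.getsize(file_path)
            except OSError:
                # A file may disappear or be unreadable while scanning; ignore it.
                pass
    return total_bytes / 1e9
```

For example, directory_size_gb(os.path.expanduser("~")) gives the size of the home directory to compare against LIMIT_GB, and directory_size_gb(os.path.expanduser("~/.cache")) gives the size of the cache alone.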

13. Log Out

  • Log out of the terminal by typing the command "logout" (see picture A).






  • Go back to the original TACC page and click on "end job"


    IT IS VERY IMPORTANT TO END THE JOB AS THE NODES ARE VERY LIMITED. IT WILL KEEP RUNNING UNLESS YOU DO THIS STEP.

A


B

14. Upload Files to Box

Go to this link: https://utexas.box.com/s/ghd8ho1ciko1n2le94u796cnhcf57rgw
The link shows the hierarchy (task steps and Box folders).
As the hierarchy shows, you need to be in the Connected Speech Data folder: https://utexas.box.com/s/uz6206lel0544c3auou0nsw57egcv5q6
Once in this folder, go through the following sub-folders:

  1. Therapy trial

  2. Spanish (choose language depending on your transcription)

  3. Clipped audio of tasks

  4. S1_CatRescue_PictureStoryDescription

  5. Finally, upload the txt files (Whisper output) to this folder: 1. TaskName_whisper_output

  6. After the audio/video file(s) have been run through Whisper, move them to this folder: 0. TaskName_Audios Processed through whisper

    Keep in mind that the folders highlighted in purple will change depending on your transcription task. Cat Rescue Picture Description was chosen for this example, but you will need to choose the appropriate folder name for the type of file you are running through Whisper. For example, if you are running a Cat Rescue Recall task, the folders would be:
     

    1. Therapy trial

    2. Spanish

    3. S1_CatRescue_Recall

    4. CatRescueRecall_whisper_output

 

 



15. Delete Files from TACC

Once the files have been uploaded to Box, delete them from TACC.

16. Unfinished Whisper Files

A printed calendar of schedules for the Whisper and Clipping Team is now available on the main lab’s big desk. This calendar should be used to verify who is available to assume Whisper tasks if a shift concludes before all files have been processed. The digital version of the calendar can be accessed through the provided link. At the end of each day, the last individual using Whisper is responsible for saving all completed files, properly shutting down the laptop, and storing it in the white cabinet located next to the snacks.

Digital Smartsheet: https://utexas.app.box.com/file/2043269077900?s=eizx5tb1ijkjpr35gv5jib5efeqr64xj