3. Whisper Transcription Process (Research Assistants)

- 1.1 How to Process Audios Through Whisper
- 2 Samples to Prioritize

How to Process Audios Through Whisper

What is Whisper? Whisper is an automatic speech recognition (ASR) system developed by OpenAI that converts spoken language in audio recordings into written text. It supports transcription in multiple languages and can also translate speech into English. Run on the Texas Advanced Computing Center (TACC), Whisper can process large audio datasets and produce time-aligned transcriptions, which makes it useful for research, analysis, and data-processing workflows.

Video Tutorial On How To Use Whisper

Clipping and reclipping should always be completed in the lab. If you must work remotely, it’s essential to follow these steps since we are handling patient data:

- Connect through the UT VPN
- Work in a private, secure location

1. Update status of Whisper processing in Smartsheet	Update the status of Whisper transcriptions in the Smartsheet report that the files belong to, for example: Spanish: 1.4 SPANISH CS WHISPER PROCESSING STATUS; Catalan: 2.4 CATALAN CS WHISPER PROCESSING STATUS. Mark your name under the file you are running, and set the status according to the step you are on: “In progress” if you are about to run the file in Whisper; “Completed” once it has finished running; “Pre-whisper” if the file was already transcribed manually before Whisper began to be used. Keeping these statuses current ensures files are accurately documented throughout the process.
2. Check audio file names and folder locations before running Whisper	Before processing your audio files with Whisper, verify that all file names are correct and consistent and that each file is saved in the correct folder. Consistent naming helps avoid errors during transcription and keeps your workflow organized. Make sure that each audio file follows the expected naming convention (for example, `speaker_date_topic.wav`) and that there are no typos, missing extensions, or unsupported characters (such as spaces or special symbols).
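Optional: the name check in step 2 can also be done from a terminal. The sketch below is not part of the lab's official procedure. `check_audio_names` is a hypothetical helper, and both the set of allowed characters and the list of audio extensions are assumptions that should be adjusted to the lab's naming convention.

```shell
# Hypothetical helper: list files in a folder whose names may cause problems.
# Assumption: only letters, digits, "_", "-" and "." are allowed in names,
# and audio files end in .wav, .mp3, .m4a or .mp4 (lowercase).
check_audio_names() {
    local dir="${1:-.}" f name bad=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip sub-folders and empty globs
        name=$(basename "$f")
        case "$name" in
            *[!A-Za-z0-9._-]*) echo "unsupported character: $name"; bad=1 ;;
        esac
        case "$name" in
            *.wav|*.mp3|*.m4a|*.mp4) ;;  # extension looks fine
            *) echo "unexpected extension: $name"; bad=1 ;;
        esac
    done
    return "$bad"                        # 0 = all names passed, 1 = problems found
}
```

For example, `check_audio_names input` lists any problem files in the input folder and prints nothing when every name passes.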
3. Access TACC account	Access the shared credentials on 1Password to enter the TACC Analysis Portal: go to https://stache.utexas.edu/, log in with your UT credentials, and click on "secret". If you can't find it, message Dr. Grasso in your supervisor chat.
3.1 Option A: Using the lab's laptop	Use the Mac laptop found in the main lab space; if it is not on the table, ask Sonia to retrieve it for you. If you are accessing TACC for the first time, go to https://tap.tacc.utexas.edu/. If you have logged into TACC before, go to the Analysis Portal at https://tap.tacc.utexas.edu/jobs/. Enter Dr. Grasso's account information from Stache and log in.
3.1 Option B: Using your own device	If you are accessing TACC for the first time, you need a TACC token. Send a message on the channel requesting the TACC token, tagging the team and @Kesha Pugalenthi. The TACC token is a number that is sent to you privately and expires after a limited time, so stay on the login page until you have entered it. Do not sign out of the TACC account on your device after running Whisper, so that this step isn’t needed the next time you run Whisper.	Request for TACC token
3.2	Submit a job on TACC by selecting the following in the dropdowns: Lonestar 6; Jupyter Notebook; DBS23006; gpu-a100-small; Nodes 1, Tasks 1; Job name (can be anything); Time limit (2 hours is recommended when running several or lengthy audio files). Then click on "submit". Note: If you're struggling to get nodes on the gpu-a100-small queue, try gpu-a100-dev, followed by gpu-a100. You can check which queues are open here: https://tap.tacc.utexas.edu/status/render/Lonestar6/?back=jobs
4. Enter Jupyter	If there are available nodes (picture A), you will be able to enter Jupyter right away. In that case, follow these steps: click on "connect", click on "work", then click on whisperRuns. If there are no available nodes (picture B), you will have to wait in the queue until one becomes available.	A B
5. Set up folder	Click on "new", click on "folder", and name the folder (for this example, the name Feb09Spanish_WAB will be used). This step is only needed when running several files at the same time.
6. Upload audio files	Go to the folder you created (Feb09Spanish_WAB). Inside it, create another folder named "input". Go into the input folder and click on "upload" to upload the audio files. Do not leave spaces between words when naming the audio (e.g., BISE016_post_1_Cat instead of BISE016_post 1_Cat), because the program will not find the file otherwise. Remember that you must have downloaded and stored the audio files prior to this step. If running only 1 file, upload it directly into the whisperRuns folder.
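If a file was uploaded with a space in its name, it can be renamed from the terminal instead of being uploaded again. This is a minimal sketch, not part of the lab's official procedure: `fix_spaces` is a hypothetical helper that replaces each space with an underscore, following the BISE016_post_1_Cat example above. It assumes the renamed file should keep the same name otherwise, so check the result against the lab's naming convention.

```shell
# Hypothetical helper: replace spaces with underscores in file names.
# Usage: fix_spaces [FOLDER]   (FOLDER defaults to "input")
fix_spaces() {
    local dir="${1:-input}" f new
    for f in "$dir"/*\ *; do             # only names that contain a space
        [ -e "$f" ] || continue          # nothing matched: skip
        new="$dir/$(basename "$f" | tr ' ' '_')"
        if [ -e "$new" ]; then
            echo "skipped (target exists): $f"
        else
            mv -- "$f" "$new" && echo "renamed: $f -> $new"
        fi
    done
}
```

Files whose names contain no spaces are left untouched, and an existing file is never overwritten.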
7. Go to the terminal	Once your files have been uploaded, click on the "new" dropdown menu, then click on "terminal".
8. Type the command (if running several files at once)	Once you are in the terminal, follow these steps: Type `cdw`, press enter. Type `cd whisperRuns`, press enter. Type `cd Feb09Spanish_WAB`, press enter. Type `conda activate runWhisper`, press enter. Type the command below, then press enter: `whisper --model large-v3 --language Spanish --output_format txt --device cuda --word_timestamps True --hallucination_silence_threshold 2 --output_dir Feb09Output input/*`	The following parts of the command change: Spanish is the language of the audio sample. Feb09Output is the name we are giving to the folder where the generated files will be stored; you can change it to any folder name you want. input is the name of the folder that we previously created (where our input audio files are saved). Remember that spelling and capitalization must match exactly: if the folder name is all lowercase, it must be typed in all lowercase in the terminal.
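To make clear which parts of the batch command change between runs, the command can be wrapped as in the sketch below. This is an illustration rather than the lab's official procedure: `whisper_batch` and the `DRY_RUN` switch are hypothetical names, while the Whisper flags are exactly the ones in step 8. It assumes you are already inside your run folder (for example Feb09Spanish_WAB) with the runWhisper environment activated.

```shell
# Hypothetical wrapper around the batch command from step 8.
# Usage: whisper_batch LANGUAGE OUTPUT_DIR [INPUT_DIR]
# Set DRY_RUN=1 to print the command instead of running it.
whisper_batch() {
    local language="${1:-}" output_dir="${2:-}" input_dir="${3:-input}"
    if [ -z "$language" ] || [ -z "$output_dir" ]; then
        echo "usage: whisper_batch LANGUAGE OUTPUT_DIR [INPUT_DIR]" >&2
        return 2
    fi
    if [ ! -d "$input_dir" ]; then
        # Folder names are case-sensitive: "Input" and "input" are different.
        echo "input folder not found: $input_dir" >&2
        return 1
    fi
    # Same flags as the command in step 8; only the three arguments vary.
    set -- whisper --model large-v3 --language "$language" \
        --output_format txt --device cuda --word_timestamps True \
        --hallucination_silence_threshold 2 \
        --output_dir "$output_dir" "$input_dir"/*
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"               # show what would be run
    else
        "$@"                             # run Whisper
    fi
}
```

Running `DRY_RUN=1 whisper_batch Spanish Feb09Output` prints the full command without starting a transcription, which is a quick way to confirm that the language and folder names are typed correctly.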
8. Type the command (if running a single file)	If running a single file, follow these steps: Type `cdw`, press enter. Type `cd whisperRuns`, press enter. Type `conda activate runWhisper`, press enter. Type the command below, then press enter: `whisper --model large-v3 --language Spanish --output_format txt --device cuda --word_timestamps True --hallucination_silence_threshold 2 --output_dir BISE004_Output BISE004_CatRescue.wav` (`--word_timestamps True` is included, as in the multi-file command, because Whisper only applies `--hallucination_silence_threshold` when word timestamps are enabled). The following parts of the command change: Spanish is the language of the audio sample. BISE004_Output is the name we are giving to the output folder; you can change it to any name you want. BISE004_CatRescue.wav is the name of the file that we previously uploaded; it changes with the file being transcribed. Remember that spelling and capitalization must match exactly: if the name of the file is all lowercase, it must be typed in all lowercase in the terminal.
9. Whisper running	Depending on the number of files, they may take some time to run. You will see the transcription appear in real time, and you will know it has finished running when you see this at the bottom of the terminal (see picture).
10. Find output files	Once the files are finished running, follow the next steps: Go back to the notebook. Click on the refresh button (if needed). If the generated files did not pop up, click on "last modified". If you ran a single file, you will find the output within the whisperRuns folder (see picture A). If you ran multiple files, you will find the output within the Feb09Spanish_WAB folder (see picture B), in a subfolder called Feb09Output where your transcriptions are stored.	A B
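When many files were run at once, it is easy to miss one that failed. The sketch below compares the input folder with the output folder and reports any audio file without a transcript. `missing_transcripts` is a hypothetical helper, and it assumes Whisper names each transcript after its audio file with the extension replaced by .txt (for example, BISE004_CatRescue.wav gives BISE004_CatRescue.txt).

```shell
# Hypothetical helper: report input files that have no matching .txt transcript.
# Usage: missing_transcripts INPUT_DIR OUTPUT_DIR
missing_transcripts() {
    local input_dir="${1:-input}" output_dir="${2:-}" f base missing=0
    for f in "$input_dir"/*; do
        [ -f "$f" ] || continue
        base=$(basename "$f")
        base="${base%.*}"                # strip the extension: a.wav -> a
        if [ ! -s "$output_dir/$base.txt" ]; then
            echo "missing or empty: $base.txt"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) without a transcript"
    [ "$missing" -eq 0 ]                 # succeeds only when nothing is missing
}
```

For the example in this guide, the call would be `missing_transcripts input Feb09Output`, run from inside the Feb09Spanish_WAB folder.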
11. Download output files	Download your files. If there are several files, you may need to download them one by one. To download your folder of transcriptions as a single zip file instead, type the command below, then press enter: `zip -r BISE004_Output.zip BISE004_Output` The following parts of the command change: BISE004_Output.zip is the name we are giving to the output zip file; you can change it to any name you want. BISE004_Output is the name of the folder with the transcripts.
12. Clear cache from TACC	Run the command to clear the cache. This is an important step because TACC will not allow you to start future sessions if the home directory exceeds 9 GB (the cache directory is inside the home directory). Type `rm -r ~/.cache/` in the terminal and press enter.
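To see how much space the cache is taking before it is removed, the clearing step can be wrapped as in the sketch below. `clear_cache` and the `CACHE_DIR` variable are hypothetical names used only for illustration; by default the function targets `~/.cache`, the directory named in step 12.

```shell
# Hypothetical helper: report the cache size, then remove the cache directory.
# CACHE_DIR is an illustrative override; the default is ~/.cache (step 12).
clear_cache() {
    local cache_dir="${CACHE_DIR:-$HOME/.cache}"
    if [ ! -d "$cache_dir" ]; then
        echo "nothing to clear: $cache_dir does not exist"
        return 0
    fi
    echo "cache size before clearing:"
    du -sh "$cache_dir"                  # human-readable total for the folder
    rm -r -- "$cache_dir"
    echo "cleared $cache_dir"
}
```

Running `du -sh ~` afterwards shows the total size of the home directory, which can be compared against the 9 GB limit; on a large home directory this may take a moment.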
13. Log out	Log out from the terminal by typing the command `logout` (see picture A). Go back to the original TACC page and click on "end job". IT IS VERY IMPORTANT TO END THE JOB AS THE NODES ARE VERY LIMITED. IT WILL KEEP RUNNING UNLESS YOU DO THIS STEP.	A B
14. Upload files to Box	Go to this link: https://utexas.box.com/s/ghd8ho1ciko1n2le94u796cnhcf57rgw The link shows the hierarchy (task steps and Box folders). As the hierarchy shows, we want to be in the Connected Speech Data folder: https://utexas.box.com/s/uz6206lel0544c3auou0nsw57egcv5q6 Once in this folder, go to the following sub-folders: Therapy trial Spanish (choose the language depending on your transcription); Clipped audio of tasks; S1_CatRescue_PictureStoryDescription. Finally, upload the txt files (Whisper output) to this folder: 1. TaskName_whisper_output. After the audio/video file(s) have been run in Whisper, move them to this folder: 0. TaskName_Audios Processed through whisper. Keep in mind that the folders highlighted in purple will change depending on your transcription task. Cat Rescue Picture Description was chosen for this example, but you will need to choose the appropriate folder name for the type of file you are running through Whisper. For example, if you are running a Cat Rescue Recall task, the folders would be: Therapy trial Spanish; S1_CatRescue_Recall; CatRescueRecall_whisper_output.
15. Delete files from TACC	Once the files have been uploaded to Box, delete them from TACC.
16. Unfinished Whisper files	Use this calendar to check who is available to take over Whisper tasks if a shift ends before all files have been processed. The calendar can be accessed through the provided link. At the end of each day, the last person using Whisper is responsible for saving all completed files, properly shutting down the laptop, and storing it in Ana Pau's file cabinet.	Digital Smartsheet: https://utexas.app.box.com/file/2089093276662?s=0an0ofh1msyc6sycwbbhzgcimbkn1pfz