...
The small set of login nodes is a shared resource – type the users command to see everyone currently logged in. Login nodes are not meant for running interactive programs – for that, you submit a description of what you want done to a batch system, which distributes the work to one or more compute nodes.
...
- Do not perform significant network access from your batch jobs.
- Instead, stage your data from a login node onto $SCRATCH before submitting your job, as shown below.
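For example, a staging step run from a login node might look like this (the data URL and file name are hypothetical placeholders; substitute your actual data source):

| Code Block |
|---|
| # on a login node: stage data onto $SCRATCH before submitting the job
cd $SCRATCH
# hypothetical source URL; substitute your actual data location
wget https://example.com/data/my_reads.fastq.gz |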
...
Here is a comparison of the configurations of ls6 and stampede3. stampede3 is the newer (and larger) cluster, just launched in 2024; ls6 was launched in 2022.
| | ls6 | stampede3 |
|---|---|---|
| login nodes | 3, with 128 cores each | 4, with 96 cores each |
| standard compute nodes | 560 AMD Epyc 7763 "Milan" nodes | 560 Intel Xeon "Sapphire Rapids" nodes, 1060 Intel Platinum 8160 "Skylake" nodes, 224 Intel Xeon Platinum 8380 "Ice Lake" nodes |
| GPU nodes | 88 AMD Epyc 7763 "Milan" nodes (84 GPU nodes have ...; 4 GPU nodes have ...) | 20 GPU Max 1550 "Ponte Vecchio" nodes |
| batch system | SLURM | SLURM |
| maximum job run time | 48 hours (normal queue); 2 hours (development queue) | 48 hours on GPU nodes, 24 hours on other nodes (normal queue); 2 hours (development queue) |
User guides for ls6 and stampede3 can be found at:
- https://docs.tacc.utexas.edu/hpc/lonestar6/
- see https://docs.tacc.utexas.edu/hpc/lonestar6/#running-queues for all available queues
- https://docs.tacc.utexas.edu/hpc/stampede3
...
For example, the following module load command makes the apptainer container management system available to you:
| Code Block |
|---|
| # first type "apptainer" to show that it is not present in your environment:
apptainer
# it's not on your $PATH either:
which apptainer
# now add biocontainers to your environment and try again:
module load biocontainers
apptainer
# and see how apptainer is now on your $PATH:
which apptainer
# you can see the new directory at the front of $PATH
echo $PATH
# to remove it, use "unload"
module unload biocontainers
# gone from $PATH again...
which apptainer |
Note that apptainer is the new name for singularity.
...
It is quite a large systems administration task to install software at TACC and configure it for the module system. As a result, TACC was always behind in making important bioinformatics software available. To address this problem, TACC moved to providing bioinformatics software via containers, which are similar to virtual machines like VMware and VirtualBox but lighter weight: they require less disk space because they rely more on the host's base Linux environment. Specifically, TACC (like many other HPC, or High Performance Computing, clusters) uses Apptainer (formerly Singularity) containers, which are similar to Docker containers but better suited to the HPC environment – in fact, one can build a Docker container then easily convert it to Apptainer format for use at TACC.
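For example, once apptainer is available in your environment, a public Docker image can be pulled and converted to Apptainer's .sif format in one step. This is just an illustrative sketch; the image name and tag shown here are examples, not something the course requires:

| Code Block |
|---|
| # pull a Docker image and convert it to an Apptainer .sif image file
# (the image name and tag are illustrative)
apptainer pull docker://quay.io/biocontainers/kallisto:0.48.0--h15996b6_2
# then run a command inside the resulting container
apptainer exec kallisto_0.48.0--h15996b6_2.sif kallisto version |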
TACC obtains its containers from BioContainers (https://biocontainers.pro/ and https://github.com/BioContainers/containers), a large public repository of Apptainer containers for bioinformatics tools. This has allowed TACC to easily provision thousands of such tools!
...
| Tip |
|---|
The standard TACC module system has been phased out for bioinformatics programs, so always look for your application in BioContainers. While it's great that there are now thousands of programs available through BioContainers, the one drawback is that they can only be run on cluster compute nodes, not on login nodes. To test a BioContainer program interactively – or even to ask the program for its usage – you will need to use TACC's idev command to obtain an interactive cluster node. More on this shortly... |
...
| Code Block | ||
|---|---|---|
| ||
# Load the Biocontainers master module
module load biocontainers
# Verify kallisto is not yet available
kallisto
# Load the default kallisto biocontainer
module load kallisto
# Verify kallisto is now available (but not on login nodes)
kallisto |
Note that loading a BioContainer does not add anything to your $PATH. Instead, it defines an alias, which is just a shortcut for executing the command using its container. You can see the alias definition using the type command, and you can check that the program is available using the command -v utility.
| Code Block | ||
|---|---|---|
| ||
# Note that kallisto has not been added to your $PATH, but instead has an alias
which kallisto
# Ensure kallisto is available with command -v
command -v kallisto
# To see how TACC calls the kallisto container:
type kallisto |
installing custom software
...
- Create a commands file containing exactly one task per line.
- Prepare a job control file for the commands file that describes how the job should be run.
- Submit the job control file to the batch system.
- The job is then said to be queued to run.
- The batch system prioritizes the job based on the number of compute nodes needed and the job run time requested.
- When compute nodes become available, the job tasks (command lines in the <job_name>.cmds file) are assigned to one or more compute nodes and begin to run in parallel.
- The job completes when either:
- you cancel the job manually
- all job tasks complete (successfully or not!)
- the requested job run time has expired
...
Here are the main components of the SLURM batch system.
| | ls6, stampede3 |
|---|---|
| batch system | SLURM |
| batch control file name | <job_name>.slurm |
| job submission command | sbatch <job_name>.slurm |
| job monitoring command | showq -u |
| job stop command | scancel -n <job_name> |
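Putting these together, a typical job lifecycle at the command line looks like this (shown here for a job named simple, like the example that follows):

| Code Block |
|---|
| sbatch simple.slurm     # submit the job to the batch system
showq -u                 # monitor your queued and running jobs
scancel -n simple        # cancel the job by name, if needed |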
...
Let's go through a simple example. Execute the following commands to copy a premade simple.cmds commands file:
...
There are 8 tasks. Each task sleeps for 5 seconds, then uses the echo command to output a string containing the task number and date to a log file named for the task number. Notice that we can put two commands on one line if they are separated by a semicolon ( ; ).
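For reference, a single task line in simple.cmds looks something like this (your copy may differ slightly):

| Code Block |
|---|
| sleep 5; echo "Command 3 on `hostname` - `date`" > cmd3.log 2>&1 |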
You can count the number of lines in the simple.cmds file using the wc (word count) command with the -l (lines) option:
| Code Block | ||||
|---|---|---|---|---|
| ||||
wc -l simple.cmds |
Use the handy launcher_creator.py program to create the job control file.
| Expand | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||
| Create batch submission script for simple commands |
|
| Code Block | ||||
|---|---|---|---|---|
| ||||
launcher_creator.py -j simple.cmds -n simple -t 00:01:00 -a OTH21164 |
You should see output something like the following, and you should see a simple.slurm batch submission file in the current directory.
| Code Block |
|---|
Project simple.
Using job file simple.cmds.
Using normal queue.
For 00:01:00 time.
Using OTH21164 allocation.
Not sending start/stop email.

Launcher successfully created. Type "sbatch simple.slurm" to queue your job. |
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
sbatch simple.slurm
showq -u
# Output looks something like this:
-------------------------------------------------------------
Welcome to the Lonestar6 Supercomputer
-------------------------------------------------------------
--> Verifying valid submit host (login2)...OK
--> Verifying valid jobname...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Checking available allocation (OTH21164)...OK
Submitted batch job 2411919
|
...
The queue status will initially show your job as WAITING until a node becomes available:
| Code Block |
|---|
SUMMARY OF JOBS FOR USER: <abattenh>
ACTIVE JOBS--------------------
JOBID JOBNAME USERNAME STATE NODES REMAINING STARTTIME
================================================================================
WAITING JOBS------------------------
JOBID JOBNAME USERNAME STATE NODES WCLIMIT QUEUETIME
================================================================================
2411919 simple abattenh Waiting 1 0:01:00 Wed May 28 15:39:24
Total Jobs: 1 Active Jobs: 0 Idle Jobs: 1 Blocked Jobs: 0 |
Once your job is ACTIVE (running) you'll see something like this:
| Code Block |
|---|
| SUMMARY OF JOBS FOR USER: <abattenh>

ACTIVE JOBS--------------------
JOBID     JOBNAME    USERNAME      STATE   NODES REMAINING STARTTIME
================================================================================
2411919   simple     abattenh      Running 1      0:00:39  Sat Jun 1 21:55:28

WAITING JOBS------------------------
JOBID     JOBNAME    USERNAME      STATE   NODES WCLIMIT   QUEUETIME
================================================================================

Total Jobs: 1     Active Jobs: 1     Idle Jobs: 0     Blocked Jobs: 0 |
...
Every job, no matter how few tasks it requests, is assigned at least one node. Each Lonestar6 node has 128 physical cores, so each of the 8 tasks here can be assigned to a different core.
...
| Expand | ||
|---|---|---|
| ||
ls should show you something like this:
The newly created files are the .log files, as well as the error and output logs simple.e2411919 and simple.o2411919. |
filename wildcarding
You can look at one of the output log files like this:
...
| Code Block |
|---|
Command 1 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:47 CDT 2024
Command 2 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:47 CDT 2024
Command 3 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:51 CDT 2024
Command 4 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:48 CDT 2024
Command 5 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:45 CDT 2024
Command 6 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:49 CDT 2024
Command 7 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:51 CDT 2024
Command 8 on c304-005.ls6.tacc.utexas.edu - Sat Jun 1 21:55:48 CDT 2024 |
echo
Let's take a closer look at a typical task in the simple.cmds file.
...
The echo command is like a print statement in the bash shell. echo takes its arguments and writes them to standard output. While not always required, it is a good idea to put echo's output string in double quotes ( " ).
backtick evaluation
So what is this funny looking `date` bit doing? Well, date is just another Linux command (try typing it in) that displays the current date and time. Here we don't want the shell to put the literal string "date" in the output; we want it to execute the date command and substitute the resulting text into the output. The backquotes ( ` ` , also called backticks) around the date command tell the shell we want that command executed and its standard output substituted into the string. (Read more about Quoting in the shell)
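Here's a minimal illustration you can try at the command prompt:

| Code Block |
|---|
| # without backticks, date is just literal text
echo "Today is date"
# with backticks, the date command executes and its output is substituted
echo "Today is `date`"
# the equivalent modern syntax uses $( )
echo "Today is $(date)" |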
...
The '2>&1' part says to redirect standard error to the same place. Technically, it redirects standard error (built-in Linux stream 2) to the same place as standard output (built-in Linux stream 1). Since standard output is going to cmd3.log, any standard error will go there also. (Read more about Standard streams and redirection)
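You can watch both streams land in the same file with a quick test like this (the missing directory name is just there to generate an error):

| Code Block |
|---|
| # stdout (stream 1) and stderr (stream 2) both go to try.log
ls $HOME /no_such_directory > try.log 2>&1
cat try.log |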
When the TACC batch system runs a job, all outputs generated by tasks in the batch job are directed to one output and one error file per job. Here they have names like simple.o2411919 and simple.e2411919.
- simple.o2411919 contains all standard output generated by your tasks that was not redirected elsewhere
- simple.e2411919 contains all standard error generated by your tasks that was not redirected elsewhere
- both also contain information relating to running your job and its tasks
...
For large jobs with complex tasks, it is not easy to troubleshoot execution problems using these files.
...
The launcher module knows how to interpret various job parameters in the <job_name>.slurm submission script and use them to create your job and assign its tasks to compute nodes. Our launcher_creator.py program is a simple Python script that lets you specify job parameters and writes out a valid <job_name>.slurm submission script.
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
# Use spacebar to page forward; Ctrl-c or q to exit
launcher_creator.py | more |
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
launcher_creator.py -j simple.cmds -n simple -t 00:01:00 -a OTH21164 |
- The name of your commands file is given with the -j simple.cmds option.
- Your desired job name is given with the -n simple option.
- The <job_name> (here simple) is the job name you will see in your queue.
- By default a corresponding <job_name>.slurm batch file is created for you.
- It contains the name of the commands file that the batch system will execute.
...
| queue name | maximum runtime | purpose |
|---|---|---|
| development | 2 hrs | development/testing and short jobs (typically has short queue wait times) |
| normal | 48 hrs | normal jobs (queue waits are often quite long) |
- In launcher_creator.py, the queue is specified by the -q argument.
- The default queue is normal. Specify -q development for development queue jobs.
- The maximum runtime you are requesting for your job is specified by the -t argument.
- Format is hh:mm:ss
- Note that your job will be terminated without warning at the end of its time limit!
...
- You specify that allocation name with the -a argument of launcher_creator.py.
- If you have set an $ALLOCATION environment variable to an allocation name, that allocation will be used.
| Expand | |||||||
|---|---|---|---|---|---|---|---|
| |||||||
The .bashrc login script you've installed for this course specifies the class's allocation as shown below. Note that this allocation will expire after the course, so you should change that setting appropriately at some point.
|
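That setting is just an environment variable assignment; the relevant line in your .bashrc will look something like this (reconstructed here for illustration, using this course's allocation):

| Code Block |
|---|
| # set by the course login script; edit or remove when the allocation expires
export ALLOCATION=OTH21164 |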
- When you run a batch job, your project allocation gets "charged" for the time your job runs, in the currency of SUs (System Units).
- SUs are related to node hours; usually 1 SU = 1 "standard" node hour. For example, a job that runs on 4 nodes for 2 hours would be charged roughly 8 SUs.
...
Exercise: How many tasks are specified in the wayness.cmds file?
...
| Expand |
|---|
| Hint |
...
| Expand | |||||
|---|---|---|---|---|---|
| |||||
Find the number of lines in the wayness.cmds commands file using the wc (word count) command with the -l (lines) option: wc -l wayness.cmds
The file has 16 lines, representing 16 tasks. |
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
launcher_creator.py -j wayness.cmds -n wayness -w 4 -t 00:02:00 -a OTH21164 -q development
sbatch wayness.slurm
showq -u |
Exercise:
- With 16 tasks requested and wayness of 4, how many nodes will this job require?
- How much memory will be available for each task?
| Expand | ||
|---|---|---|
| ||
4 nodes (16 tasks x 1 node/4 tasks). Each ls6 node has 256 GB of memory, so with 4 tasks per node, each task can use up to about 64 GB (256 GB / 4). |
Exercise:
- If you specified a wayness of 2, how many nodes would this job require?
- How much memory could each task use?
| Expand | ||
|---|---|---|
| ||
8 nodes (16 tasks x 1 node/2 tasks). With only 2 tasks per node, each task can use up to about 128 GB (256 GB / 2). |
...
| Code Block | ||
|---|---|---|
| ||
cat cmd*log
# or, for a listing ordered by command number (the 2nd space-separated field)
cat cmd*log | sort -k2,2n |
The vertical bar ( | ) above is the pipe operator, which connects one program's standard output to the next program's standard input.
...
| Code Block | ||
|---|---|---|
| ||
4 c303-005.ls6.tacc.utexas.edu
4 c303-006.ls6.tacc.utexas.edu
4 c304-005.ls6.tacc.utexas.edu
4 c304-006.ls6.tacc.utexas.edu |
(Read more about awk in Some Linux commands: awk, and read more about Piping a histogram.) And we'll be doing more of this sort of thing soon!
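For the curious, a pipeline along these lines (assuming the log line format shown above, where the node name is the 4th whitespace-separated field) would produce that per-node histogram:

| Code Block |
|---|
| # print the 4th field (the node name) of every log line,
# then count how many tasks ran on each node
cat cmd*log | awk '{print $4}' | sort | uniq -c |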
Some best practices
Redirect task output and error streams
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
# Redirect both standard output and standard error to a file
my_program input_file1 output_file1 > file1.log 2>&1

# Redirect standard output and standard error to separate files
my_program 1>out.txt 2>err.log |
Combine serial workflows into scripts
...
Here's an example directory structure:
$WORK/my_project
/01.original # contains or links to original fastq files
/02.fastq_prep # run fastq QC and trimming jobs here
/03.alignment # run alignment jobs here
/04.analysis # analyze gene overlap here
/81.test1 # play around with stuff here
/82.test2 # play around with other stuff here
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
cd $WORK/my_project/02.fastq_prep
ln -sf ../01.original fq
ls ./fq/my_raw_sequences.fastq.gz |
relative path syntax
As we have seen, there are several special "directory names" the bash shell understands:
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
idev -m 60 -N 1 -A OTH21164 -p normal --reservation=core-ngs-class-0603 |
Notes:
- -p normal requests nodes on the normal queue
- this is the default queue for our reservation; without a reservation, idev defaults to the development queue
- -m 60 asks for a 60 minute session
- -A OTH21164 specifies the TACC allocation/project to use
- -N 1 asks for 1 node
- --reservation=core-ngs-class-0603 gives us priority access to TACC nodes for today's class. You normally won't use this option.
When you ask for an idev session, you'll see output similar to that shown below. Note that the process may repeat the "job status: PD" (pending) step while it waits for an available node.
| Code Block |
|---|
-> Checking on the status of normal queue. OK

-> Defaults file    : ~/.idevrc
-> System           : ls6
-> Queue            : normal     (cmd line: -p )
-> Nodes            : 1          (cmd line: -N )
-> Tasks per Node   : 128        (Queue default )
-> Time (minutes)   : 60         (cmd line: -m )
-> Project          : OTH21164   (cmd line: -A )

-----------------------------------------------------------------
          Welcome to the Lonestar6 Supercomputer
-----------------------------------------------------------------

--> Verifying valid submit host (login2)...OK
--> Verifying valid jobname...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Checking available allocation (OTH21164)...OK
Submitted batch job 2412167

-> After your idev job begins to run, a command prompt will appear,
-> and you can begin your interactive development session.
-> We will report the job status every 4 seconds: (PD=pending, R=running).

-> job status: PD
-> job status: R

-> Job is now running on masternode= c304-006...OK
-> Sleeping for 7 seconds...OK
-> Checking to make sure your job has initialized an env for you....OK
-> Creating interactive terminal session (login) on master node c304-006.
-> ssh -Y -o "StrictHostKeyChecking no" c304-006 |
Once the idev session has started, it looks quite similar to a login node environment, except for these differences:
- the hostname command on a login node will return a login server name like login3.ls6.tacc.utexas.edu
- while in an idev session hostname returns a compute node name like c303-006.ls6.tacc.utexas.edu
- you cannot submit a batch job from inside an idev session, only from a login node
- your idev session will end when the requested time has expired
- or you can just type exit to return to a login node session
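For example:

| Code Block |
|---|
| # on a login node this prints something like login3.ls6.tacc.utexas.edu;
# in an idev session it prints a compute node name like c303-006.ls6.tacc.utexas.edu
hostname
# when you're finished, end the idev session and return to the login node
exit |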
...