
TACC

TACC is the Texas Advanced Computing Center. TACC services include access to several supercomputers that process large, complicated datasets. Each supercomputer (e.g., Stampede2, Frontera, Lonestar, Jetstream) has its own extensive documentation, so it doesn't make sense to try to learn many of them at once. Here, I will be reviewing the Stampede2 supercomputer. As you can tell by clicking on the link, the documentation is thorough. This page is meant to break Stampede2 down into its most practical parts.

Getting a TACC account

You will need to start by getting a TACC account. Go to this page (Get an Account), read the required documentation, and fill in all required information. Once you have an account, you can sign in to the TACC user portal:


It should look like this, but it won't list any resources because you don't have any yet. You will need your PI (me or Sara) to add you to our projects. Once you have access, you can see your projects by hovering over Allocations on the page ribbon and clicking "Projects and Allocations." This is what my page looks like.

 


Click "View Project Details" and you will see this page:

This page lists your allocations. You should have access to your laboratory's Corral drive (which I will explain later), which gives you hard drive space. However, you will need to request processing hours, which TACC calls SUs (service units). Your SU request is the total amount of processing time, in node-hours, that you need to complete your jobs. Click on "Request Increase" and check "stampede2."

This is an online form where you can request processing hours. It will ask you for the number of SUs you want and a justification. For now, just put 100 SUs with this justification "2 Nodes @ 1 hour for 50 jobs = 100 SUs." Once the form is submitted, you will need to wait until TACC approves your SU allocation. Once you hear back, you can continue with the next steps.
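The justification above is just nodes times hours times jobs. A minimal sketch of that arithmetic, assuming one SU equals one node-hour (which is what the example justification implies); the function name estimate_sus is made up for illustration and only handles whole numbers:

```shell
#!/usr/bin/env bash
# Estimate SUs as nodes x hours-per-job x number-of-jobs.
# Assumption: 1 SU = 1 node-hour, as implied by the example justification.
# Whole numbers only (shell integer arithmetic).
estimate_sus() {
    local nodes="$1" hours="$2" jobs="$3"
    echo $(( nodes * hours * jobs ))
}

# 2 nodes @ 1 hour for 50 jobs
estimate_sus 2 1 50
```

With the numbers from the example justification, this prints 100.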

Logging onto TACC

Assuming you are using a Mac, almost everything is done through the Terminal (if you are unfamiliar with the command line, see my Shell Coding Page). To access a TACC supercomputer, type the following command:

ssh -XY (username)@(Supercomputer).tacc.utexas.edu

where the username is your personal TACC username and the supercomputer is stampede2 (in this case). You will be prompted for two-factor authentication: your password, then a code sent to your phone.

Stampede2 has three main drives:

  1. Work drive - $WORK (This is the drive you execute stuff from)
  2. Home drive - $HOME (For storing a few small files)
  3. Scratch drive - $SCRATCH (This is the drive you temporarily store your data when performing an analysis).

You can change to these directories by cd'ing to the matching variable listed above. For example,

cd $WORK 

will get you to the work drive. Try cd'ing to each drive and entering pwd to get an idea of what the folder structure is like. If you have a Corral account, you can get to it using a command like this:

cd /corral-repl/utexas/(Your lab's project name)/

The project name should be listed in your portal account. Mine is "Comparing-mem-sys." Corral is a massive storage server where you are supposed to keep your data permanently. Thus, if you are performing an analysis, you should first copy all the necessary files over to the $SCRATCH drive, and then have your scripts pull the data from there. After processing, you can move the results back to Corral.

Example: cp -r /corral-repl/utexas/(Your lab's project name)/Test_Folder /scratch/(Your TACC Number)/(Your Username)/
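The copy-in, process, copy-back routine can be wrapped in two small helpers. This is only a sketch: the function names stage_in and stage_out are made up, and it copies results back rather than moving them, so you would delete the scratch copy yourself once you have checked the results.

```shell
#!/usr/bin/env bash
# Copy a folder from permanent storage into a working area before an analysis.
# On Stampede2 the source would live under /corral-repl/... and the
# working area would be somewhere under $SCRATCH.
stage_in() {
    local src="$1" work_dir="$2"
    mkdir -p "$work_dir"          # create the working area if it is missing
    cp -r "$src" "$work_dir/"
}

# Copy a results folder from the working area back to permanent storage.
stage_out() {
    local results="$1" dest="$2"
    mkdir -p "$dest"
    cp -r "$results" "$dest/"
}
```

For example, stage_in followed by the Corral path of Test_Folder and "$SCRATCH" puts a copy of Test_Folder in your scratch drive.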

Interactive Sessions

You cannot just start throwing commands out on a TACC supercomputer. When you log in, you land on a node that is shared with other users, and running heavy commands there draws down shared resources quickly and hurts other people's jobs. Instead, you must ask for resources. There are two ways to do this: idev and sbatch. The first is idev.

idev

idev stands for interactive development. idev is good for first-time users and testing out jobs. It's very easy to get started. Simply type idev into the command line and hit Enter. You will get a whole bunch of information until you hit this screen:

When you start an idev session, the computer starts looking for the resources you need. The default setting of idev is to give you access to a single node and 30 minutes of time. You can adjust many settings to your needs:

idev -p normal -N 2 -n 8 -m 150

This command will open a session with 2 nodes, where you will run 8 total tasks in the normal queue for no more than 150 minutes (2.5 hours). The "normal" component (-p) means you are using the normal nodes. These are not as fast as the SKX or ICX ones (see the Stampede2 documentation). Note that you can't change your settings from inside an idev session; you must exit the session and create another one. Also note that the more time and nodes you request, the longer it will take to get those resources. Instead of seeing a few PDs (pending) for job status, you will see a lot of them before the job can run (R).

Opening Matlab, AFNI, and other software

Stampede2 is set up with a ton of software, such as Matlab and Python, so you won't have to download these. However, they don't come preloaded. You need to make them available by typing this in every session:

module load (software)

Software can be matlab, python, or whatever they have. To see all available software (some of which you might not be able to use), type:

module spider (it may take a while)

Unfortunately, AFNI isn't on TACC. However, you can download it. This can be tricky because TACC doesn't allow you to use sudo. You will likely need to contact a systems administrator to set this up. Once you have it installed, you can type

module load afni

to get started.

SBatch (now we're getting serious)

Whereas idev is good for testing out jobs, SBatch is a great method for parallelizing jobs. SBatch is really just a way to submit jobs to the scheduler, but you can submit many of them at the same time (within certain limitations)! First, you will need to download the following set of files and move them to your $WORK drive. I suggest making a folder called "TACC_Utilities" and moving all these files into it. You can move the files using the scp command from your local computer:

scp (Path to your downloaded files)/*  (your TACC username)@stampede2.tacc.utexas.edu:/(path to your TACC_Utilities Folder)/

Once these files are in the desired TACC folder, go to that folder. You will need three files to start your sbatch:

  1. The command list file
  2. The script you want to run
  3. The Launch file

Command List File

This file is going to be the list of commands you want to parallelize. Below are examples that you need to keep in your "TACC_Utilities" folder. 

Note that each line of both scripts is a "job." In other words, each line is what gets parallelized. The C shell command list will run the example script (see below) under different conditions (1-10). The Matlab command list shows you the syntax for running Matlab commands without opening its GUI. It works the same way as the C shell one, so I won't discuss it further.
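If you would rather generate a command list than type it by hand, something like the following could work. It is a sketch under assumptions: example_script.sh is a hypothetical script name, it is assumed to take the condition number as its only argument, and it is run with bash rather than C shell.

```shell
#!/usr/bin/env bash
# Write a command list file with one line (one job) per condition.
# Assumptions: the script takes the condition number as its only argument
# and is run with bash; swap in your own interpreter and script name.
make_command_list() {
    local script="$1" n_conditions="$2" out_file="$3"
    local i=1
    : > "$out_file"                         # start from an empty file
    while [ "$i" -le "$n_conditions" ]; do
        echo "bash $script $i" >> "$out_file"
        i=$(( i + 1 ))
    done
}
```

Calling make_command_list example_script.sh 10 command_list.txt writes ten lines, from "bash example_script.sh 1" through "bash example_script.sh 10", each of which becomes one parallel job.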

The Script you want to run

This is the actual script you want to run several times in parallel (you don't need to make all your jobs use the same script, but I am doing it here for simplicity). As long as your script is configured properly, you don't need to do anything special other than keep it in the TACC_Utilities folder.

The Launch File

Below is an example of the Launch file you need. This will be the only script you actually run. It controls the output of the jobs you want to parallelize. There are several parameters you need to set:

launch -N (NODES) -n (JOBS) -J (NAME OF JOB) -s (YOUR COMMAND LIST FILE) -o (THE NAME OF YOUR OUTPUT FILE) -m (YOUR EMAIL) -p normal -r (HOW LONG IT WILL TAKE TO DO ONE JOB) -A (THE NAME OF THE PI's PROJECT YOU HAVE A TACC ACCOUNT UNDER)

Tips: 

  1. In general, the number of commands you are running, the number of nodes, and the number of jobs should all be equal.
  2. Name your job something easily recognizable.
  3. Same with your output file.
  4. You don't have to give your email, but it is cool because it lets you know when your Sbatch starts and finishes.
  5. You don't have to list the node type (-p) as normal; there are other processors that are faster (e.g., ICX and SKX).
  6. Always overestimate how long it will take. If you underestimate, you end up with a bunch of unfinished scripts and wasted hours of processing.
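To follow tip 1 without counting by hand, you could size the launch command from the command list itself. The sketch below only prints the command so you can check it before running anything. The function name build_launch_cmd and every argument value are placeholders; only the flags come from the launch line above. The optional -m email flag is left out, and the runtime is passed through untouched in whatever format your Launch file expects.

```shell
#!/usr/bin/env bash
# Print (not run) a launch command sized to a command list file.
# Follows tip 1: nodes = jobs = number of commands (one command per line).
# All argument values are placeholders supplied by the caller.
build_launch_cmd() {
    local cmd_file="$1" job_name="$2" runtime="$3" project="$4"
    local n_cmds
    # Count non-empty lines, since each line of the file is one job.
    n_cmds=$(grep -c . "$cmd_file")
    echo "launch -N $n_cmds -n $n_cmds -J $job_name -s $cmd_file -o $job_name.out -p normal -r $runtime -A $project"
}
```

Given a three-line command list, the printed command would carry -N 3 -n 3, with the output file named after the job.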

GIVE THIS A TRY FOR YOURSELF! 

Once you know your script works and everything is in the right place you can cd to the TACC_Utilities folder and run the Launcher with bash Launch_jobs.sh

Happy Slurming!






