Using the stampede2 cluster

This page is a quick start for using the stampede2 cluster at TACC.

The stampede2 User Guide

For complete up-to-date information, always see: TACC's stampede2 User Guide

Logging in

Start a new terminal window. On a Mac, click the magnifying glass at the right-hand side of the menu bar at the top of the screen and type "terminal". On Windows, use PuTTY or Cygwin.

Before logging in to TACC servers, you must set up multi-factor authentication. TACC provides an overview of this process and a page where you can begin setting it up.

You will need to provide your password and a TACC token to successfully log in to stampede2 or any other TACC cluster.

ssh <my_user_name>@stampede2.tacc.utexas.edu

Setting up a profile

There are many flavors of Linux/Unix shells. The default for TACC's Linux (and most other Linux distributions) is bash (the Bourne Again SHell), which we will use throughout.

Whenever you log in via an interactive shell as you did above, a well-known script is executed by the shell to establish your favorite environment settings. We've set up a common profile for you to start with that will help you know where you are in the file system and make it easier to access some of our shared resources. To set up this profile, do the following steps after logging in:

Copy a preconfigured "profile" to use with your account



cdh
mv ~/.profile ~/.profile.old
cp /work2/projects/BioITeam/projects/courses/Core_NGS_Tools/tacc/bashrc.corengs.stampede2 ~/.profile
echo "export PATH=\$PATH:/work2/projects/BioITeam/common/bin/" >> ~/.profile
chmod 600 .profile
source .profile
ls 


The chmod 600 .profile command marks the file as readable/writable only by you. The .profile script file will not be executed unless it has these permissions settings. Note that the well-known filename is .profile (or .profile_user on some systems), which is specific to the bash shell.
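To see what chmod 600 does without touching your real profile, you can try it on a throwaway file. This is only a sketch: the file name demo_profile and the scratch directory are made up for illustration and are not part of the TACC setup.

```shell
# Work in a scratch directory so nothing in your home directory changes.
scratch=$(mktemp -d)
touch "$scratch/demo_profile"        # hypothetical stand-in for ~/.profile

# 6 = read (4) + write (2) for the owner; 0 = no access for group; 0 = no access for others.
chmod 600 "$scratch/demo_profile"

# The first 10 characters of `ls -l` are the file type and permission bits.
perms=$(ls -l "$scratch/demo_profile" | cut -c1-10)
echo "$perms"                        # prints -rw-------

rm -rf "$scratch"                    # clean up the scratch directory
```

The leading dash means "regular file", rw- is what the owner may do, and the two groups of --- show that your group and everyone else have no access.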

Notice that when you do a normal ls to list the contents of your home directory, this file doesn't appear. That's because it's a hidden "dot file": a file whose name begins with a period. To see these hidden files, use the -a (all) switch for ls.
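The difference between ls and ls -a is easy to check in an empty scratch directory. The two file names below are hypothetical and exist only for this demonstration.

```shell
scratch=$(mktemp -d)
touch "$scratch/visible.txt" "$scratch/.hidden_demo"   # hypothetical file names

plain=$(ls "$scratch")       # files whose names start with a period are skipped
all=$(ls -a "$scratch")      # everything is shown, including the entries . and ..

echo "ls shows:"
echo "$plain"
echo "ls -a shows:"
echo "$all"

rm -rf "$scratch"            # clean up the scratch directory
```

The plain listing contains only visible.txt, while the -a listing also contains .hidden_demo, plus . (the directory itself) and .. (its parent).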

Here, you also got a chance to see several routine and important Unix commands in use: ls to list files, mv to move a file to a different location or, in this case, to rename it, and cp to make a copy of an existing file. You also saw two shortcuts for specifying paths: ~, which stands for your home directory, and cdh, an alias that has already been set up to change to the home directory.
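If you would like to practice mv, cp, and ~ before using them on real files, the sketch below repeats the same rename-then-copy pattern in a scratch directory. The file name settings.txt is invented for the example.

```shell
scratch=$(mktemp -d)

echo "original settings" > "$scratch/settings.txt"         # hypothetical file
mv "$scratch/settings.txt" "$scratch/settings.txt.old"     # rename: the old name is gone
cp "$scratch/settings.txt.old" "$scratch/settings.txt"     # copy: both names now exist

listing=$(ls "$scratch")
echo "$listing"

# The shell replaces an unquoted ~ with your home directory before the command runs.
home_from_tilde=$(echo ~)
echo "$home_from_tilde"

rm -rf "$scratch"                                           # clean up
```

Because ~ is expanded by the shell itself, it works with any command that takes a path, not only with mv and cp.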

Transferring Files to and from stampede2

Obtaining the Path

It's a good idea to open 2 terminals for transferring files.

  • One logged in on stampede2 with the current directory set to where you want to transfer files to or from
  • One on your computer with the current directory set to where you want to transfer files to or from

On stampede2:

Go to the directory where you want your files to be or where you want to copy from.

To find the absolute path, which you need in order to push files to or pull files from stampede2, type:

pwd

This gives the absolute path to your directory. It might start with "/home" or "/work" depending on what directory you're in.
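To avoid retyping the path, you can have the shell assemble the remote half of the scp commands used in the following sections. This is a sketch, not an official TACC helper: my_user_name is the same placeholder used elsewhere on this page, and the variable names are made up for illustration.

```shell
# Run this on stampede2, in the directory you want to transfer to or from.
remote_user="my_user_name"                    # placeholder: replace with your TACC user name
remote_host="stampede2.tacc.utexas.edu"

# $(pwd) is replaced by the absolute path of the current directory.
remote_target="${remote_user}@${remote_host}:$(pwd)/"

echo "$remote_target"
```

Copy the printed text into the terminal on your own computer and use it as the source or destination argument of scp.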

Mac/Linux

 On a local Mac or Linux machine

Transferring files to stampede2

On your computer's side:

Go to the directory where you want to copy files from.

 scp stuff.fastq my_user_name@stampede2.tacc.utexas.edu:/home/.../

Replace the "/home/.../" with the "pwd" information obtained earlier.

This command would transfer "stuff.fastq" in your current directory to a specified directory on stampede2.

Transferring files from stampede2

On your computer's side:

Go to the directory where you want to copy files to.

 scp my_user_name@stampede2.tacc.utexas.edu:/home/.../stuff.fastq ./

Replace the "/home/..." with the "pwd" information obtained earlier.

This command would transfer "stuff.fastq" from the specified directory on stampede2 to your current directory on your computer.

Copying Directories

Sometimes you may want to transfer more than one file.

To transfer a directory and everything inside it, use the -r (recursive) option like so:

 scp -r my_folder my_user_name@stampede2.tacc.utexas.edu:/...

You can also transfer directories from stampede2 in the same manner:

 scp -r my_user_name@stampede2.tacc.utexas.edu:/home/.../my_folder ./

Windows

 On a local Windows machine

SSH Secure File Transfer (Windows) is available as part of the SSH Secure Shell client, which can be downloaded from Bevoware. You can also use WinSCP (free to download).


Modules

Modules are programs or sets of programs that have been set up to run on TACC. They make managing your computational environment very easy. All you have to do is load the modules that you need, and a lot of the advanced wizardry needed to set up the Linux environment has already been done for you. New commands just appear.

To see all modules available in the current context, and then load one of them (gatk in this example), type:

 module avail
 module load gatk

Why not load all the modules by default? In fact, you may want to have many of the modules that you encounter in later tutorials loaded automatically at login. They are not loaded by default in order to keep things lean for people simulating hurricanes, who don't want to load BioPerl every time they log in. Occasionally two modules also don't play nicely together, and you will get a message telling you to "swap" one for the other.

Since module avail only shows modules in the current context (i.e. based on your currently loaded modules), to see all possible modules use:

 module spider <freetext>

If you specify some text for <freetext>, you'll see all modules with that text anywhere in their title or description. For example, try to find the transcriptome assembler Trinity.
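A typical module session might look like the sketch below. The subcommands shown (list, spider, load, unload) are standard module commands, but the set of modules installed on stampede2 changes over time, so treat gatk as an example carried over from above rather than a guarantee. The guard at the top is an addition for illustration only: it lets the script finish cleanly on a machine, such as your laptop, that has no module command.

```shell
# Guard (illustrative addition): only run module commands where they exist.
if command -v module >/dev/null 2>&1; then
    have_module=yes
    module list              # show the modules currently loaded
    module spider trinity    # search all modules, not just the current context
    module load gatk         # make the gatk commands available
    module list              # gatk should now appear in the list
    module unload gatk       # remove it again
else
    have_module=no
    echo "No module command here; try this after logging in to stampede2."
fi
```

Running module list before and after a load is a quick way to confirm what actually changed in your environment.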

Containers

A container is a way to encapsulate an application's code with all of its dependencies so that it can be run anywhere with little or no setup. TACC uses Singularity as its container solution. BioContainers, a project that containerizes bioinformatics software with all its dependencies, is available on TACC clusters. More information about the bioinformatics tools it includes is available from the BioContainers project. To access these tools, load the biocontainers module:

 module load biocontainers

Now let's go on to look at the directory structure at stampede2.