Linux and Lonestar 5 Setup -- GVA2020
- 1 Overview:
- 2 Objectives:
- 3 Example things you will encounter in the course:
- 4 Tutorial:
- 4.1.1.1 Standard output
- 4.1.1.2 Standard output plus hidden files
- 4.1.1.3 Standard output plus hidden files in a single column
- 4.1.1.4 Standard output plus hidden files in a single column with additional information
- 4.1.1.5 Standard output plus hidden files in a single column
- 4.1.1.6 Creating a shortcut to the main Lonestar working directories
- 4.1.1.7 Print the contents of the .profile file to the screen
- 4.1.1.8 How to start the nano text editor
- 4.1.1.9 Redirecting STDOUT
- 4.1.1.10 Example move (mv) commands to rename the file something better
- 4.1.2 Diagram of Lonestar5 directories: What connects to what, how fast, and for how long.
- 4.1.3 Understanding "jobs" and compute nodes.
- 4.2 Running a job
- 4.3 Interrogating the launcher queue
- 4.4 Evaluating your first job submission
- 4.5 Transferring files to and from lonestar with scp
- 4.6 scp tutorial page.
- 4.7 Moving beyond the preinstalled commands on TACC
- 4.7.1 TACC modules
- 4.7.2 Downloading from the web directly to tacc
- 4.7.2.1 wget alternative
- 4.7.3 Github
- 4.7.4 pip
- 4.8 This concludes the linux and lonestar refresher/introduction tutorial.
Overview:
This portion of the class is devoted to making sure we are all starting from the same starting point on lonestar. This tutorial was developed as a combined version of multiple other tutorials which were previously given credit here. Anyone wishing to use this tutorial is welcome.
This is probably the longest tutorial in the entire class. It is designed to take between 1/2 and 3/4 of the first class. Do not stress if you feel people are moving through it faster than you are, or if you do not get it done before the next presentation. There will be links back to this tutorial from other tutorials as needed, and by the 2nd half of Wednesday's class when we start with the specialized tutorials, you can circle back to this tutorial as well.
Objectives:
Familiarize yourself with the way course material will be presented.
Log into lonestar5.
Change your lonestar profile to the course specific format.
Refresh understanding of basic linux commands with some course organization.
Review use of the nano text editor program, and become familiar with several other text editor programs.
Example things you will encounter in the course:
As this is the first real tutorial you are encountering in this course, some housekeeping matters to familiarize you with how information will be presented.
Code blocks
There will be 4 types of code blocks used throughout this class. Text inside of code blocks represents at least 1 possible correct answer, and should either be typed EXACTLY into the terminal window as it is, or copied and pasted. The notable exception is that text between <> symbols represents something that you need to replace before sending it to the terminal. Yes, the <> marks themselves also need to be replaced. We try to put informative text within the brackets so you know what to replace it with. If you are ever unsure of what to replace the <> text with, just ask.
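To make the <> rule concrete, here is a small sketch run in a throwaway directory. The folder name my_first_folder is made up for illustration; it is not something the course asks you to create.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

# A tutorial line might read:   mkdir <new_folder_name>
# Replace the whole placeholder, brackets included. The shell treats
# bare < and > as redirection operators, so they must not be typed.
mkdir my_first_folder    # "my_first_folder" is a made-up example name

ls                       # lists: my_first_folder
```

If you typed the brackets literally, the shell would try to read from and write to files instead of passing the name to mkdir, which is why the marks have to go.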
Visible
These are code blocks that you would have no idea what to type without help. (like when a new command is being introduced)
These will typically be associated with longer/more detailed text above the text box explaining things.
An example code block showing you the command you need to type into the prompt to list what directory you are currently in:
pwd
Hinted
These are code blocks that you can probably figure out what to type with a hint that goes beyond what the tutorial is requesting. Access the hint by clicking the triangle or hint hyperlink text.
These exist to force you to think about what command you need, and hopefully make some connections to help you remember what you will need to type in the future.
These should all come with additional explanation as to what is going on.
Rather than just expanding these by reflex, I strongly suggest seeing if you can figure out what the command will be, and then checking your work.
Example:
Hidden:
These code blocks represent things that you should have seen several times already, or things that can be succinctly explained.
Example:
use the pwd command to print your current working directory
pwd
Speed bump:
This combines the previous 2 types to deliberately slow you down and be cumbersome.
If you find yourself consistently wrong about what eventually shows up in the text box, slow down, step back, think about what's going on, and consider asking a question.
These should only come after you have seen the same (or very similar) commands in the other formats previously
Example:
Warnings
Why do the tutorials have warnings?
Warnings exist for 2 reasons:
Something you are about to do can have a negative impact on you.
You saw an example of this when we talked about paying attention to warnings when using ssh to access new remote computers.
Something you are about to do can have a negative impact on others.
This will be related mostly to the use of "idev" sessions beginning tomorrow.
Info boxes
These are used to give more general background about things
These are somewhat new this year, and feedback on them is welcomed.
Tip boxes
Things I wish I knew sooner
As an example: On the command line, you can use the tab key to try to autofill the "rest" of whatever you are typing, whether it is the name of a directory, a long file name, or even a command. Hitting tab twice will list all possible matches to whatever you have already typed when there are multiple different possibilities.
Tutorial:
Logging into lonestar5
Hopefully you were able to log into ls5 last week as part of the pre-class assignment. If not, make sure the instructor is aware, as there are additional elements that still need to be addressed (potentially adding you to the project allocation and definitely adding you to the reservation that we will use starting tomorrow).
When prompted enter your password, and digital security code from the app, and answer "yes" to the security question if you see one.
As a reminder, the ssh command, and how to launch the programs that give you a prompt to type it into, were provided as part of the pre-class assignment. Convenient links in case you need them or want to refresh your memory:
Setting up your lonestar profile
There are many flavors of Linux/Unix shells. The default for TACC's Linux (and most other Linuxes) is bash (the Bourne Again SHell), which we will use throughout.
Whenever you login via an interactive shell as you did above, a well-known script is executed by the shell to establish your favorite environment settings. I've set up a common profile for you to start with that will help you know where you are in the file system and make it easier to access some of our shared resources. If you already have a profile set up on lonestar that you like, we want to make sure that we don't destroy it but it is critical to make sure that we change it temporarily so everyone is working from the same place through the class. Use the ls command to check if you have a profile already set up in your home directory.
If you already have a .profile or .bashrc file, use the mv command to change the name to something descriptive (for example ".profile_pre_GVA_backup"). Otherwise continue to creating new files.
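As a sketch of what the check-and-rename step looks like, the commands below practice on a temporary stand-in for a home directory rather than your real one. The backup name for .bashrc is my own choice, modeled on the .profile example above; on lonestar you would run the ls -a and mv lines from your actual home directory.

```shell
# Practice in a throwaway stand-in for your home directory.
practice_home=$(mktemp -d)
cd "$practice_home"
touch .profile .bashrc            # pretend these are your existing settings

ls -a                             # -a is needed to see files starting with "."

# Rename rather than delete, so the old settings can be restored later.
mv .profile .profile_pre_GVA_backup
mv .bashrc  .bashrc_pre_GVA_backup    # hypothetical backup name

ls -a                             # only the *_pre_GVA_backup names remain
```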
A warning about deleting files
Most of us are used to having an 'undo' button, trash/recycling collection of deleted files, or warnings when we tell a computer to do something that can't be undone. The command line offers none of these options. In extreme situations on TACC, you can use the help desk ticket system to recover a deleted file, but there is no guarantee files can be recovered under normal circumstances (we will cover exceptions to this later).
The specific warning right now is that if you have an existing profile, and have not done the above commands correctly, you will not be able to recover your existing profile. Thus this is a great opportunity to interact with your instructor and make 100% sure the above steps have been correctly performed. Type ls -al onto the command line and then share your screen on zoom if you are not sure.
Now that we have backed up your profiles so you won't lose any previous settings, you can copy our predefined GVA2020.bashrc file from the /corral-repl/utexas/BioITeam/scripts/ folder to your $HOME folder as .bashrc, and the predefined GVA2020.profile from the same location as .profile. Then use the chmod command to change the permissions to read, write, and execute for the user only.
The chmod 700 <FILE> command marks the file as readable/writable/executable only by you. The .bashrc script file will not be executed unless it has these permissions settings.
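The copy-then-chmod pattern can be tried safely in a temporary directory. In this sketch the folders shared_scripts and home are made-up stand-ins for /corral-repl/utexas/BioITeam/scripts/ and $HOME, and the file contents are a placeholder, not the real course profile.

```shell
practice=$(mktemp -d)
cd "$practice"
mkdir shared_scripts home                      # hypothetical stand-in folders
echo 'echo "profile loaded"' > shared_scripts/GVA2020.bashrc   # placeholder contents

# Copy the shared file into "home" under its new dot-file name.
cp shared_scripts/GVA2020.bashrc home/.bashrc

# 700 = read/write/execute for the owner, nothing for group or others.
chmod 700 home/.bashrc

# The first 10 characters of ls -l show the file type and permission bits.
perms=$(ls -l home/.bashrc | cut -c1-10)
echo "$perms"                                  # -rwx------
```

On lonestar the same two steps would use the real source path and $HOME as the destination, once for .bashrc and once for .profile.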
Understanding why some files start with a "."
In the above code box, you see that the names start with a ".". When a filename starts with a ".", it conveys a special meaning to the operating system/command line. Specifically, it prevents that file from being displayed when you use the ls command unless you specifically ask for hidden files to be displayed using the -a option. Such files are termed "dot-files" if you are interested in researching them further.
Let's look at a few different ways we will use the ls command throughout the course. Compare the output of the following 4 commands:
Standard output
ls #ignore everything that comes after the # mark. There is a problem on this wiki page, but things after a # won't affect commands
Standard output plus hidden files
ls -a
Standard output plus hidden files in a single column
ls -a -1
Standard output plus hidden files in a single column with additional information
ls -a -l
Throughout the course you will notice that many options are supplied to commands via a single dash immediately followed by a single letter. Usually when you have multiple options supplied in this manner you can combine all the letters after a single dash to make things easier/faster to type. Experiment a little to prove to yourself that the following 2 commands give the same output.
Standard output plus hidden files in a single column
ls -a -l
ls -al
While knowing that you can combine options in this way helps you analyze data faster/better, the real value comes from being able to decipher commands you come across on help forums, or in publications.
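If you would rather let the computer do the comparison, this minimal sketch captures the output of both forms in a throwaway directory and checks that they match. The file names are invented for the demonstration.

```shell
practice=$(mktemp -d)
cd "$practice"
touch visible_file .hidden_file      # made-up files, one of them hidden

separate=$(ls -a -l)                 # options given one at a time
combined=$(ls -al)                   # same options behind a single dash

if [ "$separate" = "$combined" ]; then
    echo "same output"
else
    echo "different output"
fi
```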
For ls specifically the following association table is worth making note of, but if you want the 'official' names consider using the man command to bring up the ls manual.
Getting back to your profile... Since .bashrc is executed when you login, to ensure it is set up properly you should first logout:
then log back in:
If everything is working correctly you should now see this as your prompt:
tacc:~$
If you see anything besides "tacc:~$", get my attention and be ready to share your screen rather than continuing forward.
Setting up other shortcuts:
In order to make navigating to the different file systems on lonestar a little easier ($SCRATCH and $WORK), you can set up some shortcuts with these commands that create folders that "link" to those locations. Run these commands when logged into lonestar with a terminal, from your home directory.
Creating a shortcut to the main Lonestar working directories
cdh
ln -s $SCRATCH scratch
ln -s $WORK work
ln -s $BI BioITeam
Several people report seeing an error message stating "ln: failed to create symbolic link 'BioITeam/BioITeam': Permission denied." This is being investigated, but is not expected to impact today's tutorial.
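To see what ln -s actually creates, here is a sketch in a temporary directory. The names far_away_storage and scratch_demo are hypothetical stand-ins for the real $SCRATCH location and the scratch shortcut.

```shell
practice=$(mktemp -d)
cd "$practice"
mkdir far_away_storage                    # stands in for $SCRATCH
touch far_away_storage/data.txt

# Create a link named scratch_demo that points at the directory.
ln -s "$practice/far_away_storage" scratch_demo

ls -l scratch_demo     # shows the link itself and where it points (->)
ls scratch_demo        # follows the link and lists: data.txt
```

One plausible explanation for the "BioITeam/BioITeam" message above (an assumption on my part, not something confirmed here) is that a BioITeam link already existed: when the link name already points at a directory, ln tries to create the new link inside that directory instead.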
Understanding what your .bashrc file actually does.
Editing files
There are a number of options for editing files at TACC. These fall into three categories:
Linux text editors installed at TACC (nano, vi, emacs). These run in your terminal window. vi and emacs are extremely powerful but also quite complex, so nano is the best choice as a first local text editor. If you are already familiar with one of the other programs you are welcome to continue using it.
Text editors or IDEs that run on your local computer but have an SFTP (secure FTP) interface that lets you connect to a remote computer (Notepad++ or Komodo Edit). Once you connect to the remote host, you can navigate its directory structure and edit files. When you open a file, its contents are brought over the network into the text editor's edit window, then saved back when you save the file.
Software that will allow you to mount your home directory on TACC as if it were a normal disk e.g. MacFuse/MacFusion for Mac, or ExpanDrive for Windows or Mac ($$, but free trial). Then, you can use any text editor to open files and copy them to your computer with the usual drag-drop.
We'll go over nano together in class, but you may find these other options more useful for your day-to-day work so feel free to go over these sections in your free time to familiarize yourself with their workings to see if one is better for you.
As we will be using nano throughout the class, it is a good idea to review some of the basics. nano is a very simple editor available on most Linux systems. If you are able to use ssh, you can use nano. To invoke it, just type:
How to start the nano text editor
nano
You'll see a short menu of operations at the bottom of the terminal window. The most important are:
ctl-o - write out the file
ctl-x - exit nano
You can just type in text, and navigate around using arrow keys. A couple of other navigation shortcuts:
ctl-a - go to start of line
ctl-e - go to end of line
Be careful with long lines – sometimes nano will split long lines into more than one line, which can cause problems in our commands files, and if you copy paste code into a nano editor.
What can you do to see contents of a file without opening it for editing?
Note that all of the above state that it is bad to view binary files. Binary files exist for computers to read, not humans, and are thus best ignored. We'll go over this in more detail as well as some conversion steps when we deal with .sam and .bam files later in the course.
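As a sketch, these are a few standard commands for looking at a text file without opening an editor, tried on a small made-up file called notes.txt.

```shell
practice=$(mktemp -d)
cd "$practice"
printf 'line %s\n' 1 2 3 4 5 6 > notes.txt   # made-up 6-line text file

cat notes.txt          # print the whole file to the screen
head -n 2 notes.txt    # print only the first 2 lines
tail -n 2 notes.txt    # print only the last 2 lines
wc -l notes.txt        # count the lines instead of printing them

# less notes.txt       # interactive pager for long files; press q to quit
```

For large files, head, tail, and less are usually kinder to your terminal than cat, since cat prints everything at once.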
How should we name files and folders?
In general you will want to adopt a consistent pattern of naming, and it should be your own and something that makes sense to you. After that there are some tips:
The most important thing to get used to is the convention of using ".", "_", or capitalizing the first letter of each word instead of putting spaces in names, and limiting your use of any other punctuation. Spaces are great for mac and windows folder names when you are using visual interfaces, but on the command line, a space is a signal to start doing something different. Imagine that instead of a BioITeam folder you wanted something a little easier to read and called it "Bio I Team". Certainly everyone would agree it's easier to read that way, but because of the spaces, bash will think you want to create 3 folders: 1 named Bio, another named I, and a third named Team. This is certainly behavior you can use to your advantage when appropriate, but generally speaking spaces will not be your friend. Early on in my computational learning I was told "A computer will always do exactly what you told it to do. The trick is telling it to do what you want it to do".
Name things something that makes it obvious to you what the contents are, not just today but next week, next month, and next year, even if you don't touch it for weeks, months, or years.
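The "Bio I Team" situation described in the first tip can be tried safely in a throwaway directory. This is only a sketch; the quoted form is shown to illustrate what it takes to get a name with spaces, not as a recommendation.

```shell
practice=$(mktemp -d)
cd "$practice"

# Unquoted spaces separate arguments, so this makes THREE folders:
# Bio, I, and Team.
mkdir Bio I Team
ls | wc -l                # counts the three folders just created

# Quoting is what it takes to get ONE folder with spaces in its name...
mkdir "Bio I Team"

# ...while underscores (or BioITeam-style capitals) need no quoting at all.
mkdir Bio_I_Team

ls -1
```

Every later command that touches the quoted folder would need the quotes again, which is the practical reason to avoid spaces.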
Stringing commands together and controlling their output
In a linux shell, it is often useful to take the output of one command and save it to a new file rather than having it print to the screen. Linux describes this with a familiar metaphor: "pipes". The linux operating system expects some "standard input pipe" and gives output back through a "standard output pipe". These are called "stdin" and "stdout" in linux. There's also a special "stderr" for errors; we'll ignore that for now. Usually, your shell is filling the operating system's stdin with stuff you type - the commands with options. The shell passes responses back from those commands to stdout, which the shell usually dumps to your screen. The ability to switch stdin and stdout around is one of the key reasons linux has existed for decades and beat out many other operating systems. Let's start making use of this. Change to the scratch directory, make a new folder called "piping", and put a list of the full contents of the $BI folder into a new file called whatisHere.
Redirecting STDOUT
cds
mkdir piping
ls -1 $BI > whatisHere
cat whatisHere
When you execute the ls -1 $BI > whatisHere command, you should have noticed nothing happened on the screen, and when you cat the whatisHere file, you should notice the output you would have expected from the ls -1 $BI command. Often it is useful to chain commands together using the output of the first command as the input of the second command. Commands are chained together using the "|" character (shift + \, the key above the return key). Use redirection to put the first 2 lines of the $BI directory contents into the whatisHere file.
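Here is one possible way to combine a pipe with a redirect, sketched against a made-up folder (shared_folder, standing in for $BI) so it can be run anywhere. Try the exercise yourself before reading it.

```shell
practice=$(mktemp -d)
cd "$practice"
mkdir shared_folder                               # hypothetical stand-in for $BI
touch shared_folder/alpha shared_folder/beta shared_folder/gamma

# Full listing goes into the file; nothing appears on the screen.
ls -1 shared_folder > whatisHere

# Pipe the listing into head, keep 2 lines, and overwrite the file.
ls -1 shared_folder | head -n 2 > whatisHere

cat whatisHere                                    # prints alpha, then beta
```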
Again, you should see your answer only showing up after the cat command. Note that by using a single > you are overwriting the existing contents. This is now your second warning that there is no warning that a file is about to be deleted. Also remember linux doesn't have the "undo" feature or trash/recycle bin functionality you may be used to from mac/windows. We will make use of the redirect output (stdout) character (>) and the "pass output along as input" character (|) throughout the course. Not all shells are equal - the bash shell lets you redirect stdout with either > or 1>; stderr can be redirected with 2>; you can redirect both stdout and stderr using &>. If these don't work, use Google to try to figure it out. The web site Stack Overflow is a usually trustworthy and well annotated site for OS and shell help.
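The different redirect forms are easier to tell apart with a command that writes to both streams at once. In this sketch, ls is asked for one file that exists and one that does not; all file names are invented for the demonstration.

```shell
practice=$(mktemp -d)
cd "$practice"
touch real_file                     # exists; missing_file deliberately does not

# stdout (the name that was found) and stderr (the complaint) go to
# separate files. "|| true" just ignores the non-zero exit status from ls.
ls real_file missing_file > out.txt 2> err.txt || true

# 1> is simply the explicit spelling of >
ls real_file missing_file 1> out_explicit.txt 2> /dev/null || true

# &> is a bash feature, so this line is run through bash explicitly.
if command -v bash >/dev/null 2>&1; then
    bash -c 'ls real_file missing_file &> both.txt' || true
fi

cat out.txt     # real_file
cat err.txt     # the "No such file" style error message from ls
```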
Understanding TACC
Now that we've been using lonestar for a little bit, and have it behaving in a way that is a little more useful to us, let's get more of a functional understanding of what exactly it is and how it works.
Diagram of Lonestar5 directories: What connects to what, how fast, and for how long.
Lonestar is a collection of 1,252 computers, each with 24 cores, connected to three file servers, each with unique characteristics. You need to understand the file servers to know how to use them effectively.
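To put those numbers in perspective, a quick bit of shell arithmetic gives the total core count implied by the figures above (assuming every computer has the same 24 cores).

```shell
nodes=1252                # computers in the cluster, from the text above
cores_per_node=24         # cores in each computer

total_cores=$((nodes * cores_per_node))
echo "$total_cores"       # 30048
```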
| | $HOME | $WORK | $SCRATCH |
|---|---|---|---|
Purged? |