The Quickest Unix Refresher ever
Unix Command Cheat Sheet
What's going on when you ssh into a server
SSH (secure shell) lets you connect to a different computer (a server). Once you've logged on, you're at a Linux command line in your home directory!
It looks as if you're running directly on the remote computer, but really there are two programs communicating:
your local Terminal
the remote Shell
There are many shell programs available in Linux, but the default is bash (Bourne-again shell).
Your Terminal is pretty "dumb": it just sends what you type over its encrypted SSH connection to the remote computer, then displays the text sent back by the remote computer's shell. The real work is being done on the remote computer, by executable programs called by the bash shell (also called commands, since you call them on the command line).
Structure of commands
When running scripts and software tools, all the inputs you provide to them are called arguments, parameters or options. Each tool/script can be a little different in how it takes its arguments, but typically they follow a structure like this:
command <options> inputfile > outputfile
command -option1 value1 -option2 value2 -option3 inputfile > outputfile
option1 and option2 are options that take a value.
option3 is a yes/no flag, so it does not take a value.
Simple Example: ls -la
Example: blastp -query scaffolds.fasta -db TAIR10_pep_20101214 -evalue 0.0001 -outfmt 6 -out blastp.out
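A minimal sketch of the same structure, using ls and head instead of blastp so it runs on any Linux machine. The scratch directory (from mktemp), the file name demo.txt, the seq command used to fill it, and head's -n option (number of lines) are assumptions for illustration, not part of the course material.

```shell
# Work in a throwaway directory so nothing in your home directory is touched
workdir=$(mktemp -d)
cd "$workdir"

# Create a small hypothetical input file containing the numbers 1 to 20
seq 1 20 > demo.txt

# -l and -a are yes/no flags: they take no value
ls -la

# -n takes a value (how many lines to show); > sends the output to a file
head -n 3 demo.txt > first3.txt
cat first3.txt
```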
If you need to find out the options for a command, try any of these (again each tool is different):
command -h
command --help
command -help
command (with no arguments)
Exercise 1
What is the difference between typing just ls (then Enter), and ls -a (then Enter)?
What do you see when you enter ls -l?
How can you tell which entries are files and which are directories?
Use the help options to figure out how to use a command: identify how to construct the blastp command. Identify an option that requires an input value and an option that doesn't.
Command input errors
You don't always type in commands, options and arguments correctly – you can misspell a command name, forget to type a space, specify an unsupported option or a non-existent file, or make all kinds of other mistakes.
What happens? The shell attempts to guess what kind of error it is and reports an appropriate error message as best it can.
Some examples:
# You try to use an unsupported option
student30@gsafcomp02:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.
# You mis-type a command name, or a command not installed on your system
student30@gsafcomp02:~$ catt
...
Command 'catt' not found...
# You specify the name of a file that does not exist
student30@gsafcomp02:~$ ls xxx
ls: cannot access 'xxx': No such file or directory
# You try to access a file or directory you don't have permissions for
student30@gsafcomp02:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied
Always use tab to complete file names/command names (sometimes you may need to double tab) to avoid a lot of these spelling errors.
Basic Linux commands you should know like breathing air
For accessing directories
ls - list the contents of the current directory.
pwd - print the present working directory, i.e. tell you where you currently are. The output looks something like /home/myID. Just like on most computer systems, this is a location in the tree of the file system structure, also called a "path".
rm <file> - delete a file. This is permanent, not a "trash can" deletion.
mkdir <dirname> and rmdir <dirname> - make and remove the directory "dirname". rmdir only removes empty directories; "rm -r <dirname>" will remove the directory and everything in it.
cd <whereto> - change the present working directory to <whereto>. You will need to provide a path like /work/myID to change to that directory. Some special <wheretos>:
.. (period, period) means "up one level".
. (period) means "current directory".
~ (tilde) means "my home directory".
~myfriend (tilde, "myfriend") means "myfriend's home directory".
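A short session tying these commands together. To keep it safe, a scratch directory created by mktemp (not covered above) stands in for your home directory, and unix_practice is a hypothetical directory name.

```shell
# A throwaway directory stands in for your home directory here
base=$(mktemp -d)
cd "$base"
pwd                      # prints the path of the current directory

mkdir unix_practice      # make a new, empty directory
cd unix_practice
pwd                      # same path as before, now ending in /unix_practice

cd ..                    # .. means "up one level"
pwd                      # back in $base

rmdir unix_practice      # succeeds only because the directory is empty
ls -a                    # only . and .. remain
```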
Detour: Wildcards and special file names
The shell has shorthand to refer to groups of files by allowing wildcards in file names. * (asterisk) is the most common; it is a wildcard meaning "any length of any characters". Another useful one is [<characters>], which matches any single character in the set <characters>.
For example: ls *.bam lists all files in the current directory that end in .bam
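A small sketch of both wildcards. The empty files are hypothetical and are created with touch, which is not covered above; in a real project they would be your alignment files.

```shell
dir=$(mktemp -d)
cd "$dir"

# Hypothetical empty files with different extensions
touch sampleA.bam sampleB.bam sampleC.sam notes.txt

ls *.bam            # lists sampleA.bam and sampleB.bam
ls sample[AC].*     # lists sampleA.bam and sampleC.sam
```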
Three special file names:
. (single period) means "this directory". So ls -l . means "list contents of this current directory".
.. (two periods) means "directory above current". So ls -l .. means "list contents of the parent directory".
~ (tilde) means "my home directory". So ls -l ~ means "list contents of my home directory".
Exercise 2
List contents of your home directory and change to the my_rnaseq_course/partA directory. List contents of the current directory. List the contents of the directory above.
For copying/moving files
cp <source> <destination> - copy the file source to the location and/or file name destination. Using . (period) as the destination means "here, with the same name".
cp -r <dirname> <destination> - recursively copy the directory dirname and all its contents to the directory destination.
scp <user>@<host>:<source> <destination> - works just like cp, but copies source from user's directory on the remote machine host to the local file destination.
mv <source> <destination> - move the file source to the location and/or file name destination.
wget <url> - fetch a file from a valid URL. It's not that common, but we'll use wget to pull data from one of TACC's web-based storage devices.
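A local sketch of cp, cp -r and mv (scp and wget are left out because they need a remote host). The directory and file names are hypothetical, and everything happens inside a mktemp scratch directory.

```shell
dir=$(mktemp -d)
cd "$dir"
mkdir raw_data backup
echo "ACGT" > raw_data/reads.txt      # a tiny hypothetical data file

cp raw_data/reads.txt .               # "." = here, keeping the name reads.txt
cp -r raw_data backup/                # copies the directory and its contents
mv reads.txt reads_copy.txt           # mv to a new name renames the file

ls . backup/raw_data
```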
For looking into files
nano - the text editor we'll be using to edit files.
head <file> and tail <file> - show you the top or bottom 10 lines of <file>.
more <file> and less <file> - both display the contents of <file> in nice ways. Use man less to figure out how to navigate and search when using less.
file <file> - tell you what kind of file <file> is.
cat <file> - output all the contents of <file>. CAUTION: only use on small files.
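A quick way to see what head, tail and cat do is to try them on a file whose line numbers are its contents. The file numbers.txt is hypothetical and is generated with seq, which is not covered above.

```shell
dir=$(mktemp -d)
seq 1 100 > "$dir/numbers.txt"     # a hypothetical 100-line file

head "$dir/numbers.txt"            # lines 1 to 10
tail "$dir/numbers.txt"            # lines 91 to 100
cat "$dir/numbers.txt"             # all 100 lines: fine here, too much for a big fastq
```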
Other miscellaneous
df - show the file systems on the system you're working on, along with how much disk space is available on each.
man <unixcommand> - display the manual page for a unix command.
> - redirect STDOUT (normal output) to a file; use 2> to redirect STDERR (error messages).
| (pipe) - chain together multiple commands so that the output of the first command becomes the input of the second.
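A minimal sketch of >, 2> and | side by side. The .fastq files are hypothetical empty files made with touch, and missing_file deliberately does not exist so that ls writes an error message.

```shell
dir=$(mktemp -d)
cd "$dir"
touch a.fastq b.fastq                 # hypothetical empty files

# > sends STDOUT (the normal output) to a file
ls *.fastq > listing.txt

# 2> sends STDERR (the error message) to a different file;
# the echo runs because ls reports failure for the missing file
ls missing_file 2> errors.txt || echo "ls failed, see errors.txt"

# | makes the output of ls the input of head
ls *.fastq | head -n 1                # prints a.fastq
```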
The concept of PATH
On a Unix command line, you can refer to a file by its name alone only if it is in your current working directory. If you change to my_rnaseq_course/partA/fastqc_exercise and issue these commands:
cdh
cd my_rnaseq_course/partA/fastqc_exercise
less Sample1_R1.fastq
this will work only if Sample1_R1.fastq is located in the current directory.
To access files outside your current directory, you can provide the absolute path or the relative path to the file. If the file is actually located in
/stor/home/daras/my_rnaseq_course/partA/fastqc_exercise/data, then you can open it using either of these two commands:
less /stor/home/daras/my_rnaseq_course/partA/fastqc_exercise/data/Sample1_R1.fastq
****your home directory path is different from mine*****
(or)
less data/Sample1_R1.fastq
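A sketch that rebuilds a layout like the one above in a scratch directory, so you can watch the bare name fail and both kinds of path succeed. The file contents are a placeholder, head is used instead of less so the output is not paged, and mkdir -p (which also creates missing parent directories) is not covered above.

```shell
# Rebuild a hypothetical layout like the one above in a scratch directory
top=$(mktemp -d)
mkdir -p "$top/fastqc_exercise/data"
echo "@placeholder_read" > "$top/fastqc_exercise/data/Sample1_R1.fastq"

cd "$top/fastqc_exercise"

# Bare file name: fails, because the file is not in the current directory
head Sample1_R1.fastq || echo "Sample1_R1.fastq is not in $(pwd)"

# Relative path: interpreted starting from the current directory
head data/Sample1_R1.fastq

# Absolute path: starts with /, so it works from any directory
head "$top/fastqc_exercise/data/Sample1_R1.fastq"
```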
Exception: if a file (most often an executable) is in a directory listed in your shell environment variable called PATH, you can run it from anywhere without specifying where it is.
Use echo $PATH to see what is in your PATH.
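A sketch of how PATH decides what you can run by name. The script my_tool and its scratch directory are hypothetical; chmod +x (make the file executable) and command -v (report which file the shell found) are standard but are not covered above. The export only lasts for the current session.

```shell
# A hypothetical personal bin directory containing a tiny script
mybin=$(mktemp -d)
printf '#!/bin/sh\necho "hello from my_tool"\n' > "$mybin/my_tool"
chmod +x "$mybin/my_tool"

echo "$PATH"                    # the directories searched, separated by colons

my_tool 2>/dev/null || echo "my_tool: not in PATH yet"

export PATH="$mybin:$PATH"      # put the directory at the front of PATH
my_tool                         # now runs from any directory
command -v my_tool              # shows the full path of the file that was found
```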
Always use tab to complete file names (sometimes may need to double tab). If the file is in the current directory or is in your PATH, tab will do auto complete.
Always check where you are and what files are there in a Unix environment!
.bash_profile, .profile files
These are startup scripts that bash runs every time you start an interactive login session (for example, when you log in with ssh). You can put any command in them that you could type at the command prompt. Put commands here to set up your particular environment and to customize things to your preferences (such as paths, aliases, and modules to load).
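A sketch of the kind of lines you might put in ~/.bash_profile. The ~/bin directory and the ll alias are hypothetical examples, and the module line is left commented out because module names depend on the system. Changes take effect at your next login, or after running source ~/.bash_profile.

```shell
# Hypothetical lines for ~/.bash_profile

# Add a personal bin directory to the front of your PATH
export PATH="$HOME/bin:$PATH"

# A shortcut: in an interactive shell, typing ll will run ls -la
alias ll='ls -la'

# On TACC systems you could also load software modules here, e.g.
# module load <modulename>
```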
File Editors
There are a number of options for editing files at TACC. These fall into three categories:
Linux text editors installed at TACC (nano, vi, emacs). These run in your terminal window. vi and emacs are extremely powerful but also quite complex, so nano may be the best choice as a first local text editor.
Text editors or IDEs that run on your local computer but have an SFTP (secure FTP) interface that lets you connect to a remote computer (Notepad++ or Komodo Edit). Once you connect to the remote host, you can navigate its directory structure and edit files. When you open a file, its contents are brought over the network into the text editor's edit window, then saved back when you save the file.
Software that will allow you to mount your home directory on TACC as if it were a normal disk e.g. MacFuse/MacFusion for Mac, or ExpanDrive for Windows or Mac ($$, but free trial). Then, you can use any text editor to open files and copy them to your computer with the usual drag-drop.
As we will be using nano throughout the class, it is a good idea to review some of the basics. nano is a very simple editor available on most Linux systems. If you are able to use ssh, you can use nano. To invoke it, just type:
nano (or)
nano <filename>
You'll see a short menu of operations at the bottom of the terminal window. The most important are:
ctl-o - write out the file
ctl-x - exit nano
You can just type in text, and navigate around using arrow keys. A couple of other navigation shortcuts:
ctl-a - go to start of line
ctl-e - go to end of line
Be careful with long lines – sometimes nano will split long lines into more than one line, which can cause problems in our commands files.
Naming Files
Try to find a convention and stick to it when naming files and directories. But, most importantly:
Case matters: directory named BioITeam is different from directory named bioiteam.
Do not use white spaces in file names: you may be tempted to name your directory my raw data. Such naming makes sense when you are looking at the directory visually in Mac Finder or Windows Explorer, but on the command line a space separates one argument from the next. So mkdir my raw data will actually make 3 directories: my, raw, and data. Use capital letters or underscores instead of white spaces, like MyRawData or my_raw_data.
Be careful with special characters: typically, underscores, dashes and periods are OK in file names, but avoid punctuation and other special characters. A directory called sarah's raw data would be a bad idea.
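You can see the white-space problem for yourself in a scratch directory (created with mktemp, which is not covered above); the directory names are the hypothetical ones from the text.

```shell
dir=$(mktemp -d)
cd "$dir"

mkdir my raw data        # spaces separate arguments: makes 3 directories
ls                       # lists data, my and raw

mkdir my_raw_data        # one directory, as intended
ls
```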
Exercise 3
How many fastq files are in /stor/scratch/Courses/rnaseq_course/partA/fastqc_exercise/data and what are their sizes?
Change to the directory that contains the fastq files.
We want to run fastqc on these fastq files. Figure out if this tool (fastqc) is installed and in your path. Figure out what options you need to use to run this tool.
Create a file called commands.fastqc, put the fastqc commands in it, then save and close the file.