The Quickest Unix Refresher ever

Unix Command Cheat Sheet

What's going on when you ssh into a server

SSH (secure shell) lets you connect to a different computer (server). Once you've logged on, you're at a Linux command line in your home directory!

It looks as if you're running directly on the remote computer, but really there are two programs communicating:

  1. your local Terminal

  2. the remote Shell

There are many shell programs available in Linux, but the default is bash (Bourne-again shell).

Your Terminal is pretty "dumb": it just sends what you type over its encrypted SSH connection to the remote computer, then displays the text sent back by the remote computer's shell. The real work is done on the remote computer, by executable programs called by the bash shell (also called commands, since you call them on the command line).

Structure of commands

When you run scripts and software tools, the inputs you provide are called arguments, parameters, or options. Each tool or script can differ a little in how it takes its arguments, but typically they follow this structure:

command <options>  inputfile > outputfile

command  -option1 value1 -option2 value2 -option3 inputfile > outputfile

  • option1 and option2 are options that take a value.

  • option3 is a yes or no flag, so it does not take a value.

Simple Example: ls -la

Example: blastp -query scaffolds.fasta -db TAIR10_pep_20101214 -evalue 0.0001 -outfmt 6 -out blastp.out
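To make the difference between value-taking options and flags concrete, here is a small sketch using sort, which is available on essentially every Linux system. The file names and the sample counts are made up for illustration, and the scratch directory from mktemp -d is just a safe place to experiment.

```shell
# Work in a scratch directory so none of your real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A small comma-separated input file (made-up sample names and read counts).
printf 'sampleA,30\nsampleB,4\nsampleC,125\n' > counts.csv

# -t and -k each take a value (the field separator and the column to sort on).
# -n (numeric sort) and -r (reverse order) are yes/no flags, so they take no value.
# counts.csv is the input file; > sends the output to sorted_counts.csv.
sort -t , -k 2 -n -r counts.csv > sorted_counts.csv

# Show the result: the largest count comes first.
cat sorted_counts.csv
```

The same pattern (command, options, input file, redirected output) applies to the blastp example, only with different option names.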

If you need to find out the options for a command, try any of these (again each tool is different):

  • command -h

  • command --help

  • command -help

  • command

Exercise 1

What is the difference between typing just ls (then Enter), and ls -a (then Enter)?

ls by itself displays names of the files and sub-directories in your current Home directory, other than dot files.

data docs haiku.txt jabberwocky.txt mobydick.txt

ls -a shows all files, including dot files whose names start with a period ( . ) which are normally not listed

. .bash_logout data haiku.txt .local .profile .. .bashrc docs jabberwocky.txt mobydick.txt .zfs

What do you see when you enter ls -l?

A long listing of the files and sub-directories in your current Home directory, other than dot files.

drwxr-x--- 3 student30 CCBB_Workshops_1     7 May  2 15:54 data
drwxr-x--- 2 student30 CCBB_Workshops_1     4 Apr 25 14:34 docs
-rw-r----- 1 student30 CCBB_Workshops_1   218 Sep 22  2015 haiku.txt
-rw-r----- 1 student30 CCBB_Workshops_1   992 Sep 22  2015 jabberwocky.txt
-rw-r----- 1 student30 CCBB_Workshops_1 12319 Apr  9  2014 mobydick.txt

How can you tell which entries are files and which are directories?

Because of the coloring ls applies to its output, directories are a different color (e.g. yellow or blue) than files (white).

Also, in a long listing, the left-most permissions display will start with a "d" for directories (e.g. drwxrwx---)
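If your terminal does not show colors, two other ways to tell files from directories can be sketched as follows. The directory and file names mirror the listing above but are created fresh in a scratch directory, so this is only an illustration.

```shell
# Build a tiny example layout in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir data docs
touch haiku.txt

# -p appends a slash to every directory name, so directories stand out without colors.
listing=$(ls -p)
echo "$listing"

# -l gives the long listing; -d lists a directory itself rather than its contents.
# The first character of each line is d for a directory and - for a regular file.
ls -ld data haiku.txt
```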

Using help to figure out how to use a command: identify how to construct the blastp command. Identify an option that requires an input value and an option that doesn't.

blastp -h or blastp --help

Command input errors

You don't always type in commands, options and arguments correctly – you can misspell a command name, forget to type a space, specify an unsupported option or a non-existent file, or make all kinds of other mistakes.

What happens? The shell, or the program you called, tries to work out what kind of error it is and reports an appropriate error message as best it can.

Some examples:

# You try to use an unsupported option
student30@gsafcomp02:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.

# You mis-type a command name, or a command not installed on your system
student30@gsafcomp02:~$ catt
...
Command 'catt' not found...

# You specify the name of a file that does not exist
student30@gsafcomp02:~$ ls xxx
ls: cannot access 'xxx': No such file or directory

# You try to access a file or directory you don't have permissions for
student30@gsafcomp02:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied

Always use Tab to complete file names and command names (you may sometimes need to press Tab twice). This avoids a lot of these spelling errors.

Basic Linux commands you should know like breathing air

For accessing directories

  • ls - list the contents of the current directory

  • pwd - print working directory - tells you where you currently are. The output looks something like /home/myID. As on most computer systems, this is a "path": the route from the root of the file system tree down to your current directory.

  • rm <file> deletes a file. This is permanent - not a "trash can" deletion.

  • mkdir <dirname> and rmdir <dirname> make and remove the directory "dirname". rmdir only removes empty directories; "rm -r <dirname>" will remove the directory and everything in it.

  • cd <whereto> - change the present working directory to <whereto>. You will need to provide a path like /work/myID to change to that directory.

    • Some special <wheretos>: .. (period, period) means "up one level". . (period) means the current directory. ~ (tilde) means "my home directory". ~myfriend (tilde, myfriend) means "myfriend's home directory".
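The navigation commands above can be tried out safely with a sketch like this one. The directory names my_project and partA are made up for the example and live in a scratch directory.

```shell
# Create a small directory tree to move around in.
workdir=$(mktemp -d)
mkdir -p "$workdir/my_project/partA"

# Change into the deepest directory and confirm where we are.
cd "$workdir/my_project/partA"
pwd                        # the path printed ends in my_project/partA

# .. means "up one level", so this lands in my_project.
cd ..
up_one=$(basename "$PWD")
echo "$up_one"

# . means the current directory, so ./partA is partA inside where we are now.
cd ./partA

# ~ means your home directory.
cd ~
pwd
```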

 

Detour: Wildcards and special file names

The shell has shorthand to refer to groups of files by allowing wildcards in file names. * (asterisk) is the most common; it is a wildcard meaning "any length of any characters". Another useful one is [<characters>], which matches any single character in the set <characters>.

For example: ls *.bam lists all files in the current directory that end in .bam
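Here is a short sketch of both wildcard forms in action. The file names are invented and the files are empty; only the names matter for matching.

```shell
# Create some empty example files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
touch sample1.bam sample2.bam sample3.bam sample1.sam notes.txt

# * matches any run of characters: every file ending in .bam.
ls *.bam

# [12] matches exactly one character, either 1 or 2.
ls sample[12].bam

# Count the matches by piping the listing into wc -l.
bam_count=$(ls *.bam | wc -l)
subset_count=$(ls sample[12].bam | wc -l)
echo "bam files: $bam_count, sample1/sample2 bam files: $subset_count"
```

The shell expands the wildcard before ls runs, so ls simply receives the list of matching names.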

Three special file names:

  1. . (single period) means "this directory".  So ls -l . means "list contents of this current directory"

  2. .. (two periods) means "directory above current." So ls -l .. means "list contents of the parent directory."

  3.  ~ (tilde) means "my home directory". So ls -l ~ means "list contents of my home directory."

Exercise 2

List contents of your home directory and change to the my_rnaseq_course/partA directory. List contents of the current directory. List the contents of the directory above.

ls -l ~

#change to a particular directory

cd ~/my_rnaseq_course/partA

#list contents of current directory

ls -l .

#list contents of directory above

ls -l ..

 

For copying/moving files

  • cp <source> <destination> copies the file source to the location and/or file name destination. Using . (period) as the destination means "here, with the same name". cp -r <dirname> <destination> will recursively copy the directory dirname and all its contents to the directory destination.

  • scp <user>@<host>:<source> <destination> works just like cp but copies source from the user user's directory on remote machine host to the local file destination

  • mv <source> <destination> moves the file source to the location and/or file name destination.

  • wget <url> fetches a file with a valid URL. It's not that common but we'll use wget to pull data from one of TACC's web-based storage devices.
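The local copy and move commands can be practiced with a sketch like the following. All file and directory names are made up, and everything happens inside a scratch directory. scp and wget are left out because they need a remote host or URL.

```shell
# Set up a source directory with one small file.
workdir=$(mktemp -d)
cd "$workdir"
mkdir raw backup
echo "ACGT" > raw/reads.txt

# Copy a file into another directory, keeping its name.
cp raw/reads.txt backup/

# Copy a file to the current directory under a new name.
cp raw/reads.txt reads_copy.txt

# Recursively copy a whole directory and its contents.
cp -r raw raw_copy

# mv renames a file when the destination is a new file name...
mv reads_copy.txt renamed.txt

# ...and moves it when the destination is a directory.
mv renamed.txt backup/

# List everything, including sub-directories, to see the result.
ls -R
```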

For looking into files

  • nano - The text editor we'll be using to edit a file

  • head <file> and tail <file> show you the top or bottom 10 lines of <file>

  • more <file> and less <file> both display the contents of <file> one screen at a time. See the man entry under "Other miscellaneous" to figure out how to navigate and search when using less

  • file <file> tells you what kind of file <file> is.

  • cat <file> outputs all the contents of <file> - CAUTION - only use on small files.
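A quick way to get a feel for head and tail is to try them on a file whose contents are predictable. The sketch below uses seq to write the numbers 1 to 100 into a made-up file called numbers.txt.

```shell
# Create a 100-line file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100 > numbers.txt

# head shows the first 10 lines by default.
head numbers.txt

# -n changes how many lines are shown: here, the last 3 lines.
tail -n 3 numbers.txt

# wc -l counts lines, a cheap check before using cat on a file of unknown size.
line_count=$(wc -l < numbers.txt)
echo "numbers.txt has $line_count lines"
```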

Other miscellaneous

  • df shows the file systems mounted on the system you're working on, along with how much disk space is used and available on each

  • man <unixcommand> displays the manual page for a unix command.

  • > is used to redirect STDOUT to a file. Use 2> to redirect STDERR to a file.

  • | (pipe) is used to pipe together multiple commands such that output of first command becomes input of second command.
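Redirection and pipes are easier to understand once you see where each stream ends up. In this sketch the file names are invented, and missing.txt deliberately does not exist so that ls produces an error message.

```shell
# One real file, so ls has something to list and something to complain about.
workdir=$(mktemp -d)
cd "$workdir"
touch present.txt

# > sends normal output (STDOUT) to listing.txt;
# 2> sends error messages (STDERR) to errors.txt.
ls present.txt missing.txt > listing.txt 2> errors.txt

# 2>&1 sends STDERR to the same place as STDOUT, so both land in one file.
ls present.txt missing.txt > both.txt 2>&1

# | feeds the output of ls into wc -l, which counts the lines it receives.
entry_count=$(ls | wc -l)
echo "entries in this directory: $entry_count"

# Look at what each file captured.
cat listing.txt
cat errors.txt
```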

The concept of PATH

On a Unix command line, you can refer to a file by name alone only if it is in your current working directory. If you go to my_rnaseq_course/partA/fastqc_exercise and issue the commands:

cdh

cd my_rnaseq_course/partA/fastqc_exercise

less Sample1_R1.fastq

this will work only if Sample1_R1.fastq is located in the current directory.

To access files outside your current directory, you can provide the absolute path or the relative path to the file. If the file is actually located in

/stor/home/daras/my_rnaseq_course/partA/fastqc_exercise/data, then you can open it by using one of these two commands:

less /stor/home/daras/my_rnaseq_course/partA/fastqc_exercise/data/Sample1_R1.fastq

****your home directory path is different from mine*****

(or)

less data/Sample1_R1.fastq 
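The difference between the two kinds of path can be sketched in a scratch directory. The layout below imitates the course directory but is created fresh, the file content is a placeholder, and head -n 1 stands in for less so that the example does not wait for keyboard input.

```shell
# Recreate a small piece of the directory layout with a stand-in fastq file.
workdir=$(mktemp -d)
mkdir -p "$workdir/fastqc_exercise/data"
echo "@read1" > "$workdir/fastqc_exercise/data/Sample1_R1.fastq"

cd "$workdir/fastqc_exercise"

# Relative path: starts from the current directory (no leading slash).
relative=$(head -n 1 data/Sample1_R1.fastq)

# Absolute path: starts from the root of the file system (leading slash).
absolute=$(head -n 1 "$workdir/fastqc_exercise/data/Sample1_R1.fastq")

echo "relative path read: $relative"
echo "absolute path read: $absolute"

# After moving elsewhere, the absolute path still works;
# data/Sample1_R1.fastq would no longer be found from here.
cd /
head -n 1 "$workdir/fastqc_exercise/data/Sample1_R1.fastq"
```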

Exception: If the location of a file (most often an executable) is included in your shell environment variable called PATH, you can run it from anywhere without specifying where it is.

Use echo $PATH to see what is in your PATH.
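A few commands help when inspecting PATH. This is a sketch only: no_such_tool_xyz is a made-up name chosen because it is almost certainly not installed, and you would replace it with the tool you are looking for.

```shell
# PATH is a colon-separated list of directories.
echo "$PATH"

# Print one directory per line, which is easier to read.
echo "$PATH" | tr ':' '\n'

# command -v reports which executable the shell would run for a given name.
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"

# command -v returns a failure status when nothing by that name is on your PATH.
if command -v no_such_tool_xyz > /dev/null 2>&1; then
    tool_status="found"
else
    tool_status="not on PATH"
fi
echo "no_such_tool_xyz: $tool_status"
```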

Always use tab to complete file names (sometimes may need to double tab). If the file is in the current directory or is in your PATH, tab will do auto complete.

Always check where you are and what files are there in a Unix environment!

.bash_profile, .profile files

Each of these is a startup script that gets executed every time a session is started interactively. You can put any command in that file that you could type at the command prompt. Put commands here to set up your particular environment, and to customize things to your preferences (such as paths, aliases, modules to load).
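As an illustration, lines like the following could be placed in such a startup file. The $HOME/bin directory and the alias name ll are assumptions made for this sketch, not settings this course requires.

```shell
# Example lines for a startup file such as ~/.bash_profile.
# $HOME/bin and the alias name "ll" are illustrative choices.

# Put a personal directory of executables at the front of PATH,
# so programs stored there can be run from anywhere.
export PATH="$HOME/bin:$PATH"

# Define a shortcut: typing ll at the prompt will run ls -l.
alias ll='ls -l'
```

Changes to a startup file take effect the next time you log in.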

File Editors

There are a number of options for editing files at TACC. These fall into three categories:

  • Linux text editors installed at TACC (nano, vi, emacs). These run in your terminal window. vi and emacs are extremely powerful but also quite complex, so nano may be the best choice as a first local text editor.

  • Text editors or IDEs that run on your local computer but have an SFTP (secure FTP) interface that lets you connect to a remote computer (Notepad++ or Komodo Edit). Once you connect to the remote host, you can navigate its directory structure and edit files. When you open a file, its contents are brought over the network into the text editor's edit window, then saved back when you save the file.

  • Software that will allow you to mount your home directory on TACC as if it were a normal disk e.g. MacFuse/MacFusion for Mac, or ExpanDrive for Windows or Mac ($$, but free trial). Then, you can use any text editor to open files and copy them to your computer with the usual drag-drop.

As we will be using nano throughout the class, it is a good idea to review some of the basics. nano is a very simple editor available on most Linux systems. If you are able to use ssh, you can use nano. To invoke it, just type:

nano  (or)  

nano <filename>

You'll see a short menu of operations at the bottom of the terminal window. The most important are:

  • ctl-o - write out the file

  • ctl-x - exit nano
    You can just type in text, and navigate around using arrow keys. A couple of other navigation shortcuts:

  • ctl-a - go to start of line

  • ctl-e - go to end of line

Be careful with long lines – sometimes nano will split long lines into more than one line, which can cause problems in our commands files.

Naming Files

Try to find a convention and stick to it when naming files and directories.  But, most importantly:

  • Case matters: directory named BioITeam is different from directory named bioiteam.

  • Do not use white spaces in file names: You may be tempted to name your directory my raw data. Such a name makes sense when you are looking at the directory in macOS Finder or Windows Explorer, but on the command line a space separates one argument from the next. So, mkdir my raw data will actually make 3 directories: my, raw, and data. Use uppercase letters or underscores instead of white spaces, as in my_raw_data.

  • Be careful with special characters: Typically, underscores, dashes, and periods are OK in file names, but avoid other punctuation and special characters. A directory called sarah's raw data would be a bad idea.
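You can see the effect of spaces for yourself with this sketch, run in a scratch directory so the stray directories are easy to throw away.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Without quotes, the shell sees three separate arguments,
# so this makes three directories: my, raw, and data.
mkdir my raw data
ls

# An underscore keeps it as a single name that needs no special handling.
mkdir my_raw_data

# Quotes do let you create a name containing spaces, but you would then
# have to quote it every time you use it, which is easy to forget.
mkdir "my raw data"

# One name per line, so the name with spaces counts as a single entry.
dir_count=$(ls | wc -l)
echo "directories created: $dir_count"
```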

Exercise 3

How many fastq files are in /stor/scratch/Courses/rnaseq_course/partA/fastqc_exercise/data and what are their sizes?

ls -l /stor/scratch/Courses/rnaseq_course/partA/fastqc_exercise/data

Change to the directory that contains the fastq files.

cd /stor/scratch/Courses/rnaseq_course/partA/fastqc_exercise/data

We want to run fastqc on these fastq files. Figure out if this tool (fastqc) is installed and in your path. Figure out what options you need to use to run this tool.

If you try typing fastqc and hit tab multiple times, you’ll see all files/executables that are available with that name.

#to get usage information for fastqc

fastqc -h

Create a file called commands.fastqc and put the fastqc commands in it. Save and close the file

nano commands.fastqc

fastqc Sample1_R1.fastq

#ctrl+x to quit

#Y to save the file
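If there are several fastq files, typing one line per file in nano gets tedious. One possible alternative, sketched here under assumptions, is to count the files and generate the commands file with a loop. The empty files are stand-ins created in a scratch directory, Sample1_R2.fastq is a hypothetical second file, and the loop only writes the commands; it does not run fastqc.

```shell
# Stand-in fastq files in a scratch directory (empty; only the names matter here).
workdir=$(mktemp -d)
cd "$workdir"
touch Sample1_R1.fastq Sample1_R2.fastq

# Count the fastq files by piping the listing into wc -l.
fastq_count=$(ls *.fastq | wc -l)
echo "fastq files: $fastq_count"

# Write one fastqc command per fastq file into commands.fastqc.
for fq in *.fastq; do
    echo "fastqc $fq"
done > commands.fastqc

# Check the contents of the commands file.
cat commands.fastqc
```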

 
