The Quickest Unix Refresher ever

Unix Command Cheat Sheet

Basic Linux commands you need to know like breathing air

  • ls - list the contents of the current directory
  • pwd - print the present working directory. It tells you where you are currently, in a format like /home/myID. As on most computer systems, this describes a location in the tree of the file system structure and is called a "path".
  • cd <whereto> - change the present working directory to <whereto>. You will need to provide a path like /work/myID to change to that directory.
    • Some special <wheretos>: .. (period, period) means "up one level". . (period) means the current directory. ~ (tilde) means "my home directory". ~myfriend (tilde, then "myfriend") means "myfriend's home directory".
  • nano - The text editor we'll be using
  • df shows you the file systems mounted on the system you're working on, along with how much disk space is used and available on each
  • head <file> and tail <file> show you the first or last 10 lines of the file <file>
  • more <file> and less <file> both display the contents of <file> one screen at a time. Run man less to find out how to navigate and search when using less
  • file <file> tells you what kind of file <file> is.
  • cat <file> outputs all the contents of <file> - CAUTION - only use on small files.
  • rm <file> deletes a file. This is permanent - not a "trash can" deletion.
  • cp <source> <destination> copies the file <source> to the location and/or file name <destination>. Using . (period) as the destination means "here, with the same name". cp -r <dirname> <destination> will recursively copy the directory <dirname> and all its contents to the directory <destination>.
  • scp <user>@<host>:<source> <destination> works just like cp but copies <source> from <user>'s directory on the remote machine <host> to the local file <destination>
  • mkdir <dirname> and rmdir <dirname> make and remove the directory <dirname>. rmdir only removes empty directories; rm -r <dirname> will remove a directory and everything in it.
  • wget <url> fetches a file with a valid URL. It's not that common but we'll use wget to pull data from one of TACC's web-based storage devices.
  • man <unixcommand> displays the manual page for a unix command.
  • > is used to redirect STDOUT to a file; 2> does the same for STDERR. To send both to one file, use > <file> 2>&1.
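
The commands above can be strung together into a short practice session. This is only a sketch: the scratch directory comes from mktemp, and the names practice, notes.txt, backup.txt, and errors.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"
pwd                        # where am I?

mkdir practice             # make a directory
cd practice                # move into it

# > redirects STDOUT into a file (12 short lines here).
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > notes.txt

ls                         # list the contents of the current directory
head notes.txt             # first 10 lines
tail -n 2 notes.txt        # last 2 lines
cp notes.txt backup.txt    # copy under a new name

# 2> redirects STDERR; the ls fails on purpose because the file does not exist.
ls no_such_file 2> errors.txt || true

rm backup.txt              # permanent, no trash can
cd ..                      # up one level
```

Nothing here touches your real files, so it is safe to run line by line and inspect the results with ls and cat as you go.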


Wildcards and special file names


The shell has shorthand to refer to groups of files by allowing wildcards in file names. * (asterisk) is the most common; it is a wildcard meaning "any length of any characters". Other useful ones are [<characters>], which matches any single character in the set <characters>, and ? (question mark), which matches any single character.

For example: ls *.bam lists all files in the current directory that end in .bam

Three special file names:

  1. . (single period) means "this directory".  So ls -l . means "list contents of this current directory"
  2. .. (two periods) means "directory above current." So ls -l .. means "list contents of the parent directory."
  3. ~ (tilde) means "my home directory". So ls -l ~ means "list contents of my home directory."
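
A quick way to get a feel for wildcards is to try them on a handful of empty files. The sketch below assumes nothing beyond a standard shell; the file names are invented for the demonstration.

```shell
# Set up a few empty files in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"
touch sample1.bam sample2.bam sample3.sam notes.txt

ls *.bam             # every name ending in .bam
ls sample[12].*      # sample1 or sample2, with any extension
ls sample?.sam       # exactly one character between "sample" and ".sam"

ls -l .              # this directory
ls -l ..             # the parent directory
echo ~               # the shell replaces ~ with your home directory

# The shell expands the wildcard before the command runs,
# so echo sees the matching names, not the * itself.
bam_files=$(echo *.bam)
echo "$bam_files"
```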


The concept of PATH

On a Unix command line, a bare file name refers only to files in your current working directory. If you are in /scratch/01184/daras/ and you issue the command:

less genomeFile

this will work only if genomeFile is located in /scratch/01184/daras/.

To access files outside your current directory, you can provide the absolute path or relative path to find the file. If genomeFile is actually located in /scratch/01184/daras/data, then you can open it from /scratch/01184/daras/ by using one of these two commands:

less  /scratch/01184/daras/data/genomeFile

(or)

less data/genomeFile

Exception: If the directory containing a file (most often an executable) is listed in your shell environment variable called PATH, you can run it from anywhere without specifying where it is.

Use echo $PATH to see what is in your PATH.
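
To see PATH lookup in action, the sketch below creates a tiny script in a temporary directory and adds that directory to PATH for the current session. The script name mytool is hypothetical, and chmod +x (not covered in the list above) is what marks the file as executable.

```shell
echo "$PATH"               # a colon-separated list of directories
command -v ls              # shows where the shell finds ls

# A stand-in for a directory holding your own tools.
tools=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$tools/mytool"
chmod +x "$tools/mytool"   # make the script executable

# Put the directory at the front of PATH (this session only).
PATH="$tools:$PATH"

mytool                     # runs without typing the directory
command -v mytool          # confirms which file the shell picked
```

The change lasts only until you log out; to make it permanent you would put the PATH line in a startup file.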

Always use tab to complete file names. Tab will auto-complete the name of a file in the current directory or of a command in your PATH.

Always remember where you are in a Unix environment!
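
Why location matters can be shown with a small experiment. In this sketch a temporary directory stands in for /scratch/01184/daras, and the one-line genomeFile is a placeholder.

```shell
# $base plays the role of /scratch/01184/daras in the text above.
base=$(mktemp -d)
mkdir "$base/data"
echo "ACGT" > "$base/data/genomeFile"

cd "$base"
cat data/genomeFile            # relative path: works from $base
cat "$base/data/genomeFile"    # absolute path: works from anywhere

cd "$base/data"
cat genomeFile                 # a bare name works now, because we are in data

# The same relative path fails here: there is no data directory inside data.
cat data/genomeFile 2> /dev/null || echo "data/genomeFile not found from $(pwd)"
```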


So many options...

When running scripts and software tools, the inputs you provide are called arguments, parameters, or options. Each tool or script can be a little different in how it takes its arguments, but they typically follow this structure:

command <options>  inputfile > outputfile

command  -option1 value1 -option2 value2 -option3 inputfile > outputfile

  • option1 and option2 are types that take a value.
  • option3 is a yes or no flag, so it does not take a value.

Example: blastp -query scaffolds.fasta -db TAIR10_pep_20101214 -evalue 0.0001 -outfmt 6 -out blastp.out
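
The same pattern can be tried with a tool that is available on nearly every system. The sketch below uses sort; the file scores.csv and its contents are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"
printf 'ann,72\nbob,95\ncat,88\n' > scores.csv

# -t and -k take values (the field separator and the column to sort on).
# -n (numeric) and -r (reverse) are yes/no flags, so they take no value.
# scores.csv is the input file; > sends the result to sorted.csv.
sort -t , -k 2 -n -r scores.csv > sorted.csv

cat sorted.csv
```

Here the highest score ends up on the first line of sorted.csv, because -n compares the second column as numbers and -r reverses the order.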

If you need to find out the options for a command, try any of these (again each tool is different):

  • command -h
  • command --h
  • command -help
  • command


.bash_profile, .profile files

These are startup scripts that get executed every time a session is started interactively. You can put any command in them that you could type at the command prompt. Put commands here to set up your particular environment and to customize things to your preferences (such as paths, aliases, modules to load).
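
As an illustration, a startup file might contain lines like the following. Everything here is an assumption made for the example: the $HOME/bin directory, the ll alias, and the commented-out module line, whose module name depends on what your system provides.

```shell
# Example lines for ~/.bash_profile (all names are illustrative).

# Add a personal bin directory to the front of PATH.
export PATH="$HOME/bin:$PATH"

# A short alias for a command you type often.
alias ll='ls -l'

# On systems that use environment modules, load tools here, for example:
# module load <modulename>
```

Changes take effect the next time you log in, or right away if you run source ~/.bash_profile.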

File Editors

 

There are a number of options for editing files at TACC. These fall into three categories:

 

  • Linux text editors installed at TACC (nano, vi, emacs). These run in your terminal window. vi and emacs are extremely powerful but also quite complex, so nano may be the best choice as a first text editor.
  • Text editors or IDEs that run on your local computer but have an SFTP (secure FTP) interface that lets you connect to a remote computer (Notepad++ or Komodo Edit). Once you connect to the remote host, you can navigate its directory structure and edit files. When you open a file, its contents are brought over the network into the text editor's edit window, then saved back when you save the file.
  • Software that will allow you to mount your home directory on TACC as if it were a normal disk, e.g. MacFuse/MacFusion for Mac, or ExpanDrive for Windows or Mac ($$, but free trial). Then you can use any text editor to open files, and copy them to your computer with the usual drag-and-drop.

 

As we will be using nano throughout the class, it is a good idea to review some of the basics. nano is a very simple editor available on most Linux systems. If you are able to use ssh, you can use nano. To invoke it, just type:


nano

(or)

nano <filename>

You'll see a short menu of operations at the bottom of the terminal window. The most important are:

 

  • ctl-o - write out the file
  • ctl-x - exit nano

You can just type in text, and navigate around using arrow keys. A couple of other navigation shortcuts:

  • ctl-a - go to start of line
  • ctl-e - go to end of line

Be careful with long lines – sometimes nano will split long lines into more than one line, which can cause problems in our commands files.

Naming Files

Try to find a convention and stick to it when naming files and directories.  But, most importantly:

  • Case matters: a directory named BioITeam is different from a directory named bioiteam.
  • Do not use white spaces in file names: You may be tempted to name your directory my raw data. Such a name makes sense when you are looking at the directory in Mac Finder or Windows Explorer, but on the command line a space means "next argument". So, mkdir my raw data will actually make 3 directories: my, raw, and data. Use uppercase letters or underscores instead of white spaces, like MyRawData or my_raw_data.
  • Be careful with special characters: Typically, underscores, dashes, and periods are OK in file names, but avoid other punctuation and special characters. A directory called sarah's raw data would be a bad idea.
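
The effect of spaces is easy to check for yourself. This sketch runs in a throwaway directory and uses the same example names as the list above.

```shell
demo=$(mktemp -d)
cd "$demo"

# Three arguments, so three directories: my, raw, and data.
mkdir my raw data
ls

# One argument, one directory, and easy to tab-complete.
mkdir my_raw_data

# Quoting makes awkward names possible, but every later command
# that touches the directory needs the quotes too.
mkdir "sarah's raw data"
ls -d "sarah's raw data"
```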


