Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Here is a completed Example BWA alignment script.You may want to open it in a separate window so you can read along as it is discussed here. It is also available in the course materials as align_bwa.sh.

What could possibly go wrong?

The first thing you will notice about this script is that there is a lot of argument and error checking -- more than the actual "work" code. This is a hallmark of a well-written shell script, especially one that will be run at TACC by many processors at a time.

Let's say you run 20 alignments in parallel at TACC. How do you know if they all completed successfully? If they did not, how do you tell what went wrong? Do you really want to poke around in 20 directories/files to figure it out?

Also, a good shell script should be relatively easy to call. That's why, for example, this script takes only a short name of the desired reference and uses it to select the correct path, and only requires the name of the R1 fastq for paired-end reads, and uses that path to determine the name of the R2 fastq file.

The approach this script takes to error checking is that many many things can go wrong (this is from experience; every error check in this script checks for something that has gone wrong for us in the past :)

Script functions

Shell scripts can define functions, which are a convenient way to avoid repeating the same few lines of code again and again. This philosophy of code writing is called DRY, for Don't Repeat Yourself. Like shell scripts themselves, shell script functions take their arguments on the line that invokes them, and refers to them as $1, $2, etc.

Let's look at the simplest function in the align_bwa.sh script:

Code Block

err() {
  echo "$1...exiting";
  exit 1;
}

This function takes one argument, echoes it along with some boilerplate text ("...exiting") and exits. You might call it like this from a shell script:

Code Block

if [ ! -e "$MY_FILE" ]; then 
  err "File '$MY_FILE' does not exist"; 
fi

If the filename stored in the MY_FILE variable exists, nothing happens. If the file does not exist, the shell script exits when the err function is called, after printing out a diagnostic message.

Why write a function this simple? Because we're going to build on it, writing more specialized error checking functions, and we want all of them to print out the boilerplate text ("...exiting") before they exit. That way we can easily grep for this well-know text in our execution log files. For a single file:

Code Block

grep 'exiting' myrun.log

Or even better, for any log file in subdirectories of the current directory:

Code Block

find . -name "*.log" | xargs grep 'exiting'

Daechan original