Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

A brief introduction to shell scripting. Please see /corral-replwork/utexasprojects/BioITeam/ngs_introcommon/scripts for several example scrtips for a number of well-written scripts.

Table of Contents
separatornewline

What and why is Shell Scripting?

...

Code Block
login1$ ffkakldk
-bash: ffkakldk: command not found

Let’s assume that you go to a restaurant. The rule of the restaurant is to order one by one: order a drink and get it, then order an appetizer and get it, then order dishes and so forth. What a stupid ordering system is! To make matters worse, even if you want to order the exact same meal you ate before, you must go through the tedious process again. Shell scripting addresses this problem. A shell script is series of commands written in a plain text file. Instead of entering commands one by one, you can store the sequence of commands to text file and tell the shell to execute this text file. When you want to repeatedly execute the series of command lines for multiple datasets, the shell script can A shell script is series of commands written in a plain text file. Instead of entering commands one by one, you can store the sequence of commands to text file and tell the shell to execute this text file. When you want to repeatedly execute the series of command lines for multiple datasets, the shell script can automate your task and save lots of time.

...

The actual bam sorting and indexing are straightforward calls to samtools (although you might want to check out the -m maximum memory option for samtools sort; it can speed up sorting of large files considerably):

Code Block
echo "Sorting '$OUT_PFX.bam'...";
samtools sort $OUT_PFX.bam $OUT_PFX.sorted;
ckRes $? "samtools sort";
ckFileSz "$OUT_PFX.sorted.bam";

echo "Indexing '$OUT_PFX.sorted.bam'...";
samtools index $OUT_PFX.sorted.bam;
ckRes $? "samtools index";
ckFileSz "$OUT_PFX.sorted.bam.bai";

Finally, we call samtools flagstat to report alignment statistics:

Code Block
samtools flagstat $OUT_PFX.sorted.bam | tee $OUT_PFX.flagstat.txt
ckRes $? "samtools flagstat";
ckFileSz "$OUT_PFX.flagstat.txt";

Note we tee the output so that the information is both stored in a file and appears in the execution log. Output from samtools flagstat looks like this for paired end alignments:

4029634 + 0 in total (QC-passed reads + QC-failed reads) 2492274 + 0 duplicates 3292433 + 0 mapped (81.71%:-nan%) 4029634 + 0 paired in sequencing 2014817 + 0 read1 2014817 + 0 read2 3158008 + 0 properly paired (78.37%:-nan%) 3181258 + 0 with itself and mate mapped 111175 + 0 singletons (2.76%:-nan%) 5339 + 0 with mate mapped to a different chr 3711 + 0 with mate mapped to a different chr (mapQ>=5)
Code Block
echo "Sorting '$OUT_PFX.bam'...";
samtools sort $OUT_PFX.bam $OUT_PFX.sorted;
ckRes $? "samtools sort";
ckFileSz "$OUT_PFX.sorted.bam";

echo "Indexing '$OUT_PFX.sorted.bam'...";
samtools index $OUT_PFX.sorted.bam;
ckRes $? "samtools index";
ckFileSz "$OUT_PFX.sorted.bam.bai";

Finally, we call samtools flagstat to report alignment statistics:

Code Block
samtools flagstat $OUT_PFX.sorted.bam > $OUT_PFX.flagstat.txt
ckRes $? "samtools flagstat";
ckFileSz "$OUT_PFX.flagstat.txt";

To summarize the statistics from all your (possibly parallel) alignments, you could do something like this:

Code Block
find . -name "*.flagstat.txt" | xargxargs grep 'mapped ('