...
Let’s assume that you go to a restaurant. The rule of the restaurant is to order one by one: order a drink and get it, then order an appetizer and get it, then order dishes and so forth. What a stupid ordering system is! To make matters worse, even if you want to order the exact same course meal you ate before, you should must go through the tedious process again. Shell Script can fix the problems. Shell Script scripting addresses this problem. A shell script is series of commands written in a plain text file. Instead of entering commands one by one, you can store the sequence of commands to text file and tell the shell to execute this text file. When you want to repeatedly execute the series of command lines for multiple datasets, the shell script can automate your task and save lots of time.
Exercise 1 - Hello world
Below is the easiest shell script in the world. Open a new text in your favorite text editor, type below lines, and save as test.sh. The extension of the file should be .sh. Using cd command, go to the directory where test.sh is. Then, execute the script by entering './test.sh'
...
Create and run this script.
| Code Block |
|---|
#!/bin/bash echo "-------------------" echo "Hello, Shell World!" echo "-------------------" |
...
-------------------"
|
Open your favorite text editor, enter these lines, and save as test.sh (note the file extension for shell scripts is .sh). Using cd command, go to the directory where test.sh is. Then, execute the script by entering './test.sh'
Exercise 2
You already generated reference file. Now, you want to do the following task in order. 1.
- Map a fastq file on reference genome using BWA
...
- Convert .sai file to .sam file
...
- Convert the .sam file into .bam file using Samtools
...
- Count the number of aligned and unaligned reads, respectively, and calculate the mapping rate.
Below is an example script for yeast. This works only in my machine because my yeast reference genome is under /Users/Daechan/Desktop/ref/ and the sample input file name is "subset.test.fastq" Change refPath and inFile variables, save as basic_script.sh, and execute it by entering "./basic_script.sh"
| Code Block |
|---|
#!/bin/bash echo "------------------------------------------"; echo "Define variables"; echo "------------------------------------------"; refPath="/Users/Daechan/Desktop/ref/sacCer1.fa"; inFile="subset.test.fastq" PFX="subset.test.output" saiFile="$PFX.sai"; samFile="$PFX.sam"; bamFile="$PFX.bam"; echo "Reference File: $refPath" echo "Input File name: $inFile" echo "Prefix for SAI BAM SAM files: $PFX" echo "SAI file name: $saiFile" echo "SAM file name: $samFile" echo "BAM file name: $bamFile" echo "------------------------------------------"; echo "Align subset.test.fastq on yeast genome"; echo "------------------------------------------"; bwa aln $refPath $inFile > $saiFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Convert the SAI file into the SAM file"; echo "------------------------------------------"; bwa samse $refPath $saiFile $inFile > $samFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Convert the SAM file into the BAM file"; echo "------------------------------------------"; samtools view -bS $samFile > $bamFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Calculate the simple statistics"; echo "------------------------------------------"; numMapRd=$(samtools view -F 0X040x04 $bamFile | wc -l ); echo "The number of mapped read: $numMapRd"; echo "" numUnmapRd=$(samtools view -f 0X040x04 $bamFile | wc -l ); echo "The number of unmapped read: $numUnmapRd" echo "" echo "Mapping rate is : $[$[numMapRd*100]/$[numMapRd+numUnmapRd]]%" |
...