...
Note that when we supplied the text "Expert scripter", we put it in quotes, which group the two words into one argument to the script. Without the quotes, the word "Expert" would be seen by the script as argument 1 and "scripter" would be seen as argument 2 (which our script ignores).
Exercise 2
You already generated reference file. Now, you want to do the following tasks in order.
...
BWA alignment script
The first real script you will likely find yourself wanting is one that performs a standard set of alignment tasks such as mapping, bam file creation and statistics reporting. The script below does exactly that using the bwa aligner.
- Aligns a fastq file to a pre-made reference genome using bwaExtract
- Extracts alignments from bwa's proprietary binary .sai file to a .sam file
- Convert Converts the .sam file into a .bam file using samtools
- Sort and index the .bam file so that it can be viewed in IGV.
- Count the number of aligned and unaligned reads, and calculate the mapping rate.
Below is an example script for yeast. This works only in my machine because my yeast reference genome is under /Users/Daechan/Desktop/ref/ and the sample input file name is "subset.test.fastq" Change refPath and inFile variables, save as basic_script.sh, and execute it by entering "./basic_script.sh"
...
The script takes
| Code Block |
|---|
#!/bin/bash
echo "------------------------------------------";
echo "Define variables";
echo "------------------------------------------";
refPath="/Users/Daechan/Desktop/ref/sacCer1.fa";
inFile="subset.test.fastq"
PFX="subset.test.output"
saiFile="$PFX.sai";
samFile="$PFX.sam";
bamFile="$PFX.bam";
echo "Reference File: $refPath"
echo "Input File name: $inFile"
echo "Prefix for SAI BAM SAM files: $PFX"
echo "SAI file name: $saiFile"
echo "SAM file name: $samFile"
echo "BAM file name: $bamFile"
echo "------------------------------------------";
echo "Align subset.test.fastq on yeast genome";
echo "------------------------------------------";
bwa aln $refPath $inFile > $saiFile;
echo "...Done";
echo ""
echo "------------------------------------------";
echo "Convert the SAI file into the SAM file";
echo "------------------------------------------";
bwa samse $refPath $saiFile $inFile > $samFile;
echo "...Done";
echo ""
echo "------------------------------------------";
echo "Convert the SAM file into the BAM file";
echo "------------------------------------------";
samtools view -bS $samFile > $bamFile;
echo "...Done";
echo ""
echo "------------------------------------------";
echo "Calculate the simple statistics";
echo "------------------------------------------";
numMapRd=$(samtools view -F 0x04 $bamFile | wc -l );
echo "The number of mapped read: $numMapRd";
echo ""
numUnmapRd=$(samtools view -f 0x04 $bamFile | wc -l );
echo "The number of unmapped read: $numUnmapRd"
echo ""
echo "Mapping rate is : $[$[numMapRd*100]/$[numMapRd+numUnmapRd]]%"
|
Below is an example script for yeast. This works only in my machine because my yeast reference genome is under /Users/Daechan/Desktop/ref/ and the sample input file name is "subset.test.fastq" Change refPath and inFile variables, save as basic_script.sh, and execute it by entering "./basic_script.sh"
Exercise3
You can repeatedly use the above script for multiple datasets by allowing the script to take arguments. Below script will give you the same result as above. Plus, variant calling file (vcf) will be generated. Before execute the script, save the below lines as args_script.sh and enter "./args_script.sh" without arguments. It will give you information about positional arguments. If you want to map the input file "subset.test.fastq" onto yeast reference genome and put a prefix "test" on all output files, enter "./args_script.sh sacCer1 subset.test.fastq test
...