...
Below is the easiest shell script in the world. Open a new text in your favorite text editor, type below lines, and save as test.sh. The extension of the file should be .sh. Using ch cd command, go to the directory where test.sh is. Then, execute the script by entering './test.sh'
...
You already generated reference file. Now, you want to do the following task in order. 1. Map a fastq file on reference genome using BWA 2. Convert .sai file to .sam file 3. Convert the .sam file into .bam file using Samtools 4. Count the number of aligned and unaligned reads, respectively, and calculate the mapping rate. Below is an example script for yeast. This works only in my machine because my yeast reference genome is under /Users/Daechan/Desktop/ref/ and the sample input file name is "subset.test.fastq" Change refPath and inFile variables, save as basic_script.sh, and execute it by entering "./basic_script.sh"
| Code Block |
|---|
#!/bin/bash echo "------------------------------------------"; echo "Define variables"; echo "------------------------------------------"; refPath="/Users/Daechan/Desktop/ref/sacCer1.fa"; inFile="subset.test.fastq" PFX="subset.test.output" saiFile="$PFX.sai"; samFile="$PFX.sam"; bamFile="$PFX.bam"; echo "Reference File: $refPath" echo "Input File name: $inFile" echo "Prefix for SAI BAM SAM files: $PFX" echo "SAI file name: $saiFile" echo "SAM file name: $samFile" echo "BAM file name: $bamFile" echo "------------------------------------------"; echo "Align subset.test.fastq on yeast genome"; echo "------------------------------------------"; bwa aln $refPath $inFile > $saiFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Convert the SAI file into the SAM file"; echo "------------------------------------------"; bwa samse $refPath $saiFile $inFile > $samFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Convert the SAM file into the BAM file"; echo "------------------------------------------"; samtools view -bS $samFile > $bamFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Calculate the simple statistics"; echo "------------------------------------------"; numMapRd=$(samtools view -F 0X04 $bamFile | wc -l ); echo "The number of mapped read: $numMapRd"; echo "" numUnmapRd=$(samtools view -f 0X04 $bamFile | wc -l ); echo "The number of unmapped read: $numUnmapRd" echo "" echo "Mapping rate is : $[$[numMapRd*100]/$[numMapRd+numUnmapRd]]%" |
...
You can repeatedly use the above script for multiple datasets by allowing the script to take arguments. Below script will give you the same result as above. Plus, variant calling file (vcf) will be generated. Before execute the script, save the below lines as args_script.sh and enter "./args_script.sh" without arguments. It will give you information about positional arguments. If you want to map the input file "subset.test.fastq" onto yeast reference genome and put a prefix "test" on all output files, enter "./args_script.sh sacCer1 subset.test.fastq test
| Code Block |
|---|
#!/bin/bash assembly=$1; inFile=$2; PFX=$3; saiFile="$PFX.sai"; samFile="$PFX.sam"; bamFile="$PFX.bam"; sortBam="$PFX.sorted.bam"; bcfFile="$PFX.bcf"; vcfFile="$PFX.vcf"; usage() { echo "------------------------------------------"; echo "args_script.sh <assembly> <inFile> <PFX>"; echo "Required Arguments" echo " assembly hg18 mm9 sacCer1"; echo " inFile path to input sequence file"; echo " outPFX prefix for output files"; echo "------------------------------------------"; exit 1; } if [ "$assembly" == "" ]; then usage ; fi if [ "$inFile" == "" ]; then usage ; fi if [ "$PFX" == "" ]; then usage ; fi if [ "$assembly" == "sacCer1" ]; then ref="/Users/Daechan/Desktop/ref/sacCer1.fa"; elif [ "$assembly" == "mm9" ]; then ref="/Users/Daechan/Desktop/ref/mm9.fa"; elif [ "$assembly" == "hg18" ]; then ref="/Users/Daechan/Desktop/ref/hg18.fa"; fi echo "------------------------------------------"; echo "Define variables"; echo "------------------------------------------"; echo "Reference File: $assembly" echo "Input File name: $inFile" echo "Prefix for SAI BAM SAM files: $PFX" echo "SAI file name: $saiFile" echo "SAM file name: $samFile" echo "BAM file name: $bamFile" echo "Sorted BAM file name: $sortBam" echo "BCF file name: $bcfFile" echo "VCF file name: $vcfFile" echo "" echo "------------------------------------------"; echo "Align subset.test.fastq on yeast genome"; echo "------------------------------------------"; bwa aln $ref $inFile > $saiFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Convert the SAI file into the SAM file"; echo "------------------------------------------"; bwa samse $ref $saiFile $inFile > $samFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Convert the SAM file into the BAM file"; echo "------------------------------------------"; samtools view -bS $samFile > $bamFile; echo "...Done"; echo "" echo "------------------------------------------"; echo "Calculate the simple statistics"; echo "------------------------------------------"; numMapRd=$(samtools view -F 0X04 $bamFile | wc -l ); echo "The number of mapped read: $numMapRd"; echo "" numUnmapRd=$(samtools view -f 0X04 $bamFile | wc -l ); echo "The number of unmapped read: $numUnmapRd" echo "" echo "Mapping rate is : $[$[numMapRd*100]/$[numMapRd+numUnmapRd]]%" echo "------------------------------------------"; echo "Call variants"; echo "------------------------------------------"; echo "Sorting BAM file..."; samtools sort $bamFile $sortBam echo "" echo "Indexing reference sequences in ref.fa by samtools faidx..."; samtools faidx $ref echo "" echo "Generating bcf file..."; samtools mpileup -uf $ref $sortBam | bcftools view -bvcg - > $bcfFile echo "" echo "Generating vcf file..."; bcftools view $bcfFile > $vcfFile echo "" echo "...Done" |