Let's use Anna Battenhouse's shell script align_bwa.sh to align the paired end fastq file to the hg19 version of the genome calling the output trios_mapping. The input fastq file can be found at $BI/ngs_course/human_variation/allseqs_R1.fastq or if you have already copied today's materials to your scratch directory at $SCRATCH/BDIB_Human_tutorial/raw_files. Throughout the course we have been telling you to copy things to a local location before working with them, and you can continue to do so, but it is important to remember this is not necessarily a requirement. If you use launcher_creator.py to create the job submission script. Be sure to append &> "meaningful file name" to the end of the script to capture the standard output and standard Expand |
---|
title | Click here for a hint |
---|
| Type align_bwa.sh without any options to gain insight into the programs expectations. As you might have put together, the "BWA" part of the script name represents the BWA mapping tool, but what isn't obvious is that it also uses SAMtools. Make sure that these things are available before starting the commands Code Block |
---|
language | bash |
---|
title | If on an iDev node: |
---|
collapse | true |
---|
| module load bwa
module load samtools
cds
cd BDIB_Human_tutorial
align_bwa.sh raw_files/allseqs_R1.fastq trios_mapping hg19 1
|
Expand |
---|
title | If you got an error message check here for a possible reason why |
---|
| Code Block |
---|
title | If your error message was this: |
---|
| Fastq read1 File 'raw_files/allseqs_R1.fastq' not found...exiting |
Code Block |
---|
language | bash |
---|
title | Did you remember to do this after the copy command finished? |
---|
| cd $SCRATCH/BDIB_Human_tutorial/rawfiles
gunzip *.gz # this will unzip all the compressed files you have just copied |
|
Code Block |
---|
language | bash |
---|
title | Make job submission script for mapping & variant calling |
---|
collapse | true |
---|
| echo "align_bwa.sh $BI/ngs_course/human_variation/allseqs_R1.fastq trios_mapping hg19 &> aln.trios_mapping.log" > commands # this is an alternative way to generate a commands file than nano, use nano if more comfortable, or try this new option
launcher_creator.py -l map_call.sge -n map_call -t 01:00:00 -j commands -A UT-2015-05-18
module load bwa
module load samtools
qsub map_call.sge
|
|
Code Block |
---|
title | "ls -lt" output ... (a way of checking if things completed correctly |
---|
collapse | true |
---|
| -rw-r-rw-r-- 1 sphsmithded G-801020802740 5732392 May 2028 2303:0113 maptrios_callmapping.flagstat.o586338txt
-rw-rw-r----- 1 sphsmithded G-801020802740 3922149304 May 2028 2303:01 test.flagstat.txt13 trios_mapping.sorted.bam.bai
-rw-rw---r--- 1 sphsmithded G-801020802740 2175952334770401 May 2028 2303:0113 testtrios_mapping.sorted.bam.bai
-rw-rw-r----- 1 sphsmithded G-801020802740 334782188411155732 May 2028 2303:01 test.sorted.bam
-rw-r--r-- 1 sphsmith G-801020 13135 May 20 23:00 map_call.e586338
-rw------- 1 sphsmith G-801020 411244396 May 20 23:00 test.bam
-rw------- 1 sphsmith G-801020 45695040 May 20 22:49 test.read2.sai12 trios_mapping.bam
-rw-rw-r----- 1 sphsmithded G-801020802740 4537240062257040 May 2028 2203:3901 testtrios_mapping.read1read2.sai
-rw-rrw--r-- 1 sphsmithded G-801020802740 061884104 May 2028 2202:2651 maptrios_callmapping.read1.pe586338sai |
Code Block |
---|
title | "samtools flagstat trios_mapping.sorted.bam" expected results |
---|
collapse | true |
---|
| Running flagstat...
4546280 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
39922743992266 + 0 mapped (87.81%:nan%)
4546280 + 0 paired in sequencing
2273140 + 0 read1
2273140 + 0 read2
4029040234 + 0 properly paired (0.89%88%:nan%)
36369463636928 + 0 with itself and mate mapped
355328355338 + 0 singletons (7.82%:nan%)
4412843366 + 0 with mate mapped to a different chr
1563415352 + 0 with mate mapped to a different chr (mapQ>=5) |
|