...
With that, we're ready to get started on the first exercise.
Exercise #1: BWA - Yeast ChIP-seq
Like other tools you've worked with so far, you first need to load bwa using the module system. So, go ahead and do that now, and then enter 'bwa' with no arguments to view the help page.
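A minimal sketch of this step, assuming the module is simply named `bwa` on your system (check with `module avail bwa` if the load fails). The `bwa_found` variable is only a hypothetical helper for this sketch:

```shell
# Load BWA through the module system, if a module command exists here.
# ASSUMPTION: the module is called "bwa"; the name may differ on your system.
if command -v module >/dev/null 2>&1; then
    module load bwa || echo "could not load a module named 'bwa'"
fi

# Running bwa with no arguments prints its help text.
# 2>&1 captures the text whichever stream it is written to.
if command -v bwa >/dev/null 2>&1; then
    bwa 2>&1 | head -n 25
    bwa_found=yes
else
    echo "bwa is not on PATH yet"
    bwa_found=no
fi
```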
As you can see, bwa provides several commands for different tasks. To try one of them, we will index the genome with the 'index' command. If you enter 'bwa index' with no arguments, you will see a list of its arguments. Here, we only need to specify two things: the indexing algorithm ('bwtsw' or 'is') and the name of the FASTA file. We will use 'bwtsw', and the FASTA file is sacCer3.fa. Importantly, the output of this command is a group of files that must all be kept together, since together they make up the index. So, within the references directory, we will create another directory called "yeast", move the yeast FASTA file into that directory, and run the index command from there:
Code Block |
---|
cd $SCRATCH/references/
mkdir yeast
mv sacCer3.fa yeast
cd yeast
bwa index -a bwtsw sacCer3.fa
When this command finishes, the yeast directory should contain the original FASTA file plus five index files:
Code Block |
---|
sacCer3.fa
sacCer3.fa.amb
sacCer3.fa.ann
sacCer3.fa.bwt
sacCer3.fa.pac
sacCer3.fa.sa
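Because all of these files are needed together, it can be worth confirming that none is missing before aligning. The following is a small sketch, not part of bwa itself; `check_bwa_index` is a hypothetical helper name, and it assumes the five extensions listed above:

```shell
# Report any index file that is missing or empty for a given FASTA path.
# Returns the number of missing files (0 means the index looks complete).
check_bwa_index() {
    ref="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing or empty: ${ref}.${ext}"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Run from inside the yeast directory.
check_bwa_index sacCer3.fa || echo "index is incomplete; re-run bwa index"
```

If nothing is printed, all five index files are present and non-empty.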
Now we're ready to run the actual alignment, with the goal of producing a SAM file from the input FASTQ files and the reference. We will first use the 'aln' command to generate one SAI file per FASTQ file, aligning each against the reference separately, and then use the 'sampe' command to combine the two SAI files (together with the reference and the original FASTQ files) into a single SAM output file. We need a directory to hold the finished alignments and any intermediate files, so create a directory called 'alignments'. The full command flow is as follows:
Code Block |
---|
cds
mkdir alignments
bwa aln references/yeast/sacCer3.fa fastq_align/Sample_Yeast_L005_R1.cat.fastq.gz > alignments/yeast_R1.sai
bwa aln references/yeast/sacCer3.fa fastq_align/Sample_Yeast_L005_R2.cat.fastq.gz > alignments/yeast_R2.sai
bwa sampe references/yeast/sacCer3.fa alignments/yeast_R1.sai alignments/yeast_R2.sai fastq_align/Sample_Yeast_L005_R1.cat.fastq.gz fastq_align/Sample_Yeast_L005_R2.cat.fastq.gz > alignments/yeast_pairedend.sam
Notice how each file is in its proper directory, which requires us to give the path to each file (relative to the scratch directory) in the alignment commands.
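Once 'sampe' has finished, a quick sanity check is to count the header lines (which start with '@' in SAM) and the alignment lines in the output. This is only a sketch; `sam_summary` is a hypothetical helper name, and the path is the one used in the commands above:

```shell
# Count header lines (starting with '@') and alignment lines in a SAM file.
sam_summary() {
    sam="$1"
    # grep -c exits non-zero when the count is 0, so ignore its status.
    headers=$(grep -c '^@' "$sam" || true)
    records=$(grep -vc '^@' "$sam" || true)
    echo "header_lines=${headers} alignment_lines=${records}"
}

# Run from $SCRATCH, where the alignments directory was created.
if [ -f alignments/yeast_pairedend.sam ]; then
    sam_summary alignments/yeast_pairedend.sam
fi
```

An alignment-line count of zero would suggest that one of the earlier steps failed.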
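Option | Effect | Best Practice Setting |
---|---|---|
-k | Maximum edit distance allowed within the seed | |
-n | Maximum edit distance for the whole read (integer), or the fraction of missing alignments tolerated at a 2% uniform base error rate (float) | |
-l | Seed length: the first this-many bases of the read are used as the seed | |
-t | Number of threads | |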
...