...
| Code Block | ||||
|---|---|---|---|---|
| ||||
idev -m 180 -N 1 -A OTH21164 -r CoreNGS # or -A TRA23004-Thu idev -m 120 -N 1 -A OTH21164 -p development # or -A TRA23004 |
| Code Block | ||
|---|---|---|
| ||
module load biocontainers # takes a while module load bwa bwa |
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
mkdir -p $SCRATCH/core_ngs/references/bwa/sacCer3 cd $SCRATCH/core_ngs/references/bwa/sacCer3 # Create a symbolic link to the sacCer3 FASTA file so it can # be accessed directly from this directory ln -sf ../../fasta/sacCer3.fa ls -l |
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
mkdir -p $SCRATCH/core_ngs/alignment/yeast_bwa cd $SCRATCH/core_ngs/alignment/yeast_bwa # Create symbolic links to the fastq and sacCer3 reference directories # so we can access them directly from this alignment directory ln -sf ../fastq ln -sf ../../references/bwa/sacCer3 ls -l |
As our workflow indicated, we first use bwa aln on the R1 and R2 FASTQs, producing a BWA-specific .sai intermediate binary files.
Exercise: What does does bwa aln needs in the way of arguments?
...
And where does it write its binary output?
| Expand | ||
|---|---|---|
| ||
| Code Block | | language | bash
Required arguments are a <prefix> of the bwa index files, and the input FASTQ file. There are lots of options, but here is a summary of the most important ones.
...
Other options control the details of how much a mismatch or gap is penalized, limits on the number of acceptable hits per read, and so on. Much more information can be found on the BWA manual page.
For a basic alignment like this, we can just go with the default alignment parameters.
Note that bwa writes its (binary) output to standard output by default, so we need to redirect that to a .sai file.
For simplicity, we will just execute these commands directly, one at a time. Each command should only take few minutes and you will see bwa's progress messages in your terminal.
| Code Block | ||||
|---|---|---|---|---|
| ||||
| # If not already loaded:
module load biocontainers
module load bwa
cd $SCRATCH/core_ngs/alignment/yeast 2>&1 | more If you just type bwa aln | more, the text will probably scroll off your Terminal. This is because bwa always writes its usage to standard error, not to standard output. The pipe ( | ) only connects standard output from bwa aln to standard input of the more command – but no standard output is generated. The standard error text that is generated just goes to your Terminal, bypassing more. |
| Expand | ||
|---|---|---|
| ||
The bwa aln usage says: Required arguments are a <prefix> of the bwa index files, and the input FASTQ file. There are no arguments specified for the program's output results, but later in the so we know output results are written to standard output by default (unless the -f option is specified). |
There are lots of options, but here is a summary of the most important ones.
| Option | Effect |
|---|---|
| -l | Specifies the length of the seed (default = 32) |
| -k | Specifies the number of mismatches allowable in the seed of each alignment (default = 2) |
| -n | Specifies the number of mismatches (or fraction of bases in a given alignment that can be mismatches) in the entire alignment (including the seed) (default = 0.04) |
| -t | Specifies the number of threads |
Other options control the details of how much a mismatch or gap is penalized, limits on the number of acceptable hits per read, and so on. Much more information can be found on the BWA manual page. For a basic alignment like this, we can just go with the default alignment parameters.
Since bwa writes its (binary) output to standard output by default, so we need to redirect that to a .sai file.
For simplicity, we will just execute these commands directly, one at a time. Each command should only take few minutes and you will see bwa's progress messages in your terminal.
| Code Block | ||||
|---|---|---|---|---|
| ||||
# If not already loaded:
module load biocontainers
module load bwa
cd $SCRATCH/core_ngs/alignment/yeast_bwa
bwa aln sacCer3/sacCer3.fa fastq/Sample_Yeast_L005_R1.cat.fastq.gz > yeast_pe_R1.sai
bwa aln sacCer3/sacCer3.fa fastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_pe_R2.sai |
...
| Tip | ||
|---|---|---|
| ||
Double check that output was written by doing ls -lh and making sure the file sizes listed are not 0.. When redirection ( > ) to a file is used, the target file is created whether or not the command produced any output! |
Exercise: How long did it take to align the R2 file?
| Expand | |||||
|---|---|---|---|---|---|
| |||||
The last few lines of bwa's execution output should look something like this:
So the R2 alignment took ~85 ~78 seconds (~1.4 3 minutes). |
Since you have your own private compute node, you can use all its resources. It has 128 cores, so re-run the R2 alignment asking for 60 execution threads.
...
| Expand | ||||||
|---|---|---|---|---|---|---|
| ||||||
The last few lines of bwa's execution output should look something like this:
So the R2 alignment took only ~8 seconds (real time), or 10+ times as fast as with only one processing thread. Note, though, that the CPU time with 60 threads was greater (~180 sec) than with only 1 thread (~85 sec). That's because of the thread management overhead when using multiple threads.
So the R2 alignment took only ~6 seconds (real time), or 10+ times as fast as with only one processing thread. Note, though, that the CPU time with 60 threads was greater (~166 sec) than with only 1 thread (~78 sec). That's because of the thread management overhead when using multiple threads. |
Exercise: How would you capture the diagnostic output from bwa aln?
| Expand | ||
|---|---|---|
| ||
Examine the sub-command's usage: bwa aln 2>&1 | more If you just type bwa aln | more, the text will probably scroll off your Terminal. This is because bwa always writes its usage to standard error, not to standard output. The pipe ( | ) only connects standard output from bwa aln to standard input of the more command – but no standard output is generated. The standard error text that is generated just goes to your Terminal, bypassing more. |
| Expand | ||
|---|---|---|
| ||
The bwa aln usage says: There are no arguments specified for the program's output results, but later in the so we know output results are written to standard output by default (unless the -f option is specified). |
N
Next we use the bwa sampe command to pair the reads and output SAM format data. Just type that command in with no arguments to see its usage.
...
You may have noticed that some alignment records contain contig names (e.g. chrV) in field 3 while others contain an asterisk ( * ). The * means the record didn't map. We're going to use this heuristic along with cut to see about how many records represent aligned sequences. (Note this is not the strictly correct method of finding unmapped reads because not all unmapped reads have an asterisk in field 3. Later you'll see how to properly distinguish between mapped and unmapped reads using samtools.)
...