...

With a few exceptions, this course has focused on aligning a single sample of reads against a single reference genome, but that does not always reflect the biology. Sometimes we know that a single sample contains more than one DNA molecule; the most common case is a sample with a chromosome as well as a plasmid. Here we will examine the same sample used in the novel DNA identification tutorial to see how including a second reference file changes the mapping results.

Note

The concepts discussed in this tutorial are identical to those in the advanced mapping with bowtie2 tutorial, and both tutorials work with the same data. The discussion of results differs between the two tutorials.


Learning objectives

  1. Understand how mapping against multiple reference genomes simultaneously decreases noise and increases confidence in variant calls.
  2. Map reads against multiple references, and evaluate mapping success/frequency.

Mapping against multiple references

...

Note that the columns do not line up well here, but the output can easily be copied into Excel for better alignment. Additionally, note that this would work for any number of sam files (aka samples).


Info
Downstream consequences of mapping to both references at the same time.

Notice that if you divide the reads mapping to the chromosome (1556647) by the total reads (2452484), you get 63.47%, not the 65.74% listed above for mapping to the single reference.
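The arithmetic behind that comparison can be checked with a few lines of python. This is only a sketch using the two read counts and the single-reference percentage quoted in the text; the function and variable names are made up for illustration.

```python
# Read counts quoted in the text for mapping to both references at once.
chromosome_reads = 1556647
total_reads = 2452484

# Percentage quoted in the text for mapping to the chromosome alone.
single_reference_percent = 65.74


def percent_mapped(mapped, total):
    """Return mapped reads as a percentage of total reads."""
    return 100.0 * mapped / total


chromosome_percent = percent_mapped(chromosome_reads, total_reads)
difference = single_reference_percent - chromosome_percent

print(f"chromosome, both references: {chromosome_percent:.2f}%")   # 63.47%
print(f"difference from single reference: {difference:.2f} points")  # 2.27 points
```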

Any idea why?

As mentioned in the introduction, this is expected. With the second reference present, each read can map to the correct reference with a higher alignment score, rather than being forced to map to the only available reference with a lower alignment score.

Suppose you took this set of reads, mapped it to the chromosome alone (as was done in the novel DNA tutorial), mapped it to both the chromosome and the plasmid together (as you have just done here), called SNVs on both sam files (using what you learned in the SNV tutorial), and compared the variants. You would see fewer variants in the sample where the reads were mapped to both references. You could further use the information in the human trios tutorial to call the SNVs at the same time, and conduct the comparison presented in that tutorial to identify specifically which variants differ.
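As a rough illustration of what comparing the two variant sets could look like, here is a minimal python sketch. It is not the comparison from the human trios tutorial. It assumes each set of calls is in VCF format, where the first five tab-separated columns are CHROM, POS, ID, REF and ALT, and it uses invented variant lines in place of real output.

```python
def load_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from VCF-formatted lines."""
    variants = set()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        variants.add((chrom, int(pos), ref, alt))
    return variants


# Invented example calls; real input would come from open("your_file.vcf").
chromosome_only_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chromosome\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chromosome\t250\t.\tC\tT\t12\tPASS\t.\n",
]
both_references_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chromosome\t100\t.\tA\tG\t50\tPASS\t.\n",
]

single = load_variants(chromosome_only_calls)
both = load_variants(both_references_calls)

# Variants called only when the chromosome was the sole reference.
only_in_single = sorted(single - both)
print(only_in_single)
```

Variants that show up only in the chromosome-only set are the ones worth inspecting as possible artifacts of forcing plasmid reads onto the chromosome.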

The data presented above could be described as 'tidy format', where each line contains one piece of information, which makes it easy for computers to parse. Personally, my human eyes prefer tables where rows correspond to samples (in this case sam files) and columns correspond to values (in this case the chromosomes mapped to). I'm sure that someone better versed in bash/awk/etc. could put together a script to do this directly from the raw file. As I'm better versed in python, I instead use the for loop from the above command, pipe the tidy results to a .tsv file, and then use a python script to convert it to a table. How can you get hold of this awesome tool, you ask? BioITeam, of course.
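To give a feel for the tidy-to-table conversion, here is a minimal python sketch. It is not the BioITeam script. It assumes the tidy .tsv has three tab-separated columns in the order sample, reference, read count, which may not match the real file, and the sample names and counts are invented.

```python
import csv
import io
from collections import defaultdict


def tidy_to_table(tidy_lines):
    """Pivot (sample, reference, count) rows into one row per sample."""
    counts = defaultdict(dict)
    references = []  # keeps references in first-seen order for the header
    for row in csv.reader(tidy_lines, delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        sample, reference, count = row
        counts[sample][reference] = int(count)
        if reference not in references:
            references.append(reference)
    table = [["sample"] + references]
    for sample, by_reference in counts.items():
        # A reference with no reads for this sample is reported as 0.
        table.append([sample] + [by_reference.get(ref, 0) for ref in references])
    return table


# Invented tidy input; real input would come from open("your_counts.tsv").
example = io.StringIO(
    "sampleA.sam\tchromosome\t90\n"
    "sampleA.sam\tplasmid\t10\n"
    "sampleB.sam\tchromosome\t70\n"
)

table = tidy_to_table(example)
for row in table:
    print("\t".join(str(value) for value in row))
```

Each sam file ends up on its own row with one column per reference, which is the layout described above.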

...