...
Why are some reads different colors? Right-click one of the reads to bring up a set of options. Look at the "Color alignments by" section to see the current setting and which setting you might prefer.
Interested in determining the probability that a read is not where it should be? What is a typical mapping quality (MQ) for a read?
Formula: the estimated probability that a read is mapped incorrectly is 10^(-MQ/10), where MQ is the mapping quality.
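As a quick illustration of the formula above, the following minimal sketch converts a few mapping qualities into estimated error probabilities. The function name and the MQ values chosen are only for illustration; they are not taken from the tutorial data.

```python
def mapq_to_error_probability(mq: float) -> float:
    """Estimated probability that a read is mapped incorrectly: 10^(-MQ/10)."""
    return 10 ** (-mq / 10)


# Illustrative MQ values only; check your own reads in IGV for typical values.
for mq in (0, 10, 20, 30, 60):
    probability = mapq_to_error_probability(mq)
    print(f"MQ {mq:>2}: estimated P(mapped incorrectly) = {probability:.6f}")
```

Each increase of 10 in MQ lowers the estimated error probability by a factor of 10.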
Can you find a variant where the sequenced sample differs from the reference? This would be like looking for a needle in a haystack if not for variant callers and the control-f and control-b options, which jump straight to areas where discrepancies between the reads and the reference genome might indicate mutations in the sequenced E. coli.
Some interesting example coordinates:
- Coordinate 161,041. What gene is this in and what is the effect on the protein sequence? Answer: The gene is pcnB and the mutation is a SNP.
- Coordinate 3,248,957. What gene is this in and what is the effect on the protein sequence? Answer: The gene is infB and the mutation is a SNP.
- Coordinate 3,894,997. What type of mutation is this? Answer: Deletion of the rbsD gene.
- Check out the rbsA gene region. What's going on here? Answer: There was a large deletion. Can you figure out the exact coordinates of the endpoints?
- Navigate to coordinate 3,289,962. Compare the results for different alignment programs and settings. Can you explain what's going on here? Answer: There is a 16-base deletion in the gltB gene reading frame.
- What is going on in the pykF gene region? You might see red read pairs. What does that mean? Can you guess what type of mutation occurred here? Answer: The read pairs are discordantly mapped. A new copy of a mobile genetic element (an IS150 element), which also exists at other locations in the reference sequence, was inserted here.
- See if you can find more interesting locations. There are ~190 mutations in total in this sample, most of which are false positives.
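When thinking about the effect of an insertion or deletion inside a gene, such as the 16-base deletion in gltB above, a useful first check is whether its length is a multiple of 3. The sketch below shows that check; the function name is hypothetical, and it deliberately ignores other considerations such as where in the gene the change falls.

```python
def disrupts_reading_frame(indel_length: int) -> bool:
    """Return True if an indel of this length shifts the codon reading frame."""
    # Codons are 3 bases long, so only multiples of 3 keep the frame intact.
    return indel_length % 3 != 0


print(disrupts_reading_frame(16))  # True: 16 is not a multiple of 3
print(disrupts_reading_frame(15))  # False: exactly 5 codons are removed
```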
...
- Close IGV (if you have it open from the first tutorial with your mapping, SNV, and SV data) and reopen it.
- Select "Human hg19" as the reference genome from the top-left drop-down (you may need to select "more" to have hg19 as an option)
- Load the BAM file you downloaded: File > Load from File… and select HCC1143.normal.21.19M-20M.bam
- Turn on dbSNP annotations: File > Load from Server… > Annotation > Variation and Repeats > dbSNP 1.4.7
- Right-click on the track name on the left and select sort alignments by start location
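If you want to repeat the manual steps above for several regions, IGV can also be driven by a plain-text batch script. The following is a minimal sketch that only builds the text of such a script. It assumes the BAM file sits in the current directory and that your IGV version accepts the batch commands `new`, `genome`, `load`, `goto`, and `sort position`; the helper name `igv_batch_commands` is hypothetical. It does not load the dbSNP track, which is still done from the menu.

```python
def igv_batch_commands(bam_path, regions, genome="hg19"):
    """Build IGV batch-script lines that load a BAM and visit each region."""
    commands = ["new", f"genome {genome}", f"load {bam_path}"]
    for region in regions:
        # Strip thousands separators so the locus is plain chr:start-end.
        locus = region.replace(",", "")
        commands.append(f"goto {locus}")
        commands.append("sort position")
    return commands


# Regions examined in the questions that follow.
regions = [
    "chr21:19,479,237-19,479,814",
    "chr21:19,666,833-19,667,007",
]
print("\n".join(igv_batch_commands("HCC1143.normal.21.19M-20M.bam", regions)))
```

The printed lines can be saved to a text file and, in recent IGV versions, run from the Tools menu. Treat this as a starting point rather than a replacement for exploring the regions by hand.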
There are 2 mutations visible in the chr21:19,479,237-19,479,814 region. Answer the following questions:
- Are both SNPs supported by reads mapping to both the forward and reverse DNA strand (hint: make sure reads are colored by strand)? Answer: Yes, both forward and reverse reads (red and blue if colored by strand) contain the SNPs compared to the reference.
- Which is more likely to be related to disease? Why? Answer: The one on the left does not correspond to a dbSNP entry and is therefore more likely to be related to disease state.
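The strand check from the first question can also be expressed in code. In the SAM/BAM format, bit 0x10 of a read's FLAG field marks a read aligned to the reverse strand. The sketch below counts how many reads carrying a given alternate base come from each strand. The `(flag, base)` pairs are made-up illustrative values, not reads from the HCC1143 BAM file, and the function name is hypothetical.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def strand_support(observations, alt_base):
    """Count reads showing alt_base at the site, split by strand."""
    counts = {"forward": 0, "reverse": 0}
    for flag, base in observations:
        if base != alt_base:
            continue  # read shows the reference (or another) base
        strand = "reverse" if flag & REVERSE_STRAND else "forward"
        counts[strand] += 1
    return counts


# Hypothetical (FLAG, base at the SNP position) pairs for five reads.
observations = [(99, "T"), (147, "T"), (83, "T"), (163, "C"), (99, "C")]
support = strand_support(observations, alt_base="T")
print(support)
print("supported on both strands:", all(support.values()))
```

A variant seen on only one strand is more likely to be an artifact, which is why the question asks you to color reads by strand.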
There are 2 SNPs visible in the chr21:19,666,833-19,667,007 region. Answer the following questions:
- Two mutations very close together are often a sign of poor alignment. Is that the case here (remember this is human data)? Answer: No. Each read carries only 1 of the mutations; these are 2 different alleles, each with its own SNP relative to 'wt'. Both are reported in dbSNP.
- Is either likely to be related to disease? Answer: Neither is likely to be related to disease (or at least not to rare disease), as both mutations have previously been identified as naturally occurring by dbSNP.
What is going on in the chr21:19,324,469-19,331,468 region? Answer: Homozygous deletion. Right-click the track on the left and select 'view as pairs' to show the linkage between R1 and R2 and see pairs whose reads map to opposite sides of the deletion.
What is going on in the chr21:19,102,154-19,103,108 region? Answer: This is an example of poor alignment to a repetitive AluY element. Notice how, in the read pairs that map with numerous SNPs, one read maps with many SNPs while its mate maps with none. This is caused by mapping reads to a limited area of the genome; if these reads had been allowed to map to the entire genome, it is very likely that both reads of each pair would map without SNPs somewhere else in the genome.
What other interesting things can you find?
Optional Tutorial Exercises ...
...