
IGV likes its reference genome files in GFF (General Feature Format). Unfortunately, our old friend bp_seqconvert.pl doesn't deal with GFF, so we're going to show you another tool for sequence format conversion called Readseq.

Expand
titlereadseq.jar is already installed in the $BI/bin directory so you don't have to install it yourself, but here are the steps that can be used to install it in a local directory.

To use it you need to first download the file readseq.jar linked from here. To get this onto TACC easily, use:

Code Block
wget https://sourceforge.net/projects/readseq/files/readseq/2.1.19/readseq.jar

...


After that, you simply need to know where you downloaded it. Because it is an executable, $HOME or $WORK would be good places for it if you are going to use it on TACC (in case you can't remember where the BioITeam installation is). It is also worth putting a copy on your laptop, as you may get tired of transferring files back and forth just to do simple file conversions if you do them often. In tomorrow's final tutorial there will be a section about making Java calls easier.
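As a rough sketch of the local install described above, the following downloads the jar into a directory of your choosing and prints its full path for later use. The function name `install_readseq` and the default `$HOME/local/bin` destination are illustrative assumptions, not part of the BioITeam setup, and the URL is assumed to be the SourceForge location of readseq.jar.

```shell
# Sketch only: install_readseq is a hypothetical helper name.
# The URL is assumed to be the SourceForge location of readseq.jar.
READSEQ_URL="https://sourceforge.net/projects/readseq/files/readseq/2.1.19/readseq.jar"

install_readseq() {
    # $1: destination directory (defaults to $HOME/local/bin, an assumed location)
    dest="${1:-$HOME/local/bin}"
    mkdir -p "$dest" || return 1
    # -O writes the download to an explicit file name
    wget -O "$dest/readseq.jar" "$READSEQ_URL" || return 1
    # Print the full path; you will need it for every java call
    echo "$dest/readseq.jar"
}

# Example use (not run here): install_readseq "$WORK/local/bin"
```

Printing the full path at the end is deliberate: it is the one piece of information you need every time you run the jar.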

Readseq is written in Java, which makes it a little more complicated to use, but the general command to run the software is one of these (note that you do need to include the entire path, not just the "readseq.jar" name):

Code Block
java -jar /corral-repl/utexas/BioITeam/bin/readseq.jar
java -cp /corral-repl/utexas/BioITeam/bin/readseq.jar run

...

Expand
titleWhy the funny invocation?
You are actually using the command java and telling it where to find a "jar" file of Java code to run. The -jar and -cp options run it in different ways. It is important to learn that Java executables (.jar files) always require specifying the full path to the executable. In tomorrow's final lecture we'll cover how you can work around this so you can build your own shortcuts and not have to remember where all your .jar files are stored, which can be particularly difficult if you store them in different places (some in your $HOME/local/bin directory, and some in various BioITeam directories).
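One possible shortcut, shown here only as a sketch ahead of that lecture, is a small shell function that remembers the path for you. The function name `readseq` and the variable `READSEQ_JAR` are made up for illustration; the default path is the BioITeam copy used in the commands above.

```shell
# Hypothetical variable name; override it if your copy lives elsewhere,
# e.g. READSEQ_JAR="$HOME/local/bin/readseq.jar"
READSEQ_JAR="${READSEQ_JAR:-/corral-repl/utexas/BioITeam/bin/readseq.jar}"

# Hypothetical wrapper: forwards all of its arguments to "java -jar".
readseq() {
    if [ ! -f "$READSEQ_JAR" ]; then
        echo "readseq.jar not found at $READSEQ_JAR" >&2
        return 1
    fi
    java -jar "$READSEQ_JAR" "$@"
}
```

If you put these lines in a shell startup file, typing `readseq` followed by the usual options would run the jar without retyping its path. The up-front file check gives a clearer message than the error java itself prints when the path is wrong.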


To do the conversion that we want, use this command:

...

  1. Close IGV (if you have it open from the first tutorial with your mapping, SNV, and SV data) and reopen it. 
  2. Select "Human hg19" as the reference genome from the top left drop down (you may need to select "more" to have hg19 as an option)
  3. Load the BAM file you downloaded: File > Load from File… and select HCC1143.normal.21.19M-20M.bam
  4. Turn on dbSNP annotations: File > Load from Server… > Tutorials > Variants > dbSNP 1.3.1
  5. Right click on the track name on the left and select sort alignments by start location
  6. There are 2 mutations visible in the chr21:19,479,237-19,479,814 region. Answer the following questions:
    1. Are both SNPs supported by reads mapping to both the forward and reverse DNA strands (hint: make sure reads are colored by strand)?
    2. Which is more likely to be related to disease? Why?

      Expand
      titleAnswers

      a. Yes, both forward and reverse reads (red and blue if colored by strand) contain the SNPs compared to the reference

      b. The one on the left does not correspond to a dbSNP entry and is therefore more likely to be related to disease state


  7. There are 2 SNPs visible in the chr21:19,666,833-19,667,007 region. Answer the following questions:

    1. Two mutations very close together are often a sign of poor alignment. Is that the case here (remember this is human data)?

    2. Is either likely to be related to disease? 

      Expand
      titleAnswers

      a. No. Each read has only 1 mutation on it; these are 2 different alleles, each with its own SNP relative to 'wt'. Both are reported in dbSNP.

      b. Neither is likely to be related to disease, or at least to rare disease, as both mutations have previously been identified as naturally occurring by dbSNP.


  8. What is going on in the chr21:19,324,469-19,331,468 region?

    Expand
    titleAnswers

    Homozygous deletion. In the track on the left, right click and select 'view as pairs' to see the linkage between R1 and R2, and to see individual reads mapping to both sides of the deletion.

  9. What is going on in the chr21:19,102,154-19,103,108 region?

    Expand
    titleAnswers

    This is an example of poor alignment to a repetitive AluY element. Notice how, among the read pairs that map with numerous SNPs, one read maps with lots of SNPs and the other maps with none? This is caused by mapping reads to a limited area of the whole genome; if these reads had been allowed to map to the entire genome, it is very likely that both reads of each pair would map without SNPs somewhere else in the genome.

  10. What other interesting things can you find?


Optional Tutorial Exercises ...

...