Evaluating eukaryotic assemblies
PASA
CEGMA
Visualizing assemblies using Mauve
Mauve - progressiveAlignment
IGV - glimmer to bed or gff3 conversion
Creating publication quality genome graphics
- Circos
- cgview
Probably best installed
Using Mauve to visualize the assembly
Copy the contigs.fa files from the single_out, pairedc20_out, pairedc25_out, and pairedc50_out directories that you assembled earlier to a directory on your computer's Desktop. Rename each file as you copy it: they are all named contigs.fa, so without renaming, each copy will overwrite the previous one.
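The copy-and-rename step can be sketched in the shell as follows. This is a minimal sketch, assuming the four assembly directories sit side by side under one parent directory that you can reach locally. The function name `collect_contigs` and the `<directory>_contigs.fa` naming scheme are illustrative choices, not part of the original instructions.

```shell
#!/bin/sh
# Hypothetical helper: copy contigs.fa out of each assembly directory,
# renaming each copy after its directory so nothing is overwritten.
# Usage: collect_contigs SRC_PARENT DEST_DIR
collect_contigs() {
    src_parent=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    for run in single_out pairedc20_out pairedc25_out pairedc50_out; do
        if [ -f "$src_parent/$run/contigs.fa" ]; then
            # e.g. pairedc20_out/contigs.fa -> pairedc20_out_contigs.fa
            cp "$src_parent/$run/contigs.fa" "$dest_dir/${run}_contigs.fa"
        else
            echo "skipping $run: no contigs.fa found" >&2
        fi
    done
}
```

For example, `collect_contigs . ~/Desktop/velvet_contigs` would gather the files from the current directory. If the assemblies live on a remote machine, the same naming idea applies to the destination names you give to scp.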
If you were unable to complete your Velvet assembly, you can copy pre-assembled contig files to your local computer instead, as described under Download Data.
Mauve uses a fast sequence matching algorithm to identify Locally Collinear Blocks (LCBs) between genomes. It has a polished interface for viewing whole-genome alignments or for viewing your genome assembly aligned to a closely related species.
Download Mauve
On your Desktop (local) machine, open the Mauve download page in a web browser.
Choose Linux (it's ok to leave the other boxes blank).
Download Data
In a terminal on your Desktop (local) machine.
| Code Block |
|---|
| cd ~/Desktop |
| mkdir velvet_contigs |
| cd velvet_contigs |
| scp -r username@tacc.utexas.edu:/corral-repl/utexas/BioITeam/velvet/all_fa/*paired*fa ./ |
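After the transfer finishes, it can be worth confirming that each file actually contains contigs before loading it into Mauve. The sketch below counts FASTA header lines (lines starting with `>`) in every `.fa` file in the current directory. The helper name `count_contigs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of sequences in a FASTA file
# by counting header lines, which start with '>'.
count_contigs() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so that status is deliberately ignored.
    grep -c '^>' "$1" || true
}

# Report the contig count for every .fa file in the current directory.
for fa in ./*.fa; do
    [ -e "$fa" ] || continue   # no .fa files: the glob stays literal, skip it
    printf '%s\t%s\n' "$fa" "$(count_contigs "$fa")"
done
```

A count of 0 for any file suggests that the copy or the assembly did not complete.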
"pairedc20" means the read set that has 3 subsets each with coverage of 20.
On your local computer.
Download Mauve from http://asap.ahabs.wisc.edu/mauve/download.php
The download is a ready-to-run package, so you only need to execute the "Mauve" file.
With Mauve open, go to File -> Align with ProgressiveMauve
Find the 3 contig files ending with ".fa" that correspond to the paired reads and add them to the list of sequences to align. Then create the alignment.
When the alignment is complete, you'll see a graph.
For an explanation of this graph, the following was taken from the Mauve website (http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide/using-the-alignment-viewer.html):
When a block lies above the center line the aligned region is in the forward orientation relative to the first genome sequence. Blocks below the center line indicate regions that align in the reverse complement (inverse) orientation. Regions outside blocks lack detectable homology among the input genomes. Inside each block Mauve draws a similarity profile of the genome sequence. The height of the similarity profile corresponds to the average level of conservation in that region of the genome sequence. Areas that are completely white were not aligned and probably contain sequence elements specific to a particular genome.
...
Example: Comparing two whole genomes
First we're just going to view two related species versus one another.
Choose: File → Align with progressiveMauve.
Navigate to the mauve_data folder that you downloaded and choose:
- NC_000913.2.gbk
- NC_004631.1.gbk
Then hit the Align... button. Choose to save the result as genome_alignment_result or something similar.
A console window will pop up and show a bunch of commands that are being run at the command line for you. You could run them yourself at the terminal if you wanted to use Mauve on Lonestar or another machine in power-user mode.
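For reference, running the aligner yourself looks roughly like the sketch below. It assumes the `progressiveMauve` binary that ships with the Mauve download is on your PATH. The wrapper name `run_progressive_mauve` and the `.xmfa` output file name are illustrative. Only the `--output` option is used here, and the commands shown in the GUI console may include additional options.

```shell
#!/bin/sh
# Hypothetical wrapper around the progressiveMauve command-line aligner.
# Usage: run_progressive_mauve OUTPUT GENOME1 GENOME2 [GENOME3 ...]
run_progressive_mauve() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_progressive_mauve OUTPUT GENOME1 GENOME2 [...]" >&2
        return 2
    fi
    out=$1
    shift
    if ! command -v progressiveMauve >/dev/null 2>&1; then
        echo "progressiveMauve not found on PATH" >&2
        return 127
    fi
    # The remaining arguments are the genome files to align, in order.
    progressiveMauve --output="$out" "$@"
}
```

With the two files from this example, the call would be `run_progressive_mauve genome_alignment_result.xmfa NC_000913.2.gbk NC_004631.1.gbk`.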
After a little bit, a window will pop up showing the aligned genomes. It should look something like this
Navigating in Mauve
- You can move around and zoom in and out using Control and the arrow keys.
- You can click on a region in one genome to center the aligned regions in the other genome to it.
What do the
Example: Comparing an assembly to a reference genome
Transferring annotation
Mauve has a useful feature to transfer the coordinates of genes across the alignment that it has made.
Other ways to create genome graphics
There are various powerful (non-interactive) programs for creating publication-quality genome graphics:
- Circos
- cgview
These tools are easiest to install Probably best installed
Evaluating eukaryotic assemblies
If you are going to seriously annotate a eukaryotic genome, then you are going to need a machine with database infrastructure and other tools installed. Here are two systems for annotating gene structure in eukaryotic genomes:
PASA - Program to Assemble Spliced Alignments
CEGMA - Core Eukaryotic Genes Mapping Approach