...
Mauve uses a fast sequence matching algorithm to identify Locally Collinear Blocks (LCBs) between genomes. It has a really slick interface for viewing whole-genome alignments or for viewing your genome assembly aligned to a closely related species.
...
Install Mauve
On your Desktop (local) machine, go here in a web browser:
...
Choose Linux (it's ok to leave the other boxes blank).
Mauve has a binary that you should be able to launch by double-clicking on it.
Download Data
In a terminal on your Desktop (local) machine.
| Code Block |
|---|
| cd ~/Desktop |
| scp -r taccusername@lonestar.tacc.utexas.edu:/corral-repl/utexas/BioITeam/ngs_course/mauve_examples ~/Desktop |
Example: Comparing two whole genomes
First we're just going to view two related species versus one another.
Choose: File → Align with progressiveMauve...
Navigate to the mauve_examples folder that you downloaded and choose:
...
Example: Comparing assemblies to a reference genome
Next, we are going to compare several assemblies that were created in.
Choose: Tools → Move Contigs...
Move Contigs reorders the contigs in each assembly based on their similarity so that we won't have a jumbled mess of connections and will be better able to compare the assemblies.
Navigate to the mauve_examples folder that you downloaded and choose:
- NC_000913.2.gbk - reference genome for a related (but not identical) strain.
- pairedc20.fa
Be sure to add them in this order! We need the reference first.
To remind you of what we are looking at:
| | Set 1 | Set 2 | Set 3 | Set 4 |
|---|---|---|---|---|
| File Name | single.fa | pairedc50.fa | pairedc25.fa | pairedc20.fa |
| Read Size | 100 | 100 | 100 | 100 |
| Paired/Single Reads | Single | Paired | Paired | Paired |
| Gap Sizes | NA | 400 | 400, 3000 | 400, 3000, 1500 |
| Coverage | 50 | 50 | 25 for each subset | 20 for each subset |
| Number of Subsets | 1 | 1 | 2 | 3 |
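The last two rows of the table can be combined into a total coverage per set. This is a small sketch of that arithmetic, assuming the subsets within a set are simply pooled, so that total coverage is the per-subset coverage times the number of subsets. The function name `total_coverage` is made up for illustration.

```shell
# Total coverage of a read set, assuming its subsets are pooled:
# coverage per subset x number of subsets.
total_coverage() {
    echo $(( $1 * $2 ))
}

total_coverage 50 1   # Set 1 and Set 2
total_coverage 25 2   # Set 3
total_coverage 20 3   # Set 4
```

Under that assumption, Sets 1 to 3 all have a total coverage of 50 and Set 4 has 60. Differences between those assemblies would then come mostly from pairing and gap sizes rather than from sequencing depth.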
Try out some of the other files as well.
You can also try to align all five files at once with Align with progressiveMauve...:
- NC_000913.2.gbk
- single.fa
- pairedc20.fa
- pairedc25.fa
- pairedc50.fa
Warning! single.fa has a lot of sequences. If you add it to the mix, things will get ugly.
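Before loading an assembly, it can help to check how many sequences each FASTA file contains. This is a minimal sketch, assuming every sequence starts with a `>` header line. The function name `count_fasta_records` is made up for illustration and is not part of Mauve.

```shell
# Count the sequences (header lines starting with ">") in each FASTA file given.
# Prints one "file<TAB>count" line per file.
count_fasta_records() {
    for fasta in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero when
        # the count is 0, so "|| true" keeps the loop going under "set -e".
        n=$(grep -c '^>' "$fasta" || true)
        printf '%s\t%s\n' "$fasta" "$n"
    done
}
```

For example, if the data was downloaded to `~/Desktop/mauve_examples` as in the download step, `count_fasta_records ~/Desktop/mauve_examples/*.fa` lists the contig count of every assembly, so you can see in advance which files are likely to clutter the view.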
Transferring annotation
Mauve has a useful feature for transferring the coordinates of genes across the alignment produced by Move Contigs. This can be a good way of assigning orthologs.
Choose: Tools → Export... → Export Orthologs
Other ways to create genome graphics
There are various powerful (non-interactive) programs for drawing the awesome pictures that you see in publications. They are not CPU intensive and you have to fiddle with the configuration files, so it's best to use them on your own computer where you can easily view the images that they create. These tools are also easiest to install on a computer where you have administrator privileges.
- Circos - fully featured. Very cool functions for drawing connections between different chromosome locations.
- CGView - simpler to use. Better functionality for showing genes up close as arrows. Does not draw links.
Evaluating eukaryotic assemblies
If you are going to seriously annotate a eukaryotic genome, then you are going to need a machine with database infrastructure and other tools installed. Here are two systems for annotating gene structure in eukaryotic genomes:
...