GS De Novo Assembler
Summary
Assembles sequencing reads into contigs. Current version: 2.5.3. For full details, see the Roche manual.
Input options
SFF files
FASTA files
Converted Sanger data: FASTA files and their corresponding quality files
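For converted Sanger data, each FASTA file needs its quality file. The Python sketch below checks a list of input paths for FASTA files that have no matching quality file. It assumes the quality file shares the FASTA file's base name with a .qual extension (for example reads.fna and reads.qual); FASTA_EXTS and missing_quality_files are hypothetical names, so adapt them to your own file naming.

```python
from pathlib import PurePath

# Assumed FASTA extensions; adjust to your own naming scheme.
FASTA_EXTS = {".fna", ".fa", ".fasta"}


def missing_quality_files(paths):
    """Return the FASTA paths that have no same-named .qual file in `paths`."""
    present = {str(PurePath(p)) for p in paths}
    missing = []
    for p in paths:
        path = PurePath(p)
        if path.suffix.lower() in FASTA_EXTS:
            # Assumption: quality file = same base name with a .qual extension.
            if str(path.with_suffix(".qual")) not in present:
                missing.append(p)
    return missing


if __name__ == "__main__":
    inputs = ["sanger/reads.fna", "sanger/reads.qual", "sanger/extra.fasta", "run1.sff"]
    print(missing_quality_files(inputs))  # ['sanger/extra.fasta']
```

SFF files are skipped by the check, since they do not need a separate quality file.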
Output options
Consensus sequences (contigs)
Corresponding quality scores
ACE files
Assembly metrics files
Pairwise alignments
Read status file
Alignment views (GUI only)
Flowgrams (GUI only)
Scaffold files (paired-end data only)
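The assembly metrics files summarize the contigs. As an illustration of one common contig statistic, the Python sketch below computes N50 from a contig FASTA. This is a generic calculation, not the assembler's own code; the values it reports may differ if the assembler restricts a metric to a subset of contigs (for example, only contigs above a minimum length). The function names are hypothetical.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length of the shortest contig among the longest contigs that
    together cover at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    covered = 0
    for length in ordered:
        covered += length
        if covered >= half:
            return length
    return 0  # no contigs


if __name__ == "__main__":
    contigs = [">contig1", "ACGTACGTAC", ">contig2", "ACGTA", ">contig3", "ACG"]
    print(n50(fasta_lengths(contigs)))  # 10
```

To use it on a real file, pass an open file handle for the contig FASTA to fasta_lengths.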
Running the GS De Novo Assembler
GUI Assembler:
Launch it by typing gsAssembler.
Command-line Assembler:
runAssembly -o /data/filename /data/R_/D_
For paired-end data:
runAssembly -o /data/filename -p /data/R_/D_
Here -o names the output directory for the assembly.
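When runs are scripted, the two runAssembly invocations can be built programmatically. The Python sketch below assembles the argument list using only the -o and -p options shown in the examples; build_run_assembly_cmd is a hypothetical helper, and it assumes a single input path per call, as in those examples.

```python
import shlex


def build_run_assembly_cmd(output_dir, input_path, paired=False):
    """Build a runAssembly argument list following the command-line examples."""
    cmd = ["runAssembly", "-o", output_dir]  # -o: output directory
    if paired:
        # In the paired-end example, -p comes right before the input path.
        cmd.append("-p")
    cmd.append(input_path)
    return cmd


if __name__ == "__main__":
    cmd = build_run_assembly_cmd("/data/filename", "/data/R_/D_", paired=True)
    print(shlex.join(cmd))  # runAssembly -o /data/filename -p /data/R_/D_
```

Keeping the command as a list rather than a single string avoids quoting problems; with runAssembly installed and on PATH, the list can be passed to subprocess.run(cmd, check=True).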
Some options
Incremental de novo assembly: lets you add more data to an existing assembly as it becomes available.
Large or complex genomes: use this option for genomes larger than 15 Mb.
Trimming database file: a FASTA file of sequences (such as vectors) to be trimmed from the reads.
Screening database file: a file of contaminant sequences against which the reads are screened.
cDNA assembly: use the -cdna option.
Things to remember
By default, reads shorter than 50 bp are removed.
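To see roughly how many reads the 50 bp default will discard, the Python sketch below splits FASTA reads at that cutoff. It is only an estimate: the assembler may apply its cutoff after its own trimming, so its counts can differ. parse_fasta and split_by_length are hypothetical helpers.

```python
MIN_READ_LENGTH = 50  # default cutoff stated above


def parse_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records, min_len=MIN_READ_LENGTH):
    """Split read names into (kept, dropped); reads shorter than min_len are dropped."""
    kept, dropped = [], []
    for name, seq in records:
        (kept if len(seq) >= min_len else dropped).append(name)
    return kept, dropped


if __name__ == "__main__":
    reads = [">r1", "A" * 60, ">r2", "A" * 49, ">r3", "A" * 25, "A" * 25]
    print(split_by_length(parse_fasta(reads)))  # (['r1', 'r3'], ['r2'])
```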
The assembler produces better assemblies from SFF files than from FASTA files alone, because SFF files include the flowgrams, whose signal information the assembler uses during assembly.
Consider running RepeatMasker to handle repeats before assembly.
This version of the assembler uses 3 to 4 bytes of memory per base and runs on a single processor only. If there is not enough memory for an assembly, try the incremental de novo assembly option.
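The 3 to 4 bytes per base figure gives a quick way to check whether an assembly is likely to fit in memory before starting it. The Python sketch below turns a base count into a memory range. It assumes "per base" refers to the total bases in the input reads, and the function names are hypothetical.

```python
BYTES_PER_BASE_LOW = 3   # lower figure stated above
BYTES_PER_BASE_HIGH = 4  # upper figure stated above


def estimate_memory_gb(total_bases):
    """Return (low, high) memory estimates in gigabytes (1 GB = 10**9 bytes).

    Assumption: total_bases is the total number of bases in the input reads.
    """
    return (total_bases * BYTES_PER_BASE_LOW / 1e9,
            total_bases * BYTES_PER_BASE_HIGH / 1e9)


def likely_fits(total_bases, available_gb):
    """True if even the high estimate fits in the available memory."""
    return estimate_memory_gb(total_bases)[1] <= available_gb


if __name__ == "__main__":
    print(estimate_memory_gb(500_000_000))  # (1.5, 2.0)
    print(likely_fits(500_000_000, available_gb=1.9))  # False
```

If the high estimate exceeds the available memory, that is the case where the incremental de novo assembly option is worth trying.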