GS De novo assembler

Summary

Assembles reads de novo and generates contigs. Current version: 2.5.3. See the full Roche manual for details.

Input options

  • Sff files

  • Fasta files

  • Converted Sanger data: Fasta files and their corresponding quality files

Output options

  • Consensus sequence (contigs)

  • Corresponding quality scores

  • ACE files

  • Assembly metrics files

  • Pairwise alignments

  • Read status file

  • Alignment views - GUI only

  • Flowgrams - GUI only

  • Scaffold files - paired-end data only

Running GS De novo assembler

GUI Assembler - 

  • Start it by typing gsAssembler at the command line

Command-line Assembler - 

  • runAssembly -o /data/filename /data/R_/D_

  • For paired-end data: runAssembly -o /data/filename -p /data/R_/D_

Some options

  • Incremental de novo assembly - lets you add more data to an existing assembly when needed.

  • Large or complex genomes - for genomes larger than 15 Mb, use this option.

  • Trimming database file - provide a fasta file of sequences (such as vectors) that should be trimmed from the reads.

  • Screening database file - provide a file of contaminant sequences to screen the reads against.

  • cDNA assembly - use the -cdna option.
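The command-line forms listed on this page can be collected in a small dry-run helper that prints a runAssembly command instead of executing it, which is handy for checking a command before a long run. This is only a sketch: the function name build_assembly_cmd, the mode names, and all example paths are hypothetical, and placing -cdna before -o is an assumption, since this page gives only the option name.

```shell
#!/bin/sh
# Dry-run helper: prints a runAssembly command line, does not run it.
# Only options named on this page are used (-o, -p, -cdna).
build_assembly_cmd() {
    out_path=$1          # value passed to -o
    input=$2             # sff or fasta input
    mode=${3:-single}    # single | paired | cdna (hypothetical mode names)

    case "$mode" in
        single) echo "runAssembly -o $out_path $input" ;;
        paired) echo "runAssembly -o $out_path -p $input" ;;
        # Position of -cdna on the command line is an assumption.
        cdna)   echo "runAssembly -cdna -o $out_path $input" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Hypothetical paths, for illustration only.
build_assembly_cmd /data/my_assembly /data/reads.sff
build_assembly_cmd /data/my_assembly /data/paired_reads.sff paired
build_assembly_cmd /data/my_assembly /data/transcripts.sff cdna
```

Once the printed command looks right, it can be copied to the prompt and run as usual.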

Things to remember

  • Reads shorter than 50 bp are removed by default.

  • The tool produces better assemblies from sff files than from fasta files alone, because the flowgrams in sff files are used when computing signals.

  • It is a good idea to use RepeatMasker to handle repeats before assembly.

  • The current assembler version uses 3 to 4 bytes of memory per base and runs on a single processor only. If there is not enough memory for an assembly, try the incremental de novo assembly option.
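The 3 to 4 bytes per base figure gives a quick way to check whether a data set is likely to fit in memory. The sketch below assumes that "per base" refers to the total number of bases in the input reads. The function name estimate_memory_mb and the example data set (1,000,000 reads averaging 400 bases) are hypothetical.

```shell
#!/bin/sh
# Rough memory estimate from the bytes-per-base figure quoted above.
# Assumption: "per base" means total bases across all input reads.
estimate_memory_mb() {
    total_bases=$1
    bytes_per_base=$2
    # Integer arithmetic; 1 MB is taken as 1,000,000 bytes.
    echo $(( total_bases * bytes_per_base / 1000000 ))
}

# Hypothetical data set: 1,000,000 reads averaging 400 bases each.
total_bases=$(( 1000000 * 400 ))
low=$(estimate_memory_mb "$total_bases" 3)
high=$(estimate_memory_mb "$total_bases" 4)
echo "Estimated memory: ${low} to ${high} MB"
```

For this example the estimate is 1200 to 1600 MB. If that is more than the machine offers, the incremental de novo assembly option is the one to try.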