Conversion of mapreads output to GFF, SAM, or BAM format

Summary

ABI and SAM utilities convert mapreads output to GFF and then to SAM format; samtools then converts the SAM file to BAM. The conversion also yields the read sequences in base space.

Available on

Fourierseq

ABI MaToGff: /usr/local/genome/bin/MaToGff
SAM utility GffToSam: /usr/local/genome/bin/GffToSam

How to run these utilities

Important: The mapreads alignment must have been run with a quality file for the final SAM conversion step to work.

1. First, to convert mapreads output files into gff files, run MaToGff:

java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.MaToGff mapreads.out --convert=mapped --tempdir=/home/<userid>/tmp/ --sort > output.gff 2>output.log

where

mapreads.out : mapreads output
       --convert=mapped : reports only one match per read in the gff file (picks the position that aligns with the lowest number of mismatches).
       --convert=hits : an alternative option that reports all matches.
       --tempdir=/home/<userid>/tmp/ : the location of YOUR temp directory
       output.gff : the gff output
       output.log : messages written to standard error

Note: There are numerous other options (including other values for --convert). Running java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.MaToGff without any parameters displays all available options.

2. Next, to convert to base space and annotate the gff file, run AnnotateChanges:

java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.AnnotateChanges output.gff reference.fasta --b --correctTo=[reference|consistent|missing|singles] > output.annotated.gff 2>annotate.log

where

output.gff : gff output from previous MaToGff utility
       reference.fasta : reference file that mapping was performed against (in fasta format)
       --b : Converts to base space
       --correctTo : indicates how to perform correction when converting to base space
       output.annotated.gff : the annotated gff file with read sequences in base space

Note: To get an understanding of all options, type in java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.AnnotateChanges without any parameters.
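Both Java tools above share the same JVM options and classpath, so the long command lines can be shortened with a small shell wrapper. The sketch below is one possible way to do that. The function name abi_tool and the variables MATOGFF_CP, MATOGFF_PKG, and JAVA are illustrative names chosen here; they are not part of the ABI distribution.

```shell
# Shared settings, copied from the commands above.
# The classpath is quoted where it is used so that the shell never expands
# the trailing '*'; Java itself interprets it as "all jars in this directory".
MATOGFF_CP=':/usr/local/genome/bin/MaToGff/lib/java/matogff/*'
MATOGFF_PKG='com.apldbio.aga.analysis.secondary.modules'

# Run one of the ABI Java tools (MaToGff or AnnotateChanges) with the shared
# settings. JAVA is a hypothetical override for the java executable.
abi_tool() {
  tool=$1
  shift
  ${JAVA:-java} -Xmx1024m -cp "$MATOGFF_CP" "$MATOGFF_PKG.$tool" "$@"
}
```

With this in place, steps 1 and 2 could be written as abi_tool MaToGff mapreads.out --convert=mapped --tempdir=/home/<userid>/tmp/ --sort > output.gff 2>output.log and abi_tool AnnotateChanges output.gff reference.fasta --b --correctTo=reference > output.annotated.gff 2>annotate.log. Calling abi_tool MaToGff with no further arguments should print the option list, as described in the notes.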

3. To convert the annotated gff file to SAM format, run GffToSam:

GffToSam -i output.annotated.gff -o output.annotated.sam -id [someidforthisfile] -sm [samplename]

Because the ABI gff-to-SAM converter writes an older SAM format, its output lacks the required @SQ header lines. These must be added manually at the top of the file. For every chromosome, add a line like:

@SQ\tSN:[chrno]\tLN:[numberofbasesinchromosome]
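Typing the @SQ lines by hand is error-prone for references with many sequences. As a rough sketch, they could instead be derived from the reference FASTA file with awk. The function names fasta_to_sq and add_sq_header are made up for this example. It assumes that the sequence names used in the SAM file match the first word of each FASTA header line, which should be checked against the actual GffToSam output.

```shell
# Print one @SQ line per FASTA record: SN is the first word after '>',
# LN is the total number of sequence characters in that record.
fasta_to_sq() {
  awk '
    /^>/ {
      if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len
      name = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len }
  ' "$1"
}

# Write a copy of the SAM file ($2) with @SQ lines from the FASTA ($1) added.
# An existing @HD line stays first; the @SQ lines follow it, then the rest
# of the file in its original order.
add_sq_header() {
  grep '^@HD' "$2" || true
  fasta_to_sq "$1"
  grep -v '^@HD' "$2"
}
```

For example, add_sq_header reference.fasta output.annotated.sam > output.withheader.sam writes a new file (the output name is arbitrary) that can be used as the input to the BAM conversion in the next step.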

4. To convert this SAM file to BAM format:

samtools view -b -S output.annotated.sam > output.bam

where

-b: indicates output BAM
            -S: indicates input SAM

5. To sort and index the BAM file:

samtools sort output.bam output.sorted

where

output.bam :  input BAM file
            output.sorted.bam : sorted BAM file

samtools index output.sorted.bam output.sorted.bai

where

output.sorted.bam :  input sorted BAM file
            output.sorted.bai: index of the sorted BAM file

Note: For viewing in IGV, make sure that the sorted BAM file and its index are in the same directory and share a base name (for example, file.bam and file.bai). Then load the BAM file into IGV to view the alignments.
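Steps 4 and 5 can be chained so that a failure in one command stops the rest. The following is a minimal sketch under a few assumptions: the helper names run and sam_to_sorted_bam and the DRY_RUN variable are invented for this example, the output names are derived from the input name (so output.annotated.sam yields output.annotated.bam rather than output.bam), and samtools view is given -o instead of a shell redirect so that the command can be printed before it runs.

```shell
# Print each command; skip execution when DRY_RUN=1 (to preview the plan).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Convert a SAM file to BAM, then sort and index it, stopping at the first
# command that fails.
# The sort call uses the legacy "samtools sort <in.bam> <out.prefix>" form
# from step 5; samtools 1.x expects "samtools sort -o <out.bam> <in.bam>".
sam_to_sorted_bam() {
  sam=$1
  base=${sam%.sam}
  run samtools view -b -S -o "$base.bam" "$sam" &&
  run samtools sort "$base.bam" "$base.sorted" &&
  run samtools index "$base.sorted.bam" "$base.sorted.bai"
}
```

Running DRY_RUN=1 sam_to_sorted_bam output.annotated.sam only prints the three samtools commands; without DRY_RUN they are executed in order.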