Conversion of mapreads output to GFF, SAM, or BAM format
Summary
ABI and SAM utilities can be used to convert mapreads output into GFF and then into SAM and BAM format. The conversion also yields the read sequences in base space.
Available on
ABI MaToGff: /usr/local/genome/bin/MaToGff
SAM utility GffToSam: /usr/local/genome/bin/GffToSam
How to run these utilities
Important: The mapreads alignment must have been run with a quality file for the final SAM conversion step to work.
1. First, to convert mapreads output files into gff files, run MaToGff:
java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.MaToGff mapreads.out --convert=mapped --tempdir=/home/<userid>/tmp/ --sort > output.gff 2>output.log
where
mapreads.out : mapreads output
--convert=mapped : reports only one match per read in the gff file (picks the position that aligns with the lowest number of mismatches)
--convert=hits : an alternative option that reports all matches
--tempdir=/home/<userid>/tmp/ : the location of YOUR temp directory
output.gff : the gff output
Note: there are numerous other options (including for --convert). Just typing in java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.MaToGff without any parameters will display all available options.
2. Next, to convert the gff file to base space and annotate it, run AnnotateChanges:
java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.AnnotateChanges output.gff reference.fasta --b --correctTo=[reference|consistent|missing|singles] > output.annotated.gff 2>annotate.log
where
output.gff : gff output from the previous MaToGff step
reference.fasta : reference file that the mapping was performed against (in fasta format)
--b : converts to base space
--correctTo : indicates how to perform correction when converting to base space
output.annotated.gff : the annotated gff file with read sequences in base space
Note: To get an understanding of all options, type in java -Xmx1024m -cp :/usr/local/genome/bin/MaToGff/lib/java/matogff/* com.apldbio.aga.analysis.secondary.modules.AnnotateChanges without any parameters.
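Steps 1 and 2 share the same long java invocation, so a small shell wrapper can cut down on typing. This is only a sketch: the function name run_matogff_tool and the variable MATOGFF_CP are made up for illustration, and the classpath is copied from the commands above. The classpath is quoted so that the shell passes the trailing * through to java untouched.

```shell
# Classpath copied from the MaToGff and AnnotateChanges commands on this page.
MATOGFF_CP=":/usr/local/genome/bin/MaToGff/lib/java/matogff/*"

# Hypothetical helper: run one of the ABI tools by its class name.
#   $1   = class name (MaToGff or AnnotateChanges)
#   rest = arguments passed through to the tool unchanged
run_matogff_tool() {
    tool=$1
    shift
    java -Xmx1024m -cp "$MATOGFF_CP" \
        "com.apldbio.aga.analysis.secondary.modules.$tool" "$@"
}

# Usage (same arguments as in steps 1 and 2):
#   run_matogff_tool MaToGff mapreads.out --convert=mapped --tempdir=/home/<userid>/tmp/ --sort > output.gff 2>output.log
#   run_matogff_tool AnnotateChanges     (no parameters: lists the available options)
```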
3. To convert this gff file to SAM format:
GffToSam -i output.annotated.gff -o output.annotated.sam -id [someidforthisfile] -sm [samplename]
The ABI gff-to-SAM converter writes an older SAM format that lacks the required @SQ header lines, so these must be added manually at the top of the file. For every chromosome, add a line like the following (\t stands for a tab character):
@SQ\tSN:[chrno]\tLN:[numberofbasesinchromosome]
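Rather than typing the @SQ lines by hand, they can be generated from the reference fasta file. The sketch below assumes that the sequence names used in the SAM file match the first word of each fasta header line, and that the reference is the same reference.fasta used in step 2. The function names fasta_to_sq and add_sq_headers are hypothetical.

```shell
# Print one @SQ header line per sequence in a fasta file.
# SN is the first word of the ">" header line; LN is the total number of bases.
fasta_to_sq() {
    awk '
        /^>/ {
            if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len
            name = substr($1, 2)          # drop the leading ">"
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore whitespace in sequence lines
        END { if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len }
    ' "$1"
}

# Write a new SAM file: existing header lines first, then the generated
# @SQ lines, then the alignment records.
#   $1 = reference fasta, $2 = input SAM, $3 = output SAM
add_sq_headers() {
    {
        awk '/^@/' "$2"      # header lines already present in the SAM file
        fasta_to_sq "$1"     # generated @SQ lines
        awk '!/^@/' "$2"     # alignment records
    } > "$3"
}

# Usage:
#   add_sq_headers reference.fasta output.annotated.sam output.withsq.sam
```

Writing to a separate output file (output.withsq.sam is just an example name) keeps the original SAM file intact in case the sequence names turn out not to match.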
4. To convert this SAM file to BAM format:
samtools view -b -S output.annotated.sam > output.bam
where
-b : indicates BAM output
-S : indicates SAM input
5. To sort and index the BAM file:
samtools sort output.bam output.sorted
where
output.bam : input BAM file
output.sorted : prefix of the sorted BAM file (written as output.sorted.bam)
samtools index output.sorted.bam output.sorted.bai
where
output.sorted.bam : input sorted BAM file
output.sorted.bai : index file for the sorted BAM file
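The sort command in step 5 uses the older samtools form that takes an output prefix. If the installed samtools is a newer release (1.3 or later), the sort step expects the output file through -o instead. A sketch for that case follows; the function names sam_to_bam and sort_and_index_bam are hypothetical, and it is worth checking the samtools version on your system before relying on it.

```shell
# Convert SAM to BAM with a newer samtools (input format is detected automatically).
#   $1 = input SAM, $2 = output BAM
sam_to_bam() {
    samtools view -b -o "$2" "$1"
}

# Sort a BAM file by coordinate and index the result.
#   $1 = unsorted BAM, $2 = sorted BAM to write
# Without a second argument, samtools index writes the index next to the
# BAM file as <sorted BAM>.bai (for example output.sorted.bam.bai).
sort_and_index_bam() {
    samtools sort -o "$2" "$1" && samtools index "$2"
}

# Usage:
#   sam_to_bam output.annotated.sam output.bam
#   sort_and_index_bam output.bam output.sorted.bam
```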
Note: For viewing in IGV, make sure that the sorted BAM file and its index file are in the same directory (named file.bam and file.bai). Then load the BAM file into IGV to view the alignment.
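Before opening IGV, a quick check that the index really sits next to the BAM file can save some confusion. The sketch below uses a hypothetical function name, has_bam_index. In addition to the file.bai naming mentioned above, it also accepts file.bam.bai, on the assumption that IGV picks up that naming as well.

```shell
# Report whether a BAM file has an index file in the same directory.
#   $1 = path to the BAM file
has_bam_index() {
    bam=$1
    if [ ! -f "$bam" ]; then
        echo "BAM file not found: $bam" >&2
        return 1
    fi
    # Accept both naming styles: file.bai and file.bam.bai
    if [ -f "${bam%.bam}.bai" ] || [ -f "$bam.bai" ]; then
        echo "index found for $bam"
    else
        echo "no index found for $bam" >&2
        return 1
    fi
}

# Usage:
#   has_bam_index output.sorted.bam
```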