SOAP

Summary

SOAP (Short Oligonucleotide Analysis Package), version 2.18, is a fast, versatile aligner for short reads. It uses a 2-way Burrows-Wheeler transform to reduce the memory needed during mapping. It handles only base space data.

Available on

Fourierseq

User documentation

How to run SOAP

Because SOAP does not handle color space data, the only way to use SOAP with color space reads is to convert both the reads and the reference to mock base space.

Example pipeline for running SOAP with color space reads (for base space reads, start at step 3):

1. Convert the reference to mock base space.

bs2cs ref.fasta > ref.csfasta

cs2mbs ref.csfasta > ref.m.fasta

where

ref.fasta : reference in base space
ref.csfasta : reference in color space (temporary file)
ref.m.fasta : reference in mock base space

2. Convert the reads to mock base space.

cs2mbs -d -r in.csfasta > in.m.fasta

where

in.csfasta : reads file in color space
in.m.fasta : reads file in mock base space
-d : drop the first color space base during conversion, because it is part of the primer
-r : for each read, also include the reverse of the mock base space sequence

3. Create SOAP indexes for the reference genome.

2bwt-builder ref.m.fasta

where

ref.m.fasta : reference in mock base space
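
Index building only needs to happen once per reference. The sketch below skips it when index files are already present. It assumes only that the index files start with the base name ref.m.fasta.index used in step 4; have_index is a hypothetical helper name, not part of SOAP.

```shell
# Succeed if at least one existing file starts with the given index base name.
# have_index is a hypothetical helper, not a SOAP command.
have_index() {
    for f in "$1"*; do
        # If nothing matches, the pattern stays literal and this test fails.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

A possible use is: have_index ref.m.fasta.index || 2bwt-builder ref.m.fasta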

4. Align using SOAP.

soap -D ref.m.fasta.index -v 3 -a in.m.fasta -o out

where

ref.m.fasta.index : base name for the SOAP reference indexes
in.m.fasta : reads file (in mock base space)
out : mapping output file
-v 3 : maximum number of mismatches allowed in the entire alignment
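
Steps 1 to 4 can be collected in one place. The sketch below prints the commands for a given reference and reads file, so they can be reviewed before anything runs. soap_cs_commands is a hypothetical helper name; the output file names are derived by the same naming pattern used above (ref.fasta, ref.csfasta, ref.m.fasta), and file names are assumed to contain no spaces.

```shell
# Print the pipeline commands for one reference and one color space reads file.
# soap_cs_commands is a hypothetical helper; file names must not contain spaces.
soap_cs_commands() {
    ref=$1                        # reference in base space, e.g. ref.fasta
    reads=$2                      # reads in color space, e.g. in.csfasta
    out=$3                        # mapping output file
    ref_base=${ref%.fasta}        # strip the .fasta suffix
    reads_base=${reads%.csfasta}  # strip the .csfasta suffix

    echo "bs2cs $ref > $ref_base.csfasta"                     # step 1
    echo "cs2mbs $ref_base.csfasta > $ref_base.m.fasta"       # step 1
    echo "cs2mbs -d -r $reads > $reads_base.m.fasta"          # step 2
    echo "2bwt-builder $ref_base.m.fasta"                     # step 3
    echo "soap -D $ref_base.m.fasta.index -v 3 -a $reads_base.m.fasta -o $out"  # step 4
}
```

To run the printed commands and stop at the first failure, pipe them into a shell: soap_cs_commands ref.fasta in.csfasta out | sh -e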

Troubleshooting

  • If you see many warning messages such as 'length y < 0, countinue as 13', your reads are too short for SOAP to handle properly. Currently, SOAP supports only reads longer than 30 bp. (Source: a newsgroup article describing version 2.18; version 2.20 shows the same result.)
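
One way to avoid these warnings is to drop short reads before aligning. The sketch below keeps only reads longer than a given length. It assumes each read takes exactly two lines in the file (a ">" header followed by one sequence line); filter_short_reads is a hypothetical helper name.

```shell
# Keep only FASTA records whose sequence is longer than min_len bases.
# Assumes two lines per read: a ">" header, then the whole sequence on one line.
filter_short_reads() {
    min_len=$1
    awk -v min="$min_len" '
        /^>/ { header = $0; next }                   # remember the header line
        length($0) > min { print header; print $0 }  # emit reads that are long enough
    '
}
```

For example, to keep only reads longer than 30 bp: filter_short_reads 30 < in.m.fasta > in.long.m.fasta (the output file name is only an illustration).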