Tricks to preprocess SOLiD and 454 data
Some tricks to preprocess/assess ABI SOLiD data
- Look for dominant sequences in your data
- grep -v '^>' F3.csfasta |sort|uniq -c -w 25|sort -n -r|head -20
- F3.csfasta : Input file, the raw csfasta file from ABI SOLiD
- This command lists the 20 most frequent reads, treating two reads as identical when their first 25 characters match. Change 25 if you want more or less of the read to be considered when looking for dominant sequences.
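The same count can be wrapped in a small shell function. This is only a sketch: the name dominant_prefixes and its arguments are made up for illustration, and it assumes the csfasta file may also contain comment lines starting with '#', which it skips along with the '>' headers. Unlike the one-liner above, it prints only the compared prefix rather than the first full read of each group.

```shell
# Hypothetical helper (not part of any SOLiD toolkit).
# $1 = csfasta file, $2 = prefix length (default 25), $3 = rows to show (default 20)
dominant_prefixes() {
    file=$1
    len=${2:-25}
    top=${3:-20}
    # Drop '>' header lines and '#' comment lines, keep the first $len
    # characters of each read, then count identical prefixes.
    grep -v -e '^>' -e '^#' "$file" \
        | cut -c "1-$len" \
        | sort \
        | uniq -c \
        | sort -k1,1nr \
        | head -n "$top"
}
```

Example: dominant_prefixes F3.csfasta 25 20. Truncating with cut before counting avoids the -w option of uniq, which is a GNU extension and may be missing on other systems.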
Some tricks to preprocess/assess 454 data
- Make 454 data into format of one sequence per line
- makeSeqsOneLine 454.fna > 454.modified.fna
- 454.fna : Input file of raw 454 data
- 454.modified.fna : Output file of modified 454 data
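makeSeqsOneLine appears to be a local script, so it may not be available on every machine. A minimal awk sketch of the same idea follows, assuming all the script does is join wrapped sequence lines so that each header is followed by a single sequence line. The function name unwrap_fasta is hypothetical.

```shell
# Hypothetical stand-in for makeSeqsOneLine, assuming it only joins
# wrapped sequence lines. $1 = input FASTA file.
unwrap_fasta() {
    awk '
        # Header line: flush the sequence collected so far, then print the header
        /^>/ { if (seq != "") print seq; print; seq = ""; next }
        # Sequence line: append it to the current record
        { seq = seq $0 }
        # End of file: flush the last record
        END { if (seq != "") print seq }
    ' "$1"
}
```

Example: unwrap_fasta 454.fna > 454.modified.fna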
- Pull out read sequences (with read id) containing a certain pattern (Let's say 'TAGGAC')
- grep -B 1 'TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna
- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads containing the specified pattern.
- Pull out read sequences (with read id) starting with a certain pattern (Let's say 'TAGGAC')
- grep -B 1 '^TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna
- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads starting with the specified pattern.
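Because grep tests every line, a pattern that happens to occur inside a read id would also match (454 read ids can contain the letters A, C, G and T). One possible way around this is to test only the sequence lines with awk. The sketch below assumes the one-sequence-per-line format produced in the first step, and the function name filter_reads is hypothetical.

```shell
# Hypothetical helper: print header + sequence pairs whose sequence matches
# a regular expression. $1 = pattern, $2 = one-sequence-per-line FASTA file.
filter_reads() {
    awk -v pat="$1" '
        # Header line: remember it, but never test it against the pattern
        /^>/ { header = $0; next }
        # Sequence line: print the pair only if the sequence matches
        $0 ~ pat { print header; print $0 }
    ' "$2"
}
```

Examples: filter_reads 'TAGGAC' 454.modified.fna > 454.pattern.fna for reads containing the pattern, and filter_reads '^TAGGAC' 454.modified.fna > 454.pattern.fna for reads starting with it.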
- To get the reverse complement sequences for a fasta file, run the following command on fourierseq:
- reversecomplement.pl test.fasta|sed '/^>/!s/U/T/g' > test.revcomp.fasta
- test.fasta: Fasta input file
- test.revcomp.fasta : Fasta output file, with reverse complemented sequences
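reversecomplement.pl is only available where it has been installed. On other machines, a rough equivalent can be sketched with awk. This is not the script's actual implementation: the function name revcomp_fasta is hypothetical, it assumes one sequence per line, it writes T rather than U (so no sed step is needed), and it passes any character other than A, C, G, T or U (for example N) through unchanged.

```shell
# Hypothetical stand-in for reversecomplement.pl.
# $1 = FASTA file with one sequence per line.
revcomp_fasta() {
    awk '
        BEGIN {
            # Complement table; U in the input is treated like T
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["U"] = "A"
            comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"; comp["u"] = "a"
        }
        # Header lines are copied unchanged
        /^>/ { print; next }
        {
            out = ""
            # Walk the sequence from the last character to the first
            for (i = length($0); i >= 1; i--) {
                c = substr($0, i, 1)
                if (c in comp) out = out comp[c]
                else out = out c
            }
            print out
        }
    ' "$1"
}
```

Example: revcomp_fasta test.fasta > test.revcomp.fasta. If the input has wrapped sequence lines, join them first, since reversing line by line would otherwise scramble the order.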