Transcriptome Assembly
Assembly of RNA-seq short reads into a transcriptome. 12-hour minimum ($876 internal, $1,116 external) per project.
1. Quality Assessment
Data quality is assessed with FastQC.
- Deliverables
- Reports generated by FastQC.
- Tools Used
- FastQC: Used to generate quality summaries of data:
- Per base sequence quality report: useful for deciding whether trimming is necessary.
- Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-seq data.
- Overrepresented sequences: evaluation of adapter contamination.
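The three reports listed above can also be checked programmatically. The following is a minimal sketch, assuming the tab-separated `summary.txt` file that FastQC writes into each report archive (status, module name, file name per line). The function names and the sample file name are hypothetical and are not part of the service's pipeline.

```python
# Modules discussed above, spelled as they appear in FastQC's summary.txt.
MODULES_OF_INTEREST = {
    "Per base sequence quality",
    "Sequence Duplication Levels",
    "Overrepresented sequences",
}


def parse_fastqc_summary(text):
    """Return {module_name: status} from the contents of a FastQC summary.txt."""
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is: STATUS <tab> module name <tab> input file name
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses


def flag_modules(statuses):
    """List the modules of interest that FastQC marked WARN or FAIL."""
    return sorted(
        m for m in MODULES_OF_INTEREST if statuses.get(m) in {"WARN", "FAIL"}
    )


# Illustrative content only; sample_R1.fastq.gz is a made-up file name.
example_summary = (
    "PASS\tBasic Statistics\tsample_R1.fastq.gz\n"
    "PASS\tPer base sequence quality\tsample_R1.fastq.gz\n"
    "WARN\tSequence Duplication Levels\tsample_R1.fastq.gz\n"
    "FAIL\tOverrepresented sequences\tsample_R1.fastq.gz\n"
)

flagged = flag_modules(parse_fastqc_summary(example_summary))
print(flagged)
```

A WARN on duplication levels is not necessarily a problem for RNA-seq, as noted above, so flagged modules are best treated as prompts for a closer look rather than as failures.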
2. Assembly
We use community-standard assemblers to generate a de novo assembly. Assembly is computationally demanding and may not finish within the time limits imposed on compute jobs at TACC, especially for large data sets. If an initial assembly run doesn't complete within those limits, we employ strategies such as in silico normalization to obtain a complete assembly. There is no charge if assembly is not possible.
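To illustrate why in silico normalization shortens assembly time, here is a toy sketch of the general idea: reads whose k-mers have already been seen many times add little new information and can be set aside. This is a simplified streaming variant written for illustration, not the implementation used by Trinity or by this service. The function names, `k`, and `max_coverage` values are assumptions chosen to keep the example small.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, k=5, max_coverage=2):
    """Keep a read only while the median count of its k-mers,
    counted over the reads kept so far, is below max_coverage."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < max_coverage:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Five copies of one highly expressed fragment plus one rare fragment.
reads = ["ACGTTGCAAG"] * 5 + ["TTGACCAGTA"]
kept = normalize_reads(reads)
print(len(reads), "reads in,", len(kept), "reads kept")
```

In this toy input the abundant fragment is capped at two copies while the rare fragment is retained, which is the effect that reduces the assembler's workload on deeply sequenced transcripts.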
- Deliverables
- FASTA file of assembly.
- Quantitative assessment of transcriptome completeness (using the BUSCO metric).
- Tools Used
- Trinity for eukaryotes
- rnaSPAdes for bacteria
- BUSCO
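The BUSCO completeness assessment is usually reported as a one-line summary of complete, single-copy, duplicated, fragmented, and missing orthologs. The sketch below shows one way to pull those numbers out of that line, assuming the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` notation found in BUSCO short summary files. The values in the example are invented for illustration and do not come from a real assembly.

```python
import re

# Pattern for the one-line BUSCO notation, e.g.
# C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023
BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)


def parse_busco_summary(text):
    """Return BUSCO percentages and the ortholog count from summary text."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    result = {name: float(value) for name, value in match.groupdict().items()}
    result["total"] = int(result["total"])  # n is a count, not a percentage
    return result


# Made-up values, for illustration only.
example_line = "\tC:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023"
busco = parse_busco_summary(example_line)
print(busco["complete"], busco["total"])
```

For transcriptomes, a relatively high duplicated fraction often reflects multiple isoforms per gene, so the complete percentage is typically the figure to look at first.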
3. ORF Prediction
- Deliverables
- FASTA file containing predicted ORFs for each transcript.
- Tools Used
- TransDecoder
4. Optional: Annotation based on Homology
The assembled transcripts are processed with eggNOG-mapper, including BLAST searches against UniProt databases, HMMER searches against Pfam databases, and KEGG database mapping. The results of these searches provide functional annotations for the assembled transcripts, which are important, e.g., for downstream differential expression analysis.
- Deliverables
- Annotation files containing annotations for each predicted ORF, e.g., homologous gene name, functional annotation, GO terms, KEGG classification.
- Tools Used
- eggNOG-mapper
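For downstream use, the annotation files can be loaded into a simple per-ORF lookup. The sketch below assumes the tab-separated annotations table written by eggNOG-mapper, with `##` comment lines, a header line beginning with `#query`, and `-` for empty fields. The column names shown (`Preferred_name`, `GOs`, `KEGG_ko`) are an assumption based on recent eggNOG-mapper releases and should be checked against the header of the delivered file. The ORF identifiers and annotation values are invented.

```python
def read_emapper_annotations(lines):
    """Yield one dict per annotated query from the lines of an
    eggNOG-mapper annotations table."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and run metadata
        if line.startswith("#"):
            # Header line, e.g. "#query<TAB>Preferred_name<TAB>..."
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("data row found before the header line")
        yield dict(zip(header, line.split("\t")))


def go_terms(record):
    """Return the GO terms of a record as a list ('-' means none)."""
    value = record.get("GOs", "-")
    return [] if value in ("-", "") else value.split(",")


# Reduced, made-up table for illustration; real files have more columns.
example_table = (
    "## illustrative run metadata\n"
    "#query\tPreferred_name\tGOs\tKEGG_ko\n"
    "TRINITY_DN1000_c0_g1_i1.p1\tgeneA\tGO:0003677,GO:0006351\tko:K03043\n"
    "TRINITY_DN1001_c0_g1_i1.p1\t-\t-\t-\n"
)

records = list(read_emapper_annotations(example_table.splitlines()))
for rec in records:
    print(rec["query"], rec["Preferred_name"], go_terms(rec))
```

Reading the header dynamically rather than hard-coding column positions keeps the sketch usable if the column set differs between eggNOG-mapper versions.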