Learn how Kallisto works and how to use it.
Introduction
Kallisto is a very fast tool for transcript quantification. It gets its speed by skipping conventional alignment: instead of aligning each read base by base to the transcriptome, it identifies the set of transcripts a read is compatible with and uses those sets to quantify transcript abundance. This process is called pseudoalignment (or pseudomapping).
In a typical RNA-seq workflow, quality assessment is followed by two steps: mapping reads to a reference, then quantifying genes or transcripts. Mapping a very large file can take 6 or more hours, and quantification can take several more. Kallisto replaces the mapping step with pseudoalignment and goes directly to quantification. It has been reported to quantify 30 million human reads in under 3 minutes on a Mac laptop.
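To make "compatible with" concrete, here is a toy sketch in bash (bash 4+ for associative arrays). It is not how kallisto is implemented: kallisto uses k = 31 by default and works on a transcriptome de Bruijn graph rather than scanning strings. The sketch simply treats a read as compatible with a transcript when every k-mer of the read occurs in that transcript. The sequences, the names tA/tB/tC, and k = 5 are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy illustration of read-transcript compatibility (requires bash 4+).
# NOT kallisto's implementation: sequences and k are made up for readability.
k=5
declare -A tx=(
  [tA]="ACGTACGTTGCAAGGCT"
  [tB]="ACGTACGTTGCTTTTTT"
  [tC]="GGGGCCCCAAAATTTTG"
)
read_seq="CGTACGTTGC"

compatible=()
for name in $(printf '%s\n' "${!tx[@]}" | sort); do
  tseq=${tx[$name]}
  ok=1
  # The read is compatible with a transcript only if every k-mer of the read
  # occurs in it (equivalently: intersect the transcript sets of all k-mers).
  for (( i = 0; i + k <= ${#read_seq}; i++ )); do
    kmer=${read_seq:i:k}
    if [[ $tseq != *"$kmer"* ]]; then
      ok=0
      break
    fi
  done
  if (( ok )); then
    compatible+=("$name")
  fi
done

echo "compatible transcripts: ${compatible[*]}"
```

Running it prints `compatible transcripts: tA tB`. The read cannot be assigned to either transcript on its own. Kallisto groups reads by such compatibility sets (equivalence classes) and resolves the ambiguity with an expectation-maximization (EM) algorithm during quantification.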
Get your data
Six raw data files are provided for the RNA-seq analyses in this course:
- c1_r1, c1_r2, and c1_r3 from the first biological condition
- c2_r1, c2_r2, and c2_r3 from the second biological condition
Get set up for the exercises
```shell
cds                          # TACC shortcut: change to your $SCRATCH directory
cd my_rnaseq_course
cd day_2/kallisto_exercise
```
```shell
module spider kallisto                 # search for available kallisto modules
module load intel/17.0.4 biocontainers # prerequisite modules
module load hdf5/1.8.16
module load kallisto/0.43.1
```
Part 1. Create an index of your reference
NO NEED TO RUN THIS NOW: YOUR INDEX HAS ALREADY BEEN BUILT!
```shell
# Build a kallisto index (transcripts.idx) from the transcriptome FASTA
kallisto index -i transcripts.idx transcripts.fasta
```
Warning: Create a
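The output files described next come from `kallisto quant`, which is run once per sample against the index. Here is a hedged dry-run sketch that only prints one command per sample and does not run kallisto. Several details are assumptions, not course settings: that the reads are single-end, that they are stored as `<sample>.fastq.gz`, and the fragment-length mean and standard deviation passed to `-l` and `-s`. For paired-end data you would instead pass both read files and drop `--single`, `-l`, and `-s`.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build one `kallisto quant` command per sample and print it.
# ASSUMPTIONS (not from the course material): single-end reads stored as
# <sample>.fastq.gz; the -l/-s fragment-length values are placeholders.
index=transcripts.idx
frag_len=200   # placeholder: mean fragment length of your library
frag_sd=20     # placeholder: standard deviation of the fragment length

cmds=()
for cond in c1 c2; do
  for rep in r1 r2 r3; do
    sample="${cond}_${rep}"
    # -o gives each sample its own output directory (abundance.tsv goes there)
    cmds+=("kallisto quant -i $index -o $sample --single -l $frag_len -s $frag_sd ${sample}.fastq.gz")
  done
done

printf '%s\n' "${cmds[@]}"
```

Once the printed commands match your actual file names and library parameters, piping the script's output to `bash` would run them.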
Output files:
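abundance.tsv: a tab-separated file that lists, for each transcript (target_id), its length, effective length (eff_length), estimated read count (est_counts), and normalized expression in transcripts per million (tpm).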
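The tpm column is derived from est_counts and eff_length. As a minimal sketch, the awk snippet below recomputes TPM with the standard formula TPM_i = 10^6 × (est_counts_i / eff_length_i) / Σ_j (est_counts_j / eff_length_j). It runs on a tiny abundance.tsv whose three rows are made-up toy values, not course data. It is a consistency check, not kallisto's own code.

```shell
#!/usr/bin/env bash
# Sketch: recompute TPM from est_counts and eff_length in an abundance.tsv.
# The three transcript rows below are MADE-UP toy values, not course data.
tsv=$(mktemp)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >  "$tsv"
printf 'tx1\t1000\t800\t400\t500000\n'                   >> "$tsv"
printf 'tx2\t2000\t1800\t900\t500000\n'                  >> "$tsv"
printf 'tx3\t500\t300\t0\t0\n'                           >> "$tsv"

# TPM_i = 1e6 * (est_counts_i / eff_length_i) / sum_j (est_counts_j / eff_length_j)
tpm_out=$(awk -F'\t' '
  NR == 1 { next }                                  # skip the header line
  {
    id[NR] = $1; reported[NR] = $5
    rate[NR] = ($3 > 0) ? $4 / $3 : 0              # reads per effective base
    total += rate[NR]
  }
  END {
    for (i = 2; i <= NR; i++)
      printf "%s\t%.1f\t%s\n", id[i], 1e6 * rate[i] / total, reported[i]
  }
' "$tsv")

printf 'target_id\trecomputed_tpm\treported_tpm\n%s\n' "$tpm_out"
rm -f "$tsv"
```

On real output, replace the toy file with the path to one sample's abundance.tsv. Small discrepancies in the last digits reflect rounding in the reported values.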