Learn how Kallisto works and how to use it.

Introduction

Kallisto is a very fast tool for transcript quantification. It gets its speed by skipping the alignment step: instead of aligning each read to the transcriptome base by base, it identifies the set of transcripts that each read is compatible with and uses that information to quantify the transcripts. This process is called pseudoalignment (or pseudomapping).


Typically, after quality assessment, the first two steps of an RNA-Seq workflow are mapping, followed by quantification of genes/transcripts. Mapping with a conventional mapper can take 6+ hours for a very large file, and quantification can then take several more hours. Kallisto avoids the mapping step and, through pseudoalignment, proceeds directly to quantification. It is reported that Kallisto can quantify 30 million human reads in less than 3 minutes on a Mac laptop.




Get your data

Raw data from six samples (paired-end reads, so two FASTQ files per sample) has been provided for all our further RNA-seq analysis:

  • c1_r1, c1_r2, c1_r3 from the first biological condition
  • c2_r1, c2_r2, and c2_r3 from the second biological condition


Get set up for the exercises


Code Block
titleGet set up for the exercises
cds
cd my_rnaseq_course
cd day_2/kallisto_exercise



Kallisto is available on Lonestar6. Check for it:


Code Block
titleLook for kallisto module
module load biocontainers
module load kallisto


Code Block
titleFind the singularity command
type kallisto


Part 1. Create an index of your reference

NO NEED TO RUN THIS NOW: YOUR INDEX HAS ALREADY BEEN BUILT!

Code Block
titleKallisto index
singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto index -i transcripts.idx transcripts.fasta 

Part 2. Quantify transcripts using kallisto


Warning
titleSubmit to the TACC queue or run in an idev shell

Create a commands file, use launcher_creator.py to generate a SLURM script from it, and then submit that script with sbatch.

Code Block
titlePut this in your commands file
nano commands.quant

singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto quant -i ../reference/transcripts.idx -b 100 -o GSM794483_kallisto ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto quant -i ../reference/transcripts.idx -b 100 -o GSM794484_kallisto ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto quant -i ../reference/transcripts.idx -b 100 -o GSM794485_kallisto ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto quant -i ../reference/transcripts.idx -b 100 -o GSM794486_kallisto ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto quant -i ../reference/transcripts.idx -b 100 -o GSM794487_kallisto ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq
singularity exec ${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg kallisto quant -i ../reference/transcripts.idx -b 100 -o GSM794488_kallisto ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq 
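
The six commands above differ only in the sample name, so the commands file can also be generated with a loop rather than typed by hand. The following is a minimal sketch, assuming the sample names and relative paths are exactly those shown above; the variable names (image, s, gsm) are illustrative only. Note that it overwrites any existing commands.quant in the current directory.

```shell
# Sketch: write commands.quant with one kallisto quant line per sample.
# Sample names and paths are copied from the commands shown above.

# Single quotes keep ${BIOCONTAINER_DIR} literal, so it is expanded
# when the job runs on the compute node, not when this file is written.
image='${BIOCONTAINER_DIR}/biocontainers/kallisto/kallisto-0.45.0--hdcc98e5_0.simg'

: > commands.quant   # start from an empty file

for s in GSM794483_C1_R1 GSM794484_C1_R2 GSM794485_C1_R3 \
         GSM794486_C2_R1 GSM794487_C2_R2 GSM794488_C2_R3; do
  gsm=${s%%_*}       # e.g. GSM794483, used to name the output directory
  echo "singularity exec ${image} kallisto quant -i ../reference/transcripts.idx -b 100 -o ${gsm}_kallisto ../data/${s}_1.fq ../data/${s}_2.fq" >> commands.quant
done

# Quick sanity check: the file should contain six lines.
wc -l commands.quant
```

It is worth opening the generated file (for example with cat commands.quant) and comparing it against the commands listed above before submitting anything.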


Use this launcher_creator.py command:

launcher_creator.py -n kallisto -t 01:00:00 -j commands.quant -q normal -a OTH21164 -l kallisto_launcher.slurm -m "module load biocontainers;module load kallisto"
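
Once the submitted job has finished, it can help to confirm that every sample produced output before moving on. The sketch below assumes the output directory names used in the commands file (GSM794483_kallisto and so on) and that kallisto wrote its abundance.tsv into each of them; check_kallisto_outputs is a hypothetical helper name, not part of kallisto or the launcher tools.

```shell
# Hypothetical helper: report which samples have a non-empty abundance.tsv.
# Returns the number of samples whose output is missing (0 means all present).
check_kallisto_outputs() {
  base=${1:-.}        # directory that holds the *_kallisto output folders
  missing=0
  for gsm in GSM794483 GSM794484 GSM794485 GSM794486 GSM794487 GSM794488; do
    f="${base}/${gsm}_kallisto/abundance.tsv"
    if [ -s "$f" ]; then
      echo "OK       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Run from the directory where the job was submitted.
check_kallisto_outputs . || echo "Some samples have no output yet."
```

If any sample is reported as missing, the launcher's output and error files for the job are the first place to look.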



Output files:

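abundance.tsv: A tab-separated file, written to each output directory, containing the estimated read count (est_counts) and a normalized expression value in transcripts per million (tpm) for each transcript.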
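
As a quick first look at the results, the most highly expressed transcripts can be listed by sorting on the tpm column. This is a minimal sketch, assuming the usual kallisto column order (target_id, length, eff_length, est_counts, tpm); top_tpm is a hypothetical helper name, and the toy file below contains made-up values purely so the example runs on its own.

```shell
# Hypothetical helper: print the N transcripts with the highest TPM
# as "tpm<TAB>target_id", highest first.
top_tpm() {
  file=$1
  n=${2:-10}
  # Skip the header, rewrite tpm as a fixed-point number (so values in
  # scientific notation sort correctly), then sort numerically, descending.
  awk -F'\t' 'NR > 1 { printf "%.6f\t%s\n", $5, $1 }' "$file" \
    | sort -k1,1nr \
    | head -n "$n"
}

# Toy input with made-up values, in the abundance.tsv column layout.
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >  toy_abundance.tsv
printf 'txA\t1000\t800\t10\t5.5\n'                        >> toy_abundance.tsv
printf 'txB\t2000\t1800\t200\t120.25\n'                   >> toy_abundance.tsv
printf 'txC\t500\t300\t0\t0\n'                            >> toy_abundance.tsv

top_tpm toy_abundance.tsv 2
```

On real output the call would look like top_tpm GSM794483_kallisto/abundance.tsv 20, using one of the output directories created by the quant step.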

