Skip to end of metadata
Go to start of metadata

You are viewing an old version of this content. View the current version.

Compare with Current View Version History

« Previous Version 8 Next »

All these steps have already been run. We'll be spending time looking at the commands and output. Let's get set up.

Get to the results

Get set up
cds
cd my_rnaseq_course
cp /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results .

Step 1: Tophat

We've already gone over how tophat results look here so let's move on to step 2. 

HOW WAS IT RUN?

 How do I look into the file?
#If you've copied it over successfully:
cat run_commands/commands.cufflinks
 
#If you don't have a local copy, you can read it from the source: 
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/commands.cufflinks


HOW DOES THE OUTPUT LOOK?

 Take a look at output for one of our samples, C1_R1. The important file is transcripts.gtf, which contains Tophat's assembled junctions for C1_R1.

Cufflinks output files
#If you have a local copy:
ls -l C1_R1_clout
 
 
#If you don't have a local copy:
ls -l /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/C1_R1_clout

-rw------- 1 daras G-801020   627673 May 17 16:58 genes.fpkm_tracking
-rw------- 1 daras G-801020  1021025 May 17 16:58 isoforms.fpkm_tracking
-rw------- 1 daras G-801020        0 May 17 16:50 skipped.gtf
-rw------- 1 daras G-801020 14784740 May 17 16:58 transcripts.gtf

Step 3: Cuffmerge

HOW WAS IT RUN?

We first created a file listing the paths of all per-sample transcripts.gtf files so far, then pass that to cuffmerge:

How did we do that?
find . -name transcripts.gtf > assembly_list.txt
 
#If you have a local copy:
cat assembly_list.txt

#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/assembly_list.txt
commands.cuffmerge file
cat run_commands/commands.cuffmerge

 
     cuffmerge  -g reference/genes.exons.gtf  assembly_list.txt

 

HOW DOES THE OUTPUT LOOK?

The most important file is merged.gif, which contains the consensus transcriptome annotations cuffmerge has calculated.

cuffmerge output
#If you have a local copy:
ls -l merged_asm
 
#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/merged_asm

-rwxrwxr-x  1 daras G-803889  1571816 Aug 16  2012 genes.fpkm_tracking
-rwxrwxr-x  1 daras G-803889  2281319 Aug 16  2012 isoforms.fpkm_tracking
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 logs
-r-xrwxr-x  1 daras G-803889 32090408 Aug 16  2012 merged.gtf
-rwxrwxr-x  1 daras G-803889        0 Aug 16  2012 skipped.gtf
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 tmp
-rwxrwxr-x  1 daras G-803889 34844830 Aug 16  2012 transcripts.gtf

Step 4: Cuffdiff

HOW WAS IT RUN?

commands.cuffdiff file
#If you have a local copy:
 cat run_commands/commands.cuffdiff
 
cuffdiff -o diff_out -b reference/genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf C1_R1_thout/accepted_hits.bam,C1_R2_thout/accepted_hits.bam,C1_R3_thout/accepted_hits.bam C2_R1_thout/accepted_hits.bam,C2_R2_thout/accepted_hits.bam,C2_R3_thout/accepted_hits.bam
 
#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/commands.cuffdiff


HOW DOES THE OUTPUT LOOK?
cuffdiff output
#If you have a local copy:
ls -l diff_out
 
#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/diff_out

-rwxr-x--- 1 daras G-801020  2691192 Aug 21 12:20 isoform_exp.diff  : Differential expression testing for transcripts
-rwxr-x--- 1 daras G-801020  1483520 Aug 21 12:20 gene_exp.diff     : Differential expression testing for genes
-rwxr-x--- 1 daras G-801020  1729831 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcripts
-rwxr-x--- 1 daras G-801020  1369451 Aug 21 12:20 cds_exp.diff      : Differential expression testing for coding sequences
 
-rwxr-x--- 1 daras G-801020  3277177 Aug 21 12:20 isoforms.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1628659 Aug 21 12:20 genes.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1885773 Aug 21 12:20 tss_groups.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1477492 Aug 21 12:20 cds.fpkm_tracking
 
-rwxr-x--- 1 daras G-801020  1349574 Aug 21 12:20 splicing.diff  : Differential splicing tests
-rwxr-x--- 1 daras G-801020  1158560 Aug 21 12:20 promoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020   919690 Aug 21 12:20 cds.diff       : Differential coding output


PARSING CUFFDIFF OUTPUT

Here is a basic command useful for parsing/sorting the gene_exp.diff or isoform_exp.diff files:

Linux one-liner for sorting cuffdiff output by log2 fold-change values
cat diff_out/isoform_exp.diff | awk '{print $10 "\t" $4}' | sort -n -r | head

Exercise 1: Find the 10 most up-regulated genes, by fold change that are classified as significantly changed. 

One-line command to get 10 most up regulated genes
cat diff_out/gene_exp.diff |grep 'yes'|sort -k10nr,10|head

 
#Lets pull out just gene names
cat diff_out/gene_exp.diff |grep 'yes'|sort -k10nr,10|head|cut -f 3
 Solution

Top 10 upregulated genes

scf

crc

KdelR

Nacalpha

CG8979

CG4389

Vha68-2

regucalcin

CG3835

CG1746


Exercise 2: Find the 10 most down-regulated genes, by fold change that are classified as significantly changed. 

One-line command to get 10 most down regulated genes
cat diff_out/gene_exp.diff |grep 'yes'|sort -k10n,10|head|cut -f 3
 Solution

Top 10 downregulated genes

CG17065

Git

CG18519

del

CG13319

RNaseX25

msb1l

achi

CG15611

CG11309


 

Exercise 3: Find the 10 most up-regulated isoforms, by fold change that are classified as significantly changed. What genes do they belong to?

One-line command to get 10 most up-regulated isoforms
cat diff_out/isoform_exp.diff |grep 'yes'|sort -k10nr,10|head

 
#To pull out gene names:
cat diff_out/isoform_exp.diff |grep 'yes'|sort -k10nr,10|head 
 Solution

sick

mub

CG5021

akirin

CG4587

c(3)G

Ank2

CG1674

CrebB-17A

stai

 

 

  • No labels