Day 1: Linux/TACC Introduction and Read Mapping

Extras

Day 2: Calling Genome Variants

Extras (come early Day 3)

Additional topics

Even More Extras

Day 3: RNA-seq

Extras

Day 4: Assembly and Annotation

Resources

Misc

This is the link to the gsaf web site

Style

.primer454 {background-color:skyblue;}
.key454, .keyfixed454 {background-color:yellow;}
.mid454, .midfixed454 {background-color:goldenrod;}
.other454 {background-color:greenyellow;}
.p7 {background-color:#EBFEBE;}
.fixed454, .keyfixed454, .midfixed454 {font-family:courier,monospace;}

Rough Draft

All notes/lessons/etc. under bioiteam wiki – next meeting next Thursday May 16th, 11 am.

Survey ahead of time if possible – Aaron to create draft, run by S&J – learn nano – send seqanswers “just got your data” post.

Setup stuff on local machines: IGV; BAM/BAI files from all mappings; R; LAMP stack?? GMOD??

Data pre-loaded on Corral:
a) Bacterial genome specified by Geoff & BAM files from D1
b) 1000 Genomes data (Anna)
c) Sample exome (Scott)

Everyone to update Wiki

Agenda:
Day 1: Beginnings
• 45 min Linux refresher – SPHS handout & ppt – autocomplete/pipes/ssh/wget (genome upfront – download data from Corral)
• 20 min Introduction to mapping – launchers at TACC; mapping by 3-4 different mappers – BWA/bowtie/shrimp/SSAHA2 (bfast?) – Geoff – must have outputs ready.
• 30 min Using TACC – dir. structure/module+spider/scp/bbcp/group sharing – Aaron
• 30 min Run variant caller (samtools) on login node at TACC, scp, view output in IGV (requires bacterial genome & GFF pre-installed for the ref) – Geoff
<break>
• 15 min Input and output file formats – handout: FASTQ/BAM – Anna; technology-specific output – Scott
• 15 min Building a reference - Anna
• 60 min ADVANCED session: EXERCISE: shell scripting of mapper & variant caller: given ref & reads, will produce mutations; Daechan
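
The advanced exercise above (given a reference and reads, produce mutations) could be sketched as a dry-run pipeline builder. This is a minimal sketch, not the course's script: the function name and file names are hypothetical, and pairing BWA-MEM with bcftools for calling is an assumption (the agenda names only the mappers and samtools). Subcommands and flags differ between tool versions, so check each tool's help output before running anything.

```python
import shlex


def build_pipeline(ref, reads, prefix):
    """Return shell command lines for a map-then-call workflow.

    Nothing is executed here; the caller decides how to run the lines
    (for example by writing them into a "commands" file).
    """
    ref_q, reads_q, pre_q = (shlex.quote(x) for x in (ref, reads, prefix))
    return [
        # Index the reference so the mapper can search it.
        f"bwa index {ref_q}",
        # Map reads to the reference (assumed mapper: BWA-MEM).
        f"bwa mem {ref_q} {reads_q} > {pre_q}.sam",
        # Sort and index the alignments for the variant caller and IGV.
        f"samtools sort -o {pre_q}.sorted.bam {pre_q}.sam",
        f"samtools index {pre_q}.sorted.bam",
        # Pile up reads and call variants (assumed caller: bcftools).
        f"bcftools mpileup -f {ref_q} {pre_q}.sorted.bam"
        f" | bcftools call -mv -o {pre_q}.vcf",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in build_pipeline("ref.fa", "reads.fastq", "sample"):
        print(command)
```

Building the command lines separately from running them keeps the exercise easy to inspect: students can read the five steps before submitting anything to a queue.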

Day 2: Mapping & Variants
• 30 min Mapped data evaluation with samtools
• 30 min Installing/compiling tools: (aside from “module”) JB’s tool – google code Jeff
• 45 min View output & compare: false pos/neg – comparisons; what’s hard & weird - in IGV (GVF file/VCF file/BAMs).
<break>
• 30 min UCSC genome browser, GEO/SRA data downloads Anna
• 30 min Variant calling with GATK (use their wiki), more detail on .vcf format, look at Human data 1000 genome VCF files and describe how to access 1000 genome data Anna Scott
• 60 min Characterizing & comparing variant files – annovar/snpeff/plink/vaast/qiime Dhivya & Scott shell/perl/python scripting – candidates for recessive disease

• ADVANCED: SAM files: parsing, picard tools (read groups, validation is important/check return codes), flagstat, filter (e.g. only use properly paired reads), insert size dist, mapping %, mapping bias by read or by genome location (e.g. on) Anna & Scott (BED tools), calling variants in mixed populations (freebayes), ChIP-seq analysis???
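
The flagstat-style checks listed above (mapping %, keeping only properly paired reads) come down to testing bits in the SAM FLAG column. Below is a minimal sketch, assuming uncompressed SAM text as input. The function name and the example records are hypothetical; the bit values (0x2 properly paired, 0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification.

```python
PROPER_PAIR = 0x2      # each segment properly aligned according to the aligner
UNMAPPED = 0x4         # segment unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def summarize_sam(lines):
    """Count primary records, mapped records, and properly paired records."""
    total = mapped = proper = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
            if flag & PROPER_PAIR:
                proper += 1
    mapped_pct = 100.0 * mapped / total if total else 0.0
    return {
        "total": total,
        "mapped": mapped,
        "properly_paired": proper,
        "mapped_pct": mapped_pct,
    }


if __name__ == "__main__":
    # Hypothetical records: only the first two columns matter for this sketch.
    example = [
        "@HD\tVN:1.0\tSO:coordinate",
        "r1\t99\tchr20\t100\t60\t50M\t=\t300\t250\t*\t*",
        "r1\t147\tchr20\t300\t60\t50M\t=\t100\t-250\t*\t*",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "r1\t256\tchr20\t900\t0\t50M\t*\t0\t0\t*\t*",
    ]
    print(summarize_sam(example))
```

This sketch counts primary records only, so its totals can differ from what samtools flagstat reports for files containing secondary or supplementary alignments.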

Day 3: RNA-seq Scott
• 60 min Quantitation & statistics: map & count Jeff ; normalization; tophat/cufflinks(cuffmerge)/cuffdiff – human & e. coli. Maybe a digression into R?
• 30 min Splice variant analysis: continue from tophat
<break>
• 60 min non-coding RNA analysis: unique mapping (shrimp/grep), miRNA’s abundance/editing, other: snoRNA, snRNA, lincRNA, piRNA, tRNA, degradome, etc. etc. (not poly-A; not annotated)
• 60 min Transcriptome assembly & annotation: velvet/oases, TrinityRNAseq; BLAST, GOminer, (ELI?)

Day 4: Assembly
• 90 min de novo assembly: E. coli bacteria --velvet (Aaron), mira (refguided Aaron), Allpaths(-LG) (Scott – optional..), mention: abyss, SOAPdenovo
• 30 min Finding and annotating genes – maker, glimmer; web tools: JCVI, NCBI, psi-blast & CDD; pfam/rfam (Scott & Jeff)
• 45 min Evaluating & visualizing assemblies – Comparing: treat assembler output as a reference genome and proceed with prior tools – challenges: contigs, errors; Visualizing: mauve, circos (may need install help) (Scott); cgview (Jeff).
• 30 min Genome databases: Introduction to GMOD and/or SequenceServer (can we stand up web servers on the class computers?) (Scott)

Notes from 5/17/12:
Conventions decided on - expands for hints, formats for command prompt/code
Aaron to write .sge maker script
All qsub's will run "./commands"
All examples have to have a "commands" file.
AB/DC to tacc-ify scripts; SPHS to put up chr20 fastq's, bams, and vcf's for example.
Append GATK to diploid calling.
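
The convention above (every example ships a "commands" file and every qsub job runs "./commands") suggests a small job-script generator. The following is a minimal sketch, not Aaron's actual .sge maker script: the function name, the default queue name, and the default wall time are hypothetical. TACC-specific directives (allocation, parallel environment) are left out because the notes do not specify them.

```python
def make_sge_script(job_name, queue="normal", wall_time="01:00:00",
                    commands="./commands"):
    """Return the text of an SGE job script that runs the commands file.

    The queue name and wall time are placeholder defaults (assumptions);
    replace them with values valid on the target cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",          # job name
        "#$ -cwd",                    # run from the submission directory
        "#$ -V",                      # export the current environment
        "#$ -j y",                    # merge stderr into stdout
        f"#$ -o {job_name}.o$JOB_ID",  # output file, one per job id
        f"#$ -q {queue}",             # queue (assumed name)
        f"#$ -l h_rt={wall_time}",    # wall-clock limit (assumed value)
        "",
        commands,                     # the commands file must be executable
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job name, for illustration only.
    print(make_sge_script("bwa_map"), end="")
```

Writing the result to a file such as a hypothetical "launcher.sge" and submitting it with qsub would then run whatever the example's commands file contains.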

...

May 2013

Warning

We will meet in Room 101B of the Flawn Academic Center (FAC) building.  We STRONGLY encourage you to use the computers provided in the classroom, but you may also bring your personal laptops.

Table of Contents

Resources: tool list, file formats & more

Link to Etherpad: https://etherpad.mozilla.org/g2NxIEAFWL

Use this to post any questions you have about the lessons and tutorials.

Your Instructors

Name | Initials | Affiliation | Expertise
Scott Hunicke-Smith | SPHS | Director GSAF | Everything, if loosely defined (but especially awk)
Jeff Barrick | JB | Asst. Prof. Biochemistry | Microbes, Perl, C++, Mac, miscellanea
Dhivya Arasappan (in absentia) | DA | GSAF | RNA-seq, transcriptome assembly
Anna Battenhouse | AB | Iyer Lab | Eukaryotes, Bash scripting, UCSC Genome Browser
Daechan Park | DP | Iyer Lab | Eukaryotes, ChIP-seq, Python, Samtools
Nichole Bennett | NB | Parmesan/Singer Labs | Python, R, Unix
Dan Deatherage | DD | Barrick Lab | Unix, Python, NGS Library Prep
Nathan Abell | NA | Iyer Lab | Eukaryotes, RNA-Seq

instructor action item list

Info for the instructors

Day 1a: Scott; 1b: Jeff
Day 2a: Jeff, Daechan, Anna; 2b: Scott
Day 3a: Jeff; 3b: Iyer lab
Day 4a: Jeff; 4b: Scott

Instructors: meet 9am Monday for final check

Each Part 1/Part 2 section needs to be standardized with:
*Learning Objectives
*Theory
*Workflow diagram (data, toolbox/recipe, exercises)
*Tutorial (bulk of time here)
*Recap learning objectives
*Next steps...

Day 1: Linux/TACC Introduction and Read Mapping

Part 1: Linux/TACC Introduction

Part 2: Read Mapping

Enrichment modules (4:30-5:30)

Extras

Day 2: Handling Raw and Aligned sequences, and Calling Genome Variants

Part 1. Handling Raw and Aligned sequences

Part 2. Calling Genome Variants

Enrichment module (12:30-1:30)

Enrichment modules (4:30-5:30)

Extras

Day 3: RNA-seq

Part 1. Introduction to RNA-seq Counting

Part 2. The Tuxedo RNA-seq Pipeline (Tophat & Cufflinks)

Enrichment module (12:30-1:30)

Enrichment modules (4:30-5:30)

Extras

Day 4: Assembly and Annotation

Part 1. Genome Assembly

Part 2. Assembly Annotation

Enrichment module (12:30-1:30)

  • Office hours: "I want to learn how to install and use this tool called ______ that we didn't talk about in class." (JB).

Enrichment module (4:30-5:30)

Resources

As you're getting settled