SSC Intro to NGS Bioinformatics Course

Day 1: Linux/TACC Introduction and Read Mapping

Extras

Day 2: Calling Genome Variants

Extras (come early Day 3)

Additional topics

Even More Extras

Day 3: RNA-seq

Extras

non-coding RNA analysis

Day 4: Assembly and Annotation

Resources

Resources: tool list, file formats & more
Scott's list of linux one-liners
Example BWA alignment script
Exercises

Misc

Link to the GSAF web site

Rough Draft

All notes/lessons/etc. go under the BioITeam wiki – next meeting next Thursday May 16th, 11 am.

Survey ahead of time if possible – Aaron to create draft, run by S&J – learn nano – send seqanswers “just got your data” post.

Setup stuff on local machines: IGV; BAM/BAI files from all mappings; R; LAMP stack?? GMOD??

Data pre-loaded on Corral:
a) Bacterial genome specified by Geoff & BAM files from D1
b) 1000 Genomes data (Anna)
c) Sample exome (Scott)

Everyone to update Wiki

Agenda:
Day 1: Beginnings
• 45 min Linux refresher – SPHS handout & ppt – autocompletion/pipes/ssh/wget (genome upfront – download data from Corral)
• 20 min Introduction to mapping – launchers at TACC; mapping by 3-4 different mappers – BWA/bowtie/shrimp/SSAHA2 (bfast?) – Geoff – must have outputs ready.
• 30 min Using TACC – directory structure/module+spider/scp/bbcp/group sharing – Aaron
• 30 min Run variant caller (samtools) on login node at TACC, scp, view output in IGV (requires bacterial genome & GFF pre-installed for the reference) – Geoff
<break>
• 15 min Input and output file formats – handout: FASTQ/BAM – Anna; technology-specific output – Scott
• 15 min Building a reference - Anna
• 60 min ADVANCED session: EXERCISE: shell scripting of mapper & variant caller: given ref & reads, will produce mutations; Daechan
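
A rough sketch of what the advanced exercise could produce (given ref & reads, emit the commands that yield variant calls). This is an illustration only, not the class script: the function name `build_pipeline` and the file names are hypothetical, and the command lines assume a recent BWA plus samtools/bcftools 1.x, where calling goes through `bcftools mpileup` and `bcftools call`. Older installs use different invocations.

```python
import shlex


def build_pipeline(ref, reads, prefix):
    """Return shell command strings that map reads and call variants.

    Assumes bwa, samtools (1.x) and bcftools (1.x) are on PATH.
    """
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        # Build the BWA index for the reference
        f"bwa index {q(ref)}",
        # Map reads and sort the alignments in one pipe
        f"bwa mem {q(ref)} {q(reads)} | samtools sort -o {q(bam)} -",
        # Index the sorted BAM so IGV and the caller can use it
        f"samtools index {q(bam)}",
        # Pile up and call variants, writing a VCF
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -o {q(vcf)}",
    ]


if __name__ == "__main__":
    # Print a script that could be saved and run with bash
    cmds = build_pipeline("ref.fa", "reads.fastq", "sample")
    print("\n".join(["#!/bin/bash", "set -euo pipefail", *cmds]))
```

Returning the commands as strings rather than running them keeps the sketch easy to inspect and to paste into a "commands" file.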

Day 2: Mapping & Variants
• 30 min Mapped data evaluation with samtools
• 30 min Installing/compiling tools (aside from “module”): JB’s tool on Google Code – Jeff
• 45 min View output & compare: false pos/neg – comparisons; what’s hard & weird – in IGV (GVF file/VCF file/BAMs).
<break>
• 30 min UCSC genome browser, GEO/SRA data downloads Anna
• 30 min Variant calling with GATK (use their wiki), more detail on .vcf format, look at human 1000 Genomes VCF files and describe how to access 1000 Genomes data – Anna & Scott
• 60 min Characterizing & comparing variant files – annovar/snpeff/plink/vaast/qiime – Dhivya & Scott; shell/perl/python scripting – candidates for recessive disease

• ADVANCED: SAM files: parsing, picard tools (read groups, validation is important/check return codes), flagstat, filter (e.g. only use properly paired reads), insert size dist, mapping %, mapping bias by read or by genome location (e.g. on) Anna & Scott (BED tools), calling variants in mixed populations (freebayes), ChIP-seq analysis???
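
To make the SAM parsing topics above concrete (flagstat-style counts, properly paired filter, insert size distribution, mapping %), here is a minimal sketch that reads plain-text SAM lines. The FLAG bit values come from the SAM specification; the function name `summarize_sam` and the choice to skip secondary and supplementary records are assumptions of this sketch, and it does not try to reproduce `samtools flagstat` output exactly.

```python
# FLAG bits as defined in the SAM specification
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Count primary records, mapped reads and properly paired reads.

    `lines` is any iterable of SAM text lines (headers allowed).
    """
    total = mapped = proper = 0
    insert_sizes = []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once (primary record only)
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
        if flag & PAIRED and flag & PROPER_PAIR:
            proper += 1
            tlen = int(fields[8])  # TLEN column
            if tlen > 0:
                insert_sizes.append(tlen)  # one value per pair
    pct = 100.0 * mapped / total if total else 0.0
    return {
        "total": total,
        "mapped": mapped,
        "properly_paired": proper,
        "mapped_pct": pct,
        "insert_sizes": insert_sizes,
    }
```

For BAM input, the lines could come from piping `samtools view -h` into the script; a library such as pysam would be the more usual route for real work.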

Day 3: RNA-seq Scott
• 60 min Quantitation & statistics: map & count (Jeff); normalization; tophat/cufflinks(cuffmerge)/cuffdiff – human & E. coli. Maybe a digression into R?
• 30 min Splice variant analysis: continue from tophat
<break>
• 60 min non-coding RNA analysis: unique mapping (shrimp/grep), miRNA abundance/editing, other: snoRNA, snRNA, lincRNA, piRNA, tRNA, degradome, etc. (not poly-A; not annotated)
• 60 min Transcriptome assembly & annotation: velvet/oases, TrinityRNAseq; BLAST, GOminer, (ELI?)

Day 4: Assembly
• 90 min de novo assembly: E. coli bacteria – velvet (Aaron), mira (ref-guided, Aaron), Allpaths(-LG) (Scott – optional), mention: abyss, SOAPdenovo
• 30 min Finding and annotating genes – maker, glimmer; web tools: JCVI, NCBI, psi-blast & CDD; pfam/rfam (Scott & Jeff)
• 45 min Evaluating & visualizing assemblies – Comparing: treat assembler output as a reference genome and proceed with prior tools – challenges: contigs, errors; Visualizing: mauve, circos (may need install help) (Scott); cgview (Jeff).
• 30 min Genome databases: Introduction to GMOD and/or SequenceServer (can we stand up web servers on the class computers?) (Scott)

Notes from 5/17/12:
Conventions decided on - expands for hints, formats for command prompt/code
Aaron to write .sge maker script
All qsubs will run "./commands".
All examples must have a "commands" file.
AB/DC to TACC-ify scripts; SPHS to put up chr20 FASTQs, BAMs, and VCFs as examples.
Append GATK to diploid calling.
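
A possible shape for the .sge maker script mentioned in the notes above, written as a sketch rather than the script Aaron will produce. The `#$` lines are standard Grid Engine directives, but the queue name, parallel environment name, slot count, and runtime defaults are placeholder assumptions that have to match the TACC system in use; `make_sge_script` is a hypothetical name.

```python
def make_sge_script(job_name, queue="normal", pe="12way", slots=12,
                    runtime="01:00:00", commands_file="./commands"):
    """Return the text of an SGE submission script that runs the commands file.

    queue, pe, slots and runtime are placeholders; check the local
    TACC documentation for the values a given system accepts.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",        # job name
        "#$ -cwd",                  # run from the submission directory
        "#$ -V",                    # export the current environment
        "#$ -j y",                  # merge stderr into stdout
        f"#$ -q {queue}",           # queue (site-specific)
        f"#$ -pe {pe} {slots}",     # parallel environment (site-specific)
        f"#$ -l h_rt={runtime}",    # wall-clock limit
        "",
        commands_file,              # every example ships a commands file
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the script next to the example's commands file
    with open("launcher.sge", "w") as handle:
        handle.write(make_sge_script("bwa_example"))
```

The resulting file would be submitted with `qsub launcher.sge`, and the "commands" file needs to be executable for the last line to work.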

Day 1 followup
Daechan original shell scripting page