Genome Variant Analysis Course 2015

We will meet in Room 4.128 of Mezes Hall (MEZ). We strongly encourage you to use the computers provided in the classroom for these tutorials, but you may also bring your personal laptops.


Course Overview

The course is built around two ~90-minute sections per day for 4 days. Each section typically consists of a brief presentation followed by a hands-on guided tutorial. Additional "bonus tutorials" cover important (but not critical) aspects of next-generation sequencing (NGS) data analysis. You can complete them during a section, time permitting, or on your own. By the end of this course, we hope to achieve the following goals:

  1. Teach you the different ways next-generation sequencing libraries are constructed, and the advantages and disadvantages of each type.
  2. Familiarize you with how the Texas Advanced Computing Center (TACC) can be used to simplify and speed up your data analysis.
  3. Teach you the basics of read mapping in both individuals and populations, and of identifying variants within individuals and rare variants within populations.
  4. Provide reference materials broad enough to give you a starting point for your own data analysis, along with enough hands-on experience that you can carry out that analysis on your own.

Your Instructors

Name | Initials | Affiliation | Expertise
Daniel Deatherage | DD | Barrick Lab | Unix, Python, NGS Library Prep, Capture, Rare Variant Identification
Sean Leonard | SL | Barrick Lab | Unix, R

A nod to the past

This class has been taught multiple times in the last few years. We gratefully acknowledge the previous instructors of the Intro to NGS Bioinformatics course (taught in May 2013) and the Genome Variant Analysis Course 2014 (taught in May 2014), who provided a great deal of help in creating these web pages and materials.

Two individuals warrant special mention: Scott Hunicke-Smith, director of the GSAF, and Jeffrey Barrick. They have been the driving force behind this class for a number of years, and most of the tutorials presented here were developed by them or adapted from their work.

Course Schedule

Tuesday, May 26th. Day 1 – "The Basics"

Presentation: Next Generation Sequencing Library Preparation and Experimental Design (and general introduction)

Tutorial: Introduction to Linux and Lonestar

Bonus Tutorial: Evaluating raw sequencing data

Presentation: Single-nucleotide variant (SNV) calling

Presentation: Structural variant (SV) calling

Tutorial: Bacterial genome variants the easy way – breseq

Wednesday, May 27th. Day 2 – "The Principles of Variant Calling"

Presentation: Read Mapping

Tutorial: Mapping with bowtie2

Tutorial: SNV calling with SAMtools (a post-class fix is now available)

Tutorial: SV calling with SVDetect

Tutorial: Integrative Genome Viewer (IGV)

Bonus Tutorial: Evaluating mapped read data

Thursday, May 28th. Day 3 – "Human Variant Calling"

Pre-presentation task: Day 3 Start (includes tutorials)

Presentation: What changes with humans?

Tutorial: Human Trios Analysis

Bonus Tutorial: Human variants with GATK

Bonus Tutorial: Tumor/normal Analysis with Virmid

Bonus Tutorial: Linux one-liners (how to use grep and awk to get the most out of your work)

Bonus Tutorial: samtools mpileup in more detail on human data (makes use of Linux one-liners)

Friday, May 29th. Day 4 – "(Rare) Variant Detection in Populations"

Tutorial: Annotating variants with ANNOVAR

Bonus Tutorial: Filtering and screening variants

Presentation: Where do errors come from, and what can we do about them?

Presentation: Alternative library prep methods

Tutorial: Exome capture and metrics

Tutorial: Sequencing error correction (SSCS reads)

Bonus Tutorial: Rare variant detection in bacteria using breseq

Additional Resources

The following is an unsorted collection of materials presented in previous years; it still needs to be organized to be more useful.