...

Pink post-it – I need a bit of help.

Conventions

Text in courier font refers to a program or file name.

...

Solution

Solution sections contain the exact commands, so you can copy and paste them if you need to.

Goals and challenges

Course goals

  • Hands-on, tutorial style – learn by doing
  • Cover the NGS tool basics – the first few things you'll do after receiving raw sequences
  • Get you comfortable with Linux and TACC – your best "frenemies"
  • Make you self-sufficient in 4 days, so you can become an expert over time
  • Show some "best practices" for working with NGS data

NGS Challenges

Large and growing datasets

...

  • Organization and naming conventions are critical.
  • Your data can get out of hand very quickly!

Progression of Iyer Lab ChIP-seq datasets over time:

  • 2008 – Yeast heat shock remodeling of chromatin
    • 2 yeast datasets
    • fewer than 2 million reads
  • 2010 – Allelic bias in CTCF binding
    • 13 CTCF datasets from 3 GM cell lines
    • ~200 million reads
  • 2012 – Analysis of 3 TFs across 11 cell lines
    • 32 datasets gathered over 3 years
    • ~1 billion reads
  • 2014 – QTL analysis of CTCF binding
    • 52 very deeply sequenced CTCF datasets
    • ~8 billion reads
  • in progress – Functional analysis of glioblastoma tumors and cell lines
    • > 400 datasets so far
    • > 20 billion reads
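To put that growth in numbers, here is a rough Python sketch that computes, for each milestone in the list, the fold increase in total reads relative to 2008 and the average number of reads per dataset. It treats the bounded and approximate figures ("fewer than", "~", ">") as point estimates, and uses 400 datasets and 20 billion reads for the in-progress project, so the results are ballpark values only. The names `milestones` and `summarize` are just for illustration.

```python
# Figures taken from the Iyer Lab list above. "<", "~" and ">" values are
# treated as point estimates, so every ratio below is approximate.
milestones = [
    # (label, number of datasets, total reads)
    ("2008", 2, 2_000_000),
    ("2010", 13, 200_000_000),
    ("2012", 32, 1_000_000_000),
    ("2014", 52, 8_000_000_000),
    ("in progress", 400, 20_000_000_000),
]


def summarize(rows):
    """Return (label, fold change in total reads vs. the first row, mean reads per dataset)."""
    base_reads = rows[0][2]
    summary = []
    for label, datasets, reads in rows:
        summary.append((label, reads / base_reads, reads / datasets))
    return summary


for label, fold, per_dataset in summarize(milestones):
    # Print fold change with thousands separators and reads per dataset in millions.
    print(f"{label:>12}: {fold:>8,.0f}x total reads, ~{per_dataset / 1e6:,.1f} M reads/dataset")
```

Even with these rough inputs, total read counts grow by about four orders of magnitude. Per-dataset depth also rises sharply, peaking with the "very deeply sequenced" 2014 CTCF data.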