...
Pink post-it – I need a bit of help.
Conventions
Text in Courier font refers to a program or file name on a computer.
...
Solution sections will contain the ...
Goals and challenges
Course goals
- Hands-on, tutorial style – learn by doing
- Cover the NGS tool basics – the first few things you'll do after receiving raw sequences
- Get you comfortable with Linux and TACC – your best "frenemies"
- Make you self-sufficient in 4 days, so you can become experts over time
- Show some "best practices" for working with NGS data
NGS Challenges
Large and growing datasets
...
- Organization and naming conventions are critical.
- Your data can get out of hand very quickly!
Progression of Iyer Lab ChIP-seq datasets over time:
- 2008 – Yeast heat shock remodeling of chromatin
  - 2 yeast datasets
  - less than 2 million reads
- 2010 – Allelic bias in CTCF binding
  - 13 CTCF datasets from 3 GM cell lines
  - ~200 million reads
- 2012 – Analysis of 3 TFs across 11 cell lines
  - 32 datasets gathered over 3 years
  - ~1 billion reads
- 2014 – QTL analysis of CTCF binding
  - 52 very deeply sequenced CTCF datasets
  - ~8 billion reads
- in progress – Functional analysis of glioblastoma tumors and cell lines
  - > 400 datasets so far
  - > 20 billion reads