
Your Instructors
Most of us are members (or alumni) of the functional genomics lab of Vishwanath Iyer, UT Austin.
- Anna Battenhouse, Associate Research Scientist, Iyer Lab, abattenhouse@utexas.edu
- BA English literature, 1978
- Commercial software development 1982 – 2005
- Joined Iyer Lab 2007 (“retirement career”)
- BS Biochemistry, 2013
- Amelia Weber Hall, Graduate Student, Iyer Lab, ameliahall@utexas.edu
- 5th year Microbiology graduate student
- Laboratory Technician at UT 2007-2010
- BS Molecular Genetics, 2007
- Nathan Abell, Research Assistant, Xhemalce Lab, abell.nathan@gmail.com
- Undergraduate researcher in Iyer Lab 2011-2013
- BS Molecular Biology, UT, 2013
- Research Assistant
- Dakota Derryberry, Graduate Student, Wilke Lab, dakotaz@utexas.edu
About the Iyer Lab
http://iyerlab.org/
- Main focus is functional genomics
- large-scale transcriptional reprogramming in response to diverse stimuli
- ENCODE consortium collaborator
- work in human and yeast
- Research methods include
- microarrays (Dr. Iyer was co-inventor)
- high-throughput sequencing (since 2007)
- especially ChIP-seq
- also RNA-seq, RIP-seq, MNase-seq ...
- we now have > 1,500 NGS datasets

Communication
Post-its
Green post-it – I'm good at the moment.
Pink post-it – I need a bit of help.
Conventions
Text that you find in courier font refers to a program or file name on a computer.
If you see a block of text like this:
    ls -h
it means, "type the command ls -h into a terminal window, hit return, and see what happens".
We intend this course to offer as much self-learning as possible. Consequently, you'll find many sections like this - click on the triangle to expand them:
Hint sections will provide you some guidance on what to do next, but will not spell it out.
and some sections like this:
Solution sections will contain the commands so that you can copy and paste them if you need to. They should be exactly accurate.
Course goals
- Hands-on, tutorial style – learn by doing
- Cover the NGS tool basics – the first few things you'll do after receiving raw sequences
- Get you comfortable with Linux and TACC – your best "frenemies"
- Make you self-sufficient in 4 days, so you can become experts over time
- Show some "best practices" for working with NGS data
NGS Challenges
Large and growing datasets
NGS methods produce staggering amounts of data!
Typical dataset these days
- yeast: 5 – 20 million reads
- human: 20 – 100 million reads
- paired-end, length 75 – 100 bases
The initial fastq files are big (100s of MB to GB) – and they're just the start.
- Organization and naming conventions are critical.
- Your data can get out of hand very quickly!
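To get a feel for why the files are so big, here is a rough back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: the function name fastq_bytes and the 50-character header length are made up for illustration, the read counts above are taken to mean read pairs (so a paired-end dataset has twice as many records), and the result is the uncompressed size. Compressed (.gz) files will be considerably smaller.

```python
def fastq_bytes(n_reads, read_len, header_len=50, paired=True):
    """Rough uncompressed FASTQ size in bytes (illustrative estimate only).

    Each FASTQ record has 4 lines, each ending in a newline:
    header, sequence, '+', and quality string.
    header_len=50 is an assumed typical header length, not a fixed value.
    """
    per_record = (header_len + 1) + (read_len + 1) + 2 + (read_len + 1)
    # Assumption: n_reads counts read pairs, so paired-end doubles the records.
    n_records = n_reads * (2 if paired else 1)
    return per_record * n_records


# Low and high ends of the "typical dataset" ranges listed above.
examples = [
    ("yeast, 5 million reads, 75 bases", 5_000_000, 75),
    ("human, 100 million reads, 100 bases", 100_000_000, 100),
]
for label, n_reads, read_len in examples:
    gigabytes = fastq_bytes(n_reads, read_len) / 1e9
    print(f"{label}: roughly {gigabytes:.0f} GB uncompressed")
```

Try changing the read count or read length to match your own data. The point is not the exact number but how quickly it grows, and every alignment or analysis step adds more files on top of this.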
Progression of Iyer Lab ChIP-seq datasets over time:
- 2008 – Yeast heat shock remodeling of chromatin
- 2 yeast datasets
- less than 2 million reads
- 2010 – Allelic bias in CTCF binding
- 13 CTCF datasets from 3 GM cell lines
- ~200 million reads
- 2012 – Analysis of 3 TFs across 11 cell lines
- 32 datasets gathered over 3 years
- ~1 billion reads
- 2014 – QTL analysis of CTCF binding
- 52 very deeply sequenced CTCF datasets
- ~8 billion reads
- in progress – Functional analysis of glioblastoma tumors and cell lines
- > 400 datasets so far
- > 20 billion reads