...
- Anna Battenhouse, Associate Research Scientist, Iyer Lab, abattenhouse@utexas.edu
- BA English literature, 1978
- Commercial software development 1982 – 2005
- Joined Iyer Lab 2007 (“retirement career”)
- BS Biochemistry, 2013
- Amelia Weber Hall, Graduate Student, Iyer Lab, ameliahall@utexas.edu
- 5th year Microbiology graduate student
- Laboratory Technician at UT 2007-2010
- BS Molecular Genetics, 2007
- Nathan Abell, Research Assistant, Xhemalce Lab, abell.nathan@gmail.com
- Undergraduate researcher in Iyer Lab 2011-2013
- BS Molecular Biology, UT, 2013
- Research Assistant
- Dakota Derryberry, Graduate Student, Wilke Lab, dakotaz@utexas.edu
- ???
About the Iyer Lab
Dr. Vishy Iyer, PI
Main focus is functional genomics
Research methods include
Communication
Post-its
Green post-it – I'm good at the moment.
...
- Hands-on, tutorial style – learn by doing
- common bioinformatics tools & file formats
- Introduce NGS vocabulary
- both high-level view and practice with specific tools
- Cover the NGS tool basics – the first few things you'll do after receiving raw sequences
- raw sequence preparation
- alignment to reference
- basic alignment analysis
- Understand and practice required skills
- Get you comfortable with Linux and TACC – your best "frenemies"
- Make you self-sufficient in 4 days, so you can become experts over time
- Show some "best practices" for working with NGS data
NGS Challenges
Diverse skill set requirements
- Analysis – making sense of raw data
- one part bioinformatics and statistics
- one part scripting / programming
- Linux command line
- bash scripting
- R, python, perl
- Management – making order out of chaos
- one part organization
- one part data wrangling
- Adoption of best practices is critical!
Large and growing datasets
...
- Organization and naming conventions are critical.
- Your data can get out of hand very quickly!
Progression of Iyer Lab ChIP-seq datasets over time:
- 2008 – Yeast heat shock remodeling of chromatin
- 2 yeast datasets
- less than 2 million sequences
- 2010 – Allelic bias in CTCF binding
- 13 CTCF datasets from 3 GM cell lines
- ~ 200 million sequences
- 2012 – Analysis of transcription factor data (ENCODE2)
- 32 ChIP-seq datasets gathered over 3 years (3 TFs across 11 cell lines)
- ~ 1 billion sequences
- 2013 – miRNA overexpression effects
- 42 RNAseq datasets (7 conditions)
- ~ 2.6 billion sequences
- 2014 – eQTL analysis of CTCF binding
- 52 very deeply sequenced CTCF datasets
- ~ 8 billion sequences
- in progress – Functional analysis of glioblastoma tumors and cell lines
- > 400 datasets so far (ChIP-seq, RNAseq, miRNAseq, exome/genome sequencing)
- > 20 billion sequences
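To put the growth in the list above into numbers, here is a minimal Python sketch that tabulates the approximate sequence counts and expresses each project as a multiple of the 2008 one. The counts are copied from the list. The 2008 figure ("less than 2 million") and the in-progress figure ("> 20 billion") are bounds rather than exact values, so the fold changes are rough. The names `datasets` and `fold_growth` are made up for this illustration.

```python
# Approximate sequence counts from the list above, in millions.
# The first value is an upper bound and the last a lower bound,
# so treat the resulting fold changes as order-of-magnitude estimates.
datasets = [
    ("2008", 2),
    ("2010", 200),
    ("2012", 1_000),
    ("2013", 2_600),
    ("2014", 8_000),
    ("in progress", 20_000),
]


def fold_growth(rows):
    """Return (label, millions, fold change relative to the first row)."""
    baseline = rows[0][1]
    return [(label, count, count / baseline) for label, count in rows]


for label, count, fold in fold_growth(datasets):
    print(f"{label:>12}: ~{count:>6,} million sequences ({fold:,.0f}x the 2008 project)")
```

Even with rough inputs, the jump of about four orders of magnitude is why organization and naming conventions matter from the start.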