Your Instructors
Anna Battenhouse, Associate Research Scientist, Marcotte Lab and BRCF, abattenhouse@utexas.edu
Biomedical Research Computing Facility Manager, and Marcotte lab staff
BA English literature, 1978
Commercial software development 1982 – 2007
Joined Iyer Lab 2007 (“retirement career”)
BS Biochemistry, UT Austin, 2013
- J
Haridha Shivram, Iyer Lab, ameliahall@utexas.edu
BS Molecular Genetics, 2007 (University of Rochester)
PhD Microbiology, 2017 (University of Texas at Austin)
Laboratory Technician at UT 2007-2010
Dakota Derryberry, M.S., dakotaz@utexas.edu
BA Biology, University of Chicago, 2009
MS Computational Biology, University of Texas at Austin, 2017
Benni Goetz, M.S. (Research Engineering/Scientist Associate III), benni@utexas.edu
Joined the Bioinformatics Consulting Group in 2012
...
- Joined the Biomedical Research Computing Facility (BRCF) and Marcotte Lab 2017
- Also affiliated with
- Bioinformatics Consulting Group (BCG)
- Genome Sequencing and Analysis Facility (GSAF)
- Cryo Electron Microscopy core facility (CryoEM)
- Edward Marcotte lab
- Matt Bramble, matthew.bramble@austin.utexas.edu, Associate Research Scientist, Bioinformatics Consulting Group
- Master’s degrees from UT Austin in Molecular Biology and Statistics
- 10 years of experience with R and Python
- Recently joined the CBRS Bioinformatics Consulting Group after six years at MD Anderson Cancer Center analyzing a wide range of NGS epigenomics data
- Areas of expertise include: Hi-C (chromatin conformation) analysis, mouse somatic variant analysis, and single cell RNAseq analysis
About the Iyer Lab (where Anna learned NGS)
Dr. Vishy Iyer, PI
Main focus is functional genomics
Research methods include
Communication
Post-its
Green post-it – I'm good at the moment.
...
Asking questions
Feel free to ask questions any time during the instructor's lecture and demonstrations.
For online attendees, you can also post your question to the Zoom chat. We'll sometimes use breakout rooms when troubleshooting problems you run into; if so, TA Matt Bramble will assign you to one.
Getting help
Since most folks are new to the Linux command line, we expect you to run into problems! Please let us know if you're having difficulties!
Making mistakes and running into problems is key to learning the Linux command line! It is not only expected – it is encouraged.
Conventions
If you see a block of text like this:
```
ls -h
```
it means, "type the command ls -h into a terminal window, hit Enter, and see what happens".
We intend this course to offer as much self-learning as possible. Consequently, you'll find many sections like this - click on the triangle to expand them:
Hint sections will provide you some guidance on what to do next, but will not spell it out.
and some sections like this:
Solution sections will contain the
Course goals
- Hands-on, tutorial style – learn by doing
- Common bioinformatics tools & file formats
- Introduce NGS vocabulary
- both high-level view and practice with specific tools
- Cover the NGS basics
- The first few things you'll do after receiving raw sequences
- raw sequence QC and preparation
- alignment to reference
- basic alignment analysis
- Understand and practice required skills
- Get you comfortable with Linux and TACC – your best "frenemies"
- Make you self-sufficient enough in 5 days to become experts over time
- Show some "best practices" for working with NGS data
NGS Challenges
...
Large and growing datasets
NGS methods produce staggering amounts of data!
Typical dataset these days
- yeast: 5 – 20 million reads
- human: 20 – 250 million reads (~5 - 8 million for TagSeq)
- single end (SE) or paired end (PE), length 50 – 300 bases (100 or 150 typical)
The initial FASTQ files are big (100s of MB to GB) – and they're just the start.
- Organization and naming conventions are critical.
- Your data can get out of hand very quickly!
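The read counts and lengths listed above translate into file sizes with simple arithmetic. The sketch below is a rough estimate only and is not part of the course materials: the function name estimate_fastq_bytes and the default 50-byte header length are assumptions made for illustration. It counts uncompressed bytes for a single FASTQ file (paired-end data comes as two such files).

```shell
# Rough, uncompressed FASTQ size estimate (illustrative sketch only).
# A FASTQ record has 4 lines: header, sequence, "+", and quality string.
# Arguments: number of reads, read length, and an ASSUMED average
# header length in bytes (defaults to 50; real headers vary).
estimate_fastq_bytes() (
    reads=$1
    read_len=$2
    header_len=${3:-50}
    # header + newline, sequence + newline, "+" + newline, qualities + newline
    per_record=$(( header_len + 1 + read_len + 1 + 2 + read_len + 1 ))
    echo $(( reads * per_record ))
)

# Low end of the yeast range: 5 million reads of 100 bases
estimate_fastq_bytes 5000000 100      # prints 1275000000 (about 1.3 GB)

# High end of the human range: 250 million reads of 150 bases
estimate_fastq_bytes 250000000 150    # prints 88750000000 (about 89 GB)
```

These are uncompressed figures. Sequencing data is usually delivered gzip-compressed, which is typically considerably smaller, but the numbers still show why a handful of samples can fill a disk quota quickly.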
Progression of Iyer Lab datasets over time:
...