Read Mapping with bowtie2 Tutorial GVA2023
Overview
Once you know you are working with the best-quality data possible (see the Evaluating Raw Sequencing data tutorial), the first step in nearly every NGS analysis pipeline is to map sequencing reads to a reference genome. In this tutorial we'll explore the basic principles of read mapping using bowtie2 on TACC.
The world of read mappers is settling down after being a bioinformatics Wild West where there was a new gun in town every week that promised to be a faster and more accurate shot than the current record holder. Among the read mappers that have remained popular, the choice now mainly comes down to a trade-off between speed, accuracy, and configurability. There are over 50 read mapping programs listed here. Each mapper has its own set of limitations (on the lengths of reads it accepts, on how it outputs read alignments, on how many mismatches there can be, on whether it produces gapped alignments, etc.). A different read mapper may be a better fit for your set of experiments. Selecting a good tool will be discussed further on Friday.
Other read mappers
Previous versions of this class and tutorial have covered using bowtie and bwa. Please consult those tutorials for more specific information on each mapping program. A previous version of this tutorial included a trimmed-down version of the bwa tutorial if you just want the 'flavor' of what other read mappers involve.
Learning Objectives
This tutorial covers the commands necessary to use bowtie2 to map reads to a reference genome, and concepts applicable to many more mappers.
- Become comfortable with the basic steps of indexing a reference genome, mapping reads, and converting output to SAM/BAM format for downstream analysis.
- Use bowtie2 to map reads from an E. coli Illumina data set to a reference genome and compare the output.
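The three basic steps named in the first objective (index the reference, map the reads, convert the output to BAM) can be sketched as a small Python helper that assembles the command lines. This is only an illustrative sketch, not the tutorial's own commands: the file names are hypothetical placeholders, the helper functions are invented for this example, and it assumes bowtie2 and samtools 1.x are on your PATH. By default it only prints the commands rather than running them.

```python
import shlex
import subprocess


def build_mapping_commands(reference, reads_1, reads_2, index_base, out_prefix, threads=1):
    """Return the index -> map -> SAM-to-BAM steps as argument lists.

    All paths are supplied by the caller; nothing here is specific to the
    data set used in this tutorial.
    """
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    sorted_bam = f"{out_prefix}.sorted.bam"
    return [
        # 1. Index the reference genome (needed once per reference).
        ["bowtie2-build", reference, index_base],
        # 2. Map paired-end reads against the index; -S names the SAM output.
        ["bowtie2", "-p", str(threads), "-x", index_base,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # 3a. Convert the text SAM file to binary BAM.
        ["samtools", "view", "-b", "-o", bam, sam],
        # 3b. Sort by coordinate, which most downstream tools expect.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # 3c. Index the sorted BAM for fast random access.
        ["samtools", "index", sorted_bam],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = build_mapping_commands(
        reference="reference.fasta",
        reads_1="reads_1.fastq",
        reads_2="reads_2.fastq",
        index_base="bowtie2_index/reference",
        out_prefix="mapped",
        threads=4,
    )
    run_commands(steps, dry_run=True)
```

Printing first and executing only with `dry_run=False` makes it easy to check each command before spending compute time on it; the printed lines can also be pasted directly into a shell.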
Theory
Please see the Introduction to mapping presentation on the course outline for more details of the theory behind read mapping algorithms and critical considerations for using these tools and references correctly.
Tutorial: E. coli genome re-sequencing data
The following DNA sequencing read data files were downloaded from the NCBI Sequence Read Archive via the corresponding European Nucleotide Archive record. They are Illumina Genome Analyzer sequencing of a paired-end library from a (haploid) E. coli clone that was isolated from a population of bacteria that had evolved for 20,000 generations in the laboratory as part of a long-term