Breseq Tutorial
Introduction
breseq is a tool developed by the Barrick lab for analyzing genome re-sequencing data from bacteria. It is primarily used to analyze laboratory evolution experiments with microbes. In these experiments, there is usually a high-quality reference genome for the ancestral strain, and the goal is to exhaustively find all of the mutations that occurred during the experiment. One might then construct a phylogenetic tree of individual samples from a single population, or determine whether the same gene is mutated in many independent evolution experiments in a given environment.
Input data / expectations:
- Haploid reference genome
- Relatively small (<20 Mb) reference genome
- Input FASTQ reads can be from any sequencing technology
- Average genomic coverage > 30-fold
- Less than ~1,000 mutations expected
- Detects SNVs and SVs from single-end reads (does not use paired-end distance information)
- Produces annotated HTML output
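To check a data set against the coverage expectation above, average coverage can be estimated as (number of reads × read length) / reference length. The following is a minimal sketch, not part of breseq itself; the read count is a made-up placeholder, and the reference length is the approximate size of the lambda genome (about 48.5 kb).

```shell
# Estimate average coverage as (number of reads x read length) / genome size.
# All three values are illustrative; substitute the numbers for your own data.
# For a FASTQ file with 4 lines per record, the read count is the line count / 4.
n_reads=100000      # hypothetical read count
read_len=36         # read length in bp
genome_size=48502   # reference length in bp (lambda is about 48.5 kb)

# awk handles the floating-point division that shell arithmetic lacks.
coverage=$(awk -v n="$n_reads" -v l="$read_len" -v g="$genome_size" \
    'BEGIN { printf "%.1f", n * l / g }')
echo "Estimated coverage: ${coverage}x"
```

If the estimate falls well below 30-fold, expect reduced sensitivity for mutation calls.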
You can learn a great deal more about breseq by reading the Online Documentation.
Here is a rough outline of the workflow in breseq with proposed additions.
This tutorial was reformatted from the most recent version found here. Our thanks to the previous instructors.
Objectives:
- Use a self-contained, automated pipeline to identify mutations.
- Explain the full range of mutation types found, before moving on to methods better suited to higher-order organisms.
Example 1: Bacteriophage lambda data set
First, we'll run breseq on a small data set to be sure that it is installed correctly and to get a taste of what the output looks like. This sample is a mixed population of bacteriophage lambda that was co-evolved in the lab with its E. coli hosts.
Environment
To set up your profile to run breseq, add "module load bowtie/2.1.0" to it.
cdh  # move to your home directory
echo "module load bowtie/2.1.0" >> .profile  # update your profile to load the bowtie module automatically
After you've completed these commands, log out of Lonestar and log back in so that your profile is run again.
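Note that the echo command above appends a new line every time it is run, so running it twice leaves a duplicate entry in your profile. A minimal sketch of a repeat-safe variant follows; the `PROFILE_FILE` variable is a hypothetical override introduced here for illustration, and the default target is the same `.profile` in your home directory.

```shell
# Append the module line only if it is not already present in the profile.
# PROFILE_FILE is a hypothetical override; by default this edits ~/.profile.
profile="${PROFILE_FILE:-$HOME/.profile}"
line='module load bowtie/2.1.0'

touch "$profile"   # make sure the file exists so grep does not fail on it
# -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
```

Running this more than once does not add a second copy of the line.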
Data
The data files for this example are in the path:
$BI/ngs_course/lambda_mixed_pop/data
Copy this directory to a new directory called BDIB_breseq in your $SCRATCH space and cd into it.
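One way to do this is sketched below. It assumes that `$BI` and `$SCRATCH` are already set by the course environment, and it adds a guard (not part of the original instructions) so that nothing is copied if the source is missing or the destination already exists.

```shell
# Copy the example data into a new directory in scratch and move into it.
# $BI and $SCRATCH are assumed to be defined by the course environment.
src="$BI/ngs_course/lambda_mixed_pop/data"
dest="$SCRATCH/BDIB_breseq"

if [ -d "$src" ] && [ ! -e "$dest" ]; then
    cp -r "$src" "$dest"   # the copy of data/ becomes BDIB_breseq/
    cd "$dest" && ls       # list the copied files
else
    echo "Nothing copied: check that $src exists and $dest does not" >&2
fi
```

The check on the destination matters because copying into a directory that already exists would nest the data one level deeper, inside BDIB_breseq/data.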
If the copy worked correctly you should see the following 2 files:
| File Name | Description | Sample |
|---|---|---|
| | Single-end Illumina 36-bp reads | Evolved lambda bacteriophage mixed population genome sequencing |
| | | |