Stampede2 Breseq Tutorial GVA2023

Overview

breseq is a tool developed by the Barrick lab (of which your instructor is a member) for analyzing bacterial genome re-sequencing data. It is primarily used to analyze laboratory evolution experiments with microbes. In these experiments, there is usually a high-quality reference genome for the ancestral strain, and the goal is to exhaustively find all of the mutations that occurred during the evolution experiment. One might then want to construct a phylogenetic tree of individual samples from a single population, or determine whether the same gene is mutated in many independent evolution experiments in an environment.
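One way to approach the second question (whether the same gene is hit in many independent evolution experiments) is sketched below in Python. It counts, for each gene, how many independent populations carry at least one mutation in it. It assumes each population's breseq results have already been reduced to a list of mutated gene names; that input format and the function name `parallel_genes` are hypothetical and chosen for illustration, not something breseq outputs directly in this form.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def parallel_genes(mutated_genes_by_population: Dict[str, Iterable[str]],
                   min_populations: int = 2) -> List[Tuple[str, int]]:
    """Return (gene, number of populations) for genes mutated in >= min_populations.

    Hypothetical helper: each population contributes at most one count per gene,
    so several mutations in the same gene within one population are not double-counted.
    """
    counts: Counter = Counter()
    for genes in mutated_genes_by_population.values():
        counts.update(set(genes))
    hits = [(gene, n) for gene, n in counts.items() if n >= min_populations]
    # Most frequently hit genes first; ties broken alphabetically.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder input: population and gene names are made up for illustration.
    example = {
        "pop1": ["geneA", "geneB", "geneA"],
        "pop2": ["geneA", "geneC"],
        "pop3": ["geneB", "geneA"],
    }
    print(parallel_genes(example))  # [('geneA', 3), ('geneB', 2)]
```

Counting populations rather than raw mutations is the design choice that matters here: a gene mutated repeatedly in one population is weaker evidence of parallel evolution than a gene hit once in each of several independent populations.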

Learning objectives:

  • Get a quick introduction to a self-contained, automated pipeline for identifying mutations.
  • Explain the types of mutations that can be found when a complete analysis is possible, before moving on to methods better suited for higher-order organisms.
  • Examine the same data used in the Mapping and SNV tutorials, this time as breseq output.


Input data / expectations:

  • Haploid reference genome
  • Relatively small (<20 Mb) reference genome
  • Average genomic coverage > 30-fold
  • Less than ~1,000 mutations expected
  • Detects SNVs and SVs from split-read alignments (does not use paired-end distance information)
  • Produces annotated HTML output
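Several of the numeric expectations above can be checked before launching a run. The Python sketch below is a hypothetical helper (not part of breseq) that estimates average coverage as total sequenced bases divided by reference genome length, then compares that value, the genome size, and ploidy against the rough thresholds listed above. The function, field, and parameter names are illustrative assumptions; the read count, read length, and genome size would come from your own FASTQ and reference files.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds taken from the input expectations listed in this tutorial.
MAX_GENOME_BP = 20_000_000  # "Relatively small (<20 Mb) reference genome"
MIN_COVERAGE = 30.0         # "Average genomic coverage > 30-fold"


@dataclass
class RunEstimate:
    """Rough pre-run estimate for a breseq analysis (hypothetical helper)."""
    genome_size_bp: int
    mean_coverage: float
    problems: List[str] = field(default_factory=list)


def estimate_run(genome_size_bp: int, num_reads: int, read_length_bp: int,
                 haploid: bool = True) -> RunEstimate:
    """Estimate mean coverage and flag inputs outside breseq's expected range."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    # Mean coverage = total bases sequenced / reference genome length.
    mean_coverage = num_reads * read_length_bp / genome_size_bp
    problems = []
    if not haploid:
        problems.append("breseq expects a haploid reference genome")
    if genome_size_bp >= MAX_GENOME_BP:
        problems.append("reference genome is not < 20 Mb")
    if mean_coverage <= MIN_COVERAGE:
        problems.append(f"mean coverage {mean_coverage:.1f}x is not > 30-fold")
    return RunEstimate(genome_size_bp, mean_coverage, problems)


if __name__ == "__main__":
    # Hypothetical numbers: a ~4.6 Mb genome with 1,000,000 reads of 150 bp.
    est = estimate_run(genome_size_bp=4_600_000, num_reads=1_000_000,
                       read_length_bp=150)
    print(f"{est.mean_coverage:.1f}x", est.problems)
```

This only approximates coverage (it ignores reads that fail to map or are trimmed), so a result close to 30-fold is a reason to look more carefully rather than a hard cutoff. The "<1,000 mutations" expectation cannot be checked this way before the run.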

This means that breseq is not suited for diploid organisms or other very large genomes. GATK is a similar pipeline with many more options, and it is suited for diploid and large genomes. You can learn a great deal more about breseq by reading the Online Documentation.

Here is a rough outline of the workflow in breseq with proposed additions.


breseq access

In order to run breseq, we need to install it. If you think this sounds like a great opportunity to use conda, you are right! Searching https://anaconda.org/ for breseq returns 2 different results.