Overview

We analyzed a single lambda phage evolution experiment as an introduction to variant visualization in the first part of Wednesday's class, but breseq is designed to work with "small" genomes, not just "tiny" ones. It also has more advanced tools for visualizing mutations across multiple genomes and for identifying all mutations more completely.

...

  1. Run breseq on 7 different E. coli clones evolved for 40,000 generations
  2. Use the packaged gdtools commands to generate a comparison table of all mutations from all 7 samples
  3. Learn shortcuts for compressing, transferring, and extracting multiple folders/files with single commands
  4. Visualize the data on your local computer

Get some data

The data files for this example are in the following path: $BI/ngs_course/ecoli_clones/data/. Go ahead and copy them to a new folder in your $SCRATCH directory called GVA_breseq_multi-sample:
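
A minimal sketch of that copy step, assuming $BI and $SCRATCH are already defined in your TACC environment. The helper name copy_dataset is made up for illustration and is not part of the course materials.

```bash
# Hypothetical helper: copy every file from a source folder ($1) into a
# destination folder ($2), creating the destination if it does not exist.
copy_dataset() {
    mkdir -p "$2" && cp "$1"/* "$2"/
}

# On TACC, with $BI and $SCRATCH set, the call would look like:
# copy_dataset "$BI/ngs_course/ecoli_clones/data" "$SCRATCH/GVA_breseq_multi-sample"
```

After copying, cd into the new folder and run ls to confirm that the fastq and reference files arrived.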

...

Generic command:

breseq -j 6 -r NC_012967.1.gbk -o run_output/<XX>K <ID>_1.fastq <ID>_2.fastq &> logs/<XX>K.log.txt &

Part | Purpose
-j 6 | Use 6 processors/threads. This both speeds up our run and would cause it to fail if we were using the module version of bowtie2.
-o run_output/<XX>K | Directs all output to the run_output directory AND creates a new directory, named with 2 digits (<XX>) followed by a K, for each individual sample's data. If we left out the second part referencing the individual sample, breseq would write the output from all of the runs on top of one another. The program would undoubtedly get confused and possibly crash, and the results would definitely not be analyzable.
<ID> | Just used to denote read 1 and/or read 2. Note that in our actual commands these reference the fastq files and are supplied without an option.
&> logs/<XX>K.log.txt | Redirect both the standard output and the standard error streams to a file called <XX>K.log.txt, and put that file in a directory named logs. The &> tells the command line to send both streams to that file.
& | Run the preceding command in the background. This is required so that all the commands will run at once.
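
To see how the placeholders expand, the sketch below prints one command per sample instead of running breseq. The function name breseq_cmd and the label/ID pairs (00K sampleA, 40K sampleB) are hypothetical stand-ins, not the real file names in the course data.

```bash
# Print the breseq command for one sample.
#   $1 = label used for the output folder and log file (the <XX>K part)
#   $2 = prefix shared by the paired fastq files (the <ID> part)
breseq_cmd() {
    echo "breseq -j 6 -r NC_012967.1.gbk -o run_output/${1} ${2}_1.fastq ${2}_2.fastq &> logs/${1}.log.txt &"
}

# Hypothetical label/ID pairs; substitute the real names from your data folder.
while read -r label id; do
    breseq_cmd "$label" "$id"
done <<'EOF'
00K sampleA
40K sampleB
EOF
```

Redirecting this output to a file gives a list of commands you can inspect before running anything. Because the log files are written into logs, that directory has to exist first (mkdir -p logs), and a final wait keeps a script from exiting before the backgrounded runs finish.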

Why did we use -j 6?
  1. Each lonestar5 compute node has 48 processors available.
  2. We have 7 samples to run, so by requesting 6 processors, we allow all 7 samples to start at the same time leaving us with 6 unused processors.
  3. If we had requested 7 processors for each sample, 7 samples would need 49 processors, one more than the node has, so only 6 samples would start initially and the 7th would start after the first finishes.
  4. These are similar to the considerations you have to make with the job submission system we will go over at the end of tomorrow's class.
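
The arithmetic behind those points can be checked with shell integer math. This is only a sketch: the function name max_concurrent is made up for illustration, and the 48-processor figure is the one quoted above.

```bash
# How many jobs can run at once on one node?
#   $1 = processors on the node, $2 = threads requested per job
max_concurrent() {
    echo $(( $1 / $2 ))
}

max_concurrent 48 6     # prints 8: all 7 samples can start together
max_concurrent 48 7     # prints 6: the 7th sample has to wait
echo $(( 48 - 7 * 6 ))  # prints 6: processors left unused with -j 6
```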

...

Now you can click through each individual sample's output to see the mutations in any given sample, as you could in the intro to breseq tutorial, but the whole point of this tutorial has been to look at multiple samples at once.


Other useful tutorials

Data analysis with breseq (like all other programs) is only as good as the data that goes into it. The MultiQC and Trimmomatic tutorials work well upstream of breseq, while the identification of novel DNA elements tutorial may be useful for tasks such as identifying unknown plasmids, and it makes use of the genome assembly tutorial.



Back to the GVA2020 page