...

Code Block

language	bash
title	Click here for the solution
collapse	true

cds
mkdir BDIB_breseq_lambda
cp $BI/ngs_course/lambda_mixed_pop/data/* BDIB_breseq_lambda
cd BDIB_breseq_lambda
ls

If the copy worked correctly you should see the following 2 files:

...

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take less than 5 minutes. Submit this command to the TACC development queue or run it on an idev node.

Code Block

language	bash
title	breseq commands for commands file

breseq -j 12 -r lambda.gbk lambda_mixed_population.fastq > log.txt

...

Navigate to the output directory in the Finder and open the file called index.html. This opens the results in a web browser window, where you can click through the different mutations and other information and see the evidence supporting each.

The summary page reports the percentage of reads that mapped to the genome and the overall coverage of the genome. Most analysis time is spent on the Mutation Predictions page, determining which mutations are important (and, more rarely, which are inaccurate).

Feel free to click around through the different mutations and examine their evidence when you have time, but first start the next breseq run so that it can be in the queue and completing while you look at the data. We will go over the different types of mutations and the evidence for them as a group towards the end of class today, but additional information on analyzing the output can be found at the following reference:

Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151:165-188.

Example 2: E. coli data sets

Now we'll try running breseq on some Escherichia coli genomes from an evolution experiment. These files are larger. You don't want to run them in interactive mode. We'll submit them to the TACC queue all at once.

Data

The data files for this example are in the following path. Go ahead and copy them to a new folder in your $SCRATCH directory called BDIB_breseq_coli_clones:

Code Block

title	location of data files

$BI/ngs_course/ecoli_clones/data

File Name	Description	Sample
`SRR030252_1.fastq SRR030252_2.fastq`	Paired-end Illumina 36-bp reads	0K generation evolved E. coli strain
`SRR030253_1.fastq SRR030253_2.fastq`	Paired-end Illumina 36-bp reads	2K generation evolved E. coli strain
`SRR030254_1.fastq SRR030254_2.fastq`	Paired-end Illumina 36-bp reads	5K generation evolved E. coli strain
`SRR030255_1.fastq SRR030255_2.fastq`	Paired-end Illumina 36-bp reads	10K generation evolved E. coli strain
`SRR030256_1.fastq SRR030256_2.fastq`	Paired-end Illumina 36-bp reads	15K generation evolved E. coli strain
`SRR030257_1.fastq SRR030257_2.fastq`	Paired-end Illumina 36-bp reads	20K generation evolved E. coli strain
`SRR030258_1.fastq SRR030258_2.fastq`	Paired-end Illumina 36-bp reads	40K generation evolved E. coli strain
`NC_012967.1.fasta`	Reference Genome	E. coli B str. REL606

Code Block

language	bash
title	Command to copy data files to new folder
collapse	true

cds
mkdir BDIB_breseq_coli_clones
cp $BI/ngs_course/ecoli_clones/data/* BDIB_breseq_coli_clones
cd BDIB_breseq_coli_clones

Running breseq on TACC

breseq may take an hour to run on these sequences, so you should submit to the normal queue instead of the development queue on TACC and request a run time of 3 hours as a conservative estimate. Since we have multiple data sets, this example also gives us an opportunity to run several commands as part of a single job on TACC and to use multiple cores on a single processor. You'll want each command (line) in the commands file to look something like this:

Code Block
breseq -j 12 -r NC_012967.1.gbk -o output_<XX>K SRR030252_1.fastq SRR030252_2.fastq &> <XX>K.log.txt

Notice we've added some additional options:

part	purpose
&> <XX>K.log.txt	Redirect both the standard output and the standard error streams to a file called <XX>K.log.txt. Replace the <XX> so that each run writes to a different file, but KEEP the &>, which tells the command line to send both streams to that file.
-o output_<XX>K	Write all output for the run to the specified directory instead of the current directory. If we don't include this (and change the <XX>), the runs will write their output on top of one another. The program will undoubtedly get confused, possibly crash, and generally it will be a mess.
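
Rather than typing each line by hand, the <XX> placeholder can be filled in with a loop. The following is a minimal sketch, not part of the original exercise: it assumes the commands file is simply named `commands` and pairs each run accession with its generation label using the sample table above.

```bash
#!/usr/bin/env bash
# Sketch: write one breseq command per sample into a commands file.
# Assumptions (not from the course materials): the output file is named
# "commands", and the accession/label pairs below follow the sample table.
set -euo pipefail

ref="NC_012967.1.gbk"   # reference file name used in the example command
out="commands"          # hypothetical name for the commands file

: > "$out"              # start with an empty file

# Each input line holds a run accession and its generation label.
while read -r run gen; do
  printf 'breseq -j 12 -r %s -o output_%s %s_1.fastq %s_2.fastq &> %s.log.txt\n' \
    "$ref" "$gen" "$run" "$run" "$gen" >> "$out"
done <<'EOF'
SRR030252 00K
SRR030253 02K
SRR030254 05K
SRR030255 10K
SRR030256 15K
SRR030257 20K
SRR030258 40K
EOF
```

Because the label is used for both the output directory and the log file, each run gets its own pair and nothing is overwritten.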

Tip

title	checking command before submitting to queues

It is often a good idea to try running a command that you are about to submit to the TACC queue yourself, just to be sure you have all the options and paths correct. Otherwise you will have to wait until it starts running on TACC to find out that it failed immediately, which can be frustrating. Try running the command above on the terminal before using launcher_creator.py. If you include the &> option at the end, you will see nothing happen because all of the output is being directed to the log file. Count to ten slowly, use control-c to cancel the command, use ls to make sure the output file was created, and use tail or cat on it to make sure that the program was running rather than crashing.
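
The same check can be scripted. The sketch below is only an illustration: `smoke_test` is a hypothetical helper (not part of breseq or the TACC tools) that starts a command in the background, waits a few seconds, and then reports whether it was still running or had already exited, followed by the last lines of its log.

```bash
#!/usr/bin/env bash
# Sketch: run a command briefly and report whether it survived.
# smoke_test is a hypothetical helper name introduced for illustration.
# Usage: smoke_test <seconds> <log file> <command> [args...]
smoke_test() {
  local seconds=$1 log=$2
  shift 2

  # Start the command in the background, sending stdout and stderr to the log.
  "$@" &> "$log" &
  local pid=$!

  sleep "$seconds"

  if kill -0 "$pid" 2>/dev/null; then
    # Still alive after the wait: stop it and report success.
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
    echo "still running after ${seconds}s"
  else
    # Already gone: collect and report its exit status.
    local status=0
    wait "$pid" || status=$?
    echo "exited early with status ${status}"
  fi

  echo "--- last lines of ${log} ---"
  tail -n 5 "$log"
}

# Example call (commented out because it needs breseq and the data files):
# smoke_test 10 00K.log.txt breseq -j 12 -r NC_012967.1.gbk -o output_00K \
#   SRR030252_1.fastq SRR030252_2.fastq
```

An early exit usually means a wrong option or path, and the tail of the log should show the error message. Delete the partial output directory from a test run before submitting the real job.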

Here's what an example commands file might look like...

Code Block

title	Example commands file

breseq -j 12 -r NC_012967.1.gbk -o output_00K SRR030252_1.fastq SRR030252_2.fastq &> 00K.log.txt
breseq -j 12 -r NC_012967.1.gbk -o output_02K SRR030253_1.fastq SRR030253_2.fastq &> 02K.log.txt
breseq -j 12 -r NC_012967.1.gbk -o output_05K SRR030254_1.fastq SRR030254_2.fastq &> 05K.log.txt
breseq -j 12 -r NC_012967.1.gbk -o output_10K SRR030255_1.fastq SRR030255_2.fastq &> 10K.log.txt
breseq -j 12 -r NC_012967.1.gbk -o output_15K SRR030256_1.fastq SRR030256_2.fastq &> 15K.log.txt
breseq -j 12 -r NC_012967.1.gbk -o output_20K SRR030257_1.fastq SRR030257_2.fastq &> 20K.log.txt
breseq -j 12 -r NC_012967.1.gbk -o output_40K SRR030258_1.fastq SRR030258_2.fastq &> 40K.log.txt

Once you have your commands file ready, you need to create your launcher.sge script and submit it.

Can't remember the commands?

Code Block
launcher_creator.py -q normal -t 3:00:00 ... <your other options>
qsub launcher.sge

Examining breseq results

As before, copy the data back to your computer and examine the HTML output in a web browser.
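
Before copying anything back, it can help to confirm which runs actually produced a report. The sketch below assumes that each directory given to `-o` contains its HTML report at `output/index.html`; `check_runs` is a hypothetical helper name, not a breseq command.

```bash
#!/usr/bin/env bash
# Sketch: report which breseq output directories contain an HTML report.
# Assumption: the report for each run lives at <run directory>/output/index.html.
# check_runs is a hypothetical helper name introduced for illustration.
check_runs() {
  local dir
  for dir in "$@"; do
    if [ -f "${dir}/output/index.html" ]; then
      echo "${dir}: report found"
    else
      echo "${dir}: no report yet"
    fi
  done
}

# Check every directory created with -o output_<XX>K in the current folder.
# nullglob makes the pattern expand to nothing when no directory matches.
shopt -s nullglob
check_runs output_*K
```

For any run without a report, the matching <XX>K.log.txt file is the first place to look.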

...

Version	Old Version 6	New Version 7
Changes made by	Deatherage, Daniel E	Deatherage, Daniel E
Saved on	May 22, 2015	May 24, 2015
