...

In order to run breseq, we need to install it. If you think this sounds like a great opportunity to use conda, you are right! Searching https://anaconda.org/ for breseq returns 2 different results. One is in the bioconda channel, which we have worked with multiple times, and has been downloaded more than 19,000 times. The other comes from a contributor with a limited track record, has not been updated in more than 3.5 years, and has only been accessed 6 times. As with apps on your phone, the answer to "which is the correct one?" should be fairly obvious.
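
If you would rather check this from the command line than from the website, conda can search a single channel directly. This is a minimal sketch that assumes conda is already on your PATH; show_breseq_builds is a hypothetical helper name, not part of conda.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the breseq builds offered by the bioconda channel.
# Assumes conda is already available on your PATH.
show_breseq_builds() {
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda is not on your PATH" >&2
    return 1
  fi
  # --override-channels limits the search to the channel given with -c
  conda search --override-channels -c bioconda breseq
}
```

Running show_breseq_builds prints one line per available version and build, which is a quick way to see what the bioconda recipe offers before installing anything.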

...

languagebash
titleCheck that you have access to breseq

...

If you try the basic installation command, you might notice the following important lines:

No Format
The following packages will be UPDATED:
...
  openssl                                 1.0.2u-h7b6447c_0 --> 1.1.1k-h27cfd23_0
...

The following packages will be DOWNGRADED:

  samtools                                  1.9-h10a08f8_12 --> 1.7-1

Recall that in our SNV tutorial, we went to a lot of trouble to install samtools version 1.9, and that part of the solution was preventing openssl version 1.1 from being installed. Because of this (and our desire to keep these tools working), it is a better idea to set up a new environment containing breseq, as we did for SVDetect.
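
You do not have to start a real install to see this kind of warning: conda's --dry-run option prints the same package plan without changing anything, and a short awk filter can pull out the packages listed in one section of that plan. This is a sketch that assumes the plan layout shown above (a "The following packages will be ..." header followed by indented "name old --> new" lines); list_plan_changes is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a conda package plan on stdin and print the names
# of packages listed under one section (e.g. UPDATED or DOWNGRADED).
# Assumes the layout shown in the tutorial output: a header line
# "The following packages will be <SECTION>:" followed by lines like
# "  samtools   1.9-h10a08f8_12 --> 1.7-1".
list_plan_changes() {
  local section="$1"
  awk -v section="$section" '
    # Every header line switches the current section on or off
    /^The following packages will be/ { in_section = (index($0, section) > 0); next }
    # Inside the wanted section, lines containing "-->" describe one package change
    in_section && /-->/ { print $1 }
  '
}
```

For example, conda install --dry-run -c bioconda breseq | list_plan_changes DOWNGRADED should print samtools when run in an environment in the state described above.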

As breseq is an "all-in-one" tool that maps reads, calls variants, sifts signal from noise, and provides basic visualization, you may think that breseq is the only program you would use to analyze appropriate samples. Do not forget that read QC and read processing take place before running breseq. Therefore, a complete environment (such as the one I use in my own analysis) might contain fastqc, trimmomatic, and breseq. Conveniently, all 3 programs are in the bioconda channel. Using what you have learned so far, see if you can create a new environment with these 3 programs.

Code Block
languagebash
titleYou can name your new environment anything you want; my suggestion would be GVA-breseq so that you remember both that it was part of this class and what is in it.
collapsetrue
conda create --name GVA-breseq -c bioconda fastqc trimmomatic breseq
conda activate GVA-breseq

Using what you know so far, see if you can figure out which versions of the 3 programs you have installed.

Expand
titleClick here for expected answer
No Format
(GVA-breseq) tacc:/scratch/0004/train402/GVA_samtools_tutorial$ fastqc --version
FastQC v0.11.9
(GVA-breseq) tacc:/scratch/0004/train402/GVA_samtools_tutorial$ trimmomatic -version
0.39
(GVA-breseq) tacc:/scratch/0004/train402/GVA_samtools_tutorial$ breseq --version
breseq 0.35.7
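
If you check versions often, the three commands can be run in one loop. This is a sketch rather than part of the course material: report_versions is a hypothetical name, and the per-tool flags are the ones used in the expected answer (note that trimmomatic takes a single-dash -version).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one "tool: version" line for each program
# in the GVA-breseq environment, or a reminder if a tool is missing.
report_versions() {
  local tool flag
  for tool in fastqc trimmomatic breseq; do
    # trimmomatic uses a single-dash version flag; the others use --version
    case "$tool" in
      trimmomatic) flag="-version" ;;
      *)           flag="--version" ;;
    esac
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: %s\n' "$tool" "$("$tool" "$flag" 2>&1 | head -n 1)"
    else
      printf '%s: not found (is the GVA-breseq environment active?)\n' "$tool"
    fi
  done
}
```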


You should now be able to run breseq with the breseq command. Running breseq without any options shows its usage, that is, which arguments and options it expects.

...

Code Block
languagebash
titleBy this point in the course you should not need to expand this box to see the suggested solution, but you should keep checking boxes such as this one to make sure you are not drifting too far off track.
collapsetrue
mkdir $SCRATCH/GVA_breseq_lambda_mixed_pop
cp $BI/ngs_course/lambda_mixed_pop/data/* $SCRATCH/GVA_breseq_lambda_mixed_pop
As mentioned over Zoom, this is one instance where I know for sure that copying these files while on an idev node may not work, giving an Input/Output error. If you are already in an idev session and this does not work, use the logout command to exit the idev session and retry the copy command. If both methods fail, please get my attention so we can figure out what is going on. By the same token, if you are on an idev node and are able to transfer the files, please let me know, as it may help figure out the real source of the error.
Note
titlePossible errors on idev nodes

As mentioned yesterday, you cannot copy from the BioITeam directory (because it is on corral-repl) while on an idev node. Log out of your idev session, copy the files, and be sure to start a new idev session, as breseq should not be run on the head node.
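
The mkdir and cp commands from the suggested solution can also be wrapped so that a failed copy prints a reminder instead of only an Input/Output error. This is a sketch: copy_lambda_data is a hypothetical helper name, and it assumes the $BI and $SCRATCH variables are set as they are elsewhere in the course.

```bash
#!/usr/bin/env bash
# Hypothetical helper wrapping the copy step from this tutorial.
# Assumes $BI and $SCRATCH are set, as on the course's TACC accounts.
copy_lambda_data() {
  local src="$BI/ngs_course/lambda_mixed_pop/data"
  local dest="$SCRATCH/GVA_breseq_lambda_mixed_pop"
  # -p: do not fail if the directory already exists (e.g. on a retry)
  mkdir -p "$dest" || return 1
  if ! cp "$src"/* "$dest"/; then
    echo "Copy failed. If you are on an idev node, log out, rerun the copy from a login node, then start a new idev session." >&2
    return 1
  fi
  echo "Copied files into $dest"
}
```

Unlike the plain mkdir in the suggested solution, mkdir -p makes the function safe to rerun after logging out and back in.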

Code Block
titleNow use the ls command to see what files were copied. Again, you should not need to expand this to get the output listed below.
collapsetrue
ls $SCRATCH/GVA_breseq_lambda_mixed_pop

...

Because this data set is relatively small (roughly 100x coverage of a 48,000 bp genome), a breseq run will take less than 5 minutes. It is still computationally intensive enough that it should not be run on the head node, and since we have reservations, there is no reason not to use them.
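
The coverage figure is simply total sequenced bases divided by genome length: if the ~100x estimate holds, the reads should add up to roughly 100 × 48,000 ≈ 4.8 million bases. A minimal sketch for checking this yourself, assuming an uncompressed FASTQ with exactly 4 lines per record (estimate_coverage is a hypothetical name, and the genome size is passed in by hand):

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate average coverage as total read bases / genome size.
# Assumes an uncompressed FASTQ with exactly 4 lines per record
# (header, sequence, "+", qualities), so line 2 of each record is the sequence.
estimate_coverage() {
  local fastq="$1" genome_size="$2"
  awk -v g="$genome_size" '
    NR % 4 == 2 { bases += length($0) }   # count sequence lines only
    END { printf "%.1f\n", bases / g }
  ' "$fastq"
}
```

Run from the data directory, estimate_coverage lambda_mixed_population.fastq 48000 should print a value near 100 if the description above matches the file.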

Warning
titleRemember to make sure you are on an idev node

For reasons discussed numerous times throughout the course already, please be sure you are on an idev node. Remember that the hostname and showq -u commands can be used to check whether you are on one of the login nodes or one of the compute nodes. If you need more information or help re-launching a new idev node, please see this tutorial.
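
If you want a script to refuse to start on a login node, a hostname check is one option. This sketch assumes TACC-style naming, where login node hostnames begin with "login" (login1, login2, ...); confirm that pattern with the hostname command on your system before relying on it. on_login_node is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the given hostname, or this
# machine's hostname when no argument is given, looks like a login node.
# ASSUMPTION: login nodes are named login1, login2, ... as on TACC systems,
# and compute nodes use other names (e.g. c455-001). Check with `hostname` first.
on_login_node() {
  local host="${1:-$(hostname)}"
  case "$host" in
    login*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Example guard for the top of a script that runs breseq:
# if on_login_node; then echo "Start an idev session first." >&2; exit 1; fi
```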

Code Block
languagebash
titlebreseq command
cd $SCRATCH/GVA_breseq_lambda_mixed_pop
breseq -r lambda.gbk lambda_mixed_population.fastq
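
Before launching the run, it can help to confirm that both input files named in the command actually made it into the directory, since a failed copy is the most likely problem at this point. A sketch of such a pre-flight check (run_lambda_breseq is a hypothetical name; the breseq command itself is unchanged from the one above):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: check that the inputs exist and are non-empty,
# then run the same breseq command used in this tutorial.
# The body runs in a subshell ( ... ) so the cd does not change your shell's directory.
run_lambda_breseq() (
  dir="${1:-$SCRATCH/GVA_breseq_lambda_mixed_pop}"
  cd "$dir" || exit 1
  for f in lambda.gbk lambda_mixed_population.fastq; do
    if [ ! -s "$f" ]; then
      echo "Missing or empty input file: $dir/$f" >&2
      exit 1
    fi
  done
  breseq -r lambda.gbk lambda_mixed_population.fastq
)
```

Called with no argument, it uses the same $SCRATCH/GVA_breseq_lambda_mixed_pop directory as the rest of this tutorial.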

...