Breseq - installation GVA2020

Overview

As we have seen at several points in the course, keeping track of different versions of different programs can cause headaches. In this tutorial you will install your own copy of the breseq analysis pipeline. Additionally (and more importantly), you will install an updated version of bowtie2 that fixes a bug in bowtie2 that prevents breseq from running on multiple threads.

Learning objectives

  1. Check versions of bowtie2 and breseq to verify this tutorial is necessary
  2. Upgrade bowtie2
  3. Clone your own copy of breseq

Checking installations and versions

bowtie2 checks
which -a bowtie2
bowtie2 --version

If the which command returns 1 line listing '/opt/apps/intel18/bowtie/2.3.4/bin/bowtie2' and the second command returns several lines of output that include 'version 2.3.4' in the first line, you will need to upgrade bowtie2. This tutorial installs bowtie2 2.3.5.1 because this version is known to fix the error; more recent versions have not been checked.
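If you'd rather not read the version output by eye, the comparison can be scripted. This is a minimal sketch: 'bowtie2_needs_upgrade' and 'version_ge' are hypothetical helper names, not part of bowtie2 or TACC. It assumes the version number is the last word on the first line of 'bowtie2 --version' and relies on GNU 'sort -V' for version ordering. Note that it treats anything at or above 2.3.5.1 as fine, even though only 2.3.5.1 has actually been checked.

```shell
# Hypothetical helper: succeeds (exit 0) if version $1 is >= version $2.
# Relies on GNU coreutils 'sort -V' (version-aware sorting).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: reads 'bowtie2 --version' output on stdin and
# succeeds if the reported version is older than 2.3.5.1.
# Assumes the version is the last word of the first line, e.g.
#   /opt/apps/intel18/bowtie/2.3.4/bin/bowtie2-align-s version 2.3.4
bowtie2_needs_upgrade() {
  local version
  version=$(awk 'NR == 1 {print $NF; exit}')
  ! version_ge "$version" "2.3.5.1"
}
```

Usage: 'bowtie2 --version | bowtie2_needs_upgrade && echo "upgrade needed"'.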

breseq checks
which -a breseq 
breseq --version

If the which command returns 1 line listing '/corral-repl/utexas/BioITeam/breseq/bin/breseq' and the second command returns 'breseq 0.35.1', you do not have to clone your own copy of breseq. Doing so is still encouraged, particularly if you envision using breseq in your own work outside of this course.
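Since only the first line of 'which -a' is the copy that actually runs, a small check can tell you whether that is the shared course copy. A sketch only: 'describe_breseq_source' is a hypothetical helper, and the shared path is the one quoted in the paragraph above.

```shell
# Path of the shared course copy, as quoted in this tutorial.
shared_breseq=/corral-repl/utexas/BioITeam/breseq/bin/breseq

# Hypothetical helper: reads 'which -a breseq' output on stdin and reports
# which copy will actually run (the first line wins).
describe_breseq_source() {
  local first
  first=$(awk 'NR == 1 {print; exit}')
  if [ -z "$first" ]; then
    echo "breseq not found on PATH"
  elif [ "$first" = "$shared_breseq" ]; then
    echo "using shared copy: $first"
  else
    echo "using other copy: $first"
  fi
}
```

Usage: 'which -a breseq | describe_breseq_source', followed by 'breseq --version' to confirm the version.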

Warning against an idev node

Unlike most warnings you get about idev nodes during this class, this one warns against being on an idev node: idev sessions typically download files more slowly, so run the download steps below from a login node.


Upgrading bowtie2

Download version 2.3.5.1 of bowtie2 to the src folder in your $WORK directory
mkdir -p $WORK/src  # -p: no error if src already exists
cd $WORK/src
wget https://github.com/BenLangmead/bowtie2/releases/download/v2.3.5.1/bowtie2-2.3.5.1-linux-x86_64.zip
unzip bowtie2-2.3.5.1-linux-x86_64.zip
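Before editing your PATH, it can help to confirm that the unzip step actually produced the 'bowtie2' executable. A minimal sketch; 'check_bowtie2_dir' is a hypothetical helper name.

```shell
# Hypothetical helper: confirms that an unzipped bowtie2 release directory
# contains an executable 'bowtie2' (the program this tutorial puts on PATH).
check_bowtie2_dir() {
  local dir=$1
  if [ -x "$dir/bowtie2" ] && [ ! -d "$dir/bowtie2" ]; then
    echo "ok: $dir/bowtie2"
  else
    echo "missing or not executable: $dir/bowtie2" >&2
    return 1
  fi
}
```

Usage: 'check_bowtie2_dir $WORK/src/bowtie2-2.3.5.1-linux-x86_64'. If you rerun 'unzip' when the folder already exists, it will prompt before overwriting files; 'unzip -o' overwrites without asking and 'unzip -n' never overwrites.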

Next, make sure that this version of bowtie2 is in your PATH variable. Do one of the following:

  1. Make 2 changes to your .bashrc file using nano

    Modify the .bashrc file in your $HOME directory
    # comment out the 'module load bowtie/2.3.4' line in the module section
    # add the following line in the section dealing with the path variable
    export PATH=$PATH:$WORK/src/bowtie2-2.3.5.1-linux-x86_64  #temp bowtie2 executables
  2. Copy the updated version of the .bashrc file from $BI/scripts

    Preferred solution
    cp /corral-repl/utexas/BioITeam/scripts/GVA2020.bashrc.updated_bowtie2 $HOME/.bashrc  # replaces your current .bashrc
    chmod 700 $HOME/.bashrc
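If you'd like to see option 1 spelled out as commands instead of nano edits, here is a sketch. 'update_bashrc' is a hypothetical helper; it uses GNU sed's in-place flag ('-i'), and it appends the PATH line to the end of the file rather than into the PATH section, which is an assumption that your .bashrc doesn't reset PATH later on. Back up your .bashrc before trying it.

```shell
# Hypothetical helper automating option 1 on the file given as $1.
# Comments out 'module load bowtie/2.3.4' and appends the PATH line once.
update_bashrc() {
  local rc=$1
  # Literal text: $PATH and $WORK must stay unexpanded inside .bashrc.
  local path_line='export PATH=$PATH:$WORK/src/bowtie2-2.3.5.1-linux-x86_64  #temp bowtie2 executables'
  # Prefix '# ' to any uncommented 'module load bowtie/2.3.4' line (GNU sed -i).
  sed -i 's|^\([[:space:]]*module load bowtie/2\.3\.4\)|# \1|' "$rc"
  # Append the PATH line only if an identical line is not already present.
  grep -qxF "$path_line" "$rc" || printf '%s\n' "$path_line" >> "$rc"
}
```

Usage: 'cp $HOME/.bashrc $HOME/.bashrc.bak && update_bashrc $HOME/.bashrc'. Running it twice is harmless: the commented-out line no longer matches, and the PATH line is only added once.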

Finally, log out of TACC and log back in using ssh so that your new .bashrc is read.

bowtie2 checks
which -a bowtie2
bowtie2 --version

The which command should now return 1 line similar to '/work/01821/ded/lonestar/src/bowtie2-2.3.5.1-linux-x86_64/bowtie2', and the first line of the second command's output should end with 'version 2.3.5.1'. If not, get my attention on Zoom.

Testing what happens if you load the bowtie/2.3.4 module

As mentioned in an earlier tutorial, when you load a module, TACC assumes you are about to use it and therefore adds the directories associated with the module to the front of your $PATH. Consider the output of the following commands to figure out how you could rerun the mapping tutorial with bowtie2 version 2.3.4.

bowtie2 checks
which -a bowtie2
module load bowtie/2.3.4
which -a bowtie2
module unload bowtie/2.3.4
which -a bowtie2

Remember: when 'which -a' returns multiple lines, the executable on the top line is the one that runs.
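To see why the top line wins, here is a sketch that mimics what 'which -a' reports by walking a PATH string from left to right. 'which_all' is a hypothetical helper, not the real 'which', and it ignores the rarely used empty-entry-means-current-directory rule. It shows the difference between prepending a directory (what loading a module does) and appending one (what the .bashrc line in option 1 does).

```shell
# Hypothetical helper: list every executable named $1 found in the
# colon-separated search path $2 (default: $PATH), in search order.
# The first line printed is the copy the shell would run.
which_all() {
  local name=$1 search_path=${2:-$PATH} dir
  local IFS=:
  for dir in $search_path; do
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}
```

For example, on TACC 'which_all bowtie2 "/opt/apps/intel18/bowtie/2.3.4/bin:$PATH"' imitates a prepended module directory, and its first line would be the 2.3.4 copy.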


Cloning breseq

As mentioned above, this is not required to complete the other breseq-based tutorials in the course; however, it is highly recommended for anyone who anticipates using breseq in their own work. Cloning a GitHub repository is very similar to using the wget command to download bowtie2 above: you type 'git clone' followed by the web address where the repository is stored. As we did when installing bowtie2 with wget, we'll clone the repository into a 'src' directory inside of $WORK.

Use the mkdir command to create a folder named 'src' inside of your $WORK directory
mkdir $WORK/src  #If you already have a src directory, you'll get a harmless error message stating that the folder already exists and thus cannot be created
cd $WORK/src

In a web browser, navigate to GitHub and search for 'breseq' using the search box in the top right corner of the page. The top result will be barricklab/breseq; click the green 'Clone or download' button and either press Control/Command + C on the address listed, or click the clipboard icon to copy the repository address. This image may be helpful if you are having trouble locating the green button.

Once you have copied the address and are in the $WORK/src directory, clone the repository with 'git clone'
git clone https://github.com/barricklab/breseq.git

You will see several download indicators increase to 100%, and when you get your command prompt back, the ls command will show a new folder named 'breseq' containing a set of files. Alternatively, you may get an error message saying "fatal: destination path 'breseq' already exists and is not an empty directory." This means you have previously cloned the repository. Don't worry: it just means that when you use the 'git pull' command in the compiling breseq section, you will get a message saying a bunch of changes are being made rather than the 'Already up to date.' message that a fresh clone will show.

If you don't see this directory, or can't cd into it, let the instructor know.
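The 'already exists' case can also be handled automatically. A minimal sketch, using a hypothetical helper name 'clone_or_update': it clones when the destination is not yet a git checkout and runs 'git pull' inside it otherwise.

```shell
# Hypothetical helper: clone $1 into $2, or update $2 if it is already a clone.
clone_or_update() {
  local url=$1 dest=$2
  if [ -d "$dest/.git" ]; then
    git -C "$dest" pull          # existing clone: fetch and merge new commits
  else
    git clone "$url" "$dest"     # first time: create the local copy
  fi
}
```

Usage: 'cd $WORK/src && clone_or_update https://github.com/barricklab/breseq.git breseq'.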

You may be thinking that this seems remarkably similar to the wget download you did for bowtie2, and wondering why you don't just do that.

The answer is that if you want to upgrade to the latest version of breseq in the future, rather than navigating to the GitHub page, checking whether the version is different, and copying the download link, you can use the 'git pull' command to check whether a new version is available and download it automatically. Further, git lets you quickly roll back to an older version if you need to (say you want to add another small set of samples to an analysis you did a year ago without rerunning all the old samples).
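Rolling back with git usually means checking out a tagged release. A sketch under assumptions: 'use_version' and 'return_to_branch' are hypothetical helpers, and it assumes breseq releases are tagged in the repository (run 'git -C $WORK/src/breseq tag' to see the real tag names; a name like v0.35.1 is only a guess). After switching versions you would need to rebuild with the commands in the Compiling breseq section.

```shell
# Hypothetical helper: check out tag $2 in repository $1 (detached HEAD).
use_version() {
  local repo=$1 tag=$2
  # advice.detachedHead=false silences git's long detached-HEAD message
  git -C "$repo" -c advice.detachedHead=false checkout -q "$tag"
}

# Hypothetical helper: go back to the previously checked-out branch.
return_to_branch() {
  local repo=$1
  git -C "$repo" checkout -q -     # '-' is shorthand for @{-1}, the previous checkout
}
```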


Compiling breseq

Just as downloading bowtie2 was not enough to make it usable, the same is true of breseq. Unlike bowtie2, however, we can't just edit our PATH or move executable files around; we have to compile the code.

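Commands to compile the most recent version of breseq
mkdir -p $HOME/local/bin
module unload bowtie/2.3.4  # make sure the module's bowtie2 is not first in your PATH
cd $WORK/src/breseq
git pull  # get the latest changes from GitHub
make clean  # if no Makefile exists yet (e.g. a fresh clone), the error printed here is harmless
./bootstrap.sh  # generate the configure script
./configure --prefix=$HOME/local  # install under $HOME/local instead of a system directory
make  # compile
make install  # copy the breseq executables into $HOME/local/bin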

The above will take several minutes to complete, but it should print something new to the screen at least every 30 seconds.
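Because of '--prefix=$HOME/local', 'make install' places the breseq executable in $HOME/local/bin (autoconf's default bindir is $prefix/bin). For your build to be the one that runs, that directory has to be on your PATH ahead of the shared copy; the course .bashrc may already take care of this. A sketch of the check, with 'on_path' as a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated list $2 (default: $PATH).
on_path() {
  local dir=$1 search_path=${2:-$PATH}
  case ":$search_path:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: 'on_path $HOME/local/bin || echo "add $HOME/local/bin to PATH in your .bashrc"', then 'which -a breseq' should list $HOME/local/bin/breseq on the top line.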

Testing breseq

It is always a good idea to test the compilation after you use the make install command.
make test

This command is expected to take ~13 minutes in total, and no single step should take more than 30-60 seconds. Unfortunately, with all the text scrolling on the screen, it is difficult to notice that a number of different tests are actually being conducted, each of which can pass or fail to tell you exactly what is going wrong. Just before the final report of how long the command took, you will see a block of text that reads:

OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Passed check
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

It is a good idea to use your terminal's find option (like pressing Command/Control + F in a web browser to bring up a find box) and hit the back arrow several times looking for the word ' check', which IS preceded by a space. I expect this will show that all tests passed except 1, which fails in an area under active development relating to applying mutations to 'gbk' files. This should not cause any issues for your work in the course or outside of it.
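Instead of scrolling back, you can save the output to a file and search it. A sketch, assuming you run 'make test 2>&1 | tee make_test.log' (the log file name is arbitrary) and that passing tests print 'Passed check' as shown above. Because this tutorial doesn't show the exact wording of a failure, the hypothetical helper 'summarize_checks' simply lists every other line containing ' check' for you to inspect.

```shell
# Hypothetical helper: count 'Passed check' lines in a saved make test log
# and print every other line containing ' check' (with line numbers).
summarize_checks() {
  local log=$1 passed total
  passed=$(grep -c 'Passed check' "$log" || true)   # grep -c prints 0 when nothing matches
  total=$(grep -c ' check' "$log" || true)
  echo "passed: $passed"
  echo "other ' check' lines: $((total - passed))"
  grep -n ' check' "$log" | grep -v 'Passed check' || true
}
```

Usage: 'summarize_checks make_test.log'.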

Your installation has likely failed and you need to get my attention if:

You see that multiple tests have failed, or the make test command takes less than 5 minutes.
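To avoid having to watch the clock, you can time the test run. A minimal sketch: 'run_and_time' is a hypothetical helper that uses bash's built-in SECONDS counter, and the 300-second threshold is the "less than 5 minutes" rule above.

```shell
# Hypothetical helper: run a command, report elapsed wall-clock seconds via
# bash's SECONDS counter, and warn if it finished in under 5 minutes.
run_and_time() {
  local start=$SECONDS status=0
  "$@" || status=$?
  local elapsed=$((SECONDS - start))
  echo "finished in ${elapsed}s with exit status $status"
  if [ "$elapsed" -lt 300 ]; then
    echo "warning: under 5 minutes; the installation may have failed" >&2
  fi
  return "$status"
}
```

Usage: 'run_and_time make test'. The command's own exit status is passed through, so a failing 'make test' still returns non-zero.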


Next steps

Now that you have your own copy of breseq, you can:

  1. Go back to the intro breseq tutorial and run breseq on the same data set you used in the mapping and SNV discovery tutorials, so you can compare the results directly to what you saw in the IGV tutorial.
  2. You may also move onto the advanced breseq tutorial.
  3. While it doesn't seem directly applicable to anyone's work this year, there is also a tutorial that uses breseq and deals with molecular indexes and lower error rates.
  4. You could combine all the parts of the required tutorials: go back to the read processing tutorial, trim both read1 and read2, run the trimmed samples through breseq, and compare the results to running those same files through the bowtie2 and SNV tutorials separately.
  5. If you aren't sure what you should be working on, as always, just ask and I'll give some recommendations. 
