...

Most approaches for predicting structural variants require you to have paired-end or mate-pair reads. They use the distribution of distances separating these reads to find outliers and also look at pairs with incorrect orientations. As mentioned during several of the presentations, many researchers choose to ignore these types of mutations. Combined with the increased difficulty of accurately identifying them, this means the community is less settled on the "best" way to analyze them. Here we present a tutorial on a somewhat older program, SVDetect. SVDetect is a type of program that makes use of configuration files rather than command line options (something you may encounter with other programs in your own work).
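To make the idea of distance outliers concrete, here is a minimal sketch (not what SVDetect itself does) that reads the template length stored in column 9 of a SAM file and lists pairs that lie far from the mean. The function name insert_size_outliers and the default cutoff of 3 standard deviations are assumptions chosen for illustration, and the sketch ignores read orientation entirely.

```bash
# Hypothetical helper: list read pairs whose template length (SAM column 9, TLEN)
# is more than N standard deviations from the mean. N defaults to 3 (an assumption).
insert_size_outliers() {
    local sam="$1" nsd="${2:-3}"
    awk -F'\t' -v nsd="$nsd" '
        /^@/ { next }                      # skip SAM header lines
        $9 > 0 {                           # the leftmost mate carries the positive TLEN,
            n++                            # so each pair is counted once
            len[n] = $9; name[n] = $1
            sum += $9; sumsq += $9 * $9
        }
        END {
            if (n == 0) exit
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0           # guard against rounding error
            sd = sqrt(var)
            printf "# pairs=%d mean=%.1f sd=%.1f\n", n, mean, sd
            for (i = 1; i <= n; i++)
                if (len[i] > mean + nsd * sd || len[i] < mean - nsd * sd)
                    print name[i] "\t" len[i]
        }' "$sam"
}
```

Once the mapping step later in this tutorial has produced a SAM file, it could be called as insert_size_outliers 61FTVAAXX.sam. A mean and standard deviation are themselves pulled around by extreme values, so treat the output as a rough first look rather than a variant call.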

Other possible tools:

  • BreakDancer - hard to install prerequisites on TACC. Requires installing libgd and the notoriously difficult GD Perl module.
  • PEMer - hard to install prerequisites on TACC. Requires "ROOT" package.

...

Comparison of many different SV tools

Learning objectives:

  1. Identify structural variants in a new data set.

  2. Work with a new type of program that uses configuration files rather than entering all information on a single command at the command line. This is very similar to how the queue system at TACC works, which makes this a good introduction.
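Configuration files of this kind are ordinary text files, so the usual shell tools can show you which options are actually set. The following is a minimal sketch, assuming that comment lines start with # (as they do in the Config::General format that one of SVDetect's required Perl modules reads). The function name active_settings is made up for this example.

```bash
# Hypothetical helper: print only the lines of a configuration file that are
# neither blank nor comments (assumes '#' starts a comment line).
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Only runs if a file called svdetect.conf is present in the current directory.
if [ -f svdetect.conf ]; then
    active_settings svdetect.conf
fi
```

Reading the stripped-down file next to the program's manual is usually the fastest way to see what a run will do before you launch it.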

...

Code Block
languagebash
titlesuggested directory set up
cds
cp -r $BI/gva_course/structural_variation/data GVA_sv_tutorial
cd GVA_sv_tutorial
Note
titlePossible Input/Output errors experienced with the above command

At least 1 student was experiencing an issue with the above command where an "Input/Output" error message was generated, and the files were copied, but the files were


This is Illumina mate-paired data (having a larger insert size than paired-end data) from genome re-sequencing of an E. coli clone.

...

First we need to (surprise!) map the data. This will hopefully reinforce the bowtie2 tutorial you just completed.

Warning
titleDo not run on head node

Use the hostname command to verify you are still on the idev node.

If not, and you need help getting a new idev node, see this tutorial.
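If you would rather have the shell make this check for you, here is a minimal sketch. It assumes that TACC login (head) nodes have hostnames beginning with "login" and that compute nodes do not. The function name on_login_node is hypothetical.

```bash
# Hypothetical helper: succeed if the given (or current) hostname looks like a login node.
# Assumption: login nodes are named login1, login2, ... and compute nodes are not.
on_login_node() {
    local host="${1:-$(hostname 2>/dev/null || uname -n)}"
    case "$host" in
        login*) return 0 ;;
        *)      return 1 ;;
    esac
}

if on_login_node; then
    echo "You appear to be on a login (head) node: start an idev session first." >&2
else
    echo "Not a login node: OK to run the mapping commands."
fi
```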

Code Block
bowtie2-build NC_012967.1.fasta NC_012967.1
bowtie2 -t -p 64 -X 5000 --rf -x NC_012967.1 -1 61FTVAAXX_2_1.fastq -2 61FTVAAXX_2_2.fastq -S 61FTVAAXX.sam

...

You may notice that these commands complete pretty quickly. Always remember that speed is not necessarily representative of how taxing something is for TACC's head node, and always try to be a good TACC citizen by doing as much as you can on idev nodes or as job submissions.

...

titlePeople have previously asked what makes bowtie2 better/different than 'bowtie'

...

Install SVDetect

This will be the most complicated installation yet. In addition to needing to install several different programs in the same conda installation command, we will need to install perl modules through cpan. Unfortunately, the cpan network cannot be accessed from the compute nodes, so you must log out of your idev session using the logout command before continuing. If you are unsure whether you are in an idev session, remember you can use the hostname command to check.

conda installation

Like we saw in our samtools installation, we will need to install several programs at the same time to make sure they are all going to work with each other. In addition, we are going to create a new environment for working with SVDetect as some of the dependencies of SVDetect clash with those of samtools.

Code Block
languagebash
conda create --name SVDetect -c bioconda -c conda-forge -c imperial-college-research-computing _libgcc_mutex perl libgcc-ng svdetect



Code Block
languagebash
titleWe can now activate our new environment
conda activate SVDetect
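Since later steps depend on this environment being active, a quick check can save some confusion after logging out and back in. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment. The function name require_conda_env is made up for this example.

```bash
# Hypothetical helper: confirm that the named conda environment is the active one.
# CONDA_DEFAULT_ENV is set by "conda activate" to the active environment's name.
require_conda_env() {
    local wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        echo "conda environment '$wanted' is active"
    else
        echo "expected conda environment '$wanted', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

require_conda_env SVDetect || echo "Run: conda activate SVDetect" >&2
```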

cpan module installations

If you attempt to launch cpan, you will likely get a message similar to the following:

No Format
/home1/0004/train402/miniconda3/envs/svdetect/bin/perl: symbol lookup error: /home1/apps/bioperl/1.007002/lib/perl5/x86_64-linux-thread-multi/auto/version/vxs/vxs.so: undefined symbol: Perl_xs_apiversion_bootcheck
Code Block
languagebash
titleUsing the "which -a" command shows us we actually have access to multiple different cpan executables. Friday's class will discuss more about how you can end up with multiple executable files named the same thing stored in different directories, and how the command line will treat them
which -a cpan
No Format
~/miniconda3/envs/SVDetect/bin/cpan
/bin/cpan
/usr/bin/cpan


When we just typed cpan, the first location was used, and we saw it didn't work as we had hoped. We can explicitly launch the 2nd location by typing the full path to the executable file at the prompt.

Code Block
languagebash
/bin/cpan
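If you are curious why the first location wins, the sketch below walks the directories in $PATH from left to right and prints every executable with the requested name, which is roughly what which -a reports. The shell runs the first match it finds. The function name list_on_path is hypothetical.

```bash
# Hypothetical helper: list every executable called NAME in $PATH, in search order.
list_on_path() {
    local name="$1" dir
    local IFS=':'                          # split $PATH on colons
    for dir in $PATH; do
        dir="${dir:-.}"                    # an empty entry means the current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    return 0
}

list_on_path cpan
```

Typing a full path such as /bin/cpan skips this search, which is why it lets us pick a specific copy.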

In the following block, note that each ellipsis stands in for large blocks of scrolling text as different modules are downloaded and installed. The process will take several minutes in total, so just be ready to execute the next command when you get the cpan prompt back.

Code Block
titleInstall Perl modules required for SVDetect
# choose 'yes' to do as much automatically as possible 
# choose 'local::lib' for the approach you want (as you don't have admin rights on TACC)
...
cpan[1]> install Config::General
...
cpan[2]> install Tie::IxHash
...
cpan[3]> install Parallel::ForkManager
...
cpan[4]> quit


Once you quit cpan, you will get a message to restart your shell. Since you are on a remote computer, you can accomplish the same thing by logging out of TACC and sshing back in.


Tip

If the above bold letters are not enough of a clue for what you need to do here (and/or where you need to go to find appropriate minitutorials), now is a good time to start thinking about what question you need to be asking or sending in an email. It is ok to be overwhelmed or lost, especially with the class being virtual and not being able to get good feedback from me directly on your progress. I am happy to help, but can only do so if I know you are struggling.


Once you have logged back in, be sure to restart a new idev session, and activate your SVDetect conda environment.
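Before moving on, it can be worth confirming that the three Perl modules can actually be loaded. This is a minimal sketch that asks perl to load each module and reports the result. The function name check_perl_modules is made up, and the check uses whichever perl comes first in your $PATH, which is not guaranteed to be the same perl that SVDetect ends up using.

```bash
# Hypothetical helper: report whether each named Perl module can be loaded
# by the first perl found in $PATH.
check_perl_modules() {
    local mod status=0
    for mod in "$@"; do
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "OK       $mod"
        else
            echo "MISSING  $mod"
            status=1
        fi
    done
    return $status
}

check_perl_modules Config::General Tie::IxHash Parallel::ForkManager \
    || echo "At least one module could not be loaded: revisit the cpan step." >&2
```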

Analyze read mapping distribution

The first step is to look at all mapped read pairs and whittle down the list to only those that have unusual insert sizes (distances between the two reads in a pair).

Code Block
cd $SCRATCH/GVA_sv_tutorial
BAM_preprocessingPairs.pl -p 0 61FTVAAXX.sam

...

The following commands will take a few minutes each and must be completed in order, so there is no advantage to (or possibility of) running them in the background at the same time. Consult the manual for a full description of what these commands and options are doing while the commands are running.
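Because each step depends on the one before it, one option is to let the shell enforce the order and stop at the first failure. The sketch below is one way to do that. The function name run_in_order is hypothetical, and each quoted argument would be one of the SVDetect commands from this section.

```bash
# Hypothetical helper: run each command string in turn and stop at the first one that fails.
run_in_order() {
    local step
    for step in "$@"; do
        echo "[$(date +%T)] starting: $step" >&2
        if ! sh -c "$step"; then
            echo "step failed, stopping: $step" >&2
            return 1
        fi
    done
    echo "all steps finished" >&2
}

# Example call (commented out so this sketch does nothing on its own):
# run_in_order "first command" "second command" "third command"
```

If you are following along for the first time, running the commands one at a time as shown in this section is still the easier way to see exactly which step produces an error.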

Code Block
titleCommands to run SVDetect
SVDetect linking -conf svdetect.conf
SVDetect filtering -conf svdetect.conf
SVDetect links2SV -conf svdetect.conf
Warning
titlePossible errors on idev nodes

In reviewing these tutorials, these commands were not executing for me in idev sessions for unknown reasons. By chance I had an idev session time out without me noticing, and I noticed the commands did run on the head node. Try the above commands 1 at a time, but if you see error messages like the following, log out of your idev session with the logout command, and then execute them 1 at a time on the head node. While this is not the best citizenship, the program gave no indications of being a problem.

Feedback from other students says this is not a problem limited to me and that multiple people are experiencing the same problem. Running these commands on the head node should be acceptable, for reasons that will be discussed in zoom.

No Format
Config/General.pm did not return a true value at /corral-repl/utexas/BioITeam/bin/SVDetect line 48. BEGIN failed--compilation aborted at /corral-repl/utexas/BioITeam/bin/SVDetect line 48.

Take a look at the resulting final output file: 61FTVAAXX.ab.sam.links.filtered.sv.txt. Another downside of command line applications is that while you can print files to the screen, the formatting is not always the nicest. On the plus side, in 95% of cases you can directly copy the output from the terminal window to Excel and make better sense of what the columns actually are.
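If you would rather stay in the terminal, a numbered list of the column names makes the file easier to navigate. This is a minimal sketch that assumes the file is tab-delimited and that its first line is a header. The function name number_columns is made up for this example.

```bash
# Hypothetical helper: print the fields of the first line of a tab-delimited file,
# one per line, each prefixed with its column number.
number_columns() {
    head -n 1 "$1" | tr '\t' '\n' | nl
}

# Only runs if the output file from this tutorial is present in the current directory.
f=61FTVAAXX.ab.sam.links.filtered.sv.txt
if [ -f "$f" ]; then
    number_columns "$f"
fi
```

With the column numbers in hand, something like cut -f 1-5 on the same file shows just the first five columns.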

...

title
Expand
titleAny idea what sorts of mutations produced these three structural variants?

1. This is a tandem head-to-tail duplication of the region from approximately 600000 to 663000.
2. This is just the origin of the circular chromosome, connecting its end to the beginning!
3. This is a big chromosomal inversion mediated by recombination between repeated IS elements in the genome. It would not have been detected if the insert size of the library wasn't > ~1,500 bp!

... Many of the others are due to new insertions of transposable elements.

Expand

click here for installation instructions

Optional: Install SVDetect

We have installed SVDetect for you already, as installation is a bit difficult (though still much easier than the alternatives listed in the introduction). You can verify its location using which SVDetect; it should be found in your $PATH under $BI/bin. One of the advantages (or disadvantages) of using the communal resource is that someone else can update all the necessary programs and packages for you. Alternatively, you can make a personal copy of the program yourself using the following commands. NOTE that this is presented mostly to underscore how spoiled we are with modules and the BioITeam.

Install SVDetect scripts

Navigate to the SVDetect project page

More information:

Download the code onto TACC.

Code Block
wget http://downloads.sourceforge.net/project/svdetect/SVDetect/0.80/SVDetect_r0.8b.tar.gz
tar -xvzf SVDetect_r*.tar.gz
cd SVDetect_r*

Move the Perl scripts and make them executable

Code Block
cp bin/SVDetect $HOME/local/bin
chmod 775 scripts/BAM_preprocessingPairs.pl
cp scripts/BAM_preprocessingPairs.pl $HOME/local/bin

Install required Perl modules

SVDetect requires a few Perl modules to be installed. In the default TACC environment, you can use the cpan shell to install most well-behaved Perl modules (with the exception of some complicated ones that require other libraries to be installed or things to compile). Here's how:

Code Block
titleInstall Perl modules required for SVDetect
# This cannot be done from an idev session
login1$ cpan
# choose yes to do as much automatically as possible and 'local::lib' for how you want to install modules as you don't have admin rights on TACC
...
cpan[1]> install Config::General
...
cpan[2]> install Tie::IxHash
...
cpan[3]> install Parallel::ForkManager
...
cpan[4]> quit
login1$


Return to GVA2021 course page.