Most approaches for predicting structural variants require paired-end or mate-pair reads. They use the distribution of distances separating the two reads of each pair to find outliers, and they also look for pairs with incorrect orientations. As mentioned during several of the presentations, many researchers choose to ignore these types of mutations. Combined with the increased difficulty of accurately identifying them, this means the community is less settled on the "best" way to analyze them. Here we present a tutorial on SVDetect, chosen for the quality of its instructions and its ease of installation, despite its use of relatively hefty configuration files.


Example: E. coli genome with structural variation

Here's an E. coli genome re-sequencing sample where a key mutation producing a new structural variant was responsible for a new phenotype involving citrate, something the Barrick lab has studied.

Code Block
Suggested directory setup
cp -r $BI/gva_course/structural_variation/data GVA_sv_tutorial
cd GVA_sv_tutorial

This is Illumina mate-paired data (having a larger insert size than paired-end data) from genome re-sequencing of an E. coli clone.

File Name | Description | Sample
61FTVAAXX_2_1.fastq | Paired-end Illumina, first of mate-pair, FASTQ format | Re-sequenced E. coli genome
61FTVAAXX_2_2.fastq | Paired-end Illumina, second of mate-pair, FASTQ format | Re-sequenced E. coli genome
NC_012967.1.fasta | Reference genome in FASTA format | E. coli B strain REL606
NC_012967.1.lengths | Simple tab-delimited file based on the size of the reference, needed for SVDetect and provided so you don't have to create it yourself |
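Before mapping, it can be worth confirming that the two FASTQ files hold the same number of reads. The sketch below assumes plain, uncompressed FASTQ with exactly four lines per record; fastq_read_count is a hypothetical helper name, not part of any tool used in this tutorial.

```shell
# Count reads in an uncompressed FASTQ file, assuming each record
# occupies exactly four lines (header, sequence, '+', qualities).
fastq_read_count() {
    awk 'END { print int(NR / 4) }' "$1"
}
```

Running fastq_read_count 61FTVAAXX_2_1.fastq and fastq_read_count 61FTVAAXX_2_2.fastq should print the same number; if the counts differ, the files are not properly paired.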

Map data using bowtie2

First we need to (surprise!) map the data. This will hopefully reinforce the bowtie2 tutorial you just completed, but if you are feeling adventurous you could use BWA as optional reinforcement.

Do not run on head node

Make sure you are on an idev node using the command: showq -u

Code Block
If you need a new idev node
idev -m 180 -r CCBB_Day_2 -A UT-2015-05-18
Code Block
bowtie2-build NC_012967.1.fasta NC_012967.1
bowtie2 -p 48 -X 5000 --rf -x NC_012967.1 -1 61FTVAAXX_2_1.fastq -2 61FTVAAXX_2_2.fastq -S 61FTVAAXX.sam


  • --rf tells bowtie2 that your read pairs are in the "reverse-forward" orientation of a mate-pair library
  • -X 5000 tells bowtie2 to not mark read pairs as discordant unless their insert size is greater than 5000 bases.
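To make the orientation idea concrete, here is a small sketch that classifies a pair from a single SAM record. It assumes the standard SAM FLAG bits (16 = read on the reverse strand, 32 = mate on the reverse strand) and uses POS and PNEXT to decide which read is leftmost. The function name pair_orientation and the two-letter labels are choices made for this illustration, not something bowtie2 or SVDetect prints.

```shell
# Classify a read pair's orientation from one SAM record's FLAG (column 2),
# POS (column 4) and PNEXT (column 8).
# Bit 16 = read on reverse strand, bit 32 = mate on reverse strand.
pair_orientation() {
    awk -v flag="$1" -v pos="$2" -v pnext="$3" 'BEGIN {
        read_rev = int(flag / 16) % 2
        mate_rev = int(flag / 32) % 2
        # Both reads on the same strand: forward-forward or reverse-reverse.
        if (read_rev == mate_rev) { print (read_rev ? "RR" : "FF"); exit }
        # Otherwise report the strand of whichever read maps leftmost.
        left_rev = (pos <= pnext) ? read_rev : mate_rev
        print (left_rev ? "RF" : "FR")
    }'
}
```

For example, pair_orientation 81 100 300 prints RF (leftmost read on the reverse strand), the outward-facing layout that --rf tells bowtie2 to expect, while pair_orientation 99 100 300 prints FR, the usual paired-end layout.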


You may notice that these commands complete pretty quickly. Remember that speed is not necessarily representative of how taxing a task is for TACC's head node. Always try to be a good TACC citizen and do as much as you can on idev nodes or as job submissions.


If you use version 1 of bowtie to do your mapping (or several other mapping programs), you won't predict any SVs. Why?

bowtie doesn't output discordantly mapped pairs!
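One way to see whether a mapper reported discordant pairs at all is to count them in its SAM output. This is a minimal sketch that relies only on the standard SAM FLAG bits (2 = properly paired, 4 = read unmapped, 8 = mate unmapped, 64 = first in pair); count_discordant_pairs is a hypothetical helper name.

```shell
# Count aligned pairs in a SAM file whose first mate (bit 64) lacks the
# "properly paired" flag (bit 2). Header lines start with '@' and are skipped;
# pairs with an unmapped read (bit 4) or unmapped mate (bit 8) are ignored.
count_discordant_pairs() {
    awk -F '\t' '
        /^@/ { next }
        {
            f = $2
            first    = int(f / 64) % 2
            proper   = int(f / 2) % 2
            unmapped = int(f / 4) % 2 + int(f / 8) % 2
            if (first && !proper && !unmapped) n++
        }
        END { print n + 0 }
    ' "$1"
}
```

A result of 0 from count_discordant_pairs 61FTVAAXX.sam would mean there is nothing for a structural variant caller to work with.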

Run SVDetect

The first step is to look at all mapped read pairs and whittle the list down to only those that have unusual insert sizes (distances between the two reads in a pair).
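To illustrate what "unusual" can mean here, the sketch below computes a normal range as the mean plus or minus a chosen number of standard deviations. This is a generic outlier rule written for this example; it is an assumption that SVDetect's own calculation is similar, and insert_size_range and its fold argument are hypothetical names.

```shell
# Read one insert size per line on stdin and print "low high", where
# low/high = mean -/+ fold * (population standard deviation).
insert_size_range() {
    awk -v fold="$1" '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0      # guard against rounding error
            sd = sqrt(var)
            printf "%.3f %.3f\n", mean - fold * sd, mean + fold * sd
        }'
}
```

Insert sizes can be pulled from column 9 (TLEN) of a SAM file, for example: awk -F '\t' '!/^@/ && $9 > 0 { print $9 }' 61FTVAAXX.sam | insert_size_range 3. Pairs falling outside the printed range would be the candidates to keep.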


Answers from the standard output
  1. -- using -1142.566-5588.410 as normal range of insert size

  2. Approximately 20% based on:

    -- 994952 mapped pairs

    ---- 195705 abnormal mapped pairs

  3. Approximately 0.5% based on:

    -- Total : 1000000 pairs analysed

    -- 5048 pairs whose one or both reads are unmapped

  4. Consider the following:
    1. The first answer can tell you what type of library it is if you did not know ahead of time (remember that paired-end reads have ~500-700 bp inserts on average, not thousands of bp).
    2. This should help underscore that a significant portion of your total reads (and thus variation) may be in structural variants. Unfortunately, learning this does require generating a mate-pair library.
    3. Low levels of unmapped read pairs suggest that the reference is accurate and of high quality, and that the sample is free of contamination.
      1. Note that contamination in this case refers only to other organisms, not other samples.
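The two percentages in the answers above can be checked directly from the counts in the standard output. This is just the arithmetic, wrapped in a hypothetical helper called pct.

```shell
# Print part/total as a percentage with one decimal place.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

pct 195705 994952    # abnormal mapped pairs out of all mapped pairs
pct 5048 1000000     # pairs with one or both reads unmapped, out of all pairs
```

The first call prints 19.7 and the second prints 0.5, matching the "approximately 20%" and "approximately 0.5%" answers.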


SVDetect demonstrates a common strategy among programs with complex input: instead of taking a lot of options on the command line, it reads a simple text file that sets all of the required options. Let's look at how to create a configuration file:


You'll need to substitute the output of the pwd command for "/full/path/to/" on lines 7 and 8 below, but leave the file names "61FTVAAXX.ab.sam" and "NC_012967.1.lengths" in place.

This is often a source of problems in this tutorial.
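If you would rather not edit the paths by hand, the placeholder can be filled in with sed once the configuration file exists. This is a minimal sketch: fill_config_path is a hypothetical helper, and it assumes the directory name contains no "|" or "&" characters, which would confuse the sed expression.

```shell
# Replace the "/full/path/to" placeholder in a config file with a directory,
# writing the result to stdout so the original file is left untouched.
# $1 = config file, $2 = directory to substitute (for example "$(pwd)")
fill_config_path() {
    sed "s|/full/path/to|$2|g" "$1"
}
```

For example, fill_config_path svdetect.conf "$(pwd)" > svdetect.filled.conf writes a filled-in copy (svdetect.filled.conf is just an example name) that you can inspect before using it in place of the original.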

Code Block
Create the file svdetect.conf with this text





You also need a copy of the tab-delimited file of chromosome lengths named NC_012967.1.lengths. You cannot copy and paste the line below into a new file, for two reasons. First, the tab characters don't survive copying and pasting. Second, the nano text editor is not available on the compute nodes, and learning to use the vim editor for a single line is far more work than it is worth. Make sure the NC_012967.1.lengths file you copied has the following structure up to the comment, and that the <tab> is replaced with an actual tab character.
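Since tabs and spaces look identical on screen, a quick check of the lengths file can save some debugging. The sketch below assumes the file has three tab-separated columns with a numeric length in the third; that layout is an assumption made for this example, and check_lengths_file is a hypothetical helper name.

```shell
# Succeed only if every non-empty, non-comment line of the file has exactly
# three tab-separated fields and the third is a positive integer.
check_lengths_file() {
    awk -F '\t' '
        /^#/ || /^$/ { next }
        NF != 3 || $3 !~ /^[0-9]+$/ || $3 + 0 <= 0 { bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_lengths_file NC_012967.1.lengths && echo "tabs look fine" prints the message only when the separators really are tabs.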


Take a look at the resulting file. Another downside of command-line applications is that while you can print files to the screen, the formatting is not always the nicest. On the plus side, in 95% of cases you can copy the output directly from the terminal window into Excel and make better sense of what the columns actually are.
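If you want to stay in the terminal, listing the header fields one per line is often enough to see what each column holds. This sketch assumes the file is tab-delimited with its column names on the first line; show_columns is a hypothetical helper name.

```shell
# List the tab-separated fields of a file's first line, one per row,
# prefixed by the column number.
show_columns() {
    awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i; exit }' "$1"
}
```

Call it as show_columns results.txt, where results.txt stands in for whatever output file you are inspecting.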

I've highlighted a few lines below:



Optional: Install SVDetect

We have installed SVDetect for you already, as installation is a bit difficult (though still much easier than the alternatives listed in the introduction). You can verify its location with which SVDetect; it should be in your $PATH under $BI/bin. One of the advantages (or disadvantages) of using a communal resource is that someone else can update all the necessary programs and packages for you. Alternatively, you can make a personal copy of the program yourself using the following commands. Note that this is presented mostly to underscore how spoiled we are with modules and the BioITeam.
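The same check can be wrapped so that it gives a readable message when the program is missing. This is a small sketch using the POSIX command -v builtin; where_is is a hypothetical helper name.

```shell
# Report where a command resolves on $PATH, or say that it is missing.
where_is() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        printf '%s\n' "$tool_path"
    else
        printf '%s not found on PATH\n' "$1"
        return 1
    fi
}
```

Running where_is SVDetect should print a location under $BI/bin if the shared installation is on your $PATH.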

Install SVDetect scripts

Navigate to the SVDetect project page

More information:

Download the code onto TACC.

Code Block
tar -xvzf SVDetect_r*.tar.gz
cd SVDetect_r*

Move the Perl scripts and make them executable

Code Block
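# create the destination directory if it does not exist yet
mkdir -p $HOME/local/bin
cp bin/SVDetect $HOME/local/bin
# make the helper scripts executable, then copy the files (not the directory itself)
chmod 775 scripts/*
cp scripts/* $HOME/local/bin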
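Copying the scripts only helps if the destination directory is on your $PATH. The sketch below checks for that; on_path is a hypothetical helper name.

```shell
# Succeed if the given directory is one of the entries in $PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, on_path "$HOME/local/bin" || echo "add it with: export PATH=\$HOME/local/bin:\$PATH" prints a reminder only when the directory is missing from $PATH.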

Install required Perl modules

SVDetect requires a few Perl modules to be installed. In the default TACC environment, you can use the cpan shell to install most well-behaved Perl modules (with the exception of some complicated ones that require other libraries to be installed or things to compile). Here's how:

Code Block
Install Perl modules required for SVDetect
login1$ module load perl
login1$ cpan
# choose yes to do as much automatically as possible, and choose 'local::lib' as the install method, since you don't have admin rights on TACC
cpan[1]> install Config::General
cpan[2]> install Tie::IxHash
cpan[3]> install Parallel::ForkManager
cpan[4]> quit
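After leaving the cpan shell, you can confirm that Perl can actually load the modules. This sketch asks perl to load each module with its -M switch and reports the result; check_perl_modules is a hypothetical helper name.

```shell
# Print "ok" or "missing" for each Perl module name given as an argument.
check_perl_modules() {
    for mod in "$@"; do
        if perl -M"$mod" -e 1 2>/dev/null; then
            printf '%s\tok\n' "$mod"
        else
            printf '%s\tmissing\n' "$mod"
        fi
    done
}
```

Call it as check_perl_modules Config::General Tie::IxHash Parallel::ForkManager; any module reported as missing needs another look in the cpan shell.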

Return to GVA2019 course page.