File Name | Description | Sample
61FTVAAXX_2_1.fastq | Paired-end Illumina, first of mate-pair, FASTQ format | Re-sequenced E. coli genome
61FTVAAXX_2_2.fastq | Paired-end Illumina, second of mate-pair, FASTQ format | Re-sequenced E. coli genome
NC_012967.1.fasta | Reference genome in FASTA format | E. coli B strain REL606
NC_012967.1.lengths | Simple tab-delimited file giving the size of the reference, needed for SVDetect; provided so you don't have to create it yourself |

Map data using bowtie2

First we need to (surprise!) map the data. This will hopefully reinforce the bowtie2 tutorial you just completed, but if you are feeling adventurous you could use BWA as optional reinforcement.

...

Code Block
Create the file svdetect.conf with this text
<general>
input_format=sam
sv_type=all
mates_orientation=RF
read1_length=35
read2_length=35
mates_file=/full/path/to/61FTVAAXX.ab.sam
cmap_file=/full/path/to/NC_012967.1.lengths
num_threads=48
</general>

<detection>
split_mate_file=0
window_size=2000
step_length=1000
</detection>

<filtering>
split_link_file=0
nb_pairs_threshold=3
strand_filtering=1
</filtering>

<bed>
  <colorcode>
    255,0,0=1,4
    0,255,0=5,10
    0,0,255=11,100000
  </colorcode>
</bed>
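
The mates_file and cmap_file entries above are placeholders (/full/path/to/...) that must point at real files. The following is a minimal sketch of a sanity check, assuming the config keeps the simple key=value layout shown above; conf_value and check_conf_paths are hypothetical helper names for illustration and are not part of SVDetect.

```shell
# Hypothetical helper (not part of SVDetect): print the value assigned to a key
# in a key=value config file such as svdetect.conf.
conf_value() {
  # $1 = key name, $2 = config file
  awk -F= -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: report whether the two input files named in the config exist.
# Returns 0 if both exist, 1 otherwise.
check_conf_paths() {
  # $1 = config file
  conf_status=0
  for conf_key in mates_file cmap_file; do
    conf_path=$(conf_value "$conf_key" "$1")
    if [ -f "$conf_path" ]; then
      echo "ok: $conf_key=$conf_path"
    else
      echo "MISSING: $conf_key=$conf_path"
      conf_status=1
    fi
  done
  return $conf_status
}

# Only run the check when a config file is present in the current directory.
if [ -f svdetect.conf ]; then
  check_conf_paths svdetect.conf || echo "Fix the paths above before running SVDetect"
fi
```

Running this in the directory that holds svdetect.conf prints one line per path, which makes a forgotten /full/path/to placeholder easy to spot.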

 

You also need to make sure you have a copy of the tab-delimited file of chromosome lengths named NC_012967.1.lengths. YOU CANNOT COPY AND PASTE THE LINE BELOW! If you create the file yourself, type the line into a new file using nano, and keep two things in mind. First, tab characters do not translate correctly when copied and pasted. Second, the nano text editor is not available on the compute nodes, and learning to use the vim editor for a single line is far more work than it is worth. Make sure the NC_012967.1.lengths file has the following structure up to the comment, and that each <tab> is replaced with an actual tab character.

Code Block
File NC_012967.1.lengths
1<tab>NC_012967<tab>4629812  # Use the tab key rather than writing out <tab>!!
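
If typing the tabs by hand proves error-prone, one alternative is to let printf write them. This is a minimal sketch, assuming you are in the directory where the file should live and that overwriting any existing NC_012967.1.lengths there is acceptable; the awk check is only an illustration and is not part of SVDetect. Because the tabs are written as \t escapes rather than literal tab characters, this text is not damaged by copy and paste.

```shell
# Write the one-line lengths file; printf turns each \t into a real tab character.
# Note: this overwrites any NC_012967.1.lengths already in the current directory.
printf '1\tNC_012967\t4629812\n' > NC_012967.1.lengths

# Verify that every line splits into exactly 3 tab-separated fields.
if awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' NC_012967.1.lengths; then
  echo "NC_012967.1.lengths looks tab-delimited"
else
  echo "NC_012967.1.lengths is NOT tab-delimited"
fi
```

The trailing comment shown in the structure above is deliberately left out of the file that this writes.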

...

Optional: Install SVDetect

We have already installed SVDetect for you, as installation is a bit difficult (though still much easier than the alternatives listed in the introduction). You can verify its location with which SVDetect; it should be on your $PATH under $BI/bin. One of the advantages (or disadvantages) of using a communal resource is that someone else can update all the necessary programs and packages for you. Alternatively, you can make a personal copy of the program yourself using the following commands. NOTE: this is presented mostly to underscore how spoiled we are with modules and the BioITeam.
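
As a small illustration of the location check, the sketch below reports where each program is found, or says so when it is missing. locate_tool is a hypothetical helper name; the only assumption is that the two programs are expected to be on your $PATH.

```shell
# Hypothetical helper: report where a program is found on $PATH, or that it is missing.
locate_tool() {
  # $1 = program name
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 is not on your PATH"
  fi
}

locate_tool SVDetect
locate_tool BAM_preprocessingPairs.pl
```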

Install SVDetect scripts

Navigate to the SVDetect project page

More information:

Download the code onto TACC.

Code Block
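# Save the archive under an explicit name; by default wget would name the file "download"
wget -O SVDetect_latest.tar.gz https://sourceforge.net/projects/svdetect/files/latest/download
tar -xvzf SVDetect_latest.tar.gz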
cd SVDetect_r*

Move the Perl scripts and make them executable

Code Block
mkdir -p $HOME/local/bin   # create the destination directory if it does not exist yet
chmod 775 bin/SVDetect scripts/BAM_preprocessingPairs.pl
cp bin/SVDetect $HOME/local/bin
cp scripts/BAM_preprocessingPairs.pl $HOME/local/bin

Install required Perl modules

SVDetect requires a few Perl modules to be installed. In the default TACC environment, you can use the cpan shell to install most well-behaved Perl modules (the exceptions are complicated ones that need other libraries installed or code compiled). Here's how:

Code Block
Install Perl modules required for SVDetect
login1$ module load perl
login1$ cpan
...
cpan[4]> install Config::General
...
cpan[4]> install Tie::IxHash
...
cpan[4]> install Parallel::ForkManager
...
cpan[4]> quit
login1$
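
To confirm that the installs succeeded, you can ask perl to load each module. This is a minimal sketch, assuming perl is on your $PATH (for example after module load perl); check_perl_module is a hypothetical helper name, and the three modules are the ones listed above.

```shell
# Hypothetical helper: report whether perl can load the named module.
check_perl_module() {
  # $1 = module name, e.g. Config::General
  if perl -M"$1" -e 1 >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: MISSING"
  fi
}

for mod in Config::General Tie::IxHash Parallel::ForkManager; do
  check_perl_module "$mod"
done
```

Any module reported as MISSING can be installed again from the cpan shell as shown above.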
