Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

We have installed SVdetect for you under $BI/bin so it . It should be in your $PATH. Installation is a bit difficult,   so you can please skip this part.!

Install SVDetect scripts

Navigate to the SVDetect project page

...

Here's an E. coli genome re-sequencing sample where a key mutation producing a new structural variant was responsible for a new phenotype.

Code Block
cds

cp -r $BI/gva_course/structural_variation/data structural_variation
cd structural_variation

...

File Name

Description

Sample

61FTVAAXX_2_1.fastq

Paired-end Illumina, First of mate-pair, FASTQ format

Re-sequenced E. coli genome

61FTVAAXX_2_2.fastq

Paired-end Illumina, Second of mate-pair, FASTQ format

Re-sequenced E. coli genome

NC_012967.1.fasta

Reference Genome in FASTA format

E. coli B strain REL606

Map data using BWA

You should submit the bwa aln and bwa sampe commands as jobs to the queue, one after the other.

...

First we need to (surprise!) map the data. We'll use bowtie2 for this tutorial, but BWA would also work with the proper options.

Warning
titleDo not run on head node

Many of the commands past this point are computationally intensive. You should run them through an idev shell or by qsub. We recommend idev for the tutorial, but you could also qsub any command that takes more than a few seconds to complete.

Code Block
titleExample command to start an idev shell
idev -m 60 -q development -a CCBB
Code Block
bowtie2-build NC_012967.1.fasta 61FTVAAXX_2_1.fastq
bwa aln -f 61FTVAAXX_2_2.sai NC_012967.1.fasta 61FTVAAXX_2_2.fastq
bwa sampe -n 0 -N 100 -f 61FTVAAXX.sam
bowtie2 -p 12 -X 5000 --rf -x NC_012967.1 -1.fasta 61FTVAAXX_2_1.sai 61FTVAAXX_2_2.saifastq -2 61FTVAAXX_2_12.fastq -S 61FTVAAXX_2_2.fastq
.sam

Possibly unfamiliar options:

  • -

...

  • -rf tells bowtie2 that your read pairs are in the "reverse-forward" orientation of a mate-pair library
  • -X 5000 tells bowtie2 to not mark read pairs as discordant unless their insert size is greater than 5000 bases.

If you use bowtie (version 1) to do your mapping, you won't predict any read SVs. Why?

Expand
Answer....
Answer....

bowtie doesn't map discordant output discordantly mapped pairs!

Run SVDetect

The first step is to look at all mapped read pairs and whittle down the list only to those that have an unusual insert sizes (distances between the two reads in a pair). You should submit this command to the TACC queue.

...

Code Block
titleCreate the file svdetect.conf with this text
<general>
input_format=sam
sv_type=all
mates_orientation=RF
read1_length=35
read2_length=35
mates_file=/full/path/to/61FTVAAXX.ab.sam
cmap_file=/full/path/to/NC_012967.1.lengths
num_threads=1
</general>

<detection>
split_mate_file=0
window_size=2000
step_length=1000
</detection>

<filtering>
split_link_file=0
nb_pairs_threshold=3
strand_filtering=1
</filtering>

<bed>
  <colorcode>
    255,0,0=1,4
    0,255,0=5,10
    0,0,255=11,100000
  </colorcode>
</bed>
Note
You'll need to substitute your own paths for /full/path/to/61FTVAAXX.ab.sam and /full/path/to/NC_012967.1.lengths.

You also need to create a tab-delimited file of chromosome lengths. Use the tab key rather than writing out <tab>!!

Code Block
titleFile NC_012967.1.lengths
1<tab>NC_012967<tab>4629812

You'll want to submit the first two commands to the TACC queue or do them in an idev shell. They take a while.

Consult the manual for a full description of what these commands and options are doing.

Code Block
titleCommands to run SNVDetect
SVDetect linking -conf ./svdetect.conf
SVDetect filtering -conf ./svdetect.conf
SVDetect links2SV -conf ./svdetect.conf

Take a look at the resulting file: 61FTVAAXX.ab.sam.links.filtered.sv.txt.

...

Code Block
chr_type        SV_type BAL_type        chromosome1     start1-end1     average_dist    chromosome2     start2-end2     nb_pairs        score_strand_filtering  score_order_filtering   score_insert_size_filtering     final_score     breakpoint1_start1-end1 breakpoint2_start2-end2
...
INTRA   NORMAL_SENSE    -       chrNC_012967    599566-601002601025   -       chrNC_012967    662994663036-664998664898   430 1577    100%    -       -       1       -       -
...
INTRA   NORMAL_SENSE    -       chrNC_012967    3-2025  -       chrNC_012967    4627019-4628998 989288     100%    -       -       1       -       -
...
INTRA   REVERSE_SENSE   -       chrNC_012967    16999-1890319033     -       chrNC_012967    27759792775082-27778382777014 873274     100%    -       -       1       -       -

Any idea what sorts of mutations produced these three structural variants?

Expand
Answers...
Answers...

1. This is a tandem head-to-tail duplication of the region from approximately 600000 to 663000.
2. This is just the origin of the circular chromosome, connecting it's its end to the beginning!
3. This is a big chromosomal inversion mediated by recombination between repeated IS elements in the genome. It would not have been detected if the insert size of the library wasn't > ~1,500 bp!

... Many of the others are due to new insertions of transposable elements.

Very, Very Advanced Exercise

...