...
Most approaches for predicting structural variants require you to have paired-end or mate-pair reads. They use the distribution of distances separating these reads to find outliers, and they also look at pairs with incorrect orientations. As mentioned during several of the presentations, many researchers choose to ignore these types of mutations. Combined with the increased difficulty of accurately identifying them, this means the community is less settled on the "best" way to analyze them. Here we present a tutorial on SVDetect, a somewhat older program. We chose it for the quality of its instructions and its ease of installation, despite its use of relatively hefty configuration files. SVDetect is a type of program that makes use of configuration files rather than command line options (something you may encounter with other programs in your own work).
Other possible tools:
- BreakDancer - hard to install prerequisites on TACC. Requires installing libgd and the notoriously difficult GD Perl module.
- PEMer - hard to install prerequisites on TACC. Requires "ROOT" package.
...
Comparison of many different SV tools
Learning objectives:
- Identify structural variants in a new data set.
- Work with a new type of program that uses configuration files rather than entering all information on a single command at the command line. This is very similar to the queue system TACC uses, making this a good introduction.
...
```bash
cds
cp -r $BI/gva_course/structural_variation/data GVA_sv_tutorial
cd GVA_sv_tutorial
```
Note: Possible Input/Output errors experienced with the above command
At least one student experienced an issue with the above command where an "Input/Output" error message was generated and the files were copied, but the files were ...
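The note above describes a copy that appears to finish but leaves bad files behind. One way to catch that is to compare every copied file against its source byte by byte. The sketch below does this with find and cmp. verify_copy is a hypothetical helper, not one of the course tools.

```bash
# Hypothetical helper: check that every regular file under SRC exists at the
# same relative path under DEST and has identical contents (byte-for-byte).
verify_copy() {
  local src=${1%/} dest=${2%/} bad=0 rel f
  while IFS= read -r -d '' f; do
    rel=${f#"$src"/}                      # path relative to the source directory
    if [ ! -f "$dest/$rel" ]; then
      echo "MISSING: $rel" >&2
      bad=1
    elif ! cmp -s "$f" "$dest/$rel"; then  # cmp -s: silent, status only
      echo "DIFFERS: $rel" >&2
      bad=1
    fi
  done < <(find "$src" -type f -print0)
  return "$bad"
}
```

For example, you could run `verify_copy $BI/gva_course/structural_variation/data $SCRATCH/GVA_sv_tutorial`. A non-zero exit status together with MISSING or DIFFERS lines suggests that the copy should be repeated.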
This is Illumina mate-paired data (having a larger insert size than paired-end data) from genome re-sequencing of an E. coli clone.
...
First we need to (surprise!) map the data. This will hopefully reinforce the bowtie2 tutorial you just completed.
Warning: Use showq -u or hostname to verify you are still on the idev node. If you are not, and you need help getting a new idev node, see this tutorial.
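```bash
# build the bowtie2 index for the reference genome
bowtie2-build NC_012967.1.fasta NC_012967.1
# -X 5000: maximum fragment length for a valid paired alignment
# --rf: mates expected in reverse/forward orientation (as in mate-pair libraries)
# -t: print timing information; -p: number of threads
bowtie2 -t -p 64 -X 5000 --rf -x NC_012967.1 -1 61FTVAAXX_2_1.fastq -2 61FTVAAXX_2_2.fastq -S 61FTVAAXX.sam
```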
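After 61FTVAAXX.sam is written, you can get a rough idea of the mate-pair insert sizes before you run SVDetect. The sketch below uses awk to read column 9 (TLEN) of the SAM file. It counts each pair once by keeping only the positive TLEN, then reports the mean and standard deviation. Treat the result as a rough summary. Outlier pairs, which are exactly the ones SVDetect looks for, will inflate both numbers. insert_size_summary is a hypothetical helper.

```bash
# Hypothetical helper: rough insert-size summary from the TLEN column (9) of a SAM file.
# Each pair is counted once, via the mate that carries the positive TLEN.
insert_size_summary() {
  awk -F'\t' '
    /^@/ { next }                                 # skip header lines
    $9 > 0 { n++; s += $9; ss += $9 * $9 }        # accumulate positive TLEN values
    END {
      if (n == 0) { print "no pairs with positive TLEN"; exit 1 }
      mean = s / n
      var = ss / n - mean * mean
      if (var < 0) var = 0                        # guard against rounding below zero
      printf "pairs=%d mean=%.1f sd=%.1f\n", n, mean, sqrt(var)
    }' "$1"
}
```

For example, `insert_size_summary 61FTVAAXX.sam` prints one line with the pair count, mean, and standard deviation.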
...
You may notice that these commands complete pretty quickly. Always remember that speed is not necessarily representative of how taxing something is for TACC's head node. Always try to be a good TACC citizen and do as much as you can on idev nodes or as job submissions.
...
People have previously asked what makes bowtie2 better/different than 'bowtie'
...
Install SVDetect
This will be the most complicated installation yet. In addition to installing several different programs in the same conda installation command, we will need to install Perl modules through cpan. Unfortunately, the cpan network cannot be accessed from the compute nodes, so you must log out of your idev session using the logout command before continuing. If you are unsure whether you are in an idev session, remember that you can use the hostname command to check.
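cpan needs network access that the compute nodes lack, so it is worth checking where you are before you launch it. The sketch below puts the hostname check inside a small function. It ASSUMES that TACC login nodes have hostnames beginning with "login" (for example login1.stampede2.tacc.utexas.edu) and that idev/compute nodes do not. Compare this against what hostname prints on your system. on_login_node is a hypothetical name.

```bash
# Hypothetical helper. ASSUMPTION: login nodes are named "login*", compute nodes are not.
# Takes an optional hostname argument; defaults to the current machine.
on_login_node() {
  local host=${1:-$(hostname)}
  case $host in
    login*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Usage: decide whether it is safe to start cpan here.
# if on_login_node; then echo "login node: OK to run cpan"; else echo "compute node: run logout first"; fi
```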
conda installation
As we saw in our samtools installation, we will need to install several programs at the same time to make sure they all work with each other. In addition, we are going to create a new environment for working with SVDetect, as some of SVDetect's dependencies clash with those of samtools.
```bash
conda create --name SVDetect -c bioconda -c conda-forge -c imperial-college-research-computing _libgcc_mutex perl libgcc-ng svdetect
```
```bash
conda activate SVDetect
```
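SVDetect lives in its own environment, so a quick guard can confirm that the environment is active and that its commands resolve before you start working. This sketch relies on the CONDA_DEFAULT_ENV variable, which conda activate sets. require_conda_env is a hypothetical helper. The command names passed to it in the usage line are the ones used later in this tutorial.

```bash
# Hypothetical helper: succeed only if the named conda env is active and every
# listed command is found on PATH.
require_conda_env() {
  local want=$1 cmd
  shift
  if [ "${CONDA_DEFAULT_ENV:-}" != "$want" ]; then
    echo "active env is '${CONDA_DEFAULT_ENV:-none}', expected '$want' (run: conda activate $want)" >&2
    return 1
  fi
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "'$cmd' not found on PATH inside '$want'" >&2
      return 1
    fi
  done
}

# Usage:
# require_conda_env SVDetect SVDetect BAM_preprocessingPairs.pl
```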
cpan module installations
If you attempt to launch cpan, you will likely get a message similar to the following:
```
/home1/0004/train402/miniconda3/envs/svdetect/bin/perl: symbol lookup error: /home1/apps/bioperl/1.007002/lib/perl5/x86_64-linux-thread-multi/auto/version/vxs/vxs.so: undefined symbol: Perl_xs_apiversion_bootcheck
```
```bash
which -a cpan
```
```
~/miniconda3/envs/SVDetect/bin/cpan
/bin/cpan
/usr/bin/cpan
```
When we typed just cpan, the first location was used, and we saw that it didn't work as we had hoped. We can explicitly launch the second location by typing the full path to the executable at the prompt:
```bash
/bin/cpan
```
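Plain cpan picked the conda copy because of how the shell resolves a command name. It searches the directories in PATH from left to right and runs the first match, and conda activate places the environment's bin directory at the front of PATH. The sketch below lists every match in the same left-to-right order using only bash builtins, which is the kind of list which -a printed above. which_all is a hypothetical name used for illustration, and its output may not match which -a in every edge case.

```bash
# Hypothetical helper: list every executable named $1 on PATH, in search order.
# The first line printed is the one the shell would run.
which_all() {
  local name=$1 dir
  local -a path_dirs
  IFS=: read -r -a path_dirs <<< "$PATH"   # split PATH on ':' without globbing
  for dir in "${path_dirs[@]}"; do
    [ -n "$dir" ] || dir=.                 # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# Usage: which_all cpan
```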
In the following block, note that each ellipsis (...) stands for large blocks of scrolling text as different modules are downloaded and installed. The process will take several minutes in total; just be ready to execute the next command when you get the cpan prompt back.
```
# choose 'yes' to do as much automatically as possible
# choose 'local::lib' for the approach you want (as you don't have admin rights on TACC)
...
cpan[1]> install Config::General
...
cpan[2]> install Tie::IxHash
...
cpan[3]> install Parallel::ForkManager
...
cpan[4]> quit
```
Once you quit cpan, you will get a message to restart your shell. Since you are on a remote computer, you can accomplish the same thing by logging out of TACC and sshing back in.
Tip: If the above bold letters are not enough of a clue for what you need to do here (and/or where you need to go to find appropriate minitutorials), now is a good time to start thinking about what question you need to be asking or sending in an email. It is OK to be overwhelmed or lost, especially with the class being virtual and you not being able to get good feedback from me directly on your progress. I am happy to help, but I can only do so if I know you are struggling.
Once you have logged back in, be sure to start a new idev session and activate your SVDetect conda environment.
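Once you are logged back in, you can check that the three modules installed above are visible to Perl. perl -MModule -e 1 exits with a non-zero status if Module cannot be loaded, so looping over the module names gives a quick report. check_perl_modules is a hypothetical helper. Run it inside the activated SVDetect environment so that it tests the perl found first on PATH, which is presumably the one SVDetect runs under.

```bash
# Hypothetical helper: report whether each named Perl module can be loaded.
# Returns non-zero if any module is missing.
check_perl_modules() {
  local mod missing=0
  for mod in "$@"; do
    if perl -M"$mod" -e 1 2>/dev/null; then
      echo "ok       $mod"
    else
      echo "MISSING  $mod"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_perl_modules Config::General Tie::IxHash Parallel::ForkManager
```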
Analyze read mapping distribution
The first step is to look at all mapped read pairs and whittle the list down to only those that have unusual insert sizes (distances between the two reads in a pair).
```bash
cd $SCRATCH/GVA_sv_tutorial
BAM_preprocessingPairs.pl -p 0 61FTVAAXX.sam
```
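The sketch below applies the "unusual insert size" idea by hand to the SAM file. It prints the names of pairs whose mates map to different reference sequences, or whose absolute TLEN falls outside a window you supply. This is only an illustration and is not what BAM_preprocessingPairs.pl does internally, since that script applies its own criteria. The thresholds are values you choose, and the sketch ignores mate orientation, which SVDetect also considers. flag_unusual_pairs is a hypothetical name.

```bash
# Hypothetical helper (illustration only; NOT the logic of BAM_preprocessingPairs.pl).
# Prints read names of pairs with |TLEN| outside [MIN, MAX] or mates on another reference.
# Usage: flag_unusual_pairs SAM MIN MAX
flag_unusual_pairs() {
  local sam=$1 min=$2 max=$3
  awk -F'\t' -v min="$min" -v max="$max" '
    BEGIN { min += 0; max += 0 }             # force numeric thresholds
    /^@/ { next }                            # skip header lines
    seen[$1]++ { next }                      # only the first record seen for each pair
    $3 == "*" || $7 == "*" { next }          # skip pairs with an unmapped read or mate
    $7 != "=" { print $1; next }             # mates on different reference sequences
    {
      t = ($9 < 0) ? -$9 : $9                # absolute insert size from TLEN
      if (t == 0) next
      if (t < min || t > max) print $1
    }' "$sam"
}
```

For example, `flag_unusual_pairs 61FTVAAXX.sam 1000 5000 | wc -l` counts the flagged pairs for one hand-picked window. The seen array holds every read name, so memory use grows with the size of the SAM file.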
...
The following commands will take a few minutes each and must be completed in order, so there is no advantage to (or way of) running them in the background. While the commands are running, consult the manual for a full description of what these commands and options are doing.
```bash
SVDetect linking -conf svdetect.conf
SVDetect filtering -conf svdetect.conf
SVDetect links2SV -conf svdetect.conf
```
Warning: While reviewing these tutorials, I found that these commands were not executing in idev sessions for unknown reasons. By chance, an idev session timed out without my noticing, and I saw that the commands did run on the head node. Try the above commands one at a time. If you see error messages like the following, log out of your idev session with the logout command and then execute them one at a time on the head node. While this is not the best citizenship, the program gave no indication of being a problem. Feedback from other students says the problem is not limited to me, and multiple people are experiencing it. Running these commands on the head node should be acceptable, for reasons that will be discussed in Zoom.
```
did not return a true value at /corral-repl/utexas/BioITeam/bin/SVDetect line 48.
BEGIN failed--compilation aborted at /corral-repl/utexas/BioITeam/bin/SVDetect line 48.
```
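The three SVDetect steps must run in order. A wrapper that runs them one at a time and stops at the first failure keeps you from chasing confusing errors in later steps. In the sketch below, run_svdetect_steps and the SVDETECT override variable are hypothetical. The override exists only so the wrapper can be tested without SVDetect installed. Each step is invoked exactly as shown earlier, with -conf pointing at your configuration file.

```bash
# Hypothetical wrapper: run SVDetect linking, filtering, links2SV in order,
# stopping at the first failing step.
# SVDETECT is a hypothetical override variable; it defaults to the real SVDetect command.
run_svdetect_steps() {
  local conf=$1 step
  local bin=${SVDETECT:-SVDetect}
  [ -f "$conf" ] || { echo "config file not found: $conf" >&2; return 1; }
  for step in linking filtering links2SV; do
    echo "== SVDetect $step ($(date '+%H:%M:%S'))" >&2
    if ! "$bin" "$step" -conf "$conf"; then
      echo "step '$step' failed; not running later steps" >&2
      return 1
    fi
  done
}

# Usage: run_svdetect_steps svdetect.conf
```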
Take a look at the resulting final output file: 61FTVAAXX.ab.sam.links.filtered.sv.txt. Another downside of command line applications is that, while you can print files to the screen, the formatting is not always the nicest. On the plus side, in 95% of cases you can copy the output directly from the terminal window into Excel and make better sense of what the columns actually are.
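If you would rather stay in the terminal, aligning the columns often makes the output readable. The sketch below ASSUMES the .sv.txt file is tab-delimited; cat -A shows tabs as ^I, so you can check. It makes two awk passes over the file, padding each column to its widest entry. pretty_tsv is a hypothetical name. Where available, column -t -s $'\t' does a similar job.

```bash
# Hypothetical helper. ASSUMPTION: the input file is tab-delimited.
# Pass 1 finds the widest entry in each column; pass 2 prints the padded columns.
pretty_tsv() {
  awk -F'\t' '
    NR == FNR {                                  # pass 1: widest entry per column
      for (i = 1; i <= NF; i++) if (length($i) > w[i]) w[i] = length($i)
      next
    }
    {                                            # pass 2: pad all but the last column
      line = ""
      for (i = 1; i <= NF; i++) line = line (i < NF ? sprintf("%-" w[i] "s  ", $i) : $i)
      print line
    }' "$1" "$1"
}

# Usage: pretty_tsv 61FTVAAXX.ab.sam.links.filtered.sv.txt | less -S
```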
...
1. This is a tandem head-to-tail duplication of the region from approximately 600000 to 663000. ... Many of the others are due to new insertions of transposable elements.
Optional: Install SVDetect
We have installed SVDetect for you already, as installation is a bit difficult (though still much easier than the alternatives listed in the introduction). You can verify its location using which SVDetect in your ...
Install SVDetect scripts
Navigate to the SVDetect project page. More information: ...
Download the code onto TACC.
Move the Perl scripts and make them executable.
Install required Perl modules
SVDetect requires a few Perl modules to be installed. In the default TACC environment, you can use the cpan shell to install most well-behaved Perl modules (with the exception of some complicated ones that require other libraries to be installed or things to compile). Here's how:
Return to GVA2021 course page.