...
| Code Block | ||||
|---|---|---|---|---|
| ||||
cds
cp -r $BI/gva_course/structural_variation/data GVA_sv_tutorial
cd GVA_sv_tutorial |
| Note | ||
|---|---|---|
| ||
At least one student ran into an issue with the above command: an "Input/Output" error message was generated, and the files were copied, but the files were |
This is Illumina mate-paired data (having a larger insert size than paired-end data) from genome re-sequencing of an E. coli clone.
...
| Info | ||
|---|---|---|
| ||
In short, no. A detailed explanation can be found at this link. This is mentioned here mostly for posterity: years ago, when multiple different mappers were used (including bowtie and bowtie2), it was pointed out that SV is very difficult to detect when reads are mapped with bowtie, as it does not identify discordantly mapped read pairs. |
Analyze read mapping distribution
The first step is to look at all mapped read pairs and whittle the list down to only those that have unusual insert sizes (distances between the two reads in a pair).
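As a rough illustration of this filtering idea (not a description of how SVDetect does it internally), the sketch below reads the TLEN field (column 9) of SAM records, estimates the typical insert size with a median and a median absolute deviation, and keeps the pairs that fall far from it. The function name, the cutoff of 5 deviations, and the choice of statistics are assumptions made for this example only.

```python
import statistics


def unusual_insert_pairs(sam_lines, n_devs=5.0):
    """Return (read_name, insert_size) for pairs with atypical insert sizes.

    sam_lines: iterable of SAM text lines (header lines start with '@').
    n_devs: hypothetical cutoff, in median absolute deviations (MADs).
    Pairs with TLEN 0 (for example, mates on different references) are
    not considered in this simple sketch.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        tlen = int(fields[8])  # column 9 (TLEN): signed template length
        if tlen > 0:  # count each pair once, via the mate with positive TLEN
            sizes.append((fields[0], tlen))

    if not sizes:
        return []

    values = [size for _, size in sizes]
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    # Fall back to 1 so a perfectly uniform library still gets a nonzero window.
    spread = mad if mad > 0 else 1.0
    return [(name, size) for name, size in sizes
            if abs(size - median) > n_devs * spread]


# Example use (assumes the SAM file from this tutorial is in the current directory):
# with open("61FTVAAXX.ab.sam") as handle:
#     for name, size in unusual_insert_pairs(handle):
#         print(name, size)
```

A median-based spread is used here because a handful of very distant pairs would inflate a mean and standard deviation; a different cutoff may suit other libraries better.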
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
<general>
input_format=sam
sv_type=all
mates_orientation=RF
read1_length=35
read2_length=35
mates_file=/scratch/#####/<USERNAME>/GVA_sv_tutorial/61FTVAAXX.ab.sam
cmap_file=/scratch/#####/<USERNAME>/GVA_sv_tutorial/NC_012967.1.lengths
num_threads=48
</general>
<detection>
split_mate_file=0
window_size=2000
step_length=1000
</detection>
<filtering>
split_link_file=0
nb_pairs_threshold=3
strand_filtering=1
</filtering>
<bed>
<colorcode>
255,0,0=1,4
0,255,0=5,10
0,0,255=11,100000
</colorcode>
</bed> |
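The two file paths in this configuration contain placeholders (##### and <USERNAME>) that need to be replaced with your own scratch location before SVDetect can find the files. A minimal sketch of a check for leftover placeholders follows. It assumes the configuration has been saved with one setting per line as svdetect.conf, and the function name is hypothetical.

```python
def unfilled_placeholders(config_text, markers=("#####", "<USERNAME>")):
    """Return (key, value) pairs whose value still contains a placeholder."""
    problems = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if "=" not in line or line.startswith("<"):
            continue  # section tags such as <general> are not settings
        key, value = line.split("=", 1)
        if any(marker in value for marker in markers):
            problems.append((key.strip(), value.strip()))
    return problems


# Example use (assumes svdetect.conf is in the current directory):
# with open("svdetect.conf") as handle:
#     for key, value in unfilled_placeholders(handle.read()):
#         print(f"still a placeholder: {key}={value}")
```

An empty result only means the placeholder text is gone; it does not confirm that the paths themselves exist.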
...
| Code Block | ||
|---|---|---|
| ||
SVDetect linking -conf svdetect.conf
SVDetect filtering -conf svdetect.conf
SVDetect links2SV -conf svdetect.conf |
| Warning | ||
|---|---|---|
| ||
When reviewing these tutorials, I found that these commands would not execute in idev sessions, for unknown reasons. By chance, an idev session timed out without my noticing, and I found that the commands did run on the head node. Try the above commands one at a time; if you see error messages like the following, log out of your idev session with the logout command and then execute them one at a time on the head node. While this is not the best citizenship, the program gave no indication of being a problem. Feedback from other students says this problem is not limited to me: multiple people are experiencing it. Running these commands on the head node should be acceptable, for reasons that will be discussed on Zoom.
|
Take a look at the resulting file: 61FTVAAXX.ab.sam.links.filtered.sv.txt. Another downside of command line applications is that, while you can print files to the screen, the formatting is not always the nicest. On the plus side, in 95% of cases you can copy the output directly from the terminal window into Excel and make better sense of what the columns actually are.
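If copying into Excel is inconvenient, the file can also be printed one record at a time with each column name next to its value. The sketch below assumes the file is tab-delimited with a header row; the function names are made up for this example, and no particular column names are assumed.

```python
import csv


def read_sv_table(path):
    """Read a tab-delimited table with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def print_records(rows, limit=3):
    """Print the first few rows vertically, one 'column: value' per line."""
    for number, row in enumerate(rows[:limit], start=1):
        print(f"--- record {number} ---")
        for column, value in row.items():
            print(f"{column}: {value}")


# Example use (assumes the SVDetect output file is in the current directory):
# rows = read_sv_table("61FTVAAXX.ab.sam.links.filtered.sv.txt")
# print_records(rows)
```

Printing records vertically avoids the line wrapping that makes wide tables hard to read in a terminal.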
...
| Expand | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| |||||||||
Optional: Install SVDetect
We have already installed SVDetect for you, as installation is a bit difficult (though still much easier than the alternatives listed in the introduction). You can verify its location using which SVDetect in your
Install SVDetect scripts
Navigate to the SVDetect project page
More information:
Download the code onto TACC.
Move the Perl scripts and make them executable
Install required Perl modules
SVDetect requires a few Perl modules to be installed. In the default TACC environment, you can use the cpan shell to install most well-behaved Perl modules (with the exception of some complicated ones that require other libraries to be installed or things to compile). Here's how:
|
Return to the GVA2020 course page.