...
None of these other tutorials are required to complete this tutorial, but additional information about individual steps may be found there.
Identification of a novel plasmid
Novel DNA can be present when a sample carries a virus or plasmid that is absent from the reference. Here we take a sample known to carry a high-copy plasmid, but map its reads against the genome only. The unaligned reads should then assemble into the plasmid.
...
Code Block language bash title Convert reference to fasta
module load bioperl
bp_seqconvert.pl --from genbank --to fasta < CP009273.1_Eco_BW25113.gbk > CP009273.1_Eco_BW25113.fasta
Warning title Remember to make sure you are on an idev node
For reasons discussed numerous times throughout the course already, please be sure you are on an idev node. You are probably not on one right now, because copying the files while on an idev node has been causing problems, as discussed. Remember that the hostname and showq -u commands can be used to check whether you are on one of the login nodes or one of the compute nodes. If you need more information or help launching a new idev node, please see this tutorial.
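The hostname check mentioned above can be wrapped in a small helper. This is only a sketch: the function name node_type is hypothetical, and it assumes that login-node hostnames begin with "login", which may not hold on every cluster.

```bash
# Sketch only. ASSUMPTION: login-node hostnames start with "login".
# node_type is a hypothetical helper name, not part of any TACC tooling.
node_type() {
    # $1: hostname to classify; defaults to the current machine's hostname
    case "${1:-$(hostname)}" in
        login*) echo "login" ;;
        *)      echo "not-login" ;;
    esac
}

# Usage: run `node_type` with no arguments to classify the current node.
```

If the helper reports "login", start an idev session before running the indexing and mapping commands.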
Code Block language bash title Index the reference
mkdir bowtie2
module load bowtie/2.3.4
bowtie2-build CP009273.1_Eco_BW25113.fasta bowtie2/CP009273.1_Eco_BW25113
The following command will take ~5 minutes to complete. Before you run it, execute 'bowtie2 -h' so that, while the command is running, you can try to figure out what the options that were not included in our first tutorial are doing.
Code Block language bash title Map reads
bowtie2 --very-sensitive-local -t -p 48 -x bowtie2/CP009273.1_Eco_BW25113 -1 SRR4341249_1.fastq -2 SRR4341249_2.fastq -S bowtie2/SRR4341249-vsl.sam --un-conc SRR4341249-unmapped-vsl.fastq
Expand title Click here to explain the new options.
option | effect
--very-sensitive-local | map in very sensitive local alignment mode
-t | print time info
-p 48 | use 48 processors
-x bowtie2/CP009273.1_Eco_BW25113 | index
-1 SRR4341249_1.fastq -2 SRR4341249_2.fastq | reads used for mapping
-S bowtie2/SRR4341249-vsl.sam | SAM file detailing mapping
--un-conc SRR4341249-unmapped-vsl.fastq | write read pairs that do not align concordantly to the genome to SRR4341249-unmapped-vsl.1.fastq and SRR4341249-unmapped-vsl.2.fastq (bowtie2 adds .1 and .2 before the final extension)
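To get a feel for how much material is left over for assembly, the reads written by --un-conc can be counted. The sketch below assumes plain, uncompressed FASTQ with exactly four lines per record; count_fastq_reads is a hypothetical helper name.

```bash
# Sketch only. ASSUMPTION: uncompressed FASTQ, exactly 4 lines per read.
# count_fastq_reads is a hypothetical helper name.
count_fastq_reads() {
    # $1: path to a FASTQ file
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Usage (bowtie2 adds .1/.2 to the --un-conc file name for mate 1 and mate 2):
#   count_fastq_reads SRR4341249-unmapped-vsl.1.fastq
```

Both mate files should report the same number of reads, since --un-conc writes pairs.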
Expand title What percent of reads mapped?
The output of the program listed:
65.74% overall alignment rate
...
Additional questions are:
Expand title How might we go about finding out what an assembled product actually is when it truly is novel rather than a positive control?
BLAST. In fact, I did just that and identified the contig as an artifact of sequencing: it corresponds to phiX.
Code Block title Steps to identify phiX
copy full 5441 bp of sequence
Go to https://blast.ncbi.nlm.nih.gov/Blast.cgi
large list of results, including vast majority listing phiX or genome assembly/scaffold
Why does seeing phiX (link to screen shot of blast results) tell me that it is an artifact? phiX is used as a loading control for Illumina runs, which makes it possible to tell a run that failed because of bad libraries from a run that failed because of poor base diversity.
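Before pasting a contig into BLAST, it can help to confirm which record in an assembly file you are copying and how long it is. The following is a minimal sketch; fasta_lengths is a hypothetical helper, and the file name in the usage comment is a placeholder rather than a file produced earlier in this tutorial.

```bash
# Sketch only. fasta_lengths is a hypothetical helper name.
# Prints "name<TAB>length" for every record in a FASTA file.
fasta_lengths() {
    # $1: path to a FASTA file (sequence may be wrapped over several lines)
    awk '
        /^>/ { if (name != "") print name "\t" len   # flush previous record
               name = substr($1, 2); len = 0; next }
             { len += length($0) }                   # accumulate sequence length
        END  { if (name != "") print name "\t" len } # flush last record
    ' "$1"
}

# Usage (contigs.fasta is a placeholder name):
#   fasta_lengths contigs.fasta
```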
Expand title How would we decide if it was real or important if we hadn't recognized it?
It depends on the BLAST results, how high the coverage is compared to the genome, and the gene content.
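The coverage comparison can be roughed out as reads × read length ÷ sequence length, computed once for the contig and once for the genome. The sketch below uses a hypothetical helper name, and the numbers in the usage comment are made-up placeholders, not results from this data set.

```bash
# Sketch only. estimate_coverage is a hypothetical helper name.
# Approximate coverage = number of reads * read length / sequence length.
estimate_coverage() {
    # $1: number of reads, $2: read length (bp), $3: sequence length (bp)
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * l / g }'
}

# Usage with placeholder numbers (not from this tutorial's data):
#   estimate_coverage 1000 100 5000
```

A contig whose coverage is many times that of the genome is consistent with a high-copy element or a spiked-in control, although coverage alone does not distinguish between the two.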
Expand title In other systems what else might you find?
Viruses, mobile genetic elements, evidence of a microbiome.
Expand title How does this affect mapping?
Consider the advanced read mapping with multiple references tutorial.
Expand title Do you expect to find more novel DNA in a highly accurate reference file, or a "similar" reference file?
Similar. Because the reference is less accurate, alignment scores drop across the board, potentially falling below the thresholds needed to anchor a match at all. Look deeper at the bowtie2 mapping command where you used --very-sensitive-local mode: the documentation tells you about tolerated mismatches and related settings. The more reads you have that don't match, the more novel DNA inserts you are likely to deal with.
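To make the threshold idea concrete, the sketch below computes the minimum score a read must reach to be reported as aligned in local mode. It assumes the local-mode defaults described in the bowtie2 manual (--score-min G,20,8, meaning 20 + 8 × ln(read length), and a match bonus of 2); the function names are hypothetical, and you should confirm the defaults for your version with 'bowtie2 -h'.

```bash
# Sketch only. ASSUMPTION: bowtie2 local-mode defaults, --score-min G,20,8
# and --ma 2. Function names are hypothetical.
local_min_score() {
    # $1: read length (bp); G,20,8 means 20 + 8 * ln(read length)
    awk -v x="$1" 'BEGIN { printf "%.2f\n", 20 + 8 * log(x) }'
}

local_max_score() {
    # $1: read length (bp); a perfect local alignment scores 2 per matched base
    echo $(( $1 * 2 ))
}

# Usage: compare the two values for your read length, e.g.
#   local_min_score 100
#   local_max_score 100
```

Every mismatch or gap against a divergent reference lowers a read's score from the maximum toward the minimum, so a less accurate reference pushes more reads into the unaligned set.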