...
How might we go about finding out what an assembled product actually is when it is truly novel rather than a positive control?
Use BLAST. In fact, I did just that and identified the contig as a sequencing artifact: it corresponds to phiX.
Steps to identify phiX
1. Copy the full 5441 bp of sequence.
2. Go to https://blast.ncbi.nlm.nih.gov/Blast.cgi and BLAST the copied sequence.
3. Review the large list of results; the vast majority list phiX or a genome assembly/scaffold.
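If the contig has to be pulled out of a larger assembly file before pasting it into BLAST, a short script can do it. The following is a minimal Python sketch rather than part of the tutorial itself: it assumes the assembly is a plain FASTA file, and the function names and the file name `contigs.fa` in the usage note are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dict."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Keep only the ID, i.e. the header text up to the first whitespace.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def contigs_of_length(records, length):
    """Return the IDs of contigs whose sequence is exactly `length` bp long."""
    return [name for name, seq in records.items() if len(seq) == length]
```

To use it, something like `records = read_fasta(open("contigs.fa"))` followed by `contigs_of_length(records, 5441)` would list the candidate contig, whose sequence can then be printed and copied into the BLAST query box.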
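Why does seeing phiX (link to screenshot of BLAST results) tell me that it is an artifact?
phiX is used as a loading control for Illumina runs, so that you can tell the difference between a run that failed because of bad libraries and a run that failed due to poor base diversity. Because phiX DNA is added to the run on purpose, its reads end up in the data and assemble into a contig of their own even though it was never part of the sample.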
How would we decide if it was real or important if we hadn't recognized it?
It depends on the BLAST results, how high the contig's coverage is compared to the genome's, and its gene content.

In other systems what else might you find?
Viruses, mobile genetic elements, and evidence of a microbiome.

How does this affect mapping?
Consider tomorrow's advanced read mapping with multiple references tutorial.

Do you expect to find more novel DNA with a highly accurate reference file, or a "similar" reference file?
Similar. Because the reference is less accurate, alignment scores drop across the board, and some reads may fall below the threshold needed to anchor a match at all. Look deeper at the bowtie2 mapping command where you used --very-sensitive-local mode; the documentation tells you about tolerated mismatches and related settings. The more reads you have that don't match, the more novel DNA inserts you are likely to deal with.
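For the earlier question of how to judge a contig you did not recognize, the coverage comparison can be sketched in a few lines of Python. This is only an illustration: the function name, the input (a mean read depth per contig, however you obtained it), and the example numbers are all hypothetical, and no particular ratio should be treated as a definitive cutoff.

```python
from statistics import median


def coverage_ratios(depth_by_contig, genome_contigs):
    """Divide each contig's mean depth by the median depth of the genome contigs."""
    baseline = median(depth_by_contig[name] for name in genome_contigs)
    if baseline <= 0:
        raise ValueError("genome baseline depth must be positive")
    return {name: depth / baseline for name, depth in depth_by_contig.items()}


# Made-up depths, for illustration only.
depths = {"chromosome": 100.0, "plasmid": 300.0, "unknown_contig": 2500.0}
ratios = coverage_ratios(depths, genome_contigs=["chromosome"])
for name, ratio in ratios.items():
    print(f"{name}\t{ratio:.1f}x genome coverage")
```

A ratio far from 1 says the contig is present at a different copy number than the chromosome, which is consistent with a spike-in, a multi-copy plasmid, or a virus. It does not say which on its own, so read it alongside the BLAST hits and gene content.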
...