...


  1. Expand
    titleHow many contigs were generated in each case?

    2 for the plasmid and 16 for the full

    Code Block
    languagebash
    titleWhere does the answer come from?
    collapsetrue
    grep -c "^>" unmapped*/contigs.fasta




  2. Expand
    titleAre any of the contigs the same?

    Probably, both contigs detected in the plasmid mode have similar lengths in the full mode

    Code Block
    languagebash
    titleWhere does the answer come from?
    collapsetrue
    grep "^>" unmapped*/contigs.fasta
    # lists similar lengths and coverages for the 2 plasmid contigs




  3. Expand
    titleWhat sizes are the contigs?

    5441 or 5463

    3170 or 3192

    14 others shorter than 500 bp in full SPAdes mode

    Code Block
    languagebash
    titleWhere does the answer come from?
    collapsetrue
    grep "^>" unmapped*/contigs.fasta
    # same command as above, just focusing on the length value




  4. Expand
    titleWhich is most likely to be our plasmid?

    The contig that is 3170 or 3192

    Code Block
    languagebash
    titleWhere does the answer come from?
    collapsetrue
    # As stated above, this is a high-copy plasmid: it has a coverage value of 12,698 compared to 98 for the larger contig




  5. Expand
    titleIs that actually our plasmid?

    Yes! The actual plasmid reference locus line stats:

    LOCUS       GFP_Plasmid_Sequ        3115 bp    DNA     circular UNA 18-NOV-2013

    Expand
    titleWhy might the sizes not agree?

    My thought is that raw fastq files, not trimmed files, were fed into the assembly. That leads me to hypothesize that the difference in size is related to the presence of adapter sequence. Alternatively, small changes in the bowtie2, SPAdes, or read trimmer versions may have influenced which reads are considered. In fact, using bowtie2 2.3.5.1 and SPAdes 3.13.0, both conditions gave the same stats for the 2 largest contigs (3170 and 5441).
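
The answers above all come from reading the contig headers by eye. As a minimal sketch, the same information can be tabulated and sorted by coverage so the high-copy plasmid candidate appears first for each assembly. This assumes the headers follow the SPAdes naming pattern `>NODE_<n>_length_<bp>_cov_<depth>`; the function name `summarize_contigs` is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print file, contig, length, and coverage for each contig header,
# grouped by file with the highest-coverage contig first.
# Assumption: headers look like >NODE_<n>_length_<bp>_cov_<depth>.
summarize_contigs() {
    local fasta
    for fasta in "$@"; do
        grep "^>" "$fasta" |
            awk -F'_' -v file="$fasta" '{
                # $1=">NODE" $2=<n> $3="length" $4=<bp> $5="cov" $6=<depth>
                printf "%s\t%s_%s\t%d\t%s\n", file, substr($1, 2), $2, $4, $6
            }'
    done | sort -t $'\t' -k1,1 -k4,4nr
}
```

Calling `summarize_contigs unmapped*/contigs.fasta` should give one tab-separated line per contig, which makes the difference between the two large contigs and the short ones easier to see than the raw `grep` output.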




Next steps

Here we have presented a proof of concept that unmapped reads can be used to find something that we already knew was present. We also found something even longer that wasn't expected.

...


  1. Expand
    titleHow might we go about finding out what an assembled product actually is when it truly is novel rather than a positive control?

    BLAST. In fact, I did just that and identified it as an artifact of sequencing: the contig corresponds to phiX.

    Code Block
    titleSteps to identify phiX
    linenumberstrue
    Copy the full 5441 bp of sequence
    Go to https://blast.ncbi.nlm.nih.gov/Blast.cgi
    Paste the sequence into a nucleotide BLAST search and submit it
    Review the large list of results; the vast majority list phiX or genome assembly/scaffold

    Why does seeing phiX (link to screen shot of blast results) tell me that it is an artifact? phiX is used as a loading control for Illumina runs to tell the difference between a run that failed because of bad libraries and a run that failed due to poor base diversity.



  2. Expand
    titleHow would we decide if it was real or important if we hadn't recognized it?
    Depends on BLAST results, how high the coverage is compared to the genome, and gene content



  3. Expand
    titleIn other systems what else might you find?
    viruses, mobile genetic elements, evidence of microbiome, mitochondria, chloroplast, other plasmids



  4. Expand
    titleHow does this affect mapping?
    Consider advanced read mapping with multiple references tutorial



  5. Expand
    titleDo you expect to find more novel DNA in a highly accurate reference file, or a "similar" reference file?
    Similar. A less accurate reference lowers the alignment scores across the board, potentially dropping them below the threshold needed to anchor a match at all. Look deeper at the bowtie2 mapping command where you used --very-sensitive-local mode; the documentation tells you about tolerated mismatches, etc. The more reads you have that don't match, the more novel DNA inserts you are likely to deal with.
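
The BLAST steps in question 1 start with copying one contig out of the assembly. A minimal sketch of pulling a single contig out of `contigs.fasta` by its length, so it can be pasted into the BLAST web form, is given here. It assumes SPAdes-style headers containing `_length_<bp>_`; the function name `extract_contig` is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print the FASTA record whose header reports the requested length.
# Assumption: headers contain "_length_<bp>_" as in SPAdes output.
extract_contig() {
    local fasta="$1" length="$2"
    awk -v len="$length" '
        # On each header line, decide whether this record should be kept
        /^>/ { keep = (index($0, "_length_" len "_") > 0) }
        # Print header and sequence lines of the kept record only
        keep
    ' "$fasta"
}
```

For example, `extract_contig contigs.fasta 5441` run inside one of the assembly output directories should print only the 5441 bp contig, provided that length appears in exactly one header.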


Next steps:

Expand
titleYou have likely worked through enough of these tutorials to notice a glaring error in the way this analysis was done.

The reads were not trimmed. As a bonus exercise, you could trim these reads before redoing the analysis and see how it affects the alignment fraction and assembly statistics.
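
To compare the alignment fraction before and after trimming, one option is to read it from the summary bowtie2 prints at the end of a run. The sketch below assumes that summary (written to standard error) was saved to a log file for each run. The log file names in the usage note and the function name `alignment_rates` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report the overall alignment rate from saved bowtie2 summaries.
# Assumption: each log holds the bowtie2 summary, which ends with a line
# of the form "NN.NN% overall alignment rate".
alignment_rates() {
    local log
    for log in "$@"; do
        # Print the log name and the percentage (first field of that line)
        awk -v file="$log" '/overall alignment rate/ { print file "\t" $1 }' "$log"
    done
}
```

A call such as `alignment_rates raw.bowtie2.log trimmed.bowtie2.log` would then put the two rates side by side.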


Return to GVA2022