Trimmomatic - GVA2021


Overview

As mentioned in the introduction tutorial as well as the read processing tutorial, read processing can have a large impact on downstream work. While cutadapt, which was introduced in the read processing tutorial, is great for quick evaluation or for dealing with a single bad sample, it is not as robust as some other trimmers, particularly when it comes to removing sequence that you know shouldn't be present but that may exist in odd orientations (such as adapter sequences from the library preparation).

A note on the adapter file used here

The adapter file listed here is likely the correct one to use for standard library preps generated in the last few years, but it may not be appropriate for all library preps (such as single-end sequencing adapters or Nextera-based preps), and it is certainly not appropriate for PacBio-generated data. Look to both the trimmomatic documentation and your experimental procedures at the bench to decide whether the adapter file is sufficient or whether you need to create your own.


Learning objectives:

  1. Install trimmomatic

  2. Remove adapter sequences from some plasmids and evaluate the effect on read quality or assembly.

Installing trimmomatic

Trimmomatic's home page can be found at this link, which includes links to the paper discussing the program and to a user manual. Trimmomatic is far above average as programs go: most do not have a user manual, may not have been updated since they were originally published, etc. This is one thing that makes it such a good tool. Another sign of its popularity is that, while the installation information on the home page is not conda based, searching for the tool on anaconda still turns up https://anaconda.org/bioconda/trimmomatic, which is a conda installation of the tool. Any time other people in the community are working with a tool and sharing resources like this, it is a good sign.

Since at this point we have several different conda environments going, take a moment to think about what environment a read processing tool would work well in.

There actually are not a lot of "wrong" answers here, at least from the theoretical side. As read processing takes place upstream of other analysis steps, it makes sense to put it in almost any environment. The one notable exception might be the SVDetect environment, in which we deliberately installed only the SVDetect tool: it wouldn't make a lot of sense to activate the SVDetect environment to process the reads, activate a different environment to map the reads, and then reactivate the SVDetect environment to use that tool.

Practically, as we have seen in other tutorials, installing a tool may cause conflicts with other programs in a given environment, which makes some environments better or worse places to install it. My first thought was to try installing it in the multiqc environment, and it appears to work there without concern (a single message about a single dependent package being superseded by a higher priority channel). If you have not done that tutorial, it will also install into the GVA2021 environment with no messages about potential issues with other packages. If you want to try installing it in another environment, feel free to try; if you are unsure from the output whether it will create problems, ask.
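As a rough sketch of what that installation could look like, the function below installs trimmomatic from the bioconda channel into a named environment. The function name install_trimmomatic is made up for illustration, and the default environment name GVA2021 is assumed to match the environment you created in earlier tutorials; adjust it if yours differs. The conda prompt is deliberately left interactive so you can read any messages about changed packages before agreeing.

```shell
#!/usr/bin/env bash
# Sketch only: install trimmomatic into an existing conda environment.
# install_trimmomatic is a hypothetical helper name, and the default
# environment name (GVA2021) is an assumption based on earlier tutorials.

install_trimmomatic() {
    local env_name="${1:-GVA2021}"

    # Stop early if conda is not available in this shell.
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi

    # -n picks the target environment, -c picks the channel to search.
    # No -y flag: read the proposed package changes before confirming.
    conda install -n "$env_name" -c bioconda trimmomatic
}
```

Calling install_trimmomatic with no argument targets GVA2021; pass a different environment name as the first argument to try it elsewhere.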


As noted in our IVG tutorial when we dealt with the readseq program, trimmomatic is also a java-based program and, like the conda installation of readseq, the conda installation of trimmomatic includes a bash wrapper script around the java invocation. For those interested in how such wrapper scripts are made, last year's trimmomatic tutorial actually built a trimmomatic wrapper to avoid the java invocation. Using "-version", verify that you have version 0.39 of trimmomatic installed.
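One way to do that check is sketched below. It assumes that the environment where you installed trimmomatic is currently active and that "trimmomatic -version" prints only the version number; the helper name version_matches is made up for illustration. If your output looks different, simply compare it to 0.39 by eye.

```shell
#!/usr/bin/env bash
# Sketch only: compare the installed trimmomatic version with the one
# this tutorial expects. version_matches is a hypothetical helper name.

expected_version="0.39"

version_matches() {
    # $1 = version reported by the tool, $2 = version we expect
    [ "$1" = "$2" ]
}

if command -v trimmomatic >/dev/null 2>&1; then
    # Assumption: -version prints just the version number.
    installed_version="$(trimmomatic -version)"
    if version_matches "$installed_version" "$expected_version"; then
        echo "trimmomatic $installed_version is installed"
    else
        echo "expected $expected_version but found: $installed_version" >&2
    fi
else
    echo "trimmomatic is not on PATH; is the right conda environment active?" >&2
fi
```

If the last message appears, activate the environment you installed into and run the check again.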