Overview:
This section provides directions for generating SSCS (Single Strand Consensus Sequence) reads and trimming molecular indexes from raw fastq files.
Learning Objectives:
- Use flexbar to trim molecular indexes from duplex seq libraries.
- Use python script to generate SSCS Reads.
Tutorial:
For the purpose of this tutorial, we will be working with flexbar which like breseq is something that we have installed in the BioITeam as it is not a tacc module. Some additional modules must be loaded in order for it to work correctly, and the LD_LIBRARY_PATH variable must be modified as listed below.
module swap intel gcc export LD_LIBRARY_PATH=/corral-repl/utexas/BioITeam/flexbar_v2.23_linux64:$LD_LIBRARY_PATH
For this tutorial, it is sufficient to simply type these commands out, if this becomes something you want to do more often, or want to submit as a job, it would be important to add these lines to your .profile so they are loaded each time you log in.
If the above commands are executed properly, typing flexbar -h should display a lengthy list of optional arguments which can be used for a variety of purposes. For the purpose of this tutorial, we will only focus on trimming the first 16 bases off each read as this represents the 12 bases of the molecular index and a 4 base constant region. See if you can figure out what the command is based on the help output pay special attention to the -t option.
In an idev shell this should take less than 5 minutes to complete. Once completed there should be 6 new files, all of which begin with "trimmed" if you took the answer from the above help, or whatever string you entered for the -t argument if you did not use the above help. These 6 files represent the trimmed files, the length distribution, and any errors. using the head command, see if you can figure out which file is which.
Next we want to generate SSCS reads.