Molecular Index Error correction GVA2019
Overview:
This section provides directions for generating SSCS (Single Strand Consensus Sequence) reads and trimming molecular indexes from raw fastq files.
Learning Objectives:
- Use python script to generate SSCS Reads.
- Use cutadapt to trim molecular indexes from duplex seq libraries.
Tutorial: SSCS Reads
First we want to generate SSCS reads where we take advantage of the molecular indexes added during library prep. To do so we will use a "majority rules" python script (named SSCS_DCS.py) which was heavily modified by DED from a script originally created by Mike Schmitt and Scott Kennedy for the original duplex seq paper. This script can be found in the $BI/bin directory. For the purpose of this tutorial, the paired end sequencing of sample DED110 (prepared with a molecular index library) has been placed in the $BI/gva_course/mixed_population directory. Invoking the script is as simple as typing SSCS_DCS.py; adding -h will give a list of the available options. The goal of this command is to generate SSCS reads, for any molecular index where we have at least 2 reads present, and to generate a log file which will tell us some information about the data.