Molecular Index Error correction GVA2019

Overview:

This section provides directions for generating single-strand consensus sequence (SSCS) reads and for trimming molecular indexes from raw fastq files.

Learning Objectives:

  1. Use a Python script to generate SSCS reads.
  2. Use cutadapt to trim molecular indexes from duplex seq libraries.

Tutorial: SSCS Reads

First, we want to generate SSCS reads by taking advantage of the molecular indexes added during library prep. To do so, we will use a "majority rules" Python script (SSCS_DCS.py), which DED heavily modified from a script originally written by Mike Schmitt and Scott Kennedy for the original duplex seq paper. The script can be found in the $BI/bin directory. For this tutorial, the paired-end sequencing data for sample DED110 (prepared with a molecular index library) have been placed in the $BI/gva_course/mixed_population directory. Invoking the script is as simple as typing SSCS_DCS.py; adding -h will list the available options. The goal of this command is to generate an SSCS read for every molecular index that has at least 2 reads, and to generate a log file that reports some information about the data.
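To make the "majority rules" idea concrete, here is a minimal Python sketch of how reads sharing a molecular index could be collapsed into one consensus read. It is not the SSCS_DCS.py implementation. The function names, the toy 4-base index at the start of each read, and the strict-majority threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_consensus(reads, min_reads=2):
    """Collapse reads from one molecular index into a consensus sequence.

    Returns None when the family has fewer than min_reads members.
    A position without a strict majority base is reported as "N"
    (an assumption; the real script may handle ties differently).
    """
    if len(reads) < min_reads:
        return None
    length = min(len(read) for read in reads)  # compare only shared positions
    consensus = []
    for i in range(length):
        base, count = Counter(read[i] for read in reads).most_common(1)[0]
        consensus.append(base if count / len(reads) > 0.5 else "N")
    return "".join(consensus)


def build_sscs(tagged_reads, index_length=4, min_reads=2):
    """Group reads by their leading molecular index and build consensus reads.

    index_length=4 is a hypothetical toy value, not the length used in
    the actual library prep.
    """
    families = defaultdict(list)
    for seq in tagged_reads:
        families[seq[:index_length]].append(seq[index_length:])
    sscs = {}
    for index, members in families.items():
        consensus = majority_consensus(members, min_reads)
        if consensus is not None:
            sscs[index] = consensus
    return sscs


if __name__ == "__main__":
    # Three reads share index AAAA; one read with index CCCC stands alone.
    toy_reads = ["AAAAACGT", "AAAAACGT", "AAAAACTT", "CCCCGGGG"]
    print(build_sscs(toy_reads))
```

In this toy example the single mismatching base in the third AAAA read is outvoted by the other two reads, and the CCCC index is dropped because it has only one read, which mirrors the "at least 2 reads" requirement described above.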

Click here for the solution showing how to copy the DED110 fastq files to a new directory called GVA_Error_Correction
cds
mkdir GVA_Error_Correction
cd GVA_Error_Correction
cp $BI/gva_course/mixed_population/DED110*.fastq .



Interrogate the SSCS_DCS.py script to determine how to invoke it. Click here for hints before the answer.

You can often get more information about Python scripts by typing the name of the script followed by the -h option.


The -h command should show you these options as being th