
...

Here we will assume you have data from GSAF's Illumina HiSeq or MiSeq sequencer.

Learning Objectives

...

See if you can figure out what's wrong with these data sets, then process them to get rid of the problem(s). Copy the files to your $SCRATCH directory before analyzing them. If you're very ambitious, you could also map the reads to the reference genomes and perform variant calling before and after cleaning them up to see how the results change. Each file has a different problem.
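A quick first look at a read file can be sketched as follows. This is a minimal example, not part of the course materials: `fastq_summary` is a hypothetical helper name, the `read_qc` directory name is made up for illustration, and the code assumes standard four-line FASTQ records in a gzipped file.

```bash
# Summarize a gzipped FASTQ file: read count plus min/mean/max read length.
# fastq_summary is a hypothetical helper name (an assumption, not a course tool).
# Assumes each record is exactly four lines, so the sequence is line 2 of 4.
fastq_summary() {
    gzip -dc "$1" | awk '
    NR % 4 == 2 {
        len = length($0)
        n++
        total += len
        if (n == 1 || len < min) min = len
        if (n == 1 || len > max) max = len
    }
    END {
        if (n == 0) { print "reads=0"; exit }
        printf "reads=%d min_len=%d mean_len=%.1f max_len=%d\n", n, min, total / n, max
    }'
}

# Usage (read_qc is a hypothetical directory name; the file path is from Example #2):
#   mkdir -p "$SCRATCH/read_qc" && cd "$SCRATCH/read_qc"
#   cp $BI/gva_course/read_processing/61FTVAAXX_2_R1_ZDB172.fastq.gz .
#   fastq_summary 61FTVAAXX_2_R1_ZDB172.fastq.gz
```

An unexpected spread of read lengths, or far fewer reads than expected, is a hint worth following up with a fuller quality report.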

Example #1: Single-end Illumina MiSeq data for E. coli

Example read file and reference files #1:

$BI/gva_course/read_processing_example/JJM104_TAAGGCGA-TAGATCGC_L001_R1_001.fastq.gz
$BI/gva_course/read_processing/REL606.fna

What's wrong with this data?

This

Example #2: Paired-end Illumina Genome Analyzer IIx data for E. coli

Example read and reference files #2:

$BI/gva_course/read_processing/61FTVAAXX_2_R1_ZDB172.fastq.gz
$BI/gva_course/read_processing/61FTVAAXX_2_R2_ZDB172.fastq.gz
$BI/gva_course/read_processing/REL606.fna

What's wrong with this data?

A problem during library prep strongly biased the beginning of the reads toward "T". Unfortunately, post-processing can't help with this one. The read sequences themselves are fine, but coverage across the genome is so uneven that many regions were not sampled at all (zero coverage), even though the volume of sequencing data was very high for this microbial genome. The facility had to do a new library prep and re-sequence to correct the issue.
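A base-composition bias like this can be spotted by tabulating which base appears at each of the first few read positions. The sketch below is one way to do that by hand; `base_composition` is a hypothetical helper name, the default of 10 positions is an arbitrary choice, and the code assumes four-line FASTQ records in a gzipped file. FastQC's per-base sequence content plot reports the same kind of information.

```bash
# Print the fraction of A, C, G, T and N at each of the first N read positions.
# base_composition is a hypothetical helper name (an assumption, not a course tool).
# $1 = gzipped FASTQ file, $2 = number of positions to report (default 10).
base_composition() {
    gzip -dc "$1" | awk -v npos="${2:-10}" '
    NR % 4 == 2 {
        seq = toupper($0)
        limit = (length(seq) < npos) ? length(seq) : npos
        for (i = 1; i <= limit; i++) {
            count[i, substr(seq, i, 1)]++   # tally the base seen at position i
            depth[i]++                      # number of reads covering position i
        }
    }
    END {
        print "pos\tA\tC\tG\tT\tN"
        for (i = 1; i <= npos; i++) {
            if (depth[i] == 0) continue
            printf "%d", i
            for (j = 1; j <= 5; j++) {
                b = substr("ACGTN", j, 1)
                printf "\t%.2f", count[i, b] / depth[i]
            }
            printf "\n"
        }
    }'
}

# Usage (file path is from Example #2):
#   base_composition $BI/gva_course/read_processing/61FTVAAXX_2_R1_ZDB172.fastq.gz 10
```

In an unbiased library the four bases should appear at roughly the frequencies expected from the genome's composition, so a "T" column far above the others at the first positions would be consistent with the problem described above.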
