...
Set 1: Plasmid SPAdes
Set 2: SPAdes Example Data
SPAdes provides two example data sets, along with benchmarks for how long the pipeline might take to run on them: http://spades.bioinf.spbau.ru/release3.11.1/manual.html#sec1.3. In this section we will download the standard E. coli data set and compare SPAdes' published results with ours.
wget http://spades.bioinf.spbau.ru/spades_test_datasets/ecoli_mc/s_6_1.fastq.gz
wget http://spades.bioinf.spbau.ru/spades_test_datasets/ecoli_mc/s_6_2.fastq.gz
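Large downloads can occasionally be cut short. As a minimal sketch (the helper name `check_gzip` is hypothetical), the standard `gzip -t` integrity test can confirm that both archives are intact before you spend time on an assembly:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each gzip file passes gzip's integrity test.
# Returns non-zero if any file fails.
check_gzip() {
  local f status=0
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "$f OK"
    else
      echo "$f FAILED"
      status=1
    fi
  done
  return "$status"
}
```

For example, run `check_gzip s_6_1.fastq.gz s_6_2.fastq.gz` in the directory the files were downloaded to; a FAILED line suggests re-downloading that file.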
Set 3: Whole Genome Simulated Data
Program self-tests are typically safe to run on the head node, but the rest of the tutorial assumes that you are on an idev node. If you are not sure, please ask for help.
Tip: Elsewhere in the class, the main reason to stay off the head node is to be a good TACC citizen and avoid slowing down other users. Assembly adds a practical reason: assembly programs are exceptionally memory intensive, so running one on the head node will likely end in a memory error rather than usable results. When it comes time to assemble your own reference genome, give each sample its own compute node rather than splitting multiple samples across a single node.
Data
mkdir -p $SCRATCH/GVA_SPAdes_tutorial  # you likely already did this when you ran the self-test; -p avoids an error if it exists
cp $BI/ngs_course/velvet/data/*/* $SCRATCH/GVA_SPAdes_tutorial
cd $SCRATCH/GVA_SPAdes_tutorial
...
Often your read pairs will be "separate": each read and its mate sit at the same position (index) in two different files, and both files contain exactly the same number of reads.
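Because mates are matched by position, it is worth confirming that the two files really contain the same number of reads. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); `count_reads` and `check_pairs` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a FASTQ file (plain or .gz),
# assuming 4 lines per record.
count_reads() {
  local lines
  if [[ "$1" == *.gz ]]; then
    lines=$(gzip -dc "$1" | wc -l)
  else
    lines=$(wc -l < "$1")
  fi
  echo $(( lines / 4 ))
}

# Hypothetical helper: print both counts and succeed only if they match.
check_pairs() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  echo "$1: $n1 reads"
  echo "$2: $n2 reads"
  if [[ "$n1" -ne "$n2" ]]; then
    echo "mismatch: files do not contain the same number of reads" >&2
    return 1
  fi
}
```

Equal counts are necessary but not sufficient: they do not prove the reads are in the same order, only that nothing has obviously gone missing from one file.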
SPAdes Assembly
Now let's use SPAdes to assemble the reads. As always, it's a good idea to look at what options the program accepts using the -h option. SPAdes is written in Python, and the base script is named "spades.py". Additional scripts such as metaspades.py, plasmidspades.py, and rnaspades.py change many of the default options; the same settings can instead be applied from the main spades.py script with the --meta, --plasmid, and --rna flags, respectively.
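The relationship between the mode scripts and the flags can be sketched as a small wrapper. This is an illustration, not part of SPAdes: `run_spades`, its argument order, and the default thread and memory values are assumptions. It uses only the spades.py options -1 and -2 (paired reads), -o (output directory), -t (threads), and -m (memory limit in GB), plus the mode flags named above. Setting DRY_RUN prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build a paired-end spades.py command for a given mode.
# "standard" runs plain spades.py; meta/plasmid/rna add the matching flag
# (equivalent to calling metaspades.py, plasmidspades.py, or rnaspades.py).
# Usage: run_spades MODE R1 R2 OUTDIR [THREADS] [MEM_GB]
run_spades() {
  local mode="$1" r1="$2" r2="$3" outdir="$4"
  local threads="${5:-4}" mem_gb="${6:-16}"   # assumed defaults; size these to your node
  local -a cmd=(spades.py)
  case "$mode" in
    standard) ;;
    meta|plasmid|rna) cmd+=("--$mode") ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  cmd+=(-1 "$r1" -2 "$r2" -o "$outdir" -t "$threads" -m "$mem_gb")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"    # print only
  else
    "${cmd[@]}"         # actually run SPAdes
  fi
}
```

For example, `DRY_RUN=1 run_spades standard s_6_1.fastq.gz s_6_2.fastq.gz ecoli_spades` prints the command for the E. coli example data. Check `spades.py -h` on your installation before relying on these options, and set -m to the memory actually available on your compute node.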
...
What comes next when working with your own data?
- Look for things: if you're just after a few homologs, an operon, etc., you're probably done. Think about what question you are trying to answer.
- You can turn contigs.fasta into a BLAST database (with formatdb or makeblastdb, depending on which version of BLAST you have), or BLAST the contigs against NCBI's databases.
- If you built your contigs from a normal/control sample, you can map reads from other samples to the contigs using bowtie2 to try to identify variants in those samples.
- If you don't think the contigs you have are "good enough":
  - Try using SPAdes' MismatchCorrector to see if you can improve the contigs you already have.
  - Add additional sequencing libraries to try to connect more contigs. In particular, consider long-read technologies such as PacBio and Oxford Nanopore sequencing.
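Before deciding whether an assembly is "good enough", it helps to summarize it. A minimal sketch using awk, assuming a FASTA file such as the contigs.fasta that SPAdes writes to its output directory; `contig_stats` is a hypothetical helper name. N50 is computed as the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print contig count, total length, longest contig and N50.
# Sequences may span multiple lines in the FASTA file.
contig_stats() {
  # Step 1: emit one length per contig.
  awk '
    /^>/ { if (len) print len; len = 0; next }
    { len += length($0) }
    END { if (len) print len }
  ' "$1" | sort -rn | awk '
    # Step 2: lengths arrive longest first; accumulate until half the total.
    { lens[NR] = $1; total += $1 }
    END {
      if (NR == 0) { print "no contigs"; exit 1 }
      half = total / 2; run = 0
      for (i = 1; i <= NR; i++) {
        run += lens[i]
        if (run >= half) { n50 = lens[i]; break }
      }
      printf "contigs\t%d\ntotal_bp\t%d\nlongest\t%d\nN50\t%d\n", NR, total, lens[1], n50
    }
  '
}
```

Dedicated assembly assessment tools such as QUAST report these and many other metrics if you need a fuller picture.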
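The BLAST and bowtie2 suggestions in the list can be sketched as a pair of shell functions. The function names, output names, and DRY_RUN switch are hypothetical; the commands themselves use BLAST+ makeblastdb (the successor to legacy formatdb) and bowtie2's standard bowtie2-build and bowtie2 -x/-1/-2/-S options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build a nucleotide BLAST+ database and a bowtie2 index from an assembly.
# Outputs are named after the FASTA file without its extension.
build_indexes() {
  local contigs="$1" base="${1%.*}"
  run makeblastdb -in "$contigs" -dbtype nucl -out "${base}_blastdb" || return 1
  run bowtie2-build "$contigs" "${base}_bt2"
}

# Map one sample's paired reads to the indexed contigs, writing SAMPLE.sam.
# Arguments: CONTIGS SAMPLE R1 R2
map_sample() {
  local base="${1%.*}" sample="$2" r1="$3" r2="$4"
  run bowtie2 -x "${base}_bt2" -1 "$r1" -2 "$r2" -S "${sample}.sam"
}
```

The indexes only need to be built once per assembly, after which map_sample can be repeated for each sample you want to compare against the control.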
...