...

Assembling even small bacterial genomes can be incredibly time intensive (as well as memory intensive, as highlighted above). Fortunately for this class, we can use the plasmidSPAdes option to assemble an even smaller plasmid genome, ~2000 bp long, in only a few minutes. I suggest analyzing this data on an idev node and then submitting the analysis of the bacterial genomes as a job to run overnight.

Data

Note
titlePossible errors on idev nodes

As mentioned yesterday, you cannot copy from the BioITeam (because it is on corral-repl) while on an idev node. Log out of your idev session, then copy the files.

Download the paired-end fastq files, which have had their adapters trimmed, from the $BI/gva_course/Assembly/ directory.

...

Once you have figured out what options you need to use, see if you can come up with a command to run on the paired-end reads and have the output go into a new directory called plasmid, using all 48 cores that are available on your idev node (-t 48). The following command is expected to take less than 2 minutes.
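Before settling on a value for -t, it can help to confirm how many cores your node actually reports. This is a minimal sketch, assuming a Linux node where GNU coreutils `nproc` is installed (with `getconf` as a fallback); the variable name THREADS is only for illustration and is not used elsewhere in this tutorial.

```bash
# Ask the operating system how many cores this session can use.
# Assumption: nproc (GNU coreutils) exists on the node; getconf is the fallback.
THREADS=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "Cores available: ${THREADS}"
```

The number printed is the largest value that makes sense to pass to -t on that node.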

...

Code Block
languagebash
titleDid you come up with the same thing I did?
collapsetrue
plasmidspades.py -t 48 -o plasmid -1 SH1_1P.fastq.gz -2 SH1_2P.fastq.gz
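Once the run finishes, a quick way to see whether the assembly is close to the expected ~2000 bp is to list the length of each contig. The sketch below assumes the assembler wrote a contigs.fasta file into the plasmid output directory; `contig_lengths` is a hypothetical helper written for this example, not part of SPAdes.

```bash
# Print the name and length of every sequence in a FASTA file.
# Hypothetical helper for illustration; not a SPAdes tool.
contig_lengths() {
    awk '/^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (name != "") print name "\t" len }' "$1"
}

# Assumption: the output directory given with -o contains contigs.fasta.
if [ -f plasmid/contigs.fasta ]; then
    contig_lengths plasmid/contigs.fasta
fi
```

If the file is not there, the block prints nothing, which is a hint to check the log in the output directory.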

...

Here we will look at 4 sets of data with different library preparation conditions to evaluate how wet lab decisions influence outcomes on the computer. Some of the text here is very similar or identical to that in set 1, in case people choose to skip directly to it.

Data

Note
titlePossible errors on idev nodes

As mentioned yesterday, you cannot copy from the BioITeam (because it is on corral-repl) while on an idev node. Log out of your idev session, then copy the files.


Code Block
titleMove to scratch, copy the raw data, and change into this directory for the tutorial
mkdir -p $SCRATCH/GVA_SPAdes_tutorial # -p avoids an error if you already made this directory when you ran the selftest
cp $BI/ngs_course/velvet/data/*/* $SCRATCH/GVA_SPAdes_tutorial
cd $SCRATCH/GVA_SPAdes_tutorial
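After copying, it may be worth checking that the files arrived and are not truncated. The sketch below counts reads in each uncompressed .fastq file in the current directory, assuming the standard 4-lines-per-read layout; `count_reads` is a hypothetical helper for this example.

```bash
# Count reads in an uncompressed FASTQ file (assumes exactly 4 lines per read).
# Hypothetical helper for a quick sanity check.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Report every FASTQ file in the current directory.
for fq in *.fastq; do
    [ -e "$fq" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s reads\n' "$fq" "$(count_reads "$fq")"
done
```

For interleaved paired-end files the count includes both mates, so the number of pairs is half the value shown.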

...

Once you have figured out what options you need to use, see if you can come up with a command to run on the single-end reads and have the output go into a new directory called single_end, using all 48 threads that are available (-t 48).

Code Block
languagebash
titleDid you come up with the same thing I did?
collapsetrue
spades.py -t 48 -o single_end -s single_end_100_c_50.fastq

...

Code Block
languagebash
titlePossible other commands
linenumberstrue
spades.py -t 48 -o 400_1500_3000 --12 paired_end_2x100_ins_400_c_50.fastq --12 paired_end_2x100_ins_1500_c_20.fastq --12 paired_end_2x100_ins_3000_c_25.fastq
spades.py -t 48 -o 400_and_1500 --12 paired_end_2x100_ins_400_c_50.fastq --12 paired_end_2x100_ins_1500_c_20.fastq
spades.py -t 48 -o 400_only --12 paired_end_2x100_ins_400_c_50.fastq
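To compare how the library choices affected the assemblies, one option is to tabulate a few simple statistics for each output directory. This is a rough sketch, assuming each run wrote a contigs.fasta file into the directory named with -o; `assembly_summary` is a hypothetical helper, and directories that do not exist yet are skipped.

```bash
# Summarize one assembly: number of contigs, total bases, longest contig.
# Hypothetical helper for illustration; not a SPAdes tool.
assembly_summary() {
    awk '/^>/ { if (len > max) max = len; n++; len = 0; next }
         { len += length($0); total += length($0) }
         END { if (len > max) max = len; printf "%d\t%d\t%d\n", n, total, max }' "$1"
}

printf 'assembly\tcontigs\ttotal_bp\tlongest_bp\n'
for dir in single_end 400_only 400_and_1500 400_1500_3000; do
    # Assumption: each output directory contains contigs.fasta.
    if [ -f "$dir/contigs.fasta" ]; then
        printf '%s\t%s\n' "$dir" "$(assembly_summary "$dir/contigs.fasta")"
    fi
done
```

Fewer, longer contigs generally suggest a more contiguous assembly, though these counts alone say nothing about correctness.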

...