Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...


The samtools mpileup command will take a few minutes to run. As practice for a fairly common occurrence when working with the iDEV environment, once the command is running, you should could try putting it in the background by pressing control-z and then typing the command bg so that you can do some other things in this terminal window at the same time. Remember, there are still many other processors available on this node for you to do other things! Just remember that if you have something running in the background you need to check in with it to see if it is still running with the ps command or watch the command line every time you execute a new command as you may see information about your background task having finished.

...

Code Block
languagebash
titleThe above result can be piped into sort to sort the lines into increasing order
 awk -F";" '{for(i=1;i<=NF;i++){if ($i ~ /AF1/){print $i}}}' SRR030257.vcf | sort
Expand
titleDid you see anything when you were running the commands in this tutorial to suggest that it was looking for diploid genomes?

After you ran the bcftools call command you saw: "Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid". Just like on a webpage you can use control/command + F to find specific text in the window. Lookf for 'diploid' and you should see the line referenced above.

Obviously this suggests a way that you could go back and reanalyze this data introducing one of the recommended flags to the bcftools call command and see how this might effect your analysis. If you choose to do this, I suggest adding descriptive file name between 'SRR030257' and '.vcf' to make the results easier to compare.

Expand
titleRather than (or in addition to) going back and changing the analysis to look for variation from our haploid genome, can you come up with a way to create a filtered .vcf file that only has variants with a frequency of 1?

An initial attempt might be something like this:

Code Block
languagebash
titleNote these 2 examples will give the exact same output. The first command gives you extra practice with piping, and may help order your thoughts somewhat.
cat SRR030257.vcf | grep AF1=1 > SRR030257.filtered.vcf



grep AF1=1 SRR030257.vcf > SRR030257.filtered.vcf

...

Expand
titleHow you would change this to variants that have a frequency of at least 90%?

Remember from our Raw Sequencing Data tutorial yesterday that we can group certain characters together by placing them between square brackets [].

Code Block
languagebash
title2 equivalent answers again
cat SRR030257.vcf | grep -v AF1=0.[0-8] > SRR030257.filtered.vcf

grep -v AF1=0.[0-8] SRR030257.vcf > SRR030257.filtered.vcf

Here we added a decimal point after the 0, and then allowed for a match to any digit between 0 and 8. Thus lines that have AF1=1 would not match, nor would a line with AF1=0.9 .

Optional Exercises at the end of class or for Wednesday/Thursday choose your own tutorial time.

...

  1. Trim both Read1 and Read2 using info from read preprocessing tutorial.
  2. Map reads with bowtie2 using info from read mapping tutorial.
  3. Call variants using this tutorial.

Remember in the intro tutorial we talked about file/direcotory directory naming. Be sure you don't write over your old files. Maybe create a new directories like GVA_samtools_bowtie_improved for the outputs.

...