...

For major bonus points and a great THANK YOU from Scott, compute the mean and standard deviation of the intersected and subtracted SNPs from NA12878 vs all, then perform a t-test to check whether the differences are statistically significant, using only Linux command-line tools (probably in a shell script). Yes, it's probably easier in Python, Perl, or R.
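One possible starting point, as a rough sketch rather than the intended solution: assume the two sets of counts have already been written to two plain text files, one number per line. The helper name `welch_t` and that file layout are assumptions made for this illustration. awk can then accumulate the sums needed for each group's mean and sample standard deviation and combine them into Welch's (unequal-variance) t statistic and its approximate degrees of freedom. Turning t into a p-value still needs a t-table or another tool.

```shell
# Hypothetical helper: summary statistics and Welch's t for two files,
# each holding one number per line. Assumes at least two values per file.
welch_t() {
  LC_ALL=C awk '
    # First file: count, sum and sum of squares for group 1.
    NR == FNR { n1++; s1 += $1; q1 += $1 * $1; next }
    # Second file: the same for group 2.
    { n2++; s2 += $1; q2 += $1 * $1 }
    END {
      m1 = s1 / n1; m2 = s2 / n2
      # Sample variances (n - 1 in the denominator).
      v1 = (q1 - n1 * m1 * m1) / (n1 - 1)
      v2 = (q2 - n2 * m2 * m2) / (n2 - 1)
      # Squared standard errors of the two means.
      e1 = v1 / n1; e2 = v2 / n2
      t = (m1 - m2) / sqrt(e1 + e2)
      # Welch-Satterthwaite approximation for the degrees of freedom.
      df = (e1 + e2) ^ 2 / (e1 ^ 2 / (n1 - 1) + e2 ^ 2 / (n2 - 1))
      printf "mean1=%.3f sd1=%.3f mean2=%.3f sd2=%.3f t=%.3f df=%.3f\n", \
        m1, sqrt(v1), m2, sqrt(v2), t, df
    }
  ' "$1" "$2"
}
```

It would be called as `welch_t first.counts second.counts`, where both file names are placeholders. The sum-of-squares shortcut is fine for SNP counts but loses precision on very large values.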

 

Other Linux utilities useful for making subsets of VCF files and comparing them

 

Code Block
Make files containing all the het & hom alt calls from a VCF, simplified to position, genotype, REF, and ALT:
awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' NA12878.raw.vcf \
  | sort -n | grep "0/1" > NA12878.raw.vcf.simple.het
awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' NA12878.raw.vcf \
  | sort -n | grep "1/1" > NA12878.raw.vcf.simple.hom
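To make the simplification concrete, here is a small sketch that runs the same awk expression on a single made-up VCF data line. The record values are invented for illustration and are not taken from NA12878, and `simplify_vcf` is a hypothetical wrapper name.

```shell
# simplify_vcf is a hypothetical wrapper around the awk expression used above.
# It keeps POS (column 2), the first three characters of the sample column
# (column 10, i.e. the GT value such as 0/1), REF (column 4) and ALT (column 5).
simplify_vcf() {
  awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' "$@"
}

# One invented data line: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
printf 'chr20\t12345\t.\tA\tG\t50\tPASS\tDP=20\tGT:DP\t0/1:20\n' | simplify_vcf
# prints: 12345<TAB>0/1<TAB>A<TAB>G
```

Note that the chromosome column is dropped, so the simplified files are only unambiguous when the VCF covers a single chromosome, and `substr($10,1,3)` assumes GT is the first entry of the sample column.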

 

 

 

Code Block
Make the same simplified het & hom alt files for the other two samples (NA12891 and NA12892), sorted as text so they can be used with join:
awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' NA12891.raw.vcf \
  | grep "0/1" | sort > NA12891.raw.vcf.simple.het
awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' NA12892.raw.vcf \
  | grep "0/1" | sort > NA12892.raw.vcf.simple.het
awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' NA12891.raw.vcf \
  | grep "1/1" | sort > NA12891.raw.vcf.simple.hom
awk 'BEGIN {FS="\t"} {print $2 "\t" substr($10,1,3) "\t" $4 "\t" $5}' NA12892.raw.vcf \
  | grep "1/1" | sort > NA12892.raw.vcf.simple.hom

 

  1. Now count how many genotypes (GT) are het in both parents (NA12891 and NA12892) but hom alt in the child (NA12878):

    Code Block
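    join NA12892.raw.vcf.simple.het NA12891.raw.vcf.simple.het > both.het
    # The NA12878 files were sorted numerically; join needs text order, so re-sort on the fly.
    sort NA12878.raw.vcf.simple.hom | join both.het - | wc -l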
    

    (Would you have expected this result?)

 

  2. Now count how many genotypes (GT) are hom alt in both parents (NA12891 and NA12892) but het in the child (NA12878):

    Code Block
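    join NA12892.raw.vcf.simple.hom NA12891.raw.vcf.simple.hom > both.hom
    # The NA12878 files were sorted numerically; join needs text order, so re-sort on the fly.
    sort NA12878.raw.vcf.simple.het | join both.hom - | wc -l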
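The two-step join used in these comparisons can be tried on toy data first. The sketch below uses invented positions and alleles and hypothetical file names (`mother.het`, `father.het`, `child.hom`). It is only meant to show that join matches on the first field (the position) and expects its inputs in text order, which is why 2000 comes before 300 here.

```shell
# Toy data only: positions and alleles are invented for illustration.
work=$(mktemp -d)

# Each file is already in text (not numeric) order on the first field.
printf '100\t0/1\tA\tG\n2000\t0/1\tC\tT\n' > "$work/mother.het"
printf '100\t0/1\tA\tG\n2000\t0/1\tC\tT\n300\t0/1\tG\tA\n' > "$work/father.het"
printf '2000\t1/1\tC\tT\n300\t1/1\tG\tA\n' > "$work/child.hom"

# Step 1: positions that are het in both parents (100 and 2000).
join "$work/mother.het" "$work/father.het" > "$work/both.het"

# Step 2: of those, positions that are hom alt in the child (2000 only).
count=$(join "$work/both.het" "$work/child.hom" | wc -l | tr -d ' ')
echo "het in both parents, hom in child: $count"
# prints: het in both parents, hom in child: 1
```

Because join compares only the position field, it does not check that REF and ALT agree between samples. Those columns are carried along in the output and can be inspected if that matters.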
    

 

Virmid - an advanced auto-screener

...