...
We'll return to this example data later to demonstrate a much more involved tool, GATK, to do the same steps in another tutorial.
...
This same type of analysis can be done on much larger cohorts, say cohorts of 100 or even 1000s of individuals with known disease state to attempt to identify associations between allelic state and disease state.
Warning | ||
---|---|---|
| ||
STOP HERE UNTIL FIXED |
A bcftools
auxiliary perl script called vcfutils.pl
can provide some useful stats. This is not part of a standard TACC module but it's in our common $BI/bin directory.
Code Block | ||
---|---|---|
| ||
vcfutils.pl qstats trios_tutorial.raw.vcf
|
No Format | ||||
---|---|---|---|---|
| ||||
tacc:/scratch/01821/ded/BDIB_Human_tutorial/samtools_example$ vcfutils.pl qstats trios_tutorial.raw.vcf
186 1011 1011 757 0 2.9803 0.0000 0.0000 2.9803
142 2036 2036 1441 0 2.4218 0.0000 0.0000 2.0059
122 3091 3091 2156 0 2.3059 0.0000 0.0000 2.1029
109 4177 4177 2925 0 2.3363 0.0000 0.0000 2.4259
99.5 5237 5237 3647 0 2.2937 0.0000 0.0000 2.1361
92 6273 6273 4354 0 2.2689 0.0000 0.0000 2.1489
85.5 7328 7328 5105 0 2.2964 0.0000 0.0000 2.4704
79 8352 8352 5811 0 2.2869 0.0000 0.0000 2.2201
74 9369 9369 6497 0 2.2622 0.0000 0.0000 2.0725
69 10553 10553 7311 0 2.2551 0.0000 0.0000 2.2000
65 11782 11782 8176 0 2.2673 0.0000 0.0000 2.3764
61 13059 13059 9090 0 2.2902 0.0000 0.0000 2.5179
57 14280 14280 9945 0 2.2941 0.0000 0.0000 2.3361
53 15301 15301 10656 0 2.2941 0.0000 0.0000 2.2935
48.8 16323 16323 11347 0 2.2803 0.0000 0.0000 2.0876
45 17460 17460 12122 0 2.2709 0.0000 0.0000 2.1409
41.8 18639 18639 12899 0 2.2472 0.0000 0.0000 1.9328
39.8 19660 19660 13572 0 2.2293 0.0000 0.0000 1.9339
37.8 21013 21013 14496 0 2.2243 0.0000 0.0000 2.1538
35.8 22309 22309 15380 0 2.2197 0.0000 0.0000 2.1456
33.8 23568 23568 16251 0 2.2210 0.0000 0.0000 2.2448
31.8 24846 24846 17101 0 2.2080 0.0000 0.0000 1.9860
29.8 26171 26171 17975 0 2.1931 0.0000 0.0000 1.9379
28 27308 27308 18688 0 2.1680 0.0000 0.0000 1.6816
26 28654 28654 19551 0 2.1478 0.0000 0.0000 1.7867
24 29947 29947 20371 0 2.1273 0.0000 0.0000 1.7336
22 31050 31050 21065 0 2.1097 0.0000 0.0000 1.6968
20 32081 32081 21719 0 2.0960 0.0000 0.0000 1.7347
17.1 33251 33251 22446 0 2.0774 0.0000 0.0000 1.6411
13.2 34447 34447 23238 0 2.0732 0.0000 0.0000 1.9604
10.4 35751 35751 24067 0 2.0598 0.0000 0.0000 1.7453
9.53 36772 36772 24724 0 2.0521 0.0000 0.0000 1.8049
8.65 38023 38023 25539 0 2.0457 0.0000 0.0000 1.8693
7.8 39301 39301 26360 0 2.0369 0.0000 0.0000 1.7965
6.98 40665 40665 27196 0 2.0192 0.0000 0.0000 1.5833
6.2 42394 42394 28243 0 1.9958 0.0000 0.0000 1.5352
5.46 44018 44018 29211 0 1.9728 0.0000 0.0000 1.4756
4.77 45553 45553 30168 0 1.9609 0.0000 0.0000 1.6557
4.13 47020 47020 31081 0 1.9500 0.0000 0.0000 1.6480
3.54 48572 48572 32063 0 1.9422 0.0000 0.0000 1.7228
3.01 50463 50463 33139 0 1.9129 0.0000 0.0000 1.3202
|
Undoubtedly, you this looks like nothing but gibberish to you and understandably so, a tab delimited output without headers is rarely if ever useful. It is unclear exactly what the column headings are
Here is an honest write-up about the Ts/Tv metric, referencing this 2002 paper by Ebersberger. Bottom line: between about 2.1 and 2.8 is OK for Ts/Tv (also called Ti/Tv). You can also get some quick stats with some linux one-liners on this page; there are more thorough analysis programs built to work with vcf's.