...
VCF format has alternative allele frequency tags; in our file they are denoted by AF1=. Try the following command to see what values we have in our file:

```bash
grep AF1= SRR030257.vcf
```
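Scrolling through every matching line gets tedious. You can tally how often each AF1 value occurs instead. The sketch below makes two assumptions. First, it builds a tiny hypothetical VCF with made-up records so that it runs on its own; with real data, point grep at SRR030257.vcf instead. Second, it assumes AF1 sits in the semicolon-separated INFO column.

```bash
# Hypothetical toy VCF standing in for SRR030257.vcf (records are made up)
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
printf 'chr\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n' >> "$vcf"
printf 'chr\t200\t.\tC\tT\t30\t.\tDP=15;AF1=0.5\n' >> "$vcf"
printf 'chr\t300\t.\tG\tA\t60\t.\tDP=25;AF1=1\n' >> "$vcf"
printf 'chr\t400\t.\tT\tC\t10\t.\tDP=8;AF1=0\n' >> "$vcf"

# -o prints only the matched text: "AF1=" plus everything up to the next ";" or whitespace.
# sort | uniq -c then counts how many records carry each distinct value.
counts=$(grep -o 'AF1=[^;[:space:]]*' "$vcf" | sort | uniq -c)
echo "$counts"

rm -f "$vcf"
```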
Optional: For the data we are dealing with, predictions with an allele frequency not equal to 1 are not really applicable. (The genome is haploid, so there aren't any heterozygotes.) How can we remove these lines from the file?
Try looking at grep --help to see what you can come up with.

Hint:

```bash
grep -v "something"  # The -v flag inverts the match, effectively showing you everything that does not match your input
```
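A quick way to see what -v does is to try it on a few throwaway lines. The letters here are arbitrary placeholders.

```bash
# Three input lines; with -v, grep prints only the lines that do NOT contain "b"
result=$(printf 'a\nb\nc\n' | grep -v b)
echo "$result"
```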
Possible answers:

```bash
cat input.vcf | grep AF1=1 > output.vcf
```

This is not practical. It keeps only the records that contain AF1=1 and throws away the header lines (which start with #), so we lose vital VCF formatting and may not be able to use this file in the future.
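To see the problem concretely, count the header lines (those starting with #) before and after the AF1=1 filter. This sketch uses a made-up two-record VCF rather than real data.

```bash
# Hypothetical toy VCF: two header lines and two records
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n' >> "$vcf"
printf 'chr\t200\t.\tC\tT\t30\t.\tDP=15;AF1=0.5\n' >> "$vcf"

# grep -c prints a count of matching lines; when nothing matches it prints 0 but
# exits with status 1, hence "|| true"
headers_before=$(grep -c '^#' "$vcf")
headers_after=$(grep AF1=1 "$vcf" | grep -c '^#' || true)
echo "header lines before: $headers_before, after grep AF1=1: $headers_after"

rm -f "$vcf"
```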
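```bash
cat input.vcf | grep -v AF1=0 > output.vcf
```

This preserves every line that does not contain the text AF1=0, header lines included, and is one way of doing this. Note that the pattern also matches values such as AF1=0.5, since that text begins with AF1=0.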
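The grep pattern works by text matching, so it relies on every AF1 value below 1 being written with a leading 0. If you would rather compare the number itself, awk can split the INFO column (column 8) and test AF1 numerically. This is a sketch under that assumption, not the tutorial's own method. It behaves like grep -v AF1=0 in one respect: it keeps header lines and any record without an AF1 tag. It runs on a made-up VCF whose third record uses scientific notation purely as an illustration of a value the text pattern would miss.

```bash
# Hypothetical toy VCF (records are made up)
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n' >> "$vcf"
printf 'chr\t200\t.\tC\tT\t30\t.\tDP=15;AF1=0.5\n' >> "$vcf"
printf 'chr\t300\t.\tT\tC\t10\t.\tDP=8;AF1=4.5e-05\n' >> "$vcf"

filtered=$(awk -F'\t' '
  /^#/ { print; next }                  # keep every header line unchanged
  {
    af = ""
    n = split($8, info, ";")            # INFO is the 8th tab-separated column
    for (i = 1; i <= n; i++)
      if (info[i] ~ /^AF1=/) af = substr(info[i], 5)   # value after "AF1="
    if (af == "" || af + 0 == 1) print  # no AF1 tag, or AF1 numerically equal to 1
  }' "$vcf")
echo "$filtered"

rm -f "$vcf"
```

On this toy file, grep -v AF1=0 would keep the AF1=4.5e-05 record, while the numeric test drops it.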
```bash
sed -i '/AF1=0/ d' input.vcf
```

This does the same filtering in place, without requiring you to make another file. (But it overwrites your existing file!)
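You can keep the convenience of in-place editing without losing the original. If you attach a suffix directly to -i (no space), sed saves the untouched file under that suffix. This attached form should also work with the BSD sed shipped on macOS, where a bare -i expects a suffix argument. Here is a sketch on a made-up file.

```bash
# Hypothetical toy VCF: one header line and two records
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$vcf"
printf 'chr\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n' >> "$vcf"
printf 'chr\t200\t.\tC\tT\t30\t.\tDP=15;AF1=0.5\n' >> "$vcf"

# Delete lines containing AF1=0 in place; the original is kept as "$vcf.bak"
sed -i.bak '/AF1=0/ d' "$vcf"

# $(( )) strips any padding that some wc implementations add
kept=$(( $(wc -l < "$vcf") ))          # lines left in the filtered file
backup=$(( $(wc -l < "$vcf.bak") ))    # lines in the untouched backup
echo "filtered: $kept lines, backup: $backup lines"

rm -f "$vcf" "$vcf.bak"
```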
...