...

The VCF file records alternative allele frequencies in tags denoted by AF1=. Try the following command to see what values we have in our files.

Code Block
grep AF1 SRR030257.vcf
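
Plain grep prints whole lines, which can be hard to scan. As a rough sketch, the distinct AF1 values can be tallied instead. The toy file below is a hypothetical stand-in for SRR030257.vcf (its records are made up for illustration), and the -o flag is assumed to be available, as it is in GNU and BSD grep.

```shell
# Build a tiny toy VCF (hypothetical records, not real data).
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '##INFO=<ID=AF1,Number=1,Type=Float,Description="Toy header line">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr\t100\t.\tA\tG\t50\t.\tDP=10;AF1=1;AC1=2\n'
  printf 'chr\t200\t.\tC\tT\t40\t.\tDP=12;AF1=0.5;AC1=1\n'
  printf 'chr\t300\t.\tG\tA\t60\t.\tDP=15;AF1=1;AC1=2\n'
} > "$vcf"

# Skip header lines, print only the AF1=... part of each record
# (stopping at the next ';' or whitespace), then count each value.
af1_counts=$(grep -v '^#' "$vcf" | grep -o 'AF1=[^;[:space:]]*' | sort | uniq -c)
echo "$af1_counts"
```

On a real file, replace "$vcf" with SRR030257.vcf and drop the toy-file setup.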
Optional: For the data we are dealing with, predictions with an allele frequency not equal to 1 are not really applicable. (The reference genome is haploid. There aren't any heterozygotes.) How can we remove these lines from the file?

Try looking at grep --help to see what you can come up with.

Here for answer

Code Block
grep -v *something*  # The -v flag inverts the match, effectively showing you every line that does not match your pattern

Going farther

Code Block
cat input.vcf | grep AF1=1 > output.vcf

This is not practical: the header lines do not contain AF1=1, so they are dropped too. We lose vital VCF formatting and may not be able to use this file in the future.

Code Block
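cat input.vcf | grep -v AF1=0 > output.vcf

This preserves all lines that don't have an AF1=0 value, header lines included, and is one way of doing this. Because grep matches substrings, the pattern also removes fractional values such as AF1=0.5.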
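
Another option, sketched here rather than taken from the original exercise, is to state explicitly what to keep: header lines (which start with #) plus records where AF1 is exactly 1. The toy input is hypothetical, and the pattern assumes AF1=1 is followed by a semicolon, whitespace, or the end of the line.

```shell
# Toy input (hypothetical records) and an empty output file.
in_vcf=$(mktemp)
out_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '##INFO=<ID=AF1,Number=1,Type=Float,Description="Toy header line">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr\t100\t.\tA\tG\t50\t.\tDP=10;AF1=1;AC1=2\n'
  printf 'chr\t200\t.\tC\tT\t40\t.\tDP=12;AF1=0.5;AC1=1\n'
  printf 'chr\t300\t.\tG\tA\t60\t.\tDP=15;AF1=1;AC1=2\n'
} > "$in_vcf"

# -E enables alternation: keep a line if it is a header (^#)
# or if it carries AF1=1 as a complete value.
grep -E '^#|AF1=1(;|[[:space:]]|$)' "$in_vcf" > "$out_vcf"
cat "$out_vcf"
```

Unlike a bare grep AF1=1, this keeps the header, so the output remains a usable VCF file.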

Code Block
sed -i '/AF1=0/ d' input.vcf

This does the same thing in place, without requiring you to make another file. (But it overwrites your existing file!)
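
If overwriting the original is a concern, one hedged variant is to give -i a backup suffix, which both GNU and BSD sed accept when written directly after the flag. The toy file below is hypothetical and stands in for input.vcf.

```shell
# Toy file to edit (hypothetical records).
work_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '##INFO=<ID=AF1,Number=1,Type=Float,Description="Toy header line">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr\t100\t.\tA\tG\t50\t.\tDP=10;AF1=1;AC1=2\n'
  printf 'chr\t200\t.\tC\tT\t40\t.\tDP=12;AF1=0.5;AC1=1\n'
  printf 'chr\t300\t.\tG\tA\t60\t.\tDP=15;AF1=1;AC1=2\n'
} > "$work_vcf"

# -i.bak edits the file in place but first saves the original
# alongside it with a .bak suffix.
sed -i.bak '/AF1=0/d' "$work_vcf"

# The edited file has lost the AF1=0.5 record; the backup still has it.
grep -c 'AF1=' "$work_vcf" "$work_vcf.bak"
```

The backup can be deleted once the filtered file has been checked.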

...