...

Optional: For the data we are dealing with, predictions with an allele frequency not equal to 1 are not really applicable. (The reference genome is haploid. There aren't any heterozygotes.) How can we remove these lines from the file?

Try looking at grep --help to see what you can come up with.

Here for answer:

Code Block
grep -v *something*  # The -v flag inverts the match, effectively showing you every line that does not match your input

Going farther
Code Block
cat SRR030257.vcf | grep AF1=1 > SRR030257.filtered.vcf

This is not practical: the header lines do not contain 'AF1=1', so we lose vital VCF formatting and may not be able to use this file later with programs that require that formatting.
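One way to keep the header while still selecting on 'AF1=1' is to let grep also match any line that starts with '#'. The sketch below is only an illustration: it builds a tiny stand-in file (toy.vcf, with made-up records) in a temporary directory so it can be run anywhere, and it assumes frequencies are written in plain decimal form.

```shell
# Build a tiny, hypothetical stand-in for SRR030257.vcf (made-up records).
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'ref\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n'
  printf 'ref\t200\t.\tC\tT\t30\t.\tDP=18;AF1=0.5\n'
  printf 'ref\t300\t.\tG\tA\t45\t.\tDP=25;AF1=0.95\n'
} > "$tmp/toy.vcf"

# -E enables alternation: keep header lines (starting with '#')
# OR lines that contain AF1=1.
grep -E '^#|AF1=1' "$tmp/toy.vcf" > "$tmp/toy.filtered.vcf"

cat "$tmp/toy.filtered.vcf"
```

On the real data the same pattern would be applied to SRR030257.vcf instead of the toy file.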

Code Block
cat SRR030257.vcf | grep -v AF1=0 > SRR030257.filtered.vcf

This preserves all lines that don't have an 'AF1=0' value on them, and is one way of doing this. If you look closely at the unfiltered file you will see that the frequencies are given as AF1=0.###, so by filtering out lines that have 'AF1=0' in them we get rid of all frequencies that are not 1, including, say, 'AF1=0.99'. How would you change this to keep variants that have a frequency of at least 90%?
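One possible approach to the 90% question, sketched on a made-up toy file rather than the real data: keep header lines, lines with 'AF1=1', and lines whose frequency starts with '0.9'. This assumes frequencies are written as plain decimals (AF1=1 or AF1=0.###); toy.vcf and its records are hypothetical.

```shell
# Build a tiny, hypothetical stand-in for SRR030257.vcf (made-up records).
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'ref\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n'
  printf 'ref\t200\t.\tC\tT\t30\t.\tDP=18;AF1=0.5\n'
  printf 'ref\t300\t.\tG\tA\t45\t.\tDP=25;AF1=0.95\n'
  printf 'ref\t400\t.\tT\tC\t40\t.\tDP=22;AF1=0.899\n'
  printf 'ref\t500\t.\tA\tC\t35\t.\tDP=21;AF1=0.09\n'
} > "$tmp/toy.vcf"

# Keep headers, AF1=1, and any AF1=0.9... value.
# The dot is escaped so it matches a literal '.', not any character.
grep -E '^#|AF1=(1|0\.9)' "$tmp/toy.vcf" > "$tmp/toy.min90.vcf"

cat "$tmp/toy.min90.vcf"
```

Because this is a text match rather than a numeric comparison, it only works while the values keep the format assumed above.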

Code Block
sed -i '/AF1=0/ d' SRR030257.vcf

This is a way of doing it in place, without making another file. (But it overwrites your existing file!)
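If overwriting the original is a concern, sed can keep a backup copy when a suffix is attached to the -i flag. The sketch below demonstrates this on a made-up toy file (toy.vcf is hypothetical); the '.bak' suffix is an arbitrary choice.

```shell
# Build a tiny, hypothetical stand-in for SRR030257.vcf (made-up records).
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'ref\t100\t.\tA\tG\t50\t.\tDP=20;AF1=1\n'
  printf 'ref\t200\t.\tC\tT\t30\t.\tDP=18;AF1=0.5\n'
  printf 'ref\t300\t.\tG\tA\t45\t.\tDP=25;AF1=0.95\n'
} > "$tmp/toy.vcf"

# -i.bak edits toy.vcf in place and saves the untouched original as toy.vcf.bak.
sed -i.bak '/AF1=0/d' "$tmp/toy.vcf"

cat "$tmp/toy.vcf"       # filtered file
cat "$tmp/toy.vcf.bak"   # original contents, kept as a backup
```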

...