VCF/BCF Documentation

Here is the documentation for VCF.

VCFsnippet

Tools for analyzing VCF/BCF data

R based tools: vcfR

Python based tools: PyVCF, cyvcf2

Linux based tools: bcftools, github_bcftools, bioconda_bcftools

Analyze VCF file

samtools mpileup -f xxxxxx.fas -uv xxxxxx.sorted.bam > xxxxxx.vcf

number of entries for a particular chromosome

cat xxxxxx.vcf | grep -v "^##" | cut -f1 | grep -c "^Chr1$"

cat xxxxxx.vcf | cut -f1 | grep "^Chr1$" | wc -l
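The difference between a prefix match and an exact match on the CHROM column can be checked on a toy file. The sketch below builds a small, made-up VCF (the records, the temp file, and the variable names are assumptions for illustration, not real data) and compares the two counts when both Chr1 and Chr10 are present.

```shell
# Build a tiny, made-up VCF in a temp file (illustrative data only).
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=Chr1>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'Chr1\t100\t.\tA\tT\t50\t.\tDP=20;MQ=60\n'
  printf 'Chr1\t200\t.\tG\tC\t50\t.\tDP=200;MQ=60\n'
  printf 'Chr10\t300\t.\tA\tG\t50\t.\tDP=20;MQ=60\n'
  printf 'Chr3\t400\t.\tAT\tA\t50\t.\tINDEL;DP=15\n'
} > "$vcf"

# Prefix match on column 1: Chr10 is counted as well.
prefix_count=$(grep -v '^#' "$vcf" | cut -f1 | grep -c '^Chr1')

# Exact match on column 1: only records whose CHROM is exactly Chr1.
exact_count=$(awk -F'\t' '!/^#/ && $1 == "Chr1" {n++} END {print n+0}' "$vcf")

echo "prefix match: $prefix_count, exact match: $exact_count"
rm -f "$vcf"
```

On this toy input the prefix match reports one more record than the exact match, because the Chr10 record also starts with Chr1.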

number of adenines (A) in the file

cat xxxxxx.vcf | grep -v "^##" | cut -f4 | grep -c -P "^A$"

cat xxxxxx.vcf | cut -f4 | awk '$1 == "A"' | wc -l

number of entries that have exactly 20 supporting reads (read depth)

cat xxxxxx.vcf | grep -v "^##" | grep -c "DP=20;"

cat xxxxxx.vcf | cut -f8 | grep -E '(^|;)DP=20(;|$)' | wc -l
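A plain substring search for DP=20 also matches DP=200, and a search for "DP=20;" misses records where DP is the last INFO key. As a minimal sketch under assumed data (the three records and variable names below are made up), the INFO column can be split on ";" and each key=value pair compared exactly.

```shell
# Tiny, made-up VCF (illustrative data only).
vcf=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'Chr1\t100\t.\tA\tT\t50\t.\tDP=20;MQ=60\n'
  printf 'Chr1\t200\t.\tG\tC\t50\t.\tDP=200;MQ=60\n'
  printf 'Chr1\t300\t.\tC\tT\t50\t.\tMQ=60;DP=20\n'
} > "$vcf"

# Split INFO (column 8) on ";" and count records that carry exactly DP=20.
dp20=$(awk -F'\t' '!/^#/ {
  n = split($8, kv, ";")
  for (i = 1; i <= n; i++) if (kv[i] == "DP=20") { hits++; break }
} END { print hits+0 }' "$vcf")

echo "records with DP=20: $dp20"
rm -f "$vcf"
```

Here the DP=200 record is skipped and the record with DP=20 at the end of INFO is still counted.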

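number of entries that represent indels

cat xxxxxx.vcf | grep -v "^##" | grep -c INDEL

cat xxxxxx.vcf | cut -f8 | grep "INDEL" | wc -l (subtract one from the result if the header contains an ##INFO=<ID=INDEL line: cut passes lines without tabs through unchanged, so that header line is counted too)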
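The off-by-one comes from how cut treats lines that contain no delimiter: without -s it prints them whole, so a header line mentioning INDEL reaches grep. The sketch below reproduces this on a small, made-up VCF (records and variable names are assumptions for illustration) and shows that dropping header lines first removes the need for a manual correction.

```shell
# Tiny, made-up VCF with an INDEL definition in the header (illustrative data only).
vcf=$(mktemp)
{
  printf '##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'Chr1\t100\t.\tA\tT\t50\t.\tDP=20\n'
  printf 'Chr1\t150\t.\tAT\tA\t50\t.\tINDEL;DP=12\n'
  printf 'Chr3\t400\t.\tG\tGTT\t50\t.\tINDEL;DP=30\n'
} > "$vcf"

# Header lines have no tab, so cut prints them unchanged and grep counts one of them.
with_header=$(cut -f8 "$vcf" | grep -c 'INDEL')

# Removing header lines first gives the number of indel records directly.
records_only=$(grep -v '^#' "$vcf" | cut -f8 | grep -c 'INDEL')

echo "with header: $with_header, records only: $records_only"
rm -f "$vcf"
```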

number of entries on a specific chromosome and specific location (e.g., pos 175672 on Chr1)

cat xxxxxx.vcf | grep -v "^##" | cut -f1,2 | grep -c -P "^Chr1\t175672$"

cat xxxxxx.vcf | cut -f1,2 | awk '$1 == "Chr1" && $2 == "175672"' | wc -l

Analyze compressed VCF (VCF.gz) file

samtools mpileup -f xxxxxx.fas -g xxxxxx.sorted.bam > xxxxxx.bcf

all reported variants on chromosome 3

zcat xxxxxx.vcf.gz | cut -f1 | grep "^Chr3$" | wc -l

number of A to T mutations

zcat xxxxxx.vcf.gz | cut -f4,5 | awk '$1 == "A" && $2 == "T"' | wc -l
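The same REF/ALT comparison extends to a table of all single-base substitutions in one pass. The following is a sketch on a small, made-up compressed VCF (the records and variable names are assumptions); it uses gzip -dc in place of zcat only because zcat behaves differently on some systems, and it skips any record whose REF or ALT is longer than one character, which includes indels and multi-allelic sites.

```shell
# Tiny, made-up gzip-compressed VCF (illustrative data only).
vcf_gz=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'Chr3\t100\t.\tA\tT\t50\t.\tDP=20\n'
  printf 'Chr3\t200\t.\tA\tT\t50\t.\tDP=31\n'
  printf 'Chr3\t300\t.\tG\tC\t50\t.\tDP=18\n'
  printf 'Chr3\t400\t.\tAT\tA\t50\t.\tINDEL;DP=15\n'
  printf 'Chr1\t500\t.\tC\tT\t50\t.\tDP=22\n'
} | gzip -c > "$vcf_gz"

# Count each REF>ALT pair among single-base substitutions (columns 4 and 5).
table=$(gzip -dc < "$vcf_gz" | awk -F'\t' '
  !/^#/ && length($4) == 1 && length($5) == 1 { c[$4 ">" $5]++ }
  END { for (k in c) print k "\t" c[k] }' | sort)

printf '%s\n' "$table"

# Pull one row out of the table, here the A>T count.
a_to_t=$(printf '%s\n' "$table" | awk -F'\t' '$1 == "A>T" {print $2}')
rm -f "$vcf_gz"
```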

number of indels in the file

zcat xxxxxx.vcf.gz | grep -v "^#" | cut -f8 | grep "INDEL" | wc -l

finding the type of mutation at a particular position in a particular chromosome

zcat xxxxxx.vcf.gz | awk '$1 == "Chr3" && $2 == "11937923"' | head
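Once the record is found, a rough classification can be read from the lengths of REF and ALT. The sketch below assumes a made-up VCF and hypothetical labels ("SNP", "indel_or_complex"); it treats a record as a SNP only when both alleles are a single base, so multi-allelic sites fall into the second label.

```shell
# Tiny, made-up VCF (illustrative data only).
vcf=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'Chr1\t11937923\t.\tA\tG\t60\t.\tDP=25\n'
  printf 'Chr3\t11937923\t.\tGA\tG\t60\t.\tINDEL;DP=18\n'
  printf 'Chr3\t11937999\t.\tC\tT\t60\t.\tDP=40\n'
} > "$vcf"

# Look up one chromosome/position pair and label the record by allele length.
kind=$(awk -F'\t' -v chrom="Chr3" -v pos="11937923" '
  !/^#/ && $1 == chrom && $2 == pos {
    if (length($4) == 1 && length($5) == 1) print "SNP"
    else print "indel_or_complex"
  }' "$vcf")

echo "Chr3:11937923 is classified as: $kind"
rm -f "$vcf"
```

Matching on both columns keeps the Chr1 record at the same position out of the result.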

number of entries that have precisely 20 supporting reads (read depth)

zcat xxxxxx.vcf.gz | cut -f8 | grep -E '(^|;)DP=20(;|$)' | wc -l

Convert BCF to VCF for analysis

call variants using 'bcftools call' with the multiallelic caller (option '-m'), showing only variant sites ('-v') and writing the output in uncompressed VCF format ('-O v')

bcftools call -m -v -O v xxxxxx.bcf > xxxxxx.vcf

all reported variants on chromosome 3

cat xxxxxx.vcf | grep -v "^##" | cut -f1 | sort | uniq -c | grep "Chr3$"

number of A to T mutations (-P enables Perl-style patterns, '\t' matches the tab between REF and ALT)

cat xxxxxx.vcf | grep -v "^##" | cut -f4,5 | grep -P "^A\tT$" | wc -l

number of indels

cat xxxxxx.vcf | grep -v "^#" | cut -f8 | grep "INDEL" | wc -l
