Variant calling using bcftools:

First, we need to install bcftools using conda into our environment

Activate conda env dablab

conda activate dablab

Install bcftools and its dependencies if required.


conda install -c bioconda bcftools

conda install -c conda-forge -c bioconda libdeflate=1.0

Download the alignment and its index file if you missed the previous tutorials: #

https://drive.google.com/file/d/1cVdoRr3cl-OF9jPMtSHFW31VhkHDuKMg/view?usp=sharing

https://drive.google.com/file/d/1_xsFlA9zPlP4U4qXW2iFuK154yCaPC6d/view?usp=sharing

Running the commands mpileups and call through a pipe |


bcftools mpileup -f bowtie2-2.4.4/example/reference/lambda_virus.fa virus.align.sorted.bam | \
bcftools call -mv -Ov --ploidy 1 -o lambda_virus.vcf

mpileup part generates genotype likelihoods at each genomic position with coverage. The second call part makes the actual calls. The -m switch tells the program to use the default calling method, the -v option asks to output only variant sites, finally the -O option selects the output format.

We also use --ploidy 1 since the genome we are working is a virus.

Investigate the vcf output and info entries
less lambda_virus.vcf

A quality score (the QUAL) column, which gives an estimate of how likely it is to observe a call purely by chance.


cut -f1-6 lambda_virus.vcf|less

bcftools view -i '%QUAL>=20' lambda_virus.vcf

bcftools view -i '%QUAL>=30' lambda_virus.vcf| cut -f1-6|grep -v "#"|wc -l

bcftools view -i '%QUAL>=20' lambda_virus.vcf | bcftools stats

bcftools view -i '%QUAL>=20' lambda_virus.vcf | bcftools stats| grep TSTV