First, we need to install bcftools using conda into our environment
Activate conda env dablab
conda activate dablab
Install bcftools and its dependencies if required.
conda install -c bioconda bcftools
conda install -c conda-forge -c bioconda libdeflate=1.0
Download the alignment and its index file if you missed the previous tutorials: #
https://drive.google.com/file/d/1cVdoRr3cl-OF9jPMtSHFW31VhkHDuKMg/view?usp=sharing
https://drive.google.com/file/d/1_xsFlA9zPlP4U4qXW2iFuK154yCaPC6d/view?usp=sharing
Running the commands mpileups and call through a pipe |
bcftools mpileup -f bowtie2-2.4.4/example/reference/lambda_virus.fa virus.align.sorted.bam | \
bcftools call -mv -Ov --ploidy 1 -o lambda_virus.vcf
mpileup
part generates genotype likelihoods at each genomic position with coverage. The second call
part makes the actual calls. The -m
switch tells the program to use the default calling method, the -v
option asks to output only variant sites, finally the -O
option selects the output format.
We also use --ploidy 1
since the genome we are working is a virus.
less lambda_virus.vcf
A quality score (the QUAL) column, which gives an estimate of how likely it is to observe a call purely by chance.
cut -f1-6 lambda_virus.vcf|less
bcftools view -i '%QUAL>=20' lambda_virus.vcf
bcftools view -i '%QUAL>=30' lambda_virus.vcf| cut -f1-6|grep -v "#"|wc -l
bcftools view -i '%QUAL>=20' lambda_virus.vcf | bcftools stats
bcftools view -i '%QUAL>=20' lambda_virus.vcf | bcftools stats| grep TSTV