1 Indexing: Prepare reference for mapping

To perform the QTL analysis on this dataset, we first need to map all fastq files to the P9424 reference and do variant calling. We will be using both BWA and SAMTOOLS; Index with both to ensure each of the tools finds it’s respective indexes dow the line

bwa index P9424.fna
samtools faidx P9424.fna

2 Map reads to reference


for fq in `ls AGRF_CAGRF16155_HVGHWAFXX_gbs/fastq/*.gz`; do id=$(basename $fq| cut -f 1 -d \.); echo bwa mem -t 10 P9424.fna $fq  \> bwa_bam_files/$id.sam; done > run_bwa.sh
mkdir bwa_bam_files
bash run_bwa.sh

3 SAMTOOLS and BCFtools

ls *.sam | xargs -I {} echo samtools view -S -b {} \> {}.bam > sam2bam.sh ## sam to bam create script
bash sam2bam.sh ## sam to bam run script
`rename 's/\.sam.bam/\.bam/g' *.bam` ## rename extensions 
ls *.bam | xargs -I {} echo samtools sort {} -o {}.sorted.bam ## sort bam || remove all sorted.bam before retrying this step
`rename 's/\.bam.sorted.bam/\.sorted.bam/g' *.bam` ## rename extensions 
ls *.sorted.bam | xargs -I {} echo samtools index -\@10 {} ## index bam 

## Variant calling
for i in `ls bwa_bam_files/*.sorted.bam`; do
  id=$(basename $i)
  echo bcftools mpileup -O u -f P9424.fna $i \| \
  bcftools call --ploidy 1 -mv -O z -o bcf_variants/raw_vcf/${id/.sorted.bam/.vcf}
done > call_variants.sh

parallel -j 10 -k < call_variants.sh

ls *.vcf | xargs -I {} echo vcfutils.pl varFilter {}  \> {}_filtered.vcf ## in the `bcf_variants/raw_vcf` vcf files folder
ls *.vcf | xargs -I {} bgzip {}
ls *.vcf.gz | xargs -I {} tabix -p  vcf {} ## index

for i in *.gz; do 
  bcftools reheader -s sample $i > m.$i;
  mv m.$i $i;
  tabix -f -p vcf $i;
done ## reheader + re-index

echo bcftools merge *.gz --threads 16 -O v -o P9424_merged_ddRADseq.vcf ## merge all indexed files

sed -i 's/\.bam//g' P9424_merged_ddRADseq.vcf
sed -i 's/bwa_bam_files\///g' P9424_merged_ddRADseq.vcf

3.1 MPILEUP For all samples together.

ls *.sorted.bam > samples_bams.txt
bcftools mpileup -O u -b samples_bams.txt -f P9424.fna --threads 16 | bcftools call --ploidy 1 -mv -O v -o P9424_all_samples.vcf

4 RAXML Phylogeny of the samples

python3 vcf2phylip.py --fasta --phylip-disable -m 1 --input merged-ddRADseq.vcf ## convert to fasta
snp-sites -m merged-ddRADseq.min1.fasta -o merged-ddRADseq.msa.fasta

raxmlHPC -m GTRGAMMA -p 12345 -s merged-ddRADseq.msa.fasta -n A_lentis_ddRADseq_bootstrap -f a -x 12345 -N 100 -T 12

The protocol for merging VCF files can be found here

5 Running STACKS variant calling

The ref_map.pl program will execute the STACKS pipeline by running each of the Stacks components individually. The pipeline takes two files, a bam file of the reference aligned sample, and a map file specifying samples<>TAB<>POPULATION.

The files are located at: /ppgdata/fredrick/assembly_data/ascochyta/ddRADseq_P9424

for i in `ls bwa_bam_files/*.bam`; do 
  id=$(basename $i)
  gp=$(basename $i | cut -f1 -d \_ | cut -f 2 -d \-)
  echo -e $id"\t"$gp
done > pop.map

mkdir stacks
ref_map.pl -T 8 -o stacks --popmap pop.map --samples bwa_bam_files --unpaired 
denovo_map.pl -M 7 -T 8 -o ./stacks --popmap pop.map --samples AGRF_CAGRF16155_HVGHWAFXX_gbs/fastq

6 Session information

This script was last updated on:

## Thu 18 Feb 2021, 10:04:57

………………. THE END ………………….

Author: Fredrick M. Mobegi, PhD
Created: 02-11-2020 Mon 08:00
Copyright © 2020 Fredrick Mobegi | This notebook is for research purposes only and may contain links to embargoed or legally privileged data.