To perform the QTL analysis on this dataset, we first need to map all fastq files to the P9424 reference and do variant calling. We will be using both BWA and SAMTOOLS; Index with both to ensure each of the tools finds it’s respective indexes dow the line
bwa index P9424.fna
samtools faidx P9424.fna
for fq in `ls AGRF_CAGRF16155_HVGHWAFXX_gbs/fastq/*.gz`; do id=$(basename $fq| cut -f 1 -d \.); echo bwa mem -t 10 P9424.fna $fq \> bwa_bam_files/$id.sam; done > run_bwa.sh
mkdir bwa_bam_files
bash run_bwa.sh
ls *.sam | xargs -I {} echo samtools view -S -b {} \> {}.bam > sam2bam.sh ## sam to bam create script
bash sam2bam.sh ## sam to bam run script
`rename 's/\.sam.bam/\.bam/g' *.bam` ## rename extensions
ls *.bam | xargs -I {} echo samtools sort {} -o {}.sorted.bam ## sort bam || remove all sorted.bam before retrying this step
`rename 's/\.bam.sorted.bam/\.sorted.bam/g' *.bam` ## rename extensions
ls *.sorted.bam | xargs -I {} echo samtools index -\@10 {} ## index bam
## Variant calling
for i in `ls bwa_bam_files/*.sorted.bam`; do
id=$(basename $i)
echo bcftools mpileup -O u -f P9424.fna $i \| \
bcftools call --ploidy 1 -mv -O z -o bcf_variants/raw_vcf/${id/.sorted.bam/.vcf}
done > call_variants.sh
parallel -j 10 -k < call_variants.sh
ls *.vcf | xargs -I {} echo vcfutils.pl varFilter {} \> {}_filtered.vcf ## in the `bcf_variants/raw_vcf` vcf files folder
ls *.vcf | xargs -I {} bgzip {}
ls *.vcf.gz | xargs -I {} tabix -p vcf {} ## index
for i in *.gz; do
bcftools reheader -s sample $i > m.$i;
mv m.$i $i;
tabix -f -p vcf $i;
done ## reheader + re-index
echo bcftools merge *.gz --threads 16 -O v -o P9424_merged_ddRADseq.vcf ## merge all indexed files
sed -i 's/\.bam//g' P9424_merged_ddRADseq.vcf
sed -i 's/bwa_bam_files\///g' P9424_merged_ddRADseq.vcf
ls *.sorted.bam > samples_bams.txt
bcftools mpileup -O u -b samples_bams.txt -f P9424.fna --threads 16 | bcftools call --ploidy 1 -mv -O v -o P9424_all_samples.vcf
python3 vcf2phylip.py --fasta --phylip-disable -m 1 --input merged-ddRADseq.vcf ## convert to fasta
snp-sites -m merged-ddRADseq.min1.fasta -o merged-ddRADseq.msa.fasta
raxmlHPC -m GTRGAMMA -p 12345 -s merged-ddRADseq.msa.fasta -n A_lentis_ddRADseq_bootstrap -f a -x 12345 -N 100 -T 12
The protocol for merging VCF files can be found here
The ref_map.pl program will execute the STACKS pipeline by running each of the Stacks components individually. The pipeline takes two files, a bam file of the reference aligned sample, and a map file specifying samples<>TAB<>POPULATION.
The files are located at: /ppgdata/fredrick/assembly_data/ascochyta/ddRADseq_P9424
ref_map.pl uses BAM files aligned to reference.denovo_map.pl uses RAW fastq files independent of reference.for i in `ls bwa_bam_files/*.bam`; do
id=$(basename $i)
gp=$(basename $i | cut -f1 -d \_ | cut -f 2 -d \-)
echo -e $id"\t"$gp
done > pop.map
mkdir stacks
ref_map.pl -T 8 -o stacks --popmap pop.map --samples bwa_bam_files --unpaired
denovo_map.pl -M 7 -T 8 -o ./stacks --popmap pop.map --samples AGRF_CAGRF16155_HVGHWAFXX_gbs/fastq
This script was last updated on:
## Thu 18 Feb 2021, 10:04:57
………………. THE END ………………….
Author: Fredrick M. Mobegi, PhD
Created: 02-11-2020 Mon 08:00
Copyright © 2020 Fredrick Mobegi | This notebook is for research purposes only and may contain links to embargoed or legally privileged data.