The recombination frequency of Varroa destructor, a
parasitic mite of honeybees was estimated for male and female adult
mites. We used two analysis methods: manual calculation of exact
recombination frequency, and computational estimation using a linkage
mapping software, Lep-MAP3 (Rastas 2017). For both analyses we used as
input a VCF file containing only the ‘Informative sites’. Informative
sites are sites that are heterozygotic in the F1 female, and homozygotic
for one allele in the F1 male, and his mother (F0 female). Only for
these sites we can phase (determine the allele parental origin) the F2
generation genotypes, and follow the inheritance of specific sites
through the generations (Fig __).
All biosamples are available in Sequence Read Archive (SRA) under the
accession PRJNA794941.
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.4 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library("plyr")
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
##
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
##
## The following object is masked from 'package:purrr':
##
## compact
library("dplyr")
library("ggplot2")
library("scales")
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library("ggpubr")
##
## Attaching package: 'ggpubr'
##
## The following object is masked from 'package:plyr':
##
## mutate
library("gridExtra") # for arranging a few plots in one area
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
#library("grid")
#library("GGally")
library("data.table")
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
library("stringr")
library("janitor")
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library("plotly") # to identify outliers.
##
## Attaching package: 'plotly'
##
## The following objects are masked from 'package:plyr':
##
## arrange, mutate, rename, summarise
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library("readr") # to extract numbers from a vector
library("tidyr")
library("LinkageMapView") # for constructing the linkage map
library("vcfR") # for extracting genotype data from a vcf file
##
## ***** *** vcfR *** *****
## This is vcfR 1.14.0
## browseVignettes('vcfR') # Documentation
## citation('vcfR') # Citation
## ***** ***** ***** *****
knitr::opts_chunk$set(include = FALSE)
Extract genotypes for each site and individual. The metadata for all samples can be found in here.
We used the VCF file in two methods to estimate varroa mite
recombination frequency:
- Lep-MAP3, a linkage mapping software (Rastas
2017).
- Manual estimation of the recombination rate, R script
can be found in the file manually-estimating-recom.Rmd
We followed the documentation in the Lep-MAP3 Wiki
page, with a few adjustments (See Fig 1).
We ran the first two modules: ParentCall2,
Filtering2, then skipped to the last module
OrderMarkers2 that produces the final map file.
The input and output files for each modules are as follow:
(1) ParentCall2 module; options = removNonInformative;
input = filtered VCF file and pedigree.txt; output = data.call.
(2) Filtering2 module; options = dataTolerance=0.0001,
removeNonInformative; input = data.call; output = data_f.call. (3)
OrderMarkers2 module; options = useKosambi=1
numMergeIterations=100 sexAveraged=1 outputPhasedData=2
grandparentPhase=1; input = data_f.call and map.txt files; output =
order.txt and order.mapped files.
In the following codes we generated two files necessary for the
Lep-MAP analysis:
- the list of informative sites (used to filter the original VCF file),
for the first module, ParentCall2; - and the map.txt
file, containing the physical position of sites, for the last module,
OrderMarkers2.
Keep:
Change:
For 0/0 x 0/1 cross:
F1_fem = 0/1:22:10,12:10:353:12:444:-33.6837,0,-25.484
F1_male = 0/0:17:17,0:17:565:0:0:0,-5.11751,-51.1547
F0 = 0/0:17:17,0:17:565:0:0:0,-5.11751,-51.1547
For 1/1 x 0/1 cross:
F1_fem = 0/1:22:10,12:10:353:12:444:-33.6837,0,-25.484
F1_male = 1/1:17:0,17:0:0:17:629:-56.9466,-5.11751,0
F0 = 1/1:17:0,17:0:0:17:629:-56.9466,-5.11751,0
Code for changing sample genotype, see this bash script
filter for informative sites in the original VCF file.
in each family, filter for sites with:
(a) homo x hetero F1 cross (0/0 x 0/1), so we can predict the parental
and recombinant F2 types.
(b) homo F0 (0/0), so we can phase the hetero F1 sites.
There are 13,651 informative sites for the 0/0 x 0/1 F1
“family”.
Save the file as tsv, so it can be used in the next steps.
#write_delim(InfoSites_00_01, "/Users/nuriteliash/Documents/GitHub/varroa-linkage-map/results/InfoSites_00_01.tsv", col_names = FALSE)
# check manually a few IDs and sites,
#InfoSites_00_01 <- reduce(sites, bind_rows) %>% as.data.frame() %>% distinct(site, .keep_all = TRUE)
#tableInfo = InfoSites_00_01 %>% select(c(site,`57_58b_grnson`))
#tableAll = df %>% select(c(site,`57_58b_grnson`))
#left_join(tableInfo, tableAll, by = "site") %>% rename(c("57_58b_grnson.x" ="InfoSites", "57_58b_grnson.y" = "AllSites")) %>% view()
There are 12,158 informative sites for the 1/1 x 0/1 F1
“family”
Save the file as tsv, so it can be used in the next steps.
#write_delim(InfoSites_11_01, "/Users/nuriteliash/Documents/GitHub/varroa-linkage-map/results/InfoSites_11_01.tsv", col_names = FALSE)
# check manually a few IDs and sites,
#InfoSites_11_01 <- reduce(sites, bind_rows) %>% as.data.frame() %>% distinct(site, .keep_all = TRUE)
#tableInfo = InfoSites_11_01 %>% select(c(site,`300_301a_grnson`))
#tableAll = df %>% select(c(site,`300_301a_grnson`))
#check <- left_join(tableInfo, tableAll, by = "site") %>% rename(c("300_301a_grnson.x" ="InfoSites", "300_301a_grnson.y" = "AllSites")) %>% na.omit("InfoSites")
#all(check$InfoSites == check$AllSites)
make a map.txt file with the sites assigned to 7 LGs based on their physical position use this file as an input for the Order module.
the ordering doesn’t have to be fantastic either at the end. We’re just looking to see if we can get some sort of recombination rate estimates and a sense of which sex is showing recombination.
the map.txt files are used along with the Data_f.call (generated by the Filtering2 module) will be used as input for the final module, OrderMarkers2 module.
the outputs of the OrderMarkers2 module are
order.mapped files.
These are used in the next chunk: - correlating the physical and genetic
positions of the sites.
- making the genetic map of varroa mite.
Does the markers’ physical position correlates to its genetic position, as determined by recombination rate?
The genetic maps are based on two different crosses: - 0/0 x 0/1 - 1/1 x 0/1