In this study we explored the mode of genetic inheritance of Varroa destructor, a parasitic mite of honeybees, so far known as a haplodiploid species with a arrhenotokous parthenogenesis reproductive mode. The mite shows contrasting phenomena of highly inbreeding life style (sib-mating), yet maintaining relatively high genetic variation.
The experimental setup consisted of a three-generational
pedigree.
Sample size: 30 families, total of 223 individuals.
3 different colonies, from OIST apiary.
For details of mite collection and pedigree construction, please see the
original manuscript.
All biosamples are available in Sequence Read Archive (SRA) under the
accession PRJNA794941.
All data are available and reproducible from the GitHub
page.
library("tidyverse")
library("plyr")
library("dplyr")
library("ggplot2")
library("scales")
library("ggpubr")
library("gridExtra")
library("grid")
library("GGally")
library("vcfR") # for extracting genotype data from a vcf file
library("data.table")
library("stringr")
library("janitor")
library("gmodels")
library("rstatix")
library("freqtables")
library("broom")
library("DescTools") # for the Goodness-of-Fit test, 3 variables; and for Breslow-Day Test for Homogeneity of the Odds Ratios
library("vcd") # for the woolf test testign log odds for each pair
library("patchwork") # for gathering the plots
library("fuzzyjoin") # to join tables based on a string in a column
#library("aspi") # Repeated G–tests of Goodness-of-Fit, work only for 2 variables..
#library("RVAideMemoire") # Repeated G–tests of Goodness-of-Fit, work only for 2 variables...
#library("InfoTrad")
#library("ggthemes") # for more colors in the ggplot
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE,
fig.width = 10,
fig.asp = 0.4,
out.width = "100%")
#fig.width = 6,fig.asp = 0.8,out.width = "100%"
Extract genotypes for each site and individual. The metadata for all samples can be found in here.
vcf <- read.vcfR("/Users/nuriteliash/Documents/GitHub/varroa-linkage-map/data/vcf_filter/Q40BIALLDP16HDP40mis.5Chr7/Q40BIALLDP16HDP40mis.5Chr7.recode.vcf", verbose = FALSE )
vcf
## ***** Object of Class vcfR *****
## 223 samples
## 7 CHROMs
## 35,169 variants
## Object size: 293.5 Mb
## 0 percent missing data
## ***** ***** *****
# extract the genotype for each site in each individual
gt <- extract.gt(vcf, element = "GT")
gt <- as.data.frame(t(gt)) %>%
rownames_to_column("ID")
#clean the ID names
gt$ID <- sub("_[^_]+$", "", gt$ID)
The individual ID name nomenclature is composed of 3 parts, separated
by an underscore (_).
The first part is the family ID, the second is the unique name of the
individual, and the the third part indicates its generation, sex, and
its position in relation to the foundress mite (generation F0).
For example:
Individual ID 240_240a_son, belong to family
240,
has a unique name of 240a,
and is the son of the foundress mite of family 240, that is,
its a male of F1 generation.
Individual ID 240_241b_grndat, belong to family
240,
has a unique name of 241b,
and is the granddaughter of the foundress mite of family 240,
that is, its a female of F2 generation (and the daughter of
240_240a_son).
From the 223 individuals of 30 families, we excluded non adults mites
(for which sex could not be determined for sure).
Eventually, we kept 144 individuals (adult males and females in F0, F1
and F2 generations) of 26 families, and all 35,169 biallelic sites.
For example, these are the first 6 sites (13565-25513) of chromosome (NW_019211454.1), in family number 240.
table <- gt %>%
t() %>%
as.data.frame() %>%
row_to_names(row_number = 1) %>%
dplyr::select(contains(c("son", "dat", "fnd"))) # keep only adults of F0, F1 and F2
table %>% select(contains(c("240_"))) %>% head()
## 240_240a_son 240_241c_grnson 240_241b_grndat
## NW_019211454.1_13565 0/1 <NA> <NA>
## NW_019211454.1_18037 0/0 <NA> <NA>
## NW_019211454.1_20998 <NA> <NA> <NA>
## NW_019211454.1_24935 <NA> 0/0 <NA>
## NW_019211454.1_25470 <NA> <NA> <NA>
## NW_019211454.1_25513 <NA> <NA> <NA>
## 240_241a_grndat 240_241_dat 240_240_fnd
## NW_019211454.1_13565 0/1 0/1 0/0
## NW_019211454.1_18037 0/0 0/0 0/0
## NW_019211454.1_20998 0/0 0/0 0/0
## NW_019211454.1_24935 0/0 0/0 0/0
## NW_019211454.1_25470 0/0 0/0 0/0
## NW_019211454.1_25513 0/0 <NA> 0/0
The family members include:
Each individual, is genotyped for each site with one of the three genotypes:
For more information about the mapping, variant calling and variant filtration, please see the Snakemake pipeline in here.
In the following code we viewed the F2 generation genotype of all possible nine crosses between F1 male and females.
Then we crossed heterozygotic sites, to explore the mode of genetic inheritance in varroa mite:
In addition to the fixed F1 genotypes, we also “fixed” the F0 female
genotype to match that of her son (F1 male), so we can phase the alleles
of the F1 generations.
“Phasing alleles” is the process of determine the parental origin of
each allele.
for heterozygotic genotype phasing is critical, as it allows the
tracking of the allele from one generation tot he next.
Varroa mite was known to have a haplo-diploid reproductive system, in which females are produced sexually, and males are produced in an arrhenotokous haplodiploid system.
For each crossing we observed the F2 genotype proportion, and compared it to the proportion of 3 known reproductive modes:
# set the families (include only families with at least one adult F2)
family = grep("grndat|grnson",gt$ID, value=TRUE) %>%
str_extract("[^_]+") %>%
unique()
# or, include all F2 samples, but indicate if they have an adult sister (may indicate if the F1 female was fertilized)
#family = grep("grn",gt$ID, value=TRUE) %>%
# str_extract("[^_]+") %>%
# unique()
we expect all F2 offspring to be homozygotic (0/0) like their parents (F1)
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/0")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/0")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/0")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(1, 0, 0))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(1, 0, 0))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(1, 0, 0))
modes = rbind(haploid,automixis,sexual)
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 17558
## 2 female_normal 35422
## 3 male_abnormal 18520
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (0/0) x female (0/0)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (0/0) x female (0/0)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1)) +
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (0/0) x female (0/0)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
we expect all F2 offspring to be homozygotic (1/1) like their parents (F1)
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "1/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "1/1")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "1/1")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(0, 0, 1))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(0, 0, 1))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(0, 0, 1))
modes = rbind(haploid,automixis,sexual)
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 10491
## 2 female_normal 23406
## 3 male_abnormal 14078
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (1/1) x female (1/1)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (1/1) x female (1/1)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (1/1) x female (1/1)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
We expect zero sites for this cross, because the F1 are
siblings.
Indeed, there are only 23 sites in the F2 pooled females, and 14 for the
F2 pooled males:
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/0")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/0")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "1/1")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
pooled_obs_count
## # A tibble: 2 × 2
## # Groups: type [2]
## type count
## <fct> <dbl>
## 1 female_normal 23
## 2 male_normal 14
We expect zero sites for this cross, since there is no paternal
inheritance to the males.
Indeed, there are only 24 sites in the F2 pooled females, and 15 for the
F2 pooled males:
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "1/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "1/1")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/0")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
pooled_obs_count
## # A tibble: 2 × 2
## # Groups: type [2]
## type count
## <fct> <dbl>
## 1 female_normal 24
## 2 male_normal 15
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/0")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/0")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/1")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(0.5, 0, 0.5))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(0, 1, 0))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(0.5, 0.5, 0))
modes = rbind(haploid,automixis,sexual)
# right_join(together, by = c("gt", "sex"),multiple = "all") %>%
#select(c("sample", "gt","sex","total" ,"mode", "prop"))
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 5768
## 2 female_normal 5263
## 3 male_abnormal 261
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (0/0) x female (0/1)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (0/0) x female (0/1)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (0/0) x female (0/1)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "1/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "1/1")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/1")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(0.5, 0, 0.5))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(0, 1, 0))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(0, 0.5, 0.5))
modes = rbind(haploid,automixis,sexual)
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 5965
## 2 female_normal 5450
## 3 male_abnormal 190
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (1/1) x female (0/1)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (1/1) x female (0/1)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (1/1) x female (0/1)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
The former crosses of heterozygotic females (5 and 6) show that F2
males can be heterozygotic and carry two alleles, like their
mother.
But are these sites real? and if they are, can they be
transmitted to their daughters?
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/1")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/0")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(1, 0, 0))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(1, 0, 0))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(0.5, 0.5, 0))
modes = rbind(haploid,automixis,sexual)
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 2154
## 2 female_normal 2834
## 3 male_abnormal 219
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (0/1) x female (0/0)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (0/1) x female (0/0)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (0/1) x female (0/0)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/1")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "1/1")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(0, 0, 1))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(0, 0, 1))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(0, 0.5, 0.5))
modes = rbind(haploid,automixis,sexual)
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 1597
## 2 female_normal 2437
## 3 male_abnormal 59
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (0/1) x female (1/1)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (0/1) x female (1/1)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (0/1) x female (1/1)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
# define a list to put all the data frames in
obs <- list()
for (fam in family) {
obs[[fam]] <- table %>%
dplyr::select(starts_with(fam)) %>%
dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/1")) %>%
dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/1")) %>%
dplyr::select(contains("grn")) %>%
tidyr::pivot_longer(everything()) %>%
dplyr::rename(sample = name, gt = value) %>%
dplyr::count(sample, gt, .drop = FALSE) %>%
dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
mutate(n = as.numeric(n)) %>%
group_by(sample) %>%
mutate(total = as.numeric(sum(n))) %>%
mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female"))
}
# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
# define the group of abnormal males
abnorm_males = tibble(sample = c("240_241c_grnson", "400_401a_grnson","412_413a_grnson", "426_427b_grnson", "458_459a_grnson", "46_47d_grnson"), sex = "male", normality = "abnormal")
samples_obs <- left_join(samples, observed, by=c("sample","gt")) %>% mutate(sex = case_when(
grepl("son", sample) ~ "male",
grepl("dat", sample) ~ "female")) %>%
group_by(sample) %>%
replace(is.na(.), 0) %>%
mutate(total = as.numeric(sum(n))) %>%
dplyr::mutate(prop = n/total) %>%
left_join(abnorm_males, by = "sample") %>%
dplyr::select(-sex.y) %>%
replace(is.na(.), "normal") %>%
dplyr::rename(sex = sex.x) %>%
unite("type", sex,normality, remove = FALSE)
# make a table with the expected proportions for the different modes of reproduction:
haploid = data.frame(mode = "haploid", gt = c("0/0", "0/1", "1/1"),prop = c(0.5, 0, 0.5))
automixis = data.frame(mode = "automixis", gt = c("0/0", "0/1", "1/1"),prop = c(0, 1, 0))
sexual = data.frame(mode = "sexual",gt = c("0/0", "0/1", "1/1"),prop = c(0.25, 0.5, 0.25))
modes = rbind(haploid,automixis,sexual)
# order the modes
modes$mode <- factor(modes$mode, level=c("haploid", "automixis", "sexual"))
p_modes <- modes %>%
ggplot(aes(fill=gt, y=prop, x=mode)) +
geom_bar(position="fill", stat="identity", ) +
xlab("Possible reproductive modes") +
ylab("Genotype proportion") +
labs(title = "Expected proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
# order the types for the observed proportions
samples_obs$type <- factor(samples_obs$type, level=c("female_normal", "male_normal", "male_abnormal"))
# make a table with site count per pooled data, for the three "types"
pooled_obs_count = samples_obs %>%
filter(gt =="0/0") %>%
group_by(type) %>%
mutate(count = sum(total)) %>%
select(c(type,count)) %>%
unique()
p_samples <- samples_obs %>% dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=type)) +
geom_bar(position="fill", stat="identity", ) +
scale_x_discrete("Genotype proportion", labels = c("Females","Normal
males", "Abnormal
males")) +
xlab("Observed genotype proportion") +
ylab("Genotype proportion") +
labs(title = "Observed proportions") +
labs(fill = "Genotype") +
theme_classic() +
theme(axis.text.x = element_text(size=12),
axis.text.y = element_text(size=12)) +
scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(legend.position = "none")
pooled_obs_count
## # A tibble: 3 × 2
## # Groups: type [3]
## type count
## <fct> <dbl>
## 1 male_normal 3990
## 2 female_normal 5452
## 3 male_abnormal 415
# Create plot with legend
p_legend <- p_samples +theme(legend.position = "bottom")
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
# Apply user-defined function to extract legend
shared_legend <- extract_legend(p_legend)
# Draw plots with shared legend
grid.arrange(arrangeGrob(p_modes, p_samples,ncol = 2), shared_legend, nrow = 2, heights = c(10, 1),top = grid::textGrob("Pooled offspring genotype, crossing male (0/1) x female (0/1)", x = 0, hjust = 0))
# and plot each individual separately
samples_obs %>% filter(sex == "male") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Male offspring genotype, crossing male (0/1) x female (0/1)") +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="male", total>=10), aes(label=total), vjust = 0) +
facet_wrap(~ type, scales ="free_x") +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
samples_obs %>% filter(sex == "female") %>%
dplyr::filter(total>=10) %>%
ggplot(aes(fill=gt, y=prop, x=sample)) +
geom_bar(position="fill", stat="identity", ) +
xlab("F2 offspring") +
ylab("Genotype proportion") +
labs(title = "Female offspring genotype, crossing male (0/1) x female (0/1)") +
# geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
geom_text(data = filter(samples_obs, gt == "1/1", sex =="female", total>=10), aes(label=total), vjust = 0) +
labs(fill = "Genotype") +
theme_classic()+scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))+
theme(axis.text.x=element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")
We did 2 types of a-parametric tests for independence, testing the predicted counts against the observed counts: