Modified from hybridViability.Rmd (Oct 2017)
Identify reproductive barriers between two sympatric moth-grmlinated plant species, Schiedea kaalae and S. hookeri by fitting a generalized linear mixed model (GLMM).
In the experimental design, the following crosstypes were made:
In this analysis the response variable is the grmlen viability of each cross. Other barriers (hybrid survival, flowering) could be analyzed in a similar framework, with appropriate changes to the underlying distribution.
Fixed effects:
Potential random effects:
grm <- read.table("germination.csv", header=T, sep="\t", colClasses=c(crossid="factor", potid="factor",mompop="factor",dadpop="factor"))
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:plyr':
arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
The following object is masked from 'package:bbmle':
slice
The following object is masked from 'package:car':
recode
The following object is masked from 'package:MASS':
select
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
grm <- as.data.frame(summarise(group_by(grm, crossid, mompop, momid, species, dadpop, dadid, momsp, crosstype), planted = sum(planted), germ = sum(germ)))
grm$vp <- grm$germ/grm$planted
grm$cross <- factor(toupper(paste0(substr(grm$species,0,1), substr(grm$dadsp,0,1))))
crosscol <- c("green","blue","orange","red")
#rename crosstype codes
levels(grm$crosstype) <- c("between", "hybrid", "within")
grm$crosstype <- factor(grm$crosstype, levels=c("between", "within", "hybrid"))
#made "between" the first reference level to facilitate comparison between outcrossing populations and hybridizing species
grm$mompop <- sapply(grm$mompop, mapvalues, from = c("794","866","899","879","892","904","3587"), to = c("WK","WK","WK","879WKG","892WKG","904WPG","3587WP"), warn_missing=F)
grm$dadpop <- sapply(grm$dadpop, mapvalues, from = c("794","866","899","879","892","904","3587"), to = c("WK","WK","WK","879WKG","892WKG","904WPG","3587WP"), warn_missing=F)
#define interactions
grm <- within(grm, sxc <- interaction(species,crosstype))
grm <- within(grm, sxcxm <- interaction(species,crosstype,mompop,momid))
grm <- within(grm, mompid <- as.factor(paste(mompop,momid,sep=".")))
grm <- within(grm, dadpid <- as.factor(paste(dadpop,dadid,sep=".")))
grm <- within(grm, smompop <- as.factor(paste(species,mompop,sep="")))
#check final structure
str(grm)
'data.frame': 236 obs. of 17 variables:
$ crossid : Factor w/ 235 levels "1","10","100",..: 1 2 3 4 5 6 7 8 9 10 ...
$ mompop : Factor w/ 5 levels "3587WP","WK",..: 3 3 2 2 2 2 2 2 2 4 ...
$ momid : Factor w/ 17 levels "1","10","10-1",..: 8 8 2 2 4 4 4 4 4 1 ...
$ species : Factor w/ 2 levels "hook","kaal": 1 1 1 1 1 1 1 1 1 2 ...
$ dadpop : Factor w/ 5 levels "3587WP","WK",..: 3 5 1 1 3 4 5 5 1 3 ...
$ dadid : Factor w/ 18 levels "1","10","10-1",..: 16 8 5 15 9 1 8 12 5 9 ...
$ momsp : Factor w/ 2 levels "hook","kaal": 1 2 2 2 1 2 2 2 2 1 ...
$ crosstype: Factor w/ 3 levels "between","within",..: 2 3 3 3 1 3 3 3 3 3 ...
$ planted : int 52 24 2 12 10 12 10 30 10 18 ...
$ germ : int 38 22 2 10 10 11 2 26 9 8 ...
$ vp : num 0.731 0.917 1 0.833 1 ...
$ cross : Factor w/ 2 levels "H","K": 1 1 1 1 1 1 1 1 1 2 ...
$ sxc : Factor w/ 6 levels "hook.between",..: 3 5 5 5 1 5 5 5 5 6 ...
$ sxcxm : Factor w/ 510 levels "hook.between.3587WP.1",..: 225 227 41 41 97 101 101 101 101 24 ...
$ mompid : Factor w/ 24 levels "3587WP.10","3587WP.14",..: 8 8 21 21 22 22 22 22 22 12 ...
$ dadpid : Factor w/ 25 levels "3587WP.10","3587WP.14",..: 10 19 2 6 9 13 19 21 2 9 ...
$ smompop : Factor w/ 5 levels "hook879WKG","hookWK",..: 1 1 2 2 2 2 2 2 2 4 ...
The sample sizes are unbalanced at all levels, including maternal population:
reptab <- with(grm, table(smompop,crosstype))
mosaic(reptab, pop=F)
labeling_cells(text = reptab, margin = 0)(reptab)
Replication is low for some within-population crosses. The replication is even lower for each maternal plant, so we need to be wary of estimates when subsetting at this level:
with(grm, kable(table(mompid,crosstype)))
| between | within | hybrid | |
|---|---|---|---|
| 3587WP.10 | 1 | 1 | 3 |
| 3587WP.14 | 3 | 0 | 8 |
| 3587WP.15 | 2 | 2 | 3 |
| 3587WP.7 | 4 | 1 | 6 |
| 3587WP.A | 1 | 2 | 5 |
| 3587WP.C | 3 | 1 | 3 |
| 879WKG.10-1 | 4 | 4 | 8 |
| 879WKG.2-2 | 2 | 3 | 9 |
| 879WKG.G-2 | 3 | 2 | 7 |
| 879WKG.H-2 | 0 | 1 | 2 |
| 879WKG.N-5 | 3 | 3 | 7 |
| 892WKG.1 | 4 | 2 | 7 |
| 892WKG.10 | 1 | 0 | 2 |
| 892WKG.2 | 1 | 0 | 0 |
| 892WKG.3 | 1 | 2 | 3 |
| 892WKG.4 | 1 | 0 | 5 |
| 892WKG.5 | 3 | 2 | 5 |
| 904WPG.2 | 2 | 2 | 8 |
| 904WPG.3 | 3 | 2 | 7 |
| 904WPG.5 | 7 | 2 | 8 |
| WK.10 | 4 | 1 | 8 |
| WK.11 | 2 | 0 | 4 |
| WK.2 | 4 | 3 | 8 |
| WK.4 | 4 | 2 | 9 |
To identify the best-fitting distribution, we make quantile-quantile plots of the raw data against various distributions. The more points within the confidence interval envelopes, the better the fit. Later, we present quantile-quantile plots of the model residuals to assess model fit.
ggplot(grm, aes(x = vp, fill=species)) +
geom_histogram(data=subset(grm,species == "hook"), aes(y=-..density..),binwidth=0.05)+
geom_histogram(data=subset(grm,species == "kaal"), aes(y= ..density..),binwidth=0.05)+
coord_flip() + facet_grid(~crosstype) + labs(y="Histogram", x="Viability")
ggplot(aes(y=vp, x=mompid, color=crosstype), data=grm) + geom_count(alpha=0.8) + coord_flip() + labs(x="Maternal plant", y="vp")
Our mixed model uses one parameter to capture random effect variance, which is assumed to be homogeneous. Plotting on a log scale should uncouple variances from means to assess this visually. Subsets are species * crosstype * maternal plant.
Subset variances are not homogeneous:
ggplot(aes(y=log(vp+1), x=sxcxm, color=crosstype), data=grm) + geom_boxplot() + coord_flip() + labs(y="ln(Viability + 1)",x="Subsets")
Various distributions make different assumptions about the mean-variance (µ-Var) ratio.
grpVars <- with(grm, tapply(vp, list(sxcxm), var))
grpMeans <- with(grm, tapply(vp, list(sxcxm), mean))
grpCounts <- with(grm, tapply(vp, list(sxcxm), length))
#set weight=grpCounts to weight loess by sample sizes
ggplot(na.omit(data.frame(grpMeans,grpVars,grpCounts)),
aes(x=grpMeans,y=grpVars, weight=1))+geom_point(aes(size=grpCounts))+
guides(colour=guide_legend(title="Fit"),size=guide_legend(title="Sample size")) + labs(x="Subset Mean", y="Subset Variance") + labs(subtitle="Subset: species*crosstype*mompid")
Effects and interactions in these plots are simply given by the mean, which may be unduly influenced by high values.
intplot <- ggplot(grm,aes(x=crosstype,y=vp))+
geom_count(aes(size = ..prop.., group=sxc),alpha=0.5)+
stat_summary(aes(x=as.numeric(crosstype)),fun.y=mean,geom="line")+ facet_grid(~species)
intplot + aes(group=species, color=species)
intplot + aes(group=mompop, color=mompop)
intplot + aes(group=mompid, color=mompop)
intplot + aes(group=dadpop, color=dadpop)
Run many generalized linear models on subsets of the data defined by crosstype | mompid to see if effects estimates are consistent within maternal plants.
Most maternal plant subsets agree, but some are problematic outliers. These plants can be picked out visually from the random effects interaction plot above, the estimated parameters of each subset model, and the QQ plot of the estimated parameters:
#had to get rid of species or mompid since mompid is nested inside species. dadpop also works
glm.lis <- lmList(vp~crosstype|mompid,data=grm, family="binomial", weights=grm$planted)
plot.lmList(glm.lis,scale=list(x=list(relation="free")))
Loading required package: reshape
Attaching package: 'reshape'
The following object is masked from 'package:dplyr':
rename
The following objects are masked from 'package:plyr':
rename, round_any
The following object is masked from 'package:Matrix':
expand
Using grp as id variables
qqmath.lmList(glm.lis)#
Using as id variables
We constructed the following models with the package glmmADMB. They all have the same fixed effects, species x crosstype, and response variable, vp
| distribution, Random Effects: | None | Maternal plant | Maternal population |
|---|---|---|---|
| normal (norm) | X | X | X |
#Normal (Gaussian) distribution, identity link
#sc.norm <- glmmadmb(vp~species*crosstype, data=grm, family="gaussian")
#sc.mix.mompid <- glmmadmb(vp~species*crosstype + (1|mompid), data=grm, family="gaussian")
#sc.mix.mompop <- glmmadmb(vp~species*crosstype + (1|mompop), data=grm, family="gaussian")
sc.bin <- glm(vp~species*crosstype, data=grm, family="binomial", weights=grm$planted)
sc.mix.mompid.bin <- glmer(vp~species*crosstype + (1|mompid), data=grm, family="binomial", weights=grm$planted)
sc.mix.mompop.bin <- glmer(vp~species*crosstype + (1|mompop), data=grm, family="binomial", weights=grm$planted)
sc.mix.momdadpid.bin <- glmer(vp~species*crosstype + (1|mompid) + (1|dadpid), data=grm, family="binomial", weights=grm$planted)
#library(glmmTMB)
#sc.b <- glmmTMB(vp~species*crosstype, data=grm, family=list(family="beta",link="logit"))
#sc.mix.mompid.b <- glmmTMB(vp~species*crosstype + (1|mompid), data=grm, family=list(family="beta",link="logit"))
#sc.mix.mompop.b <- glmmTMB(vp~species*crosstype + (1|mompop), data=grm, family=list(family="beta",link="logit"))
We will use the Aikake Information Criterion to pick the model the best fits the data, penalized by the number of parameters. Differences of 2 units are significant.
#AICtab(sc.b, sc.mix.mompop.b, sc.mix.mompid.b)
sc.names <- c("sc.bin", "sc.mix.mompid.bin", "sc.mix.mompop.bin","sc.mix.momdadpid.bin")
sc.list <- sapply(sc.names, get, USE.NAMES=T)
sc.AIC <- ICtab(sc.list,mnames=sc.names,type="AIC", base=T, delta=F) # for AICc, nobs=nobs(sc.list[[1]])
class(sc.AIC)<-"data.frame"
all.names <- c(sc.names)
all.list <- sapply(all.names, get, USE.NAMES=T)
all.AIC <- dfun(rbind(sc.AIC))
all.AIC <- all.AIC[order(all.AIC$dAIC),]
kable(all.AIC, format.arg=list(digits=3))
| dAIC | df | |
|---|---|---|
| sc.mix.momdadpid.bin | 0.0 | 8 |
| sc.mix.mompid.bin | 16.0 | 7 |
| sc.mix.mompop.bin | 88.1 | 7 |
| sc.bin | 102.0 | 6 |
The best-fiting model is a mixed model with the following components:
Looking at the normal, fixed effects model, we see that the residuals are not normal:
shapiro.test(sc.bin$residuals)#raw residuals!
Shapiro-Wilk normality test
data: sc.bin$residuals
W = 0.95692, p-value = 1.644e-06
The coefficients estimated for each model agree qualitatively.
sc.log.names <- sc.names
sc.log <- sapply(sc.log.names, get, USE.NAMES=T)
coefplot2(sc.log, legend.x="topright",legend=T,legend.args=list(cex=0.8, xpd=T, inset=c(-0.1,0)), col.pts=sample(gg_color_hue(length(sc.log.names))), spacing=0.05, lwd.2=2, lwd.1=4, intercept=T)
We chose the model with nearly the best (lowest) AIC, to carry out inference tests and parameter estimation.
mod <- sc.mix.momdadpid.bin
print(mod)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: vp ~ species * crosstype + (1 | mompid) + (1 | dadpid)
Data: grm
Weights: grm$planted
AIC BIC logLik deviance df.resid
1063.4497 1091.1604 -523.7249 1047.4497 228
Random effects:
Groups Name Std.Dev.
dadpid (Intercept) 0.2466
mompid (Intercept) 0.6512
Number of obs: 236, groups: dadpid, 25; mompid, 24
Fixed Effects:
(Intercept) specieskaal
0.66624 0.03502
crosstypewithin crosstypehybrid
0.08820 0.29712
specieskaal:crosstypewithin specieskaal:crosstypehybrid
-0.60123 -3.12682
Using a likelihood ratio test, with a null hypothesis of zero variance, the random effect (maternal plant) is significant for both model parts:
anova(sc.bin, sc.mix.momdadpid.bin) #double this p-value. or simulate null by permuting data.
Analysis of Deviance Table
Model: binomial, link: logit
Response: vp
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 235 1574.41
species 1 424.57 234 1149.84
crosstype 2 158.51 232 991.34
species:crosstype 2 288.81 230 702.52
By dropping it from the model and performing a likelihood-ratio test, we see that the species x crosstype interaction is not significant.
sxc.chisq <- drop1(mod, test="Chisq") #load from file
dfun(sxc.chisq)
Single term deletions
Model:
vp ~ species * crosstype + (1 | mompid) + (1 | dadpid)
Df dAIC LRT Pr(Chi)
<none> 0.000
species:crosstype 2 37.113 41.113 1.181e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model estimated the following parameters, with individual parameter significance determined by the Wald z-test, and fixed effect significance determined by analysis of deviance Wald test.
summary(mod)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: vp ~ species * crosstype + (1 | mompid) + (1 | dadpid)
Data: grm
Weights: grm$planted
AIC BIC logLik deviance df.resid
1063.4 1091.2 -523.7 1047.4 228
Scaled residuals:
Min 1Q Median 3Q Max
-3.9305 -0.9440 -0.2904 0.9049 4.8245
Random effects:
Groups Name Variance Std.Dev.
dadpid (Intercept) 0.06079 0.2466
mompid (Intercept) 0.42409 0.6512
Number of obs: 236, groups: dadpid, 25; mompid, 24
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.66624 0.25532 2.609 0.00907 **
specieskaal 0.03502 0.34283 0.102 0.91864
crosstypewithin 0.08820 0.13754 0.641 0.52133
crosstypehybrid 0.29712 0.17276 1.720 0.08546 .
specieskaal:crosstypewithin -0.60123 0.23731 -2.533 0.01129 *
specieskaal:crosstypehybrid -3.12682 0.31069 -10.064 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) spcskl crsstypw crsstyph spcskl:crsstypw
specieskaal -0.741
crsstypwthn -0.230 0.171
crsstyphybr -0.368 0.375 0.329
spcskl:crsstypw 0.136 -0.268 -0.577 -0.209
spcskl:crsstyph 0.304 -0.462 -0.199 -0.809 0.336
Anova(mod, type=3)
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: vp
Chisq Df Pr(>Chisq)
(Intercept) 6.8091 1 0.00907 **
species 0.0104 1 0.91864
crosstype 2.9642 2 0.22716
species:crosstype 102.0971 2 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These are box and QQ (to check normality) plots of the estimated random effect of each maternal plant.
predre <- setNames(data.frame(RE=ranef(mod)$mompid,SD=ranef(mod, sd=T)$`1`),c("RE","SD"))
ggplot(predre, aes(x = rownames(predre),y=RE)) +
geom_point(size = 2) + coord_flip()+
geom_errorbar(aes(ymin = RE-SD, ymax = RE+SD)) + labs(x="Maternal plants", y="Predicted random effects")
#Count
reStack <- ldply(ranef(mod))
print( qqmath( ~`(Intercept)`|.id, data=reStack, scales=list(relation="free"),
prepanel = prepanel.qqmathline,
panel = function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
},
layout=c(1,1)))
The least square means procedure can generate predictor estimates of each type, and give their significance groupings with a post-hoc Tukey test. S. hookeri-produced hybrids produce less Viability than either crosses between or within S. hookeri populations. The other differences are not significant, but remember that the fixed effect of hybrid (vs. between) was significant (model summary).
#Count
rg <- ref.grid(mod)
#summary(rg)
sxc.lsm <- lsmeans(rg, ~ crosstype*species)
plot(sxc.lsm)
cld.mod <- cld(sxc.lsm, Letters=letters) #tukey letterings
library(boot)
Attaching package: 'boot'
The following object is masked from 'package:lattice':
melanoma
The following object is masked from 'package:car':
logit
cld.mod$response <- inv.logit(cld.mod$lsmean)
cld.mod$uSE <- inv.logit(cld.mod$lsmean+cld.mod$SE)
cld.mod$lSE <- inv.logit(cld.mod$lsmean-cld.mod$SE)
options(digits=4)
cld.mod[rev(order(cld.mod$species, cld.mod$crosstype)),]
crosstype species lsmean SE df asymp.LCL asymp.UCL .group response
hybrid kaal -2.1284 0.2329 NA -2.5850 -1.6719 a 0.1064
within kaal 0.1882 0.2518 NA -0.3052 0.6817 b 0.5469
between kaal 0.7013 0.2300 NA 0.2504 1.1521 b 0.6685
hybrid hook 0.9634 0.2502 NA 0.4730 1.4537 b 0.7238
within hook 0.7544 0.2607 NA 0.2435 1.2654 b 0.6801
between hook 0.6662 0.2553 NA 0.1658 1.1667 b 0.6607
uSE lSE
0.1306 0.08617
0.6083 0.48412
0.7173 0.61568
0.7709 0.67110
0.7340 0.62099
0.7154 0.60131
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
Results are given on the log odds ratio (not the response) scale.
P value adjustment: tukey method for comparing a family of 6 estimates
significance level used: alpha = 0.05
H.wb <- with(cld.mod[cld.mod$species=="hook",], response[crosstype=="within"]/response[crosstype=="between"] - 1)
K.wb <- with(cld.mod[cld.mod$species=="kaal",], response[crosstype=="within"]/response[crosstype=="between"] - 1)
K.H <- with(cld.mod[cld.mod$crosstype=="between",], response[species=="kaal"]/response[species=="hook"] - 1)
maxsp <- ifelse(K.H>0, "kaal","hook")
minsp <- ifelse(K.H<0, "kaal","hook")
maxresp <- with(cld.mod, response[species==maxsp & crosstype=="between"])
minresp <- with(cld.mod, response[species==minsp & crosstype=="between"])
HK.resp <- with(cld.mod, response[species=="hook" & crosstype=="hybrid"])
KH.resp <- with(cld.mod, response[species=="kaal" & crosstype=="hybrid"])
HK.int <- with(cld.mod, ifelse(HK.resp > minresp & HK.resp < maxresp, (HK.resp-minresp)/(maxresp-minresp),
ifelse(HK.resp < minresp, HK.resp/minresp-1, HK.resp/maxresp-1)))
KH.int <- with(cld.mod, ifelse(KH.resp > minresp & KH.resp < maxresp, (KH.resp-minresp)/(maxresp-minresp),
ifelse(KH.resp < minresp, KH.resp/minresp-1, KH.resp/maxresp-1)))
intermed <- (minresp + maxresp) / 2
HK.prop <- with(grm, colSums(10*cbind(germ[species=="hook" & crosstype=="hybrid"],(planted-germ)[species=="hook" & crosstype=="hybrid"])))
prop.test(x=HK.prop[1], n=sum(HK.prop), p=intermed)
1-sample proportions test with continuity correction
data: HK.prop[1] out of sum(HK.prop), null probability intermed
X-squared = 220, df = 1, p-value <2e-16
alternative hypothesis: true p is not equal to 0.6646
95 percent confidence interval:
0.7349 0.7548
sample estimates:
p
0.745
KH.prop <- with(grm, colSums(10*cbind(germ[species=="kaal" & crosstype=="hybrid"],(planted-germ)[species=="kaal" & crosstype=="hybrid"])))
prop.test(x=KH.prop[1], n=sum(KH.prop), p=intermed)
1-sample proportions test with continuity correction
data: KH.prop[1] out of sum(KH.prop), null probability intermed
X-squared = 9900, df = 1, p-value <2e-16
alternative hypothesis: true p is not equal to 0.6646
95 percent confidence interval:
0.1456 0.1611
sample estimates:
p
0.1532
round(c(H.wb,K.wb,K.H,HK.int,KH.int),2)
[1] 0.03 -0.18 0.01 0.08 -0.84
ggplot(as.data.frame(cld.mod), aes(y=response, x=relevel(crosstype, "within"), fill=species)) +
geom_col(position=position_dodge2()) +
geom_linerange(aes(ymin=lSE, ymax=uSE), position=position_dodge(0.9)) +
labs(x="", y="Germination",fill="Maternal plant") +
scale_fill_manual(labels = c("S. hookeri ", "S. kaalae "), values=c("grey70","grey30")) +
scale_x_discrete(labels = c("Intrapopulation", "Interpopulation", "Hybrid")) +
geom_text(aes(label=.group), position=position_dodge(0.9), hjust=0, vjust=-1) +
scale_y_continuous(expand = expand_scale(mult=c(0,0.05)), breaks = scales::pretty_breaks(n = 5), limits=c(0,1)) +
theme_classic() + theme(legend.text=element_text(face="italic", size=rel(1)), legend.position="bottom", axis.text = element_text(colour="black", size=rel(1)), text=element_text(size=14), axis.ticks.x = element_blank()) + geom_segment(aes(x=2.5, y=intermed, xend=3.5, yend=intermed))