Identify reproductive barriers between two sympatric moth-pollinated plant species, Schiedea kaalae and S. hookeri by fitting a generalized linear mixed model (GLMM).
In the experimental design, the following crosstypes were made:
In this analysis the response variable is the number of seeds produced by each cross. Other barriers (hybrid survival, biomass, flowering) could be analyzed in a similar framework, with appropriate changes to the underlying distribution. For count data, we cannot use a normal distribution, even after transformation. Instead we use discrete distributions available in generalized models - Poisson, negative binomial, and variants. In addition, the data are zero-inflated, so we model this as a separate process.
Fixed effects:
Potential random effects:
sds <- read.table("hybridseeds.csv", header=T, sep="\t")
#treat populations as factors
sds$mompop <- as.factor(sds$mompop)
sds$dadpop <- as.factor(sds$dadpop)
sds$linenum <- as.factor(sds$linenum)
#exclude "closed" dadpop
sds$dadpop[sds$dadpop=="closed"] <- NA
sds$dadpop<- droplevels(sds$dadpop)
sds <- na.omit(sds)
#rename crosstype codes
levels(sds$crosstype) <- list(between="e",within="a",hybrid="h")
#made "between" the first reference level to facilitate comparison between outcrossing populations and hybridizing species
#define interactions
sds <- within(sds, sxc <- interaction(species,crosstype))
sds <- within(sds, sxcxm <- interaction(species,crosstype,mompop,momid))
sds <- within(sds, mompid <- as.factor(paste(mompop,momid,sep=".")))
sds <- within(sds, smompop <- as.factor(paste(species,mompop,sep="")))
sds <- within(sds, yesno <- as.numeric(seednum>0))
#check final structure
str(sds)
'data.frame': 533 obs. of 15 variables:
$ linenum : Factor w/ 534 levels "1","2","3","4",..: 22 23 24 25 26 27 28 29 30 31 ...
$ labelid : int 18 19 22 23 24 25 32 35 36 37 ...
$ species : Factor w/ 2 levels "hook","kaal": 1 1 1 1 1 1 1 2 2 2 ...
$ mompop : Factor w/ 5 levels "794","879","892",..: 1 1 1 1 1 1 1 3 3 3 ...
$ momid : Factor w/ 18 levels "1","10","10-1",..: 6 6 11 11 11 11 10 11 1 1 ...
$ dadpop : Factor w/ 7 levels "3587","794","866",..: 2 4 2 2 5 4 4 4 4 4 ...
$ crosstype : Factor w/ 3 levels "between","within",..: 2 1 2 2 3 1 1 3 3 3 ...
$ dadspecies: Factor w/ 2 levels "H","K": 1 1 1 1 2 1 1 1 1 1 ...
$ cross : Factor w/ 6 levels "HHbetween","HHwithin",..: 2 1 2 2 3 1 1 4 4 4 ...
$ seednum : int 20 18 0 11 0 1 7 4 14 2 ...
$ sxc : Factor w/ 6 levels "hook.between",..: 3 1 3 3 5 1 1 6 6 6 ...
$ sxcxm : Factor w/ 540 levels "hook.between.794.1",..: 153 151 303 303 305 301 271 318 18 18 ...
$ mompid : Factor w/ 26 levels "3587.10","3587.10-1",..: 8 8 10 10 10 10 9 23 18 18 ...
$ smompop : Factor w/ 5 levels "hook794","hook879",..: 1 1 1 1 1 1 1 4 4 4 ...
$ yesno : num 1 1 0 1 0 1 1 1 1 1 ...
- attr(*, "na.action")=Class 'omit' Named int 427
.. ..- attr(*, "names")= chr "427"
The sample sizes are unbalanced at all levels, including maternal population:
reptab <- with(sds, table(smompop,crosstype))
mosaic(reptab, pop=F)
labeling_cells(text = reptab, margin = 0)(reptab)
Replication is low for some within-population crosses. The replication is even lower for each maternal plant, so we need to be wary of estimates when subsetting at this level:
with(sds, kable(table(mompid,crosstype)))
| between | within | hybrid | |
|---|---|---|---|
| 3587.10 | 3 | 2 | 7 |
| 3587.10-1 | 0 | 0 | 1 |
| 3587.14 | 5 | 2 | 18 |
| 3587.15 | 3 | 0 | 7 |
| 3587.7 | 4 | 2 | 20 |
| 3587.A | 1 | 2 | 6 |
| 3587.C | 6 | 3 | 11 |
| 794.2 | 18 | 13 | 33 |
| 794.4 | 7 | 6 | 14 |
| 794.5 | 10 | 5 | 20 |
| 794.6 | 0 | 1 | 7 |
| 879.10-1 | 15 | 6 | 29 |
| 879.2-2 | 3 | 6 | 18 |
| 879.3-8 | 2 | 1 | 6 |
| 879.G-2 | 3 | 3 | 9 |
| 879.H-2 | 0 | 2 | 4 |
| 879.N-5 | 6 | 4 | 16 |
| 892.1 | 6 | 3 | 12 |
| 892.10 | 1 | 0 | 2 |
| 892.2 | 1 | 0 | 0 |
| 892.3 | 1 | 1 | 9 |
| 892.4 | 2 | 1 | 11 |
| 892.5 | 5 | 4 | 12 |
| 904.2 | 6 | 3 | 17 |
| 904.3 | 1 | 5 | 12 |
| 904.5 | 10 | 6 | 32 |
To identify the best-fitting distribution, we make quantile-quantile plots of the raw data against various distributions. The more points within the confidence interval envelopes, the better the fit. Later, we present quantile-quantile plots of the model residuals to assess model fit.
Based on these fits (none of which account for zero-inflation), the negative binomial distribution best fits the overall data. We will also formally test the Poisson and normal distributions. Zero-inflation is evident from the hockey-stick shapes:
#QQ plots against various distributions
set.seed(1)
par(mfrow=c(2,3))
normal <- fitdistr(sds$seednum+1, "normal")
qqp(sds$seednum+1, "norm", main="Normal")
lognormal <- fitdistr(sds$seednum+1, "lognormal")
qqp(sds$seednum+1, "lnorm", main="Log Normal")
pois <- fitdistr(sds$seednum+1, "Poisson")
qqp(sds$seednum, "pois", pois$estimate, main="Poisson")
nbinom <- fitdistr(sds$seednum+1, "Negative Binomial")
qqp(sds$seednum+1, "nbinom", size = nbinom$estimate[[1]], mu=nbinom$estimate[[2]], main="Negative Binomial")
gamma <- fitdistr(sds$seednum+1, "gamma")
qqp(sds$seednum+1, "gamma", shape = gamma$estimate[[1]], rate = gamma$estimate[[2]], main="Gamma")
Another visual technique to identify an underlying distribution is the rootogram:
poisf <- goodfit(sds$seednum, type="poisson")
vcd::rootogram(poisf, main="Raw data Poisson fit", xlab="Seeds")
nbinomf <- goodfit(sds$seednum, type="nbinom")
vcd::rootogram(nbinomf, main="Raw data negative binomial fit", xlab="Seeds")
sc.c.nb.hurd <- countreg::hurdle(seednum~ species*crosstype | crosstype , data=sds, dist="negbin", zero="binom")
countreg::rootogram(sc.c.nb.hurd, main="Fixed effects only hurdle model (negative binomial + binomial) fit")
ggplot(sds, aes(x = seednum, fill=species)) +
geom_histogram(data=subset(sds,species == "hook"), aes(y=-..density..),binwidth=1)+
geom_histogram(data=subset(sds,species == "kaal"), aes(y= ..density..),binwidth=1)+
coord_flip() + facet_grid(~crosstype) + labs(y="Histogram", x="Seeds")
ggplot(aes(y=seednum, x=mompid, color=crosstype), data=sds) + geom_count(alpha=0.8) + coord_flip() + labs(x="Maternal plant", y="Seeds")
Our mixed model uses one parameter to capture random effect variance, which is assumed to be homogeneous. Plotting on a log scale should uncouple variances from means to assess this visually. Subsets are species * crosstype * maternal plant.
Subset variances are not homogeneous:
ggplot(aes(y=log(seednum+1), x=sxcxm, color=crosstype), data=sds) + geom_boxplot() + coord_flip() + labs(y="ln(seeds + 1)",x="Subsets")
Various distributions make different assumptions about the mean-variance (µ-Var) ratio. If the variance is greater than the mean (Var > µ, region above the black line representing the Poisson distribution), the data are overdispersed.
| Distribution | Variance-mean model |
|---|---|
| Poisson | Var = µ |
| quasi-Poisson | Var = φµ |
| Negative binomial | Var = µ(1 + αµ) |
We can see that the data are overdispersed, but both the negative binomial and quasi-Poisson seem to fit better than the Poisson:
grpVars <- with(sds, tapply(seednum, list(sxcxm), var))
grpMeans <- with(sds, tapply(seednum, list(sxcxm), mean))
grpCounts <- with(sds, tapply(seednum, list(sxcxm), length))
#set weight=grpCounts to weight loess by sample sizes
ggplot(na.omit(data.frame(grpMeans,grpVars,grpCounts)),
aes(x=grpMeans,y=grpVars, weight=1))+geom_point(aes(size=grpCounts))+
geom_abline(intercept = 0, slope = 1, aes(color="Poisson"))+
geom_smooth(method="loess", se=F, aes(color="Loess"))+
geom_smooth(method="lm",formula=y~x-1,aes(color="Quasi-Poisson"))+
geom_smooth(method="lm",formula=y~I(x^2)+offset(x)-1, aes(color="Negative Binomial"))+
guides(colour=guide_legend(title="Fit"),size=guide_legend(title="Sample size")) + labs(x="Subset Mean", y="Subset Variance") + labs(subtitle="Subset: species*crosstype*mompid")
Effects and interactions in these plots are simply given by the mean, which may be unduly influenced by high values.
intplot <- ggplot(sds,aes(x=crosstype,y=seednum))+
geom_count(aes(size = ..prop.., group=sxc),alpha=0.5)+
stat_summary(aes(x=as.numeric(crosstype)),fun.y=mean,geom="line")+ facet_grid(~species)
intplot + aes(group=species, color=species)
intplot + aes(group=mompop, color=mompop)
intplot + aes(group=mompid, color=mompop)
intplot + aes(group=dadpop, color=dadpop)
ggplot(sds,aes(x=crosstype,fill=factor(seednum>0))) + geom_bar(position="fill") + facet_grid(~species) + labs(y="Proportion") + guides(fill=guide_legend(title="Seeds produced"))
Run many generalized linear models on subsets of the data defined by crosstype | mompid to see if effects estimates are consistent within maternal plants.
Most maternal plant subsets agree, but some are problematic outliers. Three plants (plus two missing) can be picked out visually from the random effects interaction plot above, the estimated parameters of each subset model, and the QQ plot of the estimated parameters:
#had to get rid of species or mompid since mompid is nested inside species. dadpop also works
glm.lis <- lmList(seednum~crosstype|mompid,data=sds, family="poisson")
plot.lmList(glm.lis,scale=list(x=list(relation="free")))
Using grp as id variables
qqmath.lmList(glm.lis)
Using as id variables
Diagnosis of each problematic maternal plant:
We should consider rerunning model without these to see if random effect variance changes.
xtreme <- c("794.6","892.4","879.3-8","892.2","3587.10-1")
with(droplevels(sds[sds$mompid %in% xtreme,]), kable(table(mompid,crosstype)))
| between | within | hybrid | |
|---|---|---|---|
| 3587.10-1 | 0 | 0 | 1 |
| 794.6 | 0 | 1 | 7 |
| 879.3-8 | 2 | 1 | 6 |
| 892.2 | 1 | 0 | 0 |
| 892.4 | 2 | 1 | 11 |
We constructed the following models with the package glmmADMB. They all have the same fixed effects, species x crosstype
| Distribution, Random Effects: | None | Maternal plant | Maternal population |
|---|---|---|---|
| normal (norm) | X | X | X |
| Poisson (poi) | X, ZI | X, ZI, HR | |
| quasi-Poisson (qpoi) | X, ZI | X, ZI, HR | ZI |
| negative binomial (nb) | X, ZI | X, ZI, HR | ZI |
#Normal (Gaussian) distribution, identity link
sc.norm <- glmmadmb(seednum~species*crosstype, data=sds, family="gaussian")
sc.mix <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="gaussian")
sc.mix.mompop <- glmmadmb(seednum~species*crosstype + (1|mompop), data=sds, family="gaussian")
#Poissson distribution, log link
sc.poi <- glmmadmb(seednum~species*crosstype, data=sds, family="poisson")
sc.poi.zi <- glmmadmb(seednum~species*crosstype, data=sds, family="poisson", zeroInflation=T)
sc.mix.poi <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="poisson")
sc.mix.poi.zi <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="poisson", zeroInflation=T)
#Negative binomial distribution, log link
sc.nb <- glmmadmb(seednum~species*crosstype, data=sds, family="nbinom")
sc.nb.zi <- glmmadmb(seednum~species*crosstype, data=sds, family="nbinom", zeroInflation=T)
sc.mix.nb <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="nbinom")
sc.mix.nb.zi <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="nbinom", zeroInflation=T)
sc.mix.nb.zi.mompop <- glmmadmb(seednum~species*crosstype + (1|mompop), data=sds, family="nbinom", zeroInflation=T)
#Quasi-Poisson (NB1), log link
sc.qpoi <- glmmadmb(seednum~species*crosstype, data=sds, family="nbinom1")
sc.qpoi.zi <- glmmadmb(seednum~species*crosstype, data=sds, family="nbinom1", zeroInflation=T)
sc.mix.qpoi <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="nbinom1")
sc.mix.qpoi.zi <- glmmadmb(seednum~species*crosstype + (1|mompid), data=sds, family="nbinom1", zeroInflation=T)
sc.mix.qpoi.zi.mompop <- glmmadmb(seednum~species*crosstype + (1|mompop), data=sds, family="nbinom1", zeroInflation=T)
#Hurdle models - split into binary and count models
#binary models
intercept.hr <- glmmadmb(yesno~1, data=sds, family="binom")
c.hr <- glmmadmb(yesno~crosstype, data=sds, family="binom")
c.mix.hr <- glmmadmb(yesno~crosstype + (1|mompid), data=sds, family="binom")
sc.hr <- glmmadmb(yesno~species*crosstype, data=sds, family="binom")
sc.mix.hr <- glmmadmb(yesno~species*crosstype + (1|mompid), data=sds, family="binom")
#truncated count models
sc.mix.poi.tr <- glmmadmb(seednum~species*crosstype + (1|mompid), data=subset(sds,seednum>0), family="truncpoiss")
sc.mix.nb.tr <- glmmadmb(seednum~species*crosstype + (1|mompid), data=subset(sds,seednum>0), family="truncnbinom")
sc.mix.qpoi.tr <- glmmadmb(seednum~species*crosstype + (1|mompid), data=subset(sds,seednum>0), family="truncnbinom1",
admb.opts = admbControl(shess = FALSE, noinit = FALSE))
We will use the Aikake Information Criterion to pick the model the best fits the data, penalized by the number of parameters. Differences of 2 units are significant.
Zero inflation is present, so fit is improved with zero-inflated and hurdle models.
sc.names <- c("sc.norm","sc.mix","sc.mix.mompop","sc.poi","sc.poi.zi","sc.mix.poi","sc.mix.poi.zi","sc.nb","sc.nb.zi","sc.mix.nb","sc.mix.nb.zi","sc.mix.nb.zi.mompop","sc.qpoi","sc.qpoi.zi","sc.mix.qpoi","sc.mix.qpoi.zi","sc.mix.qpoi.zi.mompop")
sc.list <- sapply(sc.names, get, USE.NAMES=T)
sc.AIC <- ICtab(sc.list,mnames=sc.names,type="AIC", base=T, delta=F) # for AICc, nobs=nobs(sc.list[[1]])
class(sc.AIC)<-"data.frame"
hr <- AIC(intercept.hr, c.hr, c.mix.hr, sc.hr, sc.mix.hr)
trunc <- AIC(sc.mix.poi.tr,sc.mix.nb.tr,sc.mix.qpoi.tr)
combined <- trunc + hr[c(5,5,5),] #TODO: check df incorporation
all.names <- c(sc.names, "sc.mix.poi.tr","sc.mix.nb.tr","sc.mix.qpoi.tr")
all.list <- sapply(all.names, get, USE.NAMES=T)
all.AIC <- dfun(rbind(sc.AIC,combined))
all.AIC <- all.AIC[order(all.AIC$dAIC),]
kable(all.AIC, format.arg=list(digits=3))
| dAIC | df | |
|---|---|---|
| sc.mix.qpoi.tr | 0.0 | 15 |
| sc.mix.nb.tr | 22.7 | 15 |
| sc.mix.qpoi.zi | 43.4 | 9 |
| sc.mix.nb.zi | 54.3 | 9 |
| sc.nb.zi | 102.4 | 8 |
| sc.mix.nb.zi.mompop | 103.5 | 9 |
| sc.mix.qpoi.zi.mompop | 109.5 | 9 |
| sc.qpoi.zi | 111.7 | 8 |
| sc.mix.qpoi | 144.5 | 8 |
| sc.mix.poi.tr | 187.0 | 14 |
| sc.mix.nb | 204.9 | 8 |
| sc.mix.poi.zi | 221.1 | 8 |
| sc.nb | 252.5 | 7 |
| sc.qpoi | 256.8 | 7 |
| sc.poi.zi | 424.0 | 7 |
| sc.mix | 525.4 | 8 |
| sc.mix.mompop | 603.3 | 8 |
| sc.norm | 615.1 | 7 |
| sc.mix.poi | 1075.4 | 7 |
| sc.poi | 1730.3 | 6 |
The best-fiting model is a hurdle model with the following components:
We can individually select the binary and count component of the hurdle models. The AIC of the combined model (sum of the AICs of the binary and count models) was computed above.
dfun(hr)[order(dfun(hr)$dAIC),]
df dAIC
c.mix.hr 4 0.000
sc.mix.hr 7 3.508
c.hr 3 37.492
sc.hr 6 40.060
intercept.hr 1 40.516
dfun(trunc)[order(dfun(trunc)$dAIC),]
df dAIC
sc.mix.qpoi.tr 8 0.00
sc.mix.nb.tr 8 22.68
sc.mix.poi.tr 7 186.96
Hurdle models are not easily comparable to regular GLMMs except by AIC, so they are omitted until inference below.
Looking at the normal, fixed effects model, we see that the residuals are not normal:
shapiro.test(sc.norm$residuals)#raw residuals!
Shapiro-Wilk normality test
data: sc.norm$residuals
W = 0.95667, p-value = 2.082e-11
Furthermore, we examined the overall data distribution for overdispersion above. This is a formal test of each model, with the null hypothesis that there is no overdispersion.
The mixed negative binomial and mixed quasi-Poisson are not over-dispersed:
sc.overdisp.names <- c("sc.mix.poi","sc.mix.poi.zi","sc.mix.nb","sc.mix.nb.zi.mompop", "sc.mix.nb.zi", "sc.mix.qpoi", "sc.mix.qpoi.zi.mompop","sc.mix.qpoi.zi")
sc.overdisp <- sapply(sc.overdisp.names, get, USE.NAMES=T)
kable(t(sapply(sc.overdisp, overdisp)))
| chisq | ratio | rdf | p | |
|---|---|---|---|---|
| sc.mix.poi | 1990.6421 | 3.784491 | 526 | 0.0000000 |
| sc.mix.poi.zi | 1778.4340 | 3.381053 | 526 | 0.0000000 |
| sc.mix.nb | 372.5411 | 0.708253 | 526 | 0.9999999 |
| sc.mix.nb.zi.mompop | 708.9644 | 1.347841 | 526 | 0.0000002 |
| sc.mix.nb.zi | 762.2621 | 1.449167 | 526 | 0.0000000 |
| sc.mix.qpoi | 1226.2945 | 2.331358 | 526 | 0.0000000 |
| sc.mix.qpoi.zi.mompop | 652.8878 | 1.241232 | 526 | 0.0001282 |
| sc.mix.qpoi.zi | 512.1089 | 0.973591 | 526 | 0.6596505 |
The coefficients estimated for each model agree qualitatively (excluding the normal distribution models, not shown, which has a different link function). It’s hard to interpret main effects in the presence of a significant interaction, but here are the non-rigorous results (wait for the lsmeans Tukey test):
sc.loglink.names <- c("sc.poi","sc.poi.zi","sc.mix.poi","sc.mix.poi.tr","sc.mix.poi.zi","sc.nb","sc.nb.zi","sc.mix.nb","sc.mix.nb.zi", "sc.mix.nb.zi.mompop","sc.mix.nb.tr","sc.qpoi","sc.qpoi.zi", "sc.mix.qpoi", "sc.mix.qpoi.zi.mompop","sc.mix.qpoi.zi","sc.mix.qpoi.tr")
sc.loglink <- sapply(sc.loglink.names, get, USE.NAMES=T)
coefplot2(sc.loglink, legend.x="topright",legend=T,legend.args=list(cex=0.8, xpd=T, inset=c(-0.1,0)), col.pts=sample(gg_color_hue(length(sc.loglink.names))), spacing=0.05, lwd.2=2, lwd.1=4, intercept=T)
These are plots of residuals (y) against fitted values (x) for each model. Ideally, the black spline is horizontal and residuals are equally spread. Otherwise, there are non-linear relationships beteeen the predictors and outcome. Zeroes in the original data are marked in red.
#cannot use the normal models since they do not have sd.est to compute Pearson residuals
par(mfrow=c(3,5),mar=c(0,0,1,0))
for(i in 4:length(sc.list)){
if(i %in% c(2,3,8)) plot.new()
plot(fitted(sc.list[[i]]), residuals(sc.list[[i]]), axes=F, frame= T, xlab = "Fitted Values", ylab = "Residuals", col=1+(sds$seednum==0), main=sc.names[i])
abline(h = 0, lty = 2)
lines(smooth.spline(fitted(sc.list[[i]]), residuals(sc.list[[i]])))
}
Model residuals are normally distributed if they follow the fitted line. For large sample sizes, residuals of Poisson and negative binomial GLMs tend towards normality. Zeroes in the original data are marked in red.
par(mfrow=c(3,5),mar=c(2,0,1,0))
for(i in 4:length(sc.list)){
if(i %in% c(2,3,8)) plot.new()
qqp(residuals(sc.list[[i]]), "norm", main=sc.names[i], lwd=1, col.lines=3, col=1+(sds$seednum==0))
}
These are plots of sqrt(abs(residuals)) (y) against fitted values (x) for each model. Ideally, the loess fit should be flat. This tests the assumption of equal variance (homoscedasticity). Zeroes in the original data are marked in red.
par(mfrow=c(3,5),mar=c(0,0,1,0))
for(i in 4:length(sc.list)){
if(i==8) plot.new()
locscaleplot(sc.list[[i]],col=1+(sds$seednum==0))
title(main=sc.names[i])
}
We chose the model with nearly the best (lowest) AIC, sc.mix.qpoi.tr+sc.mix.hr, to carry out inference tests and parameter estimation. This hurdle model is split into a binary model (data reduced to =0 or >0) and a count model (truncated distribution fit to observations >0), with the same fixed and random terms. Each model has its own parameter estimates.
mod <- sc.mix.qpoi.tr
print(mod)
GLMM's in R powered by AD Model Builder:
Family: truncnbinom
alpha = 1.9153
link = log
Fixed effects:
Log-likelihood: -1079.12
AIC: 2174.24
Formula: seednum ~ species * crosstype + (1 | mompid)
(Intercept) specieskaal
2.57178857 -0.73781370
crosstypewithin crosstypehybrid
-0.12982426 -0.76278096
specieskaal:crosstypewithin specieskaal:crosstypehybrid
0.06588293 0.52063268
Random effects:
Structure: Diagonal matrix
Group=mompid
Variance StdDev
(Intercept) 0.1078 0.3283
Number of observations: total=387, mompid=25
mod.hr <- sc.mix.hr
print(mod.hr)
GLMM's in R powered by AD Model Builder:
Family: binom
link = logit
Fixed effects:
Log-likelihood: -288.431
AIC: 590.862
Formula: yesno ~ species * crosstype + (1 | mompid)
(Intercept) specieskaal
1.33667166 0.20635116
crosstypewithin crosstypehybrid
-0.02307112 -0.89466439
specieskaal:crosstypewithin specieskaal:crosstypehybrid
-0.30533293 0.43281810
Random effects:
Structure: Diagonal matrix
Group=mompid
Variance StdDev
(Intercept) 1.084 1.041
Number of observations: total=533, mompid=26
Using a likelihood ratio test, with a null hypothesis of zero variance, the random effect (maternal plant) is significant for both model parts:
anova(sc.qpoi.zi, sc.mix.qpoi.zi) #double this p-value. or simulate null by permuting data.
Analysis of Deviance Table
Model 1: seednum ~ species * crosstype
Model 2: seednum ~ species * crosstype
NoPar LogLik Df Deviance Pr(>Chi)
1 8 -1430.4
2 9 -1395.2 1 70.32 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(sc.hr, sc.mix.hr)
Analysis of Deviance Table
Model 1: yesno ~ species * crosstype
Model 2: yesno ~ species * crosstype
NoPar LogLik Df Deviance Pr(>Chi)
1 6 -307.71
2 7 -288.43 1 38.552 5.331e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By dropping it from the model and performing a likelihood-ratio test, we see that the species x crosstype interaction is significant for the count model but not the binary model:
#sxc.chisq <- drop1(mod, test="Chisq") #load from file
dfun(sxc.chisq)
Single term deletions
Model:
seednum ~ species * crosstype + (1 | mompid)
Df dAIC LRT Pr(>Chi)
<none> 0.00
species:crosstype 2 11.56 15.56 0.000418 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# sxc.chisq.hr <- drop1(sc.mix.hr, test="Chisq") #load from file
dfun(sxc.chisq.hr)
Single term deletions
Model:
yesno ~ species * crosstype + (1 | mompid)
Df dAIC LRT Pr(>Chi)
<none> 2.394
species:crosstype 2 0.000 1.606 0.448
The model estimated the following parameters, with individual parameter significance determined by the Wald z-test, and fixed effect significance determined by analysis of deviance Wald test.
summary(mod)
Call:
glmmadmb(formula = seednum ~ species * crosstype + (1 | mompid),
data = subset(sds, seednum > 0), family = "truncnbinom1",
admb.opts = admbControl(shess = FALSE, noinit = FALSE))
AIC: 2174.2
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.5718 0.1286 20.00 < 2e-16 ***
specieskaal -0.7378 0.1769 -4.17 3.0e-05 ***
crosstypewithin -0.1298 0.0825 -1.57 0.12
crosstypehybrid -0.7628 0.0784 -9.73 < 2e-16 ***
specieskaal:crosstypewithin 0.0659 0.1562 0.42 0.67
specieskaal:crosstypehybrid 0.5206 0.1240 4.20 2.7e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of observations: total=387, mompid=25
Random effect variance(s):
Group=mompid
Variance StdDev
(Intercept) 0.1078 0.3283
Negative binomial dispersion parameter: 1.9153 (std. err.: 0.15177)
Log-likelihood: -1079.12
Anova(mod, type=3)
Analysis of Deviance Table (Type III tests)
Response: seednum
Df Chisq Pr(>Chisq)
(Intercept) 1 399.872 < 2.2e-16 ***
species 1 17.395 3.035e-05 ***
crosstype 2 101.604 < 2.2e-16 ***
species:crosstype 2 18.171 0.0001133 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod.hr)
Call:
glmmadmb(formula = yesno ~ species * crosstype + (1 | mompid),
data = sds, family = "binom")
AIC: 590.9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.3367 0.4875 2.74 0.0061 **
specieskaal 0.2064 0.6745 0.31 0.7596
crosstypewithin -0.0231 0.5167 -0.04 0.9644
crosstypehybrid -0.8947 0.3793 -2.36 0.0184 *
specieskaal:crosstypewithin -0.3053 0.7722 -0.40 0.6925
specieskaal:crosstypehybrid 0.4328 0.5540 0.78 0.4347
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of observations: total=533, mompid=26
Random effect variance(s):
Group=mompid
Variance StdDev
(Intercept) 1.084 1.041
Log-likelihood: -288.431
Anova(mod.hr, type=3)
Analysis of Deviance Table (Type III tests)
Response: yesno
Df Chisq Pr(>Chisq)
(Intercept) 1 7.5177 0.00611 **
species 1 0.0936 0.75964
crosstype 2 5.5648 0.06189 .
species:crosstype 2 0.7898 0.67375
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These are box and QQ (to check normality) plots of the estimated random effect of each maternal plant. Note that this is NOT an assuption of the model, but they should approach normality for high sample sizes in a well-fitted model.
predre <- setNames(data.frame(RE=ranef(mod)$mompid,SD=ranef(mod, sd=T)$`1`),c("RE","SD"))
ggplot(predre, aes(x = rownames(predre),y=RE)) +
geom_point(size = 2) + coord_flip()+
geom_errorbar(aes(ymin = RE-SD, ymax = RE+SD)) + labs(x="Maternal plants", y="Predicted random effects")
#Count
reStack <- ldply(ranef(mod))
print( qqmath( ~`(Intercept)`|.id, data=reStack, scales=list(relation="free"),
prepanel = prepanel.qqmathline,
panel = function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
},
layout=c(1,1)))
#Binary
reStack <- ldply(ranef(mod.hr))
print( qqmath( ~`(Intercept)`|.id, data=reStack, scales=list(relation="free"),
prepanel = prepanel.qqmathline,
panel = function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
},
layout=c(1,1)))
These predictions and confidence intervals are only based on fixed effects, conditional on the estimates of the random-effect variances.
newdat <- expand.grid(crosstype=levels(sds$crosstype), species=levels(sds$species))
mm <- model.matrix(delete.response(terms(mod)),newdat) ## design matrix (fixed effects)
newdat$pred <- mm %*% fixef(mod) ## matrix-multiply X by the parameter vector β to get the predictions (or linear predictor in the case of GLM(M)s). could also use predict(mod, newdata=newdat)
newdat$seednum <- exp(newdat$pred) #For GLMMs, back-transform this with the inverse link function (e.g. plogis() for binomial, beta; exp() for Poisson, negative binomial
predvar <- diag(mm %*% vcov(mod) %*% t(mm)) #extract the variance-covariance matrix of the parameters V, compute XVX′ to get the variance-covariance matrix of the predictions; extract the diagonal of this matrix to get variances of predictions;
pdl <- position_dodge(width=0.1)
pd <- position_dodge(width=0.4)
g0 <- ggplot(sds,aes(x=crosstype,y=seednum,color=species)) + geom_count(aes(size = ..prop..),position=pd) + scale_size_area()
g1 <- g0 + geom_line(aes(group=species),data=newdat, position=pd)
#confidence intervals
newdat$SE <- sqrt(predvar) #take the square-root of the variances to get the standard deviations (errors) of the predictions
#confidence intervals = based on fixed-effected uncertainty only
#compute confidence intervals based on a Normal approximation;
#for GL(M)Ms, run the confidence interval boundaries (not the standard errors) through the inverse-link function.
g1 + geom_errorbar(data=newdat, aes(ymin=exp(pred-1.96*SE),ymax=exp(pred+1.96*SE)), position=pd)
These confidence intervals are based on fixed effects uncertainty and random effects variance.
#if computing prediction rather than confidence intervals, add the residual variance
newdat$SE2 <- sqrt(predvar+mod$alpha^2)
g1 + geom_errorbar(data=newdat, aes(ymin=pred-1.96*SE2,ymax=pred+1.96*SE2), position=pd)
The least square means procedure can generate predictor estimates of each type, and give their significance groupings with a post-hoc Tukey test. S. hookeri-produced hybrids produce less seeds than either crosses between or within S. hookeri populations. The other differences are not significant, but remember that the fixed effect of hybrid (vs. between) was significant (model summary).
#Count
rg <- ref.grid(mod)
#summary(rg)
sxc.lsm <- lsmeans(rg, ~ crosstype*species)
plot(sxc.lsm)
cld(sxc.lsm, Letters=letters) #tukey letterings
crosstype species lsmean SE df asymp.LCL asymp.UCL .group
hybrid kaal 1.591827 0.2689134 NA 1.064766 2.118887 a
within kaal 1.770034 0.2813601 NA 1.218578 2.321489 a
hybrid hook 1.809008 0.1557897 NA 1.503665 2.114350 a
between kaal 1.833975 0.2058160 NA 1.430583 2.237367 a
within hook 2.441964 0.1505596 NA 2.146873 2.737056 b
between hook 2.571789 0.1286100 NA 2.319718 2.823860 b
Results are given on the log (not the response) scale.
Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 6 estimates
significance level used: alpha = 0.05
#Binary
rg.hr <- ref.grid(mod.hr)
sc.lsm.hr <- lsmeans(rg.hr, ~ crosstype*species)
plot(sc.lsm.hr)
options(digits=4)
cld.mod <- cld(sxc.lsm, Letters=letters) #tukey letterings
cld.mod[rev(order(cld.mod$species, cld.mod$crosstype)),]
crosstype species lsmean SE df asymp.LCL asymp.UCL .group
hybrid kaal 1.592 0.2689 NA 1.065 2.119 a
within kaal 1.770 0.2814 NA 1.219 2.321 a
between kaal 1.834 0.2058 NA 1.431 2.237 a
hybrid hook 1.809 0.1558 NA 1.504 2.114 a
within hook 2.442 0.1506 NA 2.147 2.737 b
between hook 2.572 0.1286 NA 2.320 2.824 b
Results are given on the log (not the response) scale.
Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 6 estimates
significance level used: alpha = 0.05