The ideal model would allow for interactions of dose and sex with the growth curve (= smooth effect of week), and for random variation among pups (and mothers) in the growth curve. Presumably-interesting biological questions are:
Load packages:
## built-in packages (no need to install)
library(mgcv)
## use install.packages() (or RStudio menu) for these
## fitting
## data manipulation
library(plyr)
library(dplyr)
## graphics, diagnostics, etc.
library(ggplot2); theme_set(theme_bw())
library(cowplot) ## plot_grid
library(bbmle) ## for AICtab
library(broom)
library(dotwhisker)
library(emmeans)
library(pander)
Read in data & make preliminary plot:
dd <- read.csv("DEPRE.weights.vertical.csv")
## re-order dose properly
dd <- transform(dd,
dose=factor(dose,
levels=c("Sham",paste0(c(50,300,1000),"mGy"))),
## factors and interactions
mother=factor(mother),
dose_sex=interaction(dose,sex),
mother_id=interaction(mother,id,drop=TRUE))
dd <- subset(dd,select=c(mother,id,mother_id,week,dose,sex,dose_sex,weight))
pups_per_mum <- with(unique(subset(dd,select=c(mother,id))),rowSums(table(mother,id)))
summary(subset(dd,select=-c(mother_id,dose_sex)))
## mother id week dose sex
## 108 : 52 A184-1 : 13 Min. : 4 Sham :182 female:416
## 124 : 52 A198-1 : 13 1st Qu.: 7 50mGy :221 male :403
## 140 : 52 A199-1 : 13 Median :10 300mGy :221
## 145 : 52 A199-2 : 13 Mean :10 1000mGy:195
## 178 : 52 A222-1 : 13 3rd Qu.:13
## 188 : 52 A222-2 : 13 Max. :16
## (Other):507 (Other):741
## weight
## Min. :12.00
## 1st Qu.:18.10
## Median :20.20
## Mean :21.09
## 3rd Qu.:24.50
## Max. :31.50
##
There are 23 mothers and 63 pups (from 1 to 4 pups per mother). Mothers are “nested” within dose (each mother, and all of her pups, have a single dose treatment). mother_id and dose_sex (not shown) are auxiliary variables that encode interactions. Doses are measured in milliGrays (except for “Sham”, i.e. sham treatment with no radiation).
Each line represents one pup:
gg0 <- ggplot(dd,aes(week,weight,colour=mother))+
geom_point()+
facet_wrap(~dose,labeller=label_both)
print(gg0+geom_line(aes(linetype=sex,group=id)))
Or (dropping information about mothers from the plot, plotting mean values for sex/dose/week combinations):
gg1 <- ggplot(dd,aes(week,weight,colour=dose,shape=dose))+
geom_point()+
facet_wrap(~sex,scale="free")+
scale_colour_brewer(palette="Dark2")
print(gg1+
stat_summary(fun.y=mean,geom="line",lwd=2,alpha=0.8)+
geom_line(aes(group=id),alpha=0.2))
Since it seems hard/finicky to come up with and fit a nonlinear mixed-effects model (e.g. coming up with a simple, sensible nonlinear function that starts out decelerating but transitions to linear growth rather than saturating to a constant value), we’ll proceed with generalized additive mixed models.
Many growth curve fits: data file with fits available here.
L <- load("growthcurves_batch.RData")
All models are fitted with mgcv::gam() using method="REML", with sex using sum-to-zero contrasts, e.g.
gam(weight~sex*dose+s(week,by=dose_sex)+
s(week,mother_id,bs='fs'),
data=dd,
method="REML")
(this is the most complex model fitted).
Using gamm4 would have been faster, but it didn’t seem worth the additional trouble for these purposes (our slowest model took 4.1 minutes on an 8-year-old desktop machine).
We include sex as a predictor in all models; this pattern (males are heavier than females and have a different growth curve) is biologically obvious/not interesting to us. In other words, “growth curves differ by sex but dose doesn’t have any clear effects” is our null model/hypothesis.
We have three options for the vertical shift of the curves (aka changes in intercept):
sex: M/F have different overall sizesex+dose: additive effects of dose and sex on overall sizesex*dose: interaction of dose and sexThree options for changes in the growth curve by group:
s(week,by=sex): M/F have different growth curvess(week,by=dose)+s(week,by=sex): additive effectss(week,by=dose_sex): interactive effects on growth curveAnd three options for random effects:
1: no random effect, all data pooleds(mother_id,bs='re'): random effects on overall size (vertical shift in growth curves)s(week,mother_id,bs='fs'): random effects on growth curveFor simplicity, I only fitted pup-within-mother random effects; I didn’t try to decompose the random effects into (mother + pup within mother). Preliminary analyses suggested that this wouldn’t make much difference/ we wouldn’t be able to detect a non-trivial level of variance among mothers with these data. Furthermore, when I tried to include both effects in the same model (sex+s(week,by=sex)+s(week,mother,bs='fs')+s(week,mother_id,bs='fs')) I got:
Error in gam(forms[[i]], data = dd, method = "REML") :
Model has more coefficients than data<br
Calls: system.time -> gam
In addition: Warning message:
In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
model has repeated 1-d smooths of same variable.
For illustration, let’s look at the predictions of some of these models.
plotfun <- function(model,exclude=NULL) {
## generate data for prediction
pframe <- dd
## exclusion: see https://stats.stackexchange.com/questions/131106/predicting-with-random-effects-in-mgcv-gam
pframe$weight <- predict(model,newdata=pframe,exclude=exclude)
gg2 <- (ggplot(pframe,aes(week,weight,colour=dose,shape=dose))
+ geom_point()
+ scale_colour_brewer(palette="Dark2")
+ geom_line(aes(group=id,linetype=sex))
)
return(gg2)
}
Simplest model: different curves by sex, all doses identical, no random effects.
plotfun(models[["sex+s(week,by=sex)+1"]])
Interactive effects of dose and sex on shift; curve shape varies only by sex: no random effects.
plotfun(models[["dose*sex+s(week,by=sex)+1"]])
Interactive effects of dose and sex on shift and curve shape: no random effects.
plotfun(models[["dose*sex+s(week,by=dose_sex)+1"]])
No effects of dose: random effects (shifts) by pup
plotfun(models[["sex+s(week,by=sex)+s(mother_id,bs='re')"]])
No effects of dose: random effects (curves) by pup
plotfun(models[["sex+s(week,by=sex)+s(week,mother_id,bs='fs')"]])
Most complex model (dose*sex + sex(week,by=dose_sex)+s(week,mother_id,bs='fs')):
plotfun(models[["dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs='fs')"]])
Not sure what this means:
no_legend <- theme(legend.position="none")
ysc <- scale_y_continuous(limits=c(12,30))
small_sub <-
theme(plot.subtitle=element_text(size=10))
dwid <- c(40,50) ## deparse width
noREmodel_ind <- which(names(models)=="dose*sex+s(week,by=dose_sex)+1")
fullmodel_ind <- which(names(models)=="dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs='fs')")
gg3 <- plot_grid(
plotfun(models[[fullmodel_ind]],
exclude="s(week,mother_id)")
+no_legend+ ysc + small_sub
+ ggtitle("full model, RE zeroed out",
subtitle=paste(c(deparse(forms[[fullmodel_ind]],width=dwid[1]),
"predict(.,exclude='s(week,mother_id)')"),
collapse="\n")),
plotfun(models[[noREmodel_ind]])
+ no_legend+ysc + small_sub
+ ggtitle("model with no RE",
subtitle=paste(c(deparse(forms[[noREmodel_ind]],width=dwid[2]),
"predict(.)"),
collapse="\n")),
nrow=1)
print(gg3)
ggsave(gg3,file="modelcomp.png",width=8)
AICtab <- (ldply(models,glance,.id="model")
%>% mutate(dAIC=AIC-min(AIC)) ## compute delta-AIC
%>% select(model,dAIC,df)
## compute AIC 'weights'
%>% mutate(weight=exp(-dAIC/2),
weight=weight/sum(weight))
%>% arrange(dAIC)
)
AIC results:
| model | dAIC | df | weight |
|---|---|---|---|
| sex+s(week,by=dose_sex)+s(week,mother_id,bs=‘fs’) | 0 | 210.7 | 0.611 |
| dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs=‘fs’) | 2 | 209.4 | 0.223 |
| dose+sex+s(week,by=dose_sex)+s(week,mother_id,bs=‘fs’) | 2.6 | 208.9 | 0.166 |
| sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs=‘fs’) | 34.3 | 183.2 | 0 |
| dose+sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs=‘fs’) | 37.1 | 181.5 | 0 |
| dose*sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs=‘fs’) | 39.9 | 182.3 | 0 |
This suggests that the only models we really need to think about (ΔAIC<10) are those with the full set of random effects (curves varying by pup) and with curve shapes varying at the population level by dose and sex.
Classical null-hypothesis testing for the full model:
full_model <- models[["dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs='fs')"]]
anova(full_model)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## weight ~ dose * sex + s(week, by = dose_sex) + s(week, mother_id,
## bs = "fs")
##
## Parametric Terms:
## df F p-value
## dose 3 5.947 0.000532
## sex 1 75.632 < 2e-16
## dose:sex 3 0.120 0.948038
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(week):dose_sexSham.female 6.790 7.893 20.16 <2e-16
## s(week):dose_sex50mGy.female 7.239 8.255 25.38 <2e-16
## s(week):dose_sex300mGy.female 7.907 8.685 18.07 <2e-16
## s(week):dose_sex1000mGy.female 6.716 7.829 23.75 <2e-16
## s(week):dose_sexSham.male 7.688 8.561 86.44 <2e-16
## s(week):dose_sex50mGy.male 7.562 8.481 94.83 <2e-16
## s(week):dose_sex300mGy.male 6.048 7.201 68.03 <2e-16
## s(week):dose_sex1000mGy.male 7.981 8.724 86.17 <2e-16
## s(week,mother_id) 143.436 614.000 8.46 <2e-16
This shows that the “shift” effect in the dose is clear (very clear effect of sex, effect of interaction is unclear). Not a useful comparison of curve effects (all we get is a list saying that all of the curves are clearly different from a flat line).
Model comparison: full model (interaction between dose and sex in curve shape) vs. model with additive effects of dose and sex on the shape of the growth curve
anova(full_model,
models[["dose*sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs='fs')"]],test="F")
## Analysis of Deviance Table
##
## Model 1: weight ~ dose * sex + s(week, by = dose_sex) + s(week, mother_id,
## bs = "fs")
## Model 2: weight ~ dose * sex + s(week, by = dose) + s(week, by = sex) +
## s(week, mother_id, bs = "fs")
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 569.05 148.23
## 2 600.52 166.24 -31.472 -18.011 2.3537 6.16e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Dot-whisker plot of shift-effects only: since sex is coded with sum-to-zero contrasts, the dose parameters represent the average across males and females of the difference between Sham and (50 mGy, 300 mGy, 1000 mGy). sex1 means the difference between females (level 1) and the average of males and females.
dwplot(tidy(full_model,parametric=TRUE))+
geom_vline(linetype=2,xintercept=0)
Differences in curves are hard to summarize in terms of parameters.
effects package?multcomp::glht ?emmeans ?what is driving the differences in the RE-zeroed vs RE-left-out predictions above? Should the wiggliness in the former (overfitted main effects) worry me?
diagnostics: plots look OK, but check residual vs. fitted, scale-location, random effects, Q-Q … ??
?mgcv::smooth.construct.fs.smooth.specbs='fs'), e.g. this twitter thread: ‘when fitting models w/ both a main effect and a group level term (e.g. s(time) + s(time, id, bs=“fs”)) using the gam() function, it’s really important to use method = “REML” or “ML”; using the default GCV criteria often leads to very flat main effects’