growth curve analysis

Goals

data on growth of rat pups under different levels of irradiation of mothers, from J. McEvoy
use a generalized additive model to analyze changes in weight over time
use a random-effects/multilevel component to allow for differences among mothers and pups within mothers
keep it reasonably simple (!)

The ideal model would allow for interactions of dose and sex with the growth curve (= smooth effect of week), and for random variation among pups (and mothers) in the growth curve. Presumably-interesting biological questions are:

(how) does intercept vary by dose/sex/interaction?
(how) does shape of growth curve vary, ditto?

Preliminaries

Load packages:

## built-in packages (no need to install)
library(mgcv)
## use install.packages() (or RStudio menu) for these
##  fitting
## data manipulation
library(plyr)
library(dplyr)
## graphics, diagnostics, etc.
library(ggplot2); theme_set(theme_bw())
library(cowplot) ## plot_grid
library(bbmle) ## for AICtab
library(broom)
library(dotwhisker)
library(emmeans)
library(pander)

Read in data & make preliminary plot:

dd <- read.csv("DEPRE.weights.vertical.csv")
## re-order dose properly
dd <- transform(dd,
                dose=factor(dose,
                     levels=c("Sham",paste0(c(50,300,1000),"mGy"))),
                ## factors and interactions
                mother=factor(mother),
                dose_sex=interaction(dose,sex),
                mother_id=interaction(mother,id,drop=TRUE))

dd <- subset(dd,select=c(mother,id,mother_id,week,dose,sex,dose_sex,weight))
pups_per_mum <- with(unique(subset(dd,select=c(mother,id))),rowSums(table(mother,id)))
summary(subset(dd,select=-c(mother_id,dose_sex)))

##      mother          id           week         dose         sex     
##  108    : 52   A184-1 : 13   Min.   : 4   Sham   :182   female:416  
##  124    : 52   A198-1 : 13   1st Qu.: 7   50mGy  :221   male  :403  
##  140    : 52   A199-1 : 13   Median :10   300mGy :221               
##  145    : 52   A199-2 : 13   Mean   :10   1000mGy:195               
##  178    : 52   A222-1 : 13   3rd Qu.:13                             
##  188    : 52   A222-2 : 13   Max.   :16                             
##  (Other):507   (Other):741                                          
##      weight     
##  Min.   :12.00  
##  1st Qu.:18.10  
##  Median :20.20  
##  Mean   :21.09  
##  3rd Qu.:24.50  
##  Max.   :31.50  
##

There are 23 mothers and 63 pups (from 1 to 4 pups per mother). Mothers are “nested” within dose (each mother, and all of her pups, have a single dose treatment). mother_id and dose_sex (not shown) are auxiliary variables that encode interactions. Doses are measured in milliGrays (except for “Sham”, i.e. sham treatment with no radiation).

Each line represents one pup:

gg0 <- ggplot(dd,aes(week,weight,colour=mother))+
    geom_point()+
    facet_wrap(~dose,labeller=label_both)
print(gg0+geom_line(aes(linetype=sex,group=id)))

Or (dropping information about mothers from the plot, plotting mean values for sex/dose/week combinations):

gg1 <- ggplot(dd,aes(week,weight,colour=dose,shape=dose))+
    geom_point()+
    facet_wrap(~sex,scale="free")+
    scale_colour_brewer(palette="Dark2")
print(gg1+
      stat_summary(fun.y=mean,geom="line",lwd=2,alpha=0.8)+
      geom_line(aes(group=id),alpha=0.2))

Since it seems hard/finicky to come up with and fit a nonlinear mixed-effects model (e.g. coming up with a simple, sensible nonlinear function that starts out decelerating but transitions to linear growth rather than saturating to a constant value), we’ll proceed with generalized additive mixed models.

model possibilities

Many growth curve fits: data file with fits available here.

L <- load("growthcurves_batch.RData")

All models are fitted with mgcv::gam() using method="REML", with sex using sum-to-zero contrasts, e.g.

gam(weight~sex*dose+s(week,by=dose_sex)+
      s(week,mother_id,bs='fs'),
    data=dd,
    method="REML")

(this is the most complex model fitted).

Using gamm4 would have been faster, but it didn’t seem worth the additional trouble for these purposes (our slowest model took 4.1 minutes on an 8-year-old desktop machine).

We include sex as a predictor in all models; this pattern (males are heavier than females and have a different growth curve) is biologically obvious/not interesting to us. In other words, “growth curves differ by sex but dose doesn’t have any clear effects” is our null model/hypothesis.

We have three options for the vertical shift of the curves (aka changes in intercept):

sex: M/F have different overall size
sex+dose: additive effects of dose and sex on overall size
sex*dose: interaction of dose and sex

Three options for changes in the growth curve by group:

s(week,by=sex): M/F have different growth curves
s(week,by=dose)+s(week,by=sex): additive effects
s(week,by=dose_sex): interactive effects on growth curve

And three options for random effects:

1: no random effect, all data pooled
s(mother_id,bs='re'): random effects on overall size (vertical shift in growth curves)
s(week,mother_id,bs='fs'): random effects on growth curve

For simplicity, I only fitted pup-within-mother random effects; I didn’t try to decompose the random effects into (mother + pup within mother). Preliminary analyses suggested that this wouldn’t make much difference/ we wouldn’t be able to detect a non-trivial level of variance among mothers with these data. Furthermore, when I tried to include both effects in the same model (sex+s(week,by=sex)+s(week,mother,bs='fs')+s(week,mother_id,bs='fs')) I got:

Error in gam(forms[[i]], data = dd, method = "REML") :
Model has more coefficients than data<br
Calls: system.time -> gam
In addition: Warning message:
In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.

model examples

For illustration, let’s look at the predictions of some of these models.

plotfun <- function(model,exclude=NULL) {
    ## generate data for prediction
    pframe <- dd
    ## exclusion: see https://stats.stackexchange.com/questions/131106/predicting-with-random-effects-in-mgcv-gam
    pframe$weight <- predict(model,newdata=pframe,exclude=exclude)
    gg2 <- (ggplot(pframe,aes(week,weight,colour=dose,shape=dose))
        +         geom_point()
        + scale_colour_brewer(palette="Dark2")
        + geom_line(aes(group=id,linetype=sex))
    )
    return(gg2)
}

Simplest model: different curves by sex, all doses identical, no random effects.

plotfun(models[["sex+s(week,by=sex)+1"]])

Interactive effects of dose and sex on shift; curve shape varies only by sex: no random effects.

plotfun(models[["dose*sex+s(week,by=sex)+1"]])

Interactive effects of dose and sex on shift and curve shape: no random effects.

plotfun(models[["dose*sex+s(week,by=dose_sex)+1"]])

No effects of dose: random effects (shifts) by pup

plotfun(models[["sex+s(week,by=sex)+s(mother_id,bs='re')"]])

No effects of dose: random effects (curves) by pup

plotfun(models[["sex+s(week,by=sex)+s(week,mother_id,bs='fs')"]])

Most complex model (dose*sex + sex(week,by=dose_sex)+s(week,mother_id,bs='fs')):

plotfun(models[["dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs='fs')"]])

Not sure what this means:

no_legend <- theme(legend.position="none")
ysc <- scale_y_continuous(limits=c(12,30))
small_sub <-
  theme(plot.subtitle=element_text(size=10))
dwid <- c(40,50) ## deparse width
noREmodel_ind <- which(names(models)=="dose*sex+s(week,by=dose_sex)+1")
fullmodel_ind <- which(names(models)=="dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs='fs')")
gg3 <- plot_grid(
    plotfun(models[[fullmodel_ind]],
            exclude="s(week,mother_id)")
    +no_legend+ ysc + small_sub
    + ggtitle("full model, RE zeroed out",
         subtitle=paste(c(deparse(forms[[fullmodel_ind]],width=dwid[1]),
                          "predict(.,exclude='s(week,mother_id)')"),
                          collapse="\n")),
    plotfun(models[[noREmodel_ind]])
    + no_legend+ysc + small_sub
    + ggtitle("model with no RE",
            subtitle=paste(c(deparse(forms[[noREmodel_ind]],width=dwid[2]),
                "predict(.)"),
                           collapse="\n")),
    nrow=1)
print(gg3)

ggsave(gg3,file="modelcomp.png",width=8)

inference (1):

AICtab <- (ldply(models,glance,.id="model")
    %>% mutate(dAIC=AIC-min(AIC))  ## compute delta-AIC
    %>% select(model,dAIC,df)
    ## compute AIC 'weights'
    %>% mutate(weight=exp(-dAIC/2),
           weight=weight/sum(weight))
    %>% arrange(dAIC)
)

AIC results:

model	dAIC	df	weight
sex+s(week,by=dose_sex)+s(week,mother_id,bs=‘fs’)	0	210.7	0.611
dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs=‘fs’)	2	209.4	0.223
dose+sex+s(week,by=dose_sex)+s(week,mother_id,bs=‘fs’)	2.6	208.9	0.166
sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs=‘fs’)	34.3	183.2	0
dose+sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs=‘fs’)	37.1	181.5	0
dose*sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs=‘fs’)	39.9	182.3	0

This suggests that the only models we really need to think about (ΔAIC<10) are those with the full set of random effects (curves varying by pup) and with curve shapes varying at the population level by dose and sex.

Classical null-hypothesis testing for the full model:

full_model <-  models[["dose*sex+s(week,by=dose_sex)+s(week,mother_id,bs='fs')"]]
anova(full_model)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## weight ~ dose * sex + s(week, by = dose_sex) + s(week, mother_id, 
##     bs = "fs")
## 
## Parametric Terms:
##          df      F  p-value
## dose      3  5.947 0.000532
## sex       1 75.632  < 2e-16
## dose:sex  3  0.120 0.948038
## 
## Approximate significance of smooth terms:
##                                    edf  Ref.df     F p-value
## s(week):dose_sexSham.female      6.790   7.893 20.16  <2e-16
## s(week):dose_sex50mGy.female     7.239   8.255 25.38  <2e-16
## s(week):dose_sex300mGy.female    7.907   8.685 18.07  <2e-16
## s(week):dose_sex1000mGy.female   6.716   7.829 23.75  <2e-16
## s(week):dose_sexSham.male        7.688   8.561 86.44  <2e-16
## s(week):dose_sex50mGy.male       7.562   8.481 94.83  <2e-16
## s(week):dose_sex300mGy.male      6.048   7.201 68.03  <2e-16
## s(week):dose_sex1000mGy.male     7.981   8.724 86.17  <2e-16
## s(week,mother_id)              143.436 614.000  8.46  <2e-16

This shows that the “shift” effect in the dose is clear (very clear effect of sex, effect of interaction is unclear). Not a useful comparison of curve effects (all we get is a list saying that all of the curves are clearly different from a flat line).

Model comparison: full model (interaction between dose and sex in curve shape) vs. model with additive effects of dose and sex on the shape of the growth curve

anova(full_model,
      models[["dose*sex+s(week,by=dose)+s(week,by=sex)+s(week,mother_id,bs='fs')"]],test="F")

## Analysis of Deviance Table
## 
## Model 1: weight ~ dose * sex + s(week, by = dose_sex) + s(week, mother_id, 
##     bs = "fs")
## Model 2: weight ~ dose * sex + s(week, by = dose) + s(week, by = sex) + 
##     s(week, mother_id, bs = "fs")
##   Resid. Df Resid. Dev      Df Deviance      F   Pr(>F)    
## 1    569.05     148.23                                     
## 2    600.52     166.24 -31.472  -18.011 2.3537 6.16e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dot-whisker plot of shift-effects only: since sex is coded with sum-to-zero contrasts, the dose parameters represent the average across males and females of the difference between Sham and (50 mGy, 300 mGy, 1000 mGy). sex1 means the difference between females (level 1) and the average of males and females.

dwplot(tidy(full_model,parametric=TRUE))+
    geom_vline(linetype=2,xintercept=0)

Differences in curves are hard to summarize in terms of parameters.

To do/questions for experts

best way to get overall effects/post hoc comparisons?
- effects package?
- multcomp::glht ?
- emmeans ?
- ??
are the model differences/ΔAIC overstated? To my eyes these models look a little overfitted. How would I cross-check? Parametric bootstrap, simulation, nested cross-validation?
what is driving the differences in the RE-zeroed vs RE-left-out predictions above? Should the wiggliness in the former (overfitted main effects) worry me?
how much should I worry about random effects on mothers?
diagnostics: plots look OK, but check residual vs. fitted, scale-location, random effects, Q-Q … ??

References

?mgcv::smooth.construct.fs.smooth.spec
helpful advice from G. Simpson, N. Ross, E. Pedersen on twitter (esp. bs='fs'), e.g. this twitter thread: ‘when fitting models w/ both a main effect and a group level term (e.g. s(time) + s(time, id, bs=“fs”)) using the gam() function, it’s really important to use method = “REML” or “ML”; using the default GCV criteria often leads to very flat main effects’
https://noamross.github.io/mgcv-esa-workshop/
https://github.com/noamross/mixed-effect-gams
- specifically figure comparing different smooths