Random intercept/slope model vs. nested random effect model in R

Intro

In many experiments, random effects interact with fixed effects. However, modelling such interactions can be challenging. In this example I will take a closer look at an example that is rather common in (evo)ecology.

In an experiment we examine how plants respond to a treatment. The population level effect is modeled as a fixed effect. In many cases we are as much interested in how individual genotypes respond to the treatment, i.e. a genotype \(\times\) environment interaction. This can be done by having both genotype and treatment as fixed effects. But if we want to make generalizable statements, easier extract genetic variation and do other in depths analyses, it is preferable to model genotype as a random factor.

First let us generate some data based on a complete randomized experiment. Clones of each genotype were randomly assigned to a treatment (control or treatment) and thereafter randomly planted in a grid. Fitness was measured as the response, that later was transformed into relative fitness. The change in fitness between the control and the treatment (damage) measures tolerance. Hence this experiment can be said to explore variation in tolerance to damage. In the first example we assume that there is a strong genotype \(\times\) treatment interaction. In other words, genotypes respond differently and we can express this as they have different slopes (note that treatment is a categorical variable here so it may be more “correct” to say differences than slopes. However I will use the term slope in this example). Variance in slopes result in higher genotype variance in the treatment group compared to the control group if there is no correlation between the intercept and the slopes. In the first example presented here, there is no correlation between the intercept and the slope, i.e. genotype response does not depend on the “intrinsic” fitness. Towards the end of this report I will show an example where there is a strong correlation between intercepts and slopes, which leads to similar variances in the treatment groups.

Random intercept/slope model

set.seed(1)
# Genotype variation in the control (n = 50)
rel.c <- rnorm(50, 1, sd = 0.2)
# Genetic variation of tolerance, i.e. slopes (var = sd^2)
# The population effect is set to zero for simplicity
gen.var <- rnorm(50, 0, sd = 0.30)
# Genotype means for damage treatment
#rel.d <- rel.c + sort(gen.var)
rel.d <- rel.c + gen.var
# Set within genotype:treatment combination sample size
n.in = 20
# Generate within genotype:treatment data with sigma = 0.3
dat.con <- rnorm(50*n.in, rel.c, sd = 0.3)
dat.treat <- rnorm(50*n.in, rel.d, sd = 0.3)
# Give genotype id
gen.id <- rep(1:50, times = n.in)
# Put together data frame
dat <- data.frame( "gen" = factor(c(gen.id, gen.id)), "treat" = rep(c("con","treat"), each = 50*n.in),
            "rel.fit" = c(dat.con, dat.treat) )

To get a better understanding of the data we can plot the genotype responses using ordinary least-square.

plot of chunk unnamed-chunk-2

There is large variation in slopes and this results in a larger genotypic variation in the Damage treatment. We can now start to fit models to this data. I will be using the package nlme but code for fitting the same model in lme4 is shown as well.

library(nlme)
# Random intercept and slope
# Fitting a model with random intercepts and random slope but no co-varaince between intercepts and slope
mod <- lme(rel.fit ~ treat, random=list(gen = pdDiag(~ treat)), data = dat)
# In lme4
#mod.lmer <- lmer(rel.fit ~ dummy(treat, "treat") + (1|gen) + (0 + dummy(treat, "treat") | gen), data = dat)

# This model can also be defined as estimating a variance for each group (control, treat) instead of estimating the differences as in the model above
mod.2var <- lme(rel.fit ~ treat, random=list(gen = pdDiag(~ treat-1)), data = dat) 
# Only random intercept model
mod.null <- lme(rel.fit ~ treat, random = ~ 1|gen, data = dat)

If we look at the models we notice that the first model, random int/slope, captures the data generating process rather well. The variation in slopes is close to the value used in the simulation - sd = 0.2912.

mod

## Linear mixed-effects model fit by REML
##   Data: dat 
##   Log-restricted-likelihood: -639.9
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~treat | gen
##  Structure: Diagonal
##         (Intercept) treattreat Residual
## StdDev:      0.1645     0.2912   0.3131
## 
## Number of Observations: 2000
## Number of Groups: 50

mod.2var

## Linear mixed-effects model fit by REML
##   Data: dat 
##   Log-restricted-likelihood: -644.9
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~treat - 1 | gen
##  Structure: Diagonal
##         treatcon treattreat Residual
## StdDev:   0.1648     0.3324   0.3131
## 
## Number of Observations: 2000
## Number of Groups: 50

mod.null

## Linear mixed-effects model fit by REML
##   Data: dat 
##   Log-restricted-likelihood: -787.8
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~1 | gen
##         (Intercept) Residual
## StdDev:      0.2168   0.3455
## 
## Number of Observations: 2000
## Number of Groups: 50

# Test for random slope (test on boudary so P-value should be diveded by 2)
anova(mod, mod.null)

##          Model df  AIC  BIC logLik   Test L.Ratio p-value
## mod          1  5 1290 1318 -639.9                       
## mod.null     2  4 1584 1606 -787.8 1 vs 2   295.9  <.0001

# We can also explicity test the hypothesis that the genotypic variance is equal in the two treatments. It is clear that two variances are needed. But strangely, the "2 variances model" is not identical to the intercept + slope model
anova(mod.2var, mod.null)

##          Model df  AIC  BIC logLik   Test L.Ratio p-value
## mod.2var     1  5 1300 1328 -644.9                       
## mod.null     2  4 1584 1606 -787.8 1 vs 2   285.8  <.0001

We generated data with no correlation between intercepts and slopes. We can test the correlation (i.e. the covariance term).

# Fit model with a covariance between intercept and slope
mod.corr <- lme(rel.fit ~ treat, random = ~ 1+treat|gen, data = dat)
summary(mod.corr)

## Linear mixed-effects model fit by REML
##  Data: dat 
##    AIC  BIC logLik
##   1292 1325 -639.8
## 
## Random effects:
##  Formula: ~1 + treat | gen
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev Corr  
## (Intercept) 0.1648 (Intr)
## treattreat  0.2917 -0.018
## Residual    0.3131       
## 
## Fixed effects: rel.fit ~ treat 
##              Value Std.Error   DF t-value p-value
## (Intercept) 1.0130   0.02532 1949   40.01  0.0000
## treattreat  0.0334   0.04356 1949    0.77  0.4427
##  Correlation: 
##            (Intr)
## treattreat -0.105
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -2.92156 -0.65699 -0.02624  0.68332  3.58296 
## 
## Number of Observations: 2000
## Number of Groups: 50

# Test this covariance component. The covariance can be negative so it is not appropiate to divide by 2.
anova(mod.corr, mod)

##          Model df  AIC  BIC logLik   Test L.Ratio p-value
## mod.corr     1  6 1292 1325 -639.8                       
## mod          2  5 1290 1318 -639.9 1 vs 2 0.01242  0.9113

The test gives us a high P-value (no evidence of a correlation) as we expected.

Nested random effect model

There is another way to examine interactions between a random factor and a fixed. This is called the nested model. A nested version is often said to be, more or less, identical to the first model we fitted (random intercept + random slope but not covariance). In fact, Pinheiro and Bates (2000) compare the mod.corr model with a nested model to test if there is correlation. However, the nested model is not identical. The likelihood estimate do not differ much but there is a difference in how the random effects are partitioned. Lets look closer at these two models.

First we define the random intercept + random slope (no correlation) model. \[Y_ij = (\beta_0 + b_0i) + (\beta_1 + b_1i)x_ij + \epsilon_ij\]

\[\; \begin{equation} b_i = \left( \begin{array}{ccc} b_0i\\ b_1i\\ \end{array} \right) \end{equation} \sim N(0, \Psi), \: \epsilon_ij \sim N(0, \sigma^2)\]

\[ \begin{equation} \Psi = \left( \begin{array}{ccc} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \\ \end{array} \right) \end{equation} \]

There are no covariances terms in and thus no correlation between random intercept and random slope. We can rewrite this in a vector format and then we get:

\[y_i = X_i\beta + Z_i b_i + \epsilon_ij, \; i = 1, ..., 50,\] \[b_i \sim N(0, \Psi), \epsilon_i \sim N(0, \sigma^2I)\]

where is as above.

The nested model deals differently with the random effects. This model has two levels; the genotype level and treatment within genotype (the genotype x treatment interaction). It can be defined as: \[Y_ijk = \beta_j + b_i + b_ij + \epsilon_ijk, \ \ \ \ \ \ \ \ i = 1, ..., 50, \ j = 1, 2, \ k = 1, ..., 20,\] \[b_i \sim N(0, \sigma_1^2), \ \ \ b_ij \sim N (0, \sigma_2^2), \ \ \ \epsilon \sim N(0, \sigma^2)\]

The nested approach is how you usually model a nested experimental design such as a split-plot experiment with blocks. For example, you then allow for varying intercepts for each treatment within blocks. In our plant response example the nested model is less intuitive. The genotype effect is straight forward, it is the overall position of each genotype, but the nested random effect is allowing different intercepts at each treatment level within genotypes. To better illustrate the differences we can fit the nested model and graph the effects.

# Fit the nested random effect model
mod.nest <- lme(rel.fit ~ treat, random=~1|gen/treat, data = dat) 
# In lme4
# mod.nest.lmer <- lmer(rel.fit ~ treat + (1 | gen)+ (1 | gen:treat), data = dat)

# The fixed effects are the same but the random effects are different
mod.nest

## Linear mixed-effects model fit by REML
##   Data: dat 
##   Log-restricted-likelihood: -651
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~1 | gen
##         (Intercept)
## StdDev:      0.1621
## 
##  Formula: ~1 | treat %in% gen
##         (Intercept) Residual
## StdDev:      0.2063   0.3131
## 
## Number of Observations: 2000
## Number of Groups: 
##            gen treat %in% gen 
##             50            100

The nested random effect model does not give us the variation in slopes. So if that is of interest this model may be less useful. The main effect of genotype may, or may not, be of interest. To test of a main effect in presences of a interaction effect is always a topic of debate. In general, the specific question in mind should guide you how to proceed.

Graphic comparison between the two models

We can plot a specific genotype to have a closer look at the differences between the two models

# Select genotype
gen.id = 25
dat.sub <- dat[dat$gen == gen.id,]
# Make numeric for plotting
xnum <- as.numeric(dat.sub$treat)-1
# Plot raw data
plot(dat.sub$rel.fit ~ xnum, ylim = c(0.4, 1.5), xaxt = "n", xlab = "", ylab = "Rel. fitness", las = 1)  
axis(1, c("Control", "Damage"), at= c(0,1) )
# Add line for the fixed effect
abline(fixef(mod))
# Add line for the least-square estimate of this genotype
abline(lm(dat.sub$rel.fit ~ xnum), lty = 2)
# Add line for conditonal genotype effect (the model that includes both fixed and random effects).
abline(a = coef(mod)[gen.id, 1], b = coef(mod)[gen.id, 2],  col = 2, lty = 2)
# We can extract the estimated intercept, i.e. fixed + BLUP of the random intercept 
# And illustrate its effect size with a line
fixef(mod)[1] + ranef(mod)[gen.id,1]

## (Intercept) 
##       1.095

segments(x0 = 0, x1 = 0, y0 = fixef(mod)[1] , y1 = coef(mod)[gen.id, 1], col = 2, lwd = 1.2 )
# We can extract the estimated slope of the genotype, i.e. fixed effect + BLUP of the random slope
fixef(mod)[2] + ranef(mod)[25, 2]

## treattreat 
##    -0.2015

# Compare with the nested random effect model
# Fixed estimates are identical between the two models
summary(mod.nest)

## Linear mixed-effects model fit by REML
##  Data: dat 
##    AIC  BIC logLik
##   1312 1340   -651
## 
## Random effects:
##  Formula: ~1 | gen
##         (Intercept)
## StdDev:      0.1621
## 
##  Formula: ~1 | treat %in% gen
##         (Intercept) Residual
## StdDev:      0.2063   0.3131
## 
## Fixed effects: rel.fit ~ treat 
##              Value Std.Error   DF t-value p-value
## (Intercept) 1.0130   0.03840 1900  26.383  0.0000
## treattreat  0.0334   0.04356   49   0.768  0.4463
##  Correlation: 
##            (Intr)
## treattreat -0.567
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -2.90452 -0.65370 -0.02994  0.68304  3.59840 
## 
## Number of Observations: 2000
## Number of Groups: 
##            gen treat %in% gen 
##             50            100

# Predicted intercept in the control treatment
# Fixed intercept + BLUP genotype + BLUP within treat:genotype random intercept
# The estimate is plotted as a blue point
con.est <- fixef(mod.nest)[1] + ranef(mod.nest)$gen[gen.id, 1] + ranef(mod.nest)$treat$'(Intercept)'[(gen.id*2-1)]   
points(-0.005, con.est, col = "blue")
# The BLUP within treat:genotype is plotted as a line (i.e. the effect size)
segments(x0 = -0.005, x1 = -0.005, y0 = fixef(mod)[1] , y1 = fixef(mod)[1] + ranef(mod.nest)$treat$'(Intercept)'[(gen.id*2-1)], col = "blue", lwd = 1.2 )
# Now the same for the treatment Damage
# First get the treatment mean
treat.me <- fixef(mod.nest)[1] + fixef(mod.nest)[2]
# Illustrate the BLUP within treat:genotype as a line
segments(x0 = 1.005, x1 = 1.005, y0 = treat.me, y1 = treat.me + ranef(mod.nest)$treat$'(Intercept)'[(gen.id*2)], 
         col = "blue", lwd = 1.2 )
# Plot predicted intercept in the Damage treatmen
treat.est <- treat.me + ranef(mod.nest)$gen[gen.id, 1] + ranef(mod.nest)$treat$'(Intercept)'[(gen.id*2)]   
points(1.005, treat.est, col = "blue")
# We can now compare the "slopes" for this genotype between the random int/slope model and the nested random effect model
treat.est - con.est

## (Intercept) 
##     -0.2148

coef(mod)[gen.id, 2]

## [1] -0.2015

# Add "slope" of the nested random effect model
segments(x0 = -0.005, y0 = con.est,  x1 = 1.005, y1 = treat.est, col = "blue", lty = 2 )

# To better illustrate the random effects we can plot their sizes.
# Genotype random effect
segments(x0 = 0.3, y0 = 0.85,  x1 = 0.3, y1 = 0.85 + ranef(mod)[gen.id,1], col = "red", lwd = 1.3 )
segments(x0 = 0.32, y0 = 0.85,  x1 = 0.32, y1 = 0.85 + ranef(mod.nest)$gen[gen.id, 1], col = "blue", lwd = 1.3 )

# Slope vs within genotype:treatment random effect
segments(x0 = 0.3, y0 = 0.65,  x1 = 0.3, y1 = 0.65 + ranef(mod)[gen.id, 2], col = "red", lwd = 1.3 )
segments(x0 = 0.32, y0 = 0.65,  x1 = 0.32, y1 = 0.65 - ranef(mod.nest)$treat$'(Intercept)'[gen.id*2-1], col = "blue", lwd = 1.3 )
segments(x0 = 0.34, y0 = 0.65,  x1 = 0.34, y1 = 0.65 + ranef(mod.nest)$treat$'(Intercept)'[gen.id*2], col = "blue", lwd = 1.3 )
# Sum the nested random effects for both treatments to get the "slope" for this model
segments(x0 = 0.36, y0 = 0.65,  x1 = 0.36, 
         y1 = 0.65 + ranef(mod.nest)$treat$'(Intercept)'[gen.id*2] - ranef(mod.nest)$treat$'(Intercept)'[gen.id*2-1] , col = "blue", lwd = 1.3)
# Text for labeling
text(0.22, 0.89, "Genotype", cex = 0.8)
text(0.22, 0.61, expression(atop("Genotype x","Treatment") ), cex = 0.8)
text(-0.027, 1.058, "a", col = "blue")
text(0.02, 1.058, "b", col = "red")
text(1.017, 0.962, "c", col = "blue")
text(0.44, 0.97, "d", col = "red")
text(0.69, 1.08, "Fixed effect", cex = 1 )

text(0.298, 0.81, "b", col = "red")
text(0.298, 0.683, "d", col = "red")
text(0.319, 0.683, "a", col = "blue")
text(0.339, 0.683, "c", col = "blue")
text(0.375, 0.683, "a+c", col = "blue")
legend("top", c("Red = Random intercept (b)/slope model (d)", "Blue = Nested random effect (a,c) model",
                "(blue points = predicted values within treatments)"), text.col=c("red", "blue", "blue"))

plot of chunk unnamed-chunk-7

Figure. Showing data and fitted models for one specific genotype. The solid black line is the average treatment effect (labelled fixed effect). Red illustrates the fit of the random intercept/slope model while blue is the nested random effect model. Dashed black line is the least-square fit for this genotype while the red dashed line is the fitted slope of the random int/slope model. Blue dashed lines is the calculated slope for the nested model. Solid lines show the random effects (effect size). The nested model shows a very small main effect of genotype while the rand int/slope shows a large effect because it is the change in intercept in the control treatment. The two nested random effects, which allow the intercept to vary in each treatment, sum up to the estimated difference between the treatments (ie slope) by the random int/slope model.

It is clear that the models do have a similar fit, but they model the data differently. First, the main effect of genotype differ largely between the models. In the random int/slope model, the genotype variation is the variation among genotypes in the control treatment. In the nested random effect model, the genotype effect is the overall effect, regardless of treatment. Second, the fixed - random effect interaction is handled in different ways as shown above. In short, the nested model “splits up the slope” into two intercept estimates. Hence, this variance is much smaller than the variance of slopes and may be harder to estimate accurately in some cases. The relationship between the two models can then be further illustrated by examining the slopes.

# Extract the within treatment random effects for each genotype and put them side by side so each row is one genotype
within.treat <- cbind(ranef(mod.nest)$treat$'(Intercept)'[c(TRUE, FALSE)],
                      ranef(mod.nest)$treat$'(Intercept)'[c(FALSE, TRUE)]) 
# Calculate "slopes" by adding the effects 
nest.mod.slopes <- -1 * within.treat[,1] + within.treat[,2]
# make a plot to compare the "slopes" from the two models
plot(nest.mod.slopes , ranef(mod)[, 2], xlab = "Slopes (Nest rand model)", ylab = "Slopes (Rand int/slope model)")

plot of chunk unnamed-chunk-8

The correlation is not perfect but very strong.

*Example with correlation between intercepts and slopes**

Finally, we are going to look at a case where there are large differences in slopes between genotypes but the overall genotype variance is almost zero due to crossing lines (i.e a strong correlation between intercept and slope).

# We simulate some data
set.seed(1)
rel.c <- sort(rnorm(50, 1, sd = 0.2))
gen.var <- rnorm(50, 0, sd = 0.30)
rel.d <- rel.c + sort(gen.var,  decreasing = TRUE) 
n.in = 20
dat.con <- rnorm(50*n.in, rel.c, sd = 0.3)
dat.treat <- rnorm(50*n.in, rel.d, sd = 0.3)
gen.id <- rep(1:50, times = n.in)
dat2 <- data.frame( "gen" = factor(c(gen.id, gen.id)), "treat" = rep(c("con","treat"), each = 50*n.in),
                   "rel.fit" = c(dat.con, dat.treat) )
# We fit the same models again and look at the output
mod2 <- lme(rel.fit ~ treat, random=list(gen = pdDiag(~ treat)), data = dat2) 
mod2.corr <- lme(rel.fit ~ treat, random=~treat|gen, data = dat2) 
mod2.nest <- lme(rel.fit ~ treat, random=~1|gen/treat, data = dat2) 
mod2.null <- lme(rel.fit ~ treat, random=~1|gen, data = dat2) 
mod2

## Linear mixed-effects model fit by REML
##   Data: dat2 
##   Log-restricted-likelihood: -633.7
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~treat | gen
##  Structure: Diagonal
##         (Intercept) treattreat Residual
## StdDev:      0.1453     0.2481    0.314
## 
## Number of Observations: 2000
## Number of Groups: 50

mod2.corr

## Linear mixed-effects model fit by REML
##   Data: dat2 
##   Log-restricted-likelihood: -579.5
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~treat | gen
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev Corr  
## (Intercept) 0.1657 (Intr)
## treattreat  0.2808 -0.993
## Residual    0.3131       
## 
## Number of Observations: 2000
## Number of Groups: 50

mod2.nest

## Linear mixed-effects model fit by REML
##   Data: dat2 
##   Log-restricted-likelihood: -602.4
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~1 | gen
##         (Intercept)
## StdDev:   2.203e-05
## 
##  Formula: ~1 | treat %in% gen
##         (Intercept) Residual
## StdDev:      0.1438   0.3131
## 
## Number of Observations: 2000
## Number of Groups: 
##            gen treat %in% gen 
##             50            100

mod2.null

## Linear mixed-effects model fit by REML
##   Data: dat2 
##   Log-restricted-likelihood: -709.1
##   Fixed: rel.fit ~ treat 
## (Intercept)  treattreat 
##     1.01302     0.03345 
## 
## Random effects:
##  Formula: ~1 | gen
##         (Intercept) Residual
## StdDev:     0.02143   0.3433
## 
## Number of Observations: 2000
## Number of Groups: 50

# We notice that the random int/slope models capture the simualted data rather well and there is a strong correlation between intercepts and slopes. The overall genotype effect is almost zero.
# We can test the covariance between intercepts and slopes, and for varying slopes.
anova(mod2.corr, mod2, mod2.null)

##           Model df  AIC  BIC logLik   Test L.Ratio p-value
## mod2.corr     1  6 1171 1205 -579.5                       
## mod2          2  5 1277 1305 -633.7 1 vs 2   108.4  <.0001
## mod2.null     3  4 1426 1448 -709.1 2 vs 3   150.8  <.0001

Here is a plot showing the crossing slopes.

gen.slope <- coef(mod2) 
ran.data <- data.frame(rownames(gen.slope), gen.slope)
colnames(ran.data) <- c('genotype', 'intercept', 'slope')

# Plot fitted genotype effects.
plot(c(0.5,1.5) ~ c(0.45, 1.5), type = "n", 
     ylab = "Rel. fitness", xlab = "Treatment", xaxt = "n", las = 1)
axis(1, c("Control", "Damage"), at= c(0.5,1.5) )
# Add lines
apply(ran.data, 1, function (x) lines( c(0.5,1.5),y=c(x[2], (as.numeric(x[2])+as.numeric(x[3])) ) ))

plot of chunk unnamed-chunk-10

We can compare this with our first example where the slopes varied but there was no correlation between intercepts and slopes.

plot of chunk unnamed-chunk-11

Random intercept/slope model vs. nested random effect model in R

Gustaf Granath

Intro