In an effort to add theoretical and practical clarity to the class, a number of days will include brief simulations demonstrating important concepts or tools. Today we will cover pooled models, dummy variables, random intercepts, and random slopes. We will also compare random effects to fixed effects and demonstrate that the former are a regularized version of the latter using data augmentation.
This is not the way in which the topic is usually taught and I have been unable to find clearly expressed analogous treatments. I have found viewing fixed and random effects in the way described below quite clarifying. For the mess of other ways in which the terms “fixed” and “random” effects have been used historically – and the resulting obfuscation of their close relationship – ask John for references!
Simpson’s Paradox is a wonderful illustration of how failing to model the structure within one’s data may lead to incorrect inferences. Let’s make some simple multilevel data to illustrate the paradox.
library(ggplot2)
set.seed(1234)
nobs <- 1000
n_groups <- 4
grp <- rep_len(1:n_groups,nobs)
labels <- letters[grp]
grp_means <- -seq(-n_groups,n_groups,length=n_groups)
x <- rnorm(nobs,grp_means[grp])
err <- rnorm(nobs)
y <- x + 8*grp + err
data <- data.frame(y,x,grp,grp_means[grp],labels)
ggplot(data,aes(x=x,y=y,color=labels)) +
geom_point()
This example is somewhat extreme in order to make clear what is going on. Each blob of data consists of observations from one group. Within each group, as x increases, y increases. Without taking this grouping structure into account, i.e. fitting only the pooled model, we would estimate a negative relationship between y and x.
mod <- lm(y~x,data=data)
summary(mod)
##
## Call:
## lm(formula = y ~ x, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.4648 -2.0264 -0.1024 1.9968 9.9898
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.94274 0.09672 206.2 <2e-16 ***
## x -1.69852 0.03071 -55.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.059 on 998 degrees of freedom
## Multiple R-squared: 0.754, Adjusted R-squared: 0.7537
## F-statistic: 3058 on 1 and 998 DF, p-value: < 2.2e-16
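To see this visually, we can overlay the pooled OLS fit on the grouped scatterplot (a small addition to the original code; geom_smooth() with method = "lm" fits the same pooled regression):
# Overlay the pooled fit: the single fitted line slopes downward even though
# every group's point cloud slopes upward
ggplot(data,aes(x=x,y=y)) +
geom_point(aes(color=labels)) +
geom_smooth(method="lm",se=FALSE,color="black")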
Let’s take a quick look at how to deal with this problem using both fixed and random effects. Fixed effects can be specified by adding the group labels to the linear regression formula, as below.
fe <- lm(y ~ x + labels - 1,data=data)
summary(fe)
##
## Call:
## lm(formula = y ~ x + labels - 1, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1838 -0.6363 0.0134 0.6515 3.1031
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.05606 0.03111 33.95 <2e-16 ***
## labelsa 7.75799 0.13759 56.39 <2e-16 ***
## labelsb 15.87814 0.07486 212.09 <2e-16 ***
## labelsc 24.17006 0.07481 323.07 <2e-16 ***
## labelsd 32.25780 0.14063 229.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9801 on 995 degrees of freedom
## Multiple R-squared: 0.9978, Adjusted R-squared: 0.9978
## F-statistic: 9.089e+04 on 5 and 995 DF, p-value: < 2.2e-16
To estimate random intercepts not much more is required:
library(lme4)
re <- lmer(y ~ (1|labels) + x,data=data)
summary(re)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ (1 | labels) + x
## Data: data
##
## REML criterion at convergence: 2836.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2475 -0.6483 0.0127 0.6653 3.1632
##
## Random effects:
## Groups Name Variance Std.Dev.
## labels (Intercept) 111.4381 10.5564
## Residual 0.9606 0.9801
## Number of obs: 1000, groups: labels, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 20.0160 5.2783 3.792
## x 1.0551 0.0311 33.926
##
## Correlation of Fixed Effects:
## (Intr)
## x 0.000
Note that for both methods we were able to get an accurate estimate of the effect of x on y. We can also take a look at the estimates of the group effects.
ints <- data.frame(truth = 8*unique(grp),
fe_est = fe$coefficients[2:(n_groups+1)],
re_est = coef(re)$labels[,1],
row.names = unique(labels))
ints
## truth fe_est re_est
## a 8 7.757995 7.762162
## b 16 15.878145 15.879567
## c 24 24.170060 24.168640
## d 32 32.257797 32.253527
You’ll note that they are quite similar. In fact, an explicit relationship between the fixed and random effect estimates can be derived. If you’re interested in the theory, take a look at slides 102–116 of this lecture. For our purposes, it is sufficient to say that we will regularize the fixed effects coefficients with a ridge penalty by hand.
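To make the connection concrete, the by-hand procedure below minimizes a penalized least squares criterion of roughly the following form (my own notation: $g[i]$ is the group of observation $i$ and $u_j$ are the $J$ group intercepts):

$$
\min_{\beta,\,u}\; \sum_{i=1}^{n}\left(y_i - \beta x_i - u_{g[i]}\right)^2 + \lambda \sum_{j=1}^{J} u_j^2,
\qquad \lambda = \frac{\hat{\sigma}^2_e}{\hat{\sigma}^2_u}.
$$

With the variance estimates reported above, $\lambda = 0.9606/111.4381 \approx 0.0086$, so the shrinkage toward zero is very mild, which is why the fixed and random effect estimates barely differ.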
First, let’s extract the model matrix from the fixed effects regression.
design <- data.frame(y,model.matrix(fe))
head(design)
## y x labelsa labelsb labelsc labelsd
## 1 9.587601 2.7929343 1 0 0 0
## 2 17.912229 1.6107626 0 1 0 0
## 3 22.211963 -0.2488922 0 0 1 0
## 4 26.289673 -6.3456977 0 0 0 1
## 5 13.132076 4.4291247 1 0 0 0
## 6 15.933506 1.8393892 0 1 0 0
Fixed effects rely on vectors of dummy variables in the design matrix. We will demonstrate that, once their coefficients are shrunk, they are equivalent to the random effect estimates, because random effects are regularized fixed effects. First, let’s extract the variance components and calculate the penalty.
ests <- data.frame(VarCorr(re))
sigma_u <- ests[ests$grp=="labels","vcov"]
sigma_e <- ests[ests$grp=="Residual","vcov"]
lambda <- sigma_e/sigma_u
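As a quick sanity check (my own addition; the objects W, D, and beta_ridge are not part of the original code), the augmentation we are about to perform is algebraically identical to solving the penalized normal equations directly, with the penalty applied only to the dummy columns:
# Penalized normal equations: (W'W + lambda*D)^{-1} W'y,
# where D penalizes only the dummy columns, not x
W <- as.matrix(design[,-1])      # columns: x, labelsa, ..., labelsd
D <- diag(c(0,rep(1,n_groups)))  # no penalty on the x coefficient
beta_ridge <- solve(t(W) %*% W + lambda*D) %*% t(W) %*% design$y
beta_ridge                       # should match the augmented fit below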
Next, we will add pseudo observations to our initial design matrix and estimate the regularized fixed effects coefficients.
tmp <- data.frame(diag(sqrt(lambda),ncol=n_groups,nrow=n_groups))
pseudo <- data.frame(y=rep(0,n_groups),
x=rep(0,n_groups),
tmp)
colnames(pseudo) <- colnames(design)
aug <- rbind(design,pseudo)
out <- lm(y~.-1,data=aug)
summary(out)
##
## Call:
## lm(formula = y ~ . - 1, data = aug)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1821 -0.6410 0.0088 0.6506 3.1008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.05510 0.03131 33.70 <2e-16 ***
## labelsa 7.76155 0.13848 56.05 <2e-16 ***
## labelsb 15.87890 0.07536 210.71 <2e-16 ***
## labelsc 24.16793 0.07531 320.92 <2e-16 ***
## labelsd 32.25276 0.14155 227.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9866 on 999 degrees of freedom
## Multiple R-squared: 0.9978, Adjusted R-squared: 0.9978
## F-statistic: 8.968e+04 on 5 and 999 DF, p-value: < 2.2e-16
Let’s compare the estimates of the group effects to those that we got before.
ints$augmented <- out$coefficients[2:(n_groups+1)]
round(ints,3)
## truth fe_est re_est augmented
## a 8 7.758 7.762 7.762
## b 16 15.878 15.880 15.879
## c 24 24.170 24.169 24.168
## d 32 32.258 32.254 32.253
Note that the augmented coefficient estimates are virtually identical to the random effect estimates.
The Frisch–Waugh–Lovell theorem shows that, in a multivariate linear regression, we can obtain the same coefficient estimate through orthogonal projection. This can be used to illustrate what, exactly, is going on when we introduce fixed effects.
# Regress y on unit dummies and save residuals
data$y_res <- lm(y~.,data=design[,-2])$residuals
# Regress x on unit dummies and save residuals
data$x_res <- lm(x~.,data = design[,-1])$residuals
ggplot(data,aes(x=x_res,y=y_res,color=labels)) + geom_point()
within <- lm(y_res ~ x_res, data = data)
summary(within)
##
## Call:
## lm(formula = y_res ~ x_res, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1838 -0.6363 0.0134 0.6515 3.1031
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.692e-16 3.095e-02 0 1
## x_res 1.056e+00 3.106e-02 34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9786 on 998 degrees of freedom
## Multiple R-squared: 0.5367, Adjusted R-squared: 0.5362
## F-statistic: 1156 on 1 and 998 DF, p-value: < 2.2e-16
What happened here? Well, we have just removed the group means from both y and x in a fancy way; see also this animation.
library(dplyr)
data %>%
group_by(labels) %>%
summarise(y=mean(y),
x=mean(x),
y_res=mean(y_res),
x_res=mean(x_res)) %>%
mutate_if(is.numeric,round,3)
## # A tibble: 4 x 5
## labels y x y_res x_res
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 a 11.9 3.95 0 0
## 2 b 17.3 1.35 0 0
## 3 c 22.7 -1.35 0 0
## 4 d 28.0 -4.06 0 0
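Equivalently (a small check of my own, using base R’s ave()), we can demean y and x by group directly and recover the same within-group slope of about 1.056:
# Demean y and x within each group and re-run the regression;
# the slope matches the dummy/residual approach above
data$y_dm <- data$y - ave(data$y,data$labels)
data$x_dm <- data$x - ave(data$x,data$labels)
coef(lm(y_dm ~ x_dm, data=data))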
We can do the same thing with projections.
dummies <- as.matrix(design[,3:ncol(design)])
M <- diag(nobs) - dummies %*% solve(t(dummies) %*% dummies) %*% t(dummies)
data$y_proj <- M %*% data$y
data$x_proj <- M %*% data$x
summary(lm(y_proj ~ x_proj, data=data))
##
## Call:
## lm(formula = y_proj ~ x_proj, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1838 -0.6363 0.0134 0.6515 3.1031
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.591e-16 3.095e-02 0 1
## x_proj 1.056e+00 3.106e-02 34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9786 on 998 degrees of freedom
## Multiple R-squared: 0.5367, Adjusted R-squared: 0.5362
## F-statistic: 1156 on 1 and 998 DF, p-value: < 2.2e-16
Or with (slightly smoothed) GLS:
X <- data$x
y <- data$y
# M is idempotent and singular, so add a tiny ridge before inverting
C <- solve(M + diag(0.00001,nrow(M)))
sigma <- C %*% t(C)
# GLS estimator: (X' sigma^-1 X)^-1 X' sigma^-1 y
solve(t(X) %*% solve(sigma) %*% X) %*% t(X) %*% solve(sigma) %*% y
## [,1]
## [1,] 1.056063
Neat! If you’re interested in this, I suggest reading Nerlove (1971a) and Nerlove (1971b).
We can conduct a similar exercise to illustrate what is meant by random slopes and the connection, once again, to the “fixed effects” version of the model. Let’s simulate some data featuring both slopes and intercepts that vary by group.
set.seed(1234)
nobs <- 1000
n_groups <- 10
grp <- rep_len(1:n_groups,nobs)
labels <- letters[grp]
grp_means <- rnorm(n_groups,0,1)
grp_ints <- rnorm(n_groups,0,1)
grp_slps <- rnorm(n_groups,0,1)
x <- rnorm(nobs,grp_means[grp])
err <- rnorm(nobs)
y <- grp_slps[grp]*x + grp_ints[grp] + err
data <- data.frame(y,x,grp,grp_means[grp],labels)
ggplot(data,aes(x=x,y=y,color=labels)) +
geom_point()
As expected, the pooled model fails to estimate the relationship of interest.
summary(lm(y~x,data=data))
##
## Call:
## lm(formula = y ~ x, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6246 -1.1399 -0.2360 0.8897 6.5323
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.36534 0.05807 -6.292 4.69e-10 ***
## x -0.23326 0.04085 -5.710 1.49e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.759 on 998 degrees of freedom
## Multiple R-squared: 0.03163, Adjusted R-squared: 0.03066
## F-statistic: 32.6 on 1 and 998 DF, p-value: 1.491e-08
The fixed effects version of the model involves including the interaction of group membership with x.
fe <- lm(y ~ x*labels-1, data = data)
summary(fe)
##
## Call:
## lm(formula = y ~ x * labels - 1, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.14836 -0.63328 0.01518 0.65169 2.94437
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 0.24189 0.09223 2.623 0.00886 **
## labelsa -0.43954 0.14755 -2.979 0.00296 **
## labelsb -1.06838 0.10074 -10.605 < 2e-16 ***
## labelsc -0.78425 0.14172 -5.534 4.02e-08 ***
## labelsd 0.31865 0.28132 1.133 0.25763
## labelse 0.99075 0.10748 9.218 < 2e-16 ***
## labelsf -0.04212 0.11978 -0.352 0.72515
## labelsg -0.32045 0.10986 -2.917 0.00362 **
## labelsh -0.89284 0.12120 -7.367 3.71e-13 ***
## labelsi -0.81970 0.12130 -6.758 2.40e-11 ***
## labelsj 2.42865 0.13804 17.594 < 2e-16 ***
## x:labelsb -0.65894 0.13221 -4.984 7.35e-07 ***
## x:labelsc -0.66297 0.13647 -4.858 1.38e-06 ***
## x:labelsd 0.34852 0.14768 2.360 0.01847 *
## x:labelse -0.86011 0.14144 -6.081 1.71e-09 ***
## x:labelsf -1.63979 0.14179 -11.565 < 2e-16 ***
## x:labelsg 0.33217 0.12020 2.763 0.00583 **
## x:labelsh -1.20438 0.13255 -9.087 < 2e-16 ***
## x:labelsi -0.16260 0.15296 -1.063 0.28801
## x:labelsj -1.21467 0.14049 -8.646 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9893 on 980 degrees of freedom
## Multiple R-squared: 0.7059, Adjusted R-squared: 0.6999
## F-statistic: 117.6 on 20 and 980 DF, p-value: < 2.2e-16
The random effects version is similarly easy to specify.
re <- lmer(y ~ x + (x|labels), data = data)
summary(re)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + (x | labels)
## Data: data
##
## REML criterion at convergence: 2901.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1540 -0.6356 0.0270 0.6581 2.9507
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## labels (Intercept) 1.1465 1.0707
## x 0.4571 0.6761 -0.27
## Residual 0.9787 0.9893
## Number of obs: 1000, groups: labels, 10
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.07741 0.34169 -0.227
## x -0.33374 0.21622 -1.544
##
## Correlation of Fixed Effects:
## (Intr)
## x -0.260
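To visualize what the random slopes model has estimated, we can overlay each group’s fitted line, built from coef(re), on the scatterplot (another small addition; grp_coefs is my own object name):
# Per-group fitted lines implied by the random effects estimates
grp_coefs <- data.frame(labels = rownames(coef(re)$labels),
intercept = coef(re)$labels[,1],
slope = coef(re)$labels[,2])
ggplot(data,aes(x=x,y=y,color=labels)) +
geom_point(alpha=0.4) +
geom_abline(data=grp_coefs,aes(intercept=intercept,slope=slope,color=labels))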
Let’s take a look at the estimates from both methods compared to the truth (I’ll leave modifying the data augmentation approach from above as an exercise for interested students). First, let’s look at the intercept estimates.
ints <- data.frame(true_ints = grp_ints,
fe_int = fe$coefficients[2:(n_groups+1)],
re_int = coef(re)$labels[,1],
row.names = unique(labels))
ints
## true_ints fe_int re_int
## a -0.47719270 -0.43954017 -0.44891873
## b -0.99838644 -1.06838080 -1.05972023
## c -0.77625389 -0.78425492 -0.77536845
## d 0.06445882 0.31864875 0.20328617
## e 0.95949406 0.99074539 0.97952017
## f -0.11028549 -0.04212428 -0.05417239
## g -0.51100951 -0.32044557 -0.32937617
## h -0.91119542 -0.89284252 -0.86663572
## i -0.83717168 -0.81970246 -0.81755623
## j 2.41583518 2.42864833 2.39487016
No surprises here. Let’s take a look at the slope estimates:
# Group a's slope is the base coefficient on x; the interaction terms give each
# other group's deviation from it, so we add the base back to recover group slopes
slps <- data.frame(true_slps = grp_slps,
fe_slp = c(fe$coefficients[1],fe$coefficients[(n_groups+2):(2*n_groups)]+fe$coefficients[1]),
re_slp = coef(re)$labels[,2],
row.names = unique(labels))
slps
## true_slps fe_slp re_slp
## a 0.1340882 0.24188535 0.2329934
## b -0.4906859 -0.41705083 -0.4135118
## c -0.4405479 -0.42108301 -0.4231631
## d 0.4595894 0.59041031 0.5425394
## e -0.6937202 -0.61822396 -0.6116245
## f -1.4482049 -1.39790200 -1.3724575
## g 0.5747557 0.57405818 0.5616333
## h -1.0236557 -0.96249517 -0.9400246
## i -0.0151383 0.07928402 0.0739659
## j -0.9359486 -0.97278635 -0.9877411
Not bad!
The best way to learn R is to use R, and the best way to really understand methods is to poke around at them, either through the mathematical theory or through simulations. I suggest playing around with the simulation code and with the notion of data augmentation for penalization. Recognizing that this amounts to adding particular observations to the data, reason through the following questions:
summary(re)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + (x | labels)
## Data: data
##
## REML criterion at convergence: 2901.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1540 -0.6356 0.0270 0.6581 2.9507
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## labels (Intercept) 1.1465 1.0707
## x 0.4571 0.6761 -0.27
## Residual 0.9787 0.9893
## Number of obs: 1000, groups: labels, 10
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.07741 0.34169 -0.227
## x -0.33374 0.21622 -1.544
##
## Correlation of Fixed Effects:
## (Intr)
## x -0.260