Joel Correa da Rosa
June, 19th 2017
It is common to make studies where the subjects are followed over time.
The regression approach assumes independence between data which is not the case when some subjects contribute with two or more observations.
The same violation for the classical regression happens when subjects are grouped by schools, households, classes, …
The solution to enable the use of statistical models is the inclusion of a variable to absorb the group correlation and allow the residuals to be uncorrelated.
fixed-effects model
random-effects model
A dataset of weight of the babies of 200 mothers who all have five children. The objective is to measure the association between the age of the mother and birth weight.
This data set can be load with the following R command:
# babies=read.table("http://www.uio.no/studier/emner/matnat/math/STK4900/v16/gababies.txt", header=T)
momid | birthord | momage | timesnc | delwght | lowbrth | bweight | lastwght | initage | initwght | cinitage |
---|---|---|---|---|---|---|---|---|---|---|
39 | 1 | 15 | 0 | -1240 | 0 | 3720 | 2480 | 15 | 3720 | -2.545 |
39 | 2 | 17 | 2 | -1240 | 0 | 3260 | 2480 | 15 | 3720 | -2.545 |
39 | 3 | 19 | 4 | -1240 | 0 | 3910 | 2480 | 15 | 3720 | -2.545 |
39 | 4 | 24 | 9 | -1240 | 0 | 3320 | 2480 | 15 | 3720 | -2.545 |
39 | 5 | 25 | 10 | -1240 | 1 | 2480 | 2480 | 15 | 3720 | -2.545 |
62 | 1 | 17 | 0 | -170 | 1 | 2381 | 2211 | 17 | 2381 | -0.545 |
62 | 2 | 21 | 4 | -170 | 1 | 2835 | 2211 | 17 | 2381 | -0.545 |
62 | 3 | 25 | 8 | -170 | 1 | 2381 | 2211 | 17 | 2381 | -0.545 |
62 | 4 | 27 | 10 | -170 | 1 | 2268 | 2211 | 17 | 2381 | -0.545 |
62 | 5 | 28 | 11 | -170 | 1 | 2211 | 2211 | 17 | 2381 | -0.545 |
plot(babies$birthord,babies$bweight,type="n",xlab="Birth order",ylab="Birth Weight")
points(babies$birthord,babies$bweight,type="p",pch=3)
for(i in unique(babies$momid))
{
lines(babies$birthord[babies$momid==i],babies$bweight[babies$momid==i],type="l",lty=2)
}
Within the same mother, the birth weights are not independent and this has to be taken into account in the data analysis. See below the output of the “naive” analysis that ignores the dependence of the data within the same mother.
fit<-lm(bweight~momage,data=babies)
summary(fit)
Call:
lm(formula = bweight ~ momage, data = babies)
Residuals:
Min 1Q Median 3Q Max
-2942.81 -293.58 28.38 327.51 1747.63
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2754.504 69.101 39.862 < 2e-16 ***
momage 17.610 3.083 5.712 1.48e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 571.2 on 998 degrees of freedom
Multiple R-squared: 0.03165, Adjusted R-squared: 0.03068
F-statistic: 32.62 on 1 and 998 DF, p-value: 1.476e-08
n<-dim(babies)[1]
plot(fit$residuals[2:n],fit$residuals[1:(n-1)])
Residuals are autocorrelated
cor(fit$residuals[2:n],fit$residuals[1:(n-1)])
[1] 0.3394293
A simple solution is to include a variable to control the effect due to each one of the mothers. This is called the fixed effects approach.
fit<-lm(bweight~momage+factor(momid),data=babies)
n<-dim(babies)[1]
plot(fit$residuals[2:n],fit$residuals[1:(n-1)])
Although this approach has positive impact on the residuals autocorrelation, it fits a model with a large number of parameters (one for each mother).
Instead of estimating one effect for each mother, the random-effects approach assumes that there is a random effect from mother that follows the normal distribution with mean 0 and variance \( \sigma_{\alpha}^2 \)
For the birth weight problem we will fit a random-effects model according to the following equation:
\( BirthWeight_{ij} = \beta_0 +\beta_1 Age_{ij} + \alpha_i +\epsilon_{ij} \)
This model describe the \( j \)-th birth from the \( i \)-th mother as a function of the age of the \( i \)-mother at the \( j \)-th birth plus a random effect from the \( i \)-th mother.
\( \alpha_i\sim \) Normal(0,\( \sigma_{\alpha}^2 \))
\( \epsilon_{ij} \sim \) Normal(0,\( \sigma^2 \))
\( \sigma_{\alpha}^2 \) : Covariance for two measurements from the same mother.
\( \sigma_{\alpha}^2 + \sigma^2 \) : Variance of the Birth Weight
\( \frac{\sigma_{\alpha}^2}{\sigma_{\alpha}^2 + \sigma^2} \) : Correlation of two measurements from the same mother
There are different libraries in R for fitting random-effects models. Here we show an application for the library lme4.
require(lme4)
fit.lme<-lmer(bweight~momage+(1|momid),data=babies)
There are different libraries in R for fitting random-effects models. Here we show an application for the library lme4.
summary(fit.lme)
Linear mixed model fit by REML ['lmerMod']
Formula: bweight ~ momage + (1 | momid)
Data: babies
REML criterion at convergence: 15312
Scaled residuals:
Min 1Q Median 3Q Max
-5.1122 -0.4427 0.0469 0.5293 4.0901
Random effects:
Groups Name Variance Std.Dev.
momid (Intercept) 127796 357.5
Residual 199047 446.1
Number of obs: 1000, groups: momid, 200
Fixed effects:
Estimate Std. Error t value
(Intercept) 2793.678 73.031 38.25
momage 15.799 3.099 5.10
Correlation of Fixed Effects:
(Intr)
momage -0.918
plot(residuals(fit.lme)[1:(n-1)],residuals(fit.lme)[2:n])