Lecture 15: Longitudinal Data

Joel Correa da Rosa
June, 19th 2017

Longitudinal Data

It is common to make studies where the subjects are followed over time.

The regression approach assumes independence between data which is not the case when some subjects contribute with two or more observations.

The same violation for the classical regression happens when subjects are grouped by schools, households, classes, …

The solution to enable the use of statistical models is the inclusion of a variable to absorb the group correlation and allow the residuals to be uncorrelated.

Strategies for Longitudinal Data

  • fixed-effects model

  • random-effects model

Example: Birth Weight

A dataset of weight of the babies of 200 mothers who all have five children. The objective is to measure the association between the age of the mother and birth weight.

This data set can be load with the following R command:

# babies=read.table("http://www.uio.no/studier/emner/matnat/math/STK4900/v16/gababies.txt", header=T)

Example: Birth Weight

momid birthord momage timesnc delwght lowbrth bweight lastwght initage initwght cinitage
39 1 15 0 -1240 0 3720 2480 15 3720 -2.545
39 2 17 2 -1240 0 3260 2480 15 3720 -2.545
39 3 19 4 -1240 0 3910 2480 15 3720 -2.545
39 4 24 9 -1240 0 3320 2480 15 3720 -2.545
39 5 25 10 -1240 1 2480 2480 15 3720 -2.545
62 1 17 0 -170 1 2381 2211 17 2381 -0.545
62 2 21 4 -170 1 2835 2211 17 2381 -0.545
62 3 25 8 -170 1 2381 2211 17 2381 -0.545
62 4 27 10 -170 1 2268 2211 17 2381 -0.545
62 5 28 11 -170 1 2211 2211 17 2381 -0.545

Example: Birth Weight

plot of chunk unnamed-chunk-5

Example: Birth Weight

 plot(babies$birthord,babies$bweight,type="n",xlab="Birth order",ylab="Birth Weight") 
 points(babies$birthord,babies$bweight,type="p",pch=3) 
 for(i in unique(babies$momid)) 
 { 
 lines(babies$birthord[babies$momid==i],babies$bweight[babies$momid==i],type="l",lty=2) 
}

plot of chunk unnamed-chunk-6

Example: Birth Weight

Within the same mother, the birth weights are not independent and this has to be taken into account in the data analysis. See below the output of the “naive” analysis that ignores the dependence of the data within the same mother.

fit<-lm(bweight~momage,data=babies)
summary(fit)

Call:
lm(formula = bweight ~ momage, data = babies)

Residuals:
     Min       1Q   Median       3Q      Max 
-2942.81  -293.58    28.38   327.51  1747.63 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2754.504     69.101  39.862  < 2e-16 ***
momage        17.610      3.083   5.712 1.48e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 571.2 on 998 degrees of freedom
Multiple R-squared:  0.03165,   Adjusted R-squared:  0.03068 
F-statistic: 32.62 on 1 and 998 DF,  p-value: 1.476e-08

Example: Birth Weight (Linear Model ignoring Mother's effects)

n<-dim(babies)[1]
plot(fit$residuals[2:n],fit$residuals[1:(n-1)])

plot of chunk unnamed-chunk-8

Residuals are autocorrelated

cor(fit$residuals[2:n],fit$residuals[1:(n-1)])
[1] 0.3394293

Fixed-Effects Approach

A simple solution is to include a variable to control the effect due to each one of the mothers. This is called the fixed effects approach.

fit<-lm(bweight~momage+factor(momid),data=babies)

Fixed-Effects Approach

n<-dim(babies)[1]
plot(fit$residuals[2:n],fit$residuals[1:(n-1)])

plot of chunk unnamed-chunk-11

Fixed-Effects Approach

Although this approach has positive impact on the residuals autocorrelation, it fits a model with a large number of parameters (one for each mother).

Random-effects Approach

Instead of estimating one effect for each mother, the random-effects approach assumes that there is a random effect from mother that follows the normal distribution with mean 0 and variance \( \sigma_{\alpha}^2 \)

Random-effects Regression Model (Simpler formulation)

For the birth weight problem we will fit a random-effects model according to the following equation:

\( BirthWeight_{ij} = \beta_0 +\beta_1 Age_{ij} + \alpha_i +\epsilon_{ij} \)

This model describe the \( j \)-th birth from the \( i \)-th mother as a function of the age of the \( i \)-mother at the \( j \)-th birth plus a random effect from the \( i \)-th mother.

\( \alpha_i\sim \) Normal(0,\( \sigma_{\alpha}^2 \))

\( \epsilon_{ij} \sim \) Normal(0,\( \sigma^2 \))

Interpretation of Random-effects

\( \sigma_{\alpha}^2 \) : Covariance for two measurements from the same mother.

\( \sigma_{\alpha}^2 + \sigma^2 \) : Variance of the Birth Weight

\( \frac{\sigma_{\alpha}^2}{\sigma_{\alpha}^2 + \sigma^2} \) : Correlation of two measurements from the same mother

Random-effects in R

There are different libraries in R for fitting random-effects models. Here we show an application for the library lme4.

require(lme4)
fit.lme<-lmer(bweight~momage+(1|momid),data=babies)

Random-effects in R

There are different libraries in R for fitting random-effects models. Here we show an application for the library lme4.

summary(fit.lme)
Linear mixed model fit by REML ['lmerMod']
Formula: bweight ~ momage + (1 | momid)
   Data: babies

REML criterion at convergence: 15312

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-5.1122 -0.4427  0.0469  0.5293  4.0901 

Random effects:
 Groups   Name        Variance Std.Dev.
 momid    (Intercept) 127796   357.5   
 Residual             199047   446.1   
Number of obs: 1000, groups:  momid, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2793.678     73.031   38.25
momage        15.799      3.099    5.10

Correlation of Fixed Effects:
       (Intr)
momage -0.918

Random-effects in R

plot(residuals(fit.lme)[1:(n-1)],residuals(fit.lme)[2:n])

plot of chunk unnamed-chunk-14