Correlated Data

This week we continued to work with correlated data. Here are common types of correlation structure:

An Example

Here is an example from the practice we did on identifying correlation structures, among other things.

Study: Car accidents. The monthly number of car accidents at fifteen intersections during 2013.

R Code

Here is an example of how a mixed model is implemented in R:

# Read the Galton height data
Galton <- read.csv("http://www.cknudson.com/data/Galton.csv")
# lme4 provides lmer() for fitting linear mixed models
library(lme4)
## Warning: package 'lme4' was built under R version 3.6.3
## Loading required package: Matrix
# No fixed intercept (0 +), one mean per Gender, random intercept per family
mixedmod <- lmer(Height ~ 0 + Gender + (1 | FamilyID), data = Galton)
summary(mixedmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Height ~ 0 + Gender + (1 | FamilyID)
##    Data: Galton
## 
## REML criterion at convergence: 4007.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9475 -0.5661  0.0067  0.5937  3.5069 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  FamilyID (Intercept) 2.448    1.564   
##  Residual             3.843    1.960   
## Number of obs: 898, groups:  FamilyID, 197
## 
## Fixed effects:
##         Estimate Std. Error t value
## GenderF  64.1489     0.1542   415.9
## GenderM  69.3019     0.1505   460.5
## 
## Correlation of Fixed Effects:
##         GendrF
## GenderM 0.567

This model predicts the height of a child. The fixed effects are two coefficients on indicator variables for female and male. Because there is no fixed intercept, each coefficient is the estimated mean height for that gender. Notice how the ‘0 +’ in ‘Height ~ 0 + Gender’ is used to fit a model without an intercept. This makes it easier to compare the mean heights of the two genders directly. We could have included an intercept by omitting the 0, in which case one gender would serve as the baseline and the other coefficient would be a difference from it. There is a random effect for each FamilyID. This means that families are the PSU (primary sampling unit). The ‘(1|FamilyID)’ term gives each family its own random intercept. Here is what the fitted model looks like:

\[\widehat{height}_{ij}=64.1489\,I(female)+69.3019\,I(male)+u_{j} \]

where

\[u_j \overset{\text{iid}}{\sim} N(0,1.564^2) \]
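As an aside, here is a sketch of the with-intercept version mentioned above, obtained by dropping the 0 from the formula. It assumes the same Galton data and URL used earlier. The object name mixedmod_int is made up for this illustration. With R's default contrasts and F as the first level of Gender, the intercept should be the female mean and the GenderM coefficient should be the male-minus-female difference, roughly 69.3019 - 64.1489, or about 5.15.

```r
library(lme4)

# Same data as in the example above
Galton <- read.csv("http://www.cknudson.com/data/Galton.csv")

# Omitting the 0 keeps the fixed intercept: Gender is now coded as a
# baseline level plus a difference, instead of two separate means
mixedmod_int <- lmer(Height ~ Gender + (1 | FamilyID), data = Galton)

# Fixed effects only: (Intercept) and GenderM
fixef(mixedmod_int)
```

Both parameterizations describe the same fitted means. The no-intercept form just reports them directly.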

There are two sources of variation in this model. The first is the residual variance. This is the variability within families, from child to child, and it is estimated as \(3.843\). The second is the variance of the random effects. This is the variability from family to family, and it is estimated as \(2.448\).
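One way to tie these two variances back to correlation: in a random-intercept model, two children in the same family share the same \(u_j\), so the implied correlation between their heights (after accounting for gender) is the family variance divided by the total variance. This ratio is often called the intraclass correlation. Below is a minimal sketch in R using the variance estimates copied from the summary output above. The variable names are illustrative and not part of the original code.

```r
# Variance components copied from the "Random effects" table above
var_family   <- 2.448  # between-family variance (FamilyID intercept)
var_residual <- 3.843  # within-family (residual) variance

# Intraclass correlation: share of total variance due to families
icc <- var_family / (var_family + var_residual)
print(round(icc, 3))
```

With these estimates the ratio comes out to about 0.39, so a bit more than a third of the variation in height (beyond gender) is attributed to family. Given the fitted object, the same two variances could be pulled from as.data.frame(VarCorr(mixedmod))$vcov instead of being typed by hand.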