Correlated Data
This week we continued to work with correlated data. Here are common types of correlation structure:
Longitudinal: Multiple measurements on the same subjects over time. An example is measuring each baby's weight every month.
Spatial: Observations near each other in location are likely to be similar.
Clustered: Observations fall into groups, and observations in the same group tend to be similar. Examples include choosing a dorm and randomly sampling rooms in that dorm, or randomly choosing a room and sampling everyone in it.
Repeated measurements: Multiple measurements on each subject.
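All four structures share one idea: observations that belong together tend to be similar. A minimal base-R simulation makes this concrete for the clustered case. It assumes a hypothetical random-intercept setup with 5000 clusters, two observations per cluster, and both standard deviations set to 1. The cluster effect that the two observations share is what makes them correlated.

```r
# Simulate clustered data: each cluster (e.g., a dorm room) has one shared random effect,
# so two observations from the same cluster end up correlated.
set.seed(1)
n_clusters <- 5000
sd_cluster <- 1  # hypothetical between-cluster standard deviation
sd_within  <- 1  # hypothetical within-cluster standard deviation

cluster_effect <- rnorm(n_clusters, mean = 0, sd = sd_cluster)

# Two observations per cluster: the shared cluster effect plus independent noise
obs1 <- cluster_effect + rnorm(n_clusters, mean = 0, sd = sd_within)
obs2 <- cluster_effect + rnorm(n_clusters, mean = 0, sd = sd_within)

# Theoretical within-cluster correlation: between variance / total variance
theory   <- sd_cluster^2 / (sd_cluster^2 + sd_within^2)
observed <- cor(obs1, obs2)
cat("theoretical:", theory, " observed:", round(observed, 3), "\n")
```

With equal variances, the theoretical correlation is 0.5, and the simulated correlation should land close to it.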
An Example
Here is one of the practice problems we worked on. For a given study, we identified the primary sampling unit, the response, and the correlation structure.
Study: Car accidents. The monthly number of car accidents at fifteen intersections during 2013.
Primary Sampling Unit (PSU): Intersection. The PSU is usually the unit that is measured multiple times. Here, each intersection is observed every month.
Response: The number of accidents.
Correlation Structure: Spatial, since nearby intersections are likely to have similar numbers of accidents. The data are also longitudinal, because each intersection is measured every month of 2013.
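One way to see why the intersection is the PSU is to lay the study out in long format, with one row per intersection per month. This is a sketch under stated assumptions. The column names are hypothetical, and the notes do not give the accident counts, so that column is left empty instead of being filled with invented numbers.

```r
# Long-format layout of the car accident study: one row per intersection per month.
# Column names are hypothetical; the real counts are not given, so they stay NA.
accidents <- expand.grid(month = 1:12, intersection = 1:15)
accidents$n_accidents <- NA_integer_  # placeholder for the observed monthly counts

nrow(accidents)                # 15 intersections x 12 months = 180 rows
table(accidents$intersection)  # each intersection (the PSU) appears 12 times
```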
R Code
Here is an example of fitting a linear mixed model in R with the lme4 package, using data on the heights of children from Galton's families:
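library(lme4)  # provides lmer() for fitting linear mixed models
# Galton's height data: one row per child, including Height, Gender, and FamilyID
Galton <- read.csv("http://www.cknudson.com/data/Galton.csv")
# 0 + Gender: one mean per gender and no overall intercept
# (1 | FamilyID): a random intercept for each family
mixedmod <- lmer(Height ~ 0 + Gender + (1 | FamilyID), data = Galton)
summary(mixedmod)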
## Linear mixed model fit by REML ['lmerMod']
## Formula: Height ~ 0 + Gender + (1 | FamilyID)
##    Data: Galton
##
## REML criterion at convergence: 4007.8
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.9475 -0.5661  0.0067  0.5937  3.5069
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  FamilyID (Intercept) 2.448    1.564
##  Residual             3.843    1.960
## Number of obs: 898, groups:  FamilyID, 197
##
## Fixed effects:
##         Estimate Std. Error t value
## GenderF  64.1489     0.1542   415.9
## GenderM  69.3019     0.1505   460.5
##
## Correlation of Fixed Effects:
##         GendrF
## GenderM 0.567
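This model predicts the height of a child. The fixed effects are the coefficients on two indicator variables, one for female and one for male, so each coefficient is the estimated average height for that gender. Notice how the ‘0’ in ‘Height ~ 0 + Gender’ removes the intercept. This makes it easy to compare the average heights of girls and boys directly. We could have included an intercept by omitting the 0. Female would then be the baseline, and the GenderM coefficient would estimate how much taller boys are than girls on average. There is a random effect for each FamilyID, which means that families are the PSU. The ‘(1|FamilyID)’ term gives each family its own random intercept, which shifts the heights of all children in that family up or down together. Here is what the model looks like: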
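To see what the ‘0’ does to the fixed-effect columns, base R's `model.matrix()` can build both design matrices for a tiny hypothetical data set. The data set has one girl and two boys, coded "F" and "M" to match the GenderF and GenderM labels in the output.

```r
# Tiny hypothetical data set: one girl and two boys
kids <- data.frame(Gender = factor(c("F", "M", "M")))

# Without an intercept: one indicator column per gender (GenderF, GenderM)
X_no_intercept <- model.matrix(~ 0 + Gender, data = kids)
X_no_intercept

# With an intercept: F becomes the baseline, so the columns are (Intercept) and GenderM
X_with_intercept <- model.matrix(~ Gender, data = kids)
X_with_intercept
```

Both versions describe the same model. Refitting with an intercept should therefore give an intercept equal to the GenderF estimate (64.1489) and a GenderM coefficient equal to the difference 69.3019 - 64.1489 = 5.1530.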
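\[height_{ij}=64.1489\,I(\text{female})+69.3019\,I(\text{male})+u_{j}+\epsilon_{ij} \]
where child \(i\) belongs to family \(j\), and
\[u_j \overset{iid}{\sim} N(0,1.564^2), \qquad \epsilon_{ij} \overset{iid}{\sim} N(0,1.960^2) \]
with the \(u_j\) independent of the \(\epsilon_{ij}\).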
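Simulating from the fitted model is one way to check what the equation says. This sketch plugs in the estimates from summary(mixedmod). The number of children per family (5) and the random assignment of genders are hypothetical choices, not features of the Galton data.

```r
# Simulate heights from the fitted model:
# height_ij = 64.1489 I(female) + 69.3019 I(male) + u_j + e_ij
set.seed(2013)
n_families      <- 197                      # number of families in the Galton fit
kids_per_family <- 5                        # hypothetical; real family sizes vary
beta <- c(F = 64.1489, M = 69.3019)         # fixed effects from summary(mixedmod)
sd_u <- 1.564                               # family-to-family standard deviation
sd_e <- 1.960                               # residual (kid-to-kid) standard deviation

family <- rep(seq_len(n_families), each = kids_per_family)
gender <- sample(c("F", "M"), length(family), replace = TRUE)  # hypothetical genders
u <- rnorm(n_families, mean = 0, sd = sd_u)        # one random intercept per family
e <- rnorm(length(family), mean = 0, sd = sd_e)    # independent noise for each child

# Each child's height = gender mean + their family's shift + their own noise
height <- unname(beta[gender]) + u[family] + e
sim <- data.frame(family, gender, height)

# Average simulated height by gender should be close to the fixed effects
means <- tapply(sim$height, sim$gender, mean)
means
```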
There are two sources of variation in this model. The residual variance, \(3.843\), is the variation within families, from kid to kid. The random-effect variance, \(2.448\), is the variation from family to family. Its square root, \(1.564\), is the standard deviation in the distribution of \(u_j\).
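A common way to summarize these two variances, not covered in the notes above, is the intraclass correlation. It is the share of the total variance that comes from families. In a random-intercept model, it also equals the correlation between the heights of two children from the same family. A short calculation with the estimates from the output:

```r
# Variance components read off summary(mixedmod)
var_family   <- 2.448  # between-family (random intercept) variance
var_residual <- 3.843  # within-family (kid-to-kid) variance

total_var <- var_family + var_residual
icc <- var_family / total_var  # correlation between two kids in the same family

round(c(total = total_var, icc = icc), 3)
```

This gives a total variance of 6.291 and an intraclass correlation of about 0.389. So roughly 39% of the variation in height not explained by gender is between families.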