Week 9 Summary

Correlated Data

This week we continued to work with correlated data. Here are common types of correlation structure:

Longitudinal: Multiple measurements through time. An example is measuring the weight of babies every month.
Spatial: Observations near each other in location are likely to be similar.
Clustered: Observations are sampled in groups, and members of the same group tend to be similar. Examples include choosing a dorm and randomly sampling rooms in that dorm, or randomly choosing a room and sampling everyone in that room.
Repeated measurements: Multiple measurements on each subject.

An Example

Here is one of the practice problems we worked through to identify the sampling unit, the response, and the correlation structure of a study.

Study: Car accidents. The monthly number of car accidents at fifteen intersections during 2013.

Primary Sampling Unit (PSU): Intersection. The PSU is usually the unit that gets measured multiple times.
Response: The number of accidents.
Correlation Structure: Spatial. Nearby intersections are likely to have similar numbers of accidents. Because each intersection is counted every month, the data are also longitudinal.
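
To make the structure of this study concrete, here is a minimal sketch of how the data could be laid out in long format, with one row per intersection per month. The data frame name `accidents` and the column names `month` and `intersection` are assumptions for illustration; no real accident counts are included.

```r
# Hypothetical layout: 15 intersections (the PSUs), each observed in 12 months
accidents <- expand.grid(month = 1:12, intersection = factor(1:15))

# One row per intersection-month combination
nrow(accidents)

# Each intersection appears 12 times, once per month
table(accidents$intersection)
```

The response (number of accidents) would be a third column, with twelve values for each intersection.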

R Code

Here is an example of how a mixed model is implemented in R:

# Read the Galton height data
Galton <- read.csv("http://www.cknudson.com/data/Galton.csv")
# lme4 provides lmer() for fitting linear mixed models
library(lme4)

# One mean per gender (no intercept) plus a random intercept for each family
mixedmod <- lmer(Height ~ 0 + Gender + (1 | FamilyID), data = Galton)
summary(mixedmod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Height ~ 0 + Gender + (1 | FamilyID)
##    Data: Galton
## 
## REML criterion at convergence: 4007.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9475 -0.5661  0.0067  0.5937  3.5069 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  FamilyID (Intercept) 2.448    1.564   
##  Residual             3.843    1.960   
## Number of obs: 898, groups:  FamilyID, 197
## 
## Fixed effects:
##         Estimate Std. Error t value
## GenderF  64.1489     0.1542   415.9
## GenderM  69.3019     0.1505   460.5
## 
## Correlation of Fixed Effects:
##         GendrF
## GenderM 0.567

This model predicts the height of a child. The fixed effects are two coefficients, one on an indicator for female and one on an indicator for male, so each coefficient estimates the mean height for that gender. Notice how the ‘0 +’ in ‘Height ~ 0 + Gender’ creates a model without an intercept. This makes it easier to compare the heights of the two genders. However, we could have included an intercept by omitting the 0. The ‘(1|FamilyID)’ term gives each family its own random intercept, which means that families are the PSU. Here is what the model looks like:
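
As a rough illustration of what the 0 does, the sketch below builds the fixed-effect design matrix both ways using base R's `model.matrix()`. The data frame `toy` is made up for this example and is not the Galton data.

```r
# Toy data (hypothetical): four children with a gender label
toy <- data.frame(Gender = factor(c("F", "M", "M", "F")))

# Without an intercept: one indicator column per gender
X_no_int <- model.matrix(~ 0 + Gender, data = toy)
colnames(X_no_int)

# With an intercept: a baseline column plus an indicator for male
X_int <- model.matrix(~ Gender, data = toy)
colnames(X_int)
```

With the intercept included, the intercept plays the role of the female mean and the GenderM coefficient becomes the male minus female difference, which for the fit above should be about 69.3019 - 64.1489 = 5.153.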

\[\widehat{height}_{ij}=64.1489\,I(female)+69.3019\,I(male)+u_{j} \]

where \(i\) indexes children within family \(j\), and

\[u_j \sim N(0,1.564^2) \text{ iid} \]
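
The fitted equation can be evaluated by hand. The sketch below copies the two fixed-effect estimates from the summary output; the helper `predict_height()` and its argument `u` (the family's random intercept) are hypothetical names introduced only for illustration.

```r
# Fixed-effect estimates copied from the summary output
beta <- c(F = 64.1489, M = 69.3019)

# Hypothetical helper: predicted height for a child of the given gender
# in a family whose random intercept is u (u = 0 is an average family)
predict_height <- function(gender, u = 0) {
  unname(beta[gender]) + u
}

predict_height("F")             # average family: 64.1489
predict_height("M", u = 1.564)  # family one SD above average: 70.8659
```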

There are two sources of variation in this model. The first is the residual variance, which is the variability within families, from child to child. It is \(3.843\). The second is the variance of the random effects, which is the variability from family to family. It is \(2.448\).
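
One way to compare the two variances is the share of the total variance that comes from family-to-family differences, often called the intraclass correlation. This was not part of the original analysis; it is a small sketch that uses only the two variance estimates printed in the summary output.

```r
# Variance components copied from the summary output
var_family <- 2.448  # between families (random intercept)
var_resid  <- 3.843  # within families (residual)

# Share of total variance due to differences between families
icc <- var_family / (var_family + var_resid)
round(icc, 3)  # 0.389
```

Under this model, roughly 39% of the variation in height that remains after accounting for gender is due to differences between families. With the fitted object available, `as.data.frame(VarCorr(mixedmod))` should return the same variances in its `vcov` column.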

Sam K

4/18/2021