This week we worked on evaluating and analyzing mixed models.
First, I will examine this Height data set we discussed during class.
library(lme4)
Galton <- read.csv("http://www.cknudson.com/data/Galton.csv")
attach(Galton)
str(Galton)
## 'data.frame': 898 obs. of 6 variables:
## $ FamilyID : Factor w/ 197 levels "1","10","100",..: 1 1 1 1 108 108 108 108 123 123 ...
## $ FatherHeight: num 78.5 78.5 78.5 78.5 75.5 75.5 75.5 75.5 75 75 ...
## $ MotherHeight: num 67 67 67 67 66.5 66.5 66.5 66.5 64 64 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 1 1 1 2 2 1 1 2 1 ...
## $ Height : num 73.2 69.2 69 69 73.5 72.5 65.5 65.5 71 68 ...
## $ NumKids : int 4 4 4 4 4 4 4 4 2 2 ...
If we are trying to predict ‘Height’ with just the parents respective heights we get this model:
fixedmod <- lm(Height ~ 0 + FatherHeight + MotherHeight)
‘FatherHeight’ and ‘MotherHeight’ are our fixed effects.
This is a naive model because it assumes that there are no correlations in the data.
This data set certainly has correlation. There are kids coming from the same families and we would expect their heights to be somewhat correlated because they have the same parents. Therefore, we add a random intercept into the model.
mixedmod <- lmer(Height ~ 0 + FatherHeight + MotherHeight + (1|FamilyID), data = Galton)
Now to compare the model summaries
summary(fixedmod)
##
## Call:
## lm(formula = Height ~ 0 + FatherHeight + MotherHeight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4895 -2.6686 -0.0381 2.7890 11.4215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## FatherHeight 0.54246 0.03396 15.97 <2e-16 ***
## MotherHeight 0.45549 0.03669 12.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.434 on 896 degrees of freedom
## Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
## F-statistic: 1.697e+05 on 2 and 896 DF, p-value: < 2.2e-16
summary(mixedmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Height ~ 0 + FatherHeight + MotherHeight + (1 | FamilyID)
## Data: Galton
##
## REML criterion at convergence: 4766.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3255 -0.7679 -0.0069 0.7832 3.1939
##
## Random effects:
## Groups Name Variance Std.Dev.
## FamilyID (Intercept) 1.075 1.037
## Residual 10.765 3.281
## Number of obs: 898, groups: FamilyID, 197
##
## Fixed effects:
## Estimate Std. Error t value
## FatherHeight 0.53555 0.04029 13.29
## MotherHeight 0.46351 0.04356 10.64
##
## Correlation of Fixed Effects:
## FthrHg
## MotherHeght -0.999
Fixed Model: \[\widehat{height} = .542I(FatherHeight) + .455I(MotherHeight)+\epsilon\]
Mixed Model: \[\widehat{height} = .536I(FatherHeight) + .464I(MotherHeight) + u_{family} + \epsilon\]
The \[\epsilon\] represents the variance within families and \[u_{family}\] represents the variance between the different families in the whole study.
If we had a positive value for \[u_{family}\] that would imply that height would also increase and the opposite is true if we had a negative value.
If we had a negative value for \[\epsilon\] we should expect a scenario where mom and dad are tall (also most likely a positive random effect) but the child is short.
This brief example shows how important it is to incorporate random effects to account for correlation within the data. We do not want to over or underestimate