This week we worked on evaluating and analyzing mixed models.

First, I will examine this Height data set we discussed during class.

Children’s Height

library(lme4)
Galton <- read.csv("http://www.cknudson.com/data/Galton.csv")
attach(Galton)
str(Galton)
## 'data.frame':    898 obs. of  6 variables:
##  $ FamilyID    : Factor w/ 197 levels "1","10","100",..: 1 1 1 1 108 108 108 108 123 123 ...
##  $ FatherHeight: num  78.5 78.5 78.5 78.5 75.5 75.5 75.5 75.5 75 75 ...
##  $ MotherHeight: num  67 67 67 67 66.5 66.5 66.5 66.5 64 64 ...
##  $ Gender      : Factor w/ 2 levels "F","M": 2 1 1 1 2 2 1 1 2 1 ...
##  $ Height      : num  73.2 69.2 69 69 73.5 72.5 65.5 65.5 71 68 ...
##  $ NumKids     : int  4 4 4 4 4 4 4 4 2 2 ...

If we are trying to predict ‘Height’ with just the parents respective heights we get this model:

fixedmod <- lm(Height ~ 0 + FatherHeight + MotherHeight)

‘FatherHeight’ and ‘MotherHeight’ are our fixed effects.

This is a naive model because it assumes that there are no correlations in the data.

This data set certainly has correlation. There are kids coming from the same families and we would expect their heights to be somewhat correlated because they have the same parents. Therefore, we add a random intercept into the model.

mixedmod <- lmer(Height ~ 0 + FatherHeight + MotherHeight + (1|FamilyID), data = Galton)

Now to compare the model summaries

summary(fixedmod)
## 
## Call:
## lm(formula = Height ~ 0 + FatherHeight + MotherHeight)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4895 -2.6686 -0.0381  2.7890 11.4215 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## FatherHeight  0.54246    0.03396   15.97   <2e-16 ***
## MotherHeight  0.45549    0.03669   12.41   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.434 on 896 degrees of freedom
## Multiple R-squared:  0.9974, Adjusted R-squared:  0.9974 
## F-statistic: 1.697e+05 on 2 and 896 DF,  p-value: < 2.2e-16
summary(mixedmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Height ~ 0 + FatherHeight + MotherHeight + (1 | FamilyID)
##    Data: Galton
## 
## REML criterion at convergence: 4766.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3255 -0.7679 -0.0069  0.7832  3.1939 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  FamilyID (Intercept)  1.075   1.037   
##  Residual             10.765   3.281   
## Number of obs: 898, groups:  FamilyID, 197
## 
## Fixed effects:
##              Estimate Std. Error t value
## FatherHeight  0.53555    0.04029   13.29
## MotherHeight  0.46351    0.04356   10.64
## 
## Correlation of Fixed Effects:
##             FthrHg
## MotherHeght -0.999

Fixed Model: \[\widehat{height} = .542I(FatherHeight) + .455I(MotherHeight)+\epsilon\]

Mixed Model: \[\widehat{height} = .536I(FatherHeight) + .464I(MotherHeight) + u_{family} + \epsilon\]

The \[\epsilon\] represents the variance within families and \[u_{family}\] represents the variance between the different families in the whole study.

If we had a positive value for \[u_{family}\] that would imply that height would also increase and the opposite is true if we had a negative value.

If we had a negative value for \[\epsilon\] we should expect a scenario where mom and dad are tall (also most likely a positive random effect) but the child is short.

This brief example shows how important it is to incorporate random effects to account for correlation within the data. We do not want to over or underestimate