Today in class we talked about the likelihood ratio test, also known as the Wilks test. We use it to compare two nested models in terms of model fit.
I will be using the Galton data set, which contains the heights of family members. I will compare a model that predicts height from father’s height alone with a model that uses both father’s and mother’s height, to see whether mother’s height should be included.
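For reference, here is the usual form of the statistic. The notation is my own rather than anything from class: \ell_0 and \ell_1 stand for the maximized log-likelihoods of the smaller and larger model, and k for the number of extra parameters in the larger model.

```latex
\Lambda = -2\,\bigl(\ell_0 - \ell_1\bigr),
\qquad
\Lambda \;\overset{\text{approx.}}{\sim}\; \chi^2_k \quad \text{under } H_0
```

Since the larger model can never fit worse, \ell_1 \ge \ell_0 and the statistic is never negative. Large values count as evidence against the smaller model.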
First we get the data in.
Galton <- read.csv("http://cknudson.com/data/Galton.csv")
attach(Galton)
Then we can run our two models.
m1<-lm(Height~FatherHeight)
m2<-lm(Height~FatherHeight+MotherHeight)
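Now we can find the test statistic for the hypothesis test. Our null hypothesis is that the extra variable, mother’s height, is not needed once father’s height is in the model; the alternative is that it is needed. The anova() call reports the F test for this comparison, and the log-likelihoods that follow it give the likelihood ratio statistic: -2 times the difference between the log-likelihood of the smaller model and that of the larger model.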
anova(m1,m2)
## Analysis of Variance Table
##
## Model 1: Height ~ FatherHeight
## Model 2: Height ~ FatherHeight + MotherHeight
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)
## 1    896 10642
## 2    895 10261  1    380.86 33.219 1.132e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
logLik(m1)
## 'log Lik.' -2384.311 (df=3)
logLik(m2)
## 'log Lik.' -2367.947 (df=4)
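# Pull the log-likelihoods out as plain numbers
t1 <- as.numeric(logLik(m1))
t2 <- as.numeric(logLik(m2))
# Likelihood ratio statistic: -2 * (smaller model - larger model)
teststat <- -2 * (t1 - t2)
# Upper-tail chi-square p-value; df = 4 - 3 = 1
pchisq(teststat, df = 1, lower.tail = FALSE)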
## [1] 1.060446e-08
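As a quick check on the arithmetic, the same calculation can be redone from the rounded log-likelihoods printed by logLik(). The variable names are just illustrative, and because the inputs are rounded the p-value will only agree with the one above to a few digits.

```r
# Rounded log-likelihoods copied from the logLik() output
ll_reduced <- -2384.311  # father's height only, df = 3
ll_full    <- -2367.947  # father's and mother's height, df = 4

# Likelihood ratio statistic: -2 * (reduced - full), about 32.728
stat <- -2 * (ll_reduced - ll_full)

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- 4 - 3

# Upper-tail chi-square probability, roughly 1.06e-08
p_value <- pchisq(stat, df = df_diff, lower.tail = FALSE)
stat
p_value
```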
As we can see, our p-value is far below .05, so we reject the null and keep mother’s height in the model. One thing to note is how we found the degrees of freedom for the test: it is simply the difference between the degrees of freedom reported for each model’s log-likelihood, here 4 - 3 = 1, which is the number of extra parameters in the larger model. The likelihood ratio test is a good way to decide whether a predictor improves a model, as long as the smaller model is nested in the larger one.
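The steps above can be wrapped into a small helper. This is only a sketch: lrt() is a name I made up, not a base R function, and the data are simulated so the example runs without downloading anything. It assumes both fits come from the same data and that the first model is nested in the second.

```r
# Hypothetical helper: likelihood ratio test for two nested fits
lrt <- function(reduced, full) {
  ll_reduced <- logLik(reduced)
  ll_full <- logLik(full)
  # Test statistic: -2 * (reduced log-likelihood - full log-likelihood)
  stat <- -2 * (as.numeric(ll_reduced) - as.numeric(ll_full))
  # Degrees of freedom: difference in the number of estimated parameters
  df <- attr(ll_full, "df") - attr(ll_reduced, "df")
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# Simulated data (an assumption for illustration, not the Galton data)
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)

small <- lm(y ~ x1)
big <- lm(y ~ x1 + x2)
result <- lrt(small, big)
result
```

With the Galton models, calling lrt(m1, m2) should give the same statistic and p-value as the step-by-step calculation.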