Today in class we covered the likelyhood ratio test and how to use it when comparing two models, one of which is a nested version of the other. We learned that when using the likelyhood ratio test small p-values lead us to using the more complicated model.
# Fit the two mods
simpler <- lm(Petal.Length ~ Sepal.Length, data =iris)
biggermod <- lm(Petal.Length ~ Sepal.Length + Species, data =iris)
summary(biggermod) #why can't we use this to decide?
##
## Call:
## lm(formula = Petal.Length ~ Sepal.Length + Species, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.76390 -0.17875 0.00716 0.17461 0.79954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.70234 0.23013 -7.397 1.01e-11 ***
## Sepal.Length 0.63211 0.04527 13.962 < 2e-16 ***
## Speciesversicolor 2.21014 0.07047 31.362 < 2e-16 ***
## Speciesvirginica 3.09000 0.09123 33.870 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2826 on 146 degrees of freedom
## Multiple R-squared: 0.9749, Adjusted R-squared: 0.9744
## F-statistic: 1890 on 3 and 146 DF, p-value: < 2.2e-16
# Could use the F-test to decide
anova(biggermod, simpler)
## Analysis of Variance Table
##
## Model 1: Petal.Length ~ Sepal.Length + Species
## Model 2: Petal.Length ~ Sepal.Length
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 146 11.657
## 2 148 111.459 -2 -99.802 624.99 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Or could use the LRT (Wilks test)
logLik(simpler)
## 'log Lik.' -190.5675 (df=3)
logLik(biggermod)
## 'log Lik.' -21.23709 (df=5)
mydf <- length(coef(biggermod)) - length(coef(simpler))
(teststat<- -2*as.numeric(logLik(simpler) - logLik(biggermod)))
## [1] 338.6608
pchisq(teststat, df = mydf, lower.tail = FALSE)
## [1] 2.888934e-74
#reject the null: the more complicated model (biggmod) fits better
####