Today we discussed the Likelihood Ratio Test, also known as the Wilks Test. This test can be used to compare two nested models in terms of model fit. Remember nested means that one model is a special case of the other. A basic example would be these two equations: \(y=\beta_0+\beta_1x\) and \(y=\beta_0\). The test statistic is \(-2[\ell_{simple}-\ell_{complex}]\), where \(\ell\) denotes a model's log likelihood; because the more complicated model always fits at least as well as the simpler one, this quantity is nonnegative. Under the null hypothesis the test statistic follows a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the two models. For this test, the null hypothesis is that the additional parameters are not needed, and the alternative is that they are necessary.
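Since we will repeat the same three steps (log likelihoods, test statistic, chi-squared p-value) for every pair of models, here is a small helper that captures the recipe. This is only an illustrative sketch; lrt is our own name, not a base R function, and it assumes the simple model is nested inside the complex one and that both have logLik methods:
# Illustrative likelihood ratio test helper (not a base R function).
# Assumes 'simple' is nested inside 'complex' and both have logLik methods.
lrt <- function(simple, complex) {
  stat <- -2 * (as.numeric(logLik(simple)) - as.numeric(logLik(complex)))
  df   <- attr(logLik(complex), "df") - attr(logLik(simple), "df")
  c(statistic = stat, df = df,
    p.value   = pchisq(stat, df = df, lower.tail = FALSE))
}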
First we will use this new test to compare two linear regression models. We will use data from the iris data set. Our simple model will have Petal.Length as the lone predictor and Sepal.Length as the response. In our second model we will add the predictor Species to see if it improves the model.
data(iris)
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
attach(iris)
mod1 <- lm(Sepal.Length ~ Petal.Length)            # simple model
mod2 <- lm(Sepal.Length ~ Petal.Length + Species)  # adds Species as a predictor
summary(mod2)
##
## Call:
## lm(formula = Sepal.Length ~ Petal.Length + Species)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.75310 -0.23142 -0.00081 0.23085 1.03100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.68353 0.10610 34.719 < 2e-16 ***
## Petal.Length 0.90456 0.06479 13.962 < 2e-16 ***
## Speciesversicolor -1.60097 0.19347 -8.275 7.37e-14 ***
## Speciesvirginica -2.11767 0.27346 -7.744 1.48e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.338 on 146 degrees of freedom
## Multiple R-squared: 0.8367, Adjusted R-squared: 0.8334
## F-statistic: 249.4 on 3 and 146 DF, p-value: < 2.2e-16
As we see from the summary output of our complicated model, it has four parameters counting the intercept, while the simple model has two. First, we can try the partial F-test with anova just to see which model it favors, so we have a point of comparison.
anova(mod1,mod2)
## Analysis of Variance Table
##
## Model 1: Sepal.Length ~ Petal.Length
## Model 2: Sepal.Length ~ Petal.Length + Species
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 148 24.525
## 2 146 16.682 2 7.8434 34.323 6.053e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we see that the F-test's p-value is far below 0.05, so this test favors the more complicated model over the simpler one. Now we will find the log likelihood of each model using the logLik command.
logLik(mod1)
## 'log Lik.' -77.02021 (df=3)
logLik(mod2)
## 'log Lik.' -48.11637 (df=5)
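Notice the df values that logLik prints (5 and 3). These counts include the estimated residual variance, so each is one more than the coefficient count, but their difference is still the two added Species coefficients. We can pull them out directly from the df attribute:
# logLik's df counts include the error variance, so both are shifted
# up by one; the difference is unaffected
attr(logLik(mod2), "df") - attr(logLik(mod1), "df")  # 5 - 3 = 2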
Now that we have both log likelihoods, we can compute our test statistic.
teststat <- -2*(as.numeric(logLik(mod1))-as.numeric(logLik(mod2)))
teststat
## [1] 57.80768
The statistic is positive, as it should be, because mod2 has the larger log likelihood. Now we use the pchisq command to find the p-value. Remember our degrees of freedom equal the difference in the number of parameters between the two models, so in this case 4 - 2 = 2, matching the df difference we checked above.
pchisq(teststat, df=2, lower.tail =FALSE)
## [1] 2.800409e-13
With a p-value this small, we reject the null hypothesis in favor of the alternative: the Species variable is a significant predictor, the model improves by including it, and we choose the more complicated model.
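As a side note, the lrtest function in the lmtest package (assuming you have it installed) performs these same steps in a single call:
# install.packages("lmtest")  # if needed
library(lmtest)
lrtest(mod1, mod2)  # reports the same chi-squared statistic and p-value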
Now we will use the Wilks Test on two ARIMA models fit to time series data, specifically the births data that we used previously. With ARIMA models, instead of adding predictors we add additional parameters to the model itself. In this example we add one more seasonal autoregressive parameter as well as one more seasonal moving average parameter. The degrees of freedom are the total change in the number of parameters between the two models, so in our case it is two. First we create the two ARIMA models.
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
births <- births[-c(1:24)]  # drop the first 24 months (1946-1947)
births <- ts(births, frequency = 12, start = c(1948, 1))  # series now starts Jan 1948
mod3 <- arima(births, order = c(1, 0, 1), seasonal = list(order = c(1, 1, 1), period = 12))
mod4 <- arima(births, order = c(1, 0, 1), seasonal = list(order = c(2, 1, 2), period = 12))
Now that we have fit our models, we can find our test statistic and then pass it to the pchisq function to calculate our p-value.
teststat1 <- -2* (as.numeric(logLik(mod3))-as.numeric(logLik(mod4)))
teststat1
## [1] 14.18952
pchisq(teststat1, df = 2, lower.tail = FALSE)
## [1] 0.0008294403
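The illustrative lrt helper sketched at the start of the post reproduces this result in one call, since arima fits also have a logLik method:
lrt(mod3, mod4)  # same statistic (about 14.19) and p-value as above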
From this low p-value we conclude that including these two additional parameters does significantly improve the model. We reject the null hypothesis and say that the complicated model is worth utilizing. For further comparison, we can look at the AIC values of both models. Remember the rule of thumb we have been using: one model's AIC must be at least 10 lower than the other's before we conclude it is a significant improvement.
AIC(mod3);AIC(mod4)
## [1] 283.6117
## [1] 273.4222
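To make the rule-of-thumb comparison explicit, we can compute the difference directly:
AIC(mod3) - AIC(mod4)  # about 10.19, just past the cutoff of 10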
Here we see that mod4 has the lower AIC value by slightly more than 10, so both tests we conducted, the likelihood ratio test and the AIC comparison, favor the more complicated model.