In class today we introduced the Likelihood Ratio Test (Wilks Test), which is used to compare 2 nested models. As before when comparing models, this test will help us identify if the added complexity is worth it for a better model fit. \[Test Statistic=-2[Log(LikelihoodModel)-Log(NestedLikelihoodModel)]\]
The test statistic is calculated as a chi-squared distribution with Degrees of freedom defined as: \[DF={p_1}-{p_2}\] Or, the difference of degrees of freedom from both models.
Order: 1.Fit Models 2.Determine Log of Likelihood Functions 3.‘pchisq(teststat)’
data("AirPassengers")
logair<-log(AirPassengers)
ts.plot(logair)
mod<-arima(logair,order = c(1,0,1), season = list( order = c( 1,2,1), period=12))
#From previous analysis
summary(mod)
##
## Call:
## arima(x = logair, order = c(1, 0, 1), seasonal = list(order = c(1, 2, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.9303 -0.4175 -0.4505 -1.0000
## s.e. 0.0448 0.0991 0.0861 0.1038
##
## sigma^2 estimated as 0.001415: log likelihood = 203.55, aic = -397.1
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.001134586 0.03439361 0.02451314 -0.01830164 0.4413303
## MASE ACF1
## Training set 0.2706075 -0.02443055
While we can So our goal with this section is to compare to models together using the Likelihood Ratio Test (Wilks Test) So what we can do is develop higher orders in our arima model and use the LRT test to see if it could be beneficial to develop our model more:
Acf(logair)
Pacf(logair)
Looking at these plots, it follows that we should at least have p=1 and q=1. But, what if having a higher order of q would make our model better? We may also want to include a P=1 and a higher level of Q. We will keep the difference values the same as they were calculated previously
mod2<-arima(logair,order = c(1,0,2), season = list( order = c( 1,2,2), period=12))
summary(mod2)
##
## Call:
## arima(x = logair, order = c(1, 0, 2), seasonal = list(order = c(1, 2, 2), period = 12))
##
## Coefficients:
## ar1 ma1 ma2 sar1 sma1 sma2
## 0.9727 -0.4554 -0.0099 0.0887 -1.9056 0.9966
## s.e. 0.0381 0.0962 0.0842 0.1291 0.3750 0.3846
##
## sigma^2 estimated as 0.001035: log likelihood = 209.08, aic = -404.16
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.001626717 0.02942676 0.02122888 -0.02726594 0.3814855
## MASE ACF1
## Training set 0.2343516 -0.01747046
Now that we have two different models, we can continue on with our test. We first calculate the log of the likelihood model. The function’logLik’ can calculate the likelihood of each model.
logLik(mod)
## 'log Lik.' 203.5517 (df=5)
logLik(mod2)
## 'log Lik.' 209.079 (df=7)
H_o: Don’t need to include the higher order in our model H_a: We need to include the higher order in out model We will want to calculate the degrees of freedom for the difference of the models. To do this we can take the difference of the coefficients of both models:
df <- length(coef(mod2)) - length(coef(mod))
df
## [1] 2
Thus, \[DF=2\] The test stat for our LRT is 2 times the difference of the likelihood models:
teststat<--2*(as.numeric(logLik(mod))-as.numeric(logLik(mod2)))
teststat
## [1] 11.05474
Next, we can calculate our p-value per the chi-squared distribution:
pchisq(teststat,df=2,lower.tail=FALSE)
## [1] 0.003976431
The pchisq functions returns a p-value, which with our traditional level, this says that including the higher order is significant to our model