In class today we introduced the Likelihood Ratio Test (Wilks Test), which is used to compare 2 nested models. As before when comparing models, this test will help us identify if the added complexity is worth it for a better model fit. \[Test Statistic=-2[Log(LikelihoodModel)-Log(NestedLikelihoodModel)]\]

The test statistic is calculated as a chi-squared distribution with Degrees of freedom defined as: \[DF={p_1}-{p_2}\] Or, the difference of degrees of freedom from both models.

Order: 1.Fit Models 2.Determine Log of Likelihood Functions 3.‘pchisq(teststat)’

data("AirPassengers")
logair<-log(AirPassengers)
ts.plot(logair)

mod<-arima(logair,order = c(1,0,1), season = list( order = c( 1,2,1), period=12))
#From previous analysis
summary(mod)
## 
## Call:
## arima(x = logair, order = c(1, 0, 1), seasonal = list(order = c(1, 2, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1     sar1     sma1
##       0.9303  -0.4175  -0.4505  -1.0000
## s.e.  0.0448   0.0991   0.0861   0.1038
## 
## sigma^2 estimated as 0.001415:  log likelihood = 203.55,  aic = -397.1
## 
## Training set error measures:
##                        ME       RMSE        MAE         MPE      MAPE
## Training set -0.001134586 0.03439361 0.02451314 -0.01830164 0.4413303
##                   MASE        ACF1
## Training set 0.2706075 -0.02443055

While we can So our goal with this section is to compare to models together using the Likelihood Ratio Test (Wilks Test) So what we can do is develop higher orders in our arima model and use the LRT test to see if it could be beneficial to develop our model more:

Acf(logair)

Pacf(logair)

Looking at these plots, it follows that we should at least have p=1 and q=1. But, what if having a higher order of q would make our model better? We may also want to include a P=1 and a higher level of Q. We will keep the difference values the same as they were calculated previously

mod2<-arima(logair,order = c(1,0,2), season = list( order = c( 1,2,2), period=12))
summary(mod2)
## 
## Call:
## arima(x = logair, order = c(1, 0, 2), seasonal = list(order = c(1, 2, 2), period = 12))
## 
## Coefficients:
##          ar1      ma1      ma2    sar1     sma1    sma2
##       0.9727  -0.4554  -0.0099  0.0887  -1.9056  0.9966
## s.e.  0.0381   0.0962   0.0842  0.1291   0.3750  0.3846
## 
## sigma^2 estimated as 0.001035:  log likelihood = 209.08,  aic = -404.16
## 
## Training set error measures:
##                        ME       RMSE        MAE         MPE      MAPE
## Training set -0.001626717 0.02942676 0.02122888 -0.02726594 0.3814855
##                   MASE        ACF1
## Training set 0.2343516 -0.01747046

Now that we have two different models, we can continue on with our test. We first calculate the log of the likelihood model. The function’logLik’ can calculate the likelihood of each model.

logLik(mod)
## 'log Lik.' 203.5517 (df=5)
logLik(mod2)
## 'log Lik.' 209.079 (df=7)

H_o: Don’t need to include the higher order in our model H_a: We need to include the higher order in out model We will want to calculate the degrees of freedom for the difference of the models. To do this we can take the difference of the coefficients of both models:

df <- length(coef(mod2)) - length(coef(mod)) 
df
## [1] 2

Thus, \[DF=2\] The test stat for our LRT is 2 times the difference of the likelihood models:

teststat<--2*(as.numeric(logLik(mod))-as.numeric(logLik(mod2)))
teststat
## [1] 11.05474

Next, we can calculate our p-value per the chi-squared distribution:

pchisq(teststat,df=2,lower.tail=FALSE)
## [1] 0.003976431

The pchisq functions returns a p-value, which with our traditional level, this says that including the higher order is significant to our model