In class today, we learned about the likelihood ratio test and how to sue it to compare two nested models to determine the best model fit. We first did this with a multiple linear regression model, then were tasked with using time series data to practice.

For this learning log, we will be using the gas data from R to provide an example of the work we did with the time series data.

library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)
data(gas, package = "forecast")
head(gas)
##       Jan  Feb  Mar  Apr  May  Jun
## 1956 1709 1646 1794 1878 2173 2321

First we will plot the time series data to get an idea of what it looks like.

plot.ts(log(gas))

dec.gas <- decompose(gas)
plot(dec.gas)

We can create weights to help us smooth the time series data and make it better to work with.

(wgts <- c(.5, rep(1,11), .5)/12)
##  [1] 0.04166667 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
##  [7] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
## [13] 0.04166667
plot.ts(filter(log(gas), sides=2, filter = wgts))

diff1 <- diff(log(gas), lag = 1)
plot.ts(diff1) 

adf.test(diff1) 
## Warning in adf.test(diff1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff1
## Dickey-Fuller = -17.55, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Acf(diff1)

Pacf(diff1)

Based on these plots, we will create an initial mod that is of the form (1, 1, 0)x(1, 0, 0)12.

mod1 <- arima(log(gas), order = c(1,1,0), season = list( order = c( 1,0,0), period=12))
summary(mod1)
## 
## Call:
## arima(x = log(gas), order = c(1, 1, 0), seasonal = list(order = c(1, 0, 0), 
##     period = 12))
## 
## Coefficients:
##           ar1    sar1
##       -0.2858  0.8503
## s.e.   0.0466  0.0241
## 
## sigma^2 estimated as 0.003579:  log likelihood = 656.05,  aic = -1306.1
## 
## Training set error measures:
##                       ME       RMSE        MAE        MPE      MAPE
## Training set 0.001353713 0.05975911 0.04439422 0.01457608 0.4752704
##                   MASE        ACF1
## Training set 0.5342741 -0.01456859

We will compare this model to the model created by the auto.arima command.

mod2 <- auto.arima(log(gas))
summary(mod2)
## Series: log(gas) 
## ARIMA(1,1,2)(1,0,0)[12] 
## 
## Coefficients:
##          ar1      ma1     ma2    sar1
##       0.9023  -1.2318  0.2584  0.8641
## s.e.  0.0513   0.0812  0.0662  0.0264
## 
## sigma^2 estimated as 0.003519:  log likelihood=661.9
## AIC=-1313.81   AICc=-1313.68   BIC=-1292.99
## 
## Training set error measures:
##                       ME       RMSE        MAE        MPE      MAPE
## Training set 0.003827408 0.05900734 0.04343791 0.04291608 0.4653342
##                   MASE        ACF1
## Training set 0.4356729 -0.01243158

We will then create our test stat to compare the two models. We do this by using the formula -2[loglikelihood of the complicated model-loglikelihood of simple model]. For the test stat, we will use 2 degrees of freedom, because the only changes in the two models is that q in the first model is 0, while it is 2 in the second model. We then use the pchisq command to test the null hypothesis.

teststat <- 2* as.numeric(logLik(mod2)-logLik(mod1))
pchisq(teststat, df = 2, lower.tail = FALSE)
## [1] 0.002867776

The small value of the pchisq value means that the fit of mod2 is significantly better than the fit of mod1. This means that any predictions or forecasting done should use mod2 as the model. We can make predictions or forecasts by using the forecast command.

pred <- forecast(mod2, h=18)
pred
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       10.94965 10.87363 11.02567 10.83339 11.06592
## Oct 1995       10.88002 10.78850 10.97155 10.74005 11.02000
## Nov 1995       10.83317 10.72982 10.93652 10.67511 10.99123
## Dec 1995       10.60026 10.48740 10.71311 10.42766 10.77286
## Jan 1996       10.67014 10.54940 10.79088 10.48548 10.85480
## Feb 1996       10.66831 10.54088 10.79574 10.47342 10.86320
## Mar 1996       10.76401 10.63081 10.89721 10.56030 10.96773
## Apr 1996       10.81415 10.67590 10.95240 10.60272 11.02559
## May 1996       10.93951 10.79679 11.08222 10.72125 11.15777
## Jun 1996       11.01480 10.86810 11.16150 10.79044 11.23916
## Jul 1996       11.08080 10.93050 11.23109 10.85094 11.31065
## Aug 1996       10.99186 10.83830 11.14542 10.75701 11.22671
## Sep 1996       10.94618 10.76501 11.12734 10.66911 11.22324
## Oct 1996       10.88638 10.69087 11.08190 10.58737 11.18540
## Nov 1996       10.84624 10.63854 11.05394 10.52858 11.16389
## Dec 1996       10.64529 10.42704 10.86353 10.31151 10.97906
## Jan 1997       10.70595 10.47845 10.93345 10.35801 11.05388
## Feb 1997       10.70461 10.46889 10.94033 10.34410 11.06512

This forecast provides us with a prediction about what the values will be for the 18 months following our collected data. We can plot this with our original time series plot to see what our forecast looks like.

plot(pred)