In past learning logs, I have worked with the \(\texttt{AirPassengers}\) data set. I will now contiue my analysis on this data set and bring my findings to an airline company.

The Analysis

In my learning log for day 23 (http://rpubs.com/bensonsyd/383764), I developed a model for this data set that determined we would need an ARIMA (1, 0, 1)x(0, 1, 1)12 model to model the data accurately. We can also use the \(\texttt{auto.arima()}\) function to model the data set. This function is mentioned in my learning log for day 22 (http://rpubs.com/bensonsyd/382861).

mod <- arima(log(AirPassengers), order = c(1, 0, 1), season = list(order = c(0, 1, 1), period= 12))
modauto <- auto.arima(log(AirPassengers))
modauto
## Series: log(AirPassengers) 
## ARIMA(0,1,1)(0,1,1)[12] 
## 
## Coefficients:
##           ma1     sma1
##       -0.4018  -0.5569
## s.e.   0.0896   0.0731
## 
## sigma^2 estimated as 0.001371:  log likelihood=244.7
## AIC=-483.4   AICc=-483.21   BIC=-474.77

From the \(\texttt{auto.arima}\) function, we find that the suggested order for the model is ARIMA (0, 1, 1)x(0, 1, 1)12. Let’s compare the AIC of these two models. We’ll use our \(\texttt{AIC()}\) function.

AIC(modauto)
## [1] -483.3991
AIC(mod)
## [1] -483.2233

Our two models have very similar AICs and the same number of parameters but we’ll choose to move forward with our automatic model. We might then want to look at whether our seasonal moving average really improves our model. Let’s create an ARIMA (0, 1, 1)x(0, 1, 0)12 model.

mod2 <- arima(log(AirPassengers), order = c(0, 1, 1), season = list(order = c(0, 1, 0), period= 12))
mod2
## 
## Call:
## arima(x = log(AirPassengers), order = c(0, 1, 1), seasonal = list(order = c(0, 
##     1, 0), period = 12))
## 
## Coefficients:
##           ma1
##       -0.3870
## s.e.   0.0887
## 
## sigma^2 estimated as 0.001828:  log likelihood = 226.99,  aic = -449.98

Notice that this model is actually a subset of our automatically created model so we can conduct a likelihood ratio test. Our test statistic is \(2[l(mod)-l(mod2)]\) and will be compared to a chi-squared distribution with 1 degree of freedom.

teststat <- 2*(as.numeric(logLik(modauto))-as.numeric(logLik(mod2)))
pchisq(teststat, df=1, lower.tail = FALSE)
## [1] 2.657912e-09

Our p-value is much smaller than .05 so we can conclude that our seasonal moving average is an important predictor in our model.

Results

We have determined that the optimal model for predicting passengers for your airline is an ARIMA (0, 1, 1)x(0, 1, 1)12 model. We can use this data to predict the number of passengers taking your airline for 1961.

fore <- forecast(modauto, h=12)
fore
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1961       6.110186 6.062729 6.157642 6.037607 6.182764
## Feb 1961       6.053775 5.998476 6.109074 5.969203 6.138347
## Mar 1961       6.171715 6.109555 6.233874 6.076650 6.266779
## Apr 1961       6.199300 6.130966 6.267635 6.094792 6.303809
## May 1961       6.232556 6.158560 6.306552 6.119388 6.345724
## Jun 1961       6.368779 6.289524 6.448033 6.247569 6.489988
## Jul 1961       6.507294 6.423109 6.591479 6.378544 6.636044
## Aug 1961       6.502906 6.414064 6.591749 6.367034 6.638779
## Sep 1961       6.324698 6.231431 6.417965 6.182058 6.467338
## Oct 1961       6.209008 6.111516 6.306500 6.059908 6.358109
## Nov 1961       6.063487 5.961947 6.165028 5.908195 6.218780
## Dec 1961       6.168025 6.062591 6.273459 6.006778 6.329272

To make this more visually appealing, we can look at a plot, but it seems that the number of passengers on your airline per month for 1961 will be between 6000 and 6500.

plot(fore)

This model will work well for predicting the number of airline passengers monthly for a few years following 1960, but will need to be updated for use after this.