This analysis seeks to understand the volume and trend of airline passengers from 1949 through 1960, and test different univariate forecasting methods. The data set contains the number of airline passengers per month.

Load the necessary libraries and read in the data set:

# load libraries
library(fable)
library(tsibble)
library(dplyr)
library(ggplot2)
library(lubridate)
library(feasts)
library(tseries)

# read data
airline_passengers <- read.csv('airline_passengers.csv')

# verify data
head(airline_passengers)

View the raw time series:

# transform date from character to tsibble year month
airline_passengers$date <- yearmonth(as.Date(airline_passengers$date, format='%m/%d/%Y'))

# create tsibble from data frame
# NULL key for univariate time series
ap_ts <- as_tsibble(airline_passengers, key=NULL, index=date)

# raw time series
ap_ts %>%
  autoplot(passengers) +
  ggtitle('Airline Passengers (1949 - 1960)')

The next step is to decompose the time series to confirm trend and seasonality.

# decompose time series
ap_ts %>%
  model(STL(passengers ~ season(window = Inf))) %>% 
  components() %>% 
  autoplot()

Based on the decomposition, the time series contains both trend and seasonality. There is also increasing variance (the time series is not stationary). To reduce the increasing variance, a data transformation technique such as Box Cox or differencing can be applied prior to forecasting.

# apply Box Cox transformation
ap_ts %>%
  autoplot(box_cox(passengers, lambda=0.1)) +
  ggtitle('Box Cox Transformation of Airline Passengers (1949 - 1960)')

The variance appears more stable after the Box Cox transformation, but the rolling mean is increasing in the time series. The next step is to difference the Box Cox values.

# apply seasonal and non-seasonal differencing
ap_ts %>% 
  gg_tsdisplay(difference(box_cox(passengers, lambda=0.1), lag=12) %>% difference(), plot_type='partial')
## Warning: Removed 13 rows containing missing values (`geom_line()`).
## Warning: Removed 13 rows containing missing values (`geom_point()`).

The data appear stationary after a non-seasonal differencing (lag=1). Both the ACF and PACF indicate there is strong auto-correlation with lag 12 (the same month of the previous year). A seasonal ARIMA model may initially be used for forecasting.

# manual ARIMA modelling
# Box Cox transformed, d=1 (non-seasonal differencing)
# D=1 (seasonal differencing)
# Q=1 : seasonal autoregressive term, based on ACF
# q=1 : non-seasonal autoregressive term, based on ACF
ap_ts %>% 
  model(
    ap_arima = ARIMA(box_cox(passengers, lambda=0.1) ~ 0 + pdq(0, 1, 1) + PDQ(0, 1, 1, period="1 year"))
  ) %>%
  forecast(h = "2 years") %>% 
  autoplot(ap_ts)

The next step is to verify the residuals of the fitted ARIMA model.

ap_ts %>% 
  model(
    ap_arima = ARIMA(box_cox(passengers, lambda=0.1) ~ 0 + pdq(0, 1, 1) + PDQ(0, 1, 1, period="1 year"))
  ) %>%
  gg_tsresiduals()

There is no strong autocorrelation within the residual lags, and residuals are relatively normally distributed.

ap_ts %>% 
  model(
    ap_arima = ARIMA(box_cox(passengers, lambda=0.1) ~ 0 + pdq(0, 1, 1) + PDQ(0, 1, 1, period="1 year"))
  ) %>%
  glance()

The AIC for the ARIMA (0, 1, 1) (0, 1, 1) model is -337.8. The seasonal ARIMA model can also work with parameters (1, 1, 0) (1, 1, 0), but the AIC for that model is approximately -331, which is higher than the current model. So the parameters (0, 1, 1) (0, 1, 1) are selected for forecasting.

Exponential smoothing models (ETS) can be used for forecasting as well. Since the data has both trend and seasonality, a Holt-Winters model may potentially be a good fit.

The ETS function selects either additive, multiplicative, or none for the error, trend, and seasonal components. The airline passenger time series appears to have a linear (additive) trend, and multiplicative (increasing variance) seasonality.

ap_ts %>% 
  model(
    ETS(passengers ~ trend("A") + season("M"))
  ) %>%
  forecast(h = "2 years") %>% 
  autoplot(ap_ts)

The forecasted values from the ETS model visually appear similar to the forecasted values from the ARIMA model. The next step is to verify the ETS model parameters and statistics:

# ETS automated modelling
airline_passengers_ets <- ap_ts %>% 
  model(ETS(passengers ~ trend("A") + season("M")))

# get ETS model parameters and statistics
report(airline_passengers_ets)
## Series: passengers 
## Model: ETS(M,A,M) 
##   Smoothing parameters:
##     alpha = 0.373202 
##     beta  = 0.01245161 
##     gamma = 0.4320481 
## 
##   Initial states:
##      l[0]      b[0]      s[0]     s[-1]     s[-2]    s[-3]    s[-4]    s[-5]
##  119.6154 0.9284305 0.8989108 0.7833545 0.9180881 1.055207 1.144985 1.171129
##     s[-6]     s[-7]    s[-8]    s[-9]    s[-10]    s[-11]
##  1.080575 0.9693889 1.033204 1.076279 0.9596447 0.9092337
## 
##   sigma^2:  0.0016
## 
##      AIC     AICc      BIC 
## 1400.452 1405.309 1450.939

The ETS(M, A, M) model has a higher AIC (1400.5) compared to the ARIMA model. Therefore, the ARIMA model should be preferred for forecasting for this data set.