Forecasting Air Passenger Revenue Miles

Author

Aditi Anand

Published

February 22, 2023

Introduction

The air revenue passenger mile is a transportation industry metric that shows the number of miles traveled by paying air passengers and is typically an airline traffic statistic. Revenue passenger miles are calculated by multiplying the number of paying passengers by the distance traveled.

The dataset is obtained from the Federal Reserve Economic Data (FRED). The dataset includes the number of revenue passenger miles flown by air carriers in the United States from January 2000 to the present. The data is published monthly and is measured in billions of revenue passenger miles. For the forecasting, I am using 10 years of data from January 2010 to December 2019.

Data Generating process

The data-generating process for this variable is likely influenced by various factors such as economic conditions, consumer confidence, and events such as natural disasters and pandemics. Economic downturns and recessions can decrease demand for air travel, while periods of economic growth and prosperity can increase demand. Consumer confidence also plays a role, as people tend to travel more when they feel financially secure. Furthermore, natural disasters or pandemics can also have a major impact on the demand for air travel.

Code
fredr_set_key("d6b4de57ebc04ac1898e4b98ebecbbac")

# Loading the dataset
air_miles <- fredr(
  series_id = "AIRRPMTSI",
  observation_start = as.Date("2010-01-01"),
  observation_end = as.Date("2019-12-01")
) %>%
  select(date, value) %>%
  mutate(value = value/1000000) %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)

# Train set
train = air_miles %>%  
  filter(date<ymd('2017-01-01'))

# Test set
test = air_miles %>% 
  filter(date>=ymd('2017-01-01'))

air_miles_plot <- air_miles %>%
  ggplot() +
  geom_line(aes(date, value)) +
  geom_vline(xintercept=ymd('2017-01-01'), color='Green')+
  theme_bw() +
  ggtitle("Air Passenger Miles Over Time") +
  xlab("Date") +
  ylab("Air Passenger Miles (in billions)")

air_miles_plot

This line chart shows the overall trend of the Air Revenue Passenger Miles over time. It clearly shows the upward trend in air travel over the years, with some fluctuations. The fluctuations can be attributed to the external factors such as economic conditions, consumer confidence, etc, and seasonality.

Train-Test Split

The vertical green line in the above plot depicts the train-test split.

Training Set: Data from 2010-01-01 to 2016-12-01

Testing Set: Data from 2017-01-01 to 2019-12-01

All following analysis are being conducted on training set.

EDA & TS Decomposition

Exploratory data analysis (EDA) is the process of visualizing and summarizing data to gain insights into its underlying patterns, distributions, and relationships. EDA can help to identify outliers, missing values, and other data quality issues, and inform the selection of appropriate modeling techniques.

Code
hist <- train %>%
  ggplot() +
  geom_histogram(aes(value)) +
  theme_bw()

dens <- train %>%
  ggplot() +
  geom_density(aes(value)) +
  theme_bw()

violin <- train %>%
  ggplot() +
  geom_violin(aes("", value)) +
  theme_bw()

boxplot <- train %>%
  ggplot() +
  geom_boxplot(aes("", value)) +
  theme_bw()

hist + boxplot + dens + violin

The histogram plot shows the frequency of data points within different ranges. The data seems to be normally distributed.

The boxplot shows the median around 71 billion, the first and third quartiles. There are no outliers.

The density plot shows the distribution of the data points. It shows that the majority of the data are located between 65-80 billions.

The violin plot also shows the distribution of the data.

Summary Statistics

Code
sumtable(train, digits = 3, fixed.digits = TRUE,
         summ=c('notNA(x)','mean(x)','sd(x)', 'min(x)',
      'quantile(train$value, probs = 0.25)',
      'median(x)',
      'quantile(train$value, probs = 0.75)',
      'max(x)'),
   summ.names = list(
     c('No. of Obs.','Mean','Std. Dev.','Min.','1st Qu.', 'Median','3rd Qu.','Max.')
   ))
Summary Statistics
Variable No. of Obs. Mean Std. Dev. Min. 1st Qu. Median 3rd Qu. Max.
value 84 71.827 7.971 53.240 66.261 71.704 77.264 91.257

*All the values are in billions of revenue passenger miles.

Moving Average

Code
air_miles_decomp <- train %>%
  mutate(
    ma_13_right = rollapply(
      value,
      13,
      FUN = mean,
      align = "right", fill = NA
    )
  ) %>%
  mutate(resid = value - ma_13_right) %>%
  select(date, value, ma_13_right, resid)


air_miles_decomp %>%
  ggplot() +
  geom_line(aes(date, value), size = 1) +
  geom_line(aes(date, ma_13_right), size = 1, color = "blue") +
  theme_bw()+
  ylab('Air Passenger Miles (in billions)')+
  xlab("Date")+
  ggtitle(
    "Right Aligned Moving Average of Order 13"
  )

The blue line in the above plot shows the right-aligned moving average for air passenger miles data. We get a smoother estimate of the trend cycle, indicating an increasing trend over time.

Time Series Decomposition

Code
air_pax_mult = train %>%
  as_tsibble() %>%
  model(
    classical_decomposition(value,'multiplicative')
  ) %>%
  components()

air_pax_mult %>%
  autoplot()

By visualizing the season plot above, we can say that there is a clear repeating pattern in the data, which indicates seasonality. The seasonality appears to be strong, as there is a clear and consistent repeating pattern, with peaks and valleys that occur at regular intervals, indicating that the number of passenger miles flown varies greatly depending on the season. After subtracting the moving average and seasonality from the original time series, we get the remainder which is white noise.

ARIMA Modeling

ARIMA (Autoregressive Integrated Moving Average) is a statistical method used to analyze and forecast time series data. It combines autoregression, differencing, and moving average components to model complex temporal patterns. One of the key assumptions of ARIMA models is that the underlying time series data is stationary.

Stationarity

A stationary time series is one whose statistical properties such as mean and variance do not depend on the time at which the series is observed.

From the initial analysis, the data does not appear to be mean stationary as there is an increasing tread over time. Also there are cyclical patterns, with spikes in air passenger miles occurring every year. This suggests that there is yearly seasonality in the data. From looking at the above plot we cannot say much about variance stationary.

Using the rolling sd method to determine if the data is variance stationary or not

Code
air_miles_roll <- train %>%
  mutate(
    value_sd = zoo::rollapply(
      value, 
      FUN = sd, 
      width = 12, 
      fill = NA)
  )

air_miles_rollsd <- air_miles_roll %>%
  ggplot() +
  geom_line(aes(date, value_sd)) +
  geom_smooth(aes(date,value_sd),method='lm',se=F)+
  theme_bw() +
  ggtitle("Air Passenger Miles Standard Deviation over Time (12 month rolling window)") +
  ylab("Air Passenger Miles (in billions)") +
  xlab("Date")

air_miles_rollsd

The rolling sd method suggests that the data is variance non-stationary as the variance is ranging approximately from 6.75 to 7.75.

Log Transformation to Induce Variance Stationarity

Code
train_trans <- train %>%
  mutate(value_log = log(value))

train_trans %>%
  ggplot() +
  geom_line(aes(date, value_log)) +
  theme_bw() +
  ggtitle("Air Passenger Miles over Time (Log Transformed)") +
  ylab("Air Passenger Miles Transformed") +
  xlab("Date")

Performing seasonal difference of 12 lags to remove the yearly seasonal effect

Code
train_trans<-train_trans %>%
  mutate(value_diff_season = difference(value_log,12)) %>%
  drop_na() %>%
  as_tsibble(index=date)

train_trans %>%
  gg_tsdisplay(value_diff_season,
               plot_type='partial', lag=36) +
  labs(title="Seasonally Differenced", y="")

KPSS test for stationarity

Code
season_value_kpss = train_trans %>% 
  features(value_diff_season, unitroot_kpss)
season_value_kpss
# A tibble: 1 × 2
  kpss_stat kpss_pvalue
      <dbl>       <dbl>
1     0.796        0.01

The KPSS test suggests that process is still mean non-stationary as p-value (0.01) < 0.05.

Calculating the first difference of the transformed data

Code
train_trans <- train_trans %>%
  mutate(value_diff_mean = difference(value_diff_season,1)) %>%
  drop_na() %>%
  as_tsibble(index=date)

train_trans %>%
  gg_tsdisplay(value_diff_mean,
               plot_type='partial', lag=36) +
  labs(title="Double Differenced", y="")

KPSS test for stationarity

Code
mean_value_kpss = train_trans %>% 
  features(value_diff_mean, unitroot_kpss)
mean_value_kpss
# A tibble: 1 × 2
  kpss_stat kpss_pvalue
      <dbl>       <dbl>
1    0.0617         0.1

The above KPSS test confirms that process is stationary as p-value (0.1) > 0.05.

ACF/PACF

ACF/PACF plots of the transformed time series

Code
par(mfrow = c(1, 2))
acf(train_trans$value_diff_mean)
pacf(train_trans$value_diff_mean)

Based on the ACF and PACF plots, the time-series appear to be a combination of autoregressive and moving average process, as neither of ACF and PACF plot show any dampening effect. Also, there is no clear indication of order of AR or MA process.

The significant spike at lag 1 in the ACF suggests a non-seasonal MA(1) component and the significant spike at lag 2 in the PACF suggests a AR(2) component. There are two significant spike after lag 12 in the ACF suggests a seasonal MA(2) component, and one significant spike at lag 21 in the PACF suggests a seasonal AR(1) component. Consequently, we begin with an ARIMA(2,1,1)(1,1,2)[12] model, indicating a first difference, a seasonal difference, a non-seasonal AR(2), a non-seasonal MA(1), a seasonal AR(1) and seasonal MA(2) component.

ARIMA Model Comparision

Fitting several ARIMA models similar to guessed ARIMA(2,1,1)(1,1,2)[12] model

Code
# Order = p,d,q (AR, I, MA)
models_bic = train %>%
  model(
    mod1 = ARIMA(log(value)~pdq(2,1,1)+PDQ(1,1,2)),
    mod2 = ARIMA(log(value)~pdq(2,1,0)+PDQ(1,1,2)),
    mod3 = ARIMA(log(value)~pdq(2,1,1)+PDQ(1,1,1)),
    mod4 = ARIMA(log(value)~pdq(2,1,0)+PDQ(1,1,1)),
    mod5 = ARIMA(log(value)~pdq(1,1,1)+PDQ(1,1,2)),
    mod6 = ARIMA(log(value)~pdq(1,1,0)+PDQ(1,1,2)),
    mod7 = ARIMA(log(value)~pdq(1,1,1)+PDQ(1,1,1)),
    mod8 = ARIMA(log(value)~pdq(1,1,0)+PDQ(1,1,1)),
  )

models_bic %>%
  glance() %>%
  arrange(BIC)
# A tibble: 8 × 8
  .model   sigma2 log_lik   AIC  AICc   BIC ar_roots   ma_roots  
  <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>     <list>    
1 mod7   0.000170    202. -394. -394. -383. <cpl [13]> <cpl [13]>
2 mod4   0.000172    202. -394. -393. -382. <cpl [14]> <cpl [12]>
3 mod5   0.000156    204. -396. -395. -382. <cpl [13]> <cpl [25]>
4 mod6   0.000158    202. -393. -392. -382. <cpl [13]> <cpl [24]>
5 mod2   0.000158    203. -395. -394. -381. <cpl [14]> <cpl [24]>
6 mod8   0.000185    198. -389. -388. -380. <cpl [13]> <cpl [12]>
7 mod3   0.000172    202. -393. -392. -379. <cpl [14]> <cpl [13]>
8 mod1   0.000158    204. -394. -392. -378. <cpl [14]> <cpl [25]>

According to BIC, mod 7, i.e. ARIMA(1,1,1)(1,1,1)[12] model is best.

Deriving the fitted values from ARIMA(1,1,1)(1,1,1)[12] model and plotting them against the observed values of the series.

Code
best_model <- models_bic %>%
  select(mod7)
fitted <- best_model %>%
  augment() %>%
  .$.fitted

ggplot() +
  geom_line(aes(train$date, train$value), color = "black") +
  geom_line(aes(train$date, fitted), color = "blue", alpha=0.5) +
  theme_bw() +
  xlab("Date") +
  ylab("Air Passenger Miles (in billions)") +
  ggtitle("Observed vs Fitted (in blue) values")

The in-sample predicted values (in blue) matches closely to the observed values (in black).

Residuals Analysis

Analyzing the residuals from the selected model

Code
best_model %>%
  gg_tsresiduals(lag=36)

There is no significant spike in the above residual’s ACF plot, suggesting that the residuals are white noise.

Box-Ljung test for residual autocorrelation

Code
best_model %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 4)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 mod7      12.1    0.0593

Box-Ljung test for residual autocorrelation confirms that the residuals are similar to white noise as p-value > 0.05.

Meta Prophet Model

The Prophet model consists of trend, seasonality, and holidays. The trend component captures the underlying growth rate of the time series, while the seasonal component represents periodic fluctuations in the data. The holiday component is used to model the effects of holidays and special events.

Prophet is able to handle missing data by imputing it using a technique called linear interpolation. The model also has built-in functionality for detecting and handling outliers.

Decomposition

Decomposition of time-series into trend & seasonality

Code
# Train set
prophet_train = train %>%  
  rename(ds = date,
  y = value)

# Test set
prophet_test = test %>% 
  rename(ds = date,
  y = value)

# Train Model
orig_model = prophet(prophet_train) 

# Create future dataframe for predictions
orig_future = make_future_dataframe(orig_model,periods = 24, freq='month') 

# Get forecast
orig_forecast = predict(orig_model,orig_future) 

prophet_plot_components(orig_model,orig_forecast)

Since the data is monthly, we can only see the yearly seasonality of the data, and no weekly seasonality plot is generated.

Hyperparameter Tuning

Identifying changepoints for the trend part

Code
plot(orig_model,orig_forecast)+
  add_changepoints_to_plot(orig_model)+
  ggtitle("24 Months Forecasting with Default Chagepoints")+
  ylab("Air Passenger Miles (in billions)")+
  xlab("Month")+
  theme_bw()

The above detected 7 changepoints does not make sense for the Air Passenger Miles time-series data as there seems to be only 2-3 major changepoints (from looking at the trend plot). Therefore, adjusting the hyperparameters of the model to get 2 changepoints.

Code
# Number of Changepoints
model_2CP = prophet(prophet_train,n.changepoints=2)

model_2CP_forecast = predict(model_2CP,orig_future)

plot(model_2CP,model_2CP_forecast)+
  add_changepoints_to_plot(model_2CP)+
  ggtitle("24 Months Forecasting with Tuned Hyperparameter Prophet Model")+
  ylab("Air Passenger Miles (in billions)")+
  xlab("Month")+
  theme_bw()

By adjusting the hyperparameters of the model by specifying number of changepoints as 2, we get the desired model.

Assessing the tuned model

The above two-year forecast from the base prophet model with hyperparameter n.changepoints=2 appears to be able to capture the trend in the data reasonably. We do not need to take into account a saturating minimum/maximum point by specifying a floor and a cap. The default growth that is linear seems to be good in this case.

Seasonality

The previous decomposition of the time-series identifies yearly seasonality in the data. As this is Air Passenger Miles Data, multipicative seasonality makes more sense as seasonal fluctuations increases over time.

Multiplicative Seasonality

Code
multi = prophet(prophet_train,n.changepoints=2,seasonality.mode = 'multiplicative')
multi_fcst = predict(multi,orig_future)

plot(multi,multi_fcst)+
ggtitle('24 Months Forecast with Multiplicative Seasonality ')+
ylab('Air Passenger Miles (in billions)')+
xlab('Date')

Code
prophet_plot_components(multi, multi_fcst)

This is the monthly data, we don’t need to examine and include holidays into the model.

The best prophet model selected is the linear growth model with multiplicative seasonality and two number of changepoints.

Model Comparison and Validation

Model comparison involves comparing the performance of different models on a common set of evaluation metrics such RMSE, MAPE, etc, while model validation involves evaluating the performance of a chosen model on new, unseen (test) data to assess its ability to generalize. This crucial for ensuring that a model is accurate, reliable, and useful for making predictions.

Cross-Validation

In-Sample Cross-Validation to Compare SNAIVE vs ARIMA vs Prophet Models

Code
cv_data = train %>%
  stretch_tsibble(.init = 36, .step = 12)

cv_forecast = cv_data %>%
  model(
  naive_w_season = SNAIVE(value),
  arima_mod = ARIMA(log(value)~pdq(1,1,1)+PDQ(1,1,1)),
  prophet_mod = fable.prophet::prophet(value ~ growth("linear", n_changepoints=2) + season("year", type = "multiplicative")),
  ) %>%
  forecast(h = 12)

cv_forecast %>%
    as_tsibble() %>%
    select(-value) %>%
    left_join(
        train
    ) %>%
    ggplot()+
    geom_line(aes(date,value))+
    geom_line(aes(date,.mean,color=factor(.id),linetype=.model))+
    scale_color_discrete(name='Iteration')+
    ggtitle('SNAIVE vs ARIMA vs Prophet Forecast for Each CV Iteration')+
    ylab('Air Passenger Miles (in billions)')+
    xlab('Date')+
    theme_bw()

Average Accuracy Comparison

Code
cv_forecast %>%
  accuracy(train) %>%
  data.table()
           .model .type        ME     RMSE      MAE       MPE     MAPE
1:      arima_mod  Test 0.2363339 1.466018 1.170648 0.3289766 1.568213
2: naive_w_season  Test 2.2188764 2.546839 2.251280 2.9524475 3.003218
3:    prophet_mod  Test 0.6939505 1.888822 1.526252 1.0036309 2.041929
        MASE     RMSSE      ACF1
1: 0.6150629 0.6518747 0.4696950
2: 1.1828307 1.1324690 0.4756066
3: 0.8018985 0.8398772 0.5562686

All the three models seems to perform well as the forecasted value overlaps closely with the actual value. ARIMA model has the lowest RMSE, indicating that the ARIMA model is best for forecasting Air Passenger Miles Time Series Data.

RMSE

Code
cv_forecast %>%
  group_by(.id,.model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "value", distribution = value) %>%
  accuracy(train, by = c("h", ".model")) %>%
  ggplot(aes(x = h, y = RMSE,color=.model)) +
  geom_point()+
  geom_line()+
  theme_bw()+
  ggtitle('RMSE of Each Model at Different Intervals')+
  ylab('Average RMSE at Forecasting Intervals')+
  xlab('Months in the Future')

The ARIMA Model performs best at forecasting horizons 2,3,4,5,6,8,9 and 12 month, whereas Prophet Model performs best at forecasting horizons 1,7,10 and 11 month. Thus, we can say that overall ARIMA model is best for forecasting Air Passenger Miles data.

Out of Sample Forecast

Fitting final selected (ARIMA) model to the test set

Code
best_model %>%
    forecast(h=36) %>%
    autoplot(train %>%
    bind_rows(test))+
    ylab('Air Passenger Miles (in billions)')+
    xlab('Date')+
    ggtitle('3 years Forecast of Air Passenger Miles')+
    theme_bw()

Final accuracy calculation based on the test set

Code
best_model %>%
    forecast(h=36) %>%
    accuracy(test)
# A tibble: 1 × 10
  .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mod7   Test   1.79  2.46  2.04  2.01  2.36   NaN   NaN 0.485

The best selected model, i.e., ARIMA(log(value)~pdq(1,1,1)+PDQ(1,1,1)) model performed with 2.36% average deviation from real values during out of sample testing, corresponding to an average error of 2.46 billion miles in the air passenger revenue miles.