Data

# Load data

df <- read.csv("/Users/pin.lyu/Desktop/BC_Class_Folder/Predictive_Analytics/Data_Sets/baggagecomplaints.csv")
  • Project Objective

    • The aim of this project is to use the variables in this data set to forecast the number of baggage complaints filed by American Eagle passengers.
  • Data Information

    • The data set contains monthly observations from 2004 to 2010 for United Airlines, American Eagle, and Hawaiian Airlines. The variables in the data set include:

      • Baggage - The total number of passenger complaints for theft of baggage contents, or for lost, damaged, or misrouted luggage for the airline that month

      • Scheduled - The total number of flights scheduled by that airline that month

      • Cancelled - The total number of flights cancelled by that airline that month

      • Enplaned - The total number of passengers who boarded a plane with the airline that month

Data Manipulation

# Create new time variable 

year_month <- paste(df$Year, df$Month, sep = "-")

# Convert the combined string into a date object

df$Time <- as.Date(paste(year_month, "01", sep = "-"))

# Drop unnecessary variables

new_df <- df[, -c(2:4)]

# Separate different airlines (filter() comes from dplyr)

library(dplyr)

AE <- new_df |>
  filter(Airline == "American Eagle")

Decomposition

Additive

Multiplicative

Data Visualization

Variable Selection

Correlation Check

  • Comments: Several of these variables are highly correlated. To avoid multicollinearity in the model-building process, only three variables are used: “Baggage” (dependent variable) and “Enplaned” and “Cancelled” (independent variables).
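
The kind of correlation screen described above can be sketched in base R. This is only an illustration: the data frame `toy` is synthetic, and the 0.8 cutoff is an assumed threshold, not one taken from the report.

```r
# Synthetic stand-in for the airline data (all values are made up)
set.seed(1)
n <- 84
enplaned  <- rnorm(n, mean = 1.5e6, sd = 1e5)
scheduled <- enplaned / 35 + rnorm(n, sd = 500)   # built to track enplaned closely
cancelled <- rnorm(n, mean = 1200, sd = 300)      # independent of the others
baggage   <- 0.01 * enplaned + 2 * cancelled + rnorm(n, sd = 1000)

toy <- data.frame(Baggage = baggage, Scheduled = scheduled,
                  Cancelled = cancelled, Enplaned = enplaned)

# Pairwise Pearson correlations between all variables
cor_mat <- cor(toy)
print(round(cor_mat, 2))

# Flag predictor pairs whose absolute correlation exceeds the assumed cutoff
predictors <- c("Scheduled", "Cancelled", "Enplaned")
p <- cor_mat[predictors, predictors]
high <- which(abs(p) > 0.8 & upper.tri(p), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(p)[high[, "row"]],
                      var2 = colnames(p)[high[, "col"]],
                      r    = p[high])
print(flagged)
```

When two predictors are flagged as a pair, keeping only one of them is a common way to limit multicollinearity.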

Model Building

Neural Network

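# Define the time series data (training window: 2004/01 - 2008/12)

library(forecast)  # provides nnetar(), auto.arima(), ets(), and forecast()

AE_ts <- ts(AE$Baggage[1:60], start = c(2004, 1), end = c(2008, 12), frequency = 12)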

# Define the external regressors

Enplaned <- AE$Enplaned[1:60]
Cancelled <- AE$Cancelled[1:60]

# Fit the NNETAR model

nnt_model <- nnetar(AE_ts, xreg = cbind(Enplaned, Cancelled))

# Define the external regressors for the forecast period (2009/01 - 2010/12)

Enplaned_forecast <- AE$Enplaned[61:84]
Cancelled_forecast <- AE$Cancelled[61:84]

# Forecast for the next 24 months

# Name the columns as in training so the regressors match by name and order
nnt_result <- forecast(nnt_model, xreg = cbind(Enplaned = Enplaned_forecast, Cancelled = Cancelled_forecast), h = 24)

SARIMA

# Seasonal ARIMA model

sarima_model <- auto.arima(AE_ts, seasonal = TRUE)

# Forecasts (24 months, matching the horizon used for the other models)

sarima_result <- forecast(sarima_model, h = 24)

ETS

# ETS model

ets_model <- ets(AE_ts)

# Forecasts

ets_result <- forecast(ets_model, h = 24)

Model Comparison

  • ETS Model
## ETS(M,N,A) 
## 
## Call:
##  ets(y = AE_ts) 
## 
##   Smoothing parameters:
##     alpha = 0.9806 
##     gamma = 0.019 
## 
##   Initial states:
##     l = 14489.2893 
##     s = 5536.371 -3766.585 -1968.58 -2502.965 2265.311 2989.616
##            2082.037 -1277.157 -2259.975 219.0715 -2187.267 870.123
## 
##   sigma:  0.1221
## 
##      AIC     AICc      BIC 
## 1166.319 1177.228 1197.734 
## 
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE     MASE
## Training set -77.06409 1746.776 1335.798 -0.7611951 8.411157 0.234522
##                    ACF1
## Training set 0.02665093
  • SARIMA Model
## Series: AE_ts 
## ARIMA(0,1,0)(1,1,0)[12] 
## 
## Coefficients:
##          sar1
##       -0.5684
## s.e.   0.1075
## 
## sigma^2 = 5230318:  log likelihood = -432.07
## AIC=868.14   AICc=868.41   BIC=871.84
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set -261.5393 2002.476 1384.761 -1.608408 8.098751 0.2431183 0.0407665

Summary

Graphically, the ETS and Neural Network (NNT) models track the observed series most closely. Note that the NNT forecast above has no prediction intervals: neural networks are not based on a well-defined stochastic model, so intervals cannot be derived analytically, although forecast() can approximate them by simulation when called with PI = TRUE. Its point forecasts nevertheless follow the observed series closely in the plots.
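
Prediction intervals for an NNETAR forecast can be obtained by simulation rather than analytically. The sketch below assumes the forecast package is installed and uses the built-in AirPassengers series as a stand-in for the airline data. The seed and the small `npaths` value are illustrative choices only.

```r
library(forecast)  # assumed to be installed

set.seed(123)  # nnetar() uses random starting weights; fix the seed for repeatability

# Stand-in monthly series; in the report this role is played by AE_ts
fit <- nnetar(AirPassengers)

# PI = TRUE simulates future sample paths and takes quantiles of them;
# npaths is kept small here only so the sketch runs quickly
fc <- forecast(fit, h = 12, PI = TRUE, npaths = 100)

# Point forecast with the 95% interval (second column of lower/upper)
print(head(cbind(point = fc$mean,
                 lower95 = fc$lower[, 2],
                 upper95 = fc$upper[, 2])))
```

For a model fitted with external regressors, the same `PI = TRUE` argument would be passed alongside `xreg`.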

Comparing ETS with SARIMA, the SARIMA model reports a much lower AICc (868.41 versus 1177.23). However, information criteria are not comparable across these two model classes, because the likelihoods are computed differently and the SARIMA model is fitted to differenced data, so AICc should not decide between them. On the training-set error measures, the ETS model has the lower RMSE (1746.78 versus 2002.48) and MAE (1335.80 versus 1384.76), while the SARIMA model has a slightly lower MAPE (8.10% versus 8.41%). Taken together with how closely the ETS forecasts follow the observed series, it is reasonable to conclude that the ETS model performs better than the SARIMA model.
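
The error measures quoted above are computed on the training set. The same measures can be computed on a hold-out period, which is usually a fairer basis for comparison. A minimal base R sketch follows; the helper names `rmse`, `mae`, and `mape` and the toy numbers are hypothetical, not part of the original analysis.

```r
# Hypothetical helpers for three common point-forecast error measures
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
mape <- function(actual, pred) mean(abs((actual - pred) / actual)) * 100

# Toy hold-out values and forecasts (made up for illustration)
actual <- c(100, 200, 400)
pred   <- c(110, 180, 400)

scores <- c(RMSE = rmse(actual, pred),
            MAE  = mae(actual, pred),
            MAPE = mape(actual, pred))
print(round(scores, 2))
```

In this report, `actual` would correspond to AE$Baggage[61:84] and `pred` to the point forecasts of a model, for example as.numeric(ets_result$mean).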

Because the ETS and NNT models both perform well, either is suitable for forecasting the number of baggage complaints filed by passengers. Alternatively, the forecasts from the two models could be averaged. A combined forecast may capture the strengths of both models and give more robust predictions.
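
Averaging two sets of point forecasts takes one line of arithmetic. The sketch below uses made-up numbers, and the function `combine_forecasts` and its weight `w` are hypothetical. Equal weights are an assumed default, not a tuned value.

```r
# Hypothetical helper: weighted average of two point-forecast vectors
combine_forecasts <- function(f1, f2, w = 0.5) {
  stopifnot(length(f1) == length(f2), w >= 0, w <= 1)
  w * f1 + (1 - w) * f2
}

# Toy point forecasts from two models (values are made up)
ets_point <- c(12000, 12500, 11800)
nnt_point <- c(11000, 13500, 12200)

avg_point <- combine_forecasts(ets_point, nnt_point)
print(avg_point)
```

With the objects from this report, the inputs would be ets_result$mean and nnt_result$mean, which cover the same 24 months.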