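# Load packages and data
library(dplyr)     # filter()
library(forecast)  # nnetar(), auto.arima(), ets(), forecast()
df <- read.csv("/Users/pin.lyu/Desktop/BC_Class_Folder/Predictive_Analytics/Data_Sets/baggagecomplaints.csv")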
Project Objective
Data Information
The data set contains monthly observations from 2004 to 2010 for United Airlines, American Eagle, and Hawaiian Airlines. The variables in the data set include:
Baggage - The total number of passenger complaints for theft of baggage contents, or for lost, damaged, or misrouted luggage for the airline that month
Scheduled - The total number of flights scheduled by that airline that month
Cancelled - The total number of flights cancelled by that airline that month
Enplaned - The total number of passengers who boarded a plane with the airline that month
# Create new time variable
year_month <- paste(df$Year, df$Month, sep = "-")
# Convert the combined string into a date object
df$Time <- as.Date(paste(year_month, "01", sep = "-"))
# Drop unnecessary variables (columns 2 to 4, selected by position)
new_df <- df[, -c(2:4)]
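Dropping columns by position depends on the column order of the CSV file. As a minimal sketch, the same two steps (building the date and dropping the redundant columns) can be written with name-based selection. The toy data frame and its values are made up for illustration; only the column names Airline, Month, Year, and Baggage come from the code and description above.

```r
# Hypothetical toy data standing in for the real CSV (values are made up)
toy <- data.frame(
  Airline = c("American Eagle", "American Eagle", "Hawaiian"),
  Month   = c(1, 2, 12),
  Year    = c(2004, 2004, 2010),
  Baggage = c(100, 120, 90)
)

# Build a first-of-month date; sprintf() zero-pads the month explicitly
toy$Time <- as.Date(sprintf("%d-%02d-01", as.integer(toy$Year), as.integer(toy$Month)))

# Drop the now-redundant columns by name rather than by position
toy_clean <- toy[, !(names(toy) %in% c("Year", "Month"))]
print(toy_clean)
```

Selecting by name keeps the step correct even if the column order of the input file changes.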
# Separate different airlines
AE <- new_df |>
  filter(Airline == "American Eagle")
# Define the time series data
AE_ts <- ts(AE$Baggage[1:60], start = c(2004, 1), end = c(2008, 12), frequency = 12)
# Define the external regressors
Enplaned <- AE$Enplaned[1:60]
Cancelled <- AE$Cancelled[1:60]
# Fit the NNETAR model
nnt_model <- nnetar(AE_ts, xreg = cbind(Enplaned, Cancelled))
# Define the external regressors for the forecast period (2009/01 - 2010/12)
Enplaned_forecast <- AE$Enplaned[61:84]
Cancelled_forecast <- AE$Cancelled[61:84]
# Forecast for the next 24 months; the xreg columns are named as in training
nnt_result <- forecast(nnt_model, xreg = cbind(Enplaned = Enplaned_forecast, Cancelled = Cancelled_forecast), h = 24)
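The index ranges 1:60 and 61:84 used above encode the 2004 to 2008 training period and the 2009 to 2010 forecast period. As a sketch, the same split can be expressed by calendar date with base R's window(). The series y below is random placeholder data, not the airline data.

```r
# Hypothetical monthly series covering 2004-01 to 2010-12 (84 made-up values)
set.seed(1)
y <- ts(round(runif(84, 10000, 20000)), start = c(2004, 1), frequency = 12)

# Training period: 2004-01 to 2008-12; holdout period: 2009-01 to 2010-12
train <- window(y, end = c(2008, 12))
test  <- window(y, start = c(2009, 1))

length(train)  # 60
length(test)   # 24
```

Splitting by date makes the intended periods explicit and avoids off-by-one mistakes if rows are added or removed.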
# Seasonal ARIMA model
sarima_model <- auto.arima(AE_ts, seasonal = TRUE)
# Forecasts over the same 24-month horizon as the other models
sarima_result <- forecast(sarima_model, h = 24)
# ETS model
ets_model <- ets(AE_ts)
# Forecasts
ets_result <- forecast(ets_model, h = 24)
## ETS(M,N,A)
##
## Call:
##  ets(y = AE_ts)
##
##   Smoothing parameters:
##     alpha = 0.9806
##     gamma = 0.019
##
##   Initial states:
##     l = 14489.2893
##     s = 5536.371 -3766.585 -1968.58 -2502.965 2265.311 2989.616
##         2082.037 -1277.157 -2259.975 219.0715 -2187.267 870.123
##
##   sigma:  0.1221
##
##      AIC     AICc      BIC
## 1166.319 1177.228 1197.734
##
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE     MASE
## Training set -77.06409 1746.776 1335.798 -0.7611951 8.411157 0.234522
##                    ACF1
## Training set 0.02665093
## Series: AE_ts
## ARIMA(0,1,0)(1,1,0)[12]
##
## Coefficients:
##          sar1
##       -0.5684
## s.e.   0.1075
##
## sigma^2 = 5230318:  log likelihood = -432.07
## AIC=868.14   AICc=868.41   BIC=871.84
##
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set -261.5393 2002.476 1384.761 -1.608408 8.098751 0.2431183 0.0407665
Judging from the plots, the ETS and neural network autoregression (NNETAR, abbreviated NNT here) models give the best predictions. Note that the NNT forecast has no prediction intervals: a neural network is not based on a well-defined stochastic model, so forecast() reports no intervals for it by default. Intervals can still be obtained by simulation, by passing PI = TRUE to forecast(). The missing intervals do not make the point forecasts any less usable.
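Comparing the ETS model with the SARIMA model, the SARIMA model has a much lower AICc (868.41 versus 1177.23). These two values are not directly comparable, however, because the models belong to different classes and the SARIMA likelihood is computed on the differenced series, so AICc should not be used to choose between them. On the training set, the ETS model has the lower RMSE (1746.8 versus 2002.5), MAE (1335.8 versus 1384.8), and MASE (0.235 versus 0.243), while its MAPE is slightly higher (8.41 versus 8.10). Additionally, given how well the plotted line aligns with the ETS forecast, it is reasonable to conclude that the ETS model performs better than the SARIMA model, although the margin is modest on every measure except RMSE.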
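To make the size of each gap concrete, the training-set error measures printed in the two model summaries can be compared directly. This is a small worked calculation in base R using only the numbers reported above; the variable names are chosen here for illustration.

```r
# Training-set error measures copied from the two model summaries above
ets_err    <- c(RMSE = 1746.776, MAE = 1335.798, MAPE = 8.411157, MASE = 0.234522)
sarima_err <- c(RMSE = 2002.476, MAE = 1384.761, MAPE = 8.098751, MASE = 0.2431183)

# Relative difference of ETS versus SARIMA, in percent
# (negative: ETS has the lower error; positive: SARIMA has the lower error)
rel_diff <- round(100 * (ets_err - sarima_err) / sarima_err, 1)
print(rel_diff)
```

These are in-sample figures. Comparing the forecasts against the 2009 to 2010 observations (AE$Baggage[61:84]), for instance with accuracy() from the forecast package, would be a more direct test of forecasting performance.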
Since the ETS and NNT models both perform well, either is a suitable choice for forecasting the number of baggage complaints filed by passengers. Alternatively, the forecasts from the two models could be averaged. Such a combination may capture the strengths of both models and yield more robust predictions.
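A minimal sketch of an equal-weight combination is shown below. The numbers are made up; in this analysis the inputs would be the point forecasts stored in ets_result$mean and nnt_result$mean, assuming both cover the same 24 months.

```r
# Hypothetical point forecasts from two models for the same three months
# (made-up numbers standing in for ets_result$mean and nnt_result$mean)
ets_fc <- c(15000, 14200, 16100)
nnt_fc <- c(14600, 14800, 15500)

# Equal-weight combination: element-wise mean of the two forecasts
avg_fc <- (ets_fc + nnt_fc) / 2
print(avg_fc)  # 14800 14500 15800
```

Equal weights are the simplest choice. Other weightings are possible but would need to be justified on holdout data.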