if (!require("knitr")) {
install.packages("knitr")
library(knitr)
}
## Loading required package: knitr
if (!require("forecast")) {
install.packages("forecast")
library(forecast)
}
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 4.4.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
if (!require("TTR")) {
install.packages("TTR")
library(TTR)
}
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 4.4.3
if (!require("dplyr")) {
install.packages("dplyr")
library(dplyr)
}
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!require("ggplot2")) {
install.packages("ggplot2")
library(ggplot2)
}
## Loading required package: ggplot2
knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
message = FALSE,
fig.align = "center",
fig.pos = "ht"
)
In this project, we apply time series to investigate the monthly totals of international airlne passengers from the airline passengers dataset. This dataset has a sample size over 100, and displays seasonality for traveling trends. We will fit and compare a simple exponential smoothing model, Holt models, and Holt-Winters models. We will then report training accuracy and forecast errors on the held-out test set. The ultimate goal is to identify which exponential smoothing model best captures the structure of the dataset and produces the best forecast.
For this project, we use the AirPassengers dataset in R, which records the monthly totals of international airline passengers from Jan. 1949 to Dec. 1960.
This dataset contains a single numeric variable: Passengers, which describes the number of international airline passengers in thousands.
data("AirPassengers")
ap_raw <- AirPassengers
ap_raw
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
start(ap_raw); end(ap_raw); frequency(ap_raw)
## [1] 1949 1
## [1] 1960 12
## [1] 12
length(ap_raw)
## [1] 144
This series has 144 monthly observations, satisfying the >100 requirment.
Because the dataset is already an object with frequency of 12, we can use it as out working series.
ap.ts <- ap_raw
par(mar = c(3, 3, 2, 1))
plot(
ap.ts,
main = "Monthly Airline Passengers (1949–1960)",
ylab = "Passengers (thousands)",
xlab = "Year",
col = "darkred"
)
This series exhibits a strong upward trend over time and seasonal peaks
which grow larger as the level increases.
We hold out the most recent 12 months (1 year), as our testing set. The training data ranges from Jan. 1949 - Dec. 1959, with the test data spanning Jan 1960 - Dec. 1960.
ap_all <- as.numeric(ap.ts)
n_total <- length(ap_all)
# Last 12 months as test data
h_test <- 12
test_data <- ap_all[(n_total - h_test + 1):n_total]
train_data <- ap_all[1:(n_total - h_test)]
length(train_data); length(test_data)
## [1] 132
## [1] 12
ap_train.ts <- ts(train_data, frequency = 12, start = c(1949, 1))
par(mar = c(3, 3, 2, 1))
plot(
ap_train.ts,
main = "Training Data: Airline Passengers (1949–1959)",
ylab = "Passengers (thousands)",
xlab = "Year",
col = "darkred"
)
The training sample retains more than 100 observations, leaving a full calendar year for out of sample testing evaluation.
We will now fit the three models: - Simple Exponential Model - Holt’s Trend Model - Holt-Winter’s Seasonal Model
SES assumes explicit trend or seasonality, only a smoothed level. This makes a sufficient baseline to compare against the other 2 models.
fit_ses <- ses(ap_train.ts, h = h_test)
fit_ses
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1960 404.9957 364.6867 445.3047 343.3483 466.6431
## Feb 1960 404.9957 347.9930 461.9984 317.8175 492.1739
## Mar 1960 404.9957 335.1830 474.8084 298.2264 511.7649
## Apr 1960 404.9957 324.3837 485.6077 281.7102 528.2812
## May 1960 404.9957 314.8691 495.1223 267.1590 542.8324
## Jun 1960 404.9957 306.2673 503.7241 254.0037 555.9877
## Jul 1960 404.9957 298.3571 511.6343 241.9061 568.0853
## Aug 1960 404.9957 290.9945 518.9969 230.6459 579.3455
## Sep 1960 404.9957 284.0793 525.9121 220.0700 589.9214
## Oct 1960 404.9957 277.5388 532.4526 210.0672 599.9242
## Nov 1960 404.9957 271.3179 538.6735 200.5531 609.4383
## Dec 1960 404.9957 265.3739 544.6175 191.4625 618.5289
accuracy(fit_ses)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 2.219773 31.21412 23.90229 0.4135676 8.911792 0.7849683 0.2863262
The ouptut reports the smoothing parameter alpha, and various training accuracy measures (ME, RMSE, MAE, MAPE) based on the fitted values.
SES tracks the overall level but cannot capture the strong trend and seasonality in the data.
Next we will fit Holt’s linear trend models, both with and without a damped trend. The damped version is meant to flatten long horizon forecasts and improve accuracy in most cases when the series shows trend.
# Holt additive trend (no damping)
fit_holt <- holt(ap_train.ts, initial = "optimal", h = h_test)
# Holt additive trend with damping
fit_holt_damped <- holt(ap_train.ts, initial = "optimal",
damped = TRUE, h = h_test)
fit_holt
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1960 407.0734 366.5319 447.6149 345.0705 469.0763
## Feb 1960 409.1630 351.8371 466.4888 321.4906 496.8353
## Mar 1960 411.2525 341.0441 481.4610 303.8780 518.6271
## Apr 1960 413.3421 332.2710 494.4132 289.3546 537.3297
## May 1960 415.4317 324.7887 506.0747 276.8052 554.0582
## Jun 1960 417.5213 318.2232 516.8194 265.6580 569.3846
## Jul 1960 419.6109 312.3524 526.8694 255.5731 583.6486
## Aug 1960 421.7005 307.0314 536.3696 246.3292 597.0717
## Sep 1960 423.7900 302.1597 545.4204 237.7724 609.8076
## Oct 1960 425.8796 297.6641 554.0951 229.7909 621.9683
## Nov 1960 427.9692 293.4894 562.4490 222.3001 633.6383
## Dec 1960 430.0588 289.5926 570.5250 215.2343 644.8833
fit_holt_damped
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1960 405.1364 364.3876 445.8853 342.8165 467.4564
## Feb 1960 405.2743 347.6467 462.9020 317.1404 493.4083
## Mar 1960 405.4095 334.8278 475.9912 297.4641 513.3549
## Apr 1960 405.5419 324.0378 487.0461 280.8920 530.1919
## May 1960 405.6717 314.5432 496.8003 266.3026 545.0409
## Jun 1960 405.7990 305.9680 505.6300 253.1206 558.4773
## Jul 1960 405.9236 298.0888 513.7584 241.0045 570.8428
## Aug 1960 406.0458 290.7601 521.3314 229.7316 582.3600
## Sep 1960 406.1655 283.8809 528.4501 219.1474 593.1836
## Oct 1960 406.2828 277.3776 535.1880 209.1393 603.4263
## Nov 1960 406.3978 271.1947 541.6009 199.6225 613.1731
## Dec 1960 406.5105 265.2891 547.7319 190.5310 622.4900
accuracy(fit_holt)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.06902756 31.15172 23.8295 -0.5842125 8.965372 0.782578 0.2860526
accuracy(fit_holt_damped)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.489525 31.18846 23.93146 -0.02541273 8.96902 0.7859265 0.2860381
Both Holt models model the upward trend. The damped version constrains the long growth and helps to reduce over forecasting for extended horizons.
Lastly, we use the Holt-Winters methods to model both trend and seasonality. We will fit the following: - Additive Holt-Winters - Multiplicative Holt-Winters - Additive Holt-Winters with damped trend - Multiplicative Holt-Winters with damped trend
# Holt–Winters additive seasonality
fit_hw_add <- hw(ap_train.ts, h = h_test, seasonal = "additive")
# Holt–Winters multiplicative seasonality
fit_hw_mult <- hw(ap_train.ts, h = h_test, seasonal = "multiplicative")
# Holt–Winters additive with damped trend
fit_hw_add_damped <- hw(ap_train.ts, h = h_test,
seasonal = "additive", damped = TRUE)
# Holt–Winters multiplicative with damped trend
fit_hw_mult_damped <- hw(ap_train.ts, h = h_test,
seasonal = "multiplicative", damped = TRUE)
fit_hw_add
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1960 409.2577 388.1899 430.3256 377.0373 441.4782
## Feb 1960 402.8045 373.0119 432.5970 357.2407 448.3683
## Mar 1960 439.0902 402.5995 475.5810 383.2824 494.8980
## Apr 1960 430.7499 388.6097 472.8900 366.3021 495.1977
## May 1960 433.3781 386.2587 480.4976 361.3151 505.4411
## Jun 1960 474.8847 423.2618 526.5077 395.9342 553.8353
## Jul 1960 502.6950 446.9289 558.4611 417.4082 587.9819
## Aug 1960 504.2110 444.5870 563.8349 413.0240 595.3980
## Sep 1960 461.0817 397.8328 524.3305 364.3509 557.8124
## Oct 1960 426.6746 359.9959 493.3533 324.6983 528.6509
## Nov 1960 398.6437 328.7014 468.5860 291.6762 505.6112
## Dec 1960 424.4781 351.4162 497.5399 312.7396 536.2165
fit_hw_mult
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1960 416.6188 394.3272 438.9105 382.5267 450.7110
## Feb 1960 393.6628 371.3633 415.9624 359.5587 427.7670
## Mar 1960 462.3467 434.7187 489.9747 420.0934 504.6001
## Apr 1960 448.5228 420.3370 476.7085 405.4164 491.6291
## May 1960 472.2089 441.0871 503.3307 424.6123 519.8055
## Jun 1960 540.0026 502.7660 577.2392 483.0542 596.9510
## Jul 1960 625.6443 580.6024 670.6862 556.7586 694.5300
## Aug 1960 635.3948 587.7281 683.0615 562.4948 708.2947
## Sep 1960 520.6261 479.9980 561.2542 458.4908 582.7614
## Oct 1960 455.1924 418.2998 492.0850 398.7701 511.6147
## Nov 1960 399.6811 366.0861 433.2762 348.3020 451.0603
## Dec 1960 440.2986 401.9675 478.6297 381.6763 498.9209
accuracy(fit_hw_add)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.7468154 15.41083 11.57204 0.2589801 5.002286 0.3800341 0.1636311
accuracy(fit_hw_mult)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.369382 9.949946 7.533284 0.2993092 2.99775 0.2473985 0.3047973
accuracy(fit_hw_add_damped)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.638265 15.50791 11.67084 0.6132358 5.063083 0.3832787 0.1704608
accuracy(fit_hw_mult_damped)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.431481 8.482344 6.764148 0.4725379 2.857529 0.2221395
## ACF1
## Training set -0.03712534
accuracy_table_train <- round(rbind(
SES = accuracy(fit_ses),
Holt_Additive = accuracy(fit_holt),
Holt_Add_Damped = accuracy(fit_holt_damped),
HW_Additive = accuracy(fit_hw_add),
HW_Multiplicative = accuracy(fit_hw_mult),
HW_Add_Damped = accuracy(fit_hw_add_damped),
HW_Mult_Damped = accuracy(fit_hw_mult_damped)
), 4)
kable(accuracy_table_train,
caption = "Training Accuracy Measures for Exponential Smoothing Models (Airline Passengers)")
| ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
|---|---|---|---|---|---|---|---|
| Training set | 2.2198 | 31.2141 | 23.9023 | 0.4136 | 8.9118 | 0.7850 | 0.2863 |
| Training set | 0.0690 | 31.1517 | 23.8295 | -0.5842 | 8.9654 | 0.7826 | 0.2861 |
| Training set | 1.4895 | 31.1885 | 23.9315 | -0.0254 | 8.9690 | 0.7859 | 0.2860 |
| Training set | 0.7468 | 15.4108 | 11.5720 | 0.2590 | 5.0023 | 0.3800 | 0.1636 |
| Training set | 1.3694 | 9.9499 | 7.5333 | 0.2993 | 2.9978 | 0.2474 | 0.3048 |
| Training set | 1.6383 | 15.5079 | 11.6708 | 0.6132 | 5.0631 | 0.3833 | 0.1705 |
| Training set | 1.4315 | 8.4823 | 6.7641 | 0.4725 | 2.8575 | 0.2221 | -0.0371 |
-SES has the largest RMSE and MAPE, showing its inability to capture trend/seasonality. -Holt additive models improve fit by modeling the upward trend. -Holt-Winters models show lower RMSE and MAPE, showing the ability to model seasonality.
acc_fun <- function(test_data, model_object) {
fc_mean <- as.numeric(model_object$mean)
PE <- (test_data - fc_mean) / fc_mean
MAPE <- mean(abs(PE))
E <- test_data - fc_mean
MSE <- mean(E^2)
c(MSE = MSE, MAPE = MAPE)
}
test_accuracy <- rbind(
SES = acc_fun(test_data, fit_ses),
Holt_Additive = acc_fun(test_data, fit_holt),
Holt_Add_Damped = acc_fun(test_data, fit_holt_damped),
HW_Additive = acc_fun(test_data, fit_hw_add),
HW_Multiplicative = acc_fun(test_data, fit_hw_mult),
HW_Add_Damped = acc_fun(test_data, fit_hw_add_damped),
HW_Mult_Damped = acc_fun(test_data, fit_hw_mult_damped)
)
test_accuracy <- round(test_accuracy, 4)
kable(test_accuracy,
caption = "Forecast Error (MSE and MAPE) on the 12-Month Test Set (1960)")
| MSE | MAPE | |
|---|---|---|
| SES | 10604.7788 | 0.1877 |
| Holt_Additive | 8704.9008 | 0.1595 |
| Holt_Add_Damped | 10470.8899 | 0.1858 |
| HW_Additive | 2894.0839 | 0.0867 |
| HW_Multiplicative | 274.9997 | 0.0226 |
| HW_Add_Damped | 3582.8358 | 0.1005 |
| HW_Mult_Damped | 501.8367 | 0.0399 |
This table summarized out of sample performance.
par(mfrow = c(2, 1), mar = c(3, 4, 3, 1))
# Time index for training and forecast
time_train <- time(ap_train.ts)
time_full <- c(time_train,
time(ap.ts)[(length(time_train) + 1):length(ap.ts)])
pred_index <- (length(train_data) + 1):length(ap_all)
plot(
time_train, train_data,
type = "o",
ylab = "Passengers",
xlab = "",
xlim = range(time_full),
main = "Non-Seasonal Smoothing Models (Training + 12-Month Forecast)",
col = "black",
cex = 0.5
)
lines(time_full[pred_index], as.numeric(fit_ses$mean), col = "red")
lines(time_full[pred_index], as.numeric(fit_holt$mean), col = "blue")
lines(time_full[pred_index], as.numeric(fit_holt_damped$mean), col = "purple")
points(time_full[pred_index], as.numeric(fit_ses$mean), pch = 16, col = "red", cex = 0.6)
points(time_full[pred_index], as.numeric(fit_holt$mean), pch = 17, col = "blue", cex = 0.6)
points(time_full[pred_index], as.numeric(fit_holt_damped$mean), pch = 19, col = "purple", cex = 0.6)
legend(
"topleft",
lty = 1,
col = c("red", "blue", "purple"),
pch = c(16, 17, 19),
legend = c("SES", "Holt Additive", "Holt Additive Damped"),
cex = 0.8,
bty = "n"
)
plot(
time_train, train_data,
type = "o",
ylab = "Passengers",
xlab = "",
xlim = range(time_full),
main = "Holt–Winters Trend and Seasonal Smoothing Models",
col = "black",
cex = 0.5
)
lines(time_full[pred_index], as.numeric(fit_hw_add$mean), col = "red")
lines(time_full[pred_index], as.numeric(fit_hw_mult$mean), col = "blue")
lines(time_full[pred_index], as.numeric(fit_hw_add_damped$mean), col = "purple")
lines(time_full[pred_index], as.numeric(fit_hw_mult_damped$mean), col = "navy")
points(time_full[pred_index], as.numeric(fit_hw_add$mean), pch = 16, col = "red", cex = 0.6)
points(time_full[pred_index], as.numeric(fit_hw_mult$mean), pch = 17, col = "blue", cex = 0.6)
points(time_full[pred_index], as.numeric(fit_hw_add_damped$mean), pch = 19, col = "purple", cex = 0.6)
points(time_full[pred_index], as.numeric(fit_hw_mult_damped$mean), pch = 21, col = "navy", cex = 0.6)
legend(
"topleft",
lty = 1,
col = c("red", "blue", "purple", "navy"),
pch = c(16, 17, 19, 21),
legend = c("HW Additive", "HW Multiplicative",
"HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.8,
bty = "n"
)
par(mfrow = c(1, 1))
The AirPassengers series shows strong upward trend and seasonal spikes, with the size of seasonal swings increasing as time passes.
The training accuracy tables show clear rankings of models: - SES performs the worst, with the largest RMSE and MAPE. This is due to not being able to model trend or seasonality. - Holt’s additive trend and damped models improve on SES by capturing the upward movement in the data, but still lack the ability to capture seasonality. The forecasts fail to reproduce the seasonal patterns. - The Holt-Winters models fit the data better. Both additive and multiplicative reduce error metrics,most notably the multiplicative H-W model. - The multiplicative damped H-W model achieves the smallest RMSE (8.48) and the smallest MAPE (2.86%) on the training set, indicating good fit.
The 12 month holdout confirms the H-W Multiplicative model as superior. - The SES and Holt models show very large MSE values, showing weak ability to generalize. - The additive H-W model performs better, but still does not do a great job of capturing the data. - The multiplicative H-W model provides by far the best OOS performance, with a MSE of 275 and a MAPE of 0.0026.
The test results align closely with the findings from the forecast plots, with multiplicative H-W model perfroming the best while other models fail to capture trends/seasonality.
Across both training and testing, the H-W multiplicative model is the best forecasting method for the dataset. It is able to capture both trend and seasonality.
The results are consistent with the structure of the data, given strong trend and seasonality within airline travel.
Ooverall, the exponential smoothing analysis shows that incorporating both trend and seasonality is vital for forecasting monthly airline data.