This analysis will be performed using a time series of the number of new houses sold in the U.S. in each month from January 2010 to January 2022. The data was retrieved from the website of the U.S. Census Bureau, where it is publicly available for free download.
The purpose of this analysis is to use this time series to construct several different exponential smoothing models, and to compare each model’s forecasting accuracy, both numerically and visually, to determine which is the best for this data.
library(forecast)
house.sales <- read.csv("C:\\Users\\eh738\\OneDrive\\Documents\\STA321\\FusionCharts2.csv")[1:145, 2]
house.sales.ts <- ts(house.sales, frequency = 12, start = 2010)
plot(house.sales.ts, main = "New House Sales, January 2010 to January 2022", ylab = "Houses Sold (in Thousands)", xlab= "")
Based on the plot, this time series appears to exhibit both an upward trend as well as seasonality. The non-constant variance may also be indicative of a multiplicative structure.
In order to assess the accuracy of each eventual forecasting model, the data will be split into a training set and a test set. The training set will consist of the first 133 monthly values and will be used to construct each model, while the final 12 observations will be used to compare the forecasted values of each model to the actual values to evaluate the model’s accuracy.
train.houses <- house.sales[1:133]
test.houses <- house.sales[134:145]
With the training set defined, the smoothing models can now be constructed. Three different types of exponential smoothing models will be created: simple, Holt, & Holt-Winters. Multiple variations of the Holt & Holt-Winters models (additive, multiplicative, with possible damped trend, etc.) will be constructed.
train.ts <- ts(train.houses, start=2010, frequency = 12)
fit1 = ses(train.ts, h=12)
fit2 = holt(train.ts, initial="optimal", h=12)
fit3 = holt(train.ts,damped=TRUE, h=12)
fit4 = holt(train.ts,exponential=TRUE, damped=TRUE, h=12)
fit5 = hw(train.ts,h=12, seasonal="additive")
fit6 = hw(train.ts,h=12, seasonal="multiplicative")
fit7 = hw(train.ts,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(train.ts,h=12, seasonal="multiplicative",damped=TRUE)
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
"HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "Accuracy Metrics of Each Exponential Smoothing Model
Based on Training Set")
| ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
|---|---|---|---|---|---|---|---|
| SES | 0.3985 | 5.2451 | 3.9474 | 0.2022 | 9.0666 | 0.6662 | 0.0455 |
| Holt Linear | 0.0513 | 5.2799 | 4.0181 | -0.7904 | 9.3902 | 0.6781 | 0.0389 |
| Holt Add. Damped | 0.4132 | 5.2839 | 4.0090 | 0.2503 | 9.2784 | 0.6765 | 0.0408 |
| Holt Exp. Damped | 0.4463 | 5.2795 | 4.0096 | 0.3494 | 9.2794 | 0.6766 | 0.0411 |
| HW Add. | 0.0031 | 4.2369 | 3.0284 | -0.6884 | 7.0290 | 0.5111 | 0.0607 |
| HW Exp. | 0.2826 | 4.2434 | 3.0049 | 0.1452 | 6.9492 | 0.5071 | 0.1075 |
| HW Add. Damp | 0.6157 | 4.2568 | 3.0978 | 1.0902 | 7.2942 | 0.5228 | 0.0268 |
| HW Exp. Damp | 0.4665 | 4.1003 | 2.9194 | 0.4389 | 6.7337 | 0.4927 | -0.0001 |
Based on this table displaying several measures of the forecasting accuracy of each model, the Holt-Winters Multiplicative Damped model appears to have performed the best at forecasting the values of the test set. This is consistent with the initial observations on the pattern of the time series plot, as this particular model would capture both the seasonal variation of the data as well as the increasing variance, in addition to the general trend. Though not readily apparent in the plot, the damped variation of the model has better forecasting accuracy than the Holt-Winters multiplicative model without damping, though the difference is marginal.
par(mfrow=c(2,1), mar=c(3,4,3,1))
pred.id = 134:145
plot(1:133, train.houses, lwd=2,type="l", ylab="Houses Sold (in Thousands)", xlab="",
xlim=c(120,144), ylim=c(40, 120), cex=0.3,
main="Non-Seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="lightgoldenrod")
lines(pred.id, fit2$mean, col="darkturquoise")
lines(pred.id, fit3$mean, col="darkorchid2")
lines(pred.id, fit4$mean, col="springgreen3")
##
points(pred.id, fit1$mean, pch=16, col="lightgoldenrod", cex = .7)
points(pred.id, fit2$mean, pch=17, col="darkturquoise", cex = .7)
points(pred.id, fit3$mean, pch=19, col="darkorchid2", cex = .7)
points(pred.id, fit4$mean, pch=21, col="springgreen3", cex = .7)
#points(fit0, col="black", pch=1)
legend("topleft", lty=1, col=c("lightgoldenrod","darkturquoise","darkorchid2", "springgreen3"),pch=c(16,17,19,21),
c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"),
cex = 0.7, bty="n")
###########
plot(1:133, train.houses, lwd=2,type="l", ylab="Houses Sold (in Thousands)", xlab="",
xlim=c(120,144), ylim=c(40, 120), cex=0.3,
main="Holt-Winters Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="lightgoldenrod")
lines(pred.id, fit6$mean, col="darkturquoise")
lines(pred.id, fit7$mean, col="darkorchid2")
lines(pred.id, fit8$mean, col="springgreen3")
##
points(pred.id, fit5$mean, pch=16, col="lightgoldenrod", cex = .7)
points(pred.id, fit6$mean, pch=17, col="darkturquoise", cex = .7)
points(pred.id, fit7$mean, pch=19, col="darkorchid2", cex = .7)
points(pred.id, fit8$mean, pch=21, col="springgreen3", cex = .7)
###
legend("topleft", lty=1, col=c("lightgoldenrod","darkturquoise","darkorchid2", "springgreen3"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
As can be seen from the plots above, the non-seasonal models (i.e., SES & Holt models) generally do a poor job of capturing the rising and falling pattern of the data. This is reflected in the accuracy measures in the previous section. On the other hand, the trajectories of the Holt-Winters models in the bottom plot appear to resemble the seasonal pattern of the training data more closely.
The accuracy of the models can also be evaluated by comparing their forecasted values to the actual values of the test set observations.
acc.fun = function(test.data, mod.obj){
PE=100*(test.data-mod.obj$mean)/mod.obj$mean
MAPE = mean(abs(PE))
###
E=test.data-mod.obj$mean
MSE=mean(E^2)
###
accuracy.metric=c(MSE=MSE, MAPE=MAPE)
accuracy.metric
}
pred.accuracy = rbind(SES =acc.fun(test.data=test.houses, mod.obj=fit1),
Holt.Add =acc.fun(test.data=test.houses, mod.obj=fit2),
Holt.Add.Damp =acc.fun(test.data=test.houses, mod.obj=fit3),
Holt.Exp =acc.fun(test.data=test.houses, mod.obj=fit4),
HW.Add =acc.fun(test.data=test.houses, mod.obj=fit5),
HW.Exp =acc.fun(test.data=test.houses, mod.obj=fit6),
HW.Add.Damp =acc.fun(test.data=test.houses, mod.obj=fit7),
HW.Exp.Damp =acc.fun(test.data=test.houses, mod.obj=fit8))
kable(pred.accuracy, caption="Accuracy Measures of Exponential Smoothing Models
Based on Test Set")
| MSE | MAPE | |
|---|---|---|
| SES | 256.1293 | 18.61354 |
| Holt.Add | 320.9427 | 20.35274 |
| Holt.Add.Damp | 260.6465 | 18.74765 |
| Holt.Exp | 256.3903 | 18.62130 |
| HW.Add | 327.4461 | 20.75423 |
| HW.Exp | 617.3429 | 26.97782 |
| HW.Add.Damp | 272.4077 | 18.98471 |
| HW.Exp.Damp | 361.7761 | 21.67971 |
Interestingly, the accuracy metrics calculated based on the values of the test set are not consistent with the inferences drawn from the previous sections. In this case, the non-seasonal smoothing models generally do a better job of forecasting future values than the Holt-Winters models, with the Holt Multiplicative model exhibiting the least forecasting error. The Holt-Winters multiplicative damped model, which performed best in the previous accuracy metrics based on the entire data set, was actually one of the least accurate models when predicting the test set values. This may simply be an anomaly due to some general irregularity in the final few months of data, but it may also be indicative of an actual change occurring in the pattern of the data.
Because the Holt-Winters additive damped model was among the most accurate models using both methods, and its pattern of forecasted values somewhat resembles the seasonal pattern of the data, this will be used as the final model to reconcile the conflicting results of the two methods of accuracy evaluation.
With the type of the final model selected, it must be refit to the entire data set instead of just the training set.
final.model = hw(house.sales.ts, h=12, seasonal="additive", damped=TRUE)
smoothing.parameter = final.model$model$par[1:4]
kable(smoothing.parameter, caption="Estimated Values of Smoothing Parameters in
Holt-Winters Damped Trend with Additive Seasonality")
| x | |
|---|---|
| alpha | 0.8038473 |
| beta | 0.0001001 |
| gamma | 0.0001007 |
| phi | 0.9491619 |
Based on these parameter values, the final model can be represented by the following set of equations:
Level Equation:
\(l_t = 0.8038473 \cdot (y_{t} - s_{t-12})+(0.1961527) \cdot (l_{t-1}+(0.9491619)b_{t-1})\)
Trend Equation:
\(b_t= 0.0001001 \cdot (l_t - l_{t-1})+(0.9998999) \cdot (0.9491619)(b_{t-1})\)
Seasonal Equation:
\(s_t= 0.0001007 \cdot (y_t -l_t)+(0.9998993) \cdot s_{t-12}\)
Forecast Equation:
\(\hat{y}_{t+h} = l_t + (1+0.9491619+0.9491619^2+...+0.9491619^{12}) \cdot b_t +s_{t-12}\)
where:
\(y_t\) is the observed number of houses sold at time \(t\)
\(l_t\) is the level component at time \(t\)
\(b_t\) is the trend component at time \(t\)
\(s_t\) is the seasonal component at time \(t\)