1 Introduction

1.1 Data Set

This analysis will be performed using a time series of the number of new houses sold in the U.S. in each month from January 2010 to January 2022. The data was retrieved from the website of the U.S. Census Bureau, where it is publicly available for free download.

1.2 Research Question

The purpose of this analysis is to use this time series to construct several different exponential smoothing models, and to compare each model’s forecasting accuracy, both numerically and visually, to determine which is the best for this data.

1.3 Time Series Plot

library(forecast)
house.sales <- read.csv("C:\\Users\\eh738\\OneDrive\\Documents\\STA321\\FusionCharts2.csv")[1:145, 2]
house.sales.ts <- ts(house.sales, frequency = 12, start = 2010)
plot(house.sales.ts, main = "New House Sales, January 2010 to January 2022", ylab = "Houses Sold (in Thousands)", xlab= "")

Based on the plot, this time series appears to exhibit both an upward trend as well as seasonality. The non-constant variance may also be indicative of a multiplicative structure.

2 Creating Exponential Smoothing Models

2.1 Training & Test Sets

In order to assess the accuracy of each eventual forecasting model, the data will be split into a training set and a test set. The training set will consist of the first 133 monthly values and will be used to construct each model, while the final 12 observations will be used to compare the forecasted values of each model to the actual values to evaluate the model’s accuracy.

train.houses <- house.sales[1:133]
test.houses <- house.sales[134:145]

2.2 Constructing Models

With the training set defined, the smoothing models can now be constructed. Three different types of exponential smoothing models will be created: simple, Holt, & Holt-Winters. Multiple variations of the Holt & Holt-Winters models (additive, multiplicative, with possible damped trend, etc.) will be constructed.

train.ts <- ts(train.houses, start=2010, frequency = 12)
fit1 = ses(train.ts, h=12)
fit2 = holt(train.ts, initial="optimal", h=12)             
fit3 = holt(train.ts,damped=TRUE, h=12)                   
fit4 = holt(train.ts,exponential=TRUE, damped=TRUE, h=12) 
fit5 = hw(train.ts,h=12, seasonal="additive")              
fit6 = hw(train.ts,h=12, seasonal="multiplicative")
fit7 = hw(train.ts,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(train.ts,h=12, seasonal="multiplicative",damped=TRUE)

3 Comparing Model Forecasting Accuracy

3.1 Accuracy Metrics Table

accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
                             accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
                            "HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "Accuracy Metrics of Each Exponential Smoothing Model 
      Based on Training Set")

Accuracy Metrics of Each Exponential Smoothing Model Based on Training Set
	ME	RMSE	MAE	MPE	MAPE	MASE	ACF1
SES	0.3985	5.2451	3.9474	0.2022	9.0666	0.6662	0.0455
Holt Linear	0.0513	5.2799	4.0181	-0.7904	9.3902	0.6781	0.0389
Holt Add. Damped	0.4132	5.2839	4.0090	0.2503	9.2784	0.6765	0.0408
Holt Exp. Damped	0.4463	5.2795	4.0096	0.3494	9.2794	0.6766	0.0411
HW Add.	0.0031	4.2369	3.0284	-0.6884	7.0290	0.5111	0.0607
HW Exp.	0.2826	4.2434	3.0049	0.1452	6.9492	0.5071	0.1075
HW Add. Damp	0.6157	4.2568	3.0978	1.0902	7.2942	0.5228	0.0268
HW Exp. Damp	0.4665	4.1003	2.9194	0.4389	6.7337	0.4927	-0.0001

Based on this table displaying several measures of the forecasting accuracy of each model, the Holt-Winters Multiplicative Damped model appears to have performed the best at forecasting the values of the test set. This is consistent with the initial observations on the pattern of the time series plot, as this particular model would capture both the seasonal variation of the data as well as the increasing variance, in addition to the general trend. Though not readily apparent in the plot, the damped variation of the model has better forecasting accuracy than the Holt-Winters multiplicative model without damping, though the difference is marginal.

3.2 Graphical Comparisons

par(mfrow=c(2,1), mar=c(3,4,3,1))
pred.id = 134:145
plot(1:133, train.houses, lwd=2,type="l", ylab="Houses Sold (in Thousands)", xlab="", 
     xlim=c(120,144), ylim=c(40, 120), cex=0.3,
     main="Non-Seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="lightgoldenrod")
lines(pred.id, fit2$mean, col="darkturquoise")
lines(pred.id, fit3$mean, col="darkorchid2")
lines(pred.id, fit4$mean, col="springgreen3")
##
points(pred.id, fit1$mean, pch=16, col="lightgoldenrod", cex = .7)
points(pred.id, fit2$mean, pch=17, col="darkturquoise", cex = .7)
points(pred.id, fit3$mean, pch=19, col="darkorchid2", cex = .7)
points(pred.id, fit4$mean, pch=21, col="springgreen3", cex = .7)
#points(fit0, col="black", pch=1)
legend("topleft", lty=1, col=c("lightgoldenrod","darkturquoise","darkorchid2", "springgreen3"),pch=c(16,17,19,21),
   c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"), 
   cex = 0.7, bty="n")
###########
plot(1:133, train.houses, lwd=2,type="l", ylab="Houses Sold (in Thousands)", xlab="", 
     xlim=c(120,144), ylim=c(40, 120), cex=0.3,
     main="Holt-Winters Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="lightgoldenrod")
lines(pred.id, fit6$mean, col="darkturquoise")
lines(pred.id, fit7$mean, col="darkorchid2")
lines(pred.id, fit8$mean, col="springgreen3")
##
points(pred.id, fit5$mean, pch=16, col="lightgoldenrod", cex = .7)
points(pred.id, fit6$mean, pch=17, col="darkturquoise", cex = .7)
points(pred.id, fit7$mean, pch=19, col="darkorchid2", cex = .7)
points(pred.id, fit8$mean, pch=21, col="springgreen3", cex = .7)
###
legend("topleft", lty=1, col=c("lightgoldenrod","darkturquoise","darkorchid2", "springgreen3"),pch=c(16,17,19,21),
   c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"), 
   cex = 0.7, bty="n")

As can be seen from the plots above, the non-seasonal models (i.e., SES & Holt models) generally do a poor job of capturing the rising and falling pattern of the data. This is reflected in the accuracy measures in the previous section. On the other hand, the trajectories of the Holt-Winters models in the bottom plot appear to resemble the seasonal pattern of the training data more closely.

3.3 Accuracy Based on Test Set

The accuracy of the models can also be evaluated by comparing their forecasted values to the actual values of the test set observations.

acc.fun = function(test.data, mod.obj){
  PE=100*(test.data-mod.obj$mean)/mod.obj$mean
  MAPE = mean(abs(PE))
  ###
  E=test.data-mod.obj$mean
  MSE=mean(E^2)
  ###
  accuracy.metric=c(MSE=MSE, MAPE=MAPE)
  accuracy.metric
}

pred.accuracy = rbind(SES =acc.fun(test.data=test.houses, mod.obj=fit1),
                      Holt.Add =acc.fun(test.data=test.houses, mod.obj=fit2),
                      Holt.Add.Damp =acc.fun(test.data=test.houses, mod.obj=fit3),
                      Holt.Exp =acc.fun(test.data=test.houses, mod.obj=fit4),
                      HW.Add =acc.fun(test.data=test.houses, mod.obj=fit5),
                      HW.Exp =acc.fun(test.data=test.houses, mod.obj=fit6),
                      HW.Add.Damp =acc.fun(test.data=test.houses, mod.obj=fit7),
                      HW.Exp.Damp =acc.fun(test.data=test.houses, mod.obj=fit8))
kable(pred.accuracy, caption="Accuracy Measures of Exponential Smoothing Models 
      Based on Test Set")

Accuracy Measures of Exponential Smoothing Models Based on Test Set
	MSE	MAPE
SES	256.1293	18.61354
Holt.Add	320.9427	20.35274
Holt.Add.Damp	260.6465	18.74765
Holt.Exp	256.3903	18.62130
HW.Add	327.4461	20.75423
HW.Exp	617.3429	26.97782
HW.Add.Damp	272.4077	18.98471
HW.Exp.Damp	361.7761	21.67971

Interestingly, the accuracy metrics calculated based on the values of the test set are not consistent with the inferences drawn from the previous sections. In this case, the non-seasonal smoothing models generally do a better job of forecasting future values than the Holt-Winters models, with the Holt Multiplicative model exhibiting the least forecasting error. The Holt-Winters multiplicative damped model, which performed best in the previous accuracy metrics based on the entire data set, was actually one of the least accurate models when predicting the test set values. This may simply be an anomaly due to some general irregularity in the final few months of data, but it may also be indicative of an actual change occurring in the pattern of the data.

Because the Holt-Winters additive damped model was among the most accurate models using both methods, and its pattern of forecasted values somewhat resembles the seasonal pattern of the data, this will be used as the final model to reconcile the conflicting results of the two methods of accuracy evaluation.

4 Final Model

With the type of the final model selected, it must be refit to the entire data set instead of just the training set.

final.model = hw(house.sales.ts, h=12, seasonal="additive", damped=TRUE) 
smoothing.parameter = final.model$model$par[1:4]
kable(smoothing.parameter, caption="Estimated Values of Smoothing Parameters in
      Holt-Winters Damped Trend with Additive Seasonality")

Estimated Values of Smoothing Parameters in Holt-Winters Damped Trend with Additive Seasonality
	x
alpha	0.8038473
beta	0.0001001
gamma	0.0001007
phi	0.9491619

Based on these parameter values, the final model can be represented by the following set of equations:

Level Equation:

\(l_t = 0.8038473 \cdot (y_{t} - s_{t-12})+(0.1961527) \cdot (l_{t-1}+(0.9491619)b_{t-1})\)

Trend Equation:

\(b_t= 0.0001001 \cdot (l_t - l_{t-1})+(0.9998999) \cdot (0.9491619)(b_{t-1})\)

Seasonal Equation:

\(s_t= 0.0001007 \cdot (y_t -l_t)+(0.9998993) \cdot s_{t-12}\)

Forecast Equation:

\(\hat{y}_{t+h} = l_t + (1+0.9491619+0.9491619^2+...+0.9491619^{12}) \cdot b_t +s_{t-12}\)

where:

\(y_t\) is the observed number of houses sold at time \(t\)

\(l_t\) is the level component at time \(t\)

\(b_t\) is the trend component at time \(t\)

\(s_t\) is the seasonal component at time \(t\)

Modeling Monthly New House Sales Using Exponential Smoothing

Haley Koprivsek

Week 13 Assignment