The following data is from https://www.census.gov. The data set contains the number of homes sold (by the thousands) in the United States from the years 1963 to 2024. Each year contains 12 observations for each month of the year with the exception of 2024 that only has 9 observations. There are a total of 741 observations. For this analysis, we will only use the 180 most recent observations. With this time series, we will build several smoothing models to see which one performs the best.
Which smoothing model will be the best fit for our time series?
First, the data is downloaded. Next, the data is cut down to the 180 most recent observations. this means we are only using observations from the years 2010 to 2024.We also modify the Vale variable so it does not contain commas. There appears to be no missing values in the data.
data.house <- read.csv("https://raw.githubusercontent.com/AvaDeSt/STA-321/refs/heads/main/Book%203(Sheet1).csv", header = TRUE)
n.row = dim(data.house)[1]
house = data.house[(n.row-180):n.row, ]
house$Value = as.numeric(gsub(",", "", house$Value))
First, we take the last most recent twelve observations as test data to identify the best model, we will also build an accuracy table based on the training data
test.house = house$Value[170:181]
train.hou = house$Value[1:169]
hou=ts(house$Value[1:169], start=2000, frequency = 12)
fit1 = ses(hou, h=12)
fit2 = holt(hou, initial="optimal", h=12) ## optimal alpha and beta
fit3 = holt(hou,damped=TRUE, h=12) ## additive damping
fit4 = holt(hou,exponential=TRUE, damped=TRUE, h =12) ## multiplicative damp
fit5 = hw(hou,h=12, seasonal="additive") ## default h = 10
fit6 = hw(hou,h=12, seasonal="multiplicative")
fit7 = hw(hou,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(hou,h=12, seasonal="multiplicative",damped=TRUE)
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
"HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models
based on the training data")
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
SES | 2.1676 | 48.0616 | 34.8290 | 0.0145 | 6.2117 | 0.4022 | 0.0001 |
Holt Linear | -0.4309 | 48.0610 | 34.9218 | -0.5012 | 6.2543 | 0.4033 | -0.0389 |
Holt Add. Damped | 2.5414 | 48.0301 | 34.7696 | 0.1272 | 6.1870 | 0.4015 | -0.0006 |
Holt Exp. Damped | 2.8592 | 48.0063 | 34.6673 | 0.2181 | 6.1540 | 0.4003 | -0.0011 |
HW Add. | -0.4703 | 47.9056 | 35.1570 | -0.5140 | 6.3216 | 0.4060 | 0.0019 |
HW Exp. | 1.7871 | 53.5761 | 40.0047 | -0.0539 | 7.2444 | 0.4620 | 0.0266 |
HW Add. Damp | 2.3200 | 47.9329 | 35.3732 | 0.0539 | 6.3355 | 0.4085 | -0.0017 |
HW Exp. Damp | 2.2896 | 48.0815 | 35.0745 | 0.0642 | 6.2895 | 0.4050 | -0.0280 |
According to the table, it looks like Holt exponential damped model performs the best with the lowest MAE, MAP, and MASE. It also has an ACF1 close to 0.
Next, we are going to build two tables to display these models. The first one will contain the Holt non seasonal smoothing models. The second one will contain the Holt Winters seasonal models.
par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 170:181
plot(1:169, train.hou, lwd=2,type="o", ylab="House", xlab="",
xlim=c(1,181), ylim=c(200, 1200), cex=0.3,
main="Non-seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"),
cex = 0.7, bty="n")
###########
plot(1:169, train.hou, lwd=2,type="o", ylab="House", xlab="",
xlim=c(1,181), ylim=c(200, 1200), cex=0.3,
main="Holt-Winterd Teend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
Case study: Comparing various exponential smoothing models.
We saw from the table that a Holt multiplicative damped model performed the best. The graph emulates this as it has an overall damped shape that is also multiplicative. Now, we are going to refit the model using the entire data set to update the smoothing parameters for the final model.
acc.fun = function(test.data, mod.obj){
PE=100*(test.data-mod.obj$mean)/mod.obj$mean
MAPE = mean(abs(PE))
###
E=test.data-mod.obj$mean
MSE=mean(E^2)
###
accuracy.metric=c(MSE=MSE, MAPE=MAPE)
accuracy.metric
}
pred.accuracy = rbind(SES =acc.fun(test.data=test.house, mod.obj=fit1),
Holt.Add =acc.fun(test.data=test.house, mod.obj=fit2),
Holt.Add.Damp =acc.fun(test.data=test.house, mod.obj=fit3),
Holt.Exp =acc.fun(test.data=test.house, mod.obj=fit4),
HW.Add =acc.fun(test.data=test.house, mod.obj=fit5),
HW.Exp =acc.fun(test.data=test.house, mod.obj=fit6),
HW.Add.Damp =acc.fun(test.data=test.house, mod.obj=fit7),
HW.Exp.Damp =acc.fun(test.data=test.house, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models
based on the testing data")
MSE | MAPE | |
---|---|---|
SES | 1426.891 | 4.707724 |
Holt.Add | 1499.462 | 4.731439 |
Holt.Add.Damp | 1426.201 | 4.705750 |
Holt.Exp | 1425.989 | 4.705398 |
HW.Add | 1911.521 | 5.342191 |
HW.Exp | 5311.508 | 8.032114 |
HW.Add.Damp | 1744.384 | 5.163332 |
HW.Exp.Damp | 2366.812 | 6.044176 |
We can see from the accuracy table above that the Holt exponential damped model with the lowest MSE and MAPE. This is consistent with what we saw in the accuracy table with the training data which is a good sign. Next, we will calculate the three smoothing parameters for the holt multiplicative damped trend which will be our final model.
Hou=ts(house$Value[1:181], start=2000, frequency = 12)
final.model = holt(Hou,exponential=TRUE, damped=TRUE, h =12)
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
Holt-Winters linear trend with additive seasonality")
x | |
---|---|
alpha | 0.8081143 |
beta | 0.0001000 |
phi | 0.8906856 |
To conclude, we looked at 180 of the most recent observations for a time series that shows the number of homes bought in the United States on a monthly basis. The years we looked at were from late 2009 to 2024. We split the data into training and test data. We fitted 8 different models with the test data, a SES model, a Holt linear model, a Holt additive damped model, a Holt multiplicative damped model, a Holt-Winter’s additive model, a Holt-Winter’s multiplicative damped model, a Holt-Winters additive damped model, and a Holt-Winters multiplicative damped model. We built an accuracy table and found that the Holt multiplicative damped model performed the best. We built an accuracy table without the training data and still got the same results.