1 Introduction

The following data is from https://www.census.gov. The data set contains the number of homes sold (by the thousands) in the United States from the years 1963 to 2024. Each year contains 12 observations for each month of the year with the exception of 2024 that only has 9 observations. There are a total of 741 observations. For this analysis, we will only use the 180 most recent observations. With this time series, we will build several smoothing models to see which one performs the best.

1.1 Variable Description

  • Period - The year and month the number of houses sold was recorded.
  • Value - The number of houses bought in the United states in thousands. (Example: 400 = 400,000 houses sold)

1.2 Practical Question

Which smoothing model will be the best fit for our time series?

2 Exploratory Analysis

First, the data is downloaded. Next, the data is cut down to the 180 most recent observations. this means we are only using observations from the years 2010 to 2024.We also modify the Vale variable so it does not contain commas. There appears to be no missing values in the data.

data.house <- read.csv("https://raw.githubusercontent.com/AvaDeSt/STA-321/refs/heads/main/Book%203(Sheet1).csv", header = TRUE)
n.row = dim(data.house)[1]
house = data.house[(n.row-180):n.row, ]
          
house$Value = as.numeric(gsub(",", "", house$Value))

3 Model Fitting

First, we take the last most recent twelve observations as test data to identify the best model, we will also build an accuracy table based on the training data

test.house = house$Value[170:181]
train.hou = house$Value[1:169]
hou=ts(house$Value[1:169], start=2000, frequency = 12)
fit1 = ses(hou, h=12)
fit2 = holt(hou, initial="optimal", h=12)             ## optimal alpha and beta
fit3 = holt(hou,damped=TRUE, h=12)                   ## additive damping
fit4 = holt(hou,exponential=TRUE, damped=TRUE, h =12) ## multiplicative damp
fit5 = hw(hou,h=12, seasonal="additive")              ## default h = 10
fit6 = hw(hou,h=12, seasonal="multiplicative")
fit7 = hw(hou,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(hou,h=12, seasonal="multiplicative",damped=TRUE)

accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
                             accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
                            "HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models 
      based on the training data")
The accuracy measures of various exponential smoothing models based on the training data
ME RMSE MAE MPE MAPE MASE ACF1
SES 2.1676 48.0616 34.8290 0.0145 6.2117 0.4022 0.0001
Holt Linear -0.4309 48.0610 34.9218 -0.5012 6.2543 0.4033 -0.0389
Holt Add. Damped 2.5414 48.0301 34.7696 0.1272 6.1870 0.4015 -0.0006
Holt Exp. Damped 2.8592 48.0063 34.6673 0.2181 6.1540 0.4003 -0.0011
HW Add. -0.4703 47.9056 35.1570 -0.5140 6.3216 0.4060 0.0019
HW Exp. 1.7871 53.5761 40.0047 -0.0539 7.2444 0.4620 0.0266
HW Add. Damp 2.3200 47.9329 35.3732 0.0539 6.3355 0.4085 -0.0017
HW Exp. Damp 2.2896 48.0815 35.0745 0.0642 6.2895 0.4050 -0.0280

According to the table, it looks like Holt exponential damped model performs the best with the lowest MAE, MAP, and MASE. It also has an ACF1 close to 0.

Next, we are going to build two tables to display these models. The first one will contain the Holt non seasonal smoothing models. The second one will contain the Holt Winters seasonal models.

par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 170:181
plot(1:169, train.hou, lwd=2,type="o", ylab="House", xlab="", 
     xlim=c(1,181), ylim=c(200, 1200), cex=0.3,
     main="Non-seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
   c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"), 
   cex = 0.7, bty="n")
###########
plot(1:169, train.hou, lwd=2,type="o", ylab="House", xlab="", 
     xlim=c(1,181), ylim=c(200, 1200), cex=0.3,
     main="Holt-Winterd Teend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
   c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"), 
   cex = 0.7, bty="n")
Case study: Comparing various exponential smoothing models.

Case study: Comparing various exponential smoothing models.

We saw from the table that a Holt multiplicative damped model performed the best. The graph emulates this as it has an overall damped shape that is also multiplicative. Now, we are going to refit the model using the entire data set to update the smoothing parameters for the final model.

acc.fun = function(test.data, mod.obj){
  PE=100*(test.data-mod.obj$mean)/mod.obj$mean
  MAPE = mean(abs(PE))
  ###
  E=test.data-mod.obj$mean
  MSE=mean(E^2)
  ###
  accuracy.metric=c(MSE=MSE, MAPE=MAPE)
  accuracy.metric
}

pred.accuracy = rbind(SES =acc.fun(test.data=test.house, mod.obj=fit1),
                      Holt.Add =acc.fun(test.data=test.house, mod.obj=fit2),
                      Holt.Add.Damp =acc.fun(test.data=test.house, mod.obj=fit3),
                      Holt.Exp =acc.fun(test.data=test.house, mod.obj=fit4),
                      HW.Add =acc.fun(test.data=test.house, mod.obj=fit5),
                      HW.Exp =acc.fun(test.data=test.house, mod.obj=fit6),
                      HW.Add.Damp =acc.fun(test.data=test.house, mod.obj=fit7),
                      HW.Exp.Damp =acc.fun(test.data=test.house, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models 
      based on the testing data")
The accuracy measures of various exponential smoothing models based on the testing data
MSE MAPE
SES 1426.891 4.707724
Holt.Add 1499.462 4.731439
Holt.Add.Damp 1426.201 4.705750
Holt.Exp 1425.989 4.705398
HW.Add 1911.521 5.342191
HW.Exp 5311.508 8.032114
HW.Add.Damp 1744.384 5.163332
HW.Exp.Damp 2366.812 6.044176

We can see from the accuracy table above that the Holt exponential damped model with the lowest MSE and MAPE. This is consistent with what we saw in the accuracy table with the training data which is a good sign. Next, we will calculate the three smoothing parameters for the holt multiplicative damped trend which will be our final model.

Hou=ts(house$Value[1:181], start=2000, frequency = 12)
final.model = holt(Hou,exponential=TRUE, damped=TRUE, h =12) 
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
      Holt-Winters linear trend with additive seasonality")
Estimated values of the smoothing parameters in Holt-Winters linear trend with additive seasonality
x
alpha 0.8081143
beta 0.0001000
phi 0.8906856

4 Summary and Conclusion

To conclude, we looked at 180 of the most recent observations for a time series that shows the number of homes bought in the United States on a monthly basis. The years we looked at were from late 2009 to 2024. We split the data into training and test data. We fitted 8 different models with the test data, a SES model, a Holt linear model, a Holt additive damped model, a Holt multiplicative damped model, a Holt-Winter’s additive model, a Holt-Winter’s multiplicative damped model, a Holt-Winters additive damped model, and a Holt-Winters multiplicative damped model. We built an accuracy table and found that the Holt multiplicative damped model performed the best. We built an accuracy table without the training data and still got the same results.