In this dataset we have weather observations from Houston, Texas from 1947-2014. Some variables included in this dataset include mean temperature, total precipitation, and the highest temperature. In this specific time series analysis we will look at the most recent 150 observations which comes from around 2002-2014. We will be looking at only one variable (meantemp) to see the trends around this given time.
The objective of this analysis is to use time series modeling to draw conclusions on the mean temperature for the next year of this data set
weather <- read.csv("https://raw.githubusercontent.com/TylerBattaglini/STA-321/refs/heads/main/houston_weather.csv")
weather_subset <- tail(weather, 150)
train.weath = weather_subset$meanTemp[1:138]
test.weath = weather_subset$meanTemp[139:150]
weath = ts(train.weath, start=c(2000, 1), frequency=12)
fit1 = ses(weath, h=12)
fit2 = holt(weath, initial="optimal", h=12)
fit3 = holt(weath, damped=TRUE, h=12)
fit4 = holt(weath, exponential=TRUE, damped=TRUE, h=12)
fit5 = hw(weath, h=12, seasonal="additive")
fit6 = hw(weath, h=12, seasonal="multiplicative")
fit7 = hw(weath, h=12, seasonal="additive", damped=TRUE)
fit8 = hw(weath, h=12, seasonal="multiplicative", damped=TRUE)
We focus on the last 150 observations of the data set to reduce the chance of overfitting and to get the most recent observations of this dataset. We then split our data into a test and a training set. 12 observations in the test set and the remaining 138 in the training set. The training data is then converted into a time series object with monthly frequency of 12 because we are looking at the years. We then fit 8 different models to see which best fits our data.
We get many accuracy values to see which model is best for our data.
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
"HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models
based on the training data")
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
SES | -0.2210 | 6.3659 | 5.3461 | -0.7799 | 7.8880 | 2.0777 | 0.6116 |
Holt Linear | -0.0587 | 5.3786 | 4.2739 | 0.5114 | 6.3078 | 1.6610 | 0.0460 |
Holt Add. Damped | -0.0907 | 5.0365 | 4.0451 | 0.2195 | 5.9348 | 1.5721 | 0.0545 |
Holt Exp. Damped | -0.5602 | 5.1376 | 4.1367 | -0.4293 | 6.0188 | 1.6077 | 0.0352 |
HW Add. | -0.0498 | 2.2572 | 1.8184 | -0.2023 | 2.7012 | 0.7067 | 0.1684 |
HW Exp. | -0.1411 | 2.3253 | 1.7499 | -0.3550 | 2.6720 | 0.6801 | 0.3866 |
HW Add. Damp | -0.0747 | 2.2400 | 1.8105 | -0.2256 | 2.6920 | 0.7037 | 0.1563 |
HW Exp. Damp | -0.0829 | 2.3408 | 1.8635 | -0.2395 | 2.7574 | 0.7242 | 0.1470 |
We see from our output above that HW Add. is the best fit for our data. We see that HW Add. has one of the lowest values for ME, RSME, MAE, MPE, MAPE, and MASE. For ACF1 we see that it is in the middle compared to the other values but it is still a relatively low value meaning that residual autocrrelation is not a concern.
We are visualizing the original time series data and forecast results from different exponential smoothing models. We create two models one for non-seasonal smoothing models and another for seasonal smoothing models.
par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 139:150
plot(1:138, train.weath, lwd=2,type="o", ylab="MeanTemp", xlab="",
xlim=c(1,160), ylim=c(30, 90), cex=0.3,
main="Non-seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"),
cex = 0.7, bty="n")
plot(1:138, train.weath, lwd=2,type="o", ylab="Mean Temp", xlab="",
xlim=c(1,160), ylim=c(30, 90), cex=0.3,
main="Holt-Winterd Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
Case study: Comparing various exponential smoothing models.
We see from the above accuracy table that HW’s linear trend with an additive seasonal model is the best fit our of eight smoothing models. This is also matches with our previous conclusion that HW Add is the best model for our data.
We just visualized our training and test data and they showed us the same result in that HW Add is the appropriate model for these two data sets. To make a real life forecast of our data we need to use the whole dataset with all 150 observations. We refit our data to update the smoothing parameters in our final model.
acc.fun = function(test.data, mod.obj){
PE=100*(test.data-mod.obj$mean)/mod.obj$mean
MAPE = mean(abs(PE))
###
E=test.data-mod.obj$mean
MSE=mean(E^2)
###
accuracy.metric=c(MSE=MSE, MAPE=MAPE)
accuracy.metric
}
pred.accuracy = rbind(SES =acc.fun(test.data=test.weath, mod.obj=fit1),
Holt.Add =acc.fun(test.data=test.weath, mod.obj=fit2),
Holt.Add.Damp =acc.fun(test.data=test.weath, mod.obj=fit3),
Holt.Exp =acc.fun(test.data=test.weath, mod.obj=fit4),
HW.Add =acc.fun(test.data=test.weath, mod.obj=fit5),
HW.Exp =acc.fun(test.data=test.weath, mod.obj=fit6),
HW.Add.Damp =acc.fun(test.data=test.weath, mod.obj=fit7),
HW.Exp.Damp =acc.fun(test.data=test.weath, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models
based on the testing data")
MSE | MAPE | |
---|---|---|
SES | 420.027941 | 32.430511 |
Holt.Add | 5921.842465 | 607.491932 |
Holt.Add.Damp | 1759.165800 | 137.916182 |
Holt.Exp | 1768.349891 | 139.113295 |
HW.Add | 5.111103 | 2.972308 |
HW.Exp | 8.833668 | 3.593340 |
HW.Add.Damp | 4.973422 | 2.981870 |
HW.Exp.Damp | 7.161286 | 3.597887 |
We see from the above output that HW Add is again the best model which is consistent to our conclusions made previously in the training and test data. HW Add is the best model because MSE and MAPE are both the lowest values out of the models with MSE = 5.111 and MAPE = 2.972.
Here we use the whole data set and refit it using smoothing parameters.
weather <- read.csv("https://raw.githubusercontent.com/TylerBattaglini/STA-321/refs/heads/main/houston_weather.csv")
weath=ts(weather$meanTemp[1:150], start=2000, frequency = 12)
final.model = hw(weath,h=12, seasonal="additive")
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
Holt-Winters linear trend with additive seasonality")
x | |
---|---|
alpha | 0.1305680 |
beta | 0.0001000 |
gamma | 0.0001006 |
We see from the output above that a value of 0.131 for our alpha indicates that the model places moderate weight on recent observations when updating the level. For Beta, .00001, this value close to zero doesn’t give importance to updating the trend based on recent observations. For Gamma, .0001, with a value relatively close to zero it shows that seasonality patterns are stable and will likely not change.
For this Houston Weather data set from 2002-2014 representing mean temperature the best model we can use is HW’s linear trend using with an additive seasonal model. We determined this by splitting our data into two sets, a training and test data set and from these two sets we gathered accuracy measures and plots and they all came to the same conclusion, that HW ADD is the best model. We then used the additive model for our full dataset and got alpha, beta, and gamma values. Our alpha value of .13 shows that the model moderately adjusts the level based on recent data. The Beta value of nearly zero shows us there is little to no trend detected in the series. This means that the mean temperature does not have a significant long-term upward or downward trend which we would expect of temperature. Our Gamma value of nearly zero demonstrates that the seasonality trend does not change over this interval.