This dataset contains the monthly sales of Perrin Freres champagne, in millions of dollars, from January 1964 through September 1972.
library(forecast)   # ses(), holt(), hw(), forecast(), accuracy()
library(knitr)      # kable() for formatted tables
champagne = na.omit(read.csv("C:/Users/qinfa/Desktop/school/STA 321/champagne.csv"))
We will hold back the last 12 observations for testing our final model. First, we create the time series object from the full data and plot it.
champagne.ts = ts(champagne[,2], start = c(1964, 1), frequency = 12)
par(mar=c(2,2,2,2))
plot(champagne.ts, main="Champagne Sales from January 1964 to September 1972 (Millions of Dollars)", ylab="Monthly Sales", xlab="")
This dataset is monthly, and the time series plot shows annual seasonality. At first glance, the seasonal variation appears to increase with the level of the series, which suggests a multiplicative time series, along with a damped trend.
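For reference, a decomposition can be written additively as $Y_t = T_t + S_t + E_t$ or multiplicatively as $Y_t = T_t \times S_t \times E_t$, where $T_t$, $S_t$, and $E_t$ denote the trend, seasonal, and irregular components (notation introduced here for clarity). The multiplicative form is appropriate when the seasonal swings grow with the level of the series, as they appear to do here.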
We will use methods of decomposition and exponential smoothing to create forecasting models.
We begin with a look at classical and STL decomposition methods.
Classical Decomposition:
cls.decomp = decompose(champagne.ts, "multiplicative")
par(mar=c(2,2,2,2))
plot(cls.decomp)
STL Decomposition:
stl.decomp=stl(champagne.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(stl.decomp)
STL uses LOESS, a non-parametric smoothing method that can more
accurately estimate a nonlinear trend. We will use it to decompose this
time series and forecast future values.
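As a quick illustration (a sketch, not part of the original code), the estimated components of an STL fit can be pulled from the `time.series` matrix of the returned object; the names `trend.est` and `seasonal.est` below are ours.
## Sketch: extract the fitted components from the STL object above
head(stl.decomp$time.series)                     # columns: seasonal, trend, remainder
trend.est    = stl.decomp$time.series[, "trend"]
seasonal.est = stl.decomp$time.series[, "seasonal"]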
Holding out the last 12 periods for testing, we use the rest of the historical data to train the forecast models. We compare training sets of four different sizes (93, 71, 48, and 25 observations), each evaluated against the same 12-observation test set.
ini.data = champagne[,2]
n0 = length(ini.data)
##
train.data01 = champagne[1:(n0-12), 2]
train.data02 = champagne[23:(n0-12), 2]
train.data03 = champagne[46:(n0-12), 2]
train.data04 = champagne[69:(n0-12), 2]
## last 12 observations (test set)
test.data = champagne[(n0-11):n0,2]
##
train01.ts = ts(train.data01, frequency = 12, start = c(1964, 1))
train02.ts = ts(train.data02, frequency = 12, start = c(1965, 11))
train03.ts = ts(train.data03, frequency = 12, start = c(1967, 10))
train04.ts = ts(train.data04, frequency = 12, start = c(1969, 9))
##
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## Forecast with decomposing
fcst01 = forecast(stl01,h=12, method="naive")
fcst02 = forecast(stl02,h=12, method="naive")
fcst03 = forecast(stl03,h=12, method="naive")
fcst04 = forecast(stl04,h=12, method="naive")
Next, we perform error analysis:
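For clarity, with $y_t$ the observed test value and $\hat{y}_t$ the corresponding forecast, the two measures computed below are $\text{MAPE} = \frac{1}{12}\sum_{t=1}^{12} \left| (y_t - \hat{y}_t)/\hat{y}_t \right|$ and $\text{MSE} = \frac{1}{12}\sum_{t=1}^{12} (y_t - \hat{y}_t)^2$ (notation added here); note that, following the code, the percentage error is taken relative to the forecast rather than the observed value.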
## Percentage errors are computed relative to the forecast and left as proportions (not multiplied by 100)
PE01=(test.data-fcst01$mean)/fcst01$mean
PE02=(test.data-fcst02$mean)/fcst02$mean
PE03=(test.data-fcst03$mean)/fcst03$mean
PE04=(test.data-fcst04$mean)/fcst04$mean
###
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
###
E1=test.data-fcst01$mean
E2=test.data-fcst02$mean
E3=test.data-fcst03$mean
E4=test.data-fcst04$mean
##
MSE1=mean(E1^2)
MSE2=mean(E2^2)
MSE3=mean(E3^2)
MSE4=mean(E4^2)
###
MSE=c(MSE1, MSE2, MSE3, MSE4)
MAPE=c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy=cbind(MSE=MSE, MAPE=MAPE)
row.names(accuracy)=c("n=93", "n=71", "n=48", "n=25")
kable(accuracy, caption="Error comparison between forecast results with different sample sizes")
|      |      MSE |      MAPE |
|------|---------:|----------:|
| n=93 | 394518.8 | 0.1155051 |
| n=71 | 298639.9 | 0.1034382 |
| n=48 | 226170.9 | 0.0830158 |
| n=25 | 402331.3 | 0.0827651 |
## Plot MSE and MAPE across the four training sizes. MAPE is on a 0-1 scale,
## so its curve lies essentially along the horizontal axis at this scale.
plot(1:4, MSE, type="b", col="darkred", ylab="Error", xlab="",
ylim=c(0, 450000), xlim=c(0.5, 4.5), main="Error Curves of STL Forecasting Models", axes=FALSE)
labs=c("n=93", "n=71", "n=48", "n=25")
axis(1, at=1:4, labels=labs, pos=0)
axis(2)
lines(1:4, MAPE, type="b", col="blue")
text(1:4, MAPE+20000, as.character(round(MAPE,4)), col="blue", cex=0.7)
text(1:4, MSE-20000, as.character(round(MSE,4)), col="darkred", cex=0.7)
legend(1.5, 250000, c("MSE", "MAPE"), col=c("darkred","blue"), lty=1, bty="n", cex=0.7)
As seen in the MSE error curve, the training size of 48 yields the best performance for forecasting with STL decomposition, with a mean squared error of 226170.9 (the n = 25 set attains a marginally lower MAPE). We continue with forecasting by exponential smoothing.
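As an optional illustration (a sketch, not part of the original analysis), the best-performing STL model can be refit and its 12-month forecast plotted against the held-out observations; `best.stl` and `best.fcst` are illustrative names.
## Sketch: refit the n = 48 STL model and overlay its forecast on the test data
best.stl  = stl(train03.ts, s.window = 12)
best.fcst = forecast(best.stl, h = 12, method = "naive")
plot(best.fcst, main = "STL Forecast (Training Size 48) vs. Held-out Data")
lines(ts(test.data, start = c(1971, 10), frequency = 12), type = "o", col = "red", cex = 0.5)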
We begin by splitting our dataset into training and test sets, once again holding back the last 12 observations for testing our forecasting models.
train.champagne <- champagne[,2][1:93]
test.champagne <- champagne[,2][94:105]
champagne.ts = ts(champagne[,2][1:93], start = c(1964, 1), frequency = 12)  ## note: champagne.ts now holds only the 93 training observations
For our smoothing models, we will fit the following: simple exponential smoothing, Holt linear, Holt additive damped, Holt exponential damped, Holt-Winters additive, Holt-Winters exponential (multiplicative), Holt-Winters additive damped, and Holt-Winters exponential damped. Looking at our original graph of the time series, we observe what appears to be a damped trend with multiplicative annual seasonality in the data.
We create the models below:
fit1 = ses(champagne.ts, h=12)
fit2 = holt(champagne.ts, initial="optimal", h=12)
fit3 = holt(champagne.ts,damped=TRUE, h=12 )
fit4 = holt(champagne.ts,exponential=TRUE, damped=TRUE, h =12)
fit5 = hw(champagne.ts,h=12, seasonal="additive")
fit6 = hw(champagne.ts,h=12, seasonal="multiplicative")
fit7 = hw(champagne.ts,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(champagne.ts,h=12, seasonal="multiplicative",damped=TRUE)
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
"HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models
based on the training data")
|                  |        ME |      RMSE |       MAE |      MPE |    MAPE |   MASE |    ACF1 |
|------------------|----------:|----------:|----------:|---------:|--------:|-------:|--------:|
| SES              |  275.3871 | 2454.6724 | 1666.6698 | -15.8673 | 38.9449 | 2.5287 |  0.4216 |
| Holt Linear      | -246.1259 | 2392.7121 | 1756.0344 | -28.4341 | 44.8335 | 2.6643 |  0.4164 |
| Holt Add. Damped |   -1.8720 | 2530.9036 | 1730.2236 | -17.8534 | 44.4789 | 2.6251 |  0.0692 |
| Holt Exp. Damped |   21.0960 | 2531.5996 | 1734.1817 | -17.2256 | 44.4184 | 2.6311 |  0.0680 |
| HW Add.          |   -3.6676 |  779.8375 |  554.0357 |  -3.1925 | 13.8429 | 0.8406 |  0.2567 |
| HW Exp.          |  -55.8700 |  557.1406 |  392.6629 |  -2.8941 |  9.8574 | 0.5958 |  0.2123 |
| HW Add. Damp     |    7.6758 |  786.7418 |  557.9193 |  -3.0582 | 13.7124 | 0.8465 |  0.2598 |
| HW Exp. Damp     |   25.2781 |  569.8484 |  416.6006 |  -1.3267 | 10.3711 | 0.6321 | -0.0469 |
Based on the training data, the Holt-Winters exponential (multiplicative) model is the most appropriate, with the lowest RMSE, MAE, and MAPE, followed by the HW exponential damped model.
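As an optional diagnostic (not part of the original write-up), the residuals of the leading model can be examined with `checkresiduals()` from the forecast package, which plots the residuals, their ACF, and a histogram, and reports a Ljung-Box test for remaining autocorrelation.
## Sketch: residual diagnostics for the HW multiplicative model
checkresiduals(fit6)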
We continue by graphing all of the models.
par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 94:105
plot(1:93, train.champagne, lwd=2,type="o", ylab="Champagne Sales", xlab="",
xlim=c(1, 140), ylim=c(1950, 14000), cex=0.5,
main="Non-seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"),
cex = 0.7, bty="n")
###########
plot(1:93, train.champagne, lwd=2,type="o", ylab="Champagne Sales", xlab="",
xlim=c(1, 140), ylim=c(1950, 14000), cex=0.5,
main="Holt-Winters Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
From the graphs above, we can see that the Holt-Winters seasonal smoothing models performed much better than the non-seasonal models. Taking a closer look:
par(mfrow=c(1,1), mar=c(3,4,3,1))
plot(1:93, train.champagne, lwd=2,type="o", ylab="Champagne Sales", xlab="",
xlim=c(1, 140), ylim=c(1950, 14000), cex=0.5,
main="Holt-Winters Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
Compared to our original plot, the Holt-Winters additive damped model appears to track the series best, which differs from what the training-data accuracy measures indicated. We check the accuracy of the models against the test data below.
acc.fun = function(test.data, mod.obj){
## percentage error relative to the forecast, reported in percent
## (unlike the STL comparison above, MAPE here is multiplied by 100)
PE=100*(test.data-mod.obj$mean)/mod.obj$mean
MAPE = mean(abs(PE))
###
E=test.data-mod.obj$mean
MSE=mean(E^2)
###
accuracy.metric=c(MSE=MSE, MAPE=MAPE)
accuracy.metric
}
pred.accuracy = rbind(SES =acc.fun(test.data=test.champagne, mod.obj=fit1),
Holt.Linear =acc.fun(test.data=test.champagne, mod.obj=fit2),
Holt.Add.Damp =acc.fun(test.data=test.champagne, mod.obj=fit3),
Holt.Exp.Damp =acc.fun(test.data=test.champagne, mod.obj=fit4),
HW.Add =acc.fun(test.data=test.champagne, mod.obj=fit5),
HW.Exp =acc.fun(test.data=test.champagne, mod.obj=fit6),
HW.Add.Damp =acc.fun(test.data=test.champagne, mod.obj=fit7),
HW.Exp.Damp =acc.fun(test.data=test.champagne, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models
based on the testing data")
|               |       MSE |      MAPE |
|---------------|----------:|----------:|
| SES           | 8519347.8 | 38.164400 |
| Holt.Linear   | 8191986.4 | 37.339994 |
| Holt.Add.Damp | 8442531.2 | 37.844518 |
| Holt.Exp.Damp | 8421314.9 | 37.809241 |
| HW.Add        |  177469.9 |  6.963315 |
| HW.Exp        |  108336.1 |  6.239144 |
| HW.Add.Damp   |  128425.8 |  6.382174 |
| HW.Exp.Damp   |  115041.5 |  6.621129 |
The table shows that the HW exponential model has the lowest error measures on the test data and is therefore the best of the eight exponential smoothing models. Its test MSE of 108336.1 is also much smaller than that of the STL decomposition forecast above (226170.9).
We construct the final Holt-Winters exponential (multiplicative) model by refitting it to the entire dataset.
ts.champagne = ts(champagne[,2], start = c(1964, 1), frequency = 12)
final.model = hw(ts.champagne, h=12, seasonal="multiplicative")
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
Holt-Winters multiplicative model")
|       |         x |
|-------|----------:|
| alpha | 0.0546574 |
| beta  | 0.0001003 |
| gamma | 0.0001008 |
This gives the estimated smoothing parameters of our final forecasting model for the monthly sales of Perrin Freres champagne, in millions of dollars, starting from January 1964.
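As a final sketch (assuming the refit `final.model` above), the 12-month-ahead point forecasts and prediction intervals can be extracted and plotted directly from the fitted object.
## Sketch: plot the final forecast with prediction intervals
plot(final.model, main = "Final Holt-Winters Multiplicative Forecast")
final.model$mean   # monthly point forecasts for the 12 months after September 1972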