This dataset contains the monthly sales of Perrin Freres champagne, in millions of dollars, from January 1964 through September 1972.
library(forecast)   # ses(), holt(), hw(), forecast(), accuracy()
library(knitr)      # kable() for formatted tables
champagne = na.omit(read.csv("C:/Users/qinfa/Desktop/school/STA 321/champagne.csv"))
We will hold back the last 12 observations for testing our final model. First, we create the time series object from the full data and plot it.
champagne.ts = ts(champagne[,2], start = c(1964, 1), frequency = 12)
par(mar=c(2,2,2,2))
plot(champagne.ts, main="Champagne Sales from January 1964 to September 1972 (Millions of Dollars)", ylab="Monthly Sales", xlab="")
This dataset is monthly, and the time series plot shows annual seasonality. At first glance, the seasonal variation appears to increase with the level of the series, which suggests a multiplicative time series, along with a damped trend.
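For reference, a decomposition can be written additively as $Y_t = T_t + S_t + E_t$ or multiplicatively as $Y_t = T_t \times S_t \times E_t$, where $T_t$, $S_t$, and $E_t$ denote the trend, seasonal, and irregular components (notation introduced here for clarity). The multiplicative form is appropriate when the seasonal swings grow with the level of the series, as they appear to do here.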
We will use methods of decomposition and exponential smoothing to create forecasting models.
We begin with a look at classical and STL decomposition methods.
Classical Decomposition:
cls.decomp = decompose(champagne.ts, "multiplicative")
par(mar=c(2,2,2,2))
plot(cls.decomp)
STL Decomposition:
stl.decomp=stl(champagne.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(stl.decomp)
STL uses LOESS, a non-parametric smoothing method that can more
accurately estimate a nonlinear trend. We will use it to decompose this
time series and forecast future values.
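As a quick illustration (a sketch, not part of the original code), the estimated components of an STL fit can be pulled from the `time.series` matrix of the returned object; the names `trend.est` and `seasonal.est` below are ours.
## Sketch: extract the fitted components from the STL object above
head(stl.decomp$time.series)                     # columns: seasonal, trend, remainder
trend.est    = stl.decomp$time.series[, "trend"]
seasonal.est = stl.decomp$time.series[, "seasonal"]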
Holding out the last 12 periods for testing, we use the rest of the historical data to train the forecast models. We compare training sets of four different sizes (93, 71, 48, and 25 observations), each evaluated against the same 12-observation test set.
ini.data = champagne[,2]
n0 = length(ini.data)
##
train.data01 = champagne[1:(n0-12), 2]
train.data02 = champagne[23:(n0-12), 2]
train.data03 = champagne[46:(n0-12), 2]
train.data04 = champagne[69:(n0-12), 2]
## last 12 observations (test set)
test.data = champagne[(n0-11):n0,2]
##
train01.ts = ts(train.data01, frequency = 12, start = c(1964, 1))
train02.ts = ts(train.data02, frequency = 12, start = c(1965, 11))
train03.ts = ts(train.data03, frequency = 12, start = c(1967, 10))
train04.ts = ts(train.data04, frequency = 12, start = c(1969, 9))
##
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## Forecast with decomposing
fcst01 = forecast(stl01,h=12, method="naive")
fcst02 = forecast(stl02,h=12, method="naive")
fcst03 = forecast(stl03,h=12, method="naive")
fcst04 = forecast(stl04,h=12, method="naive")
Next, we perform error analysis:
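For clarity, with $y_t$ the observed test value and $\hat{y}_t$ the corresponding forecast, the two measures computed below are $\text{MAPE} = \frac{1}{12}\sum_{t=1}^{12} \left| (y_t - \hat{y}_t)/\hat{y}_t \right|$ and $\text{MSE} = \frac{1}{12}\sum_{t=1}^{12} (y_t - \hat{y}_t)^2$ (notation added here); note that, following the code, the percentage error is taken relative to the forecast rather than the observed value.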
## Percentage errors are computed relative to the forecast and left as proportions (not multiplied by 100)
PE01=(test.data-fcst01$mean)/fcst01$mean
PE02=(test.data-fcst02$mean)/fcst02$mean
PE03=(test.data-fcst03$mean)/fcst03$mean
PE04=(test.data-fcst04$mean)/fcst04$mean
###
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
###
E1=test.data-fcst01$mean
E2=test.data-fcst02$mean
E3=test.data-fcst03$mean
E4=test.data-fcst04$mean
##
MSE1=mean(E1^2)
MSE2=mean(E2^2)
MSE3=mean(E3^2)
MSE4=mean(E4^2)
###
MSE=c(MSE1, MSE2, MSE3, MSE4)
MAPE=c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy=cbind(MSE=MSE, MAPE=MAPE)
row.names(accuracy)=c("n=93", "n=71", "n=48", "n=25")
kable(accuracy, caption="Error comparison between forecast results with different sample sizes")
|      |      MSE |      MAPE |
|------|---------:|----------:|
| n=93 | 394518.8 | 0.1155051 |
| n=71 | 298639.9 | 0.1034382 |
| n=48 | 226170.9 | 0.0830158 |
| n=25 | 402331.3 | 0.0827651 |
## Plot MSE and MAPE across the four training sizes. MAPE is on a 0-1 scale,
## so its curve lies essentially along the horizontal axis at this scale.
plot(1:4, MSE, type="b", col="darkred", ylab="Error", xlab="",
ylim=c(0, 450000), xlim=c(0.5, 4.5), main="Error Curves of STL Forecasting Models", axes=FALSE)
labs=c("n=93", "n=71", "n=48", "n=25")
axis(1, at=1:4, labels=labs, pos=0)
axis(2)
lines(1:4, MAPE, type="b", col="blue")
text(1:4, MAPE+20000, as.character(round(MAPE,4)), col="blue", cex=0.7)
text(1:4, MSE-20000, as.character(round(MSE,4)), col="darkred", cex=0.7)
legend(1.5, 250000, c("MSE", "MAPE"), col=c("darkred","blue"), lty=1, bty="n", cex=0.7)
As seen in the MSE error curve, the training size of 48 yields the best performance for forecasting with STL decomposition, with a mean squared error of 226170.9 (the n = 25 set attains a marginally lower MAPE). We continue with forecasting by exponential smoothing.
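As an optional illustration (a sketch, not part of the original analysis), the best-performing STL model can be refit and its 12-month forecast plotted against the held-out observations; `best.stl` and `best.fcst` are illustrative names.
## Sketch: refit the n = 48 STL model and overlay its forecast on the test data
best.stl  = stl(train03.ts, s.window = 12)
best.fcst = forecast(best.stl, h = 12, method = "naive")
plot(best.fcst, main = "STL Forecast (Training Size 48) vs. Held-out Data")
lines(ts(test.data, start = c(1971, 10), frequency = 12), type = "o", col = "red", cex = 0.5)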
We begin by splitting our dataset into training and test sets, once again holding back the last 12 observations for testing our forecasting models.
train.champagne <- champagne[,2][1:93]
test.champagne <- champagne[,2][94:105]
champagne.ts = ts(champagne[,2][1:93], start = c(1964, 1), frequency = 12)  ## note: champagne.ts now holds only the 93 training observations
For our smoothing models, we will fit the following: simple exponential smoothing, Holt linear, Holt additive damped, Holt exponential damped, Holt-Winters additive, Holt-Winters exponential (multiplicative), Holt-Winters additive damped, and Holt-Winters exponential damped. Looking at our original graph of the time series, we observe what appears to be a damped trend with multiplicative annual seasonality in the data.
We create the models below:
fit1 = ses(champagne.ts, h=12)
fit2 = holt(champagne.ts, initial="optimal", h=12)
fit3 = holt(champagne.ts,damped=TRUE, h=12 )
fit4 = holt(champagne.ts,exponential=TRUE, damped=TRUE, h =12)
fit5 = hw(champagne.ts,h=12, seasonal="additive")
fit6 = hw(champagne.ts,h=12, seasonal="multiplicative")
fit7 = hw(champagne.ts,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(champagne.ts,h=12, seasonal="multiplicative",damped=TRUE)
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
"HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models
based on the training data")
|                  |        ME |      RMSE |       MAE |      MPE |    MAPE |   MASE |    ACF1 |
|------------------|----------:|----------:|----------:|---------:|--------:|-------:|--------:|
| SES              |  275.3871 | 2454.6724 | 1666.6698 | -15.8673 | 38.9449 | 2.5287 |  0.4216 |
| Holt Linear      | -246.1259 | 2392.7121 | 1756.0344 | -28.4341 | 44.8335 | 2.6643 |  0.4164 |
| Holt Add. Damped |   -1.8720 | 2530.9036 | 1730.2236 | -17.8534 | 44.4789 | 2.6251 |  0.0692 |
| Holt Exp. Damped |   21.0960 | 2531.5996 | 1734.1817 | -17.2256 | 44.4184 | 2.6311 |  0.0680 |
| HW Add.          |   -3.6676 |  779.8375 |  554.0357 |  -3.1925 | 13.8429 | 0.8406 |  0.2567 |
| HW Exp.          |  -55.8700 |  557.1406 |  392.6629 |  -2.8941 |  9.8574 | 0.5958 |  0.2123 |
| HW Add. Damp     |    7.6758 |  786.7418 |  557.9193 |  -3.0582 | 13.7124 | 0.8465 |  0.2598 |
| HW Exp. Damp     |   25.2781 |  569.8484 |  416.6006 |  -1.3267 | 10.3711 | 0.6321 | -0.0469 |
Based on the training data, the Holt-Winters exponential (multiplicative) model is the most appropriate, with the lowest RMSE, MAE, and MAPE, followed by the HW exponential damped model.
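As an optional diagnostic (not part of the original write-up), the residuals of the leading model can be examined with `checkresiduals()` from the forecast package, which plots the residuals, their ACF, and a histogram, and reports a Ljung-Box test for remaining autocorrelation.
## Sketch: residual diagnostics for the HW multiplicative model
checkresiduals(fit6)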
We continue by graphing all of the models.
par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 94:105
plot(1:93, train.champagne, lwd=2,type="o", ylab="Champagne Sales", xlab="",
xlim=c(1, 140), ylim=c(1950, 14000), cex=0.5,
main="Non-seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"),
cex = 0.7, bty="n")
###########
plot(1:93, train.champagne, lwd=2,type="o", ylab="Champagne Sales", xlab="",
xlim=c(1, 140), ylim=c(1950, 14000), cex=0.5,
main="Holt-Winters Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
From the graphs above, we can see that the Holt-Winters seasonal smoothing models performed much better than the non-seasonal models. Taking a closer look:
par(mfrow=c(1,1), mar=c(3,4,3,1))
plot(1:93, train.champagne, lwd=2,type="o", ylab="Champagne Sales", xlab="",
xlim=c(1, 140), ylim=c(1950, 14000), cex=0.5,
main="Holt-Winters Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"),
cex = 0.7, bty="n")
Compared to our original plot, the Holt-Winters additive damped model appears to track the series best, which differs from what the training-data accuracy measures indicated. We check the accuracy of the models against the test data below.
acc.fun = function(test.data, mod.obj){
## percentage error relative to the forecast, reported in percent
## (unlike the STL comparison above, MAPE here is multiplied by 100)
PE=100*(test.data-mod.obj$mean)/mod.obj$mean
MAPE = mean(abs(PE))
###
E=test.data-mod.obj$mean
MSE=mean(E^2)
###
accuracy.metric=c(MSE=MSE, MAPE=MAPE)
accuracy.metric
}
pred.accuracy = rbind(SES =acc.fun(test.data=test.champagne, mod.obj=fit1),
Holt.Linear =acc.fun(test.data=test.champagne, mod.obj=fit2),
Holt.Add.Damp =acc.fun(test.data=test.champagne, mod.obj=fit3),
Holt.Exp.Damp =acc.fun(test.data=test.champagne, mod.obj=fit4),
HW.Add =acc.fun(test.data=test.champagne, mod.obj=fit5),
HW.Exp =acc.fun(test.data=test.champagne, mod.obj=fit6),
HW.Add.Damp =acc.fun(test.data=test.champagne, mod.obj=fit7),
HW.Exp.Damp =acc.fun(test.data=test.champagne, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models
based on the testing data")
|               |       MSE |      MAPE |
|---------------|----------:|----------:|
| SES           | 8519347.8 | 38.164400 |
| Holt.Linear   | 8191986.4 | 37.339994 |
| Holt.Add.Damp | 8442531.2 | 37.844518 |
| Holt.Exp.Damp | 8421314.9 | 37.809241 |
| HW.Add        |  177469.9 |  6.963315 |
| HW.Exp        |  108336.1 |  6.239144 |
| HW.Add.Damp   |  128425.8 |  6.382174 |
| HW.Exp.Damp   |  115041.5 |  6.621129 |
The table shows that the HW exponential model has the lowest error measures on the test data and is therefore the best of the eight exponential smoothing models. Its test MSE of 108336.1 is also much smaller than that of the STL decomposition forecast above (226170.9).
We construct the final Holt-Winters exponential (multiplicative) model by refitting it to the entire dataset.
ts.champagne = ts(champagne[,2], start = c(1964, 1), frequency = 12)
final.model = hw(ts.champagne, h=12, seasonal="multiplicative")
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
Holt-Winters multiplicative model")
|       |         x |
|-------|----------:|
| alpha | 0.0546574 |
| beta  | 0.0001003 |
| gamma | 0.0001008 |
This gives the estimated smoothing parameters of our final forecasting model for the monthly sales of Perrin Freres champagne, in millions of dollars, starting from January 1964.
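As a final sketch (assuming the refit `final.model` above), the 12-month-ahead point forecasts and prediction intervals can be extracted and plotted directly from the fitted object.
## Sketch: plot the final forecast with prediction intervals
plot(final.model, main = "Final Holt-Winters Multiplicative Forecast")
final.model$mean   # monthly point forecasts for the 12 months after September 1972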