Firts, we require some libraries and read the data from the .csv file.
require(forecast)
require(xts)
require(ggplot2)
library(ggfortify) #Plot Monthplot
#read data from CSV file
rawData <- read.csv("./ko.csv", sep=";", dec=",")
Let see how the data looks like
head(rawData)
NA
We just have two columns, “Fechas” (dates) and “Ingresos” (revenues).
#Create a XTS object
#Convert data to XTS
mData=xts(rawData$Ingresos, order.by = as.Date(as.character(rawData$Fecha),"%Y%m%d"),frequency=4)
Above we order by date the data, and below we conver data to XTS (“frecuency=4” means that we are working with quarterly data). Moreover, we generate pure quarterly data and transform to zoo data (the type of data that we need in order to use the time series library).
xVentas=xts((rawData$Ingresos),order.by=as.POSIXct(strptime(as.character(rawData$Fecha),"%Y%m%d")),frequency=4)
#Generate quarterly data
xVentas=to.quarterly(xVentas)
#Transform to zoo data (forecast package)
zVentas=as.zoo(xVentas$xVentas.Close)
#Change name
names(zVentas)="Ventas"
Before starting with the forcasting, we plot the serie (raw) and also the seasonal plot (per quarter).
##Plot Serie
autoplot(zVentas)+ggtitle("Ventas Trimestrales CocaCola")+xlab("Trimestres")+ylab("Ventas")
#Seasonal Plot
ggfreqplot(as.ts(zVentas),freq=4,nrow=1,facet.labeller=c("1T","2T","3T","4T"))+ggtitle("Ventas Trimestrales")
Here we see how there exist seasonal component. 2Q and 3Q revenues are higher than 1Q and 4Q revenues. In addition, we also see how volatility is different each quarter and thus, the seasonal component is not stationary.
#Select number of observation to compare forecast
cOmit=4
#Data Size
nObs=length(zVentas)
nObs
[1] 98
The number of quarterly observation is 98.
#sub_sample
#oVentas=zVentas[1:(nObs-cOmit),]
oVentas <- window(zVentas,start=index(zVentas[1]),end=index(zVentas[nObs-cOmit]))
head(oVentas)
Ventas
1991 Q1 2480
1991 Q2 3039
1991 Q3 3172
1991 Q4 2879
1992 Q1 2772
1992 Q2 3550
Above we substract the 4 last quarter as it is stated in the case study. Lets start fitting all the models (in this first step we will use models without seasonal component).
Note: This is just to illustrate that we will need a seaonal component (in real life you can directely use models with seasonal).
#Fit Simple Exponential Smoothing
fit1 <- ses(oVentas)
#Fit Holt
fit2 <- holt(oVentas)
#Fit Holt- exponential
fit3 <- holt(oVentas,exponential=TRUE)
#Fit Holt - damped
fit4 <- holt(oVentas,damped=TRUE)
#Fit Holt - (exponential+damped)
fit5 <- holt(oVentas,exponential=TRUE,damped=TRUE)
We have already fitted 5 models, below we ask for the resutls in the first model (“Simple Exponential Smoothing”),
# Results for first model:
fit1$model
Simple exponential smoothing
Call:
ses(y = oVentas)
Smoothing parameters:
alpha = 0.5938
Initial states:
l = 2714.2745
sigma: 754.6586
AIC AICc BIC
1676.786 1677.053 1684.416
Now, we plot the model fitted values for each fitting (5 models).
#Plot models fitted
plot(fit3, type="o", ylab="Ventas", flwd=1, plot.conf=FALSE)
lines(window(zVentas),type="o")
lines(fit1$mean,col=2)
lines(fit2$mean,col=3)
lines(fit4$mean,col=5)
lines(fit5$mean,col=6)
legend("topleft", lty=1, pch=1, col=1:6,
c("Data","SES","Holt's","Exponential",
"Additive Damped","Multiplicative Damped"))
It is clear that we need a seasonal component, the models are also able to forecast “well” the first quarter (out of sample).
Finally, we add two additional models (with seasonal component),
#seasonal model Holt-winters
fit6 <- hw(oVentas,seasonal="additive")
fit7 <- hw(oVentas,seasonal="multiplicative")
We plot also the forecasting graph for these models,
#Plot models
plot(fit7,ylab="Ventas",
plot.conf=FALSE, type="o", fcol="white", xlab="Year")
lines(window(zVentas),type="o",col="blue")
lines(fitted(fit6), col="red", lty=2)
lines(fitted(fit7), col="green", lty=2)
lines(fit6$mean, type="o", col="red")
lines(fit7$mean, type="o", col="green")
legend("topleft",lty=1, pch=1, col=1:3,
c("data","Holt Winters' Additive","Holt Winters' Multiplicative"))
It seems to work better the “mulplicative” model. For these two last models, we also learn how to compute the different components (level, slope and seasonal) and how to plot them,
#Calculate Components
states <- cbind(fit6$model$states[,1:3],fit7$model$states[,1:3])
colnames(states) <- c("level","slope","seasonal","level","slope","seasonal")
plot(states, xlab="Year")
fit6$model$state[,1:3]
l b s1
1990 Q4 2969.031 83.12899 -284.0012
1991 Q1 3004.310 83.12420 -524.3095
1991 Q2 2466.805 83.06211 572.1952
1991 Q3 2935.912 83.10073 236.0873
1991 Q4 3162.987 83.11513 -283.9868
1992 Q1 3296.304 83.12016 -524.3045
1992 Q2 2977.845 83.07998 572.1550
1992 Q3 3271.891 83.10109 236.1085
1992 Q4 3526.970 83.11829 -283.9696
1993 Q1 3580.307 83.11531 -524.3075
1993 Q2 3326.879 83.08164 572.1213
1993 Q3 3392.893 83.07994 236.1067
1993 Q4 3656.951 83.09804 -283.9515
1994 Q1 3876.294 83.11167 -524.2938
1994 Q2 3769.898 83.09271 572.1024
1994 Q3 4224.856 83.12992 236.1439
1994 Q4 4300.952 83.12921 -283.9522
1995 Q1 4378.294 83.12863 -524.2944
1995 Q2 4363.907 83.11888 572.0926
1995 Q3 4658.835 83.14007 236.1651
1995 Q4 4616.965 83.12756 -283.9647
1996 Q1 4748.290 83.13238 -524.2896
1996 Q2 4713.919 83.12063 572.0809
1996 Q3 4450.870 83.08600 236.1305
1996 Q4 4759.942 83.10860 -283.9421
1997 Q1 4662.308 83.09052 -524.3077
1997 Q2 4502.944 83.06627 572.0566
1997 Q3 4717.856 83.07946 236.1437
1997 Q4 4984.924 83.09786 -283.9237
1998 Q1 4981.316 83.08919 -524.3163
1998 Q2 4578.992 83.04063 572.0081
1998 Q3 4510.872 83.02550 236.1286
1998 Q4 4741.909 83.04031 -283.9089
1999 Q1 4924.306 83.05025 -524.3064
1999 Q2 4763.017 83.02581 571.9836
1999 Q3 4902.866 83.03149 236.1343
1999 Q4 5086.899 83.04160 -283.8988
2000 Q1 4915.332 83.01612 -524.3319
2000 Q2 5049.011 83.02119 571.9887
2000 Q3 5306.848 83.03868 236.1517
2000 Q4 5016.936 83.00137 -283.9361
2001 Q1 4483.394 82.93969 -524.3935
2001 Q2 4081.060 82.89114 571.9401
2001 Q3 4458.819 82.92064 236.1812
2001 Q4 4521.938 82.91866 -283.9381
2002 Q1 4603.394 82.91851 -524.3937
2002 Q2 4796.049 82.92949 571.9511
2002 Q3 5085.798 82.95018 236.2019
2002 Q4 5078.947 82.94120 -283.9471
2003 Q1 5026.407 82.92765 -524.4072
2003 Q2 5123.047 82.92902 571.9525
2003 Q3 5434.775 82.95191 236.2248
2003 Q4 5459.953 82.94613 -283.9529
2004 Q1 5602.401 82.95208 -524.4013
2004 Q2 5342.082 82.91774 571.9182
2004 Q3 5359.782 82.91121 236.2183
2004 Q4 5487.948 82.91574 -283.9483
2005 Q1 5790.379 82.93770 -524.3793
2005 Q2 5738.095 82.92417 571.9046
2005 Q3 5800.784 82.92215 236.2163
2005 Q4 5834.953 82.91727 -283.9532
2006 Q1 5750.396 82.90052 -524.3961
2006 Q2 5904.088 82.90760 571.9117
2006 Q3 6217.761 82.93069 236.2393
2006 Q4 6215.962 82.92221 -283.9617
2007 Q1 6627.363 82.95507 -524.3632
2007 Q2 7161.043 83.00016 571.9568
2007 Q3 7453.740 83.02114 236.2603
2007 Q4 7614.954 83.02896 -283.9539
2008 Q1 7903.343 83.04951 -524.3427
2008 Q2 8473.994 83.09829 572.0056
2008 Q3 8156.780 83.05824 236.2203
2008 Q4 7410.037 82.97523 -284.0369
2009 Q1 7693.323 82.99527 -524.3227
2009 Q2 7695.003 82.98713 571.9974
2009 Q3 7807.777 82.99011 236.2232
2009 Q4 7794.047 82.98043 -284.0466
2010 Q1 8049.305 82.99767 -524.3054
2010 Q2 8102.006 82.99464 571.9944
2010 Q3 8189.776 82.99512 236.2237
2010 Q4 10777.795 83.24573 -283.7960
2011 Q1 11041.287 83.26376 -524.2874
2011 Q2 12164.901 83.36784 572.0985
2011 Q3 12011.800 83.34418 236.2001
2011 Q4 11323.874 83.26702 -283.8731
2012 Q1 11661.262 83.29245 -524.2620
2012 Q2 12512.824 83.36931 572.1753
2012 Q3 12103.849 83.32005 236.1508
2012 Q4 11738.918 83.27521 -283.9180
2013 Q1 11559.288 83.24890 -524.2883
2013 Q2 12176.771 83.30235 572.2287
2013 Q3 11793.896 83.25571 236.1042
2013 Q4 11323.974 83.20037 -283.9733
2014 Q1 11100.319 83.16967 -524.3190
2014 Q2 12001.689 83.25153 572.3106
fitted(fit6)
Qtr1 Qtr2 Qtr3 Qtr4
1991 2527.855 3659.691 2785.916 2735.012
1992 2721.792 3951.620 3297.013 3071.006
1993 3085.783 4235.578 3646.069 3192.004
1994 3215.742 4531.527 4089.097 4024.034
1995 3859.788 5033.525 4683.170 4458.023
1996 4175.798 5403.515 5033.205 4249.991
1997 4318.761 5317.479 4822.140 4516.994
1998 4543.714 5636.462 4898.177 4309.973
1999 4300.633 5579.365 5082.171 4701.988
2000 4645.634 5570.332 5368.167 5105.988
2001 4575.606 5138.322 4400.103 4257.803
2002 4080.463 5258.252 5115.160 4884.810
2003 4637.495 5681.286 5442.178 5233.780
2004 5018.492 6257.306 5661.225 5158.740
2005 5046.463 6445.235 6057.238 5599.758
2006 5393.491 6405.201 6223.212 6016.738
2007 5774.488 7282.230 7480.282 7252.799
2008 7173.620 8558.349 8793.353 7955.884
2009 6968.670 8348.323 8014.210 7606.730
2010 7352.704 8704.300 8421.223 7988.725
2011 10336.735 11696.545 12484.493 11811.348
2012 10882.853 12316.653 12832.394 11903.296
2013 11297.931 12214.713 12496.224 11593.234
2014 10882.886 11755.718
fit6$mean
Qtr1 Qtr2 Qtr3 Qtr4
2014 12321.04 11884.22
2015 11727.12 12907.01 12654.05 12217.22
2016 12060.13 13240.01
In the additive model (left figures), level and slope seems to have some seosanility…meanwhile the seasonal component in the multiplicative model seems to have some trend…so here we have some trade-off between both.
As it is really difficult to know what of the above models is gonna perform better, we can use the function ETS which is gonna give us the best model (using the Akaike’s Information Criterion) from all the possible combinations.
## Select automatic ETS
etsfit<-ets(oVentas)
#forecast model
fventas.ets=forecast(etsfit)
#Results
summary(fventas.ets)
Forecast method: ETS(M,A,M)
Model Information:
ETS(M,A,M)
Call:
ets(y = oVentas)
Smoothing parameters:
alpha = 0.9606
beta = 1e-04
gamma = 1e-04
Initial states:
l = 2930.9513
b = 92.1587
s = 0.9566 1.0417 1.0924 0.9093
sigma: 0.0552
AIC AICc BIC
1519.519 1521.662 1542.409
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -2.945034 362.9583 208.4911 -0.4163861 3.406357 0.3811693 0.1458926
Forecasts:
Here we see how the best model is an ETS(M,A,M).
#Plot
plot(fventas.ets)
lines(window(zVentas),type="o")
#Actual and Forecast
matrix(c(fventas.ets$mean[1:cOmit],zVentas[(nObs-cOmit+1):nObs]),ncol=2)
[,1] [,2]
[1,] 12094.71 11976
[2,] 11195.27 10872
[3,] 10725.07 10711
[4,] 12985.83 12156
Now, we repeate the above steps allowing “damped = TRUE”.
## Select automatic ETS
etsfit2<-ets(oVentas,damped=TRUE)
#forecast model
fventas.ets2=forecast(etsfit2)
#Results
summary(fventas.ets2)
Forecast method: ETS(M,Ad,M)
Model Information:
ETS(M,Ad,M)
Call:
ets(y = oVentas, damped = TRUE)
Smoothing parameters:
alpha = 0.9999
beta = 0.0311
gamma = 1e-04
phi = 0.9674
Initial states:
l = 2758.3417
b = 88.5606
s = 0.9566 1.042 1.0922 0.9092
sigma: 0.0557
AIC AICc BIC
1520.586 1523.237 1546.019
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 40.32895 366.0291 210.5488 0.4964912 3.36551 0.3849311 0.1016975
Forecasts:
summary(fventas.ets2) #This is just to show you both results in the solution
Forecast method: ETS(M,Ad,M)
Model Information:
ETS(M,Ad,M)
Call:
ets(y = oVentas, damped = TRUE)
Smoothing parameters:
alpha = 0.9999
beta = 0.0311
gamma = 1e-04
phi = 0.9674
Initial states:
l = 2758.3417
b = 88.5606
s = 0.9566 1.042 1.0922 0.9092
sigma: 0.0557
AIC AICc BIC
1520.586 1523.237 1546.019
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 40.32895 366.0291 210.5488 0.4964912 3.36551 0.3849311 0.1016975
Forecasts:
In this case the best model is an ETS (M, Ad, M).
#Plot
plot(fventas.ets2)
lines(window(zVentas),type="o")
#Actual and Forecast
matrix(c(fventas.ets2$mean[1:cOmit],fventas.ets$mean[1:cOmit],zVentas[(nObs-cOmit+1):nObs]),ncol=3)
[,1] [,2] [,3]
[1,] 12052.62 12094.71 11976
[2,] 11114.85 11195.27 10872
[3,] 10610.23 10725.07 10711
[4,] 12800.87 12985.83 12156
It is difficult to see what model is better (depends on the quarter to be forecasted). We can say that we are indifferent between both (at least with these tests).
#Plot all models
plot(fventas.ets2)
lines(window(zVentas),type="o")
lines(fventas.ets$mean,type="o",col="red")
—-END—-