Problem #2 Relationship between Moving Average and Exponential Smoothing: With simple exponential smoothing, instead of taking a simple average over the w most recent values, we take a weighted average of all past values, so that weights decrease exponentially into the past. As the smoothing constant , alpha determines the rate of learning and a value closer to 1 indicates fast learning or only the most recent values influence the forecasts, we would want the alpha to be closer to one for the exponential smoothing to approximate a moving average with a very short window span.
Problem # 5 Forecasting Department Store sales ## Problem #1
library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
DeptSales <- read.csv("DeptStoreSales.csv", header=TRUE, stringsAsFactors=FALSE)
str(DeptSales)
## 'data.frame': 24 obs. of 2 variables:
## $ Quarter: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sales : int 50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
head(DeptSales)
## Quarter Sales
## 1 1 50147
## 2 2 49325
## 3 3 57048
## 4 4 76781
## 5 5 48617
## 6 6 50898
tail(DeptSales)
## Quarter Sales
## 19 19 71486
## 20 20 92183
## 21 21 60800
## 22 22 64900
## 23 23 76997
## 24 24 103337
#time series object
DeptSales.ts <- ts(DeptSales$Sales, start=c(1,1), freq=4)
yrange=range(DeptSales.ts)
#autoplotting
autoplot(DeptSales.ts)
In this dataset, we have quarterly data which from the graph has obvious seasonality and trends upwards. The data is not stationary and we should only use moving averages when this is the case. So either of the moving average options would not be suitable since they would not effectively forecast the seasonal peaks and valleys of the dataset. Similarly, simple exponential smoothing which is like forecasting with a moving average, would not capture the seasonality of the dataset. We can use however double exponential smoothing for the series with a trend, but we would need to remove the seasonality first. The best forecasting method would be to use the Holt-Winter’s exponential smoothing method which uses a k-step-ahead forecast to take into account the season at period t+k.
5(b)Generating a forecasts for 4 quarters using Holt-Winter’s exponential smoothing.
validLength<-4
trainLength<-length(DeptSales.ts)-validLength
SalesTrain<-window(DeptSales.ts,end=c(1,trainLength))
SalesValid<-window(DeptSales.ts,start=c(1,trainLength+1))
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
SalesHWMult<-hw(SalesTrain,seasonal="multiplicative",h=4)
summary(SalesHWMult)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = SalesTrain, h = 4, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4032
## beta = 0.1429
## gamma = 0.4549
##
## Initial states:
## l = 57401.8119
## b = 605.4045
## s=1.3012 0.9795 0.8614 0.8579
##
## sigma: 0.0258
##
## AIC AICc BIC
## 372.3936 390.3936 381.3552
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
## ACF1
## Training set -0.07882461
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 61334.90 59303.21 63366.58 58227.70 64442.09
## 6 Q2 64971.30 62529.36 67413.25 61236.67 68705.94
## 6 Q3 76718.11 73376.84 80059.37 71608.08 81828.13
## 6 Q4 99420.55 94372.29 104468.81 91699.90 107141.20
accuracy(SalesHWMult,SalesValid)
## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
## Test set 897.285 1981.638 1200.3859 0.7906391 1.285456 0.3851712
## ACF1 Theil's U
## Training set -0.078824613 NA
## Test set 0.009540924 0.1291442
autoplot(SalesTrain)+xlab("Year")+autolayer(SalesHWMult,PI=
FALSE, series="HW Multiplicative")
From the text: MAPE = 1/v(summation(|et/yt|))100 =1/4(1415/60800)100 = .5818 for Quarter 21 =1/4(3243.5/64900)100 = 1.249 for Quarter 21 For both quarters combined MAPE = 1.831
5c - Is this model suitable? If we just look at the Actual vs. Forecast we might believe that the model is suitable for quarters 21 and 22, the problem is that the forecast errors have begun to trend upwards towards the last periods of the training data. This leads me to believe that the trend component of the data is not being adequately forecasted and will grow ever larger. 5d. Using differencing to remove the trend.
par(mfrow=c(2,2))
plot(diff(DeptSales.ts,lag=4),ylab = "Lag 4",xlab="Year", main="Seasonal Lag Differencing")
plot(diff(DeptSales.ts, lag=1),ylab="Lag 1",xlab="Year",main="Trend Lag Differencing")
plot(diff(diff(DeptSales.ts, lag=1),lag=4),ylab="Lag 1 then Lag 4",xlab="Year", main="Trend then Seasonal Differencing")
plot(diff(diff(DeptSales.ts, lag=4),lag=1),ylab = "Lag 4 then Lag 1",xlab="Year", main="Seasonal then Trend Differencing")
From the graph above showing the effect of differencing to remove the trend and seasonal pattern, the order of season or trend in differencing does not make any difference.
5e-Forecast Quarters 21-22
validLength<-4
trainLength<-length(DeptSales.ts)-validLength
SalesTrain<-window(DeptSales.ts,end=c(1,trainLength))
SalesValid<-window(DeptSales.ts,start=c(1,trainLength+1))
pointForecasts <-meanf(diff(diff(SalesTrain,lag=4),lag=1),h=2)
pointForecasts
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2 569.2 -2116.935 3255.335 -3714.114 4852.514
The point forecasts for the two quarters are the same. However, we need to adjust back for the trend and seasonal pattern.
realForecasts<-vector()
for (i in 1:validLength) {
if(i==1){
realForecasts[i]<-pointForecasts$mean[i]+SalesTrain[(trainLength+i)-validLength]+ (SalesTrain[trainLength]-SalesTrain[trainLength-validLength])
} else {
realForecasts[i]<-pointForecasts$mean[i]+SalesTrain[(trainLength+i)-validLength]+ (realForecasts[i-1]-SalesTrain[trainLength+i-1-validLength])
}
}
#Let's see what the real forecast values look like
realForecasts
## [1] 63982.2 68177.4 NA NA
par(mfrow = c(1,1))
plot(realForecasts, type="l", bty = "l")
yrange=range(DeptSales.ts)
plot(c(1,6),c(yrange[1],yrange[2]),type="n",xlab="Year", ylab="Department Sales", bty="l",xaxt="n",yaxt="n")
lines(DeptSales.ts,bty="l")
axis(1, at=seq(1,6,1), labels=format(seq(1,6,1)))
axis(2, at=seq(48000,105000,5000), labels =format(seq(48.0,105,5)),las=2)
lines(seq(6,7-1/validLength,1/validLength),realForecasts, col="red",lwd=2,lty=2)
accuracy(realForecasts,SalesValid)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -3229.8 3230.151 3229.8 -5.141902 5.141902 -0.5 0.13634
5f.Here the MAPE values is 5.141902 compared to the MAPE of 1.831 for the exponential smoothing. The Exponential Smoothing seems to do a better job of forecasting over the average of the double-differenced and it is much simpler to perform from a programming perspective.
5g.An even simpler baseline would to perform a seasonal Naive forecast.
Problem 8
library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
FortWine <- read.csv("AustralianWines.csv", header=TRUE, stringsAsFactors=FALSE)
str(FortWine)
## 'data.frame': 188 obs. of 7 variables:
## $ Month : chr "Jan-80" "Feb-80" "Mar-80" "Apr-80" ...
## $ Fortified : int 2585 3368 3210 3111 3756 4216 5225 4426 3932 3816 ...
## $ Red : int 464 675 703 887 1139 1077 1318 1260 1120 963 ...
## $ Rose : chr "112" "118" "129" "99" ...
## $ sparkling : int 1686 1591 2304 1712 1471 1377 1966 2453 1984 2596 ...
## $ Sweet.white: int 85 89 109 95 91 95 96 128 124 111 ...
## $ Dry.white : int 1954 2302 3054 2414 2226 2725 2589 3470 2400 3180 ...
head(FortWine)
## Month Fortified Red Rose sparkling Sweet.white Dry.white
## 1 Jan-80 2585 464 112 1686 85 1954
## 2 Feb-80 3368 675 118 1591 89 2302
## 3 Mar-80 3210 703 129 2304 109 3054
## 4 Apr-80 3111 887 99 1712 95 2414
## 5 May-80 3756 1139 116 1471 91 2226
## 6 Jun-80 4216 1077 168 1377 95 2725
tail(FortWine)
## Month Fortified Red Rose sparkling Sweet.white Dry.white
## 183 NA NA NA NA NA
## 184 NA NA NA NA NA
## 185 NA NA NA NA NA
## 186 NA NA NA NA NA
## 187 NA NA NA NA NA
## 188 NA NA NA NA NA
#time series object
FortWine.ts <- ts(FortWine$Fortified, start=c(1980,1), freq=12)
#yrange=range(DeptSales.ts)
#autoplotting
autoplot(FortWine.ts)
8b. We partition the data and apply Holt-Winter’s exponential smoothing with multiplicative seasonality to sales.
library(zoo)
validLength<=12
## [1] TRUE
trainLength<-length(FortWine.ts)-validLength
FortTrain<-window(FortWine.ts,end=c(1980,trainLength))
FortValid<-window(FortWine.ts,start=c(1980,trainLength+1))
FortHWMult<-hw(FortTrain,seasonal="multiplicative",h=12)
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi =
## phi, : Missing values encountered. Using longest contiguous portion of time
## series
summary(FortHWMult)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = FortTrain, h = 12, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.0383
## beta = 2e-04
## gamma = 1e-04
##
## Initial states:
## l = 3981.777
## b = -8.1023
## s=1.0827 1.0429 0.896 0.9613 1.2844 1.3614
## 1.1394 1.1084 0.9455 0.8468 0.7231 0.6081
##
## sigma: 0.0925
##
## AIC AICc BIC
## 2982.699 2986.477 3036.979
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -25.56646 283.0968 223.7384 -1.662526 7.564533 0.8174337
## ACF1
## Training set 0.04506775
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1995 1297.504 1143.710 1451.298 1062.297 1532.711
## Feb 1995 1536.090 1353.879 1718.302 1257.422 1814.758
## Mar 1995 1791.120 1578.493 2003.747 1465.935 2116.305
## Apr 1995 1990.957 1754.421 2227.492 1629.206 2352.707
## May 1995 2323.770 2047.473 2600.067 1901.210 2746.330
## Jun 1995 2378.109 2095.120 2661.098 1945.315 2810.904
## Jul 1995 2828.640 2491.758 3165.522 2313.424 3343.856
## Aug 1995 2656.625 2339.961 2973.289 2172.330 3140.921
## Sep 1995 1979.456 1743.305 2215.607 1618.294 2340.617
## Oct 1995 1836.422 1617.142 2055.702 1501.062 2171.782
## Nov 1995 2127.963 1873.642 2382.283 1739.013 2516.912
## Dec 1995 2199.181 1936.109 2462.254 1796.846 2601.517
accuracy(FortHWMult,FortValid)
## ME RMSE MAE MPE MAPE MASE
## Training set -25.56646 283.0968 223.7384 -1.662526 7.564533 0.8174337
## Test set NaN NaN NaN NaN NaN NaN
## ACF1 Theil's U
## Training set 0.04506775 NA
## Test set NA NaN
autoplot(FortTrain)+xlab("Year")+autolayer(FortHWMult,PI=
FALSE, series="HW Multiplicative")
8c. Below we plot the residuals from the Holt-Winter’s exponential smoothing.
autoplot(FortHWMult$residuals)
FortHWMult$residuals
## Jan Feb Mar Apr May
## 1980 0.0697865684 0.1714303787 -0.0510104910 -0.1729907378 -0.1408864509
## 1981 -0.0179022498 0.0635803062 -0.0762407135 0.0158040687 -0.0007640742
## 1982 0.0439224710 -0.0689551888 -0.0203250323 0.0656447484 0.0118343984
## 1983 -0.0374643823 -0.0362066606 -0.0121897037 -0.0398545232 -0.0375703114
## 1984 -0.0881218715 0.0279247766 0.2239021889 -0.1479413285 0.0442620668
## 1985 0.0288046669 -0.1005032976 -0.0344267677 -0.0583787388 0.1180795634
## 1986 -0.0186218133 -0.0419216029 -0.1389016831 -0.0902672885 0.0453830364
## 1987 -0.1377133671 -0.1061483464 -0.1218660561 0.1502822960 -0.0508745935
## 1988 -0.0114620444 0.0426562767 -0.1062142101 0.0188760161 -0.0349109185
## 1989 -0.0817160055 -0.0320384124 -0.0471194267 0.0314331276 0.0363301962
## 1990 0.0107133357 -0.1091481215 0.1653046101 -0.0806834764 0.0143325239
## 1991 -0.1509668186 -0.0861805567 0.1452172320 0.0476132016 0.0092142945
## 1992 0.0349560276 -0.0720805805 0.0253084162 0.0531730689 0.0665542379
## 1993 -0.1573136457 -0.1632188937 0.0042166303 -0.0721373126 -0.0802354991
## 1994 -0.1383556780 -0.0057499391 0.0688662108 0.2978223738 -0.0268454852
## Jun Jul Aug Sep Oct
## 1980 -0.0548342216 -0.0154600659 -0.1136459103 0.0589661061 0.1026128329
## 1981 0.1452639617 0.0875234720 0.1177022616 -0.0100951875 -0.1484925646
## 1982 0.0276229417 0.0060202826 0.0862124870 -0.0660642657 0.0671536953
## 1983 -0.0097935332 -0.0878364906 0.1561277984 0.1742142477 -0.0381157195
## 1984 -0.0986561304 -0.1051252990 0.0219868263 -0.1254949870 -0.0599716025
## 1985 -0.0793927237 -0.1089029005 0.0103539518 -0.0886575634 -0.0015910803
## 1986 -0.0765561689 0.0327960446 0.1127750017 -0.0727639085 0.0244777787
## 1987 -0.1282873911 0.0001115970 -0.1047946029 0.0263715957 -0.0268998456
## 1988 0.0392109413 0.0878301771 -0.1504142738 -0.0726550888 -0.0955978385
## 1989 0.0356585112 0.1092490184 -0.0166394428 -0.0497412236 -0.0297718231
## 1990 0.0851950606 0.1000778935 -0.1394281859 -0.0533926463 0.0549539889
## 1991 -0.0416672960 0.0017696257 -0.0646296861 -0.0088388010 -0.0907432143
## 1992 0.0244237424 -0.0066115017 -0.1547951874 -0.0091092355 0.0612406819
## 1993 0.0294325544 -0.0502817928 -0.0899964368 -0.0051469266 -0.1087461067
## 1994 0.0480537196 -0.0792124317 -0.1688616080 0.1824808274 0.0560627640
## Nov Dec
## 1980 -0.0928626672 -0.0890473830
## 1981 -0.0326552298 -0.1075598190
## 1982 -0.0098347684 -0.1305367599
## 1983 -0.0649761520 -0.0288796278
## 1984 -0.0349897976 -0.1380515775
## 1985 0.0193436356 -0.0522213741
## 1986 -0.1217917917 0.1328251867
## 1987 0.0042502387 0.1802145991
## 1988 -0.0177178031 0.1625107426
## 1989 0.0645487374 0.0872933784
## 1990 0.0646360015 -0.0816419467
## 1991 -0.0219012053 0.1961743438
## 1992 0.0329486489 0.1195557021
## 1993 0.1009035475 0.1571093347
## 1994 0.2625260954 0.0657546678
From the graph, it is unreasonable to state that Decembers are not captured well by the model. From the table of residuals, December does have some of the largest residuals but there are other months that have larger values also. In fact the residuals do not seem to show much of a pattern but do reach higher values at the end of the training data. It is difficult to determine from the graph if there is correlation on a monthly basis, but it seems that the residuals are rather random in nature. Due to the random distribution of the residuals, I would surmise that the seasonality has been captured rather well by the model. Holt-Winter’s is designed to handle seasonality so it would not be necessary to remove seasonality before applying the HW method.
8ii. If there are strong correlations between months in the residuals, or a certain month is not captured well by the model or the model is not capturing seasonality particularly well, with exponential smoothing we can use the flexibility of the ets function to better choose a particular combination of error, trend or seasonality type. In addition, the ets function will automate the model search.