Homework 4

Problem #2 Relationship between Moving Average and Exponential Smoothing: With simple exponential smoothing, instead of taking a simple average over the w most recent values, we take a weighted average of all past values, so that weights decrease exponentially into the past. As the smoothing constant , alpha determines the rate of learning and a value closer to 1 indicates fast learning or only the most recent values influence the forecasts, we would want the alpha to be closer to one for the exponential smoothing to approximate a moving average with a very short window span.

Problem # 5 Forecasting Department Store sales ## Problem #1

library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
DeptSales <- read.csv("DeptStoreSales.csv", header=TRUE, stringsAsFactors=FALSE)
str(DeptSales)

## 'data.frame':    24 obs. of  2 variables:
##  $ Quarter: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sales  : int  50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...

head(DeptSales)

##   Quarter Sales
## 1       1 50147
## 2       2 49325
## 3       3 57048
## 4       4 76781
## 5       5 48617
## 6       6 50898

tail(DeptSales)

##    Quarter  Sales
## 19      19  71486
## 20      20  92183
## 21      21  60800
## 22      22  64900
## 23      23  76997
## 24      24 103337

#time series object
DeptSales.ts <- ts(DeptSales$Sales, start=c(1,1), freq=4)
yrange=range(DeptSales.ts)

#autoplotting
autoplot(DeptSales.ts)

In this dataset, we have quarterly data which from the graph has obvious seasonality and trends upwards. The data is not stationary and we should only use moving averages when this is the case. So either of the moving average options would not be suitable since they would not effectively forecast the seasonal peaks and valleys of the dataset. Similarly, simple exponential smoothing which is like forecasting with a moving average, would not capture the seasonality of the dataset. We can use however double exponential smoothing for the series with a trend, but we would need to remove the seasonality first. The best forecasting method would be to use the Holt-Winter’s exponential smoothing method which uses a k-step-ahead forecast to take into account the season at period t+k.

5(b)Generating a forecasts for 4 quarters using Holt-Winter’s exponential smoothing.

validLength<-4
trainLength<-length(DeptSales.ts)-validLength

SalesTrain<-window(DeptSales.ts,end=c(1,trainLength))
SalesValid<-window(DeptSales.ts,start=c(1,trainLength+1))

library(zoo)

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

SalesHWMult<-hw(SalesTrain,seasonal="multiplicative",h=4)
summary(SalesHWMult)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = SalesTrain, h = 4, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.4032 
##     beta  = 0.1429 
##     gamma = 0.4549 
## 
##   Initial states:
##     l = 57401.8119 
##     b = 605.4045 
##     s=1.3012 0.9795 0.8614 0.8579
## 
##   sigma:  0.0258
## 
##      AIC     AICc      BIC 
## 372.3936 390.3936 381.3552 
## 
## Error measures:
##                   ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
##                     ACF1
## Training set -0.07882461
## 
## Forecasts:
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       61334.90 59303.21  63366.58 58227.70  64442.09
## 6 Q2       64971.30 62529.36  67413.25 61236.67  68705.94
## 6 Q3       76718.11 73376.84  80059.37 71608.08  81828.13
## 6 Q4       99420.55 94372.29 104468.81 91699.90 107141.20

accuracy(SalesHWMult,SalesValid)

##                   ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 246.631 1499.632  977.6487 0.3530739 1.665686 0.3137009
## Test set     897.285 1981.638 1200.3859 0.7906391 1.285456 0.3851712
##                      ACF1 Theil's U
## Training set -0.078824613        NA
## Test set      0.009540924 0.1291442

autoplot(SalesTrain)+xlab("Year")+autolayer(SalesHWMult,PI=
                                             FALSE, series="HW Multiplicative")

From the text: MAPE = 1/v(summation(|et/yt|))100 =1/4(1415/60800)100 = .5818 for Quarter 21 =1/4(3243.5/64900)100 = 1.249 for Quarter 21 For both quarters combined MAPE = 1.831

5c - Is this model suitable? If we just look at the Actual vs. Forecast we might believe that the model is suitable for quarters 21 and 22, the problem is that the forecast errors have begun to trend upwards towards the last periods of the training data. This leads me to believe that the trend component of the data is not being adequately forecasted and will grow ever larger. 5d. Using differencing to remove the trend.

par(mfrow=c(2,2))
plot(diff(DeptSales.ts,lag=4),ylab = "Lag 4",xlab="Year", main="Seasonal Lag Differencing")
plot(diff(DeptSales.ts, lag=1),ylab="Lag 1",xlab="Year",main="Trend Lag Differencing")
plot(diff(diff(DeptSales.ts, lag=1),lag=4),ylab="Lag 1 then Lag 4",xlab="Year", main="Trend then Seasonal Differencing")
plot(diff(diff(DeptSales.ts, lag=4),lag=1),ylab = "Lag 4 then Lag 1",xlab="Year", main="Seasonal then Trend Differencing")

From the graph above showing the effect of differencing to remove the trend and seasonal pattern, the order of season or trend in differencing does not make any difference.

5e-Forecast Quarters 21-22

validLength<-4
trainLength<-length(DeptSales.ts)-validLength

SalesTrain<-window(DeptSales.ts,end=c(1,trainLength))
SalesValid<-window(DeptSales.ts,start=c(1,trainLength+1))

pointForecasts <-meanf(diff(diff(SalesTrain,lag=4),lag=1),h=2)

pointForecasts

##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514

The point forecasts for the two quarters are the same. However, we need to adjust back for the trend and seasonal pattern.

realForecasts<-vector()

for (i in 1:validLength) {
  if(i==1){
  realForecasts[i]<-pointForecasts$mean[i]+SalesTrain[(trainLength+i)-validLength]+ (SalesTrain[trainLength]-SalesTrain[trainLength-validLength])
} else {
  realForecasts[i]<-pointForecasts$mean[i]+SalesTrain[(trainLength+i)-validLength]+ (realForecasts[i-1]-SalesTrain[trainLength+i-1-validLength])
}
}
#Let's see what the real forecast values look like
realForecasts

## [1] 63982.2 68177.4      NA      NA

par(mfrow = c(1,1))
plot(realForecasts, type="l", bty = "l")

yrange=range(DeptSales.ts)

plot(c(1,6),c(yrange[1],yrange[2]),type="n",xlab="Year", ylab="Department Sales", bty="l",xaxt="n",yaxt="n")

lines(DeptSales.ts,bty="l")

axis(1, at=seq(1,6,1), labels=format(seq(1,6,1)))

axis(2, at=seq(48000,105000,5000), labels =format(seq(48.0,105,5)),las=2)


lines(seq(6,7-1/validLength,1/validLength),realForecasts, col="red",lwd=2,lty=2)

accuracy(realForecasts,SalesValid)

##               ME     RMSE    MAE       MPE     MAPE ACF1 Theil's U
## Test set -3229.8 3230.151 3229.8 -5.141902 5.141902 -0.5   0.13634

5f.Here the MAPE values is 5.141902 compared to the MAPE of 1.831 for the exponential smoothing. The Exponential Smoothing seems to do a better job of forecasting over the average of the double-differenced and it is much simpler to perform from a programming perspective.

5g.An even simpler baseline would to perform a seasonal Naive forecast.

Problem 8

I would use Holt-Winter’s exponential smoothing. All six wine plots exhibit seasonality and some degree of trending. b)fortified wine

library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
FortWine <- read.csv("AustralianWines.csv", header=TRUE, stringsAsFactors=FALSE)
str(FortWine)

## 'data.frame':    188 obs. of  7 variables:
##  $ Month      : chr  "Jan-80" "Feb-80" "Mar-80" "Apr-80" ...
##  $ Fortified  : int  2585 3368 3210 3111 3756 4216 5225 4426 3932 3816 ...
##  $ Red        : int  464 675 703 887 1139 1077 1318 1260 1120 963 ...
##  $ Rose       : chr  "112" "118" "129" "99" ...
##  $ sparkling  : int  1686 1591 2304 1712 1471 1377 1966 2453 1984 2596 ...
##  $ Sweet.white: int  85 89 109 95 91 95 96 128 124 111 ...
##  $ Dry.white  : int  1954 2302 3054 2414 2226 2725 2589 3470 2400 3180 ...

head(FortWine)

##    Month Fortified  Red Rose sparkling Sweet.white Dry.white
## 1 Jan-80      2585  464  112      1686          85      1954
## 2 Feb-80      3368  675  118      1591          89      2302
## 3 Mar-80      3210  703  129      2304         109      3054
## 4 Apr-80      3111  887   99      1712          95      2414
## 5 May-80      3756 1139  116      1471          91      2226
## 6 Jun-80      4216 1077  168      1377          95      2725

tail(FortWine)

##     Month Fortified Red Rose sparkling Sweet.white Dry.white
## 183              NA  NA             NA          NA        NA
## 184              NA  NA             NA          NA        NA
## 185              NA  NA             NA          NA        NA
## 186              NA  NA             NA          NA        NA
## 187              NA  NA             NA          NA        NA
## 188              NA  NA             NA          NA        NA

#time series object
FortWine.ts <- ts(FortWine$Fortified, start=c(1980,1), freq=12)
#yrange=range(DeptSales.ts)

#autoplotting
autoplot(FortWine.ts)

8b. We partition the data and apply Holt-Winter’s exponential smoothing with multiplicative seasonality to sales.

library(zoo)
validLength<=12

## [1] TRUE

trainLength<-length(FortWine.ts)-validLength

FortTrain<-window(FortWine.ts,end=c(1980,trainLength))
FortValid<-window(FortWine.ts,start=c(1980,trainLength+1))

FortHWMult<-hw(FortTrain,seasonal="multiplicative",h=12)

## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi =
## phi, : Missing values encountered. Using longest contiguous portion of time
## series

summary(FortHWMult)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = FortTrain, h = 12, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.0383 
##     beta  = 2e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 3981.777 
##     b = -8.1023 
##     s=1.0827 1.0429 0.896 0.9613 1.2844 1.3614
##            1.1394 1.1084 0.9455 0.8468 0.7231 0.6081
## 
##   sigma:  0.0925
## 
##      AIC     AICc      BIC 
## 2982.699 2986.477 3036.979 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -25.56646 283.0968 223.7384 -1.662526 7.564533 0.8174337
##                    ACF1
## Training set 0.04506775
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1995       1297.504 1143.710 1451.298 1062.297 1532.711
## Feb 1995       1536.090 1353.879 1718.302 1257.422 1814.758
## Mar 1995       1791.120 1578.493 2003.747 1465.935 2116.305
## Apr 1995       1990.957 1754.421 2227.492 1629.206 2352.707
## May 1995       2323.770 2047.473 2600.067 1901.210 2746.330
## Jun 1995       2378.109 2095.120 2661.098 1945.315 2810.904
## Jul 1995       2828.640 2491.758 3165.522 2313.424 3343.856
## Aug 1995       2656.625 2339.961 2973.289 2172.330 3140.921
## Sep 1995       1979.456 1743.305 2215.607 1618.294 2340.617
## Oct 1995       1836.422 1617.142 2055.702 1501.062 2171.782
## Nov 1995       2127.963 1873.642 2382.283 1739.013 2516.912
## Dec 1995       2199.181 1936.109 2462.254 1796.846 2601.517

accuracy(FortHWMult,FortValid)

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -25.56646 283.0968 223.7384 -1.662526 7.564533 0.8174337
## Test set           NaN      NaN      NaN       NaN      NaN       NaN
##                    ACF1 Theil's U
## Training set 0.04506775        NA
## Test set             NA       NaN

autoplot(FortTrain)+xlab("Year")+autolayer(FortHWMult,PI=
                                             FALSE, series="HW Multiplicative")

8c. Below we plot the residuals from the Holt-Winter’s exponential smoothing.

autoplot(FortHWMult$residuals)

FortHWMult$residuals

##                Jan           Feb           Mar           Apr           May
## 1980  0.0697865684  0.1714303787 -0.0510104910 -0.1729907378 -0.1408864509
## 1981 -0.0179022498  0.0635803062 -0.0762407135  0.0158040687 -0.0007640742
## 1982  0.0439224710 -0.0689551888 -0.0203250323  0.0656447484  0.0118343984
## 1983 -0.0374643823 -0.0362066606 -0.0121897037 -0.0398545232 -0.0375703114
## 1984 -0.0881218715  0.0279247766  0.2239021889 -0.1479413285  0.0442620668
## 1985  0.0288046669 -0.1005032976 -0.0344267677 -0.0583787388  0.1180795634
## 1986 -0.0186218133 -0.0419216029 -0.1389016831 -0.0902672885  0.0453830364
## 1987 -0.1377133671 -0.1061483464 -0.1218660561  0.1502822960 -0.0508745935
## 1988 -0.0114620444  0.0426562767 -0.1062142101  0.0188760161 -0.0349109185
## 1989 -0.0817160055 -0.0320384124 -0.0471194267  0.0314331276  0.0363301962
## 1990  0.0107133357 -0.1091481215  0.1653046101 -0.0806834764  0.0143325239
## 1991 -0.1509668186 -0.0861805567  0.1452172320  0.0476132016  0.0092142945
## 1992  0.0349560276 -0.0720805805  0.0253084162  0.0531730689  0.0665542379
## 1993 -0.1573136457 -0.1632188937  0.0042166303 -0.0721373126 -0.0802354991
## 1994 -0.1383556780 -0.0057499391  0.0688662108  0.2978223738 -0.0268454852
##                Jun           Jul           Aug           Sep           Oct
## 1980 -0.0548342216 -0.0154600659 -0.1136459103  0.0589661061  0.1026128329
## 1981  0.1452639617  0.0875234720  0.1177022616 -0.0100951875 -0.1484925646
## 1982  0.0276229417  0.0060202826  0.0862124870 -0.0660642657  0.0671536953
## 1983 -0.0097935332 -0.0878364906  0.1561277984  0.1742142477 -0.0381157195
## 1984 -0.0986561304 -0.1051252990  0.0219868263 -0.1254949870 -0.0599716025
## 1985 -0.0793927237 -0.1089029005  0.0103539518 -0.0886575634 -0.0015910803
## 1986 -0.0765561689  0.0327960446  0.1127750017 -0.0727639085  0.0244777787
## 1987 -0.1282873911  0.0001115970 -0.1047946029  0.0263715957 -0.0268998456
## 1988  0.0392109413  0.0878301771 -0.1504142738 -0.0726550888 -0.0955978385
## 1989  0.0356585112  0.1092490184 -0.0166394428 -0.0497412236 -0.0297718231
## 1990  0.0851950606  0.1000778935 -0.1394281859 -0.0533926463  0.0549539889
## 1991 -0.0416672960  0.0017696257 -0.0646296861 -0.0088388010 -0.0907432143
## 1992  0.0244237424 -0.0066115017 -0.1547951874 -0.0091092355  0.0612406819
## 1993  0.0294325544 -0.0502817928 -0.0899964368 -0.0051469266 -0.1087461067
## 1994  0.0480537196 -0.0792124317 -0.1688616080  0.1824808274  0.0560627640
##                Nov           Dec
## 1980 -0.0928626672 -0.0890473830
## 1981 -0.0326552298 -0.1075598190
## 1982 -0.0098347684 -0.1305367599
## 1983 -0.0649761520 -0.0288796278
## 1984 -0.0349897976 -0.1380515775
## 1985  0.0193436356 -0.0522213741
## 1986 -0.1217917917  0.1328251867
## 1987  0.0042502387  0.1802145991
## 1988 -0.0177178031  0.1625107426
## 1989  0.0645487374  0.0872933784
## 1990  0.0646360015 -0.0816419467
## 1991 -0.0219012053  0.1961743438
## 1992  0.0329486489  0.1195557021
## 1993  0.1009035475  0.1571093347
## 1994  0.2625260954  0.0657546678

From the graph, it is unreasonable to state that Decembers are not captured well by the model. From the table of residuals, December does have some of the largest residuals but there are other months that have larger values also. In fact the residuals do not seem to show much of a pattern but do reach higher values at the end of the training data. It is difficult to determine from the graph if there is correlation on a monthly basis, but it seems that the residuals are rather random in nature. Due to the random distribution of the residuals, I would surmise that the seasonality has been captured rather well by the model. Holt-Winter’s is designed to handle seasonality so it would not be necessary to remove seasonality before applying the HW method.

8ii. If there are strong correlations between months in the residuals, or a certain month is not captured well by the model or the model is not capturing seasonality particularly well, with exponential smoothing we can use the flexibility of the ets function to better choose a particular combination of error, trend or seasonality type. In addition, the ets function will automate the model search.

Homework 4

Thomas Beretich

February 26, 2018