Assignment-4

Problem 2

Relationship between moving average and exponential smoothing:

Assume the we apply a moving to series, using a very short windows span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

The both method the moving average and simple exponential smoothing, the user must specify a single parameter:

In moving averages, the window width (ω); and in exponential smoothing, the smoothing constant (α)

In both cases, the parameter determines the importance of fresh information over older information.

The two smoothers are approximately equal if the window width of the moving is equal to w=(2/α)-1 so find the smoothing constant should be to achieve an equivalent result we have to find (α) and insert the window width(ω) .

Problem 5

Which of the following methods would no be suitable for forecasting this series?
1. Moving average of raw series:If we look to the plot the approach cannot be appropriate because it contain trend seasonality. The mothed is not suitable because the data has trend and seasonality. based on the text book p (79). the moving average and simple exponential smoothing are suitable for forecasting series that contain no trend and seasonality
1. Moving average of deseasonalized series: it will be not appropriate because removing the seasonality using moving average of deseasonalized, but still contain trend .so it is not suitable
1. Simple exponential smoothing of the raw series: This method would not be suitable for because Simple exponential smoothing of the raw series should only be use for forecasting series that have no trend or seasonality, series can obtained by removing trend /or seasonality from raw series.
1. Double exponential smoothing of the raw series: Double exponential smoothing would not be suitable over the raw series This method is not suitable because we cannot use for a series that has seasonality.
1. Holt-Winters exponential smoothing over the raw series: This method is suitable for the series presented, becauseit contains both a trend and seasonality component.

getwd()

## [1] "C:/Users/Yusuf"

setwd("C:/Users/Yusuf")

# Create a time series 
DepStore <- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
str(DepStore)

## 'data.frame':    24 obs. of  2 variables:
##  $ Quarter: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sales  : int  50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...

head(DepStore)

##   Quarter Sales
## 1       1 50147
## 2       2 49325
## 3       3 57048
## 4       4 76781
## 5       5 48617
## 6       6 50898

tail(DepStore)

##    Quarter  Sales
## 19      19  71486
## 20      20  92183
## 21      21  60800
## 22      22  64900
## 23      23  76997
## 24      24 103337

DepStoreTs <- ts(DepStore$Sales, start=c(1), frequency=4)

# Set up the plot with y range 
y = range(DepStoreTs)
plot(c(1, 7), y, type = "n", xlab = "Year", ylab = "Quarterly Sales", bty = "l", ylim = c(20000, 120000), xaxt = "n", yaxt = "n")

# Add the time series
lines(DepStoreTs, bty = "l")

# Add the x-axis
axis(1, at = seq(1, 7), labels = format(seq(1, 7)))

# Add the y-axis
axis(2, at = seq(20000, 120000, 20000), labels = format(seq(20000, 120000, 20000)), las = 0)

abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(10, 108000, "Training")

abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")

# first we partion the data to give us validation and training period.
ValidLength <- 4
TrainLength <- length(DepStoreTs) - ValidLength

DepTrain <- window(DepStoreTs, end = c(1, TrainLength))
DepValid <- window(DepStoreTs, start = c(1, TrainLength+1))
# Raw plots
plot(DepTrain)

# plot the validation 
plot(DepValid)

problem 5 b i

library(forecast)

## Warning: package 'forecast' was built under R version 3.3.3

#The Holt-Winters Forecast smoother method
dept.hw <- ets(DepTrain, model = "ZZZ",  alpha = 0.2, beta = 0.15, gamma = 0.05, restrict = FALSE)
dept.HWfor <- forecast(dept.hw, h=ValidLength, level=0)
#clean the plot
plot(c(1, 7), c(48000,114500), type="n", xlab="Year",  ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
#the training period
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
# the validation period
text(6.5, 108000, "Validation")
#Add the time series 
lines(DepStoreTs, bty="l")
#Add the fitted and forecasted
lines(dept.HWfor$fitted, lwd = 2, col = "Red")
lines(dept.HWfor$mean, lwd = 2, col = "Red", lty = 2)

# the Quarterly  
dept.HWfor

##      Point Forecast      Lo 0      Hi 0
## 6 Q1       62115.16  62115.16  62115.16
## 6 Q2       65371.60  65371.60  65371.60
## 6 Q3       77076.87  77076.87  77076.87
## 6 Q4      102937.73 102937.73 102937.73

Problem 5 b ii

# accuracy function

accuracy(dept.HWfor$mean[1:2], DepValid[1:2])

##                 ME    RMSE      MAE       MPE     MAPE
## Test set -893.3803 987.939 893.3803 -1.444875 1.444875

Based on the results of the accuracy function, we see the MAPE for the test set (validation period) for quarters 21-22 was 1.44%.

problem 5 c

#Clean the plot
plot(c(1, 7), c(48000,114500), type="n", xlab="Year",  ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
#time series 
lines(DepStoreTs, bty="l")
#fitted and forecasted
lines(dept.HWfor$fitted, lwd = 2, col = "Red")
lines(dept.HWfor$mean, lwd = 2, col = "Red", lty = 2)

```

problem 5 d. using the differencing on eof the smoothing methods

#create lag-1 series
difflag1.ts <- diff(DepStoreTs, lag = 1)
difflag4.ts <- diff(DepStoreTs, lag = 4)
difflag1_4.ts <-diff(diff(DepStoreTs, lag = 4), lag = 1)
# plot the four time series'
# Set up the plot to have two rows and two columns
par(mfrow = c(3,2))
plot(DepStoreTs, ylab="Sales", xlab="Year", main="Original Series")
plot (difflag1.ts, ylab="Lag 1", xlab="Year", main="Lag 1 to remove trend")
plot (difflag4.ts, ylab="Lag 4", xlab="Year", main="Lag 4 to remove quarterly seasonality")
plot (difflag1_4.ts, ylab="Lag 1 and 4", xlab="Year", main="Double Differenced Series")


plot((diff(diff(DepStoreTs, lag = 1), lag = 4)), ylab="Lag 1 and 4", xlab="Year", main="Double Differenced Series opposite order")

problem 5

#Set up training and validation periods on double diff
DDIFTrain <- window(difflag1_4.ts, end=c(1, TrainLength))
DDIFValid <- window(difflag1_4.ts, start=c(1,TrainLength+1), end=c(1,TrainLength+ValidLength))
#plot
par(mfrow = c(1,2))
plot(DDIFTrain)
plot(DDIFValid)

par(mfrow = c(1,1))
#Create Double Differenced point forecasts based on the training period.
pointForecasts <- meanf(DDIFTrain, h=4)
pointForecasts

##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q3          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q4          569.2 -2116.935 3255.335 -3714.114 4852.514

# de-difference it
realForecasts <- vector()

for (i in 1:ValidLength) {
  if(i == 1) {
    realForecasts[i] <- pointForecasts$mean[i] + DepTrain[(TrainLength+i)-ValidLength] + (DepTrain[TrainLength] - DepTrain[TrainLength - ValidLength])
  } else {
    realForecasts[i] <- pointForecasts$mean[i] + DepTrain[(TrainLength+i)-ValidLength] + (realForecasts[i-1] - DepTrain[TrainLength+i-1-ValidLength])
  }
}
#print real forecast
realForecasts

## [1]  63982.2  68177.4  80201.6 101467.8

plot(realForecasts, type = 'l', bty = "l")

# plot our double-differenced forecast against the actual time series:
plot(c(1, 7), c(48000,114500), type="n", xlab="Year",  ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
#Add the time series 
lines(DepStoreTs, bty="l")
lines(seq(6, 6.75, .25), realForecasts, col="blue", lwd=2, lty=2)

realForecasts[1:2]

## [1] 63982.2 68177.4

The plot above includes the forecasts we produced by double-differencing for all four quarters, but we did print the values for just quarters 21 and 22.

Problem 5

In comparing our results from the Holt Winters technique and the double-differencing technique, I would select Holt Winters for two reasons. First, Holt-Winters was simpler to execute, whereas the double-differencing was essentially a conversion process to enable us to apply a simpler static-series technique to a seasonal-trending series. Second, Holt-Winters seemed to produce forecasts with very strong predictive power in this case. In other words, it was both simpler and better. based on the both methods I prefer to use the Holt Winter’s method because easy to execute and the resultsin Holt winters is very strong predictive

Problem 5 g an even simpler approach is naive sasonal from that we can compare

DeptSnaive <- DepTrain[17:20]
#Let's plot our seasonal naive forecast against the actual time series:
plot(c(1, 7), c(48000,114500), type="n", xlab="Year",  ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
#Add the time series 
lines(DepStoreTs, bty="l")
#Add the seasonal naive forecast
lines(seq(6, 6.75, .25), DeptSnaive, col="red", lwd=2, lty=2)

As expected, the seasonal naive forecast offers a decent baseline. It’s not as strongly predictive as the Holt-Winters approach, but much simpler to compute. Depending on what we’re doing (aka, domain knowledge), seasonal naive may be “good enough” for the task at hand.

Problem 8

Which forecasting method would you choose if you had to choose the same method for all series? Why?

After carefully observing trends of 6 types of wine sales. we identify some seasonality and different kind of trends. If we asked to the same method then one which may do the job with less risk of under/over estimating is Seasoanl. I will choose Holt-Winter’s Exponential Smoothing method .

Problem 8

Fortified wine has the largest market share of the six types of wine considered. You are asked to focus on fortified wine sales alone and produce as accurate as possible forecasts for the next 2 months

AustralianWines.data <- read.csv("C:/Users/Yusuf/AustralianWines.csv", header = TRUE, stringsAsFactors = FALSE)
fortifiedwines.ts <- ts(AustralianWines.data$Fortified, start = c(1980,1), end = c(1994, 12), freq = 12)


totalRecords <- length(fortifiedwines.ts)
nValidationRecords <- 12
nTrainigRecords <- totalRecords - nValidationRecords
train.ts <- window(fortifiedwines.ts, start = c(1980,1), end = c(1980,nTrainigRecords))
valid.ts <- window(fortifiedwines.ts, start = c (1980,nTrainigRecords+1), end = c(1980,totalRecords))

#Linear Trend Model with Season 

train.lm <- tslm(train.ts ~ trend + season)
# based on this model try to predict next 12 records i.e. validation period
train.lm.pred <- forecast(train.lm, h = nValidationRecords, level = 0)

## the model
fortifiedwines.lm.tns <- tslm(fortifiedwines.ts ~ trend + season)
fortifiedwines.lm.tns.pred <- forecast(fortifiedwines.lm.tns, h = 2, level = 0)
fortifiedwines.lm.tns.pred

##          Point Forecast     Lo 0     Hi 0
## Jan 1995        836.631  836.631  836.631
## Feb 1995       1191.564 1191.564 1191.564

train.lm.pred

##          Point Forecast      Lo 0      Hi 0
## Jan 1994       918.8819  918.8819  918.8819
## Feb 1994      1269.5962 1269.5962 1269.5962
## Mar 1994      1690.0247 1690.0247 1690.0247
## Apr 1994      1936.0247 1936.0247 1936.0247
## May 1994      2496.5962 2496.5962 2496.5962
## Jun 1994      2558.0247 2558.0247 2558.0247
## Jul 1994      3266.3819 3266.3819 3266.3819
## Aug 1994      2962.1676 2962.1676 2962.1676
## Sep 1994      1951.2390 1951.2390 1951.2390
## Oct 1994      1747.3104 1747.3104 1747.3104
## Nov 1994      2188.3104 2188.3104 2188.3104
## Dec 1994      2387.4533 2387.4533 2387.4533

plot(train.lm.pred)

accuracy(train.lm.pred$mean[1:2], valid.ts[1:2])

##               ME     RMSE     MAE      MPE     MAPE
## Test set 266.761 268.6312 266.761 19.70252 19.70252

plot(train.lm.pred$residuals)

problem 8

plot(train.lm.pred$residuals)

1.December (month 12) is not captured well by the model. the large residuals on the plot show above and its appear on the seasonality will have accuracy on the 12 months in training period

2.There is a strong correlation between sales on the same calendar month. That mean there is a prove of the seasonality and we can use this prove to evaluate the time series from the original instead of the Holt Winters .

3.The model does not capture seasonality well. This is true for accuracy sine it shows the residual and the seasonality

4.We should first deaseasonalize the data and then apply Holt -winters exponential smoothing Maybe we can ignore this statement and use the Holts winters .Holt-Winters is appropriate for a time series seasonality.

8cii

As we can see we have a large error on the month of Dec the lead to fact maybe we have more than one seasonal cycle. And the can be handle with exponential smoothing

Assignment-4

Yusuf Sultan

March 26, 2017

Problem 2

Relationship between moving average and exponential smoothing:

Problem 5