Assume the we apply a moving to series, using a very short windows span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?
The both method the moving average and simple exponential smoothing, the user must specify a single parameter: In moving averages, the window width (ω); and in exponential smoothing, the smoothing constant (α) In both cases, the parameter determines the importance of fresh information over older information. The two smoothers are approximately equal if the window width of the moving is equal to w=(2/α)-1 so find the smoothing constant should be to achieve an equivalent result we have to find (α) and insert the window width(ω) . |
getwd()
## [1] "C:/Users/Yusuf"
setwd("C:/Users/Yusuf")
# Create a time series
DepStore <- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
str(DepStore)
## 'data.frame': 24 obs. of 2 variables:
## $ Quarter: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sales : int 50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
head(DepStore)
## Quarter Sales
## 1 1 50147
## 2 2 49325
## 3 3 57048
## 4 4 76781
## 5 5 48617
## 6 6 50898
tail(DepStore)
## Quarter Sales
## 19 19 71486
## 20 20 92183
## 21 21 60800
## 22 22 64900
## 23 23 76997
## 24 24 103337
DepStoreTs <- ts(DepStore$Sales, start=c(1), frequency=4)
# Set up the plot with y range
y = range(DepStoreTs)
plot(c(1, 7), y, type = "n", xlab = "Year", ylab = "Quarterly Sales", bty = "l", ylim = c(20000, 120000), xaxt = "n", yaxt = "n")
# Add the time series
lines(DepStoreTs, bty = "l")
# Add the x-axis
axis(1, at = seq(1, 7), labels = format(seq(1, 7)))
# Add the y-axis
axis(2, at = seq(20000, 120000, 20000), labels = format(seq(20000, 120000, 20000)), las = 0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(10, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
# first we partion the data to give us validation and training period.
ValidLength <- 4
TrainLength <- length(DepStoreTs) - ValidLength
DepTrain <- window(DepStoreTs, end = c(1, TrainLength))
DepValid <- window(DepStoreTs, start = c(1, TrainLength+1))
# Raw plots
plot(DepTrain)
# plot the validation
plot(DepValid)
problem 5 b i
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
#The Holt-Winters Forecast smoother method
dept.hw <- ets(DepTrain, model = "ZZZ", alpha = 0.2, beta = 0.15, gamma = 0.05, restrict = FALSE)
dept.HWfor <- forecast(dept.hw, h=ValidLength, level=0)
#clean the plot
plot(c(1, 7), c(48000,114500), type="n", xlab="Year", ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
#the training period
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
# the validation period
text(6.5, 108000, "Validation")
#Add the time series
lines(DepStoreTs, bty="l")
#Add the fitted and forecasted
lines(dept.HWfor$fitted, lwd = 2, col = "Red")
lines(dept.HWfor$mean, lwd = 2, col = "Red", lty = 2)
# the Quarterly
dept.HWfor
## Point Forecast Lo 0 Hi 0
## 6 Q1 62115.16 62115.16 62115.16
## 6 Q2 65371.60 65371.60 65371.60
## 6 Q3 77076.87 77076.87 77076.87
## 6 Q4 102937.73 102937.73 102937.73
Problem 5 b ii
# accuracy function
accuracy(dept.HWfor$mean[1:2], DepValid[1:2])
## ME RMSE MAE MPE MAPE
## Test set -893.3803 987.939 893.3803 -1.444875 1.444875
Based on the results of the accuracy function, we see the MAPE for the test set (validation period) for quarters 21-22 was 1.44%.
problem 5 c
#Clean the plot
plot(c(1, 7), c(48000,114500), type="n", xlab="Year", ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
#time series
lines(DepStoreTs, bty="l")
#fitted and forecasted
lines(dept.HWfor$fitted, lwd = 2, col = "Red")
lines(dept.HWfor$mean, lwd = 2, col = "Red", lty = 2)
```
problem 5 d. using the differencing on eof the smoothing methods
#create lag-1 series
difflag1.ts <- diff(DepStoreTs, lag = 1)
difflag4.ts <- diff(DepStoreTs, lag = 4)
difflag1_4.ts <-diff(diff(DepStoreTs, lag = 4), lag = 1)
# plot the four time series'
# Set up the plot to have two rows and two columns
par(mfrow = c(3,2))
plot(DepStoreTs, ylab="Sales", xlab="Year", main="Original Series")
plot (difflag1.ts, ylab="Lag 1", xlab="Year", main="Lag 1 to remove trend")
plot (difflag4.ts, ylab="Lag 4", xlab="Year", main="Lag 4 to remove quarterly seasonality")
plot (difflag1_4.ts, ylab="Lag 1 and 4", xlab="Year", main="Double Differenced Series")
plot((diff(diff(DepStoreTs, lag = 1), lag = 4)), ylab="Lag 1 and 4", xlab="Year", main="Double Differenced Series opposite order")
problem 5
e
#Set up training and validation periods on double diff
DDIFTrain <- window(difflag1_4.ts, end=c(1, TrainLength))
DDIFValid <- window(difflag1_4.ts, start=c(1,TrainLength+1), end=c(1,TrainLength+ValidLength))
#plot
par(mfrow = c(1,2))
plot(DDIFTrain)
plot(DDIFValid)
par(mfrow = c(1,1))
#Create Double Differenced point forecasts based on the training period.
pointForecasts <- meanf(DDIFTrain, h=4)
pointForecasts
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2 569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q3 569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q4 569.2 -2116.935 3255.335 -3714.114 4852.514
# de-difference it
realForecasts <- vector()
for (i in 1:ValidLength) {
if(i == 1) {
realForecasts[i] <- pointForecasts$mean[i] + DepTrain[(TrainLength+i)-ValidLength] + (DepTrain[TrainLength] - DepTrain[TrainLength - ValidLength])
} else {
realForecasts[i] <- pointForecasts$mean[i] + DepTrain[(TrainLength+i)-ValidLength] + (realForecasts[i-1] - DepTrain[TrainLength+i-1-ValidLength])
}
}
#print real forecast
realForecasts
## [1] 63982.2 68177.4 80201.6 101467.8
plot(realForecasts, type = 'l', bty = "l")
# plot our double-differenced forecast against the actual time series:
plot(c(1, 7), c(48000,114500), type="n", xlab="Year", ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
#Add the time series
lines(DepStoreTs, bty="l")
lines(seq(6, 6.75, .25), realForecasts, col="blue", lwd=2, lty=2)
realForecasts[1:2]
## [1] 63982.2 68177.4
The plot above includes the forecasts we produced by double-differencing for all four quarters, but we did print the values for just quarters 21 and 22.
Problem 5
f
In comparing our results from the Holt Winters technique and the double-differencing technique, I would select Holt Winters for two reasons. First, Holt-Winters was simpler to execute, whereas the double-differencing was essentially a conversion process to enable us to apply a simpler static-series technique to a seasonal-trending series. Second, Holt-Winters seemed to produce forecasts with very strong predictive power in this case. In other words, it was both simpler and better. based on the both methods I prefer to use the Holt Winter’s method because easy to execute and the resultsin Holt winters is very strong predictive
Problem 5 g an even simpler approach is naive sasonal from that we can compare
DeptSnaive <- DepTrain[17:20]
#Let's plot our seasonal naive forecast against the actual time series:
plot(c(1, 7), c(48000,114500), type="n", xlab="Year", ylab="Quarterly Department Store Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
# Add the y-axis
axis(2, at=seq(48000, 114500, 9500), labels=format(seq(48000, 114500, 9500)), las=0)
abline(v=6)
arrows(1, 105000, 6, 105000, code=3, length=0.1)
text(3, 108000, "Training")
abline(v=7)
arrows(6, 105000, 7, 105000, code=3, length=0.1)
text(6.5, 108000, "Validation")
#Add the time series
lines(DepStoreTs, bty="l")
#Add the seasonal naive forecast
lines(seq(6, 6.75, .25), DeptSnaive, col="red", lwd=2, lty=2)
As expected, the seasonal naive forecast offers a decent baseline. It’s not as strongly predictive as the Holt-Winters approach, but much simpler to compute. Depending on what we’re doing (aka, domain knowledge), seasonal naive may be “good enough” for the task at hand.
Problem 8
After carefully observing trends of 6 types of wine sales. we identify some seasonality and different kind of trends. If we asked to the same method then one which may do the job with less risk of under/over estimating is Seasoanl. I will choose Holt-Winter’s Exponential Smoothing method .
Problem 8
AustralianWines.data <- read.csv("C:/Users/Yusuf/AustralianWines.csv", header = TRUE, stringsAsFactors = FALSE)
fortifiedwines.ts <- ts(AustralianWines.data$Fortified, start = c(1980,1), end = c(1994, 12), freq = 12)
totalRecords <- length(fortifiedwines.ts)
nValidationRecords <- 12
nTrainigRecords <- totalRecords - nValidationRecords
train.ts <- window(fortifiedwines.ts, start = c(1980,1), end = c(1980,nTrainigRecords))
valid.ts <- window(fortifiedwines.ts, start = c (1980,nTrainigRecords+1), end = c(1980,totalRecords))
#Linear Trend Model with Season
train.lm <- tslm(train.ts ~ trend + season)
# based on this model try to predict next 12 records i.e. validation period
train.lm.pred <- forecast(train.lm, h = nValidationRecords, level = 0)
## the model
fortifiedwines.lm.tns <- tslm(fortifiedwines.ts ~ trend + season)
fortifiedwines.lm.tns.pred <- forecast(fortifiedwines.lm.tns, h = 2, level = 0)
fortifiedwines.lm.tns.pred
## Point Forecast Lo 0 Hi 0
## Jan 1995 836.631 836.631 836.631
## Feb 1995 1191.564 1191.564 1191.564
train.lm.pred
## Point Forecast Lo 0 Hi 0
## Jan 1994 918.8819 918.8819 918.8819
## Feb 1994 1269.5962 1269.5962 1269.5962
## Mar 1994 1690.0247 1690.0247 1690.0247
## Apr 1994 1936.0247 1936.0247 1936.0247
## May 1994 2496.5962 2496.5962 2496.5962
## Jun 1994 2558.0247 2558.0247 2558.0247
## Jul 1994 3266.3819 3266.3819 3266.3819
## Aug 1994 2962.1676 2962.1676 2962.1676
## Sep 1994 1951.2390 1951.2390 1951.2390
## Oct 1994 1747.3104 1747.3104 1747.3104
## Nov 1994 2188.3104 2188.3104 2188.3104
## Dec 1994 2387.4533 2387.4533 2387.4533
plot(train.lm.pred)
accuracy(train.lm.pred$mean[1:2], valid.ts[1:2])
## ME RMSE MAE MPE MAPE
## Test set 266.761 268.6312 266.761 19.70252 19.70252
plot(train.lm.pred$residuals)
problem 8
ci
plot(train.lm.pred$residuals)
1.December (month 12) is not captured well by the model. the large residuals on the plot show above and its appear on the seasonality will have accuracy on the 12 months in training period
2.There is a strong correlation between sales on the same calendar month. That mean there is a prove of the seasonality and we can use this prove to evaluate the time series from the original instead of the Holt Winters .
3.The model does not capture seasonality well. This is true for accuracy sine it shows the residual and the seasonality
4.We should first deaseasonalize the data and then apply Holt -winters exponential smoothing Maybe we can ignore this statement and use the Holts winters .Holt-Winters is appropriate for a time series seasonality.
8cii
As we can see we have a large error on the month of Dec the lead to fact maybe we have more than one seasonal cycle. And the can be handle with exponential smoothing