Predictive
Analytics

Question 7:1

Our first series concerns pigs slaughtered in Victoria. We fit a simple exponential smoothing model. The alpha and initial states found by the function are: $\alpha$ = .2971 and ${ \iota }_{ 0 }$ = 77260.06. Our value for $\hat { y } \pm Z * s$ has a narrower range than the ses function computes. We computed the standard deviation based on the full series and based on the last 3 years. The first months’estimates are 78611.97 and 119020.8. Ours are in the table below.

##  Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...

## [1] forecast for next 4 months:

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

##         Mar    Apr    May    Jun    Jul    Aug
## 1995 106723  84307 114896 106749  87892 100506


min. based on full residuals	78679.9711418255
max. based on full residuals	118952.848858174
min. based on last three years’ residuals	78969.952246465
max. based on last three years’ residuals	118662.867753535

Question 7:5

Our next series is a series of book sales, separated into two series for paperback and hardcover books. There are 30 values for each series. There is an upward trend for both that looks like it might be similar. But the overall change is much more for hardcover than for paperbacks. There is no apparent seasonality.

##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"

Our RMSE estimates are:


paperbacks	33.637686782912
hardcovers	31.9310149844547

Question 7:6

For the same series, we fit a Holt’s linear method forecast.

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

For this model, the RMSE is :


paperbacks-Holt linear	31.1369230162347
hardcovers-Holt linear	27.1935779818511

Our RMSE for the Holt’s linear smoothing model is less than for the simple exponential smoothing. We can’t directly compare them because they don’t have the same number of parameters. We would do better to use AIC.

Without a reason to use a more complicated, it appears the simpler ses model makes more sense. It also has a smaller confidence interval. With a confidence interval wide enough for the highest value to be over 50% more than the lowest value, one would have to be wary of both methods.

__________________

Instead of RMSE, we use standard deviation to compute our confidence error. The sd function uses n-1 in the denominator, which is more appropriate for sampled data. Our confidence intervals are more narrow than the function output provides.

##         Mar    Apr    May    Jun    Jul    Aug
## 1995 106723  84307 114896 106749  87892 100506


min. for paperbacks	147.839044920336
min. for paperbacks	271.094555079664
min. for hardcovers	195.963995925
max. for hardcovers	304.383804075

## [1] "paperbacks"

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

## [1] "hardcovers"

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

Question 7:7

Our eggs dataset comes from data for the price of a dozen eggs in te US between 1900 and 1993.

The price of eggs in this index appears to be inflation linked, with 1980 pegged to 100 as the basis. Source: https://www.thedailymeal.com/eat/egg-price-year-you-were-born-gallery

We look at Holt’s linear model for different levels of damping and different Box-Cox transformations. When we damp our prediction model, we limit the future effect for our model. Prediction is more difficult when we look further out and we might expect a trend to regress toward the mean. Damping helps us account for the uncertainty and brings a prediction closer back to a naive forecast.

All of our Box-Cox applied functions have a more rounded or scooped appearance. Applying a power function causes a relationship to curve. With a negative power, variance toward the end is minimized and we see a very narrow confidence interval. With a power near 2, it becomes wide and hard to predict. When we fit a Box-Cox transformation with an Mle estimate, we get a fairly reasonable estimate that looks slowed in its change. We end up with a low RMSE. The RMSE is applied to a low lambda transformation appears low, but the transformation made all of the values low. The optimal lambda model looks appropriate.

## [1] 0.3956183

RMSE for different inputs


no modification	0.0449908695633474
high damping	3.31114451284137
low damping	1.19650419858032
Box-Cox low lambda	1.13857756677201e-06
Box-Cox high lambda	45.1115980951156
Box-Cox MLE lambda	0.00247085256848212

Question 7:8

We turn back toward the retail series from previous homeworks. A multiplicative model is the best. When we look at the seasonal decomposition, we see that the seasonal component is not stable. The peaks get larger over time. The shape of the seasonality changes and becomes wider.

The RMSE is lower for the undamped model. A more complicated model will tend to fit training data better. Our concern is that in test data, it will not hold up as well if the additional trend is not called for.

## [1] RMSE for undamped Holt-Winter multiplicative model

## [1] 5.431613e-05

## [1] RMSE for damped Holt-Winter multiplicative model

## [1] 0.009815399

The residuals from our time series fit do not appear to be white noise. The variance appears to be decreasing over time. Our RMSE for the multiplicative Holt-Winter is considerably worse than RMSE for the seasonal naive method from 3:7.

## [1] 2.86398

##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01025188 0.9534696 0.7324588 -0.4347743 4.952780 0.5071923
## Test set     -2.36907333 2.8639797 2.5062068 -8.7973452 9.303342 1.7354270
##                    ACF1 Theil's U
## Training set 0.01839391        NA
## Test set     0.65462182 0.5562065

## [1] 1.24733

When we look at training splits for 6 month intervals, we see that the end of 2010 is not the optimal time for the multiplicative Holt-Winter prediction. The RMSE is not stable from time to time. If we split it 6 moths before, the seasonal naive method still works better, but by less.

Question 7:9

We want to test an stl model and an ETS model containing additive seasonality (not pictured) and an ETS model containing multiplicative seasonality. For both of the following models, mid-2010 yields the lowest RMSE just like the multiplicative Holt-Winter. Some interpret “followed by” to mean apply one model that uses the results of the first. The textbook doesn’t describe such a model. Without understanding what the results of applying one complicated modeling technique to another does, it seems risky.

## [1] "RMSE for STL model"

## [1] 3.304199

When we fit a ZZM model, an MNA model is chosen by the information criterion. The best RMSE, 1.55, is achieved with a split in the middle of 2010. We still are not able to achieve an RMSE better than that for a seasonal naive model. Given that it is also simpler, it is the best model to choose.

## ETS(M,N,A) 
## 
## Call:
##  ets(y = turnover.train, model = "ZZA") 
## 
##   Smoothing parameters:
##     alpha = 0.4512 
##     gamma = 0.3094 
## 
##   Initial states:
##     l = 5.2379 
##     s = -0.1283 -1.5478 -1.2293 2.2136 0.2006 -0.0526
##            -0.3004 -0.4622 0.0403 -0.1608 0.8024 0.6247
## 
##   sigma:  0.0687
## 
##      AIC     AICc      BIC 
## 2021.343 2022.802 2078.997

## [1] 1.879095

## [1] 1.550265

—————————————————————————

_Appendix______________________

str(pigs) exp_smooth<-ses(pigs,h=4) full_resids_stdev<-sd(exp_smooth$residuals) recent_three_stdev<-sd(window(exp_smooth$residuals, start=c(1992,9))) #full_resids_stdev #recent_three_stdev noquote(‘forecast for next 4 months:’) exp_smooth exp_smooth$model

ninety_five_ci_estimate_table<-c(“min. based on full residuals”,“max. based on full residuals”,“min. based on last three years’ residuals”,“max. based on last three years’ residuals”) ninety_five_ci_estimate_table<-cbind(ninety_five_ci_estimate_table,ninety_five_ci_estimate_table) colnames(ninety_five_ci_estimate_table)<-c(‘’,’’) ninety_five_ci_estimate_table[1,2]<-(98816.41-1.96full_resids_stdev) ninety_five_ci_estimate_table[2,2]<-(98816.41+1.96full_resids_stdev) ninety_five_ci_estimate_table[3,2]<-(98816.41-1.96recent_three_stdev) ninety_five_ci_estimate_table[4,2]<-(98816.41+1.96recent_three_stdev) tail(pigs) kable(ninety_five_ci_estimate_table, “html”) %>% kable_styling(“striped”, full_width = F) %>% column_spec(1, bold = T, color = “white”, background = “#73b587”) %>% column_spec(2, bold = T, color = “#73b587”, background = “white”)

str(books) books<-ts(books) autoplot(books)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#249382’,size=16))+ theme(panel.background = element_rect(fill = ‘#f4f4ef’),panel.grid.major = element_blank(), panel.grid.minor = element_blank())+xlab(‘’) paperback_ses<-ses(books[,1],h=4) hardcover_ses<-ses(books[,2],h=4) #plot ses for softback,hardcover autoplot(paperback_ses) + autolayer(fitted(paperback_ses), series=“Fitted”)+ ylab(“paperbacks smoothed”) + xlab(“”)+ylim(50,350)+ theme(panel.background = element_rect(fill =’#f4f4ef’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#656aa0’,size=16))+ggtitle(‘simple exponential smoothing’)

autoplot(hardcover_ses) + autolayer(fitted(hardcover_ses), series=“Fitted”)+ ylab(“hardcovers smoothed”) + xlab(“”)+ylim(50,350)+ theme(panel.background = element_rect(fill = ‘#f4f4ef’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#656aa0’,size=16))+ggtitle(‘simple exponential smoothing’)

Our RMSE estimates are:

#RMSE for paperback,hardcover ses rmse_table<-c(“paperbacks”,“hardcovers”) rmse_table<-cbind(rmse_table,rmse_table) colnames(rmse_table)<-c(‘’,’’) rmse_table[1,2]<-sqrt(mean(paperback_ses$residuals^2)) rmse_table[2,2]<-sqrt(mean(hardcover_ses$residuals^2)) kable(rmse_table, “html”) %>% kable_styling(“striped”, full_width = F) %>% column_spec(1, bold = T, color = “white”, background = “#73b587”) %>% column_spec(2, bold = T, color = “#73b587”, background = “white”)

paperback_holt <- holt(books[,1], h=4) hardcover_holt <- holt(books[,2], h=4)

paperback_holt hardcover_holt

rmse_table<-c(“paperbacks-Holt linear”,“hardcovers-Holt linear”) rmse_table<-cbind(rmse_table,rmse_table) colnames(rmse_table)<-c(‘’,’’) rmse_table[1,2]<-sqrt(mean(paperback_holt$residuals^2)) rmse_table[2,2]<-sqrt(mean(hardcover_holt$residuals^2)) kable(rmse_table, “html”) %>% kable_styling(“striped”, full_width = F) %>% column_spec(1, bold = T, color = “white”, background = “#73b587”) %>% column_spec(2, bold = T, color = “#73b587”, background = “white”)

autoplot(hardcover_ses) + autolayer(fitted(paperback_holt), series=“Fitted”)+ ylab(“paperbacks smoothed”) + xlab(“”)+ylim(50,350)+ theme(panel.background = element_rect(fill = ‘#f4f4ef’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#656aa0’,size=16))+ggtitle(“Holt’s linear”) autoplot(hardcover_ses) + autolayer(fitted(hardcover_holt), series=“Fitted”)+ ylab(“hardcovers smoothed”) + xlab(“”)+ylim(50,350)+ theme(panel.background = element_rect(fill = ‘#f4f4ef’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#656aa0’,size=16))+ggtitle(“Holt’s linear”)

pback_resids_stdev<-sd(paperback_holt$residuals) hcover_resids_stdev<-sd(hardcover_holt$residuals)

holt_estimate_table<-c(“min. for paperbacks”,“min. for paperbacks”,“min. for hardcovers”,“max. for hardcovers”) holt_estimate_table<-cbind(holt_estimate_table,holt_estimate_table) colnames(holt_estimate_table)<-c(‘’,’‘) holt_estimate_table[1,2]<-209.4668-1.96pback_resids_stdev holt_estimate_table[2,2]<-209.4668+1.96pback_resids_stdev holt_estimate_table[3,2]<-250.1739-1.96hcover_resids_stdev holt_estimate_table[4,2]<-250.1739+1.96hcover_resids_stdev tail(pigs) kable(holt_estimate_table, “html”) %>% kable_styling(“striped”, full_width = F) %>% column_spec(1, bold = T, color = “white”, background = “#73b587”) %>% column_spec(2, bold = T, color = “#73b587”, background = “white”) print(’paperbacks’) paperback_holt print(‘hardcovers’) hardcover_holt

autoplot(eggs) holt_no_mod<-holt(eggs,h=100) autoplot(holt_no_mod)+xlab(“Holt’s linear”)+ylab(‘’)+ggtitle(’‘) holt_high_damping<-holt(eggs, damped=TRUE, phi = 0.8,h=100) autoplot(holt_high_damping)+xlab(“Holt’s damped 80%”)+ylab(’‘)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=’#249382’,size=16))+ggtitle(‘’) holt_low_damping<-holt(eggs, damped=TRUE, phi = 0.98,h=100) autoplot(holt_low_damping)+xlab(“Holt’s damped 98%”)+ylab(’‘)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=’#249382’,size=16))+ggtitle(’’) holt_low_lambda<-holt(eggs, lambda=-2, h=100)

autoplot(holt_low_lambda)+xlab(“Holt’s - lambda -2”)+ylab(‘’)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=’#249382’,size=16))+ggtitle(‘’) holt_high_lambda<-holt(eggs, lambda=1.9, h=100) autoplot(holt_high_lambda)+xlab(“Holt’s - lambda 1.9”)+ylab(’‘)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=’#249382’,size=16))+ggtitle(‘’) BoxCox.lambda(eggs) holt_MLE_lambda<-holt(eggs, lambda=.3956, h=100) autoplot(holt_MLE_lambda)+xlab(“Holt’s - lambda optimal”)+ylab(’‘)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=’#249382’,size=16))+ggtitle(‘’) #compute the RMSE boxcox_damp_rmse_table<-c(’no modification’,‘high damping’,‘low damping’,‘Box-Cox low lambda’,‘Box-Cox high lambda’,‘Box-Cox MLE lambda’) boxcox_damp_rmse_table<-cbind(boxcox_damp_rmse_table,boxcox_damp_rmse_table) colnames(boxcox_damp_rmse_table)<-c(‘’,’’)

boxcox_damp_rmse_table[1,2]<-sqrt(mean(holt_no_mod$residuals)^2) boxcox_damp_rmse_table[2,2]<-sqrt(mean(holt_high_damping$residuals)^{2) boxcox_damp_rmse_table[3,2]<-sqrt(mean(holt_low_damping$residuals)^2) boxcox_damp_rmse_table[4,2]<-sqrt(mean(holt_low_lambda$residuals)}2) boxcox_damp_rmse_table[5,2]<-sqrt(mean(holt_high_lambda$residuals)^2) boxcox_damp_rmse_table[6,2]<-sqrt(mean(holt_MLE_lambda$residuals)^2) kable(boxcox_damp_rmse_table, “html”) %>% kable_styling(“striped”, full_width = F) %>% column_spec(1, bold = T, color = “white”, background = “#73b587”) %>% column_spec(2, bold = T, color = “#73b587”, background = “white”)

retaildata <- readxl::read_excel(“C:/Users/dawig/Desktop/Data624/retail.xlsx”, skip=1) turnover <- ts(retaildata[,“A3349608L”], frequency=12, start=c(1982,4)) autoplot(turnover, ylab=“turnover”, xlab=“”,color=‘#249382’)+ theme(panel.background = element_rect(fill = ‘#efeae8’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#249382’,size=16)) #check seasonality to see if multiplicative smoothing is useful autoplot(seas(turnover)) + xlab(“Year”) + ggtitle(“Seasonal Adjustment”) autoplot(hw(turnover,seasonal=‘multiplicative’), ylab=“holt winter multiplicative”, xlab=“”,color=‘#249382’)+ theme(panel.background = element_rect(fill = ‘#efeae8’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#249382’,size=16))

autoplot(hw(turnover,seasonal=“multiplicative”,damped=TRUE,phi=.8,h=50),xlab=(‘’),ylab=(’damped, 50 periods’))+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=‘#249382’,size=16)) noquote(‘RMSE for undamped Holt-Winter multiplicative model’) sqrt(mean(hw(turnover,seasonal=“multiplicative”,h=1)$residuals)^2) noquote('RMSE for damped Holt-Winter multiplicative model') sqrt(mean(hw(turnover,seasonal="multiplicative",damped=TRUE,phi=.8,h=1)$residuals)^2)

autoplot(hw(turnover,seasonal=“multiplicative”,h=1)$residuals,xlab=‘residuals’,ylab=‘’)+ theme(axis.text.x = element_text(angle = 30, hjust = .9),text = element_text(family = “corben”,color=’#249382’,size=16))

turnover.train <- window(turnover, end=c(2010,12)) turnover.test <- window(turnover, start=2011) accuracy(hw(turnover.train,seasonal=“multiplicative”),turnover.test)[4] accuracy(hw(turnover.train,seasonal=“multiplicative”),turnover.test) fc <- snaive(turnover.train) accuracy(fc,turnover.test)[4] RMSE_set<-rep(0,6) RMSE_set[4]<-accuracy(hw(turnover.train,seasonal=“multiplicative”),turnover.test)[4]

turnover.train <- window(turnover, end=c(2009,6)) turnover.test <- window(turnover, start=c(2009,7)) fc <- hw(turnover.train,seasonal=“multiplicative”) RMSE_set[1]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2009,12)) turnover.test <- window(turnover, start=2010) fc <- hw(turnover.train,seasonal=“multiplicative”) RMSE_set[2]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2010,6)) turnover.test <- window(turnover, start=c(2010,7)) fc <- hw(turnover.train,seasonal=“multiplicative”) RMSE_set[3]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2011,6)) turnover.test <- window(turnover, start=c(2011,7)) fc <-hw(turnover.train,seasonal=“multiplicative”) RMSE_set[5]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2012,12)) turnover.test <- window(turnover, start=2013) fc <- hw(turnover.train,seasonal=“multiplicative”) RMSE_set[6]<-accuracy(fc,turnover.test)[4]

ggplot()+geom_line(aes(y=RMSE_set,x=seq_along(RMSE_set)),size=1.5,color=‘#3dc666’)+scale_x_discrete(limits=c(2,4,6), labels=c(“Dec. 2009”,“Dec. 2010”,“Dec. 2011”))+labs(y=‘RMSE’,x=‘train/test split’)+theme(text = element_text(family = “corben”,color=‘#3dc666’,size=20))+ggtitle(‘HW multiplicative’)

lambda_input<-BoxCox.lambda(turnover) turnover.train <- window(turnover, end=c(2010,12)) turnover.test <- window(turnover, start=2011) stl_model<-stlf(turnover.train,lambda=lambda_input) print(‘RMSE for STL model’) accuracy(stl_model,turnover.test)[4] RMSE_set[4]<-accuracy(stl_model,turnover.test)[4] #———–

turnover.train <- window(turnover, end=c(2009,6)) turnover.test <- window(turnover, start=c(2009,7)) fc <- stlf(turnover.train,lambda=lambda_input) RMSE_set[1]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2009,12)) turnover.test <- window(turnover, start=2010) fc <- stlf(turnover.train,lambda=lambda_input) RMSE_set[2]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2010,6)) turnover.test <- window(turnover, start=c(2010,7)) fc <- stlf(turnover.train,lambda=lambda_input) RMSE_set[3]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2011,6)) turnover.test <- window(turnover, start=c(2011,7)) fc <-stlf(turnover.train,lambda=lambda_input) RMSE_set[5]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2012,12)) turnover.test <- window(turnover, start=2013) fc <- stlf(turnover.train,lambda=lambda_input) RMSE_set[6]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2010,12)) turnover.test <- window(turnover, start=2011) ets_mult_fit<-ets(turnover.train, model=“ZZA”) ets_mult_fit

ets_mult_fit %>% forecast(h=36)->ets_predict accuracy(ets_predict,turnover.test)[4] RMSE_set[4]<-accuracy(ets_predict,turnover.test)[4]

turnover.train <- window(turnover, end=c(2009,6)) turnover.test <- window(turnover, start=c(2009,7)) ets_mult_fit<-ets(turnover.train, model=“ZZA”) ets_mult_fit %>% forecast(h=36)->ets_predict RMSE_set[1]<-accuracy(ets_predict,turnover.test)[4]

turnover.train <- window(turnover, end=c(2009,12)) turnover.test <- window(turnover, start=2010) ets_mult_fit<-ets(turnover.train, model=“ZZA”) ets_mult_fit %>% forecast(h=36)->ets_predict RMSE_set[2]<-accuracy(ets_predict,turnover.test)[4]

turnover.train <- window(turnover, end=c(2010,6)) turnover.test <- window(turnover, start=c(2010,7)) ets_mult_fit<-ets(turnover.train, model=“ZZA”) ets_mult_fit %>% forecast(h=36)->ets_predict RMSE_set[3]<-accuracy(ets_predict,turnover.test)[4] accuracy(ets_predict,turnover.test)[4]

turnover.train <- window(turnover, end=c(2011,6)) turnover.test <- window(turnover, start=c(2011,7)) ets_mult_fit<-ets(turnover.train, model=“ZZA”) ets_mult_fit %>% forecast(h=36)->ets_predict RMSE_set[5]<-accuracy(ets_predict,turnover.test)[4]

turnover.train <- window(turnover, end=c(2012,12)) turnover.test <- window(turnover, start=2013) ets_mult_fit<-ets(turnover.train, model=“ZZA”) ets_mult_fit %>% forecast(h=36)->ets_predict RMSE_set[6]<-accuracy(ets_predict,turnover.test)[4]

Dan Wigodsky

Data 624 Homework 5

March 7, 2019

Question 7:1

Question 7:5

Question 7:6

For the same series, we fit a Holt’s linear method forecast.

Our RMSE for the Holt’s linear smoothing model is less than for the simple exponential smoothing. We can’t directly compare them because they don’t have the same number of parameters. We would do better to use AIC.

Without a reason to use a more complicated, it appears the simpler ses model makes more sense. It also has a smaller confidence interval. With a confidence interval wide enough for the highest value to be over 50% more than the lowest value, one would have to be wary of both methods.

__________________

Instead of RMSE, we use standard deviation to compute our confidence error. The sd function uses n-1 in the denominator, which is more appropriate for sampled data. Our confidence intervals are more narrow than the function output provides.

Question 7:7

Our eggs dataset comes from data for the price of a dozen eggs in te US between 1900 and 1993.

The price of eggs in this index appears to be inflation linked, with 1980 pegged to 100 as the basis. Source: https://www.thedailymeal.com/eat/egg-price-year-you-were-born-gallery

Question 7:8

We turn back toward the retail series from previous homeworks. A multiplicative model is the best. When we look at the seasonal decomposition, we see that the seasonal component is not stable. The peaks get larger over time. The shape of the seasonality changes and becomes wider.

The RMSE is lower for the undamped model. A more complicated model will tend to fit training data better. Our concern is that in test data, it will not hold up as well if the additional trend is not called for.

The residuals from our time series fit do not appear to be white noise. The variance appears to be decreasing over time. Our RMSE for the multiplicative Holt-Winter is considerably worse than RMSE for the seasonal naive method from 3:7.

When we look at training splits for 6 month intervals, we see that the end of 2010 is not the optimal time for the multiplicative Holt-Winter prediction. The RMSE is not stable from time to time. If we split it 6 moths before, the seasonal naive method still works better, but by less.

Question 7:9

When we fit a ZZM model, an MNA model is chosen by the information criterion. The best RMSE, 1.55, is achieved with a split in the middle of 2010. We still are not able to achieve an RMSE better than that for a seasonal naive model. Given that it is also simpler, it is the best model to choose.

—————————————————————————

_Appendix______________________

Dan Wigodsky

Data 624 Homework 5

March 7, 2019

Question 7:1

Question 7:5

Question 7:6

For the same series, we fit a Holt’s linear method forecast.

Our RMSE for the Holt’s linear smoothing model is less than for the simple exponential smoothing. We can’t directly compare them because they don’t have the same number of parameters. We would do better to use AIC.

Without a reason to use a more complicated, it appears the simpler ses model makes more sense. It also has a smaller confidence interval. With a confidence interval wide enough for the highest value to be over 50% more than the lowest value, one would have to be wary of both methods.

__________________

Instead of RMSE, we use standard deviation to compute our confidence error. The sd function uses n-1 in the denominator, which is more appropriate for sampled data. Our confidence intervals are more narrow than the function output provides.

Question 7:7

Our eggs dataset comes from data for the price of a dozen eggs in te US between 1900 and 1993.

The price of eggs in this index appears to be inflation linked, with 1980 pegged to 100 as the basis. Source: https://www.thedailymeal.com/eat/egg-price-year-you-were-born-gallery

Question 7:8

We turn back toward the retail series from previous homeworks. A multiplicative model is the best. When we look at the seasonal decomposition, we see that the seasonal component is not stable. The peaks get larger over time. The shape of the seasonality changes and becomes wider.

The RMSE is lower for the undamped model. A more complicated model will tend to fit training data better. Our concern is that in test data, it will not hold up as well if the additional trend is not called for.

The residuals from our time series fit do not appear to be white noise. The variance appears to be decreasing over time. Our RMSE for the multiplicative Holt-Winter is considerably worse than RMSE for the seasonal naive method from 3:7.

When we look at training splits for 6 month intervals, we see that the end of 2010 is not the optimal time for the multiplicative Holt-Winter prediction. The RMSE is not stable from time to time. If we split it 6 moths before, the seasonal naive method still works better, but by less.

Question 7:9

When we fit a ZZM model, an MNA model is chosen by the information criterion. The best RMSE, 1.55, is achieved with a split in the middle of 2010. We still are not able to achieve an RMSE better than that for a seasonal naive model. Given that it is also simpler, it is the best model to choose.

—————————————————————————

_______________________Appendix____________________________________________

_Appendix______________________