HW 5

7.1, 7.5,7.6, 7.7, 7.8 and 7.9

7.1 Consider the pigs series — the number of pigs slaughtered in Victoria each month.

library(fpp2)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## -- Attaching packages ---------------------------------------------- fpp2 2.4 --

## v ggplot2   3.3.3     v fma       2.4  
## v forecast  8.13      v expsmooth 2.3

##

library(seasonal)
library(kableExtra)

## Warning: package 'kableExtra' was built under R version 4.0.4

Use the ses() function in R to find the optimal values of \(\alpha\) \(\ell_0\) and generate forecasts for the next four months.

alpha = 0.2971, sigma: 10308.58

forecast <- ses(pigs,h=4) 
forecast

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

print(forecast$model)

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by

s <- sd(forecast$residuals)
mean_fc <- forecast$mean[1]


int_fc <- c(mean_fc - (1.96*s),mean_fc + (1.96*s))
int_fc

## [1]  78679.97 118952.84

7.5 Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

Plot the series and discuss the main features of the data.

autoplot(books) +
  ggtitle('Daily sales of paper and hardcover books sale')

b. Use the ses() function to forecast each series, and plot the forecasts.

pb_ses <- ses(books[, "Paperback"], h = 4)
hc_ses <- ses(books[, "Hardcover"], h = 4)

autoplot(books[, "Paperback"], series = "Paperback") +
  autolayer(pb_ses, series = "Paperback", PI = FALSE) +
  autolayer(books[, "Hardcover"], series = "Hardcover") +
  autolayer(hc_ses, series = "Hardcover", PI=FALSE) +
  ylab("Books Sale") +
  ggtitle("Daily Paperback and hardcover books sales")

c. Compute the RMSE values for the training data in each case.

sqrt(mean(pb_ses$residuals^2))

## [1] 33.63769

sqrt(mean(hc_ses $residuals^2))

## [1] 31.93101

7.6. will continue with the daily sales of paperback and hardcover books in data set books.

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

holtpaperback <- holt(books[, "Paperback"], h = 4)
holthardcover <- holt(books[, "Hardcover"], h = 4)


autoplot(books[, "Paperback"])

autoplot(books[,"Hardcover"])

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

holtpaperback <- holt(books[,1], h=4)
forecast(holtpaperback)

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

holthardcover <- holt(books[,2], h=4)
forecast(holthardcover)

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

round(accuracy(holtpaperback), 2)

##                 ME  RMSE   MAE   MPE  MAPE MASE  ACF1
## Training set -3.72 31.14 26.18 -5.51 15.58 0.66 -0.18

round(accuracy(holthardcover), 2)

##                 ME  RMSE   MAE   MPE  MAPE MASE  ACF1
## Training set -0.14 27.19 23.16 -2.11 12.16 0.69 -0.03

Compare the forecasts for the two series using both methods. Which do you think is best?

Based on the RMSE values and the fact that underlying time series has trend component, I will say forecast generated from Holt’s linear method is best

d.Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

paste("95% PI of the paperback sales were calculated by holt function")

## [1] "95% PI of the paperback sales were calculated by holt function"

holtpaperback$upper[1, "95%"]

##      95% 
## 275.0205

holtpaperback$lower[1, "95%"]

##     95% 
## 143.913

paste("95% PI of the paperback sales were calculated by formula")

## [1] "95% PI of the paperback sales were calculated by formula"

#holtpaperback$mean[1] + 1.96*s_paperback

#holtpaperback$mean[1] - 1.96*s_paperback

paste("95% PI of the hardcover sales were calculated by holt function")

## [1] "95% PI of the hardcover sales were calculated by holt function"

holthardcover$upper[1, "95%"]

##      95% 
## 307.4256

holthardcover$lower[1, "95%"]

##      95% 
## 192.9222

paste("95% PI of the hardcover sales were calculated by formula")

## [1] "95% PI of the hardcover sales were calculated by formula"

#holthardcover$mean[1] + 1.96* s_hardcover

#holthardcover$mean[1] - 1.96*s_hardcover

7.7 For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

From Comparing the three models they have revealed the RMSE scores were similar, however, the holt w/ the box-cox has a slight edge from the different methods. I believe the holt w/ boxcox will be the best model to use for this example.

head(eggs)

## Time Series:
## Start = 1900 
## End = 1905 
## Frequency = 1 
## [1] 276.79 315.42 314.87 321.25 314.54 317.92

h <- 100

holteggs <- holt(eggs, h=h)
boxcox_eggs <- holt(eggs, h=h, lambda=TRUE)
dampedeggs <- holt(eggs, h=h, damped=TRUE)
dampedbox_cox_eggs <- holt(eggs, h=h, damped=TRUE, lambda=TRUE)
exponentialeggs <- holt(eggs, h=h, exponential=TRUE)

autoplot(holteggs) + ggtitle("Holt's Method")

autoplot(dampedeggs) + ggtitle("Damped")

autoplot(boxcox_eggs) + ggtitle("Box-Cox")

autoplot(dampedbox_cox_eggs) + ggtitle("Damped & Box-Cox")

autoplot(exponentialeggs) + ggtitle("Exponential")

g_rmse <- function(model){
  accuracy(model)[2]
}

Model <- c("Holt's Linear", 
           "Box-Cox Transformed",
           "Damped",
           "Damped and Box-Cox",
           "Exponential")

RMSE <- c(g_rmse(holteggs), 
          g_rmse(boxcox_eggs),
          g_rmse(dampedeggs),
          g_rmse(dampedbox_cox_eggs),
          g_rmse(exponentialeggs))

eggs_rmse_df <- data.frame(Model, RMSE) 

eggs_rmse_df %>%
  kable() %>%
  kable_styling()

Model	RMSE
Holt’s Linear	26.58219
Box-Cox Transformed	26.55504
Damped	26.54019
Damped and Box-Cox	26.73445
Exponential	26.49795

7.8 Recall your retail time series data (from Exercise 3 in Section 2.10)

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],
  frequency=12, start=c(1982,4))

autoplot(myts)

Why is multiplicative seasonality necessary for this series?

The multiplicative is necessary in this series because the seasonality variations are charging w/ increase in time. Multiplicative seasonality variations are not constant and extra method can handle constant seasonal variations only.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

hw_multi <- hw(myts, seasonal = "multiplicative")
autoplot(hw_multi) + 
  ggtitle("Multiplicative") + 
  ylab("Retail Sales")

summary(hw_multi)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.3253 
##     beta  = 0.0129 
##     gamma = 0.0255 
## 
##   Initial states:
##     l = 304.256 
##     b = 2.1149 
##     s = 1.0213 0.9379 1.0298 1.1195 1.015 1.0226
##            0.9666 1.0035 0.9715 0.9468 0.9983 0.9673
## 
##   sigma:  0.0275
## 
##      AIC     AICc      BIC 
## 4769.995 4771.681 4837.022 
## 
## Error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.9212824 25.20381 18.77683 0.06856226 1.979315 0.3016982
##                    ACF1
## Training set -0.1217931
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       2320.673 2238.928 2402.418 2195.655 2445.691
## Feb 2014       2146.714 2066.909 2226.519 2024.663 2268.765
## Mar 2014       2333.130 2241.768 2424.492 2193.404 2472.856
## Apr 2014       2232.738 2140.809 2324.666 2092.145 2373.330
## May 2014       2286.119 2187.318 2384.920 2135.016 2437.222
## Jun 2014       2198.216 2098.655 2297.777 2045.951 2350.482
## Jul 2014       2267.353 2159.893 2374.812 2103.008 2431.697
## Aug 2014       2319.664 2204.785 2434.543 2143.972 2495.356
## Sep 2014       2260.000 2143.202 2376.799 2081.373 2438.628
## Oct 2014       2382.065 2253.759 2510.371 2185.838 2578.292
## Nov 2014       2379.754 2246.319 2513.190 2175.682 2583.827
## Dec 2014       2657.668 2502.716 2812.620 2420.689 2894.647
## Jan 2015       2402.219 2256.142 2548.296 2178.813 2625.624
## Feb 2015       2221.927 2081.767 2362.086 2007.572 2436.282
## Mar 2015       2414.636 2256.781 2572.491 2173.218 2656.055
## Apr 2015       2310.511 2154.110 2466.911 2071.316 2549.705
## May 2015       2365.521 2199.864 2531.179 2112.170 2618.872
## Jun 2015       2274.345 2109.704 2438.986 2022.548 2526.142
## Jul 2015       2345.650 2170.260 2521.039 2077.415 2613.885
## Aug 2015       2399.538 2214.355 2584.721 2116.325 2682.751
## Sep 2015       2337.597 2151.532 2523.663 2053.035 2622.159
## Oct 2015       2463.620 2261.508 2665.732 2154.516 2772.724
## Nov 2015       2460.998 2253.045 2668.952 2142.960 2779.036
## Dec 2015       2748.143 2509.110 2987.175 2382.574 3113.712

hw_multi_damped <- hw(myts, seasonal = "multiplicative", damped = TRUE)
autoplot(hw_multi_damped) + 
  ggtitle("Multiplicative & Damped") + 
  ylab("Retail Sales")

summary(hw_multi_damped)

## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.3632 
##     beta  = 0.0321 
##     gamma = 0.0141 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 304.2559 
##     b = 2.7033 
##     s = 1.0191 0.9312 1.0265 1.1253 1.018 1.0257
##            0.9723 0.997 0.9821 0.9424 0.9851 0.9751
## 
##   sigma:  0.0279
## 
##      AIC     AICc      BIC 
## 4781.286 4783.175 4852.256 
## 
## Error measures:
##                  ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 2.9059 25.10059 18.74334 0.2366077 1.993423 0.3011601 -0.1394101
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       2321.366 2238.340 2404.392 2194.389 2448.343
## Feb 2014       2135.288 2053.195 2217.381 2009.737 2260.838
## Mar 2014       2327.785 2231.563 2424.006 2180.627 2474.943
## Apr 2014       2236.621 2137.288 2335.954 2084.704 2388.538
## May 2014       2268.949 2160.829 2377.070 2103.594 2434.305
## Jun 2014       2182.623 2071.229 2294.018 2012.261 2352.986
## Jul 2014       2268.891 2145.125 2392.658 2079.606 2458.176
## Aug 2014       2307.530 2173.289 2441.771 2102.226 2512.834
## Sep 2014       2257.437 2117.698 2397.175 2043.724 2471.149
## Oct 2014       2380.566 2224.122 2537.011 2141.305 2619.827
## Nov 2014       2375.984 2210.584 2541.383 2123.027 2628.940
## Dec 2014       2649.063 2454.144 2843.981 2350.961 2947.165
## Jan 2015       2393.562 2207.407 2579.717 2108.863 2678.262
## Feb 2015       2200.181 2020.088 2380.275 1924.752 2475.611
## Mar 2015       2396.919 2190.814 2603.023 2081.709 2712.128
## Apr 2015       2301.540 2094.016 2509.065 1984.159 2618.922
## May 2015       2333.316 2113.073 2553.560 1996.483 2670.150
## Jun 2015       2243.144 2021.853 2464.435 1904.708 2581.580
## Jul 2015       2330.388 2090.478 2570.297 1963.477 2697.298
## Aug 2015       2368.669 2114.568 2622.770 1980.055 2757.283
## Sep 2015       2315.909 2057.380 2574.437 1920.524 2711.294
## Oct 2015       2440.850 2157.681 2724.019 2007.780 2873.919
## Nov 2015       2434.810 2141.619 2728.001 1986.413 2883.207
## Dec 2015       2713.191 2374.474 3051.908 2195.169 3231.214

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

Well the RMSE values are very similar. For the reason of, it takes the same amount of effort to get either, I would rather choose the Holt-Winters’ multiplicative method.

cat("RMSE of Multiplicative = ", accuracy(hw_multi)[2])

## RMSE of Multiplicative =  25.20381

cat("RMSE of Multiplicative & Damped = ", accuracy(hw_multi_damped)[2])

## RMSE of Multiplicative & Damped =  25.10059

Check that the residuals from the best method look like white noise.

checkresiduals(hw_multi)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 250.64, df = 8, p-value < 2.2e-16
## 
## Model df: 16.   Total lags used: 24

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?

my_timeseries_train <- window(myts, end = c(2010, 12))
my_timeseries_test <- window(myts, start = 2011)

my_timesseries_train_hw <- hw(my_timeseries_train, damped = TRUE, seasonal = "multiplicative")
accuracy(my_timesseries_train_hw, my_timeseries_test)

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set   4.089724 25.68057 18.92510  0.371369 2.057716 0.3068053
## Test set     -27.787782 41.08034 32.12435 -1.279976 1.487404 0.5207858
##                     ACF1 Theil's U
## Training set -0.05009924        NA
## Test set      0.13519074 0.3334222

my_timeseries_train_sn <- snaive(my_timeseries_train)
accuracy(my_timeseries_train_sn,my_timeseries_test)

##                    ME      RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000 0.6018274
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629 0.2686595
##              Theil's U
## Training set        NA
## Test set     0.9036205

7.9. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

boxc <- BoxCox.lambda(my_timeseries_train)
fc_boxc<-BoxCox(my_timeseries_train,boxc)
stl_boxc <- stlm(fc_boxc)
my_timeserires.boxc <- forecast(stl_boxc)
autoplot(my_timeserires.boxc)

HW 5

Amanda Arce

3/14/21