Load packages and data

library(fpp3)
bacon_data <- readxl::read_excel("//Users//colinadams//Documents//GCSU//Fall 2022//Forecasting//Forecasts//Forecast 4//Bacon Price.xlsx")

Converting data to a tsibble, plotting, and multiplicative decomposition

bacon <- bacon_data %>%
  mutate(Date = yearmonth(Date)) %>%
  as_tsibble(index = Date)

bacon_train <- bacon %>%
  filter(year(Date) < 2016)

bacon_test <- bacon %>%
  filter(year(Date) > 2015)

bacon %>%
  ggplot(aes(x = Date, y = Price)) +
  geom_line(aes(y = Price)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(y = "Price($)")
## `geom_smooth()` using formula 'y ~ x'

bacon_stl <- bacon %>%
  model(classical_decomposition(Price, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Bacon Price")
bacon_stl
## Warning: Removed 6 row(s) containing missing values (geom_path).

After importing the data I turned it into a tsibble to make it usable for forecasting and the other things I want to do with it. I developed test and train data to run the accuracy() function on (shown next). I plotted the data with a linear fit to get to know what I am working with. I also did a multiplicative classical decomposition to check for seasonality. I immediately noticed an insubstantial seasonality to the data based off the decomposition. In the plot, I was able to tell a clear upward tend in the data. Again I do not see seasonality nor cycles. The data does appear to be a random walk along the upward trend.

Looking at accuracy measures for Component, Holt, and Holt-Winter’s(Damped, Additive, Multiplicative)

bacon_bunch1 <- bacon_train %>%
  stretch_tsibble(.init = 10) %>%
  model(
    Component = ETS(Price ~ error("A") + trend("N") + season("N")),
    Holt = ETS(Price ~ error("A") + trend("A") + season("N")),
    Damped = ETS(Price ~ error("A") + trend("Ad") + season("N")),
    Additive = ETS(Price ~ error("A") + trend("A") + season("A")),
    Multiplicative = ETS(Price ~ error("M") + trend("A") + season("M"))) %>%
  forecast(h = 1)
## Warning: 8 errors (2 unique) encountered for Additive
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.
## Warning: 8 errors (2 unique) encountered for Multiplicative
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.
bacon_bunch4 <- bacon_train %>%
  stretch_tsibble(.init = 10) %>%
  model(
    Component = ETS(Price ~ error("A") + trend("N") + season("N")),
    Holt = ETS(Price ~ error("A") + trend("A") + season("N")),
    Damped = ETS(Price ~ error("A") + trend("Ad") + season("N")),
    Additive = ETS(Price ~ error("A") + trend("A") + season("A")),
    Multiplicative = ETS(Price ~ error("M") + trend("A") + season("M"))) %>%
  forecast(h = 4)
## Warning: 8 errors (2 unique) encountered for Additive
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.
## 
## 8 errors (2 unique) encountered for Multiplicative
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.
accuracy(bacon_bunch1, bacon_test, list(rmse = RMSE, mae = MAE, mape = MAPE))
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 182 observations are missing between 2000 Nov and 2015 Dec
## # A tibble: 5 × 5
##   .model         .type   rmse    mae  mape
##   <chr>          <chr>  <dbl>  <dbl> <dbl>
## 1 Additive       Test  0.110  0.110   1.94
## 2 Component      Test  0.0780 0.0780  1.38
## 3 Damped         Test  0.0869 0.0869  1.54
## 4 Holt           Test  0.106  0.106   1.87
## 5 Multiplicative Test  0.125  0.125   2.22
accuracy(bacon_bunch4, bacon_test, list(rmse = RMSE, mae = MAE, mape = MAPE))
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 182 observations are missing between 2000 Nov and 2015 Dec
## # A tibble: 5 × 5
##   .model         .type  rmse   mae  mape
##   <chr>          <chr> <dbl> <dbl> <dbl>
## 1 Additive       Test  0.299 0.257  4.69
## 2 Component      Test  0.297 0.260  4.74
## 3 Damped         Test  0.396 0.352  6.41
## 4 Holt           Test  0.349 0.317  5.77
## 5 Multiplicative Test  0.261 0.227  4.14

After learning about the data I am using, I wanted to see the accuracy, based off primarily RMSE, of each of the exponential smoothing techniques we have covered so far. I decided to check the accuracy using cross-validation for a forecast both one period and four periods ahead. I wanted to check one period ahead as a baseline and four ahead as that was the amount I was forecasting ahead. I checked component, Holt, Damped, Holt-Winter’s (Additive), and Holt-Winter’s (Multiplicative), Using my accuracy measures (RMSE, MAE, and MAPE), I saw some conflicting results. When I used the cross-validation one period ahead I found the Component forecast as the most accurate. When I used the cross-validation four periods ahead I found the Multiplicative to be the most accurate. I decided to use the Multiplicative in my forecast in part because of this. I chose not to use the Component forecast because it only included level and the naive nature of it does not fit my observations. There is no place in my data where four observations are at the same level, like what would be given by the component forecast.

Below I calculated my fitted values based off each method listed. I plotted the data with a forecast four months ahead for each. I did this to see if my forecast followed what the actual data looked like. I also ran the ljung-box test to check for white noise in my residuals. I then checked the AICc for each as another measure of my fit and accuracy. I included the point forecast and the 80% prediction interval for each method four periods ahead. I explain some of the findings for each of them and go in depth on my chosen forecast the Holt-Winter’s Multiplicative method.

Component Form Model, Forecast, and Report

bacon_component_form <- bacon %>%
  model(ETS(Price ~ error("A") + trend("N") + season("N")))

bacon_component_form %>%
  forecast(h = 4) %>%
  autoplot(bacon) +
  geom_line(aes(y = .fitted), color = "red", data = augment(bacon_component_form)) +
  labs(y = "Bacon Price") +
  guides(color = "none")

augment(bacon_component_form) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model                                                     lb_stat lb_pvalue
##   <chr>                                                        <dbl>     <dbl>
## 1 "ETS(Price ~ error(\"A\") + trend(\"N\") + season(\"N\"))"    22.6    0.0123
report(bacon_component_form)
## Series: Price 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##      l[0]
##  2.750068
## 
##   sigma^2:  0.016
## 
##      AIC     AICc      BIC 
## 407.0060 407.0952 417.8344
bacon_component_form %>%
  forecast(h = 4) %>%
  hilo()
## # A tsibble: 4 x 6 [1M]
## # Key:       .model [1]
##   .model                         Date         Price .mean                  `80%`
##   <chr>                         <mth>        <dist> <dbl>                 <hilo>
## 1 "ETS(Price ~ error(\"A\")… 2022 Oct N(7.4, 0.016)  7.38 [7.220737, 7.545261]80
## 2 "ETS(Price ~ error(\"A\")… 2022 Nov N(7.4, 0.032)  7.38 [7.153538, 7.612460]80
## 3 "ETS(Price ~ error(\"A\")… 2022 Dec N(7.4, 0.048)  7.38 [7.101972, 7.664026]80
## 4 "ETS(Price ~ error(\"A\")… 2023 Jan N(7.4, 0.064)  7.38 [7.058500, 7.707498]80
## # … with 1 more variable: `95%` <hilo>
## # ℹ Use `colnames()` to see all variable names

By plotting the component form fitted values I see that it follows my data very well. You see a one period lag for each value as we would expect. The forecast is not bad. I think it is the best method for forecasting one month ahead, but I think for four periods ahead I can do better. Using the Ljung-box test, I see that my residuals are not statistically different from white noise which is good. The AICc is very similar to the other forecasts other than the Multiplicative.

Holt Method Model, Forecast, and Report

bacon_holt <- bacon %>%
  model(AAN = ETS(Price ~ error("A") + trend("A") + season("N")))

bacon_holt %>% 
  forecast(h = 4) %>%
  autoplot(bacon) +
  geom_line(aes(y = .fitted), color = "red", data = augment(bacon_holt)) +
  labs(y = "Price($)") +
  guides(color = "none")

augment(bacon_holt) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 AAN       23.1    0.0105
report(bacon_holt)
## Series: Price 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0001103611 
## 
##   Initial states:
##      l[0]       b[0]
##  2.499874 0.01890469
## 
##   sigma^2:  0.0161
## 
##      AIC     AICc      BIC 
## 409.4910 409.7157 427.5384
bacon_holt %>%
  forecast(h = 4) %>%
  hilo()
## # A tsibble: 4 x 6 [1M]
## # Key:       .model [1]
##   .model     Date         Price .mean                  `80%`
##   <chr>     <mth>        <dist> <dbl>                 <hilo>
## 1 AAN    2022 Oct N(7.4, 0.016)  7.40 [7.239464, 7.564289]80
## 2 AAN    2022 Nov N(7.4, 0.032)  7.42 [7.191065, 7.650440]80
## 3 AAN    2022 Dec N(7.4, 0.048)  7.44 [7.158309, 7.720947]80
## 4 AAN    2023 Jan N(7.5, 0.064)  7.46 [7.133649, 7.783358]80
## # … with 1 more variable: `95%` <hilo>
## # ℹ Use `colnames()` to see all variable names

The Holt method gives good fitted values. I think the fitted values are slightly more off than the component form based off the plot. I like that the forecast takes into account the upward trend in the data, but based off the RMSE shown earlier, it is not the best method. Using the Ljung-box test, I see that my residuals are not statistically different from white noise which is good. The AICc is very similar to the other forecasts other than the Multiplicative.

Damped Method Model, Forecast, and Report

bacon_damped <- bacon %>%
  model(`Damped Holt's method` = ETS(Price ~ error("A") + trend("Ad", phi = 0.9) + season("N")))

bacon_damped %>%
  forecast(h = 4) %>%
  autoplot(bacon) +
  geom_line(aes(y = .fitted), color = "red", data = augment(bacon_damped)) +
  labs(y = "Price($)") +
  guides(color = "none")

augment(bacon_damped) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model               lb_stat lb_pvalue
##   <chr>                  <dbl>     <dbl>
## 1 Damped Holt's method    23.9   0.00793
report(bacon_damped)
## Series: Price 
## Model: ETS(A,Ad,N) 
##   Smoothing parameters:
##     alpha = 0.9998997 
##     beta  = 0.06998557 
##     phi   = 0.9 
## 
##   Initial states:
##      l[0]       b[0]
##  2.546801 0.08026293
## 
##   sigma^2:  0.0161
## 
##      AIC     AICc      BIC 
## 409.7725 409.9972 427.8198
bacon_damped %>%
  forecast(h = 4) %>%
  hilo()
## # A tsibble: 4 x 6 [1M]
## # Key:       .model [1]
##   .model                   Date         Price .mean                  `80%`
##   <chr>                   <mth>        <dist> <dbl>                 <hilo>
## 1 Damped Holt's method 2022 Oct N(7.4, 0.016)  7.39 [7.231083, 7.556681]80
## 2 Damped Holt's method 2022 Nov N(7.4, 0.034)  7.40 [7.166093, 7.641258]80
## 3 Damped Holt's method 2022 Dec N(7.4, 0.055)  7.41 [7.113046, 7.711933]80
## 4 Damped Holt's method 2023 Jan N(7.4, 0.077)  7.42 [7.065480, 7.775365]80
## # … with 1 more variable: `95%` <hilo>
## # ℹ Use `colnames()` to see all variable names

The Damped method shows a very similar plot to first three methods I showed. The forecast is very similar to the Holt method but the trend is slightly looser. This led to a less steep prediction interval compared to the Holt method. I think this makes it slightly worse than the Holt method. Using the Ljung-box test, I see that my residuals are not statistically different from white noise which is good. The p-value given by the ljung-box test is very low, the lowest of any method I checked. The AICc is very similar to the other forecasts other than the Multiplicative.

Holt-Winter’s Additive Method Model, Forecast, and Report

bacon_add <- bacon %>%
  model(additive = ETS(Price ~ error("A") + trend("A") + season("A")))

bacon_add %>%
  forecast(h = 4) %>%
  autoplot(bacon) +
  geom_line(aes(y = .fitted), color = "red", data = augment(bacon_add)) +
  labs(y = "Price($)") +
  guides(color = "none")

augment(bacon_add) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 additive    15.5     0.116
report(bacon_add)
## Series: Price 
## Model: ETS(A,A,A) 
##   Smoothing parameters:
##     alpha = 0.9803652 
##     beta  = 0.0001007609 
##     gamma = 0.013936 
## 
##   Initial states:
##      l[0]       b[0]        s[0]       s[-1]      s[-2]     s[-3]     s[-4]
##  3.051095 0.01350681 -0.09506619 -0.03456496 0.09526095 0.1464967 0.1563638
##       s[-5]      s[-6]       s[-7]       s[-8]      s[-9]      s[-10]
##  0.07218908 0.02523609 -0.03567011 -0.08330036 -0.0904351 -0.07813024
##       s[-11]
##  -0.07837957
## 
##   sigma^2:  0.0151
## 
##      AIC     AICc      BIC 
## 404.9479 407.3479 466.3090
bacon_add %>%
  forecast(h = 4) %>%
  hilo()
## # A tsibble: 4 x 6 [1M]
## # Key:       .model [1]
##   .model       Date         Price .mean                  `80%`
##   <chr>       <mth>        <dist> <dbl>                 <hilo>
## 1 additive 2022 Oct N(7.3, 0.015)  7.34 [7.186163, 7.501558]80
## 2 additive 2022 Nov  N(7.2, 0.03)  7.23 [7.008111, 7.449813]80
## 3 additive 2022 Dec N(7.2, 0.044)  7.18 [6.911598, 7.450807]80
## 4 additive 2023 Jan N(7.2, 0.059)  7.20 [6.890578, 7.512198]80
## # … with 1 more variable: `95%` <hilo>
## # ℹ Use `colnames()` to see all variable names

The Holt-Winter’s Additive method shows the worst fitted values of any of the methods I have checked so far. The fitted values look much worse compared to the observed data than the Holt, Damped, and Component form methods. It was surprising to me that the prediction interval shows the price of bacon decreasing. This goes against the trend. While against the trend, it is possible that the level decreases. Using the ljung-box test, I see that the residuals are statistically different from white noise which confirms to me that this is the worst forecast I have tried so far. The AICc is very similar to the other forecasts other than the Multiplicative.

Holt-Winter’s Multiplicative Method Model, Forecast, and Report

bacon_multi <- bacon %>%
  model(multiplivative = ETS(Price ~ error("M") + trend("A") + season("M")))

bacon_multi %>% 
  forecast(h = 4) %>%
  autoplot(bacon) +
  geom_line(aes(y = .fitted), color = "red", data = augment(bacon_multi)) +
  labs(y = "Bacon Price") +
  guides(color = "none")

augment(bacon_multi) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model         lb_stat lb_pvalue
##   <chr>            <dbl>     <dbl>
## 1 multiplivative    18.5    0.0476
report(bacon_multi)
## Series: Price 
## Model: ETS(M,A,M) 
##   Smoothing parameters:
##     alpha = 0.9356425 
##     beta  = 0.000185909 
##     gamma = 0.06389516 
## 
##   Initial states:
##      l[0]       b[0]    s[0]     s[-1]    s[-2]    s[-3]    s[-4]    s[-5]
##  2.760093 0.01077252 0.98675 0.9918366 1.026525 1.032165 1.025509 1.008152
##     s[-6]   s[-7]     s[-8]    s[-9]    s[-10]    s[-11]
##  1.003132 0.99375 0.9878733 0.985513 0.9831575 0.9756359
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 361.8549 364.2549 423.2159
bacon_multi %>%
  forecast(h = 4) %>%
  hilo()
## # A tsibble: 4 x 6 [1M]
## # Key:       .model [1]
##   .model             Date         Price .mean                  `80%`
##   <chr>             <mth>        <dist> <dbl>                 <hilo>
## 1 multiplivative 2022 Oct N(7.3, 0.036)  7.34 [7.097398, 7.584216]80
## 2 multiplivative 2022 Nov N(7.2, 0.064)  7.17 [6.840345, 7.490803]80
## 3 multiplivative 2022 Dec N(7.1, 0.093)  7.10 [6.709517, 7.489646]80
## 4 multiplivative 2023 Jan  N(7.1, 0.12)  7.10 [6.648480, 7.543256]80
## # … with 1 more variable: `95%` <hilo>
## # ℹ Use `colnames()` to see all variable names

Plotting the fitted values of the Holt-Winter’s Multiplicative method look very good. I like that the data is not lagged such as the component form. The fitted values appear to follow the path of bacon prices almost exactly. Surprisingly the forecast shows the price of bacon decreasing in the next four months. This is against the trend, but given the pattern of of the data is most certainly possible. The level being much above the linear fit given at the beginning of this page shows how one could think bacon price is “due” to fall soon.

I checked for white noise using the ljung-box test and the results are good. Based off the results, I can say the residuals are not statistically different from white noise. This is another reason why I believe this forecast to be the best for this data.

When I did the report() function the AICc was much lower for the multiplicative method than any of the other ones I checked. This is another reason I chose this method over the rest.

Finally, I believe the multiplicative to be the best because the variance shown in the plots are proportional to the level of the series. Only for the first half of the data is the data have homoskedasticity. This is why I believe the multiplicative approach fits this data theoretically in addition to the many tests I ran on this page to show that it is the best.

I show the point forecast as well as the 80% prediction interval above for this forecast as well. It is as follows… October 2022: PI = 7.34, 80% = (7.097398, 7.584216) November 2022: PI = 7.17, 80% = (6.840345, 7.490803) December 2022: PI = 7.10, 80% = (6.709517, 7.489646) January 2023: PI = 7.10, 80% = (6.648480, 7.543256)