library(fpp2) #I don't have time to keep troubleshooting fpp3 package problems
library(fpp3)
library(forecast)
library(ggplot2)

Background

The purpose of this assignment was to explore the Exponential Smoothing exercises from Forecasting: Principles and Practice.


8.1

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

  • Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ, and generate forecasts for the next four months.
  • Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

We filter for Animal = “Pigs”, State = “Victoria” and for the months from the start of 2010 until 4 months before the close of 2018:

#determine exact filtering conditions, last date was December 2018
#summary(aus_livestock$Animal)
#summary(aus_livestock$State)

#filter for Animal = "Pigs", State = "Victoria"
vic_piggies <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  filter(Month >= yearmonth(as.Date("2010-01-01")),
         Month <= yearmonth(as.Date("2018-08-01")))

With properly filtered data we proceed to generate our simple exponential smoothing model and find optimal values for α and ℓ:

#generate ETS model and determine optimal alpha, l
vic_piggies$Count %>%
    ets(model = "ANN")
## ETS(A,N,N) 
## 
## Call:
##  ets(y = ., model = "ANN") 
## 
##   Smoothing parameters:
##     alpha = 0.2509 
## 
##   Initial states:
##     l = 66303.4832 
## 
##   sigma:  6969.013
## 
##      AIC     AICc      BIC 
## 2327.637 2327.877 2335.570

Because the fpp3 ETS() function gave continual problems that hours of troubleshooting did not resolve, I elected to proceed with the forecast package's ets() function.

α = 0.2509 and ℓ = 66303.4832. Next, we generate forecasts for 4 months and plot the result:

#generate forecasts for 4 months
vic_piggies$Count %>%
    ets(model = "ANN") %>%
    forecast(h = 4) -> vic_piggies_ses

#plot the result
vic_piggies_ses %>% autoplot()

Our forecast from the simple exponential smoothing model is, as expected, a flat level line, and it sits comfortably within the range of the data.
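
As a sanity check, here is a minimal sketch (mine, not part of the original analysis) of the SES recursion using the rounded fitted values reported above; the final smoothed level should land near the flat point forecast of roughly 97520:

#SES recursion by hand with the fitted alpha and initial level l0
alpha <- 0.2509
level <- 66303.4832                          #initial state l0
for (y in vic_piggies$Count) {
  level <- alpha * y + (1 - alpha) * level   #l_t = alpha*y_t + (1-alpha)*l_(t-1)
}
level                                        #the flat forecast for every horizon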

As a final step, we compare our own computation of the 95% prediction interval for the first forecast against the interval produced by R. We start with what R produces:

vic_piggies_ses
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 105       97520.14 88588.99 106451.3 83861.13 111179.2
## 106       97520.14 88312.21 106728.1 83437.82 111602.5
## 107       97520.14 88043.50 106996.8 83026.87 112013.4
## 108       97520.14 87782.21 107258.1 82627.25 112413.0

The last two columns of the first row are what we're interested in:

  • Lo 95 = 83861.13
  • Hi 95 = 111179.2

Compare to the 95% prediction interval computation:

#95% prediction interval computation
s <- sd(vic_piggies_ses$residuals)
vic_piggies_ses$mean[1] + 1.96 * s #upper 95%
## [1] 110907.1
vic_piggies_ses$mean[1] - 1.96 * s #lower 95%
## [1] 84133.14

We observe only a small difference between our computed bounds and those produced by R, which lends confidence to the model and the values R produced.
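
The gap likely comes from how the residual spread is estimated: R's interval is built from the model's estimated sigma (6969.013 in the output above), while our manual interval uses sd() of the residuals. A quick sketch to compare the two quantities:

#model sigma vs. sample sd of the residuals
fit <- ets(vic_piggies$Count, model = "ANN")
sqrt(fit$sigma2)    #sigma used by R's prediction intervals
sd(residuals(fit))  #sample sd used in the manual computation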

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

  • Plot the Exports series and discuss the main features of the data.
  • Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
  • Compute the RMSE values for the training data.
  • Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
  • Compare the forecasts from both methods. Which do you think is best?
  • Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

We filter for “Estonia”, prepare our training data by leaving the last 2 years to be forecast, and visualize Exports for Estonia:

#Filter for Exports for one country
global_economy %>% 
    filter(Country == 'Estonia') -> est_econ

#subset further for training data - 2016, 2017 to be forecast
est_econ %>%
    filter(Year <= 2015) -> est_train

#visualize the subset
est_econ %>%
  autoplot(Exports) +
  labs(y = "Exports", title = "Estonian Exports")

What we see in this series (and I have a little background here as an American-Estonian) is that it does not begin until the mid-90s, because Estonia did not regain independence until 1991. Data were likely not properly recorded before that point, so they are simply not reported.

Toward the start of the series, Estonia experienced a decline in Exports, then a climb and decline, then a fluctuating climb with a downward blip (due to the Great Recession), followed by remarkable growth and another steady decline.

We’re dealing with yearly data where there appears to be a trend but not seasonality.

We compute the RMSE values for the training data:

#store model - AIC 143.26
est_train$Exports %>%
    ets(model = "ANN") %>% 
    forecast(h = 2) -> est_ses

#calculate corresponding RMSE
sqrt(mean(est_ses$residuals^2))
## [1] 5.731807

We observe the RMSE for our simple exponential smoothing model above and proceed to compare it with the results of a Holt model:

#ETS (AAN) model - AIC 147.76
est_train$Exports %>%
    ets(model = "AAN")  %>% 
    forecast(h = 2) -> est_holt

#RMSE
sqrt(mean(est_holt$residuals^2))
## [1] 5.800228

The training RMSE is slightly smaller for the simple model than for the Holt model. With a forecast horizon of only 2 years, we get a limited snapshot of each model's out-of-sample behaviour; I believe the Holt model would perform better over a longer horizon because it captures the general upward trend of Estonian exports, while the simple model captures only the average level.
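
As a cross-check, the forecast package's accuracy() function reports the training RMSE (among other measures) directly, which should match the manual computations above:

#training-set accuracy measures, including RMSE, for each model
accuracy(est_ses)   #ETS(A,N,N)
accuracy(est_holt)  #ETS(A,A,N)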

Next we compare forecasts via visualization:

#ETS(A,N,N) simple model:
est_ses %>% autoplot()

#ETS(A,A,N) Holt model:
est_holt %>% autoplot()

The forecasts are nearly identical, which explains the near-identical RMSE values. That said, when we look at the actual values (below) we see that the Holt model captures the recent downward trend in Exports, even though the simple model has the slightly lower AIC (143.26 vs. 147.76, per the comments above).

Even so, I would elect the Holt model. The addition of a trend component makes for a more robust representation of this series.

As a final step, we compare a computation of the 95% prediction interval for the first forecast vs. that produced by R. We start with what’s produced by R:

est_ses
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 57       78.60668 70.88413 86.32922 66.79606 90.41729
## 58       78.60668 67.68590 89.52746 61.90478 95.30857
est_holt
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 57        78.5536 70.29195 86.81524 65.91851 91.18869
## 58        78.5005 66.81680 90.18419 60.63183 96.36917

The last two columns of the first row are what we're interested in:

  • Simple: Lo 95 = 66.79606, Hi 95 = 90.41729
  • Holt: Lo 95 = 65.91851, Hi 95 = 91.18869

Compared to the 95% prediction interval computation:

#95% prediction interval computation (simple)
s1 <- sd(est_ses$residuals)
est_ses$mean[1] - 1.96 * s1 #lower 95%
## [1] 67.13968
est_ses$mean[1] + 1.96 * s1 #upper 95%
## [1] 90.07368
#95% prediction interval computation (holt)
s2 <- sd(est_holt$residuals)
est_holt$mean[1] - 1.96 * s2 #lower 95%
## [1] 66.9291
est_holt$mean[1] + 1.96 * s2 #upper 95%
## [1] 90.17809

Both the simple and the Holt model calculations came within about 1 point of what R produced, though the simple model's interval was the closer of the two.
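
Since the same plus-or-minus 1.96s arithmetic keeps recurring, it could be wrapped in a small helper; this function is hypothetical (my own convenience wrapper, not part of the forecast package):

#hypothetical helper for the manual 95% interval of the first forecast
pi95 <- function(fc) {
  s <- sd(fc$residuals)
  c(lower = fc$mean[1] - 1.96 * s, upper = fc$mean[1] + 1.96 * s)
}
pi95(est_ses)
pi95(est_holt)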

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

#Filter for China
global_economy %>% 
    filter(Country == 'China') -> china_econ

#subset further for training data - 2015 onward to be forecast
#china_econ %>%
#    filter(Year <= year(as.Date("2014-01-01"))) -> china_train

#visualize the subset
china_econ %>%
  autoplot(GDP) +
  labs(y = "GDP", title = "Chinese Gross Domestic Product")

We observe a SERIOUS upward trend (without seasonality) in Chinese GDP, so we can rule out simple exponential smoothing and the Holt-Winters models right away.

We’ll experiment with different forms of the Holt model to see which is most accurate.

#generate forecasts for 10 years (holt() already returns a forecast object)
china_econ$GDP %>%
    holt(h = 10) -> china_holt

#plot the result
china_holt %>% autoplot()

#generate forecasts for 10 years - damped
china_econ$GDP %>%
    holt(damped = TRUE, h = 10) -> china_holt_damped

#plot the result
china_holt_damped %>% autoplot()

#generate forecasts for 10 years - Box-Cox
china_econ$GDP %>%
    holt(lambda = BoxCox.lambda(china_econ$GDP), h = 10) -> china_holt_bc

#plot the result
china_holt_bc %>% autoplot()

I used h = 10 (a 10-year forecast) and observed that with Holt's method the slope remains in line with the most recent trend, with the damped method the slope flattens over time, and with the Box-Cox method the forecast takes off exponentially. The Box-Cox version gives the most dramatic forecast, Holt's method is intermediate, and the damped method is the most conservative. Short term, I think Holt's method would be the most accurate, whereas longer term I'd lean toward the damped method. Then again, maybe China's GDP continues in this manner and we're all in for a very interesting future …
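
To see the three behaviours side by side, one option (a sketch; it assumes the series starts in 1960, as in global_economy) is to overlay the point forecasts on a single plot:

#overlay the three 10-year forecasts (point forecasts only)
china_ts <- ts(china_econ$GDP, start = 1960)
autoplot(china_ts) +
  autolayer(holt(china_ts, h = 10), series = "Holt", PI = FALSE) +
  autolayer(holt(china_ts, damped = TRUE, h = 10), series = "Damped", PI = FALSE) +
  autolayer(holt(china_ts, lambda = BoxCox.lambda(china_ts), h = 10),
            series = "Box-Cox", PI = FALSE) +
  labs(y = "GDP", title = "Chinese GDP: Holt variants")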

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

To start we explore the corresponding summary statistics and output plot:

#Explore the data
summary(aus_production$Gas)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   11.25  105.00   99.21  163.75  252.00
aus_production %>%
    autoplot(Gas)

Our data follow an upward trend with seasonal fluctuations that grow with the level, so a Holt-Winters model (ETS "MAM") appears to be the best step forward:

timeseries <- ts(data=aus_production$Gas, frequency=4)
gas_hw <- ets(timeseries, model="MAM")
fc_gas_hw <- gas_hw %>% forecast(h=12) #12 / 4 quarters = 3 years
fc_gas_hw %>% autoplot() 

An additive model would have been preferred if the seasonal variation were roughly constant regardless of level. Because it is not, and instead varies in proportion to the level, multiplicative seasonality is preferred.
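
As a sanity check on that reasoning, an additive-seasonality fit can be compared on AICc (a sketch; information criteria are comparable across ETS specifications fit to the same data):

#additive-seasonality fit for comparison
gas_hw_add <- ets(timeseries, model = "AAA")
gas_hw_add$aicc  #additive seasonality
gas_hw$aicc      #multiplicative seasonality (fit above)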

Next, we observe the impact of making the trend damped:

gas_hw_damped <- ets(timeseries, model="MAM", damped=TRUE)
fc_gas_hw_damped <- gas_hw_damped %>% forecast(h=12) #12 / 4 quarters = 3 years
fc_gas_hw_damped %>% autoplot() #visualize the impact

While the impact is not very apparent here, because of the relatively short forecast and a "zoomed out" y-axis, damping curbs the trend in the forecast values, and its effect grows with the horizon.
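
One way to make the damping visible (a sketch) is a longer horizon with both forecasts overlaid:

#longer-horizon overlay: the damped trend flattens relative to MAM
autoplot(timeseries) +
  autolayer(forecast(gas_hw, h = 40), series = "MAM", PI = FALSE) +
  autolayer(forecast(gas_hw_damped, h = 40), series = "MAdM (damped)", PI = FALSE)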

In order to determine whether damping improved our model we check AIC scores:

#Compare performance 
gas_hw
## ETS(M,A,M) 
## 
## Call:
##  ets(y = timeseries, model = "MAM") 
## 
##   Smoothing parameters:
##     alpha = 0.6529 
##     beta  = 0.1442 
##     gamma = 0.0978 
## 
##   Initial states:
##     l = 5.9456 
##     b = 0.0706 
##     s = 0.9309 1.1779 1.0749 0.8163
## 
##   sigma:  0.057
## 
##      AIC     AICc      BIC 
## 1680.929 1681.794 1711.389
gas_hw_damped
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = timeseries, model = "MAM", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.6489 
##     beta  = 0.1551 
##     gamma = 0.0937 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 5.8589 
##     b = 0.0994 
##     s = 0.9282 1.1779 1.0768 0.8171
## 
##   sigma:  0.0573
## 
##      AIC     AICc      BIC 
## 1684.028 1685.091 1717.873

For Holt-Winters:

  • alpha = 0.6529, beta = 0.1442, gamma = 0.0978
  • AIC = 1680.929, AICc = 1681.794, BIC = 1711.389

For damped Holt-Winters:

  • alpha = 0.6489, beta = 0.1551, gamma = 0.0937, phi = 0.98
  • AIC = 1684.028, AICc = 1685.091, BIC = 1717.873

Since a lower score is better, we see that Holt-Winters slightly outperformed the damped version across the board. We conclude that damping did not improve the model; the undamped Holt-Winters is the stronger choice.
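
As a further check (a sketch, not run in the original analysis), ets() can be left to select the specification automatically; if it also lands on ETS(M,A,M), that corroborates the choice:

#automatic ETS model selection as a cross-check
ets(timeseries)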

8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

  • Why is multiplicative seasonality necessary for this series?
  • Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
  • Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
  • Check that the residuals from the best method look like white noise.
  • Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

#Revisit the retail data
set.seed(333)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

#create time series with Month, Turnover
timeseries <- ts(data=myseries$Turnover, frequency=12)

#plot the series
timeseries %>% autoplot()

Multiplicative seasonality is necessary for this series because the seasonal variation is not constant; it varies with the level (similar to the previous question).
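
A quick visual check (a sketch) supports this: seasonal variation that grows with the level becomes roughly constant on the log scale, which corresponds to multiplicative seasonality on the original scale.

#seasonal swings should look roughly constant on the log scale
autoplot(log(timeseries))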

We apply Holt-Winters' multiplicative method with and without damping of the trend:

#apply HW multiplicative:
retail_hw <- ets(timeseries, model="MAM")
fc_retail_hw <- retail_hw %>% forecast(h=36) #36 months = 3 years
fc_retail_hw %>% autoplot() 

#apply HW multiplicative (with damping):
retail_hw_damped <- ets(timeseries, model="MAM", damped=TRUE)
fc_retail_hw_damped <- retail_hw_damped %>% forecast(h=36) #36 months = 3 years
fc_retail_hw_damped %>% autoplot() 

Switching between the two forecasts, we clearly see the impact of damping: the second (damped) forecast follows a flatter, nearly linear trend, while the first (undamped) forecast climbs more steeply.

Next we compare the RMSE of the one-step (training) forecasts from each method:

#compare one-step (training) RMSE of each method
sqrt(mean(fc_retail_hw$residuals^2))
## [1] 0.09928151
sqrt(mean(fc_retail_hw_damped$residuals^2))
## [1] 0.1008364

We observe a slightly higher RMSE for the damped model, indicating slightly larger one-step errors. With this in mind, we elect the undamped model and proceed to check its residuals:

#check residuals of best method
checkresiduals(retail_hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,M)
## Q* = 81.106, df = 8, p-value = 2.931e-14
## 
## Model df: 16.   Total lags used: 24

From the output plots, the residuals look roughly like white noise at a glance; however, the Ljung-Box test above strongly rejects the white-noise hypothesis (p ≈ 3e-14), so some significant autocorrelation remains even in the residuals of the best method.

Next, we split our data (training our model until the end of 2010) and find the test set RMSE:

#train model until end of 2010
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

myseries_test <- myseries %>%
  filter(year(Month) >= 2011)

#create timeseries
timeseries_train <- ts(data=myseries_train$Turnover, frequency=12)
timeseries_test <- ts(data=myseries_test$Turnover, frequency=12)

#create corresponding models
retail_hw_train <- ets(timeseries_train, model="MAM")
retail_hw_test <- ets(timeseries_test, model="MAM")

#RMSE of a model refit on the test period (the test set RMSE of the training
#model itself appears in the accuracy output further below)
sqrt(mean(retail_hw_test$residuals^2))
## [1] 0.08166493

From this point, we observe whether our model beat the seasonal naive approach from Exercise 7 in Section 5.11:

#produce forecasts over the test period from the training model
fc <- retail_hw_train %>%
  forecast(h = length(timeseries_test))

#observe whether model beat SNAIVE
retail_hw_train %>% accuracy()
##                       ME      RMSE       MAE        MPE    MAPE      MASE
## Training set -0.03733977 0.6889633 0.5020404 -0.9923539 7.02562 0.4540452
##                  ACF1
## Training set 0.035107
fc %>% accuracy(timeseries)
##                       ME      RMSE       MAE         MPE     MAPE      MASE
## Training set -0.03733977 0.6889633 0.5020404  -0.9923539  7.02562 0.4540452
## Test set     -3.82432232 3.9775350 3.8243223 -74.3310267 74.33103 3.4587163
##                   ACF1 Theil's U
## Training set 0.0351070        NA
## Test set     0.1770195  2.424308

The training accuracy is much better than that of the seasonal naive approach, while the test accuracy is worse. Since it is often a model's performance on unseen test data that carries more weight, it appears the Holt-Winters model was outperformed by the seasonal naive model.
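
For a direct comparison on the same train/test split (a sketch; the Exercise 7 numbers are not reproduced here), the seasonal naive benchmark can be computed as:

#seasonal naive benchmark on the same split
fc_snaive <- snaive(timeseries_train, h = length(timeseries_test))
accuracy(fc_snaive, timeseries)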

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

We first familiarize ourselves with the STL decomposition of our training timeseries:

#STL decomposition and visualization
ts_stl <- stl(timeseries_train, s.window='periodic')
ts_stl %>% autoplot() #visualization

We fit an ETS model with a Box-Cox transformation, storing the fit so it can be re-forecast over the test period below, and visualize the forecast for h = 10:

#ETS with Box-Cox transformation
bc_ets_fit <- ets(timeseries_train, lambda = BoxCox.lambda(timeseries_train))
ts_bc_ets <- bc_ets_fit %>% forecast(h=10)

ts_bc_ets %>% autoplot() #visualization

It appears to be a solid forecast, but we can't say much without hard measures of accuracy, so we revisit the accuracy statistics for comparison:

#produce forecasts over the test period from the stored training fit
fc <- bc_ets_fit %>%
  forecast(h = length(timeseries_test))

#compare accuracy against our previous best model
ts_bc_ets %>% accuracy()
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.03998483 0.6803232 0.4939701 0.1571171 6.864127 0.4467464
##                    ACF1
## Training set 0.03852586
fc %>% accuracy(timeseries)
##                       ME      RMSE       MAE         MPE      MAPE      MASE
## Training set  0.03998483 0.6803232 0.4939701   0.1571171  6.864127 0.4467464
## Test set     -2.85971328 2.9669503 2.8597133 -55.0513431 55.051343 2.5863241
##                     ACF1 Theil's U
## Training set  0.03852586        NA
## Test set     -0.06518668  1.812422

I was a bit confused about how to use the STL decomposition in this particular case and opted to use it to familiarize myself with the components of the data. Hopefully this approach was correct, or at least in the right direction, since it produced the best-performing model.
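
For reference, the forecast package's stlf() function appears to bundle exactly what the exercise describes: it Box-Cox transforms the series, takes an STL decomposition, forecasts the seasonally adjusted data with ETS, and reseasonalizes. A sketch:

#STL + ETS on the Box-Cox transformed, seasonally adjusted series
fc_stl <- stlf(timeseries_train, lambda = BoxCox.lambda(timeseries_train),
               method = "ets", h = length(timeseries_test))
accuracy(fc_stl, timeseries)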

For our retail data, it appears that a Box-Cox transformation with ETS produced our best-performing model.