8.1 Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.
# Convert the count to thousands for readability
victoria_pig <- aus_livestock %>%
  filter(
    Animal == 'Pigs',
    State == 'Victoria'
  ) %>%
  mutate(Count = Count / 1000)
fit <- victoria_pig %>%
  model(ETS(Count ~ error('A') + trend('N') + season('N')))
fc <- fit %>%
  forecast(h = 4)
report(fit)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3219175
##
## Initial states:
## l[0]
## 100.4949
##
## sigma^2: 87.4807
##
## AIC AICc BIC
## 6028.040 6028.084 6041.013
From the report, α ≈ 0.322 and ℓ0 ≈ 100.49 (in thousands of pigs, i.e. roughly 100,495 pigs). Forecasts for the next four months are generated above and plotted below.
Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
The code below takes the 5th and 95th percentiles of the residuals, giving bounds of roughly −14.22 and 14.63 (in thousands) around the point forecast; these can then be compared with the prediction interval produced by R.
y <- residuals(fit)
quantile(y$.resid, .05)
## 5%
## -14.22564
quantile(y$.resid, .95)
## 95%
## 14.63187
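Strictly, the quantiles above give an empirical 90% range of the residuals rather than the ŷ ± 1.96s interval asked for. A minimal sketch of that calculation, assuming the fit and fc objects created above (and a recent fabletools where the forecast mean column is .mean):
s <- augment(fit) %>% pull(.resid) %>% sd()   # standard deviation of the residuals
yhat <- fc %>% slice(1) %>% pull(.mean)       # first point forecast
c(lower = yhat - 1.96 * s, upper = yhat + 1.96 * s)
# Interval produced by R, for comparison
fc %>% hilo(95) %>% slice(1)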
fc %>%
  autoplot(victoria_pig) +
  geom_line(aes(y = .fitted), col = '#D55E00',
            data = augment(fit))
8.5 Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
Plot the Exports series and discuss the main features of the data.
# Here we select Bangladesh
bgExp <- global_economy %>%
  filter(Code == 'BGD')
head(bgExp)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bangladesh BGD 1960 4274893913. NA NA 9.31 10.0 48199747
## 2 Bangladesh BGD 1961 4817580184. 6.06 NA 11.7 10.8 49592802
## 3 Bangladesh BGD 1962 5081413340. 5.45 NA 10.8 10.7 51030137
## 4 Bangladesh BGD 1963 5319458351. -0.456 NA 11.7 9.98 52532417
## 5 Bangladesh BGD 1964 5386054619. 11.0 NA 14.1 10.0 54129100
## 6 Bangladesh BGD 1965 5906636557. 1.61 NA 13.4 9.66 55834038
# Plot the series.
bgExp %>%
  autoplot(Exports) +
  labs(title = 'Bangladesh Annual Exports')
Reviewing the series, we see a downward trend from 1960 until about the mid-1970s, followed by a steady upward trend until around 2012, after which exports begin to drop again.
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
fit <- bgExp %>%
  model(ANN = ETS(Exports ~ error('A') + trend('N') + season('N')))
bgExpForecast <- fit %>%
  forecast(h = 4)
bgExpForecast %>% autoplot(bgExp) +
  labs(title = 'Bangladesh Annual Exports Forecast')
Compute the RMSE values for the training data.
accuracy(fit)
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bangladesh ANN Training 0.0869 1.25 0.945 -0.891 12.0 0.983 0.991 0.0148
Our RMSE value for the training data is 1.253158.
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
modelCompare <- bgExp %>%
  model(
    ANN = ETS(Exports ~ error('A') + trend('N') + season('N')),
    AAN = ETS(Exports ~ error('A') + trend('A') + season('N'))
  )
accuracy(modelCompare)
## # A tibble: 2 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bangladesh ANN Training 0.0869 1.25 0.945 -0.891 12.0 0.983 0.991 0.0148
## 2 Bangladesh AAN Training 0.00557 1.25 0.954 -1.90 12.2 0.993 0.989 0.0135
We see that the AAN model has a slightly lower RMSE than the ANN model, suggesting it fits the training data marginally better, though the difference is small and the trended model uses one extra parameter.
Compare the forecasts from both methods. Which do you think is best?
modelCompare %>%
  forecast(h = 5) %>%
  autoplot(bgExp, level = NULL) +
  labs(title = 'Bangladesh Annual Exports: ANN vs AAN Forecasts')
Comparing the forecasts, the AAN model seems better suited to this data. The ANN forecast levels off, which does not reflect the overall upward trend of the series, whereas the AAN forecast continues the upward trend and sits better with the existing data.
Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
# Use the training RMSE of the AAN model as the standard deviation estimate
standardDeviation <- modelCompare %>%
  select(Country, AAN) %>%
  accuracy() %>%
  transmute(Country, standardDeviation = RMSE)
# Build the interval around the first forecast (note that Exports is a distribution
# column, so the bounds below print as shifted distributions whose means are the
# interval limits)
modelCompare %>%
  select(Country, AAN) %>%
  forecast(h = 1) %>%
  left_join(standardDeviation, by = 'Country') %>%
  mutate(lowerCi = Exports - 1.96 * standardDeviation,
         upperCi = Exports + 1.96 * standardDeviation) %>%
  select(Country, Exports, lowerCi, upperCi)
## # A fable: 1 x 5 [?]
## # Key: Country [1]
## Country Exports lowerCi upperCi Year
## <fct> <dist> <dist> <dist> <dbl>
## 1 Bangladesh N(15, 1.7) N(13, 1.7) N(18, 1.7) 2018
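For comparison with those manually computed bounds, R's own 95% intervals for the first forecast of each model can be extracted with hilo(); a short sketch assuming the modelCompare mable from above:
modelCompare %>%
  forecast(h = 1) %>%
  hilo(95)
R's interval may differ slightly from the manual one because it is based on the model's estimated residual variance rather than the training RMSE.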
8.6 Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
chinaCty <- global_economy %>% filter(Country == 'China')
fit <- chinaCty %>%
  model(
    nonDamped = ETS(GDP ~ error("A") + trend("A") + season("N")),
    damped = ETS(GDP ~ error("M") + trend("Ad") + season("N"))
  )
fc <- fit %>% forecast(h = 20)
fc %>% autoplot(chinaCty) +
  guides(colour = guide_legend(title = "Forecast"))
Comparing the two sets of forecasts, the damped model produces noticeably more conservative (flatter) long-run forecasts than the non-damped model.
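The exercise also suggests experimenting with a Box-Cox transformation. A minimal sketch of that experiment, assuming chinaCty from above and a lambda chosen by the Guerrero method:
lambda <- chinaCty %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
chinaCty %>%
  model(boxcox = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N"))) %>%
  forecast(h = 20) %>%
  autoplot(chinaCty) +
  labs(title = "China GDP: ETS on Box-Cox transformed data")
Because the transformation is written inside the model formula, the forecasts are automatically back-transformed to the original GDP scale, which makes it easy to see how strongly the transformation changes the long-run trajectory.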
8.7 Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
# Gas data from aus_production
aus_gas <- aus_production %>%
  select(Quarter, Gas)
aus_gas %>%
  autoplot(Gas)
fit <- aus_gas %>%
  model(
    add = ETS(Gas ~ error("A") + trend("A") + season("N")),
    mult = ETS(Gas ~ error("M") + trend("A") + season("N")),
    add_sea = ETS(Gas ~ error("A") + trend("A") + season("A")),
    mult_sea = ETS(Gas ~ error("M") + trend("A") + season("M")),
    mult_sea_damp = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )
# Forecast five years ahead (20 quarters)
fc <- fit %>% forecast(h = 20)
fc %>%
  autoplot(aus_gas, level = NULL) +
  labs(y = "Petajoules", title = "Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"))
Multiplicative seasonality is needed here because, as Hyndman notes, the "multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series." Both the initial plot of the data and the forecast plot show the seasonal fluctuations growing in proportion to the level of the series.
Between the multiplicative-seasonality forecasts with and without damping, only a small difference is visible. Damping arguably improves the forecasts slightly, since the upward trend is unlikely to continue at a constant rate indefinitely.
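To put numbers on this comparison rather than relying on the plot alone, one option is to compare the fitted models' information criteria; a short sketch assuming the fit mable from above (AICc penalises the extra damping parameter):
fit %>%
  glance() %>%
  arrange(AICc) %>%
  select(.model, AICc, MSE)
A lower AICc for the damped multiplicative model would support keeping the damping.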
8.8 Recall your retail time series data (from Exercise 8 in Section 2.10).
Why is multiplicative seasonality necessary for this series?
Multiplicative seasonality is necessary because the magnitude of the seasonal swings increases as the level of turnover rises.
set.seed(888)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries %>%
  autoplot(Turnover)
myseries %>%
  model(ETS(Turnover)) %>%
  report()
## Series: Turnover
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.5678226
## beta = 0.01760728
## gamma = 0.06867204
## phi = 0.9799988
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## 24.73819 0.3459922 0.9903715 0.8858194 0.9216037 1.026518 0.9716744 1.026713
## s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
## 1.012244 1.072799 1.083169 1.024279 1.013334 0.9714748
##
## sigma^2: 9e-04
##
## AIC AICc BIC
## 2622.952 2624.906 2693.346
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
fit <- myseries %>%
  model(
    MAM = ETS(Turnover ~ error('M') + trend('A') + season('M')),
    MAdM = ETS(Turnover ~ error('M') + trend('Ad') + season('M'))
  )
fc <- fit %>% forecast(h = 12)
fc %>%
  autoplot(myseries)
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
fc1 <- myseries %>%
  filter(year(Month) < 2018) %>%
  model(
    MAM = ETS(Turnover ~ error('M') + trend('A') + season('M'))
  ) %>%
  forecast(h = 12)
fc1 %>%
  accuracy(myseries) %>%
  select(RMSE)
## # A tibble: 1 x 1
## RMSE
## <dbl>
## 1 2.77
fc2 <- myseries %>%
  filter(year(Month) < 2018) %>%
  model(
    MAdM = ETS(Turnover ~ error('M') + trend('Ad') + season('M'))
  ) %>%
  forecast(h = 1)
fc2 %>%
  accuracy(myseries) %>%
  select(RMSE)
## Warning: 1 error encountered
## [1] subscript out of bounds
## # A tibble: 1 x 1
## RMSE
## <dbl>
## 1 0.771
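Note that the two evaluations above use different forecast horizons (h = 12 versus h = 1), so their RMSE values are not directly comparable. A more even-handed sketch of the one-step comparison, assuming myseries from above, compares the one-step (training) accuracy of the two methods fitted to the same data:
myseries %>%
  model(
    MAM = ETS(Turnover ~ error('M') + trend('A') + season('M')),
    MAdM = ETS(Turnover ~ error('M') + trend('Ad') + season('M'))
  ) %>%
  accuracy() %>%
  select(.model, RMSE)
Whichever model gives the lower one-step RMSE here would be the one to prefer.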
Check that the residuals from the best method look like white noise.
myseries %>%
  model(damped = ETS(Turnover ~ error('M') + trend('Ad') + season('M'))) %>%
  gg_tsresiduals()
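The visual check can be supplemented with a Ljung-Box test on the innovation residuals; a short sketch assuming a recent fabletools where augment() provides an .innov column (use .resid with older versions) and lag = 24 for monthly data:
myseries %>%
  model(damped = ETS(Turnover ~ error('M') + trend('Ad') + season('M'))) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)
A large p-value would be consistent with white-noise residuals.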
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
As the results below show, the ETS model's test-set RMSE (about 32.9) is higher than the seasonal naïve model's (about 21.2), so for this series the ETS approach does not beat the seasonal naïve benchmark on the test set.
myseries_train <- myseries %>%
  filter(year(Month) < 2011)
fit1 <- myseries_train %>%
  model(ETS = ETS(Turnover))
fc1 <- fit1 %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries)
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc1$RMSE
## [1] 32.8655
fit2 <- myseries_train %>%
  model(SNAIVE(Turnover))
fc2 <- fit2 %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries)
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc2$RMSE
## [1] 21.19945
8.9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
lambda <- myseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
training_boxcox <- myseries_train %>%
  mutate(
    bc = box_cox(Turnover, lambda)
  )
fit <- training_boxcox %>%
  model(
    'STL Box-Cox' = STL(bc ~ season(window = 'periodic'), robust = TRUE),
    'ETS Box-Cox' = ETS(bc)
  )
multiplicative_best_fit <- training_boxcox %>%
  model(
    'Holt Winters Multiplicative Method' = ETS(Turnover ~ error('M') + trend('A') + season('M'))
  )
accuracy(fit)
## # A tibble: 2 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Northern~ Superma~ STL B~ Trai~ 9.41e-4 0.0141 0.0107 0.0300 0.363 0.252 0.283
## 2 Northern~ Superma~ ETS B~ Trai~ 6.18e-4 0.0180 0.0144 0.0210 0.489 0.341 0.362
## # ... with 1 more variable: ACF1 <dbl>
accuracy(multiplicative_best_fit)
## # A tibble: 1 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nort~ Superma~ Holt ~ Trai~ -0.0491 1.67 1.30 -0.175 2.50 0.337 0.349 0.129
From the accuracy output, the training RMSE values of the STL Box-Cox and ETS Box-Cox models are about 0.014 and 0.018, compared with 1.67 for the Holt-Winters multiplicative model. These numbers are not directly comparable, however: the Box-Cox RMSEs are measured on the transformed scale, whereas the Holt-Winters RMSE is on the original Turnover scale, so the apparent improvement is largely an artefact of the transformation. A fair comparison requires back-transformed forecasts evaluated on the test set, as sketched below.
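One way to make that comparison, sketched below under the assumption that myseries, myseries_train, and lambda from above are available, is decomposition_model(): it combines an STL decomposition of the Box-Cox transformed series with an ETS model for the seasonally adjusted component (the seasonal component is forecast with SNAIVE by default), and the forecasts are back-transformed to the original Turnover scale automatically:
fit_stl_ets <- myseries_train %>%
  model(
    stl_ets = decomposition_model(
      STL(box_cox(Turnover, lambda) ~ season(window = 'periodic'), robust = TRUE),
      ETS(season_adjust)
    )
  )
fit_stl_ets %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries) %>%
  select(.model, RMSE)
The resulting test RMSE can then be compared directly with the ETS (32.87) and seasonal naïve (21.20) test RMSEs from Exercise 8.8.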