Import Libraries

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.1.2

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✓ tibble      3.1.6     ✓ tsibble     1.1.3
## ✓ dplyr       1.0.8     ✓ tsibbledata 0.4.1
## ✓ tidyr       1.2.0     ✓ feasts      0.3.0
## ✓ lubridate   1.8.0     ✓ fable       0.3.2
## ✓ ggplot2     3.3.5     ✓ fabletools  0.3.2

## Warning: package 'dplyr' was built under R version 4.1.2

## Warning: package 'tidyr' was built under R version 4.1.2

## Warning: package 'tsibble' was built under R version 4.1.2

## Warning: package 'tsibbledata' was built under R version 4.1.2

## Warning: package 'feasts' was built under R version 4.1.2

## Warning: package 'fable' was built under R version 4.1.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman.

Question 8.1

Part A

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of alpha and lambda-0, and generate forecasts for the next four months.

## Plot Pigs

aus_livestock %>% 
  filter(State == "Victoria", Animal == "Pigs") %>%
  autoplot(Count) + 
  labs(title = "Count of Pigs Slaughtered in Victoria")

fit <- aus_livestock %>% 
  filter(State == "Victoria", Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N"))) 

opt_values <- fit %>% report()

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

# Generate forecasts for the next four months
forecast_4_months <- fit %>%
  forecast(h = 4)

forecast_4_months %>%
  autoplot(aus_livestock) +
  ggtitle("Number of Pigs Slaughtered in Victoria")

Part B

Compute a 95% prediction interval for the first forecast using y_hat +/- 1.96*s. Where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

# get first predicted value
y_hat <- mean(forecast_4_months$.mean[1])

# augment the fit
fit_augmented <- augment(fit)

# calculate standard deviation based on residuals
s <- sd(fit_augmented$.resid)

# calculate bounds
upper <- y_hat + (s * 1.96)
lower <- y_hat - (s * 1.96)

print(upper)

## [1] 113502.1

print(lower)

## [1] 76871.01

For a 95% confidence interval, we see a lower bound of 76,871 and an upper bound of 113,502.

hilo(forecast_4_months$Count, 95)[1]

## <hilo[1]>
## [1] [76854.79, 113518.3]95

Compared with R’s values, we see they are very close! R’s lower bound is slighly less than ours, while it’s higher bound is slightly greater.

Question 8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Part A

Plot the Exports series and discuss the main features of the data.

global_economy %>%
  filter(Country == "United Kingdom") %>%
  autoplot(Exports) +
  labs(title="Exports: United Kingdom")

There is a strong rise in exports from the United Kingdom from the late 1960s up until the late 1970s. After this, we see a major decline all the way until nearly 1990. From 1990 and beyond the UK exports market continually increases, save for some periods where macroeconomic factors create challenges, such as the global financial crisis of 2008.

Part B

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

uk <- global_economy %>%
  filter(Country == "United Kingdom")

fit <- uk %>%
  model(ANN = ETS(Exports ~ error('A') + trend('N') + season('N')))

forecast <- fit %>%
  forecast(h = 4)

forecast %>% autoplot(uk) +
  labs(title = 'United Kingdom Annual Exports Forecast')

Part C

Compute the RMSE values for the training data.

accuracy(fit)

From the above we can see that RSME for the fit is 1.354533.

Part D

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit2 <- global_economy %>%
  filter(Country == "United Kingdom") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

forecast2 <- fit2 %>%
  forecast(h = 4)

forecast2 %>%
  autoplot(uk) +
  labs(title="Exports: United Kingdom", subtitle = "ETS(A,A,N)")

accuracy(fit2)

The RSME of fit2 is 1.350896, which is lower than the original model ETS(A,N,N). Additionally we see that the AAN model shows a positive trend in the predictions, whereas ANN shows a flatline trend.

Part E

Compare the forecasts from both methods. Which do you think is best?

Based on the above exploration, I would choose the ETS(A,A,N) model. I would choose this over the ETS(A,N,N) model because the RSME is lower, and it also produces an upward trending prediction which is in line with the overall timeseries.

Part F

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

# get first predicted value
y_hat <- mean(forecast2$.mean[1])

# augment the fit
fit_augmented <- augment(fit2)

# calculate standard deviation based on residuals
s <- sd(fit_augmented$.resid)

# calculate bounds
upper <- y_hat + (s * 1.96)
lower <- y_hat - (s * 1.96)

print(upper)

## [1] 33.21783

print(lower)

## [1] 27.87964

Question 8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china <- global_economy %>%
  filter(Country == 'China') %>%
  mutate(GDP = GDP/1e9) %>%
  select(c(Country, Code, Year, GDP))

head(china)

# GDP:  Gross domestic product (in $USD February 2019).
china %>%
  autoplot(GDP) +
  labs(y="Billions $USD", title="GDP: China") +
  guides(colour = "none")

# Get the optimized lambda value for BoxCox transformations.
lambda <- china %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)


# Experiment with various ETS() options.
comparisons <- china %>%
  model(
    ETS = ETS(GDP),
    BoxCox = ETS(box_cox(GDP, lambda)),
    Damped = ETS(GDP ~ trend('Ad', phi = 0.9)),
    Log = ETS(log(GDP))
  )

comparisons %>%
  forecast(h = 15) %>%
  autoplot(china, level = NULL) +
  labs(title = 'Chinese GDP ETS Forecast Options Comparison')

Question 8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

# Create an ETS model.
fit <- aus_production %>%
  model(fit = ETS(Gas))


report(fit)

## Series: Gas 
## Model: ETS(M,A,M) 
##   Smoothing parameters:
##     alpha = 0.6528545 
##     beta  = 0.1441675 
##     gamma = 0.09784922 
## 
##   Initial states:
##      l[0]       b[0]      s[0]    s[-1]    s[-2]     s[-3]
##  5.945592 0.07062881 0.9309236 1.177883 1.074851 0.8163427
## 
##   sigma^2:  0.0032
## 
##      AIC     AICc      BIC 
## 1680.929 1681.794 1711.389

fit %>%
  forecast(h = 4) %>%
  autoplot(aus_production)

Multiplicative Seasonality is neccessary because of the increasing levels of seasonal variation with respect to increasing trend.

# Make the trend damped.
fit <- aus_production %>%
  model(
    Damped = ETS(Gas ~ trend('Ad', phi = 0.9)),
  )

fit %>%
  forecast(h = 4) %>%
  autoplot(aus_production)

Dampening did not seem to change the predictions or prediction trend.

Question 8.8

Part A

Why is multiplicative seasonality necessary for this series?

Multiplicative Seasonality is neccessary because of the increasing levels of seasonal variation with respect to increasing trend.

Part B

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

set.seed(1234)
series <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

series_fit <- series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) 

series_fit %>%
  forecast(h=20) %>%
  autoplot(series, level = NULL) +
  labs(title="Australian Retail Turnover")

The damped method has a smaller amplitude than the non-damped. Additionally it has smaller variability in terms of predictions.

Part C

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(series_fit) %>%
  select(.model, RMSE)

The RMSE is slightly lower for the non-damped method and therefore would be my choie between the two.

Part D

Check that the residuals from the best method look like white noise.

series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method")

Part E

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

train <- series %>%
  filter(year(Month) < 2011)

fit <- train %>%
  model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))

#producing forecasts
forecast <- fit %>%
  forecast(new_data = anti_join(series, train))

## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")

forecast %>% autoplot(series, level = NULL)

The multiplicative method does a better job here at predicting the future data points. We see that the muiltiplicative predictions follow closely the actual series data, which the snaive predictions lose all sense of trend.

Question 8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# Get the optimized lambda value for BoxCox transformations.
lambda <- train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

bc_train <- train %>%
  mutate(
    bc = box_cox(Turnover, lambda)
  )

fit <- bc_train %>%
  model(
    'STL' = STL(bc ~ season(window = 'periodic'), robust = TRUE),
    'ETS' = ETS(bc)
  )

multiplicative_fit <- bc_train %>%
  model(
    'Holt Winters' = ETS(Turnover ~ error('M') + trend('A') + season('M'))
  )

accuracy(fit)

accuracy(multiplicative_fit)

The STL and ETS methods methods produced lower RSMEs than the Holt Winters method.

Data 624 Assignment 5

Alec McCabe

Import Libraries

Question 8.1

Part A

Part B

Question 8.5

Part A

Part B

Part C

Part D

Part E

Part F

Question 8.6

Question 8.7

Question 8.8

Part A

Part B

Part C

Part D

Part E

Question 8.9