hw5

exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9

Exercise 1

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.
1. The optimal value of α is 0.3221247.
2. The optimal value of ℓ0 is 100646.6.
Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
1. The manual interval is 76871 to 113502.
2. The interval from hilo() is slightly wider: 76854.79 to 113518.3.
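The small gap between the two intervals can be traced to the spread estimate each one uses. The sketch below is a rough check in base R, not part of the original analysis: it backs out s from the rounded manual bounds and compares it with the square root of the sigma^2 value that report() prints for this model (87480760). The variable names are illustrative.

```r
# Manual interval bounds as reported in the answer (rounded)
lower_manual <- 76871
upper_manual <- 113502

# Back out s from y_hat +/- 1.96 * s
s <- (upper_manual - lower_manual) / (2 * 1.96)

# Standard deviation implied by the sigma^2 that report() prints
sigma_hat <- sqrt(87480760)

# sigma_hat is slightly larger than s, which makes R's interval wider
c(s = s, sigma_hat = sigma_hat, ratio = sigma_hat / s)
```

The ratio is very close to 1, which fits the observation that the two intervals differ only slightly.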

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

pigs <- aus_livestock |>
  filter(Animal == 'Pigs' & State == 'Victoria')

pigs |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Count`

fit <- pigs |>
  model(ANN = ETS(Count ~ error('A') + trend('N') + season('N'))) |>
  report()

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

fomonths <- fit |>
  forecast(h = 4)
fomonths

library(forecast)

## Warning: package 'forecast' was built under R version 4.4.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

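# Count holds the forecast distributions, so y_hat is a distribution vector
y_hat <- fomonths |>
  pull(Count)

# Standard deviation of the residuals
s <- augment(fit) |>
  pull(.resid) |>
  sd()

# Lower and upper bounds of the 95% prediction interval.
# Shifting a distribution by a constant shifts its mean, so the
# bounds are printed as the means of the shifted distributions.
lowerCi <- y_hat - 1.96 * s
upperCi <- y_hat + 1.96 * s
results <- c(lowerCi, upperCi)
results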

## <distribution[8]>
## [1] N(76871, 8.7e+07)  N(76871, 9.7e+07)  N(76871, 1.1e+08)  N(76871, 1.1e+08) 
## [5] N(113502, 8.7e+07) N(113502, 9.7e+07) N(113502, 1.1e+08) N(113502, 1.1e+08)

hilo(fomonths$Count, 95)

## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95
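The widening of the hilo() intervals with the horizon follows from the forecast variance of an ETS(A,N,N) model, which is sigma^2 * (1 + alpha^2 * (h - 1)). The base R sketch below tries to reproduce the four intervals from the reported alpha and sigma^2. It assumes the point forecast equals the midpoint of the first hilo() interval, since the forecast table itself is not printed above.

```r
alpha  <- 0.3221247   # from the report() output
sigma2 <- 87480760    # from the report() output

# Assumption: point forecast taken as the midpoint of the first hilo() interval
y_hat <- (76854.79 + 113518.3) / 2

h <- 1:4
# Forecast variance of ETS(A,N,N) at horizon h
var_h <- sigma2 * (1 + alpha^2 * (h - 1))
half_width <- qnorm(0.975) * sqrt(var_h)

intervals <- data.frame(
  h = h,
  variance = var_h,
  lower = y_hat - half_width,
  upper = y_hat + half_width
)
intervals
```

The variances round to the values shown in the distribution output (8.7e+07, 9.7e+07, 1.1e+08, 1.1e+08), and the bounds agree with the hilo() output to within rounding.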

Exercise 5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data.
- The Algerian Exports series shows no clear trend or seasonal pattern. It moves up and down steadily, drops sharply around 1990, and then rises again.
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
Compute the RMSE values for the training data.
- The training RMSE of the ETS(A,N,N) model is 5.86.
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
- The RMSE of the AAN model is slightly higher than that of the ANN model, so it fits the training data slightly less accurately even though it has an extra trend component.
Compare the forecasts from both methods. Which do you think is best?
- ANN seems the better choice for forecasting here; its forecasts lie above those of AAN.
Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
- The RMSE-based intervals are narrower than the intervals R calculates.
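A likely reason the RMSE-based interval is narrower: RMSE divides the sum of squared residuals by the number of observations T, while the residual variance behind R's interval is, as far as I understand fable's ETS, divided by T minus the number of estimated parameters. The base R sketch below illustrates the size of that effect for the ANN model. It assumes T = 58 annual observations and k = 2 estimated parameters (α and ℓ0); both counts are assumptions, not values taken from the output.

```r
rmse  <- 5.86  # training RMSE of the ANN model, from the answer above
n_obs <- 58    # assumption: number of annual observations
k     <- 2     # assumption: estimated parameters (alpha and l0)

# Convert an RMSE (divides by n_obs) to a residual sd (divides by n_obs - k)
sigma_hat <- rmse * sqrt(n_obs / (n_obs - k))

# Half-widths of the two 95% intervals
half_widths <- c(
  rmse_based  = 1.96 * rmse,
  sigma_based = qnorm(0.975) * sigma_hat
)
half_widths
```

Under these assumptions the sigma-based half-width is a little larger, in line with the comparison above.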

alg  <- global_economy |>
  filter(Country == "Algeria") 

alg |>
  autoplot(Exports)

fit <- alg |> 
  model(ANN = ETS(Exports ~ error('A') + trend('N') + season('N')))

forefit <- fit |>
  forecast(h = 5)

forefit |>
  autoplot(alg)

accuracy(fit)

compare <- alg |>
  model(
    ANN = ETS(Exports ~ error('A') + trend('N') + season('N')),
    AAN = ETS(Exports ~ error('A') + trend('A') + season('N'))
  )

accuracy(compare)

compare |>
  forecast(h = 5) |>
  autoplot(alg, level = NULL)

# Use the training RMSE of each model as its standard deviation estimate
standardDeviation <- compare |>
  accuracy() |>
  select(Country, .model, standardDeviation = RMSE)

# 95% interval for the first forecast of each model,
# built around the point forecast (.mean)
compare |>
  forecast(h = 1) |>
  as_tibble() |>
  left_join(standardDeviation, by = c('Country', '.model')) |>
  mutate(lowerCi = .mean - 1.96 * standardDeviation,
         upperCi = .mean + 1.96 * standardDeviation) |>
  select(Country, .model, .mean, lowerCi, upperCi)

Exercise 6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

chinaGDP <- global_economy |>
  filter(Country == 'China')

chinaGDP |>
  autoplot(GDP)

lambda <- chinaGDP |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

chinaETS <- chinaGDP |>
  model(
    ETS = ETS(GDP),
    ETSLog = ETS(log(GDP)),
    ETSBoxCox = ETS(box_cox(GDP, lambda))
  )

chinaETS |>
  forecast(h = 15) |>
  autoplot(chinaGDP, level = NULL)
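To build some intuition for what damping and a log transformation do to the forecast path, the base R sketch below compares three point-forecast paths over 15 steps. It is an illustration only: the level, slope, and phi values are hypothetical and are not estimated from the Chinese GDP series.

```r
h <- 1:15

# Hypothetical final level, slope, and damping parameter
level <- 100
slope <- 5
phi   <- 0.9

# Holt's linear trend: the slope is added in full at every step
linear <- level + h * slope

# Damped trend: the slope contribution shrinks by a factor phi each step
damped <- level + cumsum(phi^h) * slope

# Linear trend on the log scale, back-transformed: exponential growth
log_slope <- 0.05  # hypothetical slope on the log scale
log_path  <- exp(log(level) + h * log_slope)

paths <- data.frame(h, linear, damped, log_path)
paths
```

The damped path flattens out, the linear path grows by a constant amount, and the back-transformed log path grows by a constant percentage, so its increments get larger at longer horizons.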

Exercise 7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Multiplicative seasonality is necessary here because the size of the seasonal variation grows as the level of the series rises.
Comparing the damped and non-damped models, damping the trend does not noticeably improve the forecasts.

aus_production |>
  autoplot(Gas)

fit <- aus_production |>
  select(Gas) |>
  model(ETS(Gas)) |>
  report()

## Series: Gas 
## Model: ETS(M,A,M) 
##   Smoothing parameters:
##     alpha = 0.6528545 
##     beta  = 0.1441675 
##     gamma = 0.09784922 
## 
##   Initial states:
##      l[0]       b[0]      s[0]    s[-1]    s[-2]     s[-3]
##  5.945592 0.07062881 0.9309236 1.177883 1.074851 0.8163427
## 
##   sigma^2:  0.0032
## 
##      AIC     AICc      BIC 
## 1680.929 1681.794 1711.389
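The initial seasonal states in the report above are factors that multiply the level, which is why the seasonal swing grows with the series. A quick base R sketch of that effect follows; the later level of 200 is an illustrative assumption, not a fitted value.

```r
# Initial seasonal factors and level from the report() output
s  <- c(0.9309236, 1.177883, 1.074851, 0.8163427)
l0 <- 5.945592

# Peak-to-trough seasonal swing at a given level under multiplicative seasonality
swing <- function(level) level * (max(s) - min(s))

swing(l0)   # swing at the initial level
swing(200)  # swing at a hypothetical later level of 200
```

With additive seasonality the swing would stay the same size at every level, which would not match a series whose seasonal peaks and troughs spread out over time.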

# Leave phi to be estimated; fixing phi = 1 would switch the damping off
fit <- aus_production |>
  model(damped = ETS(Gas ~ trend('Ad')))

fit |>
  forecast(h = 5) |>
  autoplot(aus_production)

Exercise 8

Recall your retail time series data (from Exercise 7 in Section 2.10).
1. Why is multiplicative seasonality necessary for this series?
  - Multiplicative seasonality is necessary for this series because the size of the seasonal ups and downs increases over time.
2. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
3. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
  - The multiplicative method has the lower RMSE, so I would prefer it over the damped-trend version because it is more accurate.
4. Check that the residuals from the best method look like white noise.
5. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
  - Both methods beat the seasonal naive method; judging from the plot, their forecasts are closer to the test data.
```
set.seed(12345678)
myseries  <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries  |> autoplot(Turnover)
```
```
fit <- myseries |>
  model(
    'Holt Winters Multiplicative Method' = ETS(Turnover ~ error('M') + trend('A') + season('M')),
    'Holt Winters Trend Damped' = ETS(Turnover ~ error('M') + trend('Ad') + season('M')),
    'Seasonal Naive' = SNAIVE(Turnover)
  )

HoltWinters <- fit |>
  forecast(h = 15)

HoltWinters |> 
  autoplot(myseries, level = NULL)
```
```
accuracy(fit) |>
  select('.model', 'RMSE')
```
```
fit |>
  select('Holt Winters Multiplicative Method') |>
  gg_tsresiduals()
```
```
myseries_train <- myseries |>
  filter(year(Month) <= 2010)

comparison <- anti_join(myseries, myseries_train)
```
```
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
```
```
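# Refit on the training data only, so the test set stays unseen
results <- myseries_train |>
  model(
    'Holt Winters Multiplicative Method' = ETS(Turnover ~ error('M') + trend('A') + season('M')),
    'Holt Winters Trend Damped' = ETS(Turnover ~ error('M') + trend('Ad') + season('M')),
    'Seasonal Naive' = SNAIVE(Turnover)
  ) |>
  forecast(new_data = comparison)
autoplot(comparison, Turnover) +
  autolayer(results, level = NULL)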
```
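The plot comparison can be backed up with a number. The base R sketch below shows how a seasonal naive benchmark and its test-set RMSE are computed, using a simulated monthly series as a stand-in; the series and the 96/24 split are hypothetical, and the `rmse` helper is defined here for illustration.

```r
set.seed(1)
m <- 12                  # seasonal period (monthly data)
n <- 120
time_idx <- seq_len(n)

# Hypothetical series: rising level times a seasonal pattern, plus noise
y <- (50 + 0.5 * time_idx) * (1 + 0.2 * sin(2 * pi * time_idx / m)) + rnorm(n)

train <- y[1:96]
test  <- y[97:120]

# Seasonal naive: repeat the last observed year across the test period
snaive_fc <- rep(tail(train, m), length.out = length(test))

# Root mean squared error of a forecast against the actual values
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))

snaive_rmse <- rmse(test, snaive_fc)
snaive_rmse
```

A model beats the benchmark on the test set only if its RMSE over the same test period is lower than this value.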

Exercise 9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

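The training-set RMSEs for the STL and ETS models fitted to the Box-Cox transformed series are 0.0785 and 0.0994, compared with 0.5178 for the Holt-Winters multiplicative method. The first two values are measured on the transformed scale and the last on the original Turnover scale, so the lower numbers suggest, but do not by themselves show, that the Box-Cox models are more accurate.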

lambda <- myseries_train |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

training_boxcox <- myseries_train |>
  mutate(bc = box_cox(Turnover, lambda))

fit <- training_boxcox |>
  model('STL Box-Cox' = STL(bc ~ season(window = 'periodic')),
    'ETS Box-Cox' = ETS(bc))

accuracy(fit)

fitting <- training_boxcox |>
  model('Holt Winters Multiplicative Method' = ETS(Turnover ~ error('M') + trend('A') + season('M')))

accuracy(fitting)
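The accuracy() calls above report RMSE on different scales, which matters when comparing them. The base R sketch below shows how the same fitted values give a much smaller RMSE after a Box-Cox transformation. The helpers `bc` and `inv_bc` are hypothetical stand-ins for box_cox() and its inverse, valid for positive data only, and the lambda and data values are made up for illustration.

```r
# Hypothetical stand-ins for a Box-Cox transform and its inverse (positive data only)
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
inv_bc <- function(x, lambda) {
  if (abs(lambda) < 1e-8) exp(x) else (lambda * x + 1)^(1 / lambda)
}

rmse <- function(actual, fitted) sqrt(mean((actual - fitted)^2))

lambda     <- 0.1                        # hypothetical lambda
actual     <- c(10, 12, 15, 11)          # made-up observations
fitted_val <- c(10.5, 11.5, 15.5, 10.5)  # made-up fitted values

rmse_original    <- rmse(actual, fitted_val)
rmse_transformed <- rmse(bc(actual, lambda), bc(fitted_val, lambda))
rmse_back        <- rmse(actual, inv_bc(bc(fitted_val, lambda), lambda))

c(original = rmse_original, transformed = rmse_transformed, back = rmse_back)
```

For a like-for-like comparison, the fitted values or forecasts from the transformed models would need to be back-transformed to the Turnover scale before the RMSE is computed.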

Ali Ahmed

2025-03-05