BrianSingh_Data624

library(fpp3)

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.3     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.3.4

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(dplyr)
library(ggplot2)

8.1 Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ 0, and generate forecasts for the next four months.

aus_pigs <- aus_livestock %>%
    filter(Animal == "Pigs" & State == "Victoria")

fit<-aus_pigs%>%
  model(ETS(Count~error('A')+trend('N')+season('N')))
report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

forecast <- fit %>%
    forecast(h=4)

forecast %>%
    autoplot(filter(aus_pigs,Month>=yearmonth('2016 Jan')))

### b. Compute a 95% prediction interval for the first forecast using ^y ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

y <- forecast$.mean[1] #first prediction
res <- sd(augment(fit)$.resid)
upper_lim <- y+res*1.96
lower_lim <- y-res*1.96
cat(lower_lim, upper_lim)

## 76871.01 113502.1

#r's forecast
hilo(forecast$Count, 95)

## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95

8.5 Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

a. Plot the Exports series and discuss the main features of the data.

gt_econ <- global_economy %>%
    filter(Country == "Guyana")

gt_econ %>% autoplot(Exports)

There is no trend or seasonality in the data filtered for Guyana.

b. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

gt_fit<-gt_econ%>%
  model(ETS(Exports~error('A')+trend('N')+season('N')))
report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

gt_fc <- gt_fit %>%
    forecast(h=4)

gt_fc %>%
    autoplot(global_economy)

### c. Compute the RMSE values for the training data.

accuracy(gt_fit)$RMSE

## [1] 10.85138

d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

gt_fit2<-gt_econ%>%
  model(ETS(Exports~error('A')+trend('A')+season('N')))
report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

gt_fc2 <- gt_fit2 %>%
    forecast(h=4)

gt_fc2 %>%
    autoplot(global_economy)

accuracy(gt_fit2)$RMSE

## [1] 10.86057

The ETS(A,A,N) model’s RMSE is slightly higher, meaning the ETS(A,N,N) is a better fit and forecast.

e. Compare the forecasts from both methods. Which do you think is best?

The ETS(A,A,N) model shows a slightly downwards forecast while the ETS(A,N,N) model shows a flat forecast. The ETS(A,A,N) model may be best, given that the RMSE’s are almost identical, in that it takes into account the trend factors.

f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

#EST(A N N)
y_ann <- gt_fc$.mean[1] #first prediction
res_ann <- sd(augment(gt_fit)$.resid)
upper_lim_ann <- y_ann+res_ann*1.96
lower_lim_ann <- y_ann-res_ann*1.96
cat(lower_lim_ann, upper_lim_ann)

## 22.90632 65.8142

#r's forecast
hilo(gt_fc$Exports, 95)

## <hilo[4]>
## [1] [22.715486, 66.00503]95 [13.751466, 74.96905]95 [ 6.872926, 81.84759]95
## [4] [ 1.073978, 87.64654]95

#EST(A A N)
y_ann2 <- gt_fc2$.mean[1] #first prediction
res_ann2 <- sd(augment(gt_fit2)$.resid)
upper_lim_ann2 <- y_ann+res_ann2*1.96
lower_lim_ann2 <- y_ann-res_ann2*1.96
cat(lower_lim_ann2, upper_lim_ann2)

## 22.88766 65.83286

#r's forecast
hilo(gt_fc2$Exports, 95)

## <hilo[4]>
## [1] [22.1828700, 66.30413]95 [12.9283218, 75.32517]95 [ 5.7986167, 82.22138]95
## [4] [-0.2312906, 88.01778]95

Forecasts are very similar between the calculated lower and upper limits and R’s calculated.

8.6 Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

china_econ <- global_economy %>%
    filter(Country == "China")

china_econ%>%autoplot()

## Plot variable not specified, automatically selected `.vars = GDP`

fit_china <- china_econ %>%
  model(SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    Damped = ETS(GDP ~ error("A") + trend("Ad") + season("N")))

china_fc <- fit_china%>%
    forecast(h=15)

china_fc %>%
    autoplot(china_econ,level=NULL)

#box cox
lambda <- china_econ %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit_china2 <- china_econ %>%
  model(
    SES = ETS(box_cox(GDP, lambda) ~ error("A") + trend("N") + season("N")),
    Holt = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N")),
    Damped = ETS(box_cox(GDP, lambda) ~ error("A") + trend("Ad") + season("N"))
  )

china_fc2 <- fit_china2 %>% 
    forecast(h = 20)

china_fc2 %>%
  autoplot(china_econ, level=NULL)

The box cox normalizes the data and produces forecasts that are more realistic in nature compared to the simple exponential smoothing techniques above. The Holt method is taking the steadily increasing trend and forecasting potentially unsustainable growth, while the damped method is a bit more conservative and may be a better baseline for estimation.

8.7 Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production %>%
    autoplot(Gas)

aus_fit <- aus_production %>%
  model(
    Holt = ETS(Gas ~ error("A") + trend("A") + season("N")),
    Multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
    Damped_mult = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )

aus_fc <- aus_fit %>% 
    forecast(h = 20)

aus_fc %>%
  autoplot(aus_production, level=NULL)

We can see the holt method definitely does not follow the seasonality upwards trend in this instance.

accuracy(aus_fit)$RMSE

## [1] 15.698180  4.595113  4.591840

We can see the Holt model at an RMSE of 15.69, multiplicative at 4.595 and damped multiplicative at 4.591, being the lowest of the three, meaning an improved forecast. Multiplicative in general is better suited when there are steady increasing seasonal trends in the data.

8.8 Recall your retail time series data (from Exercise 7 in Section 2.10).

set.seed(789)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

### a. Why is multiplicative seasonality necessary for this series? As mentioned above, multiplicative seasonality is better suited when there are seasonal variations proportional to the time series, in this case, increasing turnover with the increase in time.

b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

myseries_fit <- myseries %>%
  model(
    Multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    Damped_mult = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

myseries_fc <- myseries_fit %>% 
    forecast(h = 30)

myseries_fc %>%
  autoplot(myseries, level=NULL)

### c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(myseries_fit)$RMSE

## [1] 0.5893550 0.5869105

The damped multiplicative RMSE is slightly lower than the multiplicative, indicating potentially better results.

d. Check that the residuals from the best method look like white noise.

best_method <- myseries %>%
  model(ETS(Turnover ~ error("M") + trend("Ad") + season("M")))

best_method %>% gg_tsresiduals()

The plot appears to be white noise, with a mean of the residuals centered around zero and constant variance throughout the time series.

f. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

myseries_train <- myseries |>
  filter(year(Month) < 2011)

mod <- myseries_train %>%
    model('Multiplicative' = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        'Seasonal Naive' = SNAIVE(Turnover))

fc <- mod %>%
    forecast(new_data=anti_join(myseries,myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc %>% autoplot(myseries,level = NULL)

accuracy(mod)$RMSE

## [1] 0.4222820 0.6751932

The multiplicative method has what appears to be a much smaller RMSE, though based on the plots, it seems the seasonal naive follows the actual data in a better way. Nonetheless, the multiplicative may be more accurate in the long run, given the more recent increases in seasonality compared to years’ past.

8.9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda2 <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

stl_dcmp <- myseries %>%
  model(bc = STL(box_cox(Turnover,lambda2) ~ season(window = "periodic"), robust = TRUE))

stl_dcmp %>% components()%>%autoplot()

accuracy(stl_dcmp)

## # A tibble: 1 × 12
##   State Industry .model .type     ME  RMSE   MAE     MPE  MAPE  MASE RMSSE  ACF1
##   <chr> <chr>    <chr>  <chr>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Aust… Other r… bc     Trai… 0.0436 0.519 0.323 -0.0298  6.17 0.553 0.604 0.422

The RMSE of the STL decomposition model of 0.52 is a bit higher than the multiplicative method, but smaller than the seasonal naive, therefore, it appears to perform better than the seasonal naive and worse than the multiplicative above.

BrianSingh_Data624_HW5

Brian Singh

2024-03-03