Data624HW5

8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and l0 generate forecasts for the next four months.

dafit <- aus_livestock %>%
  filter(State == "Victoria",
         Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N"))) 

fc <- dafit %>%
  forecast(h = 4)

fc %>%
  autoplot(aus_livestock) +
  ggtitle("Number of Pigs Slaughtered in Victoria")

report(dafit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

The optimal values are:

α=0.3221247 l0=100646.6

Forecasts

fc

## # A fable: 4 x 6 [1M]
## # Key:     Animal, State, .model [1]
##   Animal State    .model                          Month             Count  .mean
##   <fct>  <fct>    <chr>                           <mth>            <dist>  <dbl>
## 1 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Jan N(95187, 8.7e+07) 95187.
## 2 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Feb N(95187, 9.7e+07) 95187.
## 3 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Mar N(95187, 1.1e+08) 95187.
## 4 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Apr N(95187, 1.1e+08) 95187.

b.

Compute a 95% prediction interval for the first forecast using y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

das <- residuals(dafit)$.resid %>% sd()
hat_y <- fc$.mean[1]

fc %>% hilo(95) %>% pull('95%') %>% head(1)

## <hilo[1]>
## [1] [76854.79, 113518.3]95

95% prediction interval: (76871.0124775, 113502.1023845) This interval is slightly smaller than the one produced by R.

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

ind_economy <- global_economy %>% 
  filter(Country == 'India') 


ind_economy <- na.omit(ind_economy)

a.

Plot the Exports series and discuss the main features of the data.

ind_economy %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Exports: India")

There is an increasing trend with no seasonality. There is an increase and decrease from 1960 to early 2000’s and steady increase from 2000’s to early 2000’s and a decline and increase in the early 2000’s and then in late 2000’s increase and decrease and after late 2010’s decrease. The early 1990s can be attributed to a recession.

b.

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit_ind <-ind_economy %>%
   model(ETS(Exports ~ error("A") + trend("N") + season("N"))) 
# forecast
fc_ind <- fit_ind %>%
  forecast(h = 5)

fc_ind %>%
  autoplot(ind_economy) +
  labs(y="% of GDP", title="Exports: India", subtitle = "ETS(A,N,N)")

fc_ind %>%
  autoplot(ind_economy) +
  geom_line(aes(y = .fitted), col="#D55E00",
            data = augment(fit_ind)) +
  labs(y="% of GDP", title="Exports: Inia") +
  guides(colour = "none")

c.

Compute the RMSE values for the training data.

fit_ind %>% accuracy()

## # A tibble: 1 × 11
##   Country .model         .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <fct>   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 India   "ETS(Exports … Trai… 0.258  1.20 0.810  2.15  7.32 0.982 0.991 -0.0354

d.

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit_compare <- ind_economy %>%
  model(
    ANN = ETS (Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS (Exports ~ error("A") + trend("A") + season("N"))
    )
accuracy(fit_compare)

## # A tibble: 2 × 11
##   Country .model .type         ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <fct>   <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 India   ANN    Training  0.258   1.20 0.810 2.15   7.32 0.982 0.991 -0.0354
## 2 India   AAN    Training -0.0524  1.19 0.748 0.206  7.08 0.908 0.978 -0.0295

I believe that the ETS(A,A,N) method is better since it captures the increasing trend in the data and has a lower RMSE.

e.

Compare the forecasts from both methods. Which do you think is best?

fit_compare %>% 
  forecast(h=4) %>% 
  autoplot(ind_economy, level=NULL) +
  labs(title="Forecast Comparison",
       subtitle = "India Exports")

### f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

The confidence interval of the ETS(A,N,N) method:

#get the mean
mean_ind <- fc_ind$.mean[1]

#get the residuals
aind_fit <- augment(fit_ind)

#get standard dev using residuals
s <- sd(aind_fit$.resid)

# Calculate the 95% prediction intervals
upper_95 <- mean_ind + (s * 1.96)
lower_95 <- mean_ind - (s * 1.96)

#lower interval
lower_95

## [1] 16.72397

upper_95

## [1] 21.36682

R generated 95% prediction interval

fc_hilo <- fc_ind %>% hilo()

# Output model interval values
fc_hilo$`95%`[1]

## <hilo[1]>
## [1] [16.64717, 21.44362]95

The two intervals are not identical beyond the first two digits.

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

china <- global_economy %>%
  filter(Country == "China")

china %>% autoplot(GDP) +
  labs(title="Chinese GDP")

The Chinese GDP data shows a strong upward trend, with no evidence of a cycle or season. GDP is almost stagnant from the year 1960 up until mid 1990s and then shows an incredible increase from there on wards with a minor stagnant point around 2015.

dcmp <- china %>%
  model(stl = STL(GDP))

components(dcmp) %>% autoplot()

dcmp <- china %>%
  model(stl = STL(GDP))

components(dcmp) %>% autoplot()

Box-Cox transformation for eliminating the non-constant variance

lambda <- china %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)


fitc <- china %>% 
  model(
    
    ETS = ETS(GDP),
    `Log` = ETS(log(GDP)),
    `Damped` = ETS(GDP ~ trend("Ad")),
    `Box-Cox` = ETS(box_cox(GDP, lambda)),
    `Box-Cox, Damped` = ETS(box_cox(GDP, lambda) ~ trend("Ad"))
)

fitc %>%
  forecast(h="20 years") %>%
  autoplot(china, level = NULL)+
  labs(title="20 Year Forecast",
       subtitle= "Chinese GDP") +
  guides(colour = guide_legend(title = "Forecast"))

Based on the plot, Box-Cox and Log transformed forecast appear slightly to over-forecast for the Chinese GDP data. The generic ETS model falls between damped models which are simple damped model of the ETS, and a Box-Cox with dampening. The dampened forecast plots exhibit slower growth than the transformed forecasts where the transformed forecasts resemble continued exponential growth.

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production %>% autoplot(Gas)+
  labs(title="Austrailian Gas Production")

Multiplicative seasonality is needed in this case because there is seasonal variation that increases with time.

fitaus <- aus_production %>%
  model(
    # Multiplicative
    Multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
    # Damped multiplicative
    `Multiplicative, Damped` = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )
fcaus <- fitaus %>% forecast(h = "5 years")

fcaus %>%
  autoplot(aus_production, level = NULL) +
  labs(title="Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"))

report(fitaus)

## Warning in report.mdl_df(fitaus): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 2 × 9
##   .model                  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Multiplicative         0.00324   -831. 1681. 1682. 1711.  21.1  32.2 0.0413
## 2 Multiplicative, Damped 0.00329   -832. 1684. 1685. 1718.  21.1  32.0 0.0417

8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

set.seed(123)
ausseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

a.

Why is multiplicative seasonality necessary for this series?

With the data from gas production, we can observe that these data have an increasing trend, seasonality as well as variation. As discussed earlier, the Multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series.

decomp <- ausseries %>%
  model(stl = STL(Turnover))

components(decomp) %>% autoplot()

compare the Multiplicative and Additive methods.

fitaus <- ausseries %>%
  model(
    "Additive" = ETS(Turnover ~ error('A') + trend('A') + season('A')),
    "Multiplicative" = ETS(Turnover ~ error('M') + trend('A') + season('M'))
  )

fcaus <- fitaus %>%
  forecast(h = 10)

fcaus %>% autoplot(ausseries, level = NULL) +
  labs(title = "Household Goods Turnover") +
  guides(colour = guide_legend(title = "Forecasts"))

accuracy(fitaus)

## # A tibble: 2 × 12
##   State Industry .model .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr> <chr>    <chr>  <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Vict… Househo… Addit… Trai… -0.0163  26.4  20.0 -0.336  4.04 0.517 0.529 0.244
## 2 Vict… Househo… Multi… Trai…  2.13    24.3  18.3  0.125  3.40 0.473 0.488 0.316

b.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fitaus1 <- ausseries %>%
  model(
    "Holt-Winter" = ETS(Turnover ~ error('M') + trend('Ad') + season('M')),
    "Damped Holt's Method" = ETS(Turnover ~ error('A') + trend('Ad') + season('M'))
  )

fcaus1 <- fitaus1 %>%
  forecast(h = 15)

fcaus1 %>% autoplot(ausseries, level = NULL) +
  labs(title = "Household Goods Turnover") +
  guides(colour = guide_legend(title = "Forecasts"))

c.

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(fitaus1)

## # A tibble: 2 × 12
##   State   Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>   <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Victor… Househo… Holt-… Trai…  2.37  23.7  17.1 0.331  3.12 0.441 0.475 0.0862
## 2 Victor… Househo… Dampe… Trai…  2.35  23.6  16.9 0.258  3.16 0.436 0.472 0.0250

From the above data we can see damped methods(less) seems to be slightly better than the Holt-Winter method.

d.

Check that the residuals from the best method look like white noise.

aus_fit <- ausseries %>%
  model(
    "Damped Holt's Method" = ETS(Turnover ~ error('A') + trend('Ad') + season('M'))
  )

aus_fc <- aus_fit %>%
  forecast(h = 15)

aus_fit %>% gg_tsresiduals()

Residuals on the time plot show a slight increase in variability from the year 2000 and on. There is some correlation on the ACF of the residuals but the histogram seems to be close to normal.

e.

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

ausseries_train <- ausseries %>%
  filter(year(Month) < 2011)

aus_train <- ausseries_train %>%
  model(
    "SNAIVE" = SNAIVE(Turnover),
    "Damped Holt's Method" = ETS(Turnover ~ error('A') + trend('Ad') + season('M'))
  )

fcaus_train <- aus_train %>%
  forecast(h = 15)

fcaus_train %>% autoplot(ausseries_train, level = NULL) +
  labs(title = "Household Goods Turnover") +
  guides(colour = guide_legend(title = "Forecasts"))

fcaus_train %>% accuracy(ausseries)

## # A tibble: 2 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Damped H… Vict… Househo… Test  -28.2  41.8  33.6 -3.39  3.92 0.947 0.918 0.660
## 2 SNAIVE    Vict… Househo… Test   33.7  42.3  36.0  3.73  4.00 1.02  0.928 0.411

From the above graph and accuracy metrics we can conclude that the damped method is superior to the seasonal naive approach.

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda2 <- ausseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

ausseries_train_bx <- ausseries_train
ausseries_train_bx$Turnover <- box_cox(ausseries_train$Turnover,lambda2)

decompaus <- ausseries_train_bx %>%
  model(stl = STL(Turnover))

components(decompaus) %>% autoplot()

aus_train2 <- ausseries_train_bx %>%
  model(
    "Damped Holt's Method" = ETS(Turnover ~ error('A') + trend('Ad') + season('M'))
  )

fcaus_train2 <- aus_train2 %>%
  forecast(h = 15)

fcaus_train2 %>% autoplot(ausseries_train_bx, level = NULL) +
  labs(title = "Household Goods Turnover") +
  guides(colour = guide_legend(title = "Forecasts"))

accuracy(aus_train)

## # A tibble: 2 × 12
##   State  Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Victo… Househo… SNAIVE Trai… 25.1   45.6  35.4 5.05   7.29 1     1      0.695 
## 2 Victo… Househo… Dampe… Trai…  2.12  20.3  14.8 0.315  3.33 0.417 0.445 -0.0345

accuracy(aus_train2)

## # A tibble: 1 × 12
##   State Industry .model .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr> <chr>    <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Vict… Househo… Dampe… Trai… 0.0255 0.240 0.182 0.143  1.16 0.435 0.466 0.00910

According to accuracy metrics above, it appears that the model improved with the Box-Cox transformation. Based on the RMSE values, the Box-Cox STL model is the best performing out of the three with an RMSE of 0.048.

Data624HW5

Jaya Veluri

3/3/2024

8.1

a.

b.

8.5

a.

b.

c.

d.

e.

8.6

8.7

8.8

a.

b.

c.

d.

e.

8.9