Assignment 5

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.1

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'tsibble' was built under R version 4.4.1

## Warning: package 'feasts' was built under R version 4.4.1

## Warning: package 'fabletools' was built under R version 4.4.1

## Warning: package 'fable' was built under R version 4.4.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(ggplot2)
library(dplyr)
library(tsibble)
library(fable)
library(feasts)

8.1 Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0,and generate forecasts for the next four months. Compute a 95% prediction interval for the first forecast using y hat ±1.96 s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

pigs<-aus_livestock|>
  filter(Animal=="Pigs", State=="Victoria")
pigs|>
  autoplot()+
  labs(y="Count", title = "Pigs Slaughtered in Victoria")

## Plot variable not specified, automatically selected `.vars = Count`

fit<-pigs|>
  model(ETS(Count~error("A")+ trend("N")+season("N")))

report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

fc<-fit|>
  forecast(h=4) #next four seasons

fc|>
  autoplot(pigs)+
  geom_line(aes(y=.fitted), col= "#D55E00",
              data=augment(fit))+
  labs(y="Count", title="Simple Exponential Pigs in Victoria Four Months Forcast")+
  guides(colour="none")

# 95% prediction interval

pigs_sd<-sd(augment(fit)$.resid)

upper<-fc$.mean[1]+(pigs_sd*1.96)
lower<-fc$.mean[1]-(pigs_sd*1.96)

sprintf("Intervals:[%f,%f]",lower,upper) #print the intervals

## [1] "Intervals:[76871.012478,113502.102384]"

fc|>
  hilo(95)|>
  head(1)|>
  select("95%")

## # A tsibble: 1 x 2 [1M]
##                    `95%`    Month
##                   <hilo>    <mth>
## 1 [76854.79, 113518.3]95 2019 Jan

The lower interval was slightly lower and upper interval was slightly higher when using hilo in R, compared to manually calculating the 95% interval.

8.5 Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts. Compute the RMSE values for the training data. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set. Compare the forecasts from both methods. Which do you think is best? Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

ExportsArg<-global_economy|>
  filter(Country== "Argentina")

ExportsArg|>
  autoplot(Exports)+
  labs(title = "Argentina Export")

#ETS(A,N,N)
Exportsfit<-ExportsArg|>
  model(ETS(Exports~error("A")+trend("N")+season("N")))

report(Exportsfit)

## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.8855249 
## 
##   Initial states:
##      l[0]
##  7.407428
## 
##   sigma^2:  7.9819
## 
##      AIC     AICc      BIC 
## 359.9466 360.3911 366.1279

Expfc<-Exportsfit|>
  forecast(h=5)

Expfc|>
  autoplot(ExportsArg)+
  geom_line(aes(y=.fitted), col= "#D55E00",
              data=augment(Exportsfit))+
  labs(y="Exports", title="Argentina Exports Forecast ANN")+
  guides(colour="none")

#RMSE
print(accuracy(Exportsfit)$RMSE)

## [1] 2.776087

#AAN
Exportsfit2<-ExportsArg|>
  model(ETS(Exports~error("A")+trend("A")+season("N")))

report(Exportsfit2)

## Series: Exports 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.8629392 
##     beta  = 1e-04 
## 
##   Initial states:
##      l[0]       b[0]
##  6.794833 0.07159122
## 
##   sigma^2:  8.2795
## 
##      AIC     AICc      BIC 
## 363.9608 365.1147 374.2630

Expfc2<-Exportsfit2|>
  forecast(h=5)

Expfc2|>
  autoplot(ExportsArg)+
  geom_line(aes(y=.fitted), col= "#D55E00",
              data=augment(Exportsfit))+
  labs(y="Exports", title="Argentina Exports Forecast AAN")+
  guides(colour="none")

accuracy(Exportsfit2)$RMSE

## [1] 2.776427

The RMSE results for ETS(A,A,N) were slightly higher than the value for ETS (A,N,N), ETS(A,A,N) value was higher by 0.00034. When comparing the forecast for ETS(A,N,N) would be the best model for the forecast, as the RMSE was lower.

# 95% prediction interval

Exp_sd<-sd(augment(Exportsfit)$.resid)

upper1<-Expfc$.mean[1]+(Exp_sd*1.96)
lower1<-Expfc$.mean[1]-(Exp_sd*1.96)

sprintf("ANN Intervals:[%f,%f]",lower1,upper1) #print the intervals

## [1] "ANN Intervals:[5.836522,16.809687]"

Expfc|>
  hilo(95)|>
  head(1)|>
  select("95%")

## # A tsibble: 1 x 2 [1Y]
##                    `95%`  Year
##                   <hilo> <dbl>
## 1 [5.785765, 16.86044]95  2018

Exp_sd2<-sd(augment(Exportsfit2)$.resid)

upper2<-Expfc2$.mean[1]+(Exp_sd2*1.96)
lower2<-Expfc2$.mean[1]-(Exp_sd2*1.96)

sprintf("AAN Intervals:[%f,%f]",lower2,upper2) #print the intervals

## [1] "AAN Intervals:[5.941910,16.920513]"

Expfc2|>
  hilo(95)|>
  head(1)|>
  select("95%")

## # A tsibble: 1 x 2 [1Y]
##                    `95%`  Year
##                   <hilo> <dbl>
## 1 [5.791571, 17.07085]95  2018

The 95% interval is higher for the AAN model as expected since the forecast has a upwards trend.

8.6 Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

China<-global_economy|>
  filter(Country=="China")
China|>
  autoplot(GDP)+
  labs(title="GDP China")

chinaLam<-China|>
  features(GDP, features = guerrero)|>
  pull(lambda_guerrero)

#ETS
Chinafit<-China|>
  model("ANN"=ETS(GDP~error("A")+trend("N")+season("N")),
        "AAN"=ETS(GDP~error("A")+trend("A")+season("N")),
        "ANN Damped"=ETS(GDP~error("A")+trend("Ad")+season("N")),
        "Box-Cox"=ETS(box_cox(GDP,chinaLam)~error("A")+trend("A")+season("N")),
        "Box-Cox Damped"=ETS(box_cox(GDP, chinaLam) ~error("A")+trend("Ad")+season("N")))

report(Chinafit)

## Warning in report.mdl_df(Chinafit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 5 × 10
##   Country .model     sigma2 log_lik   AIC  AICc   BIC      MSE     AMSE      MAE
##   <fct>   <chr>       <dbl>   <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>
## 1 China   ANN      1.79e+23 -1669.  3345. 3345. 3351. 1.73e+23 7.22e+23 2.13e+11
## 2 China   AAN      3.88e+22 -1624.  3258. 3259. 3268. 3.61e+22 1.31e+23 9.59e+10
## 3 China   ANN Dam… 3.96e+22 -1624.  3260. 3262. 3273. 3.62e+22 1.30e+23 9.49e+10
## 4 China   Box-Cox  1.43e- 3    74.3 -139. -137. -128. 1.33e- 3 3.52e- 3 2.86e- 2
## 5 China   Box-Cox… 1.50e- 3    73.4 -135. -133. -123. 1.37e- 3 4.56e- 3 2.95e- 2

Chinafc<-Chinafit|>
  forecast(h=5)

Chinafc|>
  autoplot(China, level=NULL)+
  labs(y="GDP", title="China GDP Forecast")

8.7 Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts? Multiplicative seasonality is necessary because seasonality pattern for gas seems to be proportional to the overall gas production levels.

aus_production|>
  autoplot(Gas)

Gasfit<-aus_production|>
  model("Multiplicative" = ETS(Gas~ error("M")+ trend("A")+season("M")),
        "Damped Multiplicative"= ETS(Gas~error("M")+ trend("Ad")+season("M")))

Gasfc<-Gasfit|>
  forecast(h=15)

Gasfc|>
  autoplot(aus_production, level=NULL)+
  labs(y="Gas", title="Gas Forecast")

accuracy(Gasfit)|>
  select(".model", "RMSE")

## # A tibble: 2 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 Multiplicative         4.60
## 2 Damped Multiplicative  4.59

The RMSE is slightly higher for the multiplicative model compared to the damped multiplicative, damped would be preferred since the RMSE value is lower meaning it’s more accurate.

8.8 Recall your retail time series data (from Exercise 7 in Section 2.10).

Why is multiplicative seasonality necessary for this series? Due to seasonal variation for retail in Australia.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer? Check that the residuals from the best method look like white noise. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

myseriesfit<-myseries|>
  model("Multiplicative" = ETS(Turnover~ error("M")+ trend("A")+season("M")),
        "Damped Multiplicative"= ETS(Turnover~error("M")+ trend("Ad")+season("M")))

myseriesfc<- myseriesfit|>
  forecast(h=10)

myseriesfc|>
  autoplot(myseries, level=NULL)+
  labs(title = "Australian Retail Turnover Forecast", y="Turnover")

#RMSE ending 2010

myseries2010<-myseries|>
  filter(year(Month)<2011) #to end year 2010
autoplot(myseries2010)

## Plot variable not specified, automatically selected `.vars = Turnover`

myseriesfit<-myseries2010|>
  model("Multiplicative" = ETS(Turnover~ error("M")+ trend("A")+season("M")),
        "Damped Multiplicative"= ETS(Turnover~error("M")+ trend("Ad")+season("M")),
        "Seasonal naive"=SNAIVE(Turnover))


accuracy(myseriesfit)|>
  select(".model", "RMSE")

## # A tibble: 3 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 Multiplicative        0.518
## 2 Damped Multiplicative 0.519
## 3 Seasonal naive        1.21

I would prefer to use multiplicative because it has a lower RMSE value, meaning the values in this model would be more accurate. The seasonal naive RMSE was very high, meaning multiplicative and damped multiplicative would beat NAIVE in accuracy.

8.9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Lam<-myseries2010|>
  features(Turnover, features = guerrero)|>
  pull(lambda_guerrero)

myserie<-myseries2010|>
  mutate(TO_box_cox=box_cox(Turnover,Lam))

myseriesSE<-myserie|>
  model("STL (BC)" =STL(TO_box_cox ~ season(window="periodic")), "ETS(BC)"=ETS(TO_box_cox))

myseriesMA<-myseries|>
  model("Damped Multiplicative"= ETS(Turnover~error("M")+ trend("Ad")+season("M")))

rbind(accuracy(myseriesMA), accuracy(myseriesSE))

## # A tibble: 3 × 12
##   State    Industry .model .type      ME   RMSE    MAE     MPE  MAPE  MASE RMSSE
##   <chr>    <chr>    <chr>  <chr>   <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… Dampe… Trai… 3.98e-2 0.616  0.444  -0.0723  5.06 0.507 0.531
## 2 Norther… Clothin… STL (… Trai… 1.61e-4 0.0786 0.0616 -0.134   2.84 0.325 0.313
## 3 Norther… Clothin… ETS(B… Trai… 1.39e-2 0.0994 0.0784  0.516   3.56 0.414 0.396
## # ℹ 1 more variable: ACF1 <dbl>

The RMSE was lower for the STL model, could be because it captures the complex seasonality.

Assignment 5

Andreina A

2025-03-07