In this homework assignment I am submitting exercises 8.1, 8.5, 8.6, 8.7, 8.8, and 8.9 from Hyndman and Athanasopoulos's online textbook, Forecasting: Principles and Practice.
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
library(fpp3)
fit <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  model(SES = ETS(Count ~ error("A") + trend("N") + season("N")))
tidy(fit) %>%
  select(term, estimate)
## # A tibble: 2 × 2
## term estimate
## <chr> <dbl>
## 1 alpha 0.322
## 2 l[0] 100647.
The optimal alpha is 0.322 and the optimal initial level l[0] is approximately 100,647.
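As a sanity check on what these two parameters mean, the flat SES point forecast can be reproduced by iterating the level recursion l_t = alpha * y_t + (1 - alpha) * l_(t-1) over the data; a minimal sketch using the rounded estimates above:
# Sketch: reproduce the flat SES point forecast by iterating the level
# recursion l_t = alpha * y_t + (1 - alpha) * l_{t-1} over the series,
# starting from the estimated l[0]. Parameter values are rounded.
pigs <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  pull(Count)
alpha <- 0.322
level <- 100647
for (y in pigs) level <- alpha * y + (1 - alpha) * level
level  # should land close to the reported point forecast of about 95187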
fc1 <- fit %>%
  forecast(h = 4)
fc1 %>%
  select(Month, Count, .mean)
## # A fable: 4 x 3 [1M]
## Month Count
## <mth> <dist>
## 1 2019 Jan N(95187, 8.7e+07)
## 2 2019 Feb N(95187, 9.7e+07)
## 3 2019 Mar N(95187, 1.1e+08)
## 4 2019 Apr N(95187, 1.1e+08)
## # ℹ 1 more variable: .mean <dbl>
fc1 %>% autoplot(aus_livestock)
# Manual 95% interval for the first forecast: point forecast +/- 1.96 * sd of the residuals
s <- sd(augment(fit)$.resid)
upper_limit_95 <- fc1$.mean[1] + (s * 1.96)
lower_limit_95 <- fc1$.mean[1] - (s * 1.96)
upper_limit_95
## [1] 113502.1
lower_limit_95
## [1] 76871.01
fc1 %>% hilo() %>% pull(-1)
## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95
The interval produced by R, (76854.79, 113518.3), is very close to my manual interval of (76871.01, 113502.1). The small difference most likely comes from how the standard deviation is estimated: R's interval is based on the model's estimated forecast standard deviation, while mine uses the sample standard deviation of the residuals.
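An alternative to sd() of the residuals is to use the model's own variance estimate; a small sketch, assuming glance() reports sigma2 for this ETS model as it does for ETS models in the fpp3 text:
# Sketch: rebuild the first 95% interval from the model's estimated
# innovation variance (sigma2 from glance()) rather than sd(.resid).
sigma_hat <- sqrt(glance(fit)$sigma2)
fc1$.mean[1] + c(-1, 1) * 1.96 * sigma_hat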
# I will analyze the China exports data
china_exports <- global_economy %>%
filter(Code == "CHN") %>%
select(Exports)
autoplot(china_exports)
## Plot variable not specified, automatically selected `.vars = Exports`
There is an apparent positive trend in the China exports data. The series is annual, so there is no seasonality, but there does appear to be some mild cyclic behaviour, with a dip in exports every five to ten years. There may also be a change in trend occurring around the peak near 2005.
fit2 <- china_exports %>%
model(
ANN = ETS(Exports ~ error("A") + trend("N") + season("N"))
)
fc2 <- forecast(fit2, h=5)
autoplot(fc2,china_exports)
c. Compute the RMSE values for the training data.
fit2 %>% accuracy()
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ANN Training 0.266 1.90 1.26 1.84 9.34 0.983 0.991 0.288
The training RMSE of the ANN model is 1.90.
compare <- china_exports %>%
model(
ANN=ETS(Exports ~ error("A") + trend("N") + season("N")),
AAN=ETS(Exports ~ error("A") + trend("A") + season("N"))
)
accuracy(compare)
## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ANN Training 0.266 1.90 1.26 1.84 9.34 0.983 0.991 0.288
## 2 AAN Training -0.0854 1.91 1.25 -0.169 9.57 0.973 0.995 0.232
The RMSE of the AAN model (1.91) is slightly larger than that of the ANN model (1.90), so by this measure the ANN model is the better fit to the training data. The AAN model picks up that the most recent short-term movement in exports is downward and therefore forecasts a continued decrease.
compare %>%
forecast(h=5) %>%
autoplot(china_exports, level=NULL)
Intuitively I want to say the AAN forecast looks better (the recent decline in exports could be driven by factors that will continue), but based on the RMSE, and the same line of reasoning that makes simple benchmarks like the naive method hard to beat, I would say the ANN forecast is better.
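Training RMSE alone cannot really settle this, so a time-series cross-validation comparison, following the stretch_tsibble() pattern from the fpp3 text, would be fairer; a sketch (the initial window of 20 years is my choice):
# Sketch: time-series cross-validation of ANN vs AAN using one-step-ahead
# forecasts on expanding windows.
china_exports %>%
  stretch_tsibble(.init = 20, .step = 1) %>%
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  ) %>%
  forecast(h = 1) %>%
  accuracy(china_exports)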
s2 <- sd(augment(fit2)$.resid)
upper_limit_952 <- fc2$.mean[1] + (s2 * 1.96)
lower_limit_952 <- fc2$.mean[1] - (s2 * 1.96)
upper_limit_952
## [1] 23.47713
lower_limit_952
## [1] 16.03761
fc2 %>% hilo() %>% pull(-1)
## <hilo[5]>
## [1] [15.96718, 23.54756]95 [14.39750, 25.11724]95 [13.19301, 26.32174]95
## [4] [12.17756, 27.33718]95 [11.28292, 28.23182]95
My manually calculated 95% interval is (16.03761, 23.47713), while R produced (15.96718, 23.54756). The bounds are very similar; as before, the small difference comes from the slightly different estimates of the residual standard deviation.
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
china_gdp <- global_economy %>%
filter(Code == "CHN") %>%
select(GDP)
autoplot(china_gdp)
## Plot variable not specified, automatically selected `.vars = GDP`
fit <- china_gdp %>%
  model(
    AAN     = ETS(GDP ~ error("A") + trend("A") + season("N")),
    AADN    = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    MAN     = ETS(GDP ~ error("M") + trend("A") + season("N")),
    MADN    = ETS(GDP ~ error("M") + trend("Ad") + season("N")),
    BOXLOG  = ETS(box_cox(GDP, 0)),
    BOXSQRT = ETS(box_cox(GDP, 0.5))
  )
fc <- forecast(fit, h = 20)
autoplot(fc, china_gdp, level = NULL)
The forecasts from the different ETS specifications diverge drastically over this longer horizon. The ETS model fitted to the log-transformed series (Box-Cox with lambda = 0) is unrealistic, as it projects essentially unbounded exponential growth; it might only be suited to short-term forecasting. The square-root transformation (lambda = 0.5) gives a more moderate forecast, though it is still on the high side relative to the other ETS forecasts. The damped-trend options are probably the better choices for long horizons.
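To complement the visual comparison, information criteria could be consulted; a sketch restricted to the untransformed models, since AICc values are not comparable across different response transformations:
# Sketch: rank the untransformed ETS variants by AICc (the Box-Cox models
# are excluded because their likelihoods are on a different scale).
glance(fit) %>%
  filter(.model %in% c("AAN", "AADN", "MAN", "MADN")) %>%
  select(.model, AICc) %>%
  arrange(AICc)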
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
autoplot(aus_production, Gas)
gas <- aus_production %>%
  select(Gas)
fit <- gas %>%
  model(
    MAM  = ETS(Gas ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )
fc3 <- forecast(fit, h = 24)
autoplot(fc3, gas, level = NULL)
Multiplicative seasonality is necessary here because the size of the seasonal fluctuations grows roughly in proportion to the level of the series. The forecasts appear to be only minimally affected by damping the trend; I would not say the forecast is clearly improved.
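For contrast, one could fit an additive-seasonality variant; a sketch (the AAA label is mine), which I would expect to fit noticeably worse because the seasonal swings in Gas grow with the level of the series:
# Sketch: additive-seasonality ETS for comparison with the multiplicative
# models above; its training accuracy can be checked the same way.
gas %>%
  model(AAA = ETS(Gas ~ error("A") + trend("A") + season("A"))) %>%
  accuracy()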
fit %>%
accuracy()
## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MAM Training -0.115 4.60 3.02 0.199 4.08 0.542 0.606 -0.0131
## 2 MAdM Training -0.00439 4.59 3.03 0.326 4.10 0.544 0.606 -0.0217
The training RMSE suggests that damping the trend improves the fit very slightly (4.59 vs 4.60), but the difference is negligible.
set.seed(1)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(myseries)
## Plot variable not specified, automatically selected `.vars = Turnover`
Multiplicative seasonality is again necessary here because the seasonal variation grows with the level of the series.
fit <- myseries %>%
  model(
    MAM  = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )
fc4 <- forecast(fit, h = 24)
autoplot(fc4, myseries, level = NULL)
fit %>% accuracy()
## # A tibble: 2 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Quee… Clothin… MAM Trai… 0.415 8.53 5.95 -0.0595 4.65 0.483 0.511 0.0814
## 2 Quee… Clothin… MAdM Trai… 0.689 8.55 5.99 0.174 4.66 0.485 0.512 0.0588
The training RMSE of the damped-trend model is actually about 0.02 higher (8.55 vs 8.53), so the two models fit the training data almost identically. I still prefer the MAdM model, since damping avoids extrapolating growth indefinitely.
fit %>%
  components() %>%
  filter(.model == "MAdM") %>%
  select(remainder) %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = remainder`
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
The residuals look like white noise centred around zero: there is no discernible pattern and the variance remains relatively constant.
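As a more formal check of the white-noise claim, a Ljung-Box test on the MAdM innovation residuals could be run; a sketch with lag = 24 (two years of monthly lags, my choice):
# Sketch: Ljung-Box test on the innovation residuals of the MAdM model;
# a large p-value would be consistent with the white-noise impression above.
fit %>%
  select(MAdM) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)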
myseries_train <- myseries %>%
  filter(year(Month) < 2011)
fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(myseries, level=NULL)
fc %>% accuracy(myseries)
## # A tibble: 2 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MAM Queensl… Clothin… Test -73.8 76.5 73.8 -36.6 36.6 6.12 4.64 0.658
## 2 MAdM Queensl… Clothin… Test -66.0 70.3 66.0 -33.2 33.2 5.47 4.26 0.773
Here the test-set results show that the MAdM model outperforms the MAM model, with an RMSE about 6 points lower (70.3 vs 76.5), and we beat the seasonal naive model from the previous assignment.
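Because the mable fit above was estimated on the full series (including the post-2010 test period), this is not a strictly out-of-sample comparison. A sketch of the out-of-sample version, re-estimating on myseries_train only and adding a seasonal naive benchmark (fit_train is my name for the re-estimated mable):
# Sketch: re-estimate the models on the training data only (through 2010),
# then measure test-set accuracy, with SNAIVE included as a benchmark.
fit_train <- myseries_train %>%
  model(
    MAM    = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    MAdM   = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
    SNAIVE = SNAIVE(Turnover)
  )
fit_train %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries)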
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
lambda_t <- myseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
fit_myseries_train2 <- myseries_train %>%
  model(
    `STL decomposition w/ box_cox` = STL(box_cox(Turnover, lambda_t) ~ trend(window = 21) +
                                           season(window = "periodic"), robust = TRUE),
    `Holt's method M` = ETS(box_cox(Turnover, lambda_t) ~ error("M") + trend("A") + season("M")),
    `Damped Holt's method M` = ETS(Turnover ~ error("M") + trend("Ad", phi = 0.9) + season("M"))
  )
accuracy(fit_myseries_train2)
## # A tibble: 3 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queensland Clothing… STL d… Trai… 0.572 6.88 4.70 -0.0333 4.26 0.389 0.417
## 2 Queensland Clothing… Holt'… Trai… -0.691 8.02 5.59 -0.512 5.05 0.463 0.486
## 3 Queensland Clothing… Dampe… Trai… 0.637 7.56 5.36 0.480 4.85 0.444 0.458
## # ℹ 1 more variable: ACF1 <dbl>
The STL decomposition with the Box-Cox transformation has the lowest training RMSE at 6.88. This came as a surprise to me, as the "simpler" approach performed better in terms of RMSE than the Holt's method fits.
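The exercise also asks how this compares with the best previous forecasts on the test set, and its "STL followed by ETS on the seasonally adjusted data" is usually expressed with decomposition_model(); a sketch of that workflow (fit_dcmp and stl_ets are my names, and I am assuming the Box-Cox response is back-transformed when forecasting, as the transformation is attached to Turnover):
# Sketch: STL decomposition of the Box-Cox transformed series, with ETS
# fitted to the seasonally adjusted component, then evaluated on the
# post-2010 test set for comparison with the earlier models.
fit_dcmp <- myseries_train %>%
  model(
    stl_ets = decomposition_model(
      STL(box_cox(Turnover, lambda_t) ~ trend(window = 21) +
            season(window = "periodic"), robust = TRUE),
      ETS(season_adjust ~ error("A") + trend("A") + season("N"))
    )
  )
fit_dcmp %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries)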