library(fpp3)
library(tidyverse)
library(ggplot2)
library(plotly)
library(gridExtra)
theme_set(theme_minimal())
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.
Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
vic_pigs <- aus_livestock|>
filter(State == "Victoria", Animal == "Pigs")
autoplot(vic_pigs)
## Plot variable not specified, automatically selected `.vars = Count`
fitpigs<- vic_pigs |>
model(ETS(Count~ error("A") +trend("N")+ season("N")))
report(fitpigs)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
fcpigs <- fitpigs |>
  forecast(h = 4) |>
  hilo(level = 95)
hilooo <- fcpigs |>
  select(.mean, `95%`)
# Print the point forecasts and R's 95% intervals (view() shows nothing in a knitted document)
hilooo
residuals <- components(fitpigs) |> pull(remainder)
sd_rem <- sd(residuals, na.rm = TRUE)
cat("Standard Deviation of the remainder:", sd_rem, "\n")
## Standard Deviation of the remainder: 9344.666
# forecast() has no `level` argument; interval levels are set later with hilo()
fcpigs95 <- fitpigs |> forecast(h = 4)
forecasted_values <- fcpigs95$.mean
lower_bound <- forecasted_values - 1.96 * sd_rem
upper_bound <- forecasted_values + 1.96 * sd_rem
prediction_intervals <- data.frame(
Forecast = forecasted_values,
Lower_Bound = lower_bound,
Upper_Bound = upper_bound
)
print(prediction_intervals)
## Forecast Lower_Bound Upper_Bound
## 1 95186.56 76871.01 113502.1
## 2 95186.56 76871.01 113502.1
## 3 95186.56 76871.01 113502.1
## 4 95186.56 76871.01 113502.1
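The manual interval for the first forecast, [76871.01, 113502.1], is very close to the one R produces. The manual calculation uses the sample standard deviation of the residuals (9344.666), whereas R uses the square root of the estimated sigma^2 (87480760, about 9353.1), so R's interval is slightly wider. The manual calculation also applies the same width to all four horizons, which is only appropriate for the first forecast because the forecast variance grows with the horizon.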
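The comparison for the first forecast can be sketched in base R. The numbers are copied from the output above; the assumption here is that R's one-step interval is a normal interval whose variance is the reported sigma^2.

```r
# Values copied from the model output above
point_fc <- 95186.56      # first forecast (.mean)
sd_resid <- 9344.666      # sample standard deviation of the residuals
sigma2   <- 87480760      # sigma^2 from report(fitpigs)

# Manual interval: y-hat +/- 1.96 * s
manual <- point_fc + c(-1, 1) * 1.96 * sd_resid

# Model-based interval, assuming a normal forecast distribution with variance sigma^2
model_based <- point_fc + c(-1, 1) * qnorm(0.975) * sqrt(sigma2)

intervals <- rbind(manual = manual, model_based = model_based)
colnames(intervals) <- c("lower", "upper")
print(round(intervals, 1))
```

The two rows differ only slightly because sqrt(sigma2) is a little larger than the sample standard deviation of the residuals.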
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
Plot the Exports series and discuss the main features of the data.
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
Compute the RMSE values for the training data.
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.)
Discuss the merits of the two forecasting methods for this data set. Compare the forecasts from both methods.
Which do you think is best?
Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
dominican_economy <- global_economy |>
filter(Country == 'Dominican Republic')
# a. Plot the Exports series and discuss the main features of the data.
p1 <-dominican_economy |>
autoplot(Exports) +
geom_smooth(method = 'loess', se = FALSE, color = 'red') +
labs(y = 'Exports (% of GDP)',
title = 'Exports: Dominican Republic')
p1
## `geom_smooth()` using formula = 'y ~ x'
The data show exports as a percentage of GDP over time. I added a geom_smooth() line to help identify the trend. Dominican exports rose from the 1970s to the early 2000s, apart from a dip in 1991-1992. After about 2003, exports as a percentage of GDP declined significantly, which might be related to the Dominican Republic's surge in tourism.
# STL decomposition method
dcmp <- dominican_economy |>
model(stl = STL(Exports))
components(dcmp)
## # A dable: 58 x 7 [1Y]
## # Key: Country, .model [1]
## # : Exports = trend + remainder
## Country .model Year Exports trend remainder season_adjust
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dominican Republic stl 1960 25.6 25.6 0.0271 25.6
## 2 Dominican Republic stl 1961 23.3 24.0 -0.764 23.3
## 3 Dominican Republic stl 1962 23.9 22.5 1.40 23.9
## 4 Dominican Republic stl 1963 20.7 21.0 -0.288 20.7
## 5 Dominican Republic stl 1964 19.7 19.8 -0.0638 19.7
## 6 Dominican Republic stl 1965 16.3 18.7 -2.44 16.3
## 7 Dominican Republic stl 1966 16.3 18.2 -1.82 16.3
## 8 Dominican Republic stl 1967 18.1 17.9 0.156 18.1
## 9 Dominican Republic stl 1968 18.5 18.0 0.441 18.5
## 10 Dominican Republic stl 1969 18.5 18.4 0.0587 18.5
## # ℹ 48 more rows
# plot STL decomposition
components(dcmp) |>
autoplot()
The STL decomposition shows a trend component and no seasonality, as expected for annual data.
# b. Use an ETS(A,N,N) model to forecast the series and plot the forecasts.
dr_fit_1ann <- dominican_economy |>
model(ETS(Exports ~ error('A') + trend('N') + season('N')))
report(dr_fit_1ann)
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.6403621
##
## Initial states:
## l[0]
## 24.65509
##
## sigma^2: 17.9938
##
## AIC AICc BIC
## 407.0921 407.5365 413.2734
fcDR5<-dr_fit_1ann |>
forecast(h = 5)
fcDR5
## # A fable: 5 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2018 N(25, 18) 24.8
## 2 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2019 N(25, 25) 24.8
## 3 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2020 N(25, 33) 24.8
## 4 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2021 N(25, 40) 24.8
## 5 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2022 N(25, 48) 24.8
fcDR5|>
autoplot(dominican_economy, level = NULL) +
labs(y = 'Exports (% of GDP)',
title = 'Exports Dominican Republic')
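The forecast distributions above widen with the horizon even though the point forecast is flat. For ETS(A,N,N), the standard result is that the h-step forecast variance equals sigma^2 * (1 + alpha^2 * (h - 1)). A minimal base R sketch, using the estimates reported above, reproduces the variances shown in the fable output:

```r
# Estimates copied from report(dr_fit_1ann) above
alpha  <- 0.6403621
sigma2 <- 17.9938

h <- 1:5
# Forecast variance of ETS(A,N,N) at horizon h: sigma^2 * (1 + alpha^2 * (h - 1))
fc_var <- sigma2 * (1 + alpha^2 * (h - 1))

# Half-width of a 95% prediction interval at each horizon
half_width <- qnorm(0.975) * sqrt(fc_var)

print(data.frame(h = h, variance = round(fc_var, 1), half_width = round(half_width, 2)))
```

Rounded to whole numbers, the variances match the second argument of the N(25, ...) distributions in the table.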
# c. Compute the RMSE values for the training data
dr_fit_1ann |>
accuracy() |>
select(Country, .model, .type, RMSE)
## # A tibble: 1 × 4
## Country .model .type RMSE
## <fct> <chr> <chr> <dbl>
## 1 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(\"N\") + s… Trai… 4.17
# estimated parameters
tidy(dr_fit_1ann)
## # A tibble: 2 × 4
## Country .model term estimate
## <fct> <chr> <chr> <dbl>
## 1 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(\"N\") … alpha 0.640
## 2 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(\"N\") … l[0] 24.7
# d. Compare the results with an ETS(A,A,N) model.
# e. Compare the forecasts from both models. Which is the best?
dr_fit_2aan <- dominican_economy |>
model(ETS(Exports ~ error('A') + trend('A') + season('N')))
fcDrAA5<-dr_fit_2aan |>
forecast(h = 5)
fcDrAA5
## # A fable: 5 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2018 N(25, 19) 24.9
## 2 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2019 N(25, 26) 24.9
## 3 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2020 N(25, 33) 24.9
## 4 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2021 N(25, 41) 24.9
## 5 Dominican Republic "ETS(Exports ~ error(\"A\") + trend(… 2022 N(25, 48) 24.9
fcDrAA5|>
autoplot(dominican_economy, level = NULL) +
labs(y = 'Exports (% of GDP)',
title = 'Exports: Dominican Republic')
# A,N,N
dr_fit_1ann |>
forecast(h = 1) |>
hilo(level = 95)
## # A tsibble: 1 x 6 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean `95%`
## <fct> <chr> <dbl> <dist> <dbl> <hilo>
## 1 Dominican Republic "ETS(Exports … 2018 N(25, 18) 24.8 [16.53003, 33.15803]95
# A,A,N
dr_fit_2aan |>
forecast(h = 1) |>
hilo(level = 95)
## # A tsibble: 1 x 6 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean `95%`
## <fct> <chr> <dbl> <dist> <dbl> <hilo>
## 1 Dominican Republic "ETS(Exports … 2018 N(25, 19) 24.9 [16.40285, 33.33814]95
RMSE_1 <- dr_fit_1ann |>
accuracy() |>
pull(RMSE)
# 1.96 for 95%
dr_fit_1ann |>
forecast(h = 1) |>
mutate(conf_lo = .mean - 1.96 * RMSE_1,
conf_hi = .mean + 1.96 * RMSE_1)
## # A fable: 1 x 7 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean conf_lo conf_hi
## <fct> <chr> <dbl> <dist> <dbl> <dbl> <dbl>
## 1 Dominican Republic "ETS(Exports ~ error… 2018 N(25, 18) 24.8 16.7 33.0
RMSE_2 <- dr_fit_2aan |>
accuracy() |>
pull(RMSE)
dr_fit_2aan |>
forecast(h = 1) |>
mutate(conf_lo = .mean - 1.96 * RMSE_2,
conf_hi = .mean + 1.96 * RMSE_2)
## # A fable: 1 x 7 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean conf_lo conf_hi
## <fct> <chr> <dbl> <dist> <dbl> <dbl> <dbl>
## 1 Dominican Republic "ETS(Exports ~ error… 2018 N(25, 19) 24.9 16.7 33.0
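The RMSE-based intervals are slightly narrower than the ones R reports. One plausible explanation, sketched below in base R for the ETS(A,N,N) model, is the divisor: the sketch assumes sigma^2 is the residual sum of squares divided by the number of observations minus the number of estimated parameters, while the RMSE divides by the number of observations. That assumption is consistent with the numbers above but is not confirmed by the output itself.

```r
# Values copied from the output above for the ETS(A,N,N) model
sigma2   <- 17.9938   # sigma^2 from report()
n_obs    <- 58        # annual observations in the series
n_params <- 2         # alpha and l[0]

# Assumption: sigma^2 = SSE / (n_obs - n_params), whereas RMSE = sqrt(SSE / n_obs)
sse          <- sigma2 * (n_obs - n_params)
rmse_implied <- sqrt(sse / n_obs)

# Half-widths of the two 95% intervals
half_rmse  <- 1.96 * rmse_implied
half_model <- qnorm(0.975) * sqrt(sigma2)

print(c(rmse = rmse_implied, half_rmse = half_rmse, half_model = half_model))
```

The implied RMSE agrees with the 4.17 reported by accuracy(), and the model-based half-width is the larger of the two. The trended model estimates more parameters, so the same adjustment would be somewhat larger there.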
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
china_gdp <- global_economy |>
filter(Country == "China") |>
select( GDP)
chinamod<-china_gdp |>
model(
suggested = ETS(GDP),
Holt=ETS(GDP ~error("A")+ trend ("A")+ season ("N")),
HoltDamped = ETS(GDP~ error("A")+ trend ("A")+ season ("N")),
MHoltDamped=ETS(GDP ~error("M") + trend ("Ad")+ season ("N"),
)
)
chinamod
## # A mable: 1 x 4
## suggested Holt HoltDamped MHoltDamped
## <model> <model> <model> <model>
## 1 <ETS(M,A,N)> <ETS(A,A,N)> <ETS(A,A,N)> <ETS(M,Ad,N)>
tidy(chinamod)
## # A tibble: 17 × 3
## .model term estimate
## <chr> <chr> <dbl>
## 1 suggested alpha 1.00e+ 0
## 2 suggested beta 3.12e- 1
## 3 suggested l[0] 4.57e+10
## 4 suggested b[0] 3.29e+ 9
## 5 Holt alpha 1.00e+ 0
## 6 Holt beta 5.52e- 1
## 7 Holt l[0] 5.03e+10
## 8 Holt b[0] 3.29e+ 9
## 9 HoltDamped alpha 1.00e+ 0
## 10 HoltDamped beta 5.52e- 1
## 11 HoltDamped l[0] 5.03e+10
## 12 HoltDamped b[0] 3.29e+ 9
## 13 MHoltDamped alpha 1.00e+ 0
## 14 MHoltDamped beta 3.37e- 1
## 15 MHoltDamped phi 9.80e- 1
## 16 MHoltDamped l[0] 4.57e+10
## 17 MHoltDamped b[0] 3.29e+ 9
accuracy(chinamod)
## # A tibble: 4 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 suggested Training 4.13e10 2.00e11 9.66e10 2.11 7.72 0.446 0.477 0.268
## 2 Holt Training 2.36e10 1.90e11 9.59e10 1.41 7.62 0.442 0.453 0.00905
## 3 HoltDamped Training 2.36e10 1.90e11 9.59e10 1.41 7.62 0.442 0.453 0.00905
## 4 MHoltDamped Training 4.65e10 2.00e11 9.70e10 2.32 7.79 0.447 0.476 0.236
# forecast() has no `level` argument; intervals are hidden in autoplot() instead
chinamodfc <- chinamod |>
  forecast(h = 30)
chinamodfc |>
  autoplot(china_gdp, level = NULL) +
  labs(title = "China GDP, multiple models: 30 years") +
  theme_minimal()
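R selected the ETS(M,A,N) model: the data are annual, so there is no seasonality, and the series has a strong trend. Holt's linear model, ETS(A,A,N), gives very similar forecasts. Note that the model labelled HoltDamped was specified with trend("A") rather than trend("Ad"), so it is identical to Holt, as the matching parameter estimates and accuracy measures show. The only genuinely damped model is MHoltDamped, ETS(M,Ad,N), whose damping parameter (phi = 0.98) gradually flattens the trend over the forecast horizon; its forecasts were nearly identical to the others. I prefer the ETS(M,A,N) model and Holt's linear model.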
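To build intuition for what damping does, recall that Holt's method forecasts l_T + h * b_T, while an additive damped trend forecasts l_T + (phi + phi^2 + ... + phi^h) * b_T. The base R sketch below compares the two trend multipliers over 30 steps, using the phi estimate reported for the M,Ad,N model; it illustrates the point-forecast formula only and does not refit any model.

```r
# Damping parameter copied from tidy(chinamod) for the M,Ad,N model
phi <- 0.98
h   <- 1:30

# Multiplier applied to the last trend estimate b_T in the point forecast
undamped <- h                 # Holt: l_T + h * b_T
damped   <- cumsum(phi^h)     # damped: l_T + (phi + phi^2 + ... + phi^h) * b_T

print(round(damped[c(1, 10, 30)], 2))
# Limit of the damped multiplier as h grows: phi / (1 - phi)
print(phi / (1 - phi))
```

With phi close to 1 the two multipliers are almost equal at short horizons and diverge only gradually, which is why a long forecast horizon is needed to see the difference in a plot.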
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
ausgas<- aus_production |>
select(Gas)
ausgas_fit<-ausgas|>
model(
Suggested= ETS(Gas),
MAdM = ETS(Gas~ error("M")+ trend ("Ad")+ season("M"))
)
ausgas_fit
## # A mable: 1 x 2
## Suggested MAdM
## <model> <model>
## 1 <ETS(M,A,M)> <ETS(M,Ad,M)>
ausgas_FC<- ausgas_fit |>
forecast(h = "4 years")
ausgas_FC
## # A fable: 32 x 4 [1Q]
## # Key: .model [2]
## .model Quarter Gas .mean
## <chr> <qtr> <dist> <dbl>
## 1 Suggested 2010 Q3 N(259, 218) 259.
## 2 Suggested 2010 Q4 N(214, 243) 214.
## 3 Suggested 2011 Q1 N(201, 329) 201.
## 4 Suggested 2011 Q2 N(242, 699) 242.
## 5 Suggested 2011 Q3 N(263, 1210) 263.
## 6 Suggested 2011 Q4 N(217, 1108) 217.
## 7 Suggested 2012 Q1 N(204, 1279) 204.
## 8 Suggested 2012 Q2 N(246, 2379) 246.
## 9 Suggested 2012 Q3 N(267, 3609) 267.
## 10 Suggested 2012 Q4 N(220, 3037) 220.
## # ℹ 22 more rows
ausgas_FC|>
autoplot(ausgas , level = NULL)+
labs(title = "Aus Gas Production: 4 Years")+
theme_minimal()
accuracy(ausgas_fit)
## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Suggested Training -0.115 4.60 3.02 0.199 4.08 0.542 0.606 -0.0131
## 2 MAdM Training -0.00439 4.59 3.03 0.326 4.10 0.544 0.606 -0.0217
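Multiplicative seasonality is needed because the seasonal swings in gas production grow as the level of the series rises. The suggested ETS(M,A,M) model and the damped ETS(M,Ad,M) model fit the training data almost identically: the damped model has a marginally lower RMSE (4.59 versus 4.60), while the suggested model has slightly lower MAE, MAPE and MASE. Damping the trend therefore does not materially improve the fit.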
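The difference between additive and multiplicative seasonality can be illustrated with a toy calculation in base R. The numbers are hypothetical and are not taken from the Gas series; they only show how the peak-to-trough range behaves as the level grows.

```r
# Hypothetical illustration, not the Gas data: a level that grows over time
level <- c(50, 100, 200)

# Additive seasonality: the same absolute swing at every level
additive_peak   <- level + 10
additive_trough <- level - 10

# Multiplicative seasonality: the same proportional swing at every level
mult_peak   <- level * 1.2
mult_trough <- level * 0.8

print(data.frame(
  level,
  additive_range = additive_peak - additive_trough,
  multiplicative_range = mult_peak - mult_trough
))
```

Under the additive form the range stays constant, while under the multiplicative form it scales with the level, which is the pattern a series with growing seasonal variation calls for.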
Recall your retail time series data (from Exercise 7 in Section 2.10).
Why is multiplicative seasonality necessary for this series?
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
Check that the residuals from the best method look like white noise. Now find the test set RMSE, while training the model to the end of 2010.
Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
set.seed(3231)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
filter(year(Month) < 2011)
fitmy <- myseries_train |>
model(SNAIVE(Turnover ~ lag("year")))
fc <- fitmy |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fcplot<-fc |> autoplot(myseries,level = NULL)+
autolayer(myseries_train, Turnover, colour = "red")+
labs(title="Naive approach prediction test")+
theme_minimal()
fcplot
a. Multiplicative seasonality is needed for this series because the seasonal fluctuations grow as the level of the series increases.
myseries_ets<-myseries|>
model(`Multiplicative` = ETS(
Turnover ~ error("M") + trend("A") + season("M")),
`Damped Multiplicative` = ETS(
Turnover ~ error("M") + trend("Ad") + season("M")))
accuracy(myseries_ets)|>
select(.model, RMSE)
## # A tibble: 2 × 2
## .model RMSE
## <chr> <dbl>
## 1 Multiplicative 3.34
## 2 Damped Multiplicative 3.31
# Fit both models to the training set (data to the end of 2010)
myseries_training_ets<-myseries_train|>
model(`Multiplicative` = ETS(
Turnover ~ error("M") + trend("A") + season("M")),
`Damped Multiplicative` = ETS(
Turnover ~ error("M") + trend("Ad") + season("M"))
)
myseries_training_ets
## # A mable: 1 x 4
## # Key: State, Industry [1]
## State Industry Multiplicative `Damped Multiplicative`
## <chr> <chr> <model> <model>
## 1 Tasmania Food retailing <ETS(M,A,M)> <ETS(M,Ad,M)>
accuracy(myseries_training_ets)|>
select(.model, RMSE)
## # A tibble: 2 × 2
## .model RMSE
## <chr> <dbl>
## 1 Multiplicative 2.91
## 2 Damped Multiplicative 2.93
accuracy(fitmy)|>
select(.model, RMSE)
## # A tibble: 1 × 2
## .model RMSE
## <chr> <dbl>
## 1 "SNAIVE(Turnover ~ lag(\"year\"))" 7.47
myseries|>
model(
multiplicative = ETS(
Turnover ~ error("M") + trend("A") + season("M"))) |>
gg_tsresiduals() +
ggtitle("Multiplicative Method")
myseries |>
model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
augment() |>
features(.innov, box_pierce, lag = 2, dof = 0)
## # A tibble: 1 × 5
## State Industry .model bp_stat bp_pvalue
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Tasmania Food retailing multiplicative 5.22 0.0736
ets_fc <-myseries_training_ets|> forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fcplot2 <- ets_fc |>
  autoplot(myseries, level = NULL) +
  # autolayer() on the training data takes no `level` argument
  autolayer(myseries_train, Turnover, colour = "red") +
  labs(title = "ETS Models") +
  guides(colour = guide_legend(title = "Model")) +
  theme_minimal() +
  theme(legend.position = "top")
grid.arrange(fcplot,fcplot2, ncol = 2)
From the figure above, the Holt-Winters' multiplicative method appears to outperform the SNAIVE model. Although its forecasts deviate from the data at the beginning of the test period, they track the overall trend and are close to the test data by the end of the period. The model does miss some short-term fluctuations, which suggests it is slow to respond to sudden changes in the data.
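The comparison above is visual. The exercise also asks for the test-set RMSE, which is the square root of the mean squared difference between the held-out observations and the forecasts. In fpp3 this is usually obtained by passing the forecasts and the full series to accuracy(). The base R sketch below shows the calculation itself; the helper name and all numbers are hypothetical and are not taken from the retail series.

```r
# Root mean squared error between held-out observations and forecasts
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Hypothetical numbers for illustration only (not the retail series)
actual     <- c(100, 110, 120, 130)
fc_model_a <- c(102, 108, 123, 127)
fc_model_b <- c(95, 104, 112, 121)

print(c(model_a = rmse(actual, fc_model_a), model_b = rmse(actual, fc_model_b)))
```

The model with the lower test-set RMSE is the one that forecast the held-out period more accurately, which is a firmer basis for comparison than the training RMSE.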
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
my_lambda <- myseries_train |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries_train_bc<-myseries_train |>
model(STL(box_cox(
Turnover,my_lambda) ~ trend(window = 21) +
season(window = "periodic"), robust = TRUE))
components(myseries_train_bc) |>
autoplot() +
ggtitle("STL with Box-Cox Transformation")
adjust_seas <- myseries_train |>
model(STL_box = STL(
box_cox(Turnover,my_lambda) ~ trend(window = 21) +
season(window = "periodic"), robust = TRUE)) |>
components()
myseries_train$Turnover_ad <- adjust_seas$season_adjust
ets_ad_fit<-myseries_train|>
model(ETS(Turnover_ad))
ets_ad_fit|>
gg_tsresiduals() +
ggtitle("Turnover Residuals")
accuracy(ets_ad_fit) |>
select(RMSE, RMSSE)
## # A tibble: 1 × 2
## RMSE RMSSE
## <dbl> <dbl>
## 1 0.0585 0.389
accuracy(myseries_training_ets) |>
select(RMSE, RMSSE)
## # A tibble: 2 × 2
## RMSE RMSSE
## <dbl> <dbl>
## 1 2.91 0.389
## 2 2.93 0.392
fcplot3 <- ets_ad_fit |>
  forecast(h = "9 years") |>
  autoplot(myseries_train, color = "purple", level = 95) +
  labs(title = "Seasonally adjusted series + 9-year forecast") +
  theme_minimal() +
  theme(legend.position = "bottom")
grid.arrange(fcplot,fcplot2,fcplot3)
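Removing the seasonal component from the Box-Cox transformed training data leaves a smoother series, so the ETS forecasts of the seasonally adjusted data are smoother as well. Note that the RMSE of 0.0585 is measured on the Box-Cox transformed, seasonally adjusted scale, so it cannot be compared directly with the RMSE values of 2.91 and 2.93 from the earlier models. The scale-free RMSSE (0.389) is essentially the same as that of the multiplicative model. For a fair comparison on the test set, the forecasts would need to be back-transformed and reseasonalised first.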
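Accuracy measures computed on Box-Cox transformed data are on a different scale from the original turnover figures. A minimal base R sketch of the transform and its inverse for positive data is given below. The function names, the lambda value and the data are hypothetical; fabletools supplies its own box_cox() and inv_box_cox() for real use.

```r
# Box-Cox transform and its inverse for positive data (log case when lambda is 0)
box_cox_fwd <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
box_cox_inv <- function(x, lambda) {
  if (lambda == 0) exp(x) else (lambda * x + 1)^(1 / lambda)
}

# Hypothetical values for illustration only
lambda   <- 0.3
turnover <- c(50, 100, 150)

transformed <- box_cox_fwd(turnover, lambda)
recovered   <- box_cox_inv(transformed, lambda)

print(rbind(turnover, transformed = round(transformed, 3), recovered))
```

The transformed values are much closer together than the originals, so errors measured on that scale are numerically smaller without the forecasts being any better. Applying the inverse returns the original units, in which the test-set comparison should be made.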