First, let’s construct our time series for pigs slaughtere din Victoria and plot
victoria_pigs <- aus_livestock %>% filter(Animal == "Pigs" & State == "Victoria")
victoria_pigs %>% autoplot(Count)
There doesn’t appread to be strong seasonality or a general overall trend in this time series, so simple exponential smoothing makes sense to use. Now, let’s construct a model via the ETS function for SES. We’ll mirror the techniques used in Hyndman
# Fit model with only an error term to reflect SES
fit <- victoria_pigs %>%
model(ETS(Count ~ error("A") + trend("N") + season("N")))
fc <- fit |>
forecast(h = 4)
Plotting our forecast over our time series gives us the following
# Plot victorian pigs model
fc |>
autoplot(victoria_pigs) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit)) +
labs(y="Number of Pigs", title="Victorian Pigs Slaughtered") +
guides(colour = "none")
To find the optimized values fo \(\alpha\) and \(l_0\), we can use the report
function on our fit mable. We see optimized values for each
parameter in our SES model below:
fit %>% report()
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
We can look at the export data from France
# Add fill_gaps for later pltting/forecasting
france <- global_economy %>% filter(Country == "France") %>% fill_gaps()
head(france)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France FRA 1960 62651474947. NA 10.4 12.6 14.4 46814237
## 2 France FRA 1961 68346741504. 5.51 10.7 12.3 13.9 47444751
## 3 France FRA 1962 76313782252. 6.67 11.3 12.1 12.8 48119649
## 4 France FRA 1963 85551113767. 5.35 11.8 12.5 12.6 48803680
## 5 France FRA 1964 94906593388. 6.52 12.2 13.0 12.6 49449403
## 6 France FRA 1965 102160571409. 4.78 12.5 12.6 13.2 50023774
france %>% autoplot(Exports)
While this data isn’t strongly seasonal (no noticeable patterns), there
is a general upward trend in this data that may need to be accounted
for.
Let’s fit an \((A, N, N)\) ETS model to this dataset and forecast out 5 years
france_fit <- france %>%
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
france_fc <- france_fit %>%
forecast(h = 5)
france_fc %>% autoplot(france) +
# geom_line(aes(y = .fitted), color="#456fd4" ,
# data = augment(france_fit)) +
labs(y="Exports", title="French Export Forecast") +
guides(colour = "none")
Now we’cc ompute RMSE values for the training data ismilar to the method
used in Hyndman
# Get accuracy, including RMSE
acc_ANN <- france_fit %>% accuracy()
acc_ANN
## # A tibble: 1 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France "ETS(Exports… Trai… 0.284 1.15 0.840 1.17 3.86 0.983 0.991 -0.00542
We see an RMSE of 1.152 on our trainging data (the
france time series). We can compare this accuracy to an
\((A, A, N)\) model on the same
france_fit_AAN <- france %>%
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
france_fc_AAN <- france_fit_AAN %>%
forecast(h = 5)
# Display both accuracies
acc_AAN <- france_fit_AAN %>% accuracy()
rbind(acc_ANN, acc_AAN)
## # A tibble: 2 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France "ETS(Expo… Trai… 0.284 1.15 0.840 1.17 3.86 0.983 0.991 -0.00542
## 2 France "ETS(Expo… Trai… -0.0111 1.12 0.800 -0.243 3.81 0.936 0.964 0.0264
With an \((A, A, N)\) ETS model we see an improvement in our RMSE value. In this case a model with an (additive) trend term makes sense as the export series itself does have a general upward trend. In addition, the trend forecast (below with prediction intervals) does continue a general increase, which could be a resonable assumption for export data (generally, GDP/Export numbers will increase YoY). However, the non-trended forecaste does incorporate events that would break the upward trend (recession, etc.) better, and may be more realistic. In this case, a damped trended forecast could be a good “middle ground”
france_fc_AAN %>% autoplot(france)
##### Prediction Intervals
Our general formula for a prediction interval is:
\[\begin{aligned} \hat{y}_{T+h|T} = c\sigma_h \end{aligned}\]Where \(c=1.96\) as our Z-score value for a 95% confidence interval, and \(\signa_h\) is our forecast variance (given by the Hyndman text here)
c = 1.96
# Get residuals of ANN fit
france_ANN_aug <- france_fit %>% augment()
var_ann <- var(france_ANN_aug$.resid)
france_ANN_aug$interval_lower <- france_ANN_aug$.fitted - c * var_ann
france_ANN_aug$interval_upper <- france_ANN_aug$.fitted + c * var_ann
# Get residuals of AAN fit
france_AAN_aug <- france_fit_AAN %>% augment()
var_aan <- var(france_AAN_aug$.resid)
# Calculate values
france_AAN_aug$interval_lower <- france_AAN_aug$.fitted - c * var_aan
france_AAN_aug$interval_upper <- france_AAN_aug$.fitted + c * var_aan
head(france_AAN_aug)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .fitted .resid .innov interval_lower
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France "ETS(Exports ~ err… 1960 14.4 13.7 0.711 0.711 11.2
## 2 France "ETS(Exports ~ err… 1961 13.9 14.7 -0.752 -0.752 12.2
## 3 France "ETS(Exports ~ err… 1962 12.8 14.3 -1.44 -1.44 11.8
## 4 France "ETS(Exports ~ err… 1963 12.6 13.2 -0.625 -0.625 10.7
## 5 France "ETS(Exports ~ err… 1964 12.6 12.9 -0.304 -0.304 10.4
## 6 France "ETS(Exports ~ err… 1965 13.2 13.0 0.265 0.265 10.5
## # ℹ 1 more variable: interval_upper <dbl>
Alternatively, e can use the hilo method as well to
construct a 95% confidence interval
france_fc_AAN %>% hilo(level = c(95))
## # A tsibble: 5 x 6 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean `95%`
## <fct> <chr> <dbl> <dist> <dbl> <hilo>
## 1 France "ETS(Exports ~ error(\"… 2018 N(31, 1.3) 31.2 [28.89915, 33.44901]95
## 2 France "ETS(Exports ~ error(\"… 2019 N(31, 2.6) 31.5 [28.35198, 34.62019]95
## 3 France "ETS(Exports ~ error(\"… 2020 N(32, 3.8) 31.8 [27.99403, 35.60215]95
## 4 France "ETS(Exports ~ error(\"… 2021 N(32, 5) 32.1 [27.73743, 36.48275]95
## 5 France "ETS(Exports ~ error(\"… 2022 N(32, 6.2) 32.4 [27.54660, 37.29758]95
china <- global_economy %>% filter(Country == "China")
china_fit <- china %>% model(
`(A, A, N)` = ETS(Exports ~ error("A") + trend("A") + season("N")),
`(A, N, N)` = ETS(Exports ~ error("A") + trend("N") + season("N")),
`Damped Trend (A, A, N)` = ETS(Exports ~ error("A") +
trend("Ad", phi = 0.9) + season("N"))
)
china_fc<- china_fit %>%
forecast(h = 20)
china_fc %>% autoplot(china) + labs(y="Exports", title="Forecasts of Chinese exports: 2018 - 2038")
Looking at this forecast of 20 years out, we see that the damped method (blue) tends to have a less “extreme” impact than the forecast with a trend. This helps to account for reversals in the trend that the naive additive method (with a trend term) does not always account for.
aus_gas <- aus_production %>% select(Gas)
aus_gas %>% autoplot(Gas)
In this case, a multiplicative model would be best since the variance of
our seasonality changes with time.
gas_fit <- aus_gas %>%
model(ETS(Gas ~ error("A") + trend("A") + season("M")))
gas_fc <- gas_fit %>% forecast(h=4)
# Plot gas forecast
gas_fc %>% autoplot(aus_gas)
china_aug <- china_fit %>% augment()
var(china_aug$.resid)
## [1] 3.589456
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# Plot retail data
myseries %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
In this case, multiplicative seasonality since the seasonal variance increases with time.
Fitting Holt-Winters’ multiplicative model to the retail data, we see the following:
fit <- myseries |>
model(
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
damped = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
)
fc <- fit |> forecast(h = "5 years")
fc |>
autoplot(myseries) +
labs(title="Australian Retail Turnover",
y="Turnover") +
guides(colour = guide_legend(title = "Forecast"))
With a damped trend (orange) we see a flattening of the forecast
level with time, while the reegular trend method from Holt-Winters
continues to increase in level. We can compare the RMSE of the two
methods via the accuracy function:
fit %>% accuracy()
## # A tibble: 2 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Northern … Clothin… multi… Trai… -0.0128 0.613 0.450 -0.469 5.15 0.513 0.529
## 2 Northern … Clothin… damped Trai… 0.0398 0.616 0.444 -0.0723 5.06 0.507 0.531
## # ℹ 1 more variable: ACF1 <dbl>
We see a slightly better performance in terms of RMSE from the damped
method as opposed to the regular trended method. We can check the
residuals from the aumented method with our fit passed
augmented <- fit %>% augment()
hist(augmented$.resid)
train <- myseries %>% filter(Month < yearmonth("Jan 2011"))
# Train a model on new test set with damped method
fit_train <- train |> model(
damped = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
)
prediction_interval <- "10 years"
fc <- fit_train |> forecast(h = prediction_interval)
fc |>
autoplot(train) +
labs(title="Australian Retail Turnover",
y="Turnover") +
guides(colour = guide_legend(title = "Forecast"))
Now let’s look at the RMSE of our fit trained up until 2011, we see a better RMSE than our other model from exercise 5.7 (\(0.51 < 1.51\))
fit_train %>% accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 North… Clothin… damped Trai… 0.0357 0.519 0.392 0.158 5.34 0.428 0.427 0.0233
We can transform our retail timeseries via the same method applied
lambda <- train %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# Transform turnover series based on returned lambda
train$turnover_bc <- box_cox(train$Turnover, lambda=lambda)
train %>% autoplot(turnover_bc) + labs(title="Box-Cox transformed Australian Retail Turnover")
Now that we have our Box-Cox transformed series, we
# Fit an STL
fit_bc_stl <- train %>% model(
STL(turnover_bc)
)
comp_bc <- components(fit_bc_stl)
comp_bc %>% autoplot()
Now we can apply ETS on the seasonally-adjusted data
from our comp variable
fit_bc_ets <- comp_bc %>% as_tsibble() %>%
model(
seasonal_adjustment = ETS(season_adjust ~ error("A") + trend("A"))
) %>% dplyr::select(-c(.model))
# forecast based on our box-cox transformed seasonally-adjusted data
fc_bc <- fit_bc_ets %>%
forecast(h=prediction_interval)
fit_bc_ets %>% accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… seaso… Trai… 1.21e-4 0.0890 0.0684 -0.0473 3.16 0.363 0.354
## # ℹ 1 more variable: ACF1 <dbl>
Here we see a much lower RMSE value of 0.089. This is an improvement on the models above