### Load Libraries
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
Consider the the number of pigs slaughtered in Victoria,
available in the aus_livestock
dataset.
Use the ETS()
function to estimate the equivalent
model for simple exponential smoothing. Find the optimal values of alpha
and l_0, and generate forecasts for the next four months.
The ETS model produces an l_0 of 67178.49 and an alpha of 0.225021.
Originally, without specification, an Additive Holt-Winters’ with additive errors was selected but I forced it to be simple exponential smoothing with additive errors to complete Part b.
I also took just the data from January 2010 onwards. My thought being it would display better, and, data from the distant past may not be as relevant for our purposes.
### Exercise 8.1
## Part a
# Select data, truncated for display purposes
pigs_slaughtered <- aus_livestock |>
filter(Animal == "Pigs") |>
filter(State == "Victoria") |>
filter_index("2010 Jan" ~ .)
# Create forecast
fit <- pigs_slaughtered |>
model(ETS(Count ~ error("A") + trend("N") + season("N")))
fc1 <- fit |>
forecast(h = 4)
# Graph forecast
fc1 |>
autoplot(pigs_slaughtered) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit)) +
labs(y="Count", title="Pigs slaughtered: Victoria")
# Obtain alpha and l_0
report(fit)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.225021
##
## Initial states:
## l[0]
## 67178.49
##
## sigma^2: 49463947
##
## AIC AICc BIC
## 2423.061 2423.292 2431.107
Compute a 95% prediction interval for the first forecast using y_hat ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
Since fable is able to formulaically calculate the prediction interval we should get the same values if we calculate ourselves or if we use the 95% prediction interval generated from R. However I get 67178.49±13784.8 vs the 95295±4.9e+07 generated by R.
I need to do more digging. It looks like 95% is the default interval
because the interval generated by R is the same for above where we
didn’t specify 95% and below where we did. Maybe I need a different
function than report()
to get the forecasted values. And if
l_0 is 67178.49, then is l_1 95295?
## Part b
# Obtain 95% prediction interval from R
fc2 <- fit |>
forecast(h = 4, level = 95)
print(fc2$Count[1])
## <distribution[1]>
## [1] N(95295, 4.9e+07)
# Calculate the 95% Prediction Interval ourselves
sigma <- sqrt(49463947)
1.96*sigma
## [1] 13784.8
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
Italia!
Plot the Exports series and discuss the main features of the data.
Exports is defined as a percentage of GDP. We’ve graphed both Exports as a percentage of GDP and Exports in USD, however for modeling we’re going to stick with Exports as a percentage of GDP because otherwise the story gets lost in the general growth of the Italian economy.
The graph generally shows a transition from a low export economy (10% of GDP) in the 1960s to a high export economy (>30%) in 2017. In the 1990s there was a dramatic drop in exports as a percentage of GDP.
For context I looked for what economic challenges Italy faced in the 1990s and amongst increased globalism changing supply chains notably there was a 1992 Lira crisis where investors put pressure on the Italian government to maintain their exchange rate betting that Italy couldn’t do it with high public debt and a bad economy and they bet right, Italy had to devalue the Lira and they made a lot of money.
### Exercise 8.5
## Part a
italy_economy <- global_economy |>
filter(Country == "Italy")
# filter(State == "Victoria") |>
#filter_index("2010 Jan" ~ .)
italy_economy <- italy_economy |>
mutate(ExportsNum = GDP * Exports)
autoplot(italy_economy, vars(Exports)) +
labs(y="Exports as % of GDP", x="", title="Global Economy: Italy")
autoplot(italy_economy, vars(ExportsNum)) +
labs(y="Exports in USD", x="", title="Global Economy: Italy")
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
## Part b
# simplify tsibble so I don't have to specify Export column
italy_exports <- italy_economy |>
select(Year, Exports)
# Create forecast
fit2 <- italy_exports |>
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc3 <- fit2 |>
forecast(h = 6)
# Plot forecast
fc3 |>
autoplot(italy_exports) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit2)) +
labs(y="Percentage", title="Italian Exports as a Percentage of GDP")
Compute the RMSE values for the training data.
The RMSE is 1.335065
## Part c
# Obtain RMSE
accuracy(fit2)[4]
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 1.34
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
Visually the new forecast with trend is a better baseline than the previous forecast without trend. Also the RMSE improved from 1.335065 to 1.295316.
## Part d
# Create forecast
fit3 <- italy_exports |>
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
fc4 <- fit3 |>
forecast(h = 6)
# Plot forecast
fc4 |>
autoplot(italy_exports) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit3)) +
labs(y="Percentage", title="Italian Exports as a Percentage of GDP with Trend")
# Obtain RMSE
accuracy(fit3)[4]
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 1.30
Find an ETS model for the Gas data from
aus_production
and forecast the next few years. Why is
multiplicative seasonality necessary here? Experiment with making the
trend damped. Does it improve the forecasts?
Oh, this is easy. Look at the two graphs below, the first with multiplicative seasonality and the second with additive. The multiplicative has more amplitude with the seasonality in the forecast to reflect the increasing mean due to trend whereas the additive seasonality is constant.
Look at the third plot to see trend dampening. From the text, trend dampening is consistently used in commercial applications because without trend dampening forecast tend to run high with positive trend. Trend dampening allows for regression to the mean and in this case, we’ll take it.
### Exercise 8.7
# Select data
aus_production_gas <- aus_production |>
select(Quarter, Gas) |>
filter_index("2000 Q1" ~ .)
# model(ETS(Gas ~ error("A") + trend("A") + season("N")))
fit4 <- aus_production_gas |>
model(ETS(Gas))
fc5 <- fit4 |>
forecast(h = 10)
# Plot forecast
fc5 |>
autoplot(aus_production_gas) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit4)) +
labs(y="Petajoules", title="Australian Production: Gas, with Multiplicative Seasonality")
#report(fit4)
fit5 <- aus_production_gas |>
model(ETS(Gas ~ error("M") + trend("A") + season("A")))
fc6 <- fit5 |>
forecast(h = 10)
# Plot forecast
fc6 |>
autoplot(aus_production_gas) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit5)) +
labs(y="Petajoules", title="Australian Production: Gas, with Additative Seasonality")
fit6 <- aus_production_gas |>
model(ETS(Gas ~ error("M") + trend("Ad") + season("M")))
fc7 <- fit6 |>
forecast(h = 10)
# Plot forecast
fc7 |>
autoplot(aus_production_gas) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit6)) +
labs(y="Petajoules", title="Aus Prod: Gas, with Multiplicative Season + Dampening Trend")
Recall your retail time series data (from Exercise 7 in Section 2.10).
### Exercise 8.8
# Create my retail time series data
set.seed(20240915)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Why is multiplicative seasonality necessary for this series?
Because there is trend (in this case negative) we want to see multiplicative seasonality so that the amplitude of the seasonality can come down with the mean point forecast. With additive seasonality the amplitude will stay high even at lower mean forecasts.
myseries <- myseries |>
filter_index("2015 Jan" ~ .)
fit7 <- myseries |>
model(ETS(Turnover ~ error("M") + trend("A") + season("M")))
fc8 <- fit7 |>
forecast(h = 48)
# Plot forecast
fc8 |>
autoplot(myseries) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit7)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
data(myseries)
## Warning in data(myseries): data set 'myseries' not found
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
We already did the additive trend, multiplicative seasonality in the above example. Below we recreate it with damped trend. What we find is the negative trend levels off like a regression to the mean.
fit8 <- myseries |>
model(ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
fc9 <- fit8 |>
forecast(h = 48)
# Plot forecast
fc9 |>
autoplot(myseries) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit8)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
These RMSEs are very close. By dampening the trend the RMSE decreases from 2.806523 to 2.80587 so we prefer the dampened trend, especially for longer forecast horizons.
# Obtain RMSE
accuracy(fit7)[6]
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 2.81
# Obtain RMSE
accuracy(fit8)[6]
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 2.81
Check that the residuals from the best method look like white noise.
I graphed the residuals below and it looks like it could be normally
distributed however I wasn’t certain so I then graphed the
autocorrelation function and only one spike was outside of the blue
lines. If I set maxLag
equal to twelve I get the same graph
so I assume it automatically picks a lag of twelve for monthly data.
I asked ChatGPT in case I’m missing something and it suggested I perform a Ljung-Box test as a statistical test to determine if the results are white noise. So I ran that too and, if we use a p-value of less than 0.05 as a metric then, we accept the residuals as white noise with a p-value of 0.04088.
residuals_fit8 <- fit8 |>
augment(fit8)
ggplot(residuals_fit8, aes(x = Month, y = .resid)) +
geom_line() +
labs(title = "Residuals from ETS Model", y = "Residuals", x = "Time")
residuals_acf <- residuals_fit8 |>
ACF(.resid)
# Step 3: Plot the ACF using autoplot()
autoplot(residuals_acf) +
labs(title = "ACF of Residuals from ETS Model")
Box.test(residuals_fit8$.resid, lag = 12, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: residuals_fit8$.resid
## X-squared = 21.712, df = 12, p-value = 0.04088
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
We first ran a seasonal naïve model and arrived at an RMSE value of 5.382176 then step-wise made changes to approach a Holt-Winters Damped model and each time made improvements on RMSE. (Arranged in increasing RMSE, only the final Holt-Winters Damped is coded)
A,Ad,M = 4.824267 (Holt-Winters Damped) A,A,M = 4.914536
A,A,A = 5.38059
A,N,A = 5.382176 (Seasonal Naïve)
set.seed(20240915)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_test <- myseries |>
filter_index(. ~ "2010 Dec")
fit9 <- myseries_test |>
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fc10 <- fit9 |>
forecast(h = 96)
# Plot forecast
fc10 |>
autoplot(myseries) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit9)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit9)[6]
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 4.82
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
This was incredibly complicated for me and the resulting work is inelegant with lots of steps. I first did just the Box-Cox transformed series and what I learned is I needed to then transform my original data as well for comparison. This meant the RMSE changed from 4.824267 to 0.04834055 but it’s not comparable because the scale is now different. It would have been interesting if I could have undone the box-cox transformation after I created the model.
Then I very tortuously added the STL following from what I could gather in Chapter 5.7 Forecasting with decomposition. I learned that I needed to make the forecast tsibble and the original tsibble with the same variables and applying the STL decomposition was an improvement over just applying the Box-Cox transformation, from 0.04834055 to 0.04411804.
lambda <- myseries_test |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries_test_bc <- myseries_test |>
mutate(Turnover_bc = box_cox(Turnover, lambda))
fit10 <- myseries_test_bc |>
model(ETS(Turnover_bc ~ error("A") + trend("Ad") + season("M")))
fc11 <- fit10 |>
forecast(h = 96)
myseries_bc <- myseries |>
mutate(Turnover_bc = box_cox(Turnover, lambda))
# Plot forecast
fc11 |>
autoplot(myseries_bc) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit10)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit10)[6]
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 0.0483
set.seed(20240915)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries_bc <- myseries |>
mutate(Turnover_bc = box_cox(Turnover, lambda))
dcmp <- myseries_bc |>
model(STL(Turnover_bc)) |>
components()
dcmp_sa <- dcmp |>
select(Month, season_adjust) |>
rename(Turnover = season_adjust)
myseries_stl_tibble = merge(myseries_bc, dcmp_sa, by="Month") |>
rename(Turnover = Turnover.y)
myseries_stl <- myseries_stl_tibble |>
as_tsibble(index = Month) |>
select(Month, Turnover)
myseries_stl_test <- myseries_stl |>
filter_index(. ~ "2010 Dec")
fit11 <- myseries_stl_test |>
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fc12 <- fit11 |>
forecast(h = 96)
myseries_keystructure <- myseries |>
select(Month, Turnover)
myseries_keystructure <- myseries_keystructure |>
mutate(Turnover = box_cox(Turnover, lambda))
# Plot forecast
fc12 |>
autoplot(myseries_keystructure) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit11)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit11)[6]
## # A tibble: 1 × 1
## MPE
## <dbl>
## 1 0.0441
These exercises come from ‘Forecasting: Principles and Practice’ by
Rob J Hyndman and George Athanasopoulos, 3rd Ed
https://otexts.com/fpp3/
### Load Libraries
library(fpp3)
### Exercise 8.1
## Part a
# Select data, truncated for display purposes
pigs_slaughtered <- aus_livestock |>
filter(Animal == "Pigs") |>
filter(State == "Victoria") |>
filter_index("2010 Jan" ~ .)
# Create forecast
fit <- pigs_slaughtered |>
model(ETS(Count ~ error("A") + trend("N") + season("N")))
fc1 <- fit |>
forecast(h = 4)
# Graph forecast
fc1 |>
autoplot(pigs_slaughtered) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit)) +
labs(y="Count", title="Pigs slaughtered: Victoria")
# Obtain alpha and l_0
report(fit)
## Part b
# Obtain 95% prediction interval from R
fc2 <- fit |>
forecast(h = 4, level = 95)
print(fc2$Count[1])
# Calculate the 95% Prediction Interval ourselves
sigma <- sqrt(49463947)
1.96*sigma
### Exercise 8.5
## Part a
italy_economy <- global_economy |>
filter(Country == "Italy")
# filter(State == "Victoria") |>
#filter_index("2010 Jan" ~ .)
italy_economy <- italy_economy |>
mutate(ExportsNum = GDP * Exports)
autoplot(italy_economy, vars(Exports)) +
labs(y="Exports as % of GDP", x="", title="Global Economy: Italy")
autoplot(italy_economy, vars(ExportsNum)) +
labs(y="Exports in USD", x="", title="Global Economy: Italy")
## Part b
# simplify tsibble so I don't have to specify Export column
italy_exports <- italy_economy |>
select(Year, Exports)
# Create forecast
fit2 <- italy_exports |>
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc3 <- fit2 |>
forecast(h = 6)
# Plot forecast
fc3 |>
autoplot(italy_exports) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit2)) +
labs(y="Percentage", title="Italian Exports as a Percentage of GDP")
## Part c
# Obtain RMSE
accuracy(fit2)[4]
## Part d
# Create forecast
fit3 <- italy_exports |>
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
fc4 <- fit3 |>
forecast(h = 6)
# Plot forecast
fc4 |>
autoplot(italy_exports) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit3)) +
labs(y="Percentage", title="Italian Exports as a Percentage of GDP with Trend")
# Obtain RMSE
accuracy(fit3)[4]
### Exercise 8.7
# Select data
aus_production_gas <- aus_production |>
select(Quarter, Gas) |>
filter_index("2000 Q1" ~ .)
# model(ETS(Gas ~ error("A") + trend("A") + season("N")))
fit4 <- aus_production_gas |>
model(ETS(Gas))
fc5 <- fit4 |>
forecast(h = 10)
# Plot forecast
fc5 |>
autoplot(aus_production_gas) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit4)) +
labs(y="Petajoules", title="Australian Production: Gas, with Multiplicative Seasonality")
#report(fit4)
fit5 <- aus_production_gas |>
model(ETS(Gas ~ error("M") + trend("A") + season("A")))
fc6 <- fit5 |>
forecast(h = 10)
# Plot forecast
fc6 |>
autoplot(aus_production_gas) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit5)) +
labs(y="Petajoules", title="Australian Production: Gas, with Additative Seasonality")
fit6 <- aus_production_gas |>
model(ETS(Gas ~ error("M") + trend("Ad") + season("M")))
fc7 <- fit6 |>
forecast(h = 10)
# Plot forecast
fc7 |>
autoplot(aus_production_gas) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit6)) +
labs(y="Petajoules", title="Aus Prod: Gas, with Multiplicative Season + Dampening Trend")
### Exercise 8.8
# Create my retail time series data
set.seed(20240915)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries <- myseries |>
filter_index("2015 Jan" ~ .)
fit7 <- myseries |>
model(ETS(Turnover ~ error("M") + trend("A") + season("M")))
fc8 <- fit7 |>
forecast(h = 48)
# Plot forecast
fc8 |>
autoplot(myseries) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit7)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
data(myseries)
fit8 <- myseries |>
model(ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
fc9 <- fit8 |>
forecast(h = 48)
# Plot forecast
fc9 |>
autoplot(myseries) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit8)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit7)[6]
# Obtain RMSE
accuracy(fit8)[6]
residuals_fit8 <- fit8 |>
augment(fit8)
ggplot(residuals_fit8, aes(x = Month, y = .resid)) +
geom_line() +
labs(title = "Residuals from ETS Model", y = "Residuals", x = "Time")
residuals_acf <- residuals_fit8 |>
ACF(.resid)
# Step 3: Plot the ACF using autoplot()
autoplot(residuals_acf) +
labs(title = "ACF of Residuals from ETS Model")
Box.test(residuals_fit8$.resid, lag = 12, type = "Ljung-Box")
set.seed(20240915)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_test <- myseries |>
filter_index(. ~ "2010 Dec")
fit9 <- myseries_test |>
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fc10 <- fit9 |>
forecast(h = 96)
# Plot forecast
fc10 |>
autoplot(myseries) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit9)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit9)[6]
lambda <- myseries_test |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries_test_bc <- myseries_test |>
mutate(Turnover_bc = box_cox(Turnover, lambda))
fit10 <- myseries_test_bc |>
model(ETS(Turnover_bc ~ error("A") + trend("Ad") + season("M")))
fc11 <- fit10 |>
forecast(h = 96)
myseries_bc <- myseries |>
mutate(Turnover_bc = box_cox(Turnover, lambda))
# Plot forecast
fc11 |>
autoplot(myseries_bc) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit10)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit10)[6]
set.seed(20240915)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries_bc <- myseries |>
mutate(Turnover_bc = box_cox(Turnover, lambda))
dcmp <- myseries_bc |>
model(STL(Turnover_bc)) |>
components()
dcmp_sa <- dcmp |>
select(Month, season_adjust) |>
rename(Turnover = season_adjust)
myseries_stl_tibble = merge(myseries_bc, dcmp_sa, by="Month") |>
rename(Turnover = Turnover.y)
myseries_stl <- myseries_stl_tibble |>
as_tsibble(index = Month) |>
select(Month, Turnover)
myseries_stl_test <- myseries_stl |>
filter_index(. ~ "2010 Dec")
fit11 <- myseries_stl_test |>
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fc12 <- fit11 |>
forecast(h = 96)
myseries_keystructure <- myseries |>
select(Month, Turnover)
myseries_keystructure <- myseries_keystructure |>
mutate(Turnover = box_cox(Turnover, lambda))
# Plot forecast
fc12 |>
autoplot(myseries_keystructure) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit11)) +
labs(y="Department Store Turnover", title="Multiplicative Seasonality")
# Obtain RMSE
accuracy(fit11)[6]