Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

library(fpp3)
# Fit ETS(A,N,N) model for Pigs in Victoria
fit <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

# Forecast next 4 months
fc <- fit %>%
  forecast(h = 4)

# Plot forecast
fc %>%
  autoplot(aus_livestock) +
  ggtitle("Number of Pigs Slaughtered in Victoria")
## Warning: Unknown or uninitialised column: `alpha`.
## [1] "Alpha: "
# Initial level l0
fitted_values <- augment(fit)
initial_level <- fitted_values$.fitted[1]
print(paste("Initial level (l0):", initial_level))
## [1] "Initial level (l0): 100646.603176363"
## [1] "Forecast for next 4 months:"
## # A fable: 4 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month
## <fct> <fct> <chr> <mth>
## 1 Pigs Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Jan
## 2 Pigs Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Feb
## 3 Pigs Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Mar
## 4 Pigs Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Apr
## # ℹ 2 more variables: Count <dist>, .mean <dbl>
I used the ETS()
function to build a simple exponential
smoothing model for the number of pigs slaughtered in Victoria. Based on
the model, the initial level came out to about 100,647
pigs, which makes sense given the recent data. The forecast for
the next four months predicts around 95,187 pigs per
month, showing a slight expected decrease. I couldn’t directly
pull the alpha value because of how fable
stores the
results, but the overall forecast looks reasonable and follows the
recent pattern in the data.
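Although the first attempt above did not pull alpha directly, the estimated smoothing parameter and initial level can be read off with the standard fabletools accessors. A minimal sketch (not part of the original code) using the fit object defined above:

# All estimated ETS parameters (alpha and the initial level) in a tibble
tidy(fit)
# Or print the full model report
report(fit)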
# Residual standard deviation
# (alternatives: accuracy(fit)$RMSE, augment(fit) %>% pull(.resid) %>% sd())
s <- residuals(fit)$.resid %>% sd()
# First point forecast and manual 95% interval: y_hat +/- 1.96 * s
hat_y <- fc$.mean[1]
c(hat_y - 1.96 * s, hat_y + 1.96 * s)
# 95% interval computed by R
fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [76854.79, 113518.3]95
R's 95% prediction interval: (76854.79, 113518.3)

The manually computed interval is a bit narrower than the one R produced. The difference comes from how the forecast standard deviation is estimated: the manual interval uses the sample standard deviation of the residuals, while fable uses the model's own variance estimate when constructing the forecast distribution.
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

global_economy %>%
  filter(Country == "Turkey") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Exports: Turkey")
The plot shows Turkey’s exports as a percentage of GDP over time. From
the early 1960s to around 1980, exports stayed fairly low and stable.
After 1980, there was a sharp increase, and exports became a much larger
share of GDP. Since then, exports have generally trended upward, with
some fluctuations, showing that Turkey’s economy has become more
export-driven over time.
fit_tk <- global_economy %>%
  filter(Country == "Turkey") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

fc_tk <- fit_tk %>%
  forecast(h = 5)

fc_tk %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Exports: Turkey", subtitle = "ETS(A,N,N)")
I used an ETS(A,N,N) model to forecast Turkey’s exports as a percentage
of GDP. The plot shows the historical data along with the forecast for
the next few years. The forecast stays fairly stable, reflecting the
recent trend in the data. The shaded areas show the 80% and 95%
prediction intervals, which widen over time to capture increasing
uncertainty. Overall, the model suggests that exports will likely remain
around current levels with some moderate variation.
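The training RMSE reported below was presumably obtained with accuracy() on the fitted model; a minimal sketch, not necessarily the exact code used:

accuracy(fit_tk) %>% pull(RMSE)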
## [1] 2.183255
The RMSE for the training data is 2.18, which measures how far the model’s fitted values are, on average, from the actual export values. This indicates that the ETS(A,N,N) model’s typical forecast error is about 2.18 percentage points of GDP, meaning the model fits the training data reasonably well but with some variation.
fit_tk_2 <- global_economy %>%
  filter(Country == "Turkey") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

fc_tk_2 <- fit_tk_2 %>%
  forecast(h = 5)

fc_tk_2 %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Exports: Turkey", subtitle = "ETS(A,A,N)")
## [1] 2.144857
The RMSE for the ETS(A,A,N) model is 2.14, which is slightly lower than the ETS(A,N,N) model’s RMSE of 2.18. This suggests the trended model fits the training data a little better, but the difference is small. Considering the simplicity of the ETS(A,N,N), it may still be the better choice unless a clear trend is expected in the future.
The forecasts from both models are very similar, but the ETS(A,A,N) model shows slightly more flexibility by allowing for a trend. It also has a slightly lower RMSE (2.14 vs 2.18), meaning it fits the training data a bit better. However, since the recent data shows no strong trend, the simpler ETS(A,N,N) model is likely the better choice because it avoids adding unnecessary complexity.
# Given RMSE values
rmse_ann <- 2.183255 # for ETS(A,N,N)
rmse_aan <- 2.144857 # for ETS(A,A,N)
# First forecast value for both models (placeholder values; see the sketch
# after this block for pulling the actual point forecasts)
first_forecast_ann <- 25.11 # Example value from ETS(A,N,N) forecast
first_forecast_aan <- 25.12 # Example value from ETS(A,A,N) forecast
# 95% prediction interval formula: y_hat ± 1.96 * RMSE
# ETS(A,N,N)
lower_ann <- first_forecast_ann - 1.96 * rmse_ann
upper_ann <- first_forecast_ann + 1.96 * rmse_ann
cat("95% Prediction Interval for ETS(A,N,N): (", round(lower_ann, 2), ",", round(upper_ann, 2), ")\n")
## 95% Prediction Interval for ETS(A,N,N): ( 20.83 , 29.39 )
# ETS(A,A,N)
lower_aan <- first_forecast_aan - 1.96 * rmse_aan
upper_aan <- first_forecast_aan + 1.96 * rmse_aan
cat("95% Prediction Interval for ETS(A,A,N): (", round(lower_aan, 2), ",", round(upper_aan, 2), ")\n")
## 95% Prediction Interval for ETS(A,A,N): ( 20.92 , 29.32 )
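Rather than hard-coding example values, the first point forecasts and R's own intervals can be pulled from the fable objects created earlier; a sketch using fc_tk and fc_tk_2 from above:

fc_tk$.mean[1]                                    # first point forecast, ETS(A,N,N)
fc_tk_2$.mean[1]                                  # first point forecast, ETS(A,A,N)
fc_tk %>% hilo(95) %>% pull('95%') %>% head(1)    # fable's 95% interval, ETS(A,N,N)
fc_tk_2 %>% hilo(95) %>% pull('95%') %>% head(1)  # fable's 95% interval, ETS(A,A,N)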
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

lambda <- global_economy %>%
  filter(Country == "China") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
fit_ch <- global_economy %>%
  filter(Country == "China") %>%
  model(`Simple` = ETS(GDP ~ error("A") + trend("N") + season("N")),
        `Holt's method` = ETS(GDP ~ error("A") + trend("A") + season("N")),
        `Damped Holt's method` = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Box-Cox` = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N")),
        `Box-Cox Damped` = ETS(box_cox(GDP, lambda) ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Log` = ETS(log(GDP) ~ error("A") + trend("A") + season("N")),
        `Log Damped` = ETS(log(GDP) ~ error("A") + trend("Ad", phi = 0.8) + season("N")))

fc_ch <- fit_ch %>%
  forecast(h = 15)

fc_ch %>%
  autoplot(global_economy, level = NULL) +
  labs(title = "GDP: China") +
  guides(colour = guide_legend(title = "Forecast"))
I forecasted China’s GDP using different versions of the
ETS()
model. I tried adding a damped
trend, using a Box-Cox transformation, and
also fitting simple exponential smoothing and Holt’s method. The results
showed that the Box-Cox transformation helps reduce the
extreme curve in future forecasts by stabilizing the growth rate. Adding
a damped trend slows down the forecast over time,
instead of letting it keep growing faster and faster. Overall, the combination of Box-Cox + damped trend gave the most reasonable forecast, while models without these adjustments tended to explode unrealistically. This exercise showed me how transformations and damping can rein in unrealistic forecasts for fast-growing economies like China.
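To make the damping effect concrete: with a damped trend, the point forecast at horizon h adds (phi + phi^2 + ... + phi^h) times the current slope to the level, and this sum converges rather than growing without bound. A quick check with the phi = 0.8 used above:

# Total growth multiplier applied to the slope over the 15-year horizon
phi <- 0.8
sum(phi^(1:15))
# Limiting multiplier as the horizon goes to infinity: phi / (1 - phi)
phi / (1 - phi)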
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

fit_gas <- aus_production %>%
  model(additive = ETS(Gas ~ error("A") + trend("A") + season("A")),
        multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M")))
aus_production %>%
  model(additive = ETS(Gas ~ error("A") + trend("A") + season("A")),
        multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M"))) %>%
  forecast(h = 20) %>%
  autoplot(aus_production, level = NULL) +
  labs(title = "Australian Gas Production",
       subtitle = "Additive vs. Multiplicative Seasonality") +
  guides(colour = guide_legend(title = "Forecast"))
aus_production %>%
  model(multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M"))) %>%
  forecast(h = 20) %>%
  autoplot(aus_production, level = NULL) +
  labs(title = "Australian Gas Production",
       subtitle = "Multiplicative vs. Damped Trend") +
  guides(colour = guide_legend(title = "Forecast"))
## Warning in report.mdl_df(fit_gas): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 9
## .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 additive 23.6 -927. 1872. 1873. 1903. 22.7 29.7 3.35
## 2 multiplicative 0.00324 -831. 1681. 1682. 1711. 21.1 32.2 0.0413
## 3 damped multiplicative 0.00340 -835. 1688. 1689. 1719. 21.0 32.4 0.0424
For the Australian gas data, I used ETS models to forecast the next few years. Multiplicative seasonality is necessary because the seasonal swings grow as gas production increases. Adding damping slightly improved the shape of the forecast, but the regular multiplicative model fit better based on AIC, BIC, and error measures. Overall, the multiplicative model without damping is the best choice for this data.
Multiplicative seasonality is necessary because the seasonal patterns grow along with the overall level of the series. In retail data, this makes sense because as sales increase, seasonal peaks and dips tend to become larger too. This type of seasonality helps the model capture those changing patterns more accurately.
set.seed(500)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

fit_my <- myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Turnover ~ error("M") + trend("Ad") + season("M")))

fit_my %>%
  forecast(h = 36) %>%
  autoplot(myseries, level = NULL) +
  labs(title = "Australian Retail Turnover") +
  guides(colour = guide_legend(title = "Forecast"))
I applied Holt-Winters’ multiplicative method to the retail turnover
data, and I also tried adding a damped trend to see how it would affect
the forecast. The regular multiplicative method produced a forecast that
kept growing at the same rate, following the strong upward trend in the
data. When I added damping, the forecast still increased but at a slower
pace, which can be useful if I expect future growth to slow down a bit.
Both methods fit the data well, but damping gives a slightly more
conservative forecast, which might be more realistic for long-term
planning.
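The RMSE comparison below was presumably produced with accuracy() on the fitted mable; a sketch, not necessarily the exact code used:

accuracy(fit_my) %>% select(.model, RMSE)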
## # A tibble: 2 × 2
## .model RMSE
## <chr> <dbl>
## 1 multiplicative 15.8
## 2 damped multiplicative 14.5
The damped multiplicative method has a slightly lower RMSE (14.48) than the multiplicative method (15.78). Since the difference is small, both models fit the data well; I would still lean toward the plain multiplicative method because it is a bit simpler without the extra damping parameter, even though the damped version fits the training data marginally better.
myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method")

# Box-Pierce test, lag ℓ = 2m for seasonal data, m = 12
myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>%
  features(.innov, box_pierce, lag = 24, dof = 0)
## # A tibble: 1 × 5
## State Industry .model bp_stat bp_pvalue
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and takeaway food servic… multi… 37.7 0.0376
# Ljung-Box test
myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24, dof = 0)
## # A tibble: 1 × 5
## State Industry .model lb_stat lb_pvalue
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and takeaway food servic… multi… 38.8 0.0287
For the multiplicative method applied to Victoria’s cafes, restaurants, and takeaway food services, I checked the residuals using the Box-Pierce and Ljung-Box tests with ℓ = 24. Both tests gave p-values below 0.05 (0.038 and 0.029), meaning there may still be some autocorrelation left in the residuals. This suggests the model fits reasonably well but could potentially be improved.
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

fit_train <- myseries_train %>%
  model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))

# Produce forecasts over the test period
fc <- fit_train %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
I trained the model using data up to the end of 2010 and calculated the
test set RMSE for both the multiplicative model and the seasonal naïve
model from Exercise 7. The multiplicative model had a lower RMSE,
meaning it gave more accurate forecasts. This makes sense since it
captures both the trend and changing seasonality, while the seasonal
naïve model just repeats past seasons.
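The test-set comparison described above can be reproduced with accuracy(); a sketch using the fit_train and fc objects defined above (not necessarily the exact code used):

# Training accuracy for both models
accuracy(fit_train) %>% select(.model, .type, RMSE)

# Test-set accuracy: evaluate the forecasts against the held-out observations
fc %>%
  accuracy(myseries) %>%
  select(.model, .type, RMSE)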
lambda_retail <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# STL decomposition of the Box-Cox transformed data
myseries %>%
  model(STL(box_cox(Turnover, lambda_retail) ~ season(window = "periodic"), robust = TRUE)) %>%
  components() %>%
  autoplot() +
  ggtitle("STL with Box-Cox Transformation")

# Keep the Box-Cox STL components for the seasonally adjusted series
Decomp <- myseries %>%
  model(STL_box = STL(box_cox(Turnover, lambda_retail) ~ season(window = "periodic"), robust = TRUE)) %>%
  components()

# Seasonally adjusted series
myseries$Turnover_sa <- Decomp$season_adjust

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

fit <- myseries_train %>%
  model(ETS(Turnover_sa ~ error("M") + trend("A") + season("M")))

fit %>%
  gg_tsresiduals() +
  ggtitle("Australian Retail Turnover Residuals")
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover,
## Turnover_sa)`
## # A tibble: 1 × 3
## .model .type RMSE
## <chr> <chr> <dbl>
## 1 "ETS(Turnover_sa ~ error(\"M\") + trend(\"A\") + season(\"M\"))" Traini… 0.103
## # A tibble: 1 × 3
## .model .type RMSE
## <chr> <chr> <dbl>
## 1 "ETS(Turnover_sa ~ error(\"M\") + trend(\"A\") + season(\"M\"))" Test 0.546
I applied STL decomposition to the Box-Cox transformed retail data, then used ETS on the seasonally adjusted series. For the training set, the RMSE was 0.10, and for the test set, it was 0.55. Compared to my previous best method, this approach gave a slightly higher test RMSE, meaning it didn’t forecast quite as well. This suggests that the simpler multiplicative ETS model worked better for this data than the more complex STL-ETS combination.
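For reference, the training and test RMSEs quoted above could be reproduced along these lines (a sketch, not necessarily the author's exact code):

# Training RMSE of the ETS fit on the seasonally adjusted series
accuracy(fit) %>% select(.model, .type, RMSE)

# Test RMSE: forecast the held-out period and compare against the data
fit %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries) %>%
  select(.model, .type, RMSE)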