library(tsibble)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(fpp3)
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.1 ✔ ggplot2 4.0.1
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.2 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ lubridate::interval() masks tsibble::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(ggplot2)
library(dplyr)
Produce forecasts for the following series using whichever of
NAIVE(y), SNAIVE(y) or
RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
global_economy %>%
filter(Country == "Australia") %>%
model(RW(Population ~ drift())) %>%
forecast(h = 10) %>%
autoplot(global_economy)
I used a random walk with drift since the Australian population shows a strong upward trend over time with no seasonality. A random walk with drift captures a persistent trend by extending the average historical change into the future.
Bricks (aus_production)
aus_production %>%
model(SNAIVE(Bricks)) %>%
forecast(h = 8) %>%
autoplot(aus_production)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
I used SNAIVE (seasonal naive) since brick production has strong quarterly seasonality (construction cycles) and often only a weak long-term trend.
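To make the seasonal naive idea concrete, here is a minimal base-R sketch under stated assumptions: `snaive_forecast` is a hypothetical helper written for illustration (it is not how fable implements `SNAIVE()`), and the quarterly values are made up. Each forecast is simply the value observed in the same season of the last complete cycle.

```r
# Illustrative sketch only: `snaive_forecast` is a hypothetical helper,
# not part of fable. It repeats the most recent seasonal cycle.
snaive_forecast <- function(y, m, h) {
  stopifnot(length(y) >= m, h >= 1)
  last_season <- tail(y, m)           # most recent full seasonal cycle
  rep(last_season, length.out = h)    # repeat it across the horizon
}

# Made-up quarterly values: two years of data, seasonal period m = 4
y <- c(10, 14, 12, 18, 11, 15, 13, 19)
snaive_forecast(y, m = 4, h = 6)
# 11 15 13 19 11 15
```

The forecasts ignore any trend, which is why the method suits series whose seasonality dominates a weak trend.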
NSW Lambs (aus_livestock)
aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(SNAIVE(Count)) %>%
forecast(h = "2 years") %>%
autoplot(aus_livestock)
I used SNAIVE for this as there is also a strong seasonal component to livestock.
Household wealth (hh_budget)
hh_budget %>%
model(RW(Wealth ~ drift())) %>%
forecast(h = 10) %>%
autoplot(hh_budget)
I used a random walk with drift as there is an upward trend, but it doesn’t have seasonality.
Retail turnover (aus_retail)
fc <- aus_retail %>%
model(SNAIVE(Turnover)) %>%
forecast(h = 12)
fc %>%
autoplot(aus_retail) +
facet_null()
Retail turnover is strongly seasonal, with effects such as seasonal hiring and firing.
Use the Facebook stock price (data set gafa_stock)
to do the following:
fb <- gafa_stock %>%
filter(Symbol == "FB")
fb %>%
autoplot(Close) +
labs(title = "Facebook (Meta) Daily Closing Price",
y = "Closing Price ($)")
fb <- gafa_stock %>%
filter(Symbol == "FB") %>%
update_tsibble(index = Date, regular = TRUE) %>%
fill_gaps()
fit_drift <- fb %>%
model(RW(Close ~ drift()))
fc_drift <- fit_drift %>%
forecast(h = 30)
fc_drift %>%
autoplot(fb) +
labs(title = "Facebook Stock Forecast (Drift Method)")
# compute drift components
first <- fb$Close[1]
last <- fb$Close[nrow(fb)]
n <- nrow(fb)
slope <- (last - first)/(n - 1)
# create drift line
fb <- fb %>%
mutate(
t = row_number(),
drift = first + slope*(t - 1)
)
# plot
fb %>%
autoplot(Close) +
geom_line(aes(y = drift), color = "red", linewidth = 1) +
labs(title = "Drift = Line Between First and Last Points")
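The claim behind the plot above can be checked numerically. The following base-R sketch uses made-up numbers and a hypothetical helper `drift_forecast` (not a fable function). It shows that the slope of the line through the first and last observations equals the mean of the one-step changes, which is the drift term. This identity assumes a complete series with no missing values.

```r
# Illustrative sketch only: `drift_forecast` is a hypothetical helper.
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)   # average change per period
  y[n] + slope * seq_len(h)          # extend the first-to-last line
}

y <- c(100, 103, 101, 106, 110)      # made-up prices

slope_endpoints <- (y[length(y)] - y[1]) / (length(y) - 1)
slope_mean_diff <- mean(diff(y))     # mean of the one-step changes
all.equal(slope_endpoints, slope_mean_diff)
# TRUE

drift_forecast(y, h = 3)
# 112.5 115.0 117.5
```

When gaps have been filled with missing values, the mean of the available differences need not match the endpoint slope, so the two lines can differ slightly.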
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
The naive method is the most appropriate benchmark for this dataset. Stock prices behave like a random walk, meaning the next value is best predicted by the most recent observation. The data shows no stable trend or seasonality, so the mean, drift, and seasonal naive methods make incorrect assumptions about the structure, while the naive method matches the unpredictable nature of stock prices.
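One way to back up a choice between benchmarks is to hold out the last part of the series and compare forecast errors. The sketch below is only an illustration on a simulated random walk (not the Facebook data), with an arbitrary seed and split point. It computes the hold-out RMSE for the mean, naive and drift methods in base R.

```r
# Illustrative sketch on simulated data; the seed and the split are arbitrary.
set.seed(1)
y <- 100 + cumsum(rnorm(250))   # simulated random walk, not real prices

train <- y[1:220]
test  <- y[221:250]
h <- length(test)
n <- length(train)

# Point forecasts from three benchmark methods
fc <- list(
  mean  = rep(mean(train), h),
  naive = rep(train[n], h),
  drift = train[n] + (train[n] - train[1]) / (n - 1) * seq_len(h)
)

# Root mean squared error of each method on the hold-out period
rmse <- sapply(fc, function(f) sqrt(mean((test - f)^2)))
rmse
```

A single hold-out period can favour any of the methods by chance, so the comparison is more convincing when repeated over several forecast origins.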
fb <- gafa_stock %>%
filter(Symbol == "FB") %>%
update_tsibble(index = Date, regular = TRUE) %>%
fill_gaps()
fb %>%
model(NAIVE(Close)) %>%
forecast(h = 30) %>%
autoplot(fb) +
labs(title = "Facebook Stock Forecast (Naive Method)")
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_rug()`).
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
What do you conclude?
A seasonal naïve model was fitted to the quarterly beer production data from 1992 onward. The residual diagnostics show no clear trend or remaining seasonal structure, and most autocorrelations lie within the significance bounds. This suggests the residuals are approximately white noise and that the seasonal naïve model adequately captures the seasonal pattern in the data.
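A visual check of the ACF can be complemented with a portmanteau test. The sketch below is a hedged illustration on simulated quarterly data (not the beer series). It forms seasonal naive residuals as seasonal differences and applies the Ljung-Box test via `Box.test()` from base R's stats package. The lag of 8 (twice the seasonal period) is a common rule of thumb and is an assumption here.

```r
# Illustrative sketch on simulated data; all numbers are made up.
set.seed(42)
n_years  <- 20
seasonal <- rep(c(5, -3, -6, 4), times = n_years)      # quarterly pattern
y <- 450 + seasonal + rnorm(4 * n_years, sd = 2)

# Seasonal naive residuals: y_t minus the value four quarters earlier
res <- diff(y, lag = 4)

# Ljung-Box test on the first 8 autocorrelations of the residuals
lb <- Box.test(res, lag = 8, type = "Ljung-Box")
lb$p.value
```

A small p-value suggests the residuals are distinguishable from white noise. With a fitted fable model, the residuals could instead be taken from `augment(fit)` before testing.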
Repeat the previous exercise using the Australian Exports series
from global_economy and the Bricks series from
aus_production. Use whichever of NAIVE() or
SNAIVE() is more appropriate in each case.
#extract value
exports <- global_economy %>%
filter(Country == "Australia")
#fit model
fit_exp <- exports %>%
model(NAIVE(Exports))
fit_exp %>%
gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_rug()`).
#forecasts
fit_exp %>%
forecast(h = 10) %>%
autoplot(exports) +
labs(title = "Australian Exports Forecast (Naive)")
For the Australian Exports series, the naive method was used because the data is annual and doesn’t contain a seasonal pattern. The residuals resemble white noise, indicating an adequate model.
#extract
bricks <- aus_production %>%
select(Quarter, Bricks)
#fit model
fit_bricks <- bricks %>%
model(SNAIVE(Bricks))
fit_bricks %>%
gg_tsresiduals()
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_rug()`).
#forecasts
fit_bricks %>%
forecast(h = 8) %>%
autoplot(bricks) +
labs(title = "Bricks Forecast (Seasonal Naive)")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
For the Bricks series, the seasonal naive method was used because the quarterly data display strong seasonality. The residual diagnostics show no remaining structure, and the forecasts repeat the most recent seasonal cycle.
Produce forecasts for the Victorian series in
aus_livestock using SNAIVE(). Plot the
resulting forecasts including the historical data. Is this a reasonable
benchmark for these series?
#Extract
vic_livestock <- aus_livestock |>
filter(State == "Victoria")
#Fit Model
fit_vic <- vic_livestock |>
model(SNAIVE(Count))
#Forecast
fc_vic <- fit_vic |>
forecast(h = "2 years")
#Plot
fc_vic |>
autoplot(vic_livestock) +
facet_wrap(~Animal, scales = "free_y") +
scale_x_yearmonth(date_breaks = "20 years", date_labels = "%y") +
labs(
title = "Victorian Livestock Forecasts (Seasonal Naive)",
y = "Number slaughtered",
x = "Year"
) +
theme(
axis.text.x = element_text(size = 7),
panel.spacing = unit(0.8, "lines")
)
Are the following statements true or false? Explain your answer.
False, normal residuals are not required for good forecasts. What matters is that residuals look like white noise: no pattern, no autocorrelation, and mean near zero. Normality mainly affects statistical inference and prediction intervals, not forecast accuracy.
False, small residuals on the training data only show the model fits the past well. The model may be overfitting and capturing noise, which leads to poor future predictions. Forecast quality must be judged using new (test) data.
False, MAPE is easy to interpret but unreliable when values are near zero and can bias results. Other measures like MAE, RMSE, or MASE are often more appropriate. No single accuracy measure is always best.
False, more complexity often causes overfitting and worse forecasts. Simple models are usually more stable and reliable. It is better to check trend, seasonality, or transformations rather than just adding parameters.
False, a single test set can be misleading due to random variation or unusual events. Models should be evaluated across multiple forecast periods (time-series cross-validation) and also judged on stability and interpretability, not just one accuracy number.
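The point about MAPE being unreliable near zero can be illustrated with a small base-R sketch. The numbers are made up, and `mae`, `rmse` and `mape` are hypothetical helper functions defined here for illustration. Both series have identical absolute errors, but one contains a single observation close to zero.

```r
# Illustrative sketch with made-up numbers; helper names are hypothetical.
mae  <- function(actual, fc) mean(abs(actual - fc))
rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))
mape <- function(actual, fc) mean(abs((actual - fc) / actual)) * 100

err <- c(5, -5, 5, -5)                     # same forecast errors in both cases

actual_large <- c(100, 200, 150, 120)
actual_small <- c(0.5, 200, 150, 120)      # first value is close to zero

fc_large <- actual_large + err
fc_small <- actual_small + err

mae(actual_large, fc_large)    # 5
mae(actual_small, fc_small)    # 5
mape(actual_large, fc_large)   # 3.75
mape(actual_small, fc_small)   # 252.5
```

MAE and RMSE are unchanged, while one near-zero observation inflates MAPE from under 4% to over 250%.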
For your retail time series (from Exercise 7 in Section 2.10):
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |> filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the following plot.
The data were divided into a training set consisting of observations prior to 2011 and a test set covering the remaining period. The plot confirms the split was performed correctly, with the training data shown separately from the test data used for forecast evaluation.
autoplot(myseries, Turnover) + autolayer(myseries_train, Turnover, colour = "red")
Fit a seasonal naïve model using SNAIVE() applied to
your training data (myseries_train).
fit <- myseries_train |>
model(SNAIVE(Turnover ~ lag("year")))
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_rug()`).
Do the residuals appear to be uncorrelated and normally distributed?
The residual diagnostics suggest the model is not fully adequate. The residual time plot shows noticeable patterns rather than random fluctuations around zero. In the ACF plot, several lags exceed the significance bounds, indicating remaining autocorrelation. This means the model has not captured all of the dependence structure in the data. Although the histogram of the residuals appears roughly symmetric, the presence of autocorrelation violates the assumption of uncorrelated errors. Therefore, the seasonal naïve model does not sufficiently explain the series and a more complex model is needed.
Produce forecasts for the test data
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |>
autoplot(myseries)
Compare the accuracy of your forecasts against the actual values.
The seasonal naïve model provides reasonable forecasts due to the strong seasonal structure of the data. However, the remaining autocorrelation suggests that more advanced models could improve forecast accuracy.
fit |>
accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… "SNAI… Trai… 0.439 1.21 0.915 5.23 12.4 1 1 0.768
fc |>
accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "SNAIVE(… Nort… Clothin… Test 0.836 1.55 1.24 5.94 9.06 1.36 1.28 0.601
How sensitive are the accuracy measures to the amount of training data used?
The accuracy measures are not highly sensitive to the amount of training data used because the seasonal naïve model relies mainly on the most recent seasonal observations. As long as at least one full seasonal cycle is available, the forecasts remain similar, though a longer training set provides more stable accuracy estimates.
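The reasoning above can be sketched in base R on simulated monthly data (made-up numbers, arbitrary seed). Two training sets that end at the same date but differ in length give identical seasonal naive forecasts. What changes is the in-sample scaling used by measures such as MASE. The comparison assumes the end of the training period is held fixed; moving the split date would change the forecasts.

```r
# Illustrative sketch on simulated data; all numbers are made up.
set.seed(2024)
m <- 12                                    # monthly seasonal period
n <- 10 * m                                # ten years of data
y <- 10 + 0.05 * seq_len(n) +
  rep(sin(2 * pi * (1:m) / m), 10) + rnorm(n, sd = 0.3)

train_long  <- y[1:(8 * m)]                # eight years of training data
train_short <- y[(6 * m + 1):(8 * m)]      # last two of those years only
test <- y[(8 * m + 1):n]
h <- length(test)

# Seasonal naive forecasts only use the final seasonal cycle
fc_long  <- rep(tail(train_long, m),  length.out = h)
fc_short <- rep(tail(train_short, m), length.out = h)
identical(fc_long, fc_short)
# TRUE

# The MASE denominator (in-sample seasonal naive MAE) does depend on length
scale_long  <- mean(abs(diff(train_long,  lag = m)))
scale_short <- mean(abs(diff(train_short, lag = m)))
mae_test <- mean(abs(test - fc_long))
c(MASE_long = mae_test / scale_long, MASE_short = mae_test / scale_short)
```

Unscaled measures such as RMSE and MAE are therefore identical for the two training sets, while scaled measures shift with the training window.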