SNAIVE() applied to your
training data (myseries_train).library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.4
## ✔ dplyr 1.1.2 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.1
## ✔ lubridate 1.9.2 ✔ fable 0.3.3
## ✔ ggplot2 3.4.4 ✔ fabletools 0.3.4
## Warning: package 'tsibble' was built under R version 4.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.
Produce forecasts for the following series using whichever of
NAIVE(y), SNAIVE(y) or
RW(y ~ drift()) is more appropriate in each case:
global_economy)australia_economy <- global_economy |>
filter(Country == "Australia")
australia_economy |>
autoplot(Population)
australia_economy |>
model(RW(Population ~ drift())) |>
forecast(h = 5) |>
autoplot(australia_economy)
aus_production)aus_production |>
filter(!is.na(Bricks)) |>
model(SNAIVE(Bricks ~ lag("year"))) |>
forecast(h= 14) |>
autoplot(aus_production)
## Warning: Removed 20 rows containing missing values (`geom_line()`).
aus_livestock)aus_livestock |>
filter(Animal == "Lambs" & State == "New South Wales") |>
model(
classical_decomposition(Count, type="multiplicative")
) |>
components() |>
autoplot()
## Warning: Removed 6 rows containing missing values (`geom_line()`).
aus_livestock |>
filter(Animal == "Lambs" & State == "New South Wales") |>
ACF(Count) |>
autoplot()
Using the ACF function to identify if there if the data is highly seasonal.
aus_livestock |>
filter(Animal == "Lambs" & State == "New South Wales") |>
model(RW(Count ~ drift())) |>
forecast(h=48) |>
autoplot(aus_livestock)
aus_livestock |>
filter(Animal == "Lambs" & State == "New South Wales") |>
model(SNAIVE(Count ~ lag("year"))) |>
forecast(h=48) |>
autoplot(aus_livestock)
Due to the scallop shape of the ACF graph shows a seasonality, so the
SNAIVE model can be used to forecast the outcome.
hh_budget).hh_budget |>
autoplot(Wealth)
hh_budget |>
model(NAIVE(Wealth)) |>
forecast(h=5) |>
autoplot(hh_budget)
Naive model works well for economic and financial time series.
aus_retail)aus_retail |>
filter(Industry == "Cafes, restaurants and takeaway food services") |>
model(RW(Turnover ~ drift())) |>
forecast(h=24) |>
autoplot(aus_retail) +
facet_wrap(~State, scales="free_y", ncol = 2)
Use the Facebook stock price (data set gafa_stock) to do
the following:
gafa_stock |>
filter(Symbol == "FB") |>
autoplot()
## Plot variable not specified, automatically selected `.vars = Open`
fb <- gafa_stock |>
filter(Symbol == "FB") |>
mutate(Date = row_number()) |>
update_tsibble(index = Date, regular = TRUE)
fb |> model(RW(Open ~ drift())) |>
forecast(h= 63) |>
autoplot(fb)
The first observation:
fb[1,]
The second observation:
fb[1258,]
Using geom_segment we can draw a line from the start of
the time series, to the end.
fb |> model(RW(Open ~ drift())) |>
forecast(h= 63) |>
autoplot(fb) +
geom_segment(aes(x = 1, y = 54.83, xend = 1258, yend = 134.45),
colour = "blue", linetype = "dashed")
As shown above, the line drawn connects to the forecasted line.
fb |>
model(
Mean = MEAN(Open),
`Naïve` = NAIVE(Open),
Drift = NAIVE(Open ~ drift())
) |>
forecast(h = 365) |>
autoplot(fb, level = NULL)
Both the Mean and Naive models do not show
a plausible forecasr for the time series. Drift appears to be the best
because it is unlikely the Opening price would be one constant value for
a year.
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)
The residuals are centered around zero so the forecasts will not be biased. However the residuals do not show a normal distribution so wthe residuals do not look like white noise.
Another way to test if the residuals are white noise is doing the Portmanteau tests.
augment(fit) |>
features(.resid, ljung_box, lag=4)
Reject the hypothesis of white noise if the p value is less than 0.05. The above output shows a p_value less than 0.05, so the residuals do not show white noise.
Repeat the previous exercise using the Australian Exports series from
global_economy and the Bricks series from
aus_production. Use whichever of NAIVE() or SNAIVE() is
more appropriate in each case.
global_economy# Extract data of interest
recent_production <- global_economy |>
filter(Country == "Australia")
# Define and estimate a model
fit <- recent_production |> model(NAIVE(Exports))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
The residuals show to be centered around 0.
# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)
fit |>
augment() |>
features( .resid, ljung_box, lag = 10)
We do not reject the hypothesis of white noise because the p_value is greater than 0.05.
Bricks from aus_productionaus_production |>
autoplot(Bricks)
## Warning: Removed 20 rows containing missing values (`geom_line()`).
aus_production |>
ACF(Bricks) |>
autoplot()
As the data is seasonal as shown by the ACF plot, so SNAIVE would be better used.
# Define and estimate a model
fit <- aus_production |>
filter(!is.na(Bricks)) |>
model(SNAIVE(Bricks))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
From the plots above the residuals do not seem be white noise. The Residuals are not normally distributed and appear correlated.
fit |> forecast() |> autoplot(aus_production)
## Warning: Removed 20 rows containing missing values (`geom_line()`).
set.seed(811)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
SNAIVE() applied to
your training data (myseries_train).fit <- myseries_train |>
model(SNAIVE(Turnover))
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
fit |> accuracy()
fc |> accuracy(myseries)
The accuracy in my_series are higher than the previous
one.
The more data available the more accurate a model can be. Accuracy measures are scale dependent. There is a caveat, however, that too much data can lead to over fitting.