library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.2     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.2     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.3.4
## Warning: package 'tsibble' was built under R version 4.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Ask

Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

australia_economy <- global_economy |>
  filter(Country == "Australia")
australia_economy |>
  autoplot(Population)

australia_economy |> 
  model(RW(Population ~ drift())) |>
  forecast(h = 5) |>
  autoplot(australia_economy)

Bricks (aus_production)

aus_production |>
    filter(!is.na(Bricks)) |>
  model(SNAIVE(Bricks ~ lag("year"))) |>
  forecast(h= 14) |>
  autoplot(aus_production)
## Warning: Removed 20 rows containing missing values (`geom_line()`).

NSW Lambs (aus_livestock)

aus_livestock |>
  filter(Animal == "Lambs" & State == "New South Wales") |>
  model(
    classical_decomposition(Count, type="multiplicative")
  ) |>
    components() |>
  autoplot()
## Warning: Removed 6 rows containing missing values (`geom_line()`).

aus_livestock |>
  filter(Animal == "Lambs" & State == "New South Wales") |>
  ACF(Count) |>
  autoplot()

Using the ACF function to identify if there if the data is highly seasonal.

aus_livestock |>
  filter(Animal == "Lambs" & State == "New South Wales") |>
  model(RW(Count ~ drift())) |>
  forecast(h=48) |>
  autoplot(aus_livestock)

aus_livestock |>
  filter(Animal == "Lambs" & State == "New South Wales") |>
  model(SNAIVE(Count ~ lag("year"))) |>
  forecast(h=48) |>
  autoplot(aus_livestock)

Due to the scallop shape of the ACF graph shows a seasonality, so the SNAIVE model can be used to forecast the outcome.

Household wealth (hh_budget).

hh_budget |>
  autoplot(Wealth)

hh_budget |>
  model(NAIVE(Wealth)) |>
  forecast(h=5) |>
  autoplot(hh_budget)

Naive model works well for economic and financial time series.

Australian takeaway food turnover (aus_retail)

aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services") |>
  model(RW(Turnover ~ drift())) |>
  forecast(h=24) |>
  autoplot(aus_retail) +
  facet_wrap(~State, scales="free_y", ncol = 2)

5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

gafa_stock |>
  filter(Symbol == "FB") |>
  autoplot()
## Plot variable not specified, automatically selected `.vars = Open`

Produce forecasts using the drift method and plot them.

fb <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(Date = row_number()) |>
  update_tsibble(index = Date, regular = TRUE) 
  
  
fb |> model(RW(Open ~ drift())) |>
  forecast(h= 63) |>
  autoplot(fb)

Show that the forecasts are identical to extending the line drawn between the first and last observations.

The first observation:

fb[1,]

The second observation:

fb[1258,]

Using geom_segment we can draw a line from the start of the time series, to the end.

fb |> model(RW(Open ~ drift())) |>
  forecast(h= 63) |>
  autoplot(fb) +
  geom_segment(aes(x = 1, y = 54.83, xend = 1258, yend = 134.45),
               colour = "blue", linetype = "dashed")

As shown above, the line drawn connects to the forecasted line.

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb |>
  model(
    Mean = MEAN(Open),
    `Naïve` = NAIVE(Open),
    Drift = NAIVE(Open ~ drift())
  ) |>
  forecast(h = 365) |>
  autoplot(fb, level = NULL)

Both the Mean and Naive models do not show a plausible forecasr for the time series. Drift appears to be the best because it is unlikely the Opening price would be one constant value for a year.

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

The residuals are centered around zero so the forecasts will not be biased. However the residuals do not show a normal distribution so wthe residuals do not look like white noise.

Another way to test if the residuals are white noise is doing the Portmanteau tests.

augment(fit) |>
  features(.resid, ljung_box, lag=4)

Reject the hypothesis of white noise if the p value is less than 0.05. The above output shows a p_value less than 0.05, so the residuals do not show white noise.

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports from global_economy

# Extract data of interest
recent_production <- global_economy |>
  filter(Country == "Australia")

# Define and estimate a model
fit <- recent_production |> model(NAIVE(Exports))

# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).

The residuals show to be centered around 0.

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

fit |>
  augment() |>
  features( .resid, ljung_box, lag = 10)

We do not reject the hypothesis of white noise because the p_value is greater than 0.05.

Bricks from aus_production

aus_production |>
  autoplot(Bricks)
## Warning: Removed 20 rows containing missing values (`geom_line()`).

aus_production |>
  ACF(Bricks) |>
  autoplot()

As the data is seasonal as shown by the ACF plot, so SNAIVE would be better used.

# Define and estimate a model
fit <- aus_production |>
  filter(!is.na(Bricks)) |>
  model(SNAIVE(Bricks))

# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

From the plots above the residuals do not seem be white noise. The Residuals are not normally distributed and appear correlated.

fit |> forecast() |> autoplot(aus_production)
## Warning: Removed 20 rows containing missing values (`geom_line()`).

5.7: For your retail time series (from Exercise 7 in Section 2.10)

Create a training dataset consisting of observations before 2011 using

set.seed(811)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).

Do the residuals appear to be uncorrelated and normally distributed?

The residuals do not appear to be normally distributed as it is rigth skewed and they are correlated.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()
fc |> accuracy(myseries)

The accuracy in my_series are higher than the previous one.

How sensitive are the accuracy measures to the amount of training data used?

The more data available the more accurate a model can be. Accuracy measures are scale dependent. There is a caveat, however, that too much data can lead to over fitting.