1. Produce forecasts for the following series using whichever of
NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each
case:
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.0 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
#This data has an increasing trend (population), so using RW(y ~ drift()) will be best
aus_economy <- global_economy %>%
filter(Code == "AUS")
aus_economy %>%
model(Drift = RW(Population ~ drift())) %>%
forecast(h = 15) %>%
autoplot(aus_economy) +
labs(title = "Australian Population Forecast")

#Because the data here is seasonal in quarters we will the SNAIVE method
autoplot(aus_production, Bricks) +
labs( title = "Quarterly Production of Bricks in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
filter(!is.na(Bricks)) %>%
model(SNAIVE(Bricks ~ lag("year"))) %>%
forecast(h = 15) %>%
autoplot(aus_production) +
labs(title = "Australian Bricks Production Forecast")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(NAIVE(Count)) %>%
forecast(h = 24) %>%
autoplot(aus_livestock) +
labs(title = "New South Wales Lambs")

#As there in no repetitive constant trend in seasonality I decided to use the NAIVE method.
hh_budget %>%
autoplot(Wealth)

#This data set is on household wealth, which we see has a positive trend, so we'll use the RW(y ~ drift()) method.
hh_budget %>%
model(Drift = RW(Wealth ~ drift())) %>%
forecast(h = 15) %>%
autoplot(hh_budget) +
labs(title = "Household wealth Forecast")

# This chart shows seasonality per year so SNAIVE is ideal here
aus_retail %>%
filter(State == "South Australia",
Industry == 'Takeaway food services') %>%
model(SNAIVE(Turnover ~ lag("year"))) %>%
forecast(h = 15) %>%
autoplot(aus_retail) +
labs(title = "South Australia Takeout Turnover Forecast")

2. Use the Facebook stock price (data set gafa_stock) to do the
following:
#Produce a time plot of the series.
data("gafa_stock")
data("hh_budget")
unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB" "GOOG"
#
gafa_stock %>%
filter( Symbol == "FB") %>%
autoplot(Close) +
labs(y = "USD", Title = "Closing FB Prices")

#Produce forecasts using the drift method and plot them.
fb <- gafa_stock %>%
filter( Symbol == "FB")
fb_tibble <- as_tsibble(fb, key = "Symbol", index = "Date", regular = TRUE) %>% fill_gaps()
fb_tibble %>%
model(Drift = RW(Close ~ drift())) %>%
forecast(h = 15) %>%
autoplot(fb_tibble) +
labs(title = "FB Closing Prices Forecast")

#Show that the forecasts are identical to extending the line drawn between the first and last observations.
df <- data.frame(x1 = as.Date('2014-01-02'), x2 = as.Date('2018-12-31'), y1 = 54.71, y2 = 131.09)
fb_tibble %>%
model(Drift = RW(Close ~ drift())) %>%
forecast(h = 120) %>%
autoplot(fb) +
labs(title = "FB Closing Price Forecast") +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"), data = df)

# We can conclude that the line between the first and last observation does match the forecasts.
#Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
fb_tibble %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
Drift = RW(Close ~ drift())
) %>%
forecast(h = 120) %>%
autoplot(fb_tibble) +
labs(title = "South Australian Takeout Turnover Forecast")

#From the used benchmark functions, the naive benchmark has the smallest prediction interval, therefore is the most suitable.
3. Apply a seasonal naïve method to the quarterly Australian beer
production data from 1992. Check if the residuals look like white noise,
and plot the forecasts. The following code will help.
data("aus_production")
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?
We can conclude that the results are not white noise, the values
don’t seem independent and random.The does appear to be a very gradual
downwards trend in beer production. Because of this the snaive model is
overestimating future beer production from the prior year’s production
amount.
4. Repeat the previous exercise using the Australian Exports series
from global_economy and the Bricks series from aus_production. Use
whichever of NAIVE() or SNAIVE() is more appropriate in each case.
data("global_economy")
# Extract data of interest
aus_exports <- global_economy |>
filter(Country == "Australia")
# Define and estimate a model
fit_aus_exports <- aus_exports |> model(NAIVE(Exports))
# Look at the residuals
fit_aus_exports |> gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit_aus_exports |> forecast() |> autoplot(aus_exports)

For the Aus Exports, the native method was more viable as it is an
economic time series. This data set does seem to contain white noise as
the graph should a large count of residuals at 0.The lags are also close
to the dashed line, we can conclude that the forecast is not bias.
# Define and estimate a model
fit <- aus_production |>
filter(!is.na(Bricks)) |>
model(SNAIVE(Bricks ~ lag("year")))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(aus_production)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

There is a high autocorrelation between multiple lags, as well as a
seasonal pattern. The histogram is also not uniform, meaning that the
model is not suitable for the time series.
5. For your retail time series (from Exercise 7 in Section
2.10):
Create a training dataset consisting of observations before 2011
using
set.seed(15)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the
following plot.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")

# the data has been split correctly (split is from 2011 onwards)
Fit a seasonal naïve model using SNAIVE() applied to your training
data (myseries_train).
fit <- myseries_train |>
model(SNAIVE(Turnover))
Check the residuals.
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

# The residuals are autocorrelated and are not normally distributed, they are right-tailed.
Produce forecasts for the test data
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual
values.
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New Sou… Furnitu… SNAIV… Trai… 8.41 18.4 13.3 4.83 8.10 1 1 0.667
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… New … Furnitu… Test 66.9 80.9 67.5 16.9 17.1 5.07 4.40 0.902
#The accuracy measures from the training set is smaller in comparison.
How sensitive are the accuracy measures to the amount of training
data used?
The errors on the training data are much smaller than the test data.
The model performed poorly on the test data, which had fewer points.
When compared, all the metrics performed worse on the test dataset in
comparison to the training set.