DATA 624 Homework 3

library(fpp3)
library(tidyverse)

Question 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

I chose RW since there is a clear upward trend and no obvious seasonality.

aus_pop <- global_economy %>%
  filter(Country == "Australia")

pop_fit <- aus_pop %>%
  model(RW(Population ~ drift()))

pop_fc <- pop_fit %>%
  forecast(h = "10 years")

autoplot(aus_pop, Population) +
  autolayer(pop_fc, series = "Forecast") +
  labs(title = "Australia Population Forecast",
       y = "Population")

Bricks (aus_production)

I chose SNAIVE since there is strong seasonality and less dominant trend.

bricks <- aus_production %>%
  filter(!is.na(Bricks))

bricks_fit <- bricks %>%
  model(SNAIVE(Bricks))

bricks_fc <- bricks_fit %>%
  forecast(h = "5 years")  # or whatever horizon you choose

autoplot(bricks, Bricks) +
  autolayer(bricks_fc, series = "Forecast") +
  labs(title = "Bricks Production Forecast (SNAIVE)")

NSW Lambs (aus_livestock)

I chose SNAIVE since there is strong seasonality oscillating counts and no strong drift.

nsw_lambs <- aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs")

lambs_fit <- nsw_lambs %>%
  model(SNAIVE(Count))

lambs_fc <- lambs_fit %>%
  forecast(h = "5 years")

autoplot(nsw_lambs, Count) +
  autolayer(lambs_fc, series = "Forecast") +
  labs(title = "NSW Lambs Count Forecast (SNAIVE)")

Household wealth (hh_budget).

I chose RW since there is an upward trend and weak or maybe no seasonality.

wealth <- hh_budget

wealth_fit <- wealth %>%
  model(RW(Wealth ~ drift()))

wealth_fc <- wealth_fit %>%
  forecast(h = "5 years")

autoplot(wealth, Wealth) +
  autolayer(wealth_fc, series = "Forecast") +
  labs(title = "Household Wealth Forecast (Drift)")

Australian takeaway food turnover (aus_retail).

There is strong seasonality and trend. So I chose SNAIVE with drift.

takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(Turnover = sum(Turnover))

takeaway_fit <- takeaway %>%
  model(SNAIVE(Turnover ~ drift()))

takeaway_fc <- takeaway_fit %>%
  forecast(h = "5 years")

autoplot(takeaway, Turnover) +
  autolayer(takeaway_fc, series = "Forecast") +
  labs(title = "Australian Takeaway Food Turnover Forecast")

Question 5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

fb_stock <- gafa_stock[,c(1,2,6)] %>%
  filter(Symbol == "FB", year(Date) >= 2014) %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = Date, regular = TRUE) %>%
  fill_gaps()

autoplot(fb_stock, Close) +
  labs(title = "Facebook (Meta) Stock Price", y = "Closing Price (USD)")

Produce forecasts using the drift method and plot them. Show that the forecasts are identical to extending the line drawn between the first and last observations. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_model <- fb_stock %>%
  model(RW(Close ~ drift()))

fb_forecast <- fb_model %>%
  forecast(h = "3 years")

fb_forecast %>%
autoplot(fb_stock) +
  labs(title = "Drift Forecast for Facebook Stock Price", y = "Closing Price")

n <- nrow(fb_stock)
first_val <- first(fb_stock$Close)
last_val <- last(fb_stock$Close)
slope <- (last_val - first_val) / (n - 1)

drift_line <- tibble(
  Date = c(fb_stock$Date, fb_forecast$Date),
  Drift_Line = first_val + slope * (0:(n + nrow(fb_forecast) - 1))
) %>%
  as_tsibble(index = Date)

autoplot(fb_forecast, fb_stock, level = NULL) +
  geom_line(data = drift_line, aes(x = Date, y = Drift_Line), color = "red", linetype = "dashed") +
  labs(
    title = "Drift Forecast vs Line Between First and Last Observations",
    y = "Closing Price (USD)",
    caption = "Dashed red line = straight line from first to last observation"
  )

fb_models <- fb_stock %>%
  model(
    naive = NAIVE(Close),
    rw = RW(Close),
    drift = RW(Close ~ drift())
  )

fb_fc_all <- fb_models %>%
  forecast(h = "3 years")

fb_fc_all %>%
  autoplot(fb_stock, level = NULL) +
  labs(title = "Facebook Stock Forecasts: NAIVE vs RW vs DRIFT", y = "Closing Price") +
  facet_wrap(~.model)

Since the stock shows clear long-term trend then drift will perform better as it usually does because it extrapolates that trend forward. NAIVE just uses the last value which may work better for volatile data. And rw without drift is similar to NAIVE they both assume no trend.

Question 5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

Extract data of interest

recent_production <- aus_production |> filter(year(Quarter) >= 1992) # Define and estimate a model fit <- recent_production |> model(SNAIVE(Beer)) # Look at the residuals fit |> gg_tsresiduals() # Look a some forecasts fit |> forecast() |> autoplot(recent_production) What do you conclude?

recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

fit <- recent_production |> model(SNAIVE(Beer))

fit |> gg_tsresiduals()

fit |> forecast() |> autoplot(recent_production)

From the residuals it seems to be uncorrelated and normally distributed. It is also centered around zero. This indicates that SNAIVE model is reasonable and captures the seasonality well.

Question 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

exports_data <- global_economy |> 
  filter(Country == "Australia") |> 
  select(Year, Exports)

exports_data |> autoplot(Exports)

exports_fit <- exports_data |> model(NAIVE(Exports))
exports_fit |> gg_tsresiduals()

exports_fit |> forecast(h = 10) |> autoplot(exports_data)

bricks_data <- aus_production |> 
  filter(year(Quarter) >= 1992) |> 
  select(Quarter, Bricks)

bricks_data |> autoplot(Bricks)

bricks_fit <- bricks_data |> model(SNAIVE(Bricks))
bricks_fit |> gg_tsresiduals()

bricks_fit |> forecast() |> autoplot(bricks_data)

Question 5.7

For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |> filter(year(Month) < 2011) Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) + autolayer(myseries_train, Turnover, colour = “red”) Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |> model(SNAIVE()) Check the residuals.

fit |> gg_tsresiduals() Do the residuals appear to be uncorrelated and normally distributed?

Produce forecasts for the test data

fc <- fit |> forecast(new_data = anti_join(myseries, myseries_train)) fc |> autoplot(myseries) Compare the accuracy of your forecasts against the actual values.

fit |> accuracy() fc |> accuracy(myseries) How sensitive are the accuracy measures to the amount of training data used?

myseries <- aus_retail |>
  filter(State == "New South Wales",
         Industry == "Department stores")
myseries_train <- myseries |>
  filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red") +
  labs(title = "Retail Turnover: Full Series vs Training Data",
       subtitle = "Training data shown in red")

fit <- myseries_train |> model(SNAIVE(Turnover))
fit |> gg_tsresiduals()

myseries_test <- anti_join(myseries, myseries_train)
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc <- fit |> forecast(new_data = myseries_test)
fc |> autoplot(myseries) +
  labs(title = "SNAIVE Forecast vs Actual Data",
       subtitle = "Forecasts from training data applied to test period")

fit |> accuracy()
## # A tibble: 1 × 12
##   State  Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 New S… Departm… SNAIV… Trai…  11.2  24.7  19.1  3.24  5.58     1     1 -0.0527
fc |> accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… New … Departm… Test  -2.81  25.5  19.9 -1.02  4.05  1.04  1.03 0.201

From the residual time plot there doesn’t seem to be any obvious strong trends there are some large spikes which suggests outliers. There is also autocorrelations present which means strong seasonality and that the residuals are not completely uncorrelated. This may suggest the model is too simple. The histogram is approximately normally distributed with the mean being slightly above 0. The SNAIVE forecast follows the seasonal pattern accurate however it doesn’t account for the upward trend from 2010 it continues at the same level. The confidence interval seems to widen over time which shows the increasing uncertainty. So the SNAIVE captures the seasonality but misses the trend

The accuracy is sensitive to training data length. A longer training period provides more seasonal cycles which helps SNAIVE but includes outdated patterns. A shorter training period may show recent trends better but misses long term seasonality.