624HW3

library(tsibble)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## 
## Attaching package: 'tsibble'

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(fpp3)

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.1     ✔ ggplot2     4.0.1
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.2     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()     masks base::date()
## ✖ dplyr::filter()       masks stats::filter()
## ✖ tsibble::intersect()  masks base::intersect()
## ✖ lubridate::interval() masks tsibble::interval()
## ✖ dplyr::lag()          masks stats::lag()
## ✖ tsibble::setdiff()    masks base::setdiff()
## ✖ tsibble::union()      masks base::union()

library(ggplot2)
library(dplyr)

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

    global_economy %>% 
      filter(Country == "Australia") %>% 
      model(RW(Population ~ drift())) %>% 
      forecast(h = 10) %>% 
      autoplot(global_economy)

I used a random walk with drift because the Australian population shows a strong upward trend over time with no seasonality (the data are annual). The drift term captures that persistent trend.

Bricks (aus_production)

aus_production %>% 
  model(SNAIVE(Bricks)) %>% 
  forecast(h = 8) %>% 
  autoplot(aus_production)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

I used SNAIVE (seasonal naive) since brick production has strong quarterly seasonality (construction cycles) and often only a weak long-term trend.

NSW Lambs (aus_livestock)

aus_livestock %>% 
  filter(State == "New South Wales",
         Animal == "Lambs") %>% 
  model(SNAIVE(Count)) %>% 
  forecast(h = "2 years") %>% 
  autoplot(aus_livestock)

I used SNAIVE for this series as well, since livestock counts also have a strong seasonal component.

Household wealth.

hh_budget %>% 
  model(RW(Wealth ~ drift())) %>% 
  forecast(h = 10) %>% 
  autoplot(hh_budget)

I used a random walk with drift here, as there is an upward trend but no seasonality.

Australian takeaway food turnover (aus_retail).

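# Keep only takeaway food services and sum turnover across states
takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(Turnover = sum(Turnover))
fc <- takeaway %>%
  model(SNAIVE(Turnover)) %>%
  forecast(h = 12)
fc %>% 
  autoplot(takeaway)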

Retail turnover is strongly seasonal, with effects such as seasonal hiring and firing.

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

fb <- gafa_stock %>% 
  filter(Symbol == "FB")

fb %>% 
  autoplot(Close) +
  labs(title = "Facebook (Meta) Daily Closing Price",
       y = "Closing Price ($)")

Produce forecasts using the drift method and plot them.

fb <- gafa_stock %>% 
  filter(Symbol == "FB") %>% 
  update_tsibble(index = Date, regular = TRUE) %>% 
  fill_gaps()

fit_drift <- fb %>% 
  model(RW(Close ~ drift()))

fc_drift <- fit_drift %>% 
  forecast(h = 30)

fc_drift %>% 
  autoplot(fb) +
  labs(title = "Facebook Stock Forecast (Drift Method)")

Show that the forecasts are identical to extending the line drawn between the first and last observations.

# compute drift components
first <- fb$Close[1]
last  <- fb$Close[nrow(fb)]
n     <- nrow(fb)
slope <- (last - first)/(n - 1)

# create drift line
fb <- fb %>% 
  mutate(
    t = row_number(),
    drift = first + slope*(t - 1)
  )

# plot
fb %>% 
  autoplot(Close) +
  geom_line(aes(y = drift), color = "red", linewidth = 1) +
  labs(title = "Drift = Line Between First and Last Points")
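
The plot above draws the line only within the sample. As a rough numerical check of why the drift forecasts sit on its extension, here is a minimal base R sketch. It assumes a series with no missing values, and the vector `y` is a made-up toy series (an assumption for illustration, not the Facebook data). The drift forecast is the last value plus h times the average change, and the average change telescopes to (last - first)/(n - 1), which is the slope of the line through the first and last points.

```r
# Hypothetical toy series standing in for the closing prices (not the FB data).
y <- c(54.7, 55.1, 53.9, 57.2, 58.0, 56.4, 59.3)
n <- length(y)
h <- 1:5

# Drift method: last value plus h times the average historical change.
drift_fc <- y[n] + h * mean(diff(y))

# Line through (1, y[1]) and (n, y[n]), evaluated at times n + h.
slope   <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * (n + h - 1)

# The two sets of values agree up to floating-point error.
all.equal(drift_fc, line_fc)
```

If the series contains gaps (for example, missing values for non-trading days), the average of the observed differences need not equal (last - first)/(n - 1), so the identity should be checked on a gap-free index.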

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

The naive method performs best for this dataset. Stock prices behave like a random walk, meaning the next value is best predicted by the most recent observation. The data shows no stable trend or seasonality, so the mean, drift, and seasonal naive methods make incorrect assumptions about the structure, while the naive method matches the unpredictable nature of stock prices.

fb <- gafa_stock %>% 
  filter(Symbol == "FB") %>% 
  update_tsibble(index = Date, regular = TRUE) %>% 
  fill_gaps()

fb %>% 
  model(NAIVE(Close)) %>% 
  forecast(h = 30) %>% 
  autoplot(fb) +
  labs(title = "Facebook Stock Forecast (Naive Method)")
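
One way to back up a choice between benchmarks is to hold out the last part of the series and compare forecast errors. The sketch below does this in base R on a simulated random walk (an assumption for illustration, not the Facebook prices); the seed, series length and 30-step hold-out are arbitrary choices. No particular ranking is guaranteed for a single simulated path, so the result should be read as a procedure rather than as evidence about the stock data.

```r
set.seed(1)                       # arbitrary seed for the simulated example
y <- 100 + cumsum(rnorm(300))     # simulated random walk, not real prices
train <- y[1:270]
test  <- y[271:300]
h <- seq_along(test)
n <- length(train)

# Point forecasts from three benchmark methods.
forecasts <- list(
  mean  = rep(mean(train), length(test)),
  naive = rep(train[n], length(test)),
  drift = train[n] + h * (train[n] - train[1]) / (n - 1)
)

# Root mean squared error of each benchmark on the held-out observations.
rmse <- sapply(forecasts, function(f) sqrt(mean((test - f)^2)))
rmse
```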

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_rug()`).

# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

A seasonal naïve model was fitted to the quarterly beer production data from 1992 onward. The residual diagnostics show no clear trend or remaining seasonal structure, and most autocorrelations lie within the significance bounds. This suggests the residuals are approximately white noise and that the seasonal naïve model adequately captures the seasonal pattern in the data.
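
The visual check can be complemented with a portmanteau test. The following base R sketch is an illustration under stated assumptions: the series is simulated as a quarterly seasonal random walk (not the beer data), the seasonal naive residuals are computed by hand as lag-4 differences, and the choice of 2m = 8 lags is a common rule of thumb for seasonal data rather than a requirement.

```r
# Hypothetical quarterly seasonal random walk: y[t] = y[t - 4] + noise.
set.seed(42)
m <- 4
e <- rnorm(40, sd = 10)
y <- numeric(40)
y[1:m] <- c(440, 410, 420, 490)
for (t in (m + 1):40) y[t] <- y[t - m] + e[t]

# Seasonal naive residuals: each value minus the value one year earlier.
res <- diff(y, lag = m)

# Ljung-Box test on the first 2m lags; a large p-value is consistent
# with white noise, a small one points to remaining autocorrelation.
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box")
lb$p.value
```

With the fitted model from the exercise, the feasts equivalent would be along the lines of `augment(fit) |> features(.innov, ljung_box, lag = 8)`.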

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

#extract value

exports <- global_economy %>% 
  filter(Country == "Australia")

#fit model
fit_exp <- exports %>% 
  model(NAIVE(Exports))

fit_exp %>% 
  gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_rug()`).

#forecasts

fit_exp %>% 
  forecast(h = 10) %>% 
  autoplot(exports) +
  labs(title = "Australian Exports Forecast (Naive)")

For the Australian Exports series, the naive method was used because the data are annual and contain no seasonal pattern. The residuals resemble white noise, indicating an adequate model.

#extract 
bricks <- aus_production %>% 
  select(Quarter, Bricks)

#fit model
fit_bricks <- bricks %>% 
  model(SNAIVE(Bricks))

fit_bricks %>% 
  gg_tsresiduals()

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_rug()`).

#forecasts
fit_bricks %>% 
  forecast(h = 8) %>% 
  autoplot(bricks) +
  labs(title = "Bricks Forecast (Seasonal Naive)")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

For the Bricks series, the seasonal naive method was used because the quarterly data display strong seasonality. The residual diagnostics show no remaining structure, and the forecasts repeat the most recent seasonal cycle.

Produce forecasts for the 7 Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?

#Extract
vic_livestock <- aus_livestock |>
  filter(State == "Victoria")
#Fit seasonal naive model
fit_vic <- vic_livestock |>
  model(SNAIVE(Count))
#Forecast
fc_vic <- fit_vic |>
  forecast(h = "2 years")
#Plot
fc_vic |>
  autoplot(vic_livestock) +
  facet_wrap(~Animal, scales = "free_y") +
  scale_x_yearmonth(date_breaks = "20 years", date_labels = "%y") +
  labs(
    title = "Victorian Livestock Forecasts (Seasonal Naive)",
    y = "Number slaughtered",
    x = "Year"
  ) +
  theme(
    axis.text.x = element_text(size = 7),
    panel.spacing = unit(0.8, "lines")
  )

Are the following statements true or false? Explain your answer.
1. Good forecast methods should have normally distributed residuals.
False, normal residuals are not required for good forecasts. What matters is that residuals look like white noise: no pattern, no autocorrelation, and mean near zero. Normality mainly affects statistical inference and prediction intervals, not forecast accuracy.
1. A model with small residuals will give good forecasts.
False, small residuals on the training data only show the model fits the past well. The model may be overfitting and capturing noise, which leads to poor future predictions. Forecast quality must be judged using new (test) data.
1. The best measure of forecast accuracy is MAPE.
False, MAPE is easy to interpret but unreliable when values are near zero and can bias results. Other measures like MAE, RMSE, or MASE are often more appropriate. No single accuracy measure is always best.
1. If your model doesn’t forecast well, you should make it more complicated.
False, more complexity often causes overfitting and worse forecasts. Simple models are usually more stable and reliable. It is better to check trend, seasonality, or transformations rather than just adding parameters.
1. Always choose the model with the best forecast accuracy as measured on the test set.
False, a single test set can be misleading due to random variation or unusual events. Models should be evaluated across multiple forecast periods (time-series cross-validation) and also judged on stability and interpretability, not just one accuracy number.
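
To make the point about MAPE concrete, here is a small base R sketch that computes the four measures by hand. All numbers are made up for illustration (they are assumptions, not taken from any series in this assignment), and the first actual value is deliberately close to zero. MASE is computed here with the in-sample MAE of the lag-1 naive forecast as the scale, which is the non-seasonal form.

```r
# Hypothetical actuals and forecasts; the first actual is close to zero.
actual <- c(0.1, 20, 22, 19, 21)
fcst   <- c(1.1, 21, 21, 20, 20)
train  <- c(18, 20, 21, 19, 22, 20)   # hypothetical training data for MASE scaling

e <- actual - fcst
mae  <- mean(abs(e))
rmse <- sqrt(mean(e^2))
mape <- mean(abs(e / actual)) * 100
# MASE scales by the in-sample MAE of the naive (lag-1) forecast.
mase <- mae / mean(abs(diff(train)))

c(MAE = mae, RMSE = rmse, MAPE = mape, MASE = mase)
```

Every forecast is off by exactly one unit, so MAE and RMSE are both 1, yet MAPE is well above 100% because the single near-zero actual dominates the average.
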
For your retail time series (from Exercise 7 in Section 2.10):
1. Create a training dataset consisting of observations before 2011 using
```
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries_train <- myseries |>
  filter(year(Month) < 2011)
```
1. Check that your data have been split appropriately by producing the following plot.
  
  The data were divided into a training set consisting of observations prior to 2011 and a test set covering the remaining period. The plot confirms the split was performed correctly, with the training data shown separately from the test data used for forecast evaluation.
```
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")
```
1. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
```
 fit <- myseries_train |>
  model(SNAIVE(Turnover ~ lag("year")))
```
1. Check the residuals.
```
    fit |> gg_tsresiduals()
```
```
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
```
```
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
```
```
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
```
```
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_rug()`).
```
Do the residuals appear to be uncorrelated and normally distributed?

The residual diagnostics suggest the model is not fully adequate. The residual time plot shows noticeable patterns rather than random fluctuations around zero. In the ACF plot, several lags exceed the significance bounds, indicating remaining autocorrelation. This means the model has not captured all of the dependence structure in the data. Although the histogram of the residuals appears roughly symmetric, the presence of autocorrelation violates the assumption of uncorrelated errors. Therefore, the seasonal naïve model does not sufficiently explain the series and a more complex model is needed.
1. Produce forecasts for the test data.
```
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
```
```
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
```
```
fc |>
  autoplot(myseries)
```
1. Compare the accuracy of your forecasts against the actual values.
  
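  The seasonal naïve model provides reasonable forecasts due to the strong seasonal structure of the data, but it is less accurate on the test set than on the training set: RMSE rises from 1.21 to 1.55 and MAE from 0.915 to 1.24, and the test MASE of 1.36 means the test errors are larger than the in-sample seasonal naïve errors. MAPE is lower on the test set (9.06% versus 12.4%), likely because turnover levels are higher in the later period. The remaining autocorrelation suggests that more advanced models could improve forecast accuracy.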
```
fit |> 
  accuracy()
```
```
## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… "SNAI… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768
```
```
fc |> 
  accuracy(myseries)
```
```
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "SNAIVE(… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601
```
1. How sensitive are the accuracy measures to the amount of training data used?
  
  The accuracy measures are not highly sensitive to the amount of training data used because the seasonal naïve model relies mainly on the most recent seasonal observations. As long as at least one full seasonal cycle is available, the forecasts remain similar, though a longer training set provides more stable accuracy estimates.
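
A rough way to check this is to keep the test set fixed and refit with training windows of different lengths. The base R sketch below does so on a simulated monthly series (an assumption for illustration, not the retail data); the helper `snaive_eval()` and the window lengths of 3, 10 and 18 years are hypothetical choices. Because the seasonal naive point forecasts depend only on the last 12 training observations, unscaled measures such as RMSE are identical across windows, while MASE can change because its scaling factor is computed from the whole training window.

```r
# Hypothetical monthly series with trend and seasonality (not the retail data).
set.seed(123)
m <- 12
y <- 10 + 0.05 * (1:240) + 2 * sin(2 * pi * (1:240) / m) + rnorm(240, sd = 0.5)
train <- y[1:216]
test  <- y[217:240]

# Hypothetical helper: seasonal naive forecasts and two accuracy measures.
snaive_eval <- function(train, test, m) {
  n <- length(train)
  h <- seq_along(test)
  # Seasonal naive forecast: repeat the last m training observations.
  fc <- train[n - m + ((h - 1) %% m) + 1]
  scale <- mean(abs(diff(train, lag = m)))   # in-sample seasonal naive MAE
  c(RMSE = sqrt(mean((test - fc)^2)),
    MASE = mean(abs(test - fc)) / scale)
}

# Same test set, training windows of the last 3, 10 and 18 years.
res <- sapply(c(3, 10, 18) * m, function(k) snaive_eval(tail(train, k), test, m))
res
```

Each column of `res` corresponds to one training window; the RMSE row is constant by construction, and any differences appear only in the MASE row.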

Rebecca Bronstein

2026-02-22