library(fpp3)
library(tidyverse)

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case: - Australian Population (global_economy) - Bricks (aus_production) - NSW Lambs (aus_livestock) - Household wealth (hh_budget). - Australian takeaway food turnover (aus_retail).

Australian Population (global_economy)

aus_pop <- global_economy %>% 
  filter(Country == "Australia")

autoplot(aus_pop, Population) +
  labs(title = "Australian Population", y = "Millions")

This plot of the Australian Population shows a strong uptrend with no seasonality so the most appropriate forecast here would be the drift method since it shows change over time.

# Forecast with drift
fit_pop <- aus_pop %>% model(RW(Population ~ drift()))
fc_pop <- forecast(fit_pop, h = 10)

autoplot(fc_pop, aus_pop) +
  labs(title = "Population Forecast", y = "Millions")

Bricks (aus_production)

bricks <- aus_production %>% 
  select(Bricks)

autoplot(bricks) +
  labs(title = "Bricks Production", y = "Millions of bricks")

## Plot variable not specified, automatically selected `.vars = Bricks`

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

There is clear seasonality here as there is repeating peaks and valleys depending on the quarter. Here a seasonal naive (SNAIVE) forecast would be appropriate since it uses data from the previous quarter to forecast the next.

# Forecast with SNAVIE
fit_bricks <- bricks %>% model(SNAIVE(Bricks))
fc_bricks <- forecast(fit_bricks, h = "5 years")

autoplot(fc_bricks, bricks) +
  labs(title = "Bricks Forecast", y = "Millions of bricks")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

NSW Lambs (aus_livestock)

lambs <- aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs")

autoplot(lambs, Count) +
  labs(title = "NSW Lamb Slaughterings", y = "Thousands")

This monthly plot of lamb slaughtering shows repeating peaks and valleys in the same months each year, indicating seasonality. A SNAIVE forecast would be the most appropriate.

# Forecast with SNAIVE
fit_lambs <- lambs %>% model(SNAIVE(Count))
fc_lambs <- forecast(fit_lambs, h = "2 years")

autoplot(fc_lambs, lambs) +
  labs(title = "Lambs Forecast", y = "Thousands")

Household wealth (hh_budget)

wealth <- hh_budget %>% select(Wealth)

autoplot(wealth) +
  labs(title = "Household Wealth", y = "Billions")

## Plot variable not specified, automatically selected `.vars = Wealth`

The household wealth series shows a clear long term uptrend and does not show any obvious repeating seasonal fluctuations so a random walk with drift model would be most appropriate here.

# Forecast with drift
fit_wealth <- wealth %>% model(RW(Wealth ~ drift()))
fc_wealth <- forecast(fit_wealth, h = "5 years")

autoplot(fc_wealth, wealth) +
  labs(title = "Household Wealth Forecast", y = "Billions")

Australian takeaway food turnover (aus_retail)

takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>% 
  summarise(Turnover = sum(Turnover))

autoplot(takeaway, Turnover) +
  labs(title = "Takeaway Food Turnover", y = "Millions")

This plot of monthly Australian takeaway food turnover has a strong uptrend as well as clear seasonality, with sales rising and falling at similar times each year. In this case, SNAIVE would be a better choice than drift since it will capture the seasonal fluctuations as well as the general uptrend.

# Forecast with SNAIVE
fit_takeaway <- takeaway %>% model(SNAIVE(Turnover))
fc_takeaway <- forecast(fit_takeaway, h = "3 year")

autoplot(fc_takeaway, takeaway) +
  labs(title = "Takeaway Food Forecast", y = "Millions")

5.2

Use the Facebook stock price (data set gafa_stock) to do the following: - Produce a time plot of the series. - Produce forecasts using the drift method and plot them. - Show that the forecasts are identical to extending the line drawn between the first and last observations. - Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

Time plot of Facebook stock price

fb <- gafa_stock %>%
  filter(Symbol == "FB") %>% 
  mutate(trading_day = row_number()) %>% 
  update_tsibble(index = trading_day, regular = TRUE)

autoplot(fb, Close) +
  labs(title = "Facebook Stock Price", y = "Closing Price ($)")

Forecast with Drift method

fb %>% 
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 252) %>% 
  autoplot(fb, level = NULL) +
  labs(title = "Facebook closing stock price", y = "$US") +
  guides(colour = guide_legend(title = "Forecast"))

Show that the forecasts are identical to extending the line drawn between the first and last observations.

In the formula for the drift method, y1 represents the first observation in the series and yT represents the last observation. These two values are used to calculate the slope (yT - y1) / (Tn - 1), where T represents the index of the observation. Using this slope with the first and last observed value, we can extrapolate a line from the first observation to the forecasted predictions.

fit_drift <- fb %>% model(RW(Close ~ drift()))
fc_drift  <- forecast(fit_drift, h = 252)

Tn    <- max(fb$trading_day)
y1    <- fb %>% filter(trading_day == 1)  %>% pull(Close)
yT    <- fb %>% filter(trading_day == Tn) %>% pull(Close)
slope <- (yT - y1) / (Tn - 1)

autoplot(fc_drift, fb, level = NULL) +
  geom_segment(aes(x = 1, y = y1,
                   xend = Tn + 252,
                   yend = yT + 252 * slope,
                   colour = "Line through first & last"),
                  linetype = "dashed") +
                  labs(title = "FB Drift Forecast = Extension of First–Last Line",
                       y = "$US") +
                  guides(colour = guide_legend(title = "Overlay"))

## Warning in geom_segment(aes(x = 1, y = y1, xend = Tn + 252, yend = yT + : All aesthetics have length 1, but the data has 1258 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fits <- fb %>%
  model(
    naive = NAIVE(Close),
    drift = RW(Close ~ drift()),
    mean  = MEAN(Close)
  )

fc <- fits %>% forecast(h = 252)

autoplot(fc, fb, level = NULL) +
  labs(title = "FB: Benchmark forecasts", y = "Close ($)") +
  guides(colour = guide_legend(title = "Method"))

The other benchmark forecasts for the Facebook stock shows different patterns. The naive method shows a flat line that extends from the last observed price. The drift method creates a straight upward trend from the last point. The mean method flattens the forecast at the average of the stock price. The SNAIVE method wasn’t used here because there was no seasonality shown. Based on these plots, the naive method would be best here for predicting the facebook stock price. The drift method assumes the stock price rises in a fixed slope and the stock price has pretty much never flat lined, so the mean methos is not optimal either. This leave the naive method as the best option for predicting the facebook stock price.

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_rug()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude? The quarterly Australian beer production series shows strong seasonality, so a seasonal naïve model is a good starting point. The residual plots indicate that the residuals are not pure white noise, as the mean is not 0. The time plot suggests changing variance, and the ACF plot shows a clear spike at lag 4 (with smaller spikes at lags 8 and 12). Together, this indicates that not all seasonal structure has been captured.

The forecast plot repeats the seasonal pattern, but it does not reflect the gradual long-term decline in beer production. We can conclude that seasonal naive model captures the seasonality in the series but fails to capture the downward trend in beer production.

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Austrilian Exports

exports <- global_economy %>% 
  filter(Country == "Australia")

autoplot(exports)

## Plot variable not specified, automatically selected `.vars = GDP`

# Define and estimate a model
fit_exp <- exports %>% model(NAIVE(Exports))

# Look at the residuals
fit_exp %>% gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_rug()`).

# Look a some forecasts
fit_exp %>% forecast(h = 10) %>% autoplot(exports)

The plot of Australian exports shows a strong upward trend without seasonality so a naive model would be more appropriate than a seasonal naive model. The residual plot shows values roughly centered at zero, but the ACF has a significant negative spike at lag 1. This means that the residuals are not just white noise, suggesting the naïve model did not fully capture the trend.

The forecast plot shows a flat line, which does not capture the upward trend in exports. We can conclude that the naive model is not the best fit for this data. In this case, a drift model would most likely be better as it can capture the upward trend better than the naive model.

Brick proudction

bricks <- aus_production %>%
  select(Bricks) %>%
  drop_na()

autoplot(bricks, Bricks)

fit_b <- bricks %>% model(SNAIVE(Bricks))
fit_b %>% gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_rug()`).

fit_b %>% forecast(h = 10) %>% autoplot(bricks)

The quarterly bricks production data show strong seasonality, so a seasonal naïve model is appropriate. The residual diagnostics indicate that the residuals are not pure white noise. The ACF plot has significant spikes at seasonal lags, which means that some seasonal components are not captured by the model. Also, the histogram shows that the residuals are not normally distributed.

The forecast plot repeats the past seasonal cycle but does not capture the overall downtrend in brick production nor the changes in variability. From this we can conclude that although the seasonal naive model captures the recurring seasonal pattern, it fails to account for the variability in the data.

5.7

For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using:

set.seed(123)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_rug()`).

Do the residuals appear to be uncorrelated and normally distributed? The residual time plot is centered around zero on average, but shows runs of positive and negative values, suggesting the forecasts may be biased at times. The ACF plot shows may bars outside blue bounds, meaning the residuals are not uncorrelated. The histogram is somewhat bell shaped but has some skewness, indicating the residuals are not perfectly normally distributed. The residual diagnostic indicates the residuals are not uncorrelated nor normally distributed.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

The forecast repeats the seasonal pattern but doesn’t capture the variance in the fluctautions.

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

fc |> accuracy(myseries)

The forecast model performed poorly on the test model compared to the training model. The test set had a much larger RMSE (213 vs 46 in training) and MAPE (15.2% vs 7.3%), showing that the seasonal naïve model fit the training data but produced much less accurate forecasts out of sample. The large difference in MAPE indicates that the forecasts were on average twice as far off on the test data. On average, the forecast model was 46 units off in the training data where as 213 off in the testing data, nearly 5 times as big. The large differential indiactes taht the seasonal naive mode failed to capture the changes after 2011.

How sensitive are the accuracy measures to the amount of training data used? The accuracy measures are very sensitive to the amount of training data used. Less training data means that the forecasts are modeled on older seasonal patterns, which decreases accuracy and produces larger RMSE and MAPE in the test data. With more training data, the model captures more recent behavior, improving forecast accuracy.

HW3

Jian Quan Chen

2025-09-21

5.1