DATA624 - HW 3

library(fpp3)
library(tidyverse)

Exercise 5.11.1

1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget).
Australian takeaway food turnover (aus_retail).

Answer:

Australian Population (global_economy):

aus_pop <- global_economy |>
  filter(Country == "Australia")

aus_pop |>
  autoplot(Population)

The Australian population data show a steady upward trend without any apparent seasonality or cyclical patterns. Since the population grows gradually over time with a relatively constant rate of increase, the Random Walk with Drift (RW with drift) model is appropriate for forecasting.

aus_pop |>
  model(RW(Population ~ drift())) |>
  forecast(h = 10) |>
  autoplot(aus_pop)+
  theme_bw()

The RW with drift forecast for Australian population shows a steady increase over time, following the long-term growth pattern: each forecast step adds the average historical annual change to the previous value. The blue shaded areas are prediction intervals, which quantify the forecast uncertainty; since population growth is usually stable, the forecast is quite reliable in the short term.

Bricks (aus_production):

aus_bricks <- aus_production|>
  filter(!is.na(Bricks))|>
  select(Quarter, Bricks)
aus_bricks |>
  autoplot(Bricks) +
  theme_bw()

The Australian bricks production data exhibits quarterly seasonality, an inconsistent trend with an initial rise until 1980 followed by fluctuations, and does not follow a simple upward drift like the Australian population. Since there is clear seasonality, the best method to use is Seasonal Naïve (SNAIVE). This method repeats the last observed seasonal values for forecasting.

aus_bricks |>
  model(SNAIVE(Bricks)) |>
  forecast(h = "2 years") |>
  autoplot(aus_bricks) +
  theme_bw()

The SNAIVE forecast for Australian bricks production captures the quarterly seasonal pattern, extending recent fluctuations into the future. The prediction intervals widen over time, indicating uncertainty due to past variations. Since no long-term trend is assumed, the forecast follows the existing seasonal structure without an upward or downward drift.
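As a minimal sketch of what the seasonal naïve method computes, the base R snippet below repeats the last full year of a quarterly series. The vector `y` holds made-up values and `snaive_point()` is a hypothetical helper written for illustration; neither comes from fable or from the bricks data.

```r
# Hypothetical helper: seasonal naive point forecasts for a numeric vector.
# m is the seasonal period (4 for quarterly data), h is the forecast horizon.
snaive_point <- function(y, m, h) {
  last_season <- tail(y, m)           # most recent full seasonal cycle
  rep(last_season, length.out = h)    # recycle it across the horizon
}

# Made-up quarterly values covering two years (illustrative only)
y <- c(10, 14, 12, 16, 11, 15, 13, 17)

snaive_point(y, m = 4, h = 6)
# returns 11 15 13 17 11 15
```

Because the point forecasts only recycle the final year, any trend in the history is ignored, which matches the flat seasonal pattern in the forecast plot.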

NSW Lambs (aus_livestock):

nsw_lambs <- aus_livestock |>
  filter(State == "New South Wales", Animal == "Lambs")

nsw_lambs |>
  autoplot(Count)+
  theme_bw()

The NSW lamb slaughter data has seasonal patterns with ups and downs, an uncertain trend that declined from the 1980s to 2000 before stabilizing, and high short-term fluctuations. Since seasonality is clear, the Seasonal Naïve (SNAIVE) model is the best choice as it repeats recent seasonal patterns for future forecasts.

gg_season(nsw_lambs)+
  theme_bw()

## Plot variable not specified, automatically selected `y = Count`

The gg_season() seasonal plot above also confirms the seasonality.

nsw_lambs |>
  model(SNAIVE(Count)) |>
  forecast(h = "2 years") |>
  autoplot(nsw_lambs) + 
  theme_bw()

The SNAIVE forecast for NSW lamb slaughter follows the same seasonal pattern as past years. The blue shaded areas show increasing uncertainty over time, but no long-term trend is assumed.

Household wealth (hh_budget):

hh_wealth <- hh_budget |>
  filter(!is.na(Wealth)) |>
  select(Year, Wealth)
hh_wealth |>
  autoplot(Wealth) +
  theme_bw()

The household wealth data are annual, so they cannot show a seasonal pattern. They show a clear upward trend with some fluctuations, making Random Walk with Drift (RW with drift) the best of the three candidate methods.

hh_wealth |>
  group_by(Country) |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = 10) |>
  autoplot(hh_wealth) +
  theme_bw()

The RW with drift forecast for household wealth shows a steady upward trend for all countries, continuing past growth patterns. The blue shaded areas represent increasing uncertainty over time, but the general direction remains positive.

Australian takeaway food turnover (aus_retail):

takeaway <- aus_retail |>
  filter(Industry == "Takeaway food services")

takeaway |>
  autoplot(Turnover) +
  theme_bw()

takeaway |>
  group_by(State) |>
  model(SNAIVE(Turnover)) |>
  forecast(h = "2 years") |>
  autoplot(takeaway) +
  facet_wrap(~State, scales = "free") +  # Give each state its own axis scales
  theme_bw()

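The takeaway food turnover series show a strong upward trend with seasonal patterns in every state. The SNAIVE forecasts repeat the last observed year of each series, so they reproduce the seasonal pattern but do not extend the trend. The prediction intervals (blue shading) widen over time, indicating increasing uncertainty.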

Exercise 5.11.2

Use the Facebook stock price (data set gafa_stock) to do the following:

a. Produce a time plot of the series.
b. Produce forecasts using the drift method and plot them.
c. Show that the forecasts are identical to extending the line drawn between the first and last observations.
d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

Answer:

a. Produce a time plot of the series.

fb_stock <- gafa_stock |>
  filter(Symbol == "FB")

fb_stock |>
  autoplot(Close) +
  theme_bw()

Since stock prices tend to follow trends but without seasonality, the drift method (RW with drift) is a good choice.

b. Produce forecasts using the drift method and plot them.

fb_stock <- gafa_stock |>
  filter(Symbol == "FB") |>
  select(Date, Close) |>  
  complete(Date = seq(min(Date), max(Date), by = "day")) |>  
  fill(Close, .direction = "down") |>  # Carry forward last known price
  as_tsibble(index = Date, regular = TRUE)

fb_stock |>
  model(RW(Close ~ drift())) |>
  forecast(h = 63) |> 
  autoplot(fb_stock) +
  labs(title = "Facebook Stock Price Forecast (Drift Method)")+
  theme_bw()

The RW with drift model forecasts Facebook’s stock price by adding the average historical daily change to the last observed value at each step. The blue shaded areas represent the 80% and 95% prediction intervals, showing growing uncertainty in the forecast. Although the price fell during 2018, the drift is estimated from the whole sample, in which the last close is above the first, so the point forecasts slope gently upward from the last observation rather than continuing the recent decline.

c. Show that the forecasts are identical to extending the line drawn between the first and last observations.

first_obs <- fb_stock |> slice(1) #1st observation
last_obs <- fb_stock |> slice(n()) #last observation

fb_stock |>
  model(RW(Close ~ drift())) |>
  forecast(h = 63) |>
  autoplot(fb_stock) +
  labs(title = "Daily Close Price of Facebook", y = "USD") +
  annotate("segment", 
           x = first_obs$Date, y = first_obs$Close, 
           xend = last_obs$Date, yend = last_obs$Close,
           colour = "purple", linetype = "dashed", linewidth = 1) +  
  theme_bw()

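The dashed segment joins the first and last observations, and the RW with drift point forecasts continue from its end with the same slope. This is because the drift estimate is the average historical change, (y_T - y_1)/(T - 1), which is exactly the slope of that line, so the forecasts are identical to extending it into the future.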
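The equivalence can also be checked numerically. The following base R sketch uses a short vector of made-up prices (not the Facebook data) and compares the drift forecasts with the extension of the straight line through the first and last points.

```r
# Made-up closing prices (illustrative only)
y <- c(50, 53, 49, 58, 62, 60)
n <- length(y)
h <- 1:5

# Drift method: last value plus h times the average historical change
drift_fc <- y[n] + h * mean(diff(y))

# Straight line through the first and last points, extended h steps ahead
slope   <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * (n + h - 1)

all.equal(drift_fc, line_fc)
# returns TRUE
```

The two agree because the sum of the successive differences telescopes to `y[n] - y[1]`, so `mean(diff(y))` equals the slope of the line.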

d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_models <- fb_stock |>
  model(
    NAIVE = NAIVE(Close),
    MEAN = MEAN(Close),
    RW_Drift = RW(Close ~ drift())  # Keeping RW with drift for comparison
  )

# Generate forecasts 63 days ahead (the series was filled to calendar days above)
fb_forecasts <- fb_models |>
  forecast(h = 63)

# Plot the forecasts
fb_forecasts |>
  autoplot(fb_stock, level = NULL) +  # level = NULL hides the prediction intervals
  labs(title = "Facebook Stock Price Forecast: Comparing Benchmark Models",
       y = "Stock Price (USD)",
       x = "Year") +
  theme_minimal()

RW with Drift is the best model as it captures the stock’s long-term trend, unlike NAIVE, which assumes no change, and MEAN, which unrealistically predicts a constant average price.
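One way to back up a choice like this is to hold out the end of the series and compare forecast errors. The base R sketch below does so on a simulated random walk, used here only as a stand-in for a price series; the data, the split point and the variable names are all assumptions made for illustration.

```r
set.seed(1)

# Simulated random walk standing in for a price series (not real data)
y <- 100 + cumsum(rnorm(300))

train <- y[1:250]
test  <- y[251:300]
n     <- length(train)
h     <- seq_along(test)

# Point forecasts from the three benchmark methods
fc <- list(
  NAIVE    = rep(train[n], length(test)),
  MEAN     = rep(mean(train), length(test)),
  RW_Drift = train[n] + h * (train[n] - train[1]) / (n - 1)
)

# Root mean squared error of each method on the held-out values
rmse <- sapply(fc, function(f) sqrt(mean((test - f)^2)))
rmse
```

The method with the smallest RMSE on the held-out data is the one best supported by this particular split; a different split can rank the methods differently.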

Exercise 5.11.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help. What do you conclude?

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))

# Look at the residuals
fit |> gg_tsresiduals()

# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

Answer:

The SNAIVE model effectively reflects the seasonal pattern in Australian beer production. The forecasts repeat the last observed year, with expanding prediction intervals. However, the residual analysis shows some autocorrelation, meaning the residuals are not completely white noise. The histogram looks nearly normal, but the ACF plot indicates remaining structure in the data. So, while SNAIVE is a reasonable choice, it does not fully explain all variation, which suggests there is room to improve the model.
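A visual check of the ACF can be complemented with a Ljung-Box test. The sketch below uses base R's `Box.test()` on seasonal naïve residuals from a simulated quarterly series; the series is made up, and the choice of 2m lags follows the usual rule of thumb for seasonal data.

```r
set.seed(42)

# Simulated quarterly series: fixed seasonal pattern plus noise (illustrative only)
m <- 4
y <- rep(c(440, 410, 420, 490), times = 10) + rnorm(40, sd = 10)

# Seasonal naive residuals: each value minus the value one year earlier
res <- diff(y, lag = m)

# Ljung-Box test over the first 2 * m lags;
# a small p-value suggests the residuals are not white noise
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box")
lb$p.value
```

With the fitted model from the exercise, the same test would be applied to the model's innovation residuals rather than to a simulated series.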

Exercise 5.11.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Answer:

Australian Exports series from global_economy:

Time Series Plot:

global_economy |>
  filter(Country == "Australia") |>
  autoplot(Exports) +
  theme_bw()

The time series plot shows an increasing trend but no clear seasonal cycles. Exports fluctuate, but they don’t repeat in a predictable seasonal pattern. Let’s check with STL Decomposition.

Check for Seasonality: STL Decomposition

global_economy |>
  filter(Country == "Australia") |>
  model(STL(Exports ~ season(window = "periodic"))) |>
  components() |>
  autoplot()

The trend component shows a steady upward trend over time, and the remainder component shows random variation with no repeating seasonal pattern. This is expected: the data are annual, so there is no seasonal period to estimate. Let’s also take a look at the ACF plot.

ACF to Detect Seasonality:

global_economy |>
  filter(Country == "Australia") |>
  ACF(Exports) |>
  autoplot()

The ACF plot shows strong autocorrelation, meaning each value is closely related to the values just before it, which is typical of trended data. There are no seasonal spikes.

We choose NAIVE because the plots above show that the data have a trend but no seasonal pattern.

# Extract data of interest
aus_exports <- global_economy |>
  filter(Country == "Australia")

# Define and estimate a model
fit <- aus_exports |> model(NAIVE(Exports))

# Look at the residuals
fit |> gg_tsresiduals()

# Look at some forecasts
fit |> forecast() |> autoplot(aus_exports)

The NAIVE model is an appropriate choice for forecasting Australian exports because the data shows an upward trend but no clear seasonality. NAIVE assumes that future exports will stay at the last observed value, making it useful for short-term forecasts but less reliable for long-term predictions. The residuals show some fluctuations but no clear trend, and while the ACF plot suggests minor autocorrelation, the model performs reasonably well. The forecast follows the last observed export value, with widening prediction intervals indicating growing uncertainty. However, if exports continue to rise, NAIVE may not be the best choice for long-term forecasting since it does not account for trends.
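The widening intervals follow from the naïve method's forecast standard deviation, which grows with the square root of the horizon. The base R sketch below illustrates this on made-up annual values; the simple residual standard deviation used here is an assumption, and fable's own estimate may differ slightly.

```r
# Made-up annual values (illustrative only)
y <- c(12, 13, 15, 14, 16, 18, 17, 19)
n <- length(y)
h <- 1:5

res   <- diff(y)              # naive residuals: y[t] - y[t-1]
sigma <- sqrt(mean(res^2))    # simple residual standard deviation (assumption)

point <- rep(y[n], length(h))             # naive point forecast: last observed value
lower <- point - 1.96 * sigma * sqrt(h)   # approximate 95% interval, lower bound
upper <- point + 1.96 * sigma * sqrt(h)   # approximate 95% interval, upper bound

data.frame(h, point, lower, upper)
```

The point forecast stays flat while the interval width increases in proportion to `sqrt(h)`, which is the pattern visible in the forecast plot.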

Bricks series from aus_production:

Time Series Plot:

bricks_data <- aus_production |>
  filter(!is.na(Bricks)) 

bricks_data |> 
  autoplot(Bricks) +
  theme_bw()

The bricks production data show strong repeating cycles, suggesting seasonality. There is no clear long-term trend after 1980; the series fluctuates around a roughly constant level.

Check for Seasonality: STL Decomposition

bricks_data |>
  model(STL(Bricks ~ season(window = "periodic"))) |>
  components() |>
  autoplot()

The trend component fluctuates but does not show a continuous upward or downward pattern, and the seasonal component is very strong, indicating regular quarterly patterns. The remainder component is fairly random.

ACF to Detect Seasonality:

bricks_data |>
  ACF(Bricks) |>
  autoplot()

The ACF plot shows significant spikes at regular lags (every 4 quarters). The ACF values slowly decrease rather than dropping sharply. This suggests that past values strongly influence future values, which supports using a seasonal model.

We choose SNAIVE because the plots above show that the data have strong seasonal patterns.

# Define and estimate a model
fit_bricks <- bricks_data |> model(SNAIVE(Bricks))

# Look at the residuals
fit_bricks |> gg_tsresiduals()

# Look at some forecasts
fit_bricks |> forecast(h = 8) |> autoplot(bricks_data)

The SNAIVE model is a reasonable choice for forecasting bricks production because the data shows strong seasonal patterns but no clear long-term trend. The forecast follows the last observed seasonal cycle, with prediction intervals widening over time, indicating increasing uncertainty. However, the residuals still show some patterns, and the ACF plot suggests the model hasn’t captured all variations. While SNAIVE works well for short-term predictions, exploring other models in the future may help improve long-term accuracy.

Exercise 5.11.7

For your retail time series (from Exercise 7 in Section 2.10):

a. Create a training dataset consisting of observations before 2011 using

set.seed(786)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(myseries, .vars = Turnover) + theme_bw()

myseries_train <- myseries |>
  filter(year(Month) < 2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red") + theme_bw()

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

The plot shows clear seasonal patterns, so SNAIVE is a natural benchmark: it repeats the last observed season.

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d. Check the residuals.

fit |> gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

The residual and autocorrelation plots show that the SNAIVE model does not capture all of the structure in the data. The residuals are roughly normally distributed, but they are autocorrelated rather than uncorrelated; the training-set ACF1 of 0.847 reported in part f confirms strong lag-1 autocorrelation.

e. Produce forecasts for the test data

Let’s generate forecasts for the test data (2011 and beyond) using the SNAIVE model and visualize the results.

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries) + theme_bw()

f. Compare the accuracy of your forecasts against the actual values.

fit |> accuracy() #accuracy on training data

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New Sou… Cafes, … SNAIV… Trai…  14.1  42.1  31.5  6.07  12.7     1     1 0.847

fc |> accuracy(myseries) #accuracy on test data

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… New … Cafes, … Test   175.  202.  176.  25.9  26.0  5.58  4.81 0.924

The SNAIVE model fits the training data reasonably well (MAPE: 12.7%; the 6.07 in the table is the MPE) but has much larger errors on the test data (MAPE: 26.0%, MASE: 5.58), meaning it struggled to predict future values accurately. The positive test ME (175) shows that the forecasts are consistently too low. This suggests that while SNAIVE captures seasonality, it does not account for the increasing trend, making it less reliable for long-term forecasting.
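Since MPE and MAPE are easy to confuse when reading the accuracy table, the base R sketch below computes both by hand, along with MASE, on small made-up vectors. The names `train`, `actual` and `fc` are hypothetical and the numbers are not taken from the retail series.

```r
# Made-up quarterly training data, test actuals and forecasts (illustrative only)
train  <- c(80, 100, 70, 90, 84, 106, 72, 96)
actual <- c(100, 120, 80, 110)
fc     <- c(90, 126, 88, 110)
m      <- 4                      # seasonal period

e  <- actual - fc                # forecast errors: 10, -6, -8, 0
pe <- 100 * e / actual           # percentage errors: 10, -5, -10, 0

mpe  <- mean(pe)                 # signs can cancel: -1.25
mape <- mean(abs(pe))            # absolute values do not cancel: 6.25

# MASE scales the test MAE by the in-sample MAE of the seasonal naive method
scale <- mean(abs(diff(train, lag = m)))   # 4.5
mase  <- mean(abs(e)) / scale              # 6 / 4.5

c(MPE = mpe, MAPE = mape, MASE = mase)
```

Because positive and negative percentage errors offset each other in MPE, it measures bias, while MAPE measures the typical size of the error.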

g. How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are highly sensitive to the amount of training data used, as the way data is split affects forecast performance. More training data can help capture seasonality but may include outdated trends, while less training data might not fully capture seasonal patterns. Any change in the training set alters the forecast and impacts accuracy measurements. For SNAIVE in particular, the point forecasts depend only on the final year of the training data, so the test errors change when the split point moves rather than when earlier history is added. In our case, the test MAPE (26.0%) is much higher than the training MAPE (12.7%), showing that the model struggles with recent trends. This suggests the need for a better model that accounts for both seasonality and trend.
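A rough way to explore this sensitivity is to move the train/test split and recompute the test error each time. The base R sketch below does this for seasonal naïve forecasts on a simulated monthly series with trend and seasonality; the series, the split points and the helper `mape_for_split()` are all hypothetical.

```r
set.seed(7)

m <- 12    # monthly seasonality
n <- 240   # 20 years of simulated data

# Simulated monthly series with trend and seasonality (illustrative only)
y <- 50 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / m) + rnorm(n, sd = 2)

# Hypothetical helper: test MAPE of seasonal naive forecasts
# when the first n_train observations are used for training
mape_for_split <- function(n_train) {
  train <- y[1:n_train]
  test  <- y[(n_train + 1):n]
  fc    <- rep(tail(train, m), length.out = length(test))  # repeat last season
  mean(abs(100 * (test - fc) / test))
}

splits <- c(120, 168, 216)
mape_by_split <- setNames(sapply(splits, mape_for_split), splits)
mape_by_split
```

Each entry is the test MAPE for one split point, so comparing them shows how much the reported accuracy depends on where the training period ends.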


Farhana Akther

2025-02-23
