DATA624 HW3

library(fpp3)
library(ggplot2)

data("global_economy")  
data("aus_production")  
data("aus_livestock")   
data("hh_budget")       
data("aus_retail")

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

RW(y ~ drift())

aus_pop <- global_economy %>% filter(Country == "Australia")

aus_pop %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(aus_pop) +
  ggtitle("Forecast of Australian Population (RW with Drift)")

aus_pop %>%
  model(SNAIVE(Population)) %>%
  forecast(h = 10) %>%
  autoplot(aus_pop) +
  ggtitle("Forecast of Australian Population (SNAIVE)")

aus_pop %>%
  model(NAIVE(Population)) %>%
  forecast(h = 10) %>%
  autoplot(aus_pop) +
  ggtitle("Forecast of Australian Population (NAIVE)")

Bricks (aus_production)

SNAIVE(y)

** some missing value in Bricks cause error in forecast, removed the NULL value.

aus_production <- aus_production %>% drop_na(Bricks)


aus_production %>%
  model(RW(Bricks ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(aus_production) +
  ggtitle("Forecast of Brick Production (RW with Drift)")

aus_production %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = 10) %>%
  autoplot(aus_production) +
  ggtitle("Forecast of Bricks Production (Seasonal NAIVE)")

aus_production %>%
  model(NAIVE(Bricks)) %>%
  forecast(h = 10) %>%
  autoplot(aus_production) +
  ggtitle("Forecast of Brick Production (NAIVE)")

NSW Lambs (aus_livestock)

NAIVE(y)

aus_livestock <- aus_livestock %>% filter(Animal == "Lambs", State=="New South Wales")


aus_livestock %>%
  model(RW(Count ~ drift())) %>%
  forecast(h = 30) %>%
  autoplot(aus_livestock) +
  ggtitle("Forecast of NSW Lambs (RW with Drift)")

aus_livestock %>%
  model(SNAIVE(Count)) %>%
  forecast(h = 30) %>%
  autoplot(aus_livestock) +
  ggtitle("Forecast of NSW Lambs (Seasonal NAIVE)")

aus_livestock %>%
  model(NAIVE(Count)) %>%
  forecast(h = 30) %>%
  autoplot(aus_livestock) +
  ggtitle("Forecast of NSW Lambs (NAIVE)")

Household wealth (hh_budget).

RW(y ~ drift())

hh_wealth <- hh_budget %>% filter(Country == "Australia")

hh_wealth %>%
  model(RW(Wealth ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(hh_wealth) +
  ggtitle("Forecast of Household Wealth (RW with Drift)")

hh_wealth %>%
  model(SNAIVE(Wealth)) %>%
  forecast(h = 10) %>%
  autoplot(hh_wealth) +
  ggtitle("Forecast of Household Wealth (Seasonal NAIVE)")

## Warning: 1 error encountered for SNAIVE(Wealth)
## [1] Non-seasonal model specification provided, use RW() or provide a different lag specification.

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

hh_wealth %>%
  model(NAIVE(Wealth)) %>%
  forecast(h = 10) %>%
  autoplot(hh_wealth) +
  ggtitle("Forecast of Household Wealth (NAIVE)")

Australian takeaway food turnover (aus_retail).

RW(y ~ drift()) & SNAIVE(y)

aus_retail<-aus_retail

takeaway <- aus_retail %>% filter(Industry == "Takeaway food services") %>%
summarise(Turnover = sum(Turnover, na.rm = TRUE)) %>%
  ungroup()

takeaway %>%
  model(RW(Turnover ~ drift())) %>%
  forecast(h = 20) %>%
  autoplot(takeaway) +
  ggtitle("Forecast of Australian Takeaway Food Turnover (RW with Drift)")

takeaway %>%
  model(SNAIVE(Turnover)) %>%
  forecast(h = 20) %>%
  autoplot(takeaway) +
  ggtitle("Forecast of Australian Takeaway Food Turnover (Seasonal NAIVE)")

takeaway %>%
  model(NAIVE(Turnover)) %>%
  forecast(h = 20) %>%
  autoplot(takeaway) +
  ggtitle("Forecast of Australian Takeaway Food Turnover (NAIVE)")

5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") 

fb_stock %>%
  autoplot(Adj_Close) +
  labs(title = "Facebook Stock Price",
       y = "Adjusted Close Price",
       x = "Year")

Produce forecasts using the drift method and plot them. *Data is an irregular time series, which this model does not support. So I convert to monthly time series

fb_stock_monthly <- fb_stock %>%
  index_by(Month = yearmonth(Date)) %>%  
  summarise(Adj_Close = mean(Adj_Close, na.rm = TRUE))  

fb_stock_monthly %>%
  model(RW(Adj_Close ~ drift())) %>%
  forecast(h = 20) %>%
  autoplot(fb_stock_monthly) +
  labs("Facebook Stock Price Forecast using Monthly Mean (RW with Drift)",
       y = "Adjusted Close Price",
       x = "Year")

Show that the forecasts are identical to extending the line drawn between the first and last observations.

fb_stock_monthly %>%
  mutate(manual_drift = first(Adj_Close) + (row_number() - 1) * ((last(Adj_Close) - first(Adj_Close)) / (n() - 1))) %>%
  autoplot(Adj_Close) +
  geom_line(aes(y = manual_drift), color = "red", linetype = "dashed") +
  labs(title = "Facebook Stock Price Forecast using Monthly Mean",
       y = "Adjusted Close Price",
       x = "Year")

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

I think RW(y~drift) method is likely to outperform NAIVE and SNAIVE in the stock FB, when there’s a clear upward or downward trend in the stock prices over time, as it assumes the trend will continue into the future.

fb_stock_monthly %>%
  model(SNAIVE(Adj_Close)) %>%
  forecast(h = 20) %>%
  autoplot(fb_stock_monthly) +
  ggtitle("Facebook Stock Price Forecast using Monthly Mean(Seasonal NAIVE)")

fb_stock_monthly %>%
  model(NAIVE(Adj_Close)) %>%
  forecast(h = 20) %>%
  autoplot(fb_stock_monthly) +
  ggtitle("Facebook Stock Price Forecast using Monthly Mean(NAIVE)")

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

What do you conclude?

The residual plot shows randomly scattered residuals around zero, with no visible pattern, which suggests correlation.and the resuduals look like white noise.
In the ACF plot of residuals, some autocorrelations are outside the blue confidence bounds, suggesting that the residuals are not completely random.
The histogram of residuals indicates a slight skew, deviating from a normal distribution.
The seasonal naive method captures the strong seasonal pattern in beer production, as seen in the repeating peaks and troughs.

recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

fit <- recent_production |> model(SNAIVE(Beer))
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

fit |> forecast() |> autoplot(recent_production)

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports series from global_economy
- Since the residuals resemble white noise (uncorrelated, zero mean, and stable variance), the NAIVE() model appears to be a good fit for forecasting Australian exports.
- SNAIVE() in Australian export is Error in na.contiguous.default(as.ts(x)) : all times contain an NA.
Bricks series from aus_production
- The NAIVE() model show residuals are centered around zero, and ACF mostly stays within limits.
- SNAIVE() model is not the best fit for the Bricks series.
  - It leaves seasonality in the residuals.
  - Residuals are not normally distributed (histogram is skewed).

exports_data <- global_economy |> 
  filter(Country == "Australia") |> 
  select(Exports)

fit_exports <- exports_data |> model(NAIVE(Exports))
fit_exports |> gg_tsresiduals()

fit_exports |> forecast() |> autoplot(exports_data)

#fit_exports_s <- exports_data |> model(SNAIVE(Exports))
#fit_exports_s |> gg_tsresiduals()
#fit_exports_s |> forecast() |> autoplot(exports_data)

bricks_data <- aus_production |> select(Bricks)

fit_bricks <- bricks_data |> model(NAIVE(Bricks))
fit_bricks |> gg_tsresiduals()

fit_bricks |> forecast() |> autoplot(bricks_data)

fit_bricks_s <- bricks_data |> model(SNAIVE(Bricks))
fit_bricks_s |> gg_tsresiduals()

fit_bricks_s |> forecast() |> autoplot(bricks_data)

5.7

For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

#2.10 exe 7
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries, Turnover) +
  labs(title = "Retail Time Series", y = "Turnover ($)")

gg_season(myseries, Turnover) +
  labs(title = "Seasonal Plot")

gg_subseries(myseries, Turnover) +
  labs(title = "Subseries Plot")

gg_lag(myseries, Turnover, geom = "point") +
  labs(title = "Lag Plot")

ACF(myseries, Turnover) |> autoplot() +
  labs(title = "ACF of the Series")

myseries_train <- myseries |>
  filter(year(Month) < 2011)

autoplot(myseries_train, Turnover) +
  labs(title = "Retail Time Series 2011", y = "Turnover ($)")

gg_season(myseries_train, Turnover) +
  labs(title = "Seasonal Plot 2011")

gg_subseries(myseries_train, Turnover) +
  labs(title = "Subseries Plot 2011")

gg_lag(myseries_train, Turnover, geom = "point") +
  labs(title = "Lag Plot 2011")

ACF(myseries_train, Turnover) |> autoplot() +
  labs(title = "ACF of the Series 2011")

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

How sensitive are the accuracy measures to the amount of training data used?

I will say that more training data will improve generalization and lower overfitting. When comparing forecasts against actual values, the training set appears to have lower errors.

ME (Mean Error) - Ideally, ME should be close to zero. - The Training set is 0.439, Testing set is 0.836 - The Training set is better.

RMSE (Root Mean Squared Error) - Measures the standard deviation of forecast errors, Lower RMSE indicates better model performance. - The Training set is 1.21, Testing set is 1.55 - The Training set is better.

MAE (Mean Absolute Error) - The lower MAE, the more accurate the model. - The Training set is 0.915, Testing set is 1.24 - The Training set is better.

MPE (Mean Percentage Error) - Positive MPE indicates the model underestimates, Negative MPE means the model overestimates. - The Training set is 5.23 which is better than the Testing set is 5.94 - Both set is underestimates.

MAPE (Mean Abosolute Percentage Error) - Measures the average percentage error in relation to actual values - The Training set is 12.4, Testing set is 9.06 - The Testing set is better.

MASE (Mean Absolute Scaled Error) - If MASE < 1, the model is better than a naive forecast, MASE > 1, the model performs worse than the naive approach. - The Training set is 1, Testing set is 1.36 - The Training set is better.

RMSSE (Root Mean Squared Scaled Error) - If RMSSE < 1, the model is better than a naïve forecast, RMSSE > 1, the model is worse. - The Training set is 1, Testing set is 1.28 - The Training set is better.

ACF1 (Autocorrelation of Residuals at Lag 1) - If ACF1 is high, residuals are not random → model may be missing patterns, ACF1 is close to zero, residuals are uncorrelated, which is ideal. - The Training set is 0.768, Testing set is 0.601 - The Testing set is better.

DATA624 HW3

Benson T

2025-02-20

5.1

5.2

5.3

5.4

5.7