## 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

Since the population data tend to increase over time with no seasonality, the drift method, RW(Population ~ drift()), is the most appropriate choice.

library(fpp3)

# Filter for Australia
aus_economy <- global_economy %>%
  filter(Country == "Australia")


forecast_model_population <- aus_economy %>%
  model(RW(Population ~ drift()))


forecast_population <- forecast_model_population %>%
  forecast(h = "10 years")


forecast_population %>% autoplot(aus_economy, level = NULL) +
  labs(title = "Forecast for Australian Population",
       y = "Population")

Bricks (aus_production)

Since brick production follows a seasonal pattern tied to the construction industry, we should use SNAIVE.

# Remove observations with missing values before fitting
tidy_aus_production <- aus_production %>%
  drop_na()

aus_bricks_model <- tidy_aus_production %>%
  model(SNAIVE(Bricks))

aus_bricks_model %>%
  forecast() %>%
  autoplot(tidy_aus_production) +
  labs(title = "Australian Brick Production Forecast")

NSW Lambs (aus_livestock)

There is a consistent seasonal pattern in the livestock counts, which means the best model is SNAIVE.

aus_lambs <- aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs")

forcast_model_lambs <- aus_lambs %>%
  model(SNAIVE(Count))

forcast_lambs <- forcast_model_lambs %>%
  forecast(h = "2 years")


forcast_lambs %>% autoplot(aus_lambs, level = NULL) +
  labs(title = "Forecast for NSW Lambs",
       y = "Lambs")

Household wealth (hh_budget)

Since household wealth typically exhibits a long-term trend, we would use the RW with drift model.

household_wealth <- hh_budget 


forecast_model_wealth <- household_wealth %>%
  model(RW(Wealth ~ drift()))


fc_wealth <- forecast_model_wealth %>%
  forecast(h = "2 years")  # Adjust the horizon as needed


fc_wealth %>%
  autoplot(household_wealth, level = NULL) +
  labs(title = "Forecast for Household Wealth",
       y = "Wealth")

Australian takeaway food turnover (aus_retail)

Takeaway food turnover is strongly seasonal, so SNAIVE is the appropriate method.

takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services")

forecast_model_takeaway <- takeaway %>%
  model(SNAIVE(Turnover))

autoplot(takeaway, Turnover) +
  labs(title = "Turnover in Takeaway Food Services")
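The fitted SNAIVE model can then be used to produce the required forecasts; a short sketch (the two-year horizon is an illustrative choice):

# Forecast the next two years with the fitted SNAIVE model and plot against history
forecast_model_takeaway %>%
  forecast(h = "2 years") %>%
  autoplot(takeaway, level = NULL) +
  labs(title = "Forecast for Takeaway Food Turnover",
       y = "Turnover ($ million)")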

## 5.2 Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

facebook <- gafa_stock %>%
  filter(Symbol == "FB")
facebook %>%
  autoplot(Close) +
  labs(title = "Facebook Stock Price Over Time",
       x = "Date",
       y = "Closing price")

Produce forecasts using the drift method and plot them.

facebook_stock <- facebook %>%
  mutate(Close = as.numeric(Close), day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE) %>%
  select(Date, Close)

facebook_stock %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 252) %>%
  autoplot(facebook_stock) +
  labs(y = "Stock Price (Closing)", title = "Drift Method Forecasts for Facebook Stock Price")

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

# Prepare the re-indexed Facebook stock data (same as above)
facebook_stock <- facebook |>
  mutate(Close = as.numeric(Close), day = row_number()) |>
  update_tsibble(index = day, regular = TRUE) |>
  select(Date, Close)

# Apply different benchmark forecasting methods
models <- facebook_stock %>%
  model(
    RW_drift = RW(Close ~ drift()),  # Random Walk with Drift
    Naive = NAIVE(Close),            # Naive method
    Mean = MEAN(Close)               # Mean method
    # SNAIVE(Close) could be used if there was seasonality, but stock prices don't exhibit strong seasonality.
  )

# Generate forecasts for the next 252 days
forecasts <- models %>%
  forecast(h = 252)

# Plot the forecasts for comparison
autoplot(forecasts, facebook_stock) +
  labs(y = "Stock Price (Closing)", title = "Benchmark Forecasts for Facebook Stock Price") +
  facet_wrap(~ .model, scales = "free_y") 

The three models above give noticeably different forecasts, which shows that the effectiveness of each benchmark depends on the characteristics of the data, in particular its trend and seasonality.

First, the mean method: it forecasts every future value as the average of all historical observations, so it ignores the recent level and the trend, and its wide intervals reflect substantial uncertainty. Second, the naïve method: it forecasts every future value as the last observed price; it respects the current level but not the trend, and again the intervals are wide. Finally, the random walk with drift extrapolates the average historical change, so it is the only one of the three that captures the upward trend the stock has exhibited over time. Even though it is probably the best of these three benchmarks, it still cannot predict the future path with much certainty.
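As a rough cross-check, the point forecasts of the three methods can be computed directly from the re-indexed facebook_stock data above (a sketch; h = 252 matches the horizon used earlier): the mean method uses the historical average, the naïve method uses the last observed close, and the drift method adds h times the average historical change to the last close.

# Hand-computed point forecasts of the three benchmarks (illustrative sketch)
last_close  <- last(facebook_stock$Close)       # naive forecast for every horizon
mean_close  <- mean(facebook_stock$Close)       # mean forecast for every horizon
drift_slope <- (last_close - first(facebook_stock$Close)) /
  (nrow(facebook_stock) - 1)                    # average daily change

h <- 252  # one trading year ahead
c(Mean = mean_close,
  Naive = last_close,
  RW_drift = last_close + h * drift_slope)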

## 5.3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)


What do you conclude? The residuals look random, with no clear patterns, and the autocorrelation plot shows no significant correlations. This means the Seasonal Naive (SNAIVE) model does a good job of capturing the seasonal pattern in the beer production data.
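A Ljung-Box test on the innovation residuals gives a more formal check of the white-noise assumption; a sketch (lag = 8, i.e. 2m for quarterly data, is the usual rule of thumb):

# Formal white-noise check: Ljung-Box test on the SNAIVE innovation residuals
augment(fit) |>
  features(.innov, ljung_box, lag = 8)

A small p-value here would indicate that the residuals are autocorrelated rather than white noise.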

## 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

The Exports series is annual, so there is no seasonality to exploit; NAIVE() is the more appropriate benchmark here.

aus_exports_tidy <- global_economy %>%
    filter(Country == "Australia") %>%
    select(c("Country", "Code", "Year", "Exports"))
aus_exports_fit <- aus_exports_tidy %>%
    model(NAIVE(Exports))
# Look at the residuals
aus_exports_fit %>%
  gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
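The exercise also asks for forecast plots and for the Bricks series. A sketch that completes both parts, reusing the objects defined above (the 10-year export horizon is an illustrative choice):

# Exports: plot naive forecasts
aus_exports_fit %>%
  forecast(h = 10) %>%
  autoplot(aus_exports_tidy) +
  labs(title = "Naive Forecast for Australian Exports")

# Bricks: quarterly and strongly seasonal, so SNAIVE() is the appropriate method.
# The model fitted in 5.1 (aus_bricks_model) can be reused for the residual check.
aus_bricks_model %>% gg_tsresiduals()

aus_bricks_model %>%
  forecast(h = "2 years") %>%
  autoplot(tidy_aus_production) +
  labs(title = "Seasonal Naive Forecast for Australian Brick Production")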

## 5.7 For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

set.seed(568756)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |> filter(year(Month) < 2011)

# Plot the full series and training data
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

fit <- myseries_train |> model(SNAIVE(Turnover))
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

The residuals do not appear to be purely random. This suggests the model has not fully captured all of the patterns in the data, and there is some remaining autocorrelation. Apart from a few outliers in both tails, the residuals look roughly normally distributed.
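A Ljung-Box test can confirm this impression; a sketch (lag = 24, i.e. 2m for monthly data, is the usual rule of thumb):

# Ljung-Box test on the SNAIVE residuals for the retail training data
augment(fit) |>
  features(.innov, ljung_box, lag = 24)

A very small p-value would confirm that the residuals are autocorrelated rather than white noise.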

Check that your data have been split appropriately by producing the following plot.

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)

fit |> accuracy()
## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queensl… Househo… SNAIV… Trai…  21.5  40.1  29.3  6.42  8.10     1     1 0.813
fc |> accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Quee… Househo… Test   105.  135.  108.  12.1  12.7  3.69  3.36 0.931

How sensitive are the accuracy measures to the amount of training data used?

Accuracy measures can be fairly sensitive to the amount of training data used. More training data generally helps a model capture long-term trends and seasonal patterns, and it also stabilises the scaled measures (MASE and RMSSE), which are benchmarked against in-sample naïve errors. For a seasonal naïve model, however, the point forecasts depend only on the final year of the training set, so adding older data mainly changes the residual variance estimate and the scaled accuracy measures rather than the forecasts themselves.
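One way to check this empirically is to refit the same model on a shorter training window and compare the test-set accuracy; a sketch (the 2005 cut-off is an arbitrary illustrative choice):

# Refit SNAIVE on a shorter training window and compare test-set accuracy
myseries_train_short <- myseries |>
  filter(year(Month) >= 2005, year(Month) < 2011)

fit_short <- myseries_train_short |> model(SNAIVE(Turnover))
fc_short <- fit_short |> forecast(new_data = anti_join(myseries, myseries_train_short))

bind_rows(
  fc |> accuracy(myseries) |> mutate(Training = "Full history"),
  fc_short |> accuracy(myseries) |> mutate(Training = "2005-2010 only")
)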