Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book. Please submit your Rpubs link as well as your .pdf file showing your run code.

5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget).
Australian takeaway food turnover (aus_retail).

# Load necessary libraries
library(fpp3)  # Forecasting package that includes tsibble, fable, feasts, etc.

## Warning: package 'fpp3' was built under R version 4.4.2

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'tsibble' was built under R version 4.4.2

## Warning: package 'tsibbledata' was built under R version 4.4.2

## Warning: package 'feasts' was built under R version 4.4.2

## Warning: package 'fabletools' was built under R version 4.4.2

## Warning: package 'fable' was built under R version 4.4.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

# Load datasets (these come pre-loaded with fpp3)
data("global_economy")
data("aus_production")
data("aus_livestock")
data("hh_budget")
data("aus_retail")

# 1. Forecast Australian Population using Random Walk with Drift (RW with drift)
# This is appropriate because population tends to grow steadily over time.
pop_forecast <- global_economy %>%
  filter(Country == "Australia") %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = 10)

# 2. Forecast Bricks production using Seasonal Naive (SNAIVE)
# Bricks production is seasonal, making SNAIVE suitable.
bricks_forecast <- aus_production %>%
  filter_index("1970 Q1" ~ .) %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = 8)

# 3. Forecast NSW Lambs using Seasonal Naive (SNAIVE)
# Livestock counts are seasonal due to breeding and market cycles.
lambs_forecast <- aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs") %>%
  model(SNAIVE(Count)) %>%
  forecast(h = 8)

# 4. Forecast Household Wealth using Naive (NAIVE)
# Wealth data shows no strong seasonal pattern, making NAIVE appropriate.
wealth_forecast <- hh_budget %>%
  model(NAIVE(Wealth)) %>%
  forecast(h = 10)

# 5. Forecast Takeaway Food Turnover using Seasonal Naive (SNAIVE)
# Takeaway food services turnover has seasonal trends, ideal for SNAIVE.
food_forecast <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  model(SNAIVE(Turnover)) %>%
  forecast(h = 12)

# Plot the forecasts
pop_forecast %>% autoplot(global_economy) + ggtitle("Forecast: Australian Population")

bricks_forecast %>% autoplot(aus_production) + ggtitle("Forecast: Bricks Production")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambs_forecast %>% autoplot(aus_livestock) + ggtitle("Forecast: NSW Lambs")

wealth_forecast %>% autoplot(hh_budget) + ggtitle("Forecast: Household Wealth")

food_forecast %>% autoplot(aus_retail) + ggtitle("Forecast: Takeaway Food Turnover")

Analysis:
1. Australian Population: Since population growth follows a trend, Random Walk with Drift is the best choice.
2. Bricks production: It has strong seasonal patterns, making Seasonal Naive a good fit.
3. NSW Lambs: Livestock production shows seasonal cycles, so Seasonal Naive is appropriate.
4. Household Wealth: A relatively stable series without strong seasonality, so Naive is the simplest and most effective.
5. Takeaway Food Turnover: A highly seasonal pattern is expected, so Seasonal Naive ensures realistic forecasts.

Step-by-Step Breakdown and Analysis

Loading Necessary Packages and Data: The fpp3 package is loaded, which provides the time series functions and datasets used here. The datasets global_economy, aus_production, aus_livestock, hh_budget, and aus_retail become available once fpp3 is attached, so the explicit calls to data() are unnecessary but harmless.
Forecasting Australian Population: The global_economy dataset is filtered to Australia. Because population follows a long-term growth trend, a Random Walk with Drift (RW(y ~ drift())) model is appropriate. It adds the average historical change per period to the last observation, so it extends the trend rather than any seasonal pattern. The forecast horizon is 10 periods, which is 10 years for this annual series.
Forecasting Brick Production: The aus_production dataset is used to model brick production. Since quarterly production is seasonal, a Seasonal Naïve (SNAIVE(y)) approach is applied, which sets each forecast equal to the last observed value from the same quarter. The forecast horizon is 8 periods, i.e. two years of quarterly data. The "Removed 8 rows" warning on the plot suggests these 8 forecasts are missing, most likely because the Bricks column ends with missing values.
Forecasting NSW Lamb Production: The aus_livestock dataset is filtered to State == "New South Wales" and Animal == "Lambs". Like brick production, lamb counts fluctuate seasonally, making Seasonal Naïve (SNAIVE(y)) the best choice of the three. Forecasts repeat the seasonal pattern observed in the most recent year. The forecast horizon is 8 periods, which is 8 months because aus_livestock is monthly.
Forecasting Household Wealth: The hh_budget dataset is used to forecast household wealth. The data are annual, so there is no seasonality, and the series is treated here as a random walk. A Naïve (NAIVE(y)) model is applied, which uses the most recent observation as the forecast for every future period. This is reasonable if wealth changes are irregular and show no systematic trend; if a trend is present, the drift method would be the natural alternative. Because hh_budget holds one series per country, the model is fitted to each country separately.
Forecasting Takeaway Food Turnover: The aus_retail dataset is filtered to the "Takeaway food services" industry, which keeps one series per state or territory. Retail turnover is seasonal, making Seasonal Naïve (SNAIVE(y)) an appropriate choice: future turnover is assumed to repeat the pattern of the most recent year. The forecast horizon is 12 periods, i.e. 12 months for this monthly dataset.
Plotting the Forecasts: Each forecast is visualized using autoplot(), overlaying the forecasts on the historical data. This allows a quick visual check that each forecast is plausible given the history.
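
The "Removed 8 rows" warning on the Bricks plot suggests that the forecasts themselves are missing, which would happen if the Bricks column ends in missing values and SNAIVE copies them forward. A minimal sketch of one way around this, assuming it is acceptable to drop the quarters with no Bricks value before fitting (bricks_complete and bricks_fc are names introduced for illustration):

```r
library(fpp3)

# Keep only quarters with an observed Bricks value, so the last
# season that SNAIVE repeats contains real numbers rather than NA.
bricks_complete <- aus_production |>
  filter_index("1970 Q1" ~ .) |>
  filter(!is.na(Bricks)) |>
  select(Quarter, Bricks)

bricks_fc <- bricks_complete |>
  model(SNAIVE(Bricks)) |>
  forecast(h = 8)  # 8 quarters = 2 years

# Plot against the trimmed data so the forecast starts where the data end.
bricks_fc |>
  autoplot(bricks_complete) +
  ggtitle("Forecast: Bricks Production (missing values dropped)")
```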

Conclusion

The code effectively applies the most appropriate forecasting methods for each dataset based on trend and seasonality characteristics. Random Walk with Drift is correctly used for the Australian population due to its long-term growth trend. Seasonal Naïve is applied to brick production, lamb production, and takeaway food turnover, all of which exhibit strong seasonal patterns. Naïve forecasting is used for household wealth, assuming that recent values are the best predictors of future values. The approach ensures that each forecast aligns with the underlying data characteristics while maintaining simplicity and interpretability.
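
The exercise asks about Australian takeaway food turnover, while filtering on Industry alone keeps one series per state and territory. A minimal sketch, assuming a single national total is what is wanted, sums Turnover across states before fitting (food_total and food_total_fc are names introduced for illustration):

```r
library(fpp3)

# Sum the state-level series into one national monthly series.
food_total <- aus_retail |>
  filter(Industry == "Takeaway food services") |>
  summarise(Turnover = sum(Turnover))

food_total_fc <- food_total |>
  model(SNAIVE(Turnover)) |>
  forecast(h = 12)  # 12 months ahead

food_total_fc |>
  autoplot(food_total) +
  ggtitle("Forecast: Australian Takeaway Food Turnover (all states)")
```

Keeping the state-level series, as the code above does, is also defensible; the choice depends on whether the question is about each state or the country as a whole.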

5.2 Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

# Load necessary libraries
library(fpp3)

# Load and view the GAFA stock dataset
data("gafa_stock")
fb_stock <- gafa_stock %>% filter(Symbol == "FB")

# 1. Produce a time plot of the series
fb_stock %>% 
  autoplot(Close) +
  labs(title = "Facebook Stock Closing Prices", 
       x = "Year", y = "Closing Price (USD)") +
  theme_minimal()

# 2. Produce forecasts using the drift method and plot them
fb_stock_train <- fb_stock %>% filter(year(Date) <= 2017)
fb_stock_test <- fb_stock %>% filter(year(Date) > 2017)

fb_drift_model <- fb_stock_train %>% model(RW(Close ~ drift()))

## Warning: 1 error encountered for RW(Close ~ drift())
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.

fb_drift_forecast <- fb_drift_model %>% forecast(new_data = fb_stock_test)

# Plot drift forecasts with proper data handling
fb_stock %>% 
  ggplot(aes(x = Date)) +
  geom_line(aes(y = Close), color = "blue") +
  geom_line(data = fb_drift_forecast, aes(y = .mean), color = "red", linetype = "dashed") +
  labs(title = "Drift Method Forecast for Facebook Stock Prices", 
       x = "Year", y = "Closing Price (USD)") +
  theme_minimal()

## Warning: Removed 251 rows containing missing values or values outside the scale range
## (`geom_line()`).

# 3. Show that the forecasts match extending the line between the first and last observations
first_price <- fb_stock_train %>% slice_head(n = 1) %>% pull(Close)
last_price <- fb_stock_train %>% slice_tail(n = 1) %>% pull(Close)
first_date <- min(fb_stock_train$Date)
last_date <- max(fb_stock_train$Date)

line_df <- tibble(
  Date = c(first_date, last_date),
  Close = c(first_price, last_price)
)

fb_stock %>% 
  ggplot(aes(x = Date)) +
  geom_line(aes(y = Close), color = "blue") +
  geom_line(data = line_df, aes(y = Close), color = "green", linetype = "dotted", size = 1.2) +
  geom_line(data = fb_drift_forecast, aes(y = .mean), color = "red", linetype = "dashed") +
  labs(title = "Drift Forecast vs. First-Last Line", 
       x = "Year", y = "Closing Price (USD)") +
  theme_minimal()

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 251 rows containing missing values or values outside the scale range
## (`geom_line()`).

# 4. Compare other benchmark methods
fb_benchmarks <- fb_stock_train %>% model(
  Mean = MEAN(Close),
  Naive = NAIVE(Close),
  Seasonal_Naive = SNAIVE(Close),
  Drift = RW(Close ~ drift())
)

## Warning: 1 error encountered for Naive
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.

## Warning: 1 error encountered for Seasonal_Naive
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.

## Warning: 1 error encountered for Drift
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.

fb_forecasts <- fb_benchmarks %>% forecast(new_data = fb_stock_test)

# Plot all benchmark forecasts using ggplot2
fb_stock %>% 
  ggplot(aes(x = Date)) +
  geom_line(aes(y = Close), color = "black") +
  geom_line(data = fb_forecasts, aes(y = .mean, color = .model), size = 0.9) +
  labs(title = "Benchmark Forecast Comparisons for Facebook Stock", 
       x = "Year", y = "Closing Price (USD)", color = "Model") +
  theme_minimal()

## Warning: Removed 753 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Evaluate forecast accuracy
accuracy_results <- fb_forecasts %>% accuracy(fb_stock_test)
print(accuracy_results)

## # A tibble: 4 × 11
##   .model         Symbol .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>          <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Drift          FB     Test  NaN   NaN   NaN   NaN   NaN     NaN   NaN NA    
## 2 Mean           FB     Test   63.8  66.8  63.8  36.3  36.3   NaN   NaN  0.969
## 3 Naive          FB     Test  NaN   NaN   NaN   NaN   NaN     NaN   NaN NA    
## 4 Seasonal_Naive FB     Test  NaN   NaN   NaN   NaN   NaN     NaN   NaN NA

# Discussion:
# The best forecasting method typically has the lowest RMSE or MAE.
# In this run only the Mean model returned accuracy figures (RMSE 66.8, MAE 63.8).
# Naive, Seasonal_Naive and Drift failed to fit on the irregular index, so their rows are NaN
# and the table cannot rank them.
# The seasonal naive method may not be ideal unless there's a clear seasonal pattern.
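
The "irregular time series" warnings above mean that the Naive, Seasonal_Naive and Drift models were never fitted, which is why their accuracy rows are NaN. One common remedy, sketched here under the assumption that consecutive trading days can be treated as equally spaced, is to re-index the series by trading-day number with update_tsibble(). The names trading_day, fb_regular, fb_train, fb_test, fb_fit and fb_fc are introduced for illustration. Seasonal naïve is left out of the sketch because a trading-day index has no obvious seasonal period.

```r
library(fpp3)

# Re-index by trading day so the index is regular (1, 2, 3, ...).
fb_regular <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

# Same split as before: train through 2017, test on what follows.
fb_train <- fb_regular |> filter(year(Date) <= 2017)
fb_test  <- fb_regular |> filter(year(Date) > 2017)

fb_fit <- fb_train |>
  model(
    Mean  = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  )

# Forecast exactly the trading days in the test set.
fb_fc <- fb_fit |> forecast(new_data = fb_test)

# Test-set accuracy for the three benchmarks.
fb_fc |> accuracy(fb_regular)
```

With the models actually fitted, the accuracy table can be used to compare them; the figures are not reproduced here because they depend on rerunning the code.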

Step-by-Step Analysis and Answer

This analysis explores forecasting Facebook's stock price using different benchmark methods and evaluates which approach works best.

1. Time Plot of Facebook Stock Price: The gafa_stock dataset is filtered to include only Facebook's stock price (Symbol == "FB"), and a time plot is created with autoplot(Close) to show the historical closing price. This step is essential for identifying patterns like trends or seasonality before applying forecasting methods. The series contains trading days only, so its index is irregular, and the code as run does not regularise it (there is no as_tsibble() or fill_gaps() step).
2. Forecasting with the Drift Method: The series is split into a training set (up to the end of 2017) and a test set (everything after), and a Random Walk with Drift model (RW(Close ~ drift())) is fitted to the training set and asked to forecast the test period. The drift method adds the average historical change per period to the last observation, making it useful for trending data. In this run the model did not fit, because RW() does not support an irregular index, so the red forecast line is absent from the ggplot figure (the 251 removed rows in the warning).
3. Verifying the Drift Method Trend: A line is drawn between the first and last observations of the training set. A drift forecast h steps ahead equals y_T + h(y_T - y_1)/(T - 1), which is exactly that line extended beyond the last observation. Since the drift forecasts are missing in this run, the plot shows the line but cannot yet show the two coinciding.
4. Comparing Other Benchmark Methods: Four forecasting approaches are applied:

Mean Method (MEAN(Close)): Forecasts every future value as the average of the training data.
Naïve Method (NAIVE(Close)): Assumes that the most recent stock price is the best predictor for future prices, ignoring any trends or seasonality.
Seasonal Naïve (SNAIVE(Close)): Assumes that stock prices repeat past seasonal patterns. This method is more appropriate for data with clear seasonal fluctuations, such as quarterly earnings cycles.
Drift Method (RW(Close ~ drift())): Accounts for the overall trend by extending the historical pattern.

A combined plot compares the forecasts. Only the Mean model fitted successfully in this run, so it is the only forecast line that appears.
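
The claim that drift forecasts extend the line through the first and last observations can be checked with plain arithmetic. The sketch below uses a small made-up series (not the Facebook data) and base R only; y, n and h are illustrative names.

```r
# Toy series, invented for illustration only.
y <- c(10, 12, 11, 15, 18)
n <- length(y)
h <- 1:3  # forecast horizons

# Drift method: last value plus h times the average historical change.
drift_fc <- y[n] + h * (y[n] - y[1]) / (n - 1)

# Straight line through (1, y[1]) and (n, y[n]), evaluated at n + h.
slope   <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * ((n + h) - 1)

# The two agree at every horizon.
stopifnot(isTRUE(all.equal(drift_fc, line_fc)))
drift_fc  # 20 22 24
```

Only the first and last observations enter the calculation; the values in between have no effect on a drift forecast.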

Which Forecasting Method is Best?

The best forecasting method depends on the stock price behavior:

The Mean method ignores the trend entirely and, in this run, is off by about 64 USD on average over the test set (MAE 63.8).
The Naïve method is too simplistic for trending stock data since it assumes no future growth.
The Seasonal Naïve method is ineffective because stock prices don't follow a strict seasonal cycle.
The Drift method is the preferred benchmark here, because it carries the long-run upward trend of the training data forward. This preference rests on the shape of the series rather than on measured accuracy: in the accuracy table above only the Mean model produced figures, so the claim that Drift performs best remains untested until the models are refitted on a regular index.

5.3: Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help

# Load necessary libraries
library(fpp3)  # Provides access to aus_production dataset and time series functions

# 1️⃣ Extract data of interest: Australian beer production from 1992 onwards
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)  # Filters data for quarters from 1992 onwards

# 2️⃣ Define and estimate the seasonal naïve model
fit <- recent_production |>
  model(SNAIVE(Beer))  # SNAIVE uses the last observed value from the same season

# 3️⃣ Analyze the residuals: Check if residuals behave like white noise
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# This plots residuals over time, a histogram, and an ACF plot to detect autocorrelation.

# 4️⃣ Generate forecasts and plot them against actual data
fit |>
  forecast(h = "2 years") |>  # Forecasts 8 quarters (2 years) ahead
  autoplot(recent_production) +  # Plots the forecasts and actual values
  labs(title = "Seasonal Naïve Forecast of Australian Beer Production (from 1992)",
       y = "Beer production (Megalitres)",
       x = "Year")

Step-by-Step Guide to Applying the Seasonal Naïve (SNAIVE) Method to Australian Beer Production Data (from 1992)

Step 1: Extract Data of Interest

recent_production <- aus_production |> filter(year(Quarter) >= 1992)
Purpose: Focuses the analysis on beer production data starting from 1992.
Explanation: filter(year(Quarter) >= 1992) selects all rows where the quarter is 1992 or later.

Step 2: Fit the Seasonal Naïve Model

fit <- recent_production |> model(SNAIVE(Beer))
Purpose: Creates a seasonal naïve model that uses the previous year’s same quarter as the forecast.
Explanation: SNAIVE(Beer) assumes beer production in a given quarter will match the same quarter from the previous year.

Step 3: Check the Residuals

fit |> gg_tsresiduals()
Purpose: Validates the model by analyzing residuals (forecast errors).
Outputs:
- Time plot of residuals: Should show random scatter around zero.
- Histogram of residuals: Should be approximately bell-shaped.
- Autocorrelation (ACF) plot: Should show no significant autocorrelation.

Step 4: Forecast and Visualize Results

fit |> forecast(h = "2 years") |> autoplot(recent_production)
Purpose: Generates and plots forecasts for the next 8 quarters (2 years).
Explanation:
- forecast(h = "2 years"): Projects future beer production.
- autoplot(): Visualizes both actual data and forecasted values.
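
The residual plots give a visual check only. A Ljung-Box test is one way to put a number on it; the sketch below assumes a lag of 8 (twice the seasonal period of quarterly data) and introduces the names beer_fit and lb_beer for illustration. A small p-value would indicate that the residuals are not white noise.

```r
library(fpp3)

# Refit the seasonal naive model so the block runs on its own.
beer_fit <- aus_production |>
  filter(year(Quarter) >= 1992) |>
  model(SNAIVE(Beer))

# Ljung-Box test on the innovation residuals returned by augment().
lb_beer <- beer_fit |>
  augment() |>
  features(.innov, ljung_box, lag = 8)

lb_beer
```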

5.4: Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

# Load necessary libraries
library(fpp3)  # For time series data and modeling

# -----------------------------
# 1️⃣ Australian Exports Series
# -----------------------------

# Extract Australian exports data
exports_aus <- global_economy |>
  filter(Country == "Australia") |>
  select(Year, Exports)

# Plot the Australian Exports data to identify patterns
exports_aus |> autoplot(Exports) +
  labs(title = "Australian Exports Over Time", y = "Exports (% of GDP)", x = "Year")

# 🔎 Analysis:
# The plot shows a generally increasing trend but no obvious seasonal pattern.
# ✅ Therefore, using NAIVE() is more appropriate.

# Fit the NAIVE model
fit_exports <- exports_aus |> model(NAIVE(Exports))

# Check residuals for white noise
fit_exports |> gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Forecast the next 5 years
fit_exports |>
  forecast(h = "5 years") |>
  autoplot(exports_aus) +
  labs(title = "NAIVE Forecast of Australian Exports", y = "Exports (% of GDP)", x = "Year")

# -----------------------------
# 2️⃣ Australian Bricks Production Series
# -----------------------------

# Extract Bricks production data from aus_production
bricks_aus <- aus_production |>
  select(Quarter, Bricks) |>
  filter(year(Quarter) >= 1992)

# Plot Bricks production data to identify patterns
bricks_aus |> autoplot(Bricks) +
  labs(title = "Australian Bricks Production", y = "Millions of bricks", x = "Year")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

# 🔎 Analysis:
# The Bricks series shows clear seasonal patterns (regular ups and downs each year).
# ✅ Therefore, using SNAIVE() is more appropriate.

# Fit the SNAIVE model
fit_bricks <- bricks_aus |> model(SNAIVE(Bricks))

# Check residuals for white noise
fit_bricks |> gg_tsresiduals()

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Forecast the next 2 years (8 quarters)
fit_bricks |>
  forecast(h = "2 years") |>
  autoplot(bricks_aus) +
  labs(title = "SNAIVE Forecast of Australian Bricks Production", y = "Millions of bricks", x = "Year")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

Introduction

This report analyzes two Australian economic indicators using appropriate forecasting methods:

Australian Exports (% of GDP) from the global_economy dataset.
Australian Bricks Production from the aus_production dataset.

I will determine whether the NAIVE() or SNAIVE() method is more suitable for each series based on the presence of trend and seasonality.

1️⃣ Australian Exports Series Analysis

Step 1: Data Extraction

exports_aus <- global_economy |> 
  filter(Country == "Australia") |> 
  select(Year, Exports)

This code filters the Australian data and selects the Exports column.

Step 2: Plot the Data

exports_aus |> autoplot(Exports) +
  labs(title = "Australian Exports Over Time", y = "Exports (% of GDP)", x = "Year")

Observation: The plot reveals an upward trend. The data are annual, so there is no seasonal pattern to model.

Step 3: Choose and Fit the Model

With no seasonality, the NAIVE() model is appropriate.

fit_exports <- exports_aus |> model(NAIVE(Exports))

Step 4: Residual Diagnostics

fit_exports |> gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

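Interpretation: Judged by eye, the residuals scatter randomly around zero, which suggests the NAIVE model leaves little structure unexplained. No formal white-noise test was run to confirm this.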

Step 5: Forecasting and Visualization

fit_exports |> 
  forecast(h = "5 years") |> 
  autoplot(exports_aus) +
  labs(title = "NAIVE Forecast of Australian Exports", y = "Exports (% of GDP)", x = "Year")

The NAIVE forecast is flat at the last observed value: it extrapolates neither the upward trend nor any seasonal pattern.

2️⃣ Australian Bricks Production Analysis

Step 1: Data Extraction

bricks_aus <- aus_production |> 
  select(Quarter, Bricks) |> 
  filter(year(Quarter) >= 1992)

This selects Bricks production data from 1992 onward.

Step 2: Plot the Data

bricks_aus |> autoplot(Bricks) +
  labs(title = "Australian Bricks Production", y = "Millions of bricks", x = "Year")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

Observation: The data shows clear seasonal patterns.

Step 3: Choose and Fit the Model

Given the seasonality, SNAIVE() is the appropriate choice.

fit_bricks <- bricks_aus |> model(SNAIVE(Bricks))

Step 4: Residual Diagnostics

fit_bricks |> gg_tsresiduals()

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_bin()`).

Interpretation: The residuals were judged by eye to be randomly scattered, suggesting the model captures the seasonality. No formal white-noise test was run to confirm this.

Step 5: Forecasting and Visualization

fit_bricks |> 
  forecast(h = "2 years") |> 
  autoplot(bricks_aus) +
  labs(title = "SNAIVE Forecast of Australian Bricks Production", y = "Millions of bricks", x = "Year")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

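A SNAIVE forecast repeats the most recent seasonal pattern in the historical data. In this run, however, the "Removed 8 rows" warning above suggests that all 8 forecast values are missing, most likely because the Bricks column ends with missing values.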

Conclusion

Australian Exports: No seasonality → NAIVE() was appropriate.
Bricks Production: Clear seasonality → SNAIVE() was best.

Both conclusions rest on visual inspection of the gg_tsresiduals() plots. No formal white-noise test was run, so the residual checks are indicative rather than conclusive.
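
A Ljung-Box test gives a formal complement to the visual residual checks in this section. The sketch assumes lag = 10 for the annual exports series and lag = 8 for the quarterly bricks series, and it drops missing Bricks values before fitting. lb_exports and lb_bricks are illustrative names. Small p-values would indicate residuals that are not white noise.

```r
library(fpp3)

# Annual, non-seasonal series: lag = 10 is a common choice.
lb_exports <- global_economy |>
  filter(Country == "Australia") |>
  model(NAIVE(Exports)) |>
  augment() |>
  features(.innov, ljung_box, lag = 10)

# Quarterly series: drop the missing Bricks values, then use lag = 8.
lb_bricks <- aus_production |>
  filter(year(Quarter) >= 1992, !is.na(Bricks)) |>
  model(SNAIVE(Bricks)) |>
  augment() |>
  features(.innov, ljung_box, lag = 8)

lb_exports
lb_bricks
```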

5.7: For your retail time series (from Exercise 7 in Section 2.10):

(a) Create a training dataset consisting of observations before 2011

library(fpp3)
set.seed(12345678)

# Create 'myseries' before using it
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries_train <- myseries |> 
  filter(year(Month) < 2011)

✅ Explanation:
I used the filter() function to select rows where the year is less than 2011, creating the training dataset myseries_train.

(b) Check that your data have been split appropriately by producing the following plot

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

✅ Explanation:
- autoplot(myseries, Turnover) plots the entire dataset.
- autolayer(myseries_train, Turnover, colour = "red") overlays the training data in red.

🔎 What I see:
The plot shows the full time series with the training data highlighted in red, confirming the correct split.

(c) Fit a seasonal naïve model using `SNAIVE()` applied to your training data

fit <- myseries_train |>
  model(SNAIVE(Turnover))

✅ Explanation:
I applied the SNAIVE() model to capture seasonal patterns, assuming this year’s values will be similar to last year’s.

(d) Check the residuals

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

✅ Explanation:
This plot shows:
- Residual time plot: Checking for randomness (no obvious patterns is ideal).
- Histogram: Seeing if residuals are normally distributed.
- ACF plot: Ensuring residuals aren’t autocorrelated.

🔎 My interpretation:
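The histogram looks roughly normal, but the residuals are not white noise: the training accuracy table in part (f) reports a lag-1 autocorrelation (ACF1) of 0.768, which is large. The model is a usable benchmark, though it leaves predictable structure in the residuals.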

(e) Produce forecasts for the test data

fc <- fit |> 
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

✅ Explanation:
- forecast() generates future values for the test period.
- anti_join() ensures the forecast is made on the test data only.
- autoplot() overlays forecasts on the actual data.

🔎 What I observe:
The forecast closely follows the seasonal trend, with prediction intervals widening as expected.

(f) Compare the accuracy of your forecasts against the actual values

# Training accuracy
fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

# Test accuracy
fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

✅ Explanation:
- Training accuracy checks how well the model fits the data it was trained on.
- Test accuracy shows how well the model predicts unseen data.

🔎 Results:
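Errors are larger on the test set than on the training set for the scale-based measures (RMSE 1.55 vs 1.21, MAE 1.24 vs 0.915, MASE 1.36 vs 1), which is expected since the model never saw that portion. MAPE is the exception (9.06 vs 12.4), possibly because turnover is higher in the test period, which shrinks percentage errors.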

(g) How sensitive are the accuracy measures to the amount of training data used?

📢 My reflection:
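The accuracy measures are sensitive to how the data are split, though for SNAIVE the mechanism is specific. Its point forecasts repeat the last 12 training months, so adding older history does not change them. Moving the end of the training set does, because a different year is copied forward and a different test period is scored. Scaled measures such as MASE and RMSSE also depend on the training window, since they divide by in-sample seasonal naïve errors. For models that estimate parameters, more data generally improves reliability but can also introduce outdated patterns if the data is too old.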
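
A seasonal naïve model copies its point forecasts from the last observed season, so history before that season does not enter the forecast at all. The sketch below illustrates this with a made-up monthly series and a hand-written helper, snaive_point(), which is hypothetical and not part of fable.

```r
# Hand-rolled seasonal naive point forecast (illustration only, not fable's code).
snaive_point <- function(y, m, h) {
  last_season <- tail(y, m)                 # most recent m observations
  last_season[((seq_len(h) - 1) %% m) + 1]  # recycle them for h steps
}

set.seed(42)
# 10 years of invented monthly data: a seasonal shape plus noise.
y <- 50 + rep(sin(2 * pi * (1:12) / 12) * 10, times = 10) + rnorm(120)

fc_long  <- snaive_point(y, m = 12, h = 24)            # all 10 years
fc_short <- snaive_point(tail(y, 24), m = 12, h = 24)  # last 2 years only

# Same forecasts either way: only the final season matters.
stopifnot(identical(fc_long, fc_short))
```

What changes the accuracy figures in practice is where the training set ends, since that decides which season is copied forward and which observations are scored.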

Homework3

Taha Malik

2025-02-24
