Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book. Please submit your Rpubs link as well as your .pdf file showing your run code.
# Load necessary libraries
library(fpp3) # Forecasting package that includes tsibble, fable, feasts, etc.
## Warning: package 'fpp3' was built under R version 4.4.2
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'tsibble' was built under R version 4.4.2
## Warning: package 'tsibbledata' was built under R version 4.4.2
## Warning: package 'feasts' was built under R version 4.4.2
## Warning: package 'fabletools' was built under R version 4.4.2
## Warning: package 'fable' was built under R version 4.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
# Load datasets (these come pre-loaded with fpp3)
data("global_economy")
data("aus_production")
data("aus_livestock")
data("hh_budget")
data("aus_retail")
# 1. Forecast Australian Population using Random Walk with Drift (RW with drift)
# This is appropriate because population tends to grow steadily over time.
pop_forecast <- global_economy %>%
filter(Country == "Australia") %>%
model(RW(Population ~ drift())) %>%
forecast(h = 10)
# 2. Forecast Bricks production using Seasonal Naive (SNAIVE)
# Bricks production is seasonal, making SNAIVE suitable.
bricks_forecast <- aus_production %>%
filter_index("1970 Q1" ~ .) %>%
model(SNAIVE(Bricks)) %>%
forecast(h = 8)
# 3. Forecast NSW Lambs using Seasonal Naive (SNAIVE)
# Livestock counts are seasonal due to breeding and market cycles.
lambs_forecast <- aus_livestock %>%
filter(State == "New South Wales", Animal == "Lambs") %>%
model(SNAIVE(Count)) %>%
forecast(h = 8)
# 4. Forecast Household Wealth using Naive (NAIVE)
# Wealth data shows no strong seasonal pattern, making NAIVE appropriate.
wealth_forecast <- hh_budget %>%
model(NAIVE(Wealth)) %>%
forecast(h = 10)
# 5. Forecast Takeaway Food Turnover using Seasonal Naive (SNAIVE)
# Takeaway food services turnover has seasonal trends, ideal for SNAIVE.
food_forecast <- aus_retail %>%
filter(Industry == "Takeaway food services") %>%
model(SNAIVE(Turnover)) %>%
forecast(h = 12)
# Plot the forecasts
pop_forecast %>% autoplot(global_economy) + ggtitle("Forecast: Australian Population")
bricks_forecast %>% autoplot(aus_production) + ggtitle("Forecast: Bricks Production")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambs_forecast %>% autoplot(aus_livestock) + ggtitle("Forecast: NSW Lambs")
wealth_forecast %>% autoplot(hh_budget) + ggtitle("Forecast: Household Wealth")
food_forecast %>% autoplot(aus_retail) + ggtitle("Forecast: Takeaway Food Turnover")
#### Analysis:
1. Australian Population: Since population growth follows a trend, Random Walk with Drift is the best choice.
2. Bricks production: It has strong seasonal patterns, making Seasonal Naive a good fit.
3. NSW Lambs: Livestock production shows seasonal cycles, so Seasonal Naive is appropriate.
4. Household Wealth: A relatively stable series without strong seasonality, so Naive is the simplest and most effective.
5. Takeaway Food Turnover: A highly seasonal pattern is expected, so Seasonal Naive ensures realistic forecasts.
The code applies a simple benchmark method to each dataset based on its trend and seasonality characteristics. Random Walk with Drift is used for the Australian population because of its long-term growth trend. Seasonal Naïve is applied to brick production, lamb production, and takeaway food turnover, all of which exhibit seasonal patterns. Naïve forecasting is used for household wealth, which assumes that the most recent value is the best predictor of future values. One caveat: the Bricks plot reports 8 removed rows, which matches the 8 forecast steps, so the Bricks forecasts are missing values. The series ends with missing observations, and SNAIVE carries those forward.
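The missing-value warnings on the Bricks plot suggest that the SNAIVE forecasts are NA because `Bricks` has no observations for the final quarters of `aus_production`. A minimal sketch of one way around this, assuming it is acceptable to drop the trailing missing quarters before fitting (the name `bricks_complete` is introduced here for illustration only):

```r
library(fpp3)

# Keep only quarters where Bricks was actually recorded.
# Assumption: the missing values sit at the end of the series,
# so removing them leaves no gaps in the quarterly index.
bricks_complete <- aus_production |>
  filter(!is.na(Bricks)) |>
  select(Quarter, Bricks)

bricks_forecast <- bricks_complete |>
  filter_index("1970 Q1" ~ .) |>
  model(SNAIVE(Bricks)) |>
  forecast(h = 8)

# With complete data the forecast line can be drawn
bricks_forecast |>
  autoplot(bricks_complete) +
  ggtitle("Forecast: Bricks Production")
```

The forecasts then start from the last recorded quarter of brick production, not from the end of the `aus_production` table.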
# Load necessary libraries
library(fpp3)
# Load and view the GAFA stock dataset
data("gafa_stock")
fb_stock <- gafa_stock %>% filter(Symbol == "FB")
# 1. Produce a time plot of the series
fb_stock %>%
autoplot(Close) +
labs(title = "Facebook Stock Closing Prices",
x = "Year", y = "Closing Price (USD)") +
theme_minimal()
# 2. Produce forecasts using the drift method and plot them
fb_stock_train <- fb_stock %>% filter(year(Date) <= 2017)
fb_stock_test <- fb_stock %>% filter(year(Date) > 2017)
fb_drift_model <- fb_stock_train %>% model(RW(Close ~ drift()))
## Warning: 1 error encountered for RW(Close ~ drift())
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.
fb_drift_forecast <- fb_drift_model %>% forecast(new_data = fb_stock_test)
# Plot drift forecasts with proper data handling
fb_stock %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = Close), color = "blue") +
geom_line(data = fb_drift_forecast, aes(y = .mean), color = "red", linetype = "dashed") +
labs(title = "Drift Method Forecast for Facebook Stock Prices",
x = "Year", y = "Closing Price (USD)") +
theme_minimal()
## Warning: Removed 251 rows containing missing values or values outside the scale range
## (`geom_line()`).
# 3. Show that the forecasts match extending the line between the first and last observations
first_price <- fb_stock_train %>% slice_head(n = 1) %>% pull(Close)
last_price <- fb_stock_train %>% slice_tail(n = 1) %>% pull(Close)
first_date <- min(fb_stock_train$Date)
last_date <- max(fb_stock_train$Date)
line_df <- tibble(
Date = c(first_date, last_date),
Close = c(first_price, last_price)
)
fb_stock %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = Close), color = "blue") +
geom_line(data = line_df, aes(y = Close), color = "green", linetype = "dotted", linewidth = 1.2) +
geom_line(data = fb_drift_forecast, aes(y = .mean), color = "red", linetype = "dashed") +
labs(title = "Drift Forecast vs. First-Last Line",
x = "Year", y = "Closing Price (USD)") +
theme_minimal()
## Warning: Removed 251 rows containing missing values or values outside the scale range
## (`geom_line()`).
# 4. Compare other benchmark methods
fb_benchmarks <- fb_stock_train %>% model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
Seasonal_Naive = SNAIVE(Close),
Drift = RW(Close ~ drift())
)
## Warning: 1 error encountered for Naive
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.
## Warning: 1 error encountered for Seasonal_Naive
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.
## Warning: 1 error encountered for Drift
## [1] .data is an irregular time series, which this model does not support. You should consider if your data can be made regular, and use `tsibble::update_tsibble(.data, regular = TRUE)` if appropriate.
fb_forecasts <- fb_benchmarks %>% forecast(new_data = fb_stock_test)
# Plot all benchmark forecasts using ggplot2
fb_stock %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = Close), color = "black") +
geom_line(data = fb_forecasts, aes(y = .mean, color = .model), linewidth = 0.9) +
labs(title = "Benchmark Forecast Comparisons for Facebook Stock",
x = "Year", y = "Closing Price (USD)", color = "Model") +
theme_minimal()
## Warning: Removed 753 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Evaluate forecast accuracy
accuracy_results <- fb_forecasts %>% accuracy(fb_stock_test)
print(accuracy_results)
## # A tibble: 4 × 11
## .model Symbol .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift FB Test NaN NaN NaN NaN NaN NaN NaN NA
## 2 Mean FB Test 63.8 66.8 63.8 36.3 36.3 NaN NaN 0.969
## 3 Naive FB Test NaN NaN NaN NaN NaN NaN NaN NA
## 4 Seasonal_Naive FB Test NaN NaN NaN NaN NaN NaN NaN NA
# Discussion:
# The best forecasting method typically has the lowest RMSE or MAE.
# Here only the Mean model returned accuracy values. Naive, Seasonal_Naive and Drift
# failed on the irregular series (see the warnings above), so their rows are NaN
# and the methods cannot be ranked from this table.
# The seasonal naive method may not be ideal unless there's a clear seasonal pattern.
This analysis explores forecasting Facebook's stock price using different benchmark methods and evaluates which approach works best.
1. Time Plot of Facebook Stock Price
The gafa_stock dataset is filtered to include only Facebook's stock price (Symbol == "FB"). The series is indexed by trading date, so it is irregular: there are no observations on weekends or holidays. The code above does not regularize it, which is why RW(), NAIVE() and SNAIVE() report an irregular time series error. A time plot is created with autoplot(Close), displaying the historical trend of Facebook's stock price over time. This step is essential for identifying patterns like trends or seasonality before applying forecasting methods.
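The error message above says the models do not support an irregular series. One common workaround, in the style used for stock prices in the Hyndman book, is to index the series by trading day instead of calendar date. The sketch below assumes that treating consecutive trading days as equally spaced is acceptable. The names `trading_day`, `fb_regular`, `fb_train`, `fb_test` and `fb_fc` are introduced here for illustration.

```r
library(fpp3)

# Re-index by trading day so the tsibble is regular
fb_regular <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

# Same split as before: train up to 2017, test on 2018
fb_train <- fb_regular |> filter(year(Date) <= 2017)
fb_test  <- fb_regular |> filter(year(Date) > 2017)

# The drift model can now be estimated and forecast over the test rows
fb_fc <- fb_train |>
  model(Drift = RW(Close ~ drift())) |>
  forecast(new_data = fb_test)

# Point forecasts only; the x-axis is the trading-day index
fb_fc |> autoplot(fb_regular, level = NULL)
```

With this index the same pattern should also work for `NAIVE()` and `MEAN()`. Whether `SNAIVE()` is meaningful depends on choosing a seasonal period for trading days, which is left open here.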
2. Forecasting with the Drift Method
The Random Walk with Drift model (RW(Close ~ drift())) is fitted to the data up to the end of 2017 and asked to forecast the 2018 test period. This method extrapolates the average change between consecutive historical observations, making it useful for trending financial data. The forecast is overlaid on the series with ggplot2 as a dashed red line. In this run the model returned an error on the irregular series, so the forecast values are missing: the plot warning reports 251 removed rows and no forecast line is drawn.
3. Verifying the Drift Method Trend
To check that the drift forecast extends the historical trend, a line is drawn between the first and last observations of the training data. When the drift model is fitted successfully, its forecasts lie exactly on the extension of this line, because the drift slope equals (last value - first value) / (T - 1). In the run shown here the drift forecasts are missing, so the plot shows only the first-to-last line and cannot confirm the match.
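The link between the drift method and the first-to-last line can be checked numerically without fitting any model. The following is a minimal base R sketch on a made-up series (the values and the helper `drift_forecast()` are hypothetical, not the Facebook data):

```r
# Drift forecast: last value plus h times the average historical change
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)  # average change per period
  y[n] + slope * seq_len(h)
}

# Hypothetical toy series
y <- c(10, 12, 11, 15, 18)
h <- 3

fc <- drift_forecast(y, h)

# Straight line through the first and last points, extended h steps ahead
n <- length(y)
line_slope <- (y[n] - y[1]) / (n - 1)
line_ext <- y[1] + line_slope * ((n + 1):(n + h) - 1)

print(fc)
print(all.equal(fc, line_ext))
```

Both calculations use the same slope, so the two vectors agree for any series and any horizon.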
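4. Comparing Other Benchmark Methods
Three other benchmark methods are applied alongside drift: Mean (MEAN()), Naive (NAIVE()) and Seasonal Naive (SNAIVE()).
The best forecasting method depends on the stock price behavior. In this run only the Mean model produced test-set accuracy values (RMSE 66.8, MAE 63.8). The other three rows are NaN because those models could not be fitted to the irregular series, so no method can be declared best from this table.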
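To make the comparison step concrete, here is a small base R sketch of how benchmark forecasts can be scored by RMSE and MAE on a held-out set. The numbers are a hypothetical toy series, so the ranking it prints says nothing about the Facebook data.

```r
# Hypothetical toy data, not the Facebook series
train <- c(10, 12, 11, 15, 18)
test  <- c(19, 21, 24)
h <- length(test)
n <- length(train)

# Point forecasts from three benchmark methods
forecasts <- list(
  Mean  = rep(mean(train), h),
  Naive = rep(train[n], h),
  Drift = train[n] + (train[n] - train[1]) / (n - 1) * seq_len(h)
)

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))

# One row per method, sorted so the lowest RMSE comes first
scores <- data.frame(
  model = names(forecasts),
  RMSE  = sapply(forecasts, rmse, actual = test),
  MAE   = sapply(forecasts, mae, actual = test)
)
print(scores[order(scores$RMSE), ])
```

This mirrors what `accuracy()` reports in the RMSE and MAE columns once every model has been fitted successfully.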
# Load necessary libraries
library(fpp3) # Provides access to aus_production dataset and time series functions
# 1️⃣ Extract data of interest: Australian beer production from 1992 onwards
recent_production <- aus_production |>
filter(year(Quarter) >= 1992) # Filters data for quarters from 1992 onwards
# 2️⃣ Define and estimate the seasonal naïve model
fit <- recent_production |>
model(SNAIVE(Beer)) # SNAIVE uses the last observed value from the same season
# 3️⃣ Analyze the residuals: Check if residuals behave like white noise
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
# This plots residuals over time, a histogram, and an ACF plot to detect autocorrelation.
# 4️⃣ Generate forecasts and plot them against actual data
fit |>
forecast(h = "2 years") |> # Forecasts 8 quarters (2 years) ahead
autoplot(recent_production) + # Plots the forecasts and actual values
labs(title = "Seasonal Naïve Forecast of Australian Beer Production (from 1992)",
y = "Beer production (Megalitres)",
x = "Year")
## Step-by-Step Guide to Applying the Seasonal Naïve (SNAIVE) Method to Australian Beer Production Data (from 1992)
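The seasonal naïve rule itself is short enough to write out by hand. This base R sketch uses a made-up quarterly series and a hypothetical helper `snaive_forecast()`. It is only meant to show what "the last observed value from the same season" means, not how `SNAIVE()` is implemented.

```r
# Seasonal naive: each forecast repeats the value from the same season
# in the last observed year (m = number of seasons per year)
snaive_forecast <- function(y, m, h) {
  n <- length(y)
  last_season <- y[(n - m + 1):n]
  rep(last_season, length.out = h)
}

# Hypothetical quarterly data covering two years
y <- c(5, 9, 7, 3, 6, 10, 8, 4)

fc <- snaive_forecast(y, m = 4, h = 6)
print(fc)
```

For a horizon longer than one year the last observed year simply repeats, which is why the forecasts in the plot look like a copy of the final four quarters.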
# Load necessary libraries
library(fpp3) # For time series data and modeling
# -----------------------------
# 1️⃣ Australian Exports Series
# -----------------------------
# Extract Australian exports data
exports_aus <- global_economy |>
filter(Country == "Australia") |>
select(Year, Exports)
# Plot the Australian Exports data to identify patterns
exports_aus |> autoplot(Exports) +
labs(title = "Australian Exports Over Time", y = "Exports (% of GDP)", x = "Year")
# 🔎 Analysis:
# The plot shows a generally increasing trend but no obvious seasonal pattern.
# ✅ Therefore, using NAIVE() is more appropriate.
# Fit the NAIVE model
fit_exports <- exports_aus |> model(NAIVE(Exports))
# Check residuals for white noise
fit_exports |> gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
# Forecast the next 5 years
fit_exports |>
forecast(h = "5 years") |>
autoplot(exports_aus) +
labs(title = "NAIVE Forecast of Australian Exports", y = "Exports (% of GDP)", x = "Year")
# -----------------------------
# 2️⃣ Australian Bricks Production Series
# -----------------------------
# Extract Bricks production data from aus_production
bricks_aus <- aus_production |>
select(Quarter, Bricks) |>
filter(year(Quarter) >= 1992)
# Plot Bricks production data to identify patterns
bricks_aus |> autoplot(Bricks) +
labs(title = "Australian Bricks Production", y = "Millions of bricks", x = "Year")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# 🔎 Analysis:
# The Bricks series shows clear seasonal patterns (regular ups and downs each year).
# ✅ Therefore, using SNAIVE() is more appropriate.
# Fit the SNAIVE model
fit_bricks <- bricks_aus |> model(SNAIVE(Bricks))
# Check residuals for white noise
fit_bricks |> gg_tsresiduals()
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Forecast the next 2 years (8 quarters)
fit_bricks |>
forecast(h = "2 years") |>
autoplot(bricks_aus) +
labs(title = "SNAIVE Forecast of Australian Bricks Production", y = "Millions of bricks", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Introduction
This report analyzes two Australian economic indicators using appropriate forecasting methods:
- Exports, from the global_economy dataset.
- Bricks production, from the aus_production dataset.

I will determine whether the NAIVE() or SNAIVE() method is more suitable for each series based on the presence of trend and seasonality.
exports_aus <- global_economy |>
filter(Country == "Australia") |>
select(Year, Exports)
This code filters the Australian data and selects the Exports column.
exports_aus |> autoplot(Exports) +
labs(title = "Australian Exports Over Time", y = "Exports (% of GDP)", x = "Year")
Observation: The plot reveals a trend
but no clear seasonal pattern.
fit_exports <- exports_aus |> model(NAIVE(Exports))
fit_exports |> gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
Interpretation: Residuals appear random around zero,
indicating a good model fit.
fit_exports |>
forecast(h = "5 years") |>
autoplot(exports_aus) +
labs(title = "NAIVE Forecast of Australian Exports", y = "Exports (% of GDP)", x = "Year")
The forecast stays flat at the last observed value: NAIVE() extrapolates neither trend nor seasonality.
bricks_aus <- aus_production |>
select(Quarter, Bricks) |>
filter(year(Quarter) >= 1992)
This selects Bricks production data from 1992 onward.
bricks_aus |> autoplot(Bricks) +
labs(title = "Australian Bricks Production", y = "Millions of bricks", x = "Year")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
Observation: The data shows clear seasonal
patterns.
fit_bricks <- bricks_aus |> model(SNAIVE(Bricks))
fit_bricks |> gg_tsresiduals()
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_bin()`).
Interpretation: Residuals are randomly scattered,
confirming the model captures seasonality well.
fit_bricks |>
forecast(h = "2 years") |>
autoplot(bricks_aus) +
labs(title = "SNAIVE Forecast of Australian Bricks Production", y = "Millions of bricks", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
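The forecast is meant to repeat the last observed year of the seasonal pattern. In this run, however, the warnings show that the 8 forecast rows were removed as missing values: the Bricks series ends with missing observations, so SNAIVE() has nothing to carry forward and no forecast line is drawn.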
Both models looked acceptable on the visual residual checks from gg_tsresiduals(). No formal white noise test was run, so this supports their suitability for the respective series but does not confirm it.
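A formal complement to the visual residual check is a Ljung-Box test on the innovation residuals. The sketch below applies it to the exports model. The choice of `lag = 10` is a conventional one for non-seasonal data and is an assumption here, as is the name `lb_exports`.

```r
library(fpp3)

exports_aus <- global_economy |>
  filter(Country == "Australia") |>
  select(Year, Exports)

fit_exports <- exports_aus |> model(NAIVE(Exports))

# Ljung-Box test on the innovation residuals.
# A large p-value is consistent with white noise;
# a small one points to remaining autocorrelation.
lb_exports <- augment(fit_exports) |>
  features(.innov, ljung_box, lag = 10)

print(lb_exports)
```

The same pattern could be applied to the Bricks model once the missing observations at the end of the series are dealt with.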
library(fpp3)
set.seed(12345678)
# Create 'myseries' before using it
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
✅ Explanation:
I used the filter() function to select rows where the year is less than 2011, creating the training dataset myseries_train.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
✅ Explanation:
- autoplot(myseries, Turnover) plots the entire dataset.
- autolayer(myseries_train, Turnover, colour = "red") overlays the training data in red.
🔎 What I see:
The plot shows the full time series with the training data highlighted
in red, confirming the correct split.
SNAIVE() applied to your training data
fit <- myseries_train |>
model(SNAIVE(Turnover))
✅ Explanation:
I applied the SNAIVE() model to capture seasonal patterns, assuming this year's values will be similar to last year's.
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
✅ Explanation:
This plot shows:
- Residual time plot: Checking for randomness (no obvious patterns is
ideal).
- Histogram: Seeing if residuals are normally distributed.
- ACF plot: Ensuring residuals aren’t autocorrelated.
🔎 My interpretation:
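The training accuracy table reports a residual ACF1 of 0.768 and a mean error of 0.439, so the residuals are strongly autocorrelated at lag 1 and not centred on zero. They do not resemble white noise, which suggests the seasonal naïve model leaves trend or other structure unexplained, even if it is a reasonable benchmark.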
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
✅ Explanation:
- forecast() generates future values for the test period.
- anti_join() ensures the forecast is made on the test data only.
- autoplot() overlays forecasts on the actual data.
🔎 What I observe:
The forecast closely follows the seasonal trend, with prediction
intervals widening as expected.
# Training accuracy
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439 1.21 0.915 5.23 12.4 1 1 0.768
# Test accuracy
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test 0.836 1.55 1.24 5.94 9.06 1.36 1.28 0.601
✅ Explanation:
- Training accuracy checks how well the model fits the data it was
trained on.
- Test accuracy shows how well the model predicts unseen data.
🔎 Results:
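On the test set the errors are larger: RMSE rises from 1.21 to 1.55, MAE from 0.915 to 1.24, and MASE from 1 to 1.36. MAPE is the exception, falling from 12.4 to 9.06. Larger test errors are expected since the model wasn't trained on that portion.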
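The training MASE of exactly 1 is not a coincidence. MASE divides the error by the in-sample mean absolute error of the seasonal naïve method, so a seasonal naïve model scores 1 on its own training data by construction. A base R sketch on a hypothetical series with period 4 illustrates this:

```r
# Hypothetical toy series with seasonal period m = 4
train <- c(5, 9, 7, 3, 6, 10, 8, 4, 8, 11, 9, 6)
test  <- c(9, 12, 10, 7)
m <- 4
n <- length(train)

# Seasonal naive fitted values: the observation m periods earlier
fitted <- c(rep(NA, m), train[1:(n - m)])
train_mae <- mean(abs(train - fitted), na.rm = TRUE)

# MASE scaling factor: in-sample MAE of seasonal differences
scale <- mean(abs(diff(train, lag = m)))

# Forecasts for the test period repeat the last training season
fc <- rep(tail(train, m), length.out = length(test))
test_mae <- mean(abs(test - fc))

train_mase <- train_mae / scale
test_mase  <- test_mae / scale
print(c(train = train_mase, test = test_mase))
```

A test MASE above 1, as in the table above, therefore means the out-of-sample errors are larger than the typical in-sample seasonal naïve error.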
📢 My reflection:
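The point forecasts from SNAIVE() depend only on the last full year of training data. As long as the training set still ends in December 2010 and contains at least one full year, dropping earlier years leaves RMSE, MAE and MAPE on the test set unchanged. The scaled measures (MASE, RMSSE) do change, because their scaling factor is computed from the whole training set. Accuracy is therefore sensitive mainly to where the training set ends, not to how far back it starts.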
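The sensitivity question can be checked on a toy example. This base R sketch uses hypothetical values and a hypothetical helper `snaive_point()` to compare seasonal naïve forecasts from a long and a short training window that end at the same point.

```r
# Seasonal naive point forecasts use only the last m training values
snaive_point <- function(train, m, h) rep(tail(train, m), length.out = h)

# Hypothetical toy series with period m = 4
y <- c(5, 9, 7, 3, 6, 10, 8, 4, 8, 11, 9, 6)
m <- 4
h <- 4

fc_long  <- snaive_point(y, m, h)        # trained on all 12 values
fc_short <- snaive_point(y[5:12], m, h)  # trained on the last 8 values

# Same forecasts, so unscaled error measures on a test set are identical
print(identical(fc_long, fc_short))

# The MASE scaling factor does depend on the training window
scale_long  <- mean(abs(diff(y, lag = m)))
scale_short <- mean(abs(diff(y[5:12], lag = m)))
print(c(long = scale_long, short = scale_short))
```

Moving the end of the training window, by contrast, changes which year is repeated and so changes every accuracy measure.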