1 Produce forecasts for the following series using whichever
of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each
case:
Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget).
Australian takeaway food turnover (aus_retail).
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'tsibbledata' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
library(tidyverse)
The population of Australia is increasing , so there’s an upward
trend.For that reason I’ll use RW(y ~ drift())
aus_economy <- global_economy %>%
filter(Code == "AUS")
aus_economy %>%
model(Drift = RW(Population ~ drift())) %>%
forecast(h = 15) %>%
autoplot(aus_economy) +
labs(title = "Australian Population Forcast")

Bricks data (as seen on previous assignments) has apparent
seasonality so SNAIVE(y) will be a good choice
aus_production %>%
filter(!is.na(Bricks)) %>%
model(SNAIVE(Bricks ~ lag("year"))) %>%
forecast(h = 15) %>%
autoplot(aus_production) +
labs(title = "Australian Bricks Production Forcast")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

NSW Lambs has neither seasonality nor cyclic patterns therefore a
good fit will be NAIVE(y)
aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(NAIVE(Count)) %>%
forecast(h = 24) %>%
autoplot(aus_livestock) +
labs(title = "Lambs in New South Wales",
subtitle = "July 1976 - Dec 2018, Forecasted until Dec 2020")

Household wealth now has a slightly upward trend so I’ll use RW(y ~
drift())
hh_budget %>%
model(Drift = RW(Wealth ~ drift())) %>%
forecast(h = 15) %>%
autoplot(hh_budget) +
labs(title = "Household wealth Forcast")

Retail turnover, especially for takeaway food, can exhibit seasonal
patterns so I’ll use SNAIVE(y)
aus_retail %>%
filter(State == "South Australia",
Industry == 'Takeaway food services') %>%
model(SNAIVE(Turnover ~ lag("year"))) %>%
forecast(h = 15) %>%
autoplot(aus_retail) +
labs(title = "South Australian takeaway food turnover Forcast")

2 Use the Facebook stock price (data set gafa_stock) to do
the following:
1) Produce a time plot of the series.
2) Produce forecasts using the drift method and plot them.
3)Show that the forecasts are identical to extending the line drawn
between the first and last observations.
4)Try using some of the other benchmark functions to forecast the
same data set. Which do you think is best? Why?
1)
fb_stock <- gafa_stock %>%
filter(Symbol == "FB")
fb_stock_ts <- as_tsibble(fb_stock, key = "Symbol", index = "Date", regular = TRUE) %>% fill_gaps()
autoplot(fb_stock, Close) +
ggtitle("Facebook Stock Price Time Series") +
xlab("Date") +
ylab("Stock Price")

2)
fb_stock_ts %>%
model(Drift = RW(Close ~ drift())) %>%
forecast(h = 30) %>%
autoplot(fb_stock) +
labs(title = "Facebook Close Price Forcast")

3)
data <- data.frame(x1 = as.Date('2014-01-02'), x2 = as.Date('2018-12-31'), y1 = 54.71, y2 = 131.09)
fb_stock_ts %>%
model(Drift = RW(Close ~ drift())) %>%
forecast(h = 90) %>%
autoplot(fb_stock) +
labs(title = "Facebook Close Price Forcast") +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"), data = data)

4)
fb_stock_ts %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
Drift = RW(Close ~ drift())
) %>%
forecast(h = 90) %>%
autoplot(fb_stock_ts)

The naive benchmark provides the most accurate forecast for this
dataset, as its prediction interval is the narrowest compared to other
methods.
3 Apply a seasonal naïve method to the quarterly Australian
beer production data from 1992. Check if the residuals look like white
noise, and plot the forecasts. The following code will help.What do you
conclude?
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

The plot indicates that the results differ significantly from a
white noise series, as the values are relatively small. The residuals
exhibit a mean centered around zero and maintain constant variance,
suggesting that the series is not simply white noise. Additionally, the
ACF plot reveals that the value at lag 4 is notably larger than the
others. This peak can be linked to seasonal patterns, with peaks
occurring every fourth quarter (Q4) and troughs in the second quarter
(Q2). This cyclical behavior highlights the influence of seasonal
effects on the data, indicating that further exploration of seasonal
decomposition could provide additional insights into the underlying
trends.
4) Repeat the previous exercise using the Australian Exports
series from global_economy and the Bricks series from aus_production.
Use whichever of NAIVE() or SNAIVE() is more appropriate in each
case.
Australian Exports
# Extract data of interest
aus_economy <- global_economy %>%
filter(Country == "Australia")
# Define and estimate a model
fit <- aus_economy %>% model(NAIVE(Exports))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit %>% forecast() %>% autoplot(aus_economy)

The graph shows that the residuals of the model fitted to Australian
exports are mostly random, with no strong autocorrelations and a roughly
normal distribution. The model seems to capture the overall trend well,
though there are some larger deviations in certain years, especially in
the early 2000s.
Bricks
fit <- aus_production %>%
filter(!is.na(Bricks)) %>%
model(SNAIVE(Bricks ~ lag("year")))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
# Look a some forecasts
fit %>% forecast() %>%
autoplot(aus_production)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

The residuals for the Bricks series from aus_production show some
large deviations and autocorrelation at shorter lags, indicating the
model does not fully capture certain time-dependent patterns. The
residuals are skewed, with some extreme values, suggesting the model
could be improved to better fit the data.
7 For your retail time series (from Exercise 7 in Section
2.10):
Create a training dataset consisting of observations before 2011
using
set.seed(2001)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the
following plot.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training
data (myseries_train).
fit <- myseries_train |>
model(SNAIVE(Turnover))
The ACF plot shows significant autocorrelation at the first few
lags, with values outside the confidence bounds, indicating the
residuals are not entirely uncorrelated. The histogram of the residuals
on the other hand is roughly bell-shaped, indicating a near-normal
distribution. However, there is some asymmetry, the tails are slightly
thicker than in a normal distribution, suggesting a slight deviation
from normality.
Produce forecasts for the test data
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual
values
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South A… Takeawa… SNAIV… Trai… 1.60 5.20 4.12 3.44 10.2 1 1 0.797
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Sout… Takeawa… Test 15.9 19.2 16.2 18.5 19.0 3.94 3.70 0.907
fit %>% accuracy() %>% select(MAE, RMSE, MAPE, MASE, RMSSE)
## # A tibble: 1 × 5
## MAE RMSE MAPE MASE RMSSE
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4.12 5.20 10.2 1 1
fc %>% accuracy(myseries) %>% select(MAE, RMSE, MAPE, MASE, RMSSE)
## # A tibble: 1 × 5
## MAE RMSE MAPE MASE RMSSE
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 16.2 19.2 19.0 3.94 3.70
The model performs well on the training data but poorly on the test
data, with much higher errors like MAPE, indicating overfitting. This
happens when the model captures specific patterns in the training data
but fails to generalize to new data. To fix this, the model can be
simplified, more training data added, or regularization techniques used
to improve its ability to generalize.
How sensitive are the accuracy measures to the amount of training
data used?
Accuracy measures are highly sensitive to the amount of training
data used. Larger training datasets typically lead to improved metrics
like MAE, RMSE, and MAPE, reflecting better model performance. With more
different examples, the model captures underlying patterns and reduces
overfitting risk. Small training sets can enhance training accuracy
while resulting in poor test performance. Overall, more training data
usually leads to more reliable forecasts, while insufficient data
weakens effectiveness.