Produce forecasts for the following series using whichever of
NAIVE(y),SNAIVE(y)orRW(y ~ drift())is more appropriate in each case:
global_economy)Use of the RW(y ~ drift()) is most appropriate because
the data is trending strictly increasing and has no seasonality
component.
global_economy |>
filter(Country == 'Australia') |>
model(RW(Population ~ drift())) |>
forecast(h = 5) |>
autoplot(global_economy) +
labs(title = 'Australian Population',
subtitle = 'Drift Method: 5 year forecast')aus_production)Use of the SNAIVE method is most appropriate because the
data is highly seasonal.
aus_production |>
filter(!is.na(Bricks)) |>
model(SNAIVE(Bricks)) |>
forecast(h = '5 years') |>
autoplot(aus_production) +
labs(title = 'Austrialian Clay Brick Production',
subtitle = 'SNAIVE Method: 5 year forecast')## Warning: Removed 20 rows containing missing values (`geom_line()`).
aus_livestock)Looking at the classical decomposition, we can see that the seasonal component is insignificant compared to the trend and random components.
aus_livestock |>
filter(Animal == 'Lambs',
State == 'New South Wales',
!is.na(Count)) |>
model(classical_decomposition(Count, type = 'additive')) |>
components() |>
autoplot()## Warning: Removed 6 rows containing missing values (`geom_line()`).
Use of the NAIVE method is appropriate because there is
no seasonality to the data and the trend is not linear.
aus_livestock |>
filter(Animal == 'Lambs',
State == 'New South Wales',
!is.na(Count)) |>
model(NAIVE(Count)) |>
forecast(h = '5 years') |>
autoplot(aus_livestock) +
labs(title = 'New South Wales Lambs Slaughter',
subtitle = 'NAIVE Model: 5 year forecast')hh_budget)Use of the RW(y ~ drift()) is most appropriate the data
has no seasonality and is generally increasing.
hh_budget |>
filter(!is.na(Wealth)) |>
model(RW(Wealth ~ drift())) |>
forecast(h = '5 years') |>
autoplot(hh_budget) +
labs(title = 'Household Wealth',
subtitle = 'Drift Method: 5 year forecast')aus_retail)aus_retail |>
filter(Industry == 'Takeaway food services') |>
model(stl = STL(Turnover)) |>
components() |>
autoplot()Use of the SNAIVE model is most appropriate because the
STL decomposition shows a small seasonal component.
aus_retail |>
filter(Industry == 'Takeaway food services') |>
model(SNAIVE(Turnover)) |>
forecast(h = '5 years') |>
autoplot(aus_retail) +
labs(title = 'Australian Takeaway food services Turnover',
subtitle = 'SNAIVE Model: 5 year forecast') +
facet_wrap(~State, scales = 'free')Use the Facebook stock price (
gafa_stock) to do the following:
- Produce a time plot of the series
fb <- gafa_stock |>
filter(Symbol == 'FB') |>
mutate(day = row_number()) |>
update_tsibble(index = day, regular = TRUE)
fb |>
autoplot(Close) +
labs(title = 'Daily close price of Facebook')
- Produce forecasts using the drift method and plot them.
fb |>
model(RW(Close ~ drift())) |>
forecast(h = 60) |>
autoplot(fb) +
labs(title = 'Daily Close price of Facebook',
subtitle = 'Drift Method: 60 traiding day forecast')
- Show that the forecasts are identical to extending the line drawn between the first and last observations.
fb |>
model(RW(Close ~ drift())) |>
forecast(h = 60) |>
autoplot(fb) +
labs(title = 'Daily Close price of Facebook',
subtitle = 'Drift Method: 60 traiding day forecast') +
geom_segment(x = fb |>
filter(day == min(day)) |>
pull(day),
y = fb |>
filter(day == min(day)) |>
pull(Close),
xend = fb |>
filter(day == max(day)) |>
pull(day),
yend = fb |>
filter(day == max(day)) |>
pull(Close),
colour = 'blue',
linetype = 'dashed')
- Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
It makes no sense to use the SNAIVE method since the
data has no seasonality component. Using MEAN and
NAIVE produce a constant forecast which doesn’t take the
overall trend into account.
fb |>
model(mean = MEAN(Close),
Naive = NAIVE(Close),
Drift = RW(Close ~ drift())) |>
forecast(h = 60) |>
autoplot(fb)Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)## # A tibble: 1 × 3
## .model bp_stat bp_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Beer) 29.7 0.000234
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Beer) 32.3 0.0000834
The p-value from both tests is less than 0.05, which indicates that the residuals are not explained by white noise.
Repeat the previous exercise using the Australian Exports series from
global_economy and the Bricks series from
aus_production. Use whichever of NAIVE() or
SNAIVE() is more appropriate in each case.
global_economyglobal_economy |>
filter(Country == 'Australia') |>
autoplot(Exports) +
labs(title = 'Australian Exports')Australian Exports does not appear to have a seasonal component, so
we will construct a model using NAIVE.
model_ae <- global_economy |>
filter(Country == 'Australia') |>
model(NAIVE(Exports))
model_ae |> gg_tsresiduals()
model_ae |> forecast() |> autoplot(global_economy)The residuals are centered around zero and appear to be normally distributed.
## # A tibble: 1 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 Australia NAIVE(Exports) 16.4 0.0896
The Ljung-Box test produces a p-value greater than 0.05 which allows us to conclude that the residuals resemble white noise.
aus_production## Warning: Removed 20 rows containing missing values (`geom_line()`).
Bricks production has a strong seasonal component, so we will
construct a model using SNAIVE.
model_b <- aus_production |>
model(SNAIVE(Bricks))
model_b |> gg_tsresiduals()
model_b |> forecast() |> autoplot(aus_production)The residuals do not seem to be centered at zero. The ACF shows strong seasonality that is not accounted for in the model.
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Bricks) 274. 0
The Ljung-Box test produces a p-value less than 0.05 which allows us to conclude that the residuals do not resemble white noise.
From your retail time series (from Exercise 7 in Section 2.10):
- Create a training dataset consisting of observations before 2011 using
- Check that your data have been split appropriately by producing the following plot.
- Fit a seasonal naïve model using
SNAIVE()applied to your training data (myseries_train).
- Check the residuals
The residuals do not appear to be uncorrelated nor normally distributed.
- Produce forecasts for the test data
fc <- myseries_fit |>
forecast(new_data = anti_join(myseries, myseries_train,
by = join_by(State, Industry, `Series ID`, Month, Turnover)))
fc |> autoplot(myseries)
- Compare the accuracy of your forecasts against the actual values.
## # A tibble: 1 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Turnover) Training 0.748 0.446 14.9 1 1.00
## # A tibble: 1 × 7
## .model .type RMSE MAE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Turnover) Test 0.953 0.801 14.7 1.80 1.27
- How sensitive are the accuracy measures to the amount of training data used?
Accuracy measures are very sensitive to the amount of training data used. The more training data, the better the accuracy. With time series it is also important to use training data as close to the time periods being forecast.