5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget).
Australian takeaway food turnover (aus_retail).
Australian Population (global_economy)
global_economy %>%
  # Filter and create the column first
  filter(Country == "Australia") %>%
  mutate(Population_M = Population / 1e6) %>%
  # Model
  model(Drift = RW(Population_M ~ drift())) %>%
  forecast(h = 15) %>%
  # Make the graph
  autoplot(
    global_economy %>%
      filter(Country == "Australia") %>%
      mutate(Population_M = Population / 1e6)
  ) +
  labs(
    title = "Australian Population (global_economy) 15 Year Forecast",
    x = "Year",
    y = "Population (Millions)"
  )
Bricks (aus_production)
brick_data <- aus_production %>%
  filter(!is.na(Bricks))

brick_data %>%
  model(SNaive = SNAIVE(Bricks)) %>%
  forecast(h = 8) %>%
  autoplot(brick_data) +
  labs(title = "Bricks Forecast", y = "Millions of Bricks")
NSW Lambs (aus_livestock)
nsw_lambs <- aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs") %>%
  mutate(Lambs_M = Count / 1e6)

nsw_lambs %>%
  model(SNaive = SNAIVE(Lambs_M)) %>%
  forecast(h = 24) %>%
  autoplot(nsw_lambs) + # Simple and clean!
  labs(
    title = "NSW Lambs Production Forecast",
    y = "Lambs (Millions)",
    x = "Month"
  )
Household wealth (hh_budget)
hh_budget %>%
  filter(Country == "Australia") %>%
  model(Drift = RW(Wealth ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(hh_budget %>% filter(Country == "Australia")) +
  labs(
    title = "Australian Household Wealth Forecast",
    subtitle = "Using Random Walk with Drift",
    y = "Wealth (% of disposable income)",
    x = "Year"
  )
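The takeaway food turnover series from the list has no worked answer here. The following is only a sketch of one way it might be handled, under several assumptions: that the relevant `Industry` label in `aus_retail` is "Takeaway food services", that turnover is summed across all states, and that the drift method is preferred because of the long-run upward trend (if the seasonal pattern is judged more important, `SNAIVE(Turnover)` would be the alternative). The names `takeaway` and `takeaway_fc` and the two-year horizon are illustrative choices.

```r
library(fpp3)

# Assumption: "Takeaway food services" is the industry label of interest,
# and the national series is the sum of turnover over all states.
takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(Turnover = sum(Turnover))

# Assumption: drift is chosen because of the upward trend
takeaway_fc <- takeaway %>%
  model(Drift = RW(Turnover ~ drift())) %>%
  forecast(h = 24)

takeaway_fc %>%
  autoplot(takeaway) +
  labs(
    title = "Australian Takeaway Food Turnover Forecast",
    y = "Turnover ($Million AUD)",
    x = "Month"
  )
```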
Produce forecasts using the drift method and plot them. Show that the forecasts are identical to extending the line drawn between the first and last observations.
# Prepare data
fb_clean <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  index_by(Month = yearmonth(Date)) %>%
  summarise(Close = mean(Close, na.rm = TRUE))

# Make forecast
fb_clean %>%
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 12) %>%
  autoplot(fb_clean) +
  annotate(
    "segment",
    x = min(fb_clean$Month), y = fb_clean$Close[1],
    xend = max(fb_clean$Month), yend = tail(fb_clean$Close, 1),
    color = "red", linetype = "dashed"
  ) +
  labs(
    title = "Facebook Stock Price: Drift Forecast",
    subtitle = "Dashed red line connects the first and last observations",
    y = "Price ($USD)",
    x = "Year/Month"
  )
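The plot shows the equivalence visually; it can also be checked numerically. The drift method forecasts y_T + h × (average change), and the average of the first differences telescopes to (y_T − y_1) / (T − 1), which is the slope of the line through the first and last observations. The sketch below uses a small made-up vector `y` (hypothetical values, not the Facebook data) and base R only.

```r
# Hypothetical toy series, used only to illustrate the identity
y <- c(12.1, 13.4, 12.9, 14.8, 15.2, 16.0, 15.7, 17.3)
n <- length(y)
h <- 1:5

# Drift method: last value plus h times the mean of the first differences
drift_slope <- mean(diff(y))
drift_fc <- y[n] + h * drift_slope

# Straight line through (1, y_1) and (n, y_n), extended to times n + h
line_slope <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + (n + h - 1) * line_slope

# The two sets of forecasts agree up to floating-point error
all.equal(drift_fc, line_fc)
```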
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
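One way to approach this comparison is to fit several benchmark methods side by side and plot their point forecasts together. The sketch below assumes the same monthly-average series as above; the names `fb_fit` and `fb_fc` are illustrative. `SNAIVE()` is left out on the assumption that a stock price has no meaningful seasonal pattern.

```r
library(fpp3)

# Monthly average closing price for Facebook
fb_clean <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  index_by(Month = yearmonth(Date)) %>%
  summarise(Close = mean(Close, na.rm = TRUE))

# Fit three benchmark methods to the same series
fb_fit <- fb_clean %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  )

fb_fc <- fb_fit %>% forecast(h = 12)

# level = NULL hides the prediction intervals so the point forecasts are readable
fb_fc %>%
  autoplot(fb_clean, level = NULL) +
  labs(
    title = "Facebook Stock Price: Benchmark Forecasts",
    y = "Price ($USD)",
    x = "Year/Month"
  )
```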
5.3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
ℹ Please use `ggtime::gg_tsresiduals()` instead.
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 4 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_rug()`).
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
The residuals do not quite look like white noise. They appear approximately normally distributed and the time plot shows no obvious pattern or trend, but only most, not all, of the autocorrelation bars fall within the blue dashed bounds.
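A portmanteau test can back up the visual check of the residual plots. The sketch below applies a Ljung-Box test to the innovation residuals of the same model; the choice of `lag = 8` (twice the seasonal period for quarterly data) is an assumption, and `lb` is an illustrative name. A small p-value would indicate that the residuals are distinguishable from white noise.

```r
library(fpp3)

recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

fit <- recent_production |> model(SNAIVE(Beer))

# Ljung-Box test on the innovation residuals
# Assumption: lag = 8, i.e. two seasonal periods of quarterly data
lb <- augment(fit) |>
  features(.innov, ljung_box, lag = 8)
lb
```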
5.4 Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
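The code that fitted these models is not shown in this write-up. The following is a sketch of what it could look like, not a record of what was run. Assumptions: Exports is annual and therefore has no seasonal period, so `NAIVE()` is used; Bricks is quarterly, so `SNAIVE()` is used after dropping the missing values, as in 5.1. The names `aus_exports` and `exports_fit` and the 10-year horizon are hypothetical; `bricks_fit` matches the object used in the forecast code further down.

```r
library(fpp3)

# Australian Exports: annual data, so NAIVE() (assumed choice)
aus_exports <- global_economy %>%
  filter(Country == "Australia")

exports_fit <- aus_exports %>%
  model(Naive = NAIVE(Exports))

exports_fit %>% gg_tsresiduals()
exports_fit %>%
  forecast(h = 10) %>%
  autoplot(aus_exports) +
  labs(title = "Australian Exports: Naive Forecast", y = "Exports (% of GDP)")

# Bricks: quarterly data with a seasonal pattern, so SNAIVE() (assumed choice)
brick_data <- aus_production %>%
  filter(!is.na(Bricks))

bricks_fit <- brick_data %>%
  model(SNaive = SNAIVE(Bricks))

bricks_fit %>% gg_tsresiduals()
```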
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 4 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_rug()`).
bricks_fit %>%
  forecast(h = "5 years") %>%
  autoplot(aus_production) +
  labs(
    title = "Clay Brick Production: Seasonal Naive Forecast",
    y = "Millions of bricks"
  )
Warning: Removed 20 rows containing missing values or values outside the scale range
(`geom_line()`).
I don’t think either NAIVE() or SNAIVE() fits well here. The residuals don’t look like white noise, even though the time plot shows no obvious trend or seasonal pattern. There is a significant spike at lag 4 in the ACF, so I think part of the relationship is not being fully captured.
5.7 For your retail time series (from Exercise 7 in Section 2.10):
a) Create a training dataset consisting of observations before 2011 using
c) Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit <- myseries_train %>%
  model(SNAIVE(Turnover))
d) Check the residuals.
fit |> gg_tsresiduals()
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 12 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_rug()`).
The histogram looks roughly like a normal distribution. However, I don’t think the residuals are uncorrelated: the lag-1 autocorrelation of the training residuals (ACF1 in part f) is 0.597.
e) Produce forecasts for the test data
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
f) Compare the accuracy of your forecasts against the actual values.
fit |> accuracy()
# A tibble: 1 × 12
State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Tasmania Clothin… SNAIV… Trai… 0.532 1.76 1.37 3.03 8.57 1 1 0.597
fc |> accuracy(myseries)
# A tibble: 1 × 12
.model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SNAIVE(T… Tasm… Clothin… Test 7.61 8.78 7.64 23.2 23.4 5.60 4.99 0.674
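The training-set MASE and RMSSE of exactly 1 in the first table are not a coincidence. Both measures are scaled by the in-sample error of the seasonal naïve method, so when the fitted model is itself seasonal naïve the training value is 1 by construction, and a test MASE of 5.60 means the test errors are on average 5.6 times larger than the in-sample seasonal naïve errors. The sketch below reproduces that logic in base R, using the built-in `AirPassengers` series as a stand-in (an assumption; it is not the retail series used above).

```r
# Stand-in monthly series from base R (not the retail data)
y <- AirPassengers
m <- frequency(y)                         # seasonal period: 12

train <- window(y, end = c(1958, 12))
test <- window(y, start = c(1959, 1))

# In-sample seasonal naive residuals: y_t - y_{t-m}
train_resid <- diff(as.numeric(train), lag = m)

# Scaling factors used by MASE and RMSSE
mae_scale <- mean(abs(train_resid))
rmse_scale <- sqrt(mean(train_resid^2))

# Seasonal naive forecasts: repeat the last m training values over the test period
fc <- rep(tail(as.numeric(train), m), length.out = length(test))
test_err <- as.numeric(test) - fc

mase_train <- mean(abs(train_resid)) / mae_scale      # 1 by construction
rmsse_train <- sqrt(mean(train_resid^2)) / rmse_scale # 1 by construction
mase_test <- mean(abs(test_err)) / mae_scale
rmsse_test <- sqrt(mean(test_err^2)) / rmse_scale

c(mase_train = mase_train, rmsse_train = rmsse_train,
  mase_test = mase_test, rmsse_test = rmsse_test)
```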
g) How sensitive are the accuracy measures to the amount of training data used?
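Accuracy measures can be quite sensitive to the amount of training data used, but the effect depends on the method. If the training sample is too small, the model underfits the data and struggles to find anything valuable regardless of the data we give it. A large sample does not cause overfitting by itself, although older observations may no longer resemble recent behavior, so a model fitted to them can struggle with data that looks slightly different. For the seasonal naïve method used here, the point forecasts depend only on the last year of the training data, so for a fixed test period the unscaled measures (ME, RMSE, MAE, MPE, MAPE) do not change when earlier training observations are added or dropped; only the scaled measures (MASE, RMSSE) change, because their scaling factor is computed from the training set.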
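A small experiment makes the question concrete. The sketch below uses the built-in `AirPassengers` series as a stand-in for the retail data (an assumption), holds the test period fixed, and varies only where the training set starts. The function name `evaluate` and the three start years are illustrative. Because seasonal naïve forecasts use only the final year of training data, the test RMSE should come out the same for every window, while the MASE shifts with the training-set scaling factor.

```r
# Stand-in monthly series from base R (not the retail data)
y <- AirPassengers
m <- frequency(y)                          # seasonal period: 12

test <- window(y, start = c(1959, 1))      # fixed two-year test period
h <- length(test)

evaluate <- function(train_start) {
  train <- window(y, start = c(train_start, 1), end = c(1958, 12))
  # Seasonal naive: repeat the last m training observations over the horizon
  fc <- rep(tail(as.numeric(train), m), length.out = h)
  err <- as.numeric(test) - fc
  # MASE scaling factor: in-sample MAE of the seasonal naive method
  mae_scale <- mean(abs(diff(as.numeric(train), lag = m)))
  c(train_start = train_start,
    RMSE = sqrt(mean(err^2)),
    MASE = mean(abs(err)) / mae_scale)
}

# One row per training window: long, medium, short
results <- t(sapply(c(1949, 1952, 1955), evaluate))
print(results)
```

Other methods, such as the drift method or the mean method, use the whole training set to form their forecasts, so their unscaled test errors would also move with the training window.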