library(fpp3)
library(seasonal)
library(USgas)
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy) Bricks (aus_production) NSW Lambs (aus_livestock) Household wealth (hh_budget). Australian takeaway food turnover (aus_retail).
aus_pop <- global_economy %>%
filter(!is.na(Population)) %>%
filter(Country == "Australia") %>%
select(Population)
autoplot(aus_pop) +
labs(title = "Australian Population Growth")
aus_pop %>%
model(Drift = RW(Population ~ drift())) %>%
forecast(h = "5 years") %>%
autoplot(aus_pop) +
labs(title = "Australian Population Growth Drift Forecast")
The above forecast for Australian Population from
global_economy was created using
RW(y ~ drift()). The reason for using this technique over
NAIVE(y) and SNAIVE(y) was due to the series’
already existing, and consistent, positive trend and lack of seasonality
rendering other techniques useless.
bricks_pro <- aus_production %>%
filter(!is.na(Bricks)) %>%
select(Bricks)
autoplot(bricks_pro) +
labs(title = "Australian Bricks Production")
bricks_pro %>%
model(Snaive = SNAIVE(Bricks)) %>%
forecast(h = "5 years") %>%
autoplot(bricks_pro) +
labs(title = "Australian Bricks Production Seasonal Naive Forecast")
The above forecast for Bricks from aus_production was
created using SNAIVE(y). Highly repetitive and seasonal
data such as the one found in this series is best predicted with
seasonality in mind. In addition, the confidence interval seen in the
graph can also account for any upward or downward fluctuation that can
take place in series’ trend.
nsw_lambs <- aus_livestock %>%
filter(!is.na(Count)) %>%
filter(State == "New South Wales") %>%
filter(Animal == "Lambs")
autoplot(nsw_lambs) +
labs(title = "New South Wales Lambs Unalived")
nsw_lambs %>%
model(Snaive = SNAIVE(Count)) %>%
forecast(h = "5 years") %>%
autoplot(nsw_lambs) +
labs(title = "New South Wales Lambs Unalived Seasonal Naive Forecast")
The above forecast for NSW Lambs from aus_livestock was
created using SNAIVE(y). Just like the series before this
data is best predicted with seasonality in mind. The confidence interval
can also account for any upward or downward fluctuation just like
before.
hh_wealth <- hh_budget %>%
filter(!is.na(Wealth)) %>%
select(Wealth)
autoplot(hh_wealth) +
labs(title = "Household Wealth")
hh_wealth %>%
model(Drift = RW(Wealth ~ drift())) %>%
forecast(h = "5 years") %>%
autoplot(hh_wealth) +
labs(title = "Household Wealth Drift Forecast")
The above forecast for Household wealth from hh_budget
was created using RW(y ~ drift()). The series shows little
to no seasonality and each country has a clear positive trend regardless
of some fluctuation between the years 1999-2002 and 2007-2008.
takeaway <- aus_retail %>%
filter(!is.na(Turnover)) %>%
filter(Industry == "Takeaway food services")
autoplot(takeaway) +
labs(title = "Australian Takeaway Food Turnover")
takeaway %>%
model(Snaive = SNAIVE(Turnover)) %>%
forecast(h = "5 years") %>%
autoplot(takeaway) +
labs(title = "Australian Takeaway Food Turnover Seasonal Naive Forecast") +
facet_wrap(vars(State), scales = "free", ncol = 2, nrow = 4)
The above forecast for Australian takeaway food turnover from
aus_retail was created using SNAIVE(y). The
turnover data for each region of Australia shows clear seasonality even
with a slightly upward trend this can easily be accounted for through
the confidence intervals for each respective graph.
Use the Facebook stock price (data set gafa_stock) to do the following:
Produce a time plot of the series. Produce forecasts using the drift method and plot them. Show that the forecasts are identical to extending the line drawn between the first and last observations. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
fb_stock <- gafa_stock %>%
filter(Symbol == "FB") %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
autoplot(fb_stock) +
labs(title = "Facebook Stock")
fb_stock %>%
model(Drift = RW(Open ~ drift())) %>%
forecast(h = 250) %>%
autoplot(fb_stock) +
labs(title = "Facebook Stock Drift Forecast")
fb_stock %>%
model(Drift = RW(Open ~ drift())) %>%
forecast(h = 250) %>%
autoplot(fb_stock, level = NULL) +
geom_segment(aes(x = min(day), y = Open[which.min(day)],
xend = max(day), yend = Open[which.max(day)]),
color = "red", linetype = "dashed") +
labs(title = "Facebook Stock Extrapolated Line Segment")
In order to show that the drift forecast is identical to extending
the line drawn between the first and last observation, a second plot was
created using geom_segment that drew a visible dashed line
between the first and last points of the graph. As can be seen, the
drift forecast is a clear continuation of this line.
fb_stock %>%
model(Mean = MEAN(Open),
Naive = NAIVE(Open),
Snaive = SNAIVE(Open)) %>%
forecast(h = 250) %>%
autoplot(fb_stock, level = NULL) +
labs(title = "Facebook Stock Mean, Naive and Seasonal Naive Forecasts")
When using some of the other benchmark functions to forecast the same data set, the naive technique seems to have had the best results. This statement, of course, should be taken with a grain of salt since the forecast extends all the way to 250 from the last data point. With this in mind, the naive forecast can hold true for a good few days but will then begin to deviate from the true value. Still, a naive forecast does perform better when compared to mean and seasonal naive forecasts.
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
# Check for significance
augment(fit) %>%
features(.resid, ljung_box, lag=8)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Beer) 32.3 0.0000834
What do you conclude?
As can be seen from the visuals, there is a consistent up and down pattern in the innovation residuals chart, there are 4 lines on the lag chart extending past or near the boundary, and the distribution seems to be skewed slightly to the right. If this wasn’t enough to entail room for improvement in the forecast, a Ljung-Box test was conducted which led to a large Q* value and significant p-value, indicating that this is not just white noise.
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
aus_exports <- global_economy %>%
filter(!is.na(Exports)) %>%
filter(Country == "Australia") %>%
select(Exports)
autoplot(aus_exports)
afit <- aus_exports %>%
model(NAIVE(Exports))
afit %>%
gg_tsresiduals()
afit %>%
forecast() %>%
autoplot(aus_exports)
augment(afit) %>%
features(.resid, ljung_box, lag=10)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 NAIVE(Exports) 16.4 0.0896
The above visuals show us that the innovation residuals data for
aus_exports from global_economy has a slight pattern but
increases in volatility towards the end, there is just one line in the
lags chart out of bounds, and it is more or less normally distributed.
Add on top of that that the Q* value is low and the p-value is
insignificant and we can conclude that what is remaining from the
forecast is mostly white noise.
bricks <- aus_production %>%
filter(!is.na(Bricks)) %>%
select(Bricks)
autoplot(bricks)
bfit <- bricks %>%
model(SNAIVE(Bricks))
bfit %>%
gg_tsresiduals()
bfit %>%
forecast() %>%
autoplot(bricks)
augment(bfit) %>%
features(.resid, ljung_box, lag=8)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Bricks) 274. 0
The above visuals show us that the innovation residuals data for
bricks from aus_production has an obvious pattern with an
increase in volatility towards the middle, the majority of the lag
segments surpass the boundaries, and its distribution is skewed to the
left. In addition, the Q* value is very large and the p-value is very
significant which helps us conclude that what is remaining from the
forecast is not just white noise and can be improved to take what is
remaining into account.
For your retail time series (from Exercise 7 in Section 2.10):
set.seed(1)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Create a training dataset consisting of observations before 2011 using
myseries_train <- myseries %>%
filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
mfit <- myseries_train %>%
model(SNAIVE(Turnover))
Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?
mfit %>%
gg_tsresiduals()
As can be seen above, the residuals do not appear to be uncorrelated due to the innovation residuals chart presenting high dependency after the year 1995, almost half of the segments on the lag chart surpassing the boundaries, and their distribution being narrow and skewed to the left.
Produce forecasts for the test data
fc <- mfit %>%
forecast(new_data = anti_join(myseries, myseries_train))
fc %>%
autoplot(myseries)
Compare the accuracy of your forecasts against the actual values.
mfit %>%
accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queensl… Clothin… SNAIV… Trai… 5.50 16.5 12.1 5.39 10.6 1 1 0.819
fc %>%
accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Quee… Clothin… Test 26.9 40.2 29.3 11.3 12.6 2.43 2.44 0.752
How sensitive are the accuracy measures to the amount of training data used?
The accuracy measures seem to be very sensitive to the amount of training data used. As can be seen in the results above, the measures for the test data are drastically different from the measures for the training data. This is most likely due to the fact that the training data takes into account absolutely everything before what is going to be tested irrespective of any sudden changes or fluctuations that could have taken place towards the end of the data. Meanwhile, the test data, depending on the very last seasonal period in this case, uses the pattern at the very end and repeats it over and over. Given the drastic change in trend between the years 2005-2010, this would lead to the forecast having the errors we see above.