Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget)
Australian takeaway food turnover (aus_retail).
Use the Facebook stock price (data set gafa_stock) to do
the following:
Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
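As a starting point for the drift exercise above, the following base R sketch checks the identity numerically. It uses a synthetic series as a stand-in for the Facebook closing price (the data and the variable names are assumptions, not part of the exercise) and compares the drift forecasts with the straight line through the first and last observations.

```r
# Synthetic stand-in for a price series (hypothetical data, not gafa_stock).
set.seed(1)
y <- 100 + cumsum(rnorm(250))
n <- length(y)   # number of observations
h <- 1:20        # forecast horizons

# Drift method: last value plus h times the average historical change.
avg_change <- mean(diff(y))
drift_fc <- y[n] + h * avg_change

# Line through (1, y[1]) and (n, y[n]), extended to times n + h.
slope <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * (n + h - 1)

# The two sets of forecasts agree up to floating-point error.
all.equal(drift_fc, line_fc)
```

The agreement follows because the average of the one-step changes telescopes to (last value - first value) / (n - 1), which is exactly the slope of that line.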
Produce forecasts for the 7 Victorian series in
aus_livestock using SNAIVE(). Plot the
resulting forecasts including the historical data. Is this a reasonable
benchmark for these series?
Are the following statements true or false? Explain your answer.
Good forecast methods should have normally distributed residuals.
A model with small residuals will give good forecasts.
The best measure of forecast accuracy is MAPE.
If your model does not forecast well, you should make it more complicated.
Always choose the model with the best forecast accuracy as measured on the test set.
Select one of the time series as follows (but choose your own seed value):
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
Create a training dataset consisting of observations before 2011.
Check that your data have been split appropriately by producing the following plot.
Calculate seasonal naive forecasts using SNAIVE()
applied to your training data (myseries_train).
Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?
Produce forecasts for the test data.
Compare the accuracy of your forecasts against the actual values.
How sensitive are the accuracy measures to the amount of training data used? (Challenge)
Consider the number of pigs slaughtered in New South Wales (data set
aus_livestock).
Produce some plots of the data in order to become familiar with it.
Create a training set of 486 observations.
Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?
Check the residuals of your preferred method. Do they resemble white noise?
We will use the bricks data from aus_production
(Australian quarterly clay brick production 1956–2005) for this
exercise.
Use an STL decomposition to calculate the trend-cycle and seasonal indices for both additive and multiplicative decompositions. (Experiment with fixed (periodic) and changing seasonality.)
Compute and plot the seasonally adjusted data.
Use a naive method to produce forecasts of the seasonally adjusted data.
Use decomposition_model() to reseasonalise the
results, giving forecasts for the original data.
Do the residuals look uncorrelated?
Compare forecasts from decomposition_model() with
those from SNAIVE(), using a test set comprising the last 2
years of data. Which is better?
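The reseasonalising step in the bricks exercise above can be sketched by hand in base R. This is a minimal illustration on synthetic quarterly data (an assumption, not the bricks series): it uses `stats::stl()` with fixed seasonality, forecasts the seasonally adjusted series with the naive method, and adds back a seasonal naive forecast of the seasonal component. It is not a substitute for decomposition_model().

```r
# Synthetic quarterly series (hypothetical data, not aus_production).
set.seed(10)
n <- 80
pattern <- c(-30, 10, 35, -15)                       # assumed seasonal pattern
y <- ts(300 + 1.5 * seq_len(n) + rep(pattern, n / 4) + rnorm(n, sd = 8),
        frequency = 4, start = c(1986, 1))

# STL with fixed (periodic) seasonality.
fit <- stl(y, s.window = "periodic")
seasonal <- as.numeric(fit$time.series[, "seasonal"])
seas_adj <- as.numeric(y) - seasonal                 # seasonally adjusted data

h <- 8                                               # two years ahead
naive_fc <- rep(seas_adj[n], h)                      # naive forecast of adjusted data
seas_fc <- rep(tail(seasonal, 4), length.out = h)    # seasonal naive for seasonal part
fc <- naive_fc + seas_fc                             # reseasonalised forecasts
fc
```

Because the series length is a multiple of 4, the last four seasonal values line up with quarters 1 to 4 of the first forecast year.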
tourism contains quarterly visitor nights (in thousands)
from 1998 to 2017 for 76 regions of Australia.
Extract data from the Gold Coast region using
filter() and aggregate total overnight trips (sum over
Purpose) using summarise(). Call this new
dataset gc_tourism.
Using slice() or filter(), create three
training sets for this data excluding the last 1, 2 and 3 years. For
example,
gc_train_1 <- gc_tourism %>% slice(1:(n()-4)).
Compute one year of forecasts for each training set using the
seasonal naïve (SNAIVE()) method. Call these
gc_fc_1, gc_fc_2 and gc_fc_3,
respectively.
Use accuracy() to compare the test set forecast
accuracy using MAPE. Comment on these.
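To make the comparison in the tourism exercise above concrete, here is a hedged base R sketch of what the seasonal naive forecasts and their MAPE amount to. The quarterly data are synthetic, and the helper names `snaive_fc` and `mape` are hypothetical; the exercise itself is meant to be done with SNAIVE() and accuracy().

```r
# Synthetic quarterly series (hypothetical data, not the tourism data set).
set.seed(42)
n <- 80
quarter <- rep(1:4, length.out = n)
y <- 500 + 2 * seq_len(n) + c(60, -20, -50, 10)[quarter] + rnorm(n, sd = 15)

# Seasonal naive: each forecast repeats the value from the same season
# in the last observed year of the training data.
snaive_fc <- function(train, h, m = 4) {
  rep(tail(train, m), length.out = h)
}

# Mean absolute percentage error, in percent.
mape <- function(actual, forecast) {
  mean(abs((actual - forecast) / actual)) * 100
}

# Drop the last 1, 2 and 3 years in turn and forecast the next year each time.
results <- sapply(1:3, function(k) {
  n_train <- n - 4 * k
  train <- y[seq_len(n_train)]
  test <- y[(n_train + 1):(n_train + 4)]
  mape(test, snaive_fc(train, h = 4))
})
names(results) <- paste0("drop_", 1:3, "yr")
results
```

Each MAPE here rests on only four test observations, so differences between the three values can be large for reasons unrelated to the method.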
From gafa_stock, select the Apple stock closing prices
and plot them.
Apply time series cross-validation with a minimum length of 10, growing by 1 at each step. (This creates the cross-validation subsets.)
Estimate a random walk with drift model.
Repeat the previous step using a mean model.
Compare the accuracy of the two models. Which model performed best?
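The cross-validation steps above can be sketched in base R as an expanding-window loop. This is an illustration under stated assumptions: the series is synthetic rather than the Apple closing price, only one-step-ahead forecasts are evaluated, and the forecasts are computed by hand instead of through fitted model objects.

```r
# Synthetic series (hypothetical data, not gafa_stock).
set.seed(7)
y <- 50 + cumsum(rnorm(120, mean = 0.2))
n <- length(y)
min_len <- 10                       # minimum training length

# Forecast origins: training sets of length 10, 11, ..., n - 1.
origins <- min_len:(n - 1)
err_drift <- numeric(length(origins))
err_mean <- numeric(length(origins))

for (i in seq_along(origins)) {
  t_end <- origins[i]
  train <- y[1:t_end]
  # Random walk with drift: last value plus average change so far.
  fc_drift <- train[t_end] + (train[t_end] - train[1]) / (t_end - 1)
  # Mean model: average of all training observations.
  fc_mean <- mean(train)
  err_drift[i] <- y[t_end + 1] - fc_drift
  err_mean[i] <- y[t_end + 1] - fc_mean
}

# Root mean squared one-step forecast error for each method.
rmse <- function(e) sqrt(mean(e^2))
c(drift = rmse(err_drift), mean = rmse(err_mean))
```

Each training set grows by one observation, and each error is computed on the observation immediately after the training set, so no forecast uses information from its own test point.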
From the prices data set, plot the wheat series (do not
forget to remove the NA values).
Fit a random walk with drift and forecast the next 50 periods. Plot the forecasts.
Suppose you do not trust the estimated prediction intervals. Instead, bootstrap 500 scenarios.
Plot all the scenarios.
Under what conditions might prediction intervals from bootstrapped residuals be reasonable?
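One way to picture the bootstrapped scenarios in the wheat exercise above is the base R sketch below. It is a sketch under assumptions: the series is synthetic rather than the wheat prices, and future paths are built by resampling the one-step residuals of a hand-computed random walk with drift.

```r
# Synthetic series (hypothetical data, not the prices data set).
set.seed(123)
y <- 200 + cumsum(rnorm(60, mean = 1, sd = 5))
n <- length(y)
h <- 50          # forecast horizon
n_paths <- 500   # number of bootstrapped scenarios

# Random walk with drift: residuals are the changes minus the average change.
changes <- diff(y)
drift <- mean(changes)
res <- changes - drift

# Each column of `paths` is one simulated future, built by adding the drift
# and a resampled residual at every step.
paths <- replicate(n_paths, {
  e <- sample(res, h, replace = TRUE)
  y[n] + cumsum(drift + e)
})

# Pointwise 80% interval from the simulated paths.
interval_80 <- apply(paths, 1, quantile, probs = c(0.1, 0.9))

# Plot all scenarios with the point forecast on top.
matplot(paths, type = "l", lty = 1, col = "grey70",
        xlab = "Horizon", ylab = "Simulated value")
lines(y[n] + drift * seq_len(h), lwd = 2)
```

Resampling with `sample()` draws each future error independently from the same pool of past residuals, which is the assumption to keep in mind when judging the resulting intervals.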