HW#3

Question 1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget)
Australian takeaway food turnover (aus_retail).

Question 2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
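
As a hint for the third item above, the drift forecast can be written out directly. This is a sketch under the usual textbook definition of the drift method, with y_1, ..., y_T denoting the observed series and h the forecast horizon (this notation is an assumption, not part of the assignment):

```latex
% Drift method: last value plus h times the average historical change.
\hat{y}_{T+h|T}
  = y_T + \frac{h}{T-1}\sum_{t=2}^{T}\left(y_t - y_{t-1}\right)
  = y_T + h\,\frac{y_T - y_1}{T-1}

% Straight line through (1, y_1) and (T, y_T), evaluated at t = T + h:
\ell(T+h)
  = y_1 + (T+h-1)\,\frac{y_T - y_1}{T-1}
  = y_T + h\,\frac{y_T - y_1}{T-1}
```

The sum telescopes to y_T - y_1, so both expressions agree for every h. The exercise still asks you to confirm this on the plot.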

Question 3

Produce forecasts for the 7 Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?

Question 4

Are the following statements true or false? Explain your answer.

Good forecast methods should have normally distributed residuals.
A model with small residuals will give good forecasts.
The best measure of forecast accuracy is MAPE.
If your model does not forecast well, you should make it more complicated.
Always choose the model with the best forecast accuracy as measured on the test set.

Question 5

Select one of the time series as follows (but choose your own seed value):

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Create a training dataset consisting of observations before 2011.
Check that your data have been split appropriately by producing the following plot.
Calculate seasonal naive forecasts using SNAIVE() applied to your training data (myseries_train).
Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?
Produce forecasts for the test data.
Compare the accuracy of your forecasts against the actual values.
How sensitive are the accuracy measures to the amount of training data used? (Challenge)
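
The split and the check plot mentioned in the first two items above could look roughly like the following. This is a minimal sketch, assuming the fpp3 package is installed and that the selected series keeps the Month and Turnover columns of aus_retail; the red overlay is just one possible way to show the training portion.

```r
library(fpp3)

# Pick one retail series at random (use your own seed value).
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Training data: all observations before 2011.
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

# Full series in black with the training portion overlaid in red;
# the part that stays black is the test period.
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")
```

If the split is correct, the red line should stop at the end of 2010.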

Question 6

Consider the number of pigs slaughtered in New South Wales (data set aus_livestock).

Produce some plots of the data in order to become familiar with it.
Create a training set of 486 observations.
Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?
Check the residuals of your preferred method. Do they resemble white noise?

Question 7

We will use the bricks data from aus_production (Australian quarterly clay brick production 1956–2005) for this exercise.

Use an STL decomposition to calculate the trend-cycle and seasonal indices for both an additive and a multiplicative decomposition. (Experiment with having fixed (periodic) or changing seasonality.)
Compute and plot the seasonally adjusted data.
Use a naive method to produce forecasts of the seasonally adjusted data.
Use decomposition_model() to reseasonalise the results, giving forecasts for the original data.
Do the residuals look uncorrelated?
Compare forecasts from decomposition_model() with those from SNAIVE(), using a test set comprising the last 2 years of data. Which is better?

Question 8

tourism contains quarterly visitor nights (in thousands) from 1998 to 2017 for 76 regions of Australia.

Extract data from the Gold Coast region using filter() and aggregate total overnight trips (sum over Purpose) using summarise(). Call this new dataset gc_tourism.
Using slice() or filter(), create three training sets for this data excluding the last 1, 2 and 3 years. For example, gc_train_1 <- gc_tourism %>% slice(1:(n()-4)).
Compute one year of forecasts for each training set using the seasonal naïve (SNAIVE()) method. Call these gc_fc_1, gc_fc_2 and gc_fc_3, respectively.
Use accuracy() to compare the test set forecast accuracy using MAPE. Comment on these.

Question 9

a. From the gafa_stock data set, select the Apple closing prices and plot them.
b. Apply cross-validation with a minimum length of 10, growing by 1 each step (this creates the cross-validation subsets).
c. Estimate a random walk with drift model.
d. Repeat step c using a mean model.
e. Compare the accuracy of both models. Which model performed best?
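
One possible way to set up the cross-validation step above is with stretch_tsibble() from the tsibble package. The sketch below assumes fpp3 is installed and that "AAPL" is the Apple symbol in gafa_stock. The names aapl, aapl_cv and day are hypothetical, and re-indexing by trading day is an assumption made because the dates skip non-trading days.

```r
library(fpp3)

# Apple closing prices, re-indexed by trading day so the series is regular.
aapl <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE) %>%
  select(Symbol, day, Date, Close)

# Expanding-window folds: the first fold has 10 observations and each
# later fold adds one more. The fold number is stored in the .id column.
aapl_cv <- aapl %>%
  stretch_tsibble(.init = 10, .step = 1)

# Number of folds created.
max(aapl_cv$.id)
```

Models fitted to aapl_cv with model() are estimated once per fold, so this can take a while on the full series.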

Question 10

From the data set prices, plot the wheat series (do not forget to remove the NA values).
Fit a random walk with drift model and forecast the next 50 periods. Plot the forecasts.
Suppose you do not trust the estimated prediction intervals. Instead, bootstrap 500 scenarios.
Plot all scenarios.
Under what conditions might prediction intervals from bootstrapped residuals be reasonable?