Eric_Hirsch_624_Homework

Exercises 1, 2, 3, 4, 7

1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

1.a Australian Population (global_economy)

The drift method suits this time series best. There is no apparent seasonality, and the upward trend is strong and close to linear. The resulting forecast looks quite reasonable.
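A minimal sketch of how such a drift forecast could be produced with the fpp3 packages. The 10-year horizon and the object names (`aus_pop`, `pop_fit`, `pop_fc`) are assumptions for illustration, not taken from the original analysis.

```r
library(fpp3)

# global_economy is keyed by Country, so keep Australia only
aus_pop <- global_economy |>
  filter(Country == "Australia")

# Random walk with drift: each step ahead adds the average historical change
pop_fit <- aus_pop |>
  model(Drift = RW(Population ~ drift()))

# The 10-year horizon is an illustrative choice
pop_fc <- pop_fit |> forecast(h = 10)

# Plot the forecasts on top of the historical series
pop_fc |> autoplot(aus_pop)
```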

1.b Bricks (aus_production)

Brick production is highly seasonal, and the trendline isn’t clear. We use SNAIVE for this time series.

1.c NSW Lambs (aus_livestock)

Lamb slaughter also has a strong seasonal component, so we use SNAIVE.

1.d Household wealth (hh_budget)

Household wealth does not exhibit seasonality and has a discernible, though uneven, trend. We use drift.

1.e Australian takeaway food turnover (aus_retail)

We first need to create a new tsibble that combines the data from all Australian states for each month. The combined series has both trend and seasonality; we choose SNAIVE because, of the three candidate methods, it is the only one that captures the seasonal pattern.
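The aggregation step can be sketched as follows. This assumes that "combining" the states means summing `Turnover` over all states for the `"Takeaway food services"` industry; the three-year (36-month) horizon and the object names are illustrative choices.

```r
library(fpp3)

# Sum turnover across all states for each month.
# summarise() on a tsibble aggregates over the keys and keeps the time index.
takeaway <- aus_retail |>
  filter(Industry == "Takeaway food services") |>
  summarise(Turnover = sum(Turnover))

# Seasonal naive: each forecast repeats the value from the same month last year
takeaway_fc <- takeaway |>
  model(SNaive = SNAIVE(Turnover)) |>
  forecast(h = 36)  # 36 months ahead (assumed horizon)

takeaway_fc |> autoplot(takeaway)
```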

2 Use the Facebook stock price (data set gafa_stock) to do the following:

2.a Produce a time plot of the series.

2.b-c Produce forecasts using the drift method and plot them. Show that the forecasts are identical to extending the line drawn between the first and last observations.

We can see that the drift forecasts lie exactly on the straight line through the first and last observations, extended beyond the end of the series.

## [1] 54.71

## [1] 131.09
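The reason is that the average change between consecutive observations telescopes to (y_T - y_1) / (T - 1), which is also the slope of the line joining the first and last points. A base R sketch of that equivalence follows. It assumes the two values printed above are the first and last closing prices; the interior values and the function names are made up for illustration.

```r
# Drift forecast: last value plus h times the average historical change
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)
  y[n] + slope * seq_len(h)
}

# Straight line through (1, y[1]) and (n, y[n]), evaluated at n + 1, ..., n + h
line_extension <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)
  y[1] + slope * (n + seq_len(h) - 1)
}

# Endpoints taken from the output above; the interior values are hypothetical
y <- c(54.71, 70.2, 95.5, 180.3, 131.09)

# TRUE: both constructions give the same forecasts
all.equal(drift_forecast(y, 3), line_extension(y, 3))
```

The interior values do not matter: only the first and last observations enter either formula.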

2.d Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

We also try NAIVE and MEAN. None of the three methods seems reasonable, given the significant shift in trend, after which the stock price appears to be declining.
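A sketch of how the three benchmarks could be fitted side by side. Because trading days are irregularly spaced, the series is re-indexed by trading-day number before modelling; the 60-day horizon and the object names are assumptions.

```r
library(fpp3)

# Re-index by trading day so the tsibble has a regular index
fb <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)

# Fit the three benchmark methods to the closing price
fb_fit <- fb |>
  model(
    Mean  = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  )

# 60 trading days ahead (illustrative horizon)
fb_fc <- fb_fit |> forecast(h = 60)

# level = NULL hides prediction intervals so the three paths are easy to compare
fb_fc |> autoplot(fb, level = NULL)
```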

3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts.

There is no discernible pattern in the residuals or the ACF, although the distribution of the residuals is odd, with two peaks around 0. We speculate that the pronounced seasonality is the reason: in that case, forecasts are likely to either overestimate or underestimate the actual value by a noticeable margin.
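A sketch of the residual check and forecast plot for this exercise. The Ljung-Box test is an optional extra that is not part of the original answer, and its `lag = 8` (two seasonal periods) is an assumption.

```r
library(fpp3)

# Quarterly beer production from 1992 onwards
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

beer_fit <- recent_production |>
  model(SNAIVE(Beer))

# Time plot, ACF and histogram of the residuals in one figure
beer_fit |> gg_tsresiduals()

# Optional portmanteau test for remaining autocorrelation (assumed lag of 8)
lb <- augment(beer_fit) |>
  features(.innov, ljung_box, lag = 8)
lb

# Forecasts from the seasonal naive model
beer_fit |> forecast() |> autoplot(recent_production)
```

A small `lb_pvalue` would suggest the residuals are distinguishable from white noise, which is a useful cross-check on the visual impression from the ACF plot.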

4. Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

The Exports residuals show little patterning and are approximately normally distributed around 0. The ACF shows no discernible pattern.

However, the Bricks residuals show bias, visible patterning, and strong seasonal patterns in the ACF. This may be because SNAIVE captures the seasonality but not the trend.

7. For your retail time series (from Exercise 8 in Section 2.10), create a training dataset consisting of observations before 2011. Check that your data have been split appropriately. Fit a seasonal naïve model using SNAIVE() applied to your training data. Check the residuals. Do the residuals appear to be uncorrelated and normally distributed? Produce forecasts for the test data. Compare the accuracy of your forecasts against the actual values. How sensitive are the accuracy measures to the amount of training data used?

The forecasting has some obvious problems: 1) the training and test portions have very different characteristics; 2) as a result, the forecasts show significant bias; and 3) the errors are significantly autocorrelated. The test RMSE (7.03) is more than twice the training RMSE (2.85).

## # A tibble: 1 × 10
##   .model   .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE() Training  1.22  2.85  1.93  5.91  11.1     1     1 0.806

## # A tibble: 1 × 10
##   .model   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE() Test   6.05  7.03  6.11  15.1  15.2  3.16  2.47 0.664
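A sketch of the split, fit and accuracy comparison. The retail series used in this report is not identified here, so a randomly drawn `aus_retail` series stands in for it (the seed and object names are assumptions), and the numbers it produces will differ from the tables above.

```r
library(fpp3)

# Stand-in series: one aus_retail series drawn at random (assumed seed)
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Training data: before 2011; test data: 2011 onwards
myseries_train <- myseries |> filter(year(Month) < 2011)
myseries_test  <- myseries |> filter(year(Month) >= 2011)

retail_fit <- myseries_train |> model(SNAIVE(Turnover))

# In-sample (training) accuracy
train_acc <- retail_fit |> accuracy()

# Forecast the test period and score against the full series
retail_fc <- retail_fit |> forecast(new_data = myseries_test)
test_acc  <- retail_fc |> accuracy(myseries)

train_acc
test_acc
```

Passing the full series to `accuracy()` lets the scaled measures (MASE, RMSSE) use the training portion for their scaling factor.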

If we enlarge the training set, the result improves. Indeed, the test set now has a lower RMSE (2.43) than the training set (3.52). Whether a larger training set improves the result depends on the situation. However, SNAIVE is unlikely to overfit; in this case, restricting the training data to observations before 2011 left the model seriously underfit.

## # A tibble: 1 × 10
##   .model   .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE() Training  1.18  3.52  2.46  5.17  11.2     1     1 0.822

## # A tibble: 1 × 10
##   .model   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE() Test  0.172  2.43  2.13 0.538  5.51 0.868 0.691 0.566
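One property of SNAIVE bears on this comparison: its point forecasts simply repeat the last observed season, so they depend only on the final seasonal cycle of the training data. Adding earlier history leaves the test-set point errors unchanged (scaled measures such as MASE and RMSSE can still move, because their scaling factor is computed on the training data), whereas moving the split point later changes both the forecasts and the test set. The base R sketch below illustrates this with a made-up quarterly series; the function names and numbers are hypothetical.

```r
# Seasonal naive point forecasts: repeat the last m observations for h steps
snaive_forecast <- function(y, m, h) {
  n <- length(y)
  last_season <- y[(n - m + 1):n]
  last_season[((seq_len(h) - 1) %% m) + 1]
}

rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))

# Hypothetical quarterly series: 4 years of data (m = 4)
y <- c(12, 15, 13, 18,
       13, 16, 15, 20,
       15, 18, 16, 22,
       16, 20, 18, 24)

train_long  <- y[1:12]   # three years of history
train_short <- y[5:12]   # only the two most recent years
test        <- y[13:16]  # final year held out

fc_long  <- snaive_forecast(train_long,  m = 4, h = 4)
fc_short <- snaive_forecast(train_short, m = 4, h = 4)

# Same forecasts, and therefore the same test RMSE, from both training windows
identical(fc_long, fc_short)
rmse(test, fc_long)
```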

Eric_Hirsch_624_Homework_3

2023-02-16