Homework

5.1) Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

The series shows a steady upward trend with no seasonal component, so the drift method is the most appropriate.

Bricks (aus_production)

Seasonal naive is the most appropriate because the quarterly Bricks data follow a repeating seasonal pattern.

NSW Lambs (aus_livestock)

The seasonal component is not significant and the trend fluctuates, so the naive method is the most appropriate.

Household wealth (hh_budget).

The series shows a clear trend and no seasonality, so the drift method is the most appropriate.

Australian takeaway food turnover (aus_retail)

Although the seasonal component is small relative to the trend, the monthly series still repeats a yearly pattern, so the seasonal naive method is the most appropriate.
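The choices above could be turned into forecasts along the following lines. This is a minimal sketch, assuming the fpp3 package is installed; the forecast horizons, the object names, and the filter labels ("Australia", "New South Wales", "Lambs") are assumptions about how the series are stored rather than part of the exercise.

```r
library(fpp3)

# Australian population: annual, trending, no seasonality -> drift
aus_pop <- global_economy |>
  filter(Country == "Australia")
fc_pop <- aus_pop |>
  model(RW(Population ~ drift())) |>
  forecast(h = 10)
fc_pop |> autoplot(aus_pop)

# NSW lambs: monthly slaughter counts -> naive
nsw_lambs <- aus_livestock |>
  filter(State == "New South Wales", Animal == "Lambs")
fc_lambs <- nsw_lambs |>
  model(NAIVE(Count)) |>
  forecast(h = 12)
fc_lambs |> autoplot(nsw_lambs)

# Household wealth: annual, one series per country -> drift
fc_wealth <- hh_budget |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = 5)
fc_wealth |> autoplot(hh_budget)
```

The same pattern (filter, `model()`, `forecast()`, `autoplot()`) applies to the Bricks and takeaway series with `SNAIVE()` in place of the model function.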

5.2) Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

Produce forecasts using the drift method and plot them.

Show that the forecasts are identical to extending the line drawn between the first and last observations.

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

Since the series has no seasonal component, SNAIVE is not suitable. The mean and naive methods do not capture the trend in the data, so they are also poor choices. That leaves the drift method as the best of the benchmarks.
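The claim that drift forecasts extend the line between the first and last observations can be checked numerically. The sketch below uses base R and a short made-up series in place of the Facebook closing prices; the values and variable names are illustrative assumptions only.

```r
# Made-up series standing in for the closing prices (illustration only)
y <- c(54.7, 55.1, 53.9, 57.2, 58.0, 56.4, 60.3)
n <- length(y)
h <- 1:5

# Drift forecast: last value plus h times the average historical change
avg_change <- mean(diff(y))
drift_fc <- y[n] + h * avg_change

# Straight line through (1, y[1]) and (n, y[n]), extended to times n + h
slope <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * (n + h - 1)

# The average change telescopes to (y[n] - y[1]) / (n - 1),
# so the two sets of forecasts agree
all.equal(drift_fc, line_fc)
```

The comparison works for any series because the sum of the first differences collapses to the last value minus the first.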

5.3) Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

Both tests have a p-value below 0.05, so the hypothesis that the residuals are white noise is rejected: the residuals are distinguishable from white noise.

## Warning: Removed 4 rows containing missing values (`geom_line()`).

## Warning: Removed 4 rows containing missing values (`geom_point()`).

## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

Box-Pierce Test

## # A tibble: 1 × 3
##   .model       bp_stat bp_pvalue
##   <chr>          <dbl>     <dbl>
## 1 SNAIVE(Beer)    34.4  0.000160

Ljung-Box Test

## # A tibble: 1 × 3
##   .model       lb_stat lb_pvalue
##   <chr>          <dbl>     <dbl>
## 1 SNAIVE(Beer)    37.8 0.0000412
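The diagnostics above could be produced with something like the following. This is a sketch assuming the fpp3 package; `lag = 10` is an assumption (it appears consistent with the reported p-values), and the object names are illustrative.

```r
library(fpp3)

# Quarterly beer production from 1992 onwards
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

# Seasonal naive model
fit_beer <- recent_production |>
  model(SNAIVE(Beer))

# Residual time plot, ACF and histogram
fit_beer |> gg_tsresiduals()

# Portmanteau tests on the innovation residuals (lag is an assumption)
bp <- augment(fit_beer) |> features(.innov, box_pierce, lag = 10)
lb <- augment(fit_beer) |> features(.innov, ljung_box, lag = 10)
bp
lb

# Forecasts plotted against the data
fit_beer |> forecast() |> autoplot(recent_production)
```

The first four residuals are missing because a seasonal naive model has no fitted value for the first year, which is where the "Removed 4 rows" warnings come from.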

5.4) Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Exports is an annual series with no seasonality, so NAIVE is the appropriate method.

## Warning: Removed 1 row containing missing values (`geom_line()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).

## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).

Both tests produce a p-value greater than 0.05, so the residuals are not distinguishable from white noise.

## # A tibble: 1 × 4
##   Country   .model         bp_stat bp_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    14.6     0.148

## # A tibble: 1 × 4
##   Country   .model         lb_stat lb_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    16.4    0.0896
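The link between the test statistics and the p-values can be sketched in base R. Both statistics are compared against a chi-squared distribution whose degrees of freedom equal the number of lags tested; the value `lag <- 10` with no degrees-of-freedom adjustment is an assumption inferred from the printed numbers, not something stated in the output.

```r
# Statistics as printed in the output above (rounded)
bp_stat <- 14.6
lb_stat <- 16.4

# Assumption: 10 lags, no degrees-of-freedom adjustment
lag <- 10

# Upper-tail chi-squared probabilities
bp_pvalue <- pchisq(bp_stat, df = lag, lower.tail = FALSE)
lb_pvalue <- pchisq(lb_stat, df = lag, lower.tail = FALSE)
round(c(bp = bp_pvalue, lb = lb_pvalue), 3)

# Decision at the 5% level: TRUE means white noise is not rejected
c(bp = bp_pvalue, lb = lb_pvalue) > 0.05
```

Because the input statistics are rounded, the recomputed p-values should land close to, but not exactly on, the reported 0.148 and 0.0896.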

Bricks has a seasonal component, so SNAIVE is the appropriate method.

## Warning: Removed 24 rows containing missing values (`geom_line()`).

## Warning: Removed 24 rows containing missing values (`geom_point()`).

## Warning: Removed 24 rows containing non-finite values (`stat_bin()`).

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 8 rows containing missing values (`()`).

## Warning: Removed 20 rows containing missing values (`geom_line()`).

Both tests have a p-value less than 0.05, so the residuals do not resemble white noise.

## # A tibble: 1 × 3
##   .model         bp_stat bp_pvalue
##   <chr>            <dbl>     <dbl>
## 1 SNAIVE(Bricks)    292.         0

## # A tibble: 1 × 3
##   .model         lb_stat lb_pvalue
##   <chr>            <dbl>     <dbl>
## 1 SNAIVE(Bricks)    301.         0
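Many of the warnings above look like the result of missing values in Bricks. A possible way to avoid them is to drop the missing quarters before fitting, as sketched below. This assumes the fpp3 package and that the missing values sit only at the end of the series, so removing them leaves no gaps; `lag = 10` and the object names are also assumptions.

```r
library(fpp3)

# Keep only quarters where Bricks was recorded
bricks <- aus_production |>
  filter(!is.na(Bricks)) |>
  select(Quarter, Bricks)

# Seasonal naive model on the cleaned series
fit_bricks <- bricks |>
  model(SNAIVE(Bricks))

# Residual diagnostics and Ljung-Box test (lag is an assumption)
fit_bricks |> gg_tsresiduals()
lb_bricks <- augment(fit_bricks) |>
  features(.innov, ljung_box, lag = 10)
lb_bricks

# Forecasts now start right after the last recorded quarter
fit_bricks |> forecast() |> autoplot(bricks)
```

The four warnings tied to the first year of residuals would remain, since a seasonal naive model cannot produce fitted values for the first season.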

5.7) For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

library(fpp3)  # attaches tsibble, fable, feasts and the example datasets

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals. The residuals are autocorrelated (the training-set ACF1 reported below is 0.768) and do not appear normally distributed.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values (`geom_line()`).

## Warning: Removed 12 rows containing missing values (`geom_point()`).

## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601
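The training-set MASE and RMSSE of exactly 1 are expected for a seasonal naive model, because both measures are scaled by the in-sample errors of the seasonal naive method itself. The base R sketch below illustrates this with a made-up quarterly series; the numbers and variable names are assumptions for illustration.

```r
# Made-up quarterly series (seasonal period m = 4), illustration only
y <- c(10, 14, 9, 12, 11, 16, 10, 13, 13, 17, 12, 15)
m <- 4
n <- length(y)

# Seasonal naive fitted values: the observation one season earlier
snaive_fitted <- c(rep(NA, m), y[1:(n - m)])
snaive_resid <- y - snaive_fitted

# Scaling terms: in-sample seasonal naive absolute and squared errors
scale_mae <- mean(abs(diff(y, lag = m)))
scale_mse <- mean(diff(y, lag = m)^2)

# Training-set scaled errors for the seasonal naive model
mase <- mean(abs(snaive_resid), na.rm = TRUE) / scale_mae
rmsse <- sqrt(mean(snaive_resid^2, na.rm = TRUE) / scale_mse)
c(MASE = mase, RMSSE = rmsse)
```

On the test set the same scaling is applied to out-of-sample errors, so values above 1 (1.36 and 1.28 here) mean the forecasts were less accurate than the in-sample seasonal naive fit.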

How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are sensitive to the amount of training data used. With more, and more representative, training data, the model can generally forecast more accurately.
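One way to probe this sensitivity is to move the training cutoff and recompute the test-set accuracy each time. The sketch below assumes the fpp3 package; the helper `accuracy_for_cutoff()` and the cutoff years are hypothetical choices made for illustration.

```r
library(fpp3)

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Hypothetical helper: train on years before `cutoff`, test on the rest
accuracy_for_cutoff <- function(cutoff) {
  train <- myseries |> filter(year(Month) < cutoff)
  test <- myseries |> filter(year(Month) >= cutoff)
  train |>
    model(SNAIVE(Turnover)) |>
    forecast(new_data = test) |>
    accuracy(myseries) |>
    mutate(cutoff_year = cutoff)
}

# Cutoff years are arbitrary illustrations
results <- bind_rows(lapply(c(2006, 2011, 2015), accuracy_for_cutoff))
results |> select(cutoff_year, RMSE, MAE, MAPE, MASE)
```

Note that moving the cutoff changes both the training set and the test set, so differences between rows reflect the split as a whole rather than training size alone.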

Homework_3

Tyler Brown

2023-09-24