Chapter 5 Homework (2)

Author

Griffin Lessinger

9. Analysis of hh_budget Series

(a). Create a training set for household wealth (hh_budget) by withholding the last four years as a test set

# Create training set
hh_budget.train <- hh_budget[hh_budget$Year %in% 1995:2012, ]
hh_budget.train[16:20, ]
# A tsibble: 5 x 8 [1Y]
# Key:       Country [2]
  Country    Year  Debt    DI Expenditure Savings Wealth Unemployment
  <chr>     <dbl> <dbl> <dbl>       <dbl>   <dbl>  <dbl>        <dbl>
1 Australia  2010  188. 5.55         3.99    7.43   342.         5.21
2 Australia  2011  188. 3.03         2.97    8.06   318.         5.08
3 Australia  2012  189. 0.340        1.77    6.84   353.         5.22
4 Canada     1995  103. 0.641        2.24    9.31   402.         9.51
5 Canada     1996  106. 0.215        2.96    6.79   441.         9.61
# Create test set. Note that 2012 overlaps with the training set here; it is
# dropped again before computing accuracy in part (c).
hh_budget.test <- hh_budget[hh_budget$Year %in% 2012:2016, ]
hh_budget.test[3:7, ]
# A tsibble: 5 x 8 [1Y]
# Key:       Country [2]
  Country    Year  Debt    DI Expenditure Savings Wealth Unemployment
  <chr>     <dbl> <dbl> <dbl>       <dbl>   <dbl>  <dbl>        <dbl>
1 Australia  2014  193.  2.61        2.39    7.47   375.         6.08
2 Australia  2015  202.  1.34        2.79    5.64   393.         6.06
3 Australia  2016  209.  1.90        2.14    4.62   422.         5.71
4 Canada     2012  170.  2.19        1.91    4.82   490.         7.29
5 Canada     2013  169.  2.19        2.58    5.26   504.         7.08
# Plot data for a visual
hh_budget.train |> autoplot(Wealth) +
  labs(title = "Household wealth over time")

(b). Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set

The data are annual (the tsibble has a [1Y] interval), so there is no seasonality to exploit and the seasonal naive benchmark does not apply; we will use the mean, naive, and drift methods instead:

# Train models. We'll use Mean, Naive, and Drift models
hh_budget.fit <- hh_budget.train |>
  model(
    Mean = MEAN(Wealth),
    Naive = NAIVE(Wealth),
    Drift = RW(Wealth ~ drift())
  )

# Make forecasts
hh_budget.forecast <- hh_budget.fit |>
  forecast(h = 4)

# Plot forecasts against actual values
hh_budget.forecast |>
  autoplot(hh_budget.train, level = NULL) +
  autolayer(hh_budget.test, .vars = Wealth, color = "black")
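
For a closer look at what was fitted, tidy() on the mable reports the estimated parameters per country: the historical mean for the mean method and the drift coefficient for the drift method (the naive model has no parameters). This is an optional check, not shown in the original output.

# Optional check: estimated parameters of each benchmark model, by country
tidy(hh_budget.fit)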

(c). Compute the accuracy of your forecasts. Which method does best?

# Compute test-set accuracy, dropping 2012 (it was also in the training set).
# MASE and RMSSE are NaN because only the test portion of the data is supplied,
# so accuracy() has no training observations to compute the scaling factor.
acc <- accuracy(hh_budget.forecast, hh_budget.test[hh_budget.test$Year != 2012,])
acc
# A tibble: 12 × 11
   .model Country   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
   <chr>  <chr>     <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1 Drift  Australia Test   29.1  35.5  29.1  7.23  7.23   NaN   NaN  0.210 
 2 Drift  Canada    Test   33.3  37.2  33.3  6.09  6.09   NaN   NaN -0.229 
 3 Drift  Japan     Test   14.7  17.9  14.7  2.44  2.44   NaN   NaN -0.229 
 4 Drift  USA       Test   75.9  76.2  75.9 12.7  12.7    NaN   NaN -0.561 
 5 Mean   Australia Test   35.7  42.3  35.7  8.89  8.89   NaN   NaN  0.216 
 6 Mean   Canada    Test   90.4  92.9  90.4 16.7  16.7    NaN   NaN -0.0799
 7 Mean   Japan     Test  100.  101.  100.  16.8  16.8    NaN   NaN -0.534 
 8 Mean   USA       Test   82.9  83.3  82.9 13.9  13.9    NaN   NaN -0.423 
 9 Naive  Australia Test   34.7  41.5  34.7  8.64  8.64   NaN   NaN  0.216 
10 Naive  Canada    Test   46.2  51.0  46.2  8.46  8.46   NaN   NaN -0.0799
11 Naive  Japan     Test   36.3  37.8  36.3  6.06  6.06   NaN   NaN -0.534 
12 Naive  USA       Test   82.1  82.5  82.1 13.8  13.8    NaN   NaN -0.423 
print(paste0("Average accuracy of drift (RMSE): ", trunc(100*sum(acc[1:4, ]$RMSE/4))/100))
[1] "Average accuracy of drift (RMSE): 41.69"
print(paste0("Average accuracy of mean (RMSE): ", trunc(100*sum(acc[5:8, ]$RMSE/4))/100))
[1] "Average accuracy of mean (RMSE): 79.86"
print(paste0("Average accuracy of naive (RMSE): ", trunc(100*sum(acc[9:12, ]$RMSE/4))/100))
[1] "Average accuracy of naive (RMSE): 53.21"

Across all four countries, the drift method performed best in terms of RMSE, with its average RMSE listed above; the mean method performed worst.
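
The row-index arithmetic above depends on the ordering of the accuracy table; the sketch below computes the same per-model averages without relying on row positions.

# Sketch: average test RMSE per model, independent of row order
acc |>
  group_by(.model) |>
  summarise(avg_RMSE = mean(RMSE)) |>
  arrange(avg_RMSE)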

(d). Check the residuals of your preferred method. Do they resemble white noise?

# gg_tsresiduals() works on one series at a time, so fit the drift model per country
hh_budget.train[hh_budget.train$Country == "Australia", ] |>
  model(RW(Wealth ~ drift())) |>
  gg_tsresiduals() +
  labs(title = "Residuals for Australia")

hh_budget.train[hh_budget.train$Country == "Canada", ] |>
  model(RW(Wealth ~ drift())) |>
  gg_tsresiduals() +
  labs(title = "Residuals for Canada")

hh_budget.train[hh_budget.train$Country == "Japan", ] |>
  model(RW(Wealth ~ drift())) |>
  gg_tsresiduals() +
  labs(title = "Residuals for Japan")

hh_budget.train[hh_budget.train$Country == "USA", ] |>
  model(RW(Wealth ~ drift())) |>
  gg_tsresiduals() +
  labs(title = "Residuals for USA")

The residuals somewhat resemble white noise, particularly for Australia, Canada, and Japan: their histograms scatter roughly symmetrically around zero. The USA histogram, however, shows a strong negative skew, suggesting those residuals are not normally distributed. Autocorrelation is low in every case, so the drift model appears to capture the temporal structure of the data. A transformation (such as a log) might help with the skew.
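
As a supplementary check not included in the original output, a Ljung-Box test on the innovation residuals can back up the visual impression; the lag of 8 below is just a reasonable choice for short annual series.

# Sketch: Ljung-Box test on the drift residuals for all four countries
hh_budget.fit |>
  augment() |>
  filter(.model == "Drift") |>
  features(.innov, ljung_box, lag = 8)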

11. Analysis of Bricks data (from aus_production)

(a). Use an STL decomposition to calculate the trend-cycle and seasonal indices

bricks <- aus_production |>
  filter_index("1956 Q1" ~ "2004 Q4") |>
  select(Bricks)

bricks |> autoplot(Bricks) + 
  labs(title = "Australian brick production")

# Fit the STL decomposition once, then reuse it for the components and the plot
bricks.stl <- bricks |>
  model(STL(Bricks ~ trend(window = 51) + season(window = 9)))

bricks.dcmp <- bricks.stl |>
  components() |>
  select(-.model)

bricks.stl |>
  components() |>
  autoplot()

The settings above were about the best I could do to keep the remainder small while keeping the seasonal pattern fairly regular; the remainder still seems to contain some broader, slowly varying structure. A large trend window (51) was chosen to give a smooth trend-cycle, which is reasonable with roughly 200 quarterly observations to work with.
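
One rough way to sanity-check these choices (an optional addition) is with the STL-based strength measures from feat_stl(), which reports trend and seasonal strength on a 0-1 scale; note that it uses STL's default settings rather than the windows chosen above.

# Optional check: strength of trend and seasonality in the Bricks series
bricks |>
  features(Bricks, feat_stl)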

(b). Compute and plot the seasonally adjusted data

bricks.dcmp |>
  autoplot(season_adjust) +
  labs(title = "Seasonally adjusted bricks series")

(c). Use a naïve method to produce forecasts of the seasonally adjusted data

bricks.dcmp |>
  model(NAIVE(season_adjust)) |>
  forecast(h = 16) |>
  autoplot(bricks.dcmp) +
  labs(title = "Naive forecasts of seasonally adjusted series")

This follows the approach in the text: the naive method is applied directly to the seasonally adjusted component of the decomposition, leaving the seasonal component to be handled in part (d).

(d). Use decomposition_model() to reseasonalise the results, giving forecasts for the original data

fit_bricks.dcmp <- bricks |>
  model(stlf = decomposition_model(
    STL(Bricks ~ trend(window = 51) + season(window = 9)),
    NAIVE(season_adjust)
  ))

fit_bricks.dcmp |>
  forecast(h = 16) |>
  autoplot(bricks) +
  labs(title = "Reseasonalized Naive forecasts of bricks series")

As in the text, decomposition_model() does the reseasonalising: it forecasts the seasonally adjusted series with the naive method, forecasts the seasonal component with a seasonal naive method by default, and adds the two back together to give forecasts on the original scale.
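
To make the reseasonalising explicit, the sketch below names the model used for the seasonal component; since SNAIVE() is decomposition_model()'s default for unmodelled seasonal components, this should be equivalent to the fit above (season_year is the quarterly seasonal component produced by STL).

# Sketch: the same model with the seasonal-component forecast stated explicitly
bricks |>
  model(stlf = decomposition_model(
    STL(Bricks ~ trend(window = 51) + season(window = 9)),
    NAIVE(season_adjust),
    SNAIVE(season_year)
  ))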

(e). Do the residuals look uncorrelated?

fit_bricks.dcmp |> gg_tsresiduals()

Somewhat. The ACF shows a few lags with significant autocorrelation, so the residuals are not entirely uncorrelated, and the residual histogram is slightly negatively skewed.

(f). Repeat with a robust STL decomposition. Does it make much difference?

fit_bricks.dcmp_r <- bricks |>
  model(stlf = decomposition_model(
    STL(Bricks ~ trend(window = 51) + season(window = 9), robust = TRUE),
    NAIVE(season_adjust)
  ))

fit_bricks.dcmp_r |>
  forecast(h = 16) |>
  autoplot(bricks) +
  labs(title = "Reseasonalized Naive forecasts of bricks series (robust)")

fit_bricks.dcmp_r |> gg_tsresiduals()

The results are not dramatically different, but they are somewhat better than with the non-robust STL decomposition: the residual autocorrelations are a little smaller at most lags and the histogram is less skewed.
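
One rough numeric way to compare the two fits (an optional sketch, not in the original) is the spread of their innovation residuals.

# Sketch: standard deviation of innovation residuals, robust vs. non-robust
bind_rows(
  augment(fit_bricks.dcmp) |> as_tibble() |>
    summarise(model = "non-robust", sd_innov = sd(.innov, na.rm = TRUE)),
  augment(fit_bricks.dcmp_r) |> as_tibble() |>
    summarise(model = "robust", sd_innov = sd(.innov, na.rm = TRUE))
)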

(g). Compare forecasts from decomposition_model() with those from SNAIVE(), using a test set comprising the last 2 years of data

bricks.train <- bricks |>
  filter(year(Quarter) < 2003)
bricks.test <- bricks |>
  filter(year(Quarter) >= 2003)

fit_bricks_preds <- bricks.train |>
  model(
    stlf = decomposition_model(
      STL(Bricks ~ trend(window = 51) + season(window = 9), robust = TRUE),
      NAIVE(season_adjust)),
    snaive = SNAIVE(Bricks)
    ) |>
  forecast(h = 8)

accuracy(fit_bricks_preds, bricks.test)
# A tibble: 2 × 10
  .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
  <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 snaive Test  17.2   30.6  24.8  4.18  6.06   NaN   NaN -0.236
2 stlf   Test   5.25  19.5  14.4  1.29  3.54   NaN   NaN -0.104

The STL decomposition model is considerably more accurate than the seasonal naive benchmark on the test set, cutting RMSE by about a third and MAE by over 40%, which is quite a striking difference.
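
A quick visual comparison over the test period (not part of the original) tells the same story.

# Sketch: both sets of forecasts plotted against the 2003-2004 test data
fit_bricks_preds |>
  autoplot(bricks.test, level = NULL) +
  labs(title = "Decomposition vs. seasonal naive forecasts, 2003-2004")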