Group 6

Chapter 13: Some practical forecasting issues

Teammates

20bds003 - Aman Vignesh Kundeti
20bds013 - Bhashapangu Nikhil
20bds014 - Bommaragoni Karthik
20bds015 - Chinmay S Poola
20bds017 - Devansh Purwar
20bds020 - Raja Chanasha
20bds038 - P.V.S. Pranay
20bds053 - Sowmya S
20bds061 - V Hariteja
20bds063 - Vipul Bawankar

library(fpp3)
## -- Attaching packages ---------------------------------------------- fpp3 0.5 --
## v tibble      3.2.1     v tsibble     1.1.3
## v dplyr       1.1.0     v tsibbledata 0.4.1
## v tidyr       1.3.0     v feasts      0.3.0
## v lubridate   1.9.2     v fable       0.3.2
## v ggplot2     3.4.1     v fabletools  0.3.2
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

13.1 Weekly, daily and sub-daily data

Weekly, daily and sub-daily data can be challenging to forecast, for different reasons.

Weekly data

It is difficult to work with weekly data because the seasonal period is both large and non-integer.

The average number of weeks per year is 365.25/7 ≈ 52.18. Most of the methods require the seasonal period to be an integer, and even if we approximate it by 52, most of them cannot handle such a large seasonal period efficiently.

The simplest approach is to use an STL decomposition along with a non-seasonal method applied to the seasonally adjusted data.

my_dcmp_spec <- decomposition_model(
  STL(Barrels),
  ETS(season_adjust ~ season("N"))
)
us_gasoline |>
  model(stl_ets = my_dcmp_spec) |>
  forecast(h = "2 years") |>
  autoplot(us_gasoline) +
  labs(y = "Millions of barrels per day",
       title = "Weekly US gasoline production")

## [1] "Figure 13.1: Forecasts for weekly US gasoline production using an STL decomposition with an ETS model for the seasonally adjusted data."

The code fits an STL decomposition with an error, trend, seasonality (ETS) model for the seasonally adjusted series to the us_gasoline data, and generates forecasts for the next two years.

An alternative approach is to use a dynamic harmonic regression model. The order of the ARIMA model is selected by minimising the AICc, although that is done within the ARIMA() function. We use PDQ(0,0,0) to prevent ARIMA() trying to handle the seasonality using seasonal ARIMA components.

gas_dhr <- us_gasoline |>
  model(dhr = ARIMA(Barrels ~ PDQ(0, 0, 0) + fourier(K = 6)))

gas_dhr |>
  forecast(h = "2 years") |>
  autoplot(us_gasoline) +
  labs(y = "Millions of barrels per day",
       title = "Weekly US gasoline production")

## [1] "Figure 13.2: Forecasts for weekly US gasoline production using a dynamic harmonic regression model."

The code fits a dynamic harmonic regression (an ARIMA model with additional Fourier terms) to the us_gasoline data, and generates forecasts for the next two years.

The fitted model has 6 pairs of Fourier terms and can be written as \[ y_t = bt + \sum_{j=1}^{6} \left[ \alpha_j\sin\left(\frac{2\pi j t}{52.18}\right) + \beta_j\cos\left(\frac{2\pi j t}{52.18}\right) \right] + \eta_t \]

where \(\eta_t\) is an ARIMA(0,1,1) process. Because \(\eta_t\) is non-stationary, the model is actually estimated on the differences of the variables on both sides of this equation.

The STL approach is better when the seasonality changes over time. The dynamic harmonic regression approach is better if there are covariates that are useful predictors as these can be added as additional regressors.

Daily and Sub-Daily Data

Daily and sub-daily data are challenging because they involve multiple seasonal patterns, and so it is better to use a method that handles such complex seasonality.

If the time series is relatively short, then it is possible to use one of the single-seasonal methods (e.g., ETS or a seasonal ARIMA model). But when the time series is long enough so that some of the longer seasonal periods become apparent, it will be necessary to use STL, dynamic harmonic regression or Prophet. However, these methods only allow for regular seasonality.
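For illustration, here is a minimal sketch of a dynamic harmonic regression handling two seasonal periods at once, with one set of Fourier terms per period. The half-hourly tsibble elec and its Demand column are hypothetical names used only for this sketch, and the values of K are arbitrary.

# A dynamic harmonic regression sketch for sub-daily data (hypothetical
# tsibble `elec` with measured variable `Demand`): Fourier terms for the
# daily and weekly periods, with PDQ(0, 0, 0) so that ARIMA() does not also
# try to fit seasonal ARIMA components.
elec |>
  model(
    dhr = ARIMA(Demand ~ PDQ(0, 0, 0) +
                  fourier(period = "day", K = 10) +
                  fourier(period = "week", K = 5))
  )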

The best way to deal with moving holiday effects is to include dummy variables in the model. This can be done within the ARIMA() or prophet() functions.
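As a sketch of the dummy-variable idea (the tsibble sales_data, its Sales column, and the 0/1 easter column marking the moving holiday are all assumed names for illustration):

# A moving-holiday dummy entering ARIMA() as an extra regressor, alongside
# Fourier terms for the annual seasonality of daily data.
sales_data |>
  model(
    ARIMA(Sales ~ easter + PDQ(0, 0, 0) + fourier(period = "year", K = 4))
  )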

13.2 Time series of counts

So far we have assumed that the data have a continuous sample space, but data often come in the form of counts.

If the minimum number of customers is at least 100, then the difference between a continuous sample space
[100,∞) and the discrete sample space {100,101,102,…} has no perceivable effect on our forecasts.

However, if our data contain small counts (0, 1, 2, …), then we need to use forecasting methods that are more appropriate for a sample space of non-negative integers.

We can use a simple model like "Croston's method" for this purpose. This method does not properly deal with the count nature of the data either, but it is used so often that it is worth knowing about.

With Croston’s method, we construct two new series from our original time series by noting which time periods contain zero values, and which periods contain non-zero values. Let \(q_i\) be the \(i\) th non-zero quantity, and let \(a_i\) be the time between \(q_{i-1}\) and \(q_i\).

Croston’s method involves separate simple exponential smoothing forecasts on the two new series \(a\) and \(q\). Because the method is usually applied to time series of demand for items, \(q\) is often called the “demand” and \(a\) the “inter-arrival time”.

If \(\hat{q}_{i+1|i}\) and \(\hat{a}_{i+1|i}\) are the one-step forecasts of the \((i+1)\)th demand and inter-arrival time respectively, based on data up to demand \(i\), then Croston's method gives

\[\begin{align} \hat{q}_{i+1|i} & = (1-\alpha_q)\hat{q}_{i|i-1} + \alpha_q q_i, \tag{13.1}\\ \hat{a}_{i+1|i} & = (1-\alpha_a)\hat{a}_{i|i-1} + \alpha_a a_i. \tag{13.2} \end{align}\]

The smoothing parameters \(\alpha_a\) and \(\alpha_q\) take values between 0 and 1. Let \(j\) be the time of the last observed positive observation. Then the \(h\)-step ahead forecast for the demand at time \(T+h\) is given by the ratio

\[ \hat{y}_{T+h|T} = \hat{q}_{j+1|j}/\hat{a}_{j+1|j}. \]

There are no algebraic results allowing us to compute prediction intervals for this method, because the method does not correspond to any statistical model.

The CROSTON() function produces forecasts using Croston’s method. The two smoothing parameters \(\alpha_a\) and \(\alpha_q\) are estimated from the data.
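To make equations (13.1) and (13.2) concrete, here is a minimal hand-rolled sketch (not the fable implementation; the smoothing parameters, initial values and example series are illustrative assumptions):

# Simple exponential smoothing applied separately to the demand sizes (q)
# and the inter-arrival times (a), then forecasting with their ratio.
croston_sketch <- function(y, alpha_q = 0.1, alpha_a = 0.1, h = 6) {
  nonzero <- which(y > 0)
  q <- y[nonzero]              # demand sizes
  a <- diff(c(0, nonzero))     # times between successive non-zero demands
  q_hat <- q[1]                # initialise with the first observations
  a_hat <- a[1]
  for (i in seq_along(q)) {
    q_hat <- (1 - alpha_q) * q_hat + alpha_q * q[i]
    a_hat <- (1 - alpha_a) * a_hat + alpha_a * a[i]
  }
  rep(q_hat / a_hat, h)        # flat h-step-ahead forecasts
}
croston_sketch(c(1, 1, 1, 0, 0, 1, 3, 1, 1, 1, 1, 1), h = 3)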

Example: Pharmaceutical sales

The plot below shows monthly sales of immune sera and immunoglobulin products in Australia. The data contain small counts, with many months registering no sales at all, and only small numbers of items sold in other months.

j06 <- PBS |>
  filter(ATC2 == "J06") |>
  summarise(Scripts = sum(Scripts))

j06 |> autoplot(Scripts) +
  labs(y="Number of scripts",
       title = "Sales for immune sera and immunoglobulins")

## [1] "Figure 13.3: Numbers of scripts sold for Immune sera and immunoglobulins on the Australian Pharmaceutical Benefits Scheme."
Table 13.1: The first twelve months of the series, containing the first 10 non-zero demand values.
Month Scripts
1991 Jul 1
1991 Aug 1
1991 Sep 1
1991 Oct 0
1991 Nov 0
1991 Dec 1
1992 Jan 3
1992 Feb 1
1992 Mar 1
1992 Apr 1
1992 May 1
1992 Jun 1
Table 13.2: The first 10 non-zero demand values shown as demand and inter-arrival series.
i        1  2  3  4  5  6  7  8  9  10
\(q_i\)  1  1  1  1  3  1  1  1  1  1
\(a_i\)  1  1  1  3  1  1  1  1  1  1

In this example, the smoothing parameters are estimated to be \(\alpha_a\) = 0.08, \(\alpha_q\)=0.71, \(\hat{q}_{1|0}\)=4.17, and \(\hat{a}_{1|0}\)=3.52. The final forecasts for the two series are \(\hat{q}_{T+1|T}\)=2.419 and \(\hat{a}_{T+1|T}\)=2.484. So the forecasts are all equal to \(\hat{y}_{T+h|T}\)=2.419/2.484=0.974.

In practice, fable does these calculations for you:

j06 |>
  model(CROSTON(Scripts)) |>
  forecast(h = 6)
## # A fable: 6 x 4 [1M]
## # Key:     .model [1]
##   .model              Month   Scripts .mean
##   <chr>               <mth>    <dist> <dbl>
## 1 CROSTON(Scripts) 2008 Jul 0.9735062 0.974
## 2 CROSTON(Scripts) 2008 Aug 0.9735062 0.974
## 3 CROSTON(Scripts) 2008 Sep 0.9735062 0.974
## 4 CROSTON(Scripts) 2008 Oct 0.9735062 0.974
## 5 CROSTON(Scripts) 2008 Nov 0.9735062 0.974
## 6 CROSTON(Scripts) 2008 Dec 0.9735062 0.974

The Scripts column repeats the point forecast (the mean) rather than providing a full distribution, because there is no underlying stochastic model.

13.3 Ensuring forecasts stay within limits

It is common to want forecasts to be positive, or to require them to fall within some specified range [a, b]. Both situations are relatively easy to handle using transformations.

Overview of the prices dataset

prices
## # A tsibble: 198 x 7 [1Y]
##     year  eggs chicken copper nails   oil wheat
##    <dbl> <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1  1800    NA      NA   9.28  169.    NA  920.
##  2  1801    NA      NA   9.00  172.    NA  898.
##  3  1802    NA      NA   8.56  219.    NA  687.
##  4  1803    NA      NA   8.60  189.    NA  646.
##  5  1804    NA      NA   9.60  188.    NA  662.
##  6  1805    NA      NA  10.1   188.    NA  822.
##  7  1806    NA      NA   9.96  163.    NA  736.
##  8  1807    NA      NA  10.4   174.    NA  694.
##  9  1808    NA      NA   8.55  160.    NA  683.
## 10  1809    NA      NA   8.60  163.    NA  786.
## # ... with 188 more rows

Removing the initial rows of the data, which contain NA in the eggs price field:

egg_prices <- prices |> filter(!is.na(eggs))
egg_prices
## # A tsibble: 94 x 7 [1Y]
##     year  eggs chicken copper nails   oil wheat
##    <dbl> <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1  1900  277.      NA   6.81  54.1  23.1  399.
##  2  1901  315.      NA   6.94  48.6  18.6  404.
##  3  1902  315.      NA   4.74  41.6  14.9  425.
##  4  1903  321.      NA   5.21  39.5  16.9  399.
##  5  1904  315.      NA   4.98  36.3  15.5  416.
##  6  1905  318.      NA   6.01  36.1  11.2  437.
##  7  1906  303.      NA   7.45  37.3  13.1  410.
##  8  1907  289.      NA   7.81  38.9  12.5  439.
##  9  1908  292.      NA   5.18  40.0  13.0  469.
## 10  1909  321.      NA   5.10  36.5  12.6  531.
## # ... with 84 more rows

Selecting the required columns (year and eggs):

egg_prices <- egg_prices[,c(1,2)]
egg_prices
## # A tsibble: 94 x 2 [1Y]
##     year  eggs
##    <dbl> <dbl>
##  1  1900  277.
##  2  1901  315.
##  3  1902  315.
##  4  1903  321.
##  5  1904  315.
##  6  1905  318.
##  7  1906  303.
##  8  1907  289.
##  9  1908  292.
## 10  1909  321.
## # ... with 84 more rows

Plotting the egg prices over time to check for trend or other patterns:

autoplot(egg_prices) +
  labs(title = "Annual egg prices",
  y = "$US (in cents adjusted for inflation) ")

Positive forecasts

We can impose a positivity constraint simply by working on the log scale. Consider the real price of a dozen eggs from 1900 to 1993, shown above. Because of the log transformation, the forecast distributions are forced to remain positive, and they become increasingly skewed as the mean falls. First, for comparison, here are forecasts from an ETS model with no transformation:

egg_prices |>
  model(ETS(eggs ~ trend("A"))) |>
  forecast(h = 50) |>
  autoplot(egg_prices) +
  labs(title = "Annual egg prices",
       y = "$US (in cents adjusted for inflation) ")

After applying the log transformation:

egg_prices |>
  model(ETS(log(eggs) ~ trend("A"))) |>
  forecast(h = 50) |>
  autoplot(egg_prices) +
  labs(title = "Annual egg prices",
       y = "$US (in cents adjusted for inflation) ")

## [1] "Figure 13.4: Forecasts for the price of a dozen eggs, constrained to be positive using a log transformation."

Forecasts constrained to an interval

Suppose the egg prices were constrained to lie between a = 50 and b = 400. We can transform the data using a scaled logit transform which maps (a, b) to the whole real line:

\[ y = \log\left(\frac{x-a}{b-x}\right), \]

where \(x\) is the data on the original scale and \(y\) is the transformed data. To reverse the transformation, we will use

\[ x = \frac{(b-a)e^y}{1+e^y} + a. \]

scaled_logit <- function(x, lower = 0, upper = 1) {
  log((x - lower) / (upper - x))
}
inv_scaled_logit <- function(x, lower = 0, upper = 1) {
  (upper - lower) * exp(x) / (1 + exp(x)) + lower
}
my_scaled_logit <- new_transformation(
                    scaled_logit, inv_scaled_logit)

Using the user-defined transformation function:

egg_prices |>
  model(
    ETS(my_scaled_logit(eggs, lower = 50, upper = 400)
          ~ trend("A"))
  ) |>
  forecast(h = 50) |>
  autoplot(egg_prices) +
  labs(title = "Annual egg prices",
       y = "$US (in cents adjusted for inflation) ")

## [1] "Figure 13.5: Forecasts for the price of a dozen eggs, constrained to be lie between 50 and 400 cents US."

As a result of the transformation, the prediction intervals are constrained to lie between 50 and 400. This artificial (and unreasonable) constraint has caused the forecast distributions to become highly skewed.

13.4 Forecast combinations

An easy way to improve forecast accuracy is to use several different methods on the same time series, and to average the resulting forecasts.

Here is an example using monthly revenue from take-away food in Australia, from April 1982 to December 2018. We use forecasts from the following models: ETS, STL-ETS, and ARIMA; and we compare the results using the last 5 years (60 months) of observations.

auscafe <- aus_retail |>
  filter(stringr::str_detect(Industry, "Takeaway")) |>
  summarise(Turnover = sum(Turnover))
train <- auscafe |>
  filter(year(Month) <= 2013)
STLF <- decomposition_model(
  STL(log(Turnover) ~ season(window = Inf)),
  ETS(season_adjust ~ season("N"))
)
cafe_models <- train |>
  model(
    ets = ETS(Turnover),
    stlf = STLF,
    arima = ARIMA(log(Turnover))
  ) |>
  mutate(combination = (ets + stlf + arima) / 3)
cafe_fc <- cafe_models |>
  forecast(h = "5 years")

cafe_fc |>
  autoplot(auscafe |> filter(year(Month) > 2008),
           level = NULL) +
  labs(y = "$ billion",
       title = "Australian monthly expenditure on eating out")

## [1] "Figure 13.6: Point forecasts from various methods applied to Australian monthly expenditure on eating out."
cafe_fc |>
  accuracy(auscafe) |>
  arrange(RMSE)
## # A tibble: 4 x 10
##   .model      .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>       <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 combination Test    8.09  41.0  31.8  0.401  2.19 0.776 0.790 0.747
## 2 arima       Test  -25.4   46.2  38.9 -1.77   2.65 0.949 0.890 0.786
## 3 stlf        Test  -36.9   64.1  51.7 -2.55   3.54 1.26  1.23  0.775
## 4 ets         Test   86.5  122.  101.   5.51   6.66 2.46  2.35  0.880

ARIMA does particularly well with this series, while the combination approach does even better.

Forecast combination distributions

The cafe_fc object contains forecast distributions, from which any prediction interval can usually be computed. Let’s look at the intervals for the first period.

cafe_fc |> filter(Month == min(Month))
## # A fable: 4 x 4 [1M]
## # Key:     .model [4]
##   .model         Month           Turnover .mean
##   <chr>          <mth>             <dist> <dbl>
## 1 ets         2014 Jan      N(1289, 1118) 1289.
## 2 stlf        2014 Jan t(N(7.2, 0.00063)) 1326.
## 3 arima       2014 Jan t(N(7.2, 0.00061)) 1283.
## 4 combination 2014 Jan            1299.38 1299.

The first three are a mixture of normal and transformed normal distributions. The package does not yet combine such diverse distributions, so the combination output is simply the mean instead.

If we work with simulated sample paths, it is possible to create forecast distributions for the combination forecast as well.

cafe_futures <- cafe_models |>
  # Generate 1000 future sample paths
  generate(h = "5 years", times = 1000) |>
  # Compute forecast distributions from future sample paths
  as_tibble() |>
  group_by(Month, .model) |>
  summarise(
    dist = distributional::dist_sample(list(.sim))
  ) |>
  ungroup() |>
  # Create fable object
  as_fable(index = Month, key = .model,
           distribution = dist, response = "Turnover")
## `summarise()` has grouped output by 'Month'. You can override using the
## `.groups` argument.
## Warning: The dimnames of the fable's distribution are missing and have been set
## to match the response variables.
# Forecast distributions for h=1
cafe_futures |> filter(Month == min(Month))
## # A fable: 4 x 3 [1M]
## # Key:     .model [4]
##      Month .model              dist
##      <mth> <chr>             <dist>
## 1 2014 Jan arima       sample[1000]
## 2 2014 Jan combination sample[1000]
## 3 2014 Jan ets         sample[1000]
## 4 2014 Jan stlf        sample[1000]
cafe_futures |>
  filter(.model == "combination") |>
  autoplot(auscafe |> filter(year(Month) > 2008)) +
  labs(y = "$ billion",
       title = "Australian monthly expenditure on eating out")

## [1] "Figure 13.7: Prediction intervals for the combination forecast of Australian monthly expenditure on eating out."

To check the accuracy of the 95% prediction intervals, we can use a Winkler score:

cafe_futures |>
  accuracy(auscafe, measures = interval_accuracy_measures,
    level = 95) |>
  arrange(winkler)
## # A tibble: 4 x 3
##   .model      .type winkler
##   <chr>       <chr>   <dbl>
## 1 combination Test     430.
## 2 stlf        Test     606.
## 3 ets         Test     722.
## 4 arima       Test     779.

Lower is better, so the combination forecast is again better than any of the component models.

13.5 Prediction intervals for aggregates

A common problem is to forecast the aggregate of several time periods of data, using a model fitted to the disaggregated data.

If the point forecasts are means, then adding them up will give a good estimate of the total. But prediction intervals are more tricky due to the correlations between forecast errors.

A general solution is to use simulations. Here is an example using ETS models applied to Australian take-away food sales, assuming we wish to forecast the aggregate revenue in the next 12 months.

fit <- auscafe |>
  # Fit a model to the data
  model(ETS(Turnover))
futures <- fit |>
  # Simulate 10000 future sample paths, each of length 12
  generate(times = 10000, h = 12) |>
  # Sum the results for each sample path
  as_tibble() |>
  group_by(.rep) |>
  summarise(.sim = sum(.sim)) |>
  # Store as a distribution
  summarise(total = distributional::dist_sample(list(.sim)))

We can compute the mean of the simulations, along with prediction intervals:

futures |>
  mutate(
    mean = mean(total),
    pi80 = hilo(total, 80),
    pi95 = hilo(total, 95)
  )
## # A tibble: 1 x 4
##           total   mean                   pi80                   pi95
##          <dist>  <dbl>                 <hilo>                 <hilo>
## 1 sample[10000] 19210. [18317.17, 20110.18]80 [17848.14, 20638.85]95

As expected, the mean of the simulated data is close to the sum of the individual forecasts.

forecast(fit, h = 12) |>
  as_tibble() |>
  summarise(total = sum(.mean))
## # A tibble: 1 x 1
##    total
##    <dbl>
## 1 19212.

13.6 Backcasting

Sometimes it is useful to “backcast” a time series — that is, forecast in reverse time.

There are no in-built R functions to do this, but it is easy to implement by creating a new time index.

Suppose we want to extend our Australian takeaway food data back to the start of 1981 (the actual data begin in April 1982).

backcasts <- auscafe |>
  mutate(reverse_time = rev(row_number())) |>
  update_tsibble(index = reverse_time) |>
  model(ets = ETS(Turnover ~ season(period = 12))) |>
  forecast(h = 15) |>
  mutate(Month = auscafe$Month[1] - (1:15)) |>
  as_fable(index = Month, response = "Turnover",
    distribution = "Turnover")
backcasts |>
  autoplot(auscafe |> filter(year(Month) < 1990)) +
  labs(title = "Backcasts of Australian food expenditure",
       y = "$ (billions)")

## [1] "Figure 13.8: Backcasts for Australian monthly expenditure on cafés, restaurants and takeaway food services using an ETS model."

Most of the work here is in re-indexing the tsibble object and then re-indexing the fable object.

13.7 Very long and very short time series

Forecasting very short time series

A common question is how many data points are needed to fit a time series model.

The answer depends on the number of model parameters to be estimated and the amount of randomness in the data.

The number of data points required increases with the number of parameters to be estimated and with the amount of noise in the data.

Short time series

With short series, there is not enough data to allow some observations to be withheld for testing purposes, and even time series cross-validation can be difficult to apply.

The AICc is particularly useful here, because choosing the model with the minimum AICc value allows both the number of parameters and the amount of noise to be taken into account.

We will now fit ARIMA models to all the annual series from the M3 competition that have fewer than 20 observations.

library(tsibble)
library(Mcomp)
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(forecast)
library(dplyr)
m3totsibble <- function(z) {
  bind_rows(
    as_tsibble(z$x) |> mutate(Type = "Training"),
    as_tsibble(z$xx) |> mutate(Type = "Test")
  ) |>
    mutate(
      st = z$st,
      type = z$type,
      period = z$period,
      description = z$description,
      sn = z$sn
    ) |>
    as_tibble()
}
short <- Mcomp::M3 |>
  subset("yearly") |>
  purrr::map_dfr(m3totsibble) |>
  group_by(sn) |>
  mutate(n = max(row_number())) |>
  filter(n <= 20) |>
  ungroup() |>
  as_tsibble(index = index, key = c(sn, period, st))

short
## # A tsibble: 3,040 x 9 [1Y]
## # Key:       sn, period, st [152]
##    index value Type     st    type  period description        sn        n
##    <dbl> <dbl> <chr>    <chr> <chr> <chr>  <chr>              <chr> <int>
##  1  1975  941. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  2  1976 1085. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  3  1977 1245. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  4  1978 1445. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  5  1979 1683. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  6  1980 2038. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  7  1981 2343. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  8  1982 2602. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
##  9  1983 2928. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
## 10  1984 3104. Training Y1    MICRO YEARLY SALES ( CODE= ABT) N0001    20
## # ... with 3,030 more rows

Now we can apply an ARIMA model to each series.

short_fit <- short |>
  model(arima = ARIMA(value))
short_fit
## # A mable: 152 x 4
## # Key:     sn, period, st [152]
##    sn    period st                      arima
##    <chr> <chr>  <chr>                 <model>
##  1 N0001 YEARLY Y1             <ARIMA(1,2,0)>
##  2 N0002 YEARLY Y2     <ARIMA(1,0,0) w/ mean>
##  3 N0003 YEARLY Y3     <ARIMA(2,0,0) w/ mean>
##  4 N0004 YEARLY Y4             <ARIMA(0,1,0)>
##  5 N0005 YEARLY Y5     <ARIMA(1,0,0) w/ mean>
##  6 N0006 YEARLY Y6    <ARIMA(0,1,0) w/ drift>
##  7 N0007 YEARLY Y7             <ARIMA(0,1,0)>
##  8 N0008 YEARLY Y8             <ARIMA(0,1,1)>
##  9 N0009 YEARLY Y9    <ARIMA(0,1,0) w/ drift>
## 10 N0010 YEARLY Y10            <ARIMA(2,0,0)>
## # ... with 142 more rows

Of the 152 series, 21 had models with zero parameters (white noise and random walks), 86 had models with one parameter, 31 had models with two parameters, 13 had models with three parameters, and only 1 series had a model with four parameters.

Forecasting very long time series

Issues with using very long time series:

Up to a point, the models work well, but eventually we have enough data that the difference between the true process and the fitted model becomes more obvious.

A further problem is that optimisation of the parameters becomes more time consuming as the number of observations grows.

Possible solutions:

Allow the model itself to change over time. ETS models are designed to handle this situation by allowing the trend and seasonal terms to evolve over time.

Alternatively, discard the earliest observations and fit the model only to the most recent observations (a minimal sketch of this idea is given below).
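A minimal sketch of the second idea, using the auscafe tsibble from Section 13.4 (the ten-year window is an arbitrary choice):

# Keep only the most recent 120 months (yearmonth arithmetic works in whole
# months), then fit an ARIMA model to that recent window.
auscafe |>
  filter(Month > max(Month) - 120) |>
  model(arima = ARIMA(log(Turnover)))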

13.8 Forecasting on training and test sets

Typically, we compute one-step forecasts on the training data (the "fitted values") and multi-step forecasts on the test data.

However, occasionally we may wish to compute multi-step forecasts on the training data, or one-step forecasts on the test data.

Multi-step forecasts on training data

Fitted values are usually one-step predictions made on the training data. However, the same approach can be used for making multi-step predictions. To demonstrate this, we’ll use an ARIMA model to predict Australian take-away food expenditure. We’ll use the last five years as a test set and plot the forecasts.

auscafe <- aus_retail |>
  filter(stringr::str_detect(Industry, "Takeaway")) |>
  summarise(Turnover = sum(Turnover))
training <- auscafe |> filter(year(Month) <= 2013)
test <- auscafe |> filter(year(Month) > 2013)
cafe_fit <- training |>
  model(ARIMA(log(Turnover)))
cafe_fit |>
  forecast(h = 60) |>
  autoplot(auscafe) +
  labs(title = "Australian food expenditure",
       y = "$ (billions)")

## [1] "Figure 13.9: Forecasts from an ARIMA model fitted to the Australian monthly expenditure on cafés, restaurants and takeaway food services."

The plot below shows 12-step (one year) fitted values computed on the training set.

fits12 <- fitted(cafe_fit, h = 12)
training |>
  autoplot(Turnover) +
  autolayer(fits12, .fitted, col = "#D55E00") +
  labs(title = "Australian food expenditure",
       y = "$ (billions)")
## Warning: Removed 25 rows containing missing values (`geom_line()`).

## [1] "Figure 13.10: Twelve-step fitted values from an ARIMA model fitted to the Australian café training data."

One-step forecasts on test data

When creating a model, it is typical to estimate it on one set of data (the training set) and then evaluate its performance on another (the test set). In the example above, the test-set forecasts are 1-step, 2-step, …, up to 60-steps ahead, and the variance of the forecast errors usually increases with the forecast horizon.

As a result, simply averaging the absolute or squared errors from the test set combines results with different variances, which can make the summary misleading.

One solution to this issue is to obtain 1-step errors on the test data.

In the example above, the training data are for times 1, 2, …, T−60. We estimate the model on these data, and then compute \(\hat{y}_{T-60+h|T-61+h}\) for h = 1, …, 60. Because the test data are not used to estimate the parameters, these are still "fair" forecasts.

We now apply the ARIMA model estimated above to the test data, using refit(), which by default applies the previously estimated model to the new data without re-estimating the parameters.

cafe_fit |>
  refit(test) |>
  accuracy()
## # A tibble: 1 x 10
##   .model               .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <chr>                <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 ARIMA(log(Turnover)) Train~ -2.49  20.5  15.4 -0.169  1.06 0.236 0.259 -0.0502

13.9 Dealing with outliers and missing values

Real data often contains missing values, outlying observations, and other messy features. Dealing with them can sometimes be troublesome.

Outliers

Outliers are observations in a time series that are significantly different from the majority of the data. They can be either errors or unusual occurrences.

In dealing with outliers, it may be necessary to replace them with missing values or an estimate that is more consistent with the rest of the data.

The plot below shows the number of visitors to the Adelaide Hills region of South Australia. There appears to be an unusual observation in 2002 Q4.

tourism |>
  filter(
    Region == "Adelaide Hills", Purpose == "Visiting"
  ) |>
  autoplot(Trips) +
  labs(title = "Quarterly overnight trips to Adelaide Hills",
       y = "Number of trips")

## [1] "Figure 13.11: Number of overnight trips to the Adelaide Hills region of South Australia."

A useful way to find outliers is to apply STL() to the series with the argument robust=TRUE. Then any outliers should show up in the remainder series.

ah_decomp <- tourism |>
  filter(
    Region == "Adelaide Hills", Purpose == "Visiting"
  ) |>
  # Fit a non-seasonal STL decomposition
  model(
    stl = STL(Trips ~ season(period = 1), robust = TRUE)
  ) |>
  components()
ah_decomp |> autoplot()

## [1] "Figure 13.12: STL decomposition of visitors to the Adelaide Hills region of South Australia, with no seasonal component."

In more challenging cases, a boxplot of the remainder series is useful. Observations more than 1.5 interquartile ranges (IQRs) beyond the central 50% of the data can be identified as outliers. If the remainder were normally distributed, this rule would flag about 7 in every 1000 observations as "outliers".

A stricter rule defines outliers as observations more than 3 IQRs beyond the central 50% of the data; only about 1 in 500,000 normally distributed observations would then be identified as outliers. This is the rule we prefer to use.

outliers <- ah_decomp |>
  filter(
    remainder < quantile(remainder, 0.25) - 3*IQR(remainder) |
    remainder > quantile(remainder, 0.75) + 3*IQR(remainder)
  )
outliers
## # A dable: 1 x 9 [1Q]
## # Key:     Region, State, Purpose, .model [1]
## # :        Trips = trend + remainder
##   Region         State        Purpose .model Quarter Trips trend remai~1 seaso~2
##   <chr>          <chr>        <chr>   <chr>    <qtr> <dbl> <dbl>   <dbl>   <dbl>
## 1 Adelaide Hills South Austr~ Visiti~ stl    2002 Q4  81.1  11.1    70.0    81.1
## # ... with abbreviated variable names 1: remainder, 2: season_adjust

Missing values

When data are missing, it is important to consider whether the missingness is likely to induce bias in the forecasting model.

In other situations, data can be missing essentially at random, without any particular pattern or reason. If the missingness does not affect the forecasting process, it can be dealt with relatively simply.

Finally, we might remove some unusual observations, thus creating missing values in the series.

Some methods can handle missing values without any problems; for example, the fable functions for ARIMA models, dynamic regression models, and NNAR models will work with missing values.

Solutions

There are two ways to handle missing values when they cause errors:
  • Use only the part of the data after the last missing value, assuming that there are enough observations remaining to make reliable predictions (a minimal sketch of this approach follows the list).
  • Replace the missing values with estimated values: fit an ARIMA model to the data containing missing values, and then use the model to fill in (interpolate) the missing values.
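A minimal sketch of the first approach, assuming a hypothetical tsibble dat whose measured variable y contains missing values:

# Make implicit gaps explicit, then keep only the observations that come
# after the last missing value of `y`.
dat_complete <- dat |>
  fill_gaps() |>
  filter(row_number() > max(c(0, which(is.na(y)))))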

We will replace the outlier identified in Figure 13.12 by an estimate using an ARIMA model.

ah_miss <- tourism |>
  filter(
    Region == "Adelaide Hills",
    Purpose == "Visiting"
  ) |>
  # Remove outlying observations
  anti_join(outliers) |>
  # Replace with missing values
  fill_gaps()
## Joining with `by = join_by(Quarter, Region, State, Purpose, Trips)`
ah_fill <- ah_miss |>
  # Fit ARIMA model to the data containing missing values
  model(ARIMA(Trips)) |>
  # Estimate Trips for all periods
  interpolate(ah_miss)
ah_fill |>
  # Only show outlying periods
  right_join(outliers |> select(-Trips))
## Joining with `by = join_by(Region, State, Purpose, Quarter)`
## # A tsibble: 1 x 9 [?]
## # Key:       Region, State, Purpose [1]
##   Region         State        Purpose Quarter Trips .model trend remai~1 seaso~2
##   <chr>          <chr>        <chr>     <qtr> <dbl> <chr>  <dbl>   <dbl>   <dbl>
## 1 Adelaide Hills South Austr~ Visiti~ 2002 Q4  8.50 stl     11.1    70.0    81.1
## # ... with abbreviated variable names 1: remainder, 2: season_adjust
ah_fill |>
  autoplot(Trips) +
  autolayer(ah_fill |> filter_index("2002 Q3"~"2003 Q1"),
    Trips, colour="#D55E00") +
  labs(title = "Quarterly overnight trips to Adelaide Hills",
       y = "Number of trips")

## [1] "Figure 13.13: Number of overnight trips to the Adelaide Hills region of South Australia with the 2002Q4 outlier being replaced using an ARIMA model for interpolation."

13.10 Further reading

The last chapter in Ord et al. (2017) also covers “Forecasting in practice” and discusses other issues that might be of interest to readers.

Bibliography

  • Ord, J. K., Fildes, R., & Kourentzes, N. (2017). Principles of business forecasting (2nd ed.). Wessex Press Publishing Co. [Amazon]

  • Christou, V., & Fokianos, K. (2015). On count time series prediction. Journal of Statistical Computation and Simulation, 85(2), 357–373. [DOI]

  • Croston, J. D. (1972). Forecasting and stock control for intermittent demands. Operational Research Quarterly, 23(3), 289–303. [DOI]

  • Shenstone, L., & Hyndman, R. J. (2005). Stochastic models underlying Croston’s method for intermittent demand forecasting. Journal of Forecasting, 24(6), 389–402. [DOI]

  • Bates, J. M., & Granger, C. W. J. (1969). The combination of forecasts. Operational Research Quarterly, 20(4), 451–468. [DOI]

  • Clemen, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5(4), 559–583. [DOI]