library(fpp3)
https://robjhyndman.com/seminars/uwa2017/
https://robjhyndman.com/seminars/isi2019workshop/
https://github.com/robjhyndman/ETC3550Slides/tree/fable
https://github.com/rstudio-conf-2020/time-series-forecasting/blob/master/materials/labs.R
Higher frequency time series often exhibit more complicated seasonal patterns. For example, daily data may have a weekly pattern as well as an annual pattern.
The plots below show the number of calls to a North American commercial bank per 5-minute interval between 7:00am and 9:05pm each weekday over a 33-week period; the second plot shows just the first four weeks of the same time series. There is a strong daily seasonal pattern with frequency 169 (there are 169 5-minute intervals per day), and a weak weekly seasonal pattern with frequency \(169 \times 5 = 845\). (Call volumes on Mondays tend to be higher than on the rest of the week.) If a longer series of data were available, we might also observe an annual seasonal pattern.
load(file = "datasets/bank_calls.rda")
bank_calls %>%
fill_gaps() %>%
autoplot(Calls) +
labs(y = "Call volume", x = "Date")
bank_calls %>%
fill_gaps() %>%
filter(between(DateTime, as.POSIXct("2003-03-03 07:00:00"), as.POSIXct("2003-03-30 21:00:00"))) %>%
autoplot(Calls) +
labs(y = "Call volume", x = "Date")
Apart from the multiple seasonal periods, this series has the additional complexity of missing values between the working periods.
The STL() function is designed to deal with multiple seasonality. It will return multiple seasonal components, as well as a trend and remainder component. In this case, we need to re-index the tsibble to avoid the missing values, and then explicitly give the seasonal periods.
calls <- bank_calls %>%
  mutate(t = row_number()) %>%
  update_tsibble(index = t, regular = TRUE)
calls %>%
  model(STL(sqrt(Calls) ~ season(period = 169) + season(period = 5 * 169),
            robust = TRUE)) %>%
  components() %>%
  autoplot() + labs(x = "Observation")
There are two seasonal patterns shown, one for the time of day (the third panel), and one for the time of week (the fourth panel). The trend and the weekly seasonality have wider bars (and therefore relatively narrower ranges) compared to the other components, because there is little trend seen in the data, and the weekly seasonality is weak.
The decomposition can also be used in forecasting, with each of the seasonal components forecast using a seasonal naïve method, and the seasonally adjusted data forecast using ETS.
The code is slightly more complicated than usual because we have to add back the time stamps that were lost when we re-indexed the tsibble to handle the periods of missing observations. The square root transformation used in the STL decomposition has ensured the forecasts remain positive.
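The forecasting code itself is not shown above, so here is a sketch of how it could be written. It reuses the decomposition_model() pattern that appears later in this document for the weekly gasoline data, together with the same timestamp re-indexing steps as the dynamic harmonic regression example below; treat it as a sketch rather than the exact code used to produce the plots. The names fc and fc_with_times match those used in the plotting code that follows.
# Sketch: forecast the STL components (seasonal naive for the seasonal parts,
# ETS for the seasonally adjusted series)
my_dcmp_spec <- decomposition_model(
  STL(sqrt(Calls) ~ season(period = 169) + season(period = 5 * 169),
      robust = TRUE),
  ETS(season_adjust ~ season("N"))
)
fc <- calls %>%
  model(my_dcmp_spec) %>%
  forecast(h = 5 * 169)
# Add correct time stamps to the fable (same steps as in the DHR example below)
fc_with_times <- bank_calls %>%
  new_data(n = 7 * 24 * 60 / 5) %>%
  mutate(time = format(DateTime, format = "%H:%M:%S")) %>%
  filter(
    time %in% format(bank_calls$DateTime, format = "%H:%M:%S"),
    wday(DateTime, week_start = 1) <= 5
  ) %>%
  mutate(t = row_number() + max(calls$t)) %>%
  left_join(fc, by = "t") %>%
  as_fable(response = "Calls", distribution = Calls)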
# Plot results with last 3 weeks of data
fc_with_times %>%
autoplot(bank_calls %>% tail(14 * 169) %>% fill_gaps()) +
labs(x = "Date", y = "Call volume")
With multiple seasonalities, we can use Fourier terms in a dynamic harmonic regression, adding a separate set of Fourier terms for each seasonal period.
We will fit a dynamic harmonic regression model with an ARIMA error structure. The total number of Fourier terms for each seasonal period could be selected to minimise the AICc. However, for high seasonal periods, this tends to over-estimate the number of terms required, so we will use a more subjective choice with 10 terms for the daily seasonality and 5 for the weekly seasonality. Again, we will use a square root transformation to ensure the forecasts and prediction intervals remain positive. We set \(D=d=0\) in order to handle the non-stationarity through the regression terms, and \(P=Q=0\) in order to handle the seasonality through the regression terms.
fit <- calls %>%
  model(
    dhr = ARIMA(sqrt(Calls) ~ PDQ(0, 0, 0) + pdq(d = 0) +
                  fourier(period = 169, K = 10) +
                  fourier(period = 5 * 169, K = 5))
  )
fc <- fit %>% forecast(h = 5 * 169)
# Add correct time stamps to fable
fc_with_times <- bank_calls %>%
  new_data(n = 7 * 24 * 60 / 5) %>%
  mutate(time = format(DateTime, format = "%H:%M:%S")) %>%
  filter(
    time %in% format(bank_calls$DateTime, format = "%H:%M:%S"),
    wday(DateTime, week_start = 1) <= 5
  ) %>%
  mutate(t = row_number() + max(calls$t)) %>%
  left_join(fc, by = "t") %>%
  as_fable(response = "Calls", distribution = Calls)
# Plot results with last 3 weeks of data
fc_with_times %>%
autoplot(bank_calls %>% tail(14 * 169) %>% fill_gaps()) +
labs(x = "Date", y = "Call volume")
This is a large model, containing 33 parameters: 4 ARMA coefficients, 20 Fourier coefficients for frequency 169, and 8 Fourier coefficients for frequency 845. Not all of the Fourier terms for frequency 845 are used because there is some overlap with the terms of frequency 169 (since \(845=5×169\)).
One common application of such models is electricity demand modelling. The following plots show half-hourly electricity demand (MWh) in Victoria, Australia, during 2012–2014, along with temperatures (in degrees Celsius) for the same period for Melbourne (the largest city in Victoria).
vic_elec %>%
pivot_longer(Demand:Temperature, names_to = "Series") %>%
ggplot(aes(x = Time, y = value)) +
geom_line() +
facet_grid(rows = vars(Series), scales = "free_y") +
labs(x = "Time", y = "")
Plotting electricity demand against temperature (see the scatterplot below) shows that there is a nonlinear relationship between the two, with demand increasing at low temperatures (due to heating) and at high temperatures (due to cooling).
elec <- vic_elec %>%
mutate(
DOW = wday(Date, label = TRUE),
Working_Day = !Holiday & !(DOW %in% c("Sat", "Sun")),
Cooling = pmax(Temperature, 18)
)
elec %>%
ggplot(aes(x = Temperature, y = Demand, col = Working_Day)) +
geom_point(alpha = 0.6) +
labs(x = "Temperature (degrees Celsius)", y = "Demand (MWh)")
We will fit a regression model with a piecewise linear function of temperature (containing a knot at 18 degrees), and harmonic regression terms to allow for the daily seasonal pattern. Again, we set the orders of the Fourier terms subjectively, while using the AICc to select the order of the ARIMA errors.
fit <- elec %>%
  model(
    ARIMA(Demand ~ PDQ(0, 0, 0) + pdq(d = 0) +
            Temperature + Cooling + Working_Day +
            fourier(period = "day", K = 10) +
            fourier(period = "week", K = 5) +
            fourier(period = "year", K = 3))
  )
Forecasting with such models is difficult because we require future values of the predictor variables. In the following example, we have used a repeat of the last two days of temperatures to generate future possible demand values.
# Repeat the last two days of observed temperatures as assumed future values
temps <- tail(elec, 2 * 48)
fc <- fit %>%
  forecast(new_data = temps)
fc %>%
autoplot(elec %>% tail(10 * 48)) +
labs(x = "Date", y = "Demand (MWh)")
Although the short-term forecasts look reasonable, this is a crude model for a complicated process. The residuals demonstrate that there is a lot of information that has not been captured with this model.
fit %>% gg_tsresiduals()
A recent proposal is the Prophet model, available via the fable.prophet package. It was originally designed for forecasting daily data with weekly and yearly seasonality, plus holiday effects, and was later extended to cover more types of seasonal data. It works best with time series that have strong seasonality and several seasons of historical data.
Prophet can be considered a nonlinear regression model combining a trend, seasonal terms, and holiday effects.
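For reference, the standard Prophet formulation (not stated explicitly above, but consistent with the components discussed below) is
\[ y_t = g(t) + s(t) + h(t) + \varepsilon_t, \]
where \(g(t)\) is a piecewise-linear trend (the “growth” term), \(s(t)\) describes the various seasonal patterns, \(h(t)\) captures holiday effects, and \(\varepsilon_t\) is a white noise error term.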
library(fable.prophet)
cement <- aus_production %>%
filter(year(Quarter) >= 1988)
train <- cement %>%
filter(year(Quarter) <= 2007)
fit <- train %>%
model(
arima = ARIMA(Cement),
ets = ETS(Cement),
prophet = prophet(Cement ~ season(period = 4, order = 2, type = "multiplicative"))
)
Note that the seasonal term must have the period fully specified for quarterly and monthly data, as the default values are incorrect.
fc <- fit %>% forecast(h = "2 years 6 months")
fc %>% autoplot(cement)
In this example, the Prophet forecasts are worse than either the ETS or ARIMA forecasts.
fc %>% accuracy(cement)
We will fit a similar model to the dynamic harmonic regression (DHR) model from the previous section, but this time using a Prophet model. For daily and sub-daily data, the default periods are correctly specified, so that we can simply specify the period using a character string as follows.
elec <- vic_elec %>%
mutate(
DOW = wday(Date, label = TRUE),
Working_Day = !Holiday & !(DOW %in% c("Sat", "Sun")),
Cooling = pmax(Temperature, 18)
)
fit <- elec %>%
  model(
    prophet(Demand ~ Temperature + Cooling + Working_Day +
              season(period = "day", order = 10) +
              season(period = "week", order = 5) +
              season(period = "year", order = 3))
  )
fit %>%
components() %>%
autoplot()
The model specification is very similar to the DHR model in the previous section, although the result is different in several important ways. The Prophet model adds a piecewise linear time trend which is not really appropriate here as we don’t expect the long term forecasts to continue to follow the downward linear trend at the end of the series.
There is also substantial remaining autocorrelation in the residuals, as the residual diagnostics below show.
fit %>% gg_tsresiduals()
The resulting forecasts have prediction intervals that are too narrow, resulting in the actual values lying outside the prediction intervals for many periods.
# Again, repeat the last two days of observed temperatures as assumed future values
temps <- tail(elec, 2 * 48)
fc <- fit %>%
  forecast(new_data = temps)
fc %>%
autoplot(elec %>% tail(10 * 48)) +
labs(x = "Date", y = "Demand (MWh)")
Prophet has the advantage of being much faster to estimate than the DHR models we have considered previously, and it is completely automated. However, it rarely gives better forecast accuracy than the alternative approaches, as these two examples have illustrated.
One limitation of the models that we have considered so far is that they impose a unidirectional relationship — the forecast variable is influenced by the predictor variables, but not vice versa.
Such feedback relationships are allowed for in the vector autoregressive (VAR) framework. In this framework, all variables are treated symmetrically. They are all modelled as if they all influence each other equally. In more formal terminology, all variables are now treated as “endogenous.”
A VAR model is a generalisation of the univariate autoregressive model for forecasting a vector of time series.
If the series are stationary, we forecast them by fitting a VAR to the data directly (known as a “VAR in levels”). If the series are non-stationary, we take differences of the data in order to make them stationary, then fit a VAR model (known as a “VAR in differences”). In both cases, the models are estimated equation by equation using the principle of least squares. For each equation, the parameters are estimated by minimising the sum of squared \(e_{i,t}\) values.
There are two decisions one has to make when using a VAR to forecast, namely how many variables (denoted by \(K\)) and how many lags (denoted by \(p\)) should be included in the system. The number of coefficients to be estimated in a VAR is equal to \(K+pK^2\) (or \(1+pK\) per equation).
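For example (a standard illustration rather than anything tied to a particular dataset), a VAR(1) for \(K = 2\) variables can be written as
\[ y_{1,t} = c_1 + \phi_{11,1} y_{1,t-1} + \phi_{12,1} y_{2,t-1} + e_{1,t}, \]
\[ y_{2,t} = c_2 + \phi_{21,1} y_{1,t-1} + \phi_{22,1} y_{2,t-1} + e_{2,t}, \]
so each equation has \(1 + pK = 3\) coefficients to estimate, and the system has \(K + pK^2 = 6\) in total.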
In practice, it is usual to keep \(K\) small and include only variables that are correlated with each other, and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags to be included, but care should be taken with the AICc as it tends to choose large numbers of lags; for VAR models, the BIC is often preferred. A more sophisticated version of the model is a “sparse VAR” (where many coefficients are set to zero); another approach is to use “shrinkage estimation” (where coefficients are shrunk towards zero).
VARs are useful in several contexts: forecasting a collection of related variables where no explicit interpretation is required; testing whether one variable is useful in forecasting another (the basis of Granger causality tests); impulse response analysis; and forecast error variance decomposition.
fit <- us_change %>%
model(
aicc = VAR(vars(Consumption, Income)),
bic = VAR(vars(Consumption, Income), ic = "bic")
)
fit
glance(fit)
A VAR(5) model is selected using the AICc (the default), while a VAR(1) model is selected using the BIC. This is not unusual: the BIC imposes a stronger penalty on the number of parameters, so it tends to select a model with fewer parameters than the AICc.
fit %>%
augment() %>%
ACF(.innov) %>%
autoplot()
We see that the residuals from the VAR(1) model (bic) have significant autocorrelation for Consumption, while the VAR(5) model has effectively captured all the information in the data.
fit %>%
select(aicc) %>%
forecast() %>%
autoplot(us_change %>% filter(year(Quarter) > 2010))
fit %>%
select(bic) %>%
forecast() %>%
autoplot(us_change %>% filter(year(Quarter) > 2010))
Artificial neural networks are forecasting methods that are based on simple mathematical models of the brain. They allow complex nonlinear relationships between the response variable and its predictors.
With time series data, lagged values of the time series can be used as inputs to a neural network, just as we used lagged values in a linear autoregression model. We call this a neural network autoregression or NNAR model.
The NNETAR() function fits an \(NNAR(p,P,k)_m\) model. If the values of p and P are not specified, they are selected automatically. For non-seasonal time series, the default is the optimal number of lags (according to the AIC) for a linear \(AR(p)\) model. For seasonal time series, the default values are \(P=1\) and p is chosen from the optimal linear model fitted to the seasonally adjusted data. If k is not specified, it is set to \(k=(p+P+1)/2\) (rounded to the nearest integer).
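As a quick worked example of that default, consider the non-seasonal sunspot model fitted below (so \(P = 0\)): with \(p = 9\) lags, the default hidden-layer size is
\[ k = (9 + 0 + 1)/2 = 5, \]
matching the NNAR(9,5) model described in the text that follows.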
When it comes to forecasting, the network is applied iteratively. For forecasting one step ahead, we simply use the available historical inputs. For forecasting two steps ahead, we use the one-step forecast as an input, along with the historical data. This process proceeds until we have computed all the required forecasts.
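As a rough illustration of that recursion (a sketch only; NNETAR() handles this internally, and the one-step predictor f below is hypothetical):
# Sketch of the recursive multi-step forecasting idea behind NNETAR().
# `f` is a hypothetical fitted one-step predictor taking the last p values.
iterate_forecasts <- function(f, history, h, p) {
  for (i in seq_len(h)) {
    inputs <- tail(history, p)          # most recent p values (data, then earlier forecasts)
    history <- c(history, f(inputs))    # append the new one-step forecast
  }
  tail(history, h)                      # return the h forecasts
}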
The surface of the sun contains magnetic regions that appear as dark spots. These affect the propagation of radio waves, and so telecommunication companies like to predict sunspot activity in order to plan for any future difficulties. Sunspots follow a cycle of length between 9 and 14 years. Forecasts from an NNAR(9,5) are shown for the next 30 years. We have used a square root transformation to ensure the forecasts stay positive.
sunspots <- sunspot.year %>% as_tsibble()
fit <- sunspots %>%
model(NNETAR(sqrt(value)))
fit %>%
forecast(h = 30) %>%
autoplot(sunspots) +
labs(x = "Year", y = "Counts",
title = "Yearly sunspots")
Here, the last 9 observations are used as predictors, and there are 5 neurons in the hidden layer. The cyclicity in the data has been modelled well. We can also see that the asymmetry of the cycles has been captured by the model, where the increasing part of the cycle is steeper than the decreasing part. This is one difference between an NNAR model and a linear AR model: while linear AR models can model cyclicity, the modelled cycles are always symmetric.
Here is a simulation of 9 possible future sample paths for the sunspot data. Each sample path covers the next 30 years after the observed data.
fit %>%
generate(times = 9, h = 30) %>%
autoplot(.sim) +
autolayer(sunspots, value) +
theme(legend.position = "none")
We can generate new time series that are similar to our observed series, using another type of bootstrap.
First, the time series is transformed if necessary, and then decomposed into trend, seasonal and remainder components using STL. Then we obtain shuffled versions of the remainder component to get bootstrapped remainder series. Because there may be autocorrelation present in an STL remainder series, we use a “blocked bootstrap,” where contiguous sections of the time series are selected at random and joined together. These bootstrapped remainder series are added to the trend and seasonal components, and the transformation is reversed to give variations on the original time series.
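To make the idea concrete, here is a minimal sketch of a blocked bootstrap of a remainder series. It is purely illustrative; the generate() call used below does this for us, and the function name block_bootstrap is made up for this sketch.
# Illustrative blocked bootstrap: resample contiguous blocks of a remainder
# series and join them to form a new remainder series of the same length.
block_bootstrap <- function(remainder, block_size = 8) {
  n <- length(remainder)
  starts <- sample(seq_len(n - block_size + 1),
                   size = ceiling(n / block_size), replace = TRUE)
  blocks <- lapply(starts, function(s) remainder[s:(s + block_size - 1)])
  head(unlist(blocks), n)  # join the blocks and trim to the original length
}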
Consider the quarterly cement production in Australia from 1988 Q1 to 2010 Q2. First we check that the decomposition has adequately captured the trend and seasonality, and that there is no obvious remaining signal in the remainder series.
cement <- aus_production %>%
filter(year(Quarter) >= 1988) %>%
select(Quarter, Cement)
cement_stl <- cement %>%
model(stl = STL(Cement))
cement_stl %>%
components() %>%
autoplot()
Now we can generate several bootstrapped versions of the data. Usually, generate() produces simulations of the future from a model. But here we want simulations for the period of the historical data. So we use the new_data argument to pass in the original data so that the same time periods are used for the simulated data. We will use a block size of 8 to cover two years of data.
library(fable)
# Generate several (here 10) bootstrapped series, using blocks of 8 quarters
cement_stl %>%
  generate(new_data = cement, times = 10,
           bootstrap = TRUE, bootstrap_block_size = 8) %>%
  autoplot(.sim) +
  autolayer(cement, Cement) +
  guides(colour = "none") +
  labs(title = "Cement production: Bootstrapped series",
       y = "Tonnes ('000)")
One use for these bootstrapped time series is to improve forecast accuracy. If we produce forecasts from each of the additional time series, and average the resulting forecasts, we get better forecasts than if we simply forecast the original time series directly. This is called “bagging” which stands for “bootstrap aggregating.”
We demonstrate the idea using the cement data. First, we simulate many time series that are similar to the original data, using the block-bootstrap described above.
sim <- cement_stl %>%
  generate(new_data = cement, times = 100,
           bootstrap_block_size = 8) %>%
  select(-.model, -Cement)
For each of these series, we fit an ETS model. A different ETS model may be selected in each case, although it will most likely select the same model because the series are similar. However, the estimated parameters will be different, so the forecasts will be different even if the selected model is the same. This is a time-consuming process as there are a large number of series.
ets_forecasts <- sim %>%
model(ets = ETS(.sim)) %>%
forecast(h = 12)
ets_forecasts %>%
  update_tsibble(key = .rep) %>%
  autoplot(.mean) +
  autolayer(cement, Cement) +
  guides(colour = "none") +
  labs(title = "Cement production: bootstrapped forecasts",
       y = "Tonnes ('000)")
Finally, we average these forecasts for each time period to obtain the “bagged forecasts” for the original data.
bagged <- ets_forecasts %>%
summarise(bagged_mean = mean(.mean))
cement %>%
model(ets = ETS(Cement)) %>%
forecast(h = 12) %>%
autoplot(cement) +
autolayer(bagged, bagged_mean, col = "red") +
labs(title = "Cement production in Australia",
y="Tonnes ('000)")
On average, bagging gives better forecasts than just applying ETS() directly. Of course, it is slower because a lot more computation is required.
Weekly data is difficult to work with because the seasonal period (the number of weeks in a year) is both large and non-integer. The average number of weeks in a year is 52.18. Most of the methods we have considered require the seasonal period to be an integer. Even if we approximate it by 52, most of the methods will not handle such a large seasonal period efficiently.
The simplest approach is to use an STL decomposition along with a non-seasonal method applied to the seasonally adjusted data. Here is an example using weekly data on US finished motor gasoline products supplied (in millions of barrels per day) from February 1991 to May 2005.
my_dcmp_spec <- decomposition_model(
STL(Barrels),
ETS(season_adjust ~ season("N"))
)
us_gasoline %>%
model(stl_ets = my_dcmp_spec) %>%
forecast(h = "2 years") %>%
autoplot(us_gasoline) + labs(y = "Millions of barrels per day")
An alternative approach is to use a dynamic harmonic regression model. In the following example, the number of Fourier terms was selected by minimising the AICc. The order of the ARIMA model is also selected by minimising the AICc, although that is done within the ARIMA() function. We use PDQ(0,0,0) to prevent ARIMA() trying to handle the seasonality using seasonal ARIMA components.
gas_dhr <- us_gasoline %>%
model(dhr = ARIMA(Barrels ~ PDQ(0, 0, 0) + fourier(K = 6)))
gas_dhr %>%
forecast(h = "2 years") %>%
autoplot(us_gasoline)
The STL approach is preferable when the seasonality changes over time. The dynamic harmonic regression approach is preferable if there are covariates that are useful predictors as these can be added as additional regressors.
Daily and sub-daily data are challenging for a different reason — they often involve multiple seasonal patterns, and so we need to use a method that handles such complex seasonality.
If the time series is relatively short, so that only one type of seasonality is present, then it will be possible to use one of the single-seasonal methods (e.g., ETS or a seasonal ARIMA model). But when the time series is long enough that some of the longer seasonal periods become apparent, it will be necessary to use STL, dynamic harmonic regression or Prophet.
However, these methods only allow for regular seasonality. Capturing seasonality associated with moving events such as Easter, Id, or the Chinese New Year is more difficult.
The best way to deal with moving holiday effects is to include dummy variables in the model. This can be done within the ARIMA() or prophet() functions, for example, but not within ETS(). In fact, prophet() has a holiday() special to easily incorporate holiday effects.
The CROSTON() function produces forecasts using Croston’s method. The two smoothing parameters \(α_a\) and \(α_q\) are estimated from the data. This is different from the way Croston envisaged the method being used. He would simply use \(α_a=α_q=0.1\) , and set \(a_0\) and \(q_0\) to be equal to the first observation in each of the series.
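For reference, a sketch of the updating equations behind these parameters, using the notation that appears in the estimates reported below (this follows the standard description of Croston’s method, where \(q_i\) is the \(i\)th non-zero demand and \(a_i\) is the time since the previous non-zero demand):
\[ \hat{q}_{i+1|i} = (1-\alpha_q)\,\hat{q}_{i|i-1} + \alpha_q q_i, \qquad \hat{a}_{i+1|i} = (1-\alpha_a)\,\hat{a}_{i|i-1} + \alpha_a a_i, \]
and all \(h\)-step forecasts are equal to the ratio
\[ \hat{y}_{T+h|T} = \hat{q}_{T+1|T} \big/ \hat{a}_{T+1|T}. \]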
The plot below shows the numbers of scripts sold each month for Immune sera and immunoglobulin products in Australia. The data contain small counts, with many months registering no sales at all, and only small numbers of items sold in other months.
j06 <- PBS %>%
filter(ATC2 == "J06") %>%
summarise(Scripts = sum(Scripts)) %>%
fill_gaps(Scripts = 0)
j06 %>% autoplot(Scripts)
In this example, the smoothing parameters are estimated to be \(α_a=0.08\), \(α_q=0.71\), \(\hat{q}_{1|0}=4.17\), and \(\hat{a}_{1|0}=3.52\). The final forecasts for the two series are \(\hat{q}_{T+1|T} = 2.419\) and \(\hat{a}_{T+1|T} = 2.484\). So the forecasts are all equal to \(\hat{y}_{T+h|T} = 2.419/2.484 = 0.974\).
fable does these calculations for you:
j06 %>%
model(CROSTON(Scripts)) %>%
forecast(h = 6)
The Scripts column repeats the mean rather than providing a full forecast distribution, because there is no underlying stochastic model.
It is common to want forecasts to be positive, or to require them to be within some specified range \([a,b]\). Both of these situations are relatively easy to handle using transformations.
To impose a positivity constraint, we can simply work on the log scale.
load(file = "datasets/prices.rda")
# real price of a dozen eggs (1900-1993; in cents)
egg_prices <- prices %>% filter(!is.na(eggs))
egg_prices %>%
model(ETS(log(eggs) ~ trend("A"))) %>%
forecast(h = 50) %>%
autoplot(egg_prices)
To see how to handle data constrained to an interval, imagine that the egg prices were constrained to lie within \(a=50\) and \(b=400\).
We can use a scaled logit transformation, which maps the interval \((a,b)\) to the whole real line:
\[ y = \log\left(\frac{x-a}{b-x}\right), \]
where \(x\) is on the original scale and \(y\) is the transformed data. To reverse the transformation, we use
\[ x = \frac{(b-a)e^y}{1+e^y} + a. \]
scaled_logit <- function(x, lower = 0, upper = 1) {
  log((x - lower) / (upper - x))
}
inv_scaled_logit <- function(x, lower = 0, upper = 1) {
  (upper - lower) * exp(x) / (1 + exp(x)) + lower
}
my_scaled_logit <- new_transformation(scaled_logit, inv_scaled_logit)
egg_prices %>%
model(ETS(my_scaled_logit(eggs, lower = 50, upper = 400) ~ trend("A"))) %>%
forecast(h = 50) %>%
autoplot(egg_prices)
The bias-adjustment is automatically applied here, and the prediction intervals from these transformations have the same coverage probability as on the transformed scale, because quantiles are preserved under monotonically increasing transformations.
An easy way to improve forecast accuracy is to use several different methods on the same time series, and to average the resulting forecasts.
Here is an example using monthly revenue from take-away food in Australia, from April 1982 to December 2018. We use forecasts from the following models: ETS, STL-ETS, and ARIMA; and we compare the results using the last 5 years (60 months) of observations.
auscafe <- aus_retail %>%
filter(stringr::str_detect(Industry, "Takeaway")) %>%
summarise(Turnover = sum(Turnover))
train <- auscafe %>%
filter(year(Month) <= 2013)
STLF <- decomposition_model(
STL(log(Turnover) ~ season(window = Inf)),
ETS(season_adjust ~ season("N"))
)
cafe_models <- train %>%
model(
ets = ETS(Turnover),
stlf = STLF,
arima = ARIMA(log(Turnover))
) %>%
mutate(combination = (ets + stlf + arima) / 3)
cafe_fc <- cafe_models %>%
forecast(h = "5 years")
cafe_fc %>%
autoplot(auscafe %>% filter(year(Month) > 2008), level = NULL) +
labs(y = "$ billion", title = "Australian monthly expenditure on eating out")
cafe_fc %>%
accuracy(auscafe) %>%
arrange(RMSE)
ARIMA does particularly well with this series, while the combination approach does even better (based on most measures including RMSE and MAE).
The cafe_fc object contains forecast distributions, from which any prediction interval can usually be computed. Let’s look at the intervals for the first period.
cafe_fc %>% filter(Month == min(Month))
The package does not yet combine such diverse distributions, so the combination output is simply the mean instead.
However, if we work with simulated sample paths, it is possible to create forecast distributions for the combination forecast as well.
cafe_futures <- cafe_models %>%
# Generate 1000 future sample paths
generate(h = "5 years", times = 1000) %>%
# Compute forecast distributions from future sample paths
as_tibble() %>%
group_by(Month, .model) %>%
summarise(dist = distributional::dist_sample(list(.sim))) %>%
ungroup() %>%
# Create fable object
as_fable(index = Month, key = .model, distribution = dist, response = "Turnover")
`summarise()` has grouped output by 'Month'. You can override using the `.groups` argument.
The dimnames of the fable's distribution are missing and have been set to match the response variables.
# Forecast distributions for h=1
cafe_futures %>% filter(Month == min(Month))
Now all four models and the combination are stored as empirical distributions, and we can plot prediction intervals for the combination forecast.
cafe_futures %>%
filter(.model == "combination") %>%
autoplot(auscafe %>% filter(year(Month) > 2008)) +
labs(y = "$ billion", title = "Australian monthly expenditure on eating out")
To check the accuracy of the 95% prediction intervals, we can use a Winkler score.
cafe_futures %>% accuracy(auscafe, measures = interval_accuracy_measures, level = 95)
Lower is better, so the combination forecast is again better than any of the component models.
A common problem is to forecast the aggregate of several time periods of data, using a model fitted to the disaggregated data. For example, we may have monthly data but wish to forecast the total for the next year. Or we may have weekly data, and want to forecast the total for the next four weeks.
If the point forecasts are means, then adding them up will give a good estimate of the total. But prediction intervals are more tricky due to the correlations between forecast errors.
A general solution is to use simulations. Here is an example using ETS models applied to Australian take-away food sales, assuming we wish to forecast the aggregate revenue in the next 12 months.
fit <- auscafe %>%
# Fit a model to the data
model(ETS(Turnover))
futures <- fit %>%
# Simulate 10000 future sample paths, each of length 12
generate(times = 10000, h = 12) %>%
# Sum the results for each sample path
as_tibble() %>%
group_by(.rep) %>%
summarise(.sim = sum(.sim)) %>%
# Store as a distribution
summarise(total = distributional::dist_sample(list(.sim)))
We can compute the mean of the simulations, along with prediction intervals:
futures %>%
mutate(
mean = mean(total),
pi80 = hilo(total, 80),
pi95 = hilo(total, 95)
)
As expected, the mean of the simulated data is close to the sum of the individual forecasts.
forecast(fit, h = 12) %>%
as_tibble() %>%
summarise(total = sum(.mean))
Sometimes it is useful to “backcast” a time series — that is, forecast in reverse time. Although there are no in-built R functions to do this, it is easy to implement by creating a new time index.
Suppose we want to extend our Australian take-away food data back to the start of 1981 (the actual data starts in April 1982).
backcasts <- auscafe %>%
mutate(reverse_time = rev(row_number())) %>%
update_tsibble(index = reverse_time) %>%
model(ets = ETS(Turnover ~ season(period = 12))) %>%
forecast(h = 15) %>%
mutate(Month = auscafe$Month[1] - (1:15)) %>%
as_fable(index = Month, response = "Turnover", distribution = "Turnover")
backcasts %>%
autoplot(auscafe %>% filter(year(Month) < 1985)) +
labs(title = "Backcasts from ETS model")
Most of the work here is in re-indexing the tsibble object and then re-indexing the fable object.
The sample size required increases with the number of parameters to be estimated, and the amount of noise in the data.
The AICc is particularly useful here, because it is a proxy for the one-step forecast out-of-sample MSE. Choosing the model with the minimum AICc value allows both the number of parameters and the amount of noise to be taken into account.
What tends to happen with short series is that the AICc suggests simple models because anything with more than one or two parameters will produce poor forecasts due to the estimation error. We will fit an ARIMA model to the annual series from the M3-competition with fewer than 20 observations. First we need to create a tsibble, containing the relevant series.
m3totsibble <- function(z) {
  bind_rows(
    as_tsibble(z$x) %>% mutate(Type = "Training"),
    as_tsibble(z$xx) %>% mutate(Type = "Test")
  ) %>%
    mutate(
      st = z$st,
      type = z$type,
      period = z$period,
      description = z$description,
      sn = z$sn
    ) %>%
    as_tibble()
}
short <- Mcomp::M3 %>%
subset("yearly") %>%
purrr::map_dfr(m3totsibble) %>%
group_by(sn) %>%
mutate(n = max(row_number())) %>%
filter(n <= 20) %>%
ungroup() %>%
as_tsibble(index = index, key = c(sn, period, st))
Now we can apply an ARIMA model to each series.
short_fit <- short %>%
model(arima = ARIMA(value))
Of the 152 series, 21 had models with zero parameters (white noise and random walks), 86 had models with one parameter, 31 had models with two parameters, 13 had models with three parameters, and only 1 series had a model with four parameters.
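If you want to check counts like these yourself, one possible (hypothetical) tabulation uses tidy(), which returns one row per estimated coefficient; note that series whose chosen model has no coefficients at all will not appear in this tally and need to be counted separately.
# Hypothetical check of the parameter counts reported above.
# tidy() returns one row per estimated coefficient, keyed by series (sn).
short_fit %>%
  tidy() %>%
  count(sn, name = "n_coefficients") %>%
  count(n_coefficients)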
Most time series models do not work well for very long time series. The problem is that real data do not come from the models we use. When the number of observations is not large (say up to about 200) the models often work well as an approximation to whatever process generated the data. But eventually we will have enough data that the difference between the true process and the model starts to become more obvious. An additional problem is that the optimisation of the parameters becomes more time consuming because of the number of observations involved.
What to do about these issues depends on the purpose of the model. A more flexible and complicated model could be used, but this still assumes that the model structure will work over the whole period of the data. A better approach is usually to allow the model itself to change over time. ETS models are designed to handle this situation by allowing the trend and seasonal terms to evolve over time. ARIMA models with differencing have a similar property. But dynamic regression models do not allow any evolution of model components.
If we are only interested in forecasting the next few observations, one simple approach is to throw away the earliest observations and only fit a model to the most recent observations. Then an inflexible model can work well because there is not enough time for the relationships to change substantially.
Typically, we compute one-step forecasts on the training data (the “fitted values”) and multi-step forecasts on the test data. However, occasionally we may wish to compute multi-step forecasts on the training data, or one-step forecasts on the test data.
We will illustrate the method using an ARIMA model for the Australian take-away food expenditure. The last five years are used for a test set, and the forecasts are plotted.
training <- auscafe %>% filter(year(Month) <= 2013)
test <- auscafe %>% filter(year(Month) > 2013)
cafe_fit <- training %>%
model(ARIMA(log(Turnover)))
cafe_fit %>%
forecast(h = 60) %>%
autoplot(auscafe)
The fitted() function has an h argument to allow for
h-step “fitted values” on the training set. Because the model involves both seasonal (lag 12) and first (lag 1) differencing, it is not possible to compute these forecasts for the first few observations.
fits12 <- fitted(cafe_fit, h = 12)
training %>%
  autoplot(Turnover) +
  autolayer(fits12, .fitted, colour = "red")
It is common practice to fit a model using training data, and then to evaluate its performance on a test data set. The way this is usually done means the comparisons on the test data use different forecast horizons. In the above example, we have used the last sixty observations for the test data, and estimated our forecasting model on the training data. Then the forecast errors will be for 1-step, 2-steps, …, 60-steps ahead. The forecast variance usually increases with the forecast horizon, so if we are simply averaging the absolute or squared errors from the test set, we are combining results with different variances.
One solution to this issue is to obtain 1-step errors on the test data. That is, we still use the training data to estimate any parameters, but when we compute forecasts on the test data, we use all of the data preceding each observation (both training and test data). So our training data are for times \(1, 2, \dots, T-60\). We estimate the model on these data, but then compute \(\hat{y}_{T-60+h|T-61+h}\) for \(h = 1, \dots, 60\). Because the test data are not used to estimate the parameters, this still gives us a “fair” forecast.
Using the ARIMA model estimated above, we now apply it to the test data.
cafe_fit %>%
refit(test) %>%
accuracy()
Note that model is not re-estimated in this case. Instead, the model obtained previously (and stored as cafe_fit) is applied to the test data. Because the model was not re-estimated, the “residuals” obtained here are actually one-step forecast errors. Consequently, the results produced from the accuracy() command are actually on the test set (despite the output saying “Training set”). This approach can be used to compare one-step forecasts from different models.
Consider the number of visitors to the Adelaide Hills region of South Australia, plotted below. There appears to be an unusual observation in 2002 Q4.
tourism %>%
filter(Region == "Adelaide Hills", Purpose == "Visiting") %>%
autoplot(Trips)
One useful way to find outliers is to apply STL() to the series with the argument robust = TRUE. Then any outliers should show up in the remainder series. A boxplot of the remainder series would flag as outliers any points lying more than 1.5 interquartile ranges (IQRs) beyond the central 50% of the data; if the remainder were normally distributed, about 7 in every 1000 observations would be flagged as “outliers.” A stricter rule defines outliers as points lying more than 3 IQRs beyond the central 50% of the data, which would flag only about 1 in 500,000 normally distributed observations. This is the rule we prefer to use.
The data have almost no visible seasonality, so we will apply STL without a seasonal component by setting period=1.
ah_decomp <- tourism %>%
filter(Region == "Adelaide Hills", Purpose == "Visiting") %>%
# Fit a non-seasonal STL decomposition
model(stl = STL(Trips ~ season(period = 1), robust = TRUE)) %>%
components()
ah_decomp %>% autoplot()
outliers <- ah_decomp %>%
filter(
remainder < quantile(remainder, 0.25) - 3 * IQR(remainder) |
remainder > quantile(remainder, 0.75) + 3 * IQR(remainder)
)
outliers
This finds the one outlier that we suspected.
Missing data can arise for many reasons, and it is worth considering whether the missingness will induce bias in the forecasting model.
Some methods allow for missing values without any problems. For example, the naïve forecasting method continues to work, with the most recent non-missing value providing the forecast for the future time periods.
The fable functions for ARIMA models, dynamic regression models and NNAR models will also work correctly without causing errors. However, other modelling functions do not handle missing values including ETS() and STL().
When missing values cause errors, there are at least two ways to handle the problem. First, we could just take the section of data after the last missing value, assuming there is a long enough series of observations to produce meaningful forecasts. Alternatively, we could replace the missing values with estimates by first fitting an ARIMA model, and then using the model to interpolate the missing observations.
We will replace the outlier identified by an estimate using an ARIMA model.
ah_miss <- tourism %>%
filter(Region == "Adelaide Hills", Purpose == "Visiting") %>%
# Remove outlying observations
anti_join(outliers) %>%
# Replace with missing values
fill_gaps()
Joining, by = c("Quarter", "Region", "State", "Purpose", "Trips")
ah_fill <- ah_miss %>%
# Fit ARIMA model to the data containing missing values
model(ARIMA(Trips)) %>%
# Estimate Trips for all periods
interpolate(ah_miss)
ah_fill %>%
# Only show outlying periods
right_join(outliers %>% select(-Trips))
Joining, by = c("Region", "State", "Purpose", "Quarter")
The interpolate() function uses the ARIMA model to estimate any missing values in the series. In this case, the outlier of 81.1 has been replaced with 8.5.
The ah_fill data could now be modelled with a function that does not allow missing values.