The process of producing forecasts can be split up into a few fundamental steps.
- Preparing data
- Data visualisation
- Specifying a model
- Model estimation
- Accuracy & performance evaluation
- Producing forecasts
To illustrate the process, we will fit linear trend models to national GDP data stored in global_economy.
The first step in forecasting is to prepare data in the correct format.
It involves loading the data, identifying missing values, filtering the time series, and other pre-processing tasks.
The functionality provided by tsibble and other packages in the tidyverse simplifies this step.
Each data set is unique. Visualising the data is an essential step in understanding its features, and should always be done before any models are estimated.
gdppc <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  select(Year, Country, GDP, Population, GDP_per_capita)
gdppc
## # A tsibble: 15,150 x 5 [1Y]
## # Key: Country [263]
## Year Country GDP Population GDP_per_capita
## <dbl> <fct> <dbl> <dbl> <dbl>
## 1 1960 Afghanistan 537777811. 8996351 59.8
## 2 1961 Afghanistan 548888896. 9166764 59.9
## 3 1962 Afghanistan 546666678. 9345868 58.5
## 4 1963 Afghanistan 751111191. 9533954 78.8
## 5 1964 Afghanistan 800000044. 9731361 82.2
## 6 1965 Afghanistan 1006666638. 9938414 101.
## 7 1966 Afghanistan 1399999967. 10152331 138.
## 8 1967 Afghanistan 1673333418. 10372630 161.
## 9 1968 Afghanistan 1373333367. 10604346 130.
## 10 1969 Afghanistan 1408888922. 10854428 130.
## # … with 15,140 more rows
gdppc %>%
filter(Country=="Brazil") %>%
autoplot(GDP_per_capita) +
labs(title = "GDP per capita for Brazil", y = "$US")
There are many different time series models that can be used for forecasting.
Specifying an appropriate model for the data is essential for producing reliable forecasts.
We can specify different models using the model() function in the fable package.
This function uses a formula (y ~ x) interface: the response variable(s) are specified on the left of the formula, and the structure of the model is written on the right.
gdppc %>%
filter(Country== "Brazil") %>%
model(TSLM(GDP_per_capita ~ trend()))
## # A mable: 1 x 2
## # Key: Country [1]
## Country `TSLM(GDP_per_capita ~ trend())`
## <fct> <model>
## 1 Brazil <TSLM>
Once an appropriate model is specified, the next step is to estimate (train) the model given our data.
One or more model specifications can be estimated using the model() function.
estim<-gdppc %>%
model(trend_model = TSLM(GDP_per_capita ~ trend()))
estim
## # A mable: 263 x 2
## # Key: Country [263]
## Country trend_model
## <fct> <model>
## 1 Afghanistan <TSLM>
## 2 Albania <TSLM>
## 3 Algeria <TSLM>
## 4 American Samoa <TSLM>
## 5 Andorra <TSLM>
## 6 Angola <TSLM>
## 7 Antigua and Barbuda <TSLM>
## 8 Arab World <TSLM>
## 9 Argentina <TSLM>
## 10 Armenia <TSLM>
## # … with 253 more rows
Here, we fit a linear trend model to the GDP per capita data for each combination of the key variable (country) in the tsibble.
In this example, it fits a model to each of the 263 countries in the dataset. The resulting object is a model table, or "mable".
The trend_model column contains information about the fitted model for each country (each cell corresponds to a fitted model).
Once a model has been fitted, it is important to check how well it has performed on the data.
There are several diagnostic tools available to check model behavior, and also accuracy measures that allow one model to be compared against another.
This is an essential step to make sure your model accounts for the behaviour of the data.
With an appropriate model specified, estimated and checked, it is time to produce the forecasts using the function forecast().
We have to specify the number of future observations to forecast. For example, forecasts of the next 10 observations can be generated using h = 10.
We can also use natural language; e.g., h = "2 years" can be used to predict two years into the future.
fore <- estim %>%
  forecast(h = 3)
fore
## # A fable: 789 x 5 [1Y]
## # Key: Country, .model [263]
## Country .model Year GDP_per_capita .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 Afghanistan trend_model 2018 N(526, 9653) 526.
## 2 Afghanistan trend_model 2019 N(534, 9689) 534.
## 3 Afghanistan trend_model 2020 N(542, 9727) 542.
## 4 Albania trend_model 2018 N(4716, 476419) 4716.
## 5 Albania trend_model 2019 N(4867, 481086) 4867.
## 6 Albania trend_model 2020 N(5018, 486012) 5018.
## 7 Algeria trend_model 2018 N(4410, 643094) 4410.
## 8 Algeria trend_model 2019 N(4489, 645311) 4489.
## 9 Algeria trend_model 2020 N(4568, 647602) 4568.
## 10 American Samoa trend_model 2018 N(12491, 652926) 12491.
## # … with 779 more rows
This generates a forecast table, or "fable".
Each row corresponds to one forecast period for each country.
The GDP_per_capita column contains the forecast distribution, while the .mean column contains the point forecast. The point forecast is the mean (or average) of the forecast distribution.
estim %>%
  forecast(h = 3) %>%
  filter(Country == "Brazil") %>%
  autoplot(gdppc) +
  labs(y = "$US", title = "GDP per capita for Brazil")
bricks <- aus_production %>%
filter_index("1970 Q1" ~ "2004 Q4") %>%
select(Bricks)
The filter_index() function is a convenient shorthand for extracting a section of a time series.

MEAN(y): Average method

bricks %>%
  model(MEAN(Bricks)) %>%
  forecast(h = 20) %>%
  autoplot(bricks, level = NULL)
# level = NULL removes the prediction intervals
NAIVE(y): Naïve method

Forecasts equal to the last observed value.
Forecasts: \(\hat{y}_{T+h|T} =y_T\).
Consequence of efficient market hypothesis.
Because the naïve forecast is optimal when data follow a random walk (see Section 9.1), these are also called random walk forecasts, and the RW() function can be used instead of NAIVE().
bricks %>%
  model(NAIVE(Bricks)) %>%
  forecast(h = "5 years") %>%
  autoplot(bricks, level = NULL)
SNAIVE(y ~ lag(m)): Seasonal naïve method

bricks %>%
model(SNAIVE(Bricks ~ lag("year"))) %>%
forecast(h = "5 years") %>%
autoplot(bricks, level = NULL) +
labs(title = "Clay brick production in Australia")
bricks %>%
  model(SNAIVE(Bricks ~ lag("5 years"))) %>%
forecast(h = "5 years") %>%
autoplot(bricks, level = NULL) +
labs(title = "Clay brick production in Australia")
The lag() function allows you to choose which past observations the forecasts are based on. For some time series there is more than one seasonal period, and then the required lag must be specified.

RW(y ~ drift()): Drift method

A variation on the naïve method is to allow the forecasts to increase or decrease over time.
The amount of change over time (called the drift) is set to be the average change seen in the historical data.
Forecasts equal to last value plus average change.
\(\hat{y}_{T+h|T} = y_{T} + \frac{h}{T-1}\sum_{t=2}^T (y_t-y_{t-1}) = y_T + \frac{h}{T-1}(y_T -y_1)\)
bricks %>%
model(RW(Bricks ~ drift())) %>%
forecast(h = "5 years") %>%
autoplot(bricks, level = NULL) +
geom_line(data = slice(bricks, range(cumsum(!is.na(Bricks)))),
aes(y=Bricks), linetype = "dashed", colour = "#0072B2") +
labs(title = "Clay brick production in Australia")
brick_fit <- aus_production %>%
filter_index("1970 Q1" ~ "2004 Q4") %>%
model(
Seasonal_naive = SNAIVE(Bricks),
Naive = NAIVE(Bricks),
Drift = RW(Bricks ~ drift()),
Mean = MEAN(Bricks)
)
## # A mable: 1 x 4
## Seasonal_naive Naive Drift Mean
## <model> <model> <model> <model>
## 1 <SNAIVE> <NAIVE> <RW w/ drift> <MEAN>
brick_fc <- brick_fit %>% forecast(h = "5 years")
## # A fable: 80 x 4 [1Q]
## # Key: .model [4]
## .model Quarter Bricks .mean
## <chr> <qtr> <dist> <dbl>
## 1 Seasonal_naive 2005 Q1 N(409, 3026) 409
## 2 Seasonal_naive 2005 Q2 N(423, 3026) 423
## 3 Seasonal_naive 2005 Q3 N(428, 3026) 428
## 4 Seasonal_naive 2005 Q4 N(397, 3026) 397
## # … with 76 more rows
brick_fc %>%
autoplot(aus_production, level = NULL) +
labs(title = "Clay brick production in Australia",
y = "Millions of bricks") +
guides(colour = guide_legend(title = "Forecast"))
# Extract training data
apple_stock <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
We re-index the data based on trading days rather than calendar days.

# Specify, estimate and forecast
apple_stock %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
Drift = RW(Close ~ drift())
) %>%
forecast(h=42) %>%
autoplot(apple_stock, level = NULL) +
labs(title = "Apple closing stock price", y="$US") +
guides(colour=guide_legend(title="Forecast"))
Each observation in a time series can be forecast using all previous observations.
We call these fitted values (\(\hat{y}_{t|t-1}\)) – forecast of \(y_t\) based on observations \(y_1,\dots,y_{t-1}\).
Sometimes drop the subscript: \(\hat{y}_t \equiv \hat{y}_{t|t-1}\).
Fitted values are often not true forecasts because any parameters involved in the forecasting method are estimated using all available observations in the time series.
For example, if we use the mean method, the fitted values are given by
\[ \hat{y}_t = \hat{c}\]
where \(\hat{c}\) is the average computed over all available observations, including those at times after \(t\).
When the estimate involves observations after time \(t\), the fitted values are not true forecasts.
The naïve or seasonal naïve forecasts do not involve any parameters, and so fitted values are true forecasts in such cases.
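As a minimal sketch (using the bricks series defined earlier, and the augment() function described below), the mean method makes this concrete: every fitted value equals the overall average, so the fitted values are not true forecasts.

bricks %>%
  model(MEAN(Bricks)) %>%
  augment() %>%
  select(Quarter, Bricks, .fitted) %>%   # .fitted is the same value in every row
  head()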
“Residuals” in a time series model are what is left over after fitting a model.
The residuals are equal to the difference between the observations and the corresponding fitted values: \(e_t = y_t-\hat{y}_{t|t-1}\).
A good forecasting method will yield innovation residuals with the following properties:
The innovation residuals are uncorrelated. If there are correlations between innovation residuals, then there is information left in the residuals which should be used in computing forecasts.
The innovation residuals have zero mean. If they have a mean other than zero, then the forecasts are biased.
Models that do not satisfy these properties can be improved.
However, that does not mean that forecasting methods that satisfy these properties cannot be improved.
Checking these properties is important in order to see whether a method is using all of the available information, but it is not a good way to select a forecasting method.
Adjusting for bias is easy: if the residuals have mean \(m\), then simply subtract \(m\) from all forecasts and the bias problem is solved.
Fixing the correlation problem is harder (Ch 10).
It is also useful (but not necessary) for the innovation residuals to have constant variance (homoscedasticity) and to be normally distributed.
It is important to state the distinction between residuals and innovation residuals.
If a transformation has been used in the model, then it is often useful to look at residuals on the transformed scale (the innovation residuals).
For example, suppose we modelled the logarithms of the data, \(w_t=\log(y_t)\); the innovation residuals would be \(w_t-\hat{w}_t\), whereas the residuals are given by \(y_t-\hat{y}_t\).
The fitted values and residuals can be checked via augment().
apple_stock %>% autoplot(Close)
fit <- apple_stock %>%
  model(NAIVE(Close))
augment(fit)
## # A tsibble: 1,258 x 7 [1]
## # Key: Symbol, .model [1]
## Symbol .model trading_day Close .fitted .resid .innov
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL NAIVE(Close) 1 79.0 NA NA NA
## 2 AAPL NAIVE(Close) 2 77.3 79.0 -1.74 -1.74
## 3 AAPL NAIVE(Close) 3 77.7 77.3 0.421 0.421
## 4 AAPL NAIVE(Close) 4 77.1 77.7 -0.556 -0.556
## 5 AAPL NAIVE(Close) 5 77.6 77.1 0.489 0.489
## 6 AAPL NAIVE(Close) 6 76.6 77.6 -0.991 -0.991
## 7 AAPL NAIVE(Close) 7 76.1 76.6 -0.511 -0.511
## 8 AAPL NAIVE(Close) 8 76.5 76.1 0.399 0.399
## 9 AAPL NAIVE(Close) 9 78.1 76.5 1.52 1.52
## 10 AAPL NAIVE(Close) 10 79.6 78.1 1.57 1.57
## # … with 1,248 more rows
We estimate a naive forecast \(\hat{y}_{t|t-1} = y_{t-1}\)
\(e_t = y_t - \hat{y}_{t|t-1} = y_t-y_{t-1}\)
.fitted contains the fitted values;
.resid contains the residuals;
.innov contains the "innovation residuals" which, in this case, are identical to the regular residuals.
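When a transformation is used in the model, the two columns differ. A minimal sketch, using a log transformation of the bricks series from earlier:

bricks %>%
  model(RW(log(Bricks) ~ drift())) %>%
  augment() %>%
  select(Quarter, Bricks, .fitted, .resid, .innov) %>%
  head()
# .innov is on the log scale; .resid = Bricks - .fitted is on the original scale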
Why are there NAs in the first row of the table?
augment(fit) %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = Close, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted"))
augment(fit) %>%
  filter(trading_day > 1100) %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = Close, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted"))
augment(fit) %>%
autoplot(.resid) +
labs(y = "$US",
title = "Residuals from naïve method")
augment(fit) %>%
  ggplot(aes(x = .resid)) +
  geom_histogram(bins = 150) +
  labs(title = "Histogram of residuals")
augment(fit) %>%
  ACF(.resid) %>%
  autoplot() +
  labs(title = "ACF of residuals")
The naïve method produces forecasts that might account for all of the available information.
The mean of the residuals is close to zero (histogram) but may not be normal — the right tail seems a little too long, even when we ignore the outlier.
There is some significant correlation in the residual series (ACF).
The variation of the residuals does not stay the same across the historical data (it changes towards the end of the sample).
This forecast method might not be good, and the prediction intervals that are computed assuming a normal distribution may be inaccurate.
All of these residual diagnostics can be produced in one step with the gg_tsresiduals() function:

gg_tsresiduals(fit)
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
fit2 <- fb_stock %>%
  model(NAIVE(Close))
augment(fit2)
## # A tsibble: 1,258 x 7 [1]
## # Key: Symbol, .model [1]
## Symbol .model trading_day Close .fitted .resid .innov
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 FB NAIVE(Close) 1 54.7 NA NA NA
## 2 FB NAIVE(Close) 2 54.6 54.7 -0.150 -0.150
## 3 FB NAIVE(Close) 3 57.2 54.6 2.64 2.64
## 4 FB NAIVE(Close) 4 57.9 57.2 0.720 0.720
## 5 FB NAIVE(Close) 5 58.2 57.9 0.310 0.310
## 6 FB NAIVE(Close) 6 57.2 58.2 -1.01 -1.01
## 7 FB NAIVE(Close) 7 57.9 57.2 0.720 0.720
## 8 FB NAIVE(Close) 8 55.9 57.9 -2.03 -2.03
## 9 FB NAIVE(Close) 9 57.7 55.9 1.83 1.83
## 10 FB NAIVE(Close) 10 57.6 57.7 -0.140 -0.140
## # … with 1,248 more rows
gg_tsresiduals(fit2)
We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren’t, then there is information left in the residuals that should be used in computing forecasts.
So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method.
We expect these to look like white noise.
But we can do a more formal test for autocorrelation:
\(r_k =\) autocorrelation of residual at lag \(k\)
Instead of testing each \(r_k\) individually, we consider a whole set of \(r_k\) values and test whether the set is significantly different from a zero set; this is called a portmanteau test.
Box-Pierce test
\[Q = T \sum_{k=1}^\ell r_k^2\] where \(\ell\) is max lag being considered and \(T\) is number of observations.
Ljung-Box test - preferred method
\[Q^* = T(T+2) \sum_{k=1}^\ell (T-k)^{-1}r_k^2\] where \(\ell\) is max lag being considered and \(T\) is number of observations.
My preferences: \(\ell=10\) for non-seasonal data, \(\ell=2m\) for seasonal data (where \(m\) is the seasonal period).
However, the test is not good when \(\ell\) is large, so if these values are larger than \(T/5\), then use \(\ell = T/5\).
The null hypothesis for both tests is that the residuals are not distinguishable from a white noise series (a non-significant result supports this).
In the features() call, lag \(= \ell\) and dof \(= K\).

# Apple stock
augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 4
## Symbol .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 AAPL NAIVE(Close) 25.8 0.00399
# Facebook stock
augment(fit2) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 4
## Symbol .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 FB NAIVE(Close) 12.1 0.276
For the Facebook stock, the large p-value suggests the residuals are not distinguishable from a white noise series.

fit_drift <- apple_stock %>%
  model(Drift = RW(Close ~ drift()))
augment(fit_drift)
## # A tsibble: 1,258 x 7 [1]
## # Key: Symbol, .model [1]
## Symbol .model trading_day Close .fitted .resid .innov
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Drift 1 79.0 NA NA NA
## 2 AAPL Drift 2 77.3 79.1 -1.80 -1.80
## 3 AAPL Drift 3 77.7 77.3 0.359 0.359
## 4 AAPL Drift 4 77.1 77.8 -0.618 -0.618
## 5 AAPL Drift 5 77.6 77.2 0.426 0.426
## 6 AAPL Drift 6 76.6 77.7 -1.05 -1.05
## 7 AAPL Drift 7 76.1 76.7 -0.574 -0.574
## 8 AAPL Drift 8 76.5 76.2 0.336 0.336
## 9 AAPL Drift 9 78.1 76.6 1.46 1.46
## 10 AAPL Drift 10 79.6 78.1 1.50 1.50
## # … with 1,248 more rows
gg_tsresiduals(fit_drift)
augment(fit_drift) %>% features(.resid, ljung_box, lag=10, dof=1)
## # A tibble: 1 × 4
## Symbol .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 AAPL Drift 25.8 0.00219
Assuming residuals are normal, uncorrelated, sd = \(\hat\sigma\):
Mean: \(y_{T+h|T} \sim N(\bar{y}, (1 + 1/T)\hat{\sigma}^2)\)
Naïve: \(y_{T+h|T} \sim N(y_T, h\hat{\sigma}^2)\)
Seasonal naïve: \(y_{T+h|T} \sim N(y_{T+h-m(k+1)}, (k+1)\hat{\sigma}^2)\)
Drift: \(y_{T+h|T} \sim N(y_T + \frac{h}{T-1}(y_T - y_1),h\frac{T+h}{T}\hat{\sigma}^2)\)
where \(k\) is the integer part of \((h-1)/m\).
Note that when \(h=1\) and \(T\) is large, these all give the same approximate forecast variance: \(\hat{\sigma}^2\).
\[\hat{y}_{T+h|T} \pm 1.96 \hat\sigma_h\]
where \(\hat\sigma_h\) is the standard deviation of the \(h\)-step forecast distribution.
brick_fit <- aus_production %>%
filter_index("1970 Q1" ~ "2004 Q4") %>%
model(Naive = NAIVE(Bricks))
brick_fc <- brick_fit %>%
forecast(h = 1)
brick_fc %>% hilo(level = 95)
## # A tsibble: 1 x 5 [1Q]
## # Key: .model [1]
## .model Quarter Bricks .mean `95%`
## <chr> <qtr> <dist> <dbl> <hilo>
## 1 Naive 2005 Q1 N(397, 1960) 397 [310, 484]95
When forecasting one step ahead, the standard deviation of the forecast distribution can be estimated using the standard deviation of the residuals given by:
\[ \hat{\sigma} = \sqrt{\frac{1}{T-K-M}\sum_{t=1}^{T}e_t^2}\]
where \(K\) is the number of parameters estimated in the forecasting method, and \(M\) is the number of missing values in the residuals (for example, \(M = 1\) for a naïve forecast, because we can't forecast the first observation).
When \(T\) is large, subtracting \(K\) and \(M\) makes little difference.
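A minimal sketch (not fable's internal computation) applies this formula to the naïve fit above, where \(K = 0\) and \(M = 1\); the result should be close to the 1960 shown in the forecast distribution.

aug <- augment(brick_fit)                   # residuals of the naive fit
T_len <- nrow(aug)                          # number of observations
K <- 0                                      # the naive method estimates no parameters
M <- sum(is.na(aug$.resid))                 # missing residuals (1 for the naive method)
sigma2_hat <- sum(aug$.resid^2, na.rm = TRUE) / (T_len - K - M)
sigma2_hat                                  # should be close to 1960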
For the four benchmark methods, it is possible to mathematically derive the forecast standard deviation under the assumption of uncorrelated residuals.
If \(\hat{\sigma}\) is the residual standard deviation (\(h=1\)), then \(\hat{\sigma}_h\) is given by the expressions above.
brick_fc <- brick_fit %>%
  forecast(h = 1)
brick_fc %>% hilo(level = 95)
## # A tsibble: 1 x 5 [1Q]
## # Key: .model [1]
## .model Quarter Bricks .mean `95%`
## <chr> <qtr> <dist> <dbl> <hilo>
## 1 Naive 2005 Q1 N(397, 1960) 397 [310, 484]95
\(\hat{\sigma} = \sqrt{1960}\)
So \(\hat{\sigma}_1 = \sqrt{1960} \times \sqrt{1} = 44.27\)
Interval: \(397 \pm 1.96\times44.27 \approx 397 \pm 86.8 = [310, 484]\)
brick_fc <- brick_fit %>%
  forecast(h = 2)
brick_fc %>% hilo(level = 95)
## # A tsibble: 2 x 5 [1Q]
## # Key: .model [1]
## .model Quarter Bricks .mean `95%`
## <chr> <qtr> <dist> <dbl> <hilo>
## 1 Naive 2005 Q1 N(397, 1960) 397 [310, 484]95
## 2 Naive 2005 Q2 N(397, 3920) 397 [274, 520]95
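For \(h = 2\), the naïve forecast variance is \(2\hat{\sigma}^2 = 3920\), so \(\hat{\sigma}_2 = \sqrt{1960}\times\sqrt{2} \approx 62.6\), giving \(397 \pm 1.96\times 62.6 = [274, 520]\), which matches the output above.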
bricks<-aus_production %>%
filter_index("1970 Q1" ~ "2004 Q4")
bricks%>%
model(Naive = NAIVE(Bricks))%>%
forecast(h = 30) %>%
autoplot(bricks)
When a normal distribution for the residuals is an unreasonable assumption, one alternative is to use bootstrapping (residuals must be uncorrelated with constant variance).
The goal of the bootstrap is to create a set of future values (scenarios) and then compute a statistic; in our case, we are interested in the prediction intervals.
To create a future value (scenario) for \(y\), we have to forecast it, \(y_{T+k}\).
The future value based on the one-step forecast:
\[y_{T+1} = \hat{y}_{T+1|T} + e_{T+1}\]
Assuming future errors will be similar to past errors, we can replace \(e_{T+1}\) by sampling (with replacement) from the collection of errors (residuals).
Adding the new simulated observation to our data set, we can repeat the process to obtain:
\[y_{T+2} = \hat{y}_{T+2|T+1} + e_{T+2}\]
where \(e_{T+2}\) is another draw from the collection of residuals.
Continuing in this way, we can simulate an entire set of future values for our time series.
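To make the mechanism concrete, here is a minimal hand-rolled sketch for the naïve method (the generate() function used below does this properly; the object names here are only illustrative):

set.seed(1)                                 # for reproducibility
res <- bricks %>%
  model(NAIVE(Bricks)) %>%
  augment() %>%
  pull(.innov) %>%
  na.omit()                                 # the collection of past residuals
last_obs <- tail(bricks$Bricks, 1)          # last observed value, y_T
# one simulated future path: keep adding residuals sampled with replacement
path <- last_obs + cumsum(sample(res, size = 30, replace = TRUE))
path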
fit <- bricks %>%
  model(NAIVE(Bricks))
sim <- fit %>%
  generate(h = 30, times = 5, bootstrap = TRUE)
sim
## # A tsibble: 150 x 4 [1Q]
## # Key: .model, .rep [5]
## .model Quarter .rep .sim
## <chr> <qtr> <chr> <dbl>
## 1 NAIVE(Bricks) 2005 Q1 1 445.
## 2 NAIVE(Bricks) 2005 Q2 1 427.
## 3 NAIVE(Bricks) 2005 Q3 1 381.
## 4 NAIVE(Bricks) 2005 Q4 1 399.
## 5 NAIVE(Bricks) 2006 Q1 1 373.
## 6 NAIVE(Bricks) 2006 Q2 1 394.
## 7 NAIVE(Bricks) 2006 Q3 1 379.
## 8 NAIVE(Bricks) 2006 Q4 1 346.
## 9 NAIVE(Bricks) 2007 Q1 1 304.
## 10 NAIVE(Bricks) 2007 Q2 1 274.
## # … with 140 more rows
We generated five possible sample paths (scenarios) for the next 30 quarters.
bricks %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Bricks)) +
geom_line(aes(y = .sim, colour = as.factor(.rep)),
data = sim) +
guides(colour = "none")
bk <- fit %>%
  forecast(h = 30, bootstrap = TRUE, times = 5000)
bk
## # A fable: 30 x 4 [1Q]
## # Key: .model [1]
## .model Quarter Bricks .mean
## <chr> <qtr> <dist> <dbl>
## 1 NAIVE(Bricks) 2005 Q1 sample[5000] 398.
## 2 NAIVE(Bricks) 2005 Q2 sample[5000] 398.
## 3 NAIVE(Bricks) 2005 Q3 sample[5000] 397.
## 4 NAIVE(Bricks) 2005 Q4 sample[5000] 397.
## 5 NAIVE(Bricks) 2006 Q1 sample[5000] 397.
## 6 NAIVE(Bricks) 2006 Q2 sample[5000] 397.
## 7 NAIVE(Bricks) 2006 Q3 sample[5000] 398.
## 8 NAIVE(Bricks) 2006 Q4 sample[5000] 397.
## 9 NAIVE(Bricks) 2007 Q1 sample[5000] 397.
## 10 NAIVE(Bricks) 2007 Q2 sample[5000] 396.
## # … with 20 more rows
Notice that the forecast distribution is now represented as a simulation with 5000 sample paths.
There is no normality assumption, so the prediction intervals need not be symmetric.
The .mean column is the mean of the bootstrap samples, so it may be slightly different from the results obtained without a bootstrap.
autoplot(bk, bricks)
Use the level argument to control the coverage of the prediction intervals.

eggs <- prices %>%
  filter(!is.na(eggs)) %>%
  select(eggs)
eggs %>% autoplot() +
labs(title="Annual egg prices",
y="$US (adjusted for inflation)")
fit <- eggs %>%
  model(RW(log(eggs) ~ drift()))
fit
## # A mable: 1 x 1
## `RW(log(eggs) ~ drift())`
## <model>
## 1 <RW w/ drift>
fc <- fit %>%
  forecast(h = 50)
fc
## # A fable: 50 x 4 [1Y]
## # Key: .model [1]
## .model year eggs .mean
## <chr> <dbl> <dist> <dbl>
## 1 RW(log(eggs) ~ drift()) 1994 t(N(4.1, 0.018)) 61.8
## 2 RW(log(eggs) ~ drift()) 1995 t(N(4.1, 0.036)) 61.4
## 3 RW(log(eggs) ~ drift()) 1996 t(N(4.1, 0.054)) 61.0
## 4 RW(log(eggs) ~ drift()) 1997 t(N(4.1, 0.073)) 60.5
## 5 RW(log(eggs) ~ drift()) 1998 t(N(4.1, 0.093)) 60.1
## 6 RW(log(eggs) ~ drift()) 1999 t(N(4, 0.11)) 59.7
## 7 RW(log(eggs) ~ drift()) 2000 t(N(4, 0.13)) 59.3
## 8 RW(log(eggs) ~ drift()) 2001 t(N(4, 0.15)) 58.9
## 9 RW(log(eggs) ~ drift()) 2002 t(N(4, 0.17)) 58.6
## 10 RW(log(eggs) ~ drift()) 2003 t(N(4, 0.19)) 58.2
## # … with 40 more rows
fc %>% autoplot(eggs) +
labs(title="Annual egg prices",
y="US$ (adjusted for inflation)")
Back-transformed point forecasts are medians.
Back-transformed PI have the correct coverage.
For many purposes, this is acceptable, although the mean is usually preferable (medians do not add up, whereas means do).
We can adjust the point forecasts back to the mean (bias-adjusted forecasts).
fc %>%
autoplot(eggs, level = c(80, 90), point_forecast = lst(mean, median)) +
labs(title="Annual egg prices",
y="US$ (adjusted for inflation)")
\[y_t = \hat{S}_t + \hat{A}_t\]
\(\hat{A}_t\) is seasonally adjusted component \((\hat{A}_t =\hat{T}_t + \hat{R}_t)\)
\(\hat{S}_t\) is seasonal component.
We forecast the two terms separately.
The seasonal component is (usually) unchanging, or changing extremely slowly, so it is forecast by simply taking the last year of the estimated component (a seasonal naïve method).
Forecast \(\hat{A}_t\) using a non-seasonal time series method.
Combine the forecasts of \(\hat{S}_t\) and \(\hat{A}_t\) to get forecasts of the original data.
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)
autoplot(us_retail_employment)
dcmp <- us_retail_employment %>%
  model(STL(Employed)) %>%
  components() %>%
  select(-.model)
dcmp
## # A tsibble: 357 x 6 [1M]
## Month Employed trend season_year remainder season_adjust
## <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1990 Jan 13256. 13288. -33.0 0.836 13289.
## 2 1990 Feb 12966. 13269. -258. -44.6 13224.
## 3 1990 Mar 12938. 13250. -290. -22.1 13228.
## 4 1990 Apr 13012. 13231. -220. 1.05 13232.
## 5 1990 May 13108. 13211. -114. 11.3 13223.
## 6 1990 Jun 13183. 13192. -24.3 15.5 13207.
## 7 1990 Jul 13170. 13172. -23.2 21.6 13193.
## 8 1990 Aug 13160. 13151. -9.52 17.8 13169.
## 9 1990 Sep 13113. 13131. -39.5 22.0 13153.
## 10 1990 Oct 13185. 13110. 61.6 13.2 13124.
## # … with 347 more rows
dcmp %>%
  model(NAIVE(season_adjust)) %>%
  forecast() %>%
  autoplot(dcmp) +
  labs(title = "Naive forecasts of seasonally adjusted data")
The previous graph shows naïve forecasts of the seasonally adjusted US retail employment data.
These are then “reseasonalised” by adding in the seasonal naïve forecasts of the seasonal component.
We can use the decomposition_model() function instead, which computes forecasts via any additive decomposition, using other model functions to forecast each of the decomposition's components.
Seasonal components are forecast automatically using SNAIVE() by default.
The function also handles the reseasonalising for you.
The variances from both the seasonally adjusted and seasonal forecasts are combined when computing the prediction intervals.
us_retail_employment %>%
model(stlf = decomposition_model(
STL(Employed ~ trend(window = 7), robust = TRUE),
NAIVE(season_adjust)
)) %>%
forecast() %>%
autoplot(us_retail_employment)
It is important to evaluate forecast accuracy using genuine forecasts.
The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.
So we divide the data into two sub-samples: a training set and a test set.
The test set must not be used for any aspect of model development or calculation of forecasts.
Forecast accuracy is based only on the test set.
The test set should be at least as large as the maximum forecast horizon required.
Forecast “error”: the difference between an observed value and its forecast.
\[ e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}, \]
train <- aus_production %>%
filter(between(year(Quarter), 1992, 2007))
beer <- aus_production %>%
filter(year(Quarter) >= 1992)
beer_fc_plot <- train %>%
model(
Mean = MEAN(Beer),
Naive = NAIVE(Beer),
Seasonal_naive = SNAIVE(Beer),
Drift = RW(Beer ~ drift())
) %>% forecast(h=10) %>% autoplot(beer, level = NULL) +
labs(title = "Forecasts for quarterly beer production",
y = "Megalitres") +
guides(colour=guide_legend(title="Forecast"))
beer_fc_plot
What is the most accurate model? How can we tell?
For forecasts (and forecast errors) on the same scale as the data, we look at:
Mean Absolute error: \(MAE = mean(|e_t|)\)
Root mean squared error: \(RMSE = \sqrt{mean({e_t}^2)}\)
Mean absolute percentage error (a percentage error, so unit-free): \[MAPE= 100\times mean(|e_{T+h}|/ |y_{T+h}|)\]
Mean Absolute Scaled Error \[ MASE = \text{mean}(|e_{T+h}|/Q) \] where \(Q\) is a stable measure of the scale of the time series \(\{y_t\}\).
For non-seasonal time series, \[ Q = (T-1)^{-1}\sum_{t=2}^T |y_t-y_{t-1}| \]
works well. Then MASE is equivalent to MAE relative to a naïve method.
For seasonal time series, \[ Q = (T-m)^{-1}\sum_{t=m+1}^T |y_t-y_{t-m}| \]
Then MASE is equivalent to MAE relative to a seasonal naïve method.
A MASE less than one indicates forecasts that are better than the average one-step naïve (or seasonal naïve) forecast computed on the training data; a MASE greater than one indicates worse.
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
train <- recent_production %>%
filter(year(Quarter) <= 2007)
beer_fit <- train %>%
model(
Mean = MEAN(Beer),
Naive = NAIVE(Beer),
Seasonal_naive = SNAIVE(Beer),
Drift = RW(Beer ~ drift())
)
beer_fc <- beer_fit %>%
forecast(h = 10)
accuracy(beer_fit) # accuracy based on the residuals (training set)

## # A tibble: 4 × 6
## .model .type RMSE MAE MAPE MASE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Drift Training 65.3 54.8 12.2 3.83
## 2 Mean Training 43.6 35.2 7.89 2.46
## 3 Naive Training 65.3 54.7 12.2 3.83
## 4 Seasonal_naive Training 16.8 14.3 3.31 1
accuracy(beer_fc, recent_production) # accuracy of the forecast errors, based on the test set

## # A tibble: 4 × 6
## .model .type RMSE MAE MAPE MASE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Drift Test 64.9 58.9 14.6 4.12
## 2 Mean Test 38.4 34.8 8.28 2.44
## 3 Naive Test 62.7 57.4 14.2 4.01
## 4 Seasonal_naive Test 14.3 13.4 3.17 0.937
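As a check, here is a minimal sketch (assuming beer_fc, recent_production and train as defined above, not the accuracy() internals) that reproduces the test-set MAE, RMSE and MASE for the seasonal naïve method by hand:

errs <- beer_fc %>%
  filter(.model == "Seasonal_naive") %>%
  as_tibble() %>%
  select(Quarter, .mean) %>%
  left_join(as_tibble(recent_production) %>% select(Quarter, Beer), by = "Quarter") %>%
  mutate(e = Beer - .mean)                  # test-set forecast errors
mean(abs(errs$e))                           # MAE, approx. 13.4
sqrt(mean(errs$e^2))                        # RMSE, approx. 14.3
Q <- train %>%                              # scaling factor: in-sample seasonal differences (m = 4)
  as_tibble() %>%
  mutate(d = Beer - lag(Beer, 4)) %>%
  summarise(Q = mean(abs(d), na.rm = TRUE)) %>%
  pull(Q)
mean(abs(errs$e)) / Q                       # MASE, approx. 0.937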
Other measures for evaluating distributional forecasts include quantile scores, the Winkler score, and the Continuous Ranked Probability Score (CRPS).
A more sophisticated version of training/test sets is time series cross-validation.
There are a series of test sets, each consisting of a single observation.
The corresponding training set consists only of all observations that occurred prior to the observation that forms the test set.
(Diagrams compare traditional evaluation with time series cross-validation.)
Forecast accuracy averaged over test sets.
Also known as “evaluation on a rolling forecasting origin”
We can also produce cross-validation for multi-step forecasts (\(h=2,3,4,\dots\)); see the sketch at the end of this section.
Let's create the training sets using the stretch_tsibble() function.
Stretch with a minimum length of 3, growing by 1 each step.
fb_stretch <- fb_stock %>% stretch_tsibble(.init = 3, .step = 1)
## # A tsibble: 790,650 x 4 [1]
## # Key: .id [1,255]
## Date Close trading_day .id
## <date> <dbl> <int> <int>
## 1 2014-01-02 54.7 1 1
## 2 2014-01-03 54.6 2 1
## 3 2014-01-06 57.2 3 1
## 4 2014-01-02 54.7 1 2
## 5 2014-01-03 54.6 2 2
## 6 2014-01-06 57.2 3 2
## 7 2014-01-07 57.9 4 2
## # … with 790,643 more rows
Estimate RW w/ drift models for each window.
fit_cv <- fb_stretch %>% model(RW(Close ~ drift()))
## # A mable: 1,255 x 3
## # Key: .id, Symbol [1,255]
## .id Symbol `RW(Close ~ drift())`
## <int> <chr> <model>
## 1 1 FB <RW w/ drift>
## 2 2 FB <RW w/ drift>
## 3 3 FB <RW w/ drift>
## 4 4 FB <RW w/ drift>
## # … with 1,251 more rows
Produce one-step-ahead forecasts from all models.
fc_cv <- fit_cv %>% forecast(h=1)
## # A fable: 1,255 x 5 [1]
## # Key: .id, Symbol [1,255]
## .id Symbol trading_day Close .mean
## <int> <chr> <dbl> <dist> <dbl>
## 1 1 FB 4 N(58, 3.9) 58.4
## 2 2 FB 5 N(59, 2) 59.0
## 3 3 FB 6 N(59, 1.5) 59.1
## 4 4 FB 7 N(58, 1.8) 57.7
## # … with 1,251 more rows
# Cross-validated
fc_cv %>% accuracy(fb_stock)

# Training set
fb_stock %>%
  model(RW(Close ~ drift())) %>%
  accuracy()
| | RMSE | MAE | MAPE |
|---|---|---|---|
| Cross-validation | 2.418 | 1.469 | 1.266 |
| Training | 2.414 | 1.465 | 1.261 |
As expected, the accuracy measures from the residuals are smaller, as the corresponding “forecasts” are based on a model fitted to the entire data set, rather than being true forecasts.
A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation.
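Following up on the multi-step remark above, here is a sketch of cross-validated multi-step forecasts (\(h = 1,\dots,4\)), assuming fb_stretch and fb_stock as defined above; accuracy can then be summarised by forecast horizon.

fc_cv_multi <- fb_stretch %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 4) %>%
  group_by(.id) %>%
  mutate(h = row_number()) %>%              # label the horizon within each window
  ungroup() %>%
  as_fable(response = "Close", distribution = Close)
fc_cv_multi %>%
  accuracy(fb_stock, by = c("h", ".model")) # accuracy by horizon (errors typically grow with h)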