2022-08-12
Smoothing parameters:
\(\alpha\) controls the flexibility of the level
\(\beta\) controls the flexibility of the trend
\(\gamma\) controls the flexibility of the seasonality
Time series \(y_1,y_2,\dots,y_T\).
Random walk forecasts (Naive) - most recent observation is the only important one
\(\hat{y}_{T+h|T}= y_T\)
Average forecasts - all observations are of equal importance
\(\hat{y}_{T+h|T}= \frac1T\sum_{t=1}^T y_t\)
Want something in between these methods.
Most recent data should have more weight.
Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially
\[\hat{y}_{T+h|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2}+ \cdots\]
where \(0 \le \alpha \le 1\), and the weights decrease exponentially as observations come from further in the past.
The rate at which the weights decrease is controlled by the parameter \(\alpha\).
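To see how \(\alpha\) governs the decay, the weights can be computed directly. A minimal base-R sketch (the two \(\alpha\) values are chosen only for illustration):

# Weight alpha * (1 - alpha)^j placed on observation y_{T-j}
ses_weights <- function(alpha, n = 6) alpha * (1 - alpha)^(0:(n - 1))
round(ses_weights(alpha = 0.2), 3)  # 0.200 0.160 0.128 0.102 0.082 0.066 (slow decay)
round(ses_weights(alpha = 0.8), 3)  # 0.800 0.160 0.032 0.006 0.001 0.000 (fast decay)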
For simple exponential smoothing, the only component included is the level, \(\ell_{t}\).
It is defined by two equations:
\[\text{Forecast equation} - \hat{y}_{t+h|t} = \ell_{t}\]
\[\text{Smoothing equation} - \ell_{t} = \alpha y_{t} + (1 - \alpha)\ell_{t-1}\]
\(\ell_t\) is the level (or the smoothed value) of the series at time t.
Weighted average form:
\(\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}\)
Iterating back to \(t=0\) gives the exponentially weighted moving average form:
\[\hat{y}_{T+1|T}=\sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j}+(1-\alpha)^T \ell_{0}\] - the last term goes to zero as \(T\) increases!
\[\hat{y}_{T+h|T} = \hat{y}_{T+1|T} = \ell_{T}\]
All forecasts take the same value, equal to the last level component.
Remember that these forecasts will only be suitable if the time series has no trend or seasonal component.
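The equivalence of the recursive and weighted-average forms is easy to check numerically. A minimal base-R sketch (toy data and parameter values, for illustration only):

# Recursive form: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
alpha <- 0.3; l0 <- 10
y <- c(10.2, 9.8, 11.1, 10.5, 10.9)  # toy series
n <- length(y)

l <- l0
for (t in 1:n) l <- alpha * y[t] + (1 - alpha) * l

# Weighted-average form: weights on y_n, y_{n-1}, ..., y_1 plus the l0 term
w <- alpha * (1 - alpha)^(0:(n - 1))
c(recursive = l, weighted = sum(w * rev(y)) + (1 - alpha)^n * l0)  # identical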
library(fpp3)  # loads tsibble, dplyr, ggplot2, fable, etc., needed throughout

algeria_economy <- global_economy %>%
  filter(Country == "Algeria")
fit <- algeria_economy %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
algeria_economy %>% autoplot(Exports) +
  geom_line(aes(y = .fitted), col = "#D55E00", data = augment(fit))
The smoothing parameter \(\alpha\) and the initial level \(\ell_0\) are estimated by minimizing the SSE of the one-step training errors, subject to the restriction that \(0 \le \alpha \le 1\).
For the Algerian exports example:
\(\hat\alpha = 0.84\)
\(\hat\ell_0 = 39.54\)
When t=1
Observation \(y_t\) = \(y_1 = 39.04\) - Real data
Level \(\hat\ell_t\) = \(\hat\ell_1 = 0.84 \times 39.04 + 0.16 \times 39.54 = 39.12\)
Forecast \(\hat{y}_{t|t-1}\) = \(\hat{y}_{1|0} = \hat\ell_0 = 39.54\)
For Algerian Exports example:
\(\hat\alpha = 0.84\)
\(\hat\ell_1 = 39.12\)
When t=2
\(y_2 = 46.24\) - Real data
\(\hat\ell_2 = 0.84 \times 46.24 + 0.16 \times 39.12 = 45.1\)
\(\hat{y}_{2|1} = \hat\ell_1 = 39.12\)
For Algerian Exports example:
\(\hat\alpha = 0.84\)
\(\hat\ell_2 = 45.1\)
When t=3
\(y_3 = 19.79\) - Real data
\(\hat\ell_3 = 0.84 \times 19.79 + 0.16 \times 45.1 = 23.84\)
\(\hat{y}_{3|2} = \hat\ell_2 = 45.1\)
When t=58 - End of the sample
\(y_{58} = 22.64\) - Real data
\(\hat\ell_{58} = 0.84 \times 22.64 + 0.16 \times 21.43 = 22.44\)
\(\hat{y}_{58|57} = \hat\ell_{57} = 21.43\)
\(\hat{y}_{59|58} = \hat\ell_{58} = 22.44\) \((h=1)\)
\(\hat{y}_{60|58} = \hat\ell_{58} = 22.44\) \((h=2)\)
\(\hat{y}_{61|58} = \hat\ell_{58} = 22.44\) \((h=3)\)
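The whole recursion above can be reproduced in a few lines of base R, given the estimated \(\hat\alpha\) and \(\hat\ell_0\). A sketch assuming the `algeria_economy` tsibble defined earlier:

alpha <- 0.84  # estimated smoothing parameter reported above
l <- 39.54     # estimated initial level l[0]
for (t in seq_along(algeria_economy$Exports)) {
  fcast <- l   # one-step forecast y-hat_{t|t-1} is the previous level
  l <- alpha * algeria_economy$Exports[t] + (1 - alpha) * l  # smoothing equation
}
c(last_forecast = fcast, final_level = l)  # l_58 = 22.44: the forecast for all h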
fit %>%
  forecast(h = 3) %>%
autoplot(algeria_economy) +
geom_line(aes(y = .fitted), col="#D55E00",
data = augment(fit)) +
labs(y="% of GDP", title="Exports: Algeria") +
guides(colour = "none")
Holt (1957) expanded the simple exponential smoothing to allow forecasting of data with a trend.
This method involves a forecast equation and two smoothing equations
\[\text{Forecast equation} - \hat{y}_{t+h|t} = \ell_{t} + hb_t\]
\[\text{Level equation} - \ell_{t} = \alpha y_{t} + (1 - \alpha)(\ell_{t-1}+b_{t-1})\]
\[\text{Trend equation} - b_{t} = \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1}\]
Two smoothing parameters: \(\alpha\) (level) and \(\beta^*\) (trend), with \(0\le\alpha,\beta^*\le1\).
\(\ell_t\) level: weighted average between \(y_t\) and one-step ahead forecast for time \(t\), \((\ell_{t-1} + b_{t-1}=\hat{y}_{t|t-1})\)
\(b_t\) slope: weighted average of \((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of slope.
Choose \(\alpha, \beta^*, \ell_0, b_0\) to minimize SSE.
The forecast function is no longer flat but trending.
The h-step-ahead forecast is equal to the last estimated level plus h times the last estimated trend value.
Hence the forecasts are a linear function of h.
aus_economy <- global_economy %>%
  filter(Code == "AUS") %>%
  mutate(Pop = Population / 1e6)
autoplot(aus_economy, Pop) +
  labs(y = "Millions", title = "Australian population")
The smoothing parameters \(\alpha, \beta^*\) and the initial states \(\ell_0, b_0\) are estimated by minimizing the SSE of the one-step training errors over the sample period (\(t=1,2,\dots,58\)).
fit <- aus_economy %>%
model(
AAN = ETS(Pop ~ error("A") + trend("A") + season("N"))
)
fc <- fit %>% forecast(h = 10)
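Since the forecast function is \(\hat\ell_T + h\hat b_T\), the forecast means can be rebuilt from the final states. A sketch using the `fit` and `fc` objects above (assuming the component columns are named `level` and `slope`, as ETS components usually are in fable):

states <- components(fit)
l_T <- tail(states$level, 1)  # final estimated level
b_T <- tail(states$slope, 1)  # final estimated slope
cbind(h = 1:10, manual = l_T + (1:10) * b_T, fable = fc$.mean)  # should agree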
For Australian Population:
\(\hat\alpha = 0.9999\); \(\beta^* = 0.3267\)
\(\hat\ell_0 = 10.05\); \(b_0 = 0.22\)
When t=1
\(y_1 = 10.28\) - Real data
\(\hat\ell_1 = 0.999 \times 10.28 + 0.001 \times (10.05 + 0.22)= 10.28\)
\(\hat{b}_1 = 0.3267 \times (10.28-10.05)+(1-0.3267) \times 0.22 = 0.223\)
\(\hat{y}_{1|0} = \hat\ell_0 + h\hat{b}_0 = 10.05 + 1 \times 0.22 = 10.27\)
\(\hat\alpha = 0.9999\); \(\beta^* = 0.3267\)
\(\hat\ell_1 = 10.28\); \(\hat{b}_1 = 0.223\)
When t=2
\(y_2 = 10.48\) - Real data
\(\hat\ell_2 = 0.999 \times 10.48 + 0.001 \times (10.28 + 0.223)= 10.48\)
\(\hat{b}_2 = 0.3267 \times (10.48-10.28)+(1-0.3267) \times 0.223 = 0.225\)
\(\hat{y}_{2|1} = \hat\ell_1 + h\hat{b}_1 = 10.28 + 1 \times 0.223 = 10.50\)
When t=58 - End of the sample
\(y_{58} = 24.60\) - Real data
\(\hat\ell_{58} = 0.999 \times 24.60 + 0.001 \times (24.21 + 0.36)= 24.60\)
\(\hat{b}_{58} = 0.3267 \times (24.60-24.21)+(1-0.3267) \times 0.36 = 0.37\)
\(\hat{y}_{58|57} = \hat\ell_{57} + h\hat{b}_{57} = 24.21 + 1 \times 0.36 = 24.57\)
\(\hat{y}_{59|58} = \hat\ell_{58} + h\hat{b}_{58} = 24.60 + 1 \times 0.37 = 24.97\) \((h=1)\)
\(\hat{y}_{60|58} = \hat\ell_{58} + h\hat{b}_{58} = 24.60 + 2 \times 0.37 = 25.34\) \((h=2)\)
\(\hat{y}_{61|58} = \hat\ell_{58} + h\hat{b}_{58} = 24.60 + 3 \times 0.37 = 25.71\) \((h=3)\)
The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future, so the method tends to over-forecast at longer horizons.
Gardner and McKenzie (1985) introduced a damping parameter \(0<\phi<1\) that flattens the trend as the horizon grows:
\(\hat{y}_{t+h|t}= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t}\)
\(\ell_{t} = \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\)
\(b_{t} = \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}\)
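Because \(\phi+\phi^2+\cdots+\phi^h \to \phi/(1-\phi)\) as \(h\to\infty\), the damped forecasts approach the constant \(\ell_T + \frac{\phi}{1-\phi}b_T\) instead of trending forever. A quick base-R sketch with illustrative state values:

# Damped forecast function: l_T + (phi + phi^2 + ... + phi^h) * b_T
damped_fc <- function(l, b, phi, h) l + cumsum(phi^(1:h)) * b
round(damped_fc(l = 24.6, b = 0.37, phi = 0.9, h = 30), 2)  # levels off
24.6 + 0.9 / (1 - 0.9) * 0.37  # asymptote l_T + phi/(1-phi) * b_T = 27.93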
aus_economy %>%
model(
`Holt's method` = ETS(Pop ~ error("A") + trend("A") + season("N")),
`Damped Holt's method` = ETS(Pop ~ error("A") +
trend("Ad", phi = 0.9) + season("N"))) %>%
forecast(h = 30) %>%
autoplot(aus_economy, level = NULL) +
labs(title = "Australian population",
y = "Millions") +
guides(colour = guide_legend(title = "Forecast"))
aus_economy %>% autoplot(Pop)
aus_economy %>%
stretch_tsibble(.init = 10) %>%
model(
SES = ETS(Pop ~ error("A") + trend("N") + season("N")),
Holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
Damped = ETS(Pop ~ error("A") + trend("Ad") +
season("N"))
) %>%
forecast(h = 1) %>%
accuracy(aus_economy)
## # A tibble: 3 × 11
##   .model Country .type      ME   RMSE    MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <fct>   <chr>   <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Damped Austra… Test   0.0215 0.0783 0.0552 0.114  0.333 0.220 0.298 0.0565
## 2 Holt   Austra… Test   0.0145 0.0768 0.0534 0.0779 0.325 0.213 0.292 0.439
## 3 SES    Austra… Test   0.257  0.270  0.257  1.44   1.44  1.02  1.03  0.521
fit <- aus_economy %>%
model(Holt = ETS(Pop ~ error("A") + trend("A") + season("N")))
tidy(fit)
## # A tibble: 4 × 4
##   Country   .model term  estimate
##   <fct>     <chr>  <chr>    <dbl>
## 1 Australia Holt   alpha    1.00
## 2 Australia Holt   beta     0.327
## 3 Australia Holt   l[0]    10.1
## 4 Australia Holt   b[0]     0.222
fit %>% forecast(h = 10) %>% autoplot(aus_economy)
Holt and Winters extended Holt’s method to capture seasonality.
The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations
There are two variations to this method that differ in the nature of the seasonal component.
The additive method is preferred when the seasonal variations are roughly constant through the series.
The multiplicative method is preferred when the seasonal variations change proportionally to the level of the series.
Holt-Winters’ additive method
\[ \begin{align} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{align} \]
\(k=\) integer part of \((h-1)/m\). Ensures estimates from the final year are used for forecasting.
Parameters: \(0\le \alpha\le 1\), \(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\), and \(m=\) period of seasonality (e.g. \(m=4\) for quarterly data).
The level equation shows a weighted average between the seasonally adjusted observation \((y_{t} - s_{t-m})\) and the non-seasonal forecast \((\ell_{t-1} + b_{t-1})\).
The trend equation is identical to Holt’s linear method.
The seasonal equation shows a weighted average between the current seasonal index, \((y_{t}-\ell_{t-1}-b_{t-1})\), and the seasonal index of the same season last year (i.e., m time periods ago).
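The subscript \(t+h-m(k+1)\) simply selects the most recent estimated seasonal index for the season being forecast. A small sketch of that bookkeeping for quarterly data (\(m=4\)):

# For horizon h, the seasonal term is s at time T + h - m*(k+1),
# where k = floor((h-1)/m); the offset from T cycles through the final year.
m <- 4
h <- 1:8
k <- (h - 1) %/% m
data.frame(h = h, k = k, offset_from_T = h - m * (k + 1))  # -3 -2 -1 0, repeating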
Holt-Winters’ multiplicative method
Seasonal variations change in proportion to the level of the series.
\[ \begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t}-\ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*} \]
Additive method: \(s_t\) in absolute terms — within each year \(\sum_i s_i \approx 0\).
Multiplicative method: \(s_t\) in relative terms — within each year \(\sum_i s_i \approx m\).
aus_holidays <- tourism %>%
filter(Purpose == "Holiday") %>%
summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
model(
additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
)
fc <- fit %>% forecast()
fc %>%
  autoplot(aus_holidays, level = NULL) +
  labs(y = "Thousands", title = "Overnight trips")
tidy(fit) %>% filter(.model == "additive")
## # A tibble: 9 × 3
##   .model   term   estimate
##   <chr>    <chr>     <dbl>
## 1 additive alpha     0.236
## 2 additive beta      0.0298
## 3 additive gamma     0.000100
## 4 additive l[0]   9899.
## 5 additive b[0]    -37.4
## 6 additive s[0]   -538.
## 7 additive s[-1]  -684.
## 8 additive s[-2]  -290.
## 9 additive s[-3]  1512.
tidy(fit) %>% filter(.model == "multiplicative")
## # A tibble: 9 × 3
##   .model         term   estimate
##   <chr>          <chr>     <dbl>
## 1 multiplicative alpha     0.186
## 2 multiplicative beta      0.0248
## 3 multiplicative gamma     0.000100
## 4 multiplicative l[0]   9853.
## 5 multiplicative b[0]    -33.4
## 6 multiplicative s[0]      0.943
## 7 multiplicative s[-1]     0.926
## 8 multiplicative s[-2]     0.970
## 9 multiplicative s[-3]     1.16
Damping is possible with both additive and multiplicative Holt-Winters’ methods.
One of the most accurate forecasting methods for seasonal data is the Holt-Winters method with a damped trend and multiplicative seasonality:
\[ \begin{align*} \hat{y}_{t+h|t} &= [\ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t}]s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} / s_{t-m}) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + \phi b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*} \]
sth_cross_ped <- pedestrian %>%
filter(Date >= "2016-07-01",
Sensor == "Southern Cross Station") %>%
index_by(Date) %>%
summarise(Count = sum(Count)/1000)
sth_cross_ped %>%
filter(Date <= "2016-07-31") %>%
model(hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))) %>%
forecast(h = "2 weeks") %>%
autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) +
labs(title = "Daily traffic: Southern Cross",
y="Pedestrians ('000)")
So far we have looked at the methods (algorithms) of exponential smoothing for point forecasts, not at statistical models!
So let’s focus on the statistical models that underlie these algorithms
These statistical models generate the same point forecasts, but can also generate prediction (or forecast) intervals.
As we have seen before, a statistical model embodies a set of statistical assumptions concerning the generation of the sample data (idealizing the true data-generating process).
As such, a statistical model is a mathematical relationship between one or more random (stochastic) variables and other non-random variables.
Each model consists of two main parts:
Measurement equation that describes the observed data;
State equations that describe how the unobserved components or states (level, trend, seasonal) change over time.
Additionally, for each method there exist two models: one with additive errors and one with multiplicative errors.
The point forecasts from the two models are identical if they use the same smoothing parameter values. They will, however, generate different prediction intervals.
To correctly classify models with additive and multiplicative errors, and to distinguish the models from the methods, from now on we will label the models as ETS() - Error, Trend, Seasonal.
Using the previous notation from the methods, the possibilities for each component (or state) are:
Error = {A, M}
Trend = {N, A, \(A_d\)}
Seasonal = {N, A, M}
\[\text{Forecast equation} - \hat{y}_{t+h|t} = \ell_{t}\]
\[\text{Smoothing equation} - \ell_{t} = \alpha y_{t} + (1 - \alpha)\ell_{t-1}\]
Error correction form: \[ \begin{align} \ell_{t} &= \ell_{t-1}+\alpha(y_{t} -\ell_{t-1})\\ &= \ell_{t-1} +\alpha e_{t} \end{align} \] where \(e_{t} = y_t - \ell_{t-1} = y_t - \hat{y}_{t|t-1}\) is the residual at time \(t\).
The training data errors lead to the adjustment of the estimated level throughout the smoothing process for \(t=1,..., T\)
For example, if the error at time t is negative, then \(y_t < \hat{y}_{t|t-1}\) and so the level at time \(t-1\) has been over-estimated.
The new level \(\ell_{t}\) is then the previous level \(\ell_{t-1}\) adjusted downwards.
The closer \(\alpha\) is to one, the “rougher” the estimate of the level; the smaller \(\alpha\), the “smoother” the level.
\(y_t = \ell_{t-1} +e_t \rightarrow\) each observation can be represented by the previous level plus an error.
To make this into an innovations state space model, all we need to do is specify the probability distribution for \(e_t\)
For a model with additive errors, we assume that residuals (the one-step training errors) are \(e_t = \varepsilon_t\sim\text{NID}(0,\sigma^2)\). (NID stands for “normally and independently distributed”)
\[ \begin{align*} \text{Measurement equation}&& y_t &= \ell_{t-1} + \varepsilon_t\\ \text{State equation}&& \ell_t&=\ell_{t-1}+\alpha \varepsilon_t \end{align*} \]
These two equations, together with the statistical distribution of the errors, form a fully specified statistical model.
Specifically, these constitute an innovations state space model underlying simple exponential smoothing.
It is called an “innovations” or “single source of error” model because both equations share the same error process, \(\varepsilon_t\).
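Having a fully specified model, we can simulate from it. A minimal base-R sketch of the ETS(A,N,N) recursions (parameter values chosen only for illustration):

# Simulate ETS(A,N,N): y_t = l_{t-1} + e_t,  l_t = l_{t-1} + alpha * e_t,
# with e_t ~ NID(0, sigma^2)
simulate_ann <- function(n, alpha, l0, sigma) {
  l <- l0
  y <- numeric(n)
  for (t in 1:n) {
    e <- rnorm(1, sd = sigma)
    y[t] <- l + e        # measurement equation
    l <- l + alpha * e   # state (transition) equation
  }
  y
}
set.seed(1)
plot.ts(simulate_ann(n = 100, alpha = 0.3, l0 = 10, sigma = 1))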
Specifying the one-step errors as relative errors, \(\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\), gives the multiplicative form of the state space model: \[ \begin{align*} \text{Measurement equation}&& y_t &= \ell_{t-1}(1 + \varepsilon_t)\\ \text{State equation}&& \ell_t&=\ell_{t-1}(1+\alpha \varepsilon_t) \end{align*} \]
Holt’s linear method with additive errors.
Assume \(\varepsilon_t=y_t-\ell_{t-1}-b_{t-1} \sim \text{NID}(0,\sigma^2)\).
Substituting into the error correction equations for Holt’s linear method
\[ \begin{align*} y_t&=\ell_{t-1}+b_{t-1}+\varepsilon_t\\ \ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\ b_t&=b_{t-1}+\alpha\beta^* \varepsilon_t \end{align*} \] * For simplicity, set \(\beta=\alpha \beta^*\).
Holt’s linear method with multiplicative errors.
Specifying the errors as relative errors, \(\varepsilon_t=\frac{y_t-(\ell_{t-1}+b_{t-1})}{\ell_{t-1}+b_{t-1}} \sim \text{NID}(0,\sigma^2)\), gives
\[ \begin{align*} y_t&=(\ell_{t-1}+b_{t-1})(1+\varepsilon_t)\\ \ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\ b_t&=b_{t-1}+\beta(\ell_{t-1}+b_{t-1})\varepsilon_t \end{align*} \]
where again \(\beta=\alpha \beta^*\).
An alternative to estimating the smoothing parameters \(\alpha\), \(\beta\), \(\gamma\) and \(\phi\), and the initial states \(\ell_0\), \(b_0\), \(s_0,s_{-1},\dots,s_{-m+1}\) by minimizing the sum of squared errors is to maximize the “likelihood”.
The likelihood is the probability of the data arising from the specified model. Thus, a large likelihood is associated with a good model.
For an additive error model, maximizing the likelihood (assuming normally distributed errors) gives the same results as minimizing the sum of squared errors.
However, different results will be obtained for multiplicative error models.
Let \(x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})\) and \(\varepsilon_t{\sim}\mbox{N}(0,\sigma^2)\).
\[ \begin{align*} L^*(\theta,x_0) &= T\log\!\bigg(\sum_{t=1}^T \varepsilon^2_t\!\bigg) + 2\sum_{t=1}^T \log|k(x_{t-1})|\\ &= -2\log(\text{Likelihood}) + \mbox{constant} \end{align*} \]
Usual region: the traditional parameter restrictions (e.g. \(0<\alpha<1\), \(0<\beta^*<1\), \(0\le\gamma\le 1-\alpha\), \(0<\phi<1\)), under which each equation can be interpreted as a weighted average.
Admissible region: a larger set of parameter values under which the model’s forecasts remain stable.
Akaike's Information Criterion \[\text{AIC} = -2\log(\text{L}) + 2k\] where \(L\) is the likelihood and \(k\) is the total number of parameters and initial states that have been estimated (including the residual variance).
Corrected AIC
\[\text{AIC}_{\text{c}} = \text{AIC} + \frac{2k(k+1)}{T-k-1}\]
which is the AIC corrected for small-sample bias.
Bayesian Information Criterion \[ \text{BIC} = \text{AIC} + k[\log(T)-2]\]
Minimizing the AIC assuming Gaussian residuals is asymptotically equivalent to minimizing one-step time series cross validation MSE.
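In fable, these criteria are reported by `glance()`. A sketch comparing two candidate models for the Algerian exports series from earlier (the model labels `ANN`/`AAN` are just illustrative names):

algeria_economy %>%
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  ) %>%
  glance() %>%
  select(.model, AIC, AICc, BIC)  # smaller values indicate a better model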
From Hyndman et al. (IJF, 2002): the automatic ETS model-selection method performed very well in the M3 competition.
fit <- global_economy %>%
  mutate(Pop = Population / 1e6) %>%
  model(ets = ETS(Pop))
fit
## # A mable: 263 x 2
## # Key:     Country [263]
##    Country             ets
##    <fct>               <model>
##  1 Afghanistan         <ETS(A,A,N)>
##  2 Albania             <ETS(M,A,N)>
##  3 Algeria             <ETS(M,A,N)>
##  4 American Samoa      <ETS(M,A,N)>
##  5 Andorra             <ETS(M,A,N)>
##  6 Angola              <ETS(M,A,N)>
##  7 Antigua and Barbuda <ETS(M,A,N)>
##  8 Arab World          <ETS(M,A,N)>
##  9 Argentina           <ETS(A,A,N)>
## 10 Armenia             <ETS(M,A,N)>
## # … with 253 more rows
fit %>% forecast(h = 5)
## # A fable: 1,315 x 5 [1Y]
## # Key:     Country, .model [263]
##    Country     .model  Year             Pop .mean
##    <fct>       <chr>  <dbl>          <dist> <dbl>
##  1 Afghanistan ets     2018    N(36, 0.012) 36.4
##  2 Afghanistan ets     2019    N(37, 0.059) 37.3
##  3 Afghanistan ets     2020     N(38, 0.16) 38.2
##  4 Afghanistan ets     2021     N(39, 0.35) 39.0
##  5 Afghanistan ets     2022     N(40, 0.64) 39.9
##  6 Albania     ets     2018 N(2.9, 0.00012)  2.87
##  7 Albania     ets     2019   N(2.9, 6e-04)  2.87
##  8 Albania     ets     2020  N(2.9, 0.0017)  2.87
##  9 Albania     ets     2021  N(2.9, 0.0036)  2.86
## 10 Albania     ets     2022  N(2.9, 0.0066)  2.86
## # … with 1,305 more rows
holidays <- tourism %>%
  filter(Purpose == "Holiday")
fit <- holidays %>%
  model(ets = ETS(Trips))
fit
## # A mable: 76 x 4
## # Key:     Region, State, Purpose [76]
##    Region                     State              Purpose ets
##    <chr>                      <chr>              <chr>   <model>
##  1 Adelaide                   South Australia    Holiday <ETS(A,N,A)>
##  2 Adelaide Hills             South Australia    Holiday <ETS(A,A,N)>
##  3 Alice Springs              Northern Territory Holiday <ETS(M,N,A)>
##  4 Australia's Coral Coast    Western Australia  Holiday <ETS(M,N,A)>
##  5 Australia's Golden Outback Western Australia  Holiday <ETS(M,N,M)>
##  6 Australia's North West     Western Australia  Holiday <ETS(A,N,A)>
##  7 Australia's South West     Western Australia  Holiday <ETS(M,N,M)>
##  8 Ballarat                   Victoria           Holiday <ETS(M,N,A)>
##  9 Barkly                     Northern Territory Holiday <ETS(A,N,A)>
## 10 Barossa                    South Australia    Holiday <ETS(A,N,N)>
## # … with 66 more rows
fit %>%
  filter(Region == "Snowy Mountains") %>%
  report()
## Series: Trips
## Model: ETS(M,N,A)
##   Smoothing parameters:
##     alpha = 0.157
##     gamma = 1e-04
##
##   Initial states:
##  l[0] s[0] s[-1] s[-2] s[-3]
##   142  -61   131 -42.2 -27.7
##
##   sigma^2: 0.0388
##
##  AIC AICc  BIC
##  852  854  869
fit %>%
  filter(Region == "Snowy Mountains") %>%
  components()
## # A dable: 84 x 9 [1Q]
## # Key:     Region, State, Purpose, .model [1]
## # :        Trips = (lag(level, 1) + lag(season, 4)) * (1 + remainder)
##    Region         State Purpose .model Quarter Trips level season remainder
##    <chr>          <chr> <chr>   <chr>    <qtr> <dbl> <dbl>  <dbl>     <dbl>
##  1 Snowy Mountai… New … Holiday ets    1997 Q1  NA    NA    -27.7   NA
##  2 Snowy Mountai… New … Holiday ets    1997 Q2  NA    NA    -42.2   NA
##  3 Snowy Mountai… New … Holiday ets    1997 Q3  NA    NA    131.    NA
##  4 Snowy Mountai… New … Holiday ets    1997 Q4  NA   142.   -61.0   NA
##  5 Snowy Mountai… New … Holiday ets    1998 Q1 101.  140.   -27.7   -0.113
##  6 Snowy Mountai… New … Holiday ets    1998 Q2 112.  142.   -42.2    0.154
##  7 Snowy Mountai… New … Holiday ets    1998 Q3 310.  148.   131.     0.137
##  8 Snowy Mountai… New … Holiday ets    1998 Q4  89.8 148.   -61.0    0.0335
##  9 Snowy Mountai… New … Holiday ets    1999 Q1 112.  147.   -27.7   -0.0687
## 10 Snowy Mountai… New … Holiday ets    1999 Q2 103.  147.   -42.2   -0.0199
## # … with 74 more rows
fit %>%
  filter(Region == "Snowy Mountains") %>%
  components() %>%
  autoplot()
fit %>% forecast()
## # A fable: 608 x 7 [1Q]
## # Key:     Region, State, Purpose, .model [76]
##    Region         State           Purpose .model Quarter       Trips .mean
##    <chr>          <chr>           <chr>   <chr>    <qtr>      <dist> <dbl>
##  1 Adelaide       South Australia Holiday ets    2018 Q1 N(210, 457) 210.
##  2 Adelaide       South Australia Holiday ets    2018 Q2 N(173, 473) 173.
##  3 Adelaide       South Australia Holiday ets    2018 Q3 N(169, 489) 169.
##  4 Adelaide       South Australia Holiday ets    2018 Q4 N(186, 505) 186.
##  5 Adelaide       South Australia Holiday ets    2019 Q1 N(210, 521) 210.
##  6 Adelaide       South Australia Holiday ets    2019 Q2 N(173, 537) 173.
##  7 Adelaide       South Australia Holiday ets    2019 Q3 N(169, 553) 169.
##  8 Adelaide       South Australia Holiday ets    2019 Q4 N(186, 569) 186.
##  9 Adelaide Hills South Australia Holiday ets    2018 Q1   N(19, 36)  19.4
## 10 Adelaide Hills South Australia Holiday ets    2018 Q2   N(20, 36)  19.6
## # … with 598 more rows
fit %>%
  forecast() %>%
  filter(Region == "Snowy Mountains") %>%
  autoplot(holidays) +
  labs(y = "Thousands", title = "Overnight trips")
Response residuals \[\hat{e}_t = y_t - \hat{y}_{t|t-1}\]
Innovation residuals
Additive error model: \[\hat\varepsilon_t = y_t - \hat{y}_{t|t-1}\]
Multiplicative error model: \[\hat\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\]
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
  model(ets = ETS(Trips)) %>%
  report()
## Series: Trips
## Model: ETS(M,N,M)
##   Smoothing parameters:
##     alpha = 0.358
##     gamma = 0.000969
##
##   Initial states:
##  l[0]  s[0] s[-1] s[-2] s[-3]
##  9667 0.943 0.927 0.968  1.16
##
##   sigma^2: 0.0022
##
##  AIC AICc  BIC
## 1331 1333 1348
residuals(fit)
residuals(fit, type = "response")
fit %>% augment()
## # A tsibble: 80 x 6 [1Q]
## # Key:       .model [1]
##    .model Quarter  Trips .fitted .resid   .innov
##    <chr>    <qtr>  <dbl>   <dbl>  <dbl>    <dbl>
##  1 ets    1998 Q1 11806.  11230.  576.   0.0513
##  2 ets    1998 Q2  9276.   9532. -257.  -0.0269
##  3 ets    1998 Q3  8642.   9036. -393.  -0.0435
##  4 ets    1998 Q4  9300.   9050.  249.   0.0275
##  5 ets    1999 Q1 11172.  11260.  -88.0 -0.00781
##  6 ets    1999 Q2  9608.   9358.  249.   0.0266
##  7 ets    1999 Q3  8914.   9042. -129.  -0.0142
##  8 ets    1999 Q4  9026.   9154. -129.  -0.0140
##  9 ets    2000 Q1 11071.  11221. -150.  -0.0134
## 10 ets    2000 Q2  9196.   9308. -111.  -0.0120
## # … with 70 more rows
Innovation residuals (.innov) are given by \(\hat{\varepsilon}_t\), while response residuals (.resid) are \(y_t - \hat{y}_{t|t-1}\). They are different when the model has multiplicative errors.
Some of the combinations of (Error, Trend, Seasonal) can lead to numerical difficulties; see the equations with division by a state.
These are: \(ETS(A,N,M)\), \(ETS(A,A,M)\), \(ETS(A,A_d,M)\)
Models with multiplicative errors are useful for strictly positive data, but are not numerically stable with data containing zeros or negative values.
In that case only the six fully additive models will be applied.
Traditional point forecasts: iterate the equations for \(t=T+1,T+2,\dots,T+h\) and set all \(\varepsilon_t=0\) for \(t>T\).
Example: ETS(A,A,N) \[ \begin{align*} y_{T+1} &= \ell_T + b_T + \varepsilon_{T+1}\\ \hat{y}_{T+1|T} & = \ell_{T}+b_{T}\\ y_{T+2} & = \ell_{T+1} + b_{T+1} + \varepsilon_{T+2}\\ & = (\ell_T + b_T + \alpha\varepsilon_{T+1}) + (b_T + \beta \varepsilon_{T+1}) + \varepsilon_{T+2} \\ \hat{y}_{T+2|T} &= \ell_{T}+2b_{T} \end{align*}\]
These are identical to the forecasts from Holt’s linear method, and also to those from model ETS(M,A,N).
Thus, the point forecasts obtained from the method and from the two models that underlie the method are identical (assuming that the same parameter values are used).
Example: ETS(M,A,N) \[ \begin{align*} y_{T+1} &= (\ell_T + b_T )(1+ \varepsilon_{T+1})\\ \hat{y}_{T+1|T} & = \ell_{T}+b_{T}.\\ y_{T+2} & = (\ell_{T+1} + b_{T+1})(1 + \varepsilon_{T+2})\\ & = \left\{ (\ell_T + b_T) (1+ \alpha\varepsilon_{T+1}) + \left[b_T + \beta (\ell_T + b_T)\varepsilon_{T+1}\right] \right\} (1 + \varepsilon_{T+2}) \\ \hat{y}_{T+2|T} &= \ell_{T}+2b_{T} \end{align*} \]
etc.
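To see the mechanics, here is the zero-error iteration for ETS(A,A,N) in base R, using the (rounded) final states from the Australian population example above:

# Iterate the state equations with eps = 0 for t > T:
# the forecast advances by the (unchanged) slope each period.
l <- 24.60  # final level l_T
b <- 0.37   # final slope b_T
fc <- numeric(3)
for (h in 1:3) {
  fc[h] <- l + b  # one-step forecast from the current states
  l <- l + b      # level equation with eps = 0
}                 # slope equation leaves b unchanged when eps = 0
fc  # 24.97 25.34 25.71, i.e. l_T + h * b_T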
ETS point forecasts constructed in this way are equal to the means of the forecast distributions (\(\text{E}(y_{t+h} | x_t)\)), except for the models with multiplicative seasonality
fable uses \(\text{E}(y_{t+h} | x_t)\).
Point forecasts for ETS(A,*,*) are identical to ETS(M,*,*) if the parameters are the same.
Prediction intervals: can only be generated using the models.
The prediction intervals will differ between models with additive and multiplicative errors.
Exact formula for some models.
It is more general to simulate future sample paths, conditional on the last estimate of the states, and to obtain prediction intervals from the percentiles of these simulated paths.
Prediction intervals for most ETS models have the form \(\hat{y}_{T+h|T} \pm c \sigma_h\), where \(c\) depends on the coverage probability and \(\sigma_h\) is the forecast standard deviation.
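A sketch of the simulation approach using fable's `generate()`, reusing the quarterly holidays model `fit` from above (the horizon, number of paths, and 95% level are all illustrative):

# Simulate future sample paths and form an interval from their percentiles
sims <- fit %>% generate(h = 8, times = 1000, bootstrap = TRUE)
sims %>%
  as_tibble() %>%
  group_by(Quarter) %>%
  summarise(lower = quantile(.sim, 0.025),  # 95% interval from path percentiles
            upper = quantile(.sim, 0.975))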
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))
h02 %>% autoplot(Cost)
h02 %>% model(ETS(Cost)) %>% report()
## Series: Cost
## Model: ETS(M,Ad,M)
##   Smoothing parameters:
##     alpha = 0.307
##     beta  = 0.000101
##     gamma = 0.000101
##     phi   = 0.978
##
##   Initial states:
##    l[0] b[0]  s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
##  417269 8206 0.872 0.826 0.756 0.773 0.687  1.28  1.32  1.18  1.16   1.1   1.05  0.981
##
##   sigma^2: 0.0046
##
##   AIC AICc  BIC
##  5515 5519 5575
h02 %>%
model(ETS(Cost ~ error("A") + trend("A") + season("A"))) %>%
report()
## Series: Cost
## Model: ETS(A,A,A)
##   Smoothing parameters:
##     alpha = 0.17
##     beta  = 0.00631
##     gamma = 0.455
##
##   Initial states:
##    l[0] b[0]   s[0]   s[-1]   s[-2]   s[-3]   s[-4]  s[-5]  s[-6]  s[-7]  s[-8] s[-9] s[-10] s[-11]
##  409706 9097 -99075 -136602 -191496 -174531 -241437 210644 244644 145368 130570 84458  39132 -11674
##
##   sigma^2: 3.5e+09
##
##   AIC AICc  BIC
##  5585 5589 5642
h02 %>% model(ETS(Cost)) %>% forecast() %>% autoplot(h02)
h02 %>%
model(
auto = ETS(Cost),
AAA = ETS(Cost ~ error("A") + trend("A") + season("A"))
) %>%
accuracy()
| Model | MAE | RMSE | MAPE | MASE | RMSSE |
|---|---|---|---|---|---|
| auto | 38649 | 51102 | 4.99 | 0.638 | 0.689 |
| AAA | 43378 | 56784 | 6.05 | 0.716 | 0.766 |