library(fpp3)
In this document, we will work through Exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, and 9.8 from Forecasting: Principles and Practice (3rd ed).
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
The difference between these figures is that both the autocorrelations at each lag and the critical value bounds shrink as the sample size grows. In each case the data are consistent with white noise.
The critical values shrink as the sample size grows because the white-noise bounds plotted on an ACF are approximately ±1.96/√T, where T is the number of observations. With more data, our estimates of the population autocorrelations become more precise, so even a small sample autocorrelation in a very large sample would be evidence of genuine autocorrelation in the population. The same logic explains why the sample autocorrelations themselves shrink: the more data we have, the less any incidental correlation shows up in the ACF, and the estimates settle towards the true autocorrelation of random data, which is 0.
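As a quick numerical illustration of that bound (an aside, not part of the exercise answer), we can compute ±1.96/√T for the three sample sizes in the figure; tibble() and mutate() are already available since fpp3 attaches the tidyverse packages.
# 95% white-noise bounds shown on an ACF plot are roughly +/- 1.96 / sqrt(T)
tibble(sample_size = c(36, 360, 1000)) |>
mutate(bound = 1.96 / sqrt(sample_size))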
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock |>
filter(Symbol == "AMZN") |>
gg_tsdisplay(Close, plot_type='partial')
The plot of daily closing prices shows the series is non-stationary: there is a clear increasing trend until near the end of the sample. The ACF decays only very slowly, remaining large even 30 lags out, whereas for stationary data the autocorrelations would drop close to zero quickly. The PACF tells a similar story, with an extremely large spike at lag 1, meaning each observation is heavily influenced by the one immediately before it. Conveniently, that same property suggests that differencing, i.e. working with the change in closing price, should transform the data into a stationary series.
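As an illustrative aside (not required by the exercise), re-running the same display on the differenced closing prices shows a series that looks far closer to stationary:
gafa_stock |>
filter(Symbol == "AMZN") |>
gg_tsdisplay(difference(Close), plot_type = 'partial')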
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
First we plot the data to get an idea of what kind of series it is. Here we see non-seasonal data with an increasing trend and increasing variance over time.
df <- global_economy |>
filter(Country == "Turkey")
df |>
gg_tsdisplay(GDP)
As an experiment, I decided to check whether transforming a series whose change in variance seems minimal was even necessary. We can see below that a log transformation on its own yields a noticeably more homoscedastic differenced series.
df |>
transmute(
`GDP` = GDP,
`Log GDP` = log(GDP),
`Annual change in GDP` = difference(GDP),
`Annual change in log GDP` = difference(log(GDP)),
`Doubly differenced GDP` = difference(GDP, differences = 2),
`Doubly differenced log GDP` = difference(log(GDP), differences = 2)
) |>
pivot_longer(-Year, names_to="Type", values_to="GDP") |>
mutate(
Type = factor(Type, levels = c(
"GDP",
"Log GDP",
"Annual change in GDP",
"Annual change in log GDP",
"Doubly differenced GDP",
"Doubly differenced log GDP"
))
) |>
ggplot(aes(x = Year, y = GDP)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Turkish GDP", y = NULL)
Next we find an appropriate Box-Cox transformation lambda through the guerrero feature, and use the unit-root features to decide how many seasonal and first differences to take. Note that in this case, since the plot shows no seasonal pattern, we already know that seasonal differencing will not be required.
df |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero) -> lambda
df |>
features(box_cox(GDP, lambda), features = unitroot_nsdiffs) |>
pull(nsdiffs) -> nsdiffs
df |>
features(box_cox(GDP, lambda), features = unitroot_ndiffs) |>
pull(ndiffs) -> ndiffs
cat("The optimal box-cox lambda value for this series is", lambda,
"\nThe optimal number of seasonal differences for this series is", nsdiffs,
"\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.1572187
## The optimal number of seasonal differences for this series is 0
## The optimal number of first differences for this series is 1
Below we can see the lambda and order of differencing found above working well.
df |>
transmute(
`GDP` = GDP,
`Box-Cox GDP` = box_cox(GDP, lambda),
`First Differenced Box-Cox GDP` = difference(box_cox(GDP, lambda), differences = ndiffs)
) |>
pivot_longer(-Year, names_to="Type", values_to="GDP") |>
mutate(
Type = factor(Type, levels = c(
"GDP",
"Box-Cox GDP",
"First Differenced Box-Cox GDP"
))
) |>
ggplot(aes(x = Year, y = GDP)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Turkish GDP", y = NULL)
First we plot the data to get an idea of what kind of series it is. Here we see seasonal data with an increasing trend and increasing variance over time.
df <- aus_accommodation |>
filter(State == "Tasmania")
df |>
gg_tsdisplay(Takings)
Next we find an appropriate Box-Cox transformation lambda through the guerrero feature, and use the unit-root features to decide how many seasonal and first differences to take.
Note that because lambda is so small we could simply use a log transform instead, but since interpretability is not a concern here I keep the estimated lambda.
df |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero) -> lambda
df |>
features(box_cox(Takings, lambda), features = unitroot_nsdiffs) |>
pull(nsdiffs) -> nsdiffs
df |>
features(difference(box_cox(Takings, lambda),lag = 4, differences = nsdiffs), features = unitroot_ndiffs) |>
pull(ndiffs) -> ndiffs
cat("The optimal box-cox lambda value for this series is", lambda,
"\nThe optimal number of seasonal differences for this series is", nsdiffs,
"\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.001819643
## The optimal number of seasonal differences for this series is 1
## The optimal number of first differences for this series is 0
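Since this lambda is so close to zero, we can quickly check the claim that a log transform would behave almost identically; the summary below (an illustrative aside, not part of the exercise) computes the largest gap between the two transformed series.
df |>
as_tibble() |>
summarise(max_abs_gap = max(abs(box_cox(Takings, lambda) - log(Takings))))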
Below we see that although the transformation and seasonal difference remove the seasonality, there appears to be more variance at the start of the series, so an ARIMA model fitted to these values might not be optimal.
df |>
transmute(
`Takings` = Takings,
`Box-Cox Takings` = box_cox(Takings, lambda),
`Seasonal Differenced Box-Cox Takings` = difference(box_cox(Takings, lambda), lag = 4, differences = nsdiffs)
) |>
pivot_longer(-Date, names_to="Type", values_to="Takings") |>
mutate(
Type = factor(Type, levels = c(
"Takings",
"Box-Cox Takings",
"Seasonal Differenced Box-Cox Takings"
))
) |>
ggplot(aes(x = Date, y = Takings)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Tasmanian Accommodation Takings", y = NULL)
First we plot the data to get an idea of what kind of series it is. Here we see seasonal data with a slightly increasing trend but heavily increasing variance over time.
df <- souvenirs
df |>
gg_tsdisplay(Sales)
Next we find an appropriate Box-Cox transformation lambda through the guerrero feature, and use the unit-root features to decide how many seasonal and first differences to take.
Note that because lambda is so small we could simply use a log transform instead, but since interpretability is not a concern here I keep the estimated lambda.
df |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero) -> lambda
df |>
features(box_cox(Sales, lambda), features = unitroot_nsdiffs) |>
pull(nsdiffs) -> nsdiffs
df |>
features(difference(box_cox(Sales, lambda),lag = 12, differences = nsdiffs), features = unitroot_ndiffs) |>
pull(ndiffs) -> ndiffs
cat("The optimal box-cox lambda value for this series is", lambda,
"\nThe optimal number of seasonal differences for this series is", nsdiffs,
"\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.002118221
## The optimal number of seasonal differences for this series is 1
## The optimal number of first differences for this series is 0
Below we see that both the seasonality and the heteroscedasticity are removed well with the given values.
df |>
transmute(
`Sales` = Sales,
`Box-Cox Sales` = box_cox(Sales, lambda),
`Seasonal Differenced Box-Cox Sales` = difference(box_cox(Sales, lambda), lag = 12, differences = nsdiffs)
) |>
pivot_longer(-Month, names_to="Type", values_to="Sales") |>
mutate(
Type = factor(Type, levels = c(
"Sales",
"Box-Cox Sales",
"Seasonal Differenced Box-Cox Sales"
))
) |>
ggplot(aes(x = Month, y = Sales)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Souvenir Sales", y = NULL)
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(1234567)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
df <- myseries
First we plot the data to get an idea of what kind of series it is. Here we see seasonal data with an increasing trend and increasing variance over time.
df |>
gg_tsdisplay(Turnover)
Next we find an appropriate Box-Cox transformation lambda through the guerrero feature, and use the unit-root features to decide how many seasonal and first differences to take.
Unlike the previous two series, the lambda here is not close to zero, so a plain log transform would not be a good substitute; we use the estimated lambda.
df |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero) -> lambda
df |>
features(box_cox(Turnover, lambda), features = unitroot_nsdiffs) |>
pull(nsdiffs) -> nsdiffs
df |>
features(difference(box_cox(Turnover, lambda),lag = 12, differences = nsdiffs), features = unitroot_ndiffs) |>
pull(ndiffs) -> ndiffs
cat("The optimal box-cox lambda value for this series is", lambda,
"\nThe optimal number of seasonal differences for this series is", nsdiffs,
"\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.1761103
## The optimal number of seasonal differences for this series is 1
## The optimal number of first differences for this series is 0
Below we see that both seasonality and heteroscedasticity are largely removed with the given values, yet there still seems to be a leftover pattern of autocorrelated dips and peaks. Perhaps this is a cyclical pattern that would be better removed with an additional first difference.
df |>
transmute(
`Turnover` = Turnover,
`Box-Cox Turnover` = box_cox(Turnover, lambda),
`Seasonal Differenced Box-Cox Turnover` = difference(box_cox(Turnover, lambda), lag = 12, differences = nsdiffs)
) |>
pivot_longer(-Month, names_to="Type", values_to="Turnover") |>
mutate(
Type = factor(Type, levels = c(
"Turnover",
"Box-Cox Turnover",
"Seasonal Differenced Box-Cox Turnover"
))
) |>
ggplot(aes(x = Month, y = Turnover)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Souvenir Turnover", y = NULL)
The doubly differenced series below looks much more like random noise than the series with only a single seasonal difference, so we update our optimal orders accordingly. This is a teachable moment about not blindly following the purely "mathematically optimal" values.
df |>
autoplot(
difference(
difference(box_cox(Turnover, lambda), lag = 12, differences = nsdiffs)
)
)
ndiffs <- 1
cat("The optimal box-cox lambda value for this series is", lambda,
"\nThe optimal number of seasonal differences for this series is", nsdiffs,
"\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.1761103
## The optimal number of seasonal differences for this series is 1
## The optimal number of first differences for this series is 1
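As an optional sanity check on this judgment call (an aside, not part of the exercise), a KPSS unit-root test on the doubly differenced series can be run with the unitroot_kpss feature; a small test statistic with a large p-value would support treating the series as stationary.
df |>
features(
difference(difference(box_cox(Turnover, lambda), lag = 12), differences = 1),
features = unitroot_kpss
)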
Simulate and plot some data from simple ARIMA models.
For context, we first generate and plot the error series that will drive all of the simulations below.
e <- rnorm(100)
tsibble(idx = seq_len(100),e=e, index = idx) |>
autoplot(e)
y <- numeric(100)
spr <- seq(-1,1,0.4)
spr <- append(spr, 0, after=3)
sim <- tsibble(idx = seq_len(100), index = idx)
for(phi in spr){
for(i in 2:100){
y[i] <- phi*y[i-1] + e[i]
}
sim[,paste0('ph1_',phi)] <- y
}
If we increase the magnitude of ϕ1, the previous value carries more weight, so the series becomes smoother and consecutive points stay closer together. Once ϕ1 = 1 we have a random walk. If we decrease the magnitude of ϕ1, consecutive points influence each other less, until at ϕ1 = 0 the data is just the random error term, i.e. white noise. Making ϕ1 negative encourages oscillatory behaviour, which is strongest at ϕ1 = -1 and most visible when the error terms are small.
sim |>
gather(variable,value, `ph1_-1`:ph1_1, factor_key = TRUE) |>
autoplot(value) + facet_wrap(~variable, scales = "free_y", ncol = 2)
y <- numeric(100)
spr <- seq(-1,1,0.4)
spr <- append(spr, 0, after=3)
sim <- tsibble(idx = seq_len(100), index = idx)
for(thet in spr){
for(i in 2:100){
y[i] <- thet*e[i-1] + e[i]
}
sim[,paste0('thet1_',thet)] <- y
}
As the magnitude of θ1 decreases towards 0, the effect of the previous error shrinks, and at θ1 = 0 we are left with the pure white noise of the error distribution. As θ1 approaches -1 we get increasingly oscillatory behaviour, since differences between consecutive errors are exaggerated. As θ1 approaches 1 we get a smoothing effect, where the series is less likely to change drastically between consecutive points.
sim |>
gather(variable,value, `thet1_-1`:thet1_1, factor_key = TRUE) |>
autoplot(value) + facet_wrap(~variable, scales = "free_y", ncol = 2)
y <- numeric(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), ARMA = y, index = idx)
y <- numeric(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim$AR2 <- y
The ARMA(1,1) model generates a fairly ordinary-looking stationary series whose values do not stray far from the original error distribution, apart from an increased standard deviation. The AR(2) model, in contrast, starts from the same errors but grows in amplitude over time until it completely eclipses the magnitude of the original errors. The AR(2) series also shows strong oscillations that were not present in the errors, and those oscillations grow in amplitude over time.
# par(mfrow = ...) has no effect on ggplot-based autoplot(), so we plot the two series separately
sim |>
autoplot(AR2)
sim |>
autoplot(ARMA)
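The explosive behaviour of the AR(2) series is no accident: with ϕ1 = -0.8 and ϕ2 = 0.3, one root of the AR characteristic polynomial lies inside the unit circle, so the process violates the stationarity conditions. A quick check (an aside, not part of the exercise):
# Stationarity requires every root of 1 - phi1*z - phi2*z^2 to lie outside the
# unit circle; with phi1 = -0.8 and phi2 = 0.3 that polynomial is 1 + 0.8*z - 0.3*z^2
Mod(polyroot(c(1, 0.8, -0.3)))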
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
(models <- aus_airpassengers |>
model(
ARIMA = ARIMA(Passengers),
# Has a constant = with drift
`ARIMA010` = ARIMA(Passengers ~ 1 + pdq(0,1,0)),
# Has a constant = with drift
`ARIMA212` = ARIMA(Passengers ~ 1 + pdq(2,1,2)),
# Constant Removed
`ARIMA212-C` = ARIMA(Passengers ~ 0 + pdq(2,1,2), method="ML"),
# Constant Added
`ARIMA021` = ARIMA(Passengers ~ 1 + pdq(0,2,1)),
))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
## # A mable: 1 x 5
## ARIMA ARIMA010 ARIMA212 `ARIMA212-C`
## <model> <model> <model> <model>
## 1 <ARIMA(0,2,1)> <ARIMA(0,1,0) w/ drift> <ARIMA(2,1,2) w/ drift> <ARIMA(2,1,2)>
## # ℹ 1 more variable: ARIMA021 <model>
models|>
report()
## Warning in report.mdl_df(models): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
## 2 ARIMA010 4.27 -98.2 200. 201. 204. <cpl [0]> <cpl [0]>
## 3 ARIMA212 4.03 -96.2 204. 207. 215. <cpl [2]> <cpl [2]>
## 4 ARIMA212-C 3.90 -97.2 204. 206. 214. <cpl [2]> <cpl [2]>
## 5 ARIMA021 3.85 -95.1 196. 197. 202. <cpl [0]> <cpl [1]>
We get an ARIMA(0,2,1) model.
models |>
select(ARIMA)|>
gg_tsresiduals()
The residuals mostly look like white noise, but there is increasing heteroscedasticity over time along with large outliers within our residuals.
models |>
augment() |>
filter(.model=='ARIMA') |>
features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ARIMA 6.70 0.669
Using a portmanteau (Ljung-Box) test, we cannot reject the hypothesis that the residuals are white noise.
models |>
select(ARIMA)|>
forecast(h=10) |>
autoplot(aus_airpassengers)
These prediction intervals span tens of millions of passengers within a few years, so the point forecasts should be treated with caution.
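To put numbers on how wide those intervals are, the 95% intervals can be pulled out with hilo() (an illustrative aside using the same forecasts as above):
models |>
select(ARIMA) |>
forecast(h = 10) |>
hilo(level = 95)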
models |>
select(ARIMA)|>
report()
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
$$(1 - B)^2 y_t = (1 - 0.8963B)\varepsilon_t$$
Note the absence of the constant c because d > 1.
models |>
select(ARIMA, ARIMA010)|>
forecast(h=10) |>
autoplot(aus_airpassengers)
The ARIMA(0,1,0) model is less optimistic in its forecasts than the auto-selected ARIMA(0,2,1) model. The (0,1,0) model is simply a random walk with drift, where the constant causes the forecasts to keep increasing. Even this simple model has an AICc of 200, very close to the automatic model's 198, which tells us that simple models are still worth considering if computation is a bottleneck.
models |>
select(ARIMA, ARIMA010, ARIMA212)|>
forecast(h=10) |>
autoplot(aus_airpassengers)
The ARIMA(2,1,2) model very closely resembles the ARIMA(0,1,0) model, except that its forecasts are not linear but very slightly sinusoidal while increasing in value. Its AICc of 204 is worse than the previous two models despite the added complexity, telling us that more complex is not necessarily better.
The constant cannot be removed with the default estimation method: fitting the ARIMA(2,1,2) model without a constant produces non-stationary coefficients under the default CSS-based estimation, so we force full maximum likelihood estimation with method = "ML".
models |>
select(ARIMA, ARIMA010,`ARIMA212-C`)|>
forecast(h=10) |>
autoplot(aus_airpassengers)
Removing the constant both straightens out the forecasts and makes them much more optimistic than the previous (2,1,2) forecasts; it is also the most optimistic model so far, beating out the automatic ARIMA model. Removing the constant still leaves a similar AICc of 204, so there is little reason to use a (2,1,2) model here in any form.
models |>
select(ARIMA, ARIMA021)|>
forecast(h=10) |>
autoplot(aus_airpassengers)
Again we get more optimistic forecasts than the automatic ARIMA(0,2,1) model. However, checking the reports, this model with a constant actually beats the automatic ARIMA model. The reason is that constants are not considered by the automatic algorithm when d > 1. So it can be worth making small manual changes to an automatically selected ARIMA model to improve it.
For the United States GDP series (from global_economy):
df <-
global_economy |>
filter(Code == "USA") |>
select(GDP)
autoplot(df, GDP)
This series should be Box-Cox transformed, as it appears to be growing geometrically.
Below we use a familiar algorithm to find the optimal Box-Cox lambda.
df |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero) -> lambda
df |>
features(box_cox(GDP, lambda), features = unitroot_ndiffs) |>
pull(ndiffs) -> ndiffs
cat("The optimal box-cox lambda value for this series is", lambda,
"\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.2819443
## The optimal number of first differences for this series is 1
df |>
mutate(GDP_Box = box_cox(GDP,lambda)) -> df
autoplot(df,GDP_Box)
We have successfully reduced the geometric pattern after box-cox transforming.
df |>
gg_tsdisplay(GDP_Box, plot_type = "partial")
The automatically selected ARIMA model is an ARIMA(1,1,0) with drift.
(models <- df |>
model(
ARIMA = ARIMA(box_cox(GDP,lambda)),
ARIMA_Manual = ARIMA(box_cox(GDP,lambda) ~ 0 + pdq(1,1,0)),
ARIMA_SemiAuto = ARIMA(box_cox(GDP,lambda) ~ pdq(p=1,d=2:4,q=0:4)),
ARIMA_FullAuto = ARIMA(box_cox(GDP,lambda), greedy = FALSE, approximation = FALSE, stepwise = FALSE)
))
## # A mable: 1 x 4
## ARIMA ARIMA_Manual ARIMA_SemiAuto ARIMA_FullAuto
## <model> <model> <model> <model>
## 1 <ARIMA(1,1,0) w/ drift> <ARIMA(1,1,0)> <ARIMA(1,2,1)> <ARIMA(1,1,0) w/ drift>
models |>
glance()
## # A tibble: 4 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 ARIMA_Manual 7038. -334. 672. 672. 676. <cpl [1]> <cpl [0]>
## 3 ARIMA_SemiAuto 5761. -321. 648. 649. 655. <cpl [1]> <cpl [1]>
## 4 ARIMA_FullAuto 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
We try a few other ARIMA specifications that seem plausible from the ACF and PACF, but we are limited, as a (1,1,x) model already appears optimal given the autocorrelation plots and unit-root tests.
Letting ARIMA() search through the space we give it seems to produce a better AICc than the automatically selected model. However, since that model uses a different order of differencing, its AICc is not directly comparable.
Thus, we stick with the automatically selected model.
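Because the AICc values cannot settle that comparison, one hedged alternative (not part of the original analysis) is a small hold-out comparison of the two candidates; the cut-off at 2012 is an arbitrary choice that mirrors the split used later in this exercise.
df |>
filter_index(~ "2012") |>
model(
`ARIMA(1,1,0) w/ drift` = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)),
`ARIMA(1,2,1)` = ARIMA(box_cox(GDP, lambda) ~ 0 + pdq(1, 2, 1))
) |>
forecast(h = 5) |>
accuracy(df)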
models |>
select(ARIMA) |>
gg_tsresiduals()
Checking the residual diagnostics, there is no significant autocorrelation, but there are very large outliers that make the distribution heavily left-skewed. So, we also look at the ARIMA(1,2,1) that was found by the restricted search.
models |>
select(ARIMA_SemiAuto) |>
gg_tsresiduals()
Unfortunately, the residual distribution and plot look largely the same. So we continue, keeping in mind that the residuals suggest this model is not using all of the information available in the series.
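A Ljung-Box test, mirroring the one used for the air passengers model, can make that residual judgment formal; the p-value in the resulting output (not shown here) indicates whether the remaining autocorrelation is significant.
models |>
augment() |>
filter(.model == 'ARIMA') |>
# dof = 2 counts the ar1 coefficient and the drift constant
features(.innov, ljung_box, lag = 10, dof = 2)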
models |>
select(ARIMA) |>
forecast(h=10) |>
autoplot(df)
The forecasts do seem like they follow the overall trend of the series.
models <- df |>
model(
ARIMA = ARIMA(box_cox(GDP,lambda)),
ETS = ETS(GDP)
)
models |>
glance()
## # A tibble: 2 × 11
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots MSE AMSE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list> <dbl> <dbl>
## 1 ARIMA 5.48e+3 -325. 657. 657. 663. <cpl> <cpl> NA NA
## 2 ETS 6.78e-4 -1590. 3191. 3192. 3201. <NULL> <NULL> 2.79e22 1.20e23
## # ℹ 1 more variable: MAE <dbl>
models |>
forecast(h=10) |>
autoplot(df)
Visually the ETS model seems to continue the trend better; however, its prediction intervals are extremely wide, which leaves us with less confidence in it. Let's compare accuracy over a five-year hold-out period to better judge the two models.
(models <- df |>
filter_index(~ "2012") |>
model(
ARIMA = ARIMA(box_cox(GDP,lambda)),
ETS = ETS(GDP)
))
## # A mable: 1 x 2
## ARIMA ETS
## <model> <model>
## 1 <ARIMA(1,1,0) w/ drift> <ETS(M,A,N)>
models |>
forecast(h = 5) |>
accuracy(df)
## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA Test -2.18e11 249926621810. 2.18e11 -1.18 1.18 0.699 0.666 0.400
## 2 ETS Test 2.91e11 345234202897. 2.91e11 1.57 1.57 0.935 0.920 0.191
We can confirm that the ARIMA model achieves a lower RMSE than the ETS model on the test set, making it the better choice here. However, we didn't attempt to tune the ETS model at all.
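One possible follow-up (purely illustrative, and not attempted above) would be to try a damped-trend ETS specification on the same training window and re-check the hold-out accuracy:
df |>
filter_index(~ "2012") |>
model(`ETS damped` = ETS(GDP ~ error("M") + trend("Ad") + season("N"))) |>
forecast(h = 5) |>
accuracy(df)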