Author: Farhod Ibragimov

library(fpp3)
library(future)
library(patchwork)
plan(multisession)
  1. Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
?aus_livestock
## starting httpd help server ... done
aus_livestock |> 
  distinct(Animal)
aus_pig |> 
  autoplot()+
  labs(title = 'Counts of pigs slaughtered for meat production in Victoria')
## Plot variable not specified, automatically selected `.vars = Count`

aus_pig |>
  features(Count, feat_stl)

From the plot and STL diagnostics can say following

aus_pig |> 
  model(STL(Count~season(window = 'periodic'))) |> 
  components() |> 
  ACF(season_year) |> 
  autoplot()

As we see ACF of the estimated seasonal component shows strong spikes at lags 12 and 24, this suggests that seasonal patter repeats annually.
This result is aligned with the STL diagnostics analysis, which estimated the seasonal strength to be 0.49, suggesting a moderate seasonal effect in the series.
Overall the trend (0.90) is dominant feature, while seasonal dominance is 0.49, suggesting a moderate seasonal pattern.

1a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of alpha and l_0, and generate forecasts for the next four months.

fit_pigs <- aus_pig |> 
  model(
    SES = ETS(Count~ error('A') + trend('N') + season('N'))
  )
fit_pigs |> 
  tidy()
report(fit_pigs)
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

The optimal smoothing parameter was alpha = 0.322, indicating that about 32% of the new observation is used when updating the level estimate. The model’s memory of previous levels does not fade away quickly giving more weights to previous observations.
The estimated initial level was l_0 = 100,646.6.

fc_pigs <- fit_pigs |> 
  forecast(h = '4 months') 

fc_pigs |> 
  autoplot(aus_pig |> filter(year(Month) >= '2012 Jan'), level = NULL)+
  labs(title = "Pigs slautering 4 months forecast")

fc_pigs


The forecast points for the next four months are all 95,186.56 slaughtered pigs. This is because simple exponential smoothing is a level-only model, so multi-step forecasts remain constant.
Since the STL features suggested strong trend and moderate seasonality, ETS models like AAN, ANA, AAA or AAdA than simple exponential smoothing may be more appropriate.

1b. Compute a 95% prediction interval for the first forecast using y_hat + - 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

residuals_sd <- fit_pigs |> 
  augment() |> 
  pull(.innov) |> 
  sd()
residuals_sd
## [1] 9344.666
fc_point <- fc_pigs |> 
  pull(.mean)

y_hat <- fc_point[1]
y_hat
## [1] 95186.56
lower_ci <- y_hat - residuals_sd*1.96
upper_ci <- y_hat + residuals_sd*1.96
print(c(lower_ci, upper_ci))
## [1]  76871.01 113502.10
fc_pigs_intervals <- fc_pigs |> 
  hilo(level = 95) |> 
  unpack_hilo('95%')
fc_pigs_intervals |> 
  slice(1) |> 
  select(.model, Month, .mean, `95%_lower`, `95%_upper`)

The residual standard deviation was SD = 9344.66. With first forecast y_hat = 95186.56 the manual 95% prediction interval is approximately (76,817, 113,556).

R produced the interval (76,854.79, 113,518.30).
The two intervals are very similar, with only minor differences because the ETS model uses its own estimate of the innovation variance rather than the simple residual standard deviation.

  1. Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
?global_economy
canada_exports <- global_economy |> 
  filter(Code == 'CAN') |> 
  select(Exports)
canada_exports

a. Plot the Exports series and discuss the main features of the data.

canada_exports |> 
  autoplot(Exports)+
  labs(title = "Canadian exports (% of GDP)",
       y = 'Exports (% of GDP)')

The Canadian exports series shows a strong upward trend from the 1960s until around 2000, followed by a noticeable decline and stabilization in later years. The series contains sharp growth from 1992 till 2000, followed sharp drop in exports until 2009 and exports appear to stabilize after 2009. Since the data are annual, no seasonal pattern is present.
Overall, the plot suggests that long-term movements play an important role in the series, which may suggest using models that include a trend component.

b. Use an ETS(A,N,N) and ETS(A,A,N) models to forecast the series, and plot the forecasts.

fit_canada <- canada_exports |> 
  model(
    SES = ETS(Exports~ error('A') + trend('N') + season('N')),
    AAN = ETS(Exports~ error('A') + trend('A') + season('N'))
  )
fc_canada <- fit_canada |> 
  forecast(h = 7)
fc_canada |> autoplot(canada_exports, level = NULL)

b. Compute the RMSE values for the training data.

fit_canada |> 
  accuracy() |> 
  select(.model, .type, RMSE)
fit_canada |> 
  tidy()

d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

Both models RMSE values for the training data are very similar , with SES = 1.618 and AAN = 1.626. Since the SES model has a slightly lower RMSE and uses fewer parameters, it provides a more parsimonious fit to the data.
Both models have smoothing parameters alpha very close to 1, therefore the level estimate puts almost all weight on the most recent observation. As a result, the SES model behave similarly to a NAIVE forecasting approach.
In the ETS(A,A,N) model, the trend parameter beta = 0.24 allows the trend to update gradually. Because the alpha = 0.99 (more weights on recent observations) for ETS(A,A,N) model and the most recent observations (after 2000) show decline and stabilization in exports, the estimated trend becomes slightly negative, leading to a downward forecast.

e. Compare the forecasts from both methods. Which do you think is best?


The Canadian exports series shows long-term changes and structural shifts, particularly around 2000 and 2009. Because the direction of the trend changes over time and overall it is moving upward, model such as ETS(A,A,N) may not capture the series well. The SES model adapts to the current level without imposing a trend, which probably explains its slightly better RMSE performance.

fc_one <- fc_canada |> 
  filter(Year == 2018)

fc_one_SES <- fc_one |> 
  filter(.model == 'SES') |> 
  pull(.mean)

fc_one_AAN <- fc_one |> 
  filter(.model == 'AAN') |> 
  pull(.mean)
print(c(fc_one_SES, fc_one_AAN))
## [1] 30.89403 30.77976
fit_canada
residuals_ANN_sd <- fit_canada |> 
  select(SES) |> 
  augment() |> 
  pull(.innov) |> 
  sd()
residuals_ANN_sd
## [1] 1.613758
residuals_AAN_sd <- fit_canada |> 
  select(AAN) |> 
  augment() |> 
  pull(.innov) |> 
  sd()
residuals_AAN_sd
## [1] 1.639708
lower_fc_SES <- fc_one_SES - residuals_ANN_sd*1.96
upper_fc_SES <- fc_one_SES + residuals_ANN_sd*1.96
print(c(lower_fc_SES, upper_fc_SES))
## [1] 27.73107 34.05700
#fc_canada
fc_canada_intervals_SES <- fc_canada |> 
  filter(.model == 'SES') |> 
  hilo(level = 95) |> 
  unpack_hilo('95%')
fc_canada_intervals_SES |> 
  slice(1) |> 
  select(.model, Year, .mean, `95%_lower`, `95%_upper`)
lower_fc_AAN <- fc_one_AAN - residuals_AAN_sd*1.96
upper_fc_AAN <- fc_one_AAN + residuals_AAN_sd*1.96
print(c(lower_fc_AAN, upper_fc_AAN))
## [1] 27.56593 33.99359
fc_canada_intervals_AAN <- fc_canada |> 
  filter(.model == 'AAN') |> 
  hilo(level = 95) |> 
  unpack_hilo('95%')
fc_canada_intervals_AAN |> 
  slice(1) |> 
  select(.model, Year, .mean, `95%_lower`, `95%_upper`)
library(dplyr)
library(knitr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
comparison_table <- tibble(
  .model = c("SES", "AAN"),
  Forecast = c(fc_one_SES, fc_one_AAN),
  Manual_lower = c(lower_fc_SES, lower_fc_AAN),
  Manual_upper = c(upper_fc_SES, upper_fc_AAN)
) |>
  left_join(
    bind_rows(
      fc_canada_intervals_SES |> slice(1),
      fc_canada_intervals_AAN |> slice(1)
    ) |>
      select(.model, R_lower = `95%_lower`, R_upper = `95%_upper`),
    by = ".model"
  )

library(knitr)
library(kableExtra)

comparison_table |>
  select(.model, Year, Forecast, Manual_lower, Manual_upper, R_lower, R_upper) |>
  kable(
    digits = 3,
    col.names = c(
      "Model", "Year", "Forecast",
      "Manual lower", "Manual upper",
      "R lower", "R upper"
    )
  ) |>
  kable_styling(full_width = FALSE) |>
  row_spec(0:nrow(comparison_table), background = "white")
Model Year Forecast Manual lower Manual upper R lower R upper
SES 2018 30.894 27.731 34.057 27.667 34.121
AAN 2018 30.780 27.566 33.994 27.478 34.082

The manually calculated prediction intervals for both models are very close to the intervals calculated by R.
There are small differences because the manual calculation uses the residual standard deviation, while the ETS in R computes prediction intervals using the estimated innovation variance from the state-space model.

  1. Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
china_gdp <- global_economy |> 
  filter(Country == 'China') |> 
  select(GDP) |> 
  mutate(GDP_bil = GDP / 1e9)
#china_gdp
lambda <- china_gdp |> 
  features(GDP_bil, features = guerrero) |> 
  pull(lambda_guerrero)
lambda
## [1] -0.03446284
china_gdp <- china_gdp |> 
  mutate(GDP_bil_bxcx = box_cox(GDP_bil, lambda))


china_gdp |> autoplot(GDP_bil)+
  labs(title = "China GDP in billions",
       y = 'GDP (billions)')

Chinese GDP shows very strong long-term growth, especially after around 2000 meaning the economy is growing faster over time.

fit_china <- china_gdp |> 
 # stretch_tsibble(.init = 10) |> 
  model(
    SES = ETS(GDP_bil~error('A') + trend('N') + season('N')),
    Holt = ETS(GDP_bil~error('A') + trend('A') + season('N')),
    Damp = ETS(GDP_bil~error('A') + trend('Ad') + season('N')),
    Damp_bxcx= ETS(log(GDP_bil)~error('A') + trend('Ad') + season('N'))
  )
fit_china |> 
  forecast(h = 12)  |>  
  autoplot(china_gdp, level = NULL)+
  scale_y_continuous()+
  guides(color = guide_legend(title = 'Forecast'))

# fit_china|>
#   accuracy()
fit_china_phi_adj <- china_gdp |> 
 # stretch_tsibble(.init = 10) |> 
  model(
    SES = ETS(GDP_bil~error('A') + trend('N') + season('N')),
    Holt = ETS(GDP_bil~error('A') + trend('A') + season('N')),
    Damp = ETS(GDP_bil~error('A') + trend('Ad', phi = 0.9) + season('N')),
    Damp_bxcx= ETS(log(GDP_bil)~error('A') + trend('Ad', phi = 0.55) + season('N'))
  )
fit_china_phi_adj |> 
  forecast(h = 12)  |>  
  autoplot(china_gdp, level = NULL)+
  scale_y_continuous()+
  guides(color = guide_legend(title = 'Forecast'))

I applied different damping coefficients phi to both Damp (phi = 0.9) and Damp_bxcx (phi = 0.55) models:

# fit_china_phi_adj |> 
#   accuracy()

8.7 Find an ETS model for the Gas data from aus_production and forecast the next few years.
Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

gas_production <- aus_production |> 
  select(Gas)
#gas_production
gas_production |> 
  autoplot(Gas)+
  labs(title = 'Australian gas production',
       y = 'Gas (petajoules)')

As we see from the plot gas production variance and trend looks constant and stable until around 1970 with small seasonal fluctuations.
But after 1970 the series shows a strong upward trend, and the seasonal variation becomes larger as the level of the series increases. This suggests that the seasonal variations are proportional to the level indicating multiplicative seasonality.

lambda_gas <- gas_production |> 
  features(Gas, features = guerrero) |> 
  pull(lambda_guerrero)
#lambda_gas
gas_production <- gas_production |> 
  mutate(Gas_bxcx = box_cox(Gas, lambda_gas))
# gas_production
# gas_production |> 
#   autoplot(Gas_bxcx)
 
gas_production |> 
  autoplot(Gas_bxcx)+
  labs(title = 'BoxCox transformed gas production series')

As we see applying BoxCox transformation improved variation in series making it more stable and consistent over time.

gas_production |> 
  model(
    stl = STL(Gas~season(window = 'periodic'))
  ) |> 
  components() |> 
  ACF(season_year, lags = 48) |> 
  autoplot()

I applied STL decomposition to estimate seasonal component of the series. After that I used ACF function on estimated seasonal component.
And as we see lags 4, 8, 12 and other multiplies of 4 show strong positive correlations, also there are strong negative autocorrelations at lags 2, 6 ,10, 14 and etc.

Since the data are quarterly, ACF also confirms that the seasonal pattern repeats every 4 quarters. This confirms the presence of a strong yearly seasonal cycle in the gas production series.

Adding this to my previous findings from plots of regular and BoxCox transformed data, this gives additional evidence of multiplicative seasonality.

fit_gas <- gas_production |> 
  model(
    AAA = ETS(Gas~ error('A') + trend('A') + season('A')),
    AAM = ETS(Gas~ error('A') + trend('A')+ season('M')),
    AAdM = ETS(Gas~ error('A') + trend('Ad')+ season('M')),
  )

fc_gas <- fit_gas |> 
  forecast(h = '2 year')
fc_gas |> 
  autoplot(gas_production |> 
             filter(year(Quarter) >= 2007), 
           level = NULL) +
  geom_point(data = fc_gas, 
             aes( y =.mean, color = .model), size = 1.7) + 
  scale_color_manual(values = c("AAA" = "#D55E00", "AAM" = "#0072B2", "AAdM" = "#009E73")) +
  guides(colour = guide_legend(override.aes = list(linewidth = 1.2))) +
  labs(title = "Gas Production Forecast Comparison",
       subtitle = "Comparing Additive vs Multiplicative Seasonality",
       y = "Petajoules")

Careful look at the plot shows that all three models follow the same quarterly seasonal cycle, but the multiplicative seasonal models fit the data structure better than the additive one. Since the seasonal swings increase with the level of gas production, I think AAM and AAdM methods are more appropriate than AAA method.
The AAdM method is slightly more cautious since the damped trend prevents the forecast from growing constantly over time.

fit_gas |> 
  accuracy()

The results show that models with multiplicative seasonality perform better than the additive model. The AAA method has the highest errors, while AAM and AAdM both have lower and similar RMSEs.
Among them, i would prefer AAdM AAdM provides the best overall accuracy and produces more realistic forecasts due to the damped trend.

  1. Recall your retail time series data (from Exercise 7 in Section 2.10).

For this exercise I would like to choose “Supermarket and grocery stores” time series.

supermarket <- aus_retail |> 
  filter(Industry == 'Supermarket and grocery stores') |> 
  summarise(Turnover = sum(Turnover))

#supermarket

This series have 441 monthly observations of overall Australian supermarket and grocery stores sales turnovers in a time range of April 1982 to November 2018.

a. Why is multiplicative seasonality necessary for this series?

supermarket |> 
  autoplot(Turnover)+
  labs(title = 'Australian Supermarket and grocery stores sales',
       y = 'Turnover ($ million AUD)')

The plot clearly shows that seasonal variations growing wider as the trend of series increases. This strongly suggests presence of multiplicative seasonality.

lambda_supermarket <- supermarket |> 
  features(Turnover, features = guerrero) |> 
  pull(lambda_guerrero)
p1 <- supermarket |> 
  autoplot(Turnover)+
  labs(title = 'Supermarkets Turnover time series')
p2 <- supermarket |> 
  autoplot(box_cox(Turnover, lambda_supermarket))+
  labs(y = 'BoxCox transformed', title = paste0('BoxCox transformed Turnover. Lambda = ', round(lambda_supermarket, 3), ')'))
p1/p2

After applying the Box–Cox transformation (lambda = 0.184), the variance becomes more stable and the seasonal fluctuations appear more constant over time. This suggests that the transformation successfully stabilizes the variance and transformed the series from multiplicative closer to additive behavior.

lambda_supermarket
## [1] 0.1842339

b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit_retail <- supermarket |>
  model(
    MAM = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

fit_retail |>tidy() |> 
  filter(term %in% c("alpha", "beta", "gamma", "phi")) |> 
  pivot_wider(names_from = .model, values_from = estimate)
fit_retail |> 
  forecast(h = '3 year') |> 
  autoplot(supermarket |> filter(Month >= yearmonth('2012 Jan')), level = NULL)

Estimated smoothing parameters for both models are quite similar. The level smoothing parameter alpha is slightly smaller in the damped model, while beta (trend smoothing) is a bit larger for MAM model. Seasonal smoothing gamma is almost the same in both models, suggesting the seasonal pattern is fairly stable. The damping parameter phi = 0.98 in MAdM is very close to 1, which means the trend is only very lightly damped and behaves almost like the regular trend model.
Alss from the plot we don’t see significant difference for 3 years

fit_retail |> 
  accuracy()

c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

fit_retail |>
  accuracy()

Both models produce very similar results, but the MAM model performs slightly better. Its RMSE (76.46) is a little bit lower than the damped trend model MAdM (77.66), which means the regular multiplicative Holt-Winters fits the data slightly better on the training set.
The estimated smoothing parameters are also similar between models, and the damping parameter phi = 0.98 is very close to 1, indicating that trend damping very light. Because of this and the lower RMSE, the MAM model is preferred.

d. Check that the residuals from the best method look like white noise.

fit_retail |> 
  select(MAM) |> 
  gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

From the plot above I can say that”

Let’s confirm it with Ljuang-Box test

fit_retail |> 
  augment() |> 
  filter(.model == "MAM") |> # Use your best model name here
  features(.innov, ljung_box, lag = 24)

Since p-value is 0 (or less than 0.05) and Q value is very high (713.81), the Ljung-Box test rejecting the null hypothesis. This confirms that residuals are not white noise, and there is still some significant information or pattern left in the errors that the MAM model didn’t capture.

e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

train_supermarket <- supermarket |> 
  filter(Month<=yearmonth('2010 Dec'))
test_supermarket <- supermarket |> 
  filter(Month> yearmonth('2010 Dec'))

fit_MAM_NAIVE <- train_supermarket |> 
  model(
    MAM = ETS(Turnover~error('M') + trend('A') + season('M')),
    S_NAIVE = SNAIVE(Turnover)
  )
fc_MAM_NAIVE <- fit_MAM_NAIVE |> 
  forecast(test_supermarket)

fc_MAM_NAIVE |> autoplot(supermarket, level = NULL)

fc_MAM_NAIVE |> 
  accuracy(test_supermarket) |> 
  select(.model, RMSE, MAE, MAPE)

As we see from the forecasts plot and accuracy results MAM model clearly outperforms the seasonal naive benchmark model on the test set. The test RMSE for MAM is 764.86, while the seasonal naive model has a much larger RMSE of 1563.51. Similar improvements appear in MAE and MAPE as well.
This means the Holt-Winters multiplicative model captures the trend and seasonality much better than simply repeating last year’s seasonal pattern, so it is so far better model than seasonal naive approach.

  1. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
fit_STL_ETS <- train_supermarket |> 
  model(
    MAM = ETS(Turnover~ error('M') + trend('A')+ season('M')),
    STL_ETS_periodic = decomposition_model(
      STL(log(Turnover) ~ season(window = 'periodic')),
      ETS(season_adjust ~ error('A') + trend('A') + season('N'))
    ),
    STL_ETS_SW15 = decomposition_model(
      STL(log(Turnover) ~ season(window = 15, degree = 0)),
      ETS(season_adjust ~ error('A') + trend('A') + season('N'))
    )
  )

So here I applied 3 models:

fc_STL_ETS <- fit_STL_ETS |> 
  forecast(test_supermarket)
fc_STL_ETS |> 
  accuracy(test_supermarket) |> 
  select(.model, RMSE, MAE, MAPE) |> 
  arrange(RMSE)

As we see STL_ETS_SW15 (model used log transformation, STL decomposition with seasonal window = 15)has the lowest RMSE (514.01) compare to another models including MAM model from previous exercise.

fc_STL_ETS |> 
  autoplot(supermarket |> filter(Month >= yearmonth('2000 Jan')), level = NULL) + 
  facet_wrap(~ .model, ncol = 1) +      
  guides(colour = "none") +            
  labs(title = "Model Comparison: Test Set Performance",
       subtitle = "Data after 2012 Dec",
       y = "Turnover")

From the plot we see that the MAM model underestimates the future values and underfits the upward trend, so its forecasts stay below the actual series.
Both STL_ETS models follow the structure of the data much better, capturing the trend and seasonal movements more accurately. But STL_ETS_SW15 produces forecasts closest to the observed values, especially it alligns much better right before 2016.

Also STL_ETS_SW15 produced slightly overestimated forecasts after 2016, and I think it makes sense to try a damped additive trend inside the ETS part of the STL_ETS_SW15 model and try different values of the damping parameter phi to improve forecast points and accuracy.

Let’s try it below.

fit_STL_ETS_ETSdamped <- train_supermarket |> 
  model(
    MAM = ETS(Turnover~ error('M') + trend('A')+ season('M')),
    STL_ETS_periodic = decomposition_model(
      STL(log(Turnover) ~ season(window = 'periodic')),
      ETS(season_adjust ~ error('A') + trend('A') + season('N'))
    ),
    STL_ETS_SW15 = decomposition_model(
      STL(log(Turnover) ~ season(window = 15, degree = 0)),
      ETS(season_adjust ~ error('A') + trend('A') + season('N'))
    ),
    STL_ETS_SW15_Damped = decomposition_model(
      STL(log(Turnover) ~ season(window = 15)),
      ETS(season_adjust ~ error('A') + trend('Ad', phi = 0.993) + season('N'))
    )
  )

fit_STL_ETS_ETSdamped |> accuracy()

The STL models performs better than MAM model on the training data. Setting the seasonal component to vary with window = 15 already improves RMSE substantially. Adding a damped trend slightly improves the fit further, giving the lowest training RMSE (64.71).
This suggests that a lightly damped trend helps control the growth of the series and produces a slightly better model.

fc_with_ETSdamped <- fit_STL_ETS_ETSdamped |> 
  forecast(test_supermarket)
fc_with_ETSdamped |> 
  accuracy(test_supermarket) |> 
  select(.model, RMSE, MAE, MAPE) |> 
  arrange(RMSE)

As we see the best result on the test data so far was obtained with phi = 0.993. The new STL_ETS_SW15_Damped model significantly improves forecast accuracy, reducing RMSE to 329.34, compared with 514.01 for the STL_ETS_SW15 model. It also performs much better than the original MAM model, which had RMSE = 764.86.
This suggests that adding a damped trend to the ETS component can help to control the trend growth and produces forecasts that align much closer with the observed data.

fc_with_ETSdamped |> 
  autoplot(supermarket, level = NULL) + 
  facet_wrap(~ .model, ncol = 1) +      
  guides(colour = "none") +            
  labs(title = "Model comparison with test set",
       subtitle = "Test set data after 2012 Dec",
       y = "Turnover")

From the plot we see the new STL_ETS_SW15_Damped forecast linealigns with original data much better compared to other models.