Exponential Smoothing Homework

8.1

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

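library(fpp3)

# Monthly pig slaughter in Victoria from 2010 onward, in thousands
n_pigs <- aus_livestock %>%
  filter(year(Month) >= 2010) %>%
  filter(Animal == 'Pigs', State == 'Victoria') %>%
  summarize(slaughtered = sum(Count) / 1e3)

# ETS(A,N,N) is the state space equivalent of simple exponential smoothing
fit <- n_pigs %>%
  model(ETS(slaughtered ~ error('A') + trend('N') + season('N')))
report(fit)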

## Series: slaughtered 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.2252267 
## 
##   Initial states:
##      l[0]
##  67.15734
## 
##   sigma^2:  49.4639
## 
##      AIC     AICc      BIC 
## 930.9858 931.2166 939.0322

The optimal values are as follows:

\[\alpha = 0.225\\ \ell_0 = 67.16\]
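To make the roles of \(\alpha\) and \(\ell_0\) concrete, here is a minimal base-R sketch of the level recursion \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\) behind simple exponential smoothing. It is not how ETS() is implemented internally; ses_level() and y_demo are hypothetical names and values used only for illustration.

```r
# Minimal sketch of the SES level recursion (illustration only, not fable's code).
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    # New level = weighted average of the latest observation and the old level
    level <- alpha * obs + (1 - alpha) * level
  }
  level  # SES forecasts are flat: every horizon uses this final level
}

# Hypothetical observations, combined with the estimates reported above
y_demo <- c(70, 72, 68, 75)
ses_level(y_demo, alpha = 0.2252267, l0 = 67.15734)
```

With \(\alpha\) near 0.225, each new observation moves the level only about a quarter of the way toward itself, so the fitted level is a fairly smooth average of the history.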

Now let's generate forecasts for the next four months.

fc<- fit %>%forecast(h= '4 months')
fc%>%
  autoplot(n_pigs)+ 
  labs(title = 'Pigs slaughtered by month')+
  guides(color = guide_legend(title = 'Forecast'))

Compute a 95% prediction interval for the first forecast where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

#get first predicted val
y_hat <-fc$.mean[1]

# get resids from augment(fit)

s<-sd(augment(fit)$.resid)

#manually calculate the interval

u_95<-y_hat + (s*1.96)
l_95<-y_hat - (s*1.96)

cat('prediction interval for h=1 from model: ',l_95,' to ',u_95)

## prediction interval for h=1 from model:  81.76465  to  108.8243

Now calculate the interval with the R function hilo():

intervals<-fc%>%hilo()
intervals$`95%`[1]

## <hilo[1]>
## [1] [81.50994, 109.079]95

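The interval calculated by R is slightly wider. R builds it from the model's estimated residual variance, sigma^2 = 49.4639 (a standard deviation of about 7.03), which adjusts for the degrees of freedom used by the estimated parameters. Calling sd() on the residuals gives a slightly smaller value (about 6.90), and R also uses a more exact critical value than 1.96.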
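As a rough check on where the difference comes from, the sketch below rebuilds the interval from the sigma^2 value printed by report(fit). The point forecast is not printed anywhere above, so it is inferred here as the midpoint of the manual interval; that inference and the variable names are assumptions of this sketch.

```r
# Residual standard deviation implied by the reported sigma^2
sigma_hat <- sqrt(49.4639)

# Point forecast inferred as the midpoint of the manual interval (assumption)
y_hat_check <- (81.76465 + 108.8243) / 2

# 95% interval using the exact normal quantile instead of 1.96
interval_check <- y_hat_check + c(-1, 1) * qnorm(0.975) * sigma_hat
interval_check
```

The result lands on roughly 81.51 to 109.08, in line with the hilo() output, which suggests the gap comes from the variance estimate and not from the point forecast.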

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data.

There is no data before 1970, so we should consider dropping those years.

There seems to be a general linear or exponential trend in the data.

There seems to be no seasonality to mention.

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##      l[0]
##  14.60206
## 
##   sigma^2:  3.2936
## 
##      AIC     AICc      BIC 
## 240.9366 241.4947 246.4870

Compute the RMSE values for the training data.

accuracy(fit)$RMSE

## [1] 1.775801

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

## # A tibble: 2 x 11
##   .model Country .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <fct>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Holt   Germany Test  0.926  5.04  3.81  1.64  11.4  2.81  2.81 0.860
## 2 SES    Germany Test  4.23   6.26  4.61 11.7   13.2  3.41  3.49 0.745

Since both models are fitted to the same series on the same scale, we can use RMSE to compare them. The Holt method has a better RMSE. Since Holt takes the trend into account, this makes sense. The difference will be exacerbated as the forecast horizon lengthens.

Compare the forecasts from both methods. Which do you think is best?

This confirms that Holt’s method is superior since it takes into account the trend of the data.
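The difference between the two methods is easiest to see in the recursions themselves. Below is a minimal base-R sketch of Holt's linear trend method, assuming the standard level and trend update equations; holt_forecast() and its arguments are hypothetical names, and beta here plays the role of the trend smoothing parameter. It is an illustration, not the code used for the results above.

```r
# Minimal sketch of Holt's linear trend method (illustration only).
holt_forecast <- function(y, alpha, beta, l0, b0, h) {
  level <- l0
  trend <- b0
  for (obs in y) {
    prev_level <- level
    # Level: blend the observation with the previous one-step-ahead forecast
    level <- alpha * obs + (1 - alpha) * (prev_level + trend)
    # Trend: blend the latest change in level with the previous trend
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # Forecasts keep climbing along the last estimated trend
  level + seq_len(h) * trend
}

# Hypothetical, perfectly linear series: forecasts continue the line
holt_forecast(1:5, alpha = 0.5, beta = 0.3, l0 = 0, b0 = 1, h = 3)
```

Setting the trend to zero throughout reduces this to simple exponential smoothing, whose forecasts stay flat at the last level. That is why SES lags behind a series that keeps rising.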

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

##   model    lower    upper interval level
## 1  Holt 44.57441 51.26298 6.688564    95
## 2   SES 43.67873 50.79274 7.114018    95

The above table shows that Holt has both a tighter prediction interval and tracks the trend in the data better.
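The RMSE-based interval asked for in the exercise is just the point forecast plus or minus a normal quantile times the RMSE. The sketch below assumes normal errors; pi_from_rmse() is a hypothetical helper, and the point forecast of 47.24 is inferred as the midpoint of the SES row in the table, not a value reported above.

```r
# Hypothetical helper: normal-theory prediction interval from an RMSE
pi_from_rmse <- function(point, rmse, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  c(lower = point - z * rmse, upper = point + z * rmse)
}

# SES: training RMSE reported above, point forecast inferred from the table
ses_interval <- pi_from_rmse(point = 47.23574, rmse = 1.775801)
ses_interval
unname(ses_interval["upper"] - ses_interval["lower"])  # interval width
```

The width from this calculation is a little narrower than the 7.11 shown for SES in the table, which is the expected direction: RMSE divides by the number of observations, while a degrees-of-freedom-adjusted variance estimate divides by slightly less.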

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

First, let's look at the data.

As we can see, the Box-Cox transformation makes the distribution of the GDP more normal, which bodes well for modeling.
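For intuition about what the transformation does, here is a minimal base-R sketch of the Box-Cox formula for positive data: the log when \(\lambda = 0\), and \((y^\lambda - 1)/\lambda\) otherwise. box_cox_sketch() and gdp_demo are hypothetical; the series is a made-up one growing 10% per step, not the Chinese GDP data.

```r
# Minimal Box-Cox sketch, assuming strictly positive y (illustration only)
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical series growing by 10% per period
gdp_demo <- 100 * 1.1^(0:9)

diff(gdp_demo)                     # increments keep growing on the original scale
diff(box_cox_sketch(gdp_demo, 0))  # constant increments of log(1.1) after the log
```

Exponential growth turns into a straight line on the log scale, which is why an additive trend fits the transformed series more comfortably.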

Using a very large \(\phi\) and long horizon, we can see the effect of damping on our forecasts.
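In a damped trend model the h-step forecast adds \((\phi + \phi^2 + \dots + \phi^h)\) times the last trend estimate to the last level, so \(\phi\) controls how quickly the trend flattens out. The base-R sketch below only evaluates that multiplier; damped_trend_weight() is a hypothetical helper, and the \(\phi\) values are illustrative, not the ones used for the plots.

```r
# Multiplier applied to the last trend estimate at horizon h
damped_trend_weight <- function(phi, h) {
  sum(phi^seq_len(h))
}

horizons <- c(1, 5, 20, 100)

# phi = 0.8: the multiplier levels off near 0.8 / (1 - 0.8) = 4
sapply(horizons, function(h) damped_trend_weight(0.8, h))

# phi = 1: no damping, the multiplier grows linearly with the horizon
sapply(horizons, function(h) damped_trend_weight(1, h))
```

The closer \(\phi\) is to 1, the longer the horizon needed before the flattening becomes visible, which is why a long forecast horizon helps when \(\phi\) is large.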

Let's repeat this on the transformed data.

The trend becomes much more linear once we apply a Box-Cox transformation. This makes sense because the transformation removes the exponential component of the signal. I like to think about these types of transformations as warping the ‘space’ that the graph is plotted on. If you were to draw this GDP graph on a flat piece of paper, you would get the untransformed data, but if you drew it on a bowl, it would look like the transformed data.

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

First, let's look at the data.

Note the expanding variance as time increases. Multiplicative seasonality allows the seasonal variation in our forecasts to grow as the level of the series increases. (Alternatively, you could perform a Box-Cox transformation if you wanted to capture the multiplicative behavior in a different way.)
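A small numerical illustration of the point about growing variation: with additive seasonality the seasonal swing stays fixed, while with multiplicative seasonality it scales with the level. Everything in this base-R sketch (the level, the quarterly indices, the names) is hypothetical and unrelated to the Gas data.

```r
# Hypothetical rising level over three years of quarterly data
level <- seq(50, 250, length.out = 12)
year <- rep(1:3, each = 4)

# Hypothetical quarterly seasonal indices averaging 1
index <- rep(c(0.8, 1.1, 1.3, 0.8), times = 3)

additive <- level + (index - 1) * 50   # fixed seasonal offsets
multiplicative <- level * index        # offsets proportional to the level

# Peak-to-trough size of the seasonal component within each year
seasonal_swing <- function(seasonal_part) {
  tapply(seasonal_part, year, function(v) diff(range(v)))
}

seasonal_swing(additive - level)        # the same swing every year
seasonal_swing(multiplicative - level)  # swing grows as the level rises
```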

Let's make some models.

## # A tibble: 2 x 10
##   .model       .type      ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>        <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Holt_Winters Test  -0.0644  7.52  5.04 0.685  6.13 0.904 0.991 0.656
## 2 HW_damped    Test   0.249   7.44  5.04 1.33   6.11 0.904 0.981 0.652

Both models are fitted to the same series, so their RMSE values are comparable, but because these methods are multiplicative, the scale-free measures such as MAPE (a percentage error) and MASE (a scaled error) are a more natural basis for comparison.

They are virtually identical, so the best practice would be to select the simpler, undamped model.

8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

Why is multiplicative seasonality necessary for this series?

First, let's look at the plot.

autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

I would argue that the seasonal variation in this data may be multiplicative: it seems to grow as time increases. This growth could also simply be tracking the level of the turnover. Further investigation would be needed.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

## # A tibble: 2 x 12
##   .model  State Industry .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr> <chr>    <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Holt_W~ West~ Newspap~ Test  -0.136  4.37  3.01 -1.59   9.90 0.721 0.771 0.677
## 2 HW_dam~ West~ Newspap~ Test   0.237  4.35  2.99 -0.281  9.76 0.718 0.767 0.674

Again, the accuracy metrics are nearly identical, so selecting the simpler undamped model is correct.

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,M)
## Q* = 39.603, df = 8, p-value = 3.798e-06
## 
## Model df: 16.   Total lags used: 24

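The residuals may look like white noise in the plots, but the Ljung-Box test above rejects that hypothesis (p-value = 3.798e-06), so some autocorrelation remains. A Shapiro-Wilk test would only check whether the residuals are normally distributed, not whether they are white noise.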
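For reference, the same kind of test can be run directly with Box.test() from base R's stats package. The sketch below uses simulated residuals, not the model's, and mirrors the settings printed above (24 lags, 16 model degrees of freedom); white_noise and lb are hypothetical names.

```r
# Simulated residuals standing in for the model's residuals (assumption)
set.seed(123)
white_noise <- rnorm(200)

# Ljung-Box test: 24 lags, adjusted for 16 estimated model parameters
lb <- Box.test(white_noise, lag = 24, type = "Ljung-Box", fitdf = 16)

lb$parameter  # degrees of freedom: 24 - 16 = 8, as in the output above
lb$p.value    # a small p-value is evidence against white noise
```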

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

## # A tibble: 2 x 2
##   .model        RMSE
##   <chr>        <dbl>
## 1 Holt_Winters  4.95
## 2 NAIVE        11.4

As we can see from the RMSE, the Holt Winters method dramatically outperforms the Naive method on this dataset.

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Now let's look at the Box-Cox transformed data.

As we can see, the error looks far more normal.

Now let's try our ETS again.

## # A tibble: 2 x 2
##   .model         RMSE
##   <chr>         <dbl>
## 1 Holt_Winters 0.0854
## 2 NAIVE        0.195

RMSE is not a good measure for comparing a transformed and an untransformed dataset because the units of RMSE are the units of the data. A scale-free percentage measure would probably be better.
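The unit dependence is easy to demonstrate. In this base-R sketch, rmse() and mape() are hypothetical helpers and the numbers are made up; multiplying the data by 1000 multiplies the RMSE by 1000 but leaves the MAPE unchanged.

```r
# Hypothetical helpers for the two accuracy measures
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
mape <- function(actual, forecast) mean(abs((actual - forecast) / actual)) * 100

# Made-up actuals and forecasts
actual_demo <- c(20, 25, 30, 28)
forecast_demo <- c(22, 24, 27, 30)

rmse(actual_demo, forecast_demo)
rmse(actual_demo * 1000, forecast_demo * 1000)  # 1000 times larger

mape(actual_demo, forecast_demo)
mape(actual_demo * 1000, forecast_demo * 1000)  # unchanged
```

This invariance holds for a simple change of units. A nonlinear transformation such as Box-Cox changes percentage errors as well, so back-transforming the forecasts before scoring is the safer comparison.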

## # A tibble: 2 x 2
##   Data           MAPE
##   <chr>         <dbl>
## 1 BoxCox         2.89
## 2 Untransformed 10.9

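As we can see, MAPE (mean absolute percentage error), which measures the percentage error per time period, is a better metric when comparing across scales. The comparison is still only approximate, because a percentage error on the Box-Cox scale is not the same quantity as one on the original scale; back-transforming the forecasts before scoring would be fairer.

In this case, the ETS on the transformed data appears to have worked significantly better.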

Jack Wright