Optimal values: alpha = 0.3579, l0 = 95487.5.
library(fpp3)  # loads fable, feasts, tsibble, and tsibbledata (aus_livestock etc.)

avp <- aus_livestock %>% filter(Animal == "Pigs", State == "Victoria")
fit <- avp %>%
  model(
    si = ETS(Count)
  )
r <- fit %>% report()
## Series: Count
## Model: ETS(A,N,A)
## Smoothing parameters:
## alpha = 0.3579401
## gamma = 0.0001000139
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 95487.5 2066.235 6717.311 -2809.562 -1341.299 -7750.173 -8483.418 5610.89
## s[-7] s[-8] s[-9] s[-10] s[-11]
## -579.8107 1215.464 -2827.091 1739.465 6441.989
##
## sigma^2: 60742898
##
## AIC AICc BIC
## 13545.38 13546.26 13610.24
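The estimated level and seasonal states behind these numbers can also be inspected graphically; a minimal sketch using fabletools' components():
# Decompose the fitted ETS model into its states and plot them
components(fit) %>% autoplot()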
f <- fit %>% forecast(h = 4)
f
## # A fable: 4 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean
## <fct> <fct> <chr> <mth> <dist> <dbl>
## 1 Pigs Victoria si 2019 Jan N(84425, 6.1e+07) 84425.
## 2 Pigs Victoria si 2019 Feb N(85158, 6.9e+07) 85158.
## 3 Pigs Victoria si 2019 Mar N(91567, 7.6e+07) 91567.
## 4 Pigs Victoria si 2019 Apr N(90098, 8.4e+07) 90098.
f %>% autoplot(avp %>% filter(Month >= yearmonth("2018 Jan")))
My CI is slightly narrower than R's; both look realistic given the past values.
s <- augment(fit) %>% pull(.resid) %>% sd()  # residual standard deviation
y <- f %>% pull(.mean) %>% head(1)           # first forecast mean
lbound <- y - (1.96 * s)
ubound <- y + (1.96 * s)
print(paste("My 95% CI : ", lbound, " ", ubound))
## [1] "My 95% CI : 69328.2490079102 99521.1652255165"
print(paste("R's 95% CI : ", f %>% pull(Count) %>% head(1)))
## [1] "R's 95% CI : N(84425, 6.1e+07)"
fr <- global_economy %>% filter(Country == "France")
The data is clearly trending upward over time, with fluctuations including a large dip in exports in the 1990s.
autoplot(fr, Exports) +
labs(title = "French Exports by % of GDP",
y = "% of GDP")
fit <- fr %>%
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N"))
  )
r <- fit %>% report()
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.9998995
##
## Initial states:
## l[0]
## 14.3989
##
## sigma^2: 1.3745
##
## AIC AICc BIC
## 257.9200 258.3644 264.1013
f <- fit %>% forecast(h = 10)
f
## # A fable: 10 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 France ANN 2018 N(31, 1.4) 30.9
## 2 France ANN 2019 N(31, 2.7) 30.9
## 3 France ANN 2020 N(31, 4.1) 30.9
## 4 France ANN 2021 N(31, 5.5) 30.9
## 5 France ANN 2022 N(31, 6.9) 30.9
## 6 France ANN 2023 N(31, 8.2) 30.9
## 7 France ANN 2024 N(31, 9.6) 30.9
## 8 France ANN 2025 N(31, 11) 30.9
## 9 France ANN 2026 N(31, 12) 30.9
## 10 France ANN 2027 N(31, 14) 30.9
f %>% autoplot(fr) +
labs(title = "Forecast of French Exports by % of GDP",
y = "% of GDP")
RMSE = 1.152003
fit %>% accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France ANN Training 0.284 1.15 0.840 1.17 3.86 0.983 0.991 -0.00542
The AAN trended model accounts for the trend in the dataset. Since French exports as a percentage of GDP are trending strongly upward, the forecasts of the trended model will trend much more than those of the non-trended model. In this case the RMSE offers a small edge to the trended method. The trended method deserves strong consideration for this dataset because the series has trended strongly in recent years; however, it has not always trended historically, which lends some credence to using a different model. Even so, I think the trended model is the clear choice here.
fit2 <- fr %>%
  model(
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )
fit %>% accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France ANN Training 0.284 1.15 0.840 1.17 3.86 0.983 0.991 -0.00542
fit2 %>% accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France AAN Training -0.0111 1.12 0.800 -0.243 3.81 0.936 0.964 0.0264
f2 <- fit2 %>% forecast(h = 10)
f %>% autoplot(fr) +
  labs(title = "Forecast of French Exports by % of GDP, ANN Model",
       y = "% of GDP")
f2 %>% autoplot(fr) +
  labs(title = "Forecast of French Exports by % of GDP, AAN Model",
       y = "% of GDP")
I think the forecasts from the trended model, especially when graphed, are much more in line with the historical behaviour of French exports as a share of GDP. Considering the RMSE and MAE as well, the trended method is the better model.
Once again my intervals are slightly narrower than R's.
ANNRMSE <- fit %>% accuracy() %>% pull(RMSE)   # training RMSE, ANN model
AANRMSE <- fit2 %>% accuracy() %>% pull(RMSE)  # training RMSE, AAN model
y <- f %>% pull(.mean) %>% head(1)
lbound <- y - (1.96 * ANNRMSE)
ubound <- y + (1.96 * ANNRMSE)
print(paste("My 95% CI ANN: ", lbound, " ", ubound))
## [1] "My 95% CI ANN: 28.6238549283032 33.1397076931717"
print(paste("R's 95% CI ANN: ",f%>%pull(Exports)%>%head(1)))
## [1] "R's 95% CI ANN: N(31, 1.4)"
y <- f2 %>% pull(.mean) %>% head(1)
lbound <- y - (1.96 * AANRMSE)
ubound <- y + (1.96 * AANRMSE)
print(paste("My 95% CI AAN: ", lbound, " ", ubound))
## [1] "My 95% CI AAN: 28.9789600391914 33.3692025588516"
print(paste("R's 95% CI AAN: ",f%>%pull(Exports)%>%head(1)))
## [1] "R's 95% CI AAN: N(31, 1.4)"
We can see that Box-Cox transformed data tends to cause the forecasts to trend slightly upward compared to the untransformed series. Given the rapid increase in Chinese GDP, a Box-Cox trended model is likely the best choice.
ch <- global_economy %>% filter(Country == "China")
lambda <- ch %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
fit <- ch %>%
  model(
    ANN = ETS(GDP ~ error("A") + trend("N") + season("N")),
    AAN = ETS(GDP ~ error("A") + trend("A") + season("N")),
    Damped = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    bc_ANN = ETS(box_cox(GDP, lambda) ~ error("A") + trend("N") + season("N")),
    bc_AAN = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N")),
    bc_Damped = ETS(box_cox(GDP, lambda) ~ error("A") + trend("Ad") + season("N"))
  )
fit %>% glance()  # report() summarizes only a single model; glance() compares all six
f <- fit %>% forecast(h = 10)
f %>% autoplot(ch) +
labs(title = "Forecasts of Chinese GDP",
y = "GDP")
It is obvious that seasonality is needed in some form here due to the fluctuations. Multiplicative seasonality is preferred because the range of the seasonal peaks increases as time goes on. Adding damping makes the models slightly more accurate. Interestingly, damping the additive model produces the lowest RMSE, but given what we know about the growing seasonal range, the multiplicative models should be more reliable, which makes multiplicative_damp the best model.
fit <- aus_production %>%
model(
additive = ETS(Gas ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Gas ~ error("M") + trend("A") +
season("M")),
additive_damp = ETS(Gas ~ error("A") + trend("Ad") +
season("A")),
multiplicative_damp = ETS(Gas ~ error("M") + trend("Ad") +
season("M"))
)
fc <- fit %>% forecast(h = 15)
fc %>%
autoplot(aus_production, level = NULL) +
labs(title="Australian Gas Production",
y="Gas Production in petajoules") +
guides(colour = guide_legend(title = "Forecast"))
fit %>% accuracy()
## # A tibble: 4 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 additive Traini… 0.00525 4.76 3.35 -4.69 10.9 0.600 0.628 0.0772
## 2 multiplicative Traini… -0.115 4.60 3.02 0.199 4.08 0.542 0.606 -0.0131
## 3 additive_damp Traini… 0.600 4.58 3.16 0.460 7.39 0.566 0.604 0.0660
## 4 multiplicative_… Traini… -0.00439 4.59 3.03 0.326 4.10 0.544 0.606 -0.0217
For the retail series, multiplicative seasonality is again necessary because the range of the seasonal peaks increases as time goes on.
set.seed(19865)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(myseries, Turnover) +
labs(title = "Retail Turnover",
y = "$Million AUD")
fit <- myseries %>%
model(
additive = ETS(Turnover ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
additive_damp = ETS(Turnover ~ error("A") + trend("Ad") +
season("A")),
multiplicative_damp = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
)
fc <- fit %>% forecast(h = 15)
fc %>%
autoplot(myseries, level = NULL) +
labs(title="Australian Retail Turnover in $Million AUD",
y="$Million AUD") +
guides(colour = guide_legend(title = "Forecast"))
fit %>% accuracy()
## # A tibble: 4 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South … Other ret… additi… Trai… -0.0871 6.10 4.52 -0.482 3.98 0.546 0.569
## 2 South … Other ret… multip… Trai… 0.152 5.45 3.80 -0.0486 2.82 0.458 0.508
## 3 South … Other ret… additi… Trai… 0.858 6.26 4.65 0.840 4.16 0.561 0.584
## 4 South … Other ret… multip… Trai… 0.482 5.48 3.82 0.325 2.84 0.461 0.511
## # … with 1 more variable: ACF1 <dbl>
The RMSE is clearly better for the multiplicative models than for the additive ones.
fit <- myseries %>%
  model(
    multiplicative_damp = ETS(Turnover ~ error("M") + trend("Ad") +
                                season("M"))
  )
fit %>% gg_tsresiduals()
The ACF plot shows several significant spikes, indicating the residuals are not white noise; a Ljung-Box test (sketched below) quantifies this.
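A minimal sketch using the feasts ljung_box feature, assuming lag = 24 (2m for monthly data):
# Small p-value => residuals are distinguishable from white noise
augment(fit) %>% features(.innov, ljung_box, lag = 24)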
Based on RMSE, these models beat the seasonal naive approach from Chapter 5.
train <- myseries %>% filter(year(Month) <= 2010)
test <- myseries %>% filter(year(Month) > 2010)
fit <- train %>%
model(
additive = ETS(Turnover ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
additive_damp = ETS(Turnover ~ error("A") + trend("Ad") +
season("A")),
multiplicative_damp = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
)
f <- fit %>% forecast(test)
f %>%
autoplot(myseries, level = NULL) +
labs(title="Australian Retail Turnover in $Million AUD",
y="$Million AUD") +
guides(colour = guide_legend(title = "Forecast"))
fit2 <- train %>%
model(SNAIVE(Turnover))
fc <- fit2 %>%
forecast(new_data = anti_join(myseries, train))
fc %>% autoplot(myseries)
fit %>% accuracy()
## # A tibble: 4 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South … Other ret… additi… Trai… -0.0223 5.36 3.98 -0.497 4.07 0.531 0.564
## 2 South … Other ret… multip… Trai… 0.157 4.09 3.05 -0.0938 2.86 0.407 0.431
## 3 South … Other ret… additi… Trai… 0.701 5.55 4.08 0.516 4.16 0.546 0.584
## 4 South … Other ret… multip… Trai… 0.424 4.15 3.10 0.284 2.90 0.413 0.437
## # … with 1 more variable: ACF1 <dbl>
fit2 %>% accuracy()
## # A tibble: 1 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South… Other re… SNAIVE… Trai… 6.31 9.50 7.49 5.84 6.97 1 1 0.605
fc %>% accuracy(myseries)
## # A tibble: 1 x 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(… South… Other r… Test 21.8 33.6 25.6 8.21 10.1 3.42 3.53 0.907
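Note that the ETS table above is training accuracy while the SNAIVE row just shown is test accuracy; for a like-for-like comparison, the ETS test accuracy can be computed from the forecasts already stored in f (a sketch):
# Test-set accuracy for the four ETS models, comparable to the SNAIVE test row
f %>% accuracy(myseries)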
The STL decomposition with a Box-Cox transformation is the most accurate on the training set.
lambda <- train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
fit <- train %>%
  model(
    STL_BC = STL(box_cox(Turnover, lambda)),
    multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                           season("M")),
    ETS_BC = ETS(box_cox(Turnover, lambda) ~ error("M") + trend("A") +
                   season("M"))
  )
fit %>% accuracy()
## # A tibble: 3 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South … Other re… STL_BC Trai… 0.0869 3.36 2.41 0.00184 2.22 0.321 0.353
## 2 South … Other re… multip… Trai… 0.157 4.09 3.05 -0.0938 2.86 0.407 0.431
## 3 South … Other re… ETS_BC Trai… -0.480 4.77 3.23 -0.341 2.92 0.431 0.503
## # … with 1 more variable: ACF1 <dbl>
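One caveat: STL() on its own only decomposes the series and, to my knowledge, cannot generate forecasts directly. To forecast with it, fabletools' decomposition_model() can wrap the decomposition with a model for the seasonally adjusted series; a sketch, using ETS for season_adjust (the seasonal components default to SNAIVE):
# Combine the STL decomposition with an ETS model on the seasonally
# adjusted series so the decomposition can actually be forecast
fit_dcmp <- train %>%
  model(
    STL_BC_fc = decomposition_model(
      STL(box_cox(Turnover, lambda)),
      ETS(season_adjust)
    )
  )
fit_dcmp %>% forecast(test) %>% accuracy(myseries)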