Data 624 HW 5

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(fpp3)

## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --

## v lubridate   1.7.10         v feasts      0.2.2     
## v tsibble     1.0.1.9999     v fable       0.3.1     
## v tsibbledata 0.3.0.9000

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x tsibble::intersect()     masks base::intersect()
## x tsibble::interval()      masks lubridate::interval()
## x dplyr::lag()             masks stats::lag()
## x tsibble::setdiff()       masks base::setdiff()
## x tsibble::union()         masks base::union()

library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

vicPig <- aus_livestock %>% filter(State=='Victoria' & Animal == 'Pigs' )%>% fill_gaps(.full=TRUE)

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of
α and
ℓ 0 , and generate forecasts for the next four months. Compute a 95% prediction interval for the first forecast using
^ y ± 1.96 s where
s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

fit<-vicPig %>% model(ETS(Count ~ error("A") + trend("N") + season("N")))

fit%>%report

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

pigSigma<-sqrt(fit[[3]][[1]]$fit$fit[[1]])

print(paste("Sigma ", pigSigma))

## [1] "Sigma  9353.11499902547"

fc<- fit%>% forecast(h=4) 

fc%>% autoplot(vicPig)

We see an alpha of .3221247 and an l of 100647 sigma of 9353 Our CI

fc$Count[[1]][[1]] + c(-1,1) * 1.96 * pigSigma

## [1]  76854.45 113518.66

R model

hilo(fc$Count)

## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95

They are quite close given our imprecise distribution.

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

set.seed(1072021)
ourCountry <- global_economy %>% filter(Code == sample(global_economy$Code,1))

Our country is France.

Plot the Exports series and discuss the main features of the data.

ourCountry%>% autoplot(vars(Exports))

We see the tail end of the Trente Glorieuses (the thirty post war years renowned for their growth), but it was obviously not an export based boom. The data starts from 1960 on a downbeat, with a massive increase in 1967. The trend is upwards until, there is a sharp drop post 1983, followed by an even sharper recovery. While it can on the whole actually be reasonably accurately modeled with a simple drift trend line, there are substantial variations.

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit <-ourCountry  %>% filter(Year<2000)%>% model(ETS(Exports ~ error("A") + trend("N") + season("N")))

fc<- fit%>% forecast(h=10) 

fc%>% autoplot(ourCountry%>%filter(Year<2000) )

Compute the RMSE values for the training data.

fc%>% accuracy(ourCountry)

## # A tibble: 1 x 11
##   .model          Country .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>           <fct>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 "ETS(Exports ~~ France  Test   1.20  1.63  1.45  4.23  5.23  1.80  1.49 0.0852

1.632346

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.)

fit <- ourCountry  %>% filter(Year<2000)%>%
  model(
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )
fc <- fit %>% forecast(h = 10)


fc%>% autoplot(ourCountry %>% filter(Year<2000))

fc%>% accuracy(ourCountry)

## # A tibble: 1 x 11
##   .model Country .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <fct>   <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAN    France  Test  -0.414  1.70  1.29 -1.75  4.86  1.61  1.55 0.290

1.704769

Discuss the merits of the two forecasting methods for this data set.

Somewhat surprisingly the non trend model is ever so slightly better

Compare the forecasts from both methods. Which do you think is best?

That said the trended model is probably better over more cut points, given the underlying data does have a trend

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

chinaCountry <- global_economy %>% filter(Country=='China')

fit<- chinaCountry %>% model(
    nonDamped = ETS(GDP ~ error("A") + trend("A") +
                                                season("N")),
    damped = ETS(GDP ~ error("M") + trend("Ad") +
                                                season("N")),
  )
fc<- fit%>% forecast(h=20) 

fc%>% autoplot(chinaCountry)  +
  guides(colour = guide_legend(title = "Forecast"))

Given a very large predictor we see that damped is very conservative, almost laughably so as far as the CI goes (which would have the GDP be massively negative). ## 8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

fit<- aus_production %>% model(
    nonSeason = ETS(Gas ~ error("A") + trend("A") +
                                                season("N")),
    multiplicative = ETS(Gas ~ error("M") + trend("A") +
                                                season("M"))
  )
fc<- fit%>% forecast(h=20) 

fc%>% autoplot(aus_production)  +
  guides(colour = guide_legend(title = "Forecast"))

fit<- aus_production %>% model(
    nonDamped = ETS(Gas ~ error("A") + trend("A") +
                                                season("M")),
    multiplicative = ETS(Gas ~ error("M") + trend("Ad") +
                                                season("M"))
  )
fc<- fit%>% forecast(h=20) 

fc%>% autoplot(aus_production)  +
  guides(colour = guide_legend(title = "Forecast")) + labs(title='Damped')

## 8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

set.seed(12345678-4321)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseriestrain <- myseries %>% filter(year(Month) < 2015)

Why is multiplicative seasonality necessary for this series?

The seasonality is inconstant. Were the time period shorter it would be less of an issue, but the seasonal variation eventually exceeds the absolute entire market value of 40 years ago.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit<- myseriestrain %>% model(
    nonDamped = ETS(Turnover ~ error("A") + trend("A") +
                                                season("M")),
    multiplicative = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))
  )
fc<- fit%>% forecast(h=20) 

fc%>% autoplot(myseriestrain)

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

fc%>% accuracy(myseries)

## # A tibble: 2 x 12
##   .model  State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>   <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 multip~ New S~ Clothin~ Test   24.2  29.8  24.2  5.00  5.00  1.31  1.23 0.105 
## 2 nonDam~ New S~ Clothin~ Test   22.7  28.5  22.8  4.68  4.70  1.23  1.17 0.0269

The non damped seems to have a better RMSE/MAE. It seems reasonable to use the non damped one.

Check that the residuals from the best method look like white noise.

checkresiduals(fit$nonDamped[[1]]$fit)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Residuals look mostly random, though there are some lags.

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

myseriestrain <- myseries %>% filter(year(Month) < 2011)
fit<- myseriestrain %>% model(
    nonDamped = ETS(Turnover ~ error("A") + trend("A") +
                                                season("M")),
    SeasonNaive = SNAIVE(Turnover)  )
fc<- fit%>% forecast(h=20) 
fc%>%autoplot(myseriestrain)

fc%>%accuracy(myseries)

## # A tibble: 2 x 12
##   .model  State  Industry  .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr>  <chr>     <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 nonDam~ New S~ Clothing~ Test  -8.39  15.2  11.1 -2.61  3.40 0.671 0.718 0.656
## 2 Season~ New S~ Clothing~ Test   6.47  19.2  16.8  1.91  5.18 1.01  0.906 0.784

Yes, it is handily beaten ## 8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

myseriestraints <- myseries %>% filter(year(Month) < 2011) %>% select(Turnover) %>% as.ts
l <- BoxCox.lambda(myseriestraints)

fit<-  myseriestraints %>% stlm( lambda=l )
fc<- fit%>% forecast(h=20) 
fc%>%autoplot()

fc%>%accuracy(myseries%>%select(Turnover)%>%as.ts)

##                        ME     RMSE       MAE        MPE     MAPE      MASE
## Training set  -0.03884369 10.72156  7.928534 -0.1239748 3.831106 0.4779511
## Test set     -13.66822722 17.96061 13.935265 -4.1362662 4.219689 0.8400513
##                    ACF1 Theil's U
## Training set 0.06826631        NA
## Test set     0.55564333 0.2934094

In this case it did improve the RMSE/MAE, and so probably should be used.

Data 624 HW 5

Scott Reed

10/7/2021

8.1

8.5

8.6