Chapter 9, Hyndman and Athanasopoulos

Q 9.1

9.1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  1. Explain the differences among these figures. Do they all indicate that the data are white noise?

The primary difference among the three figures is the length of each series (36, 360 and 1,000 observations), as described in the caption. All three series are consistent with white noise, since none of the autocorrelations fall outside the 95% critical limits.

library(jpeg)   # readJPEG() comes from the jpeg package

# display the textbook figure as a raster image
F9.32 <- readJPEG("Figure_9.32.jpg", native = TRUE)
plot(0:1, 0:1, type = "n", ann = FALSE, axes = FALSE)
rasterImage(F9.32, 0, 0, 1, 1)

  2. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values (blue dotted lines) are at different distances from the mean of zero because they are calculated as \(\pm 2/\sqrt{T}\), so as \(T\) increases the critical bounds shrink. For example, \(\pm 2/\sqrt{36} = \pm 0.3333\), \(\pm 2/\sqrt{360} = \pm 0.1054\) and \(\pm 2/\sqrt{1000} = \pm 0.0632\).
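
As a quick check of the values quoted above, the bounds can be computed directly in R:

# 95% critical bounds for a white-noise ACF are +/- 2/sqrt(T)
round(2 / sqrt(c(36, 360, 1000)), 4)   # gives 0.3333, 0.1054, 0.0632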

The autocorrelations themselves differ because each figure is computed from a different random sample, even though all three series are white noise. For white noise we expect roughly 95% of the spikes to lie within the critical bounds; if noticeably more than 5% fall outside, the series is probably not white noise. As \(T\) grows, the sample autocorrelations shrink towards their true value of zero, because their standard error is approximately \(1/\sqrt{T}\).

Q 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The closing price of Amazon stock rose steadily from 2014, peaked in the third quarter of 2018, then fell sharply. This pronounced trend (upward to 2018Q3, then downward in 2018Q4) means the series has no stable mean, so it is non-stationary and should be differenced.

library(fpp3)   # loads tsibble, feasts, fable and the textbook datasets

Amzn <- gafa_stock %>%
  filter(Symbol == "AMZN")
 
(Amzn %>%
  autoplot(Close) +
  labs(y = "Close Price", title = "Amazon stock daily closing prices 2014-2018"))

The ACF plot shows a slow, smooth decline, with every spike well above the critical bounds. This is the classic signature of a trending series whose values do not fluctuate around a fixed mean, so the series is non-stationary and should be differenced.

Amzn %>% 
  ACF(Close) %>%
    autoplot() +
    labs(y = "ACF", title = "ACF Amazon close price")

The PACF plot shows a first lag very close to one and well above the critical bounds, indicating that each value depends strongly on the previous one; again, the series is non-stationary and should be differenced.

Amzn %>% 
  PACF(Close) %>%
    autoplot() +
    labs(y = "PACF", title = "PACF Amazon close price")

Q 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

9.3.a

Turkish GDP from global_economy.

Looking at the raw series first, we see a slowly rising trend with no apparent seasonality, followed by a more rapidly rising trend after 2000.

Tk_GDP <- global_economy %>%
  filter(Country == "Turkey")

Tk_GDP %>%
  autoplot(GDP) +
  labs(y = "GDP", title = "Turkey GDP 1960 - 2017")

Finding an appropriate Box-Cox transformation to stabilise the series, the lambda of 0.1572 is fairly close to zero, so the transformation behaves much like a log; applying it makes the growth much closer to linear.

lambda_Tk <- Tk_GDP %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

Tk_GDP %>%
autoplot(box_cox(GDP,lambda_Tk)) +
  labs(y = "",
       title = paste0("Transformed Turkish GDP with lambda = ",round(lambda_Tk,4)))

To find the right amount of differencing needed to make the data stationary (i.e. remove the trend), a unit root test is helpful. Let’s try the KPSS test. The test statistic of 1.2231 with a reported p-value of 0.01 (the smallest value the test reports) means we reject the null hypothesis of stationarity, so the data are non-stationary and need to be differenced.

Tk_GDP %>% features(GDP, unitroot_kpss)
## # A tibble: 1 x 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey       1.22        0.01

Differencing the data once and running the KPSS test again on the differenced series, the test statistic falls to 0.3856 with a p-value of 0.0834. At the 10% level we would still reject stationarity, so we difference again.

Tk_GDP %>% mutate(diff_GDP = difference(GDP)) %>% features(diff_GDP, unitroot_kpss)
## # A tibble: 1 x 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey      0.386      0.0834

With a second difference, the test statistic drops to 0.0422 and the p-value reaches 0.1 (the largest value the test reports), indicating that the twice-differenced data can be treated as stationary.

Tk_GDP %>% mutate(diff_GDP = difference(difference(GDP))) %>% features(diff_GDP, unitroot_kpss)
## # A tibble: 1 x 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey     0.0422         0.1

Plotting the twice-differenced (and Box-Cox transformed) data shows it fluctuating around a mean of roughly zero. Note that the KPSS tests above were run on the untransformed GDP; applied to the Box-Cox transformed series, a single difference may be sufficient.

Tk_GDP %>%
  mutate(diff_diff_GDP =  difference(difference(box_cox(GDP,lambda_Tk)))) %>%
  autoplot(diff_diff_GDP) +
  labs(y = "",
       title = paste0("Twice differenced Turkish GDP with Box-Cox lambda transform"))

9.3.b

Accommodation takings in the state of Tasmania from aus_accommodation.

Looking at the raw series data first, we see both a rising trend and seasonality with increasing variance.

Tas_accom <- aus_accommodation %>%
  filter(State == "Tasmania")

Tas_accom %>%
  autoplot(Takings) +
  labs(y = "Takings", title = "Tasmanian takings 1998 - 2016")

Finding an appropriate Box-Cox transformation, the lambda of -0.0488 is close to zero, so it is essentially a log transformation; applying it makes the seasonal variance roughly constant across the series.

lambda_Tas <- Tas_accom %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

Tas_accom %>%
autoplot(box_cox(Takings,lambda_Tas)) +
  labs(y = "",
       title = paste0("Transformed Tasmanian takings with lambda = ",round(lambda_Tas,4)))

Given that we have seasonal data, it is more appropriate to start with unitroot_nsdiffs(), which determines the number of seasonal differences required, rather than going straight to the KPSS test. The test returns nsdiffs == 1, indicating that one seasonal difference is required.

Tas_accom %>%
  mutate(Tas_takings = log(Takings)) %>%
  features(Tas_takings, unitroot_nsdiffs)
## # A tibble: 1 x 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

unitroot_ndiffs() then tests whether any further first differencing is needed; here it returns ndiffs == 0, so no non-seasonal differencing is required. (Strictly, this test should be applied to the seasonally differenced series, e.g. difference(log(Takings), 4) for quarterly data; the lag-1 difference used below is an approximation.)

Tas_accom %>%
  mutate(Tas_diff_takings = difference(log(Takings))) %>%
  features(Tas_diff_takings, unitroot_ndiffs)
## # A tibble: 1 x 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

Applying the seasonal difference to the log-transformed data, with no further non-seasonal differencing, the series now appears stationary, varying around a roughly constant mean.

Tas_accom %>%
  mutate(Tas_diff_takings = difference(log(Takings), 4)) %>%   # lag 4 = one seasonal difference for quarterly data
  autoplot(Tas_diff_takings) +
  labs(y = "",
       title = "Seasonally differenced Tasmanian takings with log transform")

9.3.c

Monthly sales from souvenirs.

Looking at the raw series data first, we see a rising trend and seasonality with increasing variance.

Souvenir <- souvenirs

Souvenir %>%
  autoplot(Sales) +
  labs(y = "AUS $", title = "Monthly sales of souvenirs, Queensland shop 1987 - 1993")

Finding an appropriate Box-Cox transformation, the lambda of 0.0021 is essentially zero, i.e. a log transformation; applying it makes the seasonal variance roughly constant.

lambda_Souv <- Souvenir %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

Souvenir %>%
autoplot(box_cox(Sales,lambda_Souv)) +
  labs(y = "",
       title = paste0("Transformed souvenir sales with lambda = ",round(lambda_Souv,4)))

Given that we again have seasonal data, unitroot_nsdiffs() is the appropriate starting point. The test again returns nsdiffs == 1, indicating that one seasonal difference is required.

Souvenir %>%
  mutate(Souv_sales = log(Sales)) %>%
  features(Souv_sales, unitroot_nsdiffs)
## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       1

unitroot_ndiffs() then tests for additional non-seasonal differencing and again returns ndiffs == 0, so no further differencing is required. (As before, the test is strictly meant for the seasonally differenced series, e.g. difference(log(Sales), 12) for monthly data.)

Souvenir %>%
  mutate(Souv_diff_sales = difference(log(Sales))) %>%
  features(Souv_diff_sales, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      0

Applying the seasonal difference to the log-transformed data, with no further non-seasonal differencing, the series now appears stationary, varying around a roughly constant mean.

Souvenir %>%
  mutate(Souv_diff_sales = difference(log(Sales), 12)) %>%   # lag 12 = one seasonal difference for monthly data
  autoplot(Souv_diff_sales) +
  labs(y = "",
       title = "Seasonally differenced souvenir sales with log transform")

Q 9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

This again refers to the aus_retail dataset of retail turnover from 1982 to 2019. As a refresher, the original series is shown in the plot below. There is trend and seasonality with increasing variance, so both a transformation and differencing are likely to be useful.

set.seed(54321)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries,Turnover) +
  labs(title = "South Australia retail turnover by Month")

Finding an appropriate Box-Cox transformation, the lambda of -0.1028 is close to zero, so it behaves much like a log transformation; applying it makes the seasonal variance at least somewhat more consistent throughout the series.

lambda_ret <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
autoplot(box_cox(Turnover,lambda_ret)) +
  labs(y = "",
       title = paste0("Transformed South Australia retail data with lambda = ",round(lambda_ret,4)))

Given that we again have seasonal data, unitroot_nsdiffs() is the appropriate test. It returns nsdiffs == 1, indicating that one seasonal difference is required.

myseries %>%
  mutate(Ret_turnover = box_cox(Turnover,lambda_ret)) %>%
  features(Ret_turnover, unitroot_nsdiffs)
## # A tibble: 1 x 3
##   State           Industry           nsdiffs
##   <chr>           <chr>                <int>
## 1 South Australia Clothing retailing       1

unitroot_ndiffs() then tests for additional non-seasonal differencing and returns ndiffs == 0, so no further differencing is required. (Again, the test is strictly meant for the seasonally differenced series, e.g. difference(box_cox(Turnover, lambda_ret), 12).)

myseries %>%
  mutate(Ret_diff_turnover = difference(box_cox(Turnover,lambda_ret))) %>%
  features(Ret_diff_turnover, unitroot_ndiffs)
## # A tibble: 1 x 3
##   State           Industry           ndiffs
##   <chr>           <chr>               <int>
## 1 South Australia Clothing retailing      0

Applying the seasonal difference to the Box-Cox transformed data, with no further non-seasonal differencing, the series now appears stationary, varying around a roughly constant mean.

myseries %>%
  mutate(Ret_diff_turnover = difference(box_cox(Turnover, lambda_ret), 12)) %>%   # lag 12 = one seasonal difference for monthly data
  autoplot(Ret_diff_turnover) +
  labs(y = "",
       title = "Seasonally differenced South Australia retail turnover, Box-Cox transformed")

Q 9.6

Simulate and plot some data from simple ARIMA models.

9.6.a

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\)

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
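
The same kind of series can be generated with base R’s arima.sim() as a cross-check (a sketch; arima.sim() uses its own burn-in, so it will not reproduce exactly the same numbers as the loop above):

# equivalent AR(1) simulation using stats::arima.sim()
set.seed(13579)
y_alt <- arima.sim(model = list(ar = 0.6), n = 100, sd = 1)
sim_alt <- tsibble(idx = seq_len(100), y = as.numeric(y_alt), index = idx)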

9.6.b

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

sim %>% autoplot(y) + labs(title = "AR(1) model time series plot of random sim with phi = 0.6")

Changing \(\phi_1 = 1.0\): with \(\phi_1 = 1\) and no constant (c = 0, as in the code below), the process becomes a random walk; the series wanders without reverting to a mean and its variance grows over time. (With \(c \ne 0\) it would be a random walk with drift.)

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim1 %>% autoplot(y) + labs(title = "AR(1) model time series plot of random sim with phi = 1.0 (random walk, no constant)")

Changing \(\phi_1 = -1.0\): with \(\phi_1 = -1\) the series oscillates rapidly between positive and negative values around zero. Note that \(|\phi_1| = 1\) sits on the boundary of the stationarity condition \(-1 < \phi_1 < 1\), so strictly the process is not stationary (its variance grows with time).

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -1.0*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim2 %>% autoplot(y) + labs(title = "AR(1) model time series plot of random sim with phi = -1.0 (oscillating, no constant)")
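
To see the effect of \(\phi_1\) more systematically, a small sketch that simulates one path per value of \(\phi_1\) and facets the plots (the helper sim_ar1 and the chosen values are illustrative, and the tidyverse packages attached by fpp3 are assumed):

# hypothetical helper: simulate one AR(1) path of length n for a given phi
sim_ar1 <- function(phi, n = 100, seed = 13579) {
  set.seed(seed)
  e <- rnorm(n)
  y <- numeric(n)
  for (i in 2:n) y[i] <- phi * y[i - 1] + e[i]
  tibble(phi = phi, idx = seq_len(n), y = y)
}

# one panel per value of phi; phi closer to 1 gives smoother, more persistent paths
bind_rows(lapply(c(-0.8, 0, 0.3, 0.6, 0.9), sim_ar1)) %>%
  ggplot(aes(idx, y)) +
  geom_line() +
  facet_wrap(~ phi, scales = "free_y", labeller = label_both) +
  labs(title = "AR(1) simulations for several values of phi")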

9.6.c

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

Modifying the AR(1) code by replacing the lagged value \(y_{t-1}\) with the lagged error \(e_{t-1}\) (and \(\phi_1\) with \(\theta_1\)) gives code for an MA(1) model:

set.seed(24680)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim_theta <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_theta %>% autoplot(y) + labs(title = "MA(1) model time series plot of random sim with theta = 0.6")

9.6.d

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

Changing \(\theta_1 = 1.0\): with \(\theta_1 = 1\) the previous error receives the same weight as the current one, raising the lag-1 autocorrelation to 0.5 and the variance to \((1+\theta_1^2)\sigma^2 = 2\sigma^2\), so the series looks smoother and more variable. Note that \(|\theta_1| = 1\) lies on the boundary of the invertibility condition \(-1 < \theta_1 < 1\).

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*e[i-1] + e[i]
sim_theta1 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_theta1 %>% autoplot(y) + labs(title = "MA(1) model time series plot of random sim with theta = 1.0")

Changing \(\theta_1 = -1.0\): with \(\theta_1 = -1\) consecutive errors are subtracted, giving a lag-1 autocorrelation of \(-0.5\), so the series is choppier and changes direction more often. Again \(|\theta_1| = 1\) means the process is not invertible (invertibility requires \(-1 < \theta_1 < 1\)).

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -1.0*e[i-1] + e[i]
sim_theta2 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_theta2 %>% autoplot(y) + labs(title = "MA(1) model time series plot of random sim with theta = -1.0")
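
For an MA(1) process the theoretical lag-1 autocorrelation is \(\theta_1/(1 + \theta_1^2)\), so the three values used above imply lag-1 correlations of roughly 0.44, 0.5 and -0.5. A quick sketch comparing theory with the sample estimate for the \(\theta_1 = 0.6\) simulation (ma1_acf1 is just an illustrative helper):

# theoretical lag-1 autocorrelation of an MA(1) process
ma1_acf1 <- function(theta) theta / (1 + theta^2)
ma1_acf1(c(0.6, 1.0, -1.0))   # roughly 0.44, 0.50, -0.50

# sample lag-1 autocorrelation of the theta = 0.6 simulation for comparison
sim_theta %>% ACF(y, lag_max = 1)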

9.6.e

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

An ARMA(1,1) model includes features of both AR(1) and MA(1) models.

set.seed(15793)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
sim_ARMA <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_ARMA %>% autoplot(y) + labs(title = "ARMA(1,1) model time series plot of random sim with phi = 0.6 and theta = 0.6")

9.6.f

Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

An AR(2) model with these parameters violates the stationarity conditions, so the simulated series is non-stationary: it oscillates with exponentially growing amplitude, as shown below.

set.seed(17935)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)   # the AR(2) recursion needs two previous values, so start at i = 3 (y[1] and y[2] remain 0)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim_AR2 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_AR2 %>% autoplot(y) + labs(title = "AR(2) model time series plot of random sim with phi_1 = -0.8 and phi_2 = 0.3")

9.6.g

Graph the latter two series and compare them.

Although each series is graphed above, plotting them together is instructive: the AR(2) series grows so quickly that the ARMA(1,1) series is flattened into what looks like a straight line at this scale.

sim_ARMA %>% 
  autoplot(y, series = "ARMA(1,1)", color = "red") +
  autolayer(sim_AR2, series = "AR(2)", color = "blue")
## Plot variable not specified, automatically selected `.vars = y`

Q 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Here is the original Australian air carriers time series.

Aus_air <- aus_airpassengers
 
(Aus_air %>% autoplot(Passengers) + labs(y = "Passengers (M)", title = "Australian air passengers 1970-2011"))

There appears to be both non-linearity and non-stationarity, so a log transformation followed by differencing should help. Checking the need for differencing on the raw series, unitroot_ndiffs() indicates that two differences are needed.

Aus_air %>% features(Passengers, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      2

But first let’s try a log transformation, which seems to make the series look more linear.

Aus_air2 <- aus_airpassengers %>%
  mutate(Passengers = log(Passengers))
 
(Aus_air2 %>% autoplot(Passengers) + labs(y = "Passengers (M)", title = "Australian air passengers 1970-2011 with log transform"))

Checking the need for differencing on the log transformed data, there is still a need to difference it once.

Aus_air2 %>% features(Passengers, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      1

After differencing the log-transformed series once, it looks stationary.

Aus_air3 <- aus_airpassengers %>%
  mutate(Passengers = difference(log(Passengers)))
 
Aus_air3 %>% gg_tsdisplay(Passengers)

Checking the need for differencing on the once differenced log transformed data, there is no longer a need to difference it any further, since it is now stationary.

Aus_air3 %>% features(Passengers, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      0

9.7.a

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

The ARIMA() function when used to fit an optimal model on the series returns the model ARIMA(0,2,1), which indicates no autoregression, 2 degrees of differencing, and one moving average component.

fit_ARIMA <- Aus_air %>%
  model(ARIMA(Passengers))

report(fit_ARIMA)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

Checking the residuals of the fitted model to verify that they behave like white noise: the residual ACF shows no significant spikes, so the residuals are consistent with white noise.

fit_ARIMA %>% gg_tsresiduals()   # residual time plot, ACF and histogram for the fitted ARIMA(0,2,1)

Forecasting the next 10 periods.

fit_ARIMA %>% 
  forecast(h = 10) %>%
  autoplot(Aus_air) +
  labs(y = "Passengers (M)", title = "Australian air passengers 1970-2011 10-yr forecast ARIMA(0,2,1)")

9.7.b

Write the model in terms of the backshift operator.

\((1 - B)^2 y_t = (1 - 0.8963B)\epsilon_t\)
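
Expanding the backshift polynomial gives the equivalent difference-equation form:

\[
(1 - B)^2 y_t = (1 - 0.8963B)\epsilon_t
\quad\Longrightarrow\quad
y_t = 2y_{t-1} - y_{t-2} + \epsilon_t - 0.8963\,\epsilon_{t-1}
\]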

9.7.c

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

Based on the y-axis scale, the ARIMA(0,1,0) with drift forecasts (which grow linearly at the historical average rate) sit slightly below the ARIMA(0,2,1) forecasts from part (a).

fitARIMA010 <- Aus_air %>%
  model(arima010 = ARIMA(Passengers ~ 1 + pdq(0,1,0)))   # '1 +' makes the drift term explicit

fitARIMA010 %>% 
  forecast(h = 10) %>%
  autoplot(Aus_air) +
  labs(y = "Passengers (M)", title = "Australian air passengers 1970-2011 10-yr forecast ARIMA(0,1,0)")

9.7.d

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

Fitting an ARIMA(2,1,2) model with drift as below returns a NULL model. Note that include.constant is an argument of forecast::Arima(), not of fable's ARIMA(); in fable the constant is specified in the model formula, which is likely why the specification fails here (the intended syntax is sketched below).

fitARIMA212c <- Aus_air %>%
  model(arima212 = ARIMA(Passengers ~ pdq(2, 1, 2), include.constant = TRUE))

report(fitARIMA212c)
## Series: Passengers 
## Model: NULL model 
## NULL model

Removing the constant via include.constant = FALSE also returns a NULL model, again most likely because the argument is not recognised by fable's ARIMA().

fitARIMA212nc <- Aus_air %>%
  model(arima212 = ARIMA(Passengers ~ pdq(2,1,2), include.constant = FALSE))

report(fitARIMA212nc)
## Series: Passengers 
## Model: NULL model 
## NULL model
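
In fable, the constant is controlled through the model formula rather than an include.constant argument (which belongs to the forecast package); a sketch of the intended specifications (output not shown):

# '1 +' forces the drift/constant term, '0 +' excludes it
Aus_air %>%
  model(
    arima212_drift   = ARIMA(Passengers ~ 1 + pdq(2, 1, 2)),
    arima212_nodrift = ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
  )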

9.7.e

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

Asking for a constant in an ARIMA(0,2,1) model returns a NULL model here. Two things are at play: include.constant is not a fable argument, and even with the correct formula syntax fable discourages a constant when d = 2, because it would induce a quadratic trend in the long-run forecasts (see the sketch below).

fitARIMA021 <- Aus_air %>%
  model(arima021 = ARIMA(Passengers ~ pdq(0,2,1), include.constant = TRUE))

report(fitARIMA021)
## Series: Passengers 
## Model: NULL model 
## NULL model
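
For reference, a sketch of the corresponding fable formula syntax for this part (output not shown; fable may refuse to fit it because a constant with two differences implies a quadratic forecast trend):

# '1 +' in the formula requests a constant; with pdq(0,2,1) this implies a quadratic trend
Aus_air %>%
  model(arima021_const = ARIMA(Passengers ~ 1 + pdq(0, 2, 1)))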

Q 9.8

For the United States GDP series (from global_economy):

Here is the original US GDP series.

US_GDP <- global_economy %>%
  filter(Country == "United States") %>%
  mutate(GDP = GDP/1e12)
 
(US_GDP %>% autoplot(GDP) + labs(y = "GDP $USD(T) ", title = "US GDP"))

9.8.a

if necessary, find a suitable Box-Cox transformation for the data;

The series curves upward, which a Box-Cox transformation can address. The guerrero method suggests lambda = 0.2819, and the transformed series is much closer to linear.

lambda_GDP <- US_GDP %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# store the plot of the transformed series, then print it
US_GDP_T <- US_GDP %>%
  autoplot(box_cox(GDP, lambda_GDP)) +
  labs(y = "",
       title = paste0("Transformed US GDP with lambda = ", round(lambda_GDP, 4)))

US_GDP_T

9.8.b

fit a suitable ARIMA model to the transformed data using ARIMA();

Fitting ARIMA() to the Box-Cox transformed GDP (using the lambda found above) selects an ARIMA(1,1,0) model with drift.

fit_ARIMA_US <- US_GDP %>%
  model(ARIMA(box_cox(GDP,lambda_GDP)))

report(fit_ARIMA_US)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda_GDP) 
## 
## Coefficients:
##          ar1  constant
##       0.4586    0.0489
## s.e.  0.1198    0.0039
## 
## sigma^2 estimated as 0.0009375:  log likelihood=118.73
## AIC=-231.46   AICc=-231.01   BIC=-225.33

9.8.c

try some other plausible models by experimenting with the orders chosen;

Trying a few other plausible orders alongside the automatic selections, the stepwise and full-search selections coincide and have the lowest \(AIC_c\). (These candidates are fitted to the untransformed GDP, so their \(AIC_c\) values are comparable with each other but not with the transformed model above.)

fit_ARIMA_US2 <- US_GDP %>%
  model(arima110 = ARIMA(GDP ~ pdq(1,1,0)),     # same orders as the model selected above (here on untransformed GDP)
        arima011 = ARIMA(GDP ~ pdq(0,1,1)),     # switching ar to ma
        stepwise = ARIMA(GDP),                  # default stepwise
        search = ARIMA(GDP, stepwise = FALSE) # search the model space
        )

report(fit_ARIMA_US2)
## # A tibble: 4 x 9
##   Country       .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>         <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 United States arima110 0.0284    21.3 -36.5 -36.1 -30.4 <cpl [1]> <cpl [0]>
## 2 United States arima011 0.0373    13.6 -21.3 -20.8 -15.2 <cpl [0]> <cpl [1]>
## 3 United States stepwise 0.0261    23.3 -40.5 -40.1 -34.4 <cpl [0]> <cpl [2]>
## 4 United States search   0.0261    23.3 -40.5 -40.1 -34.4 <cpl [0]> <cpl [2]>
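
Since report() is intended for a single model, a mable with several columns is easier to summarise with glance(); a sketch that ranks the candidates by \(AIC_c\):

# glance() summarises each fitted model; sort by AICc to rank them
glance(fit_ARIMA_US2) %>%
  arrange(AICc) %>%
  select(.model, AIC, AICc, BIC)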

9.8.d

choose what you think is the best model and check the residual diagnostics;

Using the stepwise model, the residuals are centred around zero, the ACF shows no significant autocorrelation (consistent with white noise), and the histogram looks roughly normal with a slight left skew.

fit_ARIMA_US2 %>% select(stepwise) %>% gg_tsresiduals(lag = 36)

9.8.e

produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecasts look reasonable: they continue the upward path of the series. Note that the stepwise model here was fitted to the untransformed GDP (in trillions of dollars); had a Box-Cox transformed model been used, as in part (b), fable would back-transform the forecasts to the original scale automatically.

forecast(fit_ARIMA_US2, h=12) %>%
  filter(.model=="stepwise") %>%
  autoplot(US_GDP) +
  labs(y = "$USD(Trillions) ", title = "US GDP forecast stepwise model")

9.8.f

compare the results with what you would obtain using ETS() (with no transformation).

Because the ARIMA and ETS information criteria are not directly comparable, and this is non-seasonal data, we can use time series cross-validation to compare the two classes of model. In the results below, ETS has the lower RMSE and MAE, while ARIMA does slightly better on MPE and MAPE.

US_GDP %>%
  slice(-n()) %>%
  stretch_tsibble(.init = 10) %>%
  model(
    ETS(GDP),
    ARIMA(GDP)
  ) %>%
  forecast(h = 1) %>%
  accuracy(US_GDP) %>%
  select(.model, RMSE:MAPE)
## # A tibble: 2 x 5
##   .model      RMSE   MAE   MPE  MAPE
##   <chr>      <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(GDP) 0.223 0.137 0.512  1.74
## 2 ETS(GDP)   0.194 0.130 0.560  1.79

Plotting the ETS forecasts for visual comparison, the prediction intervals are noticeably wider than those of the ARIMA model, and the point forecasts rise somewhat less steeply.

US_GDP %>%
  model(ETS(GDP)) %>%
  forecast(h = 10) %>%
  autoplot(US_GDP) +
  labs(y = "$USD(Trillions) ", title = "US GDP forecast ETS model")