1) Figure 9.32 from https://otexts.com/fpp3/AR.html shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a) Explain the differences among these figures. Do they all indicate that the data are white noise?

They all indicate white noise, since the auto-correlations aren’t statistically significant, as evidenced by the vertical black lines not extending past the blue, horizontal, dashed ones.

b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The magnitude of the critical values varies inversely with the square root of the sample size. As the plots move left to right, the sample size grows, and so the blue lines move closer to 0. The autocorrelations are different because the values are randomly generated.

2) A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
  filter(Symbol=='AMZN') %>%
  select(Adj_Close) %>%
  gg_tsdisplay(Adj_Close, plot_type = "partial") +
  labs(title = "Non-Stationarity of Amazon (adjusted) Closing Prices")

3) For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a) Turkish GDP from global_economy.

turk = global_economy %>%
  filter(Country=="Turkey") %>%
  select(GDP)
autoplot(turk, GDP) +
  labs(title = "Annual Turkish GDP")

A logarithmic transformation should work here.

turk %>%
  mutate(diff_log = difference(log(GDP))) %>%
  autoplot(diff_log) +
  labs(title = "Differences of Logarithms of Annual Turkish GDP")

Those differences of logs look like white noise.
Here are the p-values of the Ljung-Box test for independence from autocorrelation, as shown in our book, for lags 1-20:

pvals = c()
for (lag in 1:20){
  p = turk %>%
  mutate(diff_log = difference(log(GDP))) %>%
  features(diff_log, ljung_box, lag=lag) %>%
  select(lb_pvalue)
  
  pvals[lag] = p[1]
}
plot(1:20, pvals, "l", main = "p-values for lags 1-20, Ljung-Box statistic", xlab = "lag")

The null hypothesis is that the diffs of the logs are independently distributed, and the lowest p-value here, at lag=4, isn’t low enough to reject that.
Possibly there is some sort of barely noticeable pattern/cycle in the GDP around 4 years in duration, though I may be misinterpreting here, and at any rate it doesn’t seem even close to statistically significant.

b) Accommodation takings in the state of Tasmania from aus_accommodation.

taz = aus_accommodation %>%
  filter(State=="Tasmania") %>%
  select(Takings)
autoplot(taz, Takings) +
  labs(title = "Quarterly Tasmanian Accommodation Takings")

The overall trend is linear, such that a power tranformation won’t help there, but it would at least smooth out the seasonal variance that increases as the levels increase, overall. The main thing is to account for the seasonality.

autoplot(taz, log(Takings)) +
  labs(title = "Logarithm of Quarterly Tasmanian Accomodation Takings")

# take seasonal diffs of logs, then first and second diffs of those
taz = taz %>%
  mutate(log_diff_season = difference(log(Takings), 4),
         first_diff = difference(log_diff_season),
         second_diff = difference(first_diff))
autoplot(taz, first_diff) +
  labs(title = "First diffs of Seasonal diffs of Log values")

That looks like whitish noise, such that the second differences are probably unnecessary.

taz %>%
  features(first_diff, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0474         0.1

The KPSS null hypothesis is that the data are stationary, and any p-value over .1 is listed as .1, meaning that the null hypothesis cannot be rejected, as is the case here. So indeed just the first difference of the seasonal differences should suffice.

c) Monthly sales from souvenirs.

autoplot(souvenirs, Sales) +
  labs(title = "Untransformed Monthly Sales")

Another good candidate for the log transform, since the seasonally high sales month levels are increasing much faster than the seasonally low ones.

souv = souvenirs %>%
  mutate(log_sales = log(Sales))
autoplot(souv, log_sales) +
  labs(title = "Log-adjusted Sales")

# take seasonal diffs of logs, then first and second diffs of those
souv = souv %>%
  mutate(log_diff_season = difference(log_sales, 12),
         first_diff = difference(log_diff_season),
         second_diff = difference(first_diff))
autoplot(souv, first_diff) +
  labs(title = "First diffs of Seasonal diffs of Log values")

As with the Tasmanian data, the first diffs of the seasonal diffs appear to plot as white noise, which is indeed confirmed by the same KPSS test as before:

souv %>%
  features(first_diff, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0381         0.1

5) For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(624)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries = myseries %>%
  mutate(boxcox = box_cox(Turnover, lambda))
myseries %>%
  autoplot(boxcox) +
  labs(y = "Transformed Turnover",
       title = latex2exp::TeX(paste0(
         "Box-Cox Transformed Turnover with $\\lambda$ = ",
         round(lambda,2))))

myseries = myseries %>%
  mutate(boxcox_diff_season = difference(boxcox, 12),
         first_diff = difference(boxcox_diff_season))
autoplot(myseries, boxcox_diff_season) +
  labs(title = "Year-on-year changes by month (aka 'Seasonal Differences')",
       y = "YOY for Box-Cox-transformed Values")

That already looks like white noise, with perhaps the biggest problem being clumpiness. Applying the same KPSS statistical test:

myseries %>%
  features(boxcox_diff_season, unitroot_kpss)
## # A tibble: 1 × 4
##   State           Industry               kpss_stat kpss_pvalue
##   <chr>           <chr>                      <dbl>       <dbl>
## 1 New South Wales Takeaway food services    0.0798         0.1

So it looks like just the seasonal differencing is enough to call things stationary, based on the 0.1 p-value.

6) Simulate and plot some data from simple ARIMA models.

a) Use the following R code to generate data from an AR(1) model with \(\phi_1=0.6\) and \(\phi_2=1\). The process starts with \(y_1=0\)

set.seed(624)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b) Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

autoplot(sim, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\phi_1 = 0.6$"), x = "", y = "")

If \(\phi_1\) is raised to 0.9, and the same random errors are used, it looks like this:

y = numeric(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\phi_1 = 0.9$"), x = "", y = "")

And if \(\phi_1\) is lowered to 0.2, and the same random errors are used, it looks more like a random walk:

y = numeric(100)
for(i in 2:100)
  y[i] <- 0.2*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\phi_1 = 0.2$"), x = "", y = "")

Finally, if \(\phi_1\) is made negative, and the same random errors are used, it looks like this:

y = numeric(100)
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\phi_1 = -0.6$"), x = "", y = "")

The plot gets smoother and stays further away from its initial value when \(\phi_1\) is higher. As \(\phi_1\) shrinks closer to 0, the data become more like random noise. As \(\phi_1\) becomes more negative, the oscillation around the initial value becomes more pronounced.

c) Write your own code to generate data from an MA(1) model with \(\theta_1=0.6\) and \(\sigma^2=1\)

We can reuse the error vector from above, since it is drawn from a distribution with \(\sigma^2=1\). Let’s assume the constant mean of the series in question is 0.

paste("Variance of our error vector:", round(sd(e), 3))
## [1] "Variance of our error vector: 1.02"
theta = 0.6
ma = e + theta * c(0, e[-1]) # use the first error as the first moving average

d) Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

sim1 = tsibble(idx = seq_len(100), y = ma, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("MA(1) with $\\theta_1 = 0.6$"), x = "", y = "")

If \(\theta_1\) is raised to 0.9, and the same random errors are used, it looks like this:

theta = 0.9
ma = e + theta * c(0, e[-1])
sim1 <- tsibble(idx = seq_len(100), y = ma, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\theta_1 = 0.9$"), x = "", y = "")

And if \(\theta_1\) is lowered to 0.2, and the same random errors are used:

theta = 0.2
ma = e + theta * c(0, e[-1])
sim1 <- tsibble(idx = seq_len(100), y = ma, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\theta_1 = 0.2$"), x = "", y = "")

Finally, AR(1) with negative \(\theta\):

theta = - 0.6
ma = e + theta * c(0, e[-1])
sim1 <- tsibble(idx = seq_len(100), y = ma, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\theta_1 = -0.6$"), x = "", y = "")

The series all have the same shape, but lesser values of \(\theta\) vary less, remaining closer to zero.
Even \(\theta = -0.6\) oscillates less than \(\theta = 0.2\), meaning that it’s not just the absolute value that determines how much it varies.
But if \(\theta\) were made negative enough, it would of course start to vary a lot:

theta = - 6
ma = e + theta * c(0, e[-1])
sim1 <- tsibble(idx = seq_len(100), y = ma, index = idx)
autoplot(sim1, y) +
  labs(title = latex2exp::TeX("AR(1) with $\\theta_1 = -6$"), x = "", y = "")

e) Generate data from an ARMA(1,1) model with \(\phi_1=0.6\) and \(\theta_1=0.6\) and \(\sigma^2=1\)

theta = 0.6
phi = 0.6
y = numeric(30)
e = rnorm(30) # variance == 1

for(i in 2:30){
  y[i] = phi * y[i-1] + e[i] + theta * e[i-1]
}
sim1 <- tsibble(idx = seq_len(30), y = y, index = idx)

f) Generate data from an AR(2) model with \(\phi_1=-0.8\) and \(\phi_2=0.3\) and \(\sigma^2=1\) (Note that these parameters will give a non-stationary series.)

phi1 = -0.8
phi2 = 0.3
y = numeric(30)
e = rnorm(30) # variance == 1

for(i in 3:30){
  y[i] = phi1 * y[i-1] + phi2 * y[i-2] + e[i]
}
sim2 <- tsibble(idx = seq_len(30), y = y, index = idx)

g) Graph the latter two series and compare them.

sim1 %>%
  autoplot(y) +
  labs(title = latex2exp::TeX("ARMA(1,1) with $\\phi_1 = .6$ and $\\theta_1 = .6$"), x = "", y = "")

sim2 %>%
  autoplot(y) +
  labs(title = latex2exp::TeX("AR(2) with $\\phi_1 = -.8$ and $\\phi_2 = .3$"), x = "", y = "")

The second series oscillates further and further above and below zero, while the first series just drifts away from it, more like a random walk.

7) Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a) Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

apass = aus_airpassengers
autoplot(apass, Passengers) +
  labs(title = "Australian Air Travelers")

fit = apass %>%
  model(arimod = ARIMA(Passengers, stepwise = F))

report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

ARIMA(0,2,1) was chosen.

Here are its residuals:

fit %>%
  gg_tsresiduals()

The residuals do look like white noise, although maybe a log transform would’ve made the variance more consistent from left to right across time.

Now for the forecasts:

fit %>%
  forecast(h=10) %>%
  autoplot(apass) +
  labs(title = "ARIMA(0,2,1) Forecasts")

b) Write the model in terms of the backshift operator.

\(\theta_1\), or the ma1 coefficient reported from the above model fit, is shown as about -0.9,
and there were 2 differences used, so the backshift version of the model’s fit is:

\((1-B)^2y_t = (1-.9B)\varepsilon_t\)

c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit_010 = apass %>%
  model(arima010 = ARIMA(Passengers ~ pdq(0,1,0)))
report(fit_010)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

The IC metrics are about 1% worse for this model than for the one the algorithm selected before.

fit_010 %>%
  forecast(h=10) %>%
  autoplot(apass) +
  labs(title = "ARIMA(0,1,0) Forecasts")

The forecasts are lower for this model than for the (0,2,1) model. They actually look identical to a simple Drift model, which would make sense, since the model here is just modeling the differences from year to year, which is the same as how the Drift model works.

d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit_212 = apass %>%
  model(arima212 = ARIMA(Passengers ~ 1 + pdq(2,1,2))) # add the 1 to the formula to indicate drift
report(fit_212)
## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  constant
##       -0.5518  -0.7327  0.5895  1.0000    3.2216
## s.e.   0.1684   0.1224  0.0916  0.1008    0.7224
## 
## sigma^2 estimated as 4.031:  log likelihood=-96.23
## AIC=204.46   AICc=206.61   BIC=215.43
fit_212 %>%
  forecast(h=10) %>%
  autoplot(apass) +
  labs(title = "ARIMA(2,1,2) Forecasts")

Now without drift:

fit_212_noDrift = apass %>%
  model(arima212noDrift = ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for arima212noDrift
## [1] non-stationary AR part from CSS
report(fit_212_noDrift)
## Series: Passengers 
## Model: NULL model 
## NULL model

Without being able to account for the drift, the ARIMA model is unable to make the series stationary.
If we specify a second differencing, however, it “stationarizes” things, and the IC metrics are slightly better than those of the (2,1,2) with drift:

fit_222_noDrift = apass %>%
  model(arima222noDrift = ARIMA(Passengers ~ 0 + pdq(2,2,2)))
report(fit_222_noDrift)
## Series: Passengers 
## Model: ARIMA(2,2,2) 
## 
## Coefficients:
##           ar1      ar2      ma1      ma2
##       -1.0119  -0.1733  -0.0015  -0.7567
## s.e.   0.3556   0.1558   0.3410   0.3047
## 
## sigma^2 estimated as 4.488:  log likelihood=-96.42
## AIC=202.84   AICc=204.38   BIC=211.87

Here are the forecasts for this model–

fit_222_noDrift %>%
  forecast(h=10) %>%
  autoplot(apass) +
  labs(title = "ARIMA(2,2,2) without Drift Forecasts")

These forecasts are similar to the (0,2,1) forecasts from part a.

e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The constant should turn out to be close to 0, since the model chosen by the ARIMA algorithm in part a didn’t want a constant.

fit_021_Drift = apass %>%
  model(arima021Drift = ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_021_Drift)
## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0571
## s.e.   0.0585    0.0213
## 
## sigma^2 estimated as 3.855:  log likelihood=-95.1
## AIC=196.21   AICc=196.79   BIC=201.63

This did in fact produce the lowest AIC criteria, for what it’s worth. The BIC is the same as the (0,2,1) without a constant, however, probably because of the added model complexity (the added constant, essentially a slope).

fit_021_Drift %>%
  forecast(h=10) %>%
  autoplot(apass) +
  labs(title = "ARIMA(0,2,1) with forced Drift Forecasts")

Now these forecasts are rising polynomially, as the model warned. Though this is discouraged, which means it doesn’t work in general, I assume, it’s easy to see where it’s believable, when looking at that last forecast plot.

8) For the United States GDP series (from global_economy):

a) If necessary, find a suitable Box-Cox transformation for the data.

usgdp = global_economy %>%
  filter(Country=="United States") %>%
  select(GDP)
usgdp %>%
  autoplot(GDP) +
  labs(title = "US GDP")

lambda = usgdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
usgdp = usgdp %>%
  mutate(boxcox = box_cox(GDP, lambda))
autoplot(usgdp, boxcox) +
  labs(y = "Transformed GDP",
       title = latex2exp::TeX(paste0(
         "Box-Cox Transformed USGDP with $\\lambda$ = ",
         round(lambda,2))))

b) Fit a suitable ARIMA model to the transformed data using ARIMA()

fitGDP = usgdp %>%
  model(arimod = ARIMA(boxcox, stepwise = F))

report(fitGDP)
## Series: boxcox 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

c) Try some other plausible models by experimenting with the orders chosen

usgdp %>%
  gg_tsdisplay(difference(boxcox), plot_type = "partial") +
  labs(title = "First Differences of BoxCox-Transformed USGDP")

It looks from the top chart like the series needs to be differenced a second time.

usgdp %>%
  gg_tsdisplay(difference(difference(boxcox)), plot_type = "partial") +
  labs(title = "Second Differences of BoxCox-Transformed USGDP")

The last significantly autocorrelated lag for both the ACF and the PACF is lag=1, so I’ll try a pdq(1,2,1) model.

fit121 = usgdp %>%
  model(mod121 = ARIMA(boxcox ~ pdq(1,2,1)))

report(fit121)
## Series: boxcox 
## Model: ARIMA(1,2,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.3325  -0.8461
## s.e.  0.1849   0.1190
## 
## sigma^2 estimated as 5761:  log likelihood=-321.24
## AIC=648.48   AICc=648.94   BIC=654.55

That lowered the IC measures and raised the log likelihood without adding complexity.

d) Choose what you think is the best model and check the residual diagnostics

fit121 %>%
  gg_tsresiduals() +
  labs(title = "ARIMA(1,2,1) Residuals")

The residuals look reasonable. Even after double differencing, it’s very hard to account for a couple of large negative numbers from the 2008-2009 financial crisis.

e) Produce forecasts of your fitted model. Do the forecasts look reasonable?

fit121 %>%
  forecast(h=10) %>%
  autoplot(usgdp) +
  labs(title = "ARIMA(1,2,1)")

Seems reasonable enough. Slightly lower than a simple Drift model.

f) compare the results with what you would obtain using ETS() (with no transformation).

etsMod = usgdp %>%
  model(ETS(GDP))
etsMod %>%
  forecast(h=10) %>%
  autoplot(usgdp) +
  labs(title = "Automatically selected ETS forecast for Untransformed GDP")

etsMod$`ETS(GDP)`
## <lst_mdl[1]>
## [1] <ETS(M,A,N)>

The ETS() function chose a model with multiplicative error and additive trend. The prediction intervals are very wide. I don’t really know how to compare these “results” to the ARIMA ones, since the ARIMA model is fitted on BoxCox-transformed data, whereas the specification for the ETS model called for untransformed data. To get a better comparison, we could fit an ETS() model to the transformed data, and then compare the cross-validated one-step forecasts for the models.

etsMod2 = usgdp %>%
  model(ETS(boxcox))
etsMod2 %>%
  forecast(h=10) %>%
  autoplot(usgdp) +
  labs(title = "Automatically selected ETS forecast for BoxCoxed GDP")

Not surprisingly (since it’s sort of a straight line series once transformed), the forecasts look similar to the ARIMA model’s, although the prediction interval is noticeably wider for ETS.

etsMod2$`ETS(boxcox)`
## <lst_mdl[1]>
## [1] <ETS(M,A,N)>

(It again chose multiplicative error, which may be unexpected, after the power transform.)

Cross-validated one-step accuracy:

usgdp %>%
  slice(-n()) %>%  # remove last row
  stretch_tsibble(.init = 10) %>%
  model(MAN = ETS(boxcox ~ error("M") + trend("A") + season("N")),
        ARIMA121 = ARIMA(boxcox ~ pdq(1,2,1))) %>%
  forecast(h = 1) %>%
  accuracy(usgdp) %>%
  select(.model, RMSE, MAE)
## # A tibble: 2 × 3
##   .model    RMSE   MAE
##   <chr>    <dbl> <dbl>
## 1 ARIMA121  85.2  62.9
## 2 MAN       87.1  66.4

The errors are lower for the ARIMA model, though not hugely.

Going back to the model that ARIMA() automatically chose, the (1,1,0) with drift, which had worse IC numbers than the (1,2,1) I compared to the ETS just now, you can see that the RMSE for its cross-validated one-step forecasts is actually better than both:

usgdp %>%
  slice(-n()) %>%  # remove last row
  stretch_tsibble(.init = 10) %>%
  model(ARIMA(boxcox ~ 1 + pdq(1,1,0))) %>%
  forecast(h = 1) %>%
  accuracy(usgdp) %>%
  select(.model, RMSE, MAE)
## # A tibble: 1 × 3
##   .model                            RMSE   MAE
##   <chr>                            <dbl> <dbl>
## 1 ARIMA(boxcox ~ 1 + pdq(1, 1, 0))  82.5  62.4