9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman (https://otexts.com/fpp3/arima-exercises.html)

library(caret)
library(corrplot)
library(dplyr)
library(data.table)
library(e1071)
library(fable)
library(forecast)
library(fpp2)
library(fpp3)
library(ggplot2)
library(GGally)
library(knitr)
library(latex2exp) 
library(mlbench)
library(MASS)
library(purrr)
library(patchwork)
library(stats)
library(tsibble)
library(tidyr)

——————————————————————————–

(9.1) Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

(a) Explain the differences among these figures. Do they all indicate that the data are white noise?

These figures all appear to show white noise: in each panel the sample autocorrelations are scattered around a mean of 0, and none of the spikes extend beyond the significance bounds. The spikes and bounds differ in size across the panels because the dashed bounds are drawn at \(\pm 1.96/\sqrt{T}\), where \(T\) is the length of the time series, so longer series have tighter bounds.

(b) Why are the critical values at different distances from the mean of zero? Why are the auto correlations different in each figure when they each refer to white noise?

The critical values are at different distances from the mean of zero because they depend on the length of the time series: the bounds sit at \(\pm 1.96/\sqrt{T}\), so as \(T\) grows they move closer to zero. Equivalently, with more observations the sample autocorrelations are estimated more precisely, so their sampling variance shrinks. The autocorrelations themselves differ across the figures simply because each panel is computed from a different random sample, and for short series the random fluctuation in the sample ACF is larger.
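As a quick check of this, the bounds for the three series lengths in Figure 9.32 can be computed directly (a small sketch; the T_lengths name is just for display):

T_lengths <- c(36, 360, 1000)
tibble(T = T_lengths, bound = 1.96 / sqrt(T_lengths))
# The bound shrinks from about ±0.33 for T = 36 to about ±0.06 for T = 1000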

——————————————————————————–

(9.2) A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  autoplot(Close) +
  labs(title = "Closing Stock Prices for Amazon")

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Closing Stock Prices for Amazon")

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_ndiffs) 
## # A tibble: 1 × 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

The time plot is clearly non-stationary, given the overall rising trend and somewhat cyclical deviations. The ACF plot shows autocorrelations that start near 1 and decay very slowly, with every lag far outside the significance bounds: each observation is strongly tied to the one before it, which is not the white-noise pattern we would expect from a stationary series. The PACF shows a dominant spike at lag 1 (with smaller spikes around lags 5, 19, and 26), consistent with a random walk. This data would need to be differenced in order to become stationary.
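To confirm that one difference is enough (matching the unitroot_ndiffs result above), the same display can be drawn on the differenced closing prices, which should look far closer to white noise (a sketch; the diff_close name is just illustrative, and output is not shown):

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  mutate(diff_close = difference(Close)) %>%
  gg_tsdisplay(diff_close, plot_type='partial') +
  labs(title = "Differenced Closing Stock Prices for Amazon")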

——————————————————————————–

(9.3) For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

(a) Turkish GDP from global_economy.

turkish_gdp <- global_economy %>%
  filter(Country == 'Turkey')

turkish_gdp %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Turkish GDP")

# Box-Cox Transform
turkish_gdp_lambda <- turkish_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Differencing
turkish_gdp %>%
  mutate(GDP = box_cox(GDP, turkish_gdp_lambda)) %>%
  features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1
turkish_gdp %>%
  gg_tsdisplay(difference(box_cox(GDP,turkish_gdp_lambda)), plot_type='partial') +
  labs(title = "Turkish GDP", subtitle = paste("λ =", round(turkish_gdp_lambda, 5)))

The series is non-seasonal and non-stationary; after the Box-Cox transformation, a single first difference yields stationary data.

(b) Accommodation takings in the state of Tasmania from aus_accommodation.

Tasmania_accommodation  <- aus_accommodation %>%
  filter(State  == 'Tasmania')

Tasmania_accommodation %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Tasmania Accomodations")

# Box-Cox Transform
Tasmania_accommodation_lambda <- Tasmania_accommodation %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

# Differencing
Tasmania_accommodation %>%
  mutate(Takings = box_cox(Takings, Tasmania_accommodation_lambda)) %>%
  features(Takings, unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1
Tasmania_accommodation %>%
  gg_tsdisplay(difference(box_cox(Takings,Tasmania_accommodation_lambda), 4), plot_type='partial') +
  labs(title = "Tasmania Accomodations", subtitle = paste("λ =", round(Tasmania_accommodation_lambda, 5)))

The series is seasonal (quarterly) and non-stationary; a Box-Cox transformation followed by a lag-4 seasonal difference yields roughly stationary data.

(c) Monthly sales from souvenirs.

souvenirs  %>%
  gg_tsdisplay(Sales, plot_type='partial') +
  labs(title = "Monthly Souvenir Sales")

# Box-Cox Transform
souvenirs_lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

# Differencing
souvenirs %>%
  mutate(Sales = box_cox(Sales, souvenirs_lambda)) %>%
  features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1
souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales, souvenirs_lambda), 12), plot_type='partial') +
  labs(title = "Monthly Souvenir Sales", subtitle = paste("λ =", round(souvenirs_lambda, 5)))

The series is seasonal (monthly) and non-stationary; it needs a Box-Cox transformation followed by a lag-12 seasonal difference to become stationary.
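Whether a seasonal difference is warranted can also be confirmed with unitroot_nsdiffs on the transformed series (a sketch; output not shown):

souvenirs %>%
  features(box_cox(Sales, souvenirs_lambda), unitroot_nsdiffs)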

——————————————————————————–

(9.5) For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(22397)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(myseries, Turnover) +
  labs(title = "Retail Turnover")

# Box-Cox Transform
myseries_lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Differencing
myseries %>%
  features(box_cox(Turnover, myseries_lambda), unitroot_nsdiffs) 
## # A tibble: 1 × 3
##   State             Industry                     nsdiffs
##   <chr>             <chr>                          <int>
## 1 Western Australia Newspaper and book retailing       1
# Stationary plot
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover,myseries_lambda), 12), plot_type='partial', lag = 36) +
  labs(title = "Retail Turnover", subtitle = paste("λ =", round(myseries_lambda, 5)))

My retail series is seasonal and non-stationary. After a Box-Cox transformation, the unit-root test suggests one seasonal difference (nsdiffs = 1); the differenced series is centered around a mean of 0 and looks roughly stationary, resembling white noise.
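To check whether an additional first difference is needed on top of the seasonal one, unitroot_ndiffs can be applied to the seasonally differenced series (a sketch; the d_turnover name is just illustrative, and output is not shown):

myseries %>%
  mutate(d_turnover = difference(box_cox(Turnover, myseries_lambda), 12)) %>%
  features(d_turnover, unitroot_ndiffs)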

——————————————————————————–

(9.6) Simulate and plot some data from simple ARIMA models.

(a) Use the following R code to generate data from an AR(1) model with \(ϕ_1\)=0.6 and \(σ^2\)=1. The process starts with \(y_1\)=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

(b) Produce a time plot for the series. How does the plot change as you change \(ϕ_1\)?

for(i in 2:100)
  y[i] <- 0.5*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) +
  labs(title = "ϕ = 0.5")

for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) +
  labs(title = "ϕ = 0")

for(i in 2:100)
  y[i] <- -0.2*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) +
  labs(title = "ϕ = -0.2")

for(i in 2:100)
  y[i] <- -0.8*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) +
  labs(title = "ϕ = -0.8")

Per Section 9.3, changing \(ϕ_1\) changes the pattern of fluctuation in the series. When \(ϕ_1 = 0\) the series resembles white noise, and when \(ϕ_1 = 1\) it is a random walk. As \(ϕ_1\) becomes negative the series oscillates around the mean, switching sign more often, so the plot shows tighter bands with more frequent spikes. The variance of the error term \(ε_t\) will only change the scale of the series, not the patterns.
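The four loops above differ only in the coefficient, so they could be collapsed into a small helper that reuses the noise vector e from part (a) (a sketch; the sim_ar1 name is just illustrative):

sim_ar1 <- function(phi, e) {
  y <- numeric(length(e))
  for(i in 2:length(e))
    y[i] <- phi * y[i-1] + e[i]
  tsibble(idx = seq_along(e), y = y, index = idx)
}

# e.g. sim_ar1(0.9, e) %>% autoplot(y) + labs(title = "ϕ = 0.9")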

(c) Write your own code to generate data from an MA(1) model with \(θ_1\) = 0.6 and \(σ^2\) = 1.

y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]  # e_0 is taken to be 0
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Here, \(e\) is a vector of white-noise errors drawn independently from a normal distribution with mean 0 and standard deviation 1, so they are uncorrelated. The \(y\) values are the simulated series \(y_t = θ_1 e_{t-1} + e_t\), where \(y_t\) is the value of the time series at time \(t\), \(e_t\) is the white noise component at time \(t\), and \(θ_1\) is the lag-1 error coefficient.
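One consequence of this structure is that the theoretical lag-1 autocorrelation of an MA(1) process is \(ρ_1 = θ_1/(1+θ_1^2)\), which is about 0.441 for \(θ_1 = 0.6\). A longer simulation can verify this (a sketch; the seed and object names are just illustrative):

set.seed(1)
e_long <- rnorm(1e5)
y_long <- e_long + 0.6 * c(0, head(e_long, -1))  # y_t = e_t + 0.6*e_{t-1}
cor(y_long[-1], y_long[-length(y_long)])          # sample lag-1 autocorrelation, ~0.44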

(d) Produce a time plot for the series. How does the plot change as you change \(θ_1\)?

sim %>%
  autoplot(y) +
  labs(title = "MA(1) model", 
       subtitle = TeX("$\\theta = 0.6, \\sigma^2 = 1$"))

#########
# θ = 0
for(i in 2:100)
  y[i] <- e[i] + 0*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot(y) +
  labs(title = "MA(1) model", 
       subtitle=TeX("$\\theta = 0 ,  \\sigma^2 = 1$"))

#########
# θ = 0.5
for(i in 2:100)
  y[i] <- e[i] +  0.5*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot(y) +
  labs(title = "MA(1) model", 
       subtitle=TeX("$\\theta = 0.5 ,  \\sigma^2 = 1$"))

#########
# θ = 1
for(i in 2:100)
  y[i] <- e[i] + 1*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot(y) +
  labs(title = "MA(1) model", 
       subtitle=TeX("$\\theta = 1 ,  \\sigma^2 = 1$"))

As \(θ_1\) increases, the amplitude of the spikes grows: the variance of an MA(1) series is \((1+θ_1^2)σ^2\), so a larger \(|θ_1|\) means a greater spread around the mean of 0. Consecutive observations also become more positively correlated, so the series looks slightly smoother than pure white noise.

(e) Generate data from an ARMA(1,1) model with \(ϕ_1\) = 0.6, \(θ_1\) = 0.6 and \(σ^2\) = 1.

set.seed(12345)

phi_1   <- 0.6  # AR parameter
theta_1 <- 0.6  # MA parameter
sigma   <- 1    # sd of the white noise (sigma^2 = 1)

y <- numeric(100)
e <- rnorm(100, sd = sigma)

for(i in 2:100)
  y[i] <- phi_1 * y[i-1] + theta_1 * e[i-1] + e[i]

arma1_model <- tsibble(idx = seq_len(100), y = y, index = idx)

(f) Generate data from an AR(2) model with \(ϕ_1\) = −0.8, \(ϕ_2\) = 0.3 and \(σ^2\) = 1. (Note that these parameters will give a non-stationary series.)

set.seed(12345)

phi_1 <- -0.8  # AR(1) coefficient
phi_2 <- 0.3   # AR(2) coefficient

y <- numeric(100)
e <- rnorm(100)  # white noise with mean 0 and sd 1 (sigma^2 = 1)

for(i in 3:100)
  y[i] <- phi_1 * y[i-1] + phi_2 * y[i-2] + e[i]

ar2_model <- tsibble(idx = seq_len(100), y = y, index = idx)

(g) Graph the latter two series and compare them.

arma1_model %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = "ARMA(1,1) model", 
       subtitle = TeX("$\\phi_1 = 0.6$, $\\theta_1 = 0.6$, $\\sigma^2 = 1$"))

ar2_model %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = "AR(2) model", 
       subtitle = TeX("$\\phi_1 = -0.8, \\phi_2 = 0.3, \\sigma^2 = 1$"))

The ARMA(1,1) series is stationary: its ACF spikes die out quickly and fall inside the 95% bounds around the mean of 0. The AR(2) series is not stationary; its oscillations grow in amplitude over time. Its PACF shows a single large negative spike at lag 1, while its ACF oscillates in sign, mirroring the series itself. The process is explosive because one root of the AR polynomial \(ϕ(B)\) lies inside the unit circle (stationarity requires all roots to lie outside it), so this model would be useless for forecasting.
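The root claim can be checked directly: the AR polynomial here is \(1 - ϕ_1B - ϕ_2B^2 = 1 + 0.8B - 0.3B^2\), and stationarity requires both of its roots to have modulus greater than 1.

# Moduli of the roots of 1 + 0.8B - 0.3B^2 (coefficients in increasing order)
abs(polyroot(c(1, 0.8, -0.3)))
# One modulus is about 0.93 < 1, confirming the process is non-stationary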

——————————————————————————–

(9.7) Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

(a) Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers)) 

# Default model chosen was ARIMA(0,2,1) 
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99
#Forecast
fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers, ARIMA(0,2,1)")

# Residuals resemble white noise
fit %>% 
  gg_tsresiduals() +
  labs(title = "ARIMA(0,2,1) Residuals")

# dof = number of estimated parameters, p + q = 1 for ARIMA(0,2,1)
augment(fit) %>%
  features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 3
##   .model            lb_stat lb_pvalue
##   <chr>               <dbl>     <dbl>
## 1 ARIMA(Passengers)    6.00     0.740

(b) Write the model in terms of the backshift operator.

The fitted model is ARIMA(0,2,1): a twice-differenced series with an MA(1) error, \(y''_t = e_t - 0.8756e_{t-1}\). Since \(Be_t = e_{t-1}\), the right-hand side can be written as \((1 - 0.8756B)e_t\), and the second-order difference is \(y''_t = (1-B)^2y_t\). The full model in backshift notation is therefore \((1-B)^2y_t = (1 - 0.8756B)e_t\).

(c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

fit2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers, ARIMA(0,1,0)")

fit2 %>% 
  gg_tsresiduals() +
  labs(title = "ARIMA(0,1,0) Residuals")

The ARIMA(0,2,1) forecasts from part (a) climb more steeply than those of the ARIMA(0,1,0) model with drift, which rise linearly at the average historical growth rate. The residual plots, however, look nearly identical, even though the ACF panels differ slightly.
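For a side-by-side comparison, both models can be fitted in a single mable and their forecasts overlaid (a sketch; the model names are just illustrative, and output is not shown):

aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(arima021 = ARIMA(Passengers ~ pdq(0,2,1)),
        arima010_drift = ARIMA(Passengers ~ 1 + pdq(0,1,0))) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers, level = NULL) +
  labs(title = "ARIMA(0,2,1) vs ARIMA(0,1,0) w/ drift")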

(d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit3 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers, ARIMA(2,1,2)")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "ARIMA(2,1,2) Residuals")

######
# Remove the constant: ARIMA() returns a null model (non-stationary AR part),
# so the forecast and residual code below cannot run
fit4 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

#fit4 %>% 
#  forecast(h=10) %>%
#  autoplot(aus_airpassengers) +
#  labs(title = "Australian Aircraft Passengers, ARIMA(2,1,2) without constant")

#fit4 %>% 
#  gg_tsresiduals() +
#  labs(title = "ARIMA(2,1,2) Residuals without constant")

The ARIMA(2,1,2) forecasts resemble those of the ARIMA(0,2,1) model, with slightly different ACF lags that are still centered around the mean. Removing the constant, however, produces a null model (fable reports a non-stationary AR part), which is why the forecasting code above is commented out. Section 9.7 explains that the inclusion of a constant in a non-stationary ARIMA model is equivalent to inducing a polynomial trend of order \(d\) in the forecasts, while omitting the constant gives a trend of order \(d-1\). Here \(d = 1\), so without the constant the long-run forecasts would follow a trend of order 1 − 1 = 0, i.e. flatten out to a constant level instead of continuing the upward trend.

(e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit5 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

fit5 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers, ARIMA(0,2,1)")

fit5 %>% 
  gg_tsresiduals() +
  labs(title = "ARIMA(0,2,1) Residuals")

With the constant and \(d = 2\), the model induces a quadratic (order-2) polynomial trend in the forecasts, so they curve upward even more steeply than before. fable warns against this specification: quadratic and higher-order polynomial trends are generally discouraged because they produce implausible long-run forecasts.

——————————————————————————–

(9.8) For the United States GDP series (from global_economy):

(a) if necessary, find a suitable Box-Cox transformation for the data;

us_gdp <- global_economy %>%
  filter(Code == 'USA')

us_gdp %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "United States GDP")

# Box-Cox
us_gdp_lambda <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Apply the Box-Cox transformation
us_gdp <- us_gdp %>%
  mutate(BoxCox_GDP = box_cox(GDP, us_gdp_lambda))

us_gdp %>%
  gg_tsdisplay(BoxCox_GDP, plot_type='partial') +
  labs(title = "United States GDP", subtitle="Box-Cox Transform")

(b) fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- us_gdp %>%
  model(arima = ARIMA(box_cox(GDP, us_gdp_lambda), stepwise = FALSE, approx = FALSE))

# ARIMA(1,1,0) w/ drift 
report(fit)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, us_gdp_lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

(c) try some other plausible models by experimenting with the orders chosen;

# Differencing of 1 suggested after Box-Cox transform
us_gdp %>%
  features(BoxCox_GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1
# models
usa_arima_models <- us_gdp %>%
    model(arima110 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,1,0)),
          arima111 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,1,1)),
          arima112 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,1,2)),
          arima113 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,1,3)),
          arima210 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(2,1,0)),
          arima212 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(2,1,2)),
          arima213 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(2,1,3)))

# glance() %>% arrange() %>% select() fails here because MASS::select()
# (attached in the setup chunk) masks dplyr::select(); qualifying the call
# as dplyr::select() fixes it. The data.table workaround below was used
# before tracking that down.
#glance(usa_arima_models) %>% arrange(AICc) %>% dplyr::select(.model:BIC)


# Convert the tibble to a data.table
usa_arima_glance <- glance(usa_arima_models)

# Select specific columns
usa_arima_glance <- as.data.table(usa_arima_glance)[, .(.model, sigma2, log_lik, AIC, AICc, BIC)]

# Arrange by AICc
usa_arima_glance <- usa_arima_glance[order(AICc)]

print(usa_arima_glance)
##      .model   sigma2   log_lik      AIC     AICc      BIC
##      <char>    <num>     <num>    <num>    <num>    <num>
## 1: arima110 5479.297 -325.3238 656.6476 657.1005 662.7768
## 2: arima111 5580.411 -325.3221 658.6443 659.4135 666.8165
## 3: arima210 5580.477 -325.3224 658.6449 659.4141 666.8171
## 4: arima112 5630.034 -325.0679 660.1359 661.3123 670.3511
## 5: arima113 5732.357 -325.0349 662.0699 663.7499 674.3282
## 6: arima212 5734.444 -325.0458 662.0915 663.7715 674.3498
## 7: arima213 5843.415 -325.0293 664.0586 666.3444 678.3600

\(p\) is the number of autoregressive (AR) terms, which represent the dependence on the previous values of the time series. \(d\) represents the number of differencing operations needed to make the series stationary, which has constant mean and variance over time. \(q\) represents the number of moving average (MA) terms, which model the relationship between the current value of the series and the residual errors from previous periods.

(d) choose what you think is the best model and check the residual diagnostics;

Log-likelihood measures how probable the observed data are under the fitted model; a higher log-likelihood means a better fit. AIC is a relative measure of model quality that accounts for both fit (how well the model represents the training data) and complexity (how many parameters the model uses); a lower AIC indicates a better balance of the two. AICc is AIC with a correction for small sample sizes. The Bayesian Information Criterion (BIC) functions similarly to AIC but penalizes complexity more harshly, so it favors simpler models. The log-likelihoods here are nearly identical across the candidates, so the information criteria are what separate them: the model R chose by default, ARIMA(1,1,0), has the lowest AIC, AICc, and BIC and is therefore the best of the set. For the diagnostics below I'll use ARIMA(1,1,1), the second best.

# Residual diagnostics for the ARIMA(1,1,1) model
# (dplyr::select is namespace-qualified because MASS masks it)
usa_arima_models %>%
  dplyr::select(arima111) %>%
  gg_tsresiduals() +
  labs(title = "ARIMA(1,1,1) Residuals")

# Ljung-Box test on the innovation residuals (dof = p + q = 2)
augment(usa_arima_models) %>%
  filter(.model == "arima111") %>%
  features(.innov, ljung_box, lag = 10, dof = 2)

The failures I originally hit here came down to a namespace clash: MASS is attached after dplyr in the setup chunk, so MASS::select() masks dplyr::select(), and pipelines like glance(usa_arima_models) %>% arrange(AICc) %>% select(.model:BIC) dispatched to the wrong function (hence the data.table workaround above). Qualifying the call as dplyr::select() resolves the error and unblocks parts (d), (e), and (f).

(e) produce forecasts of your fitted model. Do the forecasts look reasonable?
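With the masking issue resolved, forecasts can be produced from the chosen model (a sketch; output not shown):

usa_arima_models %>%
  dplyr::select(arima111) %>%
  forecast(h = 10) %>%
  autoplot(us_gdp) +
  labs(title = "United States GDP, ARIMA(1,1,1) Forecast")

fable back-transforms from the Box-Cox scale automatically, so the plot is on the original GDP scale; a reasonable forecast here would continue the series' historical upward trend.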

(f) compare the results with what you would obtain using ETS() (with no transformation).
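A sketch of the ETS comparison, fitting ETS() on the untransformed series as the exercise specifies (the model names are illustrative; output not shown):

us_gdp %>%
  model(arima111 = ARIMA(box_cox(GDP, us_gdp_lambda) ~ pdq(1,1,1)),
        ets = ETS(GDP)) %>%
  forecast(h = 10) %>%
  autoplot(us_gdp, level = NULL) +
  labs(title = "United States GDP: ARIMA vs ETS Forecasts")

Note that AICc values are not comparable between ARIMA and ETS models, so the comparison should rest on the forecast plots or on accuracy measures from a holdout set.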