ARIMA

Required Libraries

library(tidyverse)
library(fpp3)
library(kableExtra)
library(cowplot)

Problem 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The differences among these figures are that the ACF range are different. We can see as we progress from left to right, the ranges becomes narrower. They all indicate that they are white noise as we do not see any lags cross the blue line boundary.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

As our sample size increases, we see that the variation between two numbers become less influential to each other compared to smaller sample sizes. As we have less numbers to choose from, and wider ranges, the ACF then detects these critical values at a larger scale from zero.

Problem 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

amazon <- 
  gafa_stock |>
  filter(Symbol == 'AMZN') |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index=trading_day, regular=T)


kbl(head(amazon)) |>
  kable_classic(full_width = F, html_font = "Cambria")

Symbol	Date	Open	High	Low	Close	Adj_Close	Volume	trading_day
AMZN	2014-01-02	398.80	399.36	394.02	397.97	397.97	2137800	1
AMZN	2014-01-03	398.29	402.71	396.22	396.44	396.44	2210200	2
AMZN	2014-01-06	395.85	397.00	388.42	393.63	393.63	3170600	3
AMZN	2014-01-07	395.04	398.47	394.29	398.03	398.03	1916000	4
AMZN	2014-01-08	398.47	403.00	396.04	401.92	401.92	2316500	5
AMZN	2014-01-09	403.71	406.89	398.44	401.01	401.01	2103000	6

We can clearly see a strong upward trend where the data is not stationary in its closing price.

amazon |>
  autoplot(Close) +
  labs(x = "Day", y="Price ($ USD)", title = "Amazon Closing Price")

We see high autocorrelation in each lag that has a slowly decreasing effect as we lag further ahead.

amazon |>
  ACF(Close) |>
  autoplot() +
  labs(subtitle = "Amazon ACF")

Our PACF plot shows an extremely high autocorrelation in the first lag, while there are more lags outside our critical values of concern later on.

amazon |>
  PACF(Close) |>
  autoplot() +
  labs(title = "Amazon PACF")

Problem 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

turkey <- 
  global_economy |>
  filter(Country == "Turkey")

kbl(head(turkey)) |>
  kable_classic(full_width = F, html_font = "Cambria")

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population
Turkey	TUR	1960	13995067818	NA	5.40e-05	3.671072	2.055800	27472331
Turkey	TUR	1961	8022222222	1.156069	5.57e-05	6.786704	5.124654	28146893
Turkey	TUR	1962	8922222222	5.571429	5.78e-05	7.970112	5.603985	28832805
Turkey	TUR	1963	10355555556	9.066306	6.15e-05	6.974249	4.184549	29531342
Turkey	TUR	1964	11177777778	5.459057	6.22e-05	5.467197	4.473161	30244232
Turkey	TUR	1965	11944444444	2.823529	6.50e-05	5.395349	4.558140	30972965

turkey |>
  autoplot(GDP) +
  labs(title = "Turkey GDP", x='Year', y='')

lambda <- 
  turkey |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

turkey |>
  autoplot(box_cox(GDP, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Turkey GDP with $\\lambda$ = ",
         round(lambda,3))), 
       x='Year', 
       y='')

No differencing is needed as the data is recorded yearly.

Accommodation takings in the state of Tasmania from aus_accommodation.

tasmania <- aus_accommodation |>
  filter(State == "Tasmania")

kbl(head(tasmania)) |>
  kable_classic(full_width = F, html_font = "Cambria")

Date	State	Takings	Occupancy	CPI
1998 Q1	Tasmania	28.69907	67	67.0
1998 Q2	Tasmania	19.01751	45	67.4
1998 Q3	Tasmania	16.10556	39	67.5
1998 Q4	Tasmania	25.88618	56	67.8
1999 Q1	Tasmania	28.35130	66	67.8
1999 Q2	Tasmania	20.12543	48	68.1

tasmania |>
  autoplot(Takings) +
  labs(title = "Tasmanian Tourist Accommodation Takings")

lambda <- 
  tasmania |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

tasmania |>
  autoplot(box_cox(Takings, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='')

First Order Differencing

Since we are provided quarterly data, we will use a seasonality (lag) of 4 for the difference() function.

trans_tasmania <-
  tasmania |>
  mutate(bc_trans = box_cox(Takings, lambda)) 

trans_tasmania |>
  autoplot(difference(bc_trans, lag=4)) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "First Order Differencing")

Unfortunately we still see issues of obtaining stationary data as our first three lags are outside our critical values.

trans_tasmania |>
  ACF(difference(bc_trans, lag=4)) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: First Order Differencing")

Second Order Differencing

trans_tasmania |>
  autoplot(difference(bc_trans, lag=4) |> difference()) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "Second Order Differencing")

Even after our second order, we still have lags outside of our critical values, so we are still unable to make it completely stationary, even though we have significant improvements.

trans_tasmania |>
  ACF(difference(bc_trans, lag=4) |> difference()) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: Second Order Differencing")

Monthly sales from souvenirs.

kbl(head(souvenirs)) |>
  kable_classic(full_width = F, html_font = "Cambria")

Month	Sales
1987 Jan	1664.81
1987 Feb	2397.53
1987 Mar	2840.71
1987 Apr	3547.29
1987 May	3752.96
1987 Jun	3714.74

souvenirs |>
  autoplot(Sales) +
  labs(title = "Souvenir Sales")

lambda <- 
  souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

souvenirs |>
  autoplot(box_cox(Sales, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='')

First Order Differencing

Since we are provided monthly data, we will use a seasonality (lag) of 12 for the difference() function.

trans_souvenirs <-
  souvenirs |>
  mutate(bc_trans = box_cox(Sales, lambda)) 

trans_souvenirs |>
  autoplot(difference(bc_trans, lag=12)) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "First Order Differencing")

We still see issues of obtaining stationary data as our first two lags are outside our critical values.

trans_souvenirs |>
  ACF(difference(bc_trans, lag=12)) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: First Order Differencing")

Second Order Differencing

trans_souvenirs |>
  autoplot(difference(bc_trans, lag=12) |> difference()) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "Second Order Differencing")

After our second order, we have improved the data to become stationary with only the initial lag outside the boundaries.

trans_souvenirs |>
  ACF(difference(bc_trans, lag=12) |> difference()) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: Second Order Differencing")

Problem 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(42)

retail <- 
  aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

kbl(head(retail)) |>
  kable_classic(full_width = F, html_font = "Cambria")

State	Industry	Series ID	Month	Turnover
Western Australia	Newspaper and book retailing	A3349822A	1982 Apr	9.7
Western Australia	Newspaper and book retailing	A3349822A	1982 May	11.0
Western Australia	Newspaper and book retailing	A3349822A	1982 Jun	10.7
Western Australia	Newspaper and book retailing	A3349822A	1982 Jul	9.0
Western Australia	Newspaper and book retailing	A3349822A	1982 Aug	9.1
Western Australia	Newspaper and book retailing	A3349822A	1982 Sep	10.0

retail |>
  autoplot(Turnover) +
  labs(title = "Newspaper and Book Turnover")

lambda <- 
  retail |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

retail |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='')

First Order Differencing

Since we are provided monthly data, we will use a seasonality (lag) of 12 for the difference() function.

trans_retail <-
  retail |>
  mutate(bc_trans = box_cox(Turnover, lambda)) 

trans_retail |>
  autoplot(difference(bc_trans, lag=12)) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "First Order Differencing")

We still see a lot of issues of obtaining stationary data as multiple lags are outside our critical values.

trans_retail |>
  ACF(difference(bc_trans, lag=12)) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: First Order Differencing")

Second Order Differencing

trans_retail |>
  autoplot(difference(bc_trans, lag=12) |> difference()) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "Second Order Differencing")

After our second order, we have significantly improved the data to become stationary. However, at lag 12, we have a serious issue of autocorrelation occurring.

trans_retail |>
  ACF(difference(bc_trans, lag=12) |> difference()) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: Second Order Differencing")

Problem 9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\).

set.seed(42)

y <- numeric(100)

e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

sim <- 
  tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

sim |>
  autoplot(y) +
  labs(title = "AR(1)", caption = "Phi = 0.6")

When \(\phi\) is adjusted throughout the time plot series, we can see the variation becomes much more stable as we approach zero, then becomes increasingly different when we \(\phi\) is approaching 1 again.

set.seed(42)

phi_step <-
  seq(from = -1, to = 1, by = 0.25)

sim_phi <- 
  as.data.frame(matrix(ncol = 2, nrow = 0))

colnames(sim_phi) <- c("phi_val", "y")

for (j in 1:length(phi_step)){
    y <- 
      numeric(100)
    e <- 
      rnorm(100)
    
    phi <- 
      phi_step[j]
    
    phi_val <- 
      (y + 1) * phi
    
    for (i in 2:100){
        y[i] <- (phi * y[i-1]) + e[i]
    }
    idx <- 
      seq_len(100)
    
    new_rows <- 
      cbind(idx, phi_val, y)
    
    sim_phi <- 
      rbind(sim_phi, new_rows)
}

sim_phi |>
  as_tsibble(index = idx, key = phi_val) |>
  autoplot(y) +
  facet_wrap(~ phi_val, ncol = 3) +
  theme_bw() + 
  theme(legend.position = "none")

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

set.seed(42)

y <- numeric(100)

e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

sim <- 
  tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

When we look at \(\theta\) approaching 0, we see that our values are closer to the mean. When it begins to approach towards 1, the variation in values begin to become larger again.

set.seed(42)

theta_step <-
  seq(from = -1, to = 1, by = 0.25)

sim_theta <- 
  as.data.frame(matrix(ncol = 2, nrow = 0))

colnames(sim_theta) <- c("theta_val", "y")

for (j in 1:length(theta_step)){
    y <- 
      numeric(100)
    e <- 
      rnorm(100)
    
    theta <- 
      theta_step[j]
    
    theta_val <- 
      (y + 1) * theta
    
    for (i in 2:100){
        y[i] <- (theta * e[i-1]) + e[i]
    }
    idx <- 
      seq_len(100)
    
    new_rows <- 
      cbind(idx, theta_val, y)
    
    sim_theta <- 
      rbind(sim_theta, new_rows)
}

sim_theta |>
  as_tsibble(index = idx, key = theta_val) |>
  autoplot(y) +
  facet_wrap(~ theta_val, ncol = 3) +
  theme_bw() + 
  theme(legend.position = "none")

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

y <- numeric(100)

e <- rnorm(100)

for (i in 2:100){
    y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]
}

sim_arma <- 
  tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\theta_1 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

y <- numeric(100)

e <- rnorm(100)

for (i in 3:100){
    y[i] <- (-0.8 * y[i-1]) + (0.3 * y[i-2]) + e[i]
}

sim_ar <- 
  tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

ARMA(1,1) does not show any specific trend or seasonality. AR(2) starts from almost a straight line with approximately zero variance and begins to grow exponentially in variance as we go further through the series.

arma_plot <-
  sim_arma |>
  autoplot(y) +
  labs(title = "ARMA(1,1)")

ar_plot <- 
  sim_ar |>
  autoplot(y) +
  labs(title="AR(2)" , caption = latex2exp::TeX("ARMA(1,1): ($\\phi_1 = 0.6$, $\\theta_1 = 0.6$)
                                                AR(2): ($\\phi_1 = -0.8$, $\\phi_2 = 0.3$)"))

plot_grid(arma_plot, ar_plot, ncol=1)

Problem 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

kbl(head(aus_airpassengers)) |>
  kable_classic(full_width = F, html_font = "Cambria")

Year	Passengers
1970	7.3187
1971	7.3266
1972	7.7956
1973	9.3846
1974	10.6647
1975	11.0551

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

ARIMA(0,2,1) model was selected

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

We see the remainder as white noise.

fit |>
  gg_tsresiduals()

pass_fc <- 
  fit |>
  forecast(h="10 years")

pass_fc |>
  autoplot(aus_airpassengers) +
  labs(title = "Australian Air Passengers", caption = "Forecasting Next 10 Years")

Write the model in terms of the backshift operator.

\(y_t = (1 + \theta B) \epsilon_t(1-B2)\)

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

We can see the forecast is dampened and the prediction intervals appear to be narrower.

fit <- 
  aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

arima_fc_010 <- 
  fit |>
  forecast(h="10 years")

arima_fc_010 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift", caption = "ARIMA(0, 1, 0)")

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

The forecasts trends are similar, but now we have more variation within the forecasted years and not a straight line. When we removed the constant an error is given thar the forecasts can’t be created.

fit <- 
  aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

arima_fc_212 <- 
  fit |>
  forecast(h="10 years")

arima_fc_212 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift", caption = "ARIMA(2, 1, 2)")

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

We now get a warning

Warning: Model specification induces a quadratic or higher order polynomial trend. 
This is generally discouraged, consider removing the constant or reducing the number of differences.

fit <- 
  aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

arima_fc_021 <- 
  fit |>
  forecast(h="10 years")

arima_fc_021 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift", caption = "ARIMA(0, 2, 1)")

ARIMA

John Cruz

2024-03-14

Required Libraries

Problem 9.1

Problem 9.2

Problem 9.3

Problem 9.5

Problem 9.6

Problem 9.7

Problem 9.8