Required Libraries

library(tidyverse)
library(fpp3)
library(kableExtra)
library(cowplot)

Problem 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  1. Explain the differences among these figures. Do they all indicate that the data are white noise?

The differences among these figures are that the ACF range are different. We can see as we progress from left to right, the ranges becomes narrower. They all indicate that they are white noise as we do not see any lags cross the blue line boundary.

  1. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

As our sample size increases, we see that the variation between two numbers become less influential to each other compared to smaller sample sizes. As we have less numbers to choose from, and wider ranges, the ACF then detects these critical values at a larger scale from zero.

Problem 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

amazon <- 
  gafa_stock |>
  filter(Symbol == 'AMZN') |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index=trading_day, regular=T)


kbl(head(amazon)) |>
  kable_classic(full_width = F, html_font = "Cambria")
Symbol Date Open High Low Close Adj_Close Volume trading_day
AMZN 2014-01-02 398.80 399.36 394.02 397.97 397.97 2137800 1
AMZN 2014-01-03 398.29 402.71 396.22 396.44 396.44 2210200 2
AMZN 2014-01-06 395.85 397.00 388.42 393.63 393.63 3170600 3
AMZN 2014-01-07 395.04 398.47 394.29 398.03 398.03 1916000 4
AMZN 2014-01-08 398.47 403.00 396.04 401.92 401.92 2316500 5
AMZN 2014-01-09 403.71 406.89 398.44 401.01 401.01 2103000 6

We can clearly see a strong upward trend where the data is not stationary in its closing price.

amazon |>
  autoplot(Close) +
  labs(x = "Day", y="Price ($ USD)", title = "Amazon Closing Price")

We see high autocorrelation in each lag that has a slowly decreasing effect as we lag further ahead.

amazon |>
  ACF(Close) |>
  autoplot() +
  labs(subtitle = "Amazon ACF")

Our PACF plot shows an extremely high autocorrelation in the first lag, while there are more lags outside our critical values of concern later on.

amazon |>
  PACF(Close) |>
  autoplot() +
  labs(title = "Amazon PACF")

Problem 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  1. Turkish GDP from global_economy.
turkey <- 
  global_economy |>
  filter(Country == "Turkey")

kbl(head(turkey)) |>
  kable_classic(full_width = F, html_font = "Cambria")
Country Code Year GDP Growth CPI Imports Exports Population
Turkey TUR 1960 13995067818 NA 5.40e-05 3.671072 2.055800 27472331
Turkey TUR 1961 8022222222 1.156069 5.57e-05 6.786704 5.124654 28146893
Turkey TUR 1962 8922222222 5.571429 5.78e-05 7.970112 5.603985 28832805
Turkey TUR 1963 10355555556 9.066306 6.15e-05 6.974249 4.184549 29531342
Turkey TUR 1964 11177777778 5.459057 6.22e-05 5.467197 4.473161 30244232
Turkey TUR 1965 11944444444 2.823529 6.50e-05 5.395349 4.558140 30972965
turkey |>
  autoplot(GDP) +
  labs(title = "Turkey GDP", x='Year', y='')

lambda <- 
  turkey |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

turkey |>
  autoplot(box_cox(GDP, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Turkey GDP with $\\lambda$ = ",
         round(lambda,3))), 
       x='Year', 
       y='')

No differencing is needed as the data is recorded yearly.

  1. Accommodation takings in the state of Tasmania from aus_accommodation.
tasmania <- aus_accommodation |>
  filter(State == "Tasmania")

kbl(head(tasmania)) |>
  kable_classic(full_width = F, html_font = "Cambria")
Date State Takings Occupancy CPI
1998 Q1 Tasmania 28.69907 67 67.0
1998 Q2 Tasmania 19.01751 45 67.4
1998 Q3 Tasmania 16.10556 39 67.5
1998 Q4 Tasmania 25.88618 56 67.8
1999 Q1 Tasmania 28.35130 66 67.8
1999 Q2 Tasmania 20.12543 48 68.1
tasmania |>
  autoplot(Takings) +
  labs(title = "Tasmanian Tourist Accommodation Takings")

lambda <- 
  tasmania |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

tasmania |>
  autoplot(box_cox(Takings, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='')

First Order Differencing

Since we are provided quarterly data, we will use a seasonality (lag) of 4 for the difference() function.

trans_tasmania <-
  tasmania |>
  mutate(bc_trans = box_cox(Takings, lambda)) 

trans_tasmania |>
  autoplot(difference(bc_trans, lag=4)) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "First Order Differencing")

Unfortunately we still see issues of obtaining stationary data as our first three lags are outside our critical values.

trans_tasmania |>
  ACF(difference(bc_trans, lag=4)) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: First Order Differencing")

Second Order Differencing

trans_tasmania |>
  autoplot(difference(bc_trans, lag=4) |> difference()) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "Second Order Differencing")

Even after our second order, we still have lags outside of our critical values, so we are still unable to make it completely stationary, even though we have significant improvements.

trans_tasmania |>
  ACF(difference(bc_trans, lag=4) |> difference()) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Tasmania Takings with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: Second Order Differencing")

  1. Monthly sales from souvenirs.
kbl(head(souvenirs)) |>
  kable_classic(full_width = F, html_font = "Cambria")
Month Sales
1987 Jan 1664.81
1987 Feb 2397.53
1987 Mar 2840.71
1987 Apr 3547.29
1987 May 3752.96
1987 Jun 3714.74
souvenirs |>
  autoplot(Sales) +
  labs(title = "Souvenir Sales")

lambda <- 
  souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

souvenirs |>
  autoplot(box_cox(Sales, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='')

First Order Differencing

Since we are provided monthly data, we will use a seasonality (lag) of 12 for the difference() function.

trans_souvenirs <-
  souvenirs |>
  mutate(bc_trans = box_cox(Sales, lambda)) 

trans_souvenirs |>
  autoplot(difference(bc_trans, lag=12)) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "First Order Differencing")

We still see issues of obtaining stationary data as our first two lags are outside our critical values.

trans_souvenirs |>
  ACF(difference(bc_trans, lag=12)) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: First Order Differencing")

Second Order Differencing

trans_souvenirs |>
  autoplot(difference(bc_trans, lag=12) |> difference()) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "Second Order Differencing")

After our second order, we have improved the data to become stationary with only the initial lag outside the boundaries.

trans_souvenirs |>
  ACF(difference(bc_trans, lag=12) |> difference()) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Souvenir Sales with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: Second Order Differencing")

Problem 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(42)

retail <- 
  aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

kbl(head(retail)) |>
  kable_classic(full_width = F, html_font = "Cambria")
State Industry Series ID Month Turnover
Western Australia Newspaper and book retailing A3349822A 1982 Apr 9.7
Western Australia Newspaper and book retailing A3349822A 1982 May 11.0
Western Australia Newspaper and book retailing A3349822A 1982 Jun 10.7
Western Australia Newspaper and book retailing A3349822A 1982 Jul 9.0
Western Australia Newspaper and book retailing A3349822A 1982 Aug 9.1
Western Australia Newspaper and book retailing A3349822A 1982 Sep 10.0
retail |>
  autoplot(Turnover) +
  labs(title = "Newspaper and Book Turnover")

lambda <- 
  retail |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

retail |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='')

First Order Differencing

Since we are provided monthly data, we will use a seasonality (lag) of 12 for the difference() function.

trans_retail <-
  retail |>
  mutate(bc_trans = box_cox(Turnover, lambda)) 

trans_retail |>
  autoplot(difference(bc_trans, lag=12)) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "First Order Differencing")

We still see a lot of issues of obtaining stationary data as multiple lags are outside our critical values.

trans_retail |>
  ACF(difference(bc_trans, lag=12)) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: First Order Differencing")

Second Order Differencing

trans_retail |>
  autoplot(difference(bc_trans, lag=12) |> difference()) +
  labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "Second Order Differencing")

After our second order, we have significantly improved the data to become stationary. However, at lag 12, we have a serious issue of autocorrelation occurring.

trans_retail |>
  ACF(difference(bc_trans, lag=12) |> difference()) |>
  autoplot() +
   labs(caption = latex2exp::TeX(paste0(
         "Transformed Newspaper and Book Turnover with $\\lambda$ = ",
         round(lambda,3))), 
       y='',
       title = "ACF: Second Order Differencing")

Problem 9.6

Simulate and plot some data from simple ARIMA models.

  1. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\).
set.seed(42)

y <- numeric(100)

e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

sim <- 
  tsibble(idx = seq_len(100), y = y, index = idx)
  1. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
sim |>
  autoplot(y) +
  labs(title = "AR(1)", caption = "Phi = 0.6")

When \(\phi\) is adjusted throughout the time plot series, we can see the variation becomes much more stable as we approach zero, then becomes increasingly different when we \(\phi\) is approaching 1 again.

set.seed(42)

phi_step <-
  seq(from = -1, to = 1, by = 0.25)

sim_phi <- 
  as.data.frame(matrix(ncol = 2, nrow = 0))

colnames(sim_phi) <- c("phi_val", "y")

for (j in 1:length(phi_step)){
    y <- 
      numeric(100)
    e <- 
      rnorm(100)
    
    phi <- 
      phi_step[j]
    
    phi_val <- 
      (y + 1) * phi
    
    for (i in 2:100){
        y[i] <- (phi * y[i-1]) + e[i]
    }
    idx <- 
      seq_len(100)
    
    new_rows <- 
      cbind(idx, phi_val, y)
    
    sim_phi <- 
      rbind(sim_phi, new_rows)
}

sim_phi |>
  as_tsibble(index = idx, key = phi_val) |>
  autoplot(y) +
  facet_wrap(~ phi_val, ncol = 3) +
  theme_bw() + 
  theme(legend.position = "none")

  1. Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
set.seed(42)

y <- numeric(100)

e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

sim <- 
  tsibble(idx = seq_len(100), y = y, index = idx)
  1. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

When we look at \(\theta\) approaching 0, we see that our values are closer to the mean. When it begins to approach towards 1, the variation in values begin to become larger again.

set.seed(42)

theta_step <-
  seq(from = -1, to = 1, by = 0.25)

sim_theta <- 
  as.data.frame(matrix(ncol = 2, nrow = 0))

colnames(sim_theta) <- c("theta_val", "y")

for (j in 1:length(theta_step)){
    y <- 
      numeric(100)
    e <- 
      rnorm(100)
    
    theta <- 
      theta_step[j]
    
    theta_val <- 
      (y + 1) * theta
    
    for (i in 2:100){
        y[i] <- (theta * e[i-1]) + e[i]
    }
    idx <- 
      seq_len(100)
    
    new_rows <- 
      cbind(idx, theta_val, y)
    
    sim_theta <- 
      rbind(sim_theta, new_rows)
}

sim_theta |>
  as_tsibble(index = idx, key = theta_val) |>
  autoplot(y) +
  facet_wrap(~ theta_val, ncol = 3) +
  theme_bw() + 
  theme(legend.position = "none")

  1. Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
y <- numeric(100)

e <- rnorm(100)

for (i in 2:100){
    y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]
}

sim_arma <- 
  tsibble(idx = seq_len(100), y = y, index = idx)
  1. Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\theta_1 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
y <- numeric(100)

e <- rnorm(100)

for (i in 3:100){
    y[i] <- (-0.8 * y[i-1]) + (0.3 * y[i-2]) + e[i]
}

sim_ar <- 
  tsibble(idx = seq_len(100), y = y, index = idx)
  1. Graph the latter two series and compare them.

ARMA(1,1) does not show any specific trend or seasonality. AR(2) starts from almost a straight line with approximately zero variance and begins to grow exponentially in variance as we go further through the series.

arma_plot <-
  sim_arma |>
  autoplot(y) +
  labs(title = "ARMA(1,1)")

ar_plot <- 
  sim_ar |>
  autoplot(y) +
  labs(title="AR(2)" , caption = latex2exp::TeX("ARMA(1,1): ($\\phi_1 = 0.6$, $\\theta_1 = 0.6$)
                                                AR(2): ($\\phi_1 = -0.8$, $\\phi_2 = 0.3$)"))

plot_grid(arma_plot, ar_plot, ncol=1)

Problem 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

kbl(head(aus_airpassengers)) |>
  kable_classic(full_width = F, html_font = "Cambria")
Year Passengers
1970 7.3187
1971 7.3266
1972 7.7956
1973 9.3846
1974 10.6647
1975 11.0551
  1. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

ARIMA(0,2,1) model was selected

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))

report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

We see the remainder as white noise.

fit |>
  gg_tsresiduals()

pass_fc <- 
  fit |>
  forecast(h="10 years")

pass_fc |>
  autoplot(aus_airpassengers) +
  labs(title = "Australian Air Passengers", caption = "Forecasting Next 10 Years")

  1. Write the model in terms of the backshift operator.

\(y_t = (1 + \theta B) \epsilon_t(1-B2)\)

  1. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

We can see the forecast is dampened and the prediction intervals appear to be narrower.

fit <- 
  aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

arima_fc_010 <- 
  fit |>
  forecast(h="10 years")

arima_fc_010 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift", caption = "ARIMA(0, 1, 0)")

  1. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

The forecasts trends are similar, but now we have more variation within the forecasted years and not a straight line. When we removed the constant an error is given thar the forecasts can’t be created.

fit <- 
  aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

arima_fc_212 <- 
  fit |>
  forecast(h="10 years")

arima_fc_212 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift", caption = "ARIMA(2, 1, 2)")

  1. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

We now get a warning

Warning: Model specification induces a quadratic or higher order polynomial trend. 
This is generally discouraged, consider removing the constant or reducing the number of differences.
fit <- 
  aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

arima_fc_021 <- 
  fit |>
  forecast(h="10 years")

arima_fc_021 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift", caption = "ARIMA(0, 2, 1)")

Problem 9.8