Chapter 8 - ARIMA models
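
The code below assumes the usual session setup for these exercises (a sketch; the original setup chunk is not shown): fpp2 for the data sets and forecasting functions, tseries for kpss.test(), and readxl for reading the retail spreadsheet.

library(fpp2)     # loads forecast, ggplot2 and the exercise data sets (ibmclose, usnetelec, ..., wmurders)
library(tseries)  # kpss.test()
library(readxl)   # read_excel() for the retail spreadsheet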



1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
   a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 8.31: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

   b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? (A quick numerical check follows below.)
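
All three figures are consistent with white noise: the sample autocorrelations stay within the critical bounds. For white noise those 95% bounds are approximately \(\pm 1.96/\sqrt{T}\), so they shrink as the series length \(T\) grows, and the autocorrelations differ across panels simply because each is computed from a different random sample. A quick check of the bounds (a computation added to the original solution):

# 95% critical values for the ACF of white noise: +/- 1.96/sqrt(T)
1.96 / sqrt(c(36, 360, 1000))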


2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose)
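
The time plot wanders with no stable mean, the ACF decays very slowly from values near 1, and the PACF has a single large spike at lag 1 — all signatures of a non-stationary, random-walk-like series. As a follow-up check (an addition to the original solution), the differenced series should look close to white noise:

ggtsdisplay(diff(ibmclose))  # differenced series: ACF/PACF should show little remaining structure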



3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
   a. usnetelec
   b. usgdp
   c. mcopper
   d. enplanements
   e. visitors
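
Before the series-by-series checks, ndiffs() gives a quick suggested (non-seasonal) differencing order for each series (a compact summary added here; the detailed tests follow):

sapply(list(usnetelec = usnetelec, usgdp = usgdp, mcopper = mcopper,
            enplanements = enplanements, visitors = visitors), ndiffs)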
autoplot(usnetelec)

Box.test(diff(usnetelec), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(usnetelec)
## X-squared = 0.8508, df = 1, p-value = 0.3563
kpss.test(diff(usnetelec))
## Warning in kpss.test(diff(usnetelec)): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(usnetelec)
## KPSS Level = 0.15848, Truncation lag parameter = 3, p-value = 0.1
autoplot(usgdp)

Box.test(diff(usgdp), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(usgdp)
## X-squared = 39.187, df = 1, p-value = 3.85e-10
autoplot(diff(usgdp))

ndiffs(usgdp)
## [1] 2
autoplot(diff(diff(usgdp)))

Box.test(diff(diff(usgdp)), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(diff(usgdp))
## X-squared = 53.294, df = 1, p-value = 2.872e-13
ggAcf(diff(diff(usgdp)))

kpss.test(diff(diff(usnetelec)))
## Warning in kpss.test(diff(diff(usnetelec))): p-value greater than printed
## p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(usnetelec))
## KPSS Level = 0.098532, Truncation lag parameter = 3, p-value = 0.1
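
Note that the KPSS call above re-tests the twice-differenced usnetelec; the series under discussion at this point is usgdp, so the analogous check (presumably what was intended) would be:

kpss.test(diff(diff(usgdp)))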
autoplot(mcopper)

lambda_mcopper <- BoxCox.lambda(mcopper)
autoplot(diff(BoxCox(mcopper, lambda_mcopper)))

Box.test(diff(BoxCox(mcopper, lambda_mcopper)), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(BoxCox(mcopper, lambda_mcopper))
## X-squared = 57.517, df = 1, p-value = 3.353e-14
ggAcf(diff(BoxCox(mcopper, lambda_mcopper)))

kpss.test(diff(BoxCox(mcopper, lambda_mcopper)))
## Warning in kpss.test(diff(BoxCox(mcopper, lambda_mcopper))): p-value
## greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(BoxCox(mcopper, lambda_mcopper))
## KPSS Level = 0.057275, Truncation lag parameter = 6, p-value = 0.1
autoplot(enplanements)

lambda_enplanements <- BoxCox.lambda(enplanements)
ndiffs(enplanements)
## [1] 1
nsdiffs(enplanements)
## [1] 1
autoplot(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)))

Box.test(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## X-squared = 29.562, df = 1, p-value = 5.417e-08
ggAcf(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))

kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements), :
## p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## KPSS Level = 0.042424, Truncation lag parameter = 5, p-value = 0.1
autoplot(visitors)

lambda_visitors <- BoxCox.lambda(visitors)
ndiffs(visitors)
## [1] 1
nsdiffs(visitors)
## [1] 1
autoplot(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))

Box.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)),type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## X-squared = 21.804, df = 1, p-value = 3.02e-06
ggAcf(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))

kpss.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(visitors, lambda_visitors), lag =
## 12))): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## KPSS Level = 0.015833, Truncation lag parameter = 4, p-value = 0.1


4. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retail <- read_excel("/Users/hovig/Downloads/retail.xlsx", skip=1)
retail.ts <- ts(retail[,"A3349873A"], frequency=12, start=c(1982,4))
autoplot(retail.ts)

ndiffs(retail.ts)
## [1] 1
nsdiffs(retail.ts)
## [1] 1
kpss.test(diff(diff(BoxCox(retail.ts, BoxCox.lambda(retail.ts)),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(retail.ts,
## BoxCox.lambda(retail.ts)), : p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(retail.ts, BoxCox.lambda(retail.ts)), lag = 12))
## KPSS Level = 0.013817, Truncation lag parameter = 5, p-value = 0.1
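
As with the earlier series, it is worth plotting the transformed and differenced data as a visual check (an extra step beyond the original output):

lambda_retail <- BoxCox.lambda(retail.ts)
autoplot(diff(diff(BoxCox(retail.ts, lambda_retail), lag = 12)))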


5. Use R to simulate and plot some data from simple ARIMA models.
   a. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
   b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
   c. Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
   d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
   e. Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
   f. Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
   g. Graph the latter two series and compare them.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
   y[i] <- 0.6*y[i-1] + e[i]
}
ar1generator <- function(phi1){
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi1*y[i-1] + e[i]
  }
  return(y)
}
autoplot(ar1generator(0.3), series = "0.3") +
  geom_line(size = 1, colour = "red") +
  autolayer(y, series = "0.6", size = 1) +
  autolayer(ar1generator(0.9), size = 1, series = "0.9") +
  ylab("AR(1) models") +
  guides(colour = guide_legend(title = "Phi1"))

ma1generator <- function(theta1){
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- theta1*e[i-1] + e[i]
  }
  return(y)
}
autoplot(ma1generator(0.3), series = "0.3") +
  geom_line(size = 1, colour = "red") +
  autolayer(ma1generator(0.6), series = "0.6", size = 1) +
  autolayer(ma1generator(0.9), size = 1, series = "0.9") +
  ylab("MA(1) models") +
  guides(colour = guide_legend(title = "Theta1"))

y_arima.1.0.1 <- ts(numeric(50))
e <- rnorm(50)
for(i in 2:50){
   y_arima.1.0.1[i] <- 0.6*y_arima.1.0.1[i-1] + 0.6*e[i-1] + e[i]
}
y_arima.2.0.0 <- ts(numeric(50))
e <- rnorm(50)
for(i in 3:50){
   y_arima.2.0.0[i] <- -0.8*y_arima.2.0.0[i-1] + 0.3*y_arima.2.0.0[i-2] + e[i]
}
autoplot(y_arima.1.0.1, series = "ARMA(1, 1)") +
  autolayer(y_arima.2.0.0, series = "AR(2)") +
  ylab("y") +
  guides(colour = guide_legend(title = "Models"))

autoplot(y_arima.1.0.1)
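
The stationary models above can also be simulated with stats::arima.sim as a cross-check (a sketch added here; arima.sim rejects non-stationary AR coefficients, so the AR(2) case still needs the manual loop):

set.seed(42)
y_ar1  <- arima.sim(model = list(ar = 0.6), n = 100)            # AR(1)
y_ma1  <- arima.sim(model = list(ma = 0.6), n = 100)            # MA(1)
y_arma <- arima.sim(model = list(ar = 0.6, ma = 0.6), n = 100)  # ARMA(1,1)
autoplot(y_arma, series = "ARMA(1,1)") + autolayer(y_ar1, series = "AR(1)")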



6. Consider the number of women murdered each year (per 100,000 standard population) in the United States (data set wmurders).
   a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
   b. Should you include a constant in the model? Explain.
   c. Write this model in terms of the backshift operator.
   d. Fit the model using R and examine the residuals. Is the model satisfactory?
   e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
   f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
   g. Does auto.arima give the same model you have chosen? If not, which model do you think is better?
autoplot(wmurders)

autoplot(diff(wmurders))

ndiffs(wmurders)
## [1] 2
autoplot(diff(wmurders, differences = 2))

kpss.test(diff(wmurders, differences = 2))
## Warning in kpss.test(diff(wmurders, differences = 2)): p-value greater than
## printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(wmurders, differences = 2)
## KPSS Level = 0.045793, Truncation lag parameter = 3, p-value = 0.1
diff(wmurders, differences = 2) %>% ggtsdisplay()

wmurders_arima.0.2.2 <- Arima(wmurders, order = c(0, 2, 2))
checkresiduals(wmurders_arima.0.2.2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10
fc_wmurders_arima.0.2.2 <- forecast(wmurders_arima.0.2.2, h = 3)
fc_wmurders_arima.0.2.2$mean
## Time Series:
## Start = 2005 
## End = 2007 
## Frequency = 1 
## [1] 2.480525 2.374890 2.269256
fc_wmurders_arima.0.2.2$model
## Series: wmurders 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1     ma2
##       -1.0181  0.1470
## s.e.   0.1220  0.1156
## 
## sigma^2 estimated as 0.04702:  log likelihood=6.03
## AIC=-6.06   AICc=-5.57   BIC=-0.15

In terms of the backshift operator, the fitted ARIMA(0,2,2) model is

\((1 - B)^2 y_t = (1 - 1.0181B + 0.1470B^2)\varepsilon_t,\)

which expands to

\(y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t - 1.0181\varepsilon_{t-1} + 0.1470\varepsilon_{t-2}.\)

years <- length(wmurders)
e <- fc_wmurders_arima.0.2.2$residuals
# One step ahead: future errors are set to 0, so only e_T and e_{T-1} remain
fc1 <- 2*wmurders[years] - wmurders[years - 1] - 1.0181*e[years] + 0.1470*e[years - 1]
# Two steps ahead: e_{T+1} = 0, but the theta_2 term still involves e_T
fc2 <- 2*fc1 - wmurders[years] + 0.1470*e[years]
# Three steps ahead: all remaining error terms are future errors, hence 0
fc3 <- 2*fc2 - fc1
c(fc1, fc2, fc3)
## [1] 2.480523 2.374887 2.269252
autoplot(fc_wmurders_arima.0.2.2)

fc_wmurders_autoarima <- forecast(auto.arima(wmurders), h = 3)
accuracy(fc_wmurders_arima.0.2.2)
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
##                     ACF1
## Training set -0.05094066
accuracy(fc_wmurders_autoarima)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343
fc_wmurders_autoarima2 <- forecast(auto.arima(wmurders, stepwise = FALSE, approximation = FALSE), h = 3)
accuracy(fc_wmurders_autoarima2)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
##                     ACF1
## Training set -0.03193856
Finally, compare the residual diagnostics of the chosen ARIMA(0,2,2) model with those of the model found by auto.arima:

checkresiduals(fc_wmurders_arima.0.2.2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10
checkresiduals(fc_wmurders_autoarima2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10