Chapter 8 - ARIMA - Exercises

library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
library(gridExtra)

1. White Noise ACFs

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

x1 <- rnorm(36)
x2 <- rnorm(360)
x3 <- rnorm(1000)
p1 <- ggAcf(x1, lag.max = 20) + ylim(-1, 1) + ggtitle("")
p2 <- ggAcf(x2, lag.max = 20) + ylim(-1, 1) + ggtitle("")
p3 <- ggAcf(x3, lag.max = 20) + ylim(-1, 1) + ggtitle("")
gridExtra::grid.arrange(p1, p2, p3, nrow = 1)

Figure 8.31: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

The dashed blue lines indicate whether the autocorrelations are significantly different from zero. If one or more large spikes fall outside these bounds, or if substantially more than 5% of the spikes do, then the series is probably not white noise. Since all of the spikes lie within the bounds in every panel, all three figures are consistent with white noise (and hence with weakly stationary processes).

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Each series has a different number of observations. The 95% critical values lie at \(\pm 2/\sqrt{T}\), where T is the length of the series, so the bounds become narrower as the number of observations increases and the criterion for qualifying as white noise becomes stricter. The autocorrelations themselves also differ because they are estimates subject to sampling variation; with more observations they lie closer to their true value of zero.
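A quick arithmetic check of those bounds for the three sample sizes used above:

# Approximate 95% critical values for the ACF of a white noise series of length T
T <- c(36, 360, 1000)
round(2/sqrt(T), 3)   # roughly 0.333, 0.105, 0.063 -- narrower as T grows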

2. IBM Stock Prices

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(ibmclose)

The ACF plot shows strong persistence over the course of 26 lag periods, a sign of autocorrelation and thus non-stationarity. In the PACF plot, only the first lag shows strong autocorrelation (at or near 1.0), with the subsequent partial autocorrelations resembling white noise once the effects of earlier lags are removed.

After first-differencing, the persistence dissipates, but a number of spikes still exceed the critical values.

ggtsdisplay(diff(ibmclose))

3. Transforming Various Time Series to Stationary Data

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Box-Cox Transformation Helper Function

bc.transform <- function (ts) {
  # Estimate the Box-Cox parameter for the series and report it
  lambda <- BoxCox.lambda(ts)
  print(paste0("Lambda: ", round(lambda, 3)))
  # Return the transformed series invisibly so it can be piped onwards
  df <- BoxCox(ts, lambda)
  invisible(df)
}
a. usnetelec (Annual)
# frequency(usnetelec) = 1
# ndiffs(usnetelec) = 1
usnetelec.stat <- usnetelec %>% bc.transform() %>% diff()
## [1] "Lambda: 0.517"
df1 <- cbind(RawSeries = usnetelec, DifferencedData = usnetelec.stat)
autoplot(df1, facet=TRUE)

Straightforward first-differencing appears to make the series weakly stationary, as seen from the ACF and PACF plots. A Box-Cox transformation with lambda = 0.517 stabilizes the variance so that it depends less on the level of the series.

ggtsdisplay(usnetelec.stat)
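As a sanity check (a sketch, not part of the original answer), ndiffs() can be re-run on the transformed and differenced series; it should report that no further differencing is required:

# ndiffs() defaults to the KPSS test; a value of 0 here means no further
# differencing is suggested for the transformed, differenced series
ndiffs(usnetelec.stat)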

b. usgdp (Quarterly)
# frequency(usgdp) = 4
# ndiffs(usgdp) = 2
# nsdiffs(usgdp) = 0 
usgdp.stat <- usgdp %>% bc.transform() %>% diff() %>% diff()
## [1] "Lambda: 0.366"
df2 <- cbind(RawSeries = usgdp, DifferencedData = usgdp.stat)
autoplot(df2, facet=TRUE)

ggtsdisplay(usgdp.stat)

Despite twice-differencing (as indicated by the ndiffs() function, which is based on the KPSS test), there is still a large negative spike at lag 1, which can be a sign of over-differencing.
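One rough way to check for over-differencing (an informal diagnostic, not from the original answer) is to compare the variance of the once- and twice-differenced transformed series, since over-differencing typically inflates the variance:

usgdp.bc <- BoxCox(usgdp, BoxCox.lambda(usgdp))
var(diff(usgdp.bc))                   # single difference
var(diff(usgdp.bc, differences = 2))  # double difference; a larger value hints at over-differencing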

c. mcopper (Monthly)
# frequency(mcopper) = 12
# ndiffs(mcopper) = 1
# nsdiffs(mcopper) = 0 
mcopper.stat <- mcopper %>% bc.transform() %>% diff()
## [1] "Lambda: 0.192"
df3 <- cbind(RawSeries = mcopper, DifferencedData = mcopper.stat)
autoplot(df3, facet=TRUE)

ggtsdisplay(mcopper.stat)

In the case of the mcopper series, the best we can do to reduce non-stationarity is a Box-Cox transformation followed by first-differencing, although the ACF and PACF of the result still show significant autocorrelation at several lags.
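To quantify those remaining problems, a portmanteau test can be applied to the differenced series (a sketch; the choice of 24 lags is an assumption for monthly data):

# Ljung-Box test of joint autocorrelation up to lag 24; a small p-value
# indicates the differenced series is still distinguishable from white noise
Box.test(mcopper.stat, lag = 24, type = "Ljung-Box")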

d. enplanements (Monthly)
# frequency(enplanements) = 12
# ndiffs(enplanements) = 1
# nsdiffs(enplanements) = 1 
# nsdiffs(diff(enplanements,lag=12)) = 0 
enplanements.stat <- enplanements %>% bc.transform() %>% diff(lag=12) %>% diff()
## [1] "Lambda: -0.227"
df4 <- cbind(RawSeries = enplanements, DifferencedData = enplanements.stat)
autoplot(df4, facet=TRUE)

ggtsdisplay(enplanements.stat)

Here we combine operations: a Box-Cox transformation, a seasonal (lag 12) difference, and then a first difference.

e. visitors (Monthly)
# frequency(visitors) = 12
# ndiffs(visitors) = 1
# nsdiffs(visitors) = 1 
# nsdiffs(diff(visitors,lag=12)) = 0 
visitors.stat <- visitors %>% bc.transform() %>% diff(lag=12) %>% diff()
## [1] "Lambda: 0.278"
df5 <- cbind(RawSeries = visitors, DifferencedData = visitors.stat)
autoplot(df5, facet=TRUE)

ggtsdisplay(visitors.stat)

As with part d above, we perform a Box-Cox transformation, a seasonal (lag 12) difference, followed by a first difference.

5. Australian Retail Data (Monthly)

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel('retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349824F"], frequency=12, start=c(1982,4))
# frequency(myts) = 12
# ndiffs(myts) = 1
# nsdiffs(myts) = 1 
# nsdiffs(diff(myts,lag=12)) = 0 
myts.stat <- myts %>% bc.transform() %>% diff(lag=12) %>% diff()
## [1] "Lambda: 0.174"
df5 <- cbind(RawSeries = myts, DifferencedData = myts.stat)
autoplot(df5, facet=TRUE) +
  xlab("Years") + ylab("Sales") +
  ggtitle("Australian Retail Data (ID: A3349824F)")

ggtsdisplay(myts.stat)

In accordance with the ndiffs() and nsdiffs() functions, we apply a seasonal (lag 12) difference followed by a first difference after the Box-Cox transformation.
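As a final check (assuming the urca package is available, as in the fpp2 text), a KPSS test on the differenced series should not reject stationarity:

library(urca)
# A test statistic below the critical values means stationarity is not rejected
myts.stat %>% ur.kpss() %>% summary()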

6. ARIMA Simulations

Use R to simulate and plot some data from simple ARIMA models.

(a) Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).
ar1 <- function(phi, n=100)
{
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 2:n)
    y[i] <- phi*y[i-1] + e[i]
  return(y)
}
tail(ar1(0.6))
## Time Series:
## Start = 95 
## End = 100 
## Frequency = 1 
## [1] -1.5839197 -2.0658998 -0.2559247 -0.3189330 -0.3655558 -0.3229565
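For comparison (not part of the exercise), base R's arima.sim() generates the same kind of AR(1) series; it differs from the loop above mainly in how the process is initialised:

set.seed(123)                                         # for reproducibility
y.sim <- arima.sim(model = list(ar = 0.6), n = 100)   # AR(1), phi = 0.6, sigma^2 = 1
autoplot(y.sim)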
(b) Produce a time plot for the series. How does the plot change as you change $\phi_1$?
dframe <- cbind(ar1(0.6), ar1(0.9), ar1(0.3), ar1(0), ar1(-0.3), ar1(-0.6), ar1(-0.9))
autoplot(dframe, facets = TRUE)

The lower the value of $\phi_1$, the less persistent the series: values near zero produce plots resembling white noise, while negative values produce rapid alternation from one period to the next. The slope coefficient describes mean-reverting behavior in which next period's value is predicted to be $\phi_1$ times as far from the mean as this period's value. If $\phi_1$ is negative, the process still reverts to the mean but with alternation of signs, i.e., if the series is above the mean this period it is predicted to be below the mean next period. (See: https://people.duke.edu/~rnau/411arim.htm)
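This behavior can also be seen in the theoretical ACF of an AR(1) process, which decays geometrically as \(\rho_k = \phi_1^k\) (a quick illustration):

phi <- 0.6
round(phi^(1:5), 3)        # 0.600 0.360 0.216 0.130 0.078
ggAcf(ar1(phi, n = 1000))  # the sample ACF should show a similar geometric decay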

(c) Write your own code to generate data from an MA(1) model with $\theta_{1}  =  0.6$ and $\sigma^2=1$.
ma1 <- function(theta, n=100)
{
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 2:n)
    y[i] <- theta*e[i-1] + e[i]
  return(y)
}
(d) Produce a time plot for the series. How does the plot change as you change $\theta_1$?
dframe <- cbind(ma1(0.6), ma1(0.9), ma1(0.3), ma1(0), ma1(-0.3), ma1(-0.6), ma1(-0.9))
autoplot(dframe, facets = TRUE)

$\theta_1$ is the moving-average parameter. The larger its positive value, the more weight is given to the previous period's error, which smooths the series; negative values of $\theta_1$ make consecutive observations negatively correlated and the plot more jagged.
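In theory an MA(1) process has autocorrelation only at lag 1, \(\rho_1 = \theta_1/(1+\theta_1^2)\), and none beyond it, which is why changing $\theta_1$ mainly affects the relationship between consecutive observations:

theta <- 0.6
theta / (1 + theta^2)   # about 0.441; all higher-lag autocorrelations are zero in theory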


(e) Generate data from an ARMA(1,1) model with $\phi_{1} = 0.6$, $\theta_{1}  = 0.6$ and $\sigma^2=1$.
arma11 <- function(phi, theta, n=100)
{
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 2:n)
    y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
  return(y)
}
tail(arma11(0.6,0.6))
## Time Series:
## Start = 95 
## End = 100 
## Frequency = 1 
## [1] -1.2911243 -2.6547579 -3.2785213 -2.5945794 -1.3383482 -0.2696531
(f) Generate data from an AR(2) model with $\phi_{1} =-0.8$, $\phi_{2} = 0.3$ and $\sigma^2=1$. (Note that these parameters will give a non-stationary series.)
ar2 <- function(phi1, phi2, n=100)
{
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 3:n)
    y[i] <- phi1*y[i-1] + phi2*y[i-2] + e[i]
  return(y)
}
tail(ar2(-0.8,0.3))
## Time Series:
## Start = 95 
## End = 100 
## Frequency = 1 
## [1] -4454.340  4802.189 -5177.048  5583.334 -6018.535  6489.618
(g) Graph the latter two series and compare them.
ggtsdisplay(arma11(0.6,0.6))

dframe <- cbind(arma11(0.6,0.6), ar2(-0.8,0.3))
autoplot(dframe, facets = TRUE)

The ARMA(1,1) process above appears stationary, fluctuating around a constant mean with roughly constant variance even though no differencing has been applied. The AR(2) process, however, oscillates with ever-increasing amplitude, showing non-stationarity: the chosen coefficients violate the AR(2) stationarity constraints (see the check below), and the alternation of signs results from the negative first-order coefficient (-0.8).
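The non-stationarity can be confirmed against the AR(2) stationarity constraints \(-1 < \phi_2 < 1\), \(\phi_1 + \phi_2 < 1\) and \(\phi_2 - \phi_1 < 1\):

phi1 <- -0.8; phi2 <- 0.3
c(abs(phi2) < 1, phi1 + phi2 < 1, phi2 - phi1 < 1)  # last condition fails: phi2 - phi1 = 1.1 > 1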

7. US Homicides of Women

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

# frequency(wmurders) = 1
# ndiffs(wmurders) = 2
wmurders.diff <- wmurders %>% diff() %>% diff() 
ggtsdisplay(wmurders.diff)

Rule 7: If the ACF of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is negative (i.e., if the series appears slightly “overdifferenced”), then consider adding an MA term to the model. The lag at which the ACF cuts off is the indicated number of MA terms. (See: https://people.duke.edu/~rnau/411arim.htm)

From the ACF of the second-differenced series, we see the MA signature of a sharp cutoff with a negative lag-1 autocorrelation. Testing ARIMA(0,2,2), we find a lower AICc when we move up to ARIMA(0,2,3), and adding an AR component does not improve on that.

We keep in mind that ARIMA(0,2,2) is a general linear exponential smoothing model, essentially the same as Holt’s model. It uses exponentially weighted moving averages to estimate both a local level and a local trend in the series. The long-term forecasts from this model converge to a straight line whose slope depends on the average trend observed toward the end of the series. (See: https://people.duke.edu/~rnau/411arim.htm)

(fit <- Arima(wmurders, order=c(0,2,2))) #AICc=-5.57
## Series: wmurders 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1     ma2
##       -1.0181  0.1470
## s.e.   0.1220  0.1156
## 
## sigma^2 estimated as 0.04702:  log likelihood=6.03
## AIC=-6.06   AICc=-5.57   BIC=-0.15
(fit.keep <- Arima(wmurders, order=c(0,2,3))) #AICc=-6.7
## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35
(fit <- Arima(wmurders, order=c(0,2,1))) #AICc=-6.24
## Series: wmurders 
## ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8995
## s.e.   0.0669
## 
## sigma^2 estimated as 0.04747:  log likelihood=5.24
## AIC=-6.48   AICc=-6.24   BIC=-2.54
(fit <- Arima(wmurders, order=c(1,2,3))) #AICc=-4.3
## Series: wmurders 
## ARIMA(1,2,3) 
## 
## Coefficients:
##           ar1      ma1     ma2      ma3
##       -0.0931  -0.9317  0.3287  -0.2880
## s.e.   0.5181   0.4974  0.6337   0.2483
## 
## sigma^2 estimated as 0.04565:  log likelihood=7.79
## AIC=-5.58   AICc=-4.3   BIC=4.27

b. Should you include a constant in the model? Explain.

No, we should not. The long-term forecasts will follow a quadratic trend when d = 2 and \(c \neq 0\). Although the growth of the female population may appear quadratic over certain time frames, we do not expect that to hold over the long term given population-growth constraints, nor do we expect a sustained quadratic increase in the murder rate. (Note also that the forecast package's Arima() does not include a constant when more than one order of differencing is used.)

c. Write this model in terms of the backshift operator.

\((1-B)^2Y_t = (1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3)\epsilon_{t}\)

d. Fit the model using R and examine the residuals. Is the model satisfactory?

checkresiduals(fit.keep)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10

Although the residuals appear positively skewed, the Ljung-Box test does not reject the null hypothesis that they behave like white noise (p = 0.152). Also, all of the spikes in the ACF plot are within the significance limits.

e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

(fcast <- forecast::forecast(fit.keep, h = 3))
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.408817 2.137718 2.679916 1.994206 2.823428
## 2006       2.365555 1.985092 2.746018 1.783687 2.947423
## 2007       2.290976 1.753245 2.828706 1.468588 3.113363

Substituting the fitted MA coefficients, the model is:

\((1-B)^2Y_t = (1 - 1.0154B + 0.4324B^2 - 0.3217B^3)\epsilon_{t}\)

Expanding the backshift form gives the recursion

\(Y_t = 2Y_{t-1} - Y_{t-2} + \epsilon_t - 1.0154\,\epsilon_{t-1} + 0.4324\,\epsilon_{t-2} - 0.3217\,\epsilon_{t-3}\)

so the point forecasts are obtained by iterating this equation forward from the last two observations and the last three fitted residuals, with all unknown future errors set to zero (see the sketch below).
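A minimal sketch of that hand calculation, using coef() and residuals() from the fitted object fit.keep above; the three values should agree with fcast$mean up to rounding:

theta <- coef(fit.keep)               # fitted ma1, ma2, ma3
y <- as.numeric(wmurders)             # observed series
e <- as.numeric(residuals(fit.keep))  # fitted innovations
T <- length(y)
# One-, two- and three-step-ahead forecasts from the recursion above,
# with unknown future errors replaced by zero
f1 <- 2*y[T] - y[T-1] + theta["ma1"]*e[T] + theta["ma2"]*e[T-1] + theta["ma3"]*e[T-2]
f2 <- 2*f1 - y[T] + theta["ma2"]*e[T] + theta["ma3"]*e[T-1]
f3 <- 2*f2 - f1 + theta["ma3"]*e[T]
unname(c(f1, f2, f3))                 # compare with fcast$mean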

f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(fcast)

g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

(fit.auto <- auto.arima(wmurders, stepwise=FALSE, approximation=FALSE))
## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35
fcast.auto <- forecast::forecast(fit.auto, h = 3)
autoplot(fcast.auto)

Yes, auto.arima() selects the same model: ARIMA(0,2,3). One problem with this model is that the point forecasts continue the downward trend indefinitely, so for large h they eventually fall below zero, which is impossible for a murder rate.
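One possible remedy (an alternative not required by the exercise) is to model the series on a log scale via a Box-Cox transformation with \(\lambda = 0\), so that the back-transformed forecasts remain positive:

fit.log <- auto.arima(wmurders, lambda = 0)    # fit on the log scale
autoplot(forecast::forecast(fit.log, h = 10))  # back-transformed forecasts stay above zero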