library(fpp2)
library(ggplot2)

Homework 6: ARIMA models

Exercise 8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  1. Explain the differences among these figures. Do they all indicate that the data are white noise?

    Figure 8.31: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

    For a white noise series, we expect \(95\%\) of the spikes in the ACF to lie within \(\pm2/\sqrt T\), where \(T\) is the length of the time series. As \(T\) grows, these bounds (the dashed lines around zero) therefore become narrower. In each diagram only a few spikes touch or slightly exceed the bounds, and taken together they amount to no more than about \(5\%\) of the lags shown. Therefore all three series can be regarded as white noise.

  2. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

    The critical values lie at \(\pm2/\sqrt T\), so their distance from the mean of zero depends only on the length of the series: the longer the series, the closer the critical values sit to zero.

    Because the three series consist of different sets of random numbers, the sample autocorrelations (in both magnitude and sign) are themselves random, so the three plots naturally look different from one another. In every case, however, the spikes are small and show no systematic pattern, which is exactly what white noise should look like. The short simulation after this list illustrates both points.
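
The sketch below is my own addition (not part of the textbook answer): it simulates white-noise series of the three lengths, prints the \(\pm2/\sqrt T\) critical value used in each ACF plot, and draws the corresponding ACF.

# Illustrative sketch: white noise of lengths 36, 360 and 1000 with their ACF bounds.
set.seed(42)
for (n in c(36, 360, 1000)) {
  wn <- ts(rnorm(n))
  cat("T =", n, " critical values = +/-", round(2 / sqrt(n), 3), "\n")
  print(ggAcf(wn) + ggtitle(paste("White noise, T =", n)))
}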


Exercise 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(ibmclose)

The time plot shows long runs of generally upward and then downward movement rather than fluctuation around a constant mean. The ACF decays very slowly and remains large and positive for many lags, and the PACF has a spike close to 1 at lag 1; both are typical signatures of a non-stationary, random-walk-like series rather than white noise. The series should therefore be differenced before modelling.
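
As a quick follow-up (my own addition, not required by the exercise), the number of differences needed can be estimated with ndiffs(), and the differenced series inspected with the same display; output is not reproduced here.

ndiffs(ibmclose)              # estimated number of differences required
ggtsdisplay(diff(ibmclose))   # first-differenced series with its ACF and PACF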


Exercise 8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  1. usnetelec
  2. usgdp
  3. mcopper
  4. enplanements
  5. visitors

a. Box-Cox for usnetelec

autoplot(usnetelec)

(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(BoxCox(usnetelec, lambda))

Even though the suggested lambda is 0.5167714, the transformation makes no apparent difference in reducing variation in the data. This is not surprising, as there is no evident increase or decrease in the variation of the series over time. Therefore, in this particular case, no Box-Cox transformation is required.

Also, since there is no observable seasonality, seasonal differencing is not needed, and ordinary first differencing (i.e. at lag 1) is appropriate. The graph of the differenced series below and the Box-Pierce test (Box.test) confirm that the differenced data can be regarded as white noise.


ggtsdisplay(diff(usnetelec))

Box.test(diff(usnetelec))
## 
##  Box-Pierce test
## 
## data:  diff(usnetelec)
## X-squared = 0.80522, df = 1, p-value = 0.3695

b. Box-Cox for usgdp

autoplot(usgdp)

(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(BoxCox(usgdp, lambda))

In this example, the Box-Cox transformation with lambda 0.366352 is helpful: it removes the curvature in the original data, so the transformed series follows an approximately straight-line (linear) trend.

Again, as in (a) above, first differencing can be applied to the linearly trending transformed data to obtain a stationary, “white noise”-like representation.


ggtsdisplay(diff(BoxCox(usgdp, lambda)))

ndiffs(BoxCox(usgdp, lambda))
## [1] 1

The differenced data unfortunately do not look like pure white noise; however, the ndiffs() test indicates that only one order of differencing is needed.
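
As a formal check (my own addition, not in the original write-up), a Box-Pierce test can be applied to the differenced transformed series in the same way as in part (a); output is not reproduced here.

Box.test(diff(BoxCox(usgdp, lambda)))   # Box-Pierce test on the first-differenced transformed series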

c. Box-Cox for mcopper

ggtsdisplay(mcopper)

(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
ggtsdisplay(BoxCox(mcopper, lambda))

For these data the Box-Cox transformation, in my opinion, does not bring any clear improvement.

The series is monthly but shows no pronounced seasonal pattern, so ordinary first (lag-1) differencing is the natural choice, in line with the ndiffs() result below.


ggtsdisplay(diff(mcopper))

ndiffs(mcopper)
## [1] 1

d. Box-Cox for enplanements

ggtsdisplay(enplanements)

(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
ggtsdisplay(BoxCox(enplanements, lambda))

The Box-Cox transformation makes a slight improvement in stabilizing the highs and lows of the seasonal pattern in the data and therefore, in my opinion, has some variance-reducing effect.

The lag plots of the transformed data show strong annual seasonality (spikes at lag 12) together with a persistent trend (large lag-1 autocorrelation). Even though ndiffs recommends a single difference, taking a seasonal difference (lag 12) followed by an ordinary first difference makes the data look more like white noise. A seasonal unit-root cross-check is shown after the plots below.


tdata <- BoxCox(enplanements, lambda)
tdata %>% diff(lag = frequency(enplanements)) %>% ggtsdisplay()

tdata %>% diff(lag = frequency(enplanements)) %>% diff() %>%  ggtsdisplay()

tdata %>% ndiffs()
## [1] 1
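
As that cross-check (my own addition), nsdiffs() estimates the number of seasonal differences required, and ndiffs() applied to the seasonally differenced series estimates how many ordinary differences remain; output is not reproduced here.

tdata %>% nsdiffs()                                        # seasonal differences suggested
tdata %>% diff(lag = frequency(enplanements)) %>% ndiffs() # further ordinary differences suggested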

e. Box-Cox for visitors

ggtsdisplay(visitors)

(lambda <- BoxCox.lambda(visitors))
## [1] 0.2775249
tdata <- BoxCox(visitors, lambda)
tdata %>% ggtsdisplay()

The Box-Cox transformation makes a good improvement in reducing the variability of the series over time.

Again, as in (d) above, the lag plots show annual seasonality (spikes at lag 12) on top of a trend. Even though ndiffs recommends a single difference, adding a seasonal difference (lag 12) after the first difference improves the data, albeit slightly, making it look more like white noise. A formal white-noise check follows the plots below.


tdata %>% diff() %>%  ggtsdisplay()

tdata %>% diff() %>%  diff(lag = frequency(visitors)) %>% ggtsdisplay()

tdata %>% ndiffs()
## [1] 1
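
As that formal check (my own addition, not part of the original answer), a Box-Pierce test on the doubly differenced transformed series indicates whether the remaining autocorrelation is negligible; output is not reproduced here.

tdata %>% diff() %>% diff(lag = frequency(visitors)) %>% Box.test()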

Exercise 8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,"A3349398A"], frequency=12, start=c(1982,4))
myts %>% ggtsdisplay()

lambda <- BoxCox.lambda(myts)
tdata <- BoxCox(myts, lambda)
tdata %>% ggtsdisplay()

The transformation with a low-value lambda (\(\lambda = 0.1231563\)) straightens the trend line of the data.

There is clear yearly seasonality, and a single seasonal difference should therefore make the data look stationary.


tdata %>% diff(lag = frequency(tdata)) %>% ggtsdisplay()

tdata %>% ndiffs()
## [1] 1

Even though the lag plots above do not look quite like pure white noise, a further, second-order, difference does not offer any improvement and actually makes things worse. A seasonal unit-root check is added below as an extra cross-check.


tdata %>% diff(lag = frequency(tdata)) %>% diff() %>%  ggtsdisplay()
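
That cross-check (my own addition, not part of the original answer): nsdiffs() on the transformed series and ndiffs() on the seasonally differenced series can be compared with the single seasonal difference used above; output is not reproduced here.

tdata %>% nsdiffs()                                  # seasonal differences suggested
tdata %>% diff(lag = frequency(tdata)) %>% ndiffs()  # further ordinary differences suggested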


Exercise 8.6

Use R to simulate and plot some data from simple ARIMA models.

  1. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).

    y <- ts(numeric(100))
    e <- rnorm(100)
    for(i in 2:100)
      y[i] <- 0.6*y[i-1] + e[i]
  2. Produce a time plot for the series. How does the plot change as you change \(\phi_{1}\)?

  3. Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\).

  4. Produce a time plot for the series. How does the plot change as you change \(\theta_{1}\)?

  5. Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\) and \(\sigma^2=1\).

  6. Generate data from an AR(2) model with \(\phi_{1} = -0.8\), \(\phi_{2} = 0.3\) and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

  7. Graph the latter two series and compare them.

a. R code to generate AR(1) data

set.seed(123)
tsgenAR1 <- function(phi) {
    # Generate 100 observations from an AR(1) model with the given phi and
    # standard normal errors; the process starts at y[1] = 0.
    y <- ts(numeric(100))
    e <- rnorm(100)
    for(i in 2:100)
      y[i] <- phi*y[i-1] + e[i]
    return(y)
}

b. Time plots for the AR(1) series

Time plot for the series with \(\phi_{1}=0.6\) [oscillations around mean of \(0\)]


tsgenAR1(0.6) %>% ggtsdisplay()

Time plot for the series with \(\phi_{1}=1\) [random walk]


tsgenAR1(1) %>% ggtsdisplay()

Time plot for the series with \(\phi_{1}=0\) [white noise]


tsgenAR1(0) %>% ggtsdisplay()

c. R code to generate MA(1) data

tsgenMA1 <- function(theta) {
    # Generate 100 observations from an MA(1) model with the given theta
    # and standard normal errors.
    y <- ts(numeric(100))
    e <- rnorm(100)
    for(i in 2:100)
      y[i] <- theta*e[i-1] + e[i]
    return(y)
}

d. Time plots for the MA(1) series

Time plot for the series with \(\theta_{1}=0.6\) [oscillations around mean of \(0\)]


tsgenMA1(0.6) %>% ggtsdisplay()

Time plot for the series with \(\theta_{1}=1\)

The closer \(\theta_{1}\) gets to \(1\), the stronger the lag-1 autocorrelation becomes (for an MA(1) it equals \(\theta_{1}/(1+\theta_{1}^2)\), peaking at \(0.5\) when \(\theta_{1}=1\)), so the series looks less like white noise.


tsgenMA1(1) %>% ggtsdisplay()

Time plot for the series with \(\theta_{1}=0\) [white noise]


tsgenMA1(0) %>% ggtsdisplay()

e. R code to generate ARMA(1,1) data

tsgenARMA1 <- function(phi, theta) {
    # Generate 100 observations from an ARMA(1,1) model with the given
    # phi and theta and standard normal errors.
    y <- ts(numeric(100))
    e <- rnorm(100)
    for(i in 2:100)
      y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
    return(y)
}

f. R code to generate AR(2) data

tsgenAR2 <- function(phi_1, phi_2) {
    # Generate 100 observations from an AR(2) model with the given
    # coefficients and standard normal errors (y[1] = y[2] = 0).
    y <- ts(numeric(100))
    e <- rnorm(100)
    for(i in 3:100)
      y[i] <- phi_1*y[i-1] + phi_2*y[i-2] + e[i]
    return(y)
}

g. Graph and compare ARMA(1,1) and AR(2)

Time plot for the ARMA(1,1) series with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\)


tsgenARMA1(0.6, 0.6) %>% ggtsdisplay()

Time plot for the AR(2) series with \(\phi_{1} = -0.8\), \(\phi_{2} = 0.3\)


tsgenAR2(-0.8, 0.3) %>% ggtsdisplay()

Both series oscillate around a mean of zero; however, the ARMA(1,1) series is stationary, fluctuating with roughly constant variance, whereas the AR(2) series oscillates with exponentially growing amplitude, i.e. its variance increases over time. This is consistent with the note in the question that \(\phi_{1} = -0.8\), \(\phi_{2} = 0.3\) give a non-stationary series, since \(\phi_{2} - \phi_{1} = 1.1 > 1\) violates the AR(2) stationarity conditions.
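
This can be checked numerically (my own addition, not part of the exercise) by computing the roots of the AR characteristic polynomial \(1 - \phi_{1}z - \phi_{2}z^{2}\): a root with modulus less than one implies a non-stationary process.

# Roots of 1 - phi_1*z - phi_2*z^2 with phi_1 = -0.8, phi_2 = 0.3;
# one root lies inside the unit circle, so the AR(2) process is non-stationary.
Mod(polyroot(c(1, 0.8, -0.3)))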


Exercise 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

  1. By studying appropriate graphs of the series in R, find an appropriate \(ARIMA(p,d,q)\) model for these data.
  2. Should you include a constant in the model? Explain.
  3. Write this model in terms of the backshift operator.
  4. Fit the model using R and examine the residuals. Is the model satisfactory?
  5. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
  6. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
  7. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

a. Find \(ARIMA(p,d,q)\) model

wmurders %>% ggtsdisplay()

  1. The data initially show an increasing trend, then oscillate around a roughly constant level, and finally decline steadily. It is possible that the data follow the broader historical pattern of crime rates. There is a small spike just after the year 2000, which may reflect the deaths of 11 September 2001.

  2. There is no evidence of changing variance, so a Box-Cox transformation can be skipped.

  3. To make the data stationary, a single first difference is used, even though the unit-root test (ndiffs()) suggests two differences. The Box-Pierce test on the first-differenced series (p = 0.53) is consistent with white noise, whereas the twice-differenced series shows strong autocorrelation (p close to 0), which supports using only one difference.


wmurders %>% ndiffs()
## [1] 2
wmurders %>% diff() %>% Box.test()
## 
##  Box-Pierce test
## 
## data:  .
## X-squared = 0.39628, df = 1, p-value = 0.529
wmurders %>% diff() %>% diff() %>% Box.test()
## 
##  Box-Pierce test
## 
## data:  .
## X-squared = 24.722, df = 1, p-value = 6.623e-07
wmurders %>% diff() %>% ggtsdisplay()

  4. Given the ACF/PACF plots above, the candidate models to try are ARIMA(2,1,0) and ARIMA(0,1,2). The two models are very similar, with ARIMA(0,1,2) having slightly better (lower) information criterion values (AICc, etc.).


(fit1 <- Arima(wmurders, order = c(2,1,0)))
## Series: wmurders 
## ARIMA(2,1,0) 
## 
## Coefficients:
##           ar1     ar2
##       -0.0572  0.2967
## s.e.   0.1277  0.1275
## 
## sigma^2 estimated as 0.04265:  log likelihood=9.48
## AIC=-12.96   AICc=-12.48   BIC=-6.99
(fit2 <- Arima(wmurders, order = c(0,1,2)))
## Series: wmurders 
## ARIMA(0,1,2) 
## 
## Coefficients:
##           ma1     ma2
##       -0.0660  0.3712
## s.e.   0.1263  0.1640
## 
## sigma^2 estimated as 0.0422:  log likelihood=9.71
## AIC=-13.43   AICc=-12.95   BIC=-7.46
checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
## 
## Model df: 2.   Total lags used: 10

  5. Trying other models, with larger values of the parameters p, d, and q, did not give better AICc results. Therefore the preferred model is ARIMA(0,1,2).

  6. The residuals from the ARIMA(0,1,2) model, shown above, do indeed look like white noise, confirming a good fit.

b. Constant in the model?

Given that there is no consistent long-term trend in the series, the model should not include a constant, which (with \(d=1\)) would introduce a drift; a quick comparison is sketched below.
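
As a quick sanity check (my own addition, not part of the original answer), the same model can be refitted with a drift term and the AICc values compared; output is not reproduced here.

Arima(wmurders, order = c(0,1,2), include.drift = TRUE)   # compare AICc with the drift-free fit above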

c. Backshift notation

\[(1 - B)y_{t} = (1 + \theta_{1}B + \theta_{2}B^{2})\varepsilon_{t}\]

d. Fit and residuals

As already mentioned in part (a), ARIMA(0,1,2) is a good fit, with residuals that look like white noise.

e. Forecast checked by hand

f3 <- forecast(fit2, h=3)
f3
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.458450 2.195194 2.721707 2.055834 2.861066
## 2006       2.477101 2.116875 2.837327 1.926183 3.028018
## 2007       2.477101 1.979272 2.974929 1.715738 3.238464

Checking the forecast by hand for ARIMA(0,1,2) at \(T+1\) \[ y_{t} = y_{t-1} + c + \varepsilon_t + \theta_{1}\varepsilon_{t-1} + \theta_{2}\varepsilon_{t-2} \] \[ y_{T+h} = y_{T+h-1} + c + \varepsilon_{T+h} + \theta_{1}\varepsilon_{T+h-1} + \theta_{2}\varepsilon_{T+h-2} \] \[ \hat{y}_{T+1} = y_{T} + 0 + 0 + \theta_{1}\varepsilon_{T} + \theta_{2}\varepsilon_{T-1} \]

res <- resid(fit2)
len <- length(res)
et <- res[len]
et_1 <- res[len - 1]
ma1 <- coef(fit2)["ma1"]
ma2 <- coef(fit2)["ma2"]

\[ \hat{y}_{T+1} = 2.5894 + (-0.0660)*(0.0502) + (0.3712)*(-0.3438) \approx 2.4585 \]


Point forecast for \(h=1\) below:

(yt1 <- wmurders[length(wmurders)] + ma1 * et + ma2 * et_1)
##     ma1 
## 2.45845

Point forecast for \(h=2\) below:

(yt2 <- yt1 + ma2 * et)
##      ma1 
## 2.477101

Point forecast for \(h=3\) below (same as for \(h=2\)):

(yt3 <- yt2)
##      ma1 
## 2.477101

f. Forecast plot

autoplot(f3)

g. auto.arima() model

(fit <- auto.arima(wmurders, seasonal = FALSE, stepwise = FALSE, approximation = FALSE))
## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35

The auto.arima() model turned out to be inferior, with worse information criterion (AICc) values. The manually selected ARIMA(0,1,2) is preferred: as mentioned earlier, the unit-root test, ndiffs(), suggested two differences, which must have steered auto.arima() towards a \(d=2\) model.