(8.1)

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  1. Explain the differences among these figures. Do they all indicate that the data are white noise?

They all indicate that the data are white noise: in each figure the ACF bars fall within the blue dashed lines, so there is no significant autocorrelation. For the series of 36 random numbers the ACF bars are taller; as the number of random numbers increases, the bars become shorter. This reflects the sample autocorrelations getting closer to their true value of zero as the sample size grows.
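
Plots of this kind can be reproduced by simulating white noise of each length. A minimal sketch, assuming the forecast package is attached (the seed is arbitrary):

set.seed(1)  # arbitrary seed, only for reproducibility
ggAcf(rnorm(36))
ggAcf(rnorm(360))
ggAcf(rnorm(1000))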

  2. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The dashed lines are drawn with the sample size taken into account: they sit at ±1.96/sqrt(T), where T is the number of observations. As T increases, the critical values shrink towards zero. With a small sample, the autocorrelation estimates are more likely to deviate from zero simply by chance, so a larger autocorrelation is needed before we can reject the null hypothesis that the true autocorrelation is zero. This is why the bounds are wider, and the sample autocorrelations themselves more variable, in the smaller samples.
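
As a quick check, the half-widths of the dashed bounds for the three sample sizes can be computed directly:

T <- c(36, 360, 1000)
1.96 / sqrt(T)   # approximately 0.327, 0.103 and 0.062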

(8.2)

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The time plot shows a clear trend; I’m not sure whether it shows any seasonality. Because the series has a trend, the data is not stationary.

The ACF plot decreases slowly and smoothly, which indicates that the data is not stationary, as the ACF for stationary data usually drops to zero very quickly.

The PACF shows a single very large spike at lag 1, with all the other partial autocorrelations close to zero. Combined with the slowly decaying ACF, this indicates that the data is not stationary and should be differenced.

library(fma)
tsdisplay(ibmclose)
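
As a further check, the series can be differenced once and re-plotted; a short sketch (ndiffs and tsdisplay come from the forecast package, which is loaded together with fma):

ndiffs(ibmclose)           # suggested order of first differencing
tsdisplay(diff(ibmclose))  # plot the differenced series to check that the trend is removed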

(8.3)

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

library(expsmooth)   # data sets used in this exercise
library(forecast)    # ggtsdisplay, ndiffs, BoxCox, BoxCox.lambda
library(urca)        # ur.kpss
library(magrittr)    # the %>% pipe

(a) usnetelec

The plot for usnetelec doesn’t seem to have seasonality, but it does have an upward trend.

ggtsdisplay(usnetelec)

The Ljung-Box test on the first differences (p-value 0.36) shows that the differenced series can be considered white noise. This suggests that a first difference is enough to make the data stationary.

Box.test(diff(usnetelec), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(usnetelec)
## X-squared = 0.8508, df = 1, p-value = 0.3563

The ndiffs function shows that this data would need 1st differencing to make it stationary.

ndiffs(usnetelec)
## [1] 1

(b) usgdp

The usgdp plot shows an upward trend. We know that this data is not stationary.

ggtsdisplay(usgdp)

The ndiffs function recommends a 2nd order of differencing to make the data stationary.

ndiffs(usgdp)
## [1] 2

The Ljung-Box test has the null hypothesis that the data are white noise. For the first-differenced usgdp series the p-value is very small, so the null hypothesis is rejected and the differenced data cannot be considered white noise.

Box.test(diff(usgdp), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(usgdp)
## X-squared = 39.187, df = 1, p-value = 3.85e-10

With 1st differencing, it appears that the data still has some slight upward trend.

ggtsdisplay(diff(usgdp))

The twice-differenced data are plotted below. The trend is gone, but the variance does not look constant.

ggtsdisplay(diff(diff(usgdp)))

We can apply Box-Cox transformation to stabilize the variance.

lambda <- BoxCox.lambda(usgdp)
usgdp %>% BoxCox(lambda) %>% diff(1) %>% diff(1) -> usgdp2
ggtsdisplay(usgdp2)

The KPSS test statistic is small (well below the critical values), so there is no evidence to suggest that the data are not stationary.

usgdp2 %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.012 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(c) mcopper

The plot shows some kind of trend, and it seems that there might be some seasonality.

ggtsdisplay(mcopper)

First, the variance is stabilized via a Box-Cox transformation. The plot after the transformation is shown below.

lambda <- BoxCox.lambda(mcopper)
mcopper %>% BoxCox(lambda) -> mcopper2
ggtsdisplay(mcopper2)

I then apply a first difference.

ggtsdisplay(diff(mcopper2))

The unit root test shows a small test-statistic, which does not provide evidence to reject the null hypothesis that the data is stationary.

diff(mcopper2) %>% ur.kpss() %>% summary()

(d) enplanements

The plot shows trend and seasonality.

ggtsdisplay(enplanements)

The frequency of the data is 12 (monthly observations).

frequency(enplanements)
## [1] 12

To make the data stationary, a Box-Cox transformation is applied to stabilize the variance, and the data is then seasonally differenced (lag 12, matching the frequency). The ACF of the result still shows some autocorrelation, so a further difference is taken in the next step.

lambda <- BoxCox.lambda(enplanements)
enplanements %>% BoxCox(lambda) %>% diff(12) -> enplanements2
ggtsdisplay(enplanements2)

A further first difference (lag 1) improves the ACF.

ggtsdisplay(diff(enplanements2))

The unit root test statistic is small, which does not provide evidence to reject the null hypothesis that the data is stationary.

diff(enplanements2) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0424 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(e) visitors

The plot shows trend and seasonality. A Box-Cox transformation is applied to stabilize the variance, the data is then seasonally differenced (lag 12), and finally a first difference (lag 1) is taken.

ggtsdisplay(visitors)

frequency(visitors)
## [1] 12
lambda <- BoxCox.lambda(visitors)
visitors %>% BoxCox(lambda) %>% diff(12) %>% diff(1) -> visitors2
ggtsdisplay(visitors2)

The test statistic is small, which does not provide evidence to reject the null hypothesis that the data are stationary.

visitors2 %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0158 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(8.5)

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The plot shows upward trend and seasonality.

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],frequency=12, start=c(1982,4))
ggtsdisplay(myts)

The frequency of the data is 12.

frequency(myts)
## [1] 12

To make this data stationary, a Box-Cox transformation is applied to stabilize the variance. Then a seasonal difference (lag 12) is taken, followed by a first difference (lag 1).

lambda <- BoxCox.lambda(myts)
myts %>% BoxCox(lambda) %>% diff(12) %>% diff(1) -> myts2
ggtsdisplay(myts2)

The unit root test statistic is small, which does not provide evidence to reject the null hypothesis that data is stationary.

myts2 %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0162 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(8.6)

Use R to simulate and plot some data from simple ARIMA models.

  1. Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1. The process starts with y1 = 0.
  2. Produce a time plot for the series. How does the plot change as you change phi1?
  3. Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1.
  4. Produce a time plot for the series. How does the plot change as you change theta1?
  5. Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1.
  6. Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)
  7. Graph the latter two series and compare them

(a)

# AR(1) simulation with phi1 = 0.6 and sigma^2 = 1, starting from y1 = 0 (code from the exercise)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
   y[i] <- 0.6*y[i-1] + e[i]
}

(b)

Produce a time plot for the series. How does the plot change as you change phi1?

ar1_simulation <- function(phi1){
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi1*y[i-1] + e[i]
  }
  return(y)
}

As phi1 increases towards 1, the series becomes more dispersed and more persistent: each observation depends more strongly on the previous one, so the swings are larger and last longer. For phi1 close to 0 the series looks much like white noise.

autoplot(ar1_simulation(0.1), series = "0.1") + 
  autolayer(ar1_simulation(0.6), series = "0.6") + 
    autolayer(ar1_simulation(.9), series = "0.9")
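
The increasing dispersion can be quantified: a stationary AR(1) process with sigma^2 = 1 has variance 1/(1 - phi1^2), which grows quickly as phi1 approaches 1:

phi1 <- c(0.1, 0.6, 0.9)
1 / (1 - phi1^2)   # approximately 1.01, 1.56 and 5.26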

(c)

Write your own code to generate data from an MA(1) model

ma1_simulation <- function(theta1){
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- theta1*e[i-1] + e[i]
  }
  return(y)
}

(d)

Produce a time plot for the series. How does the plot change as you change theta1?

The higher the value of theta1, the more influence the previous error has on the current observation, so consecutive points become more strongly correlated. For small theta1 the previous error has little effect and the series looks close to white noise.

autoplot(ma1_simulation(0.1), series = "0.1") + 
  autolayer(ma1_simulation(0.6), series = "0.6") + 
    autolayer(ma1_simulation(.9), series = "0.9")
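
The strength of that influence can be quantified: the lag-1 autocorrelation of an MA(1) process is theta1/(1 + theta1^2), so larger theta1 gives stronger correlation between consecutive observations (up to a maximum of 0.5):

theta1 <- c(0.1, 0.6, 0.9)
theta1 / (1 + theta1^2)   # approximately 0.099, 0.441 and 0.497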

(e)

Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1.

y <- ts(numeric(50))
e <- rnorm(50)
for(i in 2:50){
   y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
}

(f)

Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)

y2 <- ts(numeric(50))
e <- rnorm(50)
# the loop starts at i = 3 because the AR(2) model needs two lagged values
for(i in 3:50){
   y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e[i]
}

(g)

Graph the latter two series and compare them

The ARMA(1,1) model combines an autoregressive term and a moving-average term; with phi1 = theta1 = 0.6 it is stationary and fluctuates around zero with roughly constant variance. The AR(2) model uses autoregression only, with two lags, and its parameters violate the stationarity conditions (phi2 - phi1 = 1.1 > 1), as noted in (f). The AR(2) series therefore oscillates (because phi1 is negative) with ever-increasing amplitude, while the ARMA(1,1) series remains stable.

autoplot(y, series = "ARMA(1,1)") + 
  autolayer(y2, series = "AR(2)") 

(8.7)

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

  1. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
  2. Should you include a constant in the model? Explain.
  3. Write this model in terms of the backshift operator.
  4. Fit the model using R and examine the residuals. Is the model satisfactory?
  5. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
  6. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
  7. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

(a)

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

The plot shows a trend in the data but not much seasonality.

ggtsdisplay(wmurders)

The ndiffs function suggests 2 orders of differencing.

ndiffs(wmurders)
## [1] 2
wmurders2 <- diff(diff(wmurders))
ggtsdisplay(wmurders2)

The KPSS test statistic is small, which means there is no evidence to reject the null hypothesis that the data are stationary.

wmurders2 %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0458 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

For both ACF and PACF, lag 1 is significant. An appropriate ARIMA model could be ARIMA(1,2,1).

(b)

Should you include a constant in the model? Explain.

The data was differenced twice in the ARIMA model (d = 2). As the textbook notes, including a constant when d = 2 makes the long-term forecasts follow a quadratic trend, which is dangerous for forecasting, so no constant is included in the model.
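
(c)

Write this model in terms of the backshift operator.

With no constant, the ARIMA(1,2,1) model can be written in backshift notation as

(1 - phi1*B)(1 - B)^2 * y_t = (1 + theta1*B) * e_t

where B is the backshift operator and e_t is white noise.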

(d)

Fit the model using R and examine the residuals. Is the model satisfactory?

(model <- Arima(wmurders, order=c(1,2,1)))
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97

The residual ACF bars fall within the limits, the Ljung-Box test p-value (0.13) is not significant, and the residual histogram looks roughly normal. The model is satisfactory.

checkresiduals(model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10
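
(e)

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

Expanding the backshift form above gives y_t = (2 + phi1)*y_{t-1} - (1 + 2*phi1)*y_{t-2} + phi1*y_{t-3} + e_t + theta1*e_{t-1}, so the one-step forecast sets the future error to zero and uses the last fitted residual for e_t. A minimal sketch of the by-hand check, reusing the fitted object model from above:

fc <- forecast(model, h = 3)
phi1   <- coef(model)["ar1"]
theta1 <- coef(model)["ma1"]
y <- as.numeric(wmurders)
n <- length(y)
e <- residuals(model)
# one-step-ahead forecast computed by hand (the future error is set to zero)
yhat1 <- (2 + phi1)*y[n] - (1 + 2*phi1)*y[n-1] + phi1*y[n-2] + theta1*e[n]
c(by_hand = unname(yhat1), from_forecast = unname(fc$mean[1]))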

(f)

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(model, h=3))

(g)

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

The auto.arima function recommends ARIMA(1,2,1).

(model2 <- auto.arima(wmurders, seasonal=FALSE))
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97

It appears that auto.arima and the manual arima selection of p, d, q are the same in this case.