Question - 8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

(a.) Explain the differences among these figures. Do they all indicate that the data are white noise?

(b.) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

(a)

Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer:

  1. As the sample size increases, the sample autocorrelations shrink toward zero and the critical-value bounds move closer to zero.

  2. No spikes exceed the critical bounds in any of the three plots, so all three series are consistent with white noise.

(b)

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer:

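  1. The critical values are \(\pm \frac{1.96}{\sqrt{T}}\), where \(T\) is the length of the time series. As \(T\) gets larger, the absolute value of the critical value decreases.

  2. Therefore, a smaller sample has wider bounds and a larger sample has narrower bounds.

  3. The autocorrelations differ between figures because each sample autocorrelation is only an estimate of the true value (zero) and varies randomly from sample to sample. Its standard error is roughly \(1/\sqrt{T}\), so the estimates scatter less as \(T\) grows.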
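
As a rough check on the bound formula, the following base-R sketch computes \(\pm 1.96/\sqrt{T}\) for the three sample sizes (about 0.327, 0.103 and 0.062) and draws fresh white noise of each length. The seed, the use of `rnorm()` and the choice of 20 lags are assumptions for illustration; Figure 8.31 was not produced by this code.

```r
# Critical bounds for a white-noise ACF: +/- 1.96 / sqrt(T)
sizes  <- c(36, 360, 1000)
bounds <- 1.96 / sqrt(sizes)
names(bounds) <- sizes
print(round(bounds, 4))

# Simulate white noise of each length and record the largest |r_k|
set.seed(1)  # arbitrary seed, only for reproducibility
max_abs_acf <- sapply(sizes, function(n) {
  r <- acf(rnorm(n), lag.max = 20, plot = FALSE)$acf[-1]  # drop lag 0
  max(abs(r))
})
print(round(max_abs_acf, 4))
```

Comparing `max_abs_acf` with `bounds` shows how both the spikes and the bounds shrink together as the sample grows.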

Question - 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

(a)

Use R to plot the daily closing prices for IBM stock and the ACF and PACF.

Answer:

(b)

Explain how each plot shows that the series is non-stationary and should be differenced.

Answer:

According to the plot:

  1. The time plot shows an upward trend before day 120 and a downward trend between days 120 and 270, so the mean is not constant.

  2. The ACF decays slowly, which is typical of a non-stationary series, and shows no seasonality.

  3. The PACF has a significant 1st lag \(\approx 1\) and all others are close to zero, as for a random walk.

  4. Therefore, this time series is non-stationary and should be differenced to produce a stationary time series.
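
The pattern described above can be reproduced with a simulated random walk. This is a minimal base-R sketch on made-up data (the seed and the length of 300 are arbitrary), not an analysis of `ibmclose`. With the `fpp2` package loaded, the commented-out line should produce the three plots for the real series.

```r
# library(fpp2); ggtsdisplay(ibmclose)   # real data, needs the fpp2 package

set.seed(42)                      # arbitrary seed
x <- cumsum(rnorm(300))           # simulated random walk (stand-in, NOT ibmclose)

r_level <- acf(x, lag.max = 25, plot = FALSE)$acf[-1]        # ACF of levels, lag 0 dropped
p_level <- pacf(x, lag.max = 25, plot = FALSE)$acf           # PACF of levels
r_diff  <- acf(diff(x), lag.max = 25, plot = FALSE)$acf[-1]  # ACF after differencing

bound <- 1.96 / sqrt(length(x))
cat("lag-1 ACF of levels: ", round(r_level[1], 3), "\n")
cat("lag-1 PACF of levels:", round(p_level[1], 3), "\n")
cat("lags of diff(x) outside the bounds:", sum(abs(r_diff) > bound), "of 25\n")
```

The levels show a lag-1 autocorrelation near 1, while the differenced series behaves much more like white noise.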

Question - 8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

(a.) usnetelec

(b.) usgdp

(c.) mcopper

(d.) enplanements

(e.) visitors

(a)

usnetelec

Answer:

  1. According to the plot, there is an upward trend with no seasonality.

  2. The ACF lags slowly decrease and show no seasonality.

  3. The PACF has a significant 1st lag \(\approx 1\) and all others close to zero.

  4. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

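  5. The estimated order of differencing for the transformed data is 2.

  6. After the Box-Cox transformation and second-order differencing, the KPSS test statistic (0.072) is well below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  7. Therefore, the transformed and differenced series can be treated as stationary.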

## [1] 2
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.072 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(b)

usgdp

Answer:

  1. According to the plot, there is an upward trend with no seasonality.

  2. The ACF lags slowly decrease and show no seasonality.

  3. The PACF has a significant 1st lag \(\approx 1\) and all others close to zero.

  4. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

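  5. The estimated order of differencing for the transformed data is 1.

  6. After the Box-Cox transformation and one first difference, the KPSS test statistic (0.2013) is below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  7. Therefore, the transformed and differenced series can be treated as stationary.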

## [1] 1
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(c)

mcopper

Answer:

  1. According to the plot, there is a slight seasonality and an upward trend.

  2. The ACF lags slowly decrease.

  3. The PACF has a significant 1st lag \(\approx 1\) and a slightly significant 2nd lag.

  4. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

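  5. The estimated order of differencing for the transformed data is 1, and the estimated order of seasonal differencing is 0.

  6. After the Box-Cox transformation and one first difference, the KPSS test statistic (0.0573) is well below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  7. Therefore, the transformed and differenced series can be treated as stationary.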

## [1] 1
## [1] 0
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(d)

enplanements

Answer:

  1. According to the plot, the time series has seasonality and an upward trend.

  2. The ACF lags show the seasonality.

  3. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

  4. The estimated order of first differencing for the transformed data is 1.

  5. The estimated order of seasonal differencing for the transformed data is 1.

  6. After the Box-Cox transformation, one seasonal difference and one first difference, the KPSS test statistic (0.0424) is well below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  7. Therefore, the transformed and differenced series can be treated as stationary.

## [1] 1
## [1] 1
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0424 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
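
The transformation and differencing steps for a seasonal monthly series can be sketched in base R as follows. Everything here is illustrative: the series is simulated, \(\lambda = 0\) is assumed rather than estimated, and `box_cox()` is a hypothetical helper. The outputs above were presumably produced with functions such as `BoxCox.lambda()`, `ndiffs()` and `nsdiffs()` from the `forecast` package and `ur.kpss()` from `urca`, which this sketch does not call.

```r
# Box-Cox transform: log(y) when lambda == 0, (y^lambda - 1) / lambda otherwise
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

set.seed(123)                                   # arbitrary seed
n <- 240                                        # 20 years of monthly data (made up)
trend    <- exp(0.005 * seq_len(n))
seasonal <- 1 + 0.2 * sin(2 * pi * seq_len(n) / 12)
y <- ts(100 * trend * seasonal * exp(rnorm(n, sd = 0.03)), frequency = 12)

z  <- box_cox(y, lambda = 0)   # lambda = 0 is an assumption, not an estimate
zs <- diff(z, lag = 12)        # one seasonal difference (D = 1)
zd <- diff(zs)                 # one first difference (d = 1)
length(zd)                     # n - 12 - 1 observations remain
```

Seasonal differencing is applied before first differencing here because the seasonally differenced series is sometimes already stationary, in which case the second step can be skipped.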

(e)

visitors

Answer:

  1. According to the plot, the time series has seasonality and an upward trend.

  2. The ACF lags show the seasonality.

  3. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

  4. The estimated order of first differencing for the transformed data is 1.

  5. The estimated order of seasonal differencing for the transformed data is 1.

  6. After the Box-Cox transformation, one seasonal difference and one first difference, the KPSS test statistic (0.0158) is well below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  7. Therefore, the transformed and differenced series can be treated as stationary.

## [1] 1
## [1] 1
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0158 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
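
The decision rule used in parts (a) to (e) can be written out explicitly. The sketch below only re-uses the test statistics and the 5% critical value printed in the outputs above; the variable names are illustrative.

```r
# KPSS statistics as reported in the outputs for parts (a)-(e)
kpss_stat <- c(usnetelec = 0.072, usgdp = 0.2013, mcopper = 0.0573,
               enplanements = 0.0424, visitors = 0.0158)
crit_5pct <- 0.463   # 5% critical value from the same outputs

# H0 of the KPSS test: the series is stationary.
# A statistic below the critical value means H0 is not rejected.
decision <- ifelse(kpss_stat < crit_5pct, "do not reject H0", "reject H0")
print(data.frame(statistic = kpss_stat, decision = decision))
```

Because the null hypothesis of the KPSS test is stationarity, a small statistic is the desired outcome, which is the opposite reading from tests whose null is a unit root.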

Question - 8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

(a)

Answer:

  1. According to the plot, the time series has seasonality and an upward trend.

  2. The ACF lags slowly decrease.

  3. The PACF has a significant 1st lag \(\approx 1\) and slightly significant 2nd and 3rd lag.

  4. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

  5. The estimated order of first differencing for the transformed data is 1.

  6. The estimated order of seasonal differencing for the transformed data is 1.

  7. After the Box-Cox transformation, one seasonal difference and one first difference, the KPSS test statistic (0.0165) is well below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  8. Therefore, the transformed and differenced series can be treated as stationary.

## [1] 1
## [1] 1
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0165 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Question - 8.6

Use R to simulate and plot some data from simple ARIMA models.

(a.) Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6 \; and \; \sigma^{2} = 1\). The process starts with \(y_{1} = 0\).

(b.) Produce a time plot for the series. How does the plot change as you change \(\phi_{1}\)?

(c.) Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6 \; and \; \sigma^{2} = 1\).

(d.) Produce a time plot for the series. How does the plot change as you change \(\theta_{1}\)?

(e.) Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6, \; \theta_{1} = 0.6 \; and \; \sigma^{2} = 1\).

(f.) Generate data from an AR(2) model with \(\phi_{1} = -0.8, \; \phi_{2} = 0.3 \; and \; \sigma^{2} = 1\). (Note that these parameters will give a non-stationary series.)

(g.) Graph the latter two series and compare them.

(a)

Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6 \; and \; \sigma^{2} = 1\). The process starts with \(y_{1} = 0\).

Answer:

(b)

Produce a time plot for the series. How does the plot change as you change \(\phi_{1}\)?

Answer:

From Section 8.3 of the textbook, for an AR(1) model:

  1. Stationarity requires \(-1 < \phi_{1} < 1\). The closer \(\left| \phi_{1} \right|\) is to 1, the more slowly the effect of a shock dies out.

  2. When \(\phi_{1} < 0\), \(y_{t}\) tends to oscillate between positive and negative values.

  3. When \(\phi_{1} = 0\), \(y_{t}\) is equivalent to white noise.

  4. When \(\phi_{1} = 1\) and \(c = 0\), \(y_{t}\) is equivalent to a random walk.
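
To see these cases side by side, the AR(1) recursion can be wrapped in a small function and run for several values of \(\phi_{1}\). This is a sketch in the spirit of the exercise's code, not the original chunk: `sim_ar1()` is a hypothetical helper, and the seed and the grid of \(\phi_{1}\) values are arbitrary choices.

```r
# Simulate an AR(1) series y[t] = phi * y[t-1] + e[t], starting from y[1] = 0
sim_ar1 <- function(phi, n = 100, sigma = 1) {
  y <- ts(numeric(n))
  e <- rnorm(n, sd = sigma)
  for (i in 2:n) y[i] <- phi * y[i - 1] + e[i]
  y
}

set.seed(8)                          # arbitrary seed
phis <- c(-0.6, 0, 0.6, 0.9)         # illustrative values of phi1
sims <- lapply(phis, sim_ar1)

# One panel per value of phi1
op <- par(mfrow = c(2, 2))
for (k in seq_along(phis)) {
  plot(sims[[k]], main = paste("phi1 =", phis[k]), ylab = "y")
}
par(op)
```

Negative values should give a jagged, alternating series, and values near 1 a smoother, wandering one.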

(c)

Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6 \; and \; \sigma^{2} = 1\).

Answer:

(d)

Produce a time plot for the series. How does the plot change as you change \(\theta_{1}\)?

Answer:

From Section 8.4 of the textbook, for an MA(1) model:

  1. Invertibility requires \(-1 < \theta_{1} < 1\).

  2. When \(\left| \theta_{1} \right| < 1\), the equivalent AR(\(\infty\)) representation gives the most recent observations more weight than observations from the distant past.

  3. When \(\theta_{1} = 0\), \(y_{t}\) is equivalent to white noise, as the ACF plot confirms.
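
One possible way to generate the MA(1) data is sketched below. `sim_ma1()` and `rho1()` are hypothetical helpers introduced for illustration, and the seed is arbitrary. The function `rho1()` gives the theoretical lag-1 autocorrelation of an MA(1) process, \(\theta_{1} / (1 + \theta_{1}^{2})\), which is about 0.44 for \(\theta_{1} = 0.6\).

```r
# Simulate an MA(1) series y[t] = e[t] + theta * e[t-1]
sim_ma1 <- function(theta, n = 100, sigma = 1) {
  e <- rnorm(n + 1, sd = sigma)        # one extra draw supplies the initial lagged error
  ts(e[-1] + theta * e[-(n + 1)])      # current error plus theta times previous error
}

# Theoretical lag-1 autocorrelation of an MA(1) process
rho1 <- function(theta) theta / (1 + theta^2)

set.seed(84)                           # arbitrary seed
y_ma <- sim_ma1(0.6)
plot(y_ma, main = "MA(1), theta1 = 0.6", ylab = "y")

round(rho1(c(-0.6, 0, 0.6)), 4)        # theory for a few values of theta1
acf(y_ma, plot = FALSE)$acf[2]         # sample lag-1 autocorrelation
```

Unlike the AR(1) case, only the first autocorrelation is non-zero in theory, so changing \(\theta_{1}\) alters the short-run smoothness of the plot rather than its long-run persistence.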

(e)

Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6, \; \theta_{1} = 0.6 \; and \; \sigma^{2} = 1\).

Answer:

(f)

Generate data from an AR(2) model with \(\phi_{1} = -0.8, \; \phi_{2} = 0.3 \; and \; \sigma^{2} = 1\). (Note that these parameters will give a non-stationary series.)

Answer:

(g)

Graph the latter two series and compare them.

Answer:

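  1. ARMA(1,1): since \(\left| \phi_{1} \right| = 0.6 < 1\) and \(\left| \theta_{1} \right| = 0.6 < 1\), this time series is stationary and invertible.

  2. AR(2): since \(\phi_{2} - \phi_{1} = 1.1 > 1\), the stationarity condition \(\phi_{2} - \phi_{1} < 1\) is violated and this time series is non-stationary.

  3. The ARMA(1,1) series shows irregular swings around a constant level (not true seasonality, since the model has no seasonal term), whereas the AR(2) series oscillates with exponentially growing amplitude.

  4. The ACF and PACF of the ARMA(1,1) series have only two significant lags, whereas the ACF of the AR(2) series alternates between positive and negative values while decreasing in absolute value.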
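
The two series and the stationarity check can be sketched in base R as follows. The seed, the shared error vector and the helper `ar_roots()` are assumptions made for illustration. A process is stationary only if every root of \(1 - \phi_{1}z - \phi_{2}z^{2}\) lies outside the unit circle.

```r
n <- 100
set.seed(86)                 # arbitrary seed
e <- rnorm(n)                # the same errors drive both series

# ARMA(1,1): y[t] = 0.6 * y[t-1] + e[t] + 0.6 * e[t-1]
arma11 <- numeric(n)
for (i in 2:n) arma11[i] <- 0.6 * arma11[i - 1] + e[i] + 0.6 * e[i - 1]

# AR(2): y[t] = -0.8 * y[t-1] + 0.3 * y[t-2] + e[t]
ar2 <- numeric(n)
for (i in 3:n) ar2[i] <- -0.8 * ar2[i - 1] + 0.3 * ar2[i - 2] + e[i]

# Moduli of the roots of the AR polynomial 1 - phi1*z - phi2*z^2 - ...
ar_roots <- function(phi) Mod(polyroot(c(1, -phi)))
ar_roots(0.6)                # AR part of the ARMA(1,1): root outside the unit circle
ar_roots(c(-0.8, 0.3))       # AR(2): one root has modulus below 1

op <- par(mfrow = c(2, 1))
plot.ts(arma11, main = "ARMA(1,1)")
plot.ts(ar2, main = "AR(2)")
par(op)
```

The root inside the unit circle is what makes the AR(2) series explode, in line with the violated condition \(\phi_{2} - \phi_{1} < 1\).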

Question - 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

(a.) By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

(b.) Should you include a constant in the model? Explain.

(c.) Write this model in terms of the backshift operator.

(d.) Fit the model using R and examine the residuals. Is the model satisfactory?

(e.) Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

(f.) Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

(g.) Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

(a)

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

Answer:

  1. According to the plot, the time series has an upward trend followed by a downward trend, with no seasonality.

  2. The ACF lags decrease in value slowly.

  3. The PACF has a significant 1st lag \(\approx 1\) and all others close to zero.

  4. Thus, this time series is non-stationary and should be differenced to produce a stationary time series.

  5. The estimated order of differencing for the transformed data is 2.

  6. After the Box-Cox transformation and second-order differencing, the KPSS test statistic (0.0532) is well below the 5% critical value (0.463), so the null hypothesis of stationarity is not rejected.

  7. Thus, the differenced series can be treated as stationary.

  8. In an ARIMA(p,d,q) model, p is the order of the autoregressive part, d is the degree of first differencing involved, and q is the order of the moving average part.

  9. According to the steps and plots, we have d=2, p=1 (only one significant lag in the PACF), and q=2 (two significant lags in the ACF).

  10. Therefore, ARIMA(1,2,2) is the appropriate model for these data.

## [1] 2
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0532 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

(b)

Should you include a constant in the model? Explain.

Answer:

From Section 8.5 of the textbook, for ARIMA models:

  1. If \(c=0\) and \(d=2\), the long-term forecasts will follow a straight line.

  2. If \(c\neq0\) and \(d=2\), the long-term forecasts will follow a quadratic trend.

  3. A constant is therefore excluded from the model: with \(d=2\) it would impose a quadratic trend on the long-term forecasts, which is rarely plausible.

(c)

Write this model in terms of the backshift operator.

Answer:

According to textbook 8.8, the model in terms of the backshift operator is:

\((1-\phi_{1}B)(1-B)^{2}y_{t} = (1+\theta_{1}B+\theta_{2}B^{2})\varepsilon_{t}\)

(d)

Fit the model using R and examine the residuals. Is the model satisfactory?

Answer:

  1. The residual autocorrelations all lie within the critical bounds, so the residuals resemble white noise. The Ljung-Box test agrees: with a p-value of 0.2111, the null hypothesis of no residual autocorrelation is not rejected.

  2. The residual histogram is nearly normal with mean close to 0.

  3. The model ARIMA(1,2,2) is satisfactory.

  4. Plugging in the coefficients gives \((1+0.7677B)(1-B)^{2}y_{t} = (1-0.2812B-0.4977B^{2})\varepsilon_{t}\).

## Series: wmurders 
## ARIMA(1,2,2) 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7677  -0.2812  -0.4977
## s.e.   0.2350   0.2879   0.2762
## 
## sigma^2 estimated as 0.04552:  log likelihood=7.38
## AIC=-6.75   AICc=-5.92   BIC=1.13

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
## 
## Model df: 3.   Total lags used: 10
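
The p-value in the Ljung-Box output can be recovered from the reported statistic and degrees of freedom. The sketch below uses only numbers printed above. The commented-out line shows the base-R call that would compute the test from a fitted model object, here given the hypothetical name `fit`.

```r
# Ljung-Box p-value from the reported statistic: P(chi-square with df > Q*)
q_star <- 9.6215   # Q* from the output above
lags   <- 10       # total lags used
fitdf  <- 3        # estimated coefficients: ar1, ma1, ma2

p_value <- pchisq(q_star, df = lags - fitdf, lower.tail = FALSE)
round(p_value, 4)

# With a fitted model object `fit` (hypothetical name) the test itself would be:
# Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 3)
```

The degrees of freedom are the number of lags minus the number of estimated coefficients, which is why the output reports df = 7.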

(e)

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

Answer:

  1. Forecast three times ahead by function:

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.534015 2.260584 2.807446 2.115838 2.952192
## 2006       2.404157 2.026793 2.781521 1.827029 2.981286
## 2007       2.331482 1.829669 2.833296 1.564025 3.098940

  2. Forecast three times ahead by hand:

\[(1+0.7677B)(1-B)^{2}y_{t} = (1-0.2812B-0.4977B^{2})\varepsilon_{t}\]

\[(1-2B+B^{2}+0.7677B-2\times 0.7677B^{2}+0.7677B^{3})y_{t} = (1-0.2812B-0.4977B^{2})\varepsilon_{t}\]

\[(1-1.2323B-0.5354B^{2}+0.7677B^{3})y_{t} = (1-0.2812B-0.4977B^{2})\varepsilon_{t}\]

\[y_{t} = 1.2323y_{t-1} + 0.5354y_{t-2} - 0.7677y_{t-3} + \varepsilon_{t} - 0.2812\varepsilon_{t-1} - 0.4977\varepsilon_{t-2}\]

## [1] "Point forecasts in 2005, 2006, and 2007:  2.53400122093286 2.40414535130206 2.33146324099697"
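
The recursion for \(y_{t}\) can be turned into a short function. This is a sketch under stated assumptions, not the code that produced the numbers above: `forecast_by_hand()` is a hypothetical helper, future errors are replaced by their expected value of zero, and the default coefficients are the estimates from the fitted ARIMA(1,2,2).

```r
# Point forecasts from (1 - phi1*B)(1 - B)^2 y[t] = (1 + theta1*B + theta2*B^2) e[t]
# y: observed series, e: residuals aligned with y (same length, at least 3 values)
forecast_by_hand <- function(y, e, h = 3,
                             phi1 = -0.7677, theta1 = -0.2812, theta2 = -0.4977) {
  stopifnot(length(y) == length(e), length(y) >= 3)
  a1 <- 2 + phi1            #  1.2323, coefficient on y[t-1]
  a2 <- -(1 + 2 * phi1)     #  0.5354, coefficient on y[t-2]
  a3 <- phi1                # -0.7677, coefficient on y[t-3]
  n <- length(y)
  y_ext <- c(y, rep(NA_real_, h))
  e_ext <- c(e, rep(0, h))  # future errors set to their expectation, zero
  for (k in seq_len(h)) {
    t <- n + k
    y_ext[t] <- a1 * y_ext[t - 1] + a2 * y_ext[t - 2] + a3 * y_ext[t - 3] +
      theta1 * e_ext[t - 1] + theta2 * e_ext[t - 2]
  }
  y_ext[(n + 1):(n + h)]
}

# Sanity check on made-up data: with zero errors, a straight line is extended,
# because second differencing removes any linear trend
forecast_by_hand(y = c(1, 2, 3), e = c(0, 0, 0))
```

With the fitted model available, `y` would be the observed `wmurders` values and `e` the model residuals. Small differences from the function-based forecasts, such as 2.534001 against 2.534015, are expected because the coefficients are rounded to four decimals.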

(f)

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

Answer:

(g)

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

Answer:

  1. auto.arima() gives ARIMA(1,2,1) instead of ARIMA(1,2,2).

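  2. The training-set RMSE, MAE, MAPE and MASE values from ARIMA(1,2,2) are smaller than those from ARIMA(1,2,1).

  3. The residual plot and ACF plot from ARIMA(1,2,2) have a smaller range than those from ARIMA(1,2,1).

  4. Thus, I think ARIMA(1,2,2) is better on in-sample fit, although auto.arima() prefers ARIMA(1,2,1) because its AICc is lower (-6.39 versus -5.92) and the extra MA term will always improve training-set accuracy somewhat.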

## Series: wmurders 
## ARIMA(1,2,2) 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7677  -0.2812  -0.4977
## s.e.   0.2350   0.2879   0.2762
## 
## sigma^2 estimated as 0.04552:  log likelihood=7.38
## AIC=-6.75   AICc=-5.92   BIC=1.13
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01109526 0.2034302 0.1504565 -0.2279984 4.285732 0.9252368
##                    ACF1
## Training set 0.01083757

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
## 
## Model df: 3.   Total lags used: 10

## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10
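
The AICc values that drive the auto.arima() choice can be checked from the reported log likelihoods. In this sketch `aicc()` is a hypothetical helper, \(k\) counts the ARMA coefficients plus the error variance, and the effective sample size of 53 is an assumption (55 annual observations minus \(d = 2\)) that is not stated in the outputs.

```r
# AICc = AIC + 2k(k + 1) / (n - k - 1), with AIC = -2 * loglik + 2k
aicc <- function(loglik, k, n) {
  aic <- -2 * loglik + 2 * k
  aic + 2 * k * (k + 1) / (n - k - 1)
}

n_eff <- 53  # assumption: 55 annual observations minus d = 2

aicc_values <- c("ARIMA(1,2,2)" = aicc(7.38, k = 4, n = n_eff),   # 3 coefficients + sigma^2
                 "ARIMA(1,2,1)" = aicc(6.44, k = 3, n = n_eff))   # 2 coefficients + sigma^2
round(aicc_values, 2)
names(which.min(aicc_values))   # model with the lower AICc
```

The results agree with the reported AICc values up to rounding of the log likelihoods, and the lower value belongs to ARIMA(1,2,1). Training-set accuracy measures do not include this penalty for the extra parameter.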