Problem No. 8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

A white noise series has no autocorrelation. For the most part, none of these ACFs shows significant autocorrelation.

What about the one bar that appears significant in the series of 36 numbers, or the one or two significant bars in the series of 360? At a 95 percent confidence level, we expect about one bar in twenty to exceed the bounds by chance alone. So we can conclude that the significant lags are probably due to randomness and do not represent systematic autocorrelation.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

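The bounds are at \(\pm 2 / \sqrt{T}\) where \(T\) is the length of the series. So as \(T\) increases, the bounds move closer to zero. The sample autocorrelations differ between figures because each is only an estimate of the true value of zero, and that estimate is noisier for shorter series. For each series: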

2 / sqrt(36)

## [1] 0.3333333

2 / sqrt(360)

## [1] 0.1054093

2 / sqrt(1000)

## [1] 0.06324555
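As a rough check on the "one in twenty" argument, the sketch below simulates Gaussian white noise of the three lengths and counts how many of the first 20 sample autocorrelations fall outside the \(\pm 2 / \sqrt{T}\) bounds. The helper name `count_exceed`, the seed, and the choice of 20 lags are illustrative assumptions, not part of the original exercise.

```r
# Count sample autocorrelations outside the +/- 2/sqrt(T) bounds
# for simulated white noise (seed and lag.max are arbitrary choices).
count_exceed <- function(n, lag.max = 20, seed = 1) {
  set.seed(seed)
  x <- rnorm(n)
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  sum(abs(r) > 2 / sqrt(n))
}

sapply(c(36, 360, 1000), count_exceed)
```

With 20 lags per series, a count of zero, one, or two is unremarkable for pure noise.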

Problem No. 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

We can see below that the ACF is what would be expected for a non-stationary time series: high at lag 1 and decreasing slowly. Similarly, the PACF indicates high correlation between the stock’s value at \(t\) and \(t-1\). For a stationary time series, the ACF would instead drop toward zero quickly.

library(fpp2); library(urca)  # assumed setup: data sets, plotting helpers, ur.kpss()
ggtsdisplay(ibmclose)

Differencing can be used to ‘wipe out’ or stabilize the mean of the time series, eliminating most or all of the seasonality and/or trend, resulting in a stationary time series.
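To illustrate the effect of differencing without relying on the stock data, the sketch below uses a simulated random walk as a hypothetical stand-in for a price series. The seed and the length of 500 are arbitrary.

```r
# Simulate a random walk, then difference it.
set.seed(42)
e  <- rnorm(500)
rw <- cumsum(e)          # y_t = y_{t-1} + e_t
d  <- diff(rw)           # y_t - y_{t-1}, which recovers e_t for t >= 2

acf_rw <- acf(rw, plot = FALSE)$acf[2]   # lag-1 autocorrelation of the level
acf_d  <- acf(d,  plot = FALSE)$acf[2]   # lag-1 autocorrelation of the change
c(level = acf_rw, differenced = acf_d)
```

The level series should show a lag-1 autocorrelation close to 1, while the differenced series should show one close to 0.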

Problem No. 8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

This series has a very linear trend, so a Box-Cox transformation is probably not necessary to stabilize the variance. However, differencing will be needed to stabilize the mean and make the series stationary:

ggtsdisplay(usnetelec)

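A KPSS unit root test suggests a single difference is sufficient. The test statistic (0.1585) is below even the 10 percent critical value (0.347), so we fail to reject the null hypothesis of stationarity: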

usnetelec %>% diff %>% ur.kpss %>% summary

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.1585 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

autoplot(diff(usnetelec))
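The comparison between the KPSS statistic and its critical values can also be done in code rather than by reading the printed summary. This is a minimal sketch: `kpss_ok` is a hypothetical helper, not a function from urca, and it assumes the `teststat` and `cval` slots of the object returned by `ur.kpss()`. A simulated random walk is used in place of the real series.

```r
library(urca)

# Hypothetical helper: TRUE when the KPSS statistic is below the chosen
# critical value, i.e. the null of stationarity is not rejected.
kpss_ok <- function(x, level = "5pct") {
  test <- ur.kpss(x)                      # default type = "mu"
  stat <- as.numeric(test@teststat)
  crit <- test@cval[1, level]
  stat < crit
}

set.seed(1)
rw <- cumsum(rnorm(300))   # simulated random walk, for illustration only
kpss_ok(rw)                # level series
kpss_ok(diff(rw))          # differenced series
```

Because the null hypothesis of the KPSS test is stationarity, a small statistic is the desirable outcome here.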

usgdp

ggtsdisplay(usgdp)

This series is not as linear as the previous series; there is clear curvature. Use Box-Cox transformation to stabilize variance:

(lambda <- BoxCox.lambda(usgdp))

## [1] 0.366352

usgdp_bc <- BoxCox(usgdp, lambda)
ggtsdisplay(usgdp_bc)
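For reference, the standard one-parameter Box-Cox family can be sketched in a few lines. `box_cox` is a hypothetical name used for illustration. It assumes strictly positive data and is not the implementation used by `BoxCox()`.

```r
# Minimal sketch of the one-parameter Box-Cox transform for positive data.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100, 1000)
box_cox(y, 0)      # identical to log(y)
box_cox(y, 1)      # y - 1: shape unchanged
box_cox(y, 0.37)   # in between, close to the lambda selected above
```

Values of \(\lambda\) between 0 and 1 compress large observations more than small ones, which is why the transformation can even out variance that grows with the level.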

The transformation appears to have stabilized variance, but from the ACF and PACF plots, we still need to stabilize the mean to achieve stationarity:

usgdp_bc %>% diff %>% ur.kpss %>% summary

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

autoplot(diff(usgdp_bc))

The test statistic (0.2013) is below the 10 percent critical value (0.347), so we fail to reject the null hypothesis that the data are stationary.

mcopper

This time series has roughly three periods: low variance in the beginning, increased variance in the middle, and a rapid climb after 2000.

ggtsdisplay(mcopper)

Employ Box-Cox transformation:

(lambda <- BoxCox.lambda(mcopper))

## [1] 0.1919047

mcopper_bc <- BoxCox(mcopper, lambda)
ggtsdisplay( mcopper_bc )

Add some differencing:

autoplot(diff(mcopper_bc))

By eye and by KPSS this series has been made stationary:

mcopper_bc %>% diff %>% ur.kpss %>% summary

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

enplanements

There appears to be lower variance at the beginning of the time series than in the middle and at the end. I will use a log transformation to stabilize it.

ggtsdisplay(enplanements)

A single difference seems sufficient for stationarity:

autoplot(diff(log(enplanements)))

Test suggests stationarity has been achieved:

enplanements %>% log %>% diff %>% ur.kpss %>% summary

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0129 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

visitors

Similar to above, employ a log transformation to stabilize variance.

ggtsdisplay(visitors)

Check single difference, which looks good:

autoplot(diff(log(visitors)))

The test shows little evidence of non-stationarity:

visitors %>% log %>% diff %>% ur.kpss %>% summary

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0781 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Problem No. 8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Load the data and plot:

retaildata <- readxl::read_excel('~/Downloads/retail.xlsx', skip=1)
myts <- ts(retaildata[,55], frequency=12, start=c(1982, 4))
ggtsdisplay(myts)

It’s clear some variance stabilization is in order. Log transformation looks pretty good:

autoplot(log(myts))

Find the appropriate order of differencing and examine the result:

ndiffs(log(myts))

## [1] 1

myts %>% log %>% diff %>% autoplot

The series looks pretty stationary, which is further confirmed by the KPSS test:

myts %>% log %>% diff %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0164 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Problem No. 8.6

Use R to simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

Write a function for easy repetition:

generate_ts <- function(n, phi, seed=1804) {
  set.seed(seed)
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 2:n) y[i] <- phi*y[i-1] + e[i]
  return(y)
}

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

plot(generate_ts(n=100, phi=0.6), ylim=c(-9,5))
lines(generate_ts(n=100, phi=0.75), col='red')
lines(generate_ts(n=100, phi=0.99), col='green')
abline(h=0)

Increasing \(\phi_1\) alters the pattern of the time series. With \(\phi_1 = 0.75\), the series looks similar to the original series and can be higher or lower than it. With \(\phi_1 = 0.99\) the series looks quite different; it is always above the original series until around \(t=60\), after which it is always below it. A series with \(\phi_1 = 0.99\) is very close to a random walk.

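In this short sample the scale looks about the same regardless of \(\phi_1\), although in theory the variance of a stationary AR(1) process, \(\sigma^2 / (1 - \phi_1^2)\), grows as \(\phi_1\) approaches 1.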
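To put rough numbers on the scale, the sketch below evaluates the stationary AR(1) variance \(\sigma^2 / (1 - \phi_1^2)\) for the three values of \(\phi_1\) plotted above. It assumes \(\sigma^2 = 1\) and ignores the fact that the simulated series start at zero and have only 100 points. `ar1_var` is a hypothetical helper name.

```r
# Theoretical stationary variance of an AR(1) process.
ar1_var <- function(phi, sigma2 = 1) sigma2 / (1 - phi^2)

phi <- c(0.6, 0.75, 0.99)
data.frame(phi = phi, variance = ar1_var(phi), sd = sqrt(ar1_var(phi)))
```

A series of 100 points started at zero has not had time to reach the long-run spread when \(\phi_1 = 0.99\), which can make the three series look more alike than the theory suggests.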

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

Define the model as

\[MA(1) \equiv y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1}\]

Generate:

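set.seed(1804)
n <- 100
theta_1 <- 0.6
const <- 0                     # named const to avoid masking base::c()
e <- rnorm(n)
e_lag <- c(0, e[-n])           # e_{t-1} with e_0 = 0; stats::lag() does not shift a plain vector
y <- ts(const + e + theta_1 * e_lag)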

autoplot(y)
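One way to sanity-check an MA(1) simulation is to compare its sample ACF with the theoretical one, which is \(\theta_1 / (1 + \theta_1^2)\) at lag 1 and zero beyond. The sketch below uses `ARMAacf()` from the stats package for the theoretical values and a longer simulated series (an arbitrary 10,000 points) for the sample values, with the one-step shift written out explicitly.

```r
# Theoretical ACF of an MA(1) process with theta_1 = 0.6 (lags 0, 1, 2).
theta_1 <- 0.6
rho <- ARMAacf(ma = theta_1, lag.max = 2)
rho

# Sample ACF of a long simulated series for comparison;
# c(0, e[-n]) holds e[t-1] at position t, with e_0 taken as 0.
set.seed(1804)
n <- 10000
e <- rnorm(n)
y <- e + theta_1 * c(0, e[-n])
r <- acf(y, lag.max = 2, plot = FALSE)$acf[2:3]
r
```

If the shift were missing, the series would just be a rescaled copy of `e` and the lag-1 sample autocorrelation would be near zero instead of near 0.44.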

Problem No. 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

By studying appropriate graphs of the series in R, find an appropriate \(ARIMA(p, d, q)\) model for these data.

An \(ARIMA\) model is defined by:

\(p\), the order of autoregression (\(y\) regressed on previous values)
\(d\), differencing required for stationarity
\(q\), the order of the moving average component

Order of differencing. Use ndiffs and plot the result:

ndiffs(wmurders)

## [1] 2

autoplot(diff(wmurders, differences=2))

This looks pretty good by eye. Let’s check with a unit root test:

ur.kpss(diff(wmurders, differences=2))

## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0458

The test statistic is quite small (well below the 10 percent critical value of 0.347), so we fail to reject the null hypothesis that the data are stationary.

Order of autoregression. Examine the PACF plot of the differenced series:

ggPacf(diff(wmurders, differences=2))

The first lag is the only one with a significant partial autocorrelation (choosing to regard the spike at lag 5 as unimportant). This suggests \(p = 1\).

Order of moving average. Examine the ACF plot of the differenced series:

ggAcf(diff(wmurders, differences=2))

The first lag is negative and significant, suggesting we include a first-order moving average term (\(q = 1\)).

This all adds up to an \(ARIMA(1, 2, 1)\) model.

Should you include a constant in the model? Explain.

Given that \(d=2\), a non-zero constant would give the forecasts a quadratic trend, which is undesirable in this case. So no constant is included.
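The quadratic-trend claim can be seen by undoing two rounds of differencing on a constant. The values `const = 0.1` and `n = 10` below are arbitrary illustrative choices.

```r
# Twice-integrating a constant produces a quadratic path.
const <- 0.1
n <- 10
y <- cumsum(cumsum(rep(const, n)))   # undo two rounds of differencing
y
diff(y, differences = 2)             # back to the constant
```

Here `y` follows `const * t * (t + 1) / 2`, so the constant in the twice-differenced series turns into curvature in the original series.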

Write this model in terms of the backshift operator.
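Using the backshift operator \(B\), where \(B y_t = y_{t-1}\), an \(ARIMA(1, 2, 1)\) model with no constant can be written as:

\[(1 - \phi_1 B)(1 - B)^2 y_t = (1 + \theta_1 B)\epsilon_t\]

With the estimates reported for this model (\(\phi_1 = -0.2434\), \(\theta_1 = -0.8261\)), this becomes:

\[(1 + 0.2434 B)(1 - B)^2 y_t = (1 - 0.8261 B)\epsilon_t\]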

Fit the model using R and examine the residuals. Is the model satisfactory?

m <- Arima(wmurders, order=c(1, 2, 1))
checkresiduals(m)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

The ACF plot does not show any significant autocorrelation in the residuals, which is a good sign, and the Ljung-Box test (p-value = 0.1335) does not reject the hypothesis that they are white noise. The residuals are distributed symmetrically around zero with no extreme outliers.

I’d say this model looks good!

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

m_fc <- forecast(m, 3)
m_fc

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.470660 2.194836 2.746484 2.048824 2.892496
## 2006       2.363106 1.986351 2.739862 1.786908 2.939304
## 2007       2.252833 1.765391 2.740276 1.507354 2.998313

The form of an \(ARIMA(1, 2, 1)\) model is:

\[y_t'' = c + \phi_1 y_{t-1}'' + \theta_1 \epsilon_{t-1} + \epsilon_t\]

where the double prime denotes second-order differencing, \(y_t'' = y_t - 2y_{t-1} + y_{t-2}\), \(c\) represents the constant, \(\phi_1 y_{t-1}''\) is the AR component, \(\theta_1 \epsilon_{t-1}\) is the MA component, and \(\epsilon_t\) is the random error.

The model I fit above is:

\[y_t'' = 0 - 0.2434 y_{t-1}'' - 0.8261 \epsilon_{t-1} + \epsilon_t\]
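The hand check of the forecasts can be sketched by expanding the differences in the fitted equation and iterating forward. This assumes the fpp2 package supplies `wmurders`, `Arima()` and `forecast()`, and that the last residual from the fit is a good stand-in for \(\epsilon_T\). Future errors are set to zero.

```r
library(fpp2)  # assumed to provide wmurders, Arima() and forecast()

m     <- Arima(wmurders, order = c(1, 2, 1))
phi   <- unname(coef(m)["ar1"])
theta <- unname(coef(m)["ma1"])

y <- as.numeric(wmurders)
e <- as.numeric(residuals(m))
n <- length(y)

# Expanding (1 - phi B)(1 - B)^2 y_t = (1 + theta B) e_t gives
# y_t = (2 + phi) y_{t-1} - (1 + 2 phi) y_{t-2} + phi y_{t-3} + e_t + theta e_{t-1}.
# Future errors are replaced by zero; past values by observations or forecasts.
f1 <- (2 + phi) * y[n] - (1 + 2 * phi) * y[n - 1] + phi * y[n - 2] + theta * e[n]
f2 <- (2 + phi) * f1   - (1 + 2 * phi) * y[n]     + phi * y[n - 1]
f3 <- (2 + phi) * f2   - (1 + 2 * phi) * f1       + phi * y[n]

rbind(by_hand = c(f1, f2, f3), forecast = as.numeric(forecast(m, h = 3)$mean))
```

The two rows should agree closely; small discrepancies would come from how the state-space fit initializes and scales its residuals.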

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(m_fc)

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

Yes, auto.arima() selects the same \(ARIMA(1, 2, 1)\) model:

m1 <- auto.arima(wmurders, approximation=FALSE)
summary(m1)

## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

DATA 624—Week No. 8

Ben Horvath

March 22, 2020