Data 624 - Predictive Analytics

Chapter 8

library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2   3.3.3     ✓ fma       2.4  
## ✓ forecast  8.13      ✓ expsmooth 2.3
## 
library(tseries)
library(urca)

8.1 Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

All three figures indicate that the data are white noise, since the ACF bars all lie within the dashed lines, which mark the critical values beyond which an autocorrelation would be considered statistically significant. For the series with fewer samples, the ACF bars are taller than for the series with more samples. Since the data are randomly generated, they are independently and identically distributed, and therefore should have zero autocorrelation at every lag. The figures demonstrate this: as more samples are generated, the sample autocorrelations tend toward zero.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

These dashed lines are placed at \(\pm 1.96/\sqrt{N}\), centred at zero. Mathematically, as \(N\) gets larger, the critical values move closer to zero. Statistically, this means it is easier for a small data set to exhibit correlation by chance than for a large one. To compensate, the critical values are larger in absolute value for smaller data sets: you need a higher sample autocorrelation to reject the null hypothesis that the true autocorrelation is zero (i.e. that the observed autocorrelation is due to chance alone).
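As a quick illustration of this formula (an added sketch, not part of the original output), the bound widths for the three sample sizes can be computed directly:

# 95% significance bounds for the ACF of white noise are +/- 1.96 / sqrt(N)
N <- c(36, 360, 1000)
data.frame(N = N, bound = 1.96 / sqrt(N))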

8.2 A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

help(ibmclose)
ggtsdisplay(ibmclose)

For stationary data, the ACF usually drops to zero very quickly. For non-stationary data, the ACF decreases slowly and smoothly. In the ACF plot above, the ACF clearly drops off very slowly, so the series is not stationary.

For the PACF, the first lag always equals the first lag of the ACF, as can be seen in the figure above. The first-lag PACF is very high here (close to one), while all other partial autocorrelations are close to zero. This also suggests that the data are non-stationary.
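As a quick check (an added sketch, not shown in the original analysis), differencing the closing prices once removes most of this behaviour:

ggtsdisplay(diff(ibmclose))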

8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. usnetelec

help(usnetelec)
ggtsdisplay(usnetelec)

From the plots, the time series does not appear to have seasonality, only an upward trend. It does not require a Box-Cox transformation, and the ACF and PACF suggest a differencing order of 1.
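As a cross-check (an addition to the original analysis), the ndiffs() helper from the forecast package estimates the required number of first differences, by default using a KPSS-based test:

ndiffs(usnetelec)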

The result is the following:

usnetelec.s <- diff(usnetelec, 1)
ggtsdisplay(usnetelec.s)

The unit root test result is:

usnetelec.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.1585 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic is smaller than the 10% critical value, so the null hypothesis that the differenced data is stationary cannot be rejected.

b. usgdp

help(usgdp)
ggtsdisplay(usgdp)

For this time series, I found that differencing twice, each time with a lag of 1, yields stationary data. A Box-Cox transformation is also applied first to stabilize the variance across time.

The result is the following:

bc.lambda <- BoxCox.lambda(usgdp)
paste('The Box-Cox lambda for this time series is:', bc.lambda)
## [1] "The Box-Cox lambda for this time series is: 0.366352049520934"
usgdp.s <- usgdp %>% BoxCox(bc.lambda) %>% diff(1) %>% diff(1) 
ggtsdisplay(usgdp.s)

The unit root test result is:

usgdp.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.012 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic is much smaller than all of the critical values, so the null hypothesis that the differenced data is stationary cannot be rejected.

c. mcopper

help(mcopper)
ggtsdisplay(mcopper)

For the mcopper time series, there appears to be some seasonality, but it is not strong. There is a large jump near the end of the series, and the curve is basically flat from 1960 to 1965. I first applied a Box-Cox transformation to stabilize the variance, and then differenced the series once.

The result is the following:

bc.lambda <- BoxCox.lambda(mcopper)
paste('The Box-Cox lambda for this time series is:', bc.lambda)
## [1] "The Box-Cox lambda for this time series is: 0.191904709003829"
mcopper.s <- mcopper %>% BoxCox(bc.lambda) %>% diff(1) 
ggtsdisplay(mcopper.s)

The differenced data looks stationary after 1965. There is not much that can be done about the flat period before 1965.

The unit root test result is:

mcopper.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic is much smaller than all of the critical values, so the null hypothesis that the differenced data is stationary cannot be rejected.

d. enplanements

help(enplanements)
ggtsdisplay(enplanements)

This time series has strong seasonality and a trend. First I used a Box-Cox transformation to stabilize the variance over the years. Then I seasonally differenced the data (lag of 12), followed by a second difference with lag 1.
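As a cross-check (an added sketch, applied here to the untransformed series), nsdiffs() and ndiffs() from the forecast package estimate how many seasonal and first differences are needed:

nsdiffs(enplanements)                  # suggested number of seasonal differences
ndiffs(diff(enplanements, lag = 12))   # first differences needed after the seasonal difference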

The result is the following:

bc.lambda <- BoxCox.lambda(enplanements)
paste('The Box-Cox lambda for this time series is:', bc.lambda)
## [1] "The Box-Cox lambda for this time series is: -0.226946111237065"
enplanements.s <- enplanements %>% BoxCox(bc.lambda) %>% diff(12) %>% diff(1)
ggtsdisplay(enplanements.s)

The unit root test result is:

enplanements.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0424 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic is much smaller than all of the critical values, so the null hypothesis that the differenced data is stationary cannot be rejected.

e. visitors

help(visitors)
ggtsdisplay(visitors)

For this time series, there is clearly both seasonality and a trend, and the seasonal variance increases over time. Therefore, I first apply a Box-Cox transformation, followed by a seasonal difference (lag 12) and a second difference with lag 1.

The result is the following:

bc.lambda <- BoxCox.lambda(visitors)
paste('The Box-Cox lambda for this time series is:', bc.lambda)
## [1] "The Box-Cox lambda for this time series is: 0.277524910835111"
visitors.s <- visitors %>% BoxCox(bc.lambda) %>% diff(12) %>% diff(1)
ggtsdisplay(visitors.s)

The unit root test result is:

visitors.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0158 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic is much smaller than all of the critical values, so the null hypothesis that the differenced data is stationary cannot be rejected.

8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

First, let’s plot the time series:

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
ggtsdisplay(myts)

The data has seasonality and a trend, and the seasonal variance increases over time. I found that I had to seasonally difference the time series (lag of 12), followed by a second difference of lag 1. I also applied a Box-Cox transformation to the series first.
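As a sanity check on my retail series (an added sketch, applied to the untransformed data), nsdiffs() and ndiffs() can again be used to estimate the required differencing:

nsdiffs(myts)                          # suggested number of seasonal differences
myts %>% diff(lag = 12) %>% ndiffs()   # first differences needed after the seasonal difference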

bc.lambda <- BoxCox.lambda(myts)
paste('Best Box-Cox lambda for this time series is:', bc.lambda)
## [1] "Best Box-Cox lambda for this time series is: 0.127636859661548"
myts.s <- myts %>% BoxCox(bc.lambda) %>% diff(1) %>% diff(12)
ggtsdisplay(myts.s)

myts.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0138 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The data is now stationary. The unit root test also shows that the test statistic is small.

8.6 Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).

ar.1 <- function(phi, sd=1, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n, sd=sd)
  for(i in 2:n)
    y[i] <- phi*y[i-1] + e[i]
  return(y)
}

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

data.list <- list()
i <- 0
phi.vec <- c(0.0000001, 0.0001, 0.1) * 6
for (phi in phi.vec){
  i <- i + 1
  data.list[[i]] <- ar.1(phi)
}

gen.data <- do.call(cbind, data.list)
colnames(gen.data) <- paste('phi=', phi.vec, sep = '')
autoplot(gen.data) + ylab('Data')

In the above code, data generated from an AR(1) model with three different magnitudes of \(\phi_1\) are plotted. With smaller \(\phi_1\), the data look more random, i.e. the value generated at time \(i\) depends more on the error term at \(i\) than on the value at \(i-1\). The autocorrelation is higher and decays more slowly for larger \(\phi_1\). Below are the ACFs for the three sets of data, which demonstrate this.

par(mfrow=c(1,3))
acf(gen.data[,1], main='Phi=0.0000006')
acf(gen.data[,2], main='Phi=0.0006')
acf(gen.data[,3], main='Phi=0.6')
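For comparison (an added sketch, not part of the original code), a similar AR(1) series can be generated with the built-in arima.sim() function; note that arima.sim() discards a burn-in period, so its values will differ from ar.1() even with the same seed:

set.seed(42)
autoplot(arima.sim(model = list(ar = 0.6), n = 100)) +
  ylab('Data') + ggtitle('AR(1) generated with arima.sim, phi = 0.6')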

c. Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\).

ma.1 <- function(theta, sd=1, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n, sd=sd)
  e[1] <- 0
  for(i in 2:n)
    y[i] <- theta*e[i-1] + e[i]
  return(y)
}

d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

data.list <- list()
i <- 0
theta.vec <- c(0.0000001, 0.0001, 0.1) * 6
for (theta in theta.vec){
  i <- i + 1
  data.list[[i]] <- ma.1(theta)
}
gen.data <- do.call(cbind, data.list)
colnames(gen.data) <- paste('theta=', theta.vec, sep = '')
autoplot(gen.data) + ylab('Data')

Similar to the AR(1) case, a smaller \(\theta_1\) in the MA(1) model means that the next data point depends less on the previous error term. Consequently, an MA(1) series with larger \(\theta_1\) will have higher autocorrelation, at least at the first lag.

par(mfrow=c(1,3))
acf(gen.data[,1], main='theta=0.0000006')
acf(gen.data[,2], main='theta=0.0006')
acf(gen.data[,3], main='theta=0.6')
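The same comparison can be made for the MA(1) model (an added sketch using arima.sim()):

set.seed(42)
autoplot(arima.sim(model = list(ma = 0.6), n = 100)) +
  ylab('Data') + ggtitle('MA(1) generated with arima.sim, theta = 0.6')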

e. Generate data from an ARMA(1,1) model with \(\phi_1=0.6\), \(\theta_1=0.6\), \(\sigma^2=1\).

y1 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 2:100)
  y1[i] <- 0.6*y1[i-1] + 0.6*e[i-1] + e[i]
autoplot(y1) +
  ggtitle('ARMA(1,1) Generated Data')

f. Generate data from an AR(2) model with \(\phi_1=-0.8\), \(\phi_2=0.3\), \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

y2 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 3:100)
  y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e[i]
autoplot(y2) +
  ggtitle('AR(2) Generated Data')

g. Graph the latter two series and compare them.

The two generated series are plotted above. The AR(2) data are clearly non-stationary: the series follows a sinusoidal pattern whose amplitude grows exponentially over time, so it oscillates like a seasonal series. The ARMA(1,1) data show no such pattern; they appear random and much closer to stationary than the AR(2) data. Below are the ACFs of the two time series:

par(mfrow=c(1,2))
acf(y1, main='ARMA(1,1) Generated Data')
acf(y2, main='AR(2) Generated Data')

Again, we can see that the ACF of the AR(2) data is dominated by the oscillating pattern, in contrast to that of the ARMA(1,1) data.
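The non-stationarity of the AR(2) series can also be confirmed algebraically (an added sketch): for stationarity, all roots of the AR characteristic polynomial \(1-\phi_1 B-\phi_2 B^2\) must lie outside the unit circle, and with \(\phi_1=-0.8\), \(\phi_2=0.3\) one root falls inside it.

# Moduli of the roots of 1 - phi1*B - phi2*B^2 = 1 + 0.8B - 0.3B^2;
# any modulus below 1 indicates a non-stationary AR(2) process
abs(polyroot(c(1, 0.8, -0.3)))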

8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

help(wmurders)
ggtsdisplay(wmurders, main='Annual Female Murder Rate in USA')

From the above plots, the time series is not seasonal but has a trend. There is no significant change in variance over time, so a Box-Cox transformation is not necessary. Differencing twice (d = 2) seems to make it stationary. The KPSS test confirms this:

wmurders.s <- wmurders %>% diff(1) %>% diff(1)
ggtsdisplay(wmurders.s)

wmurders.s %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0458 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
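As an additional cross-check (an added sketch, not part of the original output), ndiffs() also estimates the required order of differencing, by default using a KPSS-based test:

ndiffs(wmurders)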

There is one significant spike at the first lag in both the ACF and PACF plots, so an appropriate model could be ARIMA(1,2,1).

b. Should you include a constant in the model? Explain.

The constant in a non-seasonal ARIMA model is equivalent to including a polynomial trend of order d in the forecast function. The textbook suggests that a quadratic or higher-order trend is dangerous for forecasting. Since d = 2 here, no, I would not include a constant.

c. Write this model in terms of the backshift operator.

\((1-\phi_1 B)(1-B)^2 y_t = c + (1+\theta_1 B)\varepsilon_t\), where \(c = 0\) if no constant is included.

d. Fit the model using R and examine the residuals. Is the model satisfactory?

(fit <- Arima(wmurders, order=c(1,2,1)))
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

The residual ACF shows all autocorrelations well within the critical limits, the Ljung-Box p-value of 0.1335 gives no evidence against the residuals being white noise, and the histogram shows the residuals are roughly normally distributed. The model is satisfactory.

e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(fit, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.470660 2.194836 2.746484 2.048824 2.892496
## 2006       2.363106 1.986351 2.739862 1.786908 2.939304
## 2007       2.252833 1.765391 2.740276 1.507354 2.998313

Expanding the forecasting equation of ARIMA(1,2,1):

\((1-\phi_1 B)(1-B)^2 y_t = c + (1+\theta_1 B)\varepsilon_t\)

\((1 + B^2 - 2B - \phi_1 B - \phi_1 B^3 + 2\phi_1 B^2) y_t = \varepsilon_t + \theta_1 B \varepsilon_t\)

Applying the backshift operator:

\(y_t + y_{t-2} - 2y_{t-1} - \phi_1 y_{t-1} - \phi_1 y_{t-3} + 2\phi_1 y_{t-2} = \varepsilon_t + \theta_1 \varepsilon_{t-1}\)

Collecting like terms and moving them to the right-hand side:

\(y_t = (2+\phi_1) y_{t-1} - (1+2\phi_1) y_{t-2} + \phi_1 y_{t-3} + \varepsilon_t + \theta_1 \varepsilon_{t-1}\)

From R, we estimated \(\phi_1 = -0.2434\) and \(\theta_1 = -0.8261\), and future errors \(\varepsilon_t\) are set to zero for the forecasts. Plugging in the numbers:

\(y_t = 1.7566\, y_{t-1} - 0.5132\, y_{t-2} - 0.2434\, y_{t-3} - 0.8261\, \varepsilon_{t-1}\)

To begin forecasting, we need the last three years of data and the last residual:

tail(wmurders, 3)
## Time Series:
## Start = 2002 
## End = 2004 
## Frequency = 1 
## [1] 2.797697 2.662227 2.589383
tail(residuals(fit), 1)
## Time Series:
## Start = 2004 
## End = 2004 
## Frequency = 1 
## [1] 0.03708172

For 2005 forecast:

\(y_t = 1.7566 \times 2.589383 - 0.5132 \times 2.662227 - 0.2434 \times 2.797697 - 0.8261 \times 0.03708172 = 2.470663\)

For 2006 forecast:

\(y_t = 1.7566 \times 2.470663 - 0.5132 \times 2.589383 - 0.2434 \times 2.662227 - 0.8261 \times 0 = 2.363109\)

For 2007 forecast:

\(y_t = 1.7566 \times 2.363109 - 0.5132 \times 2.470663 - 0.2434 \times 2.589383 - 0.8261 \times 0 = 2.252837\)

These are nearly identical to the R calculated point forecasts.
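The same recursion can also be written directly in R as a cross-check (a small added sketch using the fitted coefficients and the last in-sample residual shown above; the variable names are illustrative):

phi1   <- coef(fit)['ar1']
theta1 <- coef(fit)['ma1']
y  <- tail(wmurders, 3)          # y_{t-3}, y_{t-2}, y_{t-1}
e  <- tail(residuals(fit), 1)    # last in-sample residual
fc <- numeric(3)
for (h in 1:3) {
  # y_t = (2 + phi1) y_{t-1} - (1 + 2 phi1) y_{t-2} + phi1 y_{t-3} + theta1 e_{t-1}
  fc[h] <- (2 + phi1) * y[3] - (1 + 2 * phi1) * y[2] + phi1 * y[1] + theta1 * e
  y <- c(y[2], y[3], fc[h])      # roll the window forward
  e <- 0                         # future errors are set to zero
}
fc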

f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

The plot of the series with forecasts are below:

autoplot(forecast(fit, h=3))

Using auto.arima(), the best model found is ARIMA(0,2,3):

(fit2 <- auto.arima(wmurders, seasonal=F, stepwise=F, approximation=F))
## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35

Comparing the log-likelihood and the AIC, ARIMA(0,2,3) appears better than ARIMA(1,2,1): it has the higher log-likelihood and the lower AIC. Below are the forecasts and plot from the ARIMA(0,2,3) model. As can be seen, the ARIMA(0,2,3) forecasts curve slightly, while the ARIMA(1,2,1) forecasts are closer to a straight line.

forecast(fit2, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.408817 2.137718 2.679916 1.994206 2.823428
## 2006       2.365555 1.985092 2.746018 1.783687 2.947423
## 2007       2.290976 1.753245 2.828706 1.468588 3.113363
autoplot(forecast(fit2, h=3))