Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
The ACF plots indicate white noise for the three different time series. For white noise, we expect almost all autocorrelations to lie within the 95% limits, with the spikes close to zero.
The critical values depend on the length of the series: the limits are \(\pm2/\sqrt{T}\), so for random numbers the bounds move closer to zero as the series length \(T\) increases.
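A minimal sketch reproducing this experiment (assuming the fpp2 and gridExtra packages; the seed is arbitrary and ours, not from the text):
library(fpp2)
library(gridExtra)
set.seed(42)  # arbitrary seed for reproducibility
wn1 <- ggAcf(ts(rnorm(36))) + ggtitle("T = 36")
wn2 <- ggAcf(ts(rnorm(360))) + ggtitle("T = 360")
wn3 <- ggAcf(ts(rnorm(1000))) + ggtitle("T = 1000")
grid.arrange(wn1, wn2, wn3)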
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose)
The time series plot exhibits a downward trend. The ACF displays autocorrelation coefficients that are large, positive, and well outside the bounds, decaying only slowly with the lag. The PACF shows a single dominant spike at lag 1. All of the plots indicate a non-stationary time series, and differencing can be applied to make it stationary.
# check number of differencing required
ndiffs(ibmclose)
## [1] 1
cbind("Daily closing IBM stock price" = ibmclose,
      "Daily Change" = diff(ibmclose)) %>%
  autoplot(facets=TRUE)
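As an additional check (a sketch assuming the fpp2 and urca packages, as used for the KPSS tests below), the differenced series should now pass a stationarity test:
library(fpp2)
library(urca)
diff(ibmclose) %>% ur.kpss() %>% summary()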
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
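For reference, for a positive series the Box-Cox transformation computed by BoxCox() is
\[
w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ (y_t^\lambda - 1)/\lambda & \text{otherwise,} \end{cases}
\]
with \(\lambda\) chosen automatically by BoxCox.lambda().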
usnetelec # original series
ggtsdisplay(usnetelec)
# lambda and transformed series
lambda <- BoxCox.lambda(usnetelec)
lambda
## [1] 0.5167714
usnetelec.bc <- BoxCox(usnetelec, lambda = lambda)
ggtsdisplay(usnetelec.bc)
# check number of differencing required
ndiffs(usnetelec.bc)
## [1] 2
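Since ndiffs() suggests second-order differencing, note that applying diff() twice is equivalent to a single call with differences = 2 (a quick equivalence sketch, assuming fpp2 is loaded):
library(fpp2)
all.equal(diff(diff(usnetelec)),
          diff(usnetelec, differences = 2))  # should be TRUE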
# perform transformation, differencing, plot the stationary series
usnetelec.diff <- usnetelec.bc %>% diff() %>% diff()
ggtsdisplay(usnetelec.diff)
usnetelec.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.072
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
usgdp # original series
ggtsdisplay(usgdp)
# lambda and transformed series
lambda <- BoxCox.lambda(usgdp)
lambda
## [1] 0.366352
usgdp.bc <- BoxCox(usgdp, lambda = lambda)
ggtsdisplay(usgdp.bc)
# check number of differencing required
ndiffs(usgdp.bc)
## [1] 1
nsdiffs(usgdp.bc)
## [1] 0
# perform transformation, differencing, plot the stationary series
usgdp.diff <- usgdp.bc %>% diff()
ggtsdisplay(usgdp.diff)
usgdp.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
mcopper # original series
ggtsdisplay(mcopper)
# lambda and transformed series
lambda <- BoxCox.lambda(mcopper)
lambda
## [1] 0.1919047
mcopper.bc <- BoxCox(mcopper, lambda = lambda)
ggtsdisplay(mcopper.bc)
# check number of differencing required
ndiffs(mcopper.bc)
## [1] 1
nsdiffs(mcopper.bc)
## [1] 0
# perform transformation, differencing, plot the stationary series
mcopper.diff <- mcopper.bc %>% diff()
ggtsdisplay(mcopper.diff)
mcopper.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
enplanements # original series
ggtsdisplay(enplanements)
# lambda and transformed series
lambda <- BoxCox.lambda(enplanements)
lambda
## [1] -0.2269461
enplanements.bc <- BoxCox(enplanements, lambda = lambda)
ggtsdisplay(enplanements.bc)
# check number of differencing required
ndiffs(enplanements.bc)
## [1] 1
nsdiffs(enplanements.bc)
## [1] 1
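With both ndiffs() and nsdiffs() returning 1, common practice is to take the seasonal difference first and then re-check whether a further first difference is still needed; a sketch assuming fpp2 is loaded:
library(fpp2)
enplanements.sdiff <- diff(BoxCox(enplanements, BoxCox.lambda(enplanements)), lag = 12)
ndiffs(enplanements.sdiff)  # if this is still > 0, take an additional first difference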
# perform transformation, differencing, plot the stationary series
enplanements.diff <- enplanements.bc %>% diff(12) %>% diff(1)
ggtsdisplay(enplanements.diff)
enplanements.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0424
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
visitors # original series
ggtsdisplay(visitors)
# lambda and transformed series
lambda <- BoxCox.lambda(visitors)
lambda
## [1] 0.2775249
visitors.bc <- BoxCox(visitors, lambda = lambda)
ggtsdisplay(visitors.bc)
# check number of differencing required
ndiffs(visitors.bc)
## [1] 1
nsdiffs(visitors.bc)
## [1] 1
# perform transformation, differencing, plot the stationary series
visitors.diff <- visitors.bc %>% diff(12) %>% diff(1)
ggtsdisplay(visitors.diff)
visitors.diff %>% ur.kpss() %>% summary()
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],
           frequency=12, start=c(1982,4))
# original series
ggtsdisplay(myts)
# check order of differencing needed
ndiffs(myts)
## [1] 1
nsdiffs(myts)
## [1] 1
# transforming and differencing
myts.diff <- myts %>%
  BoxCox(lambda = BoxCox.lambda(myts)) %>%
  diff(12) %>%
  diff(1)
# check differencing
myts.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0162
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ndiffs(myts.diff)
## [1] 0
nsdiffs(myts.diff)
## [1] 0
# display stationary series
ggtsdisplay(myts.diff)
Use R to simulate and plot some data from simple ARIMA models.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
As \(\phi\) decreases toward \(-1\), the series oscillates more sharply, with consecutive values tending to change sign; as \(\phi\) increases toward \(+1\), the series becomes smoother, with longer swings and a larger variance (a stationary AR(1) has variance \(\sigma^2/(1-\phi^2)\)).
# phi = 0.6
ar1 <- autoplot(y) + ggtitle("phi = 0.6")
# phi = -0.9
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.9*y[i-1] + e[i]
ar2 <- autoplot(y) + ggtitle("phi = -0.9")
# phi = 0.9
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
ar3 <- autoplot(y) + ggtitle("phi = 0.9")
grid.arrange(ar1, ar2, ar3)
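The effect of \(\phi\) on the spread can be checked empirically: with unit error variance, a stationary AR(1) has variance \(1/(1-\phi^2)\). A quick sketch (the simulate_ar1 helper and the seed are ours, not part of the exercise code):
# simulate_ar1 is our own helper for this check
simulate_ar1 <- function(phi, n = 10000) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) y[i] <- phi * y[i - 1] + e[i]
  y
}
set.seed(1)  # arbitrary seed
sapply(c(0.6, -0.9, 0.9), function(phi)
  c(phi = phi,
    sample_var = var(simulate_ar1(phi)),
    theoretical = 1 / (1 - phi^2)))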
y <- ts(numeric(100))
e <- rnorm(100, sd = 1)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
Similar to \(\phi\), as \(\theta\) varies from small value to large, the pattern seems to gather/contract and expands respectively
# theta = 0.6
ma1 <- autoplot(y) + ggtitle("theta = 0.6")
# theta = -0.9
y <- ts(numeric(100))
e <- rnorm(100, sd = 1)
for(i in 2:100)
  y[i] <- -0.9*e[i-1] + e[i]
ma2 <- autoplot(y) + ggtitle("theta = -0.9")
# theta = 0.9
y <- ts(numeric(100))
e <- rnorm(100, sd = 1)
for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]
ma3 <- autoplot(y) + ggtitle("theta = 0.9")
grid.arrange(ma1, ma2, ma3)
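The lag-1 autocorrelation of an MA(1) is \(\theta/(1+\theta^2)\), with zero autocorrelation at higher lags; this can also be checked empirically (the simulate_ma1 helper and the seed are ours, not part of the exercise code):
# simulate_ma1 is our own helper for this check
simulate_ma1 <- function(theta, n = 10000) {
  e <- rnorm(n)
  y <- numeric(n)
  for (i in 2:n) y[i] <- theta * e[i - 1] + e[i]
  y
}
set.seed(1)  # arbitrary seed
sapply(c(0.6, -0.9, 0.9), function(theta)
  c(theta = theta,
    sample_acf1 = acf(simulate_ma1(theta), plot = FALSE)$acf[2],
    theoretical = theta / (1 + theta^2)))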
# ARMA(1,1) with phi1 = 0.6 and theta1 = 0.6
y <- ts(numeric(100))
e <- rnorm(100, sd = 1)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
ar_1 <- y
ar_1
## Time Series:
## Start = 1
## End = 100
## Frequency = 1
## [1] 0.00000000 1.13749784 0.79589538 1.58938148 1.96423378 3.73145822
## [7] 5.44723832 4.16745639 2.78259891 1.42770427 -0.24416561 -2.22724509
## [13] -2.88801505 -3.33505313 -2.57618589 -1.06534140 -0.12861519 1.07531630
## [19] 1.54179212 1.31065688 0.11522932 0.52004219 0.13468607 -1.25754560
## [25] -0.66862668 -0.52521532 -1.42824873 -1.45800392 -1.37723793 -0.50796846
## [31] 1.74269122 3.14544432 2.92007892 2.26686677 1.06538359 1.13544352
## [37] 1.22868474 0.22813256 0.02890145 0.22136591 1.12464713 1.63743591
## [43] 1.17254519 0.28092364 -1.59322289 -2.21924554 -1.78961935 1.26112252
## [49] 4.60896821 4.35274641 2.52275427 1.10750574 1.19947125 1.70854700
## [55] 1.31348652 0.68771480 0.36402736 0.29830860 1.74686864 1.26872784
## [61] -0.27098412 0.54190947 1.02738014 -0.81840842 -1.96221070 -1.33366884
## [67] -1.40874480 -2.53306606 -1.41748389 1.06297559 1.32794597 1.55115843
## [73] 2.39851868 1.97993738 0.29371400 0.93240455 1.50030846 1.43788222
## [79] 2.00808082 1.85233656 0.51203048 -0.77635841 -0.14387174 1.07823299
## [85] 2.15102434 1.64552508 -0.16059911 0.71735796 1.71896943 3.04925120
## [91] 3.83453134 3.92184107 2.88987061 1.25847538 1.40635854 1.91508836
## [97] 0.56493967 0.09420996 0.07735447 -0.04652694
# AR(2) with phi1 = -0.8 and phi2 = 0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
ar_2 <- y
ar_2
## Time Series:
## Start = 1
## End = 100
## Frequency = 1
## [1] 0.000000e+00 0.000000e+00 -1.766034e-02 -8.112733e-02 -1.247351e+00
## [6] 8.210908e-02 -4.274627e-01 -2.097230e-01 -7.844836e-01 1.276510e+00
## [11] -2.179395e+00 1.935257e+00 -3.028074e+00 5.317658e+00 -4.643641e+00
## [16] 4.905796e+00 -3.421862e+00 6.154831e+00 -7.738244e+00 8.500417e+00
## [21] -8.481240e+00 9.641654e+00 -9.786070e+00 1.038785e+01 -1.099147e+01
## [26] 1.150810e+01 -1.185066e+01 1.105544e+01 -1.188322e+01 1.335750e+01
## [31] -1.228517e+01 1.374135e+01 -1.369116e+01 1.580071e+01 -1.726498e+01
## [36] 1.867941e+01 -1.878166e+01 2.193818e+01 -2.302312e+01 2.472098e+01
## [41] -2.760470e+01 3.008818e+01 -3.266097e+01 3.696589e+01 -3.890689e+01
## [46] 4.391578e+01 -4.603623e+01 4.946917e+01 -5.486001e+01 5.919838e+01
## [51] -6.340920e+01 6.742670e+01 -7.288819e+01 7.834491e+01 -8.524337e+01
## [56] 8.973207e+01 -9.614659e+01 1.034606e+02 -1.117503e+02 1.223800e+02
## [61] -1.310515e+02 1.417140e+02 -1.534256e+02 1.634257e+02 -1.763489e+02
## [66] 1.911024e+02 -2.061300e+02 2.224654e+02 -2.382128e+02 2.590001e+02
## [71] -2.780587e+02 3.001013e+02 -3.242283e+02 3.508362e+02 -3.782210e+02
## [76] 4.086319e+02 -4.409139e+02 4.738349e+02 -5.092649e+02 5.495717e+02
## [81] -5.924312e+02 6.384162e+02 -6.899643e+02 7.434557e+02 -8.014461e+02
## [86] 8.627404e+02 -9.314966e+02 1.003728e+03 -1.082282e+03 1.168428e+03
## [91] -1.259395e+03 1.357393e+03 -1.463961e+03 1.578856e+03 -1.702267e+03
## [96] 1.834358e+03 -1.978378e+03 2.133437e+03 -2.300067e+03 2.481422e+03
ggtsdisplay(ar_1)
ggtsdisplay(ar_2)
The main difference between the two series is stationarity: the ARMA(1,1) series is stationary, while the AR(2) series is non-stationary, oscillating in sign with exponentially increasing magnitude because \(\phi_1 = -0.8\) and \(\phi_2 = 0.3\) violate the stationarity condition \(\phi_2 - \phi_1 < 1\).
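This can be verified from the AR(2) characteristic polynomial \(1 - \phi_1 B - \phi_2 B^2\), whose roots must all lie outside the unit circle for stationarity:
# coefficients in increasing powers of B: 1 + 0.8 B - 0.3 B^2
abs(polyroot(c(1, 0.8, -0.3)))
# one root has modulus below 1, so the AR(2) process is non-stationary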
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
# time series and autocorrelation plots
ggtsdisplay(wmurders)
The time series does not seem to have a strong trend, cycle, or seasonality. The ACF shows correlations that decay only slowly as the lag increases, while the PACF shows only one strong positive spike, at lag 1.
ndiffs(wmurders)
## [1] 2
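Since second-order differencing is suggested, the ACF and PACF of the twice-differenced series help in choosing p and q (a sketch, assuming fpp2 is loaded):
library(fpp2)
ggtsdisplay(diff(wmurders, differences = 2))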
Based on the plots and the suggested order of differencing, we would try an ARIMA(1,2,1) model.
No, a constant should not be included: with \(d = 2\), a constant would induce a quadratic trend in the long-term forecasts, which is generally undesirable.
\((1-\phi_1B)(1-B)^2 y_t = (1 + \theta_1 B)\varepsilon_t\)
(fit <- Arima(wmurders, order=c(1,2,1)))
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
Yes, the model is satisfactory: the residuals appear approximately normally distributed, the ACF coefficients are within the 95% critical limits, and the Ljung-Box test (p = 0.1335) shows no significant remaining autocorrelation.
forecast(fit, h=3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
autoplot(forecast(fit, h=3))
Does auto.arima() give the same model you have chosen? If not, which model do you think is better? No: auto.arima() selects an ARIMA(0,2,3). Based on AICc (\(-6.7\) versus \(-6.39\) for the ARIMA(1,2,1)), the auto.arima() model is slightly better.
(auto.fit <- auto.arima(wmurders, seasonal = FALSE,
                        stepwise = FALSE, approximation = FALSE))
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
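For a direct comparison of the two candidates (assuming fit and auto.fit from above; forecast's Arima objects store the AICc in their aicc component):
c("ARIMA(1,2,1)" = fit$aicc, "ARIMA(0,2,3)" = auto.fit$aicc)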
checkresiduals(auto.fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
##
## Model df: 3. Total lags used: 10
forecast(auto.fit, h=3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.408817 2.137718 2.679916 1.994206 2.823428
## 2006 2.365555 1.985092 2.746018 1.783687 2.947423
## 2007 2.290976 1.753245 2.828706 1.468588 3.113363
autoplot(forecast(auto.fit, h=3))