8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers

a.Explain the differences among these figures. Do they all indicate that the data are white noise?

For all 3 charts, the ACF bars don’t have spikes and are within the significant level indicating that they are all white noise.

b.Why are the critical values at different distances from the mean of zero? Why are the auto-correlations different in each figure when they each refer to white noise?

The formula for the critical values is +/- 1.96/(sqrt(T-d)) where T equals the sample size and d equals the difference used. The critical value increases as the sample size increases, which explains why the critical value decreases.

8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differences.

ggtsdisplay(ibmclose)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

ACF plot shows a clear trend across the plot. The auto-correlation values are greater than the critical value point but decrease gradually. This means that the IBM stock data are non-stationary or predictable using lagged values. The data should be differenced in order to remove the auto-correlation.

PACF plot shows that all except the first plot are close to zero indicating that the data is non-stationary. The 1st plot is ACF’s 1st lag.

8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

plot(usnetelec)

ndiffs(usnetelec)
## [1] 1

Usnetelec chart has an upward trend and does not need a Box-Cox Transformation. ndiffs function estimates that only 1 difference is necessary to make usnetelec series stationary.

usnetelec2 <- diff(usnetelec,1)
ggtsdisplay(usnetelec2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(usnetelec2, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  usnetelec2
## X-squared = 0.8508, df = 1, p-value = 0.3563
kpss.test(usnetelec2)
## Warning in kpss.test(usnetelec2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  usnetelec2
## KPSS Level = 0.16555, Truncation lag parameter = 1, p-value = 0.1

The Ljung Box Test has a P value greater than .05 indicating that the data can be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

usgdp

plot(usgdp)

ndiffs(usgdp)
## [1] 2

usgdp chart has an upward trend and does not need a Box-Cox Transformation. ndiffs function estimates that 2 differences are necessary to make usnetelec series stationary.

usgdp2 <- diff(diff(usgdp,1),1)
ggtsdisplay(usgdp2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(usgdp, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  usgdp
## X-squared = 233.07, df = 1, p-value < 2.2e-16
kpss.test(usgdp)
## Warning in kpss.test(usgdp): p-value smaller than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  usgdp
## KPSS Level = 5.7895, Truncation lag parameter = 3, p-value = 0.01

The Ljung Box Test has a P value less than .05 indicating that the data can not be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

mcopper

plot(mcopper)

ndiffs(mcopper)
## [1] 1

mcopper chart has a minor seasonal trend and an upward trend that increases significantly after 2000. This plot does require a Box-Cox Transformation. ndiffs function estimates that only 1 difference is necessary to make usnetelec series stationary.

nsdiffs(mcopper)
## [1] 0

The nsdiffs function suggests that a seasonal difference is not necessary.

mcopper.lambda <- BoxCox.lambda(mcopper)
mcopper2 <- diff(BoxCox(mcopper, mcopper.lambda))
ggtsdisplay(mcopper2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(mcopper2, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  mcopper2
## X-squared = 57.517, df = 1, p-value = 3.353e-14
kpss.test(mcopper2)
## Warning in kpss.test(mcopper2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  mcopper2
## KPSS Level = 0.058427, Truncation lag parameter = 5, p-value = 0.1

The Ljung Box Test has a P value less than .05 indicating that the data can not be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

enplanements

plot(enplanements)

ndiffs(enplanements)
## [1] 1

enplanements chart has a strong seasonal trend as well as an upward trend that drops significantly after 2000. The data requires a Box-Cox Transformation. ndiffs function estimates that only 1 difference is necessary to make usnetelec series stationary.

nsdiffs(enplanements)
## [1] 1

nsdiffs function estimates that the data requires 1 seasonal difference to make it stationary.

enplanements.lambda <- BoxCox.lambda(enplanements)
enplanements2 <- diff(diff(BoxCox(enplanements,enplanements.lambda),12))
ggtsdisplay(enplanements2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(enplanements2, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  enplanements2
## X-squared = 29.562, df = 1, p-value = 5.417e-08
kpss.test(enplanements2)
## Warning in kpss.test(enplanements2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  enplanements2
## KPSS Level = 0.031244, Truncation lag parameter = 3, p-value = 0.1

The Ljung Box Test has a P value less than .05 indicating that the data can not be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

visitors

plot(visitors)

ndiffs(visitors)
## [1] 1

The plot shows a has a strong seasonal trend and an upward trend. The visitor data does need a Box-Cox Transformation. ndiffs function estimates that only 1 difference is necessary to make usnetelec series stationary.

nsdiffs(visitors)
## [1] 1

nsdiffs function estimates that the data requires 1 seasonal difference to make it stationary.

visitors.lambda <- BoxCox.lambda(visitors)
visitors2 <- diff(diff(BoxCox(visitors,visitors.lambda),12))
ggtsdisplay(visitors2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(visitors2, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  visitors2
## X-squared = 21.804, df = 1, p-value = 3.02e-06
kpss.test(visitors2)
## Warning in kpss.test(visitors2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  visitors2
## KPSS Level = 0.013605, Truncation lag parameter = 3, p-value = 0.1

The Ljung Box Test has a P value less than .05 indicating that the data can not be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

8.5

or your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("retail2.xlsx", skip=1)
head(retaildata)
## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # ... with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, ...

Set as Time Series

myts <- ts(retaildata[,"A3349338X"],
  frequency=12, start=c(1982,4))

Plot

autoplot(myts)+ggtitle("Autoplot of A3349338X Sales")
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

There is a seasonal trend, as well as an upward trend. The plot requires a transformation.

ndiffs(myts)
## [1] 1
nsdiffs(myts)
## [1] 1

The ndiffs & nsdiffs functions suggest that the retail data requires one differencing and seasonal differencing to be stationary.

myts.lambda<- BoxCox.lambda(myts)
myts2 <- diff(diff(BoxCox(myts,myts.lambda),12))
ggtsdisplay(myts2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(myts2, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  myts2
## X-squared = 12.717, df = 1, p-value = 0.0003623
kpss.test(myts2)
## Warning in kpss.test(myts2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  myts2
## KPSS Level = 0.021257, Truncation lag parameter = 4, p-value = 0.1

The Ljung Box Test has a P value less than .05 indicating that the data can not be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

8.6

Use R to simulate and plot some data from simple ARIMA models.

a

Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and σ^2 = 1. The process starts with y1 = 0.

y <- ts(numeric(100))
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1]+e[i]

b

Produce a time plot for the series. How does the plot change as you change phi1?

g <- function(phi1){
  y <- ts(numeric(100))
  e <- rnorm(100)
  
  for(i in 2:100){
    y[i] <- 0.6*y[i-1]+e[i]
  }
  return(y)}
autoplot(g(0.2), series = "0.2") +   geom_line(size = 1, colour = "orange") +   autolayer(g(0.5), series = "0.5", size = 1) +   autolayer(g(1), size = 1, series = "1") +   ylab("AR(1) models") +  guides(colour = guide_legend(title = "Phi1"))
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

When phi1 is increased, it looks like there is a slight increase in y.

c

Write your own code to generate data from an MA(1) model with theta1 = 0.6 and σ^2 = 1

g2 <- function(theta1){
  # generate 100 data points from an MA(1) model with input theta1.

  y <- ts(numeric(100))
  # error 'e's have variation sigma^2 as 1.
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- theta1*e[i-1] + e[i]
  }
  return(y)
}
  1. Produce a time plot for the series. How does the plot change as you change theta1?
autoplot(g2(0.2), series = "0.2") +   geom_line(size = 1, colour = "orange") +   autolayer(g2(0.5), series = "0.5", size = 1) +   autolayer(g2(1), size = 1, series = "1") +   ylab("AR(1) models") +  guides(colour = guide_legend(title = "Theta1"))
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

As theta1 increases, there is a increase in y as well. ##e Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and σ^2 = 1.

arima1 <- ts(numeric(50))
e <- rnorm(50)
for(i in 2:50){
arima1[i] <- 0.6*arima1[i-1] + 0.6*e[i-1] + e[i]
}

f

Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and σ^2= 1. (Note that these parameters will give a non-stationary series.)

arima2 <- ts(numeric(50))
e <- rnorm(50)
for(i in 2:50){
arima1[i] <- -0.8*arima2[i-1] + 0.3*arima2[i-1] + e[i]
}

g

Graph the latter two series and compare them.

autoplot(arima1, series = "ARMA(1,1)") +  autolayer(arima2, series = "AR(2)") +  ylab("y") + guides(colour = guide_legend(title = "Models"))
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

ggtsdisplay(wmurders)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

The chart shows that there is no seasonality in the time series. Additionally, there isn’t a significant variance, so a Box-Cox transformation is not required.

ndiffs(wmurders)
## [1] 2

ndiffs function suggests that we need a differencing of 2.

wmurders2 <- diff(diff(wmurders))
ggtsdisplay(wmurders2)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'

Box.test(wmurders2, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  wmurders2
## X-squared = 26.148, df = 1, p-value = 3.162e-07
kpss.test(wmurders2)
## Warning in kpss.test(wmurders2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  wmurders2
## KPSS Level = 0.030483, Truncation lag parameter = 1, p-value = 0.1

The Ljung-Box test resulted in a p value less than 0.5 which means the data can not be thought of as white noise series.

The KPSS Test has a P value greater than .05 indicating that the null hypothesis of stationarity around a trend is not rejected.

There are spikes in lags 1 and 2 in the ACF. The appropriate ARIMA model can be ARIMA(0,2,2)

b

Should you include a constant in the model? Explain.

The constant in a non-seasonal ARIMA model is equivalent to including a polynomial trend of order d.The value of d has an effect on the prediction intervals. The higher the value of d, the more rapidly the prediction intervals increase in size but Quadratic or higher order trend is dangerous for forecasting. Since d = 2 in our model, a constant is not required.

c

Write this model in terms of the back shift operator.

(1−B)2∗yt=(1+theta1∗B+theta2∗B2)∗et

d

Fit the model using R and examine the residuals. Is the model satisfactory?

wmurders_fit <- Arima(wmurders, order = c(0,2,2))
checkresiduals(wmurders_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10

ACF chart shows that the residuals are within the critical values. The histogram shows that the residuals are normally distributed. Overall, the model is satisfactory.

e

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(wmurders_fit, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.480525 2.202620 2.758430 2.055506 2.905544
## 2006       2.374890 1.985422 2.764359 1.779250 2.970531
## 2007       2.269256 1.772305 2.766206 1.509235 3.029276

f

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(wmurders_fit, h=3))

g

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

wmurders_fit2 <- auto.arima(wmurders, seasonal = FALSE, stepwise =FALSE)
forecast(wmurders_fit2, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.408817 2.137718 2.679916 1.994206 2.823428
## 2006       2.365555 1.985092 2.746018 1.783687 2.947423
## 2007       2.290976 1.753245 2.828706 1.468588 3.113363
autoplot(forecast(wmurders_fit2, h=3))

checkresiduals(wmurders_fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10

The auto arima function suggests a Arima(0,2,3) model which looks very similar to the Arima(0,2,2). Both models are okay.