library(tidyverse)
library(fpp2)
library(urca) # ur.kpss() performs the KPSS test, whose null hypothesis is that the series is stationary

8.1, 8.2, 8.3, 8.5, 8.6, 8.7

8.1a

Explain the differences among these figures. Do they all indicate that the data are white noise?

All three plots show autocorrelations scattered randomly around zero, which is what we expect from white noise. The series have different sample sizes, so the significance bounds narrow as the sample size grows. For white noise we are looking for no spikes beyond the bounds (apart from the occasional one by chance), and none of the three plots show a significant lag, so all three appear to be white noise.
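For reference, similar ACF plots can be reproduced with simulated white noise, assuming the three series lengths in the exercise are 36, 360 and 1,000 observations:

set.seed(42)
ggAcf(rnorm(36))    # wide significance bounds with few observations
ggAcf(rnorm(360))   # narrower bounds
ggAcf(rnorm(1000))  # narrowest bounds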


8.1b

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?


The critical values are at +/- 2/sqrt(T), where T is the length of the time series, so as the series gets longer the critical values move closer to zero. For a white noise series we expect each autocorrelation to be close to zero, and the differences between the figures simply reflect random variation and the different sample sizes.
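The bounds themselves are easy to compute (again assuming lengths of 36, 360 and 1,000):

# the +/- 2/sqrt(T) bounds shrink as T grows
2 / sqrt(c(36, 360, 1000))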


8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
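The series, its ACF and PACF can be shown together with ggtsdisplay():

ibmclose %>% ggtsdisplay()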

The ACF shows the correlation between the series and its lagged values. Because lagged values are correlated with one another, it can be hard to tell whether a later lag adds any new information; the PACF addresses this by showing the correlation at each lag after removing the effect of the shorter lags. Here the ACF starts near one and decays only very slowly, which is the signature of a non-stationary, trending series, while the PACF has a single dominant spike at lag 1. This is clearly not white noise: there is a strong trend in this time series and it should be differenced.

ibm <- ibmclose
ibm %>% 
  diff() %>%        # a first difference removes the trend
  ggtsdisplay()

8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec %>% autoplot()

This time series has no seasonality, only trend. We can take the first difference to obtain stationary data.

usnetelec %>%  diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.1585 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

With one difference, the KPSS test statistic (0.1585) is below even the 10% critical value, so we fail to reject the null hypothesis that the series is stationary. The differenced data can be treated as stationary.
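ndiffs() automates this check by applying the unit root test repeatedly (KPSS by default) to suggest how many first differences are needed:

usnetelec %>% ndiffs()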

usnetelec %>%  diff() %>% autoplot()

usgdp %>% autoplot()

This time series has no seasonality, only trend. Let's try a first difference and check whether that is enough.

usgdp %>% diff() %>%  ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 1.7909 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Not quite: the test statistic (1.7909) exceeds even the 1% critical value, so the first-differenced series is still not stationary. Let's add a Box-Cox transformation before differencing.

lambda <- BoxCox.lambda(usgdp)
usgdp %>% BoxCox(lambda) %>%  diff() %>%  ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

That worked: the test statistic (0.2013) is now below the 10% critical value, so the transformed and differenced series is stationary.

usgdp %>% BoxCox(lambda) %>% diff() %>% autoplot()

mcopper %>% autoplot()

This has a trend and looks like it could have seasonality as well.

mcopper %>% ggsubseriesplot()

The subseries plot doesn't show any clear seasonal pattern, so no seasonal differencing is needed.

mcopper %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.1843 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

One difference looks sufficient: the test statistic (0.1843) is below all the critical values.

mcopper %>% diff() %>% autoplot()

The differenced series passes the KPSS unit root test, but the variation is clearly uneven across the plot, so it doesn't look like homogeneous randomness throughout. A Box-Cox transformation may help stabilise the variance.

lambda <- BoxCox.lambda(mcopper)
mcopper %>% BoxCox(lambda = lambda) %>% diff() %>% autoplot()

That looks much more evenly random and closer to white noise.

enplanements %>% autoplot()

This series has trend and seasonality.

lambda <- BoxCox.lambda(enplanements)
enplanements %>% BoxCox(lambda = lambda) %>% nsdiffs()
## [1] 1

After the Box-Cox transformation, nsdiffs() indicates that one seasonal difference needs to be taken.
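To check whether an ordinary first difference is needed on top of the seasonal one, ndiffs() can be applied to the seasonally differenced series:

enplanements %>% BoxCox(lambda = lambda) %>% diff(lag = 12) %>% ndiffs()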

enplanements %>% BoxCox(lambda = lambda) %>% diff(lag=12) %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0424 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
enplanements %>% BoxCox(lambda = lambda) %>% diff(lag=12) %>% diff() %>% autoplot()

visitors %>% autoplot()

This series has both trend and seasonality.

lambda <- BoxCox.lambda(visitors)
visitors %>% BoxCox(lambda = lambda) %>% nsdiffs()
## [1] 1

After the Box-Cox transformation, nsdiffs() again indicates one seasonal difference; as with enplanements, we also take a first difference and check the result with the KPSS test.

visitors %>% BoxCox(lambda = lambda) %>% diff(lag=12) %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0158 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
visitors %>% BoxCox(lambda = lambda) %>% diff(lag=12) %>% diff() %>% autoplot()


8.5


retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],  frequency=12, start=c(1982,4))

myts %>% autoplot()

This plot has both trend and seasonality.
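Before transforming, the tests can suggest how many seasonal and first differences to take (a quick sketch):

myts %>% nsdiffs()                    # suggested number of seasonal differences
myts %>% diff(lag = 12) %>% ndiffs()  # first differences needed after seasonal differencing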

lambda <- BoxCox.lambda(myts)
myts %>% BoxCox(lambda = lambda) %>% diff(lag=12) %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0138 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
myts %>% BoxCox(lambda = lambda) %>% diff(lag=12) %>% diff() %>% autoplot()


8.6


a

set.seed(1234)
y <- ts(numeric(100))
e <- rnorm(100)
# AR(1): y_t = 0.6*y_{t-1} + e_t
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
y %>% autoplot()

b

y02 <- ts(numeric(100))
y10 <- ts(numeric(100))

# same AR(1) recursion with phi = 0.2 and phi = 1 (the latter is a random walk);
# each series must use its own lagged values
for(i in 2:100)
  y02[i] <- 0.2*y02[i-1] + e[i]
for(i in 2:100)
  y10[i] <- 1*y10[i-1] + e[i]

autoplot(y) +
  autolayer(y02) +
  autolayer(y10)

c

ma06 <- ts(numeric(100))
ma02 <- ts(numeric(100))
ma10 <- ts(numeric(100))

# MA(1): y_t = 0.6*e_{t-1} + e_t
for(i in 2:100)
  ma06[i] <- 0.6*e[i-1] + e[i]

d

ma06 %>% autoplot()  # time plot of the MA(1) series

for(i in 2:100)
  ma02[i] <- 0.2*e[i-1] + e[i]
for(i in 2:100)
  ma10[i] <- 1*e[i-1] + e[i]
autoplot(ma06) +
  autolayer(ma02) +
  autolayer(ma10)

ggtsdisplay(ma02)

ggtsdisplay(ma06)

ggtsdisplay(ma10)

e and f

arma06 <- ts(numeric(100))
ar0803 <- ts(numeric(100))

# ARMA(1,1): y_t = 0.6*y_{t-1} + 0.6*e_{t-1} + e_t
for(i in 2:100)
  arma06[i] <- 0.6*arma06[i-1] + 0.6*e[i-1] + e[i]

# AR(2): y_t = -0.8*y_{t-1} + 0.3*y_{t-2} + e_t
for(i in 3:100)
  ar0803[i] <- -0.8*ar0803[i-1] + 0.3*ar0803[i-2] + e[i]

arma06 %>% ggtsdisplay()

ar0803 %>% ggtsdisplay()

autoplot(arma06) +
  autolayer(ar0803)

g

The ARMA(1,1) series oscillates around its mean and looks stationary, with the ACF dying out after the first few lags. The AR(2) series behaves very differently: its coefficients (phi1 = -0.8, phi2 = 0.3) violate the stationarity conditions for an AR(2) process, so it swings between positive and negative values with steadily growing amplitude instead of settling around a constant mean.

8.7


wmurders %>% ggtsdisplay()

The ACF decays roughly exponentially and the PACF has a single significant spike at lag 1 with none beyond it, which points toward an AR model, i.e. ARIMA(p, d, 0).

wmurders %>% ndiffs()
## [1] 2

ndiffs() suggests two differences, but first I'll try a Box-Cox transformation with a single first difference to see if that is enough.

lambda <- BoxCox.lambda(wmurders)
wmurders %>% BoxCox(lambda = lambda) %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.5466 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic (0.5466) is above the 5% critical value, so the series is still not stationary at that level. I'll drop the Box-Cox transformation and take a second difference instead.

# drop the Box-Cox transformation and use two first differences
wmurders %>% diff() %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0458 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739


The test statistic (0.0458) is well below all the critical values, so two differences give a stationary series. The first model I would try is ARIMA(1,2,0), also considering values of p and q within +/- 1 of this.
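To support this choice, the ACF and PACF of the twice-differenced series can be inspected:

wmurders %>% diff() %>% diff() %>% ggtsdisplay()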

b


No constant should be included. With d = 2, a constant would make the long-run forecasts follow a quadratic trend, which is not plausible here, and Arima() drops the constant automatically when more than one difference is taken.


c


(1 - phi1 B)(1 - B)^2 yt = et, where phi1 is the AR(1) coefficient, B is the backshift operator, and et is white noise.


d


# fit the manually chosen ARIMA(1,2,0)
model <- wmurders %>% Arima(order = c(1,2,0))
model %>% accuracy()
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.001376898 0.2274352 0.1777919 0.001486708 5.080777
##                  MASE       ACF1
## Training set 1.093337 -0.1593845
model %>% checkresiduals()

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,0)
## Q* = 16.452, df = 9, p-value = 0.05802
## 
## Model df: 1.   Total lags used: 10

The residual plot and histogram look reasonable and no ACF spike crosses the significance bounds, but the Ljung-Box p-value (0.058) is borderline: we fail to reject the null hypothesis of uncorrelated residuals at the 5% level, though only just. There may still be some autocorrelation this model isn't capturing.


Let’s try the auto.arima function to see if we get a different outcome.

model1 <- wmurders %>% auto.arima()
model1 %>% accuracy()
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343
model1 %>% checkresiduals()

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

The Ljung-Box p-value (0.134) is higher here, so we do not reject the null hypothesis of uncorrelated residuals; the ARIMA(1,2,1) residuals look like white noise.

summary(model)
## Series: . 
## ARIMA(1,2,0) 
## 
## Coefficients:
##           ar1
##       -0.6719
## s.e.   0.0981
## 
## sigma^2 estimated as 0.05471:  log likelihood=2
## AIC=0   AICc=0.24   BIC=3.94
## 
## Training set error measures:
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.001376898 0.2274352 0.1777919 0.001486708 5.080777
##                  MASE       ACF1
## Training set 1.093337 -0.1593845
summary(model1)
## Series: . 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

When comparing the AICc, the auto.arima model performs better.


e


model %>% forecast(h = 3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.474461 2.174701 2.774221 2.016018 2.932904
## 2006       2.387811 1.889464 2.886158 1.625655 3.149966
## 2007       2.282165 1.477486 3.086843 1.051515 3.512814

f

# hand-calculation attempt: build forecasts from the last residuals and the AR coefficient (see the note below)
r <- model$residuals
len <- length(r)
et <- r[len]
et_1 <- r[len - 1]
ar1 <- coef(model)["ar1"]
yt1 <- wmurders[length(wmurders)] + (1 + ar1) * et_1  
yt2 <- yt1 + (1 + ar1) * et
yt3 <- yt2 + (1 + ar1) * et

print(yt1)
##     ar1 
## 2.48939
print(yt2) 
##      ar1 
## 2.591792
print(yt3)
##      ar1 
## 2.694194

These don't match the forecast() output. The recursion above isn't the ARIMA(1,2,0) forecast equation: with two differences and an AR(1) term, each point forecast is a weighted combination of the last three observed (or already forecast) values, not a function of the last residuals.
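A sketch of the correct hand calculation: expanding (1 - phi1 B)(1 - B)^2 yt = et gives yt = (2 + phi1) y(t-1) - (1 + 2 phi1) y(t-2) + phi1 y(t-3) + et, and point forecasts set the future errors to zero. (The variable names below are just for illustration.)

phi <- coef(model)[["ar1"]]
w <- as.numeric(wmurders)
n <- length(w)
f1 <- (2 + phi) * w[n] - (1 + 2 * phi) * w[n - 1] + phi * w[n - 2]
f2 <- (2 + phi) * f1   - (1 + 2 * phi) * w[n]     + phi * w[n - 1]
f3 <- (2 + phi) * f2   - (1 + 2 * phi) * f1       + phi * w[n]
c(f1, f2, f3)  # should agree with the point forecasts from forecast(model, h = 3)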

model %>% forecast(h = 3) %>% autoplot()

g

wmurders %>% auto.arima() %>% summary()
## Series: . 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

auto.arima() selected ARIMA(1,2,1). Its AICc (-6.39) is lower than the manual ARIMA(1,2,0) model's AICc (0.24), and lower AICc is better, so by this criterion the auto.arima model is preferred.

model1 %>% forecast(h = 3) %>% autoplot()