Introduction

Stationarity and differencing

# White noise: constant mean and variance over time
y <- rnorm(1000, 0, 1)
plot.ts(y, main="A stationary time series")

# A linear trend plus noise whose variance grows with time
x <- 1:1000
y <- rnorm(1000, 0, 1*x)
z <- 2*x + y
plot.ts(z, main="A non-stationary series")

Which of these series is stationary?

 

cbind("Billion kWh" = usmelec,
      "Logs" = log(usmelec),
      "Seasonally\n differenced logs" =
        diff(log(usmelec),12),
      "Doubly\n differenced logs" =
        diff(diff(log(usmelec),12),1)) %>%
  autoplot(facets=TRUE) +
  xlab("Year") + ylab("") +
  ggtitle("Monthly US net electricity generation")

 

 

library(urca)   # for ur.kpss()
library(fpp2)   # for the goog data
goog %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 10.7223 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
The test statistic (10.72) is far above the 1% critical value (0.739), so the null hypothesis of stationarity is rejected: the goog series needs differencing.

# Perform the KPSS test on the differenced series
goog %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.0324 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
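
This time the test statistic (0.0324) is below even the 10% critical value (0.347), so the differenced series appears stationary.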

 

library(tseries)   # for adf.test()
goog %>% adf.test()
## 
##  Augmented Dickey-Fuller Test
## 
## data:  .
## Dickey-Fuller = -2.5417, Lag order = 9, p-value = 0.349
## alternative hypothesis: stationary
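
Note that the ADF test reverses the hypotheses: the null is non-stationarity. With a p-value of 0.349 we cannot reject the unit root, which agrees with the KPSS conclusion that the undifferenced goog series is non-stationary.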

 

# Determine the number of first differences required
ndiffs(goog)
## [1] 1
# Determine the number of seasonal differences required
usmelec %>% log() %>% nsdiffs()
## [1] 1
# Determine the number of first differences required
# after seasonal differencing
usmelec %>% log() %>% diff(lag=12) %>% ndiffs()
## [1] 1
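
Together these results suggest one seasonal difference followed by one first difference for the logged usmelec data, which matches the doubly differenced series plotted earlier.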

 

Autocorrelation

amtrak <- read.csv("Amtrak.csv")
# Time series containing the first 24 months of ridership
amtrak.ts <- ts(amtrak$Ridership,
                start=c(1991,1), 
                end=c(1992,12),
                freq=12)
# Generate the lag-1 series using the dplyr package
lag1 <- dplyr::lag(as.data.frame(amtrak.ts),1)

# Combine the series and its lag into one data frame
cordat <- cbind(as.data.frame(amtrak.ts),lag1)

# Compute the Pearson correlation between the series and its lag
cor(cordat,use="complete.obs")
##            x          x
## x 1.00000000 0.08088924
## x 0.08088924 1.00000000
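
Up to edge effects (the sample ACF uses the mean and variance of the full series), this lag-1 correlation is essentially what the ACF plot below reports at lag 1.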

 

# Plot the ACF of the first 12 lags
Acf(amtrak.ts,lag.max=12, main="")

 

Autoregressive (AR) Models

\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \]

where \(\epsilon_t\) is white noise.
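
To make this concrete, here is a minimal sketch (not from the original notes; the coefficients are arbitrary) that simulates an AR(2) series with arima.sim() and fits it back with Arima() from the forecast package (loaded with fpp2):

# Simulate an AR(2) process with true coefficients 0.6 and -0.3
set.seed(123)
sim.ar2 <- arima.sim(model=list(ar=c(0.6, -0.3)), n=300)
# The estimated ar1 and ar2 coefficients should be close to 0.6 and -0.3
Arima(sim.ar2, order=c(2,0,0))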

 

Moving Average (MA) Models
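
In a moving average model, \(y_t\) is modeled as a regression on the past \(q\) forecast errors rather than on past values of the series:

\[ y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \]

where \(\epsilon_t\) is white noise.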

 

(Non-seasonal) ARIMA Models

autoplot(uschange[,"Consumption"]) +
  xlab("Year") + ylab("Quarterly percentage change")

# Fit a non-seasonal ARIMA model selected automatically
fit <- auto.arima(uschange[,"Consumption"], seasonal=FALSE)
fit
## Series: uschange[, "Consumption"] 
## ARIMA(1,0,3) with non-zero mean 
## 
## Coefficients:
##          ar1      ma1     ma2     ma3    mean
##       0.5885  -0.3528  0.0846  0.1739  0.7454
## s.e.  0.1541   0.1658  0.0818  0.0843  0.0930
## 
## sigma^2 = 0.3499:  log likelihood = -164.81
## AIC=341.61   AICc=342.08   BIC=361

\[ y_t = c + 0.589 y_{t-1} - 0.353 \epsilon_{t-1} + 0.085 \epsilon_{t-2} + 0.174 \epsilon_{t-3} + \epsilon_t \]

where \(c = 0.745 \times (1 - 0.589) = 0.307\) and \(\epsilon_t\) is white noise with standard deviation \(\sqrt{0.350} = 0.592\).

fit %>%
  forecast(h=10) %>%
  autoplot(include=80)   # plot only the last 80 observations of the data

Some Remarks

  • The auto.arima() function is useful, but anything automated can be a little dangerous, and it is worth understanding something of the behavior of the models even when you rely on an automatic procedure to choose the model for you.

  • The constant \(c\) has an important effect on the long-term forecasts obtained from these models:

    • If \(c=0\) and \(d=0\), the long-term forecasts will go to zero
    • If \(c=0\) and \(d=1\), the long-term forecasts will go to a non-zero constant
    • If \(c=0\) and \(d=2\), the long-term forecasts will follow a straight line
    • If \(c \neq 0\) and \(d=0\), the long-term forecasts will go to the mean of the data
    • If \(c \neq 0\) and \(d=1\), the long-term forecasts will follow a straight line
    • If \(c \neq 0\) and \(d=2\), the long-term forecasts will follow a quadratic trend
  • The value of \(d\) also has an effect on the prediction intervals: the higher the value of \(d\), the more rapidly the prediction intervals increase in size

  • For \(d=0\), the long-term forecast standard deviation will go to the standard deviation of the historical data, so the prediction intervals will all be essentially the same

  • The value of \(p\) is important if the data show cycles

  • To obtain cyclic forecasts, it is necessary to have \(p \geq 2\), along with some additional conditions on the parameters.

  • For an AR(2) model, cyclic behavior occurs if \(\phi_1^2 + 4 \phi_2 < 0\). In that case, the average period of the cycles is \[ \frac{2 \pi}{\arccos(-\phi_1(1-\phi_2)/(4\phi_2))} \] (a quick numerical check follows below)
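
As a quick numerical check (the coefficient values here are purely illustrative):

# An AR(2) model with phi1 = 1.2 and phi2 = -0.7
phi1 <- 1.2
phi2 <- -0.7
phi1^2 + 4*phi2   # -1.36 < 0, so the forecasts will be cyclic
# Average period of the cycles: roughly 8.3 observations
2*pi / acos(-phi1*(1 - phi2) / (4*phi2))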

Determining the order of ARIMA using the ACF and PACF plots

AR(1) Process

For an AR(1) process, the ACF decays geometrically from lag 1 (alternating in sign when \(\phi_1 < 0\)), while the PACF has a single significant spike at lag 1.

 

AR(2) Process

For an AR(2) process, the ACF tails off (possibly with a damped sine-wave pattern), while the PACF cuts off after lag 2.

 

MA(1) Process

For an MA(1) process, the ACF has a single significant spike at lag 1, while the PACF tails off.

 

MA(2) Process

For an MA(2) process, the ACF cuts off after lag 2, while the PACF tails off.
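
These patterns are easy to reproduce by simulation. A minimal sketch (not from the original notes; the coefficients are illustrative) using arima.sim() and ggtsdisplay() from the forecast package:

set.seed(42)
# Simulate each process and display it with its ACF and PACF
ggtsdisplay(arima.sim(model=list(ar=0.7), n=300), main="Simulated AR(1)")
ggtsdisplay(arima.sim(model=list(ar=c(0.6, -0.3)), n=300), main="Simulated AR(2)")
ggtsdisplay(arima.sim(model=list(ma=0.7), n=300), main="Simulated MA(1)")
ggtsdisplay(arima.sim(model=list(ma=c(0.6, 0.3)), n=300), main="Simulated MA(2)")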

 

Estimation of parameters
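
Once \(p\), \(d\), and \(q\) have been chosen, Arima() estimates the constant \(c\) and the \(\phi\) and \(\theta\) coefficients by maximum likelihood, reporting the log likelihood of the fitted model in its output (as seen in the examples in this section).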

 

Model selection statistics
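
Following the fpp2 text, ARIMA models are compared using Akaike's Information Criterion and its variants:

\[ \text{AIC} = -2 \log(L) + 2(p+q+k+1) \]

where \(L\) is the likelihood of the data and \(k=1\) if \(c \neq 0\) and \(k=0\) otherwise. The corrected AIC and the BIC are

\[ \text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}, \qquad \text{BIC} = \text{AIC} + (\log T - 2)(p+q+k+1) \]

Good models are obtained by minimizing the AICc (the criterion auto.arima() uses). Note that these statistics can only be compared across models with the same order of differencing \(d\).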

 

ARIMA model building

General process for forecasting using an ARIMA model
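
Following the modelling procedure in Hyndman & Athanasopoulos (fpp2, §8.7):

1. Plot the data and identify any unusual observations.
2. If necessary, transform the data (e.g., with a Box-Cox transformation) to stabilize the variance.
3. If the data are non-stationary, take differences until they are.
4. Examine the ACF and PACF to choose candidate values of \(p\) and \(q\).
5. Fit the candidate model(s) and use the AICc to search for a better one.
6. Check the residuals by plotting their ACF and applying a portmanteau test.
7. Once the residuals look like white noise, calculate forecasts.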

 

An example

library(fpp2)
library(ggplot2)
library(dplyr)

# Seasonally adjust the data using an STL decomposition
elecequip %>% stl(s.window='periodic') %>% seasadj() -> eeadj
autoplot(eeadj)

 

# Take first differences and display the series with its ACF and PACF
eeadj %>% diff() %>% ggtsdisplay(main="")
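
In the fpp2 treatment of this example, the PACF of the differenced series suggests an AR(3) component, so ARIMA(3,1,0) is a natural starting point; comparing the AICc of nearby variants leads to the ARIMA(3,1,1) model fitted below.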

 

# Fit the candidate model chosen above
fit1 <- Arima(eeadj, order=c(3,1,1))
summary(fit1)
## Series: eeadj 
## ARIMA(3,1,1) 
## 
## Coefficients:
##          ar1     ar2     ar3      ma1
##       0.0044  0.0916  0.3698  -0.3921
## s.e.  0.2201  0.0984  0.0669   0.2426
## 
## sigma^2 = 9.577:  log likelihood = -492.69
## AIC=995.38   AICc=995.7   BIC=1011.72
## 
## Training set error measures:
##                      ME     RMSE      MAE         MPE     MAPE    MASE
## Training set 0.03288179 3.054718 2.357169 -0.00647009 2.481603 0.28844
##                     ACF1
## Training set 0.008980578
# Check whether the residuals resemble white noise
checkresiduals(fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,1)
## Q* = 24.034, df = 20, p-value = 0.2409
## 
## Model df: 4.   Total lags used: 24
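
The large Ljung-Box p-value (0.24) gives no evidence of remaining autocorrelation, so the residuals behave like white noise and the model is ready for forecasting: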
autoplot(forecast(fit1))