Introduction

In my last blog, I discussed various forecasting methods and accuracy metrics. This blog looks at ARIMA models in depth. We will start by defining essential terms in time series analysis.

ARIMA models require a stationary time series, so it is important to stationarize the data in advance if it is not stationary. What does the term stationary mean in this context?

Stationary time series

A time series is stationary if it has neither trend nor seasonality. Trend is the general direction of change of the data, whereas seasonality represents regular seasonal variation. Another important property is that the variance of a stationary time series is not time dependent (it is constant).

Trend - the general long-term change in direction (up/down)

Seasonality - regular increases and decreases tied to the time of day, day of week, time of year, etc.

Stationarity Tests

The Dickey-Fuller test and the augmented Dickey-Fuller (ADF) test are the most commonly used tests for stationarity. The KPSS test, the PP test and the white noise test (Ljung-Box test) can also be used. For the Dickey-Fuller tests, the null hypothesis is that the series is non-stationary: a p-value of 0.05 or more means we fail to reject the null hypothesis, in other words the time series is treated as non-stationary. When the p-value is less than 0.05, we can reject the null hypothesis and treat the series as stationary.
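
For reference, here is a minimal sketch of how these tests can be run in R. It assumes the tseries package for the ADF, KPSS and PP tests, and AirPassengers is used purely as an example series.

library(tseries)

x <- AirPassengers                          # any univariate time series
adf.test(x)                                 # H0: non-stationary (unit root)
kpss.test(x)                                # H0: stationary
pp.test(x)                                  # Phillips-Perron test, H0: non-stationary
Box.test(x, lag = 24, type = "Ljung-Box")   # white noise test, H0: independence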

Most real-life time series have both trend and seasonality. Let’s look at some example series and decide whether or not they are stationary.

# loading the necessary packages
library(astsa)
library(tseries)
  1. Let’s start with white noise data generated using the arima.sim simulator function from the astsa package.
WN <- arima.sim(model = list(order = c(0, 0, 0)), n = 200)
plot(WN)

adf.test(WN) # expect a small p-value, indicating stationary data
  2. Time series with trend and seasonality
plot(AirPassengers)

adf.test(AirPassengers) # p-value = 0.01 suggests stationary data, but the series clearly has trend and seasonality, so this result is misleading
## 
##  Augmented Dickey-Fuller Test
## 
## data:  AirPassengers
## Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

We can use the stl function from the base stats package to see the seasonal, trend and noise (remainder) decomposition. Let’s also load fpp2 for autoplot:

library(fpp2) # for autoplots

autoplot(stl(AirPassengers,"periodic", robust=TRUE))

  3. Some time series may not be so obvious; in such cases it is good to use the standard tests.

c1. Dataset: U.S. Monthly Live Births 1948-1979.

autoplot(stl(birth,"periodic", robust=TRUE))

adf.test(birth) # Not stationary!
## 
##  Augmented Dickey-Fuller Test
## 
## data:  birth
## Dickey-Fuller = -2.597, Lag order = 7, p-value = 0.325
## alternative hypothesis: stationary

c2. Dataset: Southern Oscillation Index

# data Southern Oscillation Index

autoplot(stl(soi,"periodic")) # plot the stl decomposed data. 

adf.test(soi) # p-value = 0.01 - stationary!
## 
##  Augmented Dickey-Fuller Test
## 
## data:  soi
## Dickey-Fuller = -7.4822, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Stationarizing Time series data

So how do we stationarize non-stationary time series? The technique depends on the type of non-stationarity observed in the data.

a. Detrending

Differencing is used to remove the trend when the series is stationary around a trend. Such a time series is usually called trend stationary.

Here is an example with simulated data:

TrendStationaryData <- arima.sim(model = list(order = c(0, 1, 0)), n = 200)
plot(TrendStationaryData)

Let’s try after first order differencing

plot(diff(TrendStationaryData))

We can check stationarity with the augmented Dickey-Fuller test.

adf.test(diff(TrendStationaryData)) # p-value 0.01 - stationary
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(TrendStationaryData)
## Dickey-Fuller = -5.2527, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

Seasonal detrending should be applied if there is a seasonal trend. Let’s take a look at the example below.

The birth dataset from example c1 above will not be fully detrended after one round of differencing. An additional seasonal differencing is what we need.

d_birth <- diff(birth)
plot(d_birth) # not stationary since seasonal trend remains

Seasonal differencing

dd_birth <- diff(d_birth, lag = 12) # lag = 12 makes it seasonal differencing; 12 because of the monthly seasonality
plot(dd_birth)

adf.test(dd_birth) # p-value = 0.01 - stationary!
## 
##  Augmented Dickey-Fuller Test
## 
## data:  dd_birth
## Dickey-Fuller = -8.7804, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

b. Damping - reducing the variance

If your data has large and non-constant variance, it may help to damp the data. Taking the logarithm of the time series is one common approach.

Let’s use the djia dataset - financial data of the “Dow Jones Industrial Average” - to observe this.

plot(djia$Close,main = "Original Close Price")  # close price

plot(log((djia$Close)),main = "Log Close Price") # more constant variance

plot(diff(log((djia$Close))),main = "Differenced Log Close Price")
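
As a quick sanity check (a sketch, assuming djia is the xts object from astsa), the ADF test can be applied to the differenced log series after dropping the leading NA that diff produces on an xts object:

dlc <- na.omit(diff(log(djia$Close)))  # differenced log close price, leading NA removed
adf.test(as.numeric(dlc))              # a small p-value would indicate stationarity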

Model Selection

The list of Box-Jenkins models includes autoregressive (AR) models, moving average (MA) models, autoregressive moving average (ARMA) models, autoregressive integrated moving average (ARIMA) models, seasonal ARIMA (SARIMA) models and others.

As the names indicate, the right model depends on the nature of the data. We need to decompose the time series to see whether or not autoregressive, moving average, integration and seasonality components are present in the data.

Box-Jenkins models represent these components in terms of combinations of letters: order = (p,d,q), seasonal = (P,D,Q), S. The parameters are found in the process of stationarizing the series.

  • p - number of lags (autoregressive part)

  • d - number of differencing needed

  • q - number of error lags (Moving average part)

  • P - number of seasonal lags (autoregressive part)

  • D - number of seasonal differencing needed

  • Q - number of seasonal error lags (Moving average part)

  • S - seasonality
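
To make this notation concrete, here is a minimal sketch of how these parameters map onto the base arima call that sarima wraps; the values (1,1,1)(0,1,1)[12] are purely illustrative, not a recommendation for any particular dataset.

# order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S)
fit <- arima(AirPassengers,
             order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit  # prints the coefficients, sigma^2 and AIC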

ACF (autocorrelation function) and PACF (partial autocorrelation function) plots can be used to detect the presence of the AR and MA parts. If you have applied differencing in the process of stationarizing, that indicates there is integration in your data.

d - the number of regular differences applied; D - the number of seasonal differences applied.

Let’s see how we can use the ACF and PACF plots to find the AR (p, P) and MA (q, Q) components. The acf2 function from the astsa package returns both plots.

The basic guidelines for interpreting the ACF and PACF plots are as follows:

  1. Look for a tail-off pattern in either the ACF or the PACF. The opposite of tail off is cut off (no further significant correlation spikes).
  • The two blue dashed lines (pointed to by the purple arrows) represent the significance threshold levels. Anything that spikes beyond these lines indicates a significant correlation.

  • When looking at the ACF plot, we ignore the long spike at lag 0 (pointed to by the blue arrow). The PACF plot usually starts at lag 1.

  • The lag axis will differ depending on the time series data.

  2. If there is tail off in the ACF → AR model → the lag where the PACF cuts off gives the order p for AR(p).

  3. If there is tail off in the PACF → MA model → the lag where the ACF cuts off gives the order q for MA(q).

  4. Tail off in both the ACF and PACF → ARMA model (decide p and q by trial and error, starting with the smallest values).

  5. Tail off at the seasonal lags (multiples of the frequency) → use rules 1-4 to define P, Q or both.

Next we will apply these steps to the datasets seen above.

  1. White noise
acf2(WN) 

##      [,1]  [,2]  [,3]  [,4] [,5] [,6]  [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF     0 -0.11 -0.07 -0.01 0.02 0.14 -0.02 0.04 -0.15  0.02  0.07  0.07  0.10
## PACF    0 -0.11 -0.08 -0.02 0.00 0.13 -0.02 0.07 -0.14  0.03  0.04  0.04  0.12
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.20 -0.01  0.04  0.03 -0.01  0.06 -0.02  0.04 -0.07  0.10 -0.11 -0.07
## PACF -0.21  0.07 -0.02  0.02 -0.04  0.04  0.05  0.02 -0.01  0.04 -0.12 -0.06

See that there are no significant spikes: p = 0, q = 0

Obviously the seasonal parameters are also zero. P = 0, Q = 0

  2. AirPassengers
acf2(AirPassengers)

##      [,1]  [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.95  0.88 0.81 0.75 0.71 0.68 0.66 0.66 0.67  0.70  0.74  0.76  0.71
## PACF 0.95 -0.23 0.04 0.09 0.07 0.01 0.13 0.09 0.23  0.17  0.17 -0.14 -0.54
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF   0.65  0.59  0.54  0.50  0.47  0.45  0.44  0.46  0.48  0.52  0.53  0.49
## PACF -0.03  0.09  0.02  0.03  0.07  0.05 -0.05  0.05 -0.10  0.05  0.05 -0.16
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF   0.44  0.39  0.35  0.31  0.29  0.27  0.26  0.28  0.30  0.33  0.34  0.30
## PACF -0.04  0.07  0.01  0.01  0.02 -0.01 -0.02 -0.03 -0.01 -0.05  0.05 -0.07
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.25  0.21  0.17  0.14  0.11  0.09  0.08  0.09  0.10  0.12  0.13
## PACF  0.00  0.02 -0.09  0.00  0.02 -0.04  0.00 -0.04 -0.01 -0.01  0.02

Since there is no tail off, let’s try after differencing.

acf2(diff(AirPassengers))

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF   0.3 -0.10 -0.24 -0.30 -0.09 -0.08 -0.09 -0.29 -0.19 -0.10  0.28  0.83
## PACF  0.3 -0.21 -0.16 -0.22  0.01 -0.19 -0.15 -0.45 -0.23 -0.55 -0.13  0.57
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.28 -0.11 -0.22 -0.23 -0.06 -0.07 -0.09 -0.30 -0.16 -0.08  0.26  0.70
## PACF -0.15 -0.17  0.07  0.06  0.01 -0.08  0.04 -0.09  0.00 -0.01 -0.04 -0.04
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF   0.26 -0.10 -0.20 -0.17 -0.07 -0.04 -0.08 -0.25 -0.16 -0.05  0.20  0.58
## PACF -0.03 -0.04  0.02 -0.06 -0.13  0.01  0.03  0.07 -0.13  0.09 -0.12 -0.01
##      [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.23 -0.11 -0.13 -0.12 -0.05 -0.03 -0.08 -0.23 -0.14 -0.03  0.15  0.49
## PACF  0.00 -0.08  0.09 -0.01  0.03  0.02 -0.02  0.00  0.05  0.00  0.00  0.02

Non-seasonal part tails off at PACF:- p = 1, d = 1, q = 0

Seasonal part tails off at ACF, no seasonal differencing applied:- P = 0, D = 0, Q = 0
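
As a sketch, the model this reading suggests could be fitted with sarima; whether such a simple non-seasonal specification really is adequate for AirPassengers is exactly what the residual diagnostics discussed below should tell us.

sarima(AirPassengers,
       p = 1, d = 1, q = 0) # ARIMA(1,1,0); inspect the diagnostics it prints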

c1. Birth data

acf2(diff(birth))

##       [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.32 0.16 -0.08 -0.19  0.09 -0.28  0.06 -0.19 -0.05  0.17 -0.26  0.82
## PACF -0.32 0.06 -0.01 -0.25 -0.03 -0.26 -0.17 -0.29 -0.35 -0.16 -0.59  0.57
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF  -0.28  0.17 -0.07 -0.18  0.08 -0.28  0.07 -0.18 -0.05  0.16 -0.24  0.78
## PACF  0.13  0.11  0.13  0.09  0.00  0.00  0.05  0.04 -0.07 -0.10 -0.20  0.19
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF  -0.27  0.19 -0.08 -0.17  0.07 -0.29  0.07 -0.15 -0.04  0.14 -0.24  0.75
## PACF  0.01  0.05  0.07  0.07 -0.02 -0.06 -0.02  0.09  0.03 -0.06 -0.16  0.03
##      [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF  -0.23  0.16 -0.08 -0.15  0.05 -0.25  0.06 -0.18 -0.03  0.15 -0.22  0.72
## PACF  0.08 -0.10 -0.03  0.07 -0.04  0.06  0.04 -0.07 -0.06  0.02 -0.04  0.10

Differencing is needed since acf2(birth) shows persistent autocorrelation that never dies out: d = 1, D = 0.

Non-seasonal part tails off at both PACF and ACF (we can attempt p = 1, q = 1).

Seasonal part tails off at both ACF and PACF (so we can start with P = 1, Q = 1).
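
As a cross-check (a sketch using the forecast package, which fpp2 already loaded), ndiffs and nsdiffs estimate how many regular and seasonal differences a series needs; their estimates may or may not agree with the manual ACF/PACF reading above.

ndiffs(birth)  # estimated number of regular differences
nsdiffs(birth) # estimated number of seasonal differences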

c2. Southern Oscillation Index

acf2(soi)

##      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF   0.6 0.37  0.21  0.05 -0.11 -0.19 -0.18 -0.10 0.05  0.22  0.36  0.41  0.31
## PACF  0.6 0.01 -0.03 -0.11 -0.14 -0.06  0.03  0.09 0.16  0.19  0.17  0.07 -0.11
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF   0.10 -0.06 -0.17 -0.29 -0.37 -0.32 -0.19 -0.04  0.15  0.31  0.35  0.25
## PACF -0.25 -0.14 -0.04 -0.04 -0.05  0.03  0.04  0.03  0.08  0.11  0.05 -0.04
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF    0.1 -0.03 -0.16 -0.28 -0.37 -0.32 -0.16 -0.02  0.17  0.33  0.39  0.30
## PACF  -0.1 -0.02 -0.02 -0.04 -0.09  0.01  0.06 -0.01  0.08  0.11  0.09 -0.01
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.16  0.00 -0.13 -0.24 -0.27 -0.25 -0.13  0.06  0.21  0.38  0.40
## PACF -0.04 -0.09 -0.03  0.02  0.08  0.00 -0.01  0.03 -0.01  0.10  0.02

No differencing needed (d = 0, D = 0)

Non-seasonal part tails off at ACF:- p = 1

Seasonal part tails off at ACF:- P = 1
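
Based on this reading, a first model to try for soi could look like the sketch below (soi is a monthly series, so S = 12); treat it purely as a starting point to be judged by the diagnostics explained in the next section.

sarima(soi,
       p = 1, d = 0, q = 0,
       P = 1, D = 0, Q = 0,
       S = 12)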

Fitting Model

There are several R packages for time series analysis. We discussed the forecast package in the last blog. This time we will use astsa; I find its functions easier to remember and more straightforward.

The sarima function can handle all of the Box-Jenkins models; we only need to specify the parameters properly.

  • Pure AR - sarima(data, p,0,0)
  • Pure MA - sarima(data, 0,0,q)
  • ARMA - sarima(data, p,0,q)
  • ARIMA - sarima(data, p,d,q)
  • Pure seasonal - sarima(data, 0,0,0, P,D,Q, S)
  • Seasonal ARIMA - sarima(data, p,d,q, P,D,Q, S)

Let’s start with the first dataset and see how we can interpret the function output. The function parameters defined above will be used.

sarima(WN,
       p = 0, d = 0, q = 0,
       P = 0, D = 0, Q = 0,
       S = 0)
## initial  value -0.058012 
## iter   1 value -0.058012
## final  value -0.058012 
## converged
## initial  value -0.058012 
## iter   1 value -0.058012
## final  value -0.058012 
## converged

## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##        xmean
##       0.0220
## s.e.  0.0667
## 
## sigma^2 estimated as 0.8905:  log likelihood = -272.19,  aic = 548.37
## 
## $degrees_of_freedom
## [1] 199
## 
## $ttable
##       Estimate     SE t.value p.value
## xmean    0.022 0.0667  0.3297   0.742
## 
## $AIC
## [1] 2.741852
## 
## $AICc
## [1] 2.741953
## 
## $BIC
## [1] 2.774836

Output interpretation

As in standard regression models, the model is wrong if its residuals do not behave like white noise. Therefore,

  • Standardized residuals plot - should look like a white noise plot for the model to be correct.

  • ACF of residuals - should have no significant spikes (no autocorrelation should be left in the residuals).

  • Q-Q plot of residuals - should be as close to a straight line as possible.

  • p-values of the Ljung-Box statistic - should all be above the blue threshold line.

Accordingly, the residuals of our first model meet the test for white noise. The model can be accepted for forecasting.
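
sarima produces these diagnostic plots automatically, but since it returns the underlying arima fit (the $fit element seen in the output above), the same checks can also be run by hand. A minimal sketch:

fit_wn <- sarima(WN, p = 0, d = 0, q = 0)   # refit; diagnostics are plotted automatically
res <- resid(fit_wn$fit)                    # residuals of the underlying arima fit
acf(res)                                    # should show no significant spikes
qqnorm(res); qqline(res)                    # should be close to a straight line
Box.test(res, lag = 20, type = "Ljung-Box") # large p-value -> residuals look like white noise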

Taking another example,

sarima(birth,
       p = 1, d = 1, q = 1,
       P = 1, D = 0, Q = 1,
       S = 12)
## initial  value 2.831680 
## iter   2 value 2.544290
## iter   3 value 2.298148
## iter   4 value 2.221456
## iter   5 value 2.179128
## iter   6 value 2.140968
## iter   7 value 2.115004
## iter   8 value 2.113976
## iter   9 value 2.057246
## iter  10 value 2.027447
## iter  11 value 1.991680
## iter  12 value 1.971219
## iter  13 value 1.960648
## iter  14 value 1.956717
## iter  15 value 1.955033
## iter  16 value 1.954925
## iter  17 value 1.954907
## iter  18 value 1.954896
## iter  19 value 1.954874
## iter  20 value 1.954853
## iter  21 value 1.954830
## iter  22 value 1.954799
## iter  23 value 1.954761
## iter  24 value 1.954761
## iter  25 value 1.954744
## iter  26 value 1.954732
## iter  27 value 1.954732
## iter  28 value 1.954731
## iter  29 value 1.954731
## iter  30 value 1.954729
## iter  31 value 1.954727
## iter  32 value 1.954725
## iter  33 value 1.954724
## iter  34 value 1.954724
## iter  34 value 1.954724
## final  value 1.954724 
## converged
## initial  value 1.966436 
## iter   2 value 1.964359
## iter   3 value 1.960035
## iter   4 value 1.957448
## iter   5 value 1.956736
## iter   6 value 1.956133
## iter   7 value 1.955138
## iter   8 value 1.954444
## iter   9 value 1.953984
## iter  10 value 1.953915
## iter  11 value 1.953882
## iter  12 value 1.953824
## iter  13 value 1.953815
## iter  14 value 1.953804
## iter  15 value 1.953786
## iter  16 value 1.953773
## iter  17 value 1.953764
## iter  18 value 1.953761
## iter  19 value 1.953761
## iter  20 value 1.953760
## iter  21 value 1.953760
## iter  21 value 1.953760
## iter  21 value 1.953760
## final  value 1.953760 
## converged

## $fit
## 
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
##     xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##         REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1    sar1     sma1  constant
##       0.3144  -0.7095  0.9953  -0.7926    0.1574
## s.e.  0.0857   0.0597  0.0028   0.0461    1.7213
## 
## sigma^2 estimated as 45.82:  log likelihood = -1254.64,  aic = 2521.29
## 
## $degrees_of_freedom
## [1] 367
## 
## $ttable
##          Estimate     SE  t.value p.value
## ar1        0.3144 0.0857   3.6686  0.0003
## ma1       -0.7095 0.0597 -11.8795  0.0000
## sar1       0.9953 0.0028 351.8634  0.0000
## sma1      -0.7926 0.0461 -17.1982  0.0000
## constant   0.1574 1.7213   0.0915  0.9272
## 
## $AIC
## [1] 6.777656
## 
## $AICc
## [1] 6.778096
## 
## $BIC
## [1] 6.840864

The model residuals look like white noise, so the model can be accepted for forecasting.

Forecasting

Forecasting is easy once the model is fitted. We will use the sarima.for function from the same package. In addition to the model parameters discussed so far, we need the n.ahead parameter to specify how far ahead we would like to forecast.

Let’s forecast the US Monthly Live Births for the next five years, 1980-1984:

sarima.for(birth,
       p = 1,d = 1,q = 1,
       P = 1, D = 0, Q = 1,
       S = 12,
       n.ahead = 5*12) # n.ahead should be number of expected data points. (5 x 12 months)

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1979          258.9307 281.4213 263.8897 274.5396 270.8916 294.3042 301.0029
## 1980 275.8587 259.0351 281.7844 264.4513 275.0874 271.4689 294.7747 301.4435
## 1981 276.4228 259.6798 282.3218 265.0717 275.6582 272.0575 295.2535 301.8914
## 1982 276.9932 260.3303 282.8657 265.6982 276.2352 272.6523 295.7391 302.3462
## 1983 277.5698 260.9867 283.4160 266.3306 276.8184 273.2533 296.2313 302.8079
## 1984 278.1527                                                               
##           Sep      Oct      Nov      Dec
## 1979 295.2148 289.0144 272.5867 283.4751
## 1980 295.6840 289.5138 273.1648 284.0023
## 1981 296.1600 290.0199 273.7491 284.5360
## 1982 296.6427 290.5325 274.3396 285.0761
## 1983 297.1321 291.0516 274.9363 285.6226
## 1984                                    
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1979            6.769392  7.911760  8.554956  9.062429  9.517233  9.943668
## 1980 12.177329 12.894528 13.414489 13.870720 14.299303 14.711440 15.111130
## 1981 17.313929 17.971054 18.472755 18.923667 19.352844 19.769278 20.176066
## 1982 22.460214 23.097283 23.601117 24.061010 24.502195 24.932485 25.354519
## 1983 27.750060 28.382174 28.895522 29.369393 29.826482 30.273822 30.713767
## 1984 33.229280                                                            
##            Aug       Sep       Oct       Nov       Dec
## 1979 10.350269 10.740798 11.117408 11.481607 11.834585
## 1980 15.500147 15.879523 16.250009 16.612224 16.966705
## 1981 20.574488 20.965239 21.348808 21.725598 22.095959
## 1982 25.769347 26.177512 26.579381 26.975255 27.365400
## 1983 31.147225 31.574649 31.996337 32.412531 32.823446
## 1984

While it is clear that the forecast captures the variation and seasonality in the data quite well, we can use information criteria (AIC, BIC) to choose between models of a similar type. Lower AIC or BIC values indicate a better-fitting model.
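
For example, two candidate specifications for the birth data could be compared on their information criteria. This is just a sketch; details = FALSE is only there to suppress the fitting trace, and the second model is an alternative candidate chosen purely for comparison.

m1 <- sarima(birth, p = 1, d = 1, q = 1, P = 1, D = 0, Q = 1, S = 12, details = FALSE)
m2 <- sarima(birth, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12, details = FALSE)
c(m1$AIC, m2$AIC) # lower is better
c(m1$BIC, m2$BIC)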

If you have gone through this and the previous blog, you should have all it takes to start practicing. You can find the source code of this blog on my GitHub account.

I am crazy about data science and applying data science skills to workforce management. Feel free to reach me on LinkedIn if you wish to connect :)

You may also enjoy my other blogs at RPubs.