In my last blog, I discussed various forecasting methods and accuracy metrics. This blog takes an in-depth look at ARIMA models. We will start by defining essential terms in time series analysis.
ARIMA models require a stationary time series, so the data must be stationarized in advance if it is not already stationary. What does the term stationary mean in this context?
Stationary time series
A time series is stationary if it has neither trend nor seasonality. Trend is the general direction of change of the data, whereas seasonality represents recurring seasonal variation. Another important property is that the variance of a stationary time series is constant (not time dependent).
Trend - the general long-term direction of change (up/down)
Seasonality - regular increases and decreases tied to the time of day, day of week, month of year, etc.
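To make these two properties concrete, here is a minimal simulation sketch (all object names are illustrative) contrasting a stationary series with one that has both trend and seasonality:
# stationary: no trend, no seasonality, constant variance
set.seed(1)
n <- 120
stationary_series <- ts(rnorm(n), frequency = 12)
trend  <- 0.05 * (1:n)                  # steady upward drift
season <- 2 * sin(2 * pi * (1:n) / 12)  # repeating monthly cycle
nonstationary_series <- ts(rnorm(n) + trend + season, frequency = 12)
par(mfrow = c(2, 1))
plot(stationary_series); plot(nonstationary_series)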
Stationarity Tests
The Dickey-Fuller test and the augmented Dickey-Fuller (ADF) test are the most commonly used tests for stationarity. The KPSS test, the Phillips-Perron (PP) test and the Ljung-Box white noise test can also be used. For the Dickey-Fuller tests, the null hypothesis is that the series is non-stationary. A p-value of 0.05 or above means we fail to reject the null hypothesis, i.e. the series is treated as non-stationary; a p-value below 0.05 means we can reject the null hypothesis and treat the series as stationary.
Most real-life time series data have both trend and seasonality. Let's look at some examples of time series data and determine whether they are stationary.
# loading the necessary package
library(astsa)
library(tseries)
WN <- arima.sim(model = list(order = c(0, 0, 0)), n = 200)
plot(WN)
adf.test(WN) # a p-value below 0.05 indicates stationary data, as expected for white noise
plot(AirPassengers)
adf.test(AirPassengers) # p-value = 0.01 suggests stationary data, but the plot's clear trend and seasonality say otherwise
##
## Augmented Dickey-Fuller Test
##
## data: AirPassengers
## Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
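This contradiction is a good reason to cross-check with a second test. Here is a sketch using the KPSS test from the same tseries package; note that its hypotheses are reversed, so the null is stationarity and a small p-value indicates non-stationarity (the exact numbers will depend on your package version):
kpss.test(AirPassengers)                  # null: level stationarity
kpss.test(AirPassengers, null = "Trend")  # null: stationarity around a trend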
We can use the stl function from the base stats package to decompose the series into seasonal, trend and remainder components. Let's also load fpp2 so we can use autoplot:
library(fpp2) # for autoplots
autoplot(stl(AirPassengers,"periodic", robust=TRUE))
c1. Dataset: U.S. Monthly Live Births 1948-1979.
autoplot(stl(birth,"periodic", robust=TRUE))
adf.test(birth) # Not stationary!
##
## Augmented Dickey-Fuller Test
##
## data: birth
## Dickey-Fuller = -2.597, Lag order = 7, p-value = 0.325
## alternative hypothesis: stationary
c2. Dataset: Southern Oscillation Index
# data Southern Oscillation Index
autoplot(stl(soi,"periodic")) # plot the stl decomposed data.
adf.test(soi) # p-value = 0.01 - stationary!
##
## Augmented Dickey-Fuller Test
##
## data: soi
## Dickey-Fuller = -7.4822, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
So how do we stationarize a non-stationary time series? The technique depends on the type of non-stationarity observed in the data.
Differencing is used to remove a trend. A series that becomes stationary after differencing is called difference stationary; the random walk is the classic example. Here is one, simulated as an ARIMA(0,1,0) process:
RandomWalkData <- arima.sim(model = list(order = c(0, 1, 0)), n = 200) # d = 1: a random walk
plot(RandomWalkData)
Let's try after first-order differencing:
plot(diff(RandomWalkData))
We can check stationarity with the augmented Dickey-Fuller test.
adf.test(diff(RandomWalkData)) # p-value 0.01 - stationary
##
## Augmented Dickey-Fuller Test
##
## data: diff(RandomWalkData)
## Dickey-Fuller = -5.2527, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Seasonal differencing should be applied if a seasonal pattern is present. Let's take a look at the example below.
The birth dataset from example c1 above is not fully detrended after one ordinary differencing; an additional seasonal differencing is what we need.
d_birth <- diff(birth)
plot(d_birth) # still not stationary: the seasonal pattern remains
Seasonal differencing
dd_birth <- diff(d_birth, lag = 12) # lag = 12 makes it a seasonal differencing (12 because the seasonality is monthly)
plot(dd_birth)
adf.test(dd_birth) # p-value = 0.01 - stationary!
##
## Augmented Dickey-Fuller Test
##
## data: dd_birth
## Dickey-Fuller = -8.7804, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
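Instead of reading the plots by eye, the forecast package (loaded above via fpp2) can estimate the required numbers of differences directly. A quick cross-check of our manual choice; the suggestions may differ slightly depending on the underlying unit-root test:
ndiffs(birth)   # suggested number of ordinary differences (d)
nsdiffs(birth)  # suggested number of seasonal differences (D)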
If your data has large and non-constant variance, it may help to dampen it. Taking the logarithm of the time series is one common approach.
Let's use the djia dataset - "Dow Jones Industrial Average" financial data - to observe this.
plot(djia$Close,main = "Original Close Price") # close price
plot(log((djia$Close)),main = "Log Close Price") # more constant variance
plot(diff(log((djia$Close))),main = "Differenced Log Close Price")
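The logarithm is the lambda = 0 member of the Box-Cox family of transformations, and the forecast package can estimate a variance-stabilizing lambda for us. A sketch (in recent astsa versions djia is an xts object, hence the as.numeric coercion):
lambda <- BoxCox.lambda(as.numeric(djia$Close))  # estimate the stabilizing lambda
plot(BoxCox(as.numeric(djia$Close), lambda), type = "l",
     main = "Box-Cox Transformed Close Price")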
The Box-Jenkins family includes autoregressive (AR) models, moving average (MA) models, autoregressive moving average (ARMA) models, autoregressive integrated moving average (ARIMA) models, seasonal ARIMA (SARIMA) models and others.
As the names indicate, the appropriate model depends on the nature of the data. We need to examine the time series to see whether autoregressive, moving average, integration and seasonal components are present.
Box-Jenkins models represent these components as combinations of letters: order = (p,d,q), seasonal = (P,D,Q), S. The parameters are found in the process of stationarizing the series.
p - number of lags (autoregressive part)
d - number of differences needed
q - number of lagged forecast errors (moving average part)
P - number of seasonal lags (seasonal autoregressive part)
D - number of seasonal differences needed
Q - number of seasonal lagged errors (seasonal moving average part)
S - the seasonal period
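These letters map directly onto model calls in R. A sketch with purely illustrative orders (not the ones we will select below), showing how base R's arima arranges them; astsa's sarima, used later, takes the same p, d, q, P, D, Q and S:
fit <- arima(AirPassengers,
             order    = c(1, 1, 0),               # (p, d, q)
             seasonal = list(order = c(0, 1, 1),  # (P, D, Q)
                             period = 12))        # S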
ACF (autocorrelation function) and PACF (partial autocorrelation function) plots can be used to detect the AR and MA parts. If you applied differencing while stationarizing, your data are integrated: d is the number of ordinary differences and D the number of seasonal differences applied.
Let's see how we can use ACF and PACF plots to find the AR (p, P) and MA (q, Q) components. The acf2 function from the astsa package returns both plots.
The basic guidelines for interpreting ACF and PACF plots are as follows:
The two blue dashed lines represent the significance thresholds. Any spike that crosses these lines indicates a significant correlation.
When reading the ACF plot, ignore the long spike at lag 0 (its value is always 1); the PACF usually starts at lag 1.
The lag axis will differ depending on the frequency of the time series.
If the ACF tails off → AR model → the cut-off point in the PACF gives the order p for AR(p) (see the simulated check after this list).
If the PACF tails off → MA model → the cut-off point in the ACF gives the order q for MA(q).
If both the ACF and PACF tail off → ARMA model (choose p and q by trial and error, starting from the smallest values).
If the tailing off occurs at seasonal lags (multiples of the frequency), apply the same rules there to determine P, Q or both.
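As a simulated sanity check of rule 1, an AR(1) process should show an ACF that tails off and a PACF that cuts off after lag 1 (the coefficient 0.7 below is arbitrary):
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)  # simulate an AR(1) process
acf2(ar1)  # expect: tailing-off ACF, PACF cut-off at lag 1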
Next we will apply those steps to the datasets seen above.
acf2(WN)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0 -0.11 -0.07 -0.01 0.02 0.14 -0.02 0.04 -0.15 0.02 0.07 0.07 0.10
## PACF 0 -0.11 -0.08 -0.02 0.00 0.13 -0.02 0.07 -0.14 0.03 0.04 0.04 0.12
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.20 -0.01 0.04 0.03 -0.01 0.06 -0.02 0.04 -0.07 0.10 -0.11 -0.07
## PACF -0.21 0.07 -0.02 0.02 -0.04 0.04 0.05 0.02 -0.01 0.04 -0.12 -0.06
Note that there are no significant spikes: p = 0, q = 0.
Obviously the seasonal parameters are also zero: P = 0, Q = 0.
acf2(AirPassengers)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.95 0.88 0.81 0.75 0.71 0.68 0.66 0.66 0.67 0.70 0.74 0.76 0.71
## PACF 0.95 -0.23 0.04 0.09 0.07 0.01 0.13 0.09 0.23 0.17 0.17 -0.14 -0.54
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.65 0.59 0.54 0.50 0.47 0.45 0.44 0.46 0.48 0.52 0.53 0.49
## PACF -0.03 0.09 0.02 0.03 0.07 0.05 -0.05 0.05 -0.10 0.05 0.05 -0.16
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.44 0.39 0.35 0.31 0.29 0.27 0.26 0.28 0.30 0.33 0.34 0.30
## PACF -0.04 0.07 0.01 0.01 0.02 -0.01 -0.02 -0.03 -0.01 -0.05 0.05 -0.07
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.25 0.21 0.17 0.14 0.11 0.09 0.08 0.09 0.10 0.12 0.13
## PACF 0.00 0.02 -0.09 0.00 0.02 -0.04 0.00 -0.04 -0.01 -0.01 0.02
Since the ACF decays slowly instead of tailing off (a sign of non-stationarity), let's try again after differencing.
acf2(diff(AirPassengers))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.3 -0.10 -0.24 -0.30 -0.09 -0.08 -0.09 -0.29 -0.19 -0.10 0.28 0.83
## PACF 0.3 -0.21 -0.16 -0.22 0.01 -0.19 -0.15 -0.45 -0.23 -0.55 -0.13 0.57
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.28 -0.11 -0.22 -0.23 -0.06 -0.07 -0.09 -0.30 -0.16 -0.08 0.26 0.70
## PACF -0.15 -0.17 0.07 0.06 0.01 -0.08 0.04 -0.09 0.00 -0.01 -0.04 -0.04
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF 0.26 -0.10 -0.20 -0.17 -0.07 -0.04 -0.08 -0.25 -0.16 -0.05 0.20 0.58
## PACF -0.03 -0.04 0.02 -0.06 -0.13 0.01 0.03 0.07 -0.13 0.09 -0.12 -0.01
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.23 -0.11 -0.13 -0.12 -0.05 -0.03 -0.08 -0.23 -0.14 -0.03 0.15 0.49
## PACF 0.00 -0.08 0.09 -0.01 0.03 0.02 -0.02 0.00 0.05 0.00 0.00 0.02
The non-seasonal part tails off in the PACF: p = 1, d = 1, q = 0.
The seasonal part tails off in the ACF and no seasonal differencing was applied: P = 0, D = 0, Q = 0.
c1. The birth dataset
acf2(diff(birth))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.32 0.16 -0.08 -0.19 0.09 -0.28 0.06 -0.19 -0.05 0.17 -0.26 0.82
## PACF -0.32 0.06 -0.01 -0.25 -0.03 -0.26 -0.17 -0.29 -0.35 -0.16 -0.59 0.57
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF -0.28 0.17 -0.07 -0.18 0.08 -0.28 0.07 -0.18 -0.05 0.16 -0.24 0.78
## PACF 0.13 0.11 0.13 0.09 0.00 0.00 0.05 0.04 -0.07 -0.10 -0.20 0.19
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF -0.27 0.19 -0.08 -0.17 0.07 -0.29 0.07 -0.15 -0.04 0.14 -0.24 0.75
## PACF 0.01 0.05 0.07 0.07 -0.02 -0.06 -0.02 0.09 0.03 -0.06 -0.16 0.03
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF -0.23 0.16 -0.08 -0.15 0.05 -0.25 0.06 -0.18 -0.03 0.15 -0.22 0.72
## PACF 0.08 -0.10 -0.03 0.07 -0.04 0.06 0.04 -0.07 -0.06 0.02 -0.04 0.10
Differencing is needed since acf2(birth) shows persistent autocorrelation that never tails off: d = 1, D = 0.
The non-seasonal part tails off in both the PACF and the ACF (we can start with p = 1, q = 1).
The seasonal part tails off in both the ACF and PACF (so we can start with P = 1, Q = 1).
c2. The soi dataset
acf2(soi)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.6 0.37 0.21 0.05 -0.11 -0.19 -0.18 -0.10 0.05 0.22 0.36 0.41 0.31
## PACF 0.6 0.01 -0.03 -0.11 -0.14 -0.06 0.03 0.09 0.16 0.19 0.17 0.07 -0.11
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.10 -0.06 -0.17 -0.29 -0.37 -0.32 -0.19 -0.04 0.15 0.31 0.35 0.25
## PACF -0.25 -0.14 -0.04 -0.04 -0.05 0.03 0.04 0.03 0.08 0.11 0.05 -0.04
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.1 -0.03 -0.16 -0.28 -0.37 -0.32 -0.16 -0.02 0.17 0.33 0.39 0.30
## PACF -0.1 -0.02 -0.02 -0.04 -0.09 0.01 0.06 -0.01 0.08 0.11 0.09 -0.01
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.16 0.00 -0.13 -0.24 -0.27 -0.25 -0.13 0.06 0.21 0.38 0.40
## PACF -0.04 -0.09 -0.03 0.02 0.08 0.00 -0.01 0.03 -0.01 0.10 0.02
No differencing is needed (d = 0, D = 0).
The non-seasonal part tails off in the ACF: p = 1.
The seasonal part tails off in the ACF: P = 1.
There are several R packages for time series analysis. We discussed the forecast package in the last blog; we will use astsa this time. I find the astsa functions easier to remember and more straightforward.
The sarima function can handle all of the Box-Jenkins models; we only need to specify the parameters properly.
Let's start with the first dataset and see how to interpret the function's output. The parameters defined above will be used.
sarima(WN,
p = 0,d = 0,q = 0,
P = 0, D = 0, Q = 0,
S = 0
)
## initial value -0.058012
## iter 1 value -0.058012
## final value -0.058012
## converged
## initial value -0.058012
## iter 1 value -0.058012
## final value -0.058012
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## xmean
## 0.0220
## s.e. 0.0667
##
## sigma^2 estimated as 0.8905: log likelihood = -272.19, aic = 548.37
##
## $degrees_of_freedom
## [1] 199
##
## $ttable
## Estimate SE t.value p.value
## xmean 0.022 0.0667 0.3297 0.742
##
## $AIC
## [1] 2.741852
##
## $AICc
## [1] 2.741953
##
## $BIC
## [1] 2.774836
Output interpretation
As in standard regression models, the model is wrong if its residuals do not behave like white noise. Therefore:
Standardized residuals plot - should look like a white noise plot for the model to be correct.
ACF of residuals - should have no significant spikes (no autocorrelation should be left in the residuals).
Normal Q-Q plot of residuals - should be as close to linear as possible.
p-values for the Ljung-Box statistic - should all be above the blue threshold line.
Accordingly, the residuals of our first model pass the white noise checks, and the model can be accepted for forecasting.
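The same Ljung-Box check can also be reproduced by hand on the residuals. A sketch, storing the fit this time (sarima returns the underlying arima object in its $fit element):
model_fit <- sarima(WN, p = 0, d = 0, q = 0, P = 0, D = 0, Q = 0, S = 0)
Box.test(residuals(model_fit$fit), lag = 12, type = "Ljung-Box")
# a large p-value means we cannot reject that the residuals are white noise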
Taking another example,
sarima(birth,
p = 1,d = 1,q = 1,
P = 1, D = 0, Q = 1,
S = 12
)
## initial value 2.831680
## iter 2 value 2.544290
## iter 3 value 2.298148
## iter 4 value 2.221456
## iter 5 value 2.179128
## iter 6 value 2.140968
## iter 7 value 2.115004
## iter 8 value 2.113976
## iter 9 value 2.057246
## iter 10 value 2.027447
## iter 11 value 1.991680
## iter 12 value 1.971219
## iter 13 value 1.960648
## iter 14 value 1.956717
## iter 15 value 1.955033
## iter 16 value 1.954925
## iter 17 value 1.954907
## iter 18 value 1.954896
## iter 19 value 1.954874
## iter 20 value 1.954853
## iter 21 value 1.954830
## iter 22 value 1.954799
## iter 23 value 1.954761
## iter 24 value 1.954761
## iter 25 value 1.954744
## iter 26 value 1.954732
## iter 27 value 1.954732
## iter 28 value 1.954731
## iter 29 value 1.954731
## iter 30 value 1.954729
## iter 31 value 1.954727
## iter 32 value 1.954725
## iter 33 value 1.954724
## iter 34 value 1.954724
## iter 34 value 1.954724
## final value 1.954724
## converged
## initial value 1.966436
## iter 2 value 1.964359
## iter 3 value 1.960035
## iter 4 value 1.957448
## iter 5 value 1.956736
## iter 6 value 1.956133
## iter 7 value 1.955138
## iter 8 value 1.954444
## iter 9 value 1.953984
## iter 10 value 1.953915
## iter 11 value 1.953882
## iter 12 value 1.953824
## iter 13 value 1.953815
## iter 14 value 1.953804
## iter 15 value 1.953786
## iter 16 value 1.953773
## iter 17 value 1.953764
## iter 18 value 1.953761
## iter 19 value 1.953761
## iter 20 value 1.953760
## iter 21 value 1.953760
## iter 21 value 1.953760
## iter 21 value 1.953760
## final value 1.953760
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 sar1 sma1 constant
## 0.3144 -0.7095 0.9953 -0.7926 0.1574
## s.e. 0.0857 0.0597 0.0028 0.0461 1.7213
##
## sigma^2 estimated as 45.82: log likelihood = -1254.64, aic = 2521.29
##
## $degrees_of_freedom
## [1] 367
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3144 0.0857 3.6686 0.0003
## ma1 -0.7095 0.0597 -11.8795 0.0000
## sar1 0.9953 0.0028 351.8634 0.0000
## sma1 -0.7926 0.0461 -17.1982 0.0000
## constant 0.1574 1.7213 0.0915 0.9272
##
## $AIC
## [1] 6.777656
##
## $AICc
## [1] 6.778096
##
## $BIC
## [1] 6.840864
The model residuals look like white noise, so this model too can be accepted for forecasting.
Forecasting is easy once the model is fit. We will use the sarima.for function from the same package. In addition to the model parameters discussed so far, we need the n.ahead parameter to specify how far ahead we would like to forecast.
Let's forecast US monthly live births for the next five years:
sarima.for(birth,
p = 1,d = 1,q = 1,
P = 1, D = 0, Q = 1,
S = 12,
n.ahead = 5*12) # n.ahead is the number of points to forecast (5 years x 12 months)
## $pred
## Jan Feb Mar Apr May Jun Jul Aug
## 1979 258.9307 281.4213 263.8897 274.5396 270.8916 294.3042 301.0029
## 1980 275.8587 259.0351 281.7844 264.4513 275.0874 271.4689 294.7747 301.4435
## 1981 276.4228 259.6798 282.3218 265.0717 275.6582 272.0575 295.2535 301.8914
## 1982 276.9932 260.3303 282.8657 265.6982 276.2352 272.6523 295.7391 302.3462
## 1983 277.5698 260.9867 283.4160 266.3306 276.8184 273.2533 296.2313 302.8079
## 1984 278.1527
## Sep Oct Nov Dec
## 1979 295.2148 289.0144 272.5867 283.4751
## 1980 295.6840 289.5138 273.1648 284.0023
## 1981 296.1600 290.0199 273.7491 284.5360
## 1982 296.6427 290.5325 274.3396 285.0761
## 1983 297.1321 291.0516 274.9363 285.6226
## 1984
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 1979 6.769392 7.911760 8.554956 9.062429 9.517233 9.943668
## 1980 12.177329 12.894528 13.414489 13.870720 14.299303 14.711440 15.111130
## 1981 17.313929 17.971054 18.472755 18.923667 19.352844 19.769278 20.176066
## 1982 22.460214 23.097283 23.601117 24.061010 24.502195 24.932485 25.354519
## 1983 27.750060 28.382174 28.895522 29.369393 29.826482 30.273822 30.713767
## 1984 33.229280
## Aug Sep Oct Nov Dec
## 1979 10.350269 10.740798 11.117408 11.481607 11.834585
## 1980 15.500147 15.879523 16.250009 16.612224 16.966705
## 1981 20.574488 20.965239 21.348808 21.725598 22.095959
## 1982 25.769347 26.177512 26.579381 26.975255 27.365400
## 1983 31.147225 31.574649 31.996337 32.412531 32.823446
## 1984
While the forecast clearly captures the variation and seasonality in the data, we can use information criteria (AIC, BIC) to choose between models of a similar type; lower AIC or BIC values indicate a better-fitting model.
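For instance, we could pit the model above against an alternative specification (the second set of orders is just an illustrative candidate) and keep whichever scores lower:
m1 <- sarima(birth, p = 1, d = 1, q = 1, P = 1, D = 0, Q = 1, S = 12)
m2 <- sarima(birth, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)  # illustrative alternative
c(m1 = m1$AIC, m2 = m2$AIC)  # lower AIC indicates the better fit
c(m1 = m1$BIC, m2 = m2$BIC)  # lower BIC indicates the better fit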
If you have gone through this and the previous blog, you should have all it takes to start practicing. You can find the source code of this blog on my GitHub account.
I am crazy about data science and applying data science skills to workforce management. Feel free to reach me on LinkedIn if you wish to connect :)
You may also enjoy my other blogs at RPubs.