I took these notes while working through the ARIMA Modeling with R at DataCamp.
Here are some time series datasets with distinctive properties related to modeling.
The AirPassengers dataset from the datasets package contains Monthly totals of international airline passengers, 1949 to 1960. AirPassengers has an upward trend. It has seasonalitity; Aug is peak travel month, and Nov the smallest. It also has heteroscedasticity; as the overall passenger volume grows, small percentage changes have large absolute values.
plot(AirPassengers,
main = "Monthly International Airline Passengers, 1949 - 1960")
text(AirPassengers, labels = 1:12, col = 1:12)
The djia dataset from the astsa package contains daily DJIA values from April 2006 - April 2016. The data have a generally positive trend, but not always. There is no seasonal component, and the residuals are homoscedastic. This time series is an example of a random walk.
library(astsa)
library(xts)
plot(djia$Close)
The soi dataset from the astsa package contains the Southern Oscillation Index (SOI) for a period of 453 months ranging over the years 1950-1987. This series has no trend. It does have seasonality. SOI is a surrogate for sea surface temperature, and you can seee that the data is positively correlated with itself one month apart (\(\rho\) = 0.604), and negatively correlated with itself 6 months apart (\(\rho\) = -0.187) (summer vs winter).
ts.plot(soi,
main = "Southern Oscillation Index (SOI), 1950 - 1987")
lag.plot(soi, set.lags = c(1, 6))
acf(soi, lag.max = 6, plot = FALSE)
##
## Autocorrelations of series 'soi', by lag
##
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000
## 1.000 0.604 0.374 0.214 0.050 -0.107 -0.187
In normal regression, \(Y_i = \beta X_i + \epsilon\) with the assumption that the residuals are normally distributed, independent, and homoscedastic (i.e., the errors are white noise).
With time series data you can regress the present period value on the prior period value, \(X_t = \phi X_{t-1} + \epsilon\). This is called auto regression (AR). Typically however, the data are correlated. You can overcome this limitation by using a moving average (MA) for the errors, \(\epsilon_t = W_t + \theta W_{t-1}\). The ARMA combines the two models into one model with autoregression and autocorrelated errors.
\[X_t = \phi X_{t-1} + W_t + \theta W_{t-1}\]
Stationary time series have a constant mean (no trend) and constant correlation structure; that is, you can estimate lag correlations on pairs because the data is consistent throughout the series.
A trend stationary time series is stationary around a trend. A simple example is \(Y_t = \alpha + \beta t + X_t\) where \(X_t\) is a stationary data series. A trend stationary series can be coerced into stationarity by differencing the observations \(X_t - X_{t-1}\).
The random walk model, \(X_t = X_{t-1} + W_t + c\) where \(W_t\) is white noise and c is a drift parameter, can also be coerced ito stationarity with differencing.
A time series with trend and heteroscedasticity, as in the AirPassengers and djia datasets above, can be coerced into stationarity with homoscedasticity by taking the log differences. A time series with trend and heteroscedasticity follows a model like \(X_t = (1 + p_t)X_{t-1}\) where \(p_t\) is a small percentage change. Taking the log difference unwinds the percentage change.
Here are the AirPassengers and djia datasets with log differencing. With the AirPassengers dataset, the trend and heteroscedasticity is now removed (but the periodicity remains).
par(mfrow = c(2, 1))
plot(AirPassengers,
main = "Monthly International Airline Passengers, 1949 - 1960")
plot(diff(log(AirPassengers)))
With the djia dataset, the trend and heteroscedasticity are removed, leaving a stationary series.
par(mfrow = c(2, 1))
plot(djia$Close)
plot(diff(log(djia$Close)))
Any stationary time series can be written as a linear combination of white noise. ARMA models have this form, so they are a good choice for modeling stationary time series.
Use the arima.sim() function to generate data from an ARMA model. Here are some examples of data series generated from the arima.sim() function.
par(mfrow = c(1, 3))
# white noise
WN <- arima.sim(model = list(order = c(0, 0, 0)), n = 200)
plot(WN, main = "WN")
# MA(1) with parameter .9
MA <- arima.sim(model = list(order = c(0, 0, 1), ma = 0.9), n = 200)
plot(MA, main = "MA(1)")
# AR(2) with parameters 1.5 and -.75
AR <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200)
plot(AR, main = "AR(2)")
You cannot identify which model to fit to a time series just by looking at a plot of the data. Below is an AR(1) model and an AR(2) model. Their plots are similar.
par(mfrow = c(1, 2))
ar1 <- arima.sim(model = list(order = c(1, 0, 0), ar = -.7),
n = 200)
ma1 <- arima.sim(model = list(order = c(0, 0, 1), ma = -.7),
n = 200)
plot(ar1, main = "AR(1)")
plot(ma1, main = "MA(1)")
The tools to identify the model orders are the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
| AR(p) | MA(q) | ARMA(p, q) | |
|---|---|---|---|
| ACF | Tails off | Cuts of at lag q | Tails off |
| PACF | Cuts off at lag p | Tails off | Tails off |
Produce these two plots with the acf2() function from the astsa package. The combination of the plots will narrow the model selection to one of the three. Suppose you have an AR(1) model \(X_t = 0.9X_{t-1} + W_t\).
set.seed(123)
x <- arima.sim(model = list(order = c(1, 0, 0),
ar = .9),
n = 100)
plot(x, main = "AR(1)")
The ACF plot below tails off, so it is either an AR model or an ARMA model. The PACF plot cuts off at a lag of 1, so it is an AR(1) model.
acf2(x)
## ACF PACF
## [1,] 0.88 0.88
## [2,] 0.77 -0.02
## [3,] 0.67 0.00
## [4,] 0.55 -0.14
## [5,] 0.48 0.12
## [6,] 0.43 0.07
## [7,] 0.39 0.02
## [8,] 0.34 -0.12
## [9,] 0.28 -0.06
## [10,] 0.24 0.05
## [11,] 0.18 -0.06
## [12,] 0.09 -0.17
## [13,] 0.05 0.07
## [14,] 0.01 -0.01
## [15,] -0.03 0.01
## [16,] -0.07 -0.13
## [17,] -0.08 0.07
## [18,] -0.10 -0.02
## [19,] -0.12 0.06
## [20,] -0.11 0.02
Use the sarima() function to fit any ARIMA model to a data series. In this case, fit an AR(1) model. The parameter estimate for ar1 is 0.8781 - close to the 0.9 parameter used in the series generation.
sarima(x, p = 1, d = 0, q = 0)
## initial value 0.693455
## iter 2 value -0.060553
## iter 3 value -0.060659
## iter 4 value -0.060698
## iter 5 value -0.060866
## iter 6 value -0.060914
## iter 7 value -0.060927
## iter 8 value -0.060927
## iter 9 value -0.060929
## iter 10 value -0.060931
## iter 11 value -0.060931
## iter 12 value -0.060931
## iter 12 value -0.060931
## iter 12 value -0.060931
## final value -0.060931
## converged
## initial value -0.053164
## iter 2 value -0.053270
## iter 3 value -0.053886
## iter 4 value -0.053890
## iter 5 value -0.053893
## iter 6 value -0.053893
## iter 7 value -0.053893
## iter 8 value -0.053893
## iter 9 value -0.053893
## iter 10 value -0.053893
## iter 10 value -0.053893
## iter 10 value -0.053893
## final value -0.053893
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 xmean
## 0.8781 -0.3569
## s.e. 0.0456 0.7220
##
## sigma^2 estimated as 0.8847: log likelihood = -136.5, aic = 279.01
##
## $degrees_of_freedom
## [1] 98
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.8781 0.0456 19.2759 0.0000
## xmean -0.3569 0.7220 -0.4943 0.6222
##
## $AIC
## [1] 2.79009
##
## $AICc
## [1] 2.791327
##
## $BIC
## [1] 2.868245
Try this again with AR(2) model \(X_t = 1.5X_{t-1} - 0.75X_{t-2} + W_t\). Here is the simulated time series.
set.seed(123)
x <- arima.sim(model = list(order = c(2, 0, 0),
ar = c(1.5, -.75)),
n = 200)
plot(x, main = "AR(2)")
The ACF tails off again, so it’s an AR or ARMA. The PACF cuts off at lag q, so it’s an AR(2).
acf2(x)
## ACF PACF
## [1,] 0.83 0.83
## [2,] 0.46 -0.73
## [3,] 0.06 -0.04
## [4,] -0.27 -0.03
## [5,] -0.43 0.06
## [6,] -0.43 -0.08
## [7,] -0.31 0.04
## [8,] -0.14 -0.08
## [9,] 0.01 0.03
## [10,] 0.13 0.07
## [11,] 0.20 0.02
## [12,] 0.22 0.08
## [13,] 0.20 -0.01
## [14,] 0.14 -0.11
## [15,] 0.04 -0.04
## [16,] -0.10 -0.11
## [17,] -0.21 0.10
## [18,] -0.27 -0.09
## [19,] -0.26 0.00
## [20,] -0.18 -0.06
## [21,] -0.08 -0.01
## [22,] 0.01 -0.09
## [23,] 0.07 0.01
## [24,] 0.10 0.10
## [25,] 0.11 0.01
Fit an AR(2) model to the series. The parameter estimate for ar1 equals 1.4328, and for ar2 equals -0.7255, close to the 1.5 and -.75 in the simulated data series.
sarima(x, p = 2, d = 0, q = 0)
## initial value 0.893841
## iter 2 value 0.748514
## iter 3 value 0.369992
## iter 4 value 0.187328
## iter 5 value 0.042574
## iter 6 value -0.056822
## iter 7 value -0.069519
## iter 8 value -0.073202
## iter 9 value -0.073471
## iter 10 value -0.073747
## iter 11 value -0.073759
## iter 12 value -0.073760
## iter 13 value -0.073760
## iter 13 value -0.073760
## iter 13 value -0.073760
## final value -0.073760
## converged
## initial value -0.067631
## iter 2 value -0.067644
## iter 3 value -0.067656
## iter 4 value -0.067657
## iter 5 value -0.067657
## iter 6 value -0.067657
## iter 7 value -0.067657
## iter 7 value -0.067657
## iter 7 value -0.067657
## final value -0.067657
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 xmean
## 1.4328 -0.7255 -0.0565
## s.e. 0.0479 0.0479 0.2244
##
## sigma^2 estimated as 0.8619: log likelihood = -270.26, aic = 548.51
##
## $degrees_of_freedom
## [1] 197
##
## $ttable
## Estimate SE t.value p.value
## ar1 1.4328 0.0479 29.9166 0.0000
## ar2 -0.7255 0.0479 -15.1555 0.0000
## xmean -0.0565 0.2244 -0.2518 0.8014
##
## $AIC
## [1] 2.742563
##
## $AICc
## [1] 2.743175
##
## $BIC
## [1] 2.808529
Repeat this exercise once more, this time with the MA(1) model \(X_t = W_t - .8W_{t-1}\). Here is the simulated time series.
set.seed(123)
x <- arima.sim(model = list(order = c(0, 0, 1),
ma = 0.8),
n = 200)
plot(x)
The ACF cuts off at 1, so it is an MA(1). No need to look at the PACF.
acf2(x)
## ACF PACF
## [1,] 0.39 0.39
## [2,] -0.07 -0.27
## [3,] 0.02 0.20
## [4,] -0.02 -0.18
## [5,] -0.02 0.13
## [6,] -0.02 -0.14
## [7,] 0.00 0.13
## [8,] 0.02 -0.09
## [9,] -0.01 0.06
## [10,] 0.02 -0.02
## [11,] -0.06 -0.09
## [12,] -0.16 -0.10
## [13,] -0.07 0.04
## [14,] 0.05 0.03
## [15,] 0.02 -0.03
## [16,] -0.05 -0.04
## [17,] -0.01 0.03
## [18,] 0.01 -0.04
## [19,] -0.11 -0.12
## [20,] -0.16 -0.06
## [21,] -0.01 0.07
## [22,] 0.06 0.00
## [23,] -0.02 -0.06
## [24,] -0.03 -0.01
## [25,] 0.01 0.02
Fit an MA(1) model to the time series. The parameter estimate for ma1 equals 0.9231, close to the 0.8 in the generated time series.
sarima(x, p = 0, d = 0, q = 1)
## initial value 0.155021
## iter 2 value 0.034060
## iter 3 value 0.007456
## iter 4 value -0.026302
## iter 5 value -0.049695
## iter 6 value -0.049854
## iter 7 value -0.049881
## iter 8 value -0.049924
## iter 9 value -0.049952
## iter 10 value -0.049960
## iter 11 value -0.049960
## iter 12 value -0.049960
## iter 13 value -0.049960
## iter 13 value -0.049960
## iter 13 value -0.049960
## final value -0.049960
## converged
## initial value -0.054644
## iter 2 value -0.055618
## iter 3 value -0.056055
## iter 4 value -0.056369
## iter 5 value -0.056426
## iter 6 value -0.056453
## iter 7 value -0.056455
## iter 8 value -0.056461
## iter 9 value -0.056466
## iter 10 value -0.056466
## iter 11 value -0.056466
## iter 11 value -0.056466
## final value -0.056466
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 xmean
## 0.9231 0.0048
## s.e. 0.0402 0.1276
##
## sigma^2 estimated as 0.8847: log likelihood = -272.49, aic = 550.99
##
## $degrees_of_freedom
## [1] 198
##
## $ttable
## Estimate SE t.value p.value
## ma1 0.9231 0.0402 22.9571 0.00
## xmean 0.0048 0.1276 0.0376 0.97
##
## $AIC
## [1] 2.754945
##
## $AICc
## [1] 2.75525
##
## $BIC
## [1] 2.80442
An ARMA model is an autoregression with correlated errors.
\[X_t = \phi X_{t-1} + W_t + \theta X_{t-1}\]
You can identify an ARMA model from the ACF and PACF plots. Both will decay to zero. However, the PACF will not tell you the p an q orders. The best practice is to start with an ARMA(1, 1) and then add parameters as needed.
Here is simulated data from an ARMA(2,1) process with \(\phi = (1, -.9)\) and \(\theta = .8\).
set.seed(123)
x <- arima.sim(model = list(order = c(2, 0, 1),
ar = c(1, -.9),
ma = .8),
n = 200)
plot(x)
Both the ACF and PACF decay to zero, so this is an ARMA time series.
acf2(x)
## ACF PACF
## [1,] 0.57 0.57
## [2,] -0.27 -0.88
## [3,] -0.78 0.09
## [4,] -0.61 -0.18
## [5,] 0.00 -0.09
## [6,] 0.50 -0.02
## [7,] 0.54 0.02
## [8,] 0.17 -0.07
## [9,] -0.27 -0.01
## [10,] -0.45 -0.02
## [11,] -0.26 -0.02
## [12,] 0.11 0.06
## [13,] 0.35 0.01
## [14,] 0.30 -0.03
## [15,] 0.01 0.02
## [16,] -0.26 -0.03
## [17,] -0.30 0.00
## [18,] -0.09 0.01
## [19,] 0.16 -0.05
## [20,] 0.25 0.02
## [21,] 0.14 0.02
## [22,] -0.07 -0.10
## [23,] -0.21 0.02
## [24,] -0.16 0.07
## [25,] 0.04 0.03
At this point you know you have an ARMA time series, but you don’t know what orders to assign. The right answer, of course, is ARMA(2, 1) because that is the series you generated. The AIC and BIC measures help you decide. Both measures penalize additional parameters, but the BIC is a little more strict: AIC uses \(k = 2\) and BIC uses \(k = \log(n)\). The model with the smallest AIC or BIC is usually best. AIC and BIC often agree with each other. Below are tht \(2^3 = 8\) combinations of p and q up to order 2. The lowest BIC is ARMA(2, 1) (2.961724) - success! The AIC criterion recommends ARMA(2, 2).
x.ar1 <- sarima(x, p = 1, d = 0, q = 0, details = FALSE)
x.ma1 <- sarima(x, p = 0, d = 0, q = 1, details = FALSE)
x.ar2 <- sarima(x, p = 2, d = 0, q = 0, details = FALSE)
x.ma2 <- sarima(x, p = 0, d = 0, q = 2, details = FALSE)
x.arma11 <- sarima(x, p = 1, d = 0, q = 1, details = FALSE)
x.arma21 <- sarima(x, p = 2, d = 0, q = 1, details = FALSE)
x.arma12 <- sarima(x, p = 1, d = 0, q = 2, details = FALSE)
x.arma22 <- sarima(x, p = 2, d = 0, q = 2, details = FALSE)
cat("AR(1) (AIC, BIC): ", x.ar1$AIC, ", ", x.ar1$BIC, "\n")
## AR(1) (AIC, BIC): 5.185697 , 5.235171
cat("MA(1) (AIC, BIC): ", x.ma1$AIC, ", ", x.ma1$BIC, "\n")
## MA(1) (AIC, BIC): 4.521101 , 4.570576
cat("AR(2) (AIC, BIC): ", x.ar2$AIC, ", ", x.ar2$BIC, "\n")
## AR(2) (AIC, BIC): 3.257452 , 3.323418
cat("MA(2) (AIC, BIC): ", x.ma2$AIC, ", ", x.ma2$BIC, "\n")
## MA(2) (AIC, BIC): 3.797553 , 3.863519
cat("ARMA(1, 1) (AIC, BIC): ", x.arma11$AIC, ", ", x.arma11$BIC, "\n")
## ARMA(1, 1) (AIC, BIC): 4.198092 , 4.264059
cat("ARMA(2, 1) (AIC, BIC): ", x.arma21$AIC, ", ", x.arma21$BIC, "\n")
## ARMA(2, 1) (AIC, BIC): 2.879266 , 2.961724
cat("ARMA(1, 2) (AIC, BIC): ", x.arma12$AIC, ", ", x.arma12$BIC, "\n")
## ARMA(1, 2) (AIC, BIC): 3.686392 , 3.76885
cat("ARMA(2, 2) (AIC, BIC): ", x.arma22$AIC, ", ", x.arma22$BIC, "\n")
## ARMA(2, 2) (AIC, BIC): 2.870311 , 2.96926
Perform a residual analysis to make sure the residuals are white Gaussian noise.
Here are the diagnostic plots for the ARMA(2, 1) model.
sarima(x, p = 2, d = 0, q = 1)
## initial value 1.358558
## iter 2 value 0.596753
## iter 3 value 0.398714
## iter 4 value 0.265589
## iter 5 value 0.196824
## iter 6 value 0.141164
## iter 7 value -0.005829
## iter 8 value -0.010162
## iter 9 value -0.010553
## iter 10 value -0.010570
## iter 11 value -0.010576
## iter 12 value -0.010578
## iter 13 value -0.010578
## iter 14 value -0.010578
## iter 15 value -0.010578
## iter 15 value -0.010578
## iter 15 value -0.010578
## final value -0.010578
## converged
## initial value -0.002361
## iter 2 value -0.003395
## iter 3 value -0.003693
## iter 4 value -0.004103
## iter 5 value -0.004198
## iter 6 value -0.004260
## iter 7 value -0.004303
## iter 8 value -0.004305
## iter 9 value -0.004306
## iter 10 value -0.004306
## iter 10 value -0.004306
## final value -0.004306
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ma1 xmean
## 1.0409 -0.8807 0.8417 0.0517
## s.e. 0.0351 0.0342 0.0434 0.1523
##
## sigma^2 estimated as 0.9601: log likelihood = -282.93, aic = 575.85
##
## $degrees_of_freedom
## [1] 196
##
## $ttable
## Estimate SE t.value p.value
## ar1 1.0409 0.0351 29.6831 0.0000
## ar2 -0.8807 0.0342 -25.7678 0.0000
## ma1 0.8417 0.0434 19.3974 0.0000
## xmean 0.0517 0.1523 0.3397 0.7344
##
## $AIC
## [1] 2.879266
##
## $AICc
## [1] 2.880292
##
## $BIC
## [1] 2.961724
A time series is ARIMA(p,d,q) if the differenced series (of order d) is ARMA(p, q). Consider the integrated model \(Y_t = 0.9Y_{t-1} + W_t\) where \(Y_t = X_t - X_{t-1}\). The integrated series is non-stationary, but differencing makes it stationary.
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.9),
n = 100)
par(mfrow = c(2, 1))
plot(x,
main = "Simulated ARIMA(1, 1, 0)")
plot(diff(x),
main = "Differenced ARIMA(1, 1, 0) = ARMA(1, 0, 0)")
Here is an ARMA(2, 1, 0) model with drift.
\[Y_t = 1 + 1.5Y_{t-1} - 0.75Y_{t-2} + W_t\] where \(Y_t = X_t - X_{t-1}\).
set.seed(123)
x <- arima.sim(model = list(order = c(2, 1, 0),
ar = c(1.5, -0.75),
mean = 1),
n = 250)
plot(x)
Plot the P/ACF functions of the differenced data to determine the model.
acf2(diff(x))
## ACF PACF
## [1,] 0.82 0.82
## [2,] 0.44 -0.75
## [3,] 0.01 0.01
## [4,] -0.31 -0.03
## [5,] -0.46 0.07
## [6,] -0.43 -0.06
## [7,] -0.27 0.04
## [8,] -0.07 -0.08
## [9,] 0.09 0.03
## [10,] 0.19 0.10
## [11,] 0.23 0.03
## [12,] 0.22 0.04
## [13,] 0.16 -0.03
## [14,] 0.08 -0.09
## [15,] -0.03 -0.02
## [16,] -0.15 -0.07
## [17,] -0.22 0.09
## [18,] -0.23 -0.08
## [19,] -0.18 -0.01
## [20,] -0.09 -0.04
## [21,] 0.00 -0.01
## [22,] 0.07 -0.04
## [23,] 0.10 0.03
## [24,] 0.10 0.06
## [25,] 0.08 -0.02
## [26,] 0.05 -0.05
The ACF tails off, indicating an AR process. The PACF cuts off at lag 2. Fit an AR(2, 1, 0) model to the series. The estimated parameters for ar1 (1.4524) and ar2(-0.7598) are close the simulated series parameters.
sarima(x, p = 2, d = 1, q = 0)
## initial value 0.925157
## iter 2 value 0.760964
## iter 3 value 0.395097
## iter 4 value 0.245884
## iter 5 value -0.049048
## iter 6 value -0.052712
## iter 7 value -0.078554
## iter 8 value -0.079542
## iter 9 value -0.080615
## iter 10 value -0.081164
## iter 11 value -0.081194
## iter 12 value -0.081195
## iter 13 value -0.081195
## iter 14 value -0.081195
## iter 15 value -0.081195
## iter 15 value -0.081195
## final value -0.081195
## converged
## initial value -0.076425
## iter 2 value -0.076433
## iter 3 value -0.076438
## iter 4 value -0.076439
## iter 5 value -0.076439
## iter 6 value -0.076439
## iter 7 value -0.076439
## iter 7 value -0.076439
## iter 7 value -0.076439
## final value -0.076439
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 constant
## 1.4524 -0.7598 0.0931
## s.e. 0.0408 0.0409 0.1897
##
## sigma^2 estimated as 0.8485: log likelihood = -335.62, aic = 679.25
##
## $degrees_of_freedom
## [1] 247
##
## $ttable
## Estimate SE t.value p.value
## ar1 1.4524 0.0408 35.6208 0.0000
## ar2 -0.7598 0.0409 -18.5675 0.0000
## constant 0.0931 0.1897 0.4909 0.6239
##
## $AIC
## [1] 2.716999
##
## $AICc
## [1] 2.717389
##
## $BIC
## [1] 2.773342
Here is a dataset of actual data. Dataset globtemp from the astsa package contains annual global temperature deviations to 2015. A plot of the data shows random walk behavior, so you should work with the differenced data.
plot(globtemp)
Plot the P/ACF of the differenced data.
acf2(diff(globtemp))
## ACF PACF
## [1,] -0.24 -0.24
## [2,] -0.19 -0.26
## [3,] -0.08 -0.23
## [4,] 0.20 0.06
## [5,] -0.15 -0.16
## [6,] -0.03 -0.09
## [7,] 0.03 -0.05
## [8,] 0.14 0.07
## [9,] -0.16 -0.09
## [10,] 0.11 0.11
## [11,] -0.05 -0.03
## [12,] 0.00 -0.02
## [13,] -0.13 -0.10
## [14,] 0.14 0.02
## [15,] -0.01 0.00
## [16,] -0.08 -0.09
## [17,] 0.00 0.00
## [18,] 0.19 0.11
## [19,] -0.07 0.04
## [20,] 0.02 0.13
## [21,] -0.02 0.09
## [22,] 0.08 0.08
You could say both the ACF and PACF tail off, implying an ARIMA(1, 1, 1). Or you could say the ACF cuts off at lag two while the PACF tails off, implying an ARIMA(0, 1, 2). Or you could say the ACF is tailing off while the PACF cuts off at lag 3, implying an ARIMA(3, 1, 0) model. Try all three.
globtemp.arima111 <- sarima(globtemp, p = 1, d = 1, q = 1,
details = FALSE)
globtemp.arima012 <- sarima(globtemp, p = 0, d = 1, q = 2,
details = FALSE)
globtemp.arima310 <- sarima(globtemp, p = 3, d = 1, q = 0,
details = FALSE)
cat("ARIMA(1,1,1) (AIC, BIC): ",
globtemp.arima111$AIC, ", ",
globtemp.arima111$BIC, "\n")
## ARIMA(1,1,1) (AIC, BIC): -1.716773 , -1.630691
cat("ARIMA(0,1,2) (AIC, BIC): ",
globtemp.arima012$AIC, ", ",
globtemp.arima012$BIC, "\n")
## ARIMA(0,1,2) (AIC, BIC): -1.723268 , -1.637185
cat("ARIMA(3,1,0) (AIC, BIC): ",
globtemp.arima310$AIC, ", ",
globtemp.arima310$BIC, "\n")
## ARIMA(3,1,0) (AIC, BIC): -1.717413 , -1.60981
The ARIMA(0, 1, 2) model has the lowest AIC and BIC.
One way to check an analysis is to overfit the model by adding an extra parameter to see if it makes a difference in the results. If the results change drastically, rethink the model. If the results do not change much, you can be confident that your fit is correct.
Here is an ARIMA(0,1,1) model with MA parameter .9, \(Y_t = Y_{t-1} + W_t + 0.9W_{t-1}\).
set.seed(123)
x <- arima.sim(model = list(order = c(0, 1, 1),
ma = 0.9),
n = 250)
plot(x)
Assess the P/ACF to choose a candidate model for the process.
acf2(diff(x))
## ACF PACF
## [1,] 0.41 0.41
## [2,] -0.11 -0.33
## [3,] -0.04 0.20
## [4,] -0.07 -0.24
## [5,] -0.06 0.15
## [6,] -0.03 -0.17
## [7,] 0.04 0.21
## [8,] 0.07 -0.15
## [9,] -0.01 0.10
## [10,] -0.04 -0.12
## [11,] -0.09 -0.02
## [12,] -0.12 -0.09
## [13,] -0.03 0.08
## [14,] 0.08 0.03
## [15,] 0.07 -0.02
## [16,] -0.04 -0.06
## [17,] -0.07 -0.01
## [18,] -0.09 -0.10
## [19,] -0.12 -0.02
## [20,] -0.07 -0.05
## [21,] 0.05 0.12
## [22,] 0.10 -0.06
## [23,] -0.01 -0.03
## [24,] -0.07 -0.05
## [25,] -0.04 0.00
## [26,] -0.06 -0.05
The ACF of the differenced data cuts off at lag 1 while the PACF tails off - clearly an ARIMA(0, 1, 1) model. Fit the model to the data using the standard method.
sarima(x, p = 0, d = 1, q = 1)
## initial value 0.204719
## iter 2 value 0.067142
## iter 3 value 0.020355
## iter 4 value 0.008626
## iter 5 value -0.057128
## iter 6 value -0.057335
## iter 7 value -0.059199
## iter 8 value -0.059277
## iter 9 value -0.059432
## iter 10 value -0.059432
## iter 11 value -0.059432
## iter 11 value -0.059432
## final value -0.059432
## converged
## initial value -0.065584
## iter 2 value -0.066183
## iter 3 value -0.066758
## iter 4 value -0.066899
## iter 5 value -0.067003
## iter 6 value -0.067040
## iter 7 value -0.067044
## iter 8 value -0.067049
## iter 9 value -0.067053
## iter 10 value -0.067055
## iter 11 value -0.067055
## iter 12 value -0.067055
## iter 13 value -0.067055
## iter 13 value -0.067055
## iter 13 value -0.067055
## final value -0.067055
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 constant
## 0.9562 -0.0192
## s.e. 0.0185 0.1149
##
## sigma^2 estimated as 0.8659: log likelihood = -337.97, aic = 681.94
##
## $degrees_of_freedom
## [1] 248
##
## $ttable
## Estimate SE t.value p.value
## ma1 0.9562 0.0185 51.7312 0.0000
## constant -0.0192 0.1149 -0.1675 0.8671
##
## $AIC
## [1] 2.727767
##
## $AICc
## [1] 2.727961
##
## $BIC
## [1] 2.770025
The diagnostic plots look good. The t-value for ma1 is 51.7312. AIC is 2.727767 and BIC is 2.770025. Now check a model by overfitting (adding a parameter) with an additional MA parameter to see if it makes a difference.
sarima(x, p = 0, d = 1, q = 2)
## initial value 0.204719
## iter 2 value 0.045506
## iter 3 value 0.010553
## iter 4 value -0.016998
## iter 5 value -0.053664
## iter 6 value -0.055528
## iter 7 value -0.062292
## iter 8 value -0.064508
## iter 9 value -0.064955
## iter 10 value -0.065426
## iter 11 value -0.065569
## iter 12 value -0.065591
## iter 13 value -0.065596
## iter 14 value -0.065599
## iter 15 value -0.065599
## iter 16 value -0.065599
## iter 17 value -0.065599
## iter 17 value -0.065599
## iter 17 value -0.065599
## final value -0.065599
## converged
## initial value -0.071348
## iter 2 value -0.072129
## iter 3 value -0.072381
## iter 4 value -0.072487
## iter 5 value -0.072545
## iter 6 value -0.072550
## iter 7 value -0.072553
## iter 8 value -0.072553
## iter 8 value -0.072553
## final value -0.072553
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 constant
## 0.8481 -0.1111 -0.0182
## s.e. 0.0668 0.0665 0.1015
##
## sigma^2 estimated as 0.8565: log likelihood = -336.6, aic = 681.19
##
## $degrees_of_freedom
## [1] 247
##
## $ttable
## Estimate SE t.value p.value
## ma1 0.8481 0.0668 12.6980 0.0000
## ma2 -0.1111 0.0665 -1.6716 0.0959
## constant -0.0182 0.1015 -0.1793 0.8579
##
## $AIC
## [1] 2.724771
##
## $AICc
## [1] 2.725161
##
## $BIC
## [1] 2.781114
The t-value for ma1 is still significant (t = 12.698, p < 0.0001), but the t-value for ma2 is not significant (t = -1.6716, p = 0.0959). The AIC decreased (AIC = 2.724771), but BIC increased (BIC = 2.781114).
Forecast a data series with the sarima.for() function from the astsa package. Forecast the following ARIMA(1,1,0) model with AR parameter 0.9, \(Y_t = 0.9Y_{t_1} + W_t\) where \(Y_t = X_t - X_{t-1}\). Here is the whole data series, 120 points.
set.seed(123)
x <- arima.sim(model = list(order = c(1, 1, 0),
ar = 0.9),
n = 120)
plot(x)
Propose a model from the P/ACF plots of the differenced data using the first 100 data points.
acf2(diff(window(x, end = 100)))
## ACF PACF
## [1,] 0.88 0.88
## [2,] 0.77 -0.03
## [3,] 0.67 0.00
## [4,] 0.55 -0.14
## [5,] 0.48 0.13
## [6,] 0.43 0.06
## [7,] 0.39 0.02
## [8,] 0.34 -0.13
## [9,] 0.28 -0.04
## [10,] 0.23 0.03
## [11,] 0.17 -0.06
## [12,] 0.09 -0.16
## [13,] 0.04 0.07
## [14,] 0.00 -0.01
## [15,] -0.03 0.01
## [16,] -0.07 -0.13
## [17,] -0.08 0.08
## [18,] -0.11 -0.04
## [19,] -0.12 0.08
## [20,] -0.11 -0.02
The ACF plot tails off while the PACF plot cuts off at lag 1, an ARIMA(1,1,0) model.
sarima(window(x, end = 100), p = 1, d = 1, q = 0)
## initial value 0.698011
## iter 2 value -0.056080
## iter 3 value -0.056204
## iter 4 value -0.056235
## iter 5 value -0.056355
## iter 6 value -0.056400
## iter 7 value -0.056413
## iter 8 value -0.056414
## iter 9 value -0.056416
## iter 10 value -0.056418
## iter 11 value -0.056418
## iter 12 value -0.056418
## iter 12 value -0.056418
## iter 12 value -0.056418
## final value -0.056418
## converged
## initial value -0.048759
## iter 2 value -0.048866
## iter 3 value -0.049476
## iter 4 value -0.049477
## iter 5 value -0.049479
## iter 6 value -0.049479
## iter 6 value -0.049479
## final value -0.049479
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 constant
## 0.8777 -0.3320
## s.e. 0.0458 0.7266
##
## sigma^2 estimated as 0.8924: log likelihood = -135.58, aic = 277.15
##
## $degrees_of_freedom
## [1] 97
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.8777 0.0458 19.1841 0.0000
## constant -0.3320 0.7266 -0.4569 0.6487
##
## $AIC
## [1] 2.799526
##
## $AICc
## [1] 2.800788
##
## $BIC
## [1] 2.878166
The diagnostic plots look good, and the parameter estimation for ar1 is significant (ar1 = 0.8777, t = 19.1841, p < 0.0001). Forecast the next 20 points in the series.
sarima.for(window(x, end = 100),
n.ahead = 20,
p = 1, d = 1, q = 1)
## $pred
## Time Series:
## Start = 101
## End = 120
## Frequency = 1
## [1] -42.77585 -43.44865 -44.07820 -44.67005 -45.22901 -45.75930 -46.26459
## [8] -46.74808 -47.21257 -47.66049 -48.09396 -48.51483 -48.92471 -49.32502
## [15] -49.71697 -50.10165 -50.47997 -50.85275 -51.22071 -51.58446
##
## $se
## Time Series:
## Start = 101
## End = 120
## Frequency = 1
## [1] 0.9444305 2.0260603 3.2427648 4.5394601 5.8809516 7.2436292
## [7] 8.6111796 9.9721691 11.3185611 12.6447490 13.9468964 15.2224718
## [13] 16.4699118 17.6883743 18.8775531 20.0375397 21.1687167 22.2716777
## [19] 23.3471653 24.3960239
lines(x)
Here is the same process with real life data, the globtemp dataset. Forecast 35 years ahead with the ARIMA(0,1,2) model.
sarima.for(globtemp, n.ahead = 35, p = 0, d = 1, q = 2)
## $pred
## Time Series:
## Start = 2016
## End = 2050
## Frequency = 1
## [1] 0.7995567 0.7745381 0.7816919 0.7888457 0.7959996 0.8031534 0.8103072
## [8] 0.8174611 0.8246149 0.8317688 0.8389226 0.8460764 0.8532303 0.8603841
## [15] 0.8675379 0.8746918 0.8818456 0.8889995 0.8961533 0.9033071 0.9104610
## [22] 0.9176148 0.9247687 0.9319225 0.9390763 0.9462302 0.9533840 0.9605378
## [29] 0.9676917 0.9748455 0.9819994 0.9891532 0.9963070 1.0034609 1.0106147
##
## $se
## Time Series:
## Start = 2016
## End = 2050
## Frequency = 1
## [1] 0.09909556 0.11564576 0.12175580 0.12757353 0.13313729 0.13847769
## [7] 0.14361964 0.14858376 0.15338730 0.15804492 0.16256915 0.16697084
## [13] 0.17125943 0.17544322 0.17952954 0.18352490 0.18743511 0.19126540
## [19] 0.19502047 0.19870459 0.20232164 0.20587515 0.20936836 0.21280424
## [25] 0.21618551 0.21951471 0.22279416 0.22602604 0.22921235 0.23235497
## [31] 0.23545565 0.23851603 0.24153763 0.24452190 0.24747019
A pure seasonal ARMA time series would be correlated at the seasonal lags only. The ACF and PACF would behave just as their nonseasonal counterparts, but at the seasonal lags, 1S, 2S, …, where S is the seasonal period (S = 12 for monthly data). Denote a seasonal AR model with 12 month seasonal period with notation \(SAR(1)_{12}\). Here is a 12 month pure seasonal model, a \(SARMA(P = 1, Q = 1)_{S = 12}\).
\[Y_t = 0.9Y_{t-12} + W_t + 0.5W_{t-12}\]
Mixed seasonal models are more common. Here is a \(SARIMA(0,0,0)\times(1,0,0)_{12}\) model.
\[Y_t = \Phi X_{t-12} + W_t + \theta W_{t-1}\]
The SAR(1) part is the relation to last year’s value \(X_{t-12}\). The MA(1) part is the relation to last month’s shock \(W_{t-1}\).
Here is how you would apply this to the AirPassengers series. Recall that AirPassengers contains monthly international air passenger totals from 1949 to 1960.
plot(AirPassengers,
main = "International Air Passengers, 1949 - 1960")
The variance is increasing, so log transform the data.
plot(log(AirPassengers),
main = "International Air Passengers, 1949 - 1960")
There is an obvious trend, so difference the data as well.
plot(diff(log(AirPassengers)),
main = "International Air Passengers, 1949 - 1960")
The log-diffed series shows seasonal persistance with cycles per year, take a seasonal difference as well.
plot(diff(diff(log(AirPassengers)), lag = 12),
main = "International Air Passengers, 1949 - 1960")
At this point we have d = 1, D = 1, S = 12. Now look at the P/ACF plots.
ddlx <- diff(diff(log(AirPassengers)), lag = 12)
acf2(ddlx)
## ACF PACF
## [1,] -0.34 -0.34
## [2,] 0.11 -0.01
## [3,] -0.20 -0.19
## [4,] 0.02 -0.13
## [5,] 0.06 0.03
## [6,] 0.03 0.03
## [7,] -0.06 -0.06
## [8,] 0.00 -0.02
## [9,] 0.18 0.23
## [10,] -0.08 0.04
## [11,] 0.06 0.05
## [12,] -0.39 -0.34
## [13,] 0.15 -0.11
## [14,] -0.06 -0.08
## [15,] 0.15 -0.02
## [16,] -0.14 -0.14
## [17,] 0.07 0.03
## [18,] 0.02 0.11
## [19,] -0.01 -0.01
## [20,] -0.12 -0.17
## [21,] 0.04 0.13
## [22,] -0.09 -0.07
## [23,] 0.22 0.14
## [24,] -0.02 -0.07
## [25,] -0.10 -0.10
## [26,] 0.05 -0.01
## [27,] -0.03 0.04
## [28,] 0.05 -0.09
## [29,] -0.02 0.05
## [30,] -0.05 0.00
## [31,] -0.05 -0.10
## [32,] 0.20 -0.02
## [33,] -0.12 0.01
## [34,] 0.08 -0.02
## [35,] -0.15 0.02
## [36,] -0.01 -0.16
## [37,] 0.05 -0.03
## [38,] 0.03 0.01
## [39,] -0.02 0.05
## [40,] -0.03 -0.08
## [41,] -0.07 -0.17
## [42,] 0.10 0.07
## [43,] -0.09 -0.10
## [44,] 0.03 -0.06
## [45,] -0.04 -0.03
## [46,] -0.04 -0.12
## [47,] 0.11 -0.01
## [48,] -0.05 -0.05
The seasonal ACF cuts off at lag 1S (S = 12). The PACF tails off at each of the lags, 1S, 2S, 3S, etc. This suggests a seasonal MA(1), so P = 0, Q = 1. Next, the non-seasonal lags tail off in both the ACF and PACF, suggesting an ARMA(1,1), so p = 1, and q = 1. Putting this together, we have a \(SARIMA(1,1,1)\times(0,1,1)_{12}\) model.
AirPassengers.fit1 <- sarima(log(AirPassengers),
p = 1, d = 1, q = 1,
P = 0, D = 1, Q = 1,
S = 12)
## initial value -3.085211
## iter 2 value -3.225399
## iter 3 value -3.276697
## iter 4 value -3.276902
## iter 5 value -3.282134
## iter 6 value -3.282524
## iter 7 value -3.282990
## iter 8 value -3.286319
## iter 9 value -3.286413
## iter 10 value -3.288141
## iter 11 value -3.288262
## iter 12 value -3.288394
## iter 13 value -3.288768
## iter 14 value -3.288969
## iter 15 value -3.289089
## iter 16 value -3.289094
## iter 17 value -3.289100
## iter 17 value -3.289100
## iter 17 value -3.289100
## final value -3.289100
## converged
## initial value -3.288388
## iter 2 value -3.288459
## iter 3 value -3.288530
## iter 4 value -3.288649
## iter 5 value -3.288753
## iter 6 value -3.288781
## iter 7 value -3.288784
## iter 7 value -3.288784
## iter 7 value -3.288784
## final value -3.288784
## converged
AirPassengers.fit1$ttable
## Estimate SE t.value p.value
## ar1 0.1960 0.2475 0.7921 0.4298
## ma1 -0.5784 0.2132 -2.7127 0.0076
## sma1 -0.5643 0.0747 -7.5544 0.0000
The model fit shows the non-seasonal AR component (ar1) is not significant, so drop it.
AirPassengers.fit1 <- sarima(log(AirPassengers),
p = 0, d = 1, q = 1,
P = 0, D = 1, Q = 1,
S = 12)
## initial value -3.086228
## iter 2 value -3.267980
## iter 3 value -3.279950
## iter 4 value -3.285996
## iter 5 value -3.289332
## iter 6 value -3.289665
## iter 7 value -3.289672
## iter 8 value -3.289676
## iter 8 value -3.289676
## iter 8 value -3.289676
## final value -3.289676
## converged
## initial value -3.286464
## iter 2 value -3.286855
## iter 3 value -3.286872
## iter 4 value -3.286874
## iter 4 value -3.286874
## iter 4 value -3.286874
## final value -3.286874
## converged
AirPassengers.fit1$ttable
## Estimate SE t.value p.value
## ma1 -0.4018 0.0896 -4.4825 0
## sma1 -0.5569 0.0731 -7.6190 0
Now forecast two years beyond the end of the data. The shading shows the 1 and 2 RMS prediction intervals.
sarima.for(log(AirPassengers),
p = 0, d = 1, q = 1,
P = 0, D = 1, Q = 1,
S = 12,
n.ahead = 24)
## $pred
## Jan Feb Mar Apr May Jun Jul
## 1961 6.110186 6.053775 6.171715 6.199300 6.232556 6.368779 6.507294
## 1962 6.206435 6.150025 6.267964 6.295550 6.328805 6.465028 6.603543
## Aug Sep Oct Nov Dec
## 1961 6.502906 6.324698 6.209008 6.063487 6.168025
## 1962 6.599156 6.420947 6.305257 6.159737 6.264274
##
## $se
## Jan Feb Mar Apr May Jun
## 1961 0.03671562 0.04278291 0.04809072 0.05286830 0.05724856 0.06131670
## 1962 0.09008475 0.09549708 0.10061869 0.10549195 0.11014981 0.11461854
## Jul Aug Sep Oct Nov Dec
## 1961 0.06513124 0.06873441 0.07215787 0.07542612 0.07855851 0.08157070
## 1962 0.11891946 0.12307018 0.12708540 0.13097758 0.13475740 0.13843405
Try this again with the unemp dataset from the astsa package.
plot(unemp,
main = "Monthly U.S. Unemployment, 1948-1978")
The dataset exhibits trending, so remove it with differencing. The P/ACF plots indicate a persistant 12 monthly seasonal trend.
unemp.d <- diff(unemp)
acf2(unemp.d)
## ACF PACF
## [1,] 0.03 0.03
## [2,] -0.15 -0.15
## [3,] -0.25 -0.25
## [4,] -0.08 -0.10
## [5,] 0.27 0.21
## [6,] -0.12 -0.23
## [7,] 0.24 0.32
## [8,] -0.11 -0.11
## [9,] -0.27 -0.30
## [10,] -0.20 -0.19
## [11,] 0.00 0.05
## [12,] 0.76 0.64
## [13,] -0.02 -0.15
## [14,] -0.21 -0.22
## [15,] -0.28 -0.06
## [16,] -0.12 -0.16
## [17,] 0.23 -0.09
## [18,] -0.14 -0.09
## [19,] 0.23 0.08
## [20,] -0.12 -0.04
## [21,] -0.24 0.00
## [22,] -0.18 0.02
## [23,] -0.01 -0.09
## [24,] 0.72 0.21
## [25,] -0.01 -0.01
## [26,] -0.19 -0.09
## [27,] -0.26 -0.01
## [28,] -0.10 -0.02
## [29,] 0.23 -0.06
## [30,] -0.13 -0.07
## [31,] 0.22 -0.05
## [32,] -0.12 -0.06
## [33,] -0.22 -0.01
## [34,] -0.16 0.04
## [35,] -0.03 -0.08
## [36,] 0.69 0.13
## [37,] -0.02 -0.04
## [38,] -0.15 0.03
## [39,] -0.24 0.02
## [40,] -0.09 -0.06
## [41,] 0.19 -0.16
## [42,] -0.12 0.02
## [43,] 0.19 -0.08
## [44,] -0.11 -0.03
## [45,] -0.23 -0.07
## [46,] -0.15 0.01
## [47,] -0.01 0.05
## [48,] 0.66 0.10
Remove the seasonal component with a 12-month lag difference. Now examine the P/ACF plots to fit an SARIMA model.
unemp.dd = diff(unemp.d, lag = 12)
acf2(unemp.dd)
## ACF PACF
## [1,] 0.21 0.21
## [2,] 0.33 0.29
## [3,] 0.15 0.05
## [4,] 0.17 0.05
## [5,] 0.10 0.01
## [6,] 0.06 -0.02
## [7,] -0.06 -0.12
## [8,] -0.02 -0.03
## [9,] -0.09 -0.05
## [10,] -0.17 -0.15
## [11,] -0.08 0.02
## [12,] -0.48 -0.43
## [13,] -0.18 -0.02
## [14,] -0.16 0.15
## [15,] -0.11 0.03
## [16,] -0.15 -0.04
## [17,] -0.09 -0.01
## [18,] -0.09 0.00
## [19,] 0.03 0.01
## [20,] -0.01 0.01
## [21,] 0.02 -0.01
## [22,] -0.02 -0.16
## [23,] 0.01 0.01
## [24,] -0.02 -0.27
## [25,] 0.09 0.05
## [26,] -0.05 -0.01
## [27,] -0.01 -0.05
## [28,] 0.03 0.05
## [29,] 0.08 0.09
## [30,] 0.01 -0.04
## [31,] 0.03 0.02
## [32,] -0.05 -0.07
## [33,] 0.01 -0.01
## [34,] 0.02 -0.08
## [35,] -0.06 -0.08
## [36,] -0.02 -0.23
## [37,] -0.12 -0.08
## [38,] 0.01 0.06
## [39,] -0.03 -0.07
## [40,] -0.03 -0.01
## [41,] -0.10 0.03
## [42,] -0.02 -0.03
## [43,] -0.13 -0.11
## [44,] 0.00 -0.04
## [45,] -0.06 0.01
## [46,] 0.01 0.00
## [47,] 0.02 -0.03
## [48,] 0.11 -0.04
The seasonal component cuts off at lag 1S in the ACF plot tails off 2S, 3S, etc., so this is an MA(1) and the seasonal parameters will be (0,1,1). Looking at the first 12 months, the ACF tails of and the PACF cuts off at lag 2, so this is an AR(2) and the nonseasonal parameters will be (2,1,0). The SARIMA model is \(SARIMA(2,1,0)\times(0,1,1)_{12}\).
Fit the \(SARIMA(2,1,0)\times(0,1,1)_{12}\) model. The ACF of the residuals is are within the band, the normal Q-Q plot is linear, and the p-values for the Ljung-Box statistic are above the dashed line. This is a good fit.
sarima(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)
## initial value 3.340809
## iter 2 value 3.105512
## iter 3 value 3.086631
## iter 4 value 3.079778
## iter 5 value 3.069447
## iter 6 value 3.067659
## iter 7 value 3.067426
## iter 8 value 3.067418
## iter 8 value 3.067418
## final value 3.067418
## converged
## initial value 3.065481
## iter 2 value 3.065478
## iter 3 value 3.065477
## iter 3 value 3.065477
## iter 3 value 3.065477
## final value 3.065477
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sma1
## 0.1351 0.2464 -0.6953
## s.e. 0.0513 0.0515 0.0381
##
## sigma^2 estimated as 449.6: log likelihood = -1609.91, aic = 3227.81
##
## $degrees_of_freedom
## [1] 356
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.1351 0.0513 2.6326 0.0088
## ar2 0.2464 0.0515 4.7795 0.0000
## sma1 -0.6953 0.0381 -18.2362 0.0000
##
## $AIC
## [1] 8.723811
##
## $AICc
## [1] 8.723988
##
## $BIC
## [1] 8.765793
Now forecast the series for three years.
sarima.for(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12, n.ahead = 36)
## $pred
## Jan Feb Mar Apr May Jun Jul
## 1979 676.4664 685.1172 653.2388 585.6939 553.8813 664.4072 647.0657
## 1980 683.3045 687.7649 654.8658 586.1507 553.9285 664.1108 646.6220
## 1981 682.6406 687.0977 654.1968 585.4806 553.2579 663.4398 645.9508
## Aug Sep Oct Nov Dec
## 1979 611.0828 594.6414 569.3997 587.5801 581.1833
## 1980 610.5345 594.0427 568.7684 586.9320 580.5249
## 1981 609.8632 593.3713 568.0970 586.2606 579.8535
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 1979 21.20465 32.07710 43.70167 53.66329 62.85364 71.12881 78.73590
## 1980 116.99599 124.17344 131.51281 138.60466 145.49706 152.12863 158.52302
## 1981 194.25167 201.10648 208.17066 215.11503 221.96039 228.64285 235.16874
## Aug Sep Oct Nov Dec
## 1979 85.75096 92.28663 98.41329 104.19488 109.67935
## 1980 164.68623 170.63839 176.39520 181.97333 187.38718
## 1981 241.53258 247.74268 253.80549 259.72970 265.52323
Try again with the chicken dataset from the astsa package.
plot(chicken,
main = "Monthly Whole bird spot price, 8/2001 - 7/2016",
ylab = "US cents per pound")
First remove the trend component.
chicken.d <- diff(chicken)
plot(chicken.d,
main = "Monthly Whole bird spot price, 8/2001 - 7/2016",
ylab = "diff(US cents per pound)")
Examine the P/ACF plots. The nonseasonal component tails off in the ACF and cuts off at lag 2 in the PACF, an AR(2). The seasonal component tails off in the ACF and cuts off at lag 1, an AR(1). The SARIMA model is \(SARIMA(2,1,0)\times(1,0,0)_{12}\).
acf2(chicken.d)
## ACF PACF
## [1,] 0.72 0.72
## [2,] 0.39 -0.29
## [3,] 0.09 -0.14
## [4,] -0.07 0.03
## [5,] -0.16 -0.10
## [6,] -0.20 -0.06
## [7,] -0.27 -0.19
## [8,] -0.23 0.12
## [9,] -0.11 0.10
## [10,] 0.09 0.16
## [11,] 0.26 0.09
## [12,] 0.33 0.00
## [13,] 0.20 -0.22
## [14,] 0.07 0.03
## [15,] -0.03 0.03
## [16,] -0.10 -0.11
## [17,] -0.19 -0.09
## [18,] -0.25 0.01
## [19,] -0.29 -0.03
## [20,] -0.20 0.07
## [21,] -0.08 -0.04
## [22,] 0.08 0.06
## [23,] 0.16 -0.05
## [24,] 0.18 0.02
## [25,] 0.08 -0.14
## [26,] -0.06 -0.19
## [27,] -0.21 -0.13
## [28,] -0.31 -0.06
## [29,] -0.40 -0.08
## [30,] -0.40 -0.05
## [31,] -0.33 0.01
## [32,] -0.18 0.03
## [33,] 0.02 0.10
## [34,] 0.20 0.02
## [35,] 0.30 -0.01
## [36,] 0.35 0.09
## [37,] 0.26 -0.12
## [38,] 0.13 0.01
## [39,] -0.02 -0.01
## [40,] -0.14 -0.05
## [41,] -0.23 0.02
## [42,] -0.21 0.12
## [43,] -0.18 -0.05
## [44,] -0.11 -0.13
## [45,] -0.03 -0.07
## [46,] 0.08 0.01
## [47,] 0.21 0.14
## [48,] 0.33 0.05
Fit the \(SARIMA(2,1,0)\times(1,0,0)_{12}\) model.
sarima(chicken, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)
## initial value 0.015039
## iter 2 value -0.226398
## iter 3 value -0.412955
## iter 4 value -0.460882
## iter 5 value -0.470787
## iter 6 value -0.471082
## iter 7 value -0.471088
## iter 8 value -0.471090
## iter 9 value -0.471092
## iter 10 value -0.471095
## iter 11 value -0.471095
## iter 12 value -0.471096
## iter 13 value -0.471096
## iter 14 value -0.471096
## iter 15 value -0.471097
## iter 16 value -0.471097
## iter 16 value -0.471097
## iter 16 value -0.471097
## final value -0.471097
## converged
## initial value -0.473585
## iter 2 value -0.473664
## iter 3 value -0.473721
## iter 4 value -0.473823
## iter 5 value -0.473871
## iter 6 value -0.473885
## iter 7 value -0.473886
## iter 8 value -0.473886
## iter 8 value -0.473886
## iter 8 value -0.473886
## final value -0.473886
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sar1 constant
## 0.9154 -0.2494 0.3237 0.2353
## s.e. 0.0733 0.0739 0.0715 0.1973
##
## sigma^2 estimated as 0.3828: log likelihood = -169.16, aic = 348.33
##
## $degrees_of_freedom
## [1] 175
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.9154 0.0733 12.4955 0.0000
## ar2 -0.2494 0.0739 -3.3728 0.0009
## sar1 0.3237 0.0715 4.5238 0.0000
## constant 0.2353 0.1973 1.1923 0.2347
##
## $AIC
## [1] 1.945971
##
## $AICc
## [1] 1.947256
##
## $BIC
## [1] 2.035005
Now forecast 5 years.
sarima.for(chicken, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12, n.ahead = 60)
## $pred
## Jan Feb Mar Apr May Jun Jul
## 2016
## 2017 110.5358 110.5612 110.5480 110.7055 111.0047 111.1189 111.1552
## 2018 111.8108 111.9782 112.1330 112.3431 112.5991 112.7952 112.9661
## 2019 114.1331 114.3464 114.5556 114.7827 115.0247 115.2473 115.4617
## 2020 116.7942 117.0224 117.2492 117.4819 117.7193 117.9505 118.1790
## 2021 119.5651 119.7980 120.0306 120.2650 120.5010 120.7350 120.9681
## Aug Sep Oct Nov Dec
## 2016 111.0907 110.8740 110.6853 110.5045 110.5527
## 2017 111.1948 111.2838 111.3819 111.4825 111.6572
## 2018 113.1380 113.3260 113.5168 113.7085 113.9242
## 2019 115.6765 115.8965 116.1174 116.3386 116.5675
## 2020 118.4077 118.6380 118.8686 119.0993 119.3326
## 2021
##
## $se
## Jan Feb Mar Apr May Jun
## 2016
## 2017 3.7414959 4.1793190 4.5747009 4.9373266 5.2742129 5.5903499
## 2018 8.2010253 8.5605811 8.9054714 9.2372195 9.5572539 9.8667955
## 2019 12.0038164 12.2921541 12.5738417 12.8492868 13.1188976 13.3830477
## 2020 15.1557253 15.3959082 15.6323906 15.8653300 16.0948844 16.3212022
## 2021 17.8397890 18.0473081 18.2524651 18.4553364 18.6559977 18.8545213
## Jul Aug Sep Oct Nov Dec
## 2016 0.6187194 1.3368594 2.0462419 2.6867986 3.2486625
## 2017 5.8893133 6.2367345 6.6253573 7.0309771 7.4344077 7.8255932
## 2018 10.1668604 10.4736807 10.7857727 11.0980056 11.4063211 11.7085266
## 2019 13.6420693 13.9002670 14.1573839 14.4122197 14.6638269 14.9117124
## 2020 16.5444204 16.7657634 16.9852163 17.2025022 17.4174076 17.6298379
## 2021 19.0509752
Here is one more. The birth dataset from the astsa package contins monthly live births in thousands for the United States, 1948-1979.
plot(birth,
main = "Monthly Live Births in U.S., 1948 - 1979")
Remove the trending by diffing.
birth.d <- diff(birth)
plot(birth.d)
Examine the P/ACF. Notice the seasonal persistance (the spikes in the lags of 1, 2, 3, and 4).
acf2(birth.d)
## ACF PACF
## [1,] -0.32 -0.32
## [2,] 0.16 0.06
## [3,] -0.08 -0.01
## [4,] -0.19 -0.25
## [5,] 0.09 -0.03
## [6,] -0.28 -0.26
## [7,] 0.06 -0.17
## [8,] -0.19 -0.29
## [9,] -0.05 -0.35
## [10,] 0.17 -0.16
## [11,] -0.26 -0.59
## [12,] 0.82 0.57
## [13,] -0.28 0.13
## [14,] 0.17 0.11
## [15,] -0.07 0.13
## [16,] -0.18 0.09
## [17,] 0.08 0.00
## [18,] -0.28 0.00
## [19,] 0.07 0.05
## [20,] -0.18 0.04
## [21,] -0.05 -0.07
## [22,] 0.16 -0.10
## [23,] -0.24 -0.20
## [24,] 0.78 0.19
## [25,] -0.27 0.01
## [26,] 0.19 0.05
## [27,] -0.08 0.07
## [28,] -0.17 0.07
## [29,] 0.07 -0.02
## [30,] -0.29 -0.06
## [31,] 0.07 -0.02
## [32,] -0.15 0.09
## [33,] -0.04 0.03
## [34,] 0.14 -0.06
## [35,] -0.24 -0.16
## [36,] 0.75 0.03
## [37,] -0.23 0.08
## [38,] 0.16 -0.10
## [39,] -0.08 -0.03
## [40,] -0.15 0.07
## [41,] 0.05 -0.04
## [42,] -0.25 0.06
## [43,] 0.06 0.04
## [44,] -0.18 -0.07
## [45,] -0.03 -0.06
## [46,] 0.15 0.02
## [47,] -0.22 -0.04
## [48,] 0.72 0.10
Remove the seasonal persistance with a 12 month lag. The nonseasonal ACF cuts off at 1 and the PACF tails off, so MA(1). The nonseasonal ACF cuts off at 1 and the PACF cuts off at 1, so MA(1). The model is \(SARIMA(0,1,1)\times(0,1,1)_{12}\).
birth.dd <- diff(birth.d, lag = 12)
acf2(birth.dd, max.lag = 60)
## ACF PACF
## [1,] -0.30 -0.30
## [2,] -0.09 -0.20
## [3,] -0.09 -0.21
## [4,] 0.00 -0.14
## [5,] 0.07 -0.03
## [6,] 0.03 0.02
## [7,] -0.07 -0.06
## [8,] -0.04 -0.08
## [9,] 0.11 0.06
## [10,] 0.04 0.08
## [11,] 0.13 0.23
## [12,] -0.43 -0.32
## [13,] 0.14 -0.06
## [14,] -0.01 -0.13
## [15,] 0.03 -0.13
## [16,] 0.01 -0.11
## [17,] 0.02 0.02
## [18,] 0.00 0.06
## [19,] 0.03 0.04
## [20,] -0.07 -0.10
## [21,] -0.01 0.02
## [22,] 0.00 0.00
## [23,] 0.06 0.17
## [24,] -0.01 -0.13
## [25,] -0.12 -0.14
## [26,] 0.17 0.07
## [27,] -0.04 -0.04
## [28,] 0.03 -0.02
## [29,] -0.05 0.02
## [30,] -0.09 -0.06
## [31,] -0.01 -0.07
## [32,] 0.19 0.05
## [33,] -0.03 0.07
## [34,] -0.09 -0.06
## [35,] -0.02 0.05
## [36,] -0.04 -0.16
## [37,] 0.17 -0.01
## [38,] -0.14 -0.04
## [39,] 0.03 -0.01
## [40,] -0.05 -0.03
## [41,] 0.03 -0.01
## [42,] 0.10 0.01
## [43,] 0.00 0.00
## [44,] -0.10 0.03
## [45,] -0.03 -0.02
## [46,] 0.06 -0.07
## [47,] 0.02 0.05
## [48,] 0.01 -0.11
## [49,] -0.01 0.05
## [50,] 0.06 0.06
## [51,] -0.08 -0.03
## [52,] 0.03 -0.03
## [53,] 0.01 0.04
## [54,] -0.02 0.02
## [55,] -0.01 -0.04
## [56,] 0.00 -0.01
## [57,] -0.07 -0.13
## [58,] 0.17 0.07
## [59,] -0.04 0.07
## [60,] -0.01 -0.05
Fit an \(SARIMA(0,1,1)\times(0,1,1)_{12}\). The p-values for the Ljung-Box statistic are all below the band, i.e., the residuals are not white noise. There is additional correlation.
sarima(birth, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)
## initial value 2.219164
## iter 2 value 2.013310
## iter 3 value 1.988107
## iter 4 value 1.980026
## iter 5 value 1.967594
## iter 6 value 1.965384
## iter 7 value 1.965049
## iter 8 value 1.964993
## iter 9 value 1.964992
## iter 9 value 1.964992
## iter 9 value 1.964992
## final value 1.964992
## converged
## initial value 1.951264
## iter 2 value 1.945867
## iter 3 value 1.945729
## iter 4 value 1.945723
## iter 5 value 1.945723
## iter 5 value 1.945723
## iter 5 value 1.945723
## final value 1.945723
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 sma1
## -0.4734 -0.7861
## s.e. 0.0598 0.0451
##
## sigma^2 estimated as 47.4: log likelihood = -1211.28, aic = 2428.56
##
## $degrees_of_freedom
## [1] 358
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.4734 0.0598 -7.9097 0
## sma1 -0.7861 0.0451 -17.4227 0
##
## $AIC
## [1] 6.545975
##
## $AICc
## [1] 6.546062
##
## $BIC
## [1] 6.577399
Try adding an additional AR() parameter (nonseasonal) to the model to account for the additional correlation. Much better!
sarima(birth, p = 1, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)
## initial value 2.218186
## iter 2 value 2.032584
## iter 3 value 1.982464
## iter 4 value 1.975643
## iter 5 value 1.971721
## iter 6 value 1.967284
## iter 7 value 1.963840
## iter 8 value 1.961106
## iter 9 value 1.960849
## iter 10 value 1.960692
## iter 11 value 1.960683
## iter 12 value 1.960675
## iter 13 value 1.960672
## iter 13 value 1.960672
## iter 13 value 1.960672
## final value 1.960672
## converged
## initial value 1.940459
## iter 2 value 1.934425
## iter 3 value 1.932752
## iter 4 value 1.931750
## iter 5 value 1.931074
## iter 6 value 1.930882
## iter 7 value 1.930860
## iter 8 value 1.930859
## iter 8 value 1.930859
## final value 1.930859
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 sma1
## 0.3038 -0.7006 -0.8000
## s.e. 0.0865 0.0604 0.0441
##
## sigma^2 estimated as 45.91: log likelihood = -1205.93, aic = 2419.85
##
## $degrees_of_freedom
## [1] 357
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3038 0.0865 3.5104 5e-04
## ma1 -0.7006 0.0604 -11.5984 0e+00
## sma1 -0.8000 0.0441 -18.1302 0e+00
##
## $AIC
## [1] 6.522519
##
## $AICc
## [1] 6.522695
##
## $BIC
## [1] 6.564418