Time series regression models
Regression: \(Y_i = \beta X_i + \epsilon_i\), where \(\epsilon_i\) is white noise - assumptions about errors: - independent - normally distributed - homoscedastic - i.e., white noise - white noise: - independent normals with common variance - is basic building block of time series
AutoRegression: \(X_t = \phi X_{t-1} + \epsilon_t\) (\(\epsilon_t\) is white noise) - assuming that \(t\) and \(t-1\) are not correlated - may lead to bad forecasts
Moving Average: \(\epsilon_t = W_t + \theta W_{t-1}\) (\(W_t\) is white noise)
- moving average assumes \(t\) and \(t-1\) are correlated
ARMA: \(X_t = \phi X_{t-1} + W_t + \theta W_{t-1}\) - AutoRegression + Moving Average - i.e., AutoRegression with autocorrelated errors
Data play
In the video, you saw various types of data. In this exercise, you will plot additional time series data and compare them to what you saw in the video. It is useful to think about how these time series compare to the series in the video. In particular, concentrate on the type of trend, seasonality or periodicity, and homoscedasticity.
Before you use a data set for the first time, you should use the help system to see the details of the data. For example, use help(AirPassengers)
or ?AirPassengers
to see the details of the series.
library(pacman)
p_load(astsa, xts)
# View a detailed description of AirPassengers
help(AirPassengers)
## starting httpd help server ... done
# Plot AirPassengers
plot(AirPassengers)
# Plot the DJIA daily closings
plot(astsa::djia$Close)
# Plot the Southern Oscillation Index (soi)
plot(astsa::soi)
As you can see, the AirPassengers
dataset contains monthly information on airline passengers from 1949 through 1960. Note that when you plot ts objects using plot()
, the data will automatically be plotted over time.
The AirPassengers
data show a handful of important qualities, including seasonality, trend, and heteroscedasticity, which distinguish the data from standard white noise.
Stationarity and nonstationarity
A time series is stationary when it is “stable”, meaning: - the mean is constant over time (no trend) - the correlation structure remains constant over time
Stationarity
Given data, \(x_1,...,x_n\) we can estimate by averaging
For example, if the mean is constant, we can estimate it by the sample average \(\bar{x}\)
Pairs can be used to estimate correlation on different lags: - \((x_1,x_2), (x_2,x_3), (x_3,x_4)\), for lag 1 - \((x_1,x_3), (x_2,x_4), (x_3,x_5)\), for lag 2
Random Walk trend
Not stationary, but differenced data are stationary - e.g., \(X_t\) for global temperatures trends upwards - \(X_t-X_{t-1}\) is stationary with random movement
Trend Stationarity
Stationary around a trend, differencing still works like Random Walk!
Nonstationarity in trend and variability - First log (stabilize variance and/or linearize trend), then difference - \(X_t\) - \(log(X_t)\) - \(log(X_t) - log(X_{t-1})\)
Differencing
As seen in the video, when a time series is trend stationary, it will have stationary behavior around a trend. A simple example is \(Y_t=\alpha+\beta t+Xt\) where \(X_t\) is stationary.
A different type of model for trend is random walk, which has the form \(X_t=X_{t−1}+W_t\), where \(W_t\) is white noise. It is called a random walk because at time \(t\) the process is where it was at time \(t−1\) plus a completely random movement. For a random walk with drift, a constant is added to the model and will cause the random walk to drift in the direction (positive or negative) of the drift.
We simulated and plotted data from these models. Note the difference in the behavior of the two models.
In both cases, simple differencing can remove the trend and coerce the data to stationarity. Differencing looks at the difference between the value of a time series at a certain point in time and its preceding value. That is, \(X_t−X_{t−1}\) is computed.
To check that it works, you will difference each generated time series and plot the detrended series. If a time series is in x
, then diff(x)
will have the detrended series obtained by differencing the data. To plot the detrended series, simply use plot(diff(x))
.
# Plot detrended y (trend stationary)
plot(diff(y))
# Plot detrended x (random walk)
plot(diff(x))
As you can see, differencing both your trend stationary data and your random walk data has the effect of removing the trends, despite the important differences between the two datasets.
Detrending data
As you have seen in the previous exercise, differencing is generally good for removing trend from time series data. Recall that differencing looks at the difference between the value of a time series at a certain point in time and its preceding value.
In this exercise, you will use differencing diff() to detrend and plot real time series data.
# Plot globtemp and detrended globtemp
par(mfrow = c(2,1))
plot(astsa::globtemp)
plot(diff(astsa::globtemp))
# Plot cmort and detrended cmort
par(mfrow = c(2,1))
plot(astsa::cmort)
plot(diff(astsa::cmort))
Differencing is a great way to remove trends from your data.
Dealing with trend and heteroscedasticity Here, we will coerce nonstationary data to stationarity by calculating the return or growth rate as follows.
Often time series are generated as
\(X_t=(1+p_t)X_{t−1}\)
meaning that the value of the time series observed at time \(t\) equals the value observed at time \(t−1\) and a small percent change \(p_t\) at time \(t\).
A simple deterministic example is putting money into a bank with a fixed interest \(p\). In this case, \(X_t\) is the value of the account at time period \(t\) with an initial deposit of \(X_0\).
Typically, \(p_t\) is referred to as the return or growth rate of a time series, and this process is often stable.
For reasons that are outside the scope of this course, it can be shown that the growth rate \(p_t\) can be approximated by
\(Y_t = logXt−logX_{t−1} ≈ p_t\)
In R, \(p_t\) is often calculated as diff(log(x))
and plotting it can be done in one line plot(diff(log(x)))
.
# astsa and xts are preloaded
# Plot GNP series (gnp) and its growth rate
par(mfrow = c(2,1))
plot(astsa::gnp)
plot(diff(log(astsa::gnp)))
# Plot the DJIA closings (djia$Close) and its returns
par(mfrow = c(2,1))
plot(astsa::djia$Close)
plot(diff(log((astsa::djia$Close))))
Once again, by combining a few commands to manipulate your data, you can coerce otherwise nonstationary data to stationarity.
Stationary time series: ARMA
Wold Decomposition
Wold proved that any stationary time series may be represented as a linear combination of white noise:
\(X_t = W_t + a_1W_{t-1} + a_2W_{t-2} + ...\)
for constants \(a_1, a_2, ...\)
Any ARMA model has this form, which means they are suited to modeling time series
Note: Special case of MA(q) is already in this form, where constants are 0 after q-th term.
**Generating ARMA using arima.sim() - Basic syntax
arima.sim(model, n, ...)
model
is a list with order of the model as c(p, d, q)
and the coefficients
Simulating ARMA models
As we saw in the video, any stationary time series can be written as a linear combination of white noise. In addition, any ARMA model has this form, so it is a good choice for modeling stationary time series.
R provides a simple function called arima.sim()
to generate data from an ARMA model. For example, the syntax for generating 100 observations from an MA(1) with parameter .9 is arima.sim(model = list(order = c(0, 0, 1), ma = .9 ), n = 100)
. You can also use order = c(0, 0, 0)
to generate white noise.
In this exercise, you will generate data from various ARMA models. For each command, generate 200 observations and plot the result.
# Generate and plot white noise
WN <- arima.sim(model = list(order = c(0, 0, 0)), n = 200)
plot(WN)
# Generate and plot an MA(1) with parameter .9 by filtering the noise
MA <- arima.sim(model = list(order = c(0, 0, 1), ma = .9), n = 200)
plot(MA)
# Generate and plot an AR(1) with parameters 1.5 and -.75
AR <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200)
plot(AR)
The arima.sim()
command is a very useful way to quickly simulate time series data.
AR and MA models
Identifying model types
Estimation
Fitting an AR(1) model
Recall that you use the ACF and PACF pair to help identify the orders \(p\) and \(q\) of an ARMA model.
In this exercise, you will generate data from the AR(1) model,
\(X_t=.9X_{t−1}+W_t\),
look at the simulated data and the sample ACF and PACF pair to determine the order. Then, you will fit the model and compare the estimated parameters to the true parameters.
Throughout this course, you will be using sarima()
from the astsa package to easily fit models to data. The command produces a residual diagnostic graphic that can be ignored until diagnostics is discussed later in the chapter.
# Generate 100 observations from the AR(1) model
x <- arima.sim(model = list(order = c(1, 0, 0), ar = .9), n = 100)
# Plot the generated data
plot(x)
# Plot the sample P/ACF pair
astsa::acf2(x)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.88 0.80 0.72 0.64 0.56 0.50 0.43 0.37 0.31 0.28 0.2 0.13 0.05
## PACF 0.88 0.12 -0.03 -0.02 -0.08 0.04 -0.02 -0.05 -0.01 0.06 -0.2 -0.04 -0.12
## [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF -0.03 -0.07 -0.12 -0.14 -0.18 -0.21 -0.21
## PACF -0.09 0.12 -0.06 0.06 -0.09 -0.04 0.13
# Fit an AR(1) to the data and examine the t-table
astsa::sarima(x, p = 1, d = 0, q = 0)
## initial value 0.846956
## iter 2 value 0.041752
## iter 3 value 0.040535
## iter 4 value 0.040504
## iter 5 value 0.040462
## iter 6 value 0.040408
## iter 7 value 0.040389
## iter 8 value 0.040386
## iter 9 value 0.040368
## iter 10 value 0.040368
## iter 10 value 0.040368
## iter 10 value 0.040368
## final value 0.040368
## converged
## initial value 0.062199
## iter 2 value 0.061300
## iter 3 value 0.059627
## iter 4 value 0.059327
## iter 5 value 0.059084
## iter 6 value 0.058845
## iter 7 value 0.058767
## iter 8 value 0.058584
## iter 9 value 0.058577
## iter 10 value 0.058575
## iter 11 value 0.058564
## iter 12 value 0.058564
## iter 13 value 0.058564
## iter 13 value 0.058564
## iter 13 value 0.058564
## final value 0.058564
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 xmean
## 0.903 0.4713
## s.e. 0.043 1.0156
##
## sigma^2 estimated as 1.105: log likelihood = -147.75, aic = 301.5
##
## $degrees_of_freedom
## [1] 98
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.9030 0.0430 21.0132 0.0000
## xmean 0.4713 1.0156 0.4641 0.6436
##
## $AIC
## [1] 3.015005
##
## $AICc
## [1] 3.016242
##
## $BIC
## [1] 3.09316
As you can see, the sarima()
command provides extensive output to understand the results of your model fit. What do you glean from this output?
Fitting an AR(2) model
For this exercise, we generated data from the AR(2) model,
\(X_t=1.5X_{t−1}−.75X_{t−2}+W_t\)
Look at the simulated dataand the sample ACF and PACF pair to determine the model order. Then fit the model and compare the estimated parameters to the true parameters.
x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200)
# Plot x
plot(x)
# Plot the sample P/ACF of x
astsa::acf2(x)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.85 0.53 0.16 -0.15 -0.34 -0.37 -0.28 -0.11 0.08 0.25 0.36 0.39
## PACF 0.85 -0.73 -0.02 0.02 0.05 0.08 -0.02 0.09 0.11 0.11 0.01 0.01
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.33 0.22 0.08 -0.07 -0.19 -0.27 -0.29 -0.23 -0.12 0.02 0.16 0.26
## PACF 0.04 0.01 -0.03 -0.10 -0.03 -0.10 0.01 0.06 -0.03 0.02 0.10 -0.02
## [,25]
## ACF 0.29
## PACF 0.00
# Fit an AR(2) to the data and examine the t-table
astsa::sarima(x, p = 2, d = 0, q = 0)
## initial value 1.083715
## iter 2 value 0.960271
## iter 3 value 0.533310
## iter 4 value 0.301466
## iter 5 value 0.102656
## iter 6 value 0.038199
## iter 7 value 0.011531
## iter 8 value 0.003597
## iter 9 value 0.003589
## iter 10 value 0.003574
## iter 11 value 0.003572
## iter 12 value 0.003570
## iter 13 value 0.003570
## iter 14 value 0.003570
## iter 15 value 0.003570
## iter 15 value 0.003570
## final value 0.003570
## converged
## initial value 0.010372
## iter 2 value 0.010350
## iter 3 value 0.010329
## iter 4 value 0.010325
## iter 5 value 0.010325
## iter 6 value 0.010325
## iter 6 value 0.010325
## iter 6 value 0.010325
## final value 0.010325
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 xmean
## 1.5035 -0.7536 -0.1930
## s.e. 0.0463 0.0465 0.2836
##
## sigma^2 estimated as 1.006: log likelihood = -285.85, aic = 579.71
##
## $degrees_of_freedom
## [1] 197
##
## $ttable
## Estimate SE t.value p.value
## ar1 1.5035 0.0463 32.4714 0.0000
## ar2 -0.7536 0.0465 -16.2154 0.0000
## xmean -0.1930 0.2836 -0.6806 0.4969
##
## $AIC
## [1] 2.898527
##
## $AICc
## [1] 2.89914
##
## $BIC
## [1] 2.964494
As you can see from the t-table output, the estimates produced by the sarima()
command are close to the true parameters.
Fitting an MA(1) model
In this exercise, we generated data from an MA(1) model,
\(X_t=W_t−.8W_{t−1}\)
Look at the simulated data and the sample ACF and PACF to determine the order based on the table given in the first exercise. Then fit the model.
Recall that for pure MA(q) models, the theoretical ACF will cut off at lag q while the PACF will tail off.
x <- arima.sim(model = list(order = c(0, 0, 1), ma = -.8), n = 100)
# Plot x
plot(x)
# Plot the sample P/ACF of x
acf2(x)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.54 0.06 0.02 -0.07 0.00 0.14 -0.10 -0.05 0.11 0.04 -0.16 0.05
## PACF -0.54 -0.33 -0.18 -0.21 -0.24 -0.01 -0.01 -0.14 -0.03 0.19 0.01 -0.13
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF 0.04 -0.07 0.20 -0.20 0.02 0.07 -0.01 -0.03
## PACF -0.01 -0.04 0.15 -0.02 -0.05 0.03 0.03 -0.02
# Fit an MA(1) to the data and examine the t-table
sarima(x, p = 0, d = 0, q = 1)
## initial value 0.206958
## iter 2 value -0.023059
## iter 3 value -0.067090
## iter 4 value -0.070087
## iter 5 value -0.077113
## iter 6 value -0.077915
## iter 7 value -0.078558
## iter 8 value -0.079020
## iter 9 value -0.079039
## iter 10 value -0.079043
## iter 10 value -0.079043
## iter 10 value -0.079043
## final value -0.079043
## converged
## initial value -0.074439
## iter 2 value -0.074466
## iter 3 value -0.074533
## iter 4 value -0.074533
## iter 4 value -0.074533
## iter 4 value -0.074533
## final value -0.074533
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 xmean
## -0.7928 0.0131
## s.e. 0.0555 0.0199
##
## sigma^2 estimated as 0.853: log likelihood = -134.44, aic = 274.88
##
## $degrees_of_freedom
## [1] 98
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.7928 0.0555 -14.2939 0.0000
## xmean 0.0131 0.0199 0.6585 0.5118
##
## $AIC
## [1] 2.74881
##
## $AICc
## [1] 2.750047
##
## $BIC
## [1] 2.826965
Once again, the parameter estimates produced by sarima()
come quite close to the input specified when the data was created (-.8)
AR and MA together: ARMA
\(X_t = \phi X_{t-1} + W_t + \theta W_{t-1}\) autoregression with correlated errors
Fitting an ARMA model
You are now ready to merge the AR model and the MA model into the ARMA model. We generated data from the ARMA(2,1) model,
\(X_t= X_{t−1} − .9X_{t−2} + W_t + .8W_{t−1}\),
x <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = 250).
Look at the simulated data and the sample ACF and PACF pair to determine a possible model.
Recall that for \(ARMA(p,q)\) models, both the theoretical ACF and PACF tail off. In this case, the orders are difficult to discern from data and it may not be clear if either the sample ACF or sample PACF is cutting off or tailing off. In this case, you know the actual model orders, so fit an ARMA(2,1) to the generated data. General modeling strategies will be discussed further in the course.
x <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = 250)
# astsa is preloaded
# Plot x
plot(x)
# Plot the sample P/ACF of x
astsa::acf2(x)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.56 -0.31 -0.76 -0.47 0.18 0.53 0.32 -0.16 -0.41 -0.21 0.19 0.36
## PACF 0.56 -0.90 0.48 -0.36 0.15 -0.20 -0.02 -0.01 0.06 0.05 -0.02 -0.04
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.15 -0.21 -0.34 -0.13 0.19 0.29 0.09 -0.18 -0.23 -0.06 0.14 0.16
## PACF 0.00 -0.01 0.02 -0.01 -0.02 -0.06 0.06 -0.01 -0.03 -0.03 -0.05 0.05
## [,25] [,26]
## ACF 0.01 -0.14
## PACF -0.10 0.03
# Fit an ARMA(2,1) to the data and examine the t-table
astsa::sarima(x, p = 2, d = 0, q = 1)
## initial value 1.267650
## iter 2 value 0.509271
## iter 3 value 0.231477
## iter 4 value 0.225223
## iter 5 value 0.224361
## iter 6 value 0.221953
## iter 7 value 0.212395
## iter 8 value 0.193387
## iter 9 value 0.137473
## iter 10 value 0.087587
## iter 11 value 0.043601
## iter 12 value 0.021179
## iter 13 value 0.018108
## iter 14 value 0.016541
## iter 15 value 0.013846
## iter 16 value 0.013828
## iter 17 value 0.013805
## iter 18 value 0.013805
## iter 19 value 0.013805
## iter 19 value 0.013805
## final value 0.013805
## converged
## initial value 0.021221
## iter 2 value 0.021158
## iter 3 value 0.021125
## iter 4 value 0.021105
## iter 5 value 0.021088
## iter 6 value 0.021087
## iter 7 value 0.021087
## iter 7 value 0.021087
## iter 7 value 0.021087
## final value 0.021087
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ma1 xmean
## 0.972 -0.8534 0.7595 0.0502
## s.e. 0.033 0.0324 0.0378 0.1278
##
## sigma^2 estimated as 1.021: log likelihood = -360.01, aic = 730.01
##
## $degrees_of_freedom
## [1] 246
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.9720 0.0330 29.4939 0.0000
## ar2 -0.8534 0.0324 -26.3164 0.0000
## ma1 0.7595 0.0378 20.0785 0.0000
## xmean 0.0502 0.1278 0.3929 0.6947
##
## $AIC
## [1] 2.920052
##
## $AICc
## [1] 2.920705
##
## $BIC
## [1] 2.990481
As you can see, the sarima()
command can estimate parameter values for many different types of models, including AR, MA, and ARMA.
Identify an ARMA model
Look at (1) the data plots and (2) the sample ACF and PACF of the logged and differenced varve series:
dl_varve <- diff(log(astsa::varve))
Use help(varve) to read about the data or use an internet search on varve to learn more.
From the ACF and PACF, which model is the most likely model for dl_varve
?
Remember that an MA(q) model has an ACF that cuts off at lag q and a PACF that tails off. In this case, the ACF cuts off at lag 1 and the PACF tails off, suggesting an MA(1) model.
Model choice and residual analysis
AIC and BIC
Error | Number of Parameters | |
---|---|---|
\(\textit{average(observed-predicted)}^\textit{2}\) | + | \(\textit{k(p+q)}\) |
Residual Analysis
sarima()
includes residual analysis and graphic showing:
Model choice - I Based on the sample P/ACF pair of the logged and differenced varve data (dl_varve
), an MA(1) was indicated. The best approach to fitting ARMA is to start with a low order model, and then try to add a parameter at a time to see if the results change.
In this exercise, you will fit various models to the dl_varve
data and note the AIC and BIC for each model. In the next exercise, you will use these AICs and BICs to choose a model. Remember that you want to retain the model with the smallest AIC and/or BIC value.
A note before you start:
sarima(x, p = 0, d = 0, q = 1)
and sarima(x, 0, 0, 1)
are the same.
dl_varve <- diff(log(astsa::varve))
# Fit an MA(1) to dl_varve.
sarima(dl_varve, p = 0, d = 0, q = 1)
## initial value -0.551780
## iter 2 value -0.671633
## iter 3 value -0.706234
## iter 4 value -0.707586
## iter 5 value -0.718543
## iter 6 value -0.719692
## iter 7 value -0.721967
## iter 8 value -0.722970
## iter 9 value -0.723231
## iter 10 value -0.723247
## iter 11 value -0.723248
## iter 12 value -0.723248
## iter 12 value -0.723248
## iter 12 value -0.723248
## final value -0.723248
## converged
## initial value -0.722762
## iter 2 value -0.722764
## iter 3 value -0.722764
## iter 4 value -0.722765
## iter 4 value -0.722765
## iter 4 value -0.722765
## final value -0.722765
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 xmean
## -0.7710 -0.0013
## s.e. 0.0341 0.0044
##
## sigma^2 estimated as 0.2353: log likelihood = -440.68, aic = 887.36
##
## $degrees_of_freedom
## [1] 631
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.7710 0.0341 -22.6002 0.0000
## xmean -0.0013 0.0044 -0.2818 0.7782
##
## $AIC
## [1] 1.401826
##
## $AICc
## [1] 1.401856
##
## $BIC
## [1] 1.422918
# Fit an MA(2) to dl_varve. Improvement?
sarima(dl_varve, p = 0, d = 0, q = 2)
## initial value -0.551780
## iter 2 value -0.679736
## iter 3 value -0.728605
## iter 4 value -0.734640
## iter 5 value -0.735449
## iter 6 value -0.735979
## iter 7 value -0.736015
## iter 8 value -0.736059
## iter 9 value -0.736060
## iter 10 value -0.736060
## iter 11 value -0.736061
## iter 12 value -0.736061
## iter 12 value -0.736061
## iter 12 value -0.736061
## final value -0.736061
## converged
## initial value -0.735372
## iter 2 value -0.735378
## iter 3 value -0.735379
## iter 4 value -0.735379
## iter 4 value -0.735379
## iter 4 value -0.735379
## final value -0.735379
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 xmean
## -0.6710 -0.1595 -0.0013
## s.e. 0.0375 0.0392 0.0033
##
## sigma^2 estimated as 0.2294: log likelihood = -432.69, aic = 873.39
##
## $degrees_of_freedom
## [1] 630
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.6710 0.0375 -17.9057 0.0000
## ma2 -0.1595 0.0392 -4.0667 0.0001
## xmean -0.0013 0.0033 -0.4007 0.6888
##
## $AIC
## [1] 1.379757
##
## $AICc
## [1] 1.379817
##
## $BIC
## [1] 1.40788
# Fit an ARMA(1,1) to dl_varve. Improvement?
sarima(dl_varve, p = 1, d = 0, q = 1)
## initial value -0.550994
## iter 2 value -0.648962
## iter 3 value -0.676965
## iter 4 value -0.699167
## iter 5 value -0.724554
## iter 6 value -0.726719
## iter 7 value -0.729066
## iter 8 value -0.731976
## iter 9 value -0.734235
## iter 10 value -0.735969
## iter 11 value -0.736410
## iter 12 value -0.737045
## iter 13 value -0.737600
## iter 14 value -0.737641
## iter 15 value -0.737643
## iter 16 value -0.737643
## iter 17 value -0.737643
## iter 18 value -0.737643
## iter 18 value -0.737643
## iter 18 value -0.737643
## final value -0.737643
## converged
## initial value -0.737522
## iter 2 value -0.737527
## iter 3 value -0.737528
## iter 4 value -0.737529
## iter 5 value -0.737530
## iter 5 value -0.737530
## iter 5 value -0.737530
## final value -0.737530
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 xmean
## 0.2341 -0.8871 -0.0013
## s.e. 0.0518 0.0292 0.0028
##
## sigma^2 estimated as 0.2284: log likelihood = -431.33, aic = 870.66
##
## $degrees_of_freedom
## [1] 630
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.2341 0.0518 4.5184 0.0000
## ma1 -0.8871 0.0292 -30.4107 0.0000
## xmean -0.0013 0.0028 -0.4618 0.6444
##
## $AIC
## [1] 1.375456
##
## $AICc
## [1] 1.375517
##
## $BIC
## [1] 1.403579
IC and BIC help you find the model with the smallest error using the least number of parameters. The idea is based on the parsimony principle, which is basic to all science and tells you to choose the simplest scientific explanation that fits the evidence.
Residual analysis - I
As you saw in the video, an sarima()
run includes a residual analysis graphic. Specifically, the output shows (1) the standardized residuals, (2) the sample ACF of the residuals, (3) a normal Q-Q plot, and (4) the p-values corresponding to the Box-Ljung-Pierce Q-statistic.
In each run, check the four residual plots as follows (model assumes errors are Gaussian white noise):
As in the previous exercise, dl_varve <- diff(log(varve))
, which is plotted below a plot of varve
. The astsa package is preloaded.
par(mfrow = c(2,1))
plot(astsa::varve)
plot(dl_varve)
# Fit an MA(1) to dl_varve. Examine the residuals
sarima(dl_varve, p = 0, d = 0, q = 1)
## initial value -0.551780
## iter 2 value -0.671633
## iter 3 value -0.706234
## iter 4 value -0.707586
## iter 5 value -0.718543
## iter 6 value -0.719692
## iter 7 value -0.721967
## iter 8 value -0.722970
## iter 9 value -0.723231
## iter 10 value -0.723247
## iter 11 value -0.723248
## iter 12 value -0.723248
## iter 12 value -0.723248
## iter 12 value -0.723248
## final value -0.723248
## converged
## initial value -0.722762
## iter 2 value -0.722764
## iter 3 value -0.722764
## iter 4 value -0.722765
## iter 4 value -0.722765
## iter 4 value -0.722765
## final value -0.722765
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 xmean
## -0.7710 -0.0013
## s.e. 0.0341 0.0044
##
## sigma^2 estimated as 0.2353: log likelihood = -440.68, aic = 887.36
##
## $degrees_of_freedom
## [1] 631
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.7710 0.0341 -22.6002 0.0000
## xmean -0.0013 0.0044 -0.2818 0.7782
##
## $AIC
## [1] 1.401826
##
## $AICc
## [1] 1.401856
##
## $BIC
## [1] 1.422918
# Fit an ARMA(1,1) to dl_varve. Examine the residuals
sarima(dl_varve, p = 1, d = 0, q = 1)
## initial value -0.550994
## iter 2 value -0.648962
## iter 3 value -0.676965
## iter 4 value -0.699167
## iter 5 value -0.724554
## iter 6 value -0.726719
## iter 7 value -0.729066
## iter 8 value -0.731976
## iter 9 value -0.734235
## iter 10 value -0.735969
## iter 11 value -0.736410
## iter 12 value -0.737045
## iter 13 value -0.737600
## iter 14 value -0.737641
## iter 15 value -0.737643
## iter 16 value -0.737643
## iter 17 value -0.737643
## iter 18 value -0.737643
## iter 18 value -0.737643
## iter 18 value -0.737643
## final value -0.737643
## converged
## initial value -0.737522
## iter 2 value -0.737527
## iter 3 value -0.737528
## iter 4 value -0.737529
## iter 5 value -0.737530
## iter 5 value -0.737530
## iter 5 value -0.737530
## final value -0.737530
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 xmean
## 0.2341 -0.8871 -0.0013
## s.e. 0.0518 0.0292 0.0028
##
## sigma^2 estimated as 0.2284: log likelihood = -431.33, aic = 870.66
##
## $degrees_of_freedom
## [1] 630
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.2341 0.0518 4.5184 0.0000
## ma1 -0.8871 0.0292 -30.4107 0.0000
## xmean -0.0013 0.0028 -0.4618 0.6444
##
## $AIC
## [1] 1.375456
##
## $AICc
## [1] 1.375517
##
## $BIC
## [1] 1.403579
By now you have mastered constructing parameters through the sarima()
command, but the rich and comprehensive output of this command is always worth exploring. What did you learn about the residuals produced by your MA(1) and ARMA(1,1) models?
ARMA get in By now you have gained considerable experience fitting ARMA models to data, but before you start celebrating, try one more exercise (sort of) on your own.
The data in oil
are crude oil, WTI spot price FOB (in dollars per barrel), weekly data from 2000 to 2008. Use your skills to fit an ARMA model to the returns. The weekly crude oil prices (oil
) are plotted for you. Throughout the exercise, work with the returns, which you will calculate.
As before, the astsa package is preloaded for you. The data are preloaded as oil
.
oil <- ts(astsa::oil, start = 2000, end = 2008, frequency = 52)
plot(oil, main = "Crude Oil - USD per Barrel")
# Calculate approximate oil returns
oil_returns <- diff(log(oil))
# Plot oil_returns. Notice the outliers.
plot(oil_returns)
# Plot the P/ACF pair for oil_returns
acf2(oil_returns)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.14 -0.11 0.05 -0.03 -0.03 -0.10 -0.10 0.03 0.03 -0.03 -0.10 -0.01 0.07
## PACF 0.14 -0.13 0.09 -0.07 0.00 -0.12 -0.07 0.03 0.01 -0.02 -0.12 0.01 0.04
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.04 -0.08 -0.03 0.01 0.05 -0.07 -0.04 0.13 0.13 -0.04 -0.03 -0.04
## PACF 0.04 -0.10 0.00 -0.04 0.06 -0.09 0.02 0.09 0.09 -0.06 0.02 -0.06
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.03 -0.03 -0.07 0.01 0.04 0.00 0.00 -0.03 0.02 -0.01 -0.03 -0.09
## PACF 0.04 -0.05 -0.01 0.03 -0.01 -0.01 0.01 0.01 -0.01 -0.05 -0.01 -0.07
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF 0.00 0.04 -0.01 -0.06 0.06 0.02 -0.02 0.05 0.03 0.08 0.09 0.01
## PACF 0.02 -0.01 -0.01 -0.04 0.05 -0.06 -0.01 0.07 0.02 0.09 0.04 0.05
## [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## ACF -0.06 0.07 -0.01 -0.07 -0.01 -0.06 -0.06 -0.08 -0.04 -0.02 -0.03 -0.01
## PACF -0.08 0.13 -0.08 0.03 -0.03 -0.03 -0.08 -0.04 0.00 -0.02 -0.06 -0.03
## [,62] [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73]
## ACF 0.02 0.03 0.00 -0.03 0.00 0.01 0.03 0.00 -0.05 -0.07 0.01 -0.03
## PACF 0.02 0.01 -0.03 -0.05 0.01 -0.02 0.02 -0.06 -0.03 -0.09 0.02 -0.07
## [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85]
## ACF -0.05 0.02 0.07 0.06 -0.06 0.02 0.04 -0.07 -0.05 0.02 0.03 0.01
## PACF 0.01 0.00 0.06 0.04 -0.06 0.10 -0.04 -0.05 -0.02 0.05 0.02 -0.03
## [,86] [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97]
## ACF 0.05 0.13 -0.08 -0.05 0.07 -0.06 0.01 0.09 -0.04 0.01 0.04 0.02
## PACF 0.10 0.12 -0.12 -0.03 0.05 -0.04 0.07 0.03 -0.07 0.05 -0.02 0.02
## [,98] [,99] [,100] [,101] [,102] [,103] [,104] [,105] [,106] [,107] [,108]
## ACF 0.01 -0.03 -0.03 -0.03 -0.04 -0.08 -0.07 0.05 0.01 0.01 0.10
## PACF -0.03 0.00 -0.06 -0.04 0.02 -0.05 -0.03 0.05 -0.04 0.04 0.04
## [,109] [,110] [,111] [,112] [,113] [,114] [,115] [,116] [,117] [,118]
## ACF 0.06 0.00 0.02 -0.03 0.04 0.04 -0.04 -0.06 -0.03 0.01
## PACF 0.02 0.01 0.01 0.02 0.06 0.00 -0.04 -0.03 -0.04 0.05
## [,119] [,120] [,121] [,122] [,123] [,124] [,125] [,126] [,127] [,128]
## ACF 0.09 -0.04 -0.05 0.02 0.04 0 -0.08 0.02 0.00 -0.05
## PACF 0.06 0.00 -0.05 0.00 0.06 0 -0.08 -0.01 -0.07 -0.05
## [,129] [,130] [,131] [,132] [,133] [,134] [,135] [,136] [,137] [,138]
## ACF 0.00 0.00 0.06 0.06 0.04 0.00 0.04 -0.01 -0.04 0.05
## PACF -0.01 0.02 0.00 0.02 0.08 -0.03 0.00 0.00 0.00 0.05
## [,139] [,140] [,141] [,142] [,143] [,144] [,145] [,146] [,147] [,148]
## ACF -0.04 -0.02 0.03 -0.03 -0.06 0.01 0.00 -0.05 0.00 -0.05
## PACF -0.05 0.00 -0.02 0.04 -0.06 0.00 -0.02 -0.01 0.03 -0.05
## [,149] [,150] [,151] [,152] [,153] [,154] [,155] [,156] [,157] [,158]
## ACF -0.11 -0.03 0.01 0.00 0.04 0.04 0.05 0.06 -0.04 -0.07
## PACF -0.08 0.01 0.03 -0.03 0.00 -0.03 0.02 0.00 -0.01 -0.06
## [,159] [,160] [,161] [,162] [,163] [,164] [,165] [,166] [,167] [,168]
## ACF 0.04 0.04 -0.01 0.07 0.00 0.00 -0.03 -0.03 -0.02 -0.02
## PACF 0.01 0.01 0.03 0.06 0.01 0.02 -0.04 0.02 -0.01 0.04
## [,169] [,170] [,171] [,172] [,173] [,174] [,175] [,176] [,177] [,178]
## ACF 0.02 -0.01 -0.02 0.01 0.02 0.03 0.00 0.00 0.00 -0.01
## PACF -0.01 0.00 0.01 -0.04 0.02 0.03 0.02 -0.04 -0.06 0.04
## [,179] [,180] [,181] [,182] [,183] [,184] [,185] [,186] [,187] [,188]
## ACF -0.03 0.05 0.02 0.07 0.05 -0.02 0.03 -0.01 -0.05 -0.06
## PACF 0.04 0.02 0.00 0.04 0.01 -0.02 -0.01 -0.02 -0.01 -0.01
## [,189] [,190] [,191] [,192] [,193] [,194] [,195] [,196] [,197] [,198]
## ACF -0.05 -0.06 -0.02 0.01 0.03 0.01 0.02 -0.02 -0.08 0.02
## PACF -0.02 -0.03 0.02 0.00 0.06 -0.01 -0.02 -0.04 -0.02 -0.02
## [,199] [,200] [,201] [,202] [,203] [,204] [,205] [,206] [,207] [,208]
## ACF 0.02 -0.01 -0.02 -0.01 -0.02 0.03 0.02 0.04 0.04 -0.05
## PACF 0.02 -0.04 -0.04 -0.03 -0.07 0.03 -0.02 0.01 0.05 -0.06
# Assuming both P/ACF are tailing, fit a model
sarima(oil_returns, p = 1, d = 0, q = 1)
## initial value -3.195594
## iter 2 value -3.201743
## iter 3 value -3.208031
## iter 4 value -3.208326
## iter 5 value -3.212457
## iter 6 value -3.215700
## iter 7 value -3.218603
## iter 8 value -3.220442
## iter 9 value -3.220894
## iter 10 value -3.221122
## iter 11 value -3.221413
## iter 12 value -3.221501
## iter 13 value -3.221524
## iter 14 value -3.221525
## iter 14 value -3.221525
## iter 14 value -3.221525
## final value -3.221525
## converged
## initial value -3.222340
## iter 2 value -3.222340
## iter 3 value -3.222341
## iter 4 value -3.222342
## iter 5 value -3.222342
## iter 5 value -3.222342
## iter 5 value -3.222342
## final value -3.222342
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 xmean
## -0.4752 0.6768 0.0029
## s.e. 0.1084 0.0871 0.0022
##
## sigma^2 estimated as 0.001589: log likelihood = 750.22, aic = -1492.43
##
## $degrees_of_freedom
## [1] 413
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.4752 0.1084 -4.3846 0.0000
## ma1 0.6768 0.0871 7.7674 0.0000
## xmean 0.0029 0.0022 1.3184 0.1881
##
## $AIC
## [1] -3.587576
##
## $AICc
## [1] -3.587436
##
## $BIC
## [1] -3.548819
You have now successfully manipulated some real-world time series data, explored the qualities of that data, and modeled an ARMA(1,1) model to your data. In the next chapter, you will explore more complicated models.
ARIMA - integrated ARMA
Identifying ARIMA
ACF and PCF of an integrated ARMA
ARIMA - plug and play
As you saw in the video, a time series is called \(ARIMA(p,d,q)\) if the differenced series (of order \(d\)) is \(ARMA(p,q)\).
To get a sense of how the model works, you will analyze simulated data from the integrated model
\(Y_t =.9Y_{t−1} + W_t\)
where \(Y_t = \nabla X_t = X_t − X_{t−1}\). In this case, the model is an \(ARIMA(1,1,0)\) because the differenced data are an autoregression of order one.
The simulated time series is in x
and it was generated in R as
x <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 200)
You will plot the generated data and the sample ACF and PACF of the generated data to see how integrated data behave. Then, you will difference the data to make it stationary. You will plot the differenced data and the corresponding sample ACF and PACF to see how differencing makes a difference.
As before, the astsa package is preloaded in your workspace. Data from an \(ARIMA(1,1,0)\) with AR parameter .9 is saved in object x
.
x <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 200)
# Plot x
plot(x)
# Plot the P/ACF pair of x
acf2(x)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.99 0.98 0.97 0.95 0.92 0.90 0.87 0.83 0.8 0.77 0.73 0.69 0.66
## PACF 0.99 -0.50 -0.26 -0.17 -0.09 -0.04 0.02 -0.02 0.0 -0.04 -0.02 -0.10 0.01
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.62 0.58 0.54 0.50 0.46 0.42 0.38 0.34 0.29 0.25 0.21 0.18
## PACF 0.02 -0.05 -0.09 -0.06 -0.02 -0.01 -0.04 0.02 0.04 0.00 0.04 0.01
# Plot the differenced data
plot(diff(x))
# Plot the P/ACF pair of the differenced data
acf2(diff(x))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.88 0.80 0.70 0.61 0.51 0.42 0.33 0.28 0.21 0.20 0.16 0.12 0.10
## PACF 0.88 0.08 -0.11 0.01 -0.10 -0.08 0.00 0.08 -0.04 0.15 -0.05 -0.12 0.05
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.10 0.11 0.09 0.06 0.04 0.01 -0.03 -0.06 -0.10 -0.16 -0.20 -0.23
## PACF 0.09 0.03 -0.06 -0.10 -0.04 -0.02 -0.08 0.00 -0.02 -0.17 0.04 -0.03
As you can see, differencing the data in your \(ARIMA(1,1,0)\) model makes it stationary and allows for further analysis.
Simulated ARIMA
Before analyzing actual time series data, you should try working with a slightly more complicated model.
Here, we generated 250 observations from the \(ARIMA(2,1,0)\) model with drift given by
\(Y_t = 1 + 1.5Y_{t−1} − .75Y_{t−2} + W_t\)
where \(Y_t = \nabla X_t = X_t − X_{t−1}\).
You will use the established techniques to fit a model to the data.
The astsa package is preloaded and the generated data are in x
. The series x
and the detrended seriesy <- diff(x)
have been plotted.
# Plot sample P/ACF of differenced data and determine model
acf2(diff(x))
# Estimate parameters and examine output
sarima(x, p = 2, d = 1, q = 0)
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 constant
1.5197 -0.7669 1.2335
s.e. 0.0401 0.0401 0.2570
sigma^2 estimated as 1.004: log likelihood = -355.41, aic = 718.82
$degrees_of_freedom
[1] 246
$ttable
Estimate SE t.value p.value
ar1 1.5197 0.0401 37.9191 0
ar2 -0.7669 0.0401 -19.1321 0
constant 1.2335 0.2570 4.7996 0
$AIC
[1] 2.886818
$AICc
[1] 2.887211
$BIC
[1] 2.943323
As you can see from your t-table, the estimated parameters are very close to 1.5 and -0.75.
Global warming
Now that you have some experience fitting an ARIMA model to simulated data, your next task is to apply your skills to some real world data.
The data in globtemp
(from astsa) are the annual global temperature deviations to 2015. In this exercise, you will use established techniques to fit an ARIMA model to the data. A plot of the data shows random walk behavior, which suggests you should work with the differenced data. The differenced data diff(globtemp)
are also plotted.
After plotting the sample ACF and PACF of the differenced data diff(globtemp)
, you can say that either
After fitting the first two models, check the AIC and BIC to choose the preferred model.
par(mfrow = c(2,1))
plot(astsa::globtemp)
plot(diff(astsa::globtemp))
# Plot the sample P/ACF pair of the differenced data
acf2(diff(globtemp))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.24 -0.19 -0.08 0.20 -0.15 -0.03 0.03 0.14 -0.16 0.11 -0.05 0.00
## PACF -0.24 -0.26 -0.23 0.06 -0.16 -0.09 -0.05 0.07 -0.09 0.11 -0.03 -0.02
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF -0.13 0.14 -0.01 -0.08 0 0.19 -0.07 0.02 -0.02 0.08
## PACF -0.10 0.02 0.00 -0.09 0 0.11 0.04 0.13 0.09 0.08
# Fit an ARIMA(1,1,1) model to globtemp
sarima(globtemp, p = 1, d = 1, q = 1)
## initial value -2.218917
## iter 2 value -2.253118
## iter 3 value -2.263750
## iter 4 value -2.272144
## iter 5 value -2.282786
## iter 6 value -2.296777
## iter 7 value -2.297062
## iter 8 value -2.297253
## iter 9 value -2.297389
## iter 10 value -2.297405
## iter 11 value -2.297413
## iter 12 value -2.297413
## iter 13 value -2.297414
## iter 13 value -2.297414
## iter 13 value -2.297414
## final value -2.297414
## converged
## initial value -2.305504
## iter 2 value -2.305800
## iter 3 value -2.305821
## iter 4 value -2.306655
## iter 5 value -2.306875
## iter 6 value -2.306950
## iter 7 value -2.306955
## iter 8 value -2.306955
## iter 8 value -2.306955
## final value -2.306955
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 constant
## 0.3549 -0.7663 0.0072
## s.e. 0.1314 0.0874 0.0032
##
## sigma^2 estimated as 0.009885: log likelihood = 119.88, aic = -231.76
##
## $degrees_of_freedom
## [1] 132
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3549 0.1314 2.7008 0.0078
## ma1 -0.7663 0.0874 -8.7701 0.0000
## constant 0.0072 0.0032 2.2738 0.0246
##
## $AIC
## [1] -1.716773
##
## $AICc
## [1] -1.715416
##
## $BIC
## [1] -1.630691
# Fit an ARIMA(0,1,2) model to globtemp. Which model is better?
sarima(globtemp, p = 0, d = 1, q = 2)
## initial value -2.220513
## iter 2 value -2.294887
## iter 3 value -2.307682
## iter 4 value -2.309170
## iter 5 value -2.310360
## iter 6 value -2.311251
## iter 7 value -2.311636
## iter 8 value -2.311648
## iter 9 value -2.311649
## iter 9 value -2.311649
## iter 9 value -2.311649
## final value -2.311649
## converged
## initial value -2.310187
## iter 2 value -2.310197
## iter 3 value -2.310199
## iter 4 value -2.310201
## iter 5 value -2.310202
## iter 5 value -2.310202
## iter 5 value -2.310202
## final value -2.310202
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 constant
## -0.3984 -0.2173 0.0072
## s.e. 0.0808 0.0768 0.0033
##
## sigma^2 estimated as 0.00982: log likelihood = 120.32, aic = -232.64
##
## $degrees_of_freedom
## [1] 132
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.3984 0.0808 -4.9313 0.0000
## ma2 -0.2173 0.0768 -2.8303 0.0054
## constant 0.0072 0.0033 2.1463 0.0337
##
## $AIC
## [1] -1.723268
##
## $AICc
## [1] -1.721911
##
## $BIC
## [1] -1.637185
Judging by the AIC and BIC, the ARIMA(0,1,2) model performs better than the ARIMA(1,1,1) model on the globtemp
data. Remember to thoroughly examine the output of your sarima()
command to gain a full understanding of your model.
ARIMA diagnostics
Diagnostics - simulated overfitting
One way to check an analysis is to overfit the model by adding an extra parameter to see if it makes a difference in the results. If adding parameters changes the results drastically, then you should rethink your model. If, however, the results do not change by much, you can be confident that your fit is correct.
We generated 250 observations from an ARIMA(0,1,1) model with MA parameter .9. First, you will fit the model to the data using established techniques.
Then, you can check a model by overfitting (adding a parameter) to see if it makes a difference. In this case, you will add an additional MA parameter to see that it is not needed.
As usual, the astsa package is preloaded and the generated data in x
are plotted in your workspace. The differenced data diff(x)
are also plotted. Note that it looks stationary.
x <- arima.sim(model = list(order = c(0, 1, 1), ma = .9), n = 250)
par(mfrow = c(2,1))
plot(x)
plot(diff(x))
# Plot sample P/ACF pair of the differenced data
acf2(diff(x))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.42 -0.10 -0.04 -0.01 0.03 0.02 0.01 0.07 0.02 -0.08 -0.01 0.08 -0.01
## PACF 0.42 -0.34 0.22 -0.18 0.19 -0.16 0.16 -0.04 0.01 -0.10 0.13 -0.04 -0.03
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.09 -0.02 0.06 0.02 -0.04 -0.04 0.08 0.16 0.09 0.03 0.04 0.01
## PACF -0.06 0.08 -0.02 0.00 -0.04 0.02 0.10 0.08 0.01 0.04 0.00 0.02
## [,26]
## ACF -0.08
## PACF -0.13
# Fit the first model, compare parameters, check diagnostics
astsa::sarima(x, p = 0, d = 1, q = 1)
## initial value 0.247668
## iter 2 value 0.101855
## iter 3 value 0.052606
## iter 4 value 0.003447
## iter 5 value 0.000436
## iter 6 value 0.000327
## iter 7 value 0.000304
## iter 8 value 0.000277
## iter 9 value 0.000246
## iter 10 value 0.000242
## iter 11 value 0.000241
## iter 12 value 0.000241
## iter 13 value 0.000241
## iter 14 value 0.000241
## iter 15 value 0.000241
## iter 16 value 0.000241
## iter 16 value 0.000241
## final value 0.000241
## converged
## initial value -0.003385
## iter 2 value -0.003680
## iter 3 value -0.003786
## iter 4 value -0.003810
## iter 5 value -0.003847
## iter 6 value -0.003847
## iter 6 value -0.003847
## final value -0.003847
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 constant
## 0.9071 0.1810
## s.e. 0.0359 0.1195
##
## sigma^2 estimated as 0.9855: log likelihood = -353.77, aic = 713.55
##
## $degrees_of_freedom
## [1] 248
##
## $ttable
## Estimate SE t.value p.value
## ma1 0.9071 0.0359 25.2465 0.0000
## constant 0.1810 0.1195 1.5146 0.1312
##
## $AIC
## [1] 2.854184
##
## $AICc
## [1] 2.854378
##
## $BIC
## [1] 2.896441
# Fit the second model and compare fit
astsa::sarima(x, p = 0, d = 1, q = 2)
## initial value 0.247668
## iter 2 value 0.082421
## iter 3 value 0.056210
## iter 4 value 0.035808
## iter 5 value 0.000779
## iter 6 value -0.001022
## iter 7 value -0.004734
## iter 8 value -0.005086
## iter 9 value -0.005126
## iter 10 value -0.005140
## iter 11 value -0.005142
## iter 12 value -0.005144
## iter 13 value -0.005145
## iter 14 value -0.005145
## iter 15 value -0.005145
## iter 16 value -0.005145
## iter 17 value -0.005145
## iter 17 value -0.005145
## iter 17 value -0.005145
## final value -0.005145
## converged
## initial value -0.008903
## iter 2 value -0.009279
## iter 3 value -0.010045
## iter 4 value -0.010247
## iter 5 value -0.010342
## iter 6 value -0.010384
## iter 7 value -0.010388
## iter 8 value -0.010392
## iter 9 value -0.010392
## iter 9 value -0.010392
## iter 9 value -0.010392
## final value -0.010392
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 constant
## 0.8328 -0.1672 0.181
## s.e. 0.0636 0.0619 0.103
##
## sigma^2 estimated as 0.9591: log likelihood = -352.14, aic = 712.27
##
## $degrees_of_freedom
## [1] 247
##
## $ttable
## Estimate SE t.value p.value
## ma1 0.8328 0.0636 13.1030 0.0000
## ma2 -0.1672 0.0619 -2.7018 0.0074
## constant 0.1810 0.1030 1.7571 0.0801
##
## $AIC
## [1] 2.849094
##
## $AICc
## [1] 2.849484
##
## $BIC
## [1] 2.905437
As you can see from the t-table, the second MA parameter is not significantly different from zero and the first MA parameter is approximately the same in each run. Also, the AIC and BIC both increase when the parameter is added. In addition, the residual analysis of your ARIMA(0,1,1) model is fine. All of these facts together indicate that you have a successful model fit.
Diagnostics - global temperatures
You can now finish your analysis of global temperatures. Recall that you previously fit two models to the data in globtemp
, an ARIMA(1,1,1) and an ARIMA(0,1,2). In the final analysis, check the residual diagnostics and use AIC and BIC for model choice.
The data are plotted for you.
plot(astsa::globtemp, ylab = "globtemp", main = "Global Temperature Deviations")
# Fit ARIMA(0,1,2) to globtemp and check diagnostics
sarima(globtemp, p = 0, d = 1, q = 2)
## initial value -2.220513
## iter 2 value -2.294887
## iter 3 value -2.307682
## iter 4 value -2.309170
## iter 5 value -2.310360
## iter 6 value -2.311251
## iter 7 value -2.311636
## iter 8 value -2.311648
## iter 9 value -2.311649
## iter 9 value -2.311649
## iter 9 value -2.311649
## final value -2.311649
## converged
## initial value -2.310187
## iter 2 value -2.310197
## iter 3 value -2.310199
## iter 4 value -2.310201
## iter 5 value -2.310202
## iter 5 value -2.310202
## iter 5 value -2.310202
## final value -2.310202
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 constant
## -0.3984 -0.2173 0.0072
## s.e. 0.0808 0.0768 0.0033
##
## sigma^2 estimated as 0.00982: log likelihood = 120.32, aic = -232.64
##
## $degrees_of_freedom
## [1] 132
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.3984 0.0808 -4.9313 0.0000
## ma2 -0.2173 0.0768 -2.8303 0.0054
## constant 0.0072 0.0033 2.1463 0.0337
##
## $AIC
## [1] -1.723268
##
## $AICc
## [1] -1.721911
##
## $BIC
## [1] -1.637185
# Fit ARIMA(1,1,1) to globtemp and check diagnostics
sarima(globtemp, p = 1, d = 1, q = 1)
## initial value -2.218917
## iter 2 value -2.253118
## iter 3 value -2.263750
## iter 4 value -2.272144
## iter 5 value -2.282786
## iter 6 value -2.296777
## iter 7 value -2.297062
## iter 8 value -2.297253
## iter 9 value -2.297389
## iter 10 value -2.297405
## iter 11 value -2.297413
## iter 12 value -2.297413
## iter 13 value -2.297414
## iter 13 value -2.297414
## iter 13 value -2.297414
## final value -2.297414
## converged
## initial value -2.305504
## iter 2 value -2.305800
## iter 3 value -2.305821
## iter 4 value -2.306655
## iter 5 value -2.306875
## iter 6 value -2.306950
## iter 7 value -2.306955
## iter 8 value -2.306955
## iter 8 value -2.306955
## final value -2.306955
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 constant
## 0.3549 -0.7663 0.0072
## s.e. 0.1314 0.0874 0.0032
##
## sigma^2 estimated as 0.009885: log likelihood = 119.88, aic = -231.76
##
## $degrees_of_freedom
## [1] 132
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3549 0.1314 2.7008 0.0078
## ma1 -0.7663 0.0874 -8.7701 0.0000
## constant 0.0072 0.0032 2.2738 0.0246
##
## $AIC
## [1] -1.716773
##
## $AICc
## [1] -1.715416
##
## $BIC
## [1] -1.630691
# Which is the better model?
"ARIMA(0,1,2)"
## [1] "ARIMA(0,1,2)"
Your model diagnostics suggest that both the ARIMA(0,1,2) and the ARIMA(1,1,1) are reasonable models. However, the AIC and BIC suggest that the ARIMA(0,1,2) performs slightly better, so this should be your preferred model. Although you were not asked to do so, you can use overfitting to assess the final model. For example, try fitting an ARIMA(1,1,2) or an ARIMA(0,1,3) to the data.
Forecasting ARIMA
sarima.for()
to forecast in the astsa-package
Forecasting simulated ARIMA
Now that you are an expert at fitting ARIMA models, you can use your skills for forecasting. First, you will work with simulated data.
We generated 120 observations from an ARIMA(1,1,0) model with AR parameter .9. The data are in y
and the first 100 observations are in x. These observations are plotted for you. You will fit an ARIMA(1,1,0) model to the data in x
and verify that the model fits well. Then use sarima.for() from astsa
to forecast the data 20 time periods ahead. You will then compare the forecasts to the actual data in y
.
The basic syntax for forecasting is sarima.for(data, n.ahead, p, d, q)
where n.ahead
is a positive integer that specifies the forecast horizon. The predicted values and their standard errors are printed, the data are plotted in black, and the forecasts are in red along with 2 mean square prediction error bounds as blue dashed lines.
The astsa package is preloaded and the data (x
) and differenced data (diff(x)
) are plotted.
y <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 120)
x <- ts(y, start = 1, end = 100, frequency = 1)
par(mfrow = c(2,1))
plot(x)
plot(diff(x))
# Plot P/ACF pair of differenced data
astsa::acf2(diff(x))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.85 0.74 0.65 0.57 0.50 0.44 0.39 0.31 0.21 0.09 0.01 -0.08 -0.09
## PACF 0.85 0.08 0.02 0.00 0.01 0.02 0.00 -0.15 -0.14 -0.17 0.03 -0.15 0.21
## [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF -0.13 -0.19 -0.25 -0.28 -0.29 -0.30 -0.28
## PACF -0.13 -0.05 -0.11 0.09 0.06 -0.03 0.03
# Fit model - check t-table and diagnostics
astsa::sarima(x, p = 1, d = 1, q = 0)
## initial value 0.632596
## iter 2 value -0.051045
## iter 3 value -0.053007
## iter 4 value -0.053139
## iter 5 value -0.053422
## iter 6 value -0.053781
## iter 7 value -0.053859
## iter 8 value -0.053874
## iter 9 value -0.053895
## iter 10 value -0.053925
## iter 11 value -0.053946
## iter 12 value -0.053952
## iter 13 value -0.053952
## iter 14 value -0.053952
## iter 15 value -0.053953
## iter 16 value -0.053953
## iter 17 value -0.053953
## iter 18 value -0.053953
## iter 19 value -0.053953
## iter 20 value -0.053953
## iter 21 value -0.053953
## iter 22 value -0.053953
## iter 22 value -0.053953
## iter 22 value -0.053953
## final value -0.053953
## converged
## initial value -0.049408
## iter 2 value -0.049470
## iter 3 value -0.049740
## iter 4 value -0.049767
## iter 5 value -0.049802
## iter 6 value -0.049821
## iter 7 value -0.049822
## iter 8 value -0.049823
## iter 9 value -0.049823
## iter 10 value -0.049824
## iter 10 value -0.049824
## iter 10 value -0.049824
## final value -0.049824
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 constant
## 0.8726 -0.0147
## s.e. 0.0496 0.7017
##
## sigma^2 estimated as 0.8921: log likelihood = -135.54, aic = 277.08
##
## $degrees_of_freedom
## [1] 97
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.8726 0.0496 17.5823 0.0000
## constant -0.0147 0.7017 -0.0209 0.9833
##
## $AIC
## [1] 2.798836
##
## $AICc
## [1] 2.800099
##
## $BIC
## [1] 2.877476
# Forecast the data 20 time periods ahead
astsa::sarima.for(x, n.ahead = 20, p = 1, d = 1, q = 0)
## $pred
## Time Series:
## Start = 101
## End = 120
## Frequency = 1
## [1] -15.0218971 -12.3590917 -10.0373072 -8.0131092 -6.2485953 -4.7106905
## [7] -3.3705325 -2.2029351 -1.1859200 -0.3003082 0.4706368 1.1415196
## [13] 1.7250846 2.2324533 2.6733304 3.0561846 3.3884062 3.6764440
## [19] 3.9259256 4.1417616
##
## $se
## Time Series:
## Start = 101
## End = 120
## Frequency = 1
## [1] 0.9445349 2.0051651 3.1954564 4.4629953 5.7739609 7.1055712
## [7] 8.4420413 9.7722824 11.0884824 12.3851782 13.6586198 14.9063226
## [13] 16.1267428 17.3190377 18.4828878 19.6183612 20.7258118 21.8058004
## [19] 22.8590350 23.8863245
lines(y)
As you can see, the sarima.for()
command provides a simple method for forecasting. Although the blue error bands are relatively wide, the prediction remains quite valuable.
Forecasting global temperatures Now you can try forecasting real data.
Here, you will forecast the annual global temperature deviations globtemp
to 2050. Recall that in previous exercises, you fit an ARIMA(0,1,2) model to the data. You will refit the model to confirm it, and then forecast the series 35 years into the future.
The astsa package is preloaded and the data are plotted.
plot(astsa::globtemp, ylab = "globtemp", main = "Global Temperature Deviations")
# Fit an ARIMA(0,1,2) to globtemp and check the fit
sarima(globtemp, p = 0, d = 1, q = 2)
## initial value -2.220513
## iter 2 value -2.294887
## iter 3 value -2.307682
## iter 4 value -2.309170
## iter 5 value -2.310360
## iter 6 value -2.311251
## iter 7 value -2.311636
## iter 8 value -2.311648
## iter 9 value -2.311649
## iter 9 value -2.311649
## iter 9 value -2.311649
## final value -2.311649
## converged
## initial value -2.310187
## iter 2 value -2.310197
## iter 3 value -2.310199
## iter 4 value -2.310201
## iter 5 value -2.310202
## iter 5 value -2.310202
## iter 5 value -2.310202
## final value -2.310202
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 constant
## -0.3984 -0.2173 0.0072
## s.e. 0.0808 0.0768 0.0033
##
## sigma^2 estimated as 0.00982: log likelihood = 120.32, aic = -232.64
##
## $degrees_of_freedom
## [1] 132
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.3984 0.0808 -4.9313 0.0000
## ma2 -0.2173 0.0768 -2.8303 0.0054
## constant 0.0072 0.0033 2.1463 0.0337
##
## $AIC
## [1] -1.723268
##
## $AICc
## [1] -1.721911
##
## $BIC
## [1] -1.637185
# Forecast data 35 years into the future
sarima.for(globtemp, n.ahead = 35, p = 0, d = 1, q = 2)
## $pred
## Time Series:
## Start = 2016
## End = 2050
## Frequency = 1
## [1] 0.7995567 0.7745381 0.7816919 0.7888457 0.7959996 0.8031534 0.8103072
## [8] 0.8174611 0.8246149 0.8317688 0.8389226 0.8460764 0.8532303 0.8603841
## [15] 0.8675379 0.8746918 0.8818456 0.8889995 0.8961533 0.9033071 0.9104610
## [22] 0.9176148 0.9247687 0.9319225 0.9390763 0.9462302 0.9533840 0.9605378
## [29] 0.9676917 0.9748455 0.9819994 0.9891532 0.9963070 1.0034609 1.0106147
##
## $se
## Time Series:
## Start = 2016
## End = 2050
## Frequency = 1
## [1] 0.09909556 0.11564576 0.12175580 0.12757353 0.13313729 0.13847769
## [7] 0.14361964 0.14858376 0.15338730 0.15804492 0.16256915 0.16697084
## [13] 0.17125943 0.17544322 0.17952954 0.18352490 0.18743511 0.19126540
## [19] 0.19502047 0.19870459 0.20232164 0.20587515 0.20936836 0.21280424
## [25] 0.21618551 0.21951471 0.22279416 0.22602604 0.22921235 0.23235497
## [31] 0.23545565 0.23851603 0.24153763 0.24452190 0.24747019
In the next chapter, you will learn how to analyze seasonal time series data.
Pure seasonal models
Consider pure seasonal models such as an SAR\((P = 1)_{s=12}\)
\(X_t = \Phi X_{t-12} + W_t\)
Fit a pure seasonal model As with other models, you can fit seasonal models in R using the sarima()
command in the astsa package.
To get a feeling of how pure seasonal models work, it is best to consider simulated data. We generated 250 observations from a pure seasonal model given by
\(X_t = .9X_{t−12} + Wt + .5W_{t−12}\),
which we would denote as a SARMA\((P = 1, Q = 1)_S = 12\). Three years of data and the model ACF and PACF are plotted for you.
You will compare the sample ACF and PACF values from the generated data to the true values.
The astsa
package is preloaded for you and the generated data are in x
.
# Plot sample P/ACF to lag 60 and compare to the true values
acf2(x, max.lag = 60)
# Fit the seasonal model to x
sarima(x, p = 0, d = 0, q = 0, P = 1, D = 0, Q = 1, S = 12)
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
sar1 sma1 xmean
0.9311 0.4824 -0.5766
s.e. 0.0204 0.0633 0.8797
sigma^2 estimated as 0.9767: log likelihood = -372.74, aic = 753.48
$degrees_of_freedom
[1] 249
$ttable
Estimate SE t.value p.value
sar1 0.9311 0.0204 45.6189 0.0000
sma1 0.4824 0.0633 7.6158 0.0000
xmean -0.5766 0.8797 -0.6554 0.5128
$AIC
[1] 2.989992
$AICc
[1] 2.990377
$BIC
[1] 3.046015
Fitting a seasonal model using sarima()
requires a few additional arguments, but follows the same basic process as nonseasonal models.
Mixed seasonal models
Fit a mixed seasonal model
Pure seasonal dependence such as that explored earlier in this chapter is relatively rare. Most seasonal time series have mixed dependence, meaning only some of the variation is explained by seasonal trends.
Recall that the full seasonal model is denoted by SARIMA\((p,d,q)\)x\((P,D,Q)_S\) where capital letters denote the seasonal orders.
As before, this exercise asks you to compare the sample P/ACF pair to the true values for some simulated seasonal data and fit a model to the data using sarima()
. This time, the simulated data come from a mixed seasonal model, SARIMA\((0,0,1)\)x\((0,0,1)_{12}\). The plots show three years of data, as well as the model ACF and PACF. Notice that, as opposed to the pure seasonal model, there are correlations at the nonseasonal lags as well as the seasonal lags.
As always, the astsa package is preloaded. The generated data are in x
.
# Plot sample P/ACF pair to lag 60 and compare to actual
acf2(x, max.lag = 60)
# Fit the seasonal model to x
sarima(x, p = 0, d = 0, q = 1, P = 0, D = 0, Q = 1, S = 12)
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 sma1 xmean
-0.6142 0.7888 0.0784
s.e. 0.0565 0.0475 0.0430
sigma^2 estimated as 1.005: log likelihood = -364.26, aic = 736.52
$degrees_of_freedom
[1] 249
$ttable
Estimate SE t.value p.value
ma1 -0.6142 0.0565 -10.8809 0.0000
sma1 0.7888 0.0475 16.6094 0.0000
xmean 0.0784 0.0430 1.8253 0.0692
$AIC
[1] 2.922696
$AICc
[1] 2.92308
$BIC
[1] 2.978718
Time series data collected on a seasonal basis typically have mixed dependence. For example, what happens in June is often related to what happened in May as well as what happened in June of last year.
Data analysis - unemployment I
In the video, we fit a seasonal ARIMA model to the log of the monthly AirPassengers
data set. You will now start to fit a seasonal ARIMA model to the monthly US unemployment data, unemp
, from the astsa
package.
The first thing to do is to plot the data, notice the trend and the seasonal persistence. Then look at the detrended data and remove the seasonal persistence. After that, the fully differenced data should look stationary.
The astsa package is preloaded in your workspace.
# Plot unemp
plot(unemp, main = "Unemployment")
# Difference your data and plot
d_unemp <- diff(unemp)
plot(d_unemp, main = "Differenced Unemployment")
# Plot seasonal differenced diff_unemp
dd_unemp <- diff(d_unemp, lag = 12)
plot(dd_unemp, main = "Seasonal Differenced Unemployment")
Now that you have removed the trend and seasonal variation in unemployment, the data appear to be stationary.
Data analysis - unemployment II
Now, you will continue fitting an SARIMA model to the monthly US unemployment unemp
time series by looking at the sample ACF and PACF of the fully differenced series.
Note that the lag axis in the sample P/ACF plot is in terms of years. Thus, lags 1, 2, 3, … represent 1 year (12 months), 2 years (24 months), 3 years (36 months), …
Once again, the astsa package is preloaded in your workspace.
# Plot P/ACF pair of the fully differenced data to lag 60
dd_unemp <- diff(diff(unemp), lag = 12)
acf2(dd_unemp, max.lag = 60)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.21 0.33 0.15 0.17 0.10 0.06 -0.06 -0.02 -0.09 -0.17 -0.08 -0.48 -0.18
## PACF 0.21 0.29 0.05 0.05 0.01 -0.02 -0.12 -0.03 -0.05 -0.15 0.02 -0.43 -0.02
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.16 -0.11 -0.15 -0.09 -0.09 0.03 -0.01 0.02 -0.02 0.01 -0.02 0.09
## PACF 0.15 0.03 -0.04 -0.01 0.00 0.01 0.01 -0.01 -0.16 0.01 -0.27 0.05
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF -0.05 -0.01 0.03 0.08 0.01 0.03 -0.05 0.01 0.02 -0.06 -0.02 -0.12
## PACF -0.01 -0.05 0.05 0.09 -0.04 0.02 -0.07 -0.01 -0.08 -0.08 -0.23 -0.08
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF 0.01 -0.03 -0.03 -0.10 -0.02 -0.13 0.00 -0.06 0.01 0.02 0.11 0.13
## PACF 0.06 -0.07 -0.01 0.03 -0.03 -0.11 -0.04 0.01 0.00 -0.03 -0.04 0.02
## [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF 0.10 0.07 0.10 0.12 0.06 0.14 0.05 0.04 0.04 0.07 -0.03
## PACF 0.03 -0.05 0.02 0.02 -0.08 0.00 -0.03 -0.07 0.05 0.04 -0.04
# Fit an appropriate model
sarima(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)
## initial value 3.340809
## iter 2 value 3.105512
## iter 3 value 3.086631
## iter 4 value 3.079778
## iter 5 value 3.069447
## iter 6 value 3.067659
## iter 7 value 3.067426
## iter 8 value 3.067418
## iter 8 value 3.067418
## final value 3.067418
## converged
## initial value 3.065481
## iter 2 value 3.065478
## iter 3 value 3.065477
## iter 3 value 3.065477
## iter 3 value 3.065477
## final value 3.065477
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sma1
## 0.1351 0.2464 -0.6953
## s.e. 0.0513 0.0515 0.0381
##
## sigma^2 estimated as 449.6: log likelihood = -1609.91, aic = 3227.81
##
## $degrees_of_freedom
## [1] 356
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.1351 0.0513 2.6326 0.0088
## ar2 0.2464 0.0515 4.7795 0.0000
## sma1 -0.6953 0.0381 -18.2362 0.0000
##
## $AIC
## [1] 8.723811
##
## $AICc
## [1] 8.723988
##
## $BIC
## [1] 8.765793
As always, keep a close eye on the output from your sarima()
command to get a feel for the fit of your model.
Data analysis - commodity prices
Making money in commodities is not easy. Most commodities traders lose money rather than make it. The package astsa
includes the data set chicken
, which is the monthly whole bird spot price, Georgia docks, US cents per pound, from August, 2001 to July, 2016.
The astsa package is preloaded in your R console and the data are plotted for you, note the trend and seasonal components.
First, you will use your skills to carefully fit an SARIMA model to the commodity. Later, you will use the fitted model to try and forecast the whole bird spot price.
After removing the trend, the sample ACF and PACF suggest an AR(2) model because the PACF cuts off after lag 2 and the ACF tails off. However, the ACF has a small seasonal component remaining. This can be taken care of by fitting an addition SAR(1) component.
By the way, if you are interested in analyzing other commodities from various regions, you can find many different time series at index mundi.
plot(chicken, ylab = "Cents per Pound", main = "Chicken Prices")
# Plot differenced chicken
plot(diff(chicken))
# Plot P/ACF pair of differenced data to lag 60
acf2(diff(chicken), max.lag = 60)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.72 0.39 0.09 -0.07 -0.16 -0.20 -0.27 -0.23 -0.11 0.09 0.26 0.33
## PACF 0.72 -0.29 -0.14 0.03 -0.10 -0.06 -0.19 0.12 0.10 0.16 0.09 0.00
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.20 0.07 -0.03 -0.10 -0.19 -0.25 -0.29 -0.20 -0.08 0.08 0.16 0.18
## PACF -0.22 0.03 0.03 -0.11 -0.09 0.01 -0.03 0.07 -0.04 0.06 -0.05 0.02
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF 0.08 -0.06 -0.21 -0.31 -0.40 -0.40 -0.33 -0.18 0.02 0.20 0.30 0.35
## PACF -0.14 -0.19 -0.13 -0.06 -0.08 -0.05 0.01 0.03 0.10 0.02 -0.01 0.09
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.26 0.13 -0.02 -0.14 -0.23 -0.21 -0.18 -0.11 -0.03 0.08 0.21 0.33
## PACF -0.12 0.01 -0.01 -0.05 0.02 0.12 -0.05 -0.13 -0.07 0.01 0.14 0.05
## [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF 0.26 0.12 -0.01 -0.11 -0.13 -0.09 -0.09 -0.06 0.03 0.17 0.29 0.32
## PACF -0.20 -0.01 0.07 -0.04 0.02 0.00 -0.08 0.03 0.04 0.00 0.01 0.03
# Fit ARIMA(2,1,0) to chicken - not so good
sarima(chicken, p = 2, d = 1, q = 0)
## initial value 0.001863
## iter 2 value -0.156034
## iter 3 value -0.359181
## iter 4 value -0.424164
## iter 5 value -0.430212
## iter 6 value -0.432744
## iter 7 value -0.432747
## iter 8 value -0.432749
## iter 9 value -0.432749
## iter 10 value -0.432751
## iter 11 value -0.432752
## iter 12 value -0.432752
## iter 13 value -0.432752
## iter 13 value -0.432752
## iter 13 value -0.432752
## final value -0.432752
## converged
## initial value -0.420883
## iter 2 value -0.420934
## iter 3 value -0.420936
## iter 4 value -0.420937
## iter 5 value -0.420937
## iter 6 value -0.420937
## iter 6 value -0.420937
## iter 6 value -0.420937
## final value -0.420937
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 constant
## 0.9494 -0.3069 0.2632
## s.e. 0.0717 0.0718 0.1362
##
## sigma^2 estimated as 0.4286: log likelihood = -178.64, aic = 365.28
##
## $degrees_of_freedom
## [1] 176
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.9494 0.0717 13.2339 0.0000
## ar2 -0.3069 0.0718 -4.2723 0.0000
## constant 0.2632 0.1362 1.9328 0.0549
##
## $AIC
## [1] 2.040695
##
## $AICc
## [1] 2.041461
##
## $BIC
## [1] 2.111922
# Fit SARIMA(2,1,0,1,0,0,12) to chicken - that works
sarima(chicken, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)
## initial value 0.015039
## iter 2 value -0.226398
## iter 3 value -0.412955
## iter 4 value -0.460882
## iter 5 value -0.470787
## iter 6 value -0.471082
## iter 7 value -0.471088
## iter 8 value -0.471090
## iter 9 value -0.471092
## iter 10 value -0.471095
## iter 11 value -0.471095
## iter 12 value -0.471096
## iter 13 value -0.471096
## iter 14 value -0.471096
## iter 15 value -0.471097
## iter 16 value -0.471097
## iter 16 value -0.471097
## iter 16 value -0.471097
## final value -0.471097
## converged
## initial value -0.473585
## iter 2 value -0.473664
## iter 3 value -0.473721
## iter 4 value -0.473823
## iter 5 value -0.473871
## iter 6 value -0.473885
## iter 7 value -0.473886
## iter 8 value -0.473886
## iter 8 value -0.473886
## iter 8 value -0.473886
## final value -0.473886
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sar1 constant
## 0.9154 -0.2494 0.3237 0.2353
## s.e. 0.0733 0.0739 0.0715 0.1973
##
## sigma^2 estimated as 0.3828: log likelihood = -169.16, aic = 348.33
##
## $degrees_of_freedom
## [1] 175
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.9154 0.0733 12.4955 0.0000
## ar2 -0.2494 0.0739 -3.3728 0.0009
## sar1 0.3237 0.0715 4.5238 0.0000
## constant 0.2353 0.1973 1.1923 0.2347
##
## $AIC
## [1] 1.945971
##
## $AICc
## [1] 1.947256
##
## $BIC
## [1] 2.035005
You have successfully fit an ARIMA model to a commodity. If you are interested in analyzing other commodities from various regions, you can find many different time series at index mundi
Data analysis - birth rate
Now you will use your new skills to carefully fit an SARIMA model to the birth
time series from astsa
. The data are monthly live births (adjusted) in thousands for the United States, 1948-1979, and includes the baby boom after WWII.
The birth
data are plotted in your R console. Note the long-term trend (random walk) and the seasonal component of the data.
plot(birth, main="US Birth Rate")
# Plot P/ACF to lag 60 of differenced data
d_birth <- diff(birth)
acf2(d_birth, max.lag = 60)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.32 0.16 -0.08 -0.19 0.09 -0.28 0.06 -0.19 -0.05 0.17 -0.26 0.82
## PACF -0.32 0.06 -0.01 -0.25 -0.03 -0.26 -0.17 -0.29 -0.35 -0.16 -0.59 0.57
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF -0.28 0.17 -0.07 -0.18 0.08 -0.28 0.07 -0.18 -0.05 0.16 -0.24 0.78
## PACF 0.13 0.11 0.13 0.09 0.00 0.00 0.05 0.04 -0.07 -0.10 -0.20 0.19
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF -0.27 0.19 -0.08 -0.17 0.07 -0.29 0.07 -0.15 -0.04 0.14 -0.24 0.75
## PACF 0.01 0.05 0.07 0.07 -0.02 -0.06 -0.02 0.09 0.03 -0.06 -0.16 0.03
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF -0.23 0.16 -0.08 -0.15 0.05 -0.25 0.06 -0.18 -0.03 0.15 -0.22 0.72
## PACF 0.08 -0.10 -0.03 0.07 -0.04 0.06 0.04 -0.07 -0.06 0.02 -0.04 0.10
## [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF -0.24 0.16 -0.08 -0.13 0.05 -0.26 0.05 -0.17 -0.02 0.15 -0.23 0.70
## PACF 0.01 0.00 -0.03 0.04 0.03 0.00 -0.01 0.01 0.03 0.04 -0.09 0.04
# Plot P/ACF to lag 60 of seasonal differenced data
dd_birth <- diff(d_birth, lag = 12)
acf2(dd_birth, max.lag = 60)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF -0.3 -0.09 -0.09 0.00 0.07 0.03 -0.07 -0.04 0.11 0.04 0.13 -0.43 0.14
## PACF -0.3 -0.20 -0.21 -0.14 -0.03 0.02 -0.06 -0.08 0.06 0.08 0.23 -0.32 -0.06
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.01 0.03 0.01 0.02 0.00 0.03 -0.07 -0.01 0 0.06 -0.01 -0.12
## PACF -0.13 -0.13 -0.11 0.02 0.06 0.04 -0.10 0.02 0 0.17 -0.13 -0.14
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.17 -0.04 0.03 -0.05 -0.09 -0.01 0.19 -0.03 -0.09 -0.02 -0.04 0.17
## PACF 0.07 -0.04 -0.02 0.02 -0.06 -0.07 0.05 0.07 -0.06 0.05 -0.16 -0.01
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF -0.14 0.03 -0.05 0.03 0.10 0 -0.10 -0.03 0.06 0.02 0.01 -0.01
## PACF -0.04 -0.01 -0.03 -0.01 0.01 0 0.03 -0.02 -0.07 0.05 -0.11 0.05
## [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF 0.06 -0.08 0.03 0.01 -0.02 -0.01 0.00 -0.07 0.17 -0.04 -0.01
## PACF 0.06 -0.03 -0.03 0.04 0.02 -0.04 -0.01 -0.13 0.07 0.07 -0.05
# Fit SARIMA(0,1,1)x(0,1,1)_12. What happens?
sarima(birth, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)
## initial value 2.219164
## iter 2 value 2.013310
## iter 3 value 1.988107
## iter 4 value 1.980026
## iter 5 value 1.967594
## iter 6 value 1.965384
## iter 7 value 1.965049
## iter 8 value 1.964993
## iter 9 value 1.964992
## iter 9 value 1.964992
## iter 9 value 1.964992
## final value 1.964992
## converged
## initial value 1.951264
## iter 2 value 1.945867
## iter 3 value 1.945729
## iter 4 value 1.945723
## iter 5 value 1.945723
## iter 5 value 1.945723
## iter 5 value 1.945723
## final value 1.945723
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 sma1
## -0.4734 -0.7861
## s.e. 0.0598 0.0451
##
## sigma^2 estimated as 47.4: log likelihood = -1211.28, aic = 2428.56
##
## $degrees_of_freedom
## [1] 358
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.4734 0.0598 -7.9097 0
## sma1 -0.7861 0.0451 -17.4227 0
##
## $AIC
## [1] 6.545975
##
## $AICc
## [1] 6.546062
##
## $BIC
## [1] 6.577399
# Fit another model, this time with an AR
sarima(birth, p = 1, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)
## initial value 2.218186
## iter 2 value 2.032584
## iter 3 value 1.982464
## iter 4 value 1.975643
## iter 5 value 1.971721
## iter 6 value 1.967284
## iter 7 value 1.963840
## iter 8 value 1.961106
## iter 9 value 1.960849
## iter 10 value 1.960692
## iter 11 value 1.960683
## iter 12 value 1.960675
## iter 13 value 1.960672
## iter 13 value 1.960672
## iter 13 value 1.960672
## final value 1.960672
## converged
## initial value 1.940459
## iter 2 value 1.934425
## iter 3 value 1.932752
## iter 4 value 1.931750
## iter 5 value 1.931074
## iter 6 value 1.930882
## iter 7 value 1.930860
## iter 8 value 1.930859
## iter 8 value 1.930859
## final value 1.930859
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 sma1
## 0.3038 -0.7006 -0.8000
## s.e. 0.0865 0.0604 0.0441
##
## sigma^2 estimated as 45.91: log likelihood = -1205.93, aic = 2419.85
##
## $degrees_of_freedom
## [1] 357
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3038 0.0865 3.5104 5e-04
## ma1 -0.7006 0.0604 -11.5984 0e+00
## sma1 -0.8000 0.0441 -18.1302 0e+00
##
## $AIC
## [1] 6.522519
##
## $AICc
## [1] 6.522695
##
## $BIC
## [1] 6.564418
The residual analysis from the first fit indicated that the residuals were not white noise. Hence, it was necessary to include an additional nonseasonal AR term to account for the extra correlation.
Forecasting seasonal ARIMA
Forecasting monthly unemployment
Previously, you fit an SARIMA\((2,1,0, 0,1,1)_{12}\) model to the monthly US unemployment time series unemp
. You will now use that model to forecast the data 3 years.
The unemp
data are preloaded into your R workspace and plotted.
plot(unemp)
# Fit your previous model to unemp and check the diagnostics
sarima(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)
## initial value 3.340809
## iter 2 value 3.105512
## iter 3 value 3.086631
## iter 4 value 3.079778
## iter 5 value 3.069447
## iter 6 value 3.067659
## iter 7 value 3.067426
## iter 8 value 3.067418
## iter 8 value 3.067418
## final value 3.067418
## converged
## initial value 3.065481
## iter 2 value 3.065478
## iter 3 value 3.065477
## iter 3 value 3.065477
## iter 3 value 3.065477
## final value 3.065477
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sma1
## 0.1351 0.2464 -0.6953
## s.e. 0.0513 0.0515 0.0381
##
## sigma^2 estimated as 449.6: log likelihood = -1609.91, aic = 3227.81
##
## $degrees_of_freedom
## [1] 356
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.1351 0.0513 2.6326 0.0088
## ar2 0.2464 0.0515 4.7795 0.0000
## sma1 -0.6953 0.0381 -18.2362 0.0000
##
## $AIC
## [1] 8.723811
##
## $AICc
## [1] 8.723988
##
## $BIC
## [1] 8.765793
# Forecast the data 3 years into the future
sarima.for(unemp, n.ahead = 36, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)
## $pred
## Jan Feb Mar Apr May Jun Jul Aug
## 1979 676.4664 685.1172 653.2388 585.6939 553.8813 664.4072 647.0657 611.0828
## 1980 683.3045 687.7649 654.8658 586.1507 553.9285 664.1108 646.6220 610.5345
## 1981 682.6406 687.0977 654.1968 585.4806 553.2579 663.4398 645.9508 609.8632
## Sep Oct Nov Dec
## 1979 594.6414 569.3997 587.5801 581.1833
## 1980 594.0427 568.7684 586.9320 580.5249
## 1981 593.3713 568.0970 586.2606 579.8535
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 1979 21.20465 32.07710 43.70167 53.66329 62.85364 71.12881 78.73590
## 1980 116.99599 124.17344 131.51281 138.60466 145.49706 152.12863 158.52302
## 1981 194.25167 201.10648 208.17066 215.11503 221.96039 228.64285 235.16874
## Aug Sep Oct Nov Dec
## 1979 85.75096 92.28663 98.41329 104.19488 109.67935
## 1980 164.68623 170.63839 176.39520 181.97333 187.38718
## 1981 241.53258 247.74268 253.80549 259.72970 265.52323
As you can see, the forecast is able to replicate much of the seasonal variation in the original unemployment data.
How hard is it to forecast commodity prices?
As previously mentioned, making money in commodities is not easy. To see a difficulty in predicting a commodity, you will forecast the price of chicken to five years in the future. When you complete your forecasts, you will note that even just a few years out, the acceptable range of prices is very large. This is because commodities are subject to many sources of variation.
Recall that you previously fit an SARIMA\((2,1,0, 1,0,0)_{12}\) model to the monthly US chicken price series chicken
. You will use this model to calculate your forecasts.
The astsa package is preloaded for you and the monthly price of chicken data (chicken
) are plotted.
plot(chicken)
# Fit the chicken model again and check diagnostics
sarima(chicken, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)
## initial value 0.015039
## iter 2 value -0.226398
## iter 3 value -0.412955
## iter 4 value -0.460882
## iter 5 value -0.470787
## iter 6 value -0.471082
## iter 7 value -0.471088
## iter 8 value -0.471090
## iter 9 value -0.471092
## iter 10 value -0.471095
## iter 11 value -0.471095
## iter 12 value -0.471096
## iter 13 value -0.471096
## iter 14 value -0.471096
## iter 15 value -0.471097
## iter 16 value -0.471097
## iter 16 value -0.471097
## iter 16 value -0.471097
## final value -0.471097
## converged
## initial value -0.473585
## iter 2 value -0.473664
## iter 3 value -0.473721
## iter 4 value -0.473823
## iter 5 value -0.473871
## iter 6 value -0.473885
## iter 7 value -0.473886
## iter 8 value -0.473886
## iter 8 value -0.473886
## iter 8 value -0.473886
## final value -0.473886
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sar1 constant
## 0.9154 -0.2494 0.3237 0.2353
## s.e. 0.0733 0.0739 0.0715 0.1973
##
## sigma^2 estimated as 0.3828: log likelihood = -169.16, aic = 348.33
##
## $degrees_of_freedom
## [1] 175
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.9154 0.0733 12.4955 0.0000
## ar2 -0.2494 0.0739 -3.3728 0.0009
## sar1 0.3237 0.0715 4.5238 0.0000
## constant 0.2353 0.1973 1.1923 0.2347
##
## $AIC
## [1] 1.945971
##
## $AICc
## [1] 1.947256
##
## $BIC
## [1] 2.035005
# Forecast the chicken data 5 years into the future
sarima.for(chicken, n.ahead = 60, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)
## $pred
## Jan Feb Mar Apr May Jun Jul Aug
## 2016 111.0907
## 2017 110.5358 110.5612 110.5480 110.7055 111.0047 111.1189 111.1552 111.1948
## 2018 111.8108 111.9782 112.1330 112.3431 112.5991 112.7952 112.9661 113.1380
## 2019 114.1331 114.3464 114.5556 114.7827 115.0247 115.2473 115.4617 115.6765
## 2020 116.7942 117.0224 117.2492 117.4819 117.7193 117.9505 118.1790 118.4077
## 2021 119.5651 119.7980 120.0306 120.2650 120.5010 120.7350 120.9681
## Sep Oct Nov Dec
## 2016 110.8740 110.6853 110.5045 110.5527
## 2017 111.2838 111.3819 111.4825 111.6572
## 2018 113.3260 113.5168 113.7085 113.9242
## 2019 115.8965 116.1174 116.3386 116.5675
## 2020 118.6380 118.8686 119.0993 119.3326
## 2021
##
## $se
## Jan Feb Mar Apr May Jun
## 2016
## 2017 3.7414959 4.1793190 4.5747009 4.9373266 5.2742129 5.5903499
## 2018 8.2010253 8.5605811 8.9054714 9.2372195 9.5572539 9.8667955
## 2019 12.0038164 12.2921541 12.5738417 12.8492868 13.1188976 13.3830477
## 2020 15.1557253 15.3959082 15.6323906 15.8653300 16.0948844 16.3212022
## 2021 17.8397890 18.0473081 18.2524651 18.4553364 18.6559977 18.8545213
## Jul Aug Sep Oct Nov Dec
## 2016 0.6187194 1.3368594 2.0462419 2.6867986 3.2486625
## 2017 5.8893133 6.2367345 6.6253573 7.0309771 7.4344077 7.8255932
## 2018 10.1668604 10.4736807 10.7857727 11.0980056 11.4063211 11.7085266
## 2019 13.6420693 13.9002670 14.1573839 14.4122197 14.6638269 14.9117124
## 2020 16.5444204 16.7657634 16.9852163 17.2025022 17.4174076 17.6298379
## 2021 19.0509752
You have now mastered the process of detrending your time series data, exploring the qualities of the data to determine an appropriate model, fitting and adjusting a model, and even forecasting based on the model!
Congratulations!
What you’ve learned
Don’t stop here
astsa
package