ARIMA Models in R

Time series regression models

Regression: \(Y_i = \beta X_i + \epsilon_i\), where \(\epsilon_i\) is white noise - assumptions about errors: - independent - normally distributed - homoscedastic - i.e., white noise - white noise: - independent normals with common variance - is basic building block of time series

AutoRegression: \(X_t = \phi X_{t-1} + \epsilon_t\) (\(\epsilon_t\) is white noise) - assuming that \(t\) and \(t-1\) are not correlated - may lead to bad forecasts

Moving Average: \(\epsilon_t = W_t + \theta W_{t-1}\) (\(W_t\) is white noise)
- moving average assumes \(t\) and \(t-1\) are correlated

ARMA: \(X_t = \phi X_{t-1} + W_t + \theta W_{t-1}\) - AutoRegression + Moving Average - i.e., AutoRegression with autocorrelated errors

Data play

In the video, you saw various types of data. In this exercise, you will plot additional time series data and compare them to what you saw in the video. It is useful to think about how these time series compare to the series in the video. In particular, concentrate on the type of trend, seasonality or periodicity, and homoscedasticity.

Before you use a data set for the first time, you should use the help system to see the details of the data. For example, use help(AirPassengers) or ?AirPassengers to see the details of the series.

library(pacman)
p_load(astsa, xts)

# View a detailed description of AirPassengers
help(AirPassengers)

## starting httpd help server ... done

# Plot AirPassengers
plot(AirPassengers)

# Plot the DJIA daily closings
plot(astsa::djia$Close)

# Plot the Southern Oscillation Index (soi)
plot(astsa::soi)

As you can see, the AirPassengers dataset contains monthly information on airline passengers from 1949 through 1960. Note that when you plot ts objects using plot(), the data will automatically be plotted over time.

The AirPassengers data show a handful of important qualities, including seasonality, trend, and heteroscedasticity, which distinguish the data from standard white noise.

Stationarity and nonstationarity

A time series is stationary when it is “stable”, meaning: - the mean is constant over time (no trend) - the correlation structure remains constant over time

Stationarity

Given data, \(x_1,...,x_n\) we can estimate by averaging

For example, if the mean is constant, we can estimate it by the sample average \(\bar{x}\)

Pairs can be used to estimate correlation on different lags: - \((x_1,x_2), (x_2,x_3), (x_3,x_4)\), for lag 1 - \((x_1,x_3), (x_2,x_4), (x_3,x_5)\), for lag 2

Random Walk trend

Not stationary, but differenced data are stationary - e.g., \(X_t\) for global temperatures trends upwards - \(X_t-X_{t-1}\) is stationary with random movement

Trend Stationarity

Stationary around a trend, differencing still works like Random Walk!

Nonstationarity in trend and variability - First log (stabilize variance and/or linearize trend), then difference - \(X_t\) - \(log(X_t)\) - \(log(X_t) - log(X_{t-1})\)

Differencing

As seen in the video, when a time series is trend stationary, it will have stationary behavior around a trend. A simple example is \(Y_t=\alpha+\beta t+Xt\) where \(X_t\) is stationary.

A different type of model for trend is random walk, which has the form \(X_t=X_{t−1}+W_t\), where \(W_t\) is white noise. It is called a random walk because at time \(t\) the process is where it was at time \(t−1\) plus a completely random movement. For a random walk with drift, a constant is added to the model and will cause the random walk to drift in the direction (positive or negative) of the drift.

We simulated and plotted data from these models. Note the difference in the behavior of the two models.

In both cases, simple differencing can remove the trend and coerce the data to stationarity. Differencing looks at the difference between the value of a time series at a certain point in time and its preceding value. That is, \(X_t−X_{t−1}\) is computed.

To check that it works, you will difference each generated time series and plot the detrended series. If a time series is in x, then diff(x) will have the detrended series obtained by differencing the data. To plot the detrended series, simply use plot(diff(x)).

# Plot detrended y (trend stationary)
plot(diff(y))

# Plot detrended x (random walk)
plot(diff(x))

As you can see, differencing both your trend stationary data and your random walk data has the effect of removing the trends, despite the important differences between the two datasets.

Detrending data

As you have seen in the previous exercise, differencing is generally good for removing trend from time series data. Recall that differencing looks at the difference between the value of a time series at a certain point in time and its preceding value.

In this exercise, you will use differencing diff() to detrend and plot real time series data.

# Plot globtemp and detrended globtemp
par(mfrow = c(2,1))
plot(astsa::globtemp) 
plot(diff(astsa::globtemp))

# Plot cmort and detrended cmort
par(mfrow = c(2,1))
plot(astsa::cmort)
plot(diff(astsa::cmort))

Differencing is a great way to remove trends from your data.

Dealing with trend and heteroscedasticity Here, we will coerce nonstationary data to stationarity by calculating the return or growth rate as follows.

Often time series are generated as

\(X_t=(1+p_t)X_{t−1}\)

meaning that the value of the time series observed at time \(t\) equals the value observed at time \(t−1\) and a small percent change \(p_t\) at time \(t\).

A simple deterministic example is putting money into a bank with a fixed interest \(p\). In this case, \(X_t\) is the value of the account at time period \(t\) with an initial deposit of \(X_0\).

Typically, \(p_t\) is referred to as the return or growth rate of a time series, and this process is often stable.

For reasons that are outside the scope of this course, it can be shown that the growth rate \(p_t\) can be approximated by

\(Y_t = logXt−logX_{t−1} ≈ p_t\)

In R, \(p_t\) is often calculated as diff(log(x)) and plotting it can be done in one line plot(diff(log(x))).

# astsa and xts are preloaded 

# Plot GNP series (gnp) and its growth rate
par(mfrow = c(2,1))
plot(astsa::gnp)
plot(diff(log(astsa::gnp)))

# Plot the DJIA closings (djia$Close) and its returns
par(mfrow = c(2,1))
plot(astsa::djia$Close)
plot(diff(log((astsa::djia$Close))))

Once again, by combining a few commands to manipulate your data, you can coerce otherwise nonstationary data to stationarity.

Stationary time series: ARMA

Wold Decomposition

Wold proved that any stationary time series may be represented as a linear combination of white noise:

\(X_t = W_t + a_1W_{t-1} + a_2W_{t-2} + ...\)

for constants \(a_1, a_2, ...\)

Any ARMA model has this form, which means they are suited to modeling time series

Note: Special case of MA(q) is already in this form, where constants are 0 after q-th term.

**Generating ARMA using arima.sim() - Basic syntax

arima.sim(model, n, ...)

model is a list with order of the model as c(p, d, q) and the coefficients
- p - AR order
- d - to be discussed
- q - MA order

Simulating ARMA models

As we saw in the video, any stationary time series can be written as a linear combination of white noise. In addition, any ARMA model has this form, so it is a good choice for modeling stationary time series.

R provides a simple function called arima.sim() to generate data from an ARMA model. For example, the syntax for generating 100 observations from an MA(1) with parameter .9 is arima.sim(model = list(order = c(0, 0, 1), ma = .9 ), n = 100). You can also use order = c(0, 0, 0) to generate white noise.

In this exercise, you will generate data from various ARMA models. For each command, generate 200 observations and plot the result.

# Generate and plot white noise
WN <- arima.sim(model = list(order = c(0, 0, 0)), n = 200)
plot(WN)

# Generate and plot an MA(1) with parameter .9 by filtering the noise
MA <- arima.sim(model = list(order = c(0, 0, 1), ma = .9), n = 200)  
plot(MA)

# Generate and plot an AR(1) with parameters 1.5 and -.75
AR <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200) 
plot(AR)

The arima.sim() command is a very useful way to quickly simulate time series data.

AR and MA models

Identifying model types

Estimation

Estimation for time series is similar to using least squares for regression
Estimates are obtained numerically using ideas of Gauss and Newton

Fitting an AR(1) model

Recall that you use the ACF and PACF pair to help identify the orders \(p\) and \(q\) of an ARMA model.

In this exercise, you will generate data from the AR(1) model,

\(X_t=.9X_{t−1}+W_t\),

look at the simulated data and the sample ACF and PACF pair to determine the order. Then, you will fit the model and compare the estimated parameters to the true parameters.

Throughout this course, you will be using sarima() from the astsa package to easily fit models to data. The command produces a residual diagnostic graphic that can be ignored until diagnostics is discussed later in the chapter.

# Generate 100 observations from the AR(1) model
x <- arima.sim(model = list(order = c(1, 0, 0), ar = .9), n = 100)

# Plot the generated data  
plot(x)

# Plot the sample P/ACF pair
astsa::acf2(x)

##      [,1] [,2]  [,3]  [,4]  [,5] [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.88 0.80  0.72  0.64  0.56 0.50  0.43  0.37  0.31  0.28   0.2  0.13  0.05
## PACF 0.88 0.12 -0.03 -0.02 -0.08 0.04 -0.02 -0.05 -0.01  0.06  -0.2 -0.04 -0.12
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF  -0.03 -0.07 -0.12 -0.14 -0.18 -0.21 -0.21
## PACF -0.09  0.12 -0.06  0.06 -0.09 -0.04  0.13

# Fit an AR(1) to the data and examine the t-table
astsa::sarima(x, p = 1, d = 0, q = 0)

## initial  value 0.846956 
## iter   2 value 0.041752
## iter   3 value 0.040535
## iter   4 value 0.040504
## iter   5 value 0.040462
## iter   6 value 0.040408
## iter   7 value 0.040389
## iter   8 value 0.040386
## iter   9 value 0.040368
## iter  10 value 0.040368
## iter  10 value 0.040368
## iter  10 value 0.040368
## final  value 0.040368 
## converged
## initial  value 0.062199 
## iter   2 value 0.061300
## iter   3 value 0.059627
## iter   4 value 0.059327
## iter   5 value 0.059084
## iter   6 value 0.058845
## iter   7 value 0.058767
## iter   8 value 0.058584
## iter   9 value 0.058577
## iter  10 value 0.058575
## iter  11 value 0.058564
## iter  12 value 0.058564
## iter  13 value 0.058564
## iter  13 value 0.058564
## iter  13 value 0.058564
## final  value 0.058564 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##         ar1   xmean
##       0.903  0.4713
## s.e.  0.043  1.0156
## 
## sigma^2 estimated as 1.105:  log likelihood = -147.75,  aic = 301.5
## 
## $degrees_of_freedom
## [1] 98
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1     0.9030 0.0430 21.0132  0.0000
## xmean   0.4713 1.0156  0.4641  0.6436
## 
## $AIC
## [1] 3.015005
## 
## $AICc
## [1] 3.016242
## 
## $BIC
## [1] 3.09316

As you can see, the sarima() command provides extensive output to understand the results of your model fit. What do you glean from this output?

Fitting an AR(2) model

For this exercise, we generated data from the AR(2) model,

\(X_t=1.5X_{t−1}−.75X_{t−2}+W_t\)

Look at the simulated dataand the sample ACF and PACF pair to determine the model order. Then fit the model and compare the estimated parameters to the true parameters.

x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200)

# Plot x
plot(x)

# Plot the sample P/ACF of x
astsa::acf2(x)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12]
## ACF  0.85  0.53  0.16 -0.15 -0.34 -0.37 -0.28 -0.11 0.08  0.25  0.36  0.39
## PACF 0.85 -0.73 -0.02  0.02  0.05  0.08 -0.02  0.09 0.11  0.11  0.01  0.01
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.33  0.22  0.08 -0.07 -0.19 -0.27 -0.29 -0.23 -0.12  0.02  0.16  0.26
## PACF  0.04  0.01 -0.03 -0.10 -0.03 -0.10  0.01  0.06 -0.03  0.02  0.10 -0.02
##      [,25]
## ACF   0.29
## PACF  0.00

# Fit an AR(2) to the data and examine the t-table
astsa::sarima(x, p = 2, d = 0, q = 0)

## initial  value 1.083715 
## iter   2 value 0.960271
## iter   3 value 0.533310
## iter   4 value 0.301466
## iter   5 value 0.102656
## iter   6 value 0.038199
## iter   7 value 0.011531
## iter   8 value 0.003597
## iter   9 value 0.003589
## iter  10 value 0.003574
## iter  11 value 0.003572
## iter  12 value 0.003570
## iter  13 value 0.003570
## iter  14 value 0.003570
## iter  15 value 0.003570
## iter  15 value 0.003570
## final  value 0.003570 
## converged
## initial  value 0.010372 
## iter   2 value 0.010350
## iter   3 value 0.010329
## iter   4 value 0.010325
## iter   5 value 0.010325
## iter   6 value 0.010325
## iter   6 value 0.010325
## iter   6 value 0.010325
## final  value 0.010325 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2    xmean
##       1.5035  -0.7536  -0.1930
## s.e.  0.0463   0.0465   0.2836
## 
## sigma^2 estimated as 1.006:  log likelihood = -285.85,  aic = 579.71
## 
## $degrees_of_freedom
## [1] 197
## 
## $ttable
##       Estimate     SE  t.value p.value
## ar1     1.5035 0.0463  32.4714  0.0000
## ar2    -0.7536 0.0465 -16.2154  0.0000
## xmean  -0.1930 0.2836  -0.6806  0.4969
## 
## $AIC
## [1] 2.898527
## 
## $AICc
## [1] 2.89914
## 
## $BIC
## [1] 2.964494

As you can see from the t-table output, the estimates produced by the sarima() command are close to the true parameters.

Fitting an MA(1) model

In this exercise, we generated data from an MA(1) model,

\(X_t=W_t−.8W_{t−1}\)

Look at the simulated data and the sample ACF and PACF to determine the order based on the table given in the first exercise. Then fit the model.

Recall that for pure MA(q) models, the theoretical ACF will cut off at lag q while the PACF will tail off.

x <- arima.sim(model = list(order = c(0, 0, 1), ma = -.8), n = 100)

# Plot x
plot(x)

# Plot the sample P/ACF of x
acf2(x)

##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.54  0.06  0.02 -0.07  0.00  0.14 -0.10 -0.05  0.11  0.04 -0.16  0.05
## PACF -0.54 -0.33 -0.18 -0.21 -0.24 -0.01 -0.01 -0.14 -0.03  0.19  0.01 -0.13
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF   0.04 -0.07  0.20 -0.20  0.02  0.07 -0.01 -0.03
## PACF -0.01 -0.04  0.15 -0.02 -0.05  0.03  0.03 -0.02

# Fit an MA(1) to the data and examine the t-table
sarima(x, p = 0, d = 0, q = 1)

## initial  value 0.206958 
## iter   2 value -0.023059
## iter   3 value -0.067090
## iter   4 value -0.070087
## iter   5 value -0.077113
## iter   6 value -0.077915
## iter   7 value -0.078558
## iter   8 value -0.079020
## iter   9 value -0.079039
## iter  10 value -0.079043
## iter  10 value -0.079043
## iter  10 value -0.079043
## final  value -0.079043 
## converged
## initial  value -0.074439 
## iter   2 value -0.074466
## iter   3 value -0.074533
## iter   4 value -0.074533
## iter   4 value -0.074533
## iter   4 value -0.074533
## final  value -0.074533 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1   xmean
##       -0.7928  0.0131
## s.e.   0.0555  0.0199
## 
## sigma^2 estimated as 0.853:  log likelihood = -134.44,  aic = 274.88
## 
## $degrees_of_freedom
## [1] 98
## 
## $ttable
##       Estimate     SE  t.value p.value
## ma1    -0.7928 0.0555 -14.2939  0.0000
## xmean   0.0131 0.0199   0.6585  0.5118
## 
## $AIC
## [1] 2.74881
## 
## $AICc
## [1] 2.750047
## 
## $BIC
## [1] 2.826965

Once again, the parameter estimates produced by sarima() come quite close to the input specified when the data was created (-.8)

AR and MA together: ARMA

\(X_t = \phi X_{t-1} + W_t + \theta W_{t-1}\) autoregression with correlated errors

Fitting an ARMA model

You are now ready to merge the AR model and the MA model into the ARMA model. We generated data from the ARMA(2,1) model,

\(X_t= X_{t−1} − .9X_{t−2} + W_t + .8W_{t−1}\),

x <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = 250).

Look at the simulated data and the sample ACF and PACF pair to determine a possible model.

Recall that for \(ARMA(p,q)\) models, both the theoretical ACF and PACF tail off. In this case, the orders are difficult to discern from data and it may not be clear if either the sample ACF or sample PACF is cutting off or tailing off. In this case, you know the actual model orders, so fit an ARMA(2,1) to the generated data. General modeling strategies will be discussed further in the course.

x <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = 250)

# astsa is preloaded

# Plot x
plot(x)

# Plot the sample P/ACF of x
astsa::acf2(x)

##      [,1]  [,2]  [,3]  [,4] [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  0.56 -0.31 -0.76 -0.47 0.18  0.53  0.32 -0.16 -0.41 -0.21  0.19  0.36
## PACF 0.56 -0.90  0.48 -0.36 0.15 -0.20 -0.02 -0.01  0.06  0.05 -0.02 -0.04
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.15 -0.21 -0.34 -0.13  0.19  0.29  0.09 -0.18 -0.23 -0.06  0.14  0.16
## PACF  0.00 -0.01  0.02 -0.01 -0.02 -0.06  0.06 -0.01 -0.03 -0.03 -0.05  0.05
##      [,25] [,26]
## ACF   0.01 -0.14
## PACF -0.10  0.03

# Fit an ARMA(2,1) to the data and examine the t-table
astsa::sarima(x, p = 2, d = 0, q = 1)

## initial  value 1.267650 
## iter   2 value 0.509271
## iter   3 value 0.231477
## iter   4 value 0.225223
## iter   5 value 0.224361
## iter   6 value 0.221953
## iter   7 value 0.212395
## iter   8 value 0.193387
## iter   9 value 0.137473
## iter  10 value 0.087587
## iter  11 value 0.043601
## iter  12 value 0.021179
## iter  13 value 0.018108
## iter  14 value 0.016541
## iter  15 value 0.013846
## iter  16 value 0.013828
## iter  17 value 0.013805
## iter  18 value 0.013805
## iter  19 value 0.013805
## iter  19 value 0.013805
## final  value 0.013805 
## converged
## initial  value 0.021221 
## iter   2 value 0.021158
## iter   3 value 0.021125
## iter   4 value 0.021105
## iter   5 value 0.021088
## iter   6 value 0.021087
## iter   7 value 0.021087
## iter   7 value 0.021087
## iter   7 value 0.021087
## final  value 0.021087 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##         ar1      ar2     ma1   xmean
##       0.972  -0.8534  0.7595  0.0502
## s.e.  0.033   0.0324  0.0378  0.1278
## 
## sigma^2 estimated as 1.021:  log likelihood = -360.01,  aic = 730.01
## 
## $degrees_of_freedom
## [1] 246
## 
## $ttable
##       Estimate     SE  t.value p.value
## ar1     0.9720 0.0330  29.4939  0.0000
## ar2    -0.8534 0.0324 -26.3164  0.0000
## ma1     0.7595 0.0378  20.0785  0.0000
## xmean   0.0502 0.1278   0.3929  0.6947
## 
## $AIC
## [1] 2.920052
## 
## $AICc
## [1] 2.920705
## 
## $BIC
## [1] 2.990481

As you can see, the sarima() command can estimate parameter values for many different types of models, including AR, MA, and ARMA.

Identify an ARMA model

Look at (1) the data plots and (2) the sample ACF and PACF of the logged and differenced varve series:

dl_varve <- diff(log(astsa::varve))

Use help(varve) to read about the data or use an internet search on varve to learn more.

From the ACF and PACF, which model is the most likely model for dl_varve?

Remember that an MA(q) model has an ACF that cuts off at lag q and a PACF that tails off. In this case, the ACF cuts off at lag 1 and the PACF tails off, suggesting an MA(1) model.

Model choice and residual analysis

AIC and BIC

Error		Number of Parameters
\(\textit{average(observed-predicted)}^\textit{2}\)	+	\(\textit{k(p+q)}\)

AIC and BIC measure the error and penalize (differently) for adding parameters
for example AIC has k = 2 and BIC has k = log(n)
Goal: find the model with the smallest AIC or BIC

Residual Analysis

sarima() includes residual analysis and graphic showing:

Standardized residuals

should be white noise

Sample ACF of residuals

points within blue lines indicates normality

Normal Q-Q plot

points should mostly be on the line

Q-statistic p-values

points above blue line indicates normality

Model choice - I Based on the sample P/ACF pair of the logged and differenced varve data (dl_varve), an MA(1) was indicated. The best approach to fitting ARMA is to start with a low order model, and then try to add a parameter at a time to see if the results change.

In this exercise, you will fit various models to the dl_varve data and note the AIC and BIC for each model. In the next exercise, you will use these AICs and BICs to choose a model. Remember that you want to retain the model with the smallest AIC and/or BIC value.

A note before you start:

sarima(x, p = 0, d = 0, q = 1) and sarima(x, 0, 0, 1)

are the same.

dl_varve <- diff(log(astsa::varve))

# Fit an MA(1) to dl_varve.   
sarima(dl_varve, p = 0, d = 0, q = 1)

## initial  value -0.551780 
## iter   2 value -0.671633
## iter   3 value -0.706234
## iter   4 value -0.707586
## iter   5 value -0.718543
## iter   6 value -0.719692
## iter   7 value -0.721967
## iter   8 value -0.722970
## iter   9 value -0.723231
## iter  10 value -0.723247
## iter  11 value -0.723248
## iter  12 value -0.723248
## iter  12 value -0.723248
## iter  12 value -0.723248
## final  value -0.723248 
## converged
## initial  value -0.722762 
## iter   2 value -0.722764
## iter   3 value -0.722764
## iter   4 value -0.722765
## iter   4 value -0.722765
## iter   4 value -0.722765
## final  value -0.722765 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1    xmean
##       -0.7710  -0.0013
## s.e.   0.0341   0.0044
## 
## sigma^2 estimated as 0.2353:  log likelihood = -440.68,  aic = 887.36
## 
## $degrees_of_freedom
## [1] 631
## 
## $ttable
##       Estimate     SE  t.value p.value
## ma1    -0.7710 0.0341 -22.6002  0.0000
## xmean  -0.0013 0.0044  -0.2818  0.7782
## 
## $AIC
## [1] 1.401826
## 
## $AICc
## [1] 1.401856
## 
## $BIC
## [1] 1.422918

# Fit an MA(2) to dl_varve. Improvement?
sarima(dl_varve, p = 0, d = 0, q = 2)

## initial  value -0.551780 
## iter   2 value -0.679736
## iter   3 value -0.728605
## iter   4 value -0.734640
## iter   5 value -0.735449
## iter   6 value -0.735979
## iter   7 value -0.736015
## iter   8 value -0.736059
## iter   9 value -0.736060
## iter  10 value -0.736060
## iter  11 value -0.736061
## iter  12 value -0.736061
## iter  12 value -0.736061
## iter  12 value -0.736061
## final  value -0.736061 
## converged
## initial  value -0.735372 
## iter   2 value -0.735378
## iter   3 value -0.735379
## iter   4 value -0.735379
## iter   4 value -0.735379
## iter   4 value -0.735379
## final  value -0.735379 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2    xmean
##       -0.6710  -0.1595  -0.0013
## s.e.   0.0375   0.0392   0.0033
## 
## sigma^2 estimated as 0.2294:  log likelihood = -432.69,  aic = 873.39
## 
## $degrees_of_freedom
## [1] 630
## 
## $ttable
##       Estimate     SE  t.value p.value
## ma1    -0.6710 0.0375 -17.9057  0.0000
## ma2    -0.1595 0.0392  -4.0667  0.0001
## xmean  -0.0013 0.0033  -0.4007  0.6888
## 
## $AIC
## [1] 1.379757
## 
## $AICc
## [1] 1.379817
## 
## $BIC
## [1] 1.40788

# Fit an ARMA(1,1) to dl_varve. Improvement?
sarima(dl_varve, p = 1, d = 0, q = 1)

## initial  value -0.550994 
## iter   2 value -0.648962
## iter   3 value -0.676965
## iter   4 value -0.699167
## iter   5 value -0.724554
## iter   6 value -0.726719
## iter   7 value -0.729066
## iter   8 value -0.731976
## iter   9 value -0.734235
## iter  10 value -0.735969
## iter  11 value -0.736410
## iter  12 value -0.737045
## iter  13 value -0.737600
## iter  14 value -0.737641
## iter  15 value -0.737643
## iter  16 value -0.737643
## iter  17 value -0.737643
## iter  18 value -0.737643
## iter  18 value -0.737643
## iter  18 value -0.737643
## final  value -0.737643 
## converged
## initial  value -0.737522 
## iter   2 value -0.737527
## iter   3 value -0.737528
## iter   4 value -0.737529
## iter   5 value -0.737530
## iter   5 value -0.737530
## iter   5 value -0.737530
## final  value -0.737530 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1    xmean
##       0.2341  -0.8871  -0.0013
## s.e.  0.0518   0.0292   0.0028
## 
## sigma^2 estimated as 0.2284:  log likelihood = -431.33,  aic = 870.66
## 
## $degrees_of_freedom
## [1] 630
## 
## $ttable
##       Estimate     SE  t.value p.value
## ar1     0.2341 0.0518   4.5184  0.0000
## ma1    -0.8871 0.0292 -30.4107  0.0000
## xmean  -0.0013 0.0028  -0.4618  0.6444
## 
## $AIC
## [1] 1.375456
## 
## $AICc
## [1] 1.375517
## 
## $BIC
## [1] 1.403579

IC and BIC help you find the model with the smallest error using the least number of parameters. The idea is based on the parsimony principle, which is basic to all science and tells you to choose the simplest scientific explanation that fits the evidence.

Residual analysis - I

As you saw in the video, an sarima() run includes a residual analysis graphic. Specifically, the output shows (1) the standardized residuals, (2) the sample ACF of the residuals, (3) a normal Q-Q plot, and (4) the p-values corresponding to the Box-Ljung-Pierce Q-statistic.

In each run, check the four residual plots as follows (model assumes errors are Gaussian white noise):

The standardized residuals should behave as a white noise sequence with mean zero and variance one. Examine the residual plot for departures from this behavior.
The sample ACF of the residuals should look like that of white noise. Examine the ACF for departures from this behavior.
Normality is an essential assumption when fitting ARMA models. Examine the Q-Q plot for departures from normality and to identify outliers.
Use the Q-statistic plot to help test for departures from whiteness of the residuals.

As in the previous exercise, dl_varve <- diff(log(varve)), which is plotted below a plot of varve. The astsa package is preloaded.

par(mfrow = c(2,1))
plot(astsa::varve) 
plot(dl_varve)

# Fit an MA(1) to dl_varve. Examine the residuals  
sarima(dl_varve, p = 0, d = 0, q = 1)

## initial  value -0.551780 
## iter   2 value -0.671633
## iter   3 value -0.706234
## iter   4 value -0.707586
## iter   5 value -0.718543
## iter   6 value -0.719692
## iter   7 value -0.721967
## iter   8 value -0.722970
## iter   9 value -0.723231
## iter  10 value -0.723247
## iter  11 value -0.723248
## iter  12 value -0.723248
## iter  12 value -0.723248
## iter  12 value -0.723248
## final  value -0.723248 
## converged
## initial  value -0.722762 
## iter   2 value -0.722764
## iter   3 value -0.722764
## iter   4 value -0.722765
## iter   4 value -0.722765
## iter   4 value -0.722765
## final  value -0.722765 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1    xmean
##       -0.7710  -0.0013
## s.e.   0.0341   0.0044
## 
## sigma^2 estimated as 0.2353:  log likelihood = -440.68,  aic = 887.36
## 
## $degrees_of_freedom
## [1] 631
## 
## $ttable
##       Estimate     SE  t.value p.value
## ma1    -0.7710 0.0341 -22.6002  0.0000
## xmean  -0.0013 0.0044  -0.2818  0.7782
## 
## $AIC
## [1] 1.401826
## 
## $AICc
## [1] 1.401856
## 
## $BIC
## [1] 1.422918

# Fit an ARMA(1,1) to dl_varve. Examine the residuals
sarima(dl_varve, p = 1, d = 0, q = 1)

## initial  value -0.550994 
## iter   2 value -0.648962
## iter   3 value -0.676965
## iter   4 value -0.699167
## iter   5 value -0.724554
## iter   6 value -0.726719
## iter   7 value -0.729066
## iter   8 value -0.731976
## iter   9 value -0.734235
## iter  10 value -0.735969
## iter  11 value -0.736410
## iter  12 value -0.737045
## iter  13 value -0.737600
## iter  14 value -0.737641
## iter  15 value -0.737643
## iter  16 value -0.737643
## iter  17 value -0.737643
## iter  18 value -0.737643
## iter  18 value -0.737643
## iter  18 value -0.737643
## final  value -0.737643 
## converged
## initial  value -0.737522 
## iter   2 value -0.737527
## iter   3 value -0.737528
## iter   4 value -0.737529
## iter   5 value -0.737530
## iter   5 value -0.737530
## iter   5 value -0.737530
## final  value -0.737530 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1    xmean
##       0.2341  -0.8871  -0.0013
## s.e.  0.0518   0.0292   0.0028
## 
## sigma^2 estimated as 0.2284:  log likelihood = -431.33,  aic = 870.66
## 
## $degrees_of_freedom
## [1] 630
## 
## $ttable
##       Estimate     SE  t.value p.value
## ar1     0.2341 0.0518   4.5184  0.0000
## ma1    -0.8871 0.0292 -30.4107  0.0000
## xmean  -0.0013 0.0028  -0.4618  0.6444
## 
## $AIC
## [1] 1.375456
## 
## $AICc
## [1] 1.375517
## 
## $BIC
## [1] 1.403579

By now you have mastered constructing parameters through the sarima() command, but the rich and comprehensive output of this command is always worth exploring. What did you learn about the residuals produced by your MA(1) and ARMA(1,1) models?

ARMA get in By now you have gained considerable experience fitting ARMA models to data, but before you start celebrating, try one more exercise (sort of) on your own.

The data in oil are crude oil, WTI spot price FOB (in dollars per barrel), weekly data from 2000 to 2008. Use your skills to fit an ARMA model to the returns. The weekly crude oil prices (oil) are plotted for you. Throughout the exercise, work with the returns, which you will calculate.

As before, the astsa package is preloaded for you. The data are preloaded as oil.

oil <- ts(astsa::oil, start = 2000, end = 2008, frequency = 52)
plot(oil, main = "Crude Oil - USD per Barrel")

# Calculate approximate oil returns
oil_returns <- diff(log(oil))

# Plot oil_returns. Notice the outliers.
plot(oil_returns)

# Plot the P/ACF pair for oil_returns
acf2(oil_returns)

##      [,1]  [,2] [,3]  [,4]  [,5]  [,6]  [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.14 -0.11 0.05 -0.03 -0.03 -0.10 -0.10 0.03 0.03 -0.03 -0.10 -0.01  0.07
## PACF 0.14 -0.13 0.09 -0.07  0.00 -0.12 -0.07 0.03 0.01 -0.02 -0.12  0.01  0.04
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF   0.04 -0.08 -0.03  0.01  0.05 -0.07 -0.04  0.13  0.13 -0.04 -0.03 -0.04
## PACF  0.04 -0.10  0.00 -0.04  0.06 -0.09  0.02  0.09  0.09 -0.06  0.02 -0.06
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF   0.03 -0.03 -0.07  0.01  0.04  0.00  0.00 -0.03  0.02 -0.01 -0.03 -0.09
## PACF  0.04 -0.05 -0.01  0.03 -0.01 -0.01  0.01  0.01 -0.01 -0.05 -0.01 -0.07
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF   0.00  0.04 -0.01 -0.06  0.06  0.02 -0.02  0.05  0.03  0.08  0.09  0.01
## PACF  0.02 -0.01 -0.01 -0.04  0.05 -0.06 -0.01  0.07  0.02  0.09  0.04  0.05
##      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## ACF  -0.06  0.07 -0.01 -0.07 -0.01 -0.06 -0.06 -0.08 -0.04 -0.02 -0.03 -0.01
## PACF -0.08  0.13 -0.08  0.03 -0.03 -0.03 -0.08 -0.04  0.00 -0.02 -0.06 -0.03
##      [,62] [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73]
## ACF   0.02  0.03  0.00 -0.03  0.00  0.01  0.03  0.00 -0.05 -0.07  0.01 -0.03
## PACF  0.02  0.01 -0.03 -0.05  0.01 -0.02  0.02 -0.06 -0.03 -0.09  0.02 -0.07
##      [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85]
## ACF  -0.05  0.02  0.07  0.06 -0.06  0.02  0.04 -0.07 -0.05  0.02  0.03  0.01
## PACF  0.01  0.00  0.06  0.04 -0.06  0.10 -0.04 -0.05 -0.02  0.05  0.02 -0.03
##      [,86] [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97]
## ACF   0.05  0.13 -0.08 -0.05  0.07 -0.06  0.01  0.09 -0.04  0.01  0.04  0.02
## PACF  0.10  0.12 -0.12 -0.03  0.05 -0.04  0.07  0.03 -0.07  0.05 -0.02  0.02
##      [,98] [,99] [,100] [,101] [,102] [,103] [,104] [,105] [,106] [,107] [,108]
## ACF   0.01 -0.03  -0.03  -0.03  -0.04  -0.08  -0.07   0.05   0.01   0.01   0.10
## PACF -0.03  0.00  -0.06  -0.04   0.02  -0.05  -0.03   0.05  -0.04   0.04   0.04
##      [,109] [,110] [,111] [,112] [,113] [,114] [,115] [,116] [,117] [,118]
## ACF    0.06   0.00   0.02  -0.03   0.04   0.04  -0.04  -0.06  -0.03   0.01
## PACF   0.02   0.01   0.01   0.02   0.06   0.00  -0.04  -0.03  -0.04   0.05
##      [,119] [,120] [,121] [,122] [,123] [,124] [,125] [,126] [,127] [,128]
## ACF    0.09  -0.04  -0.05   0.02   0.04      0  -0.08   0.02   0.00  -0.05
## PACF   0.06   0.00  -0.05   0.00   0.06      0  -0.08  -0.01  -0.07  -0.05
##      [,129] [,130] [,131] [,132] [,133] [,134] [,135] [,136] [,137] [,138]
## ACF    0.00   0.00   0.06   0.06   0.04   0.00   0.04  -0.01  -0.04   0.05
## PACF  -0.01   0.02   0.00   0.02   0.08  -0.03   0.00   0.00   0.00   0.05
##      [,139] [,140] [,141] [,142] [,143] [,144] [,145] [,146] [,147] [,148]
## ACF   -0.04  -0.02   0.03  -0.03  -0.06   0.01   0.00  -0.05   0.00  -0.05
## PACF  -0.05   0.00  -0.02   0.04  -0.06   0.00  -0.02  -0.01   0.03  -0.05
##      [,149] [,150] [,151] [,152] [,153] [,154] [,155] [,156] [,157] [,158]
## ACF   -0.11  -0.03   0.01   0.00   0.04   0.04   0.05   0.06  -0.04  -0.07
## PACF  -0.08   0.01   0.03  -0.03   0.00  -0.03   0.02   0.00  -0.01  -0.06
##      [,159] [,160] [,161] [,162] [,163] [,164] [,165] [,166] [,167] [,168]
## ACF    0.04   0.04  -0.01   0.07   0.00   0.00  -0.03  -0.03  -0.02  -0.02
## PACF   0.01   0.01   0.03   0.06   0.01   0.02  -0.04   0.02  -0.01   0.04
##      [,169] [,170] [,171] [,172] [,173] [,174] [,175] [,176] [,177] [,178]
## ACF    0.02  -0.01  -0.02   0.01   0.02   0.03   0.00   0.00   0.00  -0.01
## PACF  -0.01   0.00   0.01  -0.04   0.02   0.03   0.02  -0.04  -0.06   0.04
##      [,179] [,180] [,181] [,182] [,183] [,184] [,185] [,186] [,187] [,188]
## ACF   -0.03   0.05   0.02   0.07   0.05  -0.02   0.03  -0.01  -0.05  -0.06
## PACF   0.04   0.02   0.00   0.04   0.01  -0.02  -0.01  -0.02  -0.01  -0.01
##      [,189] [,190] [,191] [,192] [,193] [,194] [,195] [,196] [,197] [,198]
## ACF   -0.05  -0.06  -0.02   0.01   0.03   0.01   0.02  -0.02  -0.08   0.02
## PACF  -0.02  -0.03   0.02   0.00   0.06  -0.01  -0.02  -0.04  -0.02  -0.02
##      [,199] [,200] [,201] [,202] [,203] [,204] [,205] [,206] [,207] [,208]
## ACF    0.02  -0.01  -0.02  -0.01  -0.02   0.03   0.02   0.04   0.04  -0.05
## PACF   0.02  -0.04  -0.04  -0.03  -0.07   0.03  -0.02   0.01   0.05  -0.06

# Assuming both P/ACF are tailing, fit a model
sarima(oil_returns, p = 1, d = 0, q = 1)

## initial  value -3.195594 
## iter   2 value -3.201743
## iter   3 value -3.208031
## iter   4 value -3.208326
## iter   5 value -3.212457
## iter   6 value -3.215700
## iter   7 value -3.218603
## iter   8 value -3.220442
## iter   9 value -3.220894
## iter  10 value -3.221122
## iter  11 value -3.221413
## iter  12 value -3.221501
## iter  13 value -3.221524
## iter  14 value -3.221525
## iter  14 value -3.221525
## iter  14 value -3.221525
## final  value -3.221525 
## converged
## initial  value -3.222340 
## iter   2 value -3.222340
## iter   3 value -3.222341
## iter   4 value -3.222342
## iter   5 value -3.222342
## iter   5 value -3.222342
## iter   5 value -3.222342
## final  value -3.222342 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1     ma1   xmean
##       -0.4752  0.6768  0.0029
## s.e.   0.1084  0.0871  0.0022
## 
## sigma^2 estimated as 0.001589:  log likelihood = 750.22,  aic = -1492.43
## 
## $degrees_of_freedom
## [1] 413
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1    -0.4752 0.1084 -4.3846  0.0000
## ma1     0.6768 0.0871  7.7674  0.0000
## xmean   0.0029 0.0022  1.3184  0.1881
## 
## $AIC
## [1] -3.587576
## 
## $AICc
## [1] -3.587436
## 
## $BIC
## [1] -3.548819

You have now successfully manipulated some real-world time series data, explored the qualities of that data, and modeled an ARMA(1,1) model to your data. In the next chapter, you will explore more complicated models.

ARIMA - integrated ARMA

Identifying ARIMA

a time series exhibits ARIMA behavior if the differenced data has ARMA behavior

ACF and PCF of an integrated ARMA

ACF has a linear decay
PCF is close to 1 at lag one, and zero afterwards.

ARIMA - plug and play

As you saw in the video, a time series is called \(ARIMA(p,d,q)\) if the differenced series (of order \(d\)) is \(ARMA(p,q)\).

To get a sense of how the model works, you will analyze simulated data from the integrated model

\(Y_t =.9Y_{t−1} + W_t\)

where \(Y_t = \nabla X_t = X_t − X_{t−1}\). In this case, the model is an \(ARIMA(1,1,0)\) because the differenced data are an autoregression of order one.

The simulated time series is in x and it was generated in R as

x <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 200)

You will plot the generated data and the sample ACF and PACF of the generated data to see how integrated data behave. Then, you will difference the data to make it stationary. You will plot the differenced data and the corresponding sample ACF and PACF to see how differencing makes a difference.

As before, the astsa package is preloaded in your workspace. Data from an \(ARIMA(1,1,0)\) with AR parameter .9 is saved in object x.

x <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 200)

# Plot x
plot(x)

# Plot the P/ACF pair of x
acf2(x)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6] [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.99  0.98  0.97  0.95  0.92  0.90 0.87  0.83  0.8  0.77  0.73  0.69  0.66
## PACF 0.99 -0.50 -0.26 -0.17 -0.09 -0.04 0.02 -0.02  0.0 -0.04 -0.02 -0.10  0.01
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF   0.62  0.58  0.54  0.50  0.46  0.42  0.38  0.34  0.29  0.25  0.21  0.18
## PACF  0.02 -0.05 -0.09 -0.06 -0.02 -0.01 -0.04  0.02  0.04  0.00  0.04  0.01

# Plot the differenced data
plot(diff(x))

# Plot the P/ACF pair of the differenced data
acf2(diff(x))

##      [,1] [,2]  [,3] [,4]  [,5]  [,6] [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.88 0.80  0.70 0.61  0.51  0.42 0.33 0.28  0.21  0.20  0.16  0.12  0.10
## PACF 0.88 0.08 -0.11 0.01 -0.10 -0.08 0.00 0.08 -0.04  0.15 -0.05 -0.12  0.05
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF   0.10  0.11  0.09  0.06  0.04  0.01 -0.03 -0.06 -0.10 -0.16 -0.20 -0.23
## PACF  0.09  0.03 -0.06 -0.10 -0.04 -0.02 -0.08  0.00 -0.02 -0.17  0.04 -0.03

As you can see, differencing the data in your \(ARIMA(1,1,0)\) model makes it stationary and allows for further analysis.

Simulated ARIMA

Before analyzing actual time series data, you should try working with a slightly more complicated model.

Here, we generated 250 observations from the \(ARIMA(2,1,0)\) model with drift given by

\(Y_t = 1 + 1.5Y_{t−1} − .75Y_{t−2} + W_t\)

where \(Y_t = \nabla X_t = X_t − X_{t−1}\).

You will use the established techniques to fit a model to the data.

The astsa package is preloaded and the generated data are in x. The series x and the detrended seriesy <- diff(x) have been plotted.

# Plot sample P/ACF of differenced data and determine model
acf2(diff(x))

# Estimate parameters and examine output
sarima(x, p = 2, d = 1, q = 0)

Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
    Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
    optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
         ar1      ar2  constant
      1.5197  -0.7669    1.2335
s.e.  0.0401   0.0401    0.2570

sigma^2 estimated as 1.004:  log likelihood = -355.41,  aic = 718.82

$degrees_of_freedom
[1] 246

$ttable
         Estimate     SE  t.value p.value
ar1        1.5197 0.0401  37.9191       0
ar2       -0.7669 0.0401 -19.1321       0
constant   1.2335 0.2570   4.7996       0

$AIC
[1] 2.886818

$AICc
[1] 2.887211

$BIC
[1] 2.943323

As you can see from your t-table, the estimated parameters are very close to 1.5 and -0.75.

Global warming

Now that you have some experience fitting an ARIMA model to simulated data, your next task is to apply your skills to some real world data.

The data in globtemp (from astsa) are the annual global temperature deviations to 2015. In this exercise, you will use established techniques to fit an ARIMA model to the data. A plot of the data shows random walk behavior, which suggests you should work with the differenced data. The differenced data diff(globtemp) are also plotted.

After plotting the sample ACF and PACF of the differenced data diff(globtemp), you can say that either

The ACF and the PACF are both tailing off, implying an ARIMA(1,1,1) model.
The ACF cuts off at lag 2, and the PACF is tailing off, implying an ARIMA(0,1,2) model.
The ACF is tailing off and the PACF cuts off at lag 3, implying an ARIMA(3,1,0) model. Although this model fits reasonably well, it is the worst of the three (you can check it) because it uses too many parameters for such small autocorrelations.

After fitting the first two models, check the AIC and BIC to choose the preferred model.

par(mfrow = c(2,1))
plot(astsa::globtemp) 
plot(diff(astsa::globtemp))

# Plot the sample P/ACF pair of the differenced data 
acf2(diff(globtemp))

##       [,1]  [,2]  [,3] [,4]  [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.24 -0.19 -0.08 0.20 -0.15 -0.03  0.03 0.14 -0.16  0.11 -0.05  0.00
## PACF -0.24 -0.26 -0.23 0.06 -0.16 -0.09 -0.05 0.07 -0.09  0.11 -0.03 -0.02
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF  -0.13  0.14 -0.01 -0.08     0  0.19 -0.07  0.02 -0.02  0.08
## PACF -0.10  0.02  0.00 -0.09     0  0.11  0.04  0.13  0.09  0.08

# Fit an ARIMA(1,1,1) model to globtemp
sarima(globtemp, p = 1, d = 1, q = 1)

## initial  value -2.218917 
## iter   2 value -2.253118
## iter   3 value -2.263750
## iter   4 value -2.272144
## iter   5 value -2.282786
## iter   6 value -2.296777
## iter   7 value -2.297062
## iter   8 value -2.297253
## iter   9 value -2.297389
## iter  10 value -2.297405
## iter  11 value -2.297413
## iter  12 value -2.297413
## iter  13 value -2.297414
## iter  13 value -2.297414
## iter  13 value -2.297414
## final  value -2.297414 
## converged
## initial  value -2.305504 
## iter   2 value -2.305800
## iter   3 value -2.305821
## iter   4 value -2.306655
## iter   5 value -2.306875
## iter   6 value -2.306950
## iter   7 value -2.306955
## iter   8 value -2.306955
## iter   8 value -2.306955
## final  value -2.306955 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1  constant
##       0.3549  -0.7663    0.0072
## s.e.  0.1314   0.0874    0.0032
## 
## sigma^2 estimated as 0.009885:  log likelihood = 119.88,  aic = -231.76
## 
## $degrees_of_freedom
## [1] 132
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.3549 0.1314  2.7008  0.0078
## ma1       -0.7663 0.0874 -8.7701  0.0000
## constant   0.0072 0.0032  2.2738  0.0246
## 
## $AIC
## [1] -1.716773
## 
## $AICc
## [1] -1.715416
## 
## $BIC
## [1] -1.630691

# Fit an ARIMA(0,1,2) model to globtemp. Which model is better?
sarima(globtemp, p = 0, d = 1, q = 2)

## initial  value -2.220513 
## iter   2 value -2.294887
## iter   3 value -2.307682
## iter   4 value -2.309170
## iter   5 value -2.310360
## iter   6 value -2.311251
## iter   7 value -2.311636
## iter   8 value -2.311648
## iter   9 value -2.311649
## iter   9 value -2.311649
## iter   9 value -2.311649
## final  value -2.311649 
## converged
## initial  value -2.310187 
## iter   2 value -2.310197
## iter   3 value -2.310199
## iter   4 value -2.310201
## iter   5 value -2.310202
## iter   5 value -2.310202
## iter   5 value -2.310202
## final  value -2.310202 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2  constant
##       -0.3984  -0.2173    0.0072
## s.e.   0.0808   0.0768    0.0033
## 
## sigma^2 estimated as 0.00982:  log likelihood = 120.32,  aic = -232.64
## 
## $degrees_of_freedom
## [1] 132
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1       -0.3984 0.0808 -4.9313  0.0000
## ma2       -0.2173 0.0768 -2.8303  0.0054
## constant   0.0072 0.0033  2.1463  0.0337
## 
## $AIC
## [1] -1.723268
## 
## $AICc
## [1] -1.721911
## 
## $BIC
## [1] -1.637185

Judging by the AIC and BIC, the ARIMA(0,1,2) model performs better than the ARIMA(1,1,1) model on the globtemp data. Remember to thoroughly examine the output of your sarima() command to gain a full understanding of your model.

ARIMA diagnostics

Diagnostics - simulated overfitting

One way to check an analysis is to overfit the model by adding an extra parameter to see if it makes a difference in the results. If adding parameters changes the results drastically, then you should rethink your model. If, however, the results do not change by much, you can be confident that your fit is correct.

We generated 250 observations from an ARIMA(0,1,1) model with MA parameter .9. First, you will fit the model to the data using established techniques.

Then, you can check a model by overfitting (adding a parameter) to see if it makes a difference. In this case, you will add an additional MA parameter to see that it is not needed.

As usual, the astsa package is preloaded and the generated data in x are plotted in your workspace. The differenced data diff(x) are also plotted. Note that it looks stationary.

x <- arima.sim(model = list(order = c(0, 1, 1), ma = .9), n = 250)

par(mfrow = c(2,1))
plot(x) 
plot(diff(x))

# Plot sample P/ACF pair of the differenced data
acf2(diff(x))

##      [,1]  [,2]  [,3]  [,4] [,5]  [,6] [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.42 -0.10 -0.04 -0.01 0.03  0.02 0.01  0.07 0.02 -0.08 -0.01  0.08 -0.01
## PACF 0.42 -0.34  0.22 -0.18 0.19 -0.16 0.16 -0.04 0.01 -0.10  0.13 -0.04 -0.03
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.09 -0.02  0.06  0.02 -0.04 -0.04  0.08  0.16  0.09  0.03  0.04  0.01
## PACF -0.06  0.08 -0.02  0.00 -0.04  0.02  0.10  0.08  0.01  0.04  0.00  0.02
##      [,26]
## ACF  -0.08
## PACF -0.13

# Fit the first model, compare parameters, check diagnostics
astsa::sarima(x, p = 0, d = 1, q = 1)

## initial  value 0.247668 
## iter   2 value 0.101855
## iter   3 value 0.052606
## iter   4 value 0.003447
## iter   5 value 0.000436
## iter   6 value 0.000327
## iter   7 value 0.000304
## iter   8 value 0.000277
## iter   9 value 0.000246
## iter  10 value 0.000242
## iter  11 value 0.000241
## iter  12 value 0.000241
## iter  13 value 0.000241
## iter  14 value 0.000241
## iter  15 value 0.000241
## iter  16 value 0.000241
## iter  16 value 0.000241
## final  value 0.000241 
## converged
## initial  value -0.003385 
## iter   2 value -0.003680
## iter   3 value -0.003786
## iter   4 value -0.003810
## iter   5 value -0.003847
## iter   6 value -0.003847
## iter   6 value -0.003847
## final  value -0.003847 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ma1  constant
##       0.9071    0.1810
## s.e.  0.0359    0.1195
## 
## sigma^2 estimated as 0.9855:  log likelihood = -353.77,  aic = 713.55
## 
## $degrees_of_freedom
## [1] 248
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1        0.9071 0.0359 25.2465  0.0000
## constant   0.1810 0.1195  1.5146  0.1312
## 
## $AIC
## [1] 2.854184
## 
## $AICc
## [1] 2.854378
## 
## $BIC
## [1] 2.896441

# Fit the second model and compare fit
astsa::sarima(x, p = 0, d = 1, q = 2)

## initial  value 0.247668 
## iter   2 value 0.082421
## iter   3 value 0.056210
## iter   4 value 0.035808
## iter   5 value 0.000779
## iter   6 value -0.001022
## iter   7 value -0.004734
## iter   8 value -0.005086
## iter   9 value -0.005126
## iter  10 value -0.005140
## iter  11 value -0.005142
## iter  12 value -0.005144
## iter  13 value -0.005145
## iter  14 value -0.005145
## iter  15 value -0.005145
## iter  16 value -0.005145
## iter  17 value -0.005145
## iter  17 value -0.005145
## iter  17 value -0.005145
## final  value -0.005145 
## converged
## initial  value -0.008903 
## iter   2 value -0.009279
## iter   3 value -0.010045
## iter   4 value -0.010247
## iter   5 value -0.010342
## iter   6 value -0.010384
## iter   7 value -0.010388
## iter   8 value -0.010392
## iter   9 value -0.010392
## iter   9 value -0.010392
## iter   9 value -0.010392
## final  value -0.010392 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ma1      ma2  constant
##       0.8328  -0.1672     0.181
## s.e.  0.0636   0.0619     0.103
## 
## sigma^2 estimated as 0.9591:  log likelihood = -352.14,  aic = 712.27
## 
## $degrees_of_freedom
## [1] 247
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1        0.8328 0.0636 13.1030  0.0000
## ma2       -0.1672 0.0619 -2.7018  0.0074
## constant   0.1810 0.1030  1.7571  0.0801
## 
## $AIC
## [1] 2.849094
## 
## $AICc
## [1] 2.849484
## 
## $BIC
## [1] 2.905437

As you can see from the t-table, the second MA parameter is not significantly different from zero and the first MA parameter is approximately the same in each run. Also, the AIC and BIC both increase when the parameter is added. In addition, the residual analysis of your ARIMA(0,1,1) model is fine. All of these facts together indicate that you have a successful model fit.

Diagnostics - global temperatures

You can now finish your analysis of global temperatures. Recall that you previously fit two models to the data in globtemp, an ARIMA(1,1,1) and an ARIMA(0,1,2). In the final analysis, check the residual diagnostics and use AIC and BIC for model choice.

The data are plotted for you.

plot(astsa::globtemp, ylab = "globtemp", main = "Global Temperature Deviations")

# Fit ARIMA(0,1,2) to globtemp and check diagnostics  
sarima(globtemp, p = 0, d = 1, q = 2)

## initial  value -2.220513 
## iter   2 value -2.294887
## iter   3 value -2.307682
## iter   4 value -2.309170
## iter   5 value -2.310360
## iter   6 value -2.311251
## iter   7 value -2.311636
## iter   8 value -2.311648
## iter   9 value -2.311649
## iter   9 value -2.311649
## iter   9 value -2.311649
## final  value -2.311649 
## converged
## initial  value -2.310187 
## iter   2 value -2.310197
## iter   3 value -2.310199
## iter   4 value -2.310201
## iter   5 value -2.310202
## iter   5 value -2.310202
## iter   5 value -2.310202
## final  value -2.310202 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2  constant
##       -0.3984  -0.2173    0.0072
## s.e.   0.0808   0.0768    0.0033
## 
## sigma^2 estimated as 0.00982:  log likelihood = 120.32,  aic = -232.64
## 
## $degrees_of_freedom
## [1] 132
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1       -0.3984 0.0808 -4.9313  0.0000
## ma2       -0.2173 0.0768 -2.8303  0.0054
## constant   0.0072 0.0033  2.1463  0.0337
## 
## $AIC
## [1] -1.723268
## 
## $AICc
## [1] -1.721911
## 
## $BIC
## [1] -1.637185

# Fit ARIMA(1,1,1) to globtemp and check diagnostics
sarima(globtemp, p = 1, d = 1, q = 1)

## initial  value -2.218917 
## iter   2 value -2.253118
## iter   3 value -2.263750
## iter   4 value -2.272144
## iter   5 value -2.282786
## iter   6 value -2.296777
## iter   7 value -2.297062
## iter   8 value -2.297253
## iter   9 value -2.297389
## iter  10 value -2.297405
## iter  11 value -2.297413
## iter  12 value -2.297413
## iter  13 value -2.297414
## iter  13 value -2.297414
## iter  13 value -2.297414
## final  value -2.297414 
## converged
## initial  value -2.305504 
## iter   2 value -2.305800
## iter   3 value -2.305821
## iter   4 value -2.306655
## iter   5 value -2.306875
## iter   6 value -2.306950
## iter   7 value -2.306955
## iter   8 value -2.306955
## iter   8 value -2.306955
## final  value -2.306955 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1  constant
##       0.3549  -0.7663    0.0072
## s.e.  0.1314   0.0874    0.0032
## 
## sigma^2 estimated as 0.009885:  log likelihood = 119.88,  aic = -231.76
## 
## $degrees_of_freedom
## [1] 132
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.3549 0.1314  2.7008  0.0078
## ma1       -0.7663 0.0874 -8.7701  0.0000
## constant   0.0072 0.0032  2.2738  0.0246
## 
## $AIC
## [1] -1.716773
## 
## $AICc
## [1] -1.715416
## 
## $BIC
## [1] -1.630691

# Which is the better model?
"ARIMA(0,1,2)"

## [1] "ARIMA(0,1,2)"

Your model diagnostics suggest that both the ARIMA(0,1,2) and the ARIMA(1,1,1) are reasonable models. However, the AIC and BIC suggest that the ARIMA(0,1,2) performs slightly better, so this should be your preferred model. Although you were not asked to do so, you can use overfitting to assess the final model. For example, try fitting an ARIMA(1,1,2) or an ARIMA(0,1,3) to the data.

Forecasting ARIMA

the model describes how the dynamics of the time series behave over time.
forecasting simply continues the mdoel dynamics into the future.
use sarima.for() to forecast in the astsa-package
- red plotted area is predicted values
- dark gray band is 1 * RMSPE
- light gray band is 2 * RMSPE

Forecasting simulated ARIMA

Now that you are an expert at fitting ARIMA models, you can use your skills for forecasting. First, you will work with simulated data.

We generated 120 observations from an ARIMA(1,1,0) model with AR parameter .9. The data are in y and the first 100 observations are in x. These observations are plotted for you. You will fit an ARIMA(1,1,0) model to the data in x and verify that the model fits well. Then use sarima.for() from astsa to forecast the data 20 time periods ahead. You will then compare the forecasts to the actual data in y.

The basic syntax for forecasting is sarima.for(data, n.ahead, p, d, q) where n.ahead is a positive integer that specifies the forecast horizon. The predicted values and their standard errors are printed, the data are plotted in black, and the forecasts are in red along with 2 mean square prediction error bounds as blue dashed lines.

The astsa package is preloaded and the data (x) and differenced data (diff(x)) are plotted.

y <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 120)
x <- ts(y, start = 1, end = 100, frequency = 1)

par(mfrow = c(2,1))
plot(x) 
plot(diff(x))

# Plot P/ACF pair of differenced data 
astsa::acf2(diff(x))

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]  [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.85 0.74 0.65 0.57 0.50 0.44 0.39  0.31  0.21  0.09  0.01 -0.08 -0.09
## PACF 0.85 0.08 0.02 0.00 0.01 0.02 0.00 -0.15 -0.14 -0.17  0.03 -0.15  0.21
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF  -0.13 -0.19 -0.25 -0.28 -0.29 -0.30 -0.28
## PACF -0.13 -0.05 -0.11  0.09  0.06 -0.03  0.03

# Fit model - check t-table and diagnostics
astsa::sarima(x, p = 1, d = 1, q = 0)

## initial  value 0.632596 
## iter   2 value -0.051045
## iter   3 value -0.053007
## iter   4 value -0.053139
## iter   5 value -0.053422
## iter   6 value -0.053781
## iter   7 value -0.053859
## iter   8 value -0.053874
## iter   9 value -0.053895
## iter  10 value -0.053925
## iter  11 value -0.053946
## iter  12 value -0.053952
## iter  13 value -0.053952
## iter  14 value -0.053952
## iter  15 value -0.053953
## iter  16 value -0.053953
## iter  17 value -0.053953
## iter  18 value -0.053953
## iter  19 value -0.053953
## iter  20 value -0.053953
## iter  21 value -0.053953
## iter  22 value -0.053953
## iter  22 value -0.053953
## iter  22 value -0.053953
## final  value -0.053953 
## converged
## initial  value -0.049408 
## iter   2 value -0.049470
## iter   3 value -0.049740
## iter   4 value -0.049767
## iter   5 value -0.049802
## iter   6 value -0.049821
## iter   7 value -0.049822
## iter   8 value -0.049823
## iter   9 value -0.049823
## iter  10 value -0.049824
## iter  10 value -0.049824
## iter  10 value -0.049824
## final  value -0.049824 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1  constant
##       0.8726   -0.0147
## s.e.  0.0496    0.7017
## 
## sigma^2 estimated as 0.8921:  log likelihood = -135.54,  aic = 277.08
## 
## $degrees_of_freedom
## [1] 97
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.8726 0.0496 17.5823  0.0000
## constant  -0.0147 0.7017 -0.0209  0.9833
## 
## $AIC
## [1] 2.798836
## 
## $AICc
## [1] 2.800099
## 
## $BIC
## [1] 2.877476

# Forecast the data 20 time periods ahead
astsa::sarima.for(x, n.ahead = 20, p = 1, d = 1, q = 0)

## $pred
## Time Series:
## Start = 101 
## End = 120 
## Frequency = 1 
##  [1] -15.0218971 -12.3590917 -10.0373072  -8.0131092  -6.2485953  -4.7106905
##  [7]  -3.3705325  -2.2029351  -1.1859200  -0.3003082   0.4706368   1.1415196
## [13]   1.7250846   2.2324533   2.6733304   3.0561846   3.3884062   3.6764440
## [19]   3.9259256   4.1417616
## 
## $se
## Time Series:
## Start = 101 
## End = 120 
## Frequency = 1 
##  [1]  0.9445349  2.0051651  3.1954564  4.4629953  5.7739609  7.1055712
##  [7]  8.4420413  9.7722824 11.0884824 12.3851782 13.6586198 14.9063226
## [13] 16.1267428 17.3190377 18.4828878 19.6183612 20.7258118 21.8058004
## [19] 22.8590350 23.8863245

lines(y)

As you can see, the sarima.for() command provides a simple method for forecasting. Although the blue error bands are relatively wide, the prediction remains quite valuable.

Forecasting global temperatures Now you can try forecasting real data.

Here, you will forecast the annual global temperature deviations globtemp to 2050. Recall that in previous exercises, you fit an ARIMA(0,1,2) model to the data. You will refit the model to confirm it, and then forecast the series 35 years into the future.

The astsa package is preloaded and the data are plotted.

plot(astsa::globtemp, ylab = "globtemp", main = "Global Temperature Deviations")

# Fit an ARIMA(0,1,2) to globtemp and check the fit
sarima(globtemp, p = 0, d = 1, q = 2)

## initial  value -2.220513 
## iter   2 value -2.294887
## iter   3 value -2.307682
## iter   4 value -2.309170
## iter   5 value -2.310360
## iter   6 value -2.311251
## iter   7 value -2.311636
## iter   8 value -2.311648
## iter   9 value -2.311649
## iter   9 value -2.311649
## iter   9 value -2.311649
## final  value -2.311649 
## converged
## initial  value -2.310187 
## iter   2 value -2.310197
## iter   3 value -2.310199
## iter   4 value -2.310201
## iter   5 value -2.310202
## iter   5 value -2.310202
## iter   5 value -2.310202
## final  value -2.310202 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2  constant
##       -0.3984  -0.2173    0.0072
## s.e.   0.0808   0.0768    0.0033
## 
## sigma^2 estimated as 0.00982:  log likelihood = 120.32,  aic = -232.64
## 
## $degrees_of_freedom
## [1] 132
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1       -0.3984 0.0808 -4.9313  0.0000
## ma2       -0.2173 0.0768 -2.8303  0.0054
## constant   0.0072 0.0033  2.1463  0.0337
## 
## $AIC
## [1] -1.723268
## 
## $AICc
## [1] -1.721911
## 
## $BIC
## [1] -1.637185

# Forecast data 35 years into the future
sarima.for(globtemp, n.ahead = 35, p = 0, d = 1, q = 2)

## $pred
## Time Series:
## Start = 2016 
## End = 2050 
## Frequency = 1 
##  [1] 0.7995567 0.7745381 0.7816919 0.7888457 0.7959996 0.8031534 0.8103072
##  [8] 0.8174611 0.8246149 0.8317688 0.8389226 0.8460764 0.8532303 0.8603841
## [15] 0.8675379 0.8746918 0.8818456 0.8889995 0.8961533 0.9033071 0.9104610
## [22] 0.9176148 0.9247687 0.9319225 0.9390763 0.9462302 0.9533840 0.9605378
## [29] 0.9676917 0.9748455 0.9819994 0.9891532 0.9963070 1.0034609 1.0106147
## 
## $se
## Time Series:
## Start = 2016 
## End = 2050 
## Frequency = 1 
##  [1] 0.09909556 0.11564576 0.12175580 0.12757353 0.13313729 0.13847769
##  [7] 0.14361964 0.14858376 0.15338730 0.15804492 0.16256915 0.16697084
## [13] 0.17125943 0.17544322 0.17952954 0.18352490 0.18743511 0.19126540
## [19] 0.19502047 0.19870459 0.20232164 0.20587515 0.20936836 0.21280424
## [25] 0.21618551 0.21951471 0.22279416 0.22602604 0.22921235 0.23235497
## [31] 0.23545565 0.23851603 0.24153763 0.24452190 0.24747019

In the next chapter, you will learn how to analyze seasonal time series data.

Pure seasonal models

often collect data with a known seasonal component
air passengers (1 cycle every S = 12 months)
Johnson & Johnson earnings (1 cycle every S = 4 quarters)

Consider pure seasonal models such as an SAR\((P = 1)_{s=12}\)

\(X_t = \Phi X_{t-12} + W_t\)

all lags are at the seasonal level

Fit a pure seasonal model As with other models, you can fit seasonal models in R using the sarima() command in the astsa package.

To get a feeling of how pure seasonal models work, it is best to consider simulated data. We generated 250 observations from a pure seasonal model given by

\(X_t = .9X_{t−12} + Wt + .5W_{t−12}\),

which we would denote as a SARMA\((P = 1, Q = 1)_S = 12\). Three years of data and the model ACF and PACF are plotted for you.

You will compare the sample ACF and PACF values from the generated data to the true values.

The astsa package is preloaded for you and the generated data are in x.

# Plot sample P/ACF to lag 60 and compare to the true values
acf2(x, max.lag = 60)

# Fit the seasonal model to x
sarima(x, p = 0, d = 0, q = 0, P = 1, D = 0, Q = 1, S = 12)

Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
    Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
    fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
        sar1    sma1    xmean
      0.9311  0.4824  -0.5766
s.e.  0.0204  0.0633   0.8797

sigma^2 estimated as 0.9767:  log likelihood = -372.74,  aic = 753.48

$degrees_of_freedom
[1] 249

$ttable
      Estimate     SE t.value p.value
sar1    0.9311 0.0204 45.6189  0.0000
sma1    0.4824 0.0633  7.6158  0.0000
xmean  -0.5766 0.8797 -0.6554  0.5128

$AIC
[1] 2.989992

$AICc
[1] 2.990377

$BIC
[1] 3.046015

Fitting a seasonal model using sarima() requires a few additional arguments, but follows the same basic process as nonseasonal models.

Mixed seasonal models

mixed models: SARIMA\((p, d, q)\) x \((P, D, Q)_s\) model
consider a SARIMA\((0, 0, 1)\) x \((1, 0, 0)_{12}\) model
- \(X_t = \Phi X_{t-12} + W_t + \theta W_{t-1}\)
- SAR(1): value this month related to last year’s value \(X_{t-12}\)
- MA(1): this month’s value related to last month’s shock \(W_{t-1}\)

Fit a mixed seasonal model

Pure seasonal dependence such as that explored earlier in this chapter is relatively rare. Most seasonal time series have mixed dependence, meaning only some of the variation is explained by seasonal trends.

Recall that the full seasonal model is denoted by SARIMA\((p,d,q)\)x\((P,D,Q)_S\) where capital letters denote the seasonal orders.

As before, this exercise asks you to compare the sample P/ACF pair to the true values for some simulated seasonal data and fit a model to the data using sarima(). This time, the simulated data come from a mixed seasonal model, SARIMA\((0,0,1)\)x\((0,0,1)_{12}\). The plots show three years of data, as well as the model ACF and PACF. Notice that, as opposed to the pure seasonal model, there are correlations at the nonseasonal lags as well as the seasonal lags.

As always, the astsa package is preloaded. The generated data are in x.

# Plot sample P/ACF pair to lag 60 and compare to actual
acf2(x,  max.lag = 60)

# Fit the seasonal model to x
sarima(x, p = 0, d = 0, q = 1, P = 0, D = 0, Q = 1, S = 12)

Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
    Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
    fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
          ma1    sma1   xmean
      -0.6142  0.7888  0.0784
s.e.   0.0565  0.0475  0.0430

sigma^2 estimated as 1.005:  log likelihood = -364.26,  aic = 736.52

$degrees_of_freedom
[1] 249

$ttable
      Estimate     SE  t.value p.value
ma1    -0.6142 0.0565 -10.8809  0.0000
sma1    0.7888 0.0475  16.6094  0.0000
xmean   0.0784 0.0430   1.8253  0.0692

$AIC
[1] 2.922696

$AICc
[1] 2.92308

$BIC
[1] 2.978718

Time series data collected on a seasonal basis typically have mixed dependence. For example, what happens in June is often related to what happened in May as well as what happened in June of last year.

Data analysis - unemployment I

In the video, we fit a seasonal ARIMA model to the log of the monthly AirPassengers data set. You will now start to fit a seasonal ARIMA model to the monthly US unemployment data, unemp, from the astsa package.

The first thing to do is to plot the data, notice the trend and the seasonal persistence. Then look at the detrended data and remove the seasonal persistence. After that, the fully differenced data should look stationary.

The astsa package is preloaded in your workspace.

# Plot unemp 
plot(unemp, main = "Unemployment")

# Difference your data and plot
d_unemp <- diff(unemp)
plot(d_unemp, main = "Differenced Unemployment")

# Plot seasonal differenced diff_unemp
dd_unemp <- diff(d_unemp, lag = 12)   
plot(dd_unemp, main = "Seasonal Differenced Unemployment")

Now that you have removed the trend and seasonal variation in unemployment, the data appear to be stationary.

Data analysis - unemployment II

Now, you will continue fitting an SARIMA model to the monthly US unemployment unemp time series by looking at the sample ACF and PACF of the fully differenced series.

Note that the lag axis in the sample P/ACF plot is in terms of years. Thus, lags 1, 2, 3, … represent 1 year (12 months), 2 years (24 months), 3 years (36 months), …

Once again, the astsa package is preloaded in your workspace.

# Plot P/ACF pair of the fully differenced data to lag 60
dd_unemp <- diff(diff(unemp), lag = 12)
acf2(dd_unemp, max.lag = 60)

##      [,1] [,2] [,3] [,4] [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.21 0.33 0.15 0.17 0.10  0.06 -0.06 -0.02 -0.09 -0.17 -0.08 -0.48 -0.18
## PACF 0.21 0.29 0.05 0.05 0.01 -0.02 -0.12 -0.03 -0.05 -0.15  0.02 -0.43 -0.02
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.16 -0.11 -0.15 -0.09 -0.09  0.03 -0.01  0.02 -0.02  0.01 -0.02  0.09
## PACF  0.15  0.03 -0.04 -0.01  0.00  0.01  0.01 -0.01 -0.16  0.01 -0.27  0.05
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF  -0.05 -0.01  0.03  0.08  0.01  0.03 -0.05  0.01  0.02 -0.06 -0.02 -0.12
## PACF -0.01 -0.05  0.05  0.09 -0.04  0.02 -0.07 -0.01 -0.08 -0.08 -0.23 -0.08
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF   0.01 -0.03 -0.03 -0.10 -0.02 -0.13  0.00 -0.06  0.01  0.02  0.11  0.13
## PACF  0.06 -0.07 -0.01  0.03 -0.03 -0.11 -0.04  0.01  0.00 -0.03 -0.04  0.02
##      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF   0.10  0.07  0.10  0.12  0.06  0.14  0.05  0.04  0.04  0.07 -0.03
## PACF  0.03 -0.05  0.02  0.02 -0.08  0.00 -0.03 -0.07  0.05  0.04 -0.04

# Fit an appropriate model
sarima(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)

## initial  value 3.340809 
## iter   2 value 3.105512
## iter   3 value 3.086631
## iter   4 value 3.079778
## iter   5 value 3.069447
## iter   6 value 3.067659
## iter   7 value 3.067426
## iter   8 value 3.067418
## iter   8 value 3.067418
## final  value 3.067418 
## converged
## initial  value 3.065481 
## iter   2 value 3.065478
## iter   3 value 3.065477
## iter   3 value 3.065477
## iter   3 value 3.065477
## final  value 3.065477 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2     sma1
##       0.1351  0.2464  -0.6953
## s.e.  0.0513  0.0515   0.0381
## 
## sigma^2 estimated as 449.6:  log likelihood = -1609.91,  aic = 3227.81
## 
## $degrees_of_freedom
## [1] 356
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.1351 0.0513   2.6326  0.0088
## ar2    0.2464 0.0515   4.7795  0.0000
## sma1  -0.6953 0.0381 -18.2362  0.0000
## 
## $AIC
## [1] 8.723811
## 
## $AICc
## [1] 8.723988
## 
## $BIC
## [1] 8.765793

As always, keep a close eye on the output from your sarima() command to get a feel for the fit of your model.

Data analysis - commodity prices

Making money in commodities is not easy. Most commodities traders lose money rather than make it. The package astsa includes the data set chicken, which is the monthly whole bird spot price, Georgia docks, US cents per pound, from August, 2001 to July, 2016.

The astsa package is preloaded in your R console and the data are plotted for you, note the trend and seasonal components.

First, you will use your skills to carefully fit an SARIMA model to the commodity. Later, you will use the fitted model to try and forecast the whole bird spot price.

After removing the trend, the sample ACF and PACF suggest an AR(2) model because the PACF cuts off after lag 2 and the ACF tails off. However, the ACF has a small seasonal component remaining. This can be taken care of by fitting an addition SAR(1) component.

By the way, if you are interested in analyzing other commodities from various regions, you can find many different time series at index mundi.

plot(chicken, ylab = "Cents per Pound", main = "Chicken Prices")

# Plot differenced chicken
plot(diff(chicken))

# Plot P/ACF pair of differenced data to lag 60
acf2(diff(chicken), max.lag = 60)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  0.72  0.39  0.09 -0.07 -0.16 -0.20 -0.27 -0.23 -0.11  0.09  0.26  0.33
## PACF 0.72 -0.29 -0.14  0.03 -0.10 -0.06 -0.19  0.12  0.10  0.16  0.09  0.00
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.20  0.07 -0.03 -0.10 -0.19 -0.25 -0.29 -0.20 -0.08  0.08  0.16  0.18
## PACF -0.22  0.03  0.03 -0.11 -0.09  0.01 -0.03  0.07 -0.04  0.06 -0.05  0.02
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF   0.08 -0.06 -0.21 -0.31 -0.40 -0.40 -0.33 -0.18  0.02  0.20  0.30  0.35
## PACF -0.14 -0.19 -0.13 -0.06 -0.08 -0.05  0.01  0.03  0.10  0.02 -0.01  0.09
##      [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.26  0.13 -0.02 -0.14 -0.23 -0.21 -0.18 -0.11 -0.03  0.08  0.21  0.33
## PACF -0.12  0.01 -0.01 -0.05  0.02  0.12 -0.05 -0.13 -0.07  0.01  0.14  0.05
##      [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF   0.26  0.12 -0.01 -0.11 -0.13 -0.09 -0.09 -0.06  0.03  0.17  0.29  0.32
## PACF -0.20 -0.01  0.07 -0.04  0.02  0.00 -0.08  0.03  0.04  0.00  0.01  0.03

# Fit ARIMA(2,1,0) to chicken - not so good
sarima(chicken, p = 2, d = 1, q = 0)

## initial  value 0.001863 
## iter   2 value -0.156034
## iter   3 value -0.359181
## iter   4 value -0.424164
## iter   5 value -0.430212
## iter   6 value -0.432744
## iter   7 value -0.432747
## iter   8 value -0.432749
## iter   9 value -0.432749
## iter  10 value -0.432751
## iter  11 value -0.432752
## iter  12 value -0.432752
## iter  13 value -0.432752
## iter  13 value -0.432752
## iter  13 value -0.432752
## final  value -0.432752 
## converged
## initial  value -0.420883 
## iter   2 value -0.420934
## iter   3 value -0.420936
## iter   4 value -0.420937
## iter   5 value -0.420937
## iter   6 value -0.420937
## iter   6 value -0.420937
## iter   6 value -0.420937
## final  value -0.420937 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2  constant
##       0.9494  -0.3069    0.2632
## s.e.  0.0717   0.0718    0.1362
## 
## sigma^2 estimated as 0.4286:  log likelihood = -178.64,  aic = 365.28
## 
## $degrees_of_freedom
## [1] 176
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.9494 0.0717 13.2339  0.0000
## ar2       -0.3069 0.0718 -4.2723  0.0000
## constant   0.2632 0.1362  1.9328  0.0549
## 
## $AIC
## [1] 2.040695
## 
## $AICc
## [1] 2.041461
## 
## $BIC
## [1] 2.111922

# Fit SARIMA(2,1,0,1,0,0,12) to chicken - that works
sarima(chicken, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)

## initial  value 0.015039 
## iter   2 value -0.226398
## iter   3 value -0.412955
## iter   4 value -0.460882
## iter   5 value -0.470787
## iter   6 value -0.471082
## iter   7 value -0.471088
## iter   8 value -0.471090
## iter   9 value -0.471092
## iter  10 value -0.471095
## iter  11 value -0.471095
## iter  12 value -0.471096
## iter  13 value -0.471096
## iter  14 value -0.471096
## iter  15 value -0.471097
## iter  16 value -0.471097
## iter  16 value -0.471097
## iter  16 value -0.471097
## final  value -0.471097 
## converged
## initial  value -0.473585 
## iter   2 value -0.473664
## iter   3 value -0.473721
## iter   4 value -0.473823
## iter   5 value -0.473871
## iter   6 value -0.473885
## iter   7 value -0.473886
## iter   8 value -0.473886
## iter   8 value -0.473886
## iter   8 value -0.473886
## final  value -0.473886 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2    sar1  constant
##       0.9154  -0.2494  0.3237    0.2353
## s.e.  0.0733   0.0739  0.0715    0.1973
## 
## sigma^2 estimated as 0.3828:  log likelihood = -169.16,  aic = 348.33
## 
## $degrees_of_freedom
## [1] 175
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.9154 0.0733 12.4955  0.0000
## ar2       -0.2494 0.0739 -3.3728  0.0009
## sar1       0.3237 0.0715  4.5238  0.0000
## constant   0.2353 0.1973  1.1923  0.2347
## 
## $AIC
## [1] 1.945971
## 
## $AICc
## [1] 1.947256
## 
## $BIC
## [1] 2.035005

You have successfully fit an ARIMA model to a commodity. If you are interested in analyzing other commodities from various regions, you can find many different time series at index mundi

Data analysis - birth rate

Now you will use your new skills to carefully fit an SARIMA model to the birth time series from astsa. The data are monthly live births (adjusted) in thousands for the United States, 1948-1979, and includes the baby boom after WWII.

The birth data are plotted in your R console. Note the long-term trend (random walk) and the seasonal component of the data.

plot(birth, main="US Birth Rate")

# Plot P/ACF to lag 60 of differenced data
d_birth <- diff(birth)
acf2(d_birth, max.lag = 60)

##       [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.32 0.16 -0.08 -0.19  0.09 -0.28  0.06 -0.19 -0.05  0.17 -0.26  0.82
## PACF -0.32 0.06 -0.01 -0.25 -0.03 -0.26 -0.17 -0.29 -0.35 -0.16 -0.59  0.57
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF  -0.28  0.17 -0.07 -0.18  0.08 -0.28  0.07 -0.18 -0.05  0.16 -0.24  0.78
## PACF  0.13  0.11  0.13  0.09  0.00  0.00  0.05  0.04 -0.07 -0.10 -0.20  0.19
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF  -0.27  0.19 -0.08 -0.17  0.07 -0.29  0.07 -0.15 -0.04  0.14 -0.24  0.75
## PACF  0.01  0.05  0.07  0.07 -0.02 -0.06 -0.02  0.09  0.03 -0.06 -0.16  0.03
##      [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF  -0.23  0.16 -0.08 -0.15  0.05 -0.25  0.06 -0.18 -0.03  0.15 -0.22  0.72
## PACF  0.08 -0.10 -0.03  0.07 -0.04  0.06  0.04 -0.07 -0.06  0.02 -0.04  0.10
##      [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF  -0.24  0.16 -0.08 -0.13  0.05 -0.26  0.05 -0.17 -0.02  0.15 -0.23  0.70
## PACF  0.01  0.00 -0.03  0.04  0.03  0.00 -0.01  0.01  0.03  0.04 -0.09  0.04

# Plot P/ACF to lag 60 of seasonal differenced data
dd_birth <- diff(d_birth, lag = 12)
acf2(dd_birth, max.lag = 60)

##      [,1]  [,2]  [,3]  [,4]  [,5] [,6]  [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  -0.3 -0.09 -0.09  0.00  0.07 0.03 -0.07 -0.04 0.11  0.04  0.13 -0.43  0.14
## PACF -0.3 -0.20 -0.21 -0.14 -0.03 0.02 -0.06 -0.08 0.06  0.08  0.23 -0.32 -0.06
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.01  0.03  0.01  0.02  0.00  0.03 -0.07 -0.01     0  0.06 -0.01 -0.12
## PACF -0.13 -0.13 -0.11  0.02  0.06  0.04 -0.10  0.02     0  0.17 -0.13 -0.14
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF   0.17 -0.04  0.03 -0.05 -0.09 -0.01  0.19 -0.03 -0.09 -0.02 -0.04  0.17
## PACF  0.07 -0.04 -0.02  0.02 -0.06 -0.07  0.05  0.07 -0.06  0.05 -0.16 -0.01
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF  -0.14  0.03 -0.05  0.03  0.10     0 -0.10 -0.03  0.06  0.02  0.01 -0.01
## PACF -0.04 -0.01 -0.03 -0.01  0.01     0  0.03 -0.02 -0.07  0.05 -0.11  0.05
##      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF   0.06 -0.08  0.03  0.01 -0.02 -0.01  0.00 -0.07  0.17 -0.04 -0.01
## PACF  0.06 -0.03 -0.03  0.04  0.02 -0.04 -0.01 -0.13  0.07  0.07 -0.05

# Fit SARIMA(0,1,1)x(0,1,1)_12. What happens?
sarima(birth, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)

## initial  value 2.219164 
## iter   2 value 2.013310
## iter   3 value 1.988107
## iter   4 value 1.980026
## iter   5 value 1.967594
## iter   6 value 1.965384
## iter   7 value 1.965049
## iter   8 value 1.964993
## iter   9 value 1.964992
## iter   9 value 1.964992
## iter   9 value 1.964992
## final  value 1.964992 
## converged
## initial  value 1.951264 
## iter   2 value 1.945867
## iter   3 value 1.945729
## iter   4 value 1.945723
## iter   5 value 1.945723
## iter   5 value 1.945723
## iter   5 value 1.945723
## final  value 1.945723 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1     sma1
##       -0.4734  -0.7861
## s.e.   0.0598   0.0451
## 
## sigma^2 estimated as 47.4:  log likelihood = -1211.28,  aic = 2428.56
## 
## $degrees_of_freedom
## [1] 358
## 
## $ttable
##      Estimate     SE  t.value p.value
## ma1   -0.4734 0.0598  -7.9097       0
## sma1  -0.7861 0.0451 -17.4227       0
## 
## $AIC
## [1] 6.545975
## 
## $AICc
## [1] 6.546062
## 
## $BIC
## [1] 6.577399

# Fit another model, this time with an AR
sarima(birth, p = 1, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)

## initial  value 2.218186 
## iter   2 value 2.032584
## iter   3 value 1.982464
## iter   4 value 1.975643
## iter   5 value 1.971721
## iter   6 value 1.967284
## iter   7 value 1.963840
## iter   8 value 1.961106
## iter   9 value 1.960849
## iter  10 value 1.960692
## iter  11 value 1.960683
## iter  12 value 1.960675
## iter  13 value 1.960672
## iter  13 value 1.960672
## iter  13 value 1.960672
## final  value 1.960672 
## converged
## initial  value 1.940459 
## iter   2 value 1.934425
## iter   3 value 1.932752
## iter   4 value 1.931750
## iter   5 value 1.931074
## iter   6 value 1.930882
## iter   7 value 1.930860
## iter   8 value 1.930859
## iter   8 value 1.930859
## final  value 1.930859 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1     sma1
##       0.3038  -0.7006  -0.8000
## s.e.  0.0865   0.0604   0.0441
## 
## sigma^2 estimated as 45.91:  log likelihood = -1205.93,  aic = 2419.85
## 
## $degrees_of_freedom
## [1] 357
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.3038 0.0865   3.5104   5e-04
## ma1   -0.7006 0.0604 -11.5984   0e+00
## sma1  -0.8000 0.0441 -18.1302   0e+00
## 
## $AIC
## [1] 6.522519
## 
## $AICc
## [1] 6.522695
## 
## $BIC
## [1] 6.564418

The residual analysis from the first fit indicated that the residuals were not white noise. Hence, it was necessary to include an additional nonseasonal AR term to account for the extra correlation.

Forecasting seasonal ARIMA

Forecasting monthly unemployment

Previously, you fit an SARIMA\((2,1,0, 0,1,1)_{12}\) model to the monthly US unemployment time series unemp. You will now use that model to forecast the data 3 years.

The unemp data are preloaded into your R workspace and plotted.

plot(unemp)

# Fit your previous model to unemp and check the diagnostics
sarima(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)

## initial  value 3.340809 
## iter   2 value 3.105512
## iter   3 value 3.086631
## iter   4 value 3.079778
## iter   5 value 3.069447
## iter   6 value 3.067659
## iter   7 value 3.067426
## iter   8 value 3.067418
## iter   8 value 3.067418
## final  value 3.067418 
## converged
## initial  value 3.065481 
## iter   2 value 3.065478
## iter   3 value 3.065477
## iter   3 value 3.065477
## iter   3 value 3.065477
## final  value 3.065477 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2     sma1
##       0.1351  0.2464  -0.6953
## s.e.  0.0513  0.0515   0.0381
## 
## sigma^2 estimated as 449.6:  log likelihood = -1609.91,  aic = 3227.81
## 
## $degrees_of_freedom
## [1] 356
## 
## $ttable
##      Estimate     SE  t.value p.value
## ar1    0.1351 0.0513   2.6326  0.0088
## ar2    0.2464 0.0515   4.7795  0.0000
## sma1  -0.6953 0.0381 -18.2362  0.0000
## 
## $AIC
## [1] 8.723811
## 
## $AICc
## [1] 8.723988
## 
## $BIC
## [1] 8.765793

# Forecast the data 3 years into the future
sarima.for(unemp, n.ahead = 36, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1979 676.4664 685.1172 653.2388 585.6939 553.8813 664.4072 647.0657 611.0828
## 1980 683.3045 687.7649 654.8658 586.1507 553.9285 664.1108 646.6220 610.5345
## 1981 682.6406 687.0977 654.1968 585.4806 553.2579 663.4398 645.9508 609.8632
##           Sep      Oct      Nov      Dec
## 1979 594.6414 569.3997 587.5801 581.1833
## 1980 594.0427 568.7684 586.9320 580.5249
## 1981 593.3713 568.0970 586.2606 579.8535
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1979  21.20465  32.07710  43.70167  53.66329  62.85364  71.12881  78.73590
## 1980 116.99599 124.17344 131.51281 138.60466 145.49706 152.12863 158.52302
## 1981 194.25167 201.10648 208.17066 215.11503 221.96039 228.64285 235.16874
##            Aug       Sep       Oct       Nov       Dec
## 1979  85.75096  92.28663  98.41329 104.19488 109.67935
## 1980 164.68623 170.63839 176.39520 181.97333 187.38718
## 1981 241.53258 247.74268 253.80549 259.72970 265.52323

As you can see, the forecast is able to replicate much of the seasonal variation in the original unemployment data.

How hard is it to forecast commodity prices?

As previously mentioned, making money in commodities is not easy. To see a difficulty in predicting a commodity, you will forecast the price of chicken to five years in the future. When you complete your forecasts, you will note that even just a few years out, the acceptable range of prices is very large. This is because commodities are subject to many sources of variation.

Recall that you previously fit an SARIMA\((2,1,0, 1,0,0)_{12}\) model to the monthly US chicken price series chicken. You will use this model to calculate your forecasts.

The astsa package is preloaded for you and the monthly price of chicken data (chicken) are plotted.

plot(chicken)

# Fit the chicken model again and check diagnostics
sarima(chicken, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)

## initial  value 0.015039 
## iter   2 value -0.226398
## iter   3 value -0.412955
## iter   4 value -0.460882
## iter   5 value -0.470787
## iter   6 value -0.471082
## iter   7 value -0.471088
## iter   8 value -0.471090
## iter   9 value -0.471092
## iter  10 value -0.471095
## iter  11 value -0.471095
## iter  12 value -0.471096
## iter  13 value -0.471096
## iter  14 value -0.471096
## iter  15 value -0.471097
## iter  16 value -0.471097
## iter  16 value -0.471097
## iter  16 value -0.471097
## final  value -0.471097 
## converged
## initial  value -0.473585 
## iter   2 value -0.473664
## iter   3 value -0.473721
## iter   4 value -0.473823
## iter   5 value -0.473871
## iter   6 value -0.473885
## iter   7 value -0.473886
## iter   8 value -0.473886
## iter   8 value -0.473886
## iter   8 value -0.473886
## final  value -0.473886 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2    sar1  constant
##       0.9154  -0.2494  0.3237    0.2353
## s.e.  0.0733   0.0739  0.0715    0.1973
## 
## sigma^2 estimated as 0.3828:  log likelihood = -169.16,  aic = 348.33
## 
## $degrees_of_freedom
## [1] 175
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.9154 0.0733 12.4955  0.0000
## ar2       -0.2494 0.0739 -3.3728  0.0009
## sar1       0.3237 0.0715  4.5238  0.0000
## constant   0.2353 0.1973  1.1923  0.2347
## 
## $AIC
## [1] 1.945971
## 
## $AICc
## [1] 1.947256
## 
## $BIC
## [1] 2.035005

# Forecast the chicken data 5 years into the future
sarima.for(chicken, n.ahead = 60, p = 2, d = 1, q = 0, P = 1, D = 0, Q = 0, S = 12)

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2016                                                                111.0907
## 2017 110.5358 110.5612 110.5480 110.7055 111.0047 111.1189 111.1552 111.1948
## 2018 111.8108 111.9782 112.1330 112.3431 112.5991 112.7952 112.9661 113.1380
## 2019 114.1331 114.3464 114.5556 114.7827 115.0247 115.2473 115.4617 115.6765
## 2020 116.7942 117.0224 117.2492 117.4819 117.7193 117.9505 118.1790 118.4077
## 2021 119.5651 119.7980 120.0306 120.2650 120.5010 120.7350 120.9681         
##           Sep      Oct      Nov      Dec
## 2016 110.8740 110.6853 110.5045 110.5527
## 2017 111.2838 111.3819 111.4825 111.6572
## 2018 113.3260 113.5168 113.7085 113.9242
## 2019 115.8965 116.1174 116.3386 116.5675
## 2020 118.6380 118.8686 119.0993 119.3326
## 2021                                    
## 
## $se
##             Jan        Feb        Mar        Apr        May        Jun
## 2016                                                                  
## 2017  3.7414959  4.1793190  4.5747009  4.9373266  5.2742129  5.5903499
## 2018  8.2010253  8.5605811  8.9054714  9.2372195  9.5572539  9.8667955
## 2019 12.0038164 12.2921541 12.5738417 12.8492868 13.1188976 13.3830477
## 2020 15.1557253 15.3959082 15.6323906 15.8653300 16.0948844 16.3212022
## 2021 17.8397890 18.0473081 18.2524651 18.4553364 18.6559977 18.8545213
##             Jul        Aug        Sep        Oct        Nov        Dec
## 2016             0.6187194  1.3368594  2.0462419  2.6867986  3.2486625
## 2017  5.8893133  6.2367345  6.6253573  7.0309771  7.4344077  7.8255932
## 2018 10.1668604 10.4736807 10.7857727 11.0980056 11.4063211 11.7085266
## 2019 13.6420693 13.9002670 14.1573839 14.4122197 14.6638269 14.9117124
## 2020 16.5444204 16.7657634 16.9852163 17.2025022 17.4174076 17.6298379
## 2021 19.0509752

You have now mastered the process of detrending your time series data, exploring the qualities of the data to determine an appropriate model, fitting and adjusting a model, and even forecasting based on the model!

Congratulations!

What you’ve learned

how to identify an ARMA model from data looking at ACF and PACF
how to use integrated ARMA (ARIMA) models for nonstationary time series
how to cope with seasonality

Don’t stop here

astsa package
other DataCamp courses in Time Series Analysis

ARIMA Models in R

DataCamp - David Stoffer

10/28/2020