I took these notes while working through the Introduction to Time Series Analysis at DataCamp.

Exploratory Analysis

The most useful way to view raw time series (ts) data is the print() function. It shows the observations and the Start, End, and Frequency. Function length() returns the number of observations. Functions head() and tail() print the first and last n observations. str() shows the structure.

Here are these functions used with the Nile dataset from the datasets package. Nile contains n = 100 measurements of the annual flow of the river Nile at Aswan, 1871–1970, in 10^8 m^3.

print(Nile)

## Time Series:
## Start = 1871 
## End = 1970 
## Frequency = 1 
##   [1] 1120 1160  963 1210 1160 1160  813 1230 1370 1140  995  935 1110  994
##  [15] 1020  960 1180  799  958 1140 1100 1210 1150 1250 1260 1220 1030 1100
##  [29]  774  840  874  694  940  833  701  916  692 1020 1050  969  831  726
##  [43]  456  824  702 1120 1100  832  764  821  768  845  864  862  698  845
##  [57]  744  796 1040  759  781  865  845  944  984  897  822 1010  771  676
##  [71]  649  846  812  742  801 1040  860  874  848  890  744  749  838 1050
##  [85]  918  986  797  923  975  815 1020  906  901 1170  912  746  919  718
##  [99]  714  740

length(Nile)

## [1] 100

head(Nile, n = 10)

##  [1] 1120 1160  963 1210 1160 1160  813 1230 1370 1140

tail(Nile, n = 10)

##  [1] 1020  906  901 1170  912  746  919  718  714  740

The plot() function is another useful way to explore the data. When there are multiple time series in the data, use plot.ts(). The default xlab is “Time”, so you will want to override it.

plot(Nile, 
     xlab = "Year", 
     ylab = "River Volume (1e9 m^{3})",
     main = "Annual River Nile Volume at Aswan, 1871-1970", 
     type ="b")  # for points

The deltat() and frequency() functions show the spacing within a repeating series and number of periods within a series.

Here are these functions used with the AirPassengers dataset from the datasets package. AirPassengers contains n = 144 monthly totals of international airline passengers, 1949 to 1960.

plot(AirPassengers, 
     xlab = "Month", 
     ylab = "Passengers",
     main = "International Airline Passengers, 1949-1960")

# for monthly ts, points within year are separated by 1/12.
deltat(AirPassengers)

## [1] 0.08333333

# Number of observations per period (12 months in year).
frequency(AirPassengers)

## [1] 12

# retrieve the observation number within the cycle. 
cycle(AirPassengers)

##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949   1   2   3   4   5   6   7   8   9  10  11  12
## 1950   1   2   3   4   5   6   7   8   9  10  11  12
## 1951   1   2   3   4   5   6   7   8   9  10  11  12
## 1952   1   2   3   4   5   6   7   8   9  10  11  12
## 1953   1   2   3   4   5   6   7   8   9  10  11  12
## 1954   1   2   3   4   5   6   7   8   9  10  11  12
## 1955   1   2   3   4   5   6   7   8   9  10  11  12
## 1956   1   2   3   4   5   6   7   8   9  10  11  12
## 1957   1   2   3   4   5   6   7   8   9  10  11  12
## 1958   1   2   3   4   5   6   7   8   9  10  11  12
## 1959   1   2   3   4   5   6   7   8   9  10  11  12
## 1960   1   2   3   4   5   6   7   8   9  10  11  12

If there are missing observations (NAs) in the time series, it is common practice to impute values with the time series mean.

Suppose AirPassengers is missing 12 months of data. The code below replaces the NAs with the mean and overlays the result in the original plot.

x <- AirPassengers
x[85:96] <- NA
plot(x)
x[which(is.na(x))] <- mean(x, na.rm = TRUE)
points(x, type = "l", col = 2, lty = 3)

The ts() function creates time series objects. A ts object is a vector or matrix with additional attributes, including time indices for each observation, the sampling frequency and time increment between observations, and the cycle length for periodic data.

If the time series is continuous, its points may or may not be evenly spaced. If it is discrete, the points are necesarily evenly spaced.

# Create a time series of quarterly data starting in 2017
x <- rnorm(n = 20)
x.ts <- ts(x, 
           start = 2017, 
           frequency = 4)
plot(x.ts, 
     xlab = "Quarter",
     type = "b")

The xts object is an alternative to ts. Create an xts object with xts(x, order.by) where x is the the data and order.by is a vector of dates/times to index the data.

library(xts)
x.xts <- xts(x, 
             order.by = seq(as.Date("2017-01-01"), 
                          length = 20, 
                          by = "quarters"))
plot(x.xts, 
     xlab = "Quarter",
     type = "b")

If there are multiple time series in the data set, use the ts.plot() function instead of plot(). Dataset EuStockMarkets from the datasets package has multiple series. EuStockMarkets contains daily closing prices of major European stock indices from 1991-1998: Germany (DAX), Switzerland (SMI), France (CAC), and the UK (FTSE).

ts.plot(EuStockMarkets, 
        col = 1:4, 
        xlab = "Year", 
        ylab = "Index Value", 
        main = "Major European Stock Indices, 1991-1998")
legend("topleft", 
       colnames(EuStockMarkets), 
       lty = 1, 
       col = 1:4, 
       bty = "n")

Prediction

The White Noise (WN) model is the simplest example of a stationary process. It has a fixed mean and variance. The WN model is one of several autoregressive integrated moving average (ARIMA) models. An ARIMA(p, d, q) model has three parts, the autoregressive order p (number of time lags), the order of integration (or differencing) d, and the moving average order q. When two out of the three terms are zeros, the model may be referred to based on the non-zero parameter, dropping “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA (1, 0,0) is AR(1), ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1). The WN model is ARIMA(0,0,0).

Simulate a WN time series using the arima.sim() function with argument model = list(order = c(0, 0, 0)). Here is a 50-period WN model with mean 100 and standard deviation sd of 10.

wn <- arima.sim(model = list(order = c(0, 0, 0)), 
                n = 50, 
                mean = 100, 
                sd = 10)
ts.plot(wn,
        xlab = "Period", 
        ylab = "", 
        main = "WN Model, mean = 100, sd = 10")

Fit a WN model to a dataset with arima(x, order = c(0, 0, 0)). The model returns the mean, var, and se.

arima(wn, order = c(0, 0, 0))

## 
## Call:
## arima(x = wn, order = c(0, 0, 0))
## 
## Coefficients:
##       intercept
##        100.2029
## s.e.     1.6230
## 
## sigma^2 estimated as 131.7:  log likelihood = -192.96,  aic = 389.92

The model mean will be identical to the series mean. The variance will be close to the series variance.

# mean = intercept
mean(wn)

## [1] 100.2029

# se ~ s.e.
sqrt(var(wn) / length(wn))

## [1] 1.639436

# var ~ sigma^2
var(wn)

## [1] 134.3875

The Random Walk (RW) model is a WN model with a strong time dependance, so there is no fixed mean or variance. It is a non-stationary model. The random walk model is \(Y_t = c + Y_{t-1} + \epsilon_t\) where \(c\) is the drift coefficient, an overall slope parameter.

Simulate a RW time series using the arima.sim() function with argument model = list(order = c(0, 1, 0)). Here is a 50-period RW model with mean 0.

rw <- arima.sim(model = list(order = c(0, 1, 0)), 
                n = 50)
ts.plot(rw,
        xlab = "Period", 
        ylab = "", 
        main = "RW Model, mean = 0")

The first difference of a RW model is just a WN model.

ts.plot(diff(rw),
        xlab = "Period", 
        ylab = "", 
        main = "Diff(RW) = WN")

Specify the drift parameter with mean. The drift parameter is the slope of the RW model.

rw <- arima.sim(model = list(order = c(0, 1, 0)), 
                n = 100, 
                mean = 1)
ts.plot(rw,
        xlab = "Period", 
        ylab = "", 
        main = "RW Model, mean = 1")

Fit a random walk model with drift by first differencing the data, then fitting the WN model to the differenced data. The arima() intercept is the drift variable.

wn.mod <- arima(diff(rw), 
                order = c(0, 0, 0))

ts.plot(rw)
abline(a = 0, b = wn.mod$coef)

rw.mod <- arima(rw, 
                order = c(0, 1, 0))
points(rw - residuals(rw.mod), type = "l", col = 2, lty = 2)

When dealing with time series data, ask first if it stationary because stationary models are much simpler. A stationary process oscillates randomly about a fixed mean, a phenomenon called reversion to the mean. In contrast, nonstationary processes have time trends, periodicity, or lack mean reversion. Even when a process is nonstationary, the changes in the series may be approximately stationary. For example, inflation rates show a pattern over time related to Federal Reserve policy, but the changes in interest rates are stationary.

WN processes are stationary, but RW processes (the cumulative sum of the WN process) are not. Here are plots of a WN process and corresponding RW process using the cumsum() of the WN. Only the WN process is stationary.

# White noise
wn <- arima.sim(model = list(order = c(0, 0, 0)), 
                n = 100) 

# Random walk from white noise
rw <- cumsum(wn)
  
plot.ts(cbind(wn, rw),
        xlab = "Period",
        main = "WN with Zero Mean, and Corresponding RW")

Here is another WN process with corresponding RW process, this time with a drift parameter of 0.4.

# White noise with mean <> 0
wn <- arima.sim(model = list(order = c(0, 0, 0)), 
                n = 100, 
                mean = 0.4) 
  
# Random walk with drift from white noise
rw <- cumsum(wn)

plot.ts(cbind(wn, rw),
        xlab = "Period",
        main = "WN with Nonzero Mean, and Corresponding RW")

Correlation Analysis

Instead of comparing the plots of two time series, it is often useful to measure the correlation between two time seies by plotting the one time series against the other. E.g., in finance it is common to plot one stock’s price or log return against another stock.¹

Here is a short diversion into the relationship between log returns and stock prices. Below is the Germany DAX stock index from the EuStockMarkets dataset, and the corresponding log return. Daily returns are typically small with near-zero average.

DAX <- EuStockMarkets[, 1]
lnDiff <- diff(log(EuStockMarkets[,1]))
plot.ts(cbind(DAX, 
           lnDiff),
        xlab = "",
        main = "Germany DAX Stock Index, 1991-1998")

The occasionally extreme returns create a heavy-tailed distribution. Looking at all the indices in EuStockMarkets, the average daily return is about 0 with a standard deviation of 1%, but the histogram and normal probability plots reveal some daily returns are almost 5% in magnitude. The distribution of log returns is non-normal, especially in the tails.

eu_pctrtn <- diff(log(EuStockMarkets))

# column means
colMeans(eu_pctrtn)

##          DAX          SMI          CAC         FTSE 
## 0.0006520417 0.0008178997 0.0004370540 0.0004319851

# column standard deviations
apply(eu_pctrtn, MARGIN = 2, FUN = sd)

##         DAX         SMI         CAC        FTSE 
## 0.010300837 0.009250036 0.011030875 0.007957728

# histogram of percent returns for each index
par(mfrow = c(2,2))
hist(eu_pctrtn[, "DAX"], main = "DAX", xlab = "Percent Return")
hist(eu_pctrtn[, "SMI"], main = "SMI", xlab = "Percent Return")
hist(eu_pctrtn[, "CAC"], main = "CAC", xlab = "Percent Return")
hist(eu_pctrtn[, "FTSE"], main = "FTSE", xlab = "Percent Return")

# normal quantile plots of percent returns for each index
par(mfrow = c(2,2))
qqnorm(eu_pctrtn[, "DAX"], main = "DAX Normal Q-Q Plot")
qqline(eu_pctrtn[, "DAX"])
qqnorm(eu_pctrtn[, "SMI"], main = "SMI Normal Q-Q Plot")
qqline(eu_pctrtn[, "SMI"])
qqnorm(eu_pctrtn[, "CAC"], main = "CAC Normal Q-Q Plot")
qqline(eu_pctrtn[, "CAC"])
qqnorm(eu_pctrtn[, "FTSE"], main = "FTSE Normal Q-Q Plot")
qqline(eu_pctrtn[, "FTSE"])

With that background on log returns, now back to the measurement of the correlation between two time seies. Here are the bi-variate log returns of the stock indices from the EuStockMarkets dataset. The pairs of data drawn from the multivariate normal distribution form a roughly elliptically shaped point cloud.

pairs(diff(log(EuStockMarkets)))

Quantify the correlation between two quantitative variables with the Pearson correlation coefficient. Recall that the covariance between series \(X\) and \(Y\) is defined \(Cov(X, Y) = E[(X - \mu_X) (Y - \mu_Y)]\) and this simplifies to \(Cov(X, Y) = E[XY] - \mu_X \mu_Y\). The covariance of \(X\) and \(Y\) is positive if \(X\) and \(Y\) increase together and negative if they move in opposite directions. If \(X\) and \(Y\) are independent, \(E[XY] = E[X]E[Y] = \mu_X \mu_Y\), so \(Cov(X, Y) = 0\). Covariances are often inconvenient because their values depend on the units of the series. Dividing \(Cov(X, Y)\) by the standard deviations \(\sigma_X \sigma_Y\) creates a unitless variable with range [-1, 1], also known as the Pearson correlation:

\[\rho = \frac{\sigma_{XY}} {\sigma_X \sigma_Y}.\]

Incidentally, \(\rho\) is related to the slope of the linear regression line: \(\beta_1 = \frac{\sigma_{XY}}{\sigma_X^2} = \rho \frac{\sigma_Y}{\sigma_X}\).

Here are the correlations of the EuStockMarket dataset indices. \(|\rho| < .3\) is typically considered small, \(.3 < |\rho| < .5\) is medium, and \(|\rho| > .5\) is considered large. All pairs of the indices below have a large correlation.

cor(diff(log(EuStockMarkets)))

##            DAX       SMI       CAC      FTSE
## DAX  1.0000000 0.7031219 0.7344304 0.6394674
## SMI  0.7031219 1.0000000 0.6160454 0.5847791
## CAC  0.7344304 0.6160454 1.0000000 0.6485679
## FTSE 0.6394674 0.5847791 0.6485679 1.0000000

Autocorrelations (aka, lagged correlations) assess whether a time series is dependent on its past. The lag-1 autocorrelation of \(x\) is \(Cor(X_t, X_{t-1})\). Here is a lag-1 autocorrelation plot of Germany’s DAX stock index from the EuStockMarkets dataset.

plot(EuStockMarkets[-1, "DAX"],
     EuStockMarkets[-nrow(EuStockMarkets), "DAX"],
     main = "Lag-1 Autocorrelation of DAX Index",
     ylab = "t",
     xlab = "t - 1")

The autocorrelation function (ACF) acf() calculates the lag autocorrelation. There are two ways to use this function. Specify lag.max and plot = FALSE to see a list of autocorrelations up to lag.max. Specify plot = TRUE (default) to see a visual representation of the autocorrelation. The lag-1 autocorrelation from the DAX index plotted above is 0.997. (The first row shows the lags.)

acf(EuStockMarkets[, "DAX"], lag.max = 1, plot = FALSE)

## 
## Autocorrelations of series 'EuStockMarkets[, "DAX"]', by lag
## 
## 0.00000 0.00385 
##   1.000   0.997

The numeric estimates are important for detailed calculations, but it is also useful to visualize the ACF as a function of the lag. The acf() function produces a figure by default. It also makes a default choice for lag.max, the maximum number of lags to be displayed. Here is the ACF plot for the DAX stock index in EuStockMarkets.

acf(EuStockMarkets[, "DAX"])

Autoregression

The autoregressive (AR) model is the most widely used time series model. It shares the familiar interpretation of a simple linear regression, but each observation is regressed on the previous observation.

\[Y_t - \mu = \phi(Y_{t-1} - \mu) + \epsilon_t\]

where \(\epsilon_t \sim WN(0, \sigma_\epsilon^2)\). The AR model also includes the white noise (WN) and random walk (RW) models as special cases.

The arima.sim() function can simulate data from an AR model by setting the model argument equal to list(ar = phi) where phi is a slope parameter in the interval (-1, 1). As phi approaches 1, the plot smooths. With negative phi values, the plot oscillates.

# small autocorrelation
x <- arima.sim(model = list(ar = 0.5), n = 100)

# large autocorrelation
y <- arima.sim(model = list(ar = 0.9), n = 100)

# negative autocorrelation (oscillation)
z <- arima.sim(model = list(ar = -0.75), n = 100)

plot.ts(cbind(x, y, z))

The plots generated by the acf() function provide useful information about each lag. Series x (small slope parameter) has positive autocorrelation for the first couple lags, but they quickly decay toward zero. Series y (large slope parameter) has positive autocorrelation for many lags. Series z (negative slope parameter) has an oscillating pattern.

par(mfrow = c(2,2))
acf(x)
acf(y)
acf(z)

The stationary AR model has a slope parameter between -1 and 1. The AR model exhibits higher persistence when its slope parameter is closer to 1, but the process reverts to its mean fairly quickly. Its sample ACF also decays to zero at a quick (geometric) rate, meaning values far in the past have little impact on the present value of the process.

Below, the AR model with slope parameter 0.98 exhibits greater persistence than with slope parameter 0.90, but both decay to 0.

ar90 <- arima.sim(model = list(ar = 0.9), n = 200)
ar98 <- arima.sim(model = list(ar = 0.98), n = 200)

par(mfrow = c(2,2))
ts.plot(ar90)
ts.plot(ar98)
acf(ar90)
acf(ar98)

By contrast, the random walk (RW) model is a special case of the AR model in which the slope parameter is equal to 1. The RW model is nonstationary, and shows considerable persistence and relatively little decay in the ACF.

ar100 <- arima.sim(model = list(order = c(0, 1, 0)), n = 200)
par(mfrow = c(2,1))
ts.plot(ar100)
acf(ar100)

Fit the AR(1) model (autoregressive model with one time lag) using the arima() function with order = c(1, 0, 0), meaning 1 time lag, 0 differencing, and 0 order moving average.

Below is an AR(1) model fit to the AirPassengers dataset. The output of the arima function shows \(\phi\) as ar1 = 0.9646, \(\mu\) as intercept = 278.4649, and \(\hat{\sigma}_\epsilon^2\) as sigma^2 = 1119.

ar1 <- arima(AirPassengers, order = c(1, 0, 0))
ar1.fit <- AirPassengers - residuals(ar1)
print(ar1)

## 
## Call:
## arima(x = AirPassengers, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.9646   278.4649
## s.e.  0.0214    67.1141
## 
## sigma^2 estimated as 1119:  log likelihood = -711.09,  aic = 1428.18

par(mfrow = c(2, 1))
ts.plot(AirPassengers)
points(ar1.fit, type = "l", col = 2, lty = 2)
acf(AirPassengers)

Use the ARIMA model to forecast observations with the predict() function. Specify the number of periods beyond the last observation with the n.ahead parameter.

Below is a forecast of 10 periods (years) beyond the 1871-1970 annual observations in the Nile dataset. The relatively wide band of confidence (dotted lines) is a result of the low persistence in the data.

ar1 <-arima(Nile, order  = c(1, 0, 0))
print(ar1)

## 
## Call:
## arima(x = Nile, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.5063   919.5685
## s.e.  0.0867    29.1410
## 
## sigma^2 estimated as 21125:  log likelihood = -639.95,  aic = 1285.9

ts.plot(Nile, xlim = c(1871, 1980))
ar1.fit <- Nile - resid(ar1)

ar.pred <- predict(ar1, n.ahead = 10)$pred
ar.pred.se <- predict(ar1, n.ahead = 10)$se
points(ar.pred, type = "l", col = 2)
points(ar.pred - 2*ar.pred.se, type = "l", col = 2, lty = 2)
points(ar.pred + 2*ar.pred.se, type = "l", col = 2, lty = 2)

Simple Moving Average

The simple moving average (MA) model is

\[Y_t = \mu + \epsilon_t + \theta\epsilon_{t-1}\]

If the slope parameter \(\theta\) is zero, \(Y_t\) is a white noise process, \(Y_t \sim (\mu, \sigma_\epsilon^2)\). Large \(\theta\) indicates large autocorrelation. Negative \(\theta\) indicates an oscillating series.

The MA model is used to account for very short-run autocorrelation. Each observation is regressed on the previous innovation, which is not actually observed. Like the AR model, the MA model includes the white noise (WN) model as special case.

Simulate the MA model using arima.sim() with parameter list(ma = theta), where theta is a slope parameter from the interval (-1, 1).

Here are three MA models. The first has slope paramter 0.5 and the second has slope parameter 0.9. The second plot shows more persistance as a result. THe third plot has a negative slope parameter and oscillates as a result.

x <- arima.sim(model = list(ma = 0.5), n = 100)
y <- arima.sim(model = list(ma = 0.9), n = 100)
z <- arima.sim(model = list(ma = -0.5), n = 100)

plot.ts(cbind(x, y, z))

Use the acf() function to estimate the autocorrelation functions. The MA series x with slope = 0.5 has a positive sample autocorrelation at the first lag, but it is approximately zero at other lags. The series y with slope = 0.9 has a larger sample autocorrelation at its first lag, but it is also approximately zero for the others. The series z with slope = -0.5 has a negative sample autocorrelation at the first lag and alternates, but is approximately zero for all higher lags.

par(mfrow = c(2, 2))
acf(x)
acf(y)
acf(z)

Fit the MA model using the arima() function with `order = c(0, 0, 1), meaning 0 time lag, 0 differencing, and 1st order moving average.

Below is an MA model fit to the Nile dataset. The output of the arima function shows \(\theta\) as ma1 = 0.3783, \(\mu\) as intercept = 919.2433, and \(\sigma_\epsilon^2\) as sigma^2 = 23272.

ma <- arima(Nile, order = c(0, 0, 1))
print(ma)

## 
## Call:
## arima(x = Nile, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1  intercept
##       0.3783   919.2433
## s.e.  0.0791    20.9685
## 
## sigma^2 estimated as 23272:  log likelihood = -644.72,  aic = 1295.44

ts.plot(Nile)
ma.fit <- Nile - resid(ma)
points(ma.fit, type = "l", col = 2, lty = 2)

Use the ARIMA model to forecast observations with the predict() function. The MA model can only produce a 1-step forecast. Except for the 1-step forecast, all forecasts from the MA model are equal to the estimated mean (intercept).

Below is a forecast of 10 periods (years) beyond the 1871-1970 annual observations in the Nile dataset.

predict(ma, n.ahead = 10)

## $pred
## Time Series:
## Start = 1971 
## End = 1980 
## Frequency = 1 
##  [1] 868.8747 919.2433 919.2433 919.2433 919.2433 919.2433 919.2433
##  [8] 919.2433 919.2433 919.2433
## 
## $se
## Time Series:
## Start = 1971 
## End = 1980 
## Frequency = 1 
##  [1] 152.5508 163.1006 163.1006 163.1006 163.1006 163.1006 163.1006
##  [8] 163.1006 163.1006 163.1006

ts.plot(Nile, xlim = c(1871, 1980))
ma.pred <- predict(ma, n.ahead = 10)$pred
ma.pred.se <- predict(ma, n.ahead = 10)$se
points(ma.pred, type = "l", col = 2)
points(ma.pred - 2*ma.pred.se, type = "l", col = 2, lty = 2)
points(ma.pred + 2*ma.pred.se, type = "l", col = 2, lty = 2)

How do you decide which model provides the best fit? Measure the model fit with the Akaike information criterion (AIC) and/or Bayesian information criterion (BIC). These indicators penalize models with more estimated parameters to avoid overfitting, so smaller indicator values are preferable.

Use the AIC() and BIC() functions to estimate the indicators. Below are the AIC and BIC for the AR(1) and MA models. Although the predictions from both models are similar (they have a correlation coeffiicent of 0.94), both the AIC and BIC indicate that the AR model is a slightly better fit for the Nile data.

cor(ar1.fit, ma.fit)

## [1] 0.9401765

AIC(ar1)

## [1] 1285.904

AIC(ma)

## [1] 1295.442

BIC(ar1)

## [1] 1293.72

BIC(ma)

## [1] 1303.257

References

DataCamp: Introduction to Time Series Analysis

The log return (aka, continuously compounded return) is the change (first difference) in the logarithm of price. The log return is the percentage change. One advantage of log returns is that calculating multi-period returns is simple - just add the invidual period returns.↩

Introduction to Time Series Analysis

Michael Foley

2019-06-19