Time Series Basics

1. Time Series and stationarity

library("forecast")
library("fpp")
load("/Users/arminda/ownCloud/Prob_y_Estad/PyE(I)/AID/TS-New/milk.RData")

A stationary time series is one whose properties do not depend on the time at which the series is observed (the mean, variance and autocorrelation structure do not change over time). In general, a stationary time series will have no predictable patterns in the long term. Its time plot looks roughly horizontal and flat, with no trend and constant variance.

A time series is stationary if its statistical properties are constant over time. A stationary series has no trend, its variation around its mean has constant amplitude, and it wiggles in a consistent fashion, i.e., its short-term random patterns always look the same in a statistical sense.
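A quick way to build intuition is to simulate one series of each kind. The sketch below (simulated data, not any of the series discussed here) contrasts white noise, which is stationary, with a random walk, which is not:

set.seed(123)                    # for reproducibility
wn <- ts(rnorm(200))             # white noise: constant mean and variance
rw <- ts(cumsum(rnorm(200)))     # random walk: wanders, variance grows with time
par(mfrow = c(1, 2))
plot(wn, main = "White noise (stationary)")
plot(rw, main = "Random walk (non-stationary)")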

Consider the eight series plotted above. Which ones do you think are stationary?

2. Time Series and autocorrelation

Some characteristics of time series are:

  1. Observations in a time series tend to be correlated. The data has memory: observations today are affected by what happened in the past.
  2. How correlated are the observations with each other?
  3. How far does the memory go?

The autocorrelation coefficient measures the degree of (linear) association between observations separated by \(k\) points in time. These coefficients \(r_k\) at lags \(k\) are plotted in the Correlogram (ACF = Auto Correlation Function).

The Partial Autocorrelation coefficient measures the dependency between observations separated by \(k\) points in time, taking into account (or controlling for) the observations in between. For instance, the partial autocorrelation at lag \(5\) measures the correlation between observations separated by \(5\) units of time, taking into account the effect (partialling out the effect) of the observations at the intermediate lags \(1, 2, 3\) and \(4\). They are partial correlation coefficients (like the ones we defined in multiple linear regression). The prefix Auto is used because they are correlations of the data with itself.

These two kinds of coefficients are plotted in the Correlogram and Partial Correlogram (sample ACF and PACF). Inspection of these plots will help us answer the questions above.

The correlogram of a stationary time series goes to zero quickly:

  1. A slow decrease of the autocorrelations indicates that the series is non-stationary.
  2. A slow decrease of the autocorrelations at the seasonal lags indicates that the series is non-stationary with seasonality.
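This is easy to check with the base R functions acf() and pacf(). As a sketch (output not shown) using the ibmclose series, which is available once fpp is loaded: the sample ACF of the original series decays very slowly, while the ACF of the first-differenced series drops to zero almost immediately.

par(mfrow = c(2, 2))
acf(ibmclose)           # slow decay: signals non-stationarity
pacf(ibmclose)
acf(diff(ibmclose))     # drops to zero quickly: consistent with stationarity
pacf(diff(ibmclose))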

3. Transformations to achieve stationarity

Log transformation
Transformations such as logarithms (or square root) can help to stabilize the variance of a time series.
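As an illustration (a sketch using the built-in AirPassengers series, whose seasonal swings grow with the level of the series), the log transformation makes the variability roughly constant:

par(mfrow = c(1, 2))
plot(AirPassengers, main = "Original: variance grows with level")
plot(log(AirPassengers), main = "Log scale: roughly constant variance")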

Differencing

First-order differencing: this creates a new series by taking the differences between consecutive observations. First differences are the change between one observation and the next.

\[ d_t^1=y_t^{\prime}= y_t- y_{t-1} \]

The differenced series will have only \(T-1\) values, since it is not possible to calculate a difference for the first observation. Often (not always) a first difference will “detrend” the data.
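A tiny worked example with R's diff():

y <- c(3, 5, 9, 8)
diff(y)    # 2  4 -1, i.e. y[2]-y[1], y[3]-y[2], y[4]-y[3]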

Second-order differencing: occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time to obtain a stationary series:

\[ \begin{eqnarray*} d_t^2=y_t^{\prime \prime}&=& y_t^{\prime}- y_{t-1}^{\prime} \\ &=&y_t- 2y_{t-1} + y_{t-2} \\ \end{eqnarray*} \]

In practice, it is almost never necessary to go beyond second-order differences.
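In R, the differences argument of diff() does this in one call. Continuing the small example above:

diff(y, differences = 2)    # 2 -5, the same as diff(diff(y))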

Seasonal differencing: this is defined as the difference between a value and the value \(s\) points earlier, where \(s\) is the seasonal period. For monthly data, seasonal differences are the change from one year to the next.

With s = 12, which may occur with monthly data, a seasonal difference is:

\[d_t^{12}= y_t - y_{t-12}\]

The differences (from the previous year) may be about the same for each month of the year, giving us a stationary series.

With s = 4, which may occur with quarterly data, a seasonal difference is:

\[d_t^4 = y_t - y_{t-4}\]

Seasonal differencing removes the seasonal pattern and can also get rid of a seasonal random-walk type of nonstationarity.
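In R, the seasonal period is the lag argument of diff(). A sketch with two built-in series (monthly AirPassengers and quarterly austres):

diff(AirPassengers, lag = 12)    # monthly data, s = 12
diff(austres, lag = 4)           # quarterly data, s = 4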

Differencing for Trend and Seasonality: When both trend and seasonality are present, we may need to apply both a non-seasonal first difference and a seasonal difference.

When both seasonal and first differences are applied, it makes no difference which is done first; the result will be the same. However, if the data have a strong seasonal pattern, it is recommended that seasonal differencing be done first, because sometimes the resulting series will already be stationary and there will be no need for a further first difference. If first differencing is done first, seasonality will still be present.
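This order-invariance is easy to verify in R (a sketch using AirPassengers as a stand-in monthly series):

x <- log(AirPassengers)
all.equal(diff(diff(x, lag = 12)),    # seasonal difference first, then regular
          diff(diff(x), lag = 12))    # regular difference first, then seasonal
# returns TRUE: both orders give the same series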

How does this work? With the command diff() in R:

# ibmclose (daily IBM closing prices) is lazy-loaded with the fpp package,
# so no data() call is needed
par(mfrow = c(1, 4))
plot(ibmclose, xlab = "Day")                    # original series
plot(diff(ibmclose), xlab = "Day")              # first difference
plot(diff(ibmclose, 12), xlab = "Day")          # seasonal (lag-12) difference
plot(diff(diff(ibmclose, 12)), xlab = "Day")    # seasonal and first differences

Plots: (from left to right) Original series, first-order difference, first-order seasonal difference, seasonal and first differences.

How many differences to apply to the original series to make it stationary is a very important question. Besides inspecting the ACF and PACF plots, looking for the absence of patterns, Robert Nau's advice is very useful. In particular, rule 3:

Rule 3: The optimal order of differencing is often the order of differencing at which the standard deviation is lowest.

Comparing the standard deviations of the successive differences is a good way to proceed:

sd(ibmclose)
## [1] 84.21924
sd(diff(ibmclose))
## [1] 7.258214
sd(diff(ibmclose,12))
## [1] 27.34944
sd(diff(diff(ibmclose,12)))
## [1] 9.952536

The standard deviation is lowest for the first-differenced series; further differencing only increases it, so a single first-order difference is suggested.
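The same comparison can be run in one pass (a convenience sketch, equivalent to the four sd() calls above):

sapply(list(original = ibmclose,
            first    = diff(ibmclose),
            seasonal = diff(ibmclose, 12),
            both     = diff(diff(ibmclose, 12))),
       sd)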

4. Stationarity tests

One way to determine more objectively whether differencing is required is to use a stationarity test: a statistical hypothesis test designed to decide whether a series needs differencing. A warning is in order here: these tests may lead to conflicting answers.

Unit root tests:

A number of unit root tests are available; they are based on different assumptions and may lead to conflicting answers. One of the most popular is the Augmented Dickey-Fuller (ADF) test. It is applied to the seasonally adjusted series to see whether it is stationary or needs further regular differences. The null hypothesis is that the series has a unit root and the alternative is stationarity, so here we look for small p-values (\(<0.05\)) in order to reject the null and conclude that the series is stationary:

library("tseries")
adf.test(diff(ibmclose))
## Warning in adf.test(diff(ibmclose)): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(ibmclose)
## Dickey-Fuller = -6.067, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

The Phillips-Perron test also tests the null hypothesis that the series has a unit root. If the null is not rejected for the original series, we take a first difference and test again; the differenced series should then be stationary:

pp.test(ibmclose)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ibmclose
## Dickey-Fuller Z(alpha) = -3.9465, Truncation lag parameter = 5, p-value
## = 0.8893
## alternative hypothesis: stationary
pp.test(diff(ibmclose))
## Warning in pp.test(diff(ibmclose)): p-value smaller than printed p-value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  diff(ibmclose)
## Dickey-Fuller Z(alpha) = -318.04, Truncation lag parameter = 5, p-value
## = 0.01
## alternative hypothesis: stationary

And for the time series labeled (b) at the beginning of this document:

pp.test(hsales)
## Warning in pp.test(hsales): p-value smaller than printed p-value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  hsales
## Dickey-Fuller Z(alpha) = -44.195, Truncation lag parameter = 5, p-value
## = 0.01
## alternative hypothesis: stationary

Other types of (stationarity) tests:

In these tests the null hypothesis is stationarity, so here we look for large p-values in order not to reject stationarity. One of the most popular is the test due to Kwiatkowski, Phillips, Schmidt and Shin (1992). The null can be specified to be that the series is level stationary (stationary around a constant mean), which is the default option, or that the series is trend stationary (stationary around a deterministic trend) with the option null = "Trend".

For instance, for the time series labeled (b) at the beginning of this document:

kpss.test(hsales)
## Warning in kpss.test(hsales): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  hsales
## KPSS Level = 0.11498, Truncation lag parameter = 5, p-value = 0.1
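To test the null of trend stationarity instead, pass null = "Trend" (output not shown):

kpss.test(hsales, null = "Trend")    # null: stationary around a deterministic trend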

There are two other functions in R that may sometimes be useful: ndiffs() and nsdiffs(). ndiffs() estimates the number of (regular) differences required to make a given time series stationary, and nsdiffs() the number of seasonal differences. My own experience is that they don't always work, but they are worth a try.

ndiffs(ibmclose)
## [1] 1
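nsdiffs() plays the same role for seasonal differences; applied to a monthly series such as hsales it returns the suggested number of seasonal differences (output not shown):

nsdiffs(hsales)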