Intro to Time Series

November 29, 2016

Outline:

What is a time series?
Stationarity
ACF/correlogram
Unit roots
ADF test
Trend vs. difference stationarity
Random walks

Time Series data

We're not getting a random sample, which means we can't make valid inferences by relying on the Classical Linear Regression assumptions.

We're looking at data that is generated from some process, so we want to model that process to squeeze out systematic variations in the data.

Remember our overall task: use variation in \(x\) to understand variation in \(y\). OLS regression on time series data will lead us astray if we don't account for certain possibilities.

Autocorrelation (sometimes "serial correlation")

The big problem we're facing in TS!

Negative autocorrelation is when error terms tend to alternate in sign, jumping up and down from one period to the next.
Positive autocorrelation is when error terms follow one another. Instead of jumping up and down, they gradually sway up and down.

Autocorrelation means our statistical inferences won't be valid until we appropriately tweak our models.
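To make the two cases concrete, here is a minimal sketch that simulates errors from an AR(1) process. The coefficient values (0.8 and -0.8), the seed, and the helper name `lag1_cor` are illustrative assumptions, not part of the original slides.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 500

# AR(1) errors: u_t = phi * u_{t-1} + e_t
u_pos <- arima.sim(model = list(ar = 0.8), n = n)   # positive autocorrelation
u_neg <- arima.sim(model = list(ar = -0.8), n = n)  # negative autocorrelation

# lag-1 sample correlation: correlate the series with itself shifted one period
lag1_cor <- function(u) cor(u[-1], u[-length(u)])

lag1_cor(u_pos)  # positive: errors follow one another
lag1_cor(u_neg)  # negative: errors flip sign from period to period
```

Plotting `u_pos` and `u_neg` side by side should show the "swaying" versus "jumping" patterns described above.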

The basic components of time series data

Remember, our goal is to try to understand variation in Y by looking at variation in X. Normally if we see X and Y increasing together, that means X and Y are highly correlated. But now that comovement might be simply because of a time trend.

Removing this regularity in the data means we can ask about variation in X and Y around the trend.
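A small simulated sketch of this point: two series that share nothing except an upward time trend look highly correlated until the trend is removed. The slopes, noise levels, and variable names are assumptions chosen for illustration.

```r
set.seed(2)  # arbitrary seed
time_idx <- 1:200

# two unrelated series that both trend upward over time
x <- 0.5 * time_idx + rnorm(200, sd = 5)
y <- 0.3 * time_idx + rnorm(200, sd = 5)

cor(x, y)  # high, but only because both series trend upward

# remove a linear trend from each series and correlate what is left
x_detr <- residuals(lm(x ~ time_idx))
y_detr <- residuals(lm(y ~ time_idx))
cor(x_detr, y_detr)  # near zero: no relationship around the trend
```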

Another regularity is seasonal variation.

Stationarity

If a series is stationary:

It has to do with greeting cards
We can generalize to different time periods
- but just because the sun rose yesterday, doesn't mean it will rise tomorrow.
We're less prone to make spurious regressions.
- Swimming pool drownings and Nicolas Cage films
- Divorce rate in Maine and consumption of margarine

Stationarity

A process is stationary if the mean, variance, and autocorrelation don't change over time.

\[E(x_t) = \mu_x\]
\[var(x_t) = \sigma_x^2\]
\[cov(x_t,x_{t-j}) = f(j)\]
The covariance depends only on the lag \(j\), not on \(t\).

Data with a strong trend may be trend stationary but won't be stationary without (at least) accounting for that trend.
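One way to see what "variance doesn't change over time" means is to simulate many realizations of a process and compare the variance across realizations at an early and a late date. The sketch below does this for white noise (stationary) and a random walk (not stationary); the number of paths, the dates compared, and the helper `var_at` are illustrative assumptions.

```r
set.seed(3)  # arbitrary seed
n_paths <- 2000
n_obs <- 100
shocks <- matrix(rnorm(n_paths * n_obs), nrow = n_obs)  # one column per simulated path

white_noise <- shocks                    # stationary: x_t = e_t
random_walk <- apply(shocks, 2, cumsum)  # non-stationary: x_t = x_{t-1} + e_t

# variance across paths at a given date t
var_at <- function(paths, t) var(paths[t, ])

c(var_at(white_noise, 10), var_at(white_noise, 100))  # both near 1
c(var_at(random_walk, 10), var_at(random_walk, 100))  # grows with t: near 10, then near 100
```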

Testing for stationarity

Graphs
- "Anyone who tries to analyze time series without plotting it first is asking for trouble." (Chatfield, quoted in Gujarati's Econometrics by Example)
Autocorrelation functions and Correlograms
Unit root tests

Autocorrelation functions (ACF)

We care about the effects of several lags of x (lag(x,1), lag(x,2), and so on up to some lag k), not just lag(x,1).
- i.e. the effects may persist over multiple periods (e.g. effects of booms and busts)
The ACF at lag \(k\):
\[\rho_k= \frac{\gamma_k}{\gamma_0} = \frac{cov(x_t,x_{t-k})}{var(x_t)}\]
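The formula can be checked against R's built-in `acf()` from the stats package. The sketch below computes the sample version of \(\rho_k\) by hand on a simulated AR(1) series; the series, its coefficient (0.7), and the helper name `acf_by_hand` are assumptions for illustration.

```r
set.seed(4)  # arbitrary seed
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 300))

# sample autocorrelation at lag k, following rho_k = gamma_k / gamma_0
acf_by_hand <- function(x, k) {
  n <- length(x)
  x_dm <- x - mean(x)                                         # demeaned series
  gamma_k <- sum(x_dm[(k + 1):n] * x_dm[1:(n - k)]) / n       # sample autocovariance at lag k
  gamma_0 <- sum(x_dm^2) / n                                  # sample variance
  gamma_k / gamma_0
}

by_hand  <- sapply(1:5, function(k) acf_by_hand(x, k))
built_in <- acf(x, lag.max = 5, plot = FALSE)$acf[-1]  # drop lag 0, which is always 1
cbind(lag = 1:5, by_hand, built_in)
```

Calling `acf(x)` with the default `plot = TRUE` draws the correlogram directly.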

Correlogram

Example: Stress dreams

Google trends data for "dream teeth falling out"

Random walks, example:

Extra credit: Based on a series of coin tosses I will give you 1 extra credit point for each H, but -1 point for each T.

Questions:

What is E(extra credit)?
How many times would you get more than 100 (or less than -100) if you flipped infinitely many times?
What sort of streaks would you expect if you flipped infinitely many times?

Random walks, example: 100 flips

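library(magrittr)  # provides the %>% pipe
rbinom(100, 1, 0.5) %>% ifelse(., 1, -1) %>% cumsum()  # 100 fair flips, +1 for H and -1 for T, as a running total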

  [1]  1  2  1  2  3  2  1  0 -1  0  1  2  1  0 -1  0 -1  0 -1  0  1  0  1
 [24]  0  1  2  3  2  1  2  3  2  3  2  1  0 -1  0  1  0 -1  0  1  2  3  2
 [47]  3  2  3  4  3  2  1  2  3  4  5  4  3  2  3  2  1  2  3  4  5  4  5
 [70]  6  7  6  7  8  9  8  9 10  9 10 11 12 11 12 11 12 13 12 11 10 11 10
 [93]  9 10 11 12 13 12 11 12

  [1]  1  2  1  2  1  2  3  2  3  2  3  2  3  2  3  4  5  6  5  4  5  4  3
 [24]  4  3  2  3  2  1  0  1  0  1  0 -1  0  1  2  1  0  1  2  3  2  3  2
 [47]  3  4  5  6  5  4  3  2  3  4  3  4  3  2  3  4  3  2  1  0  1  0 -1
 [70] -2 -3 -4 -5 -4 -5 -6 -5 -4 -3 -2 -3 -2 -3 -4 -3 -4 -5 -4 -3 -2 -1 -2
 [93] -1 -2 -3 -4 -3 -4 -3 -2
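The two runs above end in very different places. A rough way to explore the questions about expected extra credit and streaks is to repeat the 100-flip experiment many times. This is a sketch only: the number of repetitions, the seed, and the variable names are assumptions.

```r
set.seed(5)  # arbitrary seed
n_sims <- 5000
n_flips <- 100

# each column is one sequence of 100 flips, coded +1 (H) or -1 (T)
flips <- matrix(sample(c(1, -1), n_sims * n_flips, replace = TRUE), nrow = n_flips)

final_score <- colSums(flips)
mean(final_score)  # near 0: heads and tails cancel on average
sd(final_score)    # near sqrt(100) = 10: individual totals still wander far from 0

# longest run of identical flips in each simulated sequence
longest_streak <- apply(flips, 2, function(f) max(rle(f)$lengths))
summary(longest_streak)  # long runs are routine even with a fair coin
```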

Random walks, example: streaks

Random walks vs streaks

Random walks

Types of random walks

Random walk
Random walk with drift
Random walk with drift around trend
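The three types can be simulated from the same set of shocks, which makes the differences easy to compare. In this sketch the drift (0.2) and trend coefficient (0.01) are arbitrary illustrative values, and the variable names are assumptions.

```r
set.seed(6)  # arbitrary seed
n <- 200
e <- rnorm(n)      # the random component, shared by all three series
time_idx <- 1:n

rw       <- cumsum(e)                          # x_t = x_{t-1} + e_t
rw_drift <- cumsum(0.2 + e)                    # x_t = 0.2 + x_{t-1} + e_t
rw_trend <- cumsum(0.2 + 0.01 * time_idx + e)  # x_t = 0.2 + 0.01 t + x_{t-1} + e_t

# first differences recover the drift (and trend) that were built in
mean(diff(rw))        # near 0
mean(diff(rw_drift))  # near 0.2
head(cbind(rw, rw_drift, rw_trend))
```

Plotting the three columns against `time_idx` shows the drift series pulling steadily away from the pure random walk, and the trend series pulling away faster still.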

Unit roots

To test for the presence of a "unit root", try the regression:
\[\Delta x_t = \beta_0 + \beta_1 t + \beta_2 x_{t-1} + u_t\]
\[(x_t - x_{t-1}) = \beta_0 + \beta_1 t + \beta_2 x_{t-1} + u_t\]

Our null hypothesis (\(H_0: \beta_2 = 0\)) is that \(x_t\) is just \(x_{t-1}\) plus some random component, i.e. that the series has a unit root.

The other terms test for presence of a trend (\(\beta_1\)) or "drift" (\(\beta_0\)).

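The usual t-test isn't valid here: under \(H_0\) the series is non-stationary, so the t-statistic on \(\beta_2\) does not follow a t distribution.
The \(\tau\) (tau) test can test \(H_0\): it uses the same statistic but compares it with Dickey-Fuller critical values.
- This is also known as the Dickey-Fuller (DF) test.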
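The regression above can be run with `lm()` to get the \(\tau\) statistic. The sketch below uses base R only and does not compute a p-value; the helper name `df_tau` and the simulated series are assumptions for illustration. Packaged implementations exist (for example `adf.test()` in the tseries package), but they are not used here.

```r
set.seed(7)  # arbitrary seed

# tau statistic from the regression: dx_t = b0 + b1 * t + b2 * x_{t-1} + u_t
df_tau <- function(x) {
  n <- length(x)
  dx <- diff(x)     # x_t - x_{t-1} for t = 2, ..., n
  x_lag <- x[-n]    # x_{t-1}
  time_idx <- 2:n   # the trend term t
  fit <- lm(dx ~ time_idx + x_lag)
  coef(summary(fit))["x_lag", "t value"]
}

random_walk <- cumsum(rnorm(200))  # has a unit root, so H0 is true
white_noise <- rnorm(200)          # stationary, so H0 is false

df_tau(random_walk)  # typically not far below zero
df_tau(white_noise)  # strongly negative
# Compare tau with Dickey-Fuller critical values, not the usual t table.
# With a constant and a trend, the 5% value is roughly -3.4: reject the
# unit root only if tau is more negative than that.
```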

Dickey Fuller test

Tests for unit roots. The alternatives are:
- Explosive change (\(\beta_2 > 0\): \(x_t\) multiplies \(x_{t-1}\), so shocks are amplified over time)
- Convergence (\(\beta_2 < 0\): the effect of \(x_{t-1}\) goes away after a while)

But if we have a unit root in our series, then we need to do something about it before we can use that series to answer other questions.

Data transformations to achieve stationarity

Differencing means looking at the change in the variable from period to period instead of the level of the variable.
- e.g. GDP growth is more likely to be stationary than GDP
Detrending involves estimating a trend (e.g. "\(X\) increases by about 50 units per year"), then subtracting that trend from the data. Now the data tell you whether \(X\) was above or below trend.
Using a logarithmic model can help stabilize the variance.
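The three transformations can be sketched on one simulated series. The series here (exponential growth at 3% per period with multiplicative noise) is a hypothetical stand-in for something like GDP, and all names and parameter values are assumptions.

```r
set.seed(8)  # arbitrary seed
n <- 120
time_idx <- 1:n

# hypothetical series: 3% growth per period with multiplicative noise
x <- exp(0.03 * time_idx + rnorm(n, sd = 0.1))

log_x  <- log(x)                           # log: turns growing swings into roughly constant ones
growth <- diff(log_x)                      # differencing: period-to-period change (approx. growth rate)
detr   <- residuals(lm(log_x ~ time_idx))  # detrending: deviation from the fitted linear trend

mean(growth)  # near 0.03, the growth rate built into the simulation
range(detr)   # positive values are above trend, negative values below
```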

Seasonality

Some data fluctuate in a predictable way:
- Consumer spending peaks before Christmas
- Time spent on homework peaks just before it's due
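Ignoring this regularity can be problematic: like a shared time trend, a shared seasonal pattern can make X and Y appear related when they are not.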
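One common way to account for seasonality is to regress the series on month dummies and work with what is left over. The sketch below uses made-up monthly spending with a December bump; the data, the size of the bump, and the variable names are all assumptions for illustration.

```r
set.seed(9)  # arbitrary seed
n_years <- 10
month <- factor(rep(month.abb, times = n_years), levels = month.abb)

# hypothetical monthly spending: flat baseline plus a December bump
spending <- 100 + 30 * (month == "Dec") + rnorm(12 * n_years, sd = 5)

# month dummies absorb the predictable seasonal pattern (January is the baseline)
seasonal_fit <- lm(spending ~ month)
coef(seasonal_fit)["monthDec"]  # near 30: December spending relative to January

deseasonalized <- residuals(seasonal_fit)
round(tapply(deseasonalized, month, mean), 10)  # zero in every month: the pattern is gone
```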