```python
# Load libraries you will use
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# New libraries
import yfinance as yf                       # financial data from Yahoo Finance
from pandas_datareader import data as pdr   # data from FRED (Federal Reserve Bank of St. Louis)

# Decomposition, autocorrelation functions, and Dickey-Fuller (non-stationarity)
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import acf as sm_acf, adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.stats.stattools import durbin_watson
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Utilities
from scipy.stats import norm

# How many decimals in your outputs? Here 2
pd.options.display.float_format = "{:.2f}".format
```
Time Series Econometrics
CD2010 Introduction to Econometrics
1 Introduction
Time series analysis is one of the core tools in econometrics, as it allows us to study how economic and financial variables evolve over time. Unlike cross-sectional data, where each observation is independent, time series data often display strong dependencies: today’s value is influenced by yesterday’s outcome, and expectations about tomorrow affect decisions made today.
In this workshop, we will explore the foundations of time series econometrics through both theory and hands-on Python applications. The goal is to give you intuition about the main concepts, as well as the ability to simulate and analyze real data on your own.
We will cover the following topics:
- Stationary Stochastic Processes
  - Understanding the building blocks of a time series (mean, variance, autocovariance).
  - Identifying the difference between stationary and non-stationary series, and why it matters for econometric modeling.
- Types of Series
  - Distinguishing between deterministic trends and stochastic trends.
  - Exploring economic examples where these differences are crucial (e.g., GDP growth, inflation, stock prices).
- Random Walk Models
  - Simulating random walks with and without drift, and interpreting their economic meaning.
  - Connecting the random walk hypothesis to financial markets (Fama, 1965) and the idea that stock prices follow unpredictable paths.
- Applications and Visualization
  - Using Python to generate simulations, plots, and diagnostics.
  - Applying these techniques to real-world macroeconomic and financial datasets.
Learning objectives
- Understand the difference between stationary and non‑stationary processes and why it affects inference.
- Identify and model seasonality via decomposition and seasonal dummies.
- Detect serial autocorrelation using the ACF and Durbin–Watson on regression residuals.
- Use ADF (Augmented Dickey–Fuller) tests to assess unit roots and guide transformations (differences/logs).
1.1 Load libraries
It is good practice to keep all the libraries in a single code chunk. That way, you can always check the same place to make sure you have loaded all the (previously installed) libraries you will use in the workshop.
2 Stationarity vs non-stationarity
If a series satisfies the following three conditions, it is said to be weakly or covariance stationary:

| Equation | Explanation |
|---|---|
| E(y_t) = \mu \quad t=1,2,\ldots,\infty | The mean of the series is constant over time. |
| E\big((y_t-\mu)^2\big) = \sigma^2 < \infty | The variance of the series is constant and finite. |
| E\big((y_{t_1}-\mu)(y_{t_2}-\mu)\big) = \gamma_{t_2-t_1} \quad \forall t_1,t_2 | The covariance between two periods depends only on their distance, not on the specific time. |
Stationarity refers to a time series whose statistical properties, such as mean, variance, and autocorrelation, do not change over time. In simple terms, a stationary series looks roughly the same at any point in time. For example, consider the daily temperature in a city where each day has a slight random variation around an average value, say around 20°C. If this average and the variation remain consistent throughout the year, the series is stationary. Stationarity is crucial for many statistical models because it makes it easier to predict future values based on past data.
Non-stationarity, on the other hand, means that the statistical properties of the time series change over time. This could be due to trends, seasonality, or other factors. For instance, if you track the population of a city over several decades, you’ll likely see an upward trend as the population grows. This upward trend makes the series non-stationary because the mean is changing over time. Non-stationary series are more challenging to analyze and predict because past patterns are not reliable indicators of future behavior.
A random walk is a specific type of non-stationary series where the value at any point is the sum of the previous value plus a random step. Imagine a drunkard taking steps in a random direction: if you track their position after each step, there is no predictable pattern, and the series tends to wander away from the starting point. For example, if you start at zero and each day add a random number between -1 and 1, the series will drift over time without a consistent mean or variance. This lack of predictability and tendency to wander illustrates the concept of a random walk.
2.1 Simulated Time-Series Regimes
We’ll simulate four classic behaviors and visualize each:
- Stationary (mean-reverting): Fluctuates around a stable average; shocks fade out over time.
- Trend-stationary: A clear deterministic trend (up or down) plus temporary wiggles; shocks fade, the trend remains.
- Random walk: Today’s level is yesterday’s level plus a new shock; shocks accumulate, so variance grows over time.
- Random walk with drift: Like a random walk, but with a persistent upward (or downward) push each period.
```python
# Simulate and plot: stationary, trend-stationary, random walk, random walk with drift
np.random.seed(123)

N = 300        # number of periods
sigma = 1.0    # shock volatility
phi = 0.6      # persistence for stationary noise
drift = 0.15   # per-period drift for the RW with drift

# 1) Stationary (AR-like mean reversion)
eps1 = np.random.normal(0, sigma, size=N)
y_stat = np.zeros(N)
for t in range(1, N):
    y_stat[t] = phi * y_stat[t-1] + eps1[t]

# 2) Trend-stationary (deterministic trend + stationary noise)
trend = np.linspace(0, 10, N)
eps2 = np.random.normal(0, sigma, size=N)
noise2 = np.zeros(N)
for t in range(1, N):
    noise2[t] = phi * noise2[t-1] + eps2[t]
y_trend_stat = trend + noise2

# 3) Random walk (non-stationary; shocks accumulate)
eps3 = np.random.normal(0, sigma, size=N)
y_rw = np.cumsum(eps3)

# 4) Random walk with drift (non-stationary with persistent direction)
eps4 = np.random.normal(0, sigma, size=N)
y_rwd = np.cumsum(drift + eps4)

# Plot
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
ax = axes.ravel()

ax[0].plot(y_stat)
ax[0].set_title("Stationary (mean-reverting)")
ax[0].set_xlabel("Time"); ax[0].set_ylabel("Level")

ax[1].plot(y_trend_stat)
ax[1].set_title("Trend-stationary (trend + temporary noise)")
ax[1].set_xlabel("Time"); ax[1].set_ylabel("Level")

ax[2].plot(y_rw)
ax[2].set_title("Random walk (shocks accumulate)")
ax[2].set_xlabel("Time"); ax[2].set_ylabel("Level")

ax[3].plot(y_rwd)
ax[3].set_title("Random walk with drift (persistent direction)")
ax[3].set_xlabel("Time"); ax[3].set_ylabel("Level")

plt.tight_layout()
plt.show()
```
3 Calculation of financial returns and growth
3.1 Returns, Growth, and Why We Use Logs
Let P_t be a price (or index level) at time t and Y_t be an economic level (e.g., GDP, CPI).
3.1.1 Simple (arithmetic) percent change
- Price/Index return R_t \;=\; \frac{P_t - P_{t-1}}{P_{t-1}} \;=\; \frac{P_t}{P_{t-1}} - 1
- Growth of an economic series (e.g., GDP, CPI/inflation) g_t \;=\; \frac{Y_t - Y_{t-1}}{Y_{t-1}} \;=\; \frac{Y_t}{Y_{t-1}} - 1
- In percent: 100 \times R_t or 100 \times g_t.
3.1.2 Continuously compounded (log) change
Log return / log growth: r_t \;=\; \ln P_t - \ln P_{t-1} \;=\; \ln\!\left(\frac{P_t}{P_{t-1}}\right), \qquad \Delta \ln Y_t \;=\; \ln Y_t - \ln Y_{t-1}, where \Delta \ln Y_t is the first difference of the natural logarithm of Y.
Small-change approximation r_t \;\approx\; R_t, \qquad \Delta \ln Y_t \;\approx\; \frac{Y_t}{Y_{t-1}} - 1
Exact conversion R_t \;=\; e^{r_t} - 1, \qquad r_t \;=\; \ln(1 + R_t)
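As a quick numerical check, here is a minimal sketch of the exact conversion in Python (the 5% return is just an illustrative value):

```python
import numpy as np

R = 0.05                # simple return of 5% (illustrative value)
r = np.log1p(R)         # log return: ln(1 + R)
R_back = np.expm1(r)    # back to simple: e^r - 1

print(f"simple R = {R:.4f}, log r = {r:.4f}, recovered R = {R_back:.4f}")
# For small changes r ≈ R: here r ≈ 0.0488 vs R = 0.0500
```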
3.1.3 Interpreting inflation and growth with indices
- Inflation from a price index (e.g., CPI)
\pi_t \;=\; \frac{\text{CPI}_t}{\text{CPI}_{t-1}} - 1 \;\;\approx\;\; \Delta \ln(\text{CPI}_t). Therefore, we can say that the inflation rate is approximately equal to the first difference of the natural logarithm of the Consumer Price Index.
- GDP growth \text{GDP growth}_t \;=\; \frac{Y_t}{Y_{t-1}} - 1 \;\;\approx\;\; \Delta \ln(Y_t).
Therefore, we can say that the GDP growth rate is approximately equal to the first difference of the natural logarithm of GDP.
Notes - Logs require positive levels (P_t>0,\; Y_t>0).
- To annualize monthly log changes: r_{\text{annual}} \approx 12 \times \bar{r}_{\text{monthly}}.
For simple returns: R_{\text{annual}} = (1+\bar{R}_{\text{monthly}})^{12} - 1.
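A minimal sketch of the two annualization rules (the 0.4% monthly figure is an assumed illustrative value):

```python
# Annualizing an average monthly return (illustrative value of 0.4% per month)
r_bar_monthly = 0.004                    # assumed average monthly log return
R_bar_monthly = 0.004                    # assumed average monthly simple return

r_annual = 12 * r_bar_monthly            # log returns add across periods
R_annual = (1 + R_bar_monthly)**12 - 1   # simple returns compound
print(f"log: {r_annual:.4f} | simple: {R_annual:.4f}")  # ≈ 0.0480 vs ≈ 0.0491
```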
3.2 Load the data
For this Workshop we will be downloading the data directly from the internet. We will be able to download data from different sources such as Yahoo Finance or FRED (the Saint Louis Fed Database). From these databases you can source thousands of financial and economic data series from many market exchanges and countries around the world.
We will collect the following series:
- US GDP in millions of dollars, not seasonally adjusted, from 1980Q1 to present.
- Índice de Precios y Cotizaciones (IPyC), from 1991-11 to present.
- S&P 500, from 1985-01 to present.
Notice that the code asks for information since January 1980 for all variables, but each variable returns data for its own available history.
What is a market index?
A market index in a financial market is a virtual portfolio composed of a group of public firms that issue shares in that market. The IPyC index emulates a virtual portfolio of the 35 biggest public firms in the Mexican market. The S&P 500 index emulates a virtual portfolio of the 500 biggest public firms in the US market. In each case, the weight (%) assigned to each firm in the virtual portfolio is proportional to the firm's market capitalization.
```python
# US GDP, Millions of Dollars, Not Seasonally Adjusted (FRED: NA000334Q)
start = "1980-01-01"
end = "2025-08-28"
gdp = pdr.DataReader("NA000334Q", "fred", start=start, end=end)
gdp.columns = ["GDP"]

# IPyC (^MXX) and S&P 500 (^GSPC) from Yahoo Finance (monthly)
tickers = ["^MXX", "^GSPC"]
start = "1980-01-01"
end = "2025-08-28"
data = yf.download(tickers, start=start, end=end, interval="1mo")

data.tail()
```
| Date | Close ^GSPC | Close ^MXX | High ^GSPC | High ^MXX | Low ^GSPC | Low ^MXX | Open ^GSPC | Open ^MXX | Volume ^GSPC | Volume ^MXX |
|---|---|---|---|---|---|---|---|---|---|---|
| 2025-04-01 | 5569.06 | 56259.28 | 5695.31 | 57415.86 | 4835.04 | 49799.28 | 5597.53 | 52546.87 | 118936380000 | 4775961900.00 |
| 2025-05-01 | 5911.69 | 57841.69 | 5968.61 | 59735.43 | 5578.64 | 55241.51 | 5625.14 | 55602.80 | 105346260000 | 4234019100.00 |
| 2025-06-01 | 6204.95 | 57450.88 | 6215.08 | 58600.78 | 5861.43 | 55603.77 | 5896.68 | 57973.40 | 106456300000 | 4386838400.00 |
| 2025-07-01 | 6339.39 | 57397.93 | 6427.02 | 58680.87 | 6177.97 | 55287.90 | 6187.25 | 57446.31 | 114004890000 | 4081420100.00 |
| 2025-08-01 | 6460.26 | 58708.86 | 6508.23 | 59421.61 | 6212.69 | 56486.86 | 6287.28 | 57485.57 | 99352030000 | 3414573000.00 |
As you can see, this function downloads five variables for each series: the OHLC (open, high, low, and close) prices and the volume. We will now keep only the closing price for each date and get rid of any missing values (NaNs).
```python
# Keep Close for the indices. For stocks you also get the Adjusted Price, but it's OK to work with the Close.
close = data["Close"].dropna(how="all")
MXX = close[["^MXX"]].dropna().rename(columns={"^MXX": "MXX"})
SP500 = close[["^GSPC"]].dropna().rename(columns={"^GSPC": "SP500"})

# Preview heads
gdp.head(), MXX.head(), SP500.head()
```
```
(                GDP
 DATE
 1980-01-01  683564
 1980-04-01  698570
 1980-07-01  707671
 1980-10-01  767505
 1981-01-01  758799,
 Ticker          MXX
 Date
 1991-11-01  1384.20
 1991-12-01  1431.50
 1992-01-01  1623.50
 1992-02-01  1860.60
 1992-03-01  1875.70,
 Ticker       SP500
 Date
 1985-01-01  179.63
 1985-02-01  181.18
 1985-03-01  180.66
 1985-04-01  179.83
 1985-05-01  189.55)
```
3.3 Summarize data to get an overview
We start by obtaining descriptive statistics for each series.
```python
# Descriptive statistics
summary_indices = pd.concat({
    "GDP": gdp["GDP"].describe(),
    "MXX": MXX["MXX"].describe(),
    "SP500": SP500["SP500"].describe()
}, axis=1)
summary_indices
```
|  | GDP | MXX | SP500 |
|---|---|---|---|
| count | 182.00 | 406.00 | 488.00 |
| mean | 3113918.38 | 26162.00 | 1593.23 |
| std | 1816331.23 | 19290.86 | 1382.84 |
| min | 683564.00 | 1327.10 | 179.63 |
| 25% | 1540822.50 | 5830.34 | 497.38 |
| 50% | 2747598.50 | 29114.48 | 1211.58 |
| 75% | 4336879.00 | 44047.14 | 2060.58 |
| max | 7606502.00 | 58708.86 | 6460.26 |
4 Graphical analysis of time series
4.1 Series in levels
We now compare different plots from the data we downloaded. We start with the S&P500 index (monthly Close).
="SP500", title="S&P 500 (Close, monthly)")
SP500.plot(y plt.show()
Now we plot the evolution of US GDP (quarterly), not seasonally adjusted.
="GDP", title="US GDP 1980–present (not seasonally adjusted)")
gdp.plot(y plt.show()
Finally, let’s have a look at the Mexican Financial Market Index, IPyC (monthly close).
="MXX", title="IPyC (Close, monthly)")
MXX.plot(y plt.show()
5 Seasonality & Decomposition
Seasonality is a repeating calendar pattern. The US GDP series is quarterly and not seasonally adjusted, so we expect systematic quarter-to-quarter patterns.
Two structures are common:
- Additive model: y_t = T_t + S_t + R_t. Seasonal swings are roughly constant in size over time.
- Multiplicative model: y_t = T_t \times S_t \times R_t. Seasonal swings scale with the level. Taking logs makes it additive: \ln y_t = \ln T_t + \ln S_t + \ln R_t.
Components:
- T_t (Trend/Level): the smooth long-run path (growth/decline).
- S_t (Seasonality): systematic, calendar-related pattern that repeats every m periods (e.g., months in a year).
- R_t (Remainder/Residual/Irregular): what’s left after removing trend and seasonality (noise, shocks, idiosyncrasies).
When to use which? If variance rises with the level (macro series with growth), multiplicative often fits better; otherwise additive is fine.
Example of the additive model on a simulated additive series Y.
Example of the multiplicative model on a simulated multiplicative series Y.
Example of the additive model on the logarithm of a simulated multiplicative series Y.
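If you want to reproduce figures like these yourself, here is a minimal sketch (all parameters are assumed illustrative values, not the ones behind the figures above):

```python
# Simulate an additive and a multiplicative seasonal series, then decompose them
idx = pd.date_range("2000-01-01", periods=120, freq="MS")   # 10 years of monthly data
t = np.arange(120)
trend = 100 + 0.5 * t
season = 10 * np.sin(2 * np.pi * t / 12)    # 12-month repeating pattern
noise = np.random.normal(0, 2, 120)

y_add = pd.Series(trend + season + noise, index=idx)                           # constant swings
y_mul = pd.Series(trend * (1 + season / 100) * (1 + noise / 100), index=idx)   # swings scale with level

seasonal_decompose(y_add, model="additive", period=12).plot()
seasonal_decompose(np.log(y_mul), model="additive", period=12).plot()  # logs make it additive
plt.show()
```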
5.1 Decomposition: US GDP (quarterly)
5.1.1 Additive model
```python
# Change your series, model, and period accordingly (4=quarterly, 12=monthly)
decomp_add = seasonal_decompose(gdp["GDP"], model="additive", period=4)

# ---- Additive model: 2×2 panel ----
fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
axes = axes.flatten()

axes[0].plot(decomp_add.observed); axes[0].set_title("Observed")
axes[1].plot(decomp_add.trend);    axes[1].set_title("Trend (Tₜ)")
axes[2].plot(decomp_add.seasonal); axes[2].set_title("Seasonality (Sₜ)")
axes[3].plot(decomp_add.resid);    axes[3].set_title("Remainder (Rₜ)")

fig.suptitle("US GDP (NSA) — Additive Decomposition (period=4)", y=1.02)
plt.show()
```
5.1.2 Multiplicative model
```python
decomp_mul = seasonal_decompose(gdp["GDP"], model="multiplicative", period=4)

# ---- Multiplicative model: 2×2 panel ----
fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
axes = axes.flatten()

axes[0].plot(decomp_mul.observed); axes[0].set_title("Observed")
axes[1].plot(decomp_mul.trend);    axes[1].set_title("Trend (Tₜ)")
axes[2].plot(decomp_mul.seasonal); axes[2].set_title("Seasonality (Sₜ)")
axes[3].plot(decomp_mul.resid);    axes[3].set_title("Remainder (Rₜ)")

fig.suptitle("US GDP (NSA) — Multiplicative Decomposition (period=4)", y=1.02)
plt.show()
```
Interpretation
- The trend captures long‑run growth.
- The seasonal panel shows the within‑year pattern (Q1…Q4) repeated across years.
- The residual shows short‑run, non‑seasonal variation.
Forecasting implication
Ignoring seasonality leads to systematic bias in specific quarters. Seasonal ARIMA/ETS or including quarterly dummies corrects this.
5.2 Decomposition: MXX & S&P 500 (monthly)
Financial indices may show weak seasonality in levels, but it is still useful to practice the tools.
```python
decomp_mxx = seasonal_decompose(MXX["MXX"].dropna(), model="additive", period=12)
fig = decomp_mxx.plot()
fig.suptitle("MXX (Levels) — Additive Decomposition (period=12)", y=1.02)
plt.show()

decomp_sp = seasonal_decompose(SP500["SP500"].dropna(), model="multiplicative", period=12)
fig = decomp_sp.plot()
fig.suptitle("S&P 500 (Levels) — Multiplicative Decomposition (period=12)", y=1.02)
plt.show()
```
For returns, deterministic seasonality is often modeled with calendar dummies (e.g., months), while level seasonality is handled with seasonal differencing or seasonal ARIMA on the levels.
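As a minimal sketch of the calendar-dummy approach (an illustrative regression on the MXX series loaded above, not part of the workshop output):

```python
# Deterministic monthly seasonality in MXX log returns via calendar dummies
ret = np.log(MXX["MXX"]).diff().dropna().to_frame(name="ret")
ret["month"] = ret.index.month

# C(month) expands into 11 monthly dummies (January is the omitted base category)
seas_fit = smf.ols("ret ~ C(month)", data=ret).fit()
print(seas_fit.params)  # small, mostly insignificant coefficients would suggest weak calendar effects
```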
6 Unit Roots and Differencing
We have already seen that a random walk is non-stationary: its variance grows over time, and shocks have permanent effects. This happens because the process has what we call a unit root.
Formally, a time series has a unit root if it can be written as:
y_t = \rho y_{t-1} + \epsilon_t, \quad \epsilon_t \sim \text{WN}(0,\sigma^2).
- If |\rho| < 1, the series is stationary: shocks die out and the process reverts to its mean. Thinks of this as the ability to forget something (e.g, a bad break up).
- If \rho = 1, the series has a unit root. This is exactly the case of a random walk. Because \rho=1, shocks never fade; they accumulate permanently. In our example, it’s the case where you never get over your toxic ex-couple.
- If |\rho| > 1, the series explodes. This does not make sense in economic/financial terms, so we won’t cover that case here.
6.1 Making It Stationary
If we take first differences, i.e., subtract the previous period’s value (the first lag) from today’s value, we get:
\Delta y_t = y_t - y_{t-1},
then for the unit root case (\rho = 1):
\Delta y_t = \epsilon_t,
which is just white noise: constant mean, constant variance, and no memory of the past.
- Stock prices: non-stationary with a unit root, but returns (first log differences) are often stationary.
- GDP levels: may follow a stochastic trend (unit root), but growth rates behave more like stationary series.
In short, the idea of the random walk generalizes into the concept of a unit root process.
Unit roots explain why many economic and financial series are non-stationary in levels, but become stationary after differencing.
Sometimes the first difference of a series is still non-stationary. No worries: we can take as many differences as we need to make a series stationary, although the interpretation gets more complex. The second difference of y_t is:
\Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2}.
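A quick sketch using the random walk simulated in Section 2.1 (reusing y_rw from that code):

```python
# Differencing the simulated random walk recovers its white-noise shocks
rw = pd.Series(y_rw)
d1 = rw.diff()           # Δy_t = ε_t: stationary
d2 = rw.diff().diff()    # Δ²y_t: a second difference, in case one is not enough

fig, ax = plt.subplots(1, 2, figsize=(12, 4))
ax[0].plot(rw); ax[0].set_title("Random walk (level): non-stationary")
ax[1].plot(d1); ax[1].set_title("First difference: white noise")
plt.tight_layout()
plt.show()
```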
7 Series in the first log-difference
Now we create variables that might be stationary by taking the first difference of the logarithm of each series (log returns or log growth), thereby removing one unit root (you remove one unit root per difference).
```python
# Continuously compounded return of the MXX
rMXX = np.log(MXX["MXX"]).diff()

rMXX.plot(title="Continuously compounded return for the MXX index")
plt.show()

# S&P 500 log return
rSP500 = np.log(SP500["SP500"]).diff()
rSP500.plot(title="Continuously compounded return for the SP500 index")
plt.show()

# GDP growth rate (log-diff)
GDPgrowth = np.log(gdp["GDP"]).diff()

GDPgrowth.plot(title="US GDP (log growth)")
plt.show()
```
```python
# Descriptive statistics of returns and growth
summary_returns = pd.concat({
    "GDP growth": GDPgrowth.describe(),
    "MXX log-returns": rMXX.describe(),
    "SP500 log-returns": rSP500.describe()
}, axis=1)
summary_returns
```
|  | GDP growth | MXX log-returns | SP500 log-returns |
|---|---|---|---|
| count | 181.00 | 405.00 | 487.00 |
| mean | 0.01 | 0.01 | 0.01 |
| std | 0.03 | 0.07 | 0.04 |
| min | -0.06 | -0.35 | -0.25 |
| 25% | -0.01 | -0.03 | -0.02 |
| 50% | 0.02 | 0.01 | 0.01 |
| 75% | 0.04 | 0.05 | 0.04 |
| max | 0.08 | 0.19 | 0.12 |
8 White noise
First we start by modelling a white noise process (notice that for this section we do not actually use the dataset we just downloaded). A white noise process is one with (virtually) no discernible structure. A definition of a white noise process is:
E(y_t)=\mu,\quad \mathrm{var}(y_t)=\sigma^2,\quad \gamma_{t-r}= \begin{cases} \sigma^2 & \text{if } t=r\\ 0 & \text{otherwise} \end{cases}
This means that our series y_t has a constant mean (i.e., no trend) equal to \mu, a constant variance \sigma^2, and an autocovariance equal to zero for different periods.
The following code produces four white-noise series of different lengths: 50, 100, 500, and 1000 observations. We set a seed for reproducibility and draw the series from a standard normal distribution.
```python
np.random.seed(123)

n_1, n_2, n_3, n_4 = 50, 100, 500, 1000

white_noise_1 = np.random.normal(size=n_1)
white_noise_2 = np.random.normal(size=n_2)
white_noise_3 = np.random.normal(size=n_3)
white_noise_4 = np.random.normal(size=n_4)

fig, axes = plt.subplots(2, 2, figsize=(12, 7))
axes = axes.ravel()
for ax, wn, title in zip(
    axes,
    [white_noise_1, white_noise_2, white_noise_3, white_noise_4],
    ["White Noise series 50 obs.", "White Noise series 100 obs.",
     "White Noise series 500 obs.", "White Noise series 1000 obs."]
):
    ax.plot(wn)
    ax.set_title(title)
    ax.set_xlabel("Time")
    ax.set_ylabel("Value")

plt.tight_layout()
plt.show()
```
8.1 Descriptive statistics of our series
We combine the four series of different lengths by padding with NaN up to the maximum length, and then obtain descriptive statistics.
```python
t = max(len(white_noise_1), len(white_noise_2), len(white_noise_3), len(white_noise_4))

def pad_to(arr, L):
    out = np.empty(L)
    out[:] = np.nan
    out[:len(arr)] = arr
    return out

noises = pd.DataFrame({
    "white_noise_1": pad_to(white_noise_1, t),
    "white_noise_2": pad_to(white_noise_2, t),
    "white_noise_3": pad_to(white_noise_3, t),
    "white_noise_4": pad_to(white_noise_4, t),
})
noises.describe().T
```
|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| white_noise_1 | 50.00 | 0.01 | 1.20 | -2.80 | -0.78 | -0.12 | 0.92 | 2.39 |
| white_noise_2 | 100.00 | 0.08 | 1.04 | -2.12 | -0.74 | 0.14 | 0.81 | 2.60 |
| white_noise_3 | 500.00 | -0.04 | 0.97 | -3.23 | -0.67 | -0.03 | 0.61 | 2.96 |
| white_noise_4 | 1000.00 | -0.02 | 0.96 | -3.80 | -0.66 | -0.01 | 0.64 | 3.57 |
8.2 Histograms of white noises
Since we are creating our series from random draws from a Standard Normal Distribution, we expect their histograms to look close to normal. Sample size matters for how well we “cover” the distribution.
```python
fig, axes = plt.subplots(2, 2, figsize=(12, 7))
axes = axes.ravel()
for ax, wn, title in zip(
    axes,
    [white_noise_1, white_noise_2, white_noise_3, white_noise_4],
    ["Histogram White Noise series 50 obs.",
     "Histogram White Noise series 100 obs.",
     "Histogram White Noise series 500 obs.",
     "Histogram White Noise series 1000 obs."]
):
    ax.hist(wn, bins=20, density=True)
    # Overlay the standard normal pdf
    xs = np.linspace(wn.min()-1, wn.max()+1, 200)
    ax.plot(xs, norm.pdf(xs, 0, 1))
    ax.set_title(title)

plt.tight_layout()
plt.show()
```
8.3 Autocorrelation Functions of White Noise series
White noise series should not be autocorrelated beyond lag 0. We plot the ACF for our four series to inspect this property. (Blue bands are 95% confidence intervals.)
```python
fig, axes = plt.subplots(2, 2, figsize=(12, 7))
axes = axes.ravel()
for ax, wn, title in zip(
    axes,
    [white_noise_1, white_noise_2, white_noise_3, white_noise_4],
    ["ACF white noise 50 obs.", "ACF white noise 100 obs.",
     "ACF white noise 500 obs.", "ACF white noise 1000 obs."]
):
    plot_acf(wn, lags=12, ax=ax)
    ax.set_title(title)

plt.tight_layout()
plt.show()
```
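The same autocorrelations can also be obtained numerically with sm_acf (imported at the top); a minimal sketch:

```python
# Numeric ACF values for the 1000-obs. white noise series (first 5 lags)
acf_vals = sm_acf(white_noise_4, nlags=5)
print(np.round(acf_vals, 3))  # lag 0 is always 1; the remaining lags should be near 0
```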
8.4 AR(1) Processes and Unit Roots
An autoregressive process of order 1, AR(1), is defined as a series modeled on its own first lag (the previous period’s value) plus a shock:
y_t = \rho y_{t-1} + \epsilon_t, \quad \epsilon_t \sim \text{WN}(0,\sigma^2).
The behavior of the process depends on the value of \rho:
- \rho = 0.5 (moderate positive correlation): shocks die out gradually, the series reverts to its mean.
- \rho = -0.5 (negative correlation): the series oscillates above and below its mean, with shocks alternating in sign.
- \rho = 0.9 (high persistence): shocks take a long time to fade; the series shows strong momentum.
- \rho = 1 (unit root): shocks never fade, the process becomes a random walk and is non-stationary.
Graphically, this looks like the following:
```python
# Set seed for replicability
np.random.seed(42)
# Number of observations
T = 100
# Shocks
eps = np.random.normal(0, 1, T)

# AR(1)
def simulate_ar1(rho, T, eps):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho*y[t-1] + eps[t]
    return y

rhos = [0.5, -0.5, 0.9, 1.0]
fig, axes = plt.subplots(2, 2, figsize=(12, 6))

for ax, rho in zip(axes.flatten(), rhos):
    y = simulate_ar1(rho, T, eps)
    ax.plot(y)
    ax.set_title(f"AR(1) with ρ = {rho}")

plt.tight_layout()
plt.show()
```
9 Serial Autocorrelation
Serial autocorrelation means values or residuals correlate with their own lags. This violates the classical OLS independence assumption and can inflate t-statistics.
9.1 Autocorrelation Function (ACF)
# ACF S&P levels. It's good practice to use dropna() to get rid of NaNs
=12, title = "ACF - S&P 500 levels")
plot_acf(SP500.dropna(), lags plt.show()
# ACF S%P cc returns
=12, title = "ACF - S&P 500 cc returns")
plot_acf(rSP500.dropna(), lags plt.show()
Alternatively, create multi-panel figures.
```python
# Plots of ACF for GDP
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(gdp.dropna(), lags=12, ax=ax[0]); ax[0].set_title("ACF — US GDP (levels)")
plot_acf(GDPgrowth.dropna(), lags=12, ax=ax[1]); ax[1].set_title("ACF — US GDP (log growth)")
plt.tight_layout(); plt.show()
```
Reading the ACF
- Strong, persistent spikes in GDP levels reflect trend/non‑stationarity.
- GDP log‑growth usually shows much weaker correlation.
```python
# Plots of ACF for MXX
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(MXX.dropna(), lags=12, ax=ax[0]); ax[0].set_title("ACF — MXX (levels)")
plot_acf(rMXX.dropna(), lags=12, ax=ax[1]); ax[1].set_title("ACF — MXX (cc returns)")
plt.tight_layout()
plt.show()
```
- Equity returns often have little linear autocorrelation.
9.2 Durbin–Watson (DW) on residuals
DW tests for first‑order autocorrelation in regression residuals. Since DW \approx 2(1-\hat{\rho}_1), values near 2 indicate no AR(1) autocorrelation, values below 2 indicate positive autocorrelation, and values above 2 indicate negative autocorrelation.
We test two simple “constant‑only” regressions (plug in your own models as needed).
```python
def dw_from_constant_only(y):
    y_ = y.dropna()
    X = sm.add_constant(np.ones(len(y_)))
    model = sm.OLS(y_.values, X).fit()
    dw = durbin_watson(model.resid)
    return model, dw

model_g, dw_g = dw_from_constant_only(GDPgrowth)
model_sp, dw_sp = dw_from_constant_only(rSP500)

print(f"DW — GDP log-growth residuals: {dw_g:.3f}")
print(f"DW — S&P500 returns residuals: {dw_sp:.3f}")
```

```
DW — GDP log-growth residuals: 3.048
DW — S&P500 returns residuals: 1.967
```
To compute DW for any model, fit it and pass model.resid to durbin_watson(resid).
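For example, a minimal sketch (an illustrative regression, not part of the workshop output) applying DW to the residuals of a lag regression:

```python
# DW on residuals of a regression of S&P 500 returns on their own first lag
df_dw = rSP500.dropna().to_frame(name="r")
df_dw["r_lag1"] = df_dw["r"].shift(1)
df_dw = df_dw.dropna()

lag_fit = smf.ols("r ~ r_lag1", data=df_dw).fit()
print(durbin_watson(lag_fit.resid))  # ≈ 2 if the lag absorbs the first-order autocorrelation
```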
10 Stationarity Assessment (ADF Test)
The Augmented Dickey–Fuller (ADF) test checks for a unit root.
- Null H_0: unit root (non‑stationary).
- Alternative: stationary (around a constant or a trend, depending on the spec).
We test GDP levels vs log‑growth, and SP500 levels vs log‑returns.
```python
def adf_report(series, name="", regression="c"):
    """
    regression:
      'c'  -> constant
      'ct' -> constant + trend
      'n'  -> no constant
    """
    s = series.dropna().astype(float)
    result = adfuller(s, regression=regression, autolag="AIC")
    test_stat, pvalue, usedlag, nobs, crit, icbest = result
    print(f"ADF Test — {name} | regression='{regression}'")
    print(f"  test statistic: {test_stat:.3f}")
    print(f"  p-value       : {pvalue:.4f}")
    print(f"  used lags     : {usedlag}")
    print(f"  nobs          : {nobs}")
    print("  critical values:")
    for k, v in crit.items():
        print(f"    {k}: {v:.3f}")
    print(f"  best IC (AIC) : {icbest:.3f}")
    print("-"*54)
```
```python
# GDP levels likely non-stationary (include trend)
adf_report(gdp["GDP"], name="US GDP (levels)", regression="ct")

# GDP log-growth (no constant, since the series is a difference)
adf_report(GDPgrowth, name="US GDP (log growth)", regression="n")

# Second difference of GDP
GDP_2diff = GDPgrowth.diff()
adf_report(GDP_2diff, name="US GDP (growth velocity)", regression="n")

# S&P 500 levels likely non-stationary (include trend)
adf_report(SP500["SP500"], name="S&P 500 (levels)", regression="ct")

# S&P cc returns perhaps stationary (no constant due to difference)
adf_report(rSP500, name="S&P 500 (log returns)", regression="n")
```
```
ADF Test — US GDP (levels) | regression='ct'
  test statistic: 1.595
  p-value       : 1.0000
  used lags     : 12
  nobs          : 169
  critical values:
    1%: -4.013
    5%: -3.437
    10%: -3.142
  best IC (AIC) : 4188.938
------------------------------------------------------
ADF Test — US GDP (log growth) | regression='n'
  test statistic: -1.064
  p-value       : 0.2590
  used lags     : 11
  nobs          : 169
  critical values:
    1%: -2.579
    5%: -1.943
    10%: -1.615
  best IC (AIC) : -938.596
------------------------------------------------------
ADF Test — US GDP (growth velocity) | regression='n'
  test statistic: -6.233
  p-value       : 0.0000
  used lags     : 14
  nobs          : 165
  critical values:
    1%: -2.579
    5%: -1.943
    10%: -1.615
  best IC (AIC) : -938.627
------------------------------------------------------
ADF Test — S&P 500 (levels) | regression='ct'
  test statistic: 2.302
  p-value       : 1.0000
  used lags     : 15
  nobs          : 472
  critical values:
    1%: -3.978
    5%: -3.420
    10%: -3.133
  best IC (AIC) : 5543.106
------------------------------------------------------
ADF Test — S&P 500 (log returns) | regression='n'
  test statistic: -21.097
  p-value       : 0.0000
  used lags     : 0
  nobs          : 486
  critical values:
    1%: -2.570
    5%: -1.942
    10%: -1.616
  best IC (AIC) : -1574.585
------------------------------------------------------
```
Interpretation guide
- High p-value (> 0.05): fail to reject the unit root → treat as non‑stationary (difference or detrend before ARMA‑type modeling).
- Low p-value (≤ 0.05): reject the unit root → treat as stationary under the chosen specification.
Therefore, we can say that GDP is integrated of order 2, or I(2) (we had to difference twice to obtain a stationary series), that GDP growth is I(1), and that GDP growth velocity is I(0), hence stationary.
With the S&P 500, we can say that the series in levels is integrated of order 1, or I(1), and that the continuously compounded returns of the SP500 are I(0), hence stationary.