Standard OLS on panel data with cross-sectional correlation in residuals produces biased standard errors. The Fama-MacBeth (1973) procedure corrects for this by running \(T\) separate cross-sectional regressions and averaging the resulting coefficients. The standard errors are computed from the time-series variation of those cross-sectional estimates — not from the pooled residuals — making them robust to cross-sectional dependence.
Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
The procedure consists of two passes:
Pass 1 — Time-Series Regression (N regressions, one per asset)
For each asset \(i\), regress its full return history on the Fama-French factors to estimate factor loadings (betas):
\[r_{i,t} = \alpha_i + \beta_{i,\text{MKT}} \cdot \text{MKT}_t + \beta_{i,\text{SMB}} \cdot \text{SMB}_t + \beta_{i,\text{HML}} \cdot \text{HML}_t + \varepsilon_{i,t}\]
Pass 2 — Cross-Sectional Regression (T regressions, one per date)
At each date \(t\), regress the cross-section of returns on the betas from Pass 1 to obtain period-\(t\) risk premia \(\hat{\lambda}_t\):
\[r_{i,t} = \gamma_{0,t} + \lambda_{\text{MKT},t} \cdot \hat{\beta}_{i,\text{MKT}} + \lambda_{\text{SMB},t} \cdot \hat{\beta}_{i,\text{SMB}} + \lambda_{\text{HML},t} \cdot \hat{\beta}_{i,\text{HML}} + u_{i,t}\]
Final Step — Averaging & Inference
Average the \(T\) estimates of each \(\lambda\) across time and test whether the mean differs significantly from zero:
\[\hat{\lambda}_j = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{j,t}, \qquad H_0: \bar{\lambda}_j = 0 \quad \text{vs} \quad H_1: \bar{\lambda}_j \neq 0\]
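The two passes and the final averaging step can be sketched end to end on simulated data. This is a minimal illustration in base R, not the analysis performed below: it assumes a single made-up factor instead of the three Fama-French factors, and every name and number in it (`n_assets`, `factor_ret`, `true_beta`, and so on) is hypothetical.

```r
# Minimal two-pass sketch on simulated data (all names and numbers are illustrative)
set.seed(42)
n_assets <- 25
n_dates  <- 200

# One simulated factor and the true loadings of each asset on it
factor_ret <- rnorm(n_dates, mean = 0.005, sd = 0.02)
true_beta  <- runif(n_assets, min = 0.5, max = 1.5)

# Returns matrix: rows = dates, columns = assets
returns <- outer(factor_ret, true_beta) +
  matrix(rnorm(n_dates * n_assets, sd = 0.01), n_dates, n_assets)

# Pass 1: one time-series regression per asset gives an estimated beta
beta_hat <- apply(returns, 2, function(r) coef(lm(r ~ factor_ret))[2])

# Pass 2: one cross-sectional regression per date gives lambda_t
lambda_t <- apply(returns, 1, function(r) coef(lm(r ~ beta_hat))[2])

# Final step: average lambda_t over time and test the mean against zero
lambda_bar <- mean(lambda_t)
fm_se      <- sd(lambda_t) / sqrt(n_dates)
fm_t       <- lambda_bar / fm_se
```

Here `fm_t` is the same statistic that `t.test(lambda_t, mu = 0)` reports, which is why the one-sample \(t\)-test is used for inference later in this document.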
This analysis uses a panel of 6 U.S. stocks (AAPL, FORD, GE, GM, IBM, MSFT) with daily returns and Fama-French factor observations from January 2011 to December 2015 (1,257 trading days × 6 assets = 7,542 observations).
# Install any missing packages automatically
required_packages <- c("tidyverse", "broom", "kableExtra")
new_packages <- required_packages[
!(required_packages %in% installed.packages()[, "Package"])
]
if (length(new_packages) > 0) {
install.packages(new_packages, dependencies = TRUE)
}
# Load libraries
library(tidyverse) # Data manipulation and pipe operator
library(broom) # Tidies model output (tidy, glance, augment)
library(kableExtra) # Enhanced kable tables for R Markdown

# Read the dataset
# Columns:
# symbol : stock ticker identifier
# date : trading date (character, format: d-Mon-YY)
# ri : daily return of asset i
# MKT : Fama-French market factor (excess market return)
# SMB : Small-Minus-Big size factor
# HML : High-Minus-Low value factor
data <- read.csv("data.csv", stringsAsFactors = FALSE)
glimpse(data)

Rows: 7,542
Columns: 6
$ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
$ date <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
$ ri <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
$ MKT <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
$ SMB <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
$ HML <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…
# Panel summary
tibble(
Metric = c("Total observations", "Number of assets", "Assets",
"First date", "Last date"),
Value = c(
nrow(data),
n_distinct(data$symbol),
paste(sort(unique(data$symbol)), collapse = ", "),
data$date[1],
data$date[nrow(data)]
)
) %>%
kbl(caption = "Panel Dataset Summary") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)

| Metric | Value |
|---|---|
| Total observations | 7542 |
| Number of assets | 6 |
| Assets | AAPL, FORD, GE, GM, IBM, MSFT |
| First date | 4-Jan-11 |
| Last date | 31-Dec-15 |
# First few rows
head(data, 10) %>%
kbl(caption = "First 10 Rows of the Dataset", digits = 6) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

| symbol | date | ri | MKT | SMB | HML |
|---|---|---|---|---|---|
| AAPL | 4-Jan-11 | 0.005206 | -0.001314 | -0.0065 | 0.0008 |
| AAPL | 5-Jan-11 | 0.008146 | 0.004995 | 0.0018 | 0.0013 |
| AAPL | 6-Jan-11 | -0.000808 | -0.002125 | 0.0001 | -0.0025 |
| AAPL | 7-Jan-11 | 0.007136 | -0.001847 | 0.0022 | -0.0006 |
| AAPL | 10-Jan-11 | 0.018657 | -0.001377 | 0.0041 | 0.0039 |
| AAPL | 11-Jan-11 | -0.002368 | 0.003718 | 0.0016 | 0.0036 |
| AAPL | 12-Jan-11 | 0.008104 | 0.008967 | 0.0031 | 0.0000 |
| AAPL | 13-Jan-11 | 0.003652 | -0.001712 | -0.0026 | -0.0044 |
| AAPL | 14-Jan-11 | 0.008067 | 0.007357 | -0.0010 | -0.0073 |
| AAPL | 18-Jan-11 | -0.022725 | 0.001375 | 0.0056 | 0.0015 |
For each of the \(N\) assets, run a single OLS regression of the asset’s returns on the three Fama-French factors over the full sample. This yields one set of factor loadings per asset: \(\hat{\beta}_{i,\text{MKT}}\), \(\hat{\beta}_{i,\text{SMB}}\), \(\hat{\beta}_{i,\text{HML}}\).
beta_estimates <- data %>%
# Nest the full time series for each asset
nest(ts_data = c(date, ri, MKT, SMB, HML)) %>%
# Fit OLS for each asset over its time-series observations
mutate(
model_output = map(
ts_data,
~ tidy(lm(ri ~ MKT + SMB + HML, data = .x))
)
) %>%
# Expand nested model output
unnest(model_output) %>%
# Keep asset identifier, factor name, and estimated loading
select(symbol, term, estimate) %>%
# Pivot from long to wide (one column per factor loading)
pivot_wider(
names_from = term,
values_from = estimate
) %>%
# Rename columns; drop (Intercept) — alphas not used in Pass 2
select(
symbol,
b_MKT = MKT,
b_SMB = SMB,
b_HML = HML
)

beta_estimates %>%
kbl(
caption = "Pass 1: Estimated Factor Loadings (Betas) per Asset",
digits = 6,
col.names = c("Symbol", "β MKT", "β SMB", "β HML")
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)

| Symbol | β MKT | β SMB | β HML |
|---|---|---|---|
| AAPL | 0.900006 | 0.068535 | -0.057821 |
| FORD | 0.512865 | -0.264414 | 0.138003 |
| GE | 1.077868 | 0.099406 | 0.090179 |
| GM | 1.285386 | 0.003905 | -0.022156 |
| IBM | 0.816887 | 0.033604 | -0.012054 |
| MSFT | 0.965625 | 0.058182 | -0.064103 |
Interpretation: Each row shows an asset’s sensitivity to the three Fama-French factors. A \(\hat{\beta}_{\text{MKT}} > 1\) indicates the stock amplifies market movements; \(\hat{\beta}_{\text{SMB}} > 0\) indicates small-cap tilt; \(\hat{\beta}_{\text{HML}} > 0\) indicates value tilt.
Each observation in the original panel receives its asset’s estimated beta vector. These betas serve as fixed regressors in Pass 2.
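The Pass 2 code below starts from an object named `data_with_betas`, but the step that builds it is not shown. A minimal sketch of that merge, assuming it is a plain left join on `symbol`, is given here on toy stand-ins; `toy_panel`, `toy_betas`, `toy_with_betas` and all of their values are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for the panel and the Pass 1 output (values are made up)
toy_panel <- tibble(
  symbol = c("AAA", "BBB", "AAA", "BBB"),
  date   = c("4-Jan-11", "4-Jan-11", "5-Jan-11", "5-Jan-11"),
  ri     = c(0.010, -0.004, 0.002, 0.007)
)
toy_betas <- tibble(
  symbol = c("AAA", "BBB"),
  b_MKT  = c(0.9, 1.2),
  b_SMB  = c(0.1, -0.2),
  b_HML  = c(-0.05, 0.15)
)

# Attach each asset's time-invariant betas to every row of the panel
toy_with_betas <- left_join(toy_panel, toy_betas, by = "symbol")
```

Under the same assumption, the object used in Pass 2 would be built as `data_with_betas <- left_join(data, beta_estimates, by = "symbol")`.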
For each of the \(T\) dates, run a cross-sectional OLS regression of the \(N\) assets’ returns on their betas (from Pass 1). This produces \(T\) estimates of the risk premia \(\hat{\lambda}_t\) for each factor.
risk_premia <- data_with_betas %>%
# Nest each cross-section (N assets × 1 date)
nest(cs_data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>%
# Fit cross-sectional OLS at each date t
mutate(
model_output = map(
cs_data,
~ tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
)
) %>%
# Expand nested output
unnest(model_output) %>%
# Keep date, factor label, and estimated risk premium
select(date, term, estimate) %>%
# Pivot to wide: one lambda column per factor
pivot_wider(
names_from = term,
values_from = estimate
) %>%
# Rename for readability
select(
date,
gamma_0 = `(Intercept)`,
lambda_MKT = b_MKT,
lambda_SMB = b_SMB,
lambda_HML = b_HML
)

head(risk_premia, 10) %>%
kbl(
caption = "Pass 2: Cross-Sectional Risk Premia (first 10 dates)",
digits = 6,
col.names = c("Date", "γ₀", "λ MKT", "λ SMB", "λ HML")
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

| Date | γ₀ | λ MKT | λ SMB | λ HML |
|---|---|---|---|---|
| 4-Jan-11 | -0.029761 | 0.041629 | -0.025520 | 0.057372 |
| 5-Jan-11 | 0.022334 | -0.011347 | -0.158046 | 0.062847 |
| 6-Jan-11 | -0.029243 | 0.037301 | 0.007029 | -0.173234 |
| 7-Jan-11 | -0.017503 | 0.012722 | 0.032269 | -0.064226 |
| 10-Jan-11 | 0.036414 | -0.036631 | 0.017123 | 0.058646 |
| 11-Jan-11 | 0.002744 | 0.004089 | -0.095361 | 0.089858 |
| 12-Jan-11 | 0.072331 | -0.055365 | -0.164496 | 0.043036 |
| 13-Jan-11 | 0.015027 | -0.019357 | 0.001815 | 0.025630 |
| 14-Jan-11 | 0.019775 | -0.016486 | 0.063259 | 0.039214 |
| 18-Jan-11 | -0.018911 | 0.010146 | 0.052508 | -0.090027 |
The Fama-MacBeth risk premia are the time-series averages of the \(\hat{\lambda}_{j,t}\) sequences. A one-sample \(t\)-test against \(\mu = 0\) determines whether each factor is significantly priced:
\[H_0: \bar{\lambda}_j = 0 \quad \text{(factor } j \text{ carries no risk premium)}\] \[H_1: \bar{\lambda}_j \neq 0\]
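For reference, the statistic behind this test can be written out explicitly. The symbol \(s_j\) (the sample standard deviation of the period-by-period estimates) is notation introduced for this note only:

\[s_j^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left(\hat{\lambda}_{j,t} - \hat{\lambda}_j\right)^2, \qquad \text{SE}(\hat{\lambda}_j) = \frac{s_j}{\sqrt{T}}, \qquad t_j = \frac{\hat{\lambda}_j}{\text{SE}(\hat{\lambda}_j)}\]

Under \(H_0\), and treating the \(\hat{\lambda}_{j,t}\) as serially uncorrelated, \(t_j\) is compared against a \(t\) distribution with \(T - 1\) degrees of freedom.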
t_MKT <- t.test(risk_premia$lambda_MKT, mu = 0)
t_SMB <- t.test(risk_premia$lambda_SMB, mu = 0)
t_HML <- t.test(risk_premia$lambda_HML, mu = 0)
One Sample t-test
data: risk_premia$lambda_MKT
t = -0.37879, df = 1256, p-value = 0.7049
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.002546371 0.001722208
sample estimates:
mean of x
-0.0004120813
One Sample t-test
data: risk_premia$lambda_SMB
t = 0.97712, df = 1256, p-value = 0.3287
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.003711466 0.011076953
sample estimates:
mean of x
0.003682744
summary_table <- tibble(
Factor = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
Mean_Lambda = c(
mean(risk_premia$lambda_MKT, na.rm = TRUE),
mean(risk_premia$lambda_SMB, na.rm = TRUE),
mean(risk_premia$lambda_HML, na.rm = TRUE)
),
Std_Dev = c(
sd(risk_premia$lambda_MKT, na.rm = TRUE),
sd(risk_premia$lambda_SMB, na.rm = TRUE),
sd(risk_premia$lambda_HML, na.rm = TRUE)
),
t_Statistic = c(
t_MKT$statistic,
t_SMB$statistic,
t_HML$statistic
),
p_Value = c(
t_MKT$p.value,
t_SMB$p.value,
t_HML$p.value
),
Significant = c(
ifelse(t_MKT$p.value < 0.05, "Yes *", "No"),
ifelse(t_SMB$p.value < 0.05, "Yes *", "No"),
ifelse(t_HML$p.value < 0.05, "Yes *", "No")
)
)
summary_table %>%
kbl(
caption = "Fama-MacBeth Estimated Risk Premia (Daily, 2011–2015)",
digits = 6,
col.names = c("Factor", "Mean λ", "Std. Dev.", "t-Statistic", "p-Value", "Significant (5%)")
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE) %>%
column_spec(6, bold = TRUE,
color = ifelse(summary_table$Significant == "Yes *", "green", "red"))

| Factor | Mean λ | Std. Dev. | t-Statistic | p-Value | Significant (5%) |
|---|---|---|---|---|---|
| Market (MKT) | -0.000412 | 0.038570 | -0.378788 | 0.704909 | No |
| Size (SMB) | 0.003683 | 0.133626 | 0.977117 | 0.328699 | No |
| Value (HML) | -0.000467 | 0.091705 | -0.180437 | 0.856839 | No |
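As a sanity check, the market-factor row can be reproduced by hand from the numbers reported above. The sketch below uses only base R; the variable names are illustrative, and the standard deviation is the rounded value from the table, so the results should agree with the reported ones only up to rounding.

```r
# Values copied from the output above for the market factor
n_dates     <- 1257           # df = 1256 in the t-test output, so T = 1257
mean_lambda <- -0.0004120813  # "mean of x" for lambda_MKT
sd_lambda   <- 0.038570       # Std. Dev. from the summary table (rounded)

# Fama-MacBeth standard error, t-statistic, and two-sided p-value
se_lambda <- sd_lambda / sqrt(n_dates)
t_stat    <- mean_lambda / se_lambda
p_value   <- 2 * pt(-abs(t_stat), df = n_dates - 1)

round(c(se = se_lambda, t = t_stat, p = p_value), 4)
```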
The plot below shows the time-series variation of the three estimated risk premia. Wide variation around zero is expected given the small cross-section (\(N = 6\)); the Fama-MacBeth inference relies on these averages, not on individual dates.
risk_premia_long <- risk_premia %>%
mutate(date = as.Date(date, format = "%d-%b-%y")) %>%
pivot_longer(
cols = c(lambda_MKT, lambda_SMB, lambda_HML),
names_to = "Factor",
values_to = "Lambda"
) %>%
mutate(
Factor = recode(Factor,
lambda_MKT = "Market (MKT)",
lambda_SMB = "Size (SMB)",
lambda_HML = "Value (HML)"
)
)
ggplot(risk_premia_long, aes(x = date, y = Lambda, colour = Factor)) +
geom_line(alpha = 0.65, linewidth = 0.35) +
geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40", linewidth = 0.5) +
facet_wrap(~ Factor, ncol = 1, scales = "free_y") +
labs(
title = "Fama-MacBeth Cross-Sectional Risk Premia Over Time",
subtitle = "Daily cross-sectional OLS estimates, 2011–2015",
x = "Date",
y = expression(hat(lambda)[t]),
caption = "Source: Author's calculations. Fama-French three-factor model."
) +
scale_colour_manual(values = c(
"Market (MKT)" = "#2C7BB6",
"Size (SMB)" = "#D7191C",
"Value (HML)" = "#1A9641"
)) +
theme_bw(base_size = 11) +
theme(
legend.position = "none",
strip.background = element_rect(fill = "#f0f0f0"),
strip.text = element_text(face = "bold")
)

Time series of cross-sectional risk premia estimates from Pass 2 (2011–2015). Dashed line at zero for reference.
Each \(\hat{\lambda}_j\) is the average daily return earned per unit of exposure to factor \(j\), holding exposure to the other factors constant.
| Factor | Interpretation |
|---|---|
| \(\hat{\lambda}_{\text{MKT}}\) | Average return premium for one unit of market beta. A positive value is consistent with the CAPM prediction that higher systematic risk is rewarded. |
| \(\hat{\lambda}_{\text{SMB}}\) | Average return premium for small-cap tilt. A positive value indicates a size premium: small stocks outperform large stocks after controlling for market and value exposure. |
| \(\hat{\lambda}_{\text{HML}}\) | Average return premium for value exposure. A positive value means high book-to-market stocks earn higher returns than growth stocks (Fama & French, 1993). |
The one-sample \(t\)-test (\(\mu = 0\)) asks: is the average risk premium for this factor statistically distinguishable from zero?
Small cross-section (\(N = 6\)). Each cross-sectional regression in Pass 2 fits 4 parameters (an intercept and three risk premia) to 6 observations, leaving only 2 residual degrees of freedom. A larger cross-section is needed for robust inference.
Errors-in-variables (EIV) bias. Betas estimated over the full sample are used as regressors in Pass 2. Because they are estimated with error, the coefficient estimates tend to be biased toward zero (attenuation bias). Portfolio sorting can mitigate this by averaging out estimation error in the betas; rolling-window betas additionally avoid using future data to estimate the loadings.
Serial correlation in \(\hat{\lambda}_t\). Standard Fama-MacBeth standard errors do not correct for autocorrelation in the lambda series. The Shanken
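One common remedy for the serial-correlation point is a Newey-West (HAC) standard error for the mean of each lambda series. The sketch below implements the Bartlett-kernel version in base R and applies it to a simulated AR(1) series; the function name `nw_se_mean`, the argument `max_lag`, and the simulated data are all illustrative assumptions, not part of the analysis above.

```r
# Newey-West (Bartlett-kernel) standard error for the mean of a series.
# `nw_se_mean` and `max_lag` are illustrative names, not from the document.
nw_se_mean <- function(x, max_lag) {
  n <- length(x)
  e <- x - mean(x)
  # Lag-0 autocovariance (divided by n, as is conventional for HAC estimators)
  long_run_var <- sum(e^2) / n
  for (lag in seq_len(max_lag)) {
    gamma_lag    <- sum(e[(lag + 1):n] * e[1:(n - lag)]) / n
    weight       <- 1 - lag / (max_lag + 1)
    long_run_var <- long_run_var + 2 * weight * gamma_lag
  }
  sqrt(long_run_var / n)
}

# Simulated, positively autocorrelated series standing in for a lambda series
set.seed(1)
lambda_sim <- as.numeric(arima.sim(list(ar = 0.5), n = 1000))

se_iid <- sd(lambda_sim) / sqrt(length(lambda_sim))  # ignores autocorrelation
se_nw  <- nw_se_mean(lambda_sim, max_lag = 5)        # accounts for it
```

With positive autocorrelation, `se_nw` comes out larger than `se_iid`. Applied to this document's series, the call would look like `nw_se_mean(risk_premia$lambda_MKT, max_lag = 5)`, where the lag length of 5 is an arbitrary choice for illustration.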
This document was generated with R Markdown and is reproducible.
To knit, ensure data.csv is in the same working directory
as this .Rmd file.