1 Introduction

1.1 Motivation

Pooled OLS on panel data whose residuals are correlated across assets at a given date produces biased (typically understated) standard errors. The Fama-MacBeth (1973) procedure addresses this by running \(T\) separate cross-sectional regressions and averaging the resulting coefficients. Standard errors are computed from the time-series variation of those cross-sectional estimates rather than from the pooled residuals, which makes them robust to cross-sectional dependence.

Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.

1.2 Methodology Overview

The procedure consists of two passes:

Pass 1 — Time-Series Regression (N regressions, one per asset)

For each asset \(i\), regress its full return history on the Fama-French factors to estimate factor loadings (betas):

\[r_{i,t} = \alpha_i + \beta_{i,\text{MKT}} \cdot \text{MKT}_t + \beta_{i,\text{SMB}} \cdot \text{SMB}_t + \beta_{i,\text{HML}} \cdot \text{HML}_t + \varepsilon_{i,t}\]

Pass 2 — Cross-Sectional Regression (T regressions, one per date)

At each date \(t\), regress the cross-section of returns on the betas from Pass 1 to obtain period-\(t\) risk premia \(\hat{\lambda}_t\):

\[r_{i,t} = \gamma_{0,t} + \lambda_{\text{MKT},t} \cdot \hat{\beta}_{i,\text{MKT}} + \lambda_{\text{SMB},t} \cdot \hat{\beta}_{i,\text{SMB}} + \lambda_{\text{HML},t} \cdot \hat{\beta}_{i,\text{HML}} + u_{i,t}\]

Final Step — Averaging & Inference

Average the \(T\) estimates of each \(\lambda\) across time to obtain \(\hat{\lambda}_j\), and test whether the underlying mean premium \(\bar{\lambda}_j\) differs significantly from zero:

\[\hat{\lambda}_j = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{j,t}, \qquad H_0: \bar{\lambda}_j = 0 \quad \text{vs} \quad H_1: \bar{\lambda}_j \neq 0\]
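
For reference, the test statistic implied by this step can be written out explicitly. This is the usual one-sample \(t\) form, under the assumption that the \(\hat{\lambda}_{j,t}\) are serially uncorrelated; the symbols \(\widehat{\mathrm{SE}}\) and \(t_j\) are notation introduced here for illustration:

\[\widehat{\mathrm{SE}}(\hat{\lambda}_j) = \sqrt{\frac{1}{T(T-1)} \sum_{t=1}^{T} \left(\hat{\lambda}_{j,t} - \hat{\lambda}_j\right)^2}, \qquad t_j = \frac{\hat{\lambda}_j}{\widehat{\mathrm{SE}}(\hat{\lambda}_j)}\]

Under \(H_0\), \(t_j\) is compared against a \(t\) distribution with \(T - 1\) degrees of freedom.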

1.3 Data

This analysis uses a panel of 6 U.S. stocks (AAPL, FORD, GE, GM, IBM, MSFT) with daily returns and Fama-French factor observations from January 2011 to December 2015 (1,257 trading days × 6 assets = 7,542 observations).

2 Setup

2.1 Package Installation & Loading

# Install any missing packages automatically
required_packages <- c("tidyverse", "broom", "kableExtra")

new_packages <- required_packages[
  !(required_packages %in% installed.packages()[, "Package"])
]
if (length(new_packages) > 0) {
  install.packages(new_packages, dependencies = TRUE)
}

# Load libraries
library(tidyverse)    # Data manipulation and pipe operator
library(broom)        # Tidies model output (tidy, glance, augment)
library(kableExtra)   # Enhanced kable tables for R Markdown

3 Data Import & Inspection

# Read the dataset
# Columns:
#   symbol : stock ticker identifier
#   date   : trading date (character, format: d-Mon-YY)
#   ri     : daily return of asset i
#   MKT    : Fama-French market factor (excess market return)
#   SMB    : Small-Minus-Big size factor
#   HML    : High-Minus-Low value factor

data <- read.csv("data.csv", stringsAsFactors = FALSE)

glimpse(data)

Rows: 7,542
Columns: 6
$ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
$ date   <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
$ ri     <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
$ MKT    <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
$ SMB    <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
$ HML    <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…

# Panel summary
tibble(
  Metric = c("Total observations", "Number of assets", "Assets",
             "First date", "Last date"),
  Value  = c(
    nrow(data),
    n_distinct(data$symbol),
    paste(sort(unique(data$symbol)), collapse = ", "),
    data$date[1],
    data$date[nrow(data)]
  )
) %>%
  kbl(caption = "Panel Dataset Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)

Panel Dataset Summary
Metric	Value
Total observations	7542
Number of assets	6
Assets	AAPL, FORD, GE, GM, IBM, MSFT
First date	4-Jan-11
Last date	31-Dec-15

# First few rows
head(data, 10) %>%
  kbl(caption = "First 10 Rows of the Dataset", digits = 6) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

First 10 Rows of the Dataset
symbol	date	ri	MKT	SMB	HML
AAPL	4-Jan-11	0.005206	-0.001314	-0.0065	0.0008
AAPL	5-Jan-11	0.008146	0.004995	0.0018	0.0013
AAPL	6-Jan-11	-0.000808	-0.002125	0.0001	-0.0025
AAPL	7-Jan-11	0.007136	-0.001847	0.0022	-0.0006
AAPL	10-Jan-11	0.018657	-0.001377	0.0041	0.0039
AAPL	11-Jan-11	-0.002368	0.003718	0.0016	0.0036
AAPL	12-Jan-11	0.008104	0.008967	0.0031	0.0000
AAPL	13-Jan-11	0.003652	-0.001712	-0.0026	-0.0044
AAPL	14-Jan-11	0.008067	0.007357	-0.0010	-0.0073
AAPL	18-Jan-11	-0.022725	0.001375	0.0056	0.0015
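
The date column is read as character in d-Mon-YY form, so sorting or filtering on it as text would not be chronological. A minimal sketch of parsing such strings is shown here; the three example strings are taken from the output above, and the `%b` specifier is assumed to run under an English (or C) locale:

```r
# Parse d-Mon-YY strings such as "4-Jan-11" into Date objects.
# %b matches abbreviated month names in the current locale,
# so this assumes an English (or C) locale.
raw_dates <- c("4-Jan-11", "10-Jan-11", "31-Dec-15")
parsed    <- as.Date(raw_dates, format = "%d-%b-%y")

# Character sorting is not chronological; Date sorting is.
sort(raw_dates)   # ordering of the raw strings
sort(parsed)      # chronological ordering
```

Parsing once at import would make date-based operations such as `min()`, `max()`, and `arrange()` behave chronologically.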

4 Pass 1: Time-Series Regressions

For each of the \(N\) assets, run a single OLS regression of the asset’s returns on the three Fama-French factors over the full sample. This yields one set of factor loadings per asset: \(\hat{\beta}_{i,\text{MKT}}\), \(\hat{\beta}_{i,\text{SMB}}\), \(\hat{\beta}_{i,\text{HML}}\).

beta_estimates <- data %>%

  # Nest the full time series for each asset
  nest(ts_data = c(date, ri, MKT, SMB, HML)) %>%

  # Fit OLS for each asset over its time-series observations
  mutate(
    model_output = map(
      ts_data,
      ~ tidy(lm(ri ~ MKT + SMB + HML, data = .x))
    )
  ) %>%

  # Expand nested model output
  unnest(model_output) %>%

  # Keep asset identifier, factor name, and estimated loading
  select(symbol, term, estimate) %>%

  # Pivot from long to wide (one column per factor loading)
  pivot_wider(
    names_from  = term,
    values_from = estimate
  ) %>%

  # Rename columns; drop (Intercept) — alphas not used in Pass 2
  select(
    symbol,
    b_MKT = MKT,
    b_SMB = SMB,
    b_HML = HML
  )

4.1 Estimated Factor Loadings

beta_estimates %>%
  kbl(
    caption = "Pass 1: Estimated Factor Loadings (Betas) per Asset",
    digits  = 6,
    col.names = c("Symbol", "β MKT", "β SMB", "β HML")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)

Pass 1: Estimated Factor Loadings (Betas) per Asset
Symbol	β MKT	β SMB	β HML
AAPL	0.900006	0.068535	-0.057821
FORD	0.512865	-0.264414	0.138003
GE	1.077868	0.099406	0.090179
GM	1.285386	0.003905	-0.022156
IBM	0.816887	0.033604	-0.012054
MSFT	0.965625	0.058182	-0.064103

Interpretation: Each row shows an asset’s sensitivity to the three Fama-French factors. A \(\hat{\beta}_{\text{MKT}} > 1\) indicates the stock amplifies market movements; \(\hat{\beta}_{\text{SMB}} > 0\) indicates small-cap tilt; \(\hat{\beta}_{\text{HML}} > 0\) indicates value tilt.
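
One way to sanity-check the Pass 1 regression is to run it on simulated data where the loadings are known. The sketch below is illustrative only: the factor volatilities, the "true" loadings, and the name `sim` are assumptions, not values from the dataset.

```r
set.seed(42)
n_days <- 1000

# Simulated factor returns (illustrative volatilities, not the real factors)
sim <- data.frame(
  MKT = rnorm(n_days, 0, 0.010),
  SMB = rnorm(n_days, 0, 0.005),
  HML = rnorm(n_days, 0, 0.005)
)

# Hypothetical "true" loadings for a single simulated asset
b_mkt <- 1.2
b_smb <- 0.3
b_hml <- -0.2

# Returns generated from the three-factor model plus idiosyncratic noise
sim$ri <- b_mkt * sim$MKT + b_smb * sim$SMB + b_hml * sim$HML +
  rnorm(n_days, 0, 0.002)

# Same specification as Pass 1, fitted to the simulated asset
fit <- lm(ri ~ MKT + SMB + HML, data = sim)
est <- coef(fit)[c("MKT", "SMB", "HML")]
round(est, 3)
```

With this much data and little noise, the estimated loadings should land close to the values used to generate the returns.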

5 Merge Betas into Panel

Each observation in the original panel receives its asset’s estimated beta vector. Because the betas are estimated once over the full sample, they are constant across dates for a given asset and serve as the regressors in Pass 2.

data_with_betas <- data %>%
  left_join(beta_estimates, by = "symbol")
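
A join like this is worth checking: it should not change the number of rows, and every asset should find its betas. The sketch below illustrates the check on a toy panel; the tickers `AAA` and `BBB` and all values are made up for illustration.

```r
library(dplyr)

# Toy panel: 2 assets x 3 dates (values are illustrative only)
panel <- tibble(
  symbol = rep(c("AAA", "BBB"), each = 3),
  date   = rep(c("4-Jan-11", "5-Jan-11", "6-Jan-11"), times = 2),
  ri     = c(0.01, -0.02, 0.00, 0.03, 0.01, -0.01)
)

# Toy loadings: exactly one row per asset
betas <- tibble(symbol = c("AAA", "BBB"), b_MKT = c(0.9, 1.1))

merged <- left_join(panel, betas, by = "symbol")

# The join should neither add rows nor leave any asset without a beta
stopifnot(nrow(merged) == nrow(panel), !anyNA(merged$b_MKT))
```

If the beta table ever contained duplicate symbols, the row-count check would fail, which is the intended behaviour.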

6 Pass 2: Cross-Sectional Regressions

For each of the \(T\) dates, run a cross-sectional OLS regression of the \(N\) assets’ returns on their betas (from Pass 1). This produces \(T\) estimates of the risk premia \(\hat{\lambda}_t\) for each factor.

risk_premia <- data_with_betas %>%

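  # Keep only the columns needed, so that nest() groups by date alone
  # (MKT, SMB, HML would otherwise act as additional grouping keys)
  select(date, symbol, ri, b_MKT, b_SMB, b_HML) %>%

  # Nest each cross-section (N assets × 1 date)
  nest(cs_data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>%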

  # Fit cross-sectional OLS at each date t
  mutate(
    model_output = map(
      cs_data,
      ~ tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
    )
  ) %>%

  # Expand nested output
  unnest(model_output) %>%

  # Keep date, factor label, and estimated risk premium
  select(date, term, estimate) %>%

  # Pivot to wide: one lambda column per factor
  pivot_wider(
    names_from  = term,
    values_from = estimate
  ) %>%

  # Rename for readability
  select(
    date,
    gamma_0    = `(Intercept)`,
    lambda_MKT = b_MKT,
    lambda_SMB = b_SMB,
    lambda_HML = b_HML
  )

6.1 Sample of Cross-Sectional Risk Premia

head(risk_premia, 10) %>%
  kbl(
    caption   = "Pass 2: Cross-Sectional Risk Premia (first 10 dates)",
    digits    = 6,
    col.names = c("Date", "γ₀", "λ MKT", "λ SMB", "λ HML")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Pass 2: Cross-Sectional Risk Premia (first 10 dates)
Date	γ₀	λ MKT	λ SMB	λ HML
4-Jan-11	-0.029761	0.041629	-0.025520	0.057372
5-Jan-11	0.022334	-0.011347	-0.158046	0.062847
6-Jan-11	-0.029243	0.037301	0.007029	-0.173234
7-Jan-11	-0.017503	0.012722	0.032269	-0.064226
10-Jan-11	0.036414	-0.036631	0.017123	0.058646
11-Jan-11	0.002744	0.004089	-0.095361	0.089858
12-Jan-11	0.072331	-0.055365	-0.164496	0.043036
13-Jan-11	0.015027	-0.019357	0.001815	0.025630
14-Jan-11	0.019775	-0.016486	0.063259	0.039214
18-Jan-11	-0.018911	0.010146	0.052508	-0.090027
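
The large day-to-day swings in these estimates are easier to understand by looking at a single cross-section. The sketch below fits the Pass 2 specification to one simulated date with six assets; all numbers are randomly generated and purely illustrative.

```r
set.seed(1)

# One simulated cross-section: N = 6 assets with made-up betas and returns
cs <- data.frame(
  ri    = rnorm(6, 0, 0.01),
  b_MKT = runif(6, 0.5, 1.3),
  b_SMB = runif(6, -0.3, 0.1),
  b_HML = runif(6, -0.1, 0.15)
)

fit <- lm(ri ~ b_MKT + b_SMB + b_HML, data = cs)

length(coef(fit))   # 4 estimated parameters (intercept + 3 premia)
df.residual(fit)    # 6 - 4 = 2 residual degrees of freedom
```

With only two residual degrees of freedom, each daily regression comes close to interpolating the six returns, so individual \(\hat{\lambda}_t\) values carry little information on their own.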

7 Final Step: Averaging & Significance Tests

The Fama-MacBeth risk premia are the time-series averages of the \(\hat{\lambda}_{j,t}\) sequences. A one-sample \(t\)-test against \(\mu = 0\) determines whether each factor is significantly priced:

\[H_0: \bar{\lambda}_j = 0 \quad \text{(factor } j \text{ carries no risk premium)}\] \[H_1: \bar{\lambda}_j \neq 0\]

t_MKT <- t.test(risk_premia$lambda_MKT, mu = 0)
t_SMB <- t.test(risk_premia$lambda_SMB, mu = 0)
t_HML <- t.test(risk_premia$lambda_HML, mu = 0)

7.0.1 Market Factor (MKT)

print(t_MKT)


    One Sample t-test

data:  risk_premia$lambda_MKT
t = -0.37879, df = 1256, p-value = 0.7049
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.002546371  0.001722208
sample estimates:
    mean of x 
-0.0004120813

7.0.2 Size Factor (SMB)

print(t_SMB)


    One Sample t-test

data:  risk_premia$lambda_SMB
t = 0.97712, df = 1256, p-value = 0.3287
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.003711466  0.011076953
sample estimates:
  mean of x 
0.003682744

7.0.3 Value Factor (HML)

print(t_HML)


    One Sample t-test

data:  risk_premia$lambda_HML
t = -0.18044, df = 1256, p-value = 0.8568
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.005541205  0.004607776
sample estimates:
    mean of x 
-0.0004667146

8 Results Summary Table

summary_table <- tibble(
  Factor       = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
  Mean_Lambda  = c(
    mean(risk_premia$lambda_MKT, na.rm = TRUE),
    mean(risk_premia$lambda_SMB, na.rm = TRUE),
    mean(risk_premia$lambda_HML, na.rm = TRUE)
  ),
  Std_Dev      = c(
    sd(risk_premia$lambda_MKT, na.rm = TRUE),
    sd(risk_premia$lambda_SMB, na.rm = TRUE),
    sd(risk_premia$lambda_HML, na.rm = TRUE)
  ),
  t_Statistic  = c(
    t_MKT$statistic,
    t_SMB$statistic,
    t_HML$statistic
  ),
  p_Value      = c(
    t_MKT$p.value,
    t_SMB$p.value,
    t_HML$p.value
  ),
  Significant  = c(
    ifelse(t_MKT$p.value < 0.05, "Yes *", "No"),
    ifelse(t_SMB$p.value < 0.05, "Yes *", "No"),
    ifelse(t_HML$p.value < 0.05, "Yes *", "No")
  )
)

summary_table %>%
  kbl(
    caption   = "Fama-MacBeth Estimated Risk Premia (Daily, 2011–2015)",
    digits    = 6,
    col.names = c("Factor", "Mean λ", "Std. Dev.", "t-Statistic", "p-Value", "Significant (5%)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  column_spec(6, bold = TRUE,
              color = ifelse(summary_table$Significant == "Yes *", "green", "red"))

Fama-MacBeth Estimated Risk Premia (Daily, 2011–2015)
Factor	Mean λ	Std. Dev.	t-Statistic	p-Value	Significant (5%)
Market (MKT)	-0.000412	0.038570	-0.378788	0.704909	No
Size (SMB)	0.003683	0.133626	0.977117	0.328699	No
Value (HML)	-0.000467	0.091705	-0.180437	0.856839	No
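
As a cross-check, the market-factor row can be reproduced by hand from the reported mean, standard deviation, and number of dates. This is a sketch using the rounded figures shown above, so the result should match the reported statistic only up to rounding; the variable names are chosen here for illustration.

```r
mean_lambda <- -0.0004120813  # mean of lambda_MKT (from the t-test output)
sd_lambda   <- 0.038570       # Std. Dev. from the summary table (rounded)
n_dates     <- 1257           # df + 1 = 1256 + 1

# Fama-MacBeth standard error: time-series sd divided by sqrt(T)
se_lambda <- sd_lambda / sqrt(n_dates)

# t-statistic and two-sided p-value with T - 1 degrees of freedom
t_stat  <- mean_lambda / se_lambda
p_value <- 2 * pt(-abs(t_stat), df = n_dates - 1)

round(c(se = se_lambda, t = t_stat, p = p_value), 4)
```

The same arithmetic applies to the SMB and HML rows.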

9 Visualisation

The plot below shows the time-series variation of the three estimated risk premia. Wide variation around zero is expected given the small cross-section (\(N = 6\)); Fama-MacBeth inference relies on the time-series averages of these estimates, not on the estimate for any individual date.

risk_premia_long <- risk_premia %>%
  mutate(date = as.Date(date, format = "%d-%b-%y")) %>%
  pivot_longer(
    cols      = c(lambda_MKT, lambda_SMB, lambda_HML),
    names_to  = "Factor",
    values_to = "Lambda"
  ) %>%
  mutate(
    Factor = recode(Factor,
      lambda_MKT = "Market (MKT)",
      lambda_SMB = "Size (SMB)",
      lambda_HML = "Value (HML)"
    )
  )

ggplot(risk_premia_long, aes(x = date, y = Lambda, colour = Factor)) +
  geom_line(alpha = 0.65, linewidth = 0.35) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40", linewidth = 0.5) +
  facet_wrap(~ Factor, ncol = 1, scales = "free_y") +
  labs(
    title    = "Fama-MacBeth Cross-Sectional Risk Premia Over Time",
    subtitle = "Daily cross-sectional OLS estimates, 2011–2015",
    x        = "Date",
    y        = expression(hat(lambda)[t]),
    caption  = "Source: Author's calculations. Fama-French three-factor model."
  ) +
  scale_colour_manual(values = c(
    "Market (MKT)" = "#2C7BB6",
    "Size (SMB)"   = "#D7191C",
    "Value (HML)"  = "#1A9641"
  )) +
  theme_bw(base_size = 11) +
  theme(
    legend.position  = "none",
    strip.background = element_rect(fill = "#f0f0f0"),
    strip.text       = element_text(face = "bold")
  )

Time series of cross-sectional risk premia estimates from Pass 2 (2011–2015). Dashed line at zero for reference.

10 Interpretation & Conclusions

10.1 What the Coefficients Represent

Each \(\hat{\lambda}_j\) is the average daily return earned per unit of factor \(j\) exposure, holding the other factors constant.

Factor	Interpretation
\(\hat{\lambda}_{\text{MKT}}\)	Average return premium for one unit of market beta. A positive value is consistent with the CAPM prediction that higher systematic risk is rewarded.
\(\hat{\lambda}_{\text{SMB}}\)	Average return premium for small-cap tilt. A positive value indicates a size premium: small stocks outperform large stocks after controlling for market and value exposure.
\(\hat{\lambda}_{\text{HML}}\)	Average return premium for value exposure. A positive value means high book-to-market stocks earn higher returns than growth stocks (Fama & French, 1993).

10.2 How to Interpret the t-Tests

The one-sample \(t\)-test (\(\mu = 0\)) asks: is the average risk premium for this factor statistically distinguishable from zero?

\(p < 0.05\): Reject \(H_0\). The factor is priced — the market compensates (or penalises) investors for bearing that specific source of risk.
\(p \geq 0.05\): Fail to reject \(H_0\). Insufficient evidence that the factor commands a risk premium in this sample.

10.3 Limitations

Small cross-section (\(N = 6\)). Each cross-sectional regression in Pass 2 has only \(df = 6 - 4 = 2\) residual degrees of freedom (six assets, four estimated parameters), so the daily \(\hat{\lambda}_t\) estimates are very noisy. A larger cross-section is needed for robust inference.
Errors-in-variables (EIV) bias. The betas used as regressors in Pass 2 are themselves estimates from Pass 1, and this measurement error typically biases the estimated risk premia toward zero (attenuation bias). Portfolio sorting can mitigate this, and the Shanken (1992) correction adjusts the standard errors for the estimation error in the betas. Rolling-window betas additionally avoid using information from after date \(t\).
Serial correlation in \(\hat{\lambda}_t\). Standard Fama-MacBeth standard errors assume the \(\hat{\lambda}_t\) series is serially uncorrelated. A Newey-West adjustment may be warranted when the series is persistent.
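
To illustrate the serial-correlation point, the sketch below computes a Newey-West style (Bartlett-kernel) standard error for the mean of a series in base R. It is a hand-rolled illustration, not the implementation from any package: the function name `nw_se_mean`, the argument `max_lag`, and the simulated AR(1) series are all assumptions, and the lag length is arbitrary.

```r
# Newey-West (Bartlett-kernel) standard error for the mean of a series.
# `nw_se_mean` and `max_lag` are hypothetical names used for illustration.
nw_se_mean <- function(x, max_lag) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  # Lag-0 autocovariance (divides by n, not n - 1)
  long_run_var <- sum(d^2) / n
  for (l in seq_len(max_lag)) {
    gamma_l <- sum(d[(l + 1):n] * d[1:(n - l)]) / n  # lag-l autocovariance
    weight  <- 1 - l / (max_lag + 1)                 # Bartlett weight
    long_run_var <- long_run_var + 2 * weight * gamma_l
  }
  sqrt(long_run_var / n)
}

set.seed(123)
# Simulated AR(1) series standing in for a persistent lambda series
lambda_sim <- as.numeric(arima.sim(model = list(ar = 0.5), n = 1257))

se_plain <- nw_se_mean(lambda_sim, max_lag = 0)  # no autocorrelation correction
se_nw    <- nw_se_mean(lambda_sim, max_lag = 5)  # typically larger under positive autocorrelation
c(plain = se_plain, newey_west = se_nw)
```

With `max_lag = 0` the function reduces to the uncorrected standard error (up to the \(n\) versus \(n - 1\) divisor), so comparing the two values gives a rough sense of how much autocorrelation matters for a given series.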

11 References

Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
Shanken, J. (1992). On the estimation of beta-pricing models. Review of Financial Studies, 5(1), 1–33.
The Data Hall. (2024). Fama MacBeth Regression in R [YouTube]. https://www.youtube.com/watch?v=dLvjmYj-PVA

This document was generated with R Markdown and is reproducible. To knit, ensure data.csv is in the same working directory as this .Rmd file.

Fama-MacBeth Regression in R

Two-Pass Estimation of the Fama-French Three-Factor Model

Jourast Buwana

May 19, 2026