1 Introduction

1.1 Motivation

Standard OLS on panel data with cross-sectional correlation in residuals produces biased standard errors. The Fama-MacBeth (1973) procedure corrects for this by running \(T\) separate cross-sectional regressions and averaging the resulting coefficients. The standard errors are computed from the time-series variation of those cross-sectional estimates — not from the pooled residuals — making them robust to cross-sectional dependence.

Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.

1.2 Methodology Overview

The procedure consists of two passes:

Pass 1 — Time-Series Regression (N regressions, one per asset)

For each asset \(i\), regress its full return history on the Fama-French factors to estimate factor loadings (betas):

\[r_{i,t} = \alpha_i + \beta_{i,\text{MKT}} \cdot \text{MKT}_t + \beta_{i,\text{SMB}} \cdot \text{SMB}_t + \beta_{i,\text{HML}} \cdot \text{HML}_t + \varepsilon_{i,t}\]

Pass 2 — Cross-Sectional Regression (T regressions, one per date)

At each date \(t\), regress the cross-section of returns on the betas from Pass 1 to obtain period-\(t\) risk premia \(\hat{\lambda}_t\):

\[r_{i,t} = \gamma_{0,t} + \lambda_{\text{MKT},t} \cdot \hat{\beta}_{i,\text{MKT}} + \lambda_{\text{SMB},t} \cdot \hat{\beta}_{i,\text{SMB}} + \lambda_{\text{HML},t} \cdot \hat{\beta}_{i,\text{HML}} + u_{i,t}\]

Final Step — Averaging & Inference

Average the \(T\) estimates of each \(\lambda\) across time and test whether the mean differs significantly from zero:

\[\hat{\lambda}_j = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{j,t}, \qquad H_0: \lambda_j = 0 \quad \text{vs} \quad H_1: \lambda_j \neq 0\]
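
The test statistic implied by this step can be written out explicitly. The expression below is the usual one-sample form, which treats the \(\hat{\lambda}_{j,t}\) as serially uncorrelated; the symbols \(s_j\) and \(t_j\) are introduced here for illustration and do not appear elsewhere in the document.

\[s_j^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left(\hat{\lambda}_{j,t} - \hat{\lambda}_j\right)^2, \qquad \text{SE}(\hat{\lambda}_j) = \frac{s_j}{\sqrt{T}}, \qquad t_j = \frac{\hat{\lambda}_j}{\text{SE}(\hat{\lambda}_j)}\]

Under \(H_0\), \(t_j\) is compared with a Student \(t\) distribution with \(T - 1\) degrees of freedom.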

1.3 Data

This analysis uses a panel of 6 U.S. stocks (AAPL, FORD, GE, GM, IBM, MSFT) with daily returns and Fama-French factor observations from January 2011 to December 2015 (1,257 trading days × 6 assets = 7,542 observations).


2 Setup

2.1 Package Installation & Loading

# Install any missing packages automatically
required_packages <- c("tidyverse", "broom", "kableExtra")

new_packages <- required_packages[
  !(required_packages %in% installed.packages()[, "Package"])
]
if (length(new_packages) > 0) {
  install.packages(new_packages, dependencies = TRUE)
}

# Load libraries
library(tidyverse)    # Data manipulation and pipe operator
library(broom)        # Tidies model output (tidy, glance, augment)
library(kableExtra)   # Enhanced kable tables for R Markdown

3 Data Import & Inspection

# Read the dataset
# Columns:
#   symbol : stock ticker identifier
#   date   : trading date (character, format: d-Mon-YY)
#   ri     : daily return of asset i
#   MKT    : Fama-French market factor (excess market return)
#   SMB    : Small-Minus-Big size factor
#   HML    : High-Minus-Low value factor

data <- read.csv("data.csv", stringsAsFactors = FALSE)

glimpse(data)
Rows: 7,542
Columns: 6
$ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
$ date   <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
$ ri     <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
$ MKT    <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
$ SMB    <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
$ HML    <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…
# Panel summary
tibble(
  Metric = c("Total observations", "Number of assets", "Assets",
             "First date", "Last date"),
  Value  = c(
    nrow(data),
    n_distinct(data$symbol),
    paste(sort(unique(data$symbol)), collapse = ", "),
    data$date[1],
    data$date[nrow(data)]
  )
) %>%
  kbl(caption = "Panel Dataset Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
Panel Dataset Summary
Metric Value
Total observations 7542
Number of assets 6
Assets AAPL, FORD, GE, GM, IBM, MSFT
First date 4-Jan-11
Last date 31-Dec-15
# First few rows
head(data, 10) %>%
  kbl(caption = "First 10 Rows of the Dataset", digits = 6) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
First 10 Rows of the Dataset
symbol date ri MKT SMB HML
AAPL 4-Jan-11 0.005206 -0.001314 -0.0065 0.0008
AAPL 5-Jan-11 0.008146 0.004995 0.0018 0.0013
AAPL 6-Jan-11 -0.000808 -0.002125 0.0001 -0.0025
AAPL 7-Jan-11 0.007136 -0.001847 0.0022 -0.0006
AAPL 10-Jan-11 0.018657 -0.001377 0.0041 0.0039
AAPL 11-Jan-11 -0.002368 0.003718 0.0016 0.0036
AAPL 12-Jan-11 0.008104 0.008967 0.0031 0.0000
AAPL 13-Jan-11 0.003652 -0.001712 -0.0026 -0.0044
AAPL 14-Jan-11 0.008067 0.007357 -0.0010 -0.0073
AAPL 18-Jan-11 -0.022725 0.001375 0.0056 0.0015
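
Because `date` is stored as text, the first and last rows of the file are not guaranteed to be the earliest and latest dates, and nothing so far confirms that the panel is balanced. The sketch below shows one way such checks could look in base R. It runs on a hypothetical toy data frame `toy` with made-up rows rather than on `data.csv`, and the helper names are illustrative.

```r
# Toy rows in the same layout as data.csv (values are illustrative only)
toy <- data.frame(
  symbol = c("AAPL", "AAPL", "MSFT", "MSFT"),
  date   = c("4-Jan-11", "5-Jan-11", "4-Jan-11", "5-Jan-11"),
  stringsAsFactors = FALSE
)

# Parse d-Mon-YY strings; %b relies on English month abbreviations,
# so this assumes an English or C locale
toy$date_parsed <- as.Date(toy$date, format = "%d-%b-%y")

# A balanced panel has the same number of rows for every asset
obs_per_asset <- table(toy$symbol)
is_balanced   <- length(unique(as.vector(obs_per_asset))) == 1

# Each (symbol, date) pair should appear only once
has_duplicates <- any(duplicated(toy[, c("symbol", "date")]))

print(obs_per_asset)
print(range(toy$date_parsed))
```

Applied to the real panel, the same three checks would be run on `data` in place of `toy`.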

4 Pass 1: Time-Series Regressions

For each of the \(N\) assets, run a single OLS regression of the asset’s returns on the three Fama-French factors over the full sample. This yields one set of factor loadings per asset: \(\hat{\beta}_{i,\text{MKT}}\), \(\hat{\beta}_{i,\text{SMB}}\), \(\hat{\beta}_{i,\text{HML}}\).

beta_estimates <- data %>%

  # Nest the full time series for each asset
  nest(ts_data = c(date, ri, MKT, SMB, HML)) %>%

  # Fit OLS for each asset over its time-series observations
  mutate(
    model_output = map(
      ts_data,
      ~ tidy(lm(ri ~ MKT + SMB + HML, data = .x))
    )
  ) %>%

  # Expand nested model output
  unnest(model_output) %>%

  # Keep asset identifier, factor name, and estimated loading
  select(symbol, term, estimate) %>%

  # Pivot from long to wide (one column per factor loading)
  pivot_wider(
    names_from  = term,
    values_from = estimate
  ) %>%

  # Rename columns; drop (Intercept) — alphas not used in Pass 2
  select(
    symbol,
    b_MKT = MKT,
    b_SMB = SMB,
    b_HML = HML
  )

4.1 Estimated Factor Loadings

beta_estimates %>%
  kbl(
    caption = "Pass 1: Estimated Factor Loadings (Betas) per Asset",
    digits  = 6,
    col.names = c("Symbol", "β MKT", "β SMB", "β HML")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
Pass 1: Estimated Factor Loadings (Betas) per Asset
Symbol β MKT β SMB β HML
AAPL 0.900006 0.068535 -0.057821
FORD 0.512865 -0.264414 0.138003
GE 1.077868 0.099406 0.090179
GM 1.285386 0.003905 -0.022156
IBM 0.816887 0.033604 -0.012054
MSFT 0.965625 0.058182 -0.064103

Interpretation: Each row shows an asset’s sensitivity to the three Fama-French factors. \(\hat{\beta}_{\text{MKT}} > 1\) indicates that the stock amplifies market movements, \(\hat{\beta}_{\text{SMB}} > 0\) indicates a small-cap tilt, and \(\hat{\beta}_{\text{HML}} > 0\) indicates a value tilt.
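
One way to build confidence in the Pass 1 step is to run it on simulated data where the loadings are known. The following base-R sketch does that for two hypothetical assets `A` and `B`. All numbers are made up, and it uses `sapply` and `lm` directly rather than the `nest`/`map` pipeline above.

```r
set.seed(1)

# Simulated factor returns; all values here are made up for illustration
n_days  <- 500
factors <- data.frame(
  MKT = rnorm(n_days, 0, 0.01),
  SMB = rnorm(n_days, 0, 0.005),
  HML = rnorm(n_days, 0, 0.005)
)

# Hypothetical "true" loadings for two assets (rows = assets)
true_betas <- rbind(
  A = c(MKT = 1.2, SMB =  0.3, HML = -0.2),
  B = c(MKT = 0.8, SMB = -0.1, HML =  0.4)
)

# Generate returns from the factor model plus idiosyncratic noise,
# then run one time-series regression per asset
est_betas <- t(sapply(rownames(true_betas), function(a) {
  ri  <- drop(as.matrix(factors) %*% true_betas[a, ]) + rnorm(n_days, 0, 0.001)
  fit <- lm(ri ~ MKT + SMB + HML, data = factors)
  coef(fit)[c("MKT", "SMB", "HML")]
}))

print(round(est_betas, 3))
```

With this little noise the estimated loadings should land close to `true_betas`; on real returns the estimates are far less precise.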


5 Merge Betas into Panel

Each observation in the original panel receives its asset’s estimated beta vector. These betas serve as fixed regressors in Pass 2.

data_with_betas <- data %>%
  left_join(beta_estimates, by = "symbol")
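
A join like this is worth sanity-checking, since a mismatched key would silently produce missing betas. The sketch below illustrates the checks on hypothetical toy data, using base-R `merge` with `all.x = TRUE` as a stand-in for `left_join`. The object names and values are assumptions for illustration.

```r
# Toy panel and toy Pass 1 output (illustrative values only)
panel <- data.frame(
  symbol = rep(c("AAPL", "MSFT"), each = 3),
  date   = rep(c("4-Jan-11", "5-Jan-11", "6-Jan-11"), times = 2),
  ri     = c(0.005, 0.008, -0.001, 0.002, -0.003, 0.004)
)
betas <- data.frame(
  symbol = c("AAPL", "MSFT"),
  b_MKT  = c(0.90, 0.97)
)

# Base-R equivalent of a left join on symbol (all.x = TRUE keeps every panel row)
joined <- merge(panel, betas, by = "symbol", all.x = TRUE)

# The join should neither add nor drop rows, and every row should get a beta
same_rows  <- nrow(joined) == nrow(panel)
no_missing <- !anyNA(joined$b_MKT)

# Within an asset the beta is constant across dates
beta_is_constant <- all(tapply(joined$b_MKT, joined$symbol,
                               function(b) length(unique(b)) == 1))

print(c(same_rows = same_rows, no_missing = no_missing,
        beta_is_constant = beta_is_constant))
```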

6 Pass 2: Cross-Sectional Regressions

For each of the \(T\) dates, run a cross-sectional OLS regression of the \(N\) assets’ returns on their betas (from Pass 1). This produces \(T\) estimates of the risk premia \(\hat{\lambda}_t\) for each factor.

risk_premia <- data_with_betas %>%

  # Nest each cross-section (N assets × 1 date), keyed on date only;
  # the factor columns also land in cs_data but the regression ignores them
  nest(cs_data = -date) %>%

  # Fit cross-sectional OLS at each date t
  mutate(
    model_output = map(
      cs_data,
      ~ tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
    )
  ) %>%

  # Expand nested output
  unnest(model_output) %>%

  # Keep date, factor label, and estimated risk premium
  select(date, term, estimate) %>%

  # Pivot to wide: one lambda column per factor
  pivot_wider(
    names_from  = term,
    values_from = estimate
  ) %>%

  # Rename for readability
  select(
    date,
    gamma_0    = `(Intercept)`,
    lambda_MKT = b_MKT,
    lambda_SMB = b_SMB,
    lambda_HML = b_HML
  )

6.1 Sample of Cross-Sectional Risk Premia

head(risk_premia, 10) %>%
  kbl(
    caption   = "Pass 2: Cross-Sectional Risk Premia (first 10 dates)",
    digits    = 6,
    col.names = c("Date", "γ₀", "λ MKT", "λ SMB", "λ HML")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Pass 2: Cross-Sectional Risk Premia (first 10 dates)
Date γ₀ λ MKT λ SMB λ HML
4-Jan-11 -0.029761 0.041629 -0.025520 0.057372
5-Jan-11 0.022334 -0.011347 -0.158046 0.062847
6-Jan-11 -0.029243 0.037301 0.007029 -0.173234
7-Jan-11 -0.017503 0.012722 0.032269 -0.064226
10-Jan-11 0.036414 -0.036631 0.017123 0.058646
11-Jan-11 0.002744 0.004089 -0.095361 0.089858
12-Jan-11 0.072331 -0.055365 -0.164496 0.043036
13-Jan-11 0.015027 -0.019357 0.001815 0.025630
14-Jan-11 0.019775 -0.016486 0.063259 0.039214
18-Jan-11 -0.018911 0.010146 0.052508 -0.090027
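
To see what a single cross-sectional regression involves, the sketch below fits one by hand in base R. The loadings are copied from the Pass 1 table; the returns for the single hypothetical date are simulated, not taken from `data.csv`, so the resulting coefficients carry no empirical meaning.

```r
set.seed(42)

# Pass 1 loadings as reported in the factor-loadings table
cs <- data.frame(
  symbol = c("AAPL", "FORD", "GE", "GM", "IBM", "MSFT"),
  b_MKT  = c(0.900006, 0.512865, 1.077868, 1.285386, 0.816887, 0.965625),
  b_SMB  = c(0.068535, -0.264414, 0.099406, 0.003905, 0.033604, 0.058182),
  b_HML  = c(-0.057821, 0.138003, 0.090179, -0.022156, -0.012054, -0.064103)
)

# Hypothetical returns for one date (simulated, for illustration only)
cs$ri <- rnorm(nrow(cs), mean = 0, sd = 0.01)

# One cross-sectional regression: 6 observations, 4 estimated parameters
cs_fit <- lm(ri ~ b_MKT + b_SMB + b_HML, data = cs)

lambda_t <- coef(cs_fit)        # intercept and three risk premia for this date
resid_df <- df.residual(cs_fit) # observations minus estimated parameters

print(lambda_t)
print(resid_df)
```

With six assets and four parameters, only two residual degrees of freedom remain, which is consistent with the large day-to-day swings in the table above.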

7 Final Step: Averaging & Significance Tests

The Fama-MacBeth risk premia are the time-series averages of the \(\hat{\lambda}_{j,t}\) sequences. A one-sample \(t\)-test against \(\mu = 0\) determines whether each factor is significantly priced:

\[H_0: \lambda_j = 0 \quad \text{(factor } j \text{ carries no risk premium)}\] \[H_1: \lambda_j \neq 0\]

t_MKT <- t.test(risk_premia$lambda_MKT, mu = 0)
t_SMB <- t.test(risk_premia$lambda_SMB, mu = 0)
t_HML <- t.test(risk_premia$lambda_HML, mu = 0)

7.0.1 Market Factor (MKT)

print(t_MKT)

    One Sample t-test

data:  risk_premia$lambda_MKT
t = -0.37879, df = 1256, p-value = 0.7049
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.002546371  0.001722208
sample estimates:
    mean of x 
-0.0004120813 

7.0.2 Size Factor (SMB)

print(t_SMB)

    One Sample t-test

data:  risk_premia$lambda_SMB
t = 0.97712, df = 1256, p-value = 0.3287
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.003711466  0.011076953
sample estimates:
  mean of x 
0.003682744 

7.0.3 Value Factor (HML)

print(t_HML)

    One Sample t-test

data:  risk_premia$lambda_HML
t = -0.18044, df = 1256, p-value = 0.8568
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.005541205  0.004607776
sample estimates:
    mean of x 
-0.0004667146 

8 Results Summary Table

summary_table <- tibble(
  Factor       = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
  Mean_Lambda  = c(
    mean(risk_premia$lambda_MKT, na.rm = TRUE),
    mean(risk_premia$lambda_SMB, na.rm = TRUE),
    mean(risk_premia$lambda_HML, na.rm = TRUE)
  ),
  Std_Dev      = c(
    sd(risk_premia$lambda_MKT, na.rm = TRUE),
    sd(risk_premia$lambda_SMB, na.rm = TRUE),
    sd(risk_premia$lambda_HML, na.rm = TRUE)
  ),
  t_Statistic  = c(
    t_MKT$statistic,
    t_SMB$statistic,
    t_HML$statistic
  ),
  p_Value      = c(
    t_MKT$p.value,
    t_SMB$p.value,
    t_HML$p.value
  ),
  Significant  = c(
    ifelse(t_MKT$p.value < 0.05, "Yes *", "No"),
    ifelse(t_SMB$p.value < 0.05, "Yes *", "No"),
    ifelse(t_HML$p.value < 0.05, "Yes *", "No")
  )
)

summary_table %>%
  kbl(
    caption   = "Fama-MacBeth Estimated Risk Premia (Daily, 2011–2015)",
    digits    = 6,
    col.names = c("Factor", "Mean λ", "Std. Dev.", "t-Statistic", "p-Value", "Significant (5%)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  column_spec(6, bold = TRUE,
              color = ifelse(summary_table$Significant == "Yes *", "green", "red"))
Fama-MacBeth Estimated Risk Premia (Daily, 2011–2015)
Factor Mean λ Std. Dev. t-Statistic p-Value Significant (5%)
Market (MKT) -0.000412 0.038570 -0.378788 0.704909 No
Size (SMB) 0.003683 0.133626 0.977117 0.328699 No
Value (HML) -0.000467 0.091705 -0.180437 0.856839 No
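
As a cross-check, the market-factor row can be reproduced by hand from the reported mean and standard deviation. The sketch below uses only numbers printed in this document; the variable names are illustrative, and small differences arise because the standard deviation is rounded to six decimals.

```r
# Values copied from the summary table and t-test output (market factor)
mean_lambda <- -0.0004120813
sd_lambda   <- 0.038570      # rounded to six decimals in the table
n_dates     <- 1257          # df = 1256 in the t-test output, so T = 1257

# Standard error of the mean and the implied t-statistic
se_lambda <- sd_lambda / sqrt(n_dates)
t_stat    <- mean_lambda / se_lambda

# Two-sided p-value from a t distribution with T - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n_dates - 1)

print(c(se = se_lambda, t = t_stat, p = p_value))
```

The result should agree with the reported \(t = -0.37879\) and \(p = 0.7049\) up to rounding.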

9 Visualisation

The plot below shows the time-series variation of the three estimated risk premia. Wide variation around zero is expected given the small cross-section (\(N = 6\)); Fama-MacBeth inference relies on the time-series averages of these estimates, not on the estimate for any individual date.

risk_premia_long <- risk_premia %>%
  mutate(date = as.Date(date, format = "%d-%b-%y")) %>%
  pivot_longer(
    cols      = c(lambda_MKT, lambda_SMB, lambda_HML),
    names_to  = "Factor",
    values_to = "Lambda"
  ) %>%
  mutate(
    Factor = recode(Factor,
      lambda_MKT = "Market (MKT)",
      lambda_SMB = "Size (SMB)",
      lambda_HML = "Value (HML)"
    )
  )

ggplot(risk_premia_long, aes(x = date, y = Lambda, colour = Factor)) +
  geom_line(alpha = 0.65, linewidth = 0.35) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40", linewidth = 0.5) +
  facet_wrap(~ Factor, ncol = 1, scales = "free_y") +
  labs(
    title    = "Fama-MacBeth Cross-Sectional Risk Premia Over Time",
    subtitle = "Daily cross-sectional OLS estimates, 2011–2015",
    x        = "Date",
    y        = expression(hat(lambda)[t]),
    caption  = "Source: Author's calculations. Fama-French three-factor model."
  ) +
  scale_colour_manual(values = c(
    "Market (MKT)" = "#2C7BB6",
    "Size (SMB)"   = "#D7191C",
    "Value (HML)"  = "#1A9641"
  )) +
  theme_bw(base_size = 11) +
  theme(
    legend.position  = "none",
    strip.background = element_rect(fill = "#f0f0f0"),
    strip.text       = element_text(face = "bold")
  )
Time series of cross-sectional risk premia estimates from Pass 2 (2011–2015). Dashed line at zero for reference.



10 Interpretation & Conclusions

10.1 What the Coefficients Represent

Each \(\hat{\lambda}_j\) is the average daily return earned per unit of factor \(j\) exposure, holding the other factors constant.

Factor Interpretation
\(\hat{\lambda}_{\text{MKT}}\) Average return premium for one unit of market beta. A positive value is consistent with the CAPM prediction that higher systematic risk is rewarded.
\(\hat{\lambda}_{\text{SMB}}\) Average return premium for small-cap tilt. A positive value indicates a size premium: small stocks outperform large stocks after controlling for market and value exposure.
\(\hat{\lambda}_{\text{HML}}\) Average return premium for value exposure. A positive value means high book-to-market stocks earn higher returns than growth stocks (Fama & French, 1993).

10.2 How to Interpret the t-Tests

The one-sample \(t\)-test (\(\mu = 0\)) asks: is the average risk premium for this factor statistically distinguishable from zero?

  • \(p < 0.05\): Reject \(H_0\). The factor is priced — the market compensates (or penalises) investors for bearing that specific source of risk.
  • \(p \geq 0.05\): Fail to reject \(H_0\). Insufficient evidence that the factor commands a risk premium in this sample.

10.3 Limitations

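  1. Small cross-section (\(N = 6\)). With six assets and four estimated parameters (the intercept plus three risk premia), each cross-sectional regression in Pass 2 has only 2 residual degrees of freedom, so the individual \(\hat{\lambda}_t\) estimates are very noisy. A larger cross-section is needed for robust inference.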

  2. Errors-in-variables (EIV) bias. The Pass 2 regressors are betas estimated in Pass 1 over the full sample, so they contain measurement error. This biases the estimated risk premia, toward zero in the simplest single-regressor case (attenuation bias). Rolling-window betas or portfolio sorting can mitigate this.

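  3. Serial correlation in \(\hat{\lambda}_t\). Standard Fama-MacBeth standard errors do not correct for autocorrelation in the lambda series, so a Newey-West adjustment may be warranted when the series is persistent. The Shanken (1992) correction addresses a separate problem: it adjusts the standard errors for the estimation error in the Pass 1 betas.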
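
For the serial-correlation point, a Newey-West standard error for a mean can be sketched in a few lines of base R. The function `nw_se_mean` below is a hypothetical helper written for this illustration, not part of any package. It uses Bartlett weights, and the lag length of 5 is an arbitrary assumption. It is demonstrated on a simulated AR(1) series rather than on `risk_premia`.

```r
# Newey-West (Bartlett-kernel) standard error for the mean of a series x.
# n_lags is a tuning choice; the value used below is an assumption.
nw_se_mean <- function(x, n_lags) {
  n   <- length(x)
  dev <- x - mean(x)

  # Lag-0 autocovariance (1/n normalisation)
  long_run_var <- sum(dev^2) / n

  # Add Bartlett-weighted autocovariances for lags 1..n_lags
  for (l in seq_len(n_lags)) {
    gamma_l      <- sum(dev[(l + 1):n] * dev[1:(n - l)]) / n
    weight       <- 1 - l / (n_lags + 1)
    long_run_var <- long_run_var + 2 * weight * gamma_l
  }

  sqrt(long_run_var / n)
}

# Simulated AR(1) series standing in for a persistent lambda series
set.seed(7)
lambda_sim <- as.numeric(arima.sim(model = list(ar = 0.5), n = 1000))

se_plain <- sd(lambda_sim) / sqrt(length(lambda_sim))
se_nw    <- nw_se_mean(lambda_sim, n_lags = 5)

print(c(plain = se_plain, newey_west = se_nw))
```

For a positively autocorrelated series the Newey-West standard error is larger than the plain one, so the corresponding \(t\)-statistic shrinks. With `n_lags = 0` the function reduces to the plain standard error, apart from the \(1/T\) versus \(1/(T-1)\) normalisation.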

11 References

  • Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
  • Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
  • Shanken, J. (1992). On the estimation of beta-pricing models. Review of Financial Studies, 5(1), 1–33.
  • The Data Hall. (2024). Fama MacBeth Regression in R [YouTube]. https://www.youtube.com/watch?v=dLvjmYj-PVA

This document was generated with R Markdown and is reproducible. To knit, ensure data.csv is in the same working directory as this .Rmd file.