1 Introduction

The Fama-MacBeth (1973) regression is a two-pass procedure widely used in empirical asset pricing to test factor models (e.g., CAPM, Fama-French). It addresses cross-sectional correlation of residuals in panel data.

There are three key steps:

  1. Step 1 – Time-Series Regressions (N regressions): For each asset \(i\), regress its excess returns on factor returns over the full time series to estimate factor loadings (\(\hat{\beta}_i\)).

  2. Step 2 – Cross-Sectional Regressions (T regressions): For each time period \(t\), regress the cross-section of asset returns on the estimated betas to get period-specific risk premia (\(\hat{\lambda}_t\)).

  3. Step 3 – Time-Series Average: Average the \(T\) cross-sectional estimates to obtain the final risk premium estimates, and use their time-series standard deviation for inference.


2 Load Required Packages

# Install if not already installed
packages <- c("tidyverse", "broom", "knitr", "kableExtra", "ggplot2", "lmtest", "sandwich")
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) install.packages(packages[!installed])

library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(ggplot2)
library(lmtest)
library(sandwich)

3 Data Simulation

Since we are replicating the methodology, we simulate a panel dataset of 25 stocks over 60 months with two factors: Market (Mkt) and SMB (Small Minus Big), consistent with the Fama-French framework.

set.seed(42)

N <- 25   # Number of stocks
T <- 60   # Number of time periods (months)

# ── True parameters ──────────────────────────────────────────────────────────
true_lambda_mkt <- 0.06   # True market risk premium (annualised ~6%)
true_lambda_smb <- 0.03   # True SMB risk premium

# ── Simulate factor returns ───────────────────────────────────────────────────
factor_data <- tibble(
  t        = 1:T,
  mkt      = rnorm(T, mean = 0.005, sd = 0.04),   # Monthly market excess return
  smb      = rnorm(T, mean = 0.002, sd = 0.03)    # Monthly SMB return
)

# ── Simulate stock-level betas ────────────────────────────────────────────────
stock_betas <- tibble(
  stock   = paste0("Stock_", sprintf("%02d", 1:N)),
  beta_mkt = runif(N, 0.5, 1.8),
  beta_smb = runif(N, -0.5, 1.2)
)

# ── Simulate stock returns using factor model + idiosyncratic noise ───────────
panel_data <- expand_grid(stock = stock_betas$stock, t = 1:T) %>%
  left_join(stock_betas, by = "stock") %>%
  left_join(factor_data,  by = "t") %>%
  mutate(
    alpha  = rnorm(n(), mean = 0.001, sd = 0.002),
    eps    = rnorm(n(), mean = 0, sd = 0.03),
    ret    = alpha + beta_mkt * mkt + beta_smb * smb + eps
  )

# Preview
head(panel_data, 10) %>%
  kable(caption = "Panel Data (first 10 rows)", digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
Panel Data (first 10 rows)
stock t beta_mkt beta_smb mkt smb alpha eps ret
Stock_01 1 0.5879 -0.2701 0.0598 -0.0090 0.0032 0.0064 0.0473
Stock_01 2 0.5879 -0.2701 -0.0176 0.0076 0.0000 0.0510 0.0387
Stock_01 3 0.5879 -0.2701 0.0195 0.0195 0.0001 -0.0002 0.0061
Stock_01 4 0.5879 -0.2701 0.0303 0.0440 0.0024 -0.0245 -0.0162
Stock_01 5 0.5879 -0.2701 0.0212 -0.0198 -0.0011 -0.0015 0.0152
Stock_01 6 0.5879 -0.2701 0.0008 0.0411 0.0009 0.0024 -0.0073
Stock_01 7 0.5879 -0.2701 0.0655 0.0121 -0.0021 -0.0468 -0.0137
Stock_01 8 0.5879 -0.2701 0.0012 0.0332 0.0033 -0.0051 -0.0101
Stock_01 9 0.5879 -0.2701 0.0857 0.0296 0.0005 -0.0126 0.0303
Stock_01 10 0.5879 -0.2701 0.0025 0.0236 0.0001 0.0147 0.0099

4 Step 1: Time-Series Regressions (N Regressions)

For each stock \(i\), we estimate:

\[r_{it} = \alpha_i + \beta_{i,\text{mkt}} \cdot \text{Mkt}_t + \beta_{i,\text{smb}} \cdot \text{SMB}_t + \varepsilon_{it}\]

This gives us the estimated factor loadings \(\hat{\beta}_{i,\text{mkt}}\) and \(\hat{\beta}_{i,\text{smb}}\).

# Run one OLS regression per stock
ts_results <- panel_data %>%
  group_by(stock) %>%
  do(tidy(lm(ret ~ mkt + smb, data = .))) %>%
  ungroup()

# Extract betas (slope coefficients only)
estimated_betas <- ts_results %>%
  filter(term %in% c("mkt", "smb")) %>%
  select(stock, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(beta_mkt_hat = mkt, beta_smb_hat = smb)

estimated_betas %>%
  kable(caption = "Step 1: Estimated Factor Loadings per Stock", digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  scroll_box(height = "400px")
Step 1: Estimated Factor Loadings per Stock
stock beta_mkt_hat beta_smb_hat
Stock_01 0.4929 -0.2392
Stock_02 1.2404 1.0472
Stock_03 0.5443 -0.1766
Stock_04 0.6518 1.2262
Stock_05 1.1945 1.0912
Stock_06 1.1874 -0.2539
Stock_07 0.7203 0.7942
Stock_08 0.7117 0.2026
Stock_09 1.2150 -0.4295
Stock_10 1.6167 0.9617
Stock_11 0.9409 0.0444
Stock_12 0.7856 0.0674
Stock_13 0.7912 0.0591
Stock_14 1.1561 0.3142
Stock_15 0.4728 -0.0379
Stock_16 1.4387 0.3456
Stock_17 0.7244 -0.4609
Stock_18 1.1138 0.1458
Stock_19 1.1585 1.0218
Stock_20 1.0756 0.0949
Stock_21 1.4055 0.0456
Stock_22 0.7792 0.1952
Stock_23 0.8831 -0.3394
Stock_24 1.8404 0.0153
Stock_25 1.1748 -0.5045

4.1 Visualise Estimated Betas

estimated_betas_long <- estimated_betas %>%
  pivot_longer(cols = c(beta_mkt_hat, beta_smb_hat),
               names_to = "factor", values_to = "beta") %>%
  mutate(factor = recode(factor,
                         beta_mkt_hat = "Market Beta",
                         beta_smb_hat = "SMB Beta"))

ggplot(estimated_betas_long, aes(x = reorder(stock, beta), y = beta, fill = factor)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~factor, scales = "free_x") +
  coord_flip() +
  labs(title = "Estimated Factor Loadings (Step 1)",
       x = "Stock", y = "Beta Estimate") +
  theme_minimal(base_size = 11) +
  scale_fill_manual(values = c("Market Beta" = "#2c7bb6", "SMB Beta" = "#d7191c"))


5 Step 2: Cross-Sectional Regressions (T Regressions)

For each time period \(t\), we regress the cross-section of stock returns on the estimated betas:

\[r_{it} = \lambda_{0t} + \lambda_{\text{mkt},t} \cdot \hat{\beta}_{i,\text{mkt}} + \lambda_{\text{smb},t} \cdot \hat{\beta}_{i,\text{smb}} + \alpha_{it}\]

This yields T estimates of the risk premia \(\hat{\lambda}_t\).

# Merge estimated betas back into the panel
panel_with_betas <- panel_data %>%
  left_join(estimated_betas, by = "stock")

# Run one OLS regression per time period
cs_results <- panel_with_betas %>%
  group_by(t) %>%
  do(tidy(lm(ret ~ beta_mkt_hat + beta_smb_hat, data = .))) %>%
  ungroup()

# Extract lambdas
lambdas <- cs_results %>%
  select(t, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(
    lambda0      = `(Intercept)`,
    lambda_mkt   = beta_mkt_hat,
    lambda_smb   = beta_smb_hat
  )

head(lambdas, 10) %>%
  kable(caption = "Step 2: Cross-Sectional Risk Premia per Period (first 10)", digits = 5) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
Step 2: Cross-Sectional Risk Premia per Period (first 10)
t lambda0 lambda_mkt lambda_smb
1 0.01631 0.03845 -0.00519
2 0.01366 -0.01663 -0.01152
3 0.00268 0.01754 0.03018
4 -0.00558 0.03368 0.03063
5 0.02625 -0.00256 -0.00634
6 -0.01360 0.00272 0.03813
7 0.01163 0.04956 0.01215
8 -0.00983 -0.00198 0.05761
9 0.01899 0.07479 0.01745
10 -0.00223 0.00426 0.02850

5.1 Visualise Risk Premia Over Time

lambdas_long <- lambdas %>%
  pivot_longer(cols = c(lambda_mkt, lambda_smb),
               names_to = "factor", values_to = "lambda") %>%
  mutate(factor = recode(factor,
                         lambda_mkt = "Market Risk Premium",
                         lambda_smb = "SMB Risk Premium"))

ggplot(lambdas_long, aes(x = t, y = lambda, colour = factor)) +
  geom_line(linewidth = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  facet_wrap(~factor, ncol = 1, scales = "free_y") +
  labs(title = "Step 2: Time-Varying Risk Premia from Cross-Sectional Regressions",
       x = "Time Period (Month)", y = expression(hat(lambda)[t])) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("Market Risk Premium" = "#2c7bb6",
                                 "SMB Risk Premium"    = "#d7191c"))


6 Step 3: Time-Series Average of Risk Premia

The final Fama-MacBeth estimates are the time-series means of the \(T\) cross-sectional estimates. Standard errors are based on the time-series standard deviation (Newey-West adjusted for serial correlation).

\[\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t, \qquad \text{SE}(\hat{\lambda}) = \frac{s(\hat{\lambda}_t)}{\sqrt{T}}\]

T_obs <- nrow(lambdas)

fm_results <- lambdas %>%
  summarise(
    across(c(lambda0, lambda_mkt, lambda_smb),
           list(
             Mean  = ~mean(.x, na.rm = TRUE),
             SD    = ~sd(.x, na.rm = TRUE),
             SE    = ~sd(.x, na.rm = TRUE) / sqrt(T_obs),
             t_stat = ~mean(.x, na.rm = TRUE) / (sd(.x, na.rm = TRUE) / sqrt(T_obs)),
             p_value = ~2 * pt(-abs(mean(.x, na.rm = TRUE) /
                                      (sd(.x, na.rm = TRUE) / sqrt(T_obs))),
                                df = T_obs - 1)
           ),
           .names = "{.col}__{.fn}"
    )
  ) %>%
  pivot_longer(everything(), names_to = c("Parameter", "Stat"), names_sep = "__") %>%
  pivot_wider(names_from = Stat, values_from = value) %>%
  mutate(
    Significance = case_when(
      p_value < 0.01 ~ "***",
      p_value < 0.05 ~ "**",
      p_value < 0.10 ~ "*",
      TRUE           ~ ""
    ),
    Parameter = recode(Parameter,
                       lambda0     = "Intercept (λ₀)",
                       lambda_mkt  = "Market Risk Premium (λ_mkt)",
                       lambda_smb  = "SMB Risk Premium (λ_smb)")
  )

fm_results %>%
  mutate(across(where(is.numeric), ~round(.x, 5))) %>%
  kable(caption = "Step 3: Fama-MacBeth Final Estimates",
        col.names = c("Parameter", "Mean", "Std Dev", "Std Error",
                      "t-Statistic", "p-Value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE) %>%
  column_spec(2, bold = TRUE) %>%
  footnote(general = "*** p<0.01, ** p<0.05, * p<0.10. Standard errors computed as SD/sqrt(T).")
Step 3: Fama-MacBeth Final Estimates
Parameter Mean Std Dev Std Error t-Statistic p-Value Sig.
Intercept (λ₀) 0.00282 0.01648 0.00213 1.32464 0.19040
Market Risk Premium (λ_mkt) 0.00215 0.04832 0.00624 0.34400 0.73207
SMB Risk Premium (λ_smb) 0.00386 0.02959 0.00382 1.00913 0.31703
Note:
*** p<0.01, ** p<0.05, * p<0.10. Standard errors computed as SD/sqrt(T).

7 Summary: Fama-MacBeth Regression Workflow

summary_df <- tibble(
  Step        = c("Step 1", "Step 2", "Step 3"),
  Description = c(
    "Time-Series Regressions (N = 25 regressions, one per stock)",
    "Cross-Sectional Regressions (T = 60 regressions, one per month)",
    "Time-Series Average of cross-sectional coefficients"
  ),
  Output      = c(
    "Estimated betas: β̂_i,mkt and β̂_i,smb",
    "Period-specific risk premia: λ̂_mkt,t and λ̂_smb,t",
    "Final risk premia estimates with t-statistics and p-values"
  )
)

summary_df %>%
  kable(caption = "Fama-MacBeth Regression – Three-Step Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = TRUE) %>%
  column_spec(1, bold = TRUE, color = "white", background = "#2c7bb6")
Fama-MacBeth Regression – Three-Step Summary
Step Description Output
Step 1 Time-Series Regressions (N = 25 regressions, one per stock) Estimated betas: β̂_i,mkt and β̂_i,smb
Step 2 Cross-Sectional Regressions (T = 60 regressions, one per month) Period-specific risk premia: λ̂_mkt,t and λ̂_smb,t
Step 3 Time-Series Average of cross-sectional coefficients Final risk premia estimates with t-statistics and p-values

8 Interpretation

  • Market Risk Premium (\(\hat{\lambda}_{\text{mkt}}\)): The estimated compensation per unit of market beta. A positive and significant value supports the CAPM.
  • SMB Risk Premium (\(\hat{\lambda}_{\text{smb}}\)): The estimated compensation for exposure to the size factor.
  • The Fama-MacBeth standard errors naturally account for cross-sectional correlation in returns because each cross-sectional regression is run separately per period.

Note: For robustness in real applications, consider Newey-West (1987) adjusted standard errors to account for autocorrelation in the \(\hat{\lambda}_t\) series.


9 References

  • Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
  • Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
  • Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703–708.