1 Introduction

The Fama-MacBeth (1973) regression is a two-pass procedure widely used in empirical asset pricing to test factor models (e.g., CAPM, Fama-French). It addresses cross-sectional correlation of residuals in panel data.

There are three key steps:

Step 1 – Time-Series Regressions (N regressions): For each asset \(i\), regress its excess returns on factor returns over the full time series to estimate factor loadings (\(\hat{\beta}_i\)).
Step 2 – Cross-Sectional Regressions (T regressions): For each time period \(t\), regress the cross-section of asset returns on the estimated betas to get period-specific risk premia (\(\hat{\lambda}_t\)).
Step 3 – Time-Series Average: Average the \(T\) cross-sectional estimates to obtain the final risk premium estimates, and use their time-series standard deviation for inference.

2 Load Required Packages

# Install if not already installed
packages <- c("tidyverse", "broom", "knitr", "kableExtra", "ggplot2", "lmtest", "sandwich")
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) install.packages(packages[!installed])

library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(ggplot2)
library(lmtest)
library(sandwich)

3 Data Simulation

Since we are replicating the methodology, we simulate a panel dataset of 25 stocks over 60 months with two factors: Market (Mkt) and SMB (Small Minus Big), consistent with the Fama-French framework.

set.seed(42)

N <- 25   # Number of stocks
T <- 60   # Number of time periods (months)

# ── True parameters ──────────────────────────────────────────────────────────
true_lambda_mkt <- 0.06   # True market risk premium (annualised ~6%)
true_lambda_smb <- 0.03   # True SMB risk premium

# ── Simulate factor returns ───────────────────────────────────────────────────
factor_data <- tibble(
  t        = 1:T,
  mkt      = rnorm(T, mean = 0.005, sd = 0.04),   # Monthly market excess return
  smb      = rnorm(T, mean = 0.002, sd = 0.03)    # Monthly SMB return
)

# ── Simulate stock-level betas ────────────────────────────────────────────────
stock_betas <- tibble(
  stock   = paste0("Stock_", sprintf("%02d", 1:N)),
  beta_mkt = runif(N, 0.5, 1.8),
  beta_smb = runif(N, -0.5, 1.2)
)

# ── Simulate stock returns using factor model + idiosyncratic noise ───────────
panel_data <- expand_grid(stock = stock_betas$stock, t = 1:T) %>%
  left_join(stock_betas, by = "stock") %>%
  left_join(factor_data,  by = "t") %>%
  mutate(
    alpha  = rnorm(n(), mean = 0.001, sd = 0.002),
    eps    = rnorm(n(), mean = 0, sd = 0.03),
    ret    = alpha + beta_mkt * mkt + beta_smb * smb + eps
  )

# Preview
head(panel_data, 10) %>%
  kable(caption = "Panel Data (first 10 rows)", digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)

Panel Data (first 10 rows)
stock	t	beta_mkt	beta_smb	mkt	smb	alpha	eps	ret
Stock_01	1	0.5879	-0.2701	0.0598	-0.0090	0.0032	0.0064	0.0473
Stock_01	2	0.5879	-0.2701	-0.0176	0.0076	0.0000	0.0510	0.0387
Stock_01	3	0.5879	-0.2701	0.0195	0.0195	0.0001	-0.0002	0.0061
Stock_01	4	0.5879	-0.2701	0.0303	0.0440	0.0024	-0.0245	-0.0162
Stock_01	5	0.5879	-0.2701	0.0212	-0.0198	-0.0011	-0.0015	0.0152
Stock_01	6	0.5879	-0.2701	0.0008	0.0411	0.0009	0.0024	-0.0073
Stock_01	7	0.5879	-0.2701	0.0655	0.0121	-0.0021	-0.0468	-0.0137
Stock_01	8	0.5879	-0.2701	0.0012	0.0332	0.0033	-0.0051	-0.0101
Stock_01	9	0.5879	-0.2701	0.0857	0.0296	0.0005	-0.0126	0.0303
Stock_01	10	0.5879	-0.2701	0.0025	0.0236	0.0001	0.0147	0.0099

4 Step 1: Time-Series Regressions (N Regressions)

For each stock \(i\), we estimate:

\[r_{it} = \alpha_i + \beta_{i,\text{mkt}} \cdot \text{Mkt}_t + \beta_{i,\text{smb}} \cdot \text{SMB}_t + \varepsilon_{it}\]

This gives us the estimated factor loadings \(\hat{\beta}_{i,\text{mkt}}\) and \(\hat{\beta}_{i,\text{smb}}\).

# Run one OLS regression per stock
ts_results <- panel_data %>%
  group_by(stock) %>%
  do(tidy(lm(ret ~ mkt + smb, data = .))) %>%
  ungroup()

# Extract betas (slope coefficients only)
estimated_betas <- ts_results %>%
  filter(term %in% c("mkt", "smb")) %>%
  select(stock, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(beta_mkt_hat = mkt, beta_smb_hat = smb)

estimated_betas %>%
  kable(caption = "Step 1: Estimated Factor Loadings per Stock", digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  scroll_box(height = "400px")

Step 1: Estimated Factor Loadings per Stock
stock	beta_mkt_hat	beta_smb_hat
Stock_01	0.4929	-0.2392
Stock_02	1.2404	1.0472
Stock_03	0.5443	-0.1766
Stock_04	0.6518	1.2262
Stock_05	1.1945	1.0912
Stock_06	1.1874	-0.2539
Stock_07	0.7203	0.7942
Stock_08	0.7117	0.2026
Stock_09	1.2150	-0.4295
Stock_10	1.6167	0.9617
Stock_11	0.9409	0.0444
Stock_12	0.7856	0.0674
Stock_13	0.7912	0.0591
Stock_14	1.1561	0.3142
Stock_15	0.4728	-0.0379
Stock_16	1.4387	0.3456
Stock_17	0.7244	-0.4609
Stock_18	1.1138	0.1458
Stock_19	1.1585	1.0218
Stock_20	1.0756	0.0949
Stock_21	1.4055	0.0456
Stock_22	0.7792	0.1952
Stock_23	0.8831	-0.3394
Stock_24	1.8404	0.0153
Stock_25	1.1748	-0.5045

4.1 Visualise Estimated Betas

estimated_betas_long <- estimated_betas %>%
  pivot_longer(cols = c(beta_mkt_hat, beta_smb_hat),
               names_to = "factor", values_to = "beta") %>%
  mutate(factor = recode(factor,
                         beta_mkt_hat = "Market Beta",
                         beta_smb_hat = "SMB Beta"))

ggplot(estimated_betas_long, aes(x = reorder(stock, beta), y = beta, fill = factor)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~factor, scales = "free_x") +
  coord_flip() +
  labs(title = "Estimated Factor Loadings (Step 1)",
       x = "Stock", y = "Beta Estimate") +
  theme_minimal(base_size = 11) +
  scale_fill_manual(values = c("Market Beta" = "#2c7bb6", "SMB Beta" = "#d7191c"))

5 Step 2: Cross-Sectional Regressions (T Regressions)

For each time period \(t\), we regress the cross-section of stock returns on the estimated betas:

\[r_{it} = \lambda_{0t} + \lambda_{\text{mkt},t} \cdot \hat{\beta}_{i,\text{mkt}} + \lambda_{\text{smb},t} \cdot \hat{\beta}_{i,\text{smb}} + \alpha_{it}\]

This yields T estimates of the risk premia \(\hat{\lambda}_t\).

# Merge estimated betas back into the panel
panel_with_betas <- panel_data %>%
  left_join(estimated_betas, by = "stock")

# Run one OLS regression per time period
cs_results <- panel_with_betas %>%
  group_by(t) %>%
  do(tidy(lm(ret ~ beta_mkt_hat + beta_smb_hat, data = .))) %>%
  ungroup()

# Extract lambdas
lambdas <- cs_results %>%
  select(t, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(
    lambda0      = `(Intercept)`,
    lambda_mkt   = beta_mkt_hat,
    lambda_smb   = beta_smb_hat
  )

head(lambdas, 10) %>%
  kable(caption = "Step 2: Cross-Sectional Risk Premia per Period (first 10)", digits = 5) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)

Step 2: Cross-Sectional Risk Premia per Period (first 10)
t	lambda0	lambda_mkt	lambda_smb
1	0.01631	0.03845	-0.00519
2	0.01366	-0.01663	-0.01152
3	0.00268	0.01754	0.03018
4	-0.00558	0.03368	0.03063
5	0.02625	-0.00256	-0.00634
6	-0.01360	0.00272	0.03813
7	0.01163	0.04956	0.01215
8	-0.00983	-0.00198	0.05761
9	0.01899	0.07479	0.01745
10	-0.00223	0.00426	0.02850

5.1 Visualise Risk Premia Over Time

lambdas_long <- lambdas %>%
  pivot_longer(cols = c(lambda_mkt, lambda_smb),
               names_to = "factor", values_to = "lambda") %>%
  mutate(factor = recode(factor,
                         lambda_mkt = "Market Risk Premium",
                         lambda_smb = "SMB Risk Premium"))

ggplot(lambdas_long, aes(x = t, y = lambda, colour = factor)) +
  geom_line(linewidth = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  facet_wrap(~factor, ncol = 1, scales = "free_y") +
  labs(title = "Step 2: Time-Varying Risk Premia from Cross-Sectional Regressions",
       x = "Time Period (Month)", y = expression(hat(lambda)[t])) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("Market Risk Premium" = "#2c7bb6",
                                 "SMB Risk Premium"    = "#d7191c"))

6 Step 3: Time-Series Average of Risk Premia

The final Fama-MacBeth estimates are the time-series means of the \(T\) cross-sectional estimates. Standard errors are based on the time-series standard deviation (Newey-West adjusted for serial correlation).

\[\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t, \qquad \text{SE}(\hat{\lambda}) = \frac{s(\hat{\lambda}_t)}{\sqrt{T}}\]

T_obs <- nrow(lambdas)

fm_results <- lambdas %>%
  summarise(
    across(c(lambda0, lambda_mkt, lambda_smb),
           list(
             Mean  = ~mean(.x, na.rm = TRUE),
             SD    = ~sd(.x, na.rm = TRUE),
             SE    = ~sd(.x, na.rm = TRUE) / sqrt(T_obs),
             t_stat = ~mean(.x, na.rm = TRUE) / (sd(.x, na.rm = TRUE) / sqrt(T_obs)),
             p_value = ~2 * pt(-abs(mean(.x, na.rm = TRUE) /
                                      (sd(.x, na.rm = TRUE) / sqrt(T_obs))),
                                df = T_obs - 1)
           ),
           .names = "{.col}__{.fn}"
    )
  ) %>%
  pivot_longer(everything(), names_to = c("Parameter", "Stat"), names_sep = "__") %>%
  pivot_wider(names_from = Stat, values_from = value) %>%
  mutate(
    Significance = case_when(
      p_value < 0.01 ~ "***",
      p_value < 0.05 ~ "**",
      p_value < 0.10 ~ "*",
      TRUE           ~ ""
    ),
    Parameter = recode(Parameter,
                       lambda0     = "Intercept (λ₀)",
                       lambda_mkt  = "Market Risk Premium (λ_mkt)",
                       lambda_smb  = "SMB Risk Premium (λ_smb)")
  )

fm_results %>%
  mutate(across(where(is.numeric), ~round(.x, 5))) %>%
  kable(caption = "Step 3: Fama-MacBeth Final Estimates",
        col.names = c("Parameter", "Mean", "Std Dev", "Std Error",
                      "t-Statistic", "p-Value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE) %>%
  column_spec(2, bold = TRUE) %>%
  footnote(general = "*** p<0.01, ** p<0.05, * p<0.10. Standard errors computed as SD/sqrt(T).")

Step 3: Fama-MacBeth Final Estimates
Parameter	Mean	Std Dev	Std Error	t-Statistic	p-Value
Intercept (λ₀)	0.00282	0.01648	0.00213	1.32464	0.19040
Market Risk Premium (λ_mkt)	0.00215	0.04832	0.00624	0.34400	0.73207
SMB Risk Premium (λ_smb)	0.00386	0.02959	0.00382	1.00913	0.31703
Note:
* p<0.01, p<0.05, * p<0.10. Standard errors computed as SD/sqrt(T).

7 Summary: Fama-MacBeth Regression Workflow

summary_df <- tibble(
  Step        = c("Step 1", "Step 2", "Step 3"),
  Description = c(
    "Time-Series Regressions (N = 25 regressions, one per stock)",
    "Cross-Sectional Regressions (T = 60 regressions, one per month)",
    "Time-Series Average of cross-sectional coefficients"
  ),
  Output      = c(
    "Estimated betas: β̂_i,mkt and β̂_i,smb",
    "Period-specific risk premia: λ̂_mkt,t and λ̂_smb,t",
    "Final risk premia estimates with t-statistics and p-values"
  )
)

summary_df %>%
  kable(caption = "Fama-MacBeth Regression – Three-Step Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = TRUE) %>%
  column_spec(1, bold = TRUE, color = "white", background = "#2c7bb6")

Fama-MacBeth Regression – Three-Step Summary
Step	Description	Output
Step 1	Time-Series Regressions (N = 25 regressions, one per stock)	Estimated betas: β̂_i,mkt and β̂_i,smb
Step 2	Cross-Sectional Regressions (T = 60 regressions, one per month)	Period-specific risk premia: λ̂_mkt,t and λ̂_smb,t
Step 3	Time-Series Average of cross-sectional coefficients	Final risk premia estimates with t-statistics and p-values

8 Interpretation

Market Risk Premium (\(\hat{\lambda}_{\text{mkt}}\)): The estimated compensation per unit of market beta. A positive and significant value supports the CAPM.
SMB Risk Premium (\(\hat{\lambda}_{\text{smb}}\)): The estimated compensation for exposure to the size factor.
The Fama-MacBeth standard errors naturally account for cross-sectional correlation in returns because each cross-sectional regression is run separately per period.

Note: For robustness in real applications, consider Newey-West (1987) adjusted standard errors to account for autocorrelation in the \(\hat{\lambda}_t\) series.

9 References

Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703–708.

Fama-MacBeth Regression in R

The Data Hall

2026.5