1 Introduction

The Fama-MacBeth (1973) procedure is a two-pass regression method widely used in empirical asset pricing to estimate risk premia. Because the premia are estimated period by period and inference uses the time-series variation of those estimates, its standard errors account for cross-sectional correlation in returns. This makes it well suited to panels in which the returns of different assets move together within a period.

Three-step procedure:

  1. Step 0 – Time-Series Regressions: For each asset/portfolio, regress returns on risk factors (MKT, SMB, HML) to obtain factor loadings (betas).
  2. Step 1 – Cross-Sectional Regressions: For each time period t, regress asset returns on the betas from Step 0.
  3. Step 2 – Average the Coefficients: Take the time-series average of the cross-sectional coefficients and test significance using t-tests.
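
For reference, the three steps can be written out as equations. This is a sketch in notation that the original does not define: i indexes assets, t indexes periods, k indexes the factors (MKT, SMB, HML), T is the number of periods, and the lambda symbols denote the cross-sectional slopes.

```latex
\begin{align*}
% Step 0: one time-series regression per asset i
r_{i,t} &= \alpha_i + \beta_i^{MKT}\, MKT_t + \beta_i^{SMB}\, SMB_t
           + \beta_i^{HML}\, HML_t + \varepsilon_{i,t} \\[4pt]
% Step 1: one cross-sectional regression per period t
r_{i,t} &= \lambda_{0,t} + \lambda_t^{MKT}\, \hat\beta_i^{MKT}
           + \lambda_t^{SMB}\, \hat\beta_i^{SMB}
           + \lambda_t^{HML}\, \hat\beta_i^{HML} + u_{i,t} \\[4pt]
% Step 2: time-series average and its t-statistic
\bar\lambda^{k} &= \frac{1}{T} \sum_{t=1}^{T} \hat\lambda_t^{k}, \qquad
\operatorname{se}\!\left(\bar\lambda^{k}\right)
   = \frac{s\!\left(\hat\lambda_t^{k}\right)}{\sqrt{T}}, \qquad
t^{k} = \frac{\bar\lambda^{k}}{\operatorname{se}\!\left(\bar\lambda^{k}\right)}
\end{align*}
```

Here s(·) is the sample standard deviation of the per-period estimates.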

2 Load Required Packages

library(broom)
library(tidyverse)

3 Load Data

data <- read.csv("data.csv")

# Preview the data
head(data, 10)
##    symbol      date            ri          MKT     SMB     HML
## 1    AAPL  4-Jan-11  0.0052062641 -0.001313890 -0.0065  0.0008
## 2    AAPL  5-Jan-11  0.0081462879  0.004994670  0.0018  0.0013
## 3    AAPL  6-Jan-11 -0.0008082435 -0.002125228  0.0001 -0.0025
## 4    AAPL  7-Jan-11  0.0071360567 -0.001846505  0.0022 -0.0006
## 5    AAPL 10-Jan-11  0.0186572890 -0.001377275  0.0041  0.0039
## 6    AAPL 11-Jan-11 -0.0023681840  0.003718222  0.0016  0.0036
## 7    AAPL 12-Jan-11  0.0081042033  0.008967269  0.0031  0.0000
## 8    AAPL 13-Jan-11  0.0036515722 -0.001712249 -0.0026 -0.0044
## 9    AAPL 14-Jan-11  0.0080672745  0.007357425 -0.0010 -0.0073
## 10   AAPL 18-Jan-11 -0.0227252300  0.001375442  0.0056  0.0015
# Data structure
glimpse(data)
## Rows: 7,542
## Columns: 6
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
## $ date   <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
## $ ri     <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
## $ MKT    <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
## $ SMB    <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
## $ HML    <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…
# Summary statistics
summary(data)
##     symbol              date                 ri            
##  Length:7542        Length:7542        Min.   :-0.3908663  
##  Class :character   Class :character   1st Qu.:-0.0087263  
##  Mode  :character   Mode  :character   Median : 0.0000000  
##                                        Mean   : 0.0002109  
##                                        3rd Qu.: 0.0093507  
##                                        Max.   : 0.9614112  
##       MKT                  SMB                  HML          
##  Min.   :-0.0689583   Min.   :-1.660e-02   Min.   :-0.01490  
##  1st Qu.:-0.0040125   1st Qu.:-3.100e-03   1st Qu.:-0.00260  
##  Median : 0.0005438   Median : 1.000e-04   Median : 0.00000  
##  Mean   : 0.0003774   Mean   : 2.227e-06   Mean   : 0.00013  
##  3rd Qu.: 0.0052641   3rd Qu.: 3.100e-03   3rd Qu.: 0.00260  
##  Max.   : 0.0463174   Max.   : 2.490e-02   Max.   : 0.02250
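
The glimpse() output above shows that date is read in as character, so sorting it puts "10-Jan-11" before "4-Jan-11". A minimal sketch of parsing it into a Date follows. It uses a few toy rows copied from the preview rather than the full file, and it assumes the session locale uses English month abbreviations.

```r
# Toy rows in the same layout as data.csv (values copied from the preview above)
toy <- data.frame(
  symbol = c("AAPL", "AAPL", "AAPL"),
  date   = c("4-Jan-11", "5-Jan-11", "10-Jan-11"),
  ri     = c(0.0052062641, 0.0081462879, 0.0186572890),
  stringsAsFactors = FALSE
)

# "%d-%b-%y" = day, abbreviated month name, two-digit year.
# Assumption: month abbreviations are in English in the current locale.
toy$date <- as.Date(toy$date, format = "%d-%b-%y")

class(toy$date)   # Date class rather than character
order(toy$date)   # rows are now ordered chronologically
```

The same as.Date() call could be applied to data$date right after read.csv() if chronological ordering matters later on.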

4 Step 0: Time-Series Regressions (Estimate Betas)

For each stock/asset, regress its returns (ri) on the three Fama-French factors — Market excess return (MKT), Small-Minus-Big (SMB), and High-Minus-Low (HML) — over all time periods.

step0 <- data %>% 
  nest(data = c(date, ri, MKT, SMB, HML)) %>% 
  mutate(estimates = map(
    data,
    ~tidy(lm(ri ~ MKT + SMB + HML, data = .x))
  )) %>% 
  unnest(estimates) %>% 
  select(symbol, estimate, term) %>% 
  pivot_wider(names_from  = term,
              values_from = estimate) %>% 
  select(symbol, 
         b_MKT = MKT, 
         b_HML = HML, 
         b_SMB = SMB)

# View estimated betas
step0
## # A tibble: 6 × 4
##   symbol b_MKT   b_HML    b_SMB
##   <chr>  <dbl>   <dbl>    <dbl>
## 1 AAPL   0.900 -0.0578  0.0685 
## 2 FORD   0.513  0.138  -0.264  
## 3 GE     1.08   0.0902  0.0994 
## 4 GM     1.29  -0.0222  0.00390
## 5 IBM    0.817 -0.0121  0.0336 
## 6 MSFT   0.966 -0.0641  0.0582
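
To make the nest/map pipeline above less opaque, the same per-asset regressions can be sketched in base R with split() and sapply(). The panel below is simulated, and the asset names A, B, C and their "true" market betas are hypothetical, so the numbers are not comparable to the table above.

```r
set.seed(1)

# Simulated factors for 250 periods (hypothetical values)
n_t <- 250
factors <- data.frame(
  MKT = rnorm(n_t, 0, 0.01),
  SMB = rnorm(n_t, 0, 0.005),
  HML = rnorm(n_t, 0, 0.005)
)
true_b <- c(A = 0.8, B = 1.0, C = 1.2)   # hypothetical market betas

# Long-format panel: one block of rows per asset, same factors for all
sim <- do.call(rbind, lapply(names(true_b), function(s) {
  data.frame(
    symbol = s,
    factors,
    ri = true_b[[s]] * factors$MKT + rnorm(n_t, 0, 0.01)
  )
}))

# One time-series regression per asset; keep the three slope coefficients
betas <- t(sapply(split(sim, sim$symbol), function(d) {
  coef(lm(ri ~ MKT + SMB + HML, data = d))[c("MKT", "SMB", "HML")]
}))
colnames(betas) <- paste0("b_", colnames(betas))
betas
```

Each row of betas corresponds to one asset, which mirrors the one-row-per-symbol layout of step0.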

4.1 Merge betas back to the panel data

step0 <- data %>% 
  left_join(step0, by = "symbol")

head(step0, 10)
##    symbol      date            ri          MKT     SMB     HML     b_MKT
## 1    AAPL  4-Jan-11  0.0052062641 -0.001313890 -0.0065  0.0008 0.9000063
## 2    AAPL  5-Jan-11  0.0081462879  0.004994670  0.0018  0.0013 0.9000063
## 3    AAPL  6-Jan-11 -0.0008082435 -0.002125228  0.0001 -0.0025 0.9000063
## 4    AAPL  7-Jan-11  0.0071360567 -0.001846505  0.0022 -0.0006 0.9000063
## 5    AAPL 10-Jan-11  0.0186572890 -0.001377275  0.0041  0.0039 0.9000063
## 6    AAPL 11-Jan-11 -0.0023681840  0.003718222  0.0016  0.0036 0.9000063
## 7    AAPL 12-Jan-11  0.0081042033  0.008967269  0.0031  0.0000 0.9000063
## 8    AAPL 13-Jan-11  0.0036515722 -0.001712249 -0.0026 -0.0044 0.9000063
## 9    AAPL 14-Jan-11  0.0080672745  0.007357425 -0.0010 -0.0073 0.9000063
## 10   AAPL 18-Jan-11 -0.0227252300  0.001375442  0.0056  0.0015 0.9000063
##          b_HML      b_SMB
## 1  -0.05782126 0.06853513
## 2  -0.05782126 0.06853513
## 3  -0.05782126 0.06853513
## 4  -0.05782126 0.06853513
## 5  -0.05782126 0.06853513
## 6  -0.05782126 0.06853513
## 7  -0.05782126 0.06853513
## 8  -0.05782126 0.06853513
## 9  -0.05782126 0.06853513
## 10 -0.05782126 0.06853513

5 Step 1: Cross-Sectional Regressions

For each time period (date), regress the cross-section of asset returns on the betas estimated in Step 0. This yields one set of lambda estimates per date; because the data are daily, these are daily lambdas.

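# Columns left out of nest() (date, MKT, SMB, HML) define the groups.
# The factor values are the same for every asset on a given date,
# so this produces one nested data frame per date.
step1 <- step0 %>% 
  nest(data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>% 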
  mutate(estimates = map(
    data,
    ~tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
  )) %>%
  unnest(estimates) %>% 
  select(date, estimate, term) %>% 
  pivot_wider(names_from  = term,
              values_from = estimate) %>% 
  select(date, b_MKT, b_HML, b_SMB)

# View the per-date (daily) cross-sectional lambda estimates
head(step1, 10)
## # A tibble: 10 × 4
##    date         b_MKT   b_HML    b_SMB
##    <chr>        <dbl>   <dbl>    <dbl>
##  1 4-Jan-11   0.0416   0.0574 -0.0255 
##  2 5-Jan-11  -0.0113   0.0628 -0.158  
##  3 6-Jan-11   0.0373  -0.173   0.00703
##  4 7-Jan-11   0.0127  -0.0642  0.0323 
##  5 10-Jan-11 -0.0366   0.0586  0.0171 
##  6 11-Jan-11  0.00409  0.0899 -0.0954 
##  7 12-Jan-11 -0.0554   0.0430 -0.164  
##  8 13-Jan-11 -0.0194   0.0256  0.00181
##  9 14-Jan-11 -0.0165   0.0392  0.0633 
## 10 18-Jan-11  0.0101  -0.0900  0.0525
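
Each row of step1 comes from a regression on a single date's cross-section. The sketch below runs one such regression by hand. The betas are rounded from the Step 0 table, while the returns are hypothetical values made up for illustration, so the resulting lambdas do not match any row above.

```r
xs <- data.frame(
  symbol = c("AAPL", "FORD", "GE", "GM", "IBM", "MSFT"),
  # Betas rounded from the Step 0 output
  b_MKT  = c(0.900, 0.513, 1.08, 1.29, 0.817, 0.966),
  b_HML  = c(-0.0578, 0.138, 0.0902, -0.0222, -0.0121, -0.0641),
  b_SMB  = c(0.0685, -0.264, 0.0994, 0.00390, 0.0336, 0.0582),
  # Hypothetical one-day returns, for illustration only
  ri     = c(0.005, -0.002, 0.011, 0.004, -0.001, 0.007)
)

fit <- lm(ri ~ b_MKT + b_SMB + b_HML, data = xs)

coef(fit)          # intercept plus one lambda per factor for this date
df.residual(fit)   # 6 assets - 4 coefficients = 2 residual degrees of freedom
```

With six assets and four estimated coefficients, each cross-section has only two residual degrees of freedom, which may help explain why the per-date lambdas vary so much from one day to the next.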

6 Step 2: Time-Series Average of Coefficients

Take the time-series average of the cross-sectional coefficients and test whether they are significantly different from zero using t-tests.

6.1 Market Factor (MKT)

t.test(step1$b_MKT, mu = 0)
## 
##  One Sample t-test
## 
## data:  step1$b_MKT
## t = -0.37879, df = 1256, p-value = 0.7049
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.002546371  0.001722208
## sample estimates:
##     mean of x 
## -0.0004120813

6.2 Size Factor (SMB)

t.test(step1$b_SMB, mu = 0)
## 
##  One Sample t-test
## 
## data:  step1$b_SMB
## t = 0.97712, df = 1256, p-value = 0.3287
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.003711466  0.011076953
## sample estimates:
##   mean of x 
## 0.003682744

6.3 Value Factor (HML)

t.test(step1$b_HML, mu = 0)
## 
##  One Sample t-test
## 
## data:  step1$b_HML
## t = -0.18044, df = 1256, p-value = 0.8568
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.005541205  0.004607776
## sample estimates:
##     mean of x 
## -0.0004667146
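
Each of the tests above is the sample mean of the per-period lambdas divided by its standard error. A minimal sketch that checks this equivalence follows. The lambda vector is simulated and hypothetical, not the estimates from this document.

```r
set.seed(42)

# Hypothetical per-period lambdas (simulated, for illustration only)
lambda <- rnorm(1257, mean = 0, sd = 0.04)

n_periods  <- length(lambda)
lambda_bar <- mean(lambda)                  # Fama-MacBeth point estimate
se         <- sd(lambda) / sqrt(n_periods)  # standard error of the mean
t_manual   <- lambda_bar / se
p_manual   <- 2 * pt(-abs(t_manual), df = n_periods - 1)

tt <- t.test(lambda, mu = 0)

# The manual values and the t.test() values should agree
c(manual = t_manual, t.test = unname(tt$statistic))
c(manual = p_manual, t.test = tt$p.value)
```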

7 Results Summary

# Compute summary statistics for all three lambdas
summary_table <- step1 %>%
  select(b_MKT, b_SMB, b_HML) %>%
  pivot_longer(everything(), names_to = "Factor", values_to = "Lambda") %>%
  group_by(Factor) %>%
  summarise(
    Mean   = mean(Lambda, na.rm = TRUE),
    StdDev = sd(Lambda, na.rm = TRUE),
    SE     = StdDev / sqrt(n()),
    t_stat = Mean / SE,
    p_value = 2 * pt(-abs(t_stat), df = n() - 1),
    .groups = "drop"
  ) %>%
  mutate(
    Factor = recode(Factor,
                    b_MKT = "Market (MKT)",
                    b_SMB = "SMB",
                    b_HML = "HML"),
    Significance = case_when(
      p_value < 0.01 ~ "***",
      p_value < 0.05 ~ "**",
      p_value < 0.10 ~ "*",
      TRUE           ~ ""
    )
  )

summary_table
## # A tibble: 3 × 7
##   Factor            Mean StdDev      SE t_stat p_value Significance
##   <chr>            <dbl>  <dbl>   <dbl>  <dbl>   <dbl> <chr>       
## 1 HML          -0.000467 0.0917 0.00259 -0.180   0.857 ""          
## 2 Market (MKT) -0.000412 0.0386 0.00109 -0.379   0.705 ""          
## 3 SMB           0.00368  0.134  0.00377  0.977   0.329 ""
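
As a check on the table, the market row can be reproduced by hand from the printed values. This is a sketch that takes T = 1257 from the t-test output (df = 1256); small differences from the table come from rounding in the printed inputs.

```latex
\begin{align*}
\text{SE}_{MKT} &= \frac{\text{StdDev}}{\sqrt{T}}
                 = \frac{0.0386}{\sqrt{1257}}
                 \approx \frac{0.0386}{35.45}
                 \approx 0.00109 \\[4pt]
t_{MKT} &= \frac{\text{Mean}}{\text{SE}}
         = \frac{-0.000412}{0.00109}
         \approx -0.38
\end{align*}
```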

7.1 Visualise Daily Lambda Estimates Over Time

step1 %>%
  # date is stored as character; parse it so the x-axis is chronological
  # (assumes English month abbreviations in the session locale)
  mutate(date = as.Date(date, format = "%d-%b-%y")) %>%
  pivot_longer(cols = c(b_MKT, b_SMB, b_HML),
               names_to = "Factor", values_to = "Lambda") %>%
  mutate(Factor = recode(Factor,
                          b_MKT = "Market (MKT)",
                          b_SMB = "SMB",
                          b_HML = "HML")) %>%
  ggplot(aes(x = date, y = Lambda, colour = Factor, group = Factor)) +
  geom_line() +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey50") +
  facet_wrap(~Factor, scales = "free_y", ncol = 1) +
  labs(title = "Daily Cross-Sectional Lambda Estimates Over Time",
       x = "Date", y = "Lambda Estimate") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1))

7.2 Risk Premia Bar Chart

summary_table %>%
  ggplot(aes(x = Factor, y = Mean, fill = Factor)) +
  geom_col(width = 0.5) +
  geom_errorbar(aes(ymin = Mean - 1.96 * SE,
                    ymax = Mean + 1.96 * SE),
                width = 0.2, linewidth = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title    = "Fama-MacBeth Risk Premia Estimates",
       subtitle = "Error bars = 95% Confidence Interval",
       x = "Factor", y = "Mean Lambda (Risk Premium)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")


8 Interpretation

Factor   Mean Lambda   t-statistic   Interpretation
MKT      -0.0004       -0.379        Market risk premium
SMB       0.0037        0.977        Size premium
HML      -0.0005       -0.180        Value premium

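  • A significant positive lambda means the factor carries a positive risk premium (investors are compensated for bearing that risk). Here none of the three lambdas is statistically significant at the 10% level (all |t| < 1), so this sample provides no evidence that any of the three factors is priced.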
  • The Fama-MacBeth standard errors are robust to cross-sectional correlation in residuals, which is a key advantage over pooled OLS.

9 References

  • Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
  • The Data Hall (2024). Fama-MacBeth Regression in R [Video]. https://www.youtube.com/watch?v=dLvjmYj-PVA