Fama-MacBeth Regression in R

Overview

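The Fama-MacBeth (1973) procedure estimates factor risk premia from a sequence of cross-sectional regressions and provides standard errors corrected for cross-sectional correlation. Because the standard errors are computed from the time series of period-by-period estimates, correlation of residuals across assets within a period does not distort them, although serial correlation in those estimates still can. It is often recommended when the panel has many cross-sectional units but a comparatively short time series.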

Two-Step Procedure

Step	Description
Step 0	Time-Series Regressions: Run N time-series regressions (one per asset). Regress each asset’s return on the risk factors to obtain its factor loadings (betas).
Step 1	Cross-Sectional Regressions: Run T cross-sectional regressions (one per time period). Regress asset returns on the betas from Step 0 (with an intercept) to obtain that period’s risk premium estimates.
Step 2	Average the Coefficients: Take the time-series average of the T cross-sectional slope estimates and test whether each is significantly different from zero.
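The three steps can be written compactly as below. The notation is ours, not the original tutorial's: $f_t$ is the vector of factor returns (MKT, SMB, HML) on date $t$, $\beta_i$ is asset $i$'s vector of loadings, and $\lambda_t$ is the vector of cross-sectional slopes (risk premia) on date $t$; the Step 2 formulas apply to each element of $\hat\lambda_t$ separately.

```latex
\begin{aligned}
\text{Step 0:}\quad & r_{i,t} = a_i + \beta_i^\top f_t + \varepsilon_{i,t},
  && t = 1,\dots,T, \ \text{separately for each } i = 1,\dots,N \\
\text{Step 1:}\quad & r_{i,t} = \lambda_{0,t} + \hat\beta_i^\top \lambda_t + \alpha_{i,t},
  && i = 1,\dots,N, \ \text{separately for each } t = 1,\dots,T \\
\text{Step 2:}\quad & \bar\lambda = \frac{1}{T}\sum_{t=1}^{T} \hat\lambda_t, \qquad
  \operatorname{se}(\bar\lambda) = \frac{1}{\sqrt{T}}
  \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\bigl(\hat\lambda_t - \bar\lambda\bigr)^2}, \qquad
  t_{\bar\lambda} = \frac{\bar\lambda}{\operatorname{se}(\bar\lambda)}
\end{aligned}
```

The Step 2 statistic $t_{\bar\lambda}$ is exactly the one-sample t-statistic on the $T$ estimates, with $T - 1$ degrees of freedom, which is why `t.test()` on each column of the Step 1 output produces it.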

Load Packages

library(broom)
library(tidyverse)

Load Data

data <- read.csv("data.csv")

head(data)

##   symbol      date            ri          MKT     SMB     HML
## 1   AAPL  4-Jan-11  0.0052062641 -0.001313890 -0.0065  0.0008
## 2   AAPL  5-Jan-11  0.0081462879  0.004994670  0.0018  0.0013
## 3   AAPL  6-Jan-11 -0.0008082435 -0.002125228  0.0001 -0.0025
## 4   AAPL  7-Jan-11  0.0071360567 -0.001846505  0.0022 -0.0006
## 5   AAPL 10-Jan-11  0.0186572890 -0.001377275  0.0041  0.0039
## 6   AAPL 11-Jan-11 -0.0023681840  0.003718222  0.0016  0.0036

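The dataset contains daily returns (ri) for 6 stocks (AAPL, FORD, GE, GM, IBM, MSFT) from 4-Jan-11 to 31-Dec-15, along with the daily returns of the three Fama-French factors: MKT (market), SMB (small minus big, the size factor), and HML (high minus low book-to-market, the value factor).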

glimpse(data)

## Rows: 7,542
## Columns: 6
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
## $ date   <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
## $ ri     <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
## $ MKT    <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
## $ SMB    <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
## $ HML    <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…
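`glimpse()` shows that `date` is stored as text (`<chr>`) in day-month-year form such as `4-Jan-11`. The minimal base R sketch below parses strings of that form and checks that every stock has the same number of rows. The toy data frame `toy` is hypothetical and only mimics the layout of `data`; the `%b` month code assumes an English or C locale.

```r
# Hypothetical toy panel with the same layout as `data` (symbol, date as text)
toy <- data.frame(
  symbol = rep(c("AAPL", "IBM"), each = 3),
  date   = rep(c("4-Jan-11", "5-Jan-11", "31-Dec-15"), times = 2),
  stringsAsFactors = FALSE
)

# "%d-%b-%y": day, abbreviated month name (locale-dependent), two-digit year
toy$date_parsed <- as.Date(toy$date, format = "%d-%b-%y")

# A balanced panel has the same number of rows for every symbol
obs_per_symbol <- table(toy$symbol)
balanced <- length(unique(obs_per_symbol)) == 1

print(range(toy$date_parsed))
print(obs_per_symbol)
print(balanced)
```

Applied to `data`, a balanced panel would have 7,542 / 6 = 1,257 rows per stock, which is consistent with the 1,256 degrees of freedom reported by the t-tests in Step 2.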

Step 0: Time Series Regressions (N regressions)

For each stock, regress its return (ri) on the three Fama-French factors to obtain the factor loadings (betas).

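step0 <- data %>% 
  # one row per stock, with that stock's daily time series nested in `data`
  nest(data = c(date, ri, MKT, SMB, HML)) %>% 
  # time-series regression of ri on the three factors, one per stock
  mutate(estimates = map(
    data,
    ~tidy(lm(ri ~ MKT + SMB + HML, data = .x))
  )) %>% 
  unnest(estimates) %>% 
  # keep the coefficients and reshape to one column per term
  select(symbol, estimate, term) %>% 
  pivot_wider(names_from  = term,
              values_from = estimate) %>% 
  # drop the intercept and rename the factor loadings
  select(symbol, 
         b_MKT = MKT, 
         b_HML = HML, 
         b_SMB = SMB)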

step0

## # A tibble: 6 × 4
##   symbol b_MKT   b_HML    b_SMB
##   <chr>  <dbl>   <dbl>    <dbl>
## 1 AAPL   0.900 -0.0578  0.0685 
## 2 FORD   0.513  0.138  -0.264  
## 3 GE     1.08   0.0902  0.0994 
## 4 GM     1.29  -0.0222  0.00390
## 5 IBM    0.817 -0.0121  0.0336 
## 6 MSFT   0.966 -0.0641  0.0582

These are the estimated factor betas for each stock over the full sample period.

# Join each stock's betas back to the daily panel (step0 now holds the full panel, not just the betas)
step0 <- data %>% 
  left_join(step0, by = "symbol")

head(step0)

##   symbol      date            ri          MKT     SMB     HML     b_MKT
## 1   AAPL  4-Jan-11  0.0052062641 -0.001313890 -0.0065  0.0008 0.9000063
## 2   AAPL  5-Jan-11  0.0081462879  0.004994670  0.0018  0.0013 0.9000063
## 3   AAPL  6-Jan-11 -0.0008082435 -0.002125228  0.0001 -0.0025 0.9000063
## 4   AAPL  7-Jan-11  0.0071360567 -0.001846505  0.0022 -0.0006 0.9000063
## 5   AAPL 10-Jan-11  0.0186572890 -0.001377275  0.0041  0.0039 0.9000063
## 6   AAPL 11-Jan-11 -0.0023681840  0.003718222  0.0016  0.0036 0.9000063
##         b_HML      b_SMB
## 1 -0.05782126 0.06853513
## 2 -0.05782126 0.06853513
## 3 -0.05782126 0.06853513
## 4 -0.05782126 0.06853513
## 5 -0.05782126 0.06853513
## 6 -0.05782126 0.06853513
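To see what Step 0 recovers, here is a base R sketch on simulated data with known loadings. Everything in it is hypothetical (`true_betas`, the factor volatilities, the noise level, 500 days); it runs one `lm()` per stock with `split()` and `sapply()` in place of `nest()` and `map()`.

```r
set.seed(42)

n_days  <- 500
symbols <- c("S1", "S2", "S3")

# Hypothetical true loadings on MKT, SMB, HML for each simulated stock
true_betas <- rbind(
  S1 = c(MKT = 0.9, SMB =  0.1, HML = -0.1),
  S2 = c(MKT = 1.2, SMB = -0.3, HML =  0.2),
  S3 = c(MKT = 0.6, SMB =  0.0, HML =  0.4)
)

# Simulated daily factor returns, shared by all stocks
factors <- data.frame(
  MKT = rnorm(n_days, 0, 0.010),
  SMB = rnorm(n_days, 0, 0.005),
  HML = rnorm(n_days, 0, 0.005)
)

# Long panel: ri = beta' f_t + idiosyncratic noise
panel <- do.call(rbind, lapply(symbols, function(s) {
  ri <- as.matrix(factors) %*% true_betas[s, ] + rnorm(n_days, 0, 0.002)
  data.frame(symbol = s, ri = as.numeric(ri), factors)
}))

# Step 0: one time-series regression per stock, keeping only the loadings
est_betas <- t(sapply(split(panel, panel$symbol), function(d) {
  coef(lm(ri ~ MKT + SMB + HML, data = d))[c("MKT", "SMB", "HML")]
}))

print(round(est_betas, 3))
```

With 500 simulated days the estimated loadings should land close to `true_betas`. In the tutorial's data the true loadings are unknown, so the Step 0 betas are themselves estimates that Step 1 then treats as fixed regressors.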

Step 1: Cross-Sectional Regressions (T regressions)

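For each date, regress the stocks' returns on the betas estimated in Step 0 to obtain that date's risk premium estimates. The output columns keep the names b_MKT, b_SMB, and b_HML, but after this step they hold cross-sectional slopes (premia), not betas.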

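step1 <- step0 %>% 
  # one row per date, with that date's cross-section of stocks nested in `data`
  nest(data = -date) %>% 
  # cross-sectional regression of returns on the Step 0 betas, one per date
  mutate(estimates = map(
    data,
    ~tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
  )) %>%
  unnest(estimates) %>% 
  # one column per coefficient; the slopes are that date's premia
  select(date, estimate, term) %>% 
  pivot_wider(names_from  = term,
              values_from = estimate) %>% 
  select(date, b_MKT, b_HML, b_SMB)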

head(step1)

## # A tibble: 6 × 4
##   date         b_MKT   b_HML    b_SMB
##   <chr>        <dbl>   <dbl>    <dbl>
## 1 4-Jan-11   0.0416   0.0574 -0.0255 
## 2 5-Jan-11  -0.0113   0.0628 -0.158  
## 3 6-Jan-11   0.0373  -0.173   0.00703
## 4 7-Jan-11   0.0127  -0.0642  0.0323 
## 5 10-Jan-11 -0.0366   0.0586  0.0171 
## 6 11-Jan-11  0.00409  0.0899 -0.0954
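The cross-sectional step can also be sketched in base R without the nest/map machinery. This is a minimal simulation under made-up assumptions (`betas`, `lambda_mean`, 50 assets, and 250 days are all hypothetical): for each simulated date it regresses the cross-section of returns on the betas and keeps the intercept and the three slopes. As in `step1`, the slope columns carry b_ names but hold premia estimates.

```r
set.seed(1)

n_assets <- 50
n_days   <- 250

# Hypothetical betas standing in for the Step 0 output (one row per asset)
betas <- data.frame(
  b_MKT = runif(n_assets, 0.5, 1.5),
  b_SMB = runif(n_assets, -0.5, 0.5),
  b_HML = runif(n_assets, -0.5, 0.5)
)

# Hypothetical mean premia; each day's premia are drawn around these values
lambda_mean <- c(b_MKT = 0.0005, b_SMB = 0.0002, b_HML = 0.0001)

step1_sim <- t(sapply(seq_len(n_days), function(day) {
  lambda_t <- rnorm(3, mean = lambda_mean, sd = 0.005)
  ri <- as.matrix(betas) %*% lambda_t + rnorm(n_assets, 0, 0.01)
  cs <- data.frame(ri = as.numeric(ri), betas)
  # one cross-sectional regression for this date
  coef(lm(ri ~ b_MKT + b_SMB + b_HML, data = cs))
}))

print(dim(step1_sim))  # one row per day: intercept plus three slopes
print(head(round(step1_sim, 4)))
```

Replacing the simulated `betas` with the b_MKT, b_SMB, and b_HML columns from Step 0, and the simulated returns with each date's `ri` values, gives the same computation as `step1`.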

Step 2: Time-Series Averages and Hypothesis Tests

Average the cross-sectional coefficients and test whether each risk premium is significantly different from zero.

Market Factor (MKT)

t.test(step1$b_MKT, mu = 0)

## 
##  One Sample t-test
## 
## data:  step1$b_MKT
## t = -0.37879, df = 1256, p-value = 0.7049
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.002546371  0.001722208
## sample estimates:
##     mean of x 
## -0.0004120813

Size Factor (SMB)

t.test(step1$b_SMB, mu = 0)

## 
##  One Sample t-test
## 
## data:  step1$b_SMB
## t = 0.97712, df = 1256, p-value = 0.3287
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.003711466  0.011076953
## sample estimates:
##   mean of x 
## 0.003682744

Value Factor (HML)

t.test(step1$b_HML, mu = 0)

## 
##  One Sample t-test
## 
## data:  step1$b_HML
## t = -0.18044, df = 1256, p-value = 0.8568
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.005541205  0.004607776
## sample estimates:
##     mean of x 
## -0.0004667146

Summary of Risk Premia Estimates

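# Summarise one factor's daily premium estimates: mean, standard deviation,
# and a one-sample t-test of H0: mean = 0
make_row <- function(x, name) {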
  tt <- t.test(x, mu = 0)
  data.frame(
    Factor       = name,
    Mean         = mean(x, na.rm = TRUE),
    StdDev       = sd(x, na.rm = TRUE),
    t_statistic  = as.numeric(tt$statistic),
    p_value      = tt$p.value,
    Significant  = ifelse(tt$p.value < 0.05, "Yes *", "No"),
    stringsAsFactors = FALSE
  )
}

summary_table <- bind_rows(
  make_row(step1$b_MKT, "b_MKT"),
  make_row(step1$b_SMB, "b_SMB"),
  make_row(step1$b_HML, "b_HML")
)

knitr::kable(summary_table, digits = 5,
             caption = "Fama-MacBeth Risk Premia Estimates",
             col.names = c("Factor", "Mean", "Std Dev", "t-Statistic", "p-Value", "Significant (5%)"))

Fama-MacBeth Risk Premia Estimates
Factor	Mean	Std Dev	t-Statistic	p-Value	Significant (5%)
b_MKT	-0.00041	0.03857	-0.37879	0.70491	No
b_SMB	0.00368	0.13363	0.97712	0.32870	No
b_HML	-0.00047	0.09171	-0.18044	0.85684	No
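The table reports each premium's mean and standard deviation, so the Fama-MacBeth t-statistic can be reproduced by hand: it is the mean of the T daily estimates divided by sd / sqrt(T), the same quantity `t.test()` computes. A base R sketch; the helper `fm_tstat()` and the simulated series are hypothetical illustrations, not part of the original tutorial.

```r
# Fama-MacBeth t-statistic: mean of the T estimates over sd / sqrt(T)
fm_tstat <- function(lambda) {
  lambda <- lambda[!is.na(lambda)]  # drop missing estimates, as t.test() does
  mean(lambda) / (sd(lambda) / sqrt(length(lambda)))
}

# Compare with t.test() on a hypothetical series of daily estimates
set.seed(123)
lambda_sim <- rnorm(1257, mean = 0, sd = 0.04)
t_manual  <- fm_tstat(lambda_sim)
t_builtin <- unname(t.test(lambda_sim, mu = 0)$statistic)
print(c(manual = t_manual, t.test = t_builtin))

# Arithmetic check with the reported market-premium figures:
# mean -0.0004120813, sd 0.03857 (rounded), T = 1257 (df = 1256)
t_mkt <- -0.0004120813 / (0.03857 / sqrt(1257))
print(round(t_mkt, 3))
```

The hand calculation for MKT agrees with the reported t = -0.37879 up to rounding of the standard deviation.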

Visualization of Time-Varying Risk Premia

step1 %>%
  pivot_longer(cols = c(b_MKT, b_SMB, b_HML),
               names_to  = "Factor",
               values_to = "Estimate") %>%
  # date is text such as "4-Jan-11"; %b expects English month abbreviations
  ggplot(aes(x = as.Date(date, format = "%d-%b-%y"),
             y = Estimate, colour = Factor)) +
  geom_line(alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "black") +
  facet_wrap(~Factor, scales = "free_y", ncol = 1) +
  labs(
    title    = "Fama-MacBeth Cross-Sectional Risk Premia Over Time",
    subtitle = "Daily estimates from cross-sectional regressions",
    x        = "Date",
    y        = "Risk Premium Estimate",
    colour   = "Factor"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        strip.text      = element_text(face = "bold"))

Interpretation

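MKT: The average market risk premium estimate. A positive and significant value would indicate that market beta is rewarded in the cross-section, i.e., stocks with higher market betas earn higher average returns.
SMB: The size premium. A positive value indicates that stocks with larger SMB loadings (those that behave more like small stocks) earn higher average returns.
HML: The value premium. A positive value indicates that stocks with larger HML loadings (those that behave more like value, i.e., high book-to-market, stocks) earn higher average returns.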

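The t-tests assess whether each factor’s average cross-sectional price of risk is statistically different from zero, which is the central question of the Fama-MacBeth methodology. In this sample none of the three premia is significant at the 5% level (p-values of 0.70 for MKT, 0.33 for SMB, and 0.86 for HML). With only six stocks, each cross-sectional regression estimates four coefficients (an intercept and three slopes) from six observations, leaving two residual degrees of freedom, so the daily premium estimates are very noisy and the tests have little power.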

Replication based on the tutorial video: Fama-MacBeth Regression in R