1 Introduction

The Fama-MacBeth (1973) regression provides standard errors corrected for cross-sectional correlation. It works best when the number of cross-sections is large but the time series is short.

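The procedure has two estimation steps, plus a preparatory Step 0: Step 0 runs N time-series regressions (one per stock) to estimate factor loadings, Step 1 runs T cross-sectional regressions (one per date) of returns on those loadings, and Step 2 averages the cross-sectional coefficients over time and tests them against zero.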

Note. Fama-MacBeth cannot be used with cross-sectionally invariant variables (e.g. country-level GDP).
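
The restriction in the note can be seen in a single cross-sectional regression. The sketch below uses a made-up one-date cross-section (the data frame `cs` and its columns `beta` and `gdp` are hypothetical, not part of data.csv): a regressor that takes the same value for every stock is perfectly collinear with the intercept, so lm() cannot estimate its coefficient.

```r
# Hypothetical one-date cross-section: six stocks, one stock-level regressor
# and one variable that is identical for every stock on that date (e.g. GDP).
set.seed(1)
cs <- data.frame(
  ri   = rnorm(6),     # simulated returns
  beta = rnorm(6),     # varies across stocks
  gdp  = rep(2.5, 6)   # no cross-sectional variation
)

fit <- lm(ri ~ beta + gdp, data = cs)
coef(fit)  # the gdp coefficient is NA: it is collinear with the intercept
```

Because this would happen on every date, such a variable yields no coefficient series to average in the final step.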

2 Packages

# install.packages(c("broom", "tidyverse"))
library(broom)
library(tidyverse)

3 Data

The sample data set data.csv contains daily returns for six U.S. stocks (AAPL, FORD, GE, GM, IBM, MSFT) over 2011-2015, together with the Fama-French market (MKT), small-minus-big (SMB) and high-minus-low (HML) factors.

data <- read.csv("data.csv")
head(data)
cat("Rows         :", nrow(data), "\n")
#> Rows         : 7542
cat("Symbols      :", paste(sort(unique(data$symbol)), collapse = ", "), "\n")
#> Symbols      : AAPL, FORD, GE, GM, IBM, MSFT
cat("Unique dates :", length(unique(data$date)), "\n")
#> Unique dates : 1257
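
The counts above are consistent with a balanced panel (6 symbols × 1257 dates = 7542 rows), which matters later because each date's cross-sectional regression needs all of its stocks. A minimal sketch of an explicit balance check follows, using a toy panel (`toy`, with made-up symbols and dates) as a stand-in for `data`:

```r
# Toy stand-in for `data`: 3 symbols x 4 dates (values are made up).
toy <- expand.grid(
  symbol = c("AAA", "BBB", "CCC"),
  date   = as.Date("2011-01-03") + 0:3,
  stringsAsFactors = FALSE
)

# A panel is balanced when every symbol-date pair occurs exactly once.
pair_counts <- table(toy$symbol, toy$date)
is_balanced <- all(pair_counts == 1)
is_balanced

# Row-count check: necessary for balance, but not sufficient on its own
# (a duplicated pair and a missing pair would cancel out).
rows_match <- nrow(toy) ==
  length(unique(toy$symbol)) * length(unique(toy$date))
rows_match
```

Replacing `toy` with `data` should apply the same check to the tutorial's panel, assuming its columns are named `symbol` and `date` as in the output above.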

4 Step 0 — N time-series regressions

For each symbol we run

\[ r_{i,t} \;=\; \alpha_i \;+\; \beta_{i,\text{MKT}}\,\text{MKT}_t \;+\; \beta_{i,\text{SMB}}\,\text{SMB}_t \;+\; \beta_{i,\text{HML}}\,\text{HML}_t \;+\; \varepsilon_{i,t} \]

using nest() + map() to obtain one regression per stock. We keep only the slope coefficients and rename them as factor loadings \(b_{\text{MKT}}\), \(b_{\text{SMB}}\), \(b_{\text{HML}}\).

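step0 <- data %>%
  # collapse each symbol's time series into a list-column
  nest(data = c(date, ri, MKT, SMB, HML)) %>%
  # time-series regression of returns on the three factors, per symbol
  mutate(estimates = map(
    data,
    ~ tidy(lm(ri ~ MKT + SMB + HML, data = .x))
  )) %>%
  unnest(estimates) %>%
  select(symbol, estimate, term) %>%
  # one column per term: (Intercept), MKT, SMB, HML
  pivot_wider(names_from  = term,
              values_from = estimate) %>%
  # drop the intercept and rename the slopes as factor loadings
  select(symbol,
         b_MKT = MKT,
         b_SMB = SMB,
         b_HML = HML)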

step0
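
As a sanity check on the logic of Step 0, the same per-symbol regressions can be run in base R on simulated data where the loadings are known in advance. This is only a sketch: the symbols, the "true" loadings in `true_b` and the noise levels are all made up for illustration and have nothing to do with data.csv.

```r
set.seed(42)
n_days  <- 500
factors <- data.frame(
  date = seq_len(n_days),
  MKT  = rnorm(n_days, sd = 0.01),
  SMB  = rnorm(n_days, sd = 0.005),
  HML  = rnorm(n_days, sd = 0.005)
)

# Hypothetical "true" loadings for three made-up symbols.
true_b <- data.frame(
  symbol = c("AAA", "BBB", "CCC"),
  b_MKT  = c(0.8, 1.0, 1.3),
  b_SMB  = c(-0.2, 0.1, 0.5),
  b_HML  = c(0.3, -0.1, 0.0)
)

# No shared column names, so merge() returns every symbol-date combination.
sim <- merge(true_b, factors)
sim$ri <- with(sim, b_MKT * MKT + b_SMB * SMB + b_HML * HML +
                 rnorm(nrow(sim), sd = 0.002))

# One time-series regression per symbol; keep only the three slopes.
est <- t(sapply(split(sim, sim$symbol), function(d) {
  coef(lm(ri ~ MKT + SMB + HML, data = d))[c("MKT", "SMB", "HML")]
}))
round(est, 2)  # one row per symbol, close to the rows of true_b
```

The split() + sapply() pattern plays the same role here as nest() + map() in the tidyverse version.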

4.1 Merge factor loadings back to the panel

To run Step 1 each observation must carry its own stock’s factor loadings:

step0 <- data %>%
  left_join(step0, by = "symbol")

head(step0)

5 Step 1 — T cross-sectional regressions

For each date we now regress the cross-section of returns on the estimated factor loadings:

\[ r_{i,t} \;=\; \gamma_{0,t} \;+\; \gamma_{\text{MKT},t}\,\hat b_{i,\text{MKT}} \;+\; \gamma_{\text{SMB},t}\,\hat b_{i,\text{SMB}} \;+\; \gamma_{\text{HML},t}\,\hat b_{i,\text{HML}} \;+\; u_{i,t} \]

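step1 <- step0 %>%
  # collapse each date's cross-section of stocks into a list-column
  nest(data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>%
  # cross-sectional regression of returns on the loadings, per date
  mutate(estimates = map(
    data,
    ~ tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
  )) %>%
  unnest(estimates) %>%
  select(date, estimate, term) %>%
  pivot_wider(names_from  = term,
              values_from = estimate) %>%
  # keep the three slope series (drop the intercept)
  select(date, b_MKT, b_SMB, b_HML)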

head(step1)
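
To make the size of each cross-sectional regression concrete, the sketch below fits the Step 1 model for a single hypothetical date. The returns and loadings in `one_day` are simulated, not taken from data.csv; only the six symbol names and the column names follow the tutorial.

```r
set.seed(7)
one_day <- data.frame(
  symbol = c("AAPL", "FORD", "GE", "GM", "IBM", "MSFT"),
  ri     = rnorm(6, sd = 0.01),    # simulated returns, not real data
  b_MKT  = runif(6, 0.7, 1.4),     # simulated loadings
  b_SMB  = runif(6, -0.3, 0.5),
  b_HML  = runif(6, -0.3, 0.5)
)

fit <- lm(ri ~ b_MKT + b_SMB + b_HML, data = one_day)
coef(fit)          # gamma_0, gamma_MKT, gamma_SMB, gamma_HML for this date
df.residual(fit)   # 6 observations - 4 parameters = 2
```

With six stocks and four parameters, each date's regression has only two residual degrees of freedom, so the individual daily estimates are noisy. The procedure relies on averaging them over many dates.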

6 Step 2 — Time-series averages of the cross-sectional coefficients

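Step 2 averages each coefficient series from Step 1 over time. A one-sample \(t\)-test against zero on each series tells us whether the corresponding factor is priced (i.e. earns a significant risk premium). Note that, despite their names, the columns b_MKT, b_SMB and b_HML of step1 hold the date-by-date estimates of \(\gamma_{\text{MKT},t}\), \(\gamma_{\text{SMB},t}\) and \(\gamma_{\text{HML},t}\), not the factor loadings.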

t.test(step1$b_MKT, mu = 0)
#> 
#>  One Sample t-test
#> 
#> data:  step1$b_MKT
#> t = -0.37879, df = 1256, p-value = 0.7049
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.002546371  0.001722208
#> sample estimates:
#>     mean of x 
#> -0.0004120813
t.test(step1$b_SMB, mu = 0)
#> 
#>  One Sample t-test
#> 
#> data:  step1$b_SMB
#> t = 0.97712, df = 1256, p-value = 0.3287
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.003711466  0.011076953
#> sample estimates:
#>   mean of x 
#> 0.003682744
t.test(step1$b_HML, mu = 0)
#> 
#>  One Sample t-test
#> 
#> data:  step1$b_HML
#> t = -0.18044, df = 1256, p-value = 0.8568
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.005541205  0.004607776
#> sample estimates:
#>     mean of x 
#> -0.0004667146
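
The t-tests above are equivalent to computing the Fama-MacBeth estimate by hand: the point estimate is the time-series mean \(\bar\gamma = \frac{1}{T}\sum_t \hat\gamma_t\), its standard error is \(\operatorname{sd}(\hat\gamma_t)/\sqrt{T}\), and the t-statistic is their ratio. The sketch below checks this equivalence on a simulated series (`gamma_t` is made up and is not the tutorial's step1 output).

```r
set.seed(123)
# Simulated coefficient series of the same length as the sample (T = 1257).
gamma_t <- rnorm(1257, mean = 0, sd = 0.04)

T_len    <- length(gamma_t)
gamma_fm <- mean(gamma_t)               # Fama-MacBeth point estimate
se_fm    <- sd(gamma_t) / sqrt(T_len)   # Fama-MacBeth standard error
t_fm     <- gamma_fm / se_fm

tt <- t.test(gamma_t, mu = 0)
all.equal(t_fm, unname(tt$statistic))   # same t-statistic
unname(tt$parameter)                    # degrees of freedom: T - 1
```

This standard error treats the \(\hat\gamma_t\) as serially uncorrelated. If the coefficient series is autocorrelated, an autocorrelation-robust standard error would be the more cautious choice.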

7 Interpretation

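For this sample (2011-2015, 6 U.S. stocks), none of the three factors earns a risk premium that is statistically distinguishable from zero at the 5% level: every t-test produces \(|t| < 1\) and \(p > 0.3\). In other words, differences in estimated exposure to MKT, SMB and HML do not explain differences in average returns across these stocks. With only six stocks, each cross-sectional regression has four parameters and just two residual degrees of freedom, so the test has little power and the result says more about the small sample than about the factors.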

In a larger universe (e.g. the 25 Fama-French size/value portfolios or the full CRSP cross-section) the market premium typically becomes positive and significant, while SMB and HML are more mixed across sub-samples.

8 Session info

#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.5 forcats_1.0.1   stringr_1.5.1   dplyr_1.2.0    
#>  [5] purrr_1.2.1     readr_2.2.0     tidyr_1.3.2     tibble_3.3.1   
#>  [9] ggplot2_4.0.2   tidyverse_2.0.0 broom_1.0.12   
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.4.1     tidyselect_1.2.1  
#>  [5] jquerylib_0.1.4    scales_1.4.0       yaml_2.3.10        fastmap_1.2.0     
#>  [9] R6_2.6.1           generics_0.1.3     knitr_1.51         backports_1.5.0   
#> [13] tzdb_0.5.0         bslib_0.10.0       pillar_1.10.2      RColorBrewer_1.1-3
#> [17] rlang_1.1.7        stringi_1.8.7      cachem_1.1.0       xfun_0.52         
#> [21] S7_0.2.0           sass_0.4.10        otel_0.2.0         timechange_0.4.0  
#> [25] cli_3.6.5          withr_3.0.2        magrittr_2.0.3     digest_0.6.37     
#> [29] grid_4.4.1         rstudioapi_0.17.1  hms_1.1.3          lifecycle_1.0.5   
#> [33] vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
#> [37] rmarkdown_2.31     tools_4.4.1        pkgconfig_2.0.3    htmltools_0.5.8.1

9 References