The Fama-MacBeth (1973) regression provides standard errors corrected for cross-sectional correlation. It works best when the number of cross-sections is large but the time series is short.
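The procedure has two estimation steps, plus a preparatory Step 0:
Step 0: for each stock, a time-series regression of returns on the factors, which gives the factor loadings.
Step 1: for each date, a cross-sectional regression of returns on the estimated loadings.
Step 2: a t-test on the time series of the Step 1 coefficients.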
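Note. Fama-MacBeth cannot be used with cross-sectionally invariant variables (e.g. country-level GDP). Within a single date's cross-section such a variable takes the same value for every unit, so it is perfectly collinear with the intercept and its coefficient cannot be estimated.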
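To see why, consider a toy cross-section for a single date. The sketch below uses made-up returns and a hypothetical `gdp` column (neither comes from data.csv). Because `gdp` is identical for all six units, it cannot be separated from the intercept, and `lm()` reports `NA` for its coefficient.

```r
# Toy cross-section for one date: six units, made-up numbers.
set.seed(1)
cross_section <- data.frame(
  ri  = rnorm(6),     # hypothetical returns
  gdp = rep(2.5, 6)   # hypothetical country-level value, same for every unit
)

fit <- lm(ri ~ gdp, data = cross_section)

# The gdp coefficient is NA: it is perfectly collinear with the intercept.
coef(fit)
```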
The sample data set data.csv contains daily returns for
six U.S. stocks (AAPL, FORD, GE, GM, IBM, MSFT) over 2011-2015, together
with the Fama-French market (MKT), small-minus-big (SMB) and
high-minus-low (HML) factors.
#> Rows : 7542
#> Symbols : AAPL, FORD, GE, GM, IBM, MSFT
#> Unique dates : 1257
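A summary like the one above takes only a few lines of R. The sketch below is an assumption about how such a summary could be generated, not the original code. It builds a small stand-in data frame with made-up values, using the column names `symbol`, `date` and `ri` that appear in the code later in this document. With the real file one would presumably start from `readr::read_csv("data.csv")` instead.

```r
# Stand-in for data.csv with made-up values: 2 symbols x 3 dates.
symbols <- c("AAPL", "IBM")
dates   <- as.Date("2011-01-03") + 0:2

toy <- data.frame(
  symbol = rep(symbols, times = length(dates)),
  date   = rep(dates, each = length(symbols))
)
set.seed(42)
toy$ri <- rnorm(nrow(toy), sd = 0.01)   # hypothetical daily returns

n_symbols <- length(unique(toy$symbol))
n_dates   <- length(unique(toy$date))

cat("Rows         :", nrow(toy), "\n")
cat("Symbols      :", paste(sort(unique(toy$symbol)), collapse = ", "), "\n")
cat("Unique dates :", n_dates, "\n")

# A balanced panel has exactly one row per symbol-date pair.
is_balanced <- nrow(toy) == n_symbols * n_dates
```

For the real data the same check gives 6 × 1257 = 7542, which matches the reported row count and is consistent with a balanced panel.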
For each symbol we run
\[ r_{i,t} \;=\; \alpha_i \;+\; \beta_{i,\text{MKT}}\,\text{MKT}_t \;+\; \beta_{i,\text{SMB}}\,\text{SMB}_t \;+\; \beta_{i,\text{HML}}\,\text{HML}_t \;+\; \varepsilon_{i,t} \]
using nest() + map() to obtain one
regression per stock. We keep only the slope coefficients and rename
them as factor loadings \(b_{\text{MKT}}\), \(b_{\text{SMB}}\), \(b_{\text{HML}}\).
library(tidyverse)
library(broom)

# One time-series regression per stock
step0 <- data %>%
  nest(data = c(date, ri, MKT, SMB, HML)) %>%
  mutate(estimates = map(
    data,
    ~ tidy(lm(ri ~ MKT + SMB + HML, data = .x))
  )) %>%
  unnest(estimates) %>%
  select(symbol, estimate, term) %>%
  pivot_wider(names_from = term,
              values_from = estimate) %>%
  # Drop the intercept and keep the slopes as factor loadings
  select(symbol,
         b_MKT = MKT,
         b_HML = HML,
         b_SMB = SMB)

step0

For each date we now regress the cross-section of
returns on the estimated factor loadings:
\[ r_{i,t} \;=\; \gamma_{0,t} \;+\; \gamma_{\text{MKT},t}\,\hat b_{i,\text{MKT}} \;+\; \gamma_{\text{SMB},t}\,\hat b_{i,\text{SMB}} \;+\; \gamma_{\text{HML},t}\,\hat b_{i,\text{HML}} \;+\; u_{i,t} \]
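# step0 holds only the loadings, so attach them to the daily returns
# before running one cross-sectional regression per date.
step1 <- data %>%
  left_join(step0, by = "symbol") %>%
  select(date, symbol, ri, b_MKT, b_SMB, b_HML) %>%
  nest(data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>%
  mutate(estimates = map(
    data,
    ~ tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
  )) %>%
  unnest(estimates) %>%
  select(date, estimate, term) %>%
  pivot_wider(names_from = term,
              values_from = estimate) %>%
  # The b_* columns now hold the estimated daily premia (the gammas)
  select(date, b_MKT, b_HML, b_SMB)

head(step1)

A one-sample \(t\)-test against zero on each coefficient series tells us whether the corresponding factor is priced (i.e. earns a significant risk premium).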
#>
#> One Sample t-test
#>
#> data: step1$b_MKT
#> t = -0.37879, df = 1256, p-value = 0.7049
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> -0.002546371 0.001722208
#> sample estimates:
#> mean of x
#> -0.0004120813
#>
#> One Sample t-test
#>
#> data: step1$b_SMB
#> t = 0.97712, df = 1256, p-value = 0.3287
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> -0.003711466 0.011076953
#> sample estimates:
#> mean of x
#> 0.003682744
#>
#> One Sample t-test
#>
#> data: step1$b_HML
#> t = -0.18044, df = 1256, p-value = 0.8568
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> -0.005541205 0.004607776
#> sample estimates:
#> mean of x
#> -0.0004667146
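The t statistic in these tests is the Fama-MacBeth statistic: the time-series mean of the period-by-period coefficients divided by its standard error, \(\text{sd}/\sqrt{T}\). The sketch below checks this equivalence on a simulated coefficient series. The vector `gamma` is hypothetical and is not taken from `step1`.

```r
# Simulated series of period-by-period premia (made-up numbers).
set.seed(123)
n_periods <- 250
gamma <- rnorm(n_periods, mean = 0.001, sd = 0.02)

# Fama-MacBeth point estimate, standard error and t statistic by hand.
fm_mean <- mean(gamma)
fm_se   <- sd(gamma) / sqrt(n_periods)
fm_t    <- fm_mean / fm_se

# The built-in one-sample t-test gives the same statistic.
tt <- t.test(gamma, mu = 0)
all.equal(fm_t, unname(tt$statistic))
```

For the document's own results, the `data:` lines in the output suggest that the calls had the form `t.test(step1$b_MKT)`. This plain standard error treats the coefficient series as serially uncorrelated.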
For this sample (2011-2015, 6 U.S. stocks), none of the three factors earns a risk premium that is statistically distinguishable from zero at the 5% level: every t-test produces \(|t| < 2\) and \(p > 0.05\). In other words, differences in the estimated MKT, SMB and HML loadings do not reliably explain differences in average returns across these stocks. The evidence is weak by construction, since each daily cross-sectional regression fits four parameters to only six observations.
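The reported p-value and confidence interval follow directly from the mean, the t statistic and the degrees of freedom. As a worked check, the sketch below rebuilds them for the MKT series using only the printed numbers. Small discrepancies are expected because the printed t statistic is rounded.

```r
# Values copied from the printed t-test output for step1$b_MKT.
est  <- -0.0004120813   # mean of x
tval <- -0.37879        # t
df   <- 1256            # number of dates minus one

se      <- est / tval                            # implied standard error
p_value <- 2 * pt(-abs(tval), df)                # two-sided p-value
ci      <- est + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval

round(p_value, 4)
round(ci, 6)
```

The interval contains zero and the p-value is well above 0.05, which is what the "not priced" reading rests on.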
In a larger universe (e.g. the 25 Fama-French size/value portfolios or the full CRSP cross-section) the premia are estimated from far more test assets, and the conclusions can differ from those of this six-stock example. Which factors appear priced also varies across sub-samples.
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] lubridate_1.9.5 forcats_1.0.1 stringr_1.5.1 dplyr_1.2.0
#> [5] purrr_1.2.1 readr_2.2.0 tidyr_1.3.2 tibble_3.3.1
#> [9] ggplot2_4.0.2 tidyverse_2.0.0 broom_1.0.12
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.1 tidyselect_1.2.1
#> [5] jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10 fastmap_1.2.0
#> [9] R6_2.6.1 generics_0.1.3 knitr_1.51 backports_1.5.0
#> [13] tzdb_0.5.0 bslib_0.10.0 pillar_1.10.2 RColorBrewer_1.1-3
#> [17] rlang_1.1.7 stringi_1.8.7 cachem_1.1.0 xfun_0.52
#> [21] S7_0.2.0 sass_0.4.10 otel_0.2.0 timechange_0.4.0
#> [25] cli_3.6.5 withr_3.0.2 magrittr_2.0.3 digest_0.6.37
#> [29] grid_4.4.1 rstudioapi_0.17.1 hms_1.1.3 lifecycle_1.0.5
#> [33] vctrs_0.7.1 evaluate_1.0.5 glue_1.8.0 farver_2.1.2
#> [37] rmarkdown_2.31 tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.8.1