Asset pricing theory seeks to answer a fundamental question in finance: why do some assets earn higher returns than others? The intuitive answer, backed by decades of empirical research, is that higher returns compensate investors for bearing greater risk. However, the challenge lies in identifying which risks matter and how much each unit of risk is rewarded by the market.
The Fama-MacBeth (1973) regression is one of the most widely used empirical methods for estimating these risk premia — the extra return investors demand for being exposed to systematic risk factors. Originally developed by Eugene Fama and James MacBeth, the procedure has become a cornerstone of empirical asset pricing and is routinely used in both academic research and professional investment management.
In this analysis, we apply the Fama-MacBeth two-pass regression procedure to a panel dataset of six U.S. equities — AAPL, FORD, GE, GM, IBM, and MSFT — spanning the period from January 2011 to December 2015. The three systematic risk factors we examine are drawn from the celebrated Fama-French Three-Factor Model:
| Factor | Description |
|---|---|
| MKT | Market excess return — captures overall market risk |
| SMB | Small Minus Big — captures the size premium |
| HML | High Minus Low — captures the value premium |
Our objective is to determine whether these three factors carry statistically significant risk premia — that is, whether the market genuinely compensates investors for exposure to each factor.
In a standard pooled OLS regression, the error terms across different assets at the same point in time are very likely to be correlated (i.e., if the market drops, almost all stocks decline together). This cross-sectional correlation causes standard OLS to underestimate standard errors, leading to inflated t-statistics and false conclusions about statistical significance.
The Fama-MacBeth procedure elegantly sidesteps this problem. Instead of pooling everything into one regression, it runs T separate cross-sectional regressions — one for each time period — and then uses the time-series variation of the resulting coefficients to compute standard errors. Since each cross-sectional regression is independent in time, the resulting standard errors are robust to cross-sectional correlation.
The Fama-MacBeth regression consists of two passes, with a preliminary Step 0:
For each asset \(i\), we run a time-series regression of its excess returns on the three Fama-French factors over the full sample period:
\[r_{i,t} = \alpha_i + \beta_{i,MKT} \cdot MKT_t + \beta_{i,SMB} \cdot SMB_t + \beta_{i,HML} \cdot HML_t + \varepsilon_{i,t}\]
This produces a set of factor loadings (betas) for each stock — \(\hat{\beta}_{i,MKT}\), \(\hat{\beta}_{i,SMB}\), \(\hat{\beta}_{i,HML}\) — which measure how sensitive each stock’s return is to each risk factor.
For each time period \(t\), we regress the cross-section of asset returns on the betas estimated in Step 0:
\[r_{i,t} = \lambda_{0,t} + \lambda_{MKT,t} \cdot \hat{\beta}_{i,MKT} + \lambda_{SMB,t} \cdot \hat{\beta}_{i,SMB} + \lambda_{HML,t} \cdot \hat{\beta}_{i,HML} + \eta_{i,t}\]
Here, the lambdas (\(\lambda\)) are the risk premia — they tell us how much return the market awards per unit of each factor exposure in period \(t\).
Finally, we take the time-series average of each lambda across all \(T\) periods:
\[\hat{\lambda}_k = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{k,t}\]
We then apply a one-sample t-test against \(H_0: \lambda_k = 0\) for each factor. A statistically significant result (typically \(|t| > 2\) or \(p < 0.05\)) means the market rewards investors for bearing that particular risk.
# -------------------------------------------------------
# Load required libraries
# broom : tidy() converts model output into data frames
# tidyverse : data wrangling and piping with dplyr + purrr
# knitr : kable() for formatted tables in the report
# -------------------------------------------------------
library(broom)
library(tidyverse)
library(knitr)# -------------------------------------------------------
# Load the dataset
# The panel contains daily returns for 6 U.S. stocks
# along with daily Fama-French three-factor values
# -------------------------------------------------------
data <- read.csv("data.csv")
# Quick preview of the data structure
glimpse(data)## Rows: 7,542
## Columns: 6
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
## $ date <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
## $ ri <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
## $ MKT <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
## $ SMB <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
## $ HML <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…
# -------------------------------------------------------
# Summary statistics for each variable
# -------------------------------------------------------
data %>%
select(ri, MKT, SMB, HML) %>%
summary()## ri MKT SMB
## Min. :-0.3908663 Min. :-0.0689583 Min. :-1.660e-02
## 1st Qu.:-0.0087263 1st Qu.:-0.0040125 1st Qu.:-3.100e-03
## Median : 0.0000000 Median : 0.0005438 Median : 1.000e-04
## Mean : 0.0002109 Mean : 0.0003774 Mean : 2.227e-06
## 3rd Qu.: 0.0093507 3rd Qu.: 0.0052641 3rd Qu.: 3.100e-03
## Max. : 0.9614112 Max. : 0.0463174 Max. : 2.490e-02
## HML
## Min. :-0.01490
## 1st Qu.:-0.00260
## Median : 0.00000
## Mean : 0.00013
## 3rd Qu.: 0.00260
## Max. : 0.02250
# -------------------------------------------------------
# Display the first few rows in a formatted table
# -------------------------------------------------------
head(data, 10) %>%
kable(
caption = "Table 1: First 10 Observations of the Dataset",
digits = 6,
align = "c"
)| symbol | date | ri | MKT | SMB | HML |
|---|---|---|---|---|---|
| AAPL | 4-Jan-11 | 0.005206 | -0.001314 | -0.0065 | 0.0008 |
| AAPL | 5-Jan-11 | 0.008146 | 0.004995 | 0.0018 | 0.0013 |
| AAPL | 6-Jan-11 | -0.000808 | -0.002125 | 0.0001 | -0.0025 |
| AAPL | 7-Jan-11 | 0.007136 | -0.001847 | 0.0022 | -0.0006 |
| AAPL | 10-Jan-11 | 0.018657 | -0.001377 | 0.0041 | 0.0039 |
| AAPL | 11-Jan-11 | -0.002368 | 0.003718 | 0.0016 | 0.0036 |
| AAPL | 12-Jan-11 | 0.008104 | 0.008967 | 0.0031 | 0.0000 |
| AAPL | 13-Jan-11 | 0.003652 | -0.001712 | -0.0026 | -0.0044 |
| AAPL | 14-Jan-11 | 0.008067 | 0.007357 | -0.0010 | -0.0073 |
| AAPL | 18-Jan-11 | -0.022725 | 0.001375 | 0.0056 | 0.0015 |
The dataset contains 7542 daily observations across
6 stocks (AAPL, FORD, GE, GM, IBM, MSFT), covering the
period January 2011 to December 2015. The variable ri is
each stock’s daily return, and MKT, SMB,
HML are the corresponding daily Fama-French factor
returns.
In this step, we run a separate OLS regression for each of the six
stocks, regressing its daily return (ri) on the three
factors. The nest() function from tidyverse
groups the data by stock, and map() applies the regression
to each group.
# -------------------------------------------------------
# STEP 0: Time-Series Regressions
#
# For each stock (symbol), regress ri on MKT, SMB, HML
# across all time periods to obtain factor betas.
# -------------------------------------------------------
step0_betas <- data %>%
# Group data by stock and nest the time-series data for each stock
nest(data = c(date, ri, MKT, SMB, HML)) %>%
# For each stock, run OLS: ri = alpha + b_MKT*MKT + b_SMB*SMB + b_HML*HML
mutate(estimates = map(
data,
~tidy(lm(ri ~ MKT + SMB + HML, data = .x))
)) %>%
# Unnest to bring model results into the main data frame
unnest(estimates) %>%
# Keep only the symbol, coefficient name, and coefficient value
select(symbol, estimate, term) %>%
# Pivot from long to wide: each factor gets its own column
pivot_wider(
names_from = term,
values_from = estimate
) %>%
# Rename for clarity; drop the intercept (not needed in Step 1)
select(
symbol,
b_MKT = MKT,
b_SMB = SMB,
b_HML = HML
)
# Display the estimated betas for each stock
step0_betas %>%
kable(
caption = "Table 2: Estimated Factor Betas from Time-Series Regressions (Step 0)",
digits = 4,
align = "c"
)| symbol | b_MKT | b_SMB | b_HML |
|---|---|---|---|
| AAPL | 0.9000 | 0.0685 | -0.0578 |
| FORD | 0.5129 | -0.2644 | 0.1380 |
| GE | 1.0779 | 0.0994 | 0.0902 |
| GM | 1.2854 | 0.0039 | -0.0222 |
| IBM | 0.8169 | 0.0336 | -0.0121 |
| MSFT | 0.9656 | 0.0582 | -0.0641 |
Interpretation of Betas: Each beta measures the
sensitivity of a stock’s return to the corresponding factor. For
example, a b_MKT of 1.2 means the stock tends to move 1.2%
for every 1% movement in the market factor — it is more volatile than
the market. A b_SMB close to 0 means the stock has little
exposure to the size premium, and so on.
# -------------------------------------------------------
# Merge the estimated betas back into the original dataset
# so that each daily observation now carries its stock's
# factor loadings — ready for the cross-sectional step.
# -------------------------------------------------------
step0 <- data %>%
left_join(step0_betas, by = "symbol")
# Preview the merged dataset
head(step0, 6) %>%
kable(
caption = "Table 3: Dataset After Merging Factor Betas (Step 0 Output)",
digits = 6,
align = "c"
)| symbol | date | ri | MKT | SMB | HML | b_MKT | b_SMB | b_HML |
|---|---|---|---|---|---|---|---|---|
| AAPL | 4-Jan-11 | 0.005206 | -0.001314 | -0.0065 | 0.0008 | 0.900006 | 0.068535 | -0.057821 |
| AAPL | 5-Jan-11 | 0.008146 | 0.004995 | 0.0018 | 0.0013 | 0.900006 | 0.068535 | -0.057821 |
| AAPL | 6-Jan-11 | -0.000808 | -0.002125 | 0.0001 | -0.0025 | 0.900006 | 0.068535 | -0.057821 |
| AAPL | 7-Jan-11 | 0.007136 | -0.001847 | 0.0022 | -0.0006 | 0.900006 | 0.068535 | -0.057821 |
| AAPL | 10-Jan-11 | 0.018657 | -0.001377 | 0.0041 | 0.0039 | 0.900006 | 0.068535 | -0.057821 |
| AAPL | 11-Jan-11 | -0.002368 | 0.003718 | 0.0016 | 0.0036 | 0.900006 | 0.068535 | -0.057821 |
Now that each observation carries the factor betas, we run a separate cross-sectional OLS regression for each date. Across each date’s six stocks, we regress the stock return on the pre-estimated betas to extract the risk premia \(\lambda\) for that day.
# -------------------------------------------------------
# STEP 1: Cross-Sectional Regressions
#
# For each date (t), regress ri on b_MKT, b_SMB, b_HML
# across all stocks to obtain daily risk premia (lambdas).
# -------------------------------------------------------
step1_lambdas <- step0 %>%
# Group data by date and nest the cross-sectional data for each date
nest(data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>%
# For each date, run OLS: ri = lambda0 + lam_MKT*b_MKT + ...
mutate(estimates = map(
data,
~tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
)) %>%
# Unnest the results
unnest(estimates) %>%
# Keep only the date, coefficient name, and coefficient value
select(date, estimate, term) %>%
# Pivot to wide format: one column per lambda
pivot_wider(
names_from = term,
values_from = estimate
) %>%
# Select and rename the risk premia columns
select(
date,
lam_MKT = b_MKT,
lam_SMB = b_SMB,
lam_HML = b_HML
)
# Preview the first few rows of daily risk premia
head(step1_lambdas, 10) %>%
kable(
caption = "Table 4: Daily Cross-Sectional Risk Premia (Step 1 Output) — First 10 Days",
digits = 6,
align = "c"
)| date | lam_MKT | lam_SMB | lam_HML |
|---|---|---|---|
| 4-Jan-11 | 0.041629 | -0.025520 | 0.057372 |
| 5-Jan-11 | -0.011347 | -0.158046 | 0.062847 |
| 6-Jan-11 | 0.037301 | 0.007029 | -0.173234 |
| 7-Jan-11 | 0.012722 | 0.032269 | -0.064226 |
| 10-Jan-11 | -0.036631 | 0.017123 | 0.058646 |
| 11-Jan-11 | 0.004089 | -0.095361 | 0.089858 |
| 12-Jan-11 | -0.055365 | -0.164496 | 0.043036 |
| 13-Jan-11 | -0.019357 | 0.001815 | 0.025630 |
| 14-Jan-11 | -0.016486 | 0.063259 | 0.039214 |
| 18-Jan-11 | 0.010146 | 0.052508 | -0.090027 |
Interpretation of Lambdas: Each row represents one
day’s cross-sectional regression. lam_MKT on a given day
tells us how much additional return stocks with higher market beta
earned compared to lower-beta stocks on that particular day. We now have
a time series of these daily risk premia for each factor.
The final step averages the daily lambdas across the entire sample period and tests whether each average is statistically different from zero using a one-sample t-test.
# -------------------------------------------------------
# STEP 2a: Compute time-series averages of risk premia
# -------------------------------------------------------
lambda_summary <- step1_lambdas %>%
summarise(
Mean_MKT = mean(lam_MKT, na.rm = TRUE),
Mean_SMB = mean(lam_SMB, na.rm = TRUE),
Mean_HML = mean(lam_HML, na.rm = TRUE),
SD_MKT = sd(lam_MKT, na.rm = TRUE),
SD_SMB = sd(lam_SMB, na.rm = TRUE),
SD_HML = sd(lam_HML, na.rm = TRUE),
N = n()
)
lambda_summary %>%
kable(
caption = "Table 5: Summary Statistics of Daily Risk Premia",
digits = 6,
align = "c"
)| Mean_MKT | Mean_SMB | Mean_HML | SD_MKT | SD_SMB | SD_HML | N |
|---|---|---|---|---|---|---|
| -0.000412 | 0.003683 | -0.000467 | 0.03857 | 0.133626 | 0.091705 | 1257 |
# -------------------------------------------------------
# STEP 2b: One-sample t-tests for each risk premium
#
# H0: lambda = 0 (the factor earns no risk premium)
# H1: lambda ≠ 0 (the factor earns a nonzero risk premium)
#
# We test at the conventional 5% significance level.
# -------------------------------------------------------
cat("============================================================\n")## ============================================================
## FAMA-MACBETH HYPOTHESIS TESTS: H0: lambda = 0
## ============================================================
## --- Factor 1: Market Risk Premium (MKT) ---
##
## One Sample t-test
##
## data: step1_lambdas$lam_MKT
## t = -0.37879, df = 1256, p-value = 0.7049
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.002546371 0.001722208
## sample estimates:
## mean of x
## -0.0004120813
##
## --- Factor 2: Size Premium (SMB) ---
##
## One Sample t-test
##
## data: step1_lambdas$lam_SMB
## t = 0.97712, df = 1256, p-value = 0.3287
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.003711466 0.011076953
## sample estimates:
## mean of x
## 0.003682744
##
## --- Factor 3: Value Premium (HML) ---
##
## One Sample t-test
##
## data: step1_lambdas$lam_HML
## t = -0.18044, df = 1256, p-value = 0.8568
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.005541205 0.004607776
## sample estimates:
## mean of x
## -0.0004667146
# -------------------------------------------------------
# Compile a clean results table for easy reading
# -------------------------------------------------------
results_table <- tibble(
Factor = c("MKT (Market)", "SMB (Size)", "HML (Value)"),
Mean_Lambda = c(ttest_MKT$estimate, ttest_SMB$estimate, ttest_HML$estimate),
t_Statistic = c(ttest_MKT$statistic, ttest_SMB$statistic, ttest_HML$statistic),
p_Value = c(ttest_MKT$p.value, ttest_SMB$p.value, ttest_HML$p.value),
CI_Lower = c(ttest_MKT$conf.int[1], ttest_SMB$conf.int[1], ttest_HML$conf.int[1]),
CI_Upper = c(ttest_MKT$conf.int[2], ttest_SMB$conf.int[2], ttest_HML$conf.int[2]),
Significant = c(
ifelse(ttest_MKT$p.value < 0.05, "Yes ***", "No"),
ifelse(ttest_SMB$p.value < 0.05, "Yes ***", "No"),
ifelse(ttest_HML$p.value < 0.05, "Yes ***", "No")
)
)
results_table %>%
kable(
caption = "Table 6: Fama-MacBeth Regression Results Summary",
digits = c(0, 6, 4, 4, 6, 6, 0),
align = "c"
)| Factor | Mean_Lambda | t_Statistic | p_Value | CI_Lower | CI_Upper | Significant |
|---|---|---|---|---|---|---|
| MKT (Market) | -0.000412 | -0.3788 | 0.7049 | -0.002546 | 0.001722 | No |
| SMB (Size) | 0.003683 | 0.9771 | 0.3287 | -0.003711 | 0.011077 | No |
| HML (Value) | -0.000467 | -0.1804 | 0.8568 | -0.005541 | 0.004608 | No |
# -------------------------------------------------------
# Plot the distribution of daily lambdas for each factor
# to visually inspect whether the average differs from zero
# -------------------------------------------------------
step1_lambdas %>%
pivot_longer(
cols = c(lam_MKT, lam_SMB, lam_HML),
names_to = "Factor",
values_to = "Lambda"
) %>%
mutate(Factor = recode(Factor,
lam_MKT = "MKT (Market)",
lam_SMB = "SMB (Size)",
lam_HML = "HML (Value)"
)) %>%
ggplot(aes(x = Lambda, fill = Factor)) +
geom_histogram(bins = 60, alpha = 0.75, colour = "white") +
geom_vline(xintercept = 0, colour = "black", linetype = "dashed", linewidth = 0.8) +
facet_wrap(~Factor, scales = "free") +
scale_fill_manual(values = c("#2980b9", "#27ae60", "#e74c3c")) +
labs(
title = "Distribution of Daily Cross-Sectional Risk Premia",
subtitle = "Dashed line marks zero; a distribution centred away from zero indicates a nonzero risk premium",
x = "Lambda (Risk Premium)",
y = "Frequency",
caption = "Source: Fama-French Three-Factor Model | Daily data, Jan 2011 – Dec 2015"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(colour = "grey40", size = 11)
)Figure 1: Distribution of Daily Risk Premia for MKT, SMB, and HML
# -------------------------------------------------------
# Plot the time-series of each daily lambda
# -------------------------------------------------------
step1_lambdas %>%
mutate(date = as.Date(date, format = "%d-%b-%y")) %>%
pivot_longer(
cols = c(lam_MKT, lam_SMB, lam_HML),
names_to = "Factor",
values_to = "Lambda"
) %>%
mutate(Factor = recode(Factor,
lam_MKT = "MKT (Market)",
lam_SMB = "SMB (Size)",
lam_HML = "HML (Value)"
)) %>%
ggplot(aes(x = date, y = Lambda, colour = Factor)) +
geom_line(alpha = 0.6, linewidth = 0.4) +
geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
facet_wrap(~Factor, ncol = 1, scales = "free_y") +
scale_colour_manual(values = c("#2980b9", "#27ae60", "#e74c3c")) +
labs(
title = "Time-Series of Daily Cross-Sectional Risk Premia",
x = "Date",
y = "Lambda (Risk Premium)",
caption = "Source: Fama-French Three-Factor Model | Daily data, Jan 2011 – Dec 2015"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 14)
)Figure 2: Time-Series of Daily Risk Premia
The time-series regressions reveal how each of the six stocks is exposed to the three systematic risk factors. Key observations from Table 2:
b_MKT): Most stocks carry
a positive and meaningful market beta, consistent with the expectation
that equities co-move with the broader market. Consumer-facing
technology stocks such as AAPL and MSFT tend to have betas clustered
around one, while automotive stocks (FORD, GM) may exhibit higher
cyclical sensitivity.b_SMB): Stocks with a
positive SMB beta tend to behave more like small-cap stocks — they
outperform when small firms are doing well relative to large firms.
Negative SMB betas suggest large-cap characteristics.b_HML): A positive HML
beta suggests a value-like tilt (the stock co-moves with high
book-to-market firms), while a negative HML beta indicates a growth-like
characteristic.The core results of the Fama-MacBeth procedure are the t-test results summarised in Table 6. We evaluate each factor against the null hypothesis \(H_0: \lambda = 0\) at the 5% significance level.
This analysis has successfully replicated the Fama-MacBeth (1973) two-pass regression procedure using daily return data for six U.S. equities from 2011 to 2015, with the Fama-French three-factor model as the underlying asset pricing framework.
The procedure proceeded in three well-defined stages:
Step 0 (Time-Series Regressions): We estimated each stock’s exposure (beta) to the market (MKT), size (SMB), and value (HML) factors by running N = 6 separate time-series regressions — one per stock.
Step 1 (Cross-Sectional Regressions): We ran T daily cross-sectional regressions, regressing each day’s cross-section of stock returns on the pre-estimated betas, yielding a daily time-series of risk premia (lambdas) for each factor.
Step 2 (Averaging and Hypothesis Testing): We averaged the daily lambdas and applied one-sample t-tests to determine whether each factor’s risk premium is statistically distinguishable from zero.
A key methodological advantage of the Fama-MacBeth approach over simple pooled OLS is its robustness to cross-sectional correlation — a pervasive feature of financial return data that, if ignored, would produce misleadingly narrow standard errors and overconfident statistical conclusions.
The results of this study contribute to the broader body of empirical evidence on the Fama-French three-factor model and its applicability across different market regimes. Future extensions could include a larger and more diverse stock universe, longer sample periods, additional risk factors (e.g., the Momentum or Profitability factors from the five-factor model), or rolling-window estimation to examine how risk premia evolve through time.
broom package documentation. CRAN.This report was prepared using R Markdown and is intended for
academic submission. All code is fully reproducible — simply place
data.csv in the working directory and knit this
document.