1 Introduction

Asset pricing theory seeks to answer a fundamental question in finance: why do some assets earn higher returns than others? The intuitive answer, backed by decades of empirical research, is that higher returns compensate investors for bearing greater risk. However, the challenge lies in identifying which risks matter and how much each unit of risk is rewarded by the market.

The Fama-MacBeth (1973) regression is one of the most widely used empirical methods for estimating these risk premia — the extra return investors demand for being exposed to systematic risk factors. Originally developed by Eugene Fama and James MacBeth, the procedure has become a cornerstone of empirical asset pricing and is routinely used in both academic research and professional investment management.

In this analysis, we apply the Fama-MacBeth two-pass regression procedure to a panel dataset of six U.S. equities — AAPL, FORD, GE, GM, IBM, and MSFT — spanning the period from January 2011 to December 2015. The three systematic risk factors we examine are drawn from the celebrated Fama-French Three-Factor Model:

Factor   Description
MKT      Market excess return: captures overall market risk
SMB      Small Minus Big: captures the size premium
HML      High Minus Low: captures the value premium

Our objective is to determine whether these three factors carry statistically significant risk premia — that is, whether the market genuinely compensates investors for exposure to each factor.


2 Methodology

2.1 Why Fama-MacBeth? The Motivation

In a standard pooled OLS regression, the error terms across different assets at the same point in time are very likely to be correlated (for example, when the market drops, almost all stocks decline together). This cross-sectional correlation causes the conventional OLS standard errors to be understated, which inflates t-statistics and can produce false findings of statistical significance.

The Fama-MacBeth procedure sidesteps this problem. Instead of pooling everything into one regression, it runs T separate cross-sectional regressions, one for each time period, and then uses the time-series variation of the resulting coefficients to compute standard errors. Because any correlation across assets within a period is absorbed into that period’s coefficient estimate, the resulting standard errors are robust to cross-sectional correlation, provided the period-by-period estimates are themselves approximately uncorrelated over time.
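
To make the contrast concrete, the following base-R simulation is a minimal sketch and not part of the analysis in this report. It assumes a hypothetical balanced panel in which every asset's return loads on a common period shock, so that residuals are correlated across assets within each period. All names (`exposure`, `common`, `panel`) and parameter values are illustrative assumptions.

```r
set.seed(1)
n_assets  <- 100   # hypothetical cross-section size
n_periods <- 500   # hypothetical number of periods

exposure <- runif(n_assets, 0.5, 1.5)                  # assumed known exposures
common   <- rnorm(n_periods, mean = 0.005, sd = 0.02)  # common shock per period

# r_it = exposure_i * common_t + idiosyncratic noise, so residuals are
# correlated across assets within the same period
panel   <- expand.grid(asset = seq_len(n_assets), period = seq_len(n_periods))
panel$x <- exposure[panel$asset]
panel$r <- panel$x * common[panel$period] + rnorm(nrow(panel), sd = 0.01)

# Pooled OLS: treats all N * T rows as independent observations
pooled    <- lm(r ~ x, data = panel)
se_pooled <- summary(pooled)$coefficients["x", "Std. Error"]

# Fama-MacBeth: one slope per period, then the standard error of their mean
slopes <- sapply(split(panel, panel$period),
                 function(d) coef(lm(r ~ x, data = d))[["x"]])
se_fm  <- sd(slopes) / sqrt(n_periods)

c(pooled = se_pooled, fama_macbeth = se_fm)
```

In a balanced panel with time-invariant exposures, the pooled slope equals the average of the period-by-period slopes, so the two approaches differ only in the standard error. Under these assumptions the pooled regression reports the smaller one, because it treats all N × T rows as independent.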

2.2 The Two-Pass Procedure

The procedure consists of two regression passes, a time-series pass (Step 0) and a cross-sectional pass (Step 1), followed by an averaging and testing step (Step 2):

2.2.1 Step 0: Time-Series Regressions (Estimating Factor Betas)

For each asset \(i\), we run a time-series regression of its excess returns on the three Fama-French factors over the full sample period:

\[r_{i,t} = \alpha_i + \beta_{i,MKT} \cdot MKT_t + \beta_{i,SMB} \cdot SMB_t + \beta_{i,HML} \cdot HML_t + \varepsilon_{i,t}\]

This produces a set of factor loadings (betas) for each stock — \(\hat{\beta}_{i,MKT}\), \(\hat{\beta}_{i,SMB}\), \(\hat{\beta}_{i,HML}\) — which measure how sensitive each stock’s return is to each risk factor.

2.2.2 Step 1: Cross-Sectional Regressions (Estimating Risk Premia)

For each time period \(t\), we regress the cross-section of asset returns on the betas estimated in Step 0:

\[r_{i,t} = \lambda_{0,t} + \lambda_{MKT,t} \cdot \hat{\beta}_{i,MKT} + \lambda_{SMB,t} \cdot \hat{\beta}_{i,SMB} + \lambda_{HML,t} \cdot \hat{\beta}_{i,HML} + \eta_{i,t}\]

Here, the lambdas (\(\lambda\)) are the risk premia — they tell us how much return the market awards per unit of each factor exposure in period \(t\).

2.2.3 Step 2: Averaging and Hypothesis Testing

Finally, we take the time-series average of each lambda across all \(T\) periods:

\[\hat{\lambda}_k = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{k,t}\]

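We then apply a one-sample t-test against \(H_0: \lambda_k = 0\) for each factor, using the time-series standard deviation of the \(\hat{\lambda}_{k,t}\) divided by \(\sqrt{T}\) as the standard error. A statistically significant result (typically \(|t| > 2\) or \(p < 0.05\)) indicates that exposure to that factor carries a nonzero risk premium; a significantly positive \(\hat{\lambda}_k\) means the market rewards investors for bearing that particular risk.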
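
The three steps can be sketched end to end on simulated data. This is a simplified single-factor illustration in base R under assumed parameters; the names `factor_ret`, `true_beta`, and `returns` are hypothetical, and the simulated values are unrelated to the dataset analysed below.

```r
set.seed(42)
n_assets  <- 25    # hypothetical number of assets
n_periods <- 500   # hypothetical number of periods

# Simulated factor returns and true betas (illustrative values only)
factor_ret <- rnorm(n_periods, mean = 0.005, sd = 0.02)
true_beta  <- runif(n_assets, 0.5, 1.5)

# Returns matrix (periods x assets): r_it = beta_i * f_t + noise
returns <- outer(factor_ret, true_beta) +
  matrix(rnorm(n_periods * n_assets, sd = 0.03), n_periods, n_assets)

# Step 0: one time-series regression per asset gives the estimated betas
beta_hat <- apply(returns, 2, function(r) coef(lm(r ~ factor_ret))[2])

# Step 1: one cross-sectional regression per period gives lambda_t
lambda_t <- apply(returns, 1, function(r) coef(lm(r ~ beta_hat))[2])

# Step 2: average the lambdas and test H0: lambda = 0
lambda_bar <- mean(lambda_t)
t_stat     <- lambda_bar / (sd(lambda_t) / sqrt(n_periods))
fm_test    <- t.test(lambda_t, mu = 0)

c(lambda_bar = lambda_bar, t_manual = t_stat, t_builtin = unname(fm_test$statistic))
```

The hand-computed statistic and the one returned by `t.test()` coincide, since the one-sample t-test uses exactly the mean divided by the standard deviation over \(\sqrt{T}\).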


3 Data

# -------------------------------------------------------
# Load required libraries
# broom     : tidy() converts model output into data frames
# tidyverse : data wrangling and piping with dplyr + purrr
# knitr     : kable() for formatted tables in the report
# -------------------------------------------------------
library(broom)
library(tidyverse)
library(knitr)
# -------------------------------------------------------
# Load the dataset
# The panel contains daily returns for 6 U.S. stocks
# along with daily Fama-French three-factor values
# -------------------------------------------------------
data <- read.csv("data.csv")

# Quick preview of the data structure
glimpse(data)
## Rows: 7,542
## Columns: 6
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",…
## $ date   <chr> "4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11", "10-Jan-11", "1…
## $ ri     <dbl> 0.0052062641, 0.0081462879, -0.0008082435, 0.0071360567, 0.0186…
## $ MKT    <dbl> -0.0013138901, 0.0049946699, -0.0021252276, -0.0018465050, -0.0…
## $ SMB    <dbl> -0.0065, 0.0018, 0.0001, 0.0022, 0.0041, 0.0016, 0.0031, -0.002…
## $ HML    <dbl> 0.0008, 0.0013, -0.0025, -0.0006, 0.0039, 0.0036, 0.0000, -0.00…
# -------------------------------------------------------
# Summary statistics for each variable
# -------------------------------------------------------
data %>%
  select(ri, MKT, SMB, HML) %>%
  summary()
##        ri                  MKT                  SMB            
##  Min.   :-0.3908663   Min.   :-0.0689583   Min.   :-1.660e-02  
##  1st Qu.:-0.0087263   1st Qu.:-0.0040125   1st Qu.:-3.100e-03  
##  Median : 0.0000000   Median : 0.0005438   Median : 1.000e-04  
##  Mean   : 0.0002109   Mean   : 0.0003774   Mean   : 2.227e-06  
##  3rd Qu.: 0.0093507   3rd Qu.: 0.0052641   3rd Qu.: 3.100e-03  
##  Max.   : 0.9614112   Max.   : 0.0463174   Max.   : 2.490e-02  
##       HML          
##  Min.   :-0.01490  
##  1st Qu.:-0.00260  
##  Median : 0.00000  
##  Mean   : 0.00013  
##  3rd Qu.: 0.00260  
##  Max.   : 0.02250
# -------------------------------------------------------
# Display the first few rows in a formatted table
# -------------------------------------------------------
head(data, 10) %>%
  kable(
    caption = "Table 1: First 10 Observations of the Dataset",
    digits  = 6,
    align   = "c"
  )
Table 1: First 10 Observations of the Dataset
symbol date ri MKT SMB HML
AAPL 4-Jan-11 0.005206 -0.001314 -0.0065 0.0008
AAPL 5-Jan-11 0.008146 0.004995 0.0018 0.0013
AAPL 6-Jan-11 -0.000808 -0.002125 0.0001 -0.0025
AAPL 7-Jan-11 0.007136 -0.001847 0.0022 -0.0006
AAPL 10-Jan-11 0.018657 -0.001377 0.0041 0.0039
AAPL 11-Jan-11 -0.002368 0.003718 0.0016 0.0036
AAPL 12-Jan-11 0.008104 0.008967 0.0031 0.0000
AAPL 13-Jan-11 0.003652 -0.001712 -0.0026 -0.0044
AAPL 14-Jan-11 0.008067 0.007357 -0.0010 -0.0073
AAPL 18-Jan-11 -0.022725 0.001375 0.0056 0.0015

The dataset contains 7,542 stock-day observations: 1,257 trading days for each of the 6 stocks (AAPL, FORD, GE, GM, IBM, MSFT), covering January 2011 to December 2015. The variable ri is each stock’s daily return, and MKT, SMB, and HML are the corresponding daily Fama-French factor returns.
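
Because each daily cross-sectional regression needs a return for every stock, it can be worth confirming that the panel is balanced before proceeding. The sketch below shows one possible check in base R on a small made-up data frame (`toy`, with hypothetical symbols and returns); the same logic could be applied to `data`.

```r
# Toy panel mimicking the layout of data.csv (symbols and returns are made up)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB", "CCC"), each = 4),
  date   = rep(c("4-Jan-11", "5-Jan-11", "6-Jan-11", "7-Jan-11"), times = 3),
  ri     = c(0.01, -0.02, 0.00, 0.03, 0.02, 0.01,
             -0.01, 0.00, -0.03, 0.02, 0.01, 0.01)
)

n_symbols <- length(unique(toy$symbol))
n_dates   <- length(unique(toy$date))

# Count rows per (symbol, date) pair; a balanced panel has exactly one each
counts      <- table(toy$symbol, toy$date)
is_balanced <- all(counts == 1) && nrow(toy) == n_symbols * n_dates
is_balanced

# The reported dimensions are consistent with a balanced panel:
# 6 stocks x 1,257 trading days = 7,542 rows
6 * 1257 == 7542
```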


4 Code Implementation

4.1 Step 0: Time-Series Regressions

In this step, we run a separate OLS regression for each of the six stocks, regressing its daily return (ri) on the three factors. The nest() function from tidyr (loaded with the tidyverse) collapses each stock’s rows into a single list-column entry, and map() from purrr applies the regression to each nested data frame.

# -------------------------------------------------------
# STEP 0: Time-Series Regressions
#
# For each stock (symbol), regress ri on MKT, SMB, HML
# across all time periods to obtain factor betas.
# -------------------------------------------------------

step0_betas <- data %>%
  
  # Group data by stock and nest the time-series data for each stock
  nest(data = c(date, ri, MKT, SMB, HML)) %>%
  
  # For each stock, run OLS: ri = alpha + b_MKT*MKT + b_SMB*SMB + b_HML*HML
  mutate(estimates = map(
    data,
    ~tidy(lm(ri ~ MKT + SMB + HML, data = .x))
  )) %>%
  
  # Unnest to bring model results into the main data frame
  unnest(estimates) %>%
  
  # Keep only the symbol, coefficient name, and coefficient value
  select(symbol, estimate, term) %>%
  
  # Pivot from long to wide: each factor gets its own column
  pivot_wider(
    names_from  = term,
    values_from = estimate
  ) %>%
  
  # Rename for clarity; drop the intercept (not needed in Step 1)
  select(
    symbol,
    b_MKT = MKT,
    b_SMB = SMB,
    b_HML = HML
  )

# Display the estimated betas for each stock
step0_betas %>%
  kable(
    caption = "Table 2: Estimated Factor Betas from Time-Series Regressions (Step 0)",
    digits  = 4,
    align   = "c"
  )
Table 2: Estimated Factor Betas from Time-Series Regressions (Step 0)
symbol b_MKT b_SMB b_HML
AAPL 0.9000 0.0685 -0.0578
FORD 0.5129 -0.2644 0.1380
GE 1.0779 0.0994 0.0902
GM 1.2854 0.0039 -0.0222
IBM 0.8169 0.0336 -0.0121
MSFT 0.9656 0.0582 -0.0641

Interpretation of Betas: Each beta measures the sensitivity of a stock’s return to the corresponding factor, holding the other factors fixed. For example, GM’s b_MKT of 1.2854 means its return tends to move by about 1.29% for every 1% move in the market factor, so it is more sensitive to market swings than a stock with a beta of one. A b_SMB close to 0, as for GM (0.0039), means the stock has little exposure to the size factor.
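
As a small worked example of this reading, the snippet below multiplies two rows of Table 2 by a hypothetical factor realisation (a 1% market move with SMB and HML at zero). The factor move is an assumption chosen for illustration, not a day from the dataset.

```r
# Betas for two stocks, copied from Table 2
betas <- rbind(
  AAPL = c(MKT = 0.9000, SMB = 0.0685, HML = -0.0578),
  GM   = c(MKT = 1.2854, SMB = 0.0039, HML = -0.0222)
)

# Hypothetical factor realisation: market up 1%, size and value factors flat
factor_move <- c(MKT = 0.01, SMB = 0, HML = 0)

# Factor-implied return for each stock (ignores alpha and idiosyncratic noise)
implied <- drop(betas %*% factor_move)
round(100 * implied, 4)   # expressed in percent
```

Under this assumed move, the factor-implied return is 0.9% for AAPL and about 1.29% for GM, which is the beta scaled by the size of the market move.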

# -------------------------------------------------------
# Merge the estimated betas back into the original dataset
# so that each daily observation now carries its stock's
# factor loadings — ready for the cross-sectional step.
# -------------------------------------------------------
step0 <- data %>%
  left_join(step0_betas, by = "symbol")

# Preview the merged dataset
head(step0, 6) %>%
  kable(
    caption = "Table 3: Dataset After Merging Factor Betas (Step 0 Output)",
    digits  = 6,
    align   = "c"
  )
Table 3: Dataset After Merging Factor Betas (Step 0 Output)
symbol date ri MKT SMB HML b_MKT b_SMB b_HML
AAPL 4-Jan-11 0.005206 -0.001314 -0.0065 0.0008 0.900006 0.068535 -0.057821
AAPL 5-Jan-11 0.008146 0.004995 0.0018 0.0013 0.900006 0.068535 -0.057821
AAPL 6-Jan-11 -0.000808 -0.002125 0.0001 -0.0025 0.900006 0.068535 -0.057821
AAPL 7-Jan-11 0.007136 -0.001847 0.0022 -0.0006 0.900006 0.068535 -0.057821
AAPL 10-Jan-11 0.018657 -0.001377 0.0041 0.0039 0.900006 0.068535 -0.057821
AAPL 11-Jan-11 -0.002368 0.003718 0.0016 0.0036 0.900006 0.068535 -0.057821

4.2 Step 1: Cross-Sectional Regressions

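Now that each observation carries the factor betas, we run a separate cross-sectional OLS regression for each date. Across each date’s six stocks, we regress the stock return on the pre-estimated betas to extract the risk premia \(\lambda\) for that day. With six observations and four estimated coefficients (the intercept plus three lambdas), each daily regression has only two residual degrees of freedom, so individual daily estimates are noisy and the inference rests on averaging them over time.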

# -------------------------------------------------------
# STEP 1: Cross-Sectional Regressions
#
# For each date (t), regress ri on b_MKT, b_SMB, b_HML
# across all stocks to obtain daily risk premia (lambdas).
# -------------------------------------------------------

step1_lambdas <- step0 %>%
  
  # Nest everything except date, so each row holds one day's cross-section.
  # (Nesting by exclusion avoids also grouping on the MKT, SMB, HML columns.)
  nest(data = -date) %>%
  
  # For each date, run OLS: ri = lambda0 + lam_MKT*b_MKT + ...
  mutate(estimates = map(
    data,
    ~tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x))
  )) %>%
  
  # Unnest the results
  unnest(estimates) %>%
  
  # Keep only the date, coefficient name, and coefficient value
  select(date, estimate, term) %>%
  
  # Pivot to wide format: one column per lambda
  pivot_wider(
    names_from  = term,
    values_from = estimate
  ) %>%
  
  # Select and rename the risk premia columns
  select(
    date,
    lam_MKT = b_MKT,
    lam_SMB = b_SMB,
    lam_HML = b_HML
  )

# Preview the first few rows of daily risk premia
head(step1_lambdas, 10) %>%
  kable(
    caption = "Table 4: Daily Cross-Sectional Risk Premia (Step 1 Output) — First 10 Days",
    digits  = 6,
    align   = "c"
  )
Table 4: Daily Cross-Sectional Risk Premia (Step 1 Output) — First 10 Days
date lam_MKT lam_SMB lam_HML
4-Jan-11 0.041629 -0.025520 0.057372
5-Jan-11 -0.011347 -0.158046 0.062847
6-Jan-11 0.037301 0.007029 -0.173234
7-Jan-11 0.012722 0.032269 -0.064226
10-Jan-11 -0.036631 0.017123 0.058646
11-Jan-11 0.004089 -0.095361 0.089858
12-Jan-11 -0.055365 -0.164496 0.043036
13-Jan-11 -0.019357 0.001815 0.025630
14-Jan-11 -0.016486 0.063259 0.039214
18-Jan-11 0.010146 0.052508 -0.090027

Interpretation of Lambdas: Each row represents one day’s cross-sectional regression. lam_MKT on a given day tells us how much additional return stocks with higher market beta earned compared to lower-beta stocks on that particular day. We now have a time series of these daily risk premia for each factor.
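
To show what a single day’s regression looks like, the sketch below fits one cross-section in base R using the betas from Table 2. The six returns in `ri` are hypothetical values chosen for illustration, not a row of the dataset.

```r
# One day's cross-section: betas from Table 2, returns are hypothetical
cs_day <- data.frame(
  symbol = c("AAPL", "FORD", "GE", "GM", "IBM", "MSFT"),
  b_MKT  = c(0.9000, 0.5129, 1.0779, 1.2854, 0.8169, 0.9656),
  b_SMB  = c(0.0685, -0.2644, 0.0994, 0.0039, 0.0336, 0.0582),
  b_HML  = c(-0.0578, 0.1380, 0.0902, -0.0222, -0.0121, -0.0641),
  ri     = c(0.004, -0.010, 0.002, 0.007, -0.001, 0.003)  # made-up returns
)

# Cross-sectional regression for this day: slopes are the day's lambdas
fit <- lm(ri ~ b_MKT + b_SMB + b_HML, data = cs_day)
coef(fit)

# Six observations minus four coefficients leaves two degrees of freedom
df.residual(fit)
```

With so few degrees of freedom, any single day’s lambdas fit the six returns almost exactly and carry little information on their own; the procedure relies on the average over all days.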


4.3 Step 2: Averaging Coefficients and Hypothesis Testing

The final step averages the daily lambdas across the entire sample period and tests whether each average is statistically different from zero using a one-sample t-test.

# -------------------------------------------------------
# STEP 2a: Compute time-series averages of risk premia
# -------------------------------------------------------

lambda_summary <- step1_lambdas %>%
  summarise(
    Mean_MKT = mean(lam_MKT, na.rm = TRUE),
    Mean_SMB = mean(lam_SMB, na.rm = TRUE),
    Mean_HML = mean(lam_HML, na.rm = TRUE),
    SD_MKT   = sd(lam_MKT,   na.rm = TRUE),
    SD_SMB   = sd(lam_SMB,   na.rm = TRUE),
    SD_HML   = sd(lam_HML,   na.rm = TRUE),
    N        = n()
  )

lambda_summary %>%
  kable(
    caption = "Table 5: Summary Statistics of Daily Risk Premia",
    digits  = 6,
    align   = "c"
  )
Table 5: Summary Statistics of Daily Risk Premia
Mean_MKT Mean_SMB Mean_HML SD_MKT SD_SMB SD_HML N
-0.000412 0.003683 -0.000467 0.03857 0.133626 0.091705 1257
# -------------------------------------------------------
# STEP 2b: One-sample t-tests for each risk premium
#
# H0: lambda = 0 (the factor earns no risk premium)
# H1: lambda ≠ 0 (the factor earns a nonzero risk premium)
#
# We test at the conventional 5% significance level.
# -------------------------------------------------------

cat("============================================================\n")
## ============================================================
cat("  FAMA-MACBETH HYPOTHESIS TESTS: H0: lambda = 0\n")
##   FAMA-MACBETH HYPOTHESIS TESTS: H0: lambda = 0
cat("============================================================\n\n")
## ============================================================
cat("--- Factor 1: Market Risk Premium (MKT) ---\n")
## --- Factor 1: Market Risk Premium (MKT) ---
ttest_MKT <- t.test(step1_lambdas$lam_MKT, mu = 0)
print(ttest_MKT)
## 
##  One Sample t-test
## 
## data:  step1_lambdas$lam_MKT
## t = -0.37879, df = 1256, p-value = 0.7049
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.002546371  0.001722208
## sample estimates:
##     mean of x 
## -0.0004120813
cat("\n--- Factor 2: Size Premium (SMB) ---\n")
## 
## --- Factor 2: Size Premium (SMB) ---
ttest_SMB <- t.test(step1_lambdas$lam_SMB, mu = 0)
print(ttest_SMB)
## 
##  One Sample t-test
## 
## data:  step1_lambdas$lam_SMB
## t = 0.97712, df = 1256, p-value = 0.3287
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.003711466  0.011076953
## sample estimates:
##   mean of x 
## 0.003682744
cat("\n--- Factor 3: Value Premium (HML) ---\n")
## 
## --- Factor 3: Value Premium (HML) ---
ttest_HML <- t.test(step1_lambdas$lam_HML, mu = 0)
print(ttest_HML)
## 
##  One Sample t-test
## 
## data:  step1_lambdas$lam_HML
## t = -0.18044, df = 1256, p-value = 0.8568
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.005541205  0.004607776
## sample estimates:
##     mean of x 
## -0.0004667146
# -------------------------------------------------------
# Compile a clean results table for easy reading
# -------------------------------------------------------

results_table <- tibble(
  Factor        = c("MKT (Market)", "SMB (Size)", "HML (Value)"),
  Mean_Lambda   = c(ttest_MKT$estimate, ttest_SMB$estimate, ttest_HML$estimate),
  t_Statistic   = c(ttest_MKT$statistic, ttest_SMB$statistic, ttest_HML$statistic),
  p_Value       = c(ttest_MKT$p.value, ttest_SMB$p.value, ttest_HML$p.value),
  CI_Lower      = c(ttest_MKT$conf.int[1], ttest_SMB$conf.int[1], ttest_HML$conf.int[1]),
  CI_Upper      = c(ttest_MKT$conf.int[2], ttest_SMB$conf.int[2], ttest_HML$conf.int[2]),
  Significant   = c(
    ifelse(ttest_MKT$p.value < 0.05, "Yes ***", "No"),
    ifelse(ttest_SMB$p.value < 0.05, "Yes ***", "No"),
    ifelse(ttest_HML$p.value < 0.05, "Yes ***", "No")
  )
)

results_table %>%
  kable(
    caption = "Table 6: Fama-MacBeth Regression Results Summary",
    digits  = c(0, 6, 4, 4, 6, 6, 0),
    align   = "c"
  )
Table 6: Fama-MacBeth Regression Results Summary
Factor Mean_Lambda t_Statistic p_Value CI_Lower CI_Upper Significant
MKT (Market) -0.000412 -0.3788 0.7049 -0.002546 0.001722 No
SMB (Size) 0.003683 0.9771 0.3287 -0.003711 0.011077 No
HML (Value) -0.000467 -0.1804 0.8568 -0.005541 0.004608 No
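
As a cross-check, the t-statistics in Table 6 can be reproduced by hand from the means reported by `t.test()` and the standard deviations and N in Table 5. This is a sketch in base R; the data frame `fm` and its column names are introduced here for illustration, and the standard deviations are the rounded values printed in Table 5.

```r
# Means from the t.test output, standard deviations and N from Table 5
fm <- data.frame(
  factor      = c("MKT", "SMB", "HML"),
  mean_lambda = c(-0.0004120813, 0.003682744, -0.0004667146),
  sd_lambda   = c(0.03857, 0.133626, 0.091705)
)
n_days <- 1257

# Fama-MacBeth standard error, t-statistic, and two-sided p-value
fm$se <- fm$sd_lambda / sqrt(n_days)
fm$t  <- fm$mean_lambda / fm$se
fm$p  <- 2 * pt(-abs(fm$t), df = n_days - 1)

fm
```

Up to rounding, the resulting t-statistics and p-values agree with those in Table 6.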

5 Visualisation

# -------------------------------------------------------
# Plot the distribution of daily lambdas for each factor
# to visually inspect whether the average differs from zero
# -------------------------------------------------------

step1_lambdas %>%
  pivot_longer(
    cols      = c(lam_MKT, lam_SMB, lam_HML),
    names_to  = "Factor",
    values_to = "Lambda"
  ) %>%
  mutate(Factor = recode(Factor,
    lam_MKT = "MKT (Market)",
    lam_SMB = "SMB (Size)",
    lam_HML = "HML (Value)"
  )) %>%
  ggplot(aes(x = Lambda, fill = Factor)) +
  geom_histogram(bins = 60, alpha = 0.75, colour = "white") +
  geom_vline(xintercept = 0, colour = "black", linetype = "dashed", linewidth = 0.8) +
  facet_wrap(~Factor, scales = "free") +
  scale_fill_manual(values = c("#2980b9", "#27ae60", "#e74c3c")) +
  labs(
    title    = "Distribution of Daily Cross-Sectional Risk Premia",
    subtitle = "Dashed line marks zero; a distribution centred away from zero indicates a nonzero risk premium",
    x        = "Lambda (Risk Premium)",
    y        = "Frequency",
    caption  = "Source: Fama-French Three-Factor Model | Daily data, Jan 2011 – Dec 2015"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position  = "none",
    strip.text       = element_text(face = "bold"),
    plot.title       = element_text(face = "bold", size = 14),
    plot.subtitle    = element_text(colour = "grey40", size = 11)
  )
Figure 1: Distribution of Daily Risk Premia for MKT, SMB, and HML


# -------------------------------------------------------
# Plot the time-series of each daily lambda
# -------------------------------------------------------

step1_lambdas %>%
  mutate(date = as.Date(date, format = "%d-%b-%y")) %>%
  pivot_longer(
    cols      = c(lam_MKT, lam_SMB, lam_HML),
    names_to  = "Factor",
    values_to = "Lambda"
  ) %>%
  mutate(Factor = recode(Factor,
    lam_MKT = "MKT (Market)",
    lam_SMB = "SMB (Size)",
    lam_HML = "HML (Value)"
  )) %>%
  ggplot(aes(x = date, y = Lambda, colour = Factor)) +
  geom_line(alpha = 0.6, linewidth = 0.4) +
  geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
  facet_wrap(~Factor, ncol = 1, scales = "free_y") +
  scale_colour_manual(values = c("#2980b9", "#27ae60", "#e74c3c")) +
  labs(
    title    = "Time-Series of Daily Cross-Sectional Risk Premia",
    x        = "Date",
    y        = "Lambda (Risk Premium)",
    caption  = "Source: Fama-French Three-Factor Model | Daily data, Jan 2011 – Dec 2015"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position = "none",
    strip.text      = element_text(face = "bold"),
    plot.title      = element_text(face = "bold", size = 14)
  )
Figure 2: Time-Series of Daily Risk Premia



6 Interpretation of Results

6.1 Factor Betas (Step 0)

The time-series regressions reveal how each of the six stocks is exposed to the three systematic risk factors. Key observations from Table 2:

  • Market Beta (b_MKT): All six stocks carry a positive market beta, consistent with the expectation that equities co-move with the broader market. The technology stocks AAPL (0.90) and MSFT (0.97) have betas just below one, GM has the highest beta (1.29), and FORD has the lowest (0.51).
  • Size Beta (b_SMB): Stocks with a positive SMB beta tend to behave more like small-cap stocks — they outperform when small firms are doing well relative to large firms. Negative SMB betas suggest large-cap characteristics.
  • Value Beta (b_HML): A positive HML beta suggests a value-like tilt (the stock co-moves with high book-to-market firms), while a negative HML beta indicates a growth-like characteristic.

6.2 Risk Premia (Step 2 — Hypothesis Tests)

The core results of the Fama-MacBeth procedure are the t-test results summarised in Table 6. We evaluate each factor against the null hypothesis \(H_0: \lambda = 0\) at the 5% significance level.

6.2.1 MKT — Market Risk Premium

## Mean Lambda: -0.000412 | t-statistic: -0.3788 | p-value: 0.7049

The market factor’s average risk premium is slightly negative (-0.000412 per day) and far from significant (t = -0.38, p = 0.70), so we fail to reject the null hypothesis that market risk is unpriced in this cross-section. A significantly positive estimate would have indicated that stocks with higher market beta systematically earn higher returns, as the Capital Asset Pricing Model (CAPM) predicts; this sample provides no such evidence.

6.2.2 SMB — Size Premium

## Mean Lambda: 0.003683 | t-statistic: 0.9771 | p-value: 0.3287

The SMB coefficient tests whether the size premium is priced in this sample. Fama and French (1993) found that small-cap stocks historically outperformed large-cap stocks on a risk-adjusted basis. In our more recent sample (2011–2015) the estimated premium is positive (0.003683 per day) but not statistically significant (t = 0.98, p = 0.33), so we fail to reject the null.

6.2.3 HML — Value Premium

## Mean Lambda: -0.000467 | t-statistic: -0.1804 | p-value: 0.8568

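The HML coefficient tests whether value stocks (high book-to-market) earn a premium over growth stocks (low book-to-market) in the cross-section. The value premium is one of the most debated patterns in finance: some periods show a pronounced premium, others do not. Here the estimate is slightly negative (-0.000467 per day) and insignificant (t = -0.18, p = 0.86), so we again fail to reject the null.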

Overall Verdict: As Table 6 shows, none of the three factors has a p-value below 0.05, so none is marked “Yes ***” and we fail to reject \(H_0: \lambda = 0\) for MKT, SMB, and HML. In this sample there is no statistical evidence that exposure to any of the three factors was compensated with a risk premium. This should not be read as evidence that the factors are unpriced in general: with only six stocks in each cross-section, the daily lambda estimates are very noisy (see the standard deviations in Table 5), and the test has little power.


7 Conclusion

This analysis applied the Fama-MacBeth (1973) two-pass regression procedure to daily return data for six U.S. equities from 2011 to 2015, with the Fama-French three-factor model as the underlying asset pricing framework.

The procedure proceeded in three well-defined stages:

  1. Step 0 (Time-Series Regressions): We estimated each stock’s exposure (beta) to the market (MKT), size (SMB), and value (HML) factors by running N = 6 separate time-series regressions — one per stock.

  2. Step 1 (Cross-Sectional Regressions): We ran T daily cross-sectional regressions, regressing each day’s cross-section of stock returns on the pre-estimated betas, yielding a daily time-series of risk premia (lambdas) for each factor.

  3. Step 2 (Averaging and Hypothesis Testing): We averaged the daily lambdas and applied one-sample t-tests to determine whether each factor’s risk premium is statistically distinguishable from zero.

A key methodological advantage of the Fama-MacBeth approach over simple pooled OLS is its robustness to cross-sectional correlation — a pervasive feature of financial return data that, if ignored, would produce misleadingly narrow standard errors and overconfident statistical conclusions.

None of the three estimated risk premia is statistically distinguishable from zero in this sample (Table 6), a result that should be read in light of the very small cross-section of six stocks. Future extensions could include a larger and more diverse stock universe, longer sample periods, additional risk factors (e.g., momentum, or the profitability and investment factors of the Fama-French five-factor model), or rolling-window estimation to examine how risk premia evolve through time.


8 References

  • Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.
  • Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
  • Bali, T. G., Engle, R. F., & Murray, S. (2016). Empirical Asset Pricing: The Cross Section of Stock Returns. Wiley.
  • Robinson, D., Hayes, A., & Couch, S. (2021). broom: Convert Statistical Objects into Tidy Tibbles. R package. CRAN.
  • Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.

This report was prepared using R Markdown and is intended for academic submission. All code is fully reproducible — simply place data.csv in the working directory and knit this document.