1 Introduction

1.1 Motivation

Asset pricing lies at the core of modern finance. A fundamental question is: which risk factors are priced in the cross-section of stock returns? The Fama-MacBeth (1973) procedure provides a rigorous econometric answer by decomposing the problem into two sequential regression passes — first estimating factor loadings (betas) in the time-series dimension, then testing whether those betas are compensated with a positive risk premium in the cross-sectional dimension.

This document replicates the Fama-MacBeth two-pass regression procedure using daily returns for six well-known U.S. equities (AAPL, FORD, GE, GM, IBM, MSFT) over the period January 2011 – December 2015, with the Fama-French Three-Factor (FF3) model as the theoretical backbone.

1.2 The Fama-French Three-Factor Model

Fama and French (1992, 1993) extended the CAPM by adding two empirically motivated risk factors:

\[r_{i,t} - r_{f,t} = \alpha_i + \beta_i^{MKT} \cdot MKT_t + \beta_i^{SMB} \cdot SMB_t + \beta_i^{HML} \cdot HML_t + \varepsilon_{i,t}\]

Symbol	Factor	Description
\(MKT_t\)	Market	Excess return of the market portfolio
\(SMB_t\)	Small Minus Big	Return spread between small- and large-cap stocks
\(HML_t\)	High Minus Low	Return spread between high- and low-book-to-market stocks

1.3 The Fama-MacBeth Procedure

The two-pass regression unfolds as follows:

Pass 1 — Time-Series (N regressions, one per asset): Regress each asset’s excess return on the three factors across all \(T\) time periods, yielding asset-specific factor loadings \((\hat{\beta}_i^{MKT},\, \hat{\beta}_i^{SMB},\, \hat{\beta}_i^{HML})\).

Pass 2 — Cross-Sectional (T regressions, one per date): For each date \(t\), regress the cross-section of returns on the estimated betas:

\[r_{i,t} = \lambda_t^0 + \lambda_t^{MKT}\hat{\beta}_i^{MKT} + \lambda_t^{SMB}\hat{\beta}_i^{SMB} + \lambda_t^{HML}\hat{\beta}_i^{HML} + \eta_{i,t}\]

Pass 3 — Averaging and Inference: The Fama-MacBeth risk premium estimate is the time-series average:

\[\hat{\lambda}^k = \frac{1}{T}\sum_{t=1}^{T} \hat{\lambda}_t^k, \qquad t\text{-stat} = \frac{\hat{\lambda}^k}{\text{SE}(\hat{\lambda}^k)}\]

2 Setup and Package Loading

# Install any missing packages
required_pkgs <- c("broom", "tidyverse", "knitr", "kableExtra",
                   "ggthemes", "scales", "sandwich", "lmtest")
new_pkgs <- required_pkgs[!(required_pkgs %in% installed.packages()[, "Package"])]
if (length(new_pkgs)) install.packages(new_pkgs, repos = "https://cloud.r-project.org")

library(broom)
library(tidyverse)
library(knitr)
library(kableExtra)
library(ggthemes)
library(scales)
library(sandwich)
library(lmtest)

3 Load and Inspect Data

data <- read.csv("data.csv")
data$date <- as.Date(data$date, format = "%d-%b-%y")

head(data, 10) %>%
  kable(digits = 6,
        caption = "Table 1: First 10 Rows of the Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)

Table 1: First 10 Rows of the Dataset
symbol	date	ri	MKT	SMB	HML
AAPL	2011-01-04	0.005206	-0.001314	-0.0065	0.0008
AAPL	2011-01-05	0.008146	0.004995	0.0018	0.0013
AAPL	2011-01-06	-0.000808	-0.002125	0.0001	-0.0025
AAPL	2011-01-07	0.007136	-0.001847	0.0022	-0.0006
AAPL	2011-01-10	0.018657	-0.001377	0.0041	0.0039
AAPL	2011-01-11	-0.002368	0.003718	0.0016	0.0036
AAPL	2011-01-12	0.008104	0.008967	0.0031	0.0000
AAPL	2011-01-13	0.003652	-0.001712	-0.0026	-0.0044
AAPL	2011-01-14	0.008067	0.007357	-0.0010	-0.0073
AAPL	2011-01-18	-0.022725	0.001375	0.0056	0.0015

3.1 Dataset Dimensions

cat("Rows       :", nrow(data), "\n")

#> Rows       : 7542

cat("Columns    :", ncol(data), "\n")

#> Columns    : 6

cat("Stocks     :", paste(unique(data$symbol), collapse = ", "), "\n")

#> Stocks     : AAPL, FORD, GE, GM, IBM, MSFT

cat("Date range :", format(min(data$date)), "to", format(max(data$date)), "\n")

#> Date range : 2011-01-04 to 2015-12-31

3.2 Descriptive Statistics

data %>%
  group_by(symbol) %>%
  summarise(
    N        = n(),
    Mean_ri  = round(mean(ri),  6),
    SD_ri    = round(sd(ri),    6),
    Min_ri   = round(min(ri),   6),
    Max_ri   = round(max(ri),   6),
    Mean_MKT = round(mean(MKT), 6),
    Mean_SMB = round(mean(SMB), 6),
    Mean_HML = round(mean(HML), 6)
  ) %>%
  kable(
    caption   = "Table 2: Descriptive Statistics by Stock",
    col.names = c("Stock", "N", "Mean ri", "SD ri",
                  "Min ri", "Max ri",
                  "Mean MKT", "Mean SMB", "Mean HML")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 2: Descriptive Statistics by Stock
Stock	N	Mean ri	SD ri	Min ri	Max ri	Mean MKT	Mean SMB	Mean HML
AAPL	1257	0.000697	0.016804	-0.131884	0.085022	0.000377	2e-06	0.00013
FORD	1257	-0.000583	0.055493	-0.390866	0.961411	0.000377	2e-06	0.00013
GE	1257	0.000561	0.013450	-0.067653	0.102597	0.000377	2e-06	0.00013
GM	1257	-0.000008	0.018947	-0.115440	0.091084	0.000377	2e-06	0.00013
IBM	1257	-0.000055	0.012213	-0.086419	0.055106	0.000377	2e-06	0.00013
MSFT	1257	0.000654	0.014791	-0.121033	0.099413	0.000377	2e-06	0.00013

4 Exploratory: Daily Return Series

4.1 Return Distributions

ggplot(data, aes(x = ri, fill = symbol)) +
  geom_histogram(bins = 60, colour = "white", alpha = 0.85) +
  facet_wrap(~symbol, scales = "free_y", ncol = 3) +
  scale_x_continuous(labels = percent_format(accuracy = 0.1)) +
  scale_fill_colorblind() +
  labs(
    title    = "Distribution of Daily Excess Returns",
    subtitle = "Fama-French Three-Factor Universe  |  Jan 2011 - Dec 2015",
    x        = "Daily Excess Return",
    y        = "Count"
  ) +
  theme_clean() +
  theme(legend.position = "none",
        strip.text = element_text(face = "bold"))

Figure 1: Daily Excess Return Distributions (2011-2015)

4.2 Cumulative Returns

data %>%
  group_by(symbol) %>%
  arrange(date) %>%
  mutate(cum_ri = cumprod(1 + ri) - 1) %>%
  ungroup() %>%
  ggplot(aes(x = date, y = cum_ri, colour = symbol)) +
  geom_line(size = 0.7) +
  scale_y_continuous(labels = percent_format()) +
  scale_colour_colorblind() +
  labs(
    title  = "Cumulative Excess Returns (2011-2015)",
    x      = NULL, y = "Cumulative Return", colour = "Stock"
  ) +
  theme_clean()

Figure 2: Cumulative Return Paths (2011-2015)

4.3 Factor Time-Series

data %>%
  select(date, MKT, SMB, HML) %>%
  distinct() %>%
  pivot_longer(c(MKT, SMB, HML), names_to = "factor", values_to = "ret") %>%
  ggplot(aes(x = date, y = ret, colour = factor)) +
  geom_line(alpha = 0.7, size = 0.45) +
  facet_wrap(~factor, ncol = 1, scales = "free_y") +
  scale_y_continuous(labels = percent_format(accuracy = 0.01)) +
  scale_colour_colorblind() +
  labs(
    title = "Daily Fama-French Factor Returns",
    x = NULL, y = "Factor Return"
  ) +
  theme_clean() +
  theme(legend.position = "none")

Figure 3: Fama-French Factor Returns Over Time

5 Step 1: First Pass — Time-Series Regressions

In the first pass we run \(N = 6\) OLS regressions — one per stock — across all \(T\) daily observations, obtaining factor loadings (betas) for each stock.

5.1 Estimation of Factor Loadings

step0_betas <- data %>%
  nest(data = c(date, ri, MKT, SMB, HML)) %>%
  mutate(
    model     = map(data, ~ lm(ri ~ MKT + SMB + HML, data = .x)),
    estimates = map(model, tidy),
    fit       = map(model, glance)
  ) %>%
  unnest(estimates) %>%
  select(symbol, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  select(
    symbol,
    alpha = `(Intercept)`,
    b_MKT = MKT,
    b_SMB = SMB,
    b_HML = HML
  )

5.2 Model Fit Statistics

data %>%
  nest(data = c(date, ri, MKT, SMB, HML)) %>%
  mutate(
    model = map(data, ~ lm(ri ~ MKT + SMB + HML, data = .x)),
    fit   = map(model, glance)
  ) %>%
  unnest(fit) %>%
  select(symbol, r.squared, adj.r.squared, statistic, p.value, nobs) %>%
  mutate(across(where(is.numeric), ~ round(.x, 4))) %>%
  kable(
    col.names = c("Stock", "R-sq", "Adj R-sq", "F-stat", "p-value", "Obs"),
    caption   = "Table 3: Pass 1 - Time-Series Regression Fit Statistics"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 3: Pass 1 - Time-Series Regression Fit Statistics
Stock	R-sq	Adj R-sq	F-stat	p-value	Obs
AAPL	0.2729	0.2712	156.7577	0.0000	1257
FORD	0.0090	0.0067	3.8122	0.0098	1257
GE	0.6125	0.6116	660.1958	0.0000	1257
GM	0.4379	0.4366	325.4098	0.0000	1257
IBM	0.4255	0.4241	309.2936	0.0000	1257
MSFT	0.4054	0.4040	284.7811	0.0000	1257

5.3 Estimated Beta Coefficients

step0_betas %>%
  mutate(across(where(is.numeric), ~ round(.x, 4))) %>%
  kable(
    col.names = c("Stock", "Alpha", "Beta MKT", "Beta SMB", "Beta HML"),
    caption   = "Table 4: Pass 1 - Estimated Factor Loadings"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  column_spec(3:5, bold = TRUE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 4: Pass 1 - Estimated Factor Loadings
Stock	Alpha	Beta MKT	Beta SMB	Beta HML
AAPL	4e-04	0.9000	0.0685	-0.0578
FORD	-8e-04	0.5129	-0.2644	0.1380
GE	1e-04	1.0779	0.0994	0.0902
GM	-5e-04	1.2854	0.0039	-0.0222
IBM	-4e-04	0.8169	0.0336	-0.0121
MSFT	3e-04	0.9656	0.0582	-0.0641

Interpretation: Beta-MKT near 1 = near-perfect market co-movement. Beta-SMB > 0 = small-cap-like behaviour. Beta-HML > 0 = value-stock orientation.

5.4 Beta Visualisation

step0_betas %>%
  pivot_longer(c(b_MKT, b_SMB, b_HML),
               names_to = "factor", values_to = "beta") %>%
  mutate(factor = recode(factor,
                         b_MKT = "Market (MKT)",
                         b_SMB = "Size (SMB)",
                         b_HML = "Value (HML)")) %>%
  ggplot(aes(x = symbol, y = beta, fill = factor)) +
  geom_col(position = "dodge", colour = "white", alpha = 0.9) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  scale_fill_colorblind() +
  labs(
    title    = "Fama-French Factor Loadings by Stock",
    subtitle = "Pass 1: Time-Series OLS Estimates",
    x = "Stock", y = "Beta Coefficient", fill = "Factor"
  ) +
  theme_clean()

Figure 4: Factor Loadings for Each Stock

6 Step 2: Second Pass — Cross-Sectional Regressions

In the second pass we merge the estimated betas back into the panel and run \(T\) cross-sectional regressions — one per trading day — of realised returns on the estimated betas. Each regression yields time-\(t\) estimates of the factor risk premia \(\lambda_t\).

6.1 Estimation of Risk Premia

step0_panel <- data %>%
  left_join(
    step0_betas %>% select(symbol, b_MKT, b_SMB, b_HML),
    by = "symbol"
  )

step1 <- step0_panel %>%
  nest(data = c(symbol, ri, b_MKT, b_SMB, b_HML)) %>%
  mutate(
    estimates = map(data,
      ~ tidy(lm(ri ~ b_MKT + b_SMB + b_HML, data = .x)))
  ) %>%
  unnest(estimates) %>%
  select(date, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  select(
    date,
    lambda0 = `(Intercept)`,
    lam_MKT = b_MKT,
    lam_SMB = b_SMB,
    lam_HML = b_HML
  )

head(step1, 8) %>%
  mutate(across(where(is.numeric), ~ round(.x, 6))) %>%
  kable(
    col.names = c("Date", "Lambda 0", "Lambda MKT", "Lambda SMB", "Lambda HML"),
    caption   = "Table 5: Pass 2 - Cross-Sectional Risk Premium Estimates (First 8 Dates)"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 5: Pass 2 - Cross-Sectional Risk Premium Estimates (First 8 Dates)
Date	Lambda 0	Lambda MKT	Lambda SMB	Lambda HML
2011-01-04	-0.029761	0.041629	-0.025520	0.057372
2011-01-05	0.022334	-0.011347	-0.158046	0.062847
2011-01-06	-0.029243	0.037301	0.007029	-0.173234
2011-01-07	-0.017503	0.012722	0.032269	-0.064226
2011-01-10	0.036414	-0.036631	0.017123	0.058646
2011-01-11	0.002744	0.004089	-0.095361	0.089858
2011-01-12	0.072331	-0.055365	-0.164496	0.043036
2011-01-13	0.015027	-0.019357	0.001815	0.025630

6.2 Time-Series of Risk Premia

step1 %>%
  pivot_longer(c(lam_MKT, lam_SMB, lam_HML),
               names_to = "factor", values_to = "lambda") %>%
  mutate(factor = recode(factor,
                         lam_MKT = "Market (MKT)",
                         lam_SMB = "Size (SMB)",
                         lam_HML = "Value (HML)")) %>%
  ggplot(aes(x = date, y = lambda, colour = factor)) +
  geom_line(alpha = 0.45, size = 0.35) +
  geom_smooth(method = "loess", span = 0.3, se = FALSE, size = 1.1) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey30") +
  facet_wrap(~factor, ncol = 1, scales = "free_y") +
  scale_colour_colorblind() +
  scale_y_continuous(labels = percent_format(accuracy = 0.01)) +
  labs(
    title    = "Daily Risk Premium Estimates - Pass 2",
    subtitle = "Raw daily (thin line)  |  LOESS trend (thick line)",
    x = NULL, y = "Risk Premium Estimate"
  ) +
  theme_clean() +
  theme(legend.position = "none")

Figure 5: Daily Lambda Estimates with LOESS Trend

7 Fama-MacBeth Risk Premia Estimates

7.1 t-Tests

test_MKT <- t.test(step1$lam_MKT, mu = 0)
test_SMB <- t.test(step1$lam_SMB, mu = 0)
test_HML <- t.test(step1$lam_HML, mu = 0)

cat("=== Market Risk Premium (MKT) ===\n"); print(test_MKT)

#> === Market Risk Premium (MKT) ===

#> 
#>  One Sample t-test
#> 
#> data:  step1$lam_MKT
#> t = -0.37879, df = 1256, p-value = 0.7049
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.002546371  0.001722208
#> sample estimates:
#>     mean of x 
#> -0.0004120813

cat("\n=== Size Risk Premium (SMB) ===\n");   print(test_SMB)

#> 
#> === Size Risk Premium (SMB) ===

#> 
#>  One Sample t-test
#> 
#> data:  step1$lam_SMB
#> t = 0.97712, df = 1256, p-value = 0.3287
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.003711466  0.011076953
#> sample estimates:
#>   mean of x 
#> 0.003682744

cat("\n=== Value Risk Premium (HML) ===\n");  print(test_HML)

#> 
#> === Value Risk Premium (HML) ===

#> 
#>  One Sample t-test
#> 
#> data:  step1$lam_HML
#> t = -0.18044, df = 1256, p-value = 0.8568
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.005541205  0.004607776
#> sample estimates:
#>     mean of x 
#> -0.0004667146

7.2 Summary Table

fm_summary <- tibble(
  Factor      = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
  Mean_Lambda = c(mean(step1$lam_MKT),
                  mean(step1$lam_SMB),
                  mean(step1$lam_HML)),
  SE_Lambda   = c(sd(step1$lam_MKT) / sqrt(nrow(step1)),
                  sd(step1$lam_SMB) / sqrt(nrow(step1)),
                  sd(step1$lam_HML) / sqrt(nrow(step1))),
  t_stat      = c(test_MKT$statistic,
                  test_SMB$statistic,
                  test_HML$statistic),
  p_value     = c(test_MKT$p.value,
                  test_SMB$p.value,
                  test_HML$p.value),
  CI_lower    = c(test_MKT$conf.int[1],
                  test_SMB$conf.int[1],
                  test_HML$conf.int[1]),
  CI_upper    = c(test_MKT$conf.int[2],
                  test_SMB$conf.int[2],
                  test_HML$conf.int[2]),
  Significant = c(
    ifelse(test_MKT$p.value < 0.05, "Yes ***", "No"),
    ifelse(test_SMB$p.value < 0.05, "Yes ***", "No"),
    ifelse(test_HML$p.value < 0.05, "Yes ***", "No")
  )
)

fm_summary %>%
  mutate(across(where(is.numeric), ~ round(.x, 6))) %>%
  kable(
    col.names = c("Factor",
                  "Mean Lambda", "SE (Lambda)",
                  "t-stat", "p-value",
                  "CI Lower (95%)", "CI Upper (95%)",
                  "Sig. at 5%?"),
    caption   = "Table 6: Fama-MacBeth Risk Premia Estimates - Final Results"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE) %>%
  row_spec(which(fm_summary$p_value < 0.05),  background = "#d4edda") %>%
  row_spec(which(fm_summary$p_value >= 0.05), background = "#fff3cd") %>%
  column_spec(8, bold = TRUE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 6: Fama-MacBeth Risk Premia Estimates - Final Results
Factor	Mean Lambda	SE (Lambda)	t-stat	p-value	CI Lower (95%)	CI Upper (95%)	Sig. at 5%?
Market (MKT)	-0.000412	0.001088	-0.378788	0.704909	-0.002546	0.001722	No
Size (SMB)	0.003683	0.003769	0.977117	0.328699	-0.003711	0.011077	No
Value (HML)	-0.000467	0.002587	-0.180437	0.856839	-0.005541	0.004608	No

8 Visualisation of Results

8.1 Risk Premia with Confidence Intervals

fm_summary %>%
  ggplot(aes(x = Factor, y = Mean_Lambda, colour = Factor)) +
  geom_point(size = 5) +
  geom_errorbar(aes(ymin = CI_lower, ymax = CI_upper),
                width = 0.2, size = 1.2) +
  geom_hline(yintercept = 0, linetype = "dashed",
             colour = "grey40", size = 0.8) +
  scale_colour_colorblind() +
  scale_y_continuous(labels = percent_format(accuracy = 0.0001)) +
  labs(
    title    = "Fama-MacBeth Risk Premium Estimates",
    subtitle = "Point estimates with 95% CI  |  H0: Lambda = 0",
    x = NULL, y = "Average Risk Premium (Mean Lambda)"
  ) +
  theme_clean() +
  theme(legend.position = "none")

Figure 6: Risk Premia Point Estimates with 95% CIs

8.2 Distribution of Daily Lambda Estimates

step1 %>%
  pivot_longer(c(lam_MKT, lam_SMB, lam_HML),
               names_to = "factor", values_to = "lambda") %>%
  mutate(factor = recode(factor,
                         lam_MKT = "Market (MKT)",
                         lam_SMB = "Size (SMB)",
                         lam_HML = "Value (HML)")) %>%
  ggplot(aes(x = lambda, fill = factor)) +
  geom_histogram(bins = 60, colour = "white", alpha = 0.85) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey20") +
  facet_wrap(~factor, scales = "free", ncol = 1) +
  scale_x_continuous(labels = percent_format(accuracy = 0.01)) +
  scale_fill_colorblind() +
  labs(
    title    = "Distribution of Daily Cross-Sectional Lambda Estimates",
    subtitle = "Dashed line = zero",
    x = "Lambda Estimate", y = "Count"
  ) +
  theme_clean() +
  theme(legend.position = "none")

Figure 7: Distribution of Daily Lambda Estimates

9 Newey-West Adjusted Standard Errors

Standard Fama-MacBeth SEs assume no serial correlation in the \(\lambda_t\) series. Newey-West (1987) HAC standard errors relax this assumption, providing more robust inference when lambda series exhibit autocorrelation.

9.1 HAC Standard Errors

nw_se <- function(lambda_vec, lags = 5) {
  df_tmp  <- data.frame(y = lambda_vec, t = seq_along(lambda_vec))
  fit     <- lm(y ~ 1, data = df_tmp)
  nw_test <- coeftest(fit, vcov = NeweyWest(fit, lag = lags, prewhite = FALSE))
  tibble(
    estimate = nw_test[1, "Estimate"],
    nw_se    = nw_test[1, "Std. Error"],
    nw_tstat = nw_test[1, "t value"],
    nw_pval  = nw_test[1, "Pr(>|t|)"]
  )
}

nw_results <- tibble(
  Factor = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
  bind_rows(
    nw_se(step1$lam_MKT),
    nw_se(step1$lam_SMB),
    nw_se(step1$lam_HML)
  )
)

nw_results %>%
  mutate(across(where(is.numeric), ~ round(.x, 6))) %>%
  kable(
    col.names = c("Factor", "Mean Lambda", "NW Std. Error",
                  "NW t-stat", "NW p-value"),
    caption   = "Table 7: Newey-West HAC Standard Errors (5 lags)"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 7: Newey-West HAC Standard Errors (5 lags)
Factor	Mean Lambda	NW Std. Error	NW t-stat	NW p-value
Market (MKT)	-0.000412	0.001018	-0.404790	0.685701
Size (SMB)	0.003683	0.002970	1.240176	0.215142
Value (HML)	-0.000467	0.002226	-0.209703	0.833934

9.2 Robustness: ACF of Lambda Series

par(mfrow = c(3, 1), mar = c(4, 4, 3, 1))
acf(step1$lam_MKT, main = "ACF: Lambda MKT", lag.max = 30, col = "#0072B2")
acf(step1$lam_SMB, main = "ACF: Lambda SMB", lag.max = 30, col = "#E69F00")
acf(step1$lam_HML, main = "ACF: Lambda HML", lag.max = 30, col = "#009E73")

Figure 8: ACF of Daily Lambda Estimates

par(mfrow = c(1, 1))

Significant autocorrelation at short lags indicates Newey-West SEs are preferred over standard Fama-MacBeth SEs.

10 Comparison: Standard FM vs Newey-West SE

comparison <- tibble(
  Factor   = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
  FM_SE    = c(sd(step1$lam_MKT) / sqrt(nrow(step1)),
               sd(step1$lam_SMB) / sqrt(nrow(step1)),
               sd(step1$lam_HML) / sqrt(nrow(step1))),
  FM_tstat = c(test_MKT$statistic, test_SMB$statistic, test_HML$statistic),
  FM_pval  = c(test_MKT$p.value,   test_SMB$p.value,   test_HML$p.value),
  NW_SE    = nw_results$nw_se,
  NW_tstat = nw_results$nw_tstat,
  NW_pval  = nw_results$nw_pval
)

comparison %>%
  mutate(across(where(is.numeric), ~ round(.x, 5))) %>%
  kable(
    col.names = c("Factor",
                  "FM Std. Error", "FM t-stat", "FM p-value",
                  "NW Std. Error", "NW t-stat", "NW p-value"),
    caption   = "Table 8: Standard FM vs Newey-West Standard Errors"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE) %>%
  add_header_above(c(" " = 1,
                     "Standard Fama-MacBeth" = 3,
                     "Newey-West HAC" = 3)) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 8: Standard FM vs Newey-West Standard Errors
	Standard Fama-MacBeth			Newey-West HAC
Factor	FM Std. Error	FM t-stat	FM p-value	NW Std. Error	NW t-stat	NW p-value
Market (MKT)	0.00109	-0.37879	0.70491	0.00102	-0.40479	0.68570
Size (SMB)	0.00377	0.97712	0.32870	0.00297	1.24018	0.21514
Value (HML)	0.00259	-0.18044	0.85684	0.00223	-0.20970	0.83393

comparison %>%
  select(Factor, FM_SE, NW_SE) %>%
  pivot_longer(c(FM_SE, NW_SE),
               names_to = "method", values_to = "se") %>%
  mutate(method = recode(method,
                         FM_SE = "Standard FM",
                         NW_SE = "Newey-West HAC")) %>%
  ggplot(aes(x = Factor, y = se, fill = method)) +
  geom_col(position = "dodge", colour = "white", alpha = 0.9) +
  scale_fill_colorblind() +
  scale_y_continuous(labels = scientific_format()) +
  labs(
    title    = "Standard Error Comparison by Factor",
    subtitle = "Standard FM vs Newey-West HAC (5 lags)",
    x = NULL, y = "Standard Error", fill = "Method"
  ) +
  theme_clean()

Figure 9: SE Comparison - Standard FM vs Newey-West

11 Summary & Interpretation

11.1 Results Overview

tibble(
  Factor     = c("Market (MKT)", "Size (SMB)", "Value (HML)"),
  Mean_Lam   = round(c(mean(step1$lam_MKT),
                       mean(step1$lam_SMB),
                       mean(step1$lam_HML)), 6),
  Sign       = c(
    ifelse(mean(step1$lam_MKT) > 0, "Positive", "Negative"),
    ifelse(mean(step1$lam_SMB) > 0, "Positive", "Negative"),
    ifelse(mean(step1$lam_HML) > 0, "Positive", "Negative")
  ),
  FF3_Pred   = c("Positive (CAPM/FF3)", "Positive (FF3)", "Positive (FF3)"),
  Consistent = c(
    ifelse(mean(step1$lam_MKT) > 0, "Yes", "No"),
    ifelse(mean(step1$lam_SMB) > 0, "Yes", "No"),
    ifelse(mean(step1$lam_HML) > 0, "Yes", "No")
  ),
  FM_Sig  = c(
    ifelse(test_MKT$p.value < 0.05, "Yes (5%)", "No"),
    ifelse(test_SMB$p.value < 0.05, "Yes (5%)", "No"),
    ifelse(test_HML$p.value < 0.05, "Yes (5%)", "No")
  ),
  NW_Sig  = c(
    ifelse(nw_results$nw_pval[1] < 0.05, "Yes (5%)", "No"),
    ifelse(nw_results$nw_pval[2] < 0.05, "Yes (5%)", "No"),
    ifelse(nw_results$nw_pval[3] < 0.05, "Yes (5%)", "No")
  )
) %>%
  kable(
    col.names = c("Factor", "Mean Lambda", "Sign",
                  "FF3 Prediction", "Consistent?",
                  "FM Significant?", "NW Significant?"),
    caption   = "Table 9: Economic Interpretation of Fama-MacBeth Risk Premia"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE) %>%
  row_spec(0, background = "#2c3e50", color = "white")

Table 9: Economic Interpretation of Fama-MacBeth Risk Premia
Factor	Mean Lambda	Sign	FF3 Prediction	Consistent?	FM Significant?	NW Significant?
Market (MKT)	-0.000412	Negative	Positive (CAPM/FF3)	No	No	No
Size (SMB)	0.003683	Positive	Positive (FF3)	Yes	No	No
Value (HML)	-0.000467	Negative	Positive (FF3)	No	No	No

11.2 Theoretical Predictions for Fama-MacBeth Risk Premia

Under the Fama-French three-factor framework:

Market (MKT): \(\lambda_{MKT}\) should be positive — investors require compensation for non-diversifiable market risk.
Size (SMB): \(\lambda_{SMB}\) should be positive — small-cap stocks historically earn higher returns, consistent with a priced size risk factor.
Value (HML): \(\lambda_{HML}\) should be positive — value stocks earn a premium over growth stocks, consistent with distress-risk pricing.

11.3 Discussion

Market Factor (MKT): Statistical significance at daily frequency is typically weak because the daily market premium is small. Annualised estimates should be benchmarked against the historical equity premium (~5–7%).
Size Factor (SMB): This sample is dominated by large/mid-cap names (AAPL, IBM, MSFT, GE). The SMB premium may be attenuated relative to a full Fama-French universe spanning thousands of stocks.
Value Factor (HML): The 2011–2015 period coincided with a growth-stock bull run — particularly in technology — which can suppress or invert the value premium in sub-period analyses.
Sample size limitation: With \(N = 6\) stocks, the Pass 2 cross-sectional regressions have only 2 degrees of freedom. A larger cross-section is required for reliable risk premia estimates.
Standard FM vs Newey-West: When \(\lambda_t\) series exhibit positive autocorrelation, standard FM standard errors are downward-biased. Newey-West SEs provide conservative, more reliable inference in finite samples.

Fama-MacBeth Two-Pass Regression: A Replication Study

Khongorzul Erkhembayar

May 19, 2026