We download the 6 Portfolios Formed on Size and Book-to-Market (2×3) value-weighted monthly returns from Kenneth French’s data library, restrict to January 1930 – December 2018, and split the sample in half.
# ── packages ──────────────────────────────────────────────────────────────────
if (!require(frenchdata)) install.packages("frenchdata")
if (!require(moments)) install.packages("moments")
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(knitr)) install.packages("knitr")
if (!require(kableExtra)) install.packages("kableExtra")
library(frenchdata)
library(moments)
library(tidyverse)
library(knitr)
library(kableExtra)
# ── download ──────────────────────────────────────────────────────────────────
raw <- download_french_data("6 Portfolios Formed on Size and Book-to-Market (2 x 3)")
# Value-weighted monthly returns are in the first sub-table
vw <- raw$subsets$data[[1]]
# Inspect
head(vw)
## # A tibble: 6 × 7
## date `SMALL LoBM` `ME1 BM2` `SMALL HiBM` `BIG LoBM` `ME2 BM2` `BIG HiBM`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 192607 1.09 0.881 -0.128 5.57 1.91 2.01
## 2 192608 0.783 1.47 5.44 2.73 2.70 5.68
## 3 192609 -2.80 -0.0599 -0.440 1.48 0.0954 -0.787
## 4 192610 -4.03 -4.36 -2.01 -3.63 -2.35 -4.00
## 5 192611 3.30 3.62 2.09 3.21 2.93 3.20
## 6 192612 2.56 1.78 3.27 2.90 2.62 2.31
# ── tidy & filter Jan 1930 – Dec 2018 ─────────────────────────────────────────
df <- vw %>%
  mutate(date = as.Date(paste0(date, "01"), format = "%Y%m%d")) %>%
  filter(date >= as.Date("1930-01-01"),
         date <= as.Date("2018-12-31"))
# Portfolio names (short labels, in the same order as the return columns)
port_names <- c("Small/Low", "Small/Mid", "Small/High",
                "Big/Low", "Big/Mid", "Big/High")
# Make sure the return columns are numeric (values are in percent per month)
ret_cols <- setdiff(names(df), "date")
df[ret_cols] <- lapply(df[ret_cols], as.numeric)
cat("Total months:", nrow(df), "\n") # Should be 1068 (Jan 1930 – Dec 2018)
## Total months: 1068
# ── split sample in half ───────────────────────────────────────────────────────
n <- nrow(df)
half <- n %/% 2
df_h1 <- df[1:half, ] # First half
df_h2 <- df[(half+1):n, ] # Second half
cat(sprintf("First half: %s to %s (%d months)\n",
            min(df_h1$date), max(df_h1$date), nrow(df_h1)))
## First half: 1930-01-01 to 1974-06-01 (534 months)
cat(sprintf("Second half: %s to %s (%d months)\n",
            min(df_h2$date), max(df_h2$date), nrow(df_h2)))
## Second half: 1974-07-01 to 2018-12-01 (534 months)
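The month count and the split point can be cross-checked without the downloaded data. The following is a minimal sketch using only base R; the variable names `months_all` and `half_idx` are introduced here for illustration and are not part of the analysis above.

```r
# Build the calendar of month starts for Jan 1930 - Dec 2018
months_all <- seq(as.Date("1930-01-01"), as.Date("2018-12-01"), by = "month")

# 89 calendar years x 12 months
length(months_all)                 # 1068

# Integer midpoint, mirroring `half <- n %/% 2` above
half_idx <- length(months_all) %/% 2

months_all[half_idx]               # "1974-06-01", last month of the first half
months_all[half_idx + 1]           # "1974-07-01", first month of the second half
```

If `nrow(df)` ever differs from `length(months_all)`, the download probably contains missing or duplicated months and the positional split would no longer line up with these dates.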
# ── helper: compute stats for one data frame of returns ──────────────────────
compute_stats <- function(data, cols) {
  sapply(cols, function(col) {
    x <- data[[col]]
    c(Mean     = mean(x, na.rm = TRUE),
      SD       = sd(x, na.rm = TRUE),
      Skewness = skewness(x, na.rm = TRUE),
      # moments::kurtosis() returns Pearson (raw) kurtosis, not excess kurtosis:
      # a normal distribution scores 3; subtract 3 to get excess kurtosis
      Kurtosis = kurtosis(x, na.rm = TRUE))
  }) %>%
    t() %>%
    as.data.frame() %>%
    rownames_to_column("Portfolio")
}
stats_h1 <- compute_stats(df_h1, ret_cols)
stats_h1$Portfolio <- port_names
kable(stats_h1, digits = 4, caption = "First Half — Monthly Return Statistics (%)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
| Portfolio | Mean | SD | Skewness | Kurtosis |
|---|---|---|---|---|
| Small/Low | 0.9713 | 8.2253 | 1.1800 | 12.0716 |
| Small/Mid | 1.1695 | 8.4229 | 1.5797 | 15.7404 |
| Small/High | 1.4844 | 10.2059 | 2.2875 | 20.0760 |
| Big/Low | 0.7648 | 5.7095 | 0.1783 | 9.8941 |
| Big/Mid | 0.8118 | 6.7341 | 1.7116 | 20.5352 |
| Big/High | 1.1874 | 8.9106 | 1.7694 | 17.4682 |
stats_h2 <- compute_stats(df_h2, ret_cols)
stats_h2$Portfolio <- port_names
kable(stats_h2, digits = 4, caption = "Second Half — Monthly Return Statistics (%)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
| Portfolio | Mean | SD | Skewness | Kurtosis |
|---|---|---|---|---|
| Small/Low | 0.9959 | 6.6884 | -0.4086 | 5.1587 |
| Small/Mid | 1.3548 | 5.2817 | -0.5330 | 6.4246 |
| Small/High | 1.4251 | 5.4987 | -0.4644 | 7.3053 |
| Big/Low | 0.9781 | 4.6955 | -0.3337 | 4.9925 |
| Big/Mid | 1.0578 | 4.3391 | -0.4729 | 5.6534 |
| Big/High | 1.1446 | 4.8871 | -0.5172 | 5.8054 |
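When reading the Kurtosis columns, keep in mind that `moments::kurtosis()` reports Pearson's kurtosis, for which a normal distribution scores 3, not excess kurtosis. The sketch below reproduces the same moment ratio in base R on simulated data; `pearson_kurtosis` is a hypothetical helper written for this illustration and is not a function of the `moments` package.

```r
# Hypothetical helper: Pearson kurtosis as the ratio m4 / m2^2 of central moments
pearson_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

set.seed(42)                       # simulated normal data, for illustration only
z <- rnorm(1e5)

k_raw    <- pearson_kurtosis(z)    # close to 3 for a normal sample
k_excess <- k_raw - 3              # close to 0 for a normal sample
```

On this scale, every value in both tables exceeds 3, so all six portfolios have fatter tails than a normal distribution in both halves.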
# Combine for plotting
stats_h1$Half <- "First Half"
stats_h2$Half <- "Second Half"
combined <- bind_rows(stats_h1, stats_h2)
combined %>%
  pivot_longer(cols = c(Mean, SD), names_to = "Statistic", values_to = "Value") %>%
  ggplot(aes(x = Portfolio, y = Value, fill = Half)) +
  geom_col(position = "dodge") +
  facet_wrap(~Statistic, scales = "free_y") +
  theme_minimal(base_size = 12) +
  labs(title = "Mean and SD by Portfolio — Two Halves Comparison",
       x = NULL, y = "Monthly Return (%)") +
  scale_fill_manual(values = c("#2c7bb6", "#d7191c")) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))
Do the split-half statistics for the six portfolios suggest that returns come from the same distribution over the entire period?
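Looking at the two tables above: mean returns are of similar size in both halves (0.76% to 1.48% per month in the first, 0.98% to 1.43% in the second), but the higher moments are not. Standard deviations fall for every portfolio, from a range of 5.7% to 10.2% in the first half to 4.3% to 6.7% in the second. Skewness is positive for all six portfolios in the first half and negative for all six in the second. Kurtosis drops from roughly 10 to 21 in the first half to roughly 5 to 7 in the second.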
Conclusion: The statistics differ enough between the two halves, especially in volatility, skewness, and kurtosis, to cast doubt on the assumption of a single, stationary distribution over the entire 1930–2018 period. This is consistent with well-documented structural breaks in equity market dynamics (e.g., the Depression, post-war economic expansion, 1970s stagflation, and the modern era of monetary policy).
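The comparison above is informal. One way to put numbers on it is a two-sample Kolmogorov-Smirnov test (equality of the whole distribution) together with an F test for equal variances, both from base R's `stats` package. The sketch below is not part of the original analysis: `compare_halves` is a hypothetical helper, and the data are simulated so that the block runs on its own.

```r
# Hypothetical helper: compare two return samples
compare_halves <- function(x1, x2) {
  # Two-sample Kolmogorov-Smirnov test; warnings about ties are suppressed
  ks <- suppressWarnings(ks.test(x1, x2))
  # F test for equal variances (assumes normality, so read it with caution)
  vt <- var.test(x1, x2)
  c(KS_stat = unname(ks$statistic),
    KS_p    = ks$p.value,
    F_ratio = unname(vt$estimate),
    F_p     = vt$p.value)
}

set.seed(1)                              # simulated data, for illustration only
a <- rnorm(534, mean = 1, sd = 8)        # stand-in for a first-half series
b <- rnorm(534, mean = 1, sd = 5)        # stand-in for a second-half series
res <- compare_halves(a, b)

# With the objects built earlier, the same helper could be applied per portfolio:
# sapply(ret_cols, function(col) compare_halves(df_h1[[col]], df_h2[[col]]))
```

Both tests assume independent observations, and the F test additionally assumes normality. Monthly returns are fat-tailed and their volatility clusters, so any p-values obtained this way are better read as rough indications than as exact inference.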