Context: Hennessy manages a $30 million equity portfolio holding approximately 40 stocks (2–3% per issue), with demonstrated skill in identifying roughly 10 superior-performing issues per year. Jones proposes restricting the portfolio to no more than 20 stocks by doubling commitments to favored holdings.
Part (a): Will limiting to 20 stocks increase or decrease portfolio risk?
Concept: Diversification reduces unsystematic (idiosyncratic) risk. As the number of holdings decreases from 40 to 20, the portfolio becomes less diversified, and unsystematic risk necessarily increases. The total portfolio variance is:
\[\sigma_p^2 = \beta_p^2 \sigma_m^2 + \frac{\bar{\sigma}_\varepsilon^2}{n}\]
where \(\bar{\sigma}_\varepsilon^2\) is the average residual variance and \(n\) is the number of stocks. Halving \(n\) from 40 to 20 approximately doubles the idiosyncratic component, all else equal.
Answer: Risk will increase. With fewer stocks, firm-specific shocks are averaged over fewer positions, so the portfolio carries more unsystematic risk. A 40-stock portfolio has already diversified away most firm-specific risk, and the marginal benefit of each holding beyond roughly 20 is small, but halving the count from 40 to 20 still raises unsystematic exposure noticeably.
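As a rough numerical illustration of the variance decomposition above, the sketch below evaluates the idiosyncratic term for 40 and for 20 equally weighted stocks. The portfolio beta, market volatility, and average residual volatility are assumed values chosen only for illustration; none of them is given in the problem.

```r
# Assumed inputs (hypothetical, not taken from the problem)
beta_p    <- 1.0    # portfolio beta
sigma_m   <- 0.20   # market standard deviation
sigma_eps <- 0.30   # average residual standard deviation per stock

# Variance decomposition for n equally weighted stocks with uncorrelated residuals
portfolio_var <- function(n) {
  systematic    <- beta_p^2 * sigma_m^2
  idiosyncratic <- sigma_eps^2 / n
  c(systematic    = systematic,
    idiosyncratic = idiosyncratic,
    total_sd      = sqrt(systematic + idiosyncratic))
}

res <- rbind(n40 = portfolio_var(40), n20 = portfolio_var(20))
print(round(res, 4))
```

Under these assumptions the idiosyncratic variance doubles when the count is halved, while total standard deviation rises only from about 20.6% to about 21.1%, because the systematic term dominates.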
Part (b): Can Hennessy reduce holdings from 40 to 20 without significantly affecting risk?
Answer: Yes, if done carefully. Hennessy could concentrate in the 20 stocks with the lowest pairwise correlations among themselves, thereby maintaining diversification benefits despite the smaller count. If the discarded 20 stocks were highly correlated with those retained, dropping them has minimal risk impact. Additionally, if the retained stocks span diverse sectors and have low inter-stock correlations, the residual variance term in the expression above may not increase materially. The key is selecting stocks whose returns are driven by different underlying fundamentals.
Concept: The marginal diversification benefit of adding the \(n\)-th stock diminishes rapidly. Most unsystematic risk is eliminated with 20–30 randomly selected stocks. However, reducing from 20 to 10 stocks concentrates the portfolio further, increasing idiosyncratic risk substantially.
Answer: Reduction to 10 is less likely to be advantageous because:
Concept: When Wilstead evaluates the Hennessy portfolio as one component of a larger multi-manager fund, the relevant risk measure shifts from standalone volatility to the contribution to total fund risk — specifically, the covariance of Hennessy’s portfolio with the aggregate $280 million fund.
Answer: If the other five managers’ portfolios are only weakly correlated with Hennessy’s holdings, then Hennessy’s portfolio, even concentrated in 10 stocks, may contribute very little marginal risk to the total fund. The committee should consider:
\[\text{Contribution to Fund Risk} = w_H \cdot \text{Cov}(R_H, R_{\text{Fund}}) / \sigma_{\text{Fund}}\]
If \(\text{Cov}(R_H, R_{\text{Fund}})\) is low, concentrating Hennessy’s portion adds idiosyncratic exposure at the sub-portfolio level but minimal total-fund risk. In this case, the restriction to 10 or 20 stocks matters less from a fund-level perspective than from a standalone perspective. The committee might be more permissive about concentration if Hennessy’s alpha generation compensates for the residual idiosyncratic risk, since that risk is partially diversified away by the other managers.
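The fund-level argument can be sketched numerically. The example below treats the other five managers as a single block and compares a more diversified and a more concentrated Hennessy portfolio. The weight of 30/280 follows from the figures in the problem; the volatilities and the correlation are assumptions made purely for illustration.

```r
# Contribution of Hennessy's portfolio (H) to total fund risk.
# sd_o (volatility of the other managers combined) and rho are assumed values.
fund_risk <- function(sd_h, w_h = 30 / 280, sd_o = 0.15, rho = 0.30) {
  w_o        <- 1 - w_h
  cov_ho     <- rho * sd_h * sd_o
  var_fund   <- w_h^2 * sd_h^2 + w_o^2 * sd_o^2 + 2 * w_h * w_o * cov_ho
  sd_fund    <- sqrt(var_fund)
  cov_h_fund <- w_h * sd_h^2 + w_o * cov_ho        # Cov(R_H, R_Fund)
  c(sd_fund = sd_fund, contribution = w_h * cov_h_fund / sd_fund)
}

diversified  <- fund_risk(sd_h = 0.20)   # hypothetical 40-stock volatility
concentrated <- fund_risk(sd_h = 0.25)   # hypothetical concentrated volatility
print(round(rbind(diversified, concentrated), 4))
```

With these inputs a five-point rise in Hennessy's standalone volatility moves total fund volatility by well under one percentage point, which is the sense in which concentration matters less at the fund level.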
Concept: A portfolio lies on the Markowitz efficient frontier if and only if no other portfolio offers a higher expected return for the same risk, or lower risk for the same return. A portfolio is dominated (and thus cannot be on the efficient frontier) if another portfolio achieves a higher return with equal or lower standard deviation.
| Portfolio | E(R) | σ |
|---|---|---|
| W | 15% | 36% |
| X | 12% | 15% |
| Z | 5% | 7% |
| Y | 9% | 21% |
Analysis: Compare Portfolio Y (E(R) = 9%, σ = 21%) against Portfolio X (E(R) = 12%, σ = 15%). Portfolio X offers a higher expected return (12% > 9%) and lower risk (15% < 21%). Therefore, no mean-variance investor would choose Y over X.
\[\text{Sharpe}_X = \frac{12 - r_f}{15} > \frac{9 - r_f}{21} = \text{Sharpe}_Y \quad \text{for reasonable } r_f\]
Answer: Portfolio Y (d) cannot lie on the efficient frontier, as it is strictly dominated by Portfolio X.
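The dominance test can be checked mechanically. The following sketch flags any portfolio in the table for which another portfolio offers at least as much expected return with no more risk, and is strictly better on one of the two; the helper name `is_dominated` is introduced here for illustration.

```r
portfolios <- data.frame(
  name = c("W", "X", "Z", "Y"),
  er   = c(15, 12, 5, 9),     # expected return, %
  sd   = c(36, 15, 7, 21),    # standard deviation, %
  stringsAsFactors = FALSE
)

# TRUE if some other portfolio is at least as good on both dimensions
# and strictly better on at least one
is_dominated <- function(i, p) {
  others <- p[-i, ]
  any(others$er >= p$er[i] & others$sd <= p$sd[i] &
        (others$er > p$er[i] | others$sd < p$sd[i]))
}

portfolios$dominated <- vapply(seq_len(nrow(portfolios)), is_dominated,
                               logical(1), p = portfolios)
print(portfolios)
```

Only Y is flagged. Passing this pairwise check does not prove that W, X, and Z are on the frontier; it only shows that none of the listed portfolios rules them out.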
Data:
| Stock | σ (%) | Correlation with A | Correlation with B | Correlation with C |
|---|---|---|---|---|
| A | 40 | 1.00 | 0.90 | 0.50 |
| B | 20 | 0.90 | 1.00 | 0.10 |
| C | 40 | 0.50 | 0.10 | 1.00 |
Concept: For an equal-weighted two-asset portfolio (\(w_i = 0.5\)), portfolio variance is:
\[\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2 \rho_{12}\sigma_1\sigma_2\]
Portfolio AB (\(\rho_{AB} = 0.90\)):
\[\sigma_{AB}^2 = (0.5)^2(40)^2 + (0.5)^2(20)^2 + 2(0.5)(0.5)(0.90)(40)(20)\] \[= 0.25(1600) + 0.25(400) + 0.5(0.90)(800)\] \[= 400 + 100 + 360 = 860\] \[\sigma_{AB} = \sqrt{860} = 29.33\%\]
Portfolio BC (\(\rho_{BC} = 0.10\)):
\[\sigma_{BC}^2 = (0.5)^2(20)^2 + (0.5)^2(40)^2 + 2(0.5)(0.5)(0.10)(20)(40)\] \[= 0.25(400) + 0.25(1600) + 0.5(0.10)(800)\] \[= 100 + 400 + 40 = 540\] \[\sigma_{BC} = \sqrt{540} = 23.24\%\]
Interpretation: Because we are given only standard deviations and correlations — no expected returns — we cannot evaluate return performance. The recommendation is based purely on risk minimization. Portfolio BC has a standard deviation of 23.24% versus 29.33% for AB, a reduction of approximately 6 percentage points. The low correlation between B and C (ρ = 0.10) produces meaningful diversification, whereas the near-perfect correlation between A and B (ρ = 0.90) means little risk reduction occurs in Portfolio AB.
Recommendation: Portfolio BC. It has the lower standard deviation, and with no expected-return data available the choice rests on risk alone.
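The two hand calculations can be reproduced from the data table with a small matrix computation. This is a minimal sketch; the helper `pair_sd` is a name introduced here, and equal weights are assumed as in the problem.

```r
sigma <- c(A = 40, B = 20, C = 40)                    # standard deviations, %
rho <- matrix(c(1.00, 0.90, 0.50,
                0.90, 1.00, 0.10,
                0.50, 0.10, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(names(sigma), names(sigma)))

# Covariance matrix: D * rho * D, with D = diag(sigma)
cov_mat <- diag(sigma) %*% rho %*% diag(sigma)
dimnames(cov_mat) <- dimnames(rho)

# Standard deviation of a two-stock portfolio with weights w
pair_sd <- function(a, b, w = c(0.5, 0.5)) {
  S <- cov_mat[c(a, b), c(a, b)]
  sqrt(drop(t(w) %*% S %*% w))
}

sd_AB <- pair_sd("A", "B")
sd_BC <- pair_sd("B", "C")
print(round(c(AB = sd_AB, BC = sd_BC), 2))
```

The same function can be called with other weights to see how far the BC advantage depends on the equal-weight assumption.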
Data:
| Statistic | ABC | XYZ |
|---|---|---|
| Alpha | −3.20% | 7.30% |
| Beta | 0.60 | 0.97 |
| R² | 0.35 | 0.17 |
| Residual Std Dev | 13.02% | 21.45% |
Brokerage Estimates (2-year weekly):
| House | Beta of ABC | Beta of XYZ |
|---|---|---|
| A | 0.62 | 1.45 |
| B | 0.71 | 1.25 |
Analysis:
ABC: The five-year alpha is −3.20%, indicating the stock underperformed the CAPM benchmark on a risk-adjusted basis over the estimation period. Beta of 0.60 suggests below-market systematic risk. R² = 0.35 means 35% of return variation is explained by market movements; the residual standard deviation of 13.02% reflects substantial firm-specific risk.
XYZ: Alpha of +7.30% suggests strong risk-adjusted outperformance historically. However, R² = 0.17 is very low — only 17% of variance is market-driven — and the residual standard deviation of 21.45% is high, indicating most of XYZ’s risk is idiosyncratic. In a diversified portfolio, this unsystematic risk is not compensated.
Implications for future risk-return relationships:
Beta instability: Both brokerage estimates for XYZ (1.45 and 1.25) lie well above the five-year estimate of 0.97 and differ noticeably from each other, suggesting that XYZ’s beta is unstable and has risen in the more recent period. ABC’s brokerage estimates (0.62 and 0.71) are close to each other and to the five-year estimate of 0.60, so its beta appears comparatively stable. Beta drift is a known phenomenon; the most recent estimates should receive higher weight.
Alpha persistence: Historical alphas are not reliable predictors of future alphas. ABC’s negative alpha could reflect a temporary value trap; XYZ’s positive alpha may reflect a growth premium already priced in.
In a diversified portfolio: XYZ’s high idiosyncratic risk is diversified away; only its beta (approximately 1.35 as a midpoint of recent estimates) determines its marginal contribution to portfolio risk. ABC, with its lower and more stable beta, contributes less market risk.
Formula: Given the correlation coefficient \(\rho\) between a fund and the market index:
\[R^2 = \rho^2 = \frac{\text{Systematic Variance}}{\text{Total Variance}}\]
\[\text{Nonsystematic Proportion} = 1 - R^2 = 1 - \rho^2\]
Calculation:
\[\rho = 0.70 \implies R^2 = (0.70)^2 = 0.49\]
\[\text{Nonsystematic Proportion} = 1 - 0.49 = 0.51 = 51\%\]
Answer: 51% of Baker Fund’s total risk is nonsystematic (firm-specific).
Interpretation: Just over half of the fund’s total return variation cannot be attributed to market-wide movements. For a fund that is part of a broader diversified portfolio, this unsystematic risk is largely irrelevant to the investor. However, if Baker Fund is held as a standalone investment, this unsystematic exposure significantly increases total risk without additional compensation under CAPM.
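Given: expected return on Charlottesville International of 9%, risk-free rate of 3%, expected return on the world market index of 11%, and a correlation of 1.0 between the fund and the index.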
Formula: From CAPM:
\[E(R_i) = r_f + \beta_i [E(R_m) - r_f]\]
Solving for \(\beta\):
\[\beta = \frac{E(R_i) - r_f}{E(R_m) - r_f} = \frac{9\% - 3\%}{11\% - 3\%} = \frac{6\%}{8\%} = 0.75\]
Answer: The implied beta of Charlottesville International is 0.75.
Interpretation: A beta of 0.75 implies the fund moves 75% as much as the world market index on average. Since the correlation is perfect (ρ = 1.0), all of the fund’s risk is systematic — there is no idiosyncratic variance. The fund’s lower expected return (9% vs. 11% market) is entirely consistent with its below-market systematic risk. An investor seeking full market exposure would need to lever the fund, whereas a risk-averse investor seeking broad international diversification with reduced volatility would find this profile appropriate.
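The rearranged CAPM equation is easy to wrap in a helper. The function name `implied_beta` is introduced here for illustration; the inputs are the figures used in the calculation above.

```r
# Beta implied by CAPM for a given expected return
implied_beta <- function(er_i, rf, er_m) {
  (er_i - rf) / (er_m - rf)
}

beta_ci <- implied_beta(er_i = 0.09, rf = 0.03, er_m = 0.11)
print(beta_ci)

# With a correlation of 1.0, beta also equals the ratio of standard deviations,
# so the fund's volatility is beta times the market's (market sd is assumed here)
sigma_m_assumed <- 0.16
sigma_fund      <- beta_ci * sigma_m_assumed
```

The last two lines rely on \(\beta = \rho\,\sigma_i/\sigma_m\) with \(\rho = 1\); the 16% market volatility is a placeholder, not a figure from the problem.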
Answer: (d) Systematic risk.
Explanation: Beta (\(\beta\)) measures a security’s sensitivity to systematic (market-wide) risk factors — movements that cannot be eliminated through diversification. It is defined as:
\[\beta_i = \frac{\text{Cov}(R_i, R_m)}{\text{Var}(R_m)}\]
Beta is explicitly a measure of co-movement with the market portfolio, capturing only the non-diversifiable component of total risk. Correlation coefficients (a) describe linear association but are not specific to systematic risk. Mean-variance analysis (b) uses total variance. Nonsystematic risk (c) is the component beta does not measure.
Answer: (b) Beta measures only systematic risk, while standard deviation measures total risk.
Explanation: Under the single-index model, total variance decomposes as:
\[\sigma_i^2 = \beta_i^2 \sigma_m^2 + \sigma_{\varepsilon_i}^2\]
Standard deviation (\(\sigma_i\)) captures both the systematic component (\(\beta_i^2 \sigma_m^2\)) and the idiosyncratic component (\(\sigma_{\varepsilon_i}^2\)). Beta measures only the former — sensitivity to market-wide movements. In the context of a well-diversified portfolio, beta is the appropriate risk measure because idiosyncratic variance cancels across holdings. For an undiversified investor, standard deviation is more relevant as it captures total exposure.
Reference data for Problems 8 and 9:
| Portfolio | Avg. Annual Return | Std. Deviation | Beta |
|---|---|---|---|
| R | 11% | 10% | 0.5 |
| S&P 500 | 14% | 12% | 1.0 |
Note: Risk-free rate is not explicitly given. We use the S&P 500 as the market proxy.
Concept: The Security Market Line (SML) plots expected return against beta:
\[E(R) = r_f + \beta \cdot [E(R_m) - r_f]\]
To locate R relative to the SML, we need the risk-free rate. Using the S&P 500 as the market (\(E(R_m) = 14\%\), \(\beta_m = 1.0\)), and treating \(r_f\) as an unknown, the expected return on R according to CAPM is:
\[E(R_R)^{CAPM} = r_f + 0.5(14\% - r_f) = r_f(1 - 0.5) + 7\% = 0.5 r_f + 7\%\]
The actual average return on R is 11%. For R to lie on the SML: \(0.5 r_f + 7\% = 11\%\), implying \(r_f = 8\%\).
If \(r_f < 8\%\) (which is realistic for the sample period), then \(E(R_R)^{CAPM} < 11\%\), meaning R outperformed the CAPM benchmark — it lies above the SML.
Answer: (c) Above the SML.
Interpretation: Portfolio R earned a positive alpha relative to its systematic risk, delivering more return per unit of beta than the market equilibrium would suggest.
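Because the risk-free rate is not given, it can help to tabulate the CAPM benchmark and the resulting alpha over a range of candidate rates. The grid of rates below is an assumption; the other inputs come from the reference table.

```r
er_r   <- 0.11   # average return on R
beta_r <- 0.5
er_m   <- 0.14   # S&P 500 average return

rf_grid     <- seq(0, 0.10, by = 0.02)              # assumed candidate risk-free rates
capm_return <- rf_grid + beta_r * (er_m - rf_grid)  # SML value at beta = 0.5
alpha       <- er_r - capm_return                   # positive means above the SML

sml_check <- data.frame(rf = rf_grid, capm_return = capm_return, alpha = alpha)
print(sml_check)
```

Alpha equals \(4\% - 0.5\,r_f\) in this setup, so it stays positive for every rate below 8% and turns negative above it.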
Concept: The Capital Market Line (CML) plots expected return against total risk (standard deviation):
\[E(R_p) = r_f + \frac{E(R_m) - r_f}{\sigma_m} \cdot \sigma_p\]
The CML slope (Sharpe ratio of the market) using S&P 500:
\[\text{Sharpe}_{S\&P} = \frac{14\% - r_f}{12\%}\]
For Portfolio R: \(\text{Sharpe}_R = \frac{11\% - r_f}{10\%}\)
Comparing Sharpe ratios: \(\frac{11\% - r_f}{10\%}\) vs. \(\frac{14\% - r_f}{12\%}\)
Cross-multiplying: \(12(11\% - r_f)\) vs. \(10(14\% - r_f)\)
\(132\% - 12r_f\) vs. \(140\% - 10r_f\)
R’s Sharpe ratio exceeds the market’s only if \(132\% - 12r_f > 140\% - 10r_f\), that is, \(-8\% > 2r_f \implies r_f < -4\%\).
For any positive risk-free rate, R’s Sharpe ratio is below that of the market. Therefore Portfolio R lies below the CML.
Answer: (b) Below the CML.
Interpretation: While R generated positive alpha on a beta-adjusted basis (above the SML), it does not compensate investors adequately for its total risk. This apparent contradiction occurs because R has a low beta (0.5) but its total risk (10% σ) is not proportionally low relative to the market (12% σ) — implying significant unsystematic risk that dilutes its total risk-adjusted performance.
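The Sharpe-ratio comparison can be verified over a range of risk-free rates in the same way. The grid is an assumption; returns and standard deviations come from the reference table.

```r
sharpe <- function(er, rf, sd) (er - rf) / sd

rf_grid  <- seq(0, 0.08, by = 0.02)        # assumed candidate risk-free rates
sharpe_r <- sharpe(0.11, rf_grid, 0.10)    # Portfolio R
sharpe_m <- sharpe(0.14, rf_grid, 0.12)    # S&P 500 (slope of the CML)

cml_check <- data.frame(rf = rf_grid,
                        sharpe_r = sharpe_r,
                        sharpe_m = sharpe_m,
                        r_below_cml = sharpe_r < sharpe_m)
print(cml_check)

# Break-even rate from the algebra above: the two ratios coincide at rf = -4%
breakeven_gap <- sharpe(0.11, -0.04, 0.10) - sharpe(0.14, -0.04, 0.12)
```

Every non-negative rate on the grid leaves R below the CML, and the gap closes only at the negative break-even rate derived above.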
| | Portfolio A | Portfolio B |
|---|---|---|
| Systematic risk (beta) | 1.0 | 1.0 |
| Specific (idiosyncratic) risk | High | Low |
Answer: Under CAPM, investors should not expect a higher return on Portfolio A than on Portfolio B.
CAPM states that in equilibrium, expected returns are determined solely by systematic risk (beta). Both portfolios have identical betas of 1.0, so their CAPM-implied expected returns are identical:
\[E(R_A) = E(R_B) = r_f + 1.0 \cdot [E(R_m) - r_f] = E(R_m)\]
Portfolio A’s higher specific risk is idiosyncratic and — crucially — diversifiable. In a well-diversified portfolio, this unsystematic variance cancels out. Since rational investors hold diversified portfolios, they bear no cost and receive no compensation for holding idiosyncratic risk. Therefore the market will not price this additional specific risk, and both portfolios carry the same required return despite Portfolio A’s higher total variance.
Context for Problems 13–16: McCracken uses a two-factor APT model where factors are (1) changes in real GDP (risk premium = 8%) and (2) changes in inflation (risk premium = 2%). Risk-free rate = 4%.
Fund sensitivities:
| Fund | GDP Sensitivity | Inflation Sensitivity |
|---|---|---|
| High Growth | 1.25 | 1.5 |
| Large Cap | 0.75 | 1.25 |
| Utility | 1.0 | 2.0 |
Formula:
\[E(R) = r_f + \beta_{GDP} \cdot \lambda_{GDP} + \beta_{Inf} \cdot \lambda_{Inf}\]
Calculation:
\[E(R_{HG}) = 4\% + (1.25)(8\%) + (1.5)(2\%)\] \[= 4\% + 10\% + 3\% = 17\%\]
Answer: The APT expected return for the High Growth Fund is 17%.
Interpretation: The fund’s above-average sensitivity to GDP growth (β = 1.25) drives the bulk of its risk premium (10 percentage points). Its moderate inflation sensitivity adds another 3 percentage points above the risk-free rate. McCracken’s fundamental analysis confirms this figure, validating the two-factor specification for this asset.
APT expected return:
\[E(R_{LC})^{APT} = 4\% + (0.75)(8\%) + (1.25)(2\%)\] \[= 4\% + 6\% + 2.5\% = 12.5\%\]
Kwon’s fundamental estimate: \(r_f + 8.5\% = 4\% + 8.5\% = 12.5\%\)
APT expected return = 12.5% and fundamental expected return = 12.5%.
Answer: No arbitrage opportunity is available.
Interpretation: The Large Cap Fund is fairly priced relative to the two-factor APT model. Kwon’s fundamental estimate of 8.5% above the risk-free rate (\(E(R) = 4\% + 8.5\% = 12.5\%\)) coincides exactly with the model’s prediction. When the model’s expected return equals the fundamental estimate, no riskless profit opportunity exists.
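The two-factor formula can be applied to all three funds at once with a matrix product. This sketch uses the sensitivities and premia from the tables above; the Utility figure is simply what the formula yields and is not discussed in the problems.

```r
rf     <- 0.04
lambda <- c(gdp = 0.08, inflation = 0.02)            # factor risk premia

# Rows are funds, columns are factor sensitivities
B <- rbind(HighGrowth = c(1.25, 1.50),
           LargeCap   = c(0.75, 1.25),
           Utility    = c(1.00, 2.00))

apt_er <- rf + drop(B %*% lambda)                    # APT expected returns
print(round(apt_er, 4))

# Kwon's fundamental estimate for Large Cap: 8.5% above the risk-free rate
fundamental_lc <- rf + 0.085
mispricing_lc  <- fundamental_lc - apt_er[["LargeCap"]]
```

A `mispricing_lc` of zero (up to floating-point error) corresponds to the no-arbitrage conclusion for the Large Cap Fund.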
Objective: Construct a portfolio of High Growth (HG), Large Cap (LC), and Utility (U) with unit exposure to GDP and zero exposure to inflation.
Let \(w_{HG}\), \(w_{LC}\), \(w_U\) be the weights.
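Constraints:
1. GDP exposure of one: \(1.25 w_{HG} + 0.75 w_{LC} + 1.0 w_U = 1\)
2. Inflation exposure of zero: \(1.5 w_{HG} + 1.25 w_{LC} + 2.0 w_U = 0\)
3. Full investment: \(w_{HG} + w_{LC} + w_U = 1\)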
Solving the system:
From constraint 3: \(w_U = 1 - w_{HG} - w_{LC}\)
Substituting into constraint 1:
\[1.25 w_{HG} + 0.75 w_{LC} + 1.0(1 - w_{HG} - w_{LC}) = 1\] \[0.25 w_{HG} - 0.25 w_{LC} = 0 \implies w_{HG} = w_{LC}\]
Substituting \(w_{HG} = w_{LC} = w\) into constraint 2:
\[1.5w + 1.25w + 2.0(1 - 2w) = 0\] \[2.75w + 2.0 - 4.0w = 0\] \[-1.25w = -2.0 \implies w = 1.6\]
So \(w_{HG} = w_{LC} = 1.6\) and \(w_U = 1 - 1.6 - 1.6 = -2.2\).
Answer: The weight in the Utility Fund is (a) −2.2.
Interpretation: The GDP Fund requires a short position of 2.2 in the Utility Fund to cancel out its high inflation sensitivity (β = 2.0). The leverage involved (long 1.6× each in HG and LC, short 2.2× in Utility) creates a pure GDP-factor exposure, useful for retirees whose income needs track real economic growth but who are harmed by unexpected inflation.
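The substitution steps above amount to solving a 3×3 linear system, which can be cross-checked with base R's `solve()`. The row labels are names introduced here for readability.

```r
# Rows: GDP exposure, inflation exposure, full-investment constraint
# Columns: High Growth, Large Cap, Utility
A <- rbind(gdp       = c(1.25, 0.75, 1.0),
           inflation = c(1.50, 1.25, 2.0),
           budget    = c(1.00, 1.00, 1.0))
b <- c(1, 0, 1)        # unit GDP exposure, zero inflation exposure, weights sum to 1

w <- solve(A, b)
names(w) <- c("HG", "LC", "U")
print(round(w, 4))

# Verify the resulting factor exposures
exposures <- drop(A %*% w)
```

The `exposures` vector should return the targets in `b`, confirming that the levered long positions and the Utility short offset each other as intended.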
Answer: (b) Both are correct.
Stiles argues the GDP Fund is suitable for retirees living off investment income, since its pure real-GDP exposure provides a steady income stream correlated with economic activity while being immune to inflation surprises that erode purchasing power.
McCracken argues it is appropriate if supply-side government policies succeed in boosting real GDP growth. In that scenario, the GDP factor premium would be elevated, generating superior risk-adjusted returns for the fund’s unitholders.
Both perspectives are internally consistent. Stiles focuses on the structural income-matching properties; McCracken focuses on the tactical macro environment. The fund serves both purposes simultaneously, making both analysts correct.
The analysis uses adjusted closing prices to account for dividends and stock splits. Raw closing prices would understate returns in periods following significant dividend distributions, particularly for income-generating ETFs such as TLT and IYR. Adjusted prices ensure that computed returns reflect the true economic return to a buy-and-hold investor.
# Load required libraries
library(tidyquant)
library(tidyverse)
library(timetk)
library(lubridate)
library(purrr)
library(PerformanceAnalytics)
library(xts)
library(zoo)
library(knitr)
library(kableExtra)
library(ggplot2)
library(scales)
library(frenchdata)  # For Fama-French factor data
# Define ETF tickers
tickers <- c("SPY", "QQQ", "EEM", "IWM", "EFA", "TLT", "IYR", "GLD")
# Download adjusted daily prices from Yahoo Finance
etf_prices_raw <- tq_get(
  tickers,
  from = "2010-01-01",
  to   = Sys.Date(),
  get  = "stock.prices"
)
# Extract adjusted closing prices and pivot to wide format
etf_prices <- etf_prices_raw %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  arrange(date)
# Convert to xts for time series operations
etf_xts <- xts(etf_prices[, -1], order.by = etf_prices$date)
# Display first and last observations
cat("=== First 6 Observations ===\n")
## === First 6 Observations ===
## SPY QQQ EEM IWM EFA TLT IYR
## 2010-01-04 84.79639 40.29078 30.35150 51.36657 35.12844 55.70952 26.76811
## 2010-01-05 85.02083 40.29078 30.57180 51.18995 35.15940 56.06930 26.83238
## 2010-01-06 85.08068 40.04776 30.63577 51.14177 35.30802 55.31872 26.82070
## 2010-01-07 85.43987 40.07379 30.45811 51.51910 35.17178 55.41177 27.06026
## 2010-01-08 85.72417 40.40363 30.69972 51.80010 35.45044 55.38696 26.87914
## 2010-01-11 85.84389 40.23871 30.63577 51.59137 35.74147 55.08302 27.00769
## GLD
## 2010-01-04 109.80
## 2010-01-05 109.70
## 2010-01-06 111.51
## 2010-01-07 110.82
## 2010-01-08 111.37
## 2010-01-11 112.85
##
## === Last 6 Observations ===
## SPY QQQ EEM IWM EFA TLT IYR GLD
## 2026-06-02 759.57 746.16 70.80 291.66 105.02 85.65 99.99 411.95
## 2026-06-03 754.24 744.21 69.92 287.67 104.12 85.31 100.00 407.87
## 2026-06-04 757.09 740.61 69.10 292.01 104.95 85.50 101.79 411.27
## 2026-06-05 737.55 705.06 64.59 281.65 102.26 85.06 102.54 396.24
## 2026-06-08 739.22 716.07 65.75 284.11 102.88 84.62 101.08 397.27
## 2026-06-09 NA NA NA NA NA NA NA NA
##
## === Summary Statistics (Daily Adjusted Prices) ===
## Index SPY QQQ EEM
## Min. :2010-01-04 Min. : 77.15 Min. : 36.99 Min. :22.63
## 1st Qu.:2014-02-11 1st Qu.:149.36 1st Qu.: 79.12 1st Qu.:31.00
## Median :2018-03-20 Median :234.81 Median :153.80 Median :34.75
## Mean :2018-03-19 Mean :279.27 Mean :212.13 Mean :36.17
## 3rd Qu.:2022-04-26 3rd Qu.:393.90 3rd Qu.:320.82 3rd Qu.:39.24
## Max. :2026-06-09 Max. :759.57 Max. :746.16 Max. :70.80
## NA's :1 NA's :1 NA's :1
## IWM EFA TLT IYR
## Min. : 47.11 Min. : 28.66 Min. : 54.83 Min. : 24.80
## 1st Qu.: 93.70 1st Qu.: 42.71 1st Qu.: 81.89 1st Qu.: 46.76
## Median :132.85 Median : 51.44 Median : 89.66 Median : 63.01
## Mean :135.88 Mean : 54.17 Mean : 92.02 Mean : 64.51
## 3rd Qu.:181.21 3rd Qu.: 64.71 3rd Qu.: 98.74 3rd Qu.: 81.37
## Max. :292.03 Max. :105.66 Max. :143.23 Max. :104.07
## NA's :1 NA's :1 NA's :1 NA's :1
## GLD
## Min. :100.5
## 1st Qu.:121.3
## Median :147.6
## Mean :164.6
## 3rd Qu.:174.1
## Max. :495.9
## NA's :1
Why adjusted prices? Dividends and corporate actions create artificial price discontinuities. For example, when SPY distributes a quarterly dividend, the price drops mechanically on the ex-dividend date. Using raw prices would record a negative return on that day despite the investor actually earning positive income. Adjusted prices back-distribute these cash flows proportionally, yielding internally consistent return calculations throughout the sample.
Simple returns are used rather than log returns to allow direct aggregation across assets into portfolio returns, which is not possible with continuously compounded returns. The simple return formula is:
\[R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1\]
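The aggregation property that motivates simple returns can be seen in a two-asset toy example. The prices and weights below are made up for illustration.

```r
# Hypothetical start and end prices for two assets
p0 <- c(A = 100, B = 50)
p1 <- c(A = 110, B = 45)
w  <- c(A = 0.6, B = 0.4)          # portfolio weights at the start of the period

simple_ret <- p1 / p0 - 1          # simple returns per asset
log_ret    <- log(p1 / p0)         # continuously compounded returns per asset

# Portfolio simple return is the weighted sum of asset simple returns
port_simple <- sum(w * simple_ret)

# The same weighted sum of log returns does NOT equal the portfolio's log return
port_log_true  <- log(1 + port_simple)
port_log_naive <- sum(w * log_ret)

print(c(port_simple = port_simple,
        port_log_true = port_log_true,
        port_log_naive = port_log_naive))
```

The gap between `port_log_true` and `port_log_naive` is the cross-sectional aggregation error that the choice of simple returns avoids.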
# Calculate weekly returns using period-end prices (Friday close)
etf_weekly_returns <- etf_prices_raw %>%
  group_by(symbol) %>%
  tq_transmute(
    select     = adjusted,
    mutate_fun = periodReturn,
    period     = "weekly",
    type       = "arithmetic",
    col_rename = "weekly_return"
  ) %>%
  pivot_wider(names_from = symbol, values_from = weekly_return) %>%
  arrange(date)
cat("=== Weekly Return Summary Statistics ===\n")
## === Weekly Return Summary Statistics ===
etf_weekly_xts <- xts(etf_weekly_returns[, -1],
                      order.by = etf_weekly_returns$date)
summary(etf_weekly_xts) %>% print()
## Index SPY QQQ
## Min. :2010-01-08 Min. :-0.145457 Min. :-0.112509
## 1st Qu.:2014-02-15 1st Qu.:-0.008245 1st Qu.:-0.011487
## Median :2018-03-26 Median : 0.003828 Median : 0.004731
## Mean :2018-03-26 Mean : 0.002779 Mean : 0.003711
## 3rd Qu.:2022-05-04 3rd Qu.: 0.014952 3rd Qu.: 0.019393
## Max. :2026-06-08 Max. : 0.120915 Max. : 0.095381
## EEM IWM EFA
## Min. :-0.132263 Min. :-0.172645 Min. :-0.143158
## 1st Qu.:-0.015685 1st Qu.:-0.013343 1st Qu.:-0.010213
## Median : 0.002423 Median : 0.004044 Median : 0.002453
## Mean : 0.001280 Mean : 0.002447 Mean : 0.001544
## 3rd Qu.: 0.018917 3rd Qu.: 0.018787 3rd Qu.: 0.015682
## Max. : 0.101662 Max. : 0.182565 Max. : 0.123486
## TLT IYR GLD
## Min. :-0.0769327 Min. :-0.249291 Min. :-0.102986
## 1st Qu.:-0.0107109 1st Qu.:-0.011839 1st Qu.:-0.010754
## Median : 0.0014044 Median : 0.003299 Median : 0.002173
## Mean : 0.0006761 Mean : 0.001948 Mean : 0.001736
## 3rd Qu.: 0.0126531 3rd Qu.: 0.016406 3rd Qu.: 0.014597
## Max. : 0.0765322 Max. : 0.227895 Max. : 0.087137
# Calculate monthly returns (end-of-month)
etf_monthly_returns <- etf_prices_raw %>%
  group_by(symbol) %>%
  tq_transmute(
    select     = adjusted,
    mutate_fun = periodReturn,
    period     = "monthly",
    type       = "arithmetic",
    col_rename = "monthly_return"
  ) %>%
  pivot_wider(names_from = symbol, values_from = monthly_return) %>%
  arrange(date)
cat("=== Monthly Return Summary Statistics ===\n")
## === Monthly Return Summary Statistics ===
etf_monthly_xts <- xts(etf_monthly_returns[, -1],
                       order.by = etf_monthly_returns$date)
summary(etf_monthly_xts) %>% print()
## Index SPY QQQ EEM
## Min. :2010-01-29 Min. :-0.12487 Min. :-0.13596 Min. :-0.178947
## 1st Qu.:2014-03-07 1st Qu.:-0.01318 1st Qu.:-0.01593 1st Qu.:-0.029245
## Median :2018-04-14 Median : 0.01737 Median : 0.01947 Median : 0.006384
## Mean :2018-04-15 Mean : 0.01187 Mean : 0.01593 Mean : 0.005317
## 3rd Qu.:2022-05-23 3rd Qu.: 0.03699 3rd Qu.: 0.04943 3rd Qu.: 0.035635
## Max. :2026-06-08 Max. : 0.12698 Max. : 0.15690 Max. : 0.162678
## IWM EFA TLT
## Min. :-0.21477 Min. :-0.141067 Min. :-0.0942389
## 1st Qu.:-0.02285 1st Qu.:-0.022131 1st Qu.:-0.0236010
## Median : 0.01522 Median : 0.010298 Median :-0.0001612
## Mean : 0.01027 Mean : 0.006459 Mean : 0.0028628
## 3rd Qu.: 0.04615 3rd Qu.: 0.033864 3rd Qu.: 0.0248758
## Max. : 0.18244 Max. : 0.142694 Max. : 0.1320613
## IYR GLD
## Min. :-0.196324 Min. :-0.110623
## 1st Qu.:-0.023406 1st Qu.:-0.022049
## Median : 0.009634 Median : 0.003475
## Mean : 0.007901 Mean : 0.007600
## 3rd Qu.: 0.038966 3rd Qu.: 0.036328
## Max. : 0.131896 Max. : 0.122749
##
## === Annualized Return and Volatility ===
ann_stats <- data.frame(
  ETF            = tickers,
  Ann_Return     = round(apply(etf_monthly_xts, 2, mean, na.rm = TRUE) * 12, 4),
  Ann_Volatility = round(apply(etf_monthly_xts, 2, sd, na.rm = TRUE) * sqrt(12), 4)
)
# row.names = FALSE suppresses the ticker row names, which would otherwise
# duplicate the ETF column in the printed table
kable(ann_stats, caption = "Annualized Return and Volatility (Monthly Simple Returns)",
      col.names = c("ETF", "Ann. Return", "Ann. Volatility"), row.names = FALSE)
| ETF | Ann. Return | Ann. Volatility |
|---|---|---|
| SPY | 0.1424 | 0.1449 |
| QQQ | 0.1911 | 0.1769 |
| EEM | 0.0638 | 0.1840 |
| IWM | 0.1233 | 0.1960 |
| EFA | 0.0775 | 0.1563 |
| TLT | 0.0344 | 0.1353 |
| IYR | 0.0948 | 0.1672 |
| GLD | 0.0912 | 0.1624 |
Interpretation: SPY and QQQ exhibit the highest annualized returns, reflecting the strong U.S. equity bull market from 2010–2026. TLT and GLD offer lower returns with distinct risk profiles — TLT providing interest rate sensitivity and GLD offering inflation/safe-haven characteristics. EEM and IWM display elevated volatility reflecting emerging market uncertainty and small-cap risk premia, respectively.
# Method 1: Using tk_tbl from timetk
monthly_tbl <- tk_tbl(etf_monthly_xts, rename_index = "date") %>%
  mutate(date = as.Date(date))
cat("=== Tibble Structure ===\n")
## === Tibble Structure ===
## Rows: 198
## Columns: 9
## $ date <date> 2010-01-29, 2010-02-26, 2010-03-31, 2010-04-30, 2010-05-28, 2010…
## $ SPY <dbl> -0.0524134699, 0.0311945048, 0.0608797594, 0.0154699814, -0.07945…
## $ QQQ <dbl> -0.078198558, 0.046038574, 0.077108982, 0.022425913, -0.073924154…
## $ EEM <dbl> -0.1037227155, 0.0177638468, 0.0811092779, -0.0016620668, -0.0939…
## $ IWM <dbl> -0.060487672, 0.044751360, 0.082306456, 0.056784703, -0.075366175…
## $ EFA <dbl> -0.074916356, 0.002667738, 0.063853845, -0.028045451, -0.11192821…
## $ TLT <dbl> 0.0278356161, -0.0034235540, -0.0205728497, 0.0332181080, 0.05108…
## $ IYR <dbl> -0.051953952, 0.054570722, 0.097484722, 0.063881356, -0.056835890…
## $ GLD <dbl> -0.034972713, 0.032748219, -0.004386396, 0.058834363, 0.030513147…
##
## === First 6 Rows ===
head(monthly_tbl) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Monthly Simple Returns (Tibble Format)")
| date | SPY | QQQ | EEM | IWM | EFA | TLT | IYR | GLD |
|---|---|---|---|---|---|---|---|---|
| 2010-01-29 | -0.0524 | -0.0782 | -0.1037 | -0.0605 | -0.0749 | 0.0278 | -0.0520 | -0.0350 |
| 2010-02-26 | 0.0312 | 0.0460 | 0.0178 | 0.0448 | 0.0027 | -0.0034 | 0.0546 | 0.0327 |
| 2010-03-31 | 0.0609 | 0.0771 | 0.0811 | 0.0823 | 0.0639 | -0.0206 | 0.0975 | -0.0044 |
| 2010-04-30 | 0.0155 | 0.0224 | -0.0017 | 0.0568 | -0.0280 | 0.0332 | 0.0639 | 0.0588 |
| 2010-05-28 | -0.0795 | -0.0739 | -0.0939 | -0.0754 | -0.1119 | 0.0511 | -0.0568 | 0.0305 |
| 2010-06-30 | -0.0517 | -0.0598 | -0.0140 | -0.0774 | -0.0206 | 0.0580 | -0.0467 | 0.0236 |
The tibble format provides a clean, tidy data structure compatible with dplyr and ggplot2 workflows. Each row corresponds to one month-end date, with columns representing the simple return for each ETF. This format facilitates subsequent merging with Fama-French factor data.
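The Fama-French three-factor model extends CAPM with two additional systematic risk factors: SMB (small minus big), the return spread between small-cap and large-cap stocks, and HML (high minus low), the return spread between high and low book-to-market stocks.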
# Download Fama-French 3 Factor monthly data
ff_data <- download_french_data("Fama/French 3 Factors")
ff_monthly <- ff_data$subsets$data[[1]] %>%
  as_tibble() %>%
  mutate(
    date = as.Date(paste0(date, "01"), format = "%Y%m%d"),
    # Convert from percentage to decimal
    `Mkt-RF` = `Mkt-RF` / 100,
    SMB      = SMB / 100,
    HML      = HML / 100,
    RF       = RF / 100
  ) %>%
  filter(date >= as.Date("2010-01-01"))
cat("=== Fama-French 3 Factors — Summary Statistics ===\n")
## === Fama-French 3 Factors — Summary Statistics ===
## Mkt-RF SMB HML RF
## Min. :-0.13370 Min. :-0.0593000 Min. :-0.138300 Min. :0.000000
## 1st Qu.:-0.01333 1st Qu.:-0.0193500 1st Qu.:-0.018625 1st Qu.:0.000000
## Median : 0.01390 Median : 0.0008500 Median :-0.003750 Median :0.000100
## Mean : 0.01086 Mean :-0.0007413 Mean :-0.000498 Mean :0.001136
## 3rd Qu.: 0.03458 3rd Qu.: 0.0133500 3rd Qu.: 0.016300 3rd Qu.:0.001900
## Max. : 0.13600 Max. : 0.0714000 Max. : 0.128600 Max. :0.004800
##
## === First 6 Rows ===
head(ff_monthly) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Fama-French 3 Factors (Decimal Format)")
| date | Mkt-RF | SMB | HML | RF |
|---|---|---|---|---|
| 2010-01-01 | -0.0335 | 0.0043 | 0.0033 | 0e+00 |
| 2010-02-01 | 0.0339 | 0.0118 | 0.0318 | 0e+00 |
| 2010-03-01 | 0.0630 | 0.0146 | 0.0219 | 1e-04 |
| 2010-04-01 | 0.0199 | 0.0484 | 0.0296 | 1e-04 |
| 2010-05-01 | -0.0790 | 0.0013 | -0.0248 | 1e-04 |
| 2010-06-01 | -0.0556 | -0.0179 | -0.0473 | 1e-04 |
Factor interpretation: Over the sample period, the market factor (MKT-RF) captures the dominant source of systematic return variation across all equity and risk assets. SMB tends to be positive in risk-on environments and negative during flight-to-quality episodes. HML, the value factor, has experienced extended drawdowns during the post-GFC growth era (2010–2020) but recovered during the value rotation of 2021–2022.
# Align dates: FF data uses first-of-month; ETF monthly_tbl uses last-of-month
# Create a common year-month key for merging
ff_merge <- ff_monthly %>%
  mutate(ym = format(date, "%Y-%m")) %>%
  select(ym, `Mkt-RF`, SMB, HML, RF)
etf_merge <- monthly_tbl %>%
  mutate(ym = format(date, "%Y-%m")) %>%
  select(ym, date, everything())
# Merge on year-month
merged_tbl <- left_join(etf_merge, ff_merge, by = "ym") %>%
  select(-ym) %>%
  arrange(date) %>%
  drop_na()
cat("=== Merged Dataset Structure ===\n")
## === Merged Dataset Structure ===
## Rows: 196
## Columns: 13
## $ date <date> 2010-01-29, 2010-02-26, 2010-03-31, 2010-04-30, 2010-05-28, …
## $ SPY <dbl> -0.0524134699, 0.0311945048, 0.0608797594, 0.0154699814, -0.0…
## $ QQQ <dbl> -0.078198558, 0.046038574, 0.077108982, 0.022425913, -0.07392…
## $ EEM <dbl> -0.1037227155, 0.0177638468, 0.0811092779, -0.0016620668, -0.…
## $ IWM <dbl> -0.060487672, 0.044751360, 0.082306456, 0.056784703, -0.07536…
## $ EFA <dbl> -0.074916356, 0.002667738, 0.063853845, -0.028045451, -0.1119…
## $ TLT <dbl> 0.0278356161, -0.0034235540, -0.0205728497, 0.0332181080, 0.0…
## $ IYR <dbl> -0.051953952, 0.054570722, 0.097484722, 0.063881356, -0.05683…
## $ GLD <dbl> -0.034972713, 0.032748219, -0.004386396, 0.058834363, 0.03051…
## $ `Mkt-RF` <dbl> -0.0335, 0.0339, 0.0630, 0.0199, -0.0790, -0.0556, 0.0692, -0…
## $ SMB <dbl> 0.0043, 0.0118, 0.0146, 0.0484, 0.0013, -0.0179, 0.0022, -0.0…
## $ HML <dbl> 0.0033, 0.0318, 0.0219, 0.0296, -0.0248, -0.0473, -0.0050, -0…
## $ RF <dbl> 0e+00, 0e+00, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04…
##
## === Missing Value Check ===
## date SPY QQQ EEM IWM EFA TLT IYR GLD Mkt-RF SMB
## 0 0 0 0 0 0 0 0 0 0 0
## HML RF
## 0 0
##
## === First 6 Rows of Merged Data ===
head(merged_tbl) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Merged ETF Returns and Fama-French Factors")
| date | SPY | QQQ | EEM | IWM | EFA | TLT | IYR | GLD | Mkt-RF | SMB | HML | RF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-29 | -0.0524 | -0.0782 | -0.1037 | -0.0605 | -0.0749 | 0.0278 | -0.0520 | -0.0350 | -0.0335 | 0.0043 | 0.0033 | 0e+00 |
| 2010-02-26 | 0.0312 | 0.0460 | 0.0178 | 0.0448 | 0.0027 | -0.0034 | 0.0546 | 0.0327 | 0.0339 | 0.0118 | 0.0318 | 0e+00 |
| 2010-03-31 | 0.0609 | 0.0771 | 0.0811 | 0.0823 | 0.0639 | -0.0206 | 0.0975 | -0.0044 | 0.0630 | 0.0146 | 0.0219 | 1e-04 |
| 2010-04-30 | 0.0155 | 0.0224 | -0.0017 | 0.0568 | -0.0280 | 0.0332 | 0.0639 | 0.0588 | 0.0199 | 0.0484 | 0.0296 | 1e-04 |
| 2010-05-28 | -0.0795 | -0.0739 | -0.0939 | -0.0754 | -0.1119 | 0.0511 | -0.0568 | 0.0305 | -0.0790 | 0.0013 | -0.0248 | 1e-04 |
| 2010-06-30 | -0.0517 | -0.0598 | -0.0140 | -0.0774 | -0.0206 | 0.0580 | -0.0467 | 0.0236 | -0.0556 | -0.0179 | -0.0473 | 1e-04 |
##
## === Total Observations: 196 months ===
The CAPM single-index model estimates each asset’s return as:
\[R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t}) + \varepsilon_{i,t}\]
The CAPM-implied covariance matrix is:
\[\Sigma^{CAPM} = \boldsymbol{\beta}\boldsymbol{\beta}'\sigma_m^2 + D_\varepsilon\]
where \(\boldsymbol{\beta}\) is the vector of beta estimates, \(\sigma_m^2\) is the market variance, and \(D_\varepsilon = \text{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)\) is the diagonal matrix of residual variances.
The Global Minimum Variance (GMV) portfolio minimizes:
\[\min_{\mathbf{w}} \mathbf{w}'\Sigma\mathbf{w} \quad \text{subject to} \quad \mathbf{w}'\mathbf{1} = 1\]
Analytical solution:
\[\mathbf{w}^* = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}\]
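For readers who want the intermediate step, the closed form follows from a standard Lagrangian argument. The sketch below assumes \(\Sigma\) is symmetric and positive definite, so that \(\Sigma^{-1}\) exists; the multiplier \(\lambda\) is notation introduced here for illustration.
\[\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}'\Sigma\mathbf{w} - \lambda(\mathbf{w}'\mathbf{1} - 1)\]
\[\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2\Sigma\mathbf{w} - \lambda\mathbf{1} = \mathbf{0} \quad \Rightarrow \quad \mathbf{w} = \frac{\lambda}{2}\Sigma^{-1}\mathbf{1}\]
\[\mathbf{w}'\mathbf{1} = 1 \quad \Rightarrow \quad \frac{\lambda}{2} = \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}\]
Substituting \(\lambda/2\) back into the first-order condition gives the analytical solution above, and the minimized variance is \(\mathbf{w}^{*\prime}\Sigma\mathbf{w}^* = 1/(\mathbf{1}'\Sigma^{-1}\mathbf{1})\).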
# Training window: 2010/02 – 2015/01
train_data <- merged_tbl %>%
filter(date >= as.Date("2010-02-01") & date <= as.Date("2015-01-31"))
n_assets <- length(tickers)
excess_returns <- as.matrix(train_data[, tickers]) - train_data$RF
mkt_excess <- train_data$`Mkt-RF`
# Estimate CAPM betas and residual variances
capm_betas <- numeric(n_assets)
capm_alphas <- numeric(n_assets)
capm_resid_var <- numeric(n_assets)
names(capm_betas) <- names(capm_alphas) <- names(capm_resid_var) <- tickers
for (i in seq_along(tickers)) {
fit <- lm(excess_returns[, i] ~ mkt_excess)
capm_alphas[i] <- coef(fit)[1]
capm_betas[i] <- coef(fit)[2]
capm_resid_var[i] <- var(resid(fit))
}
sigma_m2 <- var(mkt_excess)
# Construct CAPM covariance matrix
beta_mat <- matrix(capm_betas, ncol = 1)
Sigma_CAPM <- sigma_m2 * (beta_mat %*% t(beta_mat)) + diag(capm_resid_var)
rownames(Sigma_CAPM) <- colnames(Sigma_CAPM) <- tickers
cat("=== CAPM Beta Estimates ===\n")
## === CAPM Beta Estimates ===
data.frame(
  ETF = tickers,
  Alpha = round(capm_alphas, 4),
  Beta = round(capm_betas, 4),
  ResidVar = round(capm_resid_var, 6)
) %>% kable(caption = "CAPM Parameter Estimates (2010/02–2015/01)")
| | ETF | Alpha | Beta | ResidVar |
|---|---|---|---|---|
| SPY | SPY | 0.0003 | 0.9550 | 0.000009 |
| QQQ | QQQ | 0.0032 | 1.0042 | 0.000281 |
| EEM | EEM | -0.0119 | 1.1878 | 0.001203 |
| IWM | IWM | -0.0028 | 1.2584 | 0.000290 |
| EFA | EFA | -0.0078 | 1.0879 | 0.000612 |
| TLT | TLT | 0.0193 | -0.6961 | 0.000884 |
| IYR | IYR | 0.0041 | 0.8114 | 0.001023 |
| GLD | GLD | 0.0018 | 0.1618 | 0.002859 |
##
## === CAPM Covariance Matrix ===
| | SPY | QQQ | EEM | IWM | EFA | TLT | IYR | GLD |
|---|---|---|---|---|---|---|---|---|
| SPY | 0.001397 | 0.001459 | 0.001726 | 0.001828 | 0.001581 | -0.001011 | 0.001179 | 0.000235 |
| QQQ | 0.001459 | 0.001816 | 0.001815 | 0.001923 | 0.001662 | -0.001064 | 0.001240 | 0.000247 |
| EEM | 0.001726 | 0.001815 | 0.003350 | 0.002274 | 0.001966 | -0.001258 | 0.001466 | 0.000292 |
| IWM | 0.001828 | 0.001923 | 0.002274 | 0.002699 | 0.002083 | -0.001333 | 0.001554 | 0.000310 |
| EFA | 0.001581 | 0.001662 | 0.001966 | 0.002083 | 0.002413 | -0.001152 | 0.001343 | 0.000268 |
| TLT | -0.001011 | -0.001064 | -0.001258 | -0.001333 | -0.001152 | 0.001621 | -0.000859 | -0.000171 |
| IYR | 0.001179 | 0.001240 | 0.001466 | 0.001554 | 0.001343 | -0.000859 | 0.002025 | 0.000200 |
| GLD | 0.000235 | 0.000247 | 0.000292 | 0.000310 | 0.000268 | -0.000171 | 0.000200 | 0.002899 |
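The single-index structure of the matrix above can be checked by hand: every off-diagonal entry should equal \(\beta_i \beta_j \sigma_m^2\). The sketch below uses only the rounded values printed in the two CAPM tables, backs out the market variance implied by SPY's diagonal entry, and rebuilds the SPY/QQQ covariance. Because the inputs are rounded, agreement with the tabulated 0.001459 is expected only up to the last printed digit.

```r
# Rounded values copied from the CAPM tables in this report
beta_spy      <- 0.9550
beta_qqq      <- 1.0042
resid_var_spy <- 0.000009
var_spy       <- 0.001397   # SPY diagonal entry of the CAPM covariance matrix

# Market variance implied by SPY's diagonal: var = beta^2 * sigma_m^2 + resid var
sigma_m2_implied <- (var_spy - resid_var_spy) / beta_spy^2

# Off-diagonal entry implied by the single-index structure
cov_spy_qqq <- beta_spy * beta_qqq * sigma_m2_implied
cov_spy_qqq
```

The same check works for any pair of tickers, which is a quick way to confirm that the matrix contains no information beyond the betas, the residual variances, and one market variance.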
# GMV weights: w* = Sigma^{-1} * 1 / (1' * Sigma^{-1} * 1)
ones <- rep(1, n_assets)
Sigma_inv <- solve(Sigma_CAPM)
gmv_weights_capm <- as.vector(Sigma_inv %*% ones) / as.numeric(t(ones) %*% Sigma_inv %*% ones)
names(gmv_weights_capm) <- tickers
cat("\n=== CAPM GMV Portfolio Weights (as of 2015/01) ===\n")
##
## === CAPM GMV Portfolio Weights (as of 2015/01) ===
data.frame(ETF = tickers, Weight = round(gmv_weights_capm, 4)) %>%
  kable(caption = "CAPM GMV Optimal Weights")
| | ETF | Weight |
|---|---|---|
| SPY | SPY | 0.7748 |
| QQQ | QQQ | -0.0130 |
| EEM | EEM | -0.0361 |
| IWM | IWM | -0.2029 |
| EFA | EFA | -0.0357 |
| TLT | TLT | 0.4131 |
| IYR | IYR | 0.0373 |
| GLD | GLD | 0.0626 |
# Realized return in 2015/02
feb2015 <- merged_tbl %>% filter(format(date, "%Y-%m") == "2015-02")
realized_capm <- sum(gmv_weights_capm * as.numeric(feb2015[, tickers]))
cat(sprintf("\nRealized CAPM GMV Portfolio Return — 2015/02: %.4f (%.2f%%)\n",
realized_capm, realized_capm * 100))
##
## Realized CAPM GMV Portfolio Return — 2015/02: -0.0033 (-0.33%)
Interpretation: The CAPM GMV portfolio concentrates in the assets that contribute least to total variance. SPY receives the largest weight (77.5%) because its residual variance is by far the smallest (0.000009), and TLT receives 41.3% because its negative beta (-0.70) hedges the equity holdings. GLD has a low beta (0.16) but the highest residual variance (0.002859), so its weight is only 6.3%. High-beta assets are sold short, most heavily IWM (-20.3%, beta 1.26), because they add systematic risk. The realized return in February 2015 was -0.33%; a single month says little about whether the strategy controls risk, which is the question the rolling backtest addresses.
Why GMV may outperform equal-weight: The equal-weight portfolio ignores the covariance structure entirely. By tilting toward low-correlation, low-variance assets, the GMV portfolio targets a lower variance without necessarily sacrificing return. Over long horizons, and for a given arithmetic mean return \(\mu\), lower volatility compounds to higher terminal wealth via the variance-drag relationship: \(E[\text{geometric return}] \approx \mu - \sigma^2/2\).
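To make the variance-drag relationship concrete, the following minimal sketch evaluates the approximation for two hypothetical portfolios that share the same arithmetic mean but differ in volatility. The function name `approx_geometric` and the numbers are illustrative assumptions, not outputs of this report.

```r
# Approximate geometric mean return from arithmetic mean and volatility
approx_geometric <- function(mu, sigma) mu - sigma^2 / 2

# Two hypothetical portfolios with the same 8% arithmetic mean (made-up numbers)
mu <- 0.08
low_vol  <- approx_geometric(mu, sigma = 0.10)  # 0.08 - 0.005 = 0.075
high_vol <- approx_geometric(mu, sigma = 0.20)  # 0.08 - 0.020 = 0.060

c(low_vol = low_vol, high_vol = high_vol)
```

Halving volatility from 20% to 10% recovers 1.5 percentage points of approximate compound return per period in this example, with no change in the arithmetic mean.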
The FF3 model is:
\[R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,MKT}(R_{m,t} - R_{f,t}) + \beta_{i,SMB} \cdot SMB_t + \beta_{i,HML} \cdot HML_t + \varepsilon_{i,t}\]
The FF3 covariance matrix:
\[\Sigma^{FF3} = \mathbf{B} \Sigma_F \mathbf{B}' + D_\varepsilon\]
where \(\mathbf{B}\) is the \(n \times 3\) matrix of factor loadings, \(\Sigma_F\) is the \(3 \times 3\) factor covariance matrix, and \(D_\varepsilon\) is the diagonal residual variance matrix.
# Factor matrix for training period
F_mat <- as.matrix(train_data[, c("Mkt-RF", "SMB", "HML")])
# Estimate FF3 loadings
ff3_loadings <- matrix(NA, nrow = n_assets, ncol = 3,
dimnames = list(tickers, c("MKT","SMB","HML")))
ff3_alphas <- numeric(n_assets); names(ff3_alphas) <- tickers
ff3_resid_var <- numeric(n_assets); names(ff3_resid_var) <- tickers
for (i in seq_along(tickers)) {
fit <- lm(excess_returns[, i] ~ F_mat)
ff3_alphas[i] <- coef(fit)[1]
ff3_loadings[i, ] <- coef(fit)[2:4]
ff3_resid_var[i] <- var(resid(fit))
}
Sigma_F <- cov(F_mat)
# FF3 covariance matrix
Sigma_FF3 <- ff3_loadings %*% Sigma_F %*% t(ff3_loadings) + diag(ff3_resid_var)
rownames(Sigma_FF3) <- colnames(Sigma_FF3) <- tickers
cat("=== Fama-French 3-Factor Loadings ===\n")
## === Fama-French 3-Factor Loadings ===
data.frame(
  ETF = tickers,
  Alpha = round(ff3_alphas, 4),
  MKT = round(ff3_loadings[,"MKT"], 4),
  SMB = round(ff3_loadings[,"SMB"], 4),
  HML = round(ff3_loadings[,"HML"], 4),
  ResidVar = round(ff3_resid_var, 6)
) %>% kable(caption = "FF3 Factor Loadings (2010/02–2015/01)")
| | ETF | Alpha | MKT | SMB | HML | ResidVar |
|---|---|---|---|---|---|---|
| SPY | SPY | 0.0000 | 0.9879 | -0.1388 | 0.0087 | 0.000003 |
| QQQ | QQQ | 0.0014 | 1.1147 | -0.1592 | -0.4276 | 0.000214 |
| EEM | EEM | -0.0126 | 1.2233 | -0.0054 | -0.2058 | 0.001189 |
| IWM | IWM | -0.0005 | 1.0181 | 0.9046 | 0.0973 | 0.000017 |
| EFA | EFA | -0.0096 | 1.2376 | -0.4561 | -0.2207 | 0.000531 |
| TLT | TLT | 0.0175 | -0.5883 | -0.1370 | -0.4447 | 0.000814 |
| IYR | IYR | 0.0039 | 0.8218 | -0.0023 | -0.0592 | 0.001022 |
| GLD | GLD | 0.0004 | 0.1576 | 0.5633 | -0.8145 | 0.002508 |
##
## === FF3 Covariance Matrix ===
| | SPY | QQQ | EEM | IWM | EFA | TLT | IYR | GLD |
|---|---|---|---|---|---|---|---|---|
| SPY | 0.001397 | 0.001464 | 0.001725 | 0.001787 | 0.001601 | -0.001008 | 0.001179 | 0.000204 |
| QQQ | 0.001464 | 0.001816 | 0.001844 | 0.001870 | 0.001713 | -0.000995 | 0.001248 | 0.000337 |
| EEM | 0.001725 | 0.001844 | 0.003350 | 0.002270 | 0.001980 | -0.001228 | 0.001470 | 0.000350 |
| IWM | 0.001787 | 0.001870 | 0.002270 | 0.002699 | 0.001943 | -0.001379 | 0.001552 | 0.000469 |
| EFA | 0.001601 | 0.001713 | 0.001980 | 0.001943 | 0.002413 | -0.001104 | 0.001347 | 0.000237 |
| TLT | -0.001008 | -0.000995 | -0.001228 | -0.001379 | -0.001104 | 0.001621 | -0.000851 | -0.000072 |
| IYR | 0.001179 | 0.001248 | 0.001470 | 0.001552 | 0.001347 | -0.000851 | 0.002025 | 0.000216 |
| GLD | 0.000204 | 0.000337 | 0.000350 | 0.000469 | 0.000237 | -0.000072 | 0.000216 | 0.002899 |
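The CAPM and FF3 matrices share the same diagonal (for example 0.001397 for SPY under both models), so the two models differ only in their off-diagonal entries. This is a property of OLS with an intercept: fitted values and residuals are uncorrelated in sample, so the systematic variance plus the residual variance always adds back to the sample variance. The sketch below illustrates this on simulated data; the series `F_sim` and `y` and all parameter values are made up for the demonstration.

```r
set.seed(1)  # simulated data, purely illustrative
n <- 60
F_sim <- matrix(rnorm(n * 3, sd = 0.03), ncol = 3,
                dimnames = list(NULL, c("MKT", "SMB", "HML")))
y <- 0.9 * F_sim[, "MKT"] + 0.4 * F_sim[, "SMB"] + rnorm(n, sd = 0.01)

# One-factor and three-factor regressions on the same sample
fit1 <- lm(y ~ F_sim[, "MKT"])
fit3 <- lm(y ~ F_sim)

b1 <- unname(coef(fit1)[2])
b3 <- unname(coef(fit3)[2:4])

# Model-implied variance = systematic part + residual variance
implied_var_1 <- b1^2 * var(F_sim[, "MKT"]) + var(resid(fit1))
implied_var_3 <- as.numeric(t(b3) %*% cov(F_sim) %*% b3) + var(resid(fit3))

c(sample = var(y), one_factor = implied_var_1, three_factor = implied_var_3)
```

All three numbers coincide up to floating-point error, so adding factors reallocates variance between the systematic and residual terms without changing each asset's total variance.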
# GMV weights
Sigma_inv_ff3 <- solve(Sigma_FF3)
gmv_weights_ff3 <- as.vector(Sigma_inv_ff3 %*% ones) / as.numeric(t(ones) %*% Sigma_inv_ff3 %*% ones)
names(gmv_weights_ff3) <- tickers
cat("\n=== FF3 GMV Portfolio Weights (as of 2015/01) ===\n")
##
## === FF3 GMV Portfolio Weights (as of 2015/01) ===
data.frame(ETF = tickers, Weight = round(gmv_weights_ff3, 4)) %>%
  kable(caption = "FF3 GMV Optimal Weights")
| | ETF | Weight |
|---|---|---|
| SPY | SPY | 0.8828 |
| QQQ | QQQ | -0.1425 |
| EEM | EEM | -0.0431 |
| IWM | IWM | -0.1153 |
| EFA | EFA | -0.1037 |
| TLT | TLT | 0.4159 |
| IYR | IYR | 0.0368 |
| GLD | GLD | 0.0691 |
# Realized return in 2015/02
realized_ff3 <- sum(gmv_weights_ff3 * as.numeric(feb2015[, tickers]))
cat(sprintf("\nRealized FF3 GMV Portfolio Return — 2015/02: %.4f (%.2f%%)\n",
realized_ff3, realized_ff3 * 100))
##
## Realized FF3 GMV Portfolio Return — 2015/02: -0.0066 (-0.66%)
##
## === Comparison: CAPM vs FF3 (2015/02 Realized Return) ===
data.frame(
Model = c("CAPM", "FF3"),
Realized_Return = round(c(realized_capm, realized_ff3), 4)
) %>% kable(caption = "Single-Period Realized Return Comparison")
| Model | Realized_Return |
|---|---|
| CAPM | -0.0033 |
| FF3 | -0.0066 |
Comparison with CAPM: The FF3 model introduces two additional sources of systematic variation. For ETFs with distinct size or style tilts, such as IWM (small-cap, SMB loading 0.90) and EFA (developed international equities, SMB loading -0.46), the SMB and HML loadings materially change the residual variance estimates (IWM's falls from 0.000290 under CAPM to 0.000017 under FF3) and therefore the off-diagonal covariance structure. If the FF3 model better captures true factor exposures, its covariance matrix will be more accurate and the resulting GMV portfolio should exhibit lower out-of-sample variance. A single month cannot settle this: in February 2015 the FF3 weights returned -0.66% against -0.33% for CAPM.
The backtest implements a rolling 60-month estimation window. At each month \(t\):
1. Use returns from months \(t-60\) to \(t-1\) to estimate model parameters.
2. Construct the covariance matrix under CAPM or FF3.
3. Solve for GMV weights analytically.
4. Record the realized return in month \(t\).
This procedure avoids look-ahead bias and simulates what a practitioner could have implemented in real time.
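The window bookkeeping in this procedure can be sketched in isolation. The helper `rolling_windows()` below is hypothetical (it is not part of the report's code) and simply lists, for each investment month, the row range used for estimation. The example assumes 196 monthly rows starting in 2010/01 with no gaps, so that 2015/02 is row 62.

```r
# Hypothetical helper: row ranges for a rolling estimation window
rolling_windows <- function(n_total, first_invest, window = 60) {
  stopifnot(first_invest - window >= 1, first_invest <= n_total)
  t_idx <- first_invest:n_total
  data.frame(
    train_start = t_idx - window,  # first row of the estimation window
    train_end   = t_idx - 1,       # last row, strictly before the investment month
    invest      = t_idx            # row whose return is realized out of sample
  )
}

# Assumed layout: row 1 = 2010/01, so row 62 = 2015/02
w <- rolling_windows(n_total = 196, first_invest = 62)
head(w, 3)
```

Every `train_end` is one row before its `invest` row, which is the property that rules out look-ahead bias; the first window covers rows 2 to 61, matching the 2010/02 to 2015/01 training sample used earlier.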
# Full merged data sorted
full_data <- merged_tbl %>% arrange(date)
all_dates <- full_data$date
n_total <- nrow(full_data)
# Investment period: 2015/02 onwards
invest_start <- which(format(all_dates, "%Y-%m") == "2015-02")[1]
# Initialize result vectors
n_periods <- n_total - invest_start + 1
port_ret_capm <- numeric(n_periods)
port_ret_ff3 <- numeric(n_periods)
invest_dates <- all_dates[invest_start:n_total]
# Weight matrices for visualization
weights_capm_mat <- matrix(NA, nrow = n_periods, ncol = n_assets,
dimnames = list(NULL, tickers))
weights_ff3_mat <- matrix(NA, nrow = n_periods, ncol = n_assets,
dimnames = list(NULL, tickers))
for (k in seq_len(n_periods)) {
t_idx <- invest_start + k - 1
# Training window: 60 months ending at t-1
train_end <- t_idx - 1
train_start <- train_end - 59
if (train_start < 1) next
window_data <- full_data[train_start:train_end, ]
exc_ret <- as.matrix(window_data[, tickers]) - window_data$RF
mkt_exc <- window_data$`Mkt-RF`
f_factors <- as.matrix(window_data[, c("Mkt-RF", "SMB", "HML")])
# ---- CAPM covariance ----
b_capm <- numeric(n_assets)
rv_capm <- numeric(n_assets)
for (i in seq_along(tickers)) {
fit <- lm(exc_ret[, i] ~ mkt_exc)
b_capm[i] <- coef(fit)[2]
rv_capm[i] <- var(resid(fit))
}
sm2 <- var(mkt_exc)
Sc <- sm2 * outer(b_capm, b_capm) + diag(rv_capm)
Sc_inv <- tryCatch(solve(Sc), error = function(e) NULL)
if (is.null(Sc_inv)) next
w_capm <- as.vector(Sc_inv %*% ones) / as.numeric(t(ones) %*% Sc_inv %*% ones)
# ---- FF3 covariance ----
B_ff3 <- matrix(NA, nrow = n_assets, ncol = 3)
rv_ff3 <- numeric(n_assets)
for (i in seq_along(tickers)) {
fit <- lm(exc_ret[, i] ~ f_factors)
B_ff3[i, ] <- coef(fit)[2:4]
rv_ff3[i] <- var(resid(fit))
}
Sf <- cov(f_factors)
Sff3 <- B_ff3 %*% Sf %*% t(B_ff3) + diag(rv_ff3)
Sff3_inv <- tryCatch(solve(Sff3), error = function(e) NULL)
if (is.null(Sff3_inv)) next
w_ff3 <- as.vector(Sff3_inv %*% ones) / as.numeric(t(ones) %*% Sff3_inv %*% ones)
# ---- Realized returns in month t ----
actual_ret <- as.numeric(full_data[t_idx, tickers])
port_ret_capm[k] <- sum(w_capm * actual_ret)
port_ret_ff3[k] <- sum(w_ff3 * actual_ret)
weights_capm_mat[k, ] <- w_capm
weights_ff3_mat[k, ] <- w_ff3
}
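# Keep only periods where weights were actually computed; skipped periods stay NA
# in the weight matrices, so a genuine 0% return is not mistaken for a missing one
valid_idx <- which(complete.cases(weights_capm_mat) & complete.cases(weights_ff3_mat))
port_ret_capm <- port_ret_capm[valid_idx]
port_ret_ff3 <- port_ret_ff3[valid_idx]
invest_dates <- invest_dates[valid_idx]
weights_capm_mat <- weights_capm_mat[valid_idx, , drop = FALSE]
weights_ff3_mat <- weights_ff3_mat[valid_idx, , drop = FALSE]
# Function to compute performance metrics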
perf_metrics <- function(returns, label) {
n_months <- length(returns)
ann_ret <- mean(returns) * 12
ann_vol <- sd(returns) * sqrt(12)
sharpe <- ann_ret / ann_vol # assuming rf ~ 0 for simplicity
cum_ret <- prod(1 + returns) - 1
# Maximum drawdown
cum_wealth <- cumprod(1 + returns)
running_max <- cummax(cum_wealth)
drawdowns <- (cum_wealth - running_max) / running_max
max_dd <- min(drawdowns)
# Calmar ratio
calmar <- ann_ret / abs(max_dd)
data.frame(
Model = label,
Ann_Return = round(ann_ret, 4),
Ann_Volatility = round(ann_vol, 4),
Sharpe_Ratio = round(sharpe, 4),
Max_Drawdown = round(max_dd, 4),
Calmar_Ratio = round(calmar, 4),
Cumulative_Return = round(cum_ret, 4)
)
}
perf_capm <- perf_metrics(port_ret_capm, "CAPM GMV")
perf_ff3 <- perf_metrics(port_ret_ff3, "FF3 GMV")
perf_table <- bind_rows(perf_capm, perf_ff3)
kable(perf_table,
      caption = "Performance Comparison: CAPM GMV vs. FF3 GMV (2015/02–2026/05)",
      col.names = c("Model", "Ann. Return", "Ann. Volatility", "Sharpe Ratio",
                    "Max Drawdown", "Calmar Ratio", "Cumul. Return"))
| Model | Ann. Return | Ann. Volatility | Sharpe Ratio | Max Drawdown | Calmar Ratio | Cumul. Return |
|---|---|---|---|---|---|---|
| CAPM GMV | 0.0810 | 0.1068 | 0.7585 | -0.2584 | 0.3134 | 1.3287 |
| FF3 GMV | 0.0525 | 0.1088 | 0.4824 | -0.2811 | 0.1867 | 0.6882 |
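The Sharpe ratios in this table treat the risk-free rate as zero, as noted in the comment inside `perf_metrics()`. A minimal sketch of the excess-return version follows; the function `sharpe_excess()` and the return series are illustrative assumptions. In the report's pipeline, the `RF` column of the merged data over the investment months could plausibly supply `rf`.

```r
# Annualized Sharpe ratio from monthly returns, net of a monthly risk-free rate
sharpe_excess <- function(returns, rf = 0) {
  excess <- returns - rf
  (mean(excess) * 12) / (sd(excess) * sqrt(12))
}

# Illustrative monthly returns and risk-free rates (made-up numbers)
r  <- c(0.012, -0.004, 0.009, 0.015, -0.011, 0.006)
rf <- rep(0.002, length(r))

sharpe_excess(r)       # rf treated as zero, as in perf_metrics()
sharpe_excess(r, rf)   # lower here, because the mean excess return is smaller
```

Since both portfolios would face the same risk-free series, the adjustment lowers both ratios and is unlikely to reverse a gap as wide as the one in the table.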
# Cumulative return series
cum_capm <- cumprod(1 + port_ret_capm)
cum_ff3 <- cumprod(1 + port_ret_ff3)
cum_df <- data.frame(
date = invest_dates,
CAPM = cum_capm,
FF3 = cum_ff3
) %>% pivot_longer(cols = c(CAPM, FF3), names_to = "Model", values_to = "CumReturn")
ggplot(cum_df, aes(x = date, y = CumReturn, color = Model, linetype = Model)) +
geom_line(linewidth = 1.1) +
scale_color_manual(values = c("CAPM" = "#2C6FAC", "FF3" = "#D94F3D")) +
scale_y_continuous(labels = scales::number_format(suffix = "x")) +
labs(
title = "Cumulative Return: CAPM GMV vs. Fama-French GMV",
subtitle = "Rolling 60-Month Estimation Window | 2015/02–2026/05",
x = NULL,
y = "Cumulative Wealth (1 = Initial Investment)",
color = "Model",
linetype = "Model"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
Figure: Cumulative Return: CAPM GMV vs. FF3 GMV
roll_vol <- function(ret, w = 12) {
sapply(seq(w, length(ret)), function(i) sd(ret[(i-w+1):i]) * sqrt(12))
}
rv_capm <- roll_vol(port_ret_capm)
rv_ff3 <- roll_vol(port_ret_ff3)
rv_dates <- invest_dates[12:length(invest_dates)]
vol_df <- data.frame(
date = rv_dates,
CAPM = rv_capm,
FF3 = rv_ff3
) %>% pivot_longer(cols = c(CAPM, FF3), names_to = "Model", values_to = "RollingVol")
ggplot(vol_df, aes(x = date, y = RollingVol, color = Model)) +
geom_line(linewidth = 1.0) +
scale_color_manual(values = c("CAPM" = "#2C6FAC", "FF3" = "#D94F3D")) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Rolling 12-Month Annualized Volatility",
subtitle = "CAPM GMV vs. FF3 GMV | 2015/02–2026/05",
x = NULL,
y = "Annualized Volatility",
color = "Model"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
Figure: Rolling 12-Month Annualized Volatility
drawdown_series <- function(ret) {
cw <- cumprod(1 + ret)
rm <- cummax(cw)
(cw - rm) / rm
}
dd_capm <- drawdown_series(port_ret_capm)
dd_ff3 <- drawdown_series(port_ret_ff3)
dd_df <- data.frame(
date = invest_dates,
CAPM = dd_capm,
FF3 = dd_ff3
) %>% pivot_longer(cols = c(CAPM, FF3), names_to = "Model", values_to = "Drawdown")
ggplot(dd_df, aes(x = date, y = Drawdown, fill = Model)) +
geom_area(position = "identity", alpha = 0.6) +
scale_fill_manual(values = c("CAPM" = "#2C6FAC", "FF3" = "#D94F3D")) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Portfolio Drawdown",
subtitle = "CAPM GMV vs. FF3 GMV | 2015/02–2026/05",
x = NULL,
y = "Drawdown from Peak",
fill = "Model"
) +
guides(alpha = "none") +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
Figure: Portfolio Drawdown: CAPM GMV vs. FF3 GMV
# CAPM weight evolution
w_capm_df <- as.data.frame(weights_capm_mat) %>%
mutate(date = invest_dates) %>%
pivot_longer(-date, names_to = "ETF", values_to = "Weight")
ggplot(w_capm_df, aes(x = date, y = Weight, fill = ETF)) +
geom_area(position = "stack") +
scale_y_continuous(labels = scales::percent_format()) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "CAPM GMV Portfolio Weight Evolution",
subtitle = "Rolling 60-Month Window | 2015/02–2026/05",
x = NULL,
y = "Portfolio Weight",
fill = "ETF"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
Figure: CAPM GMV Portfolio Weight Evolution
w_ff3_df <- as.data.frame(weights_ff3_mat) %>%
mutate(date = invest_dates) %>%
pivot_longer(-date, names_to = "ETF", values_to = "Weight")
ggplot(w_ff3_df, aes(x = date, y = Weight, fill = ETF)) +
geom_area(position = "stack") +
scale_y_continuous(labels = scales::percent_format()) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "FF3 GMV Portfolio Weight Evolution",
subtitle = "Rolling 60-Month Window | 2015/02–2026/05",
x = NULL,
y = "Portfolio Weight",
fill = "ETF"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
Figure: FF3 GMV Portfolio Weight Evolution
Which model performed better? In this backtest the simpler model did. Over 2015/02–2026/05 the CAPM GMV portfolio earned a higher annualized return (8.10% vs. 5.25%) at slightly lower annualized volatility (10.68% vs. 10.88%), giving a higher Sharpe ratio (0.76 vs. 0.48) and a shallower maximum drawdown (-25.84% vs. -28.11%). The additional complexity of the three-factor model therefore did not translate into better out-of-sample portfolio construction here. In theory, the FF3 model decomposes each asset’s return into three systematic components (MKT, SMB, HML) plus a residual, whereas CAPM uses only one. For a heterogeneous ETF universe spanning large-cap equities (SPY, QQQ), small-caps (IWM), international (EFA, EEM), bonds (TLT), real estate (IYR), and commodities (GLD), the SMB and HML factors capture meaningful variation in equity-oriented assets, potentially reducing residual variance estimates and producing a more accurate covariance matrix. One plausible reading of the results is that, for this universe and sample, any such gain was offset by the extra estimation noise in the additional loadings.
Why factor models may improve portfolio construction: The fundamental insight is that a more accurate covariance matrix leads to better-diversified weights. CAPM constrains the covariance structure to a single factor, forcing all equity correlations through a common beta. This parsimony is efficient when beta is truly the only systematic driver, but becomes a binding misspecification when size and value effects are material. By allowing three systematic channels, FF3 attributes more variance to common factors and less to asset-specific residuals, which can act like shrinkage and stabilize weight estimates, provided the extra loadings are themselves estimated precisely.
Practical limitations of CAPM: The single-factor model is computationally simple but suffers from several well-documented empirical failures. Beta is not stable through time, varies with the business cycle, and is measured with error. The model provides no guidance on style tilts that have been shown to earn risk premia (size, value, momentum). Moreover, assuming all assets load only on the market factor ignores the distinct return drivers of bonds, real estate, and commodities in our ETF universe.
Advantages and disadvantages of Fama-French factors: The FF3 model’s advantages are its empirical robustness across international markets and asset classes, its interpretability (size and value premia have plausible risk-based and behavioral explanations), and its ability to reduce covariance matrix estimation error. The disadvantages include: (1) factor premiums are not stable — HML suffered prolonged underperformance during the 2010s growth era; (2) adding factors increases estimation noise unless the sample size is large; (3) the model is not forward-looking and cannot account for structural regime changes. Additionally, FF3 does not capture momentum, quality, or low-volatility factors that practitioners increasingly incorporate.
Implications for real-world asset management: The rolling backtest illustrates that even a relatively simple factor-model approach to covariance estimation can support disciplined, risk-aware portfolio construction, although an eight-ETF universe with unconstrained short positions is far from institutional scale. In practice, asset managers augment these models with regularization techniques (Ledoit-Wolf shrinkage), alternative factor models (Barra, Axioma), and transaction cost constraints. The GMV portfolio is a useful baseline because it requires no return forecasts and relies entirely on risk estimation, which makes it robust to the difficulties of return prediction.
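As a rough illustration of what regularization does to GMV weights, the sketch below shrinks a covariance matrix linearly toward its own diagonal. This is a simplified stand-in and not the Ledoit-Wolf estimator: the intensity `delta` is fixed by hand rather than estimated from the data, and the functions `shrink_cov()` and `gmv_weights()` and the toy matrix `S` are assumptions introduced for the example.

```r
# Linear shrinkage toward a diagonal target with a fixed, user-chosen intensity
shrink_cov <- function(S, delta = 0.2) {
  stopifnot(delta >= 0, delta <= 1)
  target <- diag(diag(S))              # keep variances, set covariances to zero
  (1 - delta) * S + delta * target
}

# GMV weights: Sigma^{-1} 1, rescaled to sum to one
gmv_weights <- function(S) {
  w <- solve(S, rep(1, nrow(S)))       # solves S w = 1 without forming the inverse
  w / sum(w)
}

# Toy 3-asset covariance matrix (made-up numbers)
S <- matrix(c( 0.0020,  0.0018, -0.0005,
               0.0018,  0.0025, -0.0004,
              -0.0005, -0.0004,  0.0016), nrow = 3, byrow = TRUE)

w_raw    <- gmv_weights(S)
w_shrunk <- gmv_weights(shrink_cov(S, delta = 0.5))
round(rbind(raw = w_raw, shrunk = w_shrunk), 3)
```

Shrinking the off-diagonal entries weakens the hedging relationships the optimizer can exploit, which generally pulls weights away from extreme long and short positions of the kind seen in the weight tables.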
This examination explored foundational concepts in modern portfolio theory and empirical asset pricing through both analytical and computational lenses. Several key themes emerge.
On the theoretical side, the Markowitz framework establishes that diversification reduces idiosyncratic risk, but the rate of risk reduction diminishes as portfolios grow larger. The Hennessy case illustrated the tension between concentration for alpha extraction and diversification for risk control — a tension that remains central to active management. The CAPM and Fama-French models provide complementary frameworks: CAPM’s elegance lies in its parsimony, while FF3’s empirical richness better captures the multi-dimensional nature of systematic risk.
The computational analysis confirmed that CAPM-based and FF3-based GMV portfolios differ in both weight allocation and realized performance. By estimating size and value loadings separately, the FF3 model produces a more nuanced covariance structure, particularly for ETFs like IWM and EFA that carry meaningful factor tilts. That nuance did not pay off out of sample: over 2015/02–2026/05 the CAPM GMV portfolio had the higher Sharpe ratio (0.76 vs. 0.48) and the smaller maximum drawdown. The rolling backtest also showed how much the optimal weights move through time, especially around market stress events, underscoring the importance of regular rebalancing.
From a risk characteristics perspective, GMV portfolios tilt toward assets that lower total portfolio variance, whether through low volatility or through low or negative correlation with equities, as TLT and GLD provide. This defensive bias comes at the cost of bull-market participation, which can explain why GMV portfolios often lag simple equal-weight strategies in trending equity markets. The protection is partial rather than complete: both portfolios in this backtest still experienced maximum drawdowns of roughly 26% to 28%.
The practical investment implication is straightforward: factor-model-based covariance estimation is a worthwhile investment for portfolio managers seeking systematic risk reduction, particularly when the investment universe spans multiple asset classes. However, no model should be applied mechanically without recognition of its assumptions and the economic regimes in which it may fail.
Report prepared for FIN Graduate Portfolio Analysis — Spring 2026. All computations performed in R. Data sourced from Yahoo Finance and the Kenneth French Data Library.