1 Questions from Textbook (60%)

1.1 Chapter 7

1.1.1 CFA Problem 1 — Portfolio Concentration and Diversification

Context: Hennessy manages a $30 million equity portfolio holding approximately 40 stocks (2–3% per issue), with demonstrated skill in identifying roughly 10 superior-performing issues per year. Jones proposes restricting the portfolio to no more than 20 stocks by doubling commitments to favored holdings.


Part (a): Will limiting to 20 stocks increase or decrease portfolio risk?

Concept: Diversification reduces unsystematic (idiosyncratic) risk. As the number of holdings decreases from 40 to 20, the portfolio becomes less diversified, and unsystematic risk increases, all else equal. For an equal-weighted portfolio with uncorrelated residuals, the total portfolio variance is:

\[\sigma_p^2 = \beta_p^2 \sigma_m^2 + \frac{\bar{\sigma}_\varepsilon^2}{n}\]

where \(\bar{\sigma}_\varepsilon^2\) is the average residual variance and \(n\) is the number of stocks. Halving \(n\) from 40 to 20 doubles the idiosyncratic variance term (raising idiosyncratic standard deviation by a factor of \(\sqrt{2} \approx 1.41\)), all else equal.

Answer: Risk will increase. The portfolio will carry more firm-specific risk because fewer stocks means less diversification of idiosyncratic shocks; systematic risk does not depend on the number of holdings. Diversification benefits diminish as holdings are added, so a 20-stock portfolio still eliminates most diversifiable risk. Even so, the move from 40 to 20 raises unsystematic exposure.
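The effect of the holding count on the idiosyncratic term can be sketched numerically. The snippet below is an illustration only: `avg_resid_var` is a hypothetical average residual variance (not given in the problem), and the formula assumes equal weights and uncorrelated residuals.

```r
# Idiosyncratic variance of an equal-weighted portfolio: avg_resid_var / n.
# avg_resid_var is a hypothetical value chosen only for illustration.
avg_resid_var <- 0.09            # assumed: 30% residual std dev per stock
n_stocks      <- c(10, 20, 40)

idio_var <- avg_resid_var / n_stocks
idio_sd  <- sqrt(idio_var)

result <- data.frame(n = n_stocks, idio_var = idio_var, idio_sd = idio_sd)
print(result)

# Ratio of idiosyncratic variance at 20 stocks to that at 40 stocks
ratio_20_40 <- idio_var[n_stocks == 20] / idio_var[n_stocks == 40]
```

Whatever value is assumed for `avg_resid_var`, the ratio between the 20-stock and 40-stock idiosyncratic variances is the same, because it depends only on \(n\).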


Part (b): Can Hennessy reduce holdings from 40 to 20 without significantly affecting risk?

Answer: Yes, if done carefully. Hennessy could concentrate in the 20 stocks with the lowest pairwise correlations among themselves, thereby maintaining diversification benefits despite the smaller count. If the discarded 20 stocks were highly correlated with those retained, dropping them has minimal risk impact. Additionally, if the retained stocks span diverse sectors and have low inter-stock correlations, the residual variance term in the expression above may not increase materially. The key is selecting stocks whose returns are driven by different underlying fundamentals.


1.1.2 CFA Problem 2 — Further Reduction to 10 Stocks

Concept: The marginal diversification benefit of adding the \(n\)-th stock diminishes rapidly. Most unsystematic risk is eliminated with 20–30 randomly selected stocks. However, reducing from 20 to 10 stocks concentrates the portfolio further, increasing idiosyncratic risk substantially.

Answer: Reduction to 10 is less likely to be advantageous because:

  1. The firm-specific variance term \(\bar{\sigma}_\varepsilon^2 / n\) increases from \(\bar{\sigma}_\varepsilon^2/20\) to \(\bar{\sigma}_\varepsilon^2/10\), doubling the unsystematic risk component.
  2. With only 10 stocks, each position represents 10% of the portfolio, meaning one bad earnings surprise can severely damage performance.
  3. Although Hennessy identifies ~10 superior stocks per year, doubling down on only those 10 eliminates any buffer from a diversified baseline. Even skilled managers experience mis-identification errors.
  4. The marginal alpha gained by going from 20 to 10 stocks is unlikely to offset the disproportionate risk increase at that concentration level.

1.1.3 CFA Problem 3 — Evaluating Within the Context of the Total Fund

Concept: When Wilstead evaluates the Hennessy portfolio as one component of a larger multi-manager fund, the relevant risk measure shifts from standalone volatility to the contribution to total fund risk — specifically, the covariance of Hennessy’s portfolio with the aggregate $280 million fund.

Answer: If the other five managers’ portfolios are weakly correlated with Hennessy’s holdings, then Hennessy’s portfolio, even if concentrated in 10 stocks, may contribute very little marginal risk to the total fund. The committee should consider:

\[\text{Contribution to Fund Risk} = w_H \cdot \text{Cov}(R_H, R_{\text{Fund}}) / \sigma_{\text{Fund}}\]

If \(\text{Cov}(R_H, R_{\text{Fund}})\) is low, concentrating Hennessy’s portion adds idiosyncratic exposure at the sub-portfolio level but minimal total-fund risk. In this case, the restriction to 10 or 20 stocks matters less from a fund-level perspective than from a standalone perspective. The committee might be more permissive about concentration if Hennessy’s alpha generation compensates for the residual idiosyncratic risk, since that risk is partially diversified away by the other managers.
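A small numerical sketch of the risk-contribution formula may help. Every input below (the weights, standard deviations, and correlation) is hypothetical and chosen only to show the mechanics; none comes from the problem. The fund is simplified to two sleeves, Hennessy (`H`) and the other managers combined (`Others`).

```r
# All inputs below are hypothetical, chosen only to illustrate the formula.
w   <- c(H = 0.10, Others = 0.90)      # assumed fund weights
sds <- c(H = 0.25, Others = 0.15)      # assumed standard deviations
rho <- 0.30                            # assumed correlation between sleeves

cov_HO  <- rho * sds["H"] * sds["Others"]
cov_mat <- matrix(c(sds["H"]^2, cov_HO,
                    cov_HO,     sds["Others"]^2),
                  nrow = 2, byrow = TRUE)

# Total fund standard deviation
fund_sd <- sqrt(drop(t(w) %*% cov_mat %*% w))

# Cov(R_i, R_Fund) for each sleeve is the i-th element of cov_mat %*% w
cov_with_fund <- drop(cov_mat %*% w)

# Contribution to fund risk: w_i * Cov(R_i, R_Fund) / sigma_Fund
risk_contrib <- w * cov_with_fund / fund_sd
print(risk_contrib)
```

The contributions sum to the fund's standard deviation, which is why this measure, rather than the sleeve's standalone volatility, is the natural one at the fund level.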


1.1.4 CFA Problem 4 — Efficient Frontier: Which Portfolio Cannot Lie on It?

Concept: A portfolio lies on the Markowitz efficient frontier if and only if no other portfolio offers a higher expected return for the same risk, or lower risk for the same return. A portfolio is dominated (and thus cannot be on the efficient frontier) if another portfolio achieves a higher return with equal or lower standard deviation.

| Portfolio | E(R) | σ |
|---|---|---|
| W | 15% | 36% |
| X | 12% | 15% |
| Z | 5% | 7% |
| Y | 9% | 21% |

Analysis: Compare Portfolio Y (E(R) = 9%, σ = 21%) against Portfolio X (E(R) = 12%, σ = 15%). Portfolio X offers a higher expected return (12% > 9%) and lower risk (15% < 21%). Therefore, no mean-variance investor would choose Y over X.

\[\text{Sharpe}_X = \frac{12 - r_f}{15} > \frac{9 - r_f}{21} = \text{Sharpe}_Y \quad \text{for reasonable } r_f\]

Answer: Portfolio Y (d) cannot lie on the efficient frontier, as it is strictly dominated by Portfolio X.
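The dominance comparison can be checked mechanically. The sketch below is a pairwise check among the four listed portfolios only; `is_dominated` is a helper name introduced here for illustration, and passing this check does not by itself prove that a portfolio is on the frontier.

```r
portfolios <- data.frame(
  name  = c("W", "X", "Z", "Y"),
  exp_r = c(15, 12, 5, 9),     # expected return, percent
  sd    = c(36, 15, 7, 21)     # standard deviation, percent
)

# A portfolio is dominated if some other portfolio has return >= and sd <=,
# with at least one of the two inequalities strict.
is_dominated <- function(i, p) {
  others <- p[-i, ]
  any(others$exp_r >= p$exp_r[i] & others$sd <= p$sd[i] &
        (others$exp_r > p$exp_r[i] | others$sd < p$sd[i]))
}

dominated <- vapply(seq_len(nrow(portfolios)), is_dominated,
                    logical(1), p = portfolios)
names(dominated) <- portfolios$name
print(dominated)
```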


1.1.5 CFA Problem 10 — Portfolio Selection: AB vs. BC

Data:

| Stock | σ (%) | Corr. with A | Corr. with B | Corr. with C |
|---|---|---|---|---|
| A | 40 | 1.00 | 0.90 | 0.50 |
| B | 20 | 0.90 | 1.00 | 0.10 |
| C | 40 | 0.50 | 0.10 | 1.00 |

Concept: For an equal-weighted two-asset portfolio (\(w_i = 0.5\)), portfolio variance is:

\[\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2 \rho_{12}\sigma_1\sigma_2\]

Portfolio AB (\(\rho_{AB} = 0.90\)):

\[\sigma_{AB}^2 = (0.5)^2(40)^2 + (0.5)^2(20)^2 + 2(0.5)(0.5)(0.90)(40)(20)\] \[= 0.25(1600) + 0.25(400) + 0.5(0.90)(800)\] \[= 400 + 100 + 360 = 860\] \[\sigma_{AB} = \sqrt{860} = 29.33\%\]

Portfolio BC (\(\rho_{BC} = 0.10\)):

\[\sigma_{BC}^2 = (0.5)^2(20)^2 + (0.5)^2(40)^2 + 2(0.5)(0.5)(0.10)(20)(40)\] \[= 0.25(400) + 0.25(1600) + 0.5(0.10)(800)\] \[= 100 + 400 + 40 = 540\] \[\sigma_{BC} = \sqrt{540} = 23.24\%\]

Interpretation: Because we are given only standard deviations and correlations, with no expected returns, we cannot evaluate return performance. The recommendation is based purely on risk minimization. Portfolio BC has a standard deviation of 23.24% versus 29.33% for AB, a reduction of approximately 6 percentage points. The low correlation between B and C (ρ = 0.10) produces meaningful diversification, whereas the high correlation between A and B (ρ = 0.90) means little risk reduction occurs in Portfolio AB.

Recommendation: Portfolio BC, as it has the lower standard deviation. Without expected-return data, risk is the only basis on which the two portfolios can be ranked.
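The two hand calculations can be reproduced from the full covariance matrix. This is a minimal sketch using the problem's standard deviations and correlations; `port_sd` is a helper name introduced here for illustration.

```r
sds  <- c(A = 40, B = 20, C = 40)                 # percent
corr <- matrix(c(1.00, 0.90, 0.50,
                 0.90, 1.00, 0.10,
                 0.50, 0.10, 1.00),
               nrow = 3, byrow = TRUE,
               dimnames = list(names(sds), names(sds)))

# Covariance matrix: D %*% corr %*% D, where D is the diagonal matrix of sds
cov_mat <- diag(sds) %*% corr %*% diag(sds)
dimnames(cov_mat) <- dimnames(corr)

# Portfolio standard deviation for a weight vector w
port_sd <- function(w, cov_mat) {
  sqrt(drop(t(w) %*% cov_mat %*% w))
}

w_AB <- c(A = 0.5, B = 0.5, C = 0)
w_BC <- c(A = 0,   B = 0.5, C = 0.5)

sd_AB <- port_sd(w_AB, cov_mat)
sd_BC <- port_sd(w_BC, cov_mat)
round(c(AB = sd_AB, BC = sd_BC), 2)
```

The same function accepts any other weight vector over A, B, and C, so it could also be used to compare unequal weightings.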


1.2 Chapter 8

1.2.1 CFA Problem 1 — CAPM Regression Analysis: ABC vs. XYZ

Data:

| Statistic | ABC | XYZ |
|---|---|---|
| Alpha | −3.20% | 7.30% |
| Beta | 0.60 | 0.97 |
| R² | 0.35 | 0.17 |
| Residual Std Dev | 13.02% | 21.45% |

Brokerage Estimates (2-year weekly):

| Brokerage House | Beta of ABC | Beta of XYZ |
|---|---|---|
| A | 0.62 | 1.45 |
| B | 0.71 | 1.25 |

Analysis:

ABC: The five-year alpha is −3.20%, indicating the stock underperformed the CAPM benchmark on a risk-adjusted basis over the estimation period. Beta of 0.60 suggests below-market systematic risk. R² = 0.35 means 35% of return variation is explained by market movements; the residual standard deviation of 13.02% reflects substantial firm-specific risk.

XYZ: Alpha of +7.30% suggests strong risk-adjusted outperformance historically. However, R² = 0.17 is very low — only 17% of variance is market-driven — and the residual standard deviation of 21.45% is high, indicating most of XYZ’s risk is idiosyncratic. In a diversified portfolio, this unsystematic risk is not compensated.

Implications for future risk-return relationships:

  1. Beta instability: Both brokerage estimates for XYZ (1.45 and 1.25, from two years of weekly data) are well above its five-year estimate of 0.97, suggesting that its beta has risen and is unstable over time. ABC’s brokerage estimates (0.62 and 0.71) are close to its five-year estimate of 0.60, indicating a comparatively stable beta. Because betas drift, the more recent estimates should receive higher weight when forecasting.

  2. Alpha persistence: Historical alphas are not reliable predictors of future alphas. ABC’s negative alpha could reflect a temporary value trap; XYZ’s positive alpha may reflect a growth premium already priced in.

  3. In a diversified portfolio: XYZ’s high idiosyncratic risk is diversified away; only its beta (approximately 1.35 as a midpoint of recent estimates) determines its marginal contribution to portfolio risk. ABC, with its lower and more stable beta, contributes less market risk.
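The regression statistics also imply each stock's total and systematic volatility. The sketch below assumes the single-index identity that residual variance equals \((1 - R^2)\) times total variance; it ignores the degrees-of-freedom adjustment in a reported residual standard deviation, so the figures are approximate.

```r
reg <- data.frame(
  stock    = c("ABC", "XYZ"),
  beta     = c(0.60, 0.97),
  r_square = c(0.35, 0.17),
  resid_sd = c(13.02, 21.45)   # percent, from the regression output
)

# residual variance = (1 - R^2) * total variance
reg$total_sd <- reg$resid_sd / sqrt(1 - reg$r_square)

# systematic variance = R^2 * total variance
reg$systematic_sd <- sqrt(reg$r_square) * reg$total_sd

print(reg)
```

Under these assumptions the two stocks carry similar systematic volatility, while XYZ's total volatility is noticeably higher because of its larger residual component.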


1.2.2 CFA Problem 2 — Nonsystematic Risk: Baker Fund

Formula: Given the correlation coefficient \(\rho\) between a fund and the market index:

\[R^2 = \rho^2 = \frac{\text{Systematic Variance}}{\text{Total Variance}}\]

\[\text{Nonsystematic Proportion} = 1 - R^2 = 1 - \rho^2\]

Calculation:

\[\rho = 0.70 \implies R^2 = (0.70)^2 = 0.49\]

\[\text{Nonsystematic Proportion} = 1 - 0.49 = 0.51 = 51\%\]

Answer: 51% of Baker Fund’s total risk is nonsystematic (firm-specific).

Interpretation: Just over half of the fund’s total return variation cannot be attributed to market-wide movements. For a fund that is part of a broader diversified portfolio, this unsystematic risk is largely irrelevant to the investor. However, if Baker Fund is held as a standalone investment, this unsystematic exposure significantly increases total risk without additional compensation under CAPM.


1.2.3 CFA Problem 3 — Implied Beta: Charlottesville International Fund

Given:

  • Correlation with world market index: \(\rho = 1.0\)
  • Expected world market return: \(E(R_m) = 11\%\)
  • Expected fund return: \(E(R_i) = 9\%\)
  • Risk-free rate: \(r_f = 3\%\)

Formula: From CAPM:

\[E(R_i) = r_f + \beta_i [E(R_m) - r_f]\]

Solving for \(\beta\):

\[\beta = \frac{E(R_i) - r_f}{E(R_m) - r_f} = \frac{9\% - 3\%}{11\% - 3\%} = \frac{6\%}{8\%} = 0.75\]

Answer: The implied beta of Charlottesville International is 0.75.

Interpretation: A beta of 0.75 implies the fund moves 75% as much as the world market index on average. Since the correlation is perfect (ρ = 1.0), all of the fund’s risk is systematic — there is no idiosyncratic variance. The fund’s lower expected return (9% vs. 11% market) is entirely consistent with its below-market systematic risk. An investor seeking full market exposure would need to lever the fund, whereas a risk-averse investor seeking broad international diversification with reduced volatility would find this profile appropriate.
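The implied beta, and what it says about relative volatility, can be reproduced in a few lines. This sketch uses only the problem's inputs plus the identity \(\beta = \rho \, \sigma_i / \sigma_m\); the variable names are illustrative.

```r
rf      <- 3      # risk-free rate, percent
er_mkt  <- 11     # expected world market return, percent
er_fund <- 9      # expected fund return, percent
rho     <- 1.0    # correlation with the market index

# CAPM solved for beta
implied_beta <- (er_fund - rf) / (er_mkt - rf)

# beta = rho * sd_fund / sd_mkt, so the volatility ratio is beta / rho
sd_ratio <- implied_beta / rho

c(implied_beta = implied_beta, sd_ratio = sd_ratio)
```

With perfect correlation the fund's volatility is the same fraction of the market's volatility as its beta.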


1.2.4 CFA Problem 4 — Beta and Systematic Risk (Conceptual)

Answer: (d) Systematic risk.

Explanation: Beta (\(\beta\)) measures a security’s sensitivity to systematic (market-wide) risk factors — movements that cannot be eliminated through diversification. It is defined as:

\[\beta_i = \frac{\text{Cov}(R_i, R_m)}{\text{Var}(R_m)}\]

Beta is explicitly a measure of co-movement with the market portfolio, capturing only the non-diversifiable component of total risk. Correlation coefficients (a) describe linear association but are not specific to systematic risk. Mean-variance analysis (b) uses total variance. Nonsystematic risk (c) is the component beta does not measure.


1.2.5 CFA Problem 5 — Beta vs. Standard Deviation as Risk Measures

Answer: (b) Beta measures only systematic risk, while standard deviation measures total risk.

Explanation: Under the single-index model, total variance decomposes as:

\[\sigma_i^2 = \beta_i^2 \sigma_m^2 + \sigma_{\varepsilon_i}^2\]

Standard deviation (\(\sigma_i\)) captures both the systematic component (\(\beta_i^2 \sigma_m^2\)) and the idiosyncratic component (\(\sigma_{\varepsilon_i}^2\)). Beta measures only the former — sensitivity to market-wide movements. In the context of a well-diversified portfolio, beta is the appropriate risk measure because idiosyncratic variance cancels across holdings. For an undiversified investor, standard deviation is more relevant as it captures total exposure.


1.3 Chapter 9

Reference data for Problems 8 and 9:

| Portfolio | Avg. Annual Return | Std. Deviation | Beta |
|---|---|---|---|
| R | 11% | 10% | 0.5 |
| S&P 500 | 14% | 12% | 1.0 |

Note: Risk-free rate is not explicitly given. We use the S&P 500 as the market proxy.


1.3.1 CFA Problem 8 — Portfolio R vs. SML

Concept: The Security Market Line (SML) plots expected return against beta:

\[E(R) = r_f + \beta \cdot [E(R_m) - r_f]\]

To locate R relative to the SML, we need the risk-free rate. Using the S&P 500 as the market (\(E(R_m) = 14\%\), \(\beta_m = 1.0\)), and treating \(r_f\) as an unknown, the expected return on R according to CAPM is:

\[E(R_R)^{CAPM} = r_f + 0.5(14\% - r_f) = r_f(1 - 0.5) + 7\% = 0.5 r_f + 7\%\]

The actual average return on R is 11%. For R to lie on the SML: \(0.5 r_f + 7\% = 11\%\), implying \(r_f = 8\%\).

If \(r_f < 8\%\) (which is realistic for the sample period), then \(E(R_R)^{CAPM} < 11\%\), meaning R outperformed the CAPM benchmark — it lies above the SML.

Answer: (c) Above the SML.

Interpretation: Portfolio R earned a positive alpha relative to its systematic risk, delivering more return per unit of beta than the market equilibrium would suggest.
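Because the conclusion depends on the unknown risk-free rate, it helps to tabulate R's alpha over a range of candidate rates. The grid of `rf` values below is hypothetical; only the return and beta figures come from the reference table.

```r
beta_R  <- 0.5
ret_R   <- 11    # percent
ret_mkt <- 14    # percent

# CAPM-required return for a given risk-free rate
capm_return <- function(rf, beta, ret_mkt) rf + beta * (ret_mkt - rf)

rf_grid <- c(0, 2, 4, 6, 8, 10)    # hypothetical risk-free rates, percent
alpha_R <- ret_R - capm_return(rf_grid, beta_R, ret_mkt)

print(data.frame(rf = rf_grid, alpha = alpha_R))

# Risk-free rate at which R sits exactly on the SML
breakeven_rf <- (ret_R - beta_R * ret_mkt) / (1 - beta_R)
```

Alpha is positive for every rate below the breakeven and turns negative above it, so the "above the SML" answer holds only under the stated assumption about the risk-free rate.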


1.3.2 CFA Problem 9 — Portfolio R vs. CML

Concept: The Capital Market Line (CML) plots expected return against total risk (standard deviation):

\[E(R_p) = r_f + \frac{E(R_m) - r_f}{\sigma_m} \cdot \sigma_p\]

The CML slope (Sharpe ratio of the market) using S&P 500:

\[\text{Sharpe}_{S\&P} = \frac{14\% - r_f}{12\%}\]

For Portfolio R: \(\text{Sharpe}_R = \frac{11\% - r_f}{10\%}\)

R plots above the CML only if its Sharpe ratio exceeds the market's:

\[\frac{11\% - r_f}{10\%} > \frac{14\% - r_f}{12\%}\]

Cross-multiplying (both denominators are positive): \(12(11\% - r_f) > 10(14\% - r_f)\)

\(132\% - 12r_f > 140\% - 10r_f\)

\(132\% - 140\% > 12r_f - 10r_f \implies -8\% > 2r_f \implies r_f < -4\%\)

This condition requires a risk-free rate below −4%. For any non-negative risk-free rate, R’s Sharpe ratio is therefore below that of the market, and Portfolio R lies below the CML.

Answer: (b) Below the CML.

Interpretation: While R generated positive alpha on a beta-adjusted basis (above the SML), it does not compensate investors adequately for its total risk. This apparent contradiction occurs because R has a low beta (0.5) but its total risk (10% σ) is not proportionally low relative to the market (12% σ) — implying significant unsystematic risk that dilutes its total risk-adjusted performance.
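The Sharpe comparison and the size of R's unsystematic risk can both be checked numerically. The `rf_grid` values are hypothetical; the residual-risk figure assumes the single-index decomposition \(\sigma_R^2 = \beta_R^2 \sigma_m^2 + \sigma_\varepsilon^2\).

```r
sharpe <- function(ret, rf, sd) (ret - rf) / sd

rf_grid    <- c(0, 2, 4, 6)                 # hypothetical risk-free rates, percent
sharpe_R   <- sharpe(11, rf_grid, 10)
sharpe_mkt <- sharpe(14, rf_grid, 12)

print(data.frame(rf = rf_grid, sharpe_R = sharpe_R, sharpe_mkt = sharpe_mkt))

# Split R's total risk into systematic and residual parts
sys_sd   <- 0.5 * 12                        # beta * market sd, percent
resid_sd <- sqrt(10^2 - sys_sd^2)           # percent
c(systematic_sd = sys_sd, residual_sd = resid_sd)
```

Under this decomposition, more of R's variance is residual than systematic, which is what pulls its Sharpe ratio below the market's despite its positive alpha.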


1.3.3 CFA Problem 10 — Portfolio A vs. Portfolio B Under CAPM

| | Portfolio A | Portfolio B |
|---|---|---|
| Systematic risk (beta) | 1.0 | 1.0 |
| Specific (idiosyncratic) risk | High | Low |

Answer: Under CAPM, investors should not expect a higher return on Portfolio A than on Portfolio B.

CAPM states that in equilibrium, expected returns are determined solely by systematic risk (beta). Both portfolios have identical betas of 1.0, so their CAPM-implied expected returns are identical:

\[E(R_A) = E(R_B) = r_f + 1.0 \cdot [E(R_m) - r_f] = E(R_m)\]

Portfolio A’s higher specific risk is idiosyncratic and — crucially — diversifiable. In a well-diversified portfolio, this unsystematic variance cancels out. Since rational investors hold diversified portfolios, they bear no cost and receive no compensation for holding idiosyncratic risk. Therefore the market will not price this additional specific risk, and both portfolios carry the same required return despite Portfolio A’s higher total variance.


1.4 Chapter 10

Context for Problems 13–16: McCracken uses a two-factor APT model where factors are (1) changes in real GDP (risk premium = 8%) and (2) changes in inflation (risk premium = 2%). Risk-free rate = 4%.

Fund sensitivities:

| Fund | GDP Sensitivity | Inflation Sensitivity |
|---|---|---|
| High Growth | 1.25 | 1.5 |
| Large Cap | 0.75 | 1.25 |
| Utility | 1.0 | 2.0 |

1.4.1 Problem 13 — APT Expected Return: High Growth Fund

Formula:

\[E(R) = r_f + \beta_{GDP} \cdot \lambda_{GDP} + \beta_{Inf} \cdot \lambda_{Inf}\]

Calculation:

\[E(R_{HG}) = 4\% + (1.25)(8\%) + (1.5)(2\%)\] \[= 4\% + 10\% + 3\% = 17\%\]

Answer: The APT expected return for the High Growth Fund is 17%.

Interpretation: The fund’s above-average sensitivity to GDP growth (β = 1.25) drives the bulk of its risk premium (10 percentage points). Its moderate inflation sensitivity adds another 3 percentage points above the risk-free rate. McCracken’s fundamental analysis confirms this figure, validating the two-factor specification for this asset.


1.4.2 Problem 14 — Arbitrage Opportunity: Large Cap Fund

APT expected return:

\[E(R_{LC})^{APT} = 4\% + (0.75)(8\%) + (1.25)(2\%)\] \[= 4\% + 6\% + 2.5\% = 12.5\%\]

Kwon’s fundamental estimate: \(r_f + 8.5\% = 4\% + 8.5\% = 12.5\%\)

APT expected return = 12.5% and fundamental expected return = 12.5%.

Answer: No arbitrage opportunity is available.

Interpretation: The Large Cap Fund is fairly priced relative to the two-factor APT model. Kwon’s independently derived fundamental estimate, 8.5% above the risk-free rate (\(E(R) = 4\% + 8.5\% = 12.5\%\)), coincides exactly with the model’s prediction. When the expected return implied by fundamentals equals the APT expected return, no riskless profit opportunity exists.
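The APT calculations in Problems 13 and 14 apply one formula to different rows of the sensitivity table, so they can be done in a single matrix product. This is a minimal sketch using the stated factor premia and sensitivities; the Utility Fund figure is simply the same formula applied to its row.

```r
rf     <- 4                                  # risk-free rate, percent
lambda <- c(GDP = 8, Inflation = 2)          # factor risk premia, percent

betas <- matrix(c(1.25, 1.50,
                  0.75, 1.25,
                  1.00, 2.00),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("High Growth", "Large Cap", "Utility"),
                                c("GDP", "Inflation")))

# E(R) = rf + beta_GDP * lambda_GDP + beta_Inf * lambda_Inf, for every fund
apt_return <- rf + drop(betas %*% lambda)
print(apt_return)

# Mispricing relative to Kwon's fundamental estimate for Large Cap (rf + 8.5)
large_cap_gap <- (rf + 8.5) - apt_return["Large Cap"]
```

A gap of zero corresponds to the no-arbitrage conclusion for the Large Cap Fund.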


1.4.3 Problem 15 — Weight of the Utility Fund in the GDP Fund

Objective: Construct a portfolio of High Growth (HG), Large Cap (LC), and Utility (U) with unit exposure to GDP and zero exposure to inflation.

Let \(w_{HG}\), \(w_{LC}\), \(w_U\) be the weights.

Constraints:

  1. GDP sensitivity = 1: \(1.25 w_{HG} + 0.75 w_{LC} + 1.0 w_U = 1\)
  2. Inflation sensitivity = 0: \(1.5 w_{HG} + 1.25 w_{LC} + 2.0 w_U = 0\)
  3. Weights sum to 1: \(w_{HG} + w_{LC} + w_U = 1\)

Solving the system:

From constraint 3: \(w_U = 1 - w_{HG} - w_{LC}\)

Substituting into constraint 1:

\[1.25 w_{HG} + 0.75 w_{LC} + 1.0(1 - w_{HG} - w_{LC}) = 1\] \[0.25 w_{HG} - 0.25 w_{LC} = 0 \implies w_{HG} = w_{LC}\]

Substituting \(w_{HG} = w_{LC} = w\) into constraint 2:

\[1.5w + 1.25w + 2.0(1 - 2w) = 0\] \[2.75w + 2.0 - 4.0w = 0\] \[-1.25w = -2.0 \implies w = 1.6\]

So \(w_{HG} = w_{LC} = 1.6\) and \(w_U = 1 - 1.6 - 1.6 = -2.2\).

Answer: The weight in the Utility Fund is (a) −2.2.

Interpretation: The GDP Fund requires a short position of 2.2 in the Utility Fund to cancel out its high inflation sensitivity (β = 2.0). The leverage involved (long 1.6× each in HG and LC, short 2.2× in Utility) creates a pure GDP-factor exposure, useful for retirees whose income needs track real economic growth but who are harmed by unexpected inflation.
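The three constraints form a linear system that can be solved directly, which is a useful check on the substitution steps. In this sketch the rows of `A` are the GDP, inflation, and budget constraints, and the columns are the three funds; the row label `Budget` is a name introduced here for the weights-sum-to-one constraint.

```r
A <- matrix(c(1.25, 0.75, 1.00,    # GDP sensitivities
              1.50, 1.25, 2.00,    # inflation sensitivities
              1.00, 1.00, 1.00),   # weights sum to one
            nrow = 3, byrow = TRUE,
            dimnames = list(c("GDP", "Inflation", "Budget"),
                            c("HG", "LC", "U")))

b <- c(1, 0, 1)    # targets: unit GDP exposure, zero inflation, fully invested

w <- solve(A, b)
names(w) <- colnames(A)
print(w)

# Verify the resulting factor exposures
exposures <- drop(A %*% w)
```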


1.4.4 Problem 16 — Who Should Hold the GDP Fund?

Answer: (b) Both are correct.

Stiles argues the GDP Fund is suitable for retirees living off investment income, since its pure real-GDP exposure provides a steady income stream correlated with economic activity while being immune to inflation surprises that erode purchasing power.

McCracken argues it is appropriate if supply-side government policies succeed in boosting real GDP growth. In that scenario, the GDP factor premium would be elevated, generating superior risk-adjusted returns for the fund’s unitholders.

Both perspectives are internally consistent. Stiles focuses on the structural income-matching properties; McCracken focuses on the tactical macro environment. The fund serves both purposes simultaneously, making both analysts correct.


2 Questions Using R Code (40%)

2.1 Part 1 — Data Import

The analysis uses adjusted closing prices to account for dividends and stock splits. Raw closing prices would understate returns around ex-dividend dates, because the price drop on those dates is offset by a cash distribution that raw prices ignore; the effect is largest for income-generating ETFs such as TLT and IYR. Adjusted prices ensure that computed returns reflect the total return to a buy-and-hold investor.

# Load required libraries
library(tidyquant)
library(tidyverse)
library(timetk)
library(lubridate)
library(purrr)
library(PerformanceAnalytics)
library(xts)
library(zoo)
library(knitr)
library(kableExtra)
library(ggplot2)
library(scales)
library(frenchdata)   # For Fama-French factor data
# Define ETF tickers
tickers <- c("SPY", "QQQ", "EEM", "IWM", "EFA", "TLT", "IYR", "GLD")

# Download adjusted daily prices from Yahoo Finance
etf_prices_raw <- tq_get(
  tickers,
  from = "2010-01-01",
  to   = Sys.Date(),
  get  = "stock.prices"
)

# Extract adjusted closing prices and pivot to wide format
etf_prices <- etf_prices_raw %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  arrange(date)

# Convert to xts for time series operations
etf_xts <- xts(etf_prices[, -1], order.by = etf_prices$date)

# Display first and last observations
cat("=== First 6 Observations ===\n")
## === First 6 Observations ===
head(etf_xts) %>% round(5) %>% print()
##                 SPY      QQQ      EEM      IWM      EFA      TLT      IYR
## 2010-01-04 84.79639 40.29078 30.35150 51.36657 35.12844 55.70952 26.76811
## 2010-01-05 85.02083 40.29078 30.57180 51.18995 35.15940 56.06930 26.83238
## 2010-01-06 85.08068 40.04776 30.63577 51.14177 35.30802 55.31872 26.82070
## 2010-01-07 85.43987 40.07379 30.45811 51.51910 35.17178 55.41177 27.06026
## 2010-01-08 85.72417 40.40363 30.69972 51.80010 35.45044 55.38696 26.87914
## 2010-01-11 85.84389 40.23871 30.63577 51.59137 35.74147 55.08302 27.00769
##               GLD
## 2010-01-04 109.80
## 2010-01-05 109.70
## 2010-01-06 111.51
## 2010-01-07 110.82
## 2010-01-08 111.37
## 2010-01-11 112.85
cat("\n=== Last 6 Observations ===\n")
## 
## === Last 6 Observations ===
tail(etf_xts) %>% round(5) %>% print()
##               SPY    QQQ   EEM    IWM    EFA   TLT    IYR    GLD
## 2026-06-02 759.57 746.16 70.80 291.66 105.02 85.65  99.99 411.95
## 2026-06-03 754.24 744.21 69.92 287.67 104.12 85.31 100.00 407.87
## 2026-06-04 757.09 740.61 69.10 292.01 104.95 85.50 101.79 411.27
## 2026-06-05 737.55 705.06 64.59 281.65 102.26 85.06 102.54 396.24
## 2026-06-08 739.22 716.07 65.75 284.11 102.88 84.62 101.08 397.27
## 2026-06-09     NA     NA    NA     NA     NA    NA     NA     NA
# Summary statistics for daily prices
cat("\n=== Summary Statistics (Daily Adjusted Prices) ===\n")
## 
## === Summary Statistics (Daily Adjusted Prices) ===
summary(etf_xts) %>% print()
##      Index                 SPY              QQQ              EEM       
##  Min.   :2010-01-04   Min.   : 77.15   Min.   : 36.99   Min.   :22.63  
##  1st Qu.:2014-02-11   1st Qu.:149.36   1st Qu.: 79.12   1st Qu.:31.00  
##  Median :2018-03-20   Median :234.81   Median :153.80   Median :34.75  
##  Mean   :2018-03-19   Mean   :279.27   Mean   :212.13   Mean   :36.17  
##  3rd Qu.:2022-04-26   3rd Qu.:393.90   3rd Qu.:320.82   3rd Qu.:39.24  
##  Max.   :2026-06-09   Max.   :759.57   Max.   :746.16   Max.   :70.80  
##                       NA's   :1        NA's   :1        NA's   :1      
##       IWM              EFA              TLT              IYR        
##  Min.   : 47.11   Min.   : 28.66   Min.   : 54.83   Min.   : 24.80  
##  1st Qu.: 93.70   1st Qu.: 42.71   1st Qu.: 81.89   1st Qu.: 46.76  
##  Median :132.85   Median : 51.44   Median : 89.66   Median : 63.01  
##  Mean   :135.88   Mean   : 54.17   Mean   : 92.02   Mean   : 64.51  
##  3rd Qu.:181.21   3rd Qu.: 64.71   3rd Qu.: 98.74   3rd Qu.: 81.37  
##  Max.   :292.03   Max.   :105.66   Max.   :143.23   Max.   :104.07  
##  NA's   :1        NA's   :1        NA's   :1        NA's   :1       
##       GLD       
##  Min.   :100.5  
##  1st Qu.:121.3  
##  Median :147.6  
##  Mean   :164.6  
##  3rd Qu.:174.1  
##  Max.   :495.9  
##  NA's   :1

Why adjusted prices? Dividends and corporate actions create artificial price discontinuities. For example, when SPY distributes a quarterly dividend, the price drops mechanically on the ex-dividend date. Using raw prices would record that drop as a loss even though the investor receives the dividend in cash. Adjusted prices scale historical prices to reflect these cash flows and any splits, yielding internally consistent return calculations throughout the sample.


2.2 Part 2 — Weekly and Monthly Returns

Simple returns are used rather than log returns to allow direct aggregation across assets into portfolio returns, which is not possible with continuously compounded returns. The simple return formula is:

\[R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1\]
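The aggregation property can be seen in a toy example. All prices, weights, and the starting value below are hypothetical; the point is only that the weighted average of simple returns matches the portfolio's actual return, while the weighted average of log returns does not match the portfolio's log return.

```r
# Hypothetical one-period prices for two assets
p0 <- c(A = 100, B = 50)
p1 <- c(A = 110, B = 45)
w  <- c(A = 0.6, B = 0.4)     # assumed start-of-period weights
v0 <- 1000                    # assumed starting portfolio value

simple_ret <- p1 / p0 - 1
log_ret    <- log(p1 / p0)

# Actual portfolio return from share holdings and end-of-period prices
shares      <- w * v0 / p0
v1          <- sum(shares * p1)
port_simple <- v1 / v0 - 1

weighted_simple <- sum(w * simple_ret)   # matches port_simple
weighted_log    <- sum(w * log_ret)      # differs from log(1 + port_simple)

c(port_simple = port_simple, weighted_simple = weighted_simple,
  port_log = log(1 + port_simple), weighted_log = weighted_log)
```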

# Calculate weekly returns using period-end prices (Friday close)
etf_weekly_returns <- etf_prices_raw %>%
  group_by(symbol) %>%
  tq_transmute(
    select     = adjusted,
    mutate_fun = periodReturn,
    period     = "weekly",
    type       = "arithmetic",
    col_rename = "weekly_return"
  ) %>%
  pivot_wider(names_from = symbol, values_from = weekly_return) %>%
  arrange(date)

cat("=== Weekly Return Summary Statistics ===\n")
## === Weekly Return Summary Statistics ===
etf_weekly_xts <- xts(etf_weekly_returns[, -1],
                      order.by = etf_weekly_returns$date)
summary(etf_weekly_xts) %>% print()
##      Index                 SPY                 QQQ           
##  Min.   :2010-01-08   Min.   :-0.145457   Min.   :-0.112509  
##  1st Qu.:2014-02-15   1st Qu.:-0.008245   1st Qu.:-0.011487  
##  Median :2018-03-26   Median : 0.003828   Median : 0.004731  
##  Mean   :2018-03-26   Mean   : 0.002779   Mean   : 0.003711  
##  3rd Qu.:2022-05-04   3rd Qu.: 0.014952   3rd Qu.: 0.019393  
##  Max.   :2026-06-08   Max.   : 0.120915   Max.   : 0.095381  
##       EEM                 IWM                 EFA           
##  Min.   :-0.132263   Min.   :-0.172645   Min.   :-0.143158  
##  1st Qu.:-0.015685   1st Qu.:-0.013343   1st Qu.:-0.010213  
##  Median : 0.002423   Median : 0.004044   Median : 0.002453  
##  Mean   : 0.001280   Mean   : 0.002447   Mean   : 0.001544  
##  3rd Qu.: 0.018917   3rd Qu.: 0.018787   3rd Qu.: 0.015682  
##  Max.   : 0.101662   Max.   : 0.182565   Max.   : 0.123486  
##       TLT                  IYR                 GLD           
##  Min.   :-0.0769327   Min.   :-0.249291   Min.   :-0.102986  
##  1st Qu.:-0.0107109   1st Qu.:-0.011839   1st Qu.:-0.010754  
##  Median : 0.0014044   Median : 0.003299   Median : 0.002173  
##  Mean   : 0.0006761   Mean   : 0.001948   Mean   : 0.001736  
##  3rd Qu.: 0.0126531   3rd Qu.: 0.016406   3rd Qu.: 0.014597  
##  Max.   : 0.0765322   Max.   : 0.227895   Max.   : 0.087137
# Calculate monthly returns (end-of-month)
etf_monthly_returns <- etf_prices_raw %>%
  group_by(symbol) %>%
  tq_transmute(
    select     = adjusted,
    mutate_fun = periodReturn,
    period     = "monthly",
    type       = "arithmetic",
    col_rename = "monthly_return"
  ) %>%
  pivot_wider(names_from = symbol, values_from = monthly_return) %>%
  arrange(date)

cat("=== Monthly Return Summary Statistics ===\n")
## === Monthly Return Summary Statistics ===
etf_monthly_xts <- xts(etf_monthly_returns[, -1],
                       order.by = etf_monthly_returns$date)
summary(etf_monthly_xts) %>% print()
##      Index                 SPY                QQQ                EEM           
##  Min.   :2010-01-29   Min.   :-0.12487   Min.   :-0.13596   Min.   :-0.178947  
##  1st Qu.:2014-03-07   1st Qu.:-0.01318   1st Qu.:-0.01593   1st Qu.:-0.029245  
##  Median :2018-04-14   Median : 0.01737   Median : 0.01947   Median : 0.006384  
##  Mean   :2018-04-15   Mean   : 0.01187   Mean   : 0.01593   Mean   : 0.005317  
##  3rd Qu.:2022-05-23   3rd Qu.: 0.03699   3rd Qu.: 0.04943   3rd Qu.: 0.035635  
##  Max.   :2026-06-08   Max.   : 0.12698   Max.   : 0.15690   Max.   : 0.162678  
##       IWM                EFA                 TLT            
##  Min.   :-0.21477   Min.   :-0.141067   Min.   :-0.0942389  
##  1st Qu.:-0.02285   1st Qu.:-0.022131   1st Qu.:-0.0236010  
##  Median : 0.01522   Median : 0.010298   Median :-0.0001612  
##  Mean   : 0.01027   Mean   : 0.006459   Mean   : 0.0028628  
##  3rd Qu.: 0.04615   3rd Qu.: 0.033864   3rd Qu.: 0.0248758  
##  Max.   : 0.18244   Max.   : 0.142694   Max.   : 0.1320613  
##       IYR                 GLD           
##  Min.   :-0.196324   Min.   :-0.110623  
##  1st Qu.:-0.023406   1st Qu.:-0.022049  
##  Median : 0.009634   Median : 0.003475  
##  Mean   : 0.007901   Mean   : 0.007600  
##  3rd Qu.: 0.038966   3rd Qu.: 0.036328  
##  Max.   : 0.131896   Max.   : 0.122749
# Annualized statistics
cat("\n=== Annualized Return and Volatility ===\n")
## 
## === Annualized Return and Volatility ===
ann_stats <- data.frame(
  ETF            = colnames(etf_monthly_xts),  # labels come from the data, so order always matches
  Ann_Return     = round(apply(etf_monthly_xts, 2, mean, na.rm = TRUE) * 12, 4),
  Ann_Volatility = round(apply(etf_monthly_xts, 2, sd, na.rm = TRUE) * sqrt(12), 4),
  row.names      = NULL                        # automatic row names, so tickers are not printed twice
)
kable(ann_stats, caption = "Annualized Return and Volatility (Monthly Simple Returns)",
      col.names = c("ETF", "Ann. Return", "Ann. Volatility"))
Annualized Return and Volatility (Monthly Simple Returns)
| ETF | Ann. Return | Ann. Volatility |
|---|---|---|
| SPY | 0.1424 | 0.1449 |
| QQQ | 0.1911 | 0.1769 |
| EEM | 0.0638 | 0.1840 |
| IWM | 0.1233 | 0.1960 |
| EFA | 0.0775 | 0.1563 |
| TLT | 0.0344 | 0.1353 |
| IYR | 0.0948 | 0.1672 |
| GLD | 0.0912 | 0.1624 |

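Interpretation: QQQ (19.1%) and SPY (14.2%) exhibit the highest annualized returns, reflecting the strong U.S. equity bull market over 2010–2026. TLT has both the lowest return (3.4%) and the lowest volatility (13.5%), consistent with its role as an interest-rate-sensitive bond holding, while GLD (9.1%) offers inflation and safe-haven characteristics. IWM and EEM display the highest volatility (19.6% and 18.4%), reflecting small-cap and emerging-market risk respectively, although EEM’s 6.4% return shows that this risk was not rewarded over the sample. Returns here are arithmetic monthly means scaled by 12, not compounded growth rates.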


2.3 Part 3 — Monthly Returns in Tibble Format

# Method 1: Using tk_tbl from timetk
monthly_tbl <- tk_tbl(etf_monthly_xts, rename_index = "date") %>%
  mutate(date = as.Date(date))

cat("=== Tibble Structure ===\n")
## === Tibble Structure ===
glimpse(monthly_tbl)
## Rows: 198
## Columns: 9
## $ date <date> 2010-01-29, 2010-02-26, 2010-03-31, 2010-04-30, 2010-05-28, 2010…
## $ SPY  <dbl> -0.0524134699, 0.0311945048, 0.0608797594, 0.0154699814, -0.07945…
## $ QQQ  <dbl> -0.078198558, 0.046038574, 0.077108982, 0.022425913, -0.073924154…
## $ EEM  <dbl> -0.1037227155, 0.0177638468, 0.0811092779, -0.0016620668, -0.0939…
## $ IWM  <dbl> -0.060487672, 0.044751360, 0.082306456, 0.056784703, -0.075366175…
## $ EFA  <dbl> -0.074916356, 0.002667738, 0.063853845, -0.028045451, -0.11192821…
## $ TLT  <dbl> 0.0278356161, -0.0034235540, -0.0205728497, 0.0332181080, 0.05108…
## $ IYR  <dbl> -0.051953952, 0.054570722, 0.097484722, 0.063881356, -0.056835890…
## $ GLD  <dbl> -0.034972713, 0.032748219, -0.004386396, 0.058834363, 0.030513147…
cat("\n=== First 6 Rows ===\n")
## 
## === First 6 Rows ===
head(monthly_tbl) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Monthly Simple Returns (Tibble Format)")
Monthly Simple Returns (Tibble Format)
| date | SPY | QQQ | EEM | IWM | EFA | TLT | IYR | GLD |
|---|---|---|---|---|---|---|---|---|
| 2010-01-29 | -0.0524 | -0.0782 | -0.1037 | -0.0605 | -0.0749 | 0.0278 | -0.0520 | -0.0350 |
| 2010-02-26 | 0.0312 | 0.0460 | 0.0178 | 0.0448 | 0.0027 | -0.0034 | 0.0546 | 0.0327 |
| 2010-03-31 | 0.0609 | 0.0771 | 0.0811 | 0.0823 | 0.0639 | -0.0206 | 0.0975 | -0.0044 |
| 2010-04-30 | 0.0155 | 0.0224 | -0.0017 | 0.0568 | -0.0280 | 0.0332 | 0.0639 | 0.0588 |
| 2010-05-28 | -0.0795 | -0.0739 | -0.0939 | -0.0754 | -0.1119 | 0.0511 | -0.0568 | 0.0305 |
| 2010-06-30 | -0.0517 | -0.0598 | -0.0140 | -0.0774 | -0.0206 | 0.0580 | -0.0467 | 0.0236 |

The tibble format provides a clean, tidy data structure compatible with dplyr and ggplot2 workflows. Each row corresponds to one month-end date, with columns representing the simple return for each ETF. This format facilitates subsequent merging with Fama-French factor data.


2.4 Part 4 — Fama-French Three Factors

The Fama-French three-factor model extends CAPM by adding two additional systematic risk factors:

  • MKT-RF (Market factor): Excess return of the market portfolio over the risk-free rate, capturing broad equity risk.
  • SMB (Small Minus Big): Return spread between small-cap and large-cap stocks, capturing the size premium. Small firms historically earn higher average returns.
  • HML (High Minus Low): Return spread between high and low book-to-market stocks, capturing the value premium. Value stocks historically outperform growth stocks.
# Download Fama-French 3 Factor monthly data
ff_data <- download_french_data("Fama/French 3 Factors")

ff_monthly <- ff_data$subsets$data[[1]] %>%
  as_tibble() %>%
  mutate(
    date    = as.Date(paste0(date, "01"), format = "%Y%m%d"),
    # Convert from percentage to decimal
    `Mkt-RF` = `Mkt-RF` / 100,
    SMB      = SMB / 100,
    HML      = HML / 100,
    RF       = RF  / 100
  ) %>%
  filter(date >= as.Date("2010-01-01"))

cat("=== Fama-French 3 Factors — Summary Statistics ===\n")
## === Fama-French 3 Factors — Summary Statistics ===
ff_monthly %>%
  select(`Mkt-RF`, SMB, HML, RF) %>%
  summary() %>%
  print()
##      Mkt-RF              SMB                  HML                  RF          
##  Min.   :-0.13370   Min.   :-0.0593000   Min.   :-0.138300   Min.   :0.000000  
##  1st Qu.:-0.01333   1st Qu.:-0.0193500   1st Qu.:-0.018625   1st Qu.:0.000000  
##  Median : 0.01390   Median : 0.0008500   Median :-0.003750   Median :0.000100  
##  Mean   : 0.01086   Mean   :-0.0007413   Mean   :-0.000498   Mean   :0.001136  
##  3rd Qu.: 0.03458   3rd Qu.: 0.0133500   3rd Qu.: 0.016300   3rd Qu.:0.001900  
##  Max.   : 0.13600   Max.   : 0.0714000   Max.   : 0.128600   Max.   :0.004800
cat("\n=== First 6 Rows ===\n")
## 
## === First 6 Rows ===
head(ff_monthly) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Fama-French 3 Factors (Decimal Format)")
Fama-French 3 Factors (Decimal Format)
date Mkt-RF SMB HML RF
2010-01-01 -0.0335 0.0043 0.0033 0e+00
2010-02-01 0.0339 0.0118 0.0318 0e+00
2010-03-01 0.0630 0.0146 0.0219 1e-04
2010-04-01 0.0199 0.0484 0.0296 1e-04
2010-05-01 -0.0790 0.0013 -0.0248 1e-04
2010-06-01 -0.0556 -0.0179 -0.0473 1e-04

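Factor interpretation: Over the sample period, the market factor (MKT-RF) is the dominant source of systematic return variation across equity and other risk assets, with a mean monthly excess return of about 1.09%. SMB tends to be positive in risk-on environments and negative during flight-to-quality episodes. In this sample both SMB and HML average slightly below zero per month (about -0.07% and -0.05%, respectively), so neither the size nor the value premium was rewarded on average after 2010. HML in particular experienced extended drawdowns during the post-GFC growth era (2010–2020) before recovering during the value rotation of 2021–2022.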


2.5 Part 5 — Merge ETF Returns and Fama-French Factors

# Align dates: FF data uses first-of-month; ETF monthly_tbl uses last-of-month
# Create a common year-month key for merging
ff_merge <- ff_monthly %>%
  mutate(ym = format(date, "%Y-%m")) %>%
  select(ym, `Mkt-RF`, SMB, HML, RF)

etf_merge <- monthly_tbl %>%
  mutate(ym = format(date, "%Y-%m")) %>%
  select(ym, date, everything())

# Merge on year-month
merged_tbl <- left_join(etf_merge, ff_merge, by = "ym") %>%
  select(-ym) %>%
  arrange(date) %>%
  drop_na()

cat("=== Merged Dataset Structure ===\n")
## === Merged Dataset Structure ===
glimpse(merged_tbl)
## Rows: 196
## Columns: 13
## $ date     <date> 2010-01-29, 2010-02-26, 2010-03-31, 2010-04-30, 2010-05-28, …
## $ SPY      <dbl> -0.0524134699, 0.0311945048, 0.0608797594, 0.0154699814, -0.0…
## $ QQQ      <dbl> -0.078198558, 0.046038574, 0.077108982, 0.022425913, -0.07392…
## $ EEM      <dbl> -0.1037227155, 0.0177638468, 0.0811092779, -0.0016620668, -0.…
## $ IWM      <dbl> -0.060487672, 0.044751360, 0.082306456, 0.056784703, -0.07536…
## $ EFA      <dbl> -0.074916356, 0.002667738, 0.063853845, -0.028045451, -0.1119…
## $ TLT      <dbl> 0.0278356161, -0.0034235540, -0.0205728497, 0.0332181080, 0.0…
## $ IYR      <dbl> -0.051953952, 0.054570722, 0.097484722, 0.063881356, -0.05683…
## $ GLD      <dbl> -0.034972713, 0.032748219, -0.004386396, 0.058834363, 0.03051…
## $ `Mkt-RF` <dbl> -0.0335, 0.0339, 0.0630, 0.0199, -0.0790, -0.0556, 0.0692, -0…
## $ SMB      <dbl> 0.0043, 0.0118, 0.0146, 0.0484, 0.0013, -0.0179, 0.0022, -0.0…
## $ HML      <dbl> 0.0033, 0.0318, 0.0219, 0.0296, -0.0248, -0.0473, -0.0050, -0…
## $ RF       <dbl> 0e+00, 0e+00, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04…
cat("\n=== Missing Value Check ===\n")
## 
## === Missing Value Check ===
colSums(is.na(merged_tbl)) %>% print()
##   date    SPY    QQQ    EEM    IWM    EFA    TLT    IYR    GLD Mkt-RF    SMB 
##      0      0      0      0      0      0      0      0      0      0      0 
##    HML     RF 
##      0      0
cat("\n=== First 6 Rows of Merged Data ===\n")
## 
## === First 6 Rows of Merged Data ===
head(merged_tbl) %>%
  mutate(across(where(is.numeric), ~round(., 4))) %>%
  kable(caption = "Merged ETF Returns and Fama-French Factors")
Merged ETF Returns and Fama-French Factors
date SPY QQQ EEM IWM EFA TLT IYR GLD Mkt-RF SMB HML RF
2010-01-29 -0.0524 -0.0782 -0.1037 -0.0605 -0.0749 0.0278 -0.0520 -0.0350 -0.0335 0.0043 0.0033 0e+00
2010-02-26 0.0312 0.0460 0.0178 0.0448 0.0027 -0.0034 0.0546 0.0327 0.0339 0.0118 0.0318 0e+00
2010-03-31 0.0609 0.0771 0.0811 0.0823 0.0639 -0.0206 0.0975 -0.0044 0.0630 0.0146 0.0219 1e-04
2010-04-30 0.0155 0.0224 -0.0017 0.0568 -0.0280 0.0332 0.0639 0.0588 0.0199 0.0484 0.0296 1e-04
2010-05-28 -0.0795 -0.0739 -0.0939 -0.0754 -0.1119 0.0511 -0.0568 0.0305 -0.0790 0.0013 -0.0248 1e-04
2010-06-30 -0.0517 -0.0598 -0.0140 -0.0774 -0.0206 0.0580 -0.0467 0.0236 -0.0556 -0.0179 -0.0473 1e-04
cat("\n=== Total Observations: ", nrow(merged_tbl), "months ===\n")
## 
## === Total Observations:  196 months ===
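
The year-month key used for the merge can be sanity-checked before joining. The following is a minimal base-R sketch on small made-up date vectors (not the report's data); it confirms that the key is unique on each side and lists the months that a left join followed by `drop_na()` would discard.

```r
# Toy dates only: month-end ETF dates vs. first-of-month factor dates
etf_dates <- as.Date(c("2010-01-29", "2010-02-26", "2010-03-31"))
ff_dates  <- as.Date(c("2010-01-01", "2010-02-01"))

# Same year-month key as in the merge step
etf_ym <- format(etf_dates, "%Y-%m")
ff_ym  <- format(ff_dates,  "%Y-%m")

# Keys must be unique on both sides, otherwise a join duplicates rows
stopifnot(!anyDuplicated(etf_ym), !anyDuplicated(ff_ym))

# Months present in the ETF data but missing from the factor data
unmatched <- setdiff(etf_ym, ff_ym)
print(unmatched)
```

An empty result means every ETF month has a matching factor row; a non-empty result usually indicates that the factor file has not yet been updated for the most recent months.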

2.6 Part 6 — CAPM-Based GMV Portfolio (2010/02–2015/01)

2.6.1 Methodology

The CAPM single-index model estimates each asset’s return as:

\[R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t}) + \varepsilon_{i,t}\]

The CAPM-implied covariance matrix is:

\[\Sigma^{CAPM} = \boldsymbol{\beta}\boldsymbol{\beta}'\sigma_m^2 + D_\varepsilon\]

where \(\boldsymbol{\beta}\) is the vector of beta estimates, \(\sigma_m^2\) is the market variance, and \(D_\varepsilon = \text{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)\) is the diagonal matrix of residual variances.
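
As an illustrative check (not part of the original computation), the element-wise form of this matrix can be compared with the estimates reported later in this section. The market variance is not printed in the report, so the value below is back-solved from the SPY diagonal entry and should be read as approximate.

\[\Sigma^{CAPM}_{ij} = \beta_i \beta_j \sigma_m^2 \quad (i \neq j), \qquad \Sigma^{CAPM}_{ii} = \beta_i^2 \sigma_m^2 + \sigma_{\varepsilon_i}^2\]

\[\sigma_m^2 \approx \frac{0.001397 - 0.000009}{0.9550^2} \approx 0.00152, \qquad \Sigma^{CAPM}_{SPY,QQQ} \approx 0.9550 \times 1.0042 \times 0.00152 \approx 0.00146\]

This agrees with the reported SPY-QQQ covariance of 0.001459 up to rounding.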

The Global Minimum Variance (GMV) portfolio minimizes:

\[\min_{\mathbf{w}} \mathbf{w}'\Sigma\mathbf{w} \quad \text{subject to} \quad \mathbf{w}'\mathbf{1} = 1\]

Analytical solution:

\[\mathbf{w}^* = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}\]
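
The closed-form solution can be sketched in a few lines of base R. The helper name `gmv_weights` and the two-asset covariance matrix are made up for illustration; the sketch solves the linear system \(\Sigma \mathbf{z} = \mathbf{1}\) rather than forming the inverse explicitly, which is an implementation choice and not necessarily what the analysis below does.

```r
# GMV weights: w* = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
gmv_weights <- function(Sigma) {
  ones <- rep(1, nrow(Sigma))
  z <- solve(Sigma, ones)  # solves Sigma %*% z = ones without an explicit inverse
  z / sum(z)               # normalize so the weights sum to one
}

# Toy two-asset covariance matrix (made-up numbers)
Sigma_toy <- matrix(c(0.04, 0.01,
                      0.01, 0.09), nrow = 2, byrow = TRUE)
w <- gmv_weights(Sigma_toy)

# Two-asset closed form for comparison:
# w1 = (s2^2 - s12) / (s1^2 + s2^2 - 2 * s12)
w1_closed <- (0.09 - 0.01) / (0.04 + 0.09 - 2 * 0.01)

print(w)
print(w1_closed)
```

In the two-asset case the first weight should coincide with the textbook formula, which gives a quick way to validate the general function before applying it to a larger matrix.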

# Training window: 2010/02 – 2015/01
train_data <- merged_tbl %>%
  filter(date >= as.Date("2010-02-01") & date <= as.Date("2015-01-31"))

n_assets <- length(tickers)
excess_returns <- as.matrix(train_data[, tickers]) - train_data$RF
mkt_excess     <- train_data$`Mkt-RF`

# Estimate CAPM betas and residual variances
capm_betas  <- numeric(n_assets)
capm_alphas <- numeric(n_assets)
capm_resid_var <- numeric(n_assets)
names(capm_betas) <- names(capm_alphas) <- names(capm_resid_var) <- tickers

for (i in seq_along(tickers)) {
  fit <- lm(excess_returns[, i] ~ mkt_excess)
  capm_alphas[i]   <- coef(fit)[1]
  capm_betas[i]    <- coef(fit)[2]
  capm_resid_var[i] <- var(resid(fit))
}

sigma_m2 <- var(mkt_excess)

# Construct CAPM covariance matrix
beta_mat  <- matrix(capm_betas, ncol = 1)
Sigma_CAPM <- sigma_m2 * (beta_mat %*% t(beta_mat)) + diag(capm_resid_var)
rownames(Sigma_CAPM) <- colnames(Sigma_CAPM) <- tickers

cat("=== CAPM Beta Estimates ===\n")
## === CAPM Beta Estimates ===
data.frame(
  ETF   = tickers,
  Alpha = round(capm_alphas, 4),
  Beta  = round(capm_betas, 4),
  ResidVar = round(capm_resid_var, 6),
  row.names = NULL  # drop row names so the ticker is not printed twice
) %>% kable(caption = "CAPM Parameter Estimates (2010/02–2015/01)")
CAPM Parameter Estimates (2010/02–2015/01)
ETF Alpha Beta ResidVar
SPY 0.0003 0.9550 0.000009
QQQ 0.0032 1.0042 0.000281
EEM -0.0119 1.1878 0.001203
IWM -0.0028 1.2584 0.000290
EFA -0.0078 1.0879 0.000612
TLT 0.0193 -0.6961 0.000884
IYR 0.0041 0.8114 0.001023
GLD 0.0018 0.1618 0.002859
cat("\n=== CAPM Covariance Matrix ===\n")
## 
## === CAPM Covariance Matrix ===
round(Sigma_CAPM, 6) %>% kable(caption = "CAPM-Based Covariance Matrix")
CAPM-Based Covariance Matrix
SPY QQQ EEM IWM EFA TLT IYR GLD
SPY 0.001397 0.001459 0.001726 0.001828 0.001581 -0.001011 0.001179 0.000235
QQQ 0.001459 0.001816 0.001815 0.001923 0.001662 -0.001064 0.001240 0.000247
EEM 0.001726 0.001815 0.003350 0.002274 0.001966 -0.001258 0.001466 0.000292
IWM 0.001828 0.001923 0.002274 0.002699 0.002083 -0.001333 0.001554 0.000310
EFA 0.001581 0.001662 0.001966 0.002083 0.002413 -0.001152 0.001343 0.000268
TLT -0.001011 -0.001064 -0.001258 -0.001333 -0.001152 0.001621 -0.000859 -0.000171
IYR 0.001179 0.001240 0.001466 0.001554 0.001343 -0.000859 0.002025 0.000200
GLD 0.000235 0.000247 0.000292 0.000310 0.000268 -0.000171 0.000200 0.002899
# GMV weights: w* = Sigma^{-1} * 1 / (1' * Sigma^{-1} * 1)
ones         <- rep(1, n_assets)
Sigma_inv    <- solve(Sigma_CAPM)
gmv_weights_capm <- as.vector(Sigma_inv %*% ones) / as.numeric(t(ones) %*% Sigma_inv %*% ones)
names(gmv_weights_capm) <- tickers

cat("\n=== CAPM GMV Portfolio Weights (as of 2015/01) ===\n")
## 
## === CAPM GMV Portfolio Weights (as of 2015/01) ===
data.frame(ETF = tickers, Weight = round(gmv_weights_capm, 4),
           row.names = NULL) %>%  # row.names = NULL avoids printing the ticker twice
  kable(caption = "CAPM GMV Optimal Weights")
CAPM GMV Optimal Weights
ETF Weight
SPY 0.7748
QQQ -0.0130
EEM -0.0361
IWM -0.2029
EFA -0.0357
TLT 0.4131
IYR 0.0373
GLD 0.0626
# Realized return in 2015/02
feb2015 <- merged_tbl %>% filter(format(date, "%Y-%m") == "2015-02")
realized_capm <- sum(gmv_weights_capm * as.numeric(feb2015[, tickers]))

cat(sprintf("\nRealized CAPM GMV Portfolio Return — 2015/02: %.4f (%.2f%%)\n",
            realized_capm, realized_capm * 100))
## 
## Realized CAPM GMV Portfolio Return — 2015/02: -0.0033 (-0.33%)

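Interpretation: The CAPM GMV portfolio concentrates in SPY (77.5%) and TLT (41.3%). SPY has by far the smallest residual variance (0.000009), and TLT is the only asset with a negative beta (-0.70), so a long TLT position offsets the market exposure of the equity holdings. GLD has a low beta (0.16) but the largest residual variance (0.002859), which limits its weight to 6.3%. The remaining high-beta equity ETFs receive negative weights, most notably IWM (-20.3%, beta 1.26), with smaller shorts in EEM, EFA, and QQQ. The realized return in February 2015 was -0.33%. A single month says little about the strategy, and the GMV objective targets low variance rather than high return, so this figure should not be read as evidence for or against the approach.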

Why GMV may outperform equal-weight: The equal-weight portfolio ignores the covariance structure entirely. By tilting toward low-variance and negatively correlated assets, the GMV portfolio targets a lower variance, although estimation error means the realized variance is not guaranteed to be lower out of sample. Holding the arithmetic mean return fixed, lower volatility compounds to higher terminal wealth via the variance-drag relationship: \(E[\text{geometric return}] \approx \mu - \sigma^2/2\).
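
The variance-drag approximation can be illustrated with a two-period toy example. The returns below are made up, and the population variance (dividing by the number of observations) is used so that the arithmetic is easy to follow by hand.

```r
# Toy returns: +10% followed by -10% (made-up numbers)
r <- c(0.10, -0.10)

mu     <- mean(r)                           # arithmetic mean return
sigma2 <- mean((r - mu)^2)                  # population variance of returns
geo    <- prod(1 + r)^(1 / length(r)) - 1   # per-period geometric return

# Variance-drag approximation: geometric return ~ mu - sigma^2 / 2
approx_geo <- mu - sigma2 / 2

print(c(arithmetic = mu, geometric = geo, approximation = approx_geo))
```

The arithmetic mean is zero, yet the investor ends with 0.99 of initial wealth; the approximation captures almost all of that shortfall.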


2.7 Part 7 — Fama-French Three-Factor GMV Portfolio (2010/02–2015/01)

2.7.1 Methodology

The FF3 model is:

\[R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,MKT}(R_{m,t} - R_{f,t}) + \beta_{i,SMB} \cdot SMB_t + \beta_{i,HML} \cdot HML_t + \varepsilon_{i,t}\]

The FF3 covariance matrix:

\[\Sigma^{FF3} = \mathbf{B} \Sigma_F \mathbf{B}' + D_\varepsilon\]

where \(\mathbf{B}\) is the \(n \times 3\) matrix of factor loadings, \(\Sigma_F\) is the \(3 \times 3\) factor covariance matrix, and \(D_\varepsilon\) is the diagonal residual variance matrix.
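
The construction of \(\Sigma^{FF3}\) can be sketched on simulated data. Everything in the block below (the simulated factors, the "true" loadings, the asset names `A1` to `A4`) is made up for illustration, and the sketch fits all assets in one multi-response `lm()` call, which is an alternative to looping over assets. It also shows a useful property: when loadings are estimated by OLS with an intercept, the diagonal of the factor-model covariance matrix equals the sample variance of each asset, so factor models of this kind differ only in their off-diagonal entries.

```r
set.seed(1)
n_obs <- 60
n_sim_assets <- 4

# Simulated factor returns and asset returns (illustration only)
F_sim <- matrix(rnorm(n_obs * 3, sd = 0.03), ncol = 3,
                dimnames = list(NULL, c("MKT", "SMB", "HML")))
B_true <- matrix(runif(n_sim_assets * 3, -0.5, 1.2), nrow = n_sim_assets)
Y_sim <- F_sim %*% t(B_true) +
  matrix(rnorm(n_obs * n_sim_assets, sd = 0.01), ncol = n_sim_assets)
colnames(Y_sim) <- paste0("A", seq_len(n_sim_assets))

# One multi-response regression: each column of Y_sim on the three factors
fit <- lm(Y_sim ~ F_sim)
B_hat     <- t(coef(fit)[-1, , drop = FALSE])  # n_assets x 3 loadings (intercept row dropped)
resid_var <- apply(resid(fit), 2, var)         # residual variance per asset

# Factor-model covariance: B Sigma_F B' + D_eps
Sigma_hat <- B_hat %*% cov(F_sim) %*% t(B_hat) + diag(resid_var)

# Diagonal matches the plain sample variances
print(cbind(model = diag(Sigma_hat), sample = apply(Y_sim, 2, var)))
```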

# Factor matrix for training period
F_mat <- as.matrix(train_data[, c("Mkt-RF", "SMB", "HML")])

# Estimate FF3 loadings
ff3_loadings   <- matrix(NA, nrow = n_assets, ncol = 3,
                         dimnames = list(tickers, c("MKT","SMB","HML")))
ff3_alphas     <- numeric(n_assets); names(ff3_alphas) <- tickers
ff3_resid_var  <- numeric(n_assets); names(ff3_resid_var) <- tickers

for (i in seq_along(tickers)) {
  fit <- lm(excess_returns[, i] ~ F_mat)
  ff3_alphas[i]      <- coef(fit)[1]
  ff3_loadings[i, ]  <- coef(fit)[2:4]
  ff3_resid_var[i]   <- var(resid(fit))
}

Sigma_F  <- cov(F_mat)

# FF3 covariance matrix
Sigma_FF3 <- ff3_loadings %*% Sigma_F %*% t(ff3_loadings) + diag(ff3_resid_var)
rownames(Sigma_FF3) <- colnames(Sigma_FF3) <- tickers

cat("=== Fama-French 3-Factor Loadings ===\n")
## === Fama-French 3-Factor Loadings ===
data.frame(
  ETF     = tickers,
  Alpha   = round(ff3_alphas, 4),
  MKT     = round(ff3_loadings[,"MKT"], 4),
  SMB     = round(ff3_loadings[,"SMB"], 4),
  HML     = round(ff3_loadings[,"HML"], 4),
  ResidVar = round(ff3_resid_var, 6),
  row.names = NULL  # drop row names so the ticker is not printed twice
) %>% kable(caption = "FF3 Factor Loadings (2010/02–2015/01)")
FF3 Factor Loadings (2010/02–2015/01)
ETF Alpha MKT SMB HML ResidVar
SPY 0.0000 0.9879 -0.1388 0.0087 0.000003
QQQ 0.0014 1.1147 -0.1592 -0.4276 0.000214
EEM -0.0126 1.2233 -0.0054 -0.2058 0.001189
IWM -0.0005 1.0181 0.9046 0.0973 0.000017
EFA -0.0096 1.2376 -0.4561 -0.2207 0.000531
TLT 0.0175 -0.5883 -0.1370 -0.4447 0.000814
IYR 0.0039 0.8218 -0.0023 -0.0592 0.001022
GLD 0.0004 0.1576 0.5633 -0.8145 0.002508
cat("\n=== FF3 Covariance Matrix ===\n")
## 
## === FF3 Covariance Matrix ===
round(Sigma_FF3, 6) %>% kable(caption = "FF3-Based Covariance Matrix")
FF3-Based Covariance Matrix
SPY QQQ EEM IWM EFA TLT IYR GLD
SPY 0.001397 0.001464 0.001725 0.001787 0.001601 -0.001008 0.001179 0.000204
QQQ 0.001464 0.001816 0.001844 0.001870 0.001713 -0.000995 0.001248 0.000337
EEM 0.001725 0.001844 0.003350 0.002270 0.001980 -0.001228 0.001470 0.000350
IWM 0.001787 0.001870 0.002270 0.002699 0.001943 -0.001379 0.001552 0.000469
EFA 0.001601 0.001713 0.001980 0.001943 0.002413 -0.001104 0.001347 0.000237
TLT -0.001008 -0.000995 -0.001228 -0.001379 -0.001104 0.001621 -0.000851 -0.000072
IYR 0.001179 0.001248 0.001470 0.001552 0.001347 -0.000851 0.002025 0.000216
GLD 0.000204 0.000337 0.000350 0.000469 0.000237 -0.000072 0.000216 0.002899
# GMV weights
Sigma_inv_ff3      <- solve(Sigma_FF3)
gmv_weights_ff3    <- as.vector(Sigma_inv_ff3 %*% ones) / as.numeric(t(ones) %*% Sigma_inv_ff3 %*% ones)
names(gmv_weights_ff3) <- tickers

cat("\n=== FF3 GMV Portfolio Weights (as of 2015/01) ===\n")
## 
## === FF3 GMV Portfolio Weights (as of 2015/01) ===
data.frame(ETF = tickers, Weight = round(gmv_weights_ff3, 4),
           row.names = NULL) %>%  # row.names = NULL avoids printing the ticker twice
  kable(caption = "FF3 GMV Optimal Weights")
FF3 GMV Optimal Weights
ETF Weight
SPY 0.8828
QQQ -0.1425
EEM -0.0431
IWM -0.1153
EFA -0.1037
TLT 0.4159
IYR 0.0368
GLD 0.0691
# Realized return in 2015/02
realized_ff3 <- sum(gmv_weights_ff3 * as.numeric(feb2015[, tickers]))

cat(sprintf("\nRealized FF3 GMV Portfolio Return — 2015/02: %.4f (%.2f%%)\n",
            realized_ff3, realized_ff3 * 100))
## 
## Realized FF3 GMV Portfolio Return — 2015/02: -0.0066 (-0.66%)
cat("\n=== Comparison: CAPM vs FF3 (2015/02 Realized Return) ===\n")
## 
## === Comparison: CAPM vs FF3 (2015/02 Realized Return) ===
data.frame(
  Model          = c("CAPM", "FF3"),
  Realized_Return = round(c(realized_capm, realized_ff3), 4)
) %>% kable(caption = "Single-Period Realized Return Comparison")
Single-Period Realized Return Comparison
Model Realized_Return
CAPM -0.0033
FF3 -0.0066

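Comparison with CAPM: The FF3 model introduces two additional sources of systematic variation. For ETFs with distinct size or style exposures, the extra loadings materially alter the residual variance estimates: IWM (small-cap) loads 0.90 on SMB and its residual variance falls from 0.000290 under CAPM to 0.000017 under FF3, while EFA (international developed markets) loads -0.46 on SMB and -0.22 on HML. Because both models are fit by OLS with an intercept, the diagonals of the two covariance matrices are identical (each entry equals the sample variance of the asset's excess return); the models differ only in the off-diagonal covariances. The resulting weights shift further into SPY (88.3% versus 77.5%) with larger shorts in QQQ and EFA, and the February 2015 realized return is -0.66% versus -0.33% for CAPM. If the FF3 model better captures true factor exposures, its covariance matrix will be more accurate and the resulting GMV portfolio should exhibit lower out-of-sample variance, but a single month cannot settle that question.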


2.8 Part 8 — Rolling Window Backtest (2015/02–2026/05)

2.8.1 Methodology

The backtest implements a rolling 60-month estimation window. At each month \(t\):

  1. Use returns from months \(t-60\) to \(t-1\) to estimate model parameters.
  2. Construct the covariance matrix under CAPM or FF3.
  3. Solve for GMV weights analytically.
  4. Record the realized return in month \(t\).

This procedure avoids look-ahead bias and simulates what a practitioner could have implemented in real time.
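
The index bookkeeping behind this procedure can be sketched separately from the estimation itself. The helper `rolling_windows` below is hypothetical and works on row positions only; it returns, for each investment month, the 60 training rows that strictly precede it, which makes the absence of look-ahead easy to verify.

```r
# For each test row t, the training rows are (t - width) ... (t - 1)
rolling_windows <- function(n_total, first_test, width = 60) {
  stopifnot(first_test > width, first_test <= n_total)
  lapply(seq(first_test, n_total), function(t) {
    list(train = (t - width):(t - 1), test = t)
  })
}

# Toy setup: 65 monthly rows, first investment month is row 61
wins <- rolling_windows(n_total = 65, first_test = 61, width = 60)

# No look-ahead: the last training row always precedes the test row
no_lookahead <- all(sapply(wins, function(w) max(w$train) < w$test))
print(length(wins))
print(range(wins[[1]]$train))
print(no_lookahead)
```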

# Full merged data sorted
full_data <- merged_tbl %>% arrange(date)
all_dates <- full_data$date
n_total   <- nrow(full_data)

# Investment period: 2015/02 onwards
invest_start <- which(format(all_dates, "%Y-%m") == "2015-02")[1]

# Initialize result vectors
n_periods    <- n_total - invest_start + 1
port_ret_capm <- numeric(n_periods)
port_ret_ff3  <- numeric(n_periods)
invest_dates  <- all_dates[invest_start:n_total]

# Weight matrices for visualization
weights_capm_mat <- matrix(NA, nrow = n_periods, ncol = n_assets,
                           dimnames = list(NULL, tickers))
weights_ff3_mat  <- matrix(NA, nrow = n_periods, ncol = n_assets,
                           dimnames = list(NULL, tickers))

for (k in seq_len(n_periods)) {
  t_idx <- invest_start + k - 1

  # Training window: 60 months ending at t-1
  train_end   <- t_idx - 1
  train_start <- train_end - 59

  if (train_start < 1) next

  window_data <- full_data[train_start:train_end, ]
  exc_ret     <- as.matrix(window_data[, tickers]) - window_data$RF
  mkt_exc     <- window_data$`Mkt-RF`
  f_factors   <- as.matrix(window_data[, c("Mkt-RF", "SMB", "HML")])

  # ---- CAPM covariance ----
  b_capm <- numeric(n_assets)
  rv_capm <- numeric(n_assets)
  for (i in seq_along(tickers)) {
    fit <- lm(exc_ret[, i] ~ mkt_exc)
    b_capm[i]  <- coef(fit)[2]
    rv_capm[i] <- var(resid(fit))
  }
  sm2   <- var(mkt_exc)
  Sc    <- sm2 * outer(b_capm, b_capm) + diag(rv_capm)

  Sc_inv <- tryCatch(solve(Sc), error = function(e) NULL)
  if (is.null(Sc_inv)) next

  w_capm <- as.vector(Sc_inv %*% ones) / as.numeric(t(ones) %*% Sc_inv %*% ones)

  # ---- FF3 covariance ----
  B_ff3  <- matrix(NA, nrow = n_assets, ncol = 3)
  rv_ff3 <- numeric(n_assets)
  for (i in seq_along(tickers)) {
    fit <- lm(exc_ret[, i] ~ f_factors)
    B_ff3[i, ]  <- coef(fit)[2:4]
    rv_ff3[i]   <- var(resid(fit))
  }
  Sf   <- cov(f_factors)
  Sff3 <- B_ff3 %*% Sf %*% t(B_ff3) + diag(rv_ff3)

  Sff3_inv <- tryCatch(solve(Sff3), error = function(e) NULL)
  if (is.null(Sff3_inv)) next

  w_ff3 <- as.vector(Sff3_inv %*% ones) / as.numeric(t(ones) %*% Sff3_inv %*% ones)

  # ---- Realized returns in month t ----
  actual_ret <- as.numeric(full_data[t_idx, tickers])
  port_ret_capm[k] <- sum(w_capm * actual_ret)
  port_ret_ff3[k]  <- sum(w_ff3  * actual_ret)
  weights_capm_mat[k, ] <- w_capm
  weights_ff3_mat[k, ]  <- w_ff3
}

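# Keep only months where both models produced weights. The weight matrices
# start as NA and are filled only on success, so this does not discard a
# genuine 0% portfolio return the way a `!= 0` filter would.
valid_idx <- which(complete.cases(weights_capm_mat) & complete.cases(weights_ff3_mat))
port_ret_capm <- port_ret_capm[valid_idx]
port_ret_ff3  <- port_ret_ff3[valid_idx]
invest_dates  <- invest_dates[valid_idx]
weights_capm_mat <- weights_capm_mat[valid_idx, , drop = FALSE]
weights_ff3_mat  <- weights_ff3_mat[valid_idx, , drop = FALSE]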

2.8.2 Performance Evaluation

# Function to compute performance metrics
perf_metrics <- function(returns, label) {
  n_months    <- length(returns)
  ann_ret     <- mean(returns) * 12
  ann_vol     <- sd(returns) * sqrt(12)
  sharpe      <- ann_ret / ann_vol   # assuming rf ~ 0 for simplicity
  cum_ret     <- prod(1 + returns) - 1

  # Maximum drawdown
  cum_wealth  <- cumprod(1 + returns)
  running_max <- cummax(cum_wealth)
  drawdowns   <- (cum_wealth - running_max) / running_max
  max_dd      <- min(drawdowns)

  # Calmar ratio
  calmar      <- ann_ret / abs(max_dd)

  data.frame(
    Model               = label,
    Ann_Return          = round(ann_ret, 4),
    Ann_Volatility      = round(ann_vol, 4),
    Sharpe_Ratio        = round(sharpe, 4),
    Max_Drawdown        = round(max_dd, 4),
    Calmar_Ratio        = round(calmar, 4),
    Cumulative_Return   = round(cum_ret, 4)
  )
}

perf_capm <- perf_metrics(port_ret_capm, "CAPM GMV")
perf_ff3  <- perf_metrics(port_ret_ff3,  "FF3 GMV")

perf_table <- bind_rows(perf_capm, perf_ff3)

kable(perf_table,
      caption = "Performance Comparison: CAPM GMV vs. FF3 GMV (2015/02–2026/05)",
      col.names = c("Model", "Ann. Return", "Ann. Volatility", "Sharpe Ratio",
                    "Max Drawdown", "Calmar Ratio", "Cumul. Return"))
Performance Comparison: CAPM GMV vs. FF3 GMV (2015/02–2026/05)
Model Ann. Return Ann. Volatility Sharpe Ratio Max Drawdown Calmar Ratio Cumul. Return
CAPM GMV 0.0810 0.1068 0.7585 -0.2584 0.3134 1.3287
FF3 GMV 0.0525 0.1088 0.4824 -0.2811 0.1867 0.6882
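
The Sharpe ratios above set the risk-free rate to zero. A minimal sketch of the excess-return version follows, using made-up monthly returns and a hypothetical helper `sharpe_excess` that is not part of the original analysis; in the backtest, the `rf` argument would be the RF series aligned to the investment months.

```r
# Annualized Sharpe ratio from monthly returns and a monthly risk-free series
sharpe_excess <- function(returns, rf) {
  stopifnot(length(returns) == length(rf))
  excess <- returns - rf
  (mean(excess) * 12) / (sd(excess) * sqrt(12))
}

# Toy inputs (made-up numbers): six monthly returns, constant 0.3% monthly rf
ret_toy <- c(0.02, -0.01, 0.03, 0.00, 0.01, -0.02)
rf_toy  <- rep(0.003, length(ret_toy))

sharpe_rf0 <- (mean(ret_toy) * 12) / (sd(ret_toy) * sqrt(12))  # rf assumed zero
sharpe_adj <- sharpe_excess(ret_toy, rf_toy)

print(c(rf_zero = sharpe_rf0, excess = sharpe_adj))
```

With a positive risk-free rate the adjusted ratio is lower than the simplified one, so ignoring the risk-free rate overstates both models' Sharpe ratios by a similar amount and matters more for levels than for the ranking.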

2.8.3 Visualizations

# Cumulative return series
cum_capm <- cumprod(1 + port_ret_capm)
cum_ff3  <- cumprod(1 + port_ret_ff3)

cum_df <- data.frame(
  date  = invest_dates,
  CAPM  = cum_capm,
  FF3   = cum_ff3
) %>% pivot_longer(cols = c(CAPM, FF3), names_to = "Model", values_to = "CumReturn")

ggplot(cum_df, aes(x = date, y = CumReturn, color = Model, linetype = Model)) +
  geom_line(linewidth = 1.1) +
  scale_color_manual(values = c("CAPM" = "#2C6FAC", "FF3" = "#D94F3D")) +
  scale_y_continuous(labels = scales::number_format(suffix = "x")) +
  labs(
    title    = "Cumulative Return: CAPM GMV vs. Fama-French GMV",
    subtitle = "Rolling 60-Month Estimation Window | 2015/02–2026/05",
    x        = NULL,
    y        = "Cumulative Wealth (1 = Initial Investment)",
    color    = "Model",
    linetype = "Model"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
[Figure: Cumulative Return: CAPM GMV vs. FF3 GMV]

roll_vol <- function(ret, w = 12) {
  sapply(seq(w, length(ret)), function(i) sd(ret[(i-w+1):i]) * sqrt(12))
}

rv_capm <- roll_vol(port_ret_capm)
rv_ff3  <- roll_vol(port_ret_ff3)
rv_dates <- invest_dates[12:length(invest_dates)]

vol_df <- data.frame(
  date = rv_dates,
  CAPM = rv_capm,
  FF3  = rv_ff3
) %>% pivot_longer(cols = c(CAPM, FF3), names_to = "Model", values_to = "RollingVol")

ggplot(vol_df, aes(x = date, y = RollingVol, color = Model)) +
  geom_line(linewidth = 1.0) +
  scale_color_manual(values = c("CAPM" = "#2C6FAC", "FF3" = "#D94F3D")) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title    = "Rolling 12-Month Annualized Volatility",
    subtitle = "CAPM GMV vs. FF3 GMV | 2015/02–2026/05",
    x        = NULL,
    y        = "Annualized Volatility",
    color    = "Model"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
[Figure: Rolling 12-Month Annualized Volatility]

drawdown_series <- function(ret) {
  cw  <- cumprod(1 + ret)
  rm  <- cummax(cw)
  (cw - rm) / rm
}

dd_capm <- drawdown_series(port_ret_capm)
dd_ff3  <- drawdown_series(port_ret_ff3)

dd_df <- data.frame(
  date = invest_dates,
  CAPM = dd_capm,
  FF3  = dd_ff3
) %>% pivot_longer(cols = c(CAPM, FF3), names_to = "Model", values_to = "Drawdown")

ggplot(dd_df, aes(x = date, y = Drawdown, fill = Model)) +
  # alpha is a fixed setting, so it belongs in the geom rather than in aes()
  geom_area(position = "identity", alpha = 0.6) +
  scale_fill_manual(values = c("CAPM" = "#2C6FAC", "FF3" = "#D94F3D")) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title    = "Portfolio Drawdown",
    subtitle = "CAPM GMV vs. FF3 GMV | 2015/02–2026/05",
    x        = NULL,
    y        = "Drawdown from Peak",
    fill     = "Model"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
[Figure: Portfolio Drawdown: CAPM GMV vs. FF3 GMV]

# CAPM weight evolution
w_capm_df <- as.data.frame(weights_capm_mat) %>%
  mutate(date = invest_dates) %>%
  pivot_longer(-date, names_to = "ETF", values_to = "Weight")

ggplot(w_capm_df, aes(x = date, y = Weight, fill = ETF)) +
  geom_area(position = "stack") +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "CAPM GMV Portfolio Weight Evolution",
    subtitle = "Rolling 60-Month Window | 2015/02–2026/05",
    x        = NULL,
    y        = "Portfolio Weight",
    fill     = "ETF"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
[Figure: CAPM GMV Portfolio Weight Evolution]

w_ff3_df <- as.data.frame(weights_ff3_mat) %>%
  mutate(date = invest_dates) %>%
  pivot_longer(-date, names_to = "ETF", values_to = "Weight")

ggplot(w_ff3_df, aes(x = date, y = Weight, fill = ETF)) +
  geom_area(position = "stack") +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "FF3 GMV Portfolio Weight Evolution",
    subtitle = "Rolling 60-Month Window | 2015/02–2026/05",
    x        = NULL,
    y        = "Portfolio Weight",
    fill     = "ETF"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
[Figure: FF3 GMV Portfolio Weight Evolution]

2.8.4 Discussion

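Which model performed better? In this backtest the CAPM-based GMV portfolio outperformed the FF3-based portfolio on every reported metric: annualized return of 8.10% versus 5.25%, annualized volatility of 10.68% versus 10.88%, Sharpe ratio of 0.76 versus 0.48, and maximum drawdown of -25.84% versus -28.11%. The additional complexity of the three-factor model therefore did not translate into improved out-of-sample portfolio construction here. In theory, the FF3 model decomposes each asset’s return into three systematic components (MKT, SMB, HML) plus a residual, whereas CAPM uses only one. For a heterogeneous ETF universe spanning large-cap equities (SPY, QQQ), small-caps (IWM), international (EFA, EEM), bonds (TLT), real estate (IYR), and commodities (GLD), the SMB and HML factors capture meaningful variation in equity-oriented assets, potentially reducing residual variance estimates and producing a more accurate covariance matrix. The volatility gap between the two portfolios is small, however, and the return gap is not something a minimum-variance objective is designed to control.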

Why factor models may improve portfolio construction: The fundamental insight is that a more accurate covariance matrix leads to better-diversified weights. CAPM constrains the covariance structure to a single factor, forcing all equity correlations through a common beta. This parsimony is efficient when beta is truly the only systematic driver, but becomes a binding misspecification when size and value effects are material. By allowing three systematic channels, FF3 attributes more variance to common factors and less to asset-specific residuals, which can stabilize weight estimates. The benefit is not guaranteed: each additional loading is itself estimated with noise from only 60 monthly observations, and in this sample the FF3 portfolio's realized volatility (10.88%) was slightly higher than that of the CAPM portfolio (10.68%).

Practical limitations of CAPM: The single-factor model is computationally simple but suffers from several well-documented empirical failures. Beta is not stable through time, varies with the business cycle, and is measured with error. The model provides no guidance on style tilts that have been shown to earn risk premia (size, value, momentum). Moreover, assuming all assets load only on the market factor ignores the distinct return drivers of bonds, real estate, and commodities in our ETF universe.

Advantages and disadvantages of Fama-French factors: The FF3 model’s advantages are its empirical robustness across international markets and asset classes, its interpretability (size and value premia have plausible risk-based and behavioral explanations), and its ability to reduce covariance matrix estimation error. The disadvantages include: (1) factor premiums are not stable — HML suffered prolonged underperformance during the 2010s growth era; (2) adding factors increases estimation noise unless the sample size is large; (3) the model is not forward-looking and cannot account for structural regime changes. Additionally, FF3 does not capture momentum, quality, or low-volatility factors that practitioners increasingly incorporate.

Implications for real-world asset management: The rolling backtest shows that even a relatively simple factor-model approach to covariance estimation yields a systematic, rules-based portfolio construction process. The unconstrained solution is not directly implementable, however: the 2015/01 weights include short positions (for example IWM at -20.3% under CAPM) and gross exposure above 100%, and the backtest ignores transaction costs. In practice, asset managers augment these models with regularization techniques (Ledoit-Wolf shrinkage), alternative factor models (Barra, Axioma), and position and transaction cost constraints. The GMV portfolio provides a useful baseline: it requires no return forecasts and relies entirely on risk estimation, so it sidesteps the difficulties of return prediction.
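
Shrinkage is mentioned above as a common regularization step. The sketch below is a deliberately simplified stand-in rather than the Ledoit-Wolf estimator: it shrinks a covariance matrix toward its own diagonal with a fixed, user-chosen intensity `delta`, whereas Ledoit-Wolf chooses the intensity from the data. The matrix values and the helper names `shrink_to_diagonal` and `gmv` are made up for illustration.

```r
# Fixed-intensity shrinkage toward the diagonal (NOT the Ledoit-Wolf estimator)
shrink_to_diagonal <- function(S, delta) {
  stopifnot(delta >= 0, delta <= 1)
  (1 - delta) * S + delta * diag(diag(S))
}

# GMV weights via a linear solve
gmv <- function(S) {
  z <- solve(S, rep(1, nrow(S)))
  z / sum(z)
}

# Toy 3-asset covariance matrix (made-up, positive definite)
S_toy <- matrix(c( 0.0014,  0.0018, -0.0010,
                   0.0018,  0.0027, -0.0013,
                  -0.0010, -0.0013,  0.0016), nrow = 3, byrow = TRUE)

S_half <- shrink_to_diagonal(S_toy, delta = 0.5)  # off-diagonals halved
S_diag <- shrink_to_diagonal(S_toy, delta = 1)    # fully diagonal target

w_raw  <- gmv(S_toy)
w_half <- gmv(S_half)
w_diag <- gmv(S_diag)  # reduces to inverse-variance weights

# Gross exposure (sum of absolute weights) under each estimate
print(c(raw = sum(abs(w_raw)), half = sum(abs(w_half)), diag = sum(abs(w_diag))))
```

Comparing gross exposure across the three estimates shows how the shrinkage intensity affects the size of the long and short positions; at `delta = 1` the weights are all positive because they are proportional to inverse variances.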


3 Conclusion

This examination explored foundational concepts in modern portfolio theory and empirical asset pricing through both analytical and computational lenses. Several key themes emerge.

On the theoretical side, the Markowitz framework establishes that diversification reduces idiosyncratic risk, but the rate of risk reduction diminishes as portfolios grow larger. The Hennessy case illustrated the tension between concentration for alpha extraction and diversification for risk control — a tension that remains central to active management. The CAPM and Fama-French models provide complementary frameworks: CAPM’s elegance lies in its parsimony, while FF3’s empirical richness better captures the multi-dimensional nature of systematic risk.

The computational analysis showed that CAPM-based and FF3-based GMV portfolios differ in both weight allocation and realized performance. The FF3 model, by estimating size and value loadings separately, produces a different off-diagonal covariance structure, particularly for ETFs like IWM and EFA that carry meaningful factor tilts. Over the backtest period, however, the simpler CAPM-based portfolio delivered the higher annualized return (8.10% versus 5.25%) at slightly lower volatility. The rolling backtest also revealed how much the optimal weights change through time, especially around market stress events, underscoring the importance of regular rebalancing.

From a risk characteristics perspective, GMV portfolios tilt toward assets that lower total portfolio variance. In the 2015/01 estimates this meant a large SPY position hedged with TLT, whose negative market beta offsets equity risk, plus a small GLD allocation. This defensive bias can come at the cost of bull-market participation, which is why GMV portfolios may lag simple equal-weight strategies in trending equity markets while cushioning losses during drawdowns. The maximum drawdowns of roughly -26% to -28% observed here show that this protection is only partial.

The practical investment implication is straightforward: factor-model-based covariance estimation is a worthwhile investment for portfolio managers seeking systematic risk reduction, particularly when the investment universe spans multiple asset classes. However, a richer factor model does not automatically beat a simpler one, as the backtest results show, and no model should be applied mechanically without recognition of its assumptions and the economic regimes in which it may fail.


Report prepared for FIN Graduate Portfolio Analysis — Spring 2026. All computations performed in R. Data sourced from Yahoo Finance and the Kenneth French Data Library.