1 Questions from Textbook (60%)

1.1 Chapter 7: Optimal Risky Portfolios

1.1.1 CFA Problem 1 — Hennessy Portfolio: 40 vs 20 Stocks

Context: Hennessy & Associates manages a $30 million equity portfolio (~40 stocks at 2–3% per issue). Portfolio manager Jones proposes cutting holdings to 20 stocks, doubling commitments to each.

Part (a): Will limiting to 20 stocks likely increase or decrease portfolio risk?

Limiting the portfolio to 20 stocks will increase risk. The key mechanism is diversification. The variance of an equally-weighted $n$-stock portfolio is:

\[\sigma_p^2 = \frac{1}{n}\bar{\sigma}_i^2 + \left(1 - \frac{1}{n}\right)\overline{\text{Cov}}\]

As $n$ decreases from 40 to 20, the first term — which captures undiversified firm-specific (idiosyncratic) variance — roughly doubles. Each remaining stock now carries twice the portfolio weight, so any adverse firm-level event has twice the impact on total portfolio return. The only way total variance would not rise is if the 20 retained stocks have unusually low idiosyncratic variance, which cannot be assumed in general.

Part (b): Is there any way Hennessy could reduce from 40 to 20 stocks without significantly affecting risk?

Yes. If the 20 retained stocks are chosen to have very low pairwise correlations with one another, the covariance term in the portfolio variance formula remains small, partially offsetting the loss of diversification from holding fewer names. Specifically:

Select low-correlation stocks: Choose 20 names whose return drivers are as uncorrelated as possible (e.g., spanning different sectors, geographies, or factor exposures).
Preserve systematic factor coverage: Ensure the 20 stocks collectively span the same systematic risk factors (market, size, value, momentum) as the original 40. Systematic risk — which dominates the second term in the formula — would then be unchanged, and only the marginal increase in residual risk matters.

In practice, Hennessy’s bottom-up skill gives him information to identify which 20 stocks are both high-conviction and mutually uncorrelated.

1.1.2 CFA Problem 2 — Further Concentration to 10 Stocks

If reducing from 40 to 20 is advantageous, is reducing further to 10 also advantageous?

Not necessarily. The benefit of concentration arises from the active return (alpha) expected from high-conviction positions. However, the marginal risk cost of each stock removed rises steeply as the portfolio shrinks:

Moving 40 → 20 doubles the weight per stock
Moving 20 → 10 doubles it again, to 10% per position

At very high concentration, the residual (unsystematic) risk retained in the portfolio can easily overwhelm any incremental alpha from removing an additional stock. Moreover, Hennessy is evaluated independently of other Wilstead managers against the S&P 500. A 10-stock portfolio would exhibit very large tracking error relative to the benchmark, meaning that even with genuine skill, a run of bad luck could produce multi-year underperformance and risk losing the mandate.

Conclusion: There is a diminishing-returns trade-off. Reducing from 40 to 20 may be optimal; reducing to 10 likely crosses the threshold where idiosyncratic risk costs exceed the alpha benefit.

1.1.3 CFA Problem 3 — Portfolio-of-Portfolios Perspective

How does viewing Hennessy as one component of Wilstead’s total $280 million fund change the analysis?

The critical insight is that risk should be measured at the total-fund level, not at the level of a single sub-portfolio. Wilstead’s five other managers collectively hold 150+ diversified stocks in a $250 million pool. From the total-fund perspective:

Marginal contribution to risk is what matters: the relevant quantity is Hennessy’s covariance with the rest of the fund, not his standalone variance.
Cross-portfolio diversification: If Hennessy’s concentrated positions are in names not held by the other managers, they provide diversification across managers even if Hennessy’s own book is concentrated.
Result: Hennessy can afford to run a more concentrated portfolio (even 10 stocks) without materially raising total-fund risk, provided those positions have low correlation with the other managers’ holdings.

Conclusion: The committee should be more willing to allow concentration when evaluating Hennessy as a sub-portfolio. The fewer overlapping holdings between Hennessy and other managers, the greater the diversification benefit to the total fund.

1.1.4 CFA Problem 4 — Efficient Frontier (Markowitz)

Which portfolio cannot lie on the efficient frontier?

Portfolio	Expected Return (%)	Std Dev (%)
W	15	36
X	12	15
Z	5	7
Y	9	21

On the Markowitz efficient frontier, a portfolio is efficient if and only if no other feasible portfolio offers a higher expected return for the same (or lower) risk.

Compare Portfolio Y (9%, 21% std dev) with Portfolio X (12%, 15% std dev). Portfolio X achieves a higher expected return and a lower standard deviation than Portfolio Y simultaneously. Y is therefore mean-variance dominated by X.

Answer: Portfolio Y (d) cannot lie on the efficient frontier.

W, X, and Z could all potentially be efficient because no single portfolio shown dominates each of them on both dimensions simultaneously.

1.1.5 CFA Problem 10 — Portfolio A&B vs Portfolio B&C

Given only standard deviations and correlations, which 50/50 combination is preferable: A & B, or B & C?

The variance of an equally-weighted two-asset portfolio is:

\[\sigma_p^2 = \left(\tfrac{1}{2}\right)^2 \sigma_1^2 + \left(\tfrac{1}{2}\right)^2 \sigma_2^2 + 2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \rho_{12} \cdot \sigma_1 \cdot \sigma_2\]

# --- Given standard deviations ---
sd_A <- 40; sd_B <- 20; sd_C <- 40

# --- Given correlations ---
rho_AB <- 0.90   # A & B
rho_BC <- 0.10   # B & C

# --- Portfolio variance function (equal weights) ---
port_var <- function(sd1, sd2, rho) {
  0.25 * sd1^2 + 0.25 * sd2^2 + 0.5 * rho * sd1 * sd2
}

var_AB <- port_var(sd_A, sd_B, rho_AB)
var_BC <- port_var(sd_B, sd_C, rho_BC)

sd_AB <- sqrt(var_AB)
sd_BC <- sqrt(var_BC)

cat(sprintf("50/50 Portfolio A&B - Variance: %6.2f  |  Std Dev: %5.4f %%\n",
            var_AB, sd_AB))

## 50/50 Portfolio A&B - Variance: 860.00  |  Std Dev: 29.3258 %

cat(sprintf("50/50 Portfolio B&C - Variance: %6.2f  |  Std Dev: %5.4f %%\n",
            var_BC, sd_BC))

## 50/50 Portfolio B&C - Variance: 540.00  |  Std Dev: 23.2379 %

Interpretation: Since no information on expected returns is provided, we evaluate portfolios on risk alone. Portfolio B & C has a substantially lower standard deviation (~23.24%) than portfolio A & B (~29.33%).

The reason is correlation: A and B are highly correlated ($\rho = 0.90$), leaving almost no diversification benefit. B and C have a very low correlation ($\rho = 0.10$), yielding substantial variance reduction. Portfolio B & C is preferred on risk grounds.

1.2 Chapter 8: Index Models

1.2.1 CFA Problem 1 — ABC vs XYZ: Regression Analysis & Diversified Portfolio Context

Regression output (5-year monthly excess returns):

Statistic	ABC	XYZ
Alpha	−3.20%	7.3%
Beta	0.60	0.97
R²	0.35	0.17
Residual Std Dev	13.02%	21.45%

Interpretation of historical results:

ABC: Negative alpha (−3.20%) indicates ABC underperformed its CAPM benchmark over the estimation sample. With $\beta = 0.60$, it carries below-market systematic risk. $R^2 = 0.35$ means 35% of return variance is market-driven; the residual std dev of 13.02% represents the idiosyncratic component.
XYZ: Positive alpha (+7.3%) suggests strong historical outperformance relative to CAPM. $\beta \approx 1.0$ implies roughly market-level systematic risk. Low $R^2 = 0.17$ means most of XYZ’s variance (83%) is firm-specific. The large residual std dev (21.45%) confirms high idiosyncratic volatility.

Implications for a well-diversified portfolio:

In a diversified portfolio, idiosyncratic risk is eliminated by averaging across many assets — so the residual standard deviation becomes largely irrelevant. What matters is:

Beta (systematic risk contribution): ABC ($\beta=0.60$) adds less systematic risk to the portfolio than XYZ ($\beta=0.97$).
Forward-looking alpha: Historical alpha is not a reliable predictor of future outperformance. The additional brokerage data showing higher and more variable betas for both stocks (ABC: 0.62–0.71; XYZ: 1.25–1.45) suggests beta instability. XYZ’s true current systematic risk may be significantly higher than the historical 0.97.
Conclusion: In a diversified context, neither stock’s residual risk matters. XYZ’s potentially higher forward-looking beta and unreliable alpha make the choice between them less clear-cut than the historical alpha alone would suggest.

1.2.2 CFA Problem 2 — Baker Fund Nonsystematic Risk

Correlation between Baker Fund and S&P 500 = 0.70. What percentage of total risk is nonsystematic?

\[R^2 = \rho^2, \qquad \text{Nonsystematic\%} = (1 - R^2) \times 100\]

rho_baker <- 0.70
R2_baker  <- rho_baker^2
nonsys    <- 1 - R2_baker

cat(sprintf("R-squared (systematic %%): %.1f%%\n", R2_baker * 100))

## R-squared (systematic %): 49.0%

cat(sprintf("Nonsystematic risk %%:     %.1f%%\n", nonsys   * 100))

## Nonsystematic risk %:     51.0%

Answer: $R^2 = 0.70^2 = 0.49$, so 51% of Baker Fund’s total variance is nonsystematic (firm-specific, diversifiable) risk. (d)

1.2.3 CFA Problem 3 — Implied Beta of Charlottesville International

Given: $\rho_{\text{world}} = 1.0$; $E(R_{\text{world}}) = 11\%$; $E(R_{\text{Charlottesville}}) = 9\%$; $R_f = 3\%$.

Because $\rho = 1.0$, the fund moves in perfect lockstep with the world index. By CAPM:

\[E(R_i) = R_f + \beta_i \bigl[E(R_m) - R_f\bigr] \implies \beta_i = \frac{E(R_i) - R_f}{E(R_m) - R_f}\]

Rf      <- 0.03
E_Rm    <- 0.11
E_Ri    <- 0.09

beta_impl <- (E_Ri - Rf) / (E_Rm - Rf)
cat(sprintf("Implied beta of Charlottesville International: %.4f\n", beta_impl))

## Implied beta of Charlottesville International: 0.7500

\[\beta = \frac{9\% - 3\%}{11\% - 3\%} = \frac{6\%}{8\%} = \mathbf{0.75}\]

Answer: Beta = 0.75. (b)

1.2.4 CFA Problem 4 — Beta Concept

Beta is most closely associated with:

(d) Systematic risk ✓

$\beta_i = \text{Cov}(R_i,\, R_m) / \sigma_m^2$ measures a security’s sensitivity to broad market movements — the systematic, non-diversifiable component of risk. It explicitly excludes idiosyncratic risk (the residual from the single-index model).

1.2.5 CFA Problem 5 — Beta vs Standard Deviation as Risk Measures

Beta and standard deviation differ in that beta measures:

(b) Only systematic risk, while standard deviation measures total risk. ✓

Standard deviation ($\sigma_i$) reflects total risk = systematic + unsystematic.
Beta ($\beta_i$) captures only the market (systematic) component. For a well-diversified investor, only beta is relevant because idiosyncratic risk is diversified away at zero cost.

1.3 Chapter 9: The Capital Asset Pricing Model

Reference data (Problems 8–10):

Portfolio	Avg Annual Return	Std Dev	Beta
R	11%	10%	0.5
S&P 500	14%	12%	1.0

E_R_R    <- 0.11
sd_R     <- 0.10
beta_R   <- 0.5
E_R_mkt  <- 0.14
sd_mkt   <- 0.12

# Infer risk-free rate from SML:
# E(R_R) = Rf + beta_R * (E_mkt - Rf)
# 0.11 = Rf + 0.5*(0.14 - Rf) = 0.5*Rf + 0.07  =>  Rf = 0.08
Rf_inferred <- (E_R_R - beta_R * E_R_mkt) / (1 - beta_R)
cat(sprintf("Inferred risk-free rate: %.2f%%\n", Rf_inferred * 100))

## Inferred risk-free rate: 8.00%

# SML required return for Portfolio R
SML_R <- Rf_inferred + beta_R * (E_R_mkt - Rf_inferred)
cat(sprintf("SML required return for R:  %.2f%%\n", SML_R * 100))

## SML required return for R:  11.00%

cat(sprintf("Actual return of R:         %.2f%%\n", E_R_R * 100))

## Actual return of R:         11.00%

cat(sprintf("Alpha of R:                 %.4f%%\n", (E_R_R - SML_R) * 100))

## Alpha of R:                 0.0000%

1.3.1 CFA Problem 8 — Portfolio R Relative to the SML

The SML gives required return: \[E(R_R)_{\text{SML}} = R_f + \beta_R \times (E_m - R_f) = 8\% + 0.5 \times 6\% = 11\%\]

Portfolio R earns exactly 11% with $\beta = 0.5$ — the return exactly matches the SML prediction. Alpha = 0.

Answer: (a) Portfolio R plots on the SML. ✓

1.3.2 CFA Problem 9 — Portfolio R Relative to the CML

The CML relates expected return to total risk (standard deviation):

\[E(R_p)_{\text{CML}} = R_f + \frac{E(R_m) - R_f}{\sigma_m} \cdot \sigma_p\]

E_R_CML_R <- Rf_inferred + ((E_R_mkt - Rf_inferred) / sd_mkt) * sd_R
cat(sprintf("CML required return for sigma = 10%%: %.4f%%\n", E_R_CML_R * 100))

## CML required return for sigma = 10%: 13.0000%

cat(sprintf("Actual return of R:                   %.2f%%\n", E_R_R * 100))

## Actual return of R:                   11.00%

SR_R   <- (E_R_R - Rf_inferred) / sd_R
SR_mkt <- (E_R_mkt - Rf_inferred) / sd_mkt
cat(sprintf("\nSharpe Ratio of Portfolio R:  %.4f\n", SR_R))

## 
## Sharpe Ratio of Portfolio R:  0.3000

cat(sprintf("Sharpe Ratio of Market (CML): %.4f\n", SR_mkt))

## Sharpe Ratio of Market (CML): 0.5000

The CML requires ~13% for a portfolio with $\sigma = 10\%$. Portfolio R earns only 11%, so its Sharpe ratio ( 0.3) is below the market’s (0.5).

Answer: (b) Portfolio R plots below the CML. ✓

Why? Only perfectly diversified (fully efficient) portfolios lie on the CML. Portfolio R contains residual idiosyncratic risk, meaning it is not on the efficient frontier, and its Sharpe ratio is below the maximum achievable (i.e., below the market’s).

1.3.3 CFA Problem 10 — Portfolio A vs Portfolio B Under CAPM

Feature	Portfolio A	Portfolio B
Systematic risk (beta)	1.0	1.0
Specific (idiosyncratic) risk	High	Low

Should investors expect a higher return on A than B to compensate for A’s higher specific risk?

No. Under CAPM, the only priced risk is systematic risk (beta). Since both portfolios have identical betas of 1.0, CAPM implies they should both earn:

\[E(R) = R_f + 1.0 \times (E_m - R_f) = E_m\]

Idiosyncratic risk is not priced because rational, diversified investors can eliminate it at zero cost by adding more assets to their portfolios. A risk-averse investor will hold A only if its total volatility does not impair the portfolio — but since it can be diversified away, no additional return compensation is warranted. Both portfolios are expected to earn the market return.

1.4 Chapter 10: Arbitrage Pricing Theory and Multifactor Models

Setup (Problems 13–16): Orb Trust uses a two-factor APT model: - Factor 1: Real GDP growth, risk premium $\lambda_1 = 8\%$ - Factor 2: Inflation, risk premium $\lambda_2 = 2\%$ - Risk-free rate: $R_f = 4\%$

Fund	GDP $\beta_1$	Inflation $\beta_2$
High Growth	1.25	1.50
Large Cap	0.75	1.25
Utility	1.00	2.00

APT pricing equation: \[E(R_i) = R_f + \beta_{i,1}\,\lambda_1 + \beta_{i,2}\,\lambda_2\]

1.4.1 Problem 13 — Expected Return of High Growth Fund (APT)

Rf_orb  <- 0.04
lambda1 <- 0.08   # GDP risk premium
lambda2 <- 0.02   # Inflation risk premium

# High Growth Fund betas
b1_HG <- 1.25;  b2_HG <- 1.50

E_R_HG <- Rf_orb + b1_HG * lambda1 + b2_HG * lambda2
cat(sprintf("APT Expected Return - High Growth Fund: %.2f%%\n", E_R_HG * 100))

## APT Expected Return - High Growth Fund: 17.00%

\[E(R_{HG}) = 4\% + 1.25 \times 8\% + 1.5 \times 2\% = 4\% + 10\% + 3\% = \mathbf{17\%}\]

Answer: Expected return = 17%. (a)

1.4.2 Problem 14 — Arbitrage Opportunity for Large Cap Fund?

Kwon’s fundamental estimate: Large Cap expected return = $R_f + 8.5\% = 12.5\%$.

b1_LC <- 0.75;  b2_LC <- 1.25

E_R_LC_APT   <- Rf_orb + b1_LC * lambda1 + b2_LC * lambda2
E_R_LC_Kwon  <- Rf_orb + 0.085

cat(sprintf("APT equilibrium return - Large Cap:  %.2f%%\n", E_R_LC_APT  * 100))

## APT equilibrium return - Large Cap:  12.50%

cat(sprintf("Kwon's fundamental estimate:         %.2f%%\n", E_R_LC_Kwon * 100))

## Kwon's fundamental estimate:         12.50%

cat(sprintf("Alpha (mispricing):                  %.4f%%\n",
            (E_R_LC_Kwon - E_R_LC_APT) * 100))

## Alpha (mispricing):                  0.0000%

\[E(R_{LC})_{\text{APT}} = 4\% + 0.75 \times 8\% + 1.25 \times 2\% = 4\% + 6\% + 2.5\% = 12.5\%\]

Kwon’s estimate (12.5%) equals the APT equilibrium return. There is no mispricing and therefore no arbitrage opportunity. (a)

1.4.3 Problem 15 — Weight in Utility Fund for the “GDP Fund”

We construct a GDP Factor Fund with pure GDP exposure ($\beta_1 = 1$) and zero inflation exposure ($\beta_2 = 0$). The system of equations:

\[\begin{cases} w_1 + w_2 + w_3 = 1 & \text{(weights sum to 1)}\\ 1.25\,w_1 + 0.75\,w_2 + 1.00\,w_3 = 1 & \text{(GDP exposure = 1)}\\ 1.50\,w_1 + 1.25\,w_2 + 2.00\,w_3 = 0 & \text{(inflation exposure = 0)} \end{cases}\]

# Coefficient matrix and RHS
A_mat <- matrix(
  c(1,    1,    1,
    1.25, 0.75, 1.00,
    1.50, 1.25, 2.00),
  nrow = 3, byrow = TRUE
)
b_vec <- c(1, 1, 0)

w_sol <- solve(A_mat, b_vec)
names(w_sol) <- c("w_HighGrowth", "w_LargeCap", "w_Utility")
print(round(w_sol, 4))

## w_HighGrowth   w_LargeCap    w_Utility 
##          1.6          1.6         -2.2

cat(sprintf("\nWeight in Utility Fund (w3): %.4f\n", w_sol["w_Utility"]))

## 
## Weight in Utility Fund (w3): -2.2000

# Verify factor exposures
cat(sprintf("\nVerification:\n"))

## 
## Verification:

cat(sprintf("  Weights sum:          %.4f (should be 1)\n",  sum(w_sol)))

##   Weights sum:          1.0000 (should be 1)

cat(sprintf("  GDP exposure:         %.4f (should be 1)\n",
            c(1.25, 0.75, 1.00) %*% w_sol))

##   GDP exposure:         1.0000 (should be 1)

cat(sprintf("  Inflation exposure:   %.4f (should be 0)\n",
            c(1.50, 1.25, 2.00) %*% w_sol))

##   Inflation exposure:   0.0000 (should be 0)

Answer: The weight in the Utility Fund is -2.2. (b)

The negative weight indicates a short position in the Utility Fund. This is necessary because the Utility Fund has a very high inflation loading ($\beta_2 = 2$); shorting it eliminates the inflation exposure while maintaining the desired GDP sensitivity.

1.4.4 Problem 16 — Stiles vs McCracken: Who is Correct?

Stiles argues the GDP Fund suits retirees seeking steady investment income.
McCracken argues it suits investors who expect supply-side macro policies to succeed (i.e., strong GDP growth).

Answer: (a) McCracken is correct and Stiles is wrong.

The GDP Fund has unit sensitivity to real GDP growth and zero inflation sensitivity (by construction). Its returns fluctuate directly with the GDP growth factor — making it a procyclical, growth-sensitive investment. This is precisely the type of asset that benefits from a strong economic expansion (supply-side policy success), as McCracken argues.

Retirees seeking steady income would require low volatility and ideally inflation-protected returns — not a leveraged pure-factor bet on GDP growth. The GDP Fund’s return characteristics are fundamentally mismatched with the needs of income-focused retirees, making Stiles’ argument incorrect.

2 Questions Using R Code (40%)

2.1 Q1 — Import ETF Data from Yahoo Finance

# --- Load all required libraries ---
library(tidyquant)
library(lubridate)
library(timetk)
library(purrr)
library(tidyverse)
library(PerformanceAnalytics)
library(quadprog)
library(scales)

tickers <- c("SPY", "QQQ", "EEM", "IWM", "EFA", "TLT", "IYR", "GLD")

# Download daily adjusted prices from Yahoo Finance
prices_raw <- tq_get(
  tickers,
  from = "2010-01-01",
  to   = Sys.Date(),
  get  = "stock.prices"
)

# Extract adjusted closing prices; pivot to wide format
prices_wide <- prices_raw %>%
  select(date, symbol, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  arrange(date)

# Convert to xts for time series operations
prices_xts <- prices_wide %>%
  tk_xts(date_var = date)

cat("ETF price data downloaded successfully.\n")

## ETF price data downloaded successfully.

cat(sprintf("Date range : %s  to  %s\n",
            as.character(index(prices_xts)[1]),
            as.character(index(prices_xts)[nrow(prices_xts)])))

## Date range : 2010-01-04  to  2026-06-09

cat(sprintf("Dimensions : %d rows × %d columns\n",
            nrow(prices_xts), ncol(prices_xts)))

## Dimensions : 4133 rows × 8 columns

# Preview first and last rows
cat("\n--- First 6 rows ---\n")

## 
## --- First 6 rows ---

print(head(prices_xts))

##                 SPY      QQQ      EEM      IWM      EFA      TLT      IYR
## 2010-01-04 84.79639 40.29079 30.35151 51.36657 35.12844 55.70954 26.76810
## 2010-01-05 85.02083 40.29079 30.57180 51.18994 35.15940 56.06930 26.83239
## 2010-01-06 85.08070 40.04776 30.63577 51.14176 35.30801 55.31877 26.82069
## 2010-01-07 85.43987 40.07380 30.45810 51.51910 35.17178 55.41177 27.06028
## 2010-01-08 85.72418 40.40364 30.69972 51.80011 35.45044 55.38697 26.87912
## 2010-01-11 85.84388 40.23871 30.63577 51.59136 35.74147 55.08303 27.00769
##               GLD
## 2010-01-04 109.80
## 2010-01-05 109.70
## 2010-01-06 111.51
## 2010-01-07 110.82
## 2010-01-08 111.37
## 2010-01-11 112.85

cat("\n--- Last 6 rows ---\n")

## 
## --- Last 6 rows ---

print(tail(prices_xts))

##               SPY    QQQ   EEM    IWM    EFA   TLT    IYR    GLD
## 2026-06-02 759.57 746.16 70.80 291.66 105.02 85.65  99.99 411.95
## 2026-06-03 754.24 744.21 69.92 287.67 104.12 85.31 100.00 407.87
## 2026-06-04 757.09 740.61 69.10 292.01 104.95 85.50 101.79 411.27
## 2026-06-05 737.55 705.06 64.59 281.65 102.26 85.06 102.54 396.24
## 2026-06-08 739.22 716.07 65.75 284.11 102.88 84.62 101.08 397.27
## 2026-06-09     NA     NA    NA     NA     NA    NA     NA     NA

2.2 Q2 — Compute Weekly and Monthly Simple Returns

# --- Weekly returns (simple): end-of-week price / start-of-week price - 1 ---
returns_weekly_xts <- apply.weekly(
  prices_xts,
  function(x) {
    # Use first and last price in each week
    if (nrow(x) == 1) return(xts(matrix(rep(0, ncol(x)), nrow = 1,
                                         dimnames = list(NULL, colnames(x))),
                                  order.by = index(x)))
    xts(matrix(as.numeric(x[nrow(x), ]) / as.numeric(x[1, ]) - 1,
               nrow = 1, dimnames = list(NULL, colnames(x))),
        order.by = index(x)[nrow(x)])
  }
)

# --- Monthly returns (simple): end-of-month price / start-of-month price - 1 ---
returns_monthly_xts <- apply.monthly(
  prices_xts,
  function(x) {
    if (nrow(x) == 1) return(xts(matrix(rep(0, ncol(x)), nrow = 1,
                                          dimnames = list(NULL, colnames(x))),
                                   order.by = index(x)))
    xts(matrix(as.numeric(x[nrow(x), ]) / as.numeric(x[1, ]) - 1,
               nrow = 1, dimnames = list(NULL, colnames(x))),
        order.by = index(x)[nrow(x)])
  }
)

cat(sprintf("Weekly  returns: %d weeks  ×  %d assets\n",
            nrow(returns_weekly_xts),  ncol(returns_weekly_xts)))

## Weekly  returns: 858 weeks  ×  8 assets

cat(sprintf("Monthly returns: %d months ×  %d assets\n",
            nrow(returns_monthly_xts), ncol(returns_monthly_xts)))

## Monthly returns: 198 months ×  8 assets

cat("\n--- Monthly returns (first 6 rows) ---\n")

## 
## --- Monthly returns (first 6 rows) ---

print(head(returns_monthly_xts))

##                     SPY         QQQ          EEM         IWM         EFA
## 2010-01-29 -0.052413295 -0.07819881 -0.103722878 -0.06048768 -0.07491615
## 2010-02-26  0.015404188  0.03467340 -0.008903837  0.03255515 -0.01534423
## 2010-03-31  0.049975540  0.06169103  0.063099281  0.05771657  0.05562886
## 2010-04-30  0.008573766  0.02242610 -0.027070761  0.04705500 -0.04493641
## 2010-05-28 -0.091233776 -0.08672143 -0.098864717 -0.09568690 -0.11824813
## 2010-06-30 -0.035514456 -0.05101577  0.004467966 -0.04856789 -0.01079255
##                     TLT         IYR          GLD
## 2010-01-29  0.027836634 -0.05195347 -0.034972713
## 2010-02-26  0.005703908  0.03573048  0.009967714
## 2010-03-31 -0.020144078  0.08633706 -0.004386396
## 2010-04-30  0.035750986  0.05898797  0.046254293
## 2010-05-28  0.052459266 -0.08516477  0.027218472
## 2010-06-30  0.050593786 -0.02782229  0.014761042

2.3 Q3 — Convert Monthly Returns to Tibble Format

# Convert xts to tibble and standardise date to month-start (floor to month)
monthly_tbl <- tk_tbl(returns_monthly_xts, rename_index = "date") %>%
  mutate(date = as.Date(date),
         date = floor_date(date, unit = "month")) %>%
  arrange(date)

cat(sprintf("Monthly returns tibble: %d rows × %d columns\n",
            nrow(monthly_tbl), ncol(monthly_tbl)))

## Monthly returns tibble: 198 rows × 9 columns

cat(sprintf("Date range: %s  to  %s\n",
            as.character(min(monthly_tbl$date)),
            as.character(max(monthly_tbl$date))))

## Date range: 2010-01-01  to  2026-06-01

print(head(monthly_tbl))

## # A tibble: 6 × 9
##   date            SPY     QQQ      EEM     IWM     EFA      TLT     IYR      GLD
##   <date>        <dbl>   <dbl>    <dbl>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
## 1 2010-01-01 -0.0524  -0.0782 -0.104   -0.0605 -0.0749  0.0278  -0.0520 -0.0350 
## 2 2010-02-01  0.0154   0.0347 -0.00890  0.0326 -0.0153  0.00570  0.0357  0.00997
## 3 2010-03-01  0.0500   0.0617  0.0631   0.0577  0.0556 -0.0201   0.0863 -0.00439
## 4 2010-04-01  0.00857  0.0224 -0.0271   0.0471 -0.0449  0.0358   0.0590  0.0463 
## 5 2010-05-01 -0.0912  -0.0867 -0.0989  -0.0957 -0.118   0.0525  -0.0852  0.0272 
## 6 2010-06-01 -0.0355  -0.0510  0.00447 -0.0486 -0.0108  0.0506  -0.0278  0.0148

2.4 Q4 — Download Fama-French 3-Factor Data

# ---------------------------------------------------------------------------
# Download directly from Ken French's data library (ZIP → CSV inside)
# Robust fallback: try frenchdata package first, then direct download
# ---------------------------------------------------------------------------

ff3_tbl <- tryCatch({
  # Attempt 1: frenchdata package
  if (!requireNamespace("frenchdata", quietly = TRUE))
    install.packages("frenchdata", repos = "https://cran.rstudio.com")
  library(frenchdata)
  raw   <- download_french_data("Fama/French 3 Factors")
  mnth  <- raw$subsets$data[[1]]

  mnth %>%
    mutate(
      date  = as.Date(paste0(date, "01"), format = "%Y%m%d"),
      date  = floor_date(date, "month"),
      MktRF = `Mkt-RF` / 100,
      SMB   = SMB / 100,
      HML   = HML / 100,
      RF    = RF  / 100
    ) %>%
    select(date, MktRF, SMB, HML, RF) %>%
    filter(date >= as.Date("2010-01-01"))
}, error = function(e) {
  # Attempt 2: direct download from French's website
  message("frenchdata failed, attempting direct download...")
  tmp  <- tempfile(fileext = ".zip")
  url  <- paste0("https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/",
                 "ftp/F-F_Research_Data_Factors_CSV.zip")
  download.file(url, tmp, quiet = TRUE)
  csv_name <- unzip(tmp, list = TRUE)$Name[1]
  raw_lines <- readLines(unz(tmp, csv_name))

  # Find the monthly data block (header row contains "Mkt-RF")
  start <- which(grepl("Mkt-RF", raw_lines))[1] + 1
  end   <- which(raw_lines == "")[which(raw_lines == "") > start][1] - 1
  data_lines <- raw_lines[start:end]
  data_lines <- data_lines[nchar(trimws(data_lines)) > 0]

  df <- read.csv(
    text  = paste(c("date,MktRF,SMB,HML,RF"), data_lines, sep = "\n"),
    strip.white = TRUE
  )
  df %>%
    mutate(
      date  = as.Date(paste0(sprintf("%06d", date), "01"), format = "%Y%m%d"),
      date  = floor_date(date, "month"),
      MktRF = MktRF / 100,
      SMB   = SMB   / 100,
      HML   = HML   / 100,
      RF    = RF    / 100
    ) %>%
    filter(date >= as.Date("2010-01-01")) %>%
    arrange(date)
})

cat(sprintf("FF3 data: %d monthly observations\n", nrow(ff3_tbl)))

## FF3 data: 196 monthly observations

cat(sprintf("Date range: %s  to  %s\n",
            as.character(min(ff3_tbl$date)),
            as.character(max(ff3_tbl$date))))

## Date range: 2010-01-01  to  2026-04-01

print(head(ff3_tbl))

## # A tibble: 6 × 5
##   date         MktRF     SMB     HML     RF
##   <date>       <dbl>   <dbl>   <dbl>  <dbl>
## 1 2010-01-01 -0.0335  0.0043  0.0033 0     
## 2 2010-02-01  0.0339  0.0118  0.0318 0     
## 3 2010-03-01  0.063   0.0146  0.0219 0.0001
## 4 2010-04-01  0.0199  0.0484  0.0296 0.0001
## 5 2010-05-01 -0.079   0.0013 -0.0248 0.0001
## 6 2010-06-01 -0.0556 -0.0179 -0.0473 0.0001

2.5 Q5 — Merge Monthly Returns with FF3 Factors

merged_tbl <- monthly_tbl %>%
  inner_join(ff3_tbl, by = "date") %>%
  arrange(date)

cat(sprintf("Merged dataset: %d rows × %d columns\n",
            nrow(merged_tbl), ncol(merged_tbl)))

## Merged dataset: 196 rows × 13 columns

cat(sprintf("Date range: %s  to  %s\n",
            as.character(min(merged_tbl$date)),
            as.character(max(merged_tbl$date))))

## Date range: 2010-01-01  to  2026-04-01

print(head(merged_tbl))

## # A tibble: 6 × 13
##   date            SPY     QQQ      EEM     IWM     EFA      TLT     IYR      GLD
##   <date>        <dbl>   <dbl>    <dbl>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
## 1 2010-01-01 -0.0524  -0.0782 -0.104   -0.0605 -0.0749  0.0278  -0.0520 -0.0350 
## 2 2010-02-01  0.0154   0.0347 -0.00890  0.0326 -0.0153  0.00570  0.0357  0.00997
## 3 2010-03-01  0.0500   0.0617  0.0631   0.0577  0.0556 -0.0201   0.0863 -0.00439
## 4 2010-04-01  0.00857  0.0224 -0.0271   0.0471 -0.0449  0.0358   0.0590  0.0463 
## 5 2010-05-01 -0.0912  -0.0867 -0.0989  -0.0957 -0.118   0.0525  -0.0852  0.0272 
## 6 2010-06-01 -0.0355  -0.0510  0.00447 -0.0486 -0.0108  0.0506  -0.0278  0.0148 
## # ℹ 4 more variables: MktRF <dbl>, SMB <dbl>, HML <dbl>, RF <dbl>

2.6 Q6 — CAPM-Based GMV Portfolio (Single Snapshot: 2015-01 → 2015-02)

The Global Minimum Variance (GMV) portfolio solves:

\[\min_{\mathbf{w}} \; \mathbf{w}^\top \Sigma \mathbf{w} \quad \text{subject to} \quad \mathbf{1}^\top \mathbf{w} = 1\]

Under the CAPM single-index model, the structured covariance matrix is:

\[\Sigma_{\text{CAPM}} = \boldsymbol{\beta}\boldsymbol{\beta}^\top \sigma_m^2 + \mathbf{D}, \qquad \mathbf{D} = \text{diag}\!\left(\sigma_{e_1}^2, \ldots, \sigma_{e_n}^2\right)\]

This reduces estimation from $n(n+1)/2 = 36$ free parameters to $2n+1 = 17$, improving out-of-sample stability.

# ---------------------------------------------------------------------------
# Helper: Compute GMV weights via quadratic programming
#   min  w' Sigma w   s.t.  1'w = 1
# ---------------------------------------------------------------------------
gmv_weights <- function(Sigma) {
  n     <- ncol(Sigma)
  Dmat  <- 2 * Sigma
  # Ensure positive definiteness (add small ridge if needed)
  Dmat  <- Dmat + diag(1e-8, n)
  dvec  <- rep(0, n)
  Amat  <- matrix(1, nrow = n, ncol = 1)
  bvec  <- 1
  sol   <- quadprog::solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
  w     <- sol$solution
  names(w) <- colnames(Sigma)
  w
}

# ---------------------------------------------------------------------------
# Training window: 2010-02-01 to 2015-01-01  (60 months)
# ---------------------------------------------------------------------------
train_start <- as.Date("2010-02-01")
train_end   <- as.Date("2015-01-01")

train_data  <- merged_tbl %>%
  filter(date >= train_start, date <= train_end)

n_assets    <- length(tickers)
ret_mat     <- as.matrix(train_data[, tickers])
mkt_excess  <- train_data$MktRF
rf_monthly  <- train_data$RF

# Excess returns matrix  (asset returns minus risk-free rate)
excess_mat  <- sweep(ret_mat, 1, rf_monthly, "-")

# OLS regressions: excess_i = alpha_i + beta_i * MktRF + eps_i
betas_capm  <- numeric(n_assets)
resid_var   <- numeric(n_assets)

for (i in seq_along(tickers)) {
  fit             <- lm(excess_mat[, i] ~ mkt_excess)
  betas_capm[i]  <- coef(fit)[2]
  resid_var[i]   <- var(residuals(fit))
}

sigma2_mkt  <- var(mkt_excess)
Sigma_capm  <- outer(betas_capm, betas_capm) * sigma2_mkt + diag(resid_var)
colnames(Sigma_capm) <- rownames(Sigma_capm) <- tickers

cat("CAPM factor loadings (betas):\n")

## CAPM factor loadings (betas):

print(round(setNames(betas_capm, tickers), 4))

##     SPY     QQQ     EEM     IWM     EFA     TLT     IYR     GLD 
##  0.9275  0.9711  1.1056  1.2203  1.0217 -0.6590  0.8336  0.1422

cat("\nCAPM Covariance Matrix (annualised ×12):\n")

## 
## CAPM Covariance Matrix (annualised ×12):

print(round(Sigma_capm * 12, 6))

##           SPY       QQQ       EEM       IWM       EFA       TLT       IYR
## SPY  0.017799  0.016445  0.018724  0.020666  0.017303 -0.011160  0.014117
## QQQ  0.016445  0.021406  0.019602  0.021635  0.018115 -0.011684  0.014779
## EEM  0.018724  0.019602  0.039569  0.024633  0.020625 -0.013303  0.016826
## IWM  0.020666  0.021635  0.024633  0.036006  0.022764 -0.014683  0.018572
## EFA  0.017303  0.018115  0.020625  0.022764  0.027802 -0.012294  0.015550
## TLT -0.011160 -0.011684 -0.013303 -0.014683 -0.012294  0.017916 -0.010030
## IYR  0.014117  0.014779  0.016826  0.018572  0.015550 -0.010030  0.026253
## GLD  0.002409  0.002522  0.002871  0.003169  0.002653 -0.001711  0.002165
##           GLD
## SPY  0.002409
## QQQ  0.002522
## EEM  0.002871
## IWM  0.003169
## EFA  0.002653
## TLT -0.001711
## IYR  0.002165
## GLD  0.032878

# GMV optimal weights
w_gmv_capm <- gmv_weights(Sigma_capm)
cat("\nGMV weights (CAPM) as of 2015-01:\n")

## 
## GMV weights (CAPM) as of 2015-01:

print(round(w_gmv_capm, 4))

##     SPY     QQQ     EEM     IWM     EFA     TLT     IYR     GLD 
##  0.2709  0.1090  0.0068 -0.0195  0.0376  0.4576  0.0592  0.0784

cat(sprintf("Weights sum: %.6f\n", sum(w_gmv_capm)))

## Weights sum: 1.000000

# Realised return in 2015-02
ret_201502 <- merged_tbl %>%
  filter(date == as.Date("2015-02-01")) %>%
  select(all_of(tickers)) %>%
  as.numeric()

realized_capm <- sum(w_gmv_capm * ret_201502)
cat(sprintf("\nRealized GMV portfolio return (CAPM), 2015-02: %.4f%%\n",
            realized_capm * 100))

## 
## Realized GMV portfolio return (CAPM), 2015-02: -1.2127%

Interpretation: The CAPM covariance matrix uses only the market factor to explain cross-asset co-movement. The GMV weights tilt toward assets with low market betas and low residual variances (e.g., TLT and GLD typically receive higher weights), minimising overall portfolio variance subject to being fully invested.

2.7 Q7 — FF3-Based GMV Portfolio (Single Snapshot: 2015-01 → 2015-02)

Under the Fama-French three-factor model:

\[R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,m}\,\text{MktRF}_t + \beta_{i,\text{SMB}}\,\text{SMB}_t + \beta_{i,\text{HML}}\,\text{HML}_t + \varepsilon_{i,t}\]

The structured covariance matrix:

\[\Sigma_{\text{FF3}} = \mathbf{B}\,\Sigma_F\,\mathbf{B}^\top + \mathbf{D}\]

where $\mathbf{B}$ is the $n \times 3$ matrix of factor loadings, $\Sigma_F$ is the $3 \times 3$ factor return covariance matrix, and $\mathbf{D} = \text{diag}(\sigma_{e_i}^2)$.

factors_mat   <- as.matrix(train_data[, c("MktRF", "SMB", "HML")])

B_ff3         <- matrix(NA, nrow = n_assets, ncol = 3,
                         dimnames = list(tickers, c("MktRF", "SMB", "HML")))
resid_var_ff3 <- numeric(n_assets)

for (i in seq_along(tickers)) {
  fit               <- lm(excess_mat[, i] ~ factors_mat)
  B_ff3[i, ]       <- coef(fit)[-1]
  resid_var_ff3[i] <- var(residuals(fit))
}

Sigma_F   <- cov(factors_mat)
Sigma_ff3 <- B_ff3 %*% Sigma_F %*% t(B_ff3) + diag(resid_var_ff3)
colnames(Sigma_ff3) <- rownames(Sigma_ff3) <- tickers

cat("FF3 factor loadings (B matrix):\n")

## FF3 factor loadings (B matrix):

print(round(B_ff3, 4))

##       MktRF     SMB     HML
## SPY  0.9507 -0.0803 -0.0196
## QQQ  1.0516 -0.0288 -0.4419
## EEM  1.1401  0.0283 -0.2497
## IWM  0.9615  1.0233  0.0323
## EFA  1.1683 -0.4518 -0.2083
## TLT -0.5516 -0.1527 -0.4186
## IYR  0.8292  0.0761 -0.0875
## GLD  0.2230  0.3070 -0.9436

cat("\nFF3 Covariance Matrix (annualised ×12):\n")

## 
## FF3 Covariance Matrix (annualised ×12):

print(round(Sigma_ff3 * 12, 6))

##           SPY       QQQ       EEM       IWM       EFA       TLT       IYR
## SPY  0.017799  0.016480  0.018729  0.020339  0.017459 -0.011088  0.014097
## QQQ  0.016480  0.021406  0.020047  0.021583  0.018484 -0.010939  0.014935
## EEM  0.018729  0.020047  0.039569  0.024786  0.020755 -0.012904  0.016929
## IWM  0.020339  0.021583  0.024786  0.036006  0.020943 -0.015246  0.018897
## EFA  0.017459  0.018484  0.020755  0.020943  0.027802 -0.011724  0.015479
## TLT -0.011088 -0.010939 -0.012904 -0.015246 -0.011724  0.017916 -0.009923
## IYR  0.014097  0.014935  0.016929  0.018897  0.015479 -0.009923  0.026253
## GLD  0.002366  0.004204  0.003888  0.004569  0.002793 -0.000305  0.002620
##           GLD
## SPY  0.002366
## QQQ  0.004204
## EEM  0.003888
## IWM  0.004569
## EFA  0.002793
## TLT -0.000305
## IYR  0.002620
## GLD  0.032878

# GMV optimal weights
w_gmv_ff3 <- gmv_weights(Sigma_ff3)
cat("\nGMV weights (FF3) as of 2015-01:\n")

## 
## GMV weights (FF3) as of 2015-01:

print(round(w_gmv_ff3, 4))

##     SPY     QQQ     EEM     IWM     EFA     TLT     IYR     GLD 
##  0.3281  0.0286 -0.0015  0.0295  0.0216  0.4688  0.0641  0.0608

cat(sprintf("Weights sum: %.6f\n", sum(w_gmv_ff3)))

## Weights sum: 1.000000

realized_ff3 <- sum(w_gmv_ff3 * ret_201502)
cat(sprintf("\nRealized GMV portfolio return (FF3), 2015-02: %.4f%%\n",
            realized_ff3 * 100))

## 
## Realized GMV portfolio return (FF3), 2015-02: -1.3155%

# Comparison
cat("\n--- Single-period comparison (2015-02) ---\n")

## 
## --- Single-period comparison (2015-02) ---

cat(sprintf("CAPM GMV realised return: %.4f%%\n", realized_capm * 100))

## CAPM GMV realised return: -1.2127%

cat(sprintf("FF3  GMV realised return: %.4f%%\n", realized_ff3  * 100))

## FF3  GMV realised return: -1.3155%

Interpretation: The FF3 model adds two additional factors — size (SMB) and value (HML) — which capture systematic cross-sectional variation missed by the single market factor. The FF3 covariance matrix accounts for common variation driven by size and value tilts, potentially leading to different (and potentially better diversified) GMV weights than the CAPM-based matrix.

2.8 Q8 — Rolling 60-Month Backtest: GMV Portfolios (2015-02 to 2026-05)

Procedure:

At each month-end $t$ (estimation end), use the previous 60 months ($t-59$ to $t$) as the training window.
Estimate CAPM and FF3 covariance matrices; solve for GMV weights.
Apply those weights to assets in month $t+1$ (out-of-sample); record realised return.
Roll forward one month and repeat from 2015-02 through 2026-05.

all_dates   <- merged_tbl$date
n_months    <- nrow(merged_tbl)
window      <- 60

# Index positions: estimation ends at position (window) to (n_months-1)
# OOS (realised return) is at position (window+1) to (n_months)
est_idx  <- window:(n_months - 1)
oos_idx  <- (window + 1):n_months

est_ends  <- all_dates[est_idx]
oos_dates <- all_dates[oos_idx]

# Restrict to 2015-02 onwards
start_oos <- as.Date("2015-02-01")
keep      <- oos_dates >= start_oos
est_ends  <- est_ends[keep]
oos_dates <- oos_dates[keep]

bt_results <- tibble(
  date     = oos_dates,
  ret_capm = NA_real_,
  ret_ff3  = NA_real_
)

for (k in seq_along(oos_dates)) {

  ed  <- est_ends[k]
  oos <- oos_dates[k]

  # 60-month training slice (most recent 60 months ending at ed)
  tr  <- merged_tbl %>% filter(date <= ed) %>% tail(60)

  if (nrow(tr) < window) next  # safety check

  rm_   <- as.matrix(tr[, tickers])
  mkt_  <- tr$MktRF
  rf_   <- tr$RF
  exc_  <- sweep(rm_, 1, rf_, "-")
  fac_  <- as.matrix(tr[, c("MktRF", "SMB", "HML")])

  # ---- CAPM covariance ----
  b1 <- numeric(n_assets)
  rv1 <- numeric(n_assets)
  for (i in seq_along(tickers)) {
    fit    <- lm(exc_[, i] ~ mkt_)
    b1[i]  <- coef(fit)[2]
    rv1[i] <- var(residuals(fit))
  }
  Sc1 <- outer(b1, b1) * var(mkt_) + diag(rv1)
  colnames(Sc1) <- rownames(Sc1) <- tickers
  w1  <- tryCatch(gmv_weights(Sc1),
                  error = function(e) rep(1 / n_assets, n_assets))

  # ---- FF3 covariance ----
  B2  <- matrix(NA, n_assets, 3)
  rv2 <- numeric(n_assets)
  for (i in seq_along(tickers)) {
    fit     <- lm(exc_[, i] ~ fac_)
    B2[i, ] <- coef(fit)[-1]
    rv2[i]  <- var(residuals(fit))
  }
  Sc2 <- B2 %*% cov(fac_) %*% t(B2) + diag(rv2)
  colnames(Sc2) <- rownames(Sc2) <- tickers
  w2  <- tryCatch(gmv_weights(Sc2),
                  error = function(e) rep(1 / n_assets, n_assets))

  # ---- Out-of-sample realised returns ----
  oos_ret <- merged_tbl %>%
    filter(date == oos) %>%
    select(all_of(tickers)) %>%
    as.numeric()

  bt_results$ret_capm[k] <- sum(w1 * oos_ret)
  bt_results$ret_ff3[k]  <- sum(w2 * oos_ret)
}

cat(sprintf("Backtest period: %s  to  %s\n",
            as.character(min(bt_results$date)),
            as.character(max(bt_results$date))))

## Backtest period: 2015-02-01  to  2026-04-01

cat(sprintf("Number of OOS months: %d\n", nrow(bt_results)))

## Number of OOS months: 135

print(head(bt_results, 10))

## # A tibble: 10 × 3
##    date       ret_capm  ret_ff3
##    <date>        <dbl>    <dbl>
##  1 2015-02-01 -0.0121  -0.0132 
##  2 2015-03-01  0.00226  0.00491
##  3 2015-04-01 -0.0175  -0.0209 
##  4 2015-05-01 -0.00401 -0.00448
##  5 2015-06-01 -0.0287  -0.0271 
##  6 2015-07-01  0.0322   0.0289 
##  7 2015-08-01 -0.0348  -0.0352 
##  8 2015-09-01  0.00861  0.00576
##  9 2015-10-01  0.0411   0.0339 
## 10 2015-11-01 -0.0128  -0.0118

2.8.1 Cumulative Returns Plot

# Build long-format cumulative return series
bt_cum <- bt_results %>%
  filter(!is.na(ret_capm), !is.na(ret_ff3)) %>%
  mutate(
    cum_capm = cumprod(1 + ret_capm),
    cum_ff3  = cumprod(1 + ret_ff3)
  ) %>%
  select(date, cum_capm, cum_ff3) %>%
  pivot_longer(cols = c(cum_capm, cum_ff3),
               names_to  = "model",
               values_to = "cum_return") %>%
  mutate(model = recode(model,
                        cum_capm = "CAPM GMV",
                        cum_ff3  = "FF3 GMV"))

# Add a baseline row at 2015-01-01 with value = 1.0
baseline <- tibble(
  date      = as.Date("2015-01-01"),
  model     = c("CAPM GMV", "FF3 GMV"),
  cum_return = 1.0
)
bt_cum <- bind_rows(baseline, bt_cum)

ggplot(bt_cum, aes(x = date, y = cum_return, colour = model, linetype = model)) +
  geom_line(linewidth = 0.9) +
  geom_hline(yintercept = 1, linetype = "dashed", colour = "grey50", linewidth = 0.4) +
  scale_y_continuous(
    labels = function(x) paste0(round((x - 1) * 100, 0), "%"),
    name   = "Cumulative Return (base = 100%)"
  ) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_colour_manual(values   = c("CAPM GMV" = "#1565C0", "FF3 GMV" = "#C62828")) +
  scale_linetype_manual(values = c("CAPM GMV" = "solid",   "FF3 GMV" = "solid")) +
  labs(
    title    = "Cumulative Returns: GMV Portfolios — CAPM vs Fama-French 3-Factor",
    subtitle = "Rolling 60-month estimation window | Out-of-sample period: 2015-02 to 2026-05",
    x        = NULL,
    colour   = "Model", linetype = "Model"
  ) +
  theme_bw(base_size = 13) +
  theme(
    legend.position  = "bottom",
    legend.key.width = unit(1.5, "cm"),
    panel.grid.minor = element_blank(),
    plot.title       = element_text(face = "bold")
  )

2.8.2 Performance Summary Table

# Annualised performance metrics
compute_perf <- function(r, label) {
  n_months  <- length(r)
  ann_ret   <- (prod(1 + r) ^ (12 / n_months) - 1) * 100
  ann_vol   <- sd(r) * sqrt(12) * 100
  sharpe    <- (mean(r) / sd(r)) * sqrt(12)
  cr        <- cumprod(1 + r)
  max_dd    <- min((cr - cummax(cr)) / cummax(cr)) * 100
  total_ret <- (prod(1 + r) - 1) * 100

  tibble(
    Model           = label,
    `Total Return (%)` = round(total_ret, 2),
    `Ann. Return (%)` = round(ann_ret, 2),
    `Ann. Volatility (%)` = round(ann_vol, 2),
    `Sharpe Ratio`   = round(sharpe, 3),
    `Max Drawdown (%)` = round(max_dd, 2)
  )
}

perf_tbl <- bind_rows(
  compute_perf(na.omit(bt_results$ret_capm), "CAPM GMV"),
  compute_perf(na.omit(bt_results$ret_ff3),  "FF3 GMV")
)

knitr::kable(
  perf_tbl,
  align   = c("l", rep("r", 5)),
  caption = "Out-of-Sample GMV Portfolio Performance (2015-02 – 2026-05)"
)

Out-of-Sample GMV Portfolio Performance (2015-02 – 2026-05)
Model	Total Return (%)	Ann. Return (%)	Ann. Volatility (%)	Sharpe Ratio	Max Drawdown (%)
CAPM GMV	77.30	5.22	10.13	0.554	-29.56
FF3 GMV	42.88	3.22	10.20	0.362	-34.40

2.8.3 Monthly Returns Distribution

bt_long <- bt_results %>%
  filter(!is.na(ret_capm), !is.na(ret_ff3)) %>%
  pivot_longer(cols = c(ret_capm, ret_ff3),
               names_to  = "model",
               values_to = "monthly_return") %>%
  mutate(model = recode(model,
                        ret_capm = "CAPM GMV",
                        ret_ff3  = "FF3 GMV"))

ggplot(bt_long, aes(x = monthly_return, fill = model)) +
  geom_histogram(bins = 50, alpha = 0.6, position = "identity") +
  geom_vline(aes(xintercept = 0), linetype = "dashed", colour = "black") +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(values = c("CAPM GMV" = "#1565C0", "FF3 GMV" = "#C62828")) +
  facet_wrap(~ model, ncol = 2) +
  labs(
    title  = "Distribution of Monthly Out-of-Sample Returns",
    x      = "Monthly Return",
    y      = "Count",
    fill   = "Model"
  ) +
  theme_bw(base_size = 12) +
  theme(legend.position = "none",
        panel.grid.minor = element_blank())

2.8.4 Rolling 12-Month Sharpe Ratio

# Rolling 12-month annualised Sharpe ratio
roll_sharpe <- function(r, window = 12) {
  n <- length(r)
  sharpe <- rep(NA_real_, n)
  for (i in window:n) {
    sub <- r[(i - window + 1):i]
    sharpe[i] <- (mean(sub) / sd(sub)) * sqrt(12)
  }
  sharpe
}

bt_rolling <- bt_results %>%
  filter(!is.na(ret_capm), !is.na(ret_ff3)) %>%
  mutate(
    sr_capm = roll_sharpe(ret_capm),
    sr_ff3  = roll_sharpe(ret_ff3)
  ) %>%
  pivot_longer(cols = c(sr_capm, sr_ff3),
               names_to  = "model",
               values_to = "rolling_sharpe") %>%
  mutate(model = recode(model, sr_capm = "CAPM GMV", sr_ff3 = "FF3 GMV"))

ggplot(bt_rolling, aes(x = date, y = rolling_sharpe, colour = model)) +
  geom_line(linewidth = 0.8, na.rm = TRUE) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  scale_colour_manual(values = c("CAPM GMV" = "#1565C0", "FF3 GMV" = "#C62828")) +
  labs(
    title    = "Rolling 12-Month Annualised Sharpe Ratio",
    subtitle = "CAPM GMV vs FF3 GMV | Out-of-sample 2015-02 to 2026-05",
    x        = NULL,
    y        = "Annualised Sharpe Ratio",
    colour   = "Model"
  ) +
  theme_bw(base_size = 12) +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank())

2.8.5 Interpretation and Conclusions

Model design:

CAPM GMV models the covariance matrix using a single market factor: $\Sigma = \beta\beta^\top \sigma_m^2 + D$. This is highly parsimonious (17 parameters for 8 assets), reducing estimation error but potentially missing systematic variation explained by size and value factors.
FF3 GMV extends to three factors: Market, SMB (small-minus-big), and HML (high-minus-low). With $\Sigma = B\Sigma_F B^\top + D$, it captures additional cross-asset covariation driven by the size and value tilts present in ETFs like IWM (small caps) and IYR (real estate/value), potentially producing a better-diversified minimum variance portfolio.

Key takeaways from the backtest:

Factor-model covariance estimation (both CAPM and FF3) is superior to raw sample covariance in GMV contexts because it imposes structure, reducing the impact of estimation error in the 60-month rolling window.
The FF3 model typically delivers improved covariance estimation for portfolios containing ETFs with meaningful size and value tilts (IWM, IYR), where the SMB and HML factors explain significant cross-sectional return variation.
Both portfolios will tend to tilt toward low-volatility defensive assets (TLT, GLD) since the GMV objective rewards low variance, regardless of expected return.
Performance differences between the two models depend on whether the SMB and HML factors provide genuine diversification benefit in a given sub-period; this varies across market regimes (e.g., growth vs value cycles).

End of Final Exam

## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 26.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Taipei
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] scales_1.4.0               quadprog_1.5-8            
##  [3] forcats_1.0.0              stringr_1.5.1             
##  [5] dplyr_1.2.0                readr_2.1.5               
##  [7] tidyr_1.3.1                tibble_3.2.1              
##  [9] ggplot2_4.0.2              tidyverse_2.0.0           
## [11] purrr_1.2.1                timetk_2.9.1              
## [13] lubridate_1.9.3            PerformanceAnalytics_2.0.8
## [15] quantmod_0.4.28            TTR_0.24.4                
## [17] xts_0.14.1                 zoo_1.8-14                
## [19] tidyquant_1.0.11          
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1    timeDate_4051.111   farver_2.1.2       
##  [4] S7_0.2.1            fastmap_1.2.0       digest_0.6.37      
##  [7] rpart_4.1.23        timechange_0.3.0    lifecycle_1.0.5    
## [10] yardstick_1.3.2     survival_3.6-4      magrittr_2.0.3     
## [13] compiler_4.4.1      rlang_1.1.7         sass_0.4.9         
## [16] tools_4.4.1         utf8_1.2.4          yaml_2.3.10        
## [19] data.table_1.16.2   knitr_1.49          labeling_0.4.3     
## [22] curl_7.0.0          DiceDesign_1.10     RColorBrewer_1.1-3 
## [25] parsnip_1.4.1       withr_3.0.2         workflows_1.3.0    
## [28] nnet_7.3-19         grid_4.4.1          tune_2.0.1         
## [31] fansi_1.0.6         future_1.67.0       globals_0.18.0     
## [34] MASS_7.3-60.2       cli_3.6.5           rmarkdown_2.29     
## [37] generics_0.1.3      rstudioapi_0.17.1   future.apply_1.20.0
## [40] tzdb_0.5.0          cachem_1.1.0        splines_4.4.1      
## [43] dials_1.4.2         parallel_4.4.1      vctrs_0.7.2        
## [46] hardhat_1.4.2       Matrix_1.7-0        jsonlite_1.8.9     
## [49] hms_1.1.3           RobStatTM_1.0.11    listenv_0.9.1      
## [52] gower_1.0.2         jquerylib_0.1.4     recipes_1.3.1      
## [55] glue_1.8.0          parallelly_1.45.1   codetools_0.2-20   
## [58] stringi_1.8.4       rsample_1.3.2       gtable_0.3.6       
## [61] GPfit_1.0-9         pillar_1.9.0        furrr_0.3.1        
## [64] htmltools_0.5.8.1   ipred_0.9-15        lava_1.8.1         
## [67] R6_2.5.1            lhs_1.2.1           evaluate_1.0.1     
## [70] lattice_0.22-6      bslib_0.8.0         class_7.3-22       
## [73] Rcpp_1.0.13         prodlim_2025.04.28  xfun_0.49          
## [76] pkgconfig_2.0.3

Final Exam: Portfolio Analysis

Jourast Buwana

2026-06-10