1 Questions from Textbook (60%)

1.1 Chapter 7

1.1.1 Background (CFA Problems 1–4)

Hennessy & Associates manages a $30 million equity portfolio for the Wilstead Pension Fund. Jason Jones proposed limiting the portfolio to no more than 20 stocks (doubling commitment to the ones Hennessy favors), down from ~40.


1.1.2 CFA Problem 1

Question: Will limiting to 20 stocks likely increase or decrease portfolio risk? Is there a way to reduce from 40 to 20 without significantly affecting risk?

Answer:

(a) Limiting the portfolio to 20 stocks will increase portfolio risk (increase unsystematic/diversifiable risk). With fewer stocks, idiosyncratic risk from individual companies is less likely to cancel out across the portfolio. The portfolio currently holds about 40 stocks with 2–3% allocated to each. Cutting to 20 while doubling position sizes concentrates the portfolio significantly.

(b) Yes — Hennessy could reduce to 20 stocks without significantly affecting risk if he selects stocks with low or negative correlations with each other. By choosing 20 stocks that have low pairwise correlations, the diversification benefit per stock is higher, potentially maintaining a similar level of total portfolio risk. Additionally, if the removed 20 stocks are those most highly correlated with the remaining 20, diversification loss is minimized.


1.1.3 CFA Problem 2

Question: A committee member suggested further reducing to 10 stocks. If reducing to 20 is advantageous, explain why reducing to 10 might be less likely to be advantageous.

Answer:

The benefit of concentrating a skilled stock-picker’s portfolio diminishes as the number of holdings falls below a threshold. Going from 40 → 20 stocks still leaves reasonable diversification while concentrating in the manager’s best ideas. However, going from 20 → 10 introduces diminishing marginal benefits alongside increasing marginal costs:

  • Diminishing benefit: The incremental gain from concentrating even further in “best ideas” is smaller — the 10th-best idea is not dramatically better than the 20th.
  • Increasing cost: Unsystematic risk rises steeply as the number of stocks falls below ~15–20. At 10 stocks, firm-specific shocks to any single holding can have a meaningful impact on total portfolio return.
  • Evaluation basis: If Wilstead evaluates Hennessy independently of the other portfolios, the increased volatility from a 10-stock portfolio may cause apparent underperformance in any given year purely from bad luck, even if skill is unchanged.

1.1.4 CFA Problem 3

Question: If the committee considers the effect of a change in the Hennessy portfolio on the total fund, how might this broader view affect the decision to limit holdings to 10 or 20 issues?

Answer:

From the total fund perspective (all six managers, $280M+), the Hennessy portfolio is just one component. The five other managers hold 150+ issues totaling $250M. Even if Hennessy holds only 10–20 stocks, the total fund remains highly diversified at the overall level.

This broader view makes the proposal more attractive or at least less risky:

  • The unsystematic risk added by Hennessy’s concentration is largely diversified away at the fund level by the other managers’ holdings.
  • The committee should evaluate Hennessy’s contribution based on its marginal impact on the total fund’s risk-return profile, not its standalone variance.
  • Since Hennessy has demonstrated stock-selection skill, concentrating his portfolio (to 10 or 20) allows his alpha to have more impact on the total fund’s return, without meaningfully increasing total fund risk.

Therefore, the broader view makes limiting to even 10 issues more defensible than when evaluating Hennessy’s portfolio in isolation.


1.1.5 CFA Problem 4

Question: Which portfolio cannot lie on the efficient frontier as described by Markowitz?

Portfolio Expected Return (%) Standard Deviation (%)
W 15 36
X 12 15
Z 5 7
Y 9 21

Answer:

Portfolio W cannot lie on the efficient frontier.

Portfolio X dominates Portfolio W: X offers a lower expected return (12% vs 15%), but dramatically lower standard deviation (15% vs 36%). More precisely, no rational investor would hold W when Z offers 5% return with 7% SD — suggesting the risk-return tradeoff for W is far outside the efficient region compared to combinations of other portfolios.

More formally, comparing W and X: for W to be on the efficient frontier, there should be no portfolio with the same or higher expected return and lower risk, or same/lower risk and higher return. Given that X has a much superior Sharpe ratio and Y also dominates W in risk-adjusted terms, W lies below the efficient frontier (inefficient).

Answer: Portfolio W (a)


1.1.6 CFA Problem 10

Question: Given three stocks A, B, C with the following statistics, which portfolio would you recommend: equal amounts of A & B, or equal amounts of B & C?

Stock Std Dev (%) Correlation with A Correlation with B Correlation with C
A 40 1.00 0.90 0.50
B 20 0.90 1.00 0.10
C 40 0.50 0.10 1.00

Answer:

Calculate portfolio variance for each (equal weights = 0.5 each):

Portfolio A & B:

\[\sigma^2_{AB} = (0.5)^2(40)^2 + (0.5)^2(20)^2 + 2(0.5)(0.5)(0.90)(40)(20)\] \[= 400 + 100 + 720 = 1220\] \[\sigma_{AB} = \sqrt{1220} \approx 34.93\%\]

Portfolio B & C:

\[\sigma^2_{BC} = (0.5)^2(20)^2 + (0.5)^2(40)^2 + 2(0.5)(0.5)(0.10)(20)(40)\] \[= 100 + 400 + 80 = 580\] \[\sigma_{BC} = \sqrt{580} \approx 24.08\%\]

Recommendation: Portfolio B & C. Since we have no information on expected returns (only risk), and B&C has substantially lower portfolio standard deviation (≈24.08% vs ≈34.93%), the B&C portfolio offers superior diversification. The near-zero correlation (0.10) between B and C provides strong diversification benefits, far outweighing A&B’s high correlation (0.90).


1.2 Chapter 8

1.2.1 CFA Problem 1

Question: Regression results for ABC and XYZ stocks:

Statistic ABC XYZ
Alpha -3.20% 7.3%
Beta 0.60 0.97
0.35 0.17
Residual standard deviation 13.02% 21.45%

Additional betas from brokerages (past 2 years):

Brokerage Beta of ABC Beta of XYZ
A 0.62 1.45
B 0.71 1.25

Answer:

ABC: The negative alpha (−3.20%) indicates ABC underperformed relative to CAPM predictions over the 5-year period. Beta of 0.60 indicates below-market systematic risk. R² = 0.35 means 35% of ABC’s return variance is explained by the market — meaning 65% is firm-specific. The relatively low beta and low R² suggest it’s a defensive, less market-sensitive stock with substantial idiosyncratic risk.

XYZ: Positive alpha (7.3%) indicates outperformance relative to CAPM. Beta ≈ 1.0 means it moves nearly 1-for-1 with the market. However, R² = 0.17 is very low — only 17% of variance explained by the market — and the residual standard deviation (21.45%) is high. Most of XYZ’s risk is firm-specific.

Implications for future risk-return in a diversified portfolio:

  • In a well-diversified portfolio, unsystematic risk is eliminated. Therefore, the relevant risk measure is beta, not total standard deviation.
  • ABC (β = 0.60) will contribute less systematic risk; XYZ (β ≈ 0.97–1.45 depending on the period) contributes more.
  • The recent brokerage estimates show XYZ’s beta has increased substantially (1.25–1.45), suggesting higher systematic risk going forward.
  • ABC’s alpha may not persist; past negative alpha does not guarantee future underperformance, and vice versa for XYZ.
  • For a diversified portfolio, XYZ carries more relevant (systematic) risk than the 5-year regression implies.

1.2.2 CFA Problem 2

Question: The correlation between Baker Fund and the market index is 0.70. What percentage of Baker Fund’s total risk is specific (nonsystematic)?

Answer:

\[R^2 = \rho^2 = (0.70)^2 = 0.49\]

  • Systematic risk accounts for 49% of total variance.
  • Nonsystematic (specific) risk = \(1 - R^2 = 1 - 0.49 = \mathbf{0.51}\)

51% of Baker Fund’s total risk is nonsystematic (specific/firm-level).


1.2.3 CFA Problem 3

Question: The correlation between Charlottesville International Fund and the world market index is 1.0. Expected world market return = 11%, expected fund return = 9%, risk-free rate = 3%. What is the implied beta?

Answer:

Using CAPM: \[E(r) = r_f + \beta [E(r_M) - r_f]\] \[9\% = 3\% + \beta(11\% - 3\%)\] \[6\% = \beta \times 8\%\] \[\beta = \frac{6\%}{8\%} = \mathbf{0.75}\]

The implied beta of Charlottesville International is 0.75.

(Note: Since ρ = 1.0, the fund moves perfectly with the world market but at 75% of its magnitude.)


1.2.4 CFA Problem 4

Question: The concept of beta is most closely associated with:

  1. Correlation coefficients
  2. Mean-variance analysis
  3. Nonsystematic risk
  4. Systematic risk

Answer: (d) Systematic risk.

Beta measures the sensitivity of an asset’s returns to market (systematic) risk — the risk that cannot be diversified away. It quantifies how much a security moves relative to the overall market.


1.2.5 CFA Problem 5

Question: Beta and standard deviation differ as risk measures in that beta measures:

  1. Only unsystematic risk, while standard deviation measures total risk.
  2. Only systematic risk, while standard deviation measures total risk.
  3. Both systematic and unsystematic risk, while standard deviation measures only unsystematic risk.
  4. Both systematic and unsystematic risk, while standard deviation measures only systematic risk.

Answer: (b)

  • Beta measures only the systematic (market) component of risk — the co-movement with the market portfolio.
  • Standard deviation captures total risk = systematic + unsystematic risk.

In a diversified portfolio, unsystematic risk is eliminated, making beta the appropriate risk measure. Standard deviation is appropriate for undiversified portfolios or individual assets held in isolation.


1.3 Chapter 9

The following data apply to CFA Problems 8 and 9:

Portfolio Avg Annual Return Std Dev Beta
R 11% 10% 0.5
S&P 500 14% 12% 1.0

(Assume risk-free rate ≈ implied from SML; using S&P as market proxy)


1.3.1 CFA Problem 8

Question: When plotting portfolio R relative to the SML, portfolio R lies:

  1. On the SML
  2. Below the SML
  3. Above the SML
  4. Insufficient data given

Answer:

We need a risk-free rate to draw the SML. Using the S&P 500 as the market (β=1, return=14%) and solving for the implied risk-free rate using CAPM with portfolio R:

\[E(R_R) = r_f + \beta_R [E(R_M) - r_f]\] \[11\% = r_f + 0.5(14\% - r_f)\] \[11\% = r_f + 7\% - 0.5r_f\] \[4\% = 0.5r_f\] \[r_f = 8\%\]

So the SML-predicted return for β = 0.5: \[E(R) = 8\% + 0.5(14\% - 8\%) = 8\% + 3\% = 11\%\]

Portfolio R’s actual return = 11% = predicted return = 11%. Therefore portfolio R lies exactly on the SML.

Answer: (a) On the SML.


1.3.2 CFA Problem 9

Question: When plotting portfolio R relative to the CML, portfolio R lies:

  1. On the CML
  2. Below the CML
  3. Above the CML
  4. Insufficient data given

Answer:

The CML is defined using total risk (standard deviation), not beta. Using rf = 8% (derived above):

\[\text{CML slope (Sharpe ratio of market)} = \frac{14\% - 8\%}{12\%} = 0.50\]

CML-predicted return for σ = 10%: \[E(R) = 8\% + 0.50 \times 10\% = 13\%\]

Portfolio R’s actual return = 11% < 13% (the CML prediction for σ = 10%).

Therefore, portfolio R lies below the CML.

Answer: (b) Below the CML.

(This is consistent — R is on the SML because its systematic risk is fairly priced, but it lies below the CML because it is not a perfectly diversified portfolio; it carries some unsystematic risk reducing its risk-adjusted efficiency.)


1.3.3 CFA Problem 10

Question: Both Portfolio A and Portfolio B have beta = 1.0, but A has high specific risk and B has low specific risk. Should investors expect a higher return on A than B?

Answer:

No. According to CAPM, investors should not expect a higher return on Portfolio A than Portfolio B.

CAPM holds that only systematic risk (beta) is compensated in equilibrium because:

  • Specific (unsystematic) risk is diversifiable — investors can eliminate it at no cost by holding a well-diversified portfolio.
  • Since rational investors hold diversified portfolios, they will not demand a risk premium for bearing avoidable unsystematic risk.

Both portfolios have the same beta (1.0), so CAPM predicts the same expected return for both: \[E(R_A) = E(R_B) = r_f + 1.0 \times [E(R_M) - r_f]\]

Portfolio A’s higher specific risk simply means greater total volatility, but this does not justify higher expected return in a rational, efficient market. Investors holding A in a diversified portfolio face no additional systematic risk vs. holding B.


1.4 Chapter 10

Context (Problems 13–16): Orb Trust uses a two-factor APT model. Factors: (1) real GDP changes, (2) inflation changes.

  • Factor risk premium for real GDP = 8%
  • Factor risk premium for inflation = 2%
  • High Growth Fund: GDP sensitivity = 1.25, inflation sensitivity = 1.5
  • Large Cap Fund: expected return = 8.5% above risk-free; GDP sensitivity = 0.75, inflation sensitivity = 1.25
  • Utility Fund: GDP sensitivity = 1.0, inflation sensitivity = 2.0
  • Risk-free rate = 4%

1.4.1 Problem 13

Question: Using APT with risk-free rate = 4%, what is the expected return of Orb’s High Growth Fund?

Answer:

\[E(R) = r_f + \beta_{GDP} \times RP_{GDP} + \beta_{INF} \times RP_{INF}\] \[E(R) = 4\% + 1.25 \times 8\% + 1.5 \times 2\%\] \[E(R) = 4\% + 10\% + 3\%\] \[\boxed{E(R_{HGF}) = 17\%}\]


1.4.2 Problem 14

Question: Is an arbitrage opportunity available based on McCracken’s APT estimate vs. Kwon’s fundamental analysis for the Large Cap Fund?

Answer:

APT-implied expected return for Large Cap Fund: \[E(R_{LCF})^{APT} = 4\% + 0.75 \times 8\% + 1.25 \times 2\% = 4\% + 6\% + 2.5\% = 12.5\%\]

Kwon’s fundamental analysis: Expected return = 8.5% above the risk-free rate = 4% + 8.5% = 12.5%

The APT estimate (12.5%) equals the fundamental estimate (12.5%).

No arbitrage opportunity exists. The Large Cap Fund is fairly priced according to both methods. An arbitrage opportunity would require a discrepancy between the model-implied and actual/fundamental expected returns.


1.4.3 Problem 15

Question: If the GDP Fund is constructed from the three funds (High Growth, Large Cap, Utility) to have unit sensitivity to GDP and zero sensitivity to inflation, what is its weight in the Utility Fund?

Answer:

Let weights be \(w_H\), \(w_L\), \(w_U\) for High Growth, Large Cap, and Utility Fund respectively.

Constraint 1 (weights sum to 1): \[w_H + w_L + w_U = 1\]

Constraint 2 (GDP sensitivity = 1): \[1.25 w_H + 0.75 w_L + 1.0 w_U = 1\]

Constraint 3 (inflation sensitivity = 0): \[1.5 w_H + 1.25 w_L + 2.0 w_U = 0\]

From Constraint 3: \[1.5 w_H + 1.25 w_L + 2.0 w_U = 0\]

From Constraint 1: \(w_L = 1 - w_H - w_U\)

Substituting into Constraint 3: \[1.5 w_H + 1.25(1 - w_H - w_U) + 2.0 w_U = 0\] \[1.5 w_H + 1.25 - 1.25 w_H - 1.25 w_U + 2.0 w_U = 0\] \[0.25 w_H + 0.75 w_U = -1.25\] \[w_H + 3 w_U = -5 \quad \text{...(A)}\]

Substituting \(w_L = 1 - w_H - w_U\) into Constraint 2: \[1.25 w_H + 0.75(1 - w_H - w_U) + 1.0 w_U = 1\] \[1.25 w_H + 0.75 - 0.75 w_H - 0.75 w_U + w_U = 1\] \[0.5 w_H + 0.25 w_U = 0.25\] \[2 w_H + w_U = 1 \quad \text{...(B)}\]

From (A): \(w_H = -5 - 3w_U\)

Substituting into (B): \[2(-5 - 3w_U) + w_U = 1\] \[-10 - 6w_U + w_U = 1\] \[-5w_U = 11\] \[w_U = -2.2\]

Answer: (a) −2.2

(Negative weight implies a short position in the Utility Fund.)

Then: \(w_H = -5 - 3(-2.2) = -5 + 6.6 = 1.6\) and \(w_L = 1 - 1.6 - (-2.2) = 1.6\)


1.4.4 Problem 16

Question: Who is correct — Stiles (GDP Fund is good for retirees) or McCracken (good if supply-side policies succeed)?

Answer: (c) Stiles is correct and McCracken is wrong.

  • The GDP Fund has unit exposure to real GDP growth and zero exposure to inflation.
  • Stiles argues this is appropriate for retirees who depend on steady investment income. Retirees benefit from real GDP growth (positive economic returns) while being insulated from inflation risk — which erodes the purchasing power of fixed income. This is a sensible rationale.
  • McCracken suggests it’s only good if supply-side policies succeed (i.e., GDP growth without inflation). But the fund already hedges out inflation risk by construction — its zero inflation sensitivity means it doesn’t depend on such policy outcomes to protect investors.
  • Therefore, the GDP Fund’s inflation-neutral, GDP-sensitive structure is generally appropriate for income-dependent investors regardless of specific macroeconomic policy outcomes.

2 Questions Using R Codes (40%)

2.1 Q1: Import Data

library(tidyquant)
library(lubridate)
library(timetk)
library(purrr)
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)
library(quadprog)
library(zoo)      # for as.yearmon()
library(scales)
# Define tickers
tickers <- c("SPY", "QQQ", "EEM", "IWM", "EFA", "TLT", "IYR", "GLD")

# Download daily adjusted prices from Yahoo Finance
prices_raw <- tq_get(tickers,
                     from = "2010-01-01",
                     to   = Sys.Date(),
                     get  = "stock.prices")

# Extract adjusted closing prices and reshape to wide format
prices_wide <- prices_raw %>%
  select(symbol, date, adjusted) %>%
  tidyr::pivot_wider(names_from = symbol, values_from = adjusted)

# Convert to xts
prices_xts <- prices_wide %>%
  column_to_rownames("date") %>%
  as.xts()

# Preview
head(prices_xts)
##                 SPY      QQQ      EEM      IWM      EFA      TLT      IYR
## 2010-01-04 84.79639 40.29079 30.35150 51.36656 35.12844 55.70953 26.76810
## 2010-01-05 85.02087 40.29079 30.57181 51.18993 35.15940 56.06929 26.83238
## 2010-01-06 85.08070 40.04777 30.63577 51.14175 35.30801 55.31872 26.82070
## 2010-01-07 85.43983 40.07380 30.45810 51.51911 35.17178 55.41178 27.06026
## 2010-01-08 85.72415 40.40362 30.69972 51.80012 35.45042 55.38694 26.87912
## 2010-01-11 85.84387 40.23872 30.63577 51.59135 35.74147 55.08302 27.00768
##               GLD
## 2010-01-04 109.80
## 2010-01-05 109.70
## 2010-01-06 111.51
## 2010-01-07 110.82
## 2010-01-08 111.37
## 2010-01-11 112.85
tail(prices_xts)
##               SPY    QQQ   EEM    IWM    EFA   TLT    IYR    GLD
## 2026-06-02 759.57 746.16 70.80 291.66 105.02 85.65  99.99 411.95
## 2026-06-03 754.24 744.21 69.92 287.67 104.12 85.31 100.00 407.87
## 2026-06-04 757.09 740.61 69.10 292.01 104.95 85.50 101.79 411.27
## 2026-06-05 737.55 705.06 64.59 281.65 102.26 85.06 102.54 396.24
## 2026-06-08 739.22 716.07 65.75 284.11 102.88 84.62 101.08 397.27
## 2026-06-09     NA     NA    NA     NA     NA    NA     NA     NA

2.2 Q2: Calculate Weekly and Monthly Returns (Simple Returns)

# Weekly simple returns — use periodReturn correctly on each column
weekly_returns <- do.call(merge, lapply(tickers, function(tk) {
  r <- periodReturn(prices_xts[, tk], period = "weekly", type = "arithmetic")
  colnames(r) <- tk
  r
}))

# Monthly simple returns
monthly_returns <- do.call(merge, lapply(tickers, function(tk) {
  r <- periodReturn(prices_xts[, tk], period = "monthly", type = "arithmetic")
  colnames(r) <- tk
  r
}))

head(weekly_returns)
##                     SPY         QQQ          EEM          IWM         EFA
## 2010-01-04  0.000000000  0.00000000  0.000000000  0.000000000  0.00000000
## 2010-01-11  0.012352948 -0.00129237  0.009365781  0.004376316  0.01745098
## 2010-01-15 -0.009500141 -0.01100069 -0.026907853 -0.009025563 -0.01160788
## 2010-01-25 -0.034054878 -0.03358838 -0.048867763 -0.029522845 -0.04417172
## 2010-02-01 -0.006468655 -0.02369624 -0.014786916 -0.015857303 -0.01998906
## 2010-02-08 -0.029066083 -0.01363864 -0.063088149 -0.035186000 -0.05576346
##                     TLT          IYR          GLD
## 2010-01-04  0.000000000  0.000000000  0.000000000
## 2010-01-11 -0.011246023  0.008949997  0.027777735
## 2010-01-15  0.025675773 -0.011034195 -0.017634009
## 2010-01-25  0.005489306 -0.034784657 -0.030488880
## 2010-02-01 -0.001178641  0.002267015  0.008094484
## 2010-02-08  0.010422290 -0.037087517 -0.039778474
head(monthly_returns)
##                    SPY         QQQ          EEM         IWM          EFA
## 2010-01-29 -0.05241329 -0.07819901 -0.103722583 -0.06048769 -0.074916565
## 2010-02-26  0.03119441  0.04603858  0.017763843  0.04475161  0.002667504
## 2010-03-31  0.06088004  0.07710939  0.081108917  0.08230661  0.063854336
## 2010-04-30  0.01546954  0.02242536 -0.001662003  0.05678484 -0.028045778
## 2010-05-28 -0.07945441 -0.07392382 -0.093935557 -0.07536655 -0.111927790
## 2010-06-30 -0.05174092 -0.05975630 -0.013986635 -0.07743401 -0.020619795
##                     TLT         IYR          GLD
## 2010-01-29  0.027836639 -0.05195396 -0.034972713
## 2010-02-26 -0.003424549  0.05457126  0.032748219
## 2010-03-31 -0.020573382  0.09748433 -0.004386396
## 2010-04-30  0.033218599  0.06388149  0.058834363
## 2010-05-28  0.051083649 -0.05683564  0.030513147
## 2010-06-30  0.057977618 -0.04670129  0.023553189

2.3 Q3: Convert Monthly Returns to Tibble Format

monthly_tbl <- tk_tbl(monthly_returns, rename_index = "date") %>%
  mutate(date = as.yearmon(date))

head(monthly_tbl)
## # A tibble: 6 × 9
##   date          SPY     QQQ      EEM     IWM      EFA      TLT     IYR      GLD
##   <yearmon>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
## 1 Jan 2010  -0.0524 -0.0782 -0.104   -0.0605 -0.0749   0.0278  -0.0520 -0.0350 
## 2 Feb 2010   0.0312  0.0460  0.0178   0.0448  0.00267 -0.00342  0.0546  0.0327 
## 3 Mar 2010   0.0609  0.0771  0.0811   0.0823  0.0639  -0.0206   0.0975 -0.00439
## 4 Apr 2010   0.0155  0.0224 -0.00166  0.0568 -0.0280   0.0332   0.0639  0.0588 
## 5 May 2010  -0.0795 -0.0739 -0.0939  -0.0754 -0.112    0.0511  -0.0568  0.0305 
## 6 Jun 2010  -0.0517 -0.0598 -0.0140  -0.0774 -0.0206   0.0580  -0.0467  0.0236

2.4 Q4: Download Fama-French 3 Factors Data

# ── Download Fama-French 3 Factors (monthly) ──────────────────────────────────
# Primary method: frenchdata package
# Fallback  method: direct download from Ken French's website

ff3_monthly_raw <- tryCatch({
  library(frenchdata)
  ff3_raw <- download_french_data("Fama/French 3 Factors")
  df      <- ff3_raw$subsets$data[[1]]

  # frenchdata may name the date column "date" or "Date" depending on version
  if ("date" %in% names(df)) {
    df <- df %>% rename(date_raw = date)
  } else if ("Date" %in% names(df)) {
    df <- df %>% rename(date_raw = Date)
  } else {
    # first column is the date
    names(df)[1] <- "date_raw"
  }

  # Column names may be "Mkt-RF" or "Mkt.RF" depending on version
  if ("Mkt-RF" %in% names(df)) df <- df %>% rename(Mkt.RF = `Mkt-RF`)

  df %>%
    mutate(
      date   = as.yearmon(as.character(date_raw), "%Y%m"),
      Mkt.RF = as.numeric(Mkt.RF) / 100,
      SMB    = as.numeric(SMB)    / 100,
      HML    = as.numeric(HML)    / 100,
      RF     = as.numeric(RF)     / 100
    ) %>%
    filter(!is.na(date)) %>%
    select(date, Mkt.RF, SMB, HML, RF)

}, error = function(e) {
  # ── Fallback: direct ZIP download from Ken French's website ──────────────
  message("frenchdata failed, trying direct download: ", conditionMessage(e))

  tmp <- tempfile(fileext = ".zip")
  download.file(
    "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip",
    destfile = tmp, mode = "wb", quiet = TRUE
  )
  csv_file <- unzip(tmp, exdir = tempdir())

  # Skip the header lines and read the monthly section
  lines     <- readLines(csv_file[1])
  start_row <- which(grepl("^\\s*,", lines[1:10])) + 1   # first data row
  # Find blank line that separates annual from monthly
  blank_rows <- which(trimws(lines) == "")
  end_row    <- blank_rows[1] - 1

  monthly_txt <- lines[2:end_row]
  monthly_txt <- monthly_txt[!grepl("^\\s*$", monthly_txt)]

  ff_df <- read.csv(
    text = paste(c("date,Mkt.RF,SMB,HML,RF"), monthly_txt, sep = "\n"),
    stringsAsFactors = FALSE
  )

  ff_df %>%
    mutate(
      date   = as.yearmon(as.character(date), "%Y%m"),
      Mkt.RF = as.numeric(Mkt.RF) / 100,
      SMB    = as.numeric(SMB)    / 100,
      HML    = as.numeric(HML)    / 100,
      RF     = as.numeric(RF)     / 100
    ) %>%
    filter(!is.na(date), !is.na(Mkt.RF)) %>%
    select(date, Mkt.RF, SMB, HML, RF)
})

head(ff3_monthly_raw)
## # A tibble: 6 × 5
##   date       Mkt.RF     SMB     HML     RF
##   <yearmon>   <dbl>   <dbl>   <dbl>  <dbl>
## 1 Jul 1926   0.0289 -0.0255 -0.0239 0.0022
## 2 Aug 1926   0.0264 -0.0114  0.0381 0.0025
## 3 Sep 1926   0.0038 -0.0136  0.0005 0.0023
## 4 Oct 1926  -0.0327 -0.0014  0.0082 0.0032
## 5 Nov 1926   0.0254 -0.0011 -0.0061 0.0031
## 6 Dec 1926   0.0262 -0.0007  0.0006 0.0028
tail(ff3_monthly_raw)
## # A tibble: 6 × 5
##   date       Mkt.RF     SMB     HML     RF
##   <yearmon>   <dbl>   <dbl>   <dbl>  <dbl>
## 1 Nov 2025  -0.0013  0.0054  0.0357 0.003 
## 2 Dec 2025  -0.0036 -0.0103  0.0236 0.0034
## 3 Jan 2026   0.0103  0.0212  0.0386 0.003 
## 4 Feb 2026  -0.0117  0.0024  0.0265 0.0028
## 5 Mar 2026  -0.0518  0.0044  0.0335 0.0029
## 6 Apr 2026   0.0994  0.0013 -0.0127 0.0029

2.5 Q5: Merge Monthly Returns and FF3 Factors

# Ensure consistent date format
ff3_tbl <- ff3_monthly_raw %>%
  select(date, Mkt.RF, SMB, HML, RF)

merged_tbl <- inner_join(monthly_tbl, ff3_tbl, by = "date")

head(merged_tbl)
## # A tibble: 6 × 13
##   date          SPY     QQQ      EEM     IWM      EFA      TLT     IYR      GLD
##   <yearmon>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
## 1 Jan 2010  -0.0524 -0.0782 -0.104   -0.0605 -0.0749   0.0278  -0.0520 -0.0350 
## 2 Feb 2010   0.0312  0.0460  0.0178   0.0448  0.00267 -0.00342  0.0546  0.0327 
## 3 Mar 2010   0.0609  0.0771  0.0811   0.0823  0.0639  -0.0206   0.0975 -0.00439
## 4 Apr 2010   0.0155  0.0224 -0.00166  0.0568 -0.0280   0.0332   0.0639  0.0588 
## 5 May 2010  -0.0795 -0.0739 -0.0939  -0.0754 -0.112    0.0511  -0.0568  0.0305 
## 6 Jun 2010  -0.0517 -0.0598 -0.0140  -0.0774 -0.0206   0.0580  -0.0467  0.0236 
## # ℹ 4 more variables: Mkt.RF <dbl>, SMB <dbl>, HML <dbl>, RF <dbl>
dim(merged_tbl)
## [1] 196  13

2.6 Q6: GMV Portfolio — CAPM Covariance Matrix (2015/01)

# Helper: compute GMV weights from covariance matrix
gmv_weights <- function(cov_mat) {
  n  <- ncol(cov_mat)
  Dmat <- 2 * cov_mat
  dvec <- rep(0, n)
  Amat <- cbind(rep(1, n), diag(n))   # sum=1 and no-short constraints
  bvec <- c(1, rep(0, n))
  sol  <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
  sol$solution
}

# Filter data: 2010/02 – 2015/01 (60 months)
capm_window <- merged_tbl %>%
  filter(date >= as.yearmon("2010-02") & date <= as.yearmon("2015-01"))

# Extract asset returns and factor
asset_returns <- capm_window %>% select(all_of(tickers))
Mkt_RF        <- capm_window$Mkt.RF
RF_rate       <- capm_window$RF

# Compute CAPM covariance matrix
# Step 1: Excess returns
excess_returns <- sweep(asset_returns, 1, RF_rate, "-")

# Step 2: Regress each asset on market factor to get beta and residual variance
betas       <- numeric(length(tickers))
resid_vars  <- numeric(length(tickers))
mkt_var     <- var(Mkt_RF)

for (i in seq_along(tickers)) {
  fit          <- lm(excess_returns[[i]] ~ Mkt_RF)
  betas[i]     <- coef(fit)[2]
  resid_vars[i] <- var(residuals(fit))
}

# Step 3: CAPM covariance matrix
# Cov(i,j) = beta_i * beta_j * sigma^2_mkt   (off-diagonal)
# Var(i)   = beta_i^2 * sigma^2_mkt + sigma^2_eps_i  (diagonal)
cov_capm <- outer(betas, betas) * mkt_var
diag(cov_capm) <- diag(cov_capm) + resid_vars
colnames(cov_capm) <- rownames(cov_capm) <- tickers

# Compute GMV weights
w_capm <- gmv_weights(cov_capm)
names(w_capm) <- tickers

cat("CAPM-based GMV weights (2015/01):\n")
## CAPM-based GMV weights (2015/01):
print(round(w_capm, 4))
##    SPY    QQQ    EEM    IWM    EFA    TLT    IYR    GLD 
## 0.4471 0.0000 0.0000 0.0000 0.0000 0.4483 0.0373 0.0673
# Realized return in 2015/02
ret_201502 <- merged_tbl %>%
  filter(date == as.yearmon("2015-02")) %>%
  select(all_of(tickers)) %>%
  unlist()

realized_capm <- sum(w_capm * ret_201502)
cat(sprintf("\nRealized portfolio return (CAPM GMV) in 2015/02: %.4f (%.2f%%)\n",
            realized_capm, realized_capm * 100))
## 
## Realized portfolio return (CAPM GMV) in 2015/02: -0.0073 (-0.73%)

2.7 Q7: GMV Portfolio — FF3-Factor Covariance Matrix (2015/01)

# Same 60-month window
ff3_factors <- capm_window %>% select(Mkt.RF, SMB, HML)

betas_ff3      <- matrix(0, nrow = length(tickers), ncol = 3)
resid_vars_ff3 <- numeric(length(tickers))

for (i in seq_along(tickers)) {
  fit <- lm(excess_returns[[i]] ~ ff3_factors$Mkt.RF +
              ff3_factors$SMB + ff3_factors$HML)
  betas_ff3[i, ]    <- coef(fit)[-1]
  resid_vars_ff3[i] <- var(residuals(fit))
}

# Factor covariance matrix (3x3)
factor_cov <- cov(ff3_factors)

# FF3 covariance matrix: B * Sigma_F * B' + D
cov_ff3 <- betas_ff3 %*% factor_cov %*% t(betas_ff3)
diag(cov_ff3) <- diag(cov_ff3) + resid_vars_ff3
colnames(cov_ff3) <- rownames(cov_ff3) <- tickers

w_ff3 <- gmv_weights(cov_ff3)
names(w_ff3) <- tickers

cat("FF3-based GMV weights (2015/01):\n")
## FF3-based GMV weights (2015/01):
print(round(w_ff3, 4))
##    SPY    QQQ    EEM    IWM    EFA    TLT    IYR    GLD 
## 0.4579 0.0000 0.0000 0.0000 0.0000 0.4507 0.0334 0.0581
realized_ff3 <- sum(w_ff3 * ret_201502)
cat(sprintf("\nRealized portfolio return (FF3 GMV) in 2015/02: %.4f (%.2f%%)\n",
            realized_ff3, realized_ff3 * 100))
## 
## Realized portfolio return (FF3 GMV) in 2015/02: -0.0062 (-0.62%)

2.8 Q8: Backtest GMV Portfolios — CAPM vs FF3 (2015/02 to 2026/05)

# All months with sufficient history
all_dates <- merged_tbl$date
backtest_dates <- all_dates[all_dates >= as.yearmon("2015-02") &
                              all_dates <= as.yearmon("2026-05")]

results <- tibble(
  date         = backtest_dates,
  ret_capm_gmv = NA_real_,
  ret_ff3_gmv  = NA_real_
)

for (t_idx in seq_along(backtest_dates)) {
  t_date <- backtest_dates[t_idx]

  # Rolling window: 60 months ending one month before t
  window_end   <- as.yearmon(t_date - 1/12)
  window_start <- as.yearmon(window_end - 59/12)

  win_data <- merged_tbl %>%
    filter(date >= window_start & date <= window_end)

  if (nrow(win_data) < 60) next

  ar     <- win_data %>% select(all_of(tickers))
  mkt_rf <- win_data$Mkt.RF
  rf_w   <- win_data$RF
  smb_w  <- win_data$SMB
  hml_w  <- win_data$HML

  er <- sweep(as.matrix(ar), 1, rf_w, "-")

  # ---- CAPM covariance ----
  b_c <- numeric(8); rv_c <- numeric(8)
  mv  <- var(mkt_rf)
  for (i in 1:8) {
    fit      <- lm(er[, i] ~ mkt_rf)
    b_c[i]   <- coef(fit)[2]
    rv_c[i]  <- var(residuals(fit))
  }
  cov_c       <- outer(b_c, b_c) * mv
  diag(cov_c) <- diag(cov_c) + rv_c

  # ---- FF3 covariance ----
  ff_fac  <- cbind(mkt_rf, smb_w, hml_w)
  b_f     <- matrix(0, 8, 3); rv_f <- numeric(8)
  fc_cov  <- cov(ff_fac)
  for (i in 1:8) {
    fit      <- lm(er[, i] ~ ff_fac)
    b_f[i,]  <- coef(fit)[-1]
    rv_f[i]  <- var(residuals(fit))
  }
  cov_f       <- b_f %*% fc_cov %*% t(b_f)
  diag(cov_f) <- diag(cov_f) + rv_f

  # GMV weights (with error handling)
  tryCatch({
    w_c <- gmv_weights(cov_c)
    w_f <- gmv_weights(cov_f)

    ret_t <- merged_tbl %>%
      filter(date == t_date) %>%
      select(all_of(tickers)) %>%
      unlist()

    results$ret_capm_gmv[t_idx] <- sum(w_c * ret_t)
    results$ret_ff3_gmv[t_idx]  <- sum(w_f * ret_t)
  }, error = function(e) NULL)
}

# Remove any NAs
results <- results %>% filter(!is.na(ret_capm_gmv))

# Cumulative returns
results <- results %>%
  mutate(
    cum_capm = cumprod(1 + ret_capm_gmv),
    cum_ff3  = cumprod(1 + ret_ff3_gmv)
  )

head(results)
## # A tibble: 6 × 5
##   date      ret_capm_gmv ret_ff3_gmv cum_capm cum_ff3
##   <yearmon>        <dbl>       <dbl>    <dbl>   <dbl>
## 1 Feb 2015      -0.00733    -0.00622    0.993   0.994
## 2 Mar 2015      -0.00322    -0.00313    0.989   0.991
## 3 Apr 2015      -0.0133     -0.0132     0.976   0.978
## 4 May 2015      -0.00442    -0.00431    0.972   0.973
## 5 Jun 2015      -0.0298     -0.0297     0.943   0.944
## 6 Jul 2015       0.0275      0.0282     0.969   0.971
# Plot cumulative returns
results_long <- results %>%
  select(date, cum_capm, cum_ff3) %>%
  mutate(date = as.Date(date)) %>%
  pivot_longer(cols = c(cum_capm, cum_ff3),
                      names_to = "model",
                      values_to = "cumulative_return") %>%
  mutate(model = recode(model,
                        "cum_capm" = "CAPM GMV",
                        "cum_ff3"  = "FF3 GMV"))

ggplot(results_long, aes(x = date, y = cumulative_return, color = model)) +
  geom_line(linewidth = 1.1) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
  scale_color_manual(values = c("CAPM GMV" = "#2196F3", "FF3 GMV" = "#F44336")) +
  labs(
    title    = "Cumulative Returns: GMV Portfolios (CAPM vs FF3)",
    subtitle = "Rolling 60-month window | Feb 2015 – May 2026",
    x        = "Date",
    y        = "Growth of $1 Invested",
    color    = "Model"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

# Summary statistics
results %>%
  summarise(
    CAPM_Cumulative  = last(cum_capm) - 1,
    FF3_Cumulative   = last(cum_ff3)  - 1,
    CAPM_Ann_Return  = prod(1 + ret_capm_gmv)^(12 / n()) - 1,
    FF3_Ann_Return   = prod(1 + ret_ff3_gmv)^(12 / n()) - 1,
    CAPM_Ann_Vol     = sd(ret_capm_gmv) * sqrt(12),
    FF3_Ann_Vol      = sd(ret_ff3_gmv)  * sqrt(12),
    CAPM_Sharpe      = (prod(1 + ret_capm_gmv)^(12/n()) - 1) / (sd(ret_capm_gmv) * sqrt(12)),
    FF3_Sharpe       = (prod(1 + ret_ff3_gmv)^(12/n())  - 1) / (sd(ret_ff3_gmv)  * sqrt(12))
  ) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(Value = round(Value, 4)) %>%
  knitr::kable(caption = "Performance Summary: CAPM GMV vs FF3 GMV (2015/02 – 2026/05)")
Performance Summary: CAPM GMV vs FF3 GMV (2015/02 – 2026/05)
Metric Value
CAPM_Cumulative 1.3228
FF3_Cumulative 1.2949
CAPM_Ann_Return 0.0778
FF3_Ann_Return 0.0766
CAPM_Ann_Vol 0.1049
FF3_Ann_Vol 0.1057
CAPM_Sharpe 0.7418
FF3_Sharpe 0.7250