1 Part I – Questions from Textbook (60%)

1.1 Chapter 7

1.1.1 CFA Problem 1

Context: Hennessy & Associates manages a $30 million equity portfolio for Wilstead Pension Fund, holding approximately 40 stocks with 2–3% committed to each issue. Jones proposes limiting the portfolio to no more than 20 stocks, doubling the commitment to each retained stock.

1.1.1.1 (a) Will limiting to 20 stocks likely increase or decrease portfolio risk? Explain.

Answer:

Limiting the portfolio from 40 to 20 stocks will increase portfolio risk. The reason is loss of diversification. When more stocks are held, firm-specific (unsystematic) risk is averaged out across positions — poor performance of one holding is offset by others. Cutting the number of holdings in half reduces this averaging effect, increasing the portfolio’s exposure to individual stock shocks.

Additionally, because Hennessy doubles the weight of each remaining stock (from ~2.5% to ~5%), the impact of any single stock’s bad performance on the total portfolio is magnified. Total portfolio variance rises because:

\[\sigma_p^2 = \sum_i w_i^2 \sigma_i^2 + \sum_i \sum_{j \ne i} w_i w_j \sigma_i \sigma_j \rho_{ij}\]

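With equal weights $w_i = 1/n$, the own-variance terms sum to $\bar{\sigma}^2/n$, where $\bar{\sigma}^2$ is the average stock variance. Halving $n$ from 40 to 20 therefore doubles the firm-specific contribution to portfolio variance, while the covariance terms change only slightly.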
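The effect of the number of holdings can be sketched numerically. The block below is a minimal illustration, assuming every stock has the same volatility (30%) and the same pairwise correlation (0.30); both values are hypothetical and are not taken from the problem.

```r
# Standard deviation of an equal-weighted portfolio of n stocks.
# sigma and rho are illustrative assumptions, not data from the problem.
port_sd <- function(n, sigma = 0.30, rho = 0.30) {
  own   <- sigma^2 / n                   # sum of w_i^2 * sigma_i^2 with w_i = 1/n
  cross <- (n - 1) / n * rho * sigma^2   # sum of all covariance terms
  sqrt(own + cross)
}

n_stocks <- c(10, 20, 40)
sd_by_n  <- sapply(n_stocks, port_sd)
names(sd_by_n) <- n_stocks

# Portfolio volatility for 10, 20 and 40 holdings
print(round(sd_by_n, 4))
```

Under these assumptions the step from 20 to 10 stocks raises volatility by more than the step from 40 to 20, which is the pattern the variance formula predicts.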

1.1.1.2 (b) Is there any way Hennessy could reduce from 40 to 20 stocks without significantly affecting risk? Explain.

Answer:

Yes — if Hennessy selects the 20 retained stocks such that their pairwise correlations are sufficiently low, the portfolio risk may not increase significantly. The key insight from portfolio theory is that diversification benefit comes from low correlation, not merely from holding many stocks.

If the 40-stock portfolio contains many stocks that are highly correlated with each other (e.g., stocks within the same industry or sector), eliminating the redundant correlated stocks and retaining 20 stocks that span different industries and geographies — with low pairwise correlations — can preserve most of the diversification benefit. The number of holdings matters less than the correlation structure among them.

1.1.2 CFA Problem 2

Question: One committee member suggested reducing further to 10 stocks. If reducing to 20 is advantageous, why might reducing to 10 be less likely to be advantageous?

Answer:

The benefit of reducing from 40 to 20 is that Hennessy can concentrate capital into his best 20 stock ideas, potentially increasing alpha while accepting a modest increase in unsystematic risk.

However, reducing from 20 to 10 stocks is less likely to be advantageous for the following reasons:

Rapidly diminishing diversification at low stock counts. The marginal diversification benefit of each additional stock is highest when the portfolio has very few holdings. Going from 40 → 20 sacrifices relatively little diversification; going from 20 → 10 adds roughly twice as much firm-specific variance under equal weighting, as shown in typical curves of portfolio variance against number of stocks.
Overconcentration amplifies individual stock risk. With only 10 stocks at ~10% each, a single stock experiencing adverse firm-specific news (earnings miss, fraud, litigation) can devastate portfolio performance. The margin for error essentially disappears.
Hennessy identifies ~10 superior stocks per year. A 10-stock portfolio holds only the “best ideas” with no room for any hedging, diversifying, or buffer positions. The portfolio becomes entirely dependent on every single pick being correct.
Independent evaluation makes bad luck look like bad skill. Wilstead evaluates Hennessy independently. With only 10 positions, a string of bad luck — even with genuine skill — is far more likely to produce poor measured returns, damaging Hennessy’s track record unfairly.

1.1.3 CFA Problem 3

Question: Another committee member suggested considering the effects on the total fund rather than evaluating Hennessy’s portfolio independently. How would this broader view affect the decision to limit holdings to 10 or 20 issues?

Answer:

This broader perspective supports allowing Hennessy even greater concentration — possibly down to 10 stocks. Here is why:

The total Wilstead fund is $280 million: Hennessy’s $30M plus five other managers’ $250M holding 150+ individual stocks. Viewed from the total fund level, the fund is already broadly diversified through the other managers. The aggregate unsystematic risk of the total fund is very low regardless of how concentrated Hennessy’s individual sleeve is.

Since Hennessy’s $30M represents only about 10.7% of the total fund, the marginal contribution of his portfolio’s unsystematic risk to the total fund’s variance is small. What matters for the total fund’s risk-adjusted return is the systematic risk (beta) of Hennessy’s sleeve — which does not change with the number of stocks held.

Therefore, the committee should focus exclusively on maximising Hennessy’s alpha contribution (stock-picking returns), not on the diversification of his sleeve. If concentrating to 10 stocks produces higher alpha per dollar invested, that is better for the total fund. The diversification function is already handled by the rest of the fund.

1.1.4 CFA Problem 4

Question: Which one of the following portfolios cannot lie on the efficient frontier as described by Markowitz?

Portfolio	Expected Return (%)	Standard Deviation (%)
W	15	36
X	12	15
Z	5	7
Y	9	21

Answer: Portfolio Y — answer choice (d)

Reasoning:

A portfolio lies on the Markowitz efficient frontier only if no other feasible portfolio offers higher expected return for the same (or lower) risk, or lower risk for the same (or higher) return. A portfolio that is dominated — meaning another portfolio beats it on at least one dimension without being worse on any other — cannot be efficient.

Apply the dominance test to each portfolio:

Portfolio W (15%, 36%): No single portfolio in the set offers both higher return and lower risk simultaneously. W has the highest return. It is not dominated — it can lie on the frontier (at the high-risk, high-return end).
Portfolio X (12%, 15%): No portfolio in the set offers higher return with lower risk. X is not dominated — it can lie on the frontier.
Portfolio Z (5%, 7%): Z has the lowest risk. No portfolio offers lower risk with equal or higher return. Z is not dominated — it can lie on the frontier (near the minimum-variance point).
Portfolio Y (9%, 21%): Compare Y to X. Portfolio X has return 12% > 9% and standard deviation 15% < 21%. X dominates Y on both dimensions simultaneously. A rational, risk-averse investor would always prefer X to Y.

\[\underbrace{E(r_X) = 12\% > 9\% = E(r_Y)}_{\text{higher return}} \quad \text{and} \quad \underbrace{\sigma_X = 15\% < 21\% = \sigma_Y}_{\text{lower risk}}\]

Therefore Portfolio Y is dominated by Portfolio X and cannot lie on the efficient frontier.

Final answer: (d) Portfolio Y
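The dominance test can also be run mechanically. The following is a small sketch using the four portfolios from the table; the helper name `is_dominated` is introduced here for illustration only.

```r
portfolios <- data.frame(
  name = c("W", "X", "Z", "Y"),
  ret  = c(15, 12, 5, 9),    # expected return (%)
  sd   = c(36, 15, 7, 21)    # standard deviation (%)
)

# A portfolio is dominated if another one has return >= and risk <=,
# with at least one of the two inequalities strict.
is_dominated <- function(i, p) {
  others <- p[-i, ]
  any(others$ret >= p$ret[i] & others$sd <= p$sd[i] &
        (others$ret > p$ret[i] | others$sd < p$sd[i]))
}

portfolios$dominated <- sapply(seq_len(nrow(portfolios)), is_dominated, p = portfolios)
print(portfolios)
```

Only the pairwise comparison within the listed set is checked; whether W, X and Z actually lie on the frontier depends on the full opportunity set, which the problem does not give.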

1.1.5 CFA Problem 10

Question: Statistics for stocks A, B, C are given below. Using only this information, which portfolio would you recommend — equal amounts of A & B, or equal amounts of B & C?

Standard Deviations:

Stock	σ (%)
A	40
B	20
C	40

Correlation Matrix:

Stock	A	B	C
A	1.00	0.90	0.50
B	0.90	1.00	0.10
C	0.50	0.10	1.00

Answer: Recommend Portfolio B & C

Step-by-step calculation:

For an equal-weight (50%/50%) two-asset portfolio:

\[\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2\,w_1 w_2\,\rho_{12}\,\sigma_1\sigma_2\]

Portfolio A & B (50% A, 50% B, $\rho_{AB} = 0.90$):

\[\sigma_{AB}^2 = (0.5)^2(40)^2 + (0.5)^2(20)^2 + 2(0.5)(0.5)(0.90)(40)(20)\] \[= 0.25 \times 1600 + 0.25 \times 400 + 0.5 \times 0.90 \times 800\] \[= 400 + 100 + 360 = \mathbf{860}\] \[\sigma_{AB} = \sqrt{860} \approx \mathbf{29.33\%}\]

Portfolio B & C (50% B, 50% C, $\rho_{BC} = 0.10$):

\[\sigma_{BC}^2 = (0.5)^2(20)^2 + (0.5)^2(40)^2 + 2(0.5)(0.5)(0.10)(20)(40)\] \[= 0.25 \times 400 + 0.25 \times 1600 + 0.5 \times 0.10 \times 800\] \[= 100 + 400 + 40 = \mathbf{540}\] \[\sigma_{BC} = \sqrt{540} \approx \mathbf{23.24\%}\]

Comparison:

Portfolio	σ (%)	Why?
A & B	29.33	High correlation (0.90) — little diversification benefit
B & C	23.24	Low correlation (0.10) — strong diversification benefit

Since the problem provides only risk data (no expected return data), the comparison is made purely on risk. Portfolio B & C has a substantially lower standard deviation (≈23.24% vs ≈29.33%) due to the much lower correlation between B and C (0.10 vs 0.90). The diversification benefit is dramatically stronger in the B & C combination.

Recommendation: Portfolio B & C. With no expected-return data given, the choice rests on risk alone, and B & C has the significantly lower portfolio standard deviation.
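The two standard deviations can be cross-checked with a covariance matrix built from the given volatilities and correlations. This is a minimal sketch; the function name `portfolio_sd` is an illustrative choice.

```r
sigma <- c(A = 40, B = 20, C = 40)   # standard deviations (%)
rho <- matrix(c(1.00, 0.90, 0.50,
                0.90, 1.00, 0.10,
                0.50, 0.10, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(names(sigma), names(sigma)))

# Covariance matrix: cov_ij = rho_ij * sigma_i * sigma_j
cov_mat <- diag(sigma) %*% rho %*% diag(sigma)

# Portfolio standard deviation for a weight vector ordered (A, B, C)
portfolio_sd <- function(w) sqrt(drop(t(w) %*% cov_mat %*% w))

sd_AB <- portfolio_sd(c(0.5, 0.5, 0))
sd_BC <- portfolio_sd(c(0, 0.5, 0.5))
print(round(c(AB = sd_AB, BC = sd_BC), 2))
```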

1.2 Chapter 8

1.2.1 CFA Problem 1

Question: When annualised monthly excess returns for ABC and XYZ were regressed on the S&P 500 index over a 5-year period, the results were:

Statistic	ABC	XYZ
Alpha (α)	−3.20%	+7.3%
Beta (β)	0.60	0.97
R²	0.35	0.17
Residual std dev (σ_e)	13.02%	21.45%

Additional brokerage-house beta estimates (past 2 years):

Brokerage House	Beta of ABC	Beta of XYZ
A	0.62	1.45
B	0.71	1.25

Answer:

Interpreting the 5-year regression results:

ABC:

Alpha = −3.20%: ABC underperformed the market on a risk-adjusted basis by 3.20% per year. After accounting for market exposure, ABC generated negative abnormal returns.
Beta = 0.60: ABC is a defensive stock — its excess return moves only 60% as much as the market’s excess return. Below-average systematic risk.
R² = 0.35: Only 35% of ABC’s return variance is explained by market movements. The remaining 65% is firm-specific (unsystematic) risk.
Residual σ = 13.02%: Significant idiosyncratic volatility beyond market exposure.

XYZ:

Alpha = +7.3%: XYZ generated 7.3% per year above what CAPM would predict — strong positive abnormal returns over the sample period.
Beta = 0.97: XYZ closely tracks the market; near-average systematic risk.
R² = 0.17: Only 17% of XYZ’s variance is market-driven. A massive 83% is unsystematic — XYZ has very high firm-specific risk.
Residual σ = 21.45%: Very large stock-specific volatility.

Implications for future risk-return in a diversified portfolio:

In a well-diversified portfolio, unsystematic risk is eliminated — only systematic risk (beta) matters for expected return. Key observations:

ABC’s beta is stable: The 5-year estimate (0.60) and both brokerage estimates (0.62, 0.71) are close, averaging about 0.64. Future systematic risk is therefore predictable, with expected excess return ≈ 0.64 × market risk premium. However, the negative historical alpha suggests ABC may continue to underperform on a risk-adjusted basis.
XYZ’s beta has risen sharply: The 5-year regression gives β = 0.97, but recent brokerage estimates are 1.45 and 1.25 — substantially higher. This means XYZ’s systematic risk has increased recently. Under CAPM, a higher beta means a higher required return, making it harder for XYZ to generate positive alpha going forward. The historical +7.3% alpha was earned during a lower-beta period and should not be extrapolated.
Both stocks have high residual risk. In a diversified portfolio, these residuals wash out. The key forward-looking inputs are the updated (higher) betas, not historical alphas.

1.2.2 CFA Problem 2

Question: The correlation coefficient between Baker Fund and the market index is 0.70. What percentage of Baker Fund’s total risk is specific (nonsystematic)?

Answer:

The relationship between R², correlation, and systematic vs. unsystematic risk:

\[R^2 = \rho_{i,M}^2 = (0.70)^2 = 0.49\]

R² = 0.49 means 49% of the total variance is systematic (explained by market movements).

The nonsystematic (specific) proportion of total variance is:

\[\text{Nonsystematic proportion} = 1 - R^2 = 1 - 0.49 = \mathbf{0.51 = 51\%}\]

51% of Baker Fund’s total risk is firm-specific (nonsystematic).

This means that more than half of Baker Fund’s total variance could, in theory, be eliminated through diversification.

1.2.3 CFA Problem 3

Question: The correlation between Charlottesville International Fund and the world market index is 1.0. The expected return on the world market is 11%, the expected return on Charlottesville International Fund is 9%, and the risk-free rate is 3%. What is the implied beta of Charlottesville International?

Answer:

Using the CAPM equation and solving for beta:

\[E(r_i) = r_f + \beta_i \left[ E(r_M) - r_f \right]\]

\[9\% = 3\% + \beta_i \times (11\% - 3\%)\]

\[6\% = \beta_i \times 8\%\]

\[\beta_i = \frac{6\%}{8\%} = \mathbf{0.75}\]

Verification using the beta formula (since ρ = 1.0, all risk is systematic):

\[\beta_i = \rho_{i,M} \times \frac{\sigma_i}{\sigma_M} = 1.0 \times \frac{\sigma_i}{\sigma_M}\]

This means Charlottesville’s standard deviation is exactly 75% of the world market’s standard deviation — consistent with a beta of 0.75.

The implied beta of Charlottesville International Fund is 0.75.
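The same rearrangement of the CAPM equation can be written as a one-line function. This is only a sketch; `implied_beta` is a hypothetical helper name.

```r
# Solve E(r_i) = r_f + beta * (E(r_M) - r_f) for beta
implied_beta <- function(exp_ret, rf, mkt_ret) (exp_ret - rf) / (mkt_ret - rf)

beta_cif <- implied_beta(exp_ret = 0.09, rf = 0.03, mkt_ret = 0.11)
print(beta_cif)
```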

1.2.4 CFA Problem 4

Question: The concept of beta is most closely associated with:

(a) Correlation coefficients
(b) Mean-variance analysis
(c) Nonsystematic risk
(d) Systematic risk

Answer: (d) Systematic risk

Beta (β) is the standard measure of a security’s systematic risk — the component of total risk that is attributable to market-wide factors and cannot be eliminated through diversification. It is defined as:

\[\beta_i = \frac{\text{Cov}(r_i, r_M)}{\text{Var}(r_M)} = \rho_{i,M} \cdot \frac{\sigma_i}{\sigma_M}\]

Beta captures how sensitively an asset’s return responds to market movements. It is the central risk measure in both the CAPM and the Security Market Line. Unlike standard deviation (which measures total risk = systematic + unsystematic), beta measures only the undiversifiable, market-related portion of risk.

1.2.5 CFA Problem 5

Question: Beta and standard deviation differ as risk measures in that beta measures:

(a) Only unsystematic risk, while standard deviation measures total risk
(b) Only systematic risk, while standard deviation measures total risk
(c) Both systematic and unsystematic risk, while standard deviation measures only unsystematic risk
(d) Both systematic and unsystematic risk, while standard deviation measures only systematic risk

Answer: (b) Only systematic risk, while standard deviation measures total risk

Risk Measure	What It Captures
Standard deviation (σ)	Total risk = systematic + unsystematic
Beta (β)	Systematic risk only (market risk)

The decomposition of total variance is:

\[\underbrace{\sigma_i^2}_{\text{Total risk}} = \underbrace{\beta_i^2 \sigma_M^2}_{\text{Systematic risk}} + \underbrace{\sigma_{e,i}^2}_{\text{Unsystematic risk}}\]

In a diversified portfolio, unsystematic risk ($\sigma_e^2$) is diversified away and earns no risk premium. Beta captures only the systematic component — the part that remains even in a fully diversified portfolio and for which investors are compensated with higher expected return.
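The variance decomposition can be illustrated numerically. The sketch below uses the 0.70 correlation from the Baker Fund problem together with two hypothetical volatilities (25% for the fund, 20% for the market), chosen only to make the arithmetic concrete.

```r
rho     <- 0.70   # correlation with the market (Baker Fund problem)
sigma_i <- 0.25   # fund volatility: hypothetical
sigma_m <- 0.20   # market volatility: hypothetical

beta           <- rho * sigma_i / sigma_m
systematic_var <- beta^2 * sigma_m^2
residual_var   <- sigma_i^2 - systematic_var

# Shares of total variance: systematic share equals rho^2
shares <- c(systematic = systematic_var, unsystematic = residual_var) / sigma_i^2
print(round(shares, 2))
```

Whatever volatilities are assumed, the systematic share reduces to $\rho^2$, so the 49% / 51% split does not depend on the hypothetical inputs.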

1.3 Chapter 9

Reference data for CFA Problems 8 and 9:

Portfolio	Avg Annual Return	Standard Deviation	Beta
R	11%	10%	0.5
S&P 500	14%	12%	1.0

Note on risk-free rate: The table does not state $r_f$, and it cannot be inferred from the data given: the SML passes through $(\beta = 0, r_f)$ and $(\beta = 1.0, 14\%)$, but only the second point is known. The calculations below therefore assume $r_f = 6\%$ and state this assumption explicitly. The conclusions depend on it: R plots above the SML only if $r_f < 8\%$, and below the CML for any $r_f > -4\%$. Strictly, without a stated $r_f$ the data are insufficient to place R relative to either line.

1.3.1 CFA Problem 8

Question: When plotting portfolio R relative to the SML (Security Market Line), portfolio R lies:

(a) On the SML
(b) Below the SML
(c) Above the SML
(d) Insufficient data given

Answer: (c) Above the SML

Working:

The SML gives the required return for any asset as a function of its beta. Using $r_f = 6\%$ (assumed) and $E(r_M) = 14\%$:

\[E(r_R)_{\text{SML}} = r_f + \beta_R \times [E(r_M) - r_f]\] \[= 6\% + 0.5 \times (14\% - 6\%)\] \[= 6\% + 0.5 \times 8\% = 6\% + 4\% = \mathbf{10\%}\]

Portfolio R’s actual average return = 11%

\[\alpha_R = \text{Actual return} - \text{SML return} = 11\% - 10\% = +1\%\]

Since the actual return (11%) exceeds the SML-required return (10%), Portfolio R plots above the SML. It has earned positive alpha — it is underpriced relative to its systematic risk and represents a superior risk-adjusted return.

Answer: (c) Above the SML

1.3.2 CFA Problem 9

Question: When plotting portfolio R relative to the CML (Capital Market Line), portfolio R lies:

(a) On the CML
(b) Below the CML
(c) Above the CML
(d) Insufficient data given

Answer: (b) Below the CML

Working:

The CML relates expected return to total risk (standard deviation) for efficient portfolios only. Using $r_f = 6\%$:

\[\text{Slope of CML} = \frac{E(r_M) - r_f}{\sigma_M} = \frac{14\% - 6\%}{12\%} = \frac{8\%}{12\%} = 0.6\overline{6}\]

The CML-predicted return for a portfolio with $\sigma = 10\%$:

\[E(r)_{\text{CML}} = r_f + \text{Slope} \times \sigma_R = 6\% + 0.6\overline{6} \times 10\% = 6\% + 6.67\% = \mathbf{12.67\%}\]

Portfolio R’s actual return = 11% < CML-required 12.67%

\[\text{Portfolio R lies below the CML}\]

Interpretation: Portfolio R is not an efficient portfolio in the CML sense. Because R holds some unsystematic (firm-specific) risk that is not fully compensated by additional return, its Sharpe ratio ($\frac{11-6}{10} = 0.50$) is lower than the market’s Sharpe ratio ($\frac{14-6}{12} = 0.6\overline{6}$). Only efficient (fully diversified) portfolios lie on the CML.

Note: R lies above the SML but below the CML simultaneously — this is consistent, because the SML and CML measure risk differently (beta vs. standard deviation). R has positive alpha on a systematic-risk basis but carries unrewarded unsystematic risk.

Answer: (b) Below the CML
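Because both answers rest on an assumed risk-free rate of 6%, it may help to see how the two benchmark returns move with $r_f$. The sketch below recomputes the SML and CML required returns for Portfolio R over a small grid of hypothetical risk-free rates.

```r
ret_R <- 11; sd_R <- 10; beta_R <- 0.5   # Portfolio R (%)
ret_M <- 14; sd_M <- 12                  # S&P 500 (%)

sml_required <- function(rf) rf + beta_R * (ret_M - rf)
cml_required <- function(rf) rf + (ret_M - rf) / sd_M * sd_R

rf_grid <- c(2, 4, 6, 8, 10)             # hypothetical risk-free rates (%)
check <- data.frame(
  rf           = rf_grid,
  sml_required = sml_required(rf_grid),
  cml_required = cml_required(rf_grid)
)
check$vs_sml <- ret_R - check$sml_required   # > 0 means above the SML
check$vs_cml <- ret_R - check$cml_required   # < 0 means below the CML
print(round(check, 2))
```

On this grid the CML comparison keeps the same sign throughout, while the SML comparison changes sign at $r_f = 8\%$.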

1.3.3 CFA Problem 10

Question: Briefly explain whether investors should expect a higher return on Portfolio A than on Portfolio B according to CAPM.

Feature	Portfolio A	Portfolio B
Systematic risk (beta)	1.0	1.0
Specific risk (per individual stock)	High	Low

Answer: No — investors should expect the same return on both portfolios.

According to CAPM, expected return depends solely on beta (systematic risk):

\[E(r_i) = r_f + \beta_i \left[E(r_M) - r_f\right]\]

Both Portfolio A and Portfolio B have identical betas of 1.0. Therefore CAPM assigns them the same expected return — equal to the market expected return:

\[E(r_A) = E(r_B) = r_f + 1.0 \times [E(r_M) - r_f] = E(r_M)\]

The high specific (unsystematic) risk in Portfolio A does not earn any additional expected return because:

Unsystematic risk is diversifiable — rational investors hold well-diversified portfolios and eliminate it at zero cost.
Since rational investors can and do eliminate it, the market does not compensate investors for bearing it. Only non-diversifiable (systematic) risk is priced.
Portfolio A’s higher total risk ($\sigma_A > \sigma_B$) comes entirely from the specific risk component, which carries no risk premium.

In summary: CAPM says investors should expect the same return from both portfolios. Portfolio A simply has higher total variance, but that extra variance is all unsystematic and therefore unrewarded.

1.4 Chapter 10

Reference: Orb Trust analyst McCracken uses a two-factor APT model with factors:

Factor 1: Sensitivity to changes in real GDP — risk premium = 8%
Factor 2: Sensitivity to changes in inflation — risk premium = 2%

Fund	β_GDP	β_Inflation
High Growth	1.25	1.50
Large Cap	0.75	1.25
Utility	1.00	2.00

1.4.1 Problem 13

Question: If the risk-free rate is 4%, what is McCracken’s APT estimate of the expected return of Orb’s High Growth Fund?

Answer:

The two-factor APT expected return formula:

\[E(r_i) = r_f + \beta_{i,\text{GDP}} \times RP_{\text{GDP}} + \beta_{i,\text{Inf}} \times RP_{\text{Inf}}\]

Substituting the High Growth Fund values:

\[E(r_{\text{HGF}}) = 4\% + (1.25)(8\%) + (1.50)(2\%)\] \[= 4\% + 10\% + 3\%\] \[= \mathbf{17\%}\]

McCracken’s APT estimate for the High Growth Fund is 17%.

1.4.2 Problem 14

Question: With respect to McCracken’s APT model estimate of the Large Cap Fund and Kwon’s fundamental analysis return of 8.5% above the risk-free rate, is an arbitrage opportunity available?

Answer:

Step 1 — APT equilibrium return for the Large Cap Fund:

\[E(r_{\text{LCF}})_{\text{APT}} = 4\% + (0.75)(8\%) + (1.25)(2\%)\] \[= 4\% + 6\% + 2.5\% = \mathbf{12.5\%}\]

Step 2 — Kwon’s fundamental estimate:

Kwon says the expected return is 8.5% above the risk-free rate:

\[E(r_{\text{LCF}})_{\text{Kwon}} = r_f + 8.5\% = 4\% + 8.5\% = \mathbf{12.5\%}\]

Step 3 — Comparison:

\[E(r_{\text{LCF}})_{\text{APT}} = E(r_{\text{LCF}})_{\text{Kwon}} = 12.5\%\]

Since both estimates are exactly equal, the Large Cap Fund is fairly priced — there is no arbitrage opportunity. An arbitrage opportunity would exist only if one estimate exceeded the other, allowing a zero-investment portfolio to be constructed that earns a riskless profit by going long the underpriced asset and short the overpriced one.

Conclusion: No arbitrage opportunity is available.
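The APT estimates for all three funds can be computed in one step as a matrix product of factor betas and factor risk premia. This is a short sketch using the reference data above and the 4% risk-free rate.

```r
rf     <- 4                               # risk-free rate (%)
premia <- c(GDP = 8, Inflation = 2)       # factor risk premia (%)
betas  <- matrix(c(1.25, 1.50,
                   0.75, 1.25,
                   1.00, 2.00),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("High Growth", "Large Cap", "Utility"),
                                 names(premia)))

# E(r) = rf + beta_GDP * RP_GDP + beta_Inf * RP_Inf, one row per fund
apt_return <- rf + drop(betas %*% premia)
print(apt_return)

# Large Cap: APT excess return compared with Kwon's 8.5% estimate
print(apt_return["Large Cap"] - rf)
```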

1.4.3 Problem 15

Question: If the GDP Fund is constructed from the three funds (High Growth, Large Cap, Utility) to have unit sensitivity to real GDP and zero sensitivity to inflation, what is its weight in the Utility Fund?

Options: (a) −2.2, (b) −3.2, (c) 0.3

Answer: (a) −2.2

Full algebraic solution:

Let $w_H$, $w_L$, $w_U$ be the weights in the High Growth, Large Cap, and Utility Funds.

Three constraints:

[C1] Weights sum to 1: \[w_H + w_L + w_U = 1\]

[C2] GDP factor exposure = 1: \[1.25\,w_H + 0.75\,w_L + 1.00\,w_U = 1\]

[C3] Inflation factor exposure = 0: \[1.50\,w_H + 1.25\,w_L + 2.00\,w_U = 0\]

Solve step by step:

Subtract [C1] from [C2]:

\[0.25\,w_H - 0.25\,w_L = 0 \implies w_H = w_L \quad \text{...(i)}\]

Substitute $w_H = w_L$ into [C3]:

\[1.50\,w_H + 1.25\,w_H + 2.00\,w_U = 0\] \[2.75\,w_H + 2.00\,w_U = 0\] \[w_U = -\frac{2.75}{2.00}\,w_H = -1.375\,w_H \quad \text{...(ii)}\]

Substitute (i) and (ii) into [C1]:

\[w_H + w_H - 1.375\,w_H = 1\] \[0.625\,w_H = 1\] \[w_H = \frac{1}{0.625} = 1.6\]

Therefore: \[w_L = 1.6, \qquad w_U = -1.375 \times 1.6 = \mathbf{-2.2}\]

Verification:

GDP: $1.25(1.6) + 0.75(1.6) + 1.0(-2.2) = 2.0 + 1.2 - 2.2 = 1.0$ ✓
Inflation: $1.5(1.6) + 1.25(1.6) + 2.0(-2.2) = 2.4 + 2.0 - 4.4 = 0.0$ ✓
Sum: $1.6 + 1.6 - 2.2 = 1.0$ ✓

Answer: (a) −2.2 — the GDP Fund takes a short position of 2.2 in the Utility Fund.
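The three constraints form a 3 × 3 linear system, so the weights can also be obtained with base R's `solve()`. A minimal sketch:

```r
# Rows: [C1] weights sum to 1, [C2] GDP beta = 1, [C3] inflation beta = 0
A <- rbind(
  weights_sum    = c(1.00, 1.00, 1.00),
  gdp_beta       = c(1.25, 0.75, 1.00),
  inflation_beta = c(1.50, 1.25, 2.00)
)
colnames(A) <- c("w_H", "w_L", "w_U")
b <- c(1, 1, 0)

w <- solve(A, b)   # weights in High Growth, Large Cap, Utility
print(round(w, 4))
```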

1.4.4 Problem 16

Question: With respect to Stiles’s and McCracken’s comments about for whom the GDP Fund would be appropriate:

(a) McCracken is correct and Stiles is wrong
(b) Both are correct
(c) Stiles is correct and McCracken is wrong

Answer: (b) Both are correct

Stiles’s view: The GDP Fund is appropriate for retirees who live off steady investment income. This is correct. The GDP Fund has unit sensitivity to real GDP growth and zero sensitivity to inflation. This means:

When the real economy grows, the fund benefits — providing steady real returns.
Inflation surprises do not affect the fund’s return ($\beta_{\text{inflation}} = 0$), protecting retirees’ purchasing power from inflation shocks.
Retirees on fixed income streams benefit from this inflation-neutrality combined with stable real economic growth exposure.

McCracken’s view: The GDP Fund is appropriate if upcoming supply-side macroeconomic policies succeed. This is also correct. Successful supply-side policies (e.g., tax reform, deregulation, productivity improvements) typically:

Boost real GDP growth without necessarily generating inflation.
A fund with positive GDP exposure and zero inflation exposure benefits disproportionately in exactly this scenario.

Both views are valid from their respective angles — Stiles focuses on the investor profile (retirees needing inflation protection and real income) while McCracken focuses on the macroeconomic scenario (supply-side success). The two perspectives are complementary, not contradictory.

Answer: (b) Both are correct

2 Part II – R Code Questions (40%)

2.1 Q1. Import ETF Data from Yahoo Finance

Download daily ETF data for SPY, QQQ, EEM, IWM, EFA, TLT, IYR, GLD from 2010 to today and extract adjusted closing prices.

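# ── Packages (assumed to be loaded in a setup chunk not shown here) ───────────
library(tidyverse)   # dplyr, tidyr, tibble, %>%
library(tidyquant)   # tq_get()
library(timetk)      # tk_xts(), tk_tbl()
library(xts)         # apply.weekly(), apply.monthly(); attaches zoo for as.yearmon()

# ── Ticker list ────────────────────────────────────────────────────────────────
tickers <- c("SPY", "QQQ", "EEM", "IWM", "EFA", "TLT", "IYR", "GLD")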

# ── Download daily adjusted prices via tidyquant ───────────────────────────────
etf_raw <- tq_get(
  tickers,
  from = "2010-01-01",
  to   = Sys.Date(),
  get  = "stock.prices"
)

# ── Pivot to wide format: one column per ETF ──────────────────────────────────
prices_wide <- etf_raw %>%
  select(date, symbol, adjusted) %>%
  pivot_wider(names_from  = symbol,
              values_from = adjusted) %>%
  arrange(date) %>%
  drop_na()

# ── Convert to xts for time-series operations ─────────────────────────────────
prices_xts <- prices_wide %>%
  tk_xts(date_var = date)

cat("=== First 6 rows ===\n");  print(head(prices_xts))

## === First 6 rows ===

##                 SPY      QQQ      EEM      IWM      EFA      TLT      IYR
## 2010-01-04 84.79637 40.29079 30.35151 51.36656 35.12844 55.70955 26.76810
## 2010-01-05 85.02084 40.29079 30.57181 51.18993 35.15940 56.06933 26.83238
## 2010-01-06 85.08070 40.04778 30.63577 51.14178 35.30801 55.31875 26.82070
## 2010-01-07 85.43987 40.07380 30.45810 51.51911 35.17179 55.41177 27.06028
## 2010-01-08 85.72417 40.40363 30.69973 51.80012 35.45043 55.38695 26.87914
## 2010-01-11 85.84392 40.23871 30.63577 51.59137 35.74146 55.08305 27.00768
##               GLD
## 2010-01-04 109.80
## 2010-01-05 109.70
## 2010-01-06 111.51
## 2010-01-07 110.82
## 2010-01-08 111.37
## 2010-01-11 112.85

cat("\n=== Last 6 rows ===\n");  print(tail(prices_xts))

## 
## === Last 6 rows ===

##               SPY    QQQ   EEM    IWM    EFA   TLT    IYR    GLD
## 2026-06-01 758.54 742.74 70.08 288.98 104.44 85.47  99.60 411.26
## 2026-06-02 759.57 746.16 70.80 291.66 105.02 85.65  99.99 411.95
## 2026-06-03 754.24 744.21 69.92 287.67 104.12 85.31 100.00 407.87
## 2026-06-04 757.09 740.61 69.10 292.01 104.95 85.50 101.79 411.27
## 2026-06-05 737.55 705.06 64.59 281.65 102.26 85.06 102.54 396.24
## 2026-06-08 739.22 716.07 65.75 284.11 102.88 84.62 101.08 397.27

cat("\nDate range:", as.character(index(prices_xts)[1]),
    "to", as.character(tail(index(prices_xts), 1)), "\n")

## 
## Date range: 2010-01-04 to 2026-06-08

cat("Total trading days:", nrow(prices_xts), "\n")

## Total trading days: 4132

2.2 Q2. Calculate Weekly and Monthly Simple Returns

# ── Helper: take last price in each period, then compute simple returns ────────
last_price <- function(x) x[nrow(x), ]   # last row of each period's xts chunk

# ── Weekly prices (last day of each week) → simple returns ────────────────────
# stats::lag() dispatches to the xts method even if dplyr::lag() masks lag()
weekly_prices_xts  <- apply.weekly(prices_xts, last_price)
weekly_returns_xts <- na.omit(diff(weekly_prices_xts) / stats::lag(weekly_prices_xts, k = 1))

# ── Monthly prices (last day of each month) → simple returns ──────────────────
monthly_prices_xts  <- apply.monthly(prices_xts, last_price)
monthly_returns_xts <- na.omit(diff(monthly_prices_xts) / stats::lag(monthly_prices_xts, k = 1))

cat("=== Weekly Returns — first 6 rows ===\n")

## === Weekly Returns — first 6 rows ===

print(head(weekly_returns_xts))

##                     SPY          QQQ         EEM         IWM          EFA
## 2010-01-15 -0.008117274 -0.015038000 -0.02893523 -0.01301928 -0.003493338
## 2010-01-22 -0.038982822 -0.036858727 -0.05578098 -0.03062206 -0.055740448
## 2010-01-29 -0.016665243 -0.031024095 -0.03357741 -0.02624321 -0.025803306
## 2010-02-05 -0.006797549  0.004440714 -0.02821293 -0.01397445 -0.019054682
## 2010-02-12  0.012938261  0.018147854  0.03333316  0.02952577  0.005244719
## 2010-02-19  0.028693430  0.024451639  0.02445390  0.03343159  0.022995052
##                      TLT          IYR          GLD
## 2010-01-15  2.004758e-02 -0.006304884 -0.004579349
## 2010-01-22  1.010031e-02 -0.041784809 -0.033285246
## 2010-01-29  3.370717e-03 -0.008447715 -0.011290465
## 2010-02-05 -5.469523e-05  0.003223659 -0.012080019
## 2010-02-12 -1.946084e-02 -0.007574117  0.022544905
## 2010-02-19 -8.205046e-03  0.050185104  0.022701796

cat("\n=== Monthly Returns — first 6 rows ===\n")

## 
## === Monthly Returns — first 6 rows ===

print(head(monthly_returns_xts))

##                    SPY         QQQ          EEM         IWM          EFA
## 2010-02-26  0.03119470  0.04603932  0.017764057  0.04475104  0.002668091
## 2010-03-31  0.06087995  0.07710857  0.081108848  0.08230746  0.063853954
## 2010-04-30  0.01547015  0.02242592 -0.001662003  0.05678411 -0.028045885
## 2010-05-28 -0.07945472 -0.07392399 -0.093935818 -0.07536610 -0.111927960
## 2010-06-30 -0.05174120 -0.05975707 -0.013986781 -0.07743406 -0.020619162
## 2010-07-30  0.06830097  0.07258294  0.109325106  0.06730933  0.116103515
##                     TLT         IYR          GLD
## 2010-02-26 -0.003424947  0.05457101  0.032748219
## 2010-03-31 -0.020573580  0.09748411 -0.004386396
## 2010-04-30  0.033218124  0.06388143  0.058834363
## 2010-05-28  0.051083610 -0.05683504  0.030513147
## 2010-06-30  0.057978468 -0.04670166  0.023553189
## 2010-07-30 -0.009463887  0.09404773 -0.050871157

cat(sprintf("\nWeekly:  %d observations\n", nrow(weekly_returns_xts)))

## 
## Weekly:  857 observations

cat(sprintf("Monthly: %d observations\n",  nrow(monthly_returns_xts)))

## Monthly: 197 observations
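The return definition used above, $r_t = P_t / P_{t-1} - 1$, can be checked on a tiny example in base R. The prices below are hypothetical and chosen so the arithmetic is easy to follow; they are not ETF data.

```r
# Hypothetical period-end prices (not taken from the ETF data set)
prices <- c(100, 102, 99.96, 104.958)

# Simple returns: (P_t - P_{t-1}) / P_{t-1}
simple_ret <- diff(prices) / head(prices, -1)
print(round(simple_ret, 4))

# Compounding the simple returns recovers the total holding-period return
total_ret <- prod(1 + simple_ret) - 1
print(round(total_ret, 5))
```

The compounding check is a quick way to confirm that period returns were computed from consecutive period-end prices rather than from misaligned rows.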

2.3 Q3. Convert Monthly Returns to Tibble Format

# ── Convert xts → tibble using tk_tbl, rename index to 'date' ─────────────────
monthly_returns_tbl <- tk_tbl(monthly_returns_xts, rename_index = "date")

# ── Coerce yearmon index to proper Date (first day of month) ──────────────────
monthly_returns_tbl <- monthly_returns_tbl %>%
  mutate(date = as.Date(as.yearmon(date)))

cat("=== Monthly Returns Tibble — first 6 rows ===\n")

## === Monthly Returns Tibble — first 6 rows ===

print(head(monthly_returns_tbl))

## # A tibble: 6 × 9
##   date           SPY     QQQ      EEM     IWM      EFA      TLT     IYR      GLD
##   <date>       <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
## 1 2010-02-01  0.0312  0.0460  0.0178   0.0448  0.00267 -0.00342  0.0546  0.0327 
## 2 2010-03-01  0.0609  0.0771  0.0811   0.0823  0.0639  -0.0206   0.0975 -0.00439
## 3 2010-04-01  0.0155  0.0224 -0.00166  0.0568 -0.0280   0.0332   0.0639  0.0588 
## 4 2010-05-01 -0.0795 -0.0739 -0.0939  -0.0754 -0.112    0.0511  -0.0568  0.0305 
## 5 2010-06-01 -0.0517 -0.0598 -0.0140  -0.0774 -0.0206   0.0580  -0.0467  0.0236 
## 6 2010-07-01  0.0683  0.0726  0.109    0.0673  0.116   -0.00946  0.0940 -0.0509

cat(sprintf("\nClass:      %s\n", class(monthly_returns_tbl)[1]))

## 
## Class:      tbl_df

cat(sprintf("Dimensions: %d rows x %d columns\n",
            nrow(monthly_returns_tbl), ncol(monthly_returns_tbl)))

## Dimensions: 197 rows x 9 columns

cat(sprintf("Date range: %s  to  %s\n",
            min(monthly_returns_tbl$date),
            max(monthly_returns_tbl$date)))

## Date range: 2010-02-01  to  2026-06-01

2.4 Q4. Download Fama-French 3-Factor Data

Download directly from Ken French’s data library, parse the raw text, and convert from percentage to decimal.

# ── Download zip → unzip → read all lines ─────────────────────────────────────
ff3_url    <- paste0("https://mba.tuck.dartmouth.edu/pages/faculty/",
                     "ken.french/ftp/F-F_Research_Data_Factors_CSV.zip")
tmp_dir    <- tempdir()
tmp_zip    <- file.path(tmp_dir, "ff3.zip")

download.file(ff3_url, destfile = tmp_zip, mode = "wb", quiet = TRUE)

# Unzip and find the CSV inside
unzip(tmp_zip, exdir = tmp_dir)
csv_file <- list.files(tmp_dir,
                       pattern = "F-F_Research_Data_Factors.*\\.CSV$",
                       full.names = TRUE,
                       ignore.case = TRUE)[1]

raw_lines <- readLines(csv_file, warn = FALSE)

# ── Find the monthly data block ───────────────────────────────────────────────
# Header line contains "Mkt-RF"
hdr  <- grep("Mkt-RF", raw_lines)[1]
# Annual section starts with a blank line followed by a year label >= 4 chars
# Safe fallback: find first line after data that is blank or non-numeric
data_start <- hdr + 1
data_end   <- length(raw_lines)

for (k in data_start:length(raw_lines)) {
  ln <- trimws(raw_lines[k])
  # Stop at blank line that signals end of monthly block
  if (nchar(ln) == 0) { data_end <- k - 1; break }
  # Stop at "Annual Factors" label
  if (grepl("Annual", ln, ignore.case = TRUE)) { data_end <- k - 1; break }
}

monthly_lines <- raw_lines[data_start:data_end]
monthly_lines <- monthly_lines[nzchar(trimws(monthly_lines))]

# ── Parse line by line — avoid any ambiguous read.csv coercion ────────────────
parse_ff3_line <- function(line) {
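  # Split on commas and whitespace; the POSIX class [:space:] is used because
  # "\\s" inside a bracket expression is not portable across R's regex engines
  parts <- strsplit(trimws(line), "[,[:space:]]+")[[1]]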
  parts <- parts[nzchar(parts)]
  if (length(parts) < 5) return(NULL)
  list(
    yyyymm = trimws(parts[1]),
    Mkt_RF = as.numeric(parts[2]),
    SMB    = as.numeric(parts[3]),
    HML    = as.numeric(parts[4]),
    RF     = as.numeric(parts[5])
  )
}

parsed <- lapply(monthly_lines, parse_ff3_line)
parsed <- Filter(Negate(is.null), parsed)

ff3_monthly <- tibble(
  yyyymm = sapply(parsed, `[[`, "yyyymm"),
  Mkt_RF = sapply(parsed, `[[`, "Mkt_RF"),
  SMB    = sapply(parsed, `[[`, "SMB"),
  HML    = sapply(parsed, `[[`, "HML"),
  RF     = sapply(parsed, `[[`, "RF")
) %>%
  mutate(
    # yyyymm is a 6-character string like "201001" — parse safely
    date = as.Date(paste0(substr(yyyymm, 1, 4), "-",
                          substr(yyyymm, 5, 6), "-01")),
    Mkt_RF = Mkt_RF / 100,
    SMB    = SMB    / 100,
    HML    = HML    / 100,
    RF     = RF     / 100
  ) %>%
  select(date, Mkt_RF, SMB, HML, RF) %>%
  filter(!is.na(date),
         date >= as.Date("2010-01-01"),
         date <= Sys.Date()) %>%
  drop_na()

cat("=== Fama-French 3 Factors (first 6 rows) ===\n")

## === Fama-French 3 Factors (first 6 rows) ===

print(head(ff3_monthly))

## # A tibble: 6 × 5
##   date        Mkt_RF     SMB     HML     RF
##   <date>       <dbl>   <dbl>   <dbl>  <dbl>
## 1 2010-01-01 -0.0335  0.0043  0.0033 0     
## 2 2010-02-01  0.0339  0.0118  0.0318 0     
## 3 2010-03-01  0.063   0.0146  0.0219 0.0001
## 4 2010-04-01  0.0199  0.0484  0.0296 0.0001
## 5 2010-05-01 -0.079   0.0013 -0.0248 0.0001
## 6 2010-06-01 -0.0556 -0.0179 -0.0473 0.0001

cat(sprintf("\nDate range:   %s  to  %s\n",
            min(ff3_monthly$date), max(ff3_monthly$date)))

## 
## Date range:   2010-01-01  to  2026-04-01

cat(sprintf("Observations: %d months\n", nrow(ff3_monthly)))

## Observations: 196 months
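
As a quick cross-check of the parsing and unit conversion, the sketch below reads two sample lines with `read.csv(text = ...)` instead of the line-by-line parser. The sample lines are an assumption: they are reconstructed (in percent) from the first two printed rows above, not copied from the raw file, and `ff3_check` is a name introduced only for this illustration.

```r
# Assumed sample lines in the "yyyymm, Mkt-RF, SMB, HML, RF" layout (percent)
sample_lines <- c("201001,   -3.35,    0.43,    0.33,    0.00",
                  "201002,    3.39,    1.18,    3.18,    0.00")

factor_cols <- c("Mkt_RF", "SMB", "HML", "RF")

# Read yyyymm as character so it is never coerced to a number
ff3_check <- read.csv(text = sample_lines, header = FALSE,
                      col.names = c("yyyymm", factor_cols),
                      colClasses = c("character", rep("numeric", 4)),
                      strip.white = TRUE)

# First-of-month date from the yyyymm key
ff3_check$date <- as.Date(paste0(ff3_check$yyyymm, "01"), format = "%Y%m%d")

# Percent to decimal
ff3_check[factor_cols] <- ff3_check[factor_cols] / 100

print(ff3_check[, c("date", factor_cols)])
```

If the two approaches disagree on a real file, the likely cause is a header or footer line slipping into the monthly block, which is what the blank-line and "Annual" checks above guard against.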

2.5 Q5. Merge Monthly Returns with FF3 Factors

# ── Align both tibbles to first-of-month dates ─────────────────────────────────
ret_aligned <- monthly_returns_tbl %>%
  mutate(date = floor_date(date, "month"))

ff3_aligned <- ff3_monthly %>%
  mutate(date = floor_date(date, "month"))

# ── Inner join on date ─────────────────────────────────────────────────────────
merged_tbl <- inner_join(ret_aligned, ff3_aligned, by = "date") %>%
  arrange(date)

cat("=== Merged Tibble (first 6 rows) ===\n")

## === Merged Tibble (first 6 rows) ===

print(head(merged_tbl))

## # A tibble: 6 × 13
##   date           SPY     QQQ      EEM     IWM      EFA      TLT     IYR      GLD
##   <date>       <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
## 1 2010-02-01  0.0312  0.0460  0.0178   0.0448  0.00267 -0.00342  0.0546  0.0327 
## 2 2010-03-01  0.0609  0.0771  0.0811   0.0823  0.0639  -0.0206   0.0975 -0.00439
## 3 2010-04-01  0.0155  0.0224 -0.00166  0.0568 -0.0280   0.0332   0.0639  0.0588 
## 4 2010-05-01 -0.0795 -0.0739 -0.0939  -0.0754 -0.112    0.0511  -0.0568  0.0305 
## 5 2010-06-01 -0.0517 -0.0598 -0.0140  -0.0774 -0.0206   0.0580  -0.0467  0.0236 
## 6 2010-07-01  0.0683  0.0726  0.109    0.0673  0.116   -0.00946  0.0940 -0.0509 
## # ℹ 4 more variables: Mkt_RF <dbl>, SMB <dbl>, HML <dbl>, RF <dbl>

cat(sprintf("\nDimensions: %d rows x %d columns\n",
            nrow(merged_tbl), ncol(merged_tbl)))

## 
## Dimensions: 195 rows x 13 columns

cat(sprintf("Date range: %s  to  %s\n",
            min(merged_tbl$date), max(merged_tbl$date)))

## Date range: 2010-02-01  to  2026-04-01
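
The row counts are consistent with an inner join: the returns tibble has 197 months (2010/02 to 2026/06), the factor tibble has 196 months (2010/01 to 2026/04), and their overlap (2010/02 to 2026/04) is 195 months. A minimal base-R sketch of how to see which months drop out on each side is shown below; the toy data frames and their names are assumptions for illustration, with values taken from the printed rows above.

```r
# Toy inputs: returns start one month later and end one month later than factors
ret_toy <- data.frame(
  date = as.Date(c("2010-02-01", "2010-03-01", "2010-04-01")),
  SPY  = c(0.0312, 0.0609, 0.0155)
)
ff3_toy <- data.frame(
  date   = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")),
  Mkt_RF = c(-0.0335, 0.0339, 0.063)
)

# Inner join keeps only dates present in both tables
merged_toy <- merge(ret_toy, ff3_toy, by = "date")

# Dates that the join discards, by side
only_in_ret <- ret_toy$date[!ret_toy$date %in% ff3_toy$date]
only_in_ff3 <- ff3_toy$date[!ff3_toy$date %in% ret_toy$date]

print(merged_toy)
print(only_in_ret)
print(only_in_ff3)
```

Running the same two `%in%` checks on `ret_aligned` and `ff3_aligned` would list the months lost in the merge.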

2.6 Q6. CAPM-Based GMV Portfolio — Single Period 2015/02

Estimate the CAPM covariance matrix using 60 months of data (2010/02–2015/01), solve for Global Minimum Variance (GMV) weights, then compute the realized return in 2015/02.
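
Written out, the construction used in the code that follows is a single-index covariance matrix fed into a long-only minimum-variance program. The symbols $\beta$, $\sigma_m^2$, $D$ and $\Sigma$ are notation introduced here for illustration, not taken from the exam text:

\[\Sigma_{\text{CAPM}} = \sigma_m^2 \, \beta \beta^\top + D, \qquad D = \operatorname{diag}\left(\sigma_{\varepsilon_1}^2, \dots, \sigma_{\varepsilon_N}^2\right)\]

\[\min_{w} \; w^\top \Sigma_{\text{CAPM}} \, w \quad \text{subject to} \quad \mathbf{1}^\top w = 1, \quad w_i \ge 0 \;\; (i = 1, \dots, N)\]

`quadprog::solve.QP` minimises $\tfrac{1}{2} b^\top D_{\text{mat}} b - d^\top b$ subject to $A^\top b \ge b_0$, so setting `Dmat` to $2\Sigma$ and `dvec` to zero makes the objective equal to $w^\top \Sigma w$, and `meq = 1` turns the first constraint (the budget constraint) into an equality.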

# ── Asset columns ──────────────────────────────────────────────────────────────
asset_cols <- tickers

# ── GMV solver: long-only constrained quadratic program ───────────────────────
solve_gmv <- function(cov_mat) {
  n    <- ncol(cov_mat)
  Dmat <- 2 * cov_mat
  dvec <- rep(0, n)
  # Constraint 1: sum(w) = 1 (equality)
  # Constraints 2..n+1: w_i >= 0 (long-only)
  Amat <- cbind(rep(1, n), diag(n))
  bvec <- c(1, rep(0, n))
  sol  <- quadprog::solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
  w    <- sol$solution
  w[w < 1e-8] <- 0          # clean up numerical near-zeros
  w    <- w / sum(w)         # renormalise to ensure exact sum = 1
  names(w) <- colnames(cov_mat)
  w
}

# ── Training window: 2010/02 – 2015/01 (60 months) ────────────────────────────
train_capm <- merged_tbl %>%
  filter(date >= as.Date("2010-02-01"),
         date <= as.Date("2015-01-01"))

cat(sprintf("Training rows: %d months\n", nrow(train_capm)))

## Training rows: 60 months

# ── CAPM covariance matrix construction ───────────────────────────────────────
# For each asset: excess_return_i = alpha_i + beta_i * Mkt_RF + epsilon_i
# Cov_CAPM = B * var(Mkt) * B' + diag(var(epsilon))

betas_capm <- sapply(asset_cols, function(tk) {
  er  <- train_capm[[tk]] - train_capm$RF
  fit <- lm(er ~ train_capm$Mkt_RF)
  coef(fit)["train_capm$Mkt_RF"]
})

resid_capm <- sapply(asset_cols, function(tk) {
  er  <- train_capm[[tk]] - train_capm$RF
  fit <- lm(er ~ train_capm$Mkt_RF)
  residuals(fit)
})

mkt_var    <- var(train_capm$Mkt_RF)
resid_vars <- apply(resid_capm, 2, var)

# Systematic covariance + diagonal residual variance
cov_capm <- outer(betas_capm, betas_capm) * mkt_var
diag(cov_capm) <- diag(cov_capm) + resid_vars
colnames(cov_capm) <- rownames(cov_capm) <- asset_cols

# ── Solve for GMV weights ──────────────────────────────────────────────────────
gmv_w_capm <- solve_gmv(cov_capm)

cat("\n=== CAPM GMV Weights (as of 2015/01) ===\n")

## 
## === CAPM GMV Weights (as of 2015/01) ===

w_tbl <- tibble(Asset = names(gmv_w_capm),
                Weight = round(gmv_w_capm, 4),
                Pct    = paste0(round(gmv_w_capm * 100, 2), "%"))
print(w_tbl)

## # A tibble: 8 × 3
##   Asset Weight Pct   
##   <chr>  <dbl> <chr> 
## 1 SPY   0.447  44.71%
## 2 QQQ   0      0%    
## 3 EEM   0      0%    
## 4 IWM   0      0%    
## 5 EFA   0      0%    
## 6 TLT   0.448  44.83%
## 7 IYR   0.0373 3.73% 
## 8 GLD   0.0673 6.73%

cat(sprintf("Sum of weights: %.6f\n", sum(gmv_w_capm)))

## Sum of weights: 1.000000

# ── Realized return in 2015/02 ────────────────────────────────────────────────
ret_feb2015 <- merged_tbl %>%
  filter(date == as.Date("2015-02-01")) %>%
  select(all_of(asset_cols)) %>%
  unlist()

realized_capm <- sum(gmv_w_capm * ret_feb2015)
cat(sprintf("\n=== Realized GMV Portfolio Return (CAPM) in 2015/02: %.4f%%  ===\n",
            realized_capm * 100))

## 
## === Realized GMV Portfolio Return (CAPM) in 2015/02: -0.7330%  ===
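
For intuition, the unconstrained GMV portfolio has a closed form, $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$, which can be checked in base R. The sketch below is not the long-only solver used above: it assumes three hypothetical assets with made-up betas and variances, and it agrees with a long-only solution only when none of the resulting weights is negative.

```r
# Hypothetical inputs (not estimated from the exam data)
betas      <- c(A = 1.0, B = 1.2, C = 0.1)
mkt_var    <- 0.0016
resid_vars <- c(0.0001, 0.0004, 0.0009)

# Single-index covariance: beta beta' * var(market) + diagonal residual variances
cov_toy <- outer(betas, betas) * mkt_var + diag(resid_vars)

# Unconstrained GMV weights: Sigma^{-1} 1, rescaled to sum to one
ones  <- rep(1, length(betas))
w_raw <- solve(cov_toy, ones)
w_gmv <- w_raw / sum(w_raw)

# Portfolio variance for a given weight vector
port_var <- function(w) drop(t(w) %*% cov_toy %*% w)

print(round(w_gmv, 4))
print(c(gmv = port_var(w_gmv), equal_weight = port_var(ones / 3)))
```

A negative entry in `w_gmv` signals that the long-only constraint would bind, which is the situation in which assets receive a zero weight in the constrained solution.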

2.7 Q7. Fama-French 3-Factor GMV Portfolio — Single Period 2015/02

The training window is the same (2010/02–2015/01), but the covariance matrix is estimated with the Fama-French three-factor (FF3) model.
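
In matrix form, the multi-factor covariance built in the code that follows can be written as shown here. The notation is introduced for illustration: $B$ is the $3 \times N$ matrix of factor loadings (rows are factors, columns are assets, matching `betas_ff3`), $\Sigma_f$ is the $3 \times 3$ covariance matrix of the factors, and $D$ is the diagonal matrix of residual variances:

\[\Sigma_{\text{FF3}} = B^\top \Sigma_f \, B + D, \qquad D = \operatorname{diag}\left(\sigma_{\varepsilon_1}^2, \dots, \sigma_{\varepsilon_N}^2\right)\]

With a single factor, $B$ collapses to the row vector $\beta^\top$ and $\Sigma_f$ to the scalar $\sigma_m^2$, which recovers the single-index form $\sigma_m^2 \, \beta \beta^\top + D$.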

# ── FF3 covariance matrix construction ────────────────────────────────────────
# excess_return_i = alpha_i + b1*Mkt_RF + b2*SMB + b3*HML + epsilon_i
# Cov_FF3 = B * Cov(factors) * B' + diag(var(epsilon))

betas_ff3 <- sapply(asset_cols, function(tk) {
  er  <- train_capm[[tk]] - train_capm$RF
  fit <- lm(er ~ Mkt_RF + SMB + HML, data = train_capm)
  coef(fit)[c("Mkt_RF", "SMB", "HML")]
})
# betas_ff3 is 3 x 8 matrix (rows = factors, cols = assets)

resid_ff3 <- sapply(asset_cols, function(tk) {
  er  <- train_capm[[tk]] - train_capm$RF
  fit <- lm(er ~ Mkt_RF + SMB + HML, data = train_capm)
  residuals(fit)
})

cov_factors <- cov(train_capm %>% select(Mkt_RF, SMB, HML) %>% as.matrix())

cov_ff3 <- t(betas_ff3) %*% cov_factors %*% betas_ff3
resid_vars_ff3 <- apply(resid_ff3, 2, var)
diag(cov_ff3) <- diag(cov_ff3) + resid_vars_ff3
colnames(cov_ff3) <- rownames(cov_ff3) <- asset_cols

# ── Solve for GMV weights ──────────────────────────────────────────────────────
gmv_w_ff3 <- solve_gmv(cov_ff3)

cat("=== FF3 GMV Weights (as of 2015/01) ===\n")

## === FF3 GMV Weights (as of 2015/01) ===

w_tbl_ff3 <- tibble(Asset      = names(gmv_w_ff3),
                    Weight_FF3  = round(gmv_w_ff3, 4),
                    Pct_FF3     = paste0(round(gmv_w_ff3 * 100, 2), "%"))
print(w_tbl_ff3)

## # A tibble: 8 × 3
##   Asset Weight_FF3 Pct_FF3
##   <chr>      <dbl> <chr>  
## 1 SPY       0.458  45.79% 
## 2 QQQ       0      0%     
## 3 EEM       0      0%     
## 4 IWM       0      0%     
## 5 EFA       0      0%     
## 6 TLT       0.451  45.07% 
## 7 IYR       0.0334 3.34%  
## 8 GLD       0.0581 5.81%

cat(sprintf("Sum of weights: %.6f\n", sum(gmv_w_ff3)))

## Sum of weights: 1.000000

# ── Realized return in 2015/02 ────────────────────────────────────────────────
realized_ff3 <- sum(gmv_w_ff3 * ret_feb2015)
cat(sprintf("\n=== Realized GMV Portfolio Return (FF3) in 2015/02: %.4f%%  ===\n",
            realized_ff3 * 100))

## 
## === Realized GMV Portfolio Return (FF3) in 2015/02: -0.6224%  ===

# ── Side-by-side weight comparison ────────────────────────────────────────────
cat("\n=== Weight Comparison: CAPM vs FF3 (2015/01) ===\n")

## 
## === Weight Comparison: CAPM vs FF3 (2015/01) ===

comparison <- tibble(
  Asset       = asset_cols,
  CAPM_Weight = round(gmv_w_capm, 4),
  FF3_Weight  = round(gmv_w_ff3, 4),
  Difference  = round(gmv_w_ff3 - gmv_w_capm, 4)
)
print(comparison)

## # A tibble: 8 × 4
##   Asset CAPM_Weight FF3_Weight Difference
##   <chr>       <dbl>      <dbl>      <dbl>
## 1 SPY        0.447      0.458      0.0108
## 2 QQQ        0          0          0     
## 3 EEM        0          0          0     
## 4 IWM        0          0          0     
## 5 EFA        0          0          0     
## 6 TLT        0.448      0.451      0.0024
## 7 IYR        0.0373     0.0334    -0.0039
## 8 GLD        0.0673     0.0581    -0.0092

cat(sprintf("\nRealized return in 2015/02:\n"))

## 
## Realized return in 2015/02:

cat(sprintf("  CAPM GMV: %.4f%%\n", realized_capm * 100))

##   CAPM GMV: -0.7330%

cat(sprintf("  FF3  GMV: %.4f%%\n", realized_ff3  * 100))

##   FF3  GMV: -0.6224%

2.8 Q8. Rolling-Window Backtest: CAPM vs FF3 GMV (2015/02 – 2026/05)

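Using a rolling 60-month estimation window, compute monthly GMV portfolio returns for both the CAPM and FF3 models from 2015/02 onward, then plot cumulative performance. The target end date is 2026/05, but the merged data stop at 2026/04 (the last month of factor data available), so the backtest covers 135 months.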

# ── All available monthly dates ────────────────────────────────────────────────
all_dates    <- sort(unique(merged_tbl$date))
start_invest <- as.Date("2015-02-01")
end_invest   <- as.Date("2026-05-01")
invest_dates <- all_dates[all_dates >= start_invest & all_dates <= end_invest]

cat(sprintf("Investment months: %d  (%s to %s)\n",
            length(invest_dates),
            min(invest_dates), max(invest_dates)))

## Investment months: 135  (2015-02-01 to 2026-04-01)

# ── Rolling GMV backtest function ─────────────────────────────────────────────
roll_gmv_backtest <- function(data, model = c("capm", "ff3"), window = 60) {

  model       <- match.arg(model)
  port_ret    <- rep(NA_real_, length(invest_dates))
  names(port_ret) <- as.character(invest_dates)

  for (i in seq_along(invest_dates)) {

    invest_date  <- invest_dates[i]

    # All dates strictly before the investment date
    prior_dates  <- all_dates[all_dates < invest_date]
    if (length(prior_dates) < window) next

    # Take the most recent 'window' months
    train_dates  <- tail(prior_dates, window)
    train        <- data %>% filter(date %in% train_dates)
    if (nrow(train) < window) next

    # ── Build covariance matrix ────────────────────────────────────────────────
    tryCatch({
      if (model == "capm") {

        b_vec <- sapply(asset_cols, function(tk) {
          er  <- train[[tk]] - train$RF
          coef(lm(er ~ train$Mkt_RF))["train$Mkt_RF"]
        })
        e_mat <- sapply(asset_cols, function(tk) {
          er  <- train[[tk]] - train$RF
          residuals(lm(er ~ train$Mkt_RF))
        })
        mv    <- var(train$Mkt_RF)
        cov_m <- outer(b_vec, b_vec) * mv
        diag(cov_m) <- diag(cov_m) + apply(e_mat, 2, var)

      } else {  # ff3

        b_mat <- sapply(asset_cols, function(tk) {
          er  <- train[[tk]] - train$RF
          coef(lm(er ~ Mkt_RF + SMB + HML, data = train))[
            c("Mkt_RF", "SMB", "HML")]
        })
        e_mat <- sapply(asset_cols, function(tk) {
          er  <- train[[tk]] - train$RF
          residuals(lm(er ~ Mkt_RF + SMB + HML, data = train))
        })
        cf    <- cov(train %>% select(Mkt_RF, SMB, HML) %>% as.matrix())
        cov_m <- t(b_mat) %*% cf %*% b_mat
        diag(cov_m) <- diag(cov_m) + apply(e_mat, 2, var)
      }

      colnames(cov_m) <- rownames(cov_m) <- asset_cols

      # Ensure positive-definiteness (add small ridge if needed)
      min_eig <- min(eigen(cov_m, only.values = TRUE)$values)
      if (min_eig < 1e-8) {
        cov_m <- cov_m + diag(abs(min_eig) + 1e-6, nrow(cov_m))
      }

      w_opt   <- solve_gmv(cov_m)

      # Realized return for investment month
      ret_row <- data %>%
        filter(date == invest_date) %>%
        select(all_of(asset_cols)) %>%
        unlist()

      if (length(ret_row) == length(asset_cols) &&
          !any(is.na(ret_row))) {
        port_ret[i] <- sum(w_opt * ret_row)
      }

    }, error = function(e) NULL)   # silently skip if solver fails
  }
  port_ret
}

# ── Run both models ────────────────────────────────────────────────────────────
cat("Running CAPM rolling backtest...\n")

## Running CAPM rolling backtest...

ret_capm_bt <- roll_gmv_backtest(merged_tbl, model = "capm")

cat("Running FF3 rolling backtest...\n")

## Running FF3 rolling backtest...

ret_ff3_bt  <- roll_gmv_backtest(merged_tbl, model = "ff3")

# ── Assemble results ───────────────────────────────────────────────────────────
backtest_tbl <- tibble(
  date     = invest_dates,
  CAPM_GMV = ret_capm_bt,
  FF3_GMV  = ret_ff3_bt
) %>% drop_na()

cat(sprintf("\nValid backtest observations: %d months\n", nrow(backtest_tbl)))

## 
## Valid backtest observations: 135 months

# ── Cumulative returns (base = 1 at start) ─────────────────────────────────────
backtest_cum <- backtest_tbl %>%
  mutate(
    Cum_CAPM = cumprod(1 + CAPM_GMV),
    Cum_FF3  = cumprod(1 + FF3_GMV)
  )

# ── Plot cumulative returns ────────────────────────────────────────────────────
backtest_cum %>%
  select(date, Cum_CAPM, Cum_FF3) %>%
  pivot_longer(cols      = -date,
               names_to  = "Model",
               values_to = "Cumulative_Return") %>%
  mutate(Model = case_when(
    Model == "Cum_CAPM" ~ "GMV – CAPM",
    Model == "Cum_FF3"  ~ "GMV – Fama-French 3-Factor",
    TRUE ~ Model
  )) %>%
  ggplot(aes(x = date, y = Cumulative_Return,
             color = Model, linetype = Model)) +
  geom_line(linewidth = 1.2) +
  scale_y_continuous(
    labels = function(x) paste0(round((x - 1) * 100, 0), "%"),
    name   = "Cumulative Return above initial investment"
  ) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_color_manual(
    values = c("GMV – CAPM"                  = "#2166ac",
               "GMV – Fama-French 3-Factor"  = "#d73027")
  ) +
  scale_linetype_manual(
    values = c("GMV – CAPM"                  = "solid",
               "GMV – Fama-French 3-Factor"  = "dashed")
  ) +
  labs(
    title    = "Cumulative Returns: CAPM vs Fama-French 3-Factor GMV Portfolios",
    subtitle = "Rolling 60-month estimation window | Long-only | 2015/02 – 2026/05",
    x        = NULL,
    color    = NULL,
    linetype = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title      = element_text(face = "bold", size = 14),
    plot.subtitle   = element_text(color = "grey40", size = 11),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    axis.text.x     = element_text(angle = 45, hjust = 1)
  )
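
The window selection inside `roll_gmv_backtest` can be illustrated on a toy calendar. This is a sketch under stated assumptions: the names ending in `_toy` and the 3-month window are hypothetical choices made for readability (the backtest itself uses 60 months), and no returns are involved.

```r
# Toy monthly calendar: 2010-02 through 2010-09
all_dates_toy <- seq(as.Date("2010-02-01"), by = "month", length.out = 8)
window_toy    <- 3
invest_toy    <- all_dates_toy[all_dates_toy >= as.Date("2010-05-01")]

# For each investment month, keep the most recent 'window_toy' earlier months
train_windows <- lapply(seq_along(invest_toy), function(i) {
  prior <- all_dates_toy[all_dates_toy < invest_toy[i]]  # strictly earlier
  tail(prior, window_toy)
})
names(train_windows) <- format(invest_toy, "%Y-%m")

# No training window contains its own investment month (no look-ahead)
no_lookahead <- all(vapply(seq_along(invest_toy), function(i) {
  all(train_windows[[i]] < invest_toy[i])
}, logical(1)))

print(lapply(train_windows, format, "%Y-%m"))
print(no_lookahead)
```

Because each window ends in the month before the investment month, the weights applied in a given month use only information available at the end of the previous month.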

2.9 Summary Statistics: Backtest Performance

# ── Compute all performance metrics cleanly ────────────────────────────────────
calc_stats <- function(ret_vec) {
  ret_vec <- na.omit(ret_vec)
  cum     <- cumprod(1 + ret_vec)
  dd      <- cum / cummax(cum) - 1

  tibble(
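    # Arithmetic annualisation: monthly mean x 12 (not compounded)
    `Annualised Return (%)`    = round(mean(ret_vec) * 12 * 100, 2),
    `Annualised Std Dev (%)`   = round(sd(ret_vec) * sqrt(12) * 100, 2),
    # Sharpe ratio on raw returns: no risk-free rate is subtracted here
    `Sharpe Ratio (ann.)`      = round(
                                   (mean(ret_vec) * 12) /
                                   (sd(ret_vec) * sqrt(12)), 3),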
    `Max Drawdown (%)`         = round(min(dd) * 100, 2),
    `Best Month (%)`           = round(max(ret_vec) * 100, 2),
    `Worst Month (%)`          = round(min(ret_vec) * 100, 2),
    `% Positive Months`        = round(mean(ret_vec > 0) * 100, 1)
  )
}

stats_capm <- calc_stats(backtest_tbl$CAPM_GMV)
stats_ff3  <- calc_stats(backtest_tbl$FF3_GMV)

summary_tbl <- bind_rows(
  stats_capm %>% mutate(Model = "CAPM GMV",         .before = 1),
  stats_ff3  %>% mutate(Model = "FF3 GMV",           .before = 1)
)

knitr::kable(
  summary_tbl,
  caption = "Performance Summary: CAPM vs FF3 GMV Portfolios (2015/02 – 2026/05)",
  align   = c("l", rep("r", ncol(summary_tbl) - 1))
)

Performance Summary: CAPM vs FF3 GMV Portfolios (2015/02 – 2026/05)
Model	Annualised Return (%)	Annualised Std Dev (%)	Sharpe Ratio (ann.)	Max Drawdown (%)	Best Month (%)	Worst Month (%)	% Positive Months
CAPM GMV	8.06	10.49	0.768	-25.64	9.19	-8.39	60.7
FF3 GMV	7.96	10.57	0.753	-26.78	9.73	-8.28	60.7
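
The drawdown and annualisation formulas can be verified by hand on a short series. The sketch below uses four hypothetical monthly returns and an assumed constant monthly risk-free rate. It also shows two common variants that `calc_stats` does not compute: a geometric (compounded) annualised return and a Sharpe ratio on excess returns.

```r
# Hypothetical monthly returns, chosen so the drawdown is easy to check by hand
ret <- c(0.10, -0.05, -0.10, 0.20)
rf  <- rep(0.001, length(ret))   # assumed constant monthly risk-free rate

cum    <- cumprod(1 + ret)       # wealth path: 1.1, 1.045, 0.9405, 1.1286
dd     <- cum / cummax(cum) - 1  # drawdown relative to the running peak
max_dd <- min(dd)                # 0.9405 / 1.1 - 1 = -0.145

# Geometric (compounded) annualised return, an alternative to mean * 12
ann_geo <- prod(1 + ret)^(12 / length(ret)) - 1

# Annualised Sharpe ratio on excess returns
excess        <- ret - rf
sharpe_excess <- mean(excess) * 12 / (sd(excess) * sqrt(12))

print(round(c(max_dd = max_dd, ann_geo = ann_geo, sharpe = sharpe_excess), 4))
```

With a near-zero risk-free rate the two Sharpe definitions are close, but they can diverge noticeably in periods when short-term rates are higher.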

End of Final Exam — Portfolio Analysis 2026

Final Exam: Portfolio Analysis

Minjin

2026/06/10
