1 Part 1 – Questions from Textbook (60%)


1.1 Chapter 7

1.1.1 CFA Problem 1

a. Will limiting to 20 stocks increase or decrease portfolio risk?

Limiting the portfolio from 40 to 20 stocks will likely increase risk. With fewer holdings, each stock receives a larger weight (doubling from ~2–3% to ~4–6%), meaning the portfolio becomes less diversified. Unsystematic (firm-specific) risk that was previously diversified away now has a greater impact on total portfolio variance.

b. Could Hennessy reduce to 20 stocks without significantly affecting risk?

Yes — if Hennessy selects the 20 stocks that have the lowest pairwise correlations with each other (i.e., the stocks that contribute most to diversification), the reduction in holdings may not significantly increase risk. If the 20 best-performing stocks happen to span different industries and have low inter-correlations, the diversification benefit can be largely preserved. However, this requires careful selection based on covariance structure, not just expected returns.


1.1.2 CFA Problem 2

Reducing from 20 to 10 stocks is likely less advantageous than reducing from 40 to 20, for the following reason:

Diversification benefits are not linear — they diminish as the portfolio grows. The largest risk-reduction gains come from the first few added stocks. Going from 40→20 still likely retains adequate diversification because many of those 40 stocks probably share some covariance. Going from 20→10, however, means each stock now represents 10% of the portfolio; idiosyncratic shocks to any single holding have a much larger effect. The law of diminishing returns to diversification means this further concentration yields smaller return improvement but meaningfully higher risk.

Furthermore, if Wilstead evaluates Hennessy’s portfolio independently, concentrating into 10 stocks maximizes idiosyncratic risk exposure, which Wilstead would penalize in isolation — even if the stocks are genuinely high-alpha picks.


1.1.3 CFA Problem 3

If the committee evaluates Hennessy’s portfolio as part of the total fund rather than independently, the analysis changes. The Hennessy portfolio is only one of six managed portfolios. The other five managers collectively hold 150+ stocks worth $250M. The relevant question becomes: what is Hennessy’s marginal contribution to total fund risk?

From a total-fund perspective, the other 150+ stocks already provide broad diversification. Hennessy’s portfolio is a relatively small, concentrated “active overlay.” In this view:

  • Limiting Hennessy to 20 stocks may be fine — the rest of the fund absorbs the unsystematic risk.
  • Limiting to 10 stocks may also be acceptable, since the fund as a whole remains well-diversified.
  • The committee’s concern about concentration is less pressing when viewed at the total fund level, making Jones’s proposal more supportable.

1.1.4 CFA Problem 4

Which portfolio cannot lie on the Markowitz efficient frontier?

Portfolio E(R) σ
W 15% 36%
X 12% 15%
Z 5% 7%
Y 9% 21%

Portfolio Y cannot lie on the efficient frontier.

Reasoning: Portfolio X dominates Y — X offers a higher return (12% > 9%) with lower risk (15% < 21%). On the efficient frontier, no portfolio should be dominated by another. Since Y is both lower-return and higher-risk than X, it is inefficient and cannot lie on the frontier.

Answer: d. Portfolio Y


1.1.5 CFA Problem 10

Given:

  • Stock A: σ = 40%, Stock B: σ = 20%, Stock C: σ = 40%
  • Correlations: ρ(A,B) = 0.90, ρ(A,C) = 0.50, ρ(B,C) = 0.10

Compute portfolio variance for equal-weight (50/50) portfolios:

For a two-asset equal-weight portfolio: \(\sigma_P^2 = 0.25\sigma_1^2 + 0.25\sigma_2^2 + 2(0.25)\rho_{12}\sigma_1\sigma_2\)

Portfolio A+B:

\[\sigma_{AB}^2 = 0.25(40)^2 + 0.25(20)^2 + 0.5 \times 0.90 \times 40 \times 20\] \[= 400 + 100 + 360 = 860 \Rightarrow \sigma_{AB} \approx 29.3\%\]

Portfolio B+C:

\[\sigma_{BC}^2 = 0.25(20)^2 + 0.25(40)^2 + 0.5 \times 0.10 \times 20 \times 40\] \[= 100 + 400 + 40 = 540 \Rightarrow \sigma_{BC} \approx 23.2\%\]

Recommendation: Since we have no information on expected returns for A, B, or C, the decision is based solely on risk. Portfolio B+C has substantially lower variance (540 vs. 860) due to the very low correlation between B and C (ρ = 0.10 vs. 0.90 for A and B). A risk-averse investor should prefer Portfolio B+C.


1.2 Chapter 8

1.2.1 CFA Problem 1

Regression results (5-year, monthly excess returns):

Statistic ABC XYZ
Alpha −3.20% 7.3%
Beta 0.60 0.97
0.35 0.17
Residual SD 13.02% 21.45%

Interpretation:

  • ABC: Negative alpha (−3.2%) means ABC underperformed the market on a risk-adjusted basis. Beta of 0.60 means it moves less than the market. R² = 0.35 implies 35% of ABC’s variance is explained by market movements; the rest is firm-specific.
  • XYZ: Positive alpha (7.3%) suggests outperformance, but R² = 0.17 is very low — only 17% of XYZ’s variance is systematic. The high residual SD (21.45%) means most of XYZ’s risk is idiosyncratic.

Implications for a diversified portfolio:

In a diversified portfolio, unsystematic risk is effectively eliminated. Therefore:

  • The alpha figures from the 5-year regression have limited predictive value — alpha tends to revert toward zero.
  • Recent brokerage house betas (last 2 years) for ABC range from 0.62–0.71 and for XYZ 1.25–1.45, suggesting beta estimates are unstable over time.
  • For future risk-return projections, the more recent betas are likely more relevant, and both alphas should be treated with skepticism (mean reversion).

1.2.2 CFA Problem 2

Baker Fund: correlation with market = 0.70

\[R^2 = \rho^2 = (0.70)^2 = 0.49\]

Systematic risk explains 49% of total variance.

\[\text{Nonsystematic (specific) risk} = 1 - R^2 = 1 - 0.49 = \mathbf{51\%}\]

51% of Baker Fund’s total risk is nonsystematic (firm-specific).


1.2.3 CFA Problem 3

Charlottesville International Fund: correlation with world index = 1.0

Since ρ = 1.0, the fund moves perfectly with the world market index. Using CAPM:

\[E(r) = r_f + \beta[E(r_M) - r_f]\]

\[9\% = 3\% + \beta(11\% - 3\%)\]

\[6\% = \beta \times 8\%\]

\[\beta = \frac{6}{8} = \mathbf{0.75}\]

The implied beta of Charlottesville International is 0.75.


1.2.4 CFA Problem 4

Answer: d. Systematic risk.

Beta is a measure of systematic risk — the sensitivity of an asset’s return to market-wide movements. It captures only the non-diversifiable component of risk.


1.2.5 CFA Problem 5

Answer: b. Only systematic risk, while standard deviation measures total risk.

Beta measures only the systematic (market) risk of an asset — the portion that cannot be diversified away. Standard deviation captures total risk (systematic + unsystematic). In a well-diversified portfolio, only systematic risk matters for pricing.


1.3 Chapter 9

Use the following data for Problems 8 and 9:

Portfolio Avg Annual Return Std Dev Beta
R 11% 10% 0.5
S&P 500 14% 12% 1.0

Note: A risk-free rate is needed to solve these — we use the S&P 500 as market proxy. From the data, we can back out r_f using the SML: E(R) = r_f + β[E(r_M) - r_f].

For S&P 500: 14% = r_f + 1.0 × (14% − r_f) → this is always satisfied. We need an assumption. Using a typical risk-free rate: let r_f = 6% (implied by the SML, solving for what makes Portfolio R fairly priced: 11% = 6% + 0.5×(14%−6%) = 6% + 4% = 10% ≠ 11%). So Portfolio R slightly exceeds the SML.


1.3.1 CFA Problem 8 – Portfolio R vs. SML

Using SML with r_f = 6%:

\[\text{SML expected return for } \beta=0.5: \quad E(r) = 6\% + 0.5(14\% - 6\%) = 6\% + 4\% = 10\%\]

Portfolio R’s actual return = 11% > 10% → Portfolio R plots above the SML.

Answer: c. Above the SML.


1.3.2 CFA Problem 9 – Portfolio R vs. CML

The CML is defined using total risk (standard deviation). The CML slope:

\[\text{CML slope} = \frac{14\% - 6\%}{12\%} = 0.667\]

\[\text{CML return at } \sigma=10\%: \quad 6\% + 0.667 \times 10\% = 12.67\%\]

Portfolio R’s actual return = 11% < 12.67% → Portfolio R plots below the CML.

Answer: b. Below the CML.


1.3.3 CFA Problem 10

Should investors expect a higher return on Portfolio A than B according to CAPM?

Portfolio A Portfolio B
Systematic risk (beta) 1.0 1.0
Specific (idiosyncratic) risk High Low

No. According to CAPM, expected return is determined solely by systematic risk (beta). Both portfolios have identical betas (1.0), so CAPM predicts the same expected return for both.

The higher specific risk in Portfolio A is entirely diversifiable — in a well-diversified portfolio, it disappears. Investors are not compensated for bearing unnecessary idiosyncratic risk. Therefore, CAPM implies E(r_A) = E(r_B).


1.4 Chapter 10

Two-factor APT model: factors are real GDP growth and inflation. Factor risk premiums: GDP = 8%, Inflation = 2% High Growth Fund sensitivities: β_GDP = 1.25, β_Inf = 1.5 Large Cap Fund: expected return = r_f + 8.5%, β_GDP = 0.75, β_Inf = 1.25 Utility Fund: β_GDP = 1.0, β_Inf = 2.0 Risk-free rate = 4%


1.4.1 Problem 13 – Expected Return of High Growth Fund (APT)

\[E(r) = r_f + \beta_{GDP} \times RP_{GDP} + \beta_{Inf} \times RP_{Inf}\]

\[E(r) = 4\% + 1.25 \times 8\% + 1.5 \times 2\%\]

\[= 4\% + 10\% + 3\% = \mathbf{17\%}\]

McCracken’s APT estimate for the High Growth Fund is 17%.


1.4.2 Problem 14 – Arbitrage Opportunity in Large Cap Fund?

APT expected return for Large Cap Fund:

\[E(r)_{APT} = 4\% + 0.75 \times 8\% + 1.25 \times 2\% = 4\% + 6\% + 2.5\% = 12.5\%\]

Kwon’s fundamental estimate: \(r_f + 8.5\% = 4\% + 8.5\% = 12.5\%\)

The APT estimate and fundamental estimate are equal (12.5%) — there is no arbitrage opportunity.

No arbitrage opportunity is available.


1.4.3 Problem 15 – Weight of Utility Fund in GDP Fund

We want a portfolio with β_GDP = 1 and β_Inf = 0, using three funds:

  • High Growth (H): β_GDP=1.25, β_Inf=1.5
  • Large Cap (L): β_GDP=0.75, β_Inf=1.25
  • Utility (U): β_GDP=1.0, β_Inf=2.0

Let weights be \(w_H, w_L, w_U\) with \(w_H + w_L + w_U = 1\).

System of equations:

\[1.25w_H + 0.75w_L + 1.0w_U = 1 \quad \text{(GDP)}\] \[1.5w_H + 1.25w_L + 2.0w_U = 0 \quad \text{(Inflation)}\] \[w_H + w_L + w_U = 1\]

From equations 1 and 3, subtract: \(0.25w_H - 0.25w_L = 0 \Rightarrow w_H = w_L\)

Substituting \(w_H = w_L\) into equation 2: \(1.5w_H + 1.25w_H + 2w_U = 0 \Rightarrow 2.75w_H + 2w_U = 0\)

From equation 3: \(2w_H + w_U = 1 \Rightarrow w_U = 1 - 2w_H\)

Substituting: \(2.75w_H + 2(1-2w_H) = 0 \Rightarrow 2.75w_H + 2 - 4w_H = 0 \Rightarrow -1.25w_H = -2 \Rightarrow w_H = 1.6\)

Then \(w_L = 1.6\), \(w_U = 1 - 3.2 = -2.2\)

Answer: (a) −2.2. The weight of the Utility Fund is −2.2 (a short position).


1.4.4 Problem 16 – Stiles vs. McCracken on GDP Fund

Answer: b. Both are correct.

  • Stiles is correct that the GDP Fund is good for retirees who need steady income — it has zero inflation sensitivity (inflation is the main threat to real purchasing power of fixed income), so it provides a stable real return.
  • McCracken is correct that the GDP Fund is good if supply-side macroeconomic policies succeed — such policies typically boost real GDP growth, benefiting a fund with high GDP sensitivity.

Both views are internally consistent and apply to different investor types or macro scenarios.



2 Part 2 – R Code Questions (40%)


2.1 Q1 – Import ETF Data

# Load required libraries
library(tidyquant)
library(tidyverse)
library(timetk)
library(lubridate)
library(xts)
library(quadprog)
library(PerformanceAnalytics)

# Define tickers
tickers <- c("SPY", "QQQ", "EEM", "IWM", "EFA", "TLT", "IYR", "GLD")

# Download daily data from Yahoo Finance (2010 to today)
prices_raw <- tq_get(tickers,
                     from = "2010-01-01",
                     to   = Sys.Date(),
                     get  = "stock.prices")

# Extract adjusted closing prices and pivot wide
prices_wide <- prices_raw %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  arrange(date)

# Show first and last few rows
head(prices_wide)
## # A tibble: 6 × 9
##   date         SPY   QQQ   EEM   IWM   EFA   TLT   IYR   GLD
##   <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2010-01-04  84.8  40.3  30.4  51.4  35.1  55.7  26.8  110.
## 2 2010-01-05  85.0  40.3  30.6  51.2  35.2  56.1  26.8  110.
## 3 2010-01-06  85.1  40.0  30.6  51.1  35.3  55.3  26.8  112.
## 4 2010-01-07  85.4  40.1  30.5  51.5  35.2  55.4  27.1  111.
## 5 2010-01-08  85.7  40.4  30.7  51.8  35.5  55.4  26.9  111.
## 6 2010-01-11  85.8  40.2  30.6  51.6  35.7  55.1  27.0  113.
tail(prices_wide)
## # A tibble: 6 × 9
##   date         SPY   QQQ   EEM   IWM   EFA   TLT   IYR   GLD
##   <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2026-06-02  760.  746.  70.8  292.  105.  85.7 100.0  412.
## 2 2026-06-03  754.  744.  69.9  288.  104.  85.3 100    408.
## 3 2026-06-04  757.  741.  69.1  292.  105.  85.5 102.   411.
## 4 2026-06-05  738.  705.  64.6  282.  102.  85.1 103.   396.
## 5 2026-06-08  739.  716.  65.8  284.  103.  84.6 101.   397.
## 6 2026-06-09   NA    NA   NA     NA    NA   NA    NA     NA

2.2 Q2 – Calculate Weekly and Monthly Returns (Simple)

# Convert to xts — must use numeric-only matrix with Date index
prices_mat <- as.matrix(prices_wide[, tickers])   # numeric columns only
prices_idx <- as.Date(prices_wide$date)            # Date index separately
prices_xts <- xts(prices_mat, order.by = prices_idx)
storage.mode(prices_xts) <- "double"              # ensure numeric storage

# Simple-return helper: last / first - 1 (works on plain matrix subset)
simple_ret <- function(x) {
  x <- coredata(x)          # strip xts attributes → plain matrix
  x[nrow(x), ] / x[1, ] - 1
}

# Weekly simple returns
weekly_returns  <- apply.weekly(prices_xts,  simple_ret)
head(weekly_returns)
##                     SPY          QQQ         EEM          IWM          EFA
## 2010-01-08  0.010941451  0.002800325  0.01147281  0.008439678  0.009166323
## 2010-01-15 -0.009500406 -0.011000690 -0.02690804 -0.009025930 -0.011607772
## 2010-01-22 -0.050842478 -0.052157100 -0.07453300 -0.048111296 -0.064919357
## 2010-01-29 -0.021681617 -0.034303954 -0.04060169 -0.027346216 -0.037593915
## 2010-02-05 -0.022006482 -0.006472321 -0.05367578 -0.025485043 -0.036676543
## 2010-02-12  0.020304043  0.025545011  0.04371441  0.039877218  0.025565047
##                     TLT          IYR         GLD
## 2010-01-08 -0.005790086  0.004147226  0.01429872
## 2010-01-15  0.025675626 -0.011034052 -0.01763401
## 2010-01-22  0.012993233 -0.060489214 -0.03900644
## 2010-01-29  0.007971000 -0.015638792 -0.01414221
## 2010-02-05  0.009106112 -0.014699172 -0.03387170
## 2010-02-12 -0.020737498  0.015500362  0.02883506
# Monthly simple returns
monthly_returns <- apply.monthly(prices_xts, simple_ret)
head(monthly_returns)
##                     SPY         QQQ          EEM         IWM         EFA
## 2010-01-29 -0.052413389 -0.07819928 -0.103722928 -0.06048803 -0.07491636
## 2010-02-26  0.015404462  0.03467381 -0.008903632  0.03255538 -0.01534435
## 2010-03-31  0.049975622  0.06169133  0.063099285  0.05771686  0.05562897
## 2010-04-30  0.008573853  0.02242555 -0.027070942  0.04705570 -0.04493609
## 2010-05-28 -0.091233875 -0.08672134 -0.098864387 -0.09568690 -0.11824793
## 2010-06-30 -0.035514548 -0.05101652  0.004468325 -0.04856775 -0.01079312
##                     TLT         IYR          GLD
## 2010-01-29  0.027835456 -0.05195359 -0.034972713
## 2010-02-26  0.005704716  0.03573040  0.009967714
## 2010-03-31 -0.020144270  0.08633730 -0.004386396
## 2010-04-30  0.035750568  0.05898818  0.046254293
## 2010-05-28  0.052459388 -0.08516451  0.027218472
## 2010-06-30  0.050594152 -0.02782203  0.014761042

2.3 Q3 – Convert Monthly Returns to Tibble

# Convert monthly xts returns to tibble with 'date' column
monthly_tbl <- tk_tbl(monthly_returns, rename_index = "date")

# Ensure date column is Date type (end-of-month)
monthly_tbl <- monthly_tbl %>%
  mutate(date = as.Date(date))

head(monthly_tbl)
## # A tibble: 6 × 9
##   date            SPY     QQQ      EEM     IWM     EFA      TLT     IYR      GLD
##   <date>        <dbl>   <dbl>    <dbl>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
## 1 2010-01-29 -0.0524  -0.0782 -0.104   -0.0605 -0.0749  0.0278  -0.0520 -0.0350 
## 2 2010-02-26  0.0154   0.0347 -0.00890  0.0326 -0.0153  0.00570  0.0357  0.00997
## 3 2010-03-31  0.0500   0.0617  0.0631   0.0577  0.0556 -0.0201   0.0863 -0.00439
## 4 2010-04-30  0.00857  0.0224 -0.0271   0.0471 -0.0449  0.0358   0.0590  0.0463 
## 5 2010-05-28 -0.0912  -0.0867 -0.0989  -0.0957 -0.118   0.0525  -0.0852  0.0272 
## 6 2010-06-30 -0.0355  -0.0510  0.00447 -0.0486 -0.0108  0.0506  -0.0278  0.0148

2.4 Q4 – Download Fama-French 3 Factors

# Install frenchdata if needed (fetches from CRAN)
if (!requireNamespace("frenchdata", quietly = TRUE)) {
  install.packages("frenchdata", quiet = TRUE)
}
library(frenchdata)

# Download FF3 monthly factors via frenchdata package
ff3_raw <- download_french_data("Fama/French 3 Factors")

# Extract the monthly data table (first element)
ff3_monthly_raw <- ff3_raw$subsets$data[[1]]

ff3_monthly <- ff3_monthly_raw %>%
  rename(date = date,
         Mkt.RF = `Mkt-RF`,
         SMB    = SMB,
         HML    = HML,
         RF     = RF) %>%
  mutate(
    # date column is already Date class from frenchdata; align to end-of-month
    date   = ceiling_date(as.Date(paste0(date, "01"), "%Y%m%d"), "month") - days(1),
    Mkt.RF = as.numeric(Mkt.RF) / 100,
    SMB    = as.numeric(SMB)    / 100,
    HML    = as.numeric(HML)    / 100,
    RF     = as.numeric(RF)     / 100
  ) %>%
  filter(!is.na(Mkt.RF))

head(ff3_monthly)
## # A tibble: 6 × 5
##   date        Mkt.RF     SMB     HML     RF
##   <date>       <dbl>   <dbl>   <dbl>  <dbl>
## 1 1926-07-31  0.0289 -0.0255 -0.0239 0.0022
## 2 1926-08-31  0.0264 -0.0114  0.0381 0.0025
## 3 1926-09-30  0.0038 -0.0136  0.0005 0.0023
## 4 1926-10-31 -0.0327 -0.0014  0.0082 0.0032
## 5 1926-11-30  0.0254 -0.0011 -0.0061 0.0031
## 6 1926-12-31  0.0262 -0.0007  0.0006 0.0028
tail(ff3_monthly)
## # A tibble: 6 × 5
##   date        Mkt.RF     SMB     HML     RF
##   <date>       <dbl>   <dbl>   <dbl>  <dbl>
## 1 2025-11-30 -0.0013  0.0054  0.0357 0.003 
## 2 2025-12-31 -0.0036 -0.0103  0.0236 0.0034
## 3 2026-01-31  0.0103  0.0212  0.0386 0.003 
## 4 2026-02-28 -0.0117  0.0024  0.0265 0.0028
## 5 2026-03-31 -0.0518  0.0044  0.0335 0.0029
## 6 2026-04-30  0.0994  0.0013 -0.0127 0.0029

2.5 Q5 – Merge Monthly Returns with FF3 Factors

# Align ETF monthly return dates to end-of-month for merging
monthly_tbl_eom <- monthly_tbl %>%
  mutate(date = ceiling_date(date, "month") - days(1))

# Merge
merged_tbl <- inner_join(monthly_tbl_eom, ff3_monthly, by = "date")

cat("Merged data rows:", nrow(merged_tbl), "\n")
## Merged data rows: 196
head(merged_tbl)
## # A tibble: 6 × 13
##   date            SPY     QQQ      EEM     IWM     EFA      TLT     IYR      GLD
##   <date>        <dbl>   <dbl>    <dbl>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
## 1 2010-01-31 -0.0524  -0.0782 -0.104   -0.0605 -0.0749  0.0278  -0.0520 -0.0350 
## 2 2010-02-28  0.0154   0.0347 -0.00890  0.0326 -0.0153  0.00570  0.0357  0.00997
## 3 2010-03-31  0.0500   0.0617  0.0631   0.0577  0.0556 -0.0201   0.0863 -0.00439
## 4 2010-04-30  0.00857  0.0224 -0.0271   0.0471 -0.0449  0.0358   0.0590  0.0463 
## 5 2010-05-31 -0.0912  -0.0867 -0.0989  -0.0957 -0.118   0.0525  -0.0852  0.0272 
## 6 2010-06-30 -0.0355  -0.0510  0.00447 -0.0486 -0.0108  0.0506  -0.0278  0.0148 
## # ℹ 4 more variables: Mkt.RF <dbl>, SMB <dbl>, HML <dbl>, RF <dbl>

2.6 Q6 – CAPM Covariance Matrix & GMV Portfolio (2015/01)

# ---- Helper functions ----

# Compute CAPM-based covariance matrix
capm_cov_matrix <- function(returns_mat, ff_data) {
  n_assets <- ncol(returns_mat)
  asset_names <- colnames(returns_mat)
  
  # Align FF data with returns
  dates_use <- as.Date(rownames(returns_mat))
  ff_sub <- ff_data %>% filter(date %in% dates_use) %>% arrange(date)
  
  # Excess returns (subtract RF)
  rf_vec <- ff_sub$RF
  mkt_rf <- ff_sub$Mkt.RF
  
  excess_returns <- sweep(returns_mat, 1, rf_vec, "-")
  
  # Estimate betas via OLS for each asset
  betas <- numeric(n_assets)
  resid_var <- numeric(n_assets)
  
  for (i in seq_len(n_assets)) {
    fit <- lm(excess_returns[, i] ~ mkt_rf)
    betas[i] <- coef(fit)[2]
    resid_var[i] <- var(resid(fit))
  }
  
  var_mkt <- var(mkt_rf)
  
  # CAPM covariance matrix: Σ = β β' σ²_m + D (diagonal residual variances)
  sigma <- outer(betas, betas) * var_mkt + diag(resid_var)
  rownames(sigma) <- colnames(sigma) <- asset_names
  sigma
}

# Compute GMV weights using quadprog
gmv_weights <- function(cov_mat) {
  n <- nrow(cov_mat)
  Dmat <- 2 * cov_mat
  dvec <- rep(0, n)
  # Constraints: sum(w) = 1, w >= 0 (long-only)
  Amat <- cbind(rep(1, n), diag(n))
  bvec <- c(1, rep(0, n))
  sol  <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
  w <- sol$solution
  names(w) <- rownames(cov_mat)
  w
}

# ---- Prepare data windows ----
# Training: 2010/02 – 2015/01 (60 months)
train_start <- as.Date("2010-01-31")
train_end   <- as.Date("2015-01-31")

# Get monthly returns matrix for training window (ETFs only)
etf_cols <- tickers

train_data <- merged_tbl %>%
  filter(date >= train_start, date <= train_end) %>%
  arrange(date)

returns_mat_train <- as.matrix(train_data[, etf_cols])
rownames(returns_mat_train) <- as.character(train_data$date)

# ---- CAPM covariance matrix ----
cov_capm <- capm_cov_matrix(returns_mat_train, ff3_monthly)
print(round(cov_capm * 10000, 4))  # in bps²
##         SPY     QQQ      EEM      IWM      EFA      TLT     IYR     GLD
## SPY 15.2562 14.2889  16.2617  17.6535  14.9034  -9.3718 12.1687  2.2586
## QQQ 14.2889 18.9481  17.2463  18.7225  15.8058  -9.9392 12.9055  2.3953
## EEM 16.2617 17.2463  34.2164  21.3073  17.9880 -11.3114 14.6873  2.7260
## IWM 17.6535 18.7225  21.3073  30.4051  19.5277 -12.2796 15.9444  2.9593
## EFA 14.9034 15.8058  17.9880  19.5277  23.8087 -10.3667 13.4605  2.4983
## TLT -9.3718 -9.9392 -11.3114 -12.2796 -10.3667  14.7310 -8.4644 -1.5710
## IYR 12.1687 12.9055  14.6873  15.9444  13.4605  -8.4644 22.2263  2.0399
## GLD  2.2586  2.3953   2.7260   2.9593   2.4983  -1.5710  2.0399 27.1632
# ---- GMV weights ----
w_capm <- gmv_weights(cov_capm)
cat("\nCAPM GMV Weights (2015/01):\n")
## 
## CAPM GMV Weights (2015/01):
print(round(w_capm, 4))
##    SPY    QQQ    EEM    IWM    EFA    TLT    IYR    GLD 
## 0.2635 0.0922 0.0041 0.0000 0.0354 0.4667 0.0589 0.0792
# ---- Realized return in 2015/02 ----
test_date <- as.Date("2015-02-28")
ret_feb2015 <- merged_tbl %>%
  filter(date == test_date) %>%
  select(all_of(etf_cols)) %>%
  as.numeric()

realized_capm <- sum(w_capm * ret_feb2015)
cat(sprintf("\nRealized GMV Portfolio Return (CAPM) in 2015/02: %.4f%%\n",
            realized_capm * 100))
## 
## Realized GMV Portfolio Return (CAPM) in 2015/02: -1.3274%

2.7 Q7 – FF3 Covariance Matrix & GMV Portfolio (2015/01)

# Compute FF3-based covariance matrix
ff3_cov_matrix <- function(returns_mat, ff_data) {
  n_assets <- ncol(returns_mat)
  asset_names <- colnames(returns_mat)
  
  dates_use <- as.Date(rownames(returns_mat))
  ff_sub <- ff_data %>% filter(date %in% dates_use) %>% arrange(date)
  
  rf_vec  <- ff_sub$RF
  mkt_rf  <- ff_sub$Mkt.RF
  smb     <- ff_sub$SMB
  hml     <- ff_sub$HML
  
  excess_returns <- sweep(returns_mat, 1, rf_vec, "-")
  
  # Factor covariance matrix (3×3)
  factors_mat <- cbind(mkt_rf, smb, hml)
  cov_factors <- cov(factors_mat)
  
  betas_mat  <- matrix(NA, nrow = n_assets, ncol = 3,
                       dimnames = list(asset_names, c("Mkt.RF","SMB","HML")))
  resid_var  <- numeric(n_assets)
  
  for (i in seq_len(n_assets)) {
    fit <- lm(excess_returns[, i] ~ mkt_rf + smb + hml)
    betas_mat[i, ] <- coef(fit)[2:4]
    resid_var[i]   <- var(resid(fit))
  }
  
  # Σ = B Σ_f B' + D
  sigma <- betas_mat %*% cov_factors %*% t(betas_mat) + diag(resid_var)
  rownames(sigma) <- colnames(sigma) <- asset_names
  sigma
}

# ---- FF3 covariance matrix ----
cov_ff3 <- ff3_cov_matrix(returns_mat_train, ff3_monthly)
print(round(cov_ff3 * 10000, 4))
##         SPY     QQQ      EEM      IWM     EFA      TLT     IYR     GLD
## SPY 15.2562 14.3486  16.2894  17.3436 15.0680  -9.2861 12.1607  2.2531
## QQQ 14.3486 18.9481  17.6841  18.5960 16.2019  -9.2667 13.0619  3.8509
## EEM 16.2894 17.6841  34.2164  21.3191 18.1860 -10.9188 14.7867  3.6315
## IWM 17.3436 18.5960  21.3191  30.4051 18.0071 -12.7352 16.1451  4.0702
## EFA 15.0680 16.2019  18.1860  18.0071 23.8087  -9.8535 13.4445  2.6835
## TLT -9.2861 -9.2667 -10.9188 -12.7352 -9.8535  14.7310 -8.3445 -0.3762
## IYR 12.1607 13.0619  14.7867  16.1451 13.4445  -8.3445 22.2263  2.4446
## GLD  2.2531  3.8509   3.6315   4.0702  2.6835  -0.3762  2.4446 27.1632
# ---- GMV weights ----
w_ff3 <- gmv_weights(cov_ff3)
cat("\nFF3 GMV Weights (2015/01):\n")
## 
## FF3 GMV Weights (2015/01):
print(round(w_ff3, 4))
##    SPY    QQQ    EEM    IWM    EFA    TLT    IYR    GLD 
## 0.3279 0.0141 0.0000 0.0363 0.0198 0.4766 0.0644 0.0609
# ---- Realized return in 2015/02 ----
realized_ff3 <- sum(w_ff3 * ret_feb2015)
cat(sprintf("\nRealized GMV Portfolio Return (FF3) in 2015/02: %.4f%%\n",
            realized_ff3 * 100))
## 
## Realized GMV Portfolio Return (FF3) in 2015/02: -1.4253%

2.8 Q8 – Backtest GMV Portfolios (2015/02 – 2026/05)

# ---- Rolling window backtest ----
# For each month t from 2015/02 to end:
#   Use months [t-60, t-1] to estimate covariance → compute GMV weights
#   Record return at month t

# Get all monthly dates in merged_tbl
all_dates <- sort(merged_tbl$date)
backtest_start <- as.Date("2015-02-28")
backtest_end   <- max(all_dates)

backtest_dates <- all_dates[all_dates >= backtest_start & all_dates <= backtest_end]

# Pre-extract returns matrix (full sample)
full_returns_mat <- as.matrix(merged_tbl[, etf_cols])
rownames(full_returns_mat) <- as.character(merged_tbl$date)

# Storage
bt_returns_capm <- numeric(length(backtest_dates))
bt_returns_ff3  <- numeric(length(backtest_dates))
names(bt_returns_capm) <- as.character(backtest_dates)
names(bt_returns_ff3)  <- as.character(backtest_dates)

for (i in seq_along(backtest_dates)) {
  t_date  <- backtest_dates[i]
  
  # Find the 60 months immediately before t_date
  prior_dates <- all_dates[all_dates < t_date]
  if (length(prior_dates) < 60) next
  window_dates <- tail(prior_dates, 60)
  
  window_char <- as.character(window_dates)
  ret_window  <- full_returns_mat[window_char, , drop = FALSE]
  
  # Skip if any NA
  if (any(is.na(ret_window))) next
  
  # CAPM covariance & weights
  tryCatch({
    cov_c <- capm_cov_matrix(ret_window, ff3_monthly)
    w_c   <- gmv_weights(cov_c)
    ret_t <- full_returns_mat[as.character(t_date), ]
    bt_returns_capm[i] <- sum(w_c * ret_t, na.rm = TRUE)
  }, error = function(e) NULL)
  
  # FF3 covariance & weights
  tryCatch({
    cov_f <- ff3_cov_matrix(ret_window, ff3_monthly)
    w_f   <- gmv_weights(cov_f)
    ret_t <- full_returns_mat[as.character(t_date), ]
    bt_returns_ff3[i] <- sum(w_f * ret_t, na.rm = TRUE)
  }, error = function(e) NULL)
}

# ---- Build cumulative return series ----
bt_tbl <- tibble(
  date      = backtest_dates,
  ret_capm  = bt_returns_capm,
  ret_ff3   = bt_returns_ff3
) %>%
  filter(ret_capm != 0 | ret_ff3 != 0) %>%
  mutate(
    cum_capm = cumprod(1 + ret_capm) - 1,
    cum_ff3  = cumprod(1 + ret_ff3)  - 1
  )

# ---- Plot cumulative returns ----
bt_tbl %>%
  select(date, cum_capm, cum_ff3) %>%
  pivot_longer(-date, names_to = "model", values_to = "cum_return") %>%
  mutate(model = recode(model,
                        cum_capm = "CAPM GMV",
                        cum_ff3  = "FF3 GMV")) %>%
  ggplot(aes(x = date, y = cum_return * 100, color = model)) +
  geom_line(linewidth = 1) +
  labs(
    title    = "Cumulative Returns: GMV Portfolios (CAPM vs. FF3)",
    subtitle = "Rolling 60-month estimation window | 2015/02 – 2026/05",
    x        = "Date",
    y        = "Cumulative Return (%)",
    color    = "Model"
  ) +
  scale_color_manual(values = c("CAPM GMV" = "#2196F3", "FF3 GMV" = "#E91E63")) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

# ---- Summary statistics ----
summary_stats <- bt_tbl %>%
  summarise(
    CAPM_Ann_Return   = mean(ret_capm) * 12 * 100,
    FF3_Ann_Return    = mean(ret_ff3)  * 12 * 100,
    CAPM_Ann_Volatility = sd(ret_capm) * sqrt(12) * 100,
    FF3_Ann_Volatility  = sd(ret_ff3)  * sqrt(12) * 100,
    CAPM_Sharpe       = (mean(ret_capm) * 12) / (sd(ret_capm) * sqrt(12)),
    FF3_Sharpe        = (mean(ret_ff3)  * 12) / (sd(ret_ff3)  * sqrt(12))
  )

knitr::kable(t(summary_stats), digits = 4,
             col.names = "Value",
             caption   = "Backtest Summary Statistics (2015/02 – 2026/05)")
Backtest Summary Statistics (2015/02 – 2026/05)
Value
CAPM_Ann_Return 5.8670
FF3_Ann_Return 5.6818
CAPM_Ann_Volatility 10.0645
FF3_Ann_Volatility 10.1498
CAPM_Sharpe 0.5829
FF3_Sharpe 0.5598

End of Final Exam