Question 1 — Single-Factor (Market) Model [25 pts]

Model: \(R_i - R_f = \alpha + \beta(R_m - R_f) + \varepsilon\), estimated over \(n = 96\) months.

Term                       Estimate   Std. Error
Intercept \(\alpha\)       0.0017     0.0020
Market premium \(\beta\)   0.98       0.17

\(R^2 = 0.50\), \(E[R_m - R_f] = 0.70\%\), critical \(|t| \approx 1.98\).


(a) t-statistic for β; test H₀: β = 0

Formula: \[t_{\hat\beta} = \frac{\hat\beta - 0}{SE(\hat\beta)}\]

beta_hat  <- 0.98
se_beta   <- 0.17
t_beta    <- beta_hat / se_beta
cat("t-statistic for beta:", round(t_beta, 4), "\n")
## t-statistic for beta: 5.765
cat("Critical |t|:", 1.98, "\n")
## Critical |t|: 1.98
cat("Reject H0: beta = 0?", abs(t_beta) > 1.98, "\n")
## Reject H0: beta = 0? TRUE

Result: \(t_{\hat\beta} = 5.7647\). Since \(|t| = 5.7647 > 1.98\), we reject \(H_0: \beta = 0\) at the 5% level.

Economic interpretation of \(\hat\beta = 0.98\): The fund’s excess return moves almost one-for-one with the market premium. A 1% rise in the market excess return is associated with a 0.98% rise in the fund’s excess return, indicating near-market-average systematic sensitivity.


(b) Test H₀: β = 1

Formula: \[t = \frac{\hat\beta - 1}{SE(\hat\beta)}\]

t_beta1 <- (beta_hat - 1) / se_beta
cat("t-statistic for H0: beta = 1:", round(t_beta1, 4), "\n")
## t-statistic for H0: beta = 1: -0.1176
cat("Reject H0: beta = 1?", abs(t_beta1) > 1.98, "\n")
## Reject H0: beta = 1? FALSE

Result: \(t = -0.1176\). Since \(|t| = 0.1176 < 1.98\), we fail to reject \(H_0: \beta = 1\).

Interpretation: The data are consistent with the fund having the same systematic risk as the market. There is no statistically significant evidence that the fund is more or less aggressive than a passive index.
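
The two tests in (a) and (b) can also be read off a single interval. As a minimal sketch, assuming the same critical value of 1.98 used above, an approximate 95% confidence interval for \(\beta\) is \(\hat\beta \pm 1.98 \times SE(\hat\beta)\). The names `t_crit` and `ci_beta` are introduced here for illustration only.

```r
# Approximate 95% confidence interval for beta, reusing the critical value 1.98
beta_hat <- 0.98
se_beta  <- 0.17
t_crit   <- 1.98

ci_beta <- beta_hat + c(-1, 1) * t_crit * se_beta
cat("95% CI for beta: [", round(ci_beta[1], 4), ",", round(ci_beta[2], 4), "]\n")

# A hypothesised value is rejected at the 5% level only if it lies outside the interval
cat("0 inside CI?", ci_beta[1] <= 0 && 0 <= ci_beta[2], "\n")
cat("1 inside CI?", ci_beta[1] <= 1 && 1 <= ci_beta[2], "\n")
```

The interval runs from roughly 0.64 to 1.32, so it excludes 0 and contains 1, which matches the two test decisions.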


(c) t-statistic for α (Jensen’s Alpha)

Formula: \[t_{\hat\alpha} = \frac{\hat\alpha - 0}{SE(\hat\alpha)}\]

alpha_hat <- 0.0017
se_alpha  <- 0.0020
t_alpha   <- alpha_hat / se_alpha
cat("t-statistic for alpha:", round(t_alpha, 4), "\n")
## t-statistic for alpha: 0.85
cat("Reject H0: alpha = 0?", abs(t_alpha) > 1.98, "\n")
## Reject H0: alpha = 0? FALSE

Result: \(t_{\hat\alpha} = 0.85\). Since \(|t| = 0.85 < 1.98\), we fail to reject \(H_0: \alpha = 0\).

Conclusion: Although \(\hat\alpha = 0.0017 > 0\), the estimate is not statistically distinguishable from zero at the 5% level. The marketing team’s claim of “positive risk-adjusted performance” is not statistically justified by these data.
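
The cutoff of 1.98 is an approximation to a Student-t critical value. A minimal sketch, assuming the residual degrees of freedom are \(n - 2 = 94\) (96 months, two estimated parameters), recovers that critical value and a two-sided p-value for \(\hat\alpha\) with base R's `qt()` and `pt()`. The names `df_resid`, `t_crit_exact` and `p_alpha` are illustrative.

```r
# Exact critical value and two-sided p-value for the alpha test
n_obs    <- 96
df_resid <- n_obs - 2            # assumption: intercept and slope are the only parameters

t_alpha      <- 0.0017 / 0.0020
t_crit_exact <- qt(0.975, df = df_resid)              # two-sided 5% critical value
p_alpha      <- 2 * pt(-abs(t_alpha), df = df_resid)  # two-sided p-value

cat("Exact critical |t|:", round(t_crit_exact, 4), "\n")
cat("Two-sided p-value for alpha:", round(p_alpha, 4), "\n")
```

A p-value well above 0.05 leads to the same conclusion as \(|t| < 1.98\).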


(d) Interpret R²

R2 <- 0.50
cat("Systematic fraction (R²):", R2, "\n")
## Systematic fraction (R²): 0.5
cat("Diversifiable fraction (1 - R²):", 1 - R2, "\n")
## Diversifiable fraction (1 - R²): 0.5

Interpretation: \(R^2 = 0.50\) means that 50% of the fund’s total return variance is explained by movements in the market factor (systematic risk). The remaining 50% is idiosyncratic (diversifiable) risk — variation unrelated to the market that could be reduced by holding a broader portfolio.
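
For a single-factor regression, \(R^2\) has a closed form that ties it to \(\hat\beta\). As a worked check, write \(\sigma_m\) and \(\sigma_i\) for the sample standard deviations of the market and fund excess returns (symbols introduced here, not given in the question):

\[R^2 = \frac{\hat\beta^2 \sigma_m^2}{\sigma_i^2} \quad\Longrightarrow\quad \frac{\hat\beta\,\sigma_m}{\sigma_i} = \sqrt{R^2} = \sqrt{0.50} \approx 0.71, \qquad \frac{\sigma_m}{\sigma_i} = \frac{\sqrt{0.50}}{0.98} \approx 0.72\]

So the systematic component \(\hat\beta\sigma_m\) is about 71% of the fund's total volatility, even though it accounts for only half of the variance.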


(e) CAPM-implied expected monthly excess return

Formula: \[E[R_i - R_f] = \hat\beta \times E[R_m - R_f]\]

mkt_premium <- 0.70   # percent
capm_E      <- beta_hat * mkt_premium
cat("CAPM-implied E[R_i - R_f]:", round(capm_E, 4), "%\n")
## CAPM-implied E[R_i - R_f]: 0.686 %

Result: The CAPM predicts an expected monthly excess return of 0.686% for this fund.
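
The CAPM figure imposes \(\alpha = 0\). For comparison, the sketch below computes the regression's own fitted mean, assuming \(\hat\alpha = 0.0017\) is expressed in decimal units (0.17% per month), which the question does not state explicitly. The names `alpha_pct` and `fitted_E` are illustrative.

```r
# CAPM-implied mean versus the unrestricted regression mean
beta_hat    <- 0.98
mkt_premium <- 0.70            # percent per month
alpha_pct   <- 0.0017 * 100    # assumption: alpha is in decimals, i.e. 0.17 percent

capm_E   <- beta_hat * mkt_premium   # CAPM restriction: alpha = 0
fitted_E <- alpha_pct + capm_E       # regression mean including the estimated alpha

cat("CAPM-implied E[R_i - R_f]:", round(capm_E, 4), "%\n")
cat("Fitted E[R_i - R_f] incl. alpha:", round(fitted_E, 4), "%\n")
```

Because \(\hat\alpha\) is not statistically different from zero in (c), the 0.17 percentage point gap between the two figures should not be read as reliable outperformance.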


Question 2 — Fama–French Three-Factor Model [25 pts]

Model: \(R_i - R_f = \alpha + b \cdot MKT + s \cdot SMB + h \cdot HML + \varepsilon\), \(n = 144\) months.

Term                    Estimate   Std. Error
Intercept \(\alpha\)    0.0029     0.0018
MKT (\(b\))             0.97       0.08
SMB (\(s\))             0.75       0.11
HML (\(h\))             -0.13      0.13

\(R^2 = 0.92\), Adjusted \(R^2 = 0.918\). Critical \(|t| \approx 1.98\).


(f) t-statistics for all four coefficients

Formula for each: \(t_k = \hat\theta_k / SE(\hat\theta_k)\)

coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses   <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML = 0.13)

t_stats    <- coefs / ses
significant <- abs(t_stats) > 1.98

results <- data.frame(
  Estimate  = coefs,
  Std.Error = ses,
  t_stat    = round(t_stats, 4),
  Significant_5pct = significant
)
print(results)
##       Estimate Std.Error t_stat Significant_5pct
## alpha   0.0029    0.0018  1.611            FALSE
## b_MKT   0.9700    0.0800 12.125             TRUE
## s_SMB   0.7500    0.1100  6.818             TRUE
## h_HML  -0.1300    0.1300 -1.000            FALSE

Summary:

  • \(\alpha\): \(t = 1.6111\), not significant
  • \(b_{MKT}\): \(t = 12.125\), significant
  • \(s_{SMB}\): \(t = 6.8182\), significant
  • \(h_{HML}\): \(t = -1\), not significant
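
The 1.98 rule can be complemented by two-sided p-values and confidence intervals. This is a minimal sketch, assuming residual degrees of freedom of \(n - 4 = 140\) (144 months, four estimated parameters); `df_resid` and `ff3_summary` are illustrative names.

```r
# p-values and 95% confidence intervals for the four FF3 coefficients
coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses   <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML = 0.13)
df_resid <- 144 - 4             # assumption: n minus four estimated parameters

t_stats <- coefs / ses
t_crit  <- qt(0.975, df = df_resid)

ff3_summary <- data.frame(
  t_stat   = round(t_stats, 4),
  p_value  = signif(2 * pt(-abs(t_stats), df = df_resid), 3),
  ci_lower = round(coefs - t_crit * ses, 4),
  ci_upper = round(coefs + t_crit * ses, 4)
)
print(ff3_summary)
```

A coefficient is significant at the 5% level exactly when its interval excludes zero, so the intervals for \(\alpha\) and \(h\) straddle zero while those for \(b\) and \(s\) do not.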

(g) Investment style classification

cat("SMB loading (s):", coefs["s_SMB"], "-> positive => SMALL-cap tilt\n")
## SMB loading (s): 0.75 -> positive => SMALL-cap tilt
cat("HML loading (h):", coefs["h_HML"], "-> negative => GROWTH tilt\n")
## HML loading (h): -0.13 -> negative => GROWTH tilt

Style classification:

  • Size tilt: \(\hat{s} = 0.75 > 0\) and statistically significant → the fund tilts toward small-cap stocks. It co-moves positively with small stocks relative to large stocks.
  • Value/Growth tilt: \(\hat{h} = -0.13 < 0\) (but not significant) → a slight lean toward growth stocks (negative HML exposure), though the evidence is weak.

Overall, the fund resembles a small-cap strategy with a weak, statistically insignificant lean toward growth.


(h) Intercept interpretation

cat("Alpha:", coefs["alpha"], "\n")
## Alpha: 0.0029
cat("t-stat for alpha:", round(t_stats["alpha"], 4), "\n")
## t-stat for alpha: 1.611
cat("Significant?", abs(t_stats["alpha"]) > 1.98, "\n")
## Significant? FALSE

Interpretation: \(\hat\alpha = 0.0029\) corresponds to a monthly risk-adjusted return of approximately 0.29% above what the three factors explain. However, \(t = 1.6111 < 1.98\), so it is not statistically significant at the 5% level. We cannot conclude the manager adds value beyond the passive factor exposures; the positive alpha may be sampling variation.


(i) R² rise from 0.75 to 0.92; why use Adjusted R²

R2_capm <- 0.75
R2_ff3  <- 0.92
n       <- 144
k_capm  <- 1    # one predictor
k_ff3   <- 3    # three predictors

adj_R2_capm <- 1 - (1 - R2_capm) * (n - 1) / (n - k_capm - 1)
adj_R2_ff3  <- 1 - (1 - R2_ff3)  * (n - 1) / (n - k_ff3  - 1)

cat("Adjusted R² (CAPM, 1 factor):", round(adj_R2_capm, 4), "\n")
## Adjusted R² (CAPM, 1 factor): 0.7482
cat("Adjusted R² (FF3, 3 factors):", round(adj_R2_ff3,  4), "\n")
## Adjusted R² (FF3, 3 factors): 0.9183
cat("Reported Adjusted R² (FF3):", 0.918, "\n")
## Reported Adjusted R² (FF3): 0.918

Interpretation of the rise:
The jump from \(R^2 = 0.75\) (CAPM) to \(R^2 = 0.92\) (FF3) shows that SMB and HML together explain an additional 17 percentage points of return variance. The fund’s returns have important size and style exposures that a single market factor misses.
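
Whether the 17-point gain is statistically meaningful can be checked with a partial F-test on the two added factors. The sketch assumes both \(R^2\) values come from the same fund over the same 144 months, so that the CAPM is nested in the three-factor model; `q`, `F_stat` and `F_crit` are illustrative names.

```r
# Partial F-test for H0: s = 0 and h = 0, computed from the two R-squared values
R2_capm <- 0.75
R2_ff3  <- 0.92
n       <- 144
k_ff3   <- 3      # predictors in the unrestricted (FF3) model
q       <- 2      # restrictions tested: SMB and HML loadings both zero

F_stat <- ((R2_ff3 - R2_capm) / q) / ((1 - R2_ff3) / (n - k_ff3 - 1))
F_crit <- qf(0.95, df1 = q, df2 = n - k_ff3 - 1)

cat("Partial F-statistic:", round(F_stat, 2), "\n")
cat("5% critical value:  ", round(F_crit, 2), "\n")
```

The statistic is about 148.75, far above the 5% critical value of roughly 3.1, so SMB and HML are jointly significant even though HML alone is not.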

Why Adjusted R²:
Adding any predictor, even a useless one, can never lower \(R^2\) and in practice almost always raises it. The adjusted metric penalises each extra degree of freedom: \[\bar{R}^2 = 1 - \frac{(1-R^2)(n-1)}{n - k - 1}\] It rises only when the added predictor's \(|t|\)-statistic exceeds 1, so a variable that contributes nothing beyond noise tends to lower it. With different numbers of predictors across models, adjusted \(R^2\) provides a fairer like-for-like comparison (here \(\bar{R}^2 = 0.918\) for FF3 vs \(\approx 0.748\) for CAPM).
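
The behaviour of the two measures can be illustrated on simulated data. This is a sketch with made-up data, not the fund's returns: `y` depends on `x` only, and `noise` is an irrelevant regressor by construction.

```r
# Adding an irrelevant predictor: R-squared cannot fall, adjusted R-squared can
set.seed(1)
n     <- 144
x     <- rnorm(n)
noise <- rnorm(n)                 # unrelated to y by construction
y     <- 1 + 0.9 * x + rnorm(n)

fit_small <- lm(y ~ x)
fit_big   <- lm(y ~ x + noise)

r2      <- c(small = summary(fit_small)$r.squared,     big = summary(fit_big)$r.squared)
adj_r2  <- c(small = summary(fit_small)$adj.r.squared, big = summary(fit_big)$adj.r.squared)
t_noise <- summary(fit_big)$coefficients["noise", "t value"]

print(round(rbind(R2 = r2, adj_R2 = adj_r2), 4))
cat("t-statistic on the noise regressor:", round(t_noise, 4), "\n")
```

Plain \(R^2\) is at least as large in the bigger model regardless of the seed, whereas adjusted \(R^2\) improves only if the noise regressor happens to earn a \(|t|\) above 1.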


Question 3 — Logistic Regression for Market Direction [25 pts]

Model: \[\text{logit}\,P(\text{Up}) = \beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta VIX_{t-1}\]

Coefficients: \(\beta_0 = -0.02\), \(\beta_1 = 5.4\), \(\beta_2 = -0.38\).
Today’s inputs: \(r_{t-1} = 0.010\), \(\Delta VIX = 1.5\).


(j) Predicted probability and class

Formula: \[\hat{P}(\text{Up}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta VIX)}}\]

b0 <- -0.02; b1 <- 5.4; b2 <- -0.38
r_lag  <- 0.010
dVIX   <- 1.5

log_odds <- b0 + b1 * r_lag + b2 * dVIX
prob_up  <- 1 / (1 + exp(-log_odds))

cat("Log-odds (linear combination):", round(log_odds, 4), "\n")
## Log-odds (linear combination): -0.536
cat("Predicted P(Up):", round(prob_up, 4), "\n")
## Predicted P(Up): 0.3691
cat("Predicted class (threshold 0.5):", ifelse(prob_up >= 0.5, "Up", "Down"), "\n")
## Predicted class (threshold 0.5): Down

Result: \(\hat{P}(\text{Up}) = 0.3691\). At the 0.5 threshold the model predicts Down.
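
The arithmetic behind the code can be checked by hand, substituting the given coefficients and inputs into the model:

\[\begin{aligned}
\text{logit}\,\hat{P}(\text{Up}) &= -0.02 + 5.4(0.010) - 0.38(1.5) \\
&= -0.02 + 0.054 - 0.57 = -0.536, \\
\hat{P}(\text{Up}) &= \frac{1}{1 + e^{0.536}} \approx \frac{1}{1 + 1.7092} \approx 0.3691.
\end{aligned}\]

The \(\Delta VIX\) term (\(-0.57\)) dominates the lagged-return term (\(+0.054\)), which is why the predicted probability falls below 0.5.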


(k) Economic interpretation of β₁ and β₂

\(\beta_1 = 5.4 > 0\) (lagged return):
A positive lagged return increases the log-odds of an “Up” day. This captures short-term momentum: if yesterday’s market was up, today’s market is more likely to be up. It is consistent with momentum or trend-following behaviour at the daily frequency.

\(\beta_2 = -0.38 < 0\) (ΔVIX):
An increase in the VIX (rising fear/volatility) on the previous day decreases the log-odds of an “Up” day. This captures risk-off sentiment: when implied volatility spikes, the model treats a positive return on the following day as less likely. The sign matches the well-documented negative contemporaneous correlation between VIX changes and equity returns, although the predictor here is lagged, so the coefficient says that the risk-off effect carries over into the next day.
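
Log-odds coefficients are easier to read as odds ratios. The sketch below assumes \(r_{t-1}\) is measured in decimals (so 0.01 is one percentage point) and \(\Delta VIX\) in index points, as the inputs in the question suggest; `or_ret` and `or_vix` are illustrative names.

```r
# Odds ratios implied by the logistic coefficients
b1 <- 5.4
b2 <- -0.38

or_ret <- exp(b1 * 0.01)   # lagged return higher by 0.01 (one percentage point)
or_vix <- exp(b2 * 1)      # VIX change higher by one index point

cat("Odds multiplier per +0.01 lagged return:", round(or_ret, 4), "\n")
cat("Odds multiplier per +1 VIX point:       ", round(or_vix, 4), "\n")
```

Under these unit assumptions, a one percentage point higher lagged return raises the odds of an “Up” day by about 5.5%, while a one-point rise in the VIX cuts them by about 32%.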


(l) Confusion matrix metrics

TP <- 67; FP <- 44; FN <- 33; TN <- 56
N  <- TP + FP + FN + TN

accuracy    <- (TP + TN) / N
sensitivity <- TP / (TP + FN)          # True Positive Rate for "Up"
specificity <- TN / (TN + FP)          # True Negative Rate
precision   <- TP / (TP + FP)          # Positive Predictive Value

cat("Accuracy:    ", round(accuracy,    4), "\n")
## Accuracy:     0.615
cat("Sensitivity: ", round(sensitivity, 4), "\n")
## Sensitivity:  0.67
cat("Specificity: ", round(specificity, 4), "\n")
## Specificity:  0.56
cat("Precision:   ", round(precision,   4), "\n")
## Precision:    0.6036

Metric        Formula          Value
Accuracy      \((TP+TN)/N\)    0.615
Sensitivity   \(TP/(TP+FN)\)   0.67
Specificity   \(TN/(TN+FP)\)   0.56
Precision     \(TP/(TP+FP)\)   0.6036
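
Two further summaries follow directly from the same four counts: the F1 score (harmonic mean of precision and sensitivity) and balanced accuracy (mean of sensitivity and specificity). They are not asked for in the question and are shown only as a sketch; `f1` and `balanced_accuracy` are illustrative names.

```r
# F1 score and balanced accuracy from the confusion-matrix counts
TP <- 67; FP <- 44; FN <- 33; TN <- 56

precision   <- TP / (TP + FP)
recall      <- TP / (TP + FN)    # same quantity as sensitivity
specificity <- TN / (TN + FP)

f1                <- 2 * precision * recall / (precision + recall)
balanced_accuracy <- (recall + specificity) / 2

cat("F1 score:          ", round(f1, 4), "\n")
cat("Balanced accuracy: ", round(balanced_accuracy, 4), "\n")
```

Because the two classes are equally frequent here, balanced accuracy coincides with plain accuracy (0.615); the two diverge once the classes are imbalanced.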

(m) Naive majority-class benchmark

# Actual class counts from the confusion matrix: 100 Up and 100 Down (perfectly balanced)
n_up   <- TP + FN
n_down <- FP + TN
naive_accuracy <- max(n_up, n_down) / N   # always predict the majority class
cat("Naive classifier accuracy:", round(naive_accuracy, 4), "\n")
## Naive classifier accuracy: 0.5
cat("Logistic model accuracy:  ", round(accuracy, 4), "\n")
## Logistic model accuracy:   0.615
cat("Model beats naive?", accuracy > naive_accuracy, "\n")
## Model beats naive? TRUE

Result: With 100 “Up” and 100 “Down” days the majority class is a tie at 50%. The logistic model achieves 61.5% accuracy, which beats the naive baseline of 50%.
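
Whether 61.5% is distinguishable from coin-flipping can be checked with an exact binomial test. The sketch assumes the 200 predictions are independent, which is a simplification for daily market data; `n_correct` and `bt` are illustrative names.

```r
# One-sided exact binomial test of H0: accuracy = 0.5 against accuracy > 0.5
TP <- 67; TN <- 56
N  <- 200
n_correct <- TP + TN     # 123 correct predictions

bt <- binom.test(n_correct, N, p = 0.5, alternative = "greater")
cat("Observed accuracy:", n_correct / N, "\n")
cat("One-sided p-value:", signif(bt$p.value, 3), "\n")
```

A small p-value indicates the edge over the 50% baseline is unlikely to be luck alone, but it says nothing about whether the edge survives transaction costs.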

Why accuracy alone is inadequate for a trading system:

  1. Class imbalance: In live markets “Up” and “Down” days are rarely 50/50. A naive classifier predicting the dominant class achieves high accuracy trivially without any predictive content.
  2. Asymmetric payoffs: False positives (predicting “Up” but actually “Down”) cost money via losing long trades; false negatives miss profitable opportunities. These errors have very different economic consequences that accuracy weights equally.
  3. More economically relevant criterion: The Sharpe ratio of the strategy’s returns (or precision-recall trade-off / F1 score) is more useful because it directly quantifies the risk-adjusted profitability of acting on the model’s predictions. A model that is modestly accurate but correctly identifies high-return days outperforms a highly accurate model that merely predicts frequent but low-return “Up” days.

Question 4 — Resampling and Regularization [25 pts]

Setup: \(n = 48\) monthly returns, \(\bar{r} = 0.70\%\), \(\hat\sigma = 5.50\%\).


(n) Monthly and annualized Sharpe ratio

Monthly Sharpe ratio (excess return already given as mean monthly return over \(R_f\)): \[SR_{monthly} = \frac{\bar{r}}{\hat\sigma}\]

Annualized Sharpe ratio (scaling by \(\sqrt{12}\), since there are 12 months per year): \[SR_{annual} = SR_{monthly} \times \sqrt{12}\]

r_bar <- 0.70   # percent
sigma <- 5.50   # percent

SR_monthly  <- r_bar / sigma
SR_annual   <- SR_monthly * sqrt(12)

cat("Monthly Sharpe ratio:    ", round(SR_monthly, 4), "\n")
## Monthly Sharpe ratio:     0.1273
cat("Scaling factor:          ", "sqrt(12) =", round(sqrt(12), 4), "\n")
## Scaling factor:           sqrt(12) = 3.464
cat("Annualized Sharpe ratio: ", round(SR_annual, 4), "\n")
## Annualized Sharpe ratio:  0.4409

Results: \(SR_{monthly} = 0.1273\); \(SR_{annual} = 0.4409\).

The scaling factor is \(\sqrt{12}\) because if monthly returns are i.i.d., the annualised mean scales by 12 and the annualised standard deviation scales by \(\sqrt{12}\), so their ratio scales by \(12/\sqrt{12} = \sqrt{12}\).


(o) Bootstrap standard error for the Sharpe ratio

Step-by-step bootstrap procedure:

set.seed(42)

# Simulate 48 monthly returns for illustration
r_sim <- rnorm(48, mean = 0.007, sd = 0.055)

B <- 10000
SR_boot <- numeric(B)

for (b in seq_len(B)) {
  # Step 1: Draw n = 48 observations WITH replacement
  r_star <- sample(r_sim, size = length(r_sim), replace = TRUE)
  # Step 2: Compute the Sharpe ratio on the resample
  SR_boot[b] <- mean(r_star) / sd(r_star)
}

# Step 3: SE = standard deviation of bootstrap distribution
SE_SR <- sd(SR_boot)
cat("Bootstrap SE of monthly Sharpe (simulated data):", round(SE_SR, 4), "\n")
## Bootstrap SE of monthly Sharpe (simulated data): 0.149
cat("95% CI: [", round(quantile(SR_boot, 0.025), 4),
    ",", round(quantile(SR_boot, 0.975), 4), "]\n")
## 95% CI: [ -0.2097 , 0.3765 ]

Steps described:

  1. Resample: Draw \(B = 10{,}000\) bootstrap samples of size \(n = 48\) with replacement from the observed monthly returns.
  2. Compute: For each resample \(b\), compute \(SR^{*(b)} = \bar r^{*(b)} / \hat\sigma^{*(b)}\).
  3. Aggregate: The bootstrap SE is \(\widehat{SE}_{SR} = \text{sd}(\{SR^{*(b)}\}_{b=1}^B)\). A 95% CI is the 2.5th–97.5th percentiles.

Why i.i.d. bootstrap is inappropriate:
Monthly financial returns exhibit serial dependence: autocorrelation in volatility (GARCH effects), momentum, and mean-reversion. The i.i.d. bootstrap destroys this temporal structure because it resamples individual observations independently of their position in time, which can understate or overstate the true sampling variability of the Sharpe ratio.

Fix — block bootstrap (e.g., stationary or circular block bootstrap):
Draw contiguous blocks of \(\ell\) consecutive months (e.g., \(\ell = 3\)–6) with replacement, preserving local autocorrelation within each block. The stationary block bootstrap (Politis & Romano 1994) uses random block lengths drawn from a geometric distribution with mean \(\ell\), which keeps the resampled series stationary while retaining short-range serial structure.
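
A circular block bootstrap can be sketched in base R as follows. The returns are simulated for illustration, the block length of 4 months is an arbitrary choice within the range mentioned above, and `circular_block_resample` is a hypothetical helper written for this example, not a library function.

```r
# Circular block bootstrap for the monthly Sharpe ratio (illustrative sketch)
set.seed(42)
r_sim <- rnorm(48, mean = 0.007, sd = 0.055)   # simulated monthly returns

circular_block_resample <- function(x, block_len) {
  n        <- length(x)
  n_blocks <- ceiling(n / block_len)
  starts   <- sample.int(n, n_blocks, replace = TRUE)
  # Each block covers block_len consecutive months, wrapping past the end of the series
  idx <- unlist(lapply(starts, function(s) ((s - 1 + 0:(block_len - 1)) %% n) + 1))
  x[idx[seq_len(n)]]                           # trim to the original sample size
}

B         <- 2000
block_len <- 4                                 # assumption: chosen for illustration
SR_block  <- replicate(B, {
  r_star <- circular_block_resample(r_sim, block_len)
  mean(r_star) / sd(r_star)
})

SE_SR_block <- sd(SR_block)
cat("Block-bootstrap SE of monthly Sharpe:", round(SE_SR_block, 4), "\n")
```

Since the simulated returns are i.i.d. by construction, the block and i.i.d. standard errors should be similar here; the two are expected to differ on real returns with serial dependence.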


(p) Choosing λ for LASSO deployment

lambda_min  <- 0.030; factors_min <- 14
lambda_1se  <- 0.065; factors_1se <- 7

cat("lambda_min: factors retained =", factors_min, "\n")
## lambda_min: factors retained = 14
cat("lambda_1se: factors retained =", factors_1se, "\n")
## lambda_1se: factors retained = 7

Recommendation: deploy \(\lambda_{1SE} = 0.065\) (7-factor model).

Rationale:

  1. Parsimony and out-of-sample stability: \(\lambda_{min}\) minimises the cross-validated error but retains 14 factors, twice as many. With 60 candidate factors to choose from, a 14-predictor model is at greater risk of fitting noise. The 1-SE rule selects the most regularised model whose CV error is within one standard error of the minimum, deliberately giving up a statistically negligible amount of estimated fit for substantially lower variance.

  2. Overfitting risk in finance: Financial factor data are collinear and noisy. Extra factors typically capture spurious patterns in the training window that do not persist out-of-sample (a well-documented problem in factor zoo research).

  3. Transaction costs and implementation: Fewer factors mean fewer positions, lower rebalancing costs, and a strategy that is easier to manage and explain.

The 1-SE solution at \(\lambda = 0.065\) retaining 7 factors is the more robust and deployable choice.
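
The 1-SE rule itself takes only a few lines. The cross-validation curve below is hypothetical, invented so that the two selected values match the question; only the selection logic is the point. The names `cvm` and `cvsd` (mean CV error and its standard error at each \(\lambda\)) are illustrative.

```r
# 1-SE rule applied to a hypothetical cross-validation curve
lambda <- c(0.010, 0.020, 0.030, 0.045, 0.065, 0.090, 0.120)
cvm    <- c(1.10,  1.05,  1.00,  1.01,  1.03,  1.09,  1.20)   # hypothetical mean CV error
cvsd   <- c(0.05,  0.04,  0.04,  0.04,  0.04,  0.05,  0.06)   # hypothetical standard errors

i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# Largest lambda whose CV error is within one SE of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

cat("lambda_min:", lambda_min, "\n")
cat("lambda_1se:", lambda_1se, "\n")
```

Taking the largest qualifying \(\lambda\) is what makes the rule favour the sparser model: a larger penalty shrinks more coefficients to zero.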


(q) Walk-forward (time-respecting) cross-validation

Walk-forward scheme:

##   Fold  Train Validate
## 1    1 t1–t24  t25–t30
## 2    2 t1–t30  t31–t36
## 3    3 t1–t36  t37–t42
## 4    4 t1–t42  t43–t48
## 5    5 t1–t48  t49–t54

Step-by-step procedure:

  1. Initial training window: Fix a minimum history (e.g., 24 months). Train the LASSO on months 1–24, tuning \(\lambda\) on a short internal validation block.
  2. One-step-ahead (or multi-step) forecast: Apply the fitted model to the next out-of-sample period (e.g., months 25–30) and record performance.
  3. Expand the window: Add the validation period to the training set; re-estimate the model.
  4. Repeat: Move the validation block forward, one block at a time, until the entire sample is exhausted, collecting out-of-sample returns at each fold.
  5. Aggregate: Compute overall out-of-sample Sharpe ratio, hit rate, and other metrics across all folds.
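
The fold boundaries for an expanding-window scheme can be generated programmatically. This is a sketch: `walk_forward_folds` is a hypothetical helper, and the defaults (24-month initial window, 6-month validation blocks, 54 months in total) are chosen to reproduce the fold table above.

```r
# Expanding-window walk-forward folds (illustrative helper)
walk_forward_folds <- function(n_obs, initial_train = 24, horizon = 6) {
  stopifnot(n_obs >= initial_train + horizon)
  starts <- seq(from = initial_train + 1, to = n_obs - horizon + 1, by = horizon)
  lapply(starts, function(s) {
    list(train = 1:(s - 1), validate = s:(s + horizon - 1))
  })
}

folds <- walk_forward_folds(n_obs = 54)
for (i in seq_along(folds)) {
  f <- folds[[i]]
  cat(sprintf("Fold %d: train t1-t%d, validate t%d-t%d\n",
              i, max(f$train), min(f$validate), max(f$validate)))
}
```

Every training index precedes every validation index within a fold, which is the property that rules out look-ahead bias.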

Why standard random k-fold cross-validation is unsafe:

  • Look-ahead bias: Random splits assign future observations to training folds, meaning the model is inadvertently trained on information it would not have had in real time. This produces optimistically biased performance estimates.
  • Temporal dependence: Shuffling breaks autocorrelation and volatility-clustering structure, leading to unreliable estimates of generalisation error.
  • Practical deployment mismatch: A real strategy is always trained on the past and applied to the future. Walk-forward mimics this exactly, giving an honest simulation of live trading performance.

End of examination.