Question 1 – Single-Factor (Market) Model

(a) t-statistic for β and test H₀: β = 0

beta_hat <- 0.98
se_beta  <- 0.17
t_crit   <- 1.98

# Formula: t = beta_hat / SE(beta)
t_beta <- beta_hat / se_beta

cat("Formula: t = beta_hat / SE(beta)\n")
## Formula: t = beta_hat / SE(beta)
cat("       t =", beta_hat, "/", se_beta, "\n")
##        t = 0.98 / 0.17
cat("       t =", round(t_beta, 4), "\n\n")
##        t = 5.7647
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Conclusion: Reject H0: beta = 0?",
    ifelse(abs(t_beta) > t_crit, "YES – beta is significant at 5%.",
           "NO – fail to reject."), "\n")
## Conclusion: Reject H0: beta = 0? YES – beta is significant at 5%.

Economic interpretation: β ≈ 0.98 means that for every 1 percentage point rise in the market excess return, the fund’s excess return tends to rise by about 0.98 percentage points. The fund bears nearly the same systematic risk as the broad market. Since |t| = 5.7647 > 1.98 we reject H₀: β = 0; the market factor is a statistically significant driver of fund returns.


(b) Test H₀: β = 1

# Formula: t = (beta_hat - 1) / SE(beta)
t_beta1 <- (beta_hat - 1) / se_beta

cat("Formula: t = (beta_hat - 1) / SE(beta)\n")
## Formula: t = (beta_hat - 1) / SE(beta)
cat("       t = (", beta_hat, "- 1) /", se_beta, "\n")
##        t = ( 0.98 - 1) / 0.17
cat("       t =", round(t_beta1, 4), "\n\n")
##        t = -0.1176
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Conclusion: Reject H0: beta = 1?",
    ifelse(abs(t_beta1) > t_crit, "YES – beta differs from 1 at 5%.",
           "NO – fail to reject. Beta is not statistically different from 1."),
    "\n")
## Conclusion: Reject H0: beta = 1? NO – fail to reject. Beta is not statistically different from 1.

Conclusion: |t| = 0.1176 < 1.98. We fail to reject H₀: β = 1. The fund’s systematic risk is statistically indistinguishable from the market’s — there is no evidence that the fund is more or less aggressive than a passive index.
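
Both tests in (a) and (b) can also be read off a single 95% confidence interval. The following is a minimal sketch using the same estimate, standard error and critical value as above; the names ci_beta, contains_zero and contains_one are introduced here for illustration only.

```r
# Inputs repeated from parts (a) and (b) so the block runs on its own
beta_hat <- 0.98
se_beta  <- 0.17
t_crit   <- 1.98

# 95% confidence interval: beta_hat +/- t_crit * SE(beta)
ci_beta <- beta_hat + c(-1, 1) * t_crit * se_beta

# A hypothesised value is rejected at 5% exactly when it lies outside the interval
contains_zero <- ci_beta[1] <= 0 && 0 <= ci_beta[2]
contains_one  <- ci_beta[1] <= 1 && 1 <= ci_beta[2]

cat("95% CI for beta: [", round(ci_beta[1], 4), ",", round(ci_beta[2], 4), "]\n")
cat("Contains 0?", contains_zero, " Contains 1?", contains_one, "\n")
```

The interval runs from roughly 0.64 to 1.32: it excludes 0 (reject H₀: β = 0) and contains 1 (fail to reject H₀: β = 1), matching the two t-tests.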


(c) t-statistic for α (Jensen’s Alpha)

alpha_hat <- 0.0017
se_alpha  <- 0.0020

# Formula: t = alpha_hat / SE(alpha)
t_alpha <- alpha_hat / se_alpha

cat("Formula: t = alpha_hat / SE(alpha)\n")
## Formula: t = alpha_hat / SE(alpha)
cat("       t =", alpha_hat, "/", se_alpha, "\n")
##        t = 0.0017 / 0.002
cat("       t =", round(t_alpha, 4), "\n\n")
##        t = 0.85
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Conclusion: Reject H0: alpha = 0?",
    ifelse(abs(t_alpha) > t_crit,
           "YES – alpha is significant at 5%.",
           "NO – fail to reject. Alpha is NOT statistically significant."),
    "\n")
## Conclusion: Reject H0: alpha = 0? NO – fail to reject. Alpha is NOT statistically significant.

Conclusion: |t| = 0.85 < 1.98. Although α = 0.0017 is positive, it is not statistically significant. The marketing claim of “positive risk-adjusted performance” is not justified by the data.


(d) Interpretation of R²

R2 <- 0.50

cat("R² =", R2, "\n\n")
## R² = 0.5
cat("Systematic variation    = R²       =", R2 * 100, "%\n")
## Systematic variation    = R²       = 50 %
cat("Diversifiable variation = 1 - R²   =", (1 - R2) * 100, "%\n")
## Diversifiable variation = 1 - R²   = 50 %

Interpretation: R² = 0.50 means 50% of the fund’s return variance is explained by market movements (systematic risk). The remaining 50% is idiosyncratic (diversifiable) risk that can in principle be eliminated through diversification.
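
The 50/50 split follows from the variance decomposition of the single-factor model, assuming the residual is uncorrelated with the market excess return. The symbols \(\sigma_i^2\), \(\sigma_m^2\) and \(\sigma_\varepsilon^2\) (variances of the fund excess return, the market excess return and the residual) are notation introduced here.

\[
\sigma_i^2 = \beta^2 \sigma_m^2 + \sigma_\varepsilon^2
\quad\Longrightarrow\quad
R^2 = \frac{\beta^2 \sigma_m^2}{\sigma_i^2} = 0.50,
\qquad
1 - R^2 = \frac{\sigma_\varepsilon^2}{\sigma_i^2} = 0.50 .
\]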


(e) CAPM-implied expected monthly excess return

E_rm_rf <- 0.0070   # 0.70% as a decimal

# Formula: E[R_i - R_f] = beta * E[R_m - R_f]
E_ri_rf <- beta_hat * E_rm_rf

cat("Formula: E[R_i - R_f] = beta * E[R_m - R_f]\n")
## Formula: E[R_i - R_f] = beta * E[R_m - R_f]
cat("       =", beta_hat, "*", E_rm_rf, "\n")
##        = 0.98 * 0.007
cat("       =", round(E_ri_rf, 4), "(i.e.", round(E_ri_rf * 100, 4), "% per month)\n")
##        = 0.0069 (i.e. 0.686 % per month)

Question 2 – Fama–French Three-Factor Model

(f) t-statistics for all four coefficients

coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses   <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML =  0.13)

# Formula: t = coefficient / SE(coefficient)
t_stats <- coefs / ses

cat("Formula: t = coefficient / SE(coefficient)\n\n")
## Formula: t = coefficient / SE(coefficient)
results_ff <- data.frame(
  Estimate    = coefs,
  Std.Error   = ses,
  t_statistic = round(t_stats, 4),
  Significant = ifelse(abs(t_stats) > t_crit, "Yes (|t|>1.98)", "No")
)
print(results_ff)
##       Estimate Std.Error t_statistic    Significant
## alpha   0.0029    0.0018      1.6111             No
## b_MKT   0.9700    0.0800     12.1250 Yes (|t|>1.98)
## s_SMB   0.7500    0.1100      6.8182 Yes (|t|>1.98)
## h_HML  -0.1300    0.1300     -1.0000             No
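
The critical value 1.98 is a rounded table value. As a hedged cross-check, the sketch below computes the exact critical value and two-sided p-values, assuming the regression in (f) uses the n = 144 monthly observations quoted in part (i), which gives 140 residual degrees of freedom. The names df_resid, t_crit_exact and p_values are introduced here.

```r
# Coefficients and standard errors repeated from part (f)
coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses   <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML =  0.13)
t_stats <- coefs / ses

# ASSUMPTION: n = 144 observations (value used in part (i)) and k = 3 regressors,
# so the residual degrees of freedom are n - k - 1 = 140
df_resid <- 144 - 3 - 1

# Exact two-sided 5% critical value and two-sided p-values from the t distribution
t_crit_exact <- qt(0.975, df = df_resid)
p_values     <- 2 * pt(-abs(t_stats), df = df_resid)

cat("Exact critical value (df =", df_resid, "):", round(t_crit_exact, 4), "\n")
print(round(p_values, 4))
```

Under that assumption the exact critical value is very close to 1.98, and the p-values lead to the same significance decisions as the table above.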

(g) Investment style classification

cat("SMB loading s =", coefs["s_SMB"],
    "-> positive and significant: strong SMALL-CAP tilt\n")
## SMB loading s = 0.75 -> positive and significant: strong SMALL-CAP tilt
cat("HML loading h =", coefs["h_HML"],
    "-> negative and not significant: mild GROWTH tilt\n")
## HML loading h = -0.13 -> negative and not significant: mild GROWTH tilt

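Style: The fund tilts strongly toward small-cap stocks (large positive, significant SMB loading). The negative HML point estimate hints at a mild growth tilt, but it is not significant at 5% (|t| = 1.00 < 1.98), so the value/growth exposure cannot be distinguished from neutral. The fund is best described as small-cap with a blend-to-growth style.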


(h) Interpretation of the intercept

# Formula: t = alpha / SE(alpha)
t_alpha_ff <- coefs["alpha"] / ses["alpha"]

cat("Formula: t = alpha / SE(alpha)\n")
## Formula: t = alpha / SE(alpha)
cat("       t =", coefs["alpha"], "/", ses["alpha"], "\n")
##        t = 0.0029 / 0.0018
cat("       t =", round(t_alpha_ff, 4), "\n\n")
##        t = 1.6111
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Significant at 5%?",
    ifelse(abs(t_alpha_ff) > t_crit,
           "YES – manager adds value beyond factor exposures.",
           "NO – insufficient evidence of value added."), "\n")
## Significant at 5%? NO – insufficient evidence of value added.

Interpretation: α = 0.0029 (~+0.29%/month). The t-statistic of 1.6111 is below the critical value of 1.98. We fail to reject H₀: α = 0. There is insufficient statistical evidence that the manager generates value beyond the three factor exposures.


(i) Rise in R² from 0.75 to 0.92 and the role of adjusted R²

R2_capm <- 0.75
R2_ff   <- 0.92
n       <- 144
k_capm  <- 1
k_ff    <- 3

# Formula: Adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
adj_R2_capm <- 1 - (1 - R2_capm) * (n - 1) / (n - k_capm - 1)
adj_R2_ff   <- 1 - (1 - R2_ff)   * (n - 1) / (n - k_ff   - 1)

cat("Formula: Adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)\n\n")
## Formula: Adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
cat("CAPM  (k=1): Adj R² = 1 - (1 -", R2_capm, ") * (", n, "- 1) / (",
    n, "- 1 - 1) =", round(adj_R2_capm, 4), "\n")
## CAPM  (k=1): Adj R² = 1 - (1 - 0.75 ) * ( 144 - 1) / ( 144 - 1 - 1) = 0.7482
cat("FF3F  (k=3): Adj R² = 1 - (1 -", R2_ff,   ") * (", n, "- 1) / (",
    n, "- 3 - 1) =", round(adj_R2_ff,   4), "\n")
## FF3F  (k=3): Adj R² = 1 - (1 - 0.92 ) * ( 144 - 1) / ( 144 - 3 - 1) = 0.9183

Interpretation: Adding SMB and HML raises R² by 17 percentage points, indicating that size and value exposures capture substantial variation left unexplained by the single market factor. Because R² never decreases when predictors are added, it cannot be used to compare models of different sizes. The adjusted R² penalises each additional predictor; its rise from 0.7482 to 0.9183 confirms that SMB and HML provide genuine explanatory power and are not merely inflating the fit mechanically.
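
A formal complement to the adjusted R² comparison is the F-test for the joint significance of SMB and HML, computed from the two R² values. This is a sketch that assumes both regressions use the same dependent variable and the same 144 observations, so that the models are nested; F_stat, F_crit and q are names introduced here.

```r
# Inputs repeated from part (i)
R2_capm <- 0.75
R2_ff   <- 0.92
n       <- 144
k_ff    <- 3
q       <- 2   # number of added regressors (SMB and HML)

# F = [(R2_ff - R2_capm) / q] / [(1 - R2_ff) / (n - k_ff - 1)]
F_stat <- ((R2_ff - R2_capm) / q) / ((1 - R2_ff) / (n - k_ff - 1))

# 5% critical value of the F distribution with (q, n - k_ff - 1) degrees of freedom
F_crit <- qf(0.95, df1 = q, df2 = n - k_ff - 1)

cat("F statistic:", round(F_stat, 2), " 5% critical value:", round(F_crit, 2), "\n")
cat("SMB and HML jointly significant?", F_stat > F_crit, "\n")
```

The statistic works out to 148.75, far above the critical value, which agrees with the conclusion drawn from the adjusted R².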


Question 3 – Logistic Regression for Market Direction

(j) Predicted probability and class

b0    <- -0.02
b1    <-  5.4
b2    <- -0.38
r_lag <-  0.010
dVIX  <-  1.5

# Formula: logit P(Up) = b0 + b1*r_lag + b2*dVIX
logit_p <- b0 + b1 * r_lag + b2 * dVIX

# Formula: P(Up) = 1 / (1 + exp(-logit_p))
prob_up <- 1 / (1 + exp(-logit_p))

cat("Formula: logit P(Up) = b0 + b1 * r(t-1) + b2 * dVIX\n")
## Formula: logit P(Up) = b0 + b1 * r(t-1) + b2 * dVIX
cat("       =", b0, "+", b1, "*", r_lag, "+", b2, "*", dVIX, "\n")
##        = -0.02 + 5.4 * 0.01 + -0.38 * 1.5
cat("       =", round(logit_p, 4), "\n\n")
##        = -0.536
cat("Formula: P(Up) = 1 / (1 + exp(-logit_p))\n")
## Formula: P(Up) = 1 / (1 + exp(-logit_p))
cat("       = 1 / (1 + exp(-(",  round(logit_p, 4), ")))\n")
##        = 1 / (1 + exp(-( -0.536 )))
cat("       =", round(prob_up, 4), "\n\n")
##        = 0.3691
cat("Predicted class (threshold 0.5):",
    ifelse(prob_up >= 0.5, "UP", "DOWN"), "\n")
## Predicted class (threshold 0.5): DOWN

(k) Economic interpretation of β₁ and β₂

cat("b1 =", b1,
    "-> POSITIVE: a positive lagged return raises P(Up) tomorrow.\n")
## b1 = 5.4 -> POSITIVE: a positive lagged return raises P(Up) tomorrow.
cat("   Captures short-term MOMENTUM: markets tend to continue direction.\n\n")
##    Captures short-term MOMENTUM: markets tend to continue direction.
cat("b2 =", b2,
    "-> NEGATIVE: a rise in the VIX lowers P(Up) tomorrow.\n")
## b2 = -0.38 -> NEGATIVE: a rise in the VIX lowers P(Up) tomorrow.
cat("   Captures FEAR / UNCERTAINTY: higher implied volatility signals",
    "risk-off and a more likely down day.\n")
##    Captures FEAR / UNCERTAINTY: higher implied volatility signals risk-off and a more likely down day.
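
The size of each effect is easier to read as an odds ratio. The sketch below assumes, as in part (j), that the lagged return is measured in decimals (0.010 = 1%) and that dVIX is in VIX points; or_ret and or_vix are names introduced here.

```r
# Coefficients repeated from part (j)
b1 <-  5.4
b2 <- -0.38

# Odds ratio for a lagged return that is 1 percentage point (0.01) higher
or_ret <- exp(b1 * 0.01)

# Odds ratio for a one-unit rise in dVIX
or_vix <- exp(b2)

cat("Odds ratio, +1 pp lagged return:", round(or_ret, 4), "\n")
cat("Odds ratio, +1 unit dVIX:       ", round(or_vix, 4), "\n")
```

A lagged return that is 1 percentage point higher multiplies the odds of an Up day by about 1.06, while a one-unit rise in dVIX multiplies them by about 0.68, holding the other predictor fixed.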

(l) Confusion matrix metrics

TP <- 67   # Predicted Up,   Actual Up
FP <- 44   # Predicted Up,   Actual Down
FN <- 33   # Predicted Down, Actual Up
TN <- 56   # Predicted Down, Actual Down
N  <- 200

# Formulas
accuracy    <- (TP + TN) / N
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)

cat("Formula: Accuracy    = (TP + TN) / N = (",
    TP, "+", TN, ") /", N, "=", round(accuracy, 4), "\n")
## Formula: Accuracy    = (TP + TN) / N = ( 67 + 56 ) / 200 = 0.615
cat("Formula: Sensitivity = TP / (TP + FN) =",
    TP, "/", TP + FN, "=", round(sensitivity, 4), "\n")
## Formula: Sensitivity = TP / (TP + FN) = 67 / 100 = 0.67
cat("Formula: Specificity = TN / (TN + FP) =",
    TN, "/", TN + FP, "=", round(specificity, 4), "\n")
## Formula: Specificity = TN / (TN + FP) = 56 / 100 = 0.56
cat("Formula: Precision   = TP / (TP + FP) =",
    TP, "/", TP + FP, "=", round(precision, 4), "\n")
## Formula: Precision   = TP / (TP + FP) = 67 / 111 = 0.6036

(m) Naive rule vs. model and limitations of accuracy

# Both classes are balanced (100 Up, 100 Down).
# Naive rule: always predict majority class -> accuracy = 100/200 = 0.50
naive_accuracy <- 100 / N

cat("Formula: Naive accuracy = majority class count / N =",
    100, "/", N, "=", naive_accuracy, "\n\n")
## Formula: Naive accuracy = majority class count / N = 100 / 200 = 0.5
cat("Naive majority-class accuracy:", naive_accuracy, "\n")
## Naive majority-class accuracy: 0.5
cat("Logistic model accuracy:      ", round(accuracy, 4), "\n")
## Logistic model accuracy:       0.615
cat("Model beats naive rule?",
    ifelse(accuracy > naive_accuracy, "YES", "NO"), "\n")
## Model beats naive rule? YES

Why accuracy alone is inadequate for a trading system: Misclassification costs are asymmetric. A false positive (wrongly predicting Up) causes a direct capital loss, while a false negative (missing an Up day) is only an opportunity cost. Accuracy weights both errors equally and ignores the size of the returns involved. In addition, on imbalanced datasets a model that always predicts the majority class achieves high accuracy with zero predictive value.

A more economically relevant criterion: The Sharpe ratio of the strategy’s realized returns — which captures the frequency and magnitude of correct signals net of transaction costs — is far more informative than raw accuracy for evaluating a trading model.
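
To illustrate the asymmetry argument, the sketch below scores the model and two naive rules by average misclassification cost instead of accuracy. The cost weights are hypothetical and not taken from the question: a false positive is assumed to cost twice as much as a false negative. The helper avg_cost is also introduced here.

```r
# Confusion-matrix counts repeated from part (l)
TP <- 67; FP <- 44; FN <- 33; TN <- 56
N  <- TP + FP + FN + TN

# HYPOTHETICAL cost weights (assumption for illustration only)
cost_FP <- 2   # wrongly predicting Up
cost_FN <- 1   # missing an Up day

# Average cost per prediction for given false-positive and false-negative counts
avg_cost <- function(fp, fn) (fp * cost_FP + fn * cost_FN) / N

cost_model       <- avg_cost(FP, FN)
cost_always_up   <- avg_cost(FP + TN, 0)   # every actual Down day becomes a false positive
cost_always_down <- avg_cost(0, TP + FN)   # every actual Up day becomes a false negative

cat("Average cost, logistic model:", cost_model, "\n")
cat("Average cost, always Up:     ", cost_always_up, "\n")
cat("Average cost, always Down:   ", cost_always_down, "\n")
```

Under these assumed weights the always-Down rule has a lower average cost (0.5) than the model (0.605), even though its accuracy is lower (0.50 versus 0.615). The ranking depends entirely on the cost weights, which is why accuracy alone can mislead.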


Question 4 – Resampling and Regularization in a Backtest

(n) Monthly and annualized Sharpe ratio

mu_monthly <- 0.0070   # 0.70%
sd_monthly <- 0.0550   # 5.50%
T_months   <- 48

# Formula: SR_monthly = mu / sigma
SR_monthly <- mu_monthly / sd_monthly

# Annualization scaling factor: sqrt(12)
scaling <- sqrt(12)

# Formula: SR_annual = SR_monthly * sqrt(12)
SR_annualized <- SR_monthly * scaling

cat("Formula: SR_monthly = mu_monthly / sd_monthly\n")
## Formula: SR_monthly = mu_monthly / sd_monthly
cat("       =", mu_monthly, "/", sd_monthly, "\n")
##        = 0.007 / 0.055
cat("       =", round(SR_monthly, 4), "\n\n")
##        = 0.1273
cat("Annualization scaling factor: sqrt(12) =", round(scaling, 4), "\n\n")
## Annualization scaling factor: sqrt(12) = 3.4641
cat("Formula: SR_annual = SR_monthly * sqrt(12)\n")
## Formula: SR_annual = SR_monthly * sqrt(12)
cat("       =", round(SR_monthly, 4), "*", round(scaling, 4), "\n")
##        = 0.1273 * 3.4641
cat("       =", round(SR_annualized, 4), "\n")
##        = 0.4409

Scaling factor justification: Assuming i.i.d. monthly returns, the mean scales by 12 and the standard deviation by √12, so SR_annual = SR_monthly × √12 ≈ 0.1273 × 3.4641 = 0.4409.
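
Written out, with \(\mu_m\) and \(\sigma_m\) denoting the monthly mean and standard deviation (notation introduced here), the i.i.d. assumption gives an annual mean of \(12\mu_m\) and an annual variance of \(12\sigma_m^2\), so:

\[
SR_{annual} = \frac{12\,\mu_m}{\sqrt{12\,\sigma_m^2}} = \frac{12\,\mu_m}{\sqrt{12}\,\sigma_m} = \sqrt{12}\,\frac{\mu_m}{\sigma_m} = \sqrt{12}\;SR_{monthly}.
\]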


(o) Bootstrap standard error for the Sharpe ratio

set.seed(42)
B <- 10000

# Simulated monthly returns matching the given parameters
returns_sim <- rnorm(T_months, mean = mu_monthly, sd = sd_monthly)

# ── i.i.d. bootstrap (shown for contrast; inappropriate here) ──────────────
sr_boot_iid <- replicate(B, {
  r_b <- sample(returns_sim, T_months, replace = TRUE)
  mean(r_b) / sd(r_b)
})
se_iid <- sd(sr_boot_iid)

cat("i.i.d. bootstrap SE of SR:", round(se_iid, 4), "\n\n")
## i.i.d. bootstrap SE of SR: 0.149
# ── Circular block bootstrap (appropriate for time-series data) ────────────
# Block size ~ sqrt(T) to balance bias-variance trade-off
block_size <- round(sqrt(T_months))

sr_boot_block <- replicate(B, {
  n_blocks <- ceiling(T_months / block_size)
  starts   <- sample(1:T_months, n_blocks, replace = TRUE)
  indices  <- unlist(lapply(starts, function(s)
    ((s - 1 + 0:(block_size - 1)) %% T_months) + 1))
  r_b <- returns_sim[indices[1:T_months]]
  mean(r_b) / sd(r_b)
})
se_block <- sd(sr_boot_block)

cat("Block bootstrap SE of SR (block size =", block_size, "):",
    round(se_block, 4), "\n")
## Block bootstrap SE of SR (block size = 7 ): 0.1706

Bootstrap procedure — step by step:

  1. From the T = 48 observed monthly returns, draw B = 10,000 resamples of size T with replacement.
  2. For each resample b, compute \(\widehat{SR}^{(b)} = \bar{r}^{(b)} / s^{(b)}\).
  3. The bootstrap standard error is \(\widehat{SE}_{boot} = \text{sd}\!\left(\widehat{SR}^{(1)},\ldots,\widehat{SR}^{(B)}\right)\).

Why the i.i.d. bootstrap is inappropriate: Monthly financial returns exhibit serial dependence: autocorrelation from momentum or mean reversion, and volatility clustering (GARCH effects). The i.i.d. bootstrap destroys this temporal structure by resampling individual observations independently, so its standard error is unreliable and typically understates true sampling variability when the dependence is positive.

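The fix is the block bootstrap: the circular block bootstrap resamples contiguous blocks of returns, preserving short-run dependence within each block. No external package is required. Note that returns_sim above is itself drawn i.i.d. from a normal distribution, so the gap between the two standard errors in this output is illustrative only; with real return data the block bootstrap estimate is the one to report.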
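
As a rough cross-check on the i.i.d. bootstrap figure, a commonly used large-sample approximation for the standard error of the Sharpe ratio under i.i.d. normal returns is sqrt((1 + SR²/2) / T). This is not part of the requested bootstrap procedure and holds only under that i.i.d. normal assumption; se_analytic is a name introduced here.

```r
# Inputs repeated from part (n)
mu_monthly <- 0.0070
sd_monthly <- 0.0550
T_months   <- 48

SR_monthly <- mu_monthly / sd_monthly

# Large-sample approximation, valid under i.i.d. normal returns (assumption):
# SE(SR) ~ sqrt((1 + SR^2 / 2) / T)
se_analytic <- sqrt((1 + SR_monthly^2 / 2) / T_months)

cat("Approximate analytic SE of monthly SR:", round(se_analytic, 4), "\n")
```

The approximation gives about 0.145, the same order of magnitude as the bootstrap estimates above.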


(p) Choosing between λ_min and λ_1SE in LASSO

lambda_min  <- 0.030; factors_min <- 14
lambda_1se  <- 0.065; factors_1se <-  7

cat("lambda_min =", lambda_min, "-> retains", factors_min, "factors\n")
## lambda_min = 0.03 -> retains 14 factors
cat("lambda_1SE =", lambda_1se, "-> retains", factors_1se, "factors\n\n")
## lambda_1SE = 0.065 -> retains 7 factors
cat("Recommended: lambda =", lambda_1se, "(1-SE rule)\n")
## Recommended: lambda = 0.065 (1-SE rule)

Decision: deploy λ = 0.065 (the 1-SE rule).

The 1-SE rule selects the most parsimonious model whose CV error lies within one standard error of the minimum-CV-error model. With 60 candidate factors and limited data, overfitting is a serious risk. A 7-factor model is far less likely to reflect data-mining than a 14-factor model. In live trading, each additional factor increases turnover, transaction costs, and the probability of spurious in-sample fit that collapses out-of-sample. The small increase in cross-validated error, at most one standard error above the minimum, is economically justified by substantially better out-of-sample robustness.
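
The selection rule itself can be sketched in base R. The CV error curve below is hypothetical (the question gives only the two λ values, not the curve); the numbers are chosen so that the rule reproduces λ_min = 0.030 and λ_1SE = 0.065. The names lambda_grid, cv_mean and cv_se are introduced here.

```r
# HYPOTHETICAL cross-validation results (illustration only, not from the question)
lambda_grid <- c(0.010, 0.030, 0.065, 0.100)
cv_mean     <- c(1.05,  1.00,  1.03,  1.12)   # mean CV error at each lambda
cv_se       <- c(0.05,  0.04,  0.04,  0.05)   # standard error of the CV error

# lambda_min: the lambda with the lowest mean CV error
i_min       <- which.min(cv_mean)
lambda_min  <- lambda_grid[i_min]

# 1-SE rule: the largest lambda whose CV error is within one SE of the minimum
threshold   <- cv_mean[i_min] + cv_se[i_min]
lambda_1se  <- max(lambda_grid[cv_mean <= threshold])

cat("lambda_min =", lambda_min, " lambda_1SE =", lambda_1se, "\n")
```

A larger λ means a stronger penalty and fewer retained factors, so taking the largest admissible λ yields the sparsest model that is statistically indistinguishable from the best one.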


(q) Walk-forward validation vs. standard k-fold

T_total    <- 60
train_init <- 36
test_size  <-  6

windows   <- data.frame()
train_end <- train_init

while ((train_end + test_size) <= T_total) {
  test_start <- train_end + 1
  test_end   <- min(test_start + test_size - 1, T_total)

  windows <- rbind(windows, data.frame(
    Fold        = nrow(windows) + 1,
    Train_Start = 1,            # expanding window: training always starts at month 1
    Train_End   = train_end,
    Test_Start  = test_start,
    Test_End    = test_end
  ))
  train_end <- train_end + test_size   # grow the training window by one test block
}

print(windows)
##   Fold Train_Start Train_End Test_Start Test_End
## 1    1           1        36         37       42
## 2    2           1        42         43       48
## 3    3           1        48         49       54
## 4    4           1        54         55       60

Walk-forward (expanding window) scheme: At each fold the model is re-estimated on all available history up to the training end date, and the LASSO λ is re-tuned using only that past data. Predictions are then made on the next unseen block.

Why standard random k-fold is unsafe here:

  1. Temporal data leakage: Random k-fold can place future observations in the training set, so the model sees the future and produces optimistically biased CV error estimates (look-ahead bias).
  2. Non-stationarity: Financial distributions shift over time (regime changes, crises). Randomly mixing folds ignores this and overestimates out-of-sample generalization.
  3. Walk-forward respects causality: every prediction uses only information available at the time of prediction, which is the only realistic condition for a deployable trading strategy.
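
The leakage point can be checked directly. The sketch below assigns the 60 months to k = 5 random folds (k = 5 and the seed are arbitrary choices made here) and flags a fold as leaking if any training month falls after its earliest test month. It then applies the same check to the walk-forward split, whose training windows end at months 36, 42, 48 and 54.

```r
set.seed(1)
T_total <- 60
k       <- 5   # number of random folds (assumption for illustration)

# Random k-fold: assign each month to one of k folds at random
fold_id <- sample(rep(1:k, length.out = T_total))

# A fold leaks if any training month comes after its earliest test month
leaks_kfold <- sapply(1:k, function(f) {
  test_idx  <- which(fold_id == f)
  train_idx <- which(fold_id != f)
  any(train_idx > min(test_idx))
})

# Walk-forward: training ends at months 36, 42, 48, 54; testing starts the next month
train_end  <- seq(36, 54, by = 6)
test_start <- train_end + 1
leaks_wf   <- train_end >= test_start

cat("Random k-fold folds with look-ahead:", sum(leaks_kfold), "of", k, "\n")
cat("Walk-forward folds with look-ahead: ", sum(leaks_wf), "of", length(train_end), "\n")
```

With random assignment, every fold trains on months that come after part of its test set, whereas the walk-forward split never does.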