beta_hat <- 0.98
se_beta <- 0.17
t_crit <- 1.98
# Formula: t = beta_hat / SE(beta)
t_beta <- beta_hat / se_beta
cat("Formula: t = beta_hat / SE(beta)\n")
## Formula: t = beta_hat / SE(beta)
cat(" t =", beta_hat, "/", se_beta, "\n")
## t = 0.98 / 0.17
cat(" t =", round(t_beta, 4), "\n\n")
## t = 5.7647
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Conclusion: Reject H0: beta = 0?",
ifelse(abs(t_beta) > t_crit, "YES – beta is significant at 5%.",
"NO – fail to reject."), "\n")
## Conclusion: Reject H0: beta = 0? YES – beta is significant at 5%.
Economic interpretation: β ≈ 0.98 means that for every 1 percentage point rise in the market excess return, the fund’s excess return tends to rise by about 0.98 percentage points. The fund bears nearly the same systematic risk as the broad market. Since |t| = 5.7647 > 1.98, we reject H₀: β = 0 at the 5% level; the market factor is a statistically significant driver of fund returns.
# Formula: t = (beta_hat - 1) / SE(beta)
t_beta1 <- (beta_hat - 1) / se_beta
cat("Formula: t = (beta_hat - 1) / SE(beta)\n")
## Formula: t = (beta_hat - 1) / SE(beta)
cat(" t = (", beta_hat, "- 1) /", se_beta, "\n")
## t = ( 0.98 - 1) / 0.17
cat(" t =", round(t_beta1, 4), "\n\n")
## t = -0.1176
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Conclusion: Reject H0: beta = 1?",
ifelse(abs(t_beta1) > t_crit, "YES – beta differs from 1 at 5%.",
"NO – fail to reject. Beta is not statistically different from 1."),
"\n")
## Conclusion: Reject H0: beta = 1? NO – fail to reject. Beta is not statistically different from 1.
Conclusion: |t| = 0.1176 < 1.98. We fail to reject H₀: β = 1. The fund’s systematic risk is statistically indistinguishable from the market’s — there is no evidence that the fund is more or less aggressive than a passive index.
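The two tests above compare |t| with a fixed critical value; the same decision can be expressed as a two-sided p-value. The sketch below is only illustrative: `coef_t_test` is a hypothetical helper, and the 120 residual degrees of freedom are an assumption (chosen because they give a 5% critical value close to 1.98), not a figure from the problem.

```r
# Illustrative helper (not part of the original solution):
# t-test of H0: coefficient = null_value, with a two-sided p-value.
coef_t_test <- function(estimate, se, null_value = 0, df = 120) {
  t_stat <- (estimate - null_value) / se
  p_value <- 2 * pt(-abs(t_stat), df = df)  # two-sided tail probability
  c(t = t_stat, p = p_value)
}

# df = 120 is an ASSUMPTION; replace it with n - k - 1 from the actual regression
test_beta0 <- coef_t_test(0.98, 0.17, null_value = 0)  # H0: beta = 0
test_beta1 <- coef_t_test(0.98, 0.17, null_value = 1)  # H0: beta = 1

print(round(rbind(`H0: beta = 0` = test_beta0,
                  `H0: beta = 1` = test_beta1), 4))
```

A p-value below 0.05 corresponds to |t| exceeding the critical value, so the two views lead to the same reject / fail-to-reject decisions.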
alpha_hat <- 0.0017
se_alpha <- 0.0020
# Formula: t = alpha_hat / SE(alpha)
t_alpha <- alpha_hat / se_alpha
cat("Formula: t = alpha_hat / SE(alpha)\n")
## Formula: t = alpha_hat / SE(alpha)
cat(" t =", alpha_hat, "/", se_alpha, "\n")
## t = 0.0017 / 0.002
cat(" t =", round(t_alpha, 4), "\n\n")
## t = 0.85
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Conclusion: Reject H0: alpha = 0?",
ifelse(abs(t_alpha) > t_crit,
"YES – alpha is significant at 5%.",
"NO – fail to reject. Alpha is NOT statistically significant."),
"\n")
## Conclusion: Reject H0: alpha = 0? NO – fail to reject. Alpha is NOT statistically significant.
Conclusion: |t| = 0.85 < 1.98. Although α = 0.0017 is positive, it is not statistically significant. The marketing claim of “positive risk-adjusted performance” is not justified by the data.
R2 <- 0.50
cat("R² =", R2, "\n\n")
## R² = 0.5
cat("Systematic variation = R² =", R2 * 100, "%\n")
## Systematic variation = R² = 50 %
cat("Diversifiable variation = 1 - R² =", (1 - R2) * 100, "%\n")
## Diversifiable variation = 1 - R² = 50 %
Interpretation: R² = 0.50 means 50% of the fund’s return variance is explained by market movements (systematic risk). The remaining 50% is idiosyncratic (diversifiable) risk that can in principle be eliminated through diversification.
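To make the split concrete, the variance shares can be turned into volatilities. This is a minimal sketch that assumes a total monthly volatility of 5% for the fund's excess return; that number is hypothetical and is used only to show that variances, not standard deviations, add up.

```r
R2 <- 0.50
sd_total <- 0.05  # HYPOTHETICAL total monthly volatility (not given in the problem)

var_total <- sd_total^2
var_systematic <- R2 * var_total           # part explained by the market factor
var_idiosyncratic <- (1 - R2) * var_total  # residual, diversifiable part

sd_systematic <- sqrt(var_systematic)
sd_idiosyncratic <- sqrt(var_idiosyncratic)

cat("Systematic volatility:   ", round(sd_systematic, 4), "\n")
cat("Idiosyncratic volatility:", round(sd_idiosyncratic, 4), "\n")
# The variances sum to the total; the volatilities do not.
cat("Variances add up:", isTRUE(all.equal(var_systematic + var_idiosyncratic, var_total)), "\n")
```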
E_rm_rf <- 0.0070 # 0.70% as a decimal
# Formula: E[R_i - R_f] = beta * E[R_m - R_f]
E_ri_rf <- beta_hat * E_rm_rf
cat("Formula: E[R_i - R_f] = beta * E[R_m - R_f]\n")
## Formula: E[R_i - R_f] = beta * E[R_m - R_f]
cat(" =", beta_hat, "*", E_rm_rf, "\n")
## = 0.98 * 0.007
cat(" =", round(E_ri_rf, 4), "(i.e.", round(E_ri_rf * 100, 4), "% per month)\n")
## = 0.0069 (i.e. 0.686 % per month)
coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML = 0.13)
# Formula: t = coefficient / SE(coefficient)
t_stats <- coefs / ses
cat("Formula: t = coefficient / SE(coefficient)\n\n")
## Formula: t = coefficient / SE(coefficient)
results_ff <- data.frame(
Estimate = coefs,
Std.Error = ses,
t_statistic = round(t_stats, 4),
Significant = ifelse(abs(t_stats) > t_crit, "Yes (|t|>1.98)", "No")
)
print(results_ff)
## Estimate Std.Error t_statistic Significant
## alpha 0.0029 0.0018 1.6111 No
## b_MKT 0.9700 0.0800 12.1250 Yes (|t|>1.98)
## s_SMB 0.7500 0.1100 6.8182 Yes (|t|>1.98)
## h_HML -0.1300 0.1300 -1.0000 No
cat("SMB loading s =", coefs["s_SMB"],
"-> positive and significant: strong SMALL-CAP tilt\n")
## SMB loading s = 0.75 -> positive and significant: strong SMALL-CAP tilt
cat("HML loading h =", coefs["h_HML"],
"-> negative and not significant: mild GROWTH tilt\n")
## HML loading h = -0.13 -> negative and not significant: mild GROWTH tilt
Style: The fund tilts strongly toward small-cap stocks (large positive, significant SMB loading). The negative HML point estimate hints at a mild growth tilt, but it is not significant at 5% (|t| = 1.00), so a value-neutral exposure cannot be ruled out.
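The same conclusions can be read off approximate 95% confidence intervals, which also show how precisely each loading is estimated. The sketch below reuses the estimates and standard errors from above together with the 1.98 critical value; the `ci` data frame and its column names are illustrative.

```r
coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML = 0.13)
t_crit <- 1.98

# Interval: estimate +/- critical value * standard error
lower <- coefs - t_crit * ses
upper <- coefs + t_crit * ses

ci <- round(data.frame(Estimate = coefs, Lower = lower, Upper = upper), 4)
# A coefficient is insignificant at 5% exactly when its interval covers zero
ci$Contains_Zero <- lower <= 0 & upper >= 0
print(ci)
```

Under these inputs the market beta interval is roughly [0.81, 1.13], which contains 1, while the SMB interval lies entirely above zero.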
# Formula: t = alpha / SE(alpha)
t_alpha_ff <- coefs["alpha"] / ses["alpha"]
cat("Formula: t = alpha / SE(alpha)\n")
## Formula: t = alpha / SE(alpha)
cat(" t =", coefs["alpha"], "/", ses["alpha"], "\n")
## t = 0.0029 / 0.0018
cat(" t =", round(t_alpha_ff, 4), "\n\n")
## t = 1.6111
cat("Critical value: |t| =", t_crit, "\n")
## Critical value: |t| = 1.98
cat("Significant at 5%?",
ifelse(abs(t_alpha_ff) > t_crit,
"YES – manager adds value beyond factor exposures.",
"NO – insufficient evidence of value added."), "\n")
## Significant at 5%? NO – insufficient evidence of value added.
Interpretation: α = 0.0029 (~+0.29%/month). The t-statistic of 1.6111 is below the critical value of 1.98. We fail to reject H₀: α = 0. There is insufficient statistical evidence that the manager generates value beyond the three factor exposures.
R2_capm <- 0.75
R2_ff <- 0.92
n <- 144
k_capm <- 1
k_ff <- 3
# Formula: Adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
adj_R2_capm <- 1 - (1 - R2_capm) * (n - 1) / (n - k_capm - 1)
adj_R2_ff <- 1 - (1 - R2_ff) * (n - 1) / (n - k_ff - 1)
cat("Formula: Adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)\n\n")
## Formula: Adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
cat("CAPM (k=1): Adj R² = 1 - (1 -", R2_capm, ") * (", n, "- 1) / (",
n, "- 1 - 1) =", round(adj_R2_capm, 4), "\n")
## CAPM (k=1): Adj R² = 1 - (1 - 0.75 ) * ( 144 - 1) / ( 144 - 1 - 1) = 0.7482
cat("FF3F (k=3): Adj R² = 1 - (1 -", R2_ff, ") * (", n, "- 1) / (",
n, "- 3 - 1) =", round(adj_R2_ff, 4), "\n")
## FF3F (k=3): Adj R² = 1 - (1 - 0.92 ) * ( 144 - 1) / ( 144 - 3 - 1) = 0.9183
Interpretation: Adding SMB and HML raises R² by 17 percentage points, indicating that size and value exposures capture substantial variation left unexplained by the single market factor. Because R² never decreases when predictors are added, it cannot be used to compare models of different sizes. The adjusted R² penalises each additional predictor; its rise from 0.7482 to 0.9183 confirms that SMB and HML provide genuine explanatory power and are not merely inflating the fit mechanically.
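A complementary check is a partial F-test for the joint significance of SMB and HML, computed from the two R² values. This is a sketch that assumes both regressions use the same dependent variable and the same 144 observations, so that the CAPM is nested in the three-factor model; the problem does not state this explicitly.

```r
R2_capm <- 0.75
R2_ff <- 0.92
n <- 144
k_ff <- 3
q <- 2  # number of added regressors (SMB and HML)

# F = [(R2_full - R2_restricted) / q] / [(1 - R2_full) / (n - k_full - 1)]
df_resid <- n - k_ff - 1
F_stat <- ((R2_ff - R2_capm) / q) / ((1 - R2_ff) / df_resid)
F_crit <- qf(0.95, df1 = q, df2 = df_resid)
p_value <- pf(F_stat, df1 = q, df2 = df_resid, lower.tail = FALSE)

cat("F statistic =", round(F_stat, 2), " (5% critical value =", round(F_crit, 2), ")\n")
cat("Reject H0: s = h = 0?", ifelse(F_stat > F_crit, "YES", "NO"), "\n")
```

With these inputs the statistic is (0.17 / 2) / (0.08 / 140) = 148.75, far above the 5% critical value, which agrees with the rise in adjusted R².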
b0 <- -0.02
b1 <- 5.4
b2 <- -0.38
r_lag <- 0.010
dVIX <- 1.5
# Formula: logit P(Up) = b0 + b1*r_lag + b2*dVIX
logit_p <- b0 + b1 * r_lag + b2 * dVIX
# Formula: P(Up) = 1 / (1 + exp(-logit_p))
prob_up <- 1 / (1 + exp(-logit_p))
cat("Formula: logit P(Up) = b0 + b1 * r(t-1) + b2 * dVIX\n")
## Formula: logit P(Up) = b0 + b1 * r(t-1) + b2 * dVIX
cat(" =", b0, "+", b1, "*", r_lag, "+", b2, "*", dVIX, "\n")
## = -0.02 + 5.4 * 0.01 + -0.38 * 1.5
cat(" =", round(logit_p, 4), "\n\n")
## = -0.536
cat("Formula: P(Up) = 1 / (1 + exp(-logit_p))\n")
## Formula: P(Up) = 1 / (1 + exp(-logit_p))
cat(" = 1 / (1 + exp(-(", round(logit_p, 4), ")))\n")
## = 1 / (1 + exp(-( -0.536 )))
cat(" =", round(prob_up, 4), "\n\n")
## = 0.3691
cat("Predicted class (threshold 0.5):",
ifelse(prob_up >= 0.5, "UP", "DOWN"), "\n")
## Predicted class (threshold 0.5): DOWN
cat("b1 =", b1,
"-> POSITIVE: a positive lagged return raises P(Up) tomorrow.\n")
## b1 = 5.4 -> POSITIVE: a positive lagged return raises P(Up) tomorrow.
cat(" Captures short-term MOMENTUM: markets tend to continue direction.\n\n")
## Captures short-term MOMENTUM: markets tend to continue direction.
cat("b2 =", b2,
"-> NEGATIVE: a rise in the VIX lowers P(Up) tomorrow.\n")
## b2 = -0.38 -> NEGATIVE: a rise in the VIX lowers P(Up) tomorrow.
cat(" Captures FEAR / UNCERTAINTY: higher implied volatility signals",
"risk-off and a more likely down day.\n")
## Captures FEAR / UNCERTAINTY: higher implied volatility signals risk-off and a more likely down day.
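The coefficient signs can be made more tangible by converting them into probabilities and odds ratios. The following sketch wraps the fitted equation in a hypothetical helper, `predict_up`, and evaluates it over an illustrative grid of VIX changes; the grid values are assumptions, while the coefficients are those given above.

```r
b0 <- -0.02
b1 <- 5.4
b2 <- -0.38

# plogis(x) is the logistic function 1 / (1 + exp(-x))
predict_up <- function(r_lag, dVIX) plogis(b0 + b1 * r_lag + b2 * dVIX)

p_base <- predict_up(r_lag = 0.010, dVIX = 1.5)  # same inputs as above
cat("P(Up) at r_lag = 0.01, dVIX = 1.5:", round(p_base, 4), "\n")

# Odds ratio: multiplicative change in the odds of an Up day per 1-point rise in the VIX
odds_ratio_vix <- exp(b2)
cat("Odds ratio per 1-point VIX rise:", round(odds_ratio_vix, 4), "\n")

# ILLUSTRATIVE grid of VIX changes, holding the lagged return at 1%
dVIX_grid <- c(-2, -1, 0, 1, 1.5, 2)
p_grid <- predict_up(0.010, dVIX_grid)
print(data.frame(dVIX = dVIX_grid, P_Up = round(p_grid, 4)))
```

The odds ratio of about 0.68 says that each one-point rise in the VIX cuts the odds of an Up day by roughly a third, holding the lagged return fixed.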
TP <- 67 # Predicted Up, Actual Up
FP <- 44 # Predicted Up, Actual Down
FN <- 33 # Predicted Down, Actual Up
TN <- 56 # Predicted Down, Actual Down
N <- 200
# Formulas
accuracy <- (TP + TN) / N
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision <- TP / (TP + FP)
cat("Formula: Accuracy = (TP + TN) / N = (",
TP, "+", TN, ") /", N, "=", round(accuracy, 4), "\n")
## Formula: Accuracy = (TP + TN) / N = ( 67 + 56 ) / 200 = 0.615
cat("Formula: Sensitivity = TP / (TP + FN) =",
TP, "/", TP + FN, "=", round(sensitivity, 4), "\n")
## Formula: Sensitivity = TP / (TP + FN) = 67 / 100 = 0.67
cat("Formula: Specificity = TN / (TN + FP) =",
TN, "/", TN + FP, "=", round(specificity, 4), "\n")
## Formula: Specificity = TN / (TN + FP) = 56 / 100 = 0.56
cat("Formula: Precision = TP / (TP + FP) =",
TP, "/", TP + FP, "=", round(precision, 4), "\n")
## Formula: Precision = TP / (TP + FP) = 67 / 111 = 0.6036
# Class counts: 100 actual Up (TP + FN) and 100 actual Down (FP + TN), so the sample is balanced.
# Naive rule: always predict the majority class -> accuracy = 100/200 = 0.50
n_majority <- max(TP + FN, FP + TN)
naive_accuracy <- n_majority / N
cat("Formula: Naive accuracy = majority class count / N =",
n_majority, "/", N, "=", naive_accuracy, "\n\n")
## Formula: Naive accuracy = majority class count / N = 100 / 200 = 0.5
cat("Naive majority-class accuracy:", naive_accuracy, "\n")
## Naive majority-class accuracy: 0.5
cat("Logistic model accuracy: ", round(accuracy, 4), "\n")
## Logistic model accuracy: 0.615
cat("Model beats naive rule?",
ifelse(accuracy > naive_accuracy, "YES", "NO"), "\n")
## Model beats naive rule? YES
Why accuracy alone is inadequate for a trading system: Misclassification costs are asymmetric. A false positive (wrongly predicting Up) puts capital at risk and causes a direct loss, while a false negative (missing an Up day) is only an opportunity cost. Accuracy also ignores the size of the moves: a model can be right on many small days and wrong on a few large ones. On imbalanced datasets, a model that always predicts the majority class achieves high accuracy with zero predictive value.
A more economically relevant criterion: The Sharpe ratio of the strategy’s realized returns — which captures the frequency and magnitude of correct signals net of transaction costs — is far more informative than raw accuracy for evaluating a trading model.
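As a rough illustration of why costs matter, the confusion matrix can be mapped to a profit-and-loss figure. Everything about the payoffs below is hypothetical: the average move sizes, the per-trade cost, and the rule "go long on a predicted Up day, stay flat otherwise" are assumptions made only for this sketch.

```r
TP <- 67; FP <- 44; FN <- 33; TN <- 56
N <- TP + FP + FN + TN

# HYPOTHETICAL payoffs (not from the problem)
avg_up_move <- 0.008     # average return earned on a correctly predicted Up day
avg_down_move <- -0.010  # average return suffered on a wrongly predicted Up day
cost_per_trade <- 0.0005 # round-trip transaction cost per long position

# Long only on predicted Up days; FN and TN days are flat and earn nothing
n_trades <- TP + FP
pnl_total <- TP * avg_up_move + FP * avg_down_move - n_trades * cost_per_trade
pnl_per_day <- pnl_total / N

# Precision needed to break even: p * up + (1 - p) * down - cost = 0
breakeven_precision <- (cost_per_trade - avg_down_move) / (avg_up_move - avg_down_move)
precision <- TP / (TP + FP)

cat("Total P&L over", N, "days:", round(pnl_total, 4), "\n")
cat("Precision:", round(precision, 4),
    " Break-even precision:", round(breakeven_precision, 4), "\n")
```

Under these assumed payoffs the strategy needs a precision of about 0.58 to break even, so a 61.5% accuracy leaves only a thin margin; different payoff assumptions could easily reverse the conclusion.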
mu_monthly <- 0.0070 # 0.70%
sd_monthly <- 0.0550 # 5.50%
T_months <- 48
# Formula: SR_monthly = mu / sigma
SR_monthly <- mu_monthly / sd_monthly
# Annualization scaling factor: sqrt(12)
scaling <- sqrt(12)
# Formula: SR_annual = SR_monthly * sqrt(12)
SR_annualized <- SR_monthly * scaling
cat("Formula: SR_monthly = mu_monthly / sd_monthly\n")
## Formula: SR_monthly = mu_monthly / sd_monthly
cat(" =", mu_monthly, "/", sd_monthly, "\n")
## = 0.007 / 0.055
cat(" =", round(SR_monthly, 4), "\n\n")
## = 0.1273
cat("Annualization scaling factor: sqrt(12) =", round(scaling, 4), "\n\n")
## Annualization scaling factor: sqrt(12) = 3.4641
cat("Formula: SR_annual = SR_monthly * sqrt(12)\n")
## Formula: SR_annual = SR_monthly * sqrt(12)
cat(" =", round(SR_monthly, 4), "*", round(scaling, 4), "\n")
## = 0.1273 * 3.4641
cat(" =", round(SR_annualized, 4), "\n")
## = 0.4409
Scaling factor justification: Assuming i.i.d. monthly returns, the annual mean is 12μ and the annual standard deviation is √12·σ, so SR_annual = 12μ / (√12·σ) = √12 × SR_monthly ≈ 0.1273 × 3.4641 = 0.4409.
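The √12 factor depends on the i.i.d. assumption. If monthly returns are autocorrelated, the variance of the 12-month sum picks up covariance terms and the scaling factor changes. The sketch below assumes an AR(1)-type autocorrelation ρ^k at lag k; the function name and the ρ values are hypothetical and only show the direction of the effect.

```r
# Var(sum of q returns) = sigma^2 * (q + 2 * sum_{k=1}^{q-1} (q - k) * rho_k),
# so the Sharpe ratio scales by q / sqrt(q + 2 * sum((q - k) * rho_k)).
annualization_factor <- function(rho, q = 12) {
  k <- seq_len(q - 1)
  q / sqrt(q + 2 * sum((q - k) * rho^k))  # ASSUMES autocorrelation rho^k at lag k
}

SR_monthly <- 0.0070 / 0.0550

for (rho in c(-0.1, 0, 0.1)) {  # HYPOTHETICAL first-order autocorrelations
  f <- annualization_factor(rho)
  cat("rho =", rho, " factor =", round(f, 4),
      " SR_annual =", round(SR_monthly * f, 4), "\n")
}
```

With ρ = 0 the factor reduces to √12; positive autocorrelation lowers it and negative autocorrelation raises it.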
set.seed(42)
B <- 10000
# Simulated monthly returns matching the given parameters
returns_sim <- rnorm(T_months, mean = mu_monthly, sd = sd_monthly)
# ── i.i.d. bootstrap (shown for contrast; inappropriate here) ──────────────
sr_boot_iid <- replicate(B, {
r_b <- sample(returns_sim, T_months, replace = TRUE)
mean(r_b) / sd(r_b)
})
se_iid <- sd(sr_boot_iid)
cat("i.i.d. bootstrap SE of SR:", round(se_iid, 4), "\n\n")
## i.i.d. bootstrap SE of SR: 0.149
# ── Circular block bootstrap (appropriate for time-series data) ────────────
# Block size ~ sqrt(T) to balance bias-variance trade-off
block_size <- round(sqrt(T_months))
sr_boot_block <- replicate(B, {
n_blocks <- ceiling(T_months / block_size)
starts <- sample(1:T_months, n_blocks, replace = TRUE)
indices <- unlist(lapply(starts, function(s)
((s - 1 + 0:(block_size - 1)) %% T_months) + 1))
r_b <- returns_sim[indices[1:T_months]]
mean(r_b) / sd(r_b)
})
se_block <- sd(sr_boot_block)
cat("Block bootstrap SE of SR (block size =", block_size, "):",
round(se_block, 4), "\n")
## Block bootstrap SE of SR (block size = 7 ): 0.1706
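Bootstrap procedure, step by step:
1. Choose a block length b ≈ √T (here b = 7 for T = 48).
2. Draw ⌈T / b⌉ = 7 block starting points uniformly from 1, …, T, with replacement.
3. From each starting point take b consecutive returns, wrapping around from month T to month 1 (the "circular" part).
4. Concatenate the blocks and keep the first T observations.
5. Compute the Sharpe ratio (mean / standard deviation) of the resampled series.
6. Repeat B = 10,000 times; the standard deviation of the B bootstrap Sharpe ratios is the bootstrap standard error.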
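Why the i.i.d. bootstrap is inappropriate: Monthly financial returns exhibit serial dependence: autocorrelation from momentum or mean reversion, and volatility clustering (GARCH effects). The i.i.d. bootstrap destroys this temporal structure by resampling individual observations independently, so it typically understates the true sampling variability when returns are positively autocorrelated. Note that the returns simulated above are i.i.d. normal draws, so the two standard errors illustrate the mechanics only; the gap between them should not be read as evidence of dependence in real data.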
The fix — block bootstrap: The circular block bootstrap resamples contiguous blocks of returns, preserving short-run dependence within each block. No external package is required.
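As a sanity check on the bootstrap figures, the standard error of a Sharpe ratio has a well-known large-sample approximation under i.i.d. normal returns, SE(SR) ≈ sqrt((1 + SR²/2) / T). The sketch below evaluates it at the stated parameters (μ = 0.70%, σ = 5.50%, T = 48) rather than at the simulated sample's own Sharpe ratio, so it is only a rough benchmark.

```r
mu_monthly <- 0.0070
sd_monthly <- 0.0550
T_months <- 48

SR_monthly <- mu_monthly / sd_monthly

# Asymptotic SE of the Sharpe ratio, valid under i.i.d. normal returns
se_sr_analytic <- sqrt((1 + 0.5 * SR_monthly^2) / T_months)
cat("Analytic SE of monthly SR:", round(se_sr_analytic, 4), "\n")

# Approximate 95% interval for the monthly Sharpe ratio
ci_sr <- SR_monthly + c(-1, 1) * 1.96 * se_sr_analytic
cat("Approx. 95% CI for monthly SR: [", round(ci_sr[1], 4), ",", round(ci_sr[2], 4), "]\n")
```

With only 48 months the interval comfortably includes zero, which is why a standard error should be reported alongside any Sharpe ratio estimated from a short track record.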
lambda_min <- 0.030; factors_min <- 14
lambda_1se <- 0.065; factors_1se <- 7
cat("lambda_min =", lambda_min, "-> retains", factors_min, "factors\n")
## lambda_min = 0.03 -> retains 14 factors
cat("lambda_1SE =", lambda_1se, "-> retains", factors_1se, "factors\n\n")
## lambda_1SE = 0.065 -> retains 7 factors
cat("Recommended: lambda =", lambda_1se, "(1-SE rule)\n")
## Recommended: lambda = 0.065 (1-SE rule)
Decision: deploy λ = 0.065 (the 1-SE rule).
The 1-SE rule selects the most parsimonious model whose CV error lies within one standard error of the minimum-CV-error model. With 60 candidate factors and limited data, overfitting is a serious risk. A 7-factor model is far less likely to reflect data-mining than a 14-factor model. In live trading, each additional factor increases turnover, transaction costs, and the probability of a spurious fit that collapses out-of-sample. The small increase in cross-validated error (by construction, at most one standard error) is economically justified by the expected gain in out-of-sample robustness.
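The selection rule itself is simple to express in code. The sketch below applies it to a made-up cross-validation curve: the λ grid, mean errors, and standard errors are hypothetical numbers, chosen so that the result matches the λ values quoted above, and no LASSO is actually fitted.

```r
# HYPOTHETICAL cross-validation results (illustrative numbers only)
cv <- data.frame(
  lambda = c(0.010, 0.030, 0.065, 0.100, 0.150),
  cv_mean = c(1.05, 1.00, 1.03, 1.10, 1.25),  # mean CV error at each lambda
  cv_se = c(0.05, 0.04, 0.04, 0.05, 0.06)     # standard error of that mean
)

# lambda_min: the lambda with the lowest mean CV error
i_min <- which.min(cv$cv_mean)
lambda_min <- cv$lambda[i_min]

# 1-SE rule: the largest (most regularized) lambda whose CV error
# is within one standard error of the minimum
threshold <- cv$cv_mean[i_min] + cv$cv_se[i_min]
lambda_1se <- max(cv$lambda[cv$cv_mean <= threshold])

cat("lambda_min =", lambda_min, " lambda_1SE =", lambda_1se, "\n")
```

A larger λ shrinks more coefficients to zero, which is why the 1-SE choice retains fewer factors than the minimum-error choice.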
T_total <- 60
train_init <- 36
test_size <- 6
windows <- data.frame()
train_end <- train_init
while ((train_end + test_size) <= T_total) {
  test_start <- train_end + 1
  test_end <- test_start + test_size - 1
  windows <- rbind(windows, data.frame(
    Fold = nrow(windows) + 1,
    Train_Start = 1, # expanding window: training always starts at month 1
    Train_End = train_end,
    Test_Start = test_start,
    Test_End = test_end
  ))
  train_end <- train_end + test_size # grow the training set by one test block
}
print(windows)
## Fold Train_Start Train_End Test_Start Test_End
## 1 1 1 36 37 42
## 2 2 1 42 43 48
## 3 3 1 48 49 54
## 4 4 1 54 55 60
Walk-forward (expanding window) scheme: At each fold the model is re-estimated on all available history up to the training end date, and the LASSO λ is re-tuned using only that past data. Predictions are then made on the next unseen block.
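The evaluation loop can be sketched end to end on simulated data. To keep the example self-contained, plain OLS via `lm` stands in for the LASSO, and the data-generating process is hypothetical; in a real application the λ tuning would sit inside each training window, exactly where `lm` is called here.

```r
set.seed(1)
T_total <- 60
train_init <- 36
test_size <- 6

# HYPOTHETICAL data: one predictor x and a return series y
dat <- data.frame(x = rnorm(T_total))
dat$y <- 0.5 * dat$x + rnorm(T_total)

# Training end dates for the expanding window: 36, 42, 48, 54
train_ends <- seq(train_init, T_total - test_size, by = test_size)

oos_mse <- sapply(train_ends, function(te) {
  train_idx <- 1:te                          # all history up to the training end
  test_idx <- (te + 1):(te + test_size)      # next unseen block
  fit <- lm(y ~ x, data = dat[train_idx, ])  # stand-in for the re-tuned LASSO
  pred <- predict(fit, newdata = dat[test_idx, ])
  mean((dat$y[test_idx] - pred)^2)           # out-of-sample MSE for this fold
})
names(oos_mse) <- paste0("fold_", seq_along(train_ends))

print(round(oos_mse, 4))
cat("Average out-of-sample MSE:", round(mean(oos_mse), 4), "\n")
```

Because every test block lies strictly after its training data, no future information leaks into the fitted model.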
**Why standard random k-fold is unsafe here: