Given information:
Formula:
\[t_{\hat{\beta}} = \frac{\hat{\beta} - 0}{\text{SE}(\hat{\beta})}\]
Calculation:
beta_hat <- 0.98
se_beta <- 0.17
t_crit <- 1.98
t_beta <- (beta_hat - 0) / se_beta
cat("t-statistic for beta:", round(t_beta, 4), "\n")## t-statistic for beta: 5.765
## Critical value: 1.98
## Decision: Reject H0? TRUE
Result: \(t_{\hat{\beta}} = 0.98 / 0.17 = 5.7647\)
Since \(|t_{\hat{\beta}}| = 5.7647 > 1.98\), we reject \(H_0: \beta = 0\) at the 5% significance level.
Economic interpretation: \(\hat{\beta} = 0.98\) means the fund’s excess return moves almost one-for-one with the market. For every 1 percentage point increase in the market excess return, the fund is expected to earn approximately 0.98 percentage points more. The fund carries nearly the same systematic (non-diversifiable) risk as the market portfolio.
Formula:
\[t = \frac{\hat{\beta} - 1}{\text{SE}(\hat{\beta})}\]
Calculation:
## t-statistic for (beta = 1): -0.1176
## Critical value: 1.98
## Decision: Reject H0? FALSE
Result: \(t = (0.98 - 1) / 0.17 = -0.02 / 0.17 = -0.1176\)
Since \(|t| = 0.1176 < 1.98\), we fail to reject \(H_0: \beta = 1\) at the 5% level.
Interpretation: The fund’s systematic risk is statistically indistinguishable from the market. We cannot conclude that the manager takes on more or less market exposure than a passive index fund — the fund behaves like a market tracker in terms of its beta.
Formula:
\[t_{\hat{\alpha}} = \frac{\hat{\alpha} - 0}{\text{SE}(\hat{\alpha})}\]
Calculation:
alpha_hat <- 0.0017
se_alpha <- 0.0020
t_alpha <- alpha_hat / se_alpha
cat("t-statistic for alpha:", round(t_alpha, 4), "\n")## t-statistic for alpha: 0.85
## Critical value: 1.98
## Decision: Reject H0? FALSE
Result: \(t_{\hat{\alpha}} = 0.0017 / 0.0020 = 0.85\)
Since \(|t_{\hat{\alpha}}| = 0.85 < 1.98\), we fail to reject \(H_0: \alpha = 0\) at the 5% level.
Conclusion on marketing claim: The data do not statistically justify advertising “positive risk-adjusted performance.” Although \(\hat{\alpha} = 0.0017\) (0.17% per month) is positive in sign, it is not distinguishable from zero at any conventional significance level. The positive estimate could easily be sampling noise rather than genuine manager skill.
## Systematic variation (R^2): 50 %
## Idiosyncratic variation: 50 %
Interpretation: \(R^2 = 0.50\) means that 50% of the fund’s monthly return variation is explained by co-movement with the market portfolio — this is the systematic, non-diversifiable component. The remaining 50% is idiosyncratic (diversifiable) risk, arising from stock selection, sector tilts, or manager-specific bets.
Formula:
\[E[R_i - R_f] = \hat{\beta} \times E[R_m - R_f]\]
Calculation:
mkt_premium <- 0.70 # percent per month
capm_expected <- beta_hat * mkt_premium
cat("CAPM-implied monthly excess return:", round(capm_expected, 4), "%\n")## CAPM-implied monthly excess return: 0.686 %
Result:
\[E[R_i - R_f] = 0.98 \times 0.70\% = 0.6860\%\]
The CAPM implies the fund should earn a monthly excess return of 0.6860% given its systematic risk. The estimated alpha of 0.17% represents the fund’s performance above this benchmark, though as shown in part (c), this surplus is not statistically significant.
The chart below simulates the scatter of fund excess returns against market excess returns consistent with the given regression parameters (\(\hat{\alpha}\), \(\hat{\beta}\), \(R^2\)), and overlays the fitted regression line with its 95% confidence band.
set.seed(101)
n <- 96
sigma_mkt <- 0.04
mkt_xr <- rnorm(n, mean = 0.007, sd = sigma_mkt)
# Residual SD implied by R^2 = 0.50
# Var(R_i) = beta^2 * Var(R_m) + Var(eps)
# R^2 = beta^2 * Var(R_m) / Var(R_i) => Var(eps) = beta^2 * Var(R_m) * (1/R^2 - 1)
var_eps <- beta_hat^2 * sigma_mkt^2 * (1 / R2 - 1)
eps <- rnorm(n, 0, sqrt(var_eps))
fund_xr <- alpha_hat + beta_hat * mkt_xr + eps
df1 <- data.frame(mkt = mkt_xr * 100, fund = fund_xr * 100)
ggplot(df1, aes(x = mkt, y = fund)) +
geom_point(alpha = 0.45, color = "#2C7BB6", size = 1.8) +
geom_smooth(method = "lm", color = "#D7191C", fill = "#FDAE61",
alpha = 0.25, linewidth = 0.9) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.4) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.4) +
annotate("text", x = min(df1$mkt) + 0.5, y = max(df1$fund) - 0.3,
label = paste0("β̂ = ", beta_hat, " α̂ = ", alpha_hat * 100, "%\nR² = ", R2),
hjust = 0, size = 3.5, color = "#333333") +
labs(title = "CAPM Regression: Fund Excess Return vs. Market Excess Return",
subtitle = "96 monthly observations | OLS fit with 95% confidence band",
x = "Market Excess Return (%)",
y = "Fund Excess Return (%)") +
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(color = "grey45", size = 9),
panel.grid.minor = element_blank())Figure 1: Simulated CAPM scatter plot with fitted regression line (95% CI shaded). The slope ≈ 0.98 is indistinguishable from 1, consistent with part (b).
Given information:
coef_names <- c("alpha", "MKT (b)", "SMB (s)", "HML (h)")
estimates <- c(0.0029, 0.97, 0.75, -0.13)
std_errors <- c(0.0018, 0.08, 0.11, 0.13)Formula: \(t_j = \hat{\theta}_j / \text{SE}(\hat{\theta}_j)\), significant if \(|t_j| > 1.98\)
t_stats <- estimates / std_errors
significant <- abs(t_stats) > t_crit
results_ff3 <- data.frame(
Term = coef_names,
Estimate = estimates,
Std_Error = std_errors,
t_statistic = round(t_stats, 4),
Significant = ifelse(significant, "Yes ***", "No")
)
print(results_ff3, row.names = FALSE)## Term Estimate Std_Error t_statistic Significant
## alpha 0.0029 0.0018 1.611 No
## MKT (b) 0.9700 0.0800 12.125 Yes ***
## SMB (s) 0.7500 0.1100 6.818 Yes ***
## HML (h) -0.1300 0.1300 -1.000 No
| Term | Estimate | SE | t-stat | Significant? |
|---|---|---|---|---|
| \(\hat{\alpha}\) | 0.0029 | 0.0018 | 1.6111 | No |
| \(\hat{b}\) (MKT) | 0.97 | 0.08 | 12.1250 | Yes |
| \(\hat{s}\) (SMB) | 0.75 | 0.11 | 6.8182 | Yes |
| \(\hat{h}\) (HML) | −0.13 | 0.13 | −1.0000 | No |
MKT and SMB are statistically significant at the 5% level. \(\alpha\) and HML are not.
## SMB loading (s): 0.75 -> positive -> Small-cap tilt
## HML loading (h): -0.13 -> negative -> Growth tilt
## SMB significant? TRUE
## HML significant? FALSE
Size tilt: \(\hat{s} = +0.75\) (significant, \(t = 6.82\)) — strong small-cap bias.
Value/Growth tilt: \(\hat{h} = -0.13\) (not significant, \(t = -1.00\)) — mild growth tilt, but statistically inconclusive.
Overall: This is a small-cap (growth-leaning) fund.
alpha_ff3 <- 0.0029; se_ff3 <- 0.0018
t_alpha_ff3 <- alpha_ff3 / se_ff3
cat("FF3 alpha (monthly): ", round(alpha_ff3 * 100, 4), "%\n")## FF3 alpha (monthly): 0.29 %
## FF3 alpha (approx. annual): 3.48 %
## t-statistic for alpha: 1.611
## Significant at 5%? FALSE
\(t_{\hat{\alpha}} = 1.6111 < 1.98\) — we fail to reject \(H_0: \alpha = 0\). The monthly alpha of 0.29% annualizes to ~3.5% and is economically promising, but statistically inconclusive. A longer track record is needed.
## CAPM R^2: 0.75
## FF3 R^2: 0.92
## FF3 Adj. R^2: 0.918
## Incremental R^2: 0.17
The 17 pp jump confirms that SMB and HML explain return variation the market factor misses. Adjusted \(R^2 = 0.918 \approx R^2 = 0.920\) shows the added factors are not mere noise — the penalty for the two extra parameters is negligible.
\[\bar{R}^2 = 1 - \frac{(1 - R^2)(n-1)}{n - k - 1}\]
The chart shows each estimated coefficient with its 95% CI (\(\hat{\theta} \pm 1.98 \times \text{SE}\)). Coefficients whose CI excludes zero are statistically significant.
df2 <- data.frame(
term = factor(coef_names, levels = rev(coef_names)),
estimate = estimates,
se = std_errors
)
df2$ci_lo <- df2$estimate - t_crit * df2$se
df2$ci_hi <- df2$estimate + t_crit * df2$se
df2$sig <- abs(df2$estimate / df2$se) > t_crit
ggplot(df2, aes(x = estimate, y = term, color = sig)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey55", linewidth = 0.5) +
geom_errorbarh(aes(xmin = ci_lo, xmax = ci_hi), height = 0.25, linewidth = 0.8) +
geom_point(size = 4) +
scale_color_manual(values = c("TRUE" = "#2C7BB6", "FALSE" = "#D7191C"),
labels = c("TRUE" = "Significant", "FALSE" = "Not significant"),
name = NULL) +
labs(title = "Fama–French Three-Factor Model: Coefficient Estimates",
subtitle = "Point estimates ± 1.98 × SE (95% confidence intervals)",
x = "Coefficient estimate",
y = NULL) +
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(color = "grey45", size = 9),
legend.position = "bottom",
panel.grid.minor = element_blank())Figure 2: FF3 factor loadings with 95% confidence intervals. MKT and SMB clearly exclude zero; alpha and HML do not.
Given information:
\[\text{logit}\, P(\text{Up}) = \beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta\text{VIX}_{t-1}\]
Formula:
\[\text{logit} = \hat{\beta}_0 + \hat{\beta}_1 r_{t-1} + \hat{\beta}_2 \Delta\text{VIX}\]
\[P(\text{Up}) = \frac{1}{1 + e^{-\text{logit}}}\]
Calculation:
b0 <- -0.02; b1 <- 5.4; b2 <- -0.38
r_lag <- 0.010; delta_vix <- 1.5
logit_val <- b0 + b1 * r_lag + b2 * delta_vix
prob_up <- 1 / (1 + exp(-logit_val))
pred_class <- ifelse(prob_up >= 0.5, "Up", "Down")
cat("Logit value: ", round(logit_val, 4), "\n")## Logit value: -0.536
## P(Up): 0.3691
## Predicted class: Down
\[\text{logit} = -0.02 + 5.4(0.010) + (-0.38)(1.5) = -0.02 + 0.054 - 0.570 = -0.536\]
\[P(\text{Up}) = \frac{1}{1 + e^{0.536}} = 0.3692 < 0.50 \implies \textbf{Down}\]
\(\hat{\beta}_1 = +5.4\) (lagged return): Captures short-term momentum — a positive yesterday’s return raises the probability of an “Up” day tomorrow. Consistent with price continuation and investor herding.
\(\hat{\beta}_2 = -0.38\) (\(\Delta\text{VIX}\)): Captures the risk-off / fear channel — a VIX spike signals rising uncertainty, triggering equity sell-offs. The VIX is the market’s “fear gauge.”
TP <- 67; FP <- 44; FN <- 33; TN <- 56
N <- TP + FP + FN + TN
accuracy <- (TP + TN) / N
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision <- TP / (TP + FP)
cat("Accuracy: ", round(accuracy, 4), "\n")## Accuracy: 0.615
## Sensitivity: 0.67
## Specificity: 0.56
## Precision: 0.6036
\[\text{Accuracy} = \frac{67+56}{200} = 0.6150 \qquad \text{Sensitivity} = \frac{67}{100} = 0.6700\]
\[\text{Specificity} = \frac{56}{100} = 0.5600 \qquad \text{Precision} = \frac{67}{111} = 0.6036\]
## Naive accuracy: 0.5
## Model accuracy: 0.615
## Model beats naive? TRUE
The naive rule achieves 50% (balanced dataset). The model achieves 61.5% — beating it by 11.5 pp. However, accuracy is inadequate for trading because misclassification costs are asymmetric. The strategy Sharpe ratio is the more economically relevant criterion.
cm_df <- data.frame(
Predicted = factor(c("Up","Up","Down","Down"), levels = c("Up","Down")),
Actual = factor(c("Up","Down","Up","Down"), levels = c("Up","Down")),
Count = c(TP, FP, FN, TN),
Type = c("TP","FP","FN","TN")
)
cm_df$fill_col <- c("#2C7BB6","#FDAE61","#D7191C","#ABD9E9")
ggplot(cm_df, aes(x = Actual, y = Predicted, fill = Type)) +
geom_tile(color = "white", linewidth = 1.5) +
geom_text(aes(label = paste0(Type, "\n", Count)),
size = 5, fontface = "bold", color = "white") +
scale_fill_manual(values = c("TP" = "#1A6FAF", "TN" = "#3A99C9",
"FP" = "#E07B39", "FN" = "#C03020"),
guide = "none") +
labs(title = "Confusion Matrix — 200-Day Hold-out Test Set",
subtitle = "Logistic regression model at 0.5 threshold",
x = "Actual class", y = "Predicted class") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(color = "grey45", size = 9),
panel.grid = element_blank(),
axis.text = element_text(size = 11))Figure 3a: Confusion matrix heatmap. TP and TN cells (correct predictions) shown in blue shades; errors in red/orange.
r_seq <- seq(-0.04, 0.04, length.out = 200)
logit_vix0 <- b0 + b1 * r_seq + b2 * 0
logit_vix15 <- b0 + b1 * r_seq + b2 * 1.5
p_vix0 <- 1 / (1 + exp(-logit_vix0))
p_vix15 <- 1 / (1 + exp(-logit_vix15))
df3 <- rbind(
data.frame(r = r_seq, prob = p_vix0, scenario = "ΔVIX = 0"),
data.frame(r = r_seq, prob = p_vix15, scenario = "ΔVIX = +1.5")
)
ggplot(df3, aes(x = r * 100, y = prob, color = scenario)) +
geom_line(linewidth = 1) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "grey40", linewidth = 0.5) +
geom_point(aes(x = r_lag * 100, y = prob_up),
color = "#D7191C", size = 4, shape = 16,
data = data.frame(r_lag = r_lag, prob_up = prob_up, scenario = "ΔVIX = +1.5")) +
annotate("text", x = r_lag * 100 + 0.15, y = prob_up + 0.04,
label = paste0("Today\nP(Up)=", round(prob_up, 3)),
size = 3.2, color = "#D7191C") +
scale_color_manual(values = c("ΔVIX = 0" = "#2C7BB6", "ΔVIX = +1.5" = "#D7191C"),
name = NULL) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0, 1)) +
labs(title = "Logistic Regression: P(Up) as a Function of Lagged Return",
subtitle = "VIX spike shifts the curve downward — today's prediction is 'Down'",
x = "Lagged return r_{t-1} (%)",
y = "P(Up)") +
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(color = "grey45", size = 9),
legend.position = "bottom",
panel.grid.minor = element_blank())Figure 3b: Logistic probability curve as a function of lagged return, at ΔVIX = 0 (blue) and ΔVIX = +1.5 (red). The dashed line marks the 0.5 threshold. Today’s prediction (red dot) falls below the threshold.
Given information:
Formula:
\[SR_{\text{monthly}} = \frac{\bar{\mu}}{\hat{\sigma}} \qquad SR_{\text{annual}} = SR_{\text{monthly}} \times \sqrt{12}\]
mu_monthly <- 0.70
sig_monthly <- 5.50
T_months <- 48
SR_monthly <- mu_monthly / sig_monthly
SR_annual <- SR_monthly * sqrt(12)
cat("Monthly Sharpe ratio: ", round(SR_monthly, 4), "\n")## Monthly Sharpe ratio: 0.1273
## Annualization factor: sqrt(12) = 3.464
## Annualized Sharpe ratio: 0.4409
\[SR_{\text{monthly}} = \frac{0.70\%}{5.50\%} = 0.1273 \qquad SR_{\text{annual}} = 0.1273 \times \sqrt{12} = 0.4409\]
Scaling factor: \(\sqrt{12}\) — under i.i.d. returns, mean scales by 12 and SD by \(\sqrt{12}\), so their ratio scales by \(\sqrt{12}\).
set.seed(42)
mu_true <- 0.70 / 100
sig_true <- 5.50 / 100
returns <- rnorm(T_months, mean = mu_true, sd = sig_true)
B <- 5000
sr_boot_iid <- numeric(B)
for (b in seq_len(B)) {
s <- sample(returns, size = T_months, replace = TRUE)
sr_boot_iid[b] <- mean(s) / sd(s)
}
se_iid <- sd(sr_boot_iid)
ci_lo_iid <- quantile(sr_boot_iid * sqrt(12), 0.025)
ci_hi_iid <- quantile(sr_boot_iid * sqrt(12), 0.975)
cat("Bootstrap SE (annualized): ", round(se_iid * sqrt(12), 4), "\n")## Bootstrap SE (annualized): 0.5178
## 95% CI (annualized): [ -0.7251 , 1.321 ]
Procedure: (1) Resample with replacement \(B = 5{,}000\) times; (2) compute \(SR^*_b = \bar{r}^* / \hat{\sigma}^*\) each time; (3) \(\widehat{SE}(SR) = \text{std}(\{SR^*_b\})\).
Why i.i.d. bootstrap fails: Monthly returns have volatility clustering and autocorrelation — i.i.d. resampling destroys temporal dependence and underestimates uncertainty.
Fix: The stationary block bootstrap (Politis & Romano, 1994) resamples contiguous blocks of length \(\ell\), preserving within-block serial correlation.
lambda_min <- 0.030; factors_min <- 14
lambda_1se <- 0.065; factors_1se <- 7
cat("lambda_min:", lambda_min, "-> factors retained:", factors_min, "\n")## lambda_min: 0.03 -> factors retained: 14
## lambda_1se: 0.065 -> factors retained: 7
Recommendation: Deploy \(\lambda = 0.065\) (1-SE rule) — 7 factors vs. 14. The 1-SE rule selects the most parsimonious model whose CV error is within one standard error of the minimum, yielding lower transaction costs, less overfitting, and better out-of-sample robustness.
total_months <- 60; initial_train <- 36; test_window <- 6
folds <- data.frame(Fold=integer(), Train_Start=integer(), Train_End=integer(),
Test_Start=integer(), Test_End=integer())
train_end <- initial_train; fold_num <- 1
while ((train_end + test_window) <= total_months) {
folds <- rbind(folds, data.frame(Fold=fold_num, Train_Start=1,
Train_End=train_end, Test_Start=train_end+1, Test_End=train_end+test_window))
train_end <- train_end + test_window; fold_num <- fold_num + 1
}
print(folds, row.names = FALSE)## Fold Train_Start Train_End Test_Start Test_End
## 1 1 36 37 42
## 2 1 42 43 48
## 3 1 48 49 54
## 4 1 54 55 60
Scheme: Expand training window by 6 months each fold; test always comes strictly after training. Random \(k\)-fold CV is unsafe because it allows future data into the training set (look-ahead bias), inflating backtest performance.
sr_ann_boot <- sr_boot_iid * sqrt(12)
sr_ann_point <- SR_annual
df4a <- data.frame(sr = sr_ann_boot)
ggplot(df4a, aes(x = sr)) +
geom_histogram(aes(y = after_stat(density)), bins = 60,
fill = "#ABD9E9", color = "white", linewidth = 0.2) +
geom_density(color = "#2C7BB6", linewidth = 0.9) +
geom_vline(xintercept = sr_ann_point, color = "#D7191C",
linewidth = 1, linetype = "solid") +
geom_vline(xintercept = ci_lo_iid, color = "#E07B39",
linewidth = 0.8, linetype = "dashed") +
geom_vline(xintercept = ci_hi_iid, color = "#E07B39",
linewidth = 0.8, linetype = "dashed") +
annotate("text", x = sr_ann_point + 0.02, y = Inf, vjust = 1.5,
label = paste0("SR = ", round(sr_ann_point, 3)), color = "#D7191C", size = 3.5) +
labs(title = "Bootstrap Distribution of Annualized Sharpe Ratio",
subtitle = paste0("B = 5,000 i.i.d. resamples | 95% CI: [",
round(ci_lo_iid,3), ", ", round(ci_hi_iid,3), "]"),
x = "Annualized Sharpe Ratio", y = "Density") +
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(color = "grey45", size = 9),
panel.grid.minor = element_blank())Figure 4a: Bootstrap distribution of the annualized Sharpe ratio (5,000 draws). The red line marks the point estimate; shaded tails show the 95% CI.
folds$fold_label <- paste0("Fold ", folds$Fold)
df4b <- rbind(
data.frame(Fold = folds$fold_label, xmin = folds$Train_Start,
xmax = folds$Train_End, type = "Train"),
data.frame(Fold = folds$fold_label, xmin = folds$Test_Start,
xmax = folds$Test_End, type = "Test")
)
df4b$Fold <- factor(df4b$Fold, levels = rev(unique(df4b$Fold)))
ggplot(df4b, aes(xmin = xmin, xmax = xmax,
ymin = as.numeric(Fold) - 0.38,
ymax = as.numeric(Fold) + 0.38,
fill = type)) +
geom_rect(color = "white", linewidth = 0.5) +
scale_fill_manual(values = c("Train" = "#2C7BB6", "Test" = "#FDAE61"),
name = "Period") +
scale_x_continuous(breaks = seq(0, total_months, by = 6),
labels = paste0("M", seq(0, total_months, by = 6))) +
scale_y_continuous(breaks = seq_along(levels(df4b$Fold)),
labels = levels(df4b$Fold)) +
labs(title = "Walk-Forward Cross-Validation — Expanding Window",
subtitle = "Training window grows with each fold; test window is always in the future",
x = "Month", y = NULL) +
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(color = "grey45", size = 9),
legend.position = "bottom",
panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank())Figure 4b: Walk-forward fold structure (expanding window). Blue = training periods; orange = test windows. Test data always comes after training data, eliminating look-ahead bias.