Model: \(R_i - R_f = \alpha + \beta(R_m - R_f) + \varepsilon\), estimated over \(n = 96\) months.
| Term | Estimate | Std. Error |
|---|---|---|
| Intercept \(\alpha\) | 0.0017 | 0.0020 |
| Market premium \(\beta\) | 0.98 | 0.17 |
\(R^2 = 0.50\), \(E[R_m - R_f] = 0.70\%\), critical \(|t| \approx 1.98\).
Formula: \[t_{\hat\beta} = \frac{\hat\beta - 0}{SE(\hat\beta)}\]
beta_hat <- 0.98
se_beta <- 0.17
t_beta <- beta_hat / se_beta
cat("t-statistic for beta:", round(t_beta, 4), "\n")
cat("Critical |t|:", 1.98, "\n")
cat("Reject H0: beta = 0?", abs(t_beta) > 1.98, "\n")
## t-statistic for beta: 5.7647
## Critical |t|: 1.98
## Reject H0: beta = 0? TRUE
Result: \(t_{\hat\beta} = 5.7647\). Since \(|t| = 5.7647 > 1.98\), we reject \(H_0: \beta = 0\) at the 5% level.
Economic interpretation of \(\hat\beta = 0.98\): The fund’s excess return moves almost one-for-one with the market premium. A 1% rise in the market excess return is associated with a 0.98% rise in the fund’s excess return, indicating near-market-average systematic sensitivity.
Formula: \[t = \frac{\hat\beta - 1}{SE(\hat\beta)}\]
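t_beta1 <- (beta_hat - 1) / se_beta
cat("t-statistic for H0: beta = 1:", round(t_beta1, 4), "\n")
cat("Reject H0: beta = 1?", abs(t_beta1) > 1.98, "\n")
## t-statistic for H0: beta = 1: -0.1176
## Reject H0: beta = 1? FALSE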
Result: \(t = -0.1176\). Since \(|t| = 0.1176 < 1.98\), we fail to reject \(H_0: \beta = 1\).
Interpretation: The data are consistent with the fund having the same systematic risk as the market. There is no statistically significant evidence that the fund is more or less aggressive than a passive index.
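Both \(\beta\) tests above use the rounded critical value 1.98. The helper below is a minimal sketch that reports the t-statistic, an exact two-sided p-value and the exact critical value. It assumes residual degrees of freedom \(n - k - 1 = 96 - 1 - 1 = 94\). The function name `coef_t_test` is hypothetical and is not from any package.

```r
# Sketch: two-sided t-test of H0: coefficient = null_value.
# Assumption: residual degrees of freedom df = n - k - 1 (n = 96, k = 1 => df = 94).
coef_t_test <- function(estimate, se, null_value = 0, df, level = 0.05) {
  t_stat  <- (estimate - null_value) / se
  p_value <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value
  crit    <- qt(1 - level / 2, df = df)      # exact critical value
  list(t = t_stat, p = p_value, crit = crit, reject = abs(t_stat) > crit)
}

df_capm    <- 96 - 1 - 1
test_beta0 <- coef_t_test(0.98, 0.17, null_value = 0, df = df_capm)
test_beta1 <- coef_t_test(0.98, 0.17, null_value = 1, df = df_capm)
round(c(t = test_beta0$t, p = test_beta0$p, crit = test_beta0$crit), 4)
round(c(t = test_beta1$t, p = test_beta1$p), 4)
```

With 94 degrees of freedom the exact critical value is about 1.986. That is close enough to 1.98 that neither conclusion changes.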
Formula: \[t_{\hat\alpha} = \frac{\hat\alpha - 0}{SE(\hat\alpha)}\]
alpha_hat <- 0.0017
se_alpha <- 0.0020
t_alpha <- alpha_hat / se_alpha
cat("t-statistic for alpha:", round(t_alpha, 4), "\n")
cat("Reject H0: alpha = 0?", abs(t_alpha) > 1.98, "\n")
## t-statistic for alpha: 0.85
## Reject H0: alpha = 0? FALSE
Result: \(t_{\hat\alpha} = 0.85\). Since \(|t| = 0.85 < 1.98\), we fail to reject \(H_0: \alpha = 0\).
Conclusion: Although \(\hat\alpha = 0.0017 > 0\), the estimate is not statistically distinguishable from zero at the 5% level. The marketing team’s claim of “positive risk-adjusted performance” is not statistically justified by these data.
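A confidence interval gives the same conclusion. The sketch below builds a 95% interval for \(\alpha\), assuming 94 residual degrees of freedom as in the t-tests. The interval runs from roughly -0.23% to 0.57% per month, so it contains zero.

```r
# Sketch: 95% confidence interval for alpha (df = 94 assumed, as for the t-tests).
alpha_hat <- 0.0017
se_alpha  <- 0.0020
crit      <- qt(0.975, df = 94)
ci_alpha  <- alpha_hat + c(lower = -1, upper = 1) * crit * se_alpha
round(ci_alpha, 5)   # monthly, in decimal units

# Zero inside the interval <=> failing to reject H0: alpha = 0 at the 5% level
contains_zero <- ci_alpha[["lower"]] < 0 && ci_alpha[["upper"]] > 0
contains_zero
```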
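R2 <- 0.50
cat("Systematic fraction (R²):", R2, "\n")
cat("Diversifiable fraction (1 - R²):", 1 - R2, "\n")
## Systematic fraction (R²): 0.5
## Diversifiable fraction (1 - R²): 0.5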
Interpretation: \(R^2 = 0.50\) means that 50% of the fund’s total return variance is explained by movements in the market factor (systematic risk). The remaining 50% is idiosyncratic (diversifiable) risk — variation unrelated to the market that could be reduced by holding a broader portfolio.
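The split between systematic and idiosyncratic variance can be checked directly. For OLS with an intercept, \(R^2 = \hat\beta^2 \operatorname{Var}(R_m - R_f) / \operatorname{Var}(R_i - R_f)\), and the residual share is \(1 - R^2\). The sketch below verifies this identity on simulated data. The volatilities are hypothetical, chosen so that \(R^2\) lands roughly near 0.5; they are not the fund's data.

```r
# Sketch on SIMULATED data (hypothetical parameters, not the fund's data):
# for OLS with an intercept, R^2 equals beta_hat^2 * Var(mkt) / Var(y).
set.seed(1)
n   <- 96
mkt <- rnorm(n, mean = 0.007, sd = 0.045)   # hypothetical market excess returns
eps <- rnorm(n, mean = 0, sd = 0.045)       # hypothetical idiosyncratic shocks
y   <- 0.0017 + 0.98 * mkt + eps            # fund excess returns

fit <- lm(y ~ mkt)
b   <- coef(fit)[["mkt"]]
systematic_share    <- b^2 * var(mkt) / var(y)
idiosyncratic_share <- var(residuals(fit)) / var(y)

c(R2            = summary(fit)$r.squared,
  systematic    = systematic_share,
  idiosyncratic = idiosyncratic_share)
```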
Formula: \[E[R_i - R_f] = \hat\beta \times E[R_m - R_f]\]
mkt_premium <- 0.70 # percent
capm_E <- beta_hat * mkt_premium
cat("CAPM-implied E[R_i - R_f]:", round(capm_E, 4), "%\n")
## CAPM-implied E[R_i - R_f]: 0.686 %
Result: The CAPM predicts an expected monthly excess return of 0.686% for this fund.
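The 0.686% figure is a point estimate, and \(\hat\beta\) carries sampling error. The sketch below pushes the beta interval \(\hat\beta \pm 1.98\,SE(\hat\beta)\) through the CAPM formula. It assumes the market premium of 0.70% per month is known exactly, which is a simplifying assumption. Under it, the implied monthly excess return ranges from about 0.45% to 0.92%.

```r
# Sketch: how the beta confidence interval carries over to the CAPM-implied premium.
# Assumption: the market premium is treated as known (no estimation error).
beta_hat    <- 0.98
se_beta     <- 0.17
mkt_premium <- 0.70                      # percent per month
crit        <- 1.98                      # critical value used in the text
beta_ci     <- beta_hat + c(lower = -1, upper = 1) * crit * se_beta
capm_E_ci   <- beta_ci * mkt_premium     # percent per month
round(rbind(beta = beta_ci, expected_excess = capm_E_ci), 4)
```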
Model: \(R_i - R_f = \alpha + b \cdot MKT + s \cdot SMB + h \cdot HML + \varepsilon\), \(n = 144\) months.
| Term | Estimate | Std. Error |
|---|---|---|
| Intercept \(\alpha\) | 0.0029 | 0.0018 |
| MKT (\(b\)) | 0.97 | 0.08 |
| SMB (\(s\)) | 0.75 | 0.11 |
| HML (\(h\)) | -0.13 | 0.13 |
\(R^2 = 0.92\), Adjusted \(R^2 = 0.918\). Critical \(|t| \approx 1.98\).
Formula for each: \(t_k = \hat\theta_k / SE(\hat\theta_k)\)
coefs <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML = 0.13)
t_stats <- coefs / ses
significant <- abs(t_stats) > 1.98
results <- data.frame(
Estimate = coefs,
Std.Error = ses,
t_stat = round(t_stats, 4),
Significant_5pct = significant
)
print(results)
##       Estimate Std.Error  t_stat Significant_5pct
## alpha   0.0029    0.0018  1.6111            FALSE
## b_MKT   0.9700    0.0800 12.1250             TRUE
## s_SMB   0.7500    0.1100  6.8182             TRUE
## h_HML  -0.1300    0.1300 -1.0000            FALSE
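Exact p-values sharpen the "significant / not significant" column above. The sketch below assumes residual degrees of freedom \(144 - 3 - 1 = 140\). At that df the exact 5% critical value is about 1.977, which is very close to the 1.98 used in the text.

```r
# Sketch: exact two-sided p-values for the FF3 estimates.
# Assumption: df = n - k - 1 = 144 - 3 - 1 = 140.
coefs   <- c(alpha = 0.0029, b_MKT = 0.97, s_SMB = 0.75, h_HML = -0.13)
ses     <- c(alpha = 0.0018, b_MKT = 0.08, s_SMB = 0.11, h_HML = 0.13)
df_ff3  <- 144 - 3 - 1
t_stats <- coefs / ses
p_vals  <- 2 * pt(-abs(t_stats), df = df_ff3)
data.frame(t_stat = round(t_stats, 4), p_value = signif(p_vals, 3))
```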
Summary:
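cat("SMB loading (s):", coefs[["s_SMB"]], "-> positive => SMALL-cap tilt\n")
cat("HML loading (h):", coefs[["h_HML"]], "-> negative => GROWTH tilt\n")
## SMB loading (s): 0.75 -> positive => SMALL-cap tilt
## HML loading (h): -0.13 -> negative => GROWTH tilt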
Style classification:
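Overall, the fund resembles a small-cap strategy with a possible growth tilt. The small-cap tilt is strongly significant (\(t = 6.82\)), but the HML loading is not (\(t = -1.00\)), so the growth label rests on weak evidence.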
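cat("Alpha:", coefs[["alpha"]], "\n")
cat("t-stat for alpha:", round(t_stats[["alpha"]], 4), "\n")
cat("Significant?", significant[["alpha"]], "\n")
## Alpha: 0.0029
## t-stat for alpha: 1.6111
## Significant? FALSE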
Interpretation: \(\hat\alpha = 0.0029\) corresponds to a monthly risk-adjusted return of approximately 0.29% above what the three factors explain. However, \(t = 1.6111 < 1.98\), so it is not statistically significant at the 5% level. We cannot conclude the manager adds value beyond the passive factor exposures; the positive alpha may be sampling variation.
R2_capm <- 0.75
R2_ff3 <- 0.92
n <- 144
k_capm <- 1 # one predictor
k_ff3 <- 3 # three predictors
adj_R2_capm <- 1 - (1 - R2_capm) * (n - 1) / (n - k_capm - 1)
adj_R2_ff3 <- 1 - (1 - R2_ff3) * (n - 1) / (n - k_ff3 - 1)
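cat("Adjusted R² (CAPM, 1 factor):", round(adj_R2_capm, 4), "\n")
cat("Adjusted R² (FF3, 3 factors):", round(adj_R2_ff3, 4), "\n")
cat("Reported Adjusted R² (FF3):", 0.918, "\n")
## Adjusted R² (CAPM, 1 factor): 0.7482
## Adjusted R² (FF3, 3 factors): 0.9183
## Reported Adjusted R² (FF3): 0.918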
Interpretation of the rise:
The jump from \(R^2 = 0.75\) (CAPM) to \(R^2 = 0.92\) (FF3) shows that SMB and HML together explain an additional 17 percentage points of return variance. The fund’s returns carry size and style exposures that a single market factor misses. Given the t-statistics above, most of this gain likely comes from the strongly significant SMB loading rather than from HML.
Why Adjusted R²:
Adding any predictor, even a useless one, can never lower \(R^2\), so \(R^2\) mechanically favours larger models. The adjusted metric penalises each extra parameter: \[\bar{R}^2 = 1 - \frac{(1-R^2)(n-1)}{n - k - 1}\] where \(k\) is the number of predictors. Adding one predictor raises \(\bar{R}^2\) only if that predictor's \(|t|\) exceeds 1, that is, only if the gain in fit outweighs the lost degree of freedom. Because the two models use different numbers of predictors, adjusted \(R^2\) gives the fairer apples-to-apples comparison (here \(\bar{R}^2 = 0.918\) for FF3 vs \(\approx 0.748\) for CAPM).
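The two claims above can be checked on simulated data: plain \(R^2\) never falls when a regressor is added, and \(\bar{R}^2\) rises exactly when the new regressor's \(|t| > 1\). The data below are hypothetical, and `adj_r2` is an illustrative helper that reproduces `lm()`'s `adj.r.squared`.

```r
# Sketch on SIMULATED data (hypothetical): adding a pure-noise regressor
# cannot lower R^2, and adj_r2() reproduces lm()'s adjusted R^2.
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

set.seed(7)
n     <- 144
mkt   <- rnorm(n)
noise <- rnorm(n)                      # regressor unrelated to y
y     <- 0.5 * mkt + rnorm(n)

fit1 <- summary(lm(y ~ mkt))
fit2 <- summary(lm(y ~ mkt + noise))
t_noise <- fit2$coefficients["noise", "t value"]

c(R2_1    = fit1$r.squared,
  R2_2    = fit2$r.squared,
  adjR2_1 = adj_r2(fit1$r.squared, n, 1),
  adjR2_2 = adj_r2(fit2$r.squared, n, 2))
abs(t_noise) > 1   # TRUE exactly when adjusted R^2 went up
```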
Model: \[\text{logit}\,P(\text{Up}) = \beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta VIX_{t-1}\]
Coefficients: \(\beta_0 = -0.02\), \(\beta_1 = 5.4\), \(\beta_2 = -0.38\).
Today’s inputs: \(r_{t-1} = 0.010\), \(\Delta VIX_{t-1} = 1.5\).
Formula: \[\hat{P}(\text{Up}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta VIX_{t-1})}}\]
b0 <- -0.02; b1 <- 5.4; b2 <- -0.38
r_lag <- 0.010
dVIX <- 1.5
log_odds <- b0 + b1 * r_lag + b2 * dVIX
prob_up <- 1 / (1 + exp(-log_odds))
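cat("Log-odds (linear combination):", round(log_odds, 4), "\n")
cat("Predicted P(Up):", round(prob_up, 4), "\n")
cat("Predicted class (threshold 0.5):", ifelse(prob_up >= 0.5, "Up", "Down"), "\n")
## Log-odds (linear combination): -0.536
## Predicted P(Up): 0.3691
## Predicted class (threshold 0.5): Down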
Result: \(\hat{P}(\text{Up}) = 0.3691\). At the 0.5 threshold the model predicts Down.
\(\beta_1 = 5.4 > 0\) (lagged return):
A positive lagged return increases the log-odds of an “Up” day. This is consistent with short-term momentum or trend-following behaviour at the daily frequency: if yesterday’s market was up, today’s market is more likely to be up.
\(\beta_2 = -0.38 < 0\) (\(\Delta VIX_{t-1}\)):
An increase in the VIX (rising fear or volatility) lowers the log-odds of an “Up” day on the following day. This captures risk-off sentiment: when implied volatility spikes, market participants are typically experiencing or anticipating losses, which makes a positive day less likely. The sign matches the well-documented negative contemporaneous correlation between VIX changes and equity returns; here it is used one day ahead.
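Exponentiating the coefficients turns them into odds multipliers. The sketch assumes \(r_{t-1}\) is in decimal units (the input 0.010 means 1%) and \(\Delta VIX\) is in index points. Under those assumptions, a 1-percentage-point higher lagged return multiplies the odds of "Up" by about 1.055. A 1-point rise in lagged \(\Delta VIX\) multiplies them by about 0.684. The last line uses base R's `plogis()` to reproduce the predicted probability computed above.

```r
# Sketch: odds-ratio reading of the logit coefficients.
# Assumptions: r_{t-1} in decimal units (0.01 = 1 pp); dVIX in index points.
b0 <- -0.02; b1 <- 5.4; b2 <- -0.38
or_return <- exp(b1 * 0.01)   # odds multiplier for +1 pp lagged return
or_vix    <- exp(b2 * 1)      # odds multiplier for +1 point in lagged dVIX
round(c(or_return = or_return, or_vix = or_vix), 4)

# plogis() is base R's logistic CDF, 1 / (1 + exp(-x))
prob_up <- plogis(b0 + b1 * 0.010 + b2 * 1.5)
round(prob_up, 4)
```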
TP <- 67; FP <- 44; FN <- 33; TN <- 56
N <- TP + FP + FN + TN
accuracy <- (TP + TN) / N
sensitivity <- TP / (TP + FN) # True Positive Rate for "Up"
specificity <- TN / (TN + FP) # True Negative Rate
precision <- TP / (TP + FP) # Positive Predictive Value
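cat("Accuracy:", round(accuracy, 4), "\n")
cat("Sensitivity:", round(sensitivity, 4), "\n")
cat("Specificity:", round(specificity, 4), "\n")
cat("Precision:", round(precision, 4), "\n")
## Accuracy: 0.615
## Sensitivity: 0.67
## Specificity: 0.56
## Precision: 0.6036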
| Metric | Formula | Value |
|---|---|---|
| Accuracy | \((TP+TN)/N\) | 0.615 |
| Sensitivity | \(TP/(TP+FN)\) | 0.67 |
| Specificity | \(TN/(TN+FP)\) | 0.56 |
| Precision | \(TP/(TP+FP)\) | 0.6036 |
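In practice the counts come from vectors of predicted and actual labels rather than being typed in. The sketch below rebuilds such vectors from the reported counts (TP = 67, FN = 33, FP = 44, TN = 56). It then cross-tabulates them with `table()` and recomputes the four metrics in the table.

```r
# Sketch: rebuild the confusion matrix from label vectors, then compute the metrics.
# The vectors are reconstructed from the reported counts (TP=67, FN=33, FP=44, TN=56).
actual    <- factor(rep(c("Up", "Up", "Down", "Down"), times = c(67, 33, 44, 56)),
                    levels = c("Up", "Down"))
predicted <- factor(rep(c("Up", "Down", "Up", "Down"), times = c(67, 33, 44, 56)),
                    levels = c("Up", "Down"))
cm <- table(Predicted = predicted, Actual = actual)
cm

TP <- cm["Up", "Up"];   FP <- cm["Up", "Down"]
FN <- cm["Down", "Up"]; TN <- cm["Down", "Down"]
metrics <- c(accuracy    = (TP + TN) / sum(cm),
             sensitivity = TP / (TP + FN),
             specificity = TN / (TN + FP),
             precision   = TP / (TP + FP))
round(metrics, 4)
```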
# The dataset has 100 Up and 100 Down days: perfectly balanced
majority_class <- "Up or Down (tie)"
naive_accuracy <- max(100, 100) / N
cat("Naive classifier accuracy:", round(naive_accuracy, 4), "\n")
cat("Logistic model accuracy:", round(accuracy, 4), "\n")
cat("Model beats naive?", accuracy > naive_accuracy, "\n")
## Naive classifier accuracy: 0.5
## Logistic model accuracy: 0.615
## Model beats naive? TRUE
Result: With 100 “Up” and 100 “Down” days the majority class is a tie at 50%. The logistic model achieves 61.5% accuracy, which beats the naive baseline of 50%.
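One way to check whether 123 correct calls out of 200 could arise by chance against a 50% baseline is an exact one-sided binomial test. This is a sketch: it assumes the 200 daily outcomes are independent, which daily market data satisfy only approximately. The resulting p-value is well below 0.01.

```r
# Sketch: is 123/200 correct calls distinguishable from the 50% naive baseline?
# Assumes days are independent, which daily returns only approximately satisfy.
correct <- 67 + 56
n_days  <- 200
bt <- binom.test(correct, n_days, p = 0.5, alternative = "greater")
bt$p.value
```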
Why accuracy alone is inadequate for a trading system:
Setup: \(n = 48\) monthly returns, \(\bar{r} = 0.70\%\), \(\hat\sigma = 5.50\%\).
Monthly Sharpe ratio (\(\bar{r}\) is already the mean monthly return in excess of \(R_f\)): \[SR_{monthly} = \frac{\bar{r}}{\hat\sigma}\]
Annualized Sharpe ratio (scaling by \(\sqrt{12}\), since there are 12 months per year): \[SR_{annual} = SR_{monthly} \times \sqrt{12}\]
r_bar <- 0.70 # percent
sigma <- 5.50 # percent
SR_monthly <- r_bar / sigma
SR_annual <- SR_monthly * sqrt(12)
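cat("Monthly Sharpe ratio:", round(SR_monthly, 4), "\n")
cat("Scaling factor: sqrt(12) =", round(sqrt(12), 3), "\n")
cat("Annualized Sharpe ratio:", round(SR_annual, 4), "\n")
## Monthly Sharpe ratio: 0.1273
## Scaling factor: sqrt(12) = 3.464
## Annualized Sharpe ratio: 0.4409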
Results: \(SR_{monthly} = 0.1273\); \(SR_{annual} = 0.4409\).
The scaling factor is \(\sqrt{12}\) because if monthly returns are i.i.d., the annualised mean scales by 12 and the annualised standard deviation scales by \(\sqrt{12}\), so their ratio scales by \(12/\sqrt{12} = \sqrt{12}\).
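Under the same i.i.d. assumption, a common closed-form approximation gives the Sharpe ratio's standard error. For i.i.d. normal returns, Lo (2002, "The Statistics of Sharpe Ratios") gives \(SE(\widehat{SR}) \approx \sqrt{(1 + SR^2/2)/n}\). The sketch applies it with \(n = 48\). It gives about 0.145 monthly, or about 0.50 annualised. So the annualised Sharpe of 0.44 lies less than one standard error from zero.

```r
# Sketch: i.i.d.-normal approximation to the standard error of the Sharpe ratio
# (Lo, 2002): SE(SR) ~ sqrt((1 + SR^2 / 2) / n). Assumes i.i.d. normal returns.
n             <- 48
SR_monthly    <- 0.70 / 5.50
se_sr_monthly <- sqrt((1 + SR_monthly^2 / 2) / n)
# Under the same i.i.d. assumption, the annualised SR and its SE both scale by sqrt(12)
se_sr_annual  <- se_sr_monthly * sqrt(12)
round(c(SR_monthly = SR_monthly, SE_monthly = se_sr_monthly,
        SR_annual  = SR_monthly * sqrt(12), SE_annual = se_sr_annual), 4)
```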
Step-by-step bootstrap procedure:
set.seed(42)
# Simulate 48 monthly returns for illustration
r_sim <- rnorm(48, mean = 0.007, sd = 0.055)
B <- 10000
SR_boot <- numeric(B)
for (b in seq_len(B)) {
# Step 1: Draw n = 48 observations WITH replacement
r_star <- sample(r_sim, size = length(r_sim), replace = TRUE)
# Step 2: Compute the Sharpe ratio on the resample
SR_boot[b] <- mean(r_star) / sd(r_star)
}
# Step 3: SE = standard deviation of bootstrap distribution
SE_SR <- sd(SR_boot)
cat("Bootstrap SE of monthly Sharpe (simulated data):", round(SE_SR, 4), "\n")
cat("95% CI: [", round(quantile(SR_boot, 0.025), 4),
    ",", round(quantile(SR_boot, 0.975), 4), "]\n")
## Bootstrap SE of monthly Sharpe (simulated data): 0.149
## 95% CI: [ -0.2097 , 0.3765 ]
Steps described:
Why i.i.d. bootstrap is inappropriate:
Monthly financial returns exhibit serial dependence: autocorrelation in volatility (GARCH effects), momentum, and mean-reversion. The i.i.d. bootstrap destroys this temporal dependence by resampling individual months independently. As a result it can understate or overstate the true sampling variability of the Sharpe ratio.
Fix: block bootstrap (e.g., stationary or circular block bootstrap):
Draw contiguous blocks of \(\ell\) consecutive months (e.g., \(\ell = 3\)–6) with replacement, preserving local autocorrelation. The stationary block bootstrap (Politis & Romano 1994) draws random block lengths from a geometric distribution with mean \(\ell\). This makes the resampled series stationary while retaining short-range serial structure.
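The sketch below implements the stationary block bootstrap by hand in base R, applied to the same simulated series as the i.i.d. bootstrap code (seed 42). At each step the resample starts a new block with probability \(1/\ell\); otherwise it continues the current block, wrapping circularly at the end of the sample. The mean block length \(\ell = 4\) and the helper name `stationary_boot_index` are assumptions made for illustration.

```r
# Sketch of the stationary block bootstrap (Politis & Romano, 1994) for the
# monthly Sharpe ratio. Mean block length ell = 4 is an assumption.
stationary_boot_index <- function(n, ell) {
  p   <- 1 / ell                          # probability of starting a new block
  idx <- integer(n)
  idx[1] <- sample.int(n, 1)
  for (i in 2:n) {
    if (runif(1) < p) {
      idx[i] <- sample.int(n, 1)          # jump to a random new block start
    } else {
      idx[i] <- idx[i - 1] %% n + 1       # continue the block, wrapping circularly
    }
  }
  idx
}

set.seed(42)
r_sim <- rnorm(48, mean = 0.007, sd = 0.055)   # same simulated series as the i.i.d. example
B <- 5000
SR_block <- replicate(B, {
  r_star <- r_sim[stationary_boot_index(length(r_sim), ell = 4)]
  mean(r_star) / sd(r_star)
})
sd(SR_block)   # block-bootstrap SE of the monthly Sharpe
```

Block lengths here follow a geometric distribution with mean \(\ell\). A larger \(\ell\) preserves longer-range dependence but leaves fewer effectively independent blocks in a 48-month sample.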
lambda_min <- 0.030; factors_min <- 14
lambda_1se <- 0.065; factors_1se <- 7
cat("lambda_min: factors retained =", factors_min, "\n")
cat("lambda_1se: factors retained =", factors_1se, "\n")
## lambda_min: factors retained = 14
## lambda_1se: factors retained = 7
Recommendation: deploy \(\lambda_{1SE} = 0.065\) (7-factor model).
Rationale:
Parsimony and out-of-sample stability: \(\lambda_{min}\) minimises the cross-validated error but retains 14 factors, twice as many as the 1-SE model. Selecting 14 of 60 candidate factors raises the risk of fitting noise. The 1-SE rule selects the most regularised model whose CV error is within one standard error of the minimum. It accepts a CV error statistically indistinguishable from the minimum in exchange for a sparser, lower-variance model.
Overfitting risk in finance: Financial factor data are collinear and noisy. Extra factors typically capture spurious patterns in the training window that do not persist out-of-sample (a well-documented problem in factor zoo research).
Transaction costs and implementation: Fewer factors mean fewer positions, lower rebalancing costs, and a strategy that is easier to manage and explain.
The 1-SE solution at \(\lambda = 0.065\) retaining 7 factors is the more robust and deployable choice.
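The 1-SE rule itself is a short computation over the CV curve. In `glmnet`, `cv.glmnet()` reports the two choices as `lambda.min` and `lambda.1se`. The sketch below applies the rule by hand to a hypothetical CV curve. The grid, errors and standard errors are illustrative numbers chosen to land on the two \(\lambda\) values above; they are not the exam's actual CV output.

```r
# Sketch of the 1-SE rule on a HYPOTHETICAL CV curve (illustrative numbers,
# not the exam's data): choose the largest lambda whose CV error is within
# one standard error of the minimum.
lambda  <- c(0.010, 0.020, 0.030, 0.045, 0.065, 0.090, 0.120)
cv_mean <- c(1.080, 1.050, 1.040, 1.045, 1.065, 1.110, 1.180)  # hypothetical
cv_se   <- rep(0.030, length(lambda))                          # hypothetical

i_min      <- which.min(cv_mean)
threshold  <- cv_mean[i_min] + cv_se[i_min]
lambda_min <- lambda[i_min]
lambda_1se <- max(lambda[cv_mean <= threshold])   # larger lambda = sparser model
c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```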
Walk-forward scheme:
| Fold | Train | Validate |
|---|---|---|
| 1 | t1–t24 | t25–t30 |
| 2 | t1–t30 | t31–t36 |
| 3 | t1–t36 | t37–t42 |
| 4 | t1–t42 | t43–t48 |
| 5 | t1–t48 | t49–t54 |
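The expanding-window folds can be generated programmatically. The sketch assumes an initial window of 24 months, a 6-month validation horizon and 5 folds, read off the scheme above. By construction each fold trains only on months strictly before its validation window, so there is no look-ahead. The helper name `walk_forward_folds` is hypothetical, and the labels use a plain hyphen to keep the source ASCII.

```r
# Sketch: generate expanding-window (walk-forward) folds.
# Assumed parameters: initial window 24 months, validation horizon 6, 5 folds.
walk_forward_folds <- function(initial = 24, horizon = 6, n_folds = 5) {
  train_end <- initial + horizon * (seq_len(n_folds) - 1)
  data.frame(
    Fold     = seq_len(n_folds),
    Train    = paste0("t1-t", train_end),
    Validate = paste0("t", train_end + 1, "-t", train_end + horizon)
  )
}
folds <- walk_forward_folds()
folds
```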
Step-by-step procedure:
Why standard random k-fold cross-validation is unsafe:
End of examination.