Machine Learning Applications in Finance: Final Examination

Question 1. Single-Factor (Market) Model
Question 2. Fama–French Three-Factor Model
Question 3. Logistic Regression for Market Direction
Question 4. Resampling and Regularization in a Backtest

Question 1. Single-Factor (Market) Model

The single-factor market model equation is specified as:

\[R_i - R_f = \alpha + \beta(R_m - R_f) + \varepsilon\]

Given parameters over 96 months:
* \(\alpha\) estimate = 0.0017, Std. Error = 0.0020
* \(\beta\) estimate = 0.98, Std. Error = 0.17
* \(R^2\) = 0.50
* \(E[R_m - R_f]\) = 0.70%
* Critical \(|t|\) \(\approx\) 1.98

(a) Compute the t-statistic for \(\beta\) and test \(H_0: \beta = 0\)

\[t_{\beta} = \frac{\text{Estimate}}{\text{Std. Error}} = \frac{0.98}{0.17} = 5.7647\]

Statistical Test: Since the calculated \(|t_{\beta}| = 5.7647\) is strictly greater than the critical value of \(1.98\), we reject the null hypothesis \(H_0: \beta = 0\) at the 5% significance level. The market premium coefficient is statistically significant.
Economic Interpretation: A beta (\(\beta\)) of 0.98 indicates that the fund has a close-to-unity exposure to market systematic risk. For every 1% increase (or decrease) in the market excess return, the fund’s excess return is expected to increase (or decrease) by 0.98%.

(b) Test \(H_0: \beta = 1\)

\[t_{(\beta=1)} = \frac{0.98 - 1}{0.17} = \frac{-0.02}{0.17} = -0.1176\]

Statistical Test: Since the absolute calculated \(|t_{(\beta=1)}| = 0.1176\) is less than the critical value of \(1.98\), we fail to reject the null hypothesis \(H_0: \beta = 1\) at the 5% significance level.
Economic Conclusion: The fund’s systematic risk is statistically indistinguishable from the overall market’s systematic risk.

(c) Compute the t-statistic for \(\alpha\) (Jensen’s alpha)

\[t_{\alpha} = \frac{0.0017}{0.0020} = 0.8500\]

Conclusion: Since \(|t_{\alpha}| = 0.8500 < 1.98\), we fail to reject \(H_0: \alpha = 0\) at the 5% level. The marketing team’s claim of ‘positive risk-adjusted performance’ is not statistically justified. While the point estimate is positive (0.0017), it is statistically indistinguishable from zero, meaning the observed outperformance could simply be due to random chance.
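The three hypothesis tests in parts (a) to (c) all use the same ratio, (estimate minus hypothesized value) divided by the standard error. A minimal sketch that reproduces them is shown here; the helper name `t_stat` and the dictionary layout are illustrative choices, not part of the exam.

```python
def t_stat(estimate, std_error, null_value=0.0):
    """t-statistic for H0: coefficient == null_value."""
    return (estimate - null_value) / std_error

CRITICAL_T = 1.98  # two-sided 5% critical value given in the question

# Estimates and standard errors are the ones stated in Question 1.
tests = {
    "beta = 0": t_stat(0.98, 0.17),
    "beta = 1": t_stat(0.98, 0.17, null_value=1.0),
    "alpha = 0": t_stat(0.0017, 0.0020),
}

for null, t in tests.items():
    decision = "reject" if abs(t) > CRITICAL_T else "fail to reject"
    print(f"H0: {null}: t = {t:.4f} -> {decision}")
```

Only the first null hypothesis is rejected, matching the conclusions in (a), (b), and (c).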

(d) Interpret \(R^2\)

Systematic Variation (\(R^2\)): 0.50 (or 50%). About half of the variation in the fund’s excess returns is explained by its exposure to the market excess return.
Diversifiable Variation (\(1 - R^2\)): \(1 - 0.50 = 0.50\) (or 50%). The remaining 50% of the return variance is driven by idiosyncratic, fund-specific factors that can be removed through diversification.

(e) Compute the CAPM-implied expected monthly excess return

\[E[R_i - R_f] = \beta \times E[R_m - R_f] = 0.98 \times 0.70\% = 0.6860\% \text{ (or } 0.00686\text{)}\]

The CAPM-implied expected monthly excess return for the fund is 0.6860%.

Question 2. Fama–French Three-Factor Model

The model equation is specified as:

\[R_i - R_f = \alpha + b \cdot \text{MKT} + s \cdot \text{SMB} + h \cdot \text{HML} + \varepsilon\]

Given parameters over 144 months:
* Critical \(|t|\) = 1.98
* \(R^2\) = 0.92, Adjusted \(R^2\) = 0.918

(f) Compute the t-statistic for each coefficient

Intercept (\(\alpha\)): \(t = \frac{0.0029}{0.0018} = 1.6111 \rightarrow\) Not Significant (\(|t| < 1.98\))
MKT (\(b\)): \(t = \frac{0.97}{0.08} = 12.1250 \rightarrow\) Significant (\(|t| \ge 1.98\))
SMB (\(s\)): \(t = \frac{0.75}{0.11} = 6.8182 \rightarrow\) Significant (\(|t| \ge 1.98\))
HML (\(h\)): \(t = \frac{-0.13}{0.13} = -1.0000 \rightarrow\) Not Significant (\(|t| < 1.98\))
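The same calculation can be run for all four coefficients at once. This is a small sketch using the estimates and standard errors that appear in the ratios above; the variable names are illustrative.

```python
CRITICAL_T = 1.98  # two-sided 5% critical value given in the question

# (estimate, standard error) pairs taken from the ratios in part (f).
coefficients = {
    "alpha": (0.0029, 0.0018),
    "MKT": (0.97, 0.08),
    "SMB": (0.75, 0.11),
    "HML": (-0.13, 0.13),
}

results = {}
for name, (estimate, std_error) in coefficients.items():
    t = estimate / std_error
    significant = abs(t) >= CRITICAL_T
    results[name] = (t, significant)
    print(f"{name:5s} t = {t:8.4f}  significant: {significant}")
```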

(g) Classify the fund’s investment style

Size Tilt: The SMB coefficient (\(s = 0.75\)) is positive and highly statistically significant (\(t = 6.8182\)). This indicates a strong small-cap tilt (the portfolio holds significant exposure to small-cap stocks).
Value/Growth Tilt: The HML coefficient (\(h = -0.13\)) is negative, suggesting a minor growth tilt (exposure to low book-to-market assets). However, because it is statistically insignificant (\(t = -1.0000\)), the fund’s value/growth stance cannot be reliably distinguished from a neutral market blend profile.

(h) Interpret the intercept (\(\alpha\))

The intercept (\(\alpha = 0.0029\)) represents the risk-adjusted monthly abnormal return after controlling for market, size, and style risk premiums. Because the intercept is statistically insignificant (\(t = 1.6111 < 1.98\)), the manager does not add statistically verifiable value beyond basic factor exposures. The historical excess performance is explained by systematic factor loadings rather than stock-picking skill.

(i) Interpret the rise in \(R^2\) vs Adjusted \(R^2\)

Interpretation of the Rise: Moving from a single-factor CAPM (\(R^2 = 0.75\)) to the Three-Factor model (\(R^2 = 0.92\)) shows that adding the size (SMB) and value (HML) factors explains an additional 17 percentage points of the total variance in the fund’s returns.
Why Adjusted \(R^2\) is Used: Standard \(R^2\) never decreases when new predictors are added, even if they are pure noise. Adjusted \(R^2\) penalizes the degrees of freedom spent on additional parameters. Because the Adjusted \(R^2\) (0.918) remains very close to the multiple \(R^2\) (0.92), the extra parameters appear to provide genuine explanatory power rather than a mechanical increase in fit.
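As a check on the penalty, the standard adjustment \(1 - (1 - R^2)\frac{n - 1}{n - k - 1}\) can be evaluated with \(n = 144\) months and \(k = 3\) factors. The sketch below assumes this textbook formula is the one behind the reported figure; the function name is illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R^2 and sample size are from Question 2; k = 3 counts MKT, SMB, HML.
adj = adjusted_r2(r2=0.92, n_obs=144, n_predictors=3)
print(f"Adjusted R^2 = {adj:.4f}")
```

With 144 observations and only three regressors the penalty is small, which is why the result agrees with the reported 0.918 to three decimals.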

Question 3. Logistic Regression for Market Direction

The logit specification is given by:

\[\text{logit } P(\text{Up}) = \beta_0 + \beta_1 \cdot (r_{t-1}) + \beta_2 \cdot (\Delta \text{VIX}_{t-1})\]

Given inputs:
* \(\beta_0 = -0.02\), \(\beta_1 = 5.4\), \(\beta_2 = -0.38\)
* Today’s inputs: \(r_{t-1} = 0.010\), \(\Delta \text{VIX}_{t-1} = 1.5\)

(j) Compute the predicted probability of an “Up” day

\[\text{Log-odds } (z) = -0.02 + 5.4 \times (0.010) + (-0.38) \times (1.5) = -0.5360\]

\[P(\text{Up}) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{0.5360}} = \frac{1}{1 + 1.7092} = 0.3691\]

Classification: Since the predicted probability \(P(\text{Up}) = 0.3691\) is strictly below the classification threshold of \(0.5\), the predicted class for today is Down.
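The log-odds and probability can be verified numerically. This is a minimal sketch using the coefficients and inputs given in the question; the 0.5 cutoff is the threshold used in the classification step, and the function name `sigmoid` is an illustrative choice.

```python
import math

def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients and today's inputs as given in Question 3.
b0, b1, b2 = -0.02, 5.4, -0.38
r_lag, dvix_lag = 0.010, 1.5

z = b0 + b1 * r_lag + b2 * dvix_lag
p_up = sigmoid(z)
label = "Up" if p_up >= 0.5 else "Down"

print(f"log-odds = {z:.4f}, P(Up) = {p_up:.4f}, predicted class = {label}")
```

The VIX term (\(-0.57\)) dominates the lagged-return term (\(+0.054\)), which is what pushes the prediction to Down.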

(k) Economically interpret the signs of \(\beta_1\) and \(\beta_2\)

\(\beta_1 = 5.4\) (Positive): Captures market momentum. A positive return yesterday shifts the log-odds upward, boosting the likelihood that the market will continue to go up today.
\(\beta_2 = -0.38\) (Negative): Captures the implied volatility/fear effect. Because the VIX tends to move inversely with equity prices, a rise in yesterday’s VIX signals elevated market uncertainty and lowers the model’s estimated probability of an upward return today.

(l) Confusion Matrix Metrics

Given: \(TP = 67\), \(FP = 44\), \(FN = 33\), \(TN = 56\), \(\text{Total} = 200\)

Accuracy = \(\frac{67 + 56}{200} = 0.6150\)
Sensitivity = \(\frac{67}{67 + 33} = 0.6700\)
Specificity = \(\frac{56}{56 + 44} = 0.5600\)
Precision = \(\frac{67}{67 + 44} = 0.6036\)
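The four metrics follow directly from the confusion-matrix counts. A short sketch, using the counts given above and illustrative variable names:

```python
# Confusion-matrix counts as given in part (l).
tp, fp, fn, tn = 67, 44, 33, 56
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
precision = tp / (tp + fp)     # share of predicted "Up" days that were Up

for name, value in [("Accuracy", accuracy), ("Sensitivity", sensitivity),
                    ("Specificity", specificity), ("Precision", precision)]:
    print(f"{name:12s} {value:.4f}")
```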

(m) Naive Rule Performance and System Evaluation

Naive Rule: The dataset contains 100 actual ‘Up’ days (\(TP + FN\)) and 100 actual ‘Down’ days (\(FP + TN\)). With balanced classes, a baseline that always predicts a single class achieves an accuracy of \(\frac{100}{200} = 0.5000\). The model’s accuracy of \(0.6150\) beats this benchmark.
Why Accuracy is Inadequate:
1. Asymmetric Costs: Accuracy treats False Positives (buying an asset that then falls) and False Negatives (missing a gain) identically, although their financial costs differ.
2. Ignores Return Magnitudes: Accuracy counts correct signs, not profit and loss. A system could be 80% accurate and still go bankrupt if the 20% of errors occur during a major crash.
Economic Criteria: Sharpe Ratio, Information Ratio, Maximum Drawdown, or Profit Factor.

Question 4. Resampling and Regularization in a Backtest

Given performance figures over 48 months:
* Monthly Mean = 0.70%, Monthly SD = 5.50%

(n) Compute the Annualized Sharpe Ratio

\[\text{Monthly Sharpe Ratio} = \frac{0.70\%}{5.50\%} = 0.1273\]

\[\text{Annualized Sharpe Ratio} = 0.1273 \times \sqrt{12} = 0.4409\]

The scaling factor is \(\sqrt{12}\): under i.i.d. returns the mean and the variance both scale linearly with time (by 12), so the standard deviation scales by \(\sqrt{12}\) and the ratio by \(12 / \sqrt{12} = \sqrt{12}\).
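A quick numerical check of the annualization, written as a sketch with illustrative variable names. It computes the ratio both ways, scaling the monthly Sharpe ratio directly and annualizing the mean and standard deviation separately, to show the two routes agree.

```python
import math

# Monthly figures as given in Question 4, in decimal form.
monthly_mean = 0.0070
monthly_sd = 0.0550
periods_per_year = 12

monthly_sharpe = monthly_mean / monthly_sd
annualized_sharpe = monthly_sharpe * math.sqrt(periods_per_year)

# Same result from annualizing numerator and denominator separately.
annual_mean = monthly_mean * periods_per_year
annual_sd = monthly_sd * math.sqrt(periods_per_year)
check = annual_mean / annual_sd

print(f"monthly = {monthly_sharpe:.4f}, annualized = {annualized_sharpe:.4f}")
```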

(o) Bootstrap Procedure

1. Collect the original historical series of 48 monthly returns.
2. Randomly draw 48 observations with replacement.
3. Calculate the annualized Sharpe ratio for this bootstrap sample.
4. Repeat steps 2 and 3 \(B\) times (e.g., \(B = 10,000\)).
5. Compute the standard deviation across all \(B\) estimates to obtain the Bootstrap Standard Error.

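Inappropriateness: Financial returns display serial dependence (for example, volatility clustering). The standard i.i.d. bootstrap resamples individual observations independently, which destroys this time structure and tends to misstate, typically understate, the sampling uncertainty of the Sharpe ratio.
The Fix: A Block Bootstrap (Moving Block or Stationary Bootstrap) should be used, because it resamples contiguous blocks of returns and so preserves short-range dependence within each block.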
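The two resampling schemes can be compared side by side. The sketch below is illustrative only: the 48 returns are simulated placeholders (not the exam's series), the block length of 6 and \(B = 2000\) are arbitrary assumptions, and the function names are hypothetical. It uses only the Python standard library.

```python
import math
import random
import statistics

def annualized_sharpe(returns, periods_per_year=12):
    """Mean over sample SD, scaled by sqrt(periods per year)."""
    return (statistics.fmean(returns) / statistics.stdev(returns)
            * math.sqrt(periods_per_year))

def iid_resample(returns, rng):
    """Draw len(returns) single observations with replacement."""
    n = len(returns)
    return [returns[rng.randrange(n)] for _ in range(n)]

def moving_block_resample(returns, block_len, rng):
    """Concatenate randomly chosen contiguous blocks, then trim to length n."""
    n = len(returns)
    sample = []
    while len(sample) < n:
        start = rng.randrange(n - block_len + 1)
        sample.extend(returns[start:start + block_len])
    return sample[:n]

def bootstrap_se(returns, resample, n_boot=2000):
    """Standard deviation of the statistic across bootstrap samples."""
    estimates = [annualized_sharpe(resample(returns)) for _ in range(n_boot)]
    return statistics.stdev(estimates)

rng = random.Random(0)
# Placeholder data: simulated monthly returns, NOT the exam's actual series.
returns = [rng.gauss(0.007, 0.055) for _ in range(48)]

se_iid = bootstrap_se(returns, lambda r: iid_resample(r, rng))
se_block = bootstrap_se(returns, lambda r: moving_block_resample(r, 6, rng))
print(f"i.i.d. bootstrap SE: {se_iid:.3f}")
print(f"block bootstrap SE:  {se_block:.3f}")
```

Because the placeholder returns are themselves i.i.d., the two standard errors should be broadly similar here; the schemes are expected to diverge on real series with serial dependence. The block length is a tuning choice that trades off preserved dependence against the number of distinct blocks.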

(p) Regularization Parameter Selection (\(\lambda\))

Decision: Deploy \(\lambda = 0.065\) (the 1-SE rule).
Reasoning: The 1-SE rule selects the most parsimonious (simplest) model within one standard error of the minimum CV error. Financial markets are prone to data-mining bias. A model keeping 14 factors will likely overfit noise. Choosing \(\lambda = 0.065\) cuts the factor footprint down to 7, decreasing model variance and increasing out-of-sample durability.
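The selection logic of the 1-SE rule can be sketched as follows. The cross-validation grid below is entirely hypothetical (including the location of the minimum) and is constructed only so that the rule lands on \(\lambda = 0.065\) as in the answer; the function name is also illustrative.

```python
def one_se_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda at minimum CV error, largest lambda within 1 SE of it)."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    # A larger lambda means a stronger penalty, hence a sparser model.
    return lambdas[i_min], max(eligible)

# HYPOTHETICAL CV results for illustration; not taken from the exam.
lambdas = [0.010, 0.020, 0.040, 0.065, 0.100]
cv_mean = [1.30, 1.20, 1.22, 1.26, 1.45]
cv_se = [0.08, 0.07, 0.07, 0.08, 0.10]

lam_min, lam_1se = one_se_lambda(lambdas, cv_mean, cv_se)
print(f"lambda_min = {lam_min}, lambda_1se = {lam_1se}")
```

In this made-up grid the threshold is \(1.20 + 0.07 = 1.27\), so every \(\lambda\) up to 0.065 qualifies and the largest of them is chosen.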

(q) Walk-Forward Evaluation vs K-Fold CV

Partitioning: Arrange data chronologically. Establish an initial training window and testing block.
Estimation: Fit Lasso regression and pick variables solely inside the training window.
Testing: Run strategy parameters on the immediate following out-of-sample forward block.
Forward Roll: Shift the timeline forward, re-optimize parameters, and trade the next block.
Aggregation: Splice all out-of-sample blocks together to compute clean performance metrics.

Why K-Fold CV is Unsafe: Standard k-fold cross-validation ignores time ordering. Whether or not the rows are shuffled, most validation folds are predicted by a model trained partly on later observations. This lets future data inform forecasts of the past, generating look-ahead bias and artificially strong backtest performance.
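The walk-forward steps above can be sketched as an index generator. The window sizes (24 training months, 6 test months) are assumptions chosen to fit a 48-month sample, a rolling rather than expanding window is used, and the function name is hypothetical.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges; training always precedes testing."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block

# Assumed sizes for illustration: 48 months, 24-month training window,
# 6-month out-of-sample blocks.
splits = list(walk_forward_splits(n_obs=48, train_size=24, test_size=6))

for train, test in splits:
    print(f"train {train.start:2d}-{train.stop - 1:2d}  "
          f"test {test.start:2d}-{test.stop - 1:2d}")
```

Within each split the model (for example, the Lasso fit and its \(\lambda\)) would be estimated on the `train` indices only and evaluated on the `test` indices, and the test blocks are then concatenated for the final performance figures.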