Final Examination: Machine Learning Applications in Finance

Question 1: Single-Factor (Market) Model

(a) Compute the t-statistic for \(\beta\) and test \(H_0: \beta = 0\)

First, we calculate the t-statistic by dividing the estimate by its standard error.

beta_est <- 0.98
beta_se <- 0.17
t_beta <- beta_est / beta_se
round(t_beta, 4)

## [1] 5.7647

Interpretation: The t-statistic is 5.7647. Since \(5.7647 > 1.98\) (the two-sided 5% critical value), we reject \(H_0: \beta = 0\) at the 5% level. Economically, \(\beta\) measures the fund’s systematic risk relative to the market: a one-percentage-point increase in the market’s excess return is associated with an expected 0.98-percentage-point increase in the fund’s excess return.
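The 1.98 cutoff used above is the two-sided 5% critical value of a t distribution with roughly 120 degrees of freedom. The residual degrees of freedom are not restated in this solution, so `df_resid` below is purely an assumption for illustration (it would correspond to 120 monthly observations and two estimated parameters). A minimal sketch of how the critical value and p-value could be checked:

```r
# Assumption: the residual degrees of freedom are not given in this solution;
# 118 is a hypothetical value (120 observations minus 2 parameters).
df_resid <- 118

beta_est <- 0.98
beta_se  <- 0.17
t_beta   <- beta_est / beta_se

# Two-sided 5% critical value and two-sided p-value under H0: beta = 0
t_crit  <- qt(0.975, df = df_resid)
p_value <- 2 * pt(abs(t_beta), df = df_resid, lower.tail = FALSE)

round(t_crit, 2)   # critical value rounded to two decimals
p_value < 0.05     # TRUE means H0 is rejected at the 5% level
```

With a different sample size the critical value shifts slightly, but the conclusion is unlikely to change because the t-statistic is far from the cutoff.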

(b) Test \(H_0: \beta = 1\) at the 5% level

Here, we test whether the fund’s beta is statistically different from 1, the beta of the market portfolio itself.

t_beta_1 <- (beta_est - 1) / beta_se
round(t_beta_1, 4)

## [1] -0.1176

Interpretation: The absolute value of the t-statistic is \(|-0.1176| = 0.1176\), which is less than the critical value of 1.98. We fail to reject the null hypothesis at the 5% level. This means the fund’s systematic risk is statistically indistinguishable from that of the overall market.
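Both tests on \(\beta\) can be read off a single 95% confidence interval. The sketch below reuses the estimate, standard error, and the 1.98 critical value from this solution; the variable names `t_crit` and `ci_beta` are introduced here only for illustration.

```r
beta_est <- 0.98
beta_se  <- 0.17
t_crit   <- 1.98  # two-sided 5% critical value used throughout this solution

# 95% confidence interval: estimate +/- critical value * standard error
ci_beta <- beta_est + c(-1, 1) * t_crit * beta_se
round(ci_beta, 4)

# A hypothesized value is rejected at the 5% level when it lies outside the interval
c(contains_0 = ci_beta[1] <= 0 && ci_beta[2] >= 0,
  contains_1 = ci_beta[1] <= 1 && ci_beta[2] >= 1)
```

The interval runs from roughly 0.64 to 1.32, so it excludes 0 (rejecting \(\beta = 0\)) and includes 1 (failing to reject \(\beta = 1\)), in line with parts (a) and (b).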

(c) Compute the t-statistic for \(\alpha\) (Jensen’s alpha)

The marketing team wants to advertise positive risk-adjusted performance. Let’s see if the intercept (\(\alpha\)) backs that up.

alpha_est <- 0.0017
alpha_se <- 0.0020
t_alpha <- alpha_est / alpha_se
round(t_alpha, 4)

## [1] 0.85

Interpretation: The t-statistic is 0.8500, which is not statistically significant (since \(0.8500 < 1.98\)). The data do not statistically justify the marketing team’s claim. The positive alpha value is statistically indistinguishable from zero and could simply be due to random chance rather than genuine manager skill.

(d) Interpret \(R^2\)

The reported \(R^2\) is 0.50. This means that 50% of the variance of the fund’s excess returns is explained by market movements (systematic risk), while the remaining 50% is idiosyncratic risk that could, in principle, be diversified away.

(e) CAPM-implied expected monthly excess return

Using the CAPM formula: \(\text{Expected Return} = \beta \times E[R_m-R_f]\)

market_premium <- 0.0070 # 0.70%
expected_return <- beta_est * market_premium
round(expected_return * 100, 4) # Shown as a percentage

## [1] 0.686

The CAPM-implied expected monthly excess return for the fund is 0.6860%.

Question 2: Fama-French Three-Factor Model

(f) Compute the t-statistic for each coefficient

estimates <- c(Intercept = 0.0029, MKT = 0.97, SMB = 0.75, HML = -0.13)
std_errors <- c(0.0018, 0.08, 0.11, 0.13)
t_stats <- estimates / std_errors
round(t_stats, 4)

## Intercept       MKT       SMB       HML 
##    1.6111   12.1250    6.8182   -1.0000

Intercept (\(\alpha\)): 1.6111 (Not significant at 5% level)
MKT (b): 12.1250 (Significant)
SMB (s): 6.8182 (Significant)
HML (h): -1.0000 (Not significant)

(g) Classify the fund’s investment style

Size Tilt: The positive SMB loading (\(s = 0.75\)) indicates a strong tilt toward small-cap stocks.
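Value/Growth Tilt: The negative HML loading (\(h = -0.13\)) points toward a mild growth tilt, but with a t-statistic of -1.0000 this loading is not statistically distinguishable from zero, so the value/growth classification is weak.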

(h) Interpret the intercept and its significance

The intercept (\(\alpha\)) is 0.0029, but its t-statistic of 1.6111 is below the 5% critical value, so it is statistically insignificant. There is therefore no reliable evidence that the manager adds value (generates alpha) beyond what is already explained by passive exposure to the Market, Size, and Value factors. Failing to reject \(\alpha = 0\) does not prove the absence of skill; it only means the data cannot distinguish the estimate from zero.

(i) Compare \(R^2\) metrics

Rise to 0.92: The single-factor model had an \(R^2\) of 0.75, while this model’s \(R^2\) is 0.92. Adding the Size (SMB) and Value (HML) factors therefore explains an additional 17 percentage points of the variation in the fund’s returns.
Adjusted \(R^2\): Adjusted \(R^2\) is the appropriate metric when comparing models with different numbers of predictors because it penalizes the model for adding extra variables. This prevents the illusion of a better fit caused solely by increasing the number of independent variables, making it a fair comparison.
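The penalty described above can be made concrete with the standard adjusted \(R^2\) formula, \(1 - (1 - R^2)(n - 1)/(n - k - 1)\). The sample size is not restated in this solution, so `n_obs` below is a hypothetical value chosen only to illustrate the calculation, and `adj_r2` is a helper defined here.

```r
# Adjusted R^2 for a model with k predictors fitted on n observations
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Assumption: n is not stated in this solution; 120 is a hypothetical value.
n_obs <- 120

adj_single <- adj_r2(0.75, n = n_obs, k = 1)  # market factor only
adj_three  <- adj_r2(0.92, n = n_obs, k = 3)  # MKT, SMB, HML

round(c(Single = adj_single, ThreeFactor = adj_three), 4)
```

Each adjusted value sits slightly below its raw \(R^2\), and the gap widens as more predictors are added relative to the sample size.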

Question 3: Logistic Regression for Market Direction

(j) Predicted probability and class

First, we calculate the logit, then transform it into a probability using the logistic function.

b0 <- -0.02
b1 <- 5.4
b2 <- -0.38
r_lag <- 0.010
vix_change <- 1.5

logit <- b0 + (b1 * r_lag) + (b2 * vix_change)
prob_up <- 1 / (1 + exp(-logit))

c(Logit = round(logit, 4), Probability = round(prob_up, 4))

##       Logit Probability 
##     -0.5360      0.3691

The predicted probability of an “Up” day is 0.3691. At a 0.5 threshold, since \(0.3691 < 0.5\), the predicted class is Down.

(k) Economic Interpretations

\(\beta_1\) (5.4): A positive past return increases the probability of an “Up” market tomorrow. This captures momentum behavior in the market.
\(\beta_2\) (-0.38): An increase in the VIX (the fear gauge) decreases the probability of an “Up” market. This captures the inverse relationship between market volatility/fear and positive equity returns.
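Because logistic coefficients act on the log-odds, exponentiating them gives the multiplicative effect on the odds of an “Up” day. The sketch below assumes the lagged return is measured in decimal units (so 0.01 is one percentage point), which matches `r_lag <- 0.010` above but is otherwise an assumption; the variable names are introduced here for illustration.

```r
b1 <- 5.4    # coefficient on the lagged return
b2 <- -0.38  # coefficient on the VIX change

# Odds multiplier for a lagged return that is 0.01 (one percentage point) higher
or_ret_1pp <- exp(b1 * 0.01)

# Odds multiplier for a one-unit increase in the VIX change
or_vix_1 <- exp(b2)

round(c(LagReturn_1pp = or_ret_1pp, VIX_1unit = or_vix_1), 4)
```

A multiplier above 1 raises the odds of an “Up” day and a multiplier below 1 lowers them, which is the same direction of effect described in the two interpretations above.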

(l) Confusion Matrix Metrics

From the provided matrix (Total = 200):

True Positives (TP, Predicted Up/Actual Up) = 67
True Negatives (TN, Predicted Down/Actual Down) = 56
False Positives (FP, Predicted Up/Actual Down) = 44
False Negatives (FN, Predicted Down/Actual Up) = 33

TP <- 67; TN <- 56; FP <- 44; FN <- 33
accuracy <- (TP + TN) / 200
sensitivity <- TP / (TP + FN) # True-positive rate for "Up"
specificity <- TN / (TN + FP)
precision <- TP / (TP + FP)

metrics <- c(Accuracy = accuracy, Sensitivity = sensitivity, 
             Specificity = specificity, Precision = precision)
round(metrics, 4)

##    Accuracy Sensitivity Specificity   Precision 
##      0.6150      0.6700      0.5600      0.6036

(m) Naive rule and adequacy of accuracy

Naive Rule: The actual classes are split evenly (100 “Up” and 100 “Down”), so there is no majority class and a naive rule that always predicts the same class yields an accuracy of 0.5000. The model’s accuracy of 0.6150 beats this baseline.
Why accuracy is inadequate: Accuracy treats all misclassifications as equally bad. In a trading system, the financial costs of a false positive (taking a bad trade and losing money) and a false negative (missing a good trade and losing opportunity) are asymmetric.
Better Criterion: More economically relevant criteria would be the strategy’s expected profitability, maximum drawdown, or Sharpe ratio.
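To see why accuracy and profitability can diverge, the confusion-matrix counts can be combined with per-trade payoffs. The payoffs below are hypothetical numbers chosen only for illustration (they are not part of the exam), and the sketch assumes the strategy holds cash on predicted “Down” days.

```r
TP <- 67; TN <- 56; FP <- 44; FN <- 33

# Hypothetical per-trade payoffs (assumptions, not from the exam):
gain_tp <- 0.008   # average gain when a predicted "Up" day is in fact up
loss_fp <- -0.010  # average loss when a predicted "Up" day is in fact down
# Predicted "Down" days are assumed to be spent in cash (payoff 0).

n_days <- TP + TN + FP + FN
avg_pnl_per_day <- (TP * gain_tp + FP * loss_fp) / n_days
avg_pnl_per_day
```

Under these assumed payoffs the larger average loss on false positives consumes most of the gains from true positives, even though accuracy is well above 50%.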

Question 4: Resampling and Regularization in a Backtest

(n) Monthly and Annualized Sharpe Ratio

mean_ret <- 0.0070
sd_ret <- 0.0550
sharpe_monthly <- mean_ret / sd_ret

# Annualizing using the square root of 12 scaling factor
scaling_factor <- sqrt(12)
sharpe_annual <- sharpe_monthly * scaling_factor

round(c(Monthly = sharpe_monthly, Annualized = sharpe_annual), 4)

##    Monthly Annualized 
##     0.1273     0.4409

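The scaling factor used to annualize the Sharpe ratio is \(\sqrt{12}\) (approx. 3.4641). Assuming monthly returns are i.i.d., the annual mean scales by 12 and the annual standard deviation by \(\sqrt{12}\), so their ratio scales by \(12/\sqrt{12} = \sqrt{12}\).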

(o) Bootstrap Procedure

Step-by-Step: To calculate the standard error without assuming normality, draw a random sample of 48 months with replacement from the original dataset. Calculate the Sharpe ratio for this new sample. Repeat this resampling process thousands of times (e.g., 10,000 iterations) to build an empirical distribution. The standard deviation of these generated Sharpe ratios serves as your standard error.
i.i.d. Flaw: The ordinary i.i.d. bootstrap is inappropriate because monthly financial return data often exhibit serial correlation (autocorrelation) and volatility clustering, meaning the data points are not purely independent.
The Fix: Use a block bootstrap variant (for example, a moving-block or stationary bootstrap). This resamples contiguous blocks of months rather than individual months, which preserves the serial dependence within each block.
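The procedure above can be sketched as a moving-block bootstrap in base R. The return series is not reproduced in this solution, so the code simulates 48 months as a stand-in; the block length of 6 and the 2,000 replications are assumed tuning choices, not values from the exam.

```r
set.seed(42)

# Hypothetical data: 48 simulated monthly excess returns standing in for the
# real backtest series, which is not reproduced in this solution.
n_months <- 48
returns  <- rnorm(n_months, mean = 0.0070, sd = 0.0550)

sharpe <- function(x) mean(x) / sd(x)

# Moving-block bootstrap: concatenate randomly chosen contiguous blocks
block_len <- 6      # assumed block length (a tuning choice)
n_boot    <- 2000   # number of bootstrap replications
n_blocks  <- ceiling(n_months / block_len)
starts    <- seq_len(n_months - block_len + 1)  # admissible block start months

boot_sharpe <- replicate(n_boot, {
  s   <- sample(starts, n_blocks, replace = TRUE)
  idx <- as.vector(sapply(s, function(i) i:(i + block_len - 1)))
  sharpe(returns[idx[seq_len(n_months)]])  # trim to the original length
})

# Bootstrap standard error of the monthly Sharpe ratio
se_sharpe <- sd(boot_sharpe)
se_sharpe
```

Setting `block_len <- 1` reduces this to the ordinary i.i.d. bootstrap, which makes it easy to compare the two standard errors on the same series.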

(p) Lasso Regularization Selection

I would deploy the solution provided by the one-standard-error rule (\(\lambda=0.065\), retaining 7 factors).

Why: Financial data is notoriously noisy. The one-standard-error rule selects a simpler, more parsimonious model whose cross-validated error is statistically indistinguishable (within one standard error) from the minimum error. Retaining 7 factors instead of 14 gives a sparser model that is less likely to overfit noise and is easier to interpret and monitor.
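The selection logic behind the one-standard-error rule can be sketched in a few lines of base R. The cross-validation table below is entirely hypothetical (the exam’s full CV curve is not reproduced here); it is constructed only so that the mechanics of the rule are visible.

```r
# Hypothetical cross-validation summary (illustrative numbers only):
# one row per candidate lambda, with mean CV error and its standard error.
cv <- data.frame(
  lambda  = c(0.005, 0.010, 0.020, 0.040, 0.065, 0.100),
  cv_mean = c(1.060, 1.020, 1.000, 1.010, 1.030, 1.120),
  cv_se   = c(0.040, 0.040, 0.040, 0.040, 0.040, 0.040)
)

# Lambda with the smallest mean CV error
i_min      <- which.min(cv$cv_mean)
lambda_min <- cv$lambda[i_min]

# One-standard-error rule: the largest lambda (strongest penalty) whose
# mean CV error is within one standard error of the minimum
threshold  <- cv$cv_mean[i_min] + cv$cv_se[i_min]
lambda_1se <- max(cv$lambda[cv$cv_mean <= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

In practice the `glmnet` package’s `cv.glmnet()` reports both choices directly as `lambda.min` and `lambda.1se`.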

(q) Time-respecting (walk-forward) evaluation

Walk-Forward Scheme: Train the model on an initial historical window (e.g., months 1-24) and test it on the immediate next period (e.g., month 25). Then, expand or roll the training window forward chronologically to include month 25, retrain, and predict month 26. Accumulate the out-of-sample predictions step-by-step.
Why k-fold is unsafe: Time series data has a strict chronological order. Standard random k-fold cross-validation is unsafe because it jumbles the timeline. This causes “look-ahead bias” or data leakage, where future information is wrongly used to train the model to predict past events, artificially inflating the backtest’s performance metrics.
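An expanding-window version of the walk-forward scheme can be sketched as follows. The data are simulated and the linear model `y ~ x` is a placeholder for whatever model is being backtested; both are assumptions made only so the loop is runnable.

```r
set.seed(1)

# Hypothetical data: 48 months of one simulated predictor and response
n   <- 48
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

init_window <- 24             # months 1-24 form the first training set
preds <- rep(NA_real_, n)     # out-of-sample predictions, filled month by month

for (m in (init_window + 1):n) {
  train <- dat[1:(m - 1), ]                    # only data available before month m
  fit   <- lm(y ~ x, data = train)             # refit on the expanded window
  preds[m] <- predict(fit, newdata = dat[m, , drop = FALSE])
}

# Out-of-sample error over months 25-48 only
oos      <- (init_window + 1):n
oos_rmse <- sqrt(mean((dat$y[oos] - preds[oos])^2))
oos_rmse
```

Replacing `1:(m - 1)` with `(m - init_window):(m - 1)` turns the expanding window into a fixed-length rolling window.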