Overview

This document answers all four questions of the final examination of Application of Financial Software Package. Every coefficient test uses the two-sided rule: reject \(H_0\) when \(|t| > t_{crit} \approx 1.98\), the 5% two-tailed critical value of the \(t\) distribution for roughly 100 to 140 degrees of freedom (the large-sample normal value is 1.96). All numerical results are computed in R so the analysis is fully reproducible, and figures are rounded to four decimals unless stated otherwise.

The general single-coefficient \(t\)-statistic for a hypothesised value \(\theta_0\) is

\[ t \;=\; \frac{\hat{\theta} - \theta_0}{\widehat{\mathrm{SE}}(\hat{\theta})}. \]
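The formula above can be wrapped in a small reusable function. The sketch below is illustrative only: `coef_t_test` is a hypothetical helper that does not appear in the original script, and the degrees of freedom (\(96 - 2 = 94\)) are an assumption about the Question 1 regression. The example call uses the Question 1 beta and its standard error.

```r
# Hypothetical helper (not part of the original script): two-sided t-test
# of H0: theta = theta0 for a single coefficient.
coef_t_test <- function(est, se, theta0 = 0, df = Inf, level = 0.05) {
  t_val <- (est - theta0) / se        # t = (estimate - hypothesised value) / SE
  crit  <- qt(1 - level / 2, df)      # two-sided critical value
  p_val <- 2 * pt(-abs(t_val), df)    # two-sided p-value
  data.frame(t = t_val, crit = crit, p = p_val, reject = abs(t_val) > crit)
}

# Example: Question 1 beta against zero (df = 94 is an assumption)
res <- coef_t_test(est = 0.98, se = 0.17, theta0 = 0, df = 94)
print(res)
```

With `df = 94` the exact critical value is close to the rounded 1.98 used throughout, which is why the fixed cutoff is a reasonable shortcut here.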

1 Question 1 — Single-Factor (Market) Model

The market model regresses the fund’s excess return on the market excess return over \(n = 96\) months:

\[ R_i - R_f \;=\; \alpha + \beta\,(R_m - R_f) + \varepsilon . \]

# --- Reported regression output -------------------------------------------
alpha    <- 0.0017;  se_alpha <- 0.0020   # Jensen's alpha & its SE
beta     <- 0.98;    se_beta  <- 0.17     # market beta & its SE
R2_1     <- 0.50                          # coefficient of determination
mkt_prem <- 0.0070                        # E[R_m - R_f] = 0.70% per month
n1       <- 96                            # sample size
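tcrit    <- 1.98                          # critical |t| at 5%

# Helpers used in every chunk below. Their definitions are not shown in the
# source, so these are assumed forms consistent with the printed output.
r4 <- function(x) round(x, 4)             # round to four decimals
ttest_decision <- function(t, crit) {
  ifelse(abs(t) > crit, "Reject H0", "Fail to reject H0")
}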

1.1 (a) Test \(H_0:\beta = 0\) and interpret \(\beta\)

\[ t_{\beta=0} = \frac{\hat\beta - 0}{\mathrm{SE}(\hat\beta)} = \frac{0.98}{0.17}. \]

t_beta0 <- (beta - 0) / se_beta
data.frame(statistic = "t (beta = 0)",
           value     = r4(t_beta0),
           crit      = tcrit,
           decision  = ttest_decision(t_beta0, tcrit))

Result. \(t_{\beta=0} = 5.7647\), which far exceeds \(1.98\), so we reject \(H_0\) at the 5% level: the market beta is highly significant.

Economic interpretation. \(\hat\beta = 0.98\) means that for every \(1\%\) change in the market excess return, the fund’s excess return moves by about \(0.98\%\). The fund carries essentially the same systematic (non-diversifiable) exposure to market risk as the market portfolio itself. Beta is the slope of the security characteristic line and measures the fund’s contribution to a diversified investor’s portfolio risk.

1.2 (b) Test \(H_0:\beta = 1\)

\[ t_{\beta=1} = \frac{\hat\beta - 1}{\mathrm{SE}(\hat\beta)} = \frac{0.98 - 1}{0.17}. \]

t_beta1 <- (beta - 1) / se_beta
data.frame(statistic = "t (beta = 1)",
           value     = r4(t_beta1),
           crit      = tcrit,
           decision  = ttest_decision(t_beta1, tcrit))

Result. \(t_{\beta=1} = -0.1176\), which is tiny in magnitude, so we fail to reject \(H_0:\beta = 1\).

Interpretation. Although \(\hat\beta = 0.98 \ne 1\) numerically, the difference from one is statistically indistinguishable from sampling noise. The fund’s systematic risk is not statistically different from that of the market: it is neither aggressive (\(\beta>1\)) nor defensive (\(\beta<1\)) in any reliable sense. Note the contrast with part (a): the same coefficient is significantly different from 0 yet not different from 1 — the hypothesis being tested, not just the estimate, determines the conclusion.
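The contrast between parts (a) and (b) can also be read off a single confidence interval. The snippet below is a minimal sketch that reuses the reported estimate, standard error, and the 1.98 cutoff; `contains` is a hypothetical helper introduced only for this illustration.

```r
# Values repeated from the reported regression output so the snippet runs alone
beta <- 0.98; se_beta <- 0.17; tcrit <- 1.98

# Approximate 95% confidence interval: estimate +/- critical value * SE
ci_beta <- beta + c(-1, 1) * tcrit * se_beta

# Hypothetical helper: is the value x inside the interval?
contains <- function(ci, x) ci[1] <= x && x <= ci[2]

cat("95% CI for beta: [", round(ci_beta[1], 4), ",", round(ci_beta[2], 4), "]\n")
cat("contains 0:", contains(ci_beta, 0), "| contains 1:", contains(ci_beta, 1), "\n")
```

The interval runs from roughly 0.64 to 1.32: it excludes 0, which matches the rejection in (a), and includes 1, which matches the failure to reject in (b).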

1.3 (c) Jensen’s alpha and the marketing claim

\[ t_{\alpha} = \frac{\hat\alpha - 0}{\mathrm{SE}(\hat\alpha)} = \frac{0.0017}{0.0020}. \]

t_alpha <- alpha / se_alpha
data.frame(statistic = "t (alpha = 0)",
           value     = r4(t_alpha),
           crit      = tcrit,
           decision  = ttest_decision(t_alpha, tcrit))

Result. \(t_{\alpha} = 0.85 < 1.98\), so we fail to reject \(H_0:\alpha = 0\).

Does the data justify advertising “positive risk-adjusted performance”? No. Jensen’s alpha is the average return in excess of the CAPM benchmark. The point estimate is positive (\(0.17\%\) per month), but with a standard error of \(0.20\%\) it is statistically indistinguishable from zero. The fund’s apparent outperformance is well within the range expected from luck. Advertising “statistically positive risk-adjusted performance” would be misleading: the honest statement is that there is no reliable evidence of skill (positive alpha).
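One way to make the "within the range expected from luck" point concrete is to ask how large alpha would have to be, at the reported standard error, before the test rejected. The following is a sketch under the same 1.98 cutoff; `alpha_needed` and `ci_alpha` are names introduced here for illustration.

```r
# Values repeated from the reported regression output so the snippet runs alone
alpha <- 0.0017; se_alpha <- 0.0020; tcrit <- 1.98

# Smallest |alpha| that would be significant at the 5% level with this SE
alpha_needed <- tcrit * se_alpha

# Approximate 95% confidence interval for alpha
ci_alpha <- alpha + c(-1, 1) * tcrit * se_alpha

cat("alpha needed for significance:", round(alpha_needed * 100, 3), "% per month\n")
cat("95% CI for alpha (% per month): [",
    round(ci_alpha[1] * 100, 3), ",", round(ci_alpha[2] * 100, 3), "]\n")
```

The threshold is about 0.40% per month, more than twice the estimated 0.17%, and the interval comfortably spans zero.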

1.4 (d) Interpretation of \(R^2\)

data.frame(systematic_share = R2_1,
           diversifiable_share = 1 - R2_1)

\(R^2 = 0.50\) means that 50% of the variation in the fund’s excess returns is explained by the market factor — this is the systematic portion of risk that cannot be diversified away. The remaining 50% is idiosyncratic (firm-specific, diversifiable) variation captured by the residual \(\varepsilon\). In a market model, \(R^2\) is exactly the fraction of total variance attributable to systematic market movements.
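The variance-share reading of \(R^2\) can be checked on simulated data. The sketch below does not use the fund's actual returns (they are not provided); the market mean and volatility are arbitrary assumptions, and the residual volatility is chosen so that systematic and idiosyncratic variance are equal, which should give \(R^2\) near 0.50.

```r
set.seed(42)
n   <- 5000                                  # large simulated sample (assumption)
mkt <- rnorm(n, mean = 0.007, sd = 0.045)    # simulated market excess return
eps <- rnorm(n, mean = 0, sd = 0.98 * 0.045) # residual sd set so both variance shares match
fund <- 0.98 * mkt + eps                     # market model with beta = 0.98, alpha = 0

fit <- lm(fund ~ mkt)
r2  <- summary(fit)$r.squared

# In a one-regressor model, R^2 = beta_hat^2 * Var(market) / Var(fund)
r2_decomp <- unname(coef(fit)[2]^2 * var(mkt) / var(fund))

cat("R^2 from lm():              ", round(r2, 4), "\n")
cat("beta^2 * Var(m) / Var(fund):", round(r2_decomp, 4), "\n")
```

The two numbers coincide by construction of ordinary least squares, which is the algebra behind the "systematic share" interpretation.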

1.5 (e) CAPM-implied expected monthly excess return

Under CAPM the intercept is zero in equilibrium, so the expected excess return is

\[ E[R_i - R_f] = \beta \cdot E[R_m - R_f] = 0.98 \times 0.70\%. \]

capm_er <- beta * mkt_prem
cat("CAPM-implied monthly excess return:",
    r4(capm_er), "=", r4(capm_er * 100), "% per month\n")

## CAPM-implied monthly excess return: 0.0069 = 0.686 % per month

The CAPM-implied expected excess return is 0.686% per month (\(0.0069\)). This is the return the fund should earn purely as compensation for bearing market risk; any genuine skill would appear as a positive, significant alpha on top of this — which, from part (c), we do not observe.

2 Question 2 — Fama–French Three-Factor Model

Estimated on \(n = 144\) monthly observations:

\[ R_i - R_f = \alpha + b\cdot \mathrm{MKT} + s\cdot \mathrm{SMB} + h\cdot \mathrm{HML} + \varepsilon . \]

ff <- data.frame(
  term = c("Intercept (alpha)", "MKT (b)", "SMB (s)", "HML (h)"),
  est  = c(0.0029, 0.97, 0.75, -0.13),
  se   = c(0.0018, 0.08, 0.11, 0.13)
)
R2_2     <- 0.92
adjR2_2  <- 0.918
n2       <- 144

2.1 (f) \(t\)-statistics and significance

\[ t_j = \frac{\hat\theta_j}{\mathrm{SE}(\hat\theta_j)}, \qquad j \in \{\alpha,b,s,h\}. \]

ff$t   <- ff$est / ff$se
ff$sig <- ifelse(abs(ff$t) > tcrit, "Significant", "Not significant")
transform(ff, est = r4(est), se = r4(se), t = r4(t))

Conclusions at the 5% level:

MKT (\(t = 12.125\)) — strongly significant.
SMB (\(t = 6.8182\)) — strongly significant.
HML (\(t = -1\)) — not significant.
Alpha (\(t = 1.6111\)) — not significant.
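The same conclusions can be expressed as two-sided p-values instead of a fixed cutoff. This is a sketch only: the residual degrees of freedom (\(144 - 4 = 140\)) are an assumption based on the stated sample size and the four estimated coefficients.

```r
# Reported estimates and standard errors, repeated so the snippet runs alone
est <- c(alpha = 0.0029, MKT = 0.97, SMB = 0.75, HML = -0.13)
se  <- c(alpha = 0.0018, MKT = 0.08, SMB = 0.11, HML = 0.13)
df  <- 144 - 4                        # assumed residual degrees of freedom

t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df)    # two-sided p-values

print(round(cbind(t = t_stat, p = p_val), 4))
```

MKT and SMB have p-values that are effectively zero, while alpha and HML both have p-values above 0.10, in line with the decisions listed above.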

2.2 (g) Investment-style classification

data.frame(
  loading = c("SMB (size)", "HML (value/growth)"),
  estimate = c(ff$est[3], ff$est[4]),
  significant = c(abs(ff$t[3]) > tcrit, abs(ff$t[4]) > tcrit)
)

Size tilt (SMB \(= +0.75\), significant): a positive and large SMB loading means the fund behaves like a portfolio long small-capitalisation stocks. This is a pronounced small-cap tilt.
Value/growth tilt (HML \(= -0.13\), not significant): a negative HML loading points toward growth stocks (low book-to-market), but the coefficient is statistically indistinguishable from zero. There is therefore no reliable value-or-growth tilt — at most a faint, statistically unsupported growth lean.

Style label: a small-cap fund that is essentially style-neutral on the value/growth dimension (a small-cap “blend,” leaning marginally and insignificantly toward growth).

2.3 (h) The intercept — does the manager add value?

cat("alpha =", r4(ff$est[1]), "(", r4(ff$est[1]*100), "% per month ),",
    "t =", r4(ff$t[1]), "->", ttest_decision(ff$t[1], tcrit), "\n")

## alpha = 0.0029 ( 0.29 % per month ), t = 1.6111 -> Fail to reject H0

The intercept is the three-factor (FF) alpha: average return after controlling for market, size, and value exposures. The estimate is positive (\(0.29\%\) per month) but \(t = 1.6111 < 1.98\), so it is not statistically significant. The manager therefore does not demonstrably “add value” beyond what is mechanically generated by loading on the market and small-cap factors. Much of what a single-factor model might have mislabeled as skill is, in fact, a size premium the manager harvests passively. Genuine skill requires a significantly positive alpha, which is absent.

2.4 (i) Why \(R^2\) rises to 0.92, and why adjusted \(R^2\) is the right metric

data.frame(
  model            = c("CAPM (1 factor)", "Fama-French (3 factors)"),
  R2               = c(0.75, R2_2),
  adjusted_R2      = c(NA, adjR2_2)
)

Moving from the one-factor CAPM (\(R^2 = 0.75\)) to the three-factor model (\(R^2 = 0.92\)) shows that the added factors explain a large block of return variation the market factor alone misses. Given the highly significant SMB loading and the insignificant HML loading found in (f), most of that gain is attributable to size exposure. Roughly \(17\) percentage points of additional variation are now attributed to systematic factor exposure rather than left in the residual.

Why adjusted \(R^2\) for model comparison. Ordinary \(R^2\) can never decrease when predictors are added, even if those predictors are pure noise, so it mechanically favours larger models and cannot diagnose overfitting. Adjusted \(R^2\) applies a degrees-of-freedom penalty,

\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n-1}{\,n-k-1\,}, \]

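where \(k\) is the number of regressors excluding the intercept, so added regressors are rewarded only if they improve fit by more than their degrees-of-freedom cost. With \(n = 144\) and \(k = 3\) the penalty is small: \(R^2_{\text{adj}} = 0.918\) sits just below \(R^2 = 0.92\), while the gain over the one-factor fit is about 17 percentage points. The three-factor specification therefore earns its place: the improvement is genuine, not an artifact of adding parameters.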
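The penalty formula is easy to verify numerically. In the sketch below, `adj_r2` is a hypothetical helper, and applying it to the one-factor model assumes that model was estimated on the same 144 observations, which the source does not state.

```r
# Hypothetical helper implementing the adjusted R^2 formula above
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj_ff   <- adj_r2(r2 = 0.92, n = 144, k = 3)  # three-factor model
adj_capm <- adj_r2(r2 = 0.75, n = 144, k = 1)  # one-factor model (same n assumed)

cat("Adjusted R^2, three-factor:", round(adj_ff, 4), "\n")
cat("Adjusted R^2, one-factor  :", round(adj_capm, 4), "\n")
```

The three-factor value reproduces the reported 0.918 after rounding, and the gap between the two models barely changes after the adjustment.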

3 Question 3 — Logistic Regression for Market Direction

\[ \operatorname{logit} P(\text{Up}) = \beta_0 + \beta_1\,r_{t-1} + \beta_2\,\Delta\mathrm{VIX}_{t-1}, \qquad P(\text{Up}) = \frac{1}{1 + e^{-\,\text{logit}}}. \]

b0 <- -0.02; b1 <- 5.4; b2 <- -0.38   # coefficients
r_lag <- 0.010; dVIX <- 1.5           # today's inputs

3.1 (j) Predicted probability and class

logit_val <- b0 + b1 * r_lag + b2 * dVIX
p_up      <- 1 / (1 + exp(-logit_val))
class_pred <- ifelse(p_up >= 0.5, "Up", "Down")

cat("logit =", r4(logit_val), "\n")

## logit = -0.536

cat("P(Up) =", r4(p_up), "\n")

## P(Up) = 0.3691

cat("Predicted class (0.5 threshold):", class_pred, "\n")

## Predicted class (0.5 threshold): Down

Step by step,

\[ \text{logit} = -0.02 + 5.4(0.010) + (-0.38)(1.5) = -0.536, \]

\[ P(\text{Up}) = \frac{1}{1+e^{-(-0.536)}} = 0.3691. \]

Because \(0.3691 < 0.5\), the predicted class is “Down.” Intuitively, the large negative VIX-change contribution (\(-0.57\) in log-odds) overwhelms the small positive momentum contribution (\(+0.054\)), pushing the probability below one-half.
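As a cross-check, base R's `plogis()` is the logistic function, so it should reproduce the hand calculation. The sketch also solves for the break-even VIX change at which the prediction flips, holding the lagged return at today's value; `dVIX_star` is a name introduced here for illustration.

```r
# Coefficients and inputs repeated so the snippet runs alone
b0 <- -0.02; b1 <- 5.4; b2 <- -0.38
r_lag <- 0.010; dVIX <- 1.5

# plogis(x) = 1 / (1 + exp(-x))
p_up <- plogis(b0 + b1 * r_lag + b2 * dVIX)

# Break-even VIX change: the value that sets the logit to zero, i.e. P(Up) = 0.5
dVIX_star <- -(b0 + b1 * r_lag) / b2

cat("P(Up) via plogis     :", round(p_up, 4), "\n")
cat("Break-even VIX change:", round(dVIX_star, 4), "\n")
```

With a lagged return of 1%, any VIX increase above roughly 0.09 points is enough to push the model to a "Down" call, so today's 1.5-point rise is far past the boundary.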

3.2 (k) Economic interpretation of the slopes

\(\beta_1 = +5.4\) (lagged return). A positive coefficient means a higher return yesterday raises the odds of an up day today — the model encodes short-horizon momentum / positive return persistence. (The large magnitude reflects that \(r_{t-1}\) is on a small decimal scale.)
\(\beta_2 = -0.38\) (\(\Delta\)VIX). A negative coefficient means a rise in the volatility/“fear” index lowers the odds of an up day. This captures the well-documented leverage / risk-off effect: spiking implied volatility accompanies falling markets, so rising VIX is bearish for next-day direction.
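Because the coefficients act on log-odds, exponentiating them gives odds ratios, which are often easier to communicate. The sketch below assumes a one-percentage-point change in the lagged return (0.01 in decimal units) and a one-point change in VIX as the comparison units.

```r
# Slope coefficients repeated so the snippet runs alone
b1 <- 5.4; b2 <- -0.38

or_ret_1pp <- exp(b1 * 0.01)  # odds multiplier for a 1 pp higher lagged return
or_vix_1pt <- exp(b2)         # odds multiplier for a 1-point rise in VIX

cat("Odds ratio, +1 pp lagged return:", round(or_ret_1pp, 4), "\n")
cat("Odds ratio, +1 point VIX       :", round(or_vix_1pt, 4), "\n")
```

A one-percentage-point higher lagged return raises the odds of an up day by about 5.5%, whereas a one-point VIX increase cuts the odds by roughly a third, which shows that on realistic input scales the VIX term dominates.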

3.3 (l) Confusion-matrix metrics (200-day hold-out)

TP <- 67  # predicted Up,  actual Up
FP <- 44  # predicted Up,  actual Down
FN <- 33  # predicted Down, actual Up
TN <- 56  # predicted Down, actual Down
N  <- TP + FP + FN + TN

accuracy    <- (TP + TN) / N
sensitivity <- TP / (TP + FN)   # true-positive rate for "Up"
specificity <- TN / (TN + FP)   # true-negative rate
precision   <- TP / (TP + FP)   # PPV for "Up" predictions

data.frame(
  metric = c("Accuracy", "Sensitivity (TPR, Up)",
             "Specificity", "Precision (Up)"),
  formula = c("(TP+TN)/N", "TP/(TP+FN)", "TN/(TN+FP)", "TP/(TP+FP)"),
  value = r4(c(accuracy, sensitivity, specificity, precision))
)

Accuracy \(= 123/200 = 0.615\).
Sensitivity \(= 67/100 = 0.67\) — of all true up-days, \(67\%\) are caught.
Specificity \(= 56/100 = 0.56\) — of all true down-days, \(56\%\) are caught.
Precision (Up) \(= 67/111 = 0.6036\): only about \(60.4\%\) of predicted up-days are actually up. The model catches up-days more reliably than it is right when it calls one (sensitivity > precision), and it is notably weaker on down-days (specificity \(0.56\)).

3.4 (m) Naive benchmark and why accuracy is not enough

actual_up   <- TP + FN   # 100
actual_down <- FP + TN   # 100
naive_acc   <- max(actual_up, actual_down) / N

data.frame(
  rule = c("Model (0.5 threshold)", "Naive majority-class"),
  accuracy = r4(c(accuracy, naive_acc))
)

The classes are balanced (\(100\) up, \(100\) down), so a majority-class rule scores \(0.5\). The model’s \(0.615\) beats the naive rule by about 11.5 percentage points — a real but modest edge.
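Whether an 11.5-point edge over 200 days is distinguishable from coin-flipping can be checked with an exact binomial test. This is a sketch that treats the 200 hold-out predictions as independent trials, an assumption that is only approximately true for daily market data.

```r
# Correct predictions and sample size from the confusion matrix
hits <- 67 + 56   # TP + TN
N    <- 200

# H0: hit rate = 0.5 (naive benchmark) against H1: hit rate > 0.5
bt <- binom.test(hits, N, p = 0.5, alternative = "greater")

cat("Hit rate        :", unname(bt$estimate), "\n")
cat("One-sided p-value:", signif(bt$p.value, 3), "\n")
```

The p-value is well below 1%, so the edge is unlikely to be pure chance under the independence assumption, although that says nothing yet about profitability.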

Why accuracy alone is inadequate for a trading system. Accuracy weights every error equally and ignores the economics of the bet:

Asymmetric payoffs. Up- and down-days differ in magnitude; being right on a small move and wrong on a large move can be net-losing even at high accuracy.
No notion of position size or P&L. A classifier can be “accurate” yet unprofitable after transaction costs and slippage.
Threshold blindness. \(0.5\) is arbitrary; trading cares about the full probability calibration and the cost-optimal cutoff.

A more economically relevant criterion is the risk-adjusted return of the strategy itself — the (out-of-sample) Sharpe ratio of the P&L generated by trading on the signal. Closely related useful measures are expected profit/utility per trade (cost-aware), or the AUC/ROC when one wants a threshold-independent ranking quality.
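The asymmetric-payoff point can be illustrated with a toy example. All numbers below are hypothetical and chosen only to make the effect visible: the signal is right on seven small moves and wrong on three large ones.

```r
# Hypothetical daily returns: seven small moves followed by three large ones
ret <- c(0.002, 0.003, -0.001, 0.002, -0.002, 0.001, 0.003,
         -0.030, -0.025, 0.020)

# Hypothetical signal (+1 = long, -1 = short): correct on days 1-7, wrong on 8-10
signal <- c(sign(ret[1:7]), -sign(ret[8:10]))

accuracy  <- mean(signal == sign(ret))  # share of correct direction calls
pnl       <- signal * ret               # daily strategy return, before costs
total_pnl <- sum(pnl)

cat("Accuracy :", accuracy, "\n")
cat("Total P&L:", round(total_pnl, 4), "\n")
```

The classifier is 70% accurate yet loses money, because the three misses fall on the largest moves. A P&L-based criterion such as the Sharpe ratio of `pnl` captures this, while accuracy does not.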

4 Question 4 — Resampling and Regularization in a Backtest

mu_m   <- 0.0070   # sample mean monthly return = 0.70%
sd_m   <- 0.0550   # sample SD monthly return  = 5.50%
n_obs  <- 48       # months

4.1 (n) Monthly and annualized Sharpe ratio

\[ SR_{\text{monthly}} = \frac{\bar r}{s} = \frac{0.0070}{0.0550}, \qquad SR_{\text{annual}} = \sqrt{12}\; SR_{\text{monthly}}. \]

SR_month <- mu_m / sd_m
scale_factor <- sqrt(12)
SR_annual <- SR_month * scale_factor

cat("Monthly Sharpe   :", r4(SR_month), "\n")

## Monthly Sharpe   : 0.1273

cat("Scaling factor   :", r4(scale_factor), "(= sqrt(12))\n")

## Scaling factor   : 3.4641 (= sqrt(12))

cat("Annualized Sharpe:", r4(SR_annual), "\n")

## Annualized Sharpe: 0.4409

The monthly Sharpe is 0.1273 and the annualized Sharpe is 0.4409. The scaling factor is \(\sqrt{12} \approx 3.4641\): under the i.i.d. assumption the annual mean scales by \(12\) while the annual standard deviation scales by \(\sqrt{12}\), so the ratio scales by \(12/\sqrt{12} = \sqrt{12}\). (The calculation treats the reported mean as an excess return, which amounts to setting the risk-free rate to \(0\).)

4.2 (o) Bootstrapping the standard error of the Sharpe ratio

A nonparametric bootstrap estimates \(\mathrm{SE}(\widehat{SR})\) without a normality assumption:

1. Resample. From the observed series of \(n=48\) monthly returns, draw a resample of size \(48\) with replacement.
2. Recompute. On that resample compute \(\widehat{SR}^{*}_b = \bar r^{*}/s^{*}\).
3. Repeat. Do steps 1–2 for \(b = 1,\dots,B\) with \(B\) large (e.g. \(B = 10{,}000\)), storing each \(\widehat{SR}^{*}_b\).
4. Summarise. The bootstrap standard error is the sample standard deviation of \(\{\widehat{SR}^{*}_b\}\); a percentile confidence interval comes from the \(2.5\%\) and \(97.5\%\) quantiles of that distribution.
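The four steps above can be sketched in base R. Because the actual 48 monthly returns are not provided, `r_sim` is a simulated placeholder drawn from a normal distribution with the reported mean and standard deviation; the resulting numbers are illustrative and are not the strategy's true standard error. A smaller \(B\) is used to keep the run fast.

```r
set.seed(123)

# Placeholder series (assumption): stands in for the 48 observed monthly returns
r_sim <- rnorm(48, mean = 0.0070, sd = 0.0550)

sharpe <- function(x) mean(x) / sd(x)

B <- 5000
# Steps 1-3: resample with replacement, recompute the Sharpe ratio, repeat B times
sr_star <- replicate(B, sharpe(sample(r_sim, replace = TRUE)))

# Step 4: bootstrap standard error and percentile 95% confidence interval
se_boot <- sd(sr_star)
ci_boot <- quantile(sr_star, c(0.025, 0.975))

cat("Bootstrap SE of monthly Sharpe:", round(se_boot, 4), "\n")
print(round(ci_boot, 4))
```

This version resamples individual months independently, so it is the plain i.i.d. bootstrap and inherits the limitation discussed next.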

Why the ordinary i.i.d. bootstrap is inappropriate here. Monthly financial returns are not i.i.d.: they exhibit serial dependence — autocorrelation in the mean and, especially, volatility clustering (GARCH effects). Resampling individual months independently destroys this temporal dependence, which biases the variance estimate (typically understating the true standard error) and yields over-optimistic confidence intervals.

The fix: a block bootstrap. Resample contiguous blocks of consecutive months so that within-block dependence is preserved — the moving-block bootstrap, or the stationary bootstrap of Politis & Romano (random block lengths, which keeps the resampled series stationary). The block length should grow with \(n\) to capture the dependence horizon.

# Illustrative block-bootstrap for a Sharpe-ratio SE.
# `r_series` would be the vector of 48 monthly returns (not provided here).
library(boot)

sharpe <- function(x) mean(x) / sd(x)

# tsboot preserves dependence via fixed-length blocks (moving-block bootstrap)
set.seed(1)
bb <- tsboot(r_series, statistic = sharpe,
             R = 10000, l = 6, sim = "fixed")  # block length l ~ 6 months

se_SR  <- sd(bb$t)                 # bootstrap standard error
ci_SR  <- quantile(bb$t, c(0.025, 0.975))  # percentile 95% CI

4.3 (p) Choosing \(\lambda\) for the lasso

data.frame(
  rule = c("Minimum-CV-error", "One-standard-error (1-SE)"),
  lambda = c(0.030, 0.065),
  factors_retained = c(14, 7)
)

I would deploy the one-standard-error solution: \(\lambda = 0.065\) with 7 factors. The 1-SE rule selects the most regularized (sparsest) model whose cross-validation error is within one standard error of the minimum — i.e. the simplest model that is statistically indistinguishable in predictive accuracy from the best one. The reasons are sharper in finance:

Overfitting / data-snooping. With \(60\) candidate factors and only noisy return data, the minimum-CV model (\(14\) factors) likely fits sampling noise; CV error is itself estimated with uncertainty, and chasing its exact minimum rewards luck.
Parsimony, stability, interpretability. Seven factors give lower estimation variance, more stable loadings, and a story a risk committee can actually interpret.
Lower turnover / transaction costs. Fewer active factors generally mean a steadier, cheaper-to-trade portfolio out of sample.

If the sole objective were raw predictive accuracy on a stationary process with ample data, the minimum-CV \(\lambda\) could be defended — but for a deployable trading strategy, the robustness of the 1-SE choice wins.
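The mechanics of the 1-SE rule can be shown on a small cross-validation table. The error values below are hypothetical, constructed only so that the minimum falls at \(\lambda = 0.030\) and the 1-SE choice at \(\lambda = 0.065\) as in the question; they are not the backtest's actual CV results.

```r
# Hypothetical CV results: candidate lambdas, mean CV error, and its standard error
lambda <- c(0.010, 0.020, 0.030, 0.045, 0.065, 0.090)
cvm    <- c(1.060, 1.020, 1.000, 1.008, 1.025, 1.080)
cvsd   <- c(0.035, 0.032, 0.030, 0.030, 0.031, 0.034)

# Minimum-CV-error rule
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# 1-SE rule: largest lambda whose CV error is within one SE of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

cat("lambda (min CV):", lambda_min, "\n")
cat("lambda (1-SE)  :", lambda_1se, "\n")
```

The rule moves along the flat part of the error curve toward heavier regularization and stops just before the error rises by more than one standard error.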

4.4 (q) A time-respecting (walk-forward) evaluation

A valid out-of-sample scheme must never let future data inform the past.

Order chronologically and split time into an initial training window plus subsequent test blocks.
Train the full pipeline (factor selection, lasso \(\lambda\) via internal CV, parameter estimation) on the training window only.
Test on the next untouched block of months and record realised P&L.
Roll forward — either an expanding window (append new data, retrain) or a fixed-length rolling window — refit, and test on the following block.
Repeat to the end of the sample and aggregate the concatenated out-of-sample returns into performance metrics (out-of-sample Sharpe, drawdown).
Purge & embargo. Insert a small gap between train and test to remove observations whose information overlaps the boundary (important under autocorrelation / overlapping-horizon features), following López de Prado.
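The procedure above can be sketched as an index generator for expanding-window splits. `walk_forward_splits` and its arguments are hypothetical names introduced for illustration, and the window sizes in the example (24 training months, 6-month test blocks, 1-month embargo) are assumptions, not values from the exam.

```r
# Hypothetical helper: expanding-window walk-forward splits with an embargo gap
walk_forward_splits <- function(n, initial, test_size, embargo = 0) {
  splits <- list()
  test_start <- initial + embargo + 1
  while (test_start + test_size - 1 <= n) {
    train_end <- test_start - embargo - 1       # last month allowed in training
    splits[[length(splits) + 1]] <- list(
      train = seq_len(train_end),               # everything up to the embargo
      test  = test_start:(test_start + test_size - 1)
    )
    test_start <- test_start + test_size        # roll forward by one test block
  }
  splits
}

splits <- walk_forward_splits(n = 48, initial = 24, test_size = 6, embargo = 1)

for (s in splits) {
  cat("train 1 to", max(s$train), "| test", min(s$test), "to", max(s$test), "\n")
}
```

Every training index precedes every test index in its split, and the embargoed month between them is used for neither. The full pipeline, including the internal choice of \(\lambda\), would be refit inside each `train` set.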

Why standard random \(k\)-fold CV is unsafe here. Random \(k\)-folding shuffles observations, so a training fold routinely contains months that occur after the test months. This is look-ahead bias / data leakage: the model is allowed to “see the future” when predicting the past, which is impossible in live trading and inflates measured performance. Because returns are temporally ordered and serially dependent, only a forward-chaining (walk-forward) protocol reproduces the real information set available at each decision date and gives an honest estimate of deployable performance.

Machine Learning Applications in Finance — Final Examination

Khongorzul Erkhembayar

June 08, 2026