Instructions. Answer all questions and show every formula and calculation. Round to four decimals unless told otherwise. The critical value used for all two-sided 5% t-tests is \(|t| \approx 1.98\).

All numbers below are computed live in R so that every figure is reproducible and verifiable from the code chunks.

Question 1 — Single-Factor (Market) Model (25 pts)

A fund’s monthly excess returns are regressed on the market excess return over \(n = 96\) months:

\[ R_i - R_f \;=\; \alpha \;+\; \beta\,(R_m - R_f) \;+\; \varepsilon \]

Term	Estimate	Std. Error
Intercept (\(\alpha\))	0.0017	0.0020
Market premium (\(\beta\))	0.98	0.17

Also reported: \(R^2 = 0.50\), average market risk premium \(E[R_m - R_f] = 0.70\% = 0.0070\), critical \(|t| \approx 1.98\).

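r4 <- function(x) round(x, 4)   # helper used in every chunk: round to four decimals

n_obs  <- 96
alpha  <- 0.0017;  se_alpha <- 0.0020
beta   <- 0.98;    se_beta  <- 0.17
R2_1   <- 0.50
mkt_prem <- 0.0070      # E[Rm - Rf]
tcrit  <- 1.98          # rounded two-sided 5% critical value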
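As a quick check on the rounded critical value, the exact two-sided 5% quantiles can be read from the t distribution. The sketch below assumes residual degrees of freedom of \(n - 2 = 94\) for the single-factor regression and \(n - 4 = 140\) for the three-factor regression in Question 2. These degrees of freedom are an assumption of the sketch, not values stated in the exam.

```r
# Exact two-sided 5% critical values. The degrees of freedom are ASSUMED
# to equal n minus the number of estimated coefficients.
tc_q1 <- qt(0.975, df = 96 - 2)    # single-factor model: alpha and beta
tc_q2 <- qt(0.975, df = 144 - 4)   # three-factor model: alpha, b, s, h
round(c(Q1 = tc_q1, Q2 = tc_q2), 4)
```

Both quantiles should land within about 0.01 of 1.98, and no t-statistic in this exam is close enough to the threshold for the rounding to change a decision.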

(a) t-statistic for \(\beta\) and test \(H_0:\beta = 0\)

The t-statistic for a coefficient against a null value \(\beta_0\) is

\[ t \;=\; \frac{\hat\beta - \beta_0}{\mathrm{SE}(\hat\beta)}, \qquad H_0:\beta = 0 \;\Rightarrow\; t = \frac{\hat\beta}{\mathrm{SE}(\hat\beta)}. \]

t_beta0 <- beta / se_beta
r4(t_beta0)

## [1] 5.7647

\(t = 0.98 / 0.17 = 5.7647\). Since \(|t| = 5.7647 > 1.98\) we reject \(H_0:\beta = 0\) at the 5% level — \(\beta\) is highly significant.

Economic interpretation. \(\beta = 0.98\) means the fund moves almost one-for-one with the market: a 1 percentage-point rise in the market excess return is associated with about a 0.98 percentage-point rise in the fund’s excess return. The fund carries essentially market-level systematic (non-diversifiable) risk; the point estimate is marginally defensive (\(\beta < 1\)), but only by 0.02.

(b) Test \(H_0:\beta = 1\)

\[ t \;=\; \frac{\hat\beta - 1}{\mathrm{SE}(\hat\beta)} \]

t_beta1 <- (beta - 1) / se_beta
r4(t_beta1)

## [1] -0.1176

\(t = (0.98 - 1)/0.17 = -0.1176\). Since \(|t| = 0.1176 < 1.98\) we fail to reject \(H_0:\beta = 1\).

Conclusion. The fund’s beta is statistically indistinguishable from 1. Its systematic risk is not significantly different from that of the market — we cannot claim it is either more aggressive or more defensive than the market.
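The two tests in (a) and (b) differ only in the null value, so they can be wrapped in one small helper that also reports a two-sided p-value. This is a minimal sketch: the function name `coef_t_test` is hypothetical, and the degrees of freedom (\(96 - 2 = 94\)) are an assumption, since the exam supplies only the rounded critical value.

```r
# Hypothetical helper: t-statistic and two-sided p-value for H0: coef = null
coef_t_test <- function(est, se, null = 0, df) {
  t_stat <- (est - null) / se
  p_val  <- 2 * pt(-abs(t_stat), df = df)   # two-sided tail probability
  c(t = t_stat, p_value = p_val)
}

df_q1  <- 96 - 2                                  # ASSUMED residual df
res_b0 <- coef_t_test(0.98, 0.17, null = 0, df = df_q1)
res_b1 <- coef_t_test(0.98, 0.17, null = 1, df = df_q1)
round(rbind(H0_beta_0 = res_b0, H0_beta_1 = res_b1), 4)
```

A p-value below 0.05 corresponds to \(|t|\) exceeding the critical value, so the p-values should lead to the same decisions as the comparisons against 1.98.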

(c) t-statistic for \(\alpha\) (Jensen’s alpha)

\[ t \;=\; \frac{\hat\alpha}{\mathrm{SE}(\hat\alpha)} \]

t_alpha <- alpha / se_alpha
r4(t_alpha)

## [1] 0.85

\(t = 0.0017 / 0.0020 = 0.85\). Since \(|t| = 0.85 < 1.98\) we fail to reject \(H_0:\alpha = 0\).

On the marketing claim. Although the point estimate of alpha is positive (\(0.17\%\) per month), it is not statistically significant. The data do not justify advertising “positive risk-adjusted performance” — the positive alpha is well within sampling noise and could plausibly be zero (or negative).
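An approximate 95% confidence interval makes the "could plausibly be zero (or negative)" statement concrete. The sketch below reuses the rounded critical value 1.98 from the instructions; the variable name `ci_alpha` is introduced here for illustration.

```r
# Approximate 95% confidence interval for Jensen's alpha (monthly, decimal)
alpha    <- 0.0017
se_alpha <- 0.0020
tcrit    <- 1.98

ci_alpha <- alpha + c(-1, 1) * tcrit * se_alpha
round(c(lower = ci_alpha[1], upper = ci_alpha[2]), 4)
```

The interval runs from roughly \(-0.23\%\) to \(+0.57\%\) per month and therefore contains zero, which is the same conclusion as the t-test.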

(d) Interpreting \(R^2\)

\(R^2 = 0.50\) means the market factor explains 50% of the variance of the fund’s excess return. Decomposition of total risk:

Systematic (market-driven): \(R^2 = 50\%\) — cannot be diversified away.
Idiosyncratic (diversifiable): \(1 - R^2 = 50\%\) — firm-specific risk that vanishes in a well-diversified portfolio.

c(systematic = r4(R2_1), diversifiable = r4(1 - R2_1))

##    systematic diversifiable 
##           0.5           0.5

(e) CAPM-implied expected monthly excess return

CAPM sets \(\alpha = 0\), so the expected excess return is driven entirely by beta:

\[ E[R_i - R_f] \;=\; \beta \cdot E[R_m - R_f] \]

capm_er <- beta * mkt_prem
r4(capm_er)               # decimal

## [1] 0.0069

r4(capm_er * 100)         # percent

## [1] 0.686

\(E[R_i - R_f] = 0.98 \times 0.0070 = 0.0069\), i.e. about 0.686% per month.

Question 2 — Fama–French Three-Factor Model (25 pts)

For a managed equity fund, estimated on \(n = 144\) monthly observations:

\[ R_i - R_f \;=\; \alpha \;+\; b\cdot \text{MKT} \;+\; s\cdot \text{SMB} \;+\; h\cdot \text{HML} \;+\; \varepsilon \]

Term	Estimate	Std. Error
Intercept (\(\alpha\))	0.0029	0.0018
MKT (\(b\))	0.97	0.08
SMB (\(s\))	0.75	0.11
HML (\(h\))	-0.13	0.13

Also reported: \(R^2 = 0.92\), Adjusted \(R^2 = 0.918\), critical \(|t| \approx 1.98\).

ff <- data.frame(
  term     = c("alpha", "MKT (b)", "SMB (s)", "HML (h)"),
  estimate = c(0.0029, 0.97, 0.75, -0.13),
  se       = c(0.0018, 0.08, 0.11, 0.13)
)
n_ff <- 144; k_ff <- 3; R2_ff <- 0.92

(f) t-statistics and significance

\[ t_j \;=\; \frac{\hat\theta_j}{\mathrm{SE}(\hat\theta_j)} \]

ff$t_stat <- r4(ff$estimate / ff$se)
ff$significant_5pct <- ifelse(abs(ff$t_stat) > tcrit, "YES", "no")
ff

Coefficient	\(t\)	Significant (5%)?
\(\alpha\)	1.6111	no
MKT (\(b\))	12.125	YES
SMB (\(s\))	6.8182	YES
HML (\(h\))	-1	no

Significant at 5%: the market loading \(b\) and the size loading \(s\). Not significant: the intercept \(\alpha\) and the value loading \(h\).
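The same decisions can be read from two-sided p-values. This sketch assumes \(n - k - 1 = 140\) residual degrees of freedom; the exam reports only the rounded critical value, so the degrees of freedom are an assumption.

```r
# Two-sided p-values for the three-factor coefficients
est <- c(alpha = 0.0029, MKT = 0.97, SMB = 0.75, HML = -0.13)
se  <- c(0.0018, 0.08, 0.11, 0.13)

df_ff  <- 144 - 3 - 1                 # ASSUMED residual degrees of freedom
t_stat <- est / se
p_val  <- setNames(2 * pt(-abs(t_stat), df = df_ff), names(est))

data.frame(t = round(t_stat, 4), p_value = signif(p_val, 3))
```

Coefficients with a p-value below 0.05 are the ones flagged "YES" in the table above.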

(g) Investment style from SMB and HML loadings

SMB \(s = +0.75\) (positive, highly significant). A positive size loading means the fund’s returns co-move with small-minus-big, i.e. a strong tilt toward small-capitalization stocks.
HML \(h = -0.13\) (negative, not significant). A negative value loading points toward growth stocks (low book-to-market), but because \(h\) is statistically indistinguishable from zero, any value/growth tilt is small and unreliable.

Classification: a small-cap fund with at most a weak (statistically insignificant) growth lean — best described as small-cap, style-neutral on the value–growth dimension.

(h) Intercept interpretation and manager skill

alpha_ff   <- ff$estimate[1]
t_alpha_ff <- ff$t_stat[1]
c(alpha_monthly = r4(alpha_ff), alpha_annual_approx = r4(alpha_ff*12),
  t = r4(t_alpha_ff))

##       alpha_monthly alpha_annual_approx                   t 
##              0.0029              0.0348              1.6111

The intercept \(\alpha = 0.0029\) (\(\approx 0.29\%\)/month, roughly \(3.48\%\)/year) is the factor-adjusted (abnormal) return — performance beyond what the three factor exposures explain. Its t-statistic is \(1.6111 < 1.98\), so it is not significant.

Does the manager add value? The point estimate hints at positive skill, but we cannot statistically conclude the manager adds value beyond passive factor exposure: the alpha is within sampling noise of zero. The fund’s returns are largely a repackaging of market + small-cap exposure.

(i) From \(R^2 = 0.75\) (CAPM) to \(0.92\), and why adjusted \(R^2\)

The single-factor CAPM achieved \(R^2 = 0.75\); adding SMB and HML raised it to \(0.92\). This +17 pp jump shows that a large share of what the CAPM treated as idiosyncratic noise is in fact systematic exposure to the size factor (and, to a lesser extent, value). The factor model captures the fund’s behavior far better.

Why adjusted \(R^2\) for model comparison. Ordinary \(R^2\) can never decrease when predictors are added — even useless ones — so it mechanically favors larger models. Adjusted \(R^2\) penalizes the number of predictors \(k\):

\[ \bar R^2 \;=\; 1 - (1 - R^2)\,\frac{n - 1}{\,n - k - 1\,} \]

It rises only when a new predictor improves fit more than expected by chance, making it the fair criterion across models of different dimension. Verifying the reported value:

adj_R2 <- 1 - (1 - R2_ff) * (n_ff - 1) / (n_ff - k_ff - 1)
r4(adj_R2)

## [1] 0.9183

\(\bar R^2 = 1 - (1-0.92)\frac{143}{140} = 0.9183\), which matches the reported \(0.918\) to three decimals. It is barely below \(R^2 = 0.92\) because the penalty for \(k = 3\) predictors is small when \(n = 144\). The gain over the CAPM is therefore consistent with genuine explanatory power (chiefly from SMB, since \(h\) is insignificant) rather than overfitting.
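A like-for-like comparison puts both models on the adjusted scale and tests the two added factors jointly with a nested-model F-test. The sketch assumes the CAPM \(R^2 = 0.75\) was estimated for the same fund on the same 144 months; the exam does not state this explicitly. The helper name `adj_r2` is hypothetical.

```r
# Hypothetical helper: adjusted R^2 for n observations and k predictors
adj_r2 <- function(R2, n, k) 1 - (1 - R2) * (n - 1) / (n - k - 1)

n       <- 144
R2_capm <- 0.75; k_capm <- 1     # ASSUMED: same fund, same sample as FF3
R2_ff   <- 0.92; k_ff   <- 3

adj_capm <- adj_r2(R2_capm, n, k_capm)
adj_ff   <- adj_r2(R2_ff,   n, k_ff)

# Joint F-test of H0: s = h = 0 (the q added coefficients are all zero)
q      <- k_ff - k_capm
F_stat <- ((R2_ff - R2_capm) / q) / ((1 - R2_ff) / (n - k_ff - 1))
p_F    <- pf(F_stat, q, n - k_ff - 1, lower.tail = FALSE)

round(c(adj_capm = adj_capm, adj_ff = adj_ff, F = F_stat), 4)
```

Under these assumptions the F-statistic is about 149 on (2, 140) degrees of freedom, so SMB and HML are jointly significant even though HML alone is not.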

Question 3 — Logistic Regression for Market Direction (25 pts)

A model predicts the probability that tomorrow’s market return is positive (“Up”):

\[ \operatorname{logit} P(\text{Up}) \;=\; \beta_0 \;+\; \beta_1\,(\text{lagged return } r_{t-1}) \;+\; \beta_2\,(\Delta \text{VIX}_{t-1}) \]

Estimated coefficients: \(\beta_0 = -0.02\), \(\beta_1 = 5.4\), \(\beta_2 = -0.38\). Today’s inputs: \(r_{t-1} = 0.010\), \(\Delta\text{VIX} = 1.5\).

b0 <- -0.02; b1 <- 5.4; b2 <- -0.38
r_lag <- 0.010; d_vix <- 1.5

(j) Predicted probability and class

The logistic link and its inverse:

\[ z = \operatorname{logit}P(\text{Up}) = \beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta\text{VIX}, \qquad P(\text{Up}) = \frac{1}{1 + e^{-z}}. \]

z   <- b0 + b1*r_lag + b2*d_vix
p_up <- 1 / (1 + exp(-z))
c(logit_z = r4(z), P_Up = r4(p_up))

## logit_z    P_Up 
## -0.5360  0.3691

predicted_class <- ifelse(p_up >= 0.5, "Up", "Down")
predicted_class

## [1] "Down"

\(z = -0.02 + 5.4(0.010) + (-0.38)(1.5) = -0.02 + 0.054 - 0.57 = -0.536\).

\[ P(\text{Up}) = \frac{1}{1 + e^{0.536}} = 0.3691. \]

Since \(P(\text{Up}) = 0.3691 < 0.5\), the predicted class is “Down.”

(k) Economic interpretation of the signs

\(\beta_1 = +5.4 > 0\). A higher lagged return raises the odds of an Up day — short-horizon momentum / positive return autocorrelation: yesterday’s gains predict today’s gains. (The large magnitude is partly because daily returns are tiny in level, so the coefficient on a ~1% input must be large.)
\(\beta_2 = -0.38 < 0\). A rise in VIX (increasing fear/volatility) lowers the odds of an Up day — the “leverage effect”/fear gauge: spiking implied volatility coincides with falling markets, so \(\Delta\text{VIX}\uparrow\) predicts \(P(\text{Up})\downarrow\).
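Because the model is linear in the log-odds, each coefficient converts to an odds ratio by exponentiation. The sketch below uses a one-percentage-point change in the lagged return (0.01 in decimal units) and a one-point change in \(\Delta\text{VIX}\); these step sizes are chosen here for illustration.

```r
# Odds ratios implied by the logistic coefficients
b1 <- 5.4     # coefficient on lagged return (decimal units)
b2 <- -0.38   # coefficient on the change in VIX (index points)

or_ret <- exp(b1 * 0.01)   # odds multiplier for a +1 percentage-point lagged return
or_vix <- exp(b2 * 1)      # odds multiplier for a +1 point rise in VIX
round(c(or_lagged_return = or_ret, or_delta_vix = or_vix), 4)
```

A one-percentage-point higher lagged return raises the odds of an Up day by roughly 5.5%, while a one-point rise in VIX cuts the odds by roughly 32%. In today's inputs the 1.5-point VIX increase dominates, which is why the prediction in (j) is "Down".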

(l) Confusion-matrix metrics

On a 200-day hold-out test set (0.5 threshold):

	Actual Up	Actual Down	Total
Predicted Up	67	44	111
Predicted Down	33	56	89
Total	100	100	200

TP <- 67; FP <- 44; FN <- 33; TN <- 56
N  <- TP + FP + FN + TN

accuracy    <- (TP + TN) / N
sensitivity <- TP / (TP + FN)   # true-positive rate for "Up" (recall)
specificity <- TN / (TN + FP)   # true-negative rate for "Down"
precision   <- TP / (TP + FP)   # precision for "Up" predictions

data.frame(
  metric = c("Accuracy", "Sensitivity (TPR, Up)", "Specificity", "Precision (Up)"),
  formula = c("(TP+TN)/N", "TP/(TP+FN)", "TN/(TN+FP)", "TP/(TP+FP)"),
  value  = r4(c(accuracy, sensitivity, specificity, precision))
)

\[ \begin{aligned} \text{Accuracy} &= \frac{TP+TN}{N} = \frac{67+56}{200} = 0.615,\\[4pt] \text{Sensitivity} &= \frac{TP}{TP+FN} = \frac{67}{100} = 0.67,\\[4pt] \text{Specificity} &= \frac{TN}{TN+FP} = \frac{56}{100} = 0.56,\\[4pt] \text{Precision} &= \frac{TP}{TP+FP} = \frac{67}{111} = 0.6036. \end{aligned} \]

(m) Naive majority-class rule and the limits of accuracy

actual_up   <- TP + FN   # 100
actual_down <- FP + TN   # 100
naive_acc   <- max(actual_up, actual_down) / N
c(actual_up = actual_up, actual_down = actual_down,
  naive_accuracy = r4(naive_acc), model_accuracy = r4(accuracy))

##      actual_up    actual_down naive_accuracy model_accuracy 
##        100.000        100.000          0.500          0.615

Since the two classes are tied (100 Up vs 100 Down), there is no strict majority class. A constant naive classifier that always predicts “Up” (or always predicts “Down”) is therefore correct \(100/200 = 0.5000\) of the time. The model’s accuracy of 0.615 beats this benchmark by \(11.5\) percentage points.
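Whether 61.5% is distinguishable from a coin flip can be gauged with an exact binomial test. The sketch treats the 200 hold-out days as independent trials, which is an assumption; daily outcomes may be dependent in practice.

```r
# Exact one-sided binomial test of H0: accuracy = 0.5 against accuracy > 0.5
correct <- 67 + 56    # TP + TN from the confusion matrix
n_test  <- 200

bt <- binom.test(correct, n_test, p = 0.5, alternative = "greater")
c(accuracy = correct / n_test, p_value = bt$p.value)
```

A small p-value here says only that the hit rate exceeds 50% on this test set. It says nothing about whether the signal is profitable.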

Why accuracy alone is inadequate for a trading system.

It weights every error equally, but the economic cost of a false “Up” (going long into a down day) differs from a missed “Up.”
It ignores the magnitude of returns — being right on many tiny moves and wrong on a few large ones can still lose money.
It ignores transaction costs and payoff asymmetry.

A more economically relevant criterion is the risk-adjusted profitability of the resulting strategy: its out-of-sample Sharpe ratio or, relatedly, the realized P&L net of transaction costs, which weights each decision by the money actually made or lost.
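The gap between accuracy and profitability can be seen on a toy example. Everything below is hypothetical: the five signals, the five returns, the long-or-flat trading rule, and the 5 basis-point cost per position change are invented for illustration and are not data from the exam.

```r
# HYPOTHETICAL toy inputs, not exam data
signal <- c("Up", "Down", "Up", "Up", "Down")      # model predictions
ret    <- c(0.010, -0.020, -0.005, 0.003, 0.004)   # realized daily returns
cost   <- 0.0005                                   # assumed cost per position change

pos      <- ifelse(signal == "Up", 1, 0)   # long on "Up", flat on "Down"
turnover <- abs(diff(c(0, pos)))           # position changes, starting flat
pnl      <- pos * ret - cost * turnover    # net strategy return per day

hit_rate <- mean((signal == "Up") == (ret > 0))
c(hit_rate = hit_rate, total_pnl = sum(pnl), sharpe = mean(pnl) / sd(pnl))
```

The hit rate counts each day equally, while `pnl` weights each day by the size of the move and subtracts trading costs, so the two measures can rank strategies differently.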

Question 4 — Resampling and Regularization in a Backtest (25 pts)

A candidate strategy earns a sample mean monthly return of \(0.70\%\) with a sample standard deviation of \(5.50\%\) over \(48\) months.

mu_m <- 0.0070   # mean monthly return
sd_m <- 0.0550   # monthly standard deviation
T_m  <- 48

(n) Monthly and annualized Sharpe ratio

\[ \text{SR}_{\text{monthly}} = \frac{\bar r}{s_r}, \qquad \text{SR}_{\text{annual}} = \sqrt{12}\;\text{SR}_{\text{monthly}}. \]

SR_m   <- mu_m / sd_m
scale  <- sqrt(12)
SR_ann <- SR_m * scale
c(monthly_Sharpe = r4(SR_m), scaling_factor = r4(scale),
  annualized_Sharpe = r4(SR_ann))

##    monthly_Sharpe    scaling_factor annualized_Sharpe 
##            0.1273            3.4641            0.4409

\(\text{SR}_{\text{monthly}} = 0.0070/0.0550 = 0.1273\).

Scaling factor \(= \sqrt{12} = 3.4641\). It is \(\sqrt{12}\) (not \(12\)) because, under i.i.d. returns, the mean scales with the horizon \(T\) while the standard deviation scales with \(\sqrt{T}\); their ratio therefore scales with \(\sqrt{T}\).

\[ \text{SR}_{\text{annual}} = \sqrt{12}\times 0.1273 = 0.4409. \]
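The scaling argument can be checked by annualizing the mean and the standard deviation separately. The same \(\sqrt{T}\) logic gives the t-statistic of the mean return over the 48-month sample. The sketch assumes i.i.d. returns and treats the stated returns as excess returns; both are assumptions, not facts given in the question.

```r
mu_m <- 0.0070; sd_m <- 0.0550; T_m <- 48

# Annualize numerator and denominator separately (i.i.d. ASSUMPTION)
mu_a   <- 12 * mu_m          # mean scales with the horizon
sd_a   <- sqrt(12) * sd_m    # standard deviation scales with its square root
SR_a   <- mu_a / sd_a
SR_chk <- sqrt(12) * (mu_m / sd_m)

# t-statistic of the mean return: mean / (sd / sqrt(T)) = SR_monthly * sqrt(T)
t_mean  <- (mu_m / sd_m) * sqrt(T_m)
tcrit47 <- qt(0.975, df = T_m - 1)

round(c(SR_annual = SR_a, check = SR_chk, t_mean = t_mean, tcrit = tcrit47), 4)
```

The two annualized figures coincide, and the t-statistic of about 0.88 is well below the critical value, so the mean return is not statistically distinguishable from zero on 48 months of data.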

(o) Bootstrap standard error for the Sharpe ratio

Because monthly returns can be serially dependent, use a block bootstrap (the ordinary i.i.d. bootstrap is the special case with block length 1). The bootstrap makes no normality assumption about the return distribution.

Procedure (block bootstrap), step by step:

1. Keep the observed sample of \(48\) monthly returns in time order and pick a block length \(\ell\) (e.g. \(\ell \approx 3\)–\(6\) months) long enough to span the short-run dependence.
2. Form overlapping blocks of \(\ell\) consecutive returns and draw blocks with replacement, concatenating them until the resample is again length \(48\).
3. Compute the Sharpe ratio \(\widehat{\text{SR}}^{*} = \bar r^{*} / s_r^{*}\) on that resample.
4. Repeat steps 2–3 a large number of times \(B\) (e.g. \(B = 10{,}000\)), storing each replicate \(\widehat{\text{SR}}^{*}_b\).
5. The bootstrap standard error is the standard deviation across replicates, \(\widehat{\text{SE}} = \mathrm{sd}\!\left(\widehat{\text{SR}}^{*}_1,\dots,\widehat{\text{SR}}^{*}_B\right)\); the percentiles of the \(\widehat{\text{SR}}^{*}_b\) give a confidence interval.

(No SE is computed here because the raw 48-month return series is not provided — the question asks for the procedure, which is described above.)
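To show the mechanics, the procedure can be run on synthetic data. The sketch below simulates 48 i.i.d. normal returns with the stated mean and standard deviation as a stand-in for the missing series, so the resulting numbers are illustrative only. The function name `block_boot_sharpe`, the block length of 4, and \(B = 2000\) are choices made for this sketch.

```r
set.seed(1)
# SYNTHETIC stand-in for the unavailable 48-month return series
r <- rnorm(48, mean = 0.0070, sd = 0.0550)

# Hypothetical helper: moving-block bootstrap replicates of the Sharpe ratio
block_boot_sharpe <- function(r, block_len = 4, B = 2000) {
  n          <- length(r)
  n_blocks   <- ceiling(n / block_len)
  starts_all <- seq_len(n - block_len + 1)   # starts of the overlapping blocks
  replicate(B, {
    starts <- sample(starts_all, n_blocks, replace = TRUE)
    # Concatenate the drawn blocks in order and trim to the original length
    idx <- as.vector(sapply(starts, function(s) s:(s + block_len - 1)))[seq_len(n)]
    mean(r[idx]) / sd(r[idx])
  })
}

sr_star <- block_boot_sharpe(r)
se_boot <- sd(sr_star)                          # bootstrap standard error
ci_boot <- quantile(sr_star, c(0.025, 0.975))   # percentile interval
c(SE = se_boot, ci_boot)
```

With a real return series in place of `r`, `se_boot` is the quantity the question asks for. Rerunning with several block lengths is a simple check on how sensitive the standard error is to that choice.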

Why the ordinary i.i.d. bootstrap is inappropriate here. It would draw individual months independently (step 2 with \(\ell = 1\)). But monthly financial returns are not independent: they exhibit serial dependence — autocorrelation in the mean and, especially, volatility clustering (GARCH-type effects). Resampling single months destroys this time-series structure and biases the standard error, typically understating it.

The fix is the block bootstrap used above, which resamples contiguous blocks of consecutive months and so preserves the short-run autocorrelation and volatility clustering. The stationary (Politis–Romano) bootstrap, which randomizes the block length, is the standard refinement.

(p) Which \(\lambda\) to deploy: min-CV vs. one-standard-error rule

Minimum-CV-error: \(\lambda = 0.030\), retaining 14 factors.
One-standard-error rule: \(\lambda = 0.065\), retaining 7 factors.

Deploy \(\lambda = 0.065\) (the 1-SE, 7-factor model). The 1-SE rule chooses the most parsimonious model whose CV error is within one standard error of the minimum. The CV curve is noisy, and its minimum typically sits at a \(\lambda\) that is too small, retaining factors that will not generalize. With only \(48\) months but \(60\) candidate factors (a high-dimensional, low-sample, multiple-testing setting highly prone to overfitting and data-snooping), the simpler 7-factor model is more robust out-of-sample, more stable, and more interpretable. The small loss in cross-validated fit buys substantial protection against fitting noise.
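The selection rule itself is a two-line computation on the cross-validation curve. In the sketch below the CV means and standard errors are hypothetical numbers, chosen only so that the two rules reproduce the \(\lambda\) values quoted above; the function name `one_se_lambda` is likewise an illustration.

```r
# Hypothetical helper: minimum-CV lambda and one-standard-error lambda
one_se_lambda <- function(lambda, cv_mean, cv_se) {
  i_min  <- which.min(cv_mean)
  cutoff <- cv_mean[i_min] + cv_se[i_min]   # minimum error plus one SE
  c(lambda_min = lambda[i_min],
    lambda_1se = max(lambda[cv_mean <= cutoff]))   # largest lambda under the cutoff
}

# HYPOTHETICAL CV curve, not results from the exam
cv <- data.frame(
  lambda  = c(0.010, 0.030, 0.065, 0.100),
  cv_mean = c(1.08, 1.00, 1.03, 1.12),
  cv_se   = c(0.05, 0.05, 0.05, 0.05)
)
pick <- one_se_lambda(cv$lambda, cv$cv_mean, cv$cv_se)
pick
```

Taking the largest \(\lambda\) under the cutoff selects the strongest penalty, and hence the sparsest lasso fit, whose error is statistically indistinguishable from the minimum.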

(q) Walk-forward evaluation and why random k-fold is unsafe

Time-respecting (walk-forward) scheme:

Order the data chronologically.
Fit everything — lasso, the choice of \(\lambda\)/factors, and the weights — on an initial in-sample window (e.g. months 1–24).
Generate predictions/returns on the next out-of-sample block that comes strictly after training (e.g. months 25–30).
Roll forward: expand (or slide) the window to include the just-tested period, re-fit from scratch, and test on the following block (31–36).
Repeat to the end of the sample, concatenating the out-of-sample returns.
Evaluate performance (Sharpe, drawdown, P&L) only on those concatenated out-of-sample periods. Optionally insert a purge/embargo gap between train and test to prevent leakage from overlapping or lagged features.

Why standard random k-fold CV is unsafe. Random k-fold shuffles observations, so the training folds routinely contain future months used to predict past ones. This is look-ahead bias / data leakage: it exploits information unavailable in real time and yields over-optimistic, unrealizable performance. It also breaks the temporal autocorrelation structure and lets information bleed between adjacent (correlated) months. Walk-forward respects the arrow of time — only the past is ever used to predict the future — giving an honest estimate of live performance.
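The walk-forward scheme reduces to generating index sets in which every training index precedes every test index. The following is a minimal sketch; the function name `walk_forward_splits` and its arguments are hypothetical, with defaults that mirror the example windows above (24 initial months, 6-month test blocks).

```r
# Hypothetical helper: chronological train/test index sets
walk_forward_splits <- function(n, initial = 24, horizon = 6, expanding = TRUE) {
  splits    <- list()
  train_end <- initial
  while (train_end + horizon <= n) {
    # Expanding window starts at month 1; sliding window keeps `initial` months
    train_start <- if (expanding) 1 else train_end - initial + 1
    splits[[length(splits) + 1]] <- list(
      train = train_start:train_end,
      test  = (train_end + 1):(train_end + horizon)
    )
    train_end <- train_end + horizon   # roll forward past the tested block
  }
  splits
}

splits <- walk_forward_splits(48)
t(sapply(splits, function(s) c(train_from = min(s$train), train_to = max(s$train),
                               test_from  = min(s$test),  test_to  = max(s$test))))
```

For 48 months this yields four folds with test blocks 25–30, 31–36, 37–42, and 43–48. Refitting the lasso and reselecting \(\lambda\) inside each training window, and scoring only on the test block, keeps every tuning decision free of future information.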

Summary of computed answers

Item	Value
Q1a t(beta=0)	5.7647
Q1b t(beta=1)	-0.1176
Q1c t(alpha)	0.8500
Q1e CAPM E[Ri-Rf]	0.0069
Q2 t(alpha)	1.6111
Q2 t(MKT)	12.1250
Q2 t(SMB)	6.8182
Q2 t(HML)	-1.0000
Q2 adj R^2	0.9183
Q3j P(Up)	0.3691
Q3l accuracy	0.6150
Q3l sensitivity	0.6700
Q3l specificity	0.5600
Q3l precision	0.6036
Q3m naive acc	0.5000
Q4n monthly Sharpe	0.1273
Q4n annualized Sharpe	0.4409

Document generated with R Markdown; all figures rounded to four decimals.

Final Examination — Machine Learning Applications in Finance

Enkhjin.N

June 08, 2026