BIOS 731 Final Project — Presentation Script

Speaker: Conglin Bao (Richard) | Duration: ~13–14 min (safe for 15-min slot)


Slide 1 — Title (≈ 30 sec)

Good morning everyone. Thank you for being here. My name is Conglin Bao — you can also call me Richard.

Today I am going to walk you through my final project, titled "Implementing an ADMM-Based Elastic Net Solver for Factor Return Prediction."

The project sits at the intersection of three things I care about: high-dimensional statistics, optimization, and quantitative finance. I want to say up front that this is a work in progress. I will be honest about which pieces are complete, which are partial, and where I am still iterating. Let us get started.


Slide 2 — Motivation (≈ 1 min 15 sec)

Let me start with the motivation.

In quantitative finance, one of the oldest and most important questions is: given a large set of candidate factors — value, momentum, size, profitability, low volatility, and so on — which ones actually carry predictive signal for returns? This is the alpha factor selection problem.

Two features of this problem make it statistically hard.

First, the predictors are heavily correlated. Value and profitability move together; momentum and low volatility interact. Plain ordinary least squares blows up under this kind of multicollinearity.

Second, we expect sparsity, but not perfect sparsity — we believe only a handful of factors matter, but we do not want to throw away everything else entirely.

This is exactly the regime where the Elastic Net shines. It blends the L1 penalty, which gives us sparsity, with the L2 penalty, which gives us stability under correlation. So the statistical question is clear. The remaining question is computational: how do we actually solve it — and do we understand what is happening inside the solver?


Slide 3 — Project Objective (≈ 45 sec)

That brings me to the objective of this project, which has three parts.

First, I implement an ADMM solver for Elastic Net from scratch, in R, without calling glmnet or any off-the-shelf penalized regression package.

Second, I plan to apply this solver to Fama-French multi-factor data to predict industry-portfolio excess returns — I will be transparent that this part is still in progress.

Third, I run a simulation study benchmarking ADMM Elastic Net against coordinate descent LASSO and Ridge regression, across different sparsity levels, sample sizes, and predictor correlation structures.

The deeper goal is not to beat glmnet on runtime — glmnet is extremely well-optimized. The goal is to understand the algorithmic machinery, so that later I can adapt it to problem structures where off-the-shelf tools do not apply.


Slide 4 — ADMM for Elastic Net (≈ 2 min)

Let me now spend a couple of minutes on the core method. This is the most technical slide and I will walk through it carefully.

The Elastic Net objective is shown on the left: a squared-error loss plus an L1 penalty with weight lambda times alpha, plus an L2 penalty with weight lambda times one minus alpha, divided by two. When alpha is one we recover LASSO; when alpha is zero we recover Ridge.
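
For reference, the objective described above can be written out as follows. The factor of one half on the squared-error loss is an assumption; the script does not state the slide's exact loss scaling, and a different scaling only rescales lambda.

```latex
\min_{\beta \in \mathbb{R}^p} \;
\frac{1}{2}\,\lVert y - X\beta \rVert_2^2
+ \lambda\alpha\,\lVert \beta \rVert_1
+ \frac{\lambda(1-\alpha)}{2}\,\lVert \beta \rVert_2^2
```

Setting alpha to one removes the squared L2 term and leaves the LASSO objective; setting alpha to zero removes the L1 term and leaves Ridge.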

ADMM, the Alternating Direction Method of Multipliers, is a splitting technique. The idea is simple: we introduce a copy of the coefficient vector beta, call it z, and impose the linear constraint beta equals z. This splits the hard combined problem into three easy pieces that we iterate between.

The first update, the one for beta, which is called the x-update in standard ADMM notation, is a ridge-type linear system. It has a closed-form solution. You form X-transpose-X plus a ridge term, factor it once, and then reuse that factorization every iteration. This is cheap.

The second update — the z-update — is where the L1 penalty lives. It has an elegant closed form called the soft-thresholding operator. You shrink each coordinate toward zero, and if it is small enough, you set it exactly to zero. This is what gives us sparsity.
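
As a small illustration of the soft-thresholding operator just described, here is a minimal sketch in Python. The function name `soft_threshold` and the list-based interface are hypothetical choices for this note, not the project's actual code, which is in R and Rcpp.

```python
def soft_threshold(v, kappa):
    """Apply S_kappa coordinate-wise: shrink toward zero by kappa,
    and set coordinates with |v_i| <= kappa exactly to zero."""
    out = []
    for vi in v:
        if vi > kappa:
            out.append(vi - kappa)
        elif vi < -kappa:
            out.append(vi + kappa)
        else:
            out.append(0.0)
    return out


# Large coordinates are shrunk by kappa; the small one is zeroed out.
print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

The exact zeros produced in the middle branch are what make the z iterate sparse.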

The third update is the dual variable update — just a simple running sum of the primal residual.

We iterate these three steps until the primal and dual residuals are both small, which is the standard stopping criterion from the ADMM literature. This is exactly the diagnostic we covered in the MM/ADMM lecture.
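
For reference, one common way to write the three updates and the two residuals is the scaled-dual form below. Several things here are assumptions rather than statements from the slides: the one-half loss scaling, the use of the scaled dual variable u, and the choice to place the L2 penalty in the z-update (it could equally be folded into the linear system of the first update).

```latex
\begin{aligned}
\beta^{k+1} &= (X^\top X + \rho I)^{-1}\bigl(X^\top y + \rho\,(z^k - u^k)\bigr) \\
z^{k+1} &= \frac{S_{\lambda\alpha/\rho}\bigl(\beta^{k+1} + u^k\bigr)}{1 + \lambda(1-\alpha)/\rho},
\qquad S_\kappa(v) = \operatorname{sign}(v)\,\max(\lvert v\rvert - \kappa,\, 0) \\
u^{k+1} &= u^k + \beta^{k+1} - z^{k+1} \\
r^{k+1} &= \beta^{k+1} - z^{k+1},
\qquad s^{k+1} = -\rho\,(z^{k+1} - z^k)
\end{aligned}
```

Here r is the primal residual and s is the dual residual; iteration stops when the norms of both fall below their tolerances.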

The elegance of ADMM is that it turned a hard non-smooth optimization problem into three updates, two of which are closed-form and one of which is trivial.


Slide 5 — Implementation Details (≈ 1 min 15 sec)

A few notes on the implementation.

The core solver is written in pure R first, because I wanted the logic to be transparent and easy to debug. Then — and this directly applies the Rcpp module from class — I rewrote the inner iteration loop in C++ through Rcpp.

The speedup is meaningful. For a problem with n equal to three hundred observations and p equal to fifty predictors, the pure R version takes on the order of hundreds of milliseconds, and the Rcpp version drops into the tens of milliseconds. Exact numbers will be in the final report — right now I am still benchmarking across different problem sizes.

I cache the Cholesky factorization of X-transpose-X plus rho times the identity once upfront, and reuse it across all iterations. This matters because the x-update is the dominant cost per iteration, and factoring once means each iteration is just a forward-backward solve.

For convergence I use both primal and dual residual tolerances at ten to the minus four, and cap the maximum iterations at one thousand.
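
To make the factor-once, solve-many pattern concrete, here is a minimal sketch in pure Python. It is not the project's R or Rcpp code. The function names, the one-half loss scaling, the placement of the L2 penalty in the z-update, and the use of absolute (rather than relative) tolerances are all assumptions made for illustration.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    w = [0.0] * n
    for i in range(n):
        w[i] = (b[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (w[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def soft(v, kappa):
    """Scalar soft-thresholding."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_enet(X, y, lam, alpha, rho=1.0, tol=1e-4, max_iter=1000):
    """Scaled-form ADMM for 0.5*||y - X b||^2 + lam*alpha*||b||_1
    + 0.5*lam*(1-alpha)*||b||^2 (assumed scaling). Returns (z, iterations)."""
    n, p = len(X), len(X[0])
    # X^T X + rho*I is factored once, outside the loop, and reused.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) + (rho if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    L = cholesky(A)
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    z = [0.0] * p
    u = [0.0] * p
    for it in range(1, max_iter + 1):
        # First update: one forward-backward solve with the cached factor.
        beta = chol_solve(L, [Xty[j] + rho * (z[j] - u[j]) for j in range(p)])
        # z-update: soft-threshold, then shrink by the L2 factor.
        z_old = z
        shrink = 1.0 + lam * (1.0 - alpha) / rho
        z = [soft(beta[j] + u[j], lam * alpha / rho) / shrink for j in range(p)]
        # Dual update: running sum of the primal residual.
        u = [u[j] + beta[j] - z[j] for j in range(p)]
        r_norm = math.sqrt(sum((beta[j] - z[j]) ** 2 for j in range(p)))
        s_norm = rho * math.sqrt(sum((z[j] - z_old[j]) ** 2 for j in range(p)))
        if r_norm < tol and s_norm < tol:
            break
    return z, it
```

A quick sanity check is an identity design, where the minimizer has a closed form: with X equal to the two-by-two identity, y equal to (3, 0.5), lambda equal to one, and alpha equal to one, the solution is the soft-thresholded y, namely (2, 0). Calling `admm_enet([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, alpha=1.0)` should return a z close to that.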


Slide 6 — Simulation Design (≈ 1 min 30 sec)

Now let me turn to the simulation study — this is the largest completed piece of the project.

Following the simulation design principles from the Simulation and Resampling module in class, I generate synthetic data from a linear model y equals X-beta plus epsilon.

I vary three factors in a factorial design.

Sparsity, meaning the number of true non-zero coefficients out of fifty predictors, takes three levels: five, fifteen, and thirty.

Sample size takes three levels: one hundred, three hundred, and five hundred.

Predictor correlation rho takes three levels: zero, zero point three, and zero point seven, using a Toeplitz structure so that correlation decays with coordinate distance. Note that this rho is the correlation parameter, not the ADMM penalty parameter.
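
As an illustration of a Toeplitz correlation structure, the sketch below builds the matrix with entries rho to the power of the absolute index difference. This specific AR(1)-style form is an assumption; the script only says that correlation decays with coordinate distance. The function name `toeplitz_corr` is hypothetical.

```python
def toeplitz_corr(p, rho):
    """Return the p x p matrix with entry (i, j) equal to rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


# With rho = 0.5, neighbours correlate at 0.5 and predictors two apart at 0.25.
for row in toeplitz_corr(3, 0.5):
    print(row)
```

With rho equal to zero this reduces to the identity matrix, which corresponds to the uncorrelated cell of the design.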

I run two hundred Monte Carlo replicates per cell. I originally planned five hundred, but I scaled it back to keep the full grid tractable on my local machine — and I report Monte Carlo standard errors alongside every estimate so the precision is explicit.
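
For a metric averaged over independent replicates, the Monte Carlo standard error is the sample standard deviation divided by the square root of the number of replicates. A minimal sketch, with a hypothetical function name and made-up input values used purely to show the arithmetic:

```python
import math
import statistics


def mc_summary(values):
    """Return (mean, Monte Carlo standard error) over replicate-level values."""
    r = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(r)  # sample sd / sqrt(R)
    return mean, se


# Illustrative inputs only, not project results.
mean, se = mc_summary([1.0, 2.0, 3.0, 4.0])
print(mean, se)
```

Because the standard error shrinks with the square root of the replicate count, going from two hundred to five hundred replicates would reduce it by a factor of about 1.6.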

I compare three methods: my ADMM Elastic Net, coordinate descent LASSO, and Ridge regression. Tuning parameters are selected by five-fold cross-validation in an inner loop.

I evaluate three metrics: prediction mean squared error on held-out data, true positive rate for variable selection, and false discovery rate.
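
The two selection metrics can be computed from the estimated and true coefficient vectors as sketched below. The function name, the `tol` argument for deciding what counts as selected, and the convention that the false discovery rate is zero when nothing is selected are assumptions for this illustration.

```python
def selection_metrics(beta_hat, beta_true, tol=0.0):
    """Return (true positive rate, false discovery rate) for one fit."""
    selected = {j for j, b in enumerate(beta_hat) if abs(b) > tol}
    truth = {j for j, b in enumerate(beta_true) if b != 0}
    tp = len(selected & truth)
    # TPR: share of true signals that were selected.
    tpr = tp / len(truth) if truth else float("nan")
    # FDR: share of selected variables that are not true signals.
    fdr = (len(selected) - tp) / len(selected) if selected else 0.0
    return tpr, fdr


# Two true signals (indices 0 and 1); the fit selects indices 0 and 2.
print(selection_metrics([1.2, 0.0, 0.3, 0.0], [1.0, 1.0, 0.0, 0.0]))
```

In this toy case one of the two true signals is recovered and one of the two selected variables is false, so both rates are one half.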

Replicates are run in parallel using the future package — another direct application from the Reproducible Computing module.


Slide 7 — Simulation Results (≈ 1 min 30 sec)

Here are the preliminary results from the simulation. I want to highlight the patterns rather than walk through every number.

First, on prediction error. Under high predictor correlation, rho equals zero point seven, Elastic Net clearly outperforms both LASSO and Ridge. This is the regime Elastic Net was designed for, so this matches theoretical expectations. Under zero correlation, LASSO and Elastic Net are essentially tied, which again is the expected behavior because the L2 component adds little value when predictors are uncorrelated.

Second, on variable selection. Elastic Net has a higher true positive rate than LASSO under high correlation. The intuition is that LASSO tends to arbitrarily pick one correlated predictor and zero out the rest, which hurts recall. Elastic Net’s L2 component smooths this out and retains groups of correlated signals.

Third, on false discovery rate. Both sparse methods control FDR reasonably well at moderate sparsity. At the dense end — thirty true signals — all methods degrade, which is expected.

I want to flag honestly that the magnitudes I report are based on two hundred replicates, and the Monte Carlo standard errors are non-trivial for some cells. I will increase the number of replicates before the final report if time allows.


Slide 8 — Real Data Application: Status (≈ 45 sec)

Now for the part I want to be most transparent about.

The real-data application using Fama-French data is partially complete. I have downloaded the Fama-French five-factor plus momentum dataset from the Kenneth French data library, and I have built the data-cleaning pipeline. The solver runs end-to-end on the data.

What I have not yet finished: the full cross-validated comparison across industry portfolios, and the out-of-sample backtest window. These will be in the written report, but I do not have clean numbers to show you today.

What I learned from the partial work is a practical point: financial factor data has very different signal-to-noise characteristics from simulated data, and the cross-validation fold design matters a lot because of the time-series structure. I will discuss this in the appendix of the written report.


Slide 9 — Course Concepts Applied (≈ 30 sec)

Quickly, to connect this back to the class content — this project draws on four modules directly.

The MM and ADMM lecture gave me the theoretical foundation for the solver.

The Rcpp module is what let me get a meaningful speedup on the inner loop.

The Simulation and Resampling module shaped the experimental design and the reporting of Monte Carlo standard errors.

And the Reproducible Computing module is reflected in the parallel replicate runs and in the fact that the whole project lives in a GitHub repository with a reproducible R Markdown report.


Slide 10 — Next Steps and Takeaways (≈ 45 sec)

To wrap up, here is where I am and where I am going.

What is complete: the ADMM Elastic Net solver in both R and Rcpp, and the core simulation study.

What is in progress: the full Fama-French application, expanding the simulation to more replicates, and the final benchmarking against glmnet on identical problems.

What I want the audience to take away. First, ADMM is a clean and flexible framework: once you see the z-update as a proximal step, with soft-thresholding as its L1 special case, it generalizes to many other penalties beyond Elastic Net. Second, implementing a solver from scratch forces you to understand things about the problem that a library call completely hides, especially around tuning and convergence diagnostics.

Thank you. I am happy to take questions.


Q&A Preparation — Likely Questions

Q1. Why did you use ADMM instead of coordinate descent, which is what glmnet uses? A. Coordinate descent is faster per iteration for this specific problem, but ADMM is much easier to extend to structured penalties — group Elastic Net, fused LASSO, graph-constrained penalties. My goal was a solver I could adapt, not one that beats glmnet on vanilla Elastic Net.

Q2. How did you choose rho, the ADMM penalty parameter? A. I used a simple heuristic — rho equal to one — and verified that residuals converged at a reasonable rate. Adaptive rho schemes based on residual balancing are in the Boyd ADMM monograph and would be a natural extension.
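
If the follow-up asks what residual balancing looks like, here is a minimal sketch of the rule from the Boyd monograph. The function name is hypothetical, and the defaults mu equal to ten and tau equal to two are the typical values suggested there, not settings used in this project (which fixes rho at one).

```python
def balance_rho(rho, u, r_norm, s_norm, mu=10.0, tau=2.0):
    """Return an updated (rho, u) that keeps the primal and dual residual
    norms within a factor mu of each other. u is the scaled dual variable,
    so it is rescaled by the inverse of the change in rho."""
    if r_norm > mu * s_norm:
        # Primal residual dominates: increase the penalty.
        return rho * tau, [ui / tau for ui in u]
    if s_norm > mu * r_norm:
        # Dual residual dominates: decrease the penalty.
        return rho / tau, [ui * tau for ui in u]
    return rho, u


print(balance_rho(1.0, [2.0], r_norm=1.0, s_norm=0.01))
```

One trade-off worth mentioning: whenever rho changes, the cached factorization of X-transpose-X plus rho times the identity has to be recomputed, so adaptive rho gives up part of the factor-once advantage.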

Q3. How did you handle the alpha parameter, the L1 versus L2 blend? A. I ran the simulation at alpha equals zero point five. A fuller study would cross-validate over alpha jointly with lambda. Note that cv.glmnet only tunes lambda for a fixed alpha, so a joint search means looping over an alpha grid with the same fold assignments. That is on the extension list.

Q4. Is two hundred replicates enough? A. For most cells the Monte Carlo standard error is small relative to the effect sizes I am reporting, but for the boundary cases — small n and high correlation — I would want more. That is an honest limitation.

Q5. How does your Rcpp version compare to glmnet in raw wall-clock time? A. glmnet is still faster — it has more than a decade of optimization plus warm-starting across the regularization path. My Rcpp version is within a factor of roughly three to five on the problems I have tested, which is reasonable for a from-scratch implementation.

Q6. Why Fama-French specifically? A. It is public, well-documented, and the factors have known economic interpretations, so if my solver picks a sensible subset I have a sanity check on the output. It is also a realistic multicollinearity setting — the factors are known to correlate.