ALADYNOULLI: Dynamic Disease Modeling
A Bayesian Framework for Genomic Discovery and Clinical Prediction
2026-02-04
Preface
“The shape of a story is the shape of a life.” — Kurt Vonnegut
This book began, as many things do, with a patient.
A 34-year-old man arrives in the emergency department with chest pain. His ECG shows ST elevations. His troponin is 870. The diagnosis seems clear: acute coronary syndrome. But his risk is not fixed. It was different yesterday than it is today, and it will be different tomorrow.
Traditional risk models assume that hazards are proportional: that the hazard ratio between two individuals stays constant over time. We know this assumption often fails. A 40-year-old with high genetic risk for coronary disease faces different dynamics than a 70-year-old with the same genes. Risk evolves. Diseases interact. Biology unfolds over decades.
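To make the proportional hazards assumption concrete, here is a small illustrative sketch in plain Python. It is not part of ALADYNOULLI. It assumes a hypothetical Gompertz-style baseline hazard and a single standardized polygenic score `prs`; the functions `hazard_ph` and `hazard_tv` and every parameter value are made up for illustration. Under proportional hazards, the hazard ratio between a high-PRS individual and an average one is identical at every age. Letting the genetic effect `beta(t)` shrink with age, as the 40-year-old versus 70-year-old example suggests, breaks that constancy.

```python
import math

def hazard_ph(age, prs, baseline=0.001, growth=0.08, beta=0.5):
    """Proportional hazards: h(age | prs) = h0(age) * exp(beta * prs).
    Hypothetical parameters, for illustration only."""
    h0 = baseline * math.exp(growth * (age - 40))  # hypothetical baseline hazard rising with age
    return h0 * math.exp(beta * prs)

def hazard_tv(age, prs, baseline=0.001, growth=0.08, beta0=1.0, decay=0.03):
    """Time-varying effect: beta(age) decays with age, so the hazard ratio
    between two PRS values changes over the lifespan. Hypothetical parameters."""
    h0 = baseline * math.exp(growth * (age - 40))
    beta_age = beta0 * math.exp(-decay * (age - 40))  # genetic effect attenuates with age
    return h0 * math.exp(beta_age * prs)

def hazard_ratio(hazard, age, prs_high=1.0, prs_ref=0.0):
    """Hazard ratio of a high-PRS individual versus a reference individual at a given age."""
    return hazard(age, prs_high) / hazard(age, prs_ref)

if __name__ == "__main__":
    for age in (40, 55, 70):
        print(f"age {age}: PH ratio {hazard_ratio(hazard_ph, age):.2f}, "
              f"time-varying ratio {hazard_ratio(hazard_tv, age):.2f}")
```

The baseline hazard cancels in each ratio, so only the genetic term matters: the proportional model returns exp(0.5) at every age, while the time-varying model's ratio falls from about 2.72 at age 40 toward 1.50 at age 70.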
This book describes ALADYNOULLI, a Bayesian framework that models how disease risk changes over the lifespan by integrating longitudinal electronic health records with genetic data. The name combines “Aladdin” (the genie grants wishes—and predictions) with “Bernoulli” (the distribution underlying our likelihood). The model identifies latent disease signatures from patterns of diagnosis, learns how individuals traverse these signatures over time, and incorporates genetic information to improve both biological discovery and clinical prediction.
Who This Book Is For
This book is written for:
- Biostatisticians interested in longitudinal modeling, Bayesian methods, and survival analysis
- Genetic epidemiologists working with biobank data and polygenic risk scores
- Clinical researchers developing risk prediction tools
- Data scientists applying machine learning to healthcare
- Graduate students in biostatistics, bioinformatics, or computational biology
We assume familiarity with basic probability, linear algebra, and some exposure to Bayesian statistics. Code examples are in Python (PyTorch), with R used for visualization.
How This Book Is Organized
Part I: The Clinical Problem motivates why we need dynamic risk models, what electronic health records offer, and why genetics matters for disease trajectories.
Part II: The ALADYNOULLI Model presents the mathematical framework, from the Bayesian formulation through Gaussian process priors to computational implementation.
Part III: Discovering Disease Signatures shows how the model identifies latent patterns of disease co-occurrence and what these signatures mean biologically.
Part IV: Genetic Validation demonstrates that signatures capture real biology through GWAS, rare variant analysis, and heritability estimation.
Part V: Clinical Prediction evaluates prediction accuracy against established risk scores and shows how dynamic updating improves forecasts.
Part VI: Robustness and Validation addresses selection bias, population stratification, temporal leakage, and other methodological concerns.
Part VII: Advanced Topics covers competing risks, alternative architectures, and future directions.
Appendices provide mathematical details, complete validation analyses, and code for reproducibility.
Acknowledgments
This work would not have been possible without the participants of the UK Biobank, the Mass General Brigham Biobank, and the All of Us Research Program, who contributed their data for research.
I am deeply grateful to my mentors Giovanni Parmigiani, Pradeep Natarajan, and Alexander Gusev for their guidance and support. Special thanks to the many collaborators who contributed to this work: Yi Ding, Tetsushi Nakao, Satoshi Koyama, Xilin Jiang, Achyutha Harish, Leslie Gaffney, Whitney Hornsby, and Jordan Smoller.
Code and Data
All code is available at: https://github.com/surbut/aladynoulli2
Interactive results: https://surbut.github.io/aladynoulli2/
Web application: https://aladynoulli.hms.harvard.edu
Boston, Massachusetts
January 2026