Case Study — Formation Bio Senior Data Engineer, RWD
2026-05-05
Problem statement: Build a scalable data system to provide the European Medicines Agency with timely and accurate health data across Europe.
Agenda
Architecture decisions (CDMConnector & new tools)
Consensus building across teams (Oxford & Erasmus)
The package ecosystem we shipped (omopverse → 25+ packages)
AI tooling we layered on top
What I learned
~16 engineer-weeks for the foundation; full project timeline 2022–2026.
40+ databases across 12 countries.
Each partner runs studies behind its own firewall.
Patient-level data never leaves the source.
Common substrate: OMOP CDM.
Distributed analytics: study code travels to the data, results come back.
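The bullets above describe a distributed flow: OMOP CDM is the shared schema, the same study code is sent to every partner, and only results come back. The Python sketch below illustrates that flow under stated assumptions. It is not the project's actual tooling. `ConditionOccurrence` is a hypothetical, heavily trimmed view of the OMOP CDM `condition_occurrence` table. `study_code`, `run_network_study`, the site names and the concept IDs are illustrative placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


# Hypothetical, heavily trimmed view of OMOP CDM condition_occurrence rows.
# Because every site uses the same schema, one piece of study code runs everywhere.
@dataclass(frozen=True)
class ConditionOccurrence:
    person_id: int
    condition_concept_id: int
    condition_start_date: date


def study_code(rows: list[ConditionOccurrence], concept_id: int) -> dict[str, int]:
    """Runs inside a partner's firewall and returns aggregate counts only."""
    matching = [r for r in rows if r.condition_concept_id == concept_id]
    return {
        "n_persons": len({r.person_id for r in matching}),  # distinct patients
        "n_records": len(matching),                           # matching rows
    }


def run_network_study(
    sites: dict[str, list[ConditionOccurrence]], concept_id: int
) -> tuple[dict[str, dict[str, int]], dict[str, int]]:
    """Sends study_code to each site (simulated here) and pools the aggregates.

    Patient-level rows never leave the per-site call. Only the count dicts do.
    """
    per_site = {name: study_code(rows, concept_id) for name, rows in sites.items()}
    pooled: Counter[str] = Counter()
    for result in per_site.values():
        pooled.update(result)  # adds counts key by key
    return per_site, dict(pooled)


# Illustrative data for two hypothetical sites.
sites = {
    "site_a": [
        ConditionOccurrence(1, 201826, date(2023, 1, 5)),
        ConditionOccurrence(1, 201826, date(2023, 6, 1)),
        ConditionOccurrence(2, 320128, date(2024, 2, 2)),
    ],
    "site_b": [ConditionOccurrence(7, 201826, date(2022, 3, 3))],
}

if __name__ == "__main__":
    per_site, pooled = run_network_study(sites, 201826)
    print(per_site)
    print(pooled)
```

Pooling `n_persons` by simple addition assumes that no patient appears in more than one database. A real network study would need to state how it handles overlap across sources.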
Scale targets: