Refactoring McMasterPandemic for Speed and Flexibility

2022-02-09

Acknowledgements – Funding

Acknowledgements – People

Ben Bolker – McMasterPandemic Creator/Architect
Mac-Theobio COVID Modelling Group Covid Modelling Blog: https://mac-theobio.github.io/covid-19/
- Irena Papst, Mike Li (Public Health Agency of Canada, McMaster)
- Ben Bolker, Jonathan Dushoff, David Earn (McMaster)
Weiguang Guan – C++ Developer (Sharcnet)
Help Understanding Models / Alpha Testing / Test Cases
- Zach Levine (Weizmann Institute of Science)
- Lisa Kanary (Yukon University)
- Caroline Colijn (Simon Fraser University)

Background

What does Refactoring mean?

What?
- Change the internals of an existing code base
- Match behaviour of original code
Why?
- Maintainability
- Readability
- Speed
- Extensibility

Related concepts
- Non-functional requirements
- Technical debt

What is McMasterPandemic?

R packagehttps://github.com/mac-theobio/McMasterPandemic
Infectious disease forecasts
Developed at McMaster University
Covid-19 response

Forecasts
- Ontario Science Advisory Table
- Public Health Agency of Canada
- Chief Public Health Officer
- World Health Organization (2020, Afro-Region, 49 countries)
Small user base
- Number of forecasters \(\approx 3\)

Prediction of the Third Wave

Mac-Theobio COVID Modelling Group Covid Modelling Blog: https://mac-theobio.github.io/covid-19/

Prediction of the Third Wave

Mac-Theobio COVID Modelling Group Covid Modelling Blog: https://mac-theobio.github.io/covid-19/

Why Refactor McMasterPandemic?

Why Refactor? – slow forecasting

Confidence bands slow to compute
Based on repeated simulation
Most time-consuming step for PHAC
Two approaches:
- Translate to C++ (done)
- Parallelize (todo)

Why Refactor? – slow calbration

Calibration: tuning parameters to data
Example
- observed case reports – dots
- simulated case reports – line
- optimize transmission rate using maximum likelihood
- negative binomial error model

Why Refactor? – slow calbration

Calibration: tuning parameters to data
Based on trajectory matching
- select parameter set
- simulate data
- compare simulated and observed data
- update parameter set
- repeat
Again – repeated simulations are slow

Why Refactor? – slow scenario exploration

Public health modeller has to make weekly forecasts
Model fitting/calibration takes two days
Only three days to
- explore scenarios
- write the report
No time to refine the model
No time to refine messaging

Why Refactor? – slow scenario exploration

Public health modeller has to make weekly forecasts
Model fitting/calibration takes two days
Only three days to
- explore scenarios
- write the report
No time to refine the model
No time to refine messaging

What if two days becomes two hours?

More time modelling and communicating, less time waiting for results

Why Refactor? – slow model development

Need a bit more background before answering …

What is a Compartmental Model?

Base McMasterPandemic Model

Figure from Irena Papst

Model with Vaccination and Variants

Figure from Irena Papst

Matrices and Linear Algebra

As models get more complex, quantities that looked like this …

Matrices and Linear Algebra

… become more complexThis is actually a simplification

Why Refactor? – slow model development

New variant emerges, as immunity from vaccination wanes
Public health modeller has to create new model structure
Requires a few weeks of software development
Public health situation changes over this time
Model is less relevant when complete

Figure from Irena Papst

Why Refactor? – slow model development

New variant emerges
Immunity from vaccination wanes
Public health modeller has to create new model structure
Requires a few weeks of software development
Public health situation changes over this time
Model is less relevant when complete

Figure from Irena Papst

What if a few weeks becomes an afternoon?

Forecasts are delivered quickly, resulting in bigger public health impact

Architecture

Call stack diagram
- Arrows from (layer of) functions that call other functions
- Arrows to (layer of) functions being called by other functions
Two use cases
- Calibration
- Forecasting
Both interfaces utilize same simulation engine

Refactored Architecture

Refactored components in grey
Focus on two things
- model definition conveniences (focused human time)
- simulation speed (human waiting time)

Increasing Simulation Speed

Technology Choice – TMB

Template Model Builder – TMB
R package (convenient)
Objective functions in C++ (fast)
Uses automatic differentiation (fast)
Options for statistical approaches

Aside – Automatic Differentiation

Classical methods of computer-assisted differentiation
- Numerical differentiation by finite differencing
- Symbolic differentiation by computer algebra systems
- These are both slow with many inputs
Automatic differentiation
- Automatically applies the chain rule to standard mathematical functions
- Will not give general answers (like symbolic differentiation)
- Give very fast and accurate evaluations of derivatives at specific values

Benchmark Example

TMB: Time difference of 7.8s
R: Time difference of 1741.0s
Speedup: 224x

But to be conservative we were seeing 60x speedups regularly

Does this mean that a workflow that took an hour, will now take one minute?

Not necessarily

Managing Expectations – Theory

Split up code into two sets:

Refactored code
Not refactored code

Simple but important insight:

If the code that is not refactored takes \(x\) minutes to run, the fastest run time that could be achieved by refactoring is \(x\) minutes

Managing Expectations – Theory

Two type of code that are in the “Not refactored” set

The code in MacPan that is being executed by the user’s job, but is not refactored
The code that the user wrote (don’t forget about this!)

Managing Expectations – Theory

Run-times of two sets of code, in different states

\(r_{\text{before}}\): refactored code, before refactoring
\(r_{\text{not_done}}\): unrefactored code
\(r_{\text{after}}\): refactored code, after refactoring

The total run-time before refactoring: \[ t_{\text{before}} = r_{\text{before}} + r_{\text{not_done}} \]

The total run-time after refactoring: \[ t_{\text{after}} = r_{\text{after}} + r_{\text{not_done}} \]

Managing Expectations – Theory

Dimensionless measures of this system:

Realized speedup: \(\tau = \frac{t_\text{before}}{t_\text{after}}\)
Naive speedup: \(\alpha = \frac{r_\text{before}}{r_\text{after}}\)
Fraction refactored: \(\rho = \frac{r_\text{before}}{t_{\text{before}}}\)

Expression for the realized speedupsThis is Amdahl’s Law: \[ \tau = \frac{1}{\frac{\rho}{\alpha} + 1 - \rho} \]

Managing Expectations – Theory

Conservative but typical realized speedups: \(\tau \approx 6\)
Conservative but typical naive speedups: \(\alpha \approx 60\)

Estimate fraction refactored, \(\rho\), by rearranging Amdahl’s Law

\[ \rho = \frac{1 - \frac{1}{\tau}}{1 - \frac{1}{\alpha}} \approx 0.85 \]

Performance Enhancements in Practice

Removing performance bottlenecks rarely results in speed ups
People try even more challenging computations
Measuring performance gains in units of ‘times speed-up’ misses the point
More of a binary thing:

Has the performance optimization allowed people to do things that they couldn’t do before?

Testimonial

I just wanted to send out this email to express how impressed/happy I am running the new TMB version.

It is insanely fast and definitely saves me hours everyday and allows me to explore more scenarios.

I can do many more things I couldn’t before due to computation limitations.

Thank you, you guys are heroes!

Increasing Model Flexibility

Original McMasterPandemic Architecture

Create any compartment topology
Fairly general specification of rates of flow among compartments

Example – SIR

state = c(S = 20000, I = 100, R = 0)
params = c(gamma = 0.06, beta = 0.15, N = sum(state))

sir_model = (init_model(
    params, state,
    start_date = "2000-01-01",
    end_date = "2000-05-01",
  )

  %>% add_rate("S", "I", ~ (1/N) * (beta) * (I))
  %>% add_rate("I", "R", ~ (gamma))
)

Example – SIR

(sir_model
 %>% simulate
 %>% filter(variable %in% names(state))
 %>% rename(compartment = variable, state = value)
 %>% mutate(compartment = factor(compartment, names(sir_model$state)))
 %>% ggplot
  +  geom_line(aes(x = Date, y = state, colour = compartment))
)

Example – SIR

(sir_model
 %>% simulate(format = 'wide')
 %>% rename(`Force of Infection` = S_to_I)
 %>% ggplot
  +  geom_line(aes(x = Date, y = `Force of Infection`))
)

Example – SIR

Example – Matrix-Valued Rates

Create symbolic vectors and matrices

C = vec('Ca', 'Cp', '(Cm) * (1 - iso_m)', '(Cs) * (1 - iso_s)')

Use vectors and matrices to define rates

%>% add_rate("S", "E", sum(I * C * beta * inverse_N))

Keep things compact even when the model gets complex

%>% vec_rate(
  "S" %_% vax_cat,
  "E" %_% vax_cat,
  kronecker(vax_trans_red, t(baseline_trans_rates)) %*% Istate
)

Example – Erlang SEIR Model

Image from Comsol Blog, Model described in Champredon et al (2018)

Example – Erlang SEIR Model

m = 5 # number of E boxes
(chained_E_boxes = "E" %_% 1:m)

## [1] "E_1" "E_2" "E_3" "E_4" "E_5"

# sojourn through exposed compartments
  %>% rep_rate(
    chained_E_boxes[1:(m-1)],
    chained_E_boxes[2:m],
    ~ (m) * (sigma)
  )

Full example onlinehttps://canmod.net/misc/flex_intro#erlang-seir

Example – BC Omicron Model

Thank you Caroline Colijn! (unpublished)

Graph with McMasterPandemic. Code: https://canmod.net/misc/flex_intro#bc-covid-omicron

Library of Examples

https://canmod.net/misc/flex_intro

Interface Lessons

Simple models are easy anyways
Need to focus on complex structured models as the primary use case
Focus on modelling, not data manipulation or visualization – plug into standard tools

In Progress

Architecture

In Progress

Refactor data fitting components in C++
Utilize automatic differentiation
- More speed
- Better optimization steps
- Converge in fewer iterations
Flexibility in statistical methodology
- Latent variables
- Process noise
- Splines

Acknowledgements – Funding

Acknowledgements – People

Background

What does Refactoring mean?

What is McMasterPandemic?

Prediction of the Third Wave

Prediction of the Third Wave

Why Refactor McMasterPandemic?

Why Refactor? – slow forecasting

Why Refactor? – slow calbration

Why Refactor? – slow calbration

Why Refactor? – slow scenario exploration

Why Refactor? – slow scenario exploration

Why Refactor? – slow model development

What is a Compartmental Model?

Base McMasterPandemic Model

Model with Vaccination and Variants

Matrices and Linear Algebra

Matrices and Linear Algebra

Why Refactor? – slow model development

Why Refactor? – slow model development

Architecture

Architecture

Refactored Architecture

Increasing Simulation Speed

Technology Choice – TMB

Aside – Automatic Differentiation

Benchmark Example

Managing Expectations – Theory

Managing Expectations – Theory

Managing Expectations – Theory

Managing Expectations – Theory

Managing Expectations – Theory

Performance Enhancements in Practice

Testimonial

Increasing Model Flexibility

Original McMasterPandemic Architecture

Original McMasterPandemic Architecture

Example – SIR

Example – SIR

Example – SIR

Example – SIR

Example – SIR

Example – Matrix-Valued Rates

Example – Erlang SEIR Model

Example – Erlang SEIR Model

Example – BC Omicron Model

Library of Examples

Interface Lessons

In Progress

Architecture

In Progress

Thank you!