| Pathogen | Approximate R₀ | Transmission Route |
|---|---|---|
| Measles virus | 12–18 | Airborne |
| Pertussis (B. pertussis) | 12–17 | Droplet/contact |
| Chickenpox (VZV) | 8–10 | Airborne/contact |
| Mumps virus | 4–7 | Droplet |
| Rubella virus | 5–7 | Droplet |
| Polio virus | 5–7 | Faecal-oral |
| SARS-CoV-2 (original) | 2–3 | Airborne/droplet |
| SARS-CoV-2 (Delta variant) | 5–8 | Airborne |
| SARS-CoV-2 (Omicron variant) | 10–18 | Airborne |
| Seasonal influenza | 1.2–1.4 | Droplet |
| HIV (sexual transmission) | 2–5 | Contact |
| Ebola virus | 1.5–2.5 | Direct contact |
| SARS-CoV-1 | 2–5 | Droplet/aerosol |
Infectious Disease Modelling in Epidemiology
Modules 1 & 2: Foundations and the SIR Framework
MODULE 1: Foundations of Infectious Disease Epidemiology
0.1 Introduction: Why Do We Model Infectious Diseases?
Before students open R or draw a single flow diagram, they must understand why mathematical modelling exists as a discipline within epidemiology. The answer is not simply “to predict outbreaks,” though prediction is one application. The deeper answer is that infectious disease systems are dynamically complex — they involve feedback loops, threshold effects, and population-level phenomena that cannot be understood by intuition alone.
Consider a simple question: if you double the number of infectious individuals in a population, does the number of new infections double? Not necessarily. It depends on how many susceptible individuals remain, how frequently people contact one another, and whether the infectious individuals are isolated. These are non-linear, interdependent relationships. Mathematics gives us a language precise enough to capture them.
0.1.1 The Historical Imperative
The intellectual history of infectious disease modelling is inseparable from the history of epidemiology itself. Students should appreciate this lineage not as trivia, but as evidence that the questions they are learning to answer have been among the most consequential in the history of human health.
John Snow (1854) is often celebrated as the father of epidemiology, not because he built a mathematical model, but because he thought spatially and causally about cholera transmission — mapping cases around the Broad Street pump in Soho, London. Snow’s work was implicitly a model: he hypothesised a transmission mechanism (waterborne), identified an exposure source, and tested the hypothesis by removing the pump handle. This logical structure — mechanism, exposure, test — underlies all infectious disease modelling.
Ronald Ross (1911) was among the first to apply differential equations to an infectious disease problem. Working on malaria transmission between humans and mosquitoes, Ross derived mathematical conditions under which malaria could be eradicated. His crucial insight was the threshold theorem: malaria could be eliminated not by reducing mosquito numbers to zero, but by reducing them below a critical threshold. This was a paradigm-shifting result, demonstrating that mathematical analysis could generate non-obvious, policy-relevant conclusions.
Kermack and McKendrick (1927) generalised Ross’s ideas into the foundational paper of modern epidemic theory. Their 1927 paper, “A Contribution to the Mathematical Theory of Epidemics,” introduced what we now call the SIR model and proved the epidemic threshold theorem: an epidemic can only occur if the density of susceptibles exceeds a critical value. Nearly a century later, this paper remains one of the most cited in epidemiology.
The COVID-19 pandemic (2020–2022) brought infectious disease modelling into public consciousness with extraordinary force. Models from groups like the Imperial College London COVID-19 Response Team directly informed national lockdown decisions. They also exposed the limitations of models when data were sparse, parameters uncertain, and political pressures intense. Students entering this field today inherit both the power and the responsibility that this history represents.
0.1.2 What Models Can and Cannot Do
A recurring theme throughout this course — and one that must be established clearly in Module 1 — is the epistemological status of models. Students must understand the following:
Models are simplifications by design. The statistician George Box wrote: “All models are wrong, but some are useful.” A model that perfectly reproduced every individual interaction in a population would be as complex as the population itself and therefore useless as an analytical tool. The art of modelling is deciding what to leave out.
Models generate hypotheses, not facts. A model predicts what would happen under a specified set of assumptions. Whether those assumptions hold in the real world is an empirical question.
Models are tools for structured thinking. Even a model that is never fitted to data forces the modeller to specify assumptions explicitly. What drives transmission? How long are people infectious? Are all age groups equally susceptible?
Models can be wrong in important ways. Omitting heterogeneity — age structure, spatial clustering, behavioural variation — can lead to systematically biased predictions. This is a reason to model carefully and communicate uncertainty honestly, not a reason to avoid modelling.
0.2 The Epidemiological Triad
Every infectious disease event involves the interaction of three elements: a host, an agent, and an environment. This framework, known as the epidemiological triad, is one of the oldest conceptual tools in epidemiology. While modern infectious disease modelling often transcends this simple framework, it provides an indispensable starting point.
0.2.1 The Host
The host is the organism — in most contexts a human being — that is susceptible to infection. Host characteristics that influence transmission dynamics include:
Immunological status. A host that has previously been infected by a pathogen, or vaccinated against it, may have partial or complete immunity. Immunity reduces the probability of infection upon exposure and reduces the infectious period if infection does occur. The distribution of immune states across a population is one of the most important determinants of whether an epidemic can occur.
Age. Age influences both susceptibility (younger children may lack immune memory; older adults may have weakened immune responses) and exposure (children mix intensively in schools; adults mix differently in workplaces). Age-structured models, introduced in Module 6, capture these differences formally.
Behaviour. Sexual behaviour (relevant for HIV, gonorrhoea, HPV), hand hygiene (relevant for cholera, norovirus), and mask-wearing (relevant for influenza, SARS-CoV-2) all modify transmission probabilities. Behaviour is among the hardest quantities to parameterise, yet among the most important.
Nutritional status and comorbidities. Malnutrition, HIV co-infection, and diabetes all increase susceptibility to certain pathogens (e.g., tuberculosis). In low-income settings, these interactions can substantially modify model predictions.
0.2.2 The Agent
The agent is the pathogen: a virus, bacterium, parasite, fungus, or prion. Key agent characteristics include:
Infectivity. The probability that an exposed host becomes infected. This is distinct from transmissibility, which refers to the probability of transmission occurring during a contact between an infectious and susceptible individual.
Pathogenicity. The proportion of infected individuals who develop clinical disease. A highly pathogenic organism produces severe illness in most infected individuals; a low-pathogenicity organism may infect widely while causing few recognisable cases.
Virulence. The severity of disease produced in those who become ill. Virulence is related to, but distinct from, pathogenicity. Ebola virus has both high pathogenicity and high virulence; many rhinoviruses (common cold) have high pathogenicity but low virulence.
Antigenicity and mutation rate. Pathogens that mutate rapidly (e.g., influenza A, SARS-CoV-2) can evade prior immunity, complicating vaccination strategies and enabling repeated epidemics in the same population.
Survival outside the host. Some pathogens survive for extended periods on surfaces (norovirus, Clostridioides difficile), in water (cholera vibrio, Cryptosporidium), or in aerosols (measles virus, Mycobacterium tuberculosis). Survival determines the viability of indirect transmission routes.
0.2.3 The Environment
The environment encompasses all extrinsic factors that influence the probability and frequency of host–agent contact.
Physical environment. Temperature and humidity affect pathogen survival and vector biology. Malaria transmission is strongly seasonal in sub-Saharan Africa, peaking with rainfall that creates vector breeding sites. Influenza epidemics are concentrated in winter months in temperate climates, partly because cold, dry air favours viral survival in aerosols and partly because people congregate indoors.
Social and built environment. Population density, housing quality, sanitation infrastructure, and health system capacity all mediate transmission. Cholera thrives where water treatment is inadequate. Tuberculosis spreads in overcrowded households and poorly ventilated congregate settings.
Healthcare environment. Hospitals are environments where vulnerable people concentrate and pathogen exposure risks are elevated. Healthcare-associated infections (HAIs) — including MRSA and C. difficile — require models that incorporate healthcare contact patterns distinct from community transmission.
0.3 Transmission Dynamics: The Mechanics of Spread
Understanding how pathogens move between hosts is the mechanistic foundation of all infectious disease models.
0.3.1 Direct Transmission
Contact transmission occurs through physical touching, including sexual contact. HIV, gonorrhoea, and syphilis are transmitted primarily through sexual contact.
Droplet transmission occurs when large respiratory droplets (>5 μm) produced by coughing, sneezing, or talking travel short distances (generally <1–2 metres) and are deposited on mucous membranes of a susceptible host. Influenza, pertussis, and meningococcal disease can be transmitted by this route.
Airborne transmission involves smaller particles (<5 μm, sometimes called droplet nuclei) that remain suspended in the air for extended periods and can travel beyond 1–2 metres. Measles, varicella (chickenpox), and tuberculosis are classic examples.
0.3.2 Indirect Transmission
Vehicle-borne transmission occurs through contaminated inanimate objects (fomites) or substances. Cholera and typhoid are transmitted through contaminated water; Salmonella, Campylobacter, and Shiga toxin-producing E. coli spread through contaminated food.
Vector-borne transmission involves a living intermediary — most commonly an arthropod — that carries the pathogen from one host to another:
- Mechanical vectors: the pathogen is carried on the vector’s surface without replication (e.g., houseflies carrying enteric pathogens).
- Biological vectors: the pathogen undergoes replication or development within the vector (e.g., Plasmodium species completing part of their life cycle in Anopheles mosquitoes; dengue virus replicating in Aedes aegypti).
Vertical transmission refers to transmission from parent to offspring, either in utero (congenital), during delivery (perinatal), or through breastfeeding (postnatal). HIV, cytomegalovirus, and rubella can all be transmitted vertically.
0.3.3 Transmission Probability and the Force of Infection
The force of infection (λ, lambda) is the per capita rate at which susceptible individuals acquire infection per unit time. It is a function of:
- The prevalence of infectious individuals in the population
- The contact rate between susceptible and infectious individuals
- The per-contact probability of transmission
Mathematically, in the simplest case:
\[\lambda = \beta \cdot \frac{I}{N}\]
where β is the transmission rate, \(I\) is the number of infectious individuals, and \(N\) is the total population size. This expression is the engine of the SIR model and will be derived carefully in Module 2.
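As a minimal sketch of the expression above, the following R snippet evaluates λ for one set of illustrative values. The function name `force_of_infection` and the numbers (β = 0.5 per day, 100 infectious individuals in a population of 10,000) are assumptions chosen for illustration, not estimates for any real pathogen.

```r
# Force of infection under frequency-dependent transmission: lambda = beta * I / N
# (function name and parameter values are illustrative assumptions)
force_of_infection <- function(beta, I, N) {
  stopifnot(N > 0, I >= 0, I <= N)
  beta * I / N
}

lambda <- force_of_infection(beta = 0.5, I = 100, N = 10000)
lambda             # per capita infection rate per day
1 - exp(-lambda)   # probability that a susceptible is infected within one day
```

Because λ is a rate rather than a probability, the last line converts it to a one-day infection probability; the two are nearly equal only when λ is small.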
0.4 Natural History of Infection
The natural history of an infectious disease describes the typical progression of infection in an individual host from initial exposure through to resolution (recovery, chronic infection, or death) in the absence of intervention.
0.4.1 Stages of Infection
| Stage | Definition |
|---|---|
| Exposure | Contact with the pathogen; does not guarantee infection |
| Incubation period | Time from infection to onset of clinical symptoms |
| Latent period | Time from infection to onset of infectiousness |
| Infectious period | Duration during which the host can transmit the pathogen |
| Resolution | Recovery (with or without immunity) or death |
Exposure. A susceptible individual encounters the pathogen. Exposure does not guarantee infection; it depends on the infectious dose, route of exposure, and host immunological status.
Incubation period. The time between initial infection and the onset of clinical symptoms. The incubation period varies by pathogen: hours to days for Staphylococcus aureus food poisoning, 2–14 days for COVID-19, 1–3 weeks for measles, months for tuberculosis.
Latent period. The time between infection and the onset of infectiousness. In compartmental models this is the time an individual spends in the E (Exposed) compartment: infected, but not yet able to transmit the pathogen.
The latent period and incubation period are not the same:
Pre-symptomatic transmission occurs when the latent period is shorter than the incubation period. The host becomes infectious before symptoms appear. SARS-CoV-2 and influenza A both exhibit substantial pre-symptomatic transmission — individuals have no reason to isolate and may interact normally.
Asymptomatic transmission occurs when some infected individuals never develop symptoms yet remain infectious. Estimates suggest 30–40% of SARS-CoV-2 infections were asymptomatic.
Infectious period. The duration over which an infected individual can transmit the pathogen to susceptible contacts. This period is parameterised in models as \(1/\gamma\) (gamma), where γ is the recovery rate.
0.4.2 Clinical Spectrum of Infection
For any given pathogen, there is typically a spectrum of outcomes: subclinical (asymptomatic) infection → mild disease → moderate disease → severe disease → critical disease → death. This spectrum has direct implications for surveillance. Because mild and asymptomatic cases are often not diagnosed or reported, surveillance data systematically undercount true infections, an ascertainment bias that must be accounted for when fitting models to reported case data.
0.5 Epidemic, Endemic, and Pandemic: Precise Definitions
| Term | Definition | Model Interpretation |
|---|---|---|
| Epidemic | Cases in excess of what is normally expected in a defined area or season | Exponential growth phase; \(R_t > 1\) |
| Endemic | Relatively constant disease presence in a population over time | Stable equilibrium; incidence neither grows nor declines |
| Outbreak | Geographically confined epidemic (hospital, community, district) | Localised epidemic curve |
| Pandemic | Epidemic spanning multiple countries or continents | Multi-patch spread; global \(R_t > 1\) |
0.6 The Basic Reproduction Number (R₀): Concept Before Formula
No concept in infectious disease epidemiology is more central — or more frequently misunderstood — than the basic reproduction number, R₀ (pronounced “R-naught”).
0.6.1 Intuitive Definition
R₀ is the average number of secondary infections generated by a single infectious individual introduced into a fully susceptible population, in the absence of interventions or prior immunity.
Every word of this definition matters:
- Average: R₀ is a population-level average. Individual variation in infectiousness can be enormous (the concept of superspreading, discussed below).
- Single infectious individual: R₀ describes what happens from a single index case, not the rate of growth of an established epidemic.
- Fully susceptible population: If any proportion of the population is already immune, the effective reproduction number is lower than R₀.
- Absence of interventions: R₀ describes the intrinsic biology of the host–pathogen interaction.
0.6.2 Interpreting R₀
The threshold nature of R₀ is its most important property:
- R₀ < 1: Each infectious individual generates fewer than one new infection on average. Chains of transmission are self-limiting; the pathogen cannot sustain itself.
- R₀ = 1: The infection is at a critical threshold; neither growing nor declining.
- R₀ > 1: Each infectious individual generates more than one new infection. The infection can spread; the larger R₀, the faster the growth.
0.6.3 What Determines R₀?
R₀ emerges from the interaction of three factors:
\[R_0 = \beta_c \cdot c \cdot D = \frac{\beta}{\gamma}\]
where:
- \(\beta_c\) = transmission probability per contact
- \(c\) = contact rate (contacts per unit time)
- \(D = 1/\gamma\) = mean duration of infectiousness
- \(\beta = \beta_c \times c\) = overall transmission rate
The crucial implication: R₀ can be reduced by targeting any of its three components:
| Intervention | Component targeted |
|---|---|
| Masks, antivirals, condoms | Reduce \(\beta_c\) |
| Social distancing, school closures, travel restrictions | Reduce \(c\) |
| Treatment shortening infectious period, isolation | Reduce \(D\) |
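The multiplicative structure of R₀ can be sketched in a few lines of R. The helper `r0_from_components` and all parameter values below are hypothetical, chosen only to show that halving any one component halves R₀.

```r
# R0 as the product of its three components (all values are illustrative assumptions)
r0_from_components <- function(p_contact, contact_rate, duration) {
  p_contact * contact_rate * duration
}

baseline <- r0_from_components(p_contact = 0.05, contact_rate = 10, duration = 5)

# Halve each component in turn; because R0 is a product, each halving has the same effect
scenarios <- c(
  baseline        = baseline,
  halve_p_contact = r0_from_components(0.025, 10, 5),
  halve_contacts  = r0_from_components(0.05, 5, 5),
  halve_duration  = r0_from_components(0.05, 10, 2.5)
)
scenarios
```

In practice the three components are rarely equally easy to change, which is why the choice of intervention depends on context rather than on the formula alone.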
0.6.4 Approximate R₀ Values for Common Pathogens
Measles has one of the highest R₀ values of any human pathogen, which explains why achieving herd immunity against measles requires vaccination coverage of approximately 95% of the population.
0.6.5 Superspreading and the Limits of R₀
R₀ is a mean, and means can be misleading when underlying distributions are highly skewed. Empirical evidence suggests that for many pathogens — including SARS-CoV-1, SARS-CoV-2, Ebola, and tuberculosis — the distribution of individual reproductive numbers is overdispersed: most infected individuals cause few or no secondary infections, while a small minority (“superspreaders”) generate a disproportionate number of cases.
The negative binomial distribution is commonly used to model overdispersed transmission, with a dispersion parameter \(k\): small values of \(k\) indicate high overdispersion. SARS-CoV-1 was estimated to have \(k \approx 0.16\), implying extreme superspreading — roughly 20% of cases were responsible for 80% of transmission.
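To make overdispersion concrete, the sketch below simulates secondary case counts from a negative binomial distribution using base R's `rnbinom()`. The dispersion value k = 0.16 is the SARS-CoV-1 estimate quoted above; the mean R₀ = 3, the number of simulated cases, and the seed are assumptions made for illustration.

```r
# Offspring distribution: negative binomial with mean R0 and dispersion k
# k = 0.16 is the estimate quoted in the text; R0 = 3 and n_cases are assumptions
set.seed(1)
R0 <- 3
k <- 0.16
n_cases <- 100000

secondary <- rnbinom(n_cases, size = k, mu = R0)

# Share of all transmission attributable to the most infectious 20% of cases
ranked <- sort(secondary, decreasing = TRUE)
top20 <- ranked[seq_len(round(0.2 * n_cases))]
share_top20 <- sum(top20) / sum(secondary)

# Proportion of cases that infect nobody
prop_zero <- mean(secondary == 0)

c(mean_offspring = mean(secondary),
  prop_zero = prop_zero,
  share_top20 = share_top20)
```

The exact share depends on the assumed R₀ and on the random seed, but with k this small most simulated cases transmit to no one and a small minority accounts for the bulk of transmission.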
0.7 R Implementation: Module 1
0.7.1 Session 1.1: Project Setup and Package Loading
# Install packages if needed (run once)
# install.packages(c("tidyverse", "incidence2", "lubridate"))
library(tidyverse)
library(incidence2)
library(lubridate)
0.7.2 Session 1.2: Epidemic Curve Visualisation with incidence2
The epidemic curve (epi curve) is the most fundamental descriptive tool in outbreak investigation. It plots case counts against time and communicates — at a glance — the phase of an epidemic (growth, plateau, decline), the likely incubation period, and potential exposure events.
# Simulated linelist: date of symptom onset for 200 cases
set.seed(42)
linelist <- data.frame(
  id = 1:200,
  date_onset = as.Date("2024-01-01") +
    round(rgamma(200, shape = 3, rate = 0.15))
)
# Build incidence object (weekly intervals)
epi_curve <- incidence(
  x = linelist,
  date_index = "date_onset",
  interval = "week"
)
# Plot epidemic curve
plot(epi_curve) +
  labs(
    title = "Simulated Outbreak: Weekly Epidemic Curve",
    subtitle = "Gamma-distributed symptom onset dates (n = 200 cases)",
    x = "Week of symptom onset",
    y = "Number of cases"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
From the epidemic curve alone, students should be able to identify:
- The epidemic type (point source vs propagated): a point source outbreak shows a sharp rise and decline following a single exposure; a propagated outbreak shows a more gradual, wave-like pattern.
- The approximate epidemic peak.
- Whether the epidemic is growing, plateauing, or declining at any given point.
0.7.3 Session 1.3: Calculating Descriptive Epidemiological Parameters
# Attack rate
total_population <- 5000
total_cases <- 200
attack_rate <- total_cases / total_population
cat("Attack rate:", round(attack_rate * 100, 2), "%\n")
#> Attack rate: 4 %
# Case fatality rate (CFR)
total_deaths <- 12
CFR <- total_deaths / total_cases
cat("CFR:", round(CFR * 100, 2), "%\n")
#> CFR: 6 %
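# Doubling time from early growth phase (naive exponential estimate)
# Weeks start on Monday (week_start = 1) so that the three-week window from
# 2024-01-01 (a Monday) contains only complete weeks. With the default
# Sunday start, the window ends in a one-day partial week whose low count
# can drive the fitted growth rate, and hence the doubling time, negative.
early_cases <- linelist %>%
  filter(date_onset <= as.Date("2024-01-21")) %>% # first 3 complete weeks
  count(week = lubridate::floor_date(date_onset, "week", week_start = 1)) %>%
  filter(n > 0)
# Fit simple exponential growth model: log(cases) is linear in time (days)
model <- lm(log(n) ~ as.numeric(week), data = early_cases)
growth_rate <- unname(coef(model)[2]) # per day
doubling_time <- log(2) / growth_rate
cat("Estimated doubling time:", round(doubling_time, 1), "days\n")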
0.8 Module 1 References
- Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, 115(772), 700–721.
- Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
- Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
- Bjørnstad, O. N. (2018). Epidemics: Models and Data Using R. Springer.
- Gordis, L. (2014). Epidemiology (5th ed.). Elsevier Saunders.
- Diekmann, O., Heesterbeek, H., & Britton, T. (2013). Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press.
- Dietz, K. (1993). The estimation of the basic reproduction number for infectious diseases. Statistical Methods in Medical Research, 2(1), 23–41.
- Heesterbeek, J. A. P. (2002). A brief history of R₀ and a recipe for its calculation. Acta Biotheoretica, 50(3), 189–204.
- Delamater, P. L., Street, E. J., Leslie, T. F., Yang, Y. T., & Jacobsen, K. H. (2019). Complexity of the basic reproduction number (R₀). Emerging Infectious Diseases, 25(1), 1–4.
- Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005). Superspreading and the effect of individual variation on disease emergence. Nature, 438(7066), 355–359.
- Kamvar, Z. N., Cai, J., Pulliam, J. R. C., Schumacher, J., & Jombart, T. (2019). Epidemic curves made easy using the R package incidence. F1000Research, 8, 139.
MODULE 2: Compartmental Models — The SIR Framework
0.9 The Logic of Compartmental Modelling
A compartmental model divides a population into mutually exclusive subgroups — compartments — defined by their status with respect to the infection of interest. Individuals move between compartments according to mathematically specified rates. The model tracks the size of each compartment over time and produces predictions about epidemic dynamics.
The elegance of this approach lies in its parsimony. Rather than tracking every individual in a population (computationally expensive and often unnecessary), we track the aggregate flow between states. This is justified under the mean-field assumption: that individuals within a compartment are identical and mix randomly with individuals in all other compartments.
0.9.1 Population-Level vs Individual-Level Thinking
A critical conceptual transition for students — particularly those trained in clinical medicine or individual-level epidemiology — is shifting from thinking about what happens to an individual to thinking about what happens to a population.
A clinician treats a patient: the patient is either infected or not, recovers or does not. A compartmental modeller asks: given that 10% of the population is currently infectious, what fraction will become infected over the next two weeks? These are fundamentally different questions requiring different frameworks.
The compartmental model, in its basic form, is deterministic: given a set of initial conditions and parameters, the model produces a single, unique trajectory. Real epidemics are stochastic (random), but the deterministic approximation is very good when population sizes are large (typically N > 1,000–10,000). Module 5 addresses stochasticity explicitly.
0.10 Flow Diagrams: The Visual Language of Compartmental Models
Before writing a single differential equation, every student should be able to draw and interpret a flow diagram. This is not a simplification for beginners — it is how professional modellers think. The flow diagram is the model, expressed visually. The differential equations are simply a translation of the diagram into mathematical notation.
0.10.1 Elements of a Flow Diagram
A flow diagram consists of:
- Boxes representing compartments, each labelled with the compartment name and its variable (S, I, R, E, etc.)
- Arrows representing flows between compartments, each carrying a rate label
- Rate labels describing how fast the flow occurs — these are the model parameters
0.10.2 The SIR Flow Diagram
The simplest epidemic model with lasting immunity involves three compartments:
\[\boxed{S} \xrightarrow{\beta I/N} \boxed{I} \xrightarrow{\gamma} \boxed{R}\]
Reading this diagram aloud: susceptible individuals (S) become infectious (I) at a rate proportional to the current prevalence of infection (\(\beta I/N\)), where β is the transmission rate. Infectious individuals recover (R) at a constant rate γ (gamma). Recovered individuals are permanently immune and leave the dynamic system.
This diagram encodes several explicit assumptions:
- The population is closed — no births, deaths, or migration; total N = S + I + R is constant.
- All susceptible individuals are equally susceptible.
- All infectious individuals are equally infectious.
- Mixing is homogeneous — every individual contacts every other individual with equal probability.
- Recovery confers permanent, complete immunity.
- There is no latent period — infection is immediately followed by infectiousness.
Each of these assumptions can be relaxed (and will be in subsequent modules), but they define the canonical SIR model as formulated by Kermack and McKendrick.
0.11 From Flow Diagram to Differential Equations
The differential equations of the SIR model are a direct translation of the flow diagram. For each compartment, the equation describes the rate of change of its size over time: inflows minus outflows.
0.11.1 Deriving the SIR Equations
For the Susceptible compartment (S):
The only flow out of S is infection. The rate of infection is \(\beta \cdot S \cdot I/N\): the transmission rate β multiplied by the number of susceptibles S multiplied by the probability that a random contact is with an infectious individual (\(I/N\)). There are no inflows.
\[\frac{dS}{dt} = -\frac{\beta S I}{N}\]
For the Infectious compartment (I):
One inflow (new infections from S) and one outflow (recovery to R):
\[\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I\]
For the Recovered compartment (R):
One inflow (recoveries from I) and no outflows:
\[\frac{dR}{dt} = \gamma I\]
0.11.2 Conservation of Population
A key sanity check: N = S + I + R should be constant over time. Adding all three equations:
\[\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = -\frac{\beta S I}{N} + \frac{\beta S I}{N} - \gamma I + \gamma I = 0\]
The terms cancel exactly, confirming that \(dN/dt = 0\) — the total population is conserved. ✓
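The conservation check can also be run numerically. The following is a minimal sketch in base R that integrates the three equations with a fixed-step fourth-order Runge–Kutta scheme and confirms that S + I + R stays at its initial value. The function names `sir_rhs` and `simulate_sir`, the step size, and the parameter values (β = 0.5, γ = 0.1, N = 10,000) are assumptions for illustration; a dedicated ODE solver would normally be used in practice.

```r
# Right-hand side of the SIR equations (frequency-dependent incidence)
sir_rhs <- function(state, beta, gamma) {
  S <- state[["S"]]
  I <- state[["I"]]
  R <- state[["R"]]
  N <- S + I + R
  infection <- beta * S * I / N   # flow S -> I
  recovery  <- gamma * I          # flow I -> R
  c(S = -infection, I = infection - recovery, R = recovery)
}

# Fixed-step fourth-order Runge-Kutta integrator (illustrative, not adaptive)
simulate_sir <- function(init, beta, gamma, t_end = 160, dt = 0.1) {
  times <- seq(0, t_end, by = dt)
  out <- matrix(NA_real_, nrow = length(times), ncol = 3,
                dimnames = list(NULL, c("S", "I", "R")))
  state <- init
  out[1, ] <- state
  for (i in seq_along(times)[-1]) {
    k1 <- sir_rhs(state, beta, gamma)
    k2 <- sir_rhs(state + dt / 2 * k1, beta, gamma)
    k3 <- sir_rhs(state + dt / 2 * k2, beta, gamma)
    k4 <- sir_rhs(state + dt * k3, beta, gamma)
    state <- state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[i, ] <- state
  }
  data.frame(time = times, out)
}

sim <- simulate_sir(c(S = 9990, I = 10, R = 0), beta = 0.5, gamma = 0.1)

# Conservation check: S + I + R should equal N = 10000 at every time step
total <- sim$S + sim$I + sim$R
range(total)
```

Any drift in `total` beyond floating-point rounding would indicate a coding error in the flows, which is why this check is worth running whenever a compartment or flow is added.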
0.11.3 The Mass Action Principle
The infection term \(\beta SI/N\) embodies the mass action principle, borrowed from chemical kinetics. Two formulations exist:
| Formulation | Expression | When to use |
|---|---|---|
| Frequency-dependent (standard incidence) | \(\beta \cdot S \cdot I/N\) | Contact rates do not increase with population size (most human diseases) |
| Density-dependent (mass action incidence) | \(\beta \cdot S \cdot I\) | Contact rates increase with population density (e.g., some wildlife and livestock diseases) |
With frequency-dependent incidence, R₀ is independent of population size. With density-dependent incidence, R₀ scales with N. Students frequently confuse these formulations — the choice matters substantially for model predictions.
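The difference can be sketched numerically. Linearising each formulation at \(S \approx N\) gives \(R_0 = \beta/\gamma\) for frequency-dependent incidence and \(R_0 = \beta N/\gamma\) for density-dependent incidence. The parameter values below are illustrative assumptions; note that β carries different units in the two formulations.

```r
# R0 under the two incidence formulations (parameter values are illustrative assumptions)
gamma   <- 0.1    # recovery rate, per day
beta_fd <- 0.3    # frequency-dependent transmission rate, per day
beta_dd <- 3e-5   # density-dependent transmission rate, per person per day

N <- c(1000, 10000, 100000)

r0_table <- data.frame(
  N = N,
  R0_frequency = rep(beta_fd / gamma, length(N)),  # independent of N
  R0_density   = beta_dd * N / gamma               # scales linearly with N
)
r0_table
```

With these values the two formulations agree only at N = 10,000; in smaller or larger populations the density-dependent model implies a very different epidemic potential.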
0.12 Parameters of the SIR Model
0.12.1 The Transmission Rate (β)
Biological meaning: β encodes the combined effects of the contact rate and per-contact transmission probability:
\[\beta = c \cdot p\]
where \(c\) is the average number of potentially infectious contacts per unit time, and \(p\) is the probability of transmission per contact.
Units: per unit time (e.g., day⁻¹) under the frequency-dependent formulation used here; under density-dependent incidence, β instead has units of per person per unit time.
Example: If β = 0.5 day⁻¹ and I/N = 0.1, then the force of infection is 0.05 per day: roughly 5% of susceptibles become infected each day.
Estimation:
- From early epidemic growth rate combined with serial interval data
- From household secondary attack rates combined with contact frequency data
- By fitting the full SIR model to incidence data (Module 9)
0.12.2 The Recovery Rate (γ)
Biological meaning: γ is the per capita rate at which infectious individuals recover per unit time. The mean infectious period is \(1/\gamma\).
Units: per unit time (e.g., day⁻¹).
Example: If γ = 0.1 day⁻¹, then roughly 10% of currently infectious individuals recover each day, and the mean infectious period is \(1/0.1 = 10\) days.
Under the SIR formulation, the time spent in the I compartment follows an exponential distribution with rate γ. The exponential distribution has a memoryless property: the probability of recovering today does not depend on how long the individual has already been infectious. This is a mathematical convenience, not a biological truth. More realistic distributions (gamma, Weibull) can be incorporated with the SEIR and related models (Module 3).
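The exponential assumption can be explored directly with base R's `pexp()`. This sketch uses γ = 0.1 per day from the example above; the seven-day conditioning time is an arbitrary choice made for illustration.

```r
gamma <- 0.1   # recovery rate per day (the value used in the example above)

mean_infectious_period <- 1 / gamma    # mean of an exponential with rate gamma
p_recover_one_day <- 1 - exp(-gamma)   # probability of recovering within one day

# Memorylessness: the chance of recovering during the next day is the same
# whether the individual has been infectious for 0 days or for 7 days
p_next_day_given_7 <- (pexp(8, rate = gamma) - pexp(7, rate = gamma)) /
  (1 - pexp(7, rate = gamma))

c(mean_infectious_period = mean_infectious_period,
  p_recover_one_day = p_recover_one_day,
  p_next_day_given_7 = p_next_day_given_7)
```

The one-day recovery probability is slightly below γ itself because γ is a continuous-time rate, and the two conditional probabilities coincide, which is exactly the memoryless property described above.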
0.12.3 Deriving R₀ from the SIR Model
Having established the model structure, we can now derive R₀ rigorously. Consider the equation for \(dI/dt\):
\[\frac{dI}{dt} = I\left(\frac{\beta S}{N} - \gamma\right)\]
At the very beginning of an epidemic, virtually the entire population is susceptible: \(S \approx N\). Therefore:
\[\frac{dI}{dt} \approx I(\beta - \gamma) = \gamma I\left(\frac{\beta}{\gamma} - 1\right)\]
The infection grows (\(dI/dt > 0\)) if and only if \(\beta/\gamma > 1\). We therefore define:
\[\boxed{R_0 = \frac{\beta}{\gamma}}\]
This is a clean, model-derived definition: R₀ is the ratio of the rate at which new infections are generated (β) to the rate at which infections are resolved (γ). Each infectious individual generates new infections at rate β for an average duration of \(1/\gamma\), giving \(\beta \times (1/\gamma) = \beta/\gamma\) secondary infections.
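The threshold can be illustrated with the early-phase linearisation, under which \(I(t) \approx I_0 e^{(\beta - \gamma)t}\). The helper `early_infectious` and the parameter values below are assumptions for illustration, and the approximation holds only while \(S \approx N\).

```r
# Early-phase linearisation: dI/dt ~ (beta - gamma) * I,
# so I(t) ~ I0 * exp((beta - gamma) * t) while S is close to N
# (helper name and parameter values are illustrative assumptions)
early_infectious <- function(t, I0, beta, gamma) {
  I0 * exp((beta - gamma) * t)
}

gamma <- 0.1
betas <- c(below_threshold = 0.08, at_threshold = 0.10, above_threshold = 0.25)

summary_table <- data.frame(
  beta = betas,
  R0 = betas / gamma,
  I_day_30 = early_infectious(t = 30, I0 = 10, beta = betas, gamma = gamma)
)
summary_table
```

Starting from 10 infectious individuals, the count after 30 days shrinks when R₀ < 1, stays constant when R₀ = 1, and grows when R₀ > 1.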
0.12.4 The Epidemic Threshold Theorem
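The epidemic threshold theorem states that, in a fully susceptible population, a major epidemic can occur if and only if R₀ > 1. More generally, the number of infectious individuals initially grows if and only if the initial proportion of susceptibles \(S(0)/N\) exceeds \(1/R_0\).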
Corollary 1: Not everyone gets infected. Even when R₀ >> 1, the epidemic does not infect the entire population. As infection spreads, the susceptible pool depletes, reducing the force of infection. Eventually, susceptibles fall below \(N/R_0\) and the epidemic begins to decline. The final size of the epidemic is determined by the final size equation (discussed below).
Corollary 2: Herd immunity. If a proportion \(v\) of the population is vaccinated (assuming perfect vaccine efficacy), the effective reproduction number at the start of an outbreak becomes \(R_0(1-v)\). For the epidemic to be controlled, vaccination coverage must exceed a critical threshold \(v_c\):
\[R_0(1-v) < 1 \iff v > v_c = 1 - \frac{1}{R_0}\]
| Pathogen | R₀ | Herd Immunity Threshold (\(v_c\)) |
|---|---|---|
| Measles | ~15 | ~93% |
| Pertussis | ~15 | ~93% |
| COVID-19 (original) | ~2.5 | ~60% |
| COVID-19 (Omicron) | ~12 | ~92% |
| Seasonal influenza | ~1.3 | ~23% |
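The thresholds in the table follow directly from \(v_c = 1 - 1/R_0\). The sketch below reproduces them in R from the R₀ values listed; the function name `herd_immunity_threshold` is an assumption, and perfect vaccine efficacy is assumed as in the text.

```r
# Herd immunity threshold v_c = 1 - 1/R0, using the R0 values from the table above
# (function name is an illustrative assumption; perfect vaccine efficacy is assumed)
herd_immunity_threshold <- function(R0) {
  stopifnot(all(R0 > 1))
  1 - 1 / R0
}

R0 <- c(
  "Measles"             = 15,
  "Pertussis"           = 15,
  "COVID-19 (original)" = 2.5,
  "COVID-19 (Omicron)"  = 12,
  "Seasonal influenza"  = 1.3
)

vc_percent <- round(100 * herd_immunity_threshold(R0))
vc_percent
```

Because \(v_c\) rises steeply as R₀ increases from 1 and then flattens, small uncertainty in a low R₀ changes the required coverage far more than the same uncertainty in a high R₀.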
0.12.5 The Final Size Equation
The final proportion of the population ever infected (\(z\)) satisfies:
\[1 - z = e^{-R_0 z}\]
This transcendental equation has no closed-form solution in terms of elementary functions, but it can be solved numerically:
# Numerically solve the final size equation for a range of R0 values
library(ggplot2)  # needed for the plot below

final_size_solve <- function(R0) {
  # uniroot finds z such that (1 - z) - exp(-R0 * z) = 0.
  # The interval starts just above 0 to exclude the trivial root z = 0.
  f <- function(z) (1 - z) - exp(-R0 * z)
  sol <- uniroot(f, interval = c(1e-6, 1 - 1e-6))
  sol$root
}

R0_vals <- seq(1.01, 10, by = 0.05)
final_size <- sapply(R0_vals, final_size_solve)
fs_df <- data.frame(R0 = R0_vals, attack_rate = final_size * 100)

ggplot(fs_df, aes(x = R0, y = attack_rate)) +
  geom_line(colour = "#c0392b", linewidth = 1.3) +
  geom_hline(yintercept = c(58, 80, 89), linetype = "dashed",
             colour = "grey50", linewidth = 0.7) +
  annotate("text", x = 9.8, y = 60, label = "R₀=1.5 → 58%",
           size = 3.5, hjust = 1, colour = "#555555") +
  annotate("text", x = 9.8, y = 82, label = "R₀=2.0 → 80%",
           size = 3.5, hjust = 1, colour = "#555555") +
  annotate("text", x = 9.8, y = 91, label = "R₀=2.5 → 89%",
           size = 3.5, hjust = 1, colour = "#555555") +
  labs(
    title = "Final Epidemic Size as a Function of R₀",
    subtitle = "Derived from the Kermack–McKendrick final size equation: 1 − z = exp(−R₀z)",
    x = "Basic reproduction number (R₀)",
    y = "Final attack rate (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

An R₀ of just 2 leads to about 80% of the population being infected in the absence of interventions or prior immunity. This is a counterintuitive result for many students. The implication is that interventions that reduce R₀, even without eliminating transmission, can have an enormous impact on total disease burden.
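The attack rates quoted for R₀ = 1.5, 2.0 and 2.5 can be checked by solving the final size equation at those three values. This is a minimal sketch: the helper name `attack_rate` is hypothetical, and the tighter `tol` is a choice made here for the check rather than part of the plotting code.

```r
# Hypothetical helper: non-zero root of 1 - z = exp(-R0 * z)
attack_rate <- function(R0) {
  stopifnot(R0 > 1)   # for R0 <= 1 the only root in [0, 1] is z = 0
  f <- function(z) (1 - z) - exp(-R0 * z)
  uniroot(f, interval = c(1e-6, 1 - 1e-6), tol = 1e-10)$root
}

R0_check <- c(1.5, 2.0, 2.5)
z <- sapply(R0_check, attack_rate)

# Attack rate as a percentage, rounded to whole numbers as in the plot labels
check_df <- data.frame(R0 = R0_check, attack_rate_pct = round(100 * z))
print(check_df)
```

Read from right to left, the same numbers show the benefit of control: lowering R₀ from 2.5 to 1.5 reduces the final attack rate by roughly 30 percentage points even though transmission is not eliminated.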
0.13 Dynamics of the SIR Model
0.13.1 Qualitative Behaviour
The SIR model produces a characteristic epidemic trajectory across four phases:
- Exponential growth phase: When \(S \approx N\) and \(I\) is small, \(dI/dt \approx \gamma I(R_0 - 1)\), and \(I\) grows approximately exponentially. The growth rate is \(r = \gamma(R_0 - 1)\).
- Peak: The epidemic peaks when \(dI/dt = 0\), i.e., when \(\beta S/N = \gamma\), i.e., when \(S = N/R_0\). At this moment, depletion of susceptibles has reduced transmission to exactly the replacement rate.
- Decline: Once \(S < N/R_0\), new infections are generated more slowly than recoveries occur, and \(I\) begins to decline. Note: R₀ itself has not changed; the effective reproduction number \(R_t = R_0 \cdot S/N\) has fallen below 1.
- Epidemic burnout: \(I\) declines toward zero, but \(S\) does not reach zero. A residual pool of susceptibles always remains: too few to sustain transmission, but not zero.
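To make the phase boundaries concrete, the sketch below evaluates the early growth rate, the peak threshold \(N/R_0\), and the sign of \(dI/dt\) on either side of that threshold. The parameter values (β = 0.4, γ = 0.1 per day, N = 10,000), the fixed I = 100, and the helper name `dI_dt` are assumptions for illustration.

```r
# Illustrative parameters (assumed for this example)
beta  <- 0.4     # transmission rate, per day
gamma <- 0.1     # recovery rate, per day
N     <- 10000   # population size

R0     <- beta / gamma
r      <- gamma * (R0 - 1)   # exponential growth rate in the early phase
S_peak <- N / R0             # susceptibles remaining when I(t) peaks

# Hypothetical helper: rate of change of I for given S and I
dI_dt <- function(S, I) beta * S * I / N - gamma * I

# Evaluate above, at, and below the threshold S = N / R0, holding I = 100
phase_check <- data.frame(S = c(9000, S_peak, 1000))
phase_check$dI_dt <- dI_dt(phase_check$S, I = 100)
phase_check$Rt    <- R0 * phase_check$S / N
print(phase_check)
```

The sign of `dI_dt` flips exactly where `Rt` crosses 1, which is the boundary between the growth and decline phases.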
0.13.2 The Effective Reproduction Number Rₜ
The effective reproduction number at time \(t\) is:
\[R_t = R_0 \cdot \frac{S(t)}{N}\]
- At \(t = 0\): \(R_t \approx R_0\) (almost the entire population is susceptible)
- When \(R_t = 1\): the number of infectious individuals \(I(t)\) is at its peak
- When \(R_t < 1\): \(I(t)\) is falling and the epidemic is declining
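The link between \(R_t\) and the direction of the epidemic follows in one step from the infectious-compartment equation of the SIR model, \(dI/dt = \beta S I/N - \gamma I\), using only \(R_0 = \beta/\gamma\) and the definition of \(R_t\) above:

\[\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I = \gamma I\left(\frac{\beta}{\gamma}\cdot\frac{S}{N} - 1\right) = \gamma I\,(R_t - 1)\]

Since \(\gamma I > 0\) whenever infection is present, the sign of \(dI/dt\) is the sign of \(R_t - 1\): the number of infectious individuals rises while \(R_t > 1\), is stationary at \(R_t = 1\), and falls once \(R_t < 1\).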
0.14 R Implementation: Module 2
0.14.1 Session 2.1: Setting Up Required Packages
# Install if needed
# install.packages(c("deSolve", "tidyverse"))
library(deSolve)
library(tidyverse)

0.14.2 Session 2.2: Implementing the SIR Model with deSolve
# ---- Step 1: Define the ODE function ----
sir_model <- function(time, state, params) {
  with(as.list(c(state, params)), {
    N <- S + I + R
    dS_dt <- -beta * S * I / N
    dI_dt <- beta * S * I / N - gamma * I
    dR_dt <- gamma * I
    return(list(c(dS_dt, dI_dt, dR_dt)))
  })
}

# ---- Step 2: Define parameters ----
params <- c(
  beta = 0.4,   # transmission rate (day^-1)
  gamma = 0.1   # recovery rate (day^-1)
  # R0 = beta/gamma = 0.4/0.1 = 4
)

# ---- Step 3: Define initial conditions ----
# 1 infectious individual in a fully susceptible population of 10,000
N <- 10000
inits <- c(S = N - 1, I = 1, R = 0)

# ---- Step 4: Define time vector ----
times <- seq(0, 200, by = 1)  # 200 days, daily steps

# ---- Step 5: Solve the ODE system ----
output <- ode(
  y = inits,
  times = times,
  func = sir_model,
  parms = params
)
sir_df <- as.data.frame(output)
head(sir_df)

  time        S        I         R
1    0 9999.000 1.000000 0.0000000
2    1 9998.534 1.349795 0.1166174
3    2 9997.904 1.821903 0.2740243
4    3 9997.054 2.459069 0.4864842
5    4 9995.908 3.318931 0.7732396
6    5 9994.361 4.479225 1.1602554
0.14.3 Session 2.3: Visualising SIR Compartment Dynamics
# Reshape to long format for ggplot
sir_long <- sir_df %>%
  pivot_longer(
    cols = c(S, I, R),
    names_to = "Compartment",
    values_to = "Count"
  ) %>%
  mutate(
    Compartment = factor(Compartment, levels = c("S", "I", "R")),
    Label = case_when(
      Compartment == "S" ~ "Susceptible",
      Compartment == "I" ~ "Infectious",
      Compartment == "R" ~ "Recovered"
    )
  )

ggplot(sir_long, aes(x = time, y = Count, colour = Label)) +
  geom_line(linewidth = 1.3) +
  scale_colour_manual(
    values = c("Susceptible" = "#2196F3",
               "Infectious" = "#F44336",
               "Recovered" = "#4CAF50")
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "SIR Model Dynamics (R₀ = 4)",
    subtitle = "N = 10,000; β = 0.4 day⁻¹; γ = 0.1 day⁻¹; 1 seed case",
    x = "Time (days)",
    y = "Number of individuals",
    colour = "Compartment"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

0.14.4 Session 2.4: Sensitivity Analysis — Varying R₀
# Function to run SIR for a given R0
run_sir <- function(R0, gamma = 0.1, N = 10000, days = 200) {
  beta <- R0 * gamma
  params <- c(beta = beta, gamma = gamma)
  inits <- c(S = N - 1, I = 1, R = 0)
  times <- seq(0, days, by = 1)
  out <- ode(y = inits, times = times,
             func = sir_model, parms = params)
  as.data.frame(out) %>%
    mutate(R0_label = paste0("R₀ = ", R0))
}

# Run for multiple R0 values
R0_values <- c(1.5, 2.0, 3.0, 5.0, 8.0)
results_df <- map_dfr(R0_values, run_sir)

# Plot infectious curves
results_df %>%
  ggplot(aes(x = time, y = I,
             colour = factor(R0_label,
                             levels = paste0("R₀ = ", R0_values)),
             group = R0_label)) +
  geom_line(linewidth = 1.1) +
  scale_colour_brewer(palette = "Set1") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "SIR Infectious Curve Across a Range of R₀ Values",
    subtitle = "N = 10,000; γ = 0.1 day⁻¹; 1 seed case",
    x = "Time (days)",
    y = "Number infectious",
    colour = "Scenario"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )

From the sensitivity plot, students should answer:
- Which value of R₀ produces the earliest peak? Why? (Higher R₀ → faster growth → earlier, sharper peak)
- For which R₀ is the epidemic most prolonged? (Lower R₀ near 1 → slow spread → longer epidemic)
- How does total final attack rate change across R₀ values? (Use the final size equation to verify)
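For the last question, one way to carry out the verification is to run each scenario until the epidemic has burned out and compare the simulated proportion ever infected with the root of the final size equation. The sketch below is dependency-free and rests on stated assumptions: it uses a hand-written fixed-step RK4 integrator as a stand-in for `deSolve::ode`, the helper names `sir_rhs`, `simulated_attack_rate` and `final_size_eq` are hypothetical, and the 2,000-day horizon is chosen here so that even the slowest scenario finishes.

```r
# SIR derivatives for state y = c(S, I, R), frequency-dependent transmission
sir_rhs <- function(y, beta, gamma) {
  new_inf <- beta * y[1] * y[2] / sum(y)
  c(-new_inf, new_inf - gamma * y[2], gamma * y[2])
}

# Hypothetical helper: integrate with fixed-step RK4 (stand-in for deSolve::ode)
# and return the proportion of the population ever infected by the end of the run
simulated_attack_rate <- function(R0, gamma = 0.1, N = 10000,
                                  days = 2000, dt = 0.25) {
  beta <- R0 * gamma
  y <- c(N - 1, 1, 0)   # one seed case, as in run_sir
  for (step in seq_len(round(days / dt))) {
    k1 <- sir_rhs(y, beta, gamma)
    k2 <- sir_rhs(y + dt / 2 * k1, beta, gamma)
    k3 <- sir_rhs(y + dt / 2 * k2, beta, gamma)
    k4 <- sir_rhs(y + dt * k3, beta, gamma)
    y <- y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
  }
  (N - y[1]) / N   # everyone who has left S, including the seed case
}

# Non-zero root of the final size equation 1 - z = exp(-R0 * z)
final_size_eq <- function(R0) {
  uniroot(function(z) (1 - z) - exp(-R0 * z),
          interval = c(1e-6, 1 - 1e-6), tol = 1e-10)$root
}

R0_values <- c(1.5, 2.0, 3.0, 5.0, 8.0)
comparison <- data.frame(
  R0        = R0_values,
  simulated = sapply(R0_values, simulated_attack_rate),
  equation  = sapply(R0_values, final_size_eq)
)
print(round(comparison, 3))
```

The long horizon matters: with a single seed case and R₀ = 1.5, the epidemic is still under way at day 200, so reading the attack rate off a 200-day run would underestimate the final size for that scenario.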
0.14.5 Session 2.5: Verifying the Herd Immunity Threshold
# Analytic herd immunity threshold as a function of R0
R0_vals <- seq(1.1, 18, by = 0.1)
HIT <- 1 - 1 / R0_vals
hit_df <- data.frame(R0 = R0_vals, HIT_pct = HIT * 100)

ggplot(hit_df, aes(x = R0, y = HIT_pct)) +
  geom_line(colour = "#9C27B0", linewidth = 1.3) +
  geom_hline(yintercept = c(60, 93),
             linetype = "dashed", colour = "grey50") +
  annotate("text", x = 17.5, y = 62,
           label = "COVID-19 original (~60%)", size = 3.5, hjust = 1) +
  annotate("text", x = 17.5, y = 95,
           label = "Measles / Omicron (~93%)", size = 3.5, hjust = 1) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(
    title = "Herd Immunity Threshold as a Function of R₀",
    subtitle = expression(v[c] == 1 - 1/R[0]),
    x = "Basic reproduction number (R₀)",
    y = "Vaccination coverage required (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

0.14.6 Session 2.6: Incidence vs Prevalence — A Critical Distinction
Students often conflate incidence (new cases per day) with prevalence (current number of infectious cases). The SIR model generates prevalence \(I(t)\) directly. Incidence is the rate of flow from S to I, that is, \(\beta S I / N\).
# Compute daily incidence from the reduction in S
sir_df <- sir_df %>%
  mutate(
    incidence = c(NA, -diff(S)),  # new infections per day = reduction in S
    prevalence = I
  )

# Plot both on the same axes
sir_df %>%
  select(time, incidence, prevalence) %>%
  pivot_longer(-time,
               names_to = "measure",
               values_to = "count") %>%
  mutate(measure = str_to_title(measure)) %>%
  ggplot(aes(x = time, y = count, colour = measure)) +
  geom_line(linewidth = 1.2) +
  scale_colour_manual(
    values = c("Incidence" = "#e74c3c",
               "Prevalence" = "#2980b9")
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Incidence vs Prevalence in the SIR Model",
    subtitle = "Note: incidence peaks before prevalence (by about 5 days for these parameters)",
    x = "Time (days)",
    y = "Cases",
    colour = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

The incidence peak precedes the prevalence peak. For the parameters used here the lag is about 5 days, roughly half the mean infectious period \(1/\gamma = 10\) days; in general it depends on both \(\gamma\) and R₀. This means that by the time prevalence-based indicators (such as the number of active cases) appear to peak, new infections have already begun to decline. Delayed or lagged reporting further widens the gap between what surveillance shows and what transmission is doing. Modellers and public health practitioners must account for this when interpreting real-time surveillance data.
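The lag between the two peaks can be measured rather than read off the plot. The sketch below reuses β = 0.4, γ = 0.1 and N = 10,000 from Session 2.2 and locates both peaks on a daily grid, so the result can be compared with the infectious period \(1/\gamma\). It is dependency-free by design: the fixed-step RK4 loop is an assumption of this sketch standing in for `deSolve::ode`, and the helper name `sir_rhs` is hypothetical.

```r
# Parameters matching the Session 2.2 example
beta <- 0.4; gamma <- 0.1; N <- 10000

# SIR derivatives for the reduced state y = c(S, I)
sir_rhs <- function(y) {
  new_inf <- beta * y[1] * y[2] / N
  c(-new_inf, new_inf - gamma * y[2])
}

# Fixed-step RK4, recording S and I once per day
dt <- 0.05
steps_per_day <- round(1 / dt)
days <- 200
S <- numeric(days + 1)
I <- numeric(days + 1)
y <- c(N - 1, 1)
S[1] <- y[1]; I[1] <- y[2]
for (d in seq_len(days)) {
  for (k in seq_len(steps_per_day)) {
    k1 <- sir_rhs(y)
    k2 <- sir_rhs(y + dt / 2 * k1)
    k3 <- sir_rhs(y + dt / 2 * k2)
    k4 <- sir_rhs(y + dt * k3)
    y <- y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
  }
  S[d + 1] <- y[1]; I[d + 1] <- y[2]
}

time <- 0:days
incidence <- c(NA, -diff(S))                  # new infections during each day
t_inc_peak  <- time[which.max(incidence)]     # which.max skips the leading NA
t_prev_peak <- time[which.max(I)]
lag_days <- t_prev_peak - t_inc_peak

cat("Incidence peaks on day", t_inc_peak, "\n")
cat("Prevalence peaks on day", t_prev_peak, "\n")
cat("Lag:", lag_days, "days; infectious period 1/gamma:", 1 / gamma, "days\n")
```

Because both peaks are located on a daily grid, the measured lag is accurate to about a day; a finer output grid would sharpen it.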
0.14.7 Session 2.7: Computing Rₜ Over Time from SIR Output
# R0 and parameters
R0 <- 0.4 / 0.1   # beta / gamma = 4
N_pop <- 10000

# Compute Rt = R0 * S(t)/N at each time point
sir_df <- sir_df %>%
  mutate(Rt = R0 * S / N_pop)

ggplot(sir_df, aes(x = time, y = Rt)) +
  geom_line(colour = "#e67e22", linewidth = 1.3) +
  geom_hline(yintercept = 1, linetype = "dashed",
             colour = "#c0392b", linewidth = 0.9) +
  annotate("text", x = 180, y = 1.15,
           label = "Rₜ = 1 (epidemic peak)", size = 3.5,
           colour = "#c0392b") +
  labs(
    title = "Effective Reproduction Number Rₜ Over the Course of the Epidemic",
    subtitle = expression(R[t] == R[0] %.% S(t)/N),
    x = "Time (days)",
    y = expression(R[t])
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

- When Rₜ > 1: the epidemic is growing. The gap above 1 indicates how fast.
- When Rₜ = 1: the epidemic is at its peak in terms of prevalence, since \(dI/dt = 0\) exactly when \(S = N/R_0\). Incidence has already peaked by this point.
- When Rₜ < 1: the epidemic is declining. Rₜ will never return to R₀ because susceptibles are progressively depleted.
0.15 Module 2 References
- Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, 115(772), 700–721.
- Kermack, W. O., & McKendrick, A. G. (1932). Contributions to the mathematical theory of epidemics: II. The problem of endemicity. Proceedings of the Royal Society of London. Series A, 138(834), 55–83.
- Kermack, W. O., & McKendrick, A. G. (1933). Contributions to the mathematical theory of epidemics: III. Further studies of the problem of endemicity. Proceedings of the Royal Society of London. Series A, 141(843), 94–122.
- Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
- Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
- Bjørnstad, O. N. (2018). Epidemics: Models and Data Using R. Springer.
- Diekmann, O., Heesterbeek, H., & Britton, T. (2013). Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press.
- Vynnycky, E., & White, R. (2010). An Introduction to Mathematical Modelling of Infectious Diseases. Oxford University Press.
- Hethcote, H. W. (2000). The mathematics of infectious diseases. SIAM Review, 42(4), 599–653.
- Heesterbeek, J. A. P., & Dietz, K. (1996). The concept of R₀ in epidemic theory. Statistica Neerlandica, 50(1), 89–110.
- Fine, P., Eames, K., & Heymann, D. L. (2011). “Herd immunity”: a rough guide. Clinical Infectious Diseases, 52(7), 911–916.
- McCallum, H., Barlow, N., & Hone, J. (2001). How should pathogen transmission be modelled? Trends in Ecology & Evolution, 16(6), 295–300.
- Soetaert, K., Petzoldt, T., & Setzer, R. W. (2010). Solving differential equations in R: Package deSolve. Journal of Statistical Software, 33(9), 1–25.