Infectious Disease Modelling in Epidemiology

Modules 1 & 2: Foundations and the SIR Framework

Author

Course Instructor: Timothy Achala

Published

June 5, 2026

MODULE 1: Foundations of Infectious Disease Epidemiology

0.1 Introduction: Why Do We Model Infectious Diseases?

Before students open R or draw a single flow diagram, they must understand why mathematical modelling exists as a discipline within epidemiology. The answer is not simply “to predict outbreaks,” though prediction is one application. The deeper answer is that infectious disease systems are dynamically complex — they involve feedback loops, threshold effects, and population-level phenomena that cannot be understood by intuition alone.

Consider a simple question: if you double the number of infectious individuals in a population, does the number of new infections double? Not necessarily. It depends on how many susceptible individuals remain, how frequently people contact one another, and whether the infectious individuals are isolated. These are non-linear, interdependent relationships. Mathematics gives us a language precise enough to capture them.

0.1.1 The Historical Imperative

The intellectual history of infectious disease modelling is inseparable from the history of epidemiology itself. Students should appreciate this lineage not as trivia, but as evidence that the questions they are learning to answer have been among the most consequential in the history of human health.

John Snow (1854) is often celebrated as the father of epidemiology, not because he built a mathematical model, but because he thought spatially and causally about cholera transmission — mapping cases around the Broad Street pump in Soho, London. Snow’s work was implicitly a model: he hypothesised a transmission mechanism (waterborne), identified an exposure source, and tested the hypothesis by removing the pump handle. This logical structure — mechanism, exposure, test — underlies all infectious disease modelling.

Ronald Ross (1911) was among the first to apply differential equations to an infectious disease problem. Working on malaria transmission between humans and mosquitoes, Ross derived mathematical conditions under which malaria could be eradicated. His crucial insight was the threshold theorem: malaria could be eliminated not by reducing mosquito numbers to zero, but by reducing them below a critical threshold. This was a paradigm-shifting result, demonstrating that mathematical analysis could generate non-obvious, policy-relevant conclusions.

Kermack and McKendrick (1927) generalised Ross’s ideas into the foundational paper of modern epidemic theory. Their 1927 paper, “A Contribution to the Mathematical Theory of Epidemics,” introduced what we now call the SIR model and proved the epidemic threshold theorem: an epidemic can only occur if the density of susceptibles exceeds a critical value. Nearly a century later, this paper remains one of the most cited in epidemiology.

The COVID-19 pandemic (2020–2022) brought infectious disease modelling into public consciousness with extraordinary force. Models from groups like the Imperial College London COVID-19 Response Team directly informed national lockdown decisions. They also exposed the limitations of models when data were sparse, parameters uncertain, and political pressures intense. Students entering this field today inherit both the power and the responsibility that this history represents.

0.1.2 What Models Can and Cannot Do

A recurring theme throughout this course — and one that must be established clearly in Module 1 — is the epistemological status of models. Students must understand the following:

Core Modelling Principles

Models are simplifications by design. The statistician George Box wrote: “All models are wrong, but some are useful.” A model that perfectly reproduced every individual interaction in a population would be as complex as the population itself and therefore useless as an analytical tool. The art of modelling is deciding what to leave out.
Models generate hypotheses, not facts. A model predicts what would happen under a specified set of assumptions. Whether those assumptions hold in the real world is an empirical question.
Models are tools for structured thinking. Even a model that is never fitted to data forces the modeller to specify assumptions explicitly. What drives transmission? How long are people infectious? Are all age groups equally susceptible?
Models can be wrong in important ways. Omitting heterogeneity — age structure, spatial clustering, behavioural variation — can lead to systematically biased predictions. This is a reason to model carefully and communicate uncertainty honestly, not a reason to avoid modelling.

0.2 The Epidemiological Triad

Every infectious disease event involves the interaction of three elements: a host, an agent, and an environment. This framework, known as the epidemiological triad, is one of the oldest conceptual tools in epidemiology. While modern infectious disease modelling often transcends this simple framework, it provides an indispensable starting point.

0.2.1 The Host

The host is the organism — in most contexts a human being — that is susceptible to infection. Host characteristics that influence transmission dynamics include:

Immunological status. A host that has previously been infected by a pathogen, or vaccinated against it, may have partial or complete immunity. Immunity reduces the probability of infection upon exposure and reduces the infectious period if infection does occur. The distribution of immune states across a population is one of the most important determinants of whether an epidemic can occur.

Age. Age influences both susceptibility (younger children may lack immune memory; older adults may have weakened immune responses) and exposure (children mix intensively in schools; adults mix differently in workplaces). Age-structured models, introduced in Module 6, capture these differences formally.

Behaviour. Sexual behaviour (relevant for HIV, gonorrhoea, HPV), hand hygiene (relevant for cholera, norovirus), and mask-wearing (relevant for influenza, SARS-CoV-2) all modify transmission probabilities. Behaviour is among the hardest quantities to parameterise, yet among the most important.

Nutritional status and comorbidities. Malnutrition, HIV co-infection, and diabetes all increase susceptibility to certain pathogens (e.g., tuberculosis). In low-income settings, these interactions can substantially modify model predictions.

0.2.2 The Agent

The agent is the pathogen: a virus, bacterium, parasite, fungus, or prion. Key agent characteristics include:

Infectivity. The probability that an exposed host becomes infected. This is distinct from transmissibility, which refers to the probability of transmission occurring during a contact between an infectious and susceptible individual.

Pathogenicity. The proportion of infected individuals who develop clinical disease. A highly pathogenic organism produces severe illness in most infected individuals; a low-pathogenicity organism may infect widely while causing few recognisable cases.

Virulence. The severity of disease produced in those who become ill. Virulence is related to, but distinct from, pathogenicity. Ebola virus has both high pathogenicity and high virulence; many rhinoviruses (common cold) have high pathogenicity but low virulence.

Antigenicity and mutation rate. Pathogens that mutate rapidly (e.g., influenza A, SARS-CoV-2) can evade prior immunity, complicating vaccination strategies and enabling repeated epidemics in the same population.

Survival outside the host. Some pathogens survive for extended periods on surfaces (norovirus, Clostridioides difficile), in water (Vibrio cholerae, Cryptosporidium), or in aerosols (measles virus, Mycobacterium tuberculosis). Survival determines the viability of indirect transmission routes.

0.2.3 The Environment

The environment encompasses all extrinsic factors that influence the probability and frequency of host–agent contact.

Physical environment. Temperature and humidity affect pathogen survival and vector biology. Malaria transmission is strongly seasonal in sub-Saharan Africa, peaking with rainfall that creates vector breeding sites. Influenza epidemics are concentrated in winter months in temperate climates, partly because cold, dry air favours viral survival in aerosols and partly because people congregate indoors.

Social and built environment. Population density, housing quality, sanitation infrastructure, and health system capacity all mediate transmission. Cholera thrives where water treatment is inadequate. Tuberculosis spreads in overcrowded households and poorly ventilated congregate settings.

Healthcare environment. Hospitals are environments where vulnerable people concentrate and pathogen exposure risks are elevated. Healthcare-associated infections (HAIs) — including MRSA and C. difficile — require models that incorporate healthcare contact patterns distinct from community transmission.

0.3 Transmission Dynamics: The Mechanics of Spread

Understanding how pathogens move between hosts is the mechanistic foundation of all infectious disease models.

0.3.1 Direct Transmission

Contact transmission occurs through physical touching, including sexual contact. HIV, gonorrhoea, and syphilis are transmitted primarily through sexual contact.

Droplet transmission occurs when large respiratory droplets (>5 μm) produced by coughing, sneezing, or talking travel short distances (generally <1–2 metres) and are deposited on mucous membranes of a susceptible host. Influenza, pertussis, and meningococcal disease can be transmitted by this route.

Airborne transmission involves smaller particles (<5 μm, sometimes called droplet nuclei) that remain suspended in the air for extended periods and can travel beyond 1–2 metres. Measles, varicella (chickenpox), and tuberculosis are classic examples.

0.3.2 Indirect Transmission

Vehicle-borne transmission occurs through contaminated inanimate objects (fomites) or substances. Cholera and typhoid are transmitted through contaminated water; Salmonella, Campylobacter, and Shiga toxin-producing E. coli spread through contaminated food.

Vector-borne transmission involves a living intermediary — most commonly an arthropod — that carries the pathogen from one host to another:

Mechanical vectors: the pathogen is carried on the vector’s surface without replication (e.g., houseflies carrying enteric pathogens).
Biological vectors: the pathogen undergoes replication or development within the vector (e.g., Plasmodium species completing part of their life cycle in Anopheles mosquitoes; dengue virus replicating in Aedes aegypti).

Vertical transmission refers to transmission from parent to offspring, either in utero (congenital), during delivery (perinatal), or through breastfeeding (postnatal). HIV, cytomegalovirus, and rubella can all be transmitted vertically.

0.3.3 Transmission Probability and the Force of Infection

The force of infection (λ, lambda) is the per capita rate at which susceptible individuals acquire infection per unit time. It is a function of:

The prevalence of infectious individuals in the population
The contact rate between susceptible and infectious individuals
The per-contact probability of transmission

Mathematically, in the simplest case:

\[\lambda = \beta \cdot \frac{I}{N}\]

where β is the transmission rate, $I$ is the number of infectious individuals, and $N$ is the total population size. This expression is the engine of the SIR model and will be derived carefully in Module 2.
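As a quick numerical illustration of the expression above, the following sketch computes λ at a few prevalence levels. The values of β and N are illustrative assumptions, not estimates for any particular pathogen. The sketch also converts the rate into a probability of infection within one day using 1 − exp(−λ), which is close to λ only when λ is small.

```r
# Force of infection under frequency-dependent transmission: lambda = beta * I / N
# beta and N are illustrative assumptions, not estimates for a real pathogen.
force_of_infection <- function(beta, I, N) {
  beta * I / N
}

beta <- 0.5                 # transmission rate (per day), assumed
N    <- 10000               # total population size, assumed
I    <- c(10, 100, 1000)    # number of infectious individuals

lambda     <- force_of_infection(beta, I, N)  # per capita rate (per day)
daily_risk <- 1 - exp(-lambda)                # probability of infection within one day

data.frame(I = I, prevalence = I / N, lambda = lambda,
           daily_risk = round(daily_risk, 5))
```

Because λ is a rate rather than a probability, the daily risk is always slightly smaller than λ, and the gap widens as prevalence rises.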

0.4 Natural History of Infection

The natural history of an infectious disease describes the typical progression of infection in an individual host from initial exposure through to resolution (recovery, chronic infection, or death) in the absence of intervention.

0.4.1 Stages of Infection

Key Stages in the Natural History of Infection

Stage	Definition
Exposure	Contact with the pathogen; does not guarantee infection
Incubation period	Time from infection to onset of clinical symptoms
Latent period	Time from infection to onset of infectiousness
Infectious period	Duration during which the host can transmit the pathogen
Resolution	Recovery (with or without immunity) or death

Exposure. A susceptible individual encounters the pathogen. Exposure does not guarantee infection; it depends on the infectious dose, route of exposure, and host immunological status.

Incubation period. The time between initial infection and the onset of clinical symptoms. The incubation period varies by pathogen: hours to days for Staphylococcus aureus food poisoning, 2–14 days for COVID-19, 1–3 weeks for measles, months for tuberculosis.

Latent period. The time between infection and the onset of infectiousness. In compartmental models, this is the time an individual spends in the E (Exposed) compartment: infected, but not yet able to transmit the pathogen.

Critical Distinction: Latent vs Incubation Period

The latent period and incubation period are not the same:

Pre-symptomatic transmission occurs when the latent period is shorter than the incubation period. The host becomes infectious before symptoms appear. SARS-CoV-2 and influenza A both exhibit substantial pre-symptomatic transmission — individuals have no reason to isolate and may interact normally.
Asymptomatic transmission occurs when some infected individuals never develop symptoms yet remain infectious. Estimates suggest 30–40% of SARS-CoV-2 infections were asymptomatic.

Infectious period. The duration over which an infected individual can transmit the pathogen to susceptible contacts. This period is parameterised in models as $1/\gamma$ (gamma), where γ is the recovery rate.

0.4.2 Clinical Spectrum of Infection

For any given pathogen, there is typically a spectrum of outcomes: subclinical (asymptomatic) infection → mild disease → moderate disease → severe disease → critical disease → death. This spectrum has direct implications for surveillance. Because mild and asymptomatic cases are often not diagnosed or reported, surveillance data systematically undercount true infections. This ascertainment bias must be accounted for when fitting models to reported case data.

0.5 Epidemic, Endemic, and Pandemic: Precise Definitions

Term	Definition	Model Interpretation
Epidemic	Cases in excess of what is normally expected in a defined area or season	Exponential growth phase; $R_t > 1$
Endemic	Relatively constant disease presence in a population over time	Stable equilibrium; incidence neither grows nor declines
Outbreak	Geographically confined epidemic (hospital, community, district)	Localised epidemic curve
Pandemic	Epidemic spanning multiple countries or continents	Multi-patch spread; global $R_t > 1$

0.6 The Basic Reproduction Number (R₀): Concept Before Formula

No concept in infectious disease epidemiology is more central — or more frequently misunderstood — than the basic reproduction number, R₀ (pronounced “R-naught”).

0.6.1 Intuitive Definition

Definition of R₀

R₀ is the average number of secondary infections generated by a single infectious individual introduced into a fully susceptible population, in the absence of interventions or prior immunity.

Every word of this definition matters:

Average: R₀ is a population-level average. Individual variation in infectiousness can be enormous (the concept of superspreading, discussed below).
Single infectious individual: R₀ describes what happens from a single index case, not the rate of growth of an established epidemic.
Fully susceptible population: If any proportion of the population is already immune, the effective reproduction number is lower than R₀.
Absence of interventions: R₀ describes the intrinsic biology of the host–pathogen interaction.

0.6.2 Interpreting R₀

The threshold nature of R₀ is its most important property:

R₀ < 1: Each infectious individual generates fewer than one new infection on average. Chains of transmission are self-limiting; the pathogen cannot sustain itself.
R₀ = 1: The infection is at a critical threshold; neither growing nor declining.
R₀ > 1: Each infectious individual generates more than one new infection on average. The infection can spread; for a given infectious period, the larger R₀, the faster the initial growth.

0.6.3 What Determines R₀?

R₀ emerges from the interaction of three factors:

\[R_0 = \beta_c \cdot c \cdot D = \frac{\beta}{\gamma}\]

where:

$\beta_c$ = transmission probability per contact
$c$ = contact rate (contacts per unit time)
$D = 1/\gamma$ = mean duration of infectiousness
$\beta = \beta_c \times c$ = overall transmission rate

The crucial implication: R₀ can be reduced by targeting any of its three components:

Intervention	Component targeted
Masks, antivirals, condoms	Reduce $\beta_c$
Social distancing, school closures, travel restrictions	Reduce $c$
Treatment shortening infectious period, isolation	Reduce $D$
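The decomposition above can be explored numerically. The sketch below is a minimal illustration in which every parameter value is an assumption chosen for convenience; `r0_from_components` is a hypothetical helper, not part of any package. It computes a baseline R₀ and then halves each component in turn.

```r
# R0 as the product of per-contact transmission probability, contact rate,
# and mean infectious duration. All values are illustrative assumptions.
r0_from_components <- function(beta_c, contact_rate, duration) {
  beta_c * contact_rate * duration
}

# Each hypothetical intervention halves exactly one component
scenarios <- c(
  baseline           = r0_from_components(beta_c = 0.05,  contact_rate = 10, duration = 5),
  halve_beta_c       = r0_from_components(beta_c = 0.025, contact_rate = 10, duration = 5),
  halve_contact_rate = r0_from_components(beta_c = 0.05,  contact_rate = 5,  duration = 5),
  halve_duration     = r0_from_components(beta_c = 0.05,  contact_rate = 10, duration = 2.5)
)
print(scenarios)
```

Because R₀ is a product, halving any single component halves R₀. In this simple formulation the choice of target does not change the resulting R₀, although the components differ greatly in how feasible they are to change in practice.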

0.6.4 Approximate R₀ Values for Common Pathogens

Approximate R₀ values for selected human pathogens (values are context-dependent and vary by population, time, and study methodology)
Pathogen	Approximate R₀	Transmission Route
Measles virus	12–18	Airborne
Pertussis (B. pertussis)	12–17	Droplet/contact
Chickenpox (VZV)	8–10	Airborne/contact
Mumps virus	4–7	Droplet
Rubella virus	5–7	Droplet
Polio virus	5–7	Faecal-oral
SARS-CoV-2 (original)	2–3	Airborne/droplet
SARS-CoV-2 (Delta variant)	5–8	Airborne
SARS-CoV-2 (Omicron variant)	10–18	Airborne
Seasonal influenza	1.2–1.4	Droplet
HIV (sexual transmission)	2–5	Contact
Ebola virus	1.5–2.5	Direct contact
SARS-CoV-1	2–5	Droplet/aerosol

Note

Measles has one of the highest R₀ values of any human pathogen, which explains why achieving herd immunity against measles requires vaccination coverage of approximately 95% of the population.

0.6.5 Superspreading and the Limits of R₀

R₀ is a mean, and means can be misleading when underlying distributions are highly skewed. Empirical evidence suggests that for many pathogens — including SARS-CoV-1, SARS-CoV-2, Ebola, and tuberculosis — the distribution of individual reproductive numbers is overdispersed: most infected individuals cause few or no secondary infections, while a small minority (“superspreaders”) generate a disproportionate number of cases.

The negative binomial distribution is commonly used to model overdispersed transmission, with a dispersion parameter $k$: small values of $k$ indicate high overdispersion. SARS-CoV-1 was estimated to have $k \approx 0.16$, implying extreme superspreading — roughly 20% of cases were responsible for 80% of transmission.
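To make the effect of the dispersion parameter concrete, the sketch below compares a Poisson offspring distribution (no overdispersion) with a negative binomial one that has the same mean. The mean R₀ = 2 is an illustrative assumption; k = 0.16 is the SARS-CoV-1 value quoted above. The simulated share of transmission attributable to the most infectious fifth of cases depends on these assumed values and on the random seed.

```r
# Compare the chance that a case infects nobody under a Poisson offspring
# distribution and under a negative binomial one with the same mean.
# R0 = 2 is an assumption; k = 0.16 is the SARS-CoV-1 estimate quoted in the text.
R0 <- 2
k  <- 0.16

p_zero_poisson <- dpois(0, lambda = R0)
p_zero_negbin  <- dnbinom(0, size = k, mu = R0)

# Simulate secondary case counts and measure how concentrated transmission is
set.seed(1)
offspring   <- rnbinom(100000, size = k, mu = R0)
n_top       <- length(offspring) %/% 5            # the top 20% of cases
top20_share <- sum(head(sort(offspring, decreasing = TRUE), n_top)) / sum(offspring)

cat("P(no secondary cases), Poisson:          ", round(p_zero_poisson, 3), "\n")
cat("P(no secondary cases), negative binomial:", round(p_zero_negbin, 3), "\n")
cat("Share of transmission from top 20% of cases:", round(top20_share, 2), "\n")
```

With the same mean, the overdispersed distribution places far more probability on zero secondary cases, which is why many introductions of a superspreading-prone pathogen die out without causing an outbreak.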

0.7 R Implementation: Module 1

0.7.1 Session 1.1: Project Setup and Package Loading

# Install packages if needed (run once)
# install.packages(c("tidyverse", "incidence2", "lubridate"))

library(tidyverse)
library(incidence2)
library(lubridate)

0.7.2 Session 1.2: Epidemic Curve Visualisation with `incidence2`

The epidemic curve (epi curve) is the most fundamental descriptive tool in outbreak investigation. It plots case counts against time and communicates — at a glance — the phase of an epidemic (growth, plateau, decline), the likely incubation period, and potential exposure events.

# Simulated linelist: date of symptom onset for 200 cases
set.seed(42)
linelist <- data.frame(
  id         = 1:200,
  date_onset = as.Date("2024-01-01") +
    round(rgamma(200, shape = 3, rate = 0.15))
)

# Build incidence object (weekly intervals)
epi_curve <- incidence(
  x          = linelist,
  date_index = "date_onset",
  interval   = "week"
)

# Plot epidemic curve
plot(epi_curve) +
  labs(
    title    = "Simulated Outbreak — Weekly Epidemic Curve",
    subtitle = "Gamma-distributed symptom onset dates (n = 200 cases)",
    x        = "Week of symptom onset",
    y        = "Number of cases"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Interpretation Exercise

From the epidemic curve alone, students should be able to identify:

The epidemic type (point source vs propagated): a point source outbreak shows a sharp rise and decline following a single exposure; a propagated outbreak shows a more gradual, wave-like pattern.
The approximate epidemic peak.
Whether the epidemic is growing, plateauing, or declining at any given point.

0.7.3 Session 1.3: Calculating Descriptive Epidemiological Parameters

# Attack rate
total_population <- 5000
total_cases      <- 200
attack_rate      <- total_cases / total_population

cat("Attack rate:", round(attack_rate * 100, 2), "%\n")

Attack rate: 4 %

# Case fatality rate (CFR)
total_deaths <- 12
CFR          <- total_deaths / total_cases

cat("CFR:", round(CFR * 100, 2), "%\n")

CFR: 6 %

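# Doubling time from early growth phase (naive exponential estimate)
# Weeks start on Monday (week_start = 1) so that they align with the first
# possible onset date, 2024-01-01. The cutoff keeps three complete weeks only:
# a partially observed final week would bias the fitted slope downwards and
# can even produce a negative "doubling time".
early_cases <- linelist %>%
  filter(date_onset < as.Date("2024-01-22")) %>%  # first 3 complete weeks
  count(week = lubridate::floor_date(date_onset, "week", week_start = 1))

# Fit simple exponential growth model: log(cases) is linear in time (days)
model         <- lm(log(n) ~ as.numeric(week), data = early_cases)
growth_rate   <- unname(coef(model)[2])  # per day
doubling_time <- log(2) / growth_rate

cat("Estimated doubling time:", round(doubling_time, 1), "days\n")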

0.8 Module 1 References

Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, 115(772), 700–721.
Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
Bjørnstad, O. N. (2018). Epidemics: Models and Data Using R. Springer.
Gordis, L. (2014). Epidemiology (5th ed.). Elsevier Saunders.
Diekmann, O., Heesterbeek, H., & Britton, T. (2013). Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press.
Dietz, K. (1993). The estimation of the basic reproduction number for infectious diseases. Statistical Methods in Medical Research, 2(1), 23–41.
Heesterbeek, J. A. P. (2002). A brief history of R₀ and a recipe for its calculation. Acta Biotheoretica, 50(3), 189–204.
Delamater, P. L., Street, E. J., Leslie, T. F., Yang, Y. T., & Jacobsen, K. H. (2019). Complexity of the basic reproduction number (R₀). Emerging Infectious Diseases, 25(1), 1–4.
Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005). Superspreading and the effect of individual variation on disease emergence. Nature, 438(7066), 355–359.
Kamvar, Z. N., Cai, J., Pulliam, J. R. C., Schumacher, J., & Jombart, T. (2019). Epidemic curves made easy using the R package incidence. F1000Research, 8, 139.

MODULE 2: Compartmental Models — The SIR Framework

0.9 The Logic of Compartmental Modelling

A compartmental model divides a population into mutually exclusive subgroups — compartments — defined by their status with respect to the infection of interest. Individuals move between compartments according to mathematically specified rates. The model tracks the size of each compartment over time and produces predictions about epidemic dynamics.

The elegance of this approach lies in its parsimony. Rather than tracking every individual in a population (computationally expensive and often unnecessary), we track the aggregate flow between states. This is justified under the mean-field assumption: that individuals within a compartment are identical and mix randomly with individuals in all other compartments.

0.9.1 Population-Level vs Individual-Level Thinking

A critical conceptual transition for students — particularly those trained in clinical medicine or individual-level epidemiology — is shifting from thinking about what happens to an individual to thinking about what happens to a population.

A clinician treats a patient: the patient is either infected or not, recovers or does not. A compartmental modeller asks: given that 10% of the population is currently infectious, what fraction will become infected over the next two weeks? These are fundamentally different questions requiring different frameworks.

Deterministic vs Stochastic Models

The compartmental model, in its basic form, is deterministic: given a set of initial conditions and parameters, the model produces a single, unique trajectory. Real epidemics are stochastic (random), but the deterministic approximation is very good when population sizes are large (typically N > 1,000–10,000). Module 5 addresses stochasticity explicitly.

0.10 Flow Diagrams: The Visual Language of Compartmental Models

Important

Before writing a single differential equation, every student should be able to draw and interpret a flow diagram. This is not a simplification for beginners — it is how professional modellers think. The flow diagram is the model, expressed visually. The differential equations are simply a translation of the diagram into mathematical notation.

0.10.1 Elements of a Flow Diagram

A flow diagram consists of:

Boxes representing compartments, each labelled with the compartment name and its variable (S, I, R, E, etc.)
Arrows representing flows between compartments, each carrying a rate label
Rate labels describing how fast the flow occurs — these are the model parameters

0.10.2 The SIR Flow Diagram

The simplest epidemic model with lasting immunity involves three compartments:

\[\boxed{S} \xrightarrow{\beta I/N} \boxed{I} \xrightarrow{\gamma} \boxed{R}\]

Reading this diagram aloud: susceptible individuals (S) become infectious (I) at a rate proportional to the current prevalence of infection ($\beta I/N$), where β is the transmission rate. Infectious individuals recover (R) at a constant rate γ (gamma). Recovered individuals are permanently immune and leave the dynamic system.

This diagram encodes several explicit assumptions:

Assumptions of the Basic SIR Model

The population is closed — no births, deaths, or migration; total N = S + I + R is constant.
All susceptible individuals are equally susceptible.
All infectious individuals are equally infectious.
Mixing is homogeneous — every individual contacts every other individual with equal probability.
Recovery confers permanent, complete immunity.
There is no latent period — infection is immediately followed by infectiousness.

Each of these assumptions can be relaxed (and will be in subsequent modules), but they define the canonical SIR model as formulated by Kermack and McKendrick.

0.11 From Flow Diagram to Differential Equations

The differential equations of the SIR model are a direct translation of the flow diagram. For each compartment, the equation describes the rate of change of its size over time: inflows minus outflows.

0.11.1 Deriving the SIR Equations

For the Susceptible compartment (S):

The only flow out of S is infection. The rate of infection is $\beta \cdot S \cdot I/N$: the transmission rate β multiplied by the number of susceptibles S multiplied by the probability that a random contact is with an infectious individual ($I/N$). There are no inflows.

\[\frac{dS}{dt} = -\frac{\beta S I}{N}\]

For the Infectious compartment (I):

One inflow (new infections from S) and one outflow (recovery to R):

\[\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I\]

For the Recovered compartment (R):

One inflow (recoveries from I) and no outflows:

\[\frac{dR}{dt} = \gamma I\]

0.11.2 Conservation of Population

A key sanity check: N = S + I + R should be constant over time. Adding all three equations:

\[\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = -\frac{\beta S I}{N} + \frac{\beta S I}{N} - \gamma I + \gamma I = 0\]

The terms cancel exactly, confirming that $dN/dt = 0$ — the total population is conserved. ✓
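The three equations and the conservation check can be verified numerically. The following is a minimal sketch that integrates the SIR system with a fixed-step Euler scheme in base R; the parameter values, initial conditions, and step size are all illustrative assumptions, and `simulate_sir` is a hypothetical helper written for this example. A dedicated ODE solver would normally be preferred for accuracy.

```r
# Minimal SIR simulation using a fixed-step Euler scheme in base R.
# beta, gamma, N, I0, and the step size dt are illustrative assumptions.
simulate_sir <- function(beta, gamma, N, I0, days, dt = 0.1) {
  steps <- round(days / dt)
  out <- matrix(NA_real_, nrow = steps + 1, ncol = 4,
                dimnames = list(NULL, c("time", "S", "I", "R")))
  S <- N - I0
  I <- I0
  R <- 0
  out[1, ] <- c(0, S, I, R)
  for (i in seq_len(steps)) {
    new_infections <- beta * S * I / N * dt  # flow S -> I over one step
    new_recoveries <- gamma * I * dt         # flow I -> R over one step
    S <- S - new_infections
    I <- I + new_infections - new_recoveries
    R <- R + new_recoveries
    out[i + 1, ] <- c(i * dt, S, I, R)
  }
  as.data.frame(out)
}

sir <- simulate_sir(beta = 0.5, gamma = 0.1, N = 10000, I0 = 10, days = 160)

# Conservation check: S + I + R should equal N at every time point
range(sir$S + sir$I + sir$R)

# Timing and height of the epidemic peak
sir[which.max(sir$I), c("time", "I")]
```

Every individual leaving one compartment is added to the next within the same step, so the total stays at N up to floating-point error, mirroring the algebraic cancellation above.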

0.11.3 The Mass Action Principle

The infection term $\beta SI/N$ embodies the mass action principle, borrowed from chemical kinetics. Two formulations exist:

Formulation	Expression	When to use
Frequency-dependent (standard incidence)	$\beta \cdot S \cdot I/N$	Contact rates do not increase with population size (most human diseases)
Density-dependent (mass action incidence)	$\beta \cdot S \cdot I$	Contact rates increase with population density (e.g., many wildlife and livestock diseases)

Important: Choice of Incidence Formulation

With frequency-dependent incidence, R₀ is independent of population size. With density-dependent incidence, R₀ scales with N. Students frequently confuse these formulations — the choice matters substantially for model predictions.
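The contrast can be seen with a few lines of arithmetic. Setting S ≈ N in the density-dependent infection term βSI means each infectious individual generates new infections at rate βN, so R₀ = βN/γ, whereas under frequency-dependent incidence R₀ = β/γ. The sketch below uses assumed parameter values, with the density-dependent β chosen so that the two formulations agree at N = 1000.

```r
# R0 under the two incidence formulations as population size changes.
# gamma, beta_fd, and beta_dd are illustrative assumptions; beta_dd is chosen
# so that both formulations give the same R0 at N = 1000.
gamma   <- 0.2          # recovery rate (per day)
beta_fd <- 0.5          # frequency-dependent transmission rate (per day)
beta_dd <- 0.5 / 1000   # density-dependent transmission rate (per person per day)

N <- c(500, 1000, 2000, 4000)

r0_table <- data.frame(
  N                      = N,
  R0_frequency_dependent = beta_fd / gamma,      # does not involve N
  R0_density_dependent   = beta_dd * N / gamma   # grows linearly with N
)
print(r0_table)
```

A model calibrated in one population and then applied to a population of a different size will therefore give very different predictions depending on which formulation was assumed.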

0.12 Parameters of the SIR Model

0.12.1 The Transmission Rate (β)

Biological meaning: β encodes the combined effects of the contact rate and per-contact transmission probability:

\[\beta = c \cdot p\]

where $c$ is the average number of potentially infectious contacts per unit time, and $p$ is the probability of transmission per contact.

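Units: per unit time (e.g., day⁻¹) under the frequency-dependent formulation used here, because $I/N$ is dimensionless. Under density-dependent incidence, β instead has units of per person per unit time.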

Example: If β = 0.5 day⁻¹ and I/N = 0.1, then the force of infection is λ = 0.5 × 0.1 = 0.05 per day, so approximately 5% of susceptibles become infected each day.

Estimation:

From early epidemic growth rate combined with serial interval data
From household secondary attack rates combined with contact frequency data
By fitting the full SIR model to incidence data (Module 9)

0.12.2 The Recovery Rate (γ)

Biological meaning: γ is the per capita rate at which infectious individuals recover per unit time. The mean infectious period is $1/\gamma$.

Units: per unit time (e.g., day⁻¹).

Example: If γ = 0.1 day⁻¹, then roughly 10% of currently infectious individuals recover each day, and the mean infectious period is $1/0.1 = 10$ days.

The Exponential Infectious Period

Under the SIR formulation, the time spent in the I compartment follows an exponential distribution with rate γ. The exponential distribution has a memoryless property: the probability of recovering today does not depend on how long the individual has already been infectious. This is a mathematical convenience, not a biological truth. More realistic distributions (gamma, Weibull) can be incorporated with the SEIR and related models (Module 3).
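The memoryless property can be checked directly from the exponential survival function. In the sketch below, γ = 0.1 per day matches the example above, while the elapsed time `s` and the additional time `t` are arbitrary choices made for illustration.

```r
# Memoryless property of the exponential infectious period.
# gamma = 0.1 per day as in the example above; s and t are arbitrary choices.
gamma <- 0.1
s <- 7   # days already spent infectious
t <- 3   # additional days

# P(still infectious after t more days | already infectious for s days)
p_conditional <- pexp(s + t, rate = gamma, lower.tail = FALSE) /
  pexp(s, rate = gamma, lower.tail = FALSE)

# P(still infectious t days after becoming infectious)
p_unconditional <- pexp(t, rate = gamma, lower.tail = FALSE)

c(conditional = p_conditional, unconditional = p_unconditional)

# Simulated infectious periods: the mean should be close to 1 / gamma = 10 days
set.seed(123)
sim_mean <- mean(rexp(100000, rate = gamma))
sim_mean
```

The two probabilities are identical: under this assumption, someone who has been infectious for a week is exactly as likely to remain infectious for three more days as someone who has only just become infectious.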

0.12.3 Deriving R₀ from the SIR Model

Having established the model structure, we can now derive R₀ rigorously. Consider the equation for $dI/dt$:

\[\frac{dI}{dt} = I\left(\frac{\beta S}{N} - \gamma\right)\]

At the very beginning of an epidemic, virtually the entire population is susceptible: $S \approx N$. Therefore:

\[\frac{dI}{dt} \approx I(\beta - \gamma) = \gamma I\left(\frac{\beta}{\gamma} - 1\right)\]

The infection grows ($dI/dt > 0$) if and only if $\beta/\gamma > 1$. We therefore define:

\[\boxed{R_0 = \frac{\beta}{\gamma}}\]

This is a clean, model-derived definition: R₀ is the ratio of the rate at which new infections are generated (β) to the rate at which infections are resolved (γ). Each infectious individual generates new infections at rate β for an average duration of $1/\gamma$, giving $\beta \times (1/\gamma) = \beta/\gamma$ secondary infections.

0.12.4 The Epidemic Threshold Theorem

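The epidemic threshold theorem states that, in a fully susceptible population, a major epidemic can occur if and only if R₀ > 1. More generally, when some individuals are already immune, infection can spread only if the initial proportion of susceptibles $S(0)/N$ exceeds $1/R_0$.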

Corollary 1: Not everyone gets infected. Even when R₀ >> 1, the epidemic does not infect the entire population. As infection spreads, the susceptible pool depletes, reducing the force of infection. Eventually, susceptibles fall below $N/R_0$ and the epidemic begins to decline. The final size of the epidemic is determined by the final size equation (discussed below).

Corollary 2: Herd immunity. If a proportion $v$ of the population is vaccinated (assuming perfect vaccine efficacy), the effective reproduction number becomes $R_0(1-v)$. For the epidemic to be controlled, this quantity must fall below 1, which requires coverage above the critical vaccination threshold $v_c$:

\[R_0(1-v) < 1 \iff v > v_c = 1 - \frac{1}{R_0}\]

Pathogen	R₀	Herd Immunity Threshold ($v_c$)
Measles	~15	~93%
Pertussis	~15	~93%
COVID-19 (original)	~2.5	~60%
COVID-19 (Omicron)	~12	~92%
Seasonal influenza	~1.3	~23%
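The thresholds in the table follow directly from the formula for $v_c$. The sketch below reproduces them using the approximate R₀ values listed above; `herd_immunity_threshold` is a hypothetical helper name, and the calculation assumes a perfectly effective vaccine and homogeneous mixing, as in the text.

```r
# Herd immunity threshold v_c = 1 - 1/R0, using the approximate R0 values
# from the table above. Assumes a perfect vaccine and homogeneous mixing.
herd_immunity_threshold <- function(R0) {
  stopifnot(all(R0 > 1))  # the threshold is only meaningful when R0 > 1
  1 - 1 / R0
}

R0 <- c(
  "Measles"             = 15,
  "Pertussis"           = 15,
  "COVID-19 (original)" = 2.5,
  "COVID-19 (Omicron)"  = 12,
  "Seasonal influenza"  = 1.3
)

hit_percent <- round(100 * herd_immunity_threshold(R0), 1)
print(hit_percent)
```

The relationship is strongly non-linear: moving from R₀ = 1.3 to R₀ = 2.5 raises the threshold by almost 40 percentage points, while moving from R₀ = 12 to R₀ = 15 raises it by less than two.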

0.12.5 The Final Size Equation

The final proportion of the population ever infected ($z$) satisfies:

\[1 - z = e^{-R_0 z}\]

This transcendental equation has no closed-form solution but can be solved numerically:

# Numerically solve the final size equation for a range of R0 values
final_size_solve <- function(R0) {
  # uniroot finds z such that (1 - z) - exp(-R0 * z) = 0
  f   <- function(z) (1 - z) - exp(-R0 * z)
  sol <- uniroot(f, interval = c(1e-6, 1 - 1e-6))
  sol$root
}

R0_vals    <- seq(1.01, 10, by = 0.05)
final_size <- sapply(R0_vals, final_size_solve)

fs_df <- data.frame(R0 = R0_vals, attack_rate = final_size * 100)

ggplot(fs_df, aes(x = R0, y = attack_rate)) +
  geom_line(colour = "#c0392b", linewidth = 1.3) +
  geom_hline(yintercept = c(58, 80, 89), linetype = "dashed",
             colour = "grey50", linewidth = 0.7) +
  annotate("text", x = 9.8, y = 60, label = "R₀=1.5 → 58%",
           size = 3.5, hjust = 1, colour = "#555555") +
  annotate("text", x = 9.8, y = 82, label = "R₀=2.0 → 80%",
           size = 3.5, hjust = 1, colour = "#555555") +
  annotate("text", x = 9.8, y = 91, label = "R₀=2.5 → 89%",
           size = 3.5, hjust = 1, colour = "#555555") +
  labs(
    title    = "Final Epidemic Size as a Function of R₀",
    subtitle = "Derived from the Kermack–McKendrick final size equation: 1 − z = exp(−R₀z)",
    x        = "Basic reproduction number (R₀)",
    y        = "Final attack rate (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Important

An R₀ of just 2 leads to 80% of the population being infected in the absence of interventions or prior immunity. This is a counterintuitive result for many students. The implication is that interventions that reduce R₀ — even without eliminating transmission — can have enormous impact on total disease burden.
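
The claim can be checked independently of `uniroot()` with a fixed-point iteration on $z = 1 - e^{-R_0 z}$. This is a sketch rather than a robust solver: the helper name `final_size_iter`, the starting value, and the iteration count are choices made here, and convergence becomes slow as R₀ approaches 1.

```r
# Fixed-point iteration for the final size equation z = 1 - exp(-R0 * z).
# A simple alternative to uniroot(); assumes R0 > 1 and a positive start value.
final_size_iter <- function(R0, z_start = 0.5, n_iter = 500) {
  z <- z_start
  for (k in seq_len(n_iter)) {
    z <- 1 - exp(-R0 * z)   # apply the map repeatedly until it settles
  }
  z
}

z_2   <- final_size_iter(2.0)   # no intervention
z_1_5 <- final_size_iter(1.5)   # transmission reduced by a quarter
print(round(100 * c(R0_2.0 = z_2, R0_1.5 = z_1_5), 1))
```

Lowering R₀ from 2 to 1.5 reduces the final attack rate from roughly 80% to roughly 58%, even though transmission is still above the epidemic threshold.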

0.13 Dynamics of the SIR Model

0.13.1 Qualitative Behaviour

The SIR model produces a characteristic epidemic trajectory across four phases:

Exponential growth phase: When $S \approx N$ and $I$ is small, $dI/dt \approx \gamma I(R_0 - 1)$, and $I$ grows approximately exponentially. The growth rate is $r = \gamma(R_0 - 1)$.
Peak: The epidemic peaks when $dI/dt = 0$, i.e., when $\beta S/N = \gamma$, i.e., when $S = N/R_0$. At this moment, depletion of susceptibles has reduced transmission to exactly the replacement rate.
Decline: Once $S < N/R_0$, new infections are generated more slowly than recoveries occur, and $I$ begins to decline. Note: R₀ itself has not changed; the effective reproduction number $R_t = R_0 \cdot S/N$ has fallen below 1.
Epidemic burnout: $I$ declines toward zero, but $S$ does not reach zero. A residual pool of susceptibles always remains — too few to sustain transmission, but not zero.
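
The quantities in these phases can be made concrete with a few lines of arithmetic. The sketch below uses β = 0.4 day⁻¹, γ = 0.1 day⁻¹ and N = 10,000, the values used in the R sessions of this module. The closed-form peak prevalence is a standard SIR result not derived in these notes. It follows from integrating $dI/dS = -1 + N/(R_0 S)$ and assumes $S(0) \approx N$ and $I(0) \approx 0$.

```r
# Worked numbers for the four phases (beta, gamma, N as in the R sessions)
beta  <- 0.4
gamma <- 0.1
N     <- 10000

R0            <- beta / gamma       # basic reproduction number
r             <- gamma * (R0 - 1)   # early exponential growth rate (per day)
doubling_time <- log(2) / r         # days for I to double during early growth
S_at_peak     <- N / R0             # susceptibles remaining when I peaks

# Closed-form peak prevalence (standard result; assumes S(0) ~ N and I(0) ~ 0)
I_peak_frac <- 1 - (1 + log(R0)) / R0

print(c(R0 = R0, r = r, doubling_time = doubling_time,
        S_at_peak = S_at_peak, I_peak_frac = I_peak_frac))
```

With these values, prevalence doubles roughly every 2.3 days at first. It peaks once 7,500 people have left the susceptible compartment, when about 40% of the population is infectious.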

0.13.2 The Effective Reproduction Number Rₜ

The effective reproduction number at time $t$ is:

\[R_t = R_0 \cdot \frac{S(t)}{N}\]

At $t = 0$: $R_t = R_0$ (full susceptibility)
When $R_t = 1$: prevalence $I(t)$ is at its peak
When $R_t < 1$: the epidemic is declining
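
The link between $R_t$ and the turning point of $I(t)$ can be checked with a crude forward-Euler integration in base R. This is only a sketch: the step size `dt` and the parameter values are assumptions for illustration, and a proper ODE solver (such as `deSolve`, used in the sessions below) is preferable for real work.

```r
# Crude forward-Euler integration of the SIR equations (base R only).
# dt and the parameter values are illustrative assumptions.
beta <- 0.4; gamma <- 0.1; N <- 10000
dt      <- 0.01
n_steps <- round(200 / dt)

S <- numeric(n_steps + 1)
I <- numeric(n_steps + 1)
S[1] <- N - 1
I[1] <- 1

for (k in seq_len(n_steps)) {
  new_inf <- beta * S[k] * I[k] / N * dt   # infections in this step
  new_rec <- gamma * I[k] * dt             # recoveries in this step
  S[k + 1] <- S[k] - new_inf
  I[k + 1] <- I[k] + new_inf - new_rec
}

time <- (0:n_steps) * dt
Rt   <- (beta / gamma) * S / N

t_peak_I <- time[which.max(I)]        # time of peak prevalence
t_Rt_one <- time[which(Rt <= 1)[1]]   # first time Rt falls to 1 or below
print(c(t_peak_I = t_peak_I, t_Rt_one = t_Rt_one,
        Rt_at_peak = Rt[which.max(I)]))
```

The two times agree to within one integration step, and $R_t$ at the prevalence peak is approximately 1.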

0.14 R Implementation: Module 2

0.14.1 Session 2.1: Setting Up Required Packages

# Install if needed
# install.packages(c("deSolve", "tidyverse"))

library(deSolve)
library(tidyverse)

0.14.2 Session 2.2: Implementing the SIR Model with `deSolve`

# ---- Step 1: Define the ODE function ----
sir_model <- function(time, state, params) {
  with(as.list(c(state, params)), {
    N     <- S + I + R
    dS_dt <- -beta * S * I / N
    dI_dt <-  beta * S * I / N - gamma * I
    dR_dt <-  gamma * I
    return(list(c(dS_dt, dI_dt, dR_dt)))
  })
}

# ---- Step 2: Define parameters ----
params <- c(
  beta  = 0.4,    # transmission rate (day^-1)
  gamma = 0.1     # recovery rate     (day^-1)
  # R0 = beta/gamma = 0.4/0.1 = 4
)

# ---- Step 3: Define initial conditions ----
# 1 infectious individual in a fully susceptible population of 10,000
N     <- 10000
inits <- c(S = N - 1, I = 1, R = 0)

# ---- Step 4: Define time vector ----
times <- seq(0, 200, by = 1)   # 200 days, daily steps

# ---- Step 5: Solve the ODE system ----
output <- ode(
  y     = inits,
  times = times,
  func  = sir_model,
  parms = params
)

sir_df <- as.data.frame(output)
head(sir_df)

  time        S        I         R
1    0 9999.000 1.000000 0.0000000
2    1 9998.534 1.349795 0.1166174
3    2 9997.904 1.821903 0.2740243
4    3 9997.054 2.459069 0.4864842
5    4 9995.908 3.318931 0.7732396
6    5 9994.361 4.479225 1.1602554
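
Two quick sanity checks can be run on the first rows of this output. The sketch below retypes the printed values into a small data frame so that it runs on its own; in practice the same checks would be applied to `sir_df` directly.

```r
# Sanity checks on the first rows printed above (values copied from the output)
first_rows <- data.frame(
  time = 0:5,
  S = c(9999.000, 9998.534, 9997.904, 9997.054, 9995.908, 9994.361),
  I = c(1.000000, 1.349795, 1.821903, 2.459069, 3.318931, 4.479225),
  R = c(0.0000000, 0.1166174, 0.2740243, 0.4864842, 0.7732396, 1.1602554)
)

# 1. The population is closed, so S + I + R should stay at N = 10,000
total <- with(first_rows, S + I + R)

# 2. While S is still close to N, log(I) should grow linearly in time
#    with slope close to beta - gamma = 0.3 per day
fit   <- lm(log(I) ~ time, data = first_rows)
r_hat <- unname(coef(fit)["time"])

print(range(total))
print(r_hat)
```

A total that drifts away from N, or an early slope far from $\beta - \gamma$, usually points to a sign error or a mis-specified parameter in the ODE function.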

0.14.3 Session 2.3: Visualising SIR Compartment Dynamics

# Reshape to long format for ggplot
sir_long <- sir_df %>%
  pivot_longer(
    cols      = c(S, I, R),
    names_to  = "Compartment",
    values_to = "Count"
  ) %>%
  mutate(
    Compartment = factor(Compartment, levels = c("S", "I", "R")),
    Label = case_when(
      Compartment == "S" ~ "Susceptible",
      Compartment == "I" ~ "Infectious",
      Compartment == "R" ~ "Recovered"
    )
  )

ggplot(sir_long, aes(x = time, y = Count, colour = Label)) +
  geom_line(linewidth = 1.3) +
  scale_colour_manual(
    values = c("Susceptible" = "#2196F3",
               "Infectious"  = "#F44336",
               "Recovered"   = "#4CAF50")
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title    = "SIR Model Dynamics (R₀ = 4)",
    subtitle = "N = 10,000; β = 0.4 day⁻¹; γ = 0.1 day⁻¹; 1 seed case",
    x        = "Time (days)",
    y        = "Number of individuals",
    colour   = "Compartment"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold"),
    legend.position = "bottom"
  )

0.14.4 Session 2.4: Sensitivity Analysis — Varying R₀

# Function to run SIR for a given R0
run_sir <- function(R0, gamma = 0.1, N = 10000, days = 200) {
  beta   <- R0 * gamma
  params <- c(beta = beta, gamma = gamma)
  inits  <- c(S = N - 1, I = 1, R = 0)
  times  <- seq(0, days, by = 1)

  out <- ode(y = inits, times = times,
             func = sir_model, parms = params)

  as.data.frame(out) %>%
    mutate(R0_label = paste0("R₀ = ", R0))
}

# Run for multiple R0 values
R0_values  <- c(1.5, 2.0, 3.0, 5.0, 8.0)
results_df <- map_dfr(R0_values, run_sir)

# Plot infectious curves
results_df %>%
  ggplot(aes(x = time, y = I,
             colour = factor(R0_label,
                             levels = paste0("R₀ = ", R0_values)),
             group  = R0_label)) +
  geom_line(linewidth = 1.1) +
  scale_colour_brewer(palette = "Set1") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title    = "SIR Infectious Curve Across a Range of R₀ Values",
    subtitle = "N = 10,000; γ = 0.1 day⁻¹; 1 seed case",
    x        = "Time (days)",
    y        = "Number infectious",
    colour   = "Scenario"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold"),
    legend.position = "right"
  )

Interpretation Exercise

From the sensitivity plot, students should answer:

Which value of R₀ produces the earliest peak? Why? (Higher R₀ → faster growth → earlier, sharper peak)
For which R₀ is the epidemic most prolonged? (Lower R₀ near 1 → slow spread → longer epidemic)
How does total final attack rate change across R₀ values? (Use the final size equation to verify)

0.14.5 Session 2.5: Verifying the Herd Immunity Threshold

# Analytic herd immunity threshold as a function of R0
R0_vals <- seq(1.1, 18, by = 0.1)
HIT     <- 1 - 1 / R0_vals

hit_df <- data.frame(R0 = R0_vals, HIT_pct = HIT * 100)

ggplot(hit_df, aes(x = R0, y = HIT_pct)) +
  geom_line(colour = "#9C27B0", linewidth = 1.3) +
  geom_hline(yintercept = c(60, 93),
             linetype = "dashed", colour = "grey50") +
  annotate("text", x = 17.5, y = 62,
           label = "COVID-19 original (~60%)", size = 3.5, hjust = 1) +
  annotate("text", x = 17.5, y = 95,
           label = "Measles / Omicron (~93%)", size = 3.5, hjust = 1) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(
    title    = "Herd Immunity Threshold as a Function of R₀",
    subtitle = expression(v[c] == 1 - 1/R[0]),
    x        = "Basic reproduction number (R₀)",
    y        = "Vaccination coverage required (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

0.14.6 Session 2.6: Incidence vs Prevalence — A Critical Distinction

Students often conflate incidence (new cases per day) with prevalence (current number of infectious cases). The SIR model generates prevalence $I(t)$ directly. Incidence is the rate of flow from S to I.

# Compute daily incidence from the reduction in S
sir_df <- sir_df %>%
  mutate(
    incidence  = c(NA, -diff(S)),   # new infections per day = reduction in S
    prevalence = I
  )

# Plot both on the same axes
sir_df %>%
  select(time, incidence, prevalence) %>%
  pivot_longer(-time,
               names_to  = "measure",
               values_to = "count") %>%
  mutate(measure = str_to_title(measure)) %>%
  ggplot(aes(x = time, y = count, colour = measure)) +
  geom_line(linewidth = 1.2) +
  scale_colour_manual(
    values = c("Incidence"  = "#e74c3c",
               "Prevalence" = "#2980b9")
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title    = "Incidence vs Prevalence in the SIR Model",
    subtitle = "Note: the incidence peak precedes the prevalence peak",
    x        = "Time (days)",
    y        = "Cases",
    colour   = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold"),
    legend.position = "bottom"
  )

A Consequential Surveillance Insight

The incidence peak precedes the prevalence peak by a lag of the same order as the mean infectious period $1/\gamma$ (the exact lag depends on R₀). This means that by the time prevalence-based indicators, such as the number of active cases, appear to peak, new infections have already begun to decline. Delayed or lagged reporting further amplifies this gap. Modellers and public health practitioners must account for this when interpreting real-time surveillance data.
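
The ordering of the two peaks follows directly from the model equations. As a short derivation, write incidence as $\iota(t)$ (a symbol introduced here for convenience) and differentiate using the SIR equations for $dS/dt$ and $dI/dt$:

\[\iota(t) = \frac{\beta S I}{N}, \qquad \frac{d\iota}{dt} = \frac{\beta}{N}\left(\frac{dS}{dt}\,I + S\,\frac{dI}{dt}\right) = \frac{\beta S I}{N}\left(\frac{\beta (S - I)}{N} - \gamma\right)\]

Incidence therefore peaks when $S - I = N/R_0$, that is, while $S$ is still above $N/R_0$. Prevalence peaks later, when $S = N/R_0$. Because $S$ can only decrease in the SIR model, the incidence peak must come first.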

0.14.7 Session 2.7: Computing Rₜ Over Time from SIR Output

# R0 and parameters
R0    <- 0.4 / 0.1   # beta / gamma = 4
N_pop <- 10000

# Compute Rt = R0 * S(t)/N at each time point
sir_df <- sir_df %>%
  mutate(Rt = R0 * S / N_pop)

ggplot(sir_df, aes(x = time, y = Rt)) +
  geom_line(colour = "#e67e22", linewidth = 1.3) +
  geom_hline(yintercept = 1, linetype = "dashed",
             colour = "#c0392b", linewidth = 0.9) +
  annotate("text", x = 180, y = 1.15,
           label = "Rₜ = 1 (epidemic peak)", size = 3.5,
           colour = "#c0392b") +
  labs(
    title    = "Effective Reproduction Number Rₜ Over the Course of the Epidemic",
    subtitle = expression(R[t] == R[0] %.% S(t)/N),
    x        = "Time (days)",
    y        = expression(R[t])
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Reading the Rₜ Plot

When Rₜ > 1: the epidemic is growing. The gap above 1 indicates how fast.
When Rₜ = 1: the epidemic is at its peak in terms of prevalence, $I(t)$; incidence has already peaked slightly earlier.
When Rₜ < 1: the epidemic is declining. Rₜ will never return to R₀ because susceptibles are progressively depleted.

0.15 Module 2 References

Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, 115(772), 700–721.
Kermack, W. O., & McKendrick, A. G. (1932). Contributions to the mathematical theory of epidemics: II. The problem of endemicity. Proceedings of the Royal Society of London. Series A, 138(834), 55–83.
Kermack, W. O., & McKendrick, A. G. (1933). Contributions to the mathematical theory of epidemics: III. Further studies of the problem of endemicity. Proceedings of the Royal Society of London. Series A, 141(843), 94–122.
Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
Bjørnstad, O. N. (2018). Epidemics: Models and Data Using R. Springer.
Diekmann, O., Heesterbeek, H., & Britton, T. (2013). Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press.
Vynnycky, E., & White, R. (2010). An Introduction to Mathematical Modelling of Infectious Diseases. Oxford University Press.
Hethcote, H. W. (2000). The mathematics of infectious diseases. SIAM Review, 42(4), 599–653.
Heesterbeek, J. A. P., & Dietz, K. (1996). The concept of R₀ in epidemic theory. Statistica Neerlandica, 50(1), 89–110.
Fine, P., Eames, K., & Heymann, D. L. (2011). “Herd immunity”: a rough guide. Clinical Infectious Diseases, 52(7), 911–916.
McCallum, H., Barlow, N., & Hone, J. (2001). How should pathogen transmission be modelled? Trends in Ecology & Evolution, 16(6), 295–300.
Soetaert, K., Petzoldt, T., & Setzer, R. W. (2010). Solving differential equations in R: Package deSolve. Journal of Statistical Software, 33(9), 1–25.

--- title: "Infectious Disease Modelling in Epidemiology" subtitle: "Modules 1 & 2: Foundations and the SIR Framework" author: "Course Instructor: Timothy Achala" date: today format: html: toc: true toc-depth: 4 toc-location: left toc-title: "Table of Contents" number-sections: true theme: flatly highlight-style: github code-fold: false code-tools: true fig-width: 9 fig-height: 5.5 fig-align: center embed-resources: true smooth-scroll: true css: | body { font-size: 15px; line-height: 1.8; } h1 { color: #1a3a5c; border-bottom: 3px solid #1a3a5c; padding-bottom: 6px; } h2 { color: #1a5276; border-bottom: 1px solid #aed6f1; padding-bottom: 4px; } h3 { color: #21618c; } h4 { color: #2e86c1; } .callout { border-radius: 6px; } table { font-size: 14px; } execute: echo: true warning: false message: false cache: false bibliography: references.bib --- ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center", dpi = 150 ) ``` --- # MODULE 1: Foundations of Infectious Disease Epidemiology {.unnumbered} --- ## Introduction: Why Do We Model Infectious Diseases? Before students open R or draw a single flow diagram, they must understand why mathematical modelling exists as a discipline within epidemiology. The answer is not simply "to predict outbreaks," though prediction is one application. The deeper answer is that infectious disease systems are *dynamically complex* — they involve feedback loops, threshold effects, and population-level phenomena that cannot be understood by intuition alone. Consider a simple question: if you double the number of infectious individuals in a population, does the number of new infections double? Not necessarily. It depends on how many susceptible individuals remain, how frequently people contact one another, and whether the infectious individuals are isolated. These are non-linear, interdependent relationships. Mathematics gives us a language precise enough to capture them. 
### The Historical Imperative The intellectual history of infectious disease modelling is inseparable from the history of epidemiology itself. Students should appreciate this lineage not as trivia, but as evidence that the questions they are learning to answer have been among the most consequential in the history of human health. **John Snow (1854)** is often celebrated as the father of epidemiology, not because he built a mathematical model, but because he thought spatially and causally about cholera transmission — mapping cases around the Broad Street pump in Soho, London. Snow's work was implicitly a model: he hypothesised a transmission mechanism (waterborne), identified an exposure source, and tested the hypothesis by removing the pump handle. This logical structure — mechanism, exposure, test — underlies all infectious disease modelling. **Ronald Ross (1911)** was the first to apply differential equations to an infectious disease problem. Working on malaria transmission between humans and mosquitoes, Ross derived mathematical conditions under which malaria could be eradicated. His crucial insight was the *threshold theorem*: malaria could be eliminated not by reducing mosquito numbers to zero, but by reducing them below a critical threshold. This was a paradigm-shifting result, demonstrating that mathematical analysis could generate non-obvious, policy-relevant conclusions. **Kermack and McKendrick (1927)** generalised Ross's ideas into the foundational paper of modern epidemic theory. Their 1927 paper, *"A Contribution to the Mathematical Theory of Epidemics,"* introduced what we now call the SIR model and proved the epidemic threshold theorem: an epidemic can only occur if the density of susceptibles exceeds a critical value. Nearly a century later, this paper remains one of the most cited in epidemiology. **The COVID-19 pandemic (2020–2022)** brought infectious disease modelling into public consciousness with extraordinary force. 
Models from groups like the Imperial College London COVID-19 Response Team directly informed national lockdown decisions. They also exposed the limitations of models when data were sparse, parameters uncertain, and political pressures intense. Students entering this field today inherit both the power and the responsibility that this history represents. ### What Models Can and Cannot Do A recurring theme throughout this course — and one that must be established clearly in Module 1 — is the epistemological status of models. Students must understand the following: ::: {.callout-important title="Core Modelling Principles"} - **Models are simplifications by design.** The statistician George Box wrote: *"All models are wrong, but some are useful."* A model that perfectly reproduced every individual interaction in a population would be as complex as the population itself and therefore useless as an analytical tool. The art of modelling is deciding what to leave out. - **Models generate hypotheses, not facts.** A model predicts what *would* happen under a specified set of assumptions. Whether those assumptions hold in the real world is an empirical question. - **Models are tools for structured thinking.** Even a model that is never fitted to data forces the modeller to specify assumptions explicitly. What drives transmission? How long are people infectious? Are all age groups equally susceptible? - **Models can be wrong in important ways.** Omitting heterogeneity — age structure, spatial clustering, behavioural variation — can lead to systematically biased predictions. This is a reason to model carefully and communicate uncertainty honestly, not a reason to avoid modelling. ::: --- ## The Epidemiological Triad Every infectious disease event involves the interaction of three elements: a **host**, an **agent**, and an **environment**. This framework, known as the epidemiological triad, is one of the oldest conceptual tools in epidemiology. 
While modern infectious disease modelling often transcends this simple framework, it provides an indispensable starting point. ### The Host The host is the organism — in most contexts a human being — that is susceptible to infection. Host characteristics that influence transmission dynamics include: **Immunological status.** A host that has previously been infected by a pathogen, or vaccinated against it, may have partial or complete immunity. Immunity reduces the probability of infection upon exposure and reduces the infectious period if infection does occur. The distribution of immune states across a population is one of the most important determinants of whether an epidemic can occur. **Age.** Age influences both susceptibility (younger children may lack immune memory; older adults may have weakened immune responses) and exposure (children mix intensively in schools; adults mix differently in workplaces). Age-structured models, introduced in Module 6, capture these differences formally. **Behaviour.** Sexual behaviour (relevant for HIV, gonorrhoea, HPV), hand hygiene (relevant for cholera, norovirus), and mask-wearing (relevant for influenza, SARS-CoV-2) all modify transmission probabilities. Behaviour is among the hardest quantities to parameterise, yet among the most important. **Nutritional status and comorbidities.** Malnutrition, HIV co-infection, and diabetes all increase susceptibility to certain pathogens (e.g., tuberculosis). In low-income settings, these interactions can substantially modify model predictions. ### The Agent The agent is the pathogen: a virus, bacterium, parasite, fungus, or prion. Key agent characteristics include: **Infectivity.** The probability that an exposed host becomes infected. This is distinct from transmissibility, which refers to the probability of transmission occurring during a contact between an infectious and susceptible individual. **Pathogenicity.** The proportion of infected individuals who develop clinical disease. 
A highly pathogenic organism produces severe illness in most infected individuals; a low-pathogenicity organism may infect widely while causing few recognisable cases. **Virulence.** The severity of disease produced in those who become ill. Virulence is related to, but distinct from, pathogenicity. Ebola virus has both high pathogenicity and high virulence; many rhinoviruses (common cold) have high pathogenicity but low virulence. **Antigenicity and mutation rate.** Pathogens that mutate rapidly (e.g., influenza A, SARS-CoV-2) can evade prior immunity, complicating vaccination strategies and enabling repeated epidemics in the same population. **Survival outside the host.** Some pathogens survive for extended periods on surfaces (norovirus, *Clostridioides difficile*), in water (cholera vibrio, *Cryptosporidium*), or in aerosols (measles virus, *Mycobacterium tuberculosis*). Survival determines the viability of indirect transmission routes. ### The Environment The environment encompasses all extrinsic factors that influence the probability and frequency of host–agent contact. **Physical environment.** Temperature and humidity affect pathogen survival and vector biology. Malaria transmission is strongly seasonal in sub-Saharan Africa, peaking with rainfall that creates vector breeding sites. Influenza epidemics are concentrated in winter months in temperate climates, partly because cold, dry air favours viral survival in aerosols and partly because people congregate indoors. **Social and built environment.** Population density, housing quality, sanitation infrastructure, and health system capacity all mediate transmission. Cholera thrives where water treatment is inadequate. Tuberculosis spreads in overcrowded households and poorly ventilated congregate settings. **Healthcare environment.** Hospitals are environments where vulnerable people concentrate and pathogen exposure risks are elevated. Healthcare-associated infections (HAIs) — including MRSA and *C. 
difficile* — require models that incorporate healthcare contact patterns distinct from community transmission. --- ## Transmission Dynamics: The Mechanics of Spread Understanding *how* pathogens move between hosts is the mechanistic foundation of all infectious disease models. ### Direct Transmission **Contact transmission** occurs through physical touching, including sexual contact. HIV, gonorrhoea, and syphilis are transmitted primarily through sexual contact. **Droplet transmission** occurs when large respiratory droplets (>5 μm) produced by coughing, sneezing, or talking travel short distances (generally <1–2 metres) and are deposited on mucous membranes of a susceptible host. Influenza, pertussis, and meningococcal disease can be transmitted by this route. **Airborne transmission** involves smaller particles (<5 μm, sometimes called droplet nuclei) that remain suspended in the air for extended periods and can travel beyond 1–2 metres. Measles, varicella (chickenpox), and tuberculosis are classic examples. ### Indirect Transmission **Vehicle-borne transmission** occurs through contaminated inanimate objects (fomites) or substances. Cholera and typhoid are transmitted through contaminated water; *Salmonella*, *Campylobacter*, and Shiga toxin-producing *E. coli* spread through contaminated food. **Vector-borne transmission** involves a living intermediary — most commonly an arthropod — that carries the pathogen from one host to another: - *Mechanical vectors*: the pathogen is carried on the vector's surface without replication (e.g., houseflies carrying enteric pathogens). - *Biological vectors*: the pathogen undergoes replication or development within the vector (e.g., *Plasmodium* species completing part of their life cycle in *Anopheles* mosquitoes; dengue virus replicating in *Aedes aegypti*). 
**Vertical transmission** refers to transmission from parent to offspring, either in utero (congenital), during delivery (perinatal), or through breastfeeding (postnatal). HIV, cytomegalovirus, and rubella can all be transmitted vertically. ### Transmission Probability and the Force of Infection The **force of infection** (λ, lambda) is the per capita rate at which susceptible individuals acquire infection per unit time. It is a function of: - The prevalence of infectious individuals in the population - The contact rate between susceptible and infectious individuals - The per-contact probability of transmission Mathematically, in the simplest case: $$\lambda = \beta \cdot \frac{I}{N}$$ where β is the transmission rate, $I$ is the number of infectious individuals, and $N$ is the total population size. This expression is the engine of the SIR model and will be derived carefully in Module 2. --- ## Natural History of Infection The *natural history* of an infectious disease describes the typical progression of infection in an individual host from initial exposure through to resolution (recovery, chronic infection, or death) in the absence of intervention. ### Stages of Infection ::: {.callout-note title="Key Stages in the Natural History of Infection"} | Stage | Definition | |---|---| | **Exposure** | Contact with the pathogen; does not guarantee infection | | **Incubation period** | Time from infection to onset of clinical *symptoms* | | **Latent period** | Time from infection to onset of *infectiousness* | | **Infectious period** | Duration during which the host can transmit the pathogen | | **Resolution** | Recovery (with or without immunity) or death | ::: **Exposure.** A susceptible individual encounters the pathogen. Exposure does not guarantee infection; it depends on the infectious dose, route of exposure, and host immunological status. **Incubation period.** The time between initial infection and the onset of clinical symptoms. 
The incubation period varies by pathogen: hours to days for *Staphylococcus aureus* food poisoning, 2–14 days for COVID-19, 1–3 weeks for measles, months for tuberculosis. **Latent period.** The time between infection and the onset of *infectiousness*. This is the period that determines whether an individual in the E (Exposed) compartment can transmit the pathogen. ::: {.callout-warning title="Critical Distinction: Latent vs Incubation Period"} The latent period and incubation period are **not the same**: - *Pre-symptomatic transmission* occurs when the latent period is **shorter** than the incubation period. The host becomes infectious **before** symptoms appear. SARS-CoV-2 and influenza A both exhibit substantial pre-symptomatic transmission — individuals have no reason to isolate and may interact normally. - *Asymptomatic transmission* occurs when some infected individuals never develop symptoms yet remain infectious. Estimates suggest 30–40% of SARS-CoV-2 infections were asymptomatic. ::: **Infectious period.** The duration over which an infected individual can transmit the pathogen to susceptible contacts. This period is parameterised in models as $1/\gamma$ (gamma), where γ is the recovery rate. ### Clinical Spectrum of Infection For any given pathogen, there is typically a spectrum of outcomes: subclinical (asymptomatic) infection → mild disease → moderate disease → severe disease → critical disease → death. This spectrum has direct implications for surveillance. Because mild and asymptomatic cases are often not diagnosed or reported, surveillance data systematically undercount true infections — a *ascertainment bias* that must be accounted for when fitting models to reported case data. 
--- ## Epidemic, Endemic, and Pandemic: Precise Definitions | Term | Definition | Model Interpretation | |---|---|---| | **Epidemic** | Cases in excess of what is normally expected in a defined area or season | Exponential growth phase; $R_t > 1$ | | **Endemic** | Relatively constant disease presence in a population over time | Stable equilibrium; incidence neither grows nor declines | | **Outbreak** | Geographically confined epidemic (hospital, community, district) | Localised epidemic curve | | **Pandemic** | Epidemic spanning multiple countries or continents | Multi-patch spread; global Rₜ > 1 | --- ## The Basic Reproduction Number (R₀): Concept Before Formula No concept in infectious disease epidemiology is more central — or more frequently misunderstood — than the basic reproduction number, R₀ (pronounced "R-naught"). ### Intuitive Definition ::: {.callout-important title="Definition of R₀"} **R₀ is the average number of secondary infections generated by a single infectious individual introduced into a fully susceptible population, in the absence of interventions or prior immunity.** ::: Every word of this definition matters: - *Average*: R₀ is a population-level average. Individual variation in infectiousness can be enormous (the concept of superspreading, discussed below). - *Single infectious individual*: R₀ describes what happens from a single index case, not the rate of growth of an established epidemic. - *Fully susceptible population*: If any proportion of the population is already immune, the effective reproduction number is lower than R₀. - *Absence of interventions*: R₀ describes the intrinsic biology of the host–pathogen interaction. ### Interpreting R₀ The threshold nature of R₀ is its most important property: - **R₀ < 1**: Each infectious individual generates fewer than one new infection on average. Chains of transmission are self-limiting; the pathogen cannot sustain itself. 
- **R₀ = 1**: The infection is at a critical threshold; neither growing nor declining. - **R₀ > 1**: Each infectious individual generates more than one new infection. The infection can spread; the larger R₀, the faster the growth. ### What Determines R₀? R₀ emerges from the interaction of three factors: $$R_0 = \beta_c \cdot c \cdot D = \frac{\beta}{\gamma}$$ where: - $\beta_c$ = transmission probability per contact - $c$ = contact rate (contacts per unit time) - $D = 1/\gamma$ = mean duration of infectiousness - $\beta = \beta_c \times c$ = overall transmission rate The crucial implication: R₀ can be reduced by targeting **any** of its three components: | Intervention | Component targeted | |---|---| | Masks, antivirals, condoms | Reduce $\beta_c$ | | Social distancing, school closures, travel restrictions | Reduce $c$ | | Treatment shortening infectious period, isolation | Reduce $D$ | ### Approximate R₀ Values for Common Pathogens ```{r r0-table, echo=FALSE} library(tidyverse) library(knitr) r0_table <- tibble::tribble( ~Pathogen, ~`Approximate R₀`, ~`Transmission Route`, "Measles virus", "12–18", "Airborne", "Pertussis (B. 
pertussis)", "12–17", "Droplet/contact", "Chickenpox (VZV)", "8–10", "Airborne/contact", "Mumps virus", "4–7", "Droplet", "Rubella virus", "5–7", "Droplet", "Polio virus", "5–7", "Faecal-oral", "SARS-CoV-2 (original)", "2–3", "Airborne/droplet", "SARS-CoV-2 (Delta variant)", "5–8", "Airborne", "SARS-CoV-2 (Omicron variant)", "10–18", "Airborne", "Seasonal influenza", "1.2–1.4", "Droplet", "HIV (sexual transmission)", "2–5", "Contact", "Ebola virus", "1.5–2.5", "Direct contact", "SARS-CoV-1", "2–5", "Droplet/aerosol" ) knitr::kable( r0_table, caption = "Approximate R₀ values for selected human pathogens (values are context-dependent and vary by population, time, and study methodology)", align = c("l", "c", "l") ) ``` ::: {.callout-note} Measles has one of the highest R₀ values of any human pathogen, which explains why achieving herd immunity against measles requires vaccination coverage of approximately **95%** of the population. ::: ### Superspreading and the Limits of R₀ R₀ is a mean, and means can be misleading when underlying distributions are highly skewed. Empirical evidence suggests that for many pathogens — including SARS-CoV-1, SARS-CoV-2, Ebola, and tuberculosis — the distribution of individual reproductive numbers is **overdispersed**: most infected individuals cause few or no secondary infections, while a small minority ("superspreaders") generate a disproportionate number of cases. The negative binomial distribution is commonly used to model overdispersed transmission, with a dispersion parameter $k$: small values of $k$ indicate high overdispersion. SARS-CoV-1 was estimated to have $k \approx 0.16$, implying extreme superspreading — roughly 20% of cases were responsible for 80% of transmission. 
---

## R Implementation: Module 1

### Session 1.1: Project Setup and Package Loading

```{r packages-m1}
# Install packages if needed (run once)
# install.packages(c("tidyverse", "incidence2", "lubridate"))
library(tidyverse)
library(incidence2)
library(lubridate)
```

### Session 1.2: Epidemic Curve Visualisation with `incidence2`

The epidemic curve (epi curve) is the most fundamental descriptive tool in outbreak investigation. It plots case counts against time and communicates — at a glance — the phase of an epidemic (growth, plateau, decline), the likely incubation period, and potential exposure events.

```{r epi-curve}
# Simulated linelist: date of symptom onset for 200 cases
set.seed(42)
linelist <- data.frame(
  id = 1:200,
  date_onset = as.Date("2024-01-01") + round(rgamma(200, shape = 3, rate = 0.15))
)

# Build incidence object (weekly intervals)
epi_curve <- incidence(
  x = linelist,
  date_index = "date_onset",
  interval = "week"
)

# Plot epidemic curve
plot(epi_curve) +
  labs(
    title = "Simulated Outbreak — Weekly Epidemic Curve",
    subtitle = "Gamma-distributed symptom onset dates (n = 200 cases)",
    x = "Week of symptom onset",
    y = "Number of cases"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
```

::: {.callout-tip title="Interpretation Exercise"}
From the epidemic curve alone, students should be able to identify:

1. The **epidemic type** (point source vs propagated): a point source outbreak shows a sharp rise and decline following a single exposure; a propagated outbreak shows a more gradual, wave-like pattern.
2. The approximate **epidemic peak**.
3. Whether the epidemic is **growing, plateauing, or declining** at any given point.
:::

### Session 1.3: Calculating Descriptive Epidemiological Parameters

```{r descriptive-params}
# Attack rate
total_population <- 5000
total_cases <- 200
attack_rate <- total_cases / total_population
cat("Attack rate:", round(attack_rate * 100, 2), "%\n")

# Case fatality rate (CFR)
total_deaths <- 12
CFR <- total_deaths / total_cases
cat("CFR:", round(CFR * 100, 2), "%\n")

# Doubling time from early growth phase (naive exponential estimate)
# Weeks start on Monday so the bins align with the outbreak start date
# (2024-01-01, a Monday) and the third week is not split into a partial bin
early_cases <- linelist %>%
  filter(date_onset <= as.Date("2024-01-21")) %>% # first 3 weeks
  count(week = lubridate::floor_date(date_onset, "week", week_start = 1))

# Fit simple exponential growth model
# as.numeric(week) is measured in days, so the slope is a growth rate per day
model <- lm(log(n) ~ as.numeric(week), data = early_cases)
growth_rate <- coef(model)[2]
doubling_time <- log(2) / growth_rate
cat("Estimated doubling time:", round(doubling_time, 1), "days\n")
```

---

## Module 1 References

- Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. *Proceedings of the Royal Society of London. Series A*, 115(772), 700–721.
- Anderson, R. M., & May, R. M. (1991). *Infectious Diseases of Humans: Dynamics and Control.* Oxford University Press.
- Keeling, M. J., & Rohani, P. (2008). *Modeling Infectious Diseases in Humans and Animals.* Princeton University Press.
- Bjørnstad, O. N. (2018). *Epidemics: Models and Data Using R.* Springer.
- Gordis, L. (2014). *Epidemiology* (5th ed.). Elsevier Saunders.
- Diekmann, O., Heesterbeek, H., & Britton, T. (2013). *Mathematical Tools for Understanding Infectious Disease Dynamics.* Princeton University Press.
- Dietz, K. (1993). The estimation of the basic reproduction number for infectious diseases. *Statistical Methods in Medical Research*, 2(1), 23–41.
- Heesterbeek, J. A. P. (2002). A brief history of R₀ and a recipe for its calculation. *Acta Biotheoretica*, 50(3), 189–204.
- Delamater, P. L., Street, E. J., Leslie, T. F., Yang, Y. T., & Jacobsen, K. H. (2019). Complexity of the basic reproduction number (R₀). *Emerging Infectious Diseases*, 25(1), 1–4.
- Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E., & Getz, W. M. (2005). Superspreading and the effect of individual variation on disease emergence. *Nature*, 438(7066), 355–359.
- Kamvar, Z. N., Cai, J., Pulliam, J. R. C., Schumacher, J., & Jombart, T. (2019). Epidemic curves made easy using the R package incidence. *F1000Research*, 8, 139.

---

# MODULE 2: Compartmental Models — The SIR Framework {.unnumbered}

---

## The Logic of Compartmental Modelling

A compartmental model divides a population into mutually exclusive subgroups — *compartments* — defined by their status with respect to the infection of interest. Individuals move between compartments according to mathematically specified rates. The model tracks the size of each compartment over time and produces predictions about epidemic dynamics.

The elegance of this approach lies in its parsimony. Rather than tracking every individual in a population (computationally expensive and often unnecessary), we track the aggregate flow between states. This is justified under the **mean-field assumption**: that individuals within a compartment are identical and mix randomly with individuals in all other compartments.

### Population-Level vs Individual-Level Thinking

A critical conceptual transition for students — particularly those trained in clinical medicine or individual-level epidemiology — is shifting from thinking about what happens to *an individual* to thinking about what happens to *a population*.

A clinician treats a patient: the patient is either infected or not, recovers or does not. A compartmental modeller asks: given that 10% of the population is currently infectious, what fraction will become infected over the next two weeks? These are fundamentally different questions requiring different frameworks.
::: {.callout-note title="Deterministic vs Stochastic Models"}
The compartmental model, in its basic form, is **deterministic**: given a set of initial conditions and parameters, the model produces a single, unique trajectory. Real epidemics are stochastic (random), but the deterministic approximation is very good when population sizes are large (typically N > 1,000–10,000). Module 5 addresses stochasticity explicitly.
:::

---

## Flow Diagrams: The Visual Language of Compartmental Models

::: {.callout-important}
**Before writing a single differential equation, every student should be able to draw and interpret a flow diagram.** This is not a simplification for beginners — it is how professional modellers think. The flow diagram *is* the model, expressed visually. The differential equations are simply a translation of the diagram into mathematical notation.
:::

### Elements of a Flow Diagram

A flow diagram consists of:

- **Boxes** representing compartments, each labelled with the compartment name and its variable (S, I, R, E, etc.)
- **Arrows** representing flows between compartments, each carrying a rate label
- **Rate labels** describing how fast the flow occurs — these are the model parameters

### The SIR Flow Diagram

The simplest epidemic model with lasting immunity involves three compartments:

$$\boxed{S} \xrightarrow{\beta I/N} \boxed{I} \xrightarrow{\gamma} \boxed{R}$$

Reading this diagram aloud: susceptible individuals (S) become infectious (I) at a rate proportional to the current prevalence of infection ($\beta I/N$), where β is the transmission rate. Infectious individuals recover (R) at a constant rate γ (gamma). Recovered individuals are permanently immune and leave the dynamic system.

This diagram encodes several explicit assumptions:

::: {.callout-note title="Assumptions of the Basic SIR Model"}
1. The population is **closed** — no births, deaths, or migration; total N = S + I + R is constant.
2. All susceptible individuals are **equally susceptible**.
3. All infectious individuals are **equally infectious**.
4. Mixing is **homogeneous** — every individual contacts every other individual with equal probability.
5. Recovery confers **permanent, complete immunity**.
6. There is **no latent period** — infection is immediately followed by infectiousness.
:::

Each of these assumptions can be relaxed (and will be in subsequent modules), but they define the canonical SIR model as formulated by Kermack and McKendrick.

---

## From Flow Diagram to Differential Equations

The differential equations of the SIR model are a direct translation of the flow diagram. For each compartment, the equation describes the rate of change of its size over time: **inflows minus outflows**.

### Deriving the SIR Equations

**For the Susceptible compartment (S):**

The only flow *out* of S is infection. The rate of infection is $\beta \cdot S \cdot I/N$: the transmission rate β multiplied by the number of susceptibles S multiplied by the probability that a random contact is with an infectious individual ($I/N$). There are no inflows.

$$\frac{dS}{dt} = -\frac{\beta S I}{N}$$

**For the Infectious compartment (I):**

One inflow (new infections from S) and one outflow (recovery to R):

$$\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I$$

**For the Recovered compartment (R):**

One inflow (recoveries from I) and no outflows:

$$\frac{dR}{dt} = \gamma I$$

### Conservation of Population

A key sanity check: N = S + I + R should be constant over time. Adding all three equations:

$$\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = -\frac{\beta S I}{N} + \frac{\beta S I}{N} - \gamma I + \gamma I = 0$$

The terms cancel exactly, confirming that $dN/dt = 0$ — the total population is conserved. ✓

### The Mass Action Principle

The infection term $\beta SI/N$ embodies the *mass action principle*, borrowed from chemical kinetics. Two formulations exist:

| Formulation | Expression | When to use |
|---|---|---|
| **Frequency-dependent** (standard incidence) | $\beta \cdot S \cdot I/N$ | Contact rates do not increase with population size (most human diseases) |
| **Density-dependent** (mass action incidence) | $\beta \cdot S \cdot I$ | Contact rates increase with population density (e.g., some wildlife and livestock diseases) |

::: {.callout-warning title="Important: Choice of Incidence Formulation"}
With **frequency-dependent** incidence, R₀ is **independent** of population size. With **density-dependent** incidence, R₀ **scales with N**. Students frequently confuse these formulations — the choice matters substantially for model predictions.
:::

---

## Parameters of the SIR Model

### The Transmission Rate (β)

**Biological meaning:** β encodes the combined effects of the contact rate and per-contact transmission probability:

$$\beta = c \cdot p$$

where $c$ is the average number of potentially infectious contacts per unit time, and $p$ is the probability of transmission per contact (written $\beta_c$ in Module 1).

**Units:** per unit time (e.g., day⁻¹) in the frequency-dependent formulation used here.

**Example:** If β = 0.5 day⁻¹ and I/N = 0.1, then the per capita rate of infection among susceptibles is 0.05 day⁻¹, so approximately 5% of susceptibles become infected each day.

**Estimation:**

- From early epidemic growth rate combined with serial interval data
- From household secondary attack rates combined with contact frequency data
- By fitting the full SIR model to incidence data (Module 9)

### The Recovery Rate (γ)

**Biological meaning:** γ is the per capita rate at which infectious individuals recover per unit time. The *mean infectious period* is $1/\gamma$.

**Units:** per unit time (e.g., day⁻¹).

**Example:** If γ = 0.1 day⁻¹, then approximately 10% of currently infectious individuals recover each day, and the mean infectious period is $1/0.1 = 10$ days.
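A rate and a daily probability are close but not identical. The sketch below converts the example rates into the probability of the event occurring within one day, assuming the rate stays constant over that day (an approximation for the force of infection, which changes as $I$ changes). The helper name `rate_to_prob` is hypothetical.

```r
# Convert a constant per capita rate into the probability that the event
# happens within a time step of length dt (exponential waiting time)
rate_to_prob <- function(rate, dt = 1) 1 - exp(-rate * dt)

beta  <- 0.5   # transmission rate (day^-1), from the beta example
prev  <- 0.1   # prevalence I/N, from the beta example
gamma <- 0.1   # recovery rate (day^-1), from the gamma example

foi <- beta * prev                  # force of infection (day^-1)
p_infect  <- rate_to_prob(foi)      # daily infection probability per susceptible
p_recover <- rate_to_prob(gamma)    # daily recovery probability per infectious
mean_infectious_period <- 1 / gamma # days

cat("Daily infection probability:", round(p_infect, 4), "\n")
cat("Daily recovery probability: ", round(p_recover, 4), "\n")
cat("Mean infectious period:     ", mean_infectious_period, "days\n")
```

The probabilities come out slightly below the rates themselves, and the gap widens as the rate or the time step grows, which is why the "5% per day" and "10% per day" readings are approximations.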
::: {.callout-note title="The Exponential Infectious Period"}
Under the SIR formulation, the time spent in the I compartment follows an **exponential distribution** with rate γ. The exponential distribution has a *memoryless* property: the probability of recovering today does not depend on how long the individual has already been infectious. This is a mathematical convenience, not a biological truth. More realistic distributions (gamma, Weibull) can be incorporated with the SEIR and related models (Module 3).
:::

### Deriving R₀ from the SIR Model

Having established the model structure, we can now derive R₀ rigorously. Consider the equation for $dI/dt$:

$$\frac{dI}{dt} = I\left(\frac{\beta S}{N} - \gamma\right)$$

At the very beginning of an epidemic, virtually the entire population is susceptible: $S \approx N$. Therefore:

$$\frac{dI}{dt} \approx I(\beta - \gamma) = \gamma I\left(\frac{\beta}{\gamma} - 1\right)$$

The infection grows ($dI/dt > 0$) if and only if $\beta/\gamma > 1$. We therefore define:

$$\boxed{R_0 = \frac{\beta}{\gamma}}$$

This is a clean, model-derived definition: R₀ is the ratio of the rate at which new infections are generated (β) to the rate at which infections are resolved (γ). Each infectious individual generates new infections at rate β for an average duration of $1/\gamma$, giving $\beta \times (1/\gamma) = \beta/\gamma$ secondary infections.

### The Epidemic Threshold Theorem

The epidemic threshold theorem states that, in a fully susceptible population, a major epidemic can occur if and only if **R₀ > 1**. More generally, when part of the population is already immune, infection can grow only if the initial proportion of susceptibles $S(0)/N$ exceeds $1/R_0$.

**Corollary 1 — Not everyone gets infected.** Even when R₀ >> 1, the epidemic does not infect the entire population. As infection spreads, the susceptible pool depletes, reducing the force of infection. Eventually, susceptibles fall below $N/R_0$ and the epidemic begins to decline. The final size of the epidemic is determined by the *final size equation* (discussed below).
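The threshold condition can be checked directly from the sign of $dI/dt$ at time zero. The following is a minimal sketch: the function `initial_growth` is a hypothetical helper, and γ = 0.1 day⁻¹ and R₀ = 4 are illustrative values.

```r
# Initial rate of change of I under the SIR model:
# dI/dt = I * (beta * S/N - gamma), evaluated at t = 0
initial_growth <- function(R0, s0, gamma = 0.1, I0 = 1) {
  beta <- R0 * gamma
  I0 * (beta * s0 - gamma)
}

R0 <- 4
s0_values <- c(1.00, 0.50, 0.25, 0.20)   # initial susceptible fractions S(0)/N

threshold_check <- data.frame(
  s0       = s0_values,
  dI_dt    = sapply(s0_values, function(s) initial_growth(R0, s)),
  critical = 1 / R0                      # infection grows only if s0 > 1/R0
)
print(threshold_check)
```

With R₀ = 4 the critical susceptible fraction is 0.25: the initial growth is positive above it, zero at it, and negative below it.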
**Corollary 2 — Herd immunity.** If a proportion $v$ of the population is vaccinated (assuming perfect vaccine efficacy), the effective reproduction number at the start of an outbreak becomes $R_0(1-v)$. For the epidemic to be controlled:

$$R_0(1-v) < 1 \implies v_c = 1 - \frac{1}{R_0}$$

| Pathogen | R₀ | Herd Immunity Threshold ($v_c$) |
|---|---|---|
| Measles | ~15 | ~93% |
| Pertussis | ~15 | ~93% |
| COVID-19 (original) | ~2.5 | ~60% |
| COVID-19 (Omicron) | ~12 | ~92% |
| Seasonal influenza | ~1.3 | ~23% |

### The Final Size Equation

The final proportion of the population ever infected ($z$) satisfies:

$$1 - z = e^{-R_0 z}$$

This transcendental equation has no closed-form solution but can be solved numerically:

```{r final-size}
# Numerically solve the final size equation for a range of R0 values
final_size_solve <- function(R0) {
  # uniroot finds z such that (1 - z) - exp(-R0 * z) = 0
  f <- function(z) (1 - z) - exp(-R0 * z)
  sol <- uniroot(f, interval = c(1e-6, 1 - 1e-6))
  sol$root
}

R0_vals <- seq(1.01, 10, by = 0.05)
final_size <- sapply(R0_vals, final_size_solve)

fs_df <- data.frame(R0 = R0_vals, attack_rate = final_size * 100)

ggplot(fs_df, aes(x = R0, y = attack_rate)) +
  geom_line(colour = "#c0392b", linewidth = 1.3) +
  geom_hline(yintercept = c(58, 80, 89), linetype = "dashed",
             colour = "grey50", linewidth = 0.7) +
  annotate("text", x = 9.8, y = 60, label = "R₀=1.5 → 58%",
           size = 3.5, hjust = 1, colour = "#555555") +
  annotate("text", x = 9.8, y = 82, label = "R₀=2.0 → 80%",
           size = 3.5, hjust = 1, colour = "#555555") +
  annotate("text", x = 9.8, y = 91, label = "R₀=2.5 → 89%",
           size = 3.5, hjust = 1, colour = "#555555") +
  labs(
    title = "Final Epidemic Size as a Function of R₀",
    subtitle = "Derived from the Kermack–McKendrick final size equation: 1 − z = exp(−R₀z)",
    x = "Basic reproduction number (R₀)",
    y = "Final attack rate (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
```

::: {.callout-important}
An R₀ of just **2** leads to **80%** of the population being infected in the absence of interventions or prior immunity. This is a counterintuitive result for many students. The implication is that interventions that reduce R₀ — even without eliminating transmission — can have enormous impact on total disease burden.
:::

---

## Dynamics of the SIR Model

### Qualitative Behaviour

The SIR model produces a characteristic epidemic trajectory across four phases:

1. **Exponential growth phase:** When $S \approx N$ and $I$ is small, $dI/dt \approx \gamma I(R_0 - 1)$, and $I$ grows approximately exponentially. The growth rate is $r = \gamma(R_0 - 1)$.
2. **Peak:** The epidemic peaks when $dI/dt = 0$, i.e., when $\beta S/N = \gamma$, i.e., when $S = N/R_0$. At this moment, depletion of susceptibles has reduced transmission to exactly the replacement rate.
3. **Decline:** Once $S < N/R_0$, new infections are generated more slowly than recoveries occur, and $I$ begins to decline. Note: R₀ itself has not changed; the *effective* reproduction number $R_t = R_0 \cdot S/N$ has fallen below 1.
4. **Epidemic burnout:** $I$ declines toward zero, but $S$ does not reach zero. A residual pool of susceptibles always remains — too few to sustain transmission, but not zero.
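The landmarks of these phases can be computed without solving the ODEs. The sketch below uses the growth rate and peak condition stated in the list, plus one result that is not derived in this module: the standard SIR phase-plane relation, which gives peak prevalence $I_{max} = N\left(1 - \frac{1 + \ln R_0}{R_0}\right)$ when $S(0) \approx N$ and $I(0) \approx 0$. The function name `sir_landmarks` is hypothetical, and the parameter values are illustrative.

```r
# Analytic landmarks of the basic SIR epidemic
# (assumes S(0) ~ N and a negligible number of initial infections)
sir_landmarks <- function(R0, gamma, N) {
  r <- gamma * (R0 - 1)                         # early exponential growth rate
  list(
    growth_rate   = r,                          # per unit time
    doubling_time = log(2) / r,                 # early doubling time
    S_at_peak     = N / R0,                     # susceptibles when dI/dt = 0
    I_peak        = N * (1 - (1 + log(R0)) / R0) # peak prevalence
  )
}

lm_sir <- sir_landmarks(R0 = 4, gamma = 0.1, N = 10000)

cat("Growth rate r:       ", lm_sir$growth_rate, "per day\n")
cat("Early doubling time: ", round(lm_sir$doubling_time, 2), "days\n")
cat("S at the peak:       ", lm_sir$S_at_peak, "\n")
cat("Peak prevalence:     ", round(lm_sir$I_peak), "\n")
```

These values give a useful cross-check on a numerical solution: the simulated peak of $I(t)$ should sit close to `I_peak`, at the time when $S(t)$ crosses `S_at_peak`.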
### The Effective Reproduction Number Rₜ

The effective reproduction number at time $t$ is:

$$R_t = R_0 \cdot \frac{S(t)}{N}$$

- At $t = 0$: $R_t = R_0$ (full susceptibility)
- When $R_t = 1$: prevalence $I(t)$ is at its peak
- When $R_t < 1$: the epidemic is declining

---

## R Implementation: Module 2

### Session 2.1: Setting Up Required Packages

```{r packages-m2}
# Install if needed
# install.packages(c("deSolve", "tidyverse"))
library(deSolve)
library(tidyverse)
```

### Session 2.2: Implementing the SIR Model with `deSolve`

```{r sir-model}
# ---- Step 1: Define the ODE function ----
sir_model <- function(time, state, params) {
  with(as.list(c(state, params)), {
    N <- S + I + R
    dS_dt <- -beta * S * I / N
    dI_dt <- beta * S * I / N - gamma * I
    dR_dt <- gamma * I
    return(list(c(dS_dt, dI_dt, dR_dt)))
  })
}

# ---- Step 2: Define parameters ----
params <- c(
  beta = 0.4,  # transmission rate (day^-1)
  gamma = 0.1  # recovery rate (day^-1)
  # R0 = beta/gamma = 0.4/0.1 = 4
)

# ---- Step 3: Define initial conditions ----
# 1 infectious individual in a fully susceptible population of 10,000
N <- 10000
inits <- c(S = N - 1, I = 1, R = 0)

# ---- Step 4: Define time vector ----
times <- seq(0, 200, by = 1) # 200 days, daily steps

# ---- Step 5: Solve the ODE system ----
output <- ode(
  y = inits,
  times = times,
  func = sir_model,
  parms = params
)

sir_df <- as.data.frame(output)
head(sir_df)
```

### Session 2.3: Visualising SIR Compartment Dynamics

```{r sir-plot}
# Reshape to long format for ggplot
sir_long <- sir_df %>%
  pivot_longer(
    cols = c(S, I, R),
    names_to = "Compartment",
    values_to = "Count"
  ) %>%
  mutate(
    Compartment = factor(Compartment, levels = c("S", "I", "R")),
    Label = case_when(
      Compartment == "S" ~ "Susceptible",
      Compartment == "I" ~ "Infectious",
      Compartment == "R" ~ "Recovered"
    )
  )

ggplot(sir_long, aes(x = time, y = Count, colour = Label)) +
  geom_line(linewidth = 1.3) +
  scale_colour_manual(
    values = c("Susceptible" = "#2196F3",
               "Infectious" = "#F44336",
               "Recovered" = "#4CAF50")
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "SIR Model Dynamics (R₀ = 4)",
    subtitle = "N = 10,000; β = 0.4 day⁻¹; γ = 0.1 day⁻¹; 1 seed case",
    x = "Time (days)",
    y = "Number of individuals",
    colour = "Compartment"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )
```

### Session 2.4: Sensitivity Analysis — Varying R₀

```{r r0-sensitivity}
# Function to run SIR for a given R0
run_sir <- function(R0, gamma = 0.1, N = 10000, days = 200) {
  beta <- R0 * gamma
  params <- c(beta = beta, gamma = gamma)
  inits <- c(S = N - 1, I = 1, R = 0)
  times <- seq(0, days, by = 1)
  out <- ode(y = inits, times = times, func = sir_model, parms = params)
  as.data.frame(out) %>%
    mutate(R0_label = paste0("R₀ = ", R0))
}

# Run for multiple R0 values
R0_values <- c(1.5, 2.0, 3.0, 5.0, 8.0)
results_df <- map_dfr(R0_values, run_sir)

# Plot infectious curves
results_df %>%
  ggplot(aes(x = time, y = I,
             colour = factor(R0_label, levels = paste0("R₀ = ", R0_values)),
             group = R0_label)) +
  geom_line(linewidth = 1.1) +
  scale_colour_brewer(palette = "Set1") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "SIR Infectious Curve Across a Range of R₀ Values",
    subtitle = "N = 10,000; γ = 0.1 day⁻¹; 1 seed case",
    x = "Time (days)",
    y = "Number infectious",
    colour = "Scenario"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )
```

::: {.callout-tip title="Interpretation Exercise"}
From the sensitivity plot, students should answer:

1. Which value of R₀ produces the **earliest peak**? Why? *(Higher R₀ → faster growth → earlier, sharper peak)*
2. For which R₀ is the epidemic most **prolonged**? *(Lower R₀ near 1 → slow spread → longer epidemic)*
3. How does total **final attack rate** change across R₀ values? *(Use the final size equation to verify)*
:::

### Session 2.5: Verifying the Herd Immunity Threshold

```{r hit-plot}
# Analytic herd immunity threshold as a function of R0
R0_vals <- seq(1.1, 18, by = 0.1)
HIT <- 1 - 1 / R0_vals

hit_df <- data.frame(R0 = R0_vals, HIT_pct = HIT * 100)

ggplot(hit_df, aes(x = R0, y = HIT_pct)) +
  geom_line(colour = "#9C27B0", linewidth = 1.3) +
  geom_hline(yintercept = c(60, 93), linetype = "dashed", colour = "grey50") +
  annotate("text", x = 17.5, y = 62, label = "COVID-19 original (~60%)",
           size = 3.5, hjust = 1) +
  annotate("text", x = 17.5, y = 95, label = "Measles / Omicron (~93%)",
           size = 3.5, hjust = 1) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(
    title = "Herd Immunity Threshold as a Function of R₀",
    subtitle = expression(v[c] == 1 - 1/R[0]),
    x = "Basic reproduction number (R₀)",
    y = "Vaccination coverage required (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
```

### Session 2.6: Incidence vs Prevalence — A Critical Distinction

Students often conflate **incidence** (new cases per day) with **prevalence** (current number of infectious cases). The SIR model generates prevalence $I(t)$ directly. Incidence is the rate of flow from S to I, $\beta S I/N$.
```{r incidence-prevalence}
# Compute daily incidence from the reduction in S
sir_df <- sir_df %>%
  mutate(
    incidence = c(NA, -diff(S)), # new infections per day = reduction in S
    prevalence = I
  )

# Plot both on the same axes
sir_df %>%
  select(time, incidence, prevalence) %>%
  pivot_longer(-time, names_to = "measure", values_to = "count") %>%
  mutate(measure = str_to_title(measure)) %>%
  ggplot(aes(x = time, y = count, colour = measure)) +
  geom_line(linewidth = 1.2) +
  scale_colour_manual(
    values = c("Incidence" = "#e74c3c", "Prevalence" = "#2980b9")
  ) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Incidence vs Prevalence in the SIR Model",
    subtitle = "Note: the incidence peak precedes the prevalence peak",
    x = "Time (days)",
    y = "Cases",
    colour = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )
```

::: {.callout-important title="A Consequential Surveillance Insight"}
The **incidence peak precedes the prevalence peak** by a lag of several days; the exact lag depends on β and γ and can be read from the plot. Incidence $\beta S I/N$ starts to fall while $R_t$ is still above 1, whereas prevalence keeps rising until $R_t = 1$. This means that by the time the number of currently infectious cases peaks, new infections have already begun to decline. Delayed or lagged reporting further amplifies this gap. Modellers and public health practitioners must account for this when interpreting real-time surveillance data.
:::

### Session 2.7: Computing Rₜ Over Time from SIR Output

```{r reff-over-time}
# R0 and parameters
R0 <- 0.4 / 0.1 # beta / gamma = 4
N_pop <- 10000

# Compute Rt = R0 * S(t)/N at each time point
sir_df <- sir_df %>%
  mutate(Rt = R0 * S / N_pop)

ggplot(sir_df, aes(x = time, y = Rt)) +
  geom_line(colour = "#e67e22", linewidth = 1.3) +
  geom_hline(yintercept = 1, linetype = "dashed",
             colour = "#c0392b", linewidth = 0.9) +
  annotate("text", x = 180, y = 1.15, label = "Rₜ = 1 (epidemic peak)",
           size = 3.5, colour = "#c0392b") +
  labs(
    title = "Effective Reproduction Number Rₜ Over the Course of the Epidemic",
    subtitle = expression(R[t] == R[0] %.% S(t)/N),
    x = "Time (days)",
    y = expression(R[t])
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
```

::: {.callout-tip title="Reading the Rₜ Plot"}
- When **Rₜ > 1**: the epidemic is growing. The gap above 1 indicates how fast.
- When **Rₜ = 1**: the epidemic is at its peak in terms of prevalence ($dI/dt = 0$); incidence has already peaked slightly earlier.
- When **Rₜ < 1**: the epidemic is declining. Rₜ will never return to R₀ because susceptibles are progressively depleted.
:::

---

## Module 2 References

- Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. *Proceedings of the Royal Society of London. Series A*, 115(772), 700–721.
- Kermack, W. O., & McKendrick, A. G. (1932). Contributions to the mathematical theory of epidemics: II. The problem of endemicity. *Proceedings of the Royal Society of London. Series A*, 138(834), 55–83.
- Kermack, W. O., & McKendrick, A. G. (1933). Contributions to the mathematical theory of epidemics: III. Further studies of the problem of endemicity. *Proceedings of the Royal Society of London. Series A*, 141(843), 94–122.
- Keeling, M. J., & Rohani, P. (2008). *Modeling Infectious Diseases in Humans and Animals.* Princeton University Press.
- Anderson, R. M., & May, R. M. (1991). *Infectious Diseases of Humans: Dynamics and Control.* Oxford University Press.
- Bjørnstad, O. N. (2018). *Epidemics: Models and Data Using R.* Springer.
- Diekmann, O., Heesterbeek, H., & Britton, T. (2013). *Mathematical Tools for Understanding Infectious Disease Dynamics.* Princeton University Press.
- Vynnycky, E., & White, R. (2010). *An Introduction to Mathematical Modelling of Infectious Diseases.* Oxford University Press.
- Hethcote, H. W. (2000). The mathematics of infectious diseases. *SIAM Review*, 42(4), 599–653.
- Heesterbeek, J. A. P., & Dietz, K. (1996). The concept of R₀ in epidemic theory. *Statistica Neerlandica*, 50(1), 89–110.
- Fine, P., Eames, K., & Heymann, D. L. (2011). "Herd immunity": a rough guide. *Clinical Infectious Diseases*, 52(7), 911–916.
- McCallum, H., Barlow, N., & Hone, J. (2001). How should pathogen transmission be modelled? *Trends in Ecology & Evolution*, 16(6), 295–300.
- Soetaert, K., Petzoldt, T., & Setzer, R. W. (2010). Solving differential equations in R: Package deSolve. *Journal of Statistical Software*, 33(9), 1–25.

---


