1 Learning objectives
2 Why infectious disease modelling matters
3 Basic epidemiological concepts
- 3.1 Host, pathogen and transmission
- 3.2 Incidence and prevalence
- 3.3 Force of infection
- 3.4 Latent period, incubation period and infectious period
4 Model classification
- 4.1 Deterministic models
- 4.2 Stochastic models
- 4.3 Compartmental models
5 The SI model
6 The SIR model
7 The SEIR model
8 Reproduction numbers
- 8.1 Basic reproduction number
- 8.2 Effective reproduction number
- 8.3 Time-varying reproduction number from incidence data
9 Real-data example: early COVID-19 incidence in Australia
10 Estimating exponential growth rate
11 Approximate reproduction number from growth rate
12 Simple renewal-model estimate of time-varying R
13 Parameter estimation for an SIR model
14 Likelihood-based estimation
15 Stochastic epidemic simulation
16 Sensitivity analysis
- 16.1 One-way sensitivity analysis
- 16.2 Two-way sensitivity analysis
17 Intervention modelling
18 Vaccination and herd immunity threshold
19 Advanced modelling extensions
- 19.1 Overdispersion and superspreading
- 19.2 Under-reporting
- 19.3 Reporting delays
- 19.4 Spatial transmission
20 Model validation and diagnostics
21 Common mistakes in infectious disease modelling
22 Practical workflow for a modelling project
23 Practice questions
24 References and useful reading
25 Session information

Affiliations

¹ Charles Sturt University, Sydney, Australia

² Gulabali Research Centre, Charles Sturt University, Wagga Wagga, Australia

Corresponding Author: Dr Hom Nath Dhungana

Email: homnath1988@gmail.com

1 Learning objectives

By the end of these notes, you should be able to:

explain the core epidemiological concepts used in infectious disease modelling;
distinguish deterministic, stochastic, individual-level, population-level, compartmental and renewal models;
formulate and simulate basic SI, SIS, SIR and SEIR models;
estimate simple transmission parameters from incidence data;
compute and interpret the basic reproduction number, effective reproduction number and time-varying reproduction number;
conduct basic sensitivity analysis for epidemiological parameters;
understand how uncertainty, under-reporting, delay and overdispersion affect infectious disease inference.

2 Why infectious disease modelling matters

Infectious disease modelling translates biological, epidemiological and behavioural knowledge into mathematical or statistical structures. These models help researchers and public health decision-makers answer questions such as:

How fast is an outbreak growing?
How many people may be infected in the future?
Which parameters drive transmission most strongly?
What level of vaccination, isolation or behaviour change is needed to control spread?
How uncertain are the estimates?

Models are not perfect predictions. They are structured ways of reasoning under uncertainty. Their usefulness depends on the quality of data, assumptions, parameter values, model structure and interpretation.

3 Basic epidemiological concepts

3.1 Host, pathogen and transmission

An infectious disease system usually involves:

host: the human, animal or vector that can be infected;
pathogen: virus, bacterium, parasite or fungus causing infection;
reservoir: environment or population where the pathogen persists;
transmission route: direct contact, droplet, airborne, vector-borne, sexual, foodborne, waterborne or environmental;
susceptible population: individuals who can acquire infection;
infectious population: individuals who can transmit the pathogen.

3.2 Incidence and prevalence

Incidence is the number of new cases occurring during a specified time interval.
Prevalence is the number or proportion of existing cases at a particular point in time (point prevalence) or over a specified period (period prevalence).

For modelling acute infections, incidence is often more informative because it describes the timing of new infections. For chronic infections such as tuberculosis or HIV, both incidence and prevalence may be important.
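The distinction can be made concrete with a small line list. The sketch below uses a hypothetical set of eight cases (the onset and recovery days are invented for illustration) and counts new cases per day (incidence) and cases currently ill on each day (point prevalence).

```r
# Hypothetical line list: day of onset and day of recovery for eight cases
onset    <- c(1, 1, 2, 3, 3, 3, 5, 6)
recovery <- c(4, 6, 5, 7, 8, 6, 9, 10)
days <- 1:10

# Incidence: number of cases whose onset falls on each day
incidence <- sapply(days, function(d) sum(onset == d))

# Point prevalence: number of cases ill on each day (onset <= d < recovery)
prevalence <- sapply(days, function(d) sum(onset <= d & d < recovery))

data.frame(day = days, incidence = incidence, prevalence = prevalence)
```

Prevalence keeps rising after incidence has peaked because cases accumulate for as long as they remain ill, which is why long-duration infections can have high prevalence even when incidence is low.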

3.3 Force of infection

The force of infection, usually denoted by \(\lambda(t)\), is the instantaneous rate at which susceptible individuals become infected.

In a basic frequency-dependent model:

\[ \lambda(t) = \beta \frac{I(t)}{N}, \]

where:

\(\beta\) is the effective transmission rate (contact rate multiplied by the probability of transmission per contact);
\(I(t)\) is the number infectious at time \(t\);
\(N\) is the total population size.
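As a worked example of the formula, the sketch below converts a force of infection into a daily infection probability. All numbers are assumed for illustration, and the calculation treats \(\lambda\) as constant over the day and everyone who is not infectious as susceptible.

```r
beta <- 0.3    # assumed effective transmission rate per day
I_t  <- 50     # assumed number infectious
N    <- 1000   # assumed total population size

lambda <- beta * I_t / N     # force of infection per susceptible per day
p_day  <- 1 - exp(-lambda)   # probability that a susceptible is infected within one day

S_t <- N - I_t               # assumes nobody has been removed yet
expected_new <- S_t * p_day  # expected new infections over the day

c(lambda = lambda, p_day = p_day, expected_new_infections = expected_new)
```

For small \(\lambda\) the rate and the daily probability are nearly equal, but they diverge as \(\lambda\) grows, which is why the stochastic simulation later in these notes uses \(1 - e^{-\lambda}\) rather than \(\lambda\) itself.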

3.4 Latent period, incubation period and infectious period

Latent period: time from infection to becoming infectious.
Incubation period: time from infection to symptom onset.
Infectious period: duration during which an infected person can transmit.
Generation interval: time between infection of a primary case and infection of a secondary case.
Serial interval: time between symptom onset in a primary case and symptom onset in a secondary case.

The generation interval is biologically central but often unobserved. The serial interval is easier to observe and is widely used in reproduction-number estimation.
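The relationship between the two intervals can be illustrated by simulation. The sketch below is a simplification: the gamma and lognormal distributions and their parameters are assumed purely for illustration, and incubation periods are taken to be independent of the generation interval, which is not generally true.

```r
set.seed(1)
n <- 10000

# Assumed distributions (illustrative only, not estimates for any disease)
gen_interval  <- rgamma(n, shape = 6.25, scale = 0.8)        # mean 5 days, sd 2 days
inc_primary   <- rlnorm(n, meanlog = log(5), sdlog = 0.5)    # incubation period of the infector
inc_secondary <- rlnorm(n, meanlog = log(5), sdlog = 0.5)    # incubation period of the infectee

# Onset-to-onset time = infection-to-infection time
#                       + incubation (infectee) - incubation (infector)
serial_interval <- gen_interval + inc_secondary - inc_primary

c(mean_generation = mean(gen_interval),
  mean_serial = mean(serial_interval),
  sd_generation = sd(gen_interval),
  sd_serial = sd(serial_interval),
  prop_negative_serial = mean(serial_interval < 0))
```

Under these assumptions the two intervals share the same mean, but the serial interval is more variable and can be negative, whereas the generation interval is always positive. This is one reason to treat the serial interval as a proxy rather than a direct measurement.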

4 Model classification

4.1 Deterministic models

Deterministic models produce the same output every time for the same initial conditions and parameter values. They are usually written as ordinary differential equations.

Advantages:

easy to interpret;
computationally efficient;
useful for average epidemic dynamics.

Limitations:

do not represent random variation directly;
less appropriate when case numbers are very small;
can hide outbreak extinction or superspreading effects.

4.2 Stochastic models

Stochastic models include random variation. They are useful when transmission is highly variable or when the number of infections is small.

Advantages:

represent chance extinction;
capture outbreak variability;
useful for uncertainty analysis and simulation.

Limitations:

computationally heavier;
require repeated simulation;
harder to interpret than deterministic models.

4.3 Compartmental models

Compartmental models divide the population into epidemiological states, for example:

\(S\): susceptible;
\(E\): exposed but not yet infectious;
\(I\): infectious;
\(R\): removed, recovered or immune;
\(D\): dead;
\(V\): vaccinated.

Common models include SI, SIS, SIR, SEIR and SEIRD.

5 The SI model

The SI model is the simplest infection model. Individuals move from susceptible to infected.

\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \]

\[ \frac{dI}{dt} = \beta \frac{SI}{N}. \]

This model may be useful for early epidemic growth, or for infections without recovery over the period of interest, but it is too simple for most real epidemics because everyone is eventually infected and nobody recovers.

N <- 1000
beta <- 0.35
days <- 100
S <- I <- numeric(days + 1)
S[1] <- 999
I[1] <- 1

dt <- 1
for (t in 1:days) {
  new_inf <- beta * S[t] * I[t] / N * dt
  S[t + 1] <- S[t] - new_inf
  I[t + 1] <- I[t] + new_inf
}

plot(0:days, S, type = "l", lwd = 2, ylim = c(0, N), xlab = "Day", ylab = "Number of people")
lines(0:days, I, lwd = 2, lty = 2)
legend("right", legend = c("Susceptible", "Infected"), lwd = 2, lty = c(1, 2), bty = "n")
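The SI model has a closed-form logistic solution, \(I(t) = N I_0 / \left(I_0 + (N - I_0) e^{-\beta t}\right)\), which gives a way to check the Euler stepping used above. The sketch below compares the two for the same parameter values; the function names `si_exact` and `si_euler` are introduced here for illustration.

```r
N <- 1000; beta <- 0.35; I0 <- 1; days <- 100

# Closed-form logistic solution of the SI model
si_exact <- function(t) N * I0 / (I0 + (N - I0) * exp(-beta * t))

# Euler approximation with step size dt (S is replaced by N - I)
si_euler <- function(dt) {
  steps <- round(days / dt)
  I <- numeric(steps + 1)
  I[1] <- I0
  for (k in 1:steps) {
    I[k + 1] <- I[k] + beta * (N - I[k]) * I[k] / N * dt
  }
  I
}

# Largest absolute difference between Euler and exact solutions
err_dt1  <- max(abs(si_euler(1)   - si_exact(seq(0, days, by = 1))))
err_dt01 <- max(abs(si_euler(0.1) - si_exact(seq(0, days, by = 0.1))))
c(max_error_dt_1 = err_dt1, max_error_dt_0.1 = err_dt01)
```

A daily step is convenient for teaching, but the discretisation error shrinks as the step gets smaller. For applied work an ODE solver such as those in the deSolve package is a common alternative.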

6 The SIR model

The SIR model is the classic model for acute immunising infections.

\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \]

\[ \frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I, \]

\[ \frac{dR}{dt} = \gamma I. \]

where:

\(\beta\): effective transmission rate;
\(\gamma\): recovery rate;
\(1/\gamma\): average infectious period.

The basic reproduction number is:

\[ R_0 = \frac{\beta}{\gamma}. \]

If \(R_0 > 1\), the epidemic can grow in a fully susceptible population. If \(R_0 < 1\), sustained spread is unlikely.
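The threshold follows directly from the equation for \(I\). As a short derivation sketch, factor \(I\) out of the second SIR equation:

\[ \frac{dI}{dt} = \left(\beta \frac{S}{N} - \gamma\right) I, \qquad \frac{dI}{dt} > 0 \iff \frac{\beta}{\gamma} \frac{S}{N} > 1. \]

At the start of an outbreak in a fully susceptible population \(S \approx N\), so the number infectious grows only if \(\beta/\gamma > 1\). Equivalently, each case transmits at rate \(\beta\) for an average of \(1/\gamma\) time units, giving \(\beta \times 1/\gamma\) secondary infections.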

simulate_sir <- function(N = 1000, I0 = 1, R0_init = 0, beta = 0.3, gamma = 0.1, days = 160, dt = 1) {
  steps <- days / dt
  S <- I <- R <- numeric(steps + 1)
  S[1] <- N - I0 - R0_init
  I[1] <- I0
  R[1] <- R0_init
  
  for (t in 1:steps) {
    new_inf <- beta * S[t] * I[t] / N * dt
    new_rec <- gamma * I[t] * dt
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  
  data.frame(day = seq(0, days, by = dt), S = S, I = I, R = R)
}

sir_out <- simulate_sir(beta = 0.30, gamma = 0.10)
head(sir_out)

plot(sir_out$day, sir_out$S, type = "l", lwd = 2, ylim = c(0, 1000), xlab = "Day", ylab = "Number of people")
lines(sir_out$day, sir_out$I, lwd = 2, lty = 2)
lines(sir_out$day, sir_out$R, lwd = 2, lty = 3)
legend("right", legend = c("S", "I", "R"), lwd = 2, lty = c(1, 2, 3), bty = "n")

7 The SEIR model

The SEIR model includes an exposed compartment.

\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \]

\[ \frac{dE}{dt} = \beta \frac{SI}{N} - \sigma E, \]

\[ \frac{dI}{dt} = \sigma E - \gamma I, \]

\[ \frac{dR}{dt} = \gamma I. \]

where:

\(\sigma\): rate of progression from exposed to infectious;
\(1/\sigma\): average latent period;
\(\gamma\): recovery rate.

simulate_seir <- function(N = 1000, E0 = 0, I0 = 1, R0_init = 0, beta = 0.4, sigma = 1/4, gamma = 1/6, days = 160, dt = 1) {
  steps <- days / dt
  S <- E <- I <- R <- numeric(steps + 1)
  S[1] <- N - E0 - I0 - R0_init
  E[1] <- E0
  I[1] <- I0
  R[1] <- R0_init
  
  for (t in 1:steps) {
    new_exp <- beta * S[t] * I[t] / N * dt
    new_inf <- sigma * E[t] * dt
    new_rec <- gamma * I[t] * dt
    S[t + 1] <- S[t] - new_exp
    E[t + 1] <- E[t] + new_exp - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  
  data.frame(day = seq(0, days, by = dt), S = S, E = E, I = I, R = R)
}

seir_out <- simulate_seir()

plot(seir_out$day, seir_out$S, type = "l", lwd = 2, ylim = c(0, 1000), xlab = "Day", ylab = "Number of people")
lines(seir_out$day, seir_out$E, lwd = 2, lty = 2)
lines(seir_out$day, seir_out$I, lwd = 2, lty = 3)
lines(seir_out$day, seir_out$R, lwd = 2, lty = 4)
legend("right", legend = c("S", "E", "I", "R"), lwd = 2, lty = c(1, 2, 3, 4), bty = "n")

8 Reproduction numbers

8.1 Basic reproduction number

The basic reproduction number \(R_0\) is the expected number of secondary infections generated by one infectious individual in a fully susceptible population.

For a simple SIR model:

\[ R_0 = \frac{\beta}{\gamma}. \]

8.2 Effective reproduction number

The effective reproduction number \(R_t\) accounts for depletion of susceptible individuals and interventions.

For a homogeneous SIR model:

\[ R_t = R_0 \frac{S(t)}{N}. \]

sir_out$Rt <- (0.30 / 0.10) * sir_out$S / 1000
plot(sir_out$day, sir_out$Rt, type = "l", lwd = 2, xlab = "Day", ylab = "Effective reproduction number")
abline(h = 1, lty = 2)

8.3 Time-varying reproduction number from incidence data

In practice, \(R_t\) can be estimated from incidence data and a serial interval distribution. A common renewal-equation approximation is:

\[ I_t \sim \text{Poisson}(R_t \Lambda_t), \]

where:

\[ \Lambda_t = \sum_{s=1}^{t} I_{t-s} w_s, \]

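and \(w_s\) is the probability that the (discretised) serial interval equals \(s\) days. The serial interval is used here as a proxy for the generation interval, so \(\Lambda_t\) is the total infectiousness of past cases at time \(t\).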

The Cori et al. framework estimates the instantaneous reproduction number using incidence and serial interval assumptions. The EpiEstim R package implements this approach, but below we use a simple base-R version for teaching.

9 Real-data example: early COVID-19 incidence in Australia

The following small dataset contains daily confirmed COVID-19 cases for Australia during March 2020. It is embedded directly so that the R Markdown file knits without downloading data. In applied work, always use the most recent official data source, check definitions carefully and document reporting delays.

covid_aus <- data.frame(
  date = as.Date(c(
    "2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05",
    "2020-03-06", "2020-03-07", "2020-03-08", "2020-03-09", "2020-03-10",
    "2020-03-11", "2020-03-12", "2020-03-13", "2020-03-14", "2020-03-15",
    "2020-03-16", "2020-03-17", "2020-03-18", "2020-03-19", "2020-03-20",
    "2020-03-21"
  )),
  cases = c(3, 4, 8, 12, 9, 13, 12, 15, 20, 16, 27, 32, 46, 37, 52, 57, 77, 111, 144, 147, 224)
)

covid_aus$cumulative <- cumsum(covid_aus$cases)
covid_aus

plot(covid_aus$date, covid_aus$cases, type = "h", lwd = 5, xlab = "Date", ylab = "Daily confirmed cases")
points(covid_aus$date, covid_aus$cases, pch = 16)

10 Estimating exponential growth rate

During the early phase of an epidemic, susceptible depletion is small. Incidence may grow approximately exponentially:

\[ I_t = I_0 e^{rt}, \]

where \(r\) is the exponential growth rate.

Taking logs:

\[ \log(I_t) = \log(I_0) + rt. \]

dat <- covid_aus
dat$t <- seq_len(nrow(dat)) - 1
fit_growth <- lm(log(cases) ~ t, data = dat)
summary(fit_growth)

## 
## Call:
## lm(formula = log(cases) ~ t, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34294 -0.10887 -0.01426  0.14636  0.50288 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.415279   0.090882   15.57 2.84e-12 ***
## t           0.188917   0.007774   24.30 8.99e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2157 on 19 degrees of freedom
## Multiple R-squared:  0.9688, Adjusted R-squared:  0.9672 
## F-statistic: 590.5 on 1 and 19 DF,  p-value: 8.989e-16

r_hat <- coef(fit_growth)["t"]
doubling_time <- log(2) / r_hat
r_hat

##         t 
## 0.1889167

doubling_time

##        t 
## 3.669063

pred_cases <- exp(predict(fit_growth))
plot(dat$date, dat$cases, pch = 16, xlab = "Date", ylab = "Daily cases")
lines(dat$date, pred_cases, lwd = 2)
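The log-linear fit treats errors as constant on the log scale. As an alternative sketch, the growth rate can be estimated with a count regression and reported with an interval. The quasi-Poisson family and the Wald interval from `confint.default()` are choices made here for illustration, not part of the original analysis.

```r
# Daily confirmed cases, 1 to 21 March 2020 (same values as covid_aus$cases)
cases <- c(3, 4, 8, 12, 9, 13, 12, 15, 20, 16, 27, 32, 46, 37, 52, 57, 77, 111, 144, 147, 224)
t <- seq_along(cases) - 1

# Log-link count regression; quasi-Poisson allows for extra-Poisson variation
fit_glm <- glm(cases ~ t, family = quasipoisson(link = "log"))

r_glm <- unname(coef(fit_glm)["t"])
ci_r  <- unname(confint.default(fit_glm)["t", ])   # Wald 95% interval for r

# Doubling time and its interval (bounds swap because of the reciprocal)
c(r = r_glm, r_lower = ci_r[1], r_upper = ci_r[2],
  doubling_time = log(2) / r_glm,
  doubling_lower = log(2) / ci_r[2],
  doubling_upper = log(2) / ci_r[1])
```

The count regression gives more weight to the later, larger counts, so its estimate need not match the log-linear one. Comparing the two is a quick check on how sensitive the growth rate is to the error model.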

11 Approximate reproduction number from growth rate

If the infectious period is exponentially distributed with recovery rate \(\gamma\), a simple approximation is:

\[ R_0 \approx 1 + \frac{r}{\gamma}. \]

For a fixed generation interval \(T_g\), another approximation is:

\[ R_0 \approx e^{rT_g}. \]

gamma <- 1/5
Tg <- 5
R0_exp_infectious <- 1 + r_hat / gamma
R0_fixed_generation <- exp(r_hat * Tg)

c(
  growth_rate = r_hat,
  doubling_time_days = doubling_time,
  R0_exponential_infectious_period = R0_exp_infectious,
  R0_fixed_generation_interval = R0_fixed_generation
)

##                      growth_rate.t               doubling_time_days.t 
##                          0.1889167                          3.6690632 
## R0_exponential_infectious_period.t     R0_fixed_generation_interval.t 
##                          1.9445833                          2.5717416

Interpretation: these are rough estimates. They depend strongly on the assumed infectious period or generation interval. Early incidence data are often affected by testing, importation, reporting delays and surveillance changes.
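To show how strongly the result depends on the assumed duration, the sketch below repeats both approximations over a range of values. The growth rate is copied from the output above; the range of 3 to 7 days is an assumption chosen for illustration.

```r
r_hat <- 0.1889167   # growth rate from the log-linear fit above
Tg_values <- 3:7     # assumed mean infectious period or generation interval (days)

sens_R0 <- data.frame(
  Tg = Tg_values,
  R0_exponential = 1 + r_hat * Tg_values,   # 1 + r / gamma with gamma = 1 / Tg
  R0_fixed = exp(r_hat * Tg_values)         # fixed generation interval of length Tg
)
sens_R0
```

The fixed-interval approximation is never smaller than the exponential one, because \(e^{x} \ge 1 + x\), and the gap widens as the assumed interval lengthens.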

12 Simple renewal-model estimate of time-varying R

We now estimate a simple instantaneous reproduction number using a discretised serial interval distribution.

incidence <- covid_aus$cases
n <- length(incidence)

# Discrete serial interval: mean approximately 5 days, sd approximately 2 days
si_days <- 1:15
si_weights <- dgamma(si_days, shape = (5/2)^2, scale = 2^2/5)
si_weights <- si_weights / sum(si_weights)

infectiousness <- rep(NA_real_, n)
Rt_simple <- rep(NA_real_, n)

for (t in 2:n) {
  max_lag <- min(t - 1, length(si_weights))
  # Pair lag s with incidence[t - s], so the most recent day gets weight w_1
  infectiousness[t] <- sum(incidence[(t - 1):(t - max_lag)] * si_weights[1:max_lag])
  if (!is.na(infectiousness[t]) && infectiousness[t] > 0) {
    Rt_simple[t] <- incidence[t] / infectiousness[t]
  }
}

rt_dat <- data.frame(date = covid_aus$date, incidence = incidence, infectiousness = infectiousness, Rt = Rt_simple)
rt_dat

plot(rt_dat$date, rt_dat$Rt, type = "b", pch = 16, xlab = "Date", ylab = "Simple estimated Rt", ylim = c(0, max(rt_dat$Rt, na.rm = TRUE)))
abline(h = 1, lty = 2)

This simple estimator is intentionally transparent. For publication-quality analysis, use a formal framework with uncertainty intervals, reporting-delay adjustment, imported/local case distinction and sensitivity analysis.
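As a step towards uncertainty intervals, the sketch below applies a gamma prior to \(R_t\) over a sliding window, in the spirit of the Cori et al. approach. With a Poisson renewal likelihood and a Gamma prior with shape \(a\) and scale \(b\), the posterior over a window is Gamma with shape \(a + \sum I_s\) and rate \(1/b + \sum \Lambda_s\). The prior values and the 7-day window are assumptions made for illustration; this is not the EpiEstim implementation.

```r
incidence <- c(3, 4, 8, 12, 9, 13, 12, 15, 20, 16, 27, 32, 46, 37, 52, 57, 77, 111, 144, 147, 224)
n <- length(incidence)

# Discretised serial interval, mean about 5 days and sd about 2 days (as in the text)
si_weights <- dgamma(1:15, shape = (5 / 2)^2, scale = 2^2 / 5)
si_weights <- si_weights / sum(si_weights)

# Total infectiousness: lag s pairs incidence[t - s] with si_weights[s]
Lambda <- rep(NA_real_, n)
for (t in 2:n) {
  lags <- 1:min(t - 1, length(si_weights))
  Lambda[t] <- sum(incidence[t - lags] * si_weights[lags])
}

# Assumed prior Gamma(shape = 1, scale = 5) and assumed 7-day window
a_prior <- 1; b_prior <- 5; window <- 7

days_est <- (window + 1):n
post_mean <- post_lo <- post_hi <- numeric(length(days_est))
for (j in seq_along(days_est)) {
  idx <- (days_est[j] - window + 1):days_est[j]
  shape_post <- a_prior + sum(incidence[idx])
  rate_post  <- 1 / b_prior + sum(Lambda[idx])
  post_mean[j] <- shape_post / rate_post
  post_lo[j]   <- qgamma(0.025, shape = shape_post, rate = rate_post)
  post_hi[j]   <- qgamma(0.975, shape = shape_post, rate = rate_post)
}

rt_post <- data.frame(day = days_est, mean = post_mean, lower = post_lo, upper = post_hi)
rt_post
```

Early windows have little infection history behind them, so \(\Lambda_t\) is underestimated and \(R_t\) is pushed upwards. The intervals reflect only Poisson variation given the assumed serial interval, not reporting delays or imported cases.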

13 Parameter estimation for an SIR model

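Suppose we observe daily incidence and want to estimate \(\beta\) and \(\gamma\). We can fit a deterministic model by minimising the squared difference between observed and predicted incidence. The example below uses synthetic data simulated from known parameter values, so the estimates can be compared with the truth. Because the noise is random and no seed is set, rerunning the code gives slightly different estimates from those printed here.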

The predicted new infections in the SIR model are:

\[ \text{new infections}_t = \beta \frac{S_t I_t}{N}. \]

true_beta <- 0.32
true_gamma <- 0.10
N <- 10000
sir_true <- simulate_sir(N = N, I0 = 10, beta = true_beta, gamma = true_gamma, days = 80)

observed <- data.frame(
  day = sir_true$day[-1],
  cases = pmax(0, round(diff(N - sir_true$S) + rnorm(80, mean = 0, sd = 8)))
)

plot(observed$day, observed$cases, pch = 16, xlab = "Day", ylab = "Observed incidence")

sir_incidence <- function(beta, gamma, N = 10000, I0 = 10, days = 80) {
  out <- simulate_sir(N = N, I0 = I0, beta = beta, gamma = gamma, days = days)
  diff(N - out$S)
}

objective <- function(par) {
  beta <- exp(par[1])
  gamma <- exp(par[2])
  pred <- sir_incidence(beta, gamma, N = N, I0 = 10, days = 80)
  sum((observed$cases - pred)^2)
}

fit <- optim(par = log(c(beta = 0.2, gamma = 0.2)), fn = objective, method = "Nelder-Mead")
est <- exp(fit$par)
est

##       beta      gamma 
## 0.31730649 0.09727082

est["beta"] / est["gamma"]

##     beta 
## 3.262093

fitted_inc <- sir_incidence(est["beta"], est["gamma"], N = N, I0 = 10, days = 80)
plot(observed$day, observed$cases, pch = 16, xlab = "Day", ylab = "Daily cases")
lines(observed$day, fitted_inc, lwd = 2)
legend("topright", legend = c("Observed", "Fitted deterministic SIR"), pch = c(16, NA), lty = c(NA, 1), lwd = c(NA, 2), bty = "n")

14 Likelihood-based estimation

Squared-error fitting is simple but not always statistically appropriate. Infectious disease case counts are non-negative integers, so Poisson or negative binomial likelihoods are common.

For a Poisson observation model:

\[ Y_t \sim \text{Poisson}(\mu_t), \]

where \(\mu_t\) is predicted incidence from the model.

neg_log_lik <- function(par) {
  beta <- exp(par[1])
  gamma <- exp(par[2])
  mu <- sir_incidence(beta, gamma, N = N, I0 = 10, days = 80)
  mu <- pmax(mu, 1e-8)
  -sum(dpois(observed$cases, lambda = mu, log = TRUE))
}

fit_pois <- optim(par = log(c(beta = 0.25, gamma = 0.12)), fn = neg_log_lik, method = "Nelder-Mead")
est_pois <- exp(fit_pois$par)
est_pois

##      beta     gamma 
## 0.3196818 0.1001534

est_pois["beta"] / est_pois["gamma"]

##     beta 
## 3.191923
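A point estimate alone says little about precision. One common way to attach approximate intervals to a maximum-likelihood fit is to invert the Hessian of the negative log-likelihood. The sketch below is self-contained: it re-implements the Euler SIR incidence in a helper called `sir_inc` (a name introduced here), simulates Poisson-distributed observations with a fixed seed (an assumption, since the data above use normal noise), and computes Wald intervals on the log scale.

```r
set.seed(42)

# Euler-discretised SIR incidence (same daily-step scheme as simulate_sir)
sir_inc <- function(beta, gamma, N = 10000, I0 = 10, days = 80) {
  S <- N - I0
  I <- I0
  inc <- numeric(days)
  for (t in 1:days) {
    new_inf <- beta * S * I / N
    new_rec <- gamma * I
    S <- S - new_inf
    I <- I + new_inf - new_rec
    inc[t] <- new_inf
  }
  inc
}

# Synthetic observations with Poisson noise around the true curve
y <- rpois(80, sir_inc(0.32, 0.10))

# Poisson negative log-likelihood, parameters on the log scale
nll <- function(par) {
  mu <- pmax(sir_inc(exp(par[1]), exp(par[2])), 1e-8)
  -sum(dpois(y, lambda = mu, log = TRUE))
}

fit_h <- optim(log(c(beta = 0.25, gamma = 0.12)), nll,
               method = "Nelder-Mead", hessian = TRUE)

# Wald standard errors on the log scale, then back-transform the interval
se_log <- sqrt(diag(solve(fit_h$hessian)))
ci <- cbind(estimate = exp(fit_h$par),
            lower = exp(fit_h$par - 1.96 * se_log),
            upper = exp(fit_h$par + 1.96 * se_log))
ci
```

These intervals assume the model structure, the initial conditions and the Poisson observation model are all correct, so they are usually too narrow for real data. Profile likelihood or bootstrap intervals are common alternatives.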

15 Stochastic epidemic simulation

A simple stochastic SIR model can be simulated using binomial transitions:

\[ \text{new infections}_t \sim \text{Binomial}\left(S_t, 1 - e^{-\beta I_t/N}\right), \]

\[ \text{new recoveries}_t \sim \text{Binomial}\left(I_t, 1 - e^{-\gamma}\right). \]

simulate_stochastic_sir <- function(N = 1000, I0 = 1, beta = 0.3, gamma = 0.1, days = 160) {
  S <- I <- R <- numeric(days + 1)
  S[1] <- N - I0
  I[1] <- I0
  R[1] <- 0
  
  for (t in 1:days) {
    p_inf <- 1 - exp(-beta * I[t] / N)
    p_rec <- 1 - exp(-gamma)
    new_inf <- rbinom(1, S[t], p_inf)
    new_rec <- rbinom(1, I[t], p_rec)
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  
  data.frame(day = 0:days, S = S, I = I, R = R)
}

stoch1 <- simulate_stochastic_sir(beta = 0.3, gamma = 0.1)
plot(stoch1$day, stoch1$I, type = "l", lwd = 2, xlab = "Day", ylab = "Infectious individuals")

nsim <- 50
mat_I <- matrix(NA_real_, nrow = 161, ncol = nsim)
for (i in 1:nsim) {
  mat_I[, i] <- simulate_stochastic_sir(beta = 0.3, gamma = 0.1)$I
}

plot(0:160, mat_I[, 1], type = "l", ylim = range(mat_I), xlab = "Day", ylab = "Infectious individuals")
for (i in 2:nsim) lines(0:160, mat_I[, i])
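The spread of trajectories above includes runs that die out after a handful of cases. The sketch below estimates how often that happens by recording only the final size of each run. The helper `final_size_stoch` and the cut-off of 50 cases for "early extinction" are assumptions introduced here. The comparison value \(1/R_0\) is the branching-process result for a continuous-time model with one initial case, so it is only a rough guide for this discrete-time chain.

```r
set.seed(2024)

# Total number ever infected in one stochastic SIR run (same binomial chain as above)
final_size_stoch <- function(N = 1000, I0 = 1, beta = 0.3, gamma = 0.1, max_days = 1000) {
  S <- N - I0
  I <- I0
  for (t in 1:max_days) {
    if (I == 0) break                                   # outbreak is over
    new_inf <- rbinom(1, S, 1 - exp(-beta * I / N))
    new_rec <- rbinom(1, I, 1 - exp(-gamma))
    S <- S - new_inf
    I <- I + new_inf - new_rec
  }
  N - S                                                 # includes the initial case
}

nsim <- 500
sizes <- replicate(nsim, final_size_stoch())

# Assumed cut-off: outbreaks infecting fewer than 50 people count as early extinction
p_extinct <- mean(sizes < 50)
c(simulated = p_extinct, branching_approx = 0.1 / 0.3)
```

A histogram of `sizes` is typically bimodal: runs either fade out quickly or grow into a large epidemic. A deterministic model with the same parameters always produces the large epidemic.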

16 Sensitivity analysis

Sensitivity analysis asks: how much do model outputs change when parameters change?

Common outputs include:

peak infection size;
timing of the peak;
final epidemic size;
cumulative incidence;
probability of outbreak extinction;
time when \(R_t < 1\).

16.1 One-way sensitivity analysis

beta_values <- seq(0.15, 0.50, by = 0.025)
sens <- data.frame(beta = beta_values, peak_I = NA_real_, peak_day = NA_real_, final_size = NA_real_)

for (i in seq_along(beta_values)) {
  out <- simulate_sir(N = 1000, I0 = 1, beta = beta_values[i], gamma = 0.1, days = 160)
  sens$peak_I[i] <- max(out$I)
  sens$peak_day[i] <- out$day[which.max(out$I)]
  # Cumulative removed by day 160; slow epidemics (low beta) may not have finished by then
  sens$final_size[i] <- max(out$R)
}

sens

plot(sens$beta, sens$peak_I, type = "b", pch = 16, xlab = "Transmission rate beta", ylab = "Peak infectious population")

16.2 Two-way sensitivity analysis

beta_grid <- seq(0.15, 0.45, by = 0.05)
gamma_grid <- seq(0.05, 0.20, by = 0.025)
res <- expand.grid(beta = beta_grid, gamma = gamma_grid)
res$R0 <- res$beta / res$gamma
res$final_size <- NA_real_

for (i in seq_len(nrow(res))) {
  out <- simulate_sir(N = 1000, I0 = 1, beta = res$beta[i], gamma = res$gamma[i], days = 160)
  # Cumulative removed by day 160, which understates the final size if the epidemic is still running
  res$final_size[i] <- max(out$R)
}

head(res)

plot(res$R0, res$final_size, pch = 16, xlab = "Basic reproduction number", ylab = "Final epidemic size")
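The simulated points can be compared with the classical final-size relation for the deterministic SIR model, \(z = 1 - e^{-R_0 z}\), where \(z\) is the fraction of the population ever infected. The sketch below solves it numerically with `uniroot()`. The relation assumes a very small initial number infected and an epidemic that runs to completion, so it will not match a 160-day Euler simulation exactly.

```r
# Solve z = 1 - exp(-R0 * z) for the attack rate z in (0, 1)
final_size_fraction <- function(R0) {
  if (R0 <= 1) return(0)                       # no large epidemic below the threshold
  f <- function(z) z - 1 + exp(-R0 * z)
  uniroot(f, lower = 1e-8, upper = 1, tol = 1e-10)$root
}

R0_values <- c(1.5, 2.5, 4.0)
z <- sapply(R0_values, final_size_fraction)

data.frame(R0 = R0_values, attack_rate = z, final_size_N1000 = 1000 * z)
```

Points in the scatter plot that fall well below this curve are usually simulations that had not finished by the last simulated day, which tends to happen when \(R_0\) is close to 1.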

17 Intervention modelling

Interventions can be represented by reducing transmission:

\[ \beta_{new} = \beta(1 - c), \]

where \(c\) is intervention effectiveness.

compare_intervention <- function(c_effect) {
  beta0 <- 0.30
  beta_new <- beta0 * (1 - c_effect)
  out <- simulate_sir(N = 1000, I0 = 5, beta = beta_new, gamma = 0.10, days = 160)
  out$intervention <- paste0(round(c_effect * 100), "% reduction")
  out
}

int0 <- compare_intervention(0)
int30 <- compare_intervention(0.30)
int60 <- compare_intervention(0.60)

plot(int0$day, int0$I, type = "l", lwd = 2, ylim = c(0, max(int0$I)), xlab = "Day", ylab = "Infectious individuals")
lines(int30$day, int30$I, lwd = 2, lty = 2)
lines(int60$day, int60$I, lwd = 2, lty = 3)
legend("topright", legend = c("No reduction", "30% reduction", "60% reduction"), lwd = 2, lty = c(1, 2, 3), bty = "n")

18 Vaccination and herd immunity threshold

For a perfect vaccine in a homogeneous population, the critical vaccination threshold is:

\[ p_c = 1 - \frac{1}{R_0}. \]

If vaccine effectiveness is \(VE\), the required coverage is approximately:

\[ p_c = \frac{1 - 1/R_0}{VE}. \]
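Both thresholds come from requiring the effective reproduction number to fall below 1. As a sketch, assume a fraction \(p\) is vaccinated and a fraction \(VE\) of those vaccinated are fully protected, so the susceptible fraction is \(1 - p \, VE\):

\[ R_{\text{eff}} = R_0 (1 - p \, VE) < 1 \iff p > \frac{1 - 1/R_0}{VE}. \]

Setting \(VE = 1\) recovers the perfect-vaccine threshold. If the right-hand side exceeds 1, vaccination alone cannot bring \(R_{\text{eff}}\) below 1 under these assumptions.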

R0_values <- seq(1.1, 5, by = 0.1)
VE <- 0.90
threshold <- (1 - 1 / R0_values) / VE
threshold <- pmin(threshold, 1)

plot(R0_values, threshold, type = "l", lwd = 2, xlab = "R0", ylab = "Required vaccine coverage")

19 Advanced modelling extensions

19.1 Overdispersion and superspreading

Many diseases show overdispersion: a small proportion of cases generate a large proportion of secondary infections. This can be represented by a negative binomial offspring distribution.

If the mean number of secondary infections is \(R\) and the dispersion parameter is \(k\), then smaller \(k\) means stronger heterogeneity.

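R_mean <- 2.5
k_values <- c(0.1, 0.5, 2, 10)
x <- 0:20
offsets <- c(-0.15, -0.05, 0.05, 0.15)  # fixed horizontal offsets so the bars do not overlap
plot(x + offsets[1], dnbinom(x, size = k_values[1], mu = R_mean), type = "h", lwd = 2, col = 1, ylim = c(0, 0.75), xlab = "Secondary cases", ylab = "Probability")
for (j in 2:length(k_values)) {
  lines(x + offsets[j], dnbinom(x, size = k_values[j], mu = R_mean), type = "h", lwd = 2, col = j)
}
# One colour per k so that the legend identifies each distribution
legend("topright", legend = paste("k =", k_values), col = seq_along(k_values), lwd = 2, bty = "n")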
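The effect of \(k\) can also be summarised numerically. The sketch below simulates offspring counts and reports, for each \(k\), the probability of causing no secondary cases and the share of all transmission caused by the most infectious 20% of cases. The helper `share_top20` and the sample size are assumptions introduced here.

```r
set.seed(7)
R_mean <- 2.5
k_values <- c(0.1, 0.5, 2, 10)

# Share of all secondary infections caused by the top 20% of simulated cases
share_top20 <- function(k, n = 100000) {
  offspring <- sort(rnbinom(n, size = k, mu = R_mean), decreasing = TRUE)
  sum(offspring[1:(0.2 * n)]) / sum(offspring)
}

overdisp <- data.frame(
  k = k_values,
  prop_zero = dnbinom(0, size = k_values, mu = R_mean),   # cases with no secondary infections
  share_top20 = sapply(k_values, share_top20)
)
overdisp
```

The mean number of secondary cases is the same in every row. Only the distribution around that mean changes, which is why \(R\) alone does not describe how an outbreak is likely to unfold.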

19.2 Under-reporting

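Reported cases are not always equal to true infections. If \(\rho\) is the reporting probability and \(I_t\) here denotes the true number of new infections on day \(t\) (not the number currently infectious):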

\[ Y_t \sim \text{Binomial}(I_t, \rho). \]

Under-reporting affects estimated incidence, growth rates, severity ratios and reproduction numbers, especially if reporting changes over time.
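The role of changing reporting can be illustrated by simulation. In the sketch below, the true epidemic, the reporting probabilities and the helper `growth` are all hypothetical. It compares the estimated growth rate under constant reporting with the estimate when reporting improves over time.

```r
set.seed(11)
days <- 0:29
true_r <- 0.15
true_inf <- round(200 * exp(true_r * days))               # hypothetical true daily infections

rho_const  <- rep(0.3, length(days))                      # constant reporting probability
rho_rising <- seq(0.1, 0.5, length.out = length(days))    # reporting improves over time

rep_const  <- rbinom(length(days), true_inf, rho_const)
rep_rising <- rbinom(length(days), true_inf, rho_rising)

# Growth rate from a log-linear fit to reported counts
growth <- function(y) unname(coef(lm(log(y) ~ days))[2])

r_const  <- growth(rep_const)
r_rising <- growth(rep_rising)
c(true = true_r, constant_reporting = r_const, rising_reporting = r_rising)
```

Constant under-reporting scales incidence without changing its slope on the log scale, so the growth rate is recovered. Improving reporting adds a spurious upward trend, which inflates the growth rate and any reproduction number derived from it.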

19.3 Reporting delays

Observed reports may lag behind infection or symptom onset:

\[ Y_t = \sum_{d=0}^{D} X_{t-d} q_d, \]

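where \(X_t\) is the number of events (infections or symptom onsets) on day \(t\), \(q_d\) is the probability that an event is reported \(d\) days later, and \(D\) is the maximum delay considered. The equality holds in expectation.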

Delay correction is important for real-time estimation because recent days are usually incomplete.
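The convolution and the resulting incompleteness of recent days can be sketched as follows. The onset curve and the delay distribution are hypothetical values chosen for illustration.

```r
X <- round(10 * exp(0.1 * 0:29))   # hypothetical daily symptom onsets
q <- c(0.1, 0.3, 0.3, 0.2, 0.1)    # assumed delay probabilities for d = 0, 1, ..., 4 days
D <- length(q) - 1

# Expected reports by report day: Y_t = sum over d of X_{t-d} * q_d
Y <- sapply(seq_along(X), function(t) {
  d <- 0:min(D, t - 1)
  sum(X[t - d] * q[d + 1])
})

# Fraction of each day's onsets expected to be reported by the final day
days_elapsed <- length(X) - seq_along(X)
completeness <- sapply(days_elapsed, function(m) sum(q[1:min(m + 1, D + 1)]))

tail(data.frame(onset_day = seq_along(X),
                onsets = X,
                completeness = completeness,
                reported_so_far = X * completeness), 6)
```

The last few onset days appear to show a decline even though the underlying curve is still rising. Dividing observed counts by `completeness` is the simplest form of nowcasting correction, and it relies on the delay distribution being known and stable.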

19.4 Spatial transmission

Spatial models include location-specific risk:

\[ \log(\mu_{it}) = \alpha + u_i + v_i + \gamma_t + \delta_{it} + \log(Pop_i), \]

where:

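\(\mu_{it}\): expected number of cases in area \(i\) at time \(t\);
\(\alpha\): overall intercept;
\(u_i\): spatially structured random effect;
\(v_i\): unstructured heterogeneity;
\(\gamma_t\): temporal trend (not the recovery rate \(\gamma\) used in the compartmental models);
\(\delta_{it}\): space-time interaction;
\(\log(Pop_i)\): population offset, so that the remaining terms describe risk per person.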

These models are useful for dengue, tuberculosis, COVID-19, influenza, Ebola and vector-borne diseases.

20 Model validation and diagnostics

Good modelling practice includes:

checking whether predictions reproduce observed incidence;
comparing alternative model structures;
assessing residuals;
testing sensitivity to priors and assumptions;
reporting uncertainty intervals;
validating against held-out data where possible;
documenting data cleaning and parameter sources;
avoiding overinterpretation.
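Validation against held-out data can be sketched with the Australian series used earlier. The split into 14 training days and 7 test days, and the use of the log-linear growth model as the forecaster, are choices made here for illustration.

```r
cases <- c(3, 4, 8, 12, 9, 13, 12, 15, 20, 16, 27, 32, 46, 37, 52, 57, 77, 111, 144, 147, 224)
t <- seq_along(cases) - 1

train <- t <= 13          # first 14 days used for fitting
holdout_idx <- !train     # last 7 days held out

# Fit the log-linear growth model to the training period only
fit_train <- lm(log(cases) ~ t, subset = train)

# Predict the held-out days with 95% prediction intervals, back-transformed to counts
pred <- exp(predict(fit_train, newdata = data.frame(t = t[holdout_idx]), interval = "prediction"))

holdout <- data.frame(day = t[holdout_idx], observed = cases[holdout_idx], pred)
holdout$covered <- holdout$observed >= holdout$lwr & holdout$observed <= holdout$upr

# Out-of-sample error on the log scale
rmse_log <- sqrt(mean((log(holdout$observed) - log(holdout$fit))^2))

holdout
rmse_log
```

If the held-out observations fall systematically above or below the predictions, the growth rate changed between the two periods, and a single exponential is not an adequate description of the data.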

21 Common mistakes in infectious disease modelling

Treating reported cases as true infections without discussion.
Ignoring changes in testing and surveillance.
Using \(R_0\) when \(R_t\) is more relevant.
Estimating parameters from too few data points.
Ignoring imported cases in early outbreaks.
Assuming homogeneous mixing when contact patterns are highly structured.
Fitting complex models without identifiability checks.
Reporting point estimates without uncertainty.
Confusing incubation period, latent period, generation interval and serial interval.

22 Practical workflow for a modelling project

Define the public health question.
Understand disease biology and transmission pathway.
Identify the target population and timescale.
Select model type: deterministic, stochastic, statistical, agent-based or hybrid.
Assemble data: incidence, population, covariates, interventions, mobility and delays.
Define assumptions clearly.
Estimate parameters with uncertainty.
Validate and compare models.
Run scenario and sensitivity analyses.
Communicate results clearly to technical and non-technical audiences.

23 Practice questions

Explain the difference between incidence and prevalence using tuberculosis and influenza as examples.
Derive \(R_0 = \beta/\gamma\) for the basic SIR model.
Simulate an SIR model with \(R_0 = 1.5\), \(2.5\) and \(4.0\). Compare final epidemic sizes.
Estimate the exponential growth rate from an incidence dataset of your choice.
Fit an SIR model using a Poisson likelihood instead of squared error.
Conduct one-way sensitivity analysis for \(\gamma\) and interpret the result.
Explain why stochastic models are important when case numbers are low.
Describe how under-reporting may bias estimation of \(R_t\).
Explain why the serial interval distribution matters for renewal models.
Propose a model extension for spatial dengue, tuberculosis or Ebola data.

24 References and useful reading

Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
Brauer, F., Castillo-Chavez, C., & Feng, Z. (2019). Mathematical Models in Epidemiology. Springer.
Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
Cori, A., Ferguson, N. M., Fraser, C., & Cauchemez, S. (2013). A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology, 178(9), 1505–1512.
Wallinga, J., & Teunis, P. (2004). Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160(6), 509–516.
Fraser, C. (2007). Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE, 2(8), e758.
Diekmann, O., Heesterbeek, J. A. P., & Roberts, M. G. (2010). The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society Interface, 7(47), 873–885.
Our World in Data COVID-19 dataset and documentation: https://ourworldindata.org/covid-cases
EpiEstim package documentation: https://mrc-ide.github.io/EpiEstim/

25 Session information

sessionInfo()

## R version 4.4.3 (2025-02-28)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.7.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## time zone: Australia/Sydney
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        
##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.4         vctrs_0.6.5       sass_0.4.9       
## [13] jquerylib_0.1.4   compiler_4.4.3    rstudioapi_0.17.1 tools_4.4.3      
## [17] evaluate_1.0.3    bslib_0.9.0       yaml_2.3.10       rlang_1.1.5      
## [21] jsonlite_1.9.1

Introduction to Infectious Disease Modelling

From Epidemiological Concepts to Mathematical and Statistical Models

Dr Hom Nath Dhungana¹

Dr Ariful Islam²

14 June 2026
