Affiliations

¹ Charles Sturt University, Sydney, Australia

² Gulabali Research Centre, Charles Sturt University, Wagga Wagga, Australia

Corresponding Author: Dr Hom Nath Dhungana

Email:

0.1

1 Learning objectives

By the end of these notes, you should be able to:

  1. explain the core epidemiological concepts used in infectious disease modelling;
  2. distinguish deterministic, stochastic, individual-level, population-level, compartmental and renewal models;
  3. formulate and simulate basic SI, SIS, SIR and SEIR models;
  4. estimate simple transmission parameters from incidence data;
  5. compute and interpret the basic reproduction number, effective reproduction number and time-varying reproduction number;
  6. conduct basic sensitivity analysis for epidemiological parameters;
  7. understand how uncertainty, under-reporting, delay and overdispersion affect infectious disease inference.

2 Why infectious disease modelling matters

Infectious disease modelling translates biological, epidemiological and behavioural knowledge into mathematical or statistical structures. These models help researchers and public health decision-makers answer questions such as:

Models are not perfect predictions. They are structured ways of reasoning under uncertainty. Their usefulness depends on the quality of data, assumptions, parameter values, model structure and interpretation.

3 Basic epidemiological concepts

3.1 Host, pathogen and transmission

An infectious disease system usually involves:

  • host: the human, animal or vector that can be infected;
  • pathogen: virus, bacterium, parasite or fungus causing infection;
  • reservoir: environment or population where the pathogen persists;
  • transmission route: direct contact, droplet, airborne, vector-borne, sexual, foodborne, waterborne or environmental;
  • susceptible population: individuals who can acquire infection;
  • infectious population: individuals who can transmit the pathogen.

3.2 Incidence and prevalence

Incidence is the number of new cases occurring during a specified time interval.
Prevalence is the number or proportion of existing cases at a particular point or period.

For modelling acute infections, incidence is often more informative because it describes the timing of new infections. For chronic infections such as tuberculosis or HIV, both incidence and prevalence may be important.

3.3 Force of infection

The force of infection, usually denoted by \(\lambda(t)\), is the instantaneous rate at which susceptible individuals become infected.

In a basic frequency-dependent model:

\[ \lambda(t) = \beta \frac{I(t)}{N}, \]

where:

  • \(\beta\) is the effective transmission rate;
  • \(I(t)\) is the number infectious at time \(t\);
  • \(N\) is the total population size.

3.4 Latent period, incubation period and infectious period

  • Latent period: time from infection to becoming infectious.
  • Incubation period: time from infection to symptom onset.
  • Infectious period: duration during which an infected person can transmit.
  • Generation interval: time between infection of a primary case and infection of a secondary case.
  • Serial interval: time between symptom onset in a primary case and symptom onset in a secondary case.

The generation interval is biologically central but often unobserved. The serial interval is easier to observe and is widely used in reproduction-number estimation.

4 Model classification

4.1 Deterministic models

Deterministic models produce the same output every time for the same initial conditions and parameter values. They are usually written as ordinary differential equations.

Advantages:

  • easy to interpret;
  • computationally efficient;
  • useful for average epidemic dynamics.

Limitations:

  • do not represent random variation directly;
  • less appropriate when case numbers are very small;
  • can hide outbreak extinction or superspreading effects.

4.2 Stochastic models

Stochastic models include random variation. They are useful when transmission is highly variable or when the number of infections is small.

Advantages:

  • represent chance extinction;
  • capture outbreak variability;
  • useful for uncertainty analysis and simulation.

Limitations:

  • computationally heavier;
  • require repeated simulation;
  • harder to interpret than deterministic models.

4.3 Compartmental models

Compartmental models divide the population into epidemiological states, for example:

  • \(S\): susceptible;
  • \(E\): exposed but not yet infectious;
  • \(I\): infectious;
  • \(R\): removed, recovered or immune;
  • \(D\): dead;
  • \(V\): vaccinated.

Common models include SI, SIS, SIR, SEIR and SEIRD.

5 The SI model

The SI model is the simplest infection model. Individuals move from susceptible to infected.

\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \]

\[ \frac{dI}{dt} = \beta \frac{SI}{N}. \]

This model may be useful for early growth or infections without recovery over the period of interest, but it is usually too simple for most real epidemics.

N <- 1000
beta <- 0.35
days <- 100
S <- I <- numeric(days + 1)
S[1] <- 999
I[1] <- 1

dt <- 1
for (t in 1:days) {
  new_inf <- beta * S[t] * I[t] / N * dt
  S[t + 1] <- S[t] - new_inf
  I[t + 1] <- I[t] + new_inf
}

plot(0:days, S, type = "l", lwd = 2, ylim = c(0, N), xlab = "Day", ylab = "Number of people")
lines(0:days, I, lwd = 2, lty = 2)
legend("right", legend = c("Susceptible", "Infected"), lwd = 2, lty = c(1, 2), bty = "n")

6 The SIR model

The SIR model is the classic model for acute immunising infections.

\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \]

\[ \frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I, \]

\[ \frac{dR}{dt} = \gamma I. \]

where:

The basic reproduction number is:

\[ R_0 = \frac{\beta}{\gamma}. \]

If \(R_0 > 1\), the epidemic can grow in a fully susceptible population. If \(R_0 < 1\), sustained spread is unlikely.

simulate_sir <- function(N = 1000, I0 = 1, R0_init = 0, beta = 0.3, gamma = 0.1, days = 160, dt = 1) {
  steps <- days / dt
  S <- I <- R <- numeric(steps + 1)
  S[1] <- N - I0 - R0_init
  I[1] <- I0
  R[1] <- R0_init
  
  for (t in 1:steps) {
    new_inf <- beta * S[t] * I[t] / N * dt
    new_rec <- gamma * I[t] * dt
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  
  data.frame(day = seq(0, days, by = dt), S = S, I = I, R = R)
}

sir_out <- simulate_sir(beta = 0.30, gamma = 0.10)
head(sir_out)
plot(sir_out$day, sir_out$S, type = "l", lwd = 2, ylim = c(0, 1000), xlab = "Day", ylab = "Number of people")
lines(sir_out$day, sir_out$I, lwd = 2, lty = 2)
lines(sir_out$day, sir_out$R, lwd = 2, lty = 3)
legend("right", legend = c("S", "I", "R"), lwd = 2, lty = c(1, 2, 3), bty = "n")

7 The SEIR model

The SEIR model includes an exposed compartment.

\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \]

\[ \frac{dE}{dt} = \beta \frac{SI}{N} - \sigma E, \]

\[ \frac{dI}{dt} = \sigma E - \gamma I, \]

\[ \frac{dR}{dt} = \gamma I. \]

where:

simulate_seir <- function(N = 1000, E0 = 0, I0 = 1, R0_init = 0, beta = 0.4, sigma = 1/4, gamma = 1/6, days = 160, dt = 1) {
  steps <- days / dt
  S <- E <- I <- R <- numeric(steps + 1)
  S[1] <- N - E0 - I0 - R0_init
  E[1] <- E0
  I[1] <- I0
  R[1] <- R0_init
  
  for (t in 1:steps) {
    new_exp <- beta * S[t] * I[t] / N * dt
    new_inf <- sigma * E[t] * dt
    new_rec <- gamma * I[t] * dt
    S[t + 1] <- S[t] - new_exp
    E[t + 1] <- E[t] + new_exp - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  
  data.frame(day = seq(0, days, by = dt), S = S, E = E, I = I, R = R)
}

seir_out <- simulate_seir()

plot(seir_out$day, seir_out$S, type = "l", lwd = 2, ylim = c(0, 1000), xlab = "Day", ylab = "Number of people")
lines(seir_out$day, seir_out$E, lwd = 2, lty = 2)
lines(seir_out$day, seir_out$I, lwd = 2, lty = 3)
lines(seir_out$day, seir_out$R, lwd = 2, lty = 4)
legend("right", legend = c("S", "E", "I", "R"), lwd = 2, lty = c(1, 2, 3, 4), bty = "n")

8 Reproduction numbers

8.1 Basic reproduction number

The basic reproduction number \(R_0\) is the expected number of secondary infections generated by one infectious individual in a fully susceptible population.

For a simple SIR model:

\[ R_0 = \frac{\beta}{\gamma}. \]

8.2 Effective reproduction number

The effective reproduction number \(R_t\) accounts for depletion of susceptible individuals and interventions.

For a homogeneous SIR model:

\[ R_t = R_0 \frac{S(t)}{N}. \]

sir_out$Rt <- (0.30 / 0.10) * sir_out$S / 1000
plot(sir_out$day, sir_out$Rt, type = "l", lwd = 2, xlab = "Day", ylab = "Effective reproduction number")
abline(h = 1, lty = 2)

8.3 Time-varying reproduction number from incidence data

In practice, \(R_t\) can be estimated from incidence data and a serial interval distribution. A common renewal-equation approximation is:

\[ I_t \sim \text{Poisson}(R_t \Lambda_t), \]

where:

\[ \Lambda_t = \sum_{s=1}^{t} I_{t-s} w_s, \]

and \(w_s\) is the serial interval distribution.

The Cori et al. framework estimates the instantaneous reproduction number using incidence and serial interval assumptions. The EpiEstim R package implements this approach, but below we use a simple base-R version for teaching.

9 Real-data example: early COVID-19 incidence in Australia

The following small dataset contains daily confirmed COVID-19 cases for Australia during March 2020. It is embedded directly so that the R Markdown file knits without downloading data. In applied work, always use the most recent official data source, check definitions carefully and document reporting delays.

covid_aus <- data.frame(
  date = as.Date(c(
    "2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05",
    "2020-03-06", "2020-03-07", "2020-03-08", "2020-03-09", "2020-03-10",
    "2020-03-11", "2020-03-12", "2020-03-13", "2020-03-14", "2020-03-15",
    "2020-03-16", "2020-03-17", "2020-03-18", "2020-03-19", "2020-03-20",
    "2020-03-21"
  )),
  cases = c(3, 4, 8, 12, 9, 13, 12, 15, 20, 16, 27, 32, 46, 37, 52, 57, 77, 111, 144, 147, 224)
)

covid_aus$cumulative <- cumsum(covid_aus$cases)
covid_aus
plot(covid_aus$date, covid_aus$cases, type = "h", lwd = 5, xlab = "Date", ylab = "Daily confirmed cases")
points(covid_aus$date, covid_aus$cases, pch = 16)

10 Estimating exponential growth rate

During the early phase of an epidemic, susceptible depletion is small. Incidence may grow approximately exponentially:

\[ I_t = I_0 e^{rt}, \]

where \(r\) is the exponential growth rate.

Taking logs:

\[ \log(I_t) = \log(I_0) + rt. \]

dat <- covid_aus
 dat$t <- seq_len(nrow(dat)) - 1
fit_growth <- lm(log(cases) ~ t, data = dat)
summary(fit_growth)
## 
## Call:
## lm(formula = log(cases) ~ t, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34294 -0.10887 -0.01426  0.14636  0.50288 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.415279   0.090882   15.57 2.84e-12 ***
## t           0.188917   0.007774   24.30 8.99e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2157 on 19 degrees of freedom
## Multiple R-squared:  0.9688, Adjusted R-squared:  0.9672 
## F-statistic: 590.5 on 1 and 19 DF,  p-value: 8.989e-16
r_hat <- coef(fit_growth)["t"]
doubling_time <- log(2) / r_hat
r_hat
##         t 
## 0.1889167
doubling_time
##        t 
## 3.669063
pred_cases <- exp(predict(fit_growth))
plot(dat$date, dat$cases, pch = 16, xlab = "Date", ylab = "Daily cases")
lines(dat$date, pred_cases, lwd = 2)

11 Approximate reproduction number from growth rate

If the infectious period is exponentially distributed with recovery rate \(\gamma\), a simple approximation is:

\[ R_0 \approx 1 + \frac{r}{\gamma}. \]

For a fixed generation interval \(T_g\), another approximation is:

\[ R_0 \approx e^{rT_g}. \]

gamma <- 1/5
Tg <- 5
R0_exp_infectious <- 1 + r_hat / gamma
R0_fixed_generation <- exp(r_hat * Tg)

c(
  growth_rate = r_hat,
  doubling_time_days = doubling_time,
  R0_exponential_infectious_period = R0_exp_infectious,
  R0_fixed_generation_interval = R0_fixed_generation
)
##                      growth_rate.t               doubling_time_days.t 
##                          0.1889167                          3.6690632 
## R0_exponential_infectious_period.t     R0_fixed_generation_interval.t 
##                          1.9445833                          2.5717416

Interpretation: these are rough estimates. They depend strongly on the assumed infectious period or generation interval. Early incidence data are often affected by testing, importation, reporting delays and surveillance changes.

12 Simple renewal-model estimate of time-varying R

We now estimate a simple instantaneous reproduction number using a discretised serial interval distribution.

incidence <- covid_aus$cases
n <- length(incidence)

# Discrete serial interval: mean approximately 5 days, sd approximately 2 days
si_days <- 1:15
si_weights <- dgamma(si_days, shape = (5/2)^2, scale = 2^2/5)
si_weights <- si_weights / sum(si_weights)

infectiousness <- rep(NA_real_, n)
Rt_simple <- rep(NA_real_, n)

for (t in 2:n) {
  max_lag <- min(t - 1, length(si_weights))
  infectiousness[t] <- sum(incidence[(t - max_lag):(t - 1)] * si_weights[1:max_lag])
  if (!is.na(infectiousness[t]) && infectiousness[t] > 0) {
    Rt_simple[t] <- incidence[t] / infectiousness[t]
  }
}

rt_dat <- data.frame(date = covid_aus$date, incidence = incidence, infectiousness = infectiousness, Rt = Rt_simple)
rt_dat
plot(rt_dat$date, rt_dat$Rt, type = "b", pch = 16, xlab = "Date", ylab = "Simple estimated Rt", ylim = c(0, max(rt_dat$Rt, na.rm = TRUE)))
abline(h = 1, lty = 2)

This simple estimator is intentionally transparent. For publication-quality analysis, use a formal framework with uncertainty intervals, reporting-delay adjustment, imported/local case distinction and sensitivity analysis.

13 Parameter estimation for an SIR model

Suppose we observe daily incidence and want to estimate \(\beta\) and \(\gamma\). We can fit a deterministic model by minimising the squared difference between observed and predicted incidence.

The predicted new infections in the SIR model are:

\[ \text{new infections}_t = \beta \frac{S_t I_t}{N}. \]

true_beta <- 0.32
true_gamma <- 0.10
N <- 10000
sir_true <- simulate_sir(N = N, I0 = 10, beta = true_beta, gamma = true_gamma, days = 80)

observed <- data.frame(
  day = sir_true$day[-1],
  cases = pmax(0, round(diff(N - sir_true$S) + rnorm(80, mean = 0, sd = 8)))
)

plot(observed$day, observed$cases, pch = 16, xlab = "Day", ylab = "Observed incidence")

sir_incidence <- function(beta, gamma, N = 10000, I0 = 10, days = 80) {
  out <- simulate_sir(N = N, I0 = I0, beta = beta, gamma = gamma, days = days)
  diff(N - out$S)
}

objective <- function(par) {
  beta <- exp(par[1])
  gamma <- exp(par[2])
  pred <- sir_incidence(beta, gamma, N = N, I0 = 10, days = 80)
  sum((observed$cases - pred)^2)
}

fit <- optim(par = log(c(beta = 0.2, gamma = 0.2)), fn = objective, method = "Nelder-Mead")
est <- exp(fit$par)
est
##       beta      gamma 
## 0.31730649 0.09727082
est["beta"] / est["gamma"]
##     beta 
## 3.262093
fitted_inc <- sir_incidence(est["beta"], est["gamma"], N = N, I0 = 10, days = 80)
plot(observed$day, observed$cases, pch = 16, xlab = "Day", ylab = "Daily cases")
lines(observed$day, fitted_inc, lwd = 2)
legend("topright", legend = c("Observed", "Fitted deterministic SIR"), pch = c(16, NA), lty = c(NA, 1), lwd = c(NA, 2), bty = "n")

14 Likelihood-based estimation

Squared-error fitting is simple but not always statistically appropriate. Infectious disease case counts are non-negative integers, so Poisson or negative binomial likelihoods are common.

For a Poisson observation model:

\[ Y_t \sim \text{Poisson}(\mu_t), \]

where \(\mu_t\) is predicted incidence from the model.

neg_log_lik <- function(par) {
  beta <- exp(par[1])
  gamma <- exp(par[2])
  mu <- sir_incidence(beta, gamma, N = N, I0 = 10, days = 80)
  mu <- pmax(mu, 1e-8)
  -sum(dpois(observed$cases, lambda = mu, log = TRUE))
}

fit_pois <- optim(par = log(c(beta = 0.25, gamma = 0.12)), fn = neg_log_lik, method = "Nelder-Mead")
est_pois <- exp(fit_pois$par)
est_pois
##      beta     gamma 
## 0.3196818 0.1001534
est_pois["beta"] / est_pois["gamma"]
##     beta 
## 3.191923

15 Stochastic epidemic simulation

A simple stochastic SIR model can be simulated using binomial transitions:

\[ \text{new infections}_t \sim \text{Binomial}\left(S_t, 1 - e^{-\beta I_t/N}\right), \]

\[ \text{new recoveries}_t \sim \text{Binomial}\left(I_t, 1 - e^{-\gamma}\right). \]

simulate_stochastic_sir <- function(N = 1000, I0 = 1, beta = 0.3, gamma = 0.1, days = 160) {
  S <- I <- R <- numeric(days + 1)
  S[1] <- N - I0
  I[1] <- I0
  R[1] <- 0
  
  for (t in 1:days) {
    p_inf <- 1 - exp(-beta * I[t] / N)
    p_rec <- 1 - exp(-gamma)
    new_inf <- rbinom(1, S[t], p_inf)
    new_rec <- rbinom(1, I[t], p_rec)
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  
  data.frame(day = 0:days, S = S, I = I, R = R)
}

stoch1 <- simulate_stochastic_sir(beta = 0.3, gamma = 0.1)
plot(stoch1$day, stoch1$I, type = "l", lwd = 2, xlab = "Day", ylab = "Infectious individuals")

nsim <- 50
mat_I <- matrix(NA_real_, nrow = 161, ncol = nsim)
for (i in 1:nsim) {
  mat_I[, i] <- simulate_stochastic_sir(beta = 0.3, gamma = 0.1)$I
}

plot(0:160, mat_I[, 1], type = "l", ylim = range(mat_I), xlab = "Day", ylab = "Infectious individuals")
for (i in 2:nsim) lines(0:160, mat_I[, i])

16 Sensitivity analysis

Sensitivity analysis asks: how much do model outputs change when parameters change?

Common outputs include:

16.1 One-way sensitivity analysis

beta_values <- seq(0.15, 0.50, by = 0.025)
sens <- data.frame(beta = beta_values, peak_I = NA_real_, peak_day = NA_real_, final_size = NA_real_)

for (i in seq_along(beta_values)) {
  out <- simulate_sir(N = 1000, I0 = 1, beta = beta_values[i], gamma = 0.1, days = 160)
  sens$peak_I[i] <- max(out$I)
  sens$peak_day[i] <- out$day[which.max(out$I)]
  sens$final_size[i] <- max(out$R)
}

sens
plot(sens$beta, sens$peak_I, type = "b", pch = 16, xlab = "Transmission rate beta", ylab = "Peak infectious population")

16.2 Two-way sensitivity analysis

beta_grid <- seq(0.15, 0.45, by = 0.05)
gamma_grid <- seq(0.05, 0.20, by = 0.025)
res <- expand.grid(beta = beta_grid, gamma = gamma_grid)
res$R0 <- res$beta / res$gamma
res$final_size <- NA_real_

for (i in seq_len(nrow(res))) {
  out <- simulate_sir(N = 1000, I0 = 1, beta = res$beta[i], gamma = res$gamma[i], days = 160)
  res$final_size[i] <- max(out$R)
}

head(res)
plot(res$R0, res$final_size, pch = 16, xlab = "Basic reproduction number", ylab = "Final epidemic size")

17 Intervention modelling

Interventions can be represented by reducing transmission:

\[ \beta_{new} = \beta(1 - c), \]

where \(c\) is intervention effectiveness.

compare_intervention <- function(c_effect) {
  beta0 <- 0.30
  beta_new <- beta0 * (1 - c_effect)
  out <- simulate_sir(N = 1000, I0 = 5, beta = beta_new, gamma = 0.10, days = 160)
  out$intervention <- paste0(round(c_effect * 100), "% reduction")
  out
}

int0 <- compare_intervention(0)
int30 <- compare_intervention(0.30)
int60 <- compare_intervention(0.60)

plot(int0$day, int0$I, type = "l", lwd = 2, ylim = c(0, max(int0$I)), xlab = "Day", ylab = "Infectious individuals")
lines(int30$day, int30$I, lwd = 2, lty = 2)
lines(int60$day, int60$I, lwd = 2, lty = 3)
legend("topright", legend = c("No reduction", "30% reduction", "60% reduction"), lwd = 2, lty = c(1, 2, 3), bty = "n")

18 Vaccination and herd immunity threshold

For a perfect vaccine in a homogeneous population, the critical vaccination threshold is:

\[ p_c = 1 - \frac{1}{R_0}. \]

If vaccine effectiveness is \(VE\), the required coverage is approximately:

\[ p_c = \frac{1 - 1/R_0}{VE}. \]

R0_values <- seq(1.1, 5, by = 0.1)
VE <- 0.90
threshold <- (1 - 1 / R0_values) / VE
threshold <- pmin(threshold, 1)

plot(R0_values, threshold, type = "l", lwd = 2, xlab = "R0", ylab = "Required vaccine coverage")

19 Advanced modelling extensions

19.1 Overdispersion and superspreading

Many diseases show overdispersion: a small proportion of cases generate a large proportion of secondary infections. This can be represented by a negative binomial offspring distribution.

If the mean number of secondary infections is \(R\) and the dispersion parameter is \(k\), then smaller \(k\) means stronger heterogeneity.

R_mean <- 2.5
k_values <- c(0.1, 0.5, 2, 10)
x <- 0:20
plot(x, dnbinom(x, size = k_values[1], mu = R_mean), type = "h", lwd = 4, ylim = c(0, 0.75), xlab = "Secondary cases", ylab = "Probability")
for (k in k_values[-1]) {
  lines(x + runif(1, -0.05, 0.05), dnbinom(x, size = k, mu = R_mean), type = "h", lwd = 2)
}
legend("topright", legend = paste("k =", k_values), lwd = 2, bty = "n")

19.2 Under-reporting

Reported cases are not always equal to true infections. If \(\rho\) is reporting probability:

\[ Y_t \sim \text{Binomial}(I_t, \rho). \]

Under-reporting affects estimated incidence, growth rates, severity ratios and reproduction numbers, especially if reporting changes over time.

19.3 Reporting delays

Observed reports may lag behind infection or symptom onset:

\[ Y_t = \sum_{d=0}^{D} X_{t-d} q_d, \]

where \(q_d\) is the delay distribution.

Delay correction is important for real-time estimation because recent days are usually incomplete.

19.4 Spatial transmission

Spatial models include location-specific risk:

\[ \log(\mu_{it}) = \alpha + u_i + v_i + \gamma_t + \delta_{it} + \log(Pop_i), \]

where:

  • \(u_i\): spatially structured random effect;
  • \(v_i\): unstructured heterogeneity;
  • \(\gamma_t\): temporal trend;
  • \(\delta_{it}\): space-time interaction.

These models are useful for dengue, tuberculosis, COVID-19, influenza, Ebola and vector-borne diseases.

20 Model validation and diagnostics

Good modelling practice includes:

  1. checking whether predictions reproduce observed incidence;
  2. comparing alternative model structures;
  3. assessing residuals;
  4. testing sensitivity to priors and assumptions;
  5. reporting uncertainty intervals;
  6. validating against held-out data where possible;
  7. documenting data cleaning and parameter sources;
  8. avoiding overinterpretation.

21 Common mistakes in infectious disease modelling

22 Practical workflow for a modelling project

  1. Define the public health question.
  2. Understand disease biology and transmission pathway.
  3. Identify the target population and timescale.
  4. Select model type: deterministic, stochastic, statistical, agent-based or hybrid.
  5. Assemble data: incidence, population, covariates, interventions, mobility and delays.
  6. Define assumptions clearly.
  7. Estimate parameters with uncertainty.
  8. Validate and compare models.
  9. Run scenario and sensitivity analyses.
  10. Communicate results clearly to technical and non-technical audiences.

23 Practice questions

  1. Explain the difference between incidence and prevalence using tuberculosis and influenza as examples.
  2. Derive \(R_0 = \beta/\gamma\) for the basic SIR model.
  3. Simulate an SIR model with \(R_0 = 1.5\), \(2.5\) and \(4.0\). Compare final epidemic sizes.
  4. Estimate the exponential growth rate from an incidence dataset of your choice.
  5. Fit an SIR model using a Poisson likelihood instead of squared error.
  6. Conduct one-way sensitivity analysis for \(\gamma\) and interpret the result.
  7. Explain why stochastic models are important when case numbers are low.
  8. Describe how under-reporting may bias estimation of \(R_t\).
  9. Explain why the serial interval distribution matters for renewal models.
  10. Propose a model extension for spatial dengue, tuberculosis or Ebola data.

24 References and useful reading

25 Session information

sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.7.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## time zone: Australia/Sydney
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        
##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.4         vctrs_0.6.5       sass_0.4.9       
## [13] jquerylib_0.1.4   compiler_4.4.3    rstudioapi_0.17.1 tools_4.4.3      
## [17] evaluate_1.0.3    bslib_0.9.0       yaml_2.3.10       rlang_1.1.5      
## [21] jsonlite_1.9.1