HFRS Incubation Period Estimation — v4 (Generalized IC)

0. Version Notes

Version history

Version	File	Change	Motivation
v2	`hfrs_v2_stan.Rmd`	Original Stan model	Baseline HMC/NUTS
v3a	`hfrs_v3a_reparam.Rmd`	Reparameterize to `(mean_ip, sd_ip)`	Feedback 2: `exp(mu)` = median ≠ mean
v3b	`hfrs_v3b_full.Rmd`	Add fixed ±1 day IC likelihood	Feedback 1: calendar-day recording uncertainty
v3c	`hfrs_v3c_sensitivity.Rmd`	δ sensitivity (0.5 / 1.0 / 2.0)	Assess robustness to window width
v4	`hfrs_v4_general_ic.Rmd`	Generalized per-case `[T_lower, T_upper]`	Correct doubly IC framework; extensible to wider exposure windows

v4 change

This document extends v3b (hfrs_v3b_full.Rmd) by replacing the fixed-window interval-censored likelihood with a generalized formulation that accepts individual per-case bounds [T_lower, T_upper].

Aspect	v2	v3a	v3b	v4 (this file)
Parameters	mu, sigma	mean_ip, sd_ip	mean_ip, sd_ip	mean_ip, sd_ip
Likelihood	point	point	IC fixed δ = 1	IC individual bounds
Stan data	`vector y`	`vector y`	`array int inc`	`vector T_lower, T_upper`
Window	—	—	±1 day (global)	[T_lo, T_hi] per case
Priors	log-scale	day-scale	day-scale	day-scale

Change — Generalized doubly interval-censored likelihood: Both exposure time E and onset time O are unknown within their recorded calendar day. For an individual with exposure window [E_L, E_R] and onset window [O_L, O_R] (all recorded as calendar days), the true continuous incubation T = O − E is bounded by:

\[T \in \bigl(O_{L} - E_{R} - 1,\;\; O_{R} - E_{L} + 1\bigr)\]

The likelihood contribution is:

\[P\!\left(T \in (T^{\text{lo}},\, T^{\text{hi}}) \mid \mu, \sigma\right) = F_{\text{LN}}\!\left(T^{\text{hi}} \mid \mu, \sigma\right) - F_{\text{LN}}\!\left(T^{\text{lo}} \mid \mu, \sigma\right)\]

For Group A (single-day exposure and onset, E_L = E_R, O_L = O_R): \(T^{\text{lo}} = k - 1\), \(T^{\text{hi}} = k + 1\), recovering v3b exactly. The generalisation adds no overhead for this group but allows future extension to cases with wider exposure windows without model changes.

Feedback 2 — Reparameterization: carried over from v3a/v3b.

1. Dataset

Incubation periods were recorded from 18 laboratory workers (Group A: single-visit external visitors who attended the department on 24 Oct 1961 only) who visited the infected department exactly once, providing a single point-identified exposure date.

Source: Kulagin, Fedorova & Ketiladze (1962), Journal of Microbiology, Epidemiology and Immunology, 33(10), 121–126. https://www.cabidigitallibrary.org/doi/full/10.5555/19632701583

inc <- c(rep(17, 3), 19, rep(21, 2), rep(22, 8), 23, rep(25, 2), 27)
cat("n =", length(inc), "\n")

## n = 18

cat("Range:", min(inc), "-", max(inc), "days\n")

## Range: 17 - 27 days

cat("Mean:", round(mean(inc), 1), "  Median:", median(inc), "\n")

## Mean: 21.6   Median: 22

ggplot(data.frame(x = inc), aes(x = "", y = x)) +
  geom_boxplot(
    fill = "#009E73", alpha = 0.4, width = 0.4, outlier.shape = NA
  ) +
  geom_jitter(width = 0.08, alpha = 0.6, color = "#009E73", size = 2) +
  labs(
    title    = "Observed Incubation Periods (n = 18, Group A)",
    subtitle = paste0(
      "Single-visit visitors on Oct 24; ",
      "integer-day difference between exposure and onset"
    ),
    x = NULL, y = "days"
  ) +
  theme_minimal()

Each recorded integer k represents a true continuous incubation T ∈ (k − 1, k + 1), since both exposure and onset fall within a calendar day whose exact continuous time is unknown.

2. Maximum Likelihood Estimation (No change)

2.1 Fitting

Three parametric families are fitted via MLE with L-BFGS-B optimization. These use point observations for comparison; the interval-censored treatment is applied in Section 3.

ll_lnorm <- function(params, data) {
  mu    <- params[1]
  sigma <- params[2]
  if (sigma <= 0) return(Inf)
  -sum(dlnorm(data, meanlog = mu, sdlog = sigma, log = TRUE))
}
fit_lnorm <- optim(
  c(mean(log(inc)), sd(log(inc))), ll_lnorm, data = inc,
  method = "L-BFGS-B", lower = c(-Inf, 0.001), hessian = TRUE
)
se_lnorm <- sqrt(diag(solve(fit_lnorm$hessian)))
ci_lnorm <- cbind(
  fit_lnorm$par - 1.96 * se_lnorm,
  fit_lnorm$par + 1.96 * se_lnorm
)
aic_lnorm <- 2 * 2 + 2 * fit_lnorm$value

ll_weib <- function(params, data) {
  shape <- params[1]
  scale <- params[2]
  if (shape <= 0 || scale <= 0) return(Inf)
  -sum(dweibull(data, shape = shape, scale = scale, log = TRUE))
}
fit_weib <- optim(
  c(1, mean(inc)), ll_weib, data = inc,
  method = "L-BFGS-B", lower = c(0.001, 0.001), hessian = TRUE
)
se_weib <- sqrt(diag(solve(fit_weib$hessian)))
ci_weib <- cbind(
  fit_weib$par - 1.96 * se_weib,
  fit_weib$par + 1.96 * se_weib
)
aic_weib <- 2 * 2 + 2 * fit_weib$value

ll_gamma <- function(params, data) {
  shape <- params[1]
  rate  <- params[2]
  if (shape <= 0 || rate <= 0) return(Inf)
  -sum(dgamma(data, shape = shape, rate = rate, log = TRUE))
}
fit_gamma <- optim(
  c(1, 1 / mean(inc)), ll_gamma, data = inc,
  method = "L-BFGS-B", lower = c(0.001, 0.001), hessian = TRUE
)
se_gamma <- sqrt(diag(solve(fit_gamma$hessian)))
ci_gamma <- cbind(
  fit_gamma$par - 1.96 * se_gamma,
  fit_gamma$par + 1.96 * se_gamma
)
aic_gamma <- 2 * 2 + 2 * fit_gamma$value

2.2 Parameter Estimates

ci1_lo <- round(
  c(ci_lnorm[1, 1], ci_weib[1, 1], ci_gamma[1, 1]), 3
)
ci1_hi <- round(
  c(ci_lnorm[1, 2], ci_weib[1, 2], ci_gamma[1, 2]), 3
)
ci2_lo <- round(
  c(ci_lnorm[2, 1], ci_weib[2, 1], ci_gamma[2, 1]), 3
)
ci2_hi <- round(
  c(ci_lnorm[2, 2], ci_weib[2, 2], ci_gamma[2, 2]), 3
)
result <- data.frame(
  Distribution = c("Lognormal", "Weibull", "Gamma"),
  Parameters   = c("meanlog, sdlog", "shape, scale", "shape, rate"),
  Param1 = round(
    c(fit_lnorm$par[1], fit_weib$par[1], fit_gamma$par[1]), 4
  ),
  CI1    = paste0("[", ci1_lo, ", ", ci1_hi, "]"),
  Param2 = round(
    c(fit_lnorm$par[2], fit_weib$par[2], fit_gamma$par[2]), 4
  ),
  CI2    = paste0("[", ci2_lo, ", ", ci2_hi, "]"),
  AIC    = round(c(aic_lnorm, aic_weib, aic_gamma), 2)
)
kable(result, align = "llrrrrc",
      col.names = c(
        "Distribution", "Parameters",
        "Param 1", "95% CI", "Param 2", "95% CI", "AIC"
      ))

Distribution	Parameters	Param 1	95% CI	Param 2	95% CI	AIC
Lognormal	meanlog, sdlog	3.0628	[3.004, 3.121]	0.1264	[0.085, 0.168]	90.88
Weibull	shape, scale	8.9965	[5.89, 12.103]	22.7147	[21.482, 23.948]	90.67
Gamma	shape, rate	64.0606	[22.316, 105.805]	2.9719	[1.028, 4.916]	90.56

Interpretation: AICs fall within ΔAIC < 2 across all three distributions. Lognormal is retained as the primary model. MLE uses point observations; the Bayesian model below uses the interval-censored likelihood.

2.3 Distribution Curves (MLE)

freq <- seq(0, 50, length.out = 1000)

df_lnorm <- data.frame(
  x        = freq,
  density  = dlnorm(freq, fit_lnorm$par[1], fit_lnorm$par[2]),
  ci_lower = dlnorm(freq, ci_lnorm[1, 1],   ci_lnorm[2, 1]),
  ci_upper = dlnorm(freq, ci_lnorm[1, 2],   ci_lnorm[2, 2])
)
plnorm <- ggplot(df_lnorm, aes(x = x)) +
  geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper),
    fill = "#009E73", alpha = 0.2
  ) +
  geom_line(aes(y = density), color = "#009E73", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  labs(
    title = paste0("Lognormal  (AIC = ", round(aic_lnorm, 1), ")"),
    x = "days", y = "density"
  ) +
  theme_minimal()

df_weib <- data.frame(
  x        = freq,
  density  = dweibull(freq, fit_weib$par[1], fit_weib$par[2]),
  ci_lower = dweibull(freq, ci_weib[1, 1],   ci_weib[2, 1]),
  ci_upper = dweibull(freq, ci_weib[1, 2],   ci_weib[2, 2])
)
pweib <- ggplot(df_weib, aes(x = x)) +
  geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper),
    fill = "#56B4E9", alpha = 0.2
  ) +
  geom_line(aes(y = density), color = "#56B4E9", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  labs(
    title = paste0("Weibull  (AIC = ", round(aic_weib, 1), ")"),
    x = "days", y = "density"
  ) +
  theme_minimal()

df_gamma <- data.frame(
  x        = freq,
  density  = dgamma(freq, fit_gamma$par[1], fit_gamma$par[2]),
  ci_lower = dgamma(freq, ci_gamma[1, 1],   ci_gamma[2, 1]),
  ci_upper = dgamma(freq, ci_gamma[1, 2],   ci_gamma[2, 2])
)
pgamma_p <- ggplot(df_gamma, aes(x = x)) +
  geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper),
    fill = "#E69F00", alpha = 0.2
  ) +
  geom_line(aes(y = density), color = "#E69F00", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  labs(
    title = paste0("Gamma  (AIC = ", round(aic_gamma, 1), ")"),
    x = "days", y = "density"
  ) +
  theme_minimal()

df_combined <- data.frame(
  x = rep(freq, 3),
  density = c(
    dlnorm(freq,   fit_lnorm$par[1], fit_lnorm$par[2]),
    dweibull(freq, fit_weib$par[1],  fit_weib$par[2]),
    dgamma(freq,   fit_gamma$par[1], fit_gamma$par[2])
  ),
  Distribution = rep(
    c("Lognormal", "Weibull", "Gamma"), each = length(freq)
  )
)
pcom <- ggplot(
  df_combined, aes(x = x, y = density, color = Distribution)
) +
  geom_line(linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  scale_color_manual(values = c("#009E73", "#E69F00", "#56B4E9")) +
  labs(title = "Combined", x = "days", y = "density") +
  theme_minimal()

grid.arrange(plnorm, pweib, pgamma_p, pcom, ncol = 2)

2.4 Q-Q Plot (Lognormal MLE)

n     <- length(inc)
probs <- (seq_len(n) - 0.5) / n
qq_df <- data.frame(
  theoretical = qlnorm(probs, fit_lnorm$par[1], fit_lnorm$par[2]),
  observed    = sort(inc)
)
ggplot(qq_df, aes(x = theoretical, y = observed)) +
  geom_point(color = "#009E73", alpha = 0.7, size = 2) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(
    title = "Q-Q Plot — Lognormal (MLE)",
    x = "Theoretical Quantiles", y = "Observed Quantiles"
  ) +
  theme_minimal()

3. Bayesian Estimation via Stan (HMC/NUTS)

3.1 Model Specification

Distribution Assumption

\[T_{i} \sim \text{LogNormal}(\mu,\, \sigma), \quad i = 1, \ldots, N\]

where \(T_{i}\) is the true continuous incubation period of individual \(i\) (days).

Why exp(μ) ≠ Mean

The lognormal distribution has two distinct location summaries:

\[\text{Median}(T) = \exp(\mu), \qquad \mathbb{E}[T] = \exp\!\left(\mu + \frac{\sigma^{2}}{2}\right)\]

Reporting \(\mu\) directly implies the median, not the mean. When \(\sigma > 0\), the two quantities diverge by a factor of \(\exp(\sigma^{2}/2)\).

Reparameterization

Let \(m = \mathbb{E}[T]\) (mean_ip) and \(s = \text{SD}(T)\) (sd_ip). The internal lognormal parameters are recovered by:

\[\mu = \log(m) - \frac{1}{2}\,\log\!\left(1 + \frac{s^{2}}{m^{2}}\right)\]

\[\sigma = \sqrt{\,\log\!\left(1 + \frac{s^{2}}{m^{2}}\right)}\]

Both \(m\) and \(s\) are in days, allowing priors and credible intervals to be stated directly in the clinically interpretable scale.

Interval-Censored Likelihood (Generalized)

For individual \(i\) with exposure window \([E_{L,i},\, E_{R,i}]\) and onset window \([O_{L,i},\, O_{R,i}]\) (all in calendar days), the continuous exposure time \(E \in (E_{L,i},\; E_{R,i} + 1)\) and onset time \(O \in (O_{L,i},\; O_{R,i} + 1)\), giving per-observation bounds:

\[T^{\text{lo}}_{i} = O_{L,i} - E_{R,i} - 1, \qquad T^{\text{hi}}_{i} = O_{R,i} - E_{L,i} + 1\]

The log-likelihood contribution is:

\[\ell(\theta \mid i) = \log\!\Bigl[ F_{\text{LN}}\!\left(T^{\text{hi}}_{i} \mid \mu, \sigma\right) - F_{\text{LN}}\!\left(T^{\text{lo}}_{i} \mid \mu, \sigma\right) \Bigr]\]

Summing over all \(N\) observations:

\[\ell(\theta \mid \mathbf{T}) = \sum_{i=1}^{N} \log\!\Bigl[ F_{\text{LN}}\!\left(T^{\text{hi}}_{i} \mid \mu, \sigma\right) - F_{\text{LN}}\!\left(T^{\text{lo}}_{i} \mid \mu, \sigma\right) \Bigr]\]

Group A special case (\(E_{L,i} = E_{R,i}\), \(O_{L,i} = O_{R,i}\)): \(T^{\text{lo}}_{i} = k_{i} - 1\), \(T^{\text{hi}}_{i} = k_{i} + 1\), identical to v3b with \(\delta = 1\).

stan_code <- "
data {
  int<lower=0> N;
  // Per-case incubation bounds (days, real-valued).
  // For Group A: T_lower = k - 1, T_upper = k + 1
  // (k = integer onset - exposure day).
  // For wider exposure windows: T_lower = O_L - E_R - 1,
  //                             T_upper = O_R - E_L + 1.
  vector<lower=0>[N] T_lower;
  vector<lower=0>[N] T_upper;
}
parameters {
  real<lower=0> mean_ip;
  real<lower=0> sd_ip;
}
transformed parameters {
  real mu    = log(mean_ip)
               - 0.5 * log(1.0 + square(sd_ip / mean_ip));
  real sigma = sqrt(log(1.0 + square(sd_ip / mean_ip)));
}
model {
  // Weakly informative priors -- to be updated after literature review.
  // mean_ip ~ Normal(21, 7): centred near expected HFRS value.
  // sd_ip   ~ HalfNormal(5): <lower=0> enforces positivity.
  mean_ip ~ normal(21, 7);
  sd_ip   ~ normal(0, 5);
  // Generalized interval-censored likelihood.
  for (i in 1:N) {
    target += log_diff_exp(
      lognormal_lcdf(T_upper[i] | mu, sigma),
      lognormal_lcdf(T_lower[i] | mu, sigma)
    );
  }
}
"

Priors: unchanged — mean_ip ~ Normal(21, 7), sd_ip ~ HalfNormal(5).

Likelihood: generalized doubly interval-censored; each observation contributes P(T ∈ (T_lower[i], T_upper[i])). For Group A this equals v3b: T_lower = k − 1, T_upper = k + 1. Cases with wider exposure windows slot in by supplying different bounds — no model changes required.

Extensibility: to include a case from Group B-1 (exposure Oct 7–25, onset day O), set T_lower = O − 25 − 1, T_upper = O − 7 + 1.

3.2 Sampling

# Derive per-observation bounds from integer incubation periods.
# Group A: single calendar-day exposure and onset -> bounds = k +/- 1.
T_lower <- as.numeric(inc) - 1.0
T_upper <- as.numeric(inc) + 1.0

stan_data <- list(
  N       = length(inc),
  T_lower = T_lower,
  T_upper = T_upper
)

fit_stan <- stan(
  model_code = stan_code,
  data       = stan_data,
  chains     = 4,
  iter       = 10000,
  warmup     = 5000,
  seed       = 42,
  refresh    = 0
)

4 chains × 5,000 post-warmup iterations = 20,000 posterior samples.

3.3 HMC Diagnostics

check_hmc_diagnostics(fit_stan)

## 
## Divergences:

## 0 of 20000 iterations ended with a divergence.

## 
## Tree depth:

## 0 of 20000 iterations saturated the maximum tree depth of 10.

## 
## Energy:

## E-BFMI indicated no pathological behavior.

stan_sum <- summary(
  fit_stan, pars = c("mean_ip", "sd_ip", "mu", "sigma")
)$summary
diag_df <- data.frame(
  Parameter = c(
    "mean_ip (days)", "sd_ip (days)",
    "mu (meanlog)", "sigma (sdlog)"
  ),
  Mean  = round(stan_sum[, "mean"], 4),
  R_hat = round(stan_sum[, "Rhat"], 4),
  ESS   = round(stan_sum[, "n_eff"])
)
kable(
  diag_df,
  col.names = c("Parameter", "Posterior Mean", "R-hat", "ESS"),
  row.names = FALSE
)

Parameter	Posterior Mean	R-hat	ESS
mean_ip (days)	21.6414	1.0003	9144
sd_ip (days)	2.9455	1.0004	8878
mu (meanlog)	3.0646	1.0003	10116
sigma (sdlog)	0.1353	1.0004	10319

Interpretation:

R-hat < 1.01 for all parameters confirms convergence across chains.
ESS — effective sample size; target > 400.
Divergent transitions — zero is expected; any divergence requires investigation.
E-BFMI — values near 1 indicate efficient posterior exploration.

3.4 Trace Plots

chains_arr <- extract(
  fit_stan, pars = c("mean_ip", "sd_ip"),
  permuted = FALSE, inc_warmup = FALSE
)

df_trace <- do.call(rbind, lapply(seq_len(dim(chains_arr)[2]), function(k) {
  data.frame(
    iter    = seq_len(dim(chains_arr)[1]),
    mean_ip = chains_arr[, k, "mean_ip"],
    sd_ip   = chains_arr[, k, "sd_ip"],
    chain   = factor(k)
  )
}))

p_tr_mean <- ggplot(
  df_trace, aes(x = iter, y = mean_ip, color = chain)
) +
  geom_line(alpha = 0.6, linewidth = 0.3) +
  labs(
    title = "Trace — mean_ip (days) [post-warmup]",
    x = "post-warmup iteration", y = "mean_ip"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

p_tr_sd <- ggplot(
  df_trace, aes(x = iter, y = sd_ip, color = chain)
) +
  geom_line(alpha = 0.6, linewidth = 0.3) +
  labs(
    title = "Trace — sd_ip (days) [post-warmup]",
    x = "post-warmup iteration", y = "sd_ip"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

grid.arrange(p_tr_mean, p_tr_sd, ncol = 1)

3.5 Posterior Distributions

post_stan <- as.data.frame(
  extract(fit_stan, pars = c("mean_ip", "sd_ip", "mu", "sigma"))
)

mle_mean <- exp(fit_lnorm$par[1] + fit_lnorm$par[2]^2 / 2)
mle_sd <- sqrt(
  (exp(fit_lnorm$par[2]^2) - 1) *
    exp(2 * fit_lnorm$par[1] + fit_lnorm$par[2]^2)
)

p_post_mean <- ggplot(post_stan, aes(x = mean_ip)) +
  geom_density(fill = "#009E73", alpha = 0.5) +
  geom_vline(
    xintercept = mle_mean, linetype = "dashed", color = "gray30"
  ) +
  labs(
    title    = "Posterior — mean_ip (days)",
    subtitle = "Dashed: MLE-equivalent mean (point likelihood)",
    x = "mean incubation period (days)", y = "density"
  ) +
  theme_minimal()

p_post_sd <- ggplot(post_stan, aes(x = sd_ip)) +
  geom_density(fill = "#009E73", alpha = 0.5) +
  geom_vline(
    xintercept = mle_sd, linetype = "dashed", color = "gray30"
  ) +
  labs(
    title    = "Posterior — sd_ip (days)",
    subtitle = "Dashed: MLE-equivalent SD (point likelihood)",
    x = "SD incubation period (days)", y = "density"
  ) +
  theme_minimal()

grid.arrange(p_post_mean, p_post_sd, ncol = 2)

Interpretation:

mean_ip — posterior concentrated ~20–24 days; MLE reference line confirms prior-likelihood agreement. Numerically identical to v3b because Group A has single-day bounds in both versions.
sd_ip — right-skewed; mode ~3 days. The generalized formulation produces the same posteriors as v3b for Group A, confirming backward compatibility.

3.6 Posterior Predictive Check

set.seed(1)
n_samp   <- 500
samp_idx <- sample(nrow(post_stan), n_samp)
pp_mat <- sapply(samp_idx, function(i) {
  dlnorm(freq, post_stan$mu[i], post_stan$sigma[i])
})

df_pp <- data.frame(
  x     = freq,
  mean  = rowMeans(pp_mat),
  lower = apply(pp_mat, 1, quantile, probs = 0.025),
  upper = apply(pp_mat, 1, quantile, probs = 0.975)
)

ggplot(df_pp, aes(x = x)) +
  geom_ribbon(
    aes(ymin = lower, ymax = upper), fill = "#009E73", alpha = 0.25
  ) +
  geom_line(aes(y = mean), color = "#009E73", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.8
  ) +
  labs(
    title    = "Posterior Predictive — Lognormal (Stan/HMC, v4)",
    subtitle = paste0(
      "Mean density +/- 95% interval from ", n_samp,
      " posterior samples"
    ),
    x = "days", y = "density"
  ) +
  theme_minimal()

4. Final Estimates

est_df <- data.frame(
  Quantity = c(
    "Mean incubation period (days)",
    "SD incubation period (days)",
    "Median incubation period (days)",
    "mu (meanlog)",
    "sigma (sdlog)"
  ),
  Posterior_Median = round(c(
    median(post_stan$mean_ip),
    median(post_stan$sd_ip),
    median(exp(post_stan$mu)),
    median(post_stan$mu),
    median(post_stan$sigma)
  ), 2),
  CI_Lower = round(c(
    quantile(post_stan$mean_ip, 0.025),
    quantile(post_stan$sd_ip,   0.025),
    quantile(exp(post_stan$mu), 0.025),
    quantile(post_stan$mu,      0.025),
    quantile(post_stan$sigma,   0.025)
  ), 2),
  CI_Upper = round(c(
    quantile(post_stan$mean_ip, 0.975),
    quantile(post_stan$sd_ip,   0.975),
    quantile(exp(post_stan$mu), 0.975),
    quantile(post_stan$mu,      0.975),
    quantile(post_stan$sigma,   0.975)
  ), 2)
)
kable(
  est_df,
  col.names = c(
    "Quantity", "Posterior Median", "95% CrI Lower", "95% CrI Upper"
  ),
  row.names = FALSE
)

Quantity	Posterior Median	95% CrI Lower	95% CrI Upper
Mean incubation period (days)	21.61	20.30	23.14
SD incubation period (days)	2.85	2.04	4.39
Median incubation period (days)	21.42	20.07	22.87
mu (meanlog)	3.06	3.00	3.13
sigma (sdlog)	0.13	0.09	0.20

Interpretation:

Results are numerically identical to v3b for Group A, confirming that the generalized formulation is backward-compatible. The model is now ready to absorb additional cases with wider exposure windows (Groups B, C, D, E) if individual-level [E_L, E_R] and [O_L, O_R] data become available — the Stan code requires no further changes.

5. Credits

Data & Literature

Kulagin, S.V., Fedorova, N.I., & Ketiladze, E.S. (1962). Laboratory outbreak of hemorrhagic fever with renal syndrome. Journal of Microbiology, Epidemiology and Immunology, 33(10), 121–126.
Sartwell, P.E. (1950). The distribution of incubation periods of infectious disease. American Journal of Hygiene, 51(3), 310–318.
Reich, N.G., et al. (2009). Estimating incubation period distributions with coarse data. Statistics in Medicine, 28(22), 2769–2784. (Foundational reference for the doubly interval-censored framework.)

Software

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Stan Development Team. RStan: the R interface to Stan.
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

AI Assistance

This document was developed with the assistance of Claude (Anthropic). AI was used to support code writing, interpretation drafting, and statistical explanation. All analytical decisions, prior specifications, and interpretations were reviewed and approved by the author.

HFRS Incubation Period Estimation — v4 (Generalized IC)

Kulagin et al. (1962) — Laboratory Outbreak, n = 18

Andy Chu

2026-05-28

0. Version Notes

Version history

v4 change

1. Dataset

2. Maximum Likelihood Estimation (No change)

2.1 Fitting

2.2 Parameter Estimates

2.3 Distribution Curves (MLE)

2.4 Q-Q Plot (Lognormal MLE)

3. Bayesian Estimation via Stan (HMC/NUTS)

3.1 Model Specification

Distribution Assumption

Why exp(μ) ≠ Mean

Reparameterization

Interval-Censored Likelihood (Generalized)

3.2 Sampling

3.3 HMC Diagnostics

3.4 Trace Plots

3.5 Posterior Distributions

3.6 Posterior Predictive Check

4. Final Estimates

5. Credits

Data & Literature

Software

AI Assistance