0. Version Notes

Version history

Version File Change Motivation
v2 hfrs_v2_stan.Rmd Original Stan model Baseline HMC/NUTS
v3a hfrs_v3a_reparam.Rmd Reparameterize to (mean_ip, sd_ip) Feedback 2: exp(mu) = median ≠ mean
v3b hfrs_v3b_full.Rmd Add fixed ±1 day IC likelihood Feedback 1: calendar-day recording uncertainty
v3c hfrs_v3c_sensitivity.Rmd δ sensitivity (0.5 / 1.0 / 2.0) Assess robustness to window width
v4 hfrs_v4_general_ic.Rmd Generalized per-case [T_lower, T_upper] Correct doubly IC framework; extensible to wider exposure windows

v4 change

This document extends v3b (hfrs_v3b_full.Rmd) by replacing the fixed-window interval-censored likelihood with a generalized formulation that accepts individual per-case bounds [T_lower, T_upper].

Aspect v2 v3a v3b v4 (this file)
Parameters mu, sigma mean_ip, sd_ip mean_ip, sd_ip mean_ip, sd_ip
Likelihood point point IC fixed δ = 1 IC individual bounds
Stan data vector y vector y array int inc vector T_lower, T_upper
Window ±1 day (global) [T_lo, T_hi] per case
Priors log-scale day-scale day-scale day-scale

Change — Generalized doubly interval-censored likelihood: Both exposure time E and onset time O are unknown within their recorded calendar day. For an individual with exposure window [E_L, E_R] and onset window [O_L, O_R] (all recorded as calendar days), the true continuous incubation T = O − E is bounded by:

\[T \in \bigl(O_{L} - E_{R} - 1,\;\; O_{R} - E_{L} + 1\bigr)\]

The likelihood contribution is:

\[P\!\left(T \in (T^{\text{lo}},\, T^{\text{hi}}) \mid \mu, \sigma\right) = F_{\text{LN}}\!\left(T^{\text{hi}} \mid \mu, \sigma\right) - F_{\text{LN}}\!\left(T^{\text{lo}} \mid \mu, \sigma\right)\]

For Group A (single-day exposure and onset, E_L = E_R, O_L = O_R): \(T^{\text{lo}} = k - 1\), \(T^{\text{hi}} = k + 1\), recovering v3b exactly. The generalisation adds no overhead for this group but allows future extension to cases with wider exposure windows without model changes.

Feedback 2 — Reparameterization: carried over from v3a/v3b.


1. Dataset

Incubation periods were recorded from 18 laboratory workers (Group A: single-visit external visitors who attended the department on 24 Oct 1961 only) who visited the infected department exactly once, providing a single point-identified exposure date.

Source: Kulagin, Fedorova & Ketiladze (1962), Journal of Microbiology, Epidemiology and Immunology, 33(10), 121–126. https://www.cabidigitallibrary.org/doi/full/10.5555/19632701583

inc <- c(rep(17, 3), 19, rep(21, 2), rep(22, 8), 23, rep(25, 2), 27)
cat("n =", length(inc), "\n")
## n = 18
cat("Range:", min(inc), "-", max(inc), "days\n")
## Range: 17 - 27 days
cat("Mean:", round(mean(inc), 1), "  Median:", median(inc), "\n")
## Mean: 21.6   Median: 22
ggplot(data.frame(x = inc), aes(x = "", y = x)) +
  geom_boxplot(
    fill = "#009E73", alpha = 0.4, width = 0.4, outlier.shape = NA
  ) +
  geom_jitter(width = 0.08, alpha = 0.6, color = "#009E73", size = 2) +
  labs(
    title    = "Observed Incubation Periods (n = 18, Group A)",
    subtitle = paste0(
      "Single-visit visitors on Oct 24; ",
      "integer-day difference between exposure and onset"
    ),
    x = NULL, y = "days"
  ) +
  theme_minimal()

Each recorded integer k represents a true continuous incubation T ∈ (k − 1, k + 1), since both exposure and onset fall within a calendar day whose exact continuous time is unknown.


2. Maximum Likelihood Estimation (No change)

2.1 Fitting

Three parametric families are fitted via MLE with L-BFGS-B optimization. These use point observations for comparison; the interval-censored treatment is applied in Section 3.

ll_lnorm <- function(params, data) {
  mu    <- params[1]
  sigma <- params[2]
  if (sigma <= 0) return(Inf)
  -sum(dlnorm(data, meanlog = mu, sdlog = sigma, log = TRUE))
}
fit_lnorm <- optim(
  c(mean(log(inc)), sd(log(inc))), ll_lnorm, data = inc,
  method = "L-BFGS-B", lower = c(-Inf, 0.001), hessian = TRUE
)
se_lnorm <- sqrt(diag(solve(fit_lnorm$hessian)))
ci_lnorm <- cbind(
  fit_lnorm$par - 1.96 * se_lnorm,
  fit_lnorm$par + 1.96 * se_lnorm
)
aic_lnorm <- 2 * 2 + 2 * fit_lnorm$value

ll_weib <- function(params, data) {
  shape <- params[1]
  scale <- params[2]
  if (shape <= 0 || scale <= 0) return(Inf)
  -sum(dweibull(data, shape = shape, scale = scale, log = TRUE))
}
fit_weib <- optim(
  c(1, mean(inc)), ll_weib, data = inc,
  method = "L-BFGS-B", lower = c(0.001, 0.001), hessian = TRUE
)
se_weib <- sqrt(diag(solve(fit_weib$hessian)))
ci_weib <- cbind(
  fit_weib$par - 1.96 * se_weib,
  fit_weib$par + 1.96 * se_weib
)
aic_weib <- 2 * 2 + 2 * fit_weib$value

ll_gamma <- function(params, data) {
  shape <- params[1]
  rate  <- params[2]
  if (shape <= 0 || rate <= 0) return(Inf)
  -sum(dgamma(data, shape = shape, rate = rate, log = TRUE))
}
fit_gamma <- optim(
  c(1, 1 / mean(inc)), ll_gamma, data = inc,
  method = "L-BFGS-B", lower = c(0.001, 0.001), hessian = TRUE
)
se_gamma <- sqrt(diag(solve(fit_gamma$hessian)))
ci_gamma <- cbind(
  fit_gamma$par - 1.96 * se_gamma,
  fit_gamma$par + 1.96 * se_gamma
)
aic_gamma <- 2 * 2 + 2 * fit_gamma$value

2.2 Parameter Estimates

ci1_lo <- round(
  c(ci_lnorm[1, 1], ci_weib[1, 1], ci_gamma[1, 1]), 3
)
ci1_hi <- round(
  c(ci_lnorm[1, 2], ci_weib[1, 2], ci_gamma[1, 2]), 3
)
ci2_lo <- round(
  c(ci_lnorm[2, 1], ci_weib[2, 1], ci_gamma[2, 1]), 3
)
ci2_hi <- round(
  c(ci_lnorm[2, 2], ci_weib[2, 2], ci_gamma[2, 2]), 3
)
result <- data.frame(
  Distribution = c("Lognormal", "Weibull", "Gamma"),
  Parameters   = c("meanlog, sdlog", "shape, scale", "shape, rate"),
  Param1 = round(
    c(fit_lnorm$par[1], fit_weib$par[1], fit_gamma$par[1]), 4
  ),
  CI1    = paste0("[", ci1_lo, ", ", ci1_hi, "]"),
  Param2 = round(
    c(fit_lnorm$par[2], fit_weib$par[2], fit_gamma$par[2]), 4
  ),
  CI2    = paste0("[", ci2_lo, ", ", ci2_hi, "]"),
  AIC    = round(c(aic_lnorm, aic_weib, aic_gamma), 2)
)
kable(result, align = "llrrrrc",
      col.names = c(
        "Distribution", "Parameters",
        "Param 1", "95% CI", "Param 2", "95% CI", "AIC"
      ))
Distribution Parameters Param 1 95% CI Param 2 95% CI AIC
Lognormal meanlog, sdlog 3.0628 [3.004, 3.121] 0.1264 [0.085, 0.168] 90.88
Weibull shape, scale 8.9965 [5.89, 12.103] 22.7147 [21.482, 23.948] 90.67
Gamma shape, rate 64.0606 [22.316, 105.805] 2.9719 [1.028, 4.916] 90.56

Interpretation: AICs fall within ΔAIC < 2 across all three distributions. Lognormal is retained as the primary model. MLE uses point observations; the Bayesian model below uses the interval-censored likelihood.


2.3 Distribution Curves (MLE)

freq <- seq(0, 50, length.out = 1000)

df_lnorm <- data.frame(
  x        = freq,
  density  = dlnorm(freq, fit_lnorm$par[1], fit_lnorm$par[2]),
  ci_lower = dlnorm(freq, ci_lnorm[1, 1],   ci_lnorm[2, 1]),
  ci_upper = dlnorm(freq, ci_lnorm[1, 2],   ci_lnorm[2, 2])
)
plnorm <- ggplot(df_lnorm, aes(x = x)) +
  geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper),
    fill = "#009E73", alpha = 0.2
  ) +
  geom_line(aes(y = density), color = "#009E73", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  labs(
    title = paste0("Lognormal  (AIC = ", round(aic_lnorm, 1), ")"),
    x = "days", y = "density"
  ) +
  theme_minimal()

df_weib <- data.frame(
  x        = freq,
  density  = dweibull(freq, fit_weib$par[1], fit_weib$par[2]),
  ci_lower = dweibull(freq, ci_weib[1, 1],   ci_weib[2, 1]),
  ci_upper = dweibull(freq, ci_weib[1, 2],   ci_weib[2, 2])
)
pweib <- ggplot(df_weib, aes(x = x)) +
  geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper),
    fill = "#56B4E9", alpha = 0.2
  ) +
  geom_line(aes(y = density), color = "#56B4E9", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  labs(
    title = paste0("Weibull  (AIC = ", round(aic_weib, 1), ")"),
    x = "days", y = "density"
  ) +
  theme_minimal()

df_gamma <- data.frame(
  x        = freq,
  density  = dgamma(freq, fit_gamma$par[1], fit_gamma$par[2]),
  ci_lower = dgamma(freq, ci_gamma[1, 1],   ci_gamma[2, 1]),
  ci_upper = dgamma(freq, ci_gamma[1, 2],   ci_gamma[2, 2])
)
pgamma_p <- ggplot(df_gamma, aes(x = x)) +
  geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper),
    fill = "#E69F00", alpha = 0.2
  ) +
  geom_line(aes(y = density), color = "#E69F00", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  labs(
    title = paste0("Gamma  (AIC = ", round(aic_gamma, 1), ")"),
    x = "days", y = "density"
  ) +
  theme_minimal()

df_combined <- data.frame(
  x = rep(freq, 3),
  density = c(
    dlnorm(freq,   fit_lnorm$par[1], fit_lnorm$par[2]),
    dweibull(freq, fit_weib$par[1],  fit_weib$par[2]),
    dgamma(freq,   fit_gamma$par[1], fit_gamma$par[2])
  ),
  Distribution = rep(
    c("Lognormal", "Weibull", "Gamma"), each = length(freq)
  )
)
pcom <- ggplot(
  df_combined, aes(x = x, y = density, color = Distribution)
) +
  geom_line(linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.6
  ) +
  scale_color_manual(values = c("#009E73", "#E69F00", "#56B4E9")) +
  labs(title = "Combined", x = "days", y = "density") +
  theme_minimal()

grid.arrange(plnorm, pweib, pgamma_p, pcom, ncol = 2)


2.4 Q-Q Plot (Lognormal MLE)

n     <- length(inc)
probs <- (seq_len(n) - 0.5) / n
qq_df <- data.frame(
  theoretical = qlnorm(probs, fit_lnorm$par[1], fit_lnorm$par[2]),
  observed    = sort(inc)
)
ggplot(qq_df, aes(x = theoretical, y = observed)) +
  geom_point(color = "#009E73", alpha = 0.7, size = 2) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(
    title = "Q-Q Plot — Lognormal (MLE)",
    x = "Theoretical Quantiles", y = "Observed Quantiles"
  ) +
  theme_minimal()


3. Bayesian Estimation via Stan (HMC/NUTS)

3.1 Model Specification

Distribution Assumption

\[T_{i} \sim \text{LogNormal}(\mu,\, \sigma), \quad i = 1, \ldots, N\]

where \(T_{i}\) is the true continuous incubation period of individual \(i\) (days).

Why exp(μ) ≠ Mean

The lognormal distribution has two distinct location summaries:

\[\text{Median}(T) = \exp(\mu), \qquad \mathbb{E}[T] = \exp\!\left(\mu + \frac{\sigma^{2}}{2}\right)\]

Reporting \(\mu\) directly implies the median, not the mean. When \(\sigma > 0\), the two quantities diverge by a factor of \(\exp(\sigma^{2}/2)\).

Reparameterization

Let \(m = \mathbb{E}[T]\) (mean_ip) and \(s = \text{SD}(T)\) (sd_ip). The internal lognormal parameters are recovered by:

\[\mu = \log(m) - \frac{1}{2}\,\log\!\left(1 + \frac{s^{2}}{m^{2}}\right)\]

\[\sigma = \sqrt{\,\log\!\left(1 + \frac{s^{2}}{m^{2}}\right)}\]

Both \(m\) and \(s\) are in days, allowing priors and credible intervals to be stated directly in the clinically interpretable scale.

Interval-Censored Likelihood (Generalized)

For individual \(i\) with exposure window \([E_{L,i},\, E_{R,i}]\) and onset window \([O_{L,i},\, O_{R,i}]\) (all in calendar days), the continuous exposure time \(E \in (E_{L,i},\; E_{R,i} + 1)\) and onset time \(O \in (O_{L,i},\; O_{R,i} + 1)\), giving per-observation bounds:

\[T^{\text{lo}}_{i} = O_{L,i} - E_{R,i} - 1, \qquad T^{\text{hi}}_{i} = O_{R,i} - E_{L,i} + 1\]

The log-likelihood contribution is:

\[\ell(\theta \mid i) = \log\!\Bigl[ F_{\text{LN}}\!\left(T^{\text{hi}}_{i} \mid \mu, \sigma\right) - F_{\text{LN}}\!\left(T^{\text{lo}}_{i} \mid \mu, \sigma\right) \Bigr]\]

Summing over all \(N\) observations:

\[\ell(\theta \mid \mathbf{T}) = \sum_{i=1}^{N} \log\!\Bigl[ F_{\text{LN}}\!\left(T^{\text{hi}}_{i} \mid \mu, \sigma\right) - F_{\text{LN}}\!\left(T^{\text{lo}}_{i} \mid \mu, \sigma\right) \Bigr]\]

Group A special case (\(E_{L,i} = E_{R,i}\), \(O_{L,i} = O_{R,i}\)): \(T^{\text{lo}}_{i} = k_{i} - 1\), \(T^{\text{hi}}_{i} = k_{i} + 1\), identical to v3b with \(\delta = 1\).

stan_code <- "
data {
  int<lower=0> N;
  // Per-case incubation bounds (days, real-valued).
  // For Group A: T_lower = k - 1, T_upper = k + 1
  // (k = integer onset - exposure day).
  // For wider exposure windows: T_lower = O_L - E_R - 1,
  //                             T_upper = O_R - E_L + 1.
  vector<lower=0>[N] T_lower;
  vector<lower=0>[N] T_upper;
}
parameters {
  real<lower=0> mean_ip;
  real<lower=0> sd_ip;
}
transformed parameters {
  real mu    = log(mean_ip)
               - 0.5 * log(1.0 + square(sd_ip / mean_ip));
  real sigma = sqrt(log(1.0 + square(sd_ip / mean_ip)));
}
model {
  // Weakly informative priors -- to be updated after literature review.
  // mean_ip ~ Normal(21, 7): centred near expected HFRS value.
  // sd_ip   ~ HalfNormal(5): <lower=0> enforces positivity.
  mean_ip ~ normal(21, 7);
  sd_ip   ~ normal(0, 5);
  // Generalized interval-censored likelihood.
  for (i in 1:N) {
    target += log_diff_exp(
      lognormal_lcdf(T_upper[i] | mu, sigma),
      lognormal_lcdf(T_lower[i] | mu, sigma)
    );
  }
}
"

Priors: unchanged — mean_ip ~ Normal(21, 7), sd_ip ~ HalfNormal(5).

Likelihood: generalized doubly interval-censored; each observation contributes P(T ∈ (T_lower[i], T_upper[i])). For Group A this equals v3b: T_lower = k − 1, T_upper = k + 1. Cases with wider exposure windows slot in by supplying different bounds — no model changes required.

Extensibility: to include a case from Group B-1 (exposure Oct 7–25, onset day O), set T_lower = O − 25 − 1, T_upper = O − 7 + 1.


3.2 Sampling

# Derive per-observation bounds from integer incubation periods.
# Group A: single calendar-day exposure and onset -> bounds = k +/- 1.
T_lower <- as.numeric(inc) - 1.0
T_upper <- as.numeric(inc) + 1.0

stan_data <- list(
  N       = length(inc),
  T_lower = T_lower,
  T_upper = T_upper
)

fit_stan <- stan(
  model_code = stan_code,
  data       = stan_data,
  chains     = 4,
  iter       = 10000,
  warmup     = 5000,
  seed       = 42,
  refresh    = 0
)

4 chains × 5,000 post-warmup iterations = 20,000 posterior samples.


3.3 HMC Diagnostics

check_hmc_diagnostics(fit_stan)
## 
## Divergences:
## 0 of 20000 iterations ended with a divergence.
## 
## Tree depth:
## 0 of 20000 iterations saturated the maximum tree depth of 10.
## 
## Energy:
## E-BFMI indicated no pathological behavior.
stan_sum <- summary(
  fit_stan, pars = c("mean_ip", "sd_ip", "mu", "sigma")
)$summary
diag_df <- data.frame(
  Parameter = c(
    "mean_ip (days)", "sd_ip (days)",
    "mu (meanlog)", "sigma (sdlog)"
  ),
  Mean  = round(stan_sum[, "mean"], 4),
  R_hat = round(stan_sum[, "Rhat"], 4),
  ESS   = round(stan_sum[, "n_eff"])
)
kable(
  diag_df,
  col.names = c("Parameter", "Posterior Mean", "R-hat", "ESS"),
  row.names = FALSE
)
Parameter Posterior Mean R-hat ESS
mean_ip (days) 21.6414 1.0003 9144
sd_ip (days) 2.9455 1.0004 8878
mu (meanlog) 3.0646 1.0003 10116
sigma (sdlog) 0.1353 1.0004 10319

Interpretation:

  • R-hat < 1.01 for all parameters confirms convergence across chains.
  • ESS — effective sample size; target > 400.
  • Divergent transitions — zero is expected; any divergence requires investigation.
  • E-BFMI — values near 1 indicate efficient posterior exploration.

3.4 Trace Plots

chains_arr <- extract(
  fit_stan, pars = c("mean_ip", "sd_ip"),
  permuted = FALSE, inc_warmup = FALSE
)

df_trace <- do.call(rbind, lapply(seq_len(dim(chains_arr)[2]), function(k) {
  data.frame(
    iter    = seq_len(dim(chains_arr)[1]),
    mean_ip = chains_arr[, k, "mean_ip"],
    sd_ip   = chains_arr[, k, "sd_ip"],
    chain   = factor(k)
  )
}))

p_tr_mean <- ggplot(
  df_trace, aes(x = iter, y = mean_ip, color = chain)
) +
  geom_line(alpha = 0.6, linewidth = 0.3) +
  labs(
    title = "Trace — mean_ip (days) [post-warmup]",
    x = "post-warmup iteration", y = "mean_ip"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

p_tr_sd <- ggplot(
  df_trace, aes(x = iter, y = sd_ip, color = chain)
) +
  geom_line(alpha = 0.6, linewidth = 0.3) +
  labs(
    title = "Trace — sd_ip (days) [post-warmup]",
    x = "post-warmup iteration", y = "sd_ip"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

grid.arrange(p_tr_mean, p_tr_sd, ncol = 1)


3.5 Posterior Distributions

post_stan <- as.data.frame(
  extract(fit_stan, pars = c("mean_ip", "sd_ip", "mu", "sigma"))
)

mle_mean <- exp(fit_lnorm$par[1] + fit_lnorm$par[2]^2 / 2)
mle_sd <- sqrt(
  (exp(fit_lnorm$par[2]^2) - 1) *
    exp(2 * fit_lnorm$par[1] + fit_lnorm$par[2]^2)
)

p_post_mean <- ggplot(post_stan, aes(x = mean_ip)) +
  geom_density(fill = "#009E73", alpha = 0.5) +
  geom_vline(
    xintercept = mle_mean, linetype = "dashed", color = "gray30"
  ) +
  labs(
    title    = "Posterior — mean_ip (days)",
    subtitle = "Dashed: MLE-equivalent mean (point likelihood)",
    x = "mean incubation period (days)", y = "density"
  ) +
  theme_minimal()

p_post_sd <- ggplot(post_stan, aes(x = sd_ip)) +
  geom_density(fill = "#009E73", alpha = 0.5) +
  geom_vline(
    xintercept = mle_sd, linetype = "dashed", color = "gray30"
  ) +
  labs(
    title    = "Posterior — sd_ip (days)",
    subtitle = "Dashed: MLE-equivalent SD (point likelihood)",
    x = "SD incubation period (days)", y = "density"
  ) +
  theme_minimal()

grid.arrange(p_post_mean, p_post_sd, ncol = 2)

Interpretation:

  • mean_ip — posterior concentrated ~20–24 days; MLE reference line confirms prior-likelihood agreement. Numerically identical to v3b because Group A has single-day bounds in both versions.

  • sd_ip — right-skewed; mode ~3 days. The generalized formulation produces the same posteriors as v3b for Group A, confirming backward compatibility.


3.6 Posterior Predictive Check

set.seed(1)
n_samp   <- 500
samp_idx <- sample(nrow(post_stan), n_samp)
pp_mat <- sapply(samp_idx, function(i) {
  dlnorm(freq, post_stan$mu[i], post_stan$sigma[i])
})

df_pp <- data.frame(
  x     = freq,
  mean  = rowMeans(pp_mat),
  lower = apply(pp_mat, 1, quantile, probs = 0.025),
  upper = apply(pp_mat, 1, quantile, probs = 0.975)
)

ggplot(df_pp, aes(x = x)) +
  geom_ribbon(
    aes(ymin = lower, ymax = upper), fill = "#009E73", alpha = 0.25
  ) +
  geom_line(aes(y = mean), color = "#009E73", linewidth = 1) +
  geom_rug(
    data = data.frame(x = inc), aes(x = x),
    inherit.aes = FALSE, sides = "b", alpha = 0.8
  ) +
  labs(
    title    = "Posterior Predictive — Lognormal (Stan/HMC, v4)",
    subtitle = paste0(
      "Mean density +/- 95% interval from ", n_samp,
      " posterior samples"
    ),
    x = "days", y = "density"
  ) +
  theme_minimal()


4. Final Estimates

est_df <- data.frame(
  Quantity = c(
    "Mean incubation period (days)",
    "SD incubation period (days)",
    "Median incubation period (days)",
    "mu (meanlog)",
    "sigma (sdlog)"
  ),
  Posterior_Median = round(c(
    median(post_stan$mean_ip),
    median(post_stan$sd_ip),
    median(exp(post_stan$mu)),
    median(post_stan$mu),
    median(post_stan$sigma)
  ), 2),
  CI_Lower = round(c(
    quantile(post_stan$mean_ip, 0.025),
    quantile(post_stan$sd_ip,   0.025),
    quantile(exp(post_stan$mu), 0.025),
    quantile(post_stan$mu,      0.025),
    quantile(post_stan$sigma,   0.025)
  ), 2),
  CI_Upper = round(c(
    quantile(post_stan$mean_ip, 0.975),
    quantile(post_stan$sd_ip,   0.975),
    quantile(exp(post_stan$mu), 0.975),
    quantile(post_stan$mu,      0.975),
    quantile(post_stan$sigma,   0.975)
  ), 2)
)
kable(
  est_df,
  col.names = c(
    "Quantity", "Posterior Median", "95% CrI Lower", "95% CrI Upper"
  ),
  row.names = FALSE
)
Quantity Posterior Median 95% CrI Lower 95% CrI Upper
Mean incubation period (days) 21.61 20.30 23.14
SD incubation period (days) 2.85 2.04 4.39
Median incubation period (days) 21.42 20.07 22.87
mu (meanlog) 3.06 3.00 3.13
sigma (sdlog) 0.13 0.09 0.20

Interpretation:

Results are numerically identical to v3b for Group A, confirming that the generalized formulation is backward-compatible. The model is now ready to absorb additional cases with wider exposure windows (Groups B, C, D, E) if individual-level [E_L, E_R] and [O_L, O_R] data become available — the Stan code requires no further changes.


5. Credits

Data & Literature

  • Kulagin, S.V., Fedorova, N.I., & Ketiladze, E.S. (1962). Laboratory outbreak of hemorrhagic fever with renal syndrome. Journal of Microbiology, Epidemiology and Immunology, 33(10), 121–126.

  • Sartwell, P.E. (1950). The distribution of incubation periods of infectious disease. American Journal of Hygiene, 51(3), 310–318.

  • Reich, N.G., et al. (2009). Estimating incubation period distributions with coarse data. Statistics in Medicine, 28(22), 2769–2784. (Foundational reference for the doubly interval-censored framework.)

Software

  • R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Stan Development Team. RStan: the R interface to Stan.
  • Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

AI Assistance

This document was developed with the assistance of Claude (Anthropic). AI was used to support code writing, interpretation drafting, and statistical explanation. All analytical decisions, prior specifications, and interpretations were reviewed and approved by the author.