Why statistics matters in immunology

  • Immune responses vary widely across participants, even within the same study.
  • A biologically interesting difference is not enough; we also need an estimate of uncertainty.
  • Immunology studies often compare vaccine formulations, adjuvants, doses, or treatment groups.
  • In this example, the question is: Does a TLR-agonist adjuvant increase antibody response at day 28 compared with alum?

Example study and variables

  • Design: simulated vaccine immunogenicity study with 120 participants.
  • Groups: alum control vs. TLR-agonist adjuvant.
  • Continuous outcome: log2 fold change in antibody titer from baseline to day 28.
  • Binary outcome: seroconversion, defined as a 4-fold or greater increase in titer (log2 fold change ≥ 2).
  • Covariate: age, because immune response can decrease with age.

Quick summary of the simulated immunology dataset

Group   n    Mean log2 FC   Mean day-28 titer (log2)   Seroconversion
Alum    60   0.53           4.97                       1.7%
TLR     60   1.18           5.71                       15%

Note: the dataset is simulated for teaching purposes, but the statistical workflow mirrors real immunology analyses.
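
The slides do not show how the data were simulated, so the following is only a minimal sketch of how a dataset with this structure could be generated in base R. The seed, the column names (`baseline_log2`, `day28_log2`, `seroconverted`), the age range, and every numeric parameter are assumptions chosen to land roughly near the summary table; it will not reproduce the reported numbers exactly.

```r
# Hypothetical simulation of a two-arm immunogenicity dataset (all parameters assumed).
set.seed(2024)                      # arbitrary seed, only so the sketch is repeatable
n_per_group <- 60

dat <- data.frame(
  id    = seq_len(2 * n_per_group),
  group = factor(rep(c("Alum", "TLR"), each = n_per_group),
                 levels = c("Alum", "TLR")),
  age   = round(runif(2 * n_per_group, min = 20, max = 70))
)

# Assumed model: the TLR adjuvant raises the mean log2 fold change,
# older age lowers it slightly, and residual noise has SD 0.8.
dat$baseline_log2 <- rnorm(nrow(dat), mean = 4.5, sd = 0.8)
dat$log2_fc <- 0.5 + 0.65 * (dat$group == "TLR") -
  0.01 * (dat$age - 45) + rnorm(nrow(dat), sd = 0.8)
dat$day28_log2 <- dat$baseline_log2 + dat$log2_fc

# Seroconversion: a 4-fold rise is a log2 fold change of at least 2.
dat$seroconverted <- as.integer(dat$log2_fc >= 2)

# Per-group means, mirroring the columns of the summary table.
group_summary <- aggregate(cbind(log2_fc, day28_log2, seroconverted) ~ group,
                           data = dat, FUN = mean)
group_summary
```

Deriving the binary outcome from the continuous one keeps the two outcomes consistent with the 4-fold definition used on the slide.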

Exploratory data analysis with ggplot2

  • The TLR group shows a higher center and more values above the 4-fold threshold (log2 fold change of 2).
  • There is still substantial subject-to-subject variability, which is why estimation and testing are useful.

Point estimation and interval estimation

We estimate the group effect with the difference in mean log2 fold change:

\[ \hat{\Delta} = \bar{Y}_{\mathrm{TLR}} - \bar{Y}_{\mathrm{Alum}} \]

A 95% confidence interval, where \(t^*\) is the 0.975 quantile of the reference t distribution, is

\[ \hat{\Delta} \pm t^* \cdot SE(\hat{\Delta}). \]

  • Estimated mean difference: 0.65 log2 units, i.e. the average fold change is about 2^0.65 ≈ 1.57 times higher with TLR.
  • 95% CI: (0.36, 0.93).
  • Interpretation: because the interval is entirely above 0, the TLR adjuvant is associated with a stronger antibody response in this study.
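
As a rough check, the interval can be rebuilt from numbers reported on these slides. The sketch below assumes a two-group linear model with 120 − 2 = 118 residual degrees of freedom (the slides do not state the degrees of freedom) and backs out the standard error from the reported estimate and the reported t statistic of 4.52. Because those inputs are rounded, the result should agree with the reported interval only to about two decimals.

```r
# Rebuild the 95% CI from the reported summary numbers (inputs are rounded).
delta_hat <- 0.65        # reported difference in mean log2 fold change
t_obs     <- 4.52        # reported t statistic
df_resid  <- 120 - 2     # ASSUMPTION: two-group model, no other covariates

se_delta <- delta_hat / t_obs          # since t = delta_hat / SE
t_star   <- qt(0.975, df = df_resid)   # critical value for a two-sided 95% CI

ci <- delta_hat + c(-1, 1) * t_star * se_delta
round(ci, 2)
```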

Hypothesis testing and p-value

We test

\[ H_0: \mu_{\mathrm{TLR}} - \mu_{\mathrm{Alum}} = 0 \qquad \text{vs.} \qquad H_A: \mu_{\mathrm{TLR}} - \mu_{\mathrm{Alum}} \ne 0. \]

Using a linear model, the test statistic is

\[ t = \frac{\hat{\Delta}}{SE(\hat{\Delta})}. \]

  • Observed test statistic: 4.52.
  • p-value: 1.51 × 10^{-5}.
  • Since the p-value is far below 0.05, we reject the null hypothesis and conclude that the adjuvant groups differ in mean antibody response.
  • The p-value is the probability, computed under the null model, of a test statistic at least as extreme as the one observed; it is not the probability that the null hypothesis is true.
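
The two-sided p-value is the tail area of the t distribution beyond the observed statistic, on both sides. A small sketch, again assuming 118 residual degrees of freedom (not stated on the slides):

```r
# Two-sided p-value from the reported t statistic.
t_obs    <- 4.52
df_resid <- 120 - 2   # ASSUMPTION: two-group linear model without other covariates

# P(|T| >= |t_obs|) under H0, using the upper tail and doubling it.
p_two_sided <- 2 * pt(abs(t_obs), df = df_resid, lower.tail = FALSE)
format(p_two_sided, scientific = TRUE, digits = 3)
```

Under that assumption the result is on the order of 10^{-5}, in line with the value reported above.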

Logistic regression for seroconversion

Now we model the probability of seroconversion while adjusting for age:

\[ \log\left(\frac{P(\text{response}=1)}{1-P(\text{response}=1)}\right) = \beta_0 + \beta_1 I(\text{TLR}) + \beta_2 \cdot \text{Age}. \]

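  • Estimated odds ratio for TLR vs. alum, holding age fixed: 8.52.
  • Approximate 95% CI for the odds ratio: (1.01, 71.68). The interval is very wide because seroconversion was rare, especially in the alum group (1.7%).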
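
A model of this form can be fitted with `glm()`. The sketch below is illustrative only: it simulates its own data with assumed coefficients and event rates (higher than in the study, so the fit is stable), and the column name `seroconverted` is hypothetical. The odds ratio is the exponentiated group coefficient, and `confint.default()` gives a Wald interval on the log-odds scale, which is one common way to obtain an approximate interval; the slides do not say which method was used.

```r
# Hypothetical data for an age-adjusted logistic regression (all parameters assumed).
set.seed(1)
n <- 120
sim <- data.frame(
  group = factor(rep(c("Alum", "TLR"), each = n / 2), levels = c("Alum", "TLR")),
  age   = round(runif(n, min = 20, max = 70))
)
eta <- -2 + 1.5 * (sim$group == "TLR") - 0.03 * (sim$age - 45)  # log-odds
sim$seroconverted <- rbinom(n, size = 1, prob = plogis(eta))

# log-odds(seroconversion) = b0 + b1 * I(TLR) + b2 * Age
fit <- glm(seroconverted ~ group + age, family = binomial, data = sim)

# Odds ratios with Wald 95% intervals: exponentiate estimates and limits.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table["groupTLR", ], 2)
```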

Interactive 3D Plotly view

  • This plot helps visualize multivariable structure that is harder to see in a 2D figure.
  • In immunology, interactive graphics are useful for exploring response heterogeneity.
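
The slides do not say which variables the 3D view uses. As a hedged sketch, the code below assumes age, baseline titer, and log2 fold change on the three axes, colored by group, and simulates placeholder data with assumed parameters. It uses `plotly::plot_ly()` with `type = "scatter3d"` and only draws the figure if the plotly package is installed.

```r
# Placeholder data (ASSUMED variables and parameters, not the study data).
set.seed(7)
n <- 120
dat3d <- data.frame(
  group         = rep(c("Alum", "TLR"), each = n / 2),
  age           = round(runif(n, min = 20, max = 70)),
  baseline_log2 = rnorm(n, mean = 4.5, sd = 0.8)
)
dat3d$log2_fc <- 0.5 + 0.65 * (dat3d$group == "TLR") -
  0.01 * (dat3d$age - 45) + rnorm(n, sd = 0.8)

# Interactive 3D scatter: rotate and zoom to see how response varies
# with age and baseline titer within each adjuvant group.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    dat3d, x = ~age, y = ~baseline_log2, z = ~log2_fc, color = ~group,
    type = "scatter3d", mode = "markers"
  )
  fig <- plotly::layout(fig, scene = list(
    xaxis = list(title = "Age (years)"),
    yaxis = list(title = "Baseline titer (log2)"),
    zaxis = list(title = "Log2 fold change")
  ))
  fig
}
```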

Reproducible R code example

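library(ggplot2)

# `dat` is the study data frame; it needs a `group` factor and a numeric `log2_fc` column.
ggplot(dat, aes(x = group, y = log2_fc, fill = group)) +
  # Violin shows the shape of each group's distribution
  geom_violin(alpha = 0.28, color = NA, width = 0.9) +
  # Narrow boxplot marks median and quartiles; outliers are hidden because raw points are drawn
  geom_boxplot(width = 0.18, outlier.shape = NA, alpha = 0.65) +
  # One point per participant, jittered horizontally only
  geom_jitter(width = 0.08, height = 0, alpha = 0.55, size = 2) +
  labs(
    title = "Day-28 antibody response is higher in the TLR-adjuvanted group",
    x = "Adjuvant group",
    y = "Log2 fold change in antibody titer"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")

  • This code creates the first ggplot shown in the presentation.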
  • Because the entire analysis is inside one .Rmd file, the report is easy to reproduce and update.

Take-home messages

  • Statistics helps convert biological variation into quantitative evidence.
  • In this simulated immunology study, the TLR adjuvant produced a larger average antibody response than alum.
  • Point estimates, confidence intervals, and p-values answer different questions and should be interpreted together.
  • A logistic regression model extends the analysis by adjusting for age and estimating the odds of seroconversion.
  • The same workflow can be adapted to cytokine data, flow cytometry summaries, neutralization titers, or survival outcomes in immunology.