Statistical Methods in medical science: applications in virology, microbiology, immunology, and infectious diseases

Introduction

In medical research, statistical methods are essential for analysing data and making evidence-based decisions. Medical fields such as virology, microbiology, immunology, and infectious diseases often deal with variables like viral load, antibody titres, and pathogen counts, which must be rigorously analyzed to draw accurate conclusions about patient outcomes, disease progression, and treatment efficacy.

This tutorial will focus on key normality, parametric and non-parametric statistical tests, guiding you on when and how to use them based on specific assumptions about your data. We will use case studies from virology (particularly STIs), microbiology, immunology, and infectious diseases to illustrate how these tests apply in real-world scenarios.

The statistical tests covered include:

Normality tests, such as Shapiro-Wilk and Kolmogorov-Smirnov. Parametric tests, such as the t-test and ANOVA, which assume normal distribution. Non-parametric tests, such as the Mann-Whitney U test and Wilcoxon Signed-Rank test, which do not require normal distribution. Post hoc tests, for multiple comparisons after ANOVA or non-parametric tests.

For each test, we will cover:

  1. Rationale
  2. Use case
  3. Assumptions
  4. Hypotheses
  5. Visualizations
  6. Output

This tutorial will guide you through conducting these tests using R with the tidyverse, report, and ggstatsplot packages, and interpreting the results for medical applications.

What does it mean when a variable is normally distributed?

When we say a variable is normally distributed, we mean that its values follow a pattern commonly described by the “bell curve” or normal distribution. A normally distributed variable has:

  1. Symmetry around its mean: Most data points are clustered around the mean, and the distribution is balanced on both sides.

2, A peak at the mean: The highest frequency of values occurs at the mean, while frequencies gradually decrease as values move further from the mean.

  1. Consistent properties: Approximately 68% of the values fall within one standard deviation of the mean, about 95% within two standard deviations, and around 99.7% within three standard deviations.

A normal distribution allows for the use of parametric tests, which assume specific statistical properties like the mean and SD, to infer patterns and relationships within the data.

What does it mean when a variable is not normally distributed?

When a variable is not normally distributed, its values deviate from the characteristics of a bell curve. This non-normality can occur in several ways:

  1. Skewness: The data may lean towards one side (left or right), causing an asymmetric shape. For example, income data is often positively skewed, with a few very high values extending the tail to the right.

  2. Kurtosis: A non-normally distributed variable may have heavier or lighter tails than a normal distribution, affecting the likelihood of extreme values. High kurtosis indicates more frequent extreme values, while low kurtosis shows fewer.

  3. Multiple Modes: A dataset might have more than one peak (bimodal or multimodal), indicating that the data likely originates from different groups or distributions.

Non-normally distributed data often suggests the presence of underlying factors affecting the data, such as natural limitations, extreme values, or subgroup differences. Such data may require non-parametric tests, which do not assume normality, to provide valid insights and robust results.

Normality tests

1. Shapiro-Wilk Tests

Description of the Shapiro-Wilk Test

The Shapiro-Wilk test is a statistical test designed to evaluate whether a dataset follows a normal distribution. It is particularly well-suited for smaller samples (usually less than 50), though it can still be effective with larger samples. The test compares the observed data to an idealized, perfectly normal distribution. If the observed data matches the expected distribution, the test produces a high p-value, indicating that any deviation from normality is not statistically significant. However, if the data significantly differs from a normal distribution, the p-value will be low, prompting us to consider the data as non-normally distributed.

Example 1: Patient blood pressure in a clinical trial

Rationale:

To determine if systolic blood pressure data of a sample of patients is normally distributed. Normality is essential for the validity of parametric tests, especially in clinical trials.

Use Case:

Evaluating systolic blood pressure among 30 patients in a clinical trial to check if data meets normality for use in further parametric analyses.

Assumptions:

  1. Data is continuous.
  2. Data is randomly sampled from the population.

Hypotheses:

  1. Null Hypothesis (H₀): Systolic blood pressure data follows a normal distribution.
  2. Alternative Hypothesis (H₁): Systolic blood pressure data does not follow a normal distribution.

Dataset

# Generate data
set.seed(1)
data_shapiro <- tibble(
  blood_pressure = rnorm(100, mean = 100, sd = 15)
)

datatable(
  data_shapiro
)

Visualizations:

# Histogram
ggplot(data.frame(data_shapiro), aes(x = blood_pressure)) +
  geom_histogram(aes(y = ..density..), color = "black", fill = "skyblue", bins = 10) +
  geom_density(alpha = 0.2, fill = "#FF6666") +
  ggtitle("Histogram of Systolic Blood Pressure") +
  kellz_theme

# QQ Plot
ggplot(data.frame(data_shapiro), aes(sample = blood_pressure)) +
  stat_qq() + stat_qq_line() +
  ggtitle("QQ Plot for Systolic Blood Pressure") +
  kellz_theme

Output

shapiro.test(data_shapiro$blood_pressure)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_shapiro$blood_pressure
## W = 0.9956, p-value = 0.9876

Interpretation:

“The Shapiro-Wilk test assessing normality of data_shapiro$blood_pressure indicates a slight deviation from normality (W = 0.97022, p-value = 0.02299). Since the p-value is below the common significance threshold of 0.05, this result suggests that the distribution of blood pressure data is not perfectly normal. However, given that the test statistic (W) is close to 1, the deviation may not be substantial. Researchers may still consider normality approximately satisfied, but depending on the analysis requirements, alternative methods or transformations could be explored.

Additionally:

If the p-value > 0.05, this suggests that systolic blood pressure measurements do not significantly deviate from a normal distribution. This outcome supports the use of parametric tests, such as t-tests, to further analyze blood pressure data in this clinical trial. Conversely, if the p-value < 0.05, it indicates the data distribution is significantly different from normality. In that case, relying on non-parametric methods might yield more reliable insights when examining blood pressure variations in the sample.

Example 2: White blood cell count (WBC) in cancer patients

Rationale:

Testing the normality of WBC counts helps determine the best statistical approach for analysis, as WBC distribution is often skewed due to variability in immune response.

Use Case:

Assessing normality in WBC counts among 50 cancer patients to choose between parametric or non-parametric tests.

Assumptions:

  1. Data is continuous.
  2. The sample is random.

Hypotheses:

  1. Null Hypothesis (H₀): White blood cell counts follow a normal distribution.
  2. Alternative Hypothesis (H₁): White blood cell counts do not follow a normal distribution.

Dataset

# Generate data
wbc_count_shapiro <- tibble(wbc_count = c(rnorm(25, mean = 5, sd = 1), rnorm(25, mean = 10, sd = 2)))

datatable(
  wbc_count_shapiro
)

Visualizations:

# Histogram
ggplot(wbc_count_shapiro, aes(x = wbc_count)) +
  geom_histogram(aes(y = ..density..), color = "black", fill = "lightgreen", bins = 12) +
  geom_density(alpha = 0.2, fill = "#66CCFF") +
  ggtitle("Histogram of WBC Counts") +
  kellz_theme

# QQ Plot
ggplot(wbc_count_shapiro, aes(sample = wbc_count)) +
  stat_qq() + stat_qq_line() +
  ggtitle("QQ Plot for WBC Counts") +
  kellz_theme

Output

shapiro.test(wbc_count_shapiro$wbc_count)
## 
##  Shapiro-Wilk normality test
## 
## data:  wbc_count_shapiro$wbc_count
## W = 0.9115, p-value = 0.001178

Interpretation:

The Shapiro-Wilk test assessing normality of wbc_count_shapiro$wbc_count indicates a statistically significant deviation from normality (W = 0.9366, p-value = 0.009907). Given that the p-value is below 0.05, the distribution of white blood cell counts is likely non-normal. This suggests that parametric analyses assuming normality may not be appropriate, and non-parametric methods or data transformations could be considered.

Additionally:

A p-value > 0.05 implies that WBC counts appear to follow a normal distribution, suggesting that parametric tests are suitable for deeper investigation. Given the critical role of WBC in assessing immune response, this normal distribution would allow for more conventional statistical modeling. If, however, the p-value < 0.05, it reveals significant deviations from normality, highlighting the need for non-parametric techniques. These methods would be better suited for capturing the potentially skewed nature of WBC counts due to variations in immune response across patients.

Example 3: Cholesterol levels in middle-aged adults

Rationale:

Cholesterol levels are a key cardiovascular health indicator, so normality testing helps ensure reliable parametric testing results.

Use Case:

Evaluating cholesterol levels in 40 middle-aged adults for parametric suitability.

Assumptions:

  1. Data is continuous.
  2. Data is randomly sampled.

Hypotheses:

  1. Null Hypothesis (H₀): Cholesterol levels are normally distributed.
  2. Alternative Hypothesis (H₁): Cholesterol levels are not normally distributed.

Dataset

# Generate data
cholesterol_shapiro <- tibble(cholesterol = rnorm(40, mean = 200, sd = 30))

datatable(
  cholesterol_shapiro
)

Visualizations:

# Histogram
ggplot(cholesterol_shapiro, aes(x = cholesterol)) +
  geom_histogram(aes(y = ..density..), color = "black", fill = "orange", bins = 10) +
  geom_density(alpha = 0.2, fill = "gold") +
  ggtitle("Histogram of Cholesterol Levels") +
  kellz_theme

# QQ Plot
ggplot(cholesterol_shapiro, aes(sample = cholesterol)) +
  stat_qq() + stat_qq_line() +
  ggtitle("QQ Plot for Cholesterol Levels") +
  kellz_theme

Output

shapiro.test(cholesterol_shapiro$cholesterol)
## 
##  Shapiro-Wilk normality test
## 
## data:  cholesterol_shapiro$cholesterol
## W = 0.96572, p-value = 0.2613

Interpretation:

The Shapiro-Wilk test assessing normality of cholesterol_shapiro$cholesterol shows no significant deviation from normality (W = 0.97325, p-value = 0.4534). With a p-value well above 0.05, this result suggests that the cholesterol data is approximately normally distributed. Therefore, analyses assuming normality can likely proceed without concerns for this variable.

Additionally:

With a p-value > 0.05, we interpret the cholesterol levels as being normally distributed, providing a green light for parametric analyses, such as ANOVA, to explore associations with cardiovascular risk factors. A p-value < 0.05, however, would indicate that cholesterol levels are not normally distributed, suggesting that non-parametric approaches would more accurately represent this variable, especially considering the natural variability in cholesterol influenced by genetics, diet, and lifestyle.

2. Kolmogorov-Smirnov (K-S) Test

Description of the Kolmogorov-Smirnov (K-S) Test

The Kolmogorov-Smirnov (K-S) test is a non-parametric test that determines whether a sample differs from a specified distribution, most often a normal distribution. Unlike the Shapiro-Wilk test, the K-S test is more versatile and can be applied to larger sample sizes, making it useful in studies with substantial data. The test calculates the largest difference between the cumulative distribution of the sample and the cumulative distribution of the specified theoretical distribution. A high p-value suggests the sample aligns well with the specified distribution, while a low p-value indicates a significant departure from it.

Example 1: Patient glucose levels in a diabetes study

Rationale:

Assessing the distribution of fasting glucose levels helps decide between parametric or non-parametric methods, given glucose data often has skewness.

Use Case:

Analyzing fasting glucose levels of 100 diabetes patients for normality.

Assumptions:

  1. The data is continuous and independent.
  2. The sample is randomly drawn from the population.
  3. The K-S test compares the sample distribution to a normal distribution (or another specified distribution), assuming no specific parametric form for the underlying distribution.

Hypotheses:

  1. Null Hypothesis (H₀): The distribution of glucose levels in glucose_ks$glucose is not significantly different from a normal distribution.

  2. Alternative Hypothesis (H₁): The distribution of glucose levels in glucose_ks$glucose significantly differs from a normal distribution.

Dataset

# Generate data
glucose_ks <- tibble(glucose = c(rnorm(50, mean = 90, sd = 15), rnorm(50, mean = 140, sd = 20)))

datatable(
  glucose_ks
)

Visualizations:

# Histogram
ggplot(glucose_ks, aes(x = glucose)) +
  geom_histogram(aes(y = ..density..), color = "black", fill = "purple", bins = 15) +
  geom_density(alpha = 0.2, fill = "violet") +
  ggtitle("Histogram of Fasting Glucose Levels") +
  kellz_theme

# QQ Plot
ggplot(glucose_ks, aes(sample = glucose)) +
  stat_qq() + stat_qq_line() +
  ggtitle("QQ Plot for Fasting Glucose Levels") +
  kellz_theme

Output

ks.test(glucose_ks$glucose, "pnorm", 
        mean = mean(glucose_ks$glucose), 
        sd = sd(glucose_ks$glucose))
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  glucose_ks$glucose
## D = 0.12178, p-value = 0.103
## alternative hypothesis: two-sided

Interpretation:

The asymptotic one-sample Kolmogorov-Smirnov test examining the normality of glucose levels suggests that the data does not significantly deviate from a normal distribution (D = 0.0844, p = 0.4748). Since the p-value is greater than .05, we fail to reject the null hypothesis that the glucose values follow a normal distribution.

Example 2: Heart rates during exercise

Rationale:

Testing normality in heart rate during exercise helps determine if the data can be analyzed with parametric tests.

Use Case:

Normality assessment of exercise heart rates in 80 patients.

Assumptions:

  1. The heart rate data is continuous and consists of independent observations.
  2. The sample is representative of the population of interest and is randomly drawn.
  3. The K-S test assesses whether the sample distribution matches a normal distribution without assuming a parametric form for the distribution.

Hypotheses:

  1. Null Hypothesis (H₀): The distribution of heart rate in heart_rate_ks$heart_rate is not significantly different from a normal distribution.

  2. Alternative Hypothesis (H₁): The distribution of heart rate in heart_rate_ks$heart_rate significantly differs from a normal distribution.

Dataset:

# Generate data
heart_rate_ks <- tibble(heart_rate = rnorm(80, mean = 140, sd = 15))

datatable(
  heart_rate_ks
)

Visualizations:

# Histogram
ggplot(heart_rate_ks, aes(x = heart_rate)) +
  geom_histogram(aes(y = ..density..), color = "black", fill = "blue", bins = 15) +
  geom_density(alpha = 0.2, fill = "lightblue") +
  ggtitle("Histogram of Heart Rates During Exercise") +
  kellz_theme

# QQ Plot
ggplot(heart_rate_ks, aes(sample = heart_rate)) +
  stat_qq() + stat_qq_line() +
  ggtitle("QQ Plot for Heart Rates")  +
  kellz_theme

Output

ks.test(heart_rate_ks$heart_rate, "pnorm", 
        mean = mean(heart_rate_ks$heart_rate), 
        sd = sd(heart_rate_ks$heart_rate))
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  heart_rate_ks$heart_rate
## D = 0.061516, p-value = 0.9042
## alternative hypothesis: two-sided

Interpretation:

The exact one-sample Kolmogorov-Smirnov test examining the normality of heart rate suggests that the data does not significantly deviate from a normal distribution (D = 0.0985, p = 0.3938). Since the p-value is greater than .05, we fail to reject the null hypothesis that the heart rate values follow a normal distribution.

Assumptions and Handling Violations

For each test, after verifying assumptions (e.g., normality, homoscedasticity, sphericity), we can take specific actions if assumptions are violated:

  1. Transform the data (e.g., log transformation for non-normality).
  2. Use nonparametric alternatives (e.g., Mann-Whitney U instead of independent t-test).
  3. Use robust statistical methods (e.g., Welch’s ANOVA for unequal variances).

1. Parametric Tests

1.1 Independent t-Test

Rationale:

Compares the means of two independent groups.

Use Case:

Comparing the viral loads between two HIV treatment groups.

Assumptions:

  1. The dependent variable (viral load) is continuous and normally distributed.
  2. Independent groups.
  3. Homogeneity of variances.

Hypotheses:

  1. H₀: The mean viral load is the same for both treatment groups.
  2. H₁: The mean viral load is different for the two groups.

Dataset:

set.seed(123)
data_ttest <- tibble(
  treatment_group = rep(c("Drug_A", "Drug_B"), each = 50),
  viral_load = c(rnorm(50, mean = 200, sd = 50), rnorm(50, mean = 180, sd = 45))
)

datatable(
  data_ttest
)

Visualization:

data_ttest |> 
  ggstatsplot::ggbetweenstats(x = treatment_group, y = viral_load) +
  theme_minimal()

Output

t.test(viral_load ~ treatment_group,
       data = data_ttest) |> 
  report()
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The Welch Two Sample t-test testing the difference of viral_load by
## treatment_group (mean in group Drug_A = 201.72, mean in group Drug_B = 186.59)
## suggests that the effect is positive, statistically not significant, and small
## (difference = 15.13, 95% CI [-2.18, 32.44], t(96.45) = 1.73, p = 0.086; Cohen's
## d = 0.35, 95% CI [-0.05, 0.75])

1.2 Paired t-Test

Rationale:

Compares the means of the same group measured twice (pre- and post-treatment).

Use Case:

Measuring antibody levels before and after vaccination in the same patient group.

Assumptions:

  1. Normally distributed differences.
  2. Paired samples.

Hypotheses:

  1. H₀: No difference in mean antibody levels before and after vaccination.
  2. H₁: A difference in antibody levels exists before and after vaccination.

Dataset:

set.seed(456)
data_paired <- tibble(
  patient_id = 1:50,
  before_vaccine = rnorm(50, mean = 120, sd = 15),
  after_vaccine = rnorm(50, mean = 145, sd = 20)
)

# Reshape the data for plotting
data_long <- data_paired |> 
  pivot_longer(cols = c(before_vaccine, after_vaccine), 
               names_to = "Vaccine_status", 
               values_to = "antibody_levels")

# Set factor levels for ordering
data_long$Vaccine_status <- factor(data_long$Vaccine_status, levels = c("before_vaccine", "after_vaccine"))

datatable(
  data_paired
)

Visualization

data_long |> 
  ggstatsplot::ggwithinstats(x = Vaccine_status, y = antibody_levels, type = "parametric") +
  kellz_theme

Output:

t.test(data_paired$after_vaccine, data_paired$before_vaccine, paired = TRUE) |> 
  report()
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The Paired t-test testing the difference between data_paired$after_vaccine and
## data_paired$before_vaccine (mean difference = 24.66) suggests that the effect
## is positive, statistically significant, and large (difference = 24.66, 95% CI
## [17.82, 31.51], t(49) = 7.24, p < .001; Cohen's d = 1.02, 95% CI [0.68, 1.36])

Interpretation:

The paired t-test revealed a significant increase in antibody levels after vaccination. The positive mean difference (17.68) and the narrow confidence interval (10.85 to 24.51) suggest a reliable and meaningful effect. The medium effect size (Cohen’s d = 0.74) further supports the practical significance of the observed change in antibody levels, indicating a substantial improvement due to the vaccination.

1.3 One-Way ANOVA

Rationale:

Compares means of three or more independent groups.

Use Case:

Evaluating bacterial growth under three antibiotic treatments.

Assumptions:

  1. The dependent variable (bacterial growth) is normally distributed.
  2. Independent groups.
  3. Homogeneity of variances.

Hypotheses:

  1. H₀: The means are the same across all groups.
  2. H₁: At least one group mean is different.

Dataset:

set.seed(789)
data_anova <- tibble(
  antibiotic = rep(c("Antibiotic_A", "Antibiotic_B", "Antibiotic_C"), each = 30),
  bacterial_growth = c(rnorm(30, mean = 50, sd = 10), 
                       rnorm(30, mean = 40, sd = 12), 
                       rnorm(30, mean = 60, sd = 15))
)

datatable(
  data_anova
)

Visualization:

data_anova |> 
  ggstatsplot::ggbetweenstats(x = antibiotic, y = bacterial_growth, type = "parametric") +
  kellz_theme

Output:

aov(bacterial_growth ~ antibiotic, data = data_anova) |> 
  report()
## The ANOVA (formula: bacterial_growth ~ antibiotic) suggests that:
## 
##   - The main effect of antibiotic is statistically significant and large (F(2,
## 87) = 22.20, p < .001; Eta2 = 0.34, 95% CI [0.20, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

Interpretation:

The one-way ANOVA revealed a significant difference in bacterial growth across the antibiotic groups (F(2, 87) = 23.96, p < .001). This indicates that at least one antibiotic has a different effect on bacterial growth compared to the others. The large effect size (Eta2 = 0.36) suggests that the differences are substantial and not just statistically significant, implying a meaningful practical impact. Further analysis, such as post-hoc tests such as Tukey’s HSD, would be necessary to identify which specific antibiotic groups differ significantly from each other.

1.4 Two-Way ANOVA

Rationale:

Compares the means of groups classified by two factors.

Use Case:

Evaluating interaction between drug dosage and time on immune response.

Assumptions:

  1. The dependent variable (immune response) is continuous and normally distributed.
  2. Independent groups for each factor.

Hypotheses:

  1. H₀: No interaction between drug dosage and time.
  2. H₁: There is an interaction between drug dosage and time.

Dataset:

set.seed(789)
data_twoway_anova <- tibble(
  dosage = rep(c("Low", "Medium", "High"), each = 30),
  time = rep(c("1_week", "2_weeks"), each = 45),
  immune_response = rnorm(90, mean = 150, sd = 20)
)

datatable(
  data_twoway_anova
)

Visualization:

data_twoway_anova |> 
  ggstatsplot::ggbetweenstats(x = dosage, y = immune_response, fill = time,
                              title = "Immune Response by Dosage and Time",
                              xlab = "Dosage", ylab = "Immune Response") +
  kellz_theme

data_twoway_anova |> 
  grouped_ggbetweenstats(
    x = dosage,
    y = immune_response,
    grouping.var = time,
    type = "parametric",
    pairwise.display = "all",
    p.adjust.method = "fdr", ## adjust p-values for multiple tests using this method
    package = "ggsci",
    palette = "default_jco",
    plotgrid.args = list(nrow = 2)
  ) 

Output:

aov(immune_response ~ dosage * time, data = data_twoway_anova) |> 
  report()
## The ANOVA (formula: immune_response ~ dosage * time) suggests that:
## 
##   - The main effect of dosage is statistically not significant and small (F(2,
## 86) = 1.62, p = 0.204; Eta2 (partial) = 0.04, 95% CI [0.00, 1.00])
##   - The main effect of time is statistically not significant and very small (F(1,
## 86) = 6.65e-03, p = 0.935; Eta2 (partial) = 7.74e-05, 95% CI [0.00, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

Interpretation:

The two-way ANOVA revealed no significant interaction between drug dosage and time (F(2, 86) = 0.82, p = 0.445; Eta2 (partial) = 0.02, 95% CI [0.00, 1.00]). This indicates that the effects of dosage and time on immune response are independent of each other. Additionally, neither dosage (F(2, 86) = 0.82, p = 0.445; Eta2 (partial) = 0.02, 95% CI [0.00, 1.00]) nor time (F(1, 86) = 0.06, p = 0.809; Eta2 (partial) = 6.86e-04, 95% CI [0.00, 1.00]) has a significant main effect on immune response. The small effect sizes for both dosage and time further support the conclusion that these factors are not major determinants of immune response in this analysis.

1.5 Repeated Measures ANOVA

Rationale:

Used for measuring the same subjects multiple times.

Use Case:

Measuring cytokine levels over time after drug treatment.

Assumptions:

  1. Sphericity (Mauchly’s test).
  2. Normality of dependent variable.

Hypotheses:

  1. H₀: No change in cytokine levels over time.
  2. H₁: Cytokine levels change over time.

Dataset:

set.seed(101)
data_rmanova <- tibble(
  patient_id = rep(1:30, each = 4),
  time = rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30),
  cytokine_levels = rnorm(120, mean = 50, sd = 15)
)

datatable(
  data_rmanova
)

Visualization:

# Create the dataset
data_rmanova <- tibble(
  patient_id = rep(1:30, each = 4),
  time = factor(rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30), levels = c("Pre", "1_hour", "6_hours", "24_hours")),
  cytokine_levels = rnorm(120, mean = 50, sd = 15)
)

# Plot the data
data_rmanova |> 
  ggstatsplot::ggwithinstats(x = time, y = cytokine_levels) +
  kellz_theme

Output:

# Create the dataset
data_rmanova <- tibble(
  patient_id = rep(1:30, each = 4),
  time = rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30),
  cytokine_levels = rnorm(120, mean = 50, sd = 15)
)

# Convert 'time' to a factor
data_rmanova$time <- factor(data_rmanova$time, levels = c("Pre", "1_hour", "6_hours", "24_hours"))

# Run repeated measures ANOVA
anova_results <- aov_ez(
  id = "patient_id",
  dv = "cytokine_levels",
  within = "time",
  data = data_rmanova
) 

# Print the results
summary(anova_results)
## Warning in summary.Anova.mlm(object$Anova, multivariate = FALSE): HF eps > 1
## treated as 1
## 
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
## 
##             Sum Sq num Df Error SS den Df   F value Pr(>F)    
## (Intercept) 321474      1     4841     29 1925.7959 <2e-16 ***
## time           300      3    19358     87    0.4498 0.7181    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Mauchly Tests for Sphericity
## 
##      Test statistic p-value
## time        0.91265 0.77149
## 
## 
## Greenhouse-Geisser and Huynh-Feldt Corrections
##  for Departure from Sphericity
## 
##       GG eps Pr(>F[GG])
## time 0.94729     0.7077
## 
##        HF eps Pr(>F[HF])
## time 1.060931  0.7180904

Interpretation:

The repeated measures ANOVA revealed no significant change in cytokine levels over time (F(3, 87) = 0.1473, p = 0.9312). This suggests that the observed variations in cytokine levels across the different time points are likely due to factors other than a systematic change over time. The lack of a significant effect of time indicates that the cytokine levels remained relatively stable throughout the study period.

1.6 Pearson’s Correlation Coefficient

Rationale:

Measures the strength and direction of the linear relationship between two continuous variables.

Use Case:

Examines whether there is a significant relationship between patients’ systolic blood pressure and their cholesterol levels. If the two variables have a linear relationship, this test can quantify the strength of that relationship.

Assumptions

  1. The two variables are continuous.
  2. There is a linear relationship between the two variables.
  3. The data for both variables are normally distributed.
  4. Homoscedasticity: the spread of residuals (differences between observed and predicted values) is consistent for all predicted values.

Hypotheses:

H₀: There is no linear correlation between the two variables (r = 0). H₁: There is a significant linear correlation between the two variables (r ≠ 0).

Dataset:

set.seed(123)
data_pearson <- tibble(
  systolic_bp = rnorm(100, mean = 120, sd = 15),
  cholesterol = rnorm(100, mean = 200, sd = 30)
)

datatable(
  data_pearson
)

Visualization:

data_pearson |> 
  ggstatsplot::ggscatterstats(x = systolic_bp, y = cholesterol, type = "parametric") +
  kellz_theme

Output:

cor.test(data_pearson$systolic_bp, data_pearson$cholesterol,
         method = "pearson",
         exact = F) |> 
  report()
## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Pearson's product-moment correlation between data_pearson$systolic_bp and
## data_pearson$cholesterol is negative, statistically not significant, and tiny
## (r = -0.05, 95% CI [-0.24, 0.15], t(98) = -0.49, p = 0.625)

2. Nonparametric Tests

2.1 Mann-Whitney U Test

Rationale:

Nonparametric test to compare two independent groups.

Use Case:

Comparing viral load suppression between two HIV treatments when data is not normally distributed.

Assumptions:

  1. Ordinal or continuous data.
  2. Independent groups.
  3. No normality assumption.

Hypotheses:

  1. H₀: No difference in viral load distribution.
  2. H₁: A difference in viral load distribution exists.

Dataset:

set.seed(111)
data_mw <- tibble(
  treatment_group = rep(c("Drug_A", "Drug_B"), each = 50),
  viral_load = c(rnorm(50, mean = 200, sd = 80), rnorm(50, mean = 180, sd = 85))
)

datatable(
  data_mw
)

Visualization:

data_mw |> 
  ggstatsplot::ggbetweenstats(x = treatment_group, y = viral_load, type = "nonparametric") +
  kellz_theme

Output:

# Perform the Mann-Whitney U Test
test_result <- wilcox.test(data_mw$viral_load ~ data_mw$treatment_group)
report(test_result)
## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Wilcoxon rank sum test with continuity correction testing the difference in
## ranks between data_mw$viral_load and data_mw$treatment_group suggests that the
## effect is negative, statistically not significant, and very small (W = 1157.00,
## p = 0.524; r (rank biserial) = -0.07, 95% CI [-0.29, 0.15])

Interpretation:

The test statistic W=1157.00 and the p-value p=0.524 indicate that there is no statistically significant difference in viral load distribution between the two treatment groups. The p-value is greater than the common significance level of 0.05, so we fail to reject the null hypothesis.

The effect size, represented by the rank biserial correlation r=−0.07, suggests a very small effect, implying that any difference in viral load between the groups is negligible. The confidence interval for the effect size [−0.29,0.15] ncludes zero, further supporting the conclusion that there is no meaningful difference.

In summary, there is no significant evidence to suggest that one treatment is more effective than the other in terms of viral load suppression in this dataset.

2.2 Wilcoxon Signed-Rank Test

Rationale:

Nonparametric alternative to the paired t-test.

Use Case:

Comparing pre- and post-treatment viral loads in patients when the data is not normally distributed.

Assumptions:

  1. Paired samples.
  2. Ordinal or continuous data.
  3. No normality assumption.

Hypotheses:

  1. H₀: The median difference in viral loads before and after treatment is zero.
  2. H₁: The median difference is not zero.

Dataset:

set.seed(123) # For reproducibility
data_wilcox <- tibble(
  patient_id = 1:30,
  pre_treatment = rnorm(30, mean = 200, sd = 50),
  post_treatment = rnorm(30, mean = 180, sd = 50)
)

# Reshape the data for plotting
data_long <- data_wilcox %>%
  pivot_longer(cols = c(pre_treatment, post_treatment), 
               names_to = "Treatment_period", 
               values_to = "Viral_load")

# Set factor levels for ordering
data_long$Treatment_period <- factor(data_long$Treatment_period, levels = c("pre_treatment", "post_treatment"))

datatable(
  data_long
)

Visualization:

ggstatsplot::ggbetweenstats(
  data = data_long,
  x = Treatment_period,
  y = Viral_load,
  type = "np", # Nonparametric test
  paired = TRUE
) +
  kellz_theme

Output:

# Perform the Wilcoxon Signed-Rank Test
wilcox_test_result <- wilcox.test(
  data_wilcox$pre_treatment, 
  data_wilcox$post_treatment, 
  paired = TRUE
)
report(wilcox_test_result)
## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Wilcoxon signed rank exact test testing the difference in ranks between
## data_wilcox$pre_treatment and data_wilcox$post_treatment suggests that the
## effect is positive, statistically not significant, and small (W = 257.00, p =
## 0.626; r (rank biserial) = 0.11, 95% CI [-0.30, 0.47])

Interpretation:

The Wilcoxon signed-rank test revealed no significant difference in the median viral load before and after treatment (W = 257.00, p = 0.626). This suggests that the observed changes in viral load are not statistically significant. The small effect size (r = 0.11) and the confidence interval (-0.30, 0.47) that includes zero further support the conclusion that there is no meaningful difference in viral load between the pre-treatment and post-treatment measurements.

2.3 Kruskal-Wallis Test

Rationale:

Nonparametric alternative to ANOVA for comparing more than two groups.

Use Case:

Comparing immune response across three treatment groups when the data is non-normally distributed.

Assupmtions:

  1. Independence: Observations are independent of each other. Ordinal Data: The test is appropriate for ordinal data or continuous data that is not normally distributed.
  2. Similar Shape Distributions: The distributions of the groups should have similar shapes.

Hypotheses:

  1. H₀: There is no difference in the median immune response across the treatment groups.
  2. H₁: There is a difference in the median immune response across the treatment groups.

Dataset:

set.seed(222)
data_kw <- tibble(
  treatment_group = rep(c("Treatment_A", "Treatment_B", "Treatment_C"), each = 30),
  immune_response = c(rnorm(30, mean = 150, sd = 50),
                      rnorm(30, mean = 140, sd = 55),
                      rnorm(30, mean = 160, sd = 45))
)

datatable(
  data_kw
)

Visualization:

data_kw |> 
  ggstatsplot::ggbetweenstats(x = treatment_group, y = immune_response, type = "nonparametric") +
  kellz_theme

Output:

kruskal_immune_trt <- kruskal.test(data_kw$immune_response ~ data_kw$treatment_group)
report(kruskal_immune_trt)
## Effect sizes were labelled following Field's (2013) recommendations.
## 
## The Kruskal-Wallis rank sum test testing the difference in ranks between
## data_kw$immune_response and data_kw$treatment_group suggests that the effect is
## statistically not significant, and small (Kruskal-Wallis chi2 = 1.11, p =
## 0.575; Epsilon squared (rank) = 0.01, 95% CI [1.61e-03, 1.00])

Interpretation:

The Kruskal-Wallis rank sum test revealed a significant difference in the median immune response across the treatment groups (Kruskal-Wallis chi2 = 9.07, p = 0.011). This indicates that at least one treatment group has a different median immune response compared to the others. The medium effect size (Epsilon squared (rank) = 0.10) suggests that the observed differences are moderately meaningful, implying a practical impact beyond statistical significance. Further analysis, such as post-hoc tests such as Dunns test, would be necessary to identify which specific treatment groups differ significantly from each other.

2.4 Friedman Test

Rationale:

Nonparametric alternative to repeated measures ANOVA.

Use Case:

Measuring cytokine levels at multiple time points in patients, where the data is non-normally distributed.

Assumptions:

  1. Repeated measures on the same subjects.
  2. Ordinal or continuous data.
  3. No normality assumption.

Hypotheses:

  1. H₀: There is no change in cytokine levels over time.
  2. H₁: There is a change in cytokine levels over time.

Dataset:

set.seed(444)
data_friedman <- tibble(
  patient_id = rep(1:30, each = 4),
  time = rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30),
  cytokine_levels = rnorm(120, mean = 80, sd = 25)
)

# Convert 'time' to a factor
data_friedman$time <- factor(data_friedman$time, levels = c("Pre", "1_hour", "6_hours", "24_hours"))

datatable(
  data_friedman
)

Visualization:

data_friedman %>%
  ggstatsplot::ggwithinstats(x = time, y = cytokine_levels, type = "nonparametric") +
  kellz_theme

Output:

# Convert the data to a wide format for the Friedman test
data_friedman_wide <- data_friedman |> 
  pivot_wider(names_from = time, values_from = cytokine_levels)

# Perform the Friedman test
friedman.test(as.matrix(data_friedman_wide[,-1])) |> 
  report()
## Effect sizes were labelled following Landis' (1977) recommendations.
## 
## The Friedman rank sum test testing the difference in rank for
## as.matrix(data_friedman_wide[, -1]) and true location of 0 suggests that the
## effect is statistically not significant, and in slight agreement (chi2 = 2.28,
## p = 0.516; Kendall's W = 0.03, 95% CI [8.44e-03, 1.00])

Interpretation:

The Friedman rank sum test revealed no significant difference in cytokine levels across the different time points (chi2 = 2.28, p = 0.516). This suggests that the observed variations in cytokine levels are not statistically significant. The small effect size (Kendall’s W = 0.03) and the confidence interval (8.44e-03, 1.00) that includes zero further support the conclusion that there is no meaningful change in cytokine levels over time.

2.5 Spearman’s Rank Correlation

Rationale:

Nonparametric alternative to Pearson’s correlation. Used when data are ordinal or not normally distributed.

Use Case:

Evaluating the correlation between patient age and immune response after treatment when the data is not normally distributed.

Assumptions:

  1. Ordinal or continuous variables.
  2. Monotonic relationship between variables.

Hypotheses:

  1. H₀: There is no correlation between age and immune response.
  2. H₁: There is a correlation between age and immune response.

Dataset:

set.seed(555)
data_spearman <- tibble(
  age = sample(20:80, 100, replace = TRUE),
  immune_response = rnorm(100, mean = 120, sd = 30)
)

datatable(
  data_spearman
)

Visualization:

data_spearman |> 
  ggstatsplot::ggscatterstats(x = age, y = immune_response, type = "nonparametric") +
  kellz_theme
## `stat_xsidebin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_ysidebin()` using `bins = 30`. Pick better value with `binwidth`.

Output:

cor.test(data_spearman$age, data_spearman$immune_response, method = c("spearman")) |> 
  report()
## Warning in cor.test.default(data_spearman$age, data_spearman$immune_response, :
## Cannot compute exact p-value with ties
## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Spearman's rank correlation rho between data_spearman$age and
## data_spearman$immune_response is negative, statistically not significant, and
## tiny (rho = -0.02, S = 1.71e+05, p = 0.806)

Interpretation:

The Spearman’s rank correlation analysis revealed a weak positive association between age and immune response (rho = 0.11). However, this relationship is not statistically significant (p = 0.255), suggesting that the observed association is likely due to chance or other factors. The small effect size (rho = 0.11) further supports the conclusion that there is no meaningful correlation between age and immune response in this dataset.

2.6 Pearson’s Chi-Squared Test

Rationale:

Used to test the association between two categorical variables.

Use Case:

Testing the association between antibiotic resistance (yes/no) and infection type (bacterial/viral).

Assumptions:

  1. Categorical data.
  2. Expected frequency > 5 in each cell.

Hypotheses:

  1. H₀: No association between antibiotic resistance and infection type.
  2. H₁: There is an association between antibiotic resistance and infection type.

Dataset:

set.seed(666)
data_chisq <- tibble(
  infection_type = sample(c("Bacterial", "Viral"), 100, replace = TRUE),
  antibiotic_resistance = sample(c("Yes", "No"), 100, replace = TRUE)
)

datatable(
  data_chisq
)

Visualization:

data_chisq |> 
  ggbarstats(
    x = antibiotic_resistance,
    y = infection_type,
    digits.perc = 1,
    label = "both"
  )

Output:

chi_infection_type <-  chisq.test(data_chisq$infection_type, data_chisq$antibiotic_resistance)
report(chi_infection_type)
## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Pearson's Chi-squared test with Yates' continuity correction of
## independence between data_chisq$infection_type and suggests that the effect is
## statistically not significant, and tiny (chi2 = 0.11, p = 0.739; Adjusted
## Cramer's v = 0.00, 95% CI [0.00, 1.00])

Interpretation

The Pearson’s Chi-squared test with Yates’ continuity correction revealed no significant association between antibiotic resistance and infection type (chi2 = 0.00, p > .999). This suggests that the observed relationship is likely due to chance or other factors and does not represent a meaningful association between the two variables. The tiny effect size (adjusted Cramer’s v = 0.00) and the confidence interval (0.00, 1.00) that includes zero further support this conclusion.

2.7 Fisher’s Exact Test

Rationale:

Used when sample sizes are small, and Pearson’s chi-squared test assumptions are not met.

Use Case:

Testing the association between vaccination status (vaccinated/not vaccinated) and infection outcome (infected/not infected) in a small sample.

Assumptions:

  1. Small sample sizes.
  2. Categorical variables.

Hypotheses:

  1. H₀: No association between vaccination status and infection outcome.
  2. H₁: There is an association between vaccination status and infection outcome.

Dataset:

set.seed(777)
data_fisher <- tibble(
  vaccination_status = c(rep("Vaccinated", 15), rep("Not Vaccinated", 15)),
  infection_outcome = c(rep("Infected", 5), rep("Not Infected", 10),
                       rep("Infected", 3), rep("Not Infected", 12))
)

datatable(
  data_fisher
)

Visualization:

data_fisher |> 
  ggbarstats(
    x = infection_outcome,
    y = vaccination_status,
    digits.perc = 1,
    label = "both"
  )

Output:

fisher_infection_type <-  fisher.test(data_fisher$infection_outcome, data_fisher$vaccination_status)
report(fisher_infection_type)
## Effect sizes were labelled following Funder's (2019) recommendations.
## 
## The Fisher's Exact Test for Count Data testing the association between the
## variables of the data_fisher$infection_outcome and
## data_fisher$vaccination_status dataset suggests that the effect is
## statistically not significant, and tiny (odds ratio = 0.51, 95% CI [0.06,
## 3.42], p = 0.682; Adjusted Cramer's v = 0.00, 95% CI [0.00, 0.48])

Interpretation:

The Fisher’s Exact Test revealed no significant association between vaccination status and infection outcome (odds ratio = 0.51, p = 0.682). This suggests that the observed relationship is likely due to chance or other factors and does not represent a meaningful association between the two variables. The tiny effect size (adjusted Cramer’s v = 0.00) and the confidence interval (0.00, 0.48) that includes zero further support this conclusion.

Conclusion

In this tutorial, we explored a comprehensive range of statistical tests used in medical research, distinguishing between parametric and non-parametric methods. We began with essential parametric tests such as the independent t-test, ANOVA etc, discussing their assumptions, and followed up with non-parametric counterparts like the Mann-Whitney U test, Kruskal-Wallis test, and Friedman test, which are appropriate when parametric assumptions (e.g., normality) are not met. Additionally, advanced methods were highlighted for complex research designs.

A key takeaway from this tutorial is the importance of testing assumptions before applying any statistical method. We reviewed how normality tests like the Shapiro-Wilk, and Kolmogorov-Smirnov tests help assess whether data follows a normal distribution. When assumptions such as normality, homogeneity of variance, or linearity are violated, switching to non-parametric methods or using data transformations is crucial to ensure the validity of the results.

Moreover, we discussed post hoc tests for multiple comparisons, like Tukey’s HSD, which are essential when significant differences are detected in ANOVA. Visualization techniques using ggplot2 and ggstatsplot were emphasized throughout the session, showing how data visualization is key to communicating findings effectively.

A significant addition to this seminar is the inclusion of permutation tests, a flexible, non-parametric approach that is invaluable when traditional parametric test assumptions are not satisfied or when sample sizes are small. Permutation tests do not rely on distributional assumptions, making them particularly useful in fields such as virology or immunology where data often violate normality assumptions. By generating a distribution based on the data itself, permutation tests allow for robust comparisons between groups, correlations, or model coefficients.

In conclusion, the tutorial covered a wide array of statistical tests relevant to medical research, detailing when and how to use each test, checking assumptions, and the steps to take if these assumptions are violated. Whether analysing the effectiveness of treatments, investigating disease progression, or exploring associations between clinical biomarkers, the statistical methods discussed here equip researchers with the tools to make informed decisions. By applying the right statistical tests in conjunction with proper visualizations, we can ensure the integrity and clarity of research findings in the medical and public health domains.

About

KP Analytics Insights (Pty) Ltd is a reputable data-driven research and consulting company based in South Africa. Our primary objective is to facilitate evidence-based decision-making in the field of medical science by offering valuable insights and innovative solutions. We cater to a diverse range of clients, including postgraduate students, independent researchers, medical scientists, as well as organizations and institutions seeking comprehensive statistical analysis, interpretation, reporting, and actionable insights.

At KP Analytics Insights, our portfolio encompasses a wide array of projects focused on sexually transmitted infections (STIs), human papillomavirus (HPV) research, HIV-1 drug resistance, HBV drug resistance, and the enhancement of laboratory workflows through the implementation of statistical quality control methods. Moreover, we actively contribute to the advancement of medical research through rigorous peer reviews, publication of scholarly articles, and the development of interactive dashboards.

Visit our website KP Analytics Insights (Pty) Ltd to gain a comprehensive understanding of how our expertise can assist you in harnessing the power of data-driven insights. Let us be your partner in transforming data into knowledge.