Statistical Methods in medical science: applications in virology, microbiology, immunology, and infectious diseases
Introduction
In medical research, statistical methods are essential for analysing data and making evidence-based decisions. Medical fields such as virology, microbiology, immunology, and infectious diseases often deal with variables like viral load, antibody titres, and pathogen counts, which must be rigorously analyzed to draw accurate conclusions about patient outcomes, disease progression, and treatment efficacy.
This tutorial will focus on key normality, parametric and non-parametric statistical tests, guiding you on when and how to use them based on specific assumptions about your data. We will use case studies from virology (particularly STIs), microbiology, immunology, and infectious diseases to illustrate how these tests apply in real-world scenarios.
The statistical tests covered include:
Normality tests, such as Shapiro-Wilk and Kolmogorov-Smirnov. Parametric tests, such as the t-test and ANOVA, which assume normal distribution. Non-parametric tests, such as the Mann-Whitney U test and Wilcoxon Signed-Rank test, which do not require normal distribution. Post hoc tests, for multiple comparisons after ANOVA or non-parametric tests.
For each test, we will cover:
- Rationale
- Use case
- Assumptions
- Hypotheses
- Visualizations
- Output
This tutorial will guide you through conducting these tests using R with the tidyverse, report, and ggstatsplot packages, and interpreting the results for medical applications.
What does it mean when a variable is normally distributed?
When we say a variable is normally distributed, we mean that its values follow a pattern commonly described by the “bell curve” or normal distribution. A normally distributed variable has:
- Symmetry around its mean: Most data points are clustered around the mean, and the distribution is balanced on both sides.
2, A peak at the mean: The highest frequency of values occurs at the mean, while frequencies gradually decrease as values move further from the mean.
- Consistent properties: Approximately 68% of the values fall within one standard deviation of the mean, about 95% within two standard deviations, and around 99.7% within three standard deviations.
A normal distribution allows for the use of parametric tests, which assume specific statistical properties like the mean and SD, to infer patterns and relationships within the data.
What does it mean when a variable is not normally distributed?
When a variable is not normally distributed, its values deviate from the characteristics of a bell curve. This non-normality can occur in several ways:
Skewness: The data may lean towards one side (left or right), causing an asymmetric shape. For example, income data is often positively skewed, with a few very high values extending the tail to the right.
Kurtosis: A non-normally distributed variable may have heavier or lighter tails than a normal distribution, affecting the likelihood of extreme values. High kurtosis indicates more frequent extreme values, while low kurtosis shows fewer.
Multiple Modes: A dataset might have more than one peak (bimodal or multimodal), indicating that the data likely originates from different groups or distributions.
Non-normally distributed data often suggests the presence of underlying factors affecting the data, such as natural limitations, extreme values, or subgroup differences. Such data may require non-parametric tests, which do not assume normality, to provide valid insights and robust results.
Normality tests
1. Shapiro-Wilk Tests
Description of the Shapiro-Wilk Test
The Shapiro-Wilk test is a statistical test designed to evaluate whether a dataset follows a normal distribution. It is particularly well-suited for smaller samples (usually less than 50), though it can still be effective with larger samples. The test compares the observed data to an idealized, perfectly normal distribution. If the observed data matches the expected distribution, the test produces a high p-value, indicating that any deviation from normality is not statistically significant. However, if the data significantly differs from a normal distribution, the p-value will be low, prompting us to consider the data as non-normally distributed.
Example 1: Patient blood pressure in a clinical trial
Rationale:
To determine if systolic blood pressure data of a sample of patients is normally distributed. Normality is essential for the validity of parametric tests, especially in clinical trials.
Use Case:
Evaluating systolic blood pressure among 30 patients in a clinical trial to check if data meets normality for use in further parametric analyses.
Assumptions:
- Data is continuous.
- Data is randomly sampled from the population.
Hypotheses:
- Null Hypothesis (H₀): Systolic blood pressure data follows a normal distribution.
- Alternative Hypothesis (H₁): Systolic blood pressure data does not follow a normal distribution.
Dataset
Visualizations:
# Histogram
ggplot(data.frame(data_shapiro), aes(x = blood_pressure)) +
geom_histogram(aes(y = ..density..), color = "black", fill = "skyblue", bins = 10) +
geom_density(alpha = 0.2, fill = "#FF6666") +
ggtitle("Histogram of Systolic Blood Pressure") +
kellz_theme# QQ Plot
ggplot(data.frame(data_shapiro), aes(sample = blood_pressure)) +
stat_qq() + stat_qq_line() +
ggtitle("QQ Plot for Systolic Blood Pressure") +
kellz_themeOutput
##
## Shapiro-Wilk normality test
##
## data: data_shapiro$blood_pressure
## W = 0.9956, p-value = 0.9876
Interpretation:
“The Shapiro-Wilk test assessing normality of data_shapiro$blood_pressure indicates a slight deviation from normality (W = 0.97022, p-value = 0.02299). Since the p-value is below the common significance threshold of 0.05, this result suggests that the distribution of blood pressure data is not perfectly normal. However, given that the test statistic (W) is close to 1, the deviation may not be substantial. Researchers may still consider normality approximately satisfied, but depending on the analysis requirements, alternative methods or transformations could be explored.
Additionally:
If the p-value > 0.05, this suggests that systolic blood pressure measurements do not significantly deviate from a normal distribution. This outcome supports the use of parametric tests, such as t-tests, to further analyze blood pressure data in this clinical trial. Conversely, if the p-value < 0.05, it indicates the data distribution is significantly different from normality. In that case, relying on non-parametric methods might yield more reliable insights when examining blood pressure variations in the sample.
Example 2: White blood cell count (WBC) in cancer patients
Rationale:
Testing the normality of WBC counts helps determine the best statistical approach for analysis, as WBC distribution is often skewed due to variability in immune response.
Use Case:
Assessing normality in WBC counts among 50 cancer patients to choose between parametric or non-parametric tests.
Assumptions:
- Data is continuous.
- The sample is random.
Hypotheses:
- Null Hypothesis (H₀): White blood cell counts follow a normal distribution.
- Alternative Hypothesis (H₁): White blood cell counts do not follow a normal distribution.
Dataset
Visualizations:
# Histogram
ggplot(wbc_count_shapiro, aes(x = wbc_count)) +
geom_histogram(aes(y = ..density..), color = "black", fill = "lightgreen", bins = 12) +
geom_density(alpha = 0.2, fill = "#66CCFF") +
ggtitle("Histogram of WBC Counts") +
kellz_theme# QQ Plot
ggplot(wbc_count_shapiro, aes(sample = wbc_count)) +
stat_qq() + stat_qq_line() +
ggtitle("QQ Plot for WBC Counts") +
kellz_themeOutput
##
## Shapiro-Wilk normality test
##
## data: wbc_count_shapiro$wbc_count
## W = 0.9115, p-value = 0.001178
Interpretation:
The Shapiro-Wilk test assessing normality of wbc_count_shapiro$wbc_count indicates a statistically significant deviation from normality (W = 0.9366, p-value = 0.009907). Given that the p-value is below 0.05, the distribution of white blood cell counts is likely non-normal. This suggests that parametric analyses assuming normality may not be appropriate, and non-parametric methods or data transformations could be considered.
Additionally:
A p-value > 0.05 implies that WBC counts appear to follow a normal distribution, suggesting that parametric tests are suitable for deeper investigation. Given the critical role of WBC in assessing immune response, this normal distribution would allow for more conventional statistical modeling. If, however, the p-value < 0.05, it reveals significant deviations from normality, highlighting the need for non-parametric techniques. These methods would be better suited for capturing the potentially skewed nature of WBC counts due to variations in immune response across patients.
Example 3: Cholesterol levels in middle-aged adults
Rationale:
Cholesterol levels are a key cardiovascular health indicator, so normality testing helps ensure reliable parametric testing results.
Use Case:
Evaluating cholesterol levels in 40 middle-aged adults for parametric suitability.
Assumptions:
- Data is continuous.
- Data is randomly sampled.
Hypotheses:
- Null Hypothesis (H₀): Cholesterol levels are normally distributed.
- Alternative Hypothesis (H₁): Cholesterol levels are not normally distributed.
Dataset
Visualizations:
# Histogram
ggplot(cholesterol_shapiro, aes(x = cholesterol)) +
geom_histogram(aes(y = ..density..), color = "black", fill = "orange", bins = 10) +
geom_density(alpha = 0.2, fill = "gold") +
ggtitle("Histogram of Cholesterol Levels") +
kellz_theme# QQ Plot
ggplot(cholesterol_shapiro, aes(sample = cholesterol)) +
stat_qq() + stat_qq_line() +
ggtitle("QQ Plot for Cholesterol Levels") +
kellz_themeOutput
##
## Shapiro-Wilk normality test
##
## data: cholesterol_shapiro$cholesterol
## W = 0.96572, p-value = 0.2613
Interpretation:
The Shapiro-Wilk test assessing normality of cholesterol_shapiro$cholesterol shows no significant deviation from normality (W = 0.97325, p-value = 0.4534). With a p-value well above 0.05, this result suggests that the cholesterol data is approximately normally distributed. Therefore, analyses assuming normality can likely proceed without concerns for this variable.
Additionally:
With a p-value > 0.05, we interpret the cholesterol levels as being normally distributed, providing a green light for parametric analyses, such as ANOVA, to explore associations with cardiovascular risk factors. A p-value < 0.05, however, would indicate that cholesterol levels are not normally distributed, suggesting that non-parametric approaches would more accurately represent this variable, especially considering the natural variability in cholesterol influenced by genetics, diet, and lifestyle.
2. Kolmogorov-Smirnov (K-S) Test
Description of the Kolmogorov-Smirnov (K-S) Test
The Kolmogorov-Smirnov (K-S) test is a non-parametric test that determines whether a sample differs from a specified distribution, most often a normal distribution. Unlike the Shapiro-Wilk test, the K-S test is more versatile and can be applied to larger sample sizes, making it useful in studies with substantial data. The test calculates the largest difference between the cumulative distribution of the sample and the cumulative distribution of the specified theoretical distribution. A high p-value suggests the sample aligns well with the specified distribution, while a low p-value indicates a significant departure from it.
Example 1: Patient glucose levels in a diabetes study
Rationale:
Assessing the distribution of fasting glucose levels helps decide between parametric or non-parametric methods, given glucose data often has skewness.
Use Case:
Analyzing fasting glucose levels of 100 diabetes patients for normality.
Assumptions:
- The data is continuous and independent.
- The sample is randomly drawn from the population.
- The K-S test compares the sample distribution to a normal distribution (or another specified distribution), assuming no specific parametric form for the underlying distribution.
Hypotheses:
Null Hypothesis (H₀): The distribution of glucose levels in glucose_ks$glucose is not significantly different from a normal distribution.
Alternative Hypothesis (H₁): The distribution of glucose levels in glucose_ks$glucose significantly differs from a normal distribution.
Dataset
Visualizations:
# Histogram
ggplot(glucose_ks, aes(x = glucose)) +
geom_histogram(aes(y = ..density..), color = "black", fill = "purple", bins = 15) +
geom_density(alpha = 0.2, fill = "violet") +
ggtitle("Histogram of Fasting Glucose Levels") +
kellz_theme# QQ Plot
ggplot(glucose_ks, aes(sample = glucose)) +
stat_qq() + stat_qq_line() +
ggtitle("QQ Plot for Fasting Glucose Levels") +
kellz_themeOutput
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: glucose_ks$glucose
## D = 0.12178, p-value = 0.103
## alternative hypothesis: two-sided
Interpretation:
The asymptotic one-sample Kolmogorov-Smirnov test examining the normality of glucose levels suggests that the data does not significantly deviate from a normal distribution (D = 0.0844, p = 0.4748). Since the p-value is greater than .05, we fail to reject the null hypothesis that the glucose values follow a normal distribution.
Example 2: Heart rates during exercise
Rationale:
Testing normality in heart rate during exercise helps determine if the data can be analyzed with parametric tests.
Use Case:
Normality assessment of exercise heart rates in 80 patients.
Assumptions:
- The heart rate data is continuous and consists of independent observations.
- The sample is representative of the population of interest and is randomly drawn.
- The K-S test assesses whether the sample distribution matches a normal distribution without assuming a parametric form for the distribution.
Hypotheses:
Null Hypothesis (H₀): The distribution of heart rate in heart_rate_ks$heart_rate is not significantly different from a normal distribution.
Alternative Hypothesis (H₁): The distribution of heart rate in heart_rate_ks$heart_rate significantly differs from a normal distribution.
Dataset:
Visualizations:
# Histogram
ggplot(heart_rate_ks, aes(x = heart_rate)) +
geom_histogram(aes(y = ..density..), color = "black", fill = "blue", bins = 15) +
geom_density(alpha = 0.2, fill = "lightblue") +
ggtitle("Histogram of Heart Rates During Exercise") +
kellz_theme# QQ Plot
ggplot(heart_rate_ks, aes(sample = heart_rate)) +
stat_qq() + stat_qq_line() +
ggtitle("QQ Plot for Heart Rates") +
kellz_themeOutput
ks.test(heart_rate_ks$heart_rate, "pnorm",
mean = mean(heart_rate_ks$heart_rate),
sd = sd(heart_rate_ks$heart_rate))##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: heart_rate_ks$heart_rate
## D = 0.061516, p-value = 0.9042
## alternative hypothesis: two-sided
Interpretation:
The exact one-sample Kolmogorov-Smirnov test examining the normality of heart rate suggests that the data does not significantly deviate from a normal distribution (D = 0.0985, p = 0.3938). Since the p-value is greater than .05, we fail to reject the null hypothesis that the heart rate values follow a normal distribution.
Assumptions and Handling Violations
For each test, after verifying assumptions (e.g., normality, homoscedasticity, sphericity), we can take specific actions if assumptions are violated:
- Transform the data (e.g., log transformation for non-normality).
- Use nonparametric alternatives (e.g., Mann-Whitney U instead of independent t-test).
- Use robust statistical methods (e.g., Welch’s ANOVA for unequal variances).
1. Parametric Tests
1.1 Independent t-Test
Rationale:
Compares the means of two independent groups.
Use Case:
Comparing the viral loads between two HIV treatment groups.
Assumptions:
- The dependent variable (viral load) is continuous and normally distributed.
- Independent groups.
- Homogeneity of variances.
Hypotheses:
- H₀: The mean viral load is the same for both treatment groups.
- H₁: The mean viral load is different for the two groups.
Dataset:
Visualization:
Output
## Effect sizes were labelled following Cohen's (1988) recommendations.
##
## The Welch Two Sample t-test testing the difference of viral_load by
## treatment_group (mean in group Drug_A = 201.72, mean in group Drug_B = 186.59)
## suggests that the effect is positive, statistically not significant, and small
## (difference = 15.13, 95% CI [-2.18, 32.44], t(96.45) = 1.73, p = 0.086; Cohen's
## d = 0.35, 95% CI [-0.05, 0.75])
1.2 Paired t-Test
Rationale:
Compares the means of the same group measured twice (pre- and post-treatment).
Use Case:
Measuring antibody levels before and after vaccination in the same patient group.
Assumptions:
- Normally distributed differences.
- Paired samples.
Hypotheses:
- H₀: No difference in mean antibody levels before and after vaccination.
- H₁: A difference in antibody levels exists before and after vaccination.
Dataset:
set.seed(456)
data_paired <- tibble(
patient_id = 1:50,
before_vaccine = rnorm(50, mean = 120, sd = 15),
after_vaccine = rnorm(50, mean = 145, sd = 20)
)
# Reshape the data for plotting
data_long <- data_paired |>
pivot_longer(cols = c(before_vaccine, after_vaccine),
names_to = "Vaccine_status",
values_to = "antibody_levels")
# Set factor levels for ordering
data_long$Vaccine_status <- factor(data_long$Vaccine_status, levels = c("before_vaccine", "after_vaccine"))
datatable(
data_paired
)Visualization
data_long |>
ggstatsplot::ggwithinstats(x = Vaccine_status, y = antibody_levels, type = "parametric") +
kellz_themeOutput:
## Effect sizes were labelled following Cohen's (1988) recommendations.
##
## The Paired t-test testing the difference between data_paired$after_vaccine and
## data_paired$before_vaccine (mean difference = 24.66) suggests that the effect
## is positive, statistically significant, and large (difference = 24.66, 95% CI
## [17.82, 31.51], t(49) = 7.24, p < .001; Cohen's d = 1.02, 95% CI [0.68, 1.36])
Interpretation:
The paired t-test revealed a significant increase in antibody levels after vaccination. The positive mean difference (17.68) and the narrow confidence interval (10.85 to 24.51) suggest a reliable and meaningful effect. The medium effect size (Cohen’s d = 0.74) further supports the practical significance of the observed change in antibody levels, indicating a substantial improvement due to the vaccination.
1.3 One-Way ANOVA
Rationale:
Compares means of three or more independent groups.
Use Case:
Evaluating bacterial growth under three antibiotic treatments.
Assumptions:
- The dependent variable (bacterial growth) is normally distributed.
- Independent groups.
- Homogeneity of variances.
Hypotheses:
- H₀: The means are the same across all groups.
- H₁: At least one group mean is different.
Dataset:
Visualization:
data_anova |>
ggstatsplot::ggbetweenstats(x = antibiotic, y = bacterial_growth, type = "parametric") +
kellz_themeOutput:
## The ANOVA (formula: bacterial_growth ~ antibiotic) suggests that:
##
## - The main effect of antibiotic is statistically significant and large (F(2,
## 87) = 22.20, p < .001; Eta2 = 0.34, 95% CI [0.20, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
Interpretation:
The one-way ANOVA revealed a significant difference in bacterial growth across the antibiotic groups (F(2, 87) = 23.96, p < .001). This indicates that at least one antibiotic has a different effect on bacterial growth compared to the others. The large effect size (Eta2 = 0.36) suggests that the differences are substantial and not just statistically significant, implying a meaningful practical impact. Further analysis, such as post-hoc tests such as Tukey’s HSD, would be necessary to identify which specific antibiotic groups differ significantly from each other.
1.4 Two-Way ANOVA
Rationale:
Compares the means of groups classified by two factors.
Use Case:
Evaluating interaction between drug dosage and time on immune response.
Assumptions:
- The dependent variable (immune response) is continuous and normally distributed.
- Independent groups for each factor.
Hypotheses:
- H₀: No interaction between drug dosage and time.
- H₁: There is an interaction between drug dosage and time.
Dataset:
Visualization:
data_twoway_anova |>
ggstatsplot::ggbetweenstats(x = dosage, y = immune_response, fill = time,
title = "Immune Response by Dosage and Time",
xlab = "Dosage", ylab = "Immune Response") +
kellz_themedata_twoway_anova |>
grouped_ggbetweenstats(
x = dosage,
y = immune_response,
grouping.var = time,
type = "parametric",
pairwise.display = "all",
p.adjust.method = "fdr", ## adjust p-values for multiple tests using this method
package = "ggsci",
palette = "default_jco",
plotgrid.args = list(nrow = 2)
) Output:
## The ANOVA (formula: immune_response ~ dosage * time) suggests that:
##
## - The main effect of dosage is statistically not significant and small (F(2,
## 86) = 1.62, p = 0.204; Eta2 (partial) = 0.04, 95% CI [0.00, 1.00])
## - The main effect of time is statistically not significant and very small (F(1,
## 86) = 6.65e-03, p = 0.935; Eta2 (partial) = 7.74e-05, 95% CI [0.00, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
Interpretation:
The two-way ANOVA revealed no significant interaction between drug dosage and time (F(2, 86) = 0.82, p = 0.445; Eta2 (partial) = 0.02, 95% CI [0.00, 1.00]). This indicates that the effects of dosage and time on immune response are independent of each other. Additionally, neither dosage (F(2, 86) = 0.82, p = 0.445; Eta2 (partial) = 0.02, 95% CI [0.00, 1.00]) nor time (F(1, 86) = 0.06, p = 0.809; Eta2 (partial) = 6.86e-04, 95% CI [0.00, 1.00]) has a significant main effect on immune response. The small effect sizes for both dosage and time further support the conclusion that these factors are not major determinants of immune response in this analysis.
1.5 Repeated Measures ANOVA
Rationale:
Used for measuring the same subjects multiple times.
Use Case:
Measuring cytokine levels over time after drug treatment.
Assumptions:
- Sphericity (Mauchly’s test).
- Normality of dependent variable.
Hypotheses:
- H₀: No change in cytokine levels over time.
- H₁: Cytokine levels change over time.
Dataset:
Visualization:
# Create the dataset
data_rmanova <- tibble(
patient_id = rep(1:30, each = 4),
time = factor(rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30), levels = c("Pre", "1_hour", "6_hours", "24_hours")),
cytokine_levels = rnorm(120, mean = 50, sd = 15)
)
# Plot the data
data_rmanova |>
ggstatsplot::ggwithinstats(x = time, y = cytokine_levels) +
kellz_themeOutput:
# Create the dataset
data_rmanova <- tibble(
patient_id = rep(1:30, each = 4),
time = rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30),
cytokine_levels = rnorm(120, mean = 50, sd = 15)
)
# Convert 'time' to a factor
data_rmanova$time <- factor(data_rmanova$time, levels = c("Pre", "1_hour", "6_hours", "24_hours"))
# Run repeated measures ANOVA
anova_results <- aov_ez(
id = "patient_id",
dv = "cytokine_levels",
within = "time",
data = data_rmanova
)
# Print the results
summary(anova_results)## Warning in summary.Anova.mlm(object$Anova, multivariate = FALSE): HF eps > 1
## treated as 1
##
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
##
## Sum Sq num Df Error SS den Df F value Pr(>F)
## (Intercept) 321474 1 4841 29 1925.7959 <2e-16 ***
## time 300 3 19358 87 0.4498 0.7181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Mauchly Tests for Sphericity
##
## Test statistic p-value
## time 0.91265 0.77149
##
##
## Greenhouse-Geisser and Huynh-Feldt Corrections
## for Departure from Sphericity
##
## GG eps Pr(>F[GG])
## time 0.94729 0.7077
##
## HF eps Pr(>F[HF])
## time 1.060931 0.7180904
Interpretation:
The repeated measures ANOVA revealed no significant change in cytokine levels over time (F(3, 87) = 0.1473, p = 0.9312). This suggests that the observed variations in cytokine levels across the different time points are likely due to factors other than a systematic change over time. The lack of a significant effect of time indicates that the cytokine levels remained relatively stable throughout the study period.
1.6 Pearson’s Correlation Coefficient
Rationale:
Measures the strength and direction of the linear relationship between two continuous variables.
Use Case:
Examines whether there is a significant relationship between patients’ systolic blood pressure and their cholesterol levels. If the two variables have a linear relationship, this test can quantify the strength of that relationship.
Assumptions
- The two variables are continuous.
- There is a linear relationship between the two variables.
- The data for both variables are normally distributed.
- Homoscedasticity: the spread of residuals (differences between observed and predicted values) is consistent for all predicted values.
Hypotheses:
H₀: There is no linear correlation between the two variables (r = 0). H₁: There is a significant linear correlation between the two variables (r ≠ 0).
Dataset:
Visualization:
data_pearson |>
ggstatsplot::ggscatterstats(x = systolic_bp, y = cholesterol, type = "parametric") +
kellz_themeOutput:
cor.test(data_pearson$systolic_bp, data_pearson$cholesterol,
method = "pearson",
exact = F) |>
report()## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Pearson's product-moment correlation between data_pearson$systolic_bp and
## data_pearson$cholesterol is negative, statistically not significant, and tiny
## (r = -0.05, 95% CI [-0.24, 0.15], t(98) = -0.49, p = 0.625)
2. Nonparametric Tests
2.1 Mann-Whitney U Test
Rationale:
Nonparametric test to compare two independent groups.
Use Case:
Comparing viral load suppression between two HIV treatments when data is not normally distributed.
Assumptions:
- Ordinal or continuous data.
- Independent groups.
- No normality assumption.
Hypotheses:
- H₀: No difference in viral load distribution.
- H₁: A difference in viral load distribution exists.
Dataset:
Visualization:
data_mw |>
ggstatsplot::ggbetweenstats(x = treatment_group, y = viral_load, type = "nonparametric") +
kellz_themeOutput:
# Perform the Mann-Whitney U Test
test_result <- wilcox.test(data_mw$viral_load ~ data_mw$treatment_group)
report(test_result)## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Wilcoxon rank sum test with continuity correction testing the difference in
## ranks between data_mw$viral_load and data_mw$treatment_group suggests that the
## effect is negative, statistically not significant, and very small (W = 1157.00,
## p = 0.524; r (rank biserial) = -0.07, 95% CI [-0.29, 0.15])
Interpretation:
The test statistic W=1157.00 and the p-value p=0.524 indicate that there is no statistically significant difference in viral load distribution between the two treatment groups. The p-value is greater than the common significance level of 0.05, so we fail to reject the null hypothesis.
The effect size, represented by the rank biserial correlation r=−0.07, suggests a very small effect, implying that any difference in viral load between the groups is negligible. The confidence interval for the effect size [−0.29,0.15] ncludes zero, further supporting the conclusion that there is no meaningful difference.
In summary, there is no significant evidence to suggest that one treatment is more effective than the other in terms of viral load suppression in this dataset.
2.2 Wilcoxon Signed-Rank Test
Rationale:
Nonparametric alternative to the paired t-test.
Use Case:
Comparing pre- and post-treatment viral loads in patients when the data is not normally distributed.
Assumptions:
- Paired samples.
- Ordinal or continuous data.
- No normality assumption.
Hypotheses:
- H₀: The median difference in viral loads before and after treatment is zero.
- H₁: The median difference is not zero.
Dataset:
set.seed(123) # For reproducibility
data_wilcox <- tibble(
patient_id = 1:30,
pre_treatment = rnorm(30, mean = 200, sd = 50),
post_treatment = rnorm(30, mean = 180, sd = 50)
)
# Reshape the data for plotting
data_long <- data_wilcox %>%
pivot_longer(cols = c(pre_treatment, post_treatment),
names_to = "Treatment_period",
values_to = "Viral_load")
# Set factor levels for ordering
data_long$Treatment_period <- factor(data_long$Treatment_period, levels = c("pre_treatment", "post_treatment"))
datatable(
data_long
)Visualization:
ggstatsplot::ggbetweenstats(
data = data_long,
x = Treatment_period,
y = Viral_load,
type = "np", # Nonparametric test
paired = TRUE
) +
kellz_themeOutput:
# Perform the Wilcoxon Signed-Rank Test
wilcox_test_result <- wilcox.test(
data_wilcox$pre_treatment,
data_wilcox$post_treatment,
paired = TRUE
)
report(wilcox_test_result)## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Wilcoxon signed rank exact test testing the difference in ranks between
## data_wilcox$pre_treatment and data_wilcox$post_treatment suggests that the
## effect is positive, statistically not significant, and small (W = 257.00, p =
## 0.626; r (rank biserial) = 0.11, 95% CI [-0.30, 0.47])
Interpretation:
The Wilcoxon signed-rank test revealed no significant difference in the median viral load before and after treatment (W = 257.00, p = 0.626). This suggests that the observed changes in viral load are not statistically significant. The small effect size (r = 0.11) and the confidence interval (-0.30, 0.47) that includes zero further support the conclusion that there is no meaningful difference in viral load between the pre-treatment and post-treatment measurements.
2.3 Kruskal-Wallis Test
Rationale:
Nonparametric alternative to ANOVA for comparing more than two groups.
Use Case:
Comparing immune response across three treatment groups when the data is non-normally distributed.
Assupmtions:
- Independence: Observations are independent of each other. Ordinal Data: The test is appropriate for ordinal data or continuous data that is not normally distributed.
- Similar Shape Distributions: The distributions of the groups should have similar shapes.
Hypotheses:
- H₀: There is no difference in the median immune response across the treatment groups.
- H₁: There is a difference in the median immune response across the treatment groups.
Dataset:
Visualization:
data_kw |>
ggstatsplot::ggbetweenstats(x = treatment_group, y = immune_response, type = "nonparametric") +
kellz_themeOutput:
kruskal_immune_trt <- kruskal.test(data_kw$immune_response ~ data_kw$treatment_group)
report(kruskal_immune_trt)## Effect sizes were labelled following Field's (2013) recommendations.
##
## The Kruskal-Wallis rank sum test testing the difference in ranks between
## data_kw$immune_response and data_kw$treatment_group suggests that the effect is
## statistically not significant, and small (Kruskal-Wallis chi2 = 1.11, p =
## 0.575; Epsilon squared (rank) = 0.01, 95% CI [1.61e-03, 1.00])
Interpretation:
The Kruskal-Wallis rank sum test revealed a significant difference in the median immune response across the treatment groups (Kruskal-Wallis chi2 = 9.07, p = 0.011). This indicates that at least one treatment group has a different median immune response compared to the others. The medium effect size (Epsilon squared (rank) = 0.10) suggests that the observed differences are moderately meaningful, implying a practical impact beyond statistical significance. Further analysis, such as post-hoc tests such as Dunns test, would be necessary to identify which specific treatment groups differ significantly from each other.
2.4 Friedman Test
Rationale:
Nonparametric alternative to repeated measures ANOVA.
Use Case:
Measuring cytokine levels at multiple time points in patients, where the data is non-normally distributed.
Assumptions:
- Repeated measures on the same subjects.
- Ordinal or continuous data.
- No normality assumption.
Hypotheses:
- H₀: There is no change in cytokine levels over time.
- H₁: There is a change in cytokine levels over time.
Dataset:
set.seed(444)
data_friedman <- tibble(
patient_id = rep(1:30, each = 4),
time = rep(c("Pre", "1_hour", "6_hours", "24_hours"), times = 30),
cytokine_levels = rnorm(120, mean = 80, sd = 25)
)
# Convert 'time' to a factor
data_friedman$time <- factor(data_friedman$time, levels = c("Pre", "1_hour", "6_hours", "24_hours"))
datatable(
data_friedman
)Visualization:
data_friedman %>%
ggstatsplot::ggwithinstats(x = time, y = cytokine_levels, type = "nonparametric") +
kellz_themeOutput:
# Convert the data to a wide format for the Friedman test
data_friedman_wide <- data_friedman |>
pivot_wider(names_from = time, values_from = cytokine_levels)
# Perform the Friedman test
friedman.test(as.matrix(data_friedman_wide[,-1])) |>
report()## Effect sizes were labelled following Landis' (1977) recommendations.
##
## The Friedman rank sum test testing the difference in rank for
## as.matrix(data_friedman_wide[, -1]) and true location of 0 suggests that the
## effect is statistically not significant, and in slight agreement (chi2 = 2.28,
## p = 0.516; Kendall's W = 0.03, 95% CI [8.44e-03, 1.00])
Interpretation:
The Friedman rank sum test revealed no significant difference in cytokine levels across the different time points (chi2 = 2.28, p = 0.516). This suggests that the observed variations in cytokine levels are not statistically significant. The small effect size (Kendall’s W = 0.03) and the confidence interval (8.44e-03, 1.00) that includes zero further support the conclusion that there is no meaningful change in cytokine levels over time.
2.5 Spearman’s Rank Correlation
Rationale:
Nonparametric alternative to Pearson’s correlation. Used when data are ordinal or not normally distributed.
Use Case:
Evaluating the correlation between patient age and immune response after treatment when the data is not normally distributed.
Assumptions:
- Ordinal or continuous variables.
- Monotonic relationship between variables.
Hypotheses:
- H₀: There is no correlation between age and immune response.
- H₁: There is a correlation between age and immune response.
Dataset:
Visualization:
data_spearman |>
ggstatsplot::ggscatterstats(x = age, y = immune_response, type = "nonparametric") +
kellz_theme## `stat_xsidebin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_ysidebin()` using `bins = 30`. Pick better value with `binwidth`.
Output:
## Warning in cor.test.default(data_spearman$age, data_spearman$immune_response, :
## Cannot compute exact p-value with ties
## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Spearman's rank correlation rho between data_spearman$age and
## data_spearman$immune_response is negative, statistically not significant, and
## tiny (rho = -0.02, S = 1.71e+05, p = 0.806)
Interpretation:
The Spearman’s rank correlation analysis revealed a weak positive association between age and immune response (rho = 0.11). However, this relationship is not statistically significant (p = 0.255), suggesting that the observed association is likely due to chance or other factors. The small effect size (rho = 0.11) further supports the conclusion that there is no meaningful correlation between age and immune response in this dataset.
2.6 Pearson’s Chi-Squared Test
Rationale:
Used to test the association between two categorical variables.
Use Case:
Testing the association between antibiotic resistance (yes/no) and infection type (bacterial/viral).
Assumptions:
- Categorical data.
- Expected frequency > 5 in each cell.
Hypotheses:
- H₀: No association between antibiotic resistance and infection type.
- H₁: There is an association between antibiotic resistance and infection type.
Dataset:
Visualization:
data_chisq |>
ggbarstats(
x = antibiotic_resistance,
y = infection_type,
digits.perc = 1,
label = "both"
)Output:
chi_infection_type <- chisq.test(data_chisq$infection_type, data_chisq$antibiotic_resistance)
report(chi_infection_type)## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Pearson's Chi-squared test with Yates' continuity correction of
## independence between data_chisq$infection_type and suggests that the effect is
## statistically not significant, and tiny (chi2 = 0.11, p = 0.739; Adjusted
## Cramer's v = 0.00, 95% CI [0.00, 1.00])
Interpretation
The Pearson’s Chi-squared test with Yates’ continuity correction revealed no significant association between antibiotic resistance and infection type (chi2 = 0.00, p > .999). This suggests that the observed relationship is likely due to chance or other factors and does not represent a meaningful association between the two variables. The tiny effect size (adjusted Cramer’s v = 0.00) and the confidence interval (0.00, 1.00) that includes zero further support this conclusion.
2.7 Fisher’s Exact Test
Rationale:
Used when sample sizes are small, and Pearson’s chi-squared test assumptions are not met.
Use Case:
Testing the association between vaccination status (vaccinated/not vaccinated) and infection outcome (infected/not infected) in a small sample.
Assumptions:
- Small sample sizes.
- Categorical variables.
Hypotheses:
- H₀: No association between vaccination status and infection outcome.
- H₁: There is an association between vaccination status and infection outcome.
Dataset:
Visualization:
data_fisher |>
ggbarstats(
x = infection_outcome,
y = vaccination_status,
digits.perc = 1,
label = "both"
)Output:
fisher_infection_type <- fisher.test(data_fisher$infection_outcome, data_fisher$vaccination_status)
report(fisher_infection_type)## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Fisher's Exact Test for Count Data testing the association between the
## variables of the data_fisher$infection_outcome and
## data_fisher$vaccination_status dataset suggests that the effect is
## statistically not significant, and tiny (odds ratio = 0.51, 95% CI [0.06,
## 3.42], p = 0.682; Adjusted Cramer's v = 0.00, 95% CI [0.00, 0.48])
Interpretation:
The Fisher’s Exact Test revealed no significant association between vaccination status and infection outcome (odds ratio = 0.51, p = 0.682). This suggests that the observed relationship is likely due to chance or other factors and does not represent a meaningful association between the two variables. The tiny effect size (adjusted Cramer’s v = 0.00) and the confidence interval (0.00, 0.48) that includes zero further support this conclusion.
Conclusion
In this tutorial, we explored a comprehensive range of statistical tests used in medical research, distinguishing between parametric and non-parametric methods. We began with essential parametric tests such as the independent t-test, ANOVA etc, discussing their assumptions, and followed up with non-parametric counterparts like the Mann-Whitney U test, Kruskal-Wallis test, and Friedman test, which are appropriate when parametric assumptions (e.g., normality) are not met. Additionally, advanced methods were highlighted for complex research designs.
A key takeaway from this tutorial is the importance of testing assumptions before applying any statistical method. We reviewed how normality tests like the Shapiro-Wilk, and Kolmogorov-Smirnov tests help assess whether data follows a normal distribution. When assumptions such as normality, homogeneity of variance, or linearity are violated, switching to non-parametric methods or using data transformations is crucial to ensure the validity of the results.
Moreover, we discussed post hoc tests for multiple comparisons, like Tukey’s HSD, which are essential when significant differences are detected in ANOVA. Visualization techniques using ggplot2 and ggstatsplot were emphasized throughout the session, showing how data visualization is key to communicating findings effectively.
A significant addition to this seminar is the inclusion of permutation tests, a flexible, non-parametric approach that is invaluable when traditional parametric test assumptions are not satisfied or when sample sizes are small. Permutation tests do not rely on distributional assumptions, making them particularly useful in fields such as virology or immunology where data often violate normality assumptions. By generating a distribution based on the data itself, permutation tests allow for robust comparisons between groups, correlations, or model coefficients.
In conclusion, the tutorial covered a wide array of statistical tests relevant to medical research, detailing when and how to use each test, checking assumptions, and the steps to take if these assumptions are violated. Whether analysing the effectiveness of treatments, investigating disease progression, or exploring associations between clinical biomarkers, the statistical methods discussed here equip researchers with the tools to make informed decisions. By applying the right statistical tests in conjunction with proper visualizations, we can ensure the integrity and clarity of research findings in the medical and public health domains.
About
KP Analytics Insights (Pty) Ltd is a reputable data-driven research and consulting company based in South Africa. Our primary objective is to facilitate evidence-based decision-making in the field of medical science by offering valuable insights and innovative solutions. We cater to a diverse range of clients, including postgraduate students, independent researchers, medical scientists, as well as organizations and institutions seeking comprehensive statistical analysis, interpretation, reporting, and actionable insights.
At KP Analytics Insights, our portfolio encompasses a wide array of projects focused on sexually transmitted infections (STIs), human papillomavirus (HPV) research, HIV-1 drug resistance, HBV drug resistance, and the enhancement of laboratory workflows through the implementation of statistical quality control methods. Moreover, we actively contribute to the advancement of medical research through rigorous peer reviews, publication of scholarly articles, and the development of interactive dashboards.
Visit our website KP Analytics Insights (Pty) Ltd to gain a comprehensive understanding of how our expertise can assist you in harnessing the power of data-driven insights. Let us be your partner in transforming data into knowledge.