# Install necessary packages (if you haven't already)
#install.packages("tidyverse") # Include ggplot2 (for plotting) and dplyr (for data manipulation)
#install.packages("rstatix") # For Shapiro-Wilk test (more robust version)
#install.packages("gridExtra") # For creating multiple plots on a grid
#install.packages("car") # For using Levene's Test (to test for equal variances)
#install.packages("dunn.test") # For using Dunn's Test (non-parameteric post-hoc test)
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rstatix)
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(dunn.test)
Check for Normality:
For each tick survival parameter and each vaccine group, assess whether the data is approximately normally distributed. Use the Shapiro-Wilk test.
Note: With only 4 values per group, interpret the Shapiro-Wilk test results cautiously, as the test has low power. Supplement with visual inspection (histograms, Q-Q plots) if feasible, but understand that visual inspection can also be unreliable with such small samples
Check for Equality of Variances:
For each tick survival parameter, assess whether the variances are approximately equal across the vaccine groups. Use Levene’s test.
Note: Similar to normality, the test for equal variances also has limited power with small sample sizes.
Choose the appropriate test:
Case 1: Approximately Normal Data & Equal Variances: Use ANOVA followed by a post-hoc test (Tukey’s HSD if comparing all pairs, Dunnett’s test if comparing to a control group).
Case 2: Data Not Approximately Normal: Use the Kruskal-Wallis test followed by a post-hoc test (Dunn’s test or Mann-Whitney U with Bonferroni correction).
Case 3: Approximately Normal Data & Unequal Variances: Use Welch’s ANOVA followed by Games-Howell.
Perform the test.
Interpret the results:
Examine the p-values associated with the main test (ANOVA, Kruskal-Wallis, Welch’s ANOVA). A p-value below your chosen significance level (e.g., 0.05) indicates a statistically significant difference between at least some of the groups.
If the main test is significant, examine the p-values from the post-hoc test to determine which specific groups differ significantly from each other.
Report the findings:
Clearly state which statistical tests were used (including the normality and equal variance tests).
Report the test statistics (e.g., F-statistic for ANOVA, chi-squared statistic for Kruskal-Wallis), degrees of freedom, and p-values for both the main test and the post-hoc tests.
Clearly state your conclusions, being careful not to overstate the significance of your findings given the small sample size.
Acknowledge the limitations of your analysis due to the small number of replicates per group.
Small Sample Size: You only have 4 animals per group. This limits the statistical power of your tests (makes it harder to find significant differences).
High Variability: The high variability within groups further reduces statistical power.
Note: Non-parametric tests are generally more robust to violations of assumptions when sample sizes are small (e.g. Kruskal Wallis rather than ANOVA).
Note that I have transposed and cleaned up the data so that it is easy to use in R.
setwd("/Users/nanette/Documents/Stats_help/Christian_2025/")
data <- read_tsv("Trial1_all_data_per_Bovine.txt")
## Rows: 16 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Tag, Group, Rep
## dbl (9): Number_females, Weight_females, Egg_weight, Efficacy, Bm86_IgG, OD_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Interpretation: If the p-value is less than 0.05 (or your chosen alpha level), you reject the null hypothesis of normality, suggesting the data is likely not normally distributed.
column_name <- "Number_females"
data %>%
group_by(Group) %>%
shapiro_test(column_name)
## # A tibble: 4 × 4
## Group variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 Bm86 Number_females 0.873 0.311
## 2 Combi Number_females 0.868 0.290
## 3 Combi.CpG Number_females 0.925 0.563
## 4 control Number_females 0.887 0.368
Conclusion: “Total # engorged female ticks collected” likely normally distributed.
column_name <- "Weight_females"
data %>%
group_by(Group) %>%
shapiro_test(column_name)
## # A tibble: 4 × 4
## Group variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 Bm86 Weight_females 0.995 0.982
## 2 Combi Weight_females 0.928 0.581
## 3 Combi.CpG Weight_females 0.880 0.340
## 4 control Weight_females 0.998 0.994
Conclusion: “Mean weight engorged females” likely normally distributed.
column_name <- "Egg_weight"
data %>%
group_by(Group) %>%
shapiro_test(column_name)
## # A tibble: 4 × 4
## Group variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 Bm86 Egg_weight 0.998 0.995
## 2 Combi Egg_weight 0.910 0.484
## 3 Combi.CpG Egg_weight 0.939 0.651
## 4 control Egg_weight 0.875 0.319
Conclusion: “Mean egg weight” likely normally distributed.
column_name <- "Efficacy"
data %>%
filter(Group != "control") %>% # Exclude rows where Group is "control"
group_by(Group) %>%
shapiro_test(column_name)
## # A tibble: 3 × 4
## Group variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 Bm86 Efficacy 0.812 0.124
## 2 Combi Efficacy 0.900 0.432
## 3 Combi.CpG Efficacy 0.945 0.687
Conclusion: “Calculated efficacy %” likely normally distributed.
qqplot_ggplot <- function(data, group_name, column_name) {
# Calculate theoretical quantiles (normal distribution)
theoretical_quantiles <- qnorm(ppoints(nrow(data)))
# Order the sample data
sample_quantiles <- sort(data[[column_name]])
# Create a data frame for plotting
qq_data <- data.frame(theoretical = theoretical_quantiles,
sample = sample_quantiles)
# Create the ggplot QQ plot
ggplot(qq_data, aes(x = theoretical, y = sample)) +
geom_point() +
geom_abline(intercept = mean(sample_quantiles), slope = sd(sample_quantiles), color = "red") + # Add a line to show the distribution
labs(title = paste("QQ Plot for Group", group_name),
x = "Theoretical Quantiles (Normal)",
y = "Sample Quantiles") +
theme_bw()
}
column_name <- "Number_females"
# Get unique group names
group_names <- unique(data$Group)
# Create a list to store the QQ plots
qq_plots <- list()
# Loop through groups and create QQ plots
for (i in seq_along(group_names)) {
group_name <- group_names[i]
# Subset the data for the current group
group_data <- data %>% filter(Group == group_name)
# Create the QQ plot using the custom function
qq_plot <- qqplot_ggplot(group_data, group_name, column_name)
# Store the plot in the list
qq_plots[[i]] <- qq_plot
}
# Arrange the plots in a grid
grid.arrange(grobs = qq_plots, nrow = 2, ncol = 2) # Adjust nrow and ncol as needed
Conclusion: “Total # engorged female ticks collected” likely normally distributed.
column_name <- "Weight_females"
# Get unique group names
group_names <- unique(data$Group)
# Create a list to store the QQ plots
qq_plots <- list()
# Loop through groups and create QQ plots
for (i in seq_along(group_names)) {
group_name <- group_names[i]
# Subset the data for the current group
group_data <- data %>% filter(Group == group_name)
# Create the QQ plot using the custom function
qq_plot <- qqplot_ggplot(group_data, group_name, column_name)
# Store the plot in the list
qq_plots[[i]] <- qq_plot
}
# Arrange the plots in a grid
grid.arrange(grobs = qq_plots, nrow = 2, ncol = 2) # Adjust nrow and ncol as needed
Conclusion: “Mean weight engorged females” likely normally distributed.
column_name <- "Egg_weight"
# Get unique group names
group_names <- unique(data$Group)
# Create a list to store the QQ plots
qq_plots <- list()
# Loop through groups and create QQ plots
for (i in seq_along(group_names)) {
group_name <- group_names[i]
# Subset the data for the current group
group_data <- data %>% filter(Group == group_name)
# Create the QQ plot using the custom function
qq_plot <- qqplot_ggplot(group_data, group_name, column_name)
# Store the plot in the list
qq_plots[[i]] <- qq_plot
}
# Arrange the plots in a grid
grid.arrange(grobs = qq_plots, nrow = 2, ncol = 2) # Adjust nrow and ncol as needed
Conclusion: “Mean egg weight” likely normally distributed.
column_name <- "Efficacy"
# Get unique group names
group_names <- unique(data$Group)[-1]
# Create a list to store the QQ plots
qq_plots <- list()
# Loop through groups and create QQ plots
for (i in seq_along(group_names)) {
group_name <- group_names[i]
# Subset the data for the current group
group_data <- data %>% filter(Group == group_name)
# Create the QQ plot using the custom function
qq_plot <- qqplot_ggplot(group_data, group_name, column_name)
# Store the plot in the list
qq_plots[[i]] <- qq_plot
}
# Arrange the plots in a grid
grid.arrange(grobs = qq_plots, nrow = 2, ncol = 2) # Adjust nrow and ncol as needed
Conclusion: “Calculated efficacy %” likely normally distributed.
Note: This test is not very sensitive to the assumption of normality (this is good).
Interpretation: If the p-value is less than 0.05 (or your chosen alpha level), you reject the null hypothesis of equal variances, suggesting the variances between groups are likely not significantly different.
column_name <- "Number_females"
#Assuming you already have the column and variable selected.
data %>%
leveneTest(get(column_name) ~ Group, data = .)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 1.327 0.3114
## 12
Conclusion: “Total # engorged female ticks collected” likely equal variances.
column_name <- "Weight_females"
#Assuming you already have the column and variable selected.
data %>%
leveneTest(get(column_name) ~ Group, data = .)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.8778 0.4798
## 12
Conclusion: “Mean weight engorged females” likely normally distributed.
column_name <- "Egg_weight"
#Assuming you already have the column and variable selected.
data %>%
leveneTest(get(column_name) ~ Group, data = .)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 1.8154 0.198
## 12
Conclusion: “Mean egg weight” likely normally distributed.
column_name <- "Efficacy"
#Assuming you already have the column and variable selected.
data %>%
filter(Group != "control") %>% # Exclude rows where Group is "control"
leveneTest(get(column_name) ~ Group, data = .)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.2593 0.7772
## 9
Conclusion: “Calculated efficacy %” likely normally distributed.
# Specify the tick survival parameter
column_name <- "Number_females"
# Perform ANOVA
anova_results <- aov(get(column_name) ~ Group, data = data)
summary(anova_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 3 332163 110721 1.941 0.177
## Residuals 12 684669 57056
# Perform Post-Hoc Test (if ANOVA is significant)
TukeyHSD(anova_results)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = get(column_name) ~ Group, data = data)
##
## $Group
## diff lwr upr p adj
## Combi-Bm86 -342.0 -843.4532 159.4532 0.2325104
## Combi.CpG-Bm86 -337.0 -838.4532 164.4532 0.2428334
## control-Bm86 -135.5 -636.9532 365.9532 0.8521070
## Combi.CpG-Combi 5.0 -496.4532 506.4532 0.9999901
## control-Combi 206.5 -294.9532 707.9532 0.6251502
## control-Combi.CpG 201.5 -299.9532 702.9532 0.6424212
# Graphing (Optional but Recommended)
# Boxplot of sample
boxplot(get(column_name) ~ Group, data = data,
main = "Number of Females by Group",
xlab = "Group",
ylab = column_name)
Conclusion: There is no significant differences between means of groups for “Total # engorged female ticks collected”.
# Specify the tick survival parameter
column_name <- "Weight_females"
# Perform ANOVA
anova_results <- aov(get(column_name) ~ Group, data = data)
summary(anova_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 3 1064 354.5 0.699 0.571
## Residuals 12 6088 507.3
# Perform Post-Hoc Test (if ANOVA is significant)
TukeyHSD(anova_results)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = get(column_name) ~ Group, data = data)
##
## $Group
## diff lwr upr p adj
## Combi-Bm86 -19.725 -67.01081 27.56081 0.6158937
## Combi.CpG-Bm86 -0.200 -47.48581 47.08581 0.9999992
## control-Bm86 -10.125 -57.41081 37.16081 0.9184606
## Combi.CpG-Combi 19.525 -27.76081 66.81081 0.6232239
## control-Combi 9.600 -37.68581 56.88581 0.9292425
## control-Combi.CpG -9.925 -57.21081 37.36081 0.9226645
# Graphing (Optional but Recommended)
# Boxplot of sample
boxplot(get(column_name) ~ Group, data = data,
main = "Number of Females by Group",
xlab = "Group",
ylab = column_name)
Conclusion: There is no significant differences between means of groups for “Mean weight engorged females”.
# Specify the tick survival parameter
column_name <- "Egg_weight"
# Perform ANOVA
anova_results <- aov(get(column_name) ~ Group, data = data)
summary(anova_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 3 298 99.37 0.361 0.782
## Residuals 12 3301 275.05
# Perform Post-Hoc Test (if ANOVA is significant)
TukeyHSD(anova_results)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = get(column_name) ~ Group, data = data)
##
## $Group
## diff lwr upr p adj
## Combi-Bm86 -6.825 -41.64159 27.99159 0.9356108
## Combi.CpG-Bm86 -5.400 -40.21659 29.41659 0.9662505
## control-Bm86 3.975 -30.84159 38.79159 0.9859293
## Combi.CpG-Combi 1.425 -33.39159 36.24159 0.9993210
## control-Combi 10.800 -24.01659 45.61659 0.7943865
## control-Combi.CpG 9.375 -25.44159 44.19159 0.8533786
# Graphing (Optional but Recommended)
# Boxplot of sample
boxplot(get(column_name) ~ Group, data = data,
main = "Number of Females by Group",
xlab = "Group",
ylab = column_name)
Conclusion: There is no significant differences between means of groups for “Mean egg weight”.
# Specify the tick survival parameter
column_name <- "Efficacy"
data_no_control <- data %>%
filter(Group != "control") # Exclude rows where Group is "control"
# Perform ANOVA
anova_results <- aov(get(column_name) ~ Group, data = data_no_control)
summary(anova_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 4144 2071.8 2.557 0.132
## Residuals 9 7293 810.3
# Perform Post-Hoc Test (if ANOVA is significant)
TukeyHSD(anova_results)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = get(column_name) ~ Group, data = data_no_control)
##
## $Group
## diff lwr upr p adj
## Combi-Bm86 41.7075 -14.49112 97.90612 0.1508811
## Combi.CpG-Bm86 36.6400 -19.55862 92.83862 0.2174754
## Combi.CpG-Combi -5.0675 -61.26612 51.13112 0.9657947
# Graphing (Optional but Recommended)
# Boxplot of sample
boxplot(get(column_name) ~ Group, data = data,
main = "Number of Females by Group",
xlab = "Group",
ylab = column_name)
Conclusion: There is no significant differences between means of groups for “Calculated efficacy %”.
# Specify the column to analyze
column_name <- "Number_females"
# Perform Kruskal-Wallis test
kruskal_results <- kruskal.test(get(column_name) ~ Group, data = data)
print(kruskal_results)
##
## Kruskal-Wallis rank sum test
##
## data: get(column_name) by Group
## Kruskal-Wallis chi-squared = 4.6324, df = 3, p-value = 0.2008
# Perform Dunn's post-hoc test (if Kruskal-Wallis is significant or of interest)
dunn_results <- dunn.test(data[[column_name]], g = data$Group,
method = "bonferroni") #or other correction methods
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 4.6324, df = 3, p-value = 0.2
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | Bm86 Combi Combi.Cp
## ---------+---------------------------------
## Combi | 1.633743
## | 0.3069
## |
## Combi.Cp | 1.782265 0.148522
## | 0.2241 1.0000
## |
## control | 0.445566 -1.188177 -1.336699
## | 1.0000 0.7043 0.5440
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
print(dunn_results)
## $chi2
## [1] 4.632353
##
## $Z
## [1] 1.6337434 1.7822656 0.1485221 0.4455664 -1.1881771 -1.3366992
##
## $P
## [1] 0.05115637 0.03735297 0.44096536 0.32795525 0.11738183 0.09066042
##
## $P.adjusted
## [1] 0.3069382 0.2241178 1.0000000 1.0000000 0.7042910 0.5439625
##
## $comparisons
## [1] "Bm86 - Combi" "Bm86 - Combi.CpG" "Combi - Combi.CpG"
## [4] "Bm86 - control" "Combi - control" "Combi.CpG - control"
Conclusion: There is no significant differences between medians of groups for “Total # engorged female ticks collected”.
# Specify the column to analyze
column_name <- "Weight_females"
# Perform Kruskal-Wallis test
kruskal_results <- kruskal.test(get(column_name) ~ Group, data = data)
print(kruskal_results)
##
## Kruskal-Wallis rank sum test
##
## data: get(column_name) by Group
## Kruskal-Wallis chi-squared = 2.2721, df = 3, p-value = 0.5179
# Perform Dunn's post-hoc test (if Kruskal-Wallis is significant or of interest)
dunn_results <- dunn.test(data[[column_name]], g = data$Group,
method = "bonferroni") #or other correction methods
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 2.2721, df = 3, p-value = 0.52
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | Bm86 Combi Combi.Cp
## ---------+---------------------------------
## Combi | 1.039654
## | 0.8955
## |
## Combi.Cp | -0.371305 -1.410960
## | 1.0000 0.4748
## |
## control | 0.519827 -0.519827 0.891132
## | 1.0000 1.0000 1.0000
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
print(dunn_results)
## $chi2
## [1] 2.272059
##
## $Z
## [1] 1.0396549 -0.3713053 -1.4109602 0.5198275 -0.5198275 0.8911328
##
## $P
## [1] 0.14925013 0.35520506 0.07912817 0.30159192 0.30159192 0.18642897
##
## $P.adjusted
## [1] 0.8955008 1.0000000 0.4747690 1.0000000 1.0000000 1.0000000
##
## $comparisons
## [1] "Bm86 - Combi" "Bm86 - Combi.CpG" "Combi - Combi.CpG"
## [4] "Bm86 - control" "Combi - control" "Combi.CpG - control"
Conclusion: There is no significant differences between medians of groups for “Mean weight engorged females”.
# Specify the column to analyze
column_name <- "Egg_weight"
# Perform Kruskal-Wallis test
kruskal_results <- kruskal.test(get(column_name) ~ Group, data = data)
print(kruskal_results)
##
## Kruskal-Wallis rank sum test
##
## data: get(column_name) by Group
## Kruskal-Wallis chi-squared = 1.1912, df = 3, p-value = 0.7551
# Perform Dunn's post-hoc test (if Kruskal-Wallis is significant or of interest)
dunn_results <- dunn.test(data[[column_name]], g = data$Group,
method = "bonferroni") #or other correction methods
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 1.1912, df = 3, p-value = 0.76
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | Bm86 Combi Combi.Cp
## ---------+---------------------------------
## Combi | 0.594088
## | 1.0000
## |
## Combi.Cp | 0.594088 0.000000
## | 1.0000 1.0000
## |
## control | -0.297044 -0.891132 -0.891132
## | 1.0000 1.0000 1.0000
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
print(dunn_results)
## $chi2
## [1] 1.191176
##
## $Z
## [1] 0.5940885 0.5940885 0.0000000 -0.2970443 -0.8911328 -0.8911328
##
## $P
## [1] 0.2762265 0.2762265 0.5000000 0.3832164 0.1864290 0.1864290
##
## $P.adjusted
## [1] 1 1 1 1 1 1
##
## $comparisons
## [1] "Bm86 - Combi" "Bm86 - Combi.CpG" "Combi - Combi.CpG"
## [4] "Bm86 - control" "Combi - control" "Combi.CpG - control"
Conclusion: There is no significant differences between medians of groups for “Mean egg weight”.
# Specify the column to analyze
column_name <- "Efficacy"
# Perform Kruskal-Wallis test
kruskal_results <- kruskal.test(get(column_name) ~ Group, data = data_no_control)
print(kruskal_results)
##
## Kruskal-Wallis rank sum test
##
## data: get(column_name) by Group
## Kruskal-Wallis chi-squared = 4.1538, df = 2, p-value = 0.1253
# Perform Dunn's post-hoc test (if Kruskal-Wallis is significant or of interest)
dunn_results <- dunn.test(data[[column_name]], g = data$Group,
method = "bonferroni") #or other correction methods
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 4.1538, df = 2, p-value = 0.13
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | Bm86 Combi
## ---------+----------------------
## Combi | -1.765045
## | 0.1163
## |
## Combi.Cp | -1.765045 0.000000
## | 0.1163 1.0000
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
print(dunn_results)
## $chi2
## [1] 4.153846
##
## $Z
## [1] -1.765045 -1.765045 0.000000
##
## $P
## [1] 0.03877808 0.03877808 0.50000000
##
## $P.adjusted
## [1] 0.1163343 0.1163343 1.0000000
##
## $comparisons
## [1] "Bm86 - Combi" "Bm86 - Combi.CpG" "Combi - Combi.CpG"
Conclusion: There is no significant differences between medians of groups for “Calculated efficacy %”.