Read the data
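library(tidyverse); library(rstatix); library(ggpubr)  # read_csv/%>%/ggplot; kruskal_test/dunn_test; ggboxplot/stat_pvalue_manual
data <- read_csv("data/allnoise_flight_data-Winter2020.csv")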
## Parsed with column specification:
## cols(
## .default = col_double(),
## filename = col_character(),
## trial_type = col_character(),
## chamber = col_character(),
## channel_letter = col_character(),
## test_date = col_date(format = ""),
## time_start = col_time(format = ""),
## time_end = col_time(format = ""),
## sex = col_character(),
## population = col_character(),
## county = col_character(),
## site = col_character(),
## host_plant = col_character(),
## flew = col_character(),
## flight_type = col_logical(),
## NOTES = col_character(),
## EWM = col_character(),
## w_morph = col_character(),
## morph_notes = col_character(),
## tested = col_character()
## )
## See spec(...) for full column specifications.
data
## # A tibble: 16 x 39
## ID filename trial_type chamber channel_letter channel_num set_number
## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 176 set004-… T1 B-3 B 3 4
## 2 48 set007-… T1 A-4 A 4 7
## 3 113 set010-… T1 A-4 A 4 10
## 4 54 set010-… T1 A-4 A 4 10
## 5 123 set011-… T2 A-4 A 4 11
## 6 18 set011-… T2 B-2 B 2 11
## 7 187 set011-… T2 B-2 B 2 11
## 8 22 set013-… T2 A-1 A 1 13
## 9 419 set013-… T2 A-1 A 1 13
## 10 153 set013-… T2 A-4 A 4 13
## 11 41 set013-… T2 A-4 A 4 13
## 12 211 set015-… T2 A-1 A 1 15
## 13 31 set015-… T2 A-1 A 1 15
## 14 335 set015-… T2 A-1 A 1 15
## 15 395 set015-… T2 A-1 A 1 15
## 16 108 set016-… T2 B-3 B 3 16
## # … with 32 more variables: average_speed <dbl>, total_flight_time <dbl>,
## # distance <dbl>, shortest_flying_bout <dbl>, longest_flying_bout <dbl>,
## # portion_flying <dbl>, total_duration <dbl>, max_speed <dbl>,
## # test_date <date>, time_start <time>, time_end <time>, duration_check <dbl>,
## # sex <chr>, population <chr>, county <chr>, site <chr>, host_plant <chr>,
## # flew <chr>, flight_type <lgl>, NOTES <chr>, mass <dbl>, EWM <chr>,
## # latitude <dbl>, longitude <dbl>, total_eggs <dbl>, beak <dbl>,
## # thorax <dbl>, wing <dbl>, body <dbl>, w_morph <chr>, morph_notes <chr>,
## # tested <chr>
Summary Stats
# number of bugs tested per chamber
data$tested_b <- 0
data$tested_b[data$tested == "yes"] <- 1
tapply(X = data$tested_b, INDEX = data$chamber, FUN = sum, na.rm = TRUE)
## A-1 A-4 B-2 B-3
## 6 6 2 2
# total distance (m) per chamber
tapply(X = data$distance, INDEX = data$chamber, FUN = sum, na.rm = TRUE)
## A-1 A-4 B-2 B-3
## 283.3633 218.0201 65.3432 101.1563
# mean distance (m) per chamber
tapply(X = data$distance, INDEX = data$chamber, FUN = mean, na.rm = TRUE)
## A-1 A-4 B-2 B-3
## 47.22722 36.33668 32.67160 50.57815
Kruskal-Wallis
# pwc = pairwise comparisons between chambers: Dunn's test with Bonferroni-adjusted p-values
pwc <- data %>%
dunn_test(distance ~ chamber, p.adjust.method = "bonferroni")
pwc <- pwc %>% add_xy_position(x = "chamber")
res.kruskal <- data %>% kruskal_test(distance ~ chamber)
res.kruskal
## # A tibble: 1 x 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 distance 16 1.13 3 0.77 Kruskal-Wallis
ggboxplot(data, x = "chamber", y = "distance") +
stat_pvalue_manual(pwc, hide.ns = TRUE) +
labs(
subtitle = get_test_label(res.kruskal, detailed = TRUE),
caption = get_pwc_label(pwc)
)
Kruskal-Wallis (sometimes called the “one-way ANOVA on ranks”) is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable (e.g. chamber) on a continuous or ordinal dependent variable (e.g. distance). It is important to note that the Kruskal-Wallis H test is an omnibus test: it cannot tell you which specific groups of your independent variable are statistically significantly different from each other, only that at least two groups differ.
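To make "rank-based" concrete, the sketch below computes the H statistic by hand: all observations are pooled and ranked, and each group's rank sum is compared with what would be expected if group membership did not matter. The helper name kruskal_h() and the toy values are hypothetical (they are not the flight data); the tie correction follows the standard formula, and the result can be checked against base R's kruskal.test().

```r
# Kruskal-Wallis H computed by hand from pooled ranks.
# kruskal_h() is a hypothetical helper written for illustration; assumes no NAs in y.
kruskal_h <- function(y, g) {
  g <- factor(g)
  N <- length(y)
  r <- rank(y)                                   # pooled ranks; ties get average ranks
  R <- tapply(r, g, sum)                         # rank sum per group
  n <- tapply(r, g, length)                      # group sizes
  H <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)
  ties <- table(y)                               # size of each run of tied values
  H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))  # tie correction (no-op without ties)
  df <- nlevels(g) - 1
  list(H = H, df = df, p = pchisq(H, df, lower.tail = FALSE))
}

# Hypothetical toy data (NOT the flight data): three groups of four,
# with one tied value (3.4) to exercise the tie correction
y <- c(1.2, 3.4, 2.2, 5.0,   4.1, 6.3, 5.5, 7.0,   0.8, 2.9, 3.4, 1.7)
g <- rep(c("A", "B", "C"), each = 4)

res <- kruskal_h(y, g)
res
kruskal.test(y, factor(g))   # base R; should report the same H, df and p
```

With the flight data this would be kruskal_h(data$distance, data$chamber) after dropping rows with a missing distance; the H it returns should match the statistic of 1.13 reported by kruskal_test() above.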
Because a study design may have three or more groups (here, four chambers), it is important to determine which of these groups differ from each other. This is done with a post hoc test; here, Dunn's test with a Bonferroni correction (dunn_test() above).
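A minimal sketch of the post hoc step, assuming the standard tie-corrected form of Dunn's test (assumed here, not confirmed, to be what rstatix's dunn_test() implements): for each pair of groups, the difference in mean ranks is divided by its standard error to give a z statistic, two-sided p-values come from the normal distribution, and p.adjust() applies the Bonferroni correction. The helper name dunn_by_hand() and the toy data are hypothetical, and z is computed as group1 minus group2, which may differ in sign from the package output.

```r
# Dunn's post hoc test by hand (standard tie-corrected form), with
# Bonferroni-adjusted p-values. dunn_by_hand() is a hypothetical helper; assumes no NAs.
dunn_by_hand <- function(y, g, method = "bonferroni") {
  g <- factor(g)
  N <- length(y)
  r <- rank(y)                              # pooled ranks over all groups
  rs <- split(r, g)
  mr <- vapply(rs, mean, numeric(1))        # mean rank per group
  n  <- vapply(rs, length, integer(1))      # group sizes
  ties <- table(y)
  # variance of a rank under H0, reduced for ties
  s2 <- N * (N + 1) / 12 - sum(ties^3 - ties) / (12 * (N - 1))
  pairs <- combn(levels(g), 2)              # one column per pair of groups
  z <- vapply(seq_len(ncol(pairs)), function(k) {
    a <- pairs[1, k]
    b <- pairs[2, k]
    (mr[[a]] - mr[[b]]) / sqrt(s2 * (1 / n[[a]] + 1 / n[[b]]))
  }, numeric(1))
  p <- 2 * pnorm(-abs(z))                   # two-sided p-values
  data.frame(group1 = pairs[1, ], group2 = pairs[2, ], z = z, p = p,
             p.adj = p.adjust(p, method = method))
}

# Hypothetical toy data (NOT the flight data)
y <- c(1.2, 3.4, 2.2, 5.0,   4.1, 6.3, 5.5, 7.0,   0.8, 2.9, 3.4, 1.7)
g <- rep(c("A", "B", "C"), each = 4)
out <- dunn_by_hand(y, g)
out
```

One way to sanity-check the formula: with exactly two groups, z squared equals the tie-corrected Kruskal-Wallis H for the same data.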
4 Assumptions
0.) The Kruskal-Wallis H test does not assume normality.
1.) The dependent variable should be measured at the ordinal or continuous level (i.e. interval or ratio). This holds: distance is continuous.
2.) The independent variable should consist of two or more categorical, independent groups. This holds: the chambers are independent groups.
3.) There should be independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
4.) You need to determine whether the distributions in each group have the same shape (which also means the same variability), because this changes how the results can be interpreted. If they have the same shape, you can use the Kruskal-Wallis test to compare the medians of the dependent variable. If not, you can only compare the mean ranks.
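Assumption 4 is usually judged from plots, but a numeric summary can support that judgement. The sketch below tabulates n, median, IQR and MAD per group and adds base R's fligner.test(), a rank-based test of equal variability. The Fligner-Killeen test is an addition not used elsewhere in this analysis, and with groups as small as two bugs (B-2, B-3) it has very little power, so treat it as a supplement to the plots rather than a verdict. The helper name spread_by_group() and the toy data are hypothetical.

```r
# Per-group shape/spread summary to accompany the histograms.
# spread_by_group() is a hypothetical helper; fligner.test() is base R.
spread_by_group <- function(y, g) {
  ok <- !is.na(y)                       # drop missing values
  y <- y[ok]
  g <- factor(g[ok])
  per_group <- vapply(split(y, g), function(v)
    c(n = length(v), median = median(v), IQR = IQR(v), MAD = mad(v)),
    numeric(4))
  list(summary = t(per_group),          # one row per group
       fligner = fligner.test(y, g))    # H0: equal variability across groups
}

# Hypothetical toy data (NOT the flight data): B is ten times as spread out as A
y <- c(1, 2, 3, 4, 5,   10, 20, 30, 40, 50)
g <- rep(c("A", "B"), each = 5)
s <- spread_by_group(y, g)
s$summary
s$fligner

# With the flight data: spread_by_group(data$distance, data$chamber)
```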
ggplot(data, aes(x=distance, color=chamber)) +
geom_histogram(fill="white")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
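The histograms show that the distributions do not have the same shape, so the Kruskal-Wallis result should be read as a comparison of mean ranks rather than of medians.
Violin plots can also be used instead of boxplots to visualize the shapes of the distributions more clearly.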
pwc  # print the pairwise comparisons; none are significant (all adjusted p = 1)
## # A tibble: 6 x 13
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif y.position
## <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 dist… A-1 A-4 6 6 0.364 0.716 1 ns 186.
## 2 dist… A-1 B-2 6 2 1.05 0.293 1 ns 206.
## 3 dist… A-1 B-3 6 2 0.408 0.684 1 ns 225.
## 4 dist… A-4 B-2 6 2 0.794 0.427 1 ns 245.
## 5 dist… A-4 B-3 6 2 0.150 0.881 1 ns 264.
## 6 dist… B-2 B-3 2 2 -0.525 0.599 1 ns 284.
## # … with 3 more variables: groups <named list>, xmin <int>, xmax <int>
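The p.adj column is the Bonferroni adjustment over the six pairwise comparisons: each raw p-value is multiplied by 6 and capped at 1. The smallest raw p-value is 0.293, and 6 × 0.293 = 1.758 > 1, so every adjusted p-value is 1. This can be checked with base R's p.adjust(), using the rounded raw p-values printed in the table:

```r
# Raw p-values copied (rounded) from the pwc table above, in row order
p_raw <- c(0.716, 0.293, 0.684, 0.427, 0.881, 0.599)
# Bonferroni: min(1, m * p) with m = 6 comparisons
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj  # all equal to 1
```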
ggplot(data, aes(chamber, distance)) +
geom_violin() +
labs(
subtitle = get_test_label(res.kruskal, detailed = TRUE),
caption = get_pwc_label(pwc)
) +
theme(legend.position="none") +
xlab("chamber") +
ylab("distance (m)") +
geom_boxplot(width=0.1) +
stat_pvalue_manual(pwc, hide.ns = TRUE)
Reporting on the results:
A Kruskal-Wallis test was conducted to examine differences among chambers in the distance (m) recorded for bugs that did not fly. No significant differences were found among the four chambers (Chi-square = 1.13, df = 3, p = 0.77, n = 16).
If there were a difference, you would report that there was a statistically significant difference in distance between the chambers (….), with a mean rank distance of ___ for chamber x, ___ for chamber y, ___ for chamber z, etc.
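To fill in the reporting template, the mean rank of each chamber can be computed from the pooled ranks, and the test statistic can be formatted directly from the test result so the reported numbers cannot drift from the output. Both helpers, mean_ranks() and report_kw(), are hypothetical, and base R's kruskal.test() is used in place of rstatix's kruskal_test() (they should give the same statistic).

```r
# Mean rank per group, computed over the pooled data (ties averaged).
# mean_ranks() and report_kw() are hypothetical helpers.
mean_ranks <- function(y, g) {
  ok <- !is.na(y)
  r <- rank(y[ok])
  vapply(split(r, factor(g[ok])), mean, numeric(1))
}

# One-line summary of a Kruskal-Wallis test in the style used above
report_kw <- function(y, g) {
  ok <- !is.na(y)
  k <- kruskal.test(y[ok], factor(g[ok]))
  sprintf("Chi-square = %.2f, p = %.2f, df = %d",
          k$statistic, k$p.value, as.integer(k$parameter))
}

# Hypothetical toy data (NOT the flight data)
y <- c(10, 20, 30, 40, 50, 60)
g <- c("a", "a", "b", "b", "c", "c")
mean_ranks(y, g)   # a = 1.5, b = 3.5, c = 5.5
report_kw(y, g)

# With the flight data:
# mean_ranks(data$distance, data$chamber)
# report_kw(data$distance, data$chamber)
```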
Sources:
Kruskal-Wallis Test in R. Datanovia. https://www.datanovia.com/en/lessons/kruskal-wallis-test-in-r/.
Kruskal-Wallis H test using SPSS Statistics. Laerd statistics. https://statistics.laerd.com/spss-tutorials/kruskal-wallis-h-test-using-spss-statistics.php.