On the field, measurements were taken from each individual (in mm), respectively:
Three datasets including morphometrics and habitat, helminths and immunological variables of Rattus tanezumi and Rattus R3 have been saved in a RData file
A RData format (with extension .RData) is a format designed for use with R for storing selected “objects” from a workspace in a form that can be loaded back by R
A dataframe is a type of data structures that can be manipulated by R
## tibble [987 × 32] (S3: tbl_df/tbl/data.frame)
## $ id_indiv : chr [1:987] "C0021" "C0022" "C0023" "C0024" ...
## $ taxonomyID : chr [1:987] "Rattus_r3" "Rattus_r3" "Rattus_r3" "Rattus_r3" ...
## $ identificationType : chr [1:987] "coi" "coi" "coi" "coi" ...
## $ sex : chr [1:987] "female" "male" "male" "male" ...
## $ trappingDate : POSIXct[1:987], format: "2008-02-29" "2008-02-29" ...
## $ collector : chr [1:987] "morand" "morand" "morand" "morand" ...
## $ trappingMethod : chr [1:987] "collected" "collected" "collected" "collected" ...
## $ trappingAccuracy : num [1:987] 3 3 3 3 3 3 3 3 1 1 ...
## $ trappingLandscapeLowRes : chr [1:987] "lowland" "lowland" "lowland" "lowland" ...
## $ trappingLandscapeMediumRes: chr [1:987] "rice" "rice" "rice" "rice" ...
## $ id_site : chr [1:987] "veal-renh-0003" "veal-renh-0003" "veal-renh-0003" "veal-renh-0003" ...
## $ isTrappedAlive : chr [1:987] "yes" "yes" "yes" "yes" ...
## $ isDissected : chr [1:987] "yes" "yes" "yes" "yes" ...
## $ bodyWeight : num [1:987] 102 108 103 64 87 111 100 47 107 108 ...
## $ headBodyMeasurement : num [1:987] 156 175 172 137 161 157 158 123 165 165 ...
## $ tailMeasurement : num [1:987] NA 148 138 140 133 166 NA 110 NA 156 ...
## $ hindfootMeasurement : num [1:987] 34 36 33 38 31 34 33 29 29 32 ...
## $ earMeasurement : num [1:987] 22 22 21 20 21 23 20 20 19 21 ...
## $ headMeasurement : num [1:987] 43 41 40 40 43 42 43 36 45 45 ...
## $ spleenWeight : num [1:987] NA NA NA NA NA NA NA NA 170 263 ...
## $ vagina : chr [1:987] "open" NA NA NA ...
## $ teats : chr [1:987] "barely-visible" NA NA NA ...
## $ mammae : chr [1:987] "?+?+?" NA NA NA ...
## $ embryoLeftSide : num [1:987] 0 NA NA NA 0 0 0 0 NA 1 ...
## $ embryoRightSide : num [1:987] 0 NA NA NA 0 0 0 0 NA 0 ...
## $ testes : chr [1:987] NA "inside" "inside" "inside" ...
## $ testesLength : num [1:987] NA 9 8 8 NA NA NA NA 31 NA ...
## $ seminalVesicle : chr [1:987] NA "undeveloped" "undeveloped" "undeveloped" ...
## $ latitude : num [1:987] 10.7 10.7 10.7 10.7 10.7 ...
## $ longitude : num [1:987] 104 104 104 104 104 ...
## $ country : chr [1:987] "Cambodia" "Cambodia" "Cambodia" "Cambodia" ...
## $ province : chr [1:987] "Sihanouk" "Sihanouk" "Sihanouk" "Sihanouk" ...
The dataframe “rattus” has 3 data types:
Other data types are integer, logical (TRUE, FALSE)
ggplot(data = rattus, mapping = aes(x = bodyWeight))+
geom_freqpoly(mapping = aes(colour = taxonomyID)) +
theme_bw()
ggplot(data = rattus |>
dplyr::filter(sex == "female"), mapping = aes(x = bodyWeight))+
geom_freqpoly(mapping = aes(colour = taxonomyID)) +
labs(title="Distribution of female body weight") +
theme_classic()
Methods | R function | Description |
---|---|---|
T-test | t.test() | Compare two groups (parametric) |
Wilcoxon test | wilcox.test() | Compare two groups (non-parametric) |
ANOVA | aov() or anova() | Compare multiple groups (parametric) |
Kruskal-Wallis | kruskal.test() | Compare multiple groups (non-parametric) |
##
## Shapiro-Wilk normality test
##
## data: rattus$bodyWeight
## W = 0.98955, p-value = 2.109e-06
Interpretation
The null-hypothesis of this test is that the population is normally distributed. Thus, if the p value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed.
P value of the Wilcoxon test < 0.001 indicates a significant difference between the mean body weight of Rattus tanezumi and the mean body weight Rattus R3
ggpubr::ggviolin(rattus, x ="taxonomyID" , y = "bodyWeight", fill = "taxonomyID",
palette = c("#00AFBB", "#FC4E07"),
add = "boxplot", add.params = list(fill = "white"))+
ggpubr::stat_compare_means(comparisons = c("Rattus_tanzumi","Rattus_r3"), label = "p.signif")+ # Add significance levels
ggpubr::stat_compare_means(label.y = 280) # Add global the p-value
rattus_morpho <- rattus |>
dplyr::select(bodyWeight, headBodyMeasurement,tailMeasurement,hindfootMeasurement,earMeasurement ,headMeasurement) |>
tidyr::drop_na()
The function select of package dplyr allows to select variables from a given dataframe. Here we select (bodyWeight, headBodyMeasurement,tailMeasurement,hindfootMeasurement,earMeasurement, headMeasurement) from the dataframe rattus. If the rows with NA should be removed, we use the function drop_na of the package “tidyr”
Interpretation
All morphometric variables are significantly correlated with each other (p value < 0.001 given with three “*” and R2 from 0.52 to 0.89)
Interpretation
None of the variables follow a normal distribution (p value < 0.001).
Normalization of variables is necessary to perform parametric statistics, otherwise non-parametric tests are possible.
R_tan_sf <- rattus |>
dplyr::select(taxonomyID, longitude, latitude) |>
tidyr::drop_na() |>
dplyr::filter(taxonomyID == "Rattus_tanezumi") |>
sf::st_as_sf(coords = c("longitude", "latitude"),
crs= 4326)
R_R3_sf <- rattus |>
dplyr::select(taxonomyID, longitude, latitude) |>
tidyr::drop_na() |>
dplyr::filter(taxonomyID == "Rattus_r3") |>
sf::st_as_sf(coords = c("longitude", "latitude"),
crs= 4326)
map_rattus <- ggplot() +
geom_sf(data = R_tan_sf, aes(color = "Rattus_tanezumi"), size = 1) +
geom_sf(data = R_R3_sf, aes(color = "Rattus_r3"), size = 1) +
labs(title = "Spatial Distribution of Rattus tanezumi and Rattus R3",
color = "taxonomyID") +
theme_minimal()
Longitudinal distribution using ggplot2
ggplot(data = rattus, mapping = aes(x = longitude))+
geom_freqpoly(mapping = aes(colour = taxonomyID)) +
labs(title="Distribution of longitude distribution") +
theme_classic()
Longitudinal distribution using function ggboxplot from package ggpubr (note that individuals collected from Philippines were removed, using the function filter of the package dplyr)
ggpubr::ggboxplot(data = rattus |>
dplyr::filter(longitude < 120), y = "longitude", x = "taxonomyID",
color = "taxonomyID", palette = "jco",
add = "jitter", short.panel.labs = FALSE) +
ggpubr::stat_compare_means(comparisons = c("Rattus_r3", "Rattus_tanezumi"), label.y = 107) +
ggpubr::stat_compare_means(label = "p.format",
label.x = 1.5)
Interpretation
The null-hypothesis of the test is that the longitudinal distributions of Rattus tanezumi and Rattus R3 significantly differ. The p value of the Wilcoxon test is less than 0.0001, then the null hypothesis is rejected, wich supports that the two taxonomic entities have different longitudinal distribution (note that the test was done for individuals from Indochina, removing individuals from Philippines)
wilTest <- tibble::as_tibble(ggpubr::compare_means(longitude ~ taxonomyID, data = rattus))
knitr::kable(wilTest)
.y. | group1 | group2 | p | p.adj | p.format | p.signif | method |
---|---|---|---|---|---|---|---|
longitude | Rattus_r3 | Rattus_tanezumi | 0 | 0 | <2e-16 | **** | Wilcoxon |