Introduction

  The study of morphological characteristics in rodent populations is important for understanding their ecological and evolutionary impacts. This statistical study focuses on hindfoot length, a trait that offers insight into rodent mobility, habitat use, and overall fitness (Verde Arregoitia et al. 2017). Accurate statistical analysis of such traits requires checking the normality of the data, since many parametric tests assume normally distributed variables (Noel et al. 2021). Checking normality is particularly crucial in ecological studies because analyses built on a violated assumption can lead to inaccurate conclusions about ecological processes. Such inaccuracies can affect ecological management and conservation efforts, potentially leading to ineffective or harmful interventions (Sit and Taylor 1998).
  
  This study evaluates whether hindfoot length in rodents is normally distributed within each sex and species, using data from “The Portal Project: A Long-Term Study of a Chihuahuan Desert Ecosystem” (Ernest et al. 2020). The Portal Project, which began in the 1970s, provides 40 years' worth of data on rodent populations in a desert grassland ecosystem in Portal, Arizona. The dataset is accessible through the ‘ratdat’ package and includes measurements such as hindfoot length, species identification, and sex of the rodents captured. The primary hypothesis of this study is that hindfoot length in rodents is normally distributed, regardless of sex and species. To test this hypothesis, Q-Q plots and Shapiro-Wilk tests are used to evaluate normality.

Materials and Methods

  The data for this study was collected at a long-term, experimentally controlled ecological field site. Rodent sampling occurred monthly and was scheduled as close as possible to the new moon to minimize the influence of moonlight on rodent behavior. Sherman traps baited with millet were used to capture rodents. For each rodent captured, detailed records were kept, including species, sex, reproductive condition, weight, and hindfoot length. Twenty-four fenced plots within a 20-hectare study area were used. The rodents of interest were able to selectively access these plots through controlled use of 16 gates surrounding each plot.
  
  The primary focus of this statistical analysis was to determine whether hindfoot length among captured rodents is normally distributed within each sex and species. The data was analyzed using R (v. 4.4.1; The R Foundation for Statistical Computing 2024). The analysis began by extracting the rodent records from ratdat's 'complete' data set and subsetting them to the species, sex, and hindfoot length variables. The sex and hindfoot length variables were then cleaned by excluding blank and 'NA' values. Next, species-sex groups with more than 5000 observations were excluded, because the Shapiro-Wilk test as implemented in R accepts at most 5000 observations. Groups with fewer than 30 observations were also excluded to limit small-sample effects that can contribute to apparent departures from normality. Four species (fulviventer, leucopus, montanus, and sp.) did not meet the sample size requirements in either sex and were excluded, as was intermedius, for which only a small male sample appears in Table 2. Although the female data for fulvescens, merriami, ochrognathus, and taylori met the sample size conditions, the paired male data did not, so these species were also excluded (Table 2). After filtering, a final data set with the variables of interest was created and named 'rodent.final'. Q-Q plots were then created from this final data set to visually inspect departures from normality. Lastly, Shapiro-Wilk tests were conducted to assess the normality of hindfoot length distributions for each species-sex group.
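
  The sample size limits described above can be illustrated with a minimal sketch. It assumes only base R and uses simulated values rather than the Portal data; the helper name can.test is hypothetical and is not part of the analysis code in this report.

```r
# Illustrative only: simulated values, not the Portal data.
set.seed(1)
too.small <- rnorm(2)     # fewer than 3 observations
in.range  <- rnorm(100)   # within the range shapiro.test() accepts
too.large <- rnorm(5001)  # more than 5000 observations

# Hypothetical helper: TRUE if shapiro.test() runs, FALSE if it stops with an error
can.test <- function(x) {
  tryCatch({
    shapiro.test(x)
    TRUE
  }, error = function(e) FALSE)
}

size.check <- c(too.small = can.test(too.small),
                in.range  = can.test(in.range),
                too.large = can.test(too.large))
print(size.check)
```

  Only the middle sample can be tested, which is why groups above 5000 observations have to be dropped or subsampled before a Shapiro-Wilk test. The lower bound of 30 used in this study is a stricter choice made by the analysis, not a limit of the function.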

Analysis and Results

  Upon analysis, the data demonstrated that hindfoot length is not normally distributed in any species or sex examined (Table 3). Visually, the Q-Q plots for all 12 species and both sexes depart from normality (Figures 1-24). For every species, the male and female data both show significant deviations from a normal distribution in the Shapiro-Wilk test at alpha = 0.05. The test statistics for each species are listed below (Table 3).

  albigula: males W = 0.88289, p-value = 5.0275e-18; females W = 0.91707, p-value = 6.1855e-18
  baileyi: males W = 0.63185, p-value = 4.7776e-45; females W = 0.80248, p-value = 7.7402e-41
  eremicus: males W = 0.84331, p-value = 4.2063e-25; females W = 0.84766, p-value = 1.7912e-22
  flavus: males W = 0.81331, p-value = 6.3634e-29; females W = 0.60746, p-value = 1.8354e-37
  hispidus: males W = 0.93745, p-value = 1.5427e-03; females W = 0.95593, p-value = 3.7084e-03
  leucogaster: males W = 0.67827, p-value = 3.5298e-29; females W = 0.90906, p-value = 1.5220e-15
  maniculatus: males W = 0.72891, p-value = 2.7526e-27; females W = 0.75523, p-value = 8.9144e-23
  megalotis: males W = 0.81420, p-value = 3.2387e-36; females W = 0.82943, p-value = 4.2775e-33
  ordii: males W = 0.82882, p-value = 1.0291e-38; females W = 0.79036, p-value = 2.9743e-37
  penicillatus: males W = 0.88578, p-value = 2.3347e-31; females W = 0.79053, p-value = 5.5665e-41
  spectabilis: males W = 0.97248, p-value = 1.7822e-13; females W = 0.96075, p-value = 3.7195e-16
  torridus: males W = 0.83934, p-value = 4.2544e-32; females W = 0.58814, p-value = 9.4814e-44

# Loading data
library(ratdat)
library(kableExtra)
# Listing the ratdat data sets, loading 'complete', and subsetting to rodents
data(package = "ratdat")
data("complete", package = "ratdat")
rodent.data <- subset(complete, taxa == "Rodent")

# Creating a Data Frame with subjects of interest
rodent.subset <- rodent.data[,c("species", "sex", "hindfoot_length")]

# Removing rows with a missing hindfoot length or a missing/blank sex
rodent.clean <- rodent.subset[!is.na(rodent.subset$hindfoot_length) & !is.na(rodent.subset$sex) & rodent.subset$sex != "", ]

# Keeping species-sex groups with 30 <= n <= 5000 (shapiro.test() accepts at most 5000 observations)
species.sex.n <- aggregate(hindfoot_length ~ species + sex, data = rodent.clean, length)
valid.size <- subset(species.sex.n, hindfoot_length >= 30 & hindfoot_length <= 5000)


# Keeping only species for which both sexes have a valid sample size
species.count <- table(valid.size$species)
valid.species <- names(species.count[species.count == 2])
valid.size <- subset(valid.size, species %in% valid.species)
valid.data <- with(rodent.clean, paste(species, sex) %in% paste(valid.size$species, valid.size$sex))

# final data set 
rodent.final <- rodent.clean[valid.data, ]
species.list <- unique(rodent.final$species)
sex.list  <- unique(rodent.final$sex)

# Summary of final data  

kable(valid.size, caption = "Table 1. Final valid list of species to be analyzed") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "center")
Table 1. Final valid list of species to be analyzed
     species       sex  hindfoot_length (n)
1    albigula      F     620
2    baileyi       F    1645
3    eremicus      F     544
4    flavus        F     721
7    hispidus      F      91
8    leucogaster   F     438
10   maniculatus   F     361
11   megalotis     F    1133
15   ordii         F    1244
16   penicillatus  F    1574
18   spectabilis   F    1046
20   torridus      F    1019
21   albigula      M     452
22   baileyi       M    1213
23   eremicus      M     666
24   flavus        M     770
27   hispidus      M      71
29   leucogaster   M     480
31   maniculatus   M     483
32   megalotis     M    1296
36   ordii         M    1640
37   penicillatus  M    1444
39   spectabilis   M    1083
41   torridus      M    1115
# Table 1. List of species with valid sample size to use for QQ Plots and Shapiro Wilk tests

# Table of species excluded from analysis 
invalid.data <- subset(species.sex.n, hindfoot_length < 30 | hindfoot_length > 5000)
kable(invalid.data, caption = "Table 2. List of species excluded from analysis due to invalid sample size") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "center")
Table 2. List of species excluded from analysis due to invalid sample size
     species       sex  hindfoot_length (n)
6    fulviventer   F      16
9    leucopus      F      16
13   montanus      F       4
17   sp.           F       4
25   fulvescens    M      18
26   fulviventer   M      23
28   intermedius   M       8
30   leucopus      M      19
33   merriami      M    5658
34   montanus      M       4
35   ochrognathus  M      11
38   sp.           M       8
40   taylori       M      14
# Drawing Q-Q plots for each species-sex group, six plots per page
plots.per.page <- 6
num.plots <- length(species.list) * length(sex.list)
num.pages <- ceiling(num.plots / plots.per.page)
plot_index <- 1

for (page in seq_len(num.pages)) {
  par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
  for (i in seq_len(plots.per.page)) {
    if (plot_index <= num.plots) {
      # Mapping the running plot index to one species and one sex
      species_index <- (plot_index - 1) %/% length(sex.list) + 1
      sex_index <- (plot_index - 1) %% length(sex.list) + 1
      species <- species.list[species_index]
      sex <- sex.list[sex_index]
      subset_data <- rodent.final[rodent.final$species == species & rodent.final$sex == sex, ]
      if (nrow(subset_data) > 0) {
        qqnorm(subset_data$hindfoot_length, main = paste("Species:", species, "\nSex:", sex), ylab = "Hindfoot Length")
        qqline(subset_data$hindfoot_length)
      } else {
        plot.new()
        text(0.5, 0.5, paste("No data for", species, "\nSex:", sex))
      }
      plot_index <- plot_index + 1
    }
  }
}

# Figures 1 - 24. Q-Q plots of hindfoot length for 12 rodent species, by sex.
# Creating a function that runs the Shapiro Wilk test (SWT) on every species-sex group
SWT <- function(data) {
  species.list <- unique(data$species)
  sex.list <- unique(data$sex)
  # Empty results frame; column names match the rows appended below
  results <- data.frame(species = character(), Sex = character(), W = numeric(), p.value = numeric(), stringsAsFactors = FALSE)

  for (species in species.list) {
    for (sex in sex.list) {
      subset_data <- data[data$species == species & data$sex == sex, ]
      if (nrow(subset_data) > 0) {
        shapiro_test <- shapiro.test(subset_data$hindfoot_length)
        results <- rbind(results, data.frame(species = species, Sex = sex, W = shapiro_test$statistic, p.value = shapiro_test$p.value))
      }
    }
  }
  # Showing p-values in scientific notation (this converts them to character)
  results$p.value <- format(results$p.value, scientific = TRUE)
  return(results)
}

shapiro_results <- SWT(rodent.final)

#Ordering data
results.sorted <- shapiro_results[order(shapiro_results$species),]

# Summary of a Shapiro Wilk test done on all species based on sex with p-values and w statistic.
kable(results.sorted, caption = "Table 3. Shapiro Wilk Test on males and females of 12 Rodent species") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "center")
Table 3. Shapiro Wilk Test on males and females of 12 Rodent species
     species       Sex  W          p.value
W    albigula      M    0.8828873  5.027533e-18
W1   albigula      F    0.9170704  6.185466e-18
W22  baileyi       M    0.6318539  4.777641e-45
W23  baileyi       F    0.8024847  7.740224e-41
W12  eremicus      M    0.8433063  4.206252e-25
W13  eremicus      F    0.8476587  1.791187e-22
W2   flavus        M    0.8133058  6.363432e-29
W3   flavus        F    0.6074554  1.835362e-37
W8   hispidus      M    0.9374484  1.542746e-03
W9   hispidus      F    0.9559254  3.708424e-03
W16  leucogaster   M    0.6782663  3.529778e-29
W17  leucogaster   F    0.9090600  1.521979e-15
W20  maniculatus   M    0.7289100  2.752571e-27
W21  maniculatus   F    0.7552342  8.914373e-23
W18  megalotis     M    0.8142041  3.238716e-36
W19  megalotis     F    0.8294269  4.277482e-33
W14  ordii         M    0.8288210  1.029124e-38
W15  ordii         F    0.7903579  2.974286e-37
W6   penicillatus  M    0.8857755  2.334725e-31
W7   penicillatus  F    0.7905255  5.566473e-41
W4   spectabilis   M    0.9724783  1.782156e-13
W5   spectabilis   F    0.9607492  3.719492e-16
W10  torridus      M    0.8393351  4.254400e-32
W11  torridus      F    0.5881410  9.481430e-44
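
  As a possible alternative to the nested loops in SWT, the per-group tests can be sketched with split() and lapply(). This is only a sketch under stated assumptions: the data frame toy and the names swt.table and reject.normality are hypothetical, and the values are simulated rather than taken from rodent.final.

```r
# Hypothetical stand-in for rodent.final: simulated values, not the Portal data.
set.seed(42)
toy <- data.frame(
  species = rep(c("sp_a", "sp_b"), each = 200),
  sex = rep(c("F", "M"), times = 200),
  hindfoot_length = c(round(rnorm(200, mean = 21, sd = 1.5)),  # roughly symmetric
                      round(rexp(200, rate = 0.1)) + 10)       # right-skewed
)

# Splitting the measurements by species-sex group and testing each group once
groups <- split(toy$hindfoot_length, list(toy$species, toy$sex), drop = TRUE)
tests <- lapply(groups, shapiro.test)

# Collecting group size, W statistic, and p-value into one table
swt.table <- data.frame(
  group = names(tests),
  n = lengths(groups),
  W = vapply(tests, function(res) unname(res$statistic), numeric(1)),
  p.value = vapply(tests, function(res) res$p.value, numeric(1)),
  row.names = NULL
)

# Flagging groups that depart from normality at alpha = 0.05
swt.table$reject.normality <- swt.table$p.value < 0.05
print(swt.table)
```

  Keeping p.value numeric, instead of formatting it as text, lets the table be sorted or compared against alpha directly. Formatting can then be applied only when the table is printed.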

Discussion

  The hypothesis of this study was that hindfoot length in rodents is normally distributed regardless of species and sex. The results consistently showed significant deviations from normality for all species-sex groups examined, which refutes the original hypothesis. These findings were supported by both the Shapiro-Wilk tests and visual inspection of the Q-Q plots, which revealed substantial departures from a normal distribution. Although the data does not meet the normality assumption, it is still important to check other criteria, such as homogeneity of variance using Levene's test, in order to determine which statistical tests are appropriate for further evaluation. Non-parametric tests such as the Mann-Whitney U test could be considered for further analysis, or data transformations could be applied to better meet the normality assumption and allow the use of parametric tests.
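
  The follow-up checks named in this paragraph could be sketched as below. This is not part of the analysis reported here: the data frame toy is simulated, and the median-centred (Brown-Forsythe) form of Levene's test is written out in base R as an assumption about which variant would be used. The car package's leveneTest() is a common packaged alternative.

```r
# Simulated stand-in for one species; values are illustrative, not the Portal data.
set.seed(7)
toy <- data.frame(
  sex = rep(c("F", "M"), each = 150),
  hindfoot_length = c(rlnorm(150, meanlog = 3.00, sdlog = 0.10),
                      rlnorm(150, meanlog = 3.05, sdlog = 0.10))
)

# 1. Levene's test, median-centred (Brown-Forsythe) variant, in base R:
#    a one-way ANOVA on absolute deviations from each group's median.
abs.dev <- abs(toy$hindfoot_length - ave(toy$hindfoot_length, toy$sex, FUN = median))
levene.fit <- anova(lm(abs.dev ~ toy$sex))
levene.p <- levene.fit[["Pr(>F)"]][1]

# 2. Mann-Whitney U test (R calls it the Wilcoxon rank-sum test)
mw <- wilcox.test(hindfoot_length ~ sex, data = toy)

# 3. Log transformation, then a Shapiro-Wilk re-check within each sex
log.p <- tapply(log(toy$hindfoot_length), toy$sex,
                function(x) shapiro.test(x)$p.value)

print(c(levene.p = levene.p, mann.whitney.p = mw$p.value))
print(log.p)
```

  Whether a transformation helps depends on the shape of the real distributions, so the third step should be read as a check to run, not as a guaranteed fix.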
  
  In general, understanding whether data is normally distributed is important for minimizing Type I and Type II errors in statistical analysis. Minimizing these errors can improve the robustness and reliability of ecological studies. Assuming normality when data is in fact not normally distributed could lead to erroneous conclusions about the relationships between hindfoot length and other ecological variables. These findings highlight the importance of testing and validating statistical assumptions in ecological and evolutionary research. By adopting appropriate statistical methods and considering the underlying factors contributing to trait distributions, researchers can make more accurate and meaningful inferences about rodent populations and their ecological and evolutionary dynamics (Sit and Taylor 1998).
  
  
  

References

Ernest, S. K. M., Yenni, G. M., Allington, G., & Christensen, E. (2020). The Portal Project: a long-term study of a Chihuahuan desert ecosystem. Cold Spring Harbor Laboratory, 1–23. https://doi.org/10.1101/332783

Noel, D. D., Gnoan, K., & Alphonse, A. K. (2021). Normality Assessment of Several Quantitative Data Transformation Procedures. Biostatistics and Biometrics, 10(3), 1–15. https://doi.org/10.19080/BBOAJ.2021.10.555786

Sit, V., & Taylor, B. (1998). Statistical Methods for Adaptive Management Studies. Res. Br., B.C. Min. For., Victoria, BC. Land Manage. Handb. No. 42.

Verde Arregoitia, L. D., Fisher, D. O., & Schweizer, M. (2017). Morphology captures diet and locomotor types in rodents. Royal Society Open Science, 1–14. https://doi.org/10.1098/rsos.160957