1 Load libaries

library(ggplot2)

2 Measurements external morphology

On the field, measurements were taken from each individual (in mm), respectively:

  • Head and Body (HB)
  • Tail (T)
  • Head (H)
  • Ear (E)
  • Hind foot (HF)

knitr::include_graphics('./measurements_rodent.jpg')

2.1 Get (load) RData file

Three datasets including morphometrics and habitat, helminths and immunological variables of Rattus tanezumi and Rattus R3 have been saved in a RData file

load("Rattus_data.RData")

2.1.1 What is a RData file?

A RData format (with extension .RData) is a format designed for use with R for storing selected “objects” from a workspace in a form that can be loaded back by R

knitr::include_graphics('./rdata.jpg')

2.1.2 What is dataframe?

A dataframe is a type of data structures that can be manipulated by R

knitr::include_graphics('./data_structures.jpg')

2.1.3 Read the dataframe “rattus”

head(rattus,5) # print the "5" first rows

2.1.4 Display the structure of the dataframe using str

str(rattus)
## tibble [987 × 32] (S3: tbl_df/tbl/data.frame)
##  $ id_indiv                  : chr [1:987] "C0021" "C0022" "C0023" "C0024" ...
##  $ taxonomyID                : chr [1:987] "Rattus_r3" "Rattus_r3" "Rattus_r3" "Rattus_r3" ...
##  $ identificationType        : chr [1:987] "coi" "coi" "coi" "coi" ...
##  $ sex                       : chr [1:987] "female" "male" "male" "male" ...
##  $ trappingDate              : POSIXct[1:987], format: "2008-02-29" "2008-02-29" ...
##  $ collector                 : chr [1:987] "morand" "morand" "morand" "morand" ...
##  $ trappingMethod            : chr [1:987] "collected" "collected" "collected" "collected" ...
##  $ trappingAccuracy          : num [1:987] 3 3 3 3 3 3 3 3 1 1 ...
##  $ trappingLandscapeLowRes   : chr [1:987] "lowland" "lowland" "lowland" "lowland" ...
##  $ trappingLandscapeMediumRes: chr [1:987] "rice" "rice" "rice" "rice" ...
##  $ id_site                   : chr [1:987] "veal-renh-0003" "veal-renh-0003" "veal-renh-0003" "veal-renh-0003" ...
##  $ isTrappedAlive            : chr [1:987] "yes" "yes" "yes" "yes" ...
##  $ isDissected               : chr [1:987] "yes" "yes" "yes" "yes" ...
##  $ bodyWeight                : num [1:987] 102 108 103 64 87 111 100 47 107 108 ...
##  $ headBodyMeasurement       : num [1:987] 156 175 172 137 161 157 158 123 165 165 ...
##  $ tailMeasurement           : num [1:987] NA 148 138 140 133 166 NA 110 NA 156 ...
##  $ hindfootMeasurement       : num [1:987] 34 36 33 38 31 34 33 29 29 32 ...
##  $ earMeasurement            : num [1:987] 22 22 21 20 21 23 20 20 19 21 ...
##  $ headMeasurement           : num [1:987] 43 41 40 40 43 42 43 36 45 45 ...
##  $ spleenWeight              : num [1:987] NA NA NA NA NA NA NA NA 170 263 ...
##  $ vagina                    : chr [1:987] "open" NA NA NA ...
##  $ teats                     : chr [1:987] "barely-visible" NA NA NA ...
##  $ mammae                    : chr [1:987] "?+?+?" NA NA NA ...
##  $ embryoLeftSide            : num [1:987] 0 NA NA NA 0 0 0 0 NA 1 ...
##  $ embryoRightSide           : num [1:987] 0 NA NA NA 0 0 0 0 NA 0 ...
##  $ testes                    : chr [1:987] NA "inside" "inside" "inside" ...
##  $ testesLength              : num [1:987] NA 9 8 8 NA NA NA NA 31 NA ...
##  $ seminalVesicle            : chr [1:987] NA "undeveloped" "undeveloped" "undeveloped" ...
##  $ latitude                  : num [1:987] 10.7 10.7 10.7 10.7 10.7 ...
##  $ longitude                 : num [1:987] 104 104 104 104 104 ...
##  $ country                   : chr [1:987] "Cambodia" "Cambodia" "Cambodia" "Cambodia" ...
##  $ province                  : chr [1:987] "Sihanouk" "Sihanouk" "Sihanouk" "Sihanouk" ...

The dataframe “rattus” has 3 data types:

  • character (chr), such as $id_indiv
  • numeric (num), real or decimal, such as $bodyWeight
  • date (POSIXct), such as $ trappingDate

    Other data types are integer, logical (TRUE, FALSE)

2.2 Show distribution body weight (using hist())

hist(rattus$bodyWeight)

2.3 Distribution of body weight using the package “ggplot2

ggplot(data = rattus, mapping = aes(x = bodyWeight))+
  geom_freqpoly(mapping = aes(colour = taxonomyID)) +
  theme_bw()

2.3.1 Distribution of body weight according to sex (here female), filtering using the package “dplyr”

ggplot(data = rattus |>
         dplyr::filter(sex == "female"), mapping = aes(x = bodyWeight))+
  geom_freqpoly(mapping = aes(colour = taxonomyID)) +
  labs(title="Distribution of female body weight")  +
  theme_classic()

2.4 Methods for comparing means:

Methods R function Description
T-test t.test() Compare two groups (parametric)
Wilcoxon test wilcox.test() Compare two groups (non-parametric)
ANOVA aov() or anova() Compare multiple groups (parametric)
Kruskal-Wallis kruskal.test() Compare multiple groups (non-parametric)

2.5 Is the distribution of body weight “normal”? The Shapiro–Wilk test

shapiro.test(rattus$bodyWeight)
## 
##  Shapiro-Wilk normality test
## 
## data:  rattus$bodyWeight
## W = 0.98955, p-value = 2.109e-06

Interpretation

The null-hypothesis of this test is that the population is normally distributed. Thus, if the p value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed.

2.5.1 Violin distribution of body weight using the package “ggpubr”

P value of the Wilcoxon test < 0.001 indicates a significant difference between the mean body weight of Rattus tanezumi and the mean body weight Rattus R3

ggpubr::ggviolin(rattus, x ="taxonomyID" , y = "bodyWeight", fill = "taxonomyID",
         palette = c("#00AFBB",  "#FC4E07"),
         add = "boxplot", add.params = list(fill = "white"))+
  ggpubr::stat_compare_means(comparisons = c("Rattus_tanzumi","Rattus_r3"), label = "p.signif")+ # Add significance levels
  ggpubr::stat_compare_means(label.y = 280)   # Add global the p-value 

2.6 Morphometric analysis

2.6.1 Select morphometric variables using the package “dplyr”

rattus_morpho <- rattus |>
  dplyr::select(bodyWeight, headBodyMeasurement,tailMeasurement,hindfootMeasurement,earMeasurement ,headMeasurement) |>
  tidyr::drop_na()

The function select of package dplyr allows to select variables from a given dataframe. Here we select (bodyWeight, headBodyMeasurement,tailMeasurement,hindfootMeasurement,earMeasurement, headMeasurement) from the dataframe rattus. If the rows with NA should be removed, we use the function drop_na of the package “tidyr”

2.6.2 Correlation and distribution of morphometric variables using the package “PerformanceAnalytics”

PerformanceAnalytics::chart.Correlation(rattus_morpho, histogram=TRUE, pch=19)

Interpretation

All morphometric variables are significantly correlated with each other (p value < 0.001 given with three “*” and R2 from 0.52 to 0.89)

2.6.3 Diagnosis using the package “dlookr”: normality

dlookr::normality(rattus_morpho)

Interpretation

None of the variables follow a normal distribution (p value < 0.001).

Normalization of variables is necessary to perform parametric statistics, otherwise non-parametric tests are possible.

2.6.4 Diagnosis using the package “dlookr”:outlier

dlookr::diagnose_outlier(rattus_morpho)

Interpretation

Several outliers (from 5 individuals for bodyWeight to 29 individuals from tailMeasurement).

Removing outliers may be necessary to perform parametric statistics


3 Geographic distribution

3.1 Create a spatial dataframe using the package “sf”

R_tan_sf  <- rattus |>
  dplyr::select(taxonomyID, longitude, latitude) |>
  tidyr::drop_na() |>
  dplyr::filter(taxonomyID == "Rattus_tanezumi") |>
  sf::st_as_sf(coords = c("longitude", "latitude"), 
               crs= 4326)

R_R3_sf  <- rattus |>
  dplyr::select(taxonomyID, longitude, latitude) |>
  tidyr::drop_na() |>
  dplyr::filter(taxonomyID == "Rattus_r3") |>
  sf::st_as_sf(coords = c("longitude", "latitude"), 
               crs= 4326)

3.1.1 Draw distributon maps of Rattus tanezumi and Rattus R3 using the package “mapview”

R_tan_Map <- mapview::mapview(R_tan_sf,col.regions = "red", color="red", alpha = 0.9,
                 layer.name = "Rattus tanezumi")

R_R3_Map <- mapview::mapview(R_R3_sf,col.regions = "blue", color="blue",alpha = 0.9,
                 layer.name = "Rattus R3")

R_R3_Map
R_tan_Map 

3.1.2 Draw distributon maps of Rattus tanezumi and Rattus R3 using ggplot2

map_rattus <- ggplot() +
  geom_sf(data = R_tan_sf, aes(color = "Rattus_tanezumi"), size = 1) +
  geom_sf(data = R_R3_sf, aes(color = "Rattus_r3"), size = 1) +
  labs(title = "Spatial Distribution of Rattus tanezumi and Rattus R3",
       color = "taxonomyID") +
  theme_minimal()
map_rattus

3.2 Test of differences in longitudinal distribtion

Longitudinal distribution using ggplot2

ggplot(data = rattus, mapping = aes(x = longitude))+
  geom_freqpoly(mapping = aes(colour = taxonomyID)) +
  labs(title="Distribution of longitude distribution") +
  theme_classic() 

Longitudinal distribution using function ggboxplot from package ggpubr (note that individuals collected from Philippines were removed, using the function filter of the package dplyr)

ggpubr::ggboxplot(data = rattus |> 
                    dplyr::filter(longitude < 120),  y = "longitude", x = "taxonomyID",
          color = "taxonomyID", palette = "jco",
          add = "jitter", short.panel.labs = FALSE) + 
  ggpubr::stat_compare_means(comparisons =  c("Rattus_r3", "Rattus_tanezumi"), label.y = 107) +
  ggpubr::stat_compare_means(label = "p.format", 
                        label.x = 1.5)

Interpretation

The null-hypothesis of the test is that the longitudinal distributions of Rattus tanezumi and Rattus R3 significantly differ. The p value of the Wilcoxon test is less than 0.0001, then the null hypothesis is rejected, wich supports that the two taxonomic entities have different longitudinal distribution (note that the test was done for individuals from Indochina, removing individuals from Philippines)

wilTest <- tibble::as_tibble(ggpubr::compare_means(longitude ~ taxonomyID, data = rattus))
knitr::kable(wilTest)
.y. group1 group2 p p.adj p.format p.signif method
longitude Rattus_r3 Rattus_tanezumi 0 0 <2e-16 **** Wilcoxon


4 List of some tutorials