Investigating the relationship between parasite count / pathogen (BdZE) loads and frogs body size (SVL).
Setting up necessary library and importing data from the two studies
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.5.2
Warning: package 'purrr' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 4.0.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(patchwork)
Warning: package 'patchwork' was built under R version 4.5.2
library(ggpubr)
Warning: package 'ggpubr' was built under R version 4.5.2
library(mosaic)
Warning: package 'mosaic' was built under R version 4.5.2
Registered S3 method overwritten by 'mosaic':
method from
fortify.SpatialPolygonsDataFrame ggplot2
The 'mosaic' package masks several functions from core packages in order to add
additional features. The original behavior of these functions should not be affected by this.
Attaching package: 'mosaic'
The following object is masked from 'package:Matrix':
mean
The following objects are masked from 'package:dplyr':
count, do, tally
The following object is masked from 'package:purrr':
cross
The following object is masked from 'package:ggplot2':
stat
The following objects are masked from 'package:stats':
binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
quantile, sd, t.test, var
The following objects are masked from 'package:base':
max, mean, min, prod, range, sample, sum
library(ggplot2)library(boot)
Attaching package: 'boot'
The following object is masked from 'package:mosaic':
logit
The following object is masked from 'package:lattice':
melanoma
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
As we can see, only call duty cycle has an obvious correlation with SVL. Call rates and call effort has a flat or slightly upward regression line, while Intercall Interval has a slightly downward trend line.
Wait, is the data normal?
Testing BdZE distribution using Cumulative distribution function
plot(ecdf(sfsu_frog$BdZE))
So BdZE is not normally distributed but exponentially distributed. We can also tell in the following histogram.
hist(sfsu_frog$BdZE)
We can also run the Shapiro-Wilk normality test to see if its normally distributed.
shapiro.test(sfsu_frog$BdZE)
Shapiro-Wilk normality test
data: sfsu_frog$BdZE
W = 0.46488, p-value < 2.2e-16
With the p-value less than 0.05, we rejected the hypothesis of normality, meaning BdZe is non-parametric.
Now, let’s see if the other dataset (parasite count vs SVL) is normally distributed.
shapiro.test(anhui_frog_1$parasite)
Shapiro-Wilk normality test
data: anhui_frog_1$parasite
W = 0.74822, p-value = 5.111e-10
The other species in the study.
shapiro.test(anhui_frog_2$parasite)
Shapiro-Wilk normality test
data: anhui_frog_2$parasite
W = 0.70773, p-value = 6.318e-11
Both of the p-values are less than 0.05, meaning data for parasite is also non-parametric.
Unable to transform the data to normal by log transforming it.
Now try bootstrapping
# function to obtain R-Squared from the datarsq <-function(formula, data, indices){ d <- data[indices,] # allows boot to select sample fit <-lm(formula, data=d)return(summary(fit)$r.square)}# bootstrapping with 1000 replicationsresults <-boot(data=sfsu_frog, statistic = rsq,R=1000, formula=SVL~BdZE)# View resultsresults
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = sfsu_frog, statistic = rsq, R = 1000, formula = SVL ~
BdZE)
Bootstrap Statistics :
original bias std. error
t1* 0.006340548 0.001671743 0.007306354
plot(results)
Spearman’s rank-order correlation test
The Spearman’s rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman’s correlation coefficient, (ρ, also signified by rs) measures the strength and direction of association between two ranked variables.
Warning in cor.test.default(x, y, ...): Cannot compute exact p-value with ties
corr
Spearman's rank correlation rho
data: x and y
S = 19980043, p-value = 0.208
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.057337
Lets visualize it.
ggscatter(sfsu_frog, x ="SVL", y ="BdZE",add ="reg.line", conf.int =TRUE,cor.coef =TRUE, cor.method ="spearman",xlab ="SVL", ylab ="BdZE")
With a p-value of 0.21, it means the data failed to reject the hypothesis that there are no correlation between SVL and BdZE load. R = -0.057, meaning there is only a tiny, ignorable negative correlation.
Lets see if SVL correlate with parasite count in the other study.
Warning in cor.test.default(x, y, ...): Cannot compute exact p-value with ties
corr2
Spearman's rank correlation rho
data: x and y
S = 52816, p-value = 0.03143
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.2487081
ggscatter(anhui_frog_1, x ="SVL", y ="parasite",add ="reg.line", conf.int =TRUE,cor.coef =TRUE, cor.method ="spearman",xlab ="SVL", ylab ="parasite count")
For the first species in the other study, SVL and parasite count does correlate, with a p-value of less than 0.05. Also, the correlation coefficient R is 0.25. (Note: the coefficient is a number ranging from -1 to 1 that indicates how strongly the two variables are correlated.)
Warning in cor.test.default(x, y, ...): Cannot compute exact p-value with ties
corr3
Spearman's rank correlation rho
data: x and y
S = 70794, p-value = 0.9523
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.007023257
ggscatter(anhui_frog_2, x ="SVL", y ="parasite",add ="reg.line", conf.int =TRUE,cor.coef =TRUE, cor.method ="spearman",xlab ="SVL", ylab ="parasite count")
Interestingly or rather surprisingly , there is not much of a correlation between SVL and parasite count in the other species in the same study. We were expecting the result would be similar to the first species, however, it is not the case.