BIOL 458 Final Presentation

Author

Kai Yeung

Investigating the relationship between parasite count / pathogen (BdZE) loads and frogs body size (SVL).

Setting up necessary library and importing data from the two studies

library(tidyverse)

Warning: package 'ggplot2' was built under R version 4.5.2

Warning: package 'purrr' was built under R version 4.5.1

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(patchwork)

Warning: package 'patchwork' was built under R version 4.5.2

library(ggpubr)

Warning: package 'ggpubr' was built under R version 4.5.2

library(mosaic)

Warning: package 'mosaic' was built under R version 4.5.2

Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum

library(ggplot2)
library(boot)


Attaching package: 'boot'

The following object is masked from 'package:mosaic':

    logit

The following object is masked from 'package:lattice':

    melanoma

sfsu_frog <- read.csv("SFSU_FrogsDisease_Data.csv")

anhui_frog_1 <- read.csv("265_2025_3654_MOESM2_ESM.csv")

anhui_frog_2 <- read.csv("265_2025_3654_MOESM4_ESM.csv")

anhui_frog_3 <- read.csv("265_2025_3654_MOESM3_ESM.csv")

Initial Association

Plotting the relationship between SVL and BdZe in the sfsu study

ggplot(data = sfsu_frog,
       mapping = aes(x = SVL, y = BdZE)) + 
        geom_point() + geom_smooth(method = "lm")

`geom_smooth()` using formula = 'y ~ x'

Let’s see if SVL correlates with other call attributes in the SFSU study.

sfsu2 <- ggplot(data = sfsu_frog,
       mapping = aes(x = SVL, y = Call.Rate)) +
         geom_point() + geom_smooth(method = "lm") 

sfsu3 <- ggplot(data = sfsu_frog,
       mapping = aes(x = SVL, y = Call.Effort)) +
         geom_point() + geom_smooth(method = "lm")

sfsu4 <- ggplot(data = sfsu_frog,
       mapping = aes(x = SVL, y = Call.Duty.Cycle )) +
         geom_point() + geom_smooth(method = "lm")

sfsu5 <- ggplot(data = sfsu_frog,
       mapping = aes(x = SVL, y = InterCall.Interval )) +
         geom_point() + geom_smooth(method = "lm")

sfsu2 + sfsu3 + sfsu4 + sfsu5

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

As we can see, only call duty cycle has an obvious correlation with SVL. Call rates and call effort has a flat or slightly upward regression line, while Intercall Interval has a slightly downward trend line.

Wait, is the data normal?

Testing BdZE distribution using Cumulative distribution function

plot(ecdf(sfsu_frog$BdZE))

So BdZE is not normally distributed but exponentially distributed. We can also tell in the following histogram.

hist(sfsu_frog$BdZE)

We can also run the Shapiro-Wilk normality test to see if its normally distributed.

shapiro.test(sfsu_frog$BdZE)


    Shapiro-Wilk normality test

data:  sfsu_frog$BdZE
W = 0.46488, p-value < 2.2e-16

With the p-value less than 0.05, we rejected the hypothesis of normality, meaning BdZe is non-parametric.

Now, let’s see if the other dataset (parasite count vs SVL) is normally distributed.

shapiro.test(anhui_frog_1$parasite)


    Shapiro-Wilk normality test

data:  anhui_frog_1$parasite
W = 0.74822, p-value = 5.111e-10

The other species in the study.

shapiro.test(anhui_frog_2$parasite)


    Shapiro-Wilk normality test

data:  anhui_frog_2$parasite
W = 0.70773, p-value = 6.318e-11

Both of the p-values are less than 0.05, meaning data for parasite is also non-parametric.

Tried to transform the data

log-transformed

sfsu_frog$log_BdZE <- log(sfsu_frog$BdZE)
hist(sfsu_frog$log_BdZE)

Unable to transform the data to normal by log transforming it.

Now try bootstrapping

# function to obtain R-Squared from the data
rsq <- function(formula, data, indices)
{
  d <- data[indices,] # allows boot to select sample
  fit <- lm(formula, data=d)
  return(summary(fit)$r.square)
}
# bootstrapping with 1000 replications
results <- boot(data=sfsu_frog, statistic = rsq,
                R=1000, formula=SVL~BdZE)

# View results
results


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = sfsu_frog, statistic = rsq, R = 1000, formula = SVL ~ 
    BdZE)


Bootstrap Statistics :
       original      bias    std. error
t1* 0.006340548 0.001671743 0.007306354

plot(results)

Spearman’s rank-order correlation test

The Spearman’s rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman’s correlation coefficient, (ρ, also signified by r_s) measures the strength and direction of association between two ranked variables.

corr <- cor.test(x=sfsu_frog$SVL, y=sfsu_frog$BdZE, method = 'spearman')

Warning in cor.test.default(x, y, ...): Cannot compute exact p-value with ties

corr


    Spearman's rank correlation rho

data:  x and y
S = 19980043, p-value = 0.208
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
-0.057337

Lets visualize it.

ggscatter(sfsu_frog, x = "SVL", y = "BdZE",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "spearman",
          xlab = "SVL", ylab = "BdZE")

With a p-value of 0.21, it means the data failed to reject the hypothesis that there are no correlation between SVL and BdZE load. R = -0.057, meaning there is only a tiny, ignorable negative correlation.

Lets see if SVL correlate with parasite count in the other study.

corr2 <- cor.test(x=anhui_frog_1$SVL, y=anhui_frog_1$parasite, method = 'spearman')

Warning in cor.test.default(x, y, ...): Cannot compute exact p-value with ties

corr2


    Spearman's rank correlation rho

data:  x and y
S = 52816, p-value = 0.03143
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.2487081

ggscatter(anhui_frog_1, x = "SVL", y = "parasite",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "spearman",
          xlab = "SVL", ylab = "parasite count")

For the first species in the other study, SVL and parasite count does correlate, with a p-value of less than 0.05. Also, the correlation coefficient R is 0.25. (Note: the coefficient is a number ranging from -1 to 1 that indicates how strongly the two variables are correlated.)

What about the other species?

corr3 <- cor.test(x=anhui_frog_2$SVL, y=anhui_frog_2$parasite, method = 'spearman')

Warning in cor.test.default(x, y, ...): Cannot compute exact p-value with ties

corr3


    Spearman's rank correlation rho

data:  x and y
S = 70794, p-value = 0.9523
alternative hypothesis: true rho is not equal to 0
sample estimates:
         rho 
-0.007023257

ggscatter(anhui_frog_2, x = "SVL", y = "parasite",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "spearman",
          xlab = "SVL", ylab = "parasite count")

Interestingly or rather surprisingly , there is not much of a correlation between SVL and parasite count in the other species in the same study. We were expecting the result would be similar to the first species, however, it is not the case.

Why?