1 Objectives

The objectives of this analyses is to determine factors affecting the prevalence and infection intensity of the exotic pathogen Bonamia ostreae on the NZ native flat oyster (Ostrea chilensis).

Code

library(tidyverse)
library(readxl)
library(knitr)
library(janitor)
library(sjPlot)
library(DHARMa)
library(broom)
library(ggfortify)
theme_set(theme_minimal())

d <-
  read_excel('Bonamia results for Javier formatted for R.xlsx', 1) %>%
  mutate(PCROs = factor(PCROs),
         Cohort = factor(Cohort),
         APX = factor(APX),
         Bucephalus = if_else(is.na(Buc), 0, 1))
vars <- read_excel('Bonamia results for Javier formatted for R.xlsx', 2)

2 Cohort size

Cohort 1 was smaller than Cohort 2 (mean 62.4 mm ± 7.1 S.D. and 69.3 mm ± 8.4, respectively, Table 1 and Figure 1) and this difference was significant (t-test, t = - 4.26, p<0.001, Table 2).

Code

d %>%
    group_by(Cohort) %>%
    summarise(
        mean = mean(Size),
        sd = sd(Size),
        n = n(),
        se = sd / sqrt(n),
        .groups = 'drop'
    ) %>% 
  kable(digits = 2)

Table 1: Summary stats of size between cohorts
Cohort	mean	sd	n	se
1	62.40	7.11	35	1.20
2	69.32	8.44	60	1.09

Code

ggplot(d, aes(Cohort, Size)) +
  geom_boxplot() +
  labs(y = 'Size (mm)')

Figure 1: Boxplot of oyster size between cohort

Code

t.test(Size ~ Cohort, data = d) %>% 
  tidy() %>% 
  mutate_if(is.numeric, ~ signif(., 3)) %>% 
              kable()

Table 2: Results of t-test comparing mean oyster size between cohorts
estimate	estimate1	estimate2	statistic	p.value	parameter	conf.low	conf.high	method	alternative
-6.92	62.4	69.3	-4.26	5.41e-05	81.2	-10.1	-3.69	Welch Two Sample t-test	two.sided

3 Prevalence of Bonamia, Bucephalus and APX by Cohort

Code

read_excel('Bonamia results for Javier formatted for R.xlsx', 1) %>% 
  mutate(Bucephalus = if_else(is.na(Buc), 0, 1)) %>% 
  pivot_longer(cols = c(PCROs, APX, Bucephalus)) %>% 
  mutate(name = fct_recode(name, `Bonamia ostreae` = "PCROs")) %>% 
  ggplot() +
  geom_bar(aes(x = factor(Cohort), fill = factor(value)),
           alpha = 0.6,
           color = 'gray20',
           position = 'fill') +
  labs(x = "Cohort", y = NULL) +
  scale_fill_discrete(
    name = NULL
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  facet_wrap(~fct_rev(name))

Figure 2: Barplot of prevalence of Bonamia ostraea, APX and Bucephalus in relation to cohort

Figure 2 shows that the prevalence of all three parasites was much higher in cohort 2 compared to cohort 1.

Fisher’s exact test was used to determine if there were an association between parasite prevalence and cohort. It is similar to chi-square test, however the Fisher’s test runs an exact procedure especially for small-sized samples, rather than an approximation assuming a large sample size.

Code

f1 <- 
tabyl(d, APX, Cohort, show_na = F) %>% 
  fisher.test(alternative="two.sided") %>% 
  glance()
f2 <- 
  tabyl(d, Bucephalus, Cohort, show_na = F) %>% 
  fisher.test(alternative="two.sided") %>% 
  glance()
f3 <- 
  tabyl(d, PCROs, Cohort, show_na = F) %>% 
  fisher.test(alternative="two.sided") %>% 
  glance()


bind_rows(APX = f1,Bucephalus = f2, B.ostreae = f3, .id = 'Parasite') %>% 
  select(-alternative, - method) %>% 
  mutate_if(is.numeric, ~signif(.x, 3)) %>% 
  kable()

Table 3: Results of the Fisher’s extact test to test the association between the prevalence of B.ostreae, Bucephalus and APX in relation to cohort’
Parasite	estimate	p.value	conf.low	conf.high
APX	29.0	0.000002	4.27	1240
Bucephalus	10.2	0.008110	1.42	449
B.ostreae	44.1	0.000000	11.60	220

The Fisher;’ exact test confirmed that the differences in prevalence between cohorts were highly significant for all three parasites (p<0.001, Table 3).

4 Logistic regression

Due to the binary nature of the prevalence data (i.e. presence/absence) logistic regression was used to test the effect of cohort, the prevalence of other parasites and % gonadal area. Logistic regression allows to test for for several factor simultaneously and their potential interactions (e.g. Cohort dependent effects of the prevalence of other parasite). In addition, it allows to control for the effects of continuous covariates (e.g. percent gonadal area). Logistic regression models were fitted with cohort and the prevalence of other parasites as a fixed effects with two levels each. The interaction between Cohort and the prevalence of other parasites was also included to test for potential cohort dependant effects. Percentage gonadal areas was fitted as a continuous variable to control for its potential effect on B. ostrae prevalence. The regression coefficients are presented as odds ratios (exponentiated coefficients) that are interpreted as the ratio of the probability of a infection event and the probability of the no infection.

5 Bonamia ostreae prevalence

Table 4 and Figure 3 show the frequency of Bonamia ostreae prevalence in relation to Cohort and APX

5.1 Data exploration

Code

d %>%
  group_by(Cohort, APX) %>%
  count(PCROs, name = "Frequency") %>% 
  kable()

Table 4: Frequency of Bonamia ostreae prevalence in relation to Cohort and APX.
Cohort	APX	PCROs	Frequency
1	0	0	26
1	0	1	8
1	1	0	1
2	0	0	2
2	0	1	30
2	1	0	2
2	1	1	26

Code

ggplot(d) +
  geom_bar(aes(x = Cohort, fill = PCROs),
           alpha = 0.6,
           color = 'gray20',
           position = 'fill') +
  labs(x = "Cohort", y = NULL) +
  scale_fill_discrete(
    name = "Bonamia \nostreae"
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  facet_wrap(~APX, labeller = label_both)

Figure 3: Barplot of prevalence of Bonamia ostreae in relation to cohort and the prevalence of APX

Figure 4 shows the % gonadal area in relation to the prevalence of Bonamia ostreae and APX

Code

ggplot(d,aes(PCROs,  `% gonadal area`, fill = Cohort)) +
  geom_boxplot(alpha = 0.6)

Figure 4: Boxplot of % gonadal area prevalence of Bonamaia ostreae by cohort

5.2 Modelling

Code

m1 <-
  glm(PCROs ~ Cohort * APX + `% gonadal area` + Cohort * factor(Bucephalus) ,
      data = d,
      family = 'binomial')

tab_model(m1, show.ci = F)

Table 5: Results of the logistic regression testing the effects of cohort, APX and % gonadal area on the prevalence of Bonamia ostreae.
	PCR Os
Predictors	Odds Ratios	p
(Intercept)	1.02	0.982
Cohort [2]	54.04	<0.001
APX [1]	0.00	0.995
% gonadal area	0.97	0.229
Bucephalus [1]	0.00	0.994
Cohort [2] * APX [1]	1805017.53	0.995
Cohort [2] * Bucephalus [1]	6029003.66	0.995
Observations	95
R² Tjur	0.541

Table 5 shows that Cohort was the only significant factor on B. ostreae prevalence.

5.3 Model selection

Code

m_bonamia_final <- step(m1,trace = F)
tab_model(m_bonamia_final)

Table 6: Results of the final (most parsimonious) logistic regression model.
	PCR Os
Predictors	Odds Ratios	CI	p
(Intercept)	0.30	0.13 – 0.62	0.003
Cohort [2]	47.25	14.41 – 195.79	<0.001
Observations	95
R² Tjur	0.526

Table 6 shows that the most parsimonious model only retained the effect of Cohort (p<0.001). The model predicted that the odds of having B. ostrae are 47 times higher for cohort 2 compared to cohort 1.

5.4 Model validation

Code

simulationOutput <- simulateResiduals(fittedModel = m_bonamia_final)
plot(simulationOutput, quantreg = FALSE, title = NULL)

Figure 5: Logistic regression model validation

The model was validated by inspecting simulated residuals. Figure 5 shows that the model assumptions were met (over-dispersion and homogeneity of variance).

6 APX prevalence

6.1 Modelling

Code

m_apx <-
  glm(APX ~ Cohort * PCROs + `% gonadal area` + Cohort * factor(Bucephalus) ,
      data = d,
      family = 'binomial')

m_apx_final <- step(m_apx,trace = F)
tab_model(m_apx_final)

Table 7: Results of the logistic regression testing the effects of cohort, Bonamia ostreae, Bucephalus and % gonadal area on the prevalence of APX.
	APX
Predictors	Odds Ratios	CI	p
(Intercept)	0.12	0.01 – 0.74	0.055
Cohort [2]	26.30	4.93 – 490.00	0.002
% gonadal area	0.96	0.93 – 0.99	0.009
Observations	95
R² Tjur	0.314

Table 7 shows that the most parsimonious model retained the effect of Cohort (p<0.01) and gonadal area (p<0.01). The model predicted that the odds of having B. ostrae are 26 times higher for cohort 2 compared to cohort 1 and that larger gonadal area slightly reduced the odds of infection (p<0.01).

6.2 Model validation

Code

simulationOutput_apx <- simulateResiduals(fittedModel = m_apx_final)
plot(simulationOutput_apx, quantreg = FALSE, title = NULL)

Figure 6: APX logistic regression model validation

The model was validated by inspecting simulated residuals. Figure 6 shows that the model assumptions were met (over-dispersion and homogeneity of variance).

7 Bucephalus prevalence

7.1 Modelling

Code

m_buc <-
  glm(
    factor(Bucephalus) ~ Cohort * APX + `% gonadal area` + Cohort * factor(PCROs) ,
    data = d,
    family = 'binomial'
  )

m_buc_final <- step(m_buc, trace = F)
tab_model(m_buc_final)

Table 8: Results of the logistic regression testing the effects of cohort, Bonamia ostreae, APX and % gonadal area on the prevalence of Bucephalus
	factor(Bucephalus)
Predictors	Odds Ratios	CI	p
(Intercept)	0.57	0.03 – 5.56	0.641
Cohort [2]	21.32	1.60 – 673.49	0.037
% gonadal area	0.85	0.77 – 0.91	<0.001
Observations	95
R² Tjur	0.677

Table 8 shows that the most parsimonious model retained the effect of Cohort (p<0.05) and gonadal area (p<0.001). The model predicted that the odds of having B. ostrae are 21 times higher for cohort 2 compared to cohort 1 and that larger gonadal area slightly reduced the odds of infection (p<0.001).

Note the full model summary is not presented for the sake of brevity.

7.2 Model validation

Code

simulationOutput_buc <- simulateResiduals(fittedModel = m_buc_final)
plot(simulationOutput_buc, quantreg = FALSE, title = NULL)

Figure 7: Logistic regression model validation

The model was validated by inspecting simulated residuals. Figure 5 shows that the model assumptions were met (over-dispersion and homogeneity of variance).

8 Modelling summary for all three parasites

Code

tab_model(m_bonamia_final, m_apx_final, m_buc_final, dv.labels = c("Bonamia ostreae", "APX", "Bucephalus"))

	Bonamia ostreae			APX			Bucephalus
Predictors	Odds Ratios	CI	p	Odds Ratios	CI	p	Odds Ratios	CI	p
(Intercept)	0.30	0.13 – 0.62	0.003	0.12	0.01 – 0.74	0.055	0.57	0.03 – 5.56	0.641
Cohort [2]	47.25	14.41 – 195.79	<0.001	26.30	4.93 – 490.00	0.002	21.32	1.60 – 673.49	0.037
% gonadal area				0.96	0.93 – 0.99	0.009	0.85	0.77 – 0.91	<0.001
Observations	95			95			95
R² Tjur	0.526			0.314			0.677

9 Bonamia ostreae infection intensity

WholeGrade data was filtered by `Useable for bonamia intensity`. Figure 8 show the distribution of Bonamia intensity is quite uniform across the observed range (0 - 3). It also shows that some oysters that tested positive did not show histological symptoms (i.e. WholeGrade zero).

Code

d1 <- 
  d %>% 
  dplyr::filter(`Useable for bonamia intensity` =="1") %>% 
    mutate(Cohort = factor(Cohort),
         APX = factor(APX),
         Bucephalus = if_else(is.na(Buc), 0, 1))


ggplot(d1, aes(WholeGrade, fill = PCROs)) +
  geom_histogram()

Figure 8: Histogram of Bonamia intensity

Figure 9 shows that intensity was larger in Cohort two but did not varied considerably with the presence of APX Figure 9 (a) and Bucephalus Figure 9 (b)

Code

ggplot(d1, aes(Cohort, WholeGrade)) +
  geom_boxplot() +
  facet_wrap(~APX, labeller = label_both) +
  theme_minimal()

ggplot(d1, aes(Cohort, WholeGrade)) +
  geom_boxplot() +
  facet_wrap(~Bucephalus, labeller = label_both) +
  theme_minimal()

Figure 10 shows that infection intensity was not related to gonadal area.

Code

ggplot(d1, aes(`% gonadal area`, WholeGrade)) +
  geom_point() +
  theme_minimal()

Figure 10: Scatterplot of % gonaldal area vs. infection intensity

9.1 Linear model

A linear model was used to test the effect of cohort, the prevalence of APX and % gonadal area. Cohort and APX were fitted as a fixed effects with two levels each. Percentage gonadal areas was fitted as a continuous variable to control for its potential effect on B. ostrae prevalence.

Code

lm1 <- lm(log(WholeGrade +0.1) ~ Cohort + APX + `% gonadal area` + Bucephalus , data = d1)
tab_model(lm1)

Table 9: Results of the full linear model to test differences in infection intensity
	log(Whole Grade+0.1)
Predictors	Estimates	CI	p
(Intercept)	-1.83	-2.85 – -0.81	0.001
Cohort [2]	1.58	0.85 – 2.30	<0.001
APX [1]	-0.08	-0.75 – 0.59	0.811
% gonadal area	-0.00	-0.03 – 0.02	0.764
Bucephalus	0.16	-0.86 – 1.18	0.760
Observations	78
R² / R² adjusted	0.251 / 0.210

Table 9 show that there was a significant effect of Cohort on infection intensity (p<0.001), whereas APX, Bucephalus and percentage gonadal area had no effect (p>0.75).

9.2 Model selection

The model was selected using a stepwise procedure based on the AIC. The most parsimonious model only included the effect of Cohort (Table 10). The model predicted a significant increase in infection intensity for oysters in Cohort 2.

Code

lm2 <- step(lm1, trace = F)
tab_model(lm2)

Table 10: Results of the final linear model to test differences in infection intensity
	log(Whole Grade+0.1)
Predictors	Estimates	CI	p
(Intercept)	-1.94	-2.50 – -1.38	<0.001
Cohort [2]	1.58	0.94 – 2.21	<0.001
Observations	78
R² / R² adjusted	0.244 / 0.234

9.3 Model validation

The model was validated by inspecting model residuals, that showed that model assumptions were reasonably met (Figure 11).

Code

autoplot(lm2) + geom_point(position = position_jitter(width = .1))

10 Conclusions

Oysters from Cohort 1 were significantly smaller than those from Cohort 2.
The prevalence of all three parasites, namely B. ostraea, APX and Bucephalus, was significantly higher in Cohort 2 compared to Cohort 1.
The prevalence of APX and Bucephalus was slightly but significantly reduced in oyster with increasing percentage gonadal area.
There was no significant effect of other parasites on all three parasite prevalence.
Infection intensity of B. ostraea followed the same pattern as prevalence , i.e. there was a significant Cohort effect but no effect of APX or gonadal area was detected.