Survival analysis for Kalloway

Survival analysis to the end of 2023 for oyster ploidy project

Read in the data and create basic tables and graphs

Load the required packages. Using survival package that was updated in August 2023

library(tidyverse)
library(here)
library(survival)
library(ggfortify)
library(ggsurvfit)
library(survminer)
library(gridExtra)
library(frailtyEM)
library(coxme)
library(emmeans)
library(multcomp)
library(knitr)

Read in the mortality data from the field experiments

survdat <- read.csv(here("Data", "MAY_2023_OCT_2024_MORTSBYBAG.csv"))
select <- dplyr::select#other packages are sasking select so set dplyr select as default

Data Wrangling

Correct the data structure

survdat <- survdat %>% mutate_at(c("Site", "Bag_Col", "Month", "Bag_no", "Month_no", "Year", "Month_year"), as.factor)
str(survdat)

'data.frame':   1152 obs. of  8 variables:
 $ Site      : Factor w/ 4 levels "Chelsea","Hood Head",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Bag_Col   : Factor w/ 3 levels "Blue","Green",..: 2 2 2 2 2 2 2 2 1 1 ...
 $ Month     : Factor w/ 7 levels "August","July",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Bag_no    : Factor w/ 75 levels "2","3","8","9",..: 45 54 66 67 70 56 55 7 20 22 ...
 $ Mort_count: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Month_no  : Factor w/ 6 levels "5","6","7","8",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year      : Factor w/ 2 levels "2023","2024": 1 1 1 1 1 1 1 1 1 1 ...
 $ Month_year: Factor w/ 12 levels "August_2023",..: 7 7 7 7 7 7 7 7 7 7 ...

Create a new data sheet that calculates the number of survivors at each sampling event (as opposed to the number of moralities)

# A tibble: 1,248 × 5
   Site    Bag_Col Bag_no Month_survivor           No_survivors
   <fct>   <fct>   <fct>  <fct>                           <dbl>
 1 Chelsea Green   59     April_2023_survivors               50
 2 Chelsea Green   59     May_2023_survivors                 50
 3 Chelsea Green   59     June_2023_survivors                50
 4 Chelsea Green   59     July_2023_survivors                50
 5 Chelsea Green   59     August_2023_survivors              48
 6 Chelsea Green   59     September_2023_survivors           27
 7 Chelsea Green   59     October_2023_survivors             20
 8 Chelsea Green   59     May_2024_survivors                 19
 9 Chelsea Green   59     June_2024_survivors                19
10 Chelsea Green   59     July_2024_survivors                18
# ℹ 1,238 more rows

Double check that all bags are accounted for. There should be 8 bags per colour site per sampling event = 104 as of Nov 2023

`summarise()` has grouped output by 'Site'. You can override using the
`.groups` argument.

# A tibble: 12 × 3
# Groups:   Site [4]
   Site          Bag_Col nbags
   <fct>         <fct>   <int>
 1 Chelsea       Blue      104
 2 Chelsea       Green     104
 3 Chelsea       Red       104
 4 Hood Head     Blue      104
 5 Hood Head     Green     104
 6 Hood Head     Red       104
 7 Manchester    Blue      104
 8 Manchester    Green     104
 9 Manchester    Red       104
10 Thorndyke Bay Blue      104
11 Thorndyke Bay Green     104
12 Thorndyke Bay Red       104

Summarise the number of survivors per site per sampling event

`summarise()` has grouped output by 'Site', 'Bag_Col'. You can override using
the `.groups` argument.

# A tibble: 156 × 5
# Groups:   Site, Bag_Col [12]
   Site    Bag_Col Date       meansurv semsurv
   <fct>   <fct>   <date>        <dbl>   <dbl>
 1 Chelsea Blue    2023-04-01     50     0    
 2 Chelsea Blue    2023-05-01     50     0    
 3 Chelsea Blue    2023-06-01     49.9   0.125
 4 Chelsea Blue    2023-07-01     49.5   0.327
 5 Chelsea Blue    2023-08-01     36.2   2.24 
 6 Chelsea Blue    2023-09-01     28.9   2.55 
 7 Chelsea Blue    2023-10-01     21.8   2.87 
 8 Chelsea Blue    2024-05-01     20.8   2.67 
 9 Chelsea Blue    2024-06-01     20.1   3.13 
10 Chelsea Blue    2024-07-01     17.2   2.95 
# ℹ 146 more rows

Create a basic survival plot that shows the number of survivors at each sampling event

Summarise survivors/mortality at the conclusion of the study

kable(mean_survival_proportion %>% filter(Date == "2024-10-01"))

Site	Bag_Col	Date	meansurv	semsurv
Chelsea	Blue	2024-10-01	11.750	2.9805920
Chelsea	Green	2024-10-01	18.375	3.1619247
Chelsea	Red	2024-10-01	15.625	2.3447319
Hood Head	Blue	2024-10-01	41.250	1.8684409
Hood Head	Green	2024-10-01	43.000	1.5000000
Hood Head	Red	2024-10-01	42.000	1.6903085
Manchester	Blue	2024-10-01	46.750	0.8813545
Manchester	Green	2024-10-01	46.375	0.9437293
Manchester	Red	2024-10-01	47.875	0.5805632
Thorndyke Bay	Blue	2024-10-01	41.625	2.0610114
Thorndyke Bay	Green	2024-10-01	41.125	1.5633926
Thorndyke Bay	Red	2024-10-01	31.625	3.4945749

Need to turn the data into individual survival data. Each oyster should be listed with either alive or dead at each sampling point. Each oyster will need an ID number. 8 Cages per ploidy per site were monitored for mortality so a total of 400 oysters per ploidy per site at the start. Total of 4800 oysters.

This is a very clunky way to do this and needs to be streamlined - a for loop might work?

# A tibble: 4,800 × 17
   Site.x  Bag_Col.x Bag_no.x OYS_ID         April_status May_status June_status
   <fct>   <fct>     <fct>    <chr>                 <dbl>      <dbl>       <dbl>
 1 Chelsea Green     59       ChelseaGreen5…            0          0           0
 2 Chelsea Green     59       ChelseaGreen5…            0          0           0
 3 Chelsea Green     59       ChelseaGreen5…            0          0           0
 4 Chelsea Green     59       ChelseaGreen5…            0          0           0
 5 Chelsea Green     59       ChelseaGreen5…            0          0           0
 6 Chelsea Green     59       ChelseaGreen5…            0          0           0
 7 Chelsea Green     59       ChelseaGreen5…            0          0           0
 8 Chelsea Green     59       ChelseaGreen5…            0          0           0
 9 Chelsea Green     59       ChelseaGreen5…            0          0           0
10 Chelsea Green     59       ChelseaGreen5…            0          0           0
# ℹ 4,790 more rows
# ℹ 10 more variables: July_status <dbl>, August_status <dbl>,
#   September_status <dbl>, October_status <dbl>, May_24_status <dbl>,
#   June_24_status <dbl>, July_24_status <dbl>, August_24_status <dbl>,
#   September_24_status <dbl>, October_24_status <dbl>

Pivot this data longer to get the correct formayt for survival analyses

Survival analyses

Survial plots for pooled data (no random effects of bag number)

Created Kaplan-Meyer survival plots for all data pooled across sites and ploidys. They do not consider the potential random effects of Bag within sites.

Create a survival plot faceted by Site

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggpubr package.
  Please report the issue at <https://github.com/kassambara/ggpubr/issues>.

Warning: ggtheme is not a valid theme.
Please use `theme()` to construct themes.

Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Ignoring unknown labels:
• legend : "ploidy"
• linetype : "1"

Create a survival plot faceted by ploidy

pooled_ploidy_survplot <- survival_dat %>% 
 ggsurvplot_facet(survfit(Surv(month_no, event, type = "right") ~ Site.x , data = .),
            facet.by = "Bag_Col.x",
            data = .,
            risk.table = FALSE, pval = FALSE, conf.int = TRUE,
            xlab = ("Month"),
            xlim = c(4, 19), ylim = c(0.6, 1))

Warning: ggtheme is not a valid theme.
Please use `theme()` to construct themes.

pooled_ploidy_survplot

Ignoring unknown labels:
• linetype : "1"

These plots show that there appeared to be an effect of site on survival throughout summer of 2023

Modeling of survival

We initially modeled survival using a cox mixed effects model to determine if there were differences in survival between ploidies and sites. This did not include any environmental data. We compared models that included a random effect of bag number with those that did not.

Analysis of Deviance Table
 Cox model: response is  surv_field
 Model 1: ~Site.x * Bag_Col.x
 Model 2: ~Site.x * Bag_Col.x + (1 | Bag_no.x)
  loglik  Chisq Df P(>|Chi|)    
1 -12528                        
2 -12406 244.11  1 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model comaprisons (Table 1) showed that the model that included theh random effect of bag was better than the model that did not.

kable(anova(coxph_pooled, coxme_randomeff), digits = 2)

Table 1: Comparison of model that includes random effects and that which does not. No environmental variables are included in this model.

loglik	Chisq	Df	P(>\|Chi\|)
-12527.69	NA	NA	NA
-12405.63	244.11	1	0

We then examined the signifcance of each of the main effects and their inetraction. Since the coxme can’t produce an anova stule output we performed liklihood ratio tests. All interaction and main effects in this model were significant (@LRT_cox_model).

# Full model with interaction (already fitted)
# coxme_randomeff <- coxme(surv_field ~ Site.x * Bag_Col.x + (1 | Bag_no.x), data = survival_dat)

# Additive model with only main effects
coxme_additive <- coxme(
  surv_field ~ Site.x + Bag_Col.x + (1 | Bag_no.x),
  data = survival_dat
)

# Model without Site.x
coxme_no_site <- coxme(
  surv_field ~ Bag_Col.x + (1 | Bag_no.x),
  data = survival_dat
)

# Model without Bag_Col.x
coxme_no_color <- coxme(
  surv_field ~ Site.x + (1 | Bag_no.x),
  data = survival_dat
)

# LRT comparison function with manual df
lrt_compare_manual <- function(full_model, reduced_model, effect_name, df_manual) {
  LRT <- 2 * (logLik(full_model)[1] - logLik(reduced_model)[1])
  p <- pchisq(LRT, df = df_manual, lower.tail = FALSE)
  
  data.frame(
    Effect = effect_name,
    LRT_statistic = round(LRT, 3),
    df = df_manual,
    p_value = signif(p, 4)
  )
}

# Run comparisons with manually specified df
lrt_results <- bind_rows(
  lrt_compare_manual(coxme_randomeff, coxme_additive, "Interaction (Site × Bag Color)", df_manual = 6),
  lrt_compare_manual(coxme_additive, coxme_no_site, "Main effect: Site", df_manual = 3),
  lrt_compare_manual(coxme_additive, coxme_no_color, "Main effect: Bag Color", df_manual = 2)
)

# View results
print(lrt_results)

                                      Effect LRT_statistic df   p_value
Penalized...1 Interaction (Site × Bag Color)        24.370  6 4.463e-04
Penalized...2              Main effect: Site       366.618  3 3.760e-79
Penalized...3         Main effect: Bag Color         2.750  2 2.528e-01

We then conducted pairwise comparisons to understand how mortality differed across sites in this model. We simplified the output to undestand differences between site within ploidies (@pairs_ploidy_noenv) and differences between ploidies across sites (@pairs_site_noenv)

pairwise_coxph_randomeff <- emmeans(coxme_randomeff, ~ Site.x*Bag_Col.x, adjust = "Holm")
pairs_site <- pairs(pairwise_coxph_randomeff, simple = "Site.x", adjust = "Holm")
pairs_ploidy <- pairs(pairwise_coxph_randomeff, simple = "Bag_Col.x", adjust = "Holm")
pairs_all <- pairs(pairwise_coxph_randomeff, adjust = "Holm")

#cld(pairs_all)#Adds compact letter display to show groupings

kable(lrt_results, digits = 2)

Table 2: Likleyhood ration tests to understand the imacts of main and interaction effects in this model.

	Effect	LRT_statistic	df	p_value
Penalized…1	Interaction (Site × Bag Color)	24.37	6	0.00
Penalized…2	Main effect: Site	366.62	3	0.00
Penalized…3	Main effect: Bag Color	2.75	2	0.25

kable(pairs_ploidy, digits = 2)

Table 3: Pairwise comparisons within sites across ploidies

contrast	Site.x	estimate	SE	df	z.ratio	p.value
Blue - Green	Chelsea	0.68	0.24	Inf	2.79	0.02
Blue - Red	Chelsea	0.42	0.18	Inf	2.27	0.05
Green - Red	Chelsea	-0.26	0.26	Inf	-1.01	0.31
Blue - Green	Hood Head	0.33	0.29	Inf	1.14	0.76
Blue - Red	Hood Head	0.33	0.29	Inf	1.14	0.76
Green - Red	Hood Head	0.00	0.30	Inf	0.01	0.99
Blue - Green	Manchester	0.11	0.36	Inf	0.30	0.77
Blue - Red	Manchester	0.92	0.37	Inf	2.46	0.04
Green - Red	Manchester	0.82	0.38	Inf	2.16	0.06
Blue - Green	Thorndyke Bay	-0.06	0.24	Inf	-0.26	0.80
Blue - Red	Thorndyke Bay	-1.11	0.25	Inf	-4.38	0.00
Green - Red	Thorndyke Bay	-1.05	0.26	Inf	-4.02	0.00

At Chelsea blue moratlity was significantly higher than the otehr two groups. Everywhere else red had the highest mortality. At Hood Head, no differences in mortality existed across ploidies, at Manchester Red mortality was significantly higher. than the other two groups. At Thorndyke red mortality was significantly higher than the otehr two groups.

kable(pairs_site, digits = 2)

Table 4: Pairwise comparisons within ploidies across sites

contrast	Bag_Col.x	estimate	SE	df	z.ratio	p.value
Chelsea - Hood Head	Blue	2.55	0.25	Inf	10.26	0.00
Chelsea - Manchester	Blue	3.58	0.28	Inf	12.89	0.00
Chelsea - Thorndyke Bay	Blue	2.79	0.22	Inf	12.49	0.00
Hood Head - Manchester	Blue	1.03	0.31	Inf	3.38	0.00
Hood Head - Thorndyke Bay	Blue	0.25	0.27	Inf	0.91	0.36
Manchester - Thorndyke Bay	Blue	-0.79	0.30	Inf	-2.59	0.02
Chelsea - Hood Head	Green	2.19	0.29	Inf	7.60	0.00
Chelsea - Manchester	Green	3.00	0.33	Inf	9.22	0.00
Chelsea - Thorndyke Bay	Green	2.05	0.27	Inf	7.70	0.00
Hood Head - Manchester	Green	0.81	0.35	Inf	2.32	0.04
Hood Head - Thorndyke Bay	Green	-0.14	0.30	Inf	-0.47	0.64
Manchester - Thorndyke Bay	Green	-0.95	0.34	Inf	-2.84	0.01
Chelsea - Hood Head	Red	2.46	0.25	Inf	9.75	0.00
Chelsea - Manchester	Red	4.08	0.32	Inf	12.78	0.00
Chelsea - Thorndyke Bay	Red	1.26	0.23	Inf	5.43	0.00
Hood Head - Manchester	Red	1.63	0.34	Inf	4.73	0.00
Hood Head - Thorndyke Bay	Red	-1.19	0.26	Inf	-4.59	0.00
Manchester - Thorndyke Bay	Red	-2.82	0.32	Inf	-8.79	0.00

For blue, signifcant differnces existed between all sites exxcept Hood Head and Thorndyke. Same for green. Red had significant diffs in mortality across al sites.