Survival analysis for Kalloway

Survival analysis to the end of 2023 for oyster ploidy project

Read in the data and create basic tables and graphs

Load the required packages. Using survival package that was updated in August 2023

library(tidyverse)
library(here)
library(survival)
library(ggfortify)
library(ggsurvfit)
library(survminer)
library(gridExtra)
library(frailtyEM)
library(coxme)
library(emmeans)
library(multcomp)
library(knitr)

Read in the mortality data from the field experiments

survdat <- read.csv(here("Data", "MAY_2023_OCT_2024_MORTSBYBAG.csv"))
select <- dplyr::select#other packages are sasking select so set dplyr select as default

Data Wrangling

Correct the data structure

survdat <- survdat %>% mutate_at(c("Site", "Bag_Col", "Month", "Bag_no", "Month_no", "Year", "Month_year"), as.factor)
str(survdat)
'data.frame':   1152 obs. of  8 variables:
 $ Site      : Factor w/ 4 levels "Chelsea","Hood Head",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Bag_Col   : Factor w/ 3 levels "Blue","Green",..: 2 2 2 2 2 2 2 2 1 1 ...
 $ Month     : Factor w/ 7 levels "August","July",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Bag_no    : Factor w/ 75 levels "2","3","8","9",..: 45 54 66 67 70 56 55 7 20 22 ...
 $ Mort_count: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Month_no  : Factor w/ 6 levels "5","6","7","8",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year      : Factor w/ 2 levels "2023","2024": 1 1 1 1 1 1 1 1 1 1 ...
 $ Month_year: Factor w/ 12 levels "August_2023",..: 7 7 7 7 7 7 7 7 7 7 ...

Create a new data sheet that calculates the number of survivors at each sampling event (as opposed to the number of moralities)

# A tibble: 1,248 × 5
   Site    Bag_Col Bag_no Month_survivor           No_survivors
   <fct>   <fct>   <fct>  <fct>                           <dbl>
 1 Chelsea Green   59     April_2023_survivors               50
 2 Chelsea Green   59     May_2023_survivors                 50
 3 Chelsea Green   59     June_2023_survivors                50
 4 Chelsea Green   59     July_2023_survivors                50
 5 Chelsea Green   59     August_2023_survivors              48
 6 Chelsea Green   59     September_2023_survivors           27
 7 Chelsea Green   59     October_2023_survivors             20
 8 Chelsea Green   59     May_2024_survivors                 19
 9 Chelsea Green   59     June_2024_survivors                19
10 Chelsea Green   59     July_2024_survivors                18
# ℹ 1,238 more rows

Double check that all bags are accounted for. There should be 8 bags per colour site per sampling event = 104 as of Nov 2023

`summarise()` has grouped output by 'Site'. You can override using the
`.groups` argument.
# A tibble: 12 × 3
# Groups:   Site [4]
   Site          Bag_Col nbags
   <fct>         <fct>   <int>
 1 Chelsea       Blue      104
 2 Chelsea       Green     104
 3 Chelsea       Red       104
 4 Hood Head     Blue      104
 5 Hood Head     Green     104
 6 Hood Head     Red       104
 7 Manchester    Blue      104
 8 Manchester    Green     104
 9 Manchester    Red       104
10 Thorndyke Bay Blue      104
11 Thorndyke Bay Green     104
12 Thorndyke Bay Red       104

Summarise the number of survivors per site per sampling event

`summarise()` has grouped output by 'Site', 'Bag_Col'. You can override using
the `.groups` argument.
# A tibble: 156 × 5
# Groups:   Site, Bag_Col [12]
   Site    Bag_Col Date       meansurv semsurv
   <fct>   <fct>   <date>        <dbl>   <dbl>
 1 Chelsea Blue    2023-04-01     50     0    
 2 Chelsea Blue    2023-05-01     50     0    
 3 Chelsea Blue    2023-06-01     49.9   0.125
 4 Chelsea Blue    2023-07-01     49.5   0.327
 5 Chelsea Blue    2023-08-01     36.2   2.24 
 6 Chelsea Blue    2023-09-01     28.9   2.55 
 7 Chelsea Blue    2023-10-01     21.8   2.87 
 8 Chelsea Blue    2024-05-01     20.8   2.67 
 9 Chelsea Blue    2024-06-01     20.1   3.13 
10 Chelsea Blue    2024-07-01     17.2   2.95 
# ℹ 146 more rows

Create a basic survival plot that shows the number of survivors at each sampling event

Summarise survivors/mortality at the conclusion of the study

kable(mean_survival_proportion %>% filter(Date == "2024-10-01"))
Site Bag_Col Date meansurv semsurv
Chelsea Blue 2024-10-01 11.750 2.9805920
Chelsea Green 2024-10-01 18.375 3.1619247
Chelsea Red 2024-10-01 15.625 2.3447319
Hood Head Blue 2024-10-01 41.250 1.8684409
Hood Head Green 2024-10-01 43.000 1.5000000
Hood Head Red 2024-10-01 42.000 1.6903085
Manchester Blue 2024-10-01 46.750 0.8813545
Manchester Green 2024-10-01 46.375 0.9437293
Manchester Red 2024-10-01 47.875 0.5805632
Thorndyke Bay Blue 2024-10-01 41.625 2.0610114
Thorndyke Bay Green 2024-10-01 41.125 1.5633926
Thorndyke Bay Red 2024-10-01 31.625 3.4945749

Need to turn the data into individual survival data. Each oyster should be listed with either alive or dead at each sampling point. Each oyster will need an ID number. 8 Cages per ploidy per site were monitored for mortality so a total of 400 oysters per ploidy per site at the start. Total of 4800 oysters.

This is a very clunky way to do this and needs to be streamlined - a for loop might work?

# A tibble: 4,800 × 17
   Site.x  Bag_Col.x Bag_no.x OYS_ID         April_status May_status June_status
   <fct>   <fct>     <fct>    <chr>                 <dbl>      <dbl>       <dbl>
 1 Chelsea Green     59       ChelseaGreen5…            0          0           0
 2 Chelsea Green     59       ChelseaGreen5…            0          0           0
 3 Chelsea Green     59       ChelseaGreen5…            0          0           0
 4 Chelsea Green     59       ChelseaGreen5…            0          0           0
 5 Chelsea Green     59       ChelseaGreen5…            0          0           0
 6 Chelsea Green     59       ChelseaGreen5…            0          0           0
 7 Chelsea Green     59       ChelseaGreen5…            0          0           0
 8 Chelsea Green     59       ChelseaGreen5…            0          0           0
 9 Chelsea Green     59       ChelseaGreen5…            0          0           0
10 Chelsea Green     59       ChelseaGreen5…            0          0           0
# ℹ 4,790 more rows
# ℹ 10 more variables: July_status <dbl>, August_status <dbl>,
#   September_status <dbl>, October_status <dbl>, May_24_status <dbl>,
#   June_24_status <dbl>, July_24_status <dbl>, August_24_status <dbl>,
#   September_24_status <dbl>, October_24_status <dbl>

Pivot this data longer to get the correct formayt for survival analyses

Survival analyses

Survial plots for pooled data (no random effects of bag number)

Created Kaplan-Meyer survival plots for all data pooled across sites and ploidys. They do not consider the potential random effects of Bag within sites.

Create a survival plot faceted by Site

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggpubr package.
  Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
Warning: ggtheme is not a valid theme.
Please use `theme()` to construct themes.
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Ignoring unknown labels:
• legend : "ploidy"
• linetype : "1"

Create a survival plot faceted by ploidy

pooled_ploidy_survplot <- survival_dat %>% 
 ggsurvplot_facet(survfit(Surv(month_no, event, type = "right") ~ Site.x , data = .),
            facet.by = "Bag_Col.x",
            data = .,
            risk.table = FALSE, pval = FALSE, conf.int = TRUE,
            xlab = ("Month"),
            xlim = c(4, 19), ylim = c(0.6, 1))
Warning: ggtheme is not a valid theme.
Please use `theme()` to construct themes.
pooled_ploidy_survplot
Ignoring unknown labels:
• linetype : "1"

These plots show that there appeared to be an effect of site on survival throughout summer of 2023

Modeling of survival

We initially modeled survival using a cox mixed effects model to determine if there were differences in survival between ploidies and sites. This did not include any environmental data. We compared models that included a random effect of bag number with those that did not.

Analysis of Deviance Table
 Cox model: response is  surv_field
 Model 1: ~Site.x * Bag_Col.x
 Model 2: ~Site.x * Bag_Col.x + (1 | Bag_no.x)
  loglik  Chisq Df P(>|Chi|)    
1 -12528                        
2 -12406 244.11  1 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model comaprisons (Table 1) showed that the model that included theh random effect of bag was better than the model that did not.

kable(anova(coxph_pooled, coxme_randomeff), digits = 2)
Table 1: Comparison of model that includes random effects and that which does not. No environmental variables are included in this model.
loglik Chisq Df P(>|Chi|)
-12527.69 NA NA NA
-12405.63 244.11 1 0

We then examined the signifcance of each of the main effects and their inetraction. Since the coxme can’t produce an anova stule output we performed liklihood ratio tests. All interaction and main effects in this model were significant (@LRT_cox_model).

# Full model with interaction (already fitted)
# coxme_randomeff <- coxme(surv_field ~ Site.x * Bag_Col.x + (1 | Bag_no.x), data = survival_dat)

# Additive model with only main effects
coxme_additive <- coxme(
  surv_field ~ Site.x + Bag_Col.x + (1 | Bag_no.x),
  data = survival_dat
)

# Model without Site.x
coxme_no_site <- coxme(
  surv_field ~ Bag_Col.x + (1 | Bag_no.x),
  data = survival_dat
)

# Model without Bag_Col.x
coxme_no_color <- coxme(
  surv_field ~ Site.x + (1 | Bag_no.x),
  data = survival_dat
)

# LRT comparison function with manual df
lrt_compare_manual <- function(full_model, reduced_model, effect_name, df_manual) {
  LRT <- 2 * (logLik(full_model)[1] - logLik(reduced_model)[1])
  p <- pchisq(LRT, df = df_manual, lower.tail = FALSE)
  
  data.frame(
    Effect = effect_name,
    LRT_statistic = round(LRT, 3),
    df = df_manual,
    p_value = signif(p, 4)
  )
}

# Run comparisons with manually specified df
lrt_results <- bind_rows(
  lrt_compare_manual(coxme_randomeff, coxme_additive, "Interaction (Site × Bag Color)", df_manual = 6),
  lrt_compare_manual(coxme_additive, coxme_no_site, "Main effect: Site", df_manual = 3),
  lrt_compare_manual(coxme_additive, coxme_no_color, "Main effect: Bag Color", df_manual = 2)
)

# View results
print(lrt_results)
                                      Effect LRT_statistic df   p_value
Penalized...1 Interaction (Site × Bag Color)        24.370  6 4.463e-04
Penalized...2              Main effect: Site       366.618  3 3.760e-79
Penalized...3         Main effect: Bag Color         2.750  2 2.528e-01

We then conducted pairwise comparisons to understand how mortality differed across sites in this model. We simplified the output to undestand differences between site within ploidies (@pairs_ploidy_noenv) and differences between ploidies across sites (@pairs_site_noenv)

pairwise_coxph_randomeff <- emmeans(coxme_randomeff, ~ Site.x*Bag_Col.x, adjust = "Holm")
pairs_site <- pairs(pairwise_coxph_randomeff, simple = "Site.x", adjust = "Holm")
pairs_ploidy <- pairs(pairwise_coxph_randomeff, simple = "Bag_Col.x", adjust = "Holm")
pairs_all <- pairs(pairwise_coxph_randomeff, adjust = "Holm")

#cld(pairs_all)#Adds compact letter display to show groupings
kable(lrt_results, digits = 2)
Table 2: Likleyhood ration tests to understand the imacts of main and interaction effects in this model.
Effect LRT_statistic df p_value
Penalized…1 Interaction (Site × Bag Color) 24.37 6 0.00
Penalized…2 Main effect: Site 366.62 3 0.00
Penalized…3 Main effect: Bag Color 2.75 2 0.25
kable(pairs_ploidy, digits = 2)
Table 3: Pairwise comparisons within sites across ploidies
contrast Site.x estimate SE df z.ratio p.value
Blue - Green Chelsea 0.68 0.24 Inf 2.79 0.02
Blue - Red Chelsea 0.42 0.18 Inf 2.27 0.05
Green - Red Chelsea -0.26 0.26 Inf -1.01 0.31
Blue - Green Hood Head 0.33 0.29 Inf 1.14 0.76
Blue - Red Hood Head 0.33 0.29 Inf 1.14 0.76
Green - Red Hood Head 0.00 0.30 Inf 0.01 0.99
Blue - Green Manchester 0.11 0.36 Inf 0.30 0.77
Blue - Red Manchester 0.92 0.37 Inf 2.46 0.04
Green - Red Manchester 0.82 0.38 Inf 2.16 0.06
Blue - Green Thorndyke Bay -0.06 0.24 Inf -0.26 0.80
Blue - Red Thorndyke Bay -1.11 0.25 Inf -4.38 0.00
Green - Red Thorndyke Bay -1.05 0.26 Inf -4.02 0.00

At Chelsea blue moratlity was significantly higher than the otehr two groups. Everywhere else red had the highest mortality. At Hood Head, no differences in mortality existed across ploidies, at Manchester Red mortality was significantly higher. than the other two groups. At Thorndyke red mortality was significantly higher than the otehr two groups.

kable(pairs_site, digits = 2)
Table 4: Pairwise comparisons within ploidies across sites
contrast Bag_Col.x estimate SE df z.ratio p.value
Chelsea - Hood Head Blue 2.55 0.25 Inf 10.26 0.00
Chelsea - Manchester Blue 3.58 0.28 Inf 12.89 0.00
Chelsea - Thorndyke Bay Blue 2.79 0.22 Inf 12.49 0.00
Hood Head - Manchester Blue 1.03 0.31 Inf 3.38 0.00
Hood Head - Thorndyke Bay Blue 0.25 0.27 Inf 0.91 0.36
Manchester - Thorndyke Bay Blue -0.79 0.30 Inf -2.59 0.02
Chelsea - Hood Head Green 2.19 0.29 Inf 7.60 0.00
Chelsea - Manchester Green 3.00 0.33 Inf 9.22 0.00
Chelsea - Thorndyke Bay Green 2.05 0.27 Inf 7.70 0.00
Hood Head - Manchester Green 0.81 0.35 Inf 2.32 0.04
Hood Head - Thorndyke Bay Green -0.14 0.30 Inf -0.47 0.64
Manchester - Thorndyke Bay Green -0.95 0.34 Inf -2.84 0.01
Chelsea - Hood Head Red 2.46 0.25 Inf 9.75 0.00
Chelsea - Manchester Red 4.08 0.32 Inf 12.78 0.00
Chelsea - Thorndyke Bay Red 1.26 0.23 Inf 5.43 0.00
Hood Head - Manchester Red 1.63 0.34 Inf 4.73 0.00
Hood Head - Thorndyke Bay Red -1.19 0.26 Inf -4.59 0.00
Manchester - Thorndyke Bay Red -2.82 0.32 Inf -8.79 0.00

For blue, signifcant differnces existed between all sites exxcept Hood Head and Thorndyke. Same for green. Red had significant diffs in mortality across al sites.