library(tidyverse)
library(here)
library(survival)
library(ggfortify)
library(ggsurvfit)
library(survminer)
library(gridExtra)
library(frailtyEM)
library(coxme)
library(emmeans)
library(multcomp)
library(knitr)Survival analysis for Kalloway
Survival analysis to the end of 2023 for oyster ploidy project
Read in the data and create basic tables and graphs
Load the required packages. Using survival package that was updated in August 2023
Read in the mortality data from the field experiments
survdat <- read.csv(here("Data", "MAY_2023_OCT_2024_MORTSBYBAG.csv"))
select <- dplyr::select#other packages are sasking select so set dplyr select as defaultData Wrangling
Correct the data structure
survdat <- survdat %>% mutate_at(c("Site", "Bag_Col", "Month", "Bag_no", "Month_no", "Year", "Month_year"), as.factor)
str(survdat)'data.frame': 1152 obs. of 8 variables:
$ Site : Factor w/ 4 levels "Chelsea","Hood Head",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Bag_Col : Factor w/ 3 levels "Blue","Green",..: 2 2 2 2 2 2 2 2 1 1 ...
$ Month : Factor w/ 7 levels "August","July",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Bag_no : Factor w/ 75 levels "2","3","8","9",..: 45 54 66 67 70 56 55 7 20 22 ...
$ Mort_count: int 0 0 0 0 0 0 0 0 0 0 ...
$ Month_no : Factor w/ 6 levels "5","6","7","8",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Year : Factor w/ 2 levels "2023","2024": 1 1 1 1 1 1 1 1 1 1 ...
$ Month_year: Factor w/ 12 levels "August_2023",..: 7 7 7 7 7 7 7 7 7 7 ...
Create a new data sheet that calculates the number of survivors at each sampling event (as opposed to the number of moralities)
# A tibble: 1,248 × 5
Site Bag_Col Bag_no Month_survivor No_survivors
<fct> <fct> <fct> <fct> <dbl>
1 Chelsea Green 59 April_2023_survivors 50
2 Chelsea Green 59 May_2023_survivors 50
3 Chelsea Green 59 June_2023_survivors 50
4 Chelsea Green 59 July_2023_survivors 50
5 Chelsea Green 59 August_2023_survivors 48
6 Chelsea Green 59 September_2023_survivors 27
7 Chelsea Green 59 October_2023_survivors 20
8 Chelsea Green 59 May_2024_survivors 19
9 Chelsea Green 59 June_2024_survivors 19
10 Chelsea Green 59 July_2024_survivors 18
# ℹ 1,238 more rows
Double check that all bags are accounted for. There should be 8 bags per colour site per sampling event = 104 as of Nov 2023
`summarise()` has grouped output by 'Site'. You can override using the
`.groups` argument.
# A tibble: 12 × 3
# Groups: Site [4]
Site Bag_Col nbags
<fct> <fct> <int>
1 Chelsea Blue 104
2 Chelsea Green 104
3 Chelsea Red 104
4 Hood Head Blue 104
5 Hood Head Green 104
6 Hood Head Red 104
7 Manchester Blue 104
8 Manchester Green 104
9 Manchester Red 104
10 Thorndyke Bay Blue 104
11 Thorndyke Bay Green 104
12 Thorndyke Bay Red 104
Summarise the number of survivors per site per sampling event
`summarise()` has grouped output by 'Site', 'Bag_Col'. You can override using
the `.groups` argument.
# A tibble: 156 × 5
# Groups: Site, Bag_Col [12]
Site Bag_Col Date meansurv semsurv
<fct> <fct> <date> <dbl> <dbl>
1 Chelsea Blue 2023-04-01 50 0
2 Chelsea Blue 2023-05-01 50 0
3 Chelsea Blue 2023-06-01 49.9 0.125
4 Chelsea Blue 2023-07-01 49.5 0.327
5 Chelsea Blue 2023-08-01 36.2 2.24
6 Chelsea Blue 2023-09-01 28.9 2.55
7 Chelsea Blue 2023-10-01 21.8 2.87
8 Chelsea Blue 2024-05-01 20.8 2.67
9 Chelsea Blue 2024-06-01 20.1 3.13
10 Chelsea Blue 2024-07-01 17.2 2.95
# ℹ 146 more rows
Create a basic survival plot that shows the number of survivors at each sampling event
Summarise survivors/mortality at the conclusion of the study
kable(mean_survival_proportion %>% filter(Date == "2024-10-01"))| Site | Bag_Col | Date | meansurv | semsurv |
|---|---|---|---|---|
| Chelsea | Blue | 2024-10-01 | 11.750 | 2.9805920 |
| Chelsea | Green | 2024-10-01 | 18.375 | 3.1619247 |
| Chelsea | Red | 2024-10-01 | 15.625 | 2.3447319 |
| Hood Head | Blue | 2024-10-01 | 41.250 | 1.8684409 |
| Hood Head | Green | 2024-10-01 | 43.000 | 1.5000000 |
| Hood Head | Red | 2024-10-01 | 42.000 | 1.6903085 |
| Manchester | Blue | 2024-10-01 | 46.750 | 0.8813545 |
| Manchester | Green | 2024-10-01 | 46.375 | 0.9437293 |
| Manchester | Red | 2024-10-01 | 47.875 | 0.5805632 |
| Thorndyke Bay | Blue | 2024-10-01 | 41.625 | 2.0610114 |
| Thorndyke Bay | Green | 2024-10-01 | 41.125 | 1.5633926 |
| Thorndyke Bay | Red | 2024-10-01 | 31.625 | 3.4945749 |
Need to turn the data into individual survival data. Each oyster should be listed with either alive or dead at each sampling point. Each oyster will need an ID number. 8 Cages per ploidy per site were monitored for mortality so a total of 400 oysters per ploidy per site at the start. Total of 4800 oysters.
This is a very clunky way to do this and needs to be streamlined - a for loop might work?
# A tibble: 4,800 × 17
Site.x Bag_Col.x Bag_no.x OYS_ID April_status May_status June_status
<fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
1 Chelsea Green 59 ChelseaGreen5… 0 0 0
2 Chelsea Green 59 ChelseaGreen5… 0 0 0
3 Chelsea Green 59 ChelseaGreen5… 0 0 0
4 Chelsea Green 59 ChelseaGreen5… 0 0 0
5 Chelsea Green 59 ChelseaGreen5… 0 0 0
6 Chelsea Green 59 ChelseaGreen5… 0 0 0
7 Chelsea Green 59 ChelseaGreen5… 0 0 0
8 Chelsea Green 59 ChelseaGreen5… 0 0 0
9 Chelsea Green 59 ChelseaGreen5… 0 0 0
10 Chelsea Green 59 ChelseaGreen5… 0 0 0
# ℹ 4,790 more rows
# ℹ 10 more variables: July_status <dbl>, August_status <dbl>,
# September_status <dbl>, October_status <dbl>, May_24_status <dbl>,
# June_24_status <dbl>, July_24_status <dbl>, August_24_status <dbl>,
# September_24_status <dbl>, October_24_status <dbl>
Pivot this data longer to get the correct formayt for survival analyses
Survival analyses
Survial plots for pooled data (no random effects of bag number)
Created Kaplan-Meyer survival plots for all data pooled across sites and ploidys. They do not consider the potential random effects of Bag within sites.
Create a survival plot faceted by Site
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggpubr package.
Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
Warning: ggtheme is not a valid theme.
Please use `theme()` to construct themes.
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Ignoring unknown labels:
• legend : "ploidy"
• linetype : "1"
Create a survival plot faceted by ploidy
pooled_ploidy_survplot <- survival_dat %>%
ggsurvplot_facet(survfit(Surv(month_no, event, type = "right") ~ Site.x , data = .),
facet.by = "Bag_Col.x",
data = .,
risk.table = FALSE, pval = FALSE, conf.int = TRUE,
xlab = ("Month"),
xlim = c(4, 19), ylim = c(0.6, 1))Warning: ggtheme is not a valid theme.
Please use `theme()` to construct themes.
pooled_ploidy_survplotIgnoring unknown labels:
• linetype : "1"
These plots show that there appeared to be an effect of site on survival throughout summer of 2023
Modeling of survival
We initially modeled survival using a cox mixed effects model to determine if there were differences in survival between ploidies and sites. This did not include any environmental data. We compared models that included a random effect of bag number with those that did not.
Analysis of Deviance Table
Cox model: response is surv_field
Model 1: ~Site.x * Bag_Col.x
Model 2: ~Site.x * Bag_Col.x + (1 | Bag_no.x)
loglik Chisq Df P(>|Chi|)
1 -12528
2 -12406 244.11 1 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model comaprisons (Table 1) showed that the model that included theh random effect of bag was better than the model that did not.
kable(anova(coxph_pooled, coxme_randomeff), digits = 2)| loglik | Chisq | Df | P(>|Chi|) |
|---|---|---|---|
| -12527.69 | NA | NA | NA |
| -12405.63 | 244.11 | 1 | 0 |
We then examined the signifcance of each of the main effects and their inetraction. Since the coxme can’t produce an anova stule output we performed liklihood ratio tests. All interaction and main effects in this model were significant (@LRT_cox_model).
# Full model with interaction (already fitted)
# coxme_randomeff <- coxme(surv_field ~ Site.x * Bag_Col.x + (1 | Bag_no.x), data = survival_dat)
# Additive model with only main effects
coxme_additive <- coxme(
surv_field ~ Site.x + Bag_Col.x + (1 | Bag_no.x),
data = survival_dat
)
# Model without Site.x
coxme_no_site <- coxme(
surv_field ~ Bag_Col.x + (1 | Bag_no.x),
data = survival_dat
)
# Model without Bag_Col.x
coxme_no_color <- coxme(
surv_field ~ Site.x + (1 | Bag_no.x),
data = survival_dat
)
# LRT comparison function with manual df
lrt_compare_manual <- function(full_model, reduced_model, effect_name, df_manual) {
LRT <- 2 * (logLik(full_model)[1] - logLik(reduced_model)[1])
p <- pchisq(LRT, df = df_manual, lower.tail = FALSE)
data.frame(
Effect = effect_name,
LRT_statistic = round(LRT, 3),
df = df_manual,
p_value = signif(p, 4)
)
}
# Run comparisons with manually specified df
lrt_results <- bind_rows(
lrt_compare_manual(coxme_randomeff, coxme_additive, "Interaction (Site × Bag Color)", df_manual = 6),
lrt_compare_manual(coxme_additive, coxme_no_site, "Main effect: Site", df_manual = 3),
lrt_compare_manual(coxme_additive, coxme_no_color, "Main effect: Bag Color", df_manual = 2)
)
# View results
print(lrt_results) Effect LRT_statistic df p_value
Penalized...1 Interaction (Site × Bag Color) 24.370 6 4.463e-04
Penalized...2 Main effect: Site 366.618 3 3.760e-79
Penalized...3 Main effect: Bag Color 2.750 2 2.528e-01
We then conducted pairwise comparisons to understand how mortality differed across sites in this model. We simplified the output to undestand differences between site within ploidies (@pairs_ploidy_noenv) and differences between ploidies across sites (@pairs_site_noenv)
pairwise_coxph_randomeff <- emmeans(coxme_randomeff, ~ Site.x*Bag_Col.x, adjust = "Holm")
pairs_site <- pairs(pairwise_coxph_randomeff, simple = "Site.x", adjust = "Holm")
pairs_ploidy <- pairs(pairwise_coxph_randomeff, simple = "Bag_Col.x", adjust = "Holm")
pairs_all <- pairs(pairwise_coxph_randomeff, adjust = "Holm")
#cld(pairs_all)#Adds compact letter display to show groupingskable(lrt_results, digits = 2)| Effect | LRT_statistic | df | p_value | |
|---|---|---|---|---|
| Penalized…1 | Interaction (Site × Bag Color) | 24.37 | 6 | 0.00 |
| Penalized…2 | Main effect: Site | 366.62 | 3 | 0.00 |
| Penalized…3 | Main effect: Bag Color | 2.75 | 2 | 0.25 |
kable(pairs_ploidy, digits = 2)| contrast | Site.x | estimate | SE | df | z.ratio | p.value |
|---|---|---|---|---|---|---|
| Blue - Green | Chelsea | 0.68 | 0.24 | Inf | 2.79 | 0.02 |
| Blue - Red | Chelsea | 0.42 | 0.18 | Inf | 2.27 | 0.05 |
| Green - Red | Chelsea | -0.26 | 0.26 | Inf | -1.01 | 0.31 |
| Blue - Green | Hood Head | 0.33 | 0.29 | Inf | 1.14 | 0.76 |
| Blue - Red | Hood Head | 0.33 | 0.29 | Inf | 1.14 | 0.76 |
| Green - Red | Hood Head | 0.00 | 0.30 | Inf | 0.01 | 0.99 |
| Blue - Green | Manchester | 0.11 | 0.36 | Inf | 0.30 | 0.77 |
| Blue - Red | Manchester | 0.92 | 0.37 | Inf | 2.46 | 0.04 |
| Green - Red | Manchester | 0.82 | 0.38 | Inf | 2.16 | 0.06 |
| Blue - Green | Thorndyke Bay | -0.06 | 0.24 | Inf | -0.26 | 0.80 |
| Blue - Red | Thorndyke Bay | -1.11 | 0.25 | Inf | -4.38 | 0.00 |
| Green - Red | Thorndyke Bay | -1.05 | 0.26 | Inf | -4.02 | 0.00 |
At Chelsea blue moratlity was significantly higher than the otehr two groups. Everywhere else red had the highest mortality. At Hood Head, no differences in mortality existed across ploidies, at Manchester Red mortality was significantly higher. than the other two groups. At Thorndyke red mortality was significantly higher than the otehr two groups.
kable(pairs_site, digits = 2)| contrast | Bag_Col.x | estimate | SE | df | z.ratio | p.value |
|---|---|---|---|---|---|---|
| Chelsea - Hood Head | Blue | 2.55 | 0.25 | Inf | 10.26 | 0.00 |
| Chelsea - Manchester | Blue | 3.58 | 0.28 | Inf | 12.89 | 0.00 |
| Chelsea - Thorndyke Bay | Blue | 2.79 | 0.22 | Inf | 12.49 | 0.00 |
| Hood Head - Manchester | Blue | 1.03 | 0.31 | Inf | 3.38 | 0.00 |
| Hood Head - Thorndyke Bay | Blue | 0.25 | 0.27 | Inf | 0.91 | 0.36 |
| Manchester - Thorndyke Bay | Blue | -0.79 | 0.30 | Inf | -2.59 | 0.02 |
| Chelsea - Hood Head | Green | 2.19 | 0.29 | Inf | 7.60 | 0.00 |
| Chelsea - Manchester | Green | 3.00 | 0.33 | Inf | 9.22 | 0.00 |
| Chelsea - Thorndyke Bay | Green | 2.05 | 0.27 | Inf | 7.70 | 0.00 |
| Hood Head - Manchester | Green | 0.81 | 0.35 | Inf | 2.32 | 0.04 |
| Hood Head - Thorndyke Bay | Green | -0.14 | 0.30 | Inf | -0.47 | 0.64 |
| Manchester - Thorndyke Bay | Green | -0.95 | 0.34 | Inf | -2.84 | 0.01 |
| Chelsea - Hood Head | Red | 2.46 | 0.25 | Inf | 9.75 | 0.00 |
| Chelsea - Manchester | Red | 4.08 | 0.32 | Inf | 12.78 | 0.00 |
| Chelsea - Thorndyke Bay | Red | 1.26 | 0.23 | Inf | 5.43 | 0.00 |
| Hood Head - Manchester | Red | 1.63 | 0.34 | Inf | 4.73 | 0.00 |
| Hood Head - Thorndyke Bay | Red | -1.19 | 0.26 | Inf | -4.59 | 0.00 |
| Manchester - Thorndyke Bay | Red | -2.82 | 0.32 | Inf | -8.79 | 0.00 |
For blue, signifcant differnces existed between all sites exxcept Hood Head and Thorndyke. Same for green. Red had significant diffs in mortality across al sites.