I am conducting a reciprocal transplant study to test for local adaptation between two monkeyflower species in distinct microhabitats at a sympatric site on the central coast of California. My project involves analyzing germination rates, seedling survival of the two species, and reproductive success, and if possible relating those outcomes to weekly soil moisture data.
Seeded toothpicks were planted and surveyed weekly for roughly one year. Study design:
10 plots in each of 2 microhabitat environments (rock outcrop and seep)
2 monkeyflower species (annual and perennial)
9 lineages per species
6 replicates of each lineage were randomly allocated to positions in each plot
In total, this amounts to 108 toothpicks (6 reps x 9 lineages x 2 species) per plot and a total sample size of \(n = 2160\) (108 per plot x 10 plots x 2 microhabitats). Primary interest is in species/environment interaction effects on fitness.
Goals set during initial meeting:
Formulate an analysis-ready rearrangment of seed plot data.
Recommend an approach for analysis of germination rates.
Note
Questions/issues needing client’s attention are shown throughout like this.
Seed plot data
Data provided are shown below. I removed plot index A4_10_3, which was missing identifying information for species and lineage. I suspect this may have been miscoded, since after removal \(n = 2159\), which is one short of expected based on the study design.
Note
Check plot coding for index A4_10_3. Was this miscoded?
For the three individuals shown, values are missing in all weeks because the toothpicks never germinated.
Table: raw data (continued below)
species
mom
dad
envname
family
habit
color
LLP
9 G1
11 G1
LLP 9 G1 x LLP 11 G1
P9x11
P
WhitePink
LLP
6 G1
20 G1
LLP 6 G1 x LLP 20 G1
P6x20
P
WhiteBlue
LLP
15 G1
1 G1
LLP 15 G1 x LLP 1 G1
P15x1
P
WhiteBlack
Table continues below
location
replicate
plotind
plot
famabbrev
position
flower?
rock_outcrop
1
P1_1_1
1
P1
1
NA
rock_outcrop
1
P7_1_1
1
P7
2
NA
rock_outcrop
1
P5_1_1
1
P5
3
NA
Table continues below
week1
week2
week3
week4
week5
week6
week7
week8
week9
week10
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
Table continues below
week11
week12
week13
week14
week15
week16
week17
week18
week19
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
Table continues below
week20
week22
week24
week26
week28
week30
week33
week35
week39
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
Table continues below
week42
week44
week46
week48
week50
week52
week54
week56
week58
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
week60
week63
week65
NA
NA
NA
NA
NA
NA
NA
NA
NA
To verify that the data reflect the study design, the following tables show distinct combinations of study attributes and sample sizes per plot:
Table: Number of unique lineages per location/species
location
LLA
LLP
rock_outcrop
9
9
seep
9
9
Table: Sample sizes per plot
plot
rock_outcrop
seep
1
108
107
2
108
108
3
108
108
4
108
108
5
108
108
6
108
108
7
108
108
8
108
108
9
108
108
10
108
108
The values entered for each individual each week are:
missing if ungerminated, dead, or otherwise not present
1 if germinated
2 if flowering, with number of repetitions indicating number of flowers (e.g., 222 means three flowers)
3 if alive post-flowering
4 if fruiting, with number of repetitions indicating number of fruits (e.g., 44 means two fruits)
X if missing for reasons other than death
combinations as appropriate (e.g., 222 4 indicates three flowers and one fruit)
The unique encodings present in the data are shown below:
The decimals are artefacts of value encoding in excel when the entry can be parsed as numeric. So for example 2.22222222E8 is nine 2’s rewritten in scientific notation (\(2.22222222 \times 10^8\)).
Reformatting for analysis
I’ve cleaned up the encoding and data formatting for analysis as follows:
rearrange in ‘long’ format with a variable indicating week number
append sampling dates from the CensusDate sheet
derive a stage variable encoding life stage achieved by the time of observation:
ungerminated if no values were entered prior to the observation date
germinated if a ‘1’ is recorded and no other values were entered prior to the observation date
flowered if a ‘2’ was entered prior to or on the observation date
fruited if a ‘4’ was entered prior to or on the observation date
died if value is missing on the current observation date but at least one nonmissing value was entered prior to the observation date
NA if an ‘X’ was entered prior to or on the observation date
derive counts of numbers of fruits and flowers
add a unique identifier id per toothpick
Table: Example rows of analysis-ready data (continued below)
id
species
family
location
plot
replicate
week
date
1
LLP
P9x11
rock_outcrop
1
1
week1
2024-01-25
1
LLP
P9x11
rock_outcrop
1
1
week2
2024-02-01
1
LLP
P9x11
rock_outcrop
1
1
week3
2024-02-08
1
LLP
P9x11
rock_outcrop
1
1
week4
2024-02-15
1
LLP
P9x11
rock_outcrop
1
1
week5
2024-02-22
1
LLP
P9x11
rock_outcrop
1
1
week6
2024-02-29
stage
n.flower
n.fruit
ungerminated
0
0
ungerminated
0
0
ungerminated
0
0
ungerminated
0
0
ungerminated
0
0
ungerminated
0
0
I’ve provided the data in this form to client as seedplot.csv for direct import into R. I’d recommend explicitly encoding the life stage variable in the correct order, as shown below.
I considered adding a curr.stage or curr.status variable that is the same as stage but indicates current life stage at the time of observation: ungerminated, germinated, flowering, fruiting, living, dead.
This information is captured implicitly by the combination of flower/fruit counts and the existing stage variable, but would it be useful to encode it explicitly?
Descriptive analysis
Comparing progressions through life stages shows:
almost all germination occurs by 2/15/2024
slightly higher germination rate for perennial compared with annual species
no survival at rock outcrop after 6/25/2024
perennials generally don’t flower/fruit except in seep environment after 6/25
one rock outcrop plot missing, but otherwise few missing values
Code
# life stage over timeseedplot |>count(date, stage, location, species) |>ggplot(aes(x =factor(date), y = n, fill = stage)) +facet_grid(location ~ species) +geom_bar(position ='fill', stat ='identity', alpha =0.9) +scale_fill_grafify() +scale_y_continuous(expand =rep(0, 2)) +theme_bw() +theme(panel.grid =element_blank(), panel.spacing =unit(0.15, 'in'),axis.text.x =element_text(angle =90, vjust =0.5)) +labs(x ='date', y ='proportion of individuals')
Figure: Life stage progression by location and species (missing data removed)
Here’s the same figure but with normalization after removing missing data:
Code
# life stage over timeseedplot |>count(date, stage, location, species) |>drop_na(stage) |>ggplot(aes(x =factor(date), y = n, fill = stage)) +facet_grid(location ~ species) +geom_bar(position ='fill', stat ='identity', alpha =0.9) +scale_fill_grafify() +scale_y_continuous(expand =rep(0, 2)) +theme_bw() +theme(panel.grid =element_blank(), panel.spacing =unit(0.15, 'in'),axis.text.x =element_text(angle =90, vjust =0.5)) +labs(x ='date', y ='proportion of individuals')
Figure: Life stage progression by location and species (missing data removed)
Analysis of rate data
Analysis of event occurrence rates aggregated over the study period models the probability that the individual germinated/flowered/fruited at any point during the study period, possibly conditional on prior life stages. For this, individuals are dropped if missing throughout the study, but if nonmissing observations were recorded for any period of time, the individual is retained through the last stage for which values are recorded.
The framework for this analysis is a generalized linear mixed model for the relevant event probability.
Germination rates
I’ve added an indicator of whether germination occurred at some time during the study. Here are some examples of the indicator encoding:
The figure below shows germination rates (aggregated over plots) by lineage, location, and species, with the location/species average overlaid as a vertical dashed line.
Figure: Germination rates by species, lineage, and location; average shown in blue dash.
My observations:
no apparent location effect on germination rates in aggregate, but possible location/lineage interactions
usually slightly higher rates in seep environments, but not uniformly; e.g., P9x11 does better in rock outcrop)
if there are location/lineage interactions, the location effect may not be significant; few differences in germination rates exceeding 5-10%
apparent species effect – higher germination rates for perennials
Interestingly, there is a bit of a plot effect and a possible plot/species interaction. This suggests that there may be unaccounted-for microhabitat features that influence fitness.
Figure: Plot effects and plot/species interactions
Analysis ignoring lineage
Let \(Y_{ijkl}\) denote the whether germination occurred for the \(l\)th replicate of species \(i\) in plot \(k\) in microhabitat type \(j\). A model for the data, ignoring any effect of lineage, is:
plot (nested within microhabitat) \(\text{plot}_{k(j)}\)
plot/species interaction (nested within microhabitat) \(\left(\text{plot}\times\text{spp}\right)_{ik(j)}\)
To fit the model, use lme4::glmer:
Code
# fit modelfit <-glmer(germ ~ species*location + (1| plot:location/species),data = germ, family = binomial)
This produces the inference:
Code
# tests for fixed effectscar::Anova(fit, type ='II') |>pander(caption ='Table: Type II Wald tests for fixed effects')
Table: Type II Wald tests for fixed effects
Chisq
Df
Pr(>Chisq)
species
5.631
1
0.01765
location
0.002503
1
0.9601
species:location
0.0118
1
0.9135
Code
# test for plot effectglm(germ ~ species*location,data = germ, family = binomial) %>%anova(fit, ., type ='LRT') |>pander(caption ='Table: Likelihood ratio test for plot effect')
Table: Likelihood ratio test for plot effect
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
4
2795
2818
-1394
2787
NA
NA
NA
fit
6
2631
2665
-1310
2619
168.1
2
3.146e-37
Code
# test for plot/species interactionglmer(germ ~ species*location + (1| plot:location),data = germ, family = binomial) %>%anova(fit, ., type ='LRT') |>pander(caption ='Table: Likelihood ratio test for plot/species interaction')
Table: Likelihood ratio test for plot/species interaction
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
5
2630
2658
-1310
2620
NA
NA
NA
fit
6
2631
2665
-1310
2619
0.2977
1
0.5853
In summary:
the data provide evidence of a species effect (\(\chi^2_1 = 5.631, p = 0.01765\)) and a plot effect (\(\chi^2_2 = 168.1, p < 0.0001\))
the data do not provide evidence of a location effect or a species/plot interaction
Adjusting for lineage
Let \(Y_{ijklm}\) denote the whether germination occurred for the \(m\)th replicate of lineage \(l\) of species \(i\) in plot \(k\) in microhabitat type \(j\). Agumenting the model above to account for lineage yields:
plot (nested within microhabitat) \(\text{plot}_{k(j)}\)
plot/species interaction (nested within microhabitat) \(\left(\text{plot}\times\text{spp}\right)_{ik(j)}\)
lineage (nested within species) \(\text{lin}_{l(i)}\)
To fit the model:
Code
# fit modelfit_lin <-glmer(germ ~ species*location + (1| plot:location/species) + (1| family:species),data = germ, family = binomial)
By comparison with the initial model, there is evidence of an effect of lineage:
Code
# test for effect of lineageanova(fit_lin, fit) |>pander(caption ='Table: Likelihood ratio test for lineage effect')
Table: Likelihood ratio test for lineage effect
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
fit
6
2631
2665
-1310
2619
NA
NA
NA
fit_lin
7
2619
2659
-1303
2605
14.16
1
0.0001677
And interestingly, the effect of species vanishes when lineage is accounted for:
Code
# tests for fixed effectscar::Anova(fit_lin, type ='II') |>pander(caption ='Table: Type II Wald tests for fixed effects')
Table: Type II Wald tests for fixed effects
Chisq
Df
Pr(>Chisq)
species
2.139
1
0.1436
location
0.002618
1
0.9592
species:location
0.01313
1
0.9088
Inference on the other fixed effects – location and location/species interactions – are unchanged. Likewise for inferences on random effects – plot and plot/species interaction.
Table: Likelihood ratio test for plot effect
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
5
2787
2816
-1389
2777
NA
NA
NA
fit_lin
7
2619
2659
-1303
2605
172.2
2
3.985e-38
Table: Likelihood ratio test for plot/species interaction
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
2618
2651
-1303
2606
NA
NA
NA
fit_lin
7
2619
2659
-1303
2605
0.4603
1
0.4975
In summary:
the data provide evidence of a lineage effect (\(\chi^2_1 = 14.16, p = 0.0001677\)) and a plot effect (\(\chi^2_2 = 172.2, p < 0.0001\))
the data do not provide evidence of a species effect, a location effect, or location/species or plot/species interactions
As an aside, I also tested for lineage interactions with plot and location (separately), and neither were significant. These are shown below.
Table: Likelihood ratio test for lineage/plot interaction
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
fit_lin
7
2619
2659
-1303
2605
NA
NA
NA
.
8
2621
2666
-1302
2605
0.3193
1
0.572
Table: Likelihood ratio test for lineage/location interaction
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
fit_lin
7
2619
2659
-1303
2605
NA
NA
NA
.
8
2620
2665
-1302
2604
0.7485
1
0.3869
My recommendation would be to omit interaction terms involving lineage for model parsimony, but a full model could include both interactions above. This may be desirable for interpretability, but won’t change the inferences about effects. I checked but suppressed the analyses – codes are shown below.
Code
# full model with all interactionsfit_full <-glmer(germ ~ species*location + (1| plot:location/species) + (1| family:species/location) + (1| family:species:location:plot),data = germ, family = binomial) # inference for fixed effectscar::Anova(fit_full, type ='II')# inference for random effects: lineage/plot interactionglmer(germ ~ species*location + (1| plot:location/species) + (1| family:species/location),data = germ, family = binomial) %>%anova(fit_full, .)# inference for random effects: lineage/location interactionglmer(germ ~ species*location + (1| plot:location/species) + (1| family:species) + (1| family:species:location:plot),data = germ, family = binomial) %>%anova(fit_full, .)# inference for random effects: lineage effectglmer(germ ~ species*location + (1| plot:location/species),data = germ, family = binomial) %>%anova(fit_full, .)# inference for random effects: plot effectglmer(germ ~ species*location + (1| family:species/location),data = germ, family = binomial) %>%anova(fit_full, .)# inference for random effects: plot/species interactionglmer(germ ~ species*location + (1| plot:location) + (1| family:species),data = germ, family = binomial) %>%anova(fit_full, .)
Flowering rates
Flowering rates could be computed in one of two ways:
Unconditional on germination, i.e., rate at which all toothpicks flowered
Conditional on germination, i.e., rate at which germinated toothpicks flowered
I’ve carried out parallel analyses using both rates. Personally, I think the conditional rates make more sense for the analysis.
Unconditional flowering rates
Below are comparisons of the unconditional flowering rates.
Figure: Unconditional flowering rates by species, lineage, and location; average shown in blue dash.
Since some families have unconditional flowering rates of zero, it will not be possible to estimate lineage interactions. The analysis below uses the same model specification as that employed in the analysis of germination rates. (Codes are also identical.)
Inference for fixed effects indicates there is a species effect on unconditional flowering rates:
Code
# fit modelfit_flow <-glmer(flow ~ species*location + (1| plot:location/species) + (1| family:species),data = flow, family = binomial)
Table: Inference for fixed effects
Chisq
Df
Pr(>Chisq)
species
15.8
1
7.025e-05
location
0.1033
1
0.7479
species:location
3.173
1
0.07489
Inference for random effects indicates a lineage effect, a plot effect, and a species/plot interaction. Codes for these analyses are identical to those shown in the previous subsection on analysis of germination rates.
Table: Likelihood ratio test for lineage effect on unconditional flowering rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
1416
1450
-701.9
1404
NA
NA
NA
fit_flow
7
1400
1439
-692.9
1386
17.98
1
2.236e-05
Table: Likelihood ratio test for plot effect on unconditional flowering rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
5
1528
1557
-759.2
1518
NA
NA
NA
fit_flow
7
1400
1439
-692.9
1386
132.6
2
1.637e-29
Table: Likelihood ratio test for species/plot interaction effect on unconditional flowering rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
1406
1440
-696.9
1394
NA
NA
NA
fit_flow
7
1400
1439
-692.9
1386
7.928
1
0.004868
Higher-order interactions produce numerical instability in the model fitting and so are omitted. Estimates of unconditional flowering rates for each species/location combination are given below.
Table: Estimated (unconditional) flowering probabilities and 95% confidence intervals
species
location
prob
std.error
conf.low
conf.high
LLA
rock_outcrop
0.1265
0.0613
0.04653
0.3005
LLP
rock_outcrop
0.02323
0.01351
0.00735
0.07097
LLA
seep
0.1163
0.05412
0.04476
0.2697
LLP
seep
0.04893
0.02549
0.01729
0.1308
Graphically:
Marginal flowering rates by species (averaged over location) and location (averaged over species) are shown below.
Table: Estimated marginal unconditional flowering rates by species (averaged over location)
species
prob
std.error
conf.low
conf.high
LLA
0.1213
0.04285
0.05905
0.2328
LLP
0.0338
0.01389
0.01497
0.0745
Table: Estimated marginal unconditional flowering rates by location (averaged over species)
location
prob
std.error
conf.low
conf.high
rock_outcrop
0.05543
0.02794
0.0202
0.1431
seep
0.07602
0.03512
0.02995
0.1798
Lastly, factorial contrasts yield comparisons of marginal flowering rates between (a) locations and (b) species.
Table: Factorial contrasts for marginal species and location effects on unconditional flowering rates
contrast
odds.ratio
SE
conf.low
conf.high
loc
1.402
0.9868
0.3529
5.57
spp
3.945
1.312
2.056
7.57
This quantifies the significant species effect as follows: with 95% confidence, the odds of flowering are estimated to be between 2.06 and 7.57 times higher among annuals compared with perennials (point estimate: 3.95).
Conditional flowering rates
Below are comparisons of the conditional flowering rates.
Code
# binary indicator for flowering (conditional on germination)cflow <- seedplot |>drop_na(stage) |>filter(stage >'ungerminated') |># notice filter here for conditional rategroup_by(id, species, family, location, plot) |>summarize(flow =factor(sum(n.flower) >0, levels =c(F, T), labels =c('no', 'yes')),.groups ='drop')
Figure: Conditional flowering rates by species, lineage, and location; average shown in blue dash.
Since the conditional rates are determined after filtering out ungerminated plants, the relevant sample sizes change. After filtering:
Table: Sample sizes for conditional flowering rates by species, lineage, and plot
species
family
location
1
2
3
4
5
6
7
8
9
10
LLA
A12x19
rock_outcrop
NA
5
NA
3
3
2
3
4
4
4
LLA
A12x9
rock_outcrop
2
3
NA
4
6
4
1
7
6
4
LLA
A19x12
rock_outcrop
2
3
NA
4
6
4
3
2
2
5
LLA
A24x30
rock_outcrop
NA
4
NA
1
2
3
NA
1
3
6
LLA
A30x3
rock_outcrop
2
5
NA
6
3
2
1
4
5
4
LLA
A33x24
rock_outcrop
NA
5
NA
7
3
2
4
1
5
6
LLA
A3x21
rock_outcrop
2
5
NA
4
4
6
1
3
4
4
LLA
A7x10
rock_outcrop
NA
4
NA
3
5
3
2
4
4
3
LLA
A9x33
rock_outcrop
1
5
NA
2
4
4
1
4
3
3
LLP
P10x7
rock_outcrop
1
3
NA
4
4
2
2
4
4
4
LLP
P13x14
rock_outcrop
1
4
NA
3
5
5
4
5
5
3
LLP
P14x16
rock_outcrop
NA
4
NA
5
7
2
4
4
3
3
LLP
P15x1
rock_outcrop
1
4
NA
5
5
5
5
2
3
3
LLP
P17x6
rock_outcrop
2
4
NA
6
5
1
6
5
6
5
LLP
P1x8
rock_outcrop
2
4
NA
5
4
6
4
5
5
3
LLP
P20x4
rock_outcrop
1
6
NA
4
5
2
1
5
5
5
LLP
P6x20
rock_outcrop
1
4
NA
2
3
2
4
3
1
2
LLP
P9x11
rock_outcrop
1
3
NA
3
5
4
3
3
5
4
LLA
A12x19
seep
2
4
4
4
2
1
4
2
3
5
LLA
A12x9
seep
2
3
4
5
6
2
5
5
4
5
LLA
A19x12
seep
3
3
4
2
1
1
3
4
2
4
LLA
A24x30
seep
3
3
3
4
7
NA
3
5
1
5
LLA
A30x3
seep
2
5
4
4
5
1
6
4
4
3
LLA
A33x24
seep
3
3
2
3
2
1
4
2
3
6
LLA
A3x21
seep
1
4
2
4
3
NA
2
6
5
5
LLA
A7x10
seep
2
5
6
5
2
1
6
3
3
5
LLA
A9x33
seep
1
3
1
3
3
2
3
4
2
5
LLP
P10x7
seep
2
3
5
2
4
3
4
3
3
4
LLP
P13x14
seep
1
4
6
3
4
NA
5
6
4
5
LLP
P14x16
seep
3
5
3
6
5
3
6
6
4
6
LLP
P15x1
seep
2
4
4
2
4
1
6
3
2
6
LLP
P17x6
seep
4
4
6
3
4
1
5
5
4
6
LLP
P1x8
seep
3
4
4
3
4
2
5
4
4
5
LLP
P20x4
seep
2
4
3
4
3
2
3
5
3
4
LLP
P6x20
seep
4
3
3
2
3
NA
4
4
3
6
LLP
P9x11
seep
1
5
4
2
5
2
NA
3
2
5
Because some plots have no germinated plants of particular lineages, it will not be possible to estimate the plot/lineage interaction. However, the row totals are sufficient that the lineage effect is still estimable.
Inference for fixed effects (again using the same model specification employed for analysis of germination rates and the same codes) indicates there is a species effect on conditional flowering rates (no change compared with analysis of unconditional rates):
Table: Inference for fixed effects on conditional flowering rates
Chisq
Df
Pr(>Chisq)
species
20.61
1
5.625e-06
location
0.2234
1
0.6364
species:location
2.692
1
0.1008
Inference for random effects indicates a plot effect and a species/plot interaction and a lineage effect (no change compared with the analysis of unconditional flowering rates):
Table: Likelihood ratio test for lineage effect on conditional flowering rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
1110
1141
-549
1098
NA
NA
NA
fit_cflow
7
1093
1128
-539.3
1079
19.39
1
1.065e-05
Table: Likelihood ratio test for plot effect on conditional flowering rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
5
1174
1199
-581.8
1164
NA
NA
NA
fit_cflow
7
1093
1128
-539.3
1079
84.88
2
3.695e-19
Table: Likelihood ratio test for species/plot interaction effect on conditional flowering rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
1099
1130
-543.7
1087
NA
NA
NA
fit_cflow
7
1093
1128
-539.3
1079
8.755
1
0.003087
Estimates of conditional flowering rates for each species/location combination are given below.
Table: Estimated (conditional) flowering probabilities and 95% confidence intervals
species
location
prob
std.error
conf.low
conf.high
LLA
rock_outcrop
0.2623
0.09818
0.1162
0.4901
LLP
rock_outcrop
0.04167
0.02202
0.01454
0.1136
LLA
seep
0.253
0.09095
0.1165
0.4652
LLP
seep
0.08892
0.04078
0.03512
0.2075
Graphically:
Marginal flowering rates by species (averaged over location) and location (averaged over species) are shown below.
Table: Marginal conditional flowering rates by species (averaged over location)
species
prob
std.error
conf.low
conf.high
LLA
0.2576
0.07143
0.143
0.4191
LLP
0.06116
0.02293
0.02893
0.1247
Table: Marginal conditional flowering rates by location (averaged over species)
location
prob
std.error
conf.low
conf.high
rock_outcrop
0.1106
0.04691
0.04655
0.2405
seep
0.1538
0.05784
0.07072
0.3028
Lastly, factorial contrasts yield comparisons of marginal flowering rates between (a) locations and (b) species.
Table: Factorial contrasts for marginal species and location effects on conditional flowering rates
contrast
odds.ratio
SE
conf.low
conf.high
loc
1.462
0.9019
0.4365
4.898
spp
5.326
1.909
2.638
10.75
This quantifies the significant species effect as follows: with 95% confidence, the odds of flowering are estimated to be between 2.63 and 10.75 times higher among annuals compared with perennials (point estimate: 5.33).
Fruiting rates
Fruiting rates could be computed in one of three ways:
Unconditional on germination, i.e., rate at which all toothpicks produced fruit
Conditional on germination, i.e., rate at which germinated toothpicks produced fruit
Conditional on flowering, i.e., rate at which flowered plants produced fruit
Below I’ve only shown the analysis for option (3). To explore options 1-2, remove or change the filter on life stage in the codes shown below.
Code
# binary indicator for fruiting (conditional on flowering)fruit <- seedplot |>drop_na(stage) |>filter(stage >'germinated') |># notice filter here for conditional rategroup_by(id, species, family, location, plot) |>summarize(fruit =factor(sum(n.fruit) >0, levels =c(F, T), labels =c('no', 'yes')),.groups ='drop')
Figure: Conditional fruiting rates by species, lineage, and location; average shown in blue dash.
Since some families have conditional fruiting rates of zero, it will not be possible to estimate lineage interactions. The analysis below uses the same model specification as that employed in the analysis of germination rates. (Codes are also identical.)
Inference for fixed effects indicates there is a species effect on conditional fruiting rates:
Table: Inference for fixed effects on conditional fruiting rate
Chisq
Df
Pr(>Chisq)
species
8.955
1
0.002767
location
0.1853
1
0.6668
species:location
2.163
1
0.1413
Inference for random effects indicates a plot effect and a species/plot interaction but no lineage effect:
Table: Likelihood ratio test for lineage effect on conditional fruiting rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
672.2
702.6
-330.1
660.2
NA
NA
NA
fit_fruit
7
672.6
708
-329.3
658.6
1.596
1
0.2064
Table: Likelihood ratio test for plot effect on conditional fruiting rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
5
707.6
732.9
-348.8
697.6
NA
NA
NA
fit_fruit
7
672.6
708
-329.3
658.6
39.01
2
3.377e-09
Table: Likelihood ratio test for species/plot interaction effect on conditional fruiting rates
npar
AIC
BIC
logLik
-2*log(L)
Chisq
Df
Pr(>Chisq)
.
6
676.2
706.5
-332.1
664.2
NA
NA
NA
fit_fruit
7
672.6
708
-329.3
658.6
5.588
1
0.01809
Higher-order interactions produce numerical instability in the model fitting and so are omitted. Estimates of conditional fruiting rates for each species/location combination are given below.
Table: Estimated (conditional) fruiting probabilities and 95% confidence intervals
species
location
prob
std.error
conf.low
conf.high
LLA
rock_outcrop
0.1183
0.04827
0.05135
0.2494
LLP
rock_outcrop
0.02249
0.01223
0.007673
0.06406
LLA
seep
0.06554
0.02804
0.02779
0.1468
LLP
seep
0.03452
0.01773
0.01245
0.09207
Graphically:
Marginal fruiting rates by species (averaged over location) and location (averaged over species) are shown below.
Table: Estimated marginal conditional fruiting rates by species (averaged over location)
species
prob
std.error
conf.low
conf.high
LLA
0.08841
0.02763
0.0472
0.1596
LLP
0.02788
0.01113
0.01267
0.06026
Table: Estimated marginal conditional fruiting rates by location (averaged over species)
location
prob
std.error
conf.low
conf.high
rock_outcrop
0.05262
0.02147
0.02333
0.1144
seep
0.04769
0.01888
0.02169
0.1016
Lastly, factorial contrasts yield comparisons of marginal fruiting rates between (a) locations and (b) species.
Table: Factorial contrasts for marginal species and location effects on conditional fruiting rates
contrast
odds.ratio
SE
conf.low
conf.high
loc
0.9015
0.4997
0.3042
2.671
spp
3.382
1.364
1.534
7.455
This quantifies the significant species effect as follows: with 95% confidence, the odds of fruiting are estimated to be between 1.53 and 7.46 times higher among annuals compared with perennials (point estimate: 3.38).
Analysis of time-to-event data
If you’re interested in time patterns, it’s fairly easy to obtain first dates for each stage. Here’s what that looks like for one individual that saw all life stages:
Code
# earliest dates for each life stagefirst_dates <- seedplot |>drop_na(stage) |>group_by(id, species, family, location, plot, replicate, stage) |>slice_min(date) |>select(-starts_with('n'), -week) |>ungroup()# previewfirst_dates |>filter(id ==140) |>pander(caption ='Table: dates to life stage events for individual no. 140')
Table: dates to life stage events for individual no. 140
id
species
family
location
plot
replicate
date
stage
140
LLP
P15x1
rock_outcrop
2
4
2024-01-25
ungerminated
140
LLP
P15x1
rock_outcrop
2
4
2024-02-01
germinated
140
LLP
P15x1
rock_outcrop
2
4
2024-04-12
flowered
140
LLP
P15x1
rock_outcrop
2
4
2024-05-31
fruited
140
LLP
P15x1
rock_outcrop
2
4
2024-06-07
died
This can be rearranged to calculate times to each stage. Here are some examples for individuals who reached different life stages:
Code
# time to event datadurations <- first_dates |>pivot_wider(names_from = stage, values_from = date) |>rename(first.obs = ungerminated) |>mutate(time.to.germ = germinated -ymd("2024-01-17"),time.to.flower = flowered -ymd("2024-01-17"),time.to.fruit = fruited -ymd("2024-01-17"),time.to.death = died -ymd("2024-01-17"))
Table continues below
id
first.obs
germinated
flowered
fruited
died
25
2024-01-25
2024-02-08
NA
NA
2024-03-14
131
2024-01-25
2024-03-21
2024-05-03
NA
2024-05-09
140
2024-01-25
2024-02-01
2024-04-12
2024-05-31
2024-06-07
time.to.germ
time.to.flower
time.to.fruit
time.to.death
22 days
NA
NA
57 days
64 days
107 days
NA
113 days
15 days
86 days
135 days
142 days
Note
I manually input the start of the experiment as 1/17/25 because this was the earliest date that appeared in the metadata, and is consistent with the week 1 observation being 1/25/25. Client should confirm and adjust start date if needed so that durations are accurate.
Here are distributions of times to each event by species and location:
Figure: Distributions of times to life stage events by species and location
It’s perhaps not the most efficient comparison, but one can see:
approximate time ranges for each event
longer survival times and later flowering and fruiting in seep plots
Survival analysis methods can be used to estimate the (instantaneous) probability of germination/flowering/fruiting/death as a function of time. I focused on a simple approach using Kaplan-Meier estimators after grouping by location and species and ignoring plot and lineage effects. Mixed-effects Cox regression could be used for a proper analysis accounting for the experimental design with a specification analogous to that used above in modeling event rates in aggregate across the study period, but since this is a secondary analysis, I’ve just shown some simple exploratory work.
Missing durations are right-censored – all we know is that the event of interest occurred (or would have occurred) sometime after the end of the study. To format the data correctly, I’ve replaced missing durations with the number of days that the study persisted (442) and added an indicator of whether the corresponding observation is censored (i = 0) or not (i = 1).
Table: data formatted for analysis of time-to-event data
id
species
location
t.g
t.fl
t.fr
t.d
i.g
i.fl
i.fr
i.d
25
LLA
rock_outcrop
22
442
442
57
1
0
0
1
30
LLA
rock_outcrop
22
442
442
36
1
0
0
1
35
LLA
rock_outcrop
36
442
442
86
1
0
0
1
38
LLA
rock_outcrop
22
442
442
64
1
0
0
1
42
LLP
rock_outcrop
22
442
442
29
1
0
0
1
46
LLP
rock_outcrop
36
442
442
51
1
0
0
1
I’ve provided a copy of the data formatted in this way as seedplot-times.csv.
Code
# read in time to event datatdata <-read_csv('seedplot-times.csv')
A quick caveat: it’s somewhat awkward to represent germination, flowering, and fruiting as right-censored since this implicitly assumes that these events would eventually occur, which is not true in this system. This is more conceptual than analytical – what will happen in the analysis is that after sufficient time the estimated instantaneous event probabilities will plateau at the long-run event rates.
Germination
Modeling the instantaneous probability of germination as a function of time since the start of the study shows relatively quick convergence to aggregate germination rates:
The bands show pointwise 95% confidence intervals, and the overlap is consistent with the prior analysis which found no effect of species or location on germination rates. This just displays that conclusion a bit differently. There doesn’t look to be much difference in the rate at which the germination probability increases with time, though the estimate for LLP does increase slightly faster in the rock outrcop microhabitat.
Flowering
For modeling time to flowering, I used:
Flowering occurrences conditional on germination, i.e., all ungerminated individuals were removed prior to analysis
The time since germination (rather than the time since the start of the experiment)
This yields the following estimates of the instantaneous probability of flowering as a function of time since germination:
There is a clear and persistent difference in flowering probability by species: annuals flower at higher rates, as one might expect. While there is no long-run apparent difference in flowering probability by microhabitat, the flowering probability increases faster in the rock outcrop microhabitat. This reflects the descriptive finding earlier that flowering occurs later in the seep microhabitat.
Fruiting
Similar to flowering, I estimated the instantaneous conditional probability of fruiting given germination, as a function of the time since germination.
Note
Client may prefer to model the conditional probability of fruiting given flowering as a function of time since flowering. If so, change Surv(t.fr - t.g, i.fr) to Surv(t.fr - t.fl, i.fr) in the model specification.
Interestingly, there appears to be an interaction effect here: depending on species, the rate of increase and plateau differs between environments.
Survival
Finally, an analysis of survival times after germination is given below. The figure shows the probability of survival as a function of time, and reflects the descriptive finding earlier that plants survived longer in the seep microhabitat.
Reproducibility
I’ve provided a copy of all codes used to generate analyses shown above starting from the client’s spreadsheet. In addition, I generated two derivative files with reformatted data:
seedplot.csv has data arranged for event occurrence rate analysis
seedplot-time.csv has data arranged for time-to-event analysis
These files are available (only client has access permissions) in a compressed folder here (you may need to right-click to open).
Source Code
---title: "Monkeyflower reciprocal transplant study"author: "Avin Niknafs"author-title: "CLIENT"date: "8/19/25"published-title: "DATE"format: html: code-fold: true code-tools: truetoc: trueexecute: warning: false message: false echo: false---```{r}# setuplibrary(tidyverse)library(lubridate)library(grafify)library(lme4)library(lmerTest)library(pander)library(emmeans)library(survival)```From client:> I am conducting a reciprocal transplant study to test for local adaptation between two monkeyflower species in distinct microhabitats at a sympatric site on the central coast of California. My project involves analyzing germination rates, seedling survival of the two species, and reproductive success, and if possible relating those outcomes to weekly soil moisture data.Seeded toothpicks were planted and surveyed weekly for roughly one year. Study design:- 10 plots in each of 2 microhabitat environments (rock outcrop and seep)- 2 monkeyflower species (annual and perennial)- 9 lineages per species- 6 replicates of each lineage were randomly allocated to positions in each plotIn total, this amounts to 108 toothpicks (6 reps x 9 lineages x 2 species) per plot and a total sample size of $n = 2160$ (108 per plot x 10 plots x 2 microhabitats). Primary interest is in species/environment interaction effects on fitness.Goals set during initial meeting:1. Formulate an analysis-ready rearrangment of seed plot data.2. Recommend an approach for analysis of germination rates.::: callout-noteQuestions/issues needing client's attention are shown throughout like this.:::## Seed plot dataData provided are shown below. I removed plot index A4_10_3, which was missing identifying information for species and lineage. I suspect this may have been miscoded, since after removal $n = 2159$, which is one short of expected based on the study design.::: callout-noteCheck plot coding for index A4_10_3. Was this miscoded?:::For the three individuals shown, values are missing in all weeks because the toothpicks never germinated.```{r}# read in seed plot dataraw <- here::here("25su-niknafs", "FieldData2024.xlsx") |> readxl::read_xlsx(sheet ="Seed_Plots",col_types ="text") |>rename_with(tolower) |>filter(plotind !="A4_10_3") # removed due to missing species/lineage/etc# previewhead(raw, 3) |>pander(caption ='Table: raw data')```To verify that the data reflect the study design, the following tables show distinct combinations of study attributes and sample sizes per plot:```{r}# 36 distinct treatment combinations# (9 lineages x 2 species x 2 locations)raw |>distinct(species, family, location) |>arrange(species, family, location) |>count(location, species) |>spread(species, n) |>pander(caption ='Table: Number of unique lineages per location/species')# sample sizes per plotraw |>count(plot, location) |>arrange(location) |>group_by(location) |>mutate(plot =consecutive_id(plot)) |>spread(location, n) |>pander(caption ='Table: Sample sizes per plot')```The values entered for each individual each week are:- missing if ungerminated, dead, or otherwise not present- 1 if germinated- 2 if flowering, with number of repetitions indicating number of flowers (*e.g.*, 222 means three flowers)- 3 if alive post-flowering- 4 if fruiting, with number of repetitions indicating number of fruits (*e.g.*, 44 means two fruits)- X if missing for reasons other than death- combinations as appropriate (*e.g.*, 222 4 indicates three flowers and one fruit)The unique encodings present in the data are shown below:```{r}# unique encodingsraw |>select(starts_with('week')) |>gather() |>pull(value) |>unique()```The decimals are artefacts of value encoding in excel when the entry can be parsed as numeric. So for example `2.22222222E8` is nine 2's rewritten in scientific notation ($2.22222222 \times 10^8$). ## Reformatting for analysisI've cleaned up the encoding and data formatting for analysis as follows:- rearrange in 'long' format with a variable indicating week number- append sampling dates from the `CensusDate` sheet- derive a `stage` variable encoding life stage achieved by the time of observation: + `ungerminated` if no values were entered prior to the observation date+ `germinated` if a '1' is recorded and no other values were entered prior to the observation date+ `flowered` if a '2' was entered prior to or on the observation date+ `fruited` if a '4' was entered prior to or on the observation date+ `died` if value is missing on the current observation date but at least one nonmissing value was entered prior to the observation date+ NA if an 'X' was entered prior to or on the observation date- derive counts of numbers of fruits and flowers- add a unique identifier `id` per toothpick```{r}# read in sampling datesdates <- here::here("25su-niknafs", "FieldData2024.xlsx") |> readxl::read_xlsx(sheet ="CensusDate",col_types =c('text', 'date')) |>rename_with(tolower) |>mutate(week =paste('week', week, sep =''),date =as_date(date))# rearrange data and exportraw |>select(species, family, location, plot, replicate, starts_with('week')) |>pivot_longer(starts_with('week'), names_to ='week', values_to ='code.raw') |>left_join(dates, join_by(week)) |>mutate(across(c(plot, replicate), as.numeric),id =consecutive_id(species, family, location, plot, replicate)) |>mutate(code.clean =str_remove(code.raw, 'E.*') |>str_remove('\\.') |>str_remove('0') |>str_remove_all('[:punct:]') |>str_squish()) |>mutate(flag1 =str_detect(code.clean, '1'),flag2 =str_detect(code.clean, '2'),flag3 =str_detect(code.clean, '3'),flag4 =str_detect(code.clean, '4'),flagx =str_detect(code.clean, 'X'),flagna =is.na(code.clean)) |>mutate(across(starts_with('flag'), ~replace_na(.x, F))) |>arrange(id, date) |>group_by(id) |>mutate(germ =cumsum(flag1) >0,flow =cumsum(flag2) >0,frui =cumsum(flag4) >0,miss =cumsum(flagx) >0,stage.num = germ + flow + frui,stage.desc =case_when(stage.num ==0~'ungerminated', (stage.num ==1) & (!flagna) ~'germinated', (stage.num ==2) & (!flagna) ~'flowered', (stage.num ==3) & (!flagna) ~'fruited', (stage.num >0) & flagna ~'died'),stage =if_else(miss, NA, stage.desc),n.flower =str_count(code.clean, '2') |>replace_na(0),n.fruit =str_count(code.clean, '4') |>replace_na(0)) |>select(id, species, family, location, plot, replicate, week, date, stage, n.flower, n.fruit) |>write_csv(file = here::here('25su-niknafs', 'seedplot.csv'))# read in seedplot dataseedplot <- here::here('25su-niknafs', 'seedplot.csv') |>read_csv() |>mutate(stage =factor(stage, levels =c('ungerminated', 'germinated', 'flowered', 'fruited', 'died'), ordered = T))# previewhead(seedplot) |>pander(caption ='Table: Example rows of analysis-ready data')```I've provided the data in this form to client as `seedplot.csv` for direct import into R. I'd recommend explicitly encoding the life stage variable in the correct order, as shown below.```{r, echo = T}seedplot <- read_csv('seedplot.csv') |> mutate(stage = factor(stage, levels = c('ungerminated', 'germinated', 'flowered', 'fruited', 'died'), ordered = T))```::: callout-noteI considered adding a `curr.stage` or `curr.status` variable that is the same as `stage` but indicates *current* life stage at the time of observation: ungerminated, germinated, flowering, fruiting, living, dead.This information is captured implicitly by the combination of flower/fruit counts and the existing `stage` variable, but would it be useful to encode it explicitly?:::## Descriptive analysisComparing progressions through life stages shows:- almost all germination occurs by 2/15/2024- slightly higher germination rate for perennial compared with annual species- no survival at rock outcrop after 6/25/2024- perennials generally don't flower/fruit except in seep environment after 6/25- one rock outcrop plot missing, but otherwise few missing values```{r, echo = T}#| fig-width: 10#| fig.height: 6#| fig-cap: "Figure: Life stage progression by location and species (missing data removed)"# life stage over timeseedplot |> count(date, stage, location, species) |> ggplot(aes(x = factor(date), y = n, fill = stage)) + facet_grid(location ~ species) + geom_bar(position = 'fill', stat = 'identity', alpha = 0.9) + scale_fill_grafify() + scale_y_continuous(expand = rep(0, 2)) + theme_bw() + theme(panel.grid = element_blank(), panel.spacing = unit(0.15, 'in'), axis.text.x = element_text(angle = 90, vjust = 0.5)) + labs(x = 'date', y = 'proportion of individuals')```Here's the same figure but with normalization after removing missing data:```{r, echo = T}#| fig-width: 10#| fig.height: 6#| fig-cap: "Figure: Life stage progression by location and species (missing data removed)"# life stage over timeseedplot |> count(date, stage, location, species) |> drop_na(stage) |> ggplot(aes(x = factor(date), y = n, fill = stage)) + facet_grid(location ~ species) + geom_bar(position = 'fill', stat = 'identity', alpha = 0.9) + scale_fill_grafify() + scale_y_continuous(expand = rep(0, 2)) + theme_bw() + theme(panel.grid = element_blank(), panel.spacing = unit(0.15, 'in'), axis.text.x = element_text(angle = 90, vjust = 0.5)) + labs(x = 'date', y = 'proportion of individuals')```## Analysis of rate dataAnalysis of event occurrence rates aggregated over the study period models the probability that the individual germinated/flowered/fruited at any point during the study period, possibly conditional on prior life stages. For this, individuals are dropped if missing *throughout* the study, but if nonmissing observations were recorded for any period of time, the individual is retained through the last stage for which values are recorded. The framework for this analysis is a generalized linear mixed model for the relevant event probability.### Germination ratesI've added an indicator of whether germination occurred at some time during the study. Here are some examples of the indicator encoding:```{r, echo = T}# binary indicator for germinationgerm <- seedplot |> drop_na(stage) |> group_by(id, species, family, location, plot) |> summarize(last.stage = max(stage), .groups = 'drop') |> mutate(germ = fct_collapse(last.stage, no = 'ungerminated', other_level = 'yes') |> factor(ordered = F)) ``````{r}germ |>group_by(last.stage) |>slice_head(n =1) |>pander(caption ='Table: Example germination indicator encoding')```The figure below shows germination rates (aggregated over plots) by lineage, location, and species, with the location/species average overlaid as a vertical dashed line.```{r}#| fig-cap: "Figure: Germination rates by species, lineage, and location; average shown in blue dash."# compute group means for primary study armsmean_grates <- germ |>group_by(species, location) |>summarize(prop.germ =mean(germ =='yes'), .groups ='drop') # plot germ |>group_by(species, location, family) |>summarize(prop.germ =mean(germ =='yes'), .groups ='drop') |>ggplot(aes(y =fct_reorder(family, .x = prop.germ), x = prop.germ)) +facet_wrap(species~location, scales ='free_y') +geom_point() +geom_linerange(aes(xmin =0, xmax = prop.germ)) +theme_bw() +theme(panel.grid.minor =element_blank(),panel.grid.major.y =element_blank()) +labs(x ='germination rate', y =NULL) +geom_vline(aes(xintercept = prop.germ), linetype ='dashed',color ='blue',data = mean_grates)```My observations:- no apparent location effect on germination rates in aggregate, but possible location/lineage interactions + usually slightly higher rates in seep environments, but not uniformly; *e.g.*, P9x11 does better in rock outcrop) + if there are location/lineage interactions, the location effect may not be significant; few differences in germination rates exceeding 5-10%- apparent species effect -- higher germination rates for perennialsInterestingly, there is a bit of a plot effect *and* a possible plot/species interaction. This suggests that there may be unaccounted-for microhabitat features that influence fitness.```{r}#| fig-height: 3#| fig-cap: "Figure: Plot effects and plot/species interactions"# look for plot effectgerm |>group_by(species, plot, location) |>summarize(prop.germ =mean(germ =='yes'), .groups ='drop') |>ggplot(aes(x = species, y = prop.germ)) +facet_wrap(~location) +geom_point() +geom_path(aes(group = plot)) +theme_bw() +theme(panel.grid.minor =element_blank(),panel.grid.major.x =element_blank()) +labs(y ='germination rate')```#### Analysis ignoring lineageLet $Y_{ijkl}$ denote the whether germination occurred for the $l$th replicate of species $i$ in plot $k$ in microhabitat type $j$. A model for the data, ignoring any effect of lineage, is:$$\begin{cases}Y_{ijkl} \sim \text{Bernoulli}(p_{ijk}) \\\log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \text{spp}_i + \text{mh}_j + \left(\text{spp}\times\text{mh}\right)_{ij} + \text{plot}_{k(j)} + \left(\text{plot}\times\text{spp}\right)_{ik(j)}\end{cases}$$Fixed effects:- species $\text{spp}_i$- microhabitat $\text{mh}_j$- species/microhabitat interaction $\left(\text{spp}\times\text{mh}\right)_{ij}$Random effects:- plot (nested within microhabitat) $\text{plot}_{k(j)}$- plot/species interaction (nested within microhabitat) $\left(\text{plot}\times\text{spp}\right)_{ik(j)}$To fit the model, use `lme4::glmer`:```{r, echo = T}# fit modelfit <- glmer(germ ~ species*location + (1 | plot:location/species), data = germ, family = binomial) ```This produces the inference:```{r, echo = T}# tests for fixed effectscar::Anova(fit, type = 'II') |> pander(caption = 'Table: Type II Wald tests for fixed effects')# test for plot effectglm(germ ~ species*location, data = germ, family = binomial) %>% anova(fit, ., type = 'LRT') |> pander(caption = 'Table: Likelihood ratio test for plot effect')# test for plot/species interactionglmer(germ ~ species*location + (1 | plot:location), data = germ, family = binomial) %>% anova(fit, ., type = 'LRT') |> pander(caption = 'Table: Likelihood ratio test for plot/species interaction')```In summary:- the data provide evidence of a species effect ($\chi^2_1 = 5.631, p = 0.01765$) and a plot effect ($\chi^2_2 = 168.1, p < 0.0001$)- the data do not provide evidence of a location effect or a species/plot interaction#### Adjusting for lineageLet $Y_{ijklm}$ denote the whether germination occurred for the $m$th replicate of lineage $l$ of species $i$ in plot $k$ in microhabitat type $j$. Agumenting the model above to account for lineage yields:$$\begin{cases}Y_{ijklm} \sim \text{Bernoulli}(p_{ijk}) \\\log\left(\frac{p_{ijkl}}{1 - p_{ijkl}}\right) = \text{spp}_i + \text{mh}_j + \left(\text{spp}\times\text{mh}\right)_{ij} + \text{plot}_{k(j)} + \left(\text{plot}\times\text{spp}\right)_{ik(j)} + \text{lin}_{l(i)}\end{cases}$$Fixed effects:- species $\text{spp}_i$- microhabitat $\text{mh}_j$- species/microhabitat interaction $\left(\text{spp}\times\text{mh}\right)_{ij}$Random effects:- plot (nested within microhabitat) $\text{plot}_{k(j)}$- plot/species interaction (nested within microhabitat) $\left(\text{plot}\times\text{spp}\right)_{ik(j)}$- **lineage (nested within species) $\text{lin}_{l(i)}$**To fit the model:```{r, echo = T}# fit modelfit_lin <- glmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species), data = germ, family = binomial) ```By comparison with the initial model, there is evidence of an effect of lineage:```{r, echo = T}# test for effect of lineageanova(fit_lin, fit) |> pander(caption = 'Table: Likelihood ratio test for lineage effect')```And interestingly, the effect of species vanishes when lineage is accounted for:```{r, echo = T}# tests for fixed effectscar::Anova(fit_lin, type = 'II') |> pander(caption = 'Table: Type II Wald tests for fixed effects')```Inference on the other fixed effects -- location and location/species interactions -- are unchanged. Likewise for inferences on random effects -- plot and plot/species interaction.```{r}# test for plot effectglmer(germ ~ species*location + (1| family:species),data = germ, family = binomial) %>%anova(fit_lin, ., type ='LRT') |>pander(caption ='Table: Likelihood ratio test for plot effect')# test for plot/species interactionglmer(germ ~ species*location + (1| plot:location) + (1| family:species),data = germ, family = binomial) %>%anova(fit_lin, ., type ='LRT') |>pander(caption ='Table: Likelihood ratio test for plot/species interaction')```In summary:- the data provide evidence of a lineage effect ($\chi^2_1 = 14.16, p = 0.0001677$) and a plot effect ($\chi^2_2 = 172.2, p < 0.0001$)- the data do not provide evidence of a species effect, a location effect, or location/species or plot/species interactionsAs an aside, I also tested for lineage interactions with plot and location (separately), and neither were significant. These are shown below.```{r}# test for lineage/plot interactionglmer(germ ~ species*location + (1| plot:location/species) + (1| family:species) + (1| family:species:plot:location),data = germ, family = binomial) %>%anova(fit_lin) |>pander(caption ='Table: Likelihood ratio test for lineage/plot interaction')# test for lineage/location interactionglmer(germ ~ species*location + (1| plot:location/species) + (1| family:species/location),data = germ, family = binomial) %>%anova(fit_lin) |>pander(caption ='Table: Likelihood ratio test for lineage/location interaction')```My recommendation would be to omit interaction terms involving lineage for model parsimony, but a full model could include both interactions above. This may be desirable for interpretability, but won't change the inferences about effects. I checked but suppressed the analyses -- codes are shown below.```{r, eval = F, echo = T}# full model with all interactionsfit_full <- glmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species/location) + (1 | family:species:location:plot), data = germ, family = binomial) # inference for fixed effectscar::Anova(fit_full, type = 'II')# inference for random effects: lineage/plot interactionglmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species/location), data = germ, family = binomial) %>% anova(fit_full, .)# inference for random effects: lineage/location interactionglmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species) + (1 | family:species:location:plot), data = germ, family = binomial) %>% anova(fit_full, .)# inference for random effects: lineage effectglmer(germ ~ species*location + (1 | plot:location/species), data = germ, family = binomial) %>% anova(fit_full, .)# inference for random effects: plot effectglmer(germ ~ species*location + (1 | family:species/location), data = germ, family = binomial) %>% anova(fit_full, .)# inference for random effects: plot/species interactionglmer(germ ~ species*location + (1 | plot:location) + (1 | family:species), data = germ, family = binomial) %>% anova(fit_full, .)```### Flowering ratesFlowering rates could be computed in one of two ways:1. **Unconditional** on germination, *i.e.*, rate at which all toothpicks flowered2. **Conditional** on germination, *i.e.*, rate at which germinated toothpicks floweredI've carried out parallel analyses using both rates. Personally, I think the conditional rates make more sense for the analysis.#### Unconditional flowering ratesBelow are comparisons of the *unconditional* flowering rates.```{r, echo = T}# binary indicator for flowering (unconditional)flow <- seedplot |> drop_na(stage) |> group_by(id, species, family, location, plot) |> summarize(flow = factor(sum(n.flower) > 0, levels = c(F, T), labels = c('no', 'yes')), .groups = 'drop')``````{r}#| fig-cap: "Figure: Unconditional flowering rates by species, lineage, and location; average shown in blue dash."# compute group means for primary study armsmean_frates <- flow |>group_by(species, location) |>summarize(prop.flow =mean(flow =='yes'), .groups ='drop') # plot flow |>group_by(species, location, family) |>summarize(prop.flow =mean(flow =='yes'), .groups ='drop') |>ggplot(aes(y =fct_reorder(family, .x = prop.flow), x = prop.flow)) +facet_wrap(species~location, scales ='free_y') +geom_point() +geom_linerange(aes(xmin =0, xmax = prop.flow)) +theme_bw() +theme(panel.grid.minor =element_blank(),panel.grid.major.y =element_blank()) +labs(x ='unconditional flowering rate', y =NULL) +geom_vline(aes(xintercept = prop.flow), linetype ='dashed',color ='blue',data = mean_frates)```Since some families have unconditional flowering rates of zero, it will not be possible to estimate lineage interactions. The analysis below uses the same model specification as that employed in the analysis of germination rates. (Codes are also identical.)Inference for fixed effects indicates there is a species effect on unconditional flowering rates:```{r, echo = T}# fit modelfit_flow <- glmer(flow ~ species*location + (1 | plot:location/species) + (1 | family:species), data = flow, family = binomial) ``````{r}# anova table for fixed effects (ignore the pander part)car::Anova(fit_flow, type ='II') |>pander(caption ='Table: Inference for fixed effects')```Inference for random effects indicates a lineage effect, a plot effect, and a species/plot interaction. Codes for these analyses are identical to those shown in the previous subsection on analysis of germination rates.```{r}# test for lineage effectglmer(flow ~ species*location + (1| plot:location/species),data = flow, family = binomial) %>%anova(fit_flow) |>pander(caption ='Table: Likelihood ratio test for lineage effect on unconditional flowering rates')# test for plot effectglmer(flow ~ species*location + (1| family:species),data = flow, family = binomial) %>%anova(fit_flow) |>pander(caption ='Table: Likelihood ratio test for plot effect on unconditional flowering rates')# test for species/plot interactionglmer(flow ~ species*location + (1| plot:location) + (1| family:species),data = flow, family = binomial) %>%anova(fit_flow) |>pander(caption ='Table: Likelihood ratio test for species/plot interaction effect on unconditional flowering rates')```Higher-order interactions produce numerical instability in the model fitting and so are omitted. Estimates of unconditional flowering rates for each species/location combination are given below.```{r}# estimated unconditional flowering probabilitiesemmeans(fit_flow, specs =~species*location, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated (unconditional) flowering probabilities and 95% confidence intervals')```Graphically:```{r}# plot estimated probabilitiesemmeans(fit_flow, specs =~species*location, type ='response') |>confint() |> broom::tidy() |>ggplot(aes(x = species, y = prob, color = location)) +geom_point(position =position_dodge(width =0.5)) +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.1, position =position_dodge(width =0.5)) +theme_bw() +labs(x =NULL, y ='estimated unconditional flowering probability')```Marginal flowering rates by species (averaged over location) and location (averaged over species) are shown below.```{r}# marginal flowering rates by speciesemmeans(fit_flow, specs =~ species, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated marginal unconditional flowering rates by species (averaged over location)')# marginal flowering rates by locationemmeans(fit_flow, specs =~ location, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated marginal unconditional flowering rates by location (averaged over species)')```Lastly, factorial contrasts yield comparisons of marginal flowering rates between (a) locations and (b) species.```{r}# factorial contrastsfac.eff.emmc <-function(levels, ...){ modelr::data_grid(flow, location, species) |>bind_cols(loc.contr =0.5*rep(c(-1, 1), each =2),spp.contr =0.5*rep(c(1, -1), 2)) |>select(ends_with('contr')) |>rename_with(~str_remove_all(.x, '.contr'))}emmeans(fit_flow, specs =~ species*location, type ='response') |>contrast('fac.eff') |>confint() |>as_tibble() |>rename(conf.low = asymp.LCL,conf.high = asymp.UCL) |>select(-df) |>pander(caption ='Table: Factorial contrasts for marginal species and location effects on unconditional flowering rates')```This quantifies the significant species effect as follows: with 95% confidence, the odds of flowering are estimated to be between 2.06 and 7.57 times higher among annuals compared with perennials (point estimate: 3.95).#### Conditional flowering ratesBelow are comparisons of the *conditional* flowering rates.```{r, echo = T}# binary indicator for flowering (conditional on germination)cflow <- seedplot |> drop_na(stage) |> filter(stage > 'ungerminated') |> # notice filter here for conditional rate group_by(id, species, family, location, plot) |> summarize(flow = factor(sum(n.flower) > 0, levels = c(F, T), labels = c('no', 'yes')), .groups = 'drop')``````{r}#| fig-cap: "Figure: Conditional flowering rates by species, lineage, and location; average shown in blue dash."# compute group means for primary study armsmean_cfrates <- cflow |>group_by(species, location) |>summarize(prop.flow =mean(flow =='yes'), .groups ='drop') # plot cflow |>group_by(species, location, family) |>summarize(prop.flow =mean(flow =='yes'), .groups ='drop') |>ggplot(aes(y =fct_reorder(family, .x = prop.flow), x = prop.flow)) +facet_wrap(species~location, scales ='free_y') +geom_point() +geom_linerange(aes(xmin =0, xmax = prop.flow)) +theme_bw() +theme(panel.grid.minor =element_blank(),panel.grid.major.y =element_blank()) +labs(x ='flowering rate (conditional on germination)', y =NULL) +geom_vline(aes(xintercept = prop.flow), linetype ='dashed',color ='blue',data = mean_cfrates)```Since the conditional rates are determined after filtering out ungerminated plants, the relevant sample sizes change. After filtering:```{r}# for display purposesplot_recode <- seedplot |>distinct(plot, location) |>arrange(location) |>group_by(location) |>mutate(plot.num =row_number()) |>ungroup() |>select(plot, plot.num)# sample sizes by plotcflow |>left_join(plot_recode, join_by(plot)) |>count(species, family, location, plot.num) |>spread(plot.num, n) |>arrange(location, species) |>pander(caption ='Table: Sample sizes for conditional flowering rates by species, lineage, and plot')```Because some plots have no germinated plants of particular lineages, it will not be possible to estimate the plot/lineage interaction. However, the row totals are sufficient that the lineage effect is still estimable.Inference for fixed effects (again using the same model specification employed for analysis of germination rates and the same codes) indicates there is a species effect on conditional flowering rates (no change compared with analysis of unconditional rates):```{r}# fit modelfit_cflow <-glmer(flow ~ species*location + (1| plot:location/species) + (1| family:species),data = cflow, family = binomial) # anova table for fixed effectscar::Anova(fit_cflow, type ='II') |>pander(caption ='Table: Inference for fixed effects on conditional flowering rates')```Inference for random effects indicates a plot effect and a species/plot interaction and a lineage effect (no change compared with the analysis of unconditional flowering rates):```{r}# test for lineage effectglmer(flow ~ species*location + (1| plot:location/species),data = cflow, family = binomial) %>%anova(fit_cflow) |>pander(caption ='Table: Likelihood ratio test for lineage effect on conditional flowering rates')# test for plot effectglmer(flow ~ species*location + (1| family:species),data = cflow, family = binomial) %>%anova(fit_cflow) |>pander(caption ='Table: Likelihood ratio test for plot effect on conditional flowering rates')# test for species/plot interactionglmer(flow ~ species*location + (1| plot:location) + (1| family:species),data = cflow, family = binomial) %>%anova(fit_cflow) |>pander(caption ='Table: Likelihood ratio test for species/plot interaction effect on conditional flowering rates')```Estimates of conditional flowering rates for each species/location combination are given below.```{r}# estimated conditional flowering probabilitiesemmeans(fit_cflow, specs =~species*location, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated (conditional) flowering probabilities and 95% confidence intervals')```Graphically:```{r}# plot estimated probabilitiesemmeans(fit_cflow, specs =~species*location, type ='response') |>confint() |> broom::tidy() |>ggplot(aes(x = species, y = prob, color = location)) +geom_point(position =position_dodge(width =0.5)) +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.1, position =position_dodge(width =0.5)) +theme_bw() +labs(x =NULL, y ='estimated conditional flowering probability')```Marginal flowering rates by species (averaged over location) and location (averaged over species) are shown below.```{r}# marginal flowering rates by speciesemmeans(fit_cflow, specs =~ species, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Marginal conditional flowering rates by species (averaged over location)')# marginal flowering rates by locationemmeans(fit_cflow, specs =~ location, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Marginal conditional flowering rates by location (averaged over species)')```Lastly, factorial contrasts yield comparisons of marginal flowering rates between (a) locations and (b) species.```{r}# factorial contrastsfac.eff.emmc <-function(levels, ...){ modelr::data_grid(flow, location, species) |>bind_cols(loc.contr =0.5*rep(c(-1, 1), each =2),spp.contr =0.5*rep(c(1, -1), 2)) |>select(ends_with('contr')) |>rename_with(~str_remove_all(.x, '.contr'))}emmeans(fit_cflow, specs =~ species*location, type ='response') |>contrast('fac.eff') |>confint() |>as_tibble() |>rename(conf.low = asymp.LCL,conf.high = asymp.UCL) |>select(-df) |>pander(caption ='Table: Factorial contrasts for marginal species and location effects on conditional flowering rates')```This quantifies the significant species effect as follows: with 95% confidence, the odds of flowering are estimated to be between 2.63 and 10.75 times higher among annuals compared with perennials (point estimate: 5.33).### Fruiting ratesFruiting rates could be computed in one of three ways:1. **Unconditional** on germination, *i.e.*, rate at which all toothpicks produced fruit2. **Conditional** on germination, *i.e.*, rate at which germinated toothpicks produced fruit3. **Conditional** on flowering, *i.e.*, rate at which flowered plants produced fruitBelow I've only shown the analysis for option (3). To explore options 1-2, remove or change the filter on life stage in the codes shown below.```{r, echo = T}# binary indicator for fruiting (conditional on flowering)fruit <- seedplot |> drop_na(stage) |> filter(stage > 'germinated') |> # notice filter here for conditional rate group_by(id, species, family, location, plot) |> summarize(fruit = factor(sum(n.fruit) > 0, levels = c(F, T), labels = c('no', 'yes')), .groups = 'drop')``````{r}#| fig-cap: "Figure: Conditional fruiting rates by species, lineage, and location; average shown in blue dash."# compute group means for primary study armsmean_ftrates <- fruit |>group_by(species, location) |>summarize(prop.fruit =mean(fruit =='yes'), .groups ='drop') # plot fruit |>group_by(species, location, family) |>summarize(prop.fruit =mean(fruit =='yes'), .groups ='drop') |>ggplot(aes(y =fct_reorder(family, .x = prop.fruit), x = prop.fruit)) +facet_wrap(species~location, scales ='free_y') +geom_point() +geom_linerange(aes(xmin =0, xmax = prop.fruit)) +theme_bw() +theme(panel.grid.minor =element_blank(),panel.grid.major.y =element_blank()) +labs(x ='conditional fruiting rate', y =NULL) +geom_vline(aes(xintercept = prop.fruit), linetype ='dashed',color ='blue',data = mean_ftrates)```Since some families have conditional fruiting rates of zero, it will not be possible to estimate lineage interactions. The analysis below uses the same model specification as that employed in the analysis of germination rates. (Codes are also identical.)Inference for fixed effects indicates there is a species effect on conditional fruiting rates:```{r}# fit modelfit_fruit <-glmer(fruit ~ species*location + (1| plot:location/species) + (1| family:species),data = fruit, family = binomial) # anova table for fixed effectscar::Anova(fit_fruit, type ='II') |>pander(caption ='Table: Inference for fixed effects on conditional fruiting rate')```Inference for random effects indicates a plot effect and a species/plot interaction but **no** lineage effect:```{r}# test for lineage effectglmer(fruit ~ species*location + (1| plot:location/species),data = fruit, family = binomial) %>%anova(fit_fruit) |>pander(caption ='Table: Likelihood ratio test for lineage effect on conditional fruiting rates')# test for plot effectglmer(fruit ~ species*location + (1| family:species),data = fruit, family = binomial) %>%anova(fit_fruit) |>pander(caption ='Table: Likelihood ratio test for plot effect on conditional fruiting rates')# test for species/plot interactionglmer(fruit ~ species*location + (1| plot:location) + (1| family:species),data = fruit, family = binomial) %>%anova(fit_fruit) |>pander(caption ='Table: Likelihood ratio test for species/plot interaction effect on conditional fruiting rates')```Higher-order interactions produce numerical instability in the model fitting and so are omitted. Estimates of conditional fruiting rates for each species/location combination are given below.```{r}# estimates of conditional fruiting probabilitiesemmeans(fit_fruit, specs =~species*location, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated (conditional) fruiting probabilities and 95% confidence intervals')```Graphically:```{r}# plot estimated probabilitiesemmeans(fit_fruit, specs =~species*location, type ='response') |>confint() |> broom::tidy() |>ggplot(aes(x = species, y = prob, color = location)) +geom_point(position =position_dodge(width =0.5)) +geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width =0.1, position =position_dodge(width =0.5)) +theme_bw() +labs(x =NULL, y ='estimated conditional fruiting probability')```Marginal fruiting rates by species (averaged over location) and location (averaged over species) are shown below.```{r}# marginal fruiting rates by speciesemmeans(fit_fruit, specs =~ species, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated marginal conditional fruiting rates by species (averaged over location)')# marginal fruiting rates by locationemmeans(fit_fruit, specs =~ location, type ='response') |>confint() |> broom::tidy() |>select(-df) |>pander(caption ='Table: Estimated marginal conditional fruiting rates by location (averaged over species)')```Lastly, factorial contrasts yield comparisons of marginal fruiting rates between (a) locations and (b) species.```{r}# factorial contrastsfac.eff.emmc <-function(levels, ...){ modelr::data_grid(fruit, location, species) |>bind_cols(loc.contr =0.5*rep(c(-1, 1), each =2),spp.contr =0.5*rep(c(1, -1), 2)) |>select(ends_with('contr')) |>rename_with(~str_remove_all(.x, '.contr'))}emmeans(fit_fruit, specs =~ species*location, type ='response') |>contrast('fac.eff') |>confint() |>as_tibble() |>rename(conf.low = asymp.LCL,conf.high = asymp.UCL) |>select(-df) |>pander(caption ='Table: Factorial contrasts for marginal species and location effects on conditional fruiting rates')```This quantifies the significant species effect as follows: with 95% confidence, the odds of fruiting are estimated to be between 1.53 and 7.46 times higher among annuals compared with perennials (point estimate: 3.38).## Analysis of time-to-event dataIf you're interested in time patterns, it's fairly easy to obtain first dates for each stage. Here's what that looks like for one individual that saw all life stages:```{r, echo = T}# earliest dates for each life stagefirst_dates <- seedplot |> drop_na(stage) |> group_by(id, species, family, location, plot, replicate, stage) |> slice_min(date) |> select(-starts_with('n'), -week) |> ungroup()# previewfirst_dates |> filter(id == 140) |> pander(caption = 'Table: dates to life stage events for individual no. 140')```This can be rearranged to calculate times to each stage. Here are some examples for individuals who reached different life stages:```{r, echo = T}# time to event datadurations <- first_dates |> pivot_wider(names_from = stage, values_from = date) |> rename(first.obs = ungerminated) |> mutate(time.to.germ = germinated - ymd("2024-01-17"), time.to.flower = flowered - ymd("2024-01-17"), time.to.fruit = fruited - ymd("2024-01-17"), time.to.death = died - ymd("2024-01-17")) ``````{r}# some example rowsbind_rows(durations |>drop_na(time.to.germ) |>slice_head(n =1), durations |>drop_na(time.to.flower) |>slice_head(n =1), durations |>drop_na(time.to.fruit) |>slice_head(n =1)) |>select(id, first.obs, germinated, flowered, fruited, died, time.to.germ, time.to.flower, time.to.fruit, time.to.death) |>pander()```::: callout-noteI manually input the start of the experiment as 1/17/25 because this was the earliest date that appeared in the metadata, and is consistent with the week 1 observation being 1/25/25. Client should confirm and adjust start date if needed so that durations are accurate.:::Here are distributions of times to each event by species and location:```{r}#| fig-height: 8#| fig-width: 8#| fig-cap: "Figure: Distributions of times to life stage events by species and location"durations |>pivot_longer(starts_with('time.to.')) |>ggplot(aes(x = value)) +facet_grid(species*location~name, scale ='free_x') +geom_histogram(bins =5) +theme_bw() +theme(panel.grid.minor =element_blank()) +labs(x ='days')```It's perhaps not the most efficient comparison, but one can see:- approximate time ranges for each event- longer survival times and later flowering and fruiting in seep plotsSurvival analysis methods can be used to estimate the (instantaneous) probability of germination/flowering/fruiting/death as a function of time. I focused on a simple approach using Kaplan-Meier estimators after grouping by location and species **and ignoring plot and lineage effects**. Mixed-effects Cox regression could be used for a proper analysis accounting for the experimental design with a specification analogous to that used above in modeling event rates in aggregate across the study period, but since this is a secondary analysis, I've just shown some simple exploratory work.Missing durations are right-censored -- all we know is that the event of interest occurred (or would have occurred) sometime after the end of the study. To format the data correctly, I've replaced missing durations with the number of days that the study persisted (442) and added an indicator of whether the corresponding observation is censored (`i = 0`) or not (`i = 1`).```{r}# number of days between first and last observation datemax_t <- seedplot |>pull(date) |>range() |>diff()# compute time and status variables for survival analysis# replace NA (right-censored values) with maximum duration (442 days)tdata <- durations |>filter(!is.na(first.obs)) |>mutate(t.g =as.integer(time.to.germ) |>replace_na(as.integer(max_t)),i.g =1-is.na(time.to.germ),t.fl =as.integer(time.to.flower) |>replace_na(as.integer(max_t)),i.fl =1-is.na(time.to.flower),t.fr =as.integer(time.to.fruit) |>replace_na(as.integer(max_t)),i.fr =1-is.na(time.to.fruit),t.d =as.integer(time.to.death) |>replace_na(as.integer(max_t)),i.d =1-is.na(time.to.death)) |>select(id, species, location, starts_with('t.'), starts_with('i.'))# previewtdata |>filter(t.g <442) |>slice_head(n =6) |>pander(caption ='Table: data formatted for analysis of time-to-event data')# export for client usewrite_csv(tdata, file ='seedplot-times.csv')```I've provided a copy of the data formatted in this way as `seedplot-times.csv`.```{r, echo = T}# read in time to event datatdata <- read_csv('seedplot-times.csv')```A quick caveat: it's somewhat awkward to represent germination, flowering, and fruiting as right-censored since this implicitly assumes that these events would eventually occur, which is not true in this system. This is more conceptual than analytical -- what will happen in the analysis is that after sufficient time the estimated instantaneous event probabilities will plateau at the long-run event rates.### GerminationModeling the instantaneous probability of germination as a function of time since the start of the study shows relatively quick convergence to aggregate germination rates:```{r}# kaplan-meier curves for germination probabilityfit_km_germ <-survfit(Surv(t.g, i.g) ~interaction(location, species), data = tdata)# plottibble(t = fit_km_germ$time |>c(rep(0, 4)), haz = fit_km_germ$surv |>c(rep(1, 4)),lwr = fit_km_germ$lower |>c(rep(NA, 4)),upr = fit_km_germ$upper |>c(rep(NA, 4)),group =rep(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'), fit_km_germ$strata) |>c(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'))) |>separate_wider_delim(cols = group, delim ='.', names =c('location', 'species')) |>ggplot(aes(x = t, y =1- haz)) +geom_step(aes(color = species, linetype = location)) + pammtools::geom_stepribbon(aes(ymin =1- lwr, ymax =1- upr, fill = species,linetype = location),alpha =0.2) +theme_bw() +theme(panel.grid.minor =element_blank()) +guides(linetype =guide_legend(override.aes =list(fill =NA))) +labs(x ='time (days)',y ='Pr(germination)')```The bands show pointwise 95% confidence intervals, and the overlap is consistent with the prior analysis which found no effect of species or location on germination rates. This just displays that conclusion a bit differently. There doesn't look to be much difference in the rate at which the germination probability increases with time, though the estimate for LLP does increase slightly faster in the rock outrcop microhabitat.### FloweringFor modeling time to flowering, I used:1. Flowering occurrences *conditional on* germination, *i.e.*, all ungerminated individuals were removed prior to analysis2. The time since germination (rather than the time since the start of the experiment)This yields the following estimates of the instantaneous probability of flowering as a function of time since germination:```{r}# kaplan-meier curves for (conditional) flowering probabilityfit_km_flow <- tdata |>filter(t.g <as.integer(max_t)) %>%survfit(Surv(t.fl - t.g, i.fl) ~interaction(location, species), data = .)tibble(t = fit_km_flow$time |>c(rep(0, 4)), haz = fit_km_flow$surv |>c(rep(1, 4)),lwr = fit_km_flow$lower |>c(rep(NA, 4)),upr = fit_km_flow$upper |>c(rep(NA, 4)),group =rep(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'), fit_km_flow$strata) |>c(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'))) |>separate_wider_delim(cols = group, delim ='.', names =c('location', 'species')) |>ggplot(aes(x = t, y =1- haz)) +geom_step(aes(color = species, linetype = location)) + pammtools::geom_stepribbon(aes(ymin =1- lwr, ymax =1- upr, fill = species,linetype = location),alpha =0.2) +theme_bw() +theme(panel.grid.minor =element_blank()) +guides(linetype =guide_legend(override.aes =list(fill =NA))) +labs(x ='time (days since germination)',y ='Pr(flower | germinated)')```There is a clear and persistent difference in flowering probability by species: annuals flower at higher rates, as one might expect. While there is no long-run apparent difference in flowering probability by microhabitat, the flowering probability increases faster in the rock outcrop microhabitat. This reflects the descriptive finding earlier that flowering occurs later in the seep microhabitat.### FruitingSimilar to flowering, I estimated the instantaneous *conditional* probability of fruiting given germination, as a function of the time since germination. ::: callout-noteClient may prefer to model the conditional probability of fruiting given flowering as a function of time since flowering. If so, change `Surv(t.fr - t.g, i.fr)` to `Surv(t.fr - t.fl, i.fr)` in the model specification.:::```{r}# kaplan-meier curves for (conditional) fruiting probabilityfit_km_fru <- tdata |>filter(t.g <as.integer(max_t)) %>%survfit(Surv(t.fr - t.g, i.fr) ~interaction(location, species), data = .)tibble(t = fit_km_fru$time |>c(rep(0, 4)), haz = fit_km_fru$surv |>c(rep(1, 4)),lwr = fit_km_fru$lower |>c(rep(NA, 4)),upr = fit_km_fru$upper |>c(rep(NA, 4)),group =rep(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'), fit_km_fru$strata) |>c(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'))) |>separate_wider_delim(cols = group, delim ='.', names =c('location', 'species')) |>ggplot(aes(x = t, y =1- haz)) +geom_step(aes(color = species, linetype = location)) + pammtools::geom_stepribbon(aes(ymin =1- lwr, ymax =1- upr, fill = species,linetype = location),alpha =0.2) +theme_bw() +theme(panel.grid.minor =element_blank()) +guides(linetype =guide_legend(override.aes =list(fill =NA))) +labs(x ='time (days since germination)',y ='Pr(fruit | germinated)')```Interestingly, there appears to be an interaction effect here: depending on species, the rate of increase and plateau differs between environments.### SurvivalFinally, an analysis of survival times after germination is given below. The figure shows the probability of survival as a function of time, and reflects the descriptive finding earlier that plants survived longer in the seep microhabitat.```{r}# kaplan-meier curves for survival after germinationfit_km_surv <- tdata |>filter(t.g <as.integer(max_t)) %>%survfit(Surv(t.d - t.g, i.d) ~interaction(location, species), data = .)# plottibble(t = fit_km_surv$time |>c(rep(0, 4)), haz = fit_km_surv$surv |>c(rep(1, 4)),lwr = fit_km_surv$lower |>c(rep(NA, 4)),upr = fit_km_surv$upper |>c(rep(NA, 4)),group =rep(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'), fit_km_surv$strata) |>c(c('rock.LLA', 'seep.LLA', 'rock.LLP', 'seep.LLP'))) |>separate_wider_delim(cols = group, delim ='.', names =c('location', 'species')) |>ggplot(aes(x = t, y = haz)) +geom_step(aes(color = species, linetype = location)) + pammtools::geom_stepribbon(aes(ymin = lwr, ymax = upr, fill = species,linetype = location),alpha =0.2) +theme_bw() +theme(panel.grid.minor =element_blank()) +guides(linetype =guide_legend(override.aes =list(fill =NA))) +labs(x ='time (days since germination)',y ='Pr(survival | germinated)')```## ReproducibilityI've provided a copy of all codes used to generate analyses shown above starting from the client's spreadsheet. In addition, I generated two derivative files with reformatted data:- `seedplot.csv` has data arranged for event occurrence rate analysis- `seedplot-time.csv` has data arranged for time-to-event analysisThese files are available (only client has access permissions) in a compressed folder [here](https://cpslo-my.sharepoint.com/:u:/r/personal/truiz01_calpoly_edu/Documents/consulting/25su-niknafs/niknafs-consult-toclient.zip?csf=1&web=1&e=jporSr) (you may need to right-click to open).