Monkeyflower reciprocal transplant study

CLIENT

Avin Niknafs

DATE

August 19, 2025

From client:

I am conducting a reciprocal transplant study to test for local adaptation between two monkeyflower species in distinct microhabitats at a sympatric site on the central coast of California. My project involves analyzing germination rates, seedling survival of the two species, and reproductive success, and if possible relating those outcomes to weekly soil moisture data.

Seeded toothpicks were planted and surveyed weekly for roughly one year. Study design:

In total, this amounts to 108 toothpicks (6 reps x 9 lineages x 2 species) per plot and a total sample size of \(n = 2160\) (108 per plot x 10 plots x 2 microhabitats). Primary interest is in species/environment interaction effects on fitness.

Goals set during initial meeting:

  1. Formulate an analysis-ready rearrangment of seed plot data.
  2. Recommend an approach for analysis of germination rates.
Note

Questions/issues needing client’s attention are shown throughout like this.

Seed plot data

Data provided are shown below. I removed plot index A4_10_3, which was missing identifying information for species and lineage. I suspect this may have been miscoded, since after removal \(n = 2159\), which is one short of expected based on the study design.

Note

Check plot coding for index A4_10_3. Was this miscoded?

For the three individuals shown, values are missing in all weeks because the toothpicks never germinated.

Table: raw data (continued below)
species mom dad envname family habit color
LLP 9 G1 11 G1 LLP 9 G1 x LLP 11 G1 P9x11 P WhitePink
LLP 6 G1 20 G1 LLP 6 G1 x LLP 20 G1 P6x20 P WhiteBlue
LLP 15 G1 1 G1 LLP 15 G1 x LLP 1 G1 P15x1 P WhiteBlack
Table continues below
location replicate plotind plot famabbrev position flower?
rock_outcrop 1 P1_1_1 1 P1 1 NA
rock_outcrop 1 P7_1_1 1 P7 2 NA
rock_outcrop 1 P5_1_1 1 P5 3 NA
Table continues below
week1 week2 week3 week4 week5 week6 week7 week8 week9 week10
NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
Table continues below
week11 week12 week13 week14 week15 week16 week17 week18 week19
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
Table continues below
week20 week22 week24 week26 week28 week30 week33 week35 week39
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
Table continues below
week42 week44 week46 week48 week50 week52 week54 week56 week58
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA
week60 week63 week65
NA NA NA
NA NA NA
NA NA NA

To verify that the data reflect the study design, the following tables show distinct combinations of study attributes and sample sizes per plot:

Table: Number of unique lineages per location/species
location LLA LLP
rock_outcrop 9 9
seep 9 9
Table: Sample sizes per plot
plot rock_outcrop seep
1 108 107
2 108 108
3 108 108
4 108 108
5 108 108
6 108 108
7 108 108
8 108 108
9 108 108
10 108 108

The values entered for each individual each week are:

  • missing if ungerminated, dead, or otherwise not present
  • 1 if germinated
  • 2 if flowering, with number of repetitions indicating number of flowers (e.g., 222 means three flowers)
  • 3 if alive post-flowering
  • 4 if fruiting, with number of repetitions indicating number of fruits (e.g., 44 means two fruits)
  • X if missing for reasons other than death
  • combinations as appropriate (e.g., 222 4 indicates three flowers and one fruit)

The unique encodings present in the data are shown below:

 [1] NA                         "X"                       
 [3] "1"                        "2"                       
 [5] "3"                        "22"                      
 [7] "222"                      "4"                       
 [9] "222222 (4)"               "3 (4)"                   
[11] "2222"                     "2 (4)"                   
[13] "222222"                   "22 (4)"                  
[15] " "                        "222222222"               
[17] "22222"                    "222 (4)"                 
[19] "22 (44)"                  "2222 (4444)"             
[21] "222222 (444444)"          "222 (444)"               
[23] "22222222222222"           "2222222"                 
[25] "2 (4444)"                 "222 (444444)"            
[27] "44"                       "22 (444)"                
[29] "2 (44)"                   "444"                     
[31] "22222222"                 "222222222222"            
[33] "22 (444444)"              "2 (444)"                 
[35] "4444"                     "2222222222 (44444444444)"
[37] "2222222222"              

The decimals are artefacts of value encoding in excel when the entry can be parsed as numeric. So for example 2.22222222E8 is nine 2’s rewritten in scientific notation (\(2.22222222 \times 10^8\)).

Reformatting for analysis

I’ve cleaned up the encoding and data formatting for analysis as follows:

  • rearrange in ‘long’ format with a variable indicating week number

  • append sampling dates from the CensusDate sheet

  • derive a stage variable encoding life stage achieved by the time of observation:

  • ungerminated if no values were entered prior to the observation date

  • germinated if a ‘1’ is recorded and no other values were entered prior to the observation date

  • flowered if a ‘2’ was entered prior to or on the observation date

  • fruited if a ‘4’ was entered prior to or on the observation date

  • died if value is missing on the current observation date but at least one nonmissing value was entered prior to the observation date

  • NA if an ‘X’ was entered prior to or on the observation date

  • derive counts of numbers of fruits and flowers

  • add a unique identifier id per toothpick

Table: Example rows of analysis-ready data (continued below)
id species family location plot replicate week date
1 LLP P9x11 rock_outcrop 1 1 week1 2024-01-25
1 LLP P9x11 rock_outcrop 1 1 week2 2024-02-01
1 LLP P9x11 rock_outcrop 1 1 week3 2024-02-08
1 LLP P9x11 rock_outcrop 1 1 week4 2024-02-15
1 LLP P9x11 rock_outcrop 1 1 week5 2024-02-22
1 LLP P9x11 rock_outcrop 1 1 week6 2024-02-29
stage n.flower n.fruit
ungerminated 0 0
ungerminated 0 0
ungerminated 0 0
ungerminated 0 0
ungerminated 0 0
ungerminated 0 0

I’ve provided the data in this form to client as seedplot.csv for direct import into R. I’d recommend explicitly encoding the life stage variable in the correct order, as shown below.

Code
seedplot <- read_csv('seedplot.csv') |>
  mutate(stage = factor(stage, 
                        levels = c('ungerminated', 'germinated', 
                                   'flowered', 'fruited', 'died'), 
                        ordered = T))
Note

I considered adding a curr.stage or curr.status variable that is the same as stage but indicates current life stage at the time of observation: ungerminated, germinated, flowering, fruiting, living, dead.

This information is captured implicitly by the combination of flower/fruit counts and the existing stage variable, but would it be useful to encode it explicitly?

Descriptive analysis

Comparing progressions through life stages shows:

  • almost all germination occurs by 2/15/2024
  • slightly higher germination rate for perennial compared with annual species
  • no survival at rock outcrop after 6/25/2024
  • perennials generally don’t flower/fruit except in seep environment after 6/25
  • one rock outcrop plot missing, but otherwise few missing values
Code
# life stage over time
seedplot |>
  count(date, stage, location, species) |>
  ggplot(aes(x = factor(date), y = n, fill = stage)) +
  facet_grid(location ~ species) +
  geom_bar(position = 'fill', stat = 'identity', alpha = 0.9) +
  scale_fill_grafify() +
  scale_y_continuous(expand = rep(0, 2)) +
  theme_bw() +
  theme(panel.grid = element_blank(), 
        panel.spacing = unit(0.15, 'in'),
        axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  labs(x = 'date', y = 'proportion of individuals')

Figure: Life stage progression by location and species (missing data removed)

Here’s the same figure but with normalization after removing missing data:

Code
# life stage over time
seedplot |>
  count(date, stage, location, species) |>
  drop_na(stage) |>
  ggplot(aes(x = factor(date), y = n, fill = stage)) +
  facet_grid(location ~ species) +
  geom_bar(position = 'fill', stat = 'identity', alpha = 0.9) +
  scale_fill_grafify() +
  scale_y_continuous(expand = rep(0, 2)) +
  theme_bw() +
  theme(panel.grid = element_blank(), 
        panel.spacing = unit(0.15, 'in'),
        axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  labs(x = 'date', y = 'proportion of individuals')

Figure: Life stage progression by location and species (missing data removed)

Analysis of rate data

Analysis of event occurrence rates aggregated over the study period models the probability that the individual germinated/flowered/fruited at any point during the study period, possibly conditional on prior life stages. For this, individuals are dropped if missing throughout the study, but if nonmissing observations were recorded for any period of time, the individual is retained through the last stage for which values are recorded.

The framework for this analysis is a generalized linear mixed model for the relevant event probability.

Germination rates

I’ve added an indicator of whether germination occurred at some time during the study. Here are some examples of the indicator encoding:

Code
# binary indicator for germination
germ <- seedplot |>
  drop_na(stage) |>
  group_by(id, species, family, location, plot) |>
  summarize(last.stage = max(stage), .groups = 'drop') |>
  mutate(germ = fct_collapse(last.stage, 
                             no = 'ungerminated', 
                             other_level = 'yes') |>
           factor(ordered = F)) 
Table: Example germination indicator encoding
id species family location plot last.stage germ
1 LLP P9x11 rock_outcrop 1 ungerminated no
927 LLP P14x16 rock_outcrop 9 germinated yes
993 LLP P17x6 seep 10 flowered yes
1299 LLP P13x14 seep 13 fruited yes
25 LLA A30x3 rock_outcrop 1 died yes

The figure below shows germination rates (aggregated over plots) by lineage, location, and species, with the location/species average overlaid as a vertical dashed line.

Figure: Germination rates by species, lineage, and location; average shown in blue dash.

My observations:

  • no apparent location effect on germination rates in aggregate, but possible location/lineage interactions

    • usually slightly higher rates in seep environments, but not uniformly; e.g., P9x11 does better in rock outcrop)
    • if there are location/lineage interactions, the location effect may not be significant; few differences in germination rates exceeding 5-10%
  • apparent species effect – higher germination rates for perennials

Interestingly, there is a bit of a plot effect and a possible plot/species interaction. This suggests that there may be unaccounted-for microhabitat features that influence fitness.

Figure: Plot effects and plot/species interactions

Analysis ignoring lineage

Let \(Y_{ijkl}\) denote the whether germination occurred for the \(l\)th replicate of species \(i\) in plot \(k\) in microhabitat type \(j\). A model for the data, ignoring any effect of lineage, is:

\[ \begin{cases} Y_{ijkl} \sim \text{Bernoulli}(p_{ijk}) \\ \log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right) = \text{spp}_i + \text{mh}_j + \left(\text{spp}\times\text{mh}\right)_{ij} + \text{plot}_{k(j)} + \left(\text{plot}\times\text{spp}\right)_{ik(j)} \end{cases} \]

Fixed effects:

  • species \(\text{spp}_i\)
  • microhabitat \(\text{mh}_j\)
  • species/microhabitat interaction \(\left(\text{spp}\times\text{mh}\right)_{ij}\)

Random effects:

  • plot (nested within microhabitat) \(\text{plot}_{k(j)}\)
  • plot/species interaction (nested within microhabitat) \(\left(\text{plot}\times\text{spp}\right)_{ik(j)}\)

To fit the model, use lme4::glmer:

Code
# fit model
fit <- glmer(germ ~ species*location + (1 | plot:location/species),
             data = germ, family = binomial) 

This produces the inference:

Code
# tests for fixed effects
car::Anova(fit, type = 'II') |> 
  pander(caption = 'Table: Type II Wald tests for fixed effects')
Table: Type II Wald tests for fixed effects
  Chisq Df Pr(>Chisq)
species 5.631 1 0.01765
location 0.002503 1 0.9601
species:location 0.0118 1 0.9135
Code
# test for plot effect
glm(germ ~ species*location,
                data = germ, family = binomial) %>% 
  anova(fit, ., type = 'LRT') |> 
  pander(caption = 'Table: Likelihood ratio test for plot effect')
Table: Likelihood ratio test for plot effect
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 4 2795 2818 -1394 2787 NA NA NA
fit 6 2631 2665 -1310 2619 168.1 2 3.146e-37
Code
# test for plot/species interaction
glmer(germ ~ species*location + (1 | plot:location),
                  data = germ, family = binomial) %>%
  anova(fit, ., type = 'LRT') |> 
  pander(caption = 'Table: Likelihood ratio test for plot/species interaction')
Table: Likelihood ratio test for plot/species interaction
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 5 2630 2658 -1310 2620 NA NA NA
fit 6 2631 2665 -1310 2619 0.2977 1 0.5853

In summary:

  • the data provide evidence of a species effect (\(\chi^2_1 = 5.631, p = 0.01765\)) and a plot effect (\(\chi^2_2 = 168.1, p < 0.0001\))
  • the data do not provide evidence of a location effect or a species/plot interaction

Adjusting for lineage

Let \(Y_{ijklm}\) denote the whether germination occurred for the \(m\)th replicate of lineage \(l\) of species \(i\) in plot \(k\) in microhabitat type \(j\). Agumenting the model above to account for lineage yields:

\[ \begin{cases} Y_{ijklm} \sim \text{Bernoulli}(p_{ijk}) \\ \log\left(\frac{p_{ijkl}}{1 - p_{ijkl}}\right) = \text{spp}_i + \text{mh}_j + \left(\text{spp}\times\text{mh}\right)_{ij} + \text{plot}_{k(j)} + \left(\text{plot}\times\text{spp}\right)_{ik(j)} + \text{lin}_{l(i)} \end{cases} \]

Fixed effects:

  • species \(\text{spp}_i\)
  • microhabitat \(\text{mh}_j\)
  • species/microhabitat interaction \(\left(\text{spp}\times\text{mh}\right)_{ij}\)

Random effects:

  • plot (nested within microhabitat) \(\text{plot}_{k(j)}\)
  • plot/species interaction (nested within microhabitat) \(\left(\text{plot}\times\text{spp}\right)_{ik(j)}\)
  • lineage (nested within species) \(\text{lin}_{l(i)}\)

To fit the model:

Code
# fit model
fit_lin <- glmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species),
                 data = germ, family = binomial) 

By comparison with the initial model, there is evidence of an effect of lineage:

Code
# test for effect of lineage
anova(fit_lin, fit) |>
  pander(caption = 'Table: Likelihood ratio test for lineage effect')
Table: Likelihood ratio test for lineage effect
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
fit 6 2631 2665 -1310 2619 NA NA NA
fit_lin 7 2619 2659 -1303 2605 14.16 1 0.0001677

And interestingly, the effect of species vanishes when lineage is accounted for:

Code
# tests for fixed effects
car::Anova(fit_lin, type = 'II') |> 
  pander(caption = 'Table: Type II Wald tests for fixed effects')
Table: Type II Wald tests for fixed effects
  Chisq Df Pr(>Chisq)
species 2.139 1 0.1436
location 0.002618 1 0.9592
species:location 0.01313 1 0.9088

Inference on the other fixed effects – location and location/species interactions – are unchanged. Likewise for inferences on random effects – plot and plot/species interaction.

Table: Likelihood ratio test for plot effect
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 5 2787 2816 -1389 2777 NA NA NA
fit_lin 7 2619 2659 -1303 2605 172.2 2 3.985e-38
Table: Likelihood ratio test for plot/species interaction
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 2618 2651 -1303 2606 NA NA NA
fit_lin 7 2619 2659 -1303 2605 0.4603 1 0.4975

In summary:

  • the data provide evidence of a lineage effect (\(\chi^2_1 = 14.16, p = 0.0001677\)) and a plot effect (\(\chi^2_2 = 172.2, p < 0.0001\))
  • the data do not provide evidence of a species effect, a location effect, or location/species or plot/species interactions

As an aside, I also tested for lineage interactions with plot and location (separately), and neither were significant. These are shown below.

Table: Likelihood ratio test for lineage/plot interaction
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
fit_lin 7 2619 2659 -1303 2605 NA NA NA
. 8 2621 2666 -1302 2605 0.3193 1 0.572
Table: Likelihood ratio test for lineage/location interaction
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
fit_lin 7 2619 2659 -1303 2605 NA NA NA
. 8 2620 2665 -1302 2604 0.7485 1 0.3869

My recommendation would be to omit interaction terms involving lineage for model parsimony, but a full model could include both interactions above. This may be desirable for interpretability, but won’t change the inferences about effects. I checked but suppressed the analyses – codes are shown below.

Code
# full model with all interactions
fit_full <- glmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species/location) + (1 | family:species:location:plot),
                 data = germ, family = binomial) 

# inference for fixed effects
car::Anova(fit_full, type = 'II')

# inference for random effects: lineage/plot interaction
glmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species/location),
                 data = germ, family = binomial) %>%
  anova(fit_full, .)

# inference for random effects: lineage/location interaction
glmer(germ ~ species*location + (1 | plot:location/species) + (1 | family:species) + (1 | family:species:location:plot),
                 data = germ, family = binomial) %>%
  anova(fit_full, .)

# inference for random effects: lineage effect
glmer(germ ~ species*location + (1 | plot:location/species),
                 data = germ, family = binomial) %>%
  anova(fit_full, .)

# inference for random effects: plot effect
glmer(germ ~ species*location + (1 | family:species/location),
                 data = germ, family = binomial) %>%
  anova(fit_full, .)

# inference for random effects: plot/species interaction
glmer(germ ~ species*location + (1 | plot:location) + (1 | family:species),
                 data = germ, family = binomial) %>%
  anova(fit_full, .)

Flowering rates

Flowering rates could be computed in one of two ways:

  1. Unconditional on germination, i.e., rate at which all toothpicks flowered
  2. Conditional on germination, i.e., rate at which germinated toothpicks flowered

I’ve carried out parallel analyses using both rates. Personally, I think the conditional rates make more sense for the analysis.

Unconditional flowering rates

Below are comparisons of the unconditional flowering rates.

Code
# binary indicator for flowering (unconditional)
flow <- seedplot |>
  drop_na(stage) |>
  group_by(id, species, family, location, plot) |>
  summarize(flow = factor(sum(n.flower) > 0, 
                          levels = c(F, T), 
                          labels = c('no', 'yes')),
            .groups = 'drop')

Figure: Unconditional flowering rates by species, lineage, and location; average shown in blue dash.

Since some families have unconditional flowering rates of zero, it will not be possible to estimate lineage interactions. The analysis below uses the same model specification as that employed in the analysis of germination rates. (Codes are also identical.)

Inference for fixed effects indicates there is a species effect on unconditional flowering rates:

Code
# fit model
fit_flow <- glmer(flow ~ species*location + (1 | plot:location/species) + (1 | family:species),
                  data = flow, family = binomial) 
Table: Inference for fixed effects
  Chisq Df Pr(>Chisq)
species 15.8 1 7.025e-05
location 0.1033 1 0.7479
species:location 3.173 1 0.07489

Inference for random effects indicates a lineage effect, a plot effect, and a species/plot interaction. Codes for these analyses are identical to those shown in the previous subsection on analysis of germination rates.

Table: Likelihood ratio test for lineage effect on unconditional flowering rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 1416 1450 -701.9 1404 NA NA NA
fit_flow 7 1400 1439 -692.9 1386 17.98 1 2.236e-05
Table: Likelihood ratio test for plot effect on unconditional flowering rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 5 1528 1557 -759.2 1518 NA NA NA
fit_flow 7 1400 1439 -692.9 1386 132.6 2 1.637e-29
Table: Likelihood ratio test for species/plot interaction effect on unconditional flowering rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 1406 1440 -696.9 1394 NA NA NA
fit_flow 7 1400 1439 -692.9 1386 7.928 1 0.004868

Higher-order interactions produce numerical instability in the model fitting and so are omitted. Estimates of unconditional flowering rates for each species/location combination are given below.

Table: Estimated (unconditional) flowering probabilities and 95% confidence intervals
species location prob std.error conf.low conf.high
LLA rock_outcrop 0.1265 0.0613 0.04653 0.3005
LLP rock_outcrop 0.02323 0.01351 0.00735 0.07097
LLA seep 0.1163 0.05412 0.04476 0.2697
LLP seep 0.04893 0.02549 0.01729 0.1308

Graphically:

Marginal flowering rates by species (averaged over location) and location (averaged over species) are shown below.

Table: Estimated marginal unconditional flowering rates by species (averaged over location)
species prob std.error conf.low conf.high
LLA 0.1213 0.04285 0.05905 0.2328
LLP 0.0338 0.01389 0.01497 0.0745
Table: Estimated marginal unconditional flowering rates by location (averaged over species)
location prob std.error conf.low conf.high
rock_outcrop 0.05543 0.02794 0.0202 0.1431
seep 0.07602 0.03512 0.02995 0.1798

Lastly, factorial contrasts yield comparisons of marginal flowering rates between (a) locations and (b) species.

Table: Factorial contrasts for marginal species and location effects on unconditional flowering rates
contrast odds.ratio SE conf.low conf.high
loc 1.402 0.9868 0.3529 5.57
spp 3.945 1.312 2.056 7.57

This quantifies the significant species effect as follows: with 95% confidence, the odds of flowering are estimated to be between 2.06 and 7.57 times higher among annuals compared with perennials (point estimate: 3.95).

Conditional flowering rates

Below are comparisons of the conditional flowering rates.

Code
# binary indicator for flowering (conditional on germination)
cflow <- seedplot |>
  drop_na(stage) |>
  filter(stage > 'ungerminated') |> # notice filter here for conditional rate
  group_by(id, species, family, location, plot) |>
  summarize(flow = factor(sum(n.flower) > 0, 
                          levels = c(F, T), 
                          labels = c('no', 'yes')),
            .groups = 'drop')

Figure: Conditional flowering rates by species, lineage, and location; average shown in blue dash.

Since the conditional rates are determined after filtering out ungerminated plants, the relevant sample sizes change. After filtering:

Table: Sample sizes for conditional flowering rates by species, lineage, and plot
species family location 1 2 3 4 5 6 7 8 9 10
LLA A12x19 rock_outcrop NA 5 NA 3 3 2 3 4 4 4
LLA A12x9 rock_outcrop 2 3 NA 4 6 4 1 7 6 4
LLA A19x12 rock_outcrop 2 3 NA 4 6 4 3 2 2 5
LLA A24x30 rock_outcrop NA 4 NA 1 2 3 NA 1 3 6
LLA A30x3 rock_outcrop 2 5 NA 6 3 2 1 4 5 4
LLA A33x24 rock_outcrop NA 5 NA 7 3 2 4 1 5 6
LLA A3x21 rock_outcrop 2 5 NA 4 4 6 1 3 4 4
LLA A7x10 rock_outcrop NA 4 NA 3 5 3 2 4 4 3
LLA A9x33 rock_outcrop 1 5 NA 2 4 4 1 4 3 3
LLP P10x7 rock_outcrop 1 3 NA 4 4 2 2 4 4 4
LLP P13x14 rock_outcrop 1 4 NA 3 5 5 4 5 5 3
LLP P14x16 rock_outcrop NA 4 NA 5 7 2 4 4 3 3
LLP P15x1 rock_outcrop 1 4 NA 5 5 5 5 2 3 3
LLP P17x6 rock_outcrop 2 4 NA 6 5 1 6 5 6 5
LLP P1x8 rock_outcrop 2 4 NA 5 4 6 4 5 5 3
LLP P20x4 rock_outcrop 1 6 NA 4 5 2 1 5 5 5
LLP P6x20 rock_outcrop 1 4 NA 2 3 2 4 3 1 2
LLP P9x11 rock_outcrop 1 3 NA 3 5 4 3 3 5 4
LLA A12x19 seep 2 4 4 4 2 1 4 2 3 5
LLA A12x9 seep 2 3 4 5 6 2 5 5 4 5
LLA A19x12 seep 3 3 4 2 1 1 3 4 2 4
LLA A24x30 seep 3 3 3 4 7 NA 3 5 1 5
LLA A30x3 seep 2 5 4 4 5 1 6 4 4 3
LLA A33x24 seep 3 3 2 3 2 1 4 2 3 6
LLA A3x21 seep 1 4 2 4 3 NA 2 6 5 5
LLA A7x10 seep 2 5 6 5 2 1 6 3 3 5
LLA A9x33 seep 1 3 1 3 3 2 3 4 2 5
LLP P10x7 seep 2 3 5 2 4 3 4 3 3 4
LLP P13x14 seep 1 4 6 3 4 NA 5 6 4 5
LLP P14x16 seep 3 5 3 6 5 3 6 6 4 6
LLP P15x1 seep 2 4 4 2 4 1 6 3 2 6
LLP P17x6 seep 4 4 6 3 4 1 5 5 4 6
LLP P1x8 seep 3 4 4 3 4 2 5 4 4 5
LLP P20x4 seep 2 4 3 4 3 2 3 5 3 4
LLP P6x20 seep 4 3 3 2 3 NA 4 4 3 6
LLP P9x11 seep 1 5 4 2 5 2 NA 3 2 5

Because some plots have no germinated plants of particular lineages, it will not be possible to estimate the plot/lineage interaction. However, the row totals are sufficient that the lineage effect is still estimable.

Inference for fixed effects (again using the same model specification employed for analysis of germination rates and the same codes) indicates there is a species effect on conditional flowering rates (no change compared with analysis of unconditional rates):

Table: Inference for fixed effects on conditional flowering rates
  Chisq Df Pr(>Chisq)
species 20.61 1 5.625e-06
location 0.2234 1 0.6364
species:location 2.692 1 0.1008

Inference for random effects indicates a plot effect and a species/plot interaction and a lineage effect (no change compared with the analysis of unconditional flowering rates):

Table: Likelihood ratio test for lineage effect on conditional flowering rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 1110 1141 -549 1098 NA NA NA
fit_cflow 7 1093 1128 -539.3 1079 19.39 1 1.065e-05
Table: Likelihood ratio test for plot effect on conditional flowering rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 5 1174 1199 -581.8 1164 NA NA NA
fit_cflow 7 1093 1128 -539.3 1079 84.88 2 3.695e-19
Table: Likelihood ratio test for species/plot interaction effect on conditional flowering rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 1099 1130 -543.7 1087 NA NA NA
fit_cflow 7 1093 1128 -539.3 1079 8.755 1 0.003087

Estimates of conditional flowering rates for each species/location combination are given below.

Table: Estimated (conditional) flowering probabilities and 95% confidence intervals
species location prob std.error conf.low conf.high
LLA rock_outcrop 0.2623 0.09818 0.1162 0.4901
LLP rock_outcrop 0.04167 0.02202 0.01454 0.1136
LLA seep 0.253 0.09095 0.1165 0.4652
LLP seep 0.08892 0.04078 0.03512 0.2075

Graphically:

Marginal flowering rates by species (averaged over location) and location (averaged over species) are shown below.

Table: Marginal conditional flowering rates by species (averaged over location)
species prob std.error conf.low conf.high
LLA 0.2576 0.07143 0.143 0.4191
LLP 0.06116 0.02293 0.02893 0.1247
Table: Marginal conditional flowering rates by location (averaged over species)
location prob std.error conf.low conf.high
rock_outcrop 0.1106 0.04691 0.04655 0.2405
seep 0.1538 0.05784 0.07072 0.3028

Lastly, factorial contrasts yield comparisons of marginal flowering rates between (a) locations and (b) species.

Table: Factorial contrasts for marginal species and location effects on conditional flowering rates
contrast odds.ratio SE conf.low conf.high
loc 1.462 0.9019 0.4365 4.898
spp 5.326 1.909 2.638 10.75

This quantifies the significant species effect as follows: with 95% confidence, the odds of flowering are estimated to be between 2.63 and 10.75 times higher among annuals compared with perennials (point estimate: 5.33).

Fruiting rates

Fruiting rates could be computed in one of three ways:

  1. Unconditional on germination, i.e., rate at which all toothpicks produced fruit
  2. Conditional on germination, i.e., rate at which germinated toothpicks produced fruit
  3. Conditional on flowering, i.e., rate at which flowered plants produced fruit

Below I’ve only shown the analysis for option (3). To explore options 1-2, remove or change the filter on life stage in the codes shown below.

Code
# binary indicator for fruiting (conditional on flowering)
fruit <- seedplot |>
  drop_na(stage) |>
  filter(stage > 'germinated') |> # notice filter here for conditional rate
  group_by(id, species, family, location, plot) |>
  summarize(fruit = factor(sum(n.fruit) > 0, 
                          levels = c(F, T), 
                          labels = c('no', 'yes')),
            .groups = 'drop')

Figure: Conditional fruiting rates by species, lineage, and location; average shown in blue dash.

Since some families have conditional fruiting rates of zero, it will not be possible to estimate lineage interactions. The analysis below uses the same model specification as that employed in the analysis of germination rates. (Codes are also identical.)

Inference for fixed effects indicates there is a species effect on conditional fruiting rates:

Table: Inference for fixed effects on conditional fruiting rate
  Chisq Df Pr(>Chisq)
species 8.955 1 0.002767
location 0.1853 1 0.6668
species:location 2.163 1 0.1413

Inference for random effects indicates a plot effect and a species/plot interaction but no lineage effect:

Table: Likelihood ratio test for lineage effect on conditional fruiting rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 672.2 702.6 -330.1 660.2 NA NA NA
fit_fruit 7 672.6 708 -329.3 658.6 1.596 1 0.2064
Table: Likelihood ratio test for plot effect on conditional fruiting rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 5 707.6 732.9 -348.8 697.6 NA NA NA
fit_fruit 7 672.6 708 -329.3 658.6 39.01 2 3.377e-09
Table: Likelihood ratio test for species/plot interaction effect on conditional fruiting rates
  npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
. 6 676.2 706.5 -332.1 664.2 NA NA NA
fit_fruit 7 672.6 708 -329.3 658.6 5.588 1 0.01809

Higher-order interactions produce numerical instability in the model fitting and so are omitted. Estimates of conditional fruiting rates for each species/location combination are given below.

Table: Estimated (conditional) fruiting probabilities and 95% confidence intervals
species location prob std.error conf.low conf.high
LLA rock_outcrop 0.1183 0.04827 0.05135 0.2494
LLP rock_outcrop 0.02249 0.01223 0.007673 0.06406
LLA seep 0.06554 0.02804 0.02779 0.1468
LLP seep 0.03452 0.01773 0.01245 0.09207

Graphically:

Marginal fruiting rates by species (averaged over location) and location (averaged over species) are shown below.

Table: Estimated marginal conditional fruiting rates by species (averaged over location)
species prob std.error conf.low conf.high
LLA 0.08841 0.02763 0.0472 0.1596
LLP 0.02788 0.01113 0.01267 0.06026
Table: Estimated marginal conditional fruiting rates by location (averaged over species)
location prob std.error conf.low conf.high
rock_outcrop 0.05262 0.02147 0.02333 0.1144
seep 0.04769 0.01888 0.02169 0.1016

Lastly, factorial contrasts yield comparisons of marginal fruiting rates between (a) locations and (b) species.

Table: Factorial contrasts for marginal species and location effects on conditional fruiting rates
contrast odds.ratio SE conf.low conf.high
loc 0.9015 0.4997 0.3042 2.671
spp 3.382 1.364 1.534 7.455

This quantifies the significant species effect as follows: with 95% confidence, the odds of fruiting are estimated to be between 1.53 and 7.46 times higher among annuals compared with perennials (point estimate: 3.38).

Analysis of time-to-event data

If you’re interested in time patterns, it’s fairly easy to obtain first dates for each stage. Here’s what that looks like for one individual that saw all life stages:

Code
# earliest dates for each life stage
first_dates <- seedplot |>
  drop_na(stage) |>
  group_by(id, species, family, location, plot, replicate, stage) |>
  slice_min(date) |>
  select(-starts_with('n'), -week) |>
  ungroup()

# preview
first_dates |>
  filter(id == 140) |>
  pander(caption = 'Table: dates to life stage events for individual no. 140')
Table: dates to life stage events for individual no. 140
id species family location plot replicate date stage
140 LLP P15x1 rock_outcrop 2 4 2024-01-25 ungerminated
140 LLP P15x1 rock_outcrop 2 4 2024-02-01 germinated
140 LLP P15x1 rock_outcrop 2 4 2024-04-12 flowered
140 LLP P15x1 rock_outcrop 2 4 2024-05-31 fruited
140 LLP P15x1 rock_outcrop 2 4 2024-06-07 died

This can be rearranged to calculate times to each stage. Here are some examples for individuals who reached different life stages:

Code
# time to event data
durations <- first_dates |>
  pivot_wider(names_from = stage, values_from = date) |>
  rename(first.obs = ungerminated) |>
  mutate(time.to.germ = germinated - ymd("2024-01-17"),
         time.to.flower = flowered - ymd("2024-01-17"),
         time.to.fruit = fruited - ymd("2024-01-17"),
         time.to.death = died - ymd("2024-01-17")) 
Table continues below
id first.obs germinated flowered fruited died
25 2024-01-25 2024-02-08 NA NA 2024-03-14
131 2024-01-25 2024-03-21 2024-05-03 NA 2024-05-09
140 2024-01-25 2024-02-01 2024-04-12 2024-05-31 2024-06-07
time.to.germ time.to.flower time.to.fruit time.to.death
22 days NA NA 57 days
64 days 107 days NA 113 days
15 days 86 days 135 days 142 days
Note

I manually input the start of the experiment as 1/17/25 because this was the earliest date that appeared in the metadata, and is consistent with the week 1 observation being 1/25/25. Client should confirm and adjust start date if needed so that durations are accurate.

Here are distributions of times to each event by species and location:

Figure: Distributions of times to life stage events by species and location

It’s perhaps not the most efficient comparison, but one can see:

  • approximate time ranges for each event
  • longer survival times and later flowering and fruiting in seep plots

Survival analysis methods can be used to estimate the (instantaneous) probability of germination/flowering/fruiting/death as a function of time. I focused on a simple approach using Kaplan-Meier estimators after grouping by location and species and ignoring plot and lineage effects. Mixed-effects Cox regression could be used for a proper analysis accounting for the experimental design with a specification analogous to that used above in modeling event rates in aggregate across the study period, but since this is a secondary analysis, I’ve just shown some simple exploratory work.

Missing durations are right-censored – all we know is that the event of interest occurred (or would have occurred) sometime after the end of the study. To format the data correctly, I’ve replaced missing durations with the number of days that the study persisted (442) and added an indicator of whether the corresponding observation is censored (i = 0) or not (i = 1).

Table: data formatted for analysis of time-to-event data
id species location t.g t.fl t.fr t.d i.g i.fl i.fr i.d
25 LLA rock_outcrop 22 442 442 57 1 0 0 1
30 LLA rock_outcrop 22 442 442 36 1 0 0 1
35 LLA rock_outcrop 36 442 442 86 1 0 0 1
38 LLA rock_outcrop 22 442 442 64 1 0 0 1
42 LLP rock_outcrop 22 442 442 29 1 0 0 1
46 LLP rock_outcrop 36 442 442 51 1 0 0 1

I’ve provided a copy of the data formatted in this way as seedplot-times.csv.

Code
# read in time to event data
tdata <- read_csv('seedplot-times.csv')

A quick caveat: it’s somewhat awkward to represent germination, flowering, and fruiting as right-censored since this implicitly assumes that these events would eventually occur, which is not true in this system. This is more conceptual than analytical – what will happen in the analysis is that after sufficient time the estimated instantaneous event probabilities will plateau at the long-run event rates.

Germination

Modeling the instantaneous probability of germination as a function of time since the start of the study shows relatively quick convergence to aggregate germination rates:

The bands show pointwise 95% confidence intervals, and the overlap is consistent with the prior analysis which found no effect of species or location on germination rates. This just displays that conclusion a bit differently. There doesn’t look to be much difference in the rate at which the germination probability increases with time, though the estimate for LLP does increase slightly faster in the rock outrcop microhabitat.

Flowering

For modeling time to flowering, I used:

  1. Flowering occurrences conditional on germination, i.e., all ungerminated individuals were removed prior to analysis
  2. The time since germination (rather than the time since the start of the experiment)

This yields the following estimates of the instantaneous probability of flowering as a function of time since germination:

There is a clear and persistent difference in flowering probability by species: annuals flower at higher rates, as one might expect. While there is no long-run apparent difference in flowering probability by microhabitat, the flowering probability increases faster in the rock outcrop microhabitat. This reflects the descriptive finding earlier that flowering occurs later in the seep microhabitat.

Fruiting

Similar to flowering, I estimated the instantaneous conditional probability of fruiting given germination, as a function of the time since germination.

Note

Client may prefer to model the conditional probability of fruiting given flowering as a function of time since flowering. If so, change Surv(t.fr - t.g, i.fr) to Surv(t.fr - t.fl, i.fr) in the model specification.

Interestingly, there appears to be an interaction effect here: depending on species, the rate of increase and plateau differs between environments.

Survival

Finally, an analysis of survival times after germination is given below. The figure shows the probability of survival as a function of time, and reflects the descriptive finding earlier that plants survived longer in the seep microhabitat.

Reproducibility

I’ve provided a copy of all codes used to generate analyses shown above starting from the client’s spreadsheet. In addition, I generated two derivative files with reformatted data:

  • seedplot.csv has data arranged for event occurrence rate analysis
  • seedplot-time.csv has data arranged for time-to-event analysis

These files are available (only client has access permissions) in a compressed folder here (you may need to right-click to open).