This analysis evaluates how precisely the current monitoring system estimates oyster height and length at each reef and assesses the smallest changes that can be statistically detected given the observed variability and sample sizes. By combining exploratory data analysis, confidence intervals, and minimum detectable effect (MDE) estimates, it provides insight into the system’s ability to detect meaningful biological changes across sites.

Note: These findings can also serve as a guide for defining future sampling strategies, by helping determine the sample sizes required to detect specific biological changes with a desired level of precision and confidence.


Exploratory Analysis

# Count number of samples per reef, year, and month
sampling_summary <- oyster_biometry %>%
  group_by(oyster_reef_name, sampling_year, sampling_month) %>%
  summarise(samples = n(), .groups = "drop") %>%
  arrange(oyster_reef_name, sampling_year, sampling_month)

ggplot(sampling_summary, aes(x = factor(sampling_year), y = samples)) +
  geom_col(fill = "#e6550d", alpha = 0.7) +  # monthly counts stack into a yearly total
  facet_wrap(~ oyster_reef_name, scales = "fixed") +  # same axes in every panel for direct comparison
  labs(
    title = "Sampling Effort per Reef by Year",
    x = "Year",
    y = "Number of Samples"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Sampling Effort by Year: Sampling effort varies across reefs and years. Terra Amarela, Romana, and Aquavila were sampled in all four years, whereas Áries (sampled only in 2021 and 2024) and Goiabal (10 to 42 samples per year) have limited or uneven data. These disparities in sampling intensity affect the reliability and comparability of reef-level estimates over time.

# Split the reefs into two groups so each faceted plot stays legible
reef_group_1 <- c("Água Boa", "Aquavila", "Áries", "Goiabal", "Jacarequara", "Lauro Sodré")
reef_group_2 <- c("Marauá", "Pinheiro", "Romana", "Terra Amarela", "Tio Oscar")

# Height histograms, first group
oyster_biometry %>%
  filter(oyster_reef_name %in% reef_group_1) %>%
  ggplot(aes(x = height_mm)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white", alpha = 0.7) +
  facet_grid(oyster_reef_name ~ sampling_year) +
  labs(
    title = "Distribution of Oyster Height by Reef and Year",
    x = "Oyster Height (mm)",
    y = "Count"
  ) +
  theme_minimal() +
  theme(
    strip.text.y = element_text(size = 7)  # adjust this to fit better
  )


# Height histograms, second group
oyster_biometry %>%
  filter(oyster_reef_name %in% reef_group_2) %>%
  ggplot(aes(x = height_mm)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white", alpha = 0.7) +
  facet_grid(oyster_reef_name ~ sampling_year) +
  labs(
    title = "Distribution of Oyster Height by Reef and Year",
    x = "Oyster Height (mm)",
    y = "Count"
  ) +
  theme_minimal() +
  theme(
    strip.text.y = element_text(size = 7)  # adjust this to fit better
  )

# Length histograms, first group
oyster_biometry %>%
  filter(oyster_reef_name %in% reef_group_1) %>%
  ggplot(aes(x = length_mm)) +
  geom_histogram(binwidth = 5, fill = "darkgreen", color = "white", alpha = 0.7) +
  facet_grid(oyster_reef_name ~ sampling_year) +
  labs(
    title = "Distribution of Oyster Length by Reef and Year",
    x = "Oyster Length (mm)",
    y = "Count"
  ) +
  theme_minimal() +
  theme(
    strip.text.y = element_text(size = 7)  # adjust this to fit better
  )


# Length histograms, second group
oyster_biometry %>%
  filter(oyster_reef_name %in% reef_group_2) %>%
  ggplot(aes(x = length_mm)) +
  geom_histogram(binwidth = 5, fill = "darkgreen", color = "white", alpha = 0.7) +
  facet_grid(oyster_reef_name ~ sampling_year) +
  labs(
    title = "Distribution of Oyster Length by Reef and Year",
    x = "Oyster Length (mm)",
    y = "Count"
  ) +
  theme_minimal() +
  theme(
    strip.text.y = element_text(size = 7)  # adjust this to fit better
  )

Oyster Height and Length Distributions (by Reef and Year): Distributions are generally unimodal and approximately normal in most reefs and years, supporting the use of parametric methods for mean-based analyses. However, some distributions (especially in reefs with low sample sizes like Áries and Goiabal) show skewness or irregular shapes, suggesting higher uncertainty in those estimates.
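
One way to back this visual impression with a number is to compute sample skewness per group. The sketch below uses base R and simulated data: the `demo` data frame and the `sample_skewness()` helper are illustrative assumptions, not part of the original analysis. Values near zero are consistent with a roughly symmetric distribution, while clearly positive values indicate a right tail.

```r
# Moment-based sample skewness: near 0 for symmetric data, positive for right skew
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Simulated stand-in for oyster_biometry (hypothetical values, for illustration only)
set.seed(1)
demo <- data.frame(
  oyster_reef_name = rep(c("Reef A", "Reef B"), each = 200),
  height_mm = c(rnorm(200, mean = 45, sd = 10),  # roughly symmetric
                rexp(200, rate = 1 / 30))        # strongly right-skewed
)

# Skewness of height for each reef
skew_by_reef <- tapply(demo$height_mm, demo$oyster_reef_name, sample_skewness)
print(round(skew_by_reef, 2))
```

Applied to the real data, the same helper could be called inside `summarise()` after grouping by reef and year.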


# Trend in mean height by reef over years
# mean_cl_normal gives a t-based 95% CI for the mean (requires the Hmisc package), e.g.,
# "we're 95% confident the true mean oyster height is between 41 and 44 mm"
ggplot(oyster_biometry, aes(x = sampling_year, y = height_mm)) +
  stat_summary(fun = mean, geom = "point", color = "steelblue") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "steelblue") +
  facet_wrap(~ oyster_reef_name) +
  labs(title = "Mean Oyster Height with 95% Confidence Intervals by Reef and Year", y = "Mean Height (mm)", x = "Year") +
  theme_minimal()

# Trend in mean length by reef over years
ggplot(oyster_biometry, aes(x = sampling_year, y = length_mm)) +
  stat_summary(fun = mean, geom = "point", color = "darkgreen") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "darkgreen") + 
  facet_wrap(~ oyster_reef_name) +
  labs(title = "Mean Oyster Length with 95% Confidence Intervals by Reef and Year", y = "Mean Length (mm)", x = "Year") +
  theme_minimal()

Mean and Confidence Interval Plots (Height and Length): The mean oyster size (both height and length) varies across reefs and years, with several showing noticeable interannual changes. Confidence intervals are narrow in reefs with high sample sizes, indicating precise estimates, while wider intervals in low-sample reefs reflect higher uncertainty.

# Stratify data by reef and year
summary_by_reef_year <- oyster_biometry %>%
  group_by(oyster_reef_name, sampling_year) %>%
  summarise(
    n = n(),
    mean_height = mean(height_mm, na.rm = TRUE),
    sd_height = sd(height_mm, na.rm = TRUE),
    mean_length = mean(length_mm, na.rm = TRUE),
    sd_length = sd(length_mm, na.rm = TRUE),
    .groups = "drop"
  )

summary_by_reef_year %>%
  select(
    Reef = oyster_reef_name,
    Year = sampling_year,
    `Mean Height (mm)` = mean_height,
    `SD Height (mm)` = sd_height,
    `Mean Length (mm)` = mean_length,
    `SD Length (mm)` = sd_length,
    `Sample Size (n)` = n
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
  kbl(
    caption = "Table: Summary statistics of oyster height and length by reef and year.",
    align = "lcccccc",  # one alignment flag per column (7 columns)
    booktabs = TRUE
  ) %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  )
Table: Summary statistics of oyster height and length by reef and year.
Reef Year Mean Height (mm) SD Height (mm) Mean Length (mm) SD Length (mm) Sample Size (n)
Aquavila 2021 53.77 11.95 41.01 9.82 195
Aquavila 2022 54.72 12.94 39.86 10.45 200
Aquavila 2023 55.49 12.59 37.86 9.93 196
Aquavila 2024 42.12 14.20 59.58 18.63 224
Goiabal 2021 24.67 5.40 21.55 5.26 15
Goiabal 2022 40.70 5.50 33.40 5.85 10
Goiabal 2023 49.60 5.08 37.00 5.03 10
Goiabal 2024 30.07 9.94 38.74 13.06 42
Jacarequara 2021 53.00 8.49 39.85 9.17 195
Jacarequara 2022 38.86 11.35 28.74 8.07 163
Jacarequara 2023 45.72 10.48 35.37 8.50 191
Jacarequara 2024 30.31 11.70 39.63 15.78 180
Lauro Sodré 2021 42.46 10.10 33.33 8.51 226
Lauro Sodré 2022 42.75 8.53 35.37 7.79 178
Lauro Sodré 2023 40.43 9.20 30.28 7.32 176
Lauro Sodré 2024 31.07 9.65 38.20 11.82 206
Marauá 2021 45.94 8.89 35.51 6.65 188
Marauá 2022 44.29 9.36 32.56 6.62 195
Marauá 2023 45.77 8.37 33.01 6.27 200
Marauá 2024 27.96 11.21 37.10 14.51 326
Pinheiro 2021 47.42 10.48 35.19 8.26 220
Pinheiro 2022 44.46 9.12 34.83 7.50 195
Pinheiro 2023 52.38 9.18 37.38 7.54 200
Pinheiro 2024 31.97 13.48 38.49 15.99 295
Romana 2021 30.84 9.23 25.59 7.56 174
Romana 2022 36.67 9.83 29.81 8.59 155
Romana 2023 43.74 13.07 36.28 11.68 141
Romana 2024 29.19 13.18 36.54 16.79 339
Terra Amarela 2021 44.18 9.63 32.31 8.04 232
Terra Amarela 2022 48.29 11.04 35.43 8.05 234
Terra Amarela 2023 48.95 10.69 37.36 9.71 240
Terra Amarela 2024 30.03 12.73 36.03 14.62 239
Tio Oscar 2021 46.60 9.60 37.38 9.55 60
Tio Oscar 2022 47.46 7.56 37.84 6.74 50
Tio Oscar 2023 46.20 8.58 35.63 7.51 100
Tio Oscar 2024 30.78 8.78 36.73 10.26 132
Água Boa 2021 41.79 8.69 33.38 8.24 79
Água Boa 2022 39.31 11.66 28.47 9.53 115
Água Boa 2023 39.82 9.28 29.25 7.33 103
Água Boa 2024 29.28 12.88 36.47 15.41 43
Áries 2021 19.31 4.73 16.58 8.69 30
Áries 2024 25.81 10.97 32.69 13.17 85

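Summary Table by Reef and Year: The table gives a numeric overview of sample size, mean, and standard deviation per reef and year, showing both the central tendency and the variability of oyster dimensions. It also allows rapid identification of reefs with strong data coverage (e.g., >200 samples per year) and those needing increased effort to improve precision. Note that in 2024 mean length exceeds mean height at every reef, the reverse of the pattern in 2021 to 2023, so the 2024 measurements are worth checking against the field protocol before interannual changes are interpreted.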


Confidence Intervals

Answer: How precisely are we estimating the current average height and length of oysters at each reef?

The mean and 95% confidence interval plots above already address this question visually. Below, I provide the numerical confidence intervals for 2024 and illustrate how they can be interpreted in practice.

Note:

  • A narrow CI around the mean indicates high precision (the sample mean is likely close to the true population mean)

  • A wide CI indicates lower precision (the current sample may be too small or too variable to estimate the true mean reliably)

# Add 95% CI bounds for height (normal approximation, z = 1.96)
summary_by_reef_year <- summary_by_reef_year %>%
  mutate(
    se_height = sd_height / sqrt(n),
    ci_lower = mean_height - 1.96 * se_height,
    ci_upper = mean_height + 1.96 * se_height
  )

# Plot year 2024 (height)
ggplot(filter(summary_by_reef_year, sampling_year == 2024),
       aes(x = reorder(oyster_reef_name, mean_height), y = mean_height)) +
  geom_point(color = "steelblue") +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2, color = "steelblue") +
  coord_flip() +
  labs(
    title = "Mean Oyster Height by Reef in 2024 with 95% Confidence Intervals",
    y = "Height (mm)",
    x = "Reef"
  ) +
  theme_minimal()


# Table
summary_by_reef_year %>%
  filter(sampling_year == 2024) %>%
  select(Reef = oyster_reef_name, 
         `Mean Height (mm)` = mean_height, 
         `Lower 95% CI` = ci_lower, 
         `Upper 95% CI` = ci_upper) %>%
  mutate(across(where(is.numeric), ~round(.x, 2))) %>%
  kbl(
    caption = "Table: Mean Oyster Height and 95% Confidence Intervals by Reef (2024).",
    align = "lccc",
    booktabs = TRUE
  ) %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  )
Table: Mean Oyster Height and 95% Confidence Intervals by Reef (2024).
Reef Mean Height (mm) Lower 95% CI Upper 95% CI
Aquavila 42.12 40.26 43.98
Goiabal 30.07 27.06 33.08
Jacarequara 30.31 28.60 32.02
Lauro Sodré 31.07 29.75 32.39
Marauá 27.96 26.75 29.18
Pinheiro 31.97 30.43 33.51
Romana 29.19 27.79 30.60
Terra Amarela 30.03 28.42 31.64
Tio Oscar 30.78 29.28 32.28
Água Boa 29.28 25.43 33.13
Áries 25.81 23.48 28.14
# Add 95% CI bounds for length (normal approximation, z = 1.96)
summary_by_reef_year <- summary_by_reef_year %>%
  mutate(
    se_length = sd_length / sqrt(n),
    ci_lower_length = mean_length - 1.96 * se_length,
    ci_upper_length = mean_length + 1.96 * se_length
  )

# Plot year 2024 (length)
ggplot(filter(summary_by_reef_year, sampling_year == 2024),
       aes(x = reorder(oyster_reef_name, mean_length), y = mean_length)) +
  geom_point(color = "darkgreen") +
  geom_errorbar(aes(ymin = ci_lower_length, ymax = ci_upper_length), 
                width = 0.2, color = "darkgreen") +
  coord_flip() +
  labs(
    title = "Mean Oyster Length by Reef in 2024 with 95% Confidence Intervals",
    y = "Length (mm)",
    x = "Reef"
  ) +
  theme_minimal()


# Table
summary_by_reef_year %>%
  filter(sampling_year == 2024) %>%
  select(Reef = oyster_reef_name, 
         `Mean Length (mm)` = mean_length, 
         `Lower 95% CI` = ci_lower_length, 
         `Upper 95% CI` = ci_upper_length) %>%
  mutate(across(where(is.numeric), ~round(.x, 2))) %>%
  kbl(
    caption = "Table: Mean Oyster Length and 95% Confidence Intervals by Reef (2024).",
    align = "lccc",
    booktabs = TRUE
  ) %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  )
Table: Mean Oyster Length and 95% Confidence Intervals by Reef (2024).
Reef Mean Length (mm) Lower 95% CI Upper 95% CI
Aquavila 59.58 57.14 62.02
Goiabal 38.74 34.79 42.69
Jacarequara 39.63 37.33 41.94
Lauro Sodré 38.20 36.59 39.81
Marauá 37.10 35.53 38.68
Pinheiro 38.49 36.67 40.32
Romana 36.54 34.75 38.33
Terra Amarela 36.03 34.18 37.88
Tio Oscar 36.73 34.98 38.49
Água Boa 36.47 31.86 41.07
Áries 32.69 29.89 35.49

Example Interpretation of Confidence Intervals:

In Pinheiro reef, the mean oyster length in 2024 was estimated at 38.49 mm, with a 95% confidence interval of 36.67 to 40.32 mm. This means we can be 95% confident that the true average oyster length at Pinheiro reef lies between 36.67 mm and 40.32 mm. The relatively narrow interval indicates that the estimate is statistically reliable and the current monitoring effort provides good precision at this site.

By contrast, in Água Boa reef, the mean oyster length in 2024 was 36.47 mm, but the confidence interval was much wider: 31.86 to 41.07 mm. This wider interval indicates lower precision, driven mainly by the smaller sample (n = 43 versus 295 at Pinheiro), since the standard deviations at the two reefs are similar (15.41 versus 15.99 mm).
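
The tables above use the normal quantile 1.96, whereas `mean_cl_normal` in the earlier plots uses a t-based interval. For small samples the t interval is slightly wider. The sketch below compares the two for Água Boa length in 2024, using the summary values from the tables above; the variable names are illustrative.

```r
# 2024 length summary for Água Boa, copied from the summary table above
mean_length <- 36.47
sd_length   <- 15.41
n           <- 43

se <- sd_length / sqrt(n)

# Normal-approximation interval (as in the CI tables)
ci_z <- mean_length + c(-1, 1) * qnorm(0.975) * se

# t-based interval with n - 1 degrees of freedom
ci_t <- mean_length + c(-1, 1) * qt(0.975, df = n - 1) * se

print(round(rbind(z = ci_z, t = ci_t), 2))
```

At reefs with a few hundred samples the two intervals are practically identical, so the choice matters mainly for the low-sample reefs.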

How do we know if a CI is good enough?

One practical benchmark is to check whether the confidence interval falls within a certain percentage of the mean. For example:

  • If we aim for precision within ±10% of the mean, then for a reef with a mean of 40 mm, the ideal CI would fall roughly within 36 to 44 mm.

  • In the case of Pinheiro, the CI range is about ±5% of the mean, which is excellent.

  • In Água Boa, the CI spans more than ±10%, which suggests the need for increased sampling or reduced variability to improve estimate reliability.
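
The benchmark above can be checked directly by expressing the CI half-width as a percentage of the mean. This is a minimal sketch using the 2024 length summaries for the two example reefs; the 10% target and the column names are assumptions chosen for illustration.

```r
# 2024 length summaries for two reefs, copied from the summary table above
reefs <- data.frame(
  reef        = c("Pinheiro", "Água Boa"),
  mean_length = c(38.49, 36.47),
  sd_length   = c(15.99, 15.41),
  n           = c(295, 43)
)

target_pct <- 10  # hypothetical precision target: CI half-width within 10% of the mean

# Half-width of the 95% CI in mm and as a percentage of the mean
reefs$half_width_mm  <- 1.96 * reefs$sd_length / sqrt(reefs$n)
reefs$half_width_pct <- 100 * reefs$half_width_mm / reefs$mean_length
reefs$meets_target   <- reefs$half_width_pct <= target_pct

print(reefs)
```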


Minimum Detectable Effect

Answer: What is the smallest change in oyster height or length that the current monitoring system can reliably detect at each reef, with 80% power and 5% significance?

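I calculate this using only 2024 samples, to reflect the actual performance and sensitivity of the current monitoring system. The calculation uses each reef’s 2024 standard error alone, so it describes the smallest departure from a fixed reference value that the 2024 sample could detect. An example below illustrates how to interpret these results in practice.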

Note:

  • 80% power means there’s an 80% chance of detecting a real change of that size if it exists (i.e., a 20% risk of a false negative)

  • A 5% significance level means we’re accepting a 5% risk of falsely detecting a change when there isn’t one (i.e., a false positive)


# Z-scores for a two-sided 5% significance level and 80% power
z_alpha <- qnorm(1 - 0.05 / 2)
z_beta <- qnorm(0.80)

# Subset to 2024 data and calculate the MDE as (z_alpha + z_beta) * standard error
mde_2024 <- summary_by_reef_year %>%
  filter(sampling_year == 2024) %>%
  mutate(
    # Height MDE (absolute, and as a percentage of the mean)
    mde_height_mm = (z_alpha + z_beta) * (sd_height / sqrt(n)),
    mde_height_pct = 100 * mde_height_mm / mean_height,

    # Length MDE (absolute, and as a percentage of the mean)
    mde_length_mm = (z_alpha + z_beta) * (sd_length / sqrt(n)),
    mde_length_pct = 100 * mde_length_mm / mean_length
  )

mde_2024 %>%
  select(
    Reef = oyster_reef_name,
    `Mean Height (mm)` = mean_height,
    `MDE (mm)` = mde_height_mm,
    `MDE (% of mean)` = mde_height_pct
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
  kbl(
    caption = "Table: Minimum detectable effect in oyster height by reef (2024 samples only).",
    align = "lccc",
    booktabs = TRUE
  ) %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  )
Table: Minimum detectable effect in oyster height by reef (2024 samples only).
Reef Mean Height (mm) MDE (mm) MDE (% of mean)
Aquavila 42.12 2.66 6.31
Goiabal 30.07 4.30 14.29
Jacarequara 30.31 2.44 8.06
Lauro Sodré 31.07 1.88 6.06
Marauá 27.96 1.74 6.22
Pinheiro 31.97 2.20 6.88
Romana 29.19 2.00 6.87
Terra Amarela 30.03 2.31 7.68
Tio Oscar 30.78 2.14 6.95
Água Boa 29.28 5.50 18.79
Áries 25.81 3.33 12.91
mde_2024 %>%
  select(
    Reef = oyster_reef_name,
    `Mean Length (mm)` = mean_length,
    `MDE (mm)` = mde_length_mm,
    `MDE (% of mean)` = mde_length_pct
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
  kbl(
    caption = "Table: Minimum detectable effect in oyster length by reef (2024 samples only).",
    align = "lccc",
    booktabs = TRUE
  ) %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  )
Table: Minimum detectable effect in oyster length by reef (2024 samples only).
Reef Mean Length (mm) MDE (mm) MDE (% of mean)
Aquavila 59.58 3.49 5.85
Goiabal 38.74 5.64 14.57
Jacarequara 39.63 3.29 8.31
Lauro Sodré 38.20 2.31 6.04
Marauá 37.10 2.25 6.07
Pinheiro 38.49 2.61 6.77
Romana 36.54 2.55 6.99
Terra Amarela 36.03 2.65 7.35
Tio Oscar 36.73 2.50 6.81
Água Boa 36.47 6.58 18.05
Áries 32.69 4.00 12.24

Example Interpretation of Minimum Detectable Effect:

In Pinheiro reef, the mean oyster length in 2024 was estimated at 38.49 mm, and the current monitoring design can detect a minimum change of 2.61 mm, approximately 6.8% of the mean. Even relatively small shifts in oyster length can therefore be statistically detected at this site, indicating good sensitivity of the monitoring system.

By contrast, in Água Boa, the mean length was 36.47 mm, but the minimum detectable effect was 6.58 mm, about 18% of the mean. With this much larger threshold, only large changes in oyster length would be statistically detectable, reflecting the small 2024 sample at this reef (n = 43).
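
The MDE values above are based on a single standard error, which corresponds to comparing a reef's mean against a fixed reference value. If a change is instead assessed by comparing two independently sampled years, both years contribute sampling error. The sketch below shows a common normal-approximation version for that case, using the Pinheiro length summaries for 2023 and 2024 from the summary table; `mde_two_sample()` is an illustrative helper, not part of the original analysis.

```r
z_alpha <- qnorm(1 - 0.05 / 2)  # two-sided 5% significance
z_beta  <- qnorm(0.80)          # 80% power

# MDE for a difference between two independent sample means (normal approximation)
mde_two_sample <- function(sd1, n1, sd2, n2) {
  (z_alpha + z_beta) * sqrt(sd1^2 / n1 + sd2^2 / n2)
}

# Pinheiro length: 2023 (sd = 7.54, n = 200) versus 2024 (sd = 15.99, n = 295)
mde_between_years <- mde_two_sample(7.54, 200, 15.99, 295)

# One-sample MDE for 2024 alone, as in the table above
mde_single_year <- (z_alpha + z_beta) * 15.99 / sqrt(295)

print(round(c(single_year = mde_single_year, between_years = mde_between_years), 2))
```

With equal sample sizes and standard deviations in both years, the between-year MDE exceeds the single-year value by a factor of sqrt(2).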


Conclusion

  • Sampling in 2024 was consistent across most reefs, with six of the eleven reefs exceeding 200 samples. This sampling effort supports reliable statistical inference for oyster size estimates at these sites.

  • Confidence interval analysis demonstrates that at many reefs, mean estimates of oyster height and length are statistically precise.

  • Minimum detectable effect (MDE) analysis indicates that the 2024 monitoring design can detect changes in mean oyster size of roughly 6% to 8% at most reefs, which makes it sensitive enough to pick up biologically meaningful changes at those sites.

  • However, reefs with low sampling effort in 2024 (Água Boa, Áries, and Goiabal, with 42 to 85 samples each) show reduced precision and statistical sensitivity. Their wide confidence intervals and high MDE values (roughly 12% to 19% of the mean) mean that changes in oyster size must be much larger to be statistically detectable. Increasing the sample size at these sites would improve reliability and sensitivity.
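
To turn the last point into a sampling target, the MDE formula used above can be inverted to give the sample size needed for a chosen detectable change. The sketch below does this for Água Boa length, assuming the 2024 standard deviation stays the same; the 10% target and the `required_n()` helper are hypothetical choices for illustration.

```r
z_alpha <- qnorm(1 - 0.05 / 2)  # two-sided 5% significance
z_beta  <- qnorm(0.80)          # 80% power

# Sample size at which (z_alpha + z_beta) * sd / sqrt(n) equals the target MDE
required_n <- function(sd, target_mde) {
  ceiling(((z_alpha + z_beta) * sd / target_mde)^2)
}

# Água Boa length in 2024: mean = 36.47 mm, sd = 15.41 mm (from the summary table)
target_mde_mm <- 0.10 * 36.47  # hypothetical target: detect a 10% change in the mean
n_needed <- required_n(sd = 15.41, target_mde = target_mde_mm)
print(n_needed)
```

The result can be compared with the 43 samples collected at Água Boa in 2024 to gauge how much additional effort such a target would require.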