This analysis evaluates how precisely the current monitoring system estimates oyster height and length at each reef and assesses the smallest changes that can be statistically detected given the observed variability and sample sizes. By combining exploratory data analysis, confidence intervals, and minimum detectable effect (MDE) estimates, it provides insight into the system’s ability to detect meaningful biological changes across sites.
Note: These findings can also guide future sampling design by helping determine the sample sizes required to detect specific biological changes with a desired level of precision and confidence.
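# Packages used throughout: dplyr (data wrangling), ggplot2 (plots) and
# kableExtra (tables); mean_cl_normal() in the CI plots also requires Hmisc
library(dplyr)
library(ggplot2)
library(kableExtra)
# Count number of samples per reef, year, and month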
sampling_summary <- oyster_biometry %>%
group_by(oyster_reef_name, sampling_year, sampling_month) %>%
summarise(samples = n(), .groups = "drop") %>%
arrange(oyster_reef_name, sampling_year, sampling_month)
ggplot(sampling_summary, aes(x = factor(sampling_year), y = samples)) +
geom_bar(stat = "identity", fill = "#e6550d", alpha = 0.7) +
facet_wrap(~ oyster_reef_name, scales = "fixed") + # shared y-axis so effort is comparable across reefs
labs(
title = "Sampling Effort per Reef by Year",
x = "Year",
y = "Number of Samples"
) +
theme_minimal() +
theme(legend.position = "none")
Sampling Effort by Year: Sampling effort varies across reefs and years. Reefs such as Terra Amarela, Romana, and Aquavila were sampled consistently in all four years, while others have limited or uneven data: Áries was sampled only in 2021 and 2024, and Goiabal had between 10 and 42 samples per year. These disparities in sampling intensity can affect the reliability and comparability of reef-level estimates over time.
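One way to put a number on "uneven" sampling is the coefficient of variation (CV) of annual sample counts per reef, alongside the number of years sampled. This is an illustrative sketch, not part of the original analysis: `effort_summary()` is a hypothetical helper, and the example counts are the reef-year sample sizes reported in the summary table later in this section. In the full analysis it could be applied to `sampling_summary` after summing the monthly counts to one row per reef and year (for example with `dplyr::summarise(samples = sum(samples))`).

```r
# Hypothetical helper: summarise annual sampling effort per reef.
# Expects one row per reef-year with a `samples` column (total count that year).
effort_summary <- function(df) {
  per_reef <- split(df$samples, df$oyster_reef_name)
  data.frame(
    reef = names(per_reef),
    years_sampled = vapply(per_reef, length, integer(1)),
    total = vapply(per_reef, sum, numeric(1)),
    # CV = SD / mean of annual counts; higher values mean less even effort
    cv = vapply(per_reef,
                function(x) if (length(x) > 1) sd(x) / mean(x) else NA_real_,
                numeric(1)),
    row.names = NULL
  )
}

# Annual sample sizes copied from the summary table (three reefs only)
example_effort <- data.frame(
  oyster_reef_name = c(rep("Aquavila", 4), rep("Goiabal", 4), rep("Áries", 2)),
  sampling_year = c(2021:2024, 2021:2024, 2021, 2024),
  samples = c(195, 200, 196, 224, 15, 10, 10, 42, 30, 85)
)

effort <- effort_summary(example_effort)
print(effort)
```

With these counts, Aquavila's CV is well below Goiabal's, matching the visual impression from the bar chart.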
# Split reefs into two groups so the faceted histograms stay legible
reef_group_1 <- c("Água Boa", "Aquavila", "Áries", "Goiabal", "Jacarequara", "Lauro Sodré")
reef_group_2 <- c("Marauá", "Pinheiro", "Romana", "Terra Amarela", "Tio Oscar")
# First group
oyster_biometry %>%
filter(oyster_reef_name %in% reef_group_1) %>%
ggplot(aes(x = height_mm)) +
geom_histogram(binwidth = 5, fill = "steelblue", color = "white", alpha = 0.7) +
facet_grid(oyster_reef_name ~ sampling_year) +
labs(
title = "Distribution of Oyster Height by Reef and Year",
x = "Oyster Height (mm)",
y = "Count"
) +
theme_minimal() +
theme(
strip.text.y = element_text(size = 7) # adjust this to fit better
)
# Second group
oyster_biometry %>%
filter(oyster_reef_name %in% reef_group_2) %>%
ggplot(aes(x = height_mm)) +
geom_histogram(binwidth = 5, fill = "steelblue", color = "white", alpha = 0.7) +
facet_grid(oyster_reef_name ~ sampling_year) +
labs(
title = "Distribution of Oyster Height by Reef and Year",
x = "Oyster Height (mm)",
y = "Count"
) +
theme_minimal() +
theme(
strip.text.y = element_text(size = 7) # adjust this to fit better
)
# First group
oyster_biometry %>%
filter(oyster_reef_name %in% reef_group_1) %>%
ggplot(aes(x = length_mm)) +
geom_histogram(binwidth = 5, fill = "darkgreen", color = "white", alpha = 0.7) +
facet_grid(oyster_reef_name ~ sampling_year) +
labs(
title = "Distribution of Oyster Length by Reef and Year",
x = "Oyster Length (mm)",
y = "Count"
) +
theme_minimal() +
theme(
strip.text.y = element_text(size = 7) # adjust this to fit better
)
# Second group
oyster_biometry %>%
filter(oyster_reef_name %in% reef_group_2) %>%
ggplot(aes(x = length_mm)) +
geom_histogram(binwidth = 5, fill = "darkgreen", color = "white", alpha = 0.7) +
facet_grid(oyster_reef_name ~ sampling_year) +
labs(
title = "Distribution of Oyster Length by Reef and Year",
x = "Oyster Length (mm)",
y = "Count"
) +
theme_minimal() +
theme(
strip.text.y = element_text(size = 7) # adjust this to fit better
)
Oyster Height and Length Distributions (by Reef and Year): Distributions are generally unimodal and approximately normal in most reefs and years, supporting the use of parametric methods for mean-based analyses. However, some distributions (especially in reefs with low sample sizes like Áries and Goiabal) show skewness or irregular shapes, suggesting higher uncertainty in those estimates.
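To quantify skewness rather than judging the histograms by eye, a moment-based sample skewness can be computed per reef and year. This is a sketch: `sample_skewness()` is a hypothetical helper (not taken from a package), and the toy vectors only illustrate its sign convention. Values near 0 indicate a roughly symmetric distribution; positive values indicate a long right tail.

```r
# Hypothetical helper: moment-based sample skewness, g1 = m3 / m2^(3/2)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # second central moment
  m3 <- mean(d^3)  # third central moment
  m3 / m2^(3 / 2)
}

symmetric <- c(10, 20, 30, 40, 50)
right_skewed <- c(10, 11, 12, 13, 50)

sample_skewness(symmetric)     # 0
sample_skewness(right_skewed)  # positive (long right tail)

# Applied to the real data (assumes oyster_biometry is loaded), e.g.:
# tapply(oyster_biometry$height_mm,
#        list(oyster_biometry$oyster_reef_name, oyster_biometry$sampling_year),
#        sample_skewness)
```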
# Trend in height by reef over years
ggplot(oyster_biometry, aes(x = sampling_year, y = height_mm)) +
stat_summary(fun = mean, geom = "point", color = "steelblue") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "steelblue") + # 95% CI of the mean (t-based; mean_cl_normal() needs the Hmisc package)
facet_wrap(~ oyster_reef_name) +
labs(title = "Mean Oyster Height with 95% Confidence Intervals by Reef and Year", y = "Mean Height (mm)", x = "Year") +
theme_minimal()
# Trend in length by reef over years
ggplot(oyster_biometry, aes(x = sampling_year, y = length_mm)) +
stat_summary(fun = mean, geom = "point", color = "darkgreen") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "darkgreen") +
facet_wrap(~ oyster_reef_name) +
labs(title = "Mean Oyster Length with 95% Confidence Intervals by Reef and Year", y = "Mean Length (mm)", x = "Year") +
theme_minimal()
Mean and Confidence Interval Plots (Height and Length): Mean oyster height and length vary across reefs and years, and several reefs show noticeable interannual changes. Confidence intervals are narrow at reefs with large samples, indicating precise estimates, and wider at low-sample reefs such as Goiabal and Áries, reflecting greater uncertainty.
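The error bars above come from `mean_cl_normal()`, which uses the t-distribution, whereas the 2024 CI tables later in this section use the normal approximation (±1.96 × SE). The two agree closely for large samples, but the t-based interval is wider for small ones. A minimal sketch using the 2024 height summaries for Goiabal (n = 42) and Marauá (n = 326) from the summary table; `ci_half_width()` is a hypothetical helper.

```r
# Hypothetical helper: 95% CI half-width from summary statistics
ci_half_width <- function(sd, n, conf = 0.95, use_t = TRUE) {
  p <- 1 - (1 - conf) / 2
  crit <- if (use_t) qt(p, df = n - 1) else qnorm(p)
  crit * sd / sqrt(n)
}

# 2024 height summaries copied from the summary table
goiabal <- list(sd = 9.94, n = 42)
maraua  <- list(sd = 11.21, n = 326)

hw <- data.frame(
  reef = c("Goiabal", "Marauá"),
  z_based = c(ci_half_width(goiabal$sd, goiabal$n, use_t = FALSE),
              ci_half_width(maraua$sd, maraua$n, use_t = FALSE)),
  t_based = c(ci_half_width(goiabal$sd, goiabal$n),
              ci_half_width(maraua$sd, maraua$n))
)
hw$t_over_z <- hw$t_based / hw$z_based  # about 3% wider for n = 42, under 1% for n = 326
print(hw)
```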
# Stratify data by reef and year
summary_by_reef_year <- oyster_biometry %>%
group_by(oyster_reef_name, sampling_year) %>%
summarise(
n = n(),
mean_height = mean(height_mm, na.rm = TRUE),
sd_height = sd(height_mm, na.rm = TRUE),
mean_length = mean(length_mm, na.rm = TRUE),
sd_length = sd(length_mm, na.rm = TRUE),
.groups = "drop"
)
summary_by_reef_year %>%
select(
Reef = oyster_reef_name,
Year = sampling_year,
`Mean Height (mm)` = mean_height,
`SD Height (mm)` = sd_height,
`Mean Length (mm)` = mean_length,
`SD Length (mm)` = sd_length,
`Sample Size (n)` = n
) %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kbl(
caption = "Table: Summary statistics of oyster height and length by reef and year.",
align = "lcccccc",
booktabs = TRUE
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed", "responsive")
)
| Reef | Year | Mean Height (mm) | SD Height (mm) | Mean Length (mm) | SD Length (mm) | Sample Size (n) |
|---|---|---|---|---|---|---|
| Aquavila | 2021 | 53.77 | 11.95 | 41.01 | 9.82 | 195 |
| Aquavila | 2022 | 54.72 | 12.94 | 39.86 | 10.45 | 200 |
| Aquavila | 2023 | 55.49 | 12.59 | 37.86 | 9.93 | 196 |
| Aquavila | 2024 | 42.12 | 14.20 | 59.58 | 18.63 | 224 |
| Goiabal | 2021 | 24.67 | 5.40 | 21.55 | 5.26 | 15 |
| Goiabal | 2022 | 40.70 | 5.50 | 33.40 | 5.85 | 10 |
| Goiabal | 2023 | 49.60 | 5.08 | 37.00 | 5.03 | 10 |
| Goiabal | 2024 | 30.07 | 9.94 | 38.74 | 13.06 | 42 |
| Jacarequara | 2021 | 53.00 | 8.49 | 39.85 | 9.17 | 195 |
| Jacarequara | 2022 | 38.86 | 11.35 | 28.74 | 8.07 | 163 |
| Jacarequara | 2023 | 45.72 | 10.48 | 35.37 | 8.50 | 191 |
| Jacarequara | 2024 | 30.31 | 11.70 | 39.63 | 15.78 | 180 |
| Lauro Sodré | 2021 | 42.46 | 10.10 | 33.33 | 8.51 | 226 |
| Lauro Sodré | 2022 | 42.75 | 8.53 | 35.37 | 7.79 | 178 |
| Lauro Sodré | 2023 | 40.43 | 9.20 | 30.28 | 7.32 | 176 |
| Lauro Sodré | 2024 | 31.07 | 9.65 | 38.20 | 11.82 | 206 |
| Marauá | 2021 | 45.94 | 8.89 | 35.51 | 6.65 | 188 |
| Marauá | 2022 | 44.29 | 9.36 | 32.56 | 6.62 | 195 |
| Marauá | 2023 | 45.77 | 8.37 | 33.01 | 6.27 | 200 |
| Marauá | 2024 | 27.96 | 11.21 | 37.10 | 14.51 | 326 |
| Pinheiro | 2021 | 47.42 | 10.48 | 35.19 | 8.26 | 220 |
| Pinheiro | 2022 | 44.46 | 9.12 | 34.83 | 7.50 | 195 |
| Pinheiro | 2023 | 52.38 | 9.18 | 37.38 | 7.54 | 200 |
| Pinheiro | 2024 | 31.97 | 13.48 | 38.49 | 15.99 | 295 |
| Romana | 2021 | 30.84 | 9.23 | 25.59 | 7.56 | 174 |
| Romana | 2022 | 36.67 | 9.83 | 29.81 | 8.59 | 155 |
| Romana | 2023 | 43.74 | 13.07 | 36.28 | 11.68 | 141 |
| Romana | 2024 | 29.19 | 13.18 | 36.54 | 16.79 | 339 |
| Terra Amarela | 2021 | 44.18 | 9.63 | 32.31 | 8.04 | 232 |
| Terra Amarela | 2022 | 48.29 | 11.04 | 35.43 | 8.05 | 234 |
| Terra Amarela | 2023 | 48.95 | 10.69 | 37.36 | 9.71 | 240 |
| Terra Amarela | 2024 | 30.03 | 12.73 | 36.03 | 14.62 | 239 |
| Tio Oscar | 2021 | 46.60 | 9.60 | 37.38 | 9.55 | 60 |
| Tio Oscar | 2022 | 47.46 | 7.56 | 37.84 | 6.74 | 50 |
| Tio Oscar | 2023 | 46.20 | 8.58 | 35.63 | 7.51 | 100 |
| Tio Oscar | 2024 | 30.78 | 8.78 | 36.73 | 10.26 | 132 |
| Água Boa | 2021 | 41.79 | 8.69 | 33.38 | 8.24 | 79 |
| Água Boa | 2022 | 39.31 | 11.66 | 28.47 | 9.53 | 115 |
| Água Boa | 2023 | 39.82 | 9.28 | 29.25 | 7.33 | 103 |
| Água Boa | 2024 | 29.28 | 12.88 | 36.47 | 15.41 | 43 |
| Áries | 2021 | 19.31 | 4.73 | 16.58 | 8.69 | 30 |
| Áries | 2024 | 25.81 | 10.97 | 32.69 | 13.17 | 85 |
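Summary Table by Reef and Year: The table gives the sample size, mean, and standard deviation of oyster height and length for each reef and year, showing both central tendency and variability. It also makes it easy to identify reefs with strong data coverage (e.g., >200 samples per year) and those that need more sampling effort to improve precision. One pattern should be checked before interpreting trends: in 2024, mean length exceeds mean height at every reef, whereas in 2021–2023 mean height exceeds mean length at every reef. This could reflect a real change in shell shape, but it could also indicate that height and length were recorded or coded differently in 2024.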
Answer: How precisely are we estimating the current average height and length of oysters at each reef?
I’ve addressed this question in the mean and 95% confidence interval plots above. Below, I provide the numerical confidence intervals for 2024 and show how they can be interpreted in practice.
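Note:
- A narrow CI around the mean indicates high precision: repeated samples of the same size would give similar estimates, so (assuming unbiased sampling) the estimate is likely close to the true population mean.
- A wide CI indicates lower precision: the current sample may not be large enough to estimate the true mean reliably.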
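# Add 95% CI bounds for height (normal approximation, z = 1.96; the t-based
# mean_cl_normal() intervals plotted earlier are slightly wider for small n)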
summary_by_reef_year <- summary_by_reef_year %>%
mutate(
se_height = sd_height / sqrt(n),
ci_lower = mean_height - 1.96 * se_height,
ci_upper = mean_height + 1.96 * se_height
)
#Plot year 2024
ggplot(filter(summary_by_reef_year, sampling_year == 2024),
aes(x = reorder(oyster_reef_name, mean_height), y = mean_height)) +
geom_point(color = "steelblue") +
geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2, color = "steelblue") +
coord_flip() +
labs(
title = "Mean Oyster Height by Reef in 2024 with 95% Confidence Intervals",
y = "Height (mm)",
x = "Reef"
) +
theme_minimal()
# Table
summary_by_reef_year %>%
filter(sampling_year == 2024) %>%
select(Reef = oyster_reef_name,
`Mean Height (mm)` = mean_height,
`Lower 95% CI` = ci_lower,
`Upper 95% CI` = ci_upper) %>%
mutate(across(where(is.numeric), ~round(.x, 2))) %>%
kbl(
caption = "Table: Mean Oyster Height and 95% Confidence Intervals by Reef (2024).",
align = "lccc",
booktabs = TRUE
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed", "responsive")
)
| Reef | Mean Height (mm) | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| Aquavila | 42.12 | 40.26 | 43.98 |
| Goiabal | 30.07 | 27.06 | 33.08 |
| Jacarequara | 30.31 | 28.60 | 32.02 |
| Lauro Sodré | 31.07 | 29.75 | 32.39 |
| Marauá | 27.96 | 26.75 | 29.18 |
| Pinheiro | 31.97 | 30.43 | 33.51 |
| Romana | 29.19 | 27.79 | 30.60 |
| Terra Amarela | 30.03 | 28.42 | 31.64 |
| Tio Oscar | 30.78 | 29.28 | 32.28 |
| Água Boa | 29.28 | 25.43 | 33.13 |
| Áries | 25.81 | 23.48 | 28.14 |
# Add 95% CI bounds for length (normal approximation, z = 1.96)
summary_by_reef_year <- summary_by_reef_year %>%
mutate(
se_length = sd_length / sqrt(n),
ci_lower_length = mean_length - 1.96 * se_length,
ci_upper_length = mean_length + 1.96 * se_length
)
#Plot year 2024
ggplot(filter(summary_by_reef_year, sampling_year == 2024),
aes(x = reorder(oyster_reef_name, mean_length), y = mean_length)) +
geom_point(color = "darkgreen") +
geom_errorbar(aes(ymin = ci_lower_length, ymax = ci_upper_length),
width = 0.2, color = "darkgreen") +
coord_flip() +
labs(
title = "Mean Oyster Length by Reef in 2024 with 95% Confidence Intervals",
y = "Length (mm)",
x = "Reef"
) +
theme_minimal()
# Table
summary_by_reef_year %>%
filter(sampling_year == 2024) %>%
select(Reef = oyster_reef_name,
`Mean Length (mm)` = mean_length,
`Lower 95% CI` = ci_lower_length,
`Upper 95% CI` = ci_upper_length) %>%
mutate(across(where(is.numeric), ~round(.x, 2))) %>%
kbl(
caption = "Table: Mean Oyster Length and 95% Confidence Intervals by Reef (2024).",
align = "lccc",
booktabs = TRUE
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed", "responsive")
)
| Reef | Mean Length (mm) | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| Aquavila | 59.58 | 57.14 | 62.02 |
| Goiabal | 38.74 | 34.79 | 42.69 |
| Jacarequara | 39.63 | 37.33 | 41.94 |
| Lauro Sodré | 38.20 | 36.59 | 39.81 |
| Marauá | 37.10 | 35.53 | 38.68 |
| Pinheiro | 38.49 | 36.67 | 40.32 |
| Romana | 36.54 | 34.75 | 38.33 |
| Terra Amarela | 36.03 | 34.18 | 37.88 |
| Tio Oscar | 36.73 | 34.98 | 38.49 |
| Água Boa | 36.47 | 31.86 | 41.07 |
| Áries | 32.69 | 29.89 | 35.49 |
Example Interpretation of Confidence Intervals:
In Pinheiro reef, the mean oyster length in 2024 was estimated at 38.49 mm, with a 95% confidence interval of 36.67 to 40.32 mm. Strictly, this means that if sampling were repeated many times, about 95% of intervals constructed this way would contain the true mean; informally, we can be 95% confident that the true average oyster length at Pinheiro lies between 36.67 mm and 40.32 mm. The relatively narrow interval (about ±1.8 mm) indicates that the current monitoring effort provides a precise estimate at this site.
By contrast, at Água Boa reef the mean oyster length in 2024 was 36.47 mm, but the confidence interval was much wider: 31.86 to 41.07 mm (about ±4.6 mm). Because the standard deviations at the two reefs are similar (15.41 vs 15.99 mm), the wider interval is driven mainly by Água Boa's much smaller 2024 sample (n = 43 vs 295).
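Because the half-width of a 95% CI is 1.96 × SD / √n, the ratio of the Água Boa and Pinheiro half-widths factors into a sample-size part and an SD part. A quick check with the 2024 length values copied from the summary table (a sketch; `half_width()` is a hypothetical helper):

```r
# 2024 length summaries copied from the summary table
pinheiro <- c(mean = 38.49, sd = 15.99, n = 295)
agua_boa <- c(mean = 36.47, sd = 15.41, n = 43)

# Normal-approximation 95% CI half-width, as used for the 2024 tables
half_width <- function(s) 1.96 * s[["sd"]] / sqrt(s[["n"]])

hw_pinheiro <- half_width(pinheiro)  # about 1.82 mm
hw_agua_boa <- half_width(agua_boa)  # about 4.61 mm

# Ratio of half-widths = sqrt(n ratio) x SD ratio
ratio <- hw_agua_boa / hw_pinheiro
sqrt_n_part <- sqrt(pinheiro[["n"]] / agua_boa[["n"]])
sd_part <- agua_boa[["sd"]] / pinheiro[["sd"]]
print(c(ratio = ratio, sqrt_n_part = sqrt_n_part, sd_part = sd_part))
```

The sample-size factor (about 2.6) accounts for almost all of the difference; the SD factor is close to 1.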
How do we know if a CI is good enough?
One practical benchmark is to check whether the confidence interval falls within a certain percentage of the mean. For example:
- If we aim for precision within ±10% of the mean, then for a reef with a mean of 40 mm the CI should fall roughly within 36 to 44 mm.
- At Pinheiro, the CI half-width for length is about ±4.7% of the mean (1.82 / 38.49), well within this target.
- At Água Boa, the CI half-width is about ±12.6% of the mean (4.61 / 36.47), exceeding the ±10% target, which suggests the need for increased sampling or reduced variability to improve estimate reliability.
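The ±10% benchmark can be checked, and inverted into a sample-size target, with simple arithmetic: the relative half-width is 1.96 × SD / (√n × mean), and the n needed for a target relative half-width r is (1.96 × SD / (r × mean))². This is a planning sketch under the same normal approximation, assuming the SD observed in 2024 carries over to future sampling; `relative_half_width()` and `n_for_precision()` are hypothetical helpers.

```r
# Hypothetical helpers (normal approximation, z = 1.96)
relative_half_width <- function(mean, sd, n, z = 1.96) {
  z * sd / sqrt(n) / mean
}

n_for_precision <- function(mean, sd, target = 0.10, z = 1.96) {
  # Smallest n whose CI half-width is within `target` x mean,
  # assuming the SD stays the same
  ceiling((z * sd / (target * mean))^2)
}

# 2024 length summaries copied from the summary table
reefs <- data.frame(
  reef = c("Pinheiro", "Água Boa"),
  mean = c(38.49, 36.47),
  sd   = c(15.99, 15.41),
  n    = c(295, 43)
)

reefs$rel_half_width <- with(reefs, relative_half_width(mean, sd, n))
reefs$n_for_10pct    <- with(reefs, n_for_precision(mean, sd))
print(reefs)
```

Under these assumptions, Água Boa would need about 69 oysters in a year to reach ±10% precision for length, compared with the 43 sampled in 2024.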
Answer: What is the smallest change in oyster height or length that the current monitoring system can reliably detect at each reef, with 80% power and 5% significance?
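I calculate this using only 2024 samples to reflect the actual performance and sensitivity of the current monitoring system, and I show an example below of how to interpret the results in practice. The calculation uses the one-sample formula, which compares the 2024 mean against a reference value assumed to be known without error; detecting a change between two independently sampled years of similar size and variability would require a difference roughly √2 (about 1.4) times larger.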
Note:
- 80% power means there is an 80% chance of detecting a real change of that size if it exists (i.e., a 20% risk of a false negative).
- A 5% significance level means accepting a 5% risk of concluding that a change occurred when it did not (i.e., a false positive).
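Written out, the MDE computed in the code below is the following formula, where s is the standard deviation, n the sample size, and the z-values are standard normal quantiles for a two-sided 5% level and 80% power. A worked check against the Pinheiro 2024 length row reproduces the tabulated value:

```latex
\mathrm{MDE} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\frac{s}{\sqrt{n}},
\qquad z_{0.975} \approx 1.960,\quad z_{0.80} \approx 0.842

\mathrm{MDE}_{\text{Pinheiro, length}} = (1.960 + 0.842)\times\frac{15.99}{\sqrt{295}}
  \approx 2.802 \times 0.931 \approx 2.61\ \text{mm}
```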
# Critical z-values: two-sided 5% significance and 80% power
z_alpha <- qnorm(1 - 0.05 / 2)
z_beta <- qnorm(0.80)
# Subset to 2024 data and calculate the one-sample MDE
# (detectable difference from a fixed reference value)
mde_2024 <- summary_by_reef_year %>%
filter(sampling_year == 2024) %>%
mutate(
# Height MDE
mde_height_mm = (z_alpha + z_beta) * (sd_height / sqrt(n)),
mde_height_pct = 100 * mde_height_mm / mean_height,
# Length MDE
mde_length_mm = (z_alpha + z_beta) * (sd_length / sqrt(n)),
mde_length_pct = 100 * mde_length_mm / mean_length)
mde_2024 %>%
select(
Reef = oyster_reef_name,
`Mean Height (mm)` = mean_height,
`MDE (mm)` = mde_height_mm,
`MDE (% of mean)` = mde_height_pct
) %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kbl(
caption = "Table: Minimum detectable effect in oyster height by reef (2024 samples only).",
align = "lccc",
booktabs = TRUE
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed", "responsive")
)
| Reef | Mean Height (mm) | MDE (mm) | MDE (% of mean) |
|---|---|---|---|
| Aquavila | 42.12 | 2.66 | 6.31 |
| Goiabal | 30.07 | 4.30 | 14.29 |
| Jacarequara | 30.31 | 2.44 | 8.06 |
| Lauro Sodré | 31.07 | 1.88 | 6.06 |
| Marauá | 27.96 | 1.74 | 6.22 |
| Pinheiro | 31.97 | 2.20 | 6.88 |
| Romana | 29.19 | 2.00 | 6.87 |
| Terra Amarela | 30.03 | 2.31 | 7.68 |
| Tio Oscar | 30.78 | 2.14 | 6.95 |
| Água Boa | 29.28 | 5.50 | 18.79 |
| Áries | 25.81 | 3.33 | 12.91 |
mde_2024 %>%
select(
Reef = oyster_reef_name,
`Mean Length (mm)` = mean_length,
`MDE (mm)` = mde_length_mm,
`MDE (% of mean)` = mde_length_pct
) %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kbl(
caption = "Table: Minimum detectable effect in oyster length by reef (2024 samples only).",
align = "lccc",
booktabs = TRUE
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed", "responsive")
)
| Reef | Mean Length (mm) | MDE (mm) | MDE (% of mean) |
|---|---|---|---|
| Aquavila | 59.58 | 3.49 | 5.85 |
| Goiabal | 38.74 | 5.64 | 14.57 |
| Jacarequara | 39.63 | 3.29 | 8.31 |
| Lauro Sodré | 38.20 | 2.31 | 6.04 |
| Marauá | 37.10 | 2.25 | 6.07 |
| Pinheiro | 38.49 | 2.61 | 6.77 |
| Romana | 36.54 | 2.55 | 6.99 |
| Terra Amarela | 36.03 | 2.65 | 7.35 |
| Tio Oscar | 36.73 | 2.50 | 6.81 |
| Água Boa | 36.47 | 6.58 | 18.05 |
| Áries | 32.69 | 4.00 | 12.24 |
Example Interpretation of Minimum Detectable Effect:
In Pinheiro reef, the mean oyster length in 2024 was estimated at 38.49 mm, and the current monitoring design can detect a difference of 2.61 mm from a fixed reference value (about 6.8% of the mean) with 80% power at the 5% significance level. This suggests that relatively small shifts in oyster length can be detected at this site, indicating good sensitivity of the monitoring system.
By contrast, at Água Boa the mean length was 36.47 mm, but the minimum detectable effect was 6.58 mm, about 18% of the mean. This much larger threshold means that only large changes in oyster length would be statistically detectable. Since Água Boa's standard deviation is similar to Pinheiro's, this low sensitivity stems mainly from its small 2024 sample (n = 43).
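The tabulated MDEs answer how far the 2024 mean must be from a fixed reference value to be detected. If the question is instead whether the mean changed between two independently sampled years, both means carry sampling error and the standard error becomes s × √(1/n₁ + 1/n₂). A sketch under the normal approximation, assuming the same SD in both years; the helper names are hypothetical:

```r
# Critical values for a two-sided 5% test with 80% power
z_sum <- qnorm(1 - 0.05 / 2) + qnorm(0.80)

# Hypothetical helpers
mde_one_sample <- function(sd, n) z_sum * sd / sqrt(n)
mde_two_sample <- function(sd, n1, n2 = n1) z_sum * sd * sqrt(1 / n1 + 1 / n2)

# Smallest equal per-year n needed to detect a change of `delta` mm
# between two years (assumes the same SD in both years)
n_per_year_for <- function(delta, sd) ceiling(2 * (z_sum * sd / delta)^2)

# Pinheiro 2024 length values from the summary table
sd_len <- 15.99
n_2024 <- 295

one <- mde_one_sample(sd_len, n_2024)      # matches the table (about 2.61 mm)
two <- mde_two_sample(sd_len, n_2024)      # about 3.69 mm for two years of n = 295
n_needed_2mm <- n_per_year_for(2, sd_len)  # oysters per year to detect a 2 mm change
print(c(one = one, two = two, n_needed_2mm = n_needed_2mm))
```

Under these assumptions, detecting a 2 mm year-to-year change in mean length at Pinheiro would need roughly 1,000 oysters per year.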
The current monitoring system shows consistent data collection in 2024 across most reefs, with several sites achieving over 200 samples. This strong sampling effort supports reliable statistical inference for oyster size estimates at these sites.
Confidence interval analysis shows that at eight of the eleven reefs, 2024 mean estimates of oyster height and length are precise, with 95% CI half-widths within about ±6% of the mean.
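Minimum detectable effect (MDE) analysis indicates that, at the same eight reefs, the 2024 design can detect differences from a fixed reference value of roughly 1.7–3.5 mm (about 6–8% of the mean) in oyster height or length. Whether this is sensitive enough depends on the size of change considered biologically meaningful.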
However, reefs with low sampling effort in 2024 (Água Boa, Áries, and Goiabal) show reduced precision and statistical sensitivity. Their wide confidence intervals and high MDE values (about 12–19% of the mean) mean that changes in oyster size would have to be much larger to be statistically detectable. Increasing sample sizes at these sites would improve both reliability and sensitivity.
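To turn "increase sample size" into a concrete target, the one-sample MDE formula used above can be solved for n. The sketch below asks how many oysters per year Água Boa, Áries, and Goiabal would need for a height MDE of 8% of the mean, roughly the upper end of what the well-sampled reefs achieved in 2024. The 8% target is an illustrative assumption, the means and SDs are the 2024 values from the summary table and are assumed to stay the same, and `n_for_mde()` is a hypothetical helper.

```r
# Hypothetical helper: n required so that the one-sample MDE is at most
# `target_pct` of the mean (two-sided test at level alpha, given power)
n_for_mde <- function(mean, sd, target_pct = 8, alpha = 0.05, power = 0.80) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling((z_sum * sd / (target_pct / 100 * mean))^2)
}

# 2024 height summaries copied from the summary table
low_effort <- data.frame(
  reef   = c("Água Boa", "Áries", "Goiabal"),
  mean   = c(29.28, 25.81, 30.07),
  sd     = c(12.88, 10.97, 9.94),
  n_2024 = c(43, 85, 42)
)

low_effort$n_needed <- with(low_effort, n_for_mde(mean, sd))
low_effort$extra    <- pmax(low_effort$n_needed - low_effort$n_2024, 0)
print(low_effort)
```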