By the end of this practical, you will be able to:
- Build a catch-at-length table from raw length measurements
- Estimate selectivity with the Holt (1963) catch-ratio method
- Fit normal and logistic selectivity curves with the SELECT method
- Compare models with AIC and check fit with deviance residuals
Fishing gears do not capture all sizes of fish equally. Selectivity describes the probability that a fish of a given size is retained by the gear once it encounters it. Understanding selectivity is crucial for:
- Stock assessment: catch samples are a size-filtered picture of the population
- Management: setting mesh-size regulations that allow juveniles to escape
- Gear comparison: interpreting catches taken with different gears
A classic experimental design compares catches between two mesh sizes fished simultaneously (or alternating) in the same area. This controls for differences in fish availability, allowing us to isolate the effect of mesh size on selectivity.
Key assumption: The only difference between the catches is due to selectivity and relative fishing power—the fish population available to both gears is identical.
First, let’s load the packages we’ll need. If you haven’t installed them, uncomment the install.packages() line.
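The setup chunk below is a minimal sketch; the package list (dplyr, tidyr, ggplot2) is inferred from the functions used later in this practical.

# install.packages(c("dplyr", "tidyr", "ggplot2")) # uncomment on first use
library(dplyr)   # data wrangling: mutate(), group_by(), summarise()
library(tidyr)   # reshaping: pivot_wider()
library(ggplot2) # all plots in this practical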
We have length measurements from fish caught with two different mesh sizes. Load your data file and define the mesh sizes used in the experiment.
# Load the catch data
# IMPORTANT: Update the path below to match your file location
# setwd("YOUR/WORKING/DIRECTORY")
df <- read.csv("TP3_TwoMesh_LengthMeasurements.csv")
# Define mesh sizes (in mm)
mesh1 <- 80 # Smaller mesh
mesh2 <- 100 # Larger mesh
# Quick look at the data structure
str(df)
## 'data.frame': 880 obs. of 3 variables:
## $ fish_id : chr "M80_0001" "M80_0002" "M80_0003" "M80_0004" ...
## $ mesh : int 80 80 80 80 80 80 80 80 80 80 ...
## $ length_cm: num 84.9 87.1 82.1 97.5 87.9 75.7 86.9 95.7 80.5 83.1 ...
Task: Examine the data frame. What variables does it contain? How many fish were measured in total?
Raw length measurements need to be binned into length classes and counted. This creates the catch-at-length matrix that forms the basis of our analysis.
# Bin width (cm) - this groups fish into size classes
# Adjust based on your data resolution (2 cm is common)
bin_width <- 2
# Create length bins (midpoint of each bin)
df_bins <- df %>%
mutate(length_bin = floor(length_cm / bin_width) * bin_width + bin_width/2)
# Count catches per length-bin and mesh
dat_long <- df_bins %>%
group_by(length = length_bin, mesh) %>%
summarise(catch = n(), .groups = "drop")
# Convert to wide format: one column per mesh
dat <- dat_long %>%
pivot_wider(names_from = mesh, values_from = catch, values_fill = 0) %>%
rename(C1 = !!as.character(mesh1), C2 = !!as.character(mesh2)) %>%
arrange(length) %>%
filter(C1 + C2 > 0) # Keep only length classes with catches
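# Optional sanity check (not part of the original analysis): binning assigns
# every fish to exactly one class, so binned totals should match the raw data
stopifnot(sum(dat$C1 + dat$C2) == nrow(df))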
# View the catch-at-length table
print(dat, n = 25)
## # A tibble: 32 × 3
## length C1 C2
## <dbl> <int> <int>
## 1 65 1 0
## 2 67 6 0
## 3 69 10 0
## 4 71 4 0
## 5 73 12 1
## 6 75 20 0
## 7 77 27 2
## 8 79 29 3
## 9 81 23 3
## 10 83 42 5
## 11 85 38 8
## 12 87 45 9
## 13 89 46 17
## 14 91 42 15
## 15 93 27 22
## 16 95 19 33
## 17 97 24 32
## 18 99 13 28
## 19 101 8 33
## 20 103 5 31
## 21 105 6 27
## 22 107 2 34
## 23 109 1 25
## 24 111 0 20
## 25 113 0 24
## # ℹ 7 more rows
Understanding the output: C1 is the count caught by the smaller mesh (80 mm) and C2 is the count caught by the larger mesh (100 mm). Why do you think we removed rows where both catches are zero?
Before fitting any models, always visualize your data. This helps identify patterns and potential issues.
# Prepare data for plotting
plot_df <- bind_rows(
data.frame(length = dat$length, mesh = factor(mesh1), catch = dat$C1),
data.frame(length = dat$length, mesh = factor(mesh2), catch = dat$C2)
)
# Plot catch-at-length distributions
p1 <- ggplot(plot_df, aes(x = length, y = catch, color = mesh, fill = mesh)) +
geom_area(alpha = 0.3, position = "identity") +
geom_line(linewidth = 1) +
scale_color_manual(values = c("#065A82", "#00A896"), name = "Mesh (mm)") +
scale_fill_manual(values = c("#00A896", "#065A82"), name = "Mesh (mm)") +
labs(
title = "Catch-at-Length by Mesh Size",
x = "Length (cm)",
y = "Number caught"
)
print(p1)
Discussion Question: Why is the 100mm mesh distribution shifted to the right compared to the 80mm mesh? What does this tell us about selectivity?
The Holt method is a classic approach based on the ratio of catches between two mesh sizes. The key insight is:
\[\ln\left(\frac{C_1}{C_2}\right) = a + b \cdot L\]
Where:
- \(C_1\), \(C_2\) are the numbers caught at length \(L\) by the smaller and larger mesh
- \(a\) (intercept) and \(b\) (slope) are coefficients estimated by linear regression
This relationship is linear in length—simple but powerful!
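To see where the linearity comes from, here is a one-line sketch under Holt’s classical assumptions: normal selectivity curves with modes \(m_1 < m_2\), a common spread \(\sigma\), and the same population encountering both gears:

\[\ln\frac{C_1}{C_2} = \ln\frac{P_1}{P_2} + \frac{(L - m_2)^2 - (L - m_1)^2}{2\sigma^2} = \underbrace{\ln\frac{P_1}{P_2} + \frac{m_2^2 - m_1^2}{2\sigma^2}}_{a} + \underbrace{\frac{m_1 - m_2}{\sigma^2}}_{b}\,L\]

Since \(m_1 < m_2\), the slope \(b\) is negative, matching the fit below; relative fishing power only shifts the intercept.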
The cross-over length (\(L^*\)) is where both meshes catch equally (\(C_1 = C_2\)), meaning \(\ln(C_1/C_2) = 0\):
\[L^* = -\frac{a}{b}\]
Here we are using weighted linear regression, assuming that length classes with more fish give more reliable ratio estimates.
# Prepare data for Holt analysis
holt_df <- dat %>%
mutate(total = C1 + C2) %>%
# Avoid log(0) and require minimum sample size
filter(C1 > 0, C2 > 0, total >= 5) %>%
mutate(
log_ratio = log(C1 / C2),
weight = total # More fish = more reliable ratio
)
# Fit weighted linear regression
holt_fit <- lm(log_ratio ~ length, data = holt_df, weights = weight)
# View regression summary
summary(holt_fit)
##
## Call:
## lm(formula = log_ratio ~ length, data = holt_df, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.289 -1.976 -0.518 1.384 3.971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.533557 0.914797 18.07 4.53e-12 ***
## length -0.176153 0.009852 -17.88 5.34e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.423 on 16 degrees of freedom
## Multiple R-squared: 0.9523, Adjusted R-squared: 0.9494
## F-statistic: 319.7 on 1 and 16 DF, p-value: 5.345e-12
# Extract coefficients
a_holt <- coef(holt_fit)[1]
b_holt <- coef(holt_fit)[2]
# Calculate cross-over length
L_star <- -a_holt / b_holt
cat(sprintf(" Intercept (a) = %.4f\n", a_holt))
cat(sprintf(" Slope (b) = %.5f\n", b_holt))
cat(sprintf("\n Cross-over length L* (where both meshes catch equally) = %.1f cm\n", L_star))
## Intercept (a) = 16.5336
## Slope (b) = -0.17615
##
## Cross-over length L* (where both meshes catch equally) = 93.9 cm
# Generate predictions for plotting
pred_holt <- data.frame(length = seq(min(holt_df$length), max(holt_df$length), 1))
pred_holt$log_ratio <- predict(holt_fit, newdata = pred_holt)
# Plot
p2 <- ggplot(holt_df, aes(x = length, y = log_ratio)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
geom_point(aes(size = weight), alpha = 0.6, color = "#065A82") +
geom_line(data = pred_holt, color = "#F96167", linewidth = 1.2) +
geom_vline(xintercept = L_star, linetype = "dotted", color = "#00A896", linewidth = 1) +
annotate("text", x = L_star + 2, y = max(holt_df$log_ratio) * 0.8,
label = paste0("L* = ", round(L_star, 1), " cm"),
hjust = 0, color = "#00A896", fontface = "bold") +
scale_size_continuous(name = "Sample size", range = c(1, 5)) +
labs(
title = "Holt (1963) Catch-Ratio Analysis",
subtitle = "Weighted linear regression of ln(C1/C2) on length",
x = "Length (cm)",
y = expression(ln(C[1]/C[2]))
)
print(p2)
Important: \(L^*\) is NOT the \(L_{50}\) of either mesh! It’s the length where both meshes catch equally: \(r_1(L^*) \cdot P_1 = r_2(L^*) \cdot P_2\)
Under the geometric similarity assumption (selectivity parameters scale proportionally with mesh size), we can estimate the selectivity coefficient \(k\):
# Under geometric similarity: L50 = k × mesh
# L* relates to the geometric mean of optimal lengths
k_holt <- L_star / sqrt(mesh1 * mesh2)
cat(sprintf("Estimated k from Holt method: %.4f\n\n", k_holt))
cat(sprintf("L50 for %dmm mesh ≈ %.1f cm\n", mesh1, k_holt * mesh1))
cat(sprintf("L50 for %dmm mesh ≈ %.1f cm\n", mesh2, k_holt * mesh2))
## Estimated k from Holt method: 1.0494
## L50 for 80mm mesh ≈ 84.0 cm
## L50 for 100mm mesh ≈ 104.9 cm
Limitations of the Holt method: While quick and intuitive, this approach assumes a linear log-ratio relationship and lacks a proper statistical error structure. The SELECT method addresses these limitations.
The SELECT method (Millar & Fryer, 1999) is the modern standard for selectivity estimation. Instead of working with catch ratios, it models the probability of capture directly.
Key point: Given that a fish of length \(L\) was caught by one of the two meshes, what is the probability it was caught by mesh 2?
\[P(\text{mesh 2} | \text{caught}) = \frac{P_2 \cdot r_2(L)}{P_1 \cdot r_1(L) + P_2 \cdot r_2(L)}\]
Where:
- \(r_1(L)\), \(r_2(L)\) are the selectivity curves of mesh 1 and mesh 2
- \(P_1\), \(P_2\) are the relative fishing powers (overall efficiency of each gear)
This conditional probability follows a binomial distribution!
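Concretely, if \(y_i\) of the \(n_i\) fish in length class \(i\) were taken by mesh 2, and \(p_i\) is the conditional probability above, the log-likelihood we maximize is

\[\ell(\theta) = \sum_i \left[\, y_i \ln p_i + (n_i - y_i)\ln(1 - p_i)\,\right] + \text{constant},\]

which is what the dbinom(y, n, p, log = TRUE) call accumulates in the fitting function below (the binomial coefficient is the constant).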
What about mesh 1? Remember that the probabilities are complementary: \(P(\text{mesh 1} | \text{caught}) = 1 - P(\text{mesh 2} | \text{caught})\). The binomial likelihood handles both automatically: when we count y fish caught by mesh 2 out of n total, the remaining n - y caught by mesh 1 are implicit. The choice of mesh 2 as the “success” is arbitrary; using mesh 1 would give identical results with only the sign of alpha flipped.
Two common selectivity curves:
# Normal (dome-shaped) selectivity with geometric similarity
sel_normal_geom <- function(L, mesh, k, sig_ratio) {
# k = L50/mesh ratio (constant across mesh sizes)
# sig_ratio = sigma/mesh ratio (constant across mesh sizes)
L50 <- k * mesh
sigma <- sig_ratio * mesh
exp(-0.5 * ((L - L50) / sigma)^2)
}
# Logistic (S-shaped) selectivity with geometric similarity
sel_logistic_geom <- function(L, mesh, k, SR_ratio) {
# k = L50/mesh ratio (constant across mesh sizes)
# SR_ratio = selection range/mesh ratio (constant across mesh sizes)
L50 <- k * mesh
SR <- SR_ratio * mesh
1 / (1 + exp(-log(3) * 2 * (L - L50) / SR))
}
select_df <- dat %>%
mutate(
n = C1 + C2, # Total caught at this length
y = C2, # Number caught by mesh2
p_obs = y / n # Observed proportion
) %>%
filter(n >= 3) # Minimum sample size for reliable proportions
cat(sprintf(" Length classes: %d\n", nrow(select_df)))
cat(sprintf(" Total fish: %d\n", sum(select_df$n)))
## Length classes: 31
## Total fish: 879
Before fitting, let’s look at what we’re trying to model:
p3 <- ggplot(select_df, aes(x = length, y = p_obs)) +
geom_point(aes(size = n), alpha = 0.6, color = "#065A82") +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "gray50") +
scale_size_continuous(name = "Sample\nsize", range = c(1, 5)) +
coord_cartesian(ylim = c(0, 1)) +
labs(
title = "Observed Conditional Proportions",
subtitle = paste0("P(caught by ", mesh2, "mm | caught by either mesh)"),
x = "Length (cm)",
y = "Proportion caught by larger mesh"
)
print(p3)
Task: Describe the pattern you see. At what length does the proportion cross 0.5? How does this relate to \(L^*\) from the Holt method?
This is the heart of the SELECT method. We maximize the likelihood of observing our data given the selectivity parameters.
nll_select <- function(par, data, mesh1, mesh2, sel_type = "normal") {
# Extract parameters (on transformed scale for optimization)
k <- exp(par[1]) # Ensures k > 0
spread <- exp(par[2]) # sig_ratio or SR_ratio > 0
alpha <- par[3] # log(P2/P1) - can be any real number
L <- data$length
y <- data$y
n <- data$n
# Calculate selectivity for each mesh
if (sel_type == "normal") {
r1 <- sel_normal_geom(L, mesh1, k, spread)
r2 <- sel_normal_geom(L, mesh2, k, spread)
} else if (sel_type == "logistic") {
r1 <- sel_logistic_geom(L, mesh1, k, spread)
r2 <- sel_logistic_geom(L, mesh2, k, spread)
}
# Relative fishing power
P2_P1 <- exp(alpha)
# Conditional probability of capture by mesh2
p <- (P2_P1 * r2) / (r1 + P2_P1 * r2)
# Avoid numerical issues (log of 0 or 1)
eps <- 1e-10
p <- pmin(pmax(p, eps), 1 - eps)
# Binomial negative log-likelihood
-sum(dbinom(y, n, p, log = TRUE))
}
This model is appropriate for gillnets, where fish can be too small to be retained (they pass through the mesh) or too large (they cannot wedge into it).
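Before launching the optimizer, it can help to sanity-check the building blocks. A quick sketch, using illustrative (not estimated) parameter values:

# The normal curve should peak at 1 at L50 = k * mesh, and the logistic should
# pass through 0.5 there (k = 1.05 and spreads 0.12 / 0.08 are illustrative)
sel_normal_geom(1.05 * 80, 80, 1.05, 0.12)    # expect 1
sel_logistic_geom(1.05 * 80, 80, 1.05, 0.08)  # expect 0.5
# The objective should return a finite value at plausible starting parameters
nll_select(c(log(1.05), log(0.12), 0), select_df, mesh1, mesh2, "normal")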
# Starting values from Holt estimate
k_start <- L_star / sqrt(mesh1 * mesh2)
start_normal <- c(log(k_start), log(0.12), 0)
# Optimize
fit_normal <- optim(
par = start_normal,
fn = nll_select,
data = select_df,
mesh1 = mesh1,
mesh2 = mesh2,
sel_type = "normal",
method = "BFGS",
hessian = TRUE
)
# Check convergence
if (fit_normal$convergence != 0) {
warning("Normal model did not converge!")
} else {
cat("Normal model converged successfully.\n\n")
}
## Normal model converged successfully.
# Extract estimates (back-transform from log scale)
k_normal <- exp(fit_normal$par[1])
sig_ratio_normal <- exp(fit_normal$par[2])
alpha_normal <- fit_normal$par[3]
# Standard errors from Hessian (delta method approximation)
se_normal <- sqrt(diag(solve(fit_normal$hessian)))
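# Optional extra (illustrative): an approximate 95% CI for k on the natural
# scale, built on the log scale and then exponentiated (delta-method logic)
ci_k <- exp(fit_normal$par[1] + c(-1.96, 1.96) * se_normal[1])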
# Display model results
cat(sprintf(" k (L50/mesh) = %.4f (SE: %.4f)\n", k_normal, se_normal[1] * k_normal))
cat(sprintf(" sig_ratio = %.4f (SE: %.4f)\n", sig_ratio_normal, se_normal[2] * sig_ratio_normal))
cat(sprintf(" alpha (log P2/P1) = %.4f (SE: %.4f)\n", alpha_normal, se_normal[3]))
cat(sprintf(" L50 (mesh1=%dmm) = %.1f cm\n", mesh1, k_normal * mesh1))
cat(sprintf(" L50 (mesh2=%dmm) = %.1f cm\n", mesh2, k_normal * mesh2))
cat(sprintf(" Selection range (mesh1) ≈ %.1f cm\n", 2 * sig_ratio_normal * mesh1))
cat(sprintf(" Selection range (mesh2) ≈ %.1f cm\n", 2 * sig_ratio_normal * mesh2))
cat(sprintf(" Relative fishing power P2/P1 = %.3f\n", exp(alpha_normal)))
## k (L50/mesh) = 1.0392 (SE: 0.5866)
## sig_ratio = 0.1180 (SE: 0.0330)
## alpha (log P2/P1) = -0.4443 (SE: 9.6768)
## L50 (mesh1=80mm) = 83.1 cm
## L50 (mesh2=100mm) = 103.9 cm
## Selection range (mesh1) ≈ 18.9 cm
## Selection range (mesh2) ≈ 23.6 cm
## Relative fishing power P2/P1 = 0.641
# AIC for model comparison
aic_normal <- 2 * fit_normal$value + 2 * length(fit_normal$par)
cat(sprintf("AIC (Normal): %.2f\n", aic_normal))
## AIC (Normal): 81.88
This model is appropriate for trawls, where fish above a certain size are always retained.
# Starting values
start_logistic <- c(log(k_start), log(0.2), 0)
# Optimize
fit_logistic <- optim(
par = start_logistic,
fn = nll_select,
data = select_df,
mesh1 = mesh1,
mesh2 = mesh2,
sel_type = "logistic",
method = "BFGS",
hessian = TRUE
)
# Extract estimates
k_logistic <- exp(fit_logistic$par[1])
SR_ratio_logistic <- exp(fit_logistic$par[2])
alpha_logistic <- fit_logistic$par[3]
# Standard errors from Hessian
se_logistic <- sqrt(diag(solve(fit_logistic$hessian)))
# Display results
cat(sprintf(" k (L50/mesh) = %.4f (SE: %.4f)\n", k_logistic, se_logistic[1] * k_logistic))
cat(sprintf(" SR_ratio = %.4f (SE: %.4f)\n", SR_ratio_logistic, se_logistic[2] * SR_ratio_logistic))
cat(sprintf(" alpha (log P2/P1) = %.4f (SE: %.4f)\n", alpha_logistic, se_logistic[3]))
cat("Derived selectivity parameters:\n")
cat(sprintf(" L50 (mesh1=%dmm) = %.1f cm\n", mesh1, k_logistic * mesh1))
cat(sprintf(" L50 (mesh2=%dmm) = %.1f cm\n", mesh2, k_logistic * mesh2))
cat(sprintf(" Selection range (mesh1) = %.1f cm\n", SR_ratio_logistic * mesh1))
cat(sprintf(" Selection range (mesh2) = %.1f cm\n", SR_ratio_logistic * mesh2))
cat(sprintf(" Relative fishing power P2/P1 = %.3f\n", exp(alpha_logistic)))
## k (L50/mesh) = 1.0493 (SE: 0.0127)
## SR_ratio = 0.0800 (SE: 0.0039)
## alpha (log P2/P1) = 2.9385 (SE: 0.3057)
## Derived selectivity parameters:
## L50 (mesh1=80mm) = 83.9 cm
## L50 (mesh2=100mm) = 104.9 cm
## Selection range (mesh1) = 6.4 cm
## Selection range (mesh2) = 8.0 cm
## Relative fishing power P2/P1 = 18.888
# AIC
aic_logistic <- 2 * fit_logistic$value + 2 * length(fit_logistic$par)
cat(sprintf("AIC (Logistic): %.2f\n", aic_logistic))
## AIC (Logistic): 101.66
We use AIC (Akaike Information Criterion) to compare models. Lower AIC indicates a better balance between fit and complexity; as a rule of thumb, a model trailing by more than about 10 AIC units has essentially no support.
cat(sprintf("Normal (dome-shaped): AIC = %.2f\n", aic_normal))
cat(sprintf("Logistic (S-shaped): AIC = %.2f\n", aic_logistic))
cat(sprintf("ΔAIC = %.2f\n", abs(aic_normal - aic_logistic)))
## Normal (dome-shaped): AIC = 81.88
## Logistic (S-shaped): AIC = 101.66
## ΔAIC = 19.78
if (aic_normal < aic_logistic) {
cat(">> Best model: NORMAL (dome-shaped)\n")
cat(" This is expected for gillnet-like selectivity.\n")
best_type <- "normal"
} else {
cat(">> Best model: LOGISTIC (S-shaped)\n")
cat(" This suggests trawl-like selectivity.\n")
best_type <- "logistic"
}
## >> Best model: NORMAL (dome-shaped)
## This is expected for gillnet-like selectivity.
Task: Why would you expect gillnets to have dome-shaped selectivity while trawls have S-shaped selectivity? Think about the physical mechanisms of fish capture.
# Generate length sequence for plotting
L_seq <- seq(min(dat$length), max(dat$length), by = 0.5)
# Calculate selectivity curves for best model
if (best_type == "normal") {
sel1 <- sel_normal_geom(L_seq, mesh1, k_normal, sig_ratio_normal)
sel2 <- sel_normal_geom(L_seq, mesh2, k_normal, sig_ratio_normal)
} else {
sel1 <- sel_logistic_geom(L_seq, mesh1, k_logistic, SR_ratio_logistic)
sel2 <- sel_logistic_geom(L_seq, mesh2, k_logistic, SR_ratio_logistic)
}
# Combine for plotting
sel_curves <- bind_rows(
data.frame(length = L_seq, selectivity = sel1, mesh = factor(mesh1)),
data.frame(length = L_seq, selectivity = sel2, mesh = factor(mesh2))
)
# Plot
p4 <- ggplot(sel_curves, aes(x = length, y = selectivity, color = mesh)) +
geom_line(linewidth = 1.2) +
scale_color_manual(values = c("#065A82", "#00A896"), name = "Mesh (mm)") +
coord_cartesian(ylim = c(0, 1.05)) +
labs(
title = paste0("Fitted Selectivity Curves (", best_type, " model)"),
x = "Length (cm)",
y = "Relative selectivity r(L)"
)
print(p4)
Good models should have residuals randomly scattered around zero with no systematic patterns.
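For a binomial observation, the deviance residual computed in the next chunk is

\[d_i = \operatorname{sign}(y_i - \hat{y}_i)\sqrt{2\left[\, y_i \ln\frac{y_i}{\hat{y}_i} + (n_i - y_i)\ln\frac{n_i - y_i}{n_i - \hat{y}_i}\right]}, \qquad \hat{y}_i = n_i\,\hat{p}_i\]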
# Calculate predicted probabilities for the best model
if (best_type == "normal") {
r1_pred <- sel_normal_geom(select_df$length, mesh1, k_normal, sig_ratio_normal)
r2_pred <- sel_normal_geom(select_df$length, mesh2, k_normal, sig_ratio_normal)
P2_P1 <- exp(alpha_normal)
} else {
r1_pred <- sel_logistic_geom(select_df$length, mesh1, k_logistic, SR_ratio_logistic)
r2_pred <- sel_logistic_geom(select_df$length, mesh2, k_logistic, SR_ratio_logistic)
P2_P1 <- exp(alpha_logistic)
}
# Calculate deviance residuals
select_df <- select_df %>%
mutate(
p_pred = (P2_P1 * r2_pred) / (r1_pred + P2_P1 * r2_pred),
y_exp = n * p_pred,
dev_resid = sign(y - y_exp) * sqrt(2 * (
ifelse(y > 0, y * log(y / y_exp), 0) +
ifelse(n - y > 0, (n - y) * log((n - y) / (n - y_exp)), 0)
))
)
# Plot residuals
p5 <- ggplot(select_df, aes(x = length, y = dev_resid)) +
geom_hline(yintercept = 0, color = "gray50") +
geom_hline(yintercept = c(-2, 2), linetype = "dashed", color = "gray70") +
geom_point(aes(size = n), alpha = 0.6, color = "#065A82") +
scale_size_continuous(name = "Sample\nsize", range = c(1, 4)) +
labs(
title = "Deviance Residuals",
subtitle = "Points should be randomly scattered between ±2",
x = "Length (cm)",
y = "Deviance residual"
)
print(p5)
Interpreting residuals: Points outside the dashed lines (±2) suggest potential lack of fit at those lengths. Systematic patterns (e.g., all positive at small sizes, all negative at large sizes) would indicate model misspecification.
Based on our analysis:
- The dome-shaped (normal) model fits best (ΔAIC ≈ 20)
- \(L_{50}\) ≈ 83 cm for the 80 mm mesh and ≈ 104 cm for the 100 mm mesh
- Selection ranges of roughly 19 cm and 24 cm, respectively
- Relative fishing power \(P_2/P_1\) ≈ 0.64

Question: What do these values mean for interpreting catch samples from each gear, and for choosing a mesh size in a managed fishery?
Suppose the length at 50% maturity (\(L_m\)) for this species is 90 cm.
Question: Which mesh size would better protect juveniles? Why?
Hint: Compare \(L_{50}\) of each mesh to \(L_m\). What proportion of immature fish would be retained by each mesh?
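One way to work through the hint, sketched for the best-fitting normal model (using the parameter objects estimated above):

Lm <- 90 # length at 50% maturity (cm), as given above
# Relative retention of a fish exactly at Lm, for each mesh
sel_normal_geom(Lm, mesh1, k_normal, sig_ratio_normal)
sel_normal_geom(Lm, mesh2, k_normal, sig_ratio_normal)
# Lengths well below Lm are mostly immature; compare retention there too
sel_normal_geom(80, mesh1, k_normal, sig_ratio_normal)
sel_normal_geom(80, mesh2, k_normal, sig_ratio_normal)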
Try modifying the analysis:
- Change the bin_width parameter to 1 cm or 4 cm. How does this affect your estimates?

Key quantities to remember:

| Parameter | Description |
|---|---|
| \(L_{50}\) | Length at 50% (or maximum) selectivity |
| Selection range | Width of the selectivity curve (~\(2\sigma\) for normal) |
| Relative fishing power (\(P_2/P_1\)) | Efficiency ratio between gears |
Under this assumption, selectivity scales with mesh size:

\[L_{50} = k \times \text{mesh}, \qquad \sigma = \sigma_{ratio} \times \text{mesh}\]
This allows us to estimate a single set of parameters (\(k\), \(\sigma_{ratio}\)) that apply to any mesh size.
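For example, here is a quick sketch of how the fitted parameters transfer to a mesh size that was never fished; the 90 mm value is hypothetical:

new_mesh <- 90                   # hypothetical mesh size (mm), not in the experiment
k_normal * new_mesh              # predicted L50 (cm) under geometric similarity
2 * sig_ratio_normal * new_mesh  # approximate selection range (cm), normal model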
## Great work! You have fitted your first size selectivity models.