Jump to Introduction

Jump to Packages

Jump to Operating Model Conditioning

Jump to Empirical Rules

Jump to Indicators

Jump to Classification Skill

Jump to More Information

Jump to References

Introduction

In fisheries management, selecting appropriate indicators or model-based trends for use in a Harvest Control Rule (HCR) is crucial. These indicators should not only classify the stock’s status correctly but also identify trends in stock status. This vignette discusses the process of choosing and fine-tuning indicators for HCRs, specifically through the use of Receiver Operating Characteristic (ROC) analysis.

Advice for data-limited stocks may be based on empirical indicators, since often only a relative index of abundance, such as a survey index or catch per unit effort (CPUE), is available. These indices are used in harvest control rules (HCRs) to identify whether a quantity such as stock biomass or harvest rate is increasing or decreasing. Examples include the “2-over-3” rule, where the average of the last 2 values is compared to the average of the earlier 3 values, or to Y values from a reference period (10.1093/icesjms/fsaa054?), and rules based on a linear regression trend (10.1093/icesjms/fsu017?).

Such rules are both simple and intuitive, as they move catches up or down in relation to the trend in the recent data. Rules can be tuned to give the desired level of stability in catch advice, and fluctuations in catch advice can be dampened by choosing a longer period over which the trend is calculated. However, in the absence of a target, such rules cannot adequately rebuild stock biomass from low depletion levels.

The Need for Effective Indicators

When choosing indicators or model-based trends for HCRs, it is necessary to demonstrate their ability to correctly classify a stock as overfished or experiencing overfishing. Additionally, they should be capable of identifying trends in stock status. To assess these criteria, we follow a structured approach.

The Process

  1. Operating Model Without Feedback: Our process begins by running an Operating Model without feedback, which incorporates critical life history information. This model represents the dynamic nature of the fishery system, including factors like fishing mortality.

  2. Simulating an Index: To gauge the state of the stock and its abundance, we simulate an index using an observation error model. This empirical index serves as a measurable proxy for the stock’s abundance.

  3. Receiver Operating Characteristic (ROC) Analysis: To ensure the effectiveness of the empirical indicators in HCRs, we employ Receiver Operating Characteristic (ROC) analysis. ROC analysis assesses the indicators’ performance in correctly classifying stock status and detecting trends.

In conclusion, the process involves conditioning an operating model with life history information, generating an empirical index of abundance through an observation error model, and fine-tuning the selected indicators for HCRs using ROC analysis. This structured approach ensures that the chosen indicators are effective tools for making informed decisions about stock management.

Simulation

When choosing an indicator, or model-based trend, for use in a harvest control rule, it is necessary to show what combination of index and reference level can correctly classify a stock as being over-fished, or over-fishing as occurring, and whether the index can identify trends in stock status. We therefore run an Operating Model without feedback in which there is a trend in fishing mortality, and then use Receiver Operating Characteristic analysis to tune empirical indicators for use in HCRs. First a simulation, or operating model, is conditioned on life history information, and then an observation error model is used to simulate an index of abundance.

Packages

As well as ‘FLR’, a variety of other packages are required for plotting and data manipulation.

library(plyr)
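The later chunks also rely on the FLR packages themselves. The loading chunk is not shown, so the list below is an assumption, a minimal set based on the functions called later (lhPar, lhEql, ffwd, rlnoise and the ggplot methods):

library(FLCore)    # core FLR classes: FLQuant, FLPar, FLStock
library(FLBRP)     # equilibrium dynamics and reference points
library(FLife)     # life-history relationships: lhPar(), lhEql()
library(FLasher)   # projections: ffwd() (assumed to come from FLasher)
library(ggplotFL)  # ggplot2 methods for FLR objects
library(ggplot2)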

Quick Start

The package FLife is required for modelling life histories and estimation. The simplest way to obtain it is to install it from the FLR repository.
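For example, a sketch of an install call using the FLR repository at http://flr-project.org/R; the exact package list needed is an assumption based on what this vignette uses:

install.packages(c("FLCore","FLBRP","FLife","FLasher","ggplotFL"),
                 repos=c("http://flr-project.org/R","https://cloud.r-project.org"))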

See help(install.packages) for more details.

Back to Top

Operating Model Conditioning

Use life history parameters to condition an FLBRP equilibrium object

# life-history parameters: von Bertalanffy growth (linf, k, t0), length-weight
# (a, b), maturity (a50, l50) and stock-recruit steepness (s)
par=lhPar(FLPar(c(linf= 59.1,  k=0.28, t0=-0.4,
           a=0.01111,b=3.15,a50=4.0, l50=43.25,
           s=0.7),units="NA"))
# equilibrium dynamics and reference points
eq=lhEql(par)

Set F to follow linearly increasing or decreasing trends across iterations

# 100 iterations with F at FMSY over the 40 years
fbar(eq)=propagate(FLQuant(rep(1,40))%*%refpts(eq)["msy","harvest"],100)
# in years 21:40 each iteration ramps F linearly to between 0.2 and 4 times FMSY
fbr2=maply(data.frame(x=c(seq(0.2,1,length.out=50),seq(1,4,length.out=51)[-1])),
           function(x) seq(1,x,length.out=20))
fbar(eq)[,ac(21:40)]=fbar(eq)[,ac(21:40)]%*%FLQuant(c(t(fbr2)),dimnames=list(year=21:40,iter=seq(100)))

Convert the equilibrium object to an FLStock to model the time series dynamics

# lognormal recruitment deviates (sd of 0.3 on the log scale)
set.seed(234)
srDev=rlnoise(100,fbar(eq)%=%0,0.3,0)

# coerce the equilibrium object to an FLStock and project with the F trends
om=as(eq,"FLStock")
om=propagate(om,dim(srDev)[6])
om=ffwd(om,fbar=fbar(eq)[,-1],sr=eq,deviances=srDev)
om=window(om,start=11)
# time series relative to MSY reference points
mets <- list(F       =function(x, y) fbar( x)/  fmsy(y),
             SSB     =function(x, y) ssb(  x)/ sbmsy(y),
             Recruits=function(x, y) rec(  x)/refpts(y)["msy","rec"],
             Yield   =function(x, y) catch(x)/refpts(y)["msy","yield"])
fqs <- FLQuants(lapply(mets, function(fn) fn(om, eq)))

plot(fqs,iter=1)+
  ylim(c(0, NA))+
  geom_hline(yintercept=1, linetype=2)+
  theme_bw(16)+
  theme(legend.position="none")

Figure 1. Operating Model time series of F, SSB, recruitment and yield relative to their MSY reference points.

Back to Top

Empirical Rules

X-Over-Y Rule

The X-over-Y rule is used for data-poor stocks: the average of the index over the most recent X years is divided by the average over the previous Y years. Simple catch rules of this form have been developed for data-limited stocks; for example, the “2 over 3” rule aims to keep stocks at their current level by multiplying recent catches by the trend in a biomass index:

\[\begin{equation} TAC_{t+1}=TAC_t\times\frac{\sum\limits_{i=t-X+1}^{t} I_i/X}{\sum\limits_{i=t-X-Y+1}^{t-X} I_i/Y} \end{equation}\]

where \(TAC_{t+1}\) is the newly advised catch for year \(t+1\), \(TAC_t\) is the previously advised catch, and \(I\) is an index of relative abundance.
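As an illustration, a minimal numeric sketch of the X-over-Y calculation; the function name xOverY is hypothetical rather than from an FLR package, and the index is assumed to be a numeric vector with the most recent year last:

# mean of the most recent X index values over the mean of the Y values before them
xOverY=function(index, X=2, Y=3){
  n=length(index)
  mean(index[(n-X+1):n])/mean(index[(n-X-Y+1):(n-X)])}

# e.g. the "2 over 3" update of the TAC
# TACnext=TACnow*xOverY(indexSeries, X=2, Y=3)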

The rule can also use a reference period: rather than making \(Y\) a sliding window, the denominator can be set to a period of years (e.g. ending \(N\) years ago) when the stock was thought to be at a target level.

\[\begin{equation} TAC_{t+1}=TAC_t\times\frac{\sum\limits_{i=t-X+1}^{t} I_i/X}{\sum\limits_{i=t-N-Y+1}^{t-N} I_i/Y} \end{equation}\]

Linear trend

The trend for the recent period could also be estimated by the slope (\(s\)) of a linear regression of \(\ln I_{y^\prime}\) against \(y^\prime\) for years \(y^\prime = y-p+1, y-p+2, \ldots, y\), where \(I\) is an abundance index and \(p\) is the number of years over which the slope is calculated.

For example, the rule of (hillary2013sbthcr?) takes as input an index of abundance (\(I_{t}\)): the total allowable catch for next year (\(TAC_{t+1}\)) is increased relative to the catch this year (\(TAC_t\)) if the trend is positive, and decreased if the trend is negative.

\[\begin{equation} TAC_{t+1}=TAC_t\times \left\{\begin{array}{rcl} {1-k_1|\lambda|} & \mbox{for} & \lambda<0\\[0.35cm] {1+k_2\lambda} & \mbox{for} & \lambda\geq 0 \end{array}\right. \label{eqn:bft} \end{equation}\]

where \(\lambda\) is the slope of the regression of \(\ln I_t\) against year for the most recent \(n\) years, and \(k_1\) and \(k_2\) are parameters. The rule’s response is asymmetric, as a decrease in the index does not necessarily result in the same relative change as an increase. The control parameter values (\(k_1\), \(k_2\)) are tuned, i.e. calibrated, to meet the management objectives.
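A minimal sketch of this rule, assuming the index for the most recent years is held in a numeric vector; the function name slopeHCR and the default control parameter values are illustrative only:

slopeHCR=function(index, tac, n=5, k1=1.5, k2=1.0){
  # lambda: slope of a regression of log(index) on year over the last n years
  recent=(length(index)-n+1):length(index)
  lambda=coef(lm(log(index[recent])~recent))[2]
  # asymmetric response: decreases are scaled by k1, increases by k2
  if (lambda<0) tac*(1-k1*abs(lambda)) else tac*(1+k2*lambda)}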

If \(p\), \(X\) or \(Y\) are too small the trend estimates could fluctuate too much (tracking noise), but if they are too large, an HCR would not be able to react sufficiently rapidly to recent trends in resource abundance.

Indicators

Simulate an index of abundance

# lognormal observation error (sd of 0.3 on the log scale) around SSB
idx=ssb(om)%*%rlnorm(dim(ssb(om))[6],ssb(om)%=%0,0.3)

Back to Top

Classification Skill

Receiver Operating Characteristics

For a particular index, e.g. a length-based indicator, and threshold, a confusion matrix can be used to visualise and summarise classification skill. First the classifications are summarised as being either true positive (TP), true negative (TN), false positive (FP) or false negative (FN). Figure 2 is a visualisation of a confusion matrix: it plots the index relative to the threshold against the true stock status from the Operating Model relative to a reference point. The cross-hairs are at x=1 and y=1; points in the top right-hand quadrant are TP and those in the top left-hand quadrant are FP. The true skill statistic (TSS) can be used to summarise classification skill; it is the true positive rate (TPR = TP/(TP+FN), i.e. sensitivity) plus the true negative rate (TNR = TN/(TN+FP), i.e. specificity) minus one, equivalently TPR minus the false positive rate (FPR = FP/(FP+TN)). Perfect classifications receive a score of 1, random ones a score of 0, and predictions inferior to random ones a negative score; a coin toss would have a TSS of zero, as the chance of a correct classification is the same as an incorrect one. The best threshold can be found by calibration, i.e. the level that maximises the sum of the TPR (sensitivity) and TNR (specificity). Figure 4 plots TSS against the reference level; the reference level is negatively biased, since if the reference level were increased the number of TPs would increase and the TSS would improve.
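For example, a minimal sketch of the confusion matrix and TSS at a single threshold, where state is the Operating Model status relative to its reference point and indicator is the index relative to a candidate reference level, both as numeric vectors; the helper name tss is hypothetical:

tss=function(state, indicator){
  # positives are cases where the quantity exceeds its reference level
  TP=sum(indicator> 1 & state> 1)
  TN=sum(indicator<=1 & state<=1)
  FP=sum(indicator> 1 & state<=1)
  FN=sum(indicator<=1 & state> 1)
  TPR=TP/(TP+FN)   # sensitivity
  TNR=TN/(TN+FP)   # specificity
  TPR+TNR-1}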

Receiver Operating Characteristic (ROC) curves (green1966signal?) can be used to estimate the ability of indicators to assess status. ROC curves are constructed by sorting the values of the indicator, from high to low in the case of overfishing (\(F/F_{MSY}>1\)) and from low to high for overfished (\(B/B_{MSY}<1\)), and then comparing these to the known state (i.e. from the Operating Model). The cumulative TPR and TNR are calculated for the ordered observed outcomes, and the TPR is then plotted against the FPR for the ordered indicator. ROC curves can be thought of as a plot of the power as a function of the Type I Error of the decision rule, providing a tool to select the best candidate indicators. This also allows tuning, i.e. calibration, by choosing a reference level with the best classification skill, and allows the bias in the standard reference points and levels to be evaluated. The ROC curve is a probability curve, and the Area Under the Curve (AUC) is an important metric for measuring performance. For example, a coin toss would produce a curve that fell along the y=x line, and the area under the curve would equal 0.5; the closer the area under the curve is to 1, the better an indicator is at ranking. The ROC curve can also be used to graphically identify the performance of a choice of indicator ratio (i.e. discriminant threshold), since the best reference points have the shortest Euclidean distance between the top left-hand corner (TPR=1, FPR=0) and the corresponding point on the curve.
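A sketch of how the ROC curve and AUC might be computed by sorting on the indicator, again assuming numeric vectors state and indicator; the helper names roc and auc are illustrative, and for the overfished case the sort order would be reversed:

roc=function(state, indicator, ref=1){
  # sort from high to low (overfishing case) and accumulate classification rates
  ord =order(indicator, decreasing=TRUE)
  true=state[ord]>ref
  TPR =cumsum( true)/sum( true)
  FPR =cumsum(!true)/sum(!true)
  data.frame(ind=indicator[ord], TPR=TPR, FPR=FPR, TSS=TPR-FPR)}

# AUC by the trapezoidal rule on the ordered (FPR, TPR) points
auc=function(rc)
  sum(diff(c(0,rc$FPR))*(c(0,head(rc$TPR,-1))+rc$TPR)/2)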

Risks are also asymmetric: the risk of indicating that overfishing is occurring when the stock is sustainably exploited is not the same as the risk of failing to identify overfishing. Therefore, it may be desirable to adjust the threshold to increase or decrease the sensitivity to false positives or false negatives. Some indicators may also perform better at identifying the start of overfishing than recovery, and vice versa. As well as assessing state, ROC curves can be used to assess trend, e.g. using a metric based on the X-over-Y rule or a linear trend fitted to recent points.

Reference Level

Figure 2. Pairwise scatter plot of the index relative to the reference versus \(SSB/B_{MSY}\).

Figure 3. ROC curve; the black point is where index/reference = 1, and the blue point is the one with the best skill score.

Figure 4. Plot of TSS versus index value.

2-over-3 Rule
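The objects state, trnd and idxSkill used in the plotting code below are built in chunks that are not shown; in the vignette they are FLQuants spanning all iterations, so they can be passed to FLQuants() and model.frame(). As a simplified, single-iteration numeric sketch of the underlying calculation, using the roc() helper sketched above and assuming the Operating Model status is SSB relative to \(B_{MSY}\):

index=c(iter(idx,1))                # simulated index for one iteration
bmsy =c(refpts(eq)["msy","ssb"])

# 2-over-3 trend in the index, year by year, and the "true" status
trnd =sapply(5:length(index), function(i)
        mean(index[(i-1):i])/mean(index[(i-4):(i-2)]))
state=c(iter(ssb(om),1))[5:length(index)]/bmsy

idxSkill=roc(state,trnd)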

ggplot(model.frame(FLQuants(State    =state,
                            Indicator=trnd)))+
  geom_point(aes(State,Indicator))+
  geom_hline(aes(yintercept=1),linetype=2)+
  geom_vline(aes(xintercept=1),linetype=2)+
  geom_text(aes(x=x,y=y,label=label),data=data.frame(x=c(0.1,0.1,2,2),y=c(0.1,2.75,2.75,0.1),label=c("TN","FP","TP","FN")),
             size=6,col="blue")+
  xlab("Operating Model Status")+ylab("Indicator/Reference")+
  theme_bw()

Figure 5. Pairwise scatter plot of the index relative to the reference versus \(SSB/B_{MSY}\).

ggplot(idxSkill)+
  geom_line(aes(FPR,TPR))+
  geom_abline(aes(intercept=0,slope=1),linetype=2)+
  geom_point(aes(FPR,TPR),data=idxSkill[order((idxSkill$ind-1)^2)[1],],size=5)+
  geom_point(aes(FPR,TPR),data=idxSkill[max(idxSkill$TSS)==idxSkill$TSS,],size=5,col="blue")+
  xlab("False Positive Rate (FPR)")+ylab("True Positive Rate (TPR)")

Figure 6. ROC curve; the black point is where index/reference = 1, and the blue point is the one with the best skill score.

ggplot(idxSkill)+
  geom_line(aes(ind,TSS))+
  geom_vline(aes(xintercept=1),col="red")+
  xlab("Reference Level")+ylab("True Skill Score (TSS)")

Figure 7. Plot of TSS versus index value.

Reference Period
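For the reference-period variant, trnd is assumed to compare the recent average with a fixed reference period rather than a sliding window; a numeric sketch, with years 1 to 5 of the index taken as a purely illustrative reference period, reusing the objects from the sketch above:

refMean=mean(index[1:5])            # fixed reference period (illustrative choice)
trnd   =sapply(5:length(index), function(i) mean(index[(i-1):i])/refMean)
state  =c(iter(ssb(om),1))[5:length(index)]/bmsy

idxSkill=roc(state,trnd)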

ggplot(model.frame(FLQuants(State    =state,
                            Indicator=trnd)))+
  geom_point(aes(State,Indicator))+
  geom_hline(aes(yintercept=1),linetype=2)+
  geom_vline(aes(xintercept=1),linetype=2)+
  geom_text(aes(x=x,y=y,label=label),data=data.frame(x=c(0.1,0.1,2,2),y=c(0.1,2.75,2.75,0.1),label=c("TN","FP","TP","FN")),
             size=6,col="blue")+
  xlab("Operating Model Status")+ylab("Indicator/Reference")+
  theme_bw()

Figure 8. Pairwise scatter plot of the index relative to the reference versus \(SSB/B_{MSY}\).

ggplot(idxSkill)+
  geom_line(aes(FPR,TPR))+
  geom_abline(aes(intercept=0,slope=1),linetype=2)+
  geom_point(aes(FPR,TPR),data=idxSkill[order((idxSkill$ind-1)^2)[1],],size=5)+
  geom_point(aes(FPR,TPR),data=idxSkill[max(idxSkill$TSS)==idxSkill$TSS,],size=5,col="blue")+
  xlab("False Positive Rate (FPR)")+ylab("True Positive Rate (TPR)")

Figure 9. ROC curve; the black point is where index/reference = 1, and the blue point is the one with the best skill score.

ggplot(idxSkill)+
  geom_line(aes(ind,TSS))+
  geom_vline(aes(xintercept=1),col="red")+
  xlab("Reference Level")+ylab("True Skill Score (TSS)")

Figure 10. Plot of TSS versus index value.

Linear Regression
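For the regression-based indicator both the indicator and the Operating Model status are slopes, so the cross-hairs in the plot are at zero; a numeric sketch, assuming a five-year window and the roc() helper from above with a reference of zero:

# slope of log(x) over the p most recent years up to year i
p=5
logSlope=function(x,i) coef(lm(log(x[(i-p+1):i])~seq(p)))[2]

trnd =sapply(p:length(index), function(i) logSlope(index,i))
state=sapply(p:length(index), function(i) logSlope(c(iter(ssb(om),1)),i))

idxSkill=roc(state,trnd,ref=0)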

ggplot(model.frame(FLQuants(State    =state, 
                            Indicator=trnd)))+
  geom_point(aes(State,Indicator))+
  geom_hline(aes(yintercept=0),linetype=2)+
  geom_vline(aes(xintercept=0),linetype=2)+
  geom_text(aes(x=x,y=y,label=label),data=data.frame(x=c(-0.4,-0.4,0.4,0.4),y=c(-0.6,0.8,0.8,-0.6),label=c("TN","FP","TP","FN")),
             size=6,col="blue")+
  xlab("Operating Model Status")+ylab("Indicator/Reference")+
  theme_bw()

Figure 11. Pairwise scatter plot of the indicator (regression slope) versus the Operating Model status.

ggplot(idxSkill)+
  geom_line(aes(FPR,TPR))+
  geom_abline(aes(intercept=0,slope=1),linetype=2)+
  geom_point(aes(FPR,TPR),data=idxSkill[order((idxSkill$ind-1)^2)[1],]   ,size=5,col="red")+
  geom_point(aes(FPR,TPR),data=idxSkill[max(idxSkill$TSS)==idxSkill$TSS,],size=5,col="blue")+
  xlab("False Positive Rate (FPR)")+ylab("True Positive Rate (TPR)")+
  geom_point(aes(0,1),size=5)

Figure 12. ROC curve; the red point is where index/reference = 1, the blue point is the one with the best skill score, and the black point marks the top left-hand corner (TPR = 1, FPR = 0).

ggplot(idxSkill)+ 
  geom_line(aes(ind,TSS))+
  geom_vline(aes(xintercept=1),col="red")+
  xlab("Reference Level")+ylab("True Skill Score (TSS)")

Figure 13. Plot of TSS versus index value.

Back to Top

More Information

ROC curve

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:

  • True Positive Rate
  • False Positive Rate

True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:

\[\begin{equation} TPR=\frac{TP}{TP+FN} \end{equation}\]

False Positive Rate (FPR) is defined as follows:

\[\begin{equation} FPR=\frac{FP}{FP+TN} \end{equation}\]

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.

Figure: ROC curve showing TP rate vs. FP rate at different classification thresholds.

To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, there’s an efficient, sorting-based algorithm that can provide this information for us, which yields the AUC.

AUC: Area Under the ROC Curve

AUC stands for “Area under the ROC Curve.” That is, AUC measures the entire two-dimensional area underneath the entire ROC curve (think integral calculus) from (0,0) to (1,1).

Figure: AUC (area under the ROC curve).

AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. For example, given the following examples, which are arranged from left to right in ascending order of logistic regression predictions:

Figure: Positive and negative examples ranked in ascending order of logistic regression score.

AUC represents the probability that a random positive (green) example is positioned to the right of a random negative (red) example.

AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
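As a sketch of this ranking interpretation, the AUC can be computed directly from the ranks of the scores (this is the Mann-Whitney U statistic scaled to lie between 0 and 1); the helper name aucRank is illustrative:

aucRank=function(score, positive){
  # positive: logical vector, TRUE for positive examples
  r   =rank(score)
  nPos=sum(positive); nNeg=sum(!positive)
  (sum(r[positive])-nPos*(nPos+1)/2)/(nPos*nNeg)}

# e.g. aucRank(score=c(0.1,0.4,0.35,0.8), positive=c(FALSE,FALSE,TRUE,TRUE)) gives 0.75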

AUC is desirable for the following two reasons:

  • AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
  • AUC is classification-threshold-invariant. It measures the quality of the model’s predictions irrespective of what classification threshold is chosen.

However, both these reasons come with caveats, which may limit the usefulness of AUC in certain use cases:

Scale invariance is not always desirable. For example, sometimes we really do need well calibrated probability outputs, and AUC won’t tell us about that.

Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn’t a useful metric for this type of optimization.

There are two main questions to be asked when choosing length-based indicators (LBIs): can a combination of indicator, reference point, and reference level correctly classify a stock, e.g. as being over-fished; and can an indicator be used to rank stocks or identify trends in stock status, i.e. should some stocks be assigned higher priority for management intervention in a risk assessment, and are things getting better or worse?

For a particular stock and LBI, the best discriminant threshold, i.e. the ratio of the indicator to the reference level, for classifying overfishing is unlikely to be the one listed in Table 1. This is due to variations in stock and fishery characteristics and uncertainty about the assumptions made; for example, for a ratio relative to L∞, the value with the best classification skill may not be 0.8. High random, e.g. measurement, error may also lead to poor classification skill. We therefore calculate the true positive rate (TPR, i.e. sensitivity) and the true negative rate (TNR, i.e. specificity). Sensitivity (TP/(TP+FN)) measures the ability of a test to identify positive cases, i.e. the proportion of positives that are correctly identified, while specificity (TN/(TN+FP)) measures the proportion of negatives that are correctly identified. This allows the true skill score (TSS) to be calculated, i.e. TSS = TPR + TNR − 1. A perfect prediction would receive a score of 1, random predictions receive a score of 0, and predictions inferior to random ones receive a negative score.

Receiver operating characteristic (ROC) curves (Green et al., 1966) can be used to estimate the ability of LBIs to assess status. ROC curves were constructed by sorting the values of F/FMSY, with the highest values first, from the Operating Model and then comparing these to each LBI. The cumulative TPR and TNR are then calculated for the ordered observed outcomes, and the TPR is plotted against the false positive rate (FPR = 1 − TNR) for the different observed indicator and reference ratios, i.e. the potential threshold settings. ROC curves can be thought of as a plot of the power as a function of the Type I Error of the decision rule, and so provides a tool to select the best candidate indicators. This also allows tuning, i.e. calibration, by choosing a reference level that has the best classification skill, and allows the bias in the standard reference points and levels to be evaluated.

The ROC curve is a probability curve, and the area under the curve (AUC) is an important metric for measuring performance. For example, a coin toss would produce a curve that fell along the y = x line and the AUC would be equal to 0.5. The closer the AUC is to 1, the better an indicator is at ranking. The ROC curve can also be used to graphically identify the performance of a choice of indicator ratio (i.e. discriminant threshold), since the best reference points have the shortest Euclidean distance between the top left-hand corner (TPR = 1, FPR = 0) and the corresponding point on the curve.

Risks are also asymmetric, i.e. the risk of indicating that overfishing is occurring when the stock is sustainably exploited is not the same as the risk of failing to identify overfishing. It may be desirable, therefore, to adjust the threshold to increase or decrease the sensitivity to false positives or false negatives. Some indicators may also perform better at identifying the start of overfishing than recovery, and vice versa.

Software Versions

  • R version 4.2.1 (2022-06-23)
  • FLCore: 2.6.19.9105
  • FLPKG:
  • Compiled: Fri Jan 12 12:41:40 2024
  • Git Hash:

Author information

Laurence KELL.

Acknowledgements

References

Back to Top


  1. https://github.com/lauriekell/FLife/issues↩︎

  2. http://flr-project.org↩︎