Benchmarking vs. Validation

Benchmarking task force: Bjarki, Laurie, Margarita, Francesco, Paul, …

Need to define benchmarking and be consistent

Within TeamFISH I presume that the benchmarking should focus on multispecies or full ecosystem models, where there is usually a continuous implementation process, also in relation to new data availability and knowledge of ecosystem processes (e.g. the impact of climate on population dynamics). It would probably be useful to ask the CS leaders and modellers for their ideas on this. Regards, Francesco

There were 6 references to benchmarking in the 1st stage proposal but no clear definition, which may present problems when comparing across Case Studies and making links across WPs.

A proposal is to define what is meant by benchmarking in each WP, make this consistent across WPs (or at least identify differences in approach), and identify the requirements of the advisory/management bodies who will be the customers.

Two important concepts are Foresight and Validation

Existing Benchmarking Processes

Single Species Stock Assessment

  • ICES has a formal benchmarking process for single-species advice.
  • GFCM has implemented benchmarking of the main stocks, with external reviewers. Presumably over the next few years this process will be reinforced and better structured.

  • Other processes

Management Strategy Evaluation of single species advice

  • Currently ICES separates stock assessment benchmarks from MSE, yet by definition the benchmark reviews the reference points, which is precisely what the MSE is for. Our “benchmark” should adhere to the ICES definition and include the MSE, putting the two back in step (a minimal sketch of such an MSE feedback loop follows this list).

  • A problem is that a new way of working (new model, new data, new parameters, new reference points), in whatever guise (single species, multispecies, ecosystem), will not get accepted and used by an ICES WG without being accepted at a benchmark. If it doesn’t get used by a WG it doesn’t go into advice; if it doesn’t get into advice it doesn’t make it to management and is therefore not operational. So to operationalise a tool we need a benchmarking step, whether that is part of the formal set-up or whether we demonstrate clearly that the tool adheres to the same principles, so that it is ready when it is formally benchmarked.

  • ICES is also developing guidelines for MSE (WKGMSE3).

  • Also, a review of the procedures adopted by the tuna RFMOs has just been accepted for publication in Fish and Fisheries.
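
To make the feedback loop concrete, below is a minimal sketch of a single-species MSE loop in Python, assuming a Schaefer surplus-production operating model, a noisy biomass estimate standing in for the assessment, and a hockey-stick harvest control rule; all parameter values are illustrative only and not drawn from any real stock.

```python
# Minimal MSE feedback-loop sketch (illustrative values only).
import numpy as np

rng = np.random.default_rng(1)

# Operating model (OM): Schaefer surplus production with growth r and capacity K
r, K = 0.4, 1000.0
B = 0.5 * K                      # start at half of carrying capacity
Bmsy, Fmsy = K / 2.0, r / 2.0    # Schaefer reference points

def hcr(B_hat):
    """Hockey-stick harvest control rule applied to the *perceived* biomass."""
    return Fmsy if B_hat >= Bmsy else Fmsy * B_hat / Bmsy

for year in range(2025, 2045):
    # "Assessment": here just the true biomass with 20% observation error
    B_hat = B * rng.lognormal(mean=0.0, sigma=0.2)
    # Management: the HCR and reference points turn the perceived state into a TAC
    tac = hcr(B_hat) * B_hat
    # OM update: surplus production minus the realised catch closes the loop
    catch = min(tac, B)
    B = max(B + r * B * (1 - B / K) - catch, 1e-6)
    print(f"{year}: B={B:7.1f}  B_hat={B_hat:7.1f}  TAC={tac:6.1f}")
```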

Ecosystem Models

A main task is the development of a key run, e.g. as carried out by the Working Group on Multispecies Assessment Methods (WGSAM).

Rather than concentrating on reference points alone, thought should be given to the dynamics and the role of feedback, e.g. “Surplus production dynamics in declining and recovering fish populations”.
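
As a hedged illustration of looking at dynamics rather than a single equilibrium point, annual surplus production can be reconstructed from a biomass and catch time series as ASP_t = B_{t+1} - B_t + C_t and compared between declining and recovering phases; the series below are invented purely to show the calculation.

```python
# Sketch: annual surplus production from (made-up) biomass and catch series.
import numpy as np

B = np.array([900., 760., 600., 450., 330., 280., 310., 390., 500., 620.])  # biomass at start of year
C = np.array([250., 240., 220., 180., 120.,  60.,  40.,  50.,  70.])        # catch during each year

asp = B[1:] - B[:-1] + C                                     # ASP_t = B_{t+1} - B_t + C_t
phase = np.where(np.diff(B) < 0, "declining", "recovering")

# Plotting ASP against B for the two phases would show whether the
# production-biomass relationship differs between decline and recovery.
for b, a, p in zip(B[:-1], asp, phase):
    print(f"B={b:5.0f}  ASP={a:6.0f}  ({p})")
```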

Benchmarking examples

An example of an idealised benchmarking process for single-species stock assessments.

MSE

An example of a review of the sources of uncertainty included in tRFMO MSEs.

An example of benchmarking Operating Models conditioned on stock assessments.

Conclusions

  • In the TeamFISH project there should be a broadening of the ICES way of thinking, where currently the focus is on single-stock assessments.

  • We need something that is pragmatic that fits with and improves current practices.

  • The accepted definition of MSE is that it includes the feedback loop in which the assessment and reference points to be used in practice are evaluated, although often this is not done. If we are going to conduct MSEs with MiCE models then in many cases we will have to take shortcuts; this is fine as long as we recognise the limitations.

  • A problem with ICES MSEs is that they do not fully evaluate uncertainty, as the Operating Model (OM) is commonly the same assessment used to give advice, and only a limited number of scenarios are considered.

  • MSEs could also be run for different OM scenarios, e.g. including environmental factors for use in the assessment process. The benchmarked assessment is then used as the basis for the stock assessment (SA) in the MSE, where issues related to robustness to uncertainty in model structure can be explored.

  • When conducting MSE we could use a MiCE model as the OM and then use the single-species advice to set quotas based on single-species reference points. When we evaluate the outcomes, though, the multispecies reference points from the MiCE model would be used. We don’t need multispecies reference points to set advice, but we do need them to see whether the current advice is robust. This could be done for several single-species stock assessments at the same time (a rough sketch of this idea follows these conclusions).

  • An idea is to start by asking CS leaders which stocks have ICES/GFCM benchmarks planned within the lifetime of the project, and then ask the CS to develop some scenarios, e.g. related to the environment, in their models.

  • There should be a separation between ICES/GFCM benchmarks and what is to be performed by experts from the consortium to evaluate the usefulness of models for scientific advice.

  • Could we also engage external reviewers?
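
As a rough sketch of the MiCE-as-OM idea above: quotas are set with single-species harvest control rules and reference points, while the outcome is judged against multispecies reference points taken from the operating model. The two-species operating model, the interaction term and all reference point values below are invented solely to show the structure of such a test.

```python
# Sketch: single-species advice tested against a multispecies (MiCE-like) OM.
import numpy as np

# OM: prey (stock 1) and predator (stock 2) with a simple interaction term
r = np.array([0.5, 0.3])
K = np.array([1000.0, 400.0])
alpha = 0.0005                      # strength of the predator-prey interaction
B = np.array([600.0, 200.0])

# Single-species reference points used to SET advice (interactions ignored)
Fmsy_ss = r / 2.0
Bmsy_ss = K / 2.0

# Multispecies reference points used only to EVALUATE outcomes
# (hypothetical limits, e.g. derived from the MiCE model itself)
Blim_ms = np.array([300.0, 120.0])

for year in range(30):
    F = np.where(B >= Bmsy_ss, Fmsy_ss, Fmsy_ss * B / Bmsy_ss)  # single-species HCRs
    catch = F * B
    growth = r * B * (1 - B / K)
    interaction = np.array([-alpha * B[0] * B[1], 0.5 * alpha * B[0] * B[1]])
    B = np.maximum(B + growth + interaction - catch, 1e-6)

# Evaluation: is the single-species advice robust under multispecies dynamics?
print("Final biomass:", B.round(1))
print("Above multispecies Blim:", B > Blim_ms)
```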