Biomass-dynamic models such as SPiCT represent uncertainty in population dynamics through a process-error term. If the production function and state equation are correctly specified, and all relevant covariates are included, these process residuals should be independent and identically distributed around zero (i.e. white noise), with approximately constant variance. In practice, however, the process-error term often compensates for structural misspecification and non-stationarity, for example time-varying productivity, changing selectivity or catchability, e.g. in stocks which are not the main target.
Here we propose a set of diagnostics for process error using an example data set. The aim is not to formally test ypotheses, but to provide a visual summary to highlights when a model is explain model misspecification via process-error focussing on four aspects namely:
Use is made of existing diagnostics used for model fits, such as runs tests and residual diagnostics to help identify non-stationarity, the impact of environmental drivers, and the balance between stochasticity and model misspecification. If process residuals appear random with approximately constant variance, this suggests a correctly specified production function with genuine stochastic variation, conditional on the chosen model structure and fixed parameters. Positive or negative trends in residuals may indicate production-curve misspecification or structural changes, for example an incorrect shape parameter, time-varying productivity, changes in discarding or landings, or regime shifts. Consistently large residuals relative to the assumed process variance may reflect high biological variability, such as recruitment-driven dynamics, but may also signal structural model inadequacy or a mis-specified variance structure. Significant temporal autocorrelation may point to environmental forcing, omitted covariates, or serially correlated implementation error rather than white-noise process error. But we live in a world full of red noise.
The following R code loads precomputed process-error, e.g. output from SPiCT, JABBA or SS3, and runs the set of diagnostic plots for a selected stock.
library(ggplot2)
library(dplyr)
library(tidyr)
library(patchwork)
load("C:/active/flrpapers/rebuild/papers/brebuild/data/age/pe.RData")
pe = subset(
pe,
.id == "ank.27.78abd" &
Scenario == 1 &
SRR == "Beverton & Holt" &
!is.na(pe))
The data frame pe contains:
year - assessment yearpe - process residual on the biomass statebiomass - biomass estimatesA single figure with four panels is used to summarise process error: i.e. 1) residuals vs time, 2) residuals vs fitted biomass, 3) autocorrelation function (ACF), and 4) a histogram with normal curve for comparison.
p_time = ggplot(pe, aes(x = year, y = pe)) +
geom_hline(yintercept = 0, colour = "grey60") +
geom_line(colour = "steelblue") +
geom_point(colour = "steelblue") +
labs(x = "Year", y = "Process residual",
title = "Over time") +
theme_bw()
p_fit = ggplot(pe, aes(x = ssb, y = pe)) +
geom_hline(yintercept = 0, colour = "grey60") +
geom_point(alpha = 0.6, colour = "steelblue") +
labs(x = "Fitted biomass (or state)", y = "Process residual",
title = "Residuals vs Biomass") +
theme_bw()
acf_obj = acf(pe$pe, plot = FALSE)
acf_df = with(acf_obj, data.frame(lag = lag, acf = acf))[-1, ]
crit = qnorm(0.975) / sqrt(nrow(pe))
p_acf = ggplot(acf_df, aes(x = lag, y = acf)) +
geom_hline(yintercept = 0, colour = "grey60") +
geom_hline(yintercept = c(-crit, crit), linetype = "dashed",
colour = "grey60") +
geom_col(fill = "steelblue") +
labs(x = "Lag", y = "ACF",
title = "Autocorrelation") +
theme_bw()
p_hist = ggplot(pe, aes(x = pe)) +
geom_histogram(aes(y = ..density..), bins = 20,
fill = "steelblue", colour = "white", alpha = 0.7) +
stat_function(fun = dnorm,
args = list(mean = mean(pe$pe), sd = sd(pe$pe)),
colour = "red", linewidth = 0.8) +
labs(x = "Residual", y = "Density",
title = "Distribution") +
theme_bw()
(p_time | p_fit) /
(p_acf | p_hist)
The top-left panel (“Over time”) shows process residuals plotted against year. For a well-specified residuals would be expected to fluctuate around zero with no obvious trend and roughly constant variance. However, residuals are mostly positive after about 2000, with an upward shift in the mean. The spread also appears slightly larger in recent years. Together this suggest that the process-error term is not stationary and probably reflects systematic model bias rather than purely stochastic fluctuations.
The time-series panel, together with the top-right panel (“Residuals vs Biomass”), identifies trends and shifts. Before 2000, residuals randomly vary about zero; afterwards this they are predominantly positive. This step-change implies that the model starts to under-predict the biomass once the stock moves into a higher biomass regime. This is consistent with a production function that is mis-specified for the more recent period or with time-varying productivity or selectivity.
The bottom-right panel (“Distribution”) shows the empirical distribution of residuals with a normal density overlay. The shape is roughly Gaussian but centred above zero. Residual magnitudes are not particulary extreme, i.e. relative to the assumed process standard deviation, but the mean of 0.2 points to bias in the production curve or to a missing process, rather than to occasional large shocks superimposed on an otherwise correct specified model.
The bottom-left panel (“Autocorrelation”) is the ACF of the process residuals. Several low-lag bars exceed the approximate 95% bounds, and there is a pattern of positive autocorrelation at short lags followed by negative values at longer lags. This is inconsistent with a white-noise process and suggests slow, temporally structured deviations from the model, for example an environmental regime, time-varying productivity, or changes in catchability.
Together, these plots suggest that the process-error term in this example is accounting for model misspecification or non-stationarity rather than representing pure stochastic noise. This is because
Process-error appears not to be purely stochastic and so confidence intervals or fractiles from the fitted model will be biased.
Possible next steps are to
The same diagnostic framework can be applied systematically across stocks to identify cases where process error behaves as desired and those where it is clearly signalling deeper structural issues. These diagnostics can be applied to any model with an implicit or explicit production function.