This document summarizes work done to investigate SERFS sampling design options. A change in sampling design for the SERFS monitoring program has been discussed for many years, particularly consideration of a stratified sampling design. Consideration has been motivated by at least the following issues:
Change in estimation performance of species relative abundance associated with potential future reductions in sampling effort is unknown. A stratified design may be more robust than a simple random design to sampling effort reductions.
Overall estimation efficiency might be improved by identifying important and consistent sources of variation in fish distribution.
The overall strategy to evaluate alternative sampling designs will be to use a simulation model to: 1) generate population trends of each species and distribute those species across time and among the defined SERFS sampling universe; 2) sample those locations with chevron traps to observe species catch; 3) generate abundance indices for target species; and 4) evaluate the performance of the sampling design as measured via criteria based on a collection of attributes from the abundance indices of target species. These elements can be termed: 1) Operating Model, 2) Observation Model, 3) Estimation Model, and 4) Evaluation Process (Figure 1). Currently, only chevron trap data are being used and the simulation is aimed at modeling only chevron trap sampling.
The first requirement of the operating model is a method to specify the distribution of species across space. Since space in this simulation is a collection of known sampling locations, the model requires information about how each species is expected to be distributed among those sampling locations.
To define these distributions, I used the empirical catch record from the chevron trap samples. I initially attempted to use the empirical data to describe density based on parametric and/or categorical relationships between environmental covariates (i.e., latitude, longitude, and depth) at each sample site and observed species catches. I framed these analyses in a GLM framework using a zero-inflated count error distribution. I had very poor success finding a parametric method that would work well for all species, because the functional form for each species needed to be flexible and the catch data were not adequately spread across the range of covariate values (e.g., some species were caught only in deep water) to inform the estimation of parametric relationships.
I then explored breaking the covariates into categories to predict catches. Overall, this method worked better, and depending on the number of categories for each covariate, I could do a reasonably good job explaining the empirical distribution. However, the question became how to choose the appropriate number of levels for each covariate and how those levels should be defined across the covariate space. Additionally, if my goal is only to describe species distribution rather than predict it, is parsimony a concern?
After proceeding down this path, I ultimately decided that the best way to define species distribution among sampling locations is a completely empirical approach: the density of each species at each sample site is computed as the total catch of that species at that site divided by the total catch of that species across all sites, with both totals aggregated across all years. Figures 2-9 show a few examples of empirical density computed this way.
I think this is the best way forward, but it does come with one major caveat that will require some sensitivity analysis. If the sampling universe is assumed to be representative of the abundance trend of each species, then this approach is appropriate. However, if catch in the sampling universe exhibits hyperstability or hyperdepletion for a species, then this approach is biased. I think there are some reasonable ways to simulate these alternative assumptions that I can look into later.
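The empirical density calculation can be sketched as follows. This is a minimal illustration, not the code used for the figures; the function name `empirical_density`, the record layout, and the example catches are all hypothetical.

```python
from collections import defaultdict


def empirical_density(records):
    """Empirical density for ONE species.

    records: iterable of (site_id, year, catch) tuples (hypothetical layout).
    Returns {site_id: site catch / total catch}, pooled across all years.
    """
    site_totals = defaultdict(int)
    for site_id, _year, catch in records:
        site_totals[site_id] += catch  # pool catches over years within a site
    grand_total = sum(site_totals.values())
    if grand_total == 0:
        raise ValueError("species was never caught; density is undefined")
    return {site: total / grand_total for site, total in site_totals.items()}


# Made-up catch records, for illustration only.
example_records = [
    ("A", 2010, 4), ("A", 2011, 6),
    ("B", 2010, 0), ("B", 2011, 10),
    ("C", 2011, 0),
]
density = empirical_density(example_records)
```

Sites where the species was never caught receive a density of zero, so they can never receive fish in a simulation driven by these densities.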
The next step is to define the annual population trend for each species. This is done by defining a time series of proportions of an arbitrary maximum abundance. Stochasticity about the trend is included using a log-normal random variate. To scale the population abundance, I begin by specifying a maximum annual abundance of 50,000. The initial annual abundance is then distributed among the sampling sites with non-zero density probability using a multinomial process. For each subsequent year, the annual abundance is distributed across sampling sites with density probability based on the previous year's density. Once the abundance has been distributed spatially and temporally, the abundance at each year/sample id combination is rescaled by the maximum observed species catch from the empirical data and rounded to the nearest integer catch. This step essentially models the number of fish that would be observed (caught) in each year/sample id, scaled according to the empirical data. Using the data in this way scales the maximum catch in each sample id by using the empirical data to extract the product of catchability and maximum abundance for each species. Additionally, distributing the abundance of fish in the following year based on the density of fish in the previous year allows for a degree of autocorrelation, which seems sensible.
Example replicate spatial and temporal distributions are shown for gray triggerfish, gag grouper, red porgy, black sea bass, red snapper, scamp grouper, white grunt, and bank sea bass (Figures 10-17).
The observation model consists of a time series of annual samples and a schema to select those annual samples from the sampling universe. I examined the empirical data to determine potentially useful sampling schemas.
To determine candidate sampling schemas, I attempted several multivariate analyses to look for covariates that explained variation in the community composition of the trap catch. My reasoning was that if community composition varied predictably with particular groupings of covariate values, it could be possible to use those groupings to specify sampling strata.
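The abundance simulation described above might look roughly like the sketch below. Several details are assumptions made only for illustration: the log-normal standard deviation `sigma`, the example value of `max_obs_catch`, and the reading of "rescaled" as dividing by the largest simulated cell and multiplying by the maximum observed catch. The function name `simulate_catch` is hypothetical.

```python
import numpy as np


def simulate_catch(density, trend, max_abundance=50_000, max_obs_catch=30,
                   sigma=0.2, seed=0):
    """Distribute annual abundance among sites and rescale to catch.

    density: per-site empirical density (zero where the species never occurred)
    trend:   per-year proportions of max_abundance
    sigma and max_obs_catch are illustrative assumptions, not source values.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(density, dtype=float)
    p = p / p.sum()
    abundance = np.zeros((len(trend), len(p)), dtype=np.int64)
    for t, frac in enumerate(trend):
        # Log-normal multiplicative noise about the trend (median of 1).
        noise = float(rng.lognormal(0.0, sigma))
        total = int(round(float(frac) * max_abundance * noise))
        abundance[t] = rng.multinomial(total, p)
        if abundance[t].sum() > 0:
            # Next year's probabilities follow this year's realized density.
            p = abundance[t] / abundance[t].sum()
    peak = abundance.max()
    if peak == 0:
        return abundance, abundance.copy()
    # Assumed rescaling: largest simulated cell maps to the max observed catch.
    catch = np.rint(abundance / peak * max_obs_catch).astype(np.int64)
    return abundance, catch


abundance, catch = simulate_catch(density=[0.5, 0.3, 0.2, 0.0],
                                  trend=[1.0, 0.9, 0.8])
```

One consequence of carrying the previous year's realized density forward is that a site that draws zero fish in one year stays at zero in every later year. With 50,000 fish this is unlikely except at very low-density sites, but it may matter for rare species.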
I conducted a redundancy analysis (RDA) to examine how much variation in the fish community composition across sites could be explained by the environmental covariates. In this case the fish community was defined as: gray triggerfish, black sea bass, red grouper, snowy grouper, white grunt, hogfish, mutton snapper, red snapper, gray snapper, gag grouper, scamp grouper, red porgy, vermilion snapper, and yellowmouth grouper. The normalized environmental covariates were: year, set duration, latitude, longitude, depth, temperature, salinity, dissolved oxygen, and Julian date. The species counts were converted from absolute to relative values using the Hellinger transformation. The proportion of variance in the fish community composition explained by the covariates was quite low at approximately 14%. The remaining variance was unexplained.
I conducted a multivariate regression tree (MRT) analysis using the same species composition as for the RDA and with normalized covariates: depth, latitude, and longitude. I restricted consideration to these three covariates because they could be used to structure a stratified sample design. The complexity parameter was used to find a parsimonious tree where additional nodes increased the amount of explained variance by less than 1% (Figure 18). Unfortunately, this structure explained only 17% of the variance (i.e., 1-Error).
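For reference, the Hellinger transformation takes the square root of each species' share of the total catch in a sample. A minimal sketch follows; the function name `hellinger` and the example counts are made up, and the original analysis may have used a library routine instead.

```python
import numpy as np


def hellinger(counts):
    """Hellinger-transform a samples-by-species count matrix.

    Each entry becomes sqrt(count / row total). Rows with no catch stay
    all-zero rather than triggering a division by zero.
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    relative = np.divide(counts, row_totals,
                         out=np.zeros_like(counts), where=row_totals > 0)
    return np.sqrt(relative)


# Three made-up samples of two species.
h = hellinger([[1, 3], [0, 0], [4, 0]])
```

After the transformation, the squared values in each non-empty row sum to one, which reduces the influence of very abundant species on the ordination.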
Since I did not have much luck explaining a significant amount of the community composition variance using environmental covariates, I reasoned that an alternative method to efficiently focus sampling effort might be to develop a diversity weighting based upon the occurrence of a defined community of important species. For each sampling site, the mean number of species occurrences could be computed from the empirical data and used to assign a weight for that site. These weights could then be normalized to a sample inclusion probability and used to select sites via a weighted random sample. I termed this a diversity weighted random sample. The list of important species I used was the same as in the above multivariate analyses: gray triggerfish, black sea bass, red grouper, snowy grouper, white grunt, hogfish, mutton snapper, red snapper, gray snapper, gag grouper, scamp grouper, red porgy, vermilion snapper, and yellowmouth grouper. The diversity weighted sample site inclusion probabilities differ substantially from the constant inclusion probabilities for the simple random sample (Figure 19).
Gray triggerfish, gag grouper, red porgy, black sea bass, red snapper, scamp grouper, white grunt, and bank sea bass were sampled annually in each of 12 years across 500 replicates, where species abundance distribution and sample distribution were annual stochastic processes. Annual sample size was set to a constant 500, and samples were chosen from the sampling universe using either annual simple random samples (SRS) or annual diversity weighted random samples (DIV).
I computed indices for each replicate for each species and sample type (Figures 20-27). Each model used latitude, longitude, and depth for both the negative binomial and binomial portions of the zero-inflated error structure.
I have not focused much on the evaluation process yet. However, a key concern in using indices in stock assessments is the CV of the index. Therefore, I computed the annual CV for each year and species for both the DIV and SRS sampling schemas (Figures 28-35). Considering the abundance trends and sample sizes considered thus far, the DIV schema consistently provides a smaller CV than SRS.
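The diversity weighting can be sketched as below. The sketch assumes that "mean number of species occurrence" means the number of important species present in a sample, averaged over all samples taken at a site. The function names, the presence/absence layout, and the example data are hypothetical.

```python
import numpy as np


def diversity_weights(presence, site_ids):
    """Site inclusion probabilities from mean species occurrence.

    presence: (n_samples, n_species) array, nonzero where a species was caught
    site_ids: length n_samples, the site each sample was taken at
    """
    presence = np.asarray(presence)
    site_ids = np.asarray(site_ids)
    richness = (presence > 0).sum(axis=1)  # important species per sample
    sites = np.unique(site_ids)
    mean_richness = np.array([richness[site_ids == s].mean() for s in sites])
    if mean_richness.sum() == 0:
        raise ValueError("no important species observed at any site")
    return sites, mean_richness / mean_richness.sum()


def draw_sites(sites, probs, n, rng):
    """Weighted random sample of n distinct sites."""
    return rng.choice(sites, size=n, replace=False, p=probs)


# Made-up data: site A sampled twice, B once, C twice; three species.
presence = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
site_ids = ["A", "A", "B", "C", "C"]
sites, probs = diversity_weights(presence, site_ids)
drawn = draw_sites(sites, probs, n=2, rng=np.random.default_rng(1))
```

Because sites are drawn without replacement, the realized inclusion probabilities are only approximately proportional to the weights, and the draw fails if fewer than `n` sites have a non-zero weight. Setting every weight equal recovers the simple random sample.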
I also computed performance scores based on average CV for each species and summed across all species (Figure 36).
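One way such a score could be computed is sketched below. It assumes the annual CV is the standard deviation divided by the mean of the index across replicates in each year; the memo does not state this, and a model-based CV per index would need a different calculation. Function names and the toy indices are hypothetical.

```python
import numpy as np


def annual_cv(index):
    """CV per year across replicates (assumed definition: sd / mean).

    index: (n_replicates, n_years) array for one species and one schema.
    """
    index = np.asarray(index, dtype=float)
    return index.std(axis=0, ddof=1) / index.mean(axis=0)


def performance_score(indices_by_species):
    """Mean CV across years per species, and the sum across species."""
    mean_cv = {species: float(annual_cv(index).mean())
               for species, index in indices_by_species.items()}
    return mean_cv, sum(mean_cv.values())


# Toy indices: two replicates by two years for two made-up species.
toy = {
    "species_x": [[1.0, 2.0], [3.0, 2.0]],
    "species_y": [[2.0, 4.0], [2.0, 4.0]],
}
mean_cv, score = performance_score(toy)
```

A lower score indicates more precise indices overall. Computing the score separately for each sampling schema allows a direct comparison, and species-specific weights could be applied before summing.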
I am relatively sure that I am not sampling the sampling stations as is done in practice. I need clarification as to how samples are chosen.
If a diversity weighted sampling schema is desirable to pursue, what species composition should be used to compute diversity? Would it be sensible to sample sites using multiple diversity indices (e.g., “Important Species” and “Rare Species”) and allocate a proportion of total sample size to each? Another idea is to allocate some proportion of the samples to SRS and the remainder to DIV. Thoughts?
I am currently scaling the observed catches for a particular species by the largest observed catch in the database. Are there problems with doing that?
What scenarios are of interest?
Species to monitor
Sample sizes
Trends
Ideas about how to evaluate performance?
Use mean CVs across time, then compute a species-weighted score based on:
Biomass
Value
Expert ranking of “importance to the SA/USA”
Need for index in assessments