Spat Performance Modelling

Javier Atalah and Deanna Elvines

2025-06-15

Modelling Spat Performance

  • Explore relationships between historical spat performance and environmental data using multiple regressions.

  • Develop a decision-support tool to help farmers adapt spat-selection and seeding practices to improve crop performance.

  • Assess whether local environmental parameters influence spat performance.

  • Use outcomes to guide future monitoring efforts and forecasting in AQU2023-05.

Industry data summaries

Sanford: 1,400 obs. from 91 sites in Banks Peninsula, TOS, SI (July 2007 – April 2024)

NIML: 939 obs. from 9 sites in Coromandel (May 2013 – June 2023)

Cedenco: 439 obs. from 15 sites in TOS (March 2018 – August 2023)

Maclab: 199 obs. from 36 sites in TOS (October 2017 – November 2024)

Map of mean P:I ratios

Response variable

The P:I ratio is the metres of intermediate seed divided by the metres of primary seed, corrected for blue mussel numbers.

\[ \text{P:I ratio} = \frac{\text{Intermediate seed (m)}}{\text{Primary seed (m)}} \]
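As a minimal sketch of the ratio above, the Python function below divides intermediate seed length by primary seed length. The function name and the example lengths are hypothetical, and the blue mussel correction is left out because the slides do not specify how it is applied.

```python
def pi_ratio(intermediate_m: float, primary_m: float) -> float:
    """Return metres of intermediate seed per metre of primary seed.

    Assumption: no blue mussel correction is applied here, because the
    correction method is not described in the source.
    """
    if primary_m <= 0:
        raise ValueError("primary seed length must be positive")
    return intermediate_m / primary_m


# Hypothetical line: 1,000 m of primary seed stripped into 1,500 m of intermediate seed.
ratio = pi_ratio(1500.0, 1000.0)
print(ratio)
```

A ratio above 1 means a line yielded more intermediate seed than the primary seed length it started from.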

Operational variables

  • Seeded Date: date lines were seeded with fresh spat

  • Strip Date: date lines were stripped (interseeding or harvest date)

  • SiteID: farm identifier for primary seeding

  • Owner: site owner (anonymised for confidentiality)

  • Region: broad grow‑out area

Cycle Time

Days between Seeded Date and Strip Date
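Cycle time can be derived directly from the two operational dates. The sketch below uses Python's standard `datetime` module; the function name and the example dates are hypothetical.

```python
from datetime import date


def cycle_time_days(seeded: date, stripped: date) -> int:
    """Number of days between Seeded Date and Strip Date."""
    if stripped < seeded:
        raise ValueError("Strip Date must not precede Seeded Date")
    return (stripped - seeded).days


# Hypothetical line seeded on 1 March 2023 and stripped on 1 September 2023.
days = cycle_time_days(date(2023, 3, 1), date(2023, 9, 1))
print(days)
```

Rejecting records where the strip date precedes the seeding date is one simple way to catch data-entry errors before modelling.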

Environmental data

  • SST – GHRSST MUR v4.1
    • 1 km resolution, daily global coverage
    • Full time series overlap with industry data
  • Chlorophyll-a
    • 4 km (multi-sensor): Full time series overlap
    • 300 m (Sentinel-3 OLCI): Higher resolution, from 2016
    • Combined data prioritise 300 m values (source: data.marine.copernicus.eu)
  • Phytoplankton Proxies
    • By size: PICO, MICRO
    • By group: DIATO, DINO, GREEN, PROKAR, PROCHLO
  • Water Clarity
    • SPM, Secchi depth (ZSD), Light attenuation (KD490)
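One way to read "prioritises 300 m values" for the combined chlorophyll-a series is sketched below: take the 300 m value for a given day when it exists, otherwise fall back to the 4 km value. The fallback rule, the function name and the sample values are assumptions for illustration, not the documented merging procedure.

```python
from datetime import date
from typing import Dict, Optional


def combine_chl(
    chl_300m: Dict[date, Optional[float]],
    chl_4km: Dict[date, Optional[float]],
) -> Dict[date, Optional[float]]:
    """Merge two daily chlorophyll-a series, preferring the 300 m product."""
    combined = {}
    for day in sorted(set(chl_300m) | set(chl_4km)):
        fine = chl_300m.get(day)
        # Assumption: use the 4 km value only when the 300 m value is missing.
        combined[day] = fine if fine is not None else chl_4km.get(day)
    return combined


# Hypothetical values (mg m^-3): the 300 m product has no data before 2016.
coarse = {date(2015, 6, 1): 0.8, date(2017, 6, 1): 1.1, date(2017, 6, 2): 1.0}
fine = {date(2017, 6, 1): 1.4, date(2017, 6, 2): None}
merged = combine_chl(fine, coarse)
print(merged)
```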

High resolution hindcast data

  • National model: NZ Environmental Datacube (Oceanum, for MPI & PFR); public data (2010–2023); ~300–400 m nearshore resolution; coarsest of the three.

  • Marlborough model: 10-year hindcast (2009–2017) for Cook Strait/TOS (developed by Cawthron).

  • Southland model: 10-year hindcast (2010–2019) for Southland–Stewart Island, developed for MPI.

  • Variables extracted at 2 m intervals from 0–20 m depth: wind speed, wave height, current speed, salinity, temperature.

Modelling approach

Modelling process

  • Removed irrelevant and collinear variables.
  • Applied preprocessing (centering, scaling, transformation).
  • Fitted candidate algorithms (e.g. Boosted Regression Trees, Random Forest, Generalised Additive Models).
  • Tuned hyperparameters automatically.
  • Assessed performance using 5-fold cross-validation.
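The preprocessing and resampling steps above can be sketched with the Python standard library alone. Everything here is illustrative: the 0.9 correlation threshold, the helper names and the toy predictor values are assumptions, and the slides do not state which software or settings were actually used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(columns, threshold=0.9):
    """Keep a predictor only if |r| with every already-kept predictor is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def standardise(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def kfold_indices(n, k=5, seed=1):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy predictors (hypothetical values): sst_lag is almost a copy of sst.
columns = {
    "sst": [14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
    "sst_lag": [14.1, 15.2, 15.9, 17.1, 18.0, 19.2],
    "chl": [1.2, 0.4, 0.9, 0.3, 1.1, 0.5],
}
kept = drop_collinear(columns)
scaled = {name: standardise(columns[name]) for name in kept}
folds = kfold_indices(n=23, k=5)
print(kept, [len(f) for f in folds])
```

Each fold is held out once while the model is fitted on the remaining four, so every observation contributes exactly one out-of-sample prediction.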

Predictor correlations

Stochastic Gradient Boosting

Note

The final model, fitted with 1,520 samples and 9 predictors, explained 27% of the variability in P:I ratio, with an average prediction error of 0.32 units. This indicates moderate accuracy and consistent performance.
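To illustrate what stochastic gradient boosting does and how figures like these are computed, the sketch below fits one-split regression trees (stumps) to the residuals of a random subsample at each iteration, then reports R² and RMSE on held-out points. It is a teaching simplification on synthetic data with a single predictor, not the fitted model: the hyperparameters are arbitrary, and the slides do not say whether the reported 0.32 is an RMSE or a mean absolute error.

```python
import math
import random


def fit_stump(x, residuals, rows):
    """Fit a one-split regression tree (stump) to the residuals of the sampled rows."""
    values = sorted({x[i] for i in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        split = (lo + hi) / 2
        left = [residuals[i] for i in rows if x[i] <= split]
        right = [residuals[i] for i in rows if x[i] > split]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    if best is None:  # all sampled x values identical: fall back to a constant
        mean = sum(residuals[i] for i in rows) / len(rows)
        return values[0], mean, mean
    return best[1:]


def boost(x, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=1):
    """Squared-error gradient boosting with a fresh random row subsample per tree."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    k = max(2, int(subsample * len(y)))
    for _ in range(n_trees):
        rows = rng.sample(range(len(y)), k)  # the "stochastic" part
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        split, lmean, rmean = fit_stump(x, residuals, rows)
        stumps.append((split, lmean, rmean))
        pred = [p + learning_rate * (lmean if xi <= split else rmean)
                for p, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model, x_new):
    base, lr, stumps = model
    return [base + lr * sum(l if xi <= s else r for s, l, r in stumps) for xi in x_new]


def r_squared(obs, pred):
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Synthetic data (not the industry dataset): a smooth signal plus noise.
rng = random.Random(42)
x = [i / 10 for i in range(60)]
y = [1.0 + 0.5 * math.sin(xi) + rng.gauss(0, 0.1) for xi in x]
train, test = list(range(0, 60, 2)), list(range(1, 60, 2))
model = boost([x[i] for i in train], [y[i] for i in train])
obs = [y[i] for i in test]
pred = predict(model, [x[i] for i in test])
print(round(r_squared(obs, pred), 2), round(rmse(obs, pred), 2))
```

An R² of 0.27 computed this way would mean the predictions remove 27% of the squared deviation of the observed P:I ratios from their mean.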

Partial dependence plots

Cross-validated observed vs predicted P:I ratio

Complete dataset predictions

Challenges

  • Models are only as good as the data they are fed

  • Missing predictor forecasts hinder operational rollout

  • Limited understanding of critical factors of spat retention

  • Numerous interacting biological, environmental and operational factors

  • Lack of data on key factors affecting performance, such as spat origin or type, may limit predictive accuracy.

Next steps

  • Complete high‑resolution satellite data acquisition

  • Refine predictive models

  • Develop a simplified model using either forecasted variables or the previous month’s values, so it can be operationalised.

  • Validate models using Year 1 and Year 2 deployment data

  • Develop a prototype operational decision-support tool
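For the simplified model that uses the previous month's values, a lagged predictor could be looked up as sketched below. The monthly-mean data structure, the function names and the SST values are hypothetical; the actual lagging scheme has not been decided in these slides.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def previous_month(d: date) -> Tuple[int, int]:
    """Return (year, month) of the calendar month before the given date."""
    return (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)


def lagged_predictor(
    monthly_means: Dict[Tuple[int, int], float], seeded: date
) -> Optional[float]:
    """Previous month's mean value for a seeding date, or None if unavailable."""
    return monthly_means.get(previous_month(seeded))


# Hypothetical monthly mean SST (degrees C) at one site.
sst_monthly = {(2023, 12): 17.2, (2024, 1): 18.9}
print(lagged_predictor(sst_monthly, date(2024, 1, 15)))
print(lagged_predictor(sst_monthly, date(2024, 2, 3)))
```

Because the previous month's value is already observed at seeding time, a model built on such predictors does not depend on environmental forecasts being available.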