MoanaNZ commissioned Cawthron to analyse mortality data associated to the transfer of oyster from Croisilles to farms several farms in the North Island.

The data provide contained a shipment data with metadata about the transferred oysters batches.

## Observations: 208
## Variables: 12
## $ Date of shipment       (time) 2016-03-07, 2016-03-07, 2016-03-07, 20...
## $ Delivery_ID            (chr) "D1", "D2", "D3", "D4", "D5", "D6", "D7...
## $ Date on farm           (time) 2016-03-10, 2016-03-09, 2016-03-09, 20...
## $ Sites                  (chr) "Coromandel (PMF)", "Orongo (PMF)", "Wh...
## $ Batch                  (chr) "3NMay14", "3NFeb14", "3NFeb14", "2NFeb...
## $ Grade                  (dbl) 35, 35, 35, 70, 70, 70, 50, 50, 50, 50,...
## $ g/ind                  (dbl) 9.778357, 9.806107, 9.806107, 41.666667...
## $ Shell blower           (chr) "0", "0", "0", "0", "0", "0", "0", "0",...
## $ Days since grading min (dbl) 11, 11, 11, 14, 18, 18, 14, 13, 14, 13,...
## $ Days since grading max (dbl) 11, 11, 11, 14, 18, 18, 17, 27, 17, 27,...
## $ Days since grade med   (dbl) 11.0, 11.0, 11.0, 14.0, 18.0, 18.0, 15....
## $ # Bags                 (dbl) 1040, 480, 1012, 443, 411, 345, 915, 38...

The sentinel spreadsheet contained oyster delivery and post-transfer count data. Total number of oyster was considered as number of live oyster in the delivery. Dead data was separated between delivery and post-delivery data. The two data sets were split and then joint by Sentinel_ID.

## Observations: 945
## Variables: 8
## $ Delivery_ID (chr) "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D...
## $ Date        (time) 2016-03-23, 2016-03-23, 2016-03-23, 2016-03-23, 2...
## $ Site        (chr) "Coromandel (PMF)", "Coromandel (PMF)", "Coromande...
## $ Sentinel_ID (chr) "T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T...
## $ Live        (dbl) 164, 183, 167, 163, 176, 144, 180, 166, 144, 165, ...
## $ Dead        (dbl) 24, 20, 57, 55, 24, 21, 24, 29, 35, 24, 25, 27, 23...
## $ total       (dbl) 189, 203, 216, 197, 191, 161, 206, 221, 159, 157, ...
## $ dead_trans  (dbl) 31, 7, 6, 16, 28, 46, 7, 9, 35, 42, 15, 12, 14, 12...

Then the shipment data is joint with the sentinel data by ‘Delivery_ID’. Several variables were created:

  1. Transition time was calculated as Date on farm-Date of shipment.
  2. Count days was define as the number of days from delivery to the mortality assessment was calculated as Date - Date on farm.
  3. Month was extracted from the Date variable to account for seasonal effects.
  4. Mortality was calculated as a percentage of the total number of oysters.
  5. Mortality rate was calculated as mortality/count days (individuals per day) and used as main response variable in the subsequent analysis.
  6. The variable ‘ployidy’ was obtained from batch to compared 2N vs. 3N
  7. Biomass was calculated as g/ind*total.

Some other variables were renamed to facilitate code writing: grad_min,grad_max, grad_med, weight and bags.

## Observations: 945
## Variables: 29
## $ Delivery_ID      (chr) "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1...
## $ Date             (time) 2016-03-23, 2016-03-23, 2016-03-23, 2016-03-...
## $ Site             (chr) "Coromandel (PMF)", "Coromandel (PMF)", "Coro...
## $ Sentinel_ID      (chr) "T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8...
## $ Live             (dbl) 164, 183, 167, 163, 176, 144, 180, 166, 144, ...
## $ Dead             (dbl) 24, 20, 57, 55, 24, 21, 24, 29, 35, 24, 25, 2...
## $ total            (dbl) 189, 203, 216, 197, 191, 161, 206, 221, 159, ...
## $ dead_trans       (dbl) 31, 7, 6, 16, 28, 46, 7, 9, 35, 42, 15, 12, 1...
## $ Date of shipment (time) 2016-03-07, 2016-03-07, 2016-03-07, 2016-03-...
## $ Date on farm     (time) 2016-03-10, 2016-03-10, 2016-03-10, 2016-03-...
## $ Sites            (chr) "Coromandel (PMF)", "Coromandel (PMF)", "Coro...
## $ Batch            (chr) "3NMay14", "3NMay14", "3NMay14", "3NMay14", "...
## $ Grade            (dbl) 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 3...
## $ weight           (dbl) 9.778357, 9.778357, 9.778357, 9.778357, 9.778...
## $ Shell blower     (chr) "0", "0", "0", "0", "0", "0", "0", "0", "0", ...
## $ grad_min         (dbl) 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1...
## $ grad_max         (dbl) 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1...
## $ grad_med         (dbl) 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1...
## $ bags             (dbl) 1040, 1040, 1040, 1040, 1040, 1040, 1040, 104...
## $ trans_time       (dbl) 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
## $ count_days       (dbl) 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 1...
## $ month            (dbl) 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
## $ mort             (dbl) 12.765957, 9.852217, 25.446429, 25.229358, 12...
## $ mort_rate        (dbl) 0.9819967, 0.7578628, 1.9574176, 1.9407198, 0...
## $ ploidy           (chr) "3N", "3N", "3N", "3N", "3N", "3N", "3N", "3N...
## $ Age              (chr) "14", "14", "14", "14", "14", "14", "14", "14...
## $ blower           (fctr) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ biomass          (dbl) 1848.110, 1985.007, 2112.125, 1926.336, 1867....
## $ ratio            (dbl) 1.0053191, 1.0000000, 0.9642857, 0.9036697, 0...

Data quality assurance

First the the consistency between the total data at delivery and post-transfer was checked by plotting it in relation to the one-to-one relationship. It seems that there a few points with relatively large difference between these two counts. The are a couple of outliers with total counts >600 individuals.

Check number of days between delivery and assessment (‘Assessment time’). The median is 14 days and the max. is 45 (outlier). There are a couple of negative values which do not make sense.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -48.00   12.00   14.00   15.26   17.00   45.00
## Source: local data frame [1 x 5]
## 
##   Delivery_ID       Date             Site Sentinel_ID count_days
##         (chr)     (time)            (chr)       (chr)      (dbl)
## 1        D145 2016-09-16 Terepo Fisheries        S897         45

Check number of transit days. The median number of transit time is two days. Again there is some negative values that need to be removed.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -2.000   2.000   2.000   2.488   3.000   9.000

Finally, check the ration between total and Live+Dead. A ratio of 1 is expect for consistent values, however this ration ranges between 0.35 and 95, indicating inconsistencies in the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.3488  1.0000  1.0000  1.0190  1.0000  9.5560       2

Data cleaning

Based on data assurance checks a subset of ‘clean’ data was created for the analyses as follow:

  1. Delete observations with count_days >0 and <40
  2. Total number of individuals <400
  3. Ration of total : live+dead >0.9 & <1.1, i.e. <10% of differences between delivery and post-delivery counts.

Re-sketch data based on the ‘clean’ data-set.

Data exploration

Response variable: Mortality rate

First check the distribution of the response variable, i.e. Mortality rate (individuals per day). Mortality rate is an strictly positive continuous variable, showing some degree of right skewness. There is only 5% of zero data mortality data, thus this is not an consideration for the modelling process. This type of data is best modeled using regression models with Gamma errors and a log link. Mortality rate ranged between a minimum of zero and a maximum 3.72 and had a median value of 0.33 ind./day.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1591  0.3330  0.4830  0.5955  3.7290
## mort_rate 
##  3.959276

Relationship between mortality rate and explanatory variables

Site

Oysters were transferred into twelve different sites showing considerably variability in mortality rates. Smallest mortality rates (< 0.1 ind. per day) were recorded at Terepo Fisheries, AV & GD Clifford and Robert Rush, whereas largest mortality rates (>0.4 ind. per day) were recorded at NZ Oyster Co, Coromandel (PMF) and Orongo (PMF). Additionally, Coromandel (PMF) showed the largest variability in mortality rates.

Variance inflation factor

Multi-collinearity among predictor variables (the existence of correlation among predictor variables) can cause issues in regression models by inflation standard errors of the parameters, which means that P-values get larger making it more difficult to detect and effect. As a rule of thumb the variance inflation factor (VIF) should be <5 to be included in the regression analyses, which was the case for all predictor variables in the oyster data-set.

##                GVIF Df GVIF^(1/2Df)
## Site       3.828573 10     1.069429
## grad_min   2.686579  1     1.639079
## grad_max   2.602297  1     1.613164
## bags       2.477343  1     1.573958
## trans_time 1.426177  1     1.194227
## month      2.691661  1     1.640628
## ploidy     1.600759  1     1.265211
## Age        2.141074  2     1.209645
## blower     4.541015  3     1.286843
## biomass    2.102443  1     1.449980

Multiple regression model

A generalised linear model with Gamma errors and a log link was used to model mortality rate data in relation to ten explanatory variables:

  1. Sites as categorical with eleven levels (site names)
  2. Ploidy as fixed two levels (2N and 3N)
  3. Transition time as continuous (days)
  4. Days since grading min (grad_min) as continuous (days)
  5. Days since grading max (grad_max) as continuous (days)
  6. Blower as fixed with four levels:
  • 0 = no blower
  • 1 = through blower on the day of shipment
  • 2 = blower on the day just prior to the shipment
  • 3 = blower a long time before shipment)
  1. Biomass as continuous (g),
  2. Bags as continuous (count of number of bags)
  3. Age as fixed with three levels:
  • 13
  • 14
  • 15
  1. Month (seasonal effect) as categorical with seven levels: 3,5,6,7,8,9,10.

When factors are categorical the first levels is used as a baseline for the comparison in the analyses (and summary table). For example, for the effect of Site tests are done using ‘AV & GD Clifford’ as a baseline for the comparison. Similarly, for blower 0, ploidy 2N, Age 13 and month 3 are use as baseline levels for the respective factors tests. The log of total number of oysters was used a weighting factor in the regression model to give more importance to mortality rates that were calculated from a larger number of individuals.

## GAMLSS-RS iteration 1: Global Deviance = 783.5535 
## GAMLSS-RS iteration 2: Global Deviance = 783.5532
## Start:  AIC= 841.55 
##  mort_rate + 0.01 ~ Site + trans_time + ploidy + grad_min + grad_max +  
##     blower + biomass + bags + Age + factor(month)
## *******************************************************************
## Family:  c("GA", "Gamma") 
## 
## Call:  gamlss(formula = mort_rate + 0.01 ~ Site + trans_time + ploidy +  
##     grad_min + grad_max + blower + bags + Age + factor(month),  
##     family = GA, data = na.omit(moana1), weights = log(total),  
##     trace = FALSE) 
## 
## Fitting method: RS() 
## 
## -------------------------------------------------------------------
## Mu link function:  log
## Mu Coefficients:
##                                 Estimate  Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.26747298  0.18848381  -6.725 3.21e-11 ***
## SiteCoromandel (PMF)          2.16738450  0.14696183  14.748  < 2e-16 ***
## SiteKerikeri (PMF)            1.54878159  0.15486092  10.001  < 2e-16 ***
## SiteNZ Oyster Co              2.43247382  0.16221726  14.995  < 2e-16 ***
## SiteOrongo (PMF)              2.30133525  0.14904895  15.440  < 2e-16 ***
## SiteOwen Robertson            1.30147809  0.23503344   5.537 4.08e-08 ***
## SitePacific Oyster Farms Ltd  1.79705233  0.15564105  11.546  < 2e-16 ***
## SiteParua Bay Oysters         1.32116986  0.16780675   7.873 1.04e-14 ***
## SiteRobert Rush               1.32225040  0.17351398   7.620 6.70e-14 ***
## SiteTerepo Fisheries          1.47198516  0.15116214   9.738  < 2e-16 ***
## SiteWhangaroa (PMF)           1.97001921  0.14791613  13.318  < 2e-16 ***
## trans_time                   -0.04435479  0.01388482  -3.194  0.00145 ** 
## ploidy3N                      0.08674168  0.04081371   2.125  0.03385 *  
## grad_min                      0.00595170  0.00183351   3.246  0.00122 ** 
## grad_max                     -0.00869328  0.00143890  -6.042 2.27e-09 ***
## blower1                       0.62126091  0.11221466   5.536 4.11e-08 ***
## blower2                       0.50794562  0.05663732   8.968  < 2e-16 ***
## blower3                      -0.59634840  0.05137030 -11.609  < 2e-16 ***
## bags                         -0.00034046  0.00004149  -8.205 8.42e-16 ***
## Age14                        -0.12994892  0.08539118  -1.522  0.12843    
## Age15                        -0.36528237  0.08285842  -4.409 1.17e-05 ***
## factor(month)5               -0.96974940  0.08454003 -11.471  < 2e-16 ***
## factor(month)6               -1.19854013  0.08169554 -14.671  < 2e-16 ***
## factor(month)7               -0.83725732  0.08308038 -10.078  < 2e-16 ***
## factor(month)8               -0.74956525  0.08371484  -8.954  < 2e-16 ***
## factor(month)9               -0.62968065  0.09500368  -6.628 6.01e-11 ***
## factor(month)10              -1.05075553  0.10447942 -10.057  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -------------------------------------------------------------------
## Sigma link function:  log
## Sigma Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.234803   0.009649  -24.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -------------------------------------------------------------------
## No. of observations in the fit:  884 
## Degrees of Freedom for the fit:  28
##       Residual Deg. of Freedom:  856 
##                       at cycle:  2 
##  
## Global Deviance:     783.7944 
##             AIC:     839.7944 
##             SBC:     973.7591 
## *******************************************************************

The minimal parsimonious model was then selected based on a step-wise process based on the Akaike Information Criteria. This process eliminate factors that do not contribute to explain the variability in mortality rate.

Model validation

The models assumptions are then checked by visually inspecting the Pearson residuals of the model, which show no evident patterns in the residuals in relation to the fitted values (e.g. non-linear patterns). The equally spread residuals around zero without distinct patterns, is a good indication of the absence of non-linear relationships.

The densirty estiamte and Q-Q normal plots show the residuals have bell shaped distributiona and are lined well on the straight red line, i.e. residuals are normally distributed.

## *******************************************************************
##        Summary of the Quantile Residuals
##                            mean   =  0.0006235028 
##                        variance   =  1.011227 
##                coef. of skewness  =  -0.03773682 
##                coef. of kurtosis  =  0.6268597 
## Filliben correlation coefficient  =  0.9976007 
## *******************************************************************

Model interpretation

Given the minimal model met the assumptions satisfactorily, it can be use to make inferences about the factors affecting oyster mortality associated to the transport into North Island farms.

The minimal regression model indicates significant effects for Site, grad_max, blower, bags, Age and month as indicated by the summary table below. Even though not significant ‘Transition time’ and ‘grad_min’ were retained in the model.

It is easier to visualize the point estimates and measures of uncertainty of the fitted model using the coefficient plot. Negative coefficient has a negative effect on the response variable (i.e. mortality rate) and vice-versa. The magnitude of the coefficient if relative to the effect size, i.e. larger coefficient (either negative an positive) have comparatively larger effects on the response. The confidence interval around the coefficient provides and indication of the uncertainty of the effect and when they overlap with zero they tend to be non-significant or have small effect size.

The coefficient plot below shows that month of the year has a relative large effect, with all months having a negative effect in mortality in relation to the baseline month 3. In other words, March has significantly higher mortality rates compared to all other months.

The effect of Age was marginal (near non-significant) as indicated by the overlap of the confidence interval with zero. The is a small lower mortality for batches from 14 and 15 compared to 13 that was used as a baseline for the comparisons.

The coefficient for Number of bags was virtually zero, even though significant in the model, it doesn’t have an perceivable effect on mortality rates.

Blower had a significant effect with oysters that went through the machine either on the on the day of shipment (1) or on the day just prior to the shipment (2) had significantly higher mortality rates compared to those that did not went through the blower (0, used as baseline) or went through the blower a long time before shipment. This effect makes sense as the blower could be viewed as a source of stress for the oyster. However, this may be confounded by the fact that oysters that batches with low mortality do not go through the blower to remove dead shells?

The effects of days since grading max and min, and transition time are virtually negligible although were kept in the model during the selection process.

Finally, Site has a large significant effect with all sites showing a positive effect in mortality compared to the baseline level ‘AV & GD Clifford’. Consistent with the data exploration NZ Oyster Co, Coromandel (PMF) and Orongo (PMF) had the largest effect sizes. The large and significant effect of Site can be indicative of a range of other factors that are not available in the data-set, including oyster handling at the farm and environmental variables.

## *******************************************************************
## Family:  c("GA", "Gamma") 
## 
## Call:  gamlss(formula = mort_rate + 0.01 ~ Site + trans_time + ploidy +  
##     grad_min + grad_max + blower + bags + Age + factor(month),  
##     family = GA, data = na.omit(moana1), weights = log(total),  
##     trace = FALSE) 
## 
## Fitting method: RS() 
## 
## -------------------------------------------------------------------
## Mu link function:  log
## Mu Coefficients:
##                                 Estimate  Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.26747298  0.18848381  -6.725 3.21e-11 ***
## SiteCoromandel (PMF)          2.16738450  0.14696183  14.748  < 2e-16 ***
## SiteKerikeri (PMF)            1.54878159  0.15486092  10.001  < 2e-16 ***
## SiteNZ Oyster Co              2.43247382  0.16221726  14.995  < 2e-16 ***
## SiteOrongo (PMF)              2.30133525  0.14904895  15.440  < 2e-16 ***
## SiteOwen Robertson            1.30147809  0.23503344   5.537 4.08e-08 ***
## SitePacific Oyster Farms Ltd  1.79705233  0.15564105  11.546  < 2e-16 ***
## SiteParua Bay Oysters         1.32116986  0.16780675   7.873 1.04e-14 ***
## SiteRobert Rush               1.32225040  0.17351398   7.620 6.70e-14 ***
## SiteTerepo Fisheries          1.47198516  0.15116214   9.738  < 2e-16 ***
## SiteWhangaroa (PMF)           1.97001921  0.14791613  13.318  < 2e-16 ***
## trans_time                   -0.04435479  0.01388482  -3.194  0.00145 ** 
## ploidy3N                      0.08674168  0.04081371   2.125  0.03385 *  
## grad_min                      0.00595170  0.00183351   3.246  0.00122 ** 
## grad_max                     -0.00869328  0.00143890  -6.042 2.27e-09 ***
## blower1                       0.62126091  0.11221466   5.536 4.11e-08 ***
## blower2                       0.50794562  0.05663732   8.968  < 2e-16 ***
## blower3                      -0.59634840  0.05137030 -11.609  < 2e-16 ***
## bags                         -0.00034046  0.00004149  -8.205 8.42e-16 ***
## Age14                        -0.12994892  0.08539118  -1.522  0.12843    
## Age15                        -0.36528237  0.08285842  -4.409 1.17e-05 ***
## factor(month)5               -0.96974940  0.08454003 -11.471  < 2e-16 ***
## factor(month)6               -1.19854013  0.08169554 -14.671  < 2e-16 ***
## factor(month)7               -0.83725732  0.08308038 -10.078  < 2e-16 ***
## factor(month)8               -0.74956525  0.08371484  -8.954  < 2e-16 ***
## factor(month)9               -0.62968065  0.09500368  -6.628 6.01e-11 ***
## factor(month)10              -1.05075553  0.10447942 -10.057  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -------------------------------------------------------------------
## Sigma link function:  log
## Sigma Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.234803   0.009649  -24.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -------------------------------------------------------------------
## No. of observations in the fit:  884 
## Degrees of Freedom for the fit:  28
##       Residual Deg. of Freedom:  856 
##                       at cycle:  2 
##  
## Global Deviance:     783.7944 
##             AIC:     839.7944 
##             SBC:     973.7591 
## *******************************************************************

Partial plots

Final model is presented as partial effects plots, which show the effect of each predictor variable conditional to others in the model. The partial effects of each predictor are displayed as regression lines or boxplots, in the case of categorical variables, showing either negative or positive effects relative to the overall mean of the response variable centred on zero. Plot amplitude (the range of the marginal response on the Y-axis) is directly related to a predictor variable’s importance; amplitude is large for predictor variables with high importance. Partial plots also show standard errors around the fitted spline and partial residuals for each observation.

Model performance

Model performance was summarized using four statistics; regression R2, Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS) and the relative root mean square error (RSR).

The regression R2 value is the coefficient of determination derived from a regression of the observations against the predictions. The model had a an R2 value of 0.3 indicating that its predictive power is weak.

The model had an NSE values of 0.3, which indicates how closely the observations coincide with predictions. NSE values range from −∞ to 1. An NSE of 1 corresponds to a perfect match between predictions and the observations. An NSE of 0 indicates the model is only as accurate as the mean of the observed data and values less than 0 indicate the model predictions are less accurate than using the mean of the observed data. As a rule of thumb a satisfactory predictive model NSE should be >0.5, thus this confirm the weak predictive power of the model.

Bias measures the average tendency of the predicted values to be larger or smaller than the observed values. Optimal bias is zero, positive values indicate underestimation bias and negative values indicate overestimation bias. The oyster mortality model had a percentage bias of 2.7%, indicative of a very small overestimation in the predictions.

RSR is calculated as the ratio of the root mean square error to the standard deviation of the observations a measure of the characteristic model uncertainty. A rule of thumb is that model predictions are satisfactory if RSR < 0.70, thus again it confirms the weak predictive ability of the model.

R2 NSE PBIAS % RSR
0.3 0.3 2.7 0.83