AIM: Which predictors best explain the ratio of Ebed between Brown, Red and Green algae? We investigate this question using a GAM approach. We focus on the ratios \(\frac{Ebed_{brown}}{Ebed_{red}}\), \(\frac{Ebed_{green}}{Ebed_{red}}\) and \(\frac{Ebed_{green}}{Ebed_{brown}}\) as response variables, influenced by the following main predictors: \(ag(440)\), \(chl\), \(tsm\), \(nap_{astar}\) and \(z\).

Document note: the maps, plots and app in this document are interactive, so give them a try. Zoom in and out on the maps and on the plots, and click on a legend entry to select and display only the time series you need.

Table of contents

  1. Synthetic data

  2. GAM - Single predictor

  3. GAM - Multiple predictors, no interaction

  4. GAM - Multiple predictors + interactions

  5. Conclusion

Synthetic data

Spectral F0, a, bb, Kd, Ebed and action spectra of Green, Brown and Red algae.

Distribution of the 1,000 sets of synthetic chl, tsm, ag440 and zz.

Distribution of Ebed PAR.

Distribution of the Ebed ratios Green/Brown, Brown/Red and Green/Red.

Observations:

  • The depth distribution (zz) could better represent shallow depths, for instance by drawing zz from a uniform distribution.
  • Brown always wins over Green: why such a clear-cut result?
  • Red almost always wins over Brown and Green. Is this because there are few shallow depths with low chl, tsm and ag440?
  • Why are some Ebed values so high?

GAM - Single predictor

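library(tidyverse) # read_csv(), dplyr verbs, pivot_longer(), ggplot2
library(mgcv)      # gam(), s(), te()
library(broom)     # glance() for gam objects
library(knitr)     # kable()
library(visibly)   # plot_gam(), plot_gam_3d() (assumed source of these helpers)

vars <- read_csv('Synth_vars_RGB.csv')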

vars_gam_BR <- vars %>% 
  dplyr::select(-c(Ebed_par,EbedAbs_green,EbedAbs_brown,EbedAbs_red)) %>% 
  pivot_longer(-Ebed_BR_ratio,values_to = "values",names_to = "variables")


pgam_BR <- ggplot(vars_gam_BR,aes(x=values,y=Ebed_BR_ratio)) + geom_point(color='#ff5500',alpha=.75) +
  geom_smooth(se=F, lwd=.5, color='#00aaff',method='gam') + facet_wrap(~variables, scales='free_x') +
  labs(x='') + theme_light()
pgam_BR

Figure 1 - Ratio between Ebed_brown and Ebed_red as a function of each predictor. The blue lines are GAM fits using a single predictor.

As expected, the Ebed B/R ratio decreases with increasing ag440, tsm and chl. In other words, red algae do better than brown algae at higher ag440, tsm and chl. The ratio also decreases with depth (zz), which is expected too, as red algae are known to do better at greater depths. As expected, there is no relationship between Ebed B/R and n, i.e. no sampling bias.

The Ebed B/R ratio also shows interesting relationships with the two other ratios. When red algae do better than brown (low B/R ratio), red also do better than green (Ebed_GR_ratio). This is expected given the similar photosynthetic requirements and abilities of brown and green algae. Finally, when brown do better than green, red do better than brown (Ebed_GB_ratio). In the following, we focus on the individual and interactive effects of chl, tsm, ag440 and depth on the Ebed RGB ratios.

AIC of single-predictor models

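# Fit one single-predictor GAM per variable and keep its AIC
single_aic <- vars_gam_BR %>%
  group_by(variables) %>%
  group_modify(~ glance(gam(Ebed_BR_ratio ~ s(values, bs = 'cr'), data = .x))) %>%
  select(variables, AIC)
kable(single_aic, caption = 'Table 1 - AIC for GAMs using a single predictor.')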
Table 1 - AIC for GAMs using a single predictor.

variables        AIC
ag440            -2298.604
chl              -2183.246
Ebed_GB_ratio    -2497.060
Ebed_GR_ratio    -4489.205
n                -2122.814
tsm              -2246.249
zz               -2403.018

Observations:

  • Leaving aside Ebed_GR_ratio and Ebed_GB_ratio, the model with zz has the lowest AIC, so depth is potentially the best single predictor of the Ebed B/R ratio.
  • It is followed by ag440, tsm and chl, in increasing order of AIC.
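To put the AIC gaps in Table 1 on a common scale, one option (not part of the original analysis) is to convert them into ΔAIC and Akaike weights, \(w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)\). The sketch below does this in base R using the AIC values copied from Table 1. As in the observations above, it leaves out the two other Ebed ratios.

```r
# AIC values copied from Table 1 (single-predictor GAMs of Ebed_BR_ratio),
# excluding Ebed_GB_ratio and Ebed_GR_ratio
aic <- c(ag440 = -2298.604, chl = -2183.246, n = -2122.814,
         tsm = -2246.249, zz = -2403.018)

# Delta AIC: distance to the best (lowest-AIC) model
delta <- aic - min(aic)

# Akaike weights: relative likelihoods exp(-delta/2), normalised to sum to 1
w <- exp(-delta / 2) / sum(exp(-delta / 2))

# Rank predictors from best to worst
ranking <- data.frame(delta_AIC = round(delta, 1), weight = signif(w, 3))
ranking[order(ranking$delta_AIC), ]
```

Here zz is about 104 AIC units ahead of ag440, so it carries essentially all of the weight. This quantifies the ranking, but it says nothing about how much of the deviance each predictor explains.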

zz

mod_gam_zz <- gam(Ebed_BR_ratio ~ s(zz, bs="cr"), data=vars) #cr:  cubic regression splines
summary(mod_gam_zz)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Ebed_BR_ratio ~ s(zz, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.746989   0.002291     326   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##         edf Ref.df     F p-value    
## s(zz) 5.788    6.8 48.66  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.248   Deviance explained = 25.2%
## GCV = 0.0052852  Scale est. = 0.0052493  n = 1000

Deviance explained: 25.2%

tsm

mod_gam_tsm <- gam(Ebed_BR_ratio ~ s(tsm, bs="cr"), data=vars) #cr:  cubic regression splines
summary(mod_gam_tsm)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Ebed_BR_ratio ~ s(tsm, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.746989   0.002476   301.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df    F p-value    
## s(tsm) 7.782  8.374 17.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.122   Deviance explained = 12.9%
## GCV = 0.0061824  Scale est. = 0.0061281  n = 1000

Deviance explained: 12.9%

chl

mod_gam_chl <- gam(Ebed_BR_ratio ~ s(chl, bs="cr"), data=vars)
summary(mod_gam_chl)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Ebed_BR_ratio ~ s(chl, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.746989   0.002555   292.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df     F p-value    
## s(chl) 7.481  8.137 9.172  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0646   Deviance explained = 7.16%
## GCV = 0.0065844  Scale est. = 0.0065286  n = 1000

Deviance explained: 7.16%

ag440

mod_gam_ag440 <- gam(Ebed_BR_ratio ~ s(ag440, bs="cr"), data=vars)
summary(mod_gam_ag440)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Ebed_BR_ratio ~ s(ag440, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.746989   0.002414   309.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F p-value    
## s(ag440) 5.849  6.724 29.83  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.165   Deviance explained =   17%
## GCV = 0.0058669  Scale est. = 0.0058267  n = 1000

Deviance explained: 17%

Observations:

  • Deviance explained by zz: 25.2%
  • Followed by ag440 (17%), tsm (12.9%), chl (7.16%)
  • These values are low \(\rightarrow\) no single predictor explains much of the Ebed B/R ratio, so, unsurprisingly, multiple predictors are needed.
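The deviance explained reported by summary() is the reduction from the null deviance to the residual deviance, divided by the null deviance. For a Gaussian GAM with identity link, this is the unadjusted R², which is why it is slightly higher than R-sq.(adj) in the outputs above. The minimal sketch below fits one single-predictor GAM per predictor in a loop and computes this quantity by hand. It uses hypothetical toy data (x1, x2, y), not the synthetic Ebed data.

```r
library(mgcv)  # gam(), s()

# Hypothetical toy data: y depends strongly on x1 and weakly on x2
set.seed(42)
n <- 300
toy <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
toy$y <- sin(toy$x1 / 2) + 0.1 * toy$x2 + rnorm(n, sd = 0.5)

# One single-predictor GAM per predictor, keeping the deviance explained
dev_expl <- sapply(c("x1", "x2"), function(v) {
  f <- as.formula(paste0("y ~ s(", v, ", bs = 'cr')"))
  m <- gam(f, data = toy)
  # (null deviance - residual deviance) / null deviance
  (m$null.deviance - m$deviance) / m$null.deviance
})

round(100 * dev_expl, 1)  # in %, as reported by summary()
```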

GAM - Multiple predictors, no interaction

All predictors

mod_gam2 <- gam(Ebed_BR_ratio ~ s(chl, bs="cr") + s(tsm, bs="cr") + s(ag440, bs="cr") + s(zz, bs="cr"), data=vars)
summary(mod_gam2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Ebed_BR_ratio ~ s(chl, bs = "cr") + s(tsm, bs = "cr") + s(ag440, 
##     bs = "cr") + s(zz, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.746989   0.001684   443.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F p-value    
## s(chl)   3.181  3.623 37.31  <2e-16 ***
## s(tsm)   8.326  8.770 35.57  <2e-16 ***
## s(ag440) 6.610  7.477 53.81  <2e-16 ***
## s(zz)    5.696  6.707 89.42  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.594   Deviance explained = 60.4%
## GCV = 0.0029068  Scale est. = 0.0028346  n = 1000
plot_gam(mod_gam2, ncol = 2) + ylab("Pred. Ebed B/R") + theme_light()#all predictors

Figure 2 - Ratio between Ebed_brown and Ebed_red as a function of each predictor. The red lines are fitted using a GAM with multiple predictors.
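The per-predictor curves in Figure 2 presumably correspond to the individual smooth terms of the additive model. The sketch below extracts those partial effects directly with mgcv, without relying on how plot_gam works internally. It uses hypothetical toy data. predict(type = "terms") returns one centred column per smooth. Adding back the intercept (the "constant" attribute) recovers the fitted values.

```r
library(mgcv)

# Hypothetical toy data with two additive effects
set.seed(1)
n <- 400
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- sin(2 * pi * toy$x1) + 2 * toy$x2^2 + rnorm(n, sd = 0.3)

m_add <- gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr"), data = toy)

# One column per smooth term, each centred over the data (sum-to-zero constraint)
terms_mat <- predict(m_add, type = "terms")
colnames(terms_mat)

# Intercept + sum of the partial effects = fitted values
recon <- rowSums(terms_mat) + attr(terms_mat, "constant")
all.equal(as.numeric(recon), as.numeric(fitted(m_add)))
```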

plot_gam_3d(model = mod_gam2, main_var = tsm, second_var = chl, palette='bilbao', direction = -1) 

Figure 3 - Predicted ratio between Ebed_brown and Ebed_red from a GAM with multiple predictors (zz, tsm, chl, ag440), shown as a function of chl and tsm.
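Surfaces like the one in Figure 3 are obtained by predicting the fitted GAM over a grid of two predictors while the remaining predictors are held fixed. We have not checked which values plot_gam_3d uses for those remaining predictors. The minimal sketch below, on hypothetical toy data, assumes they are held at their medians and uses only mgcv and base R. mgcv's vis.gam() offers a similar built-in view.

```r
library(mgcv)

# Hypothetical toy data with three predictors
set.seed(2)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
toy$y <- toy$x1 - toy$x2^2 + 0.5 * toy$x3 + rnorm(n, sd = 0.2)
m <- gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr") + s(x3, bs = "cr"), data = toy)

# 30 x 30 grid over the two displayed predictors
x1_seq <- seq(min(toy$x1), max(toy$x1), length.out = 30)
x2_seq <- seq(min(toy$x2), max(toy$x2), length.out = 30)
grid <- expand.grid(x1 = x1_seq, x2 = x2_seq)
# Assumption: the non-displayed predictor is held at its median
grid$x3 <- median(toy$x3)
grid$pred <- as.numeric(predict(m, newdata = grid))

# expand.grid varies x1 fastest, so rows follow x1 and columns follow x2
z <- matrix(grid$pred, nrow = length(x1_seq), ncol = length(x2_seq))
# persp(x1_seq, x2_seq, z, theta = 40, xlab = "x1", ylab = "x2", zlab = "predicted y")
```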

#plot_gam_3d(model = mod_gam2, main_var = chl, second_var = zz, palette='bilbao', direction = -1)
plot_gam_3d(model = mod_gam2, main_var = tsm, second_var = zz, palette='bilbao', direction = -1)

Figure 4 - Predicted ratio between Ebed_brown and Ebed_red from a GAM with multiple predictors (zz, tsm, chl, ag440), shown as a function of zz and tsm.

plot_gam_3d(model = mod_gam2, main_var = ag440, second_var = zz, palette='bilbao', direction = -1)

Figure 5 - Predicted ratio between Ebed_brown and Ebed_red from a GAM with multiple predictors (zz, tsm, chl, ag440), shown as a function of zz and ag440.

Observations:

  • Deviance explained by all predictors: 60.4%.
  • Greater depths and higher ag440, tsm and chl favour Red over Brown.
  • What if we take interactions into account?

GAM - Multiple predictors + interactions

tsm, chl and ag440

mod_gam2_int <- gam(Ebed_BR_ratio ~ te(tsm, chl,ag440, bs='cr') + s(zz, bs = 'cr'), data=vars)
summary(mod_gam2_int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Ebed_BR_ratio ~ te(tsm, chl, ag440, bs = "cr") + s(zz, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.746989   0.001606   465.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf Ref.df     F p-value    
## te(tsm,chl,ag440) 57.145  66.15 16.13  <2e-16 ***
## s(zz)              5.271   6.27 98.36  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.63   Deviance explained = 65.4%
## GCV = 0.002754  Scale est. = 0.0025794  n = 1000

Deviance explained: 65.4%
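te(tsm, chl, ag440) bundles the main effects of the three predictors and their interaction into a single smooth. Its summary line therefore cannot show how much the interaction itself contributes. One possible alternative, not used in this analysis, is mgcv's ti() term, which keeps only the pure interaction so that the main effects can be fitted as separate s() terms. The sketch below shows this decomposition on hypothetical two-predictor toy data. For the Ebed model, the analogue would be something like s(tsm) + s(chl) + s(ag440) + ti(tsm, chl, ag440) + s(zz), possibly with pairwise ti() terms. That model has not been fitted here.

```r
library(mgcv)

# Hypothetical toy data with a genuine x1-by-x2 interaction
set.seed(3)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- toy$x1 + toy$x2 + 3 * toy$x1 * toy$x2 + rnorm(n, sd = 0.2)

# te(): main effects and interaction in one smooth (one summary row)
m_te <- gam(y ~ te(x1, x2, bs = "cr"), data = toy)

# ti(): main effects as s() terms, the pure interaction in its own term
m_ti <- gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr") + ti(x1, x2, bs = "cr"),
            data = toy)
summary(m_ti)$s.table
```

A small p-value on the ti() row indicates that the interaction adds something beyond the main effects.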

plot_gam_3d(model = mod_gam2_int, main_var = tsm, second_var = zz, palette='bilbao', direction = -1)

Figure 6 - Predicted ratio between Ebed_brown and Ebed_red from a GAM with multiple predictors and an interaction between tsm, chl and ag440, shown as a function of zz and tsm.

plot_gam_3d(model = mod_gam2_int, main_var = tsm, second_var = chl, palette='bilbao', direction = -1)

Conclusion - Model comparison

AIC(mod_gam_zz,mod_gam_chl,mod_gam_tsm,mod_gam_ag440,mod_gam2,mod_gam2_int)
##                      df       AIC
## mod_gam_zz     7.787644 -2403.018
## mod_gam_chl    9.480988 -2183.246
## mod_gam_tsm    9.781875 -2246.249
## mod_gam_ag440  7.849464 -2298.604
## mod_gam2      25.812641 -3001.467
## mod_gam2_int  64.416126 -3059.008

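Accounting for the interaction between tsm, chl and ag440 gives the lowest AIC (-3059.0 vs -3001.5 for the additive model mod_gam2) and the highest deviance explained (65.4% vs 60.4%). However, the improvement over the model without interaction is modest. It adds 5 percentage points of deviance explained, at the cost of a much more complex model (64.4 vs 25.8 effective degrees of freedom).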

These results provide some evidence that red algae do better than brown at greater depths and under higher chl, tsm and ag440.
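This kind of comparison can be reproduced in a few lines. AIC() applied to several fitted gam objects returns their degrees of freedom and AIC, and ΔAIC and the deviance explained can then be appended. The sketch below uses hypothetical toy data with a built-in x1-by-x2 interaction, so the te() model should clearly win here. With the Ebed data, the gap in deviance explained was only 5 percentage points.

```r
library(mgcv)

# Hypothetical toy data with a genuine interaction
set.seed(4)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- toy$x1 + toy$x2 + 3 * toy$x1 * toy$x2 + rnorm(n, sd = 0.2)

m_add <- gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr"), data = toy)  # no interaction
m_int <- gam(y ~ te(x1, x2, bs = "cr"), data = toy)                # with interaction

# Degrees of freedom and AIC per model (lower AIC is better)
cmp <- AIC(m_add, m_int)
cmp$delta_AIC <- cmp$AIC - min(cmp$AIC)
# Proportion of deviance explained per model
cmp$dev_expl <- c(summary(m_add)$dev.expl, summary(m_int)$dev.expl)
cmp
```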

What can be changed

Changing the distribution of depth could be worth investigating, for instance by sampling more shallow waters.

Also, red algae win most of the time here, even in shallow waters, whereas we would expect brown algae to win in waters with low to intermediate chl, tsm and ag440.
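As a sketch of the depth suggestion above, the two options below draw 1,000 depths in one of two ways: uniformly, or from a shallow-skewed Beta(1, 3) distribution rescaled to a maximum depth. The 30 m maximum depth and the Beta shape parameters are hypothetical choices, not values from the original simulation.

```r
set.seed(5)
n_sets <- 1000
z_max  <- 30  # hypothetical maximum depth (m)

# Option 1: uniform depths, every depth equally likely
zz_uniform <- runif(n_sets, min = 0, max = z_max)

# Option 2: shallow-skewed depths; Beta(1, 3) puts most mass near 0,
# so P(zz < z_max / 2) = 1 - 0.5^3 = 0.875
zz_shallow <- z_max * rbeta(n_sets, shape1 = 1, shape2 = 3)

# Share of sets shallower than half the maximum depth under each option
share_shallow <- c(uniform = mean(zz_uniform < z_max / 2),
                   shallow = mean(zz_shallow < z_max / 2))
share_shallow
```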