AIM : How to best predict Ebed algal (Brown, Red and Green) using (Ebed PAR int?) Ebed at different lambda (443, 488, 531, 555, 645)? We investigate the question using a linear multiple regression and GAM approach. Synthetic data generated by synthetic+Ebedspectral_29.pro from Matt Pinkerton and synth_EbedAlgal_05.hdf.

Document Note : Some of the plots within this document are interactive so make sure you give them a play like zooming in and out.

Table of contents

  1. Synthetic data

  2. Predicting E algal Brown

  3. Predicting E algal Red

  4. Predicting E algal Green

  5. Conclusion

Synthetic data

vars <- read_csv('Synth_Ealgal_Elam_Kdlam_EbedInt_05.csv') %>% 
  dplyr::select(n,Ebed_443,Ebed_488,Ebed_531,Ebed_555,Ebed_645,Ebed_par,EbedAction_green,EbedAction_brown,EbedAction_red)

p <- vars %>% 
  pivot_longer(-n,names_to = 'Vars',values_to = 'Values') %>%
  ggplot(data=.) +
  geom_histogram(aes(x=Values)) +
  facet_wrap(~Vars, scales = 'free') +
  theme_light()
p
Figure 1 - Distribution of variables.

Figure 1 - Distribution of variables.

Observations:

vars_cor<-rcorr(as.matrix(vars))
corrplot(vars_cor$r,type="upper", method="number")
Figure 2 - Correlation of variables.

Figure 2 - Correlation of variables.

#GGally::ggpairs(vars, title="correlogram with ggpairs()") #-> Sick Viz but too heavy

Observations:

Predicting E algal Brown

Linear regression

vars_brown <- vars %>% 
  pivot_longer(-EbedAction_brown,names_to = 'Vars',values_to = 'Values')
  
p <- ggplot(data=vars_brown,aes(x=Values,y=EbedAction_brown)) +
  geom_point() +
  facet_wrap(~Vars, scales = 'free') +
  geom_smooth(method = 'lm') +
  geom_smooth(method = 'lm', formula = y~ poly(x,2), col='red') +
  theme_light()
p
Figure 3 - Linear model to explain E algal Brown.

Figure 3 - Linear model to explain E algal Brown.

kable(do(group_by(vars_brown,Vars), glance(lm(EbedAction_brown ~ Values, data = .)))
 ,caption = 'Table 1 - Linear model.')
Table 1 - Linear model.
Vars r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
Ebed_443 0.9325291 0.9325224 81.942676 1.381845e+05 0.0000000 1 -58248.58 116503.17 116524.80 67132591.5 9998 10000
Ebed_488 0.9683271 0.9683240 56.143003 3.056666e+05 0.0000000 1 -54467.41 108940.81 108962.44 31514063.7 9998 10000
Ebed_531 0.9326838 0.9326771 81.848700 1.385250e+05 0.0000000 1 -58237.11 116480.22 116501.85 66978698.0 9998 10000
Ebed_555 0.9162009 0.9161925 91.321200 1.093112e+05 0.0000000 1 -59332.21 118670.43 118692.06 83378937.0 9998 10000
Ebed_645 0.8803794 0.8803675 109.107584 7.358295e+04 0.0000000 1 -61111.73 122229.46 122251.09 119020839.5 9998 10000
Ebed_par 0.9966446 0.9966443 18.273502 2.969709e+06 0.0000000 1 -43242.91 86491.81 86513.44 3338540.9 9998 10000
EbedAction_green 0.9991750 0.9991749 9.060987 1.210898e+07 0.0000000 1 -36228.17 72462.33 72483.96 820850.7 9998 10000
EbedAction_red 0.9881327 0.9881315 34.365981 8.324821e+05 0.0000000 1 -49559.06 99124.11 99145.74 11807844.8 9998 10000
n 0.0000001 -0.0001000 315.465592 6.246000e-04 0.9800625 1 -71728.88 143463.76 143485.39 994986362.0 9998 10000

Observations:

  • Very good linear relationships with Ebed_par but we remove it from the next as we want to use Ebed(\(\lambda\)) to predict Ealgal
  • Good-ish linear relationships with Ebed(\(\lambda\)). Some more expo fit-like.
  • Also with EbedAction Green and Red, not used in the next. We are also interested in explaining deviation in their relationships, but another story.
kable(do(group_by(vars_brown,Vars), glance(lm(EbedAction_brown ~ poly(Values,2), data = .)))
 ,caption = 'Table 1bis - Quadratic model.')
Table 1bis - Quadratic model.
Vars r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
Ebed_443 0.9518384 0.9518288 69.23475 9.878753e+04 0.000000 2 -56562.91 113133.83 113162.67 47920125 9997 10000
Ebed_488 0.9791455 0.9791413 45.55898 2.346855e+05 0.000000 2 -52377.96 104763.93 104792.77 20749982 9997 10000
Ebed_531 0.9915927 0.9915910 28.92699 5.895411e+05 0.000000 2 -47835.63 95679.27 95708.11 8365195 9997 10000
Ebed_555 0.9876471 0.9876446 35.06378 3.996424e+05 0.000000 2 -49759.57 99527.14 99555.98 12290995 9997 10000
Ebed_645 0.9001250 0.9001051 99.70158 4.504908e+04 0.000000 2 -60209.70 120427.40 120456.24 99374233 9997 10000
Ebed_par 0.9977968 0.9977964 14.80806 2.263772e+06 0.000000 2 -41139.60 82287.21 82316.05 2192130 9997 10000
EbedAction_green 0.9994758 0.9994757 7.22296 9.530793e+06 0.000000 2 -33960.53 67929.07 67957.91 521555 9997 10000
EbedAction_red 0.9982650 0.9982647 13.14072 2.876038e+06 0.000000 2 -39945.04 79898.09 79926.93 1726268 9997 10000
n 0.0001732 -0.0000268 315.45405 8.660243e-01 0.420652 2 -71728.02 143464.03 143492.87 994814066 9997 10000

Observations:

  • Higher \(R^2\) for Ebeds with quadratic relationship, makes sense according to figure 3, red lines.

Multiple linear regression

vars_mlm_b <- lm(EbedAction_brown ~ Ebed_443 + Ebed_488 + Ebed_531 + Ebed_555 + Ebed_645, data=vars)
summary(vars_mlm_b)
## 
## Call:
## lm(formula = EbedAction_brown ~ Ebed_443 + Ebed_488 + Ebed_531 + 
##     Ebed_555 + Ebed_645, data = vars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -60.339  -1.598  -0.132   0.911  83.269 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.09201    0.10658   0.863    0.388    
## Ebed_443    513.02879    2.28142 224.873   <2e-16 ***
## Ebed_488     51.22680    4.38984  11.669   <2e-16 ***
## Ebed_531    241.61329    7.82157  30.891   <2e-16 ***
## Ebed_555    126.08296    5.65456  22.298   <2e-16 ***
## Ebed_645    490.32286    1.48224 330.799   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.466 on 9994 degrees of freedom
## Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
## F-statistic: 3.568e+06 on 5 and 9994 DF,  p-value: < 2.2e-16

Observations:

  • All predictors significant
  • \(R^2 = 0.9994\) better than just with taking Ebed_par alone…
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(vars_mlm_b)
Figure 4 - Assumptions check of linear model to explain E algal Brown.

Figure 4 - Assumptions check of linear model to explain E algal Brown.

Observations:

  • Q-Q plot suggests more extreme values than “pure” normal distribution.
  • What about using polynomial regression giving shape of figure 3?

Polynomial regression

vars_mlmp_b <- lm(EbedAction_brown ~ poly(Ebed_443,2) + poly(Ebed_488,2) + poly(Ebed_531,2) + poly(Ebed_555,2) + poly(Ebed_645,2), data=vars)
summary(vars_mlmp_b)
## 
## Call:
## lm(formula = EbedAction_brown ~ poly(Ebed_443, 2) + poly(Ebed_488, 
##     2) + poly(Ebed_531, 2) + poly(Ebed_555, 2) + poly(Ebed_645, 
##     2), data = vars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.454  -1.308   0.188   0.639  66.803 
## 
## Coefficients:
##                      Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)         1.743e+02  6.541e-02 2665.405  < 2e-16 ***
## poly(Ebed_443, 2)1  1.126e+04  8.801e+01  127.885  < 2e-16 ***
## poly(Ebed_443, 2)2  9.335e+02  2.106e+01   44.316  < 2e-16 ***
## poly(Ebed_488, 2)1  2.437e+03  1.581e+02   15.417  < 2e-16 ***
## poly(Ebed_488, 2)2 -2.079e+03  6.622e+01  -31.387  < 2e-16 ***
## poly(Ebed_531, 2)1  5.194e+03  3.395e+02   15.299  < 2e-16 ***
## poly(Ebed_531, 2)2  2.259e+03  1.588e+02   14.224  < 2e-16 ***
## poly(Ebed_555, 2)1  6.583e+03  2.658e+02   24.766  < 2e-16 ***
## poly(Ebed_555, 2)2 -1.420e+03  1.274e+02  -11.149  < 2e-16 ***
## poly(Ebed_645, 2)1  7.602e+03  3.954e+01  192.244  < 2e-16 ***
## poly(Ebed_645, 2)2  7.645e+01  1.169e+01    6.539  6.5e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.541 on 9989 degrees of freedom
## Multiple R-squared:  0.9996, Adjusted R-squared:  0.9996 
## F-statistic: 2.325e+06 on 10 and 9989 DF,  p-value: < 2.2e-16
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(vars_mlmp_b)
Figure 4bis - Assumptions check of multiple linear model (polynomial) to explain E algal Brown.

Figure 4bis - Assumptions check of multiple linear model (polynomial) to explain E algal Brown.

GAM

p <- vars %>% 
  dplyr::select(Ebed_443,Ebed_488,Ebed_531,Ebed_555,Ebed_645,EbedAction_brown) %>% 
  pivot_longer(-EbedAction_brown) %>% 
  ggplot(data=.,aes(x=value,y=EbedAction_brown)) +
     geom_point() +
     facet_wrap(~name, scales = 'free') +
     geom_smooth(method = 'gam') +
     theme_light()
p
Figure 5 - GAM model to explain E algal Brown. No interactions.

Figure 5 - GAM model to explain E algal Brown. No interactions.

vars_brown <- vars %>% 
  dplyr::select(Ebed_443,Ebed_488,Ebed_531,Ebed_555,Ebed_645,EbedAction_brown) %>% 
  pivot_longer(-EbedAction_brown)

kable(do(group_by(vars_brown,name), glance(gam(EbedAction_brown ~ value, data = .)))
 ,caption = 'Table 2 - GAM no interactions.')
Table 2 - GAM no interactions.
name df logLik AIC BIC deviance df.residual nobs
Ebed_443 2 -58248.58 116503.2 116524.8 67132592 9998 10000
Ebed_488 2 -54467.41 108940.8 108962.4 31514064 9998 10000
Ebed_531 2 -58237.11 116480.2 116501.8 66978698 9998 10000
Ebed_555 2 -59332.21 118670.4 118692.1 83378937 9998 10000
Ebed_645 2 -61111.73 122229.5 122251.1 119020840 9998 10000

Observations:

  • Best predictor based on lowest AIC: Ebed_488, then Ebed_531, then Ebed_443, then Ebed_555, then Ebed_645.
  • What about addition and interactions?
vars_gam_b <- gam(EbedAction_brown ~ s(Ebed_443, bs='cr') + s(Ebed_488, bs='cr') + s(Ebed_531, bs='cr') + s(Ebed_555, bs='cr') + s(Ebed_645, bs='cr'), data=vars)

plot_gam(vars_gam_b, ncol = 3) + ylab("Pred. Ebed Algal Brown") + theme_light()#all predictors
Figure 6 - GAM model to explain E algal Brown.

Figure 6 - GAM model to explain E algal Brown.

Observations:

  • Ebed(443) and Ebed(645) most linear and predict highest EbedAlgalBrown. Makes sense as Blue abd Red light are the most important in action spectrum?
  • Interesting negative relationship for higher Ebed(488). How to interpret it? Higher Ebed(488) usually associated with lower Ebed(443) because of absorption by tss? So higher Ebed(488) mean sedimented water so no good for EbedAlgalBrown?
  • Prediction by Ebed(531) exponentially increasing, why?
  • Prediction by Ebed(555) plateauing, why?
summary(vars_gam_b)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## EbedAction_brown ~ s(Ebed_443, bs = "cr") + s(Ebed_488, bs = "cr") + 
##     s(Ebed_531, bs = "cr") + s(Ebed_555, bs = "cr") + s(Ebed_645, 
##     bs = "cr")
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 174.33413    0.06478    2691   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df       F p-value    
## s(Ebed_443) 5.916  6.054 2893.64  <2e-16 ***
## s(Ebed_488) 4.342  4.805  234.32  <2e-16 ***
## s(Ebed_531) 5.875  6.409   53.73  <2e-16 ***
## s(Ebed_555) 5.287  6.041   69.85  <2e-16 ***
## s(Ebed_645) 5.373  5.641 6790.06  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =      1   Deviance explained =  100%
## GCV = 42.083  Scale est. = 41.966    n = 10000

Observations:

  • Deviance explained = 100%
  • All predictors significant
plot_gam_check(vars_gam_b) #+ theme_light()#all predictors
Figure 6bis - GAM model checl to explain E algal Brown.

Figure 6bis - GAM model checl to explain E algal Brown.

kable(concurvity(vars_gam_b))
para s(Ebed_443) s(Ebed_488) s(Ebed_531) s(Ebed_555) s(Ebed_645)
worst 0 0.9954256 0.9986489 0.9997434 0.9995971 0.9797318
observed 0 0.9940207 0.9967781 0.9996545 0.9994756 0.9739143
estimate 0 0.8599890 0.5817391 0.7848626 0.7928082 0.6262149

Observations:

  • Concurvity super high (close to 1), not surprising given correlation between predictors.
  • QQ plot looks quite bad
  • \(\rightarrow\) Shouldn’t do GAM?
plot_gam_3d(model = vars_gam_b, main_var = Ebed_443, second_var = Ebed_488, palette='bilbao', direction = 1)

Figure 6bis - Prediction of Ebed_brown using a GAM using multiple predictors (Ebed_443, Ebed_488, Ebed_531 ,Ebed_555, Ebed_645). Showing Ebed_4443 and Ebed_488.

Observations:

  • Xx
plot_gam_3d(model = vars_gam_b, main_var = Ebed_488, second_var = Ebed_531, palette='bilbao', direction = 1)

Figure 6bis - Prediction of Ebed_brown using a GAM using multiple predictors (Ebed_443, Ebed_488, Ebed_531 ,Ebed_555, Ebed_645). Showing Ebed_488 and Ebed_531.

Observations:

  • Interesting shape, how to explain it?

Also, see if worth including interactions?

Predicting E algal red

Linear regression

vars_red <- vars %>% 
  pivot_longer(-EbedAction_red,names_to = 'Vars',values_to = 'Values')
  
p <- ggplot(data=vars_red,aes(x=Values,y=EbedAction_red)) +
  geom_point() +
  facet_wrap(~Vars, scales = 'free') +
  geom_smooth(method = 'lm') +
  geom_smooth(method = 'lm', formula = y~ poly(x,2), col='red') +
  theme_light()
p
Figure 7 - Linear model to explain E algal Red

Figure 7 - Linear model to explain E algal Red

kable(do(group_by(vars_red,Vars), glance(lm(EbedAction_red ~ Values, data = .)))
 ,caption = 'Table 2 - Linear model.')
Table 2 - Linear model.
Vars r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
Ebed_443 0.8778354 0.8778232 105.34180 7.184241e+04 0.0000000 1 -60760.49 121526.98 121548.61 110946760 9998 10000
Ebed_488 0.9739178 0.9739152 48.67438 3.733281e+05 0.0000000 1 -53039.91 106085.83 106107.46 23687216 9998 10000
Ebed_531 0.9735840 0.9735813 48.98485 3.684845e+05 0.0000000 1 -53103.50 106212.99 106234.62 23990353 9998 10000
Ebed_555 0.9640148 0.9640112 57.17286 2.678386e+05 0.0000000 1 -54649.18 109304.36 109325.99 32680820 9998 10000
Ebed_645 0.8365580 0.8365417 121.84561 5.117355e+04 0.0000000 1 -62215.93 124437.87 124459.50 148433833 9998 10000
Ebed_par 0.9941128 0.9941122 23.12508 1.688256e+06 0.0000000 1 -45597.56 91201.13 91222.76 5346624 9998 10000
EbedAction_brown 0.9881327 0.9881315 32.83256 8.324821e+05 0.0000000 1 -49102.59 98211.19 98232.82 10777617 9998 10000
EbedAction_green 0.9839769 0.9839753 38.15060 6.139759e+05 0.0000000 1 -50603.80 101213.60 101235.23 14551774 9998 10000
n 0.0000034 -0.0000966 301.38895 3.399840e-02 0.8537139 1 -71272.40 142550.80 142572.43 908171328 9998 10000

Observations:

  • Very good linear relationships with Ebed_par
  • Also with EbedAction Green and Brown, not used in the next. We are also interested in explaining deviation in their relationships, but another story.
  • Less Good-ish linear relationships with Ebed(\(\lambda\)). Some more expo fit-like.

Multiple linear regression

vars_mlm_r <- lm(EbedAction_red ~ Ebed_443 + Ebed_488 + Ebed_531 + Ebed_555 + Ebed_645, data=vars)
summary(vars_mlm_r)
## 
## Call:
## lm(formula = EbedAction_red ~ Ebed_443 + Ebed_488 + Ebed_531 + 
##     Ebed_555 + Ebed_645, data = vars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -86.883  -1.623   0.513   1.083  55.130 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.0832     0.1109  -9.766  < 2e-16 ***
## Ebed_443    152.4520     2.3743  64.210  < 2e-16 ***
## Ebed_488    245.1243     4.5685  53.655  < 2e-16 ***
## Ebed_531     48.5699     8.1400   5.967  2.5e-09 ***
## Ebed_555    373.1007     5.8847  63.401  < 2e-16 ***
## Ebed_645    364.3895     1.5426 236.222  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.77 on 9994 degrees of freedom
## Multiple R-squared:  0.9993, Adjusted R-squared:  0.9993 
## F-statistic: 3.006e+06 on 5 and 9994 DF,  p-value: < 2.2e-16

Observations:

  • \(R^2 = 0.9993\) better than just with Ebed_par alone…
  • All predictors significant
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(vars_mlm_r)
Figure 8 - Assumptions check of linear model to explain E algal Red

Figure 8 - Assumptions check of linear model to explain E algal Red

Observations:

  • Q-Q plot suggests more extreme values than “pure” normal distribution.

GAM

vars_gam_r <- gam(EbedAction_red ~ s(Ebed_443, bs='cr') + s(Ebed_488, bs='cr') + s(Ebed_531, bs='cr') + s(Ebed_555, bs='cr') + s(Ebed_645, bs='cr'), data=vars)

plot_gam(vars_gam_r, ncol = 2) + ylab("Pred. Ebed Algal Red") + theme_light()#all predictors
Figure 9 - GAM model to explain E algal Red

Figure 9 - GAM model to explain E algal Red

Observations:

  • Ebed(443) and Ebed(645) most linear and predict highest EbedAlgalBrown. Makes sense as Blue abd Red light are the most important in action spectrum?
  • Prediction by Ebed(488) plateauing, why?
  • Prediction by Ebed(531) exponentially increasing, why?
  • Prediction by Ebed(555) plateauing, why? Perdiction higher than for Brown which makes sense given action spectrum of red with peak around 555nm.
summary(vars_gam_r)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## EbedAction_red ~ s(Ebed_443, bs = "cr") + s(Ebed_488, bs = "cr") + 
##     s(Ebed_531, bs = "cr") + s(Ebed_555, bs = "cr") + s(Ebed_645, 
##     bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 187.1534     0.0742    2522   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df       F p-value    
## s(Ebed_443) 5.543  5.805  305.63  <2e-16 ***
## s(Ebed_488) 3.812  4.312  282.51  <2e-16 ***
## s(Ebed_531) 5.729  6.286   33.69  <2e-16 ***
## s(Ebed_555) 5.571  6.261  150.56  <2e-16 ***
## s(Ebed_645) 5.619  5.872 2759.95  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.999   Deviance explained = 99.9%
## GCV = 55.214  Scale est. = 55.063    n = 10000

Observations:

  • Deviance explained = 99.9%
  • All predictors significant
plot_gam_3d(model = vars_gam_r, main_var = Ebed_488, second_var = Ebed_531, palette='bilbao', direction = 1)

Figure 9bis - Prediction of Ebed_red using a GAM using multiple predictors (Ebed_443, Ebed_488, Ebed_531, Ebed_555, Ebed_645). Showing Ebed_488 and Ebed_531.

Observations:

  • How to interpret it?

Predicting E algal Green

Linear regression

vars_green <- vars %>% 
  pivot_longer(-EbedAction_green,names_to = 'Vars',values_to = 'Values')
  
p <- ggplot(data=vars_green,aes(x=Values,y=EbedAction_green)) +
  geom_point() +
  facet_wrap(~Vars, scales = 'free') +
  geom_smooth(method = 'lm') +
  geom_smooth(method = 'lm', formula = y~ poly(x,2), col='red') +
  theme_light()
p
Figure 10 - Linear model to explain E algal Green

Figure 10 - Linear model to explain E algal Green

kable(do(group_by(vars_green,Vars), glance(lm(EbedAction_green ~ Values, data = .)))
 ,caption = 'Table 3 - Linear model.')
Table 3 - Linear model.
Vars r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
Ebed_443 0.9342311 0.9342245 82.65379 1.420191e+05 0.0000000 1 -58334.99 116675.98 116697.62 68302833.0 9998 10000
Ebed_488 0.9603684 0.9603644 64.16130 2.422755e+05 0.0000000 1 -55802.39 111610.77 111632.41 41158487.9 9998 10000
Ebed_531 0.9218042 0.9217964 90.12474 1.178606e+05 0.0000000 1 -59200.33 118406.67 118428.30 81208450.1 9998 10000
Ebed_555 0.9060950 0.9060856 98.76356 9.647132e+04 0.0000000 1 -60115.67 120237.34 120258.98 97522895.8 9998 10000
Ebed_645 0.8957033 0.8956929 104.08487 8.586317e+04 0.0000000 1 -60640.45 121286.90 121308.53 108314934.9 9998 10000
Ebed_par 0.9958762 0.9958758 20.69674 2.414457e+06 0.0000000 1 -44488.15 88982.29 89003.92 4282692.3 9998 10000
EbedAction_brown 0.9991750 0.9991749 9.25712 1.210898e+07 0.0000000 1 -36442.32 72890.63 72912.26 856771.4 9998 10000
EbedAction_red 0.9839769 0.9839753 40.79677 6.139759e+05 0.0000000 1 -51274.41 102554.83 102576.46 16640432.8 9998 10000
n 0.0000000 -0.0001000 322.29413 4.740000e-05 0.9945076 1 -71943.03 143892.06 143913.69 1038527319.0 9998 10000

Observations:

  • Very good linear relationships with Ebed_par
  • Also with EbedAction Brown and Red, not used in the next. We are also interested in explaining deviation in their relationships, but another story.
  • Negative relationship but way weaker relationship with Kds

Multiple linear regression

vars_mlm_g <- lm(EbedAction_green ~ Ebed_443 + Ebed_488 + Ebed_531 + Ebed_555 + Ebed_645, data=vars)
summary(vars_mlm_g)
## 
## Call:
## lm(formula = EbedAction_green ~ Ebed_443 + Ebed_488 + Ebed_531 + 
##     Ebed_555 + Ebed_645, data = vars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.987  -1.893  -0.118   0.826  78.511 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.03992    0.11030   0.362    0.717    
## Ebed_443    509.35980    2.36118 215.723   <2e-16 ***
## Ebed_488     58.56503    4.54331  12.890   <2e-16 ***
## Ebed_531    197.07872    8.09502  24.346   <2e-16 ***
## Ebed_555    139.16949    5.85225  23.781   <2e-16 ***
## Ebed_645    605.86911    1.53406 394.945   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.727 on 9994 degrees of freedom
## Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
## F-statistic: 3.477e+06 on 5 and 9994 DF,  p-value: < 2.2e-16

Observations:

  • \(R^2 = 0.9994\) better than just with Ebed_par alone…
  • All predictors significant
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(vars_mlm_g)
Figure 9 - Assumptions check of linear model to explain E algal Green

Figure 9 - Assumptions check of linear model to explain E algal Green

Observations:

  • Q-Q plot suggests more extreme values than “pure” normal distribution.

GAM

vars_gam_g <- gam(EbedAction_green ~ s(Ebed_443, bs='cr') + s(Ebed_488, bs='cr') + s(Ebed_531, bs='cr') + s(Ebed_555, bs='cr') + s(Ebed_645, bs='cr'), data=vars)

plot_gam(vars_gam_g, ncol = 2) + ylab("Pred. Ebed Algal Green") + theme_light()#all predictors
Figure 10 - GAM model to explain E algal Green

Figure 10 - GAM model to explain E algal Green

Observations:

  • Should be similar than the one for brown.
summary(vars_gam_g)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## EbedAction_green ~ s(Ebed_443, bs = "cr") + s(Ebed_488, bs = "cr") + 
##     s(Ebed_531, bs = "cr") + s(Ebed_555, bs = "cr") + s(Ebed_645, 
##     bs = "cr")
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 173.64172    0.06751    2572   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df       F p-value    
## s(Ebed_443) 5.897  6.040 2613.24  <2e-16 ***
## s(Ebed_488) 4.286  4.752  209.74  <2e-16 ***
## s(Ebed_531) 5.858  6.395   38.78  <2e-16 ***
## s(Ebed_555) 5.276  6.032   63.23  <2e-16 ***
## s(Ebed_645) 5.546  5.810 9332.86  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =      1   Deviance explained =  100%
## GCV = 45.703  Scale est. = 45.575    n = 10000

Observations:

  • Deviance explained = 100%
  • All predictors significant
  • Similar to Brown?
plot_gam_3d(model = vars_gam_g, main_var = Ebed_645, second_var = Ebed_488, palette='bilbao', direction = -1)

Figure 11 - Prediction of Ebed_green using a GAM using multiple predictors (Ebed_443, Ebed_488, Ebed_531, Ebed_555, Ebed_645). Showing Ebed_645 and Ebed_488.

Conclusion - Model comparison

What can be changed?

What is the unit of Ebed algal?