1. Data exploration

Use Cleveland dot plots, box plots and histograms to check for outliers and to examine the distributions of both the metrics and the predictor variables.

2. Scale, centre and transform metrics and predictors using the Yeo-Johnson transformation.
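
As a rough illustration of this step, the sketch below applies the Yeo-Johnson transformation to one variable and then centres and scales the result. It is not the code behind this report: the data and the value of `lam` are made up, and in practice lambda is usually estimated from the data (typically by maximum likelihood) rather than fixed by hand.

```python
import math
from statistics import mean, stdev


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of one value for a given lambda."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def centre_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical right-skewed variable containing zero and negative values.
raw = [-1.5, 0.0, 0.4, 1.2, 3.0, 8.5, 20.0]
lam = 0.0  # assumed value for illustration; normally estimated from the data
transformed = [yeo_johnson(v, lam) for v in raw]
scaled = centre_and_scale(transformed)
print([round(v, 2) for v in scaled])
```

Unlike the Box-Cox transformation, Yeo-Johnson is defined for zero and negative values, which is why it can be applied to metrics and predictors alike. With `lam = 1` the transformation leaves the data unchanged.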

3. Correlograms of predictors and metric variables

5. Variance inflation factor (VIF)

Collinearity is the existence of correlation between covariates. One strategy for addressing this problem is to sequentially drop the covariate with the highest VIF, recalculate the VIFs and repeat this process until all VIFs are smaller than a pre-selected threshold, here 5. In the output below, SpecMeanF (GVIF 22.25) is dropped first, followed by ELEVATION (GVIF 7.04), after which all remaining values are below 5.

## 
## 
## Variance inflation factors
## 
##                   GVIF
## instreamVis   1.324628
## DIN           2.006379
## DRP           1.499392
## CHLA          1.406675
## maxrateToQ50  1.347883
## ORDER_        3.525147
## ELEVATION     7.381125
## DSDIST2COA    4.229626
## SEGRIPSHAD    2.784004
## SEGJANAIRT    3.283909
## SEGMINTNOR    4.475691
## USAVGSLOPE    3.622418
## USCALCIUM     2.472736
## USPHOSPHOR    3.601356
## USHARDNESS    2.791326
## SEGFLOWSTA    2.495058
## SpecMeanF    22.250112
## SpecMALF     16.107939
## FRE3          2.775510
## 
## 
## Variance inflation factors
## 
##                  GVIF
## instreamVis  1.324609
## DIN          2.002264
## DRP          1.469842
## CHLA         1.367579
## maxrateToQ50 1.322533
## ORDER_       3.519355
## ELEVATION    7.038428
## DSDIST2COA   4.050351
## SEGRIPSHAD   2.695145
## SEGJANAIRT   3.118129
## SEGMINTNOR   4.068723
## USAVGSLOPE   3.451939
## USCALCIUM    2.414506
## USPHOSPHOR   3.595996
## USHARDNESS   2.791227
## SEGFLOWSTA   2.466110
## SpecMALF     3.028114
## FRE3         1.980470
## 
## 
## Variance inflation factors
## 
##                  GVIF
## instreamVis  1.312122
## DIN          1.950044
## DRP          1.467376
## CHLA         1.364042
## maxrateToQ50 1.321632
## ORDER_       3.049796
## DSDIST2COA   2.720981
## SEGRIPSHAD   2.692653
## SEGJANAIRT   2.423385
## SEGMINTNOR   3.389084
## USAVGSLOPE   3.427591
## USCALCIUM    2.345723
## USPHOSPHOR   3.591736
## USHARDNESS   2.769488
## SEGFLOWSTA   2.428301
## SpecMALF     2.940207
## FRE3         1.977364
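
The drop-and-recompute procedure described in this step can be sketched as follows. This is an illustrative, dependency-free version, not the code that produced the output above. It assumes that all predictors are numeric, in which case the VIF of each predictor is the corresponding diagonal element of the inverse correlation matrix. The function names and the simulated data are hypothetical.

```python
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        m = sum(col) / n
        centred.append([v - m for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(data):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(data)
    inv = invert(correlation_matrix([data[k] for k in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_collinear(data, threshold=5.0):
    """Drop the predictor with the highest VIF until all VIFs are below threshold."""
    kept = dict(data)
    dropped = []
    while len(kept) > 1:
        current = vifs(kept)
        worst = max(current, key=current.get)
        if current[worst] < threshold:
            break
        dropped.append(worst)
        del kept[worst]
    return kept, dropped


# Simulated example: c is almost exactly a + b, so it is collinear with both.
random.seed(1)
n = 200
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
d = [random.gauss(0, 1) for _ in range(n)]
c = [x + y + random.gauss(0, 0.1) for x, y in zip(a, b)]
kept, dropped = drop_collinear({"a": a, "b": b, "c": c, "d": d}, threshold=5.0)
print("dropped:", dropped)
print({k: round(v, 2) for k, v in vifs(kept).items()})
```

Recomputing the VIFs after every removal matters because dropping one variable changes the VIFs of all the others, as the fall of SpecMALF from 16.11 to 3.03 after SpecMeanF was removed shows.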

6. Explore metric vs. predictor relationships

Scatter plots of each metric against every non-collinear predictor are used to visualize relationships. The blue line is a generalized additive model (GAM) fit, added to help visualize trends.

7. Linear models, summary table and coefficient plot

Multiple regression was conducted on each of the 25 selected transformed and scaled metrics, starting from all 17 non-collinear predictors: instreamVis, DIN, DRP, CHLA, maxrateToQ50, ORDER_, DSDIST2COA, SEGRIPSHAD, SEGJANAIRT, SEGMINTNOR, USAVGSLOPE, USCALCIUM, USPHOSPHOR, USHARDNESS, SEGFLOWSTA, SpecMALF and FRE3. Predictors were then selected by backward elimination based on the Bayesian information criterion (BIC).

All values are rounded to two decimal places, so p-values shown as 0 are smaller than 0.005.

metric                      r.squared  adj.r.squared  sigma  statistic  p.value  df   logLik      AIC      BIC  deviance  df.residual
sed_pct_richness_decreaser       0.52           0.51   0.70      54.55        0  10  -535.06  1092.12  1138.70    243.43          500
X11b                             0.47           0.46   0.74      44.00        0  10  -562.17  1146.34  1192.92    270.73          500
MCI_hb                           0.45           0.44   0.75      51.68        0   8  -569.96  1157.92  1196.03    279.13          502
sed_richness_increaser           0.44           0.44   0.75      67.01        0   6  -573.59  1161.18  1190.82    283.13          504
X7b                              0.41           0.40   0.77      50.00        0   7  -588.47  1192.93  1226.81    300.14          503
sed_MCI_like                     0.41           0.40   0.77      57.65        0   6  -589.91  1193.82  1223.46    301.84          504
X6c                              0.40           0.40   0.77      85.37        0   4  -591.65  1193.30  1214.47    303.91          506
X3c                              0.41           0.40   0.78      38.42        0   9  -589.33  1198.67  1241.01    301.16          501
chl_MCI_like                     0.40           0.40   0.78      48.80        0   7  -590.99  1197.98  1231.86    303.12          503
X3b                              0.41           0.40   0.78      34.45        0  10  -589.51  1201.03  1247.61    301.38          500
X6b                              0.39           0.39   0.78      65.52        0   5  -595.66  1203.33  1228.73    308.73          505
chl_richness_increaser           0.37           0.36   0.80      97.39        0   3  -607.12  1222.23  1239.17    322.91          507
pEPTrich                         0.36           0.35   0.80      35.71        0   8  -608.28  1234.57  1272.68    324.39          502
sed_pct_richness_increaser       0.34           0.33   0.82      37.43        0   7  -616.23  1248.46  1282.33    334.66          503
pEPTabund                        0.33           0.32   0.82      31.56        0   8  -619.27  1256.55  1294.66    338.68          502
chl_pct_richness_increaser       0.31           0.30   0.83      45.76        0   5  -627.88  1267.75  1293.16    350.30          505
sed_richness_decreaser           0.31           0.30   0.84      28.02        0   8  -629.01  1276.02  1314.13    351.86          502
X10b                             0.30           0.30   0.84      54.50        0   4  -631.81  1273.62  1294.79    355.74          506
X8a                              0.27           0.27   0.86      38.01        0   5  -641.70  1295.40  1320.81    369.82          505
chl_richness_decreaser           0.25           0.24   0.87      23.80        0   7  -650.20  1316.40  1350.28    382.35          503
chl_pct_richness_decreaser       0.24           0.23   0.88      25.96        0   6  -654.48  1322.97  1352.61    388.83          504
EPTrich                          0.24           0.23   0.88      19.57        0   8  -653.95  1325.89  1364.00    388.01          502
X1b                              0.23           0.22   0.88      21.10        0   7  -657.51  1331.02  1364.89    393.47          503
X13b                             0.14           0.13   0.93      26.59        0   3  -685.90  1379.80  1396.74    439.81          507
X13d                             0.09           0.08   0.96      11.96        0   4  -700.12  1410.25  1431.42    465.04          506
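
The backward selection by BIC described in this step can be sketched as below. The report does not show its selection code, so this is only a minimal, dependency-free illustration under stated assumptions: ordinary least squares with a Gaussian likelihood, BIC = k ln(n) - 2 logLik with k counting the regression coefficients plus the residual variance, and greedy removal of one predictor at a time. The function names and the simulated data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_bic(y, predictors):
    """Fit y ~ intercept + predictors by least squares and return the BIC."""
    n = len(y)
    cols = [[1.0] * n] + list(predictors)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * c[i] for bj, c in zip(beta, cols))) ** 2
              for i, yi in enumerate(y))
    k = len(cols) + 1  # regression coefficients plus the residual variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return k * math.log(n) - 2 * log_lik


def backward_bic(y, data):
    """Remove one predictor at a time for as long as the BIC decreases."""
    kept = list(data)
    best = ols_bic(y, [data[k] for k in kept])
    while kept:
        trials = {name: ols_bic(y, [data[k] for k in kept if k != name])
                  for name in kept}
        candidate = min(trials, key=trials.get)
        if trials[candidate] >= best:
            break
        best = trials[candidate]
        kept.remove(candidate)
    return kept, best


# Simulated example: only x1 and x2 affect y; x3 and x4 are pure noise.
random.seed(42)
n = 300
data = {f"x{i}": [random.gauss(0, 1) for _ in range(n)] for i in range(1, 5)}
y = [2.0 * u - 1.0 * v + random.gauss(0, 1) for u, v in zip(data["x1"], data["x2"])]
kept, bic = backward_bic(y, data)
print(kept, round(bic, 2))
```

The BIC definition used here appears consistent with the summary table: for sed_pct_richness_decreaser, -2 × (-535.06) + 11 × ln(510) ≈ 1138.70, which matches the reported BIC if the df of 10 counts the regression coefficients including the intercept, the residual variance is the eleventh parameter, and n = 510 (df plus df.residual).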

8. Coefficient plots

Coefficient plots visualize the regression estimates for each metric. The error bars are 95% confidence intervals. Because the metrics and predictors were centred and scaled, the effect sizes (coefficients) are directly comparable.

9. Model validation by inspection of residual plots

Residual plots look fine, showing no evident patterns.

10. Partial effects plots and variance partitioning

Partial regression plots show the effect of each predictor while holding the other variables in the model constant at their medians. Each plot contains a prediction line, a confidence band and partial residuals.

Additionally, the last panel in each set of plots shows the hierarchical partitioning of R^2, which estimates the proportion of variance explained independently by each predictor.
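
To make the idea of hierarchical partitioning concrete, the sketch below computes the independent contribution of each predictor to R^2 by averaging the gain in R^2 from adding that predictor over all subsets of the other predictors, weighted so that every ordering of the predictors counts equally. This is an illustrative, dependency-free version on simulated data, not the code used for the plots, and the function names are hypothetical.

```python
import random
from itertools import combinations
from math import factorial


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of the least-squares fit y ~ intercept + predictors."""
    if not predictors:
        return 0.0  # the intercept-only model explains no variance
    n = len(y)
    y_mean = sum(y) / n
    tss = sum((v - y_mean) ** 2 for v in y)
    cols = [[1.0] * n] + list(predictors)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * c[i] for bj, c in zip(beta, cols))) ** 2
              for i, yi in enumerate(y))
    return 1.0 - rss / tss


def independent_contributions(y, data):
    """Average gain in R^2 from adding each predictor, over all orderings."""
    names = list(data)
    p = len(names)
    cache = {}

    def fit(subset):
        key = frozenset(subset)
        if key not in cache:
            cache[key] = r_squared(y, [data[k] for k in sorted(key)])
        return cache[key]

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                total += weight * (fit(subset + (name,)) - fit(subset))
        result[name] = total
    return result


# Simulated example: x1 has a strong effect, x2 a weak one, x3 none.
random.seed(7)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [2.0 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]
shares = independent_contributions(y, data)
full = r_squared(y, list(data.values()))
print({k: round(v, 3) for k, v in shares.items()}, round(full, 3))
```

The independent contributions add up to the R^2 of the full model, which is what allows them to be read as shares of explained variance. The number of models to fit grows as 2^p, so an exhaustive version like this one is only practical for a modest number of predictors.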