Use Cleveland dot plots, box plots and histograms to check for outliers and to examine the distributions of both the metrics and the predictor variables.
Collinearity is correlation among covariates. One strategy for addressing it is to drop the covariate with the highest variance inflation factor (VIF), recalculate the VIFs, and repeat until all VIFs fall below a pre-selected threshold, in this case 5. In the output below, SpecMeanF (GVIF 22.25) was dropped first and ELEVATION (GVIF 7.04) second, after which all remaining values are below 5.
##
##
## Variance inflation factors
##
## GVIF
## instreamVis 1.324628
## DIN 2.006379
## DRP 1.499392
## CHLA 1.406675
## maxrateToQ50 1.347883
## ORDER_ 3.525147
## ELEVATION 7.381125
## DSDIST2COA 4.229626
## SEGRIPSHAD 2.784004
## SEGJANAIRT 3.283909
## SEGMINTNOR 4.475691
## USAVGSLOPE 3.622418
## USCALCIUM 2.472736
## USPHOSPHOR 3.601356
## USHARDNESS 2.791326
## SEGFLOWSTA 2.495058
## SpecMeanF 22.250112
## SpecMALF 16.107939
## FRE3 2.775510
##
##
## Variance inflation factors
##
## GVIF
## instreamVis 1.324609
## DIN 2.002264
## DRP 1.469842
## CHLA 1.367579
## maxrateToQ50 1.322533
## ORDER_ 3.519355
## ELEVATION 7.038428
## DSDIST2COA 4.050351
## SEGRIPSHAD 2.695145
## SEGJANAIRT 3.118129
## SEGMINTNOR 4.068723
## USAVGSLOPE 3.451939
## USCALCIUM 2.414506
## USPHOSPHOR 3.595996
## USHARDNESS 2.791227
## SEGFLOWSTA 2.466110
## SpecMALF 3.028114
## FRE3 1.980470
##
##
## Variance inflation factors
##
## GVIF
## instreamVis 1.312122
## DIN 1.950044
## DRP 1.467376
## CHLA 1.364042
## maxrateToQ50 1.321632
## ORDER_ 3.049796
## DSDIST2COA 2.720981
## SEGRIPSHAD 2.692653
## SEGJANAIRT 2.423385
## SEGMINTNOR 3.389084
## USAVGSLOPE 3.427591
## USCALCIUM 2.345723
## USPHOSPHOR 3.591736
## USHARDNESS 2.769488
## SEGFLOWSTA 2.428301
## SpecMALF 2.940207
## FRE3 1.977364
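The drop-and-recompute strategy described above can be sketched as follows. This is a minimal illustration rather than the code that produced the output: it computes ordinary VIFs as the diagonal of the inverse correlation matrix (for numeric predictors this should coincide with the GVIF reported above), and the function names and the synthetic columns `x1`, `x2`, `x3` are assumptions made for the example.

```python
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length numeric columns."""
    names = list(columns)
    n = len(columns[names[0]])
    z = {}
    for name in names:
        xs = columns[name]
        mean = sum(xs) / n
        sd = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
        z[name] = [(x - mean) / sd for x in xs]
    return [[sum(a * b for a, b in zip(z[p], z[q])) / (n - 1) for q in names]
            for p in names]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return {name: inv[i][i] for i, name in enumerate(columns)}

def drop_collinear(columns, threshold=5.0):
    """Drop the highest-VIF predictor repeatedly until all VIFs are below threshold."""
    kept = dict(columns)
    dropped = []
    while len(kept) > 1:
        current = vifs(kept)
        worst = max(current, key=current.get)
        if current[worst] < threshold:
            break
        dropped.append(worst)
        del kept[worst]
    return kept, dropped

# Synthetic data (hypothetical): x3 is nearly a copy of x1, x2 is independent.
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [rng.gauss(0, 1) for _ in range(200)]
x3 = [a + 0.1 * rng.gauss(0, 1) for a in x1]
kept, dropped = drop_collinear({"x1": x1, "x2": x2, "x3": x3})
print("kept:", sorted(kept), "dropped:", dropped)
```

Only one variable is removed per pass because removing it changes every other VIF, which is why SpecMALF fell from 16.1 to 3.0 once SpecMeanF was gone.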
Scatter plots of each metric against every non-collinear predictor help visualize the relationships. The blue line is a GAM fit that helps reveal trends.
Multiple regression was conducted on each of the 21 selected metrics, transformed and scaled, using all non-collinear variables: instreamVis, DIN, DRP, CHLA, maxrateToQ50, ORDER_, DSDIST2COA, SEGRIPSHAD, SEGJANAIRT, SEGMINTNOR, USAVGSLOPE, USCALCIUM, USPHOSPHOR, USHARDNESS, SEGFLOWSTA, SpecMALF and FRE3. Predictor variables were selected by backward elimination using the Bayesian information criterion (BIC).
| metric | r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| sed_pct_richness_decreaser | 0.52 | 0.51 | 0.70 | 54.55 | 0 | 10 | -535.06 | 1092.12 | 1138.70 | 243.43 | 500 |
| X11b | 0.47 | 0.46 | 0.74 | 44.00 | 0 | 10 | -562.17 | 1146.34 | 1192.92 | 270.73 | 500 |
| MCI_hb | 0.45 | 0.44 | 0.75 | 51.68 | 0 | 8 | -569.96 | 1157.92 | 1196.03 | 279.13 | 502 |
| sed_richness_increaser | 0.44 | 0.44 | 0.75 | 67.01 | 0 | 6 | -573.59 | 1161.18 | 1190.82 | 283.13 | 504 |
| X7b | 0.41 | 0.40 | 0.77 | 50.00 | 0 | 7 | -588.47 | 1192.93 | 1226.81 | 300.14 | 503 |
| sed_MCI_like | 0.41 | 0.40 | 0.77 | 57.65 | 0 | 6 | -589.91 | 1193.82 | 1223.46 | 301.84 | 504 |
| X6c | 0.40 | 0.40 | 0.77 | 85.37 | 0 | 4 | -591.65 | 1193.30 | 1214.47 | 303.91 | 506 |
| X3c | 0.41 | 0.40 | 0.78 | 38.42 | 0 | 9 | -589.33 | 1198.67 | 1241.01 | 301.16 | 501 |
| chl_MCI_like | 0.40 | 0.40 | 0.78 | 48.80 | 0 | 7 | -590.99 | 1197.98 | 1231.86 | 303.12 | 503 |
| X3b | 0.41 | 0.40 | 0.78 | 34.45 | 0 | 10 | -589.51 | 1201.03 | 1247.61 | 301.38 | 500 |
| X6b | 0.39 | 0.39 | 0.78 | 65.52 | 0 | 5 | -595.66 | 1203.33 | 1228.73 | 308.73 | 505 |
| chl_richness_increaser | 0.37 | 0.36 | 0.80 | 97.39 | 0 | 3 | -607.12 | 1222.23 | 1239.17 | 322.91 | 507 |
| pEPTrich | 0.36 | 0.35 | 0.80 | 35.71 | 0 | 8 | -608.28 | 1234.57 | 1272.68 | 324.39 | 502 |
| sed_pct_richness_increaser | 0.34 | 0.33 | 0.82 | 37.43 | 0 | 7 | -616.23 | 1248.46 | 1282.33 | 334.66 | 503 |
| pEPTabund | 0.33 | 0.32 | 0.82 | 31.56 | 0 | 8 | -619.27 | 1256.55 | 1294.66 | 338.68 | 502 |
| chl_pct_richness_increaser | 0.31 | 0.30 | 0.83 | 45.76 | 0 | 5 | -627.88 | 1267.75 | 1293.16 | 350.30 | 505 |
| sed_richness_decreaser | 0.31 | 0.30 | 0.84 | 28.02 | 0 | 8 | -629.01 | 1276.02 | 1314.13 | 351.86 | 502 |
| X10b | 0.30 | 0.30 | 0.84 | 54.50 | 0 | 4 | -631.81 | 1273.62 | 1294.79 | 355.74 | 506 |
| X8a | 0.27 | 0.27 | 0.86 | 38.01 | 0 | 5 | -641.70 | 1295.40 | 1320.81 | 369.82 | 505 |
| chl_richness_decreaser | 0.25 | 0.24 | 0.87 | 23.80 | 0 | 7 | -650.20 | 1316.40 | 1350.28 | 382.35 | 503 |
| chl_pct_richness_decreaser | 0.24 | 0.23 | 0.88 | 25.96 | 0 | 6 | -654.48 | 1322.97 | 1352.61 | 388.83 | 504 |
| EPTrich | 0.24 | 0.23 | 0.88 | 19.57 | 0 | 8 | -653.95 | 1325.89 | 1364.00 | 388.01 | 502 |
| X1b | 0.23 | 0.22 | 0.88 | 21.10 | 0 | 7 | -657.51 | 1331.02 | 1364.89 | 393.47 | 503 |
| X13b | 0.14 | 0.13 | 0.93 | 26.59 | 0 | 3 | -685.90 | 1379.80 | 1396.74 | 439.81 | 507 |
| X13d | 0.09 | 0.08 | 0.96 | 11.96 | 0 | 4 | -700.12 | 1410.25 | 1431.42 | 465.04 | 506 |
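Backward elimination on BIC, as used to arrive at the models summarized above, can be sketched as follows. This is an illustrative standard-library implementation under the usual Gaussian linear-model likelihood, not the routine actually used for the report; the helper names (`solve`, `bic`, `backward_bic`) and the synthetic predictors `a` to `d` are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def bic(y, predictors):
    """BIC of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    cols = [[1.0] * n] + list(predictors)
    xtx = [[sum(u * v for u, v in zip(p, q)) for q in cols] for p in cols]
    xty = [sum(u * v for u, v in zip(p, y)) for p in cols]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * c[i] for b, c in zip(beta, cols))) ** 2
              for i, yi in enumerate(y))
    k = len(cols) + 1  # regression coefficients plus the residual variance
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * loglik + k * math.log(n)

def backward_bic(y, data):
    """Remove one predictor at a time for as long as doing so lowers the BIC."""
    kept = list(data)
    best = bic(y, [data[v] for v in kept])
    while kept:
        trials = {v: bic(y, [data[u] for u in kept if u != v]) for v in kept}
        candidate = min(trials, key=trials.get)
        if trials[candidate] >= best:
            break
        best = trials[candidate]
        kept.remove(candidate)
    return kept, best

# Synthetic data (hypothetical): only a and b drive the response.
rng = random.Random(3)
n_obs = 300
data = {name: [rng.gauss(0, 1) for _ in range(n_obs)] for name in "abcd"}
y = [2.0 * a - 1.5 * b + rng.gauss(0, 1) for a, b in zip(data["a"], data["b"])]
full_bic = bic(y, list(data.values()))
kept, best = backward_bic(y, data)
print("kept:", kept, "BIC:", round(best, 2), "full-model BIC:", round(full_bic, 2))
```

Because the penalty per parameter is ln(n) rather than 2, BIC tends to retain fewer predictors than AIC at this sample size, which is consistent with the small final models in the table.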
The coefficient plots show the regression estimates for each metric. The error bars are 95% confidence intervals. Because the metrics and predictors were centered and scaled, the effect sizes (coefficients) are directly comparable.
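To illustrate why centering and scaling makes coefficients comparable, the sketch below fits the same relationship with a predictor expressed in two different units. It uses a single predictor for brevity, and the variable names and synthetic data are assumptions for the example, not values from the analysis.

```python
import random

def scale(xs):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return [(x - mean) / sd for x in xs]

def slope(x, y):
    """Least-squares slope of y on x for a simple regression with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical predictor measured in metres, and the same predictor in kilometres.
rng = random.Random(0)
x_metres = [rng.uniform(0, 1500) for _ in range(100)]
x_km = [x / 1000 for x in x_metres]
metric = [0.002 * x + rng.gauss(0, 1) for x in x_metres]

raw_m = slope(x_metres, metric)   # depends on the unit of x
raw_km = slope(x_km, metric)      # 1000 times larger than raw_m
std_m = slope(scale(x_metres), scale(metric))
std_km = slope(scale(x_km), scale(metric))  # same as std_m: units cancel out

print("raw slopes:", raw_m, raw_km)
print("standardized slopes:", std_m, std_km)
```

The raw slopes differ by the unit conversion factor, while the standardized slopes agree, so a standardized coefficient can be read as the change in the metric, in standard deviations, per standard deviation of the predictor.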
Residual plots look fine, showing no evident patterns.
Partial regression plots show the effect of each predictor while holding the other variables in the model constant (i.e. at their medians). Each plot contains a confidence band, a prediction line, and partial residuals.
Additionally, the last panel on each set of plots shows the hierarchical partitioning of R^2 values to determine the proportion of variance explained independently by each predictor.
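Hierarchical partitioning can be sketched as follows: for each predictor, the gain in R^2 from adding it is averaged over all subsets of the other predictors at each model size, and then over model sizes. This is a minimal standard-library illustration under that definition, not the package routine used for the plots; the helper names and the synthetic predictors `a`, `b`, `c` are hypothetical.

```python
from itertools import combinations
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(y, predictors):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    cols = [[1.0] * n] + list(predictors)
    xtx = [[sum(u * v for u, v in zip(p, q)) for q in cols] for p in cols]
    xty = [sum(u * v for u, v in zip(p, y)) for p in cols]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * c[i] for b, c in zip(beta, cols))) ** 2
              for i, yi in enumerate(y))
    return 1.0 - rss / tss

def independent_contributions(y, data):
    """Average R^2 gain of each predictor across all model hierarchies."""
    names = list(data)
    p = len(names)
    # R^2 of every possible submodel, keyed by the set of predictors it contains.
    r2 = {frozenset(s): r_squared(y, [data[v] for v in s])
          for size in range(p + 1) for s in combinations(names, size)}
    result = {}
    for v in names:
        others = [u for u in names if u != v]
        total = 0.0
        for size in range(p):
            gains = [r2[frozenset(s) | {v}] - r2[frozenset(s)]
                     for s in combinations(others, size)]
            total += sum(gains) / len(gains)  # mean gain at this model size
        result[v] = total / p                 # mean over model sizes
    return result, r2[frozenset(names)]

# Synthetic data (hypothetical): b is correlated with a; c is independent.
rng = random.Random(2)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [0.6 * u + 0.8 * rng.gauss(0, 1) for u in a]
c = [rng.gauss(0, 1) for _ in range(200)]
y = [u + 0.5 * w + rng.gauss(0, 1) for u, w in zip(a, c)]
contrib, full_r2 = independent_contributions(y, {"a": a, "b": b, "c": c})
print(contrib, "sum:", sum(contrib.values()), "full R^2:", full_r2)
```

The contributions sum to the R^2 of the full model, which is what allows them to be shown as shares of explained variance. The number of submodels grows as 2^p, so this brute-force version is only practical for a handful of predictors.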