Use Cleveland dot plots, boxplots, and histograms to check for outliers and to inspect the distributions of both metrics and predictor variables.
Collinearity is the existence of correlation between covariates. One strategy for addressing this problem is to sequentially drop the covariate with the highest variance inflation factor (VIF), recalculate the VIFs, and repeat this process until all VIFs are smaller than a pre-selected threshold. In this case a more stringent threshold (VIF < 3) was used in order to reduce the large number of predictors.
##
##
## Variance inflation factors
##
## GVIF
## T1NativeVeg 3.079820
## T1ExoticVeg 1.290736
## T2PastoralHeavy 3.027416
## T1Urban 1.326437
## maxrateToQ50 1.151006
## ORDER_ 2.664446
## ELEVATION 3.295385
## DSDIST2COA 2.962276
## SEGRIPSHAD 3.012890
## SEGJANAIRT 1.777061
## SEGMINTNOR 2.054681
## USAVGSLOPE 3.760013
## USCALCIUM 1.455501
## USPHOSPHOR 1.775200
## USHARDNESS 2.141925
## SEGFLOWSTA 2.159379
## SpecMeanF 15.073455
## SpecMALF 10.097130
## FRE3 4.632543
##
##
## Variance inflation factors
##
## GVIF
## T1NativeVeg 3.066335
## T1ExoticVeg 1.288759
## T2PastoralHeavy 3.026443
## T1Urban 1.326086
## maxrateToQ50 1.149860
## ORDER_ 2.619052
## ELEVATION 3.247737
## DSDIST2COA 2.944875
## SEGRIPSHAD 3.009216
## SEGJANAIRT 1.690946
## SEGMINTNOR 2.051317
## USAVGSLOPE 3.680068
## USCALCIUM 1.439506
## USPHOSPHOR 1.769846
## USHARDNESS 2.129066
## SEGFLOWSTA 2.148366
## SpecMALF 2.377520
## FRE3 2.146377
##
##
## Variance inflation factors
##
## GVIF
## T1NativeVeg 2.761922
## T1ExoticVeg 1.270000
## T2PastoralHeavy 2.923182
## T1Urban 1.309640
## maxrateToQ50 1.148572
## ORDER_ 2.510949
## ELEVATION 3.157824
## DSDIST2COA 2.692045
## SEGRIPSHAD 2.973396
## SEGJANAIRT 1.688929
## SEGMINTNOR 1.807952
## USCALCIUM 1.429484
## USPHOSPHOR 1.674130
## USHARDNESS 1.693608
## SEGFLOWSTA 2.137972
## SpecMALF 2.332969
## FRE3 2.060702
##
##
## Variance inflation factors
##
## GVIF
## T1NativeVeg 2.752517
## T1ExoticVeg 1.269974
## T2PastoralHeavy 2.892895
## T1Urban 1.305398
## maxrateToQ50 1.141453
## ORDER_ 2.368539
## DSDIST2COA 1.873960
## SEGRIPSHAD 2.931674
## SEGJANAIRT 1.535771
## SEGMINTNOR 1.775643
## USCALCIUM 1.429363
## USPHOSPHOR 1.673898
## USHARDNESS 1.664648
## SEGFLOWSTA 1.984227
## SpecMALF 2.331645
## FRE3 2.049462
##
##
## Variance inflation factors
##
## GVIF
## T1NativeVeg 2.696642
## T1ExoticVeg 1.269892
## T2PastoralHeavy 2.790788
## T1Urban 1.303773
## maxrateToQ50 1.141059
## ORDER_ 2.368225
## DSDIST2COA 1.664225
## SEGRIPSHAD 2.922107
## SEGJANAIRT 1.531252
## USCALCIUM 1.422990
## USPHOSPHOR 1.670762
## USHARDNESS 1.654423
## SEGFLOWSTA 1.947498
## SpecMALF 2.302771
## FRE3 1.682966
##
##
## Variance inflation factors
##
## GVIF
## T1NativeVeg 2.635212
## T1ExoticVeg 1.193987
## T2PastoralHeavy 2.776451
## T1Urban 1.294689
## maxrateToQ50 1.141059
## ORDER_ 2.366075
## DSDIST2COA 1.546134
## SEGRIPSHAD 2.887348
## SEGJANAIRT 1.269630
## USCALCIUM 1.246037
## USHARDNESS 1.650598
## SEGFLOWSTA 1.942885
## SpecMALF 2.289884
## FRE3 1.681654
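The drop-and-recompute loop described above can be sketched as follows. This is a minimal Python/NumPy illustration on synthetic data, not the code that produced the output above: the function names `vif` and `drop_by_vif` and the toy variables `a`, `b`, `c`, `d` are hypothetical. It assumes purely numeric predictors, for which the GVIF reduces to the ordinary VIF, i.e. the diagonal of the inverse correlation matrix.

```python
import numpy as np

def vif(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_by_vif(X, names, threshold=3.0):
    """Drop the highest-VIF column until every VIF is below `threshold`."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            return names, v
        # Remove the most collinear predictor, then recompute all VIFs
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names, np.ones(1)  # a single remaining predictor has VIF 1

# Synthetic example (assumption): c is almost exactly a + b
rng = np.random.default_rng(1)
a, b, d = rng.normal(size=(3, 500))
c = a + b + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, c, d])

kept, final_vif = drop_by_vif(X, ["a", "b", "c", "d"], threshold=3.0)
print(kept, np.round(final_vif, 2))
```

Recomputing after every drop matters because removing one predictor changes the VIFs of all the others, as the successive outputs above show.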
Scatter plots of each metric against every non-collinear predictor were used to visualize relationships. The blue lines are GAM fits, added to help visualize trends.
Multiple regression was conducted on each of the 21 selected transformed and scaled metrics using all non-collinear variables: T1NativeVeg, T1ExoticVeg, T2PastoralHeavy, T1Urban, maxrateToQ50, ORDER_, DSDIST2COA, SEGRIPSHAD, SEGJANAIRT, USCALCIUM, USHARDNESS, SEGFLOWSTA, SpecMALF, and FRE3. Predictor variables were selected by backward selection based on the Bayesian information criterion (BIC).
| metric | r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X13b | 0.12 | 0.12 | 0.94 | 164.19 | 0 | 7 | -11908.84 | 23833.69 | 23890.32 | 7756.19 | 8767 |
| X13d | 0.12 | 0.12 | 0.94 | 153.38 | 0 | 8 | -11874.54 | 23767.08 | 23830.80 | 7695.78 | 8766 |
| X10b | 0.16 | 0.16 | 0.92 | 171.28 | 0 | 10 | -11666.12 | 23354.25 | 23432.12 | 7338.72 | 8764 |
| X1b | 0.22 | 0.22 | 0.88 | 223.46 | 0 | 11 | -11364.57 | 22753.14 | 22838.09 | 6851.21 | 8763 |
| chl_pct_richness_decreaser | 0.22 | 0.22 | 0.88 | 275.41 | 0 | 9 | -11356.74 | 22733.47 | 22804.27 | 6838.99 | 8765 |
| sed_pct_richness_increaser | 0.25 | 0.25 | 0.86 | 329.56 | 0 | 9 | -11170.57 | 22361.14 | 22431.94 | 6554.85 | 8765 |
| X11b | 0.31 | 0.31 | 0.83 | 442.06 | 0 | 9 | -10807.39 | 21634.79 | 21705.58 | 6034.06 | 8765 |
| EPTrich | 0.32 | 0.32 | 0.83 | 510.19 | 0 | 8 | -10772.25 | 21562.49 | 21626.21 | 5985.91 | 8766 |
| chl_pct_richness_increaser | 0.33 | 0.33 | 0.82 | 354.15 | 0 | 12 | -10714.52 | 21455.05 | 21547.08 | 5907.67 | 8762 |
| pEPTabund | 0.33 | 0.33 | 0.82 | 426.24 | 0 | 10 | -10710.60 | 21443.19 | 21521.07 | 5902.38 | 8764 |
| X6b | 0.33 | 0.33 | 0.82 | 544.18 | 0 | 8 | -10680.35 | 21378.71 | 21442.42 | 5861.83 | 8766 |
| X8a | 0.35 | 0.35 | 0.81 | 578.98 | 0 | 8 | -10588.25 | 21194.51 | 21258.22 | 5740.05 | 8766 |
| X3b | 0.36 | 0.36 | 0.80 | 538.17 | 0 | 9 | -10519.30 | 21058.59 | 21129.39 | 5650.53 | 8765 |
| X3c | 0.37 | 0.37 | 0.79 | 574.57 | 0 | 9 | -10414.94 | 20849.88 | 20920.68 | 5517.71 | 8765 |
| pEPTrich | 0.38 | 0.38 | 0.79 | 541.29 | 0 | 10 | -10339.29 | 20700.57 | 20778.45 | 5423.37 | 8764 |
| sed_MCI_like | 0.39 | 0.39 | 0.78 | 706.56 | 0 | 8 | -10259.34 | 20536.68 | 20600.39 | 5329.49 | 8761 |
| X6c | 0.42 | 0.42 | 0.76 | 779.66 | 0 | 8 | -10091.73 | 20201.47 | 20265.18 | 5125.81 | 8766 |
| chl_MCI_like | 0.42 | 0.42 | 0.76 | 578.91 | 0 | 11 | -10040.84 | 20105.69 | 20190.63 | 5073.40 | 8754 |
| X7b | 0.42 | 0.42 | 0.76 | 637.48 | 0 | 10 | -10051.28 | 20124.57 | 20202.44 | 5078.76 | 8764 |
| sed_pct_richness_decreaser | 0.47 | 0.47 | 0.73 | 849.37 | 0 | 9 | -9698.24 | 19416.48 | 19487.28 | 4686.06 | 8765 |
| MCI_hb | 0.48 | 0.48 | 0.72 | 633.26 | 0 | 13 | -9542.82 | 19113.65 | 19212.76 | 4522.96 | 8761 |
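As an illustration of backward selection by BIC, here is a minimal Python/NumPy sketch on synthetic data. It is not the routine used for the table above; `ols_bic`, `backward_bic`, and the predictors `x1` to `x4` are hypothetical names. The BIC is computed as -2 logLik + k ln(n) for a Gaussian linear model, counting the regression coefficients plus the residual variance in k.

```python
import numpy as np

def ols_bic(X, y, cols):
    """BIC of an intercept + X[:, cols] least-squares fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, np.array(cols, dtype=int)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    k = A.shape[1] + 1  # coefficients plus the residual variance
    return -2 * loglik + k * np.log(n)

def backward_bic(X, y, names):
    """Remove one predictor at a time for as long as the BIC decreases."""
    keep = list(range(X.shape[1]))
    best = ols_bic(X, y, keep)
    while keep:
        # BIC of every model that omits exactly one current predictor
        trials = [(ols_bic(X, y, [c for c in keep if c != j]), j) for j in keep]
        bic, drop = min(trials)
        if bic >= best:
            break  # no single removal improves the BIC
        best, keep = bic, [c for c in keep if c != drop]
    return [names[c] for c in keep], best

# Synthetic example (assumption): only x1 and x2 affect y
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=400)
names = ["x1", "x2", "x3", "x4"]

full_bic = ols_bic(X, y, list(range(4)))
selected, final_bic = backward_bic(X, y, names)
print(selected, round(final_bic, 2), round(full_bic, 2))
```

Because the penalty per parameter is ln(n) rather than 2, BIC tends to retain fewer predictors than AIC when n is large.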
Coefficient plots were used to visualize the regression estimates for each metric. The error bars are 95% confidence intervals. Because metrics and predictors were centered and scaled, effect sizes (coefficients) are directly comparable.
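The point about comparability can be checked with a small Python/NumPy sketch on synthetic data (not the report's code; `standardized_fit` is a hypothetical helper). Both the response and the predictors are z-scored before fitting, so the coefficients do not depend on the original units. The interval uses the normal quantile 1.96 as an approximation, which is an assumption that is reasonable only for large samples.

```python
import numpy as np

def standardized_fit(X, y):
    """OLS on z-scored data: slopes and 95% CI half-widths."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    n, p = Xz.shape
    A = np.column_stack([np.ones(n), Xz])
    beta, *_ = np.linalg.lstsq(A, yz, rcond=None)
    resid = yz - A @ beta
    sigma2 = resid @ resid / (n - p - 1)        # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    half = 1.96 * se                            # normal approximation
    return beta[1:], half[1:]                   # drop the intercept

# Synthetic example (assumption): the first predictor has the larger effect
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=300)

coef, half = standardized_fit(X, y)
# Changing the units of the predictors leaves the scaled coefficients unchanged
coef_rescaled, _ = standardized_fit(X * np.array([1000.0, 0.01]), y)
for b, h in zip(coef, half):
    print(f"{b:.3f} [{b - h:.3f}, {b + h:.3f}]")
```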
Residual plots look fine, showing no evident patterns.
Partial regression plots show the effect of each predictor while holding the other variables in the model constant (i.e., at their medians). Each plot contains a confidence band, a prediction line, and partial residuals.
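The quantities behind such a plot can be sketched as follows, in Python/NumPy on synthetic data. This is an assumption about one common convention, not the report's plotting code: the prediction line varies one predictor while the others sit at their medians, and each partial residual is that conditional prediction plus the model residual. `partial_effect` is a hypothetical helper, and the confidence band is omitted for brevity.

```python
import numpy as np

def partial_effect(X, y, j, grid_size=50):
    """Prediction line and partial residuals for predictor j."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    medians = np.median(X, axis=0)

    # Prediction line: vary predictor j, hold the rest at their medians
    grid = np.linspace(X[:, j].min(), X[:, j].max(), grid_size)
    ref = np.tile(medians, (grid_size, 1))
    ref[:, j] = grid
    line = beta[0] + ref @ beta[1:]

    # Partial residuals: conditional prediction at each observed x_j + residual
    at_obs = np.tile(medians, (n, 1))
    at_obs[:, j] = X[:, j]
    partial_resid = beta[0] + at_obs @ beta[1:] + resid
    return grid, line, partial_resid, beta[j + 1]

# Synthetic example (assumption)
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 1.0 + 0.8 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.5, size=200)

grid, line, partial_resid, slope = partial_effect(X, y, j=0)
print(round(slope, 3))
```

In a linear model the prediction line has slope equal to the fitted coefficient, and a straight-line fit through the partial residuals recovers the same slope, which is why the scatter around the line is a useful check for non-linearity.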
Additionally, the last panel in each set of plots shows the hierarchical partitioning of R^2 values, which estimates the proportion of variance explained independently by each predictor.
## # A tibble: 21 x 2
## metric n
## <chr> <int>
## 1 chl_MCI_like 11
## 2 chl_pct_richness_decreaser 9
## 3 chl_pct_richness_increaser 12
## 4 EPTrich 8
## 5 MCI_hb 13
## 6 pEPTabund 10
## 7 pEPTrich 10
## 8 sed_MCI_like 8
## 9 sed_pct_richness_decreaser 9
## 10 sed_pct_richness_increaser 9
## # ... with 11 more rows
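The idea behind hierarchical partitioning of R^2 can be sketched in Python/NumPy as below. This is an illustration on synthetic data under stated assumptions, not the routine used in this report: `r_squared` and `hierarchical_partition` are hypothetical names. Each predictor's independent contribution is taken as its gain in R^2 when added to a subset of the other predictors, averaged within each subset size and then across sizes. With k predictors this requires fitting all 2^k - 1 non-empty subsets (2,047 regressions for 11 predictors, 4,095 for 12), so the cost grows quickly.

```python
import numpy as np
from itertools import combinations

def r_squared(X, y, cols):
    """R^2 of an intercept + X[:, cols] least-squares fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, np.array(cols, dtype=int)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

def hierarchical_partition(X, y):
    """Independent R^2 contribution of each predictor, plus the full R^2."""
    k = X.shape[1]
    # Fit every subset once (the empty subset is the intercept-only model)
    r2 = {s: r_squared(X, y, s)
          for size in range(k + 1)
          for s in combinations(range(k), size)}
    indep = np.zeros(k)
    for j in range(k):
        others = [c for c in range(k) if c != j]
        for size in range(k):
            # Gain in R^2 from adding j to each subset of this size
            gains = [r2[tuple(sorted(s + (j,)))] - r2[s]
                     for s in combinations(others, size)]
            indep[j] += np.mean(gains) / k
    return indep, r2[tuple(range(k))]

# Synthetic example (assumption): effects of decreasing strength
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)

indep, full_r2 = hierarchical_partition(X, y)
print(np.round(indep, 3), round(full_r2, 3))
```

The independent contributions add up to the R^2 of the full model, so they can be reported as shares of the explained variance.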