1. Data exploration

Use Cleveland dot plots, boxplots and histograms to check for outliers and to examine the distributions of both metrics and predictor variables.
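
As a rough illustration of what a boxplot flags as an outlier, the sketch below applies the usual boxplot convention (points more than 1.5 interquartile ranges beyond the quartiles). It is a stand-alone Python example, not the code used for this analysis; the function name and the `k = 1.5` multiplier are assumptions, and quartile conventions differ slightly between packages.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    # method="inclusive" interpolates quartiles between the sample min and max.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical data: one value far from the rest.
flagged = boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(flagged)
```

A flagged point is only a candidate for inspection; whether it is an error or a genuine extreme still has to be judged from the data.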

2. Scale, centre and transform metrics and predictors using the Yeo-Johnson transformation
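
The Yeo-Johnson transformation is a power transform that, unlike Box-Cox, accepts zero and negative values. The Python sketch below shows the transform for a given `lam` followed by centring and scaling. It assumes the transformation is applied before centring and scaling, and it takes `lam` as a fixed, hypothetical input; in practice the parameter is usually estimated from the data for each variable, and the report does not say how that was done here.

```python
import math
from statistics import mean, stdev

def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value for a given lambda."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)

def transform_and_scale(values, lam):
    """Apply Yeo-Johnson, then centre to mean 0 and scale to unit SD."""
    transformed = [yeo_johnson(v, lam) for v in values]
    centre = mean(transformed)
    spread = stdev(transformed)
    return [(v - centre) / spread for v in transformed]
```

With `lam = 1` the transform leaves values unchanged, which is a convenient sanity check.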

3. Correlograms of predictors and metric variables
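
A correlogram is a plot of the pairwise correlation matrix. The Python sketch below computes Pearson correlations and lists the pairs above a cutoff. It is illustrative only: the function names and the `cutoff = 0.7` value are assumptions, and the report does not state which correlation coefficient was plotted.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strongly_correlated(data, cutoff=0.7):
    """List (name_a, name_b, r) for every pair with |r| >= cutoff."""
    names = list(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) >= cutoff:
                pairs.append((a, b, r))
    return pairs
```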

5. Variance inflation factor (VIF)

Collinearity is the existence of correlation between covariates. One strategy for addressing this problem is to sequentially drop the covariate with the highest VIF, recalculate the VIFs and repeat this process until all VIFs are smaller than a pre-selected threshold. In this case, a more stringent threshold of VIF < 3 was used in order to reduce the large number of predictors. The tables below show the VIFs recalculated after each successive removal.

## 
## 
## Variance inflation factors
## 
##                      GVIF
## T1NativeVeg      3.079820
## T1ExoticVeg      1.290736
## T2PastoralHeavy  3.027416
## T1Urban          1.326437
## maxrateToQ50     1.151006
## ORDER_           2.664446
## ELEVATION        3.295385
## DSDIST2COA       2.962276
## SEGRIPSHAD       3.012890
## SEGJANAIRT       1.777061
## SEGMINTNOR       2.054681
## USAVGSLOPE       3.760013
## USCALCIUM        1.455501
## USPHOSPHOR       1.775200
## USHARDNESS       2.141925
## SEGFLOWSTA       2.159379
## SpecMeanF       15.073455
## SpecMALF        10.097130
## FRE3             4.632543
## 
## 
## Variance inflation factors
## 
##                     GVIF
## T1NativeVeg     3.066335
## T1ExoticVeg     1.288759
## T2PastoralHeavy 3.026443
## T1Urban         1.326086
## maxrateToQ50    1.149860
## ORDER_          2.619052
## ELEVATION       3.247737
## DSDIST2COA      2.944875
## SEGRIPSHAD      3.009216
## SEGJANAIRT      1.690946
## SEGMINTNOR      2.051317
## USAVGSLOPE      3.680068
## USCALCIUM       1.439506
## USPHOSPHOR      1.769846
## USHARDNESS      2.129066
## SEGFLOWSTA      2.148366
## SpecMALF        2.377520
## FRE3            2.146377
## 
## 
## Variance inflation factors
## 
##                     GVIF
## T1NativeVeg     2.761922
## T1ExoticVeg     1.270000
## T2PastoralHeavy 2.923182
## T1Urban         1.309640
## maxrateToQ50    1.148572
## ORDER_          2.510949
## ELEVATION       3.157824
## DSDIST2COA      2.692045
## SEGRIPSHAD      2.973396
## SEGJANAIRT      1.688929
## SEGMINTNOR      1.807952
## USCALCIUM       1.429484
## USPHOSPHOR      1.674130
## USHARDNESS      1.693608
## SEGFLOWSTA      2.137972
## SpecMALF        2.332969
## FRE3            2.060702
## 
## 
## Variance inflation factors
## 
##                     GVIF
## T1NativeVeg     2.752517
## T1ExoticVeg     1.269974
## T2PastoralHeavy 2.892895
## T1Urban         1.305398
## maxrateToQ50    1.141453
## ORDER_          2.368539
## DSDIST2COA      1.873960
## SEGRIPSHAD      2.931674
## SEGJANAIRT      1.535771
## SEGMINTNOR      1.775643
## USCALCIUM       1.429363
## USPHOSPHOR      1.673898
## USHARDNESS      1.664648
## SEGFLOWSTA      1.984227
## SpecMALF        2.331645
## FRE3            2.049462
## 
## 
## Variance inflation factors
## 
##                     GVIF
## T1NativeVeg     2.696642
## T1ExoticVeg     1.269892
## T2PastoralHeavy 2.790788
## T1Urban         1.303773
## maxrateToQ50    1.141059
## ORDER_          2.368225
## DSDIST2COA      1.664225
## SEGRIPSHAD      2.922107
## SEGJANAIRT      1.531252
## USCALCIUM       1.422990
## USPHOSPHOR      1.670762
## USHARDNESS      1.654423
## SEGFLOWSTA      1.947498
## SpecMALF        2.302771
## FRE3            1.682966
## 
## 
## Variance inflation factors
## 
##                     GVIF
## T1NativeVeg     2.635212
## T1ExoticVeg     1.193987
## T2PastoralHeavy 2.776451
## T1Urban         1.294689
## maxrateToQ50    1.141059
## ORDER_          2.366075
## DSDIST2COA      1.546134
## SEGRIPSHAD      2.887348
## SEGJANAIRT      1.269630
## USCALCIUM       1.246037
## USHARDNESS      1.650598
## SEGFLOWSTA      1.942885
## SpecMALF        2.289884
## FRE3            1.681654
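
The sequential procedure described in this step can be sketched as follows. This is a stand-alone Python illustration under stated assumptions, not the code that produced the tables above: it computes each VIF as the corresponding diagonal element of the inverse correlation matrix, which equals 1 / (1 - R²) from regressing that predictor on the others. For numeric predictors with one degree of freedom, the GVIF reported above coincides with this ordinary VIF. All function names are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    centred = [[v - sum(col) / n for v in col] for col in columns]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(data):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(data)
    inverse = invert(correlation_matrix([data[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}

def drop_by_vif(data, threshold=3.0):
    """Drop the highest-VIF predictor until all VIFs are below threshold."""
    kept = dict(data)
    while len(kept) > 1:
        current = vifs(kept)
        worst = max(current, key=current.get)
        if current[worst] < threshold:
            break
        del kept[worst]
    return list(kept)
```

Recomputing all VIFs after every removal matters because dropping one covariate changes the VIFs of those that were correlated with it, as the drop in SpecMALF from 10.1 to 2.4 after removing SpecMeanF shows.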

6. Explore metric vs. predictor relationships

Plot scatter plots of each metric vs. all non-collinear predictors to visualize relationships. The blue lines were fitted using a generalized additive model (GAM) to help visualize trends.

7. Linear models, summary table and coefficient plot

Multiple regression was conducted on each of the 21 selected transformed and scaled metrics using all non-collinear predictors: T1NativeVeg, T1ExoticVeg, T2PastoralHeavy, T1Urban, maxrateToQ50, ORDER_, DSDIST2COA, SEGRIPSHAD, SEGJANAIRT, USCALCIUM, USHARDNESS, SEGFLOWSTA, SpecMALF and FRE3. Predictor selection was done by backward selection based on the Bayesian information criterion (BIC).

metric                      r.squared  adj.r.squared  sigma  statistic  p.value  df     logLik       AIC       BIC  deviance  df.residual
X13b                             0.12           0.12   0.94     164.19        0   7  -11908.84  23833.69  23890.32   7756.19         8767
X13d                             0.12           0.12   0.94     153.38        0   8  -11874.54  23767.08  23830.80   7695.78         8766
X10b                             0.16           0.16   0.92     171.28        0  10  -11666.12  23354.25  23432.12   7338.72         8764
X1b                              0.22           0.22   0.88     223.46        0  11  -11364.57  22753.14  22838.09   6851.21         8763
chl_pct_richness_decreaser       0.22           0.22   0.88     275.41        0   9  -11356.74  22733.47  22804.27   6838.99         8765
sed_pct_richness_increaser       0.25           0.25   0.86     329.56        0   9  -11170.57  22361.14  22431.94   6554.85         8765
X11b                             0.31           0.31   0.83     442.06        0   9  -10807.39  21634.79  21705.58   6034.06         8765
EPTrich                          0.32           0.32   0.83     510.19        0   8  -10772.25  21562.49  21626.21   5985.91         8766
chl_pct_richness_increaser       0.33           0.33   0.82     354.15        0  12  -10714.52  21455.05  21547.08   5907.67         8762
pEPTabund                        0.33           0.33   0.82     426.24        0  10  -10710.60  21443.19  21521.07   5902.38         8764
X6b                              0.33           0.33   0.82     544.18        0   8  -10680.35  21378.71  21442.42   5861.83         8766
X8a                              0.35           0.35   0.81     578.98        0   8  -10588.25  21194.51  21258.22   5740.05         8766
X3b                              0.36           0.36   0.80     538.17        0   9  -10519.30  21058.59  21129.39   5650.53         8765
X3c                              0.37           0.37   0.79     574.57        0   9  -10414.94  20849.88  20920.68   5517.71         8765
pEPTrich                         0.38           0.38   0.79     541.29        0  10  -10339.29  20700.57  20778.45   5423.37         8764
sed_MCI_like                     0.39           0.39   0.78     706.56        0   8  -10259.34  20536.68  20600.39   5329.49         8761
X6c                              0.42           0.42   0.76     779.66        0   8  -10091.73  20201.47  20265.18   5125.81         8766
chl_MCI_like                     0.42           0.42   0.76     578.91        0  11  -10040.84  20105.69  20190.63   5073.40         8754
X7b                              0.42           0.42   0.76     637.48        0  10  -10051.28  20124.57  20202.44   5078.76         8764
sed_pct_richness_decreaser       0.47           0.47   0.73     849.37        0   9   -9698.24  19416.48  19487.28   4686.06         8765
MCI_hb                           0.48           0.48   0.72     633.26        0  13   -9542.82  19113.65  19212.76   4522.96         8761
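
Backward selection by BIC, as described in this step, can be sketched as follows. This is a stand-alone Python illustration, not the code behind the table above. It assumes ordinary least squares with an intercept and uses BIC up to an additive constant, n·ln(RSS/n) + p·ln(n), which is enough for comparing models fitted to the same data. The function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def bic(y, data, names):
    """BIC (up to a constant) of an OLS fit of y on intercept + names."""
    n = len(y)
    rows = [[1.0] + [data[name][i] for name in names] for i in range(n)]
    p = len(names) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return n * math.log(rss / n) + p * math.log(n)

def backward_select(y, data):
    """Repeatedly drop the predictor whose removal lowers BIC the most."""
    kept = list(data)
    best = bic(y, data, kept)
    while kept:
        candidates = [(bic(y, data, [k for k in kept if k != d]), d)
                      for d in kept]
        score, drop = min(candidates)
        if score >= best:
            break  # no single removal improves BIC any further
        best = score
        kept.remove(drop)
    return kept
```

Because the BIC penalty per parameter is ln(n), roughly 9.1 for the sample sizes in the table, a predictor has to reduce the residual sum of squares appreciably to be retained.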

8. Coefficient plots

Coefficient plots visualize the regression estimates for each metric. The error bars are 95% confidence intervals. Because metrics and predictors were centred and scaled, effect sizes (coefficients) are directly comparable.
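
As a minimal illustration of how such an error bar is formed, the Python sketch below computes a slope and its 95% confidence interval. It is deliberately simplified to a single predictor, whereas the models here have several, and it uses the normal quantile (about 1.96) instead of the t quantile; with residual degrees of freedom in the thousands the two are practically identical. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean

def slope_with_ci(x, y, level=0.95):
    """OLS slope of y on x with a large-sample confidence interval."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    # Standard error of the slope: residual variance divided by Sxx.
    se = math.sqrt(rss / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * se, slope + z * se
```

An interval that excludes zero corresponds to a coefficient that differs from zero at the matching significance level.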

9. Model validation by inspection of residual plots

Residual plots look fine, showing no evident patterns.

10. Partial effects plots and variance partitioning

Partial regression plots show the effect of each predictor while holding the other variables in the model constant (i.e. at their medians). Each plot contains a confidence band, a prediction line and partial residuals.

Additionally, the last panel on each set of plots shows the hierarchical partitioning of R^2 values to determine the proportion of variance explained independently by each predictor.
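
Hierarchical partitioning can be sketched as follows. For each predictor, the gain in R^2 from adding it to every subset of the other predictors is averaged within each subset size and then across sizes; the resulting independent contributions sum to the R^2 of the full model. This requires fitting every subset of the k predictors (2^k models counting the intercept-only one), so the cost grows quickly with k. The Python code below is a stand-alone illustration with hypothetical function names, not the routine used for the plots.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(y, data, names):
    """R^2 of an OLS fit of y on an intercept plus the named predictors."""
    if not names:
        return 0.0
    n = len(y)
    my = sum(y) / n
    tss = sum((v - my) ** 2 for v in y)
    rows = [[1.0] + [data[name][i] for name in names] for i in range(n)]
    p = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return 1.0 - rss / tss

def independent_contributions(y, data):
    """Independent R^2 contribution of each predictor."""
    names = list(data)
    k = len(names)
    # Fit every subset once and cache its R^2.
    r2 = {frozenset(s): r_squared(y, data, s)
          for size in range(k + 1) for s in combinations(names, size)}
    result = {}
    for j in names:
        others = [m for m in names if m != j]
        level_means = []
        for size in range(k):
            gains = [r2[frozenset(s) | {j}] - r2[frozenset(s)]
                     for s in combinations(others, size)]
            level_means.append(sum(gains) / len(gains))
        result[j] = sum(level_means) / k
    return result
```

When predictors are uncorrelated, each independent contribution reduces to that predictor's own R^2; the averaging only matters where predictors share explained variance.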

## # A tibble: 21 x 2
##                        metric     n
##                         <chr> <int>
##  1               chl_MCI_like    11
##  2 chl_pct_richness_decreaser     9
##  3 chl_pct_richness_increaser    12
##  4                    EPTrich     8
##  5                     MCI_hb    13
##  6                  pEPTabund    10
##  7                   pEPTrich    10
##  8               sed_MCI_like     8
##  9 sed_pct_richness_decreaser     9
## 10 sed_pct_richness_increaser     9
## # ... with 11 more rows