For this project, we will explore hydric soils in six geographically contrasting wildland sites:
| Soil Survey Area | County or Park | State |
|---|---|---|
| TN640 | GSMNP | TN |
| TN059 | GSMNP | TN |
| TN609 | Blount County | TN |
| TN608 | Sevier County | TN |
| TN606 | Cocke County | TN |
| TN113 | Madison County | TN |
| TN093 | Knox County | TN |
| TN602 | Knox County | TN |
| NC009 | Ashe County | NC |
| NC101 | Johnston County | NC |
| NC099 | Jackson County | NC |
Overarching Objective:
Hypotheses
Checking for overly correlated variables
Looks like we found two perfectly (100%) correlated variables! fragvol_r and total_frags_pct carry the same information, so we drop fragvol_r and keep total_frags_pct.
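Here is a minimal sketch of how a check like this could be done in base R. The data frame `demo` is simulated (the real horizon table is not shown here), and the 0.99 cutoff is an assumption chosen just for illustration.

```r
# Simulated stand-in data; column names mirror the report, values are made up.
set.seed(1)
n <- 200
demo <- data.frame(
  sandtotal_r = runif(n, 5, 80),
  om_r        = runif(n, 0.5, 10),
  fragvol_r   = runif(n, 0, 60)
)
# Copy one column to mimic a perfectly correlated pair.
demo$total_frags_pct <- demo$fragvol_r

cor_mat <- cor(demo)
# Blank out the lower triangle and diagonal so each pair is reported once.
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA

# Assumed cutoff: flag pairs with |r| > 0.99.
high <- which(abs(cor_mat) > 0.99, arr.ind = TRUE)
pairs_found <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
print(pairs_found)

# Drop the first member of each flagged pair and keep the second.
drop_vars <- unique(pairs_found$var1)
demo_reduced <- demo[, setdiff(names(demo), drop_vars), drop = FALSE]
print(names(demo_reduced))
```

Which member of a pair to keep is a judgment call; here the sketch simply keeps the second one.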
Now let’s fit a linear model and then take a look at the residuals.
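As a rough sketch of this step, the block below fits a weighted linear model with base R's `lm()` and pulls out the residuals. It runs on simulated data, so the numbers it prints will not match the output in this report. Using `hzdepb_r` as the weight follows the model call shown later in the report; the coefficients used to simulate `awc_r` are made up.

```r
# Simulated stand-in data; values are invented for illustration only.
set.seed(42)
n <- 300
demo <- data.frame(
  dbthirdbar_r    = runif(n, 0.8, 1.8),
  om_r            = runif(n, 0.5, 10),
  total_frags_pct = runif(n, 0, 60),
  hzdepb_r        = sample(10:200, n, replace = TRUE)
)
demo$awc_r <- 0.23 - 0.07 * demo$dbthirdbar_r + 0.005 * demo$om_r -
  0.001 * demo$total_frags_pct + rnorm(n, sd = 0.02)

# Weighted least squares: deeper horizons get more weight.
fit <- lm(awc_r ~ dbthirdbar_r + om_r + total_frags_pct,
          data = demo, weights = hzdepb_r)

# Raw residuals (observed minus fitted), one per row.
res <- residuals(fit)
print(head(res))
print(summary(res))
```

A plot of `fitted(fit)` against `res` is the usual next look; any funnel shape or curve there would be worth chasing before moving on.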
## 1 2 3 4 5 6
## -0.007163786 0.011648340 -0.033175586 0.039100242 0.036794693 0.046244578
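That could look worse! Let’s keep going and check the predictors for multicollinearity. The first table below appears to list variance inflation factors, and the second flags the high ones.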
## dbthirdbar_r sandtotal_r claytotal_r silttotal_r om_r
## 1.077448 18.493864 11.673226 12.765997 1.460983
## total_frags_pct ksat_r
## 1.100736 1.322543
## dbthirdbar_r sandtotal_r claytotal_r silttotal_r om_r
## FALSE TRUE TRUE TRUE FALSE
## total_frags_pct ksat_r
## FALSE FALSE
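If the values above are variance inflation factors, they can be reproduced by hand: regress each predictor on all the others and take 1 / (1 - R²). The sketch below does that on simulated data where sand, silt, and clay nearly sum to 100, which is the kind of setup that inflates VIFs. The helper `vif_manual` and the cutoff of 10 are assumptions for illustration, not necessarily what was used in this report.

```r
# Simulated texture data: sand + clay + silt is close to 100 by construction.
set.seed(3)
n <- 300
demo <- data.frame(
  sandtotal_r = runif(n, 10, 60),
  claytotal_r = runif(n, 5, 35),
  om_r        = runif(n, 0.5, 10)
)
demo$silttotal_r <- 100 - demo$sandtotal_r - demo$claytotal_r + rnorm(n, sd = 1)

# Hypothetical helper: VIF_j = 1 / (1 - R^2 of predictor j on the others).
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(demo)
print(round(vifs, 2))

# Assumed rule of thumb: flag VIF above 10.
print(vifs > 10)
```

This version is unweighted, so it would not exactly match a VIF computed from a weighted fit.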
Okay, but do we need all these variables? Time for a backward stepwise selection.
##
## Backwards Step-down - Original Model
##
## Deleted Chi-Sq d.f. P Residual d.f. P AIC R2
## sandtotal_r 0.4 1 0.5279 0.4 1 0.5279 -1.6 794.3
##
## Approximate Estimates after Deleting Factors
##
## Coef S.E. Wald Z P
## Intercept 0.2323586 1.024e-02 22.692 0.000e+00
## dbthirdbar_r -0.0729788 6.818e-03 -10.704 0.000e+00
## claytotal_r 0.0003522 5.584e-05 6.308 2.830e-10
## silttotal_r 0.0005533 4.214e-05 13.129 0.000e+00
## om_r 0.0053320 4.424e-04 12.052 0.000e+00
## total_frags_pct -0.0011769 3.699e-05 -31.819 0.000e+00
## ksat_r -0.0002027 3.486e-05 -5.815 6.078e-09
##
## Factors in Final Model
##
## [1] dbthirdbar_r claytotal_r silttotal_r om_r
## [5] total_frags_pct ksat_r
## index.orig training test optimism index.corrected Lower Upper n
## R-square 0.5112 0.5210 0.4948 0.0262 0.4851 0.3589 0.5461 40
## MSE 0.0009 0.0009 0.0009 -0.0001 0.0010 0.0008 0.0012 40
## g 0.0325 0.0326 0.0316 0.0009 0.0315 0.0248 0.0339 40
## Intercept 0.0000 0.0000 0.0036 -0.0036 0.0036 -0.0089 0.0298 40
## Slope 1.0000 1.0000 0.9742 0.0258 0.9742 0.7745 1.0605 40
##
## Factors Retained in Backwards Elimination
##
## [Retention matrix: 40 resamples x 7 candidate factors (dbthirdbar_r, sandtotal_r, claytotal_r, silttotal_r, om_r, total_frags_pct, ksat_r); column alignment not preserved]
##
## Frequencies of Numbers of Factors Retained
##
## 4 5 6 7
## 1 9 12 18
Alright, let’s go ahead and drop sand (sandtotal_r) based on these results: the step-down on the original model deleted it (P = 0.5279).
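For anyone who wants to try backward elimination without extra packages, here is a sketch using base R's `step()`, which drops terms by AIC. This is a swapped-in technique, not the procedure that produced the step-down output above, and the data (including the pure-noise column `noise_var`) are simulated.

```r
# Simulated data with three real predictors and one pure-noise predictor.
set.seed(123)
n <- 400
demo <- data.frame(
  dbthirdbar_r    = runif(n, 0.8, 1.8),
  om_r            = runif(n, 0.5, 10),
  total_frags_pct = runif(n, 0, 60),
  noise_var       = rnorm(n)   # hypothetical column, unrelated to the response
)
demo$awc_r <- 0.23 - 0.07 * demo$dbthirdbar_r + 0.005 * demo$om_r -
  0.001 * demo$total_frags_pct + rnorm(n, sd = 0.02)

full_fit <- lm(awc_r ~ dbthirdbar_r + om_r + total_frags_pct + noise_var,
               data = demo)

# Backward elimination by AIC; trace = 0 silences the step-by-step log.
reduced_fit <- step(full_fit, direction = "backward", trace = 0)

kept    <- attr(terms(reduced_fit), "term.labels")
dropped <- setdiff(attr(terms(full_fit), "term.labels"), kept)
print(kept)
print(dropped)
```

AIC-based selection keeps a pure-noise variable some of the time, so a single run is weak evidence; repeating the selection over resamples, as in the retention frequencies above, gives a better picture.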
Now we can fit our final model.
## Linear Regression Model
##
## ols(formula = awc_r ~ dbthirdbar_r + claytotal_r + silttotal_r +
## om_r + total_frags_pct + ksat_r, data = dat, weights = dat$hzdepb_r,
## x = TRUE, y = TRUE)
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 2730 LR chi2 1851.77 R2 0.493
## sigma0.2688 d.f. 6 R2 adj 0.491
## d.f. 2723 Pr(> chi2) 0.0000 g 0.031
##
## Residuals
##
## Min 1Q Median 3Q Max
## -0.427916 -0.016431 0.001849 0.019031 0.283423
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 0.2324 0.0102 22.69 <0.0001
## dbthirdbar_r -0.0730 0.0068 -10.71 <0.0001
## claytotal_r 0.0004 0.0001 6.31 <0.0001
## silttotal_r 0.0006 0.0000 13.13 <0.0001
## om_r 0.0053 0.0004 12.05 <0.0001
## total_frags_pct -0.0012 0.0000 -31.82 <0.0001
## ksat_r -0.0002 0.0000 -5.82 <0.0001
## index.orig training test optimism index.corrected Lower Upper n
## R-square 0.4340 0.5125 0.4729 0.0396 0.3944 -0.1249 0.6285 10
## MSE 0.0011 0.0009 0.0010 -0.0001 0.0011 0.0005 0.0017 10
## g 0.0309 0.0325 0.0321 0.0004 0.0305 0.0112 0.0415 10
## Intercept 0.0000 0.0000 0.0005 -0.0005 0.0005 -0.0480 0.1227 10
## Slope 1.0000 1.0000 0.9952 0.0048 0.9952 0.1415 1.3376 10
## [1] 1.527482
## [1] 0.03250423
## [1] 0.433991
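The validation tables follow an optimism-correction pattern: optimism is training minus test, and index.corrected is index.orig minus optimism (for example, 0.4340 - 0.0396 = 0.3944 for R-square). Below is a hand-rolled bootstrap sketch of that idea for R² on simulated data. The helper `r2` and the choice of 40 resamples are assumptions for illustration, not the exact procedure behind the tables.

```r
# Simulated data with a known linear signal.
set.seed(7)
n <- 200
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$y <- 1 + 0.5 * demo$x1 - 0.3 * demo$x2 + rnorm(n)

# Hypothetical helper: R^2 of a fitted model evaluated on a given data set.
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

full_fit   <- lm(y ~ x1 + x2, data = demo)
index_orig <- r2(full_fit, demo)

B <- 40  # number of bootstrap resamples (assumed)
optimism <- replicate(B, {
  boot     <- demo[sample(n, replace = TRUE), ]
  boot_fit <- lm(y ~ x1 + x2, data = boot)
  # "training" R^2 on the resample minus "test" R^2 on the original data
  r2(boot_fit, boot) - r2(boot_fit, demo)
})

index_corrected <- index_orig - mean(optimism)
print(c(index.orig = index_orig,
        optimism = mean(optimism),
        index.corrected = index_corrected))
```

With only a few resamples the optimism estimate is noisy, which is one reason confidence limits on a corrected index can come out wide.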
Final Model Accuracy
Anova
## Analysis of Variance Response: awc_r
##
## Factor d.f. Partial SS MS F P
## dbthirdbar_r 1 8.278434 8.27843379 114.61 <.0001
## claytotal_r 1 2.874617 2.87461730 39.80 <.0001
## silttotal_r 1 12.454132 12.45413192 172.42 <.0001
## om_r 1 10.494479 10.49447883 145.29 <.0001
## total_frags_pct 1 73.145590 73.14558984 1012.66 <.0001
## ksat_r 1 2.442645 2.44264453 33.82 <.0001
## REGRESSION 6 190.888928 31.81482137 440.46 <.0001
## ERROR 2723 196.685176 0.07223106
Partial R2
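Total fragment percentage (total_frags_pct) accounts for the most variance: it has by far the largest partial sum of squares (73.15) and F statistic (1012.66) in the ANOVA table.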
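To put numbers on that, the sketch below recomputes partial R² straight from the ANOVA table, assuming partial R² means each factor's partial SS divided by the total SS (regression plus error). The sums of squares are copied from the table above; nothing is refit.

```r
# Partial sums of squares copied from the ANOVA table above.
partial_ss <- c(
  dbthirdbar_r    = 8.278434,
  claytotal_r     = 2.874617,
  silttotal_r     = 12.454132,
  om_r            = 10.494479,
  total_frags_pct = 73.145590,
  ksat_r          = 2.442645
)
ss_regression <- 190.888928
ss_error      <- 196.685176
ss_total      <- ss_regression + ss_error

# Assumed definition: partial R^2 = partial SS / total SS.
partial_r2 <- sort(partial_ss / ss_total, decreasing = TRUE)
print(round(partial_r2, 4))

# Sanity check: regression SS / total SS should match the model R2.
model_r2 <- ss_regression / ss_total
print(round(model_r2, 3))
```

The partial values do not add up to the model R² because the predictors are correlated and share some of the explained variance.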
Model Effects
Plot Effects