STAT2020-final-project2

Soil Survey Area	County	State
TN640	GSMNP	TN
TN059	GSMNP	TN
TN609	Blount County	TN
TN608	Sevier County	TN
TN606	Cocke County	TN
TN113	Madison County	TN
TN093	Knox County	TN
TN602	Knox County	TN
NC009	Ashe County	NC
NC101	Johnston County	NC
NC099	Jackson County	NC

RESULTS AND DISCUSSION

Checking for overly correlated variables

It looks like we found two perfectly (100%) correlated variables: fragvol_r and total_frags_pct. Because they carry the same information, we drop fragvol_r and keep total_frags_pct.
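The check described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the code used for this report. The helper name `perfectly_correlated_pairs`, the 0.999 cutoff, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def perfectly_correlated_pairs(columns, threshold=0.999):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    names = list(columns)
    data = np.column_stack([columns[n] for n in names])
    corr = np.corrcoef(data, rowvar=False)  # pairwise Pearson correlations
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic stand-in data (NOT the project's soil data).
rng = np.random.default_rng(0)
frags = rng.uniform(0, 60, size=200)
toy = {
    "fragvol_r": frags,
    "total_frags_pct": frags.copy(),       # identical copy, so r = 1
    "om_r": rng.uniform(0, 10, size=200),  # unrelated column
}
pairs = perfectly_correlated_pairs(toy)
print(pairs)
```

Only one variable from each flagged pair needs to stay in the model, since the second adds no new information.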

Now let’s fit a linear model and then take a look at the residuals.

##            1            2            3            4            5            6 
## -0.007163786  0.011648340 -0.033175586  0.039100242  0.036794693  0.046244578


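That could look worse! The first six residuals are all within ±0.05 of zero. Let’s keep going and check for multicollinearity with variance inflation factors (VIF); the first table below gives each predictor's VIF and the second flags the predictors with a high VIF.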

##    dbthirdbar_r     sandtotal_r     claytotal_r     silttotal_r            om_r 
##        1.077448       18.493864       11.673226       12.765997        1.460983 
## total_frags_pct          ksat_r 
##        1.100736        1.322543

##    dbthirdbar_r     sandtotal_r     claytotal_r     silttotal_r            om_r 
##           FALSE            TRUE            TRUE            TRUE           FALSE 
## total_frags_pct          ksat_r 
##           FALSE           FALSE
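For readers who want to see where numbers like these come from, here is a minimal sketch of the textbook VIF definition, 1 / (1 - R²), where R² comes from regressing one predictor on all the others. It is written in Python with NumPy and runs on synthetic data. The function name `vif`, the toy texture data, and the cutoff of 10 are assumptions for illustration; the cutoff behind the TRUE/FALSE flags above is not stated in the report, and the routine that produced the output may compute VIF differently (for example, from the weighted fit).

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic texture data (NOT the project's data): silt is nearly
# determined by sand and clay because the fractions sum to about 100%.
rng = np.random.default_rng(1)
sand = rng.uniform(10, 70, size=500)
clay = rng.uniform(5, 25, size=500)
silt = 100 - sand - clay + rng.normal(0, 2, size=500)
om = rng.uniform(0, 10, size=500)

names = ["sand", "clay", "silt", "om"]
values = vif(np.column_stack([sand, clay, silt, om]))
for name, v in zip(names, values):
    print(f"{name}: VIF = {v:.2f}, above 10: {v > 10}")
```

In the toy data the sand and silt VIFs are large while the unrelated `om` column stays near 1, the same qualitative pattern as in the output above.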


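Sand, clay, and silt are all flagged for high VIF (18.5, 11.7, and 12.8), which is expected because the three texture fractions sum to roughly 100%. Do we need all of these variables? Time for a backward stepwise selection.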

## 
##      Backwards Step-down - Original Model
## 
##  Deleted     Chi-Sq d.f. P      Residual d.f. P      AIC  R2   
##  sandtotal_r 0.4    1    0.5279 0.4      1    0.5279 -1.6 794.3
## 
## Approximate Estimates after Deleting Factors
## 
##                       Coef      S.E.  Wald Z         P
## Intercept        0.2323586 1.024e-02  22.692 0.000e+00
## dbthirdbar_r    -0.0729788 6.818e-03 -10.704 0.000e+00
## claytotal_r      0.0003522 5.584e-05   6.308 2.830e-10
## silttotal_r      0.0005533 4.214e-05  13.129 0.000e+00
## om_r             0.0053320 4.424e-04  12.052 0.000e+00
## total_frags_pct -0.0011769 3.699e-05 -31.819 0.000e+00
## ksat_r          -0.0002027 3.486e-05  -5.815 6.078e-09
## 
## Factors in Final Model
## 
## [1] dbthirdbar_r    claytotal_r     silttotal_r     om_r           
## [5] total_frags_pct ksat_r
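The AIC column in the step-down table can be read with a short worked calculation. This assumes the usual convention for this kind of backward step-down, where the criterion for a deleted term is its chi-square statistic minus twice its degrees of freedom, and a negative value favors deletion. That reading is an assumption based on the numbers shown, which it reproduces:

```latex
\mathrm{AIC}_{\text{sandtotal\_r}} = \chi^2 - 2\,\mathrm{d.f.} = 0.4 - 2(1) = -1.6
```

Since the value is negative and the Wald test for sandtotal_r is far from significant (P = 0.5279), deleting the term costs essentially nothing in fit.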

##           index.orig training   test optimism index.corrected   Lower  Upper  n
## R-square      0.5112   0.5210 0.4948   0.0262          0.4851  0.3589 0.5461 40
## MSE           0.0009   0.0009 0.0009  -0.0001          0.0010  0.0008 0.0012 40
## g             0.0325   0.0326 0.0316   0.0009          0.0315  0.0248 0.0339 40
## Intercept     0.0000   0.0000 0.0036  -0.0036          0.0036 -0.0089 0.0298 40
## Slope         1.0000   1.0000 0.9742   0.0258          0.9742  0.7745 1.0605 40
## 
## Factors Retained in Backwards Elimination
## 
##  dbthirdbar_r sandtotal_r claytotal_r silttotal_r om_r total_frags_pct ksat_r
##  *                                    *           *    *               *     
##  *                                    *           *    *               *     
##  *            *           *           *                *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *                       *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *                            *               *     
##  *            *           *                       *    *               *     
##  *            *           *           *           *    *               *     
##  *                                    *           *    *               *     
##  *            *           *           *           *    *               *     
##  *                                    *           *    *               *     
##  *            *           *                       *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *                       *           *    *               *     
##  *            *           *                       *    *               *     
##  *            *           *           *           *    *               *     
##  *                        *           *           *    *               *     
##  *            *           *                       *    *               *     
##  *                        *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *                                    *           *    *               *     
##  *            *           *                       *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *                *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *                *               *     
##  *            *           *           *           *    *               *     
##  *                                    *           *    *               *     
##  *                                    *           *    *               *     
##  *            *           *           *           *    *               *     
##  *            *           *           *           *    *               *     
##  *                                    *           *    *               *     
##  *            *           *                            *                     
##  *            *           *           *           *    *               *     
## 
## Frequencies of Numbers of Factors Retained
## 
##  4  5  6  7 
##  1  9 12 18
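The columns of the validation table above are linked by simple arithmetic, assuming the standard bootstrap optimism correction: optimism is the average training index minus the average test index, and the corrected index is the original index minus that optimism. Worked through for the R-square and Slope rows:

```latex
\begin{aligned}
\text{optimism}_{R^2} &= 0.5210 - 0.4948 = 0.0262 \\
R^2_{\text{corrected}} &= 0.5112 - 0.0262 = 0.4850 \quad (\text{reported as } 0.4851 \text{ from unrounded values}) \\
\text{optimism}_{\text{slope}} &= 1.0000 - 0.9742 = 0.0258 \\
\text{slope}_{\text{corrected}} &= 1.0000 - 0.0258 = 0.9742
\end{aligned}
```

A corrected slope close to 1 and a small R-square optimism both suggest only mild overfitting across the 40 resamples.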

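Alright, let’s go ahead and drop sand based on these results: backward elimination on the original model deleted only sandtotal_r (P = 0.53), and removing it also eases the collinearity among the texture fractions.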

Now we can fit our final model.

## Linear Regression Model
## 
## ols(formula = awc_r ~ dbthirdbar_r + claytotal_r + silttotal_r + 
##     om_r + total_frags_pct + ksat_r, data = dat, weights = dat$hzdepb_r, 
##     x = TRUE, y = TRUE)
## 
##                  Model Likelihood    Discrimination    
##                        Ratio Test           Indexes    
## Obs    2730    LR chi2    1851.77    R2       0.493    
## sigma0.2688    d.f.             6    R2 adj   0.491    
## d.f.   2723    Pr(> chi2)  0.0000    g        0.031    
## 
## Residuals
## 
##       Min        1Q    Median        3Q       Max 
## -0.427916 -0.016431  0.001849  0.019031  0.283423 
## 
## 
##                 Coef    S.E.   t      Pr(>|t|)
## Intercept        0.2324 0.0102  22.69 <0.0001 
## dbthirdbar_r    -0.0730 0.0068 -10.71 <0.0001 
## claytotal_r      0.0004 0.0001   6.31 <0.0001 
## silttotal_r      0.0006 0.0000  13.13 <0.0001 
## om_r             0.0053 0.0004  12.05 <0.0001 
## total_frags_pct -0.0012 0.0000 -31.82 <0.0001 
## ksat_r          -0.0002 0.0000  -5.82 <0.0001
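To make the coefficient table concrete, the sketch below turns the fitted equation into a small prediction function. It is written in Python purely for illustration and is not the report's own code. The coefficients are the four-decimal values printed above, so predictions are approximate. The function name `predict_awc` and the example horizon values are hypothetical.

```python
# Coefficients copied from the printed final model (rounded to 4 decimals).
COEF = {
    "Intercept": 0.2324,
    "dbthirdbar_r": -0.0730,
    "claytotal_r": 0.0004,
    "silttotal_r": 0.0006,
    "om_r": 0.0053,
    "total_frags_pct": -0.0012,
    "ksat_r": -0.0002,
}

def predict_awc(horizon):
    """Predicted awc_r for one horizon, given a dict of predictor values."""
    total = COEF["Intercept"]
    for name, value in horizon.items():
        total += COEF[name] * value
    return total

# Hypothetical horizon (illustrative values, NOT taken from the data set).
example = {
    "dbthirdbar_r": 1.4,
    "claytotal_r": 20.0,
    "silttotal_r": 40.0,
    "om_r": 2.0,
    "total_frags_pct": 10.0,
    "ksat_r": 9.0,
}
pred = predict_awc(example)
print(round(pred, 4))  # 0.159
```

The signs match the table: higher bulk density, fragment content, and ksat_r lower the predicted awc_r, while more clay, silt, and organic matter raise it.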

##           index.orig training   test optimism index.corrected   Lower  Upper  n
## R-square      0.4340   0.5125 0.4729   0.0396          0.3944 -0.1249 0.6285 10
## MSE           0.0011   0.0009 0.0010  -0.0001          0.0011  0.0005 0.0017 10
## g             0.0309   0.0325 0.0321   0.0004          0.0305  0.0112 0.0415 10
## Intercept     0.0000   0.0000 0.0005  -0.0005          0.0005 -0.0480 0.1227 10
## Slope         1.0000   1.0000 0.9952   0.0048          0.9952  0.1415 1.3376 10

## [1] 1.527482

## [1] 0.03250423

## [1] 0.433991


Final Model Accuracy

## Linear Regression Model
## 
## ols(formula = awc_r ~ dbthirdbar_r + claytotal_r + silttotal_r + 
##     om_r + total_frags_pct + ksat_r, data = dat, weights = dat$hzdepb_r, 
##     x = TRUE, y = TRUE)
## 
##                  Model Likelihood    Discrimination    
##                        Ratio Test           Indexes    
## Obs    2730    LR chi2    1851.77    R2       0.493    
## sigma0.2688    d.f.             6    R2 adj   0.491    
## d.f.   2723    Pr(> chi2)  0.0000    g        0.031    
## 
## Residuals
## 
##       Min        1Q    Median        3Q       Max 
## -0.427916 -0.016431  0.001849  0.019031  0.283423 
## 
## 
##                 Coef    S.E.   t      Pr(>|t|)
## Intercept        0.2324 0.0102  22.69 <0.0001 
## dbthirdbar_r    -0.0730 0.0068 -10.71 <0.0001 
## claytotal_r      0.0004 0.0001   6.31 <0.0001 
## silttotal_r      0.0006 0.0000  13.13 <0.0001 
## om_r             0.0053 0.0004  12.05 <0.0001 
## total_frags_pct -0.0012 0.0000 -31.82 <0.0001 
## ksat_r          -0.0002 0.0000  -5.82 <0.0001

Anova

##                 Analysis of Variance          Response: awc_r 
## 
##  Factor          d.f. Partial SS MS          F       P     
##  dbthirdbar_r       1   8.278434  8.27843379  114.61 <.0001
##  claytotal_r        1   2.874617  2.87461730   39.80 <.0001
##  silttotal_r        1  12.454132 12.45413192  172.42 <.0001
##  om_r               1  10.494479 10.49447883  145.29 <.0001
##  total_frags_pct    1  73.145590 73.14558984 1012.66 <.0001
##  ksat_r             1   2.442645  2.44264453   33.82 <.0001
##  REGRESSION         6 190.888928 31.81482137  440.46 <.0001
##  ERROR           2723 196.685176  0.07223106

Partial R2

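Total fragment percentage (total_frags_pct) explains the most variance in awc_r: its partial sum of squares (73.15) is by far the largest of any predictor in the ANOVA table, well ahead of silttotal_r (12.45) and om_r (10.49).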
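The ranking can be reproduced from the ANOVA table with a few lines of arithmetic. The sketch below, in Python for illustration only, assumes one common definition of partial R²: each predictor's partial sum of squares divided by the total sum of squares (REGRESSION plus ERROR). The plotting routine behind this section may use a different definition.

```python
# Partial sums of squares copied from the ANOVA table above.
partial_ss = {
    "dbthirdbar_r": 8.278434,
    "claytotal_r": 2.874617,
    "silttotal_r": 12.454132,
    "om_r": 10.494479,
    "total_frags_pct": 73.145590,
    "ksat_r": 2.442645,
}
ss_regression = 190.888928
ss_error = 196.685176
ss_total = ss_regression + ss_error

# Assumed definition: partial R^2 = partial SS / total SS.
partial_r2 = {name: ss / ss_total for name, ss in partial_ss.items()}
model_r2 = ss_regression / ss_total

for name, value in sorted(partial_r2.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {value:.4f}")
print(f"model R2         {model_r2:.3f}")
```

Under this definition total_frags_pct accounts for roughly 0.19 of the total variation on its own, and the overall ratio recovers the model R2 of 0.493 reported earlier.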

Model Effects

Plot Effects


Erin Rooney

2026-03-02
