4/2/2020

Predicting grass growth based on soil characteristics

  • Depending on the year, soil properties and topography can explain 60% or more of crop yield variability (Jiang & Thelan, 2004)
  • SSURGO (Soil Survey Geographic Database) provides representative yields for several grass-legume (pasture) crops associated with soil components.
  • Each soil component:
    • can be geographically mapped by connecting the component to the mapunit
    • is associated with many representative soil charactistics (slope, % silt, cec, etc.)

Predicting grass growth based on soil characteristics

Create working environment

Built with R version 3.6.1

#load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(GGally)
library(caret)
library(MASS)
library(leaps)
library(gt)
library(car)

Representative component crop yields

crawford_crop <- 
  read.csv("EXTRACTIONS/WI_023/SSURGO/WI_023_SSURGO_cocropyld.csv")
vernon_crop <- 
  read.csv("EXTRACTIONS/WI_123/SSURGO/WI_123_SSURGO_cocropyld.csv")
# combine each county to make full data set
crop <- rbind(crawford_crop, vernon_crop)
colnames(crop)
##  [1] "cropname"      "yldunits"      "nonirryield.l" "nonirryield.r"
##  [5] "nonirryield.h" "irryield.l"    "irryield.r"    "irryield.h"   
##  [9] "cropprodindex" "vasoiprdgrp"   "cokey"         "cocropyldkey"
crop <- crop %>% dplyr::select(c(cropname, nonirryield.r, cokey)) %>%
  rename(yield = nonirryield.r)

Examine data

levels(crop$cropname)
##  [1] "Alfalfa hay"             "Bluegrass-white clover" 
##  [3] "Bromegrass-alfalfa hay"  "Corn"                   
##  [5] "Corn silage"             "Grass-clover"           
##  [7] "Grass-legume pasture"    "Kentucky bluegrass"     
##  [9] "Oats"                    "Orchardgrass-alsike"    
## [11] "Orchardgrass-red clover" "Red clover hay"         
## [13] "Smooth bromegrass"       "Soybeans"               
## [15] "Timothy-alsike"          "Winter wheat"           
## [17] "Cool-season grasses"     "Reed canarygrass"

Examine data

We are only interested in grass-legume combinations.

grass <- crop %>%
  filter(cropname == "Bluegrass-white clover"|
           cropname == "Bromegrass-alfalfa hay"|
           cropname == "Grass-clover"|
           cropname == "Grass-legume pasture"|
           cropname == "Orchardgrass-alsike"|
           cropname == "Orchardgrass-red clover"|
           cropname == "Timothy-alsike") %>%
  droplevels()

Examine data

Now we have 7 different grass-legume combinations. Summary data of each.

Representative yields (tons/acre)
Crawford and Vernon County
Grass mixture Min Mean Max No. Obs
Bluegrass-white clover 1.0 2.357353 4.0 544
Orchardgrass-alsike 1.6 3.097794 4.8 544
Orchardgrass-red clover 1.8 3.334559 5.6 544
Timothy-alsike 1.4 2.866912 4.4 544
Bromegrass-alfalfa hay 5.8 6.166667 6.4 24
Grass-legume pasture 3.3 3.300000 3.3 2
Grass-clover 6.0 6.000000 6.0 1

Let’s remove the crops without many data points.

They are mostly anomolous, and the grass-legume mixture doesn’t add any value.

Representative yields (tons/acre)
Crawford and Vernon County
Grass mixture Min Mean Max No. Obs
Bluegrass-white clover 1.0 2.36 4.0 544
Orchardgrass-alsike 1.6 3.10 4.8 544
Orchardgrass-red clover 1.8 3.33 5.6 544
Timothy-alsike 1.4 2.87 4.4 544

Yield distribution

Distribution of yields by crop

Add grass yields to cleaned soil data

#upload soils file
soil <- read.table("Final Clean Soil/fullSoilCrawfordVernon.txt", 
                   header = TRUE, sep = ",")
#summary(soil)
#remove high OM (<55) mucky soils 
soil <- soil %>%
  filter(om < 55)
#join crop yield and soil data by cokey
grassSoil <- left_join(grass, soil, by = "cokey")
# some cokey's from the yield data do not have matching cokeys from the soil data
# remove yield observations with no corresponding soil data (was from mucky soils)
grassSoil <- grassSoil %>%
  drop_na()

Variable selection

Step 1. Create training and test sets of data

set.seed(0731)
inTrain <- createDataPartition(grassSoil$yield, p = 0.75, list = FALSE)
trainGM <- grassSoil[inTrain,]
testGM <- grassSoil[-inTrain,]

Variable selection

Step 2. Null model hypothesis

  • This gives us a starting place to compare other models
best.guess <- round(mean(trainGM$yield),2) 
# Evaluate RMSE
RMSE.baseline <- round(sqrt(mean((best.guess-testGM$yield)^2)),2)

The average yield of all of the grass varieties across all of the variables is 2.92 and the difference between that average and the error based on untrained data is 1.15.

Variable selection: Slope

Slope directly affects crop growth through changing and redirecting water availability. It indirectly effects crop growth by changing other soil properties (Jiang & Thelan, 2004).

Slope

Slope predicts approx 16 percent of the variability in yield.

Variable selection: organic matter

Organic matter

Organic matter predicts approx 10 percent of the variability in yield.

Variable selection: Cation Exchange Capacity

Cation exchange capacity

Cation exchange capacity predicts approx 50 percent of the variability in yield.

Variable selection: Available Water Capacity

Available Water Capacity

Available water capacity predicts approx 50 percent of the variability in yield.

Variable Selection: Saturated hydraulic conductivity (ksat)

Raw data: ksat

ksat

Saturated hydraulic conductivity predicts approx 37 percent of the variability in yield.

Variable selection: pH

pH

pH predicts approx 18 percent of the variability in yield.

Variable selection: depth

Depth

Soil depth predicts approx 7 percent of the variability in yield.

Variable selection: silt

Silt

Percent silt predicts approx 53 percent of the variability in yield.

Variable selection: sand

Sand

Percent sand predicts approx 53 percent of the variability in yield.

Variable selection: Grass type

Grass type

Grass type predicts approx 10 percent of the variability in yield.

Comparing variables

Variables explored
single variable model predictions
Variable R2 RMSE
Null Null 1.15
Grass mix 0.1 1.10
Depth 0.07 1.09
OM 0.1 1.09
pH 0.18 1.08
Slope 0.16 1.04
Ksat 0.37 0.92
AWC 0.5 0.82
CEC 0.5 0.82
Sand 0.53 0.81
Silt 0.53 0.81

New dataset

grassSoil <- grassSoil %>%
  dplyr::select(c(yield, cropname, sand, silt, cec, awc, ksat, slope.r, ph, om, total.depth)) %>%
  rename(slope = slope.r)
summary(grassSoil)
##      yield                         cropname        sand            silt      
##  Min.   :1.00   Bluegrass-white clover :533   Min.   : 6.00   Min.   : 0.97  
##  1st Qu.:1.80   Orchardgrass-alsike    :533   1st Qu.:14.00   1st Qu.:21.40  
##  Median :2.80   Orchardgrass-red clover:533   Median :45.68   Median :42.30  
##  Mean   :2.92   Timothy-alsike         :533   Mean   :46.35   Mean   :40.86  
##  3rd Qu.:4.00                                 3rd Qu.:67.17   3rd Qu.:69.30  
##  Max.   :5.60                                 Max.   :96.87   Max.   :79.17  
##       cec             awc              ksat            slope      
##  Min.   : 1.65   Min.   :0.0700   Min.   :  6.54   Min.   : 1.00  
##  1st Qu.: 6.10   1st Qu.:0.1400   1st Qu.:  9.17   1st Qu.: 3.50  
##  Median :10.50   Median :0.1800   Median : 12.27   Median : 9.00  
##  Mean   :10.30   Mean   :0.1723   Mean   : 30.07   Mean   :12.89  
##  3rd Qu.:14.28   3rd Qu.:0.2200   3rd Qu.: 23.29   3rd Qu.:16.00  
##  Max.   :20.01   Max.   :0.2600   Max.   :181.14   Max.   :70.00  
##        ph              om         total.depth   
##  Min.   :5.370   Min.   :0.500   Min.   :102.0  
##  1st Qu.:6.070   1st Qu.:1.120   1st Qu.:152.0  
##  Median :6.200   Median :1.770   Median :183.0  
##  Mean   :6.151   Mean   :1.817   Mean   :175.7  
##  3rd Qu.:6.200   3rd Qu.:2.090   3rd Qu.:203.0  
##  Max.   :7.700   Max.   :8.000   Max.   :204.0

Check for correlations

Correlations

  • Sand and silt
  • Sand/silt and ksat
  • Sand/silt and cec
  • Sand/silt and awc
  • Available water capacity and cation exchange capacity

Model selection processes

Create new training and testing datasets

set.seed(0731)
inTrain <- createDataPartition(grassSoil$yield, p = 0.75, list = FALSE)
trainGM <- grassSoil[inTrain,]
testGM <- grassSoil[-inTrain,]

Create full model for comparisons

train.control <- trainControl(method = "cv", number = 10) #cross validate 10 times
full.model <- train(yield~., data = trainGM, method = "lm", trControl = train.control)
full.pred <- predict(full.model, testGM)
RMSE.full <- round(sqrt(mean((full.pred-testGM$yield)^2)),3)
adj.rsquared.full <- round((summary(full.model)$adj.r.squared),3)
rsquared.full <- round((summary(full.model)$r.squared),3)
models <- data.frame(Model = "Full Model", RMSE = RMSE.full, Rsquared = rsquared.full, adj.Rsquared = adj.rsquared.full)
models
##        Model  RMSE Rsquared adj.Rsquared
## 1 Full Model 0.524     0.78        0.778

Check the variance inflation factor (VIF)

VIFs exceeding 4 warrant further investigation, while VIFs exceeding 10 are signs of serious multicollinearity requiring correction

as.matrix(car::vif(full.model$finalModel))
##                                         [,1]
## `cropnameOrchardgrass-alsike`       1.485184
## `cropnameOrchardgrass-red clover`   1.490203
## `cropnameTimothy-alsike`            1.482299
## sand                              312.015555
## silt                              222.443135
## cec                                12.237214
## awc                                12.194694
## ksat                                4.778327
## slope                               1.138307
## ph                                  1.461658
## om                                  1.208798
## total.depth                         1.232427

Remove either sand or silt

This is the most highly correlated.

Model comparisons
Model RMSE R2 Adj. R2
Full Model 0.524 0.780 0.778
No Sand 0.564 0.756 0.754
No Silt 0.571 0.749 0.747

The data suggest the model is better with sand as a predictor, rather than silt. We continue with variable selection on a dataset without silt.

VIF when Silt is removed

as.data.frame(car::vif(noSilt.mod$finalModel))
##                                   car::vif(noSilt.mod$finalModel)
## `cropnameOrchardgrass-alsike`                            1.484786
## `cropnameOrchardgrass-red clover`                        1.489605
## `cropnameTimothy-alsike`                                 1.481280
## sand                                                    12.122878
## cec                                                      6.695798
## awc                                                     11.976573
## ksat                                                     3.635177
## slope                                                    1.122322
## ph                                                       1.461338
## om                                                       1.148582
## total.depth                                              1.232392

StepAIC variable selection

aic.model <- train(yield ~., data = train.noSand,
                    method = "lmStepAIC", 
                    trControl = train.control
                    )
## Start:  AIC=-1585.92
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.514 471.99 -1586.3
## <none>                                           471.48 -1585.9
## - awc                                1     3.114 474.59 -1578.4
## - ph                                 1     3.901 475.38 -1576.0
## - om                                 1     6.048 477.53 -1569.5
## - ksat                               1     7.408 478.88 -1565.5
## - total.depth                        1    27.679 499.16 -1505.7
## - silt                               1    33.031 504.51 -1490.3
## - `cropnameTimothy-alsike`           1    42.431 513.91 -1463.7
## - `cropnameOrchardgrass-alsike`      1    91.506 562.98 -1332.3
## - slope                              1   140.229 611.71 -1212.7
## - `cropnameOrchardgrass-red clover`  1   179.445 650.92 -1123.2
## 
## Step:  AIC=-1586.35
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           471.99 -1586.3
## - ph                                 1     4.604 476.60 -1574.4
## - awc                                1     4.614 476.61 -1574.3
## - om                                 1     6.533 478.52 -1568.5
## - ksat                               1     7.455 479.45 -1565.8
## - total.depth                        1    29.303 501.29 -1501.5
## - silt                               1    38.245 510.24 -1476.1
## - `cropnameTimothy-alsike`           1    42.351 514.34 -1464.5
## - `cropnameOrchardgrass-alsike`      1    91.717 563.71 -1332.5
## - slope                              1   146.654 618.65 -1198.5
## - `cropnameOrchardgrass-red clover`  1   179.629 651.62 -1123.6
## Start:  AIC=-1547.77
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.062 485.59 -1549.6
## <none>                                           485.53 -1547.8
## - ph                                 1     4.095 489.63 -1537.7
## - awc                                1     4.499 490.03 -1536.5
## - om                                 1     5.476 491.01 -1533.6
## - ksat                               1     8.944 494.48 -1523.4
## - silt                               1    27.199 512.73 -1471.1
## - total.depth                        1    31.043 516.57 -1460.3
## - `cropnameTimothy-alsike`           1    50.123 535.65 -1408.0
## - `cropnameOrchardgrass-alsike`      1    99.653 585.18 -1280.4
## - slope                              1   149.831 635.36 -1161.7
## - `cropnameOrchardgrass-red clover`  1   181.798 667.33 -1090.8
## 
## Step:  AIC=-1549.58
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           485.59 -1549.6
## - ph                                 1     4.426 490.02 -1538.5
## - om                                 1     5.612 491.21 -1535.0
## - awc                                1     5.617 491.21 -1535.0
## - ksat                               1     8.953 494.55 -1525.2
## - silt                               1    29.759 515.35 -1465.8
## - total.depth                        1    32.144 517.74 -1459.1
## - `cropnameTimothy-alsike`           1    50.112 535.70 -1409.9
## - `cropnameOrchardgrass-alsike`      1    99.676 585.27 -1282.2
## - slope                              1   154.476 640.07 -1153.0
## - `cropnameOrchardgrass-red clover`  1   182.109 667.70 -1092.0
## Start:  AIC=-1565.37
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.153 477.70 -1566.9
## <none>                                           477.55 -1565.4
## - ph                                 1     2.620 480.17 -1559.5
## - awc                                1     3.890 481.44 -1555.7
## - om                                 1     5.455 483.01 -1551.0
## - ksat                               1     7.791 485.34 -1544.1
## - total.depth                        1    29.532 507.08 -1481.0
## - silt                               1    32.494 510.05 -1472.6
## - `cropnameTimothy-alsike`           1    45.106 522.66 -1437.4
## - `cropnameOrchardgrass-alsike`      1    94.630 572.18 -1307.0
## - slope                              1   142.587 620.14 -1191.1
## - `cropnameOrchardgrass-red clover`  1   176.856 654.41 -1113.7
## 
## Step:  AIC=-1566.91
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           477.70 -1566.9
## - ph                                 1     2.990 480.69 -1559.9
## - awc                                1     5.201 482.90 -1553.3
## - om                                 1     5.662 483.37 -1551.9
## - ksat                               1     7.773 485.48 -1545.7
## - total.depth                        1    30.842 508.55 -1478.8
## - silt                               1    35.913 513.62 -1464.5
## - `cropnameTimothy-alsike`           1    45.172 522.88 -1438.8
## - `cropnameOrchardgrass-alsike`      1    94.757 572.46 -1308.3
## - slope                              1   146.537 624.24 -1183.6
## - `cropnameOrchardgrass-red clover`  1   176.955 654.66 -1115.1
## Start:  AIC=-1549.97
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           483.39 -1550.0
## - cec                                1     0.680 484.07 -1550.0
## - awc                                1     3.670 487.06 -1541.1
## - ph                                 1     4.509 487.89 -1538.6
## - om                                 1     5.347 488.73 -1536.1
## - ksat                               1     7.425 490.81 -1530.0
## - total.depth                        1    25.816 509.20 -1477.0
## - silt                               1    27.446 510.83 -1472.4
## - `cropnameTimothy-alsike`           1    48.601 531.99 -1413.9
## - `cropnameOrchardgrass-alsike`      1    97.779 581.16 -1286.5
## - slope                              1   142.591 625.98 -1179.5
## - `cropnameOrchardgrass-red clover`  1   172.361 655.75 -1112.5
## Start:  AIC=-1572.65
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.260 475.40 -1573.9
## <none>                                           475.14 -1572.7
## - ph                                 1     2.843 477.98 -1566.1
## - awc                                1     3.978 479.12 -1562.7
## - om                                 1     6.065 481.21 -1556.4
## - ksat                               1     9.604 484.74 -1545.8
## - silt                               1    28.010 503.15 -1492.2
## - total.depth                        1    31.328 506.47 -1482.7
## - `cropnameTimothy-alsike`           1    46.415 521.56 -1440.4
## - `cropnameOrchardgrass-alsike`      1    94.196 569.34 -1314.2
## - slope                              1   141.571 616.71 -1199.1
## - `cropnameOrchardgrass-red clover`  1   168.534 643.67 -1137.5
## 
## Step:  AIC=-1573.87
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           475.40 -1573.9
## - ph                                 1     3.237 478.64 -1566.1
## - awc                                1     5.362 480.76 -1559.7
## - om                                 1     6.454 481.85 -1556.5
## - ksat                               1     9.709 485.11 -1546.8
## - silt                               1    31.701 507.10 -1482.9
## - total.depth                        1    32.936 508.34 -1479.4
## - `cropnameTimothy-alsike`           1    46.411 521.81 -1441.7
## - `cropnameOrchardgrass-alsike`      1    94.453 569.85 -1314.9
## - slope                              1   146.847 622.25 -1188.2
## - `cropnameOrchardgrass-red clover`  1   168.682 644.08 -1138.6
## Start:  AIC=-1558.83
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.368 480.09 -1559.7
## <none>                                           479.72 -1558.8
## - awc                                1     3.380 483.10 -1550.7
## - ph                                 1     3.487 483.21 -1550.4
## - om                                 1     5.876 485.60 -1543.3
## - ksat                               1     8.030 487.75 -1536.9
## - total.depth                        1    27.228 506.95 -1481.3
## - silt                               1    30.141 509.87 -1473.1
## - `cropnameTimothy-alsike`           1    43.535 523.26 -1435.7
## - `cropnameOrchardgrass-alsike`      1    89.428 569.15 -1314.7
## - slope                              1   150.874 630.60 -1167.0
## - `cropnameOrchardgrass-red clover`  1   164.574 644.30 -1136.1
## 
## Step:  AIC=-1559.72
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           480.09 -1559.7
## - ph                                 1     4.127 484.22 -1549.4
## - awc                                1     4.850 484.94 -1547.2
## - om                                 1     6.260 486.35 -1543.1
## - ksat                               1     8.098 488.19 -1537.6
## - total.depth                        1    28.720 508.81 -1478.1
## - silt                               1    34.392 514.48 -1462.1
## - `cropnameTimothy-alsike`           1    43.466 523.56 -1436.9
## - `cropnameOrchardgrass-alsike`      1    89.656 569.75 -1315.2
## - slope                              1   157.396 637.49 -1153.4
## - `cropnameOrchardgrass-red clover`  1   164.652 644.74 -1137.1
## Start:  AIC=-1559.29
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           480.27 -1559.3
## - cec                                1     0.680 480.95 -1559.2
## - ph                                 1     2.648 482.92 -1553.4
## - awc                                1     4.392 484.66 -1548.2
## - om                                 1     4.725 485.00 -1547.2
## - ksat                               1     7.977 488.25 -1537.5
## - silt                               1    27.071 507.34 -1482.3
## - total.depth                        1    29.691 509.96 -1474.8
## - `cropnameTimothy-alsike`           1    42.060 522.33 -1440.3
## - `cropnameOrchardgrass-alsike`      1    93.180 573.45 -1305.8
## - slope                              1   144.413 624.68 -1182.5
## - `cropnameOrchardgrass-red clover`  1   172.749 653.02 -1118.5
## Start:  AIC=-1572.23
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.110 476.09 -1573.9
## <none>                                           475.98 -1572.2
## - ph                                 1     3.050 479.03 -1565.0
## - awc                                1     3.112 479.09 -1564.8
## - om                                 1     6.498 482.48 -1554.7
## - ksat                               1     8.594 484.57 -1548.4
## - total.depth                        1    28.509 504.49 -1490.4
## - silt                               1    35.122 511.10 -1471.6
## - `cropnameTimothy-alsike`           1    43.860 519.84 -1447.2
## - `cropnameOrchardgrass-alsike`      1    98.153 574.13 -1304.1
## - slope                              1   142.713 618.69 -1196.3
## - `cropnameOrchardgrass-red clover`  1   176.029 652.01 -1120.8
## 
## Step:  AIC=-1573.89
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           476.09 -1573.9
## - ph                                 1     3.419 479.51 -1565.6
## - awc                                1     4.069 480.16 -1563.6
## - om                                 1     6.748 482.84 -1555.6
## - ksat                               1     8.635 484.72 -1550.0
## - total.depth                        1    29.634 505.72 -1488.9
## - silt                               1    38.922 515.01 -1462.7
## - `cropnameTimothy-alsike`           1    43.823 519.91 -1449.0
## - `cropnameOrchardgrass-alsike`      1    98.261 574.35 -1305.5
## - slope                              1   145.683 621.77 -1191.2
## - `cropnameOrchardgrass-red clover`  1   176.109 652.20 -1122.3
## Start:  AIC=-1530.56
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.364 489.60 -1531.5
## <none>                                           489.23 -1530.6
## - ph                                 1     1.696 490.93 -1527.6
## - awc                                1     2.930 492.16 -1524.0
## - om                                 1     5.209 494.44 -1517.3
## - ksat                               1     9.294 498.53 -1505.5
## - total.depth                        1    29.504 518.74 -1448.2
## - silt                               1    33.104 522.34 -1438.3
## - `cropnameTimothy-alsike`           1    47.121 536.35 -1400.2
## - `cropnameOrchardgrass-alsike`      1    97.888 587.12 -1269.9
## - slope                              1   149.617 638.85 -1148.3
## - `cropnameOrchardgrass-red clover`  1   179.653 668.89 -1082.2
## 
## Step:  AIC=-1531.49
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           489.60 -1531.5
## - ph                                 1     2.083 491.68 -1527.4
## - awc                                1     4.148 493.75 -1521.3
## - om                                 1     5.629 495.23 -1517.0
## - ksat                               1     9.384 498.98 -1506.2
## - total.depth                        1    31.159 520.76 -1444.7
## - silt                               1    37.902 527.50 -1426.1
## - `cropnameTimothy-alsike`           1    47.136 536.73 -1401.1
## - `cropnameOrchardgrass-alsike`      1    98.199 587.80 -1270.3
## - slope                              1   154.972 644.57 -1137.5
## - `cropnameOrchardgrass-red clover`  1   180.061 669.66 -1082.5
## Start:  AIC=-1556.88
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.058 481.83 -1558.7
## <none>                                           481.77 -1556.9
## - ph                                 1     3.544 485.32 -1548.3
## - awc                                1     3.709 485.48 -1547.8
## - om                                 1     4.925 486.70 -1544.2
## - ksat                               1     8.924 490.70 -1532.4
## - total.depth                        1    29.660 511.43 -1472.7
## - silt                               1    31.828 513.60 -1466.6
## - `cropnameTimothy-alsike`           1    44.704 526.48 -1430.9
## - `cropnameOrchardgrass-alsike`      1    92.042 573.82 -1306.8
## - slope                              1   139.140 620.91 -1193.0
## - `cropnameOrchardgrass-red clover`  1   164.077 645.85 -1136.2
## 
## Step:  AIC=-1558.71
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           481.83 -1558.7
## - ph                                 1     3.822 485.65 -1549.3
## - awc                                1     4.608 486.44 -1547.0
## - om                                 1     5.090 486.92 -1545.6
## - ksat                               1     8.932 490.76 -1534.2
## - total.depth                        1    30.682 512.51 -1471.7
## - silt                               1    35.054 516.89 -1459.4
## - `cropnameTimothy-alsike`           1    44.717 526.55 -1432.7
## - `cropnameOrchardgrass-alsike`      1    92.157 573.99 -1308.3
## - slope                              1   142.127 623.96 -1188.0
## - `cropnameOrchardgrass-red clover`  1   164.233 646.06 -1137.8
## Start:  AIC=-1734.57
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + cec + awc + ksat + slope + 
##     ph + om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## - cec                                1     0.316 534.09 -1735.6
## <none>                                           533.77 -1734.6
## - ph                                 1     3.542 537.31 -1726.0
## - awc                                1     4.058 537.83 -1724.4
## - om                                 1     6.192 539.96 -1718.1
## - ksat                               1     9.328 543.10 -1708.8
## - total.depth                        1    32.192 565.96 -1642.8
## - silt                               1    33.886 567.66 -1638.0
## - `cropnameTimothy-alsike`           1    50.424 584.19 -1592.0
## - `cropnameOrchardgrass-alsike`      1   105.402 639.17 -1448.0
## - slope                              1   160.612 694.38 -1315.4
## - `cropnameOrchardgrass-red clover`  1   192.893 726.66 -1242.7
## 
## Step:  AIC=-1735.62
## .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth
## 
##                                     Df Sum of Sq    RSS     AIC
## <none>                                           534.09 -1735.6
## - ph                                 1     4.098 538.18 -1725.4
## - awc                                1     5.602 539.69 -1720.9
## - om                                 1     6.565 540.65 -1718.1
## - ksat                               1     9.372 543.46 -1709.8
## - total.depth                        1    33.802 567.89 -1639.4
## - silt                               1    38.354 572.44 -1626.6
## - `cropnameTimothy-alsike`           1    50.410 584.50 -1593.2
## - `cropnameOrchardgrass-alsike`      1   105.620 639.71 -1448.7
## - slope                              1   166.316 700.40 -1303.6
## - `cropnameOrchardgrass-red clover`  1   193.118 727.20 -1243.5

StepAIC variable selection

aic.model$finalModel
## 
## Call:
## lm(formula = .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth, data = dat)
## 
## Coefficients:
##                       (Intercept)      `cropnameOrchardgrass-alsike`  
##                         -0.981621                           0.724671  
## `cropnameOrchardgrass-red clover`           `cropnameTimothy-alsike`  
##                          0.975875                           0.502270  
##                              silt                                awc  
##                          0.017522                           3.729771  
##                              ksat                              slope  
##                         -0.004302                          -0.027485  
##                                ph                                 om  
##                          0.204286                           0.069365  
##                       total.depth  
##                          0.006191
summary(aic.model$finalModel) 
## 
## Call:
## lm(formula = .outcome ~ `cropnameOrchardgrass-alsike` + `cropnameOrchardgrass-red clover` + 
##     `cropnameTimothy-alsike` + silt + awc + ksat + slope + ph + 
##     om + total.depth, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85041 -0.39430 -0.00899  0.39022  1.61208 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       -0.9816210  0.3696316  -2.656 0.007994 ** 
## `cropnameOrchardgrass-alsike`      0.7246710  0.0408673  17.732  < 2e-16 ***
## `cropnameOrchardgrass-red clover`  0.9758747  0.0406995  23.978  < 2e-16 ***
## `cropnameTimothy-alsike`           0.5022704  0.0410001  12.250  < 2e-16 ***
## silt                               0.0175218  0.0016398  10.686  < 2e-16 ***
## awc                                3.7297709  0.9132971   4.084 4.65e-05 ***
## ksat                              -0.0043022  0.0008145  -5.282 1.45e-07 ***
## slope                             -0.0274854  0.0012352 -22.252  < 2e-16 ***
## ph                                 0.2042861  0.0584868   3.493 0.000491 ***
## om                                 0.0693650  0.0156899   4.421 1.05e-05 ***
## total.depth                        0.0061914  0.0006172  10.031  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5796 on 1590 degrees of freedom
## Multiple R-squared:  0.756,  Adjusted R-squared:  0.7544 
## F-statistic: 492.6 on 10 and 1590 DF,  p-value: < 2.2e-16

Step AIC says take out CEC

Test the datasets with and without CEC

# create dataset without silt
noCEC <- noSilt %>% dplyr::select(-cec)
#create training and test sets
set.seed(0731)
inTrain <- createDataPartition(noCEC$yield, p = 0.75, list = FALSE)
train.noCEC <- noCEC[inTrain,]
test.noCEC <- noCEC[-inTrain,]
# create model
noCEC.mod <- train(yield~., data = train.noCEC, method = "lm", trControl = train.control)
noCEC.pred <- predict(noCEC.mod, test.noCEC)
RMSE.noCEC <- round(sqrt(mean((noCEC.pred-test.noCEC$yield)^2)),3)
adj.rsquared.noCEC <- round((summary(noCEC.mod)$adj.r.squared),3)
rsquared.noCEC <- round((summary(noCEC.mod)$r.squared),3)

# create dataset without sand
noCEC2 <- noSand %>% dplyr::select(-cec)
#create training and test sets
set.seed(0731)
inTrain <- createDataPartition(noCEC2$yield, p = 0.75, list = FALSE)
train.noCEC2 <- noCEC2[inTrain,]
test.noCEC2 <- noCEC2[-inTrain,]
# create model
noCEC.mod2 <- train(yield~., data = train.noCEC2, method = "lm", trControl = train.control)
noCEC.pred2 <- predict(noCEC.mod2, test.noCEC2)
RMSE.noCEC2 <- round(sqrt(mean((noCEC.pred2-test.noCEC2$yield)^2)),3)
adj.rsquared.noCEC2 <- round((summary(noCEC.mod2)$adj.r.squared),3)
rsquared.noCEC2 <- round((summary(noCEC.mod2)$r.squared),3)

models <- add_row(models, Model = c("No CEC/No Silt", "No CEC/No sand"), RMSE = c(RMSE.noCEC, RMSE.noCEC2), 
                  Rsquared = c(rsquared.noCEC, rsquared.noCEC2), adj.Rsquared = c(adj.rsquared.noCEC, adj.rsquared.noCEC2))
models <- models %>%
  arrange(RMSE)
model_display <- models %>%
  gt()%>%
  tab_header(
    title = "Model comparisons") %>%
  cols_label(adj.Rsquared = html("Adj. R<sup>2</sup>"),
             Rsquared = html("R<sup>2</sup>"))

model_display
Model comparisons
Model RMSE R2 Adj. R2
Full Model 0.524 0.780 0.778
No CEC/No sand 0.563 0.756 0.754
No Sand 0.564 0.756 0.754
No Silt 0.571 0.749 0.747
No CEC/No Silt 0.571 0.749 0.747

Check the VIF again

as.data.frame(car::vif(noCEC.mod2$finalModel))
##                                   car::vif(noCEC.mod2$finalModel)
## `cropnameOrchardgrass-alsike`                            1.484439
## `cropnameOrchardgrass-red clover`                        1.489528
## `cropnameTimothy-alsike`                                 1.481414
## silt                                                     8.032766
## awc                                                     10.664886
## ksat                                                     3.635620
## slope                                                    1.090521
## ph                                                       1.406391
## om                                                       1.129298
## total.depth                                              1.205214

Silt and AWC still create a problem

Remove AWC

Model comparisons
Model RMSE R2 Adj. R2
Full Model 0.524 0.780 0.778
No CEC/No sand 0.563 0.756 0.754
No Sand 0.564 0.756 0.754
No sand/CEC/AWC 0.570 0.753 0.752
No Silt 0.571 0.749 0.747
No CEC/No Silt 0.571 0.749 0.747

Remove AWC

Much better.

as.data.frame(vif(noAWC.mod$finalModel))
##                                   vif(noAWC.mod$finalModel)
## `cropnameOrchardgrass-alsike`                      1.484325
## `cropnameOrchardgrass-red clover`                  1.488714
## `cropnameTimothy-alsike`                           1.481368
## silt                                               2.909388
## ksat                                               2.561730
## slope                                              1.065049
## ph                                                 1.401685
## om                                                 1.128637
## total.depth                                        1.175022

Test two way interactions

silt:crop

Silt:crop

  • No significant interaction
silt.crop.mod <- lm(yield~silt*cropname, data = grassSoil)
anova(silt.crop.mod)
## Analysis of Variance Table
## 
## Response: yield
##                 Df  Sum Sq Mean Sq   F value  Pr(>F)    
## silt             1 1516.37 1516.37 2960.1811 < 2e-16 ***
## cropname         3  278.70   92.90  181.3549 < 2e-16 ***
## silt:cropname    3    3.81    1.27    2.4793 0.05947 .  
## Residuals     2124 1088.03    0.51                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

Silt:ksat

Silt:ksat

  • Significant interaction
silt.ksat.mod <- lm(yield~silt*ksat, data = grassSoil)
anova(silt.ksat.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq  F value    Pr(>F)    
## silt         1 1516.37 1516.37 2397.417 < 2.2e-16 ***
## ksat         1   17.86   17.86   28.232 1.189e-07 ***
## silt:ksat    1    6.72    6.72   10.628  0.001132 ** 
## Residuals 2128 1345.96    0.63                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

Silt:slope

Silt:slope

  • Significant interaction.
silt.slope.mod <- lm(yield~silt*slope, data = grassSoil)
anova(silt.slope.mod)
## Analysis of Variance Table
## 
## Response: yield
##              Df  Sum Sq Mean Sq F value    Pr(>F)    
## silt          1 1516.37 1516.37 3186.14 < 2.2e-16 ***
## slope         1  276.29  276.29  580.54 < 2.2e-16 ***
## silt:slope    1   81.47   81.47  171.19 < 2.2e-16 ***
## Residuals  2128 1012.77    0.48                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

Silt:ph

Silt:ph

  • Significant interaction.
silt.ph.mod <- lm(yield~silt*ph, data = grassSoil)
anova(silt.ph.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq  F value    Pr(>F)    
## silt         1 1516.37 1516.37 2442.976 < 2.2e-16 ***
## ph           1   27.38   27.38   44.115 3.919e-11 ***
## silt:ph      1   22.30   22.30   35.922 2.406e-09 ***
## Residuals 2128 1320.86    0.62                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

Silt:om

Silt:om

  • No significant interaction.
silt.om.mod <- lm(yield~silt*om, data = grassSoil)
anova(silt.om.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq  F value Pr(>F)    
## silt         1 1516.37 1516.37 2435.866 <2e-16 ***
## om           1   45.77   45.77   73.529 <2e-16 ***
## silt:om      1    0.05    0.05    0.082 0.7746    
## Residuals 2128 1324.72    0.62                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

Silt:depth

Silt:depth

  • Significant interaction.
silt.depth.mod <- lm(yield~silt*total.depth, data = grassSoil)
anova(silt.depth.mod)
## Analysis of Variance Table
## 
## Response: yield
##                    Df  Sum Sq Mean Sq  F value  Pr(>F)    
## silt                1 1516.37 1516.37 2478.073 < 2e-16 ***
## total.depth         1   48.62   48.62   79.455 < 2e-16 ***
## silt:total.depth    1   19.77   19.77   32.303 1.5e-08 ***
## Residuals        2128 1302.15    0.61                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions: Silt

Sand interacts with: ksat, slope, pH, OM

Two way interactions

crop:ksat

Crop:ksat

  • No significant interaction
crop.ksat.mod <- lm(yield~cropname*ksat, data = grassSoil)
anova(crop.ksat.mod)
## Analysis of Variance Table
## 
## Response: yield
##                 Df  Sum Sq Mean Sq   F value Pr(>F)    
## cropname         3  278.70   92.90  126.8468 <2e-16 ***
## ksat             1 1049.85 1049.85 1433.4715 <2e-16 ***
## cropname:ksat    3    2.79    0.93    1.2692 0.2833    
## Residuals     2124 1555.57    0.73                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

crop:slope

Crop:slope

  • No significant interaction
crop.slope.mod <- lm(yield~cropname*slope, data = grassSoil)
anova(crop.slope.mod)
## Analysis of Variance Table
## 
## Response: yield
##                  Df  Sum Sq Mean Sq  F value Pr(>F)    
## cropname          3  278.70   92.90  92.6262 <2e-16 ***
## slope             1  476.71  476.71 475.3039 <2e-16 ***
## cropname:slope    3    1.22    0.41   0.4056  0.749    
## Residuals      2124 2130.28    1.00                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

crop:ph

Crop:ph

  • No significant interaction
crop.ph.mod <- lm(yield~cropname*ph, data = grassSoil)
anova(crop.ph.mod)
## Analysis of Variance Table
## 
## Response: yield
##               Df  Sum Sq Mean Sq  F value Pr(>F)    
## cropname       3  278.70   92.90  92.3362 <2e-16 ***
## ph             1  469.31  469.31 466.4599 <2e-16 ***
## cropname:ph    3    1.93    0.64   0.6395 0.5896    
## Residuals   2124 2136.97    1.01                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

crop:om

Crop:om

  • No significant interaction
crop.om.mod <- lm(yield~cropname*om, data = grassSoil)
anova(crop.om.mod)
## Analysis of Variance Table
## 
## Response: yield
##               Df  Sum Sq Mean Sq  F value Pr(>F)    
## cropname       3  278.70  92.900  84.5615 <2e-16 ***
## om             1  273.07 273.073 248.5630 <2e-16 ***
## cropname:om    3    1.69   0.564   0.5131 0.6733    
## Residuals   2124 2333.44   1.099                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

crop:depth

Crop:depth

  • No significant interaction
crop.depth.mod <- lm(yield~cropname*total.depth, data = grassSoil)
anova(crop.depth.mod)
## Analysis of Variance Table
## 
## Response: yield
##                        Df  Sum Sq Mean Sq  F value Pr(>F)    
## cropname                3  278.70  92.900  82.6713 <2e-16 ***
## total.depth             1  220.24 220.239 195.9892 <2e-16 ***
## cropname:total.depth    3    1.17   0.391   0.3477 0.7908    
## Residuals            2124 2386.80   1.124                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions: crop

Crop doesn’t interact with any variable!

Two way interactions

ksat: slope

Ksat:slope

  • Significant interaction
ksat.slope.mod <- lm(yield~ksat*slope, data = grassSoil)
anova(ksat.slope.mod)
## Analysis of Variance Table
## 
## Response: yield
##              Df  Sum Sq Mean Sq F value    Pr(>F)    
## ksat          1 1049.85 1049.85 1690.22 < 2.2e-16 ***
## slope         1  384.76  384.76  619.46 < 2.2e-16 ***
## ksat:slope    1  130.53  130.53  210.15 < 2.2e-16 ***
## Residuals  2128 1321.77    0.62                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

ksat: ph

Ksat:pH

  • Significant interaction
ksat.ph.mod <- lm(yield~ksat*ph, data = grassSoil)
anova(ksat.ph.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq  F value    Pr(>F)    
## ksat         1 1049.85 1049.85 1368.847 < 2.2e-16 ***
## ph           1  187.49  187.49  244.460 < 2.2e-16 ***
## ksat:ph      1   17.49   17.49   22.805 1.916e-06 ***
## Residuals 2128 1632.08    0.77                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

ksat: om

Ksat:om

  • Significant interaction
ksat.om.mod <- lm(yield~ksat*om, data = grassSoil)
anova(ksat.om.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq  F value    Pr(>F)    
## ksat         1 1049.85 1049.85 1271.358 < 2.2e-16 ***
## om           1   69.93   69.93   84.680 < 2.2e-16 ***
## ksat:om      1    9.90    9.90   11.994 0.0005443 ***
## Residuals 2128 1757.23    0.83                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

ksat: depth

Ksat:depth

  • Significant interaction
ksat.depth.mod <- lm(yield~ksat*total.depth, data = grassSoil)
anova(ksat.depth.mod)
## Analysis of Variance Table
## 
## Response: yield
##                    Df  Sum Sq Mean Sq F value    Pr(>F)    
## ksat                1 1049.85 1049.85 1437.24 < 2.2e-16 ***
## total.depth         1  206.18  206.18  282.25 < 2.2e-16 ***
## ksat:total.depth    1   76.47   76.47  104.68 < 2.2e-16 ***
## Residuals        2128 1554.42    0.73                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions: saturated hydraulic conductivity

Ksat interacts with: slope, pH, OM, depth.

Two way interactions

slope: depth

Slope:depth

  • No Significant interaction
slope.depth.mod <- lm(yield~slope*total.depth, data = grassSoil)
anova(slope.depth.mod)
## Analysis of Variance Table
## 
## Response: yield
##                     Df  Sum Sq Mean Sq  F value Pr(>F)    
## slope                1  476.71  476.71 457.7511 <2e-16 ***
## total.depth          1  191.94  191.94 184.3081 <2e-16 ***
## slope:total.depth    1    2.13    2.13   2.0413 0.1532    
## Residuals         2128 2216.13    1.04                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

slope: ph

slope:pH

  • Significant interaction
slope.ph.mod <- lm(yield~slope*ph, data = grassSoil)
anova(slope.ph.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## slope        1  476.71  476.71 501.612 < 2.2e-16 ***
## ph           1  341.28  341.28 359.107 < 2.2e-16 ***
## slope:ph     1   46.57   46.57  49.002 3.415e-12 ***
## Residuals 2128 2022.35    0.95                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

slope: om

slope:om

  • Significant interaction
slope.om.mod <- lm(yield~slope*om, data = grassSoil)
anova(slope.om.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## slope        1  476.71  476.71 455.320 < 2.2e-16 ***
## om           1  167.52  167.52 160.007 < 2.2e-16 ***
## slope:om     1   14.71   14.71  14.051 0.0001826 ***
## Residuals 2128 2227.96    1.05                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions: slope

Slope interacts with: pH and OM.

Two way interaction

ph:depth

pH:depth

  • Significant interaction
ph.depth.mod <- lm(yield~ph*total.depth, data = grassSoil)
anova(ph.depth.mod)
## Analysis of Variance Table
## 
## Response: yield
##                  Df  Sum Sq Mean Sq F value    Pr(>F)    
## ph                1  469.31  469.31 427.122 < 2.2e-16 ***
## total.depth       1   62.62   62.62  56.990 6.455e-14 ***
## ph:total.depth    1   16.80   16.80  15.289 9.517e-05 ***
## Residuals      2128 2338.18    1.10                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions

pH:om

pH:OM

  • No sgnificant interaction
om.ph.mod <- lm(yield~om*ph, data = grassSoil)
anova(om.ph.mod)
## Analysis of Variance Table
## 
## Response: yield
##             Df  Sum Sq Mean Sq  F value Pr(>F)    
## om           1  273.07  273.07 255.1408 <2e-16 ***
## ph           1  333.63  333.63 311.7247 <2e-16 ***
## om:ph        1    2.63    2.63   2.4614 0.1168    
## Residuals 2128 2277.57    1.07                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions: pH

pH interacts with depth

Two way interactions

om:depth

OM:depth

  • No significant interaction
om.depth.mod <- lm(yield~om*total.depth, data = grassSoil)
anova(om.depth.mod)
## Analysis of Variance Table
## 
## Response: yield
##                  Df  Sum Sq Mean Sq  F value Pr(>F)    
## om                1  273.07 273.073 238.9728 <2e-16 ***
## total.depth       1  178.66 178.660 156.3496 <2e-16 ***
## om:total.depth    1    3.52   3.517   3.0781 0.0795 .  
## Residuals      2128 2431.66   1.143                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two way interactions: Organic matter

No more interactions

Create training and test sets

Data with crop, silt, ksat, slope, pH, OM, depth (no sand, cec, or awc)

finalDat <- noAWC
set.seed(0731)
inTrain <- createDataPartition(finalDat$yield, p = 0.75, list = FALSE)
train.final <- finalDat[inTrain,]
test.final <- finalDat[-inTrain,]

Create model with significant interactions

Drop non-significant interaction terms until they are all significant, or until removing them decreases model performance.

Model comparisons
Model RMSE Adj. R2 R2
Full InterX mod 0.481 0.833 0.838
No crop interX 0.477 0.832 0.835
InterX from two way tests 0.498 0.821 0.824
Drop ksat:depth 0.499 0.821 0.823
No interactions 0.570 0.752 0.753
Baseline 1.150 Null Null

Predicted versus test values