Libraries

library(mlbench)
library(caret)
library(tidyverse)
library(corrplot)
library(e1071)
library(mice)
library(questionr)

Question 3.1

The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories (only six of them occur in this data set, as the six levels of Type in the str() output show). There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe.

data("Glass")
str(Glass)
## 'data.frame':    214 obs. of  10 variables:
##  $ RI  : num  1.52 1.52 1.52 1.52 1.52 ...
##  $ Na  : num  13.6 13.9 13.5 13.2 13.3 ...
##  $ Mg  : num  4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
##  $ Al  : num  1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
##  $ Si  : num  71.8 72.7 73 72.6 73.1 ...
##  $ K   : num  0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
##  $ Ca  : num  8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
##  $ Ba  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Fe  : num  0 0 0 0 0 0.26 0 0 0 0.11 ...
##  $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...

(a)

Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.

factors <- Glass[, 1:9]

# Histograms of Predictors
factors %>%
  gather() %>%
  ggplot(aes(value)) + 
  geom_histogram(bins = 16) + 
  facet_wrap(~key, scales = 'free') +
  ggtitle("Histograms of Predictors")

# Distribution of the response (glass type)
Glass %>%
  ggplot() +
  geom_bar(aes(x = Type)) +
  ggtitle("Distribution of Types of Glass")

# Correlation Matrix
Cor_Fact <- cor(factors)
corrplot.mixed(Cor_Fact, lower = 'number', upper = 'square', order ='AOE')

Most predictor distributions are right-skewed, as seen for Ba, Ca, Fe, and K, and Ba and Fe are concentrated at 0. Mg and Si, on the other hand, are left-skewed, while Na looks nearly normal with a slight right tail. Among the glass types, types 1, 2, and 7 are the most frequent. Turning to the correlations between predictors, the strongest positive correlation is between RI and Ca (0.81), followed by Ba and Al (0.48). The strongest negative correlation is between RI and Si (-0.54), and the pairs Al and Mg, Ca and Mg, and Ba and Mg are also moderately negatively correlated, with values of roughly -0.45 to -0.5.
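Reading the strongest pairs off the corrplot by eye is error-prone, so a small base-R helper can list them sorted by absolute correlation. This is a minimal sketch: `top_correlations()` is a hypothetical helper (not part of corrplot or caret), and the toy data frame stands in for the Glass predictors.

```r
# Hypothetical helper: list predictor pairs ordered by absolute Pearson correlation
top_correlations <- function(df, n = 5) {
  cm <- cor(df)
  # Use only the upper triangle so each pair appears once and the diagonal is skipped
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, 1]],
    var2 = colnames(cm)[idx[, 2]],
    r = cm[idx]
  )
  head(pairs[order(-abs(pairs$r)), ], n)
}

# Toy data: b is an exact linear function of a, c is only loosely related to both
toy <- data.frame(a = 1:10, b = 2 * (1:10), c = c(1, 3, 2, 5, 4, 7, 6, 9, 8, 10))
top_correlations(toy, n = 3)
```

With the packages above loaded, `top_correlations(factors)` should list RI and Ca first, matching the corrplot.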

(b)

Do there appear to be any outliers in the data? Are any predictors skewed?

factors %>%
  summarise_all(~skewness(.))
##         RI        Na        Mg        Al         Si        K       Ca      Ba
## 1 1.602715 0.4478343 -1.136452 0.8946104 -0.7202392 6.460089 2.018446 3.36868
##         Fe
## 1 1.729811

Skewness measures the asymmetry of a distribution. The farther the value is from 0, the more skewed the data are, with a positive value indicating a right-skewed distribution and a negative value a left-skewed one. Using |skewness| > 1 as the threshold for a clearly asymmetric distribution, every predictor except Na, Al, and Si is skewed: RI, Ca, Ba, Fe, and especially K (6.46) are right-skewed, while Mg (-1.14) is left-skewed. Skewness does not identify outliers directly, but a strongly skewed predictor has a long tail, so these six predictors are the ones most likely to contain outlying values.
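Because skewness only hints at outliers, a more direct check is to count the points that fall outside the usual 1.5 * IQR fences. The base-R sketch below recomputes skewness with the formula that `e1071::skewness()` uses by default (type 3, b1 = m3 / s^3) and counts values beyond the fences. The helper names and the toy vector are illustrative assumptions, and `quantile()`'s default quartiles can differ slightly from the hinges that `boxplot()` uses.

```r
# Sample skewness b1 = m3 / s^3, the "type 3" definition e1071::skewness() uses by default
skew_b1 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m3 <- sum((x - mean(x))^3) / n
  m3 / sd(x)^3
}

# Count values outside the fences [Q1 - k * IQR, Q3 + k * IQR]
count_iqr_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sum(x < q[1] - k * iqr | x > q[2] + k * iqr)
}

# Toy right-skewed vector with one extreme value
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)
skew_b1(x)             # positive, i.e. right-skewed
count_iqr_outliers(x)  # 1 (the value 40)
```

On the Glass data, `sapply(factors, count_iqr_outliers)` gives a per-predictor count to set beside the skewness table. When most values of a predictor are identical (as with the zero-heavy Ba), the IQR can be 0, and then every other value is flagged, so such counts should be read with care.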

(c)

Are there any relevant transformations of one or more predictors that might improve the classification model?

# Estimate the Box-Cox lambda for each predictor; every row holds the same value, so keep one row
factors %>%
  mutate_all(list(~BoxCoxTrans(.)$lambda)) %>%
  head(1)
##   RI   Na Mg  Al Si  K   Ca Ba Fe
## 1 -2 -0.1 NA 0.5  2 NA -1.1 NA NA
# Check that none of the predictors have missing values
freq.na(factors)
##    missing %
## RI       0 0
## Na       0 0
## Mg       0 0
## Al       0 0
## Si       0 0
## K        0 0
## Ca       0 0
## Ba       0 0
## Fe       0 0

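I applied a Box-Cox transformation, which estimates a power parameter lambda that makes each predictor's distribution more symmetric and closer to normal. A lambda of 1 means no transformation, 0 corresponds to a log transformation, 0.5 to a square root, and -1 to an inverse. Na (lambda = -0.1) is therefore close to a log transformation, and Al (lambda = 0.5) to a square-root transformation. RI, Si, and Ca receive stronger power transformations, with negative lambdas for RI (-2) and Ca (-1.1) and a positive lambda for Si (2). Mg, K, Ba, and Fe return NA because they contain zero values, and the Box-Cox transformation is only defined for strictly positive data; the freq.na() output confirms that the NAs are not caused by missing values.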
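To make the lambda values concrete, the sketch below estimates lambda in base R by maximizing the Box-Cox profile log-likelihood, l(lambda) = -(n/2) log(sigma^2(lambda)) + (lambda - 1) * sum(log(y)), over a grid from -2 to 2, the range the printed lambdas fall in. It illustrates the idea under that assumption and is not a copy of caret's `BoxCoxTrans()` internals. The helper names are hypothetical. Like the output above, it returns `NA` for data that contain zero or negative values.

```r
# Box-Cox transform: (y^lambda - 1) / lambda, or log(y) when lambda is 0
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for an intercept-only normal model
boxcox_loglik <- function(y, lambda) {
  n <- length(y)
  z <- boxcox_transform(y, lambda)
  sigma2 <- sum((z - mean(z))^2) / n  # maximum-likelihood variance estimate
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Grid search for the lambda with the largest log-likelihood
estimate_lambda <- function(y, grid = seq(-2, 2, by = 0.1)) {
  y <- y[!is.na(y)]
  if (any(y <= 0)) return(NA_real_)  # Box-Cox needs strictly positive data
  ll <- vapply(grid, function(l) boxcox_loglik(y, l), numeric(1))
  grid[which.max(ll)]
}

# Log-normal toy data: the log transform (lambda near 0) should be chosen
y <- exp(qnorm(ppoints(200)))
estimate_lambda(y)
estimate_lambda(c(0, 1, 2))  # NA because of the zero
```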

Question 3.2

The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., leaf spots, mold growth). The outcome labels consist of 19 distinct classes.

data("Soybean")

(a)

Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?

str(Soybean)
## 'data.frame':    683 obs. of  36 variables:
##  $ Class          : Factor w/ 19 levels "2-4-d-injury",..: 11 11 11 11 11 11 11 11 11 11 ...
##  $ date           : Factor w/ 7 levels "0","1","2","3",..: 7 5 4 4 7 6 6 5 7 5 ...
##  $ plant.stand    : Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ precip         : Ord.factor w/ 3 levels "0"<"1"<"2": 3 3 3 3 3 3 3 3 3 3 ...
##  $ temp           : Ord.factor w/ 3 levels "0"<"1"<"2": 2 2 2 2 2 2 2 2 2 2 ...
##  $ hail           : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...
##  $ crop.hist      : Factor w/ 4 levels "0","1","2","3": 2 3 2 2 3 4 3 2 4 3 ...
##  $ area.dam       : Factor w/ 4 levels "0","1","2","3": 2 1 1 1 1 1 1 1 1 1 ...
##  $ sever          : Factor w/ 3 levels "0","1","2": 2 3 3 3 2 2 2 2 2 3 ...
##  $ seed.tmt       : Factor w/ 3 levels "0","1","2": 1 2 2 1 1 1 2 1 2 1 ...
##  $ germ           : Ord.factor w/ 3 levels "0"<"1"<"2": 1 2 3 2 3 2 1 3 2 3 ...
##  $ plant.growth   : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ leaves         : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ leaf.halo      : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ leaf.marg      : Factor w/ 3 levels "0","1","2": 3 3 3 3 3 3 3 3 3 3 ...
##  $ leaf.size      : Ord.factor w/ 3 levels "0"<"1"<"2": 3 3 3 3 3 3 3 3 3 3 ...
##  $ leaf.shread    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ leaf.malf      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ leaf.mild      : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ stem           : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ lodging        : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 2 1 1 1 ...
##  $ stem.cankers   : Factor w/ 4 levels "0","1","2","3": 4 4 4 4 4 4 4 4 4 4 ...
##  $ canker.lesion  : Factor w/ 4 levels "0","1","2","3": 2 2 1 1 2 1 2 2 2 2 ...
##  $ fruiting.bodies: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ ext.decay      : Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 2 2 ...
##  $ mycelium       : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ int.discolor   : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sclerotia      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ fruit.pods     : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ fruit.spots    : Factor w/ 4 levels "0","1","2","4": 4 4 4 4 4 4 4 4 4 4 ...
##  $ seed           : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ mold.growth    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ seed.discolor  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ seed.size      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ shriveling     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ roots          : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
# Convert factor codes to integers and reshape to long format (date stays as an id column)
Soybean_tidy <- Soybean %>%
  select(-Class) %>%
  mutate(across(where(is.factor), as.numeric)) %>%
  pivot_longer(cols = -c(date), names_to = "name", values_to = "value")

# Distribution of Predictors (geom_bar counts each category directly)
ggplot(Soybean_tidy, aes(x = value)) +
  geom_bar() +
  facet_wrap(vars(name))

A degenerate distribution is one in which a predictor takes a single value (zero variance), or in which one value accounts for almost all samples and the remaining values are rare (near-zero variance). Based on the bar charts, mycelium and sclerotia both show degenerate distributions, since nearly all of their observations fall into a single category. leaf.mild and leaf.malf are also heavily one-sided and may turn out to be degenerate once missing values are excluded.
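The near-zero variance rule can also be checked numerically. A predictor is flagged when the ratio of its most common value to its second most common value is large and the percentage of unique values is small. caret implements this check as `nearZeroVar()`, with defaults `freqCut = 95/5` and `uniqueCut = 10`. The base-R sketch below re-implements the two criteria for a single factor so the logic is visible. The helper name is hypothetical. Missing values are dropped before counting, and caret may handle them differently.

```r
# Hypothetical helper: near-zero variance metrics for one predictor
nzv_check <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  x_obs <- as.character(x[!is.na(x)])           # drop NAs and unused factor levels
  counts <- sort(table(x_obs), decreasing = TRUE)
  n_unique <- length(counts)
  # Ratio of the most frequent to the second most frequent value
  freq_ratio <- if (n_unique > 1) counts[[1]] / counts[[2]] else Inf
  pct_unique <- 100 * n_unique / length(x_obs)
  zero_var <- n_unique <= 1
  data.frame(
    freq_ratio = freq_ratio,
    pct_unique = pct_unique,
    zero_var = zero_var,
    nzv = zero_var || (freq_ratio > freq_cut && pct_unique <= unique_cut)
  )
}

lopsided <- factor(c(rep("0", 99), "1"))  # 99:1 split, flagged
balanced <- factor(rep(c("0", "1"), 50))  # 50:50 split, not flagged
constant <- factor(rep("0", 10))          # single value, zero variance
rbind(nzv_check(lopsided), nzv_check(balanced), nzv_check(constant))
```

To screen every Soybean predictor, `do.call(rbind, lapply(Soybean[-1], nzv_check))` stacks one row per predictor.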

(b)

Roughly 18 % of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?

# Proportion of missing values for each predictor
Soybean %>%
  summarise(across(everything(), ~ mean(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "predictor", values_to = "missing_prop") %>%
  ggplot(aes(y = reorder(predictor, missing_prop), x = missing_prop)) +
  geom_col(fill = "skyblue") +
  labs(title = "Proportion of Missing Values for Predictors",
       x = "Proportion of Missing Values",
       y = "Predictor") +
  theme_minimal()

# NA Frequency By Predictor
freq.na(Soybean)
##                 missing  %
## hail                121 18
## sever               121 18
## seed.tmt            121 18
## lodging             121 18
## germ                112 16
## leaf.mild           108 16
## fruiting.bodies     106 16
## fruit.spots         106 16
## seed.discolor       106 16
## shriveling          106 16
## leaf.shread         100 15
## seed                 92 13
## mold.growth          92 13
## seed.size            92 13
## leaf.halo            84 12
## leaf.marg            84 12
## leaf.size            84 12
## leaf.malf            84 12
## fruit.pods           84 12
## precip               38  6
## stem.cankers         38  6
## canker.lesion        38  6
## ext.decay            38  6
## mycelium             38  6
## int.discolor         38  6
## sclerotia            38  6
## plant.stand          36  5
## roots                31  5
## temp                 30  4
## crop.hist            16  2
## plant.growth         16  2
## stem                 16  2
## date                  1  0
## area.dam              1  0
## Class                 0  0
## leaves                0  0
Total_Na <- Soybean %>%
  group_by(Class) %>%
  summarize_at(vars(-group_cols()), ~ sum(is.na(.)))

# NA Frequency by Class
row_sum <- rowSums(Total_Na[, -1])
result <- cbind(Total_Na[, 1, drop = FALSE], RowSums = row_sum)
result
##                          Class RowSums
## 1                 2-4-d-injury     450
## 2          alternarialeaf-spot       0
## 3                  anthracnose       0
## 4             bacterial-blight       0
## 5            bacterial-pustule       0
## 6                   brown-spot       0
## 7               brown-stem-rot       0
## 8                 charcoal-rot       0
## 9                cyst-nematode     336
## 10 diaporthe-pod-&-stem-blight     177
## 11       diaporthe-stem-canker       0
## 12                downy-mildew       0
## 13          frog-eye-leaf-spot       0
## 14            herbicide-injury     160
## 15      phyllosticta-leaf-spot       0
## 16            phytophthora-rot    1214
## 17              powdery-mildew       0
## 18           purple-seed-stain       0
## 19        rhizoctonia-root-rot       0

The table above shows that hail, sever, seed.tmt, and lodging are the most frequently missing predictors, each missing in 121 of the 683 samples (about 18%). The missing data are also clearly related to the class. Only five classes contain any missing values: phytophthora-rot, herbicide-injury, diaporthe-pod-&-stem-blight, cyst-nematode, and 2-4-d-injury. phytophthora-rot has the most, with 1214 missing predictor values. The other 14 classes have no missing values at all.
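Raw missing counts per class depend on how many samples each class has, so a normalized view makes it easier to judge how strongly missingness is tied to the class. The sketch below computes, for each class, the share of missing predictor cells and the share of rows with at least one missing value. `missing_by_class()` is a hypothetical helper, and the toy data frame is illustrative only.

```r
# Hypothetical helper: per-class missingness, normalized by class size
missing_by_class <- function(df, class_col) {
  predictors <- setdiff(names(df), class_col)
  na_per_row <- rowSums(is.na(df[predictors]))  # missing predictor values in each row
  cls <- df[[class_col]]
  n_missing <- tapply(na_per_row, cls, sum)
  n_rows <- tapply(na_per_row, cls, length)
  data.frame(
    class = names(n_missing),
    n_rows = as.vector(n_rows),
    n_missing = as.vector(n_missing),
    pct_cells_missing = as.vector(100 * n_missing / (n_rows * length(predictors))),
    pct_rows_incomplete = as.vector(100 * tapply(na_per_row > 0, cls, mean))
  )
}

toy <- data.frame(
  Class = c("a", "a", "b", "b"),
  x = c(1, NA, 3, 4),
  y = c(NA, NA, 5, 6)
)
missing_by_class(toy, "Class")
```

`missing_by_class(Soybean, "Class")` should reproduce the per-class totals above in its `n_missing` column, alongside the normalized percentages.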

(c)

Develop a strategy for handling missing data, either by eliminating predictors or imputation.

Summary

Our strategy for handling the missing data is imputation with Multivariate Imputation by Chained Equations (MICE), using the mice package. The md.pattern() output below shows where the missing values occur within each group of predictors. mice supports several imputation methods, including predictive mean matching (pmm), classification and regression trees (cart), and lasso linear regression (lasso.norm). We generated five imputed data sets (m = 5). Using multiple imputations captures the uncertainty about the missing values, which gives more reliable parameter estimates than a single imputed data set.
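The benefit of several imputations comes from how the results are combined afterwards. Under Rubin's rules, which `mice::pool()` applies to models fitted on each imputed data set, the pooled estimate is the mean of the m per-imputation estimates. Its total variance is T = U + (1 + 1/m) B, where U is the average within-imputation variance and B is the between-imputation variance. The base-R sketch below applies these rules to made-up numbers. `pool_rubin()` is a hypothetical helper and not the mice API.

```r
# Combine m point estimates and their variances with Rubin's rules
pool_rubin <- function(estimates, variances) {
  m <- length(estimates)
  q_bar <- mean(estimates)           # pooled point estimate
  u_bar <- mean(variances)           # average within-imputation variance
  b <- var(estimates)                # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
  list(estimate = q_bar, within = u_bar, between = b, total = total)
}

# Made-up results from m = 5 imputed data sets
res <- pool_rubin(estimates = c(1, 2, 3, 4, 5), variances = rep(0.5, 5))
res$estimate  # 3
res$total     # 3.5 = 0.5 + 1.2 * 2.5
```

With a single imputation, B cannot be estimated (`var()` of one value is `NA`), which is why one imputed data set understates the uncertainty.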

MICE

##     Class date area.dam crop.hist temp plant.stand precip hail sever    
## 562     1    1        1         1    1           1      1    1     1   0
## 77      1    1        1         1    1           1      1    0     0   2
## 8       1    1        1         1    1           1      0    0     0   3
## 6       1    1        1         1    1           0      1    0     0   3
## 14      1    1        1         1    0           0      0    0     0   5
## 15      1    1        1         0    0           0      0    0     0   6
## 1       1    0        0         0    0           0      0    0     0   8
##         0    1        1        16   30          36     38  121   121 364

##     leaves plant.growth leaf.halo leaf.marg leaf.size leaf.malf leaf.shread
## 562      1            1         1         1         1         1           1
## 21       1            1         1         1         1         1           1
## 9        1            1         0         0         0         0           0
## 75       1            1         0         0         0         0           0
## 16       1            0         1         1         1         1           0
##          0           16        84        84        84        84         100
##     germ seed.tmt    
## 562    1        1   0
## 21     0        0   2
## 9      1        0   6
## 75     0        0   7
## 16     0        0   4
##      112      121 685

##     stem stem.cankers canker.lesion ext.decay mycelium int.discolor
## 562    1            1             1         1        1            1
## 15     1            1             1         1        1            1
## 13     1            1             1         1        1            1
## 55     1            1             1         1        1            1
## 22     1            0             0         0        0            0
## 16     0            0             0         0        0            0
##       16           38            38        38       38           38
##     fruiting.bodies leaf.mild lodging    
## 562               1         1       1   0
## 15                1         0       0   2
## 13                0         1       0   2
## 55                0         0       0   3
## 22                0         0       0   8
## 16                0         0       0   9
##                 106       108     121 541

##     roots sclerotia fruit.pods seed mold.growth seed.size fruit.spots
## 562     1         1          1    1           1         1           1
## 68      1         1          0    0           0         0           0
## 14      1         0          1    1           1         1           0
## 8       1         0          1    0           0         0           0
## 15      0         1          1    1           1         1           1
## 16      0         0          0    0           0         0           0
##        31        38         84   92          92        92         106
##     seed.discolor shriveling    
## 562             1          1   0
## 68              0          0   7
## 14              0          0   4
## 8               0          0   7
## 15              1          1   1
## 16              0          0   9
##               106        106 747
## 
##  iter imp variable
##   1   1  date  plant.stand*  precip*  temp  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo  leaf.marg  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   1   2  date  plant.stand*  precip*  temp*  hail*  crop.hist  area.dam  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg  leaf.size  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   1   3  date  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   1   4  date  plant.stand*  precip*  temp*  hail*  crop.hist  area.dam  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   1   5  date  plant.stand*  precip  temp  hail*  crop.hist*  area.dam  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo  leaf.marg  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   2   1  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   2   2  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   2   3  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   2   4  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   2   5  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   3   1  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   3   2  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   3   3  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   3   4  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   3   5  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   4   1  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   4   2  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   4   3  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   4   4  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   4   5  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   5   1  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   5   2  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   5   3  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   5   4  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
##   5   5  date*  plant.stand*  precip*  temp*  hail*  crop.hist*  area.dam*  sever*  seed.tmt*  germ*  plant.growth*  leaf.halo*  leaf.marg*  leaf.size*  leaf.shread*  leaf.malf*  leaf.mild*  stem*  lodging*  stem.cankers*  canker.lesion*  fruiting.bodies*  ext.decay*  mycelium*  int.discolor*  sclerotia*  fruit.pods*  fruit.spots*  seed*  mold.growth*  seed.discolor*  seed.size*  shriveling*  roots*
Further Analysis
# Same long-format reshaping, applied to the completed (imputed) data
Soybean_tidy_2 <- Soybean_imputed_complete %>%
  select(-Class) %>%
  mutate(across(where(is.factor), as.numeric)) %>%
  pivot_longer(cols = -c(date), names_to = "name", values_to = "value")

# Distribution of Imputed Predictors (geom_bar avoids the geom_histogram(stat = "count") warning)
ggplot(Soybean_tidy_2, aes(x = value)) +
  geom_bar() +
  facet_wrap(vars(name))
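Besides the bar charts, a quick numeric check is to compare each predictor's category proportions before and after imputation. Large shifts would suggest that the imputation model is distorting a variable. The base-R sketch below does this for one factor. `compare_level_props()` is a hypothetical helper: it compares the observed (non-missing) values before imputation with all values after imputation.

```r
# Hypothetical helper: category proportions before vs. after imputation
compare_level_props <- function(before, after) {
  lv <- union(levels(factor(before)), levels(factor(after)))
  p_before <- prop.table(table(factor(before, levels = lv)))  # table() drops NAs
  p_after <- prop.table(table(factor(after, levels = lv)))
  data.frame(
    level = lv,
    before = as.vector(p_before),
    after = as.vector(p_after),
    diff = as.vector(p_after - p_before)
  )
}

before <- c("0", "0", "1", NA)
after <- c("0", "0", "1", "0")  # the NA was imputed as "0"
compare_level_props(before, after)
```

Assuming Soybean_imputed_complete is the completed data set from the MICE step, `compare_level_props(Soybean$hail, Soybean_imputed_complete$hail)` would show whether imputation shifted the hail proportions.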