library(mlbench)
library(caret)
library(tidyverse)
library(corrplot)
library(e1071)
library(mice)
library(questionr)
The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe.
data("Glass")
str(Glass)
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.
factors <- Glass[, 1:9]
# Histograms of Predictors
factors %>%
  gather() %>%
  ggplot(aes(value)) +
  geom_histogram(bins = 16) +
  facet_wrap(~key, scales = 'free') +
  ggtitle("Histograms of Predictors")
# Distribution of Glass Types
Glass %>%
  ggplot() +
  geom_bar(aes(x = Type)) +
  ggtitle("Distribution of Types of Glass")
# Correlation Matrix
Cor_Fact <- cor(factors)
corrplot.mixed(Cor_Fact, lower = 'number', upper = 'square', order ='AOE')
Most predictors are right-skewed: Ba, Ca, Fe, and K all have long right tails, and Ba and Fe are concentrated at or near 0. Mg and Si are left-skewed. Na looks close to normal, with a slight right tail. Among the glass types, types 1, 2, and 7 are the most frequent. In the correlation matrix, the strongest positive correlation is between RI and Ca (0.81), followed by Ba and Al (0.48). The strongest negative correlation is between RI and Si (-0.54). Mg is also negatively correlated with Al, Ca, and Ba, with values of roughly -0.45 to -0.5.
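As a rough cross-check on the plot, the strongest pairs can also be listed directly from the correlation matrix. The helper below is a sketch and not part of the original analysis: top_correlations() and its cutoff argument are names made up for illustration, and the 0.4 cutoff is an arbitrary choice.

```r
# Hypothetical helper: list predictor pairs whose absolute correlation
# exceeds a cutoff, strongest first.
top_correlations <- function(cor_mat, cutoff = 0.4) {
  # Look only at the upper triangle so each pair appears once
  idx <- which(upper.tri(cor_mat) & abs(cor_mat) > cutoff, arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
  pairs[order(-abs(pairs$r)), , drop = FALSE]
}

# Apply it to the Glass predictors when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Glass", package = "mlbench")
  print(top_correlations(cor(Glass[, 1:9])))
}
```

Reading the pairs from a table avoids misjudging values that are hard to tell apart in the plot.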
Do there appear to be any outliers in the data? Are any predictors skewed?
factors %>%
  summarise_all(~skewness(.))
## RI Na Mg Al Si K Ca Ba
## 1 1.602715 0.4478343 -1.136452 0.8946104 -0.7202392 6.460089 2.018446 3.36868
## Fe
## 1 1.729811
Skewness measures the asymmetry of a distribution: values farther from 0 indicate more skew, with positive values for right-skewed and negative values for left-skewed distributions. Here skewness serves as a rough proxy for outliers, since a more heavily skewed predictor is more likely to have extreme values in its long tail. It does not identify individual outlying points. Using an absolute skewness of 1 as the cutoff for a clearly asymmetric distribution, every predictor except Si, Na, and Al is likely to contain outliers. K (6.46), Ba (3.37), and Ca (2.02) are the most skewed.
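Skewness only hints at outliers. A more direct check, sketched here as an addition to the original analysis, counts the points that fall outside the usual boxplot fences of 1.5 times the interquartile range. The name count_iqr_outliers() and the multiplier argument k are illustrative.

```r
# Count values lying more than k * IQR below the first quartile
# or above the third quartile.
count_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Small check: only the value 100 lies outside the fences.
count_iqr_outliers(c(1, 2, 3, 4, 100))

# Per-predictor counts for the Glass data when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Glass", package = "mlbench")
  print(sapply(Glass[, 1:9], count_iqr_outliers))
}
```

For a predictor such as Ba, where most values are 0, the interquartile range can collapse to 0 so that every non-zero value is flagged. The counts are best read alongside the histograms.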
Are there any relevant transformations of one or more predictors that might improve the classification model?
factors %>%
  mutate_all(list(~BoxCoxTrans(.)$lambda)) %>%
  head(1)
## RI Na Mg Al Si K Ca Ba Fe
## 1 -2 -0.1 NA 0.5 2 NA -1.1 NA NA
freq.na(factors)
## missing %
## RI 0 0
## Na 0 0
## Mg 0 0
## Al 0 0
## Si 0 0
## K 0 0
## Ca 0 0
## Ba 0 0
## Fe 0 0
I used a Box-Cox transformation to make the skewed predictor distributions more symmetric. The freq.na() output confirms that none of the predictors has missing values, so the NA entries in the lambda row are not caused by missing data. An estimated lambda of 1 means no transformation is needed. Na (-0.1) and Al (0.5) get relatively mild transformations, close to a log and a square root respectively. RI (-2) and Ca (-1.1) get negative powers, which are inverse-type transformations, while Si (2) is squared. Mg, K, Ba, and Fe return NA because Box-Cox is only defined for strictly positive values and these predictors contain zeros.
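For reference, the Box-Cox family behind those lambda values can be written out in a few lines of base R. This is a sketch of the standard formula rather than caret's internal code, and box_cox() is a name chosen for illustration.

```r
# Box-Cox transform: (x^lambda - 1) / lambda, with log(x) as the lambda = 0 case.
# Defined only for strictly positive x, which is why predictors with zeros fail.
box_cox <- function(x, lambda) {
  if (any(x <= 0, na.rm = TRUE)) {
    stop("Box-Cox requires strictly positive values")
  }
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

box_cox(c(1, 4, 9), lambda = 0.5)   # square-root-type transform
box_cox(c(1, 10, 100), lambda = 0)  # log transform

# A vector containing zeros is rejected; the error message is returned instead.
tryCatch(box_cox(c(0, 1.5, 3), lambda = 0.5),
         error = function(e) conditionMessage(e))
```

For the predictors with zeros, a transformation that accepts non-positive values is one possible alternative, for example Yeo-Johnson, which caret's preProcess() offers through method = "YeoJohnson".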
The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., leaf spots, mold growth). The outcome labels consist of 19 distinct classes.
data("Soybean")
Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?
str(Soybean)
## 'data.frame': 683 obs. of 36 variables:
## $ Class : Factor w/ 19 levels "2-4-d-injury",..: 11 11 11 11 11 11 11 11 11 11 ...
## $ date : Factor w/ 7 levels "0","1","2","3",..: 7 5 4 4 7 6 6 5 7 5 ...
## $ plant.stand : Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
## $ precip : Ord.factor w/ 3 levels "0"<"1"<"2": 3 3 3 3 3 3 3 3 3 3 ...
## $ temp : Ord.factor w/ 3 levels "0"<"1"<"2": 2 2 2 2 2 2 2 2 2 2 ...
## $ hail : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...
## $ crop.hist : Factor w/ 4 levels "0","1","2","3": 2 3 2 2 3 4 3 2 4 3 ...
## $ area.dam : Factor w/ 4 levels "0","1","2","3": 2 1 1 1 1 1 1 1 1 1 ...
## $ sever : Factor w/ 3 levels "0","1","2": 2 3 3 3 2 2 2 2 2 3 ...
## $ seed.tmt : Factor w/ 3 levels "0","1","2": 1 2 2 1 1 1 2 1 2 1 ...
## $ germ : Ord.factor w/ 3 levels "0"<"1"<"2": 1 2 3 2 3 2 1 3 2 3 ...
## $ plant.growth : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ leaves : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ leaf.halo : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ leaf.marg : Factor w/ 3 levels "0","1","2": 3 3 3 3 3 3 3 3 3 3 ...
## $ leaf.size : Ord.factor w/ 3 levels "0"<"1"<"2": 3 3 3 3 3 3 3 3 3 3 ...
## $ leaf.shread : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ leaf.malf : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ leaf.mild : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ stem : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ lodging : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 2 1 1 1 ...
## $ stem.cankers : Factor w/ 4 levels "0","1","2","3": 4 4 4 4 4 4 4 4 4 4 ...
## $ canker.lesion : Factor w/ 4 levels "0","1","2","3": 2 2 1 1 2 1 2 2 2 2 ...
## $ fruiting.bodies: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ ext.decay : Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 2 2 ...
## $ mycelium : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ int.discolor : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ sclerotia : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ fruit.pods : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ fruit.spots : Factor w/ 4 levels "0","1","2","4": 4 4 4 4 4 4 4 4 4 4 ...
## $ seed : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ mold.growth : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ seed.discolor : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ seed.size : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ shriveling : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ roots : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
# Convert the factor predictors to their integer level codes and reshape to long format
Soybean_tidy <- Soybean %>%
  select(-Class) %>%
  mutate(across(where(is.factor), as.numeric)) %>%
  pivot_longer(cols = -c(date), names_to = "name", values_to = "value")
# Distribution of Predictors
# geom_bar() counts each level directly
ggplot(Soybean_tidy, aes(x = value)) +
  geom_bar() +
  facet_wrap(vars(name))
A degenerate distribution is one in which a predictor takes a single value for nearly all samples, so it carries little information. mycelium and sclerotia both appear degenerate, since almost all of their observations fall in one level. leaf.mild and leaf.malf are also heavily concentrated in one level and, once missing values are excluded, could be considered degenerate as well.
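The visual impression can be backed by a number. One common measure is the ratio of the most frequent level to the second most frequent, which caret's nearZeroVar() compares against a default cutoff of 95/5 (together with a check on the percentage of unique values). The base R sketch below computes only that ratio. freq_ratio() is an illustrative name and not a caret function.

```r
# Ratio of the most frequent level's count to the second most frequent level's count.
# table() ignores NA values by default.
freq_ratio <- function(x) {
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  if (length(counts) < 2) return(Inf)
  counts[1] / counts[2]
}

# A 95/5 split gives a ratio of 19, the default caret cutoff.
freq_ratio(c(rep("a", 95), rep("b", 5)))

# Five most unbalanced Soybean predictors when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Soybean", package = "mlbench")
  ratios <- sapply(Soybean[, -1], freq_ratio)
  print(round(sort(ratios, decreasing = TRUE)[1:5], 1))
}
```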
Roughly 18 % of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?
# Proportion of missing values for each predictor
Soybean %>%
  summarise(across(everything(), ~mean(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "predictor", values_to = "missing_prop") %>%
  ggplot(aes(y = reorder(predictor, missing_prop), x = missing_prop)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Proportion of Missing Values for Predictors",
       x = "Proportion of Missing Values",
       y = "Predictor") +
  theme_minimal()
# NA Frequency By Predictor
freq.na(Soybean)
## missing %
## hail 121 18
## sever 121 18
## seed.tmt 121 18
## lodging 121 18
## germ 112 16
## leaf.mild 108 16
## fruiting.bodies 106 16
## fruit.spots 106 16
## seed.discolor 106 16
## shriveling 106 16
## leaf.shread 100 15
## seed 92 13
## mold.growth 92 13
## seed.size 92 13
## leaf.halo 84 12
## leaf.marg 84 12
## leaf.size 84 12
## leaf.malf 84 12
## fruit.pods 84 12
## precip 38 6
## stem.cankers 38 6
## canker.lesion 38 6
## ext.decay 38 6
## mycelium 38 6
## int.discolor 38 6
## sclerotia 38 6
## plant.stand 36 5
## roots 31 5
## temp 30 4
## crop.hist 16 2
## plant.growth 16 2
## stem 16 2
## date 1 0
## area.dam 1 0
## Class 0 0
## leaves 0 0
# NA Frequency by Class
Total_Na <- Soybean %>%
  group_by(Class) %>%
  summarize_at(vars(-group_cols()), ~ sum(is.na(.)))
row_sum <- rowSums(Total_Na[, -1])
result <- cbind(Total_Na[, 1, drop = FALSE], RowSums = row_sum)
result
## Class RowSums
## 1 2-4-d-injury 450
## 2 alternarialeaf-spot 0
## 3 anthracnose 0
## 4 bacterial-blight 0
## 5 bacterial-pustule 0
## 6 brown-spot 0
## 7 brown-stem-rot 0
## 8 charcoal-rot 0
## 9 cyst-nematode 336
## 10 diaporthe-pod-&-stem-blight 177
## 11 diaporthe-stem-canker 0
## 12 downy-mildew 0
## 13 frog-eye-leaf-spot 0
## 14 herbicide-injury 160
## 15 phyllosticta-leaf-spot 0
## 16 phytophthora-rot 1214
## 17 powdery-mildew 0
## 18 purple-seed-stain 0
## 19 rhizoctonia-root-rot 0
As the first table shows, hail, sever, seed.tmt, and lodging have the most missing values, at 18% each (121 of 683 samples). The pattern of missingness is strongly related to class: only five classes contain any NA values (phytophthora-rot, 2-4-d-injury, cyst-nematode, diaporthe-pod-&-stem-blight, and herbicide-injury), and the other 14 classes are complete. phytophthora-rot has the most, with 1214 missing values.
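Raw NA counts depend on how many samples each class has, so a complementary view is the share of samples in each class with at least one missing predictor. The sketch below is an addition to the original analysis, and incomplete_by_class() is a hypothetical helper name.

```r
# Share of rows with at least one NA among the predictors, per class.
incomplete_by_class <- function(df, class_col = "Class") {
  predictors <- df[, setdiff(names(df), class_col), drop = FALSE]
  incomplete <- !complete.cases(predictors)
  tapply(incomplete, df[[class_col]], mean)
}

# Per-class shares for the Soybean data when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Soybean", package = "mlbench")
  print(round(incomplete_by_class(Soybean), 2))
}
```

A share of 1 for a class would mean every one of its samples is incomplete, which matters when deciding whether dropping incomplete rows is an option.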
Develop a strategy for handling missing data, either by eliminating predictors or imputation.
One strategy for handling the missing data is Multivariate Imputation by Chained Equations (MICE), which imputes each incomplete predictor from the others. The output of md.pattern() below shows which combinations of predictors are missing together and how often each pattern occurs. MICE supports a range of imputation methods, including predictive mean matching (pmm), classification and regression trees (cart), and lasso linear regression (lasso.norm). We generated five imputed data sets, so that the variation between them reflects the uncertainty about the missing values and supports more reliable parameter estimates.
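The chunk that produced the output below is not shown in the write-up, so the following is only a sketch of what such a run might look like. The imputation method ("cart"), the seed, and the helper name impute_first_set() are assumptions. Only m = 5 and the object name Soybean_imputed_complete come from the text and the later code.

```r
# Assumed reconstruction: the original imputation chunk is not shown.
# Runs mice and returns the first of the m completed data sets.
impute_first_set <- function(data, m = 5, method = "cart", seed = 123) {
  imp <- mice::mice(data, m = m, method = method, seed = seed, printFlag = FALSE)
  # mice::complete() is called explicitly because tidyr also exports complete()
  mice::complete(imp, 1)
}

if (requireNamespace("mice", quietly = TRUE) &&
    requireNamespace("mlbench", quietly = TRUE)) {
  library(mice)
  data("Soybean", package = "mlbench")
  # Tabulate the missing-data patterns without drawing the plot
  patterns <- md.pattern(Soybean, plot = FALSE)
  Soybean_imputed_complete <- impute_first_set(Soybean)
  # Check whether any missing values remain after imputation
  print(anyNA(Soybean_imputed_complete))
}
```

Setting printFlag = FALSE suppresses the per-iteration log, and fixing the seed makes the imputed values reproducible.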
## Class date area.dam crop.hist temp plant.stand precip hail sever
## 562 1 1 1 1 1 1 1 1 1 0
## 77 1 1 1 1 1 1 1 0 0 2
## 8 1 1 1 1 1 1 0 0 0 3
## 6 1 1 1 1 1 0 1 0 0 3
## 14 1 1 1 1 0 0 0 0 0 5
## 15 1 1 1 0 0 0 0 0 0 6
## 1 1 0 0 0 0 0 0 0 0 8
## 0 1 1 16 30 36 38 121 121 364
## leaves plant.growth leaf.halo leaf.marg leaf.size leaf.malf leaf.shread
## 562 1 1 1 1 1 1 1
## 21 1 1 1 1 1 1 1
## 9 1 1 0 0 0 0 0
## 75 1 1 0 0 0 0 0
## 16 1 0 1 1 1 1 0
## 0 16 84 84 84 84 100
## germ seed.tmt
## 562 1 1 0
## 21 0 0 2
## 9 1 0 6
## 75 0 0 7
## 16 0 0 4
## 112 121 685
## stem stem.cankers canker.lesion ext.decay mycelium int.discolor
## 562 1 1 1 1 1 1
## 15 1 1 1 1 1 1
## 13 1 1 1 1 1 1
## 55 1 1 1 1 1 1
## 22 1 0 0 0 0 0
## 16 0 0 0 0 0 0
## 16 38 38 38 38 38
## fruiting.bodies leaf.mild lodging
## 562 1 1 1 0
## 15 1 0 0 2
## 13 0 1 0 2
## 55 0 0 0 3
## 22 0 0 0 8
## 16 0 0 0 9
## 106 108 121 541
## roots sclerotia fruit.pods seed mold.growth seed.size fruit.spots
## 562 1 1 1 1 1 1 1
## 68 1 1 0 0 0 0 0
## 14 1 0 1 1 1 1 0
## 8 1 0 1 0 0 0 0
## 15 0 1 1 1 1 1 1
## 16 0 0 0 0 0 0 0
## 31 38 84 92 92 92 106
## seed.discolor shriveling
## 562 1 1 0
## 68 0 0 7
## 14 0 0 4
## 8 0 0 7
## 15 1 1 1
## 16 0 0 9
## 106 106 747
##
## iter imp variable
## 1 1 date plant.stand* precip* temp hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 2 date plant.stand* precip* temp* hail* crop.hist area.dam sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg leaf.size leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 3 date plant.stand* precip* temp* hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 4 date plant.stand* precip* temp* hail* crop.hist area.dam sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 5 date plant.stand* precip temp hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
# Soybean_imputed_complete is assumed to be one completed data set from the MICE run above
Soybean_tidy_2 <- Soybean_imputed_complete %>%
  select(-Class) %>%
  mutate(across(where(is.factor), as.numeric)) %>%
  pivot_longer(cols = -c(date), names_to = "name", values_to = "value")
# Distribution of Imputed Predictors
# geom_bar() counts each level directly
ggplot(Soybean_tidy_2, aes(x = value)) +
  geom_bar() +
  facet_wrap(vars(name))