The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe.
Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.
library(mlbench)
data(Glass)
str(Glass)
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
The first thing that caught my eye was all the 0’s in the Ba column, so I wanted a quick numerical summary of each column.
glass.summary <- Glass[,1:9] #the columns excluding "Type"
summary(glass.summary)
## RI Na Mg Al
## Min. :1.511 Min. :10.73 Min. :0.000 Min. :0.290
## 1st Qu.:1.517 1st Qu.:12.91 1st Qu.:2.115 1st Qu.:1.190
## Median :1.518 Median :13.30 Median :3.480 Median :1.360
## Mean :1.518 Mean :13.41 Mean :2.685 Mean :1.445
## 3rd Qu.:1.519 3rd Qu.:13.82 3rd Qu.:3.600 3rd Qu.:1.630
## Max. :1.534 Max. :17.38 Max. :4.490 Max. :3.500
## Si K Ca Ba
## Min. :69.81 Min. :0.0000 Min. : 5.430 Min. :0.000
## 1st Qu.:72.28 1st Qu.:0.1225 1st Qu.: 8.240 1st Qu.:0.000
## Median :72.79 Median :0.5550 Median : 8.600 Median :0.000
## Mean :72.65 Mean :0.4971 Mean : 8.957 Mean :0.175
## 3rd Qu.:73.09 3rd Qu.:0.6100 3rd Qu.: 9.172 3rd Qu.:0.000
## Max. :75.41 Max. :6.2100 Max. :16.190 Max. :3.150
## Fe
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.05701
## 3rd Qu.:0.10000
## Max. :0.51000
Glass is described as a non-crystalline, amorphous solid, and the most familiar manufactured types are silicate glasses, which are made of silicon dioxide, or quartz. Silicon, a hard crystalline solid, seems to be found in the highest percentage overall, with a minimum value of 69.81%, which makes sense based on what I know about glass. Silicon is followed distantly by sodium, with a minimum of 10.73%. Iron, magnesium, barium, and potassium do not occur at all in some of the glass samples (each has a minimum of 0). I would think these could be useful for telling the glass types apart.
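To put a number on how common those zeros are, the share of exact zeros per column can be computed directly. This is only a minimal sketch: `zero_share` is a helper name I made up, and the small data frame is invented for illustration rather than taken from Glass.

```r
# Hypothetical helper: proportion of exact zeros in each numeric column.
zero_share <- function(df) {
  vapply(df, function(col) mean(col == 0, na.rm = TRUE), numeric(1))
}

# Toy data (made up for illustration, not taken from Glass).
toy <- data.frame(
  Ba = c(0, 0, 0, 1.2),
  Fe = c(0, 0.1, 0, 0),
  Si = c(72.1, 73.0, 71.8, 72.6)
)

round(zero_share(toy), 2)
# With the real data: sort(zero_share(glass.summary), decreasing = TRUE)
```

Sorting the result on the real data would show at a glance which elements are absent from most samples and which are absent from only a few.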
par(mfrow = c(3, 3))
for (i in 1:ncol(glass.summary)){
hist(glass.summary[, i], xlab = names(glass.summary[i]), main = paste("Histogram for: ", names(glass.summary[i])), col = "green")
}
Of the elements that I mentioned as having 0 values, Ba and Fe have the majority of their points at 0 (both medians are 0 in the summary), while Mg and K reach 0 but have medians well above it. RI and Al are right skewed and Si is left skewed. Na is approximately normally distributed. The rest of the predictors are irregularly distributed. Next we’ll take a look at a scatterplot matrix to see if that tells us anything.
pairs(glass.summary, col = "black", lower.panel = panel.smooth, main = "Scatterplot Matrix for Glass Data")
The scatterplot matrix shows that RI and Ca have a positive relationship. Other than that, there is no clear relationship between the predictors.
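A scatterplot matrix is easy to misread by eye, so it can help to rank the predictor pairs by correlation as a cross-check. The sketch below uses base R only; `top_correlations` is a hypothetical helper name and the toy data are made up for illustration.

```r
# Hypothetical helper: list predictor pairs ordered by absolute correlation.
top_correlations <- function(df, n = 5) {
  cm <- cor(df, use = "pairwise.complete.obs")
  idx <- which(upper.tri(cm), arr.ind = TRUE)  # visit each pair once
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
  pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

# Toy data (made up): b is an exact linear function of a, c is unrelated noise.
set.seed(1)
toy <- data.frame(a = 1:20)
toy$b <- 2 * toy$a + 1
toy$c <- rnorm(20)

top_correlations(toy, n = 3)
# With the real data: top_correlations(glass.summary)
```

Pearson correlation only captures linear association, so a pair that ranks low here could still show a curved pattern in the scatterplots.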
Do there appear to be any outliers in the data? Are any predictors skewed?
There are a few different methods to detect outliers. I will first use Tukey’s method because it does not depend on the distribution of the data. Klodian Dhana has a great script that I’d like to build on for this.
outlierKD <- function(dt, var) {
  var_name <- eval(substitute(var), eval(dt)) # look up the column by its unquoted name
  na1 <- sum(is.na(var_name)) # NAs present before any removal
  m1 <- mean(var_name, na.rm = TRUE) # mean with outliers included
  par(mfrow = c(2, 2), oma = c(0, 0, 3, 0))
  boxplot(var_name, main = "With outliers") # boxplot with outliers included
  hist(var_name, main = "With outliers", xlab = NA, ylab = NA) # histogram with outliers included
  outlier <- boxplot.stats(var_name)$out # points flagged by the boxplot (Tukey) rule
  mo <- mean(outlier)
  var_name <- ifelse(var_name %in% outlier, NA, var_name) # blank out the flagged points
  boxplot(var_name, main = "Without outliers")
  hist(var_name, main = "Without outliers", xlab = NA, ylab = NA)
  title("Outlier Check", outer = TRUE)
  na2 <- sum(is.na(var_name)) # NAs after removal, so na2 - na1 is the outlier count
  cat("Outliers identified:", na2 - na1, "\n")
  # note: the denominator is the number of remaining (non-outlier) values
  cat("Propotion (%) of outliers:", round((na2 - na1) / sum(!is.na(var_name))*100, 1), "\n")
  cat("Mean of the outliers:", round(mo, 2), "\n")
  m2 <- mean(var_name, na.rm = TRUE)
  cat("Mean without removing outliers:", round(m1, 2), "\n")
  cat("Mean if we remove outliers:", round(m2, 2), "\n")
}
outlierKD(glass.summary, RI)
## Outliers identified: 17
## Propotion (%) of outliers: 8.6
## Mean of the outliers: 1.52
## Mean without removing outliers: 1.52
## Mean if we remove outliers: 1.52
outlierKD(glass.summary, Na)
## Outliers identified: 7
## Propotion (%) of outliers: 3.4
## Mean of the outliers: 12.66
## Mean without removing outliers: 13.41
## Mean if we remove outliers: 13.43
outlierKD(glass.summary, Mg)
## Outliers identified: 0
## Propotion (%) of outliers: 0
## Mean of the outliers: NaN
## Mean without removing outliers: 2.68
## Mean if we remove outliers: 2.68
outlierKD(glass.summary, Al)
## Outliers identified: 18
## Propotion (%) of outliers: 9.2
## Mean of the outliers: 2.09
## Mean without removing outliers: 1.44
## Mean if we remove outliers: 1.39
outlierKD(glass.summary, Si)
## Outliers identified: 12
## Propotion (%) of outliers: 5.9
## Mean of the outliers: 71.82
## Mean without removing outliers: 72.65
## Mean if we remove outliers: 72.7
outlierKD(glass.summary, K)
## Outliers identified: 7
## Propotion (%) of outliers: 3.4
## Mean of the outliers: 3.06
## Mean without removing outliers: 0.5
## Mean if we remove outliers: 0.41
outlierKD(glass.summary, Ca)
## Outliers identified: 26
## Propotion (%) of outliers: 13.8
## Mean of the outliers: 11.17
## Mean without removing outliers: 8.96
## Mean if we remove outliers: 8.65
outlierKD(glass.summary, Fe)
## Outliers identified: 12
## Propotion (%) of outliers: 5.9
## Mean of the outliers: 0.32
## Mean without removing outliers: 0.06
## Mean if we remove outliers: 0.04
With the exception of magnesium, all the predictors I checked have outliers (Ba was not run through the function). The function that I used also gives the mean with and without the outliers. For the most part, removing the outliers makes little difference to the means.
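To make the rule behind these counts explicit: `outlierKD` relies on `boxplot.stats`, which flags points lying more than 1.5 times the hinge spread beyond the lower or upper hinge. The sketch below re-derives that rule by hand as a rough illustration; `tukey_outliers` is a name I made up and the vector is toy data.

```r
# Hypothetical helper: values outside Tukey's fences, computed by hand.
tukey_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  hinges <- fivenum(x)[c(2, 4)]      # lower and upper hinge
  spread <- diff(hinges)             # hinge spread (close to the IQR)
  lower <- hinges[1] - coef * spread
  upper <- hinges[2] + coef * spread
  x[x < lower | x > upper]
}

# Toy data (made up): one value sits far from the rest.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

tukey_outliers(x)
boxplot.stats(x)$out   # should flag the same points
```

For a column such as Ba, whose first and third quartiles are both 0 in the summary above, this rule would flag essentially every non-zero value, so the outlier counts can say as much about zero inflation as about extreme measurements.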
library(e1071)
s.RI <- skewness(glass.summary$RI)
print(paste0("The skewness value for RI is: ", s.RI, "."))
## [1] "The skewness value for RI is: 1.60271508274373."
s.Na <- skewness(glass.summary$Na)
print(paste0("The skewness value for Na is: ", s.Na, "."))
## [1] "The skewness value for Na is: 0.447834258917133."
s.Mg <- skewness(glass.summary$Mg)
print(paste0("The skewness value for Mg is: ", s.Mg, "."))
## [1] "The skewness value for Mg is: -1.13645227846653."
s.Al <- skewness(glass.summary$Al)
print(paste0("The skewness value for Al is: ", s.Al, "."))
## [1] "The skewness value for Al is: 0.89461041611312."
s.Si <- skewness(glass.summary$Si)
print(paste0("The skewness value for Si is: ", s.Si, "."))
## [1] "The skewness value for Si is: -0.720239210805621."
s.K <- skewness(glass.summary$K)
print(paste0("The skewness value for K is: ", s.K, "."))
## [1] "The skewness value for K is: 6.46008889572281."
s.Ca <- skewness(glass.summary$Ca)
print(paste0("The skewness value for Ca is: ", s.Ca, "."))
## [1] "The skewness value for Ca is: 2.01844629445302."
s.Ba <- skewness(glass.summary$Ba)
print(paste0("The skewness value for Ba is: ", s.Ba, "."))
## [1] "The skewness value for Ba is: 3.36867996880571."
s.Fe <- skewness(glass.summary$Fe)
print(paste0("The skewness value for Fe is: ", s.Fe, "."))
## [1] "The skewness value for Fe is: 1.7298107095598."
All the predictors seem to have some level of skewness, the largest being K, which is right skewed. Magnesium and silicon are left skewed.
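The nine near-identical calls above could be collapsed into a single `vapply` over the columns. The sketch below also spells out a moment-based skewness so the sign convention is visible: positive means a longer right tail, negative a longer left tail. `sample_skewness` is a hypothetical name, the data are made up, and since `e1071::skewness` supports several estimator types, its values may differ from this one by a sample-size factor.

```r
# Moment-based sample skewness: m3 / m2^(3/2), with m_k the k-th central moment.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m3 <- mean(centered^3)
  m3 / m2^(3 / 2)
}

# Toy data (made up for illustration).
toy <- data.frame(
  symmetric  = c(1, 2, 3, 4, 5),
  right_tail = c(1, 1, 1, 2, 10),
  left_tail  = c(-10, -2, -1, -1, -1)
)

round(vapply(toy, sample_skewness, numeric(1)), 3)
# With the real data: sort(vapply(glass.summary, sample_skewness, numeric(1)))
```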
Are there any relevant transformations of one or more predictors that might improve the classification model?
Assuming we are planning to fit a linear model, we can use powerTransform in the car package to find the optimal power for transforming each predictor. We’ll use that to determine whether there is any meaningful transformation to be done, apply it, and see if we notice any differences.
library(car)
#trans <- powerTransform(glass.summary) # this failed with "first argument must be strictly positive", which led me to change the family to yjPower to account for our 0 values.
summary(trans <- powerTransform(glass.summary, family = "yjPower"))
## yjPower Transformations to Multinormality
## Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## RI -25.0853 -25.09 -25.1140 -25.0566
## Na 1.3756 1.00 0.6042 2.1469
## Mg 1.7699 2.00 1.5354 2.0044
## Al 0.9773 1.00 0.6451 1.3095
## Si 10.9453 10.95 5.9996 15.8910
## K -0.1441 0.00 -0.3432 0.0549
## Ca 0.6774 0.50 0.4470 0.9079
## Ba -6.8620 -6.86 -8.0321 -5.6920
## Fe -14.9246 -14.92 -17.8279 -12.0212
##
## Likelihood ratio test that all transformation parameters are equal to 0
## LRT df pval
## LR test, lambda = (0 0 0 0 0 0 0 0 0) 851.2514 9 < 2.22e-16
trans$roundlam
## RI Na Mg Al Si K
## -25.085311 1.000000 2.000000 1.000000 10.945270 0.000000
## Ca Ba Fe
## 0.500000 -6.862046 -14.924560
No transformation is needed for Na and Al because their rounded lambda is 1. A log transformation should be done for K since its lambda is 0. A square root transformation should be done for Ca since its lambda value is 0.5. Mg has a rounded lambda of 2, which corresponds to squaring. I’m not sure what to make of RI, Si, Ba, and Fe, since their estimated powers are extreme and do not match any of the common transformations. If I had more time, I would perform each of the transformations and take a look at how it affects outliers, etc.
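Because the yjPower family was used, the "log" and "square root" readings of lambda are applied to y + 1 rather than y, which is what lets the zero values through. The sketch below writes out the Yeo-Johnson formula by hand for a single lambda; `yeo_johnson` is a hypothetical helper (the car package has its own implementation), it assumes no missing values, and the K values are a handful copied from the output above.

```r
# Yeo-Johnson transform for a single lambda (sketch; assumes y has no NAs).
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative values: Box-Cox applied to y + 1.
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])
  }
  # Negative values: mirrored Box-Cox with power 2 - lambda.
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -(((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda))
  } else {
    out[!pos] <- -log1p(-y[!pos])
  }
  out
}

k <- c(0, 0.06, 0.48, 0.57, 6.21)  # a few K values copied from the output above

yeo_johnson(k, 0)     # lambda = 0: log(y + 1), so zeros stay at 0
yeo_johnson(k, 0.5)   # lambda = 0.5: 2 * (sqrt(y + 1) - 1)
yeo_johnson(k, 1)     # lambda = 1: values unchanged
```

Applying the transform with the rounded lambda for each predictor and then re-running the skewness and outlier checks would be one way to follow up on the unfinished step mentioned above.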
The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., leaf spots, mold growth). The outcome labels consist of 19 distinct classes.
data(Soybean)
?Soybean
str(Soybean)
## 'data.frame': 683 obs. of 36 variables:
## $ Class : Factor w/ 19 levels "2-4-d-injury",..: 11 11 11 11 11 11 11 11 11 11 ...
## $ date : Factor w/ 7 levels "0","1","2","3",..: 7 5 4 4 7 6 6 5 7 5 ...
## $ plant.stand : Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
## $ precip : Ord.factor w/ 3 levels "0"<"1"<"2": 3 3 3 3 3 3 3 3 3 3 ...
## $ temp : Ord.factor w/ 3 levels "0"<"1"<"2": 2 2 2 2 2 2 2 2 2 2 ...
## $ hail : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...
## $ crop.hist : Factor w/ 4 levels "0","1","2","3": 2 3 2 2 3 4 3 2 4 3 ...
## $ area.dam : Factor w/ 4 levels "0","1","2","3": 2 1 1 1 1 1 1 1 1 1 ...
## $ sever : Factor w/ 3 levels "0","1","2": 2 3 3 3 2 2 2 2 2 3 ...
## $ seed.tmt : Factor w/ 3 levels "0","1","2": 1 2 2 1 1 1 2 1 2 1 ...
## $ germ : Ord.factor w/ 3 levels "0"<"1"<"2": 1 2 3 2 3 2 1 3 2 3 ...
## $ plant.growth : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ leaves : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ leaf.halo : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ leaf.marg : Factor w/ 3 levels "0","1","2": 3 3 3 3 3 3 3 3 3 3 ...
## $ leaf.size : Ord.factor w/ 3 levels "0"<"1"<"2": 3 3 3 3 3 3 3 3 3 3 ...
## $ leaf.shread : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ leaf.malf : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ leaf.mild : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ stem : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ lodging : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 2 1 1 1 ...
## $ stem.cankers : Factor w/ 4 levels "0","1","2","3": 4 4 4 4 4 4 4 4 4 4 ...
## $ canker.lesion : Factor w/ 4 levels "0","1","2","3": 2 2 1 1 2 1 2 2 2 2 ...
## $ fruiting.bodies: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ ext.decay : Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 2 2 ...
## $ mycelium : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ int.discolor : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ sclerotia : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ fruit.pods : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ fruit.spots : Factor w/ 4 levels "0","1","2","4": 4 4 4 4 4 4 4 4 4 4 ...
## $ seed : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ mold.growth : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ seed.discolor : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ seed.size : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ shriveling : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ roots : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
summary(Soybean)
## Class date plant.stand precip temp
## brown-spot : 92 5 :149 0 :354 0 : 74 0 : 80
## alternarialeaf-spot: 91 4 :131 1 :293 1 :112 1 :374
## frog-eye-leaf-spot : 91 3 :118 NA's: 36 2 :459 2 :199
## phytophthora-rot : 88 2 : 93 NA's: 38 NA's: 30
## anthracnose : 44 6 : 90
## brown-stem-rot : 44 (Other):101
## (Other) :233 NA's : 1
## hail crop.hist area.dam sever seed.tmt germ
## 0 :435 0 : 65 0 :123 0 :195 0 :305 0 :165
## 1 :127 1 :165 1 :227 1 :322 1 :222 1 :213
## NA's:121 2 :219 2 :145 2 : 45 2 : 35 2 :193
## 3 :218 3 :187 NA's:121 NA's:121 NA's:112
## NA's: 16 NA's: 1
##
##
## plant.growth leaves leaf.halo leaf.marg leaf.size leaf.shread
## 0 :441 0: 77 0 :221 0 :357 0 : 51 0 :487
## 1 :226 1:606 1 : 36 1 : 21 1 :327 1 : 96
## NA's: 16 2 :342 2 :221 2 :221 NA's:100
## NA's: 84 NA's: 84 NA's: 84
##
##
##
## leaf.malf leaf.mild stem lodging stem.cankers canker.lesion
## 0 :554 0 :535 0 :296 0 :520 0 :379 0 :320
## 1 : 45 1 : 20 1 :371 1 : 42 1 : 39 1 : 83
## NA's: 84 2 : 20 NA's: 16 NA's:121 2 : 36 2 :177
## NA's:108 3 :191 3 : 65
## NA's: 38 NA's: 38
##
##
## fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods
## 0 :473 0 :497 0 :639 0 :581 0 :625 0 :407
## 1 :104 1 :135 1 : 6 1 : 44 1 : 20 1 :130
## NA's:106 2 : 13 NA's: 38 2 : 20 NA's: 38 2 : 14
## NA's: 38 NA's: 38 3 : 48
## NA's: 84
##
##
## fruit.spots seed mold.growth seed.discolor seed.size shriveling
## 0 :345 0 :476 0 :524 0 :513 0 :532 0 :539
## 1 : 75 1 :115 1 : 67 1 : 64 1 : 59 1 : 38
## 2 : 57 NA's: 92 NA's: 92 NA's:106 NA's: 92 NA's:106
## 4 :100
## NA's:106
##
##
## roots
## 0 :551
## 1 : 86
## 2 : 15
## NA's: 31
##
##
##
Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?
The text refers to degenerate distributions as those where the predictor has a single unique value. Going by that definition, none of our variables qualify. We do have some where the level frequencies are heavily imbalanced, so I’ll use the caret package to check the frequency ratios.
library(caret)
soybean.summary <- Soybean[, 2:36]
nearZeroVar(soybean.summary, names = TRUE, saveMetrics = TRUE)
## freqRatio percentUnique zeroVar nzv
## date 1.137405 1.0248902 FALSE FALSE
## plant.stand 1.208191 0.2928258 FALSE FALSE
## precip 4.098214 0.4392387 FALSE FALSE
## temp 1.879397 0.4392387 FALSE FALSE
## hail 3.425197 0.2928258 FALSE FALSE
## crop.hist 1.004587 0.5856515 FALSE FALSE
## area.dam 1.213904 0.5856515 FALSE FALSE
## sever 1.651282 0.4392387 FALSE FALSE
## seed.tmt 1.373874 0.4392387 FALSE FALSE
## germ 1.103627 0.4392387 FALSE FALSE
## plant.growth 1.951327 0.2928258 FALSE FALSE
## leaves 7.870130 0.2928258 FALSE FALSE
## leaf.halo 1.547511 0.4392387 FALSE FALSE
## leaf.marg 1.615385 0.4392387 FALSE FALSE
## leaf.size 1.479638 0.4392387 FALSE FALSE
## leaf.shread 5.072917 0.2928258 FALSE FALSE
## leaf.malf 12.311111 0.2928258 FALSE FALSE
## leaf.mild 26.750000 0.4392387 FALSE TRUE
## stem 1.253378 0.2928258 FALSE FALSE
## lodging 12.380952 0.2928258 FALSE FALSE
## stem.cankers 1.984293 0.5856515 FALSE FALSE
## canker.lesion 1.807910 0.5856515 FALSE FALSE
## fruiting.bodies 4.548077 0.2928258 FALSE FALSE
## ext.decay 3.681481 0.4392387 FALSE FALSE
## mycelium 106.500000 0.2928258 FALSE TRUE
## int.discolor 13.204545 0.4392387 FALSE FALSE
## sclerotia 31.250000 0.2928258 FALSE TRUE
## fruit.pods 3.130769 0.5856515 FALSE FALSE
## fruit.spots 3.450000 0.5856515 FALSE FALSE
## seed 4.139130 0.2928258 FALSE FALSE
## mold.growth 7.820896 0.2928258 FALSE FALSE
## seed.discolor 8.015625 0.2928258 FALSE FALSE
## seed.size 9.016949 0.2928258 FALSE FALSE
## shriveling 14.184211 0.2928258 FALSE FALSE
## roots 6.406977 0.4392387 FALSE FALSE
From the above, we see that leaf.mild, mycelium, and sclerotia are flagged as having near-zero variance. Under that criterion, they can also be considered degenerate.
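To see why only these three are flagged, it helps to recompute the two metrics by hand. The sketch below is my own re-implementation, not caret's code: `nzv_metrics` is a hypothetical name, and the cutoffs (a frequency ratio above 95/5 and at most 10% unique values) are what I understand caret's defaults to be, so treat them as an assumption. The mycelium vector is rebuilt from the counts printed by summary(Soybean).

```r
# Hypothetical re-implementation of the two metrics for a single predictor.
# The cutoffs are assumed to match caret's defaults.
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  observed <- as.character(x[!is.na(x)])
  counts <- sort(as.vector(table(observed)), decreasing = TRUE)
  # Ratio of the most common value's count to the second most common.
  freq_ratio <- if (length(counts) > 1) counts[1] / counts[2] else 0
  # Distinct observed values as a percentage of all rows (NAs included).
  percent_unique <- 100 * length(counts) / length(x)
  zero_var <- length(counts) <= 1
  data.frame(
    freqRatio     = freq_ratio,
    percentUnique = percent_unique,
    zeroVar       = zero_var,
    nzv           = zero_var || (freq_ratio > freq_cut && percent_unique <= unique_cut)
  )
}

# Rebuild mycelium from the counts in summary(Soybean): 639 zeros, 6 ones, 38 NAs.
mycelium <- factor(c(rep("0", 639), rep("1", 6), rep(NA, 38)))
nzv_metrics(mycelium)
```

Under these cutoffs, predictors such as leaf.malf (12.3), int.discolor (13.2), and shriveling (14.2) are clearly imbalanced but stay below the frequency-ratio threshold, which is why they are not flagged.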
Roughly 18% of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?
From my research, I found that the VIM package handles “graphics that describe distributions and patterns of missing data” well.
library(VIM)
aggr(soybean.summary, numbers = TRUE, prop = c(TRUE, FALSE), sortVars = TRUE)
##
## Variables sorted by number of missings:
## Variable Count
## hail 0.177159590
## sever 0.177159590
## seed.tmt 0.177159590
## lodging 0.177159590
## germ 0.163982430
## leaf.mild 0.158125915
## fruiting.bodies 0.155197657
## fruit.spots 0.155197657
## seed.discolor 0.155197657
## shriveling 0.155197657
## leaf.shread 0.146412884
## seed 0.134699854
## mold.growth 0.134699854
## seed.size 0.134699854
## leaf.halo 0.122986823
## leaf.marg 0.122986823
## leaf.size 0.122986823
## leaf.malf 0.122986823
## fruit.pods 0.122986823
## precip 0.055636896
## stem.cankers 0.055636896
## canker.lesion 0.055636896
## ext.decay 0.055636896
## mycelium 0.055636896
## int.discolor 0.055636896
## sclerotia 0.055636896
## plant.stand 0.052708638
## roots 0.045387994
## temp 0.043923865
## crop.hist 0.023426061
## plant.growth 0.023426061
## stem 0.023426061
## date 0.001464129
## area.dam 0.001464129
## leaves 0.000000000
Ten variables have more than 15% missing values. From the combination plot we can see that about 82% of the rows are complete, which matches the roughly 18% mentioned in the question. Now, I’ll group the data by class to see whether certain classes hold most of the missing values. I’ll first filter out the complete cases, then group by class, then calculate the proportion of incomplete rows out of the total. Let’s see what we get.
library(dplyr)
Soybean %>%
mutate(Total = n()) %>%
filter(!complete.cases(.)) %>%
group_by(Class) %>%
mutate(Missing = n(), Proportion = Missing/Total) %>%
select(Class, Missing, Proportion) %>%
unique()
## # A tibble: 5 x 3
## # Groups: Class [5]
## Class Missing Proportion
## <fct> <int> <dbl>
## 1 phytophthora-rot 68 0.0996
## 2 diaporthe-pod-&-stem-blight 15 0.0220
## 3 cyst-nematode 14 0.0205
## 4 2-4-d-injury 16 0.0234
## 5 herbicide-injury 8 0.0117
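Only five of the 19 classes contain rows with missing values. The majority of those rows are in phytophthora-rot: 68 rows, which is about 10% of all 683 rows and 68 of that class’s 88 rows.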
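The proportions above are relative to all rows in the data set. A complementary view is the share of incomplete rows within each class, which speaks more directly to whether missingness depends on the class. The sketch below uses base R; `missing_by_class` is a hypothetical helper and the toy data frame is made up.

```r
# Hypothetical helper: per-class counts and share of rows with at least one NA.
missing_by_class <- function(df, class_col) {
  incomplete <- !complete.cases(df)
  classes <- droplevels(as.factor(df[[class_col]]))
  out <- data.frame(
    rows       = as.vector(table(classes)),
    incomplete = as.vector(tapply(incomplete, classes, sum)),
    row.names  = levels(classes)
  )
  out$share <- out$incomplete / out$rows
  out[order(out$share, decreasing = TRUE), ]
}

# Toy data (made up): class b is always incomplete, class c never is.
toy <- data.frame(
  Class = c("a", "a", "a", "b", "b", "c"),
  x1    = c(1, NA, 3, NA, NA, 6),
  x2    = c("u", "v", NA, "w", "x", "y")
)

missing_by_class(toy, "Class")
# With the real data: missing_by_class(Soybean, "Class")
```

A class whose share is at or near 1 would suggest that the missingness is structural for that class rather than random, which matters when choosing between imputation and dropping predictors.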
Develop a strategy for handling missing data, either by eliminating predictors or imputation.
Since I’ve used the mice package in the past, I’m going to use it again here.
library(mice)
soybean.mice <- mice(Soybean, method = "pmm", seed = 700)
##
## iter imp variable
## 1 1 date plant.stand* precip* temp* hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 2 date plant.stand* precip* temp* hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 3 date plant.stand precip* temp* hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 4 date plant.stand precip* temp* hail* crop.hist* area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 1 5 date plant.stand precip* temp* hail* crop.hist area.dam sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 2 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor sclerotia fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 3 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 2 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 4 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 1 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 2 date* plant.stand precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 3 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 4 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## 5 5 date* plant.stand* precip* temp* hail* crop.hist* area.dam* sever* seed.tmt* germ* plant.growth* leaf.halo* leaf.marg* leaf.size* leaf.shread* leaf.malf* leaf.mild* stem* lodging* stem.cankers* canker.lesion* fruiting.bodies* ext.decay* mycelium* int.discolor* sclerotia* fruit.pods* fruit.spots* seed* mold.growth* seed.discolor* seed.size* shriveling* roots*
## * Please inspect the loggedEvents
aggr(complete(soybean.mice), numbers = TRUE, prop = c(TRUE, FALSE), sortVars = TRUE)
##
## Variables sorted by number of missings:
## Variable Count
## Class 0
## date 0
## plant.stand 0
## precip 0
## temp 0
## hail 0
## crop.hist 0
## area.dam 0
## sever 0
## seed.tmt 0
## germ 0
## plant.growth 0
## leaves 0
## leaf.halo 0
## leaf.marg 0
## leaf.size 0
## leaf.shread 0
## leaf.malf 0
## leaf.mild 0
## stem 0
## lodging 0
## stem.cankers 0
## canker.lesion 0
## fruiting.bodies 0
## ext.decay 0
## mycelium 0
## int.discolor 0
## sclerotia 0
## fruit.pods 0
## fruit.spots 0
## seed 0
## mold.growth 0
## seed.discolor 0
## seed.size 0
## shriveling 0
## roots 0
The mice function was able to impute all the missing data as shown by the plots and numerical output.
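As a point of comparison, a much cruder baseline is to fill each factor's missing entries with its most frequent level. This is a different technique from what mice does (it ignores relationships between predictors and produces a single imputed data set), and it is shown here only as a rough sketch. `impute_mode` is a hypothetical helper and the toy data frame is made up.

```r
# Hypothetical helper: replace NAs in each factor column with its most frequent level.
impute_mode <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.factor(x) && anyNA(x)) {
      counts <- table(x)  # NAs are excluded by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
      df[[col]] <- x
    }
  }
  df
}

# Toy data (made up), mixing a plain factor and an ordered factor.
toy <- data.frame(
  hail = factor(c("0", "0", "1", NA, "0")),
  germ = factor(c("1", NA, "2", "2", NA), levels = c("0", "1", "2"), ordered = TRUE)
)

imputed <- impute_mode(toy)
imputed
colSums(is.na(imputed))
```

Comparing a model fitted on such a baseline with one fitted on the mice output would give a rough sense of how much the more careful imputation actually buys.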