c. Develop a strategy for handling missing data, either by
eliminating predictors or imputation.
# Identify predictors with near-zero variance
nzv <- nearZeroVar(Soybean, saveMetrics = TRUE)
print(nzv)
##                  freqRatio percentUnique zeroVar   nzv
## Class             1.010989     2.7818448   FALSE FALSE
## date              1.137405     1.0248902   FALSE FALSE
## plant.stand       1.208191     0.2928258   FALSE FALSE
## precip            4.098214     0.4392387   FALSE FALSE
## temp              1.879397     0.4392387   FALSE FALSE
## hail              3.425197     0.2928258   FALSE FALSE
## crop.hist         1.004587     0.5856515   FALSE FALSE
## area.dam          1.213904     0.5856515   FALSE FALSE
## sever             1.651282     0.4392387   FALSE FALSE
## seed.tmt          1.373874     0.4392387   FALSE FALSE
## germ              1.103627     0.4392387   FALSE FALSE
## plant.growth      1.951327     0.2928258   FALSE FALSE
## leaves            7.870130     0.2928258   FALSE FALSE
## leaf.halo         1.547511     0.4392387   FALSE FALSE
## leaf.marg         1.615385     0.4392387   FALSE FALSE
## leaf.size         1.479638     0.4392387   FALSE FALSE
## leaf.shread       5.072917     0.2928258   FALSE FALSE
## leaf.malf        12.311111     0.2928258   FALSE FALSE
## leaf.mild        26.750000     0.4392387   FALSE  TRUE
## stem              1.253378     0.2928258   FALSE FALSE
## lodging          12.380952     0.2928258   FALSE FALSE
## stem.cankers      1.984293     0.5856515   FALSE FALSE
## canker.lesion     1.807910     0.5856515   FALSE FALSE
## fruiting.bodies   4.548077     0.2928258   FALSE FALSE
## ext.decay         3.681481     0.4392387   FALSE FALSE
## mycelium        106.500000     0.2928258   FALSE  TRUE
## int.discolor     13.204545     0.4392387   FALSE FALSE
## sclerotia        31.250000     0.2928258   FALSE  TRUE
## fruit.pods        3.130769     0.5856515   FALSE FALSE
## fruit.spots       3.450000     0.5856515   FALSE FALSE
## seed              4.139130     0.2928258   FALSE FALSE
## mold.growth       7.820896     0.2928258   FALSE FALSE
## seed.discolor     8.015625     0.2928258   FALSE FALSE
## seed.size         9.016949     0.2928258   FALSE FALSE
## shriveling       14.184211     0.2928258   FALSE FALSE
## roots             6.406977     0.4392387   FALSE FALSE
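To make the table above easier to read, here is a minimal base-R sketch of how the two statistics and the nzv flag can be derived for a single column. The helper `nzv_stats()` and the toy vectors are hypothetical; the cutoffs (a frequency ratio above 95/5 and at most 10 percent unique values) are assumed to match the `nearZeroVar()` defaults, and the helper only approximates what caret computes.

```r
# Hypothetical helper approximating the statistics reported by nearZeroVar()
nzv_stats <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  # Level counts (NAs excluded), most common first
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  # Ratio of the most common value to the second most common value
  freq_ratio <- if (length(counts) >= 2) counts[1] / counts[2] else Inf
  # Distinct non-missing values as a percentage of all rows
  percent_unique <- 100 * length(unique(x[!is.na(x)])) / length(x)
  list(
    freqRatio = freq_ratio,
    percentUnique = percent_unique,
    nzv = freq_ratio > freq_cut && percent_unique <= unique_cut
  )
}

# Toy factor dominated by one level: 97 "absent" vs 3 "present"
skewed <- factor(c(rep("absent", 97), rep("present", 3)))
res_skewed <- nzv_stats(skewed)
print(res_skewed)

# Toy factor with balanced levels
balanced <- factor(rep(c("a", "b"), 50))
res_balanced <- nzv_stats(balanced)
print(res_balanced)
```

Under these assumptions, a predictor is flagged only when both conditions hold, which is why a column such as int.discolor (freqRatio 13.2) is kept while mycelium (freqRatio 106.5) is flagged.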
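# Remove near-zero variance predictors.
# nzv$nzv is a logical vector, so it is negated with `!`; a unary minus
# would turn it into 0/-1 indices and drop only the first column.
Soybean_clean <- Soybean[, !nzv$nzv]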
# Impute missing data with mice using regression-predicted values
# ("norm.predict"), a method intended for numeric variables
imputed_data <- mice(Soybean_clean, m = 5, maxit = 5, method = "norm.predict", seed = 500)
## Warning: Type mismatch for variable(s): date
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): plant.stand
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): precip
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): temp
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): hail
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): crop.hist
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): area.dam
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): sever
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): seed.tmt
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): germ
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): plant.growth
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): leaf.halo
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): leaf.marg
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): leaf.size
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): leaf.shread
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): leaf.malf
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): leaf.mild
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): stem
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): lodging
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): stem.cankers
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): canker.lesion
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): fruiting.bodies
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): ext.decay
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): mycelium
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): int.discolor
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): sclerotia
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): fruit.pods
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): fruit.spots
## Imputation method norm.predict is not for factors with >2 levels.
## Warning: Type mismatch for variable(s): seed
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): mold.growth
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): seed.discolor
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): seed.size
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): shriveling
## Imputation method norm.predict is not for factors.
## Warning: Type mismatch for variable(s): roots
## Imputation method norm.predict is not for factors with >2 levels.
##
## iter imp variable
## 1 1 date
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## Warning in `[<-.factor`(`*tmp*`, cc, value = structure(3.10283374779887, dim =
## c(1L, : invalid factor level, NA generated
## plant.stand
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## precip
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## temp hail
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## * crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 1 2 date
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## Warning in Ops.factor(y, z$residuals): invalid factor level, NA generated
## plant.stand
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## precip
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## temp hail
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## * crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 1 3 date
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## Warning in Ops.factor(y, z$residuals): invalid factor level, NA generated
## plant.stand
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## precip
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## temp hail
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## * crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 1 4 date
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## Warning in Ops.factor(y, z$residuals): invalid factor level, NA generated
## plant.stand
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## precip
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## temp hail
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## * crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 1 5 date
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## Warning in Ops.factor(y, z$residuals): invalid factor level, NA generated
## plant.stand
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## precip
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
## Warning in Ops.ordered(y, z$residuals): invalid factor level, NA generated
## temp hail
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## * crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 2 1 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 2 2 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 2 3 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 2 4 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 2 5 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 3 1 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 3 2 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 3 3 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 3 4 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 3 5 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 4 1 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 4 2 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 4 3 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 4 4 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 4 5 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 5 1 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 5 2 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 5 3 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 5 4 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 5 5 date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## Warning: Number of logged events: 20
Soybean_imputed <- complete(imputed_data, 1)
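Since the imputation run above logged warnings about generated NAs, the completed data should be inspected before use. The following is a minimal base-R sketch of such a check; the data frame `toy` is a hypothetical stand-in for `Soybean_imputed`, and its values are invented for illustration.

```r
# Toy stand-in for a completed data set (hypothetical values)
toy <- data.frame(
  a = factor(c("x", "y", NA, "x")),
  b = factor(c("0", "1", "1", "0"))
)

# Count missing values per column and keep only columns that still have any
na_counts <- colSums(is.na(toy))
remaining <- na_counts[na_counts > 0]
print(remaining)

# TRUE only when no missing values are left anywhere in the data frame
fully_imputed <- !anyNA(toy)
print(fully_imputed)
```

Applied to the real data, `toy` would be replaced with `Soybean_imputed`; a non-empty `remaining` vector would mean the imputation did not fill every gap.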
- The approach first screens for near-zero-variance predictors with
nearZeroVar(). Three predictors are flagged (leaf.mild, mycelium, and
sclerotia) and removed, because each is dominated by a single level and
so carries little information while adding noise and complexity to the
model.
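- For the remaining missing data, mice is run with the “norm.predict”
method, which fills each missing value with a linear-regression
prediction from the other predictors. This uses the relationships
between predictors instead of dropping rows or substituting a single
summary value, so more observations are retained. However,
“norm.predict” is designed for numeric variables, and the Soybean
predictors are factors, which is why the call raises type-mismatch
warnings. A factor-appropriate method such as “logreg”, “polyreg”, or
“polr” would be a better fit for these data.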
- The first completed dataset is extracted with complete(). It should
be checked for remaining missing values before modeling, because the
“invalid factor level, NA generated” warnings suggest that some
imputations may not have been applied. Overall, the strategy combines
the elimination of low-information predictors with model-based
imputation, so that as many observations as possible remain available
for predictive modeling.
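As an illustration of imputing categorical predictors with methods suited to their type, the sketch below runs mice on a small simulated data frame. The data frame `toy` and its columns are hypothetical and are not the Soybean data. The sketch assumes that leaving `method` unset lets mice pick its type-based defaults: "logreg" for two-level factors, "polyreg" for unordered factors, and "polr" for ordered factors.

```r
library(mice)

# Hypothetical toy data with one factor column of each kind
set.seed(1)
n <- 200
toy <- data.frame(
  binary  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  nominal = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  ordinal = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                   levels = c("low", "mid", "high"), ordered = TRUE)
)

# Introduce some missing values in every column
toy$binary[sample(n, 20)]  <- NA
toy$nominal[sample(n, 20)] <- NA
toy$ordinal[sample(n, 20)] <- NA

# With `method` left unset, mice chooses an imputation model per column type
imp <- mice(toy, m = 5, maxit = 5, seed = 500, printFlag = FALSE)
print(imp$method)

# Extract the first completed data set and count any remaining NAs
completed <- complete(imp, 1)
print(colSums(is.na(completed)))
```

The same call pattern could be tried on `Soybean_clean`, though runtime and any logged events would need to be checked on the real data.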