The UC Irvine Machine Learning Repository⁶ contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. The data can be accessed via:
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.
[Figure: histograms of the nine predictors: RI, Na, Mg, Al, Si, K, Ca, Ba, Fe]
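The analysis code draws a correlation plot with corrplot to show relationships between predictors. As a complement, a small base-R sketch can list the predictor pairs whose absolute correlation exceeds a cutoff. The helper name `high_cor_pairs` and the 0.75 cutoff are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: list predictor pairs with |correlation| above a cutoff.
high_cor_pairs <- function(df, cutoff = 0.75) {
  r <- cor(df)
  # Use only the upper triangle so each pair is reported once
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, 1]],
             var2 = colnames(r)[idx[, 2]],
             r = r[idx],
             stringsAsFactors = FALSE)
}

# Toy data: y is a linear function of x, z is unrelated to both
toy <- data.frame(x = 1:10, y = 2 * (1:10) + 1, z = rep(c(1, -1), 5))
high_cor_pairs(toy)  # one row: x and y, r = 1 (up to rounding)
```

On the glass data, `high_cor_pairs(Glass2)` would list the strongly correlated pairs among the nine predictors.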
Do there appear to be any outliers in the data? Are any predictors skewed?
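The skewness values show that RI, Mg, K, Ca, Ba, and Fe have skewness beyond ±1: Mg is left-skewed, the others are right-skewed, and K (6.46) and Ba (3.37) are the most skewed. The half-normal plot of the hat values flags a few observations that stand apart from the rest, which suggests some outliers, but these points lie in the same direction as the overall trends, so they do not appear to be highly influential.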
## RI Na Mg Al Si K Ca
## 1.6027151 0.4478343 -1.1364523 0.8946104 -0.7202392 6.4600889 2.0184463
## Ba Fe
## 3.3686800 1.7298107
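The skewness values above come from e1071::skewness, whose default (type = 3) is b1 = m3 / s^3, where m3 is the third central moment and s the sample standard deviation. The base-R sketch below reproduces that definition and adds a count of points beyond the 1.5 × IQR boxplot whiskers as a simple outlier check. The helper names and toy data are hypothetical.

```r
# Sample skewness with the "type 3" definition b1 = m3 / s^3
# (s uses the n - 1 denominator), the default of e1071::skewness.
skew_type3 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m3 <- sum((x - mean(x))^3) / n
  m3 / sd(x)^3
}

# Count points outside the 1.5 * IQR boxplot whiskers for each column
count_outliers <- function(df) {
  vapply(df, function(x) length(boxplot.stats(x)$out), integer(1))
}

toy <- data.frame(a = c(1:10, 100), b = 1:11)
sapply(toy, skew_type3)  # a is strongly right-skewed, b is symmetric (0)
count_outliers(toy)      # a: 1, b: 0
```

Applied to `Glass2`, `count_outliers()` gives a per-predictor tally that can be read alongside the skewness values.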
Are there any relevant transformations of one or more predictors that might improve the classification model?
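The boxplot shows that Si is on a much larger scale than the other predictors, so a transformation such as centering and scaling may help. The Box-Cox estimates suggest candidate transformations for RI (λ = -2), Na (λ = -0.1), Al (λ = 0.5, a square root), Si (λ = 2, a square), and Ca (λ = -1.1, close to an inverse). Na's estimate lies within the 0.2 fudge tolerance of 0, so caret treats it as a log transformation. No λ could be estimated for Mg, K, Ba, and Fe because they contain zeros, and the Box-Cox transformation requires strictly positive values.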
##      lambda     ratio   skewness
## RI     -2.0  1.015075  1.602715
## Na     -0.1  1.619758  0.4478343
## Mg       NA       Inf -1.136452
## Al      0.5  12.06897  0.8946104
## Si      2.0  1.080218 -0.7202392
## K        NA       Inf  6.460089
## Ca     -1.1  2.981584  2.018446
## Ba       NA       Inf  3.36868
## Fe       NA       Inf  1.729811
## (n = 214 and fudge = 0.2 for every predictor; ratio = max/min, Inf where the minimum is 0)
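caret's BoxCoxTrans estimates λ by maximum likelihood over a grid of candidate values. To our understanding it delegates to MASS::boxcox, whose default grid runs from -2 to 2 in steps of 0.1, which would explain why RI and Si land exactly on the boundary values -2 and 2. The sketch below is a simplified base-R version of that idea for an intercept-only model, not caret's implementation; the function names are hypothetical. It returns NA for data that are not strictly positive, mirroring the NA results for Mg, K, Ba, and Fe.

```r
# Profile log-likelihood (up to a constant) of the Box-Cox transform
# for an intercept-only model.
boxcox_loglik <- function(y, lambda) {
  n <- length(y)
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum((z - mean(z))^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Grid search for lambda; NA when the data are not strictly positive,
# because the Box-Cox transform is undefined there.
estimate_boxcox_lambda <- function(y, grid = seq(-2, 2, by = 0.1)) {
  if (any(y <= 0)) return(NA_real_)
  ll <- vapply(grid, function(l) boxcox_loglik(y, l), numeric(1))
  grid[which.max(ll)]
}

y_lognormal <- exp(qnorm(ppoints(200)))  # log(y) has an exactly normal shape
estimate_boxcox_lambda(y_lognormal)      # close to 0, i.e. a log transform
estimate_boxcox_lambda(c(0, 1, 2))       # NA: the data contain a zero
```

An estimate that sits on the edge of the grid, as for RI and Si, only says the optimum lies at or beyond that edge; widening the grid would show where it actually falls.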
The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., leaf spots, mold growth). The outcome labels consist of 19 distinct classes.
Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?
0.56% of the predictors are right skewed
## Var1 Freq
## 1 2-4-d-injury 16
## 2 alternarialeaf-spot 91
## 3 anthracnose 44
## 4 bacterial-blight 20
## 5 bacterial-pustule 20
## 6 brown-spot 92
## 7 brown-stem-rot 44
## 8 charcoal-rot 20
## 9 cyst-nematode 14
## 10 diaporthe-pod-&-stem-blight 15
## 11 diaporthe-stem-canker 20
## 12 downy-mildew 20
## 13 frog-eye-leaf-spot 91
## 14 herbicide-injury 8
## 15 phyllosticta-leaf-spot 20
## 16 phytophthora-rot 88
## 17 powdery-mildew 20
## 18 purple-seed-stain 20
## 19 rhizoctonia-root-rot 20
[Figure: bar plots of the level counts for the 34 soybean predictors, plant.stand through roots]
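A degenerate categorical predictor, in the sense discussed in the chapter, has very few unique values and one dominant level. caret::nearZeroVar flags a predictor when the ratio of the most common to the second most common value exceeds `freqCut` (default 95/5) and the percentage of unique values is at or below `uniqueCut` (default 10). The base-R sketch below applies those two rules to a single vector; it is a simplified stand-in written in the spirit of nearZeroVar, not caret's code, and the function name is hypothetical.

```r
# Simplified near-zero-variance check in the spirit of caret::nearZeroVar.
nzv_check <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  x <- x[!is.na(x)]
  counts <- sort(table(x), decreasing = TRUE)
  counts <- counts[counts > 0]  # ignore empty factor levels
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else Inf
  pct_unique <- 100 * length(unique(x)) / length(x)
  list(freq_ratio = freq_ratio,
       pct_unique = pct_unique,
       nzv = freq_ratio > freq_cut && pct_unique <= unique_cut)
}

x1 <- factor(c(rep("0", 98), rep("1", 2)))  # one dominant level
x2 <- factor(rep(c("a", "b"), 50))           # balanced levels
nzv_check(x1)$nzv  # TRUE: ratio 49, 2% unique
nzv_check(x2)$nzv  # FALSE: ratio 1
```

With caret loaded, `nearZeroVar(Soybean, saveMetrics = TRUE)` reports the same two metrics for every column of the soybean data.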
Roughly 18% of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?
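There appear to be several variables that are missing more data than others, including hail, sever, and lodging (17.7% missing each), germ (16.4%), and seed (13.5%); seed.tmt is also missing 17.7%. Only 562 of the 683 rows are complete cases, so 121 rows (17.7%) have at least one missing value.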
## Mode FALSE TRUE
## logical 121 562
## Class date plant.stand precip temp
## 0.0 0.1 5.3 5.6 4.4
## hail crop.hist area.dam sever seed.tmt
## 17.7 2.3 0.1 17.7 17.7
## germ plant.growth leaves leaf.halo leaf.marg
## 16.4 2.3 0.0 12.3 12.3
## leaf.size leaf.shread leaf.malf leaf.mild stem
## 12.3 14.6 12.3 15.8 2.3
## lodging stem.cankers canker.lesion fruiting.bodies ext.decay
## 17.7 5.6 5.6 15.5 5.6
## mycelium int.discolor sclerotia fruit.pods fruit.spots
## 5.6 5.6 5.6 12.3 15.5
## seed mold.growth seed.discolor seed.size shriveling
## 13.5 13.5 15.5 13.5 15.5
## roots
## 4.5
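To check whether missingness is related to the classes, one option is to compute the share of incomplete rows within each class. The sketch below assumes a row counts as incomplete if any predictor is NA; the helper name and toy data are hypothetical.

```r
# Share of rows with at least one missing predictor, computed per class
missing_by_class <- function(df, class_col = "Class") {
  predictors <- setdiff(names(df), class_col)
  incomplete <- !complete.cases(df[predictors])
  res <- tapply(incomplete, df[[class_col]], mean)
  setNames(as.vector(res), names(res))
}

toy <- data.frame(
  Class = factor(c("a", "a", "b", "b")),
  x = c(1, NA, 2, 3),
  y = c(NA, NA, 1, 2)
)
missing_by_class(toy)  # a = 1, b = 0
```

On the soybean data, `sort(missing_by_class(Soybean), decreasing = TRUE)` would show which classes account for the incomplete rows.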
Develop a strategy for handling missing data, either by eliminating predictors or imputation.
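I first fit a GLM to all the data with Class as the response. I then ran stepwise selection with step(), which dropped sclerotia and stem, and refit the selected model after removing five predictors with a large share of missing values (seed, hail, lodging, germ, and sever). These fits should be read with caution. With family = binomial, glm treats Class as a two-level outcome (the first level of Class among the rows used versus all other classes) rather than as 19 classes. Coefficients on the order of 10^15, residual deviances larger than the null deviance, and Fisher scoring stopping at 25 iterations all indicate that the fits did not converge to meaningful estimates. In addition, glm dropped the 121 incomplete rows in every model, so removing the five predictors did not bring any observations back.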
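With family = binomial, glm treats a factor response as binary: the first level is a "failure" and every other level a "success". An intercept-only fit on a made-up three-level factor shows this, because its fitted probability equals the share of observations that are not in the first level. For a 19-class outcome, a multinomial model (for example nnet::multinom) would be a more appropriate choice.

```r
# A three-level factor response: "a" is the first level
y <- factor(c("a", "b", "c", "a", "b", "c"))

# The binomial glm models P(y != "a"), collapsing "b" and "c" together
fit <- glm(y ~ 1, family = binomial)
p_not_a <- unname(fitted(fit)[1])
p_not_a  # about 0.667: 4 of the 6 observations are not "a"
```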
##
## Call:
## glm(formula = Class ~ ., family = binomial(link = "logit"), data = Soybean)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.49 0.00 0.00 0.00 8.49
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.587e+15 4.444e+07 103216018 <2e-16 ***
## date1 4.698e+14 2.084e+07 22541307 <2e-16 ***
## date2 -5.824e+14 2.064e+07 -28210137 <2e-16 ***
## date3 -2.571e+15 2.159e+07 -119074437 <2e-16 ***
## date4 -2.728e+15 2.139e+07 -127576208 <2e-16 ***
## date5 -2.905e+15 2.132e+07 -136262644 <2e-16 ***
## date6 -3.451e+15 2.204e+07 -156580853 <2e-16 ***
## plant.stand.L 1.939e+14 5.819e+06 33320092 <2e-16 ***
## precip.L -6.197e+14 1.333e+07 -46489783 <2e-16 ***
## precip.Q 2.031e+14 9.404e+06 21596947 <2e-16 ***
## temp.L 1.098e+14 1.124e+07 9776535 <2e-16 ***
## temp.Q -1.178e+14 6.576e+06 -17906974 <2e-16 ***
## hail1 1.997e+14 8.948e+06 22316290 <2e-16 ***
## crop.hist1 -7.596e+12 1.262e+07 -602060 <2e-16 ***
## crop.hist2 3.180e+13 1.219e+07 2609630 <2e-16 ***
## crop.hist3 -4.568e+13 1.236e+07 -3696906 <2e-16 ***
## area.dam1 -2.714e+13 1.050e+07 -2585694 <2e-16 ***
## area.dam2 -9.215e+12 9.817e+06 -938632 <2e-16 ***
## area.dam3 3.248e+13 9.986e+06 3252995 <2e-16 ***
## sever1 2.562e+14 8.645e+06 29634177 <2e-16 ***
## sever2 -9.649e+13 1.714e+07 -5630662 <2e-16 ***
## seed.tmt1 -5.915e+13 7.078e+06 -8357845 <2e-16 ***
## seed.tmt2 1.556e+14 1.362e+07 11430316 <2e-16 ***
## germ.L -1.470e+14 6.239e+06 -23556416 <2e-16 ***
## germ.Q 1.409e+14 5.330e+06 26430840 <2e-16 ***
## plant.growth1 8.255e+14 1.431e+07 57678943 <2e-16 ***
## leaves1 7.164e+14 1.890e+07 37910393 <2e-16 ***
## leaf.halo1 -3.932e+15 5.429e+07 -72420358 <2e-16 ***
## leaf.halo2 -1.748e+15 5.077e+07 -34430192 <2e-16 ***
## leaf.marg1 6.142e+14 3.055e+07 20101914 <2e-16 ***
## leaf.marg2 NA NA NA NA
## leaf.size.L -2.097e+15 3.676e+07 -57048417 <2e-16 ***
## leaf.size.Q 1.062e+15 2.141e+07 49614233 <2e-16 ***
## leaf.shread1 6.736e+13 1.041e+07 6473496 <2e-16 ***
## leaf.malf1 4.090e+14 1.726e+07 23693055 <2e-16 ***
## leaf.mild1 3.263e+14 2.905e+07 11229260 <2e-16 ***
## leaf.mild2 2.877e+15 3.803e+07 75649461 <2e-16 ***
## stem1 -6.878e+14 2.816e+07 -24427649 <2e-16 ***
## lodging1 -2.302e+15 1.567e+07 -146835250 <2e-16 ***
## stem.cankers1 2.495e+15 5.986e+07 41684437 <2e-16 ***
## stem.cankers2 3.674e+15 4.992e+07 73590961 <2e-16 ***
## stem.cankers3 5.968e+15 4.087e+07 145998348 <2e-16 ***
## canker.lesion1 -1.218e+15 2.660e+07 -45792973 <2e-16 ***
## canker.lesion2 -2.927e+14 2.865e+07 -10217274 <2e-16 ***
## canker.lesion3 5.936e+15 2.667e+07 222585592 <2e-16 ***
## fruiting.bodies1 -4.239e+13 1.969e+07 -2152378 <2e-16 ***
## ext.decay1 -8.834e+14 1.386e+07 -63746398 <2e-16 ***
## mycelium1 4.920e+15 3.278e+07 150078192 <2e-16 ***
## int.discolor1 8.692e+14 3.487e+07 24927370 <2e-16 ***
## int.discolor2 -1.999e+15 4.522e+07 -44202762 <2e-16 ***
## sclerotia1 NA NA NA NA
## fruit.pods1 3.838e+15 5.350e+07 71745375 <2e-16 ***
## fruit.pods3 4.048e+14 4.784e+07 8460200 <2e-16 ***
## fruit.spots1 -2.924e+15 5.096e+07 -57375041 <2e-16 ***
## fruit.spots2 -4.669e+15 5.712e+07 -81733023 <2e-16 ***
## fruit.spots4 -2.765e+15 2.605e+07 -106154965 <2e-16 ***
## seed1 -2.213e+15 3.263e+07 -67821359 <2e-16 ***
## mold.growth1 2.342e+15 2.892e+07 81006454 <2e-16 ***
## seed.discolor1 -8.474e+14 2.939e+07 -28837763 <2e-16 ***
## seed.size1 7.853e+14 2.713e+07 28950407 <2e-16 ***
## shriveling1 -8.927e+14 3.029e+07 -29470681 <2e-16 ***
## roots1 5.119e+14 3.151e+07 16243708 <2e-16 ***
## roots2 1.207e+15 7.446e+07 16203584 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 497.76 on 561 degrees of freedom
## Residual deviance: 1585.92 on 501 degrees of freedom
## (121 observations deleted due to missingness)
## AIC: 1707.9
##
## Number of Fisher Scoring iterations: 25
## Start: AIC=1707.92
## Class ~ date + plant.stand + precip + temp + hail + crop.hist +
## area.dam + sever + seed.tmt + germ + plant.growth + leaves +
## leaf.halo + leaf.marg + leaf.size + leaf.shread + leaf.malf +
## leaf.mild + stem + lodging + stem.cankers + canker.lesion +
## fruiting.bodies + ext.decay + mycelium + int.discolor + sclerotia +
## fruit.pods + fruit.spots + seed + mold.growth + seed.discolor +
## seed.size + shriveling + roots
##
##
## Step: AIC=1707.92
## Class ~ date + plant.stand + precip + temp + hail + crop.hist +
## area.dam + sever + seed.tmt + germ + plant.growth + leaves +
## leaf.halo + leaf.marg + leaf.size + leaf.shread + leaf.malf +
## leaf.mild + stem + lodging + stem.cankers + canker.lesion +
## fruiting.bodies + ext.decay + mycelium + int.discolor + fruit.pods +
## fruit.spots + seed + mold.growth + seed.discolor + seed.size +
## shriveling + roots
##
##
## Step: AIC=5456.46
## Class ~ date + plant.stand + precip + temp + hail + crop.hist +
## area.dam + sever + seed.tmt + germ + plant.growth + leaves +
## leaf.halo + leaf.marg + leaf.size + leaf.shread + leaf.malf +
## leaf.mild + lodging + stem.cankers + canker.lesion + fruiting.bodies +
## ext.decay + mycelium + int.discolor + fruit.pods + fruit.spots +
## seed + mold.growth + seed.discolor + seed.size + shriveling +
## roots
##
## Call:
## glm(formula = Class ~ date + plant.stand + precip + temp + hail +
## crop.hist + area.dam + sever + seed.tmt + germ + plant.growth +
## leaves + leaf.halo + leaf.marg + leaf.size + leaf.shread +
## leaf.malf + leaf.mild + lodging + stem.cankers + canker.lesion +
## fruiting.bodies + ext.decay + mycelium + int.discolor + fruit.pods +
## fruit.spots + seed + mold.growth + seed.discolor + seed.size +
## shriveling + roots, family = binomial(link = "logit"), data = Soybean)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.49 0.00 0.00 0.00 8.49
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.108e+25 4.711e+17 -44754892 <2e-16 ***
## date1 9.310e+14 2.077e+07 44825994 <2e-16 ***
## date2 5.099e+14 2.054e+07 24823124 <2e-16 ***
## date3 -1.415e+15 2.131e+07 -66393880 <2e-16 ***
## date4 -1.652e+15 2.114e+07 -78134557 <2e-16 ***
## date5 -2.072e+15 2.103e+07 -98507165 <2e-16 ***
## date6 -2.098e+15 2.121e+07 -98918350 <2e-16 ***
## plant.stand.L -2.768e+13 5.756e+06 -4808523 <2e-16 ***
## precip.L -6.255e+14 1.328e+07 -47108460 <2e-16 ***
## precip.Q 6.560e+14 9.112e+06 71998161 <2e-16 ***
## temp.L -2.703e+14 1.118e+07 -24185493 <2e-16 ***
## temp.Q 2.548e+14 6.525e+06 39055351 <2e-16 ***
## hail1 -1.808e+14 8.311e+06 -21749604 <2e-16 ***
## crop.hist1 3.238e+14 1.248e+07 25942195 <2e-16 ***
## crop.hist2 4.045e+14 1.218e+07 33215392 <2e-16 ***
## crop.hist3 2.668e+14 1.236e+07 21591583 <2e-16 ***
## area.dam1 -2.814e+14 1.025e+07 -27454538 <2e-16 ***
## area.dam2 -1.012e+14 9.805e+06 -10316617 <2e-16 ***
## area.dam3 -1.086e+14 9.981e+06 -10879953 <2e-16 ***
## sever1 3.931e+13 8.629e+06 4555366 <2e-16 ***
## sever2 -2.019e+14 1.688e+07 -11958957 <2e-16 ***
## seed.tmt1 4.030e+13 6.980e+06 5774094 <2e-16 ***
## seed.tmt2 3.278e+14 1.225e+07 26760361 <2e-16 ***
## germ.L 8.355e+13 6.154e+06 13576228 <2e-16 ***
## germ.Q 1.370e+14 5.305e+06 25824003 <2e-16 ***
## plant.growth1 -1.508e+14 1.410e+07 -10695205 <2e-16 ***
## leaves1 5.354e+14 1.856e+07 28850337 <2e-16 ***
## leaf.halo1 2.108e+25 4.711e+17 44754892 <2e-16 ***
## leaf.halo2 2.108e+25 4.711e+17 44754892 <2e-16 ***
## leaf.marg1 1.446e+15 3.053e+07 47370581 <2e-16 ***
## leaf.marg2 2.108e+25 4.711e+17 44754892 <2e-16 ***
## leaf.size.L -2.418e+15 3.668e+07 -65924585 <2e-16 ***
## leaf.size.Q 1.249e+15 2.139e+07 58413402 <2e-16 ***
## leaf.shread1 -8.110e+14 1.040e+07 -77989691 <2e-16 ***
## leaf.malf1 5.615e+14 1.717e+07 32713365 <2e-16 ***
## leaf.mild1 -7.721e+14 2.742e+07 -28159615 <2e-16 ***
## leaf.mild2 3.775e+15 3.797e+07 99409041 <2e-16 ***
## lodging1 -2.484e+15 1.516e+07 -163821580 <2e-16 ***
## stem.cankers1 8.707e+14 5.665e+07 15369501 <2e-16 ***
## stem.cankers2 2.496e+15 4.654e+07 53624067 <2e-16 ***
## stem.cankers3 4.396e+15 3.523e+07 124777878 <2e-16 ***
## canker.lesion1 -1.466e+15 2.705e+07 -54196435 <2e-16 ***
## canker.lesion2 -2.651e+15 2.897e+07 -91513932 <2e-16 ***
## canker.lesion3 4.130e+15 2.391e+07 172712651 <2e-16 ***
## fruiting.bodies1 -2.779e+15 1.886e+07 -147369363 <2e-16 ***
## ext.decay1 1.414e+15 1.381e+07 102383326 <2e-16 ***
## mycelium1 1.244e+15 3.273e+07 38007983 <2e-16 ***
## int.discolor1 1.466e+15 2.707e+07 54179598 <2e-16 ***
## int.discolor2 -1.010e+15 4.154e+07 -24320129 <2e-16 ***
## fruit.pods1 4.356e+14 5.346e+07 8148218 <2e-16 ***
## fruit.pods3 1.944e+15 4.725e+07 41146045 <2e-16 ***
## fruit.spots1 2.448e+14 5.067e+07 4831484 <2e-16 ***
## fruit.spots2 1.449e+15 5.684e+07 25486935 <2e-16 ***
## fruit.spots4 -1.682e+15 2.578e+07 -65229236 <2e-16 ***
## seed1 -2.149e+15 3.263e+07 -65871374 <2e-16 ***
## mold.growth1 -4.877e+14 2.891e+07 -16872230 <2e-16 ***
## seed.discolor1 -7.207e+14 2.909e+07 -24773362 <2e-16 ***
## seed.size1 2.413e+15 2.711e+07 89040859 <2e-16 ***
## shriveling1 -5.137e+14 3.026e+07 -16979888 <2e-16 ***
## roots1 -3.168e+15 3.149e+07 -100599782 <2e-16 ***
## roots2 -7.456e+15 3.218e+07 -231722717 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 497.76 on 561 degrees of freedom
## Residual deviance: 5334.46 on 501 degrees of freedom
## (121 observations deleted due to missingness)
## AIC: 5456.5
##
## Number of Fisher Scoring iterations: 25
##
## Call:
## glm(formula = Class ~ date + plant.stand + precip + temp + crop.hist +
## area.dam + seed.tmt + plant.growth + leaves + leaf.halo +
## leaf.marg + leaf.size + leaf.shread + leaf.malf + leaf.mild +
## stem.cankers + canker.lesion + fruiting.bodies + ext.decay +
## mycelium + int.discolor + fruit.pods + fruit.spots + mold.growth +
## seed.discolor + seed.size + shriveling + roots, family = binomial(link = "logit"),
## data = Soybean)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.49 0.00 0.00 0.00 8.49
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.985e+15 4.326e+07 161476810 <2e-16 ***
## date1 -3.853e+14 2.045e+07 -18838745 <2e-16 ***
## date2 -3.989e+14 2.010e+07 -19843526 <2e-16 ***
## date3 -2.593e+15 2.088e+07 -124208126 <2e-16 ***
## date4 -2.912e+15 2.078e+07 -140151781 <2e-16 ***
## date5 -3.574e+15 2.080e+07 -171845213 <2e-16 ***
## date6 -3.837e+15 2.148e+07 -178609104 <2e-16 ***
## plant.stand.L 1.204e+14 4.984e+06 24146689 <2e-16 ***
## precip.L -1.442e+15 1.293e+07 -111457853 <2e-16 ***
## precip.Q 9.018e+14 9.006e+06 100128714 <2e-16 ***
## temp.L 5.824e+14 1.092e+07 53333663 <2e-16 ***
## temp.Q -4.461e+14 6.369e+06 -70035484 <2e-16 ***
## crop.hist1 2.266e+14 1.224e+07 18515357 <2e-16 ***
## crop.hist2 8.110e+13 1.190e+07 6813876 <2e-16 ***
## crop.hist3 1.887e+14 1.209e+07 15607118 <2e-16 ***
## area.dam1 -2.472e+14 1.039e+07 -23802017 <2e-16 ***
## area.dam2 -3.316e+14 9.739e+06 -34046885 <2e-16 ***
## area.dam3 -1.073e+14 9.904e+06 -10830887 <2e-16 ***
## seed.tmt1 8.756e+13 7.004e+06 12501326 <2e-16 ***
## seed.tmt2 4.288e+14 1.346e+07 31860834 <2e-16 ***
## plant.growth1 2.000e+14 1.404e+07 14240624 <2e-16 ***
## leaves1 -1.137e+15 1.816e+07 -62633737 <2e-16 ***
## leaf.halo1 -3.994e+15 5.388e+07 -74132496 <2e-16 ***
## leaf.halo2 -1.488e+15 5.059e+07 -29415040 <2e-16 ***
## leaf.marg1 1.866e+15 3.004e+07 62126624 <2e-16 ***
## leaf.marg2 NA NA NA NA
## leaf.size.L -1.803e+15 3.660e+07 -49265812 <2e-16 ***
## leaf.size.Q 1.219e+15 2.132e+07 57159769 <2e-16 ***
## leaf.shread1 1.305e+14 1.033e+07 12631272 <2e-16 ***
## leaf.malf1 1.965e+15 1.718e+07 114381108 <2e-16 ***
## leaf.mild1 1.002e+15 2.745e+07 36514185 <2e-16 ***
## leaf.mild2 3.733e+15 3.256e+07 114648054 <2e-16 ***
## stem.cankers1 -2.897e+14 5.551e+07 -5219409 <2e-16 ***
## stem.cankers2 -6.315e+13 4.547e+07 -1388810 <2e-16 ***
## stem.cankers3 2.918e+15 3.458e+07 84391062 <2e-16 ***
## canker.lesion1 -3.807e+14 2.645e+07 -14393513 <2e-16 ***
## canker.lesion2 -3.296e+14 2.854e+07 -11548480 <2e-16 ***
## canker.lesion3 1.995e+15 2.197e+07 90810331 <2e-16 ***
## fruiting.bodies1 5.014e+14 1.824e+07 27480242 <2e-16 ***
## ext.decay1 -5.624e+14 1.343e+07 -41865597 <2e-16 ***
## mycelium1 9.117e+14 3.221e+07 28300022 <2e-16 ***
## int.discolor1 1.292e+14 2.568e+07 5032637 <2e-16 ***
## int.discolor2 -2.615e+15 3.955e+07 -66123102 <2e-16 ***
## fruit.pods1 3.058e+15 5.280e+07 57922690 <2e-16 ***
## fruit.pods3 1.780e+15 4.598e+07 38701322 <2e-16 ***
## fruit.spots1 -7.670e+14 5.033e+07 -15240563 <2e-16 ***
## fruit.spots2 -4.478e+15 5.606e+07 -79885210 <2e-16 ***
## fruit.spots4 -1.487e+15 2.477e+07 -60047093 <2e-16 ***
## mold.growth1 8.227e+14 2.477e+07 33216518 <2e-16 ***
## seed.discolor1 -1.456e+15 1.575e+07 -92469149 <2e-16 ***
## seed.size1 -1.151e+14 2.644e+07 -4353207 <2e-16 ***
## shriveling1 3.768e+14 2.906e+07 12963748 <2e-16 ***
## roots1 -9.219e+14 3.114e+07 -29609711 <2e-16 ***
## roots2 8.166e+13 7.413e+07 1101647 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 497.76 on 561 degrees of freedom
## Residual deviance: 1874.27 on 509 degrees of freedom
## (121 observations deleted due to missingness)
## AIC: 1980.3
##
## Number of Fisher Scoring iterations: 25
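An alternative to removing predictors after model fitting is to handle missing values before modeling: drop predictors whose missing fraction exceeds a threshold, then impute the rest. The sketch below uses the most frequent level for factors and the median for numeric columns. The threshold, the function name, and mode imputation itself are assumptions for illustration, not the method used above; mode imputation ignores relationships between predictors, so a model-based approach such as K-nearest-neighbor imputation may be worth comparing.

```r
# Drop predictors with too much missing data, then impute what remains:
# most frequent level for factors/characters, median for numeric columns.
handle_missing <- function(df, max_missing = 0.15, keep = "Class") {
  miss_frac <- colMeans(is.na(df))
  too_sparse <- setdiff(names(miss_frac)[miss_frac > max_missing], keep)
  df <- df[, setdiff(names(df), too_sparse), drop = FALSE]
  for (col in setdiff(names(df), keep)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      tab <- table(x)  # NA values are not counted
      x[is.na(x)] <- names(tab)[which.max(tab)]
    }
    df[[col]] <- x
  }
  df
}

toy <- data.frame(
  Class = factor(c("a", "a", "b", "b")),
  f1 = factor(c("x", "x", NA, "y")),
  f2 = factor(c(NA, NA, "p", "q")),
  n1 = c(1, NA, 3, 5)
)
handle_missing(toy, max_missing = 0.3)  # drops f2, fills f1 with "x", n1 with 3
```

For example, `handle_missing(Soybean, max_missing = 0.15)` would drop the soybean predictors missing more than 15% of their values and keep all 683 rows for modeling.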
Code used in analysis
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE
)
#knitr::opts_chunk$set(echo = TRUE)
require(knitr)
library(ggplot2)
library(tidyr)
library(MASS)
library(psych)
library(kableExtra)
library(dplyr)
library(faraway)
library(gridExtra)
library(reshape2)
library(leaps)
library(pROC)
library(caret)
library(naniar)
library(pander)
library(mlbench)
library(e1071)
data(Glass)
str(Glass)
plot(Glass)
Glass2 <-subset(Glass, select = -c(Type))
GlassCor<-cor(Glass2)
corrplot::corrplot(GlassCor)
par(mfrow=c(3,3))
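# Label each histogram with its predictor name; invisible() stops the list of
# histogram objects from being printed into the report
invisible(lapply(names(Glass2), function(v) hist(Glass2[[v]], main = v, xlab = v)))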
skewvalues<-sapply(Glass2,skewness)
skewvalues
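# With family = binomial, glm models Type as binary (first level vs. all others),
# so this fit is used only to obtain hat values for a rough leverage check
m1<-glm(Type~.,data=Glass,family=binomial(link="logit"))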
#summary(m1)
halfnorm(hatvalues(m1))
par(mfrow = c(2,2))
plot(Glass2)
boxplot(Glass)
sapply(Glass2,BoxCoxTrans)
data(Soybean, package = 'mlbench')
#names(Soybean)
#str(Soybean)
boxplot(Soybean[3:36], las=2)
ggplot(Soybean, aes(Class))+
geom_bar()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
data.frame(table(Soybean$Class))
par(mfrow = c(6,6),mar=c(2,2,2,2))
plot(Class~., data = Soybean, las = 1)
par(mfrow = c(6,6),mar=c(2,2,2,2))
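# Title each bar plot with its predictor name and suppress the printed bar midpoints
invisible(lapply(names(Soybean)[3:36], function(v) plot(Soybean[[v]], main = v)))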
summary(complete.cases(Soybean))
vis_miss(Soybean)
gg_miss_upset(Soybean)
sapply(Soybean, function(x) round(sum(is.na(x))/nrow(Soybean)*100,1))
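# Note: binomial treats Class as binary (first observed level vs. all others);
# a 19-class outcome would need a multinomial model
lmod<-glm(Class~.,data=Soybean,family=binomial(link="logit"))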
summary(lmod)
slmod<-step(lmod)
summary(slmod)
lmod2<-update(slmod, .~.-seed-hail-lodging-germ-sever)
summary(lmod2)