3.1

The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. The data can be accessed via data(Glass) from the mlbench package; str(Glass) gives:

## 'data.frame':    214 obs. of  10 variables:
##  $ RI  : num  1.52 1.52 1.52 1.52 1.52 ...
##  $ Na  : num  13.6 13.9 13.5 13.2 13.3 ...
##  $ Mg  : num  4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
##  $ Al  : num  1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
##  $ Si  : num  71.8 72.7 73 72.6 73.1 ...
##  $ K   : num  0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
##  $ Ca  : num  8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
##  $ Ba  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Fe  : num  0 0 0 0 0 0.26 0 0 0 0.11 ...
##  $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...

A

Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.

[Figure: histograms of the nine predictors RI, Na, Mg, Al, Si, K, Ca, Ba and Fe]
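
The relationships between predictors asked about in part A can also be summarized numerically. The following is a minimal sketch, not part of the original analysis: `high_cor_pairs` is a hypothetical helper, the 0.5 cutoff is an arbitrary choice, and the toy data frame only stands in for the Glass predictors.

```r
# Hypothetical helper: list predictor pairs whose absolute Pearson
# correlation exceeds a cutoff, strongest pair first.
high_cor_pairs <- function(df, cutoff = 0.5) {
  cm <- cor(df)
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(upper.tri(cm) & abs(cm) > cutoff, arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$r)), ]
}

# Toy data (an assumption for illustration): b is a noisy copy of a,
# c is independent of both.
set.seed(1)
x <- rnorm(100)
toy <- data.frame(a = x, b = x + rnorm(100, sd = 0.1), c = rnorm(100))
pairs_found <- high_cor_pairs(toy, cutoff = 0.5)
print(pairs_found)
```

With the objects from the appendix, the same call would be `high_cor_pairs(Glass2)`, which lists the pairs that stand out in the correlation plot.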

B

Do there appear to be any outliers in the data? Are any predictors skewed?

The skewness values show that RI, Mg, K, Ca, Ba and Fe have skewness greater than 1 in absolute value; Mg is left skewed and the others are right skewed. The half-normal plot shows some outliers, but they do not seem to have high leverage, as they lie in the same direction as the trend.

##         RI         Na         Mg         Al         Si          K         Ca 
##  1.6027151  0.4478343 -1.1364523  0.8946104 -0.7202392  6.4600889  2.0184463 
##         Ba         Fe 
##  3.3686800  1.7298107
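
For reference, the skewness statistic can be computed by hand. This is a minimal sketch: `sample_skewness` is a hypothetical helper, and its form (third central moment over the cubed sample standard deviation) is assumed to match the default of e1071::skewness(); other definitions differ by small-sample correction factors. The two toy vectors are made up for illustration.

```r
# Hypothetical helper: sample skewness as the third central moment
# divided by the cubed sample standard deviation.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

symmetric  <- c(-2, -1, 0, 1, 2)
right_tail <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 10)  # mostly zeros, similar in shape to Ba or Fe

skew_symmetric  <- sample_skewness(symmetric)   # zero for symmetric data
skew_right_tail <- sample_skewness(right_tail)  # positive for a long right tail
print(c(symmetric = skew_symmetric, right_tail = skew_right_tail))
```

Values near zero indicate a roughly symmetric distribution, while large positive values, such as those reported for K and Ba, indicate a long right tail.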

C

Are there any relevant transformations of one or more predictors that might improve the classification model?

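The boxplots show that Si is on a different scale than the other variables, so a transformation may help. The Box-Cox output gives candidate lambda values for RI (-2), Na (-0.1), Al (0.5), Si (2) and Ca (-1.1). No lambda is estimated (NA) for Mg, K, Ba and Fe, most likely because these predictors contain zeros (visible for Ba and Fe in the str() output above) and the Box-Cox transformation requires strictly positive values.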

##          RI        Na        Mg        Al        Si         K         Ca       
## lambda   -2        -0.1      NA        0.5       2          NA        -1.1     
## fudge    0.2       0.2       Numeric,6 0.2       0.2        Numeric,6 0.2      
## n        214       214       Inf       214       214        Inf       214      
## summary  Numeric,6 Numeric,6 214       Numeric,6 Numeric,6  214       Numeric,6
## ratio    1.015075  1.619758  0.2       12.06897  1.080218   0.2       2.981584 
## skewness 1.602715  0.4478343 -1.136452 0.8946104 -0.7202392 6.460089  2.018446 
##          Ba        Fe       
## lambda   NA        NA       
## fudge    Numeric,6 Numeric,6
## n        Inf       Inf      
## summary  214       214      
## ratio    0.2       0.2      
## skewness 3.36868   1.729811
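
To make the lambda values concrete, the Box-Cox family can be written out directly. This is a minimal sketch under stated assumptions: `box_cox` is a hypothetical helper written for illustration, not the caret implementation, and the input vector is made up.

```r
# Hypothetical helper: the Box-Cox transformation for a given lambda.
# lambda = 0 is the log transform; other values are scaled power transforms.
box_cox <- function(x, lambda) {
  if (any(x <= 0)) stop("Box-Cox requires strictly positive values")
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 2, 4, 8, 16)              # made-up right-skewed values
identity_like <- box_cox(x, 1)      # lambda = 1 only shifts the data by -1
log_like      <- box_cox(x, 0)      # lambda = 0 is log(x)
inverse_like  <- box_cox(x, -1)     # lambda = -1 is 1 - 1/x

# A predictor containing zeros cannot be transformed this way
zero_result <- tryCatch(box_cox(c(0, 1, 2), 0.5), error = function(e) "error")
print(zero_result)
```

In caret, the estimated transformation can be applied by calling predict() on the BoxCoxTrans object together with the original values, so a helper like this is only needed to see what each lambda does.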

3.2

The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., leaf spots, mold growth). The outcome labels consist of 19 distinct classes.

A

Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?

0.56% of the predictors are right skewed

##                           Var1 Freq
## 1                 2-4-d-injury   16
## 2          alternarialeaf-spot   91
## 3                  anthracnose   44
## 4             bacterial-blight   20
## 5            bacterial-pustule   20
## 6                   brown-spot   92
## 7               brown-stem-rot   44
## 8                 charcoal-rot   20
## 9                cyst-nematode   14
## 10 diaporthe-pod-&-stem-blight   15
## 11       diaporthe-stem-canker   20
## 12                downy-mildew   20
## 13          frog-eye-leaf-spot   91
## 14            herbicide-injury    8
## 15      phyllosticta-leaf-spot   20
## 16            phytophthora-rot   88
## 17              powdery-mildew   20
## 18           purple-seed-stain   20
## 19        rhizoctonia-root-rot   20

[Figure: bar charts of the frequency distributions of the 34 categorical predictors, plant.stand through roots]
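
The degeneracy question in part A can be checked numerically with the two rules of thumb for near-zero variance predictors: a small share of unique values relative to the sample size (about 10%) and a large ratio of the most common to the second most common value (about 20). The sketch below is an illustration, not the original analysis; `degenerate_summary`, its cutoffs and the toy data are assumptions.

```r
# Hypothetical helper: frequency ratio and percent unique values per column,
# with a flag for columns that fail both rules of thumb.
degenerate_summary <- function(df, freq_cut = 20, unique_cut = 10) {
  stats <- lapply(df, function(x) {
    counts <- sort(table(x), decreasing = TRUE)  # table() drops NA by default
    counts <- counts[counts > 0]                 # ignore unused factor levels
    freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else Inf
    pct_unique <- 100 * length(counts) / length(x)
    c(freq_ratio = freq_ratio, pct_unique = pct_unique)
  })
  out <- as.data.frame(do.call(rbind, stats))
  out$degenerate <- out$freq_ratio > freq_cut & out$pct_unique < unique_cut
  out
}

# Toy data (an assumption): one balanced factor, one dominated by a single level
toy <- data.frame(
  balanced = factor(rep(c("a", "b"), each = 50)),
  lopsided = factor(c(rep("no", 98), rep("yes", 2)))
)
res <- degenerate_summary(toy)
print(res)
```

On the real data the call would be `degenerate_summary(Soybean[, -1])`. caret::nearZeroVar() applies the same idea and is the more robust choice in practice.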

B

Roughly 18% of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?

Several variables appear to be missing more data than others: hail, sever, seed.tmt and lodging are each missing 17.7% of their values, germ 16.4%, and seed 13.5%. Only 562 of the 683 rows are complete cases.

##    Mode   FALSE    TRUE 
## logical     121     562

##           Class            date     plant.stand          precip            temp 
##             0.0             0.1             5.3             5.6             4.4 
##            hail       crop.hist        area.dam           sever        seed.tmt 
##            17.7             2.3             0.1            17.7            17.7 
##            germ    plant.growth          leaves       leaf.halo       leaf.marg 
##            16.4             2.3             0.0            12.3            12.3 
##       leaf.size     leaf.shread       leaf.malf       leaf.mild            stem 
##            12.3            14.6            12.3            15.8             2.3 
##         lodging    stem.cankers   canker.lesion fruiting.bodies       ext.decay 
##            17.7             5.6             5.6            15.5             5.6 
##        mycelium    int.discolor       sclerotia      fruit.pods     fruit.spots 
##             5.6             5.6             5.6            12.3            15.5 
##            seed     mold.growth   seed.discolor       seed.size      shriveling 
##            13.5            13.5            15.5            13.5            15.5 
##           roots 
##             4.5
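
Whether the missingness is related to the classes can be checked by counting incomplete rows within each class. The following is a minimal sketch with a hypothetical helper, `missing_by_class`, demonstrated on made-up toy data rather than the Soybean data.

```r
# Hypothetical helper: number and percentage of rows with at least one NA,
# broken down by the levels of the class column.
missing_by_class <- function(df, class_col) {
  cls <- factor(df[[class_col]])
  incomplete <- !complete.cases(df)
  n <- as.vector(table(cls))
  n_incomplete <- as.vector(tapply(incomplete, cls, sum))
  data.frame(
    class = levels(cls),
    n = n,
    n_incomplete = n_incomplete,
    pct_incomplete = round(100 * n_incomplete / n, 1),
    stringsAsFactors = FALSE
  )
}

# Toy data (an assumption): all missing values fall in class "b"
toy <- data.frame(
  Class = c("a", "a", "a", "b", "b", "b"),
  x1 = c(1, 2, 3, NA, NA, 6),
  x2 = c("u", "v", "u", NA, "v", "u")
)
res <- missing_by_class(toy, "Class")
print(res)
```

Calling `missing_by_class(Soybean, "Class")` would show whether the 121 incomplete rows are spread across all 19 classes or concentrated in a few of them.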

C

Develop a strategy for handling missing data, either by eliminating predictors or imputation.

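I first ran a GLM on all the data with Class as the response. I then ran stepwise selection with step() and refit the model after removing five predictors with substantial missing data (seed, hail, lodging, germ and sever). Note that glm() with a binomial family models the first level of Class against all other levels combined, and each fit below stops at the default limit of 25 Fisher scoring iterations with coefficients on the order of 1e15, so the estimates indicate a failure to converge and should not be interpreted.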

## 
## Call:
## glm(formula = Class ~ ., family = binomial(link = "logit"), data = Soybean)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
##  -8.49    0.00    0.00    0.00    8.49  
## 
## Coefficients: (2 not defined because of singularities)
##                    Estimate Std. Error    z value Pr(>|z|)    
## (Intercept)       4.587e+15  4.444e+07  103216018   <2e-16 ***
## date1             4.698e+14  2.084e+07   22541307   <2e-16 ***
## date2            -5.824e+14  2.064e+07  -28210137   <2e-16 ***
## date3            -2.571e+15  2.159e+07 -119074437   <2e-16 ***
## date4            -2.728e+15  2.139e+07 -127576208   <2e-16 ***
## date5            -2.905e+15  2.132e+07 -136262644   <2e-16 ***
## date6            -3.451e+15  2.204e+07 -156580853   <2e-16 ***
## plant.stand.L     1.939e+14  5.819e+06   33320092   <2e-16 ***
## precip.L         -6.197e+14  1.333e+07  -46489783   <2e-16 ***
## precip.Q          2.031e+14  9.404e+06   21596947   <2e-16 ***
## temp.L            1.098e+14  1.124e+07    9776535   <2e-16 ***
## temp.Q           -1.178e+14  6.576e+06  -17906974   <2e-16 ***
## hail1             1.997e+14  8.948e+06   22316290   <2e-16 ***
## crop.hist1       -7.596e+12  1.262e+07    -602060   <2e-16 ***
## crop.hist2        3.180e+13  1.219e+07    2609630   <2e-16 ***
## crop.hist3       -4.568e+13  1.236e+07   -3696906   <2e-16 ***
## area.dam1        -2.714e+13  1.050e+07   -2585694   <2e-16 ***
## area.dam2        -9.215e+12  9.817e+06    -938632   <2e-16 ***
## area.dam3         3.248e+13  9.986e+06    3252995   <2e-16 ***
## sever1            2.562e+14  8.645e+06   29634177   <2e-16 ***
## sever2           -9.649e+13  1.714e+07   -5630662   <2e-16 ***
## seed.tmt1        -5.915e+13  7.078e+06   -8357845   <2e-16 ***
## seed.tmt2         1.556e+14  1.362e+07   11430316   <2e-16 ***
## germ.L           -1.470e+14  6.239e+06  -23556416   <2e-16 ***
## germ.Q            1.409e+14  5.330e+06   26430840   <2e-16 ***
## plant.growth1     8.255e+14  1.431e+07   57678943   <2e-16 ***
## leaves1           7.164e+14  1.890e+07   37910393   <2e-16 ***
## leaf.halo1       -3.932e+15  5.429e+07  -72420358   <2e-16 ***
## leaf.halo2       -1.748e+15  5.077e+07  -34430192   <2e-16 ***
## leaf.marg1        6.142e+14  3.055e+07   20101914   <2e-16 ***
## leaf.marg2               NA         NA         NA       NA    
## leaf.size.L      -2.097e+15  3.676e+07  -57048417   <2e-16 ***
## leaf.size.Q       1.062e+15  2.141e+07   49614233   <2e-16 ***
## leaf.shread1      6.736e+13  1.041e+07    6473496   <2e-16 ***
## leaf.malf1        4.090e+14  1.726e+07   23693055   <2e-16 ***
## leaf.mild1        3.263e+14  2.905e+07   11229260   <2e-16 ***
## leaf.mild2        2.877e+15  3.803e+07   75649461   <2e-16 ***
## stem1            -6.878e+14  2.816e+07  -24427649   <2e-16 ***
## lodging1         -2.302e+15  1.567e+07 -146835250   <2e-16 ***
## stem.cankers1     2.495e+15  5.986e+07   41684437   <2e-16 ***
## stem.cankers2     3.674e+15  4.992e+07   73590961   <2e-16 ***
## stem.cankers3     5.968e+15  4.087e+07  145998348   <2e-16 ***
## canker.lesion1   -1.218e+15  2.660e+07  -45792973   <2e-16 ***
## canker.lesion2   -2.927e+14  2.865e+07  -10217274   <2e-16 ***
## canker.lesion3    5.936e+15  2.667e+07  222585592   <2e-16 ***
## fruiting.bodies1 -4.239e+13  1.969e+07   -2152378   <2e-16 ***
## ext.decay1       -8.834e+14  1.386e+07  -63746398   <2e-16 ***
## mycelium1         4.920e+15  3.278e+07  150078192   <2e-16 ***
## int.discolor1     8.692e+14  3.487e+07   24927370   <2e-16 ***
## int.discolor2    -1.999e+15  4.522e+07  -44202762   <2e-16 ***
## sclerotia1               NA         NA         NA       NA    
## fruit.pods1       3.838e+15  5.350e+07   71745375   <2e-16 ***
## fruit.pods3       4.048e+14  4.784e+07    8460200   <2e-16 ***
## fruit.spots1     -2.924e+15  5.096e+07  -57375041   <2e-16 ***
## fruit.spots2     -4.669e+15  5.712e+07  -81733023   <2e-16 ***
## fruit.spots4     -2.765e+15  2.605e+07 -106154965   <2e-16 ***
## seed1            -2.213e+15  3.263e+07  -67821359   <2e-16 ***
## mold.growth1      2.342e+15  2.892e+07   81006454   <2e-16 ***
## seed.discolor1   -8.474e+14  2.939e+07  -28837763   <2e-16 ***
## seed.size1        7.853e+14  2.713e+07   28950407   <2e-16 ***
## shriveling1      -8.927e+14  3.029e+07  -29470681   <2e-16 ***
## roots1            5.119e+14  3.151e+07   16243708   <2e-16 ***
## roots2            1.207e+15  7.446e+07   16203584   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance:  497.76  on 561  degrees of freedom
## Residual deviance: 1585.92  on 501  degrees of freedom
##   (121 observations deleted due to missingness)
## AIC: 1707.9
## 
## Number of Fisher Scoring iterations: 25
## Start:  AIC=1707.92
## Class ~ date + plant.stand + precip + temp + hail + crop.hist + 
##     area.dam + sever + seed.tmt + germ + plant.growth + leaves + 
##     leaf.halo + leaf.marg + leaf.size + leaf.shread + leaf.malf + 
##     leaf.mild + stem + lodging + stem.cankers + canker.lesion + 
##     fruiting.bodies + ext.decay + mycelium + int.discolor + sclerotia + 
##     fruit.pods + fruit.spots + seed + mold.growth + seed.discolor + 
##     seed.size + shriveling + roots
## 
## 
## Step:  AIC=1707.92
## Class ~ date + plant.stand + precip + temp + hail + crop.hist + 
##     area.dam + sever + seed.tmt + germ + plant.growth + leaves + 
##     leaf.halo + leaf.marg + leaf.size + leaf.shread + leaf.malf + 
##     leaf.mild + stem + lodging + stem.cankers + canker.lesion + 
##     fruiting.bodies + ext.decay + mycelium + int.discolor + fruit.pods + 
##     fruit.spots + seed + mold.growth + seed.discolor + seed.size + 
##     shriveling + roots
## 
## 
## Step:  AIC=5456.46
## Class ~ date + plant.stand + precip + temp + hail + crop.hist + 
##     area.dam + sever + seed.tmt + germ + plant.growth + leaves + 
##     leaf.halo + leaf.marg + leaf.size + leaf.shread + leaf.malf + 
##     leaf.mild + lodging + stem.cankers + canker.lesion + fruiting.bodies + 
##     ext.decay + mycelium + int.discolor + fruit.pods + fruit.spots + 
##     seed + mold.growth + seed.discolor + seed.size + shriveling + 
##     roots
## 
## Call:
## glm(formula = Class ~ date + plant.stand + precip + temp + hail + 
##     crop.hist + area.dam + sever + seed.tmt + germ + plant.growth + 
##     leaves + leaf.halo + leaf.marg + leaf.size + leaf.shread + 
##     leaf.malf + leaf.mild + lodging + stem.cankers + canker.lesion + 
##     fruiting.bodies + ext.decay + mycelium + int.discolor + fruit.pods + 
##     fruit.spots + seed + mold.growth + seed.discolor + seed.size + 
##     shriveling + roots, family = binomial(link = "logit"), data = Soybean)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
##  -8.49    0.00    0.00    0.00    8.49  
## 
## Coefficients:
##                    Estimate Std. Error    z value Pr(>|z|)    
## (Intercept)      -2.108e+25  4.711e+17  -44754892   <2e-16 ***
## date1             9.310e+14  2.077e+07   44825994   <2e-16 ***
## date2             5.099e+14  2.054e+07   24823124   <2e-16 ***
## date3            -1.415e+15  2.131e+07  -66393880   <2e-16 ***
## date4            -1.652e+15  2.114e+07  -78134557   <2e-16 ***
## date5            -2.072e+15  2.103e+07  -98507165   <2e-16 ***
## date6            -2.098e+15  2.121e+07  -98918350   <2e-16 ***
## plant.stand.L    -2.768e+13  5.756e+06   -4808523   <2e-16 ***
## precip.L         -6.255e+14  1.328e+07  -47108460   <2e-16 ***
## precip.Q          6.560e+14  9.112e+06   71998161   <2e-16 ***
## temp.L           -2.703e+14  1.118e+07  -24185493   <2e-16 ***
## temp.Q            2.548e+14  6.525e+06   39055351   <2e-16 ***
## hail1            -1.808e+14  8.311e+06  -21749604   <2e-16 ***
## crop.hist1        3.238e+14  1.248e+07   25942195   <2e-16 ***
## crop.hist2        4.045e+14  1.218e+07   33215392   <2e-16 ***
## crop.hist3        2.668e+14  1.236e+07   21591583   <2e-16 ***
## area.dam1        -2.814e+14  1.025e+07  -27454538   <2e-16 ***
## area.dam2        -1.012e+14  9.805e+06  -10316617   <2e-16 ***
## area.dam3        -1.086e+14  9.981e+06  -10879953   <2e-16 ***
## sever1            3.931e+13  8.629e+06    4555366   <2e-16 ***
## sever2           -2.019e+14  1.688e+07  -11958957   <2e-16 ***
## seed.tmt1         4.030e+13  6.980e+06    5774094   <2e-16 ***
## seed.tmt2         3.278e+14  1.225e+07   26760361   <2e-16 ***
## germ.L            8.355e+13  6.154e+06   13576228   <2e-16 ***
## germ.Q            1.370e+14  5.305e+06   25824003   <2e-16 ***
## plant.growth1    -1.508e+14  1.410e+07  -10695205   <2e-16 ***
## leaves1           5.354e+14  1.856e+07   28850337   <2e-16 ***
## leaf.halo1        2.108e+25  4.711e+17   44754892   <2e-16 ***
## leaf.halo2        2.108e+25  4.711e+17   44754892   <2e-16 ***
## leaf.marg1        1.446e+15  3.053e+07   47370581   <2e-16 ***
## leaf.marg2        2.108e+25  4.711e+17   44754892   <2e-16 ***
## leaf.size.L      -2.418e+15  3.668e+07  -65924585   <2e-16 ***
## leaf.size.Q       1.249e+15  2.139e+07   58413402   <2e-16 ***
## leaf.shread1     -8.110e+14  1.040e+07  -77989691   <2e-16 ***
## leaf.malf1        5.615e+14  1.717e+07   32713365   <2e-16 ***
## leaf.mild1       -7.721e+14  2.742e+07  -28159615   <2e-16 ***
## leaf.mild2        3.775e+15  3.797e+07   99409041   <2e-16 ***
## lodging1         -2.484e+15  1.516e+07 -163821580   <2e-16 ***
## stem.cankers1     8.707e+14  5.665e+07   15369501   <2e-16 ***
## stem.cankers2     2.496e+15  4.654e+07   53624067   <2e-16 ***
## stem.cankers3     4.396e+15  3.523e+07  124777878   <2e-16 ***
## canker.lesion1   -1.466e+15  2.705e+07  -54196435   <2e-16 ***
## canker.lesion2   -2.651e+15  2.897e+07  -91513932   <2e-16 ***
## canker.lesion3    4.130e+15  2.391e+07  172712651   <2e-16 ***
## fruiting.bodies1 -2.779e+15  1.886e+07 -147369363   <2e-16 ***
## ext.decay1        1.414e+15  1.381e+07  102383326   <2e-16 ***
## mycelium1         1.244e+15  3.273e+07   38007983   <2e-16 ***
## int.discolor1     1.466e+15  2.707e+07   54179598   <2e-16 ***
## int.discolor2    -1.010e+15  4.154e+07  -24320129   <2e-16 ***
## fruit.pods1       4.356e+14  5.346e+07    8148218   <2e-16 ***
## fruit.pods3       1.944e+15  4.725e+07   41146045   <2e-16 ***
## fruit.spots1      2.448e+14  5.067e+07    4831484   <2e-16 ***
## fruit.spots2      1.449e+15  5.684e+07   25486935   <2e-16 ***
## fruit.spots4     -1.682e+15  2.578e+07  -65229236   <2e-16 ***
## seed1            -2.149e+15  3.263e+07  -65871374   <2e-16 ***
## mold.growth1     -4.877e+14  2.891e+07  -16872230   <2e-16 ***
## seed.discolor1   -7.207e+14  2.909e+07  -24773362   <2e-16 ***
## seed.size1        2.413e+15  2.711e+07   89040859   <2e-16 ***
## shriveling1      -5.137e+14  3.026e+07  -16979888   <2e-16 ***
## roots1           -3.168e+15  3.149e+07 -100599782   <2e-16 ***
## roots2           -7.456e+15  3.218e+07 -231722717   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance:  497.76  on 561  degrees of freedom
## Residual deviance: 5334.46  on 501  degrees of freedom
##   (121 observations deleted due to missingness)
## AIC: 5456.5
## 
## Number of Fisher Scoring iterations: 25
## 
## Call:
## glm(formula = Class ~ date + plant.stand + precip + temp + crop.hist + 
##     area.dam + seed.tmt + plant.growth + leaves + leaf.halo + 
##     leaf.marg + leaf.size + leaf.shread + leaf.malf + leaf.mild + 
##     stem.cankers + canker.lesion + fruiting.bodies + ext.decay + 
##     mycelium + int.discolor + fruit.pods + fruit.spots + mold.growth + 
##     seed.discolor + seed.size + shriveling + roots, family = binomial(link = "logit"), 
##     data = Soybean)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
##  -8.49    0.00    0.00    0.00    8.49  
## 
## Coefficients: (1 not defined because of singularities)
##                    Estimate Std. Error    z value Pr(>|z|)    
## (Intercept)       6.985e+15  4.326e+07  161476810   <2e-16 ***
## date1            -3.853e+14  2.045e+07  -18838745   <2e-16 ***
## date2            -3.989e+14  2.010e+07  -19843526   <2e-16 ***
## date3            -2.593e+15  2.088e+07 -124208126   <2e-16 ***
## date4            -2.912e+15  2.078e+07 -140151781   <2e-16 ***
## date5            -3.574e+15  2.080e+07 -171845213   <2e-16 ***
## date6            -3.837e+15  2.148e+07 -178609104   <2e-16 ***
## plant.stand.L     1.204e+14  4.984e+06   24146689   <2e-16 ***
## precip.L         -1.442e+15  1.293e+07 -111457853   <2e-16 ***
## precip.Q          9.018e+14  9.006e+06  100128714   <2e-16 ***
## temp.L            5.824e+14  1.092e+07   53333663   <2e-16 ***
## temp.Q           -4.461e+14  6.369e+06  -70035484   <2e-16 ***
## crop.hist1        2.266e+14  1.224e+07   18515357   <2e-16 ***
## crop.hist2        8.110e+13  1.190e+07    6813876   <2e-16 ***
## crop.hist3        1.887e+14  1.209e+07   15607118   <2e-16 ***
## area.dam1        -2.472e+14  1.039e+07  -23802017   <2e-16 ***
## area.dam2        -3.316e+14  9.739e+06  -34046885   <2e-16 ***
## area.dam3        -1.073e+14  9.904e+06  -10830887   <2e-16 ***
## seed.tmt1         8.756e+13  7.004e+06   12501326   <2e-16 ***
## seed.tmt2         4.288e+14  1.346e+07   31860834   <2e-16 ***
## plant.growth1     2.000e+14  1.404e+07   14240624   <2e-16 ***
## leaves1          -1.137e+15  1.816e+07  -62633737   <2e-16 ***
## leaf.halo1       -3.994e+15  5.388e+07  -74132496   <2e-16 ***
## leaf.halo2       -1.488e+15  5.059e+07  -29415040   <2e-16 ***
## leaf.marg1        1.866e+15  3.004e+07   62126624   <2e-16 ***
## leaf.marg2               NA         NA         NA       NA    
## leaf.size.L      -1.803e+15  3.660e+07  -49265812   <2e-16 ***
## leaf.size.Q       1.219e+15  2.132e+07   57159769   <2e-16 ***
## leaf.shread1      1.305e+14  1.033e+07   12631272   <2e-16 ***
## leaf.malf1        1.965e+15  1.718e+07  114381108   <2e-16 ***
## leaf.mild1        1.002e+15  2.745e+07   36514185   <2e-16 ***
## leaf.mild2        3.733e+15  3.256e+07  114648054   <2e-16 ***
## stem.cankers1    -2.897e+14  5.551e+07   -5219409   <2e-16 ***
## stem.cankers2    -6.315e+13  4.547e+07   -1388810   <2e-16 ***
## stem.cankers3     2.918e+15  3.458e+07   84391062   <2e-16 ***
## canker.lesion1   -3.807e+14  2.645e+07  -14393513   <2e-16 ***
## canker.lesion2   -3.296e+14  2.854e+07  -11548480   <2e-16 ***
## canker.lesion3    1.995e+15  2.197e+07   90810331   <2e-16 ***
## fruiting.bodies1  5.014e+14  1.824e+07   27480242   <2e-16 ***
## ext.decay1       -5.624e+14  1.343e+07  -41865597   <2e-16 ***
## mycelium1         9.117e+14  3.221e+07   28300022   <2e-16 ***
## int.discolor1     1.292e+14  2.568e+07    5032637   <2e-16 ***
## int.discolor2    -2.615e+15  3.955e+07  -66123102   <2e-16 ***
## fruit.pods1       3.058e+15  5.280e+07   57922690   <2e-16 ***
## fruit.pods3       1.780e+15  4.598e+07   38701322   <2e-16 ***
## fruit.spots1     -7.670e+14  5.033e+07  -15240563   <2e-16 ***
## fruit.spots2     -4.478e+15  5.606e+07  -79885210   <2e-16 ***
## fruit.spots4     -1.487e+15  2.477e+07  -60047093   <2e-16 ***
## mold.growth1      8.227e+14  2.477e+07   33216518   <2e-16 ***
## seed.discolor1   -1.456e+15  1.575e+07  -92469149   <2e-16 ***
## seed.size1       -1.151e+14  2.644e+07   -4353207   <2e-16 ***
## shriveling1       3.768e+14  2.906e+07   12963748   <2e-16 ***
## roots1           -9.219e+14  3.114e+07  -29609711   <2e-16 ***
## roots2            8.166e+13  7.413e+07    1101647   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance:  497.76  on 561  degrees of freedom
## Residual deviance: 1874.27  on 509  degrees of freedom
##   (121 observations deleted due to missingness)
## AIC: 1980.3
## 
## Number of Fisher Scoring iterations: 25
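
The two options named in part C, eliminating predictors and imputation, can be combined in a simple baseline. The sketch below is an assumption-laden illustration, not the analysis above: `mode_value` and `handle_missing` are hypothetical helpers, the missingness cutoff is arbitrary, and mode imputation ignores relationships between predictors (model-based approaches such as K-nearest neighbors are an alternative).

```r
# Hypothetical helper: most frequent non-missing level of a factor
mode_value <- function(x) {
  counts <- table(x)               # NA is excluded by default
  names(counts)[which.max(counts)]
}

# Hypothetical helper: drop predictors with too many missing values,
# then fill the remaining gaps in factor columns with the modal level.
handle_missing <- function(df, max_missing = 0.15) {
  frac_missing <- vapply(df, function(x) mean(is.na(x)), numeric(1))
  kept <- df[, frac_missing <= max_missing, drop = FALSE]
  for (v in names(kept)) {
    if (is.factor(kept[[v]]) && anyNA(kept[[v]])) {
      kept[[v]][is.na(kept[[v]])] <- mode_value(kept[[v]])
    }
  }
  kept
}

# Toy data (an assumption): one column is mostly missing, one has a single gap
toy <- data.frame(
  mostly_missing = factor(c("a", NA, NA, NA, NA, "b")),
  some_missing   = factor(c("x", "x", NA, "y", "x", "y")),
  complete       = factor(c("p", "q", "p", "q", "p", "q"))
)
cleaned <- handle_missing(toy, max_missing = 0.5)
print(cleaned)
```

For the Soybean data the cutoff would need to be chosen against the percentages reported in part B; a value of 0.15, for example, would drop the predictors with roughly 15% or more missing values and impute the rest.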

APPENDIX

Code used in analysis

knitr::opts_chunk$set(
    echo = FALSE,
    message = FALSE,
    warning = FALSE
)
#knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(ggplot2)
library(tidyr)
library(MASS)
library(psych)
library(kableExtra)
library(dplyr)
library(faraway)    # halfnorm()
library(gridExtra)
library(reshape2)
library(leaps)
library(pROC)
library(caret)      # BoxCoxTrans()
library(naniar)     # vis_miss(), gg_miss_upset()
library(pander)
library(mlbench)    # Glass and Soybean data sets
library(e1071)      # skewness()
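# ---- Exercise 3.1: Glass data ----
data(Glass)
str(Glass)

# 3.1 A: pairwise scatterplots, correlations and histograms of the predictors
plot(Glass)
Glass2 <- subset(Glass, select = -Type)  # predictors only
GlassCor <- cor(Glass2)
corrplot::corrplot(GlassCor)
par(mfrow = c(3, 3))
# A for loop labels each histogram and avoids printing the list of
# histogram objects that sapply(Glass2, hist) would return
for (v in names(Glass2)) hist(Glass2[[v]], main = v, xlab = v)

# 3.1 B: skewness (e1071) and a half-normal plot of the leverages (faraway)
skewvalues <- sapply(Glass2, skewness)
skewvalues

# Note: with a binomial family, glm() models the first level of Type
# against all other levels combined
m1 <- glm(Type ~ ., data = Glass, family = binomial(link = "logit"))
#summary(m1)
halfnorm(hatvalues(m1))

par(mfrow = c(2, 2))
plot(Glass2)

# 3.1 C: boxplots and Box-Cox lambda estimates (caret)
boxplot(Glass)
sapply(Glass2, BoxCoxTrans)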
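# ---- Exercise 3.2: Soybean data ----
data(Soybean, package = "mlbench")

#names(Soybean)
#str(Soybean)

boxplot(Soybean[3:36], las = 2)

# 3.2 A: class frequencies
ggplot(Soybean, aes(Class)) +
    geom_bar() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
data.frame(table(Soybean$Class))

# Each predictor plotted against Class
par(mfrow = c(6, 6), mar = c(2, 2, 2, 2))
plot(Class ~ ., data = Soybean, las = 1)

# Bar chart of every categorical predictor; a for loop avoids printing
# the bar midpoints that lapply(Soybean[3:36], plot) would return
par(mfrow = c(6, 6), mar = c(2, 2, 2, 2))
for (v in names(Soybean)[3:36]) plot(Soybean[[v]], main = v)

# 3.2 B: complete cases, missingness plots and percent missing per column
summary(complete.cases(Soybean))
vis_miss(Soybean)
gg_miss_upset(Soybean)
sapply(Soybean, function(x) round(mean(is.na(x)) * 100, 1))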
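# 3.2 C: logistic GLM on all predictors; glm() drops rows with any NA.
# Note: with a binomial family, the first level of Class is modeled
# against all other levels combined.
lmod <- glm(Class ~ ., data = Soybean, family = binomial(link = "logit"))
summary(lmod)

# Stepwise selection by AIC
slmod <- step(lmod)
summary(slmod)

# Refit without five predictors that have substantial missing data
lmod2 <- update(slmod, . ~ . - seed - hail - lodging - germ - sever)
summary(lmod2)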