1 Data Preview

Here are the first 10 observations in the glass dataset (all of glass type 1), to give an idea of what the data look like.

     RI    Na   Mg   Al    Si    K   Ca Ba   Fe Type
1.52101 13.64 4.49 1.10 71.78 0.06 8.75  0 0.00    1
1.51761 13.89 3.60 1.36 72.73 0.48 7.83  0 0.00    1
1.51618 13.53 3.55 1.54 72.99 0.39 7.78  0 0.00    1
1.51766 13.21 3.69 1.29 72.61 0.57 8.22  0 0.00    1
1.51742 13.27 3.62 1.24 73.08 0.55 8.07  0 0.00    1
1.51596 12.79 3.61 1.62 72.97 0.64 8.07  0 0.26    1
1.51743 13.30 3.60 1.14 73.09 0.58 8.17  0 0.00    1
1.51756 13.15 3.61 1.05 73.24 0.57 8.24  0 0.00    1
1.51918 14.04 3.58 1.37 72.08 0.56 8.30  0 0.00    1
1.51755 13.00 3.60 1.36 72.99 0.57 8.40  0 0.11    1
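
A preview like this takes only a few lines of R (a minimal sketch, assuming the data are the UCI Glass dataset shipped with the mlbench package; the original analysis may load them from a CSV instead):

library(mlbench)   # ships the UCI Glass Identification dataset
data(Glass)        # 214 rows: RI, Na, Mg, Al, Si, K, Ca, Ba, Fe, Type
head(Glass, 10)    # the first 10 observations, all of Type 1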

2 Exploratory Data Analysis

2.1 Density Plot

Here we look at where the bulk of the data lies in the glass dataset as far as Type is concerned. The density plot suggests that type 1 makes up the largest share of the dataset and type 7 the smallest.
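
One way such a plot could be drawn (a sketch; treating Type as numeric to get a single density curve is an assumption about how the original figure was made):

library(ggplot2)
# density of observations across the glass-type labels
ggplot(Glass, aes(x = as.numeric(as.character(Type)))) +
  geom_density(fill = "grey80") +
  labs(x = "Glass Type", y = "Density")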


3 Machine Learning Aspects

3.1 Principal Component Analysis

This is the result of the PCA. It is not the clearest result as far as grouping goes, but with respect to the features we can see that calcium (Ca) and silicon (Si) weigh in as strong factors. The classes don't cluster well in the principal-component space, though, so we'll move on.
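
The PCA itself is a one-liner with prcomp (a sketch; centering and scaling the predictors is my assumption):

# PCA on the nine numeric predictors (column 10 is Type)
pca <- prcomp(Glass[, -10], center = TRUE, scale. = TRUE)
summary(pca)   # variance explained per component
biplot(pca)    # loadings show Ca and Si as strong factors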


3.2 Boruta

With respect to Boruta, all of the elements appear to be significant, with magnesium (Mg) carrying the highest importance.
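
A Boruta run with default settings might look like this (a sketch; the seed is hypothetical):

library(Boruta)
set.seed(42)                           # hypothetical seed
bor <- Boruta(Type ~ ., data = Glass)
print(bor)    # all nine attributes confirmed important
plot(bor)     # Mg shows the highest importance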


3.3 Neural Network

3.3.1 Processing
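
The console output below comes from fitting a single-hidden-layer network with caret. A call along these lines would reproduce it (a sketch: the 70/30 split and seed are assumptions, but the bootstrap resampling and the size/decay grid match the printout):

library(caret)
set.seed(7)                                            # hypothetical seed
idx      <- createDataPartition(Glass$Type, p = 0.7, list = FALSE)
train_df <- Glass[idx, ]
test_df  <- Glass[-idx, ]
# caret's default bootstrap (25 reps) over a size/decay tuning grid
nn_fit <- train(Type ~ ., data = train_df, method = "nnet",
                trControl = trainControl(method = "boot", number = 25))
nn_fit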

# weights:  86
initial  value 320.359519 
iter  10 value 233.444547
iter  20 value 220.765256
iter  30 value 190.017092
iter  40 value 162.771210
iter  50 value 148.628254
iter  60 value 138.185821
iter  70 value 132.049234
iter  80 value 130.729057
iter  90 value 130.313783
iter 100 value 127.234913
final  value 127.234913 
stopped after 100 iterations
Neural Network 

153 samples
  9 predictor
  6 classes: '1', '2', '3', '5', '6', '7' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 153, 153, 153, 153, 153, 153, ... 
Resampling results across tuning parameters:

  size  decay  Accuracy   Kappa     
  1     0e+00  0.3432269  0.01771826
  1     1e-04  0.3316885  0.00000000
  1     1e-01  0.5173183  0.30886806
  3     0e+00  0.3769426  0.07835713
  3     1e-04  0.3427074  0.02351551
  3     1e-01  0.6117960  0.44800101
  5     0e+00  0.3928880  0.10353155
  5     1e-04  0.4029769  0.12527870
  5     1e-01  0.6462576  0.49375083

Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were size = 5 and decay = 0.1. 
[1] 57.38
nnet variable importance

  variables are sorted by maximum importance across the classes
   Overall      1      2      3      5      6      7
Al  100.00 100.00 100.00 100.00 100.00 100.00 100.00
Si   80.58  80.58  80.58  80.58  80.58  80.58  80.58
Na   62.65  62.65  62.65  62.65  62.65  62.65  62.65
Ca   61.73  61.73  61.73  61.73  61.73  61.73  61.73
K    58.03  58.03  58.03  58.03  58.03  58.03  58.03
Mg   46.33  46.33  46.33  46.33  46.33  46.33  46.33
Ba   24.66  24.66  24.66  24.66  24.66  24.66  24.66
Fe   21.03  21.03  21.03  21.03  21.03  21.03  21.03
RI    0.00   0.00   0.00   0.00   0.00   0.00   0.00

With the neural network, we received an accuracy of 57.38%, and training took 27.73 seconds.
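
Dropping RI only requires changing the formula (a sketch, reusing the objects above):

# refit without the refractive-index predictor
nn_fit2 <- train(Type ~ . - RI, data = train_df, method = "nnet",
                 trControl = trainControl(method = "boot", number = 25))
nn_fit2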

# weights:  81
initial  value 315.126100 
iter  10 value 233.753345
iter  20 value 228.440107
iter  30 value 220.226715
iter  40 value 158.324086
iter  50 value 152.916484
iter  60 value 151.559302
iter  70 value 148.368256
iter  80 value 138.709618
iter  90 value 133.401741
iter 100 value 129.353293
final  value 129.353293 
stopped after 100 iterations
Neural Network 

153 samples
  9 predictor
  6 classes: '1', '2', '3', '5', '6', '7' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 153, 153, 153, 153, 153, 153, ... 
Resampling results across tuning parameters:

  size  decay  Accuracy   Kappa      
  1     0e+00  0.3316885  0.000000000
  1     1e-04  0.3350927  0.009527837
  1     1e-01  0.5188706  0.312297669
  3     0e+00  0.3598209  0.052339418
  3     1e-04  0.4091221  0.126779812
  3     1e-01  0.6071120  0.437584884
  5     0e+00  0.3784116  0.080809862
  5     1e-04  0.4153113  0.123829701
  5     1e-01  0.6522854  0.504971562

Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were size = 5 and decay = 0.1. 
[1] 55.74

Cutting out RI as a variable, our new neural network accuracy is 55.74%, which decreases the accuracy relative to the all-variable model by 1.64 percentage points; this run took 30.42 seconds.

3.3.2 Neural Network Visualization
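
A diagram of the fitted network can be drawn with NeuralNetTools (a sketch; the package choice is an assumption):

library(NeuralNetTools)
# input, hidden, and output layers, with line widths scaled by weight
plotnet(nn_fit$finalModel)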


3.4 Elastic Net

Confusion Matrix and Statistics

          Reference
Prediction  1  2  3  5  6  7
         1 15 10  2  0  0  0
         2  2 16  2  1  0  0
         3  1  1  1  0  0  0
         5  0  0  0  1  0  0
         6  0  0  0  0  3  1
         7  0  2  0  0  0  5

Overall Statistics
                                          
               Accuracy : 0.6508          
                 95% CI : (0.5203, 0.7666)
    No Information Rate : 0.4603          
    P-Value [Acc > NIR] : 0.001802        
                                          
                  Kappa : 0.5055          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
Sensitivity            0.8333   0.5517  0.20000  0.50000  1.00000  0.83333
Specificity            0.7333   0.8529  0.96552  1.00000  0.98333  0.96491
Pos Pred Value         0.5556   0.7619  0.33333  1.00000  0.75000  0.71429
Neg Pred Value         0.9167   0.6905  0.93333  0.98387  1.00000  0.98214
Prevalence             0.2857   0.4603  0.07937  0.03175  0.04762  0.09524
Detection Rate         0.2381   0.2540  0.01587  0.01587  0.04762  0.07937
Detection Prevalence   0.4286   0.3333  0.04762  0.01587  0.06349  0.11111
Balanced Accuracy      0.7833   0.7023  0.58276  0.75000  0.99167  0.89912

With the elastic net and all variables, we get an accuracy of 65.08%.
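
A multinomial elastic net of this kind can be fit through caret's glmnet method (a sketch; the alpha/lambda grid is left at caret's defaults):

# elastic net: tunes alpha (mixing) and lambda (penalty strength)
enet_fit <- train(Type ~ ., data = train_df, method = "glmnet",
                  trControl = trainControl(method = "boot", number = 25))
confusionMatrix(predict(enet_fit, newdata = test_df), test_df$Type)
varImp(enet_fit)   # the per-class importance table shown below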

glmnet variable importance

  variables are sorted by maximum importance across the classes
         1        2        3        5        6       7
RI 4.78319 21.08620 100.0000 51.72349 37.29281 98.1277
Fe 0.76078  0.86824   0.5037  1.21101  4.28371  0.2648
Al 0.59127  0.45683   0.4761  0.76456  0.36420  0.3954
K  0.17331  0.06841   0.2940  0.36151  0.73142  0.3703
Ba 0.40004  0.05669   0.2924  0.28617  0.52647  0.6483
Si 0.01988  0.20369   0.3040  0.17307  0.20788  0.5423
Na 0.05724  0.18925   0.1693  0.34089  0.39130  0.4152
Mg 0.21292  0.10712   0.2299  0.27110  0.09849  0.1631
Ca 0.06059  0.16359   0.1480  0.06375  0.00000  0.1416

3.5 cTree

Confusion Matrix and Statistics

          Reference
Prediction  1  2  3  5  6  7
         1 23  3  0  0  1  0
         2  6 14  0  0  1  0
         3  2  1  0  0  0  0
         5  0  1  0  0  0  0
         6  0  2  0  0  1  1
         7  1  0  0  0  1  5

Overall Statistics
                                          
               Accuracy : 0.6825          
                 95% CI : (0.5531, 0.7942)
    No Information Rate : 0.5079          
    P-Value [Acc > NIR] : 0.003767        
                                          
                  Kappa : 0.5165          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
Sensitivity            0.7188   0.6667       NA       NA  0.25000  0.83333
Specificity            0.8710   0.8333  0.95238  0.98413  0.94915  0.96491
Pos Pred Value         0.8519   0.6667       NA       NA  0.25000  0.71429
Neg Pred Value         0.7500   0.8333       NA       NA  0.94915  0.98214
Prevalence             0.5079   0.3333  0.00000  0.00000  0.06349  0.09524
Detection Rate         0.3651   0.2222  0.00000  0.00000  0.01587  0.07937
Detection Prevalence   0.4286   0.3333  0.04762  0.01587  0.06349  0.11111
Balanced Accuracy      0.7949   0.7500       NA       NA  0.59958  0.89912

With a classification tree we still only get a measly 68.25%; let's work toward better results.
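
The tree can be fit directly with partykit (a sketch; caret's "ctree" method wraps the same algorithm):

library(partykit)
# conditional inference tree on the training split
ct <- ctree(Type ~ ., data = train_df)
plot(ct)
confusionMatrix(predict(ct, newdata = test_df), test_df$Type)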

3.6 Random Forest

It looks like we get pretty good feature importance out of the random forest (or at least the graph comes out nicely).
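
A fit like the following would produce both the importance graph and the confusion matrix below (a sketch, assuming the randomForest package and its defaults):

library(randomForest)
set.seed(7)                                       # hypothetical seed
rf <- randomForest(Type ~ ., data = train_df, importance = TRUE)
varImpPlot(rf)                                    # the feature-importance graph
confusionMatrix(predict(rf, newdata = test_df), test_df$Type)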

Confusion Matrix and Statistics

          Reference
Prediction  1  2  3  5  6  7
         1 18  1  0  0  0  0
         2  4 15  0  3  0  0
         3  4  1  1  0  0  0
         5  0  0  0  2  0  0
         6  0  0  0  1  3  0
         7  0  2  0  0  0 11

Overall Statistics
                                          
               Accuracy : 0.7576          
                 95% CI : (0.6364, 0.8546)
    No Information Rate : 0.3939          
    P-Value [Acc > NIR] : 2.093e-09       
                                          
                  Kappa : 0.6772          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
Sensitivity            0.6923   0.7895  1.00000  0.33333  1.00000   1.0000
Specificity            0.9750   0.8511  0.92308  1.00000  0.98413   0.9636
Pos Pred Value         0.9474   0.6818  0.16667  1.00000  0.75000   0.8462
Neg Pred Value         0.8298   0.9091  1.00000  0.93750  1.00000   1.0000
Prevalence             0.3939   0.2879  0.01515  0.09091  0.04545   0.1667
Detection Rate         0.2727   0.2273  0.01515  0.03030  0.04545   0.1667
Detection Prevalence   0.2879   0.3333  0.09091  0.03030  0.06061   0.1970
Balanced Accuracy      0.8337   0.8203  0.96154  0.66667  0.99206   0.9818

With the random forest, we get an accuracy of 75.76%, which is okay but nothing to write home about.


3.7 Bagged CART (Treebag)

Confusion Matrix and Statistics

   
     1  2  3  5  6  7
  1 17  3  1  0  0  0
  2  2 14  2  1  0  1
  3  0  2  3  0  0  0
  5  0  2  0  1  0  0
  6  0  1  0  0  2  1
  7  0  0  0  0  2 11

Overall Statistics
                                          
               Accuracy : 0.7273          
                 95% CI : (0.6036, 0.8297)
    No Information Rate : 0.3333          
    P-Value [Acc > NIR] : 7.091e-11       
                                          
                  Kappa : 0.6396          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
Sensitivity            0.8947   0.6364  0.50000  0.50000  0.50000   0.8462
Specificity            0.9149   0.8636  0.96667  0.96875  0.96774   0.9623
Pos Pred Value         0.8095   0.7000  0.60000  0.33333  0.50000   0.8462
Neg Pred Value         0.9556   0.8261  0.95082  0.98413  0.96774   0.9623
Prevalence             0.2879   0.3333  0.09091  0.03030  0.06061   0.1970
Detection Rate         0.2576   0.2121  0.04545  0.01515  0.03030   0.1667
Detection Prevalence   0.3182   0.3030  0.07576  0.04545  0.06061   0.1970
Balanced Accuracy      0.9048   0.7500  0.73333  0.73438  0.73387   0.9042

With a treebag, or "Bagged CART", we receive an accuracy of 72.73%. Not too bad.
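
caret exposes this model as method "treebag" (bagged CART via the ipred package); a sketch:

# bagged classification trees, bootstrap-resampled as before
bag_fit <- train(Type ~ ., data = train_df, method = "treebag",
                 trControl = trainControl(method = "boot", number = 25))
confusionMatrix(predict(bag_fit, newdata = test_df), test_df$Type)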

3.8 Naive Bayes Prediction With Multiple ROC Chart

With a naive Bayes prediction, we received an average AUC of 0.7916917.

Here is the full table:

Glass Type       AUC
         1 0.8586207
         2 0.6121324
         3 0.7101449
         5 0.9583333
         6 0.7659574
         7 0.8449612
      Mean 0.7916917

The lowest AUC for our naive Bayes model belonged to glass type 2, at 0.6121324.
The highest belonged to glass type 5, at 0.9583333.
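
Per-class AUCs like these come from one-vs-rest ROC curves on the predicted class probabilities (a sketch, assuming e1071's naiveBayes and the pROC package):

library(e1071)
library(pROC)
nb    <- naiveBayes(Type ~ ., data = train_df)
probs <- predict(nb, newdata = test_df, type = "raw")  # class probabilities
# one-vs-rest AUC per glass type, then the mean across types
aucs <- sapply(colnames(probs), function(cl)
  auc(roc(response  = as.integer(test_df$Type == cl),
          predictor = probs[, cl], quiet = TRUE)))
round(aucs, 7)
mean(aucs)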