Building a predictive model for defects in steel

Data Collection

Since I do not have access to industry data, I used the Steel Plates Faults dataset from the UCI Machine Learning Repository, available at http://archive.ics.uci.edu/ml/machine-learning-databases/00198/

The data contains two parts:

  1. The first part covers 27 features (columns) of the steel production process, with 1941 observations (rows) in total.

The first 10 observations (out of 1941) are shown below:

##    X_Minimum X_Maximum Y_Minimum Y_Maximum Pixels_Areas X_Perimeter
## 1         42        50    270900    270944          267          17
## 2        645       651   2538079   2538108          108          10
## 3        829       835   1553913   1553931           71           8
## 4        853       860    369370    369415          176          13
## 5       1289      1306    498078    498335         2409          60
## 6        430       441    100250    100337          630          20
## 7        413       446    138468    138883         9052         230
## 8        190       200    210936    210956          132          11
## 9        330       343    429227    429253          264          15
## 10        74        90    779144    779308         1506          46
##    Y_Perimeter Sum_of_Luminosity Minimum_of_Luminosity
## 1           44             24220                    76
## 2           30             11397                    84
## 3           19              7972                    99
## 4           45             18996                    99
## 5          260            246930                    37
## 6           87             62357                    64
## 7          432           1481991                    23
## 8           20             20007                   124
## 9           26             29748                    53
## 10         167            180215                    53
##    Maximum_of_Luminosity Length_of_Conveyer TypeOfSteel_A300
## 1                    108               1687                1
## 2                    123               1687                1
## 3                    125               1623                1
## 4                    126               1353                0
## 5                    126               1353                0
## 6                    127               1387                0
## 7                    199               1687                0
## 8                    172               1687                0
## 9                    148               1687                0
## 10                   143               1687                0
##    TypeOfSteel_A400 Steel_Plate_Thickness Edges_Index Empty_Index
## 1                 0                    80      0.0498      0.2415
## 2                 0                    80      0.7647      0.3793
## 3                 0                   100       0.971      0.3426
## 4                 1                   290      0.7287      0.4413
## 5                 1                   185      0.0695      0.4486
## 6                 1                    40        0.62      0.3417
## 7                 1                   150      0.4896       0.339
## 8                 1                   150      0.2253        0.34
## 9                 1                   150      0.3912      0.2189
## 10                1                   150      0.0877      0.4261
##    Square_Index Outside_X_Index Edges_X_Index Edges_Y_Index
## 1        0.1818          0.0047        0.4706             1
## 2        0.2069          0.0036           0.6        0.9667
## 3        0.3333          0.0037          0.75        0.9474
## 4        0.1556          0.0052        0.5385             1
## 5        0.0662          0.0126        0.2833        0.9885
## 6        0.1264          0.0079          0.55             1
## 7        0.0795          0.0196        0.1435        0.9607
## 8           0.5          0.0059        0.9091             1
## 9           0.5          0.0077        0.8667             1
## 10       0.0976          0.0095        0.3478         0.982
##    Outside_Global_Index LogOfAreas Log_X_Index Log_Y_Index
## 1                     1     2.4265      0.9031      1.6435
## 2                     1     2.0334      0.7782      1.4624
## 3                     1     1.8513      0.7782      1.2553
## 4                     1     2.2455      0.8451      1.6532
## 5                     1     3.3818      1.2305      2.4099
## 6                     1     2.7993      1.0414      1.9395
## 7                     1     3.9567      1.5185      2.6181
## 8                     1     2.1206           1       1.301
## 9                     1     2.4216      1.1139       1.415
## 10                    1     3.1778      1.2041      2.2148
##    Orientation_Index Luminosity_Index SigmoidOfAreas
## 1             0.8182          -0.2913         0.5822
## 2             0.7931          -0.1756         0.2984
## 3             0.6667          -0.1228          0.215
## 4             0.8444          -0.1568         0.5212
## 5             0.9338          -0.1992              1
## 6             0.8736          -0.2267         0.9874
## 7             0.9205           0.2791              1
## 8                0.5           0.1841         0.3359
## 9                0.5          -0.1197         0.5593
## 10            0.9024          -0.0651              1
  2. The second part is the defects. There are 7 types of defects, with the following names:
## [1] "Pastry"       "Z_Scratch"    "K_Scatch"     "Stains"      
## [5] "Dirtiness"    "Bumps"        "Other_Faults"

Each observation in the first part is marked with one of these 7 defect types.
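
To make the labeling concrete, here is a minimal Python sketch of how one marked defect per observation can be turned into a single class label. It assumes the defects are stored as seven 0/1 indicator columns in the order listed above; that storage layout, the helper name `defect_label`, and the two example rows are illustrative assumptions, not taken from the data shown here.

```python
# Defect names exactly as listed above (the spelling "K_Scatch" is kept as-is).
DEFECTS = ["Pastry", "Z_Scratch", "K_Scatch", "Stains",
           "Dirtiness", "Bumps", "Other_Faults"]


def defect_label(indicators):
    """Return the name of the single defect marked in a row of seven 0/1 flags.

    Hypothetical helper: assumes one indicator column per defect type.
    """
    if len(indicators) != len(DEFECTS):
        raise ValueError("expected %d indicators, got %d"
                         % (len(DEFECTS), len(indicators)))
    marked = [name for name, flag in zip(DEFECTS, indicators) if flag == 1]
    if len(marked) != 1:
        raise ValueError("expected exactly one marked defect, got %d"
                         % len(marked))
    return marked[0]


# Made-up indicator rows, for illustration only (not real observations).
rows = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
labels = [defect_label(row) for row in rows]
print(labels)
```

Collapsing the indicators into one label per row gives a single categorical target with 7 levels, which is the usual shape for a multi-class classification problem. The explicit check for exactly one marked defect also guards against rows that are unlabeled or labeled twice.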

Our goal is to develop a predictive model that, given a set of features, tells us which type of defect to expect.