Bartek Bielski
2020-04-26
The study of the classification of glass types was motivated by criminological investigation. Glass left at a crime scene can be used as evidence if it is correctly identified.
The data set was created by B. German at the Central Research Establishment, Home Office Forensic Science Service, in the UK. The application is meant to recognize the type of glass on the basis of user input (the chemical test results of a sample). A typical vector of test results for a glass sample, in the order RI, Na, Mg, Al, Si, K, Ca, Ba, Fe, is: (1.51, 13.00, 3.5, 2, 72, 0.6, 8, 0, 0.1)
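As a minimal sketch of how such a vector might be handed to a model, the values can be placed in a one-row data frame whose column names match the predictors of the Glass data set. The names `sample_values`, `predictor_names` and `sample_input` are hypothetical and not taken from the app's code, and the column order is assumed to follow the columns of `Glass`.

```r
# Hypothetical objects: one glass sample as a one-row data frame.
# Assumption: the vector follows the column order of mlbench's Glass data set.
sample_values <- c(1.51, 13.00, 3.5, 2, 72, 0.6, 8, 0, 0.1)
predictor_names <- c("RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe")

sample_input <- as.data.frame(
  matrix(sample_values, nrow = 1, dimnames = list(NULL, predictor_names))
)

# Inspect the result: one row, nine numeric columns
str(sample_input)
```

A data frame of this shape could then be passed as `newdata` to the `predict()` method of a fitted model.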
The dataset consists of 214 observations of 9 predictor variables and a glass type code. The predictors are the refractive index (RI) and the contents of eight elements (Na, Mg, Al, Si, K, Ca, Ba, Fe), each measured as weight percent of the corresponding oxide.
library(mlbench)
data(Glass)
summary(Glass)
       RI              Na              Mg              Al
 Min.   :1.511   Min.   :10.73   Min.   :0.000   Min.   :0.290
 1st Qu.:1.517   1st Qu.:12.91   1st Qu.:2.115   1st Qu.:1.190
 Median :1.518   Median :13.30   Median :3.480   Median :1.360
 Mean   :1.518   Mean   :13.41   Mean   :2.685   Mean   :1.445
 3rd Qu.:1.519   3rd Qu.:13.82   3rd Qu.:3.600   3rd Qu.:1.630
 Max.   :1.534   Max.   :17.38   Max.   :4.490   Max.   :3.500
       Si              K                Ca               Ba
 Min.   :69.81   Min.   :0.0000   Min.   : 5.430   Min.   :0.000
 1st Qu.:72.28   1st Qu.:0.1225   1st Qu.: 8.240   1st Qu.:0.000
 Median :72.79   Median :0.5550   Median : 8.600   Median :0.000
 Mean   :72.65   Mean   :0.4971   Mean   : 8.957   Mean   :0.175
 3rd Qu.:73.09   3rd Qu.:0.6100   3rd Qu.: 9.172   3rd Qu.:0.000
 Max.   :75.41   Max.   :6.2100   Max.   :16.190   Max.   :3.150
       Fe          Type
 Min.   :0.00000   1:70
 1st Qu.:0.00000   2:76
 Median :0.00000   3:17
 Mean   :0.05701   5:13
 3rd Qu.:0.10000   6: 9
 Max.   :0.51000   7:29
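The Type counts in the summary show that the classes are unbalanced. The base R sketch below uses only those printed counts to compute the share of each type and the accuracy a trivial classifier would reach by always predicting the most frequent type. This baseline is an added illustration, not part of the original analysis, and the variable names are hypothetical.

```r
# Class counts copied from the Type column of summary(Glass)
type_counts <- c("1" = 70, "2" = 76, "3" = 17, "5" = 13, "6" = 9, "7" = 29)

total <- sum(type_counts)                      # number of observations
class_share <- round(type_counts / total, 3)   # proportion of each type
print(class_share)

# Accuracy of always predicting the most frequent type
baseline_accuracy <- max(type_counts) / total
cat("Most frequent type:", names(which.max(type_counts)), "\n")
cat("Majority-class baseline accuracy:", round(baseline_accuracy, 3), "\n")
```

Comparing a model's accuracy with this baseline gives a rough sense of how much the predictors contribute beyond the class frequencies alone.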
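The types of glass (class attribute):
1: building windows, float processed
2: building windows, non-float processed
3: vehicle windows, float processed
4: vehicle windows, non-float processed (none in this data set)
5: containers
6: tableware
7: headlamps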
As glass recognition is a typical classification problem, the random forest algorithm seems appropriate. For presentation purposes, the model here was built with “light” settings. In the Shiny app, the model was trained with more advanced options.
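library(mlbench)  # provides the Glass dataset
library(caret)    # createDataPartition(), train(), confusionMatrix()
set.seed(56789)   # make the split and the model reproducible
data(Glass)
# Stratified 70/30 split on the class label
trainIndex <- createDataPartition(Glass$Type, p = 0.7, list = FALSE)
trainData <- Glass[trainIndex, ]
testData <- Glass[-trainIndex, ]
# Random forest with caret's default resampling (needs the randomForest package)
rf <- train(Type ~ ., data = trainData, method = "rf", metric = "Accuracy")
# Evaluate on the held-out test set
tres <- predict(rf, newdata = testData)
model.summary <- confusionMatrix(tres, testData$Type)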
Overall accuracy is about 70%. Sensitivity varies between classes, from 50% to 85%, and specificity ranges from 72% to 100%. With such performance, the model's prediction can be treated as a clue, but not as evidence in a court case.
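To make the per-class figures concrete, the sketch below computes sensitivity and specificity one class at a time (one class against all others) from a confusion matrix in base R. The 3-class matrix is made up for illustration and does not reproduce the model's results. `class_metrics` is a hypothetical helper, not a caret function. Rows hold predictions and columns hold true classes, the same layout caret's confusionMatrix prints.

```r
# Made-up confusion matrix: rows = predicted class, columns = true class
cm <- matrix(
  c(8, 2, 0,
    1, 6, 1,
    1, 2, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = c("A", "B", "C"), actual = c("A", "B", "C"))
)

# Hypothetical helper: one-vs-rest sensitivity and specificity per class
class_metrics <- function(cm) {
  total <- sum(cm)
  tp <- diag(cm)               # correctly predicted as the class
  fn <- colSums(cm) - tp       # members of the class predicted as another class
  fp <- rowSums(cm) - tp       # other classes predicted as the class
  tn <- total - tp - fn - fp   # all remaining cases
  data.frame(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    row.names = rownames(cm)
  )
}

metrics <- class_metrics(cm)
overall_accuracy <- sum(diag(cm)) / sum(cm)
print(round(metrics, 3))
cat("Overall accuracy:", round(overall_accuracy, 3), "\n")
```

Sensitivity answers "of the samples that truly belong to a class, how many were found", while specificity answers "of the samples that do not belong to it, how many were correctly kept out". A rare class can therefore show high specificity even when its sensitivity is poor.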
All files, ui.R, server.R and this presentation are in the following repository: https://github.com/bartriman/DataProducts_week4