Data set: Car Evaluation. The Car Evaluation data set is taken from the UCI Machine Learning Repository and was derived from a simple hierarchical decision model.
Total number of observations: 1728
Input variables: buying, maint, doors, persons, lug_boot, safety
Output variable: Class.Value (unacc, acc, good, vgood)
## 'data.frame': 1728 obs. of 7 variables:
## $ buying : Factor w/ 4 levels "high","low","med",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ maint : Factor w/ 4 levels "high","low","med",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ doors : Factor w/ 4 levels "2","3","4","5more": 1 1 1 1 1 1 1 1 1 1 ...
## $ persons : Factor w/ 3 levels "2","4","more": 1 1 1 1 1 1 1 1 1 2 ...
## $ lug_boot : Factor w/ 3 levels "big","med","small": 3 3 3 2 2 2 1 1 1 3 ...
## $ safety : Factor w/ 3 levels "high","low","med": 2 3 1 2 3 1 2 3 1 2 ...
## $ Class.Value: Factor w/ 4 levels "acc","good","unacc",..: 3 3 3 3 3 3 3 3 3 3 ...
All seven variables, including the output variable, are factors.
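A minimal sketch of how a data frame with this structure could be loaded in R. The helper `read_car` and the assumption that the source file is a headerless, comma-separated file are not stated in the report; the column names come from the `str` output above. The three sample rows are decoded from the factor codes in that output and only stand in for the full file.

```r
# Column names as shown in the str() output.
car_columns <- c("buying", "maint", "doors", "persons",
                 "lug_boot", "safety", "Class.Value")

# Hypothetical helper: read a headerless CSV and store every column as a factor.
read_car <- function(path) {
  read.csv(path, header = FALSE, col.names = car_columns,
           colClasses = "factor")
}

# Stand-in file with the first three rows, decoded from the str() output.
tmp <- tempfile(fileext = ".data")
writeLines(c("vhigh,vhigh,2,2,small,low,unacc",
             "vhigh,vhigh,2,2,small,med,unacc",
             "vhigh,vhigh,2,2,small,high,unacc"), tmp)

car_head <- read_car(tmp)
str(car_head)
```

Reading with `colClasses = "factor"` keeps `doors` and `persons` as categories rather than numbers, which matches the levels `"5more"` and `"more"` seen above.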
Conclusion: Seating capacity (persons) is an important factor in whether customers accept or reject a car.
Conclusion: Safety is an important factor in whether customers accept or reject a car.
Cars with low safety are not accepted by customers.
Since the output variable has more than two classes, three classifiers were fitted to predict it: C5.0 rules, a C5.0 tree, and an rpart tree.
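The C5.0 output in this report reads 864 training cases, which is half of the 1728 observations. The sketch below shows one way such a split and the three model fits could be set up. The seed, the use of `sample`, the wrapper `fit_models`, and the rpart formula are assumptions; the two `C5.0` calls mirror the `Call:` lines printed in the output. The wrapper needs the rpart and C50 packages and is only defined here, not run.

```r
set.seed(2015)                       # hypothetical seed, not taken from the report
n_obs     <- 1728
train_idx <- sample(seq_len(n_obs), size = n_obs / 2)   # 50% of rows for training
test_idx  <- setdiff(seq_len(n_obs), train_idx)         # remaining rows for testing

# Hypothetical wrapper; `train1` is a data frame with Class.Value in column 7.
fit_models <- function(train1) {
  list(
    rpart_tree = rpart::rpart(Class.Value ~ ., data = train1, method = "class"),
    c50_rules  = C50::C5.0(x = train1[, -7], y = train1$Class.Value, rules = TRUE),
    c50_tree   = C50::C5.0(x = train1[, c(-3, -5, -7)], y = train1$Class.Value)
  )
}

length(train_idx)  # 864
```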
Number and proportion of observations in each class (unacc, acc, good, vgood)
##
## unacc acc good vgood
## 1210 384 69 65
##
## unacc acc good vgood
## 0.70023148 0.22222222 0.03993056 0.03761574
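As a rough sketch, the two tables above can be reproduced with `table` and `prop.table`. The counts are copied from the output; the vector `class_value` is rebuilt from those counts here and stands in for the `Class.Value` column.

```r
class_counts <- c(unacc = 1210, acc = 384, good = 69, vgood = 65)

# Expand the counts into a factor with the same level order as the tables above.
class_value <- factor(rep(names(class_counts), times = class_counts),
                      levels = names(class_counts))

class_table <- table(class_value)
class_table
round(prop.table(class_table), 4)
```

The classes are heavily imbalanced: about 70% of the cars are `unacc`, so a model that always predicts `unacc` would already be right about 70% of the time on the full data.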
Variable Importance
## persons safety
## 85.35115 69.87636
Create an rpart tree using only the safety and persons variables, and compare its predictions with the actual class values
## Confusion Matrix and Statistics
##
## Reference
## Prediction unacc acc good vgood
## unacc 516 82 0 0
## acc 47 142 0 0
## good 9 29 0 0
## vgood 0 39 0 0
##
## Overall Statistics
##
## Accuracy : 0.7616
## 95% CI : (0.7317, 0.7896)
## No Information Rate : 0.662
## P-Value [Acc > NIR] : 1.242e-10
##
## Kappa : 0.4904
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: unacc Class: acc Class: good Class: vgood
## Sensitivity 0.9021 0.4863 NA NA
## Specificity 0.7192 0.9178 0.95602 0.95486
## Pos Pred Value 0.8629 0.7513 NA NA
## Neg Pred Value 0.7895 0.7778 NA NA
## Prevalence 0.6620 0.3380 0.00000 0.00000
## Detection Rate 0.5972 0.1644 0.00000 0.00000
## Detection Prevalence 0.6921 0.2188 0.04398 0.04514
## Balanced Accuracy 0.8106 0.7021 NA NA
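The `NA` entries for `good` and `vgood` follow from the matrix itself: both reference columns are zero, so sensitivity becomes 0/0. A small sketch that recomputes a few of the statistics from the printed counts (the matrix is typed in by hand, not produced by the model):

```r
classes <- c("unacc", "acc", "good", "vgood")
cm <- matrix(c(516,  82, 0, 0,
                47, 142, 0, 0,
                 9,  29, 0, 0,
                 0,  39, 0, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

accuracy    <- sum(diag(cm)) / sum(cm)   # correct predictions / all cases
sensitivity <- diag(cm) / colSums(cm)    # 0 / 0 gives NaN for empty reference classes
prevalence  <- colSums(cm) / sum(cm)     # share of each class in the reference

round(accuracy, 4)
round(sensitivity, 4)
round(prevalence, 4)
```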
Variable Importance
## safety persons maint lug_boot buying
## 95.38608 91.93787 57.18082 48.12396 30.21811
Confusion matrix (rpart, all variables except doors)
## Confusion Matrix and Statistics
##
## Reference
## Prediction unacc acc good vgood
## unacc 576 15 3 0
## acc 14 164 18 3
## good 0 0 26 7
## vgood 0 8 0 30
##
## Overall Statistics
##
## Accuracy : 0.9213
## 95% CI : (0.9013, 0.9384)
## No Information Rate : 0.6829
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8349
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: unacc Class: acc Class: good Class: vgood
## Sensitivity 0.9763 0.8770 0.55319 0.75000
## Specificity 0.9343 0.9483 0.99143 0.99029
## Pos Pred Value 0.9697 0.8241 0.78788 0.78947
## Neg Pred Value 0.9481 0.9654 0.97473 0.98789
## Prevalence 0.6829 0.2164 0.05440 0.04630
## Detection Rate 0.6667 0.1898 0.03009 0.03472
## Detection Prevalence 0.6875 0.2303 0.03819 0.04398
## Balanced Accuracy 0.9553 0.9127 0.77231 0.87015
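The overall statistics for this matrix can be checked by hand. The sketch below recomputes accuracy, the no-information rate, and Cohen's kappa from the printed counts, using the standard definition of kappa as (observed agreement - chance agreement) / (1 - chance agreement). The variable names are illustrative.

```r
classes <- c("unacc", "acc", "good", "vgood")
cm <- matrix(c(576,  15,  3,  0,
                14, 164, 18,  3,
                 0,   0, 26,  7,
                 0,   8,  0, 30),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n           <- sum(cm)
observed    <- sum(diag(cm)) / n                      # overall accuracy
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
cohen_kappa <- (observed - expected) / (1 - expected)
nir         <- max(colSums(cm)) / n                   # share of the largest reference class

round(c(accuracy = observed, kappa = cohen_kappa, nir = nir), 4)
```

Kappa is noticeably lower than accuracy here because much of the raw agreement comes from the dominant `unacc` class.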
##
## Call:
## C5.0.default(x = train1[, -7], y = train1$Class.Value, rules = TRUE)
##
##
## C5.0 [Release 2.07 GPL Edition] Sun Oct 18 03:05:55 2015
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 864 cases (7 attributes) from undefined.data
##
## Rules:
##
## Rule 1: (292, lift 1.4)
## safety = low
## -> class unacc [0.997]
##
## Rule 2: (282, lift 1.4)
## persons = 2
## -> class unacc [0.996]
##
## Rule 3: (54, lift 1.4)
## buying = vhigh
## maint = high
## -> class unacc [0.982]
##
## Rule 4: (18, lift 1.3)
## doors = 2
## persons = more
## lug_boot = small
## -> class unacc [0.950]
##
## Rule 5: (48/2, lift 1.3)
## maint in {vhigh, high}
## lug_boot = small
## safety = med
## -> class unacc [0.940]
##
## Rule 6: (38/6, lift 1.2)
## maint in {vhigh, high}
## doors = 2
## lug_boot = med
## -> class unacc [0.825]
##
## Rule 7: (428/83, lift 1.1)
## buying in {vhigh, high}
## -> class unacc [0.805]
##
## Rule 8: (26, lift 4.4)
## buying in {vhigh, high}
## maint in {med, low}
## persons in {4, more}
## lug_boot = big
## safety in {med, high}
## -> class acc [0.964]
##
## Rule 9: (22, lift 4.4)
## buying = high
## maint in {high, med, low}
## persons in {4, more}
## lug_boot = big
## safety in {med, high}
## -> class acc [0.958]
##
## Rule 10: (45/2, lift 4.3)
## buying in {vhigh, high}
## maint in {med, low}
## persons in {4, more}
## safety = high
## -> class acc [0.936]
##
## Rule 11: (13, lift 4.3)
## buying = low
## maint = high
## persons in {4, more}
## safety = med
## -> class acc [0.933]
##
## Rule 12: (13, lift 4.3)
## buying = med
## maint = high
## persons in {4, more}
## safety = high
## -> class acc [0.933]
##
## Rule 13: (11, lift 4.2)
## buying = high
## maint = high
## persons in {4, more}
## safety = high
## -> class acc [0.923]
##
## Rule 14: (6, lift 4.0)
## buying in {vhigh, high}
## maint in {high, med, low}
## doors = 3
## persons = more
## lug_boot = med
## safety in {med, high}
## -> class acc [0.875]
##
## Rule 15: (4, lift 3.8)
## buying = low
## maint in {vhigh, high}
## doors = 2
## persons in {4, more}
## lug_boot = med
## safety in {med, high}
## -> class acc [0.833]
##
## Rule 16: (9/1, lift 3.8)
## buying in {vhigh, high}
## maint in {high, med, low}
## doors in {4, 5more}
## persons in {4, more}
## lug_boot = med
## safety = med
## -> class acc [0.818]
##
## Rule 17: (194/89, lift 2.5)
## buying in {med, low}
## persons in {4, more}
## safety in {med, high}
## -> class acc [0.541]
##
## Rule 18: (14/1, lift 26.1)
## buying = low
## maint in {med, low}
## persons in {4, more}
## lug_boot in {big, med}
## safety = med
## -> class good [0.875]
##
## Rule 19: (14/2, lift 24.2)
## buying in {med, low}
## maint = low
## persons in {4, more}
## lug_boot in {big, med}
## safety = med
## -> class good [0.813]
##
## Rule 20: (8/1, lift 23.8)
## buying in {med, low}
## maint = low
## persons in {4, more}
## lug_boot = small
## safety = high
## -> class good [0.800]
##
## Rule 21: (8/1, lift 23.8)
## buying = low
## maint in {med, low}
## persons in {4, more}
## lug_boot = small
## safety = high
## -> class good [0.800]
##
## Rule 22: (34/5, lift 20.0)
## buying in {med, low}
## maint in {med, low}
## persons in {4, more}
## lug_boot in {big, med}
## safety = high
## -> class vgood [0.833]
##
## Rule 23: (9/2, lift 17.5)
## buying = low
## maint = high
## persons in {4, more}
## lug_boot in {big, med}
## safety = high
## -> class vgood [0.727]
##
## Default class: unacc
##
##
## Evaluation on training data (864 cases):
##
## Rules
## ----------------
## No Errors
##
## 23 12( 1.4%) <<
##
##
## (a) (b) (c) (d) <-classified as
## ---- ---- ---- ----
## 607 3 1 (a): class unacc
## 1 183 2 2 (b): class acc
## 26 3 (c): class good
## 36 (d): class vgood
##
##
## Attribute usage:
##
## 71.99% buying
## 70.02% safety
## 66.20% persons
## 34.95% maint
## 25.58% lug_boot
## 8.22% doors
##
##
## Time: 0.0 secs
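Each rule header in the listing above has the form `(n/m, lift x)`, and each rule ends with a confidence in square brackets. The sketch below assumes, following the C5.0 documentation, that the confidence is the Laplace ratio (n - m + 1) / (n + 2), where n is the number of training cases covered and m the number of those misclassified, and that lift is the confidence divided by the training frequency of the predicted class. The class frequencies are the row totals of the training matrix above (611 `unacc`, 188 `acc`); the function names are hypothetical.

```r
# Laplace-corrected confidence for a rule covering n cases, m of them wrong.
rule_confidence <- function(n, m = 0) (n - m + 1) / (n + 2)

# Lift: confidence relative to the training frequency of the predicted class.
rule_lift <- function(confidence, class_freq) confidence / class_freq

train_n    <- 864
train_freq <- c(unacc = 611, acc = 188) / train_n

conf_rule1  <- rule_confidence(292)       # Rule 1: safety = low -> unacc
conf_rule7  <- rule_confidence(428, 83)   # Rule 7: buying in {vhigh, high} -> unacc
conf_rule17 <- rule_confidence(194, 89)   # Rule 17 -> acc

round(c(conf_rule1, conf_rule7, conf_rule17), 3)           # 0.997 0.805 0.541
round(rule_lift(conf_rule1,  train_freq[["unacc"]]), 1)    # 1.4
round(rule_lift(conf_rule17, train_freq[["acc"]]), 1)      # 2.5
```

Under these assumptions the recomputed values agree with the printed ones, which also explains why rules for the rare `good` and `vgood` classes show lifts above 17: their training frequencies are only a few percent.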
## Confusion Matrix and Statistics
##
## Reference
## Prediction unacc acc good vgood
## unacc 596 3 0 0
## acc 3 184 7 2
## good 0 0 34 6
## vgood 0 0 0 29
##
## Overall Statistics
##
## Accuracy : 0.9757
## 95% CI : (0.9631, 0.9849)
## No Information Rate : 0.6933
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9479
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: unacc Class: acc Class: good Class: vgood
## Sensitivity 0.9950 0.9840 0.82927 0.78378
## Specificity 0.9887 0.9823 0.99271 1.00000
## Pos Pred Value 0.9950 0.9388 0.85000 1.00000
## Neg Pred Value 0.9887 0.9955 0.99150 0.99042
## Prevalence 0.6933 0.2164 0.04745 0.04282
## Detection Rate 0.6898 0.2130 0.03935 0.03356
## Detection Prevalence 0.6933 0.2269 0.04630 0.03356
## Balanced Accuracy 0.9918 0.9831 0.91099 0.89189
##
## Call:
## C5.0.default(x = train1[, c(-3, -5, -7)], y = train1$Class.Value)
##
##
## C5.0 [Release 2.07 GPL Edition] Sun Oct 18 03:05:55 2015
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 864 cases (5 attributes) from undefined.data
##
## Decision tree:
##
## safety = low: unacc (292)
## safety in {med,high}:
## :...persons = 2: unacc (195)
## persons in {4,more}:
## :...buying in {vhigh,high}:
## :...maint = vhigh: unacc (45)
## : maint in {high,med,low}:
## : :...safety = med: unacc (71/29)
## : safety = high:
## : :...maint in {med,low}: acc (45/2)
## : maint = high:
## : :...buying = vhigh: unacc (11)
## : buying = high: acc (11)
## buying in {med,low}:
## :...safety = high:
## :...maint = vhigh: acc (29/1)
## : maint in {med,low}: vgood (48/19)
## : maint = high:
## : :...buying = med: acc (13)
## : buying = low: vgood (11/4)
## safety = med:
## :...buying = med: acc (46/19)
## buying = low:
## :...maint = high: acc (13)
## maint in {med,low}: good (23/10)
## maint = vhigh:
## :...persons = 4: unacc (7/2)
## persons = more: acc (4)
##
##
## Evaluation on training data (864 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 16 86(10.0%) <<
##
##
## (a) (b) (c) (d) <-classified as
## ---- ---- ---- ----
## 590 18 2 1 (a): class unacc
## 31 139 8 10 (b): class acc
## 4 13 12 (c): class good
## 36 (d): class vgood
##
##
## Attribute usage:
##
## 100.00% safety
## 66.20% persons
## 43.63% buying
## 38.31% maint
##
##
## Time: 0.0 secs
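The attribute usage figures can be traced back to the leaf counts of the tree. The sketch below assumes that usage is the share of training cases that reach a node testing the attribute; the counts are read from the tree listing above, and the variable names are illustrative.

```r
train_n <- 864

# Cases that leave the tree before a given test, read from the leaf counts.
n_safety_low            <- 292   # safety = low: unacc (292)
n_persons_2             <- 195   # persons = 2: unacc (195)
n_buying_med_safety_med <- 46    # buying = med: acc (46/19), reached without a maint test

reach_safety  <- train_n                                  # root test, every case
reach_persons <- reach_safety - n_safety_low              # 572
reach_buying  <- reach_persons - n_persons_2              # 377
reach_maint   <- reach_buying - n_buying_med_safety_med   # 331

usage <- c(safety = reach_safety, persons = reach_persons,
           buying = reach_buying, maint = reach_maint) / train_n
round(100 * usage, 2)
```

Under that assumption the result matches the printed 100.00%, 66.20%, 43.63% and 38.31%. The same quantities also give the training error: 86 misclassified cases out of 864 is about 10.0%.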
## Confusion Matrix and Statistics
##
## Reference
## Prediction unacc acc good vgood
## unacc 575 20 0 4
## acc 40 134 12 10
## good 0 9 13 18
## vgood 0 0 0 29
##
## Overall Statistics
##
## Accuracy : 0.8692
## 95% CI : (0.8449, 0.891)
## No Information Rate : 0.7118
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7157
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: unacc Class: acc Class: good Class: vgood
## Sensitivity 0.9350 0.8221 0.52000 0.47541
## Specificity 0.9036 0.9116 0.96782 1.00000
## Pos Pred Value 0.9599 0.6837 0.32500 1.00000
## Neg Pred Value 0.8491 0.9566 0.98544 0.96168
## Prevalence 0.7118 0.1887 0.02894 0.07060
## Detection Rate 0.6655 0.1551 0.01505 0.03356
## Detection Prevalence 0.6933 0.2269 0.04630 0.03356
## Balanced Accuracy 0.9193 0.8668 0.74391 0.73770
| Algorithm | Variables | Accuracy |
|---|---|---|
| rpart | safety + persons | 0.7616 |
| rpart | all | 0.941 |
| rpart | all except doors | 0.9213 |
| C5.0 rules | all | 0.9757 |
| C5.0 tree | buying + maint + safety + persons | 0.8692 |
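For convenience, the comparison can also be held in a data frame and sorted. This is only a sketch: the accuracies are copied from the table, and the column names are illustrative.

```r
results <- data.frame(
  algorithm = c("rpart", "rpart", "rpart", "C5.0 rules", "C5.0 tree"),
  variables = c("safety + persons", "all", "all except doors", "all",
                "buying + maint + safety + persons"),
  accuracy  = c(0.7616, 0.941, 0.9213, 0.9757, 0.8692),
  stringsAsFactors = FALSE
)

# Sort from most to least accurate and pick the best model.
ranked <- results[order(results$accuracy, decreasing = TRUE), ]
best   <- ranked[1, ]
ranked
```

On these figures the C5.0 rule set on all variables ranks first, and the rpart tree restricted to safety and persons ranks last.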