Introduction:

Dataset: Car Evaluation. This dataset is taken from the UCI Machine Learning Repository and was derived from a simple hierarchical decision model.

Total number of observations: 1728

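Input variables: buying (buying price), maint (maintenance price), doors (number of doors), persons (seating capacity), lug_boot (luggage boot size), safety (car safety)

Output variable: Class.Value (car acceptability: unacc, acc, good, vgood)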

Car Evaluation attributes and their types

## 'data.frame':    1728 obs. of  7 variables:
##  $ buying     : Factor w/ 4 levels "high","low","med",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ maint      : Factor w/ 4 levels "high","low","med",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ doors      : Factor w/ 4 levels "2","3","4","5more": 1 1 1 1 1 1 1 1 1 1 ...
##  $ persons    : Factor w/ 3 levels "2","4","more": 1 1 1 1 1 1 1 1 1 2 ...
##  $ lug_boot   : Factor w/ 3 levels "big","med","small": 3 3 3 2 2 2 1 1 1 3 ...
##  $ safety     : Factor w/ 3 levels "high","low","med": 2 3 1 2 3 1 2 3 1 2 ...
##  $ Class.Value: Factor w/ 4 levels "acc","good","unacc",..: 3 3 3 3 3 3 3 3 3 3 ...

All variables are factors (categorical).
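According to the UCI description, the instances completely cover the attribute space, so the 1728 observations are exactly the 4 × 4 × 4 × 3 × 3 × 3 combinations of the six input levels. A minimal R check, assuming the level sets implied by the str() output above (the truncated fourth level of buying and maint is vhigh, which also appears in the C5.0 rules below):

```r
# Level sets of the six input attributes of the Car Evaluation data
car_levels <- list(
  buying   = c("vhigh", "high", "med", "low"),
  maint    = c("vhigh", "high", "med", "low"),
  doors    = c("2", "3", "4", "5more"),
  persons  = c("2", "4", "more"),
  lug_boot = c("small", "med", "big"),
  safety   = c("low", "med", "high")
)

# expand.grid() enumerates every combination; character vectors become factors
car_grid <- expand.grid(car_levels, stringsAsFactors = TRUE)

nrow(car_grid)                 # 1728
prod(lengths(car_levels))      # 1728, computed directly from the level counts
anyDuplicated(car_grid) == 0   # TRUE: each combination occurs exactly once
```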

Car acceptability grouping

Car evaluation based on various parameters

Seating capacity vs Car Acceptability

Conclusion: Seating capacity is an important factor for customers in accepting or rejecting a car

Car safety vs Car Acceptability

Conclusion: Safety is an important factor for customers in accepting or rejecting a car

Low-safety cars are not accepted by customers.

Buying price vs Car Acceptability

Maintenance price vs Car Acceptability

Number of doors vs Car Acceptability

Luggage boot size vs Car Acceptability
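The stacked-bar comparisons above rest on a cross-tabulation of each attribute against the class, normalised within each attribute level. A minimal sketch of that table; the helper name class_share_by is hypothetical:

```r
# Share of each acceptability class within each level of one attribute.
# These row proportions are what a 100% stacked bar chart displays.
class_share_by <- function(df, attribute, outcome = "Class.Value") {
  counts <- table(df[[attribute]], df[[outcome]],
                  dnn = c(attribute, outcome))
  prop.table(counts, margin = 1)   # each row (attribute level) sums to 1
}
```

With the data loaded into a data frame called car (a hypothetical name), class_share_by(car, "safety") gives the proportions behind the safety plot, and barplot(t(class_share_by(car, "safety")), legend.text = TRUE) draws them as stacked bars with one bar per safety level.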

Read the car evaluation data from car_data.csv
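A minimal sketch of this step, assuming car_data.csv has a header row with the column names shown in the str() output (buying, maint, doors, persons, lug_boot, safety, Class.Value); the helper name read_car_data is hypothetical:

```r
# Hypothetical helper: read the car data with every column as a factor.
read_car_data <- function(path = "car_data.csv") {
  # colClasses = "factor" is recycled to all columns, so doors and persons
  # are read as factors even in a file where they happen to look numeric
  car <- read.csv(path, colClasses = "factor")
  # Order the outcome from worst to best instead of alphabetically
  car$Class.Value <- factor(car$Class.Value,
                            levels = c("unacc", "acc", "good", "vgood"))
  car
}
```

Reordering the Class.Value levels is optional, but it makes tables and plots list the classes as unacc, acc, good, vgood, as in the class counts below.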

Split the data 50:50 into training and test sets
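A simple random 50:50 split can be sketched as below; this is one possible implementation, not necessarily the exact procedure used in this report, and split_half is a hypothetical name:

```r
# Random 50:50 split into training and test halves.
# set.seed() makes the split reproducible.
split_half <- function(df, seed = 1) {
  set.seed(seed)
  idx <- sample(nrow(df), size = floor(nrow(df) / 2))
  list(train = df[idx, , drop = FALSE],
       test  = df[-idx, , drop = FALSE])
}
```

On the 1728 cars this gives 864 training and 864 test cases, matching the "Read 864 cases" lines in the C5.0 output. Reusing one split for every model keeps their test accuracies directly comparable; a class-stratified alternative is caret::createDataPartition(car$Class.Value, p = 0.5, list = FALSE).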

Because the output variable is multiclass (four classes), three tree-based classifiers were used: rpart decision trees, C5.0 rule sets and C5.0 decision trees.

Algorithm 1: RPART TREE

Number and proportion of each class value: unacceptable (unacc), acceptable (acc), good and very good (vgood)

## 
## unacc   acc  good vgood 
##  1210   384    69    65
## 
##      unacc        acc       good      vgood 
## 0.70023148 0.22222222 0.03993056 0.03761574
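With the data loaded, table() and prop.table() on Class.Value produce the two outputs above. The sketch below rebuilds the outcome from the printed counts so it runs on its own; it also shows the majority-class share, which is the accuracy of a trivial model that always predicts unacc:

```r
# Rebuild the outcome vector from the class counts printed above
class_value <- factor(rep(c("unacc", "acc", "good", "vgood"),
                          times = c(1210, 384, 69, 65)),
                      levels = c("unacc", "acc", "good", "vgood"))

counts <- table(class_value)
props  <- prop.table(counts)

counts           # 1210, 384, 69, 65
round(props, 4)  # 0.7002, 0.2222, 0.0399, 0.0376
max(props)       # share of the majority class (unacc)
```

Because about 70% of the cars are unacc, any model's accuracy should be judged against this baseline rather than against 0.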

——————————————————————————————-

Rpart 1: build an rpart tree using only the safety and persons variables (chosen from the stacked-bar plots above)

Variable Importance

##  persons   safety 
## 85.35115 69.87636
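The modelling code is not shown in the report. A minimal sketch of how a two-variable tree and its importance scores could be produced with the rpart package, assuming a training data frame with a factor Class.Value (the function name is hypothetical):

```r
library(rpart)  # recommended package distributed with R

# Rpart 1: classification tree on safety and persons only.
# Returns the fitted tree and its variable importance scores.
fit_rpart_two_vars <- function(train) {
  fit <- rpart(Class.Value ~ safety + persons, data = train,
               method = "class")
  list(model = fit, importance = fit$variable.importance)
}
```

For example, rp1 <- fit_rpart_two_vars(train) followed by predict(rp1$model, newdata = test, type = "class") gives test-set predictions, where train and test are hypothetical names for the two halves. rpart's importance also credits variables that act as surrogate splits, so the scores are relative measures rather than a list of primary splits.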

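Predict the test set with this two-variable tree and compare the predictions with the actual classes.

Note: the labels of this and the later confusion matrices appear to be swapped, i.e. the actual classes seem to have been passed to caret's confusionMatrix() as data and the predictions as reference (the function expects the reverse). Two signs of this: here the "Reference" columns contain no good or vgood cars, which fits predictions from a tree on safety and persons but not a random half of a dataset with 69 good and 65 vgood cars; and in the C5.0 matrices the "Prediction" rows sum to 599, 196, 40 and 29, exactly the full-data class counts minus the training counts reported by C5.0. Accuracy and Kappa do not depend on this orientation, but Sensitivity and Pos Pred Value, Specificity and Neg Pred Value, and Prevalence and Detection Prevalence are interchanged, and the No Information Rate is taken from the wrong margin.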

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction unacc acc good vgood
##      unacc   516  82    0     0
##      acc      47 142    0     0
##      good      9  29    0     0
##      vgood     0  39    0     0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7616          
##                  95% CI : (0.7317, 0.7896)
##     No Information Rate : 0.662           
##     P-Value [Acc > NIR] : 1.242e-10       
##                                           
##                   Kappa : 0.4904          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: unacc Class: acc Class: good Class: vgood
## Sensitivity                0.9021     0.4863          NA           NA
## Specificity                0.7192     0.9178     0.95602      0.95486
## Pos Pred Value             0.8629     0.7513          NA           NA
## Neg Pred Value             0.7895     0.7778          NA           NA
## Prevalence                 0.6620     0.3380     0.00000      0.00000
## Detection Rate             0.5972     0.1644     0.00000      0.00000
## Detection Prevalence       0.6921     0.2188     0.04398      0.04514
## Balanced Accuracy          0.8106     0.7021          NA           NA
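Accuracy and Kappa can be reproduced from the matrix itself, which also shows why they do not depend on which margin holds the true classes. For reference, caret's confusionMatrix(data, reference) expects the predicted classes as data and the true classes as reference. A base-R sketch using the Rpart 1 counts printed above:

```r
# Rpart 1 confusion matrix as printed above
cm <- matrix(c(516,  82, 0, 0,
                47, 142, 0, 0,
                 9,  29, 0, 0,
                 0,  39, 0, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(c("unacc", "acc", "good", "vgood"),
                             c("unacc", "acc", "good", "vgood")))

# Fraction of cases on the diagonal
accuracy <- function(m) sum(diag(m)) / sum(m)

# Cohen's kappa: observed agreement corrected for the agreement expected
# by chance from the row and column totals
cohen_kappa <- function(m) {
  n  <- sum(m)
  po <- sum(diag(m)) / n
  pe <- sum(rowSums(m) * colSums(m)) / n^2
  (po - pe) / (1 - pe)
}

round(accuracy(cm), 4)        # 0.7616
round(cohen_kappa(cm), 4)     # 0.4904
round(cohen_kappa(t(cm)), 4)  # 0.4904: transposing changes nothing
```

Per-class measures such as sensitivity divide the diagonal by a row or column total, so unlike these two statistics they do change when the matrix is transposed.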

Rpart 3: the same procedure as above, with the doors variable removed from the data

Variable Importance

##   safety  persons    maint lug_boot   buying 
## 95.38608 91.93787 57.18082 48.12396 30.21811
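Dropping a predictor before refitting, as in Rpart 3, can be sketched by removing the column and using the ~ . formula on the remaining columns; the function and its exclude parameter are hypothetical names:

```r
library(rpart)

# Rpart 3: remove the excluded column(s), then grow a tree on all
# remaining inputs. `train` must contain a factor Class.Value.
fit_rpart_without <- function(train, exclude = "doors") {
  reduced <- train[, setdiff(names(train), exclude), drop = FALSE]
  rpart(Class.Value ~ ., data = reduced, method = "class")
}
```

Called as fit_rpart_without(train), the fitted tree's variable.importance can only mention buying, maint, persons, lug_boot and safety, as in the scores above.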

Confusion matrix

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction unacc acc good vgood
##      unacc   576  15    3     0
##      acc      14 164   18     3
##      good      0   0   26     7
##      vgood     0   8    0    30
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9213          
##                  95% CI : (0.9013, 0.9384)
##     No Information Rate : 0.6829          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8349          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: unacc Class: acc Class: good Class: vgood
## Sensitivity                0.9763     0.8770     0.55319      0.75000
## Specificity                0.9343     0.9483     0.99143      0.99029
## Pos Pred Value             0.9697     0.8241     0.78788      0.78947
## Neg Pred Value             0.9481     0.9654     0.97473      0.98789
## Prevalence                 0.6829     0.2164     0.05440      0.04630
## Detection Rate             0.6667     0.1898     0.03009      0.03472
## Detection Prevalence       0.6875     0.2303     0.03819      0.04398
## Balanced Accuracy          0.9553     0.9127     0.77231      0.87015

Algorithm 2: C5.0 RULES

C5.0 rule set built with all six input variables

## 
## Call:
## C5.0.default(x = train1[, -7], y = train1$Class.Value, rules = TRUE)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Oct 18 03:05:55 2015
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 864 cases (7 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (292, lift 1.4)
##  safety = low
##  ->  class unacc  [0.997]
## 
## Rule 2: (282, lift 1.4)
##  persons = 2
##  ->  class unacc  [0.996]
## 
## Rule 3: (54, lift 1.4)
##  buying = vhigh
##  maint = high
##  ->  class unacc  [0.982]
## 
## Rule 4: (18, lift 1.3)
##  doors = 2
##  persons = more
##  lug_boot = small
##  ->  class unacc  [0.950]
## 
## Rule 5: (48/2, lift 1.3)
##  maint in {vhigh, high}
##  lug_boot = small
##  safety = med
##  ->  class unacc  [0.940]
## 
## Rule 6: (38/6, lift 1.2)
##  maint in {vhigh, high}
##  doors = 2
##  lug_boot = med
##  ->  class unacc  [0.825]
## 
## Rule 7: (428/83, lift 1.1)
##  buying in {vhigh, high}
##  ->  class unacc  [0.805]
## 
## Rule 8: (26, lift 4.4)
##  buying in {vhigh, high}
##  maint in {med, low}
##  persons in {4, more}
##  lug_boot = big
##  safety in {med, high}
##  ->  class acc  [0.964]
## 
## Rule 9: (22, lift 4.4)
##  buying = high
##  maint in {high, med, low}
##  persons in {4, more}
##  lug_boot = big
##  safety in {med, high}
##  ->  class acc  [0.958]
## 
## Rule 10: (45/2, lift 4.3)
##  buying in {vhigh, high}
##  maint in {med, low}
##  persons in {4, more}
##  safety = high
##  ->  class acc  [0.936]
## 
## Rule 11: (13, lift 4.3)
##  buying = low
##  maint = high
##  persons in {4, more}
##  safety = med
##  ->  class acc  [0.933]
## 
## Rule 12: (13, lift 4.3)
##  buying = med
##  maint = high
##  persons in {4, more}
##  safety = high
##  ->  class acc  [0.933]
## 
## Rule 13: (11, lift 4.2)
##  buying = high
##  maint = high
##  persons in {4, more}
##  safety = high
##  ->  class acc  [0.923]
## 
## Rule 14: (6, lift 4.0)
##  buying in {vhigh, high}
##  maint in {high, med, low}
##  doors = 3
##  persons = more
##  lug_boot = med
##  safety in {med, high}
##  ->  class acc  [0.875]
## 
## Rule 15: (4, lift 3.8)
##  buying = low
##  maint in {vhigh, high}
##  doors = 2
##  persons in {4, more}
##  lug_boot = med
##  safety in {med, high}
##  ->  class acc  [0.833]
## 
## Rule 16: (9/1, lift 3.8)
##  buying in {vhigh, high}
##  maint in {high, med, low}
##  doors in {4, 5more}
##  persons in {4, more}
##  lug_boot = med
##  safety = med
##  ->  class acc  [0.818]
## 
## Rule 17: (194/89, lift 2.5)
##  buying in {med, low}
##  persons in {4, more}
##  safety in {med, high}
##  ->  class acc  [0.541]
## 
## Rule 18: (14/1, lift 26.1)
##  buying = low
##  maint in {med, low}
##  persons in {4, more}
##  lug_boot in {big, med}
##  safety = med
##  ->  class good  [0.875]
## 
## Rule 19: (14/2, lift 24.2)
##  buying in {med, low}
##  maint = low
##  persons in {4, more}
##  lug_boot in {big, med}
##  safety = med
##  ->  class good  [0.813]
## 
## Rule 20: (8/1, lift 23.8)
##  buying in {med, low}
##  maint = low
##  persons in {4, more}
##  lug_boot = small
##  safety = high
##  ->  class good  [0.800]
## 
## Rule 21: (8/1, lift 23.8)
##  buying = low
##  maint in {med, low}
##  persons in {4, more}
##  lug_boot = small
##  safety = high
##  ->  class good  [0.800]
## 
## Rule 22: (34/5, lift 20.0)
##  buying in {med, low}
##  maint in {med, low}
##  persons in {4, more}
##  lug_boot in {big, med}
##  safety = high
##  ->  class vgood  [0.833]
## 
## Rule 23: (9/2, lift 17.5)
##  buying = low
##  maint = high
##  persons in {4, more}
##  lug_boot in {big, med}
##  safety = high
##  ->  class vgood  [0.727]
## 
## Default class: unacc
## 
## 
## Evaluation on training data (864 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      23   12( 1.4%)   <<
## 
## 
##     (a)   (b)   (c)   (d)    <-classified as
##    ----  ----  ----  ----
##     607     3     1          (a): class unacc
##       1   183     2     2    (b): class acc
##                  26     3    (c): class good
##                        36    (d): class vgood
## 
## 
##  Attribute usage:
## 
##   71.99% buying
##   70.02% safety
##   66.20% persons
##   34.95% maint
##   25.58% lug_boot
##    8.22% doors
## 
## 
## Time: 0.0 secs
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction unacc acc good vgood
##      unacc   596   3    0     0
##      acc       3 184    7     2
##      good      0   0   34     6
##      vgood     0   0    0    29
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9757          
##                  95% CI : (0.9631, 0.9849)
##     No Information Rate : 0.6933          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9479          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: unacc Class: acc Class: good Class: vgood
## Sensitivity                0.9950     0.9840     0.82927      0.78378
## Specificity                0.9887     0.9823     0.99271      1.00000
## Pos Pred Value             0.9950     0.9388     0.85000      1.00000
## Neg Pred Value             0.9887     0.9955     0.99150      0.99042
## Prevalence                 0.6933     0.2164     0.04745      0.04282
## Detection Rate             0.6898     0.2130     0.03935      0.03356
## Detection Prevalence       0.6933     0.2269     0.04630      0.03356
## Balanced Accuracy          0.9918     0.9831     0.91099      0.89189
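The numbers attached to each rule follow C5.0's documented conventions: "(n/e, lift x)" means the rule covers n training cases, e of which it misclassifies (e is omitted when 0); the bracketed confidence is the Laplace ratio (n - e + 1)/(n + 2); and lift is that confidence divided by the relative frequency of the predicted class in the training data. The sketch below recomputes a few values from the output above; the function names are illustrative:

```r
# Laplace confidence of a rule covering n training cases with e errors
laplace_confidence <- function(n, e = 0) (n - e + 1) / (n + 2)

# Lift: rule confidence relative to the training frequency of its class
rule_lift <- function(n, e, class_count, n_train = 864) {
  laplace_confidence(n, e) / (class_count / n_train)
}

# Training-set class counts: row totals of the C5.0 evaluation matrix above
train_counts <- c(unacc = 611, acc = 188, good = 29, vgood = 36)

round(laplace_confidence(292, 0), 3)                # Rule 1: 0.997
round(laplace_confidence(428, 83), 3)               # Rule 7: 0.805
round(rule_lift(14, 1, train_counts[["good"]]), 1)  # Rule 18: 26.1
```

The high lifts of the good and vgood rules reflect how rare those classes are (29 and 36 of 864 training cases), not necessarily higher confidence than the unacc rules.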

C5.0 tree with four variables: buying price, maintenance price, safety and seating capacity (persons)

## 
## Call:
## C5.0.default(x = train1[, c(-3, -5, -7)], y = train1$Class.Value)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Oct 18 03:05:55 2015
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 864 cases (5 attributes) from undefined.data
## 
## Decision tree:
## 
## safety = low: unacc (292)
## safety in {med,high}:
## :...persons = 2: unacc (195)
##     persons in {4,more}:
##     :...buying in {vhigh,high}:
##         :...maint = vhigh: unacc (45)
##         :   maint in {high,med,low}:
##         :   :...safety = med: unacc (71/29)
##         :       safety = high:
##         :       :...maint in {med,low}: acc (45/2)
##         :           maint = high:
##         :           :...buying = vhigh: unacc (11)
##         :               buying = high: acc (11)
##         buying in {med,low}:
##         :...safety = high:
##             :...maint = vhigh: acc (29/1)
##             :   maint in {med,low}: vgood (48/19)
##             :   maint = high:
##             :   :...buying = med: acc (13)
##             :       buying = low: vgood (11/4)
##             safety = med:
##             :...buying = med: acc (46/19)
##                 buying = low:
##                 :...maint = high: acc (13)
##                     maint in {med,low}: good (23/10)
##                     maint = vhigh:
##                     :...persons = 4: unacc (7/2)
##                         persons = more: acc (4)
## 
## 
## Evaluation on training data (864 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##      16   86(10.0%)   <<
## 
## 
##     (a)   (b)   (c)   (d)    <-classified as
##    ----  ----  ----  ----
##     590    18     2     1    (a): class unacc
##      31   139     8    10    (b): class acc
##             4    13    12    (c): class good
##                        36    (d): class vgood
## 
## 
##  Attribute usage:
## 
##  100.00% safety
##   66.20% persons
##   43.63% buying
##   38.31% maint
## 
## 
## Time: 0.0 secs
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction unacc acc good vgood
##      unacc   575  20    0     4
##      acc      40 134   12    10
##      good      0   9   13    18
##      vgood     0   0    0    29
## 
## Overall Statistics
##                                          
##                Accuracy : 0.8692         
##                  95% CI : (0.8449, 0.891)
##     No Information Rate : 0.7118         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.7157         
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: unacc Class: acc Class: good Class: vgood
## Sensitivity                0.9350     0.8221     0.52000      0.47541
## Specificity                0.9036     0.9116     0.96782      1.00000
## Pos Pred Value             0.9599     0.6837     0.32500      1.00000
## Neg Pred Value             0.8491     0.9566     0.98544      0.96168
## Prevalence                 0.7118     0.1887     0.02894      0.07060
## Detection Rate             0.6655     0.1551     0.01505      0.03356
## Detection Prevalence       0.6933     0.2269     0.04630      0.03356
## Balanced Accuracy          0.9193     0.8668     0.74391      0.73770
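The tree's training error can be traced back to its leaves: a leaf annotated "(n)" or "(n/e)" is reached by n training cases, e of which are misclassified. The sketch below transcribes the 16 leaf annotations by hand, in tree order, and recomputes the summary line:

```r
# Leaf annotations of the C5.0 tree above (e = 0 where no "/e" is shown)
leaf_n <- c(292, 195, 45, 71, 45, 11, 11, 29, 48, 13, 11, 46, 13, 23, 7, 4)
leaf_e <- c(  0,   0,  0, 29,  2,  0,  0,  1, 19,  0,  4, 19,  0, 10, 2, 0)

length(leaf_n)                             # 16 leaves, the reported "Size"
sum(leaf_n)                                # 864 training cases
sum(leaf_e)                                # 86 errors
round(100 * sum(leaf_e) / sum(leaf_n), 1)  # about 10, reported as 10.0%
```

The largest single contributor is the (71/29) leaf; in the C5.0 rule set above, rules such as Rule 8 separate acceptable cars in that region using lug_boot, which this four-variable tree does not use.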

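SUMMARY

Algorithm     Variables                            Accuracy
rpart         safety + persons                     0.7616
rpart         all                                  0.941
rpart         all except doors                     0.9213
C5.0 rules    all                                  0.9757
C5.0 tree     buying + maint + safety + persons    0.8692

Note: the margins of the confusion matrices differ between the rpart and C5.0 runs, so the models were apparently evaluated on different random 50:50 splits; small accuracy differences should therefore be read with caution.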

INFERENCE

  • All variables are important for customers in deciding whether a car is acceptable or unacceptable
  • Safety and seating capacity are the two main factors for rejecting a car as unacceptable
  • The number of doors is the least important variable in deciding the class value of a car