Introduction:

Data set: Car Evaluation. The Car Evaluation data set is taken from the UCI Machine Learning Repository and was derived from a simple hierarchical decision model.

Total number of observations: 1728

Input variables:

Buying price (vhigh, high, med, low)
Price of the maintenance (vhigh, high, med, low)
Number of doors (2, 3, 4, 5more)
Capacity in terms of persons to carry (2, 4, more)
Size of luggage boot (small, med, big)
Estimated safety of the car (low, med, high)
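The six input attributes and their levels imply a fixed number of possible cars. As a quick sanity check, the sketch below (Python, standard library only; the dictionary name `levels` and the short keys are illustrative choices, not taken from the original analysis code) multiplies out the level counts. The result equals the 1728 observations reported above, which is consistent with the data set holding one row per attribute combination.

```python
from itertools import product
from math import prod

# Attribute levels as listed above; the keys are short illustrative names.
levels = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Number of distinct attribute combinations: 4 * 4 * 4 * 3 * 3 * 3.
n_combinations = prod(len(v) for v in levels.values())

# Enumerate every combination explicitly as a cross-check.
all_cars = list(product(*levels.values()))

print(n_combinations, len(all_cars))
```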

Output Variable:

Car acceptability (unacc, acc, good, vgood)

Car_Evaluation attributes and their types

## 'data.frame':    1728 obs. of  7 variables:
##  $ buying     : Factor w/ 4 levels "high","low","med",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ maint      : Factor w/ 4 levels "high","low","med",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ doors      : Factor w/ 4 levels "2","3","4","5more": 1 1 1 1 1 1 1 1 1 1 ...
##  $ persons    : Factor w/ 3 levels "2","4","more": 1 1 1 1 1 1 1 1 1 2 ...
##  $ lug_boot   : Factor w/ 3 levels "big","med","small": 3 3 3 2 2 2 1 1 1 3 ...
##  $ safety     : Factor w/ 3 levels "high","low","med": 2 3 1 2 3 1 2 3 1 2 ...
##  $ Class.Value: Factor w/ 4 levels "acc","good","unacc",..: 3 3 3 3 3 3 3 3 3 3 ...

All the variables are factor variables

Car acceptability grouping

Car evaluation based on various parameters

Seating capacity vs Car Acceptability

Conclusion: Seating capacity is an important factor for customers in accepting or rejecting a car

Car safety vs Car Acceptability

Conclusion: Safety is an important factor for customers in accepting or rejecting a car

Low-safety cars are not accepted by customers

Buying price vs Car Acceptability

Maintenance price vs Car Acceptability

Number of doors vs Car Acceptability

Luggage boot size vs Car Acceptability

Read the car evaluation data from car_data.csv

Split the data 50:50 for training and testing
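A 50:50 split of 1728 rows leaves 864 rows in each half. How the rows were assigned to the two halves is not stated here, so the following is only a minimal sketch of one common approach, a seeded random shuffle of row indices. The function name `split_half` and the seed value are assumptions made for illustration.

```python
import random

def split_half(n_rows, seed=42):
    """Return (train_idx, test_idx): a shuffled 50:50 split of row indices."""
    indices = list(range(n_rows))
    rng = random.Random(seed)  # fixed seed (arbitrary) so the split is reproducible
    rng.shuffle(indices)
    cut = n_rows // 2
    return sorted(indices[:cut]), sorted(indices[cut:])

train_idx, test_idx = split_half(1728)
print(len(train_idx), len(test_idx))
```

When some classes are rare, a stratified split (sampling each class separately) is often preferred, so that every class appears in both halves.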

Since the output variable is multiclass, the C50 rules, C50 tree and rpart tree algorithms are used to build the prediction models

Algorithm 1: RPART TREE

Number and proportion of class values (unacc, acc, good, vgood)

## 
## unacc   acc  good vgood 
##  1210   384    69    65

## 
##      unacc        acc       good      vgood 
## 0.70023148 0.22222222 0.03993056 0.03761574
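The proportions are each class count divided by the total of 1728. A short Python sketch of that arithmetic, using the counts printed above (the variable names are illustrative):

```python
# Class counts as printed in the first table above.
counts = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}

total = sum(counts.values())  # equals the number of observations

# Proportion of each class, comparable to the second table.
proportions = {label: n / total for label, n in counts.items()}

for label, p in proportions.items():
    print(f"{label:>6}: {p:.8f}")
```

About 70% of the cars are unacc, so a model that always predicts unacc already reaches roughly 70% accuracy on the full data. That is the baseline any tree has to beat.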

——————————————————————————————-

Rpart 1 - Build an rpart tree with the safety and persons variables (chosen based on the stacked bars)

Variable Importance

##  persons   safety 
## 85.35115 69.87636

Create an rpart tree with the safety and persons variables and compare its predictions with the original data

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction unacc acc good vgood
##      unacc   516  82    0     0
##      acc      47 142    0     0
##      good      9  29    0     0
##      vgood     0  39    0     0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7616          
##                  95% CI : (0.7317, 0.7896)
##     No Information Rate : 0.662           
##     P-Value [Acc > NIR] : 1.242e-10       
##                                           
##                   Kappa : 0.4904          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: unacc Class: acc Class: good Class: vgood
## Sensitivity                0.9021     0.4863          NA           NA
## Specificity                0.7192     0.9178     0.95602      0.95486
## Pos Pred Value             0.8629     0.7513          NA           NA
## Neg Pred Value             0.7895     0.7778          NA           NA
## Prevalence                 0.6620     0.3380     0.00000      0.00000
## Detection Rate             0.5972     0.1644     0.00000      0.00000
## Detection Prevalence       0.6921     0.2188     0.04398      0.04514
## Balanced Accuracy          0.8106     0.7021          NA           NA
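The headline statistics can be re-derived by hand from the confusion matrix, which helps in reading the output. The sketch below is plain Python, not the code that produced the report; the matrix is copied from the output above (rows are predictions, columns are reference classes) and all names are illustrative. It computes accuracy, the no-information rate, Cohen's kappa and per-class sensitivity, using `None` where a class has no reference cases, which corresponds to the NA entries for good and vgood.

```python
labels = ["unacc", "acc", "good", "vgood"]

# Rows: predicted class, columns: reference (actual) class.
matrix = [
    [516,  82, 0, 0],  # predicted unacc
    [ 47, 142, 0, 0],  # predicted acc
    [  9,  29, 0, 0],  # predicted good
    [  0,  39, 0, 0],  # predicted vgood
]

n = sum(sum(row) for row in matrix)
row_totals = [sum(row) for row in matrix]        # predictions per class
col_totals = [sum(col) for col in zip(*matrix)]  # reference cases per class

# Accuracy: share of cases on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / n

# No-information rate: share of the most frequent reference class.
nir = max(col_totals) / n

# Cohen's kappa: agreement beyond what the marginal totals give by chance.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
kappa = (accuracy - expected) / (1 - expected)

# Sensitivity: correct predictions / reference cases (None if no cases).
sensitivity = {
    lab: (matrix[i][i] / col_totals[i] if col_totals[i] else None)
    for i, lab in enumerate(labels)
}

print(round(accuracy, 4), round(nir, 4), round(kappa, 4))
print(sensitivity)
```

The test half contains no good or vgood cars at all (prevalence 0), so every good or vgood prediction is necessarily wrong and several per-class statistics for those two classes are undefined.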

Rpart 3 - Remove the number of doors variable from the data and repeat the same prediction as above