To predict whether a car requires servicing using the kNN algorithm

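The k-nearest neighbours (kNN) algorithm is a supervised learning algorithm used mainly for classification: a new observation is assigned the class that is most common among its k closest observations in the training data.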
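To make the voting rule concrete, here is a minimal base-R sketch of kNN classification. It is illustrative only and is not how `class::knn()` is implemented. The helper name `knn_predict` and the toy data are hypothetical. The sketch uses Euclidean distance and a simple majority vote.

```r
# Minimal base-R sketch of kNN classification (illustrative; not class::knn itself)
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  preds <- apply(test_x, 1, function(row) {
    # Euclidean distance from this test row to every training row
    d <- sqrt(colSums((t(train_x) - row)^2))
    # Class labels of the k closest training rows
    nearest <- train_y[order(d)[seq_len(k)]]
    # Majority vote; a tied vote goes to the earlier factor level
    votes <- table(nearest)
    names(votes)[which.max(votes)]
  })
  factor(unname(preds), levels = levels(train_y))
}

# Hypothetical toy data: two well-separated clusters
train_x <- data.frame(a = c(1, 2, 1.5, 8, 9, 8.5),
                      b = c(1, 1.5, 2, 8, 9, 8.5))
train_y <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))
test_x <- data.frame(a = c(1.2, 8.8), b = c(1.1, 8.7))

pred <- knn_predict(train_x, train_y, test_x, k = 3)
print(pred)  # first point lies in the "No" cluster, second in the "Yes" cluster
```

`class::knn()` breaks voting ties at random. By default, it also includes every training point tied at the k-th distance. This sketch can therefore disagree with it in tied cases.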

Problem Statement

Each data set contains 6 variables. Each column holds information that helps determine whether a particular vehicle needs servicing.

The first 5 columns are readings taken from the vehicle (OilQual, EnginePerf, NormMileage, TyreWear and HVACwear), and the 6th (Service) records whether the vehicle requires service ("Yes" or "No"). Two data sets are given:

  1. Train Data: This data set (315 observations) contains the 6 variables: the 5 vehicle readings and the outcome, i.e. whether the car requires servicing. It is used to build the model.

  2. Test Data: This data set (135 observations) has the same 6 variables and is used to check the quality of the model built from the train data.

library(caret) # package used to create the confusion matrix
## Loading required package: lattice
## Loading required package: ggplot2
library(class) # package providing the knn() function
# stringsAsFactors = TRUE keeps Service a factor (the default became FALSE in R 4.0.0)
serviceTestData <- read.csv("serviceTestData.csv", stringsAsFactors = TRUE) # assigning the test data to a variable
serviceTrainData <- read.csv("serviceTrainData.csv", stringsAsFactors = TRUE) # assigning the train data to a variable
# Viewing the imported data (View() opens a data viewer in interactive sessions)
View(serviceTestData)
View(serviceTrainData)
# Structure of the data
str(serviceTestData)
## 'data.frame':    135 obs. of  6 variables:
##  $ OilQual    : num  45.77 4.99 4.99 106.39 104.39 ...
##  $ EnginePerf : num  49.94 7.89 4.89 104.45 103.74 ...
##  $ NormMileage: num  49.78 6.59 7.31 103.05 103.05 ...
##  $ TyreWear   : num  48.26 9.49 8.37 106.28 106.13 ...
##  $ HVACwear   : num  50.95 3.24 2.78 105.54 105.78 ...
##  $ Service    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 2 2 1 ...
str(serviceTrainData)
## 'data.frame':    315 obs. of  6 variables:
##  $ OilQual    : num  103.4 26.8 62.4 45.5 104.4 ...
##  $ EnginePerf : num  103.5 26.2 63.7 49.9 103.3 ...
##  $ NormMileage: num  103.1 31.3 59.7 48.8 103.1 ...
##  $ TyreWear   : num  106.2 29.2 64.7 48.1 105.8 ...
##  $ HVACwear   : num  105.7 31.3 58.6 48 106.5 ...
##  $ Service    : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 1 1 1 ...
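The `str()` output shows that both data sets have the same five predictors in the same order, with Service last. The `knn()` call drops the outcome by position (`[,-6]`), so it relies on that layout. The following is a hedged sketch of selecting columns by name instead. The helper `knn_inputs` is hypothetical and is not part of the original analysis. The two-row stand-in frames reuse the first values printed by `str()` above.

```r
# Hypothetical helper: build knn() inputs by column name rather than position
knn_inputs <- function(train, test, label = "Service") {
  # knn() matches predictor columns by position, so names and order must agree
  stopifnot(identical(names(train), names(test)))
  # Both outcome columns must be factors with the same levels
  stopifnot(is.factor(train[[label]]), is.factor(test[[label]]),
            identical(levels(train[[label]]), levels(test[[label]])))
  predictors <- setdiff(names(train), label)
  list(train_x = train[, predictors, drop = FALSE],
       test_x  = test[, predictors, drop = FALSE],
       cl      = train[[label]])
}

# Two-row stand-ins built from the first rows shown by str() above
train <- data.frame(OilQual = c(103.4, 26.8), EnginePerf = c(103.5, 26.2),
                    NormMileage = c(103.1, 31.3), TyreWear = c(106.2, 29.2),
                    HVACwear = c(105.7, 31.3),
                    Service = factor(c("No", "Yes")))
test <- data.frame(OilQual = c(45.77, 4.99), EnginePerf = c(49.94, 7.89),
                   NormMileage = c(49.78, 6.59), TyreWear = c(48.26, 9.49),
                   HVACwear = c(50.95, 3.24),
                   Service = factor(c("No", "No"), levels = c("No", "Yes")))

inputs <- knn_inputs(train, test)
names(inputs$train_x)
```

The result could then be passed as `knn(train = inputs$train_x, test = inputs$test_x, cl = inputs$cl, k = 3)`.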
# Summarising the data
summary(serviceTrainData)
##     OilQual           EnginePerf       NormMileage         TyreWear      
##  Min.   :  0.9872   Min.   :  1.891   Min.   :  3.359   Min.   :  6.213  
##  1st Qu.: 26.7655   1st Qu.: 27.418   1st Qu.: 31.260   1st Qu.: 29.036  
##  Median : 59.6633   Median : 59.741   Median : 57.221   Median : 60.304  
##  Mean   : 59.6493   Mean   : 60.306   Mean   : 60.297   Mean   : 61.759  
##  3rd Qu.:104.3888   3rd Qu.:103.744   3rd Qu.:103.051   3rd Qu.:106.173  
##  Max.   :106.4288   Max.   :105.744   Max.   :105.051   Max.   :108.173  
##     HVACwear      Service  
##  Min.   : -1.72   No :232  
##  1st Qu.: 31.34   Yes: 83  
##  Median : 60.62            
##  Mean   : 60.39            
##  3rd Qu.:105.54            
##  Max.   :107.54
summary(serviceTestData)
##     OilQual          EnginePerf       NormMileage         TyreWear      
##  Min.   :  2.597   Min.   :  1.891   Min.   :  3.589   Min.   :  6.143  
##  1st Qu.: 26.696   1st Qu.: 27.418   1st Qu.: 31.260   1st Qu.: 28.901  
##  Median : 61.023   Median : 61.501   Median : 59.351   Median : 61.304  
##  Mean   : 58.629   Mean   : 59.077   Mean   : 59.118   Mean   : 60.864  
##  3rd Qu.:104.229   3rd Qu.:103.744   3rd Qu.:103.051   3rd Qu.:106.173  
##  Max.   :106.389   Max.   :105.744   Max.   :105.051   Max.   :108.173  
##     HVACwear      Service 
##  Min.   : -1.72   No :99  
##  1st Qu.: 31.31   Yes:36  
##  Median : 62.62           
##  Mean   : 58.99           
##  3rd Qu.:105.33           
##  Max.   :105.83
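kNN is distance-based, so a predictor on a much larger scale would dominate the Euclidean distance. In these summaries all five predictors span roughly the same range (about -2 to 108), which suggests the unscaled distances treat them comparably here. If the scales differed, a common approach is to standardise with the training means and standard deviations and then apply those same values to the test set. The following is a minimal base-R sketch of that approach. The helper `standardise_pair` and the `MileageKm` column are hypothetical and do not come from the service data.

```r
# Standardise predictors using training-set statistics only (base-R sketch)
standardise_pair <- function(train_x, test_x) {
  train_s <- scale(train_x)                    # centre and scale each column
  centre <- attr(train_s, "scaled:center")     # training means
  spread <- attr(train_s, "scaled:scale")      # training standard deviations
  # Reuse the training statistics so the test set is not used to fit the scaling
  test_s <- scale(test_x, center = centre, scale = spread)
  list(train = train_s, test = test_s)
}

# Hypothetical predictors on very different scales (not the service data)
set.seed(1)
train_x <- data.frame(OilQual = runif(20, 0, 106), MileageKm = runif(20, 0, 150000))
test_x  <- data.frame(OilQual = runif(5, 0, 106),  MileageKm = runif(5, 0, 150000))

scaled <- standardise_pair(train_x, test_x)
colMeans(scaled$train)        # approximately 0 for each column
apply(scaled$train, 2, sd)    # 1 for each column
```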
# Applying the knn algorithm with k = 3 and storing the predicted classes
# The 6th column (Service) is dropped from both the train and test data because it holds the outcome; the training labels are passed separately through cl
predictedknn <- knn(train=serviceTrainData[,-6],
                    test=serviceTestData[,-6],
                    cl=serviceTrainData$Service,
                    k=3)
predictedknn
##   [1] No  No  No  No  No  No  No  Yes Yes No  No  No  No  No  No  No  Yes
##  [18] No  No  Yes Yes No  Yes No  No  No  No  No  No  No  No  No  No  No 
##  [35] No  Yes No  No  No  No  No  Yes No  Yes No  No  No  Yes Yes No  Yes
##  [52] No  Yes No  No  No  No  No  No  No  No  No  No  No  Yes Yes Yes No 
##  [69] Yes No  No  Yes No  No  No  No  No  No  Yes Yes Yes Yes No  Yes No 
##  [86] No  Yes Yes Yes No  No  No  Yes No  Yes No  No  No  No  No  No  No 
## [103] No  No  No  Yes No  No  No  No  No  Yes No  Yes No  Yes Yes No  Yes
## [120] No  No  No  No  No  Yes No  No  No  No  No  No  No  Yes No  No 
## Levels: No Yes
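The model above fixes k = 3. One common way to choose k is leave-one-out cross-validation (LOOCV) on the training data: predict each training row from all the others and compare accuracy across candidate values of k. The following is a hedged base-R sketch of that procedure on synthetic data. The helpers `knn_vote` and `loocv_accuracy` and the data are hypothetical. The class package also provides `knn.cv()` for leave-one-out kNN predictions.

```r
# Leave-one-out cross-validation (LOOCV) for choosing k: a base-R sketch
knn_vote <- function(train_x, train_y, row, k) {
  # Euclidean distances from one query row to all training rows
  d <- sqrt(colSums((t(train_x) - row)^2))
  votes <- table(train_y[order(d)[seq_len(k)]])
  names(votes)[which.max(votes)]
}

loocv_accuracy <- function(x, y, k) {
  x <- as.matrix(x)
  hits <- vapply(seq_len(nrow(x)), function(i) {
    # Hold out row i and predict it from the remaining rows
    pred <- knn_vote(x[-i, , drop = FALSE], y[-i], x[i, ], k)
    pred == as.character(y[i])
  }, logical(1))
  mean(hits)
}

# Hypothetical synthetic data: 20 "No" points around 0 and 20 "Yes" points around 3
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
y <- factor(rep(c("No", "Yes"), each = 20))

candidate_k <- c(1, 3, 5, 7, 9)  # odd values rule out tied votes between two classes
acc <- sapply(candidate_k, function(k) loocv_accuracy(x, y, k))
best_k <- candidate_k[which.max(acc)]
data.frame(k = candidate_k, loocv_accuracy = acc)
best_k
```

For the service data, the same loop could be run on `serviceTrainData[, -6]` and `serviceTrainData$Service` before choosing k. The original analysis did not do this.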
# Creating a confusion matrix to compare the predicted classes with the actual Service values in the test data
conf_matrix <- confusionMatrix(data = predictedknn, reference = serviceTestData$Service)
conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  99   0
##        Yes  0  36
##                                     
##                Accuracy : 1         
##                  95% CI : (0.973, 1)
##     No Information Rate : 0.7333    
##     P-Value [Acc > NIR] : < 2.2e-16 
##                                     
##                   Kappa : 1         
##  Mcnemar's Test P-Value : NA        
##                                     
##             Sensitivity : 1.0000    
##             Specificity : 1.0000    
##          Pos Pred Value : 1.0000    
##          Neg Pred Value : 1.0000    
##              Prevalence : 0.7333    
##          Detection Rate : 0.7333    
##    Detection Prevalence : 0.7333    
##       Balanced Accuracy : 1.0000    
##                                     
##        'Positive' Class : No        
## 

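The confusion matrix shows that all 135 test predictions are correct (99 "No" and 36 "Yes", with no misclassifications), so the accuracy is 1. Note that caret treats the first factor level, "No", as the positive class by default. The reported sensitivity, prevalence and detection rate therefore refer to "No". Passing positive = "Yes" to confusionMatrix() would make them refer to cars that need servicing.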
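The statistics in the caret output follow from the four cell counts. The following hedged base-R sketch recomputes several of them from the counts printed above. The reported 95% CI and `P-Value [Acc > NIR]` agree with an exact binomial test from `binom.test()`, which is consistent with caret's output. However, this is a reconstruction rather than caret's own code. The sketch also computes sensitivity and specificity with "Yes" as the positive class.

```r
# Counts copied from the confusion matrix above (rows = Prediction, columns = Reference)
cm <- matrix(c(99, 0, 0, 36), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))
n <- sum(cm)                                  # 135 test cars
correct <- sum(diag(cm))                      # 135 correct predictions
accuracy <- correct / n
nir <- max(colSums(cm)) / n                   # always predicting the majority class "No": 99 / 135
ci <- binom.test(correct, n)$conf.int         # exact (Clopper-Pearson) 95% interval
p_nir <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

# Cohen's kappa: observed agreement compared with agreement expected by chance
pe <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - pe) / (1 - pe)

# Sensitivity and specificity with "Yes" (service needed) as the positive class
sens_yes <- cm["Yes", "Yes"] / sum(cm[, "Yes"])
spec_yes <- cm["No", "No"] / sum(cm[, "No"])

round(c(accuracy = accuracy, nir = nir, ci_lower = ci[1], kappa = kappa,
        sensitivity_yes = sens_yes, specificity_yes = spec_yes), 4)
```

With no off-diagonal counts, sensitivity and specificity are 1 whichever class is treated as positive. Only the prevalence-type figures change with that choice.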