The k-nearest neighbors (kNN) algorithm is a supervised learning algorithm used primarily for classification.
The data set contains 6 variables. Each column holds one piece of information that helps determine whether a particular vehicle needs servicing.
The first 5 columns give readings about the vehicle, and the 6th states whether the vehicle requires service. Two data sets are given:
Train Data: This data set contains 6 variables. The first 5 are the vehicle readings and the 6th is the outcome, i.e. whether the car requires servicing.
Test Data: This data set has the same 6 variables and is used to check the quality of the model built from the train data.
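To make the idea behind kNN concrete, here is a minimal sketch of the classification rule: measure the distance from a new observation to every training row, take the k closest rows, and return the majority label. The function name `knn_sketch` and the toy values are made up for illustration; this is not the implementation inside the `class` package, and it assumes Euclidean distance on numeric predictors.

```r
# Illustrative kNN classifier (hypothetical helper, not class::knn)
knn_sketch <- function(train, test, cl, k = 3) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  cl <- factor(cl)
  preds <- vapply(seq_len(nrow(test)), function(i) {
    # Euclidean distance from test row i to every training row
    d <- sqrt(rowSums(sweep(train, 2, test[i, ])^2))
    # Labels of the k closest training rows
    nearest <- cl[order(d)[seq_len(k)]]
    # Majority vote; a tied vote goes to the first factor level
    names(which.max(table(nearest)))
  }, character(1))
  factor(preds, levels = levels(cl))
}

# Toy data (invented values): two well-separated groups
toy_train <- data.frame(x = c(1, 2, 3, 101, 102, 103),
                        y = c(1, 3, 2, 101, 103, 102))
toy_labels <- c("Yes", "Yes", "Yes", "No", "No", "No")
toy_test <- data.frame(x = c(2, 100), y = c(2, 100))

toy_pred <- knn_sketch(toy_train, toy_test, toy_labels, k = 3)
print(toy_pred)
```

The first toy test point sits among the "Yes" rows and the second among the "No" rows, so the vote is unanimous in both cases.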
library(caret) # package used to create the confusion matrix
## Loading required package: lattice
## Loading required package: ggplot2
library(readr) # package for reading/importing files (read.csv() used below is base R)
library(class) # package that provides knn()
# stringsAsFactors = TRUE keeps Service as a factor on R >= 4.0, matching the str() output below
serviceTestData <- read.csv("serviceTestData.csv", stringsAsFactors = TRUE) # test data
serviceTrainData <- read.csv("serviceTrainData.csv", stringsAsFactors = TRUE) # train data
# Viewing the imported data
View(serviceTestData)
View(serviceTrainData)
# Structure of the data
str(serviceTestData)
## 'data.frame': 135 obs. of 6 variables:
## $ OilQual : num 45.77 4.99 4.99 106.39 104.39 ...
## $ EnginePerf : num 49.94 7.89 4.89 104.45 103.74 ...
## $ NormMileage: num 49.78 6.59 7.31 103.05 103.05 ...
## $ TyreWear : num 48.26 9.49 8.37 106.28 106.13 ...
## $ HVACwear : num 50.95 3.24 2.78 105.54 105.78 ...
## $ Service : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 2 2 1 ...
str(serviceTrainData)
## 'data.frame': 315 obs. of 6 variables:
## $ OilQual : num 103.4 26.8 62.4 45.5 104.4 ...
## $ EnginePerf : num 103.5 26.2 63.7 49.9 103.3 ...
## $ NormMileage: num 103.1 31.3 59.7 48.8 103.1 ...
## $ TyreWear : num 106.2 29.2 64.7 48.1 105.8 ...
## $ HVACwear : num 105.7 31.3 58.6 48 106.5 ...
## $ Service : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 1 1 1 ...
#summarising the data
summary(serviceTrainData)
##     OilQual           EnginePerf       NormMileage        TyreWear
##  Min.   :  0.9872   Min.   :  1.891   Min.   :  3.359   Min.   :  6.213
##  1st Qu.: 26.7655   1st Qu.: 27.418   1st Qu.: 31.260   1st Qu.: 29.036
##  Median : 59.6633   Median : 59.741   Median : 57.221   Median : 60.304
##  Mean   : 59.6493   Mean   : 60.306   Mean   : 60.297   Mean   : 61.759
##  3rd Qu.:104.3888   3rd Qu.:103.744   3rd Qu.:103.051   3rd Qu.:106.173
##  Max.   :106.4288   Max.   :105.744   Max.   :105.051   Max.   :108.173
##     HVACwear      Service
##  Min.   : -1.72   No :232
##  1st Qu.: 31.34   Yes: 83
##  Median : 60.62
##  Mean   : 60.39
##  3rd Qu.:105.54
##  Max.   :107.54
summary(serviceTestData)
##     OilQual          EnginePerf       NormMileage        TyreWear
##  Min.   :  2.597   Min.   :  1.891   Min.   :  3.589   Min.   :  6.143
##  1st Qu.: 26.696   1st Qu.: 27.418   1st Qu.: 31.260   1st Qu.: 28.901
##  Median : 61.023   Median : 61.501   Median : 59.351   Median : 61.304
##  Mean   : 58.629   Mean   : 59.077   Mean   : 59.118   Mean   : 60.864
##  3rd Qu.:104.229   3rd Qu.:103.744   3rd Qu.:103.051   3rd Qu.:106.173
##  Max.   :106.389   Max.   :105.744   Max.   :105.051   Max.   :108.173
##     HVACwear      Service
##  Min.   : -1.72   No :99
##  1st Qu.: 31.31   Yes:36
##  Median : 62.62
##  Mean   : 58.99
##  3rd Qu.:105.33
##  Max.   :105.83
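The summaries show all five predictors on a similar range (roughly 0 to 108), which is why raw values are workable for a distance-based method here. If the predictors were on very different scales, a common precaution is to standardise them with statistics taken from the training data only. The sketch below illustrates that step on made-up stand-in values; the object names `train_x` and `test_x` are hypothetical and this step is not part of the original analysis.

```r
# Made-up stand-ins for the training and test predictors
train_x <- data.frame(OilQual = c(10, 50, 100), TyreWear = c(12, 55, 105))
test_x <- data.frame(OilQual = c(30, 80), TyreWear = c(33, 90))

# Centre and scale the training predictors
train_scaled <- scale(train_x)

# Apply the training means and standard deviations to the test predictors,
# so the test data does not influence the transformation
test_scaled <- scale(test_x,
                     center = attr(train_scaled, "scaled:center"),
                     scale = attr(train_scaled, "scaled:scale"))

print(train_scaled)
print(test_scaled)
```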
# Applying the knn algorithm and storing the predictions in a variable
# The 6th column (Service) is dropped from both the train and test inputs because it holds the outcome being predicted
predictedknn <- knn(train = serviceTrainData[, -6],
                    test = serviceTestData[, -6],
                    cl = serviceTrainData$Service,
                    k = 3)
predictedknn
## [1] No No No No No No No Yes Yes No No No No No No No Yes
## [18] No No Yes Yes No Yes No No No No No No No No No No No
## [35] No Yes No No No No No Yes No Yes No No No Yes Yes No Yes
## [52] No Yes No No No No No No No No No No No Yes Yes Yes No
## [69] Yes No No Yes No No No No No No Yes Yes Yes Yes No Yes No
## [86] No Yes Yes Yes No No No Yes No Yes No No No No No No No
## [103] No No No Yes No No No No No Yes No Yes No Yes Yes No Yes
## [120] No No No No No Yes No No No No No No No Yes No No
## Levels: No Yes
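The call above fixes k at 3. One way to check such a choice is to compare several values of k on a validation split carved out of the training data, leaving the test set untouched. The sketch below does this on synthetic data, because the CSV files are not reproduced here; the cluster layout, the 30% split, and the candidate values of k are all assumptions made for illustration.

```r
library(class)

set.seed(42) # knn() breaks ties at random, so fix the seed

# Synthetic stand-in data (assumption): two Gaussian clusters in 2 dimensions
n <- 100
x <- rbind(matrix(rnorm(n * 2, mean = 0), ncol = 2),
           matrix(rnorm(n * 2, mean = 3), ncol = 2))
y <- factor(rep(c("No", "Yes"), each = n))

# Hold out 30% of the rows for validation
val_idx <- sample(nrow(x), size = round(0.3 * nrow(x)))
candidate_k <- c(1, 3, 5, 7, 9)

# Validation accuracy for each candidate k
val_acc <- sapply(candidate_k, function(k) {
  pred <- knn(train = x[-val_idx, ], test = x[val_idx, ],
              cl = y[-val_idx], k = k)
  mean(pred == y[val_idx])
})
names(val_acc) <- candidate_k
print(val_acc)
```

Odd values of k avoid tied votes when there are two classes, which is one reason a value such as 3 is a common starting point.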
# Creating a confusion matrix to count correct and incorrect predictions
conf_matrix <- confusionMatrix(data = predictedknn, reference = serviceTestData$Service)
conf_matrix
## Confusion Matrix and Statistics
##
##           Reference
## Prediction No Yes
##        No  99   0
##        Yes  0  36
##
## Accuracy : 1
## 95% CI : (0.973, 1)
## No Information Rate : 0.7333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.7333
## Detection Rate : 0.7333
## Detection Prevalence : 0.7333
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : No
##
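The headline statistics can be recomputed by hand from the four counts in the confusion matrix, which helps when reading the report. The sketch below rebuilds the matrix from the printed counts and derives the accuracy, the no-information rate (the share of the largest class), and the sensitivity and specificity with "No" as the positive class. The last line assumes the reported interval is an exact binomial interval, which is consistent with the printed lower bound of 0.973.

```r
# Counts copied from the confusion matrix output
cm <- matrix(c(99, 0, 0, 36), nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference = c("No", "Yes")))

accuracy <- sum(diag(cm)) / sum(cm)                # (99 + 36) / 135
nir <- max(colSums(cm)) / sum(cm)                  # 99 / 135
sensitivity <- cm["No", "No"] / sum(cm[, "No"])    # positive class is "No"
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])

# Exact binomial 95% interval for 135 correct out of 135 (assumed method)
ci <- binom.test(sum(diag(cm)), sum(cm))$conf.int

print(c(accuracy = accuracy, nir = nir,
        sensitivity = sensitivity, specificity = specificity))
print(ci)
```

Because "No" is the positive class, sensitivity here measures how well vehicles that do not need service are recognised; passing `positive = "Yes"` to `confusionMatrix()` would report the statistics from the perspective of vehicles that do need service.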