Experiment 6: Naïve Bayes classifier

2) Find the accuracy and draw the confusion matrix.



1) Step 1: Load the required packages.

library(e1071)

2) Step 2: Import the dataset.

m<-read.csv("C:/Users/pradeep/OneDrive/datasets/students_placement_data.csv")
head(m)
##   Roll.No Gender Section SSC.Percentage inter_Diploma_percentage
## 1       1      M       A          87.30                     65.3
## 2       2      F       B          89.00                     92.4
## 3       3      F       A          67.00                     68.0
## 4       4      M       A          71.00                     70.4
## 5       5      M       A          67.00                     65.5
## 6       6      M       A          81.26                     68.0
##   B.Tech_percentage Backlogs registered_for_.Placement_Training
## 1             40.00       18                                 NO
## 2             71.45        0                                yes
## 3             45.26       13                                yes
## 4             36.47       17                                yes
## 5             42.52       17                                yes
## 6             62.20        6                                yes
##   placement.status
## 1       Not placed
## 2           Placed
## 3       Not placed
## 4       Not placed
## 5       Not placed
## 6       Not placed

3) Step 3: Divide the data (117 observations) into training data and test data.

# divide the data into training data and test data.
n=nrow(m) # n is total number of rows.
set.seed(101)

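# We use the sample function to partition the data. We draw round(0.85*n) row indices for the training data.
# Note that since "replace = TRUE", a row may be sampled more than once, so the training data can contain
# duplicate rows. The test data is every row that was never sampled, which is why it ends up larger than
# 15 percent of the data (49 of the 117 rows in this run).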
data_index=sample(1:n, size = round(0.85*n),replace = TRUE)
train_data=m[data_index,]
test_data=m[-data_index,]
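
For comparison, a minimal sketch of the same split done with and without replacement. It uses a hypothetical stand-in data frame `toy` with 117 rows instead of the placement data, and the names `idx_norep`, `idx_rep`, `train_norep`, `test_norep` and `test_rep` are illustrative only. Sampling without replacement is the more common way to obtain an exact 85/15 split; it is shown here as an alternative, not as the method used in this experiment.

```r
# Hypothetical stand-in for the placement data: 117 rows, as in the text.
toy <- data.frame(id = 1:117)
n <- nrow(toy)
set.seed(101)

# Sampling WITHOUT replacement: every row lands in exactly one partition.
idx_norep   <- sample(1:n, size = round(0.85 * n), replace = FALSE)
train_norep <- toy[idx_norep, , drop = FALSE]
test_norep  <- toy[-idx_norep, , drop = FALSE]

# Sampling WITH replacement: the index vector may repeat rows, so fewer
# distinct rows are used for training and more rows are left for testing.
idx_rep  <- sample(1:n, size = round(0.85 * n), replace = TRUE)
test_rep <- toy[-idx_rep, , drop = FALSE]

nrow(train_norep)         # 99 rows in training
nrow(test_norep)          # 18 rows in test
length(unique(idx_rep))   # distinct training rows when sampling with replacement
nrow(test_rep)            # rows never sampled, i.e. n minus the line above
```

With `replace = FALSE` the two partitions always add up to n with no overlap, whereas with `replace = TRUE` the size of the test set depends on how many duplicates happen to be drawn.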

5) Build a model using the naive Bayes classifier.

stu_model1<-naiveBayes(placement.status~ Backlogs+Gender+B.Tech_percentage+SSC.Percentage+inter_Diploma_percentage, data=train_data)
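
To illustrate what a naive Bayes model computes for a numeric predictor, here is a minimal hand-worked sketch in base R. It assumes a Gaussian likelihood per class, and it uses a hypothetical six-row data frame `toy` with a single predictor `backlogs`; none of these values come from the placement data. It is meant to show the idea (prior times likelihood, then normalise), not to reproduce `stu_model1`.

```r
# Hypothetical toy data: one numeric predictor and a two-level class label.
toy <- data.frame(
  backlogs = c(0, 1, 2, 10, 12, 15),
  status   = factor(c("Placed", "Placed", "Placed",
                      "Not placed", "Not placed", "Not placed"))
)

# Class priors P(class), estimated as relative frequencies.
prior <- prop.table(table(toy$status))

# Per-class mean and standard deviation of the predictor.
mu    <- tapply(toy$backlogs, toy$status, mean)
sigma <- tapply(toy$backlogs, toy$status, sd)

# Unnormalised posterior for a new student with 3 backlogs:
# prior * Gaussian likelihood, computed for each class.
x_new <- 3
score <- prior * dnorm(x_new, mean = mu, sd = sigma)

# Normalise so the posteriors sum to 1, then pick the most probable class.
posterior <- score / sum(score)
predicted <- names(which.max(posterior))
print(posterior)
print(predicted)
```

With several predictors, the per-predictor likelihoods are multiplied together under the "naive" assumption that the predictors are conditionally independent given the class.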

6) Apply the model stu_model1 to the test data using the predict function.

p<-predict(stu_model1,test_data,type="class")
print(p)
##  [1] Not placed Not placed Not placed Not placed Placed     Not placed
##  [7] Placed     Not placed Placed     Placed     Placed     Placed    
## [13] Placed     Not placed Placed     Not placed Not placed Placed    
## [19] Placed     Placed     Not placed Not placed Placed     Not placed
## [25] Placed     Not placed Not placed Placed     Placed     Not placed
## [31] Placed     Placed     Placed     Placed     Not placed Placed    
## [37] Not placed Placed     Not placed Not placed Not placed Placed    
## [43] Placed     Not placed Placed     Not placed Not placed Placed    
## [49] Placed    
## Levels: Not placed Placed
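
A confusion matrix cross-tabulates predicted classes against actual classes. The sketch below shows one way to build it with `table()`, using hypothetical label vectors `pred` and `actual` rather than the real predictions. For this experiment the analogous call would presumably be something like `table(p, test_data$placement.status)`, but that is an assumption, since the listing does not show how the matrix was built.

```r
# Hypothetical predicted and actual labels, sharing the same factor levels.
lv     <- c("Not placed", "Placed")
pred   <- factor(c("Placed", "Not placed", "Placed", "Placed", "Not placed"),
                 levels = lv)
actual <- factor(c("Placed", "Not placed", "Not placed", "Placed", "Placed"),
                 levels = lv)

# Rows are predicted classes, columns are actual classes.
cm <- table(predicted = pred, actual = actual)
print(cm)

# Correct predictions lie on the diagonal of the matrix.
acc <- sum(diag(cm)) / sum(cm)
print(acc)  # 3 correct out of 5, i.e. 0.6
```

The off-diagonal cells separate the two kinds of error: students predicted as placed who were not, and students predicted as not placed who were.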

9) Find the accuracy of the model.

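The accuracy of the model is the number of correct predictions on the test set divided by the total number of samples in the test set.

  • Note: t is the confusion matrix of predicted versus actual classes. Its diagonal elements count the correct predictions, so the result below corresponds to 39 correct predictions out of 49.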
print(sum(diag(t))/sum(t))
## [1] 0.7959184