Install the e1071 package:
install.packages("e1071", repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/joshu/Documents/R/win-library/3.4'
## (as 'lib' is unspecified)
## package 'e1071' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\joshu\AppData\Local\Temp\RtmpaUsMri\downloaded_packages
library(e1071)
## Warning: package 'e1071' was built under R version 3.4.4
Load the dataset and inspect its structure:
mammMasses <- read.csv("C:/Users/joshu/Documents/Datasets/mammMasses.csv")
str(mammMasses)
## 'data.frame': 829 obs. of 6 variables:
## $ BI_RADS: int 5 5 4 5 5 3 4 4 4 3 ...
## $ Age : int 67 58 28 57 76 42 36 60 54 52 ...
## $ Shape : int 3 4 1 1 1 2 3 2 1 3 ...
## $ Margin : int 5 5 1 5 4 1 1 1 1 4 ...
## $ Density: int 3 3 3 3 3 3 2 2 3 3 ...
## $ Class : int 1 1 0 1 1 1 0 0 0 0 ...
Create the naive Bayes model using the naiveBayes function, converting Class to a factor so it is treated as a categorical outcome:
nb_model <- naiveBayes(as.factor(Class) ~ ., data = mammMasses)
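Print the model to display the a-priori class probabilities and the conditional distribution of each variable. Because every predictor is stored as an integer, naiveBayes treats it as numeric: for each class (row), column [,1] is the mean and column [,2] is the standard deviation.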
nb_model
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## 0 1
## 0.5150784 0.4849216
##
## Conditional probabilities:
## BI_RADS
## Y [,1] [,2]
## 0 3.983607 0.5282739
## 1 4.703980 0.6429623
##
## Age
## Y [,1] [,2]
## 0 49.29742 13.70268
## 1 62.69403 12.35463
##
## Shape
## Y [,1] [,2]
## 0 2.100703 1.1012901
## 1 3.502488 0.9402307
##
## Margin
## Y [,1] [,2]
## 0 1.939110 1.384367
## 1 3.741294 1.168043
##
## Density
## Y [,1] [,2]
## 0 2.892272 0.3785458
## 1 2.940299 0.3180645
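To make the tables above concrete, the following sketch recomputes the class posterior for a single observation by hand. It assumes that columns [,1] and [,2] are the per-class mean and standard deviation and that each predictor is modelled as an independent Gaussian within a class. The observation is the first row shown in the str() output; the names priors, means, sds and obs are illustrative and not part of the original analysis, and the result may differ slightly from what predict() reports.

```r
# A-priori class probabilities copied from the printed model
priors <- c("0" = 0.5150784, "1" = 0.4849216)

# Per-class means ([,1]) and standard deviations ([,2]) copied from the output
means <- rbind(
  "0" = c(BI_RADS = 3.983607, Age = 49.29742, Shape = 2.100703,
          Margin = 1.939110, Density = 2.892272),
  "1" = c(BI_RADS = 4.703980, Age = 62.69403, Shape = 3.502488,
          Margin = 3.741294, Density = 2.940299)
)
sds <- rbind(
  "0" = c(BI_RADS = 0.5282739, Age = 13.70268, Shape = 1.1012901,
          Margin = 1.384367, Density = 0.3785458),
  "1" = c(BI_RADS = 0.6429623, Age = 12.35463, Shape = 0.9402307,
          Margin = 1.168043, Density = 0.3180645)
)

# First observation from the str() output (its recorded Class is 1)
obs <- c(BI_RADS = 5, Age = 67, Shape = 3, Margin = 5, Density = 3)

# Log prior plus the sum of Gaussian log-densities, one value per class
log_joint <- sapply(rownames(means), function(k) {
  log(priors[[k]]) + sum(dnorm(obs, mean = means[k, ], sd = sds[k, ], log = TRUE))
})

# Normalise to posterior probabilities (subtracting the max avoids underflow)
posterior <- exp(log_joint - max(log_joint))
posterior <- posterior / sum(posterior)
posterior
```

The class with the larger posterior is the predicted class for this observation.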
Use the predict function to classify each observation with the fitted model (here, the same data used for training):
modelPred <- predict(nb_model, mammMasses)
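By default predict returns class labels. As a rough sketch, the e1071 predict method for naiveBayes also accepts type = "raw" to return posterior probabilities instead. Because the CSV file is local to the author's machine, the example below uses a small simulated data frame; toy, toy_model, toy_class and toy_prob are hypothetical names, and the simulated means and standard deviations are only loosely based on the Age and Shape values printed above.

```r
library(e1071)

# Hypothetical stand-in data: two numeric predictors and a binary class
set.seed(42)
n <- 200
toy <- data.frame(
  Age   = c(rnorm(n / 2, mean = 49, sd = 13), rnorm(n / 2, mean = 63, sd = 12)),
  Shape = c(rnorm(n / 2, mean = 2.1, sd = 1.1), rnorm(n / 2, mean = 3.5, sd = 0.9)),
  Class = factor(rep(c(0, 1), each = n / 2))
)

toy_model <- naiveBayes(Class ~ ., data = toy)

# type = "class" (the default) returns one predicted label per row
toy_class <- predict(toy_model, toy, type = "class")

# type = "raw" returns a matrix with one posterior-probability column per class
toy_prob <- predict(toy_model, toy, type = "raw")
head(toy_prob)
```

The raw probabilities are useful when a decision threshold other than "most probable class" is wanted.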
Create a confusion matrix (rows are the predicted classes, columns are the actual classes):
cMatrix <- table(modelPred, mammMasses$Class)
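The orientation of a table built this way is easy to misread, since only the first dimension picks up a name. One option, sketched here with made-up labels, is to name both arguments to table so the printed matrix labels its rows and columns. The vectors predicted and actual are hypothetical and stand in for modelPred and mammMasses$Class.

```r
# Hypothetical predicted and actual labels for six observations
predicted <- factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1))
actual    <- factor(c(0, 1, 0, 0, 1, 1), levels = c(0, 1))

# Naming the arguments labels both dimensions of the resulting table
labelled <- table(Predicted = predicted, Actual = actual)
labelled
```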
Install and load the caret package, then plot the confusion matrix and compute summary statistics:
install.packages("caret", repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/joshu/Documents/R/win-library/3.4'
## (as 'lib' is unspecified)
library(caret)
## Warning: package 'caret' was built under R version 3.4.4
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.4
plot(cMatrix)
confusionMatrix(cMatrix)
## Confusion Matrix and Statistics
##
##
## modelPred 0 1
## 0 330 52
## 1 97 350
##
## Accuracy : 0.8203
## 95% CI : (0.7924, 0.8458)
## No Information Rate : 0.5151
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6414
## Mcnemar's Test P-Value : 0.0003126
##
## Sensitivity : 0.7728
## Specificity : 0.8706
## Pos Pred Value : 0.8639
## Neg Pred Value : 0.7830
## Prevalence : 0.5151
## Detection Rate : 0.3981
## Detection Prevalence : 0.4608
## Balanced Accuracy : 0.8217
##
## 'Positive' Class : 0
##
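The main statistics in this report can be reproduced directly from the four counts, which helps clarify what each one measures. The sketch below copies the counts from the confusion matrix and treats class 0 as the positive class, as the report states. The name cm and the metric variable names are illustrative.

```r
# Counts copied from the confusion matrix: rows = predicted, columns = actual
cm <- matrix(c(330, 97, 52, 350), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)        # correct predictions / all observations
sensitivity <- cm["0", "0"] / sum(cm[, "0"])  # actual 0s predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])  # actual 1s predicted as 1
ppv         <- cm["0", "0"] / sum(cm["0", ])  # predicted 0s that are actually 0
npv         <- cm["1", "1"] / sum(cm["1", ])  # predicted 1s that are actually 1
balanced    <- (sensitivity + specificity) / 2

round(c(Accuracy = accuracy, Sensitivity = sensitivity,
        Specificity = specificity, PosPredValue = ppv,
        NegPredValue = npv, BalancedAccuracy = balanced), 4)
```

Because class 0 is the positive class here, sensitivity describes how well class 0 cases are detected, and specificity describes class 1 cases.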