Two types of probabilities
A Naive Bayes classifier works with two types of probabilities: a-priori probabilities and conditional probabilities. A-priori probabilities are computed without any knowledge of the predictors. Example: given the above dataset, what is the probability that a company is fraudulent? Out of 10 companies, 4 are fraudulent. Hence, this probability is 4/10 = 0.4. (Conditional probabilities, the second type, are computed given the values of the predictors.)
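This can be computed directly in R, assuming the dataset is stored in company.df with the class column Status (as in the code further below):
## a-priori class probabilities: 4 of 10 companies are fraudulent
prop.table(table(company.df$Status))   # Fraudulent 0.4, Truthful 0.6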
Now we want to use the Naive Bayes (NB) classifier to classify the companies based on their predictors. We will use the whole dataset for both fitting and prediction.
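The model object train.full.nb is used in the code below, but its fitting step is not shown in this section. A minimal sketch, assuming the model was fit with naiveBayes() from the e1071 package (whose predict method accepts the type = "raw" argument used below), with Status as the class and Trouble and Size as predictors:
library(e1071)
## fit a Naive Bayes model on the full dataset
train.full.nb <- naiveBayes(Status ~ Trouble + Size, data = company.df)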
## predict probabilities
pred.prob <- predict(train.full.nb, newdata=company.df, type="raw")
pred.prob
## Fraudulent Truthful
## [1,] 0.5294 0.471
## [2,] 0.0698 0.930
## [3,] 0.3103 0.690
## [4,] 0.3103 0.690
## [5,] 0.0698 0.930
## [6,] 0.0698 0.930
## [7,] 0.5294 0.471
## [8,] 0.8710 0.129
## [9,] 0.3103 0.690
## [10,] 0.8710 0.129
## predict class membership
pred.class <- predict(train.full.nb, newdata=company.df)
pred.class
## [1] Fraudulent Truthful Truthful Truthful Truthful Truthful
## [7] Fraudulent Fraudulent Truthful Fraudulent
## Levels: Fraudulent Truthful
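To see where these probabilities come from, consider the first company (Trouble = Yes, Size = Small). Reading the conditional frequencies off the dataset (3 of the 4 fraudulent companies have Trouble = Yes and 1 of 4 has Size = Small; 1 of the 6 truthful companies has Trouble = Yes and 4 of 6 have Size = Small), the posterior can be reproduced by hand:
## hand-computed Naive Bayes scores for Trouble = Yes, Size = Small
f.score <- 0.4 * (3/4) * (1/4)   # P(F) * P(Yes|F) * P(Small|F) = 0.075
t.score <- 0.6 * (1/6) * (4/6)   # P(T) * P(Yes|T) * P(Small|T) = 0.0667
f.score / (f.score + t.score)    # 0.5294, matching row 1 of pred.prob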
pred.df <- data.frame(pred.prob, pred.class)
complete.df <- cbind(company.df, pred.df)
Display the result of the classification:
library(kableExtra)   ## provides kable_classic_2()
kab <- knitr::kable(complete.df, caption = "Classification Result",
                    booktabs = TRUE, label = "Result table")
kable_classic_2(kab, full_width = TRUE)
Classification Result

| Trouble | Size  | Status     | Fraudulent | Truthful | pred.class |
|---------|-------|------------|------------|----------|------------|
| Yes     | Small | Truthful   | 0.529      | 0.471    | Fraudulent |
| No      | Small | Truthful   | 0.070      | 0.930    | Truthful   |
| No      | Large | Truthful   | 0.310      | 0.690    | Truthful   |
| No      | Large | Truthful   | 0.310      | 0.690    | Truthful   |
| No      | Small | Truthful   | 0.070      | 0.930    | Truthful   |
| No      | Small | Truthful   | 0.070      | 0.930    | Truthful   |
| Yes     | Small | Fraudulent | 0.529      | 0.471    | Fraudulent |
| Yes     | Large | Fraudulent | 0.871      | 0.129    | Fraudulent |
| No      | Large | Fraudulent | 0.310      | 0.690    | Truthful   |
| Yes     | Large | Fraudulent | 0.871      | 0.129    | Fraudulent |
Display the confusion matrix. The table above shows two misclassified companies: the first (actually Truthful, predicted Fraudulent) and the ninth (actually Fraudulent, predicted Truthful); these appear as the off-diagonal entries below.
library(caret)   ## provides confusionMatrix()
confusionMatrix(predict(train.full.nb, newdata = company.df), company.df$Status)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Fraudulent Truthful
## Fraudulent 3 1
## Truthful 1 5
##
## Accuracy : 0.8
## 95% CI : (0.444, 0.975)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.167
##
## Kappa : 0.583
##
## Mcnemar's Test P-Value : 1.000
##
## Sensitivity : 0.750
## Specificity : 0.833
## Pos Pred Value : 0.750
## Neg Pred Value : 0.833
## Prevalence : 0.400
## Detection Rate : 0.300
## Detection Prevalence : 0.400
## Balanced Accuracy : 0.792
##
## 'Positive' Class : Fraudulent
##
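These statistics follow directly from the matrix; with Fraudulent as the positive class they can be verified by hand:
## verify the reported statistics from the confusion matrix
(3 + 5) / 10   # accuracy = 0.8
3 / (3 + 1)    # sensitivity = 0.75
5 / (5 + 1)    # specificity = 0.833
Balanced accuracy is the average of sensitivity and specificity: (0.750 + 0.833)/2 ≈ 0.792.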