Confusion / Error Matrix

Once the data has been cleaned, pre-processed and wrangled, the next step is to feed it to a model and obtain output, often in the form of probabilities. But how do we measure how effective the model is? The better the model's effectiveness, the better its performance, and that is exactly what we want. The main problem with classification accuracy is that it hides the detail you need to properly understand the performance of your classification model.

This is where the confusion matrix comes in. A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It allows the performance of an algorithm to be visualized and makes it easy to spot confusion between classes, e.g. one class being commonly mislabeled as another. Most performance measures are computed from the confusion matrix.

A confusion matrix is a summary of prediction results on a classification problem. The counts of correct and incorrect predictions are summarized and broken down by class. The confusion matrix shows the ways in which your classification model is confused when it makes predictions, giving insight not only into the errors being made by a classifier but, more importantly, into the types of errors being made. It visualizes the performance of a classifier by comparing the actual and predicted classes. For a binary problem, the confusion matrix is a 2 x 2 table with four cells:

Confusion Matrix

Definition of the Terms:

Positive (P): The observation is positive (for example: it is an apple).
Negative (N): The observation is not positive (for example: it is not an apple).
True Positive (TP): The observation is positive and is predicted to be positive.
False Negative (FN, Type II Error): The observation is positive but is predicted to be negative.
True Negative (TN): The observation is negative and is predicted to be negative.
False Positive (FP, Type I Error): The observation is negative but is predicted to be positive.

How to Calculate a Confusion Matrix

Here is a systematic process for calculating a confusion matrix (a minimal code sketch follows the steps below):
Step 1) Start with a test dataset whose expected outcome (true) values are known.
Step 2) Make a prediction for every row in the test dataset.
Step 3) From the predictions and the expected outcomes, count:
         The total number of correct predictions for each class.
         The total number of incorrect predictions for each class.
Step 4) These counts are then organized into the matrix as follows:
        Every row of the matrix corresponds to an actual (expected) class.
        Every column of the matrix corresponds to a predicted class.
        The counts of correct and incorrect classifications are entered into the table.
        The correct predictions for a class go into the cell where that class's actual row meets its own predicted column (the diagonal).
        The incorrect predictions for a class go into that class's actual row, under the predicted column of the class the model chose instead.
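
As a minimal sketch of these steps in Python (assuming scikit-learn is installed, with small made-up label arrays purely for illustration):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions for a binary problem
# (1 = positive, e.g. "is an apple"; 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Rows correspond to actual classes, columns to predicted classes.
# With labels=[1, 0] the layout is:
#               predicted 1   predicted 0
#   actual 1        TP            FN
#   actual 0        FP            TN
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=2, TN=4
```

Note that with scikit-learn's default label ordering ([0, 1]) the layout is flipped to [[TN, FP], [FN, TP]], which is why labels is passed explicitly here.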

Other Important Terms Derived from a Confusion Matrix

Sensitivity/Recall

Sensitivity (recall) is defined as the ratio of the number of correctly classified positive examples to the total number of positive examples. High sensitivity/recall indicates that the class is correctly recognized (a small number of FN).
\[Recall = \frac{TP}{TP + FN}\]

Precision

Precision is the number of correctly classified positive examples divided by the total number of examples predicted as positive. High precision indicates that an example labeled as positive is indeed positive (a small number of FP).
\[Precision = \frac{TP}{TP + FP}\]

Accuracy

Classification rate, or accuracy, is given by the relation: \[Accuracy = \frac{TP + TN}{TP + TN + FN + FP}\]

High recall, low precision: most of the positive examples are correctly recognized (low FN), but there are many false positives (high FP).
Low recall, high precision: we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).
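
A short sketch computing these three metrics directly from the counts, using the hypothetical TP=3, FN=1, FP=2, TN=4 from the earlier example:

```python
# Hypothetical counts taken from the confusion matrix example above.
tp, fn, fp, tn = 3, 1, 2, 4

recall = tp / (tp + fn)                      # sensitivity / true positive rate
precision = tp / (tp + fp)                   # share of predicted positives that are truly positive
accuracy = (tp + tn) / (tp + tn + fn + fp)   # overall fraction of correct predictions

print(f"recall={recall:.2f}, precision={precision:.2f}, accuracy={accuracy:.2f}")
# recall=0.75, precision=0.60, accuracy=0.70
```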

ROC Curve

An ROC curve plots the true positive rate against the false positive rate at various cut points (classification thresholds). It illustrates the trade-off between sensitivity (recall) and specificity (the true negative rate).
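
A minimal sketch of computing the points of an ROC curve with scikit-learn (assumed installed), again using made-up labels and predicted positive-class scores:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class.
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.5]

# fpr and tpr are evaluated at each distinct score threshold (cut point);
# plotting tpr against fpr gives the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for t, f, s in zip(thresholds, fpr, tpr):
    print(f"threshold={t:.2f}  FPR={f:.2f}  TPR={s:.2f}")

print("AUC:", roc_auc_score(y_true, y_score))
```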

F-measure

Since we have two measures (precision and recall), it helps to have a single measurement that represents both of them. We calculate an F-measure, which uses the harmonic mean in place of the arithmetic mean because it punishes extreme values more. The F-measure will always be closer to the smaller of precision and recall.
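
Written in the same notation as the formulas above, the F-measure in its most common form (the F1 score, which weights precision and recall equally) is:

\[F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}\]

With the hypothetical values from the earlier example (precision = 0.60, recall = 0.75), F1 = 2 x 0.60 x 0.75 / (0.60 + 0.75) ≈ 0.67, which indeed sits closer to the smaller of the two.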

Here are the main benefits of using a confusion matrix:

It shows how a classification model is confused when it makes predictions.
It gives you insight not only into the errors being made by your classifier but also into the types of errors being made.
This breakdown helps you overcome the limitation of using classification accuracy alone.
Every column of the confusion matrix represents the instances of a predicted class.
Every row of the confusion matrix represents the instances of an actual class.

Good reads & references:
https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
https://www.geeksforgeeks.org/confusion-matrix-machine-learning/