Machine Learning Metrics Explained with Statistics

Introduction

Machine Learning metrics are quantitative measures used to evaluate how well a model performs.
They help answer questions like:
- How accurate is my model?
- Is it making more false alarms or missing key detections?

Understanding metrics is essential for comparing models, diagnosing errors, and improving reliability.
In this presentation, we will explore common classification metrics, their mathematical definitions, and visual representations that make them intuitive to understand.

The Confusion Matrix

The foundation of most classification metrics is the confusion matrix, which cross-tabulates predicted classes against actual outcomes.

	Predicted Positive	Predicted Negative
Actual Positive	True Positive (TP)	False Negative (FN)
Actual Negative	False Positive (FP)	True Negative (TN)

From these four values, we can compute accuracy, precision, recall, and more.
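
As a minimal sketch of how the four counts can be obtained in R, the example below cross-tabulates two label vectors with `table()`. The vectors `actual` and `predicted` are hypothetical data made up for illustration, with 1 as the positive class.

```r
# Hypothetical labels for illustration only (1 = positive, 0 = negative).
actual    <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
predicted <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)

# Fix the level order so the positive class comes first, as in the table above.
actual_f    <- factor(actual,    levels = c(1, 0), labels = c("Positive", "Negative"))
predicted_f <- factor(predicted, levels = c(1, 0), labels = c("Positive", "Negative"))

# Rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual_f, Predicted = predicted_f)

# Pull out the four cells by name rather than by position.
tp <- cm["Positive", "Positive"]
fn <- cm["Positive", "Negative"]
fp <- cm["Negative", "Positive"]
tn <- cm["Negative", "Negative"]
```

Indexing by name avoids mistakes when the level order differs from what you expect.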

Mathematical Foundations – Part 1

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN} \]

Accuracy measures overall correctness. Precision is the share of predicted positives that are truly positive, and Recall is the share of actual positives that the model finds.
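
To see why accuracy alone can mislead, here is a small sketch with hypothetical counts for an imbalanced problem (50 positives in 1,000 cases). The numbers are invented for illustration and do not come from a real model.

```r
# Hypothetical counts for illustration only: 50 positives, 950 negatives.
tp <- 5; fn <- 45; fp <- 5; tn <- 945

accuracy  <- (tp + tn) / (tp + tn + fp + fn)  # overall share of correct predictions
precision <- tp / (tp + fp)                   # correct among predicted positives
recall    <- tp / (tp + fn)                   # found among actual positives

round(c(accuracy = accuracy, precision = precision, recall = recall), 3)
```

Here accuracy is 0.95 even though recall is only 0.1, because the many true negatives dominate the accuracy formula.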

Mathematical Foundations – Part 2

\[ \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
\[ \text{TPR} = \frac{TP}{TP + FN}, \quad \text{FPR} = \frac{FP}{FP + TN} \]

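The F1-score is the harmonic mean of precision and recall, so it is high only when both are high. The True Positive Rate (TPR) is the same quantity as recall; together with the False Positive Rate (FPR) it forms the two axes of the ROC curve.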
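
As a short worked step, substituting the definitions of precision and recall into the F1 formula gives an equivalent form in terms of the raw counts:

\[ \text{F1} = \frac{2 \cdot \frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}{\frac{TP}{TP + FP} + \frac{TP}{TP + FN}} = \frac{2\,TP}{2\,TP + FP + FN} \]

For example, with TP = 88, FP = 12, and FN = 15 this gives 176 / 203 ≈ 0.867. Note that TN does not appear, so F1 ignores how well the model handles negatives.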

Visual Insights – Comparing Models

Probability Connections

\[ P(\text{Positive} \mid \text{Predicted Positive}) = \text{Precision},\quad P(\text{Predicted Positive} \mid \text{Positive}) = \text{Recall} \]
Precision and recall are empirical estimates of these two conditional probabilities, and Bayes’ Theorem links one to the other through the prevalence of positives and the rate of positive predictions.
This connects statistics and machine learning and lets us interpret model predictions probabilistically.
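
One way to sketch the Bayes connection is to write precision in terms of TPR, FPR, and the prevalence of positives. The symbol \( \pi = P(\text{Positive}) \) is introduced here for illustration and is not part of the slides above.

\[ \text{Precision} = \frac{P(\text{Predicted Positive} \mid \text{Positive})\, P(\text{Positive})}{P(\text{Predicted Positive})} = \frac{\text{TPR} \cdot \pi}{\text{TPR} \cdot \pi + \text{FPR} \cdot (1 - \pi)} \]

With assumed values TPR = 0.9, FPR = 0.05, and \( \pi = 0.01 \), precision is 0.009 / 0.0585 ≈ 0.154: when positives are rare, even a small false positive rate produces mostly false alarms.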

ROC Curve Visualization
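
A minimal base-R sketch of how ROC points can be computed: sweep a decision threshold over the model scores and record (FPR, TPR) at each step. The `scores` and `labels` vectors are hypothetical data made up for illustration.

```r
# Hypothetical scores and labels for illustration only (1 = positive, 0 = negative).
scores <- c(0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20)
labels <- c(1,    1,    0,    1,    0,    1,    0,    0)

# Sweep thresholds from strictest to most lenient; Inf gives the (0, 0) corner.
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))

# One row per threshold, with columns fpr and tpr.
roc <- t(sapply(thresholds, function(th) {
  pred <- scores >= th                 # predict positive at or above the threshold
  tp <- sum(pred & labels == 1)
  fp <- sum(pred & labels == 0)
  fn <- sum(!pred & labels == 1)
  tn <- sum(!pred & labels == 0)
  c(fpr = fp / (fp + tn), tpr = tp / (tp + fn))
}))

# Plot the curve with the diagonal of a random classifier for reference.
plot(roc[, "fpr"], roc[, "tpr"], type = "b",
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(0, 1, lty = 2)
```

A curve that bends toward the top-left corner indicates a model that reaches a high TPR at a low FPR.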

Example R Code

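# Compute precision, recall, and F1 from confusion-matrix counts.
metrics <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)  # share of predicted positives that are correct
  recall    <- tp / (tp + fn)  # share of actual positives that are found
  f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean
  c(precision = round(precision, 3),
    recall    = round(recall, 3),
    f1        = round(f1, 3))
}
metrics(88, 12, 15)  # TP = 88, FP = 12, FN = 15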

## precision    recall        f1 
##     0.880     0.854     0.867

Conclusion

Machine learning metrics turn model performance into measurable insights.
Precision and Recall highlight the trade-off between exactness and coverage.
ROC curves visualize the trade-off between true positive rate and false positive rate across decision thresholds.

Mastering these metrics helps you judge whether an ML model is not only accurate but also trustworthy and interpretable.