Error Metrics Cheatsheet
Estimator Prediction Error
Here, sample variance is central to understanding estimator prediction error.
…variability in our estimates is what makes them imprecise. An important aspect of statistics is quantifying the variability in our estimates.
- Brian Caffo, JHU Data Science Specialization/Statistical Inference/Module 2/Variability
A point estimate alone, without a measure of its precision, is essentially meaningless. And without variability across samples we would have no way to detect or define imprecision: the very act of measuring precision depends on variability. Rather than being an enemy of precision, the standard error is what lets us understand and quantify it.
We can typically reduce the standard error of a point estimator by increasing the sample size, as shown in this short study.
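As a rough illustration of that point (this is not the study referenced above; the normal population, σ = 2, and the sample sizes are arbitrary choices for the sketch):

```python
# A minimal simulation: the standard error of the sample mean shrinks
# roughly as 1/sqrt(n) when we increase the sample size.
import numpy as np

rng = np.random.default_rng(42)
sigma = 2.0          # population SD (illustrative)
n_sims = 5_000       # repeated experiments per sample size

for n in (10, 100, 1000):
    # Draw n_sims samples of size n and compute the mean of each one.
    means = rng.normal(loc=0.0, scale=sigma, size=(n_sims, n)).mean(axis=1)
    print(f"n={n:5d}  empirical SE={means.std(ddof=1):.4f}  "
          f"theoretical sigma/sqrt(n)={sigma / np.sqrt(n):.4f}")
```

With these settings the empirical SE should track σ/√n, i.e. roughly 0.63, 0.20, and 0.06 for the three sample sizes.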
Name | Description | Definition/Formula |
---|---|---|
MSE: Mean Squared Error | Avg. squared error between actual and predicted values. Used in estimation (frequentist analysis) and regression | \[\mathit{MSE}=\frac{1}{n} \sum_i (y_i-\hat y_i)^2 = \mathbb E\!\big[(\widehat{\theta}-\theta)^2\big]\] \[\mathit{MSE}(\widehat{\theta}) = \underbrace{\operatorname{Var}(\widehat{\theta})}_{\mathbb E[(\widehat{\theta}-\mathbb E[\widehat{\theta}])^2]} + \underbrace{\Big(\mathbb E[\widehat{\theta}] - \theta\Big)^2}_{\text{Bias}(\widehat{\theta})^2} \] |
RMSE: Root Mean Squared Error | Square root of the MSE; reports the error in the same units as the target | \[\mathit{RMSE}=\sqrt{MSE}=\sqrt{\frac{1}{n} \sum_i (y_i-\hat y_i)^2}\] |
MAE: Mean Absolute Error | Robust measure (less influenced by outliers); penalises big errors less heavily than \(\mathit{RMSE}\) | \[\mathit{MAE}=\frac{1}{n} \sum_i |y_i-\hat y_i|\] |
Variance | Spread of data around its mean | \[\operatorname{Var}(X)=\frac{1}{n-1} \sum_i (x_i-\bar{x})^2\qquad(\text{sample variance, } n-1 \text{ degrees of freedom})\] |
Standard Deviation (SD) | Spread of data in original units | \[\sigma=\sqrt{\operatorname{Var}(X)}\] |
Bias | Systematic error | \[\mathit{Bias}=\frac{1}{n} \sum_i (y_i-\hat y_i)\] |
SE: Standard Error (doesn’t consider bias) | Variability of an estimated statistic \(\hat{\theta}\) under repeated sampling (formula shown for the sample mean) | \[\mathit{SE}=\frac{\sigma}{\sqrt{n}}\] |
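A quick NumPy sketch of the estimation-error metrics above (the `y` and `y_hat` arrays are made up for illustration):

```python
# Illustrative NumPy implementations of MSE, RMSE, MAE, bias, and SE.
import numpy as np

y     = np.array([3.0, 5.0, 2.5, 7.0, 4.5])   # observed targets (made up)
y_hat = np.array([2.8, 5.4, 2.0, 6.5, 5.0])   # model predictions (made up)

residuals = y - y_hat
mse  = np.mean(residuals**2)                  # mean squared error
rmse = np.sqrt(mse)                           # same units as y
mae  = np.mean(np.abs(residuals))             # mean absolute error
bias = np.mean(residuals)                     # average systematic error
se   = y.std(ddof=1) / np.sqrt(len(y))        # standard error of the mean of y

print(mse, rmse, mae, bias, se)
```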
Metric | What It Measures | Formula | Use Case |
---|---|---|---|
R² | Proportion of target variance explained by the model | \(1-\frac{\sum_i (y_i-\hat y_i)^2}{\sum_i (y_i-\bar y)^2}\) | Evaluating model fit |
Coefficient of Variation (CV) | Ratio of SD to the Mean | \(\frac{\sigma}{\bar{x}}\) | Compare variability across units or scales |
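The same style of sketch for R² and the coefficient of variation (arrays re-declared so the snippet stands alone):

```python
# R^2 and coefficient of variation for the same illustrative arrays as above.
import numpy as np

y     = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_hat = np.array([2.8, 5.4, 2.0, 6.5, 5.0])

ss_res = np.sum((y - y_hat)**2)          # residual sum of squares
ss_tot = np.sum((y - y.mean())**2)       # total sum of squares
r2 = 1.0 - ss_res / ss_tot               # proportion of variance explained

cv = y.std(ddof=1) / y.mean()            # unitless spread relative to the mean

print(f"R^2 = {r2:.3f}, CV = {cv:.3f}")
```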
Common Statistical Relationships
Here are key relationships between common statistical metrics:
Bias–Variance Decomposition for MSE
\[ \mathbb E[(y-\hat y)^2] = [\mathbb E(\hat y) - y]^2 + \operatorname{Var}(\hat y) + \sigma^2 \]
Root Mean Squared Error
\[ RMSE = \sqrt{MSE} \]
Standard Deviation and Variance
\[ SD = \sqrt{\operatorname{Var}(X)} \]
Standard Error of the Mean
\[ SE(\bar{x}) = \frac{SD}{\sqrt{n}} \]
R-squared and Mean Squared Error
\[ R^2 = 1 - \frac{MSE}{\operatorname{Var}(Y)} \]
Coefficient of Variation
\[ CV = \frac{SD}{\bar{x}} \]
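A quick simulation check of the estimator form of the decomposition, \(\mathit{MSE}(\widehat\theta) = \operatorname{Var}(\widehat\theta) + \text{Bias}(\widehat\theta)^2\); the shrunken sample mean below is a deliberately biased estimator chosen purely for illustration:

```python
# Verify MSE = variance + bias^2 by simulating a biased estimator of a mean.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, n_sims = 5.0, 1.0, 20, 100_000   # illustrative values

samples = rng.normal(theta, sigma, size=(n_sims, n))
theta_hat = 0.9 * samples.mean(axis=1)    # deliberately biased: shrink the sample mean

mse      = np.mean((theta_hat - theta) ** 2)
bias_sq  = (theta_hat.mean() - theta) ** 2
variance = theta_hat.var()                # spread of the estimates across simulations

print(f"MSE={mse:.4f}  bias^2 + variance={bias_sq + variance:.4f}")  # the two should match
```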
Summary
- MSE captures error (and can be broken down into bias and variance).
- SD, Variance, SE, and CV measure spread and precision.
- R² relates prediction error (MSE) to total variability.
Key ROC Metrics
Metric | Formula | Description |
---|---|---|
True Positive Rate (TPR) / Sensitivity | \(TPR = \frac{TP}{TP + FN}\) | Proportion of actual positives correctly identified |
False Positive Rate (FPR) | \(FPR = \frac{FP}{FP + TN}\) | Proportion of actual negatives incorrectly identified |
True Negative Rate (TNR) / Specificity | \(TNR = \frac{TN}{TN + FP}\) | Proportion of actual negatives correctly identified |
False Negative Rate (FNR) | \(FNR = \frac{FN}{FN + TP}\) | Proportion of actual positives missed |
ROC Curve | Plot of TPR vs FPR | Visualizes classifier performance at all thresholds |
AUC (Area Under Curve) | Value between 0 and 1 | Overall ability of the model to discriminate classes |
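A short sketch of computing the ROC curve and AUC, assuming scikit-learn is available; the labels and scores here are made up:

```python
# ROC curve (TPR vs FPR across thresholds) and AUC for a toy classifier output.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])                     # actual classes
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])    # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, y_score)                # area under that curve

print("FPR:", fpr, "\nTPR:", tpr, "\nAUC:", round(auc, 3))
```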
Binary Classification Metrics (Beyond ROC)
Metric | Formula | Use |
---|---|---|
Accuracy | \(\frac{TP + TN}{TP + TN + FP + FN}\) | Overall correctness |
Precision | \(\frac{TP}{TP + FP}\) | Positive predictive value |
Recall (TPR) | \(\frac{TP}{TP + FN}\) | Sensitivity |
F1 Score | \(2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\) | Balance between precision & recall |
Log Loss | \(-\frac{1}{n}\sum_i \big[y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i)\big]\) | Penalizes false confidence |
Matthews Correlation Coefficient (MCC) | \(\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\) | Balanced performance summary |
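Most of these follow directly from confusion-matrix counts; a plain-Python sketch with made-up TP/FP/TN/FN values:

```python
# Classification metrics from raw confusion-matrix counts (illustrative values).
import math

TP, FP, TN, FN = 40, 10, 45, 5

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)
mcc       = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

print(accuracy, precision, recall, f1, mcc)
```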
Regression Metrics
Metric | Formula | Use |
---|---|---|
MSE / RMSE | Already covered | Overall prediction error |
MAE | Already covered | Robust error |
R² / Adjusted R² | \(R^2 = 1 - \frac{\sum_i (y_i-\hat y_i)^2}{\sum_i (y_i-\bar y)^2}\) | Fit quality |
MAPE (Mean Absolute Percentage Error) | \(\frac{100\%}{n} \sum_i \left\lvert \frac{y_i-\hat y_i}{y_i} \right\rvert\) | Relative error (%) |
Huber Loss | Piecewise quadratic/linear loss | Robust to outliers |
Explained Variance | \(1 - \frac{\operatorname{Var}(y-\hat y)}{\operatorname{Var}(y)}\) | Proportion of variance explained |
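A NumPy sketch of MAPE, Huber loss, and explained variance (the data and the `delta` parameter are illustrative):

```python
# MAPE, Huber loss, and explained variance for a toy regression example.
import numpy as np

y     = np.array([100.0, 150.0, 200.0, 120.0])
y_hat = np.array([110.0, 140.0, 210.0, 100.0])

mape = 100.0 * np.mean(np.abs((y - y_hat) / y))      # relative error in %

def huber(residual, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it (robust to outliers)."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

huber_loss = huber(y - y_hat, delta=10.0).mean()

explained_var = 1.0 - np.var(y - y_hat) / np.var(y)  # explained variance score

print(mape, huber_loss, explained_var)
```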
Model Selection / Complexity Metrics
Metric | Formula | Use |
---|---|---|
AIC | See Model Selection Criteria below | Fit + complexity |
BIC | See Model Selection Criteria below | Stronger complexity penalty |
Adjusted R² | \(1 - \frac{(1-R^2)(n-1)}{n-k-1}\) | Adjusts R² for number of predictors |
Cross-Validation Error | Average out-of-fold error | Generalization estimation |
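A sketch of adjusted R² and a 5-fold cross-validation error estimate, assuming scikit-learn is available; the synthetic data and linear model are purely illustrative:

```python
# Adjusted R^2 (penalised for the number of predictors) and out-of-fold MSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                                   # n=100, k=3 predictors
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

n, k = X.shape
r2 = LinearRegression().fit(X, y).score(X, y)                   # in-sample R^2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)                   # adjust for k predictors

cv_mse = -cross_val_score(LinearRegression(), X, y,
                          cv=5, scoring="neg_mean_squared_error").mean()
print(adj_r2, cv_mse)
```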
Probability & Calibration Metrics
Metric | Formula | Use |
---|---|---|
Brier Score | \(\frac{1}{n}\sum_i (p_i - y_i)^2\) | Probabilistic accuracy |
Calibration Curve | — | Checks how predicted probabilities align with reality |
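A NumPy sketch of the Brier score plus a very coarse two-bin calibration check (real calibration curves use more bins; the probabilities here are made up):

```python
# Brier score and a crude calibration check on toy predicted probabilities.
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
p_hat  = np.array([0.1, 0.8, 0.7, 0.3, 0.9, 0.2, 0.6, 0.75, 0.4, 0.85])

brier = np.mean((p_hat - y_true) ** 2)   # mean squared error on probabilities
print("Brier score:", brier)

# Bin the predictions and compare mean prediction vs. observed positive rate.
bins = np.array([0.0, 0.5, 1.0])
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (p_hat >= lo) & (p_hat < hi)
    if mask.any():
        print(f"[{lo:.1f}, {hi:.1f}): mean p={p_hat[mask].mean():.2f}, "
              f"observed rate={y_true[mask].mean():.2f}")
```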
Model Selection Criteria
Metric | Formula | Description |
---|---|---|
AIC | \(AIC = 2k - 2 \ln(\mathcal{L})\) | Akaike Information Criterion — balances fit and model complexity |
AICc | \(AIC_c = AIC + \frac{2k(k + 1)}{n - k - 1}\) | Corrected AIC — adjusts AIC for small sample sizes |
BIC | \(BIC = \ln(n) \cdot k - 2 \ln(\mathcal{L})\) | Bayesian Information Criterion — stronger penalty for model complexity |
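A NumPy sketch computing AIC, AICc, and BIC for a Gaussian linear model from its maximised log-likelihood (synthetic data; here \(k\) counts the regression coefficients plus the error variance):

```python
# AIC / AICc / BIC for an OLS fit with Gaussian errors, from the log-likelihood.
import numpy as np

rng = np.random.default_rng(7)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.8, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
resid = y - X @ beta_hat
sigma2 = np.mean(resid**2)                         # ML estimate of the error variance

# Maximised Gaussian log-likelihood: -n/2 * (ln(2*pi*sigma2) + 1)
log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

k = X.shape[1] + 1                                 # coefficients plus error variance
aic  = 2 * k - 2 * log_lik
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
bic  = np.log(n) * k - 2 * log_lik

print(f"AIC={aic:.2f}  AICc={aicc:.2f}  BIC={bic:.2f}")
```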