Metrics: MAD

Applying an ML Algorithm is Not Enough!

Previously on…

We discussed the Mean Squared Error (MSE) and how it can be decomposed into prediction variance and bias. We also highlighted its major drawback: because the errors are squared, a few outliers can dominate the metric.

When the data contain outliers, it might be wise to employ a performance measure that is less affected by them. An alternative is the Mean Absolute Deviation (MAD).

\[ MAD= \frac{1}{n}\sum_{i=1}^n |y_i- \hat{f}(x_i)| \]

where \(y_i\) is the observed value of the dependent variable, \(\hat{f}(x_i)\) is its prediction based on the covariates \(x_i\), \(n\) is the sample size, and \(|\cdot|\) denotes the absolute value.
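The formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `mad` and the sample values are made up for illustration and do not come from any particular library.

```python
def mad(y_true, y_pred):
    """Mean absolute deviation between observed values and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    # Average of |y_i - f_hat(x_i)| over the n observations
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)


# Illustrative (made-up) observations and predictions
y_obs = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

mad_value = mad(y_obs, y_hat)
print(mad_value)  # absolute errors are 0.5, 0, 2, 1, so the mean is 0.875
```

The result is expressed in the same units as \(y_i\), which makes it easy to read as "the typical size of a prediction error".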

When should it be used?

Use it to evaluate predictions in contexts where extreme values are common (e.g., stock price analysis, monitoring environmental sensor measurements, fraud analysis, or air pollution measurement).

How do we interpret it?

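The MAD is expressed in the same units as the dependent variable. A smaller MAD indicates that, on average, the predictions lie closer to the observed values.
Conversely, a MAD that is large relative to the scale of the dependent variable suggests that the model is inaccurate and should be revisited.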

Why is it widely used?

The MAD is less sensitive to outliers than the Mean Squared Error (MSE), because each error contributes in proportion to its size rather than to its square. If the dataset is expected to contain anomalies or extreme values, using MAD is preferable.
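To make the robustness claim concrete, here is a small sketch that computes both metrics on the same predictions, once with clean observations and once with a single observation replaced by an extreme value. All numbers are made up for illustration, and the helper names `mse` and `mad` are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


def mad(y_true, y_pred):
    """Mean absolute deviation."""
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)


# Made-up predictions and two versions of the observed values
y_hat = [10.0, 12.0, 11.0, 13.0, 12.0]
y_clean = [11.0, 11.0, 12.0, 12.0, 13.0]    # every error has size 1
y_outlier = [11.0, 11.0, 12.0, 12.0, 62.0]  # last value replaced by an outlier

mad_clean, mse_clean = mad(y_clean, y_hat), mse(y_clean, y_hat)
mad_outlier, mse_outlier = mad(y_outlier, y_hat), mse(y_outlier, y_hat)

print(f"clean:   MAD={mad_clean:.1f}  MSE={mse_clean:.1f}")
print(f"outlier: MAD={mad_outlier:.1f}  MSE={mse_outlier:.1f}")
```

In this toy example a single error of size 50 raises the MAD from 1.0 to 10.8, while the MSE jumps from 1.0 to 500.8: the squared term lets one observation dominate the average.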

Drawbacks

However, MAD is less convenient to work with analytically than MSE: the absolute value is not differentiable at zero, and MAD does not admit the same decomposition into bias and variance of predictions. This means that MAD doesn't tell us whether the error comes from systematic bias or from the variability of the predictions. Therefore, using MAD makes it harder to understand why our model performs well or poorly.
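As a reminder of what is lost, the MSE decomposition can be written as follows. This is a sketch under the usual assumptions, which are not stated in the text above: \(y = f(x) + \varepsilon\), where \(f\) is the true regression function and \(\varepsilon\) is zero-mean noise with variance \(\sigma^2\), independent of \(\hat{f}(x)\).

\[ E\left[(y - \hat{f}(x))^2\right] = \left(E[\hat{f}(x)] - f(x)\right)^2 + E\left[\left(\hat{f}(x) - E[\hat{f}(x)]\right)^2\right] + \sigma^2 \]

The first term is the squared bias, the second is the prediction variance, and the third is the irreducible noise. The identity relies on expanding the square, which makes the cross terms vanish in expectation. The absolute value has no analogous expansion: \(|a + b| \le |a| + |b|\) gives only a bound, so the expected absolute error does not split into separate additive bias and variance terms.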

Comparison of the squared error (right) with the absolute error (left) showing how the latter places much less emphasis on large errors and hence is more robust to outliers and mislabeled data points.