Applying an ML Algorithm is Not Enough!

Previously on…

We discussed the Mean Squared Error (MSE) and how it can be decomposed into the variance of the predictions and their squared bias. We also highlighted its major drawback: because errors are squared, a few extreme observations (outliers) can dominate its value.
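For reference, here is the decomposition recalled above, stated for a fixed input \(x\). It assumes \(y = f(x) + \varepsilon\) with \(E[\varepsilon]=0\) and \(\mathrm{Var}(\varepsilon)=\sigma^2\), and that \(\varepsilon\) is independent of the fitted \(\hat{f}\). The expectation is taken over both the training samples and the noise:

\[ E\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(f(x) - E[\hat{f}(x)]\big)^2}_{\text{squared bias}} + \underbrace{E\big[\big(\hat{f}(x) - E[\hat{f}(x)]\big)^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}} \]

The last term does not depend on the model, so no estimator can push the expected squared error below \(\sigma^2\).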

In such cases, it may be wise to use a performance measure that is less affected by extreme values. One alternative is the Mean Absolute Deviation (MAD):

\[ MAD= \frac{1}{n}\sum_{i=1}^n |y_i- \hat{f}(x_i)| \]

where \(y_i\) is the observed value of the dependent variable, \(\hat{f}(x_i)\) is its prediction, which depends on a set of covariates \(x_i\), \(n\) is the sample size, and \(|\cdot|\) denotes the absolute value.
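A minimal sketch of the formula in Python with NumPy (our choice of language). The function name `mad` and the data are hypothetical and chosen only for illustration:

```python
import numpy as np

def mad(y_true, y_pred):
    """Mean Absolute Deviation: average of |y_i - f_hat(x_i)| over the n observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Absolute residuals, then their arithmetic mean
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical observed values and predictions
y = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 3.0, 8.0]
print(mad(y, y_hat))  # (0.5 + 0.0 + 0.5 + 1.0) / 4 = 0.5
```

scikit-learn exposes the same quantity as `sklearn.metrics.mean_absolute_error`, where it is called the mean absolute error.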

When should it be used? - To evaluate predictions in contexts where data affected by extreme values are common (e.g., stock price analysis, environmental sensor monitoring, fraud analysis, or air pollution measurement).

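How do we interpret it? - The MAD is the average absolute distance between the observed values and the predictions, expressed in the same units as the dependent variable. Lower values indicate more accurate predictions, and a value of 0 means that every prediction matches its observation exactly.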

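Why is it widely used? - The MAD is less sensitive to outliers than the Mean Squared Error (MSE). Each error enters the average linearly rather than squared, so a single extreme residual inflates the MAD far less. If the dataset is expected to contain anomalies or extreme values, the MAD is therefore often preferable.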
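To see the difference in sensitivity, this sketch computes both measures on the same hypothetical predictions, once on clean data and once after a single observation is replaced by an extreme value. The data and helper names are illustrative assumptions, not taken from a real dataset:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(r ** 2)

def mad(y_true, y_pred):
    """Mean absolute deviation: average of absolute residuals."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(r))

# Hypothetical data: identical predictions, evaluated against clean and contaminated targets
y_pred = np.array([10.0, 10.0, 10.0, 10.0, 10.0])
y_clean = np.array([11.0, 9.0, 11.0, 9.0, 10.0])
y_outlier = np.array([11.0, 9.0, 11.0, 9.0, 30.0])  # last value replaced by an outlier

for label, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    print(f"{label:>12}: MSE = {mse(y, y_pred):.1f}, MAD = {mad(y, y_pred):.1f}")
```

With one contaminated point, the MSE rises from 0.8 to 80.8, while the MAD rises only from 0.8 to 4.8. The outlier's residual of 20 contributes 400 to the sum of squares but only 20 to the sum of absolute values.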

Drawbacks

Comparison of the squared error (right) with the absolute error (left), showing how the latter places much less emphasis on large errors and is therefore more robust to outliers and mislabeled data points.
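The point made in the caption above can also be checked numerically. This sketch tabulates the per-observation penalty that each loss assigns to a few residuals \(e = y - \hat{f}(x)\). The residual values are arbitrary choices for illustration:

```python
import numpy as np

# Hypothetical residuals e = y - f_hat(x), from small to large
residuals = np.array([0.5, 1.0, 2.0, 5.0, 10.0])

absolute_loss = np.abs(residuals)  # |e|: grows linearly with the residual
squared_loss = residuals ** 2      # e^2: grows quadratically with the residual

for e, a, s in zip(residuals, absolute_loss, squared_loss):
    print(f"e = {e:5.1f}   |e| = {a:5.1f}   e^2 = {s:6.2f}")
```

For residuals below 1, the squared error is actually smaller than the absolute error. Beyond 1, it quickly dominates: a residual of 10 costs 100 under squared loss but only 10 under absolute loss.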