Metrics: MSE

Applying an ML Algorithm is Not Enough!

In the context of statistical or machine learning models, the use of metrics is essential to evaluate the performance of our estimates. Let’s explore the main features of the Mean Square Error (MSE).

\[ MSE=\frac{1}{n}\sum_{i=1}^n (y_i-\hat{f}(x_i))^2, \]

where \(y_i\) is the observed value of the dependent variable, \(\hat{f}(x_i)\) is its prediction based on the covariates \(x_i\), and \(n\) is the sample size.
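The formula translates almost directly into code. The following is a minimal sketch, assuming NumPy is available; the function name `mse` and the four data points are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed values and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Average of the squared differences (y_i - f_hat(x_i))^2.
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative numbers, not taken from any real dataset.
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 6.0]
print(mse(y, y_hat))  # squared errors are 1, 0, 4, 1, so the mean is 1.5
```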

When should it be used?

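The Mean Square Error is typically used to assess the accuracy of predictions in a Machine Learning model when the response is numeric, as in regression problems. It can be computed on the training data, but it is most informative when computed on data that were not used to fit the model.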

How do we interpret it?

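Small MSE values indicate accurate predictions; an MSE of 0 means that every prediction matches its observed value exactly. Because MSE is expressed in the squared units of \(y\), "small" and "large" only have meaning relative to the scale of the response or to the MSE of a competing model.
Conversely, large MSE values act as a warning signal, suggesting that something needs to be changed in the model.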

Why is it widely used?

It can be treated and studied analytically. We can decompose the expected squared error of \(\hat{f}(x_i)\), measured against the true function \(f\), into two distinct components:

\[ MSE=\underbrace{\text{Variance}}_{A}+ \underbrace{\text{Bias}^2}_{B} \]

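The variance refers to the amount by which \(\hat{f}(x_i)\) would change if we estimated it using different training samples. Ideally, the estimate of \(f\) should not vary too much between samples.
Bias refers to the error that is introduced by approximating a real-life problem, which may be very complicated, with a simpler model; for example, fitting a straight line to a relationship that is actually curved. When the target is a noisy observation \(y_i\) rather than \(f(x_i)\) itself, the expected error on new data also includes the variance of the noise, which no model can reduce.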

In other words, the decomposition of MSE helps the analyst understand and pinpoint the reasons why the model is performing well or not.
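The decomposition can be checked numerically with a small simulation. The sketch below is only an illustration under assumed settings: the true function \(f(x)=\sin(2\pi x)\), the noise level, the sample size, and the choice of a straight-line model are all hypothetical. It refits the model on many simulated samples, records the prediction at one point `x0`, and compares the squared error around \(f\) with the sum of variance and squared bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed "true" relationship, chosen only for illustration.
    return np.sin(2 * np.pi * x)

n, sigma, n_reps, x0 = 30, 0.3, 2000, 0.25  # hypothetical settings
preds = np.empty(n_reps)

for r in range(n_reps):
    # Draw a fresh training sample and fit a straight line to it.
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_f(x) + rng.normal(0.0, sigma, size=n)
    coefs = np.polyfit(x, y, deg=1)
    preds[r] = np.polyval(coefs, x0)

variance = np.var(preds)                         # spread of f_hat(x0) across samples
bias_sq = (np.mean(preds) - true_f(x0)) ** 2     # squared gap between average fit and f(x0)
mse_f = np.mean((preds - true_f(x0)) ** 2)       # squared error of f_hat(x0) around f(x0)

print(f"variance = {variance:.4f}")
print(f"bias^2   = {bias_sq:.4f}")
print(f"MSE      = {mse_f:.4f}")
```

Within the simulation the identity holds up to floating-point rounding, because the variance is computed with the same divisor (`n_reps`) as the mean squared error.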

Drawbacks

MSE comes with a significant drawback: squaring magnifies the impact of large errors, so a handful of outliers (especially extreme values) can dominate the metric. In such situations, MSE might not be the most appropriate option.
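A small numerical example makes the effect of a single outlier concrete. The data below are invented for illustration, and the Mean Absolute Error (MAE) is brought in here only as a point of comparison; it is not discussed elsewhere in this text.

```python
import numpy as np

# Made-up observations and predictions: every error is exactly +1 or -1.
y = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_hat = np.array([11.0, 11.0, 12.0, 12.0, 13.0])

mse_clean = np.mean((y - y_hat) ** 2)
mae_clean = np.mean(np.abs(y - y_hat))

# Replace the last observation with an extreme value (error becomes 49).
y_out = y.copy()
y_out[-1] = 62.0

mse_outlier = np.mean((y_out - y_hat) ** 2)
mae_outlier = np.mean(np.abs(y_out - y_hat))

print(mse_clean, mae_clean)      # 1.0 1.0
print(mse_outlier, mae_outlier)  # MSE = (4 + 49**2) / 5 = 481.0, MAE = (4 + 49) / 5 = 10.6
```

One extreme point multiplies the MSE by 481, while the MAE grows by a factor of about 10.6.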

Graphical representation of MSE decomposition as the model complexity varies (typically, the more parameters we incorporate, the more complex the model becomes). The black line indicates the total error, which is the MSE. The red line represents the squared Bias, and the green line represents the Variance. The dashed line represents the complexity of the optimal model, which leads to achieving the lowest possible MSE.
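The curves described in the figure can be approximated with a simulation. The sketch below rests on assumptions that are not part of the original figure: the true function is \(\sin(2\pi x)\), complexity is measured by the degree of a polynomial fit, and the design points are fixed on a grid. For each degree it estimates the squared bias and the variance, averaged over the design points, and reports their sum.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Assumed "true" relationship, chosen only for illustration.
    return np.sin(2 * np.pi * x)

n, sigma, n_reps = 30, 0.3, 500      # hypothetical settings
x = np.linspace(0.0, 1.0, n)         # fixed design points
degrees = list(range(1, 7))          # polynomial degree as a proxy for complexity

bias_sq, variance = [], []
for d in degrees:
    fits = np.empty((n_reps, n))
    for r in range(n_reps):
        # New noisy sample at the same design points, refit the polynomial.
        y = true_f(x) + rng.normal(0.0, sigma, size=n)
        fits[r] = np.polyval(np.polyfit(x, y, deg=d), x)
    # Average over the design points of the pointwise squared bias and variance.
    bias_sq.append(np.mean((fits.mean(axis=0) - true_f(x)) ** 2))
    variance.append(np.mean(fits.var(axis=0)))

total = np.array(bias_sq) + np.array(variance)
for d, b, v, t in zip(degrees, bias_sq, variance, total):
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}  total={t:.4f}")
print("lowest total error at degree", degrees[int(np.argmin(total))])
```

In this setup the squared bias falls as the degree increases while the variance rises, which is the trade-off the figure depicts; the degree with the lowest total plays the role of the dashed line.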