0.0.1 Overview

Here we look at an example of bias and one way to avoid it. It is challenging to find public datasets that record individuals and their gender, so we shall use the Hong Kong marathon results, which show clear differences in finish times across gender and age groups. We shall split the runners into three equal-sized groups (fast, medium and slow) and study how we can mitigate bias due to age or gender.

The data is sourced from this link.

0.0.2 Segment all Data

First we look at a histogram of the finish time for all participants.
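The code cell that produced this plot is not reproduced in this text. As a minimal, dependency-free sketch, the block below builds a text histogram of finish times in 15-minute bins. It assumes each result can be reduced to a (gender, finish time in minutes) pair; `make_runners` is a hypothetical helper that generates synthetic runners from invented distributions, not the Hong Kong data. With the real results you would load the published file instead, and could pass the same times to a plotting library such as matplotlib.

```python
import random
from collections import Counter

BIN_WIDTH = 15  # minutes per histogram bin (an arbitrary choice)


def make_runners(n=3000, seed=0):
    """Hypothetical synthetic stand-in for the marathon results.

    Returns (gender, finish_minutes) pairs. The 30% female share and the
    means/spread below are invented for illustration only.
    """
    rng = random.Random(seed)
    runners = []
    for _ in range(n):
        gender = "F" if rng.random() < 0.3 else "M"
        mean = 300 if gender == "F" else 270  # assumed gap of 30 minutes
        runners.append((gender, max(120.0, rng.gauss(mean, 40))))
    return runners


def histogram(times, bin_width=BIN_WIDTH):
    """Count finish times per bin, keyed by each bin's lower edge in minutes."""
    counts = Counter(int(t // bin_width) * bin_width for t in times)
    return dict(sorted(counts.items()))


runners = make_runners()
hist = histogram([t for _, t in runners])
for start, count in hist.items():
    # one '#' per 10 runners gives a rough text histogram
    print(f"{start:4d}-{start + BIN_WIDTH:<4d} {'#' * (count // 10)}")
```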

Now let's segment the runners into fast, medium and slow groups of equal size.
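One way to form three equal-sized groups is to rank runners by finish time and cut the ranking into thirds, which is what the sketch below does. This is an assumed approach rather than the notebook's code (a quantile split such as pandas `qcut` would be the library equivalent); `make_runners` is a hypothetical helper producing synthetic (gender, finish minutes) pairs, not the real results.

```python
import random
from collections import Counter

LABELS = ("fast", "medium", "slow")


def make_runners(n=3000, seed=0):
    """Hypothetical synthetic results: (gender, finish_minutes) pairs with invented distributions."""
    rng = random.Random(seed)
    runners = []
    for _ in range(n):
        gender = "F" if rng.random() < 0.3 else "M"
        mean = 300 if gender == "F" else 270  # assumed 30-minute gap
        runners.append((gender, max(120.0, rng.gauss(mean, 40))))
    return runners


def segment(times):
    """Label each time fast/medium/slow by rank; group sizes differ by at most one."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    labels = [None] * len(times)
    for rank, i in enumerate(order):
        labels[i] = LABELS[rank * 3 // len(times)]
    return labels


runners = make_runners()
times = [t for _, t in runners]
groups = segment(times)
print(Counter(groups))
for label in LABELS:
    # slowest finish time that still falls in this group, i.e. the group's threshold
    cutoff = max(t for t, g in zip(times, groups) if g == label)
    print(f"{label:>6}: up to {cutoff:.0f} min")
```

Splitting by rank rather than by fixed time cut-offs guarantees equal group sizes even when many runners share similar times.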

Now let's look at the gender proportion of runners in each group. It is not really fair on female runners that a proportionally smaller share of them ends up in the fast group.
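The gender mix of each group can be checked by counting female runners per group. The sketch below does this on hypothetical synthetic data from an assumed `make_runners` helper, in which women are invented to be 30 minutes slower on average; under that assumption the fast group ends up with a much smaller female share than the slow group, mirroring the imbalance described above. The rank-based `segment` helper is likewise an illustration, not the notebook's code.

```python
import random
from collections import Counter

LABELS = ("fast", "medium", "slow")


def make_runners(n=3000, seed=0):
    """Hypothetical synthetic results: (gender, finish_minutes) pairs with invented distributions."""
    rng = random.Random(seed)
    runners = []
    for _ in range(n):
        gender = "F" if rng.random() < 0.3 else "M"
        mean = 300 if gender == "F" else 270  # assumed 30-minute gap
        runners.append((gender, max(120.0, rng.gauss(mean, 40))))
    return runners


def segment(times):
    """Label each time fast/medium/slow by rank across all runners."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    labels = [None] * len(times)
    for rank, i in enumerate(order):
        labels[i] = LABELS[rank * 3 // len(times)]
    return labels


def female_share(runners, groups):
    """Fraction of each group's members whose gender is 'F'."""
    totals = Counter(groups)
    females = Counter(g for (gender, _), g in zip(runners, groups) if gender == "F")
    return {label: females[label] / totals[label] for label in LABELS}


runners = make_runners()
groups = segment([t for _, t in runners])
shares = female_share(runners, groups)
overall = sum(gender == "F" for gender, _ in runners) / len(runners)
print(f"overall: {overall:.1%} female")
for label, share in shares.items():
    print(f"{label:>6}: {share:.1%} female")
```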

0.0.3 Resegment the data for the biased group

Now let's segment the runners into fast, medium and slow groups separately for each gender. The groups are still equal in size overall, but now they are also equal in size within each gender.
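Segmenting within each gender amounts to running the same rank-based split separately for each gender. A minimal sketch, assuming (gender, finish minutes) pairs from a hypothetical synthetic `make_runners` helper; `segment_within` is an illustrative name, and in pandas a roughly equivalent approach would be a `groupby` on gender followed by a per-group `qcut`.

```python
import random
from collections import Counter, defaultdict

LABELS = ("fast", "medium", "slow")


def make_runners(n=3000, seed=0):
    """Hypothetical synthetic results: (gender, finish_minutes) pairs with invented distributions."""
    rng = random.Random(seed)
    runners = []
    for _ in range(n):
        gender = "F" if rng.random() < 0.3 else "M"
        mean = 300 if gender == "F" else 270  # assumed 30-minute gap
        runners.append((gender, max(120.0, rng.gauss(mean, 40))))
    return runners


def segment_within(runners):
    """Split into fast/medium/slow by rank separately within each gender."""
    by_gender = defaultdict(list)
    for i, (gender, _) in enumerate(runners):
        by_gender[gender].append(i)
    labels = [None] * len(runners)
    for idx in by_gender.values():
        idx.sort(key=lambda i: runners[i][1])  # fastest first within this gender
        for rank, i in enumerate(idx):
            labels[i] = LABELS[rank * 3 // len(idx)]
    return labels


runners = make_runners()
groups = segment_within(runners)
print("overall:", Counter(groups))
for gender in ("F", "M"):
    print(gender, Counter(g for (s, _), g in zip(runners, groups) if s == gender))
```

Because each gender is split into thirds on its own, each overall group is the union of one third of every gender, so overall group sizes can differ by at most one runner per gender.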

Now let's look at our groups again. Each gender now has its own finish-time threshold for each group, but every group has the same gender proportions, so group membership is no longer biased by gender.
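Both properties can be checked directly: each (gender, group) cell gets its own cut-off, while every group keeps the overall gender mix. A minimal sketch on hypothetical synthetic data (an assumed `make_runners` helper in which women are invented to be 30 minutes slower on average), using an illustrative per-gender rank split; the hypothetical `thresholds` function reports the slowest time in each cell as that cell's cut-off.

```python
import random
from collections import defaultdict

LABELS = ("fast", "medium", "slow")


def make_runners(n=3000, seed=0):
    """Hypothetical synthetic results: (gender, finish_minutes) pairs with invented distributions."""
    rng = random.Random(seed)
    runners = []
    for _ in range(n):
        gender = "F" if rng.random() < 0.3 else "M"
        mean = 300 if gender == "F" else 270  # assumed 30-minute gap
        runners.append((gender, max(120.0, rng.gauss(mean, 40))))
    return runners


def segment_within(runners):
    """Split into fast/medium/slow by rank separately within each gender."""
    by_gender = defaultdict(list)
    for i, (gender, _) in enumerate(runners):
        by_gender[gender].append(i)
    labels = [None] * len(runners)
    for idx in by_gender.values():
        idx.sort(key=lambda i: runners[i][1])
        for rank, i in enumerate(idx):
            labels[i] = LABELS[rank * 3 // len(idx)]
    return labels


def thresholds(runners, groups):
    """Slowest finish time in each (gender, group) cell, i.e. that cell's cut-off."""
    cut = {}
    for (gender, t), g in zip(runners, groups):
        cut[(gender, g)] = max(cut.get((gender, g), t), t)
    return cut


runners = make_runners()
groups = segment_within(runners)
cut = thresholds(runners, groups)
overall = sum(gender == "F" for gender, _ in runners) / len(runners)
shares = {}
for label in LABELS:
    members = [gender for (gender, _), g in zip(runners, groups) if g == label]
    shares[label] = members.count("F") / len(members)
    print(f"{label:>6}: F up to {cut[('F', label)]:.0f} min, "
          f"M up to {cut[('M', label)]:.0f} min, {shares[label]:.1%} female "
          f"(overall {overall:.1%})")
```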