2020-02-26 22:37:45
- “Man Is to Computer Programmer as Woman Is to Homemaker?”: word2vec embeddings turn out to be gender-biased.
- The problem here is that the training data used in machine learning applications can often contain all kinds of hidden (and not-so-hidden) biases, and the act of building complex models from such data can both amplify these biases and introduce new ones.
more optimistic subtitle: “Debiasing Word Embeddings”
The algorithm is able to “subtract off” the bias in the data associated with non-gendered words.
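A minimal sketch of the “subtract off” idea, using made-up toy vectors (not real word2vec embeddings). The `neutralize` helper and the 3-d vectors are illustrative assumptions; the actual paper's method has more steps, but the core is projecting out a gender direction:

```python
import numpy as np

def neutralize(word_vec, gender_direction):
    """Remove the component of word_vec lying along the gender direction:
    v' = v - (v . g / ||g||^2) * g   (a toy version of the 'neutralize' step)."""
    g = gender_direction
    return word_vec - (word_vec @ g) / (g @ g) * g

# Made-up 3-d "embeddings" for illustration only (not real word2vec vectors).
he = np.array([1.0, 0.2, 0.1])
she = np.array([-1.0, 0.2, 0.1])
programmer = np.array([0.4, 0.9, 0.3])  # leans toward "he" in this toy space

g = he - she                       # a crude estimate of the gender direction
debiased = neutralize(programmer, g)

# Before: "programmer" is closer to "he"; after: equidistant from "he" and "she".
print(programmer @ he, programmer @ she)   # 0.61 vs -0.19
print(debiased @ he, debiased @ she)       # 0.21 vs 0.21
```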
But “what information should be allowed in a standard prediction”?
Contentious debate on using: race, gender, age, sex
But what if they are needed for “equity”/“fairness”?
These questions do not have easy answers, and human judgments and norms will always need to play a central role in the debate.
And even if we settle on just one of them, the predictions made by a model obeying a fairness constraint will, as a general rule, always be less accurate than the predictions made by a model that doesn’t have to; the only question is how much less accurate.
Science can shed light on the pros and cons of different definitions, as we’ll see, but it can’t decide on right and wrong.
CASE 1:
Push the cutoff back: to be “more fair” we accept less accuracy: now 8 mistakes.
improving fairness will degrade accuracy, and vice versa.
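A toy illustration of CASE 1 (made-up scores and repay/default outcomes, not the book's numbers): pushing the cutoff back shrinks the approval-rate gap between the two groups but increases the total number of mistakes:

```python
import numpy as np

# Toy lending data for two groups (made-up numbers). 1 = repaid, 0 = defaulted.
scores_a = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4]); repaid_a = np.array([1, 1, 1, 1, 1, 0])
scores_b = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1]); repaid_b = np.array([1, 1, 0, 1, 0, 0])

def mistakes(scores, repaid, cutoff):
    """Errors = approved defaulters + rejected repayers."""
    approved = scores >= cutoff
    return int((approved & (repaid == 0)).sum() + (~approved & (repaid == 1)).sum())

for cutoff in (0.45, 0.25):  # accuracy-focused cutoff vs. "pushed back" cutoff
    total = mistakes(scores_a, repaid_a, cutoff) + mistakes(scores_b, repaid_b, cutoff)
    gap = abs((scores_a >= cutoff).mean() - (scores_b >= cutoff).mean())
    print(f"cutoff={cutoff}: total mistakes={total}, approval-rate gap={gap:.2f}")

# cutoff=0.45 -> 1 mistake but a 0.50 approval-rate gap;
# cutoff=0.25 -> gap shrinks to 0.33, mistakes rise to 2: fairness costs accuracy here.
```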
CASE 2:
- This might indeed be a good thing to do, increasing both fairness and accuracy at the same time.
- But “race” is used as an input here: the fairness constraint forbids this.
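A toy sketch of CASE 2 (made-up numbers): when the same numeric score means something different in each group, using group membership to set separate cutoffs can improve both accuracy and fairness at once, but that requires taking the protected attribute as an input:

```python
import numpy as np

# Toy data: group B's reliable repayers sit at lower scores than group A's
# (made-up numbers). 1 = repaid, 0 = defaulted.
scores_a = np.array([0.9, 0.8, 0.7, 0.5, 0.45, 0.4]); repaid_a = np.array([1, 1, 1, 0, 0, 0])
scores_b = np.array([0.5, 0.45, 0.4, 0.2, 0.15, 0.1]); repaid_b = np.array([1, 1, 1, 0, 0, 0])

def mistakes(scores, repaid, cutoff):
    approved = scores >= cutoff
    return int((approved & (repaid == 0)).sum() + (~approved & (repaid == 1)).sum())

# Group-blind: one shared cutoff for everyone (group membership not used).
blind = mistakes(scores_a, repaid_a, 0.6) + mistakes(scores_b, repaid_b, 0.6)

# Group-aware: separate cutoffs, which requires knowing group membership.
aware = mistakes(scores_a, repaid_a, 0.6) + mistakes(scores_b, repaid_b, 0.35)

print("group-blind mistakes:", blind)   # 3 (all of group B's repayers rejected)
print("group-aware mistakes:", aware)   # 0, and equal approval rates (0.5 each)
```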
But a college loan or admission decision is already “algorithmic,” since it is maximizing “predictive accuracy.”
It’s Just that when maximizing accuracy across multiple different populations, an algorithm will naturally optimize better for the majority population, at the expense of the minority population—since by definition there are more people from the majority group, and hence they contribute more to the overall accuracy of the model.
So there is simply no escaping that predictive accuracy and notions of fairness (and privacy, transparency, and many other social objectives) are simply different criteria, and that optimizing for one of them may force us to do worse than we could have on the other. This is a fact of life in machine learning. The only sensible response to this fact—from a scientific, regulatory, legal, or moral perspective—is to acknowledge it and to try to directly measure and manage the trade-offs between accuracy and fairness.
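A quick sketch of the majority-vs-minority point above (made-up data): the single cutoff that minimizes total mistakes fits the larger group well and pushes most of the error onto the smaller group, which is why per-group error rates are worth measuring explicitly:

```python
import numpy as np

# Toy data: majority group A (12 people) vs. minority group B (4 people).
scores_a = np.tile([0.8, 0.7, 0.35, 0.3], 3); repaid_a = np.tile([1, 1, 0, 0], 3)
scores_b = np.array([0.3, 0.25, 0.15, 0.1]);  repaid_b = np.array([1, 1, 0, 0])

def errors(scores, repaid, cutoff):
    approved = scores >= cutoff
    return int((approved & (repaid == 0)).sum() + (~approved & (repaid == 1)).sum())

# Pick the shared cutoff that minimizes TOTAL mistakes over everyone.
scores = np.concatenate([scores_a, scores_b])
repaid = np.concatenate([repaid_a, repaid_b])
best = min(np.unique(scores), key=lambda c: errors(scores, repaid, c))

print("overall-optimal cutoff:", best)
print("error rate, group A:", errors(scores_a, repaid_a, best) / len(scores_a))  # 0.0
print("error rate, group B:", errors(scores_b, repaid_b, best) / len(scores_b))  # 0.5
```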
But ultimately, science can only take us so far, and human judgments and norms will always play the essential role of choosing where on such curves we want society to be, and what notion of fairness we want to enforce in the first place. Good algorithm design can specify a menu of solutions, but people still have to pick one of them.
Q: Race/gender/other discrimination and bias are human-made creations, not algorithmic inventions. Unless the surrounding environment (and the data it generates) improves, won't algorithms always carry this bias?