2020-02-26 22:37:45

Discussion Paper(s): Chap 2

Overview:

What to take away?

  • the scientific notion of algorithmic & human bias and discrimination.
  • how to detect & measure them.
  • how to design fairer algorithmic solutions.
  • what the cost of fairness is for predictive accuracy and other objectives.

Bias By Analogy: Word Embeddings and Biases:


- “Man Is to Computer Programmer as Woman Is to Homemaker?”: the word2vec embeddings are gender biased, and the analogy falls out of simple vector arithmetic (sketched below).
- The problem here is that the training data used in machine learning applications can often contain all kinds of hidden (and not-so-hidden) biases, and the act of building complex models from such data can both amplify these biases and introduce new ones.
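
A minimal sketch of the analogy arithmetic behind the title, using made-up 3-d toy vectors purely for illustration (real word2vec embeddings are learned from text and have hundreds of dimensions); only the vector arithmetic and the cosine-similarity lookup reflect the actual mechanism.

```python
# Toy illustration of "man : programmer :: woman : ?" via embedding arithmetic.
import numpy as np

vocab = {  # made-up vectors; real embeddings come from word2vec training
    "man":        np.array([ 1.0, 0.2, 0.1]),
    "woman":      np.array([-1.0, 0.2, 0.1]),
    "programmer": np.array([ 0.9, 0.8, 0.3]),
    "homemaker":  np.array([-0.9, 0.7, 0.3]),
    "doctor":     np.array([ 0.1, 0.9, 0.6]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# programmer - man + woman: the analogy is completed by the nearest remaining word.
query = vocab["programmer"] - vocab["man"] + vocab["woman"]
best = max((w for w in vocab if w not in {"man", "woman", "programmer"}),
           key=lambda w: cosine(query, vocab[w]))
print(best)  # with these toy vectors, "homemaker" wins the analogy
```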

But, Optimistic:

  • more optimistic subtitle: “Debiasing Word Embeddings”.

  • Suggests a principled algorithm for building models that can avoid or reduce those concerns
  • The algorithm is able to “subtract off” the bias in the data associated with non-gendered words (see the projection sketch below).
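
A minimal sketch of the “subtract off” step, assuming a single definitional pair (woman, man) to estimate the gender direction; the paper’s full algorithm averages over many such pairs and also equalizes explicitly gendered words, which is omitted here. Vectors are toy values again.

```python
# Project the gender direction out of words that should be gender-neutral.
import numpy as np

def gender_direction(emb):
    # Crude estimate from one definitional pair; the real method averages many pairs.
    d = emb["woman"] - emb["man"]
    return d / np.linalg.norm(d)

def debias(vec, direction):
    # Remove the component of vec that lies along the bias direction.
    return vec - (vec @ direction) * direction

emb = {  # toy vectors for illustration only
    "man":        np.array([ 1.0, 0.2, 0.1]),
    "woman":      np.array([-1.0, 0.2, 0.1]),
    "programmer": np.array([ 0.9, 0.8, 0.3]),
}
g = gender_direction(emb)
emb["programmer"] = debias(emb["programmer"], g)
print(emb["programmer"] @ g)  # ~0: "programmer" no longer leans along the gender axis
```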

Learning All About You:

  • Word Embeddings : Unsupervised
  • Most applications are “supervised”: predicting aggregate behaviour.
  • With the massive data trails we leave behind, machine learning can now move from predictions about the collective to predictions about the individual.
  • And once predictions could be personalized, so could discrimination
  • When machine learning gets personal, however, mistakes of prediction can cause real harms to specific individuals
  • Arenas where ML is used range from
    • mundane : Google ads, Netflix suggestions
    • highly consequential : mortgage application, college acceptance, recidivism.
  • Thus, anywhere ML is applied, discrimination & bias are real concerns, often because of the science behind it.
  • Addressing it by modifying the science comes at a cost.

YOU ARE YOUR VECTOR

  • Supervised Learning Example:
  • Kate: predict outcome \(y\) given features \(x\), from labeled pairs \(\langle x, y \rangle\) (see the sketch after this list)
    • \(y\): accepted by college or not; college loan approved or not
  • But “what information should be allowed in a standard prediction”?

  • Contentious debate on using: race, gender, age, sex

  • But what if they are needed for “equity”/“fairness”

  • These questions do not have easy answers, and human judgments and norms will always need to play a central role in the debate.
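
To make the \(\langle x, y \rangle\) setup concrete, a minimal supervised-learning sketch with hypothetical applicant features and a synthetic approval label (none of this is the book’s data; scikit-learn’s LogisticRegression just stands in for “some model”).

```python
# Predict an outcome y (loan approved or not) from a feature vector x.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical applicant features: [income (k$), credit score, age]
X = np.column_stack([
    rng.normal(50, 15, n),
    rng.normal(650, 80, n),
    rng.integers(21, 70, n),
])
# Synthetic label: approved (1) or not (0), driven here by income and credit score.
y = ((0.02 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 1, n)) > 7.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)   # learn from <x, y> pairs
kate = np.array([[45.0, 700.0, 29.0]])                # a new applicant's x
print(model.predict(kate))                            # predicted y: approve / reject
print(model.predict_proba(kate)[0, 1])                # estimated approval probability
```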

Forbidden Inputs

  • Defining fairness by forbidding the use of certain information has become an infeasible approach in the machine learning era.

  • An alternative approach is to instead define fairness relative to the actual decisions or predictions a model makes—in other words, to define the fairness of the model’s outputs \(y\) rather than its inputs \(x\).
    • has been successful but also is not without its own drawbacks and complexities
  • And even if we settle on just one of them, the predictions made by a model obeying a fairness constraint will, as a general rule, always be less accurate than the predictions made by a model that doesn’t have to; the only question is how much less accurate.

  • Science can shed light on the pros and cons of different definitions, as we’ll see, but it can’t decide on right and wrong.

Defining Fairness

Simple Notion: Statistical Parity

  • First, identify the group of individuals to protect : Circles and Squares (who gets the loan?)
  • Statistical parity simply asks that the fraction of Square applicants that are granted loans be approximately the same as the fraction of Circle applicants that are granted loans
    • it’s just a crude constraint saying that the rate of granted loans has to be roughly the same for both races
  • Statistical Parity : Weak and Flawed
    • First, SP doesn’t make any mention of the input vector \(x\) - individual features.
    • Second, SP also makes no mention of \(y\) - the ultimate creditworthiness of an individual.

  • In the era of data and machine learning, society will have to accept, and make decisions about, trade-offs between how fair models are and how accurate they are.
  • SP can be remedied by distributing the MISTAKES evenly rather than the loans.
  • i.e., require that the rate of false rejections be the same for Circles and Squares.
  • This is an equality of false negatives constraint (see the sketch below).
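
A small sketch (my own illustration, not the book’s code) of checking both notions, given each applicant’s group, true creditworthiness \(y\), and the model’s decision; the tiny arrays at the bottom are made up.

```python
import numpy as np

def statistical_parity_gap(group, yhat):
    # Difference in loan-granting rates between the two groups.
    rates = [yhat[group == g].mean() for g in np.unique(group)]
    return abs(rates[0] - rates[1])

def false_negative_gap(group, y, yhat):
    # Difference in false-rejection rates among the truly creditworthy (y == 1).
    fnrs = []
    for g in np.unique(group):
        mask = (group == g) & (y == 1)
        fnrs.append((yhat[mask] == 0).mean())
    return abs(fnrs[0] - fnrs[1])

group = np.array(["circle"] * 4 + ["square"] * 4)
y     = np.array([1, 1, 0, 0,  1, 1, 1, 0])   # true creditworthiness
yhat  = np.array([1, 1, 0, 0,  1, 0, 0, 0])   # decisions made by some model
print(statistical_parity_gap(group, yhat))    # 0.50 - 0.25 = 0.25
print(false_negative_gap(group, y, yhat))     # 0 vs 2/3 -> ~0.67
```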

Fairness Fighting Accuracy

CASE 1:

  • On this data, the best model from a pure accuracy perspective is the cutoff labeled as “optimal.”
  • Made 7 mistakes: 2 circles and 5 squares.
  • This violates the equality of false negatives notion of fairness.
  • Push the cutoff back: being “more fair” costs accuracy, now 8 mistakes (see the cutoff sketch below).

  • improving fairness will degrade accuracy, and vice versa.
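
A toy version of the cutoff story with made-up scores (not the book’s exact 7-vs-8 figure): sweeping a 1-d cutoff shows that the error-minimizing choice and the false-negative-gap-minimizing choice are different cutoffs.

```python
import numpy as np

# 1-d creditworthiness scores and true repayment labels for the two groups (made up).
circles_score = np.array([2, 3, 5, 6, 7, 8]); circles_y = np.array([0, 0, 1, 1, 1, 1])
squares_score = np.array([1, 2, 3, 4, 5, 6]); squares_y = np.array([0, 1, 1, 0, 1, 1])

def evaluate(cutoff):
    errors, fnr = 0, {}
    for name, s, y in [("circle", circles_score, circles_y),
                       ("square", squares_score, squares_y)]:
        yhat = (s >= cutoff).astype(int)          # grant a loan iff score >= cutoff
        errors += int((yhat != y).sum())
        fnr[name] = ((yhat == 0) & (y == 1)).sum() / max(y.sum(), 1)
    return errors, abs(fnr["circle"] - fnr["square"])

for cutoff in range(1, 9):
    errors, gap = evaluate(cutoff)
    print(f"cutoff={cutoff}  mistakes={errors}  false-negative gap={gap:.2f}")
# On this data the error-minimizing cutoff is not the gap-minimizing one: lowering the
# cutoff shrinks the gap but admits more non-repayers, i.e., costs extra mistakes.
```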

CASE 2:


- Using the group attribute to fit each population separately might indeed be a good thing to do, increasing both fairness and accuracy at the same time.
- But that requires “race” as an input, and the forbidden-inputs notion of fairness forbids this.

  • Note: the word-embedding bias comes from human judgements embedded in the data.
  • But the bias in college loans or applications can be purely “algorithmic”, since it arises from maximizing “predictive accuracy”.

  • It’s just that when maximizing accuracy across multiple different populations, an algorithm will naturally optimize better for the majority population, at the expense of the minority population—since by definition there are more people from the majority group, and hence they contribute more to the overall accuracy of the model (see the sketch after this list).

  • So there is simply no escaping that predictive accuracy and notions of fairness (and privacy, transparency, and many other social objectives) are simply different criteria, and that optimizing for one of them may force us to do worse than we could have on the other. This is a fact of life in machine learning. The only sensible response to this fact—from a scientific, regulatory, legal, or moral perspective—is to acknowledge it and to try to directly measure and manage the trade-offs between accuracy and fairness.
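
A small made-up demonstration of the majority/minority point above: fit a single accuracy-maximizing cutoff to a 90/10 mix of two groups whose score-to-outcome relationship (hypothetically) differs, and the mistakes concentrate on the minority.

```python
import numpy as np

rng = np.random.default_rng(1)
# Majority: 900 people; repayers tend to have HIGH scores.
maj_y = rng.integers(0, 2, 900)
maj_s = np.where(maj_y == 1, rng.normal(7, 1, 900), rng.normal(3, 1, 900))
# Minority: 100 people; here (hypothetically) repayers tend to have LOW scores.
min_y = rng.integers(0, 2, 100)
min_s = np.where(min_y == 1, rng.normal(3, 1, 100), rng.normal(7, 1, 100))

scores = np.concatenate([maj_s, min_s])
labels = np.concatenate([maj_y, min_y])
group  = np.array(["maj"] * 900 + ["min"] * 100)

# Pick the single cutoff that maximizes OVERALL accuracy.
cutoffs = np.linspace(scores.min(), scores.max(), 200)
accs = [((scores >= c).astype(int) == labels).mean() for c in cutoffs]
best = cutoffs[int(np.argmax(accs))]

yhat = (scores >= best).astype(int)
for g in ["maj", "min"]:
    print(g, "accuracy:", round((yhat[group == g] == labels[group == g]).mean(), 2))
# Typical result: majority accuracy near 1.0, minority accuracy near 0.0, even though
# overall accuracy looks respectable (~0.9): the majority dominates the objective.
```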

NO SUCH THING AS A FAIR LUNCH

  • How might we go about exploring this trade-off in a quantitative and systematic fashion—in other words, algorithmically?
  • The basic idea is just a search for the model with the lowest overall error.
  • Using the same principles as for standard error-minimizing machine learning, we could instead design algorithms for unfairness-minimizing machine learning.
  • Putting both together, we get two values for each candidate model:
    • 1. the number of mistakes it makes on the data, and
    • 2. its unfairness score on the data.
  • The algorithm enumerates these pairs for all models and picks the “best” trade-off (sketched below).

Pareto Frontier / Pareto Curves

  • The Pareto frontier makes our problem as quantitative as possible, but no more so.
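
A minimal sketch of how such a curve could be traced for a toy model family (all cutoffs on a synthetic 1-d score); “unfairness” here is the false-negative-rate gap, which is just one of many possible choices.

```python
import numpy as np

rng = np.random.default_rng(2)
group = rng.integers(0, 2, 400)                       # 0 = Circle, 1 = Square
y = rng.integers(0, 2, 400)                           # true creditworthiness
score = y * 5 + group * 1.5 + rng.normal(0, 2, 400)   # a synthetic, group-skewed score

def error_and_unfairness(cutoff):
    yhat = (score >= cutoff).astype(int)
    err = (yhat != y).mean()
    fnr = [((yhat == 0) & (y == 1) & (group == g)).sum() / ((y == 1) & (group == g)).sum()
           for g in (0, 1)]
    return err, abs(fnr[0] - fnr[1])      # (mistake rate, unfairness score)

points = [error_and_unfairness(c) for c in np.linspace(score.min(), score.max(), 100)]
# Keep only Pareto-optimal models: no other model is at least as good on both criteria.
pareto = [p for p in points
          if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]
print(sorted(set(pareto)))   # the menu of achievable (error, unfairness) trade-offs
```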

Discussion Thought :

Authors’ Conclusion:

But ultimately, science can only take us so far, and human judgments and norms will always play the essential role of choosing where on such curves we want society to be, and what notion of fairness we want to enforce in the first place. Good algorithm design can specify a menu of solutions, but people still have to pick one of them.

What Do We Think:

  • Q: Race/Gender/Other discrimination and bias is a man-made creation, not an algorithmic invention. Unless the environment itself advances, won’t algorithms always have a bias?

  • Q: Fairness is difficult to quantify: fairness to a Democrat is different from fairness to a Republican.
  • Q: Do we need a “moral fairness” or a “theoretical fairness” measure? And how?
  • Why don’t we “simplify” the black box so it reveals which features were important and by how much? (see the sketch below)
    • If “race==black” strongly predicted “recidivism==yes”: BAD! Human judgement needed.
    • But if “race==black” strongly predicted “loan==yes”: GOOD! Go ahead.
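
One hedged way to act on this question: fit an interpretable model (or a simple surrogate of the black box) and read off per-feature weights. The feature names and data below are hypothetical, and a real audit would also have to worry about correlated proxies for the sensitive attribute.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
features = ["income", "credit_score", "zip_code_income", "age"]
X = rng.normal(size=(n, len(features)))
# Synthetic decisions driven mostly by income and credit score.
y = ((1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.1 * X[:, 3] + rng.normal(0, 1, n)) > 0).astype(int)

model = LogisticRegression().fit(X, y)
for name, w in sorted(zip(features, model.coef_[0]), key=lambda t: -abs(t[1])):
    print(f"{name:>16}: {w:+.2f}")   # which features drive the decision, and how strongly
```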