HW 1, Due: Tuesday, January 31

Case Study #3

Reading: ISLBS (pg. 14 - 36)

ISLBS: Chapter 1 Exercises (pg. 77).

Collaborators (for this assignment): [Insert name(s) of other students]

Case Study # 3.

In a widely cited 2016 study, computer scientists from Princeton University and the University of Bath demonstrated that the learning algorithms commonly used in natural language processing to represent relationships between word meanings consistently reflect significant, harmful racial and gender biases.

For example, one of the tools they studied, GloVe (Global Vectors for Word Representation), is a learning algorithm for creating word embeddings—vector representations that capture similarities and associations among word meanings in terms of the distance between vectors. Thus the vectors for the words ‘water’ and ‘rain’ appear much closer together than the vectors for the words ‘water’ and ‘red.’ As with other similar data models for natural language processing, when GloVe is trained on a body of text from the Web, it learns to reflect in its own outputs “accurate imprints of [human] historic biases” (Caliskan-Islam, Bryson, and Narayanan, 2016). Some of these biases are based in objective reality (like our ‘water’ and ‘rain’ example above). Others reflect subjective values that are (for the most part) morally neutral—for example, names for flowers (rose, lilac, tulip) are much more strongly associated with pleasant words (such as freedom, honest, miracle, and lucky), whereas names for insects (ant, beetle, hornet) are much more strongly associated (have nearer vectors) with unpleasant words (such as filth, poison, and rotten).
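To make the idea of ‘distance between vectors’ concrete, here is a minimal Python sketch. The three-dimensional vectors are invented for illustration only, not actual GloVe values; real GloVe embeddings are learned from a text corpus and typically have 50 to 300 dimensions. The cosine similarity of ‘water’ and ‘rain’ comes out much higher than that of ‘water’ and ‘red.’

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: near 1.0 means very similar directions.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors invented for illustration; not actual GloVe values.
embeddings = {
    "water": np.array([0.9, 0.8, 0.1]),
    "rain":  np.array([0.8, 0.9, 0.2]),
    "red":   np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(embeddings["water"], embeddings["rain"]))  # high, about 0.99
print(cosine_similarity(embeddings["water"], embeddings["red"]))   # low, about 0.30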

However, other biases in the data models, especially those concerning race and gender, are neither objective nor harmless. As it turns out, for example, common European American names such as Ryan, Jack, Amanda, and Sarah were far more closely associated in the model with the pleasant terms (such as joy, peace, wonderful, and friend), while common African American names such as Tyrone, Darnell, and Keisha were far more likely to be associated with the unpleasant terms (such as terrible, nasty, and failure).

Common names for men were also much more closely associated with career-related words such as ‘salary’ and ‘management’ than were common names for women, which were more closely associated with domestic words such as ‘home’ and ‘relatives.’ Career and educational stereotypes by gender were also strongly reflected in the model’s output. The study’s authors note that this is not a deficit of a particular tool, such as GloVe, but a pervasive problem across many data models and tools trained on a corpus of human language use. Because people are (and have long been) biased in harmful and unjust ways, data models that learn from human output will carry those harmful biases forward. Often the human biases are actually concentrated or amplified by the data model.
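The associations described above can be quantified with a score in the spirit of the study’s word-embedding association test: a target word’s mean cosine similarity to a set of pleasant words minus its mean similarity to a set of unpleasant words. The Python sketch below is illustrative rather than the authors’ exact method; ‘embeddings’ stands for any word-to-vector mapping (for example, vectors loaded from pretrained GloVe files), and the names ‘glove’, ‘PLEASANT’, and ‘UNPLEASANT’ in the usage comment are hypothetical.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word, pleasant, unpleasant, embeddings):
    # Mean similarity to the pleasant set minus mean similarity to the unpleasant set;
    # a positive value means the word sits closer to the pleasant words.
    w = embeddings[word]
    s_pleasant = np.mean([cosine(w, embeddings[a]) for a in pleasant])
    s_unpleasant = np.mean([cosine(w, embeddings[b]) for b in unpleasant])
    return s_pleasant - s_unpleasant

# Hypothetical usage, assuming glove maps words to vectors and PLEASANT /
# UNPLEASANT are lists of attribute words like those in the case study:
# gap = association("Ryan", PLEASANT, UNPLEASANT, glove) \
#     - association("Tyrone", PLEASANT, UNPLEASANT, glove)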

Does it raise ethical concerns that biased tools are used to drive many tasks in big data analytics, from sentiment analysis (e.g., determining whether an interaction with a customer is pleasant), to hiring solutions (e.g., ranking resumes), to ad service and search (e.g., showing you customized content), to social robotics (understanding and responding appropriately to humans in a social setting) and many other applications? Yes.

Question 2.6: Of the eight types of ethical challenges for data practitioners that we listed in Part Two, which types are most relevant to the word embedding study? Briefly explain your answer.

The types of ethical challenges most relevant to the word embedding study are identifying and addressing ethically harmful data bias, and validation and testing of data models and analytics. This is because the algorithm the researchers studied may need better understanding or updates so that it associates words with other contexts more appropriately. There could also be human bias in how the algorithm was created.

The two ethical challenges are: 1) identifying and addressing ethically harmful data bias, and 2) validation and testing of data models and analytics.

Question 2.7: What ethical concerns should data practitioners have when relying on word embedding tools in natural language processing tasks and other big data applications? Put another way, what ethical questions should such practitioners ask themselves when using such tools?

[Data practitioners should ask how reliable their algorithm is and whether, when it is used in big data applications, it will provide accurate and unbiased measures of the data.]

Question 2.8: Some researchers have designed ‘debiasing techniques’ to address the problem of biased word embeddings (Bolukbasi et al. 2016). Such techniques quantify the harmful biases, and then use algorithms to reduce or cancel out the harmful biases that would otherwise appear and be amplified by the word embeddings. Can you think of any significant tradeoffs or risks of this solution? Can you suggest any other possible solutions or ways to reduce the ethical harms of such biases?

[One risk of this solution is that the debiasing techniques could themselves be biased, or could introduce new harmful biases. Another possible solution would be to rebuild or retrain the algorithm so that it does not learn or use the harmful associations in the first place.]
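For reference, below is a simplified Python sketch of the kind of debiasing Question 2.8 describes (in the spirit of Bolukbasi et al. 2016): estimate a ‘bias direction’ from a definitional word pair such as (‘he’, ‘she’), then remove each gender-neutral word’s component along that direction. Real implementations use many definitional pairs and principal component analysis; the names ‘glove’ and ‘engineer’ in the usage comment are hypothetical.

import numpy as np

def bias_direction(embeddings, pair=("he", "she")):
    # Unit vector along the difference of a definitional word pair.
    d = embeddings[pair[0]] - embeddings[pair[1]]
    return d / np.linalg.norm(d)

def neutralize(vector, direction):
    # Remove the component of the vector that lies along the bias direction.
    return vector - np.dot(vector, direction) * direction

# Hypothetical usage, assuming glove maps words to numpy vectors:
# g = bias_direction(glove)
# glove["engineer"] = neutralize(glove["engineer"], g)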

Question 2.9: Identify four different uses/applications of data in which racial or gender biases in word embeddings might cause significant ethical harms, then briefly describe the specific harms that might be caused in each of the four applications, and who they might affect.

[Applications in which racial or gender biases in word embeddings might cause significant ethical harm include marketing food products, clothing, and sports, and ranking job applicants. In marketing for food, clothing, or sports, biased embeddings could harmfully associate certain products or activities with certain groups of people, reinforcing stereotypes and affecting the consumers being targeted. In hiring, a system that uses word embeddings to rank resumes for a job vacancy could introduce both racial and gender bias, harming the applicants who are unfairly screened out.]

Question 2.10: Bias appears not only in language datasets but in image data. In 2016, a site called beauty.ai, supported by Microsoft, Nvidia and other sponsors, launched an online ‘beauty contest’ which solicited approximately 6000 selfies from 100 countries around the world. Of the entrants, 75% were white and of European descent. Contestants were judged on factors such as facial symmetry, lack of blemishes and wrinkles, and how young the subjects looked for their age group. But of the 44 winners picked by a ‘robot jury’ (i.e., by beauty-detecting algorithms trained by data scientists), only 2% (1 winner) had dark skin, leading to media stories about the ‘racist’ algorithms driving the contest. How might the bias have got into the algorithms built to judge the contest, if we assume that the data scientists did not intend a racist outcome?

[The bias may have gotten into the algorithms through the factors used to judge the images. Judging by symmetry and by a lack of blemishes and wrinkles implies that ‘blank slate’ faces are more beautiful, and penalizes entrants with blemishes or other perceived imperfections. In addition, the tools used to recognize and score darker skin tones may need updating and proper adjustment, especially since most of the images the algorithms learned from were of white, European faces.]
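As a toy illustration of this answer (with invented numbers, not data from the contest): if a scoring model’s notion of a ‘typical’ face is learned from a sample that is 75% one group, faces from the under-represented group end up farther from the learned prototype and score lower, even though nobody intended a discriminatory outcome.

import numpy as np

rng = np.random.default_rng(0)

# Invented "face feature" vectors for two groups, drawn around different means,
# mimicking a 75% / 25% split in the training data.
group_a = rng.normal(loc=0.0, scale=1.0, size=(75, 8))
group_b = rng.normal(loc=2.0, scale=1.0, size=(25, 8))

# A naive scorer: "beauty" is closeness to the average training face.
prototype = np.vstack([group_a, group_b]).mean(axis=0)

def score(x):
    return -np.linalg.norm(x - prototype)

print(np.mean([score(x) for x in group_a]))  # higher: close to the learned prototype
print(np.mean([score(x) for x in group_b]))  # lower: penalized by the training skew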

ISLBS 1.6

In this study, the explanatory variables are x and

ISLBS 1.14

ISLBS 1.16

ISLBS 1.22