In 1973, UC Berkeley’s graduate school was accused of gender bias in its admissions process. Although the school had not been sued for gender discrimination, it was concerned about potential legal action and asked statistician Peter Bickel to examine the data. At the beginning of the academic year, the graduate school had admitted roughly 44% of its male applicants and 35% of its female applicants (Grigg). The graph below shows the percentage of applicants accepted by gender.

After Bickel analyzed the data, he discovered something surprising: four of the six departments showed a statistically significant gender bias in favor of women, while the other two showed no significant bias (Grigg). His team also found that women tended to apply to departments with lower overall acceptance rates. In other words, the conclusion flipped once Bickel’s team broke the data down by department. This hidden factor, the department each applicant chose, affected the acceptance percentages in a way that reversed the trend seen in the overall data, as we can see in the graph below.

What we have seen in this report is an example of Simpson’s paradox. It highlights the need for good intuition about the real world and for remembering that most data is a finite-dimensional representation of a much larger, much more complex domain. Edward Hugh Simpson, a statistician and former cryptanalyst at Bletchley Park (Grigg), described the statistical phenomenon that now bears his name: a trend that is present when data is put into groups reverses or disappears when the data is combined.
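
To make the reversal concrete, here is a minimal Python sketch. The applicant counts are hypothetical, chosen only to illustrate the effect; they are not the Berkeley figures, and the two department names and helper functions are made up for this example.

```python
# Hypothetical applicant counts (NOT the real Berkeley data).
# Structure: department -> gender -> (admitted, applied)
applications = {
    "Dept A (high acceptance)": {"men": (480, 800), "women": (140, 200)},
    "Dept B (low acceptance)": {"men": (40, 200), "women": (200, 800)},
}


def rate(admitted, applied):
    """Acceptance rate as a fraction between 0 and 1."""
    return admitted / applied


def overall_rate(data, gender):
    """Acceptance rate for one gender with all departments combined."""
    admitted = sum(dept[gender][0] for dept in data.values())
    applied = sum(dept[gender][1] for dept in data.values())
    return rate(admitted, applied)


# Rates within each department (the grouped view).
for name, dept in applications.items():
    men = rate(*dept["men"])
    women = rate(*dept["women"])
    print(f"{name}: men {men:.0%}, women {women:.0%}")

# Rates with the departments pooled together (the combined view).
men_all = overall_rate(applications, "men")
women_all = overall_rate(applications, "women")
print(f"Overall: men {men_all:.0%}, women {women_all:.0%}")
```

In this made-up data, women have the higher acceptance rate in both departments (70% versus 60%, and 25% versus 20%), yet the lower rate overall (34% versus 52%), because most of the women applied to the department that accepts fewer applicants.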

Works Cited:

Grigg, Tom. “Simpson’s Paradox and Interpreting Data.” Towards Data Science, 31 Jan. 2020, https://towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765. Accessed 8 Sept. 2024.