According to the article “Simpson’s Paradox and Interpreting Data” by Tom Grigg, Simpson’s Paradox is defined as “A trend or result that is present when data is put into groups that reverses or disappears when the data is combined” (Grigg 2018). In other words, a trend that appears within groups can disappear, or even reverse, when the groups are pooled and a lurking variable is ignored. The article describes lurking variables as “hidden variables that split data into multiple separate distributions” (Grigg 2018). A well-known case is UC-Berkeley’s graduate admissions data from 1973, in which male applicants were admitted at a higher rate than female applicants. Statistician Peter Bickel then looked deeper into the data, broken down by department, and found that the apparent bias against women disappeared; if anything, there was a small but statistically significant bias in favor of women. In the charts below, I explain and support Bickel’s argument as an example of Simpson’s Paradox.
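
The definition can be made concrete as an inequality between ratios. The sketch below uses only departments A and F, with admitted/applicant counts taken from R's built-in UCBAdmissions dataset, which I am assuming is the data behind the charts. Females have the higher admission rate in each department on its own, yet the lower rate once the two departments are pooled:

```latex
\begin{align*}
&\text{Dept A (female vs. male): } \frac{89}{108} \approx 0.824 \;>\; \frac{512}{825} \approx 0.621 \\
&\text{Dept F (female vs. male): } \frac{24}{341} \approx 0.070 \;>\; \frac{22}{373} \approx 0.059 \\
&\text{A and F pooled: } \frac{89+24}{108+341} = \frac{113}{449} \approx 0.252 \;<\; \frac{512+22}{825+373} = \frac{534}{1198} \approx 0.446
\end{align*}
```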

Graph 1: Male vs. Female Admission Rate

This chart shows the overall admission rates of male and female applicants to UC-Berkeley’s graduate program. Males were admitted at a noticeably higher rate than females.

Graph 2: Admission Rate by Department and Gender

This chart compares the admission rates of male and female applicants within each of the six departments. Consistent with Bickel’s findings, and contrary to the previous graph, females are admitted at a higher rate in four of the six departments (A, B, D, and F).
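
The department-level comparison can be checked with a short Python sketch (this is not the code used to draw the charts). It assumes the admitted/rejected counts from R's built-in UCBAdmissions dataset; the dictionary `UCB` and the helper functions are hypothetical names of my own. It computes one pooled rate per gender and then the rate within each department:

```python
# Counts per department as (admitted, rejected) for each gender.
# Assumed to match R's built-in UCBAdmissions dataset (six departments, 1973).
UCB = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def rate(admitted: int, rejected: int) -> float:
    """Admission rate = admitted / total applicants."""
    return admitted / (admitted + rejected)


def overall_rate(gender: str) -> float:
    """Pool all six departments, then compute a single rate for one gender."""
    admitted = sum(dept[gender][0] for dept in UCB.values())
    applied = sum(sum(dept[gender]) for dept in UCB.values())
    return admitted / applied


def departments_favoring_females() -> list[str]:
    """Departments where the female admission rate exceeds the male rate."""
    return [
        name for name, dept in UCB.items()
        if rate(*dept["Female"]) > rate(*dept["Male"])
    ]


if __name__ == "__main__":
    print(f"Overall: male {overall_rate('Male'):.3f}, female {overall_rate('Female'):.3f}")
    for name, dept in UCB.items():
        print(f"Dept {name}: male {rate(*dept['Male']):.3f}, "
              f"female {rate(*dept['Female']):.3f}")
    print("Female rate higher in:", departments_favoring_females())
```

Run as a script, it reports pooled rates of about 0.445 for males and 0.304 for females, while the female rate is higher in departments A, B, D, and F.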

Graph 3: Number of Applicants in Each Department by Gender

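This graph helps explain why four of the six departments admit females at a higher rate even though, overall, males are admitted at a higher rate. Put simply, the departments in which females are accepted at a higher rate, especially A and B, receive far fewer applications from females than from males. Departments A and B also admit a much larger share of their applicants than the other departments, while most female applicants applied to departments with relatively low admission rates for everyone. As a result, the combined admission rate for females is pulled down, even though females are admitted at a higher rate within most departments.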
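
The reversal comes down to a weighted average: each gender's overall rate equals the sum, over departments, of the share of that gender's applicants who applied there times that gender's admission rate in that department. The sketch below assumes the same UCBAdmissions counts and uses hypothetical helper names; it computes those shares and each department's combined admission rate, and confirms that the weighted average reproduces the pooled rate:

```python
# Counts per department as (admitted, rejected); assumed to match R's UCBAdmissions data.
UCB = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def applicant_shares(gender: str) -> dict[str, float]:
    """Fraction of this gender's applicants who applied to each department."""
    totals = {d: sum(counts[gender]) for d, counts in UCB.items()}
    n = sum(totals.values())
    return {d: t / n for d, t in totals.items()}


def dept_rate(d: str, gender: str) -> float:
    """Admission rate for one gender within department d."""
    admitted, rejected = UCB[d][gender]
    return admitted / (admitted + rejected)


def dept_rate_all(d: str) -> float:
    """Admission rate in department d with both genders pooled."""
    admitted = UCB[d]["Male"][0] + UCB[d]["Female"][0]
    applied = sum(UCB[d]["Male"]) + sum(UCB[d]["Female"])
    return admitted / applied


def weighted_overall(gender: str) -> float:
    """Overall rate as a weighted average of department rates (weights = applicant shares)."""
    shares = applicant_shares(gender)
    return sum(shares[d] * dept_rate(d, gender) for d in UCB)


if __name__ == "__main__":
    male, female = applicant_shares("Male"), applicant_shares("Female")
    for d in UCB:
        print(f"Dept {d}: combined rate {dept_rate_all(d):.3f}, "
              f"share of males {male[d]:.3f}, share of females {female[d]:.3f}")
    print(f"Weighted overall: male {weighted_overall('Male'):.3f}, "
          f"female {weighted_overall('Female'):.3f}")
```

With these counts, departments A and B admit roughly 63 to 64 percent of all their applicants, yet only about 7 percent of female applicants applied to them (compared with about half of male applicants), so the female weighted average leans on the low-rate departments.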

Works Cited

Grigg, Tom. “Simpson’s Paradox and Interpreting Data: The Challenge of Finding the Right View Through Data.” Towards Data Science, 9 December 2018, https://towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765
