According to the article “Simpson’s Paradox and Interpreting Data” by Tom Grigg, Simpson’s Paradox is defined as, “A trend or result that is present when data is put into groups that reverses or disappears when the data is combined” (Grigg 2018). In other words, once a lurking variable is taken into account, the trend seen in the combined data disappears or even reverses. The article describes lurking variables as “hidden variables that split data into multiple separate distributions” (Grigg 2018). A well-known case is UC-Berkeley’s admission data: in 1973, the school noticed a higher admission rate for males than for females. Statistician Peter Bickel then looked deeper into the data and found a statistically significant gender bias in favor of women in a high percentage of the departments. Using the charts below, I will explain and support Bickel’s argument as an example of Simpson’s Paradox.
This chart shows the difference in admission rates between male and female applicants to UC-Berkeley’s graduate program. Males are accepted at a markedly higher rate.
This chart displays the admission rate for each gender within each of the six departments. In line with Bickel’s findings, and contrary to the previous graph, females are admitted at a higher rate in four of the six departments (A, B, D, and F).
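The comparison between the pooled rates and the per-department rates can be reproduced with a short sketch. It assumes the figures come from R’s built-in `UCBAdmissions` table (the 1973 Berkeley data for the six largest departments), which this report does not state explicitly; the variable names are illustrative.

```r
# Assumption: the report's charts are based on R's built-in UCBAdmissions
# table, a 2 x 2 x 6 array with dimensions Admit, Gender, Dept.
data(UCBAdmissions)

# Pooled admission rate by gender (all six departments combined).
admitted_by_gender   <- apply(UCBAdmissions["Admitted", , ], 1, sum)
applicants_by_gender <- apply(UCBAdmissions, 2, sum)
overall_rate <- admitted_by_gender / applicants_by_gender

# Admission rate by gender within each department (Gender x Dept matrix).
applicants_by_dept <- apply(UCBAdmissions, c(2, 3), sum)
dept_rate <- UCBAdmissions["Admitted", , ] / applicants_by_dept

# Departments where the female rate exceeds the male rate.
female_higher <- names(which(dept_rate["Female", ] > dept_rate["Male", ]))

round(overall_rate, 3)
round(dept_rate, 3)
female_higher
```

Under that assumption, the pooled rates are about 44.5% for males and 30.4% for females, while the female rate is higher in departments A, B, D, and F.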
This graph helps explain why females have a higher admission rate in four of the six departments while males are admitted at a higher rate overall. Put simply, the departments in which females are accepted at a higher rate, especially A and B, receive far fewer applications from females than from males. Most female applicants applied to the departments that admit a smaller share of all their applicants, which pulls the combined female admission rate down.
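A minimal sketch of this reasoning, again assuming the built-in `UCBAdmissions` table is the data behind the charts (an assumption, as are the variable names): each gender’s overall rate is the average of its department rates, weighted by where that gender applied.

```r
# Assumption: the report's charts are based on R's built-in UCBAdmissions
# table (dimensions: Admit x Gender x Dept).
data(UCBAdmissions)

# Applicants (admitted + rejected) by gender and department.
applicants <- apply(UCBAdmissions, c(2, 3), sum)

# Admission rate by gender and department.
dept_rate <- UCBAdmissions["Admitted", , ] / applicants

# Share of each gender's applications sent to each department
# (each row sums to 1).
app_share <- prop.table(applicants, margin = 1)

# Overall rate = application-weighted average of the department rates,
# so the applicant mix drives the pooled result.
weighted_rate <- rowSums(app_share * dept_rate)

applicants
round(app_share, 2)
round(weighted_rate, 3)
```

In this table, departments A and B received 825 and 560 applications from males but only 108 and 25 from females, so the high admission rates in those departments count far more toward the male average than the female one.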
Grigg, Tom. “Simpson’s Paradox and Interpreting Data: The Challenge of Finding the Right View Through Data.” Towards Data Science, 9 December 2018, https://towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765