Don’t judge half the Data by its cover!!!

Simpson’s paradox describes a situation in which an association between two variables that holds for an entire population disappears, or even reverses, when the data are split into subpopulations. A hidden (lurking) variable is a variable that is not included in a statistical analysis but can still affect the outcome. Such variables can obscure the true relationship in the data and lead to misleading conclusions.
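One way to see how such a reversal can happen: an overall rate is a weighted average of the subgroup rates, and each group’s weights depend on how its members are spread across the subgroups. Writing g for the group being compared and s for the subpopulation (the lurking variable), where both symbols are chosen here for illustration, the law of total probability gives:

```latex
P(\text{outcome} \mid g) \;=\; \sum_{s} P(s \mid g)\, P(\text{outcome} \mid s, g)
```

If one group is concentrated in subpopulations where the outcome is rare, its overall rate can come out lower even when its rate within most subpopulations is higher.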

Tom Grigg’s blog post “Simpson’s Paradox and Interpreting Data” discusses one of the best-known examples of Simpson’s paradox: the apparent gender bias in graduate admissions at UC Berkeley in 1973. The graduate school admitted 44% of male applicants and 35% of female applicants, and more men than women applied. However, these headline numbers leave out important details and factors. If you look only at Graph A below, you would conclude that the admissions process favored male applicants. Fortunately, the college was not sued for gender discrimination; instead, it dug deeper and asked the statistician Peter Bickel to analyze the data. Bickel actually found a bias in favor of women in four of the six departments, and no meaningful bias in the other two.

Graph A

The initial analysis showed that men had higher admission rates than women. A closer look showed that women tended to apply to more competitive departments with lower overall admission rates, while men tended to apply to less competitive departments with higher overall admission rates. In other words, the department an applicant chose was the lurking variable. Graph B, which breaks down UC Berkeley’s acceptance rates by department, shows a greater percentage of women than men admitted to departments A and B. Furthermore, even in the departments where men were admitted at a higher rate, the difference was small. Without a more detailed analysis, a first, broad look at the numbers would not give an accurate picture of the full story.
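The reversal can be checked directly from the admission counts. The sketch below assumes the figures in R’s built-in UCBAdmissions dataset, which covers only the six largest departments, so its pooled rates (about 44.5% for men and 30.4% for women) differ from the campus-wide 44% and 35% quoted above. The function names are hypothetical helpers written for this illustration, not taken from Grigg’s post.

```python
# Counts for the six largest UC Berkeley departments (1973), assuming the
# figures from R's built-in UCBAdmissions dataset:
# dept -> (men admitted, men applied, women admitted, women applied)
COUNTS = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}


def rate(admitted, applied):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applied


def per_department_rates(counts):
    """Return {dept: (men_rate, women_rate)}, keeping departments separate."""
    return {
        dept: (rate(ma, mn), rate(wa, wn))
        for dept, (ma, mn, wa, wn) in counts.items()
    }


def pooled_rates(counts):
    """Ignore the department (the lurking variable) and pool applicants by gender."""
    men_admitted = sum(c[0] for c in counts.values())
    men_applied = sum(c[1] for c in counts.values())
    women_admitted = sum(c[2] for c in counts.values())
    women_applied = sum(c[3] for c in counts.values())
    return rate(men_admitted, men_applied), rate(women_admitted, women_applied)


if __name__ == "__main__":
    men, women = pooled_rates(COUNTS)
    print(f"Pooled: men {men:.1%}, women {women:.1%}")
    for dept, (m, w) in per_department_rates(COUNTS).items():
        higher = "women" if w > m else "men"
        print(f"Dept {dept}: men {m:.1%}, women {w:.1%} (higher: {higher})")
```

Pooled, men come out well ahead, yet within these counts women have the higher admission rate in departments A, B, D, and F. Most women applied to C through F, where rates are low, which pulls their pooled rate down.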

Graph B

Bibliography

Grigg, Tom. “Simpson’s Paradox and Interpreting Data.” Medium, Towards Data Science, 8 Jan. 2019, towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765.

OpenAI. “How to Fix RStudio Code.” ChatGPT, 2024, chat.openai.com.