At UC Berkeley in 1973, word of a scandal surrounding the graduate program's acceptance rates spread like wildfire. Critics raged that the university had accepted 44.5% of male applicants but only 35.4% of female applicants, as seen below. On its face, this suggests a major gender bias in the admissions office, which could have serious implications for the university's reputation.
These numbers naturally raised suspicion of gender bias in the
admissions department. The university, fearing a lawsuit, had a
statistician look into the data. The findings were the exact opposite of
what was thought to be true. The chart below shows the acceptance rate
broken down by gender and by the academic department the students
applied to. Of the six departments examined, four admitted women at a
higher rate than their male counterparts. So why did the data look so
different in the previous graph? This is a perfect example of Simpson’s
paradox, where the story behind the data must be told in order to
understand it. Simpson’s paradox highlights the skepticism needed to
look past the numbers and ask whether they are really saying something
true about the world. Simpson’s paradox is defined as “a trend or result
that is present when data is put into groups that reverses or disappears
when the data is combined” (Grigg).
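
The reversal described above can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the counts are the commonly tabulated 1973 figures for the six largest departments (they match R's built-in `UCBAdmissions` dataset), not data listed in this essay, and the names `COUNTS`, `rate`, `aggregate_rate`, and `departments_favoring_women` are hypothetical helpers. With these counts the pooled female rate comes out near 30% rather than the 35.4% quoted earlier, so that figure may rest on a different tabulation.

```python
# (admitted, rejected) per department and gender.
# ASSUMPTION: commonly tabulated 1973 counts for the six largest departments
# (as in R's UCBAdmissions dataset); the essay itself does not list raw data.
COUNTS = {
    "A": {"male": (512, 313), "female": (89, 19)},
    "B": {"male": (353, 207), "female": (17, 8)},
    "C": {"male": (120, 205), "female": (202, 391)},
    "D": {"male": (138, 279), "female": (131, 244)},
    "E": {"male": (53, 138), "female": (94, 299)},
    "F": {"male": (22, 351), "female": (24, 317)},
}


def rate(admitted, rejected):
    """Share of applicants who were admitted."""
    return admitted / (admitted + rejected)


def aggregate_rate(counts, gender):
    """Admission rate for one gender with all departments pooled."""
    admitted = sum(dept[gender][0] for dept in counts.values())
    rejected = sum(dept[gender][1] for dept in counts.values())
    return rate(admitted, rejected)


def departments_favoring_women(counts):
    """Departments where the female admission rate exceeds the male rate."""
    return [
        name
        for name, dept in counts.items()
        if rate(*dept["female"]) > rate(*dept["male"])
    ]


if __name__ == "__main__":
    for gender in ("male", "female"):
        print(f"{gender:>6} pooled: {aggregate_rate(COUNTS, gender):.1%}")
    for name, dept in COUNTS.items():
        male, female = rate(*dept["male"]), rate(*dept["female"])
        print(f"dept {name}: male {male:.1%}, female {female:.1%}")
    print("women ahead in:", departments_favoring_women(COUNTS))
```

Under these assumed counts, men lead when the departments are pooled, while women lead in four of the six departments (A, B, D, and F) once the data is split, which is the pattern the paradox describes.
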
The story the statistician uncovered was that female students applied
more often to departments that admitted a smaller percentage of
applicants overall. The department applied to is an example of a lurking
variable: a variable not included in the statistical analysis that may
still affect the outcome. The goal of the paper being reviewed is to
make everyday readers mindful of studies that fall victim to this
paradox, or that intentionally present their data this way to produce a
flashy title challenging a well-known view. Data science requires good
intuition for deciding whether a result is true or paradoxical, and a
willingness to look past the data for the story.
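
The lurking variable can be made visible by asking where each group sent its applications. The following is a minimal sketch under stated assumptions: the counts are again the commonly tabulated six-department figures (matching R's `UCBAdmissions` dataset), not data from this essay, and the 50% cutoff for calling a department "selective" is an arbitrary threshold chosen only for illustration. The helper names are hypothetical.

```python
# (admitted, rejected) per department and gender.
# ASSUMPTION: commonly tabulated 1973 counts for the six largest departments
# (as in R's UCBAdmissions dataset); the essay itself does not list raw data.
COUNTS = {
    "A": {"male": (512, 313), "female": (89, 19)},
    "B": {"male": (353, 207), "female": (17, 8)},
    "C": {"male": (120, 205), "female": (202, 391)},
    "D": {"male": (138, 279), "female": (131, 244)},
    "E": {"male": (53, 138), "female": (94, 299)},
    "F": {"male": (22, 351), "female": (24, 317)},
}

# ASSUMPTION: arbitrary illustrative threshold, not taken from the source.
SELECTIVE_CUTOFF = 0.5


def applicants(dept, gender):
    """Number of applications one gender sent to a department."""
    admitted, rejected = dept[gender]
    return admitted + rejected


def overall_rate(dept):
    """Admission rate of a department with both genders pooled."""
    admitted = dept["male"][0] + dept["female"][0]
    total = applicants(dept, "male") + applicants(dept, "female")
    return admitted / total


def share_to_selective(counts, gender, cutoff=SELECTIVE_CUTOFF):
    """Fraction of a gender's applications sent to departments below the cutoff."""
    total = sum(applicants(dept, gender) for dept in counts.values())
    selective = sum(
        applicants(dept, gender)
        for dept in counts.values()
        if overall_rate(dept) < cutoff
    )
    return selective / total


if __name__ == "__main__":
    for name, dept in COUNTS.items():
        print(f"dept {name}: pooled admission rate {overall_rate(dept):.1%}")
    for gender in ("male", "female"):
        share = share_to_selective(COUNTS, gender)
        print(f"{gender:>6}: {share:.1%} of applications went to selective departments")
```

With these assumed counts, more than nine in ten applications from women went to departments that admitted fewer than half of their applicants, compared with just under half of the applications from men. That difference in where people applied, rather than how each department treated them, is what drags the pooled female rate down.
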
Raj, A. “Simpson’s Paradox and Interpreting Data.” Towards Data Science, 25 Aug. 2022, <https://towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765>