
In 1973, word of a scandal over graduate admission rates at UC Berkeley spread like wildfire. Critics raged that the university had accepted 44.5% of male applicants but only 35.4% of female applicants, as seen below. On its face, this gap pointed to a major gender bias in the admissions office, one that could have serious implications for the university’s reputation.

This naturally raised suspicion of gender bias in the admissions department. The university, fearing a lawsuit, had a statistician look into the data. The findings were the exact opposite of what was thought to be true. The chart below shows the acceptance rate broken down by gender and by the academic department the students applied to. Of the six departments examined, four admitted women at a higher rate than their male counterparts. So why did the data look so different in the previous graph? This is a perfect example of Simpson’s paradox, where the story behind the data must be told before the data can be understood. Simpson’s paradox highlights the skepticism needed to look past the numbers and ask whether they are really saying something true about the world. It is defined as “a trend or result that is present when data is put into groups that reverses or disappears when the data is combined” (Grigg).
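The reversal can be checked directly in R. The sketch below assumes that R's built-in `UCBAdmissions` dataset (the six largest departments from the 1973 data) is a fair stand-in for the data behind the charts. Its totals need not match the percentages quoted above exactly. The object names `adm`, `overall`, and `by_dept` are illustrative only.

```r
library(dplyr)

# UCBAdmissions ships with base R (datasets package) as a 3-way table:
# Admit x Gender x Dept. Convert it to a long data frame with a Freq column.
adm <- as.data.frame(UCBAdmissions)

# Aggregate admission rate by gender, ignoring department
overall <- adm %>%
  group_by(Gender) %>%
  summarise(rate = sum(Freq[Admit == "Admitted"]) / sum(Freq),
            .groups = "drop")

# Admission rate by gender within each department
by_dept <- adm %>%
  group_by(Dept, Gender) %>%
  summarise(rate = sum(Freq[Admit == "Admitted"]) / sum(Freq),
            .groups = "drop")

print(overall)
print(by_dept)
```

In this dataset the aggregate rate favors men, while the per-department rates favor women in four of the six departments, which is the pattern Simpson’s paradox describes.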

The story uncovered by the statistician was that female students applied more often to departments that admitted a smaller percentage of applicants overall. Department is an example of a lurking variable: a variable not included in the statistical analysis that may still affect its outcome. The goal of the paper being reviewed is to make everyday readers mindful of studies that fall victim to this paradox, or that intentionally present their data this way to produce a flashy title that challenges a well-known outlook. Data science requires good intuition for deciding whether a result is true or paradoxical, and a willingness to look past the data for the story.
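One way to make the lurking variable visible is to compare where each gender applied with how selective each department was. This is a minimal sketch, again assuming the built-in `UCBAdmissions` data. The names `dept_rate`, `app_share`, and `lurking` are hypothetical, chosen for illustration.

```r
library(dplyr)

adm <- as.data.frame(UCBAdmissions)

# Overall admission rate of each department (both genders pooled)
dept_rate <- adm %>%
  group_by(Dept) %>%
  summarise(dept_admit_rate = sum(Freq[Admit == "Admitted"]) / sum(Freq))

# Share of each gender's applications that went to each department
app_share <- adm %>%
  group_by(Gender, Dept) %>%
  summarise(applicants = sum(Freq), .groups = "drop") %>%
  group_by(Gender) %>%
  mutate(share = applicants / sum(applicants)) %>%
  ungroup()

# Line up application shares with department selectivity,
# listing the least selective departments first
lurking <- app_share %>%
  left_join(dept_rate, by = "Dept") %>%
  arrange(desc(dept_admit_rate), Gender)

print(lurking)
```

Reading down the `share` column shows whether one gender's applications are concentrated in the departments with low `dept_admit_rate`, which is the mechanism the paragraph above describes.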

Raj, A. “Simpson’s Paradox and Interpreting Data.” Towards Data Science, 25 Aug. 2022, https://towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765