## Introduction

UC Berkeley feared a gender-bias lawsuit after finding that its overall acceptance rate for men was higher than its overall acceptance rate for women. However, this gap was a product of something called Simpson’s Paradox. The figure below shows the apparent difference in acceptance rates between men and women.

The admissions data set has 4,526 rows and three character columns: Status, Gender, and Department.
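
The overall gap described in the introduction can be reproduced with a short sketch. It does not use the CSV read in this report; as an assumption, it substitutes R's built-in `UCBAdmissions` table (from the `datasets` package), whose counts also sum to 4,526 applicants but whose column names (`Admit`, `Gender`, `Dept`, `Freq`) differ from the report's `Status`, `Gender`, and `Department`. The names `admissions` and `overall_rates` are illustrative only.

```r
library(dplyr)

# Assumption: the built-in UCBAdmissions table stands in for the report's CSV.
# as.data.frame() turns the 3-D table into one row per
# Admit x Gender x Dept combination, with the count in `Freq`.
admissions <- as.data.frame(UCBAdmissions)

# Aggregate over all departments: applicants and admits per gender.
overall_rates <- admissions %>%
  group_by(Gender) %>%
  summarise(
    applicants = sum(Freq),
    admitted   = sum(Freq[Admit == "Admitted"])
  ) %>%
  mutate(acceptance_rate = admitted / applicants)

print(overall_rates)
```

Because department is ignored here, the result only describes the pooled applicant group, which is the view that makes the gap between men and women look large.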

## Deeper Look

Simpson’s Paradox occurs when a trend that appears in data viewed as a whole disappears or reverses once the same data are split into groups. This happens because of what is called a lurking variable: a variable that is left out of the first view but is related to both the groups being compared and the outcome, so different breakdowns of the data show different trends. The lurking variable in the UC Berkeley admissions data is the department, because more women applied to the departments that accepted fewer applicants. Taking this variable into account changes the trend completely. As shown in the figure below, once the department variable is added, the trend of men being accepted at a higher rate than women disappears.
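
The effect of the lurking variable can be checked numerically with a minimal sketch. As an assumption, it again uses R's built-in `UCBAdmissions` table rather than the report's CSV, so the column names (`Admit`, `Gender`, `Dept`, `Freq`) are those of the built-in data; `dept_rates` and its columns are hypothetical names chosen for illustration.

```r
library(dplyr)

# Assumption: the built-in UCBAdmissions table stands in for the report's CSV.
admissions <- as.data.frame(UCBAdmissions)

# One row per department: acceptance rate for each gender, the department's
# overall acceptance rate, and the share of its applicants who are women.
dept_rates <- admissions %>%
  group_by(Dept) %>%
  summarise(
    women_applicants = sum(Freq[Gender == "Female"]),
    men_applicants   = sum(Freq[Gender == "Male"]),
    women_rate = sum(Freq[Gender == "Female" & Admit == "Admitted"]) / women_applicants,
    men_rate   = sum(Freq[Gender == "Male" & Admit == "Admitted"]) / men_applicants,
    dept_rate  = sum(Freq[Admit == "Admitted"]) / sum(Freq)
  ) %>%
  mutate(share_women = women_applicants / (women_applicants + men_applicants)) %>%
  arrange(desc(dept_rate))  # most selective departments end up at the bottom

print(dept_rates)
```

Sorting by `dept_rate` puts the two comparisons side by side: `women_rate` against `men_rate` within each department, and `share_women` against how selective the department is. If the explanation above holds, the departments with the lowest `dept_rate` should tend to have the highest `share_women`.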

## Works Cited

Grigg, Tom. “Simpson’s Paradox and Interpreting Data.” Towards Data Science, 23 Oct. 2020, towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765.