2023-03-27

Intro & Problem

The data set used for this presentation is UCBAdmissions, which is built into R (in the datasets package). It is a three-dimensional contingency table with three variables: Admit, Gender, and Dept. It records the 1973 applicants to UC Berkeley's six largest graduate departments, A through F, classified by gender and by whether they were admitted or rejected.

Using this data set, we can explore a statistical phenomenon known as Simpson’s Paradox through what at first appears to be sex bias in admissions.

Simpson’s Paradox is defined as a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined.

The data set and more information on it can be found at this link: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/UCBAdmissions

Data Setup

First, we load the data. Because this is a pre-cleaned data set, there are no missing (NA) values to remove.
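As a quick sanity check on the claim that nothing needs cleaning, the base R calls below confirm that the table has no NA cells and count the total number of applicants. This is a minimal sketch using only base R, not part of the original analysis.

```r
# UCBAdmissions ships with R's datasets package; data() just makes loading explicit
data(UCBAdmissions)

# anyNA() returns TRUE if any cell is NA; this table has none
anyNA(UCBAdmissions)
## [1] FALSE

# Grand total of applicants across all 2 x 2 x 6 cells
sum(UCBAdmissions)
## [1] 4526
```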

data(UCBAdmissions)
str(UCBAdmissions)
##  'table' num [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Admit : chr [1:2] "Admitted" "Rejected"
##   ..$ Gender: chr [1:2] "Male" "Female"
##   ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...
apply(UCBAdmissions, c(1, 2), sum)
##           Gender
## Admit      Male Female
##   Admitted 1198    557
##   Rejected 1493   1278

While the str() output and the table of admissions and rejections by gender give us a first look at the data, a visualization makes the pattern much easier to see.

Preliminary Visualization Pt.1

This graph shows the data aggregated over all departments and separated by gender. Looking at this plot alone, it seems that women are, by and large, rejected at a much higher rate, and therefore admitted at a much lower rate, than their male counterparts: from the table above, roughly 45% of men (1198 of 2691) were admitted, compared with roughly 30% of women (557 of 1835). This raises the question: is there sex bias in admissions at UC Berkeley?
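The aggregate admission rates behind this plot can be computed directly. This sketch collapses the table over departments with apply(), as in the setup code, and then uses prop.table() to turn each gender's column into proportions; the variable name `by_gender` is our own.

```r
# Collapse over departments: rows = Admit, columns = Gender
by_gender <- apply(UCBAdmissions, c(1, 2), sum)

# margin = 2 divides each column by its column total, giving the share
# of each gender that was admitted or rejected
round(prop.table(by_gender, margin = 2), 3)
# Admitted: about 0.445 for men vs 0.304 for women
```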

Visualization Pt.2

Insights Gained

The plot on the previous slide breaks the data down by individual department, and this is where Simpson’s Paradox comes into play. Within departments, there is no consistent pattern of women being admitted at lower rates, as the aggregate data suggested. In fact, women were admitted at a higher rate than men in four of the six departments (A, B, D, and F).
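The per-department comparison can be checked numerically. This sketch uses prop.table() with margin = c(2, 3), which divides each count by its Gender-by-Dept total, so each gender's admitted and rejected shares sum to 1 within a department. The names `dept_rates` and `admit_rate` are illustrative.

```r
# Proportions within each Gender x Dept combination
dept_rates <- prop.table(UCBAdmissions, margin = c(2, 3))

# Keep only the admitted share: a 2 x 6 matrix (Gender x Dept)
admit_rate <- dept_rates["Admitted", , ]
round(admit_rate, 3)

# Departments where women's admission rate exceeds men's
names(which(admit_rate["Female", ] > admit_rate["Male", ]))
## [1] "A" "B" "D" "F"
```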

On the next slides, we can see a table of the overall admission rate for each department and a table of the share of each department's applicants who were male.

Deeper Understanding Through Tables Pt.1

Overall Admission Rates by Department
Dept  total_applicants  total_admitted  admission_rate
A                  933             601       0.6441586
B                  585             370       0.6324786
C                  918             322       0.3507625
D                  792             269       0.3396465
E                  584             147       0.2517123
F                  714              46       0.0644258
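This table can be reproduced in base R by summing over Gender and dividing admitted counts by total applicants. This is a sketch that matches the numbers shown; the original slides may have built the table differently (for example with dplyr), and the names `by_dept` and `dept_summary` are our own.

```r
# Sum over Gender to get a 2 x 6 Admit x Dept table
by_dept <- apply(UCBAdmissions, c(1, 3), sum)

dept_summary <- data.frame(
  Dept             = colnames(by_dept),
  total_applicants = colSums(by_dept),
  total_admitted   = by_dept["Admitted", ],
  row.names        = NULL
)
dept_summary$admission_rate <- dept_summary$total_admitted / dept_summary$total_applicants
dept_summary
```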

Deeper Understanding Through Tables Pt.2

Percentage of Students Who Applied to Each Department by Gender
Dept  total  percent_male
A       933          88.4
B       585          95.7
C       918          35.4
D       792          52.7
E       584          32.7
F       714          52.2
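A base R sketch of the gender-mix table follows: sum over Admit, then express male applicants as a percentage of each department's total. If the table is built with dplyr instead, passing `.groups = "drop"` to `summarise()` avoids dplyr's "has grouped output" message. The names `applicants` and `gender_mix` are illustrative.

```r
# Applicants by Gender x Dept (sum over Admit)
applicants <- apply(UCBAdmissions, c(2, 3), sum)

gender_mix <- data.frame(
  Dept         = colnames(applicants),
  total        = colSums(applicants),
  percent_male = round(100 * applicants["Male", ] / colSums(applicants), 1),
  row.names    = NULL
)
gender_mix
```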

Final Conclusions

The tables reveal something important about the data set and explain why the aggregate data looks so skewed in favor of men. Departments C through F had drastically lower acceptance rates than A and B. F’s acceptance rate was an abysmal 6%, and the average of the acceptance rates for C through E was about 31%, compared with 64% and 63% for A and B, respectively.
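The averages quoted above can be checked as follows. This sketch assumes "average" means the simple unweighted mean of the three department rates; the pooled rate over C through E (total admitted divided by total applicants) is slightly higher, at about 32%.

```r
by_dept <- apply(UCBAdmissions, c(1, 3), sum)     # Admit x Dept
rate <- by_dept["Admitted", ] / colSums(by_dept)  # admission rate per department
round(rate, 3)

# Unweighted mean of the C, D and E rates (about 0.314)
mean(rate[c("C", "D", "E")])

# Pooled rate over C, D and E (about 0.322)
sum(by_dept["Admitted", c("C", "D", "E")]) / sum(by_dept[, c("C", "D", "E")])
```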

The tables also show that most women applied to the departments with the lowest acceptance rates. About 90% of the applicants to Departments A and B were men, while men made up only about a third of the applicants to Departments C and E, and roughly half of the applicants to Departments D and F were women.

From this, we can infer that women only appeared to be rejected unfairly at UC Berkeley because a large share of women applied to departments with high rejection rates, while men applied disproportionately to departments with high acceptance rates.
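To put a number on this explanation, the sketch below computes what fraction of each gender's applications went to the low-acceptance Departments C through F. Grouping C to F together is our own choice, following the split used in the conclusions above.

```r
applicants <- apply(UCBAdmissions, c(2, 3), sum)  # Gender x Dept

# Fraction of each gender's applications that went to Departments C through F
share_CF <- rowSums(applicants[, c("C", "D", "E", "F")]) / rowSums(applicants)
round(share_CF, 3)
# roughly 0.485 for men vs 0.928 for women
```

In other words, about 93% of women's applications went to Departments C through F, compared with under half of men's, which is enough to drag down women's aggregate admission rate even though they did as well or better within most departments.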