Setting Up
We will be exploring a dataset called “admissions”, which has the
description: “Gender bias among graduate school admissions to UC
Berkeley.” The description does not say which gender the bias favors,
so let us rejoice in the name of data analysis, because now we will try
to find that out ourselves.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.4 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library("dslabs")
data("admissions")
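Before computing anything, it can help to look at what the dataset contains. The short sketch below is not part of the original analysis. It only prints the columns and their types, so that the `admitted` and `applicants` columns used in the next step are visible. The dslabs help page (`?admissions`) documents what each column means.

```r
library(dplyr)
library(dslabs)
data("admissions")

# One row per major/gender combination; glimpse() lists every column with its type
glimpse(admissions)

# Count the rows per gender to confirm each gender has the same number of majors
admissions %>% count(gender)
```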
Preparing the Data
We ultimately want to answer the question: “What exactly is the
gender disparity?” To do this, we will add a column holding the
acceptance rate for each combination of gender and major, then create a
visualization based on our findings.
# Compute admitted / applicants as a percentage, rounded to two decimals
newadmin <- admissions %>%
  mutate(acceptance_rate = round((admitted / applicants) * 100, digits = 2))
Visualizing the Data
acc_plot <- ggplot(data = newadmin, aes(x = major, y = acceptance_rate, color = gender, size = acceptance_rate)) +
  geom_point() +
  labs(x = "Major", y = "Acceptance Rate", title = "UC Berkeley Graduate School Acceptance Rates based on Major and Gender", color = "Gender") +
  # Hide the size legend; guides() expects "none" rather than FALSE as of ggplot2 3.3.4
  guides(size = "none") +
  theme_linedraw()
acc_plot

Notice there is a confusing outlier at major “B”. We’re not sure how
it came to be in the dataset. We could speculate about the cause, or we
can re-plot the data without it, putting more emphasis on majors A, C,
D, E, and F.
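Rather than only speculating, one way to investigate the outlier is to look at the major B rows directly. The sketch below recomputes the rate exactly as above and flags any value over 100, which cannot be a valid percentage. The names `rate_check` and `impossible` are introduced here for illustration. One possible explanation, which should be treated as an assumption to verify: if we read the dslabs help page (`?admissions`) correctly, `admitted` is described as the percent of students admitted rather than a count, in which case dividing by `applicants` again would distort every rate, most visibly where the number of applicants is small.

```r
library(dplyr)
library(dslabs)
data("admissions")

# Recompute the rate as in the text, then flag rows that cannot be valid percentages
rate_check <- admissions %>%
  mutate(
    acceptance_rate = round((admitted / applicants) * 100, digits = 2),
    impossible = acceptance_rate > 100
  )

# Inspect the two major B rows behind the outlier
rate_check %>% filter(major == "B")

# If `admitted` were already a percentage, its values would stay between 0 and 100
range(admissions$admitted)
```

If the check supports that reading, the `admitted` column could be plotted directly instead of `acceptance_rate`.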
# Drop both major B rows (rows 2 and 8) by name rather than by position
adminnob <- newadmin %>% filter(major != "B")
acc_plot2 <- ggplot(data = adminnob, aes(x = major, y = acceptance_rate, color = gender, size = acceptance_rate)) +
  geom_point() +
  labs(x = "Major", y = "Acceptance Rate", title = "UC Berkeley Graduate School Acceptance Rates based on Major and Gender", color = "Gender") +
  guides(size = "none") +
  theme_light() +
  scale_y_continuous()
acc_plot2

We can see the obvious disparity in major A, so let us take a closer
look at the rest.
# Drop majors A and B (rows 1, 2, 7, and 8) to zoom in on C through F
adminnoab <- newadmin %>% filter(!major %in% c("A", "B"))
acc_plot3 <- ggplot(data = adminnoab, aes(x = major, y = acceptance_rate, color = gender)) +
  geom_point() +
  labs(x = "Major", y = "Acceptance Rate", title = "UC Berkeley Graduate School Acceptance Rates based on Major and Gender", color = "Gender") +
  theme_light() +
  scale_y_continuous()
acc_plot3

Interpreting Our Findings
The graphs illustrate that women have a higher acceptance rate at
the UC Berkeley Graduate School for majors A, D, and F, while men have
higher rates for majors C and E. We’re not sure how accurate the reading
for major B is. It would be nice to know exactly which fields the six
major labels stand for, since the school offers far more majors than
these. This would give us some insight into the disparities within each
field. That information could then be carried over to other datasets
and shed more light on the issue.
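The per-major comparison described above can also be tabulated instead of read off the plots. The following is a minimal sketch, not part of the original analysis: it reuses the rate as computed in this document (so it inherits any issue with that calculation), puts men and women side by side for each major, and records which group has the higher value. The names `gap_by_major`, `gap`, and `higher` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(dslabs)
data("admissions")

gap_by_major <- admissions %>%
  # Same rate calculation as used earlier in this document
  mutate(acceptance_rate = round((admitted / applicants) * 100, digits = 2)) %>%
  select(major, gender, acceptance_rate) %>%
  # One row per major, with a column for each gender
  pivot_wider(names_from = gender, values_from = acceptance_rate) %>%
  # Positive gap means the women's rate is higher for that major
  mutate(
    gap = women - men,
    higher = if_else(gap > 0, "women", "men")
  )

gap_by_major
```

A table like this makes it easy to sort majors by `gap` or to filter out major B before comparing.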