DSLabs Assignment

Author

Efren Martinez

First, we load the dslabs package, which contains the data set we will use.

We will use the data set named “admissions”. The major names are disguised (labeled A through F), and the data lets us review gender bias in graduate school admissions to UC Berkeley.

library(dslabs)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)
library(RColorBrewer)
library(ggthemes)
data("admissions")
str(admissions)
'data.frame':   12 obs. of  4 variables:
 $ major     : chr  "A" "B" "C" "D" ...
 $ gender    : chr  "men" "men" "men" "men" ...
 $ admitted  : num  62 63 37 33 28 6 82 68 34 35 ...
 $ applicants: num  825 560 325 417 191 373 108 25 593 375 ...

Save a CSV copy to the working directory for future use and review.

write_csv(admissions, "admissions.csv", na="")

Acceptance Rate

First, I’d like to see the acceptance rate for each major and gender, computed as admitted divided by applicants.

admissions1 <- admissions |>
  mutate(acceptance_rate = admitted/applicants)

head(admissions1)
  major gender admitted applicants acceptance_rate
1     A    men       62        825      0.07515152
2     B    men       63        560      0.11250000
3     C    men       37        325      0.11384615
4     D    men       33        417      0.07913669
5     E    men       28        191      0.14659686
6     F    men        6        373      0.01608579
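
One caveat is worth checking before reading too much into these values: the dslabs help page for `admissions` describes `admitted` as the percentage of applicants admitted, not a count. If that is the case, dividing it by `applicants` does not give a rate, and the column can be read as a rate directly. A minimal sketch under that assumption (the names `admissions_pct`, `acceptance_rate_pct`, and `admitted_count` are made up for illustration and are not part of the original analysis):

```r
library(dslabs)
library(dplyr)

data("admissions")

# Assumption: `admitted` is a percentage (0-100), as described in ?admissions.
admissions_pct <- admissions |>
  mutate(
    acceptance_rate_pct = admitted / 100,                          # rate as a proportion
    admitted_count      = round(acceptance_rate_pct * applicants)  # approximate head count
  )

head(admissions_pct)
```

If `?admissions` confirms the percentage reading, `acceptance_rate_pct` is the quantity to compare between men and women within each major.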

In the first two majors, women have much higher acceptance rates than men, although the women’s rows do not appear in the head() output above. This difference might be skewed by the number of applicants of each gender.
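
Because `head()` only prints the first six rows, the comparison between men and women is easier to see with one row per major. A small sketch, reusing the same `admitted / applicants` definition as above (the object name `rates_by_gender` is only illustrative):

```r
library(dslabs)
library(dplyr)
library(tidyr)

data("admissions")

# One row per major, with the men's and women's rates in separate columns.
rates_by_gender <- admissions |>
  mutate(acceptance_rate = admitted / applicants) |>  # same definition as above
  select(major, gender, acceptance_rate) |>
  pivot_wider(names_from = gender, values_from = acceptance_rate)

rates_by_gender
```

Each major then has a `men` and a `women` column side by side, so the majors where the two rates diverge are visible at a glance.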

Total Applicants

Let’s combine both applicant pools and compute the acceptance rate of men and women relative to the total number of applicants in each major.

total_applicants_by_major <- admissions |>
  group_by(major) |>
  summarise(
    total_applicants = sum(applicants, na.rm = TRUE)
  )
head(total_applicants_by_major)
# A tibble: 6 × 2
  major total_applicants
  <chr>            <dbl>
1 A                  933
2 B                  585
3 C                  918
4 D                  792
5 E                  584
6 F                  714

Bring the new column back into the data set.

admissions_with_total <- admissions1 |>
  left_join(total_applicants_by_major, by = "major")

head(admissions_with_total)
  major gender admitted applicants acceptance_rate total_applicants
1     A    men       62        825      0.07515152              933
2     B    men       63        560      0.11250000              585
3     C    men       37        325      0.11384615              918
4     D    men       33        417      0.07913669              792
5     E    men       28        191      0.14659686              584
6     F    men        6        373      0.01608579              714
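
The summarise-then-join steps above can also be collapsed into a single grouped `mutate()`. This is only an alternative sketch, assuming dplyr 1.1.0 or later for the `.by` argument; `admissions_with_total_alt` is a hypothetical name:

```r
library(dslabs)
library(dplyr)

data("admissions")

# Compute the per-major total in place, without a separate summary table and join.
admissions_with_total_alt <- admissions |>
  mutate(
    acceptance_rate  = admitted / applicants,
    total_applicants = sum(applicants, na.rm = TRUE),  # summed within each major
    .by = major
  )

head(admissions_with_total_alt)
```

The result should have the same columns as `admissions_with_total`; the join version is still useful when the per-major totals are needed as a table of their own.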

New acceptance rate from total applicants

admissions3 <- admissions_with_total |>
  mutate(acceptance_rate_with_total = admitted/total_applicants)

head(admissions3)
  major gender admitted applicants acceptance_rate total_applicants
1     A    men       62        825      0.07515152              933
2     B    men       63        560      0.11250000              585
3     C    men       37        325      0.11384615              918
4     D    men       33        417      0.07913669              792
5     E    men       28        191      0.14659686              584
6     F    men        6        373      0.01608579              714
  acceptance_rate_with_total
1                0.066452304
2                0.107692308
3                0.040305011
4                0.041666667
5                0.047945205
6                0.008403361

Now let’s plot the result and see what we find.

ggplot(admissions3, aes(y = acceptance_rate_with_total, x = total_applicants, label = gender)) +
  geom_point(aes(color = major), size = 6) +
  geom_text_repel(nudge_x = .001) + 
  labs(y = "Acceptance Rate", x = "Total Applicants", title = "Comparing Gender Acceptance Rates by Major") +
  scale_color_brewer(palette = "Set2") +
  guides(color = guide_legend(title = "Major")) +
  theme_calc()

Results

Using the UC Berkeley admissions data set, we can compare the acceptance rates of men and women within each major. I used a scatter plot (geom_point) with the acceptance rate from total applicants on the y-axis and the total applicants on the x-axis. Each point is labeled by gender and colored by major, so the two genders can be compared within the same major. The plot shows that which gender has the higher acceptance rate depends on the major: in most majors the two rates are similar, with major “A” as the exception.