DS Labs Assignment

Author

Nhi Vu

Load the libraries and read in the data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)

data(package = "dslabs")
list.files(system.file("script", package = "dslabs"))
 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-stars.R"                        
[25] "make-temp_carbon.R"                  
[26] "make-tissue-gene-expression.R"       
[27] "make-trump_tweets.R"                 
[28] "make-weekly_us_contagious_diseases.R"
[29] "save-gapminder-example-csv.R"        

Admissions Dataset

The admissions dataset covers potential gender bias in graduate school admissions at UC Berkeley.

data("admissions")
library(ggthemes)

Create a point-and-line plot to show how admission rates vary by major for men vs. women

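First I need to create a new column for the admission rate. In dslabs, the admitted column is already the percentage of applicants admitted (see ?admissions), not a count, so dividing it by 100 gives the rate as a proportion.

admissions2 <- admissions |>
  mutate(admit_rate = admitted / 100) # new column: percent admitted converted to a proportion (0 to 1)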
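
A quick sanity check on the new column, as a minimal sketch: an admission rate expressed as a proportion should fall between 0 and 1 for every major and gender. The sketch assumes, based on the dslabs help page, that admitted is stored as a percentage; rate_check is a name introduced here for illustration only.

```r
library(dplyr)
library(dslabs)

data("admissions")

# Assumption: `admitted` is the percent of applicants admitted (per ?admissions),
# so dividing by 100 should give a proportion.
rate_check <- admissions |>
  mutate(admit_rate = admitted / 100) |>
  summarize(
    min_rate = min(admit_rate), # smallest rate across all major/gender rows
    max_rate = max(admit_rate)  # largest rate across all major/gender rows
  )

rate_check

# A proportion outside [0, 1] would signal a mistake in the calculation.
stopifnot(rate_check$min_rate >= 0, rate_check$max_rate <= 1)
```

If stopifnot() raises an error here, the rate was most likely computed from the wrong columns.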

Next I plot the graph, putting everything together. It shows how admission rates vary by major for men vs. women at UC Berkeley.

p1 <- admissions2 |>
  ggplot(aes(x = major, y = admit_rate, color = gender, group = gender)) + 
  geom_point(size = 4) + #the point size
  geom_line()+ # this is to have the line connecting the points
  labs(
    title = "Admission Rate by Major and Gender at UC Berkeley",
    x = "Majors",
    y = "Admission Rate") +
  theme_bw()+ # this is the theme of the plot
  scale_color_brewer(palette = "Set2") # this is the color palette
p1
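
For comparison, the same information could also be drawn with one panel per major using facet_wrap(). This is only a sketch of an alternative layout, not a replacement for the plot above; p2 is a name introduced here, and the rate is taken as admitted / 100 on the assumption that admitted is a percentage (per the dslabs help page).

```r
library(ggplot2)
library(dslabs)

data("admissions")

# One panel per major, with gender on the x-axis inside each panel.
# Assumption: `admitted` is a percentage, so admitted / 100 is the rate.
p2 <- ggplot(admissions, aes(x = gender, y = admitted / 100, color = gender)) +
  geom_point(size = 4) +                 # one point per gender in each panel
  facet_wrap(~ major) +                  # split the plot into panels by major
  labs(
    title = "Admission Rate by Gender, Faceted by Major",
    x = "Gender",
    y = "Admission Rate") +
  theme_bw() +                           # same theme as the plot above
  scale_color_brewer(palette = "Set2")   # same color palette as the plot above
p2
```

Faceting makes the within-major comparison easy to read, while the single-panel line plot makes it easier to compare majors with each other.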

Essay

I chose the “admissions” dataset, which covers graduate admissions at UC Berkeley and potential gender bias across majors. Because the dataset is small and its variables were already clean, I did not need to do any data cleaning. My goal was to examine how admission rates for men and women differ by major. The dataset has four variables: major, gender, admitted, and applicants. I created a new variable called admit_rate to hold the admission rate for each major and gender. In my graph, I plotted major on the x-axis and admission rate on the y-axis, with gender as the colour legend. The graph shows a clear admission rate advantage for women in majors A and B and a clear but very small advantage for men in majors C and E, while the admission rates for majors D and F look almost identical for men and women, which was really interesting to see.
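
The comparisons described in the essay can also be read from a small table instead of the graph. The following is a sketch under the assumption that admitted is the percent admitted and that the gender column contains the values men and women; gap and gap_pct_points are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(dslabs)

data("admissions")

# Put men and women side by side for each major, then take the difference.
# Assumption: `admitted` is a percentage, so the gap is in percentage points.
gap <- admissions |>
  select(major, gender, admitted) |>                        # drop applicants so each major collapses to one row
  pivot_wider(names_from = gender, values_from = admitted) |>
  mutate(gap_pct_points = women - men)                      # positive values favor women

gap
```

A positive gap_pct_points means women were admitted at a higher rate in that major, and a value near zero means the two rates are close.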