DS Labs

Author

Micaela T

Load necessary libraries

#install.packages("dslabs")
#install.packages("highcharter")
library(tidyverse)
library(dslabs)
library(highcharter)

Load the admissions dataset

data("admissions")

Check the structure of dataset

str(admissions)
'data.frame':   12 obs. of  4 variables:
 $ major     : chr  "A" "B" "C" "D" ...
 $ gender    : chr  "men" "men" "men" "men" ...
 $ admitted  : num  62 63 37 33 28 6 82 68 34 35 ...
 $ applicants: num  825 560 325 417 191 373 108 25 593 375 ...

Check the first rows of the dataset

head(admissions) 
  major gender admitted applicants
1     A    men       62        825
2     B    men       63        560
3     C    men       37        325
4     D    men       33        417
5     E    men       28        191
6     F    men        6        373

Clean and Prepare Data

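Create a new variable that holds the admission rate as a percentage. In dslabs, admitted already stores the percentage of applicants admitted, not a count. For example, 62 for major A men means 62% of 825 applicants. It therefore should not be divided by applicants again.

admissions <- admissions |>
  # admitted is already a percentage, so copy it into a clearly named column
  mutate(admit_rate = admitted)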
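Because admitted is a percentage rather than a count, you can recover an approximate number of admitted students by scaling it by applicants. The minimal sketch below assumes dplyr and the dslabs admissions columns shown by str() above. It adds a hypothetical helper column, n_admitted, and checks that every percentage lies between 0 and 100.

```r
library(dplyr)
library(dslabs)

data("admissions")

# admitted is a percentage, so multiply by applicants / 100 to get a count.
# n_admitted is a hypothetical helper column, rounded to whole students.
admissions_counts <- admissions |>
  mutate(n_admitted = round(admitted / 100 * applicants))

# Every percentage should fall between 0 and 100
stopifnot(all(admissions_counts$admitted >= 0 & admissions_counts$admitted <= 100))

head(admissions_counts, 3)
```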

Create the Scatterplot

#This plot shows how the number of applicants relates to admission rate
#The color represents gender as a third variable
ggplot(admissions, aes(x = applicants, y = admit_rate, color = gender)) + 
  geom_point(size = 4, alpha = 0.8) + # points for each gender-major combo
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") + # one linear trend line per gender
  labs(title = "Admission Rate vs Number of Applicants by Gender", 
        subtitle = "Data from the 'admissions' dataset in dslabs", 
        x = "Number of Applicants",
        y = "Admission Rate (%)",
        color = "Gender") + 
  theme_minimal() + #simple clean theme
  theme(
    plot.title = element_text(size = 16, face = "bold"), 
    plot.subtitle = element_text(size = 12, color = "gray30"),
    legend.position = "bottom" 
  )
`geom_smooth()` using formula = 'y ~ x'
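The dashed lines are ordinary least-squares fits of y ~ x, one per gender, because color = gender splits the data into groups. The sketch below uses base R plus the dslabs data and reports those slopes as numbers. Its response is admitted, the column that stores the admission percentage. The names fits and slopes are illustrative only.

```r
library(dslabs)

data("admissions")

# Split the data by gender and fit a separate least-squares line to each
# group, like geom_smooth(method = "lm") does when points are colored by gender.
# The response is admitted, the percentage of applicants admitted.
fits <- lapply(
  split(admissions, admissions$gender),
  function(d) coef(lm(admitted ~ applicants, data = d))
)

# Pull out each slope: change in admission percentage per additional applicant
slopes <- sapply(fits, `[[`, "applicants")
slopes
```

With only six points per gender, these slopes are very sensitive to a single large major. This is one reason the confidence band was turned off (se = FALSE) rather than relied on.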

Interactive Version Using Highcharter

# This creates an interactive scatter plot of the same data
# Hover over a point to see its number of applicants, admission rate, and gender
admissions |> 
  hchart(
    "scatter",
    hcaes(x = applicants, y = admit_rate, group = gender)) |>
  hc_colors(c("lightblue", "pink")) |> #pink for women, light blue for men
  hc_title(text = "Interactive Admission Rate vs Applicants by Gender") |>
  hc_subtitle(text = "Using the 'admissions' dataset from dslabs") |>
  hc_xAxis(title = list(text = "Number of Applicants")) |>
  hc_yAxis(title = list(text = "Admission Rate (%)")) |> # match the static plot's y label
  hc_tooltip(
    pointFormat = "Applicants: {point.x}<br>Admit Rate: {point.y:.1f}%<br>Gender: {series.name}"
  ) |>
  hc_chart(backgroundColor = "white") |>
  hc_legend(align = "center", verticalAlign = "bottom")

Description of Dataset/Visualization

In this assignment, I used the same “admissions” dataset from dslabs, which has information on graduate program applicants: number of applicants, percentage of applicants admitted, gender, and major. To compare the groups more easily, I created a new variable called ‘admit_rate’, which gives the percentage of applicants admitted in each gender-major group. I then made a scatterplot of the number of applicants vs. the admission rate, with the points colored by gender as a third variable. My graph differs from the example graphs in the tutorial in a few ways. First, I added a dashed linear regression trend line for each gender with geom_smooth(method = "lm"), and I used theme_minimal() to make the graph look cleaner. I also made an interactive version with the highcharter package, in which you can hover over points to see their values. Finally, the trend lines show whether programs with more applicants tend to have lower acceptance rates, which would be consistent with higher competition. With only six majors per gender, however, any such pattern should be read cautiously. There are also differences between male and female acceptance rates within certain majors, which may be an interesting subject to explore further.
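The male-female differences within majors can be checked directly by putting the two genders side by side. The minimal sketch below assumes dplyr and tidyr, both of which the tidyverse loads. It reshapes the data to one row per major and computes a hypothetical gap_women_minus_men column, measured in percentage points.

```r
library(dplyr)
library(tidyr)
library(dslabs)

data("admissions")

# One row per major, with separate men/women admission-percentage columns,
# then the difference (women minus men) in percentage points.
gender_gap <- admissions |>
  select(major, gender, admitted) |>
  pivot_wider(names_from = gender, values_from = admitted) |>
  mutate(gap_women_minus_men = women - men) |>
  arrange(desc(abs(gap_women_minus_men)))

gender_gap
```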