DS Labs

Load necessary libraries

#install.packages("dslabs")
#install.packages("highcharter")
library(tidyverse)
library(dslabs)
library(highcharter)

Load the admissions dataset
data("admissions")

Check the structure of dataset

str(admissions)
'data.frame': 12 obs. of 4 variables:
$ major : chr "A" "B" "C" "D" ...
$ gender : chr "men" "men" "men" "men" ...
$ admitted : num 62 63 37 33 28 6 82 68 34 35 ...
$ applicants: num 825 560 325 417 191 373 108 25 593 375 ...
Check the first rows of the dataset
head(admissions)
  major gender admitted applicants
1     A    men       62        825
2     B    men       63        560
3     C    men       37        325
4     D    men       33        417
5     E    men       28        191
6     F    men        6        373
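
Before cleaning, it may help to check how `admitted` should be read. The short sketch below uses only the first ten values printed by str() above (typed in by hand, so it runs without dslabs) and looks for rows where `admitted` is larger than `applicants`. If any such row exists, `admitted` cannot be a count of students; the dslabs documentation describes it as the percent of students admitted. The name `impossible_rows` is only for illustration.

```r
# First ten values of each column, copied by hand from the str() output above
admitted   <- c(62, 63, 37, 33, 28, 6, 82, 68, 34, 35)
applicants <- c(825, 560, 325, 417, 191, 373, 108, 25, 593, 375)

# Rows where more students would be "admitted" than applied,
# which is impossible if `admitted` were a count
impossible_rows <- which(admitted > applicants)
print(impossible_rows)
```

Row 8 is flagged (68 admitted against 25 applicants), which suggests `admitted` is a percentage rather than a number of students.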
Clean and Prepare Data
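Create a new variable that shows the admission rate as a percentage

# In dslabs, `admitted` is already the percentage of applicants admitted
# (row 8 has admitted = 68 with only 25 applicants, so it cannot be a count),
# so it is used directly as the rate instead of being divided by applicants.
admissions <- admissions |>
  mutate(admit_rate = admitted)

Create the Scatterplot
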
#This plot shows how the number of applicants relates to admission rate
#The color represents gender as a third variable
ggplot(admissions, aes(x = applicants, y = admit_rate, color = gender)) +
  geom_point(size = 4, alpha = 0.8) + #points for each gender-major combo
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") + #trend lines
  labs(title = "Admission Rate vs Number of Applicants by Gender",
       subtitle = "Data from the 'admissions' dataset in dslabs",
       x = "Number of Applicants",
       y = "Admission Rate (%)",
       color = "Gender") +
  theme_minimal() + #simple clean theme
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12, color = "gray30"),
    legend.position = "bottom"
  )

`geom_smooth()` using formula = 'y ~ x'

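
To see what the dashed trend lines represent, here is a rough sketch of the kind of fit that geom_smooth(method = "lm") performs within one color group. It uses only the six rows for men shown by head() above, typed in by hand, and treats `admitted` as the admission percentage (as the dslabs documentation describes it). The names `men`, `fit`, and `slope` are just for illustration and are not part of the original analysis.

```r
# The six rows for men, copied by hand from the head() output above
men <- data.frame(
  major      = c("A", "B", "C", "D", "E", "F"),
  applicants = c(825, 560, 325, 417, 191, 373),
  admit_rate = c(62, 63, 37, 33, 28, 6)  # assumption: percent admitted
)

# A straight-line model of the form y ~ x, fitted to this one group
fit <- lm(admit_rate ~ applicants, data = men)

# The slope is the change in admission rate (percentage points)
# associated with one additional applicant
slope <- unname(coef(fit)["applicants"])
print(slope)
```

A positive slope means the line rises as the number of applicants grows, and a negative slope means it falls. For these six rows the slope comes out positive.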
Interactive Version Using Highcharter
#this part creates an interactive scatter plot version of the same data
#you can hover over points to see the number of applicants, admit rate, and gender
admissions |>
  hchart(
    "scatter",
    hcaes(x = applicants, y = admit_rate, group = gender)) |>
  hc_colors(c("lightblue", "pink")) |> #pink for women, light blue for men
  hc_title(text = "Interactive Admission Rate vs Applicants by Gender") |>
  hc_subtitle(text = "Using the 'admissions' dataset from dslabs") |>
  hc_xAxis(title = list(text = "Number of Applicants")) |>
  hc_tooltip(
    pointFormat = "Applicants: {point.x}<br>Admit Rate: {point.y:.1f}%<br>Gender: {series.name}"
  ) |>
  hc_chart(backgroundColor = "white") |>
  hc_legend(align = "center", verticalAlign = "bottom")

Description of Dataset/Visualization

In this assignment, I used the same “admissions” dataset from dslabs, which has information on graduate program applicants (major, gender, number of applicants, and percentage of applicants admitted). To make the comparison easier to read, I created a new variable called ‘admit_rate’, which gives the percentage of applicants admitted in each gender-major group. Then I made a scatterplot of the number of applicants vs. the admission rate and colored the points by gender as a third variable. My graph differs from the example graphs in the tutorial in a few ways. For one, I added a linear regression trend line for each gender with geom_smooth(method = "lm"), and I used theme_minimal() to make the graph look cleaner. I also made an interactive version with the highcharter package, in which you can hover over points to see their values. Finally, among women, groups with more applicants seem to have a lower admission rate, which would be consistent with higher competition, while the trend line for men does not show the same pattern. There are also slight differences between male and female admission rates in certain majors, which may be an interesting subject to explore further.