library("dslabs")
data(package = "dslabs")  # list the datasets shipped with dslabs
DS LABS
Loading libraries and data
library(tidyverse)
library(ggthemes)
library(ggrepel)
data("research_funding_rates")
Checking the data types and the first 6 rows
str(research_funding_rates)
'data.frame': 9 obs. of 10 variables:
$ discipline : chr "Chemical sciences" "Physical sciences" "Physics" "Humanities" ...
$ applications_total : num 122 174 76 396 251 183 282 834 505
$ applications_men : num 83 135 67 230 189 105 156 425 245
$ applications_women : num 39 39 9 166 62 78 126 409 260
$ awards_total : num 32 35 20 65 43 29 56 112 75
$ awards_men : num 22 26 18 33 30 12 38 65 46
$ awards_women : num 10 9 2 32 13 17 18 47 29
$ success_rates_total: num 26.2 20.1 26.3 16.4 17.1 15.8 19.9 13.4 14.9
$ success_rates_men : num 26.5 19.3 26.9 14.3 15.9 11.4 24.4 15.3 18.8
$ success_rates_women: num 25.6 23.1 22.2 19.3 21 21.8 14.3 11.5 11.2
head(research_funding_rates)
          discipline applications_total applications_men applications_women
1  Chemical sciences                122               83                 39
2  Physical sciences                174              135                 39
3            Physics                 76               67                  9
4         Humanities                396              230                166
5 Technical sciences                251              189                 62
6  Interdisciplinary                183              105                 78
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8
Visualization
ggplot(research_funding_rates,
       aes(x = success_rates_men, y = success_rates_women, color = discipline)) +
  geom_point(size = 4, alpha = 0.8) +
  # Linear trend line for the overall relationship (adapted from the dslabs and highcharter tutorial)
  geom_smooth(method = "lm", se = TRUE, color = "gray30", linewidth = 1, linetype = "dashed") +
  # Text labels for each discipline; see https://rdrr.io/cran/ggrepel/man/geom_text_repel.html
  geom_text_repel(aes(label = discipline), size = 3.5, fontface = "bold",
                  box.padding = 0.5, max.overlaps = 15, show.legend = FALSE) +
  scale_color_brewer(palette = "Set1", name = "Discipline") +
  labs(title = "Relationship Between Men's and Women's Research Funding Success Rates (Netherlands)",
       x = "Men's Success Rate (%)", y = "Women's Success Rate (%)") +
  # Minimal theme with a 12 pt serif font
  theme_minimal(base_size = 12, base_family = "serif") +
  theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
        axis.text = element_text(size = 10),
        legend.position = "none")
`geom_smooth()` using formula = 'y ~ x'
Essay
The research_funding_rates dataset from the dslabs package records how many men and women in the Netherlands applied for and received research funding across nine academic disciplines. To explore whether gender bias plays a role in funding outcomes, I plotted men’s success rates against women’s in a scatter plot. Each point represents a discipline, with men’s success rate on the x-axis and women’s success rate on the y-axis. The dashed regression line shows the overall linear trend between the two variables.
This type of plot compares men’s and women’s success rates directly instead of splitting them into separate bars or categories. Labeling each discipline makes it easy to see how the relationship changes across fields. For instance, Physical Sciences and Chemical Sciences show relatively high success rates for both genders, while Social Sciences and Medical Sciences appear noticeably lower. What really caught my attention was that in some fields, such as Technical Sciences (21.0% vs. 15.9%) and Physical Sciences (23.1% vs. 19.3%), women’s success rates are higher than men’s, which goes against the common assumption that men always have an advantage in scientific funding.
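The field-by-field comparison can also be checked numerically. The following is a minimal sketch, assuming the dslabs and dplyr packages are installed; the names `rate_gaps` and `gap_women_minus_men` are introduced here for illustration and are not part of the dataset.

```r
library(dslabs)
library(dplyr)

data("research_funding_rates")

# Percentage-point gap between women's and men's success rates per discipline.
# `rate_gaps` and `gap_women_minus_men` are illustrative names.
rate_gaps <- research_funding_rates %>%
  select(discipline, success_rates_men, success_rates_women) %>%
  mutate(gap_women_minus_men = success_rates_women - success_rates_men) %>%
  arrange(desc(gap_women_minus_men))

print(rate_gaps)
```

Positive values in `gap_women_minus_men` mark disciplines where women’s success rate exceeds men’s; negative values mark the reverse.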
The regression line shows a mild positive relationship: in disciplines where men’s success rates are higher, women’s rates also tend to be higher, but the gap between the two is not consistent. This visualization helped me see that gender bias in research funding is not uniform; it varies by discipline. Some fields appear fairly balanced, while others still show disparities that deserve attention.
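One way to put a number on the strength of this trend is to compute the correlation and fit the same kind of linear model that `geom_smooth(method = "lm")` draws. This is a rough sketch, assuming the dslabs package is installed; `rate_cor` and `rate_fit` are illustrative names.

```r
library(dslabs)

data("research_funding_rates")

# Pearson correlation between men's and women's success rates.
rate_cor <- cor(research_funding_rates$success_rates_men,
                research_funding_rates$success_rates_women)

# Linear model of women's rate on men's rate (y ~ x), matching the plot's axes.
rate_fit <- lm(success_rates_women ~ success_rates_men,
               data = research_funding_rates)

print(rate_cor)
print(coef(rate_fit))
```

With only nine disciplines, both the correlation and the slope are rough estimates, which is also what the shaded confidence band around the dashed line indicates.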
Overall, this visualization made the data easier to interpret and encouraged me to think beyond averages. Instead of assuming that women are always disadvantaged, the plot reveals a more nuanced story, one that depends on the field and the context and that matters for ongoing progress toward equality in research opportunities.
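To see what looking only at averages would show, the awards and applications can be pooled across all disciplines and compared with the per-discipline rates in the plot. This is a sketch under the assumption that dslabs and dplyr are installed; `pooled_rates` is an illustrative name.

```r
library(dslabs)
library(dplyr)

data("research_funding_rates")

# Pooled success rates (%): total awards divided by total applications,
# summed over all disciplines. `pooled_rates` is an illustrative name.
pooled_rates <- research_funding_rates %>%
  summarise(
    men   = 100 * sum(awards_men)   / sum(applications_men),
    women = 100 * sum(awards_women) / sum(applications_women)
  )

print(pooled_rates)
```

Comparing these two pooled figures with the discipline-level points shows how a single overall number can hide differences between fields.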