DS LABS

Author

Thiloni Konara

Loading libraries and data

library("dslabs")
data(package = "dslabs")
library(tidyverse)
library(ggthemes)
library(ggrepel)
data("research_funding_rates")

Inspecting the data structure and the first six rows

str(research_funding_rates)
'data.frame':   9 obs. of  10 variables:
 $ discipline         : chr  "Chemical sciences" "Physical sciences" "Physics" "Humanities" ...
 $ applications_total : num  122 174 76 396 251 183 282 834 505
 $ applications_men   : num  83 135 67 230 189 105 156 425 245
 $ applications_women : num  39 39 9 166 62 78 126 409 260
 $ awards_total       : num  32 35 20 65 43 29 56 112 75
 $ awards_men         : num  22 26 18 33 30 12 38 65 46
 $ awards_women       : num  10 9 2 32 13 17 18 47 29
 $ success_rates_total: num  26.2 20.1 26.3 16.4 17.1 15.8 19.9 13.4 14.9
 $ success_rates_men  : num  26.5 19.3 26.9 14.3 15.9 11.4 24.4 15.3 18.8
 $ success_rates_women: num  25.6 23.1 22.2 19.3 21 21.8 14.3 11.5 11.2
head(research_funding_rates)
          discipline applications_total applications_men applications_women
1  Chemical sciences                122               83                 39
2  Physical sciences                174              135                 39
3            Physics                 76               67                  9
4         Humanities                396              230                166
5 Technical sciences                251              189                 62
6  Interdisciplinary                183              105                 78
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8

Visualization

ggplot(research_funding_rates,
       aes(x = success_rates_men, y = success_rates_women, color = discipline)) +
  geom_point(size = 4, alpha = 0.8) +
  # Linear trend line for the overall relationship (adapted from the dslabs and highcharter tutorial)
  geom_smooth(method = "lm", se = TRUE, color = "gray30", linewidth = 1, linetype = "dashed") +
  # Non-overlapping text labels for each discipline
  # https://rdrr.io/cran/ggrepel/man/geom_text_repel.html
  geom_text_repel(aes(label = discipline), size = 3.5, fontface = "bold",
                  box.padding = 0.5, max.overlaps = 15, show.legend = FALSE) +
  scale_color_brewer(palette = "Set1", name = "Discipline") +
  labs(title = "Relationship Between Men's and Women's Research Funding Success Rates (Netherlands)",
       x = "Men's Success Rate (%)", y = "Women's Success Rate (%)") +
  # Minimal theme with a serif font at base size 12
  theme_minimal(base_size = 12, base_family = "serif") +
  theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
        axis.text = element_text(size = 10),
        legend.position = "none")
`geom_smooth()` using formula = 'y ~ x'

Essay

The dataset research_funding_rates from the dslabs package records, for each academic discipline, how many men and women in the Netherlands applied for research funding and how many received it. To explore whether funding outcomes differ by gender, I visualized the relationship between men’s and women’s success rates using a scatter plot. In this plot, each point represents a discipline, where the x-axis shows men’s success rate and the y-axis shows women’s success rate. The dashed line is a linear regression fit that summarizes the general trend between the two variables, and the gray band around it is its 95% confidence interval.
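As a quick sanity check on what the success-rate columns mean, the sketch below recomputes them as awards divided by applications. It assumes the rates are simple percentages rounded to one decimal place; the vectors are copied by hand from the str() output above so that the snippet runs in base R without dslabs, and the variable names (published_men, recomputed_men, max_diff) are my own.

```r
# Values copied from the str() output above, so no packages are needed.
applications_men   <- c(83, 135, 67, 230, 189, 105, 156, 425, 245)
applications_women <- c(39, 39, 9, 166, 62, 78, 126, 409, 260)
awards_men         <- c(22, 26, 18, 33, 30, 12, 38, 65, 46)
awards_women       <- c(10, 9, 2, 32, 13, 17, 18, 47, 29)
published_men      <- c(26.5, 19.3, 26.9, 14.3, 15.9, 11.4, 24.4, 15.3, 18.8)
published_women    <- c(25.6, 23.1, 22.2, 19.3, 21, 21.8, 14.3, 11.5, 11.2)

# Assumed definition: success rate = awards as a percentage of applications,
# rounded to one decimal place.
recomputed_men   <- round(100 * awards_men / applications_men, 1)
recomputed_women <- round(100 * awards_women / applications_women, 1)

# Largest absolute gap between the recomputed and the published rates.
max_diff <- max(abs(c(recomputed_men - published_men,
                      recomputed_women - published_women)))
print(max_diff)
```

If max_diff is essentially zero, the published rates are just awards over applications. Note that some cells are small (for example, only 9 applications from women in Physics), so individual rates can move a lot with one extra award.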

This type of plot allows for a direct comparison between male and female success rates instead of separating them by bars or categories. By labeling each discipline, I could easily see how the relationship changes across fields. For instance, Chemical sciences and Physics show relatively high success rates for both genders (above 22%), while Social sciences and Medical sciences are noticeably lower, particularly for women (between 11% and 12%). What really caught my attention was that in some fields, like Technical sciences and Physical sciences, women’s success rates are actually higher than men’s (21.0% vs. 15.9% and 23.1% vs. 19.3%, respectively), which goes against the common assumption that men always have an advantage in scientific funding.
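The discipline-by-discipline comparison can also be read off numerically. The following is a minimal base R sketch, not part of the original analysis: it tabulates the difference between women's and men's success rates and sorts by it. The rates are copied from the str() output above; the names of the last three disciplines are not visible in that printed output and are taken from the dslabs dataset, and the column name women_minus_men is my own.

```r
# Discipline names: rows 1-6 appear in the head() output above; rows 7-9 are
# taken from the dslabs dataset (they are not shown in the printed output).
discipline <- c("Chemical sciences", "Physical sciences", "Physics",
                "Humanities", "Technical sciences", "Interdisciplinary",
                "Earth/life sciences", "Social sciences", "Medical sciences")
men   <- c(26.5, 19.3, 26.9, 14.3, 15.9, 11.4, 24.4, 15.3, 18.8)
women <- c(25.6, 23.1, 22.2, 19.3, 21, 21.8, 14.3, 11.5, 11.2)

# Difference in percentage points; positive means women's rate is higher.
gap <- data.frame(discipline, men, women,
                  women_minus_men = round(women - men, 1))

# Sort from the largest advantage for women to the largest advantage for men.
gap <- gap[order(-gap$women_minus_men), ]
print(gap, row.names = FALSE)

# Number of disciplines in which women's success rate exceeds men's.
n_women_higher <- sum(gap$women_minus_men > 0)
print(n_women_higher)
```

Sorting by the difference makes it easy to see which fields sit above or below the line of equal success rates in the scatter plot.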

The regression line shows a weak positive relationship, suggesting that in disciplines where men’s success rates are higher, women’s rates also tend to be somewhat higher. With only nine disciplines, though, the trend is uncertain, and the gap between the two rates isn’t consistent. This visualization helped me see that gender differences in research funding success are not uniform; they vary by discipline. Some areas look fairly balanced, while others show disparities that deserve attention.
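To put a number on how strong the linear trend is, one option is to compute the Pearson correlation and fit the same kind of linear model that geom_smooth(method = "lm") draws. This is a sketch in base R using the rates copied from the str() output above; the variable names are illustrative.

```r
# Success rates (%) copied from the str() output above.
men   <- c(26.5, 19.3, 26.9, 14.3, 15.9, 11.4, 24.4, 15.3, 18.8)
women <- c(25.6, 23.1, 22.2, 19.3, 21, 21.8, 14.3, 11.5, 11.2)

# Pearson correlation between men's and women's success rates.
r <- cor(men, women)

# Ordinary least squares fit of women's rate on men's rate,
# the same model form used by geom_smooth(method = "lm").
fit   <- lm(women ~ men)
slope <- unname(coef(fit)["men"])

cat("correlation:", round(r, 2), "\n")
cat("slope:", round(slope, 2), "\n")

# 95% confidence interval for the slope; an interval that includes 0
# means the data are also compatible with no linear trend.
slope_ci <- confint(fit)["men", ]
print(slope_ci)
```

With only nine points, the confidence interval is more informative than the point estimate alone.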

Overall, this visualization made the data easier to interpret and encouraged me to think beyond averages. Instead of assuming that women are always disadvantaged, this plot helped reveal a more nuanced story, one that depends on the field and its context rather than on a single overall figure.
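To illustrate what "thinking beyond averages" means here, the sketch below computes the pooled success rate for each gender across all disciplines, which is the single overall figure that the per-discipline plot breaks apart. It uses base R with counts copied from the str() output above; the variable names are my own.

```r
# Application and award counts copied from the str() output above.
applications_men   <- c(83, 135, 67, 230, 189, 105, 156, 425, 245)
applications_women <- c(39, 39, 9, 166, 62, 78, 126, 409, 260)
awards_men         <- c(22, 26, 18, 33, 30, 12, 38, 65, 46)
awards_women       <- c(10, 9, 2, 32, 13, 17, 18, 47, 29)

# Pooled success rate (%): all awards divided by all applications.
overall_men   <- 100 * sum(awards_men) / sum(applications_men)
overall_women <- 100 * sum(awards_women) / sum(applications_women)

cat("overall men:  ", round(overall_men, 1), "%\n")
cat("overall women:", round(overall_women, 1), "%\n")
```

The pooled rates weight large disciplines more heavily than small ones, so they can point in a different direction from many of the individual fields in the scatter plot.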