An Observation of the research_funding_rates Data Set
For this assignment, I will be observing the research_funding_rates data set. This data set covers potential gender bias in research funding in the Netherlands. It contains the following variables, which I will use for my observations:
discipline (the research area or discipline)
applications_total
applications_men
applications_women
awards_total
awards_men
awards_women
success_rates_total
success_rates_men
success_rates_women
I am using this data set from the dslabs package. For more detail on where the data comes from, see the original reference:
van der Lee R, Ellemers N. Gender contributes to personal research funding success in The Netherlands. Proc Natl Acad Sci U S A. 2015 Oct 6;112(40):12349-53. doi: 10.1073/pnas.1510159112. Epub 2015 Sep 21. PMID: 26392544; PMCID: PMC4603485.
Load the Libraries
For starters, I will load any and all potentially necessary libraries for this observation, along with the dataset from DS Labs.
library(dslabs)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
For my observation, I will need to create two new tibbles. Each one isolates the needed variables and reshapes them into long format with pivot_longer(), so that the separate men and women columns become rows distinguished by a new variable called "gender".
I will then join the two new tibbles into one final tibble called rfund_final.
# Creating awards tibble to remove the need for separate variables
rfund1 <- research_funding_rates |>
  select(discipline, awards_men, awards_women) |>
  pivot_longer(
    cols = starts_with("awards_"),
    names_to = "gender",
    names_prefix = "awards_",
    values_to = "awards"
  )

# Creating success rates tibble to remove the need for separate variables
rfund2 <- research_funding_rates |>
  select(discipline, success_rates_men, success_rates_women) |>
  pivot_longer(
    cols = starts_with("success_rates_"),
    names_to = "gender",
    names_prefix = "success_rates_",
    values_to = "success_rate"
  )

# Combines the two created tibbles by gender and discipline to connect success rates and awards
rfund_final <- left_join(rfund1, rfund2, by = c("discipline", "gender")) |>
  mutate(gender = str_to_title(gender))
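To make the reshaping step easier to follow, here is a minimal sketch of what pivot_longer() does with names_prefix. The toy data frame and its numbers are made up purely for illustration and are not taken from research_funding_rates.

```r
library(tidyr)

# Hypothetical toy data (invented values), shaped like the awards columns
toy <- data.frame(
  discipline = c("A", "B"),
  awards_men = c(10, 4),
  awards_women = c(6, 5)
)

# Stack the two awards_* columns into rows; names_prefix strips "awards_"
# so the new gender column holds just "men" and "women"
toy_long <- pivot_longer(
  toy,
  cols = starts_with("awards_"),
  names_to = "gender",
  names_prefix = "awards_",
  values_to = "awards"
)

print(toy_long)
```

Each discipline now appears once per gender, which is the same shape rfund1 and rfund2 take before they are joined.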
Data Set Observation
Now that I have prepared the data for plotting, I would like to observe the data set by creating a dot-and-line plot to showcase the disparity between women and men regarding research awards offered.
rfund_plot <- ggplot(
  rfund_final,
  aes(
    x = awards, y = discipline, color = gender,
    # Credit to ChatGPT for the fix of passing the plotly tooltip through text = ...
    text = paste(
      "Discipline: ", discipline,
      "<br>Gender: ", gender,
      "<br>Awards: ", awards,
      "<br>Success Rate: ", success_rate, "%"
    )
  )
) +
  # linewidth replaces size for lines, which was deprecated in ggplot2 3.4.0
  geom_line(aes(group = discipline), color = "gray70", linewidth = 1) +
  geom_point(aes(size = success_rate), alpha = 1) +
  scale_color_manual(values = c("Men" = "#52b2bf", "Women" = "#de5d83")) +
  # Credit to ChatGPT for assistance with limiting the sizes
  scale_size(range = c(3, 10)) +
  labs(
    title = "Research Award Disparities by Gender Across Disciplines\nin the Netherlands",
    x = "Number of Awards",
    y = "Discipline",
    size = "Success Rate"
  ) +
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
ggplotly(rfund_plot, tooltip = "text") # Credit to ChatGPT for the tooltip function
If you hover over the data points, you can see information regarding each one. Give it a try!
Data Analysis
To recap, the purpose of this data set is to showcase the disparity in research funding between men and women across different disciplines in the Netherlands. So, to gain a good perspective on the data set, I chose to focus solely on the following variables:
discipline
awards_men
awards_women
success_rates_men
success_rates_women
By focusing on these variables, I can quickly recognize not only the difference in the number of men versus women being awarded research funds in each discipline, but also the differences in their success rates. I did this by creating a dot-and-line graph with the x axis showing the number of awards and the y axis showing the disciplines. Each point in the graph is color coded to represent the gender, and the size of each point represents the success rate for the corresponding gender. The line is solely meant to help the viewer see how large or small the difference between men and women is in each discipline. So what can be inferred from this graph?
Men clearly outnumber women among recipients of research funding awards in the Netherlands (and likely elsewhere too), especially in the science disciplines. Interestingly, though, the success rates are much closer in many disciplines. In physics, for example, 2 women were awarded funding at a success rate of 22.2%, versus 18 men at 26.9%. The gap in award counts there may be due to reasons other than gender bias, including a lack of representation of women in the discipline itself, which is supported by the application counts: 9 women applied for funding versus 67 men. However, it is hard to ignore the noticeable disparity in certain other disciplines such as medical sciences, where men receive considerably more awards than women.
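The physics figures quoted above can be checked with a few lines of arithmetic. This is only a small sketch in base R using the four counts mentioned in the text; the variable names are my own.

```r
# Physics counts quoted in the text above
applications <- c(Men = 67, Women = 9)
awards <- c(Men = 18, Women = 2)

# Success rate = awards / applications, as a percentage
success_rate <- round(100 * awards / applications, 1)
print(success_rate)
# Men 26.9, Women 22.2

# Women's share of physics applicants versus their share of physics awards
women_share_applicants <- round(100 * applications[["Women"]] / sum(applications), 1)
women_share_awards <- round(100 * awards[["Women"]] / sum(awards), 1)
print(c(applicants = women_share_applicants, awards = women_share_awards))
# applicants 11.8, awards 10.0
```

Women make up roughly 12% of physics applicants and 10% of physics awardees, so most of the gap in award counts comes from how few women applied.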
Despite all of this, the data does not give sound enough proof of a large disparity. Even where the gap in award counts is large, the success rates usually put it in context; in the previously mentioned medical sciences, for example, women have a success rate of 11.2% to men's 18.8%. So, although the data clearly shows a sizable difference in the number of research funding awards given to men versus women, much of that difference reflects the lack of gender diversity among applicants in each discipline. That is not to say that gender bias does not occur in research funding awards, as there is still evidence pointing to it in examples such as Earth/Life Sciences. But more often than not, based on this data set, the gap is better explained by the number of applicants of each gender than by gender bias itself.
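One way to examine this reasoning is to compare women's share of applications with their share of awards in every discipline, alongside the gap in success rates. The sketch below is one possible check and was not part of the plot above; the new column names (women_applicant_share, women_award_share, rate_gap) are my own, and it assumes the dslabs and dplyr packages are installed.

```r
library(dslabs)
library(dplyr)

share_check <- research_funding_rates |>
  transmute(
    discipline,
    # Percentage of all applicants in the discipline who are women
    women_applicant_share = round(100 * applications_women / applications_total, 1),
    # Percentage of all awards in the discipline that went to women
    women_award_share = round(100 * awards_women / awards_total, 1),
    # Positive values mean men had the higher success rate
    rate_gap = success_rates_men - success_rates_women
  ) |>
  arrange(desc(rate_gap))

print(share_check)
```

Where the two shares are close, the difference in award counts mostly follows from who applied. Where the award share falls well below the applicant share, the success rates themselves differ and applicant numbers alone do not account for the gap.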
Although I must remain unbiased for this observation, I suspect that with more information the disparity would become clearer, with women having a higher success rate only in the social science disciplines and men in the physical sciences.