DS Labs Assignment

Author

A Warsaw

An Observation of the research_funding_rates Data Set

For this assignment, I will be observing the research_funding_rates data set. This set holds data showcasing potential gender bias in research funding in the Netherlands. It contains the following variables, which I will use in my observations:

discipline (The research area of discipline)
applications_total
applications_men
applications_women
awards_total
awards_men
awards_women
success_rates_total
success_rates_men
success_rates_women

I am using this data set from the DS Labs package. For more detail on where the data comes from, see the original reference:

van der Lee R, Ellemers N. Gender contributes to personal research funding success in The Netherlands. Proc Natl Acad Sci U S A. 2015 Oct 6;112(40):12349-53. doi: 10.1073/pnas.1510159112. Epub 2015 Sep 21. PMID: 26392544; PMCID: PMC4603485.

Load the Libraries

To start, I will load the libraries I may need for this observation, along with the data set from DS Labs.

library(dslabs)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggthemes)
library(ggrepel)
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(DataExplorer)
data("research_funding_rates")

Reviewing the Data Set

I would like to first get an idea of what is available in this data set.

head(research_funding_rates)

          discipline applications_total applications_men applications_women
1  Chemical sciences                122               83                 39
2  Physical sciences                174              135                 39
3            Physics                 76               67                  9
4         Humanities                396              230                166
5 Technical sciences                251              189                 62
6  Interdisciplinary                183              105                 78
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8
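
As a quick sanity check, the success rate columns appear to be the number of awards divided by the number of applications, expressed as a percentage. That definition is my assumption rather than something stated above, so the minimal sketch below recomputes the rates for the first three rows printed by head() and compares them with the published columns. The name funding_head is hypothetical and the values are copied by hand from the output above.

```r
# First three rows copied by hand from the head() output above.
funding_head <- data.frame(
  discipline = c("Chemical sciences", "Physical sciences", "Physics"),
  applications_men = c(83, 135, 67),
  applications_women = c(39, 39, 9),
  awards_men = c(22, 26, 18),
  awards_women = c(10, 9, 2),
  success_rates_men = c(26.5, 19.3, 26.9),
  success_rates_women = c(25.6, 23.1, 22.2)
)

# Assumed definition: success rate = awards / applications * 100,
# rounded to one decimal place.
rate_men <- round(100 * funding_head$awards_men / funding_head$applications_men, 1)
rate_women <- round(100 * funding_head$awards_women / funding_head$applications_women, 1)

# Each comparison is TRUE when the recomputed rates match the published column.
isTRUE(all.equal(rate_men, funding_head$success_rates_men))
isTRUE(all.equal(rate_women, funding_head$success_rates_women))
```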

For my observation I will need to create two new tibbles. Each one selects the needed variables and pivots them into long format, so that the values are separated by a new variable called “gender”.

I will then combine the two new tibbles into one final tibble called rfund_final.

rfund1 <- research_funding_rates |> # Creating awards tibble to remove the need for separate variables
  select(discipline, awards_men, awards_women) |>
  pivot_longer(cols = starts_with("awards_"),
               names_to = "gender",
               names_prefix = "awards_",
               values_to = "awards")

rfund2 <- research_funding_rates |> # Creating success rates tibble to remove the need for separate variables
  select(discipline, success_rates_men, success_rates_women) |>
  pivot_longer(cols = starts_with("success_rates_"),
               names_to = "gender",
               names_prefix = "success_rates_",
               values_to = "success_rate")

rfund_final <- left_join(rfund1, rfund2, by = c("discipline", "gender")) |> # Combines the two created tibbles by gender and discipline to connect success rates and awards
  mutate(gender = str_to_title(gender))
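
The same long table could probably also be built in a single pivot. The sketch below is an alternative, not the approach used above: it relies on the ".value" sentinel in names_to together with names_pattern, which tells pivot_longer() to use the first part of each column name as the output column and the second part as the gender. The names funding_sample and rfund_alt are hypothetical, and the three rows are copied by hand from the head() output.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Three rows copied by hand from the head() output (illustrative sample only).
funding_sample <- tibble(
  discipline = c("Chemical sciences", "Physical sciences", "Physics"),
  awards_men = c(22, 26, 18),
  awards_women = c(10, 9, 2),
  success_rates_men = c(26.5, 19.3, 26.9),
  success_rates_women = c(25.6, 23.1, 22.2)
)

rfund_alt <- funding_sample |>
  pivot_longer(
    cols = -discipline,
    # ".value" keeps "awards" / "success_rates" as separate output columns;
    # the second capture group becomes the gender column.
    names_to = c(".value", "gender"),
    names_pattern = "(.*)_(men|women)"
  ) |>
  rename(success_rate = success_rates) |>
  mutate(gender = str_to_title(gender))

rfund_alt
```

This avoids the join entirely, at the cost of a regular expression that has to match every pivoted column name.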

Data Set Observation

Now that I have prepared the data for plotting, I would like to observe the data set by creating a dot-and-line plot to showcase the disparity between women and men regarding research awards offered.

rfund_plot <- ggplot(rfund_final, aes(x = awards, 
                                      y = discipline, 
                                      color = gender, 
                                      # paste0() avoids the extra spaces that paste() would insert
                                      text = paste0("Discipline: ", discipline,
                                                    "<br>Gender: ", gender,
                                                    "<br>Awards: ", awards,
                                                    "<br>Success Rate: ", success_rate, "%"))) + #Credit to ChatGPT to fix plotly using text = ...
  geom_line(aes(group = discipline), color = "gray70", linewidth = 1) + # linewidth replaces size for lines since ggplot2 3.4.0
  geom_point(aes(size = success_rate), alpha = 1) +
  scale_color_manual(values = c("Men" = "#52b2bf", "Women" = "#de5d83")) +
  scale_size(range = c(3, 10)) + # Credit to ChatGPT for assistance with limiting the sizes
  labs(title = "Research Award Disparities by Gender Across Disciplines\nin the Netherlands",
       x = "Number of Awards",
       y = "Discipline",
       size = "Success Rate") +
  theme_minimal()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_blank())

ggplotly(rfund_plot, tooltip = "text")  # Credit to ChatGPT for the tooltip function

If you hover over the data points, you can see information regarding each one. Give it a try!

Data Analysis

To recall, the purpose of this data set is to showcase the disparity in research funding between men and women across different disciplines in the Netherlands. So, to gain a good perspective of the data set, I chose to focus solely on the following variables from the set:

discipline
awards_men
awards_women
success_rates_men
success_rates_women

By focusing on these variables, I can quickly see not only the difference in the number of men versus women awarded research funds in each discipline, but also the difference in their success rates. This was done by creating a dot-and-line graph with the x axis showing the number of awards and the y axis showing the disciplines. Each point in the graph is color coded to represent the gender, and the size of each point represents the success rate for the corresponding gender. The line is solely meant to let the viewer see how far apart men and women are in the corresponding discipline, whether the difference is large or small. So what can be inferred based on this graph?

Men easily outnumber women when it comes to receiving awards for research funding in the Netherlands (and possibly outside of the region too), especially in the science disciplines. Interestingly enough, though, the success rates appear to be fairly close in most disciplines. In physics, for example, 2 women were awarded funding at a success rate of 22.2%, versus 18 men at 26.9%. The gap in awards there may be due to reasons other than gender bias, including a lack of representation of women in the discipline itself (which is supported by the total of 9 women applying for funding in the discipline versus 67 men). However, it is hard to ignore the very noticeable disparity in certain other disciplines like medical sciences, where men received far more awards than women.
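
One way to make the representation argument concrete is to compare women's share of applications with women's share of awards. The sketch below does this in base R for the six disciplines printed by head() only, so it says nothing about the remaining disciplines. The data frame name funding_head and the two share columns are hypothetical names introduced for illustration.

```r
# Six rows copied by hand from the head() output earlier in this report.
funding_head <- data.frame(
  discipline = c("Chemical sciences", "Physical sciences", "Physics",
                 "Humanities", "Technical sciences", "Interdisciplinary"),
  applications_total = c(122, 174, 76, 396, 251, 183),
  applications_women = c(39, 39, 9, 166, 62, 78),
  awards_total = c(32, 35, 20, 65, 43, 29),
  awards_women = c(10, 9, 2, 32, 13, 17)
)

# Women's share of applications and of awards, in percent.
funding_head$women_application_share <-
  round(100 * funding_head$applications_women / funding_head$applications_total, 1)
funding_head$women_award_share <-
  round(100 * funding_head$awards_women / funding_head$awards_total, 1)

funding_head[, c("discipline", "women_application_share", "women_award_share")]
```

When the two shares are close, a low award count mostly reflects how few women applied. An award share well below the application share would instead point toward a gap in success rates.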

Despite all of this, the data does not give strong enough proof of a large disparity. Even where the gap in award counts is large, the success rates usually put it in context; in the previously mentioned medical sciences, for example, women have a success rate of 11.2% to men’s 18.8%. So, although the data clearly shows a sizable difference in the number of research funding awards given to men versus women, much of that difference appears to come from the lack of gender diversity within each discipline itself. That is not to say that gender bias is not occurring in research funding awards, as there is still evidence of it in examples such as the Earth/Life Sciences. But it is to say that more often than not, based on this data set, the difference is better explained by the number of applicants of each gender than by gender bias itself.
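
To compare success rates directly, the gap between the two rates can be computed per discipline. This is a minimal sketch in base R using only the six rows printed by head(), so medical sciences and Earth/Life Sciences are not included. The names rates_head and gap_men_minus_women are hypothetical.

```r
# Success rates copied by hand from the head() output earlier in this report.
rates_head <- data.frame(
  discipline = c("Chemical sciences", "Physical sciences", "Physics",
                 "Humanities", "Technical sciences", "Interdisciplinary"),
  success_rates_men = c(26.5, 19.3, 26.9, 14.3, 15.9, 11.4),
  success_rates_women = c(25.6, 23.1, 22.2, 19.3, 21.0, 21.8)
)

# Positive gap: men have the higher success rate; negative gap: women do.
rates_head$gap_men_minus_women <-
  round(rates_head$success_rates_men - rates_head$success_rates_women, 1)

# Sort from the largest advantage for women to the largest advantage for men.
rates_head[order(rates_head$gap_men_minus_women), ]
```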

Although I must remain unbiased for this observation, I suspect that with more information the disparity would become clearer, with women having a higher success rate only in social science disciplines and men in the physical sciences.