library("dslabs")
data(package = "dslabs")
DS LABS
Loading libraries and data
library(tidyverse)
data("research_funding_rates")
To look at the data types and the first 6 rows
str(research_funding_rates)
'data.frame': 9 obs. of 10 variables:
$ discipline : chr "Chemical sciences" "Physical sciences" "Physics" "Humanities" ...
$ applications_total : num 122 174 76 396 251 183 282 834 505
$ applications_men : num 83 135 67 230 189 105 156 425 245
$ applications_women : num 39 39 9 166 62 78 126 409 260
$ awards_total : num 32 35 20 65 43 29 56 112 75
$ awards_men : num 22 26 18 33 30 12 38 65 46
$ awards_women : num 10 9 2 32 13 17 18 47 29
$ success_rates_total: num 26.2 20.1 26.3 16.4 17.1 15.8 19.9 13.4 14.9
$ success_rates_men : num 26.5 19.3 26.9 14.3 15.9 11.4 24.4 15.3 18.8
$ success_rates_women: num 25.6 23.1 22.2 19.3 21 21.8 14.3 11.5 11.2
head(research_funding_rates)
          discipline applications_total applications_men applications_women
1  Chemical sciences                122               83                 39
2  Physical sciences                174              135                 39
3            Physics                 76               67                  9
4         Humanities                396              230                166
5 Technical sciences                251              189                 62
6  Interdisciplinary                183              105                 78
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8
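The success rate columns look like awards divided by applications, expressed in percent. That is an observation from the rows printed above, not something stated in the data set's documentation here. A minimal sketch that checks it for the men's columns, using only values copied from the head() output:

```r
# Values copied from the head() output (first six disciplines only).
applications_men  <- c(83, 135, 67, 230, 189, 105)
awards_men        <- c(22, 26, 18, 33, 30, 12)
success_rates_men <- c(26.5, 19.3, 26.9, 14.3, 15.9, 11.4)

# Recompute the rate as awards / applications, in percent, rounded to one decimal.
recomputed_men <- round(100 * awards_men / applications_men, 1)

# Show reported and recomputed values side by side.
data.frame(reported = success_rates_men, recomputed = recomputed_men)

# Compare with a small tolerance rather than ==, since these are floating point values.
all(abs(recomputed_men - success_rates_men) < 0.05)
```

If the last line returns TRUE, the reported rates agree with awards / applications to one decimal place for these six rows.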
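Reshaping the data: converting from wide to long format
The original data set has two separate columns for men’s and women’s success rates. Pivoting stacks them into a single success_rate column and adds a gender column that records which original column each value came from.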
funding_rates_long <- research_funding_rates |>
  pivot_longer(
    cols = c(success_rates_men, success_rates_women), ## columns to combine
    names_to = "gender",                              ## new column for the old column names
    values_to = "success_rate"                        ## new column for the actual success rate values
  )
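To see what the reshape does in isolation, here is a small sketch on a toy table. The name toy_wide is made up for illustration; its two rows are copied from the head() output earlier in this document.

```r
library(tidyr)

# Toy wide table: one row per discipline, one column per gender.
toy_wide <- data.frame(
  discipline = c("Chemical sciences", "Physical sciences"),
  success_rates_men = c(26.5, 19.3),
  success_rates_women = c(25.6, 23.1)
)

# Stack the two rate columns: the old column names go into "gender",
# the numbers go into "success_rate".
toy_long <- pivot_longer(
  toy_wide,
  cols = c(success_rates_men, success_rates_women),
  names_to = "gender",
  values_to = "success_rate"
)

toy_long
nrow(toy_long)  # 2 disciplines x 2 rate columns = 4 rows
```

By the same logic, the full data set's 9 rows should become 18 rows after pivoting, one per discipline and gender.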
## I used the hatecrimes tutorial here
Renaming gender values to make them easier to read in the legend
funding_rates_long <- funding_rates_long |>
  mutate(gender = case_when(
    gender == "success_rates_men" ~ "Men",
    gender == "success_rates_women" ~ "Women",
    TRUE ~ gender
  )) ## the raw names success_rates_men and success_rates_women are hard to read in the legend
Visualization
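The plot in this section orders the x axis with reorder(discipline, success_rate). A minimal sketch of what that call does on long-format data, using three disciplines copied from the head() output (the variable names here are only for illustration):

```r
# Three disciplines in long format: each appears twice (men's rate, then women's rate).
discipline   <- rep(c("Chemical sciences", "Physical sciences", "Interdisciplinary"), each = 2)
success_rate <- c(26.5, 25.6, 19.3, 23.1, 11.4, 21.8)

# reorder() defaults to FUN = mean, so each discipline is scored by the mean
# of its two rates, and the factor levels are sorted by that score (ascending).
ordered_discipline <- reorder(discipline, success_rate)

levels(ordered_discipline)
tapply(success_rate, discipline, mean)
```

Because the score is the mean over both genders, the x axis ends up ordered by average success rate per discipline, not by men's or women's rates alone.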
ggplot(funding_rates_long, aes(x = reorder(discipline, success_rate), y = success_rate, color = gender, group = gender)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  scale_color_manual(values = c("Men" = "#27408B", "Women" = "#FF1493"), name = "Gender") +
  labs(title = "Research Funding Success Rates by Gender and Discipline (Netherlands)", x = "Discipline", y = "Success Rate (%)") +
  theme_minimal(base_size = 12, base_family = "serif") + ## minimal theme with a serif font at size 12
  theme(
    plot.title = element_text(face = "bold"), ## make the title bold
    axis.text.x = element_text(angle = 45, hjust = 1), ## rotate x-axis labels 45 degrees and right-align them
    legend.position = "top" ## move the legend above the plot
  )
## For theme changes, I used Iris's Project 1.
Essay
The data set I used, research_funding_rates from the dslabs package, shows how male and female researchers in the Netherlands apply for and receive funding across academic disciplines. I reshaped the data from wide to long format so I could visualize how success rates differ between genders within each discipline. Then I created a line-and-point plot where the x axis represents academic disciplines, the y axis shows the success rate (in percent), and color represents gender as a third variable. Each line connects one gender's success rates across the disciplines, so the vertical distance between the two lines at a given discipline shows the gap there.
After visualizing the data, I noticed that men have higher success rates in five of the nine disciplines, which matches what many people expect when talking about gender bias in research funding. However, I was surprised to see that in the Technical and Physical Sciences, women’s success rates were higher than men’s (21.0% vs. 15.9% and 23.1% vs. 19.3%); the same holds for Humanities and Interdisciplinary research. This result goes against the common belief that women are less successful in those areas, which made me think about how some fields may be becoming more balanced or even giving women better chances than before. Overall, this data set helped me see that gender differences in research funding are not the same everywhere: some disciplines still show a gap, while others are showing positive change.
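The comparison in the essay can be checked with a few lines of arithmetic. This is a minimal sketch that uses the nine success rates copied from the str() output rather than loading the package; the name gap is made up for illustration.

```r
# Success rates (%) copied from the str() output, in the data set's row order.
success_rates_men   <- c(26.5, 19.3, 26.9, 14.3, 15.9, 11.4, 24.4, 15.3, 18.8)
success_rates_women <- c(25.6, 23.1, 22.2, 19.3, 21.0, 21.8, 14.3, 11.5, 11.2)

# Positive gap: men's rate is higher. Negative gap: women's rate is higher.
gap <- success_rates_men - success_rates_women

round(gap, 1)
sum(gap > 0)  # number of disciplines where men's rate is higher
sum(gap < 0)  # number of disciplines where women's rate is higher
```

With these values, men's rate is higher in five disciplines and women's in four, and the gaps range from under 1 to about 10 percentage points in either direction.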