DS LABS

Author

Thiloni Konara

Loading libraries and data

library("dslabs")
data(package = "dslabs")
library(tidyverse)
data("research_funding_rates")

Inspecting the data structure and the first 6 rows

str(research_funding_rates)
'data.frame':   9 obs. of  10 variables:
 $ discipline         : chr  "Chemical sciences" "Physical sciences" "Physics" "Humanities" ...
 $ applications_total : num  122 174 76 396 251 183 282 834 505
 $ applications_men   : num  83 135 67 230 189 105 156 425 245
 $ applications_women : num  39 39 9 166 62 78 126 409 260
 $ awards_total       : num  32 35 20 65 43 29 56 112 75
 $ awards_men         : num  22 26 18 33 30 12 38 65 46
 $ awards_women       : num  10 9 2 32 13 17 18 47 29
 $ success_rates_total: num  26.2 20.1 26.3 16.4 17.1 15.8 19.9 13.4 14.9
 $ success_rates_men  : num  26.5 19.3 26.9 14.3 15.9 11.4 24.4 15.3 18.8
 $ success_rates_women: num  25.6 23.1 22.2 19.3 21 21.8 14.3 11.5 11.2
head(research_funding_rates)
          discipline applications_total applications_men applications_women
1  Chemical sciences                122               83                 39
2  Physical sciences                174              135                 39
3            Physics                 76               67                  9
4         Humanities                396              230                166
5 Technical sciences                251              189                 62
6  Interdisciplinary                183              105                 78
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8

Reshaping the data: converting from wide to long format

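The original data set has two separate columns for men’s and women’s success rates. pivot_longer() stacks them into a single success_rate column and adds a gender column that records which original column each value came from.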

funding_rates_long <- research_funding_rates |>
  pivot_longer(cols  = c(success_rates_men, success_rates_women), ##columns to combine
               names_to = "gender", ##new column for the old column names
               values_to = "success_rate") ##new column for the actual success rate values

##I used hatecrimes tutorial here
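To see what the reshape does on a small scale, here is a minimal sketch using only the first two disciplines from the output above. The names toy_wide and toy_long are made up for this illustration; the column names and values are copied from the data set.

```r
library(tidyr)
library(tibble)

# Toy data: first two rows of the success-rate columns shown in head()
toy_wide <- tibble(
  discipline          = c("Chemical sciences", "Physical sciences"),
  success_rates_men   = c(26.5, 19.3),
  success_rates_women = c(25.6, 23.1)
)

# Same pivot_longer() call as above, applied to the toy data
toy_long <- toy_wide |>
  pivot_longer(cols = c(success_rates_men, success_rates_women),
               names_to = "gender",
               values_to = "success_rate")

nrow(toy_wide)  # 2 rows: one per discipline
nrow(toy_long)  # 4 rows: one per discipline and gender
toy_long
```

Each discipline now appears twice, once per gender, which is the shape ggplot needs in order to map gender to color.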

Renaming gender values to make them easier to read in the legend

funding_rates_long <- funding_rates_long |>
  mutate(gender = case_when(
    gender == "success_rates_men" ~ "Men",
    gender == "success_rates_women" ~ "Women",
    TRUE ~ gender)) ##success_rates_men and success_rates_women are hard to read in the legend
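As an alternative to renaming afterwards, the prefix can be stripped during the reshape itself. This is only a sketch of one possible shortcut, not what the analysis above uses: it relies on the names_prefix argument of pivot_longer() and on stringr's str_to_title(), and toy_wide / toy_long are made-up names holding two rows copied from the data.

```r
library(tidyr)
library(dplyr)
library(stringr)
library(tibble)

# Toy data: two disciplines with values copied from the output above
toy_wide <- tibble(
  discipline          = c("Chemical sciences", "Physical sciences"),
  success_rates_men   = c(26.5, 19.3),
  success_rates_women = c(25.6, 23.1)
)

toy_long <- toy_wide |>
  pivot_longer(cols = c(success_rates_men, success_rates_women),
               names_to = "gender",
               names_prefix = "success_rates_",  # drop the shared prefix
               values_to = "success_rate") |>
  mutate(gender = str_to_title(gender))          # "men" -> "Men"

unique(toy_long$gender)
```

The case_when() version is more explicit about which labels are produced; the prefix version is shorter but depends on the column names following the same pattern.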

Visualization

ggplot(funding_rates_long,
       aes(x = reorder(discipline, success_rate), y = success_rate,
           color = gender, group = gender)) +
  geom_line(linewidth = 1) + ##one line per gender, connecting its rates across disciplines
  geom_point(size = 3) +
  scale_color_manual(values = c("Men" = "#27408B", "Women" = "#FF1493"), name = "Gender") +
  labs(title = "Research Funding Success Rates by Gender and Discipline (Netherlands)",
       x = "Discipline", y = "Success Rate (%)") +
  theme_minimal(base_size = 12, base_family = "serif") + ##minimal theme with a serif font at size 12
  theme(
    plot.title = element_text(face = "bold"), ##make title bold
    axis.text.x = element_text(angle = 45, hjust = 1), ##rotate x-axis labels 45 degrees and right-align them
    legend.position = "top") ##move legend above the plot
##For theme changes, I used Iris's Project 1.

Essay

The data set I used, research_funding_rates from the dslabs package, shows how male and female researchers in the Netherlands apply for and receive funding in different academic fields (disciplines). I reshaped the data from wide to long format so I could visualize how success rates differ between genders within each field. Then I created a line-and-point plot where the x-axis represents academic disciplines, the y-axis shows the success rate (in percent), and color represents gender as a third variable. Each line traces one gender’s success rates across the disciplines, so the vertical distance between the two points for a discipline shows the size of the gap there.

After visualizing the data, I noticed that men have higher success rates in five of the nine disciplines, which matches what many people expect when talking about gender bias in research funding. However, I was surprised to see that in the Technical and Physical Sciences, women’s success rates were higher than men’s (the same is true for Humanities and Interdisciplinary). This result goes against the common belief that women are less successful in those areas, which made me think that some fields may be becoming more balanced or even giving women better chances than before. Overall, this data set helped me see that gender differences in research funding are not the same everywhere: some disciplines still show a gap in favor of men, while in others women have the higher success rate.
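The comparison in the essay can also be checked numerically rather than read off the plot. The sketch below is one way to do it, assuming the dslabs and dplyr packages are installed; the gaps table and its gap column (women’s rate minus men’s rate, in percentage points) are names introduced here for illustration.

```r
library(dslabs)
library(dplyr)

data("research_funding_rates")

# gap > 0 means women's success rate is higher in that discipline
gaps <- research_funding_rates |>
  transmute(discipline,
            gap = success_rates_women - success_rates_men) |>
  arrange(desc(gap))

gaps

# Number of disciplines where women's rate exceeds men's
sum(gaps$gap > 0)
```

Sorting by gap puts the disciplines where women do best at the top and those where men do best at the bottom, which makes the uneven pattern across fields easy to see.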