library(tidyverse)
## -- Attaching packages --------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'stringr' was built under R version 3.6.3
## -- Conflicts ------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dslabs)
## Warning: package 'dslabs' was built under R version 3.6.3
library(ggsci)
## Warning: package 'ggsci' was built under R version 3.6.3
data("research_funding_rates")
str(research_funding_rates)
## 'data.frame': 9 obs. of 10 variables:
## $ discipline : chr "Chemical sciences" "Physical sciences" "Physics" "Humanities" ...
## $ applications_total : num 122 174 76 396 251 183 282 834 505
## $ applications_men : num 83 135 67 230 189 105 156 425 245
## $ applications_women : num 39 39 9 166 62 78 126 409 260
## $ awards_total : num 32 35 20 65 43 29 56 112 75
## $ awards_men : num 22 26 18 33 30 12 38 65 46
## $ awards_women : num 10 9 2 32 13 17 18 47 29
## $ success_rates_total: num 26.2 20.1 26.3 16.4 17.1 15.8 19.9 13.4 14.9
## $ success_rates_men : num 26.5 19.3 26.9 14.3 15.9 11.4 24.4 15.3 18.8
## $ success_rates_women: num 25.6 23.1 22.2 19.3 21 21.8 14.3 11.5 11.2
summary(research_funding_rates)
## discipline applications_total applications_men applications_women
## Length:9 Min. : 76.0 Min. : 67.0 Min. : 9
## Class :character 1st Qu.:174.0 1st Qu.:105.0 1st Qu.: 39
## Mode :character Median :251.0 Median :156.0 Median : 78
## Mean :313.7 Mean :181.7 Mean :132
## 3rd Qu.:396.0 3rd Qu.:230.0 3rd Qu.:166
## Max. :834.0 Max. :425.0 Max. :409
## awards_total awards_men awards_women success_rates_total
## Min. : 20.00 Min. :12.00 Min. : 2.00 Min. :13.4
## 1st Qu.: 32.00 1st Qu.:22.00 1st Qu.:10.00 1st Qu.:15.8
## Median : 43.00 Median :30.00 Median :17.00 Median :17.1
## Mean : 51.89 Mean :32.22 Mean :19.67 Mean :18.9
## 3rd Qu.: 65.00 3rd Qu.:38.00 3rd Qu.:29.00 3rd Qu.:20.1
## Max. :112.00 Max. :65.00 Max. :47.00 Max. :26.3
## success_rates_men success_rates_women
## Min. :11.4 Min. :11.20
## 1st Qu.:15.3 1st Qu.:14.30
## Median :18.8 Median :21.00
## Mean :19.2 Mean :18.89
## 3rd Qu.:24.4 3rd Qu.:22.20
## Max. :26.9 Max. :25.60
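# Reshape to long format: one row per discipline/variable pair.
# The key column "applications" holds the original column names; "totals" holds the values.
long <- research_funding_rates %>%
  gather(applications, totals, applications_total:success_rates_women)

# Keep only the per-gender success rates.
success_rates <- long %>%
  filter(applications %in% c("success_rates_men", "success_rates_women"))

# Side-by-side bars of success rate by discipline, one bar per gender.
funding_rate_plot <- success_rates %>%
  ggplot(aes(x = discipline, y = totals, fill = applications)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  scale_fill_lancet() +
  labs(x = "Discipline", y = "Success Rate (%)", title = "Funding Success Rates by Gender and Discipline")

funding_rate_plot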
This chart shows reasonably good parity between the success rates of men and women across disciplines: the unweighted means over the nine disciplines are 19.2% for men and 18.9% for women. Individual fields still differ in both directions, so there is certainly room for improvement, but no field absolutely excludes one gender.
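To put numbers behind the parity claim, the gap in success rates can be tabulated per discipline. This is a minimal sketch that assumes the `dslabs` and `dplyr` packages are installed; the `rate_gap` object and the `gap_women_minus_men` column are names introduced here for illustration, not part of the original analysis.

```r
library(dplyr)
library(dslabs)

data("research_funding_rates")

# Percentage-point difference in success rate (women minus men) per discipline.
# Negative values mean men were funded at a higher rate in that discipline.
rate_gap <- research_funding_rates %>%
  transmute(
    discipline,
    success_rates_men,
    success_rates_women,
    gap_women_minus_men = success_rates_women - success_rates_men
  ) %>%
  arrange(gap_women_minus_men)

rate_gap
```

Sorting by the gap puts the disciplines where men fare best at the top and those where women fare best at the bottom, which makes the spread easier to read than the bar chart alone.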
However, funding rates don’t tell the whole story.
# Keep the per-gender application and award counts, then label each row
# with its gender and with whether it is an application or an award count.
application_rates <- long %>%
  filter(applications %in% c("applications_men", "applications_women", "awards_men", "awards_women")) %>%
  mutate(gender = if_else(applications %in% c("applications_women", "awards_women"), "Women", "Men")) %>%
  mutate(type = if_else(applications %in% c("applications_women", "applications_men"), "Applications", "Awards"))

# One panel per discipline; Awards are overlaid on Applications for each gender.
application_rate_plot <- application_rates %>%
  ggplot(aes(x = gender, y = totals, fill = type)) +
  facet_wrap(~discipline) +
  geom_bar(stat = "identity", position = "identity", alpha = 0.5) +
  theme(
    strip.text.x = element_text(size = 10),
    legend.title = element_blank()) +
  scale_fill_lancet() +
  labs(x = "Gender", y = "Number of Applications/Awards", title = "Applications and Awards by Gender and Discipline")

application_rate_plot
This shows that while award rates are relatively balanced between genders, fewer women than men apply in every discipline except the medical sciences, and in some cases (Physics, Chemical Sciences, Technical Sciences) the gap is extreme: in Physics, for example, women submitted only 9 of the 76 applications. This suggests a “pipeline” issue: women are not being recruited into the sciences, are not completing higher education in STEM fields, are not pursuing academic or research careers after graduation, and/or are not being given the opportunity to lead projects as Principal Investigator.
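The participation gap can also be expressed as women's share of applications and awards in each discipline. The following is a rough sketch, again assuming `dslabs` and `dplyr` are available; `women_share` and its column names are illustrative names chosen here.

```r
library(dplyr)
library(dslabs)

data("research_funding_rates")

# Women's share (in percent) of all applications and of all awards, per discipline.
women_share <- research_funding_rates %>%
  transmute(
    discipline,
    women_share_applications = 100 * applications_women / applications_total,
    women_share_awards = 100 * awards_women / awards_total
  ) %>%
  arrange(women_share_applications)

women_share
```

A share below 50% in the applications column means fewer women than men applied in that discipline, so the sorted table lists the widest participation gaps first.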
Notes on plot design:
- facet_wrap() instead of facet_grid() to get the discipline titles to show.
- position="identity" because that shows the Applications that became Awards.
- stat="identity" because the underlying data is aggregated. Normally I would use stat="count" because I typically am dealing with actual award data instead of pre-aggregated data.
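The difference between the two `stat` settings can be illustrated with a small sketch. The data frames below are hypothetical toy data invented for this example (they are not taken from `research_funding_rates`), and the object names are assumptions.

```r
library(ggplot2)

# Hypothetical row-level data: one row per individual award.
awards_raw <- data.frame(
  discipline = c("Physics", "Physics", "Humanities", "Humanities", "Humanities")
)

# stat = "count" (the geom_bar() default) tallies the rows for each x value.
p_count <- ggplot(awards_raw, aes(x = discipline)) +
  geom_bar(stat = "count")

# Hypothetical pre-aggregated version of the same data: one row per discipline.
awards_agg <- data.frame(
  discipline = c("Humanities", "Physics"),
  n = c(3, 2)
)

# stat = "identity" uses the y values as given, so the totals must already be in the data.
p_identity <- ggplot(awards_agg, aes(x = discipline, y = n)) +
  geom_bar(stat = "identity")
```

Both plots should draw bars of the same height; the only difference is whether ggplot2 or the analyst does the counting.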