DS Labs HW

Libraries and Data

library(plyr)       # attach plyr before the tidyverse so dplyr's verbs (mutate, summarise, ...) are not masked
library(tidyverse)
library(dslabs)
library(highcharter)
data(research_funding_rates)
research_funding_rates
           discipline applications_total applications_men applications_women
1   Chemical sciences                122               83                 39
2   Physical sciences                174              135                 39
3             Physics                 76               67                  9
4          Humanities                396              230                166
5  Technical sciences                251              189                 62
6   Interdisciplinary                183              105                 78
7 Earth/life sciences                282              156                126
8     Social sciences                834              425                409
9    Medical sciences                505              245                260
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
7           56         38           18                19.9              24.4
8          112         65           47                13.4              15.3
9           75         46           29                14.9              18.8
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8
7                14.3
8                11.5
9                11.2
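The table does not say how "success rate" is defined, but it can be checked directly from the counts. A minimal sketch, assuming the rate is awards divided by applications times 100, rounded to one decimal. If so, every recomputed value should sit within 0.05 of the reported one:

```r
library(dslabs)
data(research_funding_rates)

d <- research_funding_rates

# Assumed definition: success rate = awards / applications * 100.
# Largest absolute gap between recomputed and reported rates, per group;
# gaps under 0.05 are consistent with rounding to one decimal place.
max_gap <- c(
  total = max(abs(d$awards_total / d$applications_total * 100 - d$success_rates_total)),
  men   = max(abs(d$awards_men   / d$applications_men   * 100 - d$success_rates_men)),
  women = max(abs(d$awards_women / d$applications_women * 100 - d$success_rates_women))
)
max_gap
```

If all three gaps fall below 0.05, the reported success rates are simply the share of each group's applications that were funded.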

Prep Data

# Express each gender's applications and awards as a percentage of the discipline total
research_funding_rates <- research_funding_rates |>
  mutate(
    applications_women_rate = applications_women / applications_total * 100,
    applications_men_rate   = applications_men   / applications_total * 100,
    awards_women_rate       = awards_women       / awards_total       * 100,
    awards_men_rate         = awards_men         / awards_total       * 100
  )
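In this table each total equals the men's count plus the women's count, so the two gender shares for applications (and likewise for awards) should add up to 100% in every discipline. A quick sanity check, recomputing the shares on a copy of the data in a single `dplyr::mutate()` call:

```r
library(dplyr)
library(dslabs)
data(research_funding_rates)

shares <- research_funding_rates |>
  mutate(
    applications_women_rate = applications_women / applications_total * 100,
    applications_men_rate   = applications_men   / applications_total * 100,
    awards_women_rate       = awards_women       / awards_total       * 100,
    awards_men_rate         = awards_men         / awards_total       * 100
  )

# Each pair of shares should sum to 100, up to floating-point error
app_sum   <- shares$applications_women_rate + shares$applications_men_rate
award_sum <- shares$awards_women_rate + shares$awards_men_rate
all(abs(app_sum - 100) < 1e-9) && all(abs(award_sum - 100) < 1e-9)
```

A `FALSE` here would mean some applicants or awards are not assigned to either gender, and the percentages would need a different denominator.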

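# Reshape to long format: one row per discipline and measurement
research_long <- research_funding_rates |>
  pivot_longer(cols = -discipline, names_to = "measurements", values_to = "values") |>
  # Keep the gender-specific percentages; drop raw counts and overall totals
  filter(grepl("rate", measurements) & !grepl("total", measurements))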
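Selecting columns by position (for example `2:14`) silently picks the wrong columns if any are added or reordered, whereas `cols = -discipline` always reshapes everything except the discipline name. A small check of the reshaped result, assuming nine disciplines and six gender-specific rate columns (two success rates plus the four shares computed above):

```r
library(dplyr)
library(tidyr)
library(dslabs)
data(research_funding_rates)

research_long_check <- research_funding_rates |>
  mutate(
    applications_women_rate = applications_women / applications_total * 100,
    applications_men_rate   = applications_men   / applications_total * 100,
    awards_women_rate       = awards_women       / awards_total       * 100,
    awards_men_rate         = awards_men         / awards_total       * 100
  ) |>
  # Reshape every column except discipline, regardless of position
  pivot_longer(cols = -discipline, names_to = "measurements", values_to = "values") |>
  # Keep gender-specific rates; drop counts and success_rates_total
  filter(grepl("rate", measurements), !grepl("total", measurements))

# Expect 9 disciplines x 6 measurements = 54 rows
nrow(research_long_check)
sort(unique(research_long_check$measurements))
```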

# Replace the column-name codes with readable labels
research_long$measurements <- 
  revalue(research_long$measurements, c("success_rates_women" = "Success Percent Women",
                                        "success_rates_men" = "Success Percent Men",
                                        "awards_women_rate" = "Award Percent Women",
                                        "awards_men_rate" = "Award Percent Men",
                                        "applications_women_rate" = "Application Percent Women",
                                        "applications_men_rate" = "Application Percent Men"))
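`revalue()` is the only plyr function this analysis uses. If the plyr dependency (and its masking conflicts with dplyr) is unwanted, `dplyr::recode()` can do the same relabelling with `old = new` pairs. A sketch on a few of the measurement names; dplyr's documentation lists `recode()` as superseded by `case_match()`, but it is still available:

```r
library(dplyr)

# Readable labels, keyed by the original measurement names
labels <- c(
  success_rates_women     = "Success Percent Women",
  success_rates_men       = "Success Percent Men",
  awards_women_rate       = "Award Percent Women",
  awards_men_rate         = "Award Percent Men",
  applications_women_rate = "Application Percent Women",
  applications_men_rate   = "Application Percent Men"
)

measurements <- c("success_rates_men", "awards_women_rate", "applications_men_rate")

# recode() takes old = new pairs; !!! splices the named vector into those pairs
pretty <- recode(measurements, !!!labels)
pretty
```

Applied to the full table, this would be something like `research_long |> dplyr::mutate(measurements = recode(measurements, !!!labels))`.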

Final Visualization

hchart(research_long, "heatmap", hcaes(x = discipline, y = measurements, value = values), name="Percent") |>
  hc_xAxis(title = list(text = "")) |>
  hc_yAxis(title = list(text = "")) |>
  hc_title(text = "Women vs Men in Research Funding") |>
  hc_caption(text = "van der Lee R, Ellemers N. Gender contributes to personal research funding success in The Netherlands. Proc Natl Acad Sci U S A. 2015 Oct 6") |>
  hc_colorAxis(stops = color_stops(n = 5, colors = c("#000004", "#57106e", "#bc3754", "#f98e09", "#fcffa4")))

The dataset I chose gives the total number of applications, awards, and success rates for research funding across nine scientific disciplines, both overall and separately for men and women. The counts indicate that the success rate is the percentage of applications that received an award: for example, men in the chemical sciences submitted 83 applications and received 22 awards, and 22/83 ≈ 26.5%, which matches the reported rate. To make the genders easier to compare, I calculated what percentage of applicants and of award recipients were men or women. I then made a heatmap with highcharter, with scientific discipline on the x-axis and the six gender-specific percentages on the y-axis.

The heatmap shows that some disciplines have far more even gender distributions than others: social sciences applicants are nearly half women (409 of 834), while physics applicants are only about 12% women (9 of 76). Success rates range from about 11% to 27% and are broadly comparable between men and women, although the gap varies by discipline, from under 1 percentage point in the chemical sciences to about 10 points in the Earth/life sciences (favoring men) and in interdisciplinary research (favoring women). I would like to improve this plot by rounding the numbers in the tooltip and removing the “x, y:” label from it.
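One way to make the tooltip improvement, sketched under assumptions: Highcharts format strings accept a number format such as `{point.value:.1f}` (one decimal place), and overriding `pointFormat` and `headerFormat` through `hc_tooltip()` replaces the default text, which includes the "x, y:" part. The sketch also assumes that `hchart()` carries the data-frame columns `discipline` and `measurements` through to each point so they can be referenced as `{point.discipline}` and `{point.measurements}`; if they do not appear, the tooltip would need a JavaScript `formatter` instead. The data preparation is repeated so the block runs on its own:

```r
library(dplyr)
library(tidyr)
library(dslabs)
library(highcharter)
data(research_funding_rates)

# Rebuild the long table of gender-specific percentages
research_long <- research_funding_rates |>
  mutate(
    applications_women_rate = applications_women / applications_total * 100,
    applications_men_rate   = applications_men   / applications_total * 100,
    awards_women_rate       = awards_women       / awards_total       * 100,
    awards_men_rate         = awards_men         / awards_total       * 100
  ) |>
  pivot_longer(cols = -discipline, names_to = "measurements", values_to = "values") |>
  filter(grepl("rate", measurements), !grepl("total", measurements))

p <- hchart(research_long, "heatmap",
            hcaes(x = discipline, y = measurements, value = values), name = "Percent") |>
  hc_tooltip(
    headerFormat = "",  # drop the series-name header line
    # {point.value:.1f} rounds to one decimal place (assumes discipline and
    # measurements are available as point properties)
    pointFormat = "{point.discipline}<br>{point.measurements}: <b>{point.value:.1f}%</b>"
  )
p
```

The same `hc_tooltip()` call can be appended to the existing chain after `hc_colorAxis()`, keeping the title, caption, and color stops unchanged.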