Dslabs Assignment

Author

Martia Eyi

Load required packages

To begin, I will load the necessary packages and look at what the dslabs package provides.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dslabs)
# List the datasets bundled with dslabs
data(package = "dslabs")
# List the scripts dslabs used to create its datasets
list.files(system.file("script", package = "dslabs"))

 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-stars.R"                        
[25] "make-temp_carbon.R"                  
[26] "make-tissue-gene-expression.R"       
[27] "make-trump_tweets.R"                 
[28] "make-weekly_us_contagious_diseases.R"
[29] "save-gapminder-example-csv.R"
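The listing above shows the build scripts rather than the dataset names themselves. A minimal sketch for getting the names as a character vector, using only base R's `data()`: the object it returns has a `results` matrix whose `Item` column holds the dataset names. The variable name `dslabs_datasets` is illustrative.

```r
library(dslabs)

# data(package = ...) returns a "packageIQR" object; its $results matrix
# has one row per dataset, with the dataset name in the "Item" column
dslabs_datasets <- data(package = "dslabs")$results[, "Item"]

# Print the names and confirm the dataset used below is among them
print(dslabs_datasets)
"research_funding_rates" %in% dslabs_datasets
```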

Load and preview the dataset

I chose the ‘research_funding_rates’ dataset from the ‘dslabs’ package. In this section, I will load it, along with the ggthemes and ggrepel plotting packages, and preview its structure to see which variables are available.

library(ggthemes)  # provides theme_economist()
library(ggrepel)   # provides geom_text_repel() for non-overlapping text labels
data("research_funding_rates")

Review the dataset

head(research_funding_rates)

          discipline applications_total applications_men applications_women
1  Chemical sciences                122               83                 39
2  Physical sciences                174              135                 39
3            Physics                 76               67                  9
4         Humanities                396              230                166
5 Technical sciences                251              189                 62
6  Interdisciplinary                183              105                 78
  awards_total awards_men awards_women success_rates_total success_rates_men
1           32         22           10                26.2              26.5
2           35         26            9                20.1              19.3
3           20         18            2                26.3              26.9
4           65         33           32                16.4              14.3
5           43         30           13                17.1              15.9
6           29         12           17                15.8              11.4
  success_rates_women
1                25.6
2                23.1
3                22.2
4                19.3
5                21.0
6                21.8
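The column names suggest that `success_rates_total` is the percentage of applications that were funded, that is, 100 × `awards_total` / `applications_total`. A minimal check of that reading (an interpretation of the columns, not a quote from the dataset documentation) recomputes the rate and compares it with the stored value. For the six rows previewed above, the recomputed values agree with the stored ones, for example 100 × 32 / 122 ≈ 26.2 for Chemical sciences. The names `check`, `recomputed`, and `difference` are illustrative.

```r
library(dplyr)
library(dslabs)
data("research_funding_rates")

# Recompute the overall success rate as the percentage of applications funded,
# rounded to one decimal place like the stored column
check <- research_funding_rates %>%
  mutate(
    recomputed = round(100 * awards_total / applications_total, 1),
    difference = success_rates_total - recomputed
  ) %>%
  select(discipline, success_rates_total, recomputed, difference)

check
```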

Prepare the dataset

Here, I will select the relevant variables, focusing on ‘discipline’, ‘success_rates_total’, and ‘applications_total’. Then, I will remove any rows that contain missing values and convert the ‘discipline’ column to a factor for better visual grouping.

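# Keep the three variables used in the plot, drop incomplete rows,
# and store discipline as a factor for grouping by color
df <- research_funding_rates %>%
  select(discipline, success_rates_total, applications_total) %>%
  filter(
    !is.na(success_rates_total),
    !is.na(applications_total),
    !is.na(discipline)
  ) %>%
  mutate(discipline = as.factor(discipline))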
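`as.factor()` orders the levels alphabetically, so the legend lists disciplines alphabetically. A minimal alternative sketch swaps in `tidyr::drop_na()` for the three `!is.na()` conditions and uses `forcats::fct_reorder()` to order the levels by success rate, highest first, so the legend reads from most to least successful discipline. The name `df_ordered` is illustrative, and this is a variation on the preparation step above, not the version used for the plot.

```r
library(tidyverse)
library(dslabs)
data("research_funding_rates")

df_ordered <- research_funding_rates %>%
  select(discipline, success_rates_total, applications_total) %>%
  # Drop rows with a missing value in any of the three selected columns
  drop_na() %>%
  # Order levels by success rate, highest first (fct_reorder accepts character input)
  mutate(discipline = fct_reorder(discipline, success_rates_total, .desc = TRUE))

levels(df_ordered$discipline)
```

Passing `df_ordered` to the same `ggplot()` call would produce a legend in descending order of success rate.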

Create the scatterplot

In this final step, I will create a scatterplot showing the relationship between the total number of applications and the overall success rate, with each discipline shown in a different color.

# One point per discipline: applications on x, success rate on y, discipline as color
ggplot(df, aes(x = applications_total, y = success_rates_total, color = discipline)) +
  geom_point(size = 4, alpha = 0.8) +
  labs(
    title = "Applications vs Success Rate by Discipline",
    x = "Total Applications Submitted",
    y = "Total Success Rate (%)",
    color = "Discipline"
  ) +
  theme_economist() +                   # The Economist-style theme from ggthemes
  scale_color_brewer(palette = "Set1")  # ColorBrewer qualitative palette
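ggrepel is loaded above but not used in the plot. With one color per discipline, matching points to legend entries takes some effort, so one option is to label each point directly with `ggrepel::geom_text_repel()` and hide the legend. The sketch below is a variation on the plot above, not the figure described in this assignment; `p_labeled` is an illustrative name.

```r
library(tidyverse)
library(dslabs)
library(ggthemes)
library(ggrepel)
data("research_funding_rates")

df <- research_funding_rates %>%
  select(discipline, success_rates_total, applications_total)

p_labeled <- ggplot(df, aes(x = applications_total, y = success_rates_total, color = discipline)) +
  geom_point(size = 4, alpha = 0.8) +
  # Write each discipline name next to its point, nudged apart to avoid overlaps
  geom_text_repel(aes(label = discipline), size = 3.5, show.legend = FALSE) +
  labs(
    title = "Applications vs Success Rate by Discipline",
    x = "Total Applications Submitted",
    y = "Total Success Rate (%)"
  ) +
  theme_economist() +
  scale_color_brewer(palette = "Set1") +
  # The labels now carry the discipline names, so the color legend is redundant
  theme(legend.position = "none")

p_labeled
```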

Description

For this assignment, I chose to explore the research_funding_rates dataset from the dslabs package. This dataset offers detailed information on the number of research grant applications and awards across various academic disciplines, along with overall success rates and gender-specific statistics. I was drawn to this dataset because it addresses a relevant and complex topic: the accessibility and competitiveness of research funding in academia. I was particularly interested in understanding how the volume of applications in each discipline relates to the rate at which those applications succeed. After loading the necessary packages, I previewed the dataset and selected three variables for the graph: the total number of applications, the overall success rate, and the discipline.

Once the data was cleaned and ready, I created a scatterplot with ggplot2 to visualize the relationship between application volume and funding success rate, using color to show discipline as a third variable. The x-axis represents the total number of applications per discipline, and the y-axis shows the corresponding overall success rate. To improve the clarity and polish of the visualization, I used theme_economist() from the ggthemes package and the ColorBrewer ‘Set1’ palette through scale_color_brewer(). The plot suggests a trend: disciplines with fewer applications, such as Physics (76 applications, 26.3%) and Chemical sciences (122 applications, 26.2%), tend to have higher success rates, while heavily applied-to fields such as Humanities (396 applications, 16.4%) and Technical sciences (251 applications, 17.1%) have lower ones. The pattern is not uniform, however: Interdisciplinary received only 183 applications but has a success rate of 15.8%. With a single point per discipline, this is a descriptive pattern rather than proof of cause, but it is consistent with the idea that success in securing funding depends not only on the merit of an application but also on how competitive the field is.
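The trend described above can also be checked numerically. A minimal sketch that sorts disciplines by application count and computes the Pearson and Spearman correlations between applications and success rate; with only one observation per discipline the sample is small, so these numbers summarize the pattern rather than test it. The names `ranked`, `pearson`, and `spearman` are illustrative.

```r
library(tidyverse)
library(dslabs)
data("research_funding_rates")

# Disciplines from fewest to most applications, with their success rates
ranked <- research_funding_rates %>%
  select(discipline, applications_total, success_rates_total) %>%
  arrange(applications_total)
ranked

# Linear (Pearson) and rank-based (Spearman) correlation between the two variables
pearson <- cor(ranked$applications_total, ranked$success_rates_total, method = "pearson")
spearman <- cor(ranked$applications_total, ranked$success_rates_total, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

Spearman's coefficient depends only on ranks, so it is less influenced by any single discipline with an unusually large number of applications than Pearson's is.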

Ultimately, the graph highlights how funding success varies across academic disciplines (from 15.8% to 26.3% among the rows previewed above), offering insight into possible structural imbalances and informing conversations around research equity and funding policies.