SFI - Gender Dashboard Analysis

Science Foundation Ireland

Science Foundation Ireland (SFI) funds oriented basic and applied research in science, engineering, and mathematics. SFI provides grants to researchers already based in Ireland and to researchers from around the world who wish to relocate there, with a particular focus on outstanding investigators and their collaborators in industry. Its published data can be found on its website (https://www.sfi.ie/). The dataset examined in this project is the SFI Gender Dashboard, a collection of grant application submission and success rates by gender since 2011.

The first step, after loading the data, is to create two dummy variables: Is_Male and Is_Awarded. These are binary variables that equal 1 if the applicant is male and 1 if an award was granted, respectively, and 0 otherwise.

url <- "https://www.sfi.ie/funding/sfi-policies-and-guidance/gender/dashboard/SFIGenderDashboard-TableauPublic-2022.csv"
Data <- read.csv(url)

Data$Is_Male <- ifelse(Data$Applicant.Gender == "Male", 1, 0)
Data$Is_Awarded <- ifelse(Data$Award.Status == "Awarded", 1, 0)
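
Because Is_Male is 0 for every value other than "Male", it can be worth checking which categories the gender column actually contains before labelling the 0 group. A minimal sketch on a small made-up data frame (the rows and the "Undisclosed" category are hypothetical, not taken from the SFI file):

```r
# Hypothetical toy data; the real column values may differ.
toy <- data.frame(
  Applicant.Gender = c("Male", "Female", "Female", "Undisclosed", NA),
  Award.Status     = c("Awarded", "Declined", "Awarded", "Declined", "Awarded"),
  stringsAsFactors = FALSE
)

# Count every category, including missing values.
gender_counts <- table(toy$Applicant.Gender, useNA = "ifany")
print(gender_counts)

# Same dummy coding as above: anything other than "Male" becomes 0,
# while a missing gender stays NA.
toy$Is_Male <- ifelse(toy$Applicant.Gender == "Male", 1, 0)
print(toy$Is_Male)
```

If the table shows categories beyond "Male" and "Female", the 0 group is better described as non-male than as female.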

Do we observe a group difference?

We’ll break this question down into 2 sub-questions

Do we see a difference in grant awarding rate?
Do we see a difference in amount granted?

We’ll then ask one follow-up question…

Is this difference constant across SFI Programmes?

Do we see a difference in grant awarding rates?

GAR <- round(mean(Data$Is_Awarded),3)*100

DataMale <- Data[Data$Is_Male == 1,]
DataNonMale <- Data[Data$Is_Male == 0,]

GARmale <- round(mean(DataMale$Is_Awarded),3)*100
GARnonmale<- round(mean(DataNonMale$Is_Awarded),3)*100

print(paste0(GAR,"% of all applicants were awarded grants"))

## [1] "30.3% of all applicants were awarded grants"

print(paste0(GARmale,"% of all male applicants were awarded grants"))

## [1] "30% of all male applicants were awarded grants"

print(paste0(GARnonmale,"% of all female applicants were awarded grants"))

## [1] "31.1% of all female applicants were awarded grants"
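
The same group rates can be computed without creating separate data frames, since the mean of a 0/1 indicator within a group is that group's award rate. A short sketch using tapply on made-up data (the values are hypothetical):

```r
# Hypothetical toy data standing in for the SFI file.
toy <- data.frame(
  Is_Male    = c(1, 1, 1, 1, 0, 0, 0, 0),
  Is_Awarded = c(1, 0, 0, 0, 1, 1, 0, 0)
)

# Mean of the award indicator within each gender group.
rates <- tapply(toy$Is_Awarded, toy$Is_Male, mean)
print(round(rates * 100, 1))
```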

Do we see a difference in amount requested?

AvgReq <- round(mean(na.omit(Data$Amount.Requested)))
AvgReqMale <- round(mean(na.omit(DataMale$Amount.Requested)))
AvgReqFemale <- round(mean(na.omit(DataNonMale$Amount.Requested)))

print(paste0("The average grant requested was €",AvgReq))

## [1] "The average grant requested was €835523"

print(paste0("The average grant requested, from male applicants, was €",AvgReqMale))

## [1] "The average grant requested, from male applicants, was €959756"

print(paste0("The average grant requested, from female applicants, was €",AvgReqFemale))

## [1] "The average grant requested, from female applicants, was €537463"

Do we see a difference in amount granted?

AvgGrant <- round(mean(na.omit(Data$Amount.Funded)))
AvgGrantMale <- round(mean(na.omit(DataMale$Amount.Funded)))
AvgGrantFemale <- round(mean(na.omit(DataNonMale$Amount.Funded)))

print(paste0("The average grant awarded was €",AvgGrant))

## [1] "The average grant awarded was €789734"

print(paste0("The average grant awarded, to male applicants, was €",AvgGrantMale))

## [1] "The average grant awarded, to male applicants, was €910421"

print(paste0("The average grant awarded, to female applicants, was €",AvgGrantFemale))

## [1] "The average grant awarded, to female applicants, was €510000"

DataAwarded <- Data[Data$Is_Awarded == 1,]
DataMaleAwarded <- DataMale[DataMale$Is_Awarded == 1,]
DataNonMaleAwarded <- DataNonMale[DataNonMale$Is_Awarded == 1,]

DataAwarded$FundPerc <- DataAwarded$Amount.Funded/DataAwarded$Amount.Requested
DataMaleAwarded$FundPerc <- DataMaleAwarded$Amount.Funded/DataMaleAwarded$Amount.Requested
DataNonMaleAwarded$FundPerc <- DataNonMaleAwarded$Amount.Funded/DataNonMaleAwarded$Amount.Requested

FundPerc <- round(mean(na.omit(DataAwarded$FundPerc))*100,2)
FundPercMale <- round(mean(na.omit(DataMaleAwarded$FundPerc))*100,2)
FundPercFemale <- round(mean(na.omit(DataNonMaleAwarded$FundPerc))*100,2)

paste0('Awarded Applicants had, on average, ',as.character(FundPerc),'% of their Funds Requested granted')

## [1] "Awarded Applicants had, on average, 96.59% of their Funds Requested granted"

paste0('Awarded Male Applicants had, on average, ',as.character(FundPercMale),'% of their Funds Requested granted')

## [1] "Awarded Male Applicants had, on average, 96.21% of their Funds Requested granted"

paste0('Awarded Female Applicants had, on average, ',as.character(FundPercFemale),'% of their Funds Requested granted')

## [1] "Awarded Female Applicants had, on average, 97.49% of their Funds Requested granted"

Is there statistical significance?

test <- table(Data$Is_Male, Data$Is_Awarded)
print(chisq.test(Data$Is_Male,Data$Is_Awarded))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  Data$Is_Male and Data$Is_Awarded
## X-squared = 0.48222, df = 1, p-value = 0.4874

We can tell from this Chi-squared test that there is no statistically significant difference between the Acceptance Rates of Men and Women applicants, as the p-value (0.4874) is greater than 0.05.
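
To make the test statistic concrete, the sketch below computes a Pearson chi-squared statistic by hand from a 2x2 table and compares it with chisq.test. The counts are hypothetical, and correct = FALSE is used so that the hand calculation matches; the output above applies Yates' continuity correction, which gives a slightly smaller statistic.

```r
# Hypothetical 2x2 counts: rows are Is_Male (0, 1), columns are Is_Awarded (0, 1).
counts <- matrix(c(65, 140, 35, 60), nrow = 2,
                 dimnames = list(Is_Male = c("0", "1"),
                                 Is_Awarded = c("0", "1")))

# Expected counts if award status were independent of gender.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic: sum of (observed - expected)^2 / expected.
stat_manual <- sum((counts - expected)^2 / expected)

res <- chisq.test(counts, correct = FALSE)
print(c(manual = stat_manual, chisq.test = unname(res$statistic)))
```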

print(t.test(DataMaleAwarded$Amount.Requested,DataNonMaleAwarded$Amount.Requested))

## 
##  Welch Two Sample t-test
## 
## data:  DataMaleAwarded$Amount.Requested and DataNonMaleAwarded$Amount.Requested
## t = 2.9562, df = 1246.2, p-value = 0.003173
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  144224.1 713324.0
## sample estimates:
## mean of x mean of y 
##  964109.1  535335.1

We can tell from this Two Sample t-test that, among awarded applicants, there is a statistically significant difference between the Funds Requested by Men and Women, as the p-value (0.003173) is less than 0.05.
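
For reference, the Welch statistic and degrees of freedom reported by t.test can be reproduced by hand. The sketch below does so on simulated amounts (hypothetical values, not SFI data):

```r
set.seed(1)  # hypothetical simulated amounts, not SFI data
x <- rnorm(40, mean = 900, sd = 300)
y <- rnorm(30, mean = 600, sd = 150)

# Per-group variance of the sample mean.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error.
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

res <- t.test(x, y)  # Welch is the default (var.equal = FALSE)
print(c(t = t_manual, df = df_manual))
```

Because the variances are not pooled, the degrees of freedom are generally not an integer, which is why values such as 1246.2 appear in the output.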

print(t.test(DataMaleAwarded$Amount.Funded,DataNonMaleAwarded$Amount.Funded))

## 
##  Welch Two Sample t-test
## 
## data:  DataMaleAwarded$Amount.Funded and DataNonMaleAwarded$Amount.Funded
## t = 3.0439, df = 1266.2, p-value = 0.002383
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  142531.9 659347.1
## sample estimates:
## mean of x mean of y 
##  910939.5  510000.0

We can tell from this Two Sample t-test that there is a statistically significant difference between the Funds Awarded to Men and Women applicants, as the p-value (0.002383) is less than 0.05.

We should note, however, that the Acceptance Rate did not differ significantly and that awarded applicants receive, on average, about 96-97% of what they request. The gap in Funds Awarded is therefore more than likely driven by how much was requested by the applicants.

t.test(DataMaleAwarded$FundPerc,DataNonMaleAwarded$FundPerc)

## 
##  Welch Two Sample t-test
## 
## data:  DataMaleAwarded$FundPerc and DataNonMaleAwarded$FundPerc
## t = -2.2631, df = 947.73, p-value = 0.02385
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.023889740 -0.001699768
## sample estimates:
## mean of x mean of y 
## 0.9620677 0.9748625

We can tell from this Two Sample t-test that there is a statistically significant difference between the % of Requested Funds awarded to Men and Women applicants, as the p-value (0.02385) is less than 0.05.

Notably, this difference runs in the opposite direction: on average, awarded Women applicants receive a slightly greater % of their Requested Funds (97.49% versus 96.21%).

Testing Statistical Significance across Programmes

3. Is this difference constant across SFI Programmes?

To go further into detail we can run a statistical significance test on each programme, and assess whether this significance remains constant throughout any (or all) of them.

SFI_Awards <- unique(Data$Programme.Name)

Awards_Datasets <- list()

# Creating a loop to separate all the awards into their own sets.
for (Award in SFI_Awards) {
  subset_data <- subset(Data, Programme.Name == Award)
  assign(paste0("Data_", gsub(" ", "_", Award)), subset_data)
  Awards_Datasets[[Award]] <- subset_data
}

# Creating a similar loop for the Male and NonMale datasets
MaleAwards_Datasets <- list()
NonMaleAwards_Datasets <- list()

for (Award in SFI_Awards) {
  subset_data <- subset(DataMale, Programme.Name == Award)
  assign(paste0("DataMale_", gsub(" ", "_", Award)), subset_data)
  MaleAwards_Datasets[[Award]] <- subset_data
}

for (Award in SFI_Awards) {
  subset_data <- subset(DataNonMale, Programme.Name == Award)
  # Use a distinct prefix so the DataMale_* objects are not overwritten.
  assign(paste0("DataNonMale_", gsub(" ", "_", Award)), subset_data)
  NonMaleAwards_Datasets[[Award]] <- subset_data
}
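
As an alternative to the loops above, split() builds the same kind of named list in one call. The sketch below uses a made-up data frame (programme names and amounts are hypothetical):

```r
# Hypothetical toy data standing in for the SFI file.
toy <- data.frame(
  Programme.Name   = c("Programme B", "Programme A", "Programme B", "Programme A"),
  Amount.Requested = c(100, 200, 300, 400)
)

# One data frame per programme, stored in a named list.
by_programme <- split(toy, toy$Programme.Name)

print(names(by_programme))
print(nrow(by_programme[["Programme A"]]))
```

One design point to keep in mind: split() orders the list by sorted programme name, whereas unique() keeps the order of first appearance, so lists built this way are safer to index by name than by position.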

The next step, similar to how we calculated the Average Requested statistic for the group as a whole, is to calculate the Average Requested statistic for each of the Awards.

To assist with this we’ll create a new summary dataset, with one row corresponding to each Award.

# One row per programme; sized from the data rather than hard-coded.
Awards_Data <- data.frame(matrix(nrow = length(SFI_Awards), ncol = 4))
colnames(Awards_Data) <- c("Award.Name","Avg.Amount.Requested",
                           "Male.Avg.Amount.Requested",
                           "NonMale.Avg.Amount.Requested")
Awards_Data$Award.Name <- SFI_Awards

# Average amount requested per programme: overall, male, and non-male.
for (i in seq_along(SFI_Awards)){
  Awards_Data[i,2] <- round(mean(Awards_Datasets[[i]]$Amount.Requested, na.rm = TRUE))
  Awards_Data[i,3] <- round(mean(MaleAwards_Datasets[[i]]$Amount.Requested, na.rm = TRUE))
  Awards_Data[i,4] <- round(mean(NonMaleAwards_Datasets[[i]]$Amount.Requested, na.rm = TRUE))
}
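
The same per-programme, per-gender averages can also be obtained in a single call, without knowing the number of programmes in advance. A sketch with tapply on made-up data (programme names and amounts are hypothetical):

```r
# Hypothetical toy data; one missing amount to show the na.rm handling.
toy <- data.frame(
  Programme.Name   = c("A", "A", "A", "B", "B", "B"),
  Is_Male          = c(1, 0, 1, 1, 0, 0),
  Amount.Requested = c(100, 200, 300, 400, NA, 600)
)

# Rows are programmes, columns are Is_Male (0, 1); na.rm is passed on to mean().
means <- tapply(toy$Amount.Requested,
                list(toy$Programme.Name, toy$Is_Male),
                mean, na.rm = TRUE)
print(means)
```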

We’ll then run a t-test for each programme, comparing the amounts requested within that programme against the amounts requested across all applications, and add the p-values to the Summary Table for further examination.

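# For each programme, compare its requested amounts with the requested
# amounts across all applications (Welch two-sample t-test).
Results <- by(
  Data$Amount.Requested,
  Data$Programme.Name,
  function(x) t.test(x, Data$Amount.Requested)
)

# Collect the p-value of each test, in the same order as Awards_Data.
p_values <- list()
for (i in seq_len(nrow(Awards_Data))) {
  p_values <- c(p_values, Results[[Awards_Data[i,1]]][3])
}

Awards_Data$'p-value' <- p_values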

print(Awards_Data[c(1,5)])

##                                                     Award.Name      p-value
## 1                                         SFI Research Centres 1.059259e-21
## 2  SFI Investigator Programme/Principal Investigator Programme 8.426528e-07
## 3                               SFI Investigator Project Award 8.518468e-48
## 4              SFI President of Ireland Young Researcher Award   0.09534437
## 5                                 SFI Career Development Award  4.14731e-17
## 6                     SFI Starting Investigator Research Grant 1.283107e-24
## 7                  SFI Technology Innovation Development Award 1.938075e-66
## 8                                      SFI Industry Fellowship  1.13504e-71
## 9                                             SFI Spokes Fixed  0.004912233
## 10                                  SFI Research Professorship 6.760822e-23
## 11            SFI President of Ireland Future Research Leaders  1.03431e-14
## 12                       SFI Science Policy Research Programme 1.880305e-05
## 13                           SFI Centres for Research Training 6.719711e-11
## 14                                SFI Frontiers for the Future  2.44852e-15
## 15                                  SFI Future Innovator Prize  1.03794e-34
## 16                     SFI Public Service Fellowship Programme 1.447658e-73
## 17  SFI COVID-19 Rapid Response Funding Programme 2020-Phase 1 3.516443e-61
## 18  SFI COVID-19 Rapid Response Funding Programme 2020-Phase 2 3.583113e-30
## 19                       SFI Industry RDI Fellowship Programme 4.453802e-71
## 20                                   SFI-IRC Pathway Programme 6.887293e-24

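We can see from the summary table that the only programme with a p-value greater than 0.05 is the SFI President of Ireland Young Researcher Award. Note, however, what the test compares: each programme's requested amounts against the requested amounts across all applications, not male against female applicants within a programme. The result therefore tells us that this is the only programme whose average request does not differ significantly from the overall average. It does not, by itself, show whether the gender difference in Funds Requested is constant across programmes.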
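
To test the gender difference within each programme directly, one option is to split the data by programme and run a Welch t-test of male against non-male requests in each subset. The sketch below uses simulated data and a hypothetical helper, gender_p_value; it is an illustration under those assumptions, not the analysis that produced the table above.

```r
set.seed(42)  # hypothetical simulated data, not the SFI file
toy <- data.frame(
  Programme.Name   = rep(c("Programme A", "Programme B"), each = 40),
  Is_Male          = rep(c(1, 0), times = 40),
  Amount.Requested = c(rnorm(40, mean = 500, sd = 50),
                       rnorm(40, mean = 800, sd = 50))
)

# Give male applicants in Programme B a clearly larger request.
is_b_male <- toy$Programme.Name == "Programme B" & toy$Is_Male == 1
toy$Amount.Requested[is_b_male] <- toy$Amount.Requested[is_b_male] + 200

# Hypothetical helper: p-value of a male vs non-male Welch t-test in one subset.
gender_p_value <- function(d) {
  male    <- na.omit(d$Amount.Requested[d$Is_Male == 1])
  nonmale <- na.omit(d$Amount.Requested[d$Is_Male == 0])
  # A Welch t-test needs at least two observations in each group.
  if (length(male) < 2 || length(nonmale) < 2) return(NA_real_)
  t.test(male, nonmale)$p.value
}

p_by_programme <- sapply(split(toy, toy$Programme.Name), gender_p_value)
print(p_by_programme)
```

Returning NA for programmes with too few applicants in either group keeps the loop from failing on small programmes, at the cost of leaving those rows untested.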

Also of note is the SFI Spokes Fixed award, which features a p-value of roughly 0.005. This is statistically significant at the 0.05 level, and also at stricter levels of 0.025 or 0.01, but it is the largest of the significant p-values by roughly two orders of magnitude. Only at a much stricter significance level, such as 0.001, would the difference no longer be statistically significant.
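
Because the summary table reports 20 p-values, one option (not part of the original analysis) is to adjust them for multiple comparisons before choosing a threshold. The sketch below applies a Bonferroni adjustment to the two largest p-values copied from the table, telling p.adjust that they belong to a family of 20 tests:

```r
# Two p-values copied from the summary table above.
p_raw <- c("SFI President of Ireland Young Researcher Award" = 0.09534437,
           "SFI Spokes Fixed" = 0.004912233)

# n = 20 is the number of programme-level tests in the table.
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = 20)
print(p_bonf)
```

Bonferroni multiplies each p-value by the number of tests and caps the result at 1, so the SFI Spokes Fixed value becomes roughly 0.098. The remaining 18 p-values in the table are small enough to stay below 0.05 after the same adjustment.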