What questions do we want to ask of this data with regard to equity? At the highest level, how do admissions outcomes for URM students compare with those for non-URM students? My first stab:

  • Is there disparate attrition at any step of the admissions process (URM vs non-URM)?
  • How does the BGS applicant pool comapre to the demographic composition of the US?

For simplicity, I’m starting with BGS-wide numbers for individual years only. We can get into individual graduate groups and the panel data later.


Figures

Alluvial plots

Applicants 2018-2020: All BGS

URM applicants 2018-2020: All BGS

EA Admissions

EA applicants by year

EA applicants: table 1

Two-sample bar graphs

BMB applicants (2018-2020)

CAMB applicants (2018-2020)

GCB applicants (2018-2020)

GGEB-EPID applicants (2018-2020)

GGEB-BSTA applicants (2018-2020)

IGG applicants (2018-2020)

PGG applicants (2018-2020)

NGG applicants (2018-2020)

Disparate attrition

Year: 2020

 

We’ll use two-sample tests of proportions and only look at 2020 data.

For 2020, BGS actually interviewed a significantly higher proportion of URM applicants than non-URM applicants (25.5% vs 18.5%; 95% CI for the difference in proportions, URM minus non-URM, (0.013, 0.13); p = 0.01), and admitted and matriculated people in similar proportions (p = 0.4 and p = 1 respectively).

Note that this difference is not explained by GPA: the mean GPA for URM applicants is lower than for non-URM applicants.

But has it always been that way? Again, saving more sophisticated time series analyses for later, I’ll just look at the year 2009.

 

Interviewed/applied

##         urm n_applied n_interviewed failures
## 435 non_urm      1214           224      990
## 436     urm       278            71      207
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  as.matrix(test[, c(3:4)])
## X-squared = 6.7246, df = 1, p-value = 0.009509
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.12880491 -0.01295845
## sample estimates:
##    prop 1    prop 2 
## 0.1845140 0.2553957
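
For reference, here is a minimal sketch of how the test above can be reproduced directly from the counts in the table. The object names (`counts_2020`, `res_2020`, `ci_urm_minus_non`) are my own for illustration and are not the ones used in the original analysis, which ran `prop.test()` on columns of a data frame called `test`.

```r
# 2020 counts copied from the table above
counts_2020 <- matrix(
  c(224, 990,    # non-URM: interviewed, not interviewed
     71, 207),   # URM:     interviewed, not interviewed
  nrow = 2, byrow = TRUE,
  dimnames = list(c("non_urm", "urm"), c("interviewed", "not_interviewed"))
)

# Given a two-column matrix, prop.test() reads the columns as
# successes and failures for each group (row)
res_2020 <- prop.test(counts_2020)

res_2020$estimate   # interview rates: non-URM first, then URM
res_2020$p.value    # about 0.0095
res_2020$conf.int   # interval for non-URM minus URM, hence negative

# Flip the sign to express the interval as URM minus non-URM
ci_urm_minus_non <- -rev(as.numeric(res_2020$conf.int))
ci_urm_minus_non
```

The sign flip matters when writing up the result: R reports the interval for the first row minus the second, so a negative interval here means the URM interview rate is the higher one.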

 

 

Admitted/interviewed

##         urm n_interviewed n_admitted failures
## 435 non_urm           224        184       40
## 436     urm            71         62        9
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  as.matrix(test[, c(3:4)])
## X-squared = 0.70424, df = 1, p-value = 0.4014
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.15330536  0.04968363
## sample estimates:
##    prop 1    prop 2 
## 0.8214286 0.8732394

 

 

Matriculated/admitted

##         urm n_admitted n_matriculated failures
## 435 non_urm        184             81      103
## 436     urm         62             27       35
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  as.matrix(test[, c(3:4)])
## X-squared = 1.7545e-30, df = 1, p-value = 1
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.1427467  0.1522137
## sample estimates:
##    prop 1    prop 2 
## 0.4402174 0.4354839

 

 

Year: 2009

 

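In 2009, BGS interviewed and matriculated URM applicants in similar proportions to non-URM applicants. But of those they interviewed, they admitted a significantly lower proportion of URM folks (60.6% vs 95.5%; 95% CI for the difference in proportions, non-URM minus URM, (0.16, 0.54); p < 0.001). Only 33 URM applicants were interviewed that year, which is why the interval is so wide.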

We should definitely take a look at this step in the admissions process across all years.

 

Interviewed/applied

##         urm n_applied n_interviewed failures
## 391 non_urm       566           201      365
## 392     urm        77            33       44
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  as.matrix(test[, c(3:4)])
## X-squared = 1.2782, df = 1, p-value = 0.2582
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.19817897  0.05128346
## sample estimates:
##    prop 1    prop 2 
## 0.3551237 0.4285714

 

 

Admitted/interviewed

##         urm n_interviewed n_admitted failures
## 391 non_urm           201        192        9
## 392     urm            33         20       13
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  as.matrix(test[, c(3:4)])
## X-squared = 36.576, df = 1, p-value = 1.468e-09
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1623795 0.5359471
## sample estimates:
##    prop 1    prop 2 
## 0.9552239 0.6060606
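
As a first step toward looking at the interview-to-admission stage across all years, here is a rough sketch that runs the same test once per year and collects the results in one table. It only contains the 2009 and 2020 counts shown in this report; the data frame `admit_counts` and the helper `test_year()` are hypothetical names for illustration, and other years would be added as rows in the same layout.

```r
# Interview-to-admission counts copied from the 2009 and 2020 tables.
# Within each year the non-URM row must come before the URM row.
admit_counts <- data.frame(
  year          = c(2009, 2009, 2020, 2020),
  urm           = c("non_urm", "urm", "non_urm", "urm"),
  n_interviewed = c(201, 33, 224, 71),
  n_admitted    = c(192, 20, 184, 62)
)

# Run the two-sample test for one year and return a one-row summary
test_year <- function(d) {
  res <- prop.test(x = d$n_admitted, n = d$n_interviewed)
  data.frame(
    year         = d$year[1],
    prop_non_urm = unname(res$estimate[1]),
    prop_urm     = unname(res$estimate[2]),
    p_value      = res$p.value
  )
}

# Split by year, test each piece, and stack the summaries
by_year <- do.call(
  rbind,
  lapply(split(admit_counts, admit_counts$year), test_year)
)
by_year
```

For 2009, R is likely to warn that the chi-squared approximation may be incorrect, because the expected number of non-admitted URM interviewees is below 5. With many years in the table, the p-values would also need some adjustment for multiple comparisons before reading too much into any single year.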

 

 

Matriculated/admitted

##         urm n_admitted n_matriculated failures
## 391 non_urm        192             78      114
## 392     urm         20              7       13
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  as.matrix(test[, c(3:4)])
## X-squared = 0.061882, df = 1, p-value = 0.8035
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.1916327  0.3041327
## sample estimates:
##  prop 1  prop 2 
## 0.40625 0.35000

 

 

US population comparison

 

It’s also a problem if BGS isn’t properly recruiting applicants, e.g., if the applicant pool looks nothing like the overall US population.

For this portion, I used US Census data for 2010-2019 (2020 data doesn’t exist yet, so I just used 2019). The Census race and ethnicity categories are the ones that the OMB and Penn use to determine URM status. I calculated the proportion of the US population that is considered ‘URM’ (people who identify as Black or African American, Hispanic/Latinx, American Indian or Alaska Native/Indigenous, or Native Hawaiian or other Pacific Islander) and compared it to the proportion of applicants categorized as such, using a one-sample test of proportions.

The proportion of the overall BGS applicant pool that is URM is significantly lower than the US population proportion (18.6% vs 33.4%; 95% CI for the applicant proportion (0.17, 0.21); p < 0.001). The denominator is the 1,492 applicants classified as URM or non-URM; the 1,387 international applicants are excluded.

 

2020 applicant pool

##   year      gg    metric  all urm non_urm international natl_prop_urm
## 1 2020 all_bgs n_applied 2879 278    1214          1387     0.3340315
##   app_prop_urm trials
## 1    0.1863271   1492
## 
##  1-sample proportions test with continuity correction
## 
## data:  test$urm out of test$trials, null probability test$natl_prop_urm
## X-squared = 145.66, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.3340315
## 95 percent confidence interval:
##  0.1670637 0.2072288
## sample estimates:
##         p 
## 0.1863271
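
The one-sample test above can be reproduced from the counts in the table with a few lines. This is a sketch with my own variable names (`n_urm`, `n_domestic`, `res_pool`), not the original code, and it takes the national URM proportion as given from the `natl_prop_urm` column rather than recomputing it from Census data.

```r
# Counts copied from the 2020 applicant-pool table above
n_urm         <- 278          # URM applicants
n_domestic    <- 278 + 1214   # URM + non-URM; international applicants excluded
natl_prop_urm <- 0.3340315    # national URM proportion used in this report

# One-sample test of the applicant proportion against the national proportion
res_pool <- prop.test(x = n_urm, n = n_domestic, p = natl_prop_urm)

res_pool$estimate   # applicant-pool URM proportion, about 0.186
res_pool$conf.int   # interval for the applicant proportion, not for a difference
res_pool$p.value

# Gap between the national proportion and the applicant-pool proportion
gap <- natl_prop_urm - unname(res_pool$estimate)
gap
```

Note that the confidence interval returned here is for the applicant proportion itself; the comparison with the US population comes from the fact that 0.334 lies far above that interval.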