What questions do we want to ask of this data with regard to equity? To start at the highest level, how do admissions outcomes for URM applicants compare to those for non-URM applicants? My first stab:
For simplicity, I’m starting with BGS-wide numbers for individual years only. We can get into individual graduate groups and the panel data later.
We’ll use two-sample tests of proportions and only look at 2020 data.
For 2020, BGS actually interviewed a significantly higher proportion of URM applicants than non-URM applicants (25.5% vs 18.5%; 95% CI for the difference in proportions (0.013, 0.13), p = 0.01), and admitted and matriculated people in similar proportions (p = 0.4 and p = 1 respectively).
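As a minimal sketch of the two-sample test for the 2020 interview step, the counts below are copied from the console output in this post. The data frame name `interview_2020` is an assumption for illustration; the printed output suggests the original object was called `test`.

```r
# 2020 applied/interviewed counts, copied from the output in this post
interview_2020 <- data.frame(
  urm = c("non_urm", "urm"),
  n_applied = c(1214, 278),
  n_interviewed = c(224, 71)
)
interview_2020$failures <- interview_2020$n_applied - interview_2020$n_interviewed

# prop.test() accepts a two-column matrix of successes and failures
res <- prop.test(as.matrix(interview_2020[, c("n_interviewed", "failures")]))

res$estimate  # interview rates: non-URM first, then URM
res$p.value

# prop.test() reports the CI for (row 1 - row 2), i.e. non-URM minus URM.
# Flip the sign and the order to express it as URM minus non-URM.
ci_urm_minus_non_urm <- -rev(as.numeric(res$conf.int))
ci_urm_minus_non_urm
```

Note the sign convention: the interval printed by `prop.test()` is negative here because the non-URM rate comes first, which is why the prose quotes the flipped, positive interval.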
But has it always been that way? Again, saving more sophisticated time series analyses for later - I’ll just look at the year 2009.
Note that the higher 2020 interview rate for URM applicants is not explained by GPA: the mean GPA for URM applicants is lower than for non-URM applicants.
## urm n_applied n_interviewed failures
## 435 non_urm 1214 224 990
## 436 urm 278 71 207
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: as.matrix(test[, c(3:4)])
## X-squared = 6.7246, df = 1, p-value = 0.009509
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.12880491 -0.01295845
## sample estimates:
## prop 1 prop 2
## 0.1845140 0.2553957
## urm n_interviewed n_admitted failures
## 435 non_urm 224 184 40
## 436 urm 71 62 9
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: as.matrix(test[, c(3:4)])
## X-squared = 0.70424, df = 1, p-value = 0.4014
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.15330536 0.04968363
## sample estimates:
## prop 1 prop 2
## 0.8214286 0.8732394
## urm n_admitted n_matriculated failures
## 435 non_urm 184 81 103
## 436 urm 62 27 35
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: as.matrix(test[, c(3:4)])
## X-squared = 1.7545e-30, df = 1, p-value = 1
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1427467 0.1522137
## sample estimates:
## prop 1 prop 2
## 0.4402174 0.4354839
In 2009, BGS interviewed and matriculated URM applicants in similar proportions to non-URM applicants. But of those they interviewed, they admitted a significantly lower proportion of URM folks (60.6% vs 95.5%; 95% CI for the difference in proportions (0.16, 0.54), p < 0.001).
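One caveat on the 2009 admit step: only 33 URM applicants were interviewed, so the chi-squared approximation behind `prop.test()` is on thin ice. A possible robustness check, which is my addition and not part of the analysis above, is Fisher's exact test on the same 2x2 counts. The object name `admit_2009` is hypothetical.

```r
# 2009 interviewed -> admitted counts, copied from the output in this post
admit_2009 <- matrix(
  c(192, 9,
    20, 13),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("non_urm", "urm"),
                  outcome = c("admitted", "not_admitted"))
)

# Expected cell counts under independence; values below 5 are the usual
# warning sign for the chi-squared approximation
expected <- outer(rowSums(admit_2009), colSums(admit_2009)) / sum(admit_2009)
expected

# Exact test on the same table
fisher_res <- fisher.test(admit_2009)
fisher_res$p.value
```

If the exact p-value is also very small, the conclusion about the 2009 admit step does not hinge on the large-sample approximation.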
We should definitely take a look at this step in the admissions process across all years.
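A rough sketch of what an across-years look at the interviewed-to-admitted step could be, assuming the counts are arranged with one row per year and group. The data frame `counts` and the helper `admit_test_by_year()` are hypothetical names; only the 2009 and 2020 numbers shown in this post are filled in.

```r
# Interviewed/admitted counts copied from the 2009 and 2020 output in this post
counts <- data.frame(
  year = c(2009, 2009, 2020, 2020),
  urm = c("non_urm", "urm", "non_urm", "urm"),
  n_interviewed = c(201, 33, 224, 71),
  n_admitted = c(192, 20, 184, 62)
)

# Run one two-sample test of proportions per year and collect the results
admit_test_by_year <- function(df) {
  rows <- lapply(split(df, df$year), function(d) {
    d <- d[order(d$urm), ]  # "non_urm" sorts before "urm"
    res <- prop.test(d$n_admitted, d$n_interviewed)
    data.frame(
      year = d$year[1],
      rate_non_urm = unname(res$estimate[1]),
      rate_urm = unname(res$estimate[2]),
      ci_low = res$conf.int[1],   # CI is for non-URM rate minus URM rate
      ci_high = res$conf.int[2],
      p_value = res$p.value
    )
  })
  do.call(rbind, rows)
}

by_year <- admit_test_by_year(counts)
by_year
```

Running one test per year means many comparisons, so any single significant year should be read with that in mind until the panel analysis is done.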
## urm n_applied n_interviewed failures
## 391 non_urm 566 201 365
## 392 urm 77 33 44
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: as.matrix(test[, c(3:4)])
## X-squared = 1.2782, df = 1, p-value = 0.2582
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.19817897 0.05128346
## sample estimates:
## prop 1 prop 2
## 0.3551237 0.4285714
## urm n_interviewed n_admitted failures
## 391 non_urm 201 192 9
## 392 urm 33 20 13
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: as.matrix(test[, c(3:4)])
## X-squared = 36.576, df = 1, p-value = 1.468e-09
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1623795 0.5359471
## sample estimates:
## prop 1 prop 2
## 0.9552239 0.6060606
## urm n_admitted n_matriculated failures
## 391 non_urm 192 78 114
## 392 urm 20 7 13
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: as.matrix(test[, c(3:4)])
## X-squared = 0.061882, df = 1, p-value = 0.8035
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1916327 0.3041327
## sample estimates:
## prop 1 prop 2
## 0.40625 0.35000
It’s also a problem if BGS isn’t properly recruiting applicants, e.g., if the applicant pool looks nothing like the overall US population.
For this portion, I used US Census data for 2010-2019 (2020 data weren’t available yet, so I used the 2019 figures for 2020). The Census race and ethnicity categories are what the OMB and Penn use to determine URM status. I calculated the proportion of the US population that is considered ‘URM’ (people who identify as Black or African American, Hispanic/Latinx, American Indian or Alaska Native/Indigenous, or Native Hawaiian or other Pacific Islander) and compared it to the proportion of domestic applicants (URM plus non-URM, excluding international applicants) categorized as such. You can use a one-sample test of proportions to do so.
For 2020, the proportion of the domestic BGS applicant pool that is URM (18.6%) is significantly lower than the US population proportion (33.4%); the 95% CI for the applicant proportion is (0.17, 0.21), p < 0.001.
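A minimal sketch of that one-sample test for 2020, using the applicant counts and the national URM proportion that appear in the output further down. The variable names are assumptions for illustration.

```r
# 2020 counts and national URM proportion, copied from the output in this post
n_urm <- 278
n_non_urm <- 1214
n_domestic <- n_urm + n_non_urm   # international applicants are excluded
natl_prop_urm <- 0.3340315

# One-sample test: is the applicant URM proportion equal to the national one?
res <- prop.test(n_urm, n_domestic, p = natl_prop_urm)

res$estimate   # URM share of domestic applicants
res$conf.int   # CI for the applicant proportion itself, not for the gap
res$p.value

# Gap between the national proportion and the applicant proportion,
# with the CI shifted accordingly
gap <- natl_prop_urm - unname(res$estimate)
gap_ci <- natl_prop_urm - rev(as.numeric(res$conf.int))
```

The interval from `prop.test()` here is for the applicant proportion; subtracting it from the national figure, as in `gap_ci`, is one simple way to express the shortfall, treating the Census proportion as a fixed value.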
## year gg metric all urm non_urm international natl_prop_urm
## 1 2020 all_bgs n_applied 2879 278 1214 1387 0.3340315
## app_prop_urm trials
## 1 0.1863271 1492
##
## 1-sample proportions test with continuity correction
##
## data: test$urm out of test$trials, null probability test$natl_prop_urm
## X-squared = 145.66, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.3340315
## 95 percent confidence interval:
## 0.1670637 0.2072288
## sample estimates:
## p
## 0.1863271