Statistical Note

There is a La Griffe Du Lion note on this method (http://www.lagriffedulion.f2s.com/g.htm; see previously: https://rpubs.com/JLLJ/threshold, http://www.lagriffedulion.f2s.com/excel.htm). These quotes are instructive:

A single, Gaussian-distributed variable uniquely determines the dense rank order of performance on large-scale standardized exams. The variable completely accounts for group differences in test performance.

Let me introduce you to diversity space. It is a construct I find convenient for comparing group mental abilities. Each point in diversity space specifies the proportions of two groups that attain some threshold of cognitive achievement. Look at Figure 1. There, the point (0.6, 0.4) is represented in the diversity space of Groups A and B. It corresponds to 60% of Group A and 40% of Group B reaching some unspecified threshold of cognitive achievement.

linspac = seq(0, 1, length.out = 2) #Equality line
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6") #par(mar = c(5.1, 4.1, 4.1, 2.1)), default
plot(linspac, linspac, type = "l", lty = 3, main = "Diversity Space", xlab = "Fraction of Group A Above Threshold", ylab = "Fraction of Group B Above Threshold", xaxs="i", yaxs="i")
grid()
points(.6, .4)

In diversity space, specific thresholds of achievement are not specified. The point (0.6, 0.4) could arise from any mental challenge met by 60% of Group A and 40% of Group B. Such a challenge might, for example, be passing the bar exam or reaching 1200 on the SAT. The possibilities are countless, the details not important. Points in diversity space are not test specific. They are ability specific.

After a bit of skipping about

Could you spell out more about how the curves in Figure 3 were generated?

Here’s the short version. (I’ll provide a bit more detail in the Appendix.) Let PA be the probability density function of the g distribution in Group A, and fA the fraction of group A that succeeds in performing some cognitive task. The minimum value of g, say Λ, needed to succeed is given implicitly by the relation:

\[f_A = \int_\Lambda^\infty P_A\,dx \tag{1}\]

with a similar relation for Group B:

\[f_B = \int_\Lambda^\infty P_B\,dx \tag{2}\]

The pair of equations (1) and (2) taken together relate fB to fA parametrically. The curves in Figure 3 were generated for PA and PB Gaussian.

I now assert that a single Gaussian-distributed latent variable, plausibly g, uniquely determines the dense rank order of performance on large-scale standardized exams. There is a simple test of this proposition. If it is true, there will exist a single curve, derived from Gaussian distributions of g, upon which all observed points in diversity space, irrespective of their source, must lie. We merely have to find that curve.

Consider first the diversity space of blacks and Non-Hispanic whites. Seven points, chosen to span the range of proportions in diversity space, were obtained from five different large-scale standardized tests. Here is a brief description of the tests and sample sizes when available.

Skipping

Although the observed points in diversity space were obtained from five different standardized tests they all lie close to the theoretical curve predicted by Gaussian distributions of g for both whites and blacks. Adjusted parameter values yielded a mean white-black difference of 1.09 SD equivalent to 16 IQ points in favor of whites and a variance ratio (B/W) of 0.888.

Skipping

Given that the distribution of g converges to normality in large populations, the g-distribution probability density function will be approximated by:

\[P(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2} \tag{A.1}\]

where μ and σ are the mean and standard deviation, respectively. We choose for convenience the unit of g to be one standard deviation of the Group A distribution. Also, since g has no absolute zero, we are free to set it as we wish. Accordingly, we choose the zero of g to be its mean value in Group A.

With these simplifications the fraction of Group A that reaches or exceeds some cognitive threshold, Λ, is:

\[f_A = \int_\Lambda^\infty P(x;0,1)dx \tag{A.2}\]

with a similar expression for the corresponding fraction of Group B:

\[f_B = \int_\Lambda^\infty P(x;-\Delta,\rho)dx \tag{A.3}\]

In (A.2) and (A.3), \(\Delta\) is the mean g difference (A - B) between Groups A and B, and \(\rho\) is the dimensionless ratio of standard deviations, \(\frac{\sigma_B}{\sigma_A}\).

Equations (A.2) and (A.3) relate fA and fB parametrically. For a given value of fA, \(\Lambda\) may be obtained numerically from (A.2). This value in (A.3) returns the fraction fB. The curves in Figure 3 were generated in this way for various values of \(\rho\) and \(\Delta\). For the analysis of data, whites and males correspond to Group A, blacks and females to Group B. Values of fB calculated from (A.3) were fit to observed values by adjusting the parameters \(\Delta\) and \(\rho\) to satisfy the least squares criterion.
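As a rough sketch of the procedure in (A.2) and (A.3), the curve can be traced with base R's `qnorm` and `pnorm` instead of numerical root-finding. The function name `diversity_curve` and the parameter values are illustrative assumptions, and Group B's mean is taken as -Delta so that Delta is the A - B difference as defined above.

```r
# Sketch of the parametric construction in (A.2)-(A.3) using base-R normal functions.
# Assumption: Group A ~ N(0, 1) and Group B ~ N(-Delta, rho), with Delta = mean(A) - mean(B).
diversity_curve <- function(fA, Delta, rho) {
  Lambda <- qnorm(fA, lower.tail = FALSE)                     # threshold that solves (A.2)
  pnorm(Lambda, mean = -Delta, sd = rho, lower.tail = FALSE)  # fraction of B above it, (A.3)
}

fA <- seq(0.01, 0.99, by = 0.01)
fB <- diversity_curve(fA, Delta = 1, rho = 1)  # illustrative parameter values only
```

Plotting `fB` against `fA` gives one curve in diversity space; changing `Delta` and `rho` moves the curve relative to the equality line.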

La Griffe also provided a way to use this information to estimate the probability that a random person from Group A ranks higher than a random person from Group B on the attribute being modeled; see the linked note for details. With all of this said, the code to use La Griffe’s method is given below.

MethodOfThresholds <- function(G1, G2, #Category proportions, high to low.
                               epsilon = .00001, #To avoid unfittable values of 0 or 1
                               deltaS = 0, #Starting value of \Delta for iterations, equal means, see above
                               rhoS = 1, #Starting value of \rho for iterations, equal SDs, see above,
                               sum = T, #Whether the numbers are part of a range that sums to 1
                               rnd = 3,
                               DG = c("GDelta", "Cd")){ #Glass' Delta and SDR, or Cohen's d and pooled SD. Hedges' g can also be added, and I might later, but it would require vectors of N's for sum = F, so I haven't attempted it yet
  DG = ifelse(is.na(DG[1]), "GDelta", DG[1]) #Use the first element so the later if() tests receive a single value
  colMax <- function(data) sapply(as.data.frame(data), max, na.rm = TRUE)
  if (sum == T) {G1C = cumsum(G1); G2C = cumsum(G2)} else {G1C = G1; G2C = G2}
  if (sum == T) {G1M = mapply('/', G1C, colMax(G1C)); G2M = mapply('/', G2C, colMax(G2C))} else {G1M = G1C; G2M = G2C}
  GDF = rbind(as.numeric(unlist(G1M)), as.numeric(unlist(G2M)))
  GDF[which(GDF == 1)] = GDF[which(GDF == 1)] - epsilon
  GDF[which(GDF == 0)] = GDF[which(GDF == 0)] + epsilon
  P_G1 = GDF[1,]
  P_G2 = GDF[2,]
  f0 = function(t){ #Standard normal density function
    1/sqrt(2*pi)*exp(-t^2/2)}
  thresholds = rep(0, length(P_G1))
  for(i in 1:length(P_G1)){
    F0_temp = function(t){
      -P_G1[i] + integrate(f0, t, Inf)$value}
    thresholds[i] = uniroot(F0_temp, lower = -10, upper = 10)$root}
  F_fit = function(t, delta, rho){
    f0_full = function(x, mu = delta, sigma = rho){
      1/sqrt(2*pi*sigma^2)*exp(-(x-mu)^2/(2*sigma^2))}
    outp = rep(0, length(t))
    for(i in 1:length(outp)){
      outp[i] = integrate(f0_full,t[i],Inf)$value}
    return(outp)}
  fitLS1 = nls(P_G2 ~ F_fit(thresholds, delta, rho), start = list(delta = deltaS, rho = rhoS))
  deltaF = unname(coef(fitLS1)[1]) #Mean difference (in SD units)
  rhoF = unname(coef(fitLS1)[2]) #Standard deviation ratio
  out = c(deltaF, rhoF)
  thresholds = rep(0, length(P_G1))
  for(i in 1:length(P_G1)){
    F0_temp = function(t){
      -P_G2[i] + integrate(f0, t, Inf)$value}
    thresholds[i] = uniroot(F0_temp, lower = -10, upper = 10)$root}
  fitLS2 = nls(P_G1 ~ F_fit(thresholds, delta, rho), start = list(delta = -deltaS, rho = 1/rhoS))
  deltaOF = unname(coef(fitLS2)[1]) 
  rhoOF = unname(coef(fitLS2)[2]) 
  rhoOF = 1/rhoOF
  deltaOF = -deltaOF*rhoOF
  out = c((deltaF + deltaOF)/2, (rhoF + rhoOF)/2)
  if (DG[1] == "Cd") {out[3] = sqrt((1 + out[2]^2)/2); out[1] = out[1]/out[3]} #Pooled SD, then mean difference/PSD
  out = round(out[1:2], rnd)
  names(out) = if (DG[1] == "GDelta") {c("Glass' Delta", "SD Ratio")} else {c("Cohen's d", "SD Ratio")}
  return(out)}

DiversitySpacePlot <- function(x, Delta, rho, 
                               xlabel = "Fraction of Group A Achieving Threshold", 
                               ylabel = "Fraction of Group B Achieving Threshold"){
  linspac = seq(0,1, length.out = 2)
  plot(linspac, linspac, type = "l", lty = 3, main = "Diversity Space", xlab = xlabel, ylab = ylabel)
  points(x[1,], x[2,], col = "red", pch = 16)
  F_fit = function(t,delta, rho){
    f0_full = function(x, mu = delta, sigma = rho){
      1/sqrt(2*pi*sigma^2)*exp(-(x-mu)^2/(2*sigma^2))}
    outp = rep(0, length(t))
    for(i in 1:length(outp)){
      outp[i] = integrate(f0_full,t[i],Inf)$value}
    return(outp)}
  tvals = seq(-5,5,length.out = 500)
  xvals = rep(0,length(tvals))
  yvals = rep(0,length(tvals))
  for(i in 1:length(tvals)){ # Create curve
    xvals[i] = F_fit(tvals[i], 0, 1)
    yvals[i] = F_fit(tvals[i], Delta, rho)}
  lines(xvals, yvals, col = "blue", lwd = 2)}
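The root-finding and integration steps in the functions above have closed-form counterparts in base R. The sketch below checks, on made-up proportions and hypothetical parameter values, that `uniroot` plus `integrate` agrees with `qnorm` and `pnorm`; it is a cross-check, not a replacement for the fitting code.

```r
# Closed-form counterparts of the numerical steps in MethodOfThresholds (a sketch).
p <- c(0.04, 0.15, 0.45, 0.81)  # illustrative cumulative proportions, not data from the report

# Threshold on a standard normal with upper-tail area p: uniroot + integrate vs qnorm
thr_numeric <- sapply(p, function(pk) {
  uniroot(function(t) integrate(dnorm, t, Inf)$value - pk,
          lower = -10, upper = 10, tol = 1e-10)$root
})
thr_closed <- qnorm(p, lower.tail = FALSE)

# Upper-tail area of N(delta, rho) above each threshold: integrate vs pnorm
delta <- 0.5; rho <- 1.1  # hypothetical parameter values
tail_numeric <- sapply(thr_closed, function(t) {
  integrate(dnorm, t, Inf, mean = delta, sd = rho)$value
})
tail_closed <- pnorm(thr_closed, mean = delta, sd = rho, lower.tail = FALSE)

max(abs(thr_numeric - thr_closed))    # small numerical difference
max(abs(tail_numeric - tail_closed))  # small numerical difference
```

Using the closed forms would mainly save computation time; the fitted parameters should be essentially the same.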

Rationale

As part of the partnership between College Board and the Michigan Department of Education (https://satsuite.collegeboard.org/state-partnerships/michigan) for the purposes of conducting the Michigan Merit Examination, the College Board has released the scores for around 85% of Michigan’s total public high school senior population (i.e., the Class of 2022, or 2021-2022 school year seniors, numbering approximately 107,000 total with 90,642 in sample; https://web.archive.org/web/20221004183404/https://reports.collegeboard.org/media/pdf/2022-michigan-sat-suite-of-assessments-annual-report.pdf). They have stratified the scores for various groups - males, females, American Indians, Asians, African Americans, Hispanics, Native Hawaiians, Whites, and Two or More Races - into brackets by total and section scores. They have not provided the means and standard deviations for these groups, but no fear - with the method described above and our handy-dandy code at the ready, we can calculate what those might be.

The two most talked-about gaps we can assess in the available data are (1) the Black-White total score gap and (2) the male-female mathematics score gap.

Method Test

The following is a demonstration of this method using Michigan’s mathematics SAT data for males and females (see, here: https://web.archive.org/web/20221004183404/https://reports.collegeboard.org/media/pdf/2022-michigan-sat-suite-of-assessments-annual-report.pdf, PDF page 8, middle of the page).

Females = c(.04, .11, .3, .36, .19, .01) #N = 45,504, high part of range to low
Males   = c(.06, .14, .29, .31, .19, .01) #N = 44,938

pars = MethodOfThresholds(Females, Males)
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(Females), 
                         cumsum(Males )), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Females Achieving Threshold",
                   ylabel = "Fraction of Males Achieving Threshold")
grid()

pars; pars[2]^2 #Variance ratio, because \rho is technically the SDR
## Glass' Delta     SD Ratio 
##        0.101        1.111
## SD Ratio 
## 1.234321
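Because the report rounds its bracket shares, they need not sum to exactly 1. A minimal sketch of the rescaling step, using the female mathematics shares quoted above (the variable names `cum_raw` and `cum_norm` are illustrative):

```r
# Rounded bracket shares need not sum to 1, so cumulative proportions are rescaled.
Females <- c(.04, .11, .3, .36, .19, .01)  # mathematics shares quoted above, high to low
sum(Females)                                # 1.01, because of rounding in the report

cum_raw  <- cumsum(Females)
cum_norm <- cum_raw / max(cum_raw)          # same rescaling MethodOfThresholds applies when sum = T
```

Note that the plotting call above passes the raw `cumsum()` values, so its last plotted point can sit slightly past 1; the fit itself uses the rescaled values.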

The following is a demonstration of this method using Michigan’s total SAT data for Blacks and Whites (see, here: https://web.archive.org/web/20221004183404/https://reports.collegeboard.org/media/pdf/2022-michigan-sat-suite-of-assessments-annual-report.pdf, PDF page 8, top of the page).

Blacks = c(0, .03, .13, .44, .38, .01) #N = 9,980
Whites = c(.04, .15, .34, .36, .11, 0) #N = 62,358

pars = MethodOfThresholds(Blacks, Whites) #DG = "Cd" for Cohen's d, which is 1.022
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(Blacks), 
                         cumsum(Whites)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        1.044        1.044
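The Cohen's d figure mentioned in the code comment follows from Glass' Delta and the SD ratio through an unweighted pooled SD, mirroring the `DG = "Cd"` branch of `MethodOfThresholds`. A small sketch, with `glass_to_d` as a hypothetical helper name; because the printed inputs are already rounded, the result can differ from the quoted 1.022 in the third decimal.

```r
# Convert Glass' Delta and the SD ratio to Cohen's d with an unweighted pooled SD.
# Hypothetical helper; Group 1's SD is the unit and Group 2's SD is rho.
glass_to_d <- function(delta, rho) {
  pooled_sd <- sqrt((1 + rho^2) / 2)
  delta / pooled_sd
}

glass_to_d(1.044, 1.044)  # Black-White inputs as printed above (already rounded)
```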

Discussion

Using this interesting method, two things are apparent with respect to the focal gaps. Firstly, for the male-female mathematics gap, the mean difference is small (0.10 Delta); the difference lies mainly in the variances. The variance ratio was 1.23, or 23% more variance in males. This number is very similar to the aggregate 1.16 I found for variance ratios in general (https://rpubs.com/JLLJ/SDVR), when score differences are removed. For total score, the variance ratio is a bit larger, at 1.32, with a mean difference of -.01 in favor of the female sample. Secondly, for the Black-White total score gap, there was a typical mean difference of 1.04 Delta and a more limited difference in variability: an SD ratio of 1.04, which corresponds to a variance ratio of about 1.09, or roughly 9% greater White variance.

In 2022, the Black-White gap in the state of Michigan stands at approximately one standard deviation and there is very little male-female gap in mathematical scoring. This method has extraordinary utility for the refurbishment of data that is abstrusely presented, such as SAT and ACT scores. For example, here is the total male-female gap in Michigan.

Females = c(.03, .13, .31, .39, .14, 0)
Males   = c(.05, .14, .29, .35, .18, 0)

MethodOfThresholds(Females, Males) #Total Score Gaps by Sex
## Glass' Delta     SD Ratio 
##       -0.007        1.148

Post-Script I - Other Gaps

Here are the values for various other total score gaps. I have omitted “Two or More Races”, because that category is very unclear. “Asian” is also unclear, but at least we know that it tends to mean Chinese, Japanese, or Korean. Because it does not explicitly mean them, though, we might expect higher variance in that group because of just how heterogeneous the other common parts make it.

Hawaiian   = c(.03, .09, .33, .39, .15, 0) #N = 66
Hispanic   = c(.01, .07, .22, .45, .24, 0) #N = 7,564
Asian      = c(.25, .22, .25, .21, .07, 0) #N = 4,217
Amerindian = c(.01, .05, .23, .46, .25, 0) #N = 1,212

MethodOfThresholds(Hawaiian, Whites)   #d = .208, use DG = "Cd" to see
## Glass' Delta     SD Ratio 
##        0.211        1.032
MethodOfThresholds(Hispanic, Whites)   #d = .566
## Glass' Delta     SD Ratio 
##        0.568        1.007
MethodOfThresholds(Asian, Whites)      #d = -.629
## Glass' Delta     SD Ratio 
##       -0.546        0.714
MethodOfThresholds(Amerindian, Whites) #d = .612
## Glass' Delta     SD Ratio 
##        0.628        1.051

The Hawaiian-White gap among Michigan’s high school seniors stood at .211 Delta, and was the least reliable among the bunch due to its very small N. The Hispanic-White gap was .568 Delta, and the same gap for the Native American Indian group stood at .628 Delta. The Asian-White gap reversed directions and stood at -.546 Delta. The SD ratio of .714 means the White SD was about 29% smaller than the Asian SD or, equivalently, that the Asian SD was about 40% larger. The IQ point metric versions of these gaps are

.211  * 15
## [1] 3.165
.568  * 15
## [1] 8.52
-.546 * 15
## [1] -8.19
.628  * 15
## [1] 9.42
1.044 * 15 #Black-White
## [1] 15.66

and the gaps in terms of Cohen’s d are

.208  * 15
## [1] 3.12
.566  * 15
## [1] 8.49
-.629 * 15
## [1] -9.435
.612  * 15
## [1] 9.18
1.022 * 15 #Black-White
## [1] 15.33

In other words, there were 3-, 9- (8-), -8- (-9-), 9-, and 16- (15-) point gaps between Michigan’s White high school seniors and their Hawaiian, Hispanic, Asian, Native American Indian, and Black peers in Spring 2022, with the Cohen’s d versions in parentheses where they round differently.

Some might claim that differential rates of high school dropout affect these gaps, reducing them because dropping out is expected to be selective with respect to IQ. This is an interesting possibility, and one that the report allows us to test, because it includes score distributions for grades 8 through 11 and participation rates are unusually high in Michigan, which helps to ensure representativeness. We can easily recalculate the gaps for those grades to test whether they differ by year. If the theory that differential rates of dropping out reduce the gap is true, the gaps between Whites and underrepresented minority groups should decline as the school years advance, and the same should hold for the gap between Whites and Asians, except that in that case, it should favor Asians. Another theory worth considering is that the Asian gap is due to Asians using much greater levels and quality of preparation. This would suggest a gap that grows considerably over time in that group’s favor. If the gap is due to differences in test preparation, it should not be present long before it’s reasonable to prep for the SAT, like in 8th or 9th grade. The same logic applies to all prep-based theories in different directions. Developmental angles aplenty could be proffered, but think about those on your own time.

White8      = c(0, .02, .15, .43, .37, .03) #N = 66,328
Black8      = c(0, 0, .02, .2, .66, .11) #N = 16,324
Hawaiian8   = c(0, .01, .07, .4, .49, .03) #N = 90
Hispanic8   = c(0, .01, .06, .33, .54, .06) #N = 8,950
Asian8      = c(.01, .16, .24, .37, .2, .02) #N = 3,633
Amerindian8 = c(0, .01, .07, .34, .52, .06) #N = 648

White9      = c(0, .05, .22, .43, .28, .02) #N = 71,808
Black9      = c(0, 0, .04, .26, .59, .1) #N = 16,596
Hawaiian9   = c(0, .03, .18, .34, .43, .03) #N = 117
Hispanic9   = c(0, .01, .11, .37, .45, .06) #N = 10,150
Asian9      = c(.03, .22, .29, .29, .15, .02) #N = 3,904
Amerindian9 = c(0, .02, .1, .38, .45, .06) #N = 898

White10      = c(.01, .07, .27, .42, .23, .01) #N = 67,339
Black10      = c(0, .01, .07, .33, .56, .03) #N = 14,675
Hawaiian10   = c(0, .03, .2, .38, .39, 0) #N = 117
Hispanic10   = c(0, .02, .14, .41, .41, .02) #N = 10,063
Asian10      = c(.06, .19, .32, .28, .14, 0) #N = 4,161
Amerindian10 = c(0, .02, .14, .44, .38, .01) #N = 1,555

White11      = c(.02, .14, .37, .36, .12, 0) #N = 32,072
Black11      = c(0, .02, .13, .43, .4, .02) #N = 5,784
Hawaiian11   = c(0, .06, .24, .44, .26, 0) #N = 50
Hispanic11   = c(.01, .06, .23, .42, .28, .01) #N = 4,328
Asian11      = c(.19, .28, .28, .2, .06, 0) #N = 2,911
Amerindian11 = c(0, .06, .22, .45, .25, .01) #N = 535
"Eighth Grade"
## [1] "Eighth Grade"
MethodOfThresholds(Black8, White8)
## Glass' Delta     SD Ratio 
##        1.049        1.150
MethodOfThresholds(Hawaiian8, White8)
## Glass' Delta     SD Ratio 
##        0.331        1.122
MethodOfThresholds(Hispanic8, White8)
## Glass' Delta     SD Ratio 
##        0.508        1.036
MethodOfThresholds(Asian8, White8)
## Glass' Delta     SD Ratio 
##       -0.554        0.785
MethodOfThresholds(Amerindian8, White8)
## Glass' Delta     SD Ratio 
##        0.452        1.018

In grade eight, score gaps were practically indistinguishable from the gaps in the senior year. This is prior to test prep being a plausible score gap explanation.

"Ninth Grade"
## [1] "Ninth Grade"
MethodOfThresholds(Black9, White9)
## Glass' Delta     SD Ratio 
##        1.100        1.129
MethodOfThresholds(Hawaiian9, White9)
## Glass' Delta     SD Ratio 
##        0.318        0.883
MethodOfThresholds(Hispanic9, White9)
## Glass' Delta     SD Ratio 
##        0.558        1.025
MethodOfThresholds(Asian9, White9)
## Glass' Delta     SD Ratio 
##       -0.558        0.746
MethodOfThresholds(Amerindian9, White9)
## Glass' Delta     SD Ratio 
##        0.546        1.009

In grade nine, score gaps were practically indistinguishable from the gaps in the senior year. This is prior to test prep being a plausible score gap explanation.

"Tenth Grade"
## [1] "Tenth Grade"
MethodOfThresholds(Black10, White10, epsilon = .0000001) #Epsilon had to be adjusted for computation. Remarkable!
## Glass' Delta     SD Ratio 
##         1.01         1.10
MethodOfThresholds(Hawaiian10, White10)
## Glass' Delta     SD Ratio 
##        0.390        0.978
MethodOfThresholds(Hispanic10, White10)
## Glass' Delta     SD Ratio 
##        0.579        1.063
MethodOfThresholds(Asian10, White10)
## Glass' Delta     SD Ratio 
##       -0.507        0.827
MethodOfThresholds(Amerindian10, White10)
## Glass' Delta     SD Ratio 
##        0.537        1.126
"Eleventh Grade"
## [1] "Eleventh Grade"
MethodOfThresholds(Black11, White11)
## Glass' Delta     SD Ratio 
##        1.083        1.062
MethodOfThresholds(Hawaiian11, White11)
## Glass' Delta     SD Ratio 
##        0.575        1.017
MethodOfThresholds(Hispanic11, White11)
## Glass' Delta     SD Ratio 
##        0.574        0.940
MethodOfThresholds(Asian11, White11)
## Glass' Delta     SD Ratio 
##        -0.63         0.74
MethodOfThresholds(Amerindian11, White11)
## Glass' Delta     SD Ratio 
##        0.611        1.022

In grades ten and eleven, the same patterns were maintained. The size of the Black-White gap in the typical IQ metric from 8th to 12th grade is shown below, followed by the same results for Hawaiians, then Hispanics, then Asians, and finally, Native American Indians:

"Blacks"
## [1] "Blacks"
1.049 * 15
## [1] 15.735
1.1   * 15
## [1] 16.5
1.01  * 15
## [1] 15.15
1.083 * 15
## [1] 16.245
1.044 * 15
## [1] 15.66
"Hawaiians"
## [1] "Hawaiians"
.331 * 15
## [1] 4.965
.318 * 15
## [1] 4.77
.390 * 15
## [1] 5.85
.575 * 15
## [1] 8.625
.211 * 15
## [1] 3.165
"Hispanics"
## [1] "Hispanics"
.508 * 15
## [1] 7.62
.558 * 15
## [1] 8.37
.579 * 15
## [1] 8.685
.574 * 15
## [1] 8.61
.568 * 15
## [1] 8.52
"Asians"
## [1] "Asians"
-.554 * 15
## [1] -8.31
-.558 * 15
## [1] -8.37
-.507 * 15
## [1] -7.605
-.630 * 15
## [1] -9.45
-.546 * 15
## [1] -8.19
"Amerindians"
## [1] "Amerindians"
.452 * 15
## [1] 6.78
.546 * 15
## [1] 8.19
.537 * 15
## [1] 8.055
.611 * 15
## [1] 9.165
.628 * 15
## [1] 9.42
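The same conversions can be collected into one table, which makes the grade-to-grade comparison easier to scan. This is only a sketch: the Glass' Delta values are copied from the outputs above, and the object names `gaps` and `iq_gaps` are illustrative.

```r
# Glass' Delta by grade, copied from the outputs above; G12 is the senior-year sample.
gaps <- rbind(
  Blacks      = c(1.049, 1.100, 1.010, 1.083, 1.044),
  Hawaiians   = c(0.331, 0.318, 0.390, 0.575, 0.211),
  Hispanics   = c(0.508, 0.558, 0.579, 0.574, 0.568),
  Asians      = c(-0.554, -0.558, -0.507, -0.630, -0.546),
  Amerindians = c(0.452, 0.546, 0.537, 0.611, 0.628)
)
colnames(gaps) <- paste0("G", 8:12)

iq_gaps <- gaps * 15      # 15 IQ points per SD
round(iq_gaps, 1)         # one row per group, one column per grade
apply(iq_gaps, 1, range)  # smallest and largest gap for each group
```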

Gaps were consistent. What about male-female variance ratios?

Male8  = c(0, .02, .12, .36, .45, .05) #N = 52,388
Male9  = c(0, .05, .17, .36, .38, .05) #N = 59,064
Male10 = c(.01, .06, .21, .37, .34, .02) #N = 54,800 
Male11 = c(.03, .12, .29, .35, .2, .01) #N = 24,406
Male12 = c(.05, .14, .29, .35, .18, 0) #N = 45,504

Female8  = c(0, .02, .12, .4, .42, .04) #N = 49,968
Female9  = c(0, .04, .19, .41, .32, .03) #N = 55,508
Female10 = c(0, .05, .23, .43, .27, .01) #N = 52,811
Female11 = c(.02, .12, .33, .38, .15, 0) #N = 25,843
Female12 = c(.03, .13, .31, .39, .14, 0) #N = 44,938

MethodOfThresholds(Female8, Male8)
## Glass' Delta     SD Ratio 
##       -0.089        1.060
MethodOfThresholds(Female9, Male9)
## Glass' Delta     SD Ratio 
##       -0.152        1.124
MethodOfThresholds(Female10, Male10)
## Glass' Delta     SD Ratio 
##       -0.130        1.187
MethodOfThresholds(Female11, Male11)
## Glass' Delta     SD Ratio 
##       -0.104        1.150
MethodOfThresholds(Female12, Male12)
## Glass' Delta     SD Ratio 
##       -0.007        1.148
MethodOfThresholds(Female8, Male8)[2]^2
## SD Ratio 
##   1.1236
MethodOfThresholds(Female9, Male9)[2]^2
## SD Ratio 
## 1.263376
MethodOfThresholds(Female10, Male10)[2]^2
## SD Ratio 
## 1.408969
MethodOfThresholds(Female11, Male11)[2]^2
## SD Ratio 
##   1.3225
MethodOfThresholds(Female12, Male12)[2]^2
## SD Ratio 
## 1.317904

The results show directional consistency and some consistency in the size of the variance ratios. Interestingly, because variance differences are related to mean differences, male variance differences at the same scores are probably a bit larger. I have linked an analysis explaining and showing this, above.

Investigating change over time is hardly possible for one reason: Michigan used to be an ACT state. In the year 2000, only 11% of Michiganders took the SAT, and 12.7% of those who took it in Michigan were Black (JBHE, 2000), compared to 14.2% of the state which was Black (https://web.archive.org/web/20041117101850/https://www2.census.gov/census_2000/datasets/demographic_profile/Michigan/2kh26.pdf). Moreover, Michigan had a gap size of 244 points compared to a national average of 198. Prior to 2015, most students in Michigan took the ACT instead of the SAT. In that year, the Michigan Department of Education struck a deal to test everyone on a free-for-them basis for a surprisingly low price. Before that point, SAT scores were biased upwards, because it was elite students who took the SAT instead of the ACT. For that reason, the pattern is harder to interpret. The small sample sizes that resulted from being an ACT state also contribute to interpretation difficulties. Regardless of those issues, and potential issues with comparing between editions, psychometric bias, etc., it may be interesting to examine SAT differences from much earlier in time, say, 2000. Michigan’s 2000 scores are available (https://web.archive.org/web/20221005011021/https://secure-media.collegeboard.org/digitalServices/pdf/research/cb-seniors-2000-MI.PDF, table 4.1).

The total scores for Whites (N = 7,484) and Blacks (N = 1,457), respectively, were 1159 (SD = 191) and 915 (SD = 204). Cohen’s d and Hedges’ g are thus 1.23 and 1.26 for Michigan SAT takers in the year 2000. The SD for Whites was 193 for males versus 186 for females, and 215 vs 196 for Blacks, for VRs of 1.08 and 1.20. Despite being from the era when Michigan was an ACT state, the SAT gaps are still at least plausibly consistent with the current ones.

Cohensd <- function(M1, M2, SD1, SD2, N1 = 1, N2 = 1, rnd = 3){
  SDP = sqrt((SD1^2 + SD2^2)/2)
  SDPW = sqrt((((N1 - 1) * SD1^2) + ((N2 - 1) * SD2^2))/(N1 + N2))
  d = (M2 - M1)/SDP
  delta = (M2 - M1)/SD1
  g = (M2 - M1)/SDPW
  if (N1 <= 1 || N2 <= 1) {cat(paste0("With group means of ", M1, " and ", M2," with SDs of ", SD1, " and ", SD2, ", Cohen's d is ", round(d, rnd), " and Glass' Delta is ", round(delta, rnd), ". \n"))} else {cat(paste0("With group means of ", M1, " and ", M2," with SDs of ", SD1, " and ", SD2, " Cohen's d is ", round(d, rnd), " Glass' Delta is ", round(delta, rnd), " and Hedge's g is ", round(g, rnd), ". \n"))}} #g is only reported when both Ns are supplied

Cohensd(1159, 915, 191, 204, 7484, 1457)
## With group means of 1159 and 915 with SDs of 191 and 204 Cohen's d is -1.235 Glass' Delta is -1.277 and Hedge's g is -1.263.
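The male-to-female variance ratios quoted for the 2000 data can be checked directly from the SDs given in the text. A minimal sketch, with `vr` as a hypothetical helper name:

```r
# Male/female variance ratios from the 2000 SDs quoted in the text.
vr <- function(sd_male, sd_female) (sd_male / sd_female)^2

round(vr(193, 186), 2)  # Whites
round(vr(215, 196), 2)  # Blacks
```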

Post-Script II - Rhode Island and New Hampshire

The states of Rhode Island and New Hampshire also have high rates of SAT taking due to requirements they’ve set up. Some other states have SAT/ACT requirements, and their data is available, but for want of time, I’m just going to look at these two, since they have SAT-specific requirements and thus reduced chances of representativeness issues with respect to their populations. We should expect more unrepresentativeness with respect to the national population of a minority group to the extent a state’s minority population is lower than the national level because people tend to self-segregate, and not doing that is selective with respect to variables related to IQ, like education, income, and so on. Moreover, larger population movements are more representative than smaller ones. Michigan is 15.3% Black or African American alone or in combination as of 2020 (https://www.census.gov/library/stories/state-by-state/michigan-population-change-between-census-decade.html), compared to 2.4% for New Hampshire (https://www.census.gov/library/stories/state-by-state/new-hampshire-population-change-between-census-decade.html), and 9.1% for Rhode Island (https://www.census.gov/library/stories/state-by-state/rhode-island-population-change-between-census-decade.html). This issue was described amply by Jensen in Educability and Group Differences (see “the Klineberg fallacy”). That noted, here’s the data and the results.

Rhode Island: https://web.archive.org/web/20221005024155/https://reports.collegeboard.org/media/pdf/2022-rhode-island-sat-suite-of-assessments-annual-report.pdf

New Hampshire: https://web.archive.org/web/20221005024207/https://reports.collegeboard.org/media/pdf/2022-new-hampshire-sat-suite-of-assessments-annual-report.pdf

For this, I’ll just do the Black-White gaps for the SAT. Anyone who wishes for sex, PSAT, or other year results can look into these reports themselves. Similar to what Jensen pointed out in 1973

BlacksR = c(.01, .03, .15, .45, .35, .01) #N = 927
WhitesR = c(.04, .16, .33, .36, .12, 0) #N = 6,095

pars = MethodOfThresholds(BlacksR, WhitesR)
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksR), 
                         cumsum(WhitesR)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        0.915        1.029
BlacksN = c(.07, .13, .31, .4, .1,  0) #N = 167
WhitesN = c(.07, .25, .41, .23, .04, 0) #N = 6,359

pars = MethodOfThresholds(BlacksN, WhitesN)
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksN), 
                         cumsum(WhitesN)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        0.461        0.833

Post-Script III - Michigan immediately before and after becoming an SAT State

Michigan recently transitioned from being an ACT state. When it was an ACT state, relatively elite students and those with out-of-state prospects were more likely to take the SAT. Comparing gaps before and after the transition can be edifying, because participation went from highly selective and small (3,565 total test-takers in 2016) to population-representative and massive (110,082 total test-takers in 2017: more test-takers than high school graduates). As above, I will look at the Black-White gap and those interested can conduct their own analyses with the data.

Michigan, 2016: https://web.archive.org/web/20221005185307/https://reports.collegeboard.org/media/pdf/mi16030301.pdf

Michigan, 2017: https://web.archive.org/web/20221005185413/https://reports.collegeboard.org/media/pdf/2017-michigan-sat-suite-assessments-annual-report.pdf

Cohensd((504 + 492 + 490), (603 + 601 + 583), (110 + 114 + 105), (102 + 104 + 99), 243, 2095) #Reading + Mathematics + Writing, Blacks first
## With group means of 1486 and 1787 with SDs of 329 and 305 Cohen's d is 0.949 Glass' Delta is 0.915 and Hedge's g is 0.979.
Cohensd((504 + 492), (603 + 601), (110 + 114), (102 + 104), 243, 2095) #Reading + Mathematics, Blacks first
## With group means of 996 and 1204 with SDs of 224 and 206 Cohen's d is 0.967 Glass' Delta is 0.929 and Hedge's g is 1.001.

Due to how the data were presented in 2017, we might wish to use the threshold method (https://rpubs.com/JLLJ/threshold) for reading plus writing benchmark scores, and to be more informative, we can also use the proportions of each group who passed at least one benchmark. The single-threshold method is used here because the multiple-threshold fit has convergence issues with small numbers of data points.

ThresholdMean <- function(FracA, FracB, rnd = 3){
  Gap = qnorm(FracA) - qnorm(FracB)
  if (Gap >= 0){cat(paste0("Group A's mean is ", round(Gap, rnd), " SDs higher than Group B's. \n"))} else {cat(paste0("Group B's mean is ", abs(round(Gap, rnd)), " SDs higher than Group A's. \n"))}}

ThresholdMean(.12, .45) #ERW + Math, Blacks first
## Group B's mean is 1.049 SDs higher than Group A's.
ThresholdMean(1 - .63, 1 - .26) #1 - Neither, Blacks first
## Group B's mean is 0.975 SDs higher than Group A's.
ThresholdMean(.35, .72) #ERW
## Group B's mean is 0.968 SDs higher than Group A's.
ThresholdMean(.13, .47) #Math
## Group B's mean is 1.051 SDs higher than Group A's.

In Michigan, the gap seems basically stable and possibly somewhat larger after the transition to the universal use of the SAT.

New Hampshire transitioned from using the Smarter Balanced Statewide Assessment to the SAT as a graduation requirement in Spring, 2016 (https://web.archive.org/web/20220128123255/https://www.education.nh.gov/who-we-are/division-of-learner-support/bureau-of-instructional-support/sat). According to the Western Interstate Commission for Higher Education’s Knocking at the College Door | 10th Edition report (https://www.wiche.edu/resources/knocking-at-the-college-door-10th-edition/; https://web.archive.org/web/20220303045931/https://www.wiche.edu/wp-content/uploads/2020/12/Knocking-pdf-for-website.pdf), New Hampshire had 13,400 high school graduates in 2015 and 2016, and 13,000 in 2017. Rhode Island had 9,400, 9,600, and 8,700 in the same years, and 9,200 in 2018. College Board has not supplied New Hampshire’s 2015 SAT data, so we don’t have pre-requirement SAT data for them. However, we do have their previous assessment’s benchmark rates for the reading and mathematics sections provided by the New Hampshire Department of Education. This assessment seemed easier than the SAT for mathematics and harder for reading.

New Hampshire, 2015 (and 2016, same metric): https://web.archive.org/web/20221005193845/https://www.education.nh.gov/sites/g/files/ehbemt326/files/inline-documents/results-16.pdf

New Hampshire, 2016: https://web.archive.org/web/20221005192905/https://reports.collegeboard.org/media/pdf/nh16030301.pdf

ThresholdMean(.36, .59) #2015, Reading
## Group B's mean is 0.586 SDs higher than Group A's.
ThresholdMean(.24, .48) #2015, Mathematics
## Group B's mean is 0.656 SDs higher than Group A's.
ThresholdMean(.38, .61) #2016, Reading
## Group B's mean is 0.585 SDs higher than Group A's.
ThresholdMean(.25, .51) #2016, Mathematics
## Group B's mean is 0.7 SDs higher than Group A's.
ThresholdMean(.39, .68) #2016, SAT Reading
## Group B's mean is 0.747 SDs higher than Group A's.
ThresholdMean(.15, .42) #2016, SAT Mathematics
## Group B's mean is 0.835 SDs higher than Group A's.
Cohensd((481 + 471 + 465), (527 + 529 + 509), (115 + 124 + 114), (99 + 101 + 98), 202, 8676) #2016, SAT, Reading + Mathematics + Writing, Blacks first
## With group means of 1417 and 1565 with SDs of 353 and 298 Cohen's d is 0.453 Glass' Delta is 0.419 and Hedge's g is 0.494.
Cohensd((481 + 471), (527 + 529), (115 + 124), (99 + 101), 202, 8676) #2016, SAT, Reading + Mathematics, Blacks first
## With group means of 952 and 1056 with SDs of 239 and 200 Cohen's d is 0.472 Glass' Delta is 0.435 and Hedge's g is 0.518.

In New Hampshire, the two groups’ SAT variances were not equal, and the lower-scoring group had the greater variance, so the single-point threshold method overstated the mean difference on the SAT relative to computing the standardized effect size directly. The Black-White gap in New Hampshire is roughly half the national average, at about .5 g, and slightly less with d or the method of multiple thresholds. The Black-White gap on the SAT is similar to the gap on the test the state used before making the SAT a requirement.
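
To see how sensitive a single-threshold estimate can be when SDs differ, here is a minimal sketch. It assumes both groups are exactly normal, takes the means and SDs from the 2016 Reading + Mathematics line above, and uses made-up cutoffs. The helper threshold_gap_sketch is hypothetical and is not part of the original code.

```r
# Hypothetical helper (not from the original post): the single-threshold gap
# implied by two normal distributions whose SDs may differ.
threshold_gap_sketch <- function(meanA, meanB, sdA, sdB, cutoff) {
  fracA <- pnorm(cutoff, meanA, sdA, lower.tail = FALSE)  # share of A above cutoff
  fracB <- pnorm(cutoff, meanB, sdB, lower.tail = FALSE)  # share of B above cutoff
  qnorm(fracB) - qnorm(fracA)                              # B minus A, in SD units
}

# Means and SDs from the output above; the cutoffs are illustrative assumptions.
cutoffs <- c(800, 1000, 1200)
gaps <- sapply(cutoffs, function(k) threshold_gap_sketch(952, 1056, 239, 200, k))

# Cohen's d using the unweighted average of the two variances
d_avg <- (1056 - 952) / sqrt((239^2 + 200^2) / 2)

round(rbind(cutoff = cutoffs, threshold_gap = gaps), 3)
round(d_avg, 3)
```

Under these assumptions the implied gap shrinks as the cutoff rises, because the lower-scoring group has the larger SD. Whether a single threshold over- or understates the directly computed effect size therefore depends on where the benchmark happens to sit.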

Rhode Island transitioned from using their state examinations to using the PSAT and SAT for high school students in 2017 (https://web.archive.org/web/20221005195246/https://www.providencejournal.com/story/news/education/2018/10/25/with-sat-required-ri-sees-jump-in-participation-decline-in-scores/9450299007/). As that article notes,

But a spokeswoman for the College Board cautioned against state-to-state comparisons, because each state tests different populations of students. Some might have more English-language learners; others might have more students living in poverty.

Wagner said it’s not uncommon for scores to decline when more students take a standardized test.

“When you get a dramatic increase in participation, the last students to take the test are generally lower-performing students,” he said. “They’re taking the test because the school told them to. The real comparison will be next year, because we are already at 95 percent participation.”

The requirement coming into play in 2017 led to a large increase in participation. In 2017, the year the requirement was set up but before it was actually implemented, 71% of Rhode Island high schoolers took the SAT. In 2018, 97% did. In 2016, around 82% did, if the 9,600 figure provided by WICHE is to be believed and all test-takers the College Board reported are assumed to have been among the graduates. Because we only have benchmark scores for 2017 and 2018, 2016 can be used for calibration.

Rhode Island, 2016: https://web.archive.org/web/20221005200546/https://reports.collegeboard.org/media/pdf/ri16030301.pdf

Rhode Island, 2017: https://web.archive.org/web/20221005200600/https://reports.collegeboard.org/media/pdf/2017-rhode-island-sat-suite-assessments-annual-report.pdf

Rhode Island, 2018: https://web.archive.org/web/20221005200610/https://reports.collegeboard.org/media/pdf/2018-rhode-island-sat-suite-assessments-annual-report.pdf

Cohensd((417 + 409 + 408), (520 + 518 + 510), (105 + 99 + 99), (97 + 102 + 98), 570, 4906) #2016, Reading + Mathematics + Writing, Blacks first
## With group means of 1234 and 1548 with SDs of 303 and 297 Cohen's d is 1.047 Glass' Delta is 1.036 and Hedge's g is 1.055.
Cohensd((417 + 409), (520 + 518), (105 + 99), (97 + 102), 570, 4906) #2016, Reading + Mathematics, Blacks first
## With group means of 826 and 1038 with SDs of 204 and 199 Cohen's d is 1.052 Glass' Delta is 1.039 and Hedge's g is 1.063.
ThresholdMean(.19, .57) #2017, ERW + Math, Blacks first
## Group B's mean is 1.054 SDs higher than Group A's.
ThresholdMean(1 - .54, 1 - .13) #2017, 1 - Neither, Blacks first
## Group B's mean is 1.227 SDs higher than Group A's.
ThresholdMean(.44, .85) #2017, ERW
## Group B's mean is 1.187 SDs higher than Group A's.
ThresholdMean(.21, .58) #2017, Math
## Group B's mean is 1.008 SDs higher than Group A's.
ThresholdMean(.16, .48) #2018, ERW + Math, Blacks first
## Group B's mean is 0.944 SDs higher than Group A's.
ThresholdMean(1 - .6, 1 - .24) #2018, 1 - Neither, Blacks first
## Group B's mean is 0.96 SDs higher than Group A's.
ThresholdMean(.37, .74) #2018, ERW
## Group B's mean is 0.975 SDs higher than Group A's.
ThresholdMean(.18, .5) #2018, Math
## Group B's mean is 0.915 SDs higher than Group A's.

The universalization of SAT testing in Rhode Island may somehow have made the White test-taking population relatively less selected, bringing the gap down a bit between 2016 and 2018 to roughly where it has since settled.

Post-Script IV - The Total Gap Now and at the Turn of the Millennium

It was mentioned above that in the year 2000, Michigan had a gap that was larger than the national average SAT score gap. Incidentally, we can now assess whether that is still true because national total scores are available. Moreover, we can naively assess trends in the gap by comparing the present one with the gap as far back as we have adequate score reports. As it happens, the oldest College Board report I could find was from two decades back in 2002. We can also cut this interval in half and look at the report for 2012.

Total Scores, 2022: https://web.archive.org/web/20221005221309/https://reports.collegeboard.org/media/pdf/2022-total-group-sat-suite-of-assessments-annual-report.pdf

Total Scores, 2012: https://web.archive.org/web/20220901040755/https://secure-media.collegeboard.org/digitalServices/pdf/research/TotalGroup-2012.pdf

Total Scores, 2002: https://web.archive.org/web/20211108145244/https://secure-media.collegeboard.org/digitalServices/pdf/research/cb-seniors-2002-TOTAL_GROUP_REPORT.pdf

The 2022 data is provided in terms of proportions of different groups in various ranges.

Cohensd((430 + 427), (527 + 533), (99 + 99), (100 + 103), 122684, 698659) #2002, Verbal and Mathematics
## With group means of 857 and 1060 with SDs of 198 and 203 Cohen's d is 1.012 Glass' Delta is 1.025 and Hedge's g is 1.004.
Cohensd((428 + 428 + 417), (527 + 536 + 515), (98 + 97 + 94), (103 + 103 + 103), 217656, 852144) #2012, Reading + Mathematics + Writing
## With group means of 1273 and 1578 with SDs of 289 and 309 Cohen's d is 1.019 Glass' Delta is 1.055 and Hedge's g is 1.
Cohensd((428 + 428), (527 + 536), (98 + 97), (103 + 103), 217656, 852144) #2012, Reading + Mathematics
## With group means of 856 and 1063 with SDs of 195 and 206 Cohen's d is 1.032 Glass' Delta is 1.062 and Hedge's g is 1.016.
#2022

BlacksT= c(.01, .07, .22, .45, .24, .01) #N = 201,645
WhitesT = c(.07, .24, .38, .26, .05, 0) #N = 732,946

pars = MethodOfThresholds(BlacksT, WhitesT) #DG = "Cd" for Cohen's d, which is .998
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksT), 
                         cumsum(WhitesT)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        0.979        0.963
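
For readers who want to see the mechanics behind a multiple-threshold fit, here is a rough sketch. It assumes an ordinary least-squares line through the probit-transformed cumulative fractions. MethodOfThresholds is defined elsewhere in this post and may fit and parametrize things differently, so probit_line_sketch is a hypothetical stand-in rather than a reimplementation.

```r
# Rough sketch (assumption: OLS on probit-transformed cumulative fractions).
probit_line_sketch <- function(propsA, propsB, eps = 1e-6) {
  cumA <- cumsum(propsA)
  cumB <- cumsum(propsB)
  # Drop points at (or numerically at) 0 or 1, where qnorm is infinite
  keep <- cumA > eps & cumA < 1 - eps & cumB > eps & cumB < 1 - eps
  zA <- qnorm(cumA[keep])
  zB <- qnorm(cumB[keep])
  fit <- lm(zB ~ zA)
  c(intercept = unname(coef(fit)[1]),  # mean gap in Group B SD units (normal model)
    slope     = unname(coef(fit)[2]),  # Group A SD relative to Group B SD
    n_points  = sum(keep))
}

BlacksT <- c(.01, .07, .22, .45, .24, .01)  # 2022 proportions from the chunk above
WhitesT <- c(.07, .24, .38, .26, .05, 0)
sketch_fit <- probit_line_sketch(BlacksT, WhitesT)
round(sketch_fit, 3)
```

If both distributions are normal, the points in diversity space fall on a straight line after the probit transform, with the intercept giving the mean gap and the slope giving the SD ratio. The intercept from this sketch should land near the Glass' Delta reported above, though the two need not agree exactly.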

A few conclusions stand out. First, Michigan’s gap is now very close to the national gap. Michigan’s gap is 1.044 SDs and the national gap is .979, for a .065 SD or .975 IQ point difference. In Cohen’s d terms, Michigan’s gap is 1.022 d and the national gap is .998 d, for a .36 IQ point difference, which shrinks both the Michigan-national difference and any possible narrowing of the actual gap over time even further. Michigan’s gap is therefore just slightly larger, and that slight difference is similar in size to the effect of requirements known to improve representativeness, as shown above: it falls somewhere between slightly smaller than and a bit larger than the representativeness effects already shown. Second, the gap in general has not changed very much, and the changes are, as noted, well within the range we could expect from better representation of different racial groups as testing has expanded. Third, practical transformations reveal how trivial the changes between 2002 and 2022 could have been, if taken at face value. Consider the transformation to the typical IQ metric:

15 * 1.004 #2002
## [1] 15.06
15 * 1     #2012, Writing included
## [1] 15
15 * 1.016 #2012, Writing excluded
## [1] 15.24
15 * .979  #2022
## [1] 14.685
15.06 - 14.685 #2002 - 2022
## [1] 0.375
1.004 - .979   #2002 - 2022 in SD terms
## [1] 0.025

A possible change of .025 SDs is very small, equivalent to a little over a third of an IQ point. In terms of d, the change was .014, or .21 IQ points. Next, consider the years 2016 through 2021.

#2021 - https://web.archive.org/web/20220923004412/https://reports.collegeboard.org/media/2022-04/2021-total-group-sat-suite-of-assessments-annual-report%20(1).pdf
BlacksT= c(.01, .08, .24, .43, .23, .01) #N = 168,454
WhitesT = c(.08, .26, .39, .23, .05, 0)  #N = 635,486

pars = MethodOfThresholds(BlacksT, WhitesT) #As Cohen's d, this is .999
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksT), 
                         cumsum(WhitesT)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        0.990        0.982
#2020 - https://web.archive.org/web/20220819015131/https://reports.collegeboard.org/media/pdf/2020-total-group-sat-suite-assessments-annual-report.pdf
BlacksT= c(.01, .07, .24, .44, .24, .01) #N = 261,326
WhitesT = c(.07, .24, .4, .23, .05, 0)   #N = 909,987

pars = MethodOfThresholds(BlacksT, WhitesT) #1.021
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksT), 
                         cumsum(WhitesT)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        1.007        0.974
#2019 - https://web.archive.org/web/20221005225229/https://reports.collegeboard.org/media/pdf/2019-total-group-sat-suite-assessments-annual-report.pdf
BlacksT= c(.01, .07, .25, .45, .22, 0)  #N = 271,178
WhitesT = c(.08, .26, .39, .22, .05, 0) #N = 947,842

pars = MethodOfThresholds(BlacksT, WhitesT) #1.029
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksT), 
                         cumsum(WhitesT)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        1.048        1.037
#2018 - https://web.archive.org/web/20220714204614/https://reports.collegeboard.org/media/pdf/2018-total-group-sat-suite-assessments-annual-report.pdf
BlacksT= c(.010, .07, .27, .45, .19, 0) #N = 263,318
WhitesT = c(.08, .27, .41, .21, .04, 0) #N = 930,825

pars = MethodOfThresholds(BlacksT, WhitesT) #1.041
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(cumsum(BlacksT), 
                         cumsum(WhitesT)), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        1.056        1.029
#2017 - https://web.archive.org/web/20221005225348/https://reports.collegeboard.org/media/pdf/2017-total-group-sat-suite-assessments-annual-report.pdf
ThresholdMean(.5, .85)  #1 - None; Black-N = 225,860; White-N = 760,362
## Group B's mean is 1.036 SDs higher than Group A's.
ThresholdMean(.2, .59)  #Both
## Group B's mean is 1.069 SDs higher than Group A's.
ThresholdMean(.49, .83) #ERW
## Group B's mean is 0.979 SDs higher than Group A's.
ThresholdMean(.22, .61) #Mathematics
## Group B's mean is 1.052 SDs higher than Group A's.
#2016 - https://web.archive.org/web/20220712213212/https://reports.collegeboard.org/media/pdf/2016-total-group-sat-suite-assessments-annual-report.pdf
Cohensd((430 + 425 + 415), (528 + 533 + 511), (102 + 101 + 97), (104 + 104 + 103), 199306, 742436) #Black-N = 199,306; White-N = 742,436
## With group means of 1270 and 1572 with SDs of 300 and 311 Cohen's d is 0.988 Glass' Delta is 1.007 and Hedge's g is 0.978.
Cohensd((430 + 425), (528 + 533), (102 + 101), (104 + 104), 199306, 742436)
## With group means of 855 and 1061 with SDs of 203 and 208 Cohen's d is 1.002 Glass' Delta is 1.015 and Hedge's g is 0.995.

These were consistent with a change with respect to 2022 of +.001 to -.09, or 0 to -.09 without writing. In IQ terms, that is a change over that period of +.015 to -1.35 points. Using Cohen’s d instead of Glass’ Delta, the gap changes were more muted, going from 1.002 d in 2016 to .999 d in 2021 and the aforementioned .998 in 2022, for a potential change over that period of .06 IQ points. There is basically nothing to mention here.
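
Since the comparison above turns on which effect size is used, here is a small sketch of how the three differ. The Cohensd function itself is defined elsewhere in this post, so effect_sizes_sketch is a hypothetical re-derivation. Its formulas are my assumption, chosen because they reproduce the printed 2016 Reading + Mathematics values.

```r
# Hypothetical re-derivation (not the post's Cohensd function).
effect_sizes_sketch <- function(m1, m2, sd1, sd2, n1, n2) {
  diff <- m2 - m1
  # Unweighted average of the two variances
  sd_avg <- sqrt((sd1^2 + sd2^2) / 2)
  # Sample-size-weighted pooled SD
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  c(cohens_d    = diff / sd_avg,
    glass_delta = diff / sd1,        # scaled by the first group's SD only
    hedges_g    = diff / sd_pooled)  # no small-sample correction applied
}

# 2016 Reading + Mathematics summary statistics quoted above, Blacks first
es <- effect_sizes_sketch(855, 1061, 203, 208, 199306, 742436)
round(es, 3)
```

Because the White N is several times the Black N, the pooled SD sits closer to the White SD, which is why g and d differ a little even when the two SDs are similar.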

Post-Script V - Aggregating Data from Different States

Data can also be aggregated across different sources with this method. For example, below, I have aggregated the 2022 data for Michigan, New York, California, and Georgia. These states were chosen because someone else asked for them. Unfortunately, plotting data from multiple sources requires either a few more steps, as shown in the MethodOfThresholds function, or the use of the function provided below. This can be added to the DiversitySpacePlot function on its own if desired. PSAT comparisons may also be interesting, and a comparison between all states, free SAT states, required SAT states, aggregated states, and the total might be neat.

New York: https://web.archive.org/web/20221007011627/https://reports.collegeboard.org/media/pdf/2022-new-york-sat-suite-of-assessments-annual-report.pdf

California: https://web.archive.org/web/20221007011628/https://reports.collegeboard.org/media/pdf/2022-california-sat-suite-of-assessments-annual-report.pdf

Georgia: https://web.archive.org/web/20221007011631/https://reports.collegeboard.org/media/pdf/2022-georgia-sat-suite-of-assessments-annual-report.pdf

BlackM = c(.00, .03, .13, .44, .38, .01) # Michigan
WhiteM = c(.04, .15, .34, .36, .11, .00)

BlackN = c(.01, .08, .26, .45, .2, .01) # New York
WhiteN = c(.1, .28, .41, .19, .02, 0)

BlackC = c(.03, .1, .25, .41, .2, .01) # California
WhiteC = c(.17, .34, .33, .13, .02, 0)

BlackG = c(.01, .08, .28, .48, .16, 0) # Georgia
WhiteG= c(.06, .26, .42, .24, .02, 0)

G1 = data.frame(BlackM, BlackN, BlackC, BlackG)
G2 = data.frame(WhiteM, WhiteN, WhiteC, WhiteG)

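PlotPrep <- function(A1, A2){
  # Column-wise maximum of a data frame
  colMax <- function(data) sapply(as.data.frame(data), max, na.rm = TRUE)
  # Cumulative fractions down each column, one column per source
  A1C = cumsum(A1); A2C = cumsum(A2)
  # Rescale each column by its maximum so every cumulative curve ends at 1
  A1M = mapply('/', A1C, colMax(A1C)); A2M = mapply('/', A2C, colMax(A2C))
  # Stack all sources into a 2-row matrix: row 1 = first group, row 2 = second group
  ADF = rbind(as.numeric(unlist(A1M)), as.numeric(unlist(A2M)))
  return(ADF)}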

ADF <- PlotPrep(G1, G2)

pars = MethodOfThresholds(G1, G2) #In Cohen's d terms, this is 1.107 d
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(ADF, 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars
## Glass' Delta     SD Ratio 
##        1.065        0.922
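
One detail in PlotPrep worth spelling out is the division by the column maximum. The reported proportions are rounded, so a column does not always sum to exactly 1. A minimal illustration using Michigan's Black proportions from above (the variable names here are mine):

```r
BlackM <- c(.00, .03, .13, .44, .38, .01)  # Michigan proportions from above

cum_raw <- cumsum(BlackM)               # ends slightly short of 1 due to rounding
cum_rescaled <- cum_raw / max(cum_raw)  # same rescaling idea as PlotPrep's colMax step

rbind(cum_raw, cum_rescaled)
```

Rescaling puts every source's cumulative curve on the same 0 to 1 footing before the points are pooled.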

Post-Script VI - States Requiring or Providing the SAT Freely

According to the website PrepScholar (https://web.archive.org/web/20220801175929/https://blog.prepscholar.com/which-states-require-the-sat), 20 states are contracted with the College Board to provide free SAT testing. In some of them, testing with the SAT is required. In others, the SAT, the ACT, or an alternative assessment can be required depending on the district and student choice, but in those states the SAT is, in any case, free either statewide or in specific districts. Here are the links to the data for these states, and the Black-White gaps. The male-female variance differences are also of considerable interest, and weighting this function by sample size could also be useful, but the utility will scale with \(\sqrt{N}\) and the high participation states besides West Virginia, New Hampshire, and Idaho have large sample sizes.
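
To give a feel for the \(\sqrt{N}\) scaling, here is a delta-method sketch of the sampling error of a probit-transformed pass rate. It treats each pass rate as a simple binomial proportion, which is my simplifying assumption, and probit_se is a hypothetical helper, not part of the code in this post. The 30% pass rate is illustrative. The two Ns are the New Hampshire and Illinois Black test-taker counts listed below.

```r
# Delta-method standard error of qnorm(p_hat) for a binomial proportion:
# SE(p_hat) divided by the normal density at the corresponding z-score.
probit_se <- function(p, n) {
  sqrt(p * (1 - p) / n) / dnorm(qnorm(p))
}

se_small <- probit_se(.30, 167)    # small sample
se_large <- probit_se(.30, 16255)  # large sample
c(se_small = se_small, se_large = se_large, ratio = se_small / se_large)
```

The ratio of the two standard errors equals the square root of the ratio of the sample sizes, so the small-sample states contribute much noisier points to an unweighted fit.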

# Colorado - Eligible, https://web.archive.org/web/20221008010827/https://reports.collegeboard.org/media/pdf/2022-colorado-sat-suite-of-assessments-annual-report.pdf, N's = 1,972 and 28,878. 84% participation.

BlackCol = c(.01, .07, .22, .46, .23, .01) 
WhiteCol = c(.06, .22, .37, .29, .06, 0)

# Connecticut - Required, https://web.archive.org/web/20221008010844/https://reports.collegeboard.org/media/pdf/2022-connecticut-sat-suite-of-assessments-annual-report.pdf, N's = 4,090 and 21,381. 89% participation.

BlackCon = c(.01, .04, .17, .44, .32, .01) 
WhiteCon = c(.07, .21, .35, .29, .08, 0)

# Delaware - Required, https://web.archive.org/web/20221008010918/https://reports.collegeboard.org/media/pdf/2022-delaware-sat-suite-of-assessments-annual-report.pdf, N's = 2,043 and 4,107. 95% participation. 

BlackDel = c(.01, .04, .18, .48, .29, .01) 
WhiteDel = c(.05, .17, .35, .35, .08, 0)

# District of Columbia - Eligible, https://web.archive.org/web/20221008010939/https://reports.collegeboard.org/media/pdf/2022-district-of-columbia-sat-suite-of-assessments-annual-report.pdf, N's = 2,174 and 928. >100% participation. Presumably some people took it here and did not graduate, and others took it here and did not graduate locally. 

BlackDC = c(.01, .06, .15, .43, .33, .02) 
WhiteDC = c(.31, .36, .2, .07, .05, 0) #Selection into living in D.C. is extreme among Whites

# Idaho - Eligible, https://web.archive.org/web/20221008011633/https://reports.collegeboard.org/media/pdf/2022-idaho-sat-suite-of-assessments-annual-report.pdf, N's = 208 and 11,315. 97% participation. 

BlackIda = c(0, .04, .23, .45, .27, 0) 
WhiteIda = c(.03, .18, .35, .34, .1, 0)

# Illinois - Required, https://web.archive.org/web/20221006165356/https://reports.collegeboard.org/media/pdf/2022-illinois-sat-suite-of-assessments-annual-report.pdf, N's = 16,255 and 57,803. 97% participation. 

BlackIll = c(.01, .04, .15, .46, .35, .01) 
WhiteIll = c(.06, .19, .35, .31, .09, 0)

# Indiana - Required, https://web.archive.org/web/20221008011719/https://reports.collegeboard.org/media/pdf/2022-indiana-sat-suite-of-assessments-annual-report.pdf, N's = 3,260 and 24,874. 48% participation. Requirement began in 2022, so should show up in 2023 scores. Data not used, but listed for interest.

#BlackInd = c(0, .05, .24, .48, .22, 0) 
#WhiteInd = c(.05, .25, .44, .23, .03, 0)

# Michigan - Required, https://web.archive.org/web/20221004183404/https://reports.collegeboard.org/media/pdf/2022-michigan-sat-suite-of-assessments-annual-report.pdf, N's = 9,980 and 62,358. 84% participation.

BlackMic = c(0, .03, .13, .44, .38, .01) 
WhiteMic = c(.04, .15, .34, .36, .11, 0)

# New Hampshire - Required, https://web.archive.org/web/20221005024207/https://reports.collegeboard.org/media/pdf/2022-new-hampshire-sat-suite-of-assessments-annual-report.pdf, N's = 167 and 6,359. 81% participation.

BlackNH = c(.07, .13, .31, .4, .1, 0) #Selection into living in New Hampshire is extreme among Blacks
WhiteNH = c(.07, .25, .41, .23, .04, 0)

# Ohio - Required, but ACT is also usable instead and, unfortunately, the ACT is primarily used instead of the SAT, which has low uptake, https://web.archive.org/web/20221008011928/https://reports.collegeboard.org/media/pdf/2022-ohio-sat-suite-of-assessments-annual-report.pdf, N's = 3,728 and 12,976. 18% participation.

BlackOhi = c(.01, .05, .17, .43, .33, .01)
WhiteOhi = c(.09, .26, .36, .23, .06, 0)

# Oklahoma - Required, but ACT required in some districts, and unfortunately for inference, more districts than the SAT is required in, https://web.archive.org/web/20221008012004/https://reports.collegeboard.org/media/pdf/2022-oklahoma-sat-suite-of-assessments-annual-report.pdf, N's = 1,083 and 3,303. 17% participation.

BlackOkl = c(0, .01, .12, .42, .44, .01) 
WhiteOkl = c(.03, .13, .31, .38, .14, 0)

# Rhode Island - Required, https://web.archive.org/web/20221005024155/https://reports.collegeboard.org/media/pdf/2022-rhode-island-sat-suite-of-assessments-annual-report.pdf, N's = 927 and 6,095. 93% participation. 

BlackRI = c(.01, .03, .15, .45, .35, .01) 
WhiteRI = c(.04, .16, .33, .36, .12, 0)

# South Carolina - Required, but ACT required in some districts, and unfortunately nearly as many as the SAT, https://web.archive.org/web/20221008012032/https://reports.collegeboard.org/media/pdf/2022-south-carolina-sat-suite-of-assessments-annual-report.pdf, N's = 4,741 and 15,356. 51% participation.

BlackSC = c(0, .04, .22, .52, .21, 0) 
WhiteSC = c(.05, .22, .4, .29, .04, 0)

# Tennessee - Required, but ACT is also usable instead and, unfortunately, the ACT is primarily used instead of the SAT, which has low uptake, https://web.archive.org/web/20221008012048/https://reports.collegeboard.org/media/pdf/2022-tennessee-sat-suite-of-assessments-annual-report.pdf, N's = 297 and 2,149. 5% participation.

BlackTen = c(.03, .18, .43, .32, .04, 0) 
WhiteTen = c(.16, .39, .35, .1, 0, 0) 

#Both the Black and White samples are extremely selected in Tennessee because most people elect to take the ACT instead 

# West Virginia - Required unless taking the West Virginia Alternative Summer Assessment, https://web.archive.org/web/20221008012116/https://reports.collegeboard.org/media/pdf/2022-west-virginia-sat-suite-of-assessments-annual-report.pdf, N's = 489 and 10,436. 84% participation.

BlackWV = c(0, .03, .16, .53, .27, 0) 
WhiteWV = c(.01, .09, .3, .44, .16, 0)

G1 = data.frame(BlackCol, BlackCon, BlackDel, BlackDC, BlackIda, BlackIll, #BlackInd, 
                BlackMic, BlackNH, BlackOhi, BlackOkl, BlackRI, BlackSC, BlackTen, BlackWV)
G2 = data.frame(WhiteCol, WhiteCon, WhiteDel, WhiteDC, WhiteIda, WhiteIll, #WhiteInd, 
                WhiteMic, WhiteNH, WhiteOhi, WhiteOkl, WhiteRI, WhiteSC, WhiteTen, WhiteWV)

ADF <- PlotPrep(G1, G2)

pars = MethodOfThresholds(G1, G2) #In Cohen's d terms this is .969 d
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(ADF, 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars; pars[1]*15
## Glass' Delta     SD Ratio 
##        0.985        1.034
## Glass' Delta 
##       14.775

And with just the high-participation states from the list above, the data look like this:

G1 = data.frame(BlackCol, BlackCon, BlackDel, BlackDC, BlackIda, BlackIll, BlackMic, BlackNH, BlackRI, BlackWV)
G2 = data.frame(WhiteCol, WhiteCon, WhiteDel, WhiteDC, WhiteIda, WhiteIll, WhiteMic, WhiteNH, WhiteRI, WhiteWV)

ADF <- PlotPrep(G1, G2)

pars = MethodOfThresholds(G1, G2) #In Cohen's d terms this is .932 d
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(ADF, 
                   pars[1], pars[2],
                   xlabel = "Fraction of Blacks Achieving Threshold",
                   ylabel = "Fraction of Whites Achieving Threshold")
grid()

pars; pars[1]*15
## Glass' Delta     SD Ratio 
##        0.946        1.030
## Glass' Delta 
##        14.19

Post-Script VII - La Griffe’s Results

I pulled La Griffe Du Lion’s Black-White (Figure 4) and male-female (Figure 5) data from his original post, linked above. Since I am reading the data off the charts, the values are approximate and will be slightly off. La Griffe’s original Black-White data indicated a mean difference of 1.09 SDs and a variance ratio of .888. The male-female data indicated a mean difference of .162 SDs and a variance ratio of .916. Below, I got 1.08 (variance ratio .854) for the Black-White data and .169 (variance ratio .939) for the male-female data. In Cohen’s d terms, the gaps La Griffe observed were -1.117 for Black-White differences and -.172 for Female-Male differences.

White <- c(.10, .26, .40, .58, .73, .81, .92)
Black <- c(.01, .02, .08, .16, .31, .44, .62)

pars = MethodOfThresholds(White, Black, sum = F)
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(White, 
                         Black), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Whites Achieving Threshold",
                   ylabel = "Fraction of Blacks Achieving Threshold")
grid()

pars; pars[2]^2
## Glass' Delta     SD Ratio 
##       -1.075        0.924
## SD Ratio 
## 0.853776
Male   <- c(.01, .12, .23, .25, .30, .40, .49, .58, .60, .72, .75, .74, .87, .90, .95)
Female <- c(.01, .08, .18, .18, .25, .32, .43, .51, .53, .66, .69, .71, .84, .87, .93)

pars = MethodOfThresholds(Male, Female, sum = F)
par(mar = c(4, 4, 1.5, 0.8), bg = "#FBEEE6")
DiversitySpacePlot(rbind(Male, 
                         Female), 
                   pars[1], pars[2],
                   xlabel = "Fraction of Males Achieving Threshold",
                   ylabel = "Fraction of Females Achieving Threshold")
grid()

pars; pars[2]^2
## Glass' Delta     SD Ratio 
##       -0.169        0.969
## SD Ratio 
## 0.938961
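
The Cohen's d figures quoted in the text can be recovered from the fitted Glass' Delta and SD ratio. The sketch below is my assumption about the conversion, not code from this post: it supposes d uses the unweighted average of the two variances and that Delta is expressed in units of the SD that the ratio is taken relative to. The helper delta_to_d is hypothetical.

```r
# Hypothetical helper: Glass' Delta and SD ratio -> Cohen's d
# (d scaled by the square root of the unweighted mean of the two variances).
delta_to_d <- function(delta, sd_ratio) {
  delta / sqrt((1 + sd_ratio^2) / 2)
}

d_bw <- delta_to_d(-1.075, 0.924)  # Black-White fit above
d_mf <- delta_to_d(-0.169, 0.969)  # male-female fit above
round(c(black_white = d_bw, male_female = d_mf), 3)
```

With the rounded parameters printed above, this gives -1.117 and -.172, in line with the values quoted in the text.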

References

Ranking the States by the Black-White SAT Scoring Gap. (2000). The Journal of Blacks in Higher Education, 29, 91–95. https://doi.org/10.2307/2678852

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.28   R6_2.5.1        jsonlite_1.7.2  magrittr_2.0.1 
##  [5] evaluate_0.14   highr_0.9       rlang_0.4.12    stringi_1.7.5  
##  [9] jquerylib_0.1.4 bslib_0.3.1     rmarkdown_2.11  tools_4.1.2    
## [13] stringr_1.4.0   xfun_0.27       yaml_2.2.1      fastmap_1.1.0  
## [17] compiler_4.1.2  htmltools_0.5.2 knitr_1.36      sass_0.4.0