library(pacman)
p_load(psych, dplyr)  # psych supplies describe(), used below
# Mean gap (in common-SD units) implied by the shares of two normal groups passing the same cutoff
ThresholdMean <- function(FracA, FracB, rnd = 3){
  Gap = qnorm(FracA) - qnorm(FracB)
  if (Gap >= 0) cat(paste0("Group A's mean is ", round(Gap, rnd), " SDs higher than Group B's. \n"))
  else cat(paste0("Group B's mean is ", abs(round(Gap, rnd)), " SDs higher than Group A's. \n"))
}
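# Standardized mean differences for Group 2 relative to Group 1 (Group 1 is the reference group)
Cohensd <- function(M1, M2, SD1, SD2, N1 = 1, N2 = 1, rnd = 3){
  SDP = sqrt((SD1^2 + SD2^2)/2)  # root mean of the two variances (Cohen's d)
  d = (M2 - M1)/SDP
  delta = (M2 - M1)/SD1          # Glass' Delta standardizes by the reference group's SD
  if (N1 <= 1 || N2 <= 1) {      # without sample sizes, the df-weighted pooled SD is undefined
    cat(paste0("With group means of ", M1, " and ", M2, " with SDs of ", SD1, " and ", SD2, ", Cohen's d is ", round(d, rnd), " Glass' Delta is ", round(delta, rnd), ". \n"))
  } else {
    SDPW = sqrt(((N1 - 1) * SD1^2 + (N2 - 1) * SD2^2)/(N1 + N2 - 2))  # df-weighted pooled SD
    g = (M2 - M1)/SDPW             # Hedges' g (without the small-sample correction)
    cat(paste0("With group means of ", M1, " and ", M2, " with SDs of ", SD1, " and ", SD2, ", Cohen's d is ", round(d, rnd), ", Glass' Delta is ", round(delta, rnd), ", and Hedges' g is ", round(g, rnd), ". \n"))
  }
}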
# Kelley's formula: shrink an observed score toward the group mean in proportion to unreliability
TrueScore <- function(Score, Mean, Reliability, rnd = 3){
  True = Reliability * Score + (1 - Reliability) * Mean
  cat(paste0("The estimated true score for an individual from a group with a mean of ", Mean, " who took a test with a reliability of ", Reliability, " and earned a score of ", Score, " is ", round(True, rnd), ". \n"))
}
When two groups are measured on a normally distributed trait such as height or intelligence, and members of both groups are selected by whether they reach the same threshold, the proportions who pass let us infer the difference in group means, provided the groups have equal variances. For example, say 40% of Group A meet the height threshold to ride a rollercoaster, but only 30% of Group B do. Thus:
ThresholdMean(0.4, 0.3)
## Group A's mean is 0.271 SDs higher than Group B's.
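The function rests on a short derivation. Assuming both groups are normal with a common SD $\sigma$ and face the same cutoff $T$, each pass rate pins down how far that group's mean sits from the cutoff, and the cutoff then cancels:

$$
p_A = 1 - \Phi\!\left(\frac{T - \mu_A}{\sigma}\right) = \Phi\!\left(\frac{\mu_A - T}{\sigma}\right)
\quad\Rightarrow\quad
\Phi^{-1}(p_A) = \frac{\mu_A - T}{\sigma}, \qquad
\Phi^{-1}(p_B) = \frac{\mu_B - T}{\sigma},
$$

$$
\frac{\mu_A - \mu_B}{\sigma} = \Phi^{-1}(p_A) - \Phi^{-1}(p_B),
$$

which is `qnorm(FracA) - qnorm(FracB)`. For the example, $\Phi^{-1}(0.4) \approx -0.253$ and $\Phi^{-1}(0.3) \approx -0.524$, giving 0.271. If the two SDs differ, $T$ no longer cancels, which is why the relationship breaks down under unequal variances.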
Now, what if Group A is one SD taller than Group B, and members of either group must be at least one SD above Group A's mean to qualify for the rollercoaster? That cutoff is one SD above Group A's mean but two SDs above Group B's, so:
ThresholdMean(pnorm(1, lower.tail = F), pnorm(2, lower.tail = F))
## Group A's mean is 1 SDs higher than Group B's.
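As a rough check on sampled rather than population pass rates, here is a minimal Monte Carlo sketch, assuming both groups are normal with a common SD of 15, means of 100 and 85 (a one-SD gap), and 2,000 members each. The estimate should center on the true gap of 1 SD, but its spread should grow when the cutoff sits further into a group's tail, where fewer people pass. The helper `sim_gap()` and all parameter values are hypothetical.

```r
set.seed(2)
# Hypothetical helper: estimate the A-B gap (in SD units) from pass rates at `cutoff`,
# repeated `reps` times with fresh samples of size `n` from each group
sim_gap <- function(cutoff, n = 2000, reps = 2000) {
  replicate(reps, {
    a <- rnorm(n, mean = 100, sd = 15)  # Group A
    b <- rnorm(n, mean = 85, sd = 15)   # Group B: one SD lower, same SD
    qnorm(mean(a >= cutoff)) - qnorm(mean(b >= cutoff))
  })
}
at_100 <- sim_gap(100)  # cutoff at A's mean, one SD above B's
at_115 <- sim_gap(115)  # cutoff one SD above A's mean, two SDs above B's
round(c(mean = mean(at_100), sd = sd(at_100)), 3)
round(c(mean = mean(at_115), sd = sd(at_115)), 3)
```

Both means should land close to 1, with a noticeably larger spread at the 115 cutoff (a delta-method approximation puts the SDs at roughly 0.04 and 0.07).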
Clearly, pass rates can be interesting and informative. However, if the variances are unequal, thresholds no longer correctly indicate group mean differences. Suppose Group A has a mean of 100 and an SD of 15, while Group B has a mean of 85 and an SD of only 10 (two-thirds of Group A's). The cutoff of 115 is still one SD above Group A's mean, but it is now three of Group B's SDs above Group B's mean. This can be shown easily:
ThresholdMean(pnorm(1, lower.tail = F), pnorm(3, lower.tail = F))
## Group A's mean is 2 SDs higher than Group B's.
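which is wrong, because the gap is really 15 points: one SD in Group A's units (Glass' Delta) and about 1.18 pooled SDs (Cohen's d; both are negative here because the function subtracts the first mean from the second):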
Cohensd(100, 85, 15, 10)
## With group means of 100 and 85 with SDs of 15 and 10, Cohen's d is -1.177 Glass' Delta is -1.
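To see why the 2 SD answer is not a mean difference at all, here is a minimal sketch that computes population pass rates for the same hypothetical groups (A ~ N(100, 15), B ~ N(85, 10)) at several cutoffs and converts them to an implied gap exactly as `ThresholdMean()` does. The helper `implied_gap()` is hypothetical; it returns the value instead of printing it.

```r
# Hypothetical helper: the gap implied by pass rates, as in ThresholdMean(), but returned
implied_gap <- function(pA, pB) qnorm(pA) - qnorm(pB)

# Population pass rates for A ~ N(100, 15) and B ~ N(85, 10) at several cutoffs
cutoffs <- c(85, 100, 115, 130)
pA <- pnorm(cutoffs, mean = 100, sd = 15, lower.tail = FALSE)
pB <- pnorm(cutoffs, mean = 85, sd = 10, lower.tail = FALSE)
gaps <- implied_gap(pA, pB)
data.frame(cutoff = cutoffs, implied_gap = round(gaps, 3))
```

With equal SDs the implied gap would be the same at every cutoff. Here it runs 1, 1.5, 2 and 2.5 as the cutoff rises, because the quantity being reported is (T - 85)/10 - (T - 100)/15, which depends on the cutoff T rather than only on the means.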
Put another way, the threshold-mean relationship holds only under equal variances: without knowing the groups' SDs, the pass rates at a single threshold cannot recover the correct mean difference. The estimates should also become less accurate the further the threshold lies from a given group's mean, because fewer observations fall in that part of the tail.
Even so, thinking about what differences in threshold pass rates imply about mean differences, with or without equal variances, is useful. For example, it can help in understanding findings like those of Friedman, Laurison and Miles (2015). Those authors found that, among individuals in eight different occupations, those whose origins were in the higher classes had higher incomes. They gave this a sociological explanation, as is probably their wont as sociologists. An alternative they neglected was that people from different class backgrounds differ in relevant traits, and that a threshold (hard, soft, or somewhere in between) lets these preexisting differences persist above it. Simulated data can help to illustrate this phenomenon.
set.seed(1)
# Simulated trait scores for three SES groups, 15 points (one SD) apart
data <- data.frame(LowSES = rnorm(1e5, mean = 85, sd = 15),
                   ModerateSES = rnorm(1e5, mean = 100, sd = 15),
                   HighSES = rnorm(1e5, mean = 115, sd = 15))
# Apply a hard threshold at 115: scores below it are set to NA
data$LowThreshold <- ifelse(data$LowSES >= 115, data$LowSES, NA)
data$ModerateThreshold <- ifelse(data$ModerateSES >= 115, data$ModerateSES, NA)
data$HighThreshold <- ifelse(data$HighSES >= 115, data$HighSES, NA)
describe(data)
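The upward shift above the cutoff can also be computed without simulation. For a normal variable with mean μ and SD σ, the mean above a cutoff c is μ + σφ(a)/(1 - Φ(a)), with a = (c - μ)/σ (the inverse Mills ratio). Below is a minimal sketch using a hypothetical helper, `mean_above()`. Its results (about 120.60, 122.88 and 126.97) should sit close to 120.47, 122.91 and 126.91, which appear to be the simulated threshold-group means used in the true-score calculations that follow.

```r
# Hypothetical helper: expected mean of a normal variable above a cutoff,
# E[X | X >= cutoff] = mu + sigma * dnorm(a) / (1 - pnorm(a)), where a = (cutoff - mu) / sigma
mean_above <- function(cutoff, mu, sigma) {
  a <- (cutoff - mu) / sigma
  mu + sigma * dnorm(a) / pnorm(a, lower.tail = FALSE)
}
# Group means from the simulation above (all SDs 15), cutoff at 115
expected <- sapply(c(Low = 85, Moderate = 100, High = 115),
                   function(mu) mean_above(115, mu, 15))
round(expected, 2)
```

The formula makes the mechanism explicit: the closer a group's mean is to the cutoff, the more of its selected members lie well beyond it.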
Individuals from the high-SES group score better on the attainment criterion after the threshold has been applied because their distribution is centered higher, so more of the people who clear the cutoff lie far beyond it. If other helpful traits are also higher in the high-SES group, the overall effect can be larger than a single attainment criterion would show. And because averages are sensitive to the tails, small differences above a threshold can already matter more than the means of the threshold-selected criterion might suggest. Moreover, with measurement error, high scorers from lower-scoring groups are more likely to have cleared the threshold partly by error, so regressing their scores toward their group means yields larger estimated true differences above the threshold. For example, with a test of 0.85 reliability, the estimated true scores for the threshold-selected means of the groups above would be
TrueScore(120.47, 85, 0.85)
## The estimated true score for an individual from a group with a mean of 85 who took a test with a reliability of 0.85 and earned a score of 120.47 is 115.15.
TrueScore(122.91, 100, 0.85)
## The estimated true score for an individual from a group with a mean of 100 who took a test with a reliability of 0.85 and earned a score of 122.91 is 119.474.
TrueScore(126.91, 115, 0.85)
## The estimated true score for an individual from a group with a mean of 115 who took a test with a reliability of 0.85 and earned a score of 126.91 is 125.123.
In other words, with realistic measurement error, the apparent mean gap above the threshold between the low- and high-SES groups grows from about 6.4 points (126.91 - 120.47) to about 10 (125.12 - 115.15). So, given that the traits preceding earned income already differ across classes (see, e.g., Belsky et al., 2016; Belsky et al., 2018), it is quite clear that, unless selection actively works against them, differences will persist systematically within occupations filled by people from initially different classes. To answer whether there is even a need to theorize sociologically about the causes of such differences, one could run a study like Nettle's (2003): take a measure of a cause of income and test whether it predicts income equally well across classes, and whether there are intercept differences. Imperfect reliability will leave some residual favoring the higher-scoring groups, but that can be handled in a variety of ways.
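The residual left by unreliability can be seen in a minimal simulation sketch. It assumes a hypothetical data-generating model in which income depends only on a true trait that differs by class (means 85 and 115, SD 15, as in the simulation above), and the trait is measured with 0.85 reliability. Regressing income on the measured trait and class leaves a positive class "intercept difference" even though class has no direct effect; using the true trait removes it. All variable names and parameter values are illustrative assumptions.

```r
set.seed(3)
n   <- 5e4   # people per class (hypothetical)
rel <- 0.85  # assumed within-class reliability of the measured trait
class_high <- rep(c(0, 1), each = n)  # 0 = lower-class origin, 1 = higher-class origin
# True trait differs by class (means 85 vs 115, SD 15)
true_trait <- rnorm(2 * n, mean = ifelse(class_high == 1, 115, 85), sd = 15)
# Error SD chosen so that var(true) / var(observed) = rel within each class
err_sd   <- 15 * sqrt((1 - rel) / rel)
observed <- true_trait + rnorm(2 * n, sd = err_sd)
# Assumed model: income depends only on the true trait, never directly on class
income <- 0.5 * true_trait + rnorm(2 * n, sd = 10)

fit_obs  <- lm(income ~ observed + class_high)    # trait measured with error
fit_true <- lm(income ~ true_trait + class_high)  # trait measured perfectly
b_obs  <- unname(coef(fit_obs)["class_high"])
b_true <- unname(coef(fit_true)["class_high"])
round(c(observed_trait = b_obs, true_trait = b_true), 2)
```

Under these assumptions the spurious class coefficient should be near 0.5 × (1 - 0.85) × (115 - 85) = 2.25, while the coefficient with the true trait should be near 0. Correcting for unreliability, for example with latent-variable or errors-in-variables models, is one way to handle this residual.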
Anyway, threshold selection has interesting implications, and the fact that groups have passed the same threshold (hard or soft) should not be taken to mean that they no longer differ.
Friedman, S., Laurison, D., & Miles, A. (2015). Breaking the ‘Class’ Ceiling? Social Mobility into Britain’s Elite Occupations. The Sociological Review, 63(2), 259–289. https://doi.org/10.1111/1467-954X.12283
Belsky, D. W., Moffitt, T. E., Corcoran, D. L., Domingue, B., Harrington, H., Hogan, S., Houts, R., Ramrakha, S., Sugden, K., Williams, B. S., Poulton, R., & Caspi, A. (2016). The Genetics of Success: How Single-Nucleotide Polymorphisms Associated With Educational Attainment Relate to Life-Course Development. Psychological Science, 27(7), 957–972. https://doi.org/10.1177/0956797616643070
Belsky, D. W., Domingue, B. W., Wedow, R., Arseneault, L., Boardman, J. D., Caspi, A., Conley, D., Fletcher, J. M., Freese, J., Herd, P., Moffitt, T. E., Poulton, R., Sicinski, K., Wertz, J., & Harris, K. M. (2018). Genetic analysis of social-class mobility in five longitudinal studies. Proceedings of the National Academy of Sciences, 115(31), E7275–E7284. https://doi.org/10.1073/pnas.1801238115
Nettle, D. (2003). Intelligence and class mobility in the British population. British Journal of Psychology, 94(4), 551–561. https://doi.org/10.1348/000712603322503097