Setup

library(pacman); p_load(ggplot2, posc)

my_theme <- function(){
  theme(
    legend.background = element_rect(colour = "black", linewidth = 0.5),
    legend.key.width = unit(0.5, "cm"),
    legend.key.height = unit(0.4, "cm"),
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(colour ="gray80",linewidth = 1,linetype="dashed"),
    panel.grid.minor.y = element_blank(),
    axis.text.x = element_text(colour ="black",size=rel(1.5),hjust=0.5),
    axis.text.y = element_text(colour ="black",size=rel(1.5),vjust=0.5),
    axis.title.x = element_text(colour ="black",size=rel(1.2)),
    axis.title.y = element_text(colour ="black",size=rel(1.2)),
    plot.margin=unit(c(1,0.5,0.5,0.5), "cm"),
    plot.title=element_text(hjust=0.5,vjust=5,size=rel(1.5),face="bold"),
    plot.caption=element_text(hjust=0,size=rel(1.2)),
    axis.line.x=element_line(color="gray80",linewidth = 1,linetype="dashed"),
    axis.line.y=element_line(color="gray80",linewidth = 1,linetype="dashed"),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_blank())}

Rationale

Rosenthal’s Binomial Effect Size Display (BESD) is a simple way to quickly understand how well a correlation indicates how well something works for selecting among options. There are linearity and dependence assumptions to use the BESD, but it is still very interesting. I have written about it before, here: https://rpubs.com/JLLJ/BESD

In 2012, Ruscio & Mullen provided a means to compute confidence intervals for a related measure: probability of superiority. The probability of superiority is a great effect size since it allows visualizing the effect of selection with some variable continuously (see also Ruscio & Gera, 2013). So, instead of simply knowing that person 1 has a lower level of trait X that correlates at Y with outcome Z compared to person 2, we can visualize the effect of the size of the difference between the two, and this method just requires monotonicity.

In 2022, an R package for calculating the probability of superior was introduced, RProbSup. Now, PhD student Matthew Jané has introduced an R package for visualizing the probability of superiority. It’s a good package, and I’ll attach his examples and links to his code at the end. I also linked to code for expectancy plots, since they are useful for this sort of thing too.

This post shows how to translate Jané’s plots to ggplot2 format and illustrates their use with some of the correlations in Strenze’s (2007) IQ-SES meta-analysis.

Analysis

Here’s translating a correlation POSC to ggplot.

P1 = posc_uni(r = .5, n = 400)
POSCDF <- data.frame(
  "X"  = P1$predict_matrix$delta_x,
  "PY" = P1$predict_matrix$py,
  "pyUCI" = P1$predict_matrix$pyUCI,
  "pyLCI" = P1$predict_matrix$pyLCI)

ggplot(POSCDF, aes(x = X, y = PY)) +
  geom_line(color = "steelblue", linewidth = 1.2) +
  geom_ribbon(aes(ymin = pyLCI, ymax = pyUCI), alpha = 0.2, fill = "steelblue", color = "steelblue") +
  labs(x = "Difference in Predictor (SDs)", y = "Probability of Superiority") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0)) +
  my_theme() + 
  labs(
    x = "Predictor Difference (SDs)",
    y = "Probability of Superiority",
    title = "Probability of Superiority Curves Show How Differences Translate to the Probability of Performing Better")

Here’s translating an LM to ggplot.

x1 = rnorm(200,0,.5)
x2 = rnorm(200,0,.5)
y = x1 + x2 + rnorm(200,0,1)

mdl = lm(y ~ x1 + x2)

P2 = posc_lm(mdl, color = 'darkgreen')
POSCDF <- data.frame(
  "X"  = P2$predict_matrix$delta_x,
  "PY" = P2$predict_matrix$py,
  "pyUCI" = P2$predict_matrix$pyUCI,
  "pyLCI" = P2$predict_matrix$pyLCI)

ggplot(POSCDF, aes(x = X, y = PY)) +
  geom_line(color = "darkgreen", linewidth = 1.2) +
  geom_ribbon(aes(ymin = pyLCI, ymax = pyUCI), alpha = 0.2, fill = "darkgreen", color = "darkgreen") +
  labs(x = "Difference in Predictor (SDs)", y = "Probability of Superiority") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0)) +
  my_theme() + 
  labs(
    x = "Predictor Difference (SDs)",
    y = "Probability of Superiority",
    title = "Probability of Superiority Curves Show How Differences Translate to the Probability of Performing Better")

Unfortunately, the function posc_comp() currently only takes up two two curves. Here’s code to make this fit for an arbitrary number of curves. All you have to do is supply a list of the curves and the colors you want them to use if you use more than five colors.

posc_comp_ex <- function(posc_list, colors = c('red', 'blue', 'purple', 'green', 'orange')){

  options(warn=-1)

  if (length(posc_list) > length(colors)) {
    stop("Please provide a color for each posc object in the list")
  }

  posc_mdl = c()
  posc_preds = c()
  z = c()
  n = c()

  for (i in seq_along(posc_list)) {
    posc_mdl = c(posc_mdl, rep(as.character(i),nrow(posc_list[[i]]$predict_matrix)))
    posc_preds = rbind(posc_preds, posc_list[[i]]$predict_matrix)
    z = c(z, atanh(posc_list[[i]]$parameters$r))
    n = c(n, posc_list[[i]]$parameters$n)
  }

  df = cbind(posc_mdl, posc_preds)

  z_combinations = combn(z, 2)
  n_combinations = combn(n, 2)

  z_diff = apply(z_combinations, 2, function(x) x[1] - x[2])
  z_se = apply(n_combinations, 2, function(x) sqrt((1 / (x[1] - 3)) + (1 / (x[2] - 3))))
  z_stat = round(z_diff / z_se, 3)
  p_value = round(2*(1-pnorm(abs(z_stat))),4)

  main_plot = ggplot(data=df, aes(x=delta_x,y=py,group=posc_mdl, color = posc_mdl,fill = posc_mdl)) +
    geom_line(linewidth=1.5) +
    coord_cartesian(ylim=c(.5,1),xlim = c(0,4)) +
    theme_light() +
    geom_ribbon(data=NULL,aes(x = delta_x,ymin = pyLCI,ymax = pyUCI),alpha=.15,color=NA) +
    scale_color_manual(values = colors) +
    scale_fill_manual(values = colors) +
    theme(axis.text.x = element_text(size=11),
        axis.text.y = element_text(size=11),
        axis.title.x = element_text(size=12),
        axis.title.y = element_text(size=12),
        aspect.ratio = .8) +
    ylab('Probability of Higher Outcome') +
    xlab('Difference in Predictor (SD)') +
    ggtitle('Probability of Outcome Superiority Curve (POSC)')

   stats = data.frame(r = z,
                      n = n)

   print(stats)
   print(main_plot)

   invisible(list(stats = stats,
                  posc_plot_object = main_plot))

}

Now you probably don’t want to have to go through every POSC object to get out the parameters. Here’s a function that will automate it:

POSCExtractor <- function(posc_obj, df_name = "POSCDF") {
  if (!is.null(posc_obj$predict_matrix)) {
    df <- data.frame(
      "X"  = posc_obj$predict_matrix$delta_x,
      "PY" = posc_obj$predict_matrix$py,
      "pyUCI" = posc_obj$predict_matrix$pyUCI,
      "pyLCI" = posc_obj$predict_matrix$pyLCI)
  } else if (!is.null(posc_obj$posc_plot_object)) {
    df <- data.frame(
      "Group" = posc_obj$posc_plot_object$data$posc_mdl,
      "X"     = posc_obj$posc_plot_object$data$delta_x,
      "PY"    = posc_obj$posc_plot_object$data$py,
      "pyUCI" = posc_obj$posc_plot_object$data$pyUCI,
      "pyLCI" = posc_obj$posc_plot_object$data$pyLCI)
  } else {
    stop("Invalid posc object")
  }
  assign(df_name, df, envir = .GlobalEnv)
  return(df)}

This would recreate the first plot with that function:

POSCExtractor(P1, "My_DF")

ggplot(My_DF, aes(x = X, y = PY)) +
  geom_line(color = "steelblue", linewidth = 1.2) +
  geom_ribbon(aes(ymin = pyLCI, ymax = pyUCI), alpha = 0.2, fill = "steelblue", color = "steelblue") +
  labs(x = "Difference in Predictor (SDs)", y = "Probability of Superiority") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0)) +
  my_theme() + 
  labs(
    x = "Predictor Difference (SDs)",
    y = "Probability of Superiority",
    title = "Probability of Superiority Curves Show How Differences Translate to the Probability of Performing Better")

Onto Strenze’s results.

BestEdu <- posc_uni(r = 0.49, n = 26504)
BestOcc <- posc_uni(r = 0.41, n = 43304)
BestInc <- posc_uni(r = 0.22, n = 29152)

First, here are two uses to show that you can use the default colors or choose different ones with the extended posc_comp function, posc_comp_ex.

posc_comp_ex(list(BestEdu, BestOcc, BestInc))
posc_comp_ex(list(BestEdu, BestOcc, BestInc), colors = c('purple', 'green', 'orange'))

Second, here’s the plot in ggplot.

EDOCIN = posc_comp_ex(list(BestEdu, BestOcc, BestInc))
POSCExtractor(EDOCIN, "My_DF")
ggplot(My_DF, aes(x = X, y = PY,
                   ymin = pyLCI, ymax = pyUCI,
                   color = Group, fill = Group)) +
  geom_line(linewidth = 1.2) +
  geom_ribbon(aes(ymin = pyLCI, ymax = pyUCI), alpha = 0.2) +
  labs(x = "Difference in Predictor (SDs)", y = "Probability of Superiority") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0)) +
  my_theme() + 
  scale_color_manual(values = c("steelblue", "orangered", "darkgreen"),
                     labels = c("Education", "Occupation", "Income")) + 
  scale_fill_manual(values = c("steelblue", "orangered", "darkgreen"),
                     labels = c("Education", "Occupation", "Income")) +
  labs(
    color = NULL,
    fill = NULL,
    x = "Predictor Difference (SDs)",
    y = "Probability of Superiority",
    title = "The Relationship Between IQ and Education, Occupation, and Income, Illustrated With POSCs") +
  theme(
    legend.position = c(.95, .07),
    legend.background = element_blank(),
    legend.title=element_text(color = "black", size = 13),
    legend.text=element_text(color = "black", size = 13),
    legend.key.size = unit(.5, "cm"),
    legend.key.height = unit(.5, "cm"))

References

Ruscio, J., & Mullen, T. (2012). Confidence Intervals for the Probability of Superiority Effect Size Measure and the Area Under a Receiver Operating Characteristic Curve. Multivariate Behavioral Research, 47(2), 201–223. https://doi.org/10.1080/00273171.2012.658329

Ruscio, J., & Gera, B. L. (2013). Generalizations and Extensions of the Probability of Superiority Effect Size Estimator. Multivariate Behavioral Research, 48(2), 208–219. https://doi.org/10.1080/00273171.2012.738184

RPobSup: https://web.archive.org/web/20230712202845/https://joss.tcnj.edu/wp-content/uploads/sites/176/2022/04/2022-Ortelli-Psychology.pdf, https://web.archive.org/web/20230712202830/https://cran.r-project.org/web/packages/RProbSup/RProbSup.pdf

POSC: https://web.archive.org/web/20230712201748/https://github.com/MatthewBJane/posc, https://web.archive.org/web/20230712201805/https://github.com/MatthewBJane/posc/blob/main/R/globals.R, https://web.archive.org/web/20230712201820/https://github.com/MatthewBJane/posc/blob/main/R/posc_comp.R, https://web.archive.org/web/20230712201836/https://github.com/MatthewBJane/posc/blob/main/R/posc_lm.R, https://web.archive.org/web/20230712201905/https://github.com/MatthewBJane/posc/blob/main/R/posc_multi.R, https://web.archive.org/web/20230712201852/https://github.com/MatthewBJane/posc/blob/main/R/posc_uni.R.

Expectancy Curves: https://web.archive.org/web/20230712203232/https://github.com/AJThurston/expectancy

Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of longitudinal research. Intelligence, 35(5), 401–426. https://doi.org/10.1016/j.intell.2006.09.004

Matthew’s Examples

r = .5
n = 100
posc_uni(r, n)

x1 = rnorm(200,0,.5)
x2 = rnorm(200,0,.5)
y = x1 + x2 + rnorm(200,0,1)

mdl = lm(y ~ x1 + x2)

posc_lm(mdl, color = 'darkgreen')

r_mat = rbind(X1 = c(1.0,  0.1,  0.3),
                        X2 = c(0.1,  1.0,  0.2),
                         Y  = c(0.3,  0.2,  1.0))
n = 200

posc_multi(r_mat = r_mat, n = n, outcome_idx = 1)

posc_multi(r_mat = r_mat, n = n, outcome_idx = 2)

posc_multi(r_mat = r_mat, n = n, outcome_idx = 3)

P1 = posc_uni(r = .5, n = 400)

P2 = posc_uni(r = .3, n = 400)

posc_comp(P1,P2)
##     r   n     z     p
## 1 0.5 400            
## 2 0.3 400 3.378 7e-04

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] posc_0.1.0    ggplot2_3.4.1 pacman_0.5.1 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.10      highr_0.10       later_1.3.0      bslib_0.4.2     
##  [5] compiler_4.2.2   pillar_1.8.1     jquerylib_0.1.4  tools_4.2.2     
##  [9] digest_0.6.31    jsonlite_1.8.4   evaluate_0.20    lifecycle_1.0.3 
## [13] tibble_3.1.8     gtable_0.3.1     pkgconfig_2.0.3  rlang_1.0.6     
## [17] shiny_1.7.4      cli_3.6.0        rstudioapi_0.14  yaml_2.3.7      
## [21] xfun_0.37        fastmap_1.1.0    ggExtra_0.10.0   withr_2.5.0     
## [25] dplyr_1.1.0      knitr_1.42       generics_0.1.3   vctrs_0.5.2     
## [29] sass_0.4.5       grid_4.2.2       tidyselect_1.2.0 glue_1.6.2      
## [33] R6_2.5.1         fansi_1.0.4      rmarkdown_2.20   farver_2.1.1    
## [37] magrittr_2.0.3   ellipsis_0.3.2   promises_1.2.0.1 scales_1.2.1    
## [41] htmltools_0.5.4  xtable_1.8-4     mime_0.12        colorspace_2.1-0
## [45] httpuv_1.6.9     labeling_0.4.2   utf8_1.2.3       miniUI_0.1.1.1  
## [49] munsell_0.5.0    cachem_1.0.6