Setup

Functions

BESD <- function(r){
  PC = 0.5 * abs(r) + 0.5
  return(message(paste("Your probability of correct classification using a predictor with a performance correlation of", r, "is", PC)))}

BESDV <- function(v){
  v = v/100
  r = sqrt(v)
  PC = 0.5 * abs(r) + 0.5
  return(message(paste("Your probability of correct classification using a predictor explaining", v*100, "percent of the variance is", PC)))}

BESDS <- function(r){
  PC = 0.5 * abs(r) + 0.5
  return(PC)}

BESDVS <- function(v){
  v = v/100
  r = sqrt(v)
  PC = 0.5 * abs(r) + 0.5
  return(PC)}

FWBESD <- function(A, B, C, D){
  r = A/(A+B)-C/(C+D)
  return(r)}

CONTrad <- function(A, B, C, D){
  r = (A * D - B * C)/sqrt((A + B) * (C + D) * (A + C) * (B + D))
  OR = (A * D)/(B * C)
  RR = (A/(A + B))/(C/(C + D))
  d = 2*r/sqrt(1-r^2)
  return(message(paste("This contingency table is consistent with a correlation of r =", r, "an odds ratio of", OR, "and a relative risk of", RR, "consistent with a Cohen's d of", d)))}

CONY <- function(a, b, c, d){
  #y = "percentage of subjects with positive outcomes"
  #x = Percentage in control group
  #a, b, c, d = percentages
  x = c + d
  y = (a + c)/(a + b + c + d)
  r = ((a*d) - (b*c))/sqrt((1-x)*x*y*(1-y))
  d = 2*r/sqrt(1-r^2)
  return(message(paste("This contingency table is consistent with a correlation of r =", r, "and a Cohen's d of", d)))}

CONMod <- function(A, B, w, nC){
  #w = "Percentage of control group subjects with positive outcomes"
  #nC = Number in control group
  x = nC/(nC + A + B)
  r = ((A * x) * (1 - w) - B * x * w)/sqrt((1 - x) * x * (A + (x * w)) * (B + x - (x * w)))
  C = x * w
  D = x * (1 - w)
  OR = (A * D)/(B * C)
  RR = (A/(A + B))/(C/(C + D))
  d = 2*r/sqrt(1-r^2)
  return(message(paste("This contingency table is consistent with a correlation of r =", r, "an odds ratio of", OR, "and a relative risk of", RR, "consistent with a Cohen's d of", d)))}

Contingency Tables

See Miller, Hendrie & Derzon (2011) for more information. Their formulae were inconsistent and some of them are a bit strange, but I present their stuff nonetheless.

The first table is a table with known experimental outcomes. The correlation coefficient is given by

\[r = \frac{AD - BC}{\sqrt{(A+B) \times (C+D) \times (A+C) \times (B+D)}}\]

or, asymptotically,

\[r = \frac{A}{A+B} - \frac{C}{C+D}\]

BESDF <- data.frame("ID" = c("Treatment", "Control", "TotalH"), "Positive" = c("A", "C", "A + C"), "Unimproved" = c("B", "D", "B + D"), "TotalV" = c("A + B", "C + D", "1")); BESDF

The second table is one with known marginals where y is "the percentage of subjects with positive outcomes" and x is "the percentage of subjects who were controls". The correlation coefficient is given by

\[r = \frac{AD-BC}{\sqrt{(1-x)xy(1-y)}}\]

BESDF <- data.frame("ID" = c("Treatment", "Control", "TotalH"), "Positive" = c("A", "C", "y"), "Unimproved" = c("B", "D", "1-y"), "TotalV" = c("1 - x", "x", "1")); BESDF

The third table is one where a "known percentage of the control population will have a positive outcome". The correlation coefficient is given by

\[r = \frac{Ax(1-w)-Bxw}{\sqrt{(1-x)x(A+xw)(B+x-xw)}}\]

BESDF <- data.frame("ID" = c("Treatment", "Control", "TotalH"), "Positive" = c("A", "xw", "A + xw"), "Unimproved" = c("B", "x(1 - w)", "B + x(1 - w)"), "TotalV" = c("1 - x", "x", "1")); BESDF

The final table with r = 0.10 is displayed by Rosnow, Rosenthal & Rubin (2000). The original paper (Rosenthal & Rubin, 1982) is also interesting.

BESDF <- data.frame("Condition" = c("New Drug", "Old Drug", "TotalH"), "Above Median Outcome" = c(55, 45, 100), "Below Median Outcome" = c(45, 55, 100), "TotalV" = c(100, 100, 200)); BESDF

Examples

BESD(0.10); A = seq(0, 1, 0.01) #absolute value, so backwards with -1 to 1
## Your probability of correct classification using a predictor with a performance correlation of 0.1 is 0.55
BESDS(A)
##   [1] 0.500 0.505 0.510 0.515 0.520 0.525 0.530 0.535 0.540 0.545 0.550 0.555
##  [13] 0.560 0.565 0.570 0.575 0.580 0.585 0.590 0.595 0.600 0.605 0.610 0.615
##  [25] 0.620 0.625 0.630 0.635 0.640 0.645 0.650 0.655 0.660 0.665 0.670 0.675
##  [37] 0.680 0.685 0.690 0.695 0.700 0.705 0.710 0.715 0.720 0.725 0.730 0.735
##  [49] 0.740 0.745 0.750 0.755 0.760 0.765 0.770 0.775 0.780 0.785 0.790 0.795
##  [61] 0.800 0.805 0.810 0.815 0.820 0.825 0.830 0.835 0.840 0.845 0.850 0.855
##  [73] 0.860 0.865 0.870 0.875 0.880 0.885 0.890 0.895 0.900 0.905 0.910 0.915
##  [85] 0.920 0.925 0.930 0.935 0.940 0.945 0.950 0.955 0.960 0.965 0.970 0.975
##  [97] 0.980 0.985 0.990 0.995 1.000
BESDV(10); B = seq(0, 100, 1) #absolute value, so backwards with -1 to 1
## Your probability of correct classification using a predictor explaining 10 percent of the variance is 0.658113883008419
BESDVS(B)
##   [1] 0.5000000 0.5500000 0.5707107 0.5866025 0.6000000 0.6118034 0.6224745
##   [8] 0.6322876 0.6414214 0.6500000 0.6581139 0.6658312 0.6732051 0.6802776
##  [15] 0.6870829 0.6936492 0.7000000 0.7061553 0.7121320 0.7179449 0.7236068
##  [22] 0.7291288 0.7345208 0.7397916 0.7449490 0.7500000 0.7549510 0.7598076
##  [29] 0.7645751 0.7692582 0.7738613 0.7783882 0.7828427 0.7872281 0.7915476
##  [36] 0.7958040 0.8000000 0.8041381 0.8082207 0.8122499 0.8162278 0.8201562
##  [43] 0.8240370 0.8278719 0.8316625 0.8354102 0.8391165 0.8427827 0.8464102
##  [50] 0.8500000 0.8535534 0.8570714 0.8605551 0.8640055 0.8674235 0.8708099
##  [57] 0.8741657 0.8774917 0.8807887 0.8840573 0.8872983 0.8905125 0.8937004
##  [64] 0.8968627 0.9000000 0.9031129 0.9062019 0.9092676 0.9123106 0.9153312
##  [71] 0.9183300 0.9213075 0.9242641 0.9272002 0.9301163 0.9330127 0.9358899
##  [78] 0.9387482 0.9415880 0.9444097 0.9472136 0.9500000 0.9527693 0.9555217
##  [85] 0.9582576 0.9609772 0.9636809 0.9663690 0.9690416 0.9716991 0.9743416
##  [92] 0.9769696 0.9795832 0.9821825 0.9847680 0.9873397 0.9898979 0.9924429
##  [99] 0.9949747 0.9974937 1.0000000
plot(BESDS(A), type = 'l', main = "Classification Accuracy by r and r-Squared", xlab = NA, ylab = "Accuracy", col = "orangered", lwd = 2, lty = 3); lines(BESDVS(B), type = 'l', lwd = 3, lty = 4, col = "steelblue2")

CONTrad(55, 45, 45, 55)
## This contingency table is consistent with a correlation of r = 0.1 an odds ratio of 1.49382716049383 and a relative risk of 1.22222222222222 consistent with a Cohen's d of 0.201007563051842
CONY(0.275, 0.225, 0.225, 0.275)
## This contingency table is consistent with a correlation of r = 0.1 and a Cohen's d of 0.201007563051842
CONMod(55, 45, 0.45, 100)
## This contingency table is consistent with a correlation of r = 0.199987526167071 an odds ratio of 1.49382716049383 and a relative risk of 1.22222222222222 consistent with a Cohen's d of 0.40822176754137

References

Miller, T. R., Hendrie, D., & Derzon, J. (2011). Exact method for computing absolute percent change in a dichotomous outcome from meta-analytic effect size: Improving impact and cost-outcome estimates. Value in Health, 14(1), 144-151. https://doi.org/10.1016/j.jval.2010.10.013

Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and Correlations in Effect-Size Estimation. Psychological Science, 11(6), 446-453. https://doi.org/10.1111/1467-9280.00287

Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74(2), 166-169. https://doi.org/10.1037/0022-0663.74.2.166