Setup

TransitivityBounds <- function(rXY, rYZ, rnd = 3, MinFactor = .001){
  rXYrYZ = rXY * rYZ
  # Half-width of the range; the radicand equals (1 - rXY^2) * (1 - rYZ^2)
  HalfWidth = sqrt(1 - rXY^2 - rYZ^2 + (rXY^2 * rYZ^2))
  UCI = rXYrYZ + HalfWidth
  LCI = rXYrYZ - HalfWidth
  # MinFactor is a fudge factor, with a precision set by the user, so that the sum of r_{xy}^2 and r_{yz}^2 is >1 instead of equal to 1
  rYZBound = sqrt((1 + MinFactor) - rXY^2)
  cat(paste0("The range of correlations between variables X and Z given a known correlation of ", rYZ, " between Y and Z, and a correlation of ", rXY, " between X and Y is ", round(LCI, rnd), " to ", round(UCI, rnd), ". The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of ", rXY, " is ", round(rYZBound, rnd), ". \n"))}

Rationale

I have previously noted that correlations are transitive when \(r_{xy}^2 + r_{yz}^2 > 1\) (https://rpubs.com/JLLJ/PARTI). It is also useful to provide bounds for that transitivity. The bounds help qualify our expectations about what it means for a correlation to be transitive by giving the range of values that \(r_{xz}\) can take given \(r_{xy}\) and \(r_{yz}\). The formula is as follows:

\[r_{xy}r_{yz} \pm \sqrt{1 - r_{xy}^2 - r_{yz}^2 + r_{xy}^2r_{yz}^2}\]
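One way to see where this expression comes from is sketched below. It is the standard argument from the requirement that a correlation matrix be positive semidefinite, and it is not necessarily the route taken by McCornack or by Yule & Kendall. For three variables, the determinant of the correlation matrix cannot be negative:

\[\det\begin{pmatrix}1 & r_{xy} & r_{xz}\\ r_{xy} & 1 & r_{yz}\\ r_{xz} & r_{yz} & 1\end{pmatrix} = 1 + 2r_{xy}r_{yz}r_{xz} - r_{xy}^2 - r_{yz}^2 - r_{xz}^2 \geq 0\]

Treating this as a quadratic inequality in \(r_{xz}\) gives

\[r_{xz}^2 - 2r_{xy}r_{yz}r_{xz} + \left(r_{xy}^2 + r_{yz}^2 - 1\right) \leq 0\]

whose roots are the two bounds:

\[r_{xz} = r_{xy}r_{yz} \pm \sqrt{1 - r_{xy}^2 - r_{yz}^2 + r_{xy}^2r_{yz}^2}\]

The radicand factors as \((1 - r_{xy}^2)(1 - r_{yz}^2)\). For positive \(r_{xy}\) and \(r_{yz}\), the lower bound is above zero exactly when \(r_{xy}^2r_{yz}^2 > (1 - r_{xy}^2)(1 - r_{yz}^2)\), which simplifies to \(r_{xy}^2 + r_{yz}^2 > 1\), the transitivity condition stated above.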

The only derivation of this formula I could find was by McCornack (1956), who cited Yule & Kendall’s textbook An Introduction to the Theory of Statistics for it. To obtain the lowest \(r_{yz}\) that guarantees a positive \(r_{xz}\) given some known \(r_{xy}\), calculate \(\sqrt{1 - r_{xy}^2}\); any \(r_{yz}\) above that value is transitive. The utility of this knowledge is potentially considerable. Transitivity bounds could supplement investigations into existing papers. With high correlations, they could be used to detect severely fraudulent reported correlations and to calibrate replication expectations by showing whether correlations fall within or outside of the possible range. The metric is unaffected by sample size and relies only on the accuracy of the reported coefficients.
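As a rough illustration of how such a check might look in practice, here is a minimal sketch. `BoundsXZ` and `IsAdmissible` are hypothetical helpers that are not part of the function in Setup, and the default `tol`, meant to absorb rounding in reported coefficients, is an assumption rather than a recommendation.

```r
# Hypothetical helper: returns the bounds on r_xz as numbers instead of
# printing a sentence, using the factored form of the radicand.
BoundsXZ <- function(rXY, rYZ) {
  HalfWidth <- sqrt((1 - rXY^2) * (1 - rYZ^2))
  c(lower = rXY * rYZ - HalfWidth, upper = rXY * rYZ + HalfWidth)
}

# Hypothetical check: does a reported r_xz fall inside the possible range?
# tol is an assumed allowance for rounding in the reported coefficients.
IsAdmissible <- function(rXY, rYZ, rXZ, tol = 0.005) {
  b <- BoundsXZ(rXY, rYZ)
  unname(rXZ >= b["lower"] - tol & rXZ <= b["upper"] + tol)
}

# At r_yz = sqrt(1 - r_xy^2) the lower bound sits at zero
# (up to floating-point error), here for r_xy = .9.
rYZMin <- sqrt(1 - 0.9^2)
BoundsXZ(0.9, rYZMin)["lower"]

# With r_xy = .9 and r_yz = .3 the range is about -0.146 to 0.686.
IsAdmissible(0.9, 0.3, 0.50)  # inside the range
IsAdmissible(0.9, 0.3, 0.80)  # above the upper bound
```

A reported triple that fails this kind of check cannot all be correct at once, though the check alone does not say which coefficient is wrong.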

Analysis

Here are the results for the first column from McCornack’s Table 1, to verify the function. As is abundantly clear, lower values of \(r_{xy}\) increase the width of the interval for \(r_{xz}\) at constant values of \(r_{yz}\).

TransitivityBounds(.998, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.998 between X and Y is 0.239 to 0.36. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.998 is 0.071.
TransitivityBounds(.996, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.996 between X and Y is 0.214 to 0.384. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.996 is 0.095.
TransitivityBounds(.994, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.994 between X and Y is 0.194 to 0.403. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.994 is 0.114.
TransitivityBounds(.992, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.992 between X and Y is 0.177 to 0.418. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.992 is 0.13.
TransitivityBounds(.990, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.99 between X and Y is 0.162 to 0.432. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.99 is 0.145.
TransitivityBounds(.980, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.98 between X and Y is 0.104 to 0.484. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.98 is 0.201.
TransitivityBounds(.960, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.96 between X and Y is 0.021 to 0.555. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.96 is 0.282.
TransitivityBounds(.940, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.94 between X and Y is -0.043 to 0.607. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.94 is 0.343.
TransitivityBounds(.920, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.92 between X and Y is -0.098 to 0.65. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.92 is 0.393.
TransitivityBounds(.900, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.9 between X and Y is -0.146 to 0.686. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.9 is 0.437.
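The same column can be collected into a single table, which makes the widening of the interval easier to see. This is only a sketch: `BoundsTable` is a hypothetical numeric companion to `TransitivityBounds`, vectorized over `rXY`, and is not part of the original setup.

```r
# Hypothetical numeric companion to TransitivityBounds(): returns one row
# per rXY value instead of printing a sentence.
BoundsTable <- function(rXY, rYZ) {
  HalfWidth <- sqrt((1 - rXY^2) * (1 - rYZ^2))
  data.frame(rXY = rXY,
             lower = rXY * rYZ - HalfWidth,
             upper = rXY * rYZ + HalfWidth,
             width = 2 * HalfWidth)
}

# The r_xy values used above, at a constant r_yz of .3
rXYColumn <- c(.998, .996, .994, .992, .990, .980, .960, .940, .920, .900)
FirstColumn <- BoundsTable(rXYColumn, rYZ = .3)
print(round(FirstColumn, 3))

# TRUE if the width increases at every step down in rXY
all(diff(FirstColumn$width) > 0)
```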

Here are the results for the first row from Table 1, so the effects of altering \(r_{yz}\) at a constant \(r_{xy}\) can be observed.

TransitivityBounds(.998, .3)
## The range of correlations between variables X and Z given a known correlation of 0.3 between Y and Z, and a correlation of 0.998 between X and Y is 0.239 to 0.36. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.998 is 0.071.
TransitivityBounds(.998, .4)
## The range of correlations between variables X and Z given a known correlation of 0.4 between Y and Z, and a correlation of 0.998 between X and Y is 0.341 to 0.457. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.998 is 0.071.
TransitivityBounds(.998, .5)
## The range of correlations between variables X and Z given a known correlation of 0.5 between Y and Z, and a correlation of 0.998 between X and Y is 0.444 to 0.554. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.998 is 0.071.
TransitivityBounds(.998, .6)
## The range of correlations between variables X and Z given a known correlation of 0.6 between Y and Z, and a correlation of 0.998 between X and Y is 0.548 to 0.649. The lowest correlation between Y and Z that guarantees transitivity with our known correlation between X and Y of 0.998 is 0.071.

The more shared variance, the narrower the interval.
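This follows from the formula itself: the radicand factors into \((1 - r_{xy}^2)(1 - r_{yz}^2)\), the product of the two unshared variance proportions, so the width of the interval is twice the square root of that product. A minimal sketch, with `IntervalWidth` as a hypothetical helper name:

```r
# Hypothetical helper: width of the r_xz interval, written with the
# factored radicand (1 - rXY^2) * (1 - rYZ^2).
IntervalWidth <- function(rXY, rYZ) {
  2 * sqrt((1 - rXY^2) * (1 - rYZ^2))
}

# The r_yz values used above, at a constant r_xy of .998
rYZRow <- c(.3, .4, .5, .6)
round(IntervalWidth(.998, rYZRow), 3)

# The factored and expanded radicands agree, so this should be TRUE
Expanded <- 2 * sqrt(1 - .998^2 - rYZRow^2 + (.998^2 * rYZRow^2))
isTRUE(all.equal(IntervalWidth(.998, rYZRow), Expanded))
```

Raising either correlation shrinks its factor toward zero, and the interval collapses to a single point when either correlation reaches 1 in absolute value.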

References

McCornack, R. L. (1956). A criticism of studies comparing item-weighting methods. Journal of Applied Psychology, 40(5), 343–344. https://doi.org/10.1037/h0045635