Rationale and Examples

Ecological correlations are correlations between means of variables rather than between those variables at the individual level. Researchers are widely aware of some of the issues with using ecological correlations. For example, ecological correlations tend to be greatly inflated versions of the individual-level correlation. With constructed rather than nonarbitrary groupings, it is even trivial to produce perfect ecological correlations (Lubinski & Humphreys, 1996). The reasons for the inflation are numerous and well understood. One of the most commonly stated is that ecological correlations remove measurement error and reduce noise. There is also, however, a way to connect the individual and ecological varieties of correlation mathematically.

Robinson (1950) noted the ultimate reason why ecological correlations are greater than individual ones:

[T]he ecological [correlation] will be numerically greater than the individual correlation whenever the within-areas individual correlation is not greater than the total individual correlation, and this is the usual circumstance. (p. 356)

Robinson defined the total individual correlation, \(r\), as “the simple Pearsonian correlation between X and Y for all N members of the total group, computed without reference to [ecological] position at all”. He defined the ecological correlation, \(r_e\), as “the… correlation between the m pairs of X- and Y- percentages which describe the sub-groups”. Finally, the within-areas individual correlation, \(r_w\), was defined as “a[n…] average of the m within-areas individual correlations between X and Y.” (p. 355). On the same page, Robinson also noted two correlation ratios, \(\eta_{XA}\) and \(\eta_{YA}\), which were proposed to “measure the degree to which the values of X and Y show clustering by area”. The formulas relating the ecological and total individual correlations were

\[r_e = k_1r - k_2r_w\]

where

\[k_1 = \frac{1}{\eta_{XA}\eta_{YA}}\]

and

\[k_2 = \frac{\sqrt{1-\eta_{XA}^2}\sqrt{1-\eta_{YA}^2}}{\eta_{XA}\eta_{YA}}\]

Or, in natural language, “the ecological correlation is the weighted difference between the total individual correlation and the average of the m within-areas individual correlations. In this weighted difference, the weights of the total individual correlation and the within-areas individual correlation depend upon the degree to which the values of X and Y show clustering by area.” (p. 356). The total individual and ecological correlations will be equal when

\[r_w = k_3r\]

where

\[k_3 = \frac{1-\eta_{XA}\eta_{YA}}{\sqrt{1-\eta_{XA}^2}\sqrt{1-\eta_{YA}^2}}\]

but, “the individual and ecological correlations will be equal, only if the average within-areas individual correlation is not less than the total individual correlation. But all available evidence is that (whatever properties X and Y may denote) the correlation between X and Y is certainly not larger for relatively homogeneous sub-groups of persons than it is for the population at large. In short, the equivalency assumption has no basis in fact.” (p. 356).
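The equality condition can be recovered from the formula for \(r_e\) by setting \(r_e = r\) and solving for \(r_w\). The steps below use only the definitions of \(k_1\) and \(k_2\) given earlier and are offered as a check on the algebra rather than as Robinson's own presentation:

\[\begin{aligned}
r &= k_1 r - k_2 r_w \\
k_2 r_w &= (k_1 - 1)\,r \\
r_w &= \frac{k_1 - 1}{k_2}\,r = \frac{1-\eta_{XA}\eta_{YA}}{\sqrt{1-\eta_{XA}^2}\sqrt{1-\eta_{YA}^2}}\,r = k_3 r
\end{aligned}\]

For correlation ratios strictly between 0 and 1, \(k_3 \geq 1\), because \((1-\eta_{XA}\eta_{YA})^2 - (1-\eta_{XA}^2)(1-\eta_{YA}^2) = (\eta_{XA}-\eta_{YA})^2 \geq 0\). This is why, for a positive \(r\), equality requires a within-areas correlation at least as large as the total individual correlation.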

These related facts explain why ecological correlations are inflated, and under what conditions the inflation relative to the individual correlation is larger or smaller. As Robinson wrote (his Equation (I) is the formula for \(r_e\) given above):

Equation (I) shows why the size of the ecological correlation depends upon the number of sub-areas, for the behavior of the ecological correlation as small sub-areas are grouped into larger ones can be predicted from the behavior of the variables on the right side of (I) as consolidation takes place. As smaller areas are consolidated, two things happen:

  1. The average within-areas individual correlation increases in size because of the increasing heterogeneity of the sub-areas. The effect of this is to decrease the value of the ecological correlation.

  2. The values of \(\eta_{XA}\) and \(\eta_{YA}\) decrease because of the decrease in the homogeneity of values of X and Y within sub-areas. The effect of this is to increase the value of the ecological correlation.

However, these two tendencies are of unequal importance. Investigation of (I) with respect to the effect of changes in the values of \(\eta_{XA}, \eta_{YA}\), and \(r_w\) indicates that the influence of changes in the \(\eta\)’s is considerably more important than the influence of changes in the value of \(r_w\). The net effect of changes in the values of the \(\eta\)’s and of \(r_w\) taken together, therefore, is to increase the numerical value of the ecological correlation as consolidation takes place.
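Robinson's two tendencies can be watched in a small simulation. The sketch below is purely illustrative: the data-generating process, the seed, and the helper `summarise_grouping` are assumptions made for this example and do not come from Robinson. It merges neighbouring areas in pairs, repeatedly, and reports the correlation ratios, the pooled within-areas correlation, and the ecological correlation at each level of consolidation.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical data: 64 small areas of 100 people, ordered along a made-up gradient
m <- 64; n <- 100
area <- rep(seq_len(m), each = n)
g <- sort(rnorm(m))                                  # assumed area-level gradient
x <- g[area] + rnorm(m * n, sd = 2)
y <- 0.5 * g[area] + 0.1 * x + rnorm(m * n, sd = 2)

# Summarise one grouping: number of areas, both correlation ratios, r_w, and r_e
summarise_grouping <- function(grp) {
  mx <- ave(x, grp); my <- ave(y, grp)               # area means, one per individual
  c(areas = length(unique(grp)),
    eta_x = sd(mx) / sd(x),
    eta_y = sd(my) / sd(y),
    r_w   = cor(x - mx, y - my),                     # pooled within-areas correlation
    r_e   = cor(mx, my))                             # ecological correlation
}

# Merge neighbouring areas in pairs, repeatedly: 64 -> 32 -> 16 -> 8 -> 4 areas
groupings <- lapply(c(1, 2, 4, 8, 16), function(k) ceiling(area / k))
consolidation <- t(sapply(groupings, summarise_grouping))
round(consolidation, 3)
```

Because each coarser grouping is nested in the finer one, the correlation ratios cannot rise as areas are merged. How \(r_w\) and \(r_e\) move depends on the simulated data, so the table should be read as one illustration, not as a general result.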

The two correlation ratios, \(\eta_{XA}\) and \(\eta_{YA}\), are each the ratio of the standard deviation of a variable's area means to that variable's individual-level standard deviation, or in other words, \(\sqrt{\text{ICC}}\).
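To make these quantities concrete, the following sketch computes them from simulated data and checks that Robinson's formula reproduces the ecological correlation. Everything about the data (the number of areas, the effect sizes, the seed) is made up for illustration. The sketch also assumes that \(r_w\) is the pooled within-areas correlation, which is one particular weighted average of the area-specific correlations, and that areas are equally sized, so the size-weighted and unweighted ecological correlations coincide.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: 50 areas of 200 people each, with area-level shifts in X and Y
m <- 50; n <- 200
area <- rep(seq_len(m), each = n)
ax <- rnorm(m)                       # made-up area effects on X
ay <- 0.5 * ax + rnorm(m)            # made-up area effects on Y
x <- ax[area] + rnorm(m * n, sd = 2)
y <- ay[area] + 0.1 * x + rnorm(m * n, sd = 2)

# Area means broadcast back to individuals, and within-area deviations
mx <- ave(x, area); my <- ave(y, area)
wx <- x - mx;       wy <- y - my

eta_x <- sd(mx) / sd(x)   # correlation ratio: SD of area means over total SD
eta_y <- sd(my) / sd(y)

r   <- cor(x, y)          # total individual correlation
r_e <- cor(mx, my)        # ecological correlation
r_w <- cor(wx, wy)        # pooled within-areas correlation

k1 <- 1 / (eta_x * eta_y)
k2 <- sqrt(1 - eta_x^2) * sqrt(1 - eta_y^2) / (eta_x * eta_y)

# The two entries should agree up to floating-point error
c(observed = r_e, from_formula = k1 * r - k2 * r_w)
```

With these definitions the agreement is exact rather than approximate, because the total sums of squares and cross-products split into a between-areas part and a within-areas part.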

So, let's say that we have an individual-level correlation of .1 and that the within-areas correlation is half that, at .05. Let's also assume that the standard deviations of X and Y between areas are 85% as large as they are overall.

EcologicalInflation <- function(R, RW, SDX, SDY, SDGX, SDGY, rnd = 3){
  # Correlation ratios: between-area SD over total SD (a ratio of SDs, not of variances)
  ETA_X <- SDGX/SDX
  ETA_Y <- SDGY/SDY
  # Robinson's (1950) weights
  K1 <- 1/(ETA_X * ETA_Y)
  K2 <- (sqrt(1 - ETA_X^2) * sqrt(1 - ETA_Y^2))/(ETA_X * ETA_Y)
  # Ecological correlation and its ratio to the total individual correlation
  RE <- (K1 * R) - (K2 * RW)
  InflationFactor <- RE/R
  cat(paste0("The ecological inflation factor is ", round(InflationFactor, rnd), ". \n"))}

EcologicalInflation(.1, .05, 1, 1, .85, .85)
## The ecological inflation factor is 1.192.

And we can investigate the effects of altering \(r_w\) and the various \(\eta\) values, too.

cat("Varying the within-area correlation. \n")
## Varying the within-area correlation.
RWI <- seq(0, 0.1, 0.01)

EcologicalInflation(.1, RWI, 1, 1, .85, .85)
## The ecological inflation factor is 1.384. 
##  The ecological inflation factor is 1.346. 
##  The ecological inflation factor is 1.307. 
##  The ecological inflation factor is 1.269. 
##  The ecological inflation factor is 1.23. 
##  The ecological inflation factor is 1.192. 
##  The ecological inflation factor is 1.154. 
##  The ecological inflation factor is 1.115. 
##  The ecological inflation factor is 1.077. 
##  The ecological inflation factor is 1.038. 
##  The ecological inflation factor is 1.
cat("Varying the standard deviation of one X or Y. \n")
## Varying the standard deviation of one X or Y.
SDXYI <- seq(1, 10, 1)

EcologicalInflation(.1, .05, SDXYI, 1, .85, .85)
## The ecological inflation factor is 1.192. 
##  The ecological inflation factor is 2.108. 
##  The ecological inflation factor is 3.103. 
##  The ecological inflation factor is 4.111. 
##  The ecological inflation factor is 5.124. 
##  The ecological inflation factor is 6.139. 
##  The ecological inflation factor is 7.156. 
##  The ecological inflation factor is 8.173. 
##  The ecological inflation factor is 9.19. 
##  The ecological inflation factor is 10.208.
cat("Varying the standard deviations of both in the same direction. \n")
## Varying the standard deviations of both in the same direction.
EcologicalInflation(.1, .05, SDXYI, SDXYI, .85, .85)
## The ecological inflation factor is 1.192. 
##  The ecological inflation factor is 3.268. 
##  The ecological inflation factor is 6.728. 
##  The ecological inflation factor is 11.573. 
##  The ecological inflation factor is 17.801. 
##  The ecological inflation factor is 25.413. 
##  The ecological inflation factor is 34.41. 
##  The ecological inflation factor is 44.791. 
##  The ecological inflation factor is 56.555. 
##  The ecological inflation factor is 69.704.
SDYXI <- seq(10, 1, -1)

cat("Varying the standard deviations of both in different directions. \n")
## Varying the standard deviations of both in different directions.
EcologicalInflation(.1, .05, SDXYI, SDYXI, .85, .85)
## The ecological inflation factor is 10.208. 
##  The ecological inflation factor is 13.688. 
##  The ecological inflation factor is 17.38. 
##  The ecological inflation factor is 19.96. 
##  The ecological inflation factor is 21.27. 
##  The ecological inflation factor is 21.27. 
##  The ecological inflation factor is 19.96. 
##  The ecological inflation factor is 17.38. 
##  The ecological inflation factor is 13.688. 
##  The ecological inflation factor is 10.208.

There is no law saying that the standard deviation in the population must be larger than the standard deviation within units, but that will usually be the case, and the function can be amended accordingly. The same can be said of the relationship between the within-areas and total individual correlations: the within-areas correlation will usually be the smaller of the two. If it were instead large enough relative to the total individual correlation (that is, above \(k_3r\) for a positive \(r\)), the inflation would turn to deflation, but that condition is likely to be very rare. Interestingly, the explanation of ecological inflation via measurement error does make sense in this framework: the total population has a greater ability to converge to the true variance, for finite-sampling reasons that measurement error aggravates, while stratification-induced homogeneity reduces the within-area variance relative to that same general population. Systematic effects of aggregation have more to work with thanks to measurement error, even though measurement error is not strictly necessary.

A useful application of this is to check interpretations of ecological correlations by simulation. One simulates as many samples as there are units in a reported ecological correlation, gives each sample datapoints with values of X and Y, and assesses which value of the individual-level correlation fits, given empirical or suspected between-groups (and, hopefully, between-persons) standard deviations. Deriving the inflation (or deflation) factor may help to understand just how strongly an ecological correlation should be taken to represent between-persons relationships or, in other words, how ergodic it ought to be.
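As a quicker complement to a full simulation, Robinson's formula can also be solved for \(r\), which gives the individual-level correlation implied by a reported ecological correlation under assumed values of \(r_w\) and the correlation ratios. The sketch below is an algebraic shortcut, not the simulation procedure described above. The function name `implied_individual_r` and all of the input values are hypothetical.

```r
# Implied total individual correlation, from solving r_e = k1 * r - k2 * r_w for r:
#   r = r_e * eta_x * eta_y + r_w * sqrt(1 - eta_x^2) * sqrt(1 - eta_y^2)
implied_individual_r <- function(r_e, r_w, eta_x, eta_y) {
  # Correlation ratios must lie in (0, 1] for the formula to be defined
  stopifnot(eta_x > 0, eta_x <= 1, eta_y > 0, eta_y <= 1)
  r_e * eta_x * eta_y + r_w * sqrt(1 - eta_x^2) * sqrt(1 - eta_y^2)
}

# Hypothetical inputs: a reported ecological correlation of .60, evaluated over
# a grid of assumed correlation ratios (same for X and Y) and within-areas correlations
r_e_reported <- 0.60
grid <- expand.grid(eta = c(0.2, 0.3, 0.5), r_w = c(0, 0.05, 0.10))
grid$implied_r <- implied_individual_r(r_e_reported, grid$r_w, grid$eta, grid$eta)
grid$inflation <- r_e_reported / grid$implied_r
grid
```

Scanning a grid like this shows how sensitive the implied individual-level correlation is to the assumed clustering, which is usually the least well-known input.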

References

Lubinski, D., & Humphreys, L. G. (1996). Seeing the forest from the trees: When predicting the behavior or status of groups, correlate means. Psychology, Public Policy, and Law, 2(2), 363–376. https://doi.org/10.1037/1076-8971.2.2.363

Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15(3), 351–357. https://doi.org/10.2307/2087176