Within-Study Meta-Analytic Tool

Overview of Within-Study Meta-Analytic Tool

This analytic tool provides a GLS-based within-study meta-analytic method to combine multiple correlation coefficients from a single study—accounting for the statistical dependency between them to produce a single pooled correlation estimate. It uses the generalized least squares (GLS) framework, weighting each correlation by its precision and accounting for their inter-correlations via a covariance matrix.

Traditional Meta-Analysis Approach (Between-Study)

Data

For illustration, a fictitious data frame of four studies is used, each reporting the correlation between the same two variables. The observed Pearson correlation coefficient r and the corresponding sample size n are provided for each study.

# Generate a fictitious data set of four studies with correlation coefficients and associated sample sizes
study_data <- data.frame(
  study = paste("Study", 1:4),
  r = c(0.21, 0.45, 0.31, 0.60),
  n = c(50, 100, 80, 40)
)
# View first few rows
head(study_data)

##     study    r   n
## 1 Study 1 0.21  50
## 2 Study 2 0.45 100
## 3 Study 3 0.31  80
## 4 Study 4 0.60  40

Calculating the pooled effect for a between-study meta-analysis

In a traditional meta-analysis, there are three basic steps to calculate a pooled estimate.

In the first step, each correlation coefficient is converted to Fisher’s z with the inverse hyperbolic tangent function, atanh(), and its sampling variance is calculated. The sampling distribution of Pearson’s r is skewed, especially when the true population correlation is close to -1 or +1. Fisher’s r-to-z transformation converts each study’s r into an approximately normally distributed variable, z, whose variance depends only on the sample size.

Formula for Fisher’s \(z\) transformation: \[ z_r = 0.5 \times \ln\left(\frac{1+r}{1-r}\right) \] where:
- \(z_r\) is the Fisher-transformed correlation for a given study.
- \(\ln\) is the natural logarithm.
- \(r\) is the Pearson correlation coefficient from that study.
Variance of \(z_r\): The variance of \(z_r\) is approximately: \[ Var(z_r) = \frac{1}{N - 3} \] where:
- \(N\) is the sample size of the study.

In the second step, each study’s transformed correlation (\(z_r\)) is weighted to account for differences in precision. Because the variance of \(z_r\) depends only on sample size, larger studies provide more precise estimates and therefore receive greater weight. The most common weighting scheme uses the inverse of the variance.

Weight (\(w_i\)) for each study i: \[ w_i = \frac{1}{Var(z_{r_i})} = N_i - 3 \] where:
- \(N_i\) is the sample size for study i.

The weighted average of the transformed correlations is calculated to obtain an overall estimate of the population correlation.

Pooled Fisher’s \(z\) (\(Z_{pooled}\)): \[ Z_{pooled} = \frac{\sum (w_i \times z_{r_i})}{\sum w_i} \] where:
- \(w_i\) is the weight for study i.
- \(z_{r_i}\) is the Fisher-transformed correlation for study i.

In the third step, the pooled Fisher’s \(z\) value is transformed back to a correlation coefficient (\(r\)) for interpretability. This inverse transformation is the hyperbolic tangent, tanh().

Formula for inverse Fisher’s \(z\) transformation (back to \(r\)): \[ r_{pooled} = \frac{e^{2 \times Z_{pooled}} - 1}{e^{2 \times Z_{pooled}} + 1} \] where:
- \(r_{pooled}\) is the overall estimated population correlation.
- \(e\) is Euler’s number (the base of the natural logarithm).

Applying these steps to the example data:

## First: transform r to Fisher's z and compute its sampling variance
study_data$z <- atanh(study_data$r)
study_data$vi <- 1 / (study_data$n - 3)
head(study_data)

##     study    r   n         z         vi
## 1 Study 1 0.21  50 0.2131713 0.02127660
## 2 Study 2 0.45 100 0.4847003 0.01030928
## 3 Study 3 0.31  80 0.3205454 0.01298701
## 4 Study 4 0.60  40 0.6931472 0.02702703

## Second: Calculate the pooled common effect meta-analysis 
model_btw <- metafor::rma(yi = z, vi = vi, data = study_data, method = "FE")
summary(model_btw)

## 
## Fixed-Effects Model (k = 4)
## 
##   logLik  deviance       AIC       BIC      AICc   
##   1.5466    5.9349   -1.0931   -1.7068    0.9069   
## 
## I^2 (total heterogeneity / total variability):   49.45%
## H^2 (total variability / sampling variability):  1.98
## 
## Test for Heterogeneity:
## Q(df = 3) = 5.9349, p-val = 0.1148
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub      
##   0.4161  0.0623  6.6842  <.0001  0.2941  0.5382  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Third: Back-Transform to r
# Back-transform point estimate z to r
r_est <- tanh(model_btw$b)
# Back-transform confidence interval
r_ci_lb <- tanh(model_btw$ci.lb)
r_ci_ub <- tanh(model_btw$ci.ub)

## Print results
cat("Pooled r:", round(r_est, 3), "\n", "95% CI: [", round(r_ci_lb, 3), ",", round(r_ci_ub, 3), "]\n")

## Pooled r: 0.394 
##  95% CI: [ 0.286 , 0.492 ]
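The metafor results above can be reproduced directly from the formulas for the three steps. The base-R sketch below recomputes the inverse-variance weights, the pooled Fisher’s z, its standard error and the back-transformed r and confidence interval; the variable names are illustrative, and the interval uses qnorm(0.975) as the normal critical value.

```r
# Steps 1-3 by hand (base R), using the example studies
r <- c(0.21, 0.45, 0.31, 0.60)
n <- c(50, 100, 80, 40)

z <- atanh(r)                      # step 1: Fisher's z
w <- n - 3                         # step 2: weights w_i = 1 / Var(z_i) = N_i - 3
z_pooled  <- sum(w * z) / sum(w)   #         weighted mean of z
se_pooled <- 1 / sqrt(sum(w))      #         standard error of the pooled z

r_pooled <- tanh(z_pooled)         # step 3: back-transform to r
# tanh(x) is the same as (exp(2x) - 1) / (exp(2x) + 1)
r_ci <- tanh(z_pooled + c(-1, 1) * qnorm(0.975) * se_pooled)

round(c(z = z_pooled, se = se_pooled), 4)                     # 0.4161 and 0.0623
round(c(r = r_pooled, lower = r_ci[1], upper = r_ci[2]), 3)   # 0.394, 0.286, 0.492
```

These values agree with the fixed-effect estimate and standard error reported by metafor::rma(method = "FE").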

Results of traditional (between-study) meta-analysis approach

The traditional meta-analysis approach yielded a common-effect estimate of r = 0.394, 95% CI [0.286, 0.492]. A random-effects estimate can be obtained by setting method = "REML" in metafor::rma(); for the purposes of this example, we focus on the common (fixed) effect.

Within-Study Meta-Analysis Approach

A different approach is needed if the four correlations come from multiple tests of conceptually similar variables within a single study that uses the same sample of participants. For instance, suppose you want to test whether scores on a cognitive test correlate with four variables that all measure aspects of physiological stress (e.g., heart rate, skin conductance, eye movement, respiration rate). In a traditional between-study meta-analysis, each study is an independent assessment of the relationship using an independent sample, so the sampling errors of the correlations are assumed to be uncorrelated. Here, the same participants provide data for every variable, so the inter-correlations among the stress measures cannot be assumed to be zero, and the resulting dependencies must be accounted for. Below, a Generalized Least Squares (GLS) within-study meta-analytic approach is described, in which the correlation matrix among the focal variables is used to account for these inter-dependencies and obtain an appropriate pooled estimate of the effect.
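The dependency can be illustrated with a small simulation. In the sketch below (base R only; the seed, sample size, number of replications and the 0.8 inter-correlation are arbitrary illustrative choices), a cognitive score X is generated to be unrelated to two stress measures, Y1 and Y2, that correlate 0.8 with each other. Across simulated samples, the Fisher-z correlations of X with Y1 and of X with Y2 rise and fall together, with a correlation close to 0.8, while each still has a variance close to 1/(n - 3).

```r
set.seed(2024)
n_obs <- 100    # participants per simulated study
reps  <- 2000   # number of simulated studies

# Population correlation matrix of (X, Y1, Y2): X is unrelated to Y1 and Y2,
# while Y1 and Y2 correlate 0.8 with each other
Sigma <- matrix(c(1, 0,   0,
                  0, 1,   0.8,
                  0, 0.8, 1), nrow = 3, byrow = TRUE)
U <- chol(Sigma)  # upper-triangular U with t(U) %*% U equal to Sigma

# Each replication: draw n_obs rows from N(0, Sigma), then Fisher-z both correlations
z_pairs <- t(replicate(reps, {
  d <- matrix(rnorm(n_obs * 3), ncol = 3) %*% U
  atanh(c(cor(d[, 1], d[, 2]), cor(d[, 1], d[, 3])))
}))

# Correlation between the sampling errors of the two Fisher-z values
err_cor <- cor(z_pairs[, 1], z_pairs[, 2])
# Sampling variance of each z, rescaled by (n - 3) so that 1 is the expected value
z_var <- apply(z_pairs, 2, var)

round(err_cor, 2)              # close to 0.8
round(z_var * (n_obs - 3), 2)  # both close to 1
```

Treating the two z values as independent would therefore overstate how much information their average carries.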

The pooled Fisher’s z-transformed correlation using generalized least squares (GLS) is calculated as:

\[ \hat{z}_{\text{pooled}} = \frac{\mathbf{1}^\top \mathbf{V}^{-1} \mathbf{z}}{\mathbf{1}^\top \mathbf{V}^{-1} \mathbf{1}} \]

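Where:

- \(\mathbf{z}\) is the vector of Fisher-transformed correlations: \(z_i = \tanh^{-1}(r_i)\)
- \(\mathbf{V}\) is the covariance matrix of the \(z_i\), with elements:

\[ V_{ij} = \sqrt{\frac{1}{n_i - 3}} \cdot \sqrt{\frac{1}{n_j - 3}} \cdot R_{ij} \]

- \(R_{ij}\) is the correlation between focal variables \(i\) and \(j\) (with \(R_{ii} = 1\))
- \(\mathbf{1}\) is a vector of ones
- \(\mathbf{V}^{-1}\) is the inverse of the covariance matrix

Using \(R_{ij}\) as the correlation between the sampling errors of \(z_i\) and \(z_j\) is a simplifying approximation: it holds asymptotically when the shared variable is uncorrelated with both focal variables, and is only approximate otherwise.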

The pooled correlation on the original scale is:

\[ \hat{r}_{\text{pooled}} = \tanh(\hat{z}_{\text{pooled}}) \]
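As a check on the formula, consider the case in which the focal variables are uncorrelated, \(\mathbf{R} = \mathbf{I}\). Then \(\mathbf{V}\) is diagonal and the GLS estimate reduces to the inverse-variance weighted mean of the between-study approach, with weights \(n_i - 3\):

\[ \mathbf{V}^{-1} = \operatorname{diag}(n_1 - 3, \ldots, n_k - 3) \;\Rightarrow\; \hat{z}_{\text{pooled}} = \frac{\sum_i (n_i - 3)\, z_i}{\sum_i (n_i - 3)}, \qquad Var(\hat{z}_{\text{pooled}}) = \frac{1}{\mathbf{1}^\top \mathbf{V}^{-1} \mathbf{1}} = \frac{1}{\sum_i (n_i - 3)} \]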

Within-study meta-analysis function using a GLS approach

The code below provides a function that calculates a pooled within-study meta-analytic estimate using a Generalized Least Squares (GLS) approach and the correlation matrix among the dependent (focal) variables. It accounts for the non-independence among correlations (e.g., from overlapping variables or repeated measures within a study) by incorporating this correlation matrix into the GLS weighting.

# Within-study meta-analysis function with the following parameters:
#   r_vec = vector of correlation estimates (one per focal variable)
#   n_vec = vector of sample sizes for each correlation
#   R_mat = correlation matrix among the focal variables (1s on the diagonal);
#           missing (NA) off-diagonal values are imputed
pooled_dep_correlation <- function(r_vec, n_vec, R_mat) {
  R_mat <- as.matrix(R_mat)
  k <- length(r_vec)
  stopifnot(length(n_vec) == k, nrow(R_mat) == k, ncol(R_mat) == k)

  # 1. Fisher z-transform and sampling variance of each z
  z <- atanh(r_vec)
  var_z <- 1 / (n_vec - 3)

  # 2. Impute NA off-diagonal values in R_mat with the mean of the observed
  #    off-diagonal values, weighted by sqrt(n_i * n_j)
  impute_mean <- function(R_mat, n_vec) {
    off_diag <- row(R_mat) != col(R_mat)
    to_fill <- off_diag & is.na(R_mat)
    if (any(to_fill)) {
      observed <- off_diag & !is.na(R_mat)
      if (!any(observed)) stop("R_mat has no observed off-diagonal values to impute from")
      weights_mat <- sqrt(outer(n_vec, n_vec))
      mean_weighted <- sum(R_mat[observed] * weights_mat[observed]) /
        sum(weights_mat[observed])
      R_mat[to_fill] <- mean_weighted
    }
    diag(R_mat) <- 1  # a correlation matrix always has a unit diagonal
    R_mat
  }

  R_mat <- impute_mean(R_mat, n_vec)

  # 3. Covariance matrix of z: V_ij = sqrt(var_i) * sqrt(var_j) * R_ij
  cov_z <- outer(sqrt(var_z), sqrt(var_z)) * R_mat

  # 4. Inverse of the covariance matrix (fails if R_mat is singular)
  inv_cov_z <- solve(cov_z)

  # 5. GLS pooled estimate: (1' V^-1 z) / (1' V^-1 1)
  ones <- rep(1, k)
  precision <- as.numeric(t(ones) %*% inv_cov_z %*% ones)
  z_pooled <- as.numeric(t(ones) %*% inv_cov_z %*% z) / precision

  # 6. Variance and standard error of the pooled z
  var_pooled_z <- 1 / precision
  se_pooled_z <- sqrt(var_pooled_z)

  # 7. 95% confidence interval in z-space (normal approximation)
  ci_lower_z <- z_pooled - 1.96 * se_pooled_z
  ci_upper_z <- z_pooled + 1.96 * se_pooled_z

  # 8. Back-transform to r
  r_pooled <- tanh(z_pooled)
  ci_lower_r <- tanh(ci_lower_z)
  ci_upper_r <- tanh(ci_upper_z)

  # 9. p-value (two-tailed test of H0: pooled z = 0)
  z_stat <- z_pooled / se_pooled_z
  p_value <- 2 * (1 - pnorm(abs(z_stat)))

  # 10. Sum of the sample sizes. When the same participants contribute to every
  #     correlation they are counted more than once, so this is not a
  #     dependency-adjusted effective sample size.
  n_eff <- sum(n_vec)

  return(list(
    pooled_r = r_pooled,
    effective_n = n_eff,
    ci_lower = ci_lower_r,
    ci_upper = ci_upper_r,
    p_value = p_value,
    pooled_z = z_pooled,
    CI_lower_z = ci_lower_z,
    CI_upper_z = ci_upper_z
  ))
}
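The effective_n value returned by the function is simply sum(n_vec), so with a shared sample the same participants are counted once per correlation. If a dependency-aware figure is wanted, one option (an assumption of this sketch, not part of the original tool) is the size of a single independent sample whose Fisher z would have the same variance as the pooled z, that is \(n_{\text{eff}} = 1/Var(\hat{z}_{\text{pooled}}) + 3\). The helper name dependency_adjusted_n is hypothetical.

```r
# Hypothetical helper (not part of pooled_dep_correlation): the size of a single
# independent sample whose Fisher z would be as precise as the GLS pooled z,
# found by solving 1 / (n_eff - 3) = Var(z_pooled) for n_eff
dependency_adjusted_n <- function(n_vec, R_mat) {
  se <- sqrt(1 / (n_vec - 3))
  V <- outer(se, se) * R_mat                                 # covariance matrix of z
  var_pooled_z <- 1 / sum(solve(V, rep(1, length(n_vec))))   # 1 / (1' V^-1 1)
  1 / var_pooled_z + 3
}

n <- c(50, 100, 80, 40)
high <- matrix(0.8, nrow = 4, ncol = 4)
diag(high) <- 1

sum(n)                             # 270, what effective_n reports
dependency_adjusted_n(n, diag(4))  # 261 = sum(n - 3) + 3 when the variables are independent
dependency_adjusted_n(n, high)     # much smaller, because the four measures overlap heavily
```

Because the GLS estimate is at least as precise as any single correlation, this quantity never falls below the largest individual n.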

Test Scenarios

The following code defines several correlation matrices among the focal variables to demonstrate how the within-study meta-analytic tool’s pooled estimates change with the degree of inter-dependence.

## Different correlation matrices to test

# A null (identity) matrix: no inter-dependencies among the variables
null.mx <- diag(4)

# A matrix with a mix of weak to strong inter-dependencies among the variables
cor.matrix <- matrix(
  c(
    1.0000000, 0.8486251, 0.4047249, 0.2656064,
    0.8486251, 1.0000000, 0.5038092, 0.1947458,
    0.4047249, 0.5038092, 1.0000000, 0.1061337,
    0.2656064, 0.1947458, 0.1061337, 1.0000000
  ),
  nrow = 4, ncol = 4, byrow = TRUE
)

# A matrix in which all inter-dependencies among the variables are very low (about 0.1)
low.mx <- matrix(0.10001, nrow = 4, ncol = 4)
diag(low.mx) <- 1

# A matrix in which all inter-dependencies among the variables are very high (about 0.8)
high.mx <- matrix(0.80001, nrow = 4, ncol = 4)
diag(high.mx) <- 1

# A matrix in which some inter-dependencies are negative.
# Note: this matrix is not positive definite (the block for variables 1, 2 and 4
# has a negative determinant), so it is not a valid correlation matrix.
test_neg.mx <- matrix(c(
  1.00000000, 0.87654321, 0.54321098, -0.32109876,
  0.87654321, 1.00000000, 0.65432109, 0.43210987,
  0.54321098, 0.65432109, 1.00000000, 0.76543210,
  -0.32109876, 0.43210987, 0.76543210, 1.00000000
), nrow = 4, ncol = 4, byrow = TRUE)
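Before passing a matrix to the GLS function, it can be worth checking that it is a valid correlation matrix: symmetric, with a unit diagonal and only positive eigenvalues. The helper below, is_valid_cor, is a hypothetical sketch using base R’s eigen(); the tolerance is an arbitrary choice.

```r
# Hypothetical helper: TRUE if m is symmetric, has a unit diagonal and only
# positive eigenvalues, i.e. it is a valid, invertible correlation matrix
is_valid_cor <- function(m, tol = 1e-8) {
  is.matrix(m) && isSymmetric(m) && all(abs(diag(m) - 1) < tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > tol)
}

equi.mx <- matrix(0.8, nrow = 4, ncol = 4)  # equicorrelated, similar to high.mx
diag(equi.mx) <- 1

test_neg.mx <- matrix(c(
  1.00000000, 0.87654321, 0.54321098, -0.32109876,
  0.87654321, 1.00000000, 0.65432109, 0.43210987,
  0.54321098, 0.65432109, 1.00000000, 0.76543210,
  -0.32109876, 0.43210987, 0.76543210, 1.00000000
), nrow = 4, ncol = 4, byrow = TRUE)

is_valid_cor(diag(4))      # TRUE
is_valid_cor(equi.mx)      # TRUE
is_valid_cor(test_neg.mx)  # FALSE
# The block for variables 1, 2 and 4 alone already has a negative determinant
det(test_neg.mx[c(1, 2, 4), c(1, 2, 4)])  # about -0.30
```

Intuitively, if variable 1 correlates 0.88 with variable 2 and variable 2 correlates 0.43 with variable 4, variables 1 and 4 cannot also correlate -0.32.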

Results of non-independence (within-study) meta-analysis approach

Replicating the between-study meta-analysis results

When the null correlation matrix is used, indicating that the correlations are independent, we obtain the same pooled estimate and 95% confidence interval as the traditional approach: pooled r = 0.394, 95% CI [0.286, 0.492].

# Calculate pooled r using within-study meta-analytic tool (null correlation matrix)
pooled_dep_correlation(r_vec = study_data$r, n_vec = study_data$n, R_mat = null.mx) # test with 0 inter-dependencies for equivalence with independent meta-analysis

## $pooled_r
## [1] 0.3936713
## 
## $effective_n
## [1] 270
## 
## $ci_lower
## [1] 0.285916
## 
## $ci_upper
## [1] 0.4915951
## 
## $p_value
## [1] 2.322609e-11
## 
## $pooled_z
## [1] 0.4161373
## 
## $CI_lower_z
## [1] 0.294113
## 
## $CI_upper_z
## [1] 0.5381616

Within-study meta-analysis results with variables showing moderate inter-dependencies

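When using a correlation matrix with a range of inter-dependencies among the variables, we obtain a pooled r = 0.513, 95% CI [0.384, 0.622]. This estimate is larger than the traditional one because GLS weights each correlation by the non-redundant information it carries rather than by sample size alone. The first two variables are strongly inter-correlated (0.85), so the first correlation (r = 0.21) is largely redundant with the second and receives a negative weight, which pulls the pooled estimate upward.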

# Calculate pooled r using within-study meta-analytic tool (mixed inter-correlation matrix)
pooled_dep_correlation(r_vec = study_data$r, n_vec = study_data$n, R_mat = cor.matrix) # test with a mix of inter-dependencies

## $pooled_r
## [1] 0.5125711
## 
## $effective_n
## [1] 270
## 
## $ci_lower
## [1] 0.3835798
## 
## $ci_upper
## [1] 0.6219454
## 
## $p_value
## [1] 7.276402e-12
## 
## $pooled_z
## [1] 0.5662109
## 
## $CI_lower_z
## [1] 0.4042503
## 
## $CI_upper_z
## [1] 0.7281714
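The shift can be made visible by computing the weights that GLS implicitly assigns to each correlation, \(\mathbf{w} = \mathbf{V}^{-1}\mathbf{1} / (\mathbf{1}^\top \mathbf{V}^{-1} \mathbf{1})\), so that the pooled z is \(\sum_i w_i z_i\). The helper gls_weights below is a hypothetical sketch, not part of the tool; it uses solve(V, b) to obtain \(\mathbf{V}^{-1}\mathbf{1}\) without forming the inverse.

```r
# Hypothetical helper: the GLS weights implied by the pooled-z formula,
# w = V^{-1} 1 / (1' V^{-1} 1), so that z_pooled = sum(w * z)
gls_weights <- function(n_vec, R_mat) {
  se <- sqrt(1 / (n_vec - 3))
  V <- outer(se, se) * R_mat                    # V_ij = se_i * se_j * R_ij
  v_inv_one <- solve(V, rep(1, length(n_vec)))  # V^{-1} 1
  v_inv_one / sum(v_inv_one)
}

r <- c(0.21, 0.45, 0.31, 0.60)
n <- c(50, 100, 80, 40)
cor.matrix <- matrix(c(
  1.0000000, 0.8486251, 0.4047249, 0.2656064,
  0.8486251, 1.0000000, 0.5038092, 0.1947458,
  0.4047249, 0.5038092, 1.0000000, 0.1061337,
  0.2656064, 0.1947458, 0.1061337, 1.0000000
), nrow = 4, byrow = TRUE)

w_null <- gls_weights(n, diag(4))    # equals (n - 3) / sum(n - 3)
w_dep  <- gls_weights(n, cor.matrix) # first weight is negative (about -0.32)

round(rbind(null = w_null, dependent = w_dep), 3)
sum(w_dep * atanh(r))                # about 0.566, the pooled_z reported above
```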

Within-study meta-analysis results with variables showing low inter-dependencies

When using a correlation matrix with very low inter-dependencies among the variables, we obtain a pooled estimate very similar to that from the null matrix: pooled r = 0.392, 95% CI [0.270, 0.502], with a slightly wider interval.

# Calculate pooled r using within-study meta-analytic tool (low inter-correlation matrix)
pooled_dep_correlation(r_vec = study_data$r, n_vec = study_data$n, R_mat = low.mx) # test with very low inter-dependencies

## $pooled_r
## [1] 0.3922
## 
## $effective_n
## [1] 270
## 
## $ci_lower
## [1] 0.2695029
## 
## $ci_upper
## [1] 0.5023667
## 
## $p_value
## [1] 4.0368e-09
## 
## $pooled_z
## [1] 0.4143973
## 
## $CI_lower_z
## [1] 0.2763277
## 
## $CI_upper_z
## [1] 0.5524668

Within-study meta-analysis results with variables showing high inter-dependencies

When using a correlation matrix with very high inter-dependencies among the variables, the pooled estimate (r = 0.359, 95% CI [0.192, 0.504]) moves only modestly from the null-matrix result, but the confidence interval is considerably wider, because highly inter-correlated variables contribute little independent information.

# Calculate pooled r using within-study meta-analytic tool (high inter-correlation matrix)
pooled_dep_correlation(r_vec = study_data$r, n_vec = study_data$n, R_mat = high.mx) # test with very high inter-dependencies

## $pooled_r
## [1] 0.3585721
## 
## $effective_n
## [1] 270
## 
## $ci_lower
## [1] 0.1924089
## 
## $ci_upper
## [1] 0.5047467
## 
## $p_value
## [1] 4.56686e-05
## 
## $pooled_z
## [1] 0.3752463
## 
## $CI_lower_z
## [1] 0.1948374
## 
## $CI_upper_z
## [1] 0.5556552
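The widening has a closed form in an idealized case that differs from the example: if all k correlations come from the same sample size n and every pair of focal variables correlates \(\rho\), then \(\mathbf{R}\mathbf{1} = (1 + (k-1)\rho)\mathbf{1}\), so \(Var(\hat{z}_{\text{pooled}}) = Var(z)\,(1 + (k-1)\rho)/k\). With k = 4 and \(\rho = 0.8\), the four correlations carry about as much information as 4/3.4, roughly 1.18, independent ones. The sketch below checks the closed form against the GLS computation; k, n and \(\rho\) are illustrative values.

```r
# Idealized case: k correlations, equal sample size n, common inter-correlation rho
k <- 4; n <- 100; rho <- 0.8
var_z <- 1 / (n - 3)

R <- matrix(rho, nrow = k, ncol = k)
diag(R) <- 1
V <- var_z * R

var_gls <- 1 / sum(solve(V, rep(1, k)))        # 1 / (1' V^-1 1), as in the GLS formula
var_closed <- var_z * (1 + (k - 1) * rho) / k  # closed form for this special case

all.equal(var_gls, var_closed)  # TRUE
k / (1 + (k - 1) * rho)         # about 1.18 independent correlations' worth of information
```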

Within-study meta-analysis results with variables showing some negative inter-dependencies

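When using a matrix with some negative inter-dependencies among the variables, we obtain a pooled r = 0.434, 95% CI [0.264, 0.577], somewhat higher than the null-matrix estimate and with a wider confidence interval. Note, however, that this particular matrix is not positive definite (the block for variables 1, 2 and 4 has a negative determinant), so it is not a valid correlation matrix; the result shows that the code runs with negative entries rather than providing a meaningful estimate.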

# Calculate pooled r using within-study meta-analytic tool (matrix with some negative inter-correlations)
pooled_dep_correlation(r_vec = study_data$r, n_vec = study_data$n, R_mat = test_neg.mx) # test with some negative inter-dependencies

## $pooled_r
## [1] 0.4335966
## 
## $effective_n
## [1] 270
## 
## $ci_lower
## [1] 0.2643392
## 
## $ci_upper
## [1] 0.5769423
## 
## $p_value
## [1] 2.576781e-06
## 
## $pooled_z
## [1] 0.4643176
## 
## $CI_lower_z
## [1] 0.2707679
## 
## $CI_upper_z
## [1] 0.6578673

Summary

This GLS-based within-study meta-analytic method is used to combine multiple correlation coefficients from a single study—accounting for the statistical dependency between them to produce a single pooled correlation estimate. It uses the generalized least squares (GLS) framework, weighting each correlation by its precision and accounting for their inter-correlations via a covariance matrix.

This tool is useful for summarizing correlations from the same study when the data share a common source (e.g., the same sensors, subjects/participants, raters, or respondents). Aggregating several similar correlations yields a more precise estimate than any single correlation, while the GLS weighting keeps that precision from being overstated.

Primary uses

Within-Study Aggregation
- Produces a pooled estimate when a study reports multiple effect sizes (e.g., correlations between a common variable and several outcomes).
- Converts a set of related correlation estimates into a single summary statistic.
Accurate Weighting with Dependency
- Accounts for the correlated sampling errors (non-independence) that occur when correlations share a common variable or sample.
- Avoids the overstated precision (overly narrow confidence intervals) and inappropriate weighting that result from naive averaging or unadjusted meta-analysis.

Common Use Cases

Use Case	Description
🔬 Psychology & Social Sciences	Studies often report multiple correlations (e.g., between different dimensions of a psychological scale and an outcome). This method aggregates them properly.
📊 Education Research	When a study reports effects for several related outcomes measured on the same students (e.g., math and reading scores correlated with the same variable), this method pools them into a single effect.
🧠 Neuroscience & Biology	Correlations between brain regions or genetic markers and outcomes often need to be pooled, accounting for dependencies.
📈 Multivariate Outcome Studies	When one predictor variable is correlated with several related outcomes (e.g., health indices), and you want to summarize the predictor’s overall association.

Advantages

🧠 Statistically principled: Accounts for inter-correlation of effect sizes
📉 Reduces bias: Prevents the artificial inflation of precision that results from treating dependent correlations as independent
🔄 Flexible: Handles missing off-diagonal entries in the correlation matrix by weighted-mean imputation, and accepts negative inter-correlations provided the matrix remains positive definite

Limitations

⚠️ Requires knowledge (or estimates) of the inter-correlations among the focal variables (i.e., you need access to the raw data to calculate the correlation matrix, or must input a reasonable assumed correlation value)
🔧 Assumes multivariate normality of transformed correlations (Fisher’s z)
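📈 Requires inverting a k × k covariance matrix, which becomes more costly as the number of effect sizes grows and can be numerically unstable when the matrix is close to singular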
🧩 Interpretation of pooled correlation may mask meaningful heterogeneity across outcomes