How does mean replicate correlation relate to signal strength?

Consider a perturbation whose treatment profiles are given by

\(\mathbf{x} \sim \mathcal{N}(\mathbf{\mu}, \sigma^2\mathbf{I})\)

Assume that the negative control is

\(\mathbf{c} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\)

so that normalization leaves the profiles unchanged (z-scoring with respect to the control is the identity transformation).
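As a quick illustration of this assumption, the sketch below z-scores an arbitrary profile using the per-feature mean and standard deviation estimated from simulated \(\mathcal{N}(\mathbf{0}, \mathbf{I})\) controls. The number of features, the control sample size, and the profile values are arbitrary choices for illustration, not values from this analysis.

```r
# Illustrative check (arbitrary sizes): z-scoring against N(0, I) controls
# leaves a profile approximately unchanged.
set.seed(3)
n_features <- 5
n_controls <- 10000
control <- matrix(rnorm(n_controls * n_features), n_controls, n_features)
profile <- c(3, -1, 0.5, 2, -4)  # an arbitrary treatment profile

# Per-feature control statistics used for z-scoring
ctrl_mean <- colMeans(control)
ctrl_sd <- apply(control, 2, sd)
z_scored <- (profile - ctrl_mean) / ctrl_sd

# Differences are small and shrink as n_controls grows
print(round(z_scored - profile, 3))
```

With estimated rather than exact control statistics the transformation is only approximately the identity; the residual differences come from sampling error in `ctrl_mean` and `ctrl_sd`.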

Consider \(n\) independent replicates \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\).

The signal strength of the perturbation is the magnitude of the average treatment profile, \(\left\|\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\right\|_2\), which approaches \(\|\mathbf{\mu}\|_2\) as \(n \rightarrow \infty\).
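To see why, here is a short derivation, taking \(\sigma\) as the per-feature standard deviation (as in the simulation code) and writing \(d\) for the number of features:

\[
\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i \sim \mathcal{N}\!\left(\mathbf{\mu}, \frac{\sigma^2}{n}\mathbf{I}\right)
\quad\Longrightarrow\quad
\mathbb{E}\left[\|\bar{\mathbf{x}}\|_2^2\right] = \|\mathbf{\mu}\|_2^2 + \frac{d\,\sigma^2}{n}.
\]

The second term vanishes as \(n \rightarrow \infty\), so for fixed \(d\) and \(\sigma\) the squared signal strength converges to \(\|\mathbf{\mu}\|_2^2\).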

The mean replicate correlation is given by \(\frac{2}{n(n-1)}\sum_{i > j}\operatorname{cor}(\mathbf{x}_i, \mathbf{x}_j)\).
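Both quantities can be wrapped in small helper functions. The names `compute_signal_strength` and `compute_mean_replicate_cor` are hypothetical (they are not used elsewhere in this analysis), and the sketch assumes replicates are stored one per row of an \(n \times d\) matrix, as in the simulation code.

```r
# Hypothetical helpers (names are illustrative). `replicates` is an n x d
# matrix with one replicate profile per row.
compute_signal_strength <- function(replicates) {
  # L2 norm of the average treatment profile
  sqrt(sum(colMeans(replicates)^2))
}

compute_mean_replicate_cor <- function(replicates) {
  # Pearson correlation between every pair of rows (replicates)
  cmat <- cor(t(replicates))
  # Average over the n(n-1)/2 distinct pairs (upper triangle, diagonal excluded)
  mean(cmat[upper.tri(cmat)])
}

# Toy example: three replicates with four features each
toy <- rbind(c(1, 2, 3, 4),
             c(2, 4, 6, 8),
             c(4, 3, 2, 1))
compute_signal_strength(toy)     # sqrt(420 / 9), about 6.83
compute_mean_replicate_cor(toy)  # (1 - 1 - 1) / 3 = -1/3
```

The correlation matrix is symmetric, so averaging its upper triangle (\(i < j\)) gives the same result as summing over \(i > j\).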

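generate_treatment_replicates <- function(mu, sigma, n) {
  # n x length(mu) matrix of i.i.d. N(0, sigma^2) noise, one replicate per row
  noise <- matrix(rnorm(length(mu) * n), n, length(mu)) * sigma
  # add mu to every row (column j is shifted by mu[j])
  sweep(noise, 2, mu, FUN = "+")
}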

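cv <- .1 # coef. of variation (noise SD relative to mu_factor)

d <- 1000 # dimensions

mu_factor <- 10 # scale of the entries of mu

n <- 1000 # replicates

sigma <- cv * mu_factor # per-feature noise standard deviation

mu <- rnorm(d) * mu_factor # true mean profile, entries ~ N(0, mu_factor^2)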

treatment <- generate_treatment_replicates(mu, sigma, n)

Compute signal strength

sig_strength <- sqrt(sum(colMeans(treatment)^2))

print(sig_strength)
## [1] 324.5828

Is \(\|\mathbf{\mu}\|_{2}\) a good approximation of signal strength?

print(sig_strength / sqrt(sum(mu^2)))
## [1] 1.000062

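So, for large \(n\), the signal strength is essentially independent of \(\sigma\): the noise only adds \(d\sigma^2/n\) to the expected squared norm of the average profile.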
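A minimal sketch to probe this further, assuming the same Gaussian model: it varies \(\sigma\) at a deliberately small \(n\) and compares the empirical signal strength with \(\sqrt{\|\mathbf{\mu}\|_2^2 + d\sigma^2/n}\), the square root of the expected squared norm. The helper `signal_strength_vs_sigma`, the seed, and the sizes are hypothetical choices for illustration.

```r
# Hypothetical helper (illustration only): simulate n_reps replicates from
# N(mu_vec, sigma_val^2 I) and compare the empirical signal strength with
# sqrt(E||x_bar||^2) = sqrt(||mu||^2 + d * sigma^2 / n).
signal_strength_vs_sigma <- function(mu_vec, sigma_val, n_reps) {
  d_dim <- length(mu_vec)
  # row i = mu_vec + i.i.d. Gaussian noise with standard deviation sigma_val
  reps <- matrix(rnorm(n_reps * d_dim, sd = sigma_val), n_reps, d_dim) +
    rep(mu_vec, each = n_reps)
  c(empirical = sqrt(sum(colMeans(reps)^2)),
    predicted = sqrt(sum(mu_vec^2) + d_dim * sigma_val^2 / n_reps),
    norm_mu   = sqrt(sum(mu_vec^2)))
}

set.seed(42)
mu_demo <- rnorm(200) * 10   # d = 200 features, entries on the scale of 10
sigma_grid <- c(0.5, 2, 8)
# Small n_reps makes the d * sigma^2 / n term visible
strength_table <- sapply(sigma_grid,
                         function(s) signal_strength_vs_sigma(mu_demo, s, n_reps = 50))
colnames(strength_table) <- paste0("sigma=", sigma_grid)
print(round(strength_table, 2))
```

Since the squared norm of \(\mathbf{\mu}\) is about \(d\) times mu_factor squared, the noise term is only a fraction of roughly \(\text{cv}^2/n\) of the squared signal; for the settings above (cv = 0.1, n = 1000) that is about \(10^{-5}\).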

How does \(\|\mathbf{\mu}\|_2\) relate to the multiplicative factor mu_factor?

print(mu_factor / (sqrt(sum(mu^2)) / sqrt(d))) # mu_factor relative to the RMS entry of mu
## [1] 0.9743202
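The printed ratio is close to 1, as expected: each entry of \(\mathbf{\mu}\) is mu_factor times a standard normal draw, so, writing \(f\) for mu_factor,

\[
\mathbb{E}\left[\|\mathbf{\mu}\|_2^2\right] = d f^2
\quad\Longrightarrow\quad
\frac{\|\mathbf{\mu}\|_2}{\sqrt{d}} \approx f.
\]

More precisely, \(\|\mathbf{\mu}\|_2^2 / f^2\) follows a \(\chi^2_d\) distribution, so the ratio \(f\sqrt{d}/\|\mathbf{\mu}\|_2\) fluctuates around 1 with a standard deviation of roughly \(1/\sqrt{2d} \approx 0.022\) for \(d = 1000\).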

The mean replicate correlation likely depends on the coefficient of variation; this needs to be tested.
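A back-of-the-envelope derivation suggests what to expect. It assumes \(d\) is large enough that sample moments across features are close to their population values, and writes \(f\) for mu_factor, so the entries of \(\mathbf{\mu}\) have variance \(f^2\) and \(\sigma = \text{cv}\cdot f\). For two replicates \(\mathbf{x}_i = \mathbf{\mu} + \mathbf{e}_i\) and \(\mathbf{x}_j = \mathbf{\mu} + \mathbf{e}_j\), the noise vectors are independent of \(\mathbf{\mu}\) and of each other, so across features the replicates share the variance \(f^2\) but not the noise variance \(\sigma^2\):

\[
\operatorname{cor}(\mathbf{x}_i, \mathbf{x}_j) \approx \frac{f^2}{f^2 + \sigma^2} = \frac{1}{1 + \text{cv}^2}
\approx \frac{\|\mathbf{\mu}\|_2^2}{\|\mathbf{\mu}\|_2^2 + d\,\sigma^2}.
\]

For cv = 0.1 this gives \(1/1.01 \approx 0.990\). The last form, which additionally assumes the entries of \(\mathbf{\mu}\) average to roughly zero, expresses the mean replicate correlation through the squared signal strength \(\|\mathbf{\mu}\|_2^2\) relative to the total noise variance \(d\sigma^2\).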

cmat <- cor(t(treatment))

mean_replicate_correlation <- mean(cmat[upper.tri(cmat)])

print(mean_replicate_correlation)
## [1] 0.9906107
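One way to start testing the dependence on the coefficient of variation is to repeat the simulation over a grid of cv values and compare the mean replicate correlation with \(1/(1+\text{cv}^2)\). That value is the approximate large-\(d\) correlation under this Gaussian model, because two replicates share the variance of \(\mathbf{\mu}\) across features (\(f^2\), with \(f\) = mu_factor) but not their noise (\(\sigma^2 = \text{cv}^2 f^2\)). The helper `mean_rep_cor_for_cv`, the cv grid, the seed, and the sizes are hypothetical choices made to keep the sketch fast.

```r
# Hypothetical sweep: mean replicate correlation vs. 1 / (1 + cv^2).
# Sizes (d_dim, n_reps) and the scale f are illustrative, not the values above.
mean_rep_cor_for_cv <- function(cv_val, d_dim = 2000, n_reps = 20, f = 10) {
  mu_vec <- rnorm(d_dim) * f   # entries of mu have variance f^2
  sigma_val <- cv_val * f      # noise SD implied by the coefficient of variation
  reps <- matrix(rnorm(n_reps * d_dim, sd = sigma_val), n_reps, d_dim) +
    rep(mu_vec, each = n_reps)
  cmat <- cor(t(reps))
  mean(cmat[upper.tri(cmat)])  # average over distinct replicate pairs
}

set.seed(7)
cv_grid <- c(0.1, 0.5, 1, 2)
comparison <- data.frame(
  cv = cv_grid,
  empirical = sapply(cv_grid, mean_rep_cor_for_cv),
  predicted = 1 / (1 + cv_grid^2)
)
print(comparison)
```

This only checks the approximation within the Gaussian model itself; it says nothing about whether real profiles follow that model.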

We should also test this out empirically once again. It's likely that the Gaussian model is not appropriate, in that we'd almost never have high \(\mu\) but low \(\sigma\).