How does mean replicate correlation relate to signal strength?
Consider a perturbation whose treatment profiles are given by
\(\mathbf{x} \sim \mathcal{N}(\mathbf{\mu}, \sigma^2\mathbf{I})\)
Assume that the negative control is
\(\mathbf{c} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\)
so that the profiles remain unchanged after normalization (z-scoring with respect to the control is an identity transformation).
Consider \(n\) replicates \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\).
Now the signal strength of the perturbation is given by the magnitude of the average treatment profile, which is \(\|\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\|_2\) and approaches \(\|\mathbf{\mu}\|_{2}\) as \(n \rightarrow \infty\).
The mean replicate correlation is given by \(\frac{1}{n(n-1)/2}\sum_{i > j}\operatorname{cor}(\mathbf{x}_i, \mathbf{x}_j)\).
generate_treatment_replicates <- function(mu, sigma, n) {
  # n x length(mu) matrix of N(0, sigma^2) noise; mu is then added to every row
  noise <- matrix(rnorm(length(mu) * n), n, length(mu)) * sigma
  sweep(noise, 2, mu, "+")
}
cv <- .1 # coef. of variation
d <- 1000 # dimensions
mu_factor <- 10
n <- 1000 # replicates
sigma <- cv * mu_factor
mu <- rnorm(d) * mu_factor
treatment <- generate_treatment_replicates(mu, sigma, n)
Compute signal strength
sig_strength <- sqrt(sum(colMeans(treatment)^2))
print(sig_strength)
## [1] 324.5828
Is \(\|\mathbf{\mu}\|_{2}\) a good approximation of signal strength?
print(sig_strength / sqrt(sum(mu^2)))
## [1] 1.000062
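So, for large \(n\), signal strength is effectively independent of \(\sigma\).
How does \(\mathbf{\mu}\) relate to the multiplicative factor? The root mean square of its entries, \(\|\mathbf{\mu}\|_{2}/\sqrt{d}\), should be close to mu_factor.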
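A short derivation makes the dependence on \(n\) explicit. It is a sketch under the Gaussian model above, treating \(\sigma\) as the per-dimension standard deviation (as in the code, where the noise is rnorm(...) * sigma) and writing \(\bar{\mathbf{x}}\) for the average of the \(n\) replicates:

\[
\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i \sim \mathcal{N}\left(\mathbf{\mu}, \frac{\sigma^2}{n}\mathbf{I}\right),
\qquad
\mathbb{E}\,\|\bar{\mathbf{x}}\|_2^2 = \|\mathbf{\mu}\|_2^2 + \frac{d\,\sigma^2}{n}
\]

With \(d = 1000\), \(\sigma = 1\) and \(n = 1000\), the noise term \(d\sigma^2/n\) equals 1, against \(\|\mathbf{\mu}\|_2^2 \approx 1.05 \times 10^5\). So \(\sigma\) only starts to matter for signal strength once \(d\sigma^2/n\) becomes comparable to \(\|\mathbf{\mu}\|_2^2\), for example with few replicates and a weak perturbation.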
print((mu_factor) / (sqrt(sum(mu^2)) / sqrt(d)))
## [1] 0.9743202
Mean replicate correlation should likely depend on the coefficient of variation; test it out:
cmat <- cor(t(treatment))
mean_replicate_correlation <- mean(cmat[upper.tri(cmat)])
print(mean_replicate_correlation)
## [1] 0.9906107
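The value above is in line with what the Gaussian model suggests. As a rough approximation (not an exact result): when two replicates are correlated across the \(d\) dimensions, the shared \(\mathbf{\mu}\) contributes a variance of about var(mu) and the independent noise contributes \(\sigma^2\), so the correlation should be close to var(mu) / (var(mu) + sigma^2). That is roughly \(1/(1 + \mathrm{cv}^2)\) when the entries of \(\mathbf{\mu}\) have standard deviation mu_factor. The sketch below sweeps cv to check this. The helper mean_replicate_cor, the grid cvs, the seed, and the reduced n = 50 (chosen to keep cor() cheap) are assumptions introduced here for illustration.

```r
# Sketch: mean replicate correlation as a function of the coefficient of variation.
set.seed(1)  # arbitrary seed, only for reproducibility

d <- 1000         # dimensions
n <- 50           # replicates (fewer than above, to keep cor() cheap)
mu_factor <- 10

mu <- rnorm(d) * mu_factor

# Hypothetical helper: simulate n replicates at a given cv and return
# the mean pairwise correlation between replicates.
mean_replicate_cor <- function(cv) {
  sigma <- cv * mu_factor
  # n x d matrix: each row is mu plus isotropic Gaussian noise with sd sigma
  treatment <- sweep(matrix(rnorm(n * d), n, d) * sigma, 2, mu, "+")
  cmat <- cor(t(treatment))
  mean(cmat[upper.tri(cmat)])
}

cvs <- c(0.1, 0.5, 1, 2)
observed <- sapply(cvs, mean_replicate_cor)

# Approximation: shared-signal variance over total variance per dimension
sigmas <- cvs * mu_factor
predicted <- var(mu) / (var(mu) + sigmas^2)

print(data.frame(cv = cvs, observed = observed, predicted = predicted))
```

If the approximation holds, the observed column should track the predicted one and fall as cv grows, which would mean that mean replicate correlation is governed by the ratio \(\sigma / \text{mu\_factor}\) rather than by signal strength alone.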
We should also test this out empirically once again. It's likely that the Gaussian model is not appropriate, in that we’d almost never have high \(\mu\) but low \(\sigma\).