For simplicity, it is common to assume that the correlation between any two of the random variables is the same, \(\rho_{ij} = \rho\) for all \(i \neq j\), \(i,j=1,\dots, d\). That is, the correlation matrix is:
\[ R = \left(\begin{array}{cccc} 1 & \rho & \dots & \rho\\ \rho & 1 & \dots & \rho\\ \vdots & \vdots & \ddots & \vdots\\ \rho & \rho & \dots & 1\\ \end{array}\right) \] A natural question is: when is this matrix positive definite?
A solution:
The characteristic polynomial of \(R\) is (Hoffman and Kunze 1971)
\[ P(\lambda) = \operatorname{det} \left( R - \lambda I_d \right). \] Clearly, one root of this polynomial is \(\lambda = 1-\rho\): for this value, \(R - \lambda I_d\) has all entries equal to \(\rho\), so its determinant is zero. Hence \(\lambda_1 = 1-\rho\) is an eigenvalue (and it is positive whenever \(\rho < 1\)).
Moreover, the geometric multiplicity of \(\lambda_1\) is:
\[ \gamma _{R}(\lambda_1 )=d-\operatorname {rank} (R - \lambda_1 I_d).\] Since \(R - \lambda_1 I_d\) is a matrix with all entries equal to \(\rho\), we have \(\operatorname {rank} (R - \lambda_1 I_d) = 1\) (assuming \(\rho \neq 0\)). Consequently, the multiplicity of \(\lambda_1\) is \(d-1\), and there is only one more eigenvalue, \(\lambda_2\). We can find it by using the trace property:
\[ \begin{aligned} d &= \operatorname {tr} (R)=\sum _{i=1}^{d}R_{ii}=\sum _{i=1}^{d}\lambda _{i}= (d-1)\lambda _{1}+\lambda _{2} \\ &= (d-1)(1-\rho) + \lambda_2. \end{aligned} \] Therefore, \(\lambda_2 = d - (d-1)(1-\rho) = 1 + (d-1)\rho\). For \(\lambda_2\) to be positive, we need \(1 + (d-1)\rho > 0\), which is equivalent to the condition:
\[ \rho > -\dfrac{1}{d-1}. \] So, when you simulate (e.g. normal) random variables with correlation matrix \(R\), you need to take this condition into account. Note that the condition applies to non-normal variables as well, since it is a property of the correlation matrix itself, not of the distribution.
For \(d = 2\), this only requires \(\rho > -1\); but for \(d = 3\), you already need \(\rho > -0.5\); and as \(d \to \infty\) the bound tends to \(0\), so only \(\rho \geq 0\) remains admissible in the limit.
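As a quick numerical check of this result (a sketch added here; the values of d and rho below are arbitrary), we can compare the eigenvalues returned by eigen() with the closed-form values \(1-\rho\) and \(1+(d-1)\rho\):
# Quick check of the eigenvalue formulas (illustrative values)
d_chk   <- 5
rho_chk <- 0.3
R_chk <- diag(d_chk) + rho_chk*(matrix(1, nrow = d_chk, ncol = d_chk) - diag(d_chk))
# eigen() should return 1 + (d-1)*rho once and 1 - rho with multiplicity d-1
eigen(R_chk, only.values = TRUE)$values
c(1 + (d_chk - 1)*rho_chk, rep(1 - rho_chk, d_chk - 1))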
# dimension
d <- 10
# Common correlation: -0.25 violates rho > -1/(d-1) = -1/9
rho0 <- -0.25
# Equicorrelation matrix
RHO <- diag(d) + rho0*(matrix(1, nrow = d, ncol = d) - diag(d))
print(RHO)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1.00 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25
## [2,] -0.25 1.00 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25
## [3,] -0.25 -0.25 1.00 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25
## [4,] -0.25 -0.25 -0.25 1.00 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25
## [5,] -0.25 -0.25 -0.25 -0.25 1.00 -0.25 -0.25 -0.25 -0.25 -0.25
## [6,] -0.25 -0.25 -0.25 -0.25 -0.25 1.00 -0.25 -0.25 -0.25 -0.25
## [7,] -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 1.00 -0.25 -0.25 -0.25
## [8,] -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 1.00 -0.25 -0.25
## [9,] -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 1.00 -0.25
## [10,] -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 -0.25 1.00
# Determinant is negative: RHO is not positive definite!
det(RHO)
## [1] -9.313226
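The theory above identifies the offending eigenvalue explicitly: \(\lambda_2 = 1 + (d-1)\rho = 1 + 9\times(-0.25) = -1.25\). A quick check (added here for illustration):
# The smallest eigenvalue equals 1 + (d-1)*rho0 = -1.25
min(eigen(RHO, only.values = TRUE)$values)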
# So, if you try to use this correlation matrix with rmvnorm:
library(mvtnorm) # provides rmvnorm
set.seed(123) # seed
n <- 1000 # sample size (not specified above; chosen arbitrarily)
X <- rmvnorm(n = n, mean = rep(0, d), sigma = RHO)
## Warning in rmvnorm(n = n, mean = rep(0, d), sigma = RHO): sigma is numerically
## not positive semidefinite
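If you need a usable matrix close to this one, a common workaround (not part of the original example) is to project onto the nearest positive-definite correlation matrix, for instance with nearPD() from the Matrix package. A minimal sketch; note that the projection changes the entries, so the exact common-correlation structure is lost:
library(Matrix) # provides nearPD
# Nearest positive-definite correlation matrix to RHO
RHO_pd <- as.matrix(nearPD(RHO, corr = TRUE)$mat)
# All eigenvalues are now nonnegative (up to numerical tolerance)
min(eigen(RHO_pd, only.values = TRUE)$values)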
# dimension
d <- 10
# Common correlation: -0.1 satisfies rho > -1/(d-1) = -1/9
rho0 <- -0.1
# Equicorrelation matrix
RHO <- diag(d) + rho0*(matrix(1, nrow = d, ncol = d) - diag(d))
print(RHO)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1.0 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1
## [2,] -0.1 1.0 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1
## [3,] -0.1 -0.1 1.0 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1
## [4,] -0.1 -0.1 -0.1 1.0 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1
## [5,] -0.1 -0.1 -0.1 -0.1 1.0 -0.1 -0.1 -0.1 -0.1 -0.1
## [6,] -0.1 -0.1 -0.1 -0.1 -0.1 1.0 -0.1 -0.1 -0.1 -0.1
## [7,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 1.0 -0.1 -0.1 -0.1
## [8,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 1.0 -0.1 -0.1
## [9,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 1.0 -0.1
## [10,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 1.0
# Determinant is positive (all eigenvalues are positive, so RHO is positive definite)
det(RHO)
## [1] 0.2357948
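With this valid correlation matrix, rmvnorm runs without warnings. As a quick sanity check (a sketch; the sample size is arbitrary), the off-diagonal sample correlations should be close to \(-0.1\):
library(mvtnorm)
set.seed(123)
n <- 5000 # arbitrary sample size for illustration
X <- rmvnorm(n = n, mean = rep(0, d), sigma = RHO)
# Off-diagonal entries of the sample correlation matrix should be near -0.1
round(cor(X)[1:4, 1:4], 2)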
Hoffman, Kenneth, and Ray Kunze. 1971. Linear Algebra. 2nd ed. Englewood Cliffs, NJ: Prentice-Hall.