This page describes how to generate samples from the multivariate Gaussian distribution.
The multivariate Gaussian distribution, also known as the multivariate normal (MVN) distribution, is a generalization of the one-dimensional normal distribution to higher dimensions. It is characterized by a mean vector and a covariance matrix, which define the distribution’s central tendency and the relationships between its variables, respectively. This distribution is of great interest in many applied fields of science and engineering because of its tractable mathematical properties and the central limit theorem, which implies that sums of many independent random variables tend toward a Gaussian distribution. It is commonly used in fields such as machine learning, signal processing, finance, and statistical inference for modeling and analyzing multidimensional data, and it facilitates the understanding and prediction of complex systems involving multiple interrelated variables.
The mathematical form of the multivariate Gaussian distribution is given by
\begin{equation} f(\mathbf{x} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{k/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right), \end{equation}
where:
\(\mathbf{x}\) is a \(k\)-dimensional random vector, \(\boldsymbol{\mu}\) is the \(k\)-dimensional mean vector, \(\boldsymbol{\Sigma}\) is the \(k \times k\) covariance matrix with determinant \(|\boldsymbol{\Sigma}|\) and inverse \(\boldsymbol{\Sigma}^{-1}\), and \(k\) is the number of dimensions.
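As a quick sanity check on the formula, the density can be evaluated directly in R and compared with mvtnorm::dmvnorm (the same package used for sampling later on this page). This is only a minimal sketch; the helper name dmvn_manual and the example values of mu, Sigma and x are arbitrary.
dmvn_manual <- function(x, mu, Sigma) {
  # direct transcription of the density formula above
  k <- length(mu)
  d <- x - mu
  as.numeric(exp(-0.5 * t(d) %*% solve(Sigma) %*% d) /
               ((2 * pi)^(k / 2) * sqrt(det(Sigma))))
}
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
x     <- c(0.3, -0.2)
dmvn_manual(x, mu, Sigma)                     # manual evaluation
mvtnorm::dmvnorm(x, mean = mu, sigma = Sigma) # should give the same value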
A special case (\(k=2\)) is the bivariate Gaussian distribution, with probability density function given by:
\begin{equation} f(x_1, x_2 | \mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} - \frac{2\rho (x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} \right] \right), \end{equation}
where:
\(x_1\) and \(x_2\) are the random variables, \(\mu_1\) and \(\mu_2\) are the means of \(x_1\) and \(x_2\), respectively, \(\sigma_1\) and \(\sigma_2\) are the standard deviations of \(x_1\) and \(x_2\), respectively, and \(\rho\) is the correlation coefficient between \(x_1\) and \(x_2\).
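The bivariate formula is simply the general density with \(\boldsymbol{\Sigma}\) built from \(\sigma_1\), \(\sigma_2\) and \(\rho\). A short sketch of this equivalence follows; the helper name dbvn and the example parameter values are illustrative only.
dbvn <- function(x1, x2, mu1, mu2, s1, s2, rho) {
  # direct transcription of the bivariate density formula above
  q <- (x1 - mu1)^2 / s1^2 + (x2 - mu2)^2 / s2^2 -
    2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2)
  exp(-q / (2 * (1 - rho^2))) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}
s1 <- 1; s2 <- 2; rho <- 0.5
Sigma2 <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), 2, 2)
dbvn(0.3, -0.2, 0, 0, s1, s2, rho)                        # bivariate formula
mvtnorm::dmvnorm(c(0.3, -0.2), mean = c(0, 0), sigma = Sigma2) # same value from the general form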
When generating samples from an MVN distribution, we need to specify the mean vector \(\boldsymbol{\mu}\) of size \(k\) and the covariance matrix \(\boldsymbol{\Sigma}\) of size \(k \times k\). MVN samples are then obtained via the following steps: (1) specify \(\boldsymbol{\Sigma}\) and compute its Cholesky factorization \(\boldsymbol{\Sigma} = \mathbf{R}^\top \mathbf{R}\); (2) generate a \(k \times n\) matrix \(\mathbf{N}\) of independent standard normal draws; (3) form \(\mathbf{X} = \mathbf{R}^\top \mathbf{N}\), whose columns are samples with covariance \(\boldsymbol{\Sigma}\) (for a nonzero mean, add \(\boldsymbol{\mu}\) to each column).
k <- 5
n <- 1000
# Step 1: specify Sigma and compute its Cholesky factor
Sigma <- toeplitz(0.5^(0:(k-1))) # Toeplitz matrix with entries 0.5^|i-j|
print('Actual Sigma')
## [1] "Actual Sigma"
print(round(Sigma,4))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0000 0.500 0.25 0.125 0.0625
## [2,] 0.5000 1.000 0.50 0.250 0.1250
## [3,] 0.2500 0.500 1.00 0.500 0.2500
## [4,] 0.1250 0.250 0.50 1.000 0.5000
## [5,] 0.0625 0.125 0.25 0.500 1.0000
R <- chol(Sigma) # upper-triangular factor with Sigma = t(R) %*% R
# Step 2: k x n matrix of independent N(0, 1) draws, one sample per column
N <- matrix(rnorm(n*k), k)
# Step 3: correlate the draws; each column of X is a sample from N(0, Sigma)
X <- t(R) %*% N
# Verify that the sample covariance approaches Sigma for large n (the mean is zero here)
Sigma_estimated <- X %*% t(X) / n
cat('Estimated Sigma\n')
## Estimated Sigma
print(round(Sigma_estimated,4))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.9645 0.4818 0.2470 0.0820 0.0326
## [2,] 0.4818 0.9853 0.4963 0.2071 0.1213
## [3,] 0.2470 0.4963 0.9993 0.4510 0.1923
## [4,] 0.0820 0.2071 0.4510 0.9609 0.4395
## [5,] 0.0326 0.1213 0.1923 0.4395 1.0125
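The Cholesky recipe above produces samples centred at zero. For a nonzero mean, \(\boldsymbol{\mu}\) is simply added to every column of X; a minimal sketch, where the example mean vector 1:k is arbitrary:
mu_example <- 1:k
X_shifted  <- X + mu_example # mu_example is recycled down each column of the k x n matrix
rowMeans(X_shifted)          # should be close to 1, 2, ..., k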
The same samples can also be drawn directly with the mvtnorm or MASS packages:
k <- 5
n <- 1000
Sigma <- toeplitz(0.5^(0:(k-1))) # same Toeplitz covariance as above
mu <- rep(0, k)                  # zero mean vector
cat('Actual Sigma\n')
## Actual Sigma
print(round(Sigma,4))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0000 0.500 0.25 0.125 0.0625
## [2,] 0.5000 1.000 0.50 0.250 0.1250
## [3,] 0.2500 0.500 1.00 0.500 0.2500
## [4,] 0.1250 0.250 0.50 1.000 0.5000
## [5,] 0.0625 0.125 0.25 0.500 1.0000
# Using mvtnorm: rmvnorm returns an n x k matrix with one sample per row
X <- mvtnorm::rmvnorm(n, mean = mu, sigma = Sigma)
Sigma_estimated <- t(X) %*% X / n # valid estimator here because mu is zero
cat('Estimated Sigma using mvtnorm \n')
## Estimated Sigma using mvtnorm
print(round(Sigma_estimated,4))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0758 0.5105 0.2818 0.1359 0.1026
## [2,] 0.5105 0.9609 0.4963 0.2340 0.1682
## [3,] 0.2818 0.4963 0.9524 0.4959 0.2778
## [4,] 0.1359 0.2340 0.4959 1.0482 0.5288
## [5,] 0.1026 0.1682 0.2778 0.5288 0.9816
# Using MASS: mvrnorm likewise returns an n x k matrix of samples
X <- MASS::mvrnorm(n, mu = mu, Sigma = Sigma)
Sigma_estimated <- t(X) %*% X / n
cat('Estimated Sigma using MASS \n')
## Estimated Sigma using MASS
print(round(Sigma_estimated,4))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.9795 0.5320 0.2486 0.1480 0.0445
## [2,] 0.5320 1.0329 0.4982 0.2782 0.0985
## [3,] 0.2486 0.4982 0.9527 0.4973 0.2393
## [4,] 0.1480 0.2782 0.4973 0.9958 0.4787
## [5,] 0.0445 0.0985 0.2393 0.4787 0.9585
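Because the mean here is the zero vector, t(X) %*% X / n is a valid estimator; in general, the mean-corrected sample covariance from base R's cov() is the safer check, as in this short sketch:
Sigma_estimated <- cov(X) # mean-corrected sample covariance; close to Sigma for large n
print(round(Sigma_estimated, 4))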