Definition

Define Numer of variables nvars and numer of samples n.

n <- 20 # number of samples
nvars <- 3 # number of variables

Define the correlation matrix sigma, with dimension nvars x nvars which represents the correlation between each variable.
Of course, the diagonal must be = 1.0.

# correlation matrix (symmetric!)
sigma <- matrix(rep(0, nvars*nvars), nrow = nvars, ncol = nvars) # uncorrelated
sigma[1,] <- c(1.0, 0.5, 0.0) 
sigma[2,] <- c(0.5, 1.0, 0.8)
sigma[3,] <- c(0.0, 0.8, 1.0)

 

Latin Hypercube Sampling

1. Build a LHS

If n is large, LHS funciton may not converge. Try increasing eps, at cost of reducing accuracy.

set.seed(123)
corrLHS <- pse::LHS(factors = nvars, N = n, method = "HL", opts = list(COR = sigma, eps = 0.05))

 

2. Extract probability matrix

XX <- pse::get.data(corrLHS)

head(XX)
##      I1    I2    I3
## 1 0.725 0.425 0.175
## 2 0.925 0.975 0.825
## 3 0.675 0.925 0.975
## 4 0.125 0.325 0.475
## 5 0.475 0.475 0.375
## 6 0.075 0.125 0.575
plot(XX)

All variables are in [0, 1] interval.

 

3. Build dataframe of variales

Note that can be any kind of distribution

df <- data.frame(var1 = qlnorm(XX[, 1], 10, 2),
                 var2 = qnorm(XX[, 2], 5, 0.5),
                 var3 = qunif(XX[, 3], 0, 1))

 

4. Check correlation

The correlation of sampled variables should be equal to sigma. Increasing eps (necessary for large number of samples) reduces the accuracy and therefore there may be difference between sigma and the sampled variables correlation.

round(cor(df, method = "spearman"), 2)
##      var1 var2 var3
## var1 1.00 0.50 0.02
## var2 0.50 1.00 0.77
## var3 0.02 0.77 1.00
sigma
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.0
## [2,]  0.5  1.0  0.8
## [3,]  0.0  0.8  1.0

 

5. Check graphic

df[ , 1] = log10(df[ , 1])
psych::pairs.panels(df, 
                    method = "spearman", # correlation method
                    hist.col = "#00AFBB",
                    density = TRUE,  # show density plots
                    ellipses = TRUE) # show correlation ellipses

 

Random Sampling

1. Build a copula

What you obtain in this step are navrs normal distributions, correlated between each other acocording to sigma.

# n <- 1000

set.seed(123)
copula <- MASS::mvrnorm(n, mu = rep(0, nvars), Sigma = sigma, empirical = TRUE)

par(mfrow = c(1, 3))
hist(copula[ ,1])
hist(copula[ ,2])
hist(copula[ ,3])

 

2. Extract probability matrix

XX <- pnorm(copula)

head(XX)
##           [,1]      [,2]       [,3]
## [1,] 0.1564713 0.6764229 0.76505670
## [2,] 0.3125947 0.5049848 0.59719605
## [3,] 0.8290174 0.9836215 0.95746657
## [4,] 0.7305729 0.1816715 0.02272921
## [5,] 0.6366200 0.3544741 0.14755066
## [6,] 0.8733880 0.9929314 0.95192462
plot(as.data.frame(XX))

All variables are in [0, 1] interval.

 

Following steps are the same as for the case of LHS.

 

3. Build dataframe of variales

Note that can be any kind of distribution

df <- data.frame(var1 = qlnorm(XX[, 1], 10, 2),
                 var2 = qnorm(XX[, 2], 5, 0.5),
                 var3 = qunif(XX[, 3], 0, 1))

 

4. Check correlation

The correlation of sampled variables should be equal to sigma. In this case, since the sampling is random, greater is the number of samples n, closer to sigma is the sampled variables correlation.

round(cor(df, method = "spearman"), 2)
##       var1 var2  var3
## var1  1.00 0.43 -0.08
## var2  0.43 1.00  0.75
## var3 -0.08 0.75  1.00
sigma
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.0
## [2,]  0.5  1.0  0.8
## [3,]  0.0  0.8  1.0

 

5. Check graphic

df[ , 1] = log10(df[ , 1])
psych::pairs.panels(df, 
                    method = "spearman", # correlation method
                    hist.col = "#00AFBB",
                    density = TRUE,  # show density plots
                    ellipses = TRUE) # show correlation ellipses