Correlated Variable Sampling

Definition

Define the number of variables, nvars, and the number of samples, n.

n <- 20 # number of samples
nvars <- 3 # number of variables

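Define the correlation matrix sigma, of dimension nvars x nvars, whose element [i, j] is the correlation between variables i and j.
It must be symmetric, its diagonal must be 1.0, and it must be positive semi-definite; otherwise no set of variables can have these correlations.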

# correlation matrix (symmetric!)
sigma <- matrix(0, nrow = nvars, ncol = nvars) # start from an all-zero matrix, then fill it row by row
sigma[1,] <- c(1.0, 0.5, 0.0) 
sigma[2,] <- c(0.5, 1.0, 0.8)
sigma[3,] <- c(0.0, 0.8, 1.0)
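Before sampling, it is worth checking that sigma really is a usable correlation matrix: symmetric, with a unit diagonal, and positive definite (strictly, semi-definite is enough for the correlations to exist, but a positive definite matrix avoids degenerate samples). A minimal base-R sketch; the helper name `check_corr_matrix` and its tolerance are hypothetical, not part of any package.

```r
# Hypothetical helper: validate a candidate correlation matrix
check_corr_matrix <- function(m, tol = 1e-8) {
  ok_symmetric <- isSymmetric(m)
  ok_diagonal  <- all(abs(diag(m) - 1) < tol)
  # Positive definite <=> every eigenvalue is strictly positive
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  ok_posdef <- all(ev > tol)
  c(symmetric = ok_symmetric, unit_diagonal = ok_diagonal,
    positive_definite = ok_posdef)
}

sigma <- matrix(c(1.0, 0.5, 0.0,
                  0.5, 1.0, 0.8,
                  0.0, 0.8, 1.0), nrow = 3, byrow = TRUE)
check_corr_matrix(sigma)        # all three checks should be TRUE

# A symmetric matrix with unit diagonal can still be invalid:
bad <- sigma
bad[1, 3] <- bad[3, 1] <- -0.9  # inconsistent with 0.5 and 0.8 on the other pairs
check_corr_matrix(bad)          # positive_definite should be FALSE
```

For this sigma the determinant is 0.36 - 0.25 = 0.11 > 0 and the leading minors (1, 0.75, 0.11) are all positive, which is another way to confirm positive definiteness by hand.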

Latin Hypercube Sampling

1. Build an LHS

If n is large, the LHS function may not converge. In that case, try increasing eps, at the cost of a less accurate match to sigma.

set.seed(123)
corrLHS <- pse::LHS(factors = nvars, N = n, method = "HL", opts = list(COR = sigma, eps = 0.05))
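For intuition about the kind of design pse::LHS returns: in a Latin hypercube of n samples, each variable's [0, 1] range is cut into n equal strata, every stratum is used exactly once, and strata are paired across variables at random. The sketch below builds such a design in base R *without* imposing any correlation. It is not pse's implementation; as far as I understand, the "HL" method additionally rearranges values within each column to approach the target correlation COR. The function name `simple_lhs` is hypothetical.

```r
# Hypothetical helper: plain (uncorrelated) Latin hypercube on [0, 1]^nvars
simple_lhs <- function(n, nvars) {
  sapply(seq_len(nvars), function(j) {
    strata <- sample(n)          # random permutation of 1..n: one stratum per sample
    (strata - runif(n)) / n      # random point inside stratum ((k - 1)/n, k/n)
  })
}

set.seed(123)
X <- simple_lhs(n = 20, nvars = 3)
dim(X)                           # 20 rows (samples), 3 columns (variables)
```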

2. Extract probability matrix

XX <- pse::get.data(corrLHS)

head(XX)

##      I1    I2    I3
## 1 0.725 0.425 0.175
## 2 0.925 0.975 0.825
## 3 0.675 0.925 0.975
## 4 0.125 0.325 0.475
## 5 0.475 0.475 0.375
## 6 0.075 0.125 0.575

plot(XX)

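All variables lie in the [0, 1] interval: at this stage each value is a probability (a uniform quantile), not yet a draw from the target distribution.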
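The rows of XX printed above appear to sit at stratum midpoints, (k - 0.5)/n (with n = 20, for example, 0.725 = 14.5/20). Whatever the exact placement, a Latin hypercube should use each of the n strata exactly once per column, and that property is easy to check. The helper `is_latin_hypercube` is hypothetical; with the real design you would call it as `is_latin_hypercube(XX)`. Here it is demonstrated on a synthetic midpoint design so the block runs on its own.

```r
# Hypothetical helper: TRUE if every column uses each of the n strata exactly once
is_latin_hypercube <- function(X) {
  X <- as.matrix(X)                        # also accepts a data frame such as XX
  n <- nrow(X)
  all(apply(X, 2, function(col) {
    stratum <- pmin(ceiling(col * n), n)   # stratum index 1..n of each value
    all(sort(stratum) == seq_len(n))
  }))
}

# Synthetic midpoint design, n = 20 samples, 3 variables
set.seed(1)
n <- 20
midpoints <- sapply(1:3, function(j) (sample(n) - 0.5) / n)
is_latin_hypercube(midpoints)              # should be TRUE

broken <- midpoints
broken[2, 1] <- broken[1, 1]               # reuse one stratum in column 1
is_latin_hypercube(broken)                 # should be FALSE
```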

3. Build a data frame of variables

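Each variable can follow any distribution: apply the corresponding quantile function (qlnorm, qnorm, qunif, and so on) to the matching column of probabilities in XX.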

df <- data.frame(var1 = qlnorm(XX[, 1], 10, 2),
                 var2 = qnorm(XX[, 2], 5, 0.5),
                 var3 = qunif(XX[, 3], 0, 1))
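Step 3 works for any marginal distribution because every quantile function used here is strictly increasing: it keeps the ranks of each column, and therefore the Spearman correlation between columns, unchanged, while the Pearson correlation generally changes. A self-contained check, using a hypothetical matrix of correlated probabilities U as a stand-in for XX:

```r
set.seed(42)
n <- 200
# Stand-in for XX: correlated probabilities built from two correlated normals
z1 <- rnorm(n)
z2 <- 0.8 * z1 + sqrt(1 - 0.8^2) * rnorm(n)
U  <- cbind(pnorm(z1), pnorm(z2))

# Marginal transforms in the style of step 3
V <- cbind(qlnorm(U[, 1], 10, 2), qnorm(U[, 2], 5, 0.5))

cor(U, method = "spearman")[1, 2]
cor(V, method = "spearman")[1, 2]   # same value: the ranks did not change
cor(U)[1, 2]
cor(V)[1, 2]                        # Pearson: generally a different value
```

This is also why the correlation check in the next step uses method = "spearman".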

4. Check correlation

The Spearman (rank) correlation of the sampled variables should be close to sigma. Increasing eps (necessary for a large number of samples) relaxes the tolerance, so the sampled correlation may then differ noticeably from sigma. In the run below, the largest gap is for the var2/var3 pair (0.77 against 0.8).

round(cor(df, method = "spearman"), 2)

##      var1 var2 var3
## var1 1.00 0.50 0.02
## var2 0.50 1.00 0.77
## var3 0.02 0.77 1.00

sigma

##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.0
## [2,]  0.5  1.0  0.8
## [3,]  0.0  0.8  1.0
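To put a single number on "close to sigma", one option is the largest absolute difference between the achieved rank correlation and sigma. The sketch below applies it to the rounded values printed above; the helper name `max_cor_error` is hypothetical.

```r
# Hypothetical helper: largest absolute gap between achieved and target correlations
max_cor_error <- function(achieved, target) {
  max(abs(achieved - target))
}

sigma <- matrix(c(1.0, 0.5, 0.0,
                  0.5, 1.0, 0.8,
                  0.0, 0.8, 1.0), nrow = 3, byrow = TRUE)

# Rounded Spearman correlations printed above for the LHS sample
achieved <- matrix(c(1.00, 0.50, 0.02,
                     0.50, 1.00, 0.77,
                     0.02, 0.77, 1.00), nrow = 3, byrow = TRUE)

max_cor_error(achieved, sigma)   # 0.03, from the var2/var3 pair
```

With the actual sample, the same call would be `max_cor_error(cor(df, method = "spearman"), sigma)`, which gives a quick way to compare runs with different eps values.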

5. Check graphically

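# var1 is lognormal and spans several orders of magnitude: plot it on a log10 scale
# (log10 is monotone, so the Spearman correlations shown are unchanged)
df[, 1] <- log10(df[, 1])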
psych::pairs.panels(df, 
                    method = "spearman", # correlation method
                    hist.col = "#00AFBB",
                    density = TRUE,  # show density plots
                    ellipses = TRUE) # show correlation ellipses

Random Sampling

1. Build a copula

This step yields nvars standard normal variables, correlated with one another according to sigma.

# n <- 1000

set.seed(123)
copula <- MASS::mvrnorm(n, mu = rep(0, nvars), Sigma = sigma, empirical = TRUE)
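With empirical = TRUE, MASS::mvrnorm rescales the sample so that its sample mean and sample covariance equal mu and Sigma exactly, rather than only on average; that is why even 20 rows reproduce sigma exactly. The base-R sketch below imitates that behaviour: it centers and whitens independent normals, then colors them with the Cholesky factor of sigma. It illustrates the idea and is not MASS's code (which, as far as I know, works from an eigendecomposition of Sigma); the name `rmvnorm_empirical` is hypothetical, and it needs n > nvars.

```r
# Hypothetical helper: n draws whose *sample* mean is 0 and *sample* covariance is sigma
rmvnorm_empirical <- function(n, sigma) {
  p <- ncol(sigma)
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- scale(X, center = TRUE, scale = FALSE)  # sample mean 0 (up to rounding)
  X <- X %*% solve(chol(cov(X)))               # whiten: sample covariance = identity
  X %*% chol(sigma)                            # color: sample covariance = sigma
}

sigma <- matrix(c(1.0, 0.5, 0.0,
                  0.5, 1.0, 0.8,
                  0.0, 0.8, 1.0), nrow = 3, byrow = TRUE)

set.seed(123)
Y <- rmvnorm_empirical(20, sigma)
round(cor(Y), 10)       # equals sigma up to floating-point error
round(colMeans(Y), 10)  # zeros
```

The whitening step works because, if cov(X) = R'R with R upper triangular, then cov(X R^-1) = I; multiplying by U = chol(sigma) then gives covariance U'U = sigma.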

par(mfrow = c(1, 3))
hist(copula[ ,1])
hist(copula[ ,2])
hist(copula[ ,3])
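Strictly speaking, copula at this point holds correlated standard normals. A common way to continue, assumed here since this section stops before that step, is to apply pnorm to each column: this gives correlated Uniform(0, 1) values that play the same role as XX in the Latin Hypercube section and can be passed through the same quantile functions. To stay self-contained, the sketch draws its correlated normals with a Cholesky factor instead of MASS::mvrnorm.

```r
sigma <- matrix(c(1.0, 0.5, 0.0,
                  0.5, 1.0, 0.8,
                  0.0, 0.8, 1.0), nrow = 3, byrow = TRUE)

set.seed(123)
n <- 1000
# Correlated standard normals (stand-in for the mvrnorm output above)
Z <- matrix(rnorm(n * 3), nrow = n) %*% chol(sigma)

# Probability integral transform: each column becomes Uniform(0, 1),
# while the rank (Spearman) correlation between columns is preserved
U <- pnorm(Z)

# Same marginals as step 3 of the Latin Hypercube section
df <- data.frame(var1 = qlnorm(U[, 1], 10, 2),
                 var2 = qnorm(U[, 2], 5, 0.5),
                 var3 = qunif(U[, 3], 0, 1))

round(cor(df, method = "spearman"), 2)
```

Expect the Spearman correlations to come out slightly below sigma: sigma fixes the Pearson correlation rho of the normals, and for a Gaussian copula the rank correlation is (6/pi) * asin(rho/2), about 0.79 for rho = 0.8 and 0.48 for rho = 0.5.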