Setting up packages
# Requires Rtools (on Windows) to compile packages from source.
# install.packages(c("devtools", "SamplingStrata", "Rcpp", "RcppArmadillo", "mlrMBO", "cluster", "e1071"))
# If asked whether to install from source those packages which need compilation, answer "Yes".
library(devtools)
library(SamplingStrata)
install_github("MervynOLuing/hEDA")
library(hEDA)
devtools::install_github("r-pkg-examples/rcpp-and-doparallel")
library("Rcpp2doParallel")
Example of the EDA component of hEDA on the iris data set
- Select Petal Length and Petal Width as the target variables.
- Use Sepal Length and Species as auxiliary variables.
- Convert Sepal Length to a categorical variable with 3 bins using the k-means algorithm (Hartigan and Wong, 1979) and a seed of 1234.
- Set an upper bound of 0.05 on the coefficient of variation (CV) of each target variable.
- Crossing the categorical version of Sepal Length with Species gives 8 non-empty atomic strata out of the 9 possible combinations (no unit falls in 3*1).
| Stratum (Sepal Length bin*Species) | N | Mean Petal Length | Mean Petal Width | SD Petal Length | SD Petal Width |
|---|---|---|---|---|---|
| 1*1 | 40 | 1.457500 | 0.235000 | 0.1715918 | 0.1038027 |
| 1*2 | 5 | 3.400000 | 1.100000 | 0.2966479 | 0.1549193 |
| 1*3 | 1 | 4.500000 | 1.700000 | 0.0000000 | 0.0000000 |
| 2*1 | 10 | 1.480000 | 0.290000 | 0.1720465 | 0.0943398 |
| 2*2 | 31 | 4.229032 | 1.306452 | 0.3620671 | 0.1916574 |
| 2*3 | 12 | 5.066667 | 1.883333 | 0.2248456 | 0.2702879 |
| 3*2 | 14 | 4.635714 | 1.450000 | 0.2091040 | 0.1118034 |
| 3*3 | 37 | 5.737838 | 2.081081 | 0.4961208 | 0.2523809 |
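The atomic strata above could be rebuilt along the following lines. This is a minimal base R sketch, not the code used to produce the table: the names `bin`, `stratum`, `pop_sd` and `atomic` are illustrative, and the standard deviation is assumed to use divisor n (which is consistent with the single-unit stratum having a standard deviation of 0). Because k-means depends on its random start, the bins and counts may not match the table exactly.

```r
# Illustrative reconstruction in base R; all object names are hypothetical.
set.seed(1234)
km <- kmeans(iris$Sepal.Length, centers = 3)  # default algorithm is Hartigan-Wong

# k-means cluster labels are arbitrary, so relabel the bins by increasing centre.
bin <- match(km$cluster, order(km$centers))

# Atomic stratum label: Sepal Length bin crossed with the Species code (1, 2, 3).
stratum <- paste(bin, as.integer(iris$Species), sep = "*")

# Population standard deviation (divisor n), assumed here.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# One row per non-empty atomic stratum; tapply orders groups like sort(unique()).
atomic <- data.frame(
  stratum = sort(unique(stratum)),
  N  = as.vector(tapply(iris$Petal.Length, stratum, length)),
  M1 = as.vector(tapply(iris$Petal.Length, stratum, mean)),
  M2 = as.vector(tapply(iris$Petal.Width,  stratum, mean)),
  S1 = as.vector(tapply(iris$Petal.Length, stratum, pop_sd)),
  S2 = as.vector(tapply(iris$Petal.Width,  stratum, pop_sd))
)
print(atomic)
```

Relabelling the clusters by their centres keeps bin 1 as the shortest sepals and bin 3 as the longest, whichever labels k-means happens to assign.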
Running EDA
- Initialise a population of Np solutions (in this case Np = 5), each of length L, the number of atomic strata, indexed by l = 1, 2, ..., L.
- Each element of a solution is an integer denoting which of the H strata the corresponding atomic stratum belongs to.
- Each solution is evaluated and ranked.

- Create a new selected population of the best solutions

- Construct a probabilistic model of the solutions in the selected population with the aim of estimating the probability distribution for each of the H strata for each atomic stratum

- Generate new solutions from this model to replace the non-elite solutions by sampling from that distribution
- Evaluate the new solutions and rank all solutions by their quality
- This offspring population returns a solution quality of 9.34, which is the global minimum
- Note: to find the global minimum, we evaluated each of the 4,140 possible partitions of the 8 atomic strata (4,140 is the Bell number of 8)
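The steps listed above can be sketched as a small loop in base R. This is a toy illustration under stated assumptions, not the hEDA implementation. `H = 3`, `n_elite = 2` and the 20 generations are arbitrary choices. The fitness is a stand-in: the larger of the two single-variable Neyman-allocation sample sizes at a CV of 0.05, rather than the multivariate allocation a package such as SamplingStrata would use. It is therefore not expected to reproduce the value 9.34. The atomic strata values are copied from the table in the previous section.

```r
# Toy EDA sketch; every name here is illustrative, not the hEDA package API.
set.seed(1234)

# Atomic strata summaries copied from the table above.
atomic <- data.frame(
  N  = c(40, 5, 1, 10, 31, 12, 14, 37),
  M1 = c(1.457500, 3.400000, 4.500000, 1.480000, 4.229032, 5.066667, 4.635714, 5.737838),
  M2 = c(0.235000, 1.100000, 1.700000, 0.290000, 1.306452, 1.883333, 1.450000, 2.081081),
  S1 = c(0.1715918, 0.2966479, 0, 0.1720465, 0.3620671, 0.2248456, 0.2091040, 0.4961208),
  S2 = c(0.1038027, 0.1549193, 0, 0.0943398, 0.1916574, 0.2702879, 0.1118034, 0.2523809)
)

L <- nrow(atomic)  # number of atomic strata
H <- 3             # number of strata (assumed)
Np <- 5            # population size
n_elite <- 2       # size of the selected population (assumed)

# Stand-in fitness for one variable: Neyman-allocation sample size for a
# target CV, after merging atomic strata according to the solution `sol`.
neyman_n <- function(sol, M, S, N, cv = 0.05) {
  Nh  <- tapply(N, sol, sum)
  Mh  <- tapply(N * M, sol, sum) / Nh
  S2h <- pmax(tapply(N * (S^2 + M^2), sol, sum) / Nh - Mh^2, 0)
  Y   <- sum(N * M)
  sum(Nh * sqrt(S2h))^2 / (cv^2 * Y^2 + sum(Nh * S2h))
}
fitness <- function(sol) {
  max(neyman_n(sol, atomic$M1, atomic$S1, atomic$N),
      neyman_n(sol, atomic$M2, atomic$S2, atomic$N))
}

# Initialise: Np solutions, each a vector of L stratum labels in 1..H.
pop <- matrix(sample.int(H, Np * L, replace = TRUE), nrow = Np)
init_best <- min(apply(pop, 1, fitness))

for (gen in 1:20) {
  # Evaluate and rank (smaller is better), then select the best solutions.
  pop   <- pop[order(apply(pop, 1, fitness)), , drop = FALSE]
  elite <- pop[seq_len(n_elite), , drop = FALSE]

  # Probabilistic model: P[h, l] is the share of selected solutions that
  # assign atomic stratum l to stratum h.
  P <- sapply(seq_len(L), function(l) tabulate(elite[, l], nbins = H) / n_elite)

  # Replace the non-elite solutions by sampling from the model.
  for (i in (n_elite + 1):Np) {
    pop[i, ] <- vapply(seq_len(L),
                       function(l) sample.int(H, 1, prob = P[, l]),
                       integer(1))
  }
}

fit  <- apply(pop, 1, fitness)
best <- pop[which.min(fit), ]
print(best)
print(min(fit))
```

Because the elite solutions are carried over unchanged, the best fitness can never get worse from one generation to the next. With so small a selected population, many entries of `P` quickly become exactly 0 or 1, so the sampled offspring stop exploring new assignments.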

Mutation
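- Perturb the probabilistic model so that no probability is exactly 0 or 1; otherwise an assignment that is absent from the selected population can never be sampled again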
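One simple way to perturb the model is to mix each column of probabilities with a uniform distribution over the H strata. The source does not say which perturbation scheme is used, so the function `perturb` and the mixing weight `eps` below are assumptions made purely for illustration.

```r
# Hypothetical perturbation: shrink each column of P towards the uniform
# distribution so that every stratum keeps a non-zero chance of being sampled.
perturb <- function(P, eps = 0.05) {
  H <- nrow(P)               # rows are strata, columns are atomic strata
  (1 - eps) * P + eps / H    # columns still sum to 1
}

# Example model for two atomic strata and H = 3 strata, with extreme entries.
P <- cbind(c(1, 0, 0), c(0.5, 0.5, 0))
P_mut <- perturb(P)
print(P_mut)
print(colSums(P_mut))
```

A larger `eps` gives more exploration at the cost of slower convergence towards the selected solutions.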