library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.2
library(reshape)
library(randtoolbox)
## Loading required package: rngWELL
## This is randtoolbox. For overview, type 'help("randtoolbox")'.
library(plyr)
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:reshape':
## 
##     rename, round_any

Define a Function to Generate a Monte Carlo Sample

Many important algorithms used to accomplish machine learning goals are based on drawing samples from some probability distribution and using these samples to form a Monte Carlo estimate of some desired quantity.

http://www.cs.cmu.edu/~epxing/Class/DS/lectures/lecture13.pdf
http://www.stat.ufl.edu/archived/casella/ShortCourse/MCMC-UseR.pdf
http://www.deeplearningbook.org/contents/monte_carlo.html

MC is not an inference technique for finding the “best” model; it is a numerical tool for obtaining samples from a given model. You can build inference procedures on top of MC (e.g. optimizing a criterion over parameters as a function of the simulated empirical distribution), but that does not change their respective scopes and goals. The most common application of MC is probably the calculation of high-dimensional integrals.
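As a minimal sketch of the integration use case mentioned above, the snippet below estimates the integral of f(x) = x_1^2 + ... + x_d^2 over the d-dimensional unit cube, whose exact value is d/3. The function, the dimension `d`, the sample size and the seed are illustrative assumptions, not part of the original analysis, and plain pseudo-random draws from `runif` are used here rather than a Sobol' sequence.

```r
# Illustrative Monte Carlo integration (assumed example, base R only)
set.seed(42)                          # fixed seed so the run is reproducible
d <- 5                                # dimension of the unit cube
n <- 1e5                              # number of Monte Carlo draws

u <- matrix(runif(n * d), nrow = n)   # n points uniformly distributed in [0,1]^d
f_vals <- rowSums(u^2)                # integrand evaluated at each point

mc_estimate <- mean(f_vals)           # Monte Carlo estimate of the integral
mc_se <- sd(f_vals) / sqrt(n)         # estimated standard error of that mean
exact <- d / 3                        # analytic value, for comparison

print(c(estimate = mc_estimate, std_error = mc_se, exact = exact))
```

The standard error shrinks at a rate of 1/sqrt(n) regardless of `d`, which is why the approach is attractive for high-dimensional integrals.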

Draw random samples from the desired distribution

n: sample size
vals: list describing the distribution of each variable

generateMCSample <- function(n, vals) {
  # Requires randtoolbox (sobol) and plyr (laply), loaded above
  
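  # Generate a Sobol' sequence: one column of points in [0,1] per variable
  # (wrapped in matrix() so that a single variable still yields a matrix)
  sob <- matrix(sobol(n, length(vals)), nrow=n)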
  
  # Fill a matrix with the values
  # inverted from uniform values to
  # distributions of choice
  samp <- matrix(rep(0,n*(length(vals)+1)), nrow=n)
  samp[,1] <- 1:n
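  for (i in seq_along(vals)) {
    l <- vals[[i]]
    dist <- l$dist
    params <- l$params
    # Map the uniform points through the quantile function q<dist>,
    # passing the parameters positionally
    samp[,i+1] <- do.call(paste0("q",dist), c(list(sob[,i]), as.list(params)))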
  }
  
  # Convert matrix to data frame and label
  samp <- as.data.frame(samp)
  names(samp) <- c("n",laply(vals, function(l) l$var))
  return(samp)
}
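The key step in the function above is inverse-transform sampling: points that cover [0,1] evenly are pushed through a quantile function (`qunif`, `qnorm`, `qweibull`, ...) to obtain values that follow the target distribution. The sketch below isolates that step in base R. As an assumption made only to keep the example free of dependencies, an evenly spaced midpoint grid stands in for the Sobol' sequence; the variable names are illustrative.

```r
# Inverse-transform sketch (assumed example, base R only)
n <- 1000

# Evenly spaced points in (0,1); a stand-in for one column of a Sobol' sequence
u <- (seq_len(n) - 0.5) / n

# Push the uniform points through the Weibull quantile function
x <- qweibull(u, shape = 2, scale = 1)

sample_mean <- mean(x)
exact_mean <- gamma(1.5)   # mean of Weibull(shape, scale) is scale * gamma(1 + 1/shape)

print(c(sample_mean = sample_mean, exact_mean = exact_mean))
```

Because the input points fill the unit interval evenly, the sample mean lands close to the analytic mean even for a modest `n`, which is the motivation for using a low-discrepancy sequence instead of pseudo-random draws.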

Simple Example

1. Define the sample size and distribution

n <- 1000  # number of simulations to run

# List describing the distribution of each variable

vals <- list(list(var="Uniform",
                  dist="unif",
                  params=c(0,1)),
             list(var="Normal",
                  dist="norm",
                  params=c(0,1)),
             list(var="Weibull",
                  dist="weibull",
                  params=c(2,1)))
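Each entry's `dist` is pasted onto the prefix `q` to name an R quantile function, and `params` is passed positionally after the probabilities, so the order of the values matters. The helper below, `describeSpec`, is hypothetical (it is not part of the original code) and simply reports which call an entry resolves to, assuming a two-parameter distribution as in the list above.

```r
# Hypothetical helper: show the quantile-function call a spec entry maps to.
# Assumes the distribution takes exactly two parameters after the probabilities.
describeSpec <- function(spec) {
  qfun <- paste0("q", spec$dist)               # e.g. "weibull" -> "qweibull"
  stopifnot(exists(qfun, mode = "function"))   # fail early on an unknown name
  arg_names <- names(formals(qfun))[2:3]       # the two arguments after p
  paste0(qfun, "(p, ",
         arg_names[1], " = ", spec$params[1], ", ",
         arg_names[2], " = ", spec$params[2], ")")
}

describeSpec(list(var = "Weibull", dist = "weibull", params = c(2, 1)))
# "qweibull(p, shape = 2, scale = 1)"
```

For the Weibull entry, `c(2,1)` therefore means shape 2 and scale 1, not the other way round.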

2. Generate the sample

samp <- generateMCSample(n,vals)

3. Visualization of the generated sample

samp.mt <- melt(samp,id="n")
ggplot(samp.mt,aes(x=value)) +
  geom_histogram(binwidth=0.1) +
  theme_bw() +
  facet_wrap(~variable, ncol=3, scales="free")