library(ggplot2)
library(reshape)
library(randtoolbox)
library(plyr)
Many important algorithms used to accomplish machine learning goals are based on drawing samples from some probability distribution and using these samples to form a Monte Carlo estimate of some desired quantity.
See also:
http://www.cs.cmu.edu/~epxing/Class/DS/lectures/lecture13.pdf
http://www.stat.ufl.edu/archived/casella/ShortCourse/MCMC-UseR.pdf
http://www.deeplearningbook.org/contents/monte_carlo.html
Monte Carlo (MC) is not an inference technique for finding the “best” model; it is a numerical tool for obtaining samples from a given model. Inference procedures can be built on top of MC (e.g. optimizing a criterion over parameters as a function of the simulated empirical distribution), but that does not change the respective scopes and goals. The most common application of MC is probably the calculation of high-dimensional integrals.
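To make the integration use case concrete, here is a minimal sketch in base R. The helper name `mc_integrate` and the integrand are illustrative assumptions, not part of the original code; the integrand was chosen only because its exact integral over the unit hypercube is known (d / 3), so the estimate can be checked.

```r
# Minimal sketch: Monte Carlo estimate of a d-dimensional integral over [0, 1]^d.
# mc_integrate and f are hypothetical names used for illustration only.
set.seed(42)

mc_integrate <- function(f, d, n) {
  # Draw n points uniformly from the unit hypercube [0, 1]^d
  u <- matrix(runif(n * d), nrow = n, ncol = d)
  # Evaluate the integrand at every sampled point (one row per point)
  vals <- apply(u, 1, f)
  # The sample mean estimates the integral; its standard error shrinks as 1 / sqrt(n)
  list(estimate = mean(vals), std_error = sd(vals) / sqrt(n))
}

# Integrand: sum of squared coordinates; the exact integral over [0, 1]^d is d / 3
f <- function(x) sum(x^2)
res <- mc_integrate(f, d = 5, n = 10000)
res$estimate   # should be close to 5 / 3
res$std_error
```

The standard error depends on the number of samples and the variance of the integrand rather than directly on the dimension d, which is one reason MC is attractive for high-dimensional integrals.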
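Draw samples from the desired distributions, using a Sobol' quasi-random sequence mapped through each distribution's quantile function.
n: sample size
vals: list describing the distribution of each variable (name, distribution, parameters)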
generateMCSample <- function(n, vals) {
  # Uses sobol() from randtoolbox and laply() from plyr, both loaded above
  # Generate a Sobol' sequence: n points in [0, 1], one column per variable
  # (matrix() keeps the indexing below valid even for a single variable)
  sob <- matrix(sobol(n, length(vals)), nrow = n)
  # Fill a matrix: column 1 is the sample index, the remaining columns hold
  # the uniform values mapped through the quantile function of each
  # distribution of choice
  samp <- matrix(0, nrow = n, ncol = length(vals) + 1)
  samp[, 1] <- 1:n
  for (i in seq_along(vals)) {
    l <- vals[[i]]
    # Build the quantile function name from the distribution, e.g. "norm" -> "qnorm"
    qfun <- paste0("q", l$dist)
    samp[, i + 1] <- do.call(qfun, c(list(sob[, i]), as.list(l$params)))
  }
  # Convert matrix to data frame and label the columns
  samp <- as.data.frame(samp)
  names(samp) <- c("n", laply(vals, function(l) l$var))
  return(samp)
}
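The step that turns uniform values into draws from another distribution is inverse-transform sampling. The following is a minimal base R sketch of that step on its own. Note that `runif()` stands in for `sobol()` here purely so the snippet runs without extra packages (an assumption of this sketch, not what the function above does), and the variable names are illustrative.

```r
# Sketch of the inverse-transform step, using base R only.
# runif() replaces sobol() here so the example needs no extra packages.
set.seed(1)
u <- runif(5000)

# Quantile functions map uniform values on (0, 1) to the target distribution
x_norm    <- qnorm(u, mean = 0, sd = 1)
x_weibull <- qweibull(u, shape = 2, scale = 1)

# Build the quantile function name from the distribution name,
# as generateMCSample() does, and call it with positional parameters
qfun <- paste0("q", "norm")
x_by_name <- do.call(qfun, list(u, 0, 1))
identical(x_norm, x_by_name)

# Sample means should be near the theoretical ones
mean(x_norm)      # theoretical mean: 0
mean(x_weibull)   # theoretical mean: gamma(1.5), about 0.886
```

Passing parameters positionally keeps the lookup generic, but it relies on each quantile function taking its parameters in the same order as they appear in `params`.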
n <- 1000 # number of simulations to run
# List describing the distribution of each variable
vals <- list(list(var = "Uniform",
                  dist = "unif",
                  params = c(0, 1)),
             list(var = "Normal",
                  dist = "norm",
                  params = c(0, 1)),
             list(var = "Weibull",
                  dist = "weibull",
                  params = c(2, 1)))
samp <- generateMCSample(n, vals)
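# Reshape to long format (one row per sample and variable) for faceting
samp.mt <- melt(samp, id = "n")
# One histogram per variable, each with its own axis ranges
ggplot(samp.mt, aes(x = value)) +
  geom_histogram(binwidth = 0.1) +
  theme_bw() +
  facet_wrap(~variable, ncol = 3, scales = "free")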