There are many distributions from which it is difficult, or even impossible, to directly simulate. Moreover, in some cases, we are not even able to represent the distribution in a usable form, such as a transformation or a mixture.
We thus turn to another class of methods that only requires us to know the functional form of the density \(f\) of interest up to a multiplicative constant; no deep analytic study of \(f\) is necessary.
The key to this method is to use a simpler (simulation-wise) density \(g\) from which the simulation is actually done. For a given density \(g\) (called the instrumental or candidate density), there are thus many densities \(f\) (called the target densities) which can be simulated this way.
The Fundamental Theorem of Simulation.
Simulating
\[
X\sim f\left(x\right)
\]
is equivalent to simulating
\[
\left(X,U\right)\sim\mathcal{U}\left\{ \left(x,u\right):0<u<f\left(x\right)\right\}
\]
Applying the Fundamental Theorem of Simulation.
Suppose \(f\) has support \(\left(a,b\right)\) and is bounded by \(m\). We can then simulate the pair \(\left(Y,U\right)\sim\mathcal{U}\left\{ \left(y,u\right):a<y<b,\,0<u<m\right\}\) by simulating \(Y\sim\mathcal{U}\left(a,b\right)\) and \(U\left|Y=y\right.\sim\mathcal{U}\left(0,m\right)\).
We keep the pair only if the further constraint \(0<u<f\left(y\right)\) is satisfied.
This results in the correct distribution for the accepted values of \(Y\), call them \(X\), because
\[
P\left(X\leq x\right)=P\left(Y\leq x\mid U<f\left(Y\right)\right)=\frac{\int_{a}^{x}\int_{0}^{f\left(y\right)}du\,dy}{\int_{a}^{b}\int_{0}^{f\left(y\right)}du\,dy}=\int_{a}^{x}f\left(y\right)dy.
\]
Simulation from \(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\)
Now, using R, we will create a function named ARBeta which, given the sample size and \(\alpha,\beta\), returns a \(\mathcal{B}e_{1}\left(\alpha,\beta\right)\) sample together with all the \(\left(Y,U\right)\) pairs, the number of such pairs generated, the upper bound for \(f\left(x\right)\), etc.
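The function itself is not shown above; the following is a minimal sketch consistent with the outputs described (the field names Beta, Y, U, GenP, UpperB, ExpectedProp and AccProp are taken from the calls below; everything else is an assumption):

Code
ARBeta <- function(n, alpha, beta) {
  # Upper bound m = sup of the Beta(alpha, beta) density on (0, 1)
  m <- optimize(dbeta, c(0, 1), maximum = TRUE,
                shape1 = alpha, shape2 = beta)$objective
  Y <- runif(ceiling(3 * n * m))      # uniform candidates; a generous batch
  U <- runif(length(Y), 0, m)         # uniform heights on (0, m)
  acc <- U < dbeta(Y, alpha, beta)    # the fundamental-theorem constraint
  GenP <- which(cumsum(acc) == n)[1]  # pairs needed to reach n acceptances
  keep <- seq_len(GenP)
  list(Beta = Y[keep][acc[keep]], Y = Y[keep], U = U[keep],
       GenP = GenP, UpperB = m,
       ExpectedProp = 1 / m, AccProp = n / GenP)
}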
Using this function, we generate a sample of size \(n=1000\) from \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\). The function returns a list ARSam, which contains all the necessary information.
ARSam
#---A Beta(2.7,6.3) sample of size n=1000-------------
ARSam = ARBeta(n = 1000, alpha = 2.7, beta = 6.3)
list(ARSam$GenP, ARSam$ExpectedProp, ARSam$AccProp)
For this example we got \(m=2.67\), so we accept approximately \(\frac{1}{2.67}\approx37\%\) of the values. Now we plot a histogram of the sample along with the density of \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\).
Code
hist(ARSam$Beta, prob = TRUE, col = "navyblue", border = "orange",
     main = "Histogram of ARSam", xlab = "x",
     ylab = "Frequency Density", breaks = 30)
rug(ARSam$Beta)
#--Gives the desired density--
Betadense = function(x, shape1 = 2.7, shape2 = 6.3) {
  return(dbeta(x, shape1 = shape1, shape2 = shape2))
}
curve(Betadense, from = 0, to = 1, add = TRUE, lwd = 2, col = "red")
box(lwd = 3)
To get an idea of how many simulations are wasted, we make the following plot: the \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\) density together with all the generated \(\left(Y,U\right)\) pairs. The points below the density curve are the accepted ones.
Code
accept = ARSam$U < Betadense(ARSam$Y)
plot(ARSam$U ~ ARSam$Y, col = ifelse(accept, "navyblue", "red"), pch = 20,
     xlab = "x", ylab = "Density",
     main = "Scatterplot of the Generated Samples \n along with Density")
grid(col = "darkgreen")
box(lwd = 3)
curve(Betadense, from = 0, to = 1, add = TRUE, lwd = 2, col = "purple")
abline(h = ARSam$UpperB, col = "grey", lwd = 3)
Extension to the Fundamental Theorem of Simulation.
Obviously, efficiency dictates that the bound \(m\) be as close as possible to \(f\) in order to avoid wasting simulations. A remark of importance is that, because of the constraint \(m\left(x\right)\geq f\left(x\right)\), \(m\) cannot be a probability density (it integrates to more than \(1\)). We then write \(m\left(x\right)=Mg\left(x\right)\), where \(g\) is a probability density and \(M=\int m\left(x\right)dx\geq1\).
A natural way of simulating uniformly on \(\mathscr{L}=\left\{ \left(y,u\right):0<u<Mg\left(y\right)\right\}\) is to use the previous theorem backwards, that is, to simulate \(Y\sim g\), then \(U\left|Y=y\right.\sim\mathcal{U}\left(0,Mg\left(y\right)\right)\). If we only accept the \(y\)'s such that the constraint \(u<f\left(y\right)\) is satisfied, we have
\[
P\left(Y\in\mathcal{A}\mid U<f\left(Y\right)\right)=\frac{\int_{\mathcal{A}}\frac{f\left(y\right)}{Mg\left(y\right)}g\left(y\right)dy}{\int\frac{f\left(y\right)}{Mg\left(y\right)}g\left(y\right)dy}=\int_{\mathcal{A}}f\left(y\right)dy
\]
for every measurable set \(\mathcal{A}\), so the accepted \(X\)'s are indeed distributed from \(f\). We have thus derived a more general implementation of the fundamental theorem, as follows.
Theorem 2.
Let \(X\sim f\left(x\right)\) and let \(g\left(x\right)\) be a density function that satisfies \(f\left(x\right)\leq Mg\left(x\right)\) for some constant \(M\geq1\). Then, to simulate \(X\sim f\), it is sufficient to generate
\[
Y\sim g\quad\text{and}\quad U\left|Y=y\right.\sim\mathcal{U}\left(0,Mg\left(y\right)\right),
\]
and to set \(X=Y\) whenever \(u<f\left(y\right)\).
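Stated as code, Theorem 2 is only a few lines (a generic sketch; the names ARGeneral and rg are ours; f and g are normalized densities and rg draws from \(g\)):

Code
ARGeneral <- function(n, f, g, rg, M) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rg(n)                  # Y ~ g
    u <- runif(n, 0, M * g(y))  # U | Y = y ~ U(0, M g(y))
    out <- c(out, y[u < f(y)])  # accept X = Y when u < f(y)
  }
  out[1:n]
}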
Now, we'll follow the AR Algorithm to generate \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\) samples with candidate density \(\mathcal{B}e_{1}\left(\alpha=2,\beta=6\right)\), which is easy to simulate from and is very close to our target density. We can generate \(\mathcal{B}e_{1}\left(\alpha=2,\beta=6\right)\) samples by means of a simple transformation.
Note.
If \(X_{1},X_{2},\ldots,X_{n}\overset{iid}{\sim}\mathcal{U}\left(0,1\right)\), then \(X_{\left(r\right)}\sim\mathcal{B}e_{1}\left(r,n-r+1\right)\) for \(r=1,\ldots,n\). We need \(r=2\) and \(n-r+1=6\), which gives \(\left(r,n\right)=\left(2,7\right)\). Hence generate a sample of size \(7\) from \(\mathcal{U}\left(0,1\right)\) and take the second smallest element, which provides a draw from \(\mathcal{B}e_{1}\left(\alpha=2,\beta=6\right)\).
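As a quick illustration (a sketch; the helper name rbeta26 is ours), one \(\mathcal{B}e_{1}\left(2,6\right)\) draw is the second smallest of seven uniforms:

Code
# One Beta(2, 6) draw: the 2nd order statistic of 7 iid U(0, 1) draws
rbeta26 <- function() sort(runif(7))[2]
# Sanity check against R's built-in generator
summary(replicate(1e4, rbeta26()))
summary(rbeta(1e4, shape1 = 2, shape2 = 6))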
The function ARBeta2 generates \(\mathcal{B}e_{1}\left(\alpha,\beta\right)\) samples using the candidate density \(\mathcal{B}e_{1}\left(\left\lfloor \alpha\right\rfloor ,\left\lfloor \beta\right\rfloor \right)\). It gives output similar to what we saw in the case of ARBeta.
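ARBeta2 itself is not shown; a minimal sketch under this scheme (candidate \(\mathcal{B}e_{1}\left(\left\lfloor \alpha\right\rfloor ,\left\lfloor \beta\right\rfloor \right)\) drawn via the order-statistic trick above, bound \(M=\sup_{x}f\left(x\right)/g\left(x\right)\) found numerically; all names are assumptions) could be:

Code
ARBeta2 <- function(n, alpha, beta) {
  a0 <- floor(alpha); b0 <- floor(beta)
  # Candidate sampler: a0-th order statistic of (a0 + b0 - 1) uniforms
  rg <- function(k) {
    u <- matrix(runif(k * (a0 + b0 - 1)), nrow = k)
    apply(u, 1, function(row) sort(row)[a0])
  }
  # Bound M = sup f/g on (0, 1)
  M <- optimize(function(x) dbeta(x, alpha, beta) / dbeta(x, a0, b0),
                c(0, 1), maximum = TRUE)$objective
  Y <- rg(ceiling(3 * n * M))                     # a generous candidate batch
  U <- runif(length(Y), 0, M * dbeta(Y, a0, b0))  # heights under M g(y)
  acc <- U < dbeta(Y, alpha, beta)
  GenP <- which(cumsum(acc) == n)[1]              # pairs used for n acceptances
  keep <- seq_len(GenP)
  list(Beta = Y[keep][acc[keep]], Y = Y[keep], U = U[keep],
       GenP = GenP, ExpectedProp = 1 / M, AccProp = n / GenP)
}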
Using this function, we generate a sample of size \(n=1000\) from \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\). The function returns a list ARSam2, which contains all the necessary information.
ARSam2
#---A Beta(2.7,6.3) sample of size n=1000-------------
ARSam2 = ARBeta2(n = 1000, alpha = 2.7, beta = 6.3)
list(ARSam2$GenP, ARSam2$ExpectedProp, ARSam2$AccProp)
The proportion of accepted simulations is much higher than before: out of around \(1680\) generated pairs, \(1000\) were accepted. Now we plot the \(\left(Y,U\right)\) pairs along with the density to see how much we have improved.
Code
plot(ARSam2$U ~ ARSam2$Y, col = "maroon", pch = 20, ylim = c(0, 2),
     xlab = "x", ylab = "Density",
     main = "Scatterplot of the Generated Samples \n along with Density")
grid(col = "darkgreen")
box(lwd = 3)
curve(Betadense, from = 0, to = 1, add = TRUE, lwd = 2, col = "purple")
abline(h = ARSam$UpperB, col = "red", lwd = 3)
polyseq = seq(from = 0, to = 1, length.out = 1000)
polygon(polyseq, Betadense(polyseq), density = 60)
Dealing with unknown Normalizing Constants.
In most practical scenarios, we only know \(f\left(x\right)\) and \(g\left(x\right)\) up to some normalizing constants, i.e.,
\[
f\left(x\right)=\frac{\tilde{f}\left(x\right)}{Z_{f}},\qquad g\left(x\right)=\frac{\tilde{g}\left(x\right)}{Z_{g}},
\]
where \(\tilde{f}\left(x\right),\tilde{g}\left(x\right)\) are known but \(Z_{f}=\int_{\mathcal{X}}\tilde{f}\left(x\right)dx\) and \(Z_{g}=\int_{\mathcal{X}}\tilde{g}\left(x\right)dx\) are unknown or expensive to compute. In such cases rejection can still be used: indeed, if \(\tilde{f}\left(x\right)\leq\tilde{M}\tilde{g}\left(x\right)\) for all \(x\), we can simulate \(Y\sim g\) and \(U\sim\mathcal{U}\left(0,1\right)\) and accept \(Y\) whenever \(u<\tilde{f}\left(y\right)/\tilde{M}\tilde{g}\left(y\right)\); the normalizing constants cancel from the accept-reject ratio.
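A sketch with unnormalised inputs only (the names f_tilde, g_tilde, rg and M_tilde are ours; rg draws from \(g\) and \(\tilde{f}\left(x\right)\leq\tilde{M}\tilde{g}\left(x\right)\) for all \(x\)):

Code
ARUnnorm <- function(n, f_tilde, g_tilde, rg, M_tilde) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rg(n)
    u <- runif(n)
    # Accept when u < f_tilde(y) / (M_tilde * g_tilde(y)); no Z_f, Z_g needed
    out <- c(out, y[u < f_tilde(y) / (M_tilde * g_tilde(y))])
  }
  out[1:n]
}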
Here the envelope \(g\left(x\right)=5\exp\left(-\frac{x^{2}}{2}\right)\) is proportional to a standard normal density, which is obviously straightforward to generate from. Let's plot the function \(f\) with this envelope.
Plotting \(f\) with envelope density \(g\left(x\right)=5\exp\left(-\frac{x^{2}}{2}\right)\).
library(lattice)
histogram(~ ARSam3$sam, type = "density", main = "Histogram of ARSam3",
          ylab = "Frequency Density", breaks = 100, xlab = "x",
          col = "tomato", border = "navyblue")
Ziggurat Algorithm for \(\boldsymbol{\mathcal{N}\left(0,1\right)}\)
About the Algorithm.
The Ziggurat Algorithm is an algorithm for pseudo-random number sampling. Belonging to the class of rejection sampling algorithms, it relies on an underlying source of uniformly distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables.
The algorithm is used to generate values from a monotonically decreasing probability distribution. It can also be applied to symmetric unimodal distributions, such as the normal distribution, by choosing a value from one half of the distribution and then randomly choosing which half the value is considered to have been drawn from.
The Algorithm with R.
The principal reference we will follow is Doornik (2005). As in that paper, we consider the non-normalised right-hand side of the normal density:
Code
f <- function(x) {
  exp(-x^2 / 2)
}
It will be useful to also have functions f_inv (the inverse of \(f\)) and f_int (the integral of \(f\) from \(x\) to \(\infty\)).
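Both are elementary for this \(f\); a short sketch (f_int written via the standard normal tail probability pnorm):

Code
f_inv <- function(y) sqrt(-2 * log(y))  # inverse of f on [0, Inf)
f_int <- function(x) sqrt(2 * pi) * pnorm(x, lower.tail = FALSE)  # integral of f from x to Inf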
To work with the algorithm we need to cover this curve with \(n\) rectangles of equal area, with the lowest one having infinite width (it carries the tail). We'll use these rectangles for the sampling scheme described below.
Finding these rectangles turns out to be nontrivial.
Suppose we want to cover \(f\) with \(n=6\) rectangles; to do this we need to find the common area \(v\) of the rectangles (slightly larger than the area under \(f\) divided by \(n\), because of the overhangs) and a series of \(x\) points \(\left(x_{1},x_{2},x_{3},\ldots,x_{n}\right)\), with the last one being \(0\).
First we make a starting guess at the \(x\) location of the final (widest) rectangle, say r.
We then compute the area of the lowest layer as v = f(r) * r + f_int(r).
We can then compute the \(x\) value of the second rectangle as f_inv(f(r) + v / r).
Continue until all \(x\) values have been computed, replacing r above with the \(x\) value from the previous iteration.
Ziggurat Formation.
Z1
n <- 6
r <- 2.2
v <- r * f(r) + f_int(r)
x <- numeric(6)
x[1] <- r
for (i in 2:(n - 1)) {
  x[i] <- f_inv(f(x[i - 1]) + v / x[i - 1])
}
The area of the topmost rectangle, x[n - 1] * (f(0) - f(x[n - 1])), then comes out bigger than all the others, which have area \(v\): our starting guess \(r=2.2\) was a little too big.
Ziggurat Formation contd.
If we'd made our original guess too small (say \(2.1\)) then we'd have too small a final area. We can capture this in a small nonlinear equation to solve:
TargetFunction
target <- function(r, n = 6) {
  v <- r * f(r) + f_int(r)
  x <- r
  for (i in 2:(n - 1)) {
    x <- f_inv(f(x) + v / x)
  }
  x * (f(0) - f(x)) - v
}
c(target(2.2), target(2.1))
[1] 0.07608185 -0.22330914
This is easily solved by uniroot():
Uniroot
ans <- uniroot(target, c(2.1, 2.2))
ans$root
[1] 2.176047
Ziggurat Formation contd.
This approach works for any \(n\), though in practice some care is needed to select good bounds. Once we have found this value, we can compute our series of \(x\) values as above.
Reformation of Ziggurats.
r <- ans$root
v <- r * f(r) + f_int(r)
x <- numeric(n)
x[1] <- r
for (i in 2:(n - 1)) {
  x[i] <- f_inv(f(x[i - 1]) + v / x[i - 1])
}
x
curve(f, 0, 4, las = 1, lwd = 2, col = "navyblue", main = "Ziggurats")
points(x, y = f(x), pch = 20, col = "red")
abline(h = f(x), v = x, lty = "dashed")
axis(side = 1)
box(lwd = 3)
Sampling.
To sample from the distribution using these rectangles we do:
Select a layer \(i\) to sample from (an integer between \(1\) and \(n\)).
Select a random number \(u\) (between \(0\) and \(1\)); scale it by the layer's width to get z: the base layer has width v/f(r) (larger than r, since it also carries the tail mass), and layer \(i>1\) has width x[i-1].
If i==1 (the base layer):
If z<r then the point lies in the rectangle and can be accepted.
Otherwise draw from the tail and accept that point (see below).
If i>1 (any other layer):
If z<x[i] then the point lies entirely within the distribution and can be accepted.
Otherwise test the edge once and either accept the point or return to the first step of the algorithm.
Note the slightly different alternative conditions for the base layer and the others; for the base layer we will draw from the tail until we find a sample, even if it takes a few goes, but for the layers with overhangs we only try once and, if that does not succeed, we start again by selecting a new layer. A sketch putting these steps together follows.
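The sketch below assumes the n, x, v, r and f from the setup above, plus the helper sample_tail sketched in the next section; the function name is ours:

Code
zsample <- function() {
  repeat {
    i <- sample.int(n, 1)          # pick a layer uniformly
    u <- runif(1)
    if (i == 1) {                  # base layer: rectangle plus tail
      z <- u * v / f(r)            # width v / f(r) accounts for the tail mass
      if (z < r) return(z)
      return(sample_tail(r))
    } else {                       # layer i spans horizontally [0, x[i - 1]]
      z <- u * x[i - 1]
      if (z < x[i]) return(z)      # entirely under the curve (x[n] is 0)
      y <- f(x[i - 1]) + runif(1) * (f(x[i]) - f(x[i - 1]))  # edge test
      if (y < f(z)) return(z)
    }
  }
}

For the full normal one would attach a random sign to each draw, e.g. zsample() * sample(c(-1, 1), 1).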
Sampling from the tail.
For sampling from the tail we use the algorithm of Marsaglia (1963); given \(\mathcal{U}\left(0,1\right)\) random numbers u1 and u2, and using the value of r from above (i.e., x[1]), proceed as follows (a function-form sketch appears after the steps):
Let x=-log(u1)/r.
Let y=-log(u2).
If 2y > x^2 accept x+r, otherwise try again.
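In function form (a sketch; the name sample_tail is ours):

Code
sample_tail <- function(r) {
  repeat {
    x <- -log(runif(1)) / r
    y <- -log(runif(1))
    if (2 * y > x^2) return(x + r)  # a draw from the normal tail beyond r
  }
}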
To see this in action, first, let’s generate a set of normal draws from the tail (beyond x[1]).
Histogram of the Sample.
The curve is the analytical density function, shifted by r along the \(x\)-axis and scaled so that the area under the tail is \(1\). Generating a reasonably large number of candidate draws (here \(10^{5}\)) shows good agreement between the samples and the expectation:
Code
sx <- -log(runif(1e5)) / r
sy <- -log(runif(1e5))
accept <- 2 * sy > sx^2
hist(sx[accept], nclass = 30, freq = FALSE, xlim = c(0, 3),
     col = "darkgrey", border = "navyblue")
curve(dnorm(x + r) / pnorm(r, lower.tail = FALSE), add = TRUE, col = "red")
rug(sx[accept], col = "darkgreen")
box(lwd = 3)
Proportion of Acceptance.
With the relatively low r here, our acceptance probability is not bad \((\approx86\%)\), but as r increases it will improve:
Acceptance Proportion
sx <- -log(runif(1e5)) / 3.9
sy <- -log(runif(1e5))
mean(2 * sy > sx^2)
[1] 0.94459
The Edge.
If our point in layer \(i\) \(\left(i>1\right)\) lies outside the safe zone, we need to do a full rejection test. We start by illustrating this graphically; given two \(\mathcal{U}\left(0,1\right)\) numbers scaled to the ranges \(\left(x[3],x[2]\right)\) and \(\left(f\left(x[2]\right),f\left(x[3]\right)\right)\), we would accept the blue points below but not the red ones:
Code
plot(f, x[3] - 0.05, x[2] + 0.05, main = "Layer Selected for Sampling")
abline(v = x[2:3], lty = 3)
abline(h = f(x[2:3]))
f0 <- f(x[3])
f1 <- f(x[2])
u1 <- runif(100, x[3], x[2])  # these ones are filtered already
u2 <- runif(100)
y <- f0 + u2 * (f1 - f0)
accept <- y < f(u1)
points(u1, y, col = ifelse(accept, "blue", "red"), pch = 19, cex = 0.25)
box(lwd = 3)
Bibliography.
Robert, Christian P. and Casella, George. Monte Carlo Statistical Methods. Springer.
Robert, Christian P. and Casella, George. Introducing Monte Carlo Methods with R. Springer.
Graham, Carl and Talay, Denis. Stochastic Simulation and Monte Carlo Methods. Springer.
Hanson, Timothy. Simulating Random Variables. University of South Carolina.
Casella, George; Robert, Christian P. and Wells, Martin T. (2004). Generalized Accept-Reject Sampling Schemes. Institute of Mathematical Statistics, pp. 342–347.
Legault, Geoffrey and Melbourne, Brett A. (2019). "Accounting for environmental change in continuous-time stochastic population models". Theoretical Ecology.
Casella, G., Lavine, M. and Robert, C. P. (2000). "An introduction to perfect sampling". The American Statistician 55, 299–305. MR1939363.
Fill, J. A., Machida, M., Murdoch, D. J. and Rosenthal, J. S. (1999). "Extension of Fill's perfect rejection sampling algorithm to general chains". Random Structures and Algorithms 17, 219–316. MR1801136.