Accept-Reject Method & Ziggurat Algorithm

A Brief Overview

Rupanjan Mukherjee, Sourav Biswas

Indian Statistical Institute, Delhi

Accept-Reject Algorithm.

Target and Instrumental Density.

  • There are many distributions from which it is difficult, or even impossible, to directly simulate. Moreover, in some cases, we are not even able to represent the distribution in a usable form, such as a transformation or a mixture.

  • We thus turn to another class of methods that only requires us to know the functional form of the density \(f\) of interest up to a multiplicative constant; no deep analytic study of \(f\) is necessary.

  • The key to this method is to use a simpler (simulation-wise) density \(g\) from which the simulation is actually done. For a given density \(g\), called the Instrumental density or Candidate density, there are thus many densities \(f\), called the Target densities, which can be simulated this way.

The Fundamental Theorem of Simulation.

Simulating,

\[ X\sim f\left(x\right) \]

is equivalent to simulating,

\[ (X,U)\sim \mathcal{U}\left\{(x,u):0<u<f(x)\right\} \]

since integrating this joint uniform density over \(u\) recovers the marginal \(\int_{0}^{f(x)}du=f(x)\).

Applying the Fundamental Theorem of Simulation.

  • Suppose \(f\) is bounded by \(m\) and supported on \((a,b)\). We can then simulate the random pair \((Y,U)\sim\mathcal{U}\left\{(y,u):a<y<b,\,0<u<m\right\}\) by simulating \(Y\sim\mathcal{U}(a,b)\) and \(U|Y=y\sim \mathcal{U}(0,m)\).
  • Take the pair only if the further constraint \(0<u<f(y)\) is satisfied.
  • This results in the correct distribution of the accepted value of \(Y\), call it \(X\), because,

\[ \mathbb{P}(X\leq x)=\mathbb{P}(Y\leq x|U\leq f(Y))=\frac{\int_{a}^{x}\int_{0}^{f(y)}dudy}{\int_{a}^{b}\int_{0}^{f(y)}dudy}=\int_{a}^{x}f(y)dy \]

Proportion of Acceptance.

In addition, it’s easy to see that the probability of acceptance of a given simulation in the box \([a,b]\times[0,m]\) is the ratio of the area under \(f\) to the area of the box,

\[ \mathbb{P}\left(\mathrm{Accept}\right)=\mathbb{P}\left(U<f\left(Y\right)\right)=\frac{1}{m\left(b-a\right)}\int_{a}^{b}\int_{0}^{f(y)}du\,dy=\frac{1}{m\left(b-a\right)} \]

which reduces to \(\frac{1}{m}\) when \(b-a=1\), as in the Beta example below.

Simulation from \(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\)

Now, using R, we will create a function named ARBeta which, given the sample size and \(\alpha,\beta\), returns a \(\mathcal{B}e_{1}\left(\alpha,\beta\right)\) sample, all the \(\left(Y,U\right)\) pairs, the number of such pairs generated, the upper bound for \(f\left(x\right)\), etc.

The Function for Sample Generation.
ARBeta=function(n,alpha,beta) 
{   
    betasam=NULL;Y=NULL;U=NULL   
    Betadense=function(x){return(dbeta(x,shape1=alpha,shape2=beta))}   
    #--Upper bound m for the target density on (0,1)--
    m=optimize(Betadense,interval=c(0,1),maximum=T)$objective   
    while(length(betasam)<n)   
    {     
        #--Simulate Y~U(0,1) and U|Y=y~U(0,m)--
        y=runif(1);u=runif(1,max=m)     
        #--Accept y only if the pair falls below the density--
        if(u<Betadense(y)){betasam=c(betasam,y)}     
        Y=c(Y,y);U=c(U,u)   
    }   
    return(list("Beta"=betasam,"Y"=Y,"U"=U,"GenP"=length(Y),
                "AccProp"=length(betasam)/length(Y),
                "ExpectedProp"=1/m,"UpperB"=m)) 
}

Simulation from \(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\)

Using this function, we generate a sample of size \(n=1000\) from \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\). The function returns a list, stored in ARSam, which contains all the necessary information.

ARSam
#---A Beta(2.7,6.3) sample of size n=1000------------- 
ARSam=ARBeta(n=1000,alpha=2.7,beta=6.3) 
list(ARSam$GenP,ARSam$ExpectedProp,ARSam$AccProp)
[[1]]
[1] 2653

[[2]]
[1] 0.3745677

[[3]]
[1] 0.3769318

Simulation from \(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\)

For this example, we got \(m\approx2.67\), so we accept approximately \(\frac{1}{2.67}\approx37\%\) of the values. Now we plot a histogram of the sample along with the density of \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\).

Code
hist(ARSam$Beta,prob=T,col="navyblue",border="orange",main="Histogram of ARSam",
xlab="x",ylab="Frequency Density",breaks=30)
rug(ARSam$Beta)
Betadense=function(x,shape1=2.7,shape2=6.3){return(dbeta(x,shape1=shape1,shape2=shape2))} 
#--Gives the desired density--
curve(Betadense,from=0,to=1,add=T,lwd=2,col="red")
box(lwd=3)

Simulation from \(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\)

To get an idea of how many simulations are wasted, we make the following plot. Here we plot the \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\) density and the generated \(\left(Y,U\right)\) pairs. The points below the density curve are the accepted ones.

Code
accept=ARSam$U<Betadense(ARSam$Y) #--TRUE for the accepted (Y,U) pairs--
plot(ARSam$U~ARSam$Y,col=ifelse(accept,"navyblue","red"),pch=20,
xlab="x",ylab="Density",
main="Scatterplot of the Generated Samples \n along with Density") 
grid(col="darkgreen") 
box(lwd=3)
curve(Betadense,from=0,to=1,add=T,lwd=2,col="purple") 
abline(h=ARSam$UpperB,col="grey",lwd=3) #--the flat upper bound m--

Extension to the Fundamental Theorem of Simulation.

  • The larger set is of the form,

    • \[\mathscr{L}=\left\{\left(y,u\right):0<u<m\left(y\right)\right\}\]
  • Obviously, efficiency dictates that \(m\) be as close as possible to \(f\) in order to avoid wasting simulations. A remark of importance is that, because of the constraint \(m\left(x\right)\geq f\left(x\right)\), \(m\) can’t be a probability density. We then write,

    • \[m\left(x\right)=Mg\left(x\right)\, \mathrm{where,}\, \int_{\chi}m\left(x\right)dx=\int_{\chi}Mg\left(x\right)dx=M\]
  • A natural way of simulating the uniform distribution on \(\mathscr{L}\) is then to use the previous theorem backwards, that is, to simulate \(Y\sim g\) and then \(U\left|Y=y\right.\sim\mathcal{U}\left(0,Mg\left(y\right)\right)\). If we only accept the \(y\)’s such that the constraint \(u<f\left(y\right)\) is satisfied, we have,

    • \[ \mathbb{P}\left(X\in\mathcal{A}\right)=\mathbb{P}\left(Y\in\mathcal{A}\left|U<f\left(Y\right)\right.\right)=\frac{\int_{\mathcal{A}}\int_{0}^{f\left(y\right)}\frac{du}{Mg\left(y\right)}g\left(y\right)dy}{\int_{\chi}\int_{0}^{f\left(y\right)}\frac{du}{Mg\left(y\right)}g\left(y\right)dy}=\int_{\mathcal{A}}f\left(y\right)dy.\]

    • This holds for every measurable set \(\mathcal{A}\), so the accepted \(X\)’s are indeed distributed from \(f\). We have thus derived a more general implementation of the fundamental theorem, as follows.

Theorem 2.

Let \(X\sim f\left(x\right)\) and let \(g\left(x\right)\) be a density function that satisfies \(f\left(x\right)\leq Mg\left(x\right)\) for some constant \(M\geq1\). Then, to simulate \(X\sim f\), it is sufficient to generate,

\[ Y\sim g\quad\mathrm{and}\quad U\left|Y=y\right.\sim\mathcal{U}\left(0,Mg\left(y\right)\right) \]

until \(0<u<f\left(y\right)\).

The Accept-Reject Algorithm.

The implementation of Theorem \(2\) is known as the Accept-Reject Method, which is usually stated in the following slightly modified but equivalent form.


Algorithm.

  1. Generate \(Y\sim g,U\sim\mathcal{U}\left(0,1\right)\)
  2. Accept \(X=Y\) if \(U\leq\frac{f\left(Y\right)}{Mg\left(Y\right)}\)
  3. Return to 1. otherwise.
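
These three steps translate directly into R. The following is a minimal generic sketch of our own (the names ARGeneric, rcand, dcand and dtarget are illustrative, not from the original slides): rcand is a sampler for the candidate \(g\), dcand and dtarget are the candidate and target densities, and M is a constant with \(f\leq Mg\).

Code
#--A generic Accept-Reject sampler (illustrative sketch)--
#--Assumes dtarget(x) <= M*dcand(x) for all x--
ARGeneric=function(n,rcand,dcand,dtarget,M)
{
    sam=NULL
    while(length(sam)<n)
    {
        y=rcand(1);u=runif(1)                        #--Step 1: Y~g, U~U(0,1)--
        if(u<=dtarget(y)/(M*dcand(y))){sam=c(sam,y)} #--Step 2: accept X=Y--
    }                                                #--Step 3: otherwise loop--
    return(sam)
}

For instance, ARGeneric(1000,function(k)rbeta(k,2,6),function(x)dbeta(x,2,6),function(x)dbeta(x,2.7,6.3),M=1.68) follows the same scheme as the ARBeta2 function shown later, whose optimized bound is \(M\approx1.67\) (any valid bound works).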

Probability of Acceptance.

The unconditional acceptance probability, i.e., the proportion of proposed samples that are accepted, is,

\[ \mathbb{P}\left(U\leq\frac{f\left(Y\right)}{Mg\left(Y\right)}\right)=\mathbb{E}\mathrm{\boldsymbol{1}}_{\left[U\leq\frac{f\left(Y\right)}{Mg\left(Y\right)}\right]}=\mathbb{E}\left[\mathbb{E}\left[\left.\mathrm{\boldsymbol{1_{\left[U\leq\frac{f(Y)}{Mg(Y)}\right]}}}\right|Y\right]\right] \]

\[ =\mathbb{E}\left[\mathbb{P}\left(\left.U\leq\frac{f\left(Y\right)}{Mg\left(Y\right)}\right|Y\right)\right] \]

\[ =\mathbb{E}\left[\frac{f\left(Y\right)}{Mg\left(Y\right)}\right]=\int_{y:g\left(y\right)>0}\frac{f\left(y\right)}{Mg\left(y\right)}g\left(y\right)dy \]

\[ =\frac{1}{M}\int_{y:g\left(y\right)>0}f\left(y\right)dy=\frac{1}{M} \]

\(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\) simulation contd.

Now, we’ll follow the AR Algorithm to generate \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\) samples with candidate density \(\mathcal{B}e_{1}\left(\alpha=2,\beta=6\right)\), which is easy to simulate from and is very close to our target density. We can generate \(\mathcal{B}e_{1}\left(\alpha=2,\beta=6\right)\) samples using the following order-statistics result.


Note.

If \(X_{1},X_{2},\ldots,X_{n}\overset{iid}{\sim}\mathcal{U}\left(0,1\right)\) then \(X_{\left(r\right)}\sim\mathcal{B}e_{1}\left(r,n-r+1\right),\,r=1\left(1\right)n\). We need \(r=2,\,n-r+1=6\), which gives \(\left(r,n\right)=\left(2,7\right)\). Hence, generate a sample of size \(7\) from \(\mathcal{U}\left(0,1\right)\) and take the second smallest element, which provides a draw from \(\mathcal{B}e_{1}\left(\alpha=2,\beta=6\right)\); a one-line version is sketched below.
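
As a quick illustration of this note (the function name rbeta26 is ours), a single \(\mathcal{B}e_{1}\left(2,6\right)\) draw is the second smallest of seven uniforms:

Code
#--One Be(2,6) draw as the 2nd order statistic of 7 iid U(0,1) draws--
rbeta26=function(){sort(runif(7))[2]}
#--e.g., hist(replicate(1e4,rbeta26()),prob=T) should match dbeta(x,2,6)--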

\(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\) simulation contd.

The function ARBeta2 generates \(\mathcal{B}e_{1}\left(\alpha,\beta\right)\) samples using \(\mathcal{B}e_{1}\left(\left\lfloor \alpha\right\rfloor ,\left\lfloor \beta\right\rfloor \right)\) as the candidate density. This function gives output similar to what we have seen in the case of ARBeta.

Modification of ARBeta
ARBeta2=function(n,alpha,beta) 
{  
    betasam=NULL;Y=NULL;U=NULL   
    Betadense=function(x){return(dbeta(x,shape1=alpha,shape2=beta))}   
    #--Candidate density with integer shape parameters--
    Betadense2=function(x){return(dbeta(x,shape1=floor(alpha),shape2=floor(beta)))}   
    RDense=function(x){return(Betadense(x)/Betadense2(x))}   
    #--Bound M on the ratio f/g over (0,1)--
    M=optimize(RDense,interval=c(0,1),maximum=T)$objective   
    while(length(betasam)<n)   
    {     
        y=rbeta(1,shape1=floor(alpha),shape2=floor(beta));u=runif(1)     
        #--Accept if u < f(y)/(M g(y))--
        if(u<(RDense(y)/M)){betasam=c(betasam,y)}     
        Y=c(Y,y);U=c(U,u)   
    }   
    return(list("Beta"=betasam,"Y"=Y,"U"=U,"GenP"=length(Y),
                "AccProp"=length(betasam)/length(Y),
                "ExpectedProp"=1/M,"UpperB"=M)) 
}

\(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\) simulation contd.

Using this function, we generate a sample of size \(n=1000\) from \(\mathcal{B}e_{1}\left(\alpha=2.7,\beta=6.3\right)\). The function returns a list, stored in ARSam2, which contains all the necessary information.

ARSam2
#---A Beta(2.7,6.3) sample of size n=1000------------- 
ARSam2=ARBeta2(n=1000,alpha=2.7,beta=6.3) 
list(ARSam2$GenP,ARSam2$ExpectedProp,ARSam2$AccProp)
[[1]]
[1] 1701

[[2]]
[1] 0.5981549

[[3]]
[1] 0.5878895

\(\mathcal{B}e_{1}(\alpha=2.7,\beta=6.3)\) simulation contd.

The proportion of accepted simulations is much higher than before: out of around \(1700\) proposals, \(1000\) were accepted. Now we plot the \(\left(Y,U\right)\) pairs along with the density to see how much we have improved.

Code
plot(ARSam2$U~ARSam2$Y,col="maroon",pch=20,
ylim=c(0,2),xlab="x",ylab="Density",
main="Scatterplot of the Generated Samples \n along with Density") 
grid(col="darkgreen");box(lwd=3) 
curve(Betadense,from=0,to=1,add=T,lwd=2,col="purple")
abline(h=ARSam$UpperB,col="red",lwd=3) #--the old flat bound m, for comparison--
polyseq=seq(from=0,to=1,length.out=1000)
polygon(polyseq,Betadense(polyseq),density=60)

Dealing with unknown Normalizing Constants.

In most practical scenarios, we only know \(f\left(x\right)\) and \(g\left(x\right)\) up to some normalizing constants, i.e.,

\[ f\left(x\right)=\frac{\tilde{f}\left(x\right)}{Z_{f}}\:\mathrm{and}\:g\left(x\right)=\frac{\tilde{g}\left(x\right)}{Z_{g}} \]

where \(\tilde{f}\left(x\right),\tilde{g}\left(x\right)\) are known but \(Z_{f}=\int_{\chi}\tilde{f}\left(x\right)dx,\,Z_{g}=\int_{\chi}\tilde{g}\left(x\right)dx\) are unknown or expensive to compute. In such cases, rejection can still be used; indeed,

\[ \frac{f\left(x\right)}{g\left(x\right)}\leq M\iff\frac{\tilde{f}\left(x\right)}{\tilde{g}\left(x\right)}\leq\tilde{M}\;\mathrm{with}\:\tilde{M}=\frac{Z_{f}M}{Z_{g}}. \]
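
In code this changes very little; the sketch below (our own naming, mirroring the generic sampler given earlier) uses only \(\tilde{f},\tilde{g}\) and \(\tilde{M}\):

Code
#--Accept-Reject with unnormalised densities (illustrative sketch)--
#--Assumes ftilde(x) <= Mtilde*gtilde(x) for all x--
ARUnnorm=function(n,rcand,gtilde,ftilde,Mtilde)
{
    sam=NULL
    while(length(sam)<n)
    {
        y=rcand(1)
        if(runif(1)<=ftilde(y)/(Mtilde*gtilde(y))){sam=c(sam,y)}
    }
    return(sam)
}

The function ARB later in these slides is an instance of this scheme, with a normalised candidate (dnorm) and an unnormalised target.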

Example.

Suppose, we are trying to generate samples from a density \(f\) such that,

\[ f\left(x\right)\propto\exp\left(-\frac{x^{2}}{2}\right)\left(\sin\left(6x\right)^{2}+3\cos\left(x\right)^{2}\sin\left(4x\right)^{2}+1\right) \]


The following figure illustrates Theorem \(2\): as the dominating function we take a multiple of the normal density,

\[ g\left(x\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^{2}}{2}\right) \]

which is obviously straightforward to generate. Let’s plot the function \(f\) together with the envelope \(5\exp\left(-\frac{x^{2}}{2}\right)\), a multiple of \(g\).

Plotting \(f\) with the envelope \(5\exp\left(-\frac{x^{2}}{2}\right)\).

Code
#---Dealing with unknown normalizing constants--- 
Targetfunc=function(x)
{   
    f=((sin(6*x))^2)+(3*((cos(x))^2)*((sin(4*x))^2))+1   
    return(f*(exp(-(x^2)/2))) 
} 
Candfunc=function(x){5*exp(-((x^2)/2))} 
curve(Targetfunc,from=-4,to=4,ylim=c(0,5)) 
curve(Candfunc,add=T) 
polyseq=seq(-4,4,length.out=1000) 
#--Both functions are vectorised, so no loops are needed--
yseq=Candfunc(polyseq) 
yseq2=Targetfunc(polyseq) 
polygon(polyseq,yseq,density=60);polygon(polyseq,yseq2,density=90);box(lwd=3)

Example contd.

The following function ARB generates samples from \(f\) using \(\mathcal{N}\left(0,1\right)\) as the candidate density.

ARB
ARB=function(n) 
{   
    sam=NULL;Y=NULL;U=NULL   
    #--Unnormalised target density--
    Targetfunc=function(x)   
        {     
            f=((sin(6*x))^2)+(3*((cos(x)^2))*((sin(4*x))^2))+1     
            return(f*(exp(-(x^2)/2)))   
        }   
    #--Ratio of the unnormalised target to the N(0,1) candidate--
    RDense=function(x){return(Targetfunc(x)/dnorm(x))}   
    M=optimize(RDense,interval=c(-100,100),maximum=T)$objective   
    while(length(sam)<n)   
    {     
        y=rnorm(1);u=runif(1)     
        if(u<(RDense(y)/M)){sam=c(sam,y)}     
        Y=c(Y,y);U=c(U,u)   
    }   
    #--NB: as the target is unnormalised, 1/M is no longer the acceptance probability--
    return(list("sam"=sam,"Y"=Y,"U"=U,"GenP"=length(Y),"AccProp"=length(sam)/length(Y),
                "ExpectedProp"=1/M,"UpperB"=M)) 
}

Example contd.

We generate \(10000\) samples in ARSam3 from the proposed density using ARB. Here is the relevant information.

Code
ARSam3=ARB(n=10000)
list(ARSam3$GenP,ARSam3$AccProp)
[[1]]
[1] 18475

[[2]]
[1] 0.541272

Histogram of the generated Sample.

Code
library(lattice) 
histogram(~ARSam3$sam,type="density",
main="Histogram of ARSam3",ylab="Frequency Density",breaks=100
,xlab="x",col='tomato',border='navyblue')

Ziggurat Algorithm for \(\boldsymbol{\mathcal{N}\left(0,1\right)}\)

About the Algorithm.

  • The Ziggurat Algorithm is an algorithm for pseudo-random number sampling. Belonging to the class of rejection sampling algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables.

  • The algorithm is used to generate values from a monotonically decreasing probability distribution. It can also be applied to symmetric unimodal distributions, such as the normal distribution, by choosing a value from one half of the distribution and then randomly choosing which half the value is considered to have been drawn from.

The Algorithm with R.

The principal reference we will follow is Doornik \((2005)\). As in that paper, we consider the non-normalised right-hand half of the normal density:

Code
f <- function(x){exp(-x^2 / 2)} 

It will be useful to also have functions f_inv (the inverse of \(f\)) and f_int (the integral of \(f\) from \(x\) to \(\infty\)).

Code
f_inv <- function(y) {sqrt(-2 * log(y))}
f_int <- function(r) {pnorm(r, lower.tail = FALSE) / dnorm(0)}
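
As a quick sanity check (not part of the original slides), f_inv does invert f, and f_int agrees with numerical integration:

Code
f(f_inv(0.3))            #--recovers 0.3--
f_int(2)                 #--approx 0.0570--
integrate(f,2,Inf)$value #--approx 0.0570, matching f_int--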

The Algorithm with R.

  • To work with the algorithm we need to cover this curve with \(n\) equal-area rectangles, with the lowest one having infinite width. We’ll use these rectangles for the sampling scheme described below.

  • Finding these rectangles turns out to be nontrivial.

  • Suppose we want to cover \(f\) with \(6\) rectangles; to do this we need to find the common area of the rectangles (slightly larger than the area under \(f\) divided by \(n\), due to the overhangs) and a series of \(x\) points \((x_{1},x_{2},x_{3},\ldots,x_{n})\) with the last one being \(0\).

  1. First we make a starting guess for the \(x\) location of the final rectangle, say r.
  2. We then compute the area of the lowest region as v = f(r)*r + f_int(r).
  3. We can then compute the \(x\) location of the \(2\)nd rectangle as f_inv(f(r) + v / r).
  4. Continue this until all \(x\) values have been computed, replacing r above with the \(x\) value from the previous iteration.

Ziggurat Formation.

Z1
n <- 6 
r <- 2.2                     # starting guess for the base-layer corner
v <- r * f(r) + f_int(r)     # area of the base region (rectangle plus tail)
x <- numeric(n) 
x[1] <- r 
for (i in 2:(n - 1)) 
{x[i] <- f_inv(f(x[i - 1]) + v / x[i - 1])}   # x[n] stays 0 by construction

We now have a series of \(x\) values.

x
x
[1] 2.2000000 1.8119187 1.5077600 1.2223596 0.9077862 0.0000000

The topmost box has area x[5] * (f(0) - f(x[5])), which is bigger than all the others, which have area \(v\); our starting guess \(r=2.2\) was therefore too large.
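
We can check this numerically from the quantities above (the values in the comments are approximate):

Code
x[5]*(f(0)-f(x[5])) #--area of the topmost box, approx 0.307--
v                   #--area of each of the other layers, approx 0.230--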

Ziggurat Formation contd.

If we’d made our original guess too small (say \(2.1\)), then we’d have too small a final area. We can capture this in a small nonlinear equation to solve:

TargetFunction
target <- function(r, n = 6) 
{   
    v <- r * f(r) + f_int(r)   
    x <- r   
    for (i in 2:(n - 1)) 
    {x <- f_inv(f(x) + v / x)}   
    # excess area of the top box: positive if r is too large, negative if too small
    x * (f(0) - f(x)) - v 
}
c(target(2.2),target(2.1))
[1]  0.07608185 -0.22330914

This is easily solved by uniroot():

Uniroot
ans <- uniroot(target, c(2.1, 2.2)) 
ans$root
[1] 2.176047

Ziggurat Formation contd.

This approach works for any \(n\), though in practice some care is needed to select good bounds. Once we have found this value, we can compute our series of \(x\) values as above.

Reformation of Ziggurats.
r <- ans$root 
v <- r * f(r) + f_int(r) 
x <- numeric(n) 
x[1] <- r 
for (i in 2:(n - 1)) 
{x[i] <- f_inv(f(x[i - 1]) + v / x[i - 1])} 
x
[1] 2.1760469 1.7818609 1.4695742 1.1712803 0.8287847 0.0000000

Ziggurat Formation contd.

Code
curve(f, 0, 4, las = 1,lwd=2,col="navyblue",main="Ziggurats") 
points(x,y=f(x),pch=20,col="red") 
abline(h=f(x),v=x,lty="dashed")
axis(side=1)
box(lwd=3)

Sampling.

To sample from the distribution using these rectangles we do:

  1. Select a layer i to sample from (an integer between \(1\) and \(n\)).
  2. Select a random number \(u\) (between \(0\) and \(1\)); scale this by the width of the layer to get z. For a layer i>1 the width is x[i-1], while for the base layer it is v/f(x[1]), which exceeds x[1] because the base region’s area \(v\) includes the tail.
  3. If i==1 (the base layer):
    1. If z<x[1] then the point lies in the rectangle and can be accepted.
    2. Otherwise draw from the tail and accept that point (see below).
  4. If i>1 (any other layer):
    1. If z<x[i] then the point lies entirely within the distribution and can be accepted.
    2. Otherwise test the edge once and either accept that point or return to the first step of the algorithm.

Note the slightly different alternative conditions for the base layer and the others; for the base layer we will find a sample from the tail even if it takes a few goes, but for the layers with overhangs we only try once, and if that does not succeed we start again by selecting a new layer.

Sampling from the tail.

For sampling from the tail we use the algorithm of Marsaglia \(\left(1963\right)\); given \(\mathcal{U}\left(0,1\right)\) random numbers u1 and u2, and using the value of r from above (i.e., x[1]):

  1. Let x=-log(u1)/r.
  2. Let y=-log(u2).
  3. If 2y > x^2 accept x+r, otherwise try again.

To see this in action, first, let’s generate a set of normal draws from the tail (beyond x[1]).

Histogram of the Sample.

The curve is the analytical density function, shifted by r along the \(x\)-axis and scaled so that the area under the tail is \(1\). Generating a reasonably large number of samples (here \(10^{5}\)) shows good agreement between the samples and the expectation:

Code
sx <- -log(runif(1e5)) / r 
sy <- -log(runif(1e5)) 
accept <- 2 * sy > sx^2 
hist(sx[accept], nclass = 30, freq = FALSE, xlim = c(0, 3),col="darkgrey",border="navyblue") 
curve(dnorm(x + r) / pnorm(r, lower.tail = FALSE), add = TRUE, col = "red")
rug(sx[accept],col="darkgreen")
box(lwd=3)

Proportion of Acceptance.

With the relatively low r here, our acceptance probability is not bad (\(\sim86\%\)), but as r increases it will improve:

Acceptance Proportion
sx <- -log(runif(1e5)) / 3.9 
sy <- -log(runif(1e5)) 
mean(2 * sy > sx^2)
[1] 0.94459

The Edge.

If our point in layer i (with i>1) lies outside of the safe zone, we need to do a full rejection sample. We start by illustrating this graphically; given two \(\mathcal{U}\left(0,1\right)\) numbers scaled to (x[3],x[2]) horizontally and to (f(x[2]),f(x[3])) vertically, we would accept the blue points below but not the red ones:

Code
plot(f, x[3] - 0.05, x[2] + 0.05,main="Layer Selected for Sampling") 
abline(v = x[2:3], lty = 3) 
abline(h = f(x[2:3]))
f0 <- f(x[3]) 
f1 <- f(x[2]) 
u1 <- runif(100, x[3], x[2]) # these ones are filtered already 
u2 <- runif(100) 
y <- f0 + u2 * (f1 - f0) 
accept <- y < f(u1) 
points(u1, y, col = ifelse(accept, "blue", "red"), pch = 19, cex = 0.25)
box(lwd=3)
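
Putting the layer selection, the tail draw and the edge test together gives a complete, if unoptimised, sampler. The sketch below is our own code (the names zigdraw and zignorm are illustrative), assuming the f, x and v defined above are in scope; a random sign turns the half-normal draw into a full \(\mathcal{N}\left(0,1\right)\) draw.

Code
#--One ziggurat draw from the right half of the normal (illustrative sketch)--
zigdraw=function(x,v)
{
    n=length(x);r=x[1]
    repeat
    {
        i=sample.int(n,1)           #--pick one of the n equal-area layers--
        u=runif(1)
        if(i==1)                    #--base layer: width v/f(r) includes the tail--
        {
            z=u*v/f(r)
            if(z<r){return(z)}
            repeat                  #--Marsaglia's tail method, retried until success--
            {
                tx=-log(runif(1))/r;ty=-log(runif(1))
                if(2*ty>tx^2){return(r+tx)}
            }
        }
        else
        {
            z=u*x[i-1]              #--layer i spans (0,x[i-1]) horizontally--
            if(z<x[i]){return(z)}   #--safe zone: strictly under the curve--
            y=f(x[i-1])+runif(1)*(f(x[i])-f(x[i-1]))
            if(y<f(z)){return(z)}   #--edge test: one try, else pick a new layer--
        }
    }
}
zignorm=function(nsamp,x,v)
{sample(c(-1,1),nsamp,replace=TRUE)*replicate(nsamp,zigdraw(x,v))}
#--e.g., hist(zignorm(1e4,x,v),prob=T,breaks=50) should match dnorm--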

Bibliography.

  1. Robert, Christian P. and Casella, George. Monte Carlo Statistical Methods. Springer.

  2. Robert, Christian P. and Casella, George. Introducing Monte Carlo Methods with R. Springer.

  3. Graham, Carl and Talay, Denis. Stochastic Simulation and Monte Carlo Methods. Springer.

  4. Hanson, Timothy. Simulating Random Variables. University of South Carolina.

  5. Casella, George; Robert, Christian P.; Wells, Martin T. (2004). Generalized Accept-Reject Sampling Schemes. Institute of Mathematical Statistics, pp. 342–347.

  6. Legault, Geoffrey; Melbourne, Brett A. (2019). “Accounting for environmental change in continuous-time stochastic population models”. Theoretical Ecology.

  7. Casella, G.; Lavine, M.; Robert, C. P. (2000). “An introduction to perfect sampling”. The American Statistician 55, 299–305. MR1939363.

  8. Fill, J. A.; Machida, M.; Murdoch, D. J.; Rosenthal, J. S. (1999). “Extension of Fill’s perfect rejection sampling algorithm to general chains”. Random Structures and Algorithms 17, 219–316. MR1801136.

  9. Wikipedia.

Questions :) ?