Sampling from some measure zero sets

In my previous comment I mentioned the idea of sampling from the measure zero set by parameterizing the curve through the x-y plane, drawing a length \( L \) from a symmetric proposal distribution, finding the (x,y) pair that correspond to the new point, and accepting the proposal with the usual M-H acceptance probability. This should satisfy the requirements for the MCMC algorithm:

It is also notable that the proposal distribution above is now symmetric: the probability of proposing a new point from the current point is the same as going the other direction.

To sample along the curve, we use the approach of parameterizing with \( y(t) = t \) and then solving for \( x(t) \) as needed for the problem. We then find the (x,y) pair by using the function below to find the upper limit of the line integral along the constraining path with length \( L \) drawn from the proposal distribution.

getT <- function(L, f, start = 0, tol = 1e-05, initT = start + L) {
    # Input is a length L, a function f, and an initial t for the
    # parameterization

    # Output is the tPrime corresponding to a movement of length L along the
    # curve
    tPrime <- initT
    Ltest <- integrate(f, start, tPrime)$value
    # Iterate to find the value for tPrime to satisfy the curve length condition
    while (abs(L - Ltest) >= tol) {
        tPrime <- L/Ltest * (tPrime - start) + start
        Ltest <- integrate(f, start, tPrime)$value
    }
    # Return the value for the limit of the integral: tval also return the
    # approximation to L: finalL
    return(list(tval = tPrime, finalL = Ltest))
}

We'll test it first on some simple problems to make sure it works in obvious cases. First, sampling from a bivariate standard normal along the x=0 line. Obviously the draws of y conditional on being along this line should be a standard normal, as should the distribution “along the line”.

Note: in all plots below, ss are samples “along the path”, xs and ys are the samples in x and y respectively

plot of chunk unnamed-chunk-2

Alright, this works well, which is good. Note we are sampling along x=0, so the histogram of x is pointless here.

Next, we'll test it adding trivial complexity by sampling from a bivariate standard normal along the x=y line. Now draws of y conditional on being along this line should be normal with a variance of ½, though the distribution “along the line”, should still be standard normal due to the symmetry.

plot of chunk unnamed-chunk-3

Alright, this works well too, which is good.

Next, we'll test it on the problem at hand, sampling constrained to the curve \( x^2-y^2=1 \).

plot of chunk unnamed-chunk-4

Hmmm, that's no good. The curve under “along the path” looks reasonable, but the distribution of y is wrong in a very similar way to the way it was using the simple MCMC sampling of y.

Next, maybe we should try something in between - sampling constrained to the curve \( x=y^2 \).

par(mfrow = c(2, 2))
# Integrand of the line integral along the x=0 path.
inte <- function(t) {
    sqrt(1 + 4 * t^2)
}

T = 10^5
Eps = 3
# y(t) = t x(t) = t^2
ys = xs = ss = rep(runif(1), T)
xs[1] = ys[1]^2
for (t in 2:T) {
    # Sample a step length along the curve L
    propL = runif(1, -Eps, Eps)
    # Obtain the (x,y) pair corresponding to the proposed jump
    propy = getT(propL, f = inte, start = ys[t - 1])$tval
    propx = propy^2
    # M-H acceptance probability
    ace = (runif(1) < (dnorm(propy) * dnorm(propx))/(dnorm(ys[t - 1]) * dnorm(xs[t - 
        1])))
    if (ace) {
        ys[t] = propy
        xs[t] = propx
        ss[t] = integrate(inte, 0, propy)$value
    } else {
        ys[t] = ys[t - 1]
        xs[t] = xs[t - 1]
        ss[t] = ss[t - 1]
    }
}
xx <- seq(-3, 3, length = 100)
contour(xx, xx, outer(dnorm(xx), dnorm(xx)))
lines(xx^2, xx, col = "red", lwd = 2)
# ss is the curve coordinates Plot the density of ss given the bivariate
# normal for x,y
plot(density(ss))
# MCMC samples from density of x conditional on x^2=y
plot(density(xs))
# MCMC samples from density of y conditional on x^2=y
plot(density(ys))
# Change of variables solution to the conditional density of y
points(xx, dnorm(xx) * dnorm(xx^2)/0.4160508, lwd = 2, col = "blue", type = "l")

plot of chunk unnamed-chunk-5

Now this is a real problem. We have a clearly bimodal sampling distribution but that should not be the case. Something is clearly amiss with this sampling procedure here.

Just for fun, let's try sampling from something that should be pretty intuitive, a circle: \( x^2+y^2=1 \). Due to symmetry this should have some nice properties:

plot of chunk unnamed-chunk-6

We can see that the location along the path is essentially uniformly distributed. On either x or y, we see the effect of “limb darkening”. Technically above we are only sampling on the half-circle, in order to enfore the periodic nature of the coordinates. This all appears quite reasonable.

Just for fun, let's try sampling from something really interesting: \( x=sin(y) \).

plot of chunk unnamed-chunk-7