MCMC without likelihoods

Michael Grosskopf
April 3, 2014

References

Markov Chain Monte Carlo without likelihoods

  • Marjoram, Molitor, Plagnol, and Tavare

Approximate Bayesian Computational Methods

  • Marin, Pudlo, Robert, and Ryder

Likelihood-free Markov Chain Monte Carlo\( ^* \)

  • Sisson and Fan

\( ^* \) on arXiv

In many complex models, the likelihood is expensive or impossible to evaluate

It is necessary to sample from the posterior using MCMC methods

Common situations listed in Sisson and Fan include:

  • Tree-based methods for evolution (Fagundes 2007)
  • Protein Networks (Ratmann 2009)
  • Wireless network communications (Nevat 2008)

Often much faster to simulate data from the model

Approximate Bayesian Computation can be used for complex problems

The general idea:

  • Sample model parameters from a distribution
    • Often the prior
  • Generate simulated data under the model with those parameters
  • Accept samples if the simulated data matches the observed data

The above actually contains no approximations

  • According to Marin, the above idea was first mentioned by Rubin (1984) “as an intuitive way to understand posterior distributions from a frequentist perspective”

It's not quite that straightforward

  • \( P(D^\prime = D) \approx 0 \) for reasonable sized datasets
    • \( P(D^\prime = D) = 0 \) for continuous data

Simple example:

d <- matrix(rbinom(5000, 1, 0.5), nrow = 1000, ncol = 5)
d1 <- rbinom(5, 1, 0.5)

Only 3.6% of the values in d match d1.

Can improve the acceptance rate at cost of fidelity

  • Compare summary statistics, \( T(\cdot) \) instead of full data
    • Ideally a sufficient statistic (rare)
  • Allow for some discrepency \( \epsilon \) when comparing
    • Must choose a function \( \rho(\cdot) \) as measure of discrepancy

17.3% of the values in total successes in d match the total in d1.

ABC-MCMC can further improve acceptance rates

To sample from \( P(\theta \mid D) \) starting at \( \theta_t \):

  • Draw \( \theta^\prime \) from proposal distribution \( q(\theta_t \to \theta^\prime) \)
  • Generate simulated data from the model given \( \theta^\prime \)
  • Reject if \( \rho(T(D_{obs}),T(D^\prime \mid \theta^\prime)) > \epsilon \)
  • Otherwise, set \( \rho_{t+1} \) = \( \theta^\prime \) with probability \( h(\cdot) \)
    • \( h(\cdot) = \text{min}\left(1,\frac{\pi(\theta^\prime)q(\theta^\prime \to \theta_t)}{\pi(\theta_t)q(\theta_t \to \theta^\prime)}\right) \)
  • Iterate

It is easy to show this satisfies the detailed balance condition

It's important to note that ABC is inherently conservative

The accept/reject step will always accept simulated data that exactly matches the observed data

Limit:
As acceptance rate \( \to \) 1,
Sampling distribution \( \to \) the prior

Approximation cannot make the sampling distribution overconfident, it can make it a biased estimate of the posterior though.

Examples

if (there is time){

Go to RStudio

}

else{

Go to next slide

}

Example: Simple Binomial Data

\( X \sim \) Binomial(4,\( p \))
\( \pi(p) \sim \) Beta(2,2)

# Setting up problem and test data
testn <- 4
truePar <- 0.3
testData <- rbinom(testn, 1, truePar)
par(lwd = 2, mfrow = c(2, 2))
xs <- seq(0, 1, length = 200)

Our test data is:
\[ 0, 0, 0, 1 \]

First the exact procedure without likelihoods

# Testing ABC with full data
tol = 0
n <- 10000
draws <- rep(0.5, n)
accept <- rep(0, n)
for (i in 2:n) {
    prop <- rbeta(1, 2, 2)
    if (all(testData == rbinom(testn, 1, prop))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.0544 for comparing full data.

Then ABC using a sufficient statistic

# Testing ABC with a summary statistic
for (i in 2:n) {
    prop <- rbeta(1, 2, 2)
    if (abs(sum(testData) - sum(rbinom(testn, 1, prop))) <= tol) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.2669 using a sufficient statistic.

Now ABC-MCMC with the full data

# Testing MCMC without likelihood on simple problem
for (i in 2:n) {
    alpha <- 8 * draws[i - 1] + 1
    beta <- 10 - alpha
    prop <- rbeta(1, shape1 = alpha, shape2 = beta)
    alpha2 <- 8 * prop + 1
    if (all(testData == rbinom(testn, 1, prop)) & (runif(1) < dbeta(prop, 2, 
        2)/dbeta(draws[i - 1], 2, 2) * dbeta(draws[i - 1], shape1 = alpha2, 
        shape2 = 10 - alpha2)/dbeta(prop, shape1 = alpha, shape2 = beta))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.3138 for comparing full data.

ABC-MCMC with the sufficient statistic

# Testing MCMC without likelihood on simple problem
for (i in 2:n) {
    alpha <- 8 * draws[i - 1] + 1
    beta <- 10 - alpha
    prop <- rbeta(1, shape1 = alpha, shape2 = beta)
    alpha2 <- 8 * prop + 1
    if ((abs(sum(testData) - sum(rbinom(testn, 1, prop))) <= tol) & (runif(1) < 
        dbeta(prop, 2, 2)/dbeta(draws[i - 1], 2, 2) * dbeta(draws[i - 1], shape1 = alpha2, 
            shape2 = 10 - alpha2)/dbeta(prop, shape1 = alpha, shape2 = beta))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.5043 using a sufficient statistic.

A comparison of the sampling distributions

plot of chunk unnamed-chunk-7

Example 2: Continuous Normal Data

\( X \sim \) Normal\( (\mu=6,1) \)
\( \pi(\mu) \sim \) Normal\( (0,10) \)

# Setting up problem and test data
testn <- 10
truePar <- 6
testData <- rnorm(testn, truePar, 1)
par(lwd = 2, mfrow = c(2, 2))
xs <- seq(0, 10, length = 100)

Our data is: \[ 5.302, 4.7151, 6.99, 6.1118, 6.1142,\\ 7.6982, 6.0478, 6.6549, 7.3653, 6.4026 \]

Probability of simulated data matching observed is 0

draws <- rep(mean(testData), n)
accept <- rep(0, n)
for (i in 2:n) {
    prop <- rnorm(1, draws[i - 1], 2)
    if ((abs(mean(testData) - mean(rnorm(testn, prop, 1))) <= tol) & (runif(1) < 
        dnorm(prop, 0, 10)/dnorm(draws[i - 1], 0, 10))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0!

We can increase the tolerance

tol = 0.2
for (i in 2:n) {
    prop <- rnorm(1, draws[i - 1], 2)
    if ((abs(mean(testData) - mean(rnorm(testn, prop, 1))) <= tol) & (runif(1) < 
        dnorm(prop, 0, 10)/dnorm(draws[i - 1], 0, 10))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.0779.

We can increase the tolerance more

tol = 0.5
for (i in 2:n) {
    prop <- rnorm(1, draws[i - 1], 2)
    if ((abs(mean(testData) - mean(rnorm(testn, prop, 1))) <= tol) & (runif(1) < 
        dnorm(prop, 0, 10)/dnorm(draws[i - 1], 0, 10))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.2511.

... and more

tol = 1
for (i in 2:n) {
    prop <- rnorm(1, draws[i - 1], 2)
    if ((abs(mean(testData) - mean(rnorm(testn, prop, 1))) <= tol) & (runif(1) < 
        dnorm(prop, 0, 10)/dnorm(draws[i - 1], 0, 10))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.5127.

... and maybe too much

tol = 3
for (i in 2:n) {
    prop <- rnorm(1, draws[i - 1], 2)
    if ((abs(mean(testData) - mean(rnorm(testn, prop, 1))) <= tol) & (runif(1) < 
        dnorm(prop, 0, 10)/dnorm(draws[i - 1], 0, 10))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.8563.

Definitely too much....

tol = 10
for (i in 2:n) {
    prop <- rnorm(1, draws[i - 1], 2)
    if ((abs(mean(testData) - mean(rnorm(testn, prop, 1))) <= tol) & (runif(1) < 
        dnorm(prop, 0, 10)/dnorm(draws[i - 1], 0, 10))) {
        draws[i] <- prop
        accept[i] <- 1
    } else {
        draws[i] = draws[i - 1]
    }
}

The acceptance rate is: 0.9858.

A comparison of the sampling distributions

plot of chunk unnamed-chunk-15

Extensions of these algorithms have been made (Marin 2011)

  • Incorporating the acceptance tolerance as a model parameter and use it to assess the model fit (Ratmann 2009)
  • Methods of importance sampling and sequential MC have been developed (Sisson 2007,Beaumont 2009)
  • Methods for using even the samples where the simulated data is far from the observed data through appropriate weighting (Beaumont 2002)
  • Using the acceptance probability in ABC for model selection (Marin,Robert 2010)
  • Fruitful future research paths also discussed in (Marin 2011)

References

Markov Chain Monte Carlo without likelihoods

  • Marjoram, Molitor, Plagnol, and Tavare

Approximate Bayesian Computational Methods

  • Marin, Pudlo, Robert, and Ryder

Likelihood-free Markov Chain Monte Carlo\( ^* \)

  • Sisson and Fan

\( ^* \) on arXiv