A simple approach to maximum intractable likelihood estimation: AMLE

The AMLE algorithm, introduced by [1], consists of using Approximate Bayesian Computation (ABC) methods to approximate the likelihood function in cases where the likelihood is intractable and/or too computationally expensive to evaluate, and maximising this approximated likelihood function. The basic ABC algorithm is shown below:

  1. Simulate \({\bf\theta}^{\prime}\) from the prior distribution \(\pi(\cdot)\).

  2. Generate \({\bf y}\) from the model \(f(\cdot\vert{\bf\theta}^{\prime})\).

  3. Accept \({\bf\theta}^{\prime}\) with probability \(\propto K_{\varepsilon}({\bf x} \vert {\bf y})\) otherwise return to step 1.

where \(K_{\varepsilon}({\bf x}\vert {\bf y})\) is a normalised Markov kernel. The ABC algorithm produces a sample from the following approximation to the posterior distribution: \[ \widehat{\pi}_{\varepsilon}({\bf\theta}\vert{\bf x}) = \dfrac{\widehat{f}_{\varepsilon}({\bf x}\vert{\bf\theta})\pi({\bf\theta})}{\int_{\Theta}\widehat{f}_{\varepsilon}({\bf x}\vert{\bf t})\pi({\bf t})d{\bf t}}, \] where \[ \widehat{f}_{\varepsilon}({\bf x}\vert{\bf\theta}) = \int_{{\mathbb R}^{q\times n}}K_{\varepsilon}({\bf x}\vert {\bf y}) f({\bf y}\vert{\bf\theta}) d{\bf{y}} , \] is an approximation of the likelihood function. Therefore, if we use a uniform prior on a compact set that contains the Maximum Likelihood Estimator (MLE), and the Maximum a Posteriori (MAP, the maximum of the posterior distribution) coincide. Thus, it is possible to approximate the MLE by using ABC methods to generate an approximate sample from the posterior and then approximating the MAP using this sample. This can be done by using the ABC algorithm in the first step of the AMLE algorithm, as shown below:

The AMLE Algorithm

  1. Obtain a sample \({\bf \theta}^*_{\varepsilon}=({\bf \theta}^*_{\varepsilon,1},...,{\bf \theta}^*_{\varepsilon,k})\) from \(\widehat{\pi}_{\varepsilon}({\bf \theta}\vert{\bf x})\).

  2. Using the sample \({\bf \theta}^*_{\varepsilon}\) construct a nonparametric estimator \(\widehat{\pi}_{k,\varepsilon}({\bf \theta}\vert{\bf x})\) of the density \(\widehat{\pi}_{\varepsilon}({\bf \theta}\vert{\bf x})\).

  3. Calculate the maximum of \(\widehat{\pi}_{k,\varepsilon}({\bf \theta}\vert{\bf x})\), \(\tilde{{\bf \theta}}_{\varepsilon}\). This is an approximation of the MLE \(\widehat{{\bf \theta}}\).

Asymptotic properties on the AMLE are studied in [1]. The following R code shows two toy examples (binomial and normal models) that illustrate the calculation of the AMLE. These examples correspond to those presented in Sections 4.1 and 4.2 of [1].

References.

  1. A simple approach to maximum intractable likelihood estimation

Example 1: Binomial model

#########################################################################################################################################
####  Calculates the Approximate Maximum Likelihood Estimator (AMLE) of the parameter $p$ of a Binomial distribution.                          
####  The summary statistic is the sample mean. 
####  The metric employed is the Euclidean metric.   
#########################################################################################################################################


rm(list=ls())

# Simulated data

data = c(6,6,6,8,3,6,5,7,4,7,7,7,4,6,6,6,5,5,7,3,5,4,4,5,6,5,6,4,7,6)

######################################################################################################
# Input variables
######################################################################################################
#--------------------------------------------------------------------------------------------------------
# dat    :   Vector of observations.
# N      :   ABC sample size to be used for the kernel density estimation
# tol    :   Tolerance for the ABC algorithm
# a      :   Left-end point of the support of the uniform prior distribution
# b      :   Right-end point of the support of the uniform prior distribution
#-------------------------------------------------------------------------------------------------------- 


AMLE = function(dat,N,tol,a,b){
  pp = rep(0,N)
  j=1
  while(j<N+1){
    p = runif(1,a,b)
    datABC = rbinom(30,10,p)
    if(abs(mean(datABC)-mean(dat))<tol) {
      pp[j] = p
      j = j+1
    }
  }
  pB = density(pp)
  return(pB$x[which(pB$y==max(pB$y))])
}

# AMLE of the simulated data

AMLE(data,10000,0.1,0.4,0.6)
## [1] 0.5562335

Example 2: Normal model

#########################################################################################################################################
####  Calculates the Approximate Maximum Likelihood Estimator (AMLE) of the parameters $(\mu,\sigma)$ of a Normal distribution.                          
####  The summary statistic is the sample mean and sample variance.
####  The metric employed is the Euclidean metric in two dimensions.
#########################################################################################################################################

rm(list=ls())

# Required packages

library(KernSmooth)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
library(mvtnorm)
library(misc3d)
library(ks)
## Warning: package 'ks' was built under R version 3.3.2
## Warning in rgl.init(initValue, onlyNULL): RGL: unable to open X11 display
## Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
# Simulated data

data = c(-0.14858096,  0.07707150,  0.96922964, -1.43173858, -0.20649986, -0.66617827,
         1.08589636,  1.08440788,  0.34425089,  0.57356652, -0.40346415, -0.96862983,
         -0.80275031,  0.26647296,  2.49183268, -1.10162050, -0.78695620, -1.43644030,
         0.08592752,  2.00935874,  0.93525153,  0.68942103,  1.46106992,  0.98110375,
         1.25898725, -0.71190799,  0.23366910, -0.10465786, -1.08424077,  0.46294997,
         0.99890695,  0.70747561, -2.08774711, -0.95524464, -1.78411916,  1.11029231,
         -0.90350745,  1.11962500,  1.26074829,  1.52763765,  0.07144053,  0.71504412,
         -0.73598439, -0.15623768, -0.32990114, -1.23751435,  0.99929185,  0.25147602,
         0.74972926,  0.50649467,  1.29911930, -1.41630256, -0.51830105, -0.24920388,
         -0.14841598,  1.42094654,  1.27497412, -0.20233274,  0.26353656,  0.48123977,
         -1.11826275, -0.37355127, -0.02052075, -0.66050695,  0.63988185, -0.29413887,
         0.14280533, -0.12139647,  0.35824963, -0.84682013,  0.59795021, -0.11048339,
         -1.27087873, -1.92605219, -0.46258777, -0.45827509,  2.98644450, -1.24410514,
         1.95508486, -1.92738850, -0.16304102, -0.19533231,  0.29192288, -1.06087870,
         -1.54370706,  0.29211985, -0.08889496,  0.62322394, -1.57392493,  0.85927803,
         -0.10532303, -0.50732824, -0.34994640, -1.51477602, -0.58488975,  0.64886611,
         0.01561432, -0.69437539,  0.65501940, -0.61570078)

######################################################################################################
# Input variables
######################################################################################################
#--------------------------------------------------------------------------------------------------------
# dat    :   Vector of observations.
# N      :   ABC sample size to be used for the kernel density estimation
# tol    :   Tolerance for the ABC algorithm
# a1     :   Left-end point of the support of the uniform prior distribution on $\mu$
# b1     :   Right-end point of the support of the uniform prior distribution on $\mu$
# a2     :   Left-end point of the support of the uniform prior distribution on $\sigma$
# b2     :   Right-end point of the support of the uniform prior distribution on $\sigma$
#-------------------------------------------------------------------------------------------------------- 

# Metric

dist = function(x,y) return( sqrt( mean(x-y)^2  + (sqrt(var(x))-sqrt(var(y)))^2  )  )

# AMLE function

AMLE = function(dat,N,tol,a1,b1,a2,b2){
  pp = qq = rep(0,N)
  j=1
  while(j<N+1){
    p = runif(1,a1,b1)
    q = runif(1,a2,b2)
    datABC = rnorm(100,p,q)
    if(dist(datABC,dat)<tol) {
      pp[j] = p  
      qq[j] = q 
      j = j+1 
    }
  }
  pp1 = cbind(pp,qq)
  H.scv=Hscv(pp1)
  den = kde(pp1, H=H.scv)
  ind=which(den$estimate==max(den$estimate), arr.ind=TRUE)
  return(c(den$eval.points[[1]][ind[1]],den$eval.points[[2]][ind[2]]))
}

# AMLE of the simulated data (~5 minutes)

AMLE(data,5000,0.01,-0.1,0.1,0.9,1.1)
## [1] -0.0192377  1.0078863