Monte Carlo methods use random numbers to generate samples from a probability distribution P(x) and to evaluate expectations under this distribution, which is often complex and cannot be handled by exact methods. In Bayesian inference, P(x) is usually the joint posterior distribution defined over a set of random variables. However, obtaining independent samples from this distribution can be difficult, especially as the dimensionality of the sampled space grows. Hence, we resort to more sophisticated Monte Carlo methods that simplify this problem, e.g. importance sampling, rejection sampling, Gibbs sampling, and Metropolis-Hastings sampling. These methods usually involve sampling from a proposal density Q(x) in lieu of P(x).

In importance sampling, we generate samples from Q(x) and introduce weights to account for sampling from the wrong distribution; each point is then weighted by its importance in the estimator we need to evaluate. In rejection sampling, we sample a point x from the proposal distribution Q(x) and compute the ratio P(x)/(cQ(x)), where c is a constant chosen so that \(cQ(x) \ge P(x)\) everywhere. We then draw a random number u from a U(0,1) distribution; if \(u < P(x)/(cQ(x))\), we accept the point x, else we reject it and sample another point from Q(x) (a minimal sketch is given below). Gibbs sampling is a method for sampling from distributions over two or more dimensions. Here, the proposal distributions Q(x) are the full conditional distributions of the joint distribution P(x): we simulate posterior samples from P(x) by iteratively sampling each variable from its conditional posterior while keeping the other variables fixed at their current values.
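
The following is a minimal rejection-sampling sketch in R (purely illustrative; the target, proposal, and envelope constant are chosen for this example and are not part of the exercise). The target P is a Beta(2, 5) density, the proposal Q is U(0,1), and c = 2.5 bounds the target density from above.

set.seed(1)
n <- 5000
x <- runif(n)                          # candidates from Q = U(0, 1), so Q(x) = 1
u <- runif(n)
accept <- u < dbeta(x, 2, 5) / 2.5     # c = 2.5 exceeds the maximum of the Beta(2, 5) density (about 2.46)
samples <- x[accept]
mean(samples)                          # close to the Beta(2, 5) mean, 2/7 = 0.286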

Importance sampling and rejection sampling require Q(x) to be similar to P(x), and such densities are difficult to construct for high-dimensional problems; Gibbs sampling, in turn, is difficult to apply when the conditional posteriors do not have a known form. These requirements can be relaxed with the more general Metropolis-Hastings algorithm, in which candidate samples are accepted or rejected probabilistically. The algorithm accommodates both symmetric and asymmetric proposal distributions and can be described as follows (a minimal R sketch appears after the steps):

  1. Initialize \(x^{(0)}\) (e.g. by drawing it from \(Q\))
  2. Do \(i=1:n\)
    1. Draw \(x^{new} \sim Q(x^{new}|x^{(i-1)})\)
    2. Compute \(a(x^{new},x^{(i-1)})=\min\left(1,\dfrac{P(x^{new})\,Q(x^{(i-1)}|x^{new})}{P(x^{(i-1)})\,Q(x^{new}|x^{(i-1)})}\right)\)
    3. Draw \(u\) from \(U(0,1)\)
    4. If \(u \le a(x^{new},x^{(i-1)})\), set \(x^{(i)} = x^{new}\)
    5. Else set \(x^{(i)} = x^{(i-1)}\)
  3. EndDo
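
Below is a minimal sketch of these steps in R (purely illustrative; the target and proposal are chosen for this example and are not part of the exercise): a random-walk Metropolis sampler for a standard normal target, with a symmetric \(N(x^{(i-1)}, 0.5^2)\) proposal so that the \(Q\) terms cancel in the acceptance ratio.

set.seed(1)
n <- 5000
chain <- numeric(n)
chain[1] <- 0                                          # x^(0)
for (i in 2:n) {
  x.new <- rnorm(1, mean = chain[i - 1], sd = 0.5)     # draw from Q(.|x^(i-1))
  a <- min(1, dnorm(x.new) / dnorm(chain[i - 1]))      # acceptance probability; symmetric Q cancels
  chain[i] <- if (runif(1) <= a) x.new else chain[i - 1]
}
c(mean(chain), sd(chain))                              # should be roughly 0 and 1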

Gibbs sampling is a special case of the Metropolis-Hastings algorithm in which the proposal (the full conditional) is always accepted, i.e. the Metropolis-Hastings ratio is always 1.

In this exercise, we apply the Metropolis-Hastings algorithm to estimate the variance component of the regression coefficients in a standard G-BLUP model.

For the G-BLUP model,

\[ y_{n\times1} = X_{n\times p}b_{p\times 1} + \varepsilon_{n\times 1}\] where \(y=\{y_i \}_{i=1}^n\) and \(X=\{X_{ij} \}_{i=1,j=1}^{n,p}\) represent the vector of phenotypes and the matrix of genotypes, \(b=\{b_j\}_{j=1}^p\) is the vector of marker effects, assumed normally distributed with mean 0 and variance \(\sigma^2_b\), and \(\varepsilon=\{\varepsilon_i \}_{i=1}^n\) is the vector of model residuals, normally distributed with mean 0 and variance \(\sigma^2_\varepsilon\).
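
As a concrete (hypothetical) illustration of this model, the following R lines simulate phenotypes under the G-BLUP assumptions; the dimensions, genotype coding, and variances are arbitrary choices for the example.

set.seed(1)
n <- 100; p <- 50
X <- matrix(sample(0:2, n * p, replace = TRUE), n, p)   # genotypes coded 0/1/2
b <- rnorm(p, 0, sqrt(0.6))                             # marker effects with variance sigma^2_b = 0.6
e <- rnorm(n, 0, sqrt(1.0))                             # residuals with variance sigma^2_e = 1
y <- X %*% b + e                                        # phenotypes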

The conditional posterior density for \(\sigma^2_b\) given the remaining parameters is given by: \[P(\sigma^2_b|b,\sigma^2_\varepsilon) \propto N(b|0,I\sigma^2_b)\, \chi^{-2}(\sigma^2_b|df_0, S) \] \[= \big(\dfrac{1}{\sqrt{2\pi}\sigma_b}\big)^p e^{-\dfrac{\sum_{j=1}^p b_j^2}{2\sigma^2_b}} \times \dfrac{(df_0\, S/2)^{df_0/2}}{\Gamma(\frac{df_0}{2})}\big(\dfrac{1}{\sigma^2_b}\big)^{\frac{df_0}{2}+1} e^{-\dfrac{df_0 \times S}{2\sigma^2_b}} \] \[\propto \big(\dfrac{1}{\sigma^2_b}\big)^{\frac{p+df_0}{2}+1} e^{-\dfrac{\sum_{j=1}^p b_j^2+(df_0\times S)}{2\sigma^2_b}}\]

This is a scaled inverse chi-squared distribution, \[P(\sigma^2_b|b,\sigma^2_\varepsilon)\propto \chi^{-2}\big(p+df_0, \dfrac{b'b+df_0\times S}{p+df_0}\big) \]
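
In R, a draw from this scaled inverse chi-squared conditional can be obtained as \(b'b + df_0\times S\) divided by a chi-squared deviate with \(p+df_0\) degrees of freedom; this is the same construction used in the samplers below. The parameter values here are arbitrary placeholders.

p <- 50; df0 <- 4; S <- 0.5
b <- rnorm(p, 0, sqrt(0.6))                              # placeholder marker effects
draw <- (sum(b^2) + df0 * S) / rchisq(1, df = p + df0)   # one draw from the conditional posterior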

Suppose we want the prior for \(\sigma^2_b\) to be as uninformative as possible. One option is to set \(df_0=-2\) and \(S_0=df_0\times S=0\) and use Gibbs sampling to estimate \(\sigma^2_b\); however, setting \(S_0=0\) can cause the algorithm to get stuck at 0. Hence we need a flexible alternative to the scaled inverse chi-squared prior, and for this purpose we propose to use the beta distribution. Since the resulting conditional posterior no longer has a standard form that can be sampled from directly, the Metropolis-Hastings algorithm is a good choice for obtaining posterior samples of \(\sigma^2_b\).
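
To get a feel for the flexibility of this prior, the following R lines plot the beta densities used later in the exercise (shapes 1.3/1.3 give a nearly flat prior, while 50/5 and 5/50 are strongly skewed toward 1 and 0, respectively).

curve(dbeta(x, 1.3, 1.3), 0, 1, ylab = "prior density")   # nearly flat
curve(dbeta(x, 50, 5), 0, 1, add = TRUE, col = "red")     # skewed toward 1
curve(dbeta(x, 5, 50), 0, 1, add = TRUE, col = "blue")    # skewed toward 0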

Here we treat the conditional posterior under the conjugate prior, \(P(\sigma^2_b|b,\sigma^2_\varepsilon)\), as our proposal distribution \(Q\); note that this proposal does not depend on the current value of the chain (an independence proposal). Thus, \[ Q(\sigma^2_b)= \chi^{-2}\big(p+df_0, \dfrac{b'b+S_0}{p+df_0}\big) \] \[ \log Q(\sigma^2_b)\propto -\big(\frac{p+df_0}{2}+1\big)\log(\sigma^2_b)-\dfrac{b'b+S_0}{2\sigma^2_b} \]

Our target distribution is the product of the normal likelihood of \(b|\sigma^2_b\) and the beta prior for \(\sigma^2_b\). Since the beta distribution has support on (0,1), we place the prior on \(\frac{\sigma^2_b}{MAX}\), where MAX is a constant chosen to be larger than any plausible value of \(\sigma^2_b\), so that \(0<\frac{\sigma^2_b}{MAX}<1\). Thus, \[ \log P(\sigma^2_b) \propto -\frac{p}{2}\log(\sigma^2_b)-\frac{b'b}{2\sigma^2_b}+(\alpha_1-1)\log\big(\frac{\sigma^2_b}{MAX}\big) +(\alpha_2-1)\log\big(1-\frac{\sigma^2_b}{MAX}\big)\] where \(\alpha_1\) and \(\alpha_2\) are the shape parameters of the beta distribution, whose mean is \(\frac{\alpha_1}{\alpha_1+\alpha_2}\).
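
The two log densities above can be written as short R functions; this is only a sketch with illustrative argument names, but the expressions are the same ones evaluated inside the mh() function given below.

# log proposal density, up to an additive constant
logQ <- function(s2, SS, S0, DF) -((DF / 2) + 1) * log(s2) - (SS + S0) / (2 * s2)
# log target density (normal likelihood x beta prior), up to an additive constant
logP <- function(s2, SS, p, shape1, shape2, MAX) {
  -(p / 2) * log(s2) - SS / (2 * s2) +
    (shape1 - 1) * log(s2 / MAX) + (shape2 - 1) * log(1 - s2 / MAX)
}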

We follow the algorithm steps outlined above and calculate the acceptance ratio as follows:

\[ \log a\big({\sigma^2_b}^{(new)},{\sigma^2_b}^{(old)}\big)=\min\Big\{0,\ \log P\big({\sigma^2_b}^{(new)}\big)-\log P\big({\sigma^2_b}^{(old)}\big)+\log Q\big({\sigma^2_b}^{(old)}\big)-\log Q\big({\sigma^2_b}^{(new)}\big)\Big\} \]

We then draw a random number \(u\) from a \(U(0,1)\) distribution and accept the candidate \({\sigma^2_b}^{(new)}\) if \(\log(u)<\log a\big({\sigma^2_b}^{(new)},{\sigma^2_b}^{(old)}\big)\); otherwise we reject it and retain the current value. We iterate this step until convergence; this acceptance rule is implemented in the mh() function below.
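
A single acceptance step on the log scale looks as follows (sketch only; the log-density values here are hypothetical placeholders, whereas in the sampler below they come from the logQ and logP expressions above):

logP.new <- -12.3; logP.old <- -12.9          # hypothetical log target values
logQ.new <- -10.1; logQ.old <- -10.4          # hypothetical log proposal values
log.a <- min(0, logP.new - logP.old + logQ.old - logQ.new)
accept <- log(runif(1)) < log.a               # TRUE: move to the candidate; FALSE: keep the current value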


Metropolis-Hastings algorithm

# Metropolis-Hastings sampler for the marker-effect variance sigma^2_b.
#   p              : number of marker effects
#   nIter          : number of MH iterations
#   varb           : true variance used to simulate the marker effects b
#   df0, S         : degrees of freedom and scale of the conjugate (proposal) prior
#   shape1, shape2 : shape parameters of the beta prior on sigma^2_b / MAX
#   MAX            : upper bound ensuring 0 < sigma^2_b / MAX < 1
mh = function(p, nIter, varb, df0 = -2, S = 0, shape1, shape2, MAX)
{
  set.seed(100)
  S0 = df0 * S
  DF = p + df0

  # simulate marker effects and their sum of squares (held fixed throughout)
  b  = rnorm(p, 0, sqrt(varb))
  SS = sum(b^2)
  # initial value: one draw from the scaled inverse chi-squared proposal
  x   = (SS + S0) / rchisq(df = DF, n = 1)
  MLE = SS / p

  chain = logp.new = logp.old = logq.new = logq.old = y = AP = numeric(nIter + 1)
  chain[1] = x

  for (i in 1:nIter) {
    # independence proposal: scaled inverse chi-squared draw
    y[i] <- (SS + S0) / rchisq(df = DF, n = 1)

    # log proposal and log target (normal likelihood x beta prior), up to constants,
    # evaluated at the current value ...
    logq.old[i] = -((DF/2) + 1) * log(chain[i]) - (SS + S0) / (2 * chain[i])
    logp.old[i] = -(p/2) * log(chain[i]) - (SS / (2 * chain[i])) + (shape1 - 1) * (log(chain[i] / (MAX))) + (shape2 - 1) * (log(1 - (chain[i] / (MAX))))

    # ... and at the candidate value
    logq.new[i] = -((DF/2) + 1) * log(y[i]) - (SS + S0) / (2 * y[i])
    logp.new[i] = -(p/2) * log(y[i]) - (SS / (2 * y[i])) + (shape1 - 1) * (log(y[i] / (MAX))) + (shape2 - 1) * (log(1 - (y[i] / (MAX))))

    # log acceptance ratio and acceptance probability
    A     = logp.new[i] - logp.old[i] - logq.new[i] + logq.old[i]
    AP[i] = min(1, exp(A))

    # accept the candidate with probability AP[i], otherwise keep the current value
    chain[i + 1] = ifelse(runif(1) < AP[i], y[i], chain[i])
  }
  cat("MeanAcceptProb=", round(mean(AP), 3), "\n")
  cat("initial value=", round(x, 3), "\n")
  cat("MLE=", round(MLE, 3), "\n")

  out = list(chain, y, logq.old, logq.new, logp.old, logp.new, MLE)

  return(out)
}

Gibbs sampler

# Gibbs-style sampler for sigma^2_b under the conjugate scaled inverse chi-squared
# prior (df0, S). With b held fixed, this reduces to iid draws from the full
# conditional (b'b + S0) / chi^2_{p + df0}.
gs = function(p, nIter, varb, df0 = -2, S = 0)
{
  set.seed(100)
  S0 = df0 * S
  DF = p + df0

  # simulate marker effects and their sum of squares (held fixed throughout)
  b  = rnorm(p, 0, sqrt(varb))
  SS = sum(b^2)

  # allocate exactly nIter slots so no unused trailing 0 is left in the chain
  chain = numeric(nIter)
  for (i in 1:nIter) {
    chain[i] <- (SS + S0) / rchisq(df = DF, n = 1)
  }

  return(chain)
}

Make Plots function

# Plot the MH and Gibbs posterior samples together with the proposal and target
# log-densities. out1 is the list returned by mh(); out2 is the vector returned by gs().
# Relies on the globals `xlim` (plotting range) and `varb` (true variance) set below.
makeplot = function(out1, out2)
{
  chain1    = out1[[1]]   # MH chain
  y1        = out1[[2]]   # candidate draws (not plotted)
  logq.old1 = out1[[3]]   # log proposal density at the current values
  logq.new1 = out1[[4]]   # log proposal density at the candidates (not plotted)
  logp.old1 = out1[[5]]   # log target density at the current values
  logp.new1 = out1[[6]]   # log target density at the candidates (not plotted)
  MLE       = out1[[7]]
  chain2    = out2        # Gibbs samples

  cat("Summary of chain for MH:", "\n")
  print(summary(chain1))

  # posterior densities: MH (black) vs Gibbs (blue), with the true value (red)
  # and the MLE (green) marked as vertical lines
  plot(density(chain1), xlim = xlim, main = "")
  lines(density(chain2), xlim = xlim, main = "", col = "blue")
  abline(v = varb, col = "red", lwd = 3)
  abline(v = MLE, col = "green", lwd = 3)
  cat("Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE", "\n")

  # log proposal and log target densities evaluated along the MH chain
  par(mfrow = c(1, 2))
  plot(chain1, logq.old1, col = "black", xlim = xlim, main = "", ylab = "log(ProposalDistbn)")
  plot(chain1, logp.old1, col = "blue", xlim = xlim, main = "", ylab = "log(TargetDistbn)")
}

Set parameters

#df0=-2; S=0             # defaults used inside mh() and gs()
varb = 0.6               # true variance of the simulated marker effects
sample.small = 50        # small number of markers (p)
sample.large = 500       # large number of markers (p)
shape.flat = 1.3         # beta shape giving a nearly flat prior
shape.skew = c(50,5)     # beta shapes giving a strongly skewed prior
nIter = 1000             # number of MH / Gibbs iterations
MAX = 1                  # upper bound so that sigma^2_b / MAX lies in (0,1)
xlim = c(0,1)            # plotting range

RUN GIBBS SAMPLER

##################
out.gs_1=gs(p=sample.small,nIter=nIter,varb=varb)
out.gs_2=gs(p=sample.large,nIter=nIter,varb=varb)

RUN METROPOLIS HASTINGS UNDER DIFFERENT SCENARIOS

##################

Small sample size, flat prior

out.mh=mh(p=sample.small,nIter=nIter,varb=varb,shape1=shape.flat,shape2=shape.flat, MAX=MAX)
## MeanAcceptProb= 0.99 
## initial value= 0.466 
## MLE= 0.398
makeplot(out.mh, out.gs_1)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2097  0.3728  0.4222  0.4344  0.4874  0.8200

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Small sample size, large shape 1 parameter for beta

out.mh=mh(p=sample.small,nIter=nIter,varb=varb,shape1=shape.skew[1],shape2=shape.skew[2], MAX=MAX)
## MeanAcceptProb= 0.008 
## initial value= 0.466 
## MLE= 0.398
makeplot(out.mh, out.gs_1)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4658  0.8087  0.8200  0.7987  0.8200  0.8200

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Small sample size, large shape 2 parameter for beta

out.mh=mh(p=sample.small,nIter=nIter,varb=varb,shape1=shape.skew[2],shape2=shape.skew[1], MAX=MAX)
## MeanAcceptProb= 0.059 
## initial value= 0.466 
## MLE= 0.398
makeplot(out.mh, out.gs_1)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2097  0.2436  0.2524  0.2698  0.2978  0.4658

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Small sample size, same (large) shape parameters for beta

out.mh=mh(p=sample.small,nIter=nIter,varb=varb,shape1=shape.skew[1],shape2=shape.skew[1], MAX=MAX)
## MeanAcceptProb= 0.445 
## initial value= 0.466 
## MLE= 0.398
makeplot(out.mh, out.gs_1)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3421  0.4544  0.4831  0.4852  0.5199  0.6398

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Large sample size, flat prior

out.mh=mh(p=sample.large,nIter=nIter,varb=varb,shape1=shape.flat,shape2=shape.flat, MAX=MAX)
## MeanAcceptProb= 0.993 
## initial value= 0.669 
## MLE= 0.606
makeplot(out.mh, out.gs_2)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5006  0.5851  0.6084  0.6101  0.6354  0.7731

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Large sample size, large shape 1 parameter for beta

out.mh=mh(p=sample.large,nIter=nIter,varb=varb,shape1=shape.skew[1],shape2=shape.skew[2], MAX=MAX)
## MeanAcceptProb= 0.077 
## initial value= 0.669 
## MLE= 0.606
makeplot(out.mh, out.gs_2)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5690  0.6863  0.7068  0.7109  0.7191  0.7731

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Large sample size, large shape 2 parameter for beta

out.mh=mh(p=sample.large,nIter=nIter,varb=varb,shape1=shape.skew[2],shape2=shape.skew[1], MAX=MAX)
## MeanAcceptProb= 0.057 
## initial value= 0.669 
## MLE= 0.606
makeplot(out.mh, out.gs_2)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5006  0.5111  0.5182  0.5268  0.5429  0.6692

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE

Large sample size, same (large) shape parameters for beta

out.mh=mh(p=sample.large,nIter=nIter,varb=varb,shape1=shape.skew[1],shape2=shape.skew[1], MAX=MAX)
## MeanAcceptProb= 0.398 
## initial value= 0.669 
## MLE= 0.606
makeplot(out.mh, out.gs_2)
## Summary of chain for MH: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5006  0.5495  0.5687  0.5692  0.5908  0.6692

## Legend: BLUE = Gibbs Sampler ; BLACK = Metropolis Hastings ; GREEN = MLE ; RED = TRUE VALUE