Multilevel models

This code and explanation is taken from the Jim Albert book (Bayesian Computation with R), in particular chapter 8.

library(LearnBayes)
library(lattice)

A simple Bayesian model of heart transplant mortality (from chapter 3)

This example considers the issue of learning the rate of mortality from heart transplant surgery; it is not a hierarchical model. For a particular hospital we observe the number of transplant surgeries \(n\) and the number of deaths within 30 days of surgery \(y\). We can predict the probability of death for an individual patient. The prediction is based on a model that uses information such as the patient's medical condition before surgery, gender and race. Based on these predicted probabilities, one can obtain an expected number of deaths, denoted by \(e\). A standard model assumes that the number of deaths \(y\) follows a Poisson distribution with mean \(e \lambda\), and the objective is to estimate the mortality rate per unit exposure \(\lambda\).

The standard estimate of \(\lambda\) is the maximum likelihood estimate \(\hat{\lambda} = y/e\). This estimate is likely to be poor when the number of deaths \(y\) is close to zero. In this situation it is useful to have a Bayesian estimate that uses prior knowledge about the size of the mortality rate. A good choice for a prior distribution is a member of the gamma(\(\alpha,\beta\)) family, with density of the form

\(p(\lambda) \propto \lambda^{\alpha-1} \exp(-\beta \lambda), \quad \lambda > 0.\)

A convenient source of prior information is heart transplant data from a small group of hospitals that we believe have the same rate of mortality as the hospital of interest. We observe the number of deaths \(Z_{j}\) and the exposure \(O_{j}\) for ten hospitals \((j=1,..., 10)\), where \(Z_j\) is Poisson with mean \(O_j \lambda\). If we assign \(\lambda\) the noninformative prior \(p(\lambda) \propto \lambda^{-1}\), then the updated distribution for \(\lambda\) given the data from the ten hospitals is:

\(p(\lambda) \propto \lambda^{\sum^{10}_{j=1} Z_{j}-1} \exp\left(-\sum_{j = 1}^{10} O_j \lambda\right)\)

In the example code below we assign a gamma(\(\alpha,\beta\)) prior with \((\alpha,\beta) = (16, 15174)\).

If the number of observed deaths from surgery \(y_{obs}\) for a given hospital with exposure \(e\) is Poisson\((e \lambda)\), and \(\lambda\) is assigned the gamma\((\alpha,\beta)\) prior, then the posterior distribution will also have the gamma form, with parameters \(\alpha + y_{obs}\) and \(\beta + e\). The (prior) predictive density of \(y\), before any data are observed, can be computed using:

\(f(y) = \frac{f(y|\lambda) g(\lambda)}{g(\lambda|y)},\)

where \(f(y|\lambda)\) is the Poisson\((e\lambda)\) sampling density, \(g(\lambda)\) is the prior density of \(\lambda\), and \(g(\lambda|y)\) is the posterior density of \(\lambda\).

Ok, so now we can infer the heart transplant death rate for two hospitals: one that has performed a small number of surgeries and a second that has performed many. Hospital A has one death \((y_{obs}=1)\) with an exposure of \(e=66\). First we define alpha and beta for the gamma prior and the exposure ex. The predictive density at the values \(y=0,1,...,10\) is found by using the above formula with the functions \(dpois\) and \(dgamma\). This is shown in the code snippet below:

alpha = 16; beta = 15174
yobs = 1; ex = 66
y = 0:10
lam = alpha/beta
py = dpois(y, lam * ex) * dgamma(lam, shape = alpha,
  rate = beta) / dgamma(lam, shape = alpha + y,
  rate = beta + ex)
cbind(y, round(py, 3))
##        y      
##  [1,]  0 0.933
##  [2,]  1 0.065
##  [3,]  2 0.002
##  [4,]  3 0.000
##  [5,]  4 0.000
##  [6,]  5 0.000
##  [7,]  6 0.000
##  [8,]  7 0.000
##  [9,]  8 0.000
## [10,]  9 0.000
## [11,] 10 0.000

The posterior density of \(\lambda\) can be summarised by simulating 1000 values from the gamma posterior density.

 lambdaA = rgamma(1000, shape = alpha + yobs, rate = beta + ex)
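
The draws can then be summarised with, for example, the posterior mean and a probability interval; a minimal sketch using the lambdaA draws above:

 mean(lambdaA)                     # posterior mean of hospital A's rate
 quantile(lambdaA, c(0.05, 0.95))  # 90% posterior interval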

Now consider the estimation for a different hospital that has performed many surgeries. Hospital B had \(y_{obs} = 4\) deaths with an exposure of \(e = 1767\). We again compute the prior predictive density and simulate 1000 draws from the posterior density using the \(rgamma\) function. The number of deaths is consistent with the model, since \(y_{obs}=4\) is not in an extreme tail of the predictive distribution.

 ex = 1767; yobs=4
 y = 0:10
 py = dpois(y, lam * ex) * dgamma(lam, shape = alpha, 
     rate = beta)/dgamma(lam, shape = alpha + y,
     rate = beta + ex)
 cbind(y, round(py, 3))
##        y      
##  [1,]  0 0.172
##  [2,]  1 0.286
##  [3,]  2 0.254
##  [4,]  3 0.159
##  [5,]  4 0.079
##  [6,]  5 0.033
##  [7,]  6 0.012
##  [8,]  7 0.004
##  [9,]  8 0.001
## [10,]  9 0.000
## [11,] 10 0.000
 lambdaB = rgamma(1000, shape = alpha + yobs, rate = beta + ex)

To see the impact of the prior density on the inference, it is helpful to display the prior and posterior distributions on the same graph. The next bit of code does this using the simulated draws from the posterior distributions of the rates for hospitals A and B; the gamma prior density is also shown. For Hospital A, with little experience of surgeries, the prior information is significant and the posterior distribution resembles the prior distribution. For Hospital B, with many surgeries, the prior information is less influential and the posterior distribution resembles the likelihood function.

 par(mfrow = c(2, 1))
 plot(density(lambdaA), main="HOSPITAL A", xlab="lambdaA", lwd=3)
 curve(dgamma(x, shape = alpha, rate = beta), add=TRUE)
 legend("topright",legend=c("prior","posterior"),lwd=c(1,3))
 plot(density(lambdaB), main="HOSPITAL B", xlab="lambdaB", lwd=3)
 curve(dgamma(x, shape = alpha, rate = beta), add=TRUE)
 legend("topright",legend=c("prior","posterior"),lwd=c(1,3))

Individual and combined estimates

Recall that we have data from 94 hospitals recording the number of deaths within 30 days of a heart transplant. We can plot the ratios \(\{y_i/e_i\}\) of the number of deaths per unit of exposure against the logs of the exposures \(\{\log(e_i)\}\) for all hospitals, where each data point is labelled by the number of observed deaths \(y_i\).

data(hearttransplants)
rm(y)    # remove the y = 0:10 vector defined earlier so it does not mask the data's y
attach(hearttransplants)
head(hearttransplants)
##      e y
## 1  532 0
## 2  584 0
## 3  672 2
## 4  722 1
## 5  904 1
## 6 1236 0
options(warn=-1)

plot(log(e), y/e, xlim=c(6,9.7), xlab="log(e)", ylab="y/e", 
main="death rates against log exposure for all hospitals")
text(log(e),y/e,labels=as.character(y),pos=4)

The estimated rates are highly variable, especially for hospitals with small exposures (numbers of operations). Hospitals with no deaths are mainly associated with small exposures. If we wish to estimate the true mortality rate for every hospital, we could simply use the individual death rates. However, these individual rates can be poor estimates, especially for hospitals with small exposures. Some hospitals did not experience any deaths, so the individual death rate \(y_i/e_i=0\) would underestimate the hospital's true mortality rate; in any case, these small-exposure estimates have high variance.

Since the individual death rates can be poor, it seems desirable to combine the individual estimates in some way to obtain improved estimates. Suppose we can assume the true mortality rates are equal across hospitals:

\(\lambda_1 = ... = \lambda_{94}\)

This is an “equal-means” Poisson model and the estimate of the mortality rate for the \(i\)th hospital would be a pooled estimate:

\(\frac{\sum^{94}_{j=1} y_j}{\sum^{94}_{j=1} e_j}\)

The above pooled estimate is based on the strong assumption that the true mortality rate is the same across hospitals. This is unlikely, since we should expect some variation between hospitals.
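
For these data the pooled estimate is easy to compute directly; using the component sums shown in a later section, it works out to about 0.00094 deaths per unit exposure:

# pooled estimate: total deaths over total exposure across the 94 hospitals
sum(hearttransplants$y) / sum(hearttransplants$e)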

So far we have discussed two possible estimates for the mortality rate of the \(i\)th hospital: the individual estimate \(y_i/e_i\) and the pooled estimate \(\sum y_j/ \sum e_j\). A third possibility is the compromise estimate:

\((1-\lambda) \frac{y_i}{e_i} + \lambda \frac{\sum^{94}_{j=1} y_j}{\sum^{94}_{j=1} e_j}\)

This estimate shrinks or moves the individual estimate \(y_i/e_i\) towards the pooled estimate \(\sum y_j/ \sum e_j\), where the parameter \(0 < \lambda < 1\) determines the size of the shrinkage. This shrinkage estimate is a natural by-product of applying an exchangeable prior model to the true mortality rates.
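
As a small numerical illustration, here is the compromise estimate for the first hospital with an arbitrarily chosen shrinkage fraction of 0.5 (the exchangeable model fitted later effectively produces its own shrinkage fractions):

# compromise estimate with an assumed, arbitrary shrinkage fraction
pooled = sum(hearttransplants$y) / sum(hearttransplants$e)
individual = hearttransplants$y[1] / hearttransplants$e[1]
shrink.frac = 0.5
(1 - shrink.frac) * individual + shrink.frac * pooled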

Equal Mortality Rates?

Before we consider an exchangeable model, we first check whether the simpler “equal-rates” model \(\lambda_1 = ... = \lambda_{94}\) is consistent with these data.

data(hearttransplants)
attach(hearttransplants)
sum(hearttransplants$y)
## [1] 277
sum(hearttransplants$e)
## [1] 294681

For our data, the posterior density of the common rate \(\lambda\) is gamma(277, 294681). To simulate from the predictive distribution of \(y_{94}\), we first simulate 1000 draws from the posterior density of \(\lambda\) and then simulate draws of \(y_{94}\) from a Poisson distribution with mean \(e_{94} \lambda\).

 lambda <- rgamma(1000,shape=277,rate=294681)
 ys94<- rpois(1000,e[94]*lambda)

The histogram displays the posterior predictive distribution, and the actual number of transplant deaths \(y_{94}\) is indicated by a vertical line.

 hist(ys94,breaks=seq(0.5,max(ys94)+0.5))
 lines(c(y[94],y[94]),c(0,120),lwd=4)

The observed \(y_{94}\) is in the tail portion of the distribution, so it seems inconsistent with the fitted model; it suggests that this hospital actually has a higher true mortality rate than the equal-rates model estimates.

The next section of code computes the probability of “at least as extreme” for every observation and places the probabilities in the vector \(pout\). We need a short inline function \(prob.out\) that calculates this probability for a single subscript, and we then use \(sapply\) to apply it to all indices.

lambda=rgamma(1000,shape=277,rate=294681)
prob.out=function(i)
{
   ysi=rpois(1000,e[i]*lambda)
   pleft=sum(ysi<=y[i])/1000
   pright=sum(ysi>=y[i])/1000
   min(pleft,pright)
}

Now plot the probabilities against the log exposures.

pout=sapply(1:94,prob.out)
plot(log(e),pout,ylab="Prob(extreme)")

A number of the tail probabilities appear rather small (15 are smaller than 0.10), which implies the “equal-rates” model is NOT adequate for explaining the distribution of mortality rates across the 94 hospitals; we must assume differences exist between the true mortality rates. We will use an exchangeable model next.
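
A quick check of that count from the simulated pout vector (the exact number will vary from run to run):

sum(pout < 0.10)   # number of hospitals with tail probability below 0.10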

Modeling a Prior Belief of Exchangeability

In the exchangeable model, the true rates \(\lambda_1,...,\lambda_{94}\) are an independent sample from a gamma(\(\alpha, \alpha/\mu\)) distribution, and the mean rate \(\mu\) is assigned an inverse gamma(\(a, b\)) prior; integrating \(\mu\) out gives the joint prior on the rates. The function \(pgexchprior\) computes this log prior density for two rates; it receives as arguments the vector of true rates \(lambda\) and a vector \(pars\) containing the prior parameters \((\alpha_0, a, b)\).

pgexchprior = function(lambda, pars)
{
  alpha = pars[1]; a = pars[2]; b = pars[3]
  (alpha - 1) * log(prod(lambda)) - (2 * alpha + a) * log(alpha * sum(lambda) + b)
}

We assign \(\mu\) an inverse gamma(10,10) distribution (\(a=10, b=10\)). We plot contour plots of the joint density of \((\lambda_1,\lambda_2)\) for values of \(\alpha_0\) equal to 5, 20, 80, and 400.

alpha=c(5,20,80,400); par(mfrow=c(2,2))
for (j in 1:4)
    mycontour(pgexchprior,c(.001,5,.001,5),c(alpha[j],10,10),
      main=paste("ALPHA = ",alpha[j]),xlab="LAMBDA 1",ylab="LAMBDA 2")

Since \(\mu\) is assigned an inverse gamma(10,10) distribution, both of the true rates \(\lambda_1\) and \(\lambda_2\) are centered on the value 1. The hyperparameter \(\alpha\) is a precision parameter that controls the correlation between the parameters. For the fixed value \(\alpha=400\), note that \(\lambda_1\) and \(\lambda_2\) are concentrated along the line \(\lambda_1=\lambda_2\). As the precision parameter \(\alpha\) approaches infinity, the exchangeable prior places all of its mass along the space where \(\lambda_1=...=\lambda_{94}\).

Although we used subjective priors here to illustrate the behavior of the prior distribution, in practice vague distributions can be chosen for the hyperparameters \(\mu\) and \(\alpha\).

Simulating from the Posterior

In the last section, the posterior density of all the parameters was expressed as

\(g(\text{hyperparameters}|\text{data}) \times g(\text{true rates}|\text{hyperparameters}, \text{data}),\)

where the hyperparameters are \((\mu,\alpha)\) and the true rates are \((\lambda_1,...,\lambda_{94})\). By the composition method, we can simulate a random draw from the joint posterior in two steps: first simulate the hyperparameters from their marginal posterior, and then simulate the true rates from their conditional posterior given the simulated hyperparameters.

The first step is to simulate from the marginal posterior density of the hyperparameters \(\mu\) and \(\alpha\). Since both parameters are positive, a good first step is to transform each to a real-valued parameter: \(\theta_1 = \log(\alpha)\) and \(\theta_2 = \log(\mu)\). The function \(poissgamexch\) contains the definition of the log posterior of \(\theta_1\) and \(\theta_2\). It expects two inputs: \(theta\), a vector corresponding to a value of \((\theta_1,\theta_2)\), and \(datapar\), an R list with two components: the data and the value of the hyperparameter \(z0\).

datapar = list(data = hearttransplants, z0 = 0.53)
start = c(2, -7)
fit = laplace(poissgamexch, start, datapar)
fit
## $mode
## [1]  1.883954 -6.955446
## 
## $var
##              [,1]         [,2]
## [1,]  0.233694921 -0.003086655
## [2,] -0.003086655  0.005866020
## 
## $int
## [1] -2208.503
## 
## $converge
## [1] TRUE

The above output gives information about the location of the posterior density. By trial and error, we use the \(mycontour\) function to find a grid that contains the posterior density of \((\theta_1,\theta_2)\).

 par(mfrow = c(1, 1))
 mycontour(poissgamexch, c(0, 8, -7.3, -6.6), datapar,
   xlab="log alpha",ylab="log mu")

Examining the contour plot, we can see that the posterior density of \((\theta_1,\theta_2)\) is nonnormal in shape, especially in the direction of \(\theta_1=\log(\alpha)\). Since the normal approximation to the posterior is inadequate, we obtain a simulated sample of \((\theta_1,\theta_2)\) by using the “Metropolis within Gibbs” algorithm from the function \(gibbs\). We start at the value \((\theta_1,\theta_2)=(4,-7)\) and iterate through 1000 cycles with the Metropolis scale parameters \(c_1=1, c_2=.15\); the output shows that the acceptance rates for the two conditional distributions are each about 50%.

 start = c(4, -7)
 fitgibbs = gibbs(poissgamexch, start, 1000, c(1,.15), datapar)
 fitgibbs$accept
##       [,1]  [,2]
## [1,] 0.508 0.492

The next contour plot is overlaid with the simulated draws.

 #par(mfrow = c(1, 1))
 mycontour(poissgamexch, c(0, 8, -7.3, -6.6), datapar,
   xlab="log alpha",ylab="log mu")
points(fitgibbs$par[,1],fitgibbs$par[,2])

The next plot shows the kernel density estimate of the marginal posterior distribution of the precision parameter \(\theta_1=\log(\alpha)\).

plot(density(fitgibbs$par[,1], bw=0.2),
  main="kernel density estimate of marginal posterior distribution")

This is the second step of the composition method: transforming the simulated \((\theta_1,\theta_2)\) draws back to the original scale gives draws of \(\alpha\) and \(\mu\), and the true rate for hospital \(i\) can then be simulated from its conditional gamma\((y_i+\alpha,\ e_i+\alpha/\mu)\) posterior.

alpha = exp(fitgibbs$par[, 1])
mu = exp(fitgibbs$par[, 2])
#lam1 <- rgamma(1000, y[1] + alpha, e[1] + alpha/mu)
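
A minimal sketch of this second step for hospital 1, running the commented line above and summarising the draws:

lam1 <- rgamma(1000, y[1] + alpha, e[1] + alpha/mu)  # conditional posterior draws
quantile(lam1, c(0.05, 0.5, 0.95))                   # median and 90% interval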

Comparing hospitals

 data(hearttransplants)
 #attach(hearttransplants)

 datapar <- list(data = hearttransplants, z0 = 0.53)
 start <- c(2, -7)
 fit <- laplace(poissgamexch, start, datapar)
 fit
## $mode
## [1]  1.883954 -6.955446
## 
## $var
##              [,1]         [,2]
## [1,]  0.233694921 -0.003086655
## [2,] -0.003086655  0.005866020
## 
## $int
## [1] -2208.503
## 
## $converge
## [1] TRUE
 par(mfrow = c(1, 1))
 mycontour(poissgamexch, c(0, 8, -7.3, -6.6), datapar, xlab="log alpha",ylab="log mu")

 start <- c(4, -7)
 fitgibbs <- gibbs(poissgamexch, start, 1000, c(1,.15), datapar)

 alpha <- exp(fitgibbs$par[, 1])
 mu <- exp(fitgibbs$par[, 2])

 shrink <- function(i) mean(alpha/(alpha + e[i] * mu))
 shrinkage <- sapply(1:94, shrink)

 plot(log(e), shrinkage)

The plot shows the posterior shrinkages \(\alpha/(\alpha + e_i \mu)\) against the log exposures for the heart transplant data.

I couldn't get this to run correctly using R Markdown, so it is in plain text format.

We want to find the hospital with the smallest estimated mortality rate. This is hospital number 85 in our list.

mrate<- function(i) mean(rgamma(10000, y[i] + alpha, e[i] + alpha/mu))

hospital <- 1:94

meanrate <-sapply(hospital,mrate)

hospital[meanrate==min(meanrate)]
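
Beyond picking out the minimum, the same simulated draws can be used to compare two hospitals directly. A sketch (the helper compare is hypothetical, and assumes alpha, mu and the attached y, e from above) estimates the posterior probability that hospital i has a smaller true rate than hospital j:

compare <- function(i, j) {
  lami <- rgamma(1000, y[i] + alpha, e[i] + alpha/mu)  # draws for hospital i
  lamj <- rgamma(1000, y[j] + alpha, e[j] + alpha/mu)  # draws for hospital j
  mean(lami < lamj)   # proportion of draws with rate i below rate j
}

compare(85, 1)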