Prior elicitation is the practice of transforming expert knowledge into a distribution and assigning it proper weight relative to data.

Today

In two groups, complete the exercises below and report back next class. Before then, post your group answers to Canvas/Discussion. Here are the functions and packages you will need:

source('clarkFunctions2026.R')
library(repmis)

How do I use the prior?

So far we have used non-informative prior distributions, specified to be overwhelmed by the data. Is the prior nothing more than a device to transform the likelihood into a distribution of parameters (posterior)? How could I make it communicate my prior belief about the distribution of parameters?

The prior distribution

If I am asked about my belief that it will rain tomorrow, I would not have a hard time coming up with a value for \(p\). For example, if I feel clueless, my answer would be \(p = 1/2\) (maximum ignorance). If I am asked about the prior distribution of values, i.e., \([p]\), that would be more difficult. How much of the prior distribution should be assigned to values less than, say, \(0.1\) or greater than \(0.8\)?
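One way to make the question about \([p]\) concrete is to pick a candidate distribution and ask how much mass it puts in the tails. The sketch below assumes a beta distribution for \(p\) (my choice for illustration, not something specified above) and compares a flat Beta(1, 1) with a Beta(5, 5) that has the same mean of 1/2 but more weight near the center. The helper name tailMass is hypothetical.

```r
# Prior mass that a Beta(a, b) distribution places on p < 0.1 and on p > 0.8
tailMass <- function(a, b) {
  c(below0.1 = pbeta(0.1, a, b),
    above0.8 = pbeta(0.8, a, b, lower.tail = FALSE))
}

flat   <- tailMass(1, 1)   # uniform prior on (0, 1)
peaked <- tailMass(5, 5)   # same mean of 1/2, more weight near 1/2

print(round(rbind(flat, peaked), 4))
```

Both priors say "\(p = 1/2\)" on average, yet they disagree about how plausible extreme values are. That disagreement is the part of the prior that is hard to elicit.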

Prior elicitation has a large literature in Bayesian analysis, covering which experts to consult, how to evaluate their input, and how to turn it into a distribution. Alternatively, an objective prior seeks to let the data dominate.

A prior distribution generally has a central tendency, e.g., the prior mean of the normal distribution, and a prior weight relative to the data. The weight of the data is typically proportional to sample size. Together with the noise in the data, the sample size has a direct effect on the posterior variance, e.g., the standard error of a mean estimate, \(\sigma/\sqrt{n}\). The weight of the prior is inversely proportional to its variance. The posterior mean is a weighted average of the data mean and the prior mean, with weights given by their precisions (inverse variances).
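For a normal mean with known data standard deviation, the weighting can be written out directly. The following is a minimal sketch under that assumption; the function name posteriorNormalMean and the numbers passed to it are hypothetical and chosen only to show the contrast between a vague and a tight prior.

```r
# Posterior for a normal mean with known data sd (sigma) and prior N(m0, v0)
posteriorNormalMean <- function(ybar, n, sigma, m0, v0) {
  wData    <- n / sigma^2      # data precision: grows with sample size
  wPrior   <- 1 / v0           # prior precision: inverse of prior variance
  postVar  <- 1 / (wData + wPrior)
  postMean <- postVar * (wData * ybar + wPrior * m0)
  c(mean = postMean, var = postVar, dataWeight = wData * postVar)
}

vague <- posteriorNormalMean(ybar = 2, n = 10, sigma = 1, m0 = 0, v0 = 1000)
tight <- posteriorNormalMean(ybar = 2, n = 10, sigma = 1, m0 = 0, v0 = 0.1)
print(round(rbind(vague, tight), 4))
```

With the tight prior, the prior precision (1/0.1 = 10) equals the data precision (10/1 = 10), so the posterior mean lands halfway between the prior mean and the sample mean. With the vague prior, the data carry nearly all of the weight.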

Non-informative regression prior

For the regression example we used this model for, say, \(p = 3\) predictors:

\[ [\boldsymbol{\beta}|\mathbf{y}, \mathbf{X}, \sigma^2, \mathbf{b}, \mathbf{B}] \propto \prod_{i=1}^n N(y_i | \mathbf{x}_i' \boldsymbol{\beta}, \sigma^2) \times MVN_p( \boldsymbol{\beta}|\mathbf{b}, \mathbf{B}) \]

Our non-informative prior was centered on zero, but that didn’t matter because the variances were huge:

\[ \mathbf{b} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} , \mathbf{B} = \begin{bmatrix} 1000 & 0 & 0\\ 0 & 1000 & 0 \\ 0 & 0 & 1000 \end{bmatrix} \]
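For the ordinary (uncensored) linear regression, the product of this likelihood and prior has a standard closed form that makes the role of \(\mathbf{B}\) explicit. The symbols \(\mathbf{V}\) and \(\mathbf{v}\) are introduced here only for illustration, and the Tobit adds a step for the censored observations that is not shown:

\[ \boldsymbol{\beta}|\mathbf{y}, \mathbf{X}, \sigma^2, \mathbf{b}, \mathbf{B} \sim MVN_p( \mathbf{V}\mathbf{v}, \mathbf{V}), \quad \mathbf{V}^{-1} = \sigma^{-2}\mathbf{X}'\mathbf{X} + \mathbf{B}^{-1}, \quad \mathbf{v} = \sigma^{-2}\mathbf{X}'\mathbf{y} + \mathbf{B}^{-1}\mathbf{b} \]

With variances of 1000, \(\mathbf{B}^{-1}\) has diagonal elements of 0.001, which is negligible next to \(\sigma^{-2}\mathbf{X}'\mathbf{X}\) for any reasonable sample size. The prior mean \(\mathbf{b}\) enters only through \(\mathbf{B}^{-1}\mathbf{b}\), so its value has almost no effect.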

Here I load the FIA data we used for the Tobit regression:

source_data("https://github.com/jimclarkatduke/gjam/blob/master/xdataCluster.rdata?raw=True")
source_data("https://github.com/jimclarkatduke/gjam/blob/master/baCluster.rdata?raw=True")
y    <- baCluster[,'pinusTaeda']
data <- data.frame( cbind(xdataCluster, y) )


Here is the Tobit model fitted with this non-informative prior distribution.


xnames  <- c( 'moist','standAge','meanTemp','annualPrec','silt30','sand30' )
form    <- vars2formula( xnames )
Q       <- length( xnames )
priorB  <- matrix( 0, Q, 1, dimnames = list( xnames, NULL ) )
priorVB <- 1000*diag( Q )
colnames( priorVB ) <- rownames( priorVB ) <- xnames
fit1    <- bayesReg(formula = form, data, ng = 5000, TOBIT = TRUE)
## NULL
## ================================================================================

Exercise 1. Find the parameter estimates and chains in object fit1. Determine the following:

  1. estimates that differ from zero
  2. the shape of the posterior distribution
  3. which estimates agree/disagree with prior understanding
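The layout of fit1 is not shown here; str(fit1, max.level = 2) will reveal where the chains are stored. Once you have a matrix of posterior draws with one column per coefficient, a summary along the following lines addresses the first two questions. This is a sketch only: the chains matrix below is a simulated stand-in with hypothetical values, not output from bayesReg.

```r
set.seed(1)

# Simulated stand-in for a matrix of posterior draws (rows = iterations,
# columns = coefficients); the names and values are hypothetical
chains <- cbind(moist    = rnorm(5000, mean =  0.8, sd = 0.2),
                standAge = rnorm(5000, mean = -0.5, sd = 0.1),
                meanTemp = rnorm(5000, mean =  0.0, sd = 0.3))

# Posterior mean, sd, and 95% credible interval for each coefficient
summ <- t(apply(chains, 2, function(x) {
  c(mean = mean(x), sd = sd(x), quantile(x, c(0.025, 0.975)))
}))

# An interval that excludes zero suggests an estimate that differs from zero
differsFromZero <- summ[, '2.5%'] > 0 | summ[, '97.5%'] < 0
print(round(summ, 3))
print(differsFromZero)

# Posterior shape can be inspected with, e.g., hist(chains[, 'moist'])
```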

How to make the prior informative

To make the prior informative I would likely want something other than zeros in \(\mathbf{b}\) and more weight (smaller variances) in \(\mathbf{B}\).

For the FIA basal area data I want an informative prior reflecting my knowledge that loblolly pine does well on moist, fertile soils and early in stand development. I start by specifying variables that I believe to be important for the species pinusTaeda. These variables must be in colnames( data ):

Variable name   Description            Prior belief   Why
moist           site moisture status   positive       field obs
standAge        stand age              negative       early successional sp
meanTemp        site temperature       unsure
annualPrec      site precipitation     unsure
silt30          silt content           positive       field obs
sand30          sand content           unsure

The prior belief in this table reflects field experience with this species as early-successional and tending to occupy rich (silt) soils.
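The table can be translated into a prior mean vector and covariance matrix roughly as follows. This is a sketch under stated assumptions: the prior means (1 and -1) and the variance (0.25) are placeholder values, since sensible magnitudes depend on the units or standardization of each predictor. It follows the layout used for priorB and priorVB earlier; whether bayesReg also expects an intercept row, and how these objects are passed to it, should be checked in clarkFunctions2026.R.

```r
xnames <- c('moist', 'standAge', 'meanTemp', 'annualPrec', 'silt30', 'sand30')
Q      <- length(xnames)

# Start from the non-informative prior: zero means, variance 1000
priorB  <- matrix(0, Q, 1, dimnames = list(xnames, NULL))
priorVB <- 1000 * diag(Q)
dimnames(priorVB) <- list(xnames, xnames)

# Hypothetical prior means for the coefficients with a stated belief
priorB['moist', 1]    <-  1     # positive effect (placeholder value)
priorB['standAge', 1] <- -1     # negative effect (placeholder value)
priorB['silt30', 1]   <-  1     # positive effect (placeholder value)

# Smaller variance = more prior weight; 0.25 corresponds to a prior sd of 0.5
informed <- c('moist', 'standAge', 'silt30')
diag(priorVB)[match(informed, xnames)] <- 0.25

print(cbind(priorMean = priorB[, 1], priorSD = sqrt(diag(priorVB))))
```

Coefficients marked "unsure" keep a zero mean and a variance of 1000, so the data still dominate for those.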

Exercise 2. Make the prior distribution informative and determine its effect. You will need to modify \(\mathbf{b}\) and \(\mathbf{B}\).

  1. For different prior distributions, determine the contribution of the data to the posterior.
  2. What do you learn about the controls on this species that you did not learn using the non-informative prior?
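One rough way to gauge the contribution of the data is to compare posterior and prior variances: if the posterior variance of a coefficient is nearly as large as its prior variance, the data added little. The sketch below illustrates this on simulated data with an ordinary linear regression and known \(\sigma\), not on the FIA data or the Tobit. The function posteriorBeta, the simulated values, and the two priors are all hypothetical.

```r
set.seed(2)

# Simulated data (hypothetical): intercept plus one predictor, known sigma
n     <- 20
sigma <- 2
X     <- cbind(intercept = 1, x1 = rnorm(n))
y     <- as.vector(X %*% c(1, 0.5) + rnorm(n, sd = sigma))

# Conditional posterior of beta for prior MVN(b, B), sigma known
posteriorBeta <- function(X, y, sigma, b, B) {
  Binv <- solve(B)
  V    <- solve(crossprod(X) / sigma^2 + Binv)            # posterior covariance
  mu   <- V %*% (crossprod(X, y) / sigma^2 + Binv %*% b)  # posterior mean
  list(mean = drop(mu), var = diag(V))
}

vague <- posteriorBeta(X, y, sigma, b = c(0, 0), B = diag(1000, 2))
tight <- posteriorBeta(X, y, sigma, b = c(0, 2), B = diag(c(1000, 0.05)))

# Rough share of posterior information contributed by the data,
# per coefficient: 1 - (posterior variance / prior variance)
dataShare <- rbind(vague = 1 - vague$var / 1000,
                   tight = 1 - tight$var / c(1000, 0.05))
print(round(rbind(vague = vague$mean, tight = tight$mean), 3))
print(round(dataShare, 3))
```

Under the vague prior the share is close to 1 for both coefficients. Under the tight prior on x1 it drops sharply for that coefficient, and its posterior mean is pulled toward the prior mean.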

Truncated prior distribution

In the foregoing exercise I made the prior informative by changing the location and weight of the distribution. An alternative approach uses truncated prior distributions, which assign non-zero probability only to positive or only to negative values of a coefficient.

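The matrix priorLoHi has two columns that specify the lower and upper bounds on parameter values, with one row for each coefficient to be bounded. Coefficients that are not listed remain unbounded, as the printed output shows. Here is an example: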

priorLoHi <- matrix( NA, 3, 2, 
                    dimnames = list( c( 'moist', 'standAge', 'sand30' ), 
                                     c( 'lo', 'hi' ) ) )
priorLoHi[,1] <- c( 0, -Inf, -Inf )
priorLoHi[,2] <- c( Inf, 0, 0 )

fit3 <- bayesReg(formula = form, data, ng = 5000, TOBIT = TRUE, priorLoHi = priorLoHi)
##            [,1] [,2]
## intercept  -Inf  Inf
## moist         0  Inf
## standAge   -Inf    0
## meanTemp   -Inf  Inf
## annualPrec -Inf  Inf
## silt30     -Inf  Inf
## sand30     -Inf    0
## ================================================================================
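To see what a truncated prior looks like, it helps to draw from one. The sketch below samples a truncated normal by inverting the normal CDF over the allowed interval. It is an illustration only and not necessarily how bayesReg handles the bounds; the helper name rtnormSketch and the prior sd of 2 are hypothetical.

```r
set.seed(3)

# Hypothetical helper: draw from N(mu, sd) truncated to (lo, hi) by
# inverting the normal CDF over the allowed interval
rtnormSketch <- function(n, mu, sd, lo = -Inf, hi = Inf) {
  pLo <- pnorm(lo, mu, sd)
  pHi <- pnorm(hi, mu, sd)
  qnorm(runif(n, pLo, pHi), mu, sd)
}

# Bounds matching the 'moist' (positive) and 'standAge' (negative) rows
moistPrior    <- rtnormSketch(10000, mu = 0, sd = 2, lo = 0,    hi = Inf)
standAgePrior <- rtnormSketch(10000, mu = 0, sd = 2, lo = -Inf, hi = 0)

print(range(moistPrior))      # all draws are non-negative
print(range(standAgePrior))   # all draws are non-positive
```

Unlike a prior with a shifted mean and small variance, the truncated prior says nothing about magnitude within the allowed range. It only rules out one sign.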

Exercise 3. Compare the posterior distributions you obtained with the non-informative prior, the prior with non-zero mean, and the truncated prior.

  1. Describe the effect of prior distributions on posterior shape.
  2. For the informative prior cases, what was the effect on estimates that had non-informative priors? Why?
  3. What are the advantages/disadvantages of the two prior distributions?