The Horseshoe Prior

Huokai Wu1, Hao Hu2

1 Mathematics and Statistics, University of Saskatchewan, Canada; 2 Mathematics and Statistics, University of Saskatchewan, Canada

Objectives

  1. Introduce the Horseshoe Prior.
  2. Introduce two close cousins: the Laplace and Student-t distributions.
  3. Show how the parameters influence each of the PDFs.

The Probability Density Functions

1. Student-t distribution


(i). Probability density function:

\(f(t)=\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\Gamma(\frac{\nu}{2})}(1+\frac{t^2}{\nu})^{-\frac{\nu+1}{2}}\).

where \(\nu\) is the number of degrees of freedom and \(\Gamma\) is the gamma function.

(ii). For \(\nu>1\) even, \(\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}=\frac{(\nu-1)(\nu-3)\cdots5\cdot3}{2\sqrt{\nu}(\nu-2)(\nu-4)\cdots4\cdot2}\)

For \(\nu>1\) odd, \(\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}=\frac{(\nu-1)(\nu-3)\cdots4\cdot2}{\pi\sqrt{\nu}(\nu-2)(\nu-4)\cdots5\cdot3}\)
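
As a quick numerical check (an illustrative R sketch, not part of the original derivation; the helper names are ours, and the products are written out only for \(\nu\ge4\) even and \(\nu\ge5\) odd), both closed forms agree with the gamma-function form and with the density at \(t=0\):

```r
# Minimal check of the even/odd closed forms against the gamma-function form.
t_const_gamma <- function(nu) gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))

t_const_even <- function(nu) {   # assumes even nu >= 4
  prod(seq(nu - 1, 3, by = -2)) / (2 * sqrt(nu) * prod(seq(nu - 2, 2, by = -2)))
}
t_const_odd <- function(nu) {    # assumes odd nu >= 5
  prod(seq(nu - 1, 2, by = -2)) / (pi * sqrt(nu) * prod(seq(nu - 2, 3, by = -2)))
}

c(t_const_gamma(6), t_const_even(6), dt(0, df = 6))  # all equal f(0) for nu = 6
c(t_const_gamma(5), t_const_odd(5),  dt(0, df = 5))  # all equal f(0) for nu = 5
```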

2. Half Cauchy:


The half-Cauchy distribution (HC) is derived from the standard Cauchy distribution by folding the curve at the origin so that only positive values can be observed. Its pdf is:
\(f(x)=\frac{2}{\pi}\frac{1}{1+x^2}\), \(x>0\)

Notes: The probability density function of the (general) Cauchy distribution is:
\(f(x;x_0,\gamma)=\frac{1}{\pi\gamma}\left[\frac{\gamma^2}{(x-x_0)^2+\gamma^2}\right]\).

where \(x_0\) is the location parameter (specifying the location of the peak of the distribution) and \(\gamma\) is the scale parameter (specifying the half-width at half-maximum, HWHM).
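
As an illustration (a minimal R sketch; the name `dhalfcauchy` is ours), the folding construction can be written with the base R `dcauchy` and `rcauchy` functions:

```r
# Half-Cauchy density: fold the standard Cauchy at the origin, i.e. double it on x > 0.
dhalfcauchy <- function(x) ifelse(x > 0, 2 * dcauchy(x, location = 0, scale = 1), 0)

dhalfcauchy(1)               # (2 / pi) * (1 / (1 + 1)) = 0.3183
lambda <- abs(rcauchy(1e4))  # half-Cauchy draws, obtained by folding standard Cauchy draws
```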

3. Laplace distribution:


A random variable has a \(Laplace(\mu,b)\) distribution if its probability density function is:

\(f(x\mid\mu,b)=\frac{1}{2b}\begin{cases}\exp\left(-\frac{\mu-x}{b}\right), & x<\mu\\ \exp\left(-\frac{x-\mu}{b}\right), & x\ge\mu\end{cases}\)

where \(\mu\) is a location parameter and \(b\) is a scale parameter (sometimes referred to as the diversity).
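
A minimal R sketch of this density (the name `dlaplace` is ours; both branches of the piecewise form reduce to \(\exp(-|x-\mu|/b)\)):

```r
# Laplace(mu, b) density: the two branches above combine into exp(-|x - mu| / b) / (2b).
dlaplace <- function(x, mu = 0, b = 1) exp(-abs(x - mu) / b) / (2 * b)

dlaplace(0)   # peak height 1 / (2b) = 0.5 when mu = 0, b = 1
dlaplace(2)   # 0.5 * exp(-2) ~ 0.0677
```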

4. Horseshoe Prior:


(i). Consider the setting \((y|\beta)\sim N(\beta,\sigma^2I)\), where the vector \(\beta\) is believed to be sparse.

(ii). Assume that each \(\beta_i\) is conditionally independent with density \(\pi_{HS}(\beta_i|\tau)\), where \(\pi_{HS}\) can be represented as a scale mixture of normals:

\((\beta_i|\lambda_i,\tau)\sim N(0,\lambda_i^2\tau^2)\)

\(\lambda_i\sim C^+(0,1)\), where \(C^+(0,1)\) is a standard half-Cauchy distribution on the positive reals.

NOTES: the \(\lambda_i\) are the local shrinkage parameters and \(\tau\) is the global shrinkage parameter.

(iii). The density function \(\pi_{HS}(\beta_i|\tau)\) has no closed-form expression.
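
Although \(\pi_{HS}\) has no closed form, the hierarchy above is straightforward to simulate from. A minimal R sketch, assuming a fixed value of \(\tau\) (the name `rhorseshoe` is ours):

```r
# Draw from the horseshoe prior via its scale-mixture representation:
# lambda_i ~ C+(0, 1), then beta_i | lambda_i, tau ~ N(0, lambda_i^2 * tau^2).
rhorseshoe <- function(n, tau = 1) {
  lambda <- abs(rcauchy(n))               # local shrinkage parameters
  rnorm(n, mean = 0, sd = lambda * tau)   # conditional normal draws
}

set.seed(1)
beta <- rhorseshoe(1e4)
# Most draws are shrunk toward 0, while a few are very large (heavy tails).
```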

Methods

  1. Generate a random data set.
  2. Use ggplot2 to generate the plots for a visual comparison.
  3. Since the horseshoe prior is defined conditionally on \(\lambda\), we need the marginal prior for \(\beta\). By the law of total probability, the marginal density is \(f(\beta)=\int\limits_{\lambda\in\Omega}f(\beta|\lambda)g(\lambda)d\lambda\approx\frac{1}{n}\sum\limits_{i=1}^nf(\beta|\lambda_i)\), where the \(\lambda_i\) are drawn from \(g\).
    Thus, we have the following algorithm to generate the plots (a minimal R sketch of these steps is given after the list):

    Step 1: Generate a sequence of points {\(x_j\)}, \(j=1,\ldots,1000\), ranging from \(-3\) to \(3\).

    Step 2: For each \(x_j\), generate \(n\) random samples \(\lambda_i\), \(i=1,\ldots,n\), from \(C^+(0,1)\), and compute \(f(\beta=x_j)\approx\frac{1}{n}\sum\limits_{i=1}^nf(x_j|\lambda_i)\).

    Step 3: The pairs (\(x_j,f(\beta=x_j)\)) for \(j=1,\ldots,1000\) are used to generate the plots.
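
A minimal R sketch of Steps 1–3, assuming ggplot2 is available; `hs_density` and the plotting choices are ours and are only meant to mirror the algorithm above:

```r
library(ggplot2)

# Steps 1-3: Monte Carlo estimate of the marginal horseshoe density
# f(beta) ~ (1/n) * sum_i N(beta; 0, lambda_i^2 * tau^2), with lambda_i ~ C+(0, 1).
hs_density <- function(x, tau = 1, n = 1e4) {
  sapply(x, function(xj) {
    lambda <- abs(rcauchy(n))          # Step 2: fresh half-Cauchy draws for each x_j
    mean(dnorm(xj, 0, lambda * tau))   # average of the conditional normal densities
  })
}

x  <- seq(-3, 3, length.out = 1000)                          # Step 1: grid of points
df <- data.frame(x = x, density = hs_density(x, tau = 1))    # Step 3: (x_j, f(x_j)) pairs

ggplot(df, aes(x, density)) +
  geom_line() +
  labs(title = "Monte Carlo estimate of the horseshoe prior density")
```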
Results

1. Horseshoe Prior with different global parameters: \(\tau\) (\(\tau_1=1\),\(\tau_2=5\),\(\tau_3=10\))


Figure 1: HS with different global parameters

2. Horseshoe Prior with different local parameters: \(\lambda\) (\(\lambda=1\),\(\lambda=5\),\(\lambda=10\))


Figure 2: HS with different local parameters

3. Comparison of the Horseshoe Prior (\(\lambda=1,\tau=1\)), Student-t distribution (\(\nu=1\)) and Laplace distribution (\(\mu=0,b=1\))


Figure 3: Comparison between the HS, Student-t, and Laplace distributions

Conclusion


From Figure 1, we see that as \(\tau\) increases, the “fake” peak of the density curve decreases. The tail of the density curve also becomes heavier as \(\tau\) increases.

From Figure 2, we see that as \(\lambda\) increases, the “fake” peak of the density curve decreases and the curve becomes flatter. However, the tail of the density curve does not change as \(\lambda\) changes.

From Figure 3, we see that the density curve of the Horseshoe Prior has the highest peak (even though it is “fake”), the Student-t distribution has the lowest peak, and the Laplace distribution is in the middle. However, the density curve of the Horseshoe Prior has the lightest tail, the Student-t distribution has the heaviest tail, and the Laplace distribution is again in the middle.

References

  1. Carvalho, C. M.; Polson, N. G. & Scott, J. G. Handling Sparsity via the Horseshoe. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
    http://proceedings.mlr.press/v5/carvalho09a/carvalho09a.pdf
  2. Jacob, E. Half Cauchy Distribution and Process, 2013.
    https://statperson.com/Journal/StatisticsAndMathematics/Article/Volume3Issue2/3_2_8.pdf
  3. Student’s t-distribution. https://en.wikipedia.org/wiki/Student%27s_t-distribution
  4. Laplace distribution. https://en.wikipedia.org/wiki/Laplace_distribution