Large Sample Properties HW

Exercise 4.2

Normal approximation: derive the analytic form of the information matrix and the normal approximation variance for the bioassay example.

Given an uninformative prior distribution and the binomial model in the bioassay example, the contribution of observation \(i\) to the log posterior can be written as

\[log\,p(\alpha,\beta|y_i,n_i,x_i)=y_i\,log[logit^{-1}(\alpha+\beta x_i)]+(n_i-y_i)\,log[1-logit^{-1}(\alpha+\beta x_i)],\]

and the full log posterior is the sum of these terms over \(i\). Substituting \(logit^{-1}(\alpha+\beta x_i)=\frac{exp(\alpha+\beta x_i)}{1+exp(\alpha+\beta x_i)}\), each term simplifies to

\[log\,p(\alpha,\beta|y_i,n_i,x_i)=y_i(\alpha+\beta x_i)-n_i\,log[1+exp(\alpha+\beta x_i)].\]
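For concreteness, this log posterior can also be written out numerically. A minimal Python sketch (the function name and code organization are my own; the data arrays are the dose, group-size, and death-count values used in the bioassay example):

```python
import numpy as np

# Bioassay data from the example: dose (log g/ml), animals per group, deaths.
x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

def log_posterior(alpha, beta):
    """Log posterior (up to a constant) under a uniform prior on (alpha, beta)."""
    eta = alpha + beta * x                        # linear predictor alpha + beta * x_i
    # Sum over observations of y_i * eta_i - n_i * log(1 + exp(eta_i)).
    return np.sum(y * eta - n * np.log1p(np.exp(eta)))

print(log_posterior(0.8, 7.7))   # evaluate at one candidate point
```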

Calculate then the derivatives needed for the information matrix. Consider first:

\[\frac{\partial log\,p(\alpha,\beta | y_i, n_i, x_i)}{\partial \alpha}=y_i-n_i\frac{exp(\alpha+\beta x_i)}{1+exp(\alpha+\beta x_i)}.\]

Also,

\[\frac{\partial log\,p(\alpha,\beta|y_i,n_i,x_i)}{\partial \beta}=x_i\left[y_i-n_i\frac{exp(\alpha + \beta x_i)}{1+exp(\alpha + \beta x_i)}\right].\]

Then, since \(\frac{d}{d\eta}\frac{exp(\eta)}{1+exp(\eta)}=\frac{exp(\eta)}{(1+exp(\eta))^2}\) with \(\eta=\alpha+\beta x_i\), the second derivatives can be calculated as:

\[\frac{\partial^2 log\,p(\alpha,\beta | y_i, n_i, x_i)}{\partial \alpha ^2}=-n_i\frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2}\] and

\[\frac{\partial ^2 log\,p(\alpha,\beta|y_i,n_i,x_i)}{\partial \beta \partial \alpha}=-n_ix_i\frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2},\] also \[\frac{\partial^2 log\,p(\alpha,\beta|y_i,n_i,x_i)}{\partial \beta^2}=-n_ix_i^2 \frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2}.\]
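As a quick sanity check on this algebra (a sketch with arbitrary illustrative values; the helper name is my own), the analytic second derivative in \(\alpha\) for a single observation can be compared against a central finite difference:

```python
import numpy as np

# One observation with arbitrary illustrative values.
x_i, n_i, y_i = 0.5, 5, 3
alpha, beta, h = 1.0, 2.0, 1e-4

def log_p(a, b):
    """Per-observation term y_i*(a + b*x_i) - n_i*log(1 + exp(a + b*x_i))."""
    eta = a + b * x_i
    return y_i * eta - n_i * np.log1p(np.exp(eta))

# Analytic second derivative in alpha: -n_i * exp(eta) / (1 + exp(eta))^2.
eta = alpha + beta * x_i
analytic = -n_i * np.exp(eta) / (1.0 + np.exp(eta)) ** 2

# Central finite-difference approximation of the same second derivative.
numeric = (log_p(alpha + h, beta) - 2 * log_p(alpha, beta) + log_p(alpha - h, beta)) / h**2

print(analytic, numeric)   # the two values should agree closely
```

The same comparison works for the mixed and \(\beta\) second derivatives by differencing in \(\beta\) instead.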

Then the observed information \(I(\alpha,\beta)\), the negative of the matrix of second derivatives summed over the observations, is written as:

\[I(\alpha,\beta)=\sum_i\begin{bmatrix} n_i\frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2} & n_ix_i\frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2}\\ n_ix_i\frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2} & n_ix_i^2 \frac{exp(\alpha + \beta x_i)}{(1+exp(\alpha + \beta x_i))^2} \end{bmatrix}\]

By definition, the variance matrix of the normal approximation is the inverse of this information matrix, evaluated at the posterior mode \((\hat\alpha,\hat\beta)\).
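To make the result concrete, here is a minimal sketch (function and variable names are my own choices) that finds the posterior mode numerically and inverts the analytic information matrix above to obtain the normal-approximation covariance:

```python
import numpy as np
from scipy.optimize import minimize

# Bioassay data: dose (log g/ml), animals per group, deaths.
x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

def neg_log_post(theta):
    """Negative log posterior under a uniform prior on (alpha, beta)."""
    alpha, beta = theta
    eta = alpha + beta * x
    return -np.sum(y * eta - n * np.log1p(np.exp(eta)))

# Posterior mode (alpha_hat, beta_hat).
mode = minimize(neg_log_post, x0=np.zeros(2)).x
alpha_hat, beta_hat = mode

# Analytic information matrix I(alpha, beta) summed over the observations.
eta = alpha_hat + beta_hat * x
w = n * np.exp(eta) / (1.0 + np.exp(eta)) ** 2   # n_i exp(eta_i) / (1 + exp(eta_i))^2
info = np.array([[np.sum(w),     np.sum(w * x)],
                 [np.sum(w * x), np.sum(w * x**2)]])

# Normal approximation: mean at the mode, covariance = inverse information.
cov = np.linalg.inv(info)
print(mode)
print(cov)
```

The diagonal of `cov` gives the approximate posterior variances of \(\alpha\) and \(\beta\) under the normal approximation centered at the mode.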

Exercise 4.6

Statistical decision theory: a decision-theoretic approach to the estimation of an unknown parameter \(\theta\) introduces the loss function \(L(\theta,a)\) which, loosely speaking, gives the cost of deciding that the parameter has the value \(a\), when it is in fact equal to \(\theta\). The estimate \(a\) can be chosen to minimize the posterior expected loss,

\[E(L(a|y))=\int L(\theta,a)p(\theta|y)d\theta\]

The optimal choice of \(a\) is called a Bayes estimate for the loss function \(L\). Show that:

  • If \(L(\theta,a)=(\theta-a)^2\) (squared error loss), then the posterior mean, \(E(\theta|y)\), if it exists, is the unique Bayes estimate of \(\theta\).

To minimize the posterior expected loss, set its derivative with respect to \(a\) to zero:

\[\frac{d}{da}\int(\theta-a)^2p(\theta|y)d\theta = 0\] \[-2\int(\theta-a)p(\theta|y)d\theta=0\] \[\int\theta p(\theta|y)d\theta - a\int p(\theta|y)d\theta = 0\] Since \(\int p(\theta|y)d\theta =1\),

\[a=\int\theta p(\theta|y)d\theta = E[\theta|y]\] Consider also the second derivative:

\[\frac{\partial ^2}{\partial a ^2}E[L(a)|y]=-2\int \frac{\partial}{\partial a}(\theta - a) p(\theta|y)d\theta\] \[=-2\int -p(\theta|y)d\theta\] \[=2>0,\] so the posterior expected loss is strictly convex in \(a\) and the posterior mean, when it exists, is the unique Bayes estimate.
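The same conclusion can be checked numerically. A small sketch (the Beta(3, 5) posterior is an arbitrary stand-in, not part of the exercise) that evaluates the posterior expected squared loss over a grid of candidate estimates:

```python
import numpy as np

# Stand-in posterior: Beta(3, 5) on a fine grid (any posterior would do).
theta = np.linspace(0.0, 1.0, 20001)
dtheta = theta[1] - theta[0]
dens = theta**2 * (1 - theta)**4          # unnormalized Beta(3, 5) density
dens /= np.sum(dens) * dtheta             # normalize numerically

post_mean = np.sum(theta * dens) * dtheta

# Expected squared loss E[(theta - a)^2 | y] for each candidate estimate a.
a_grid = np.linspace(0.0, 1.0, 2001)
exp_loss = np.array([np.sum((theta - a)**2 * dens) * dtheta for a in a_grid])

best_a = a_grid[np.argmin(exp_loss)]
print(post_mean, best_a)    # both should be close to 3/8 = 0.375
```

The grid minimizer and the posterior mean coincide up to the grid resolution.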

  • If \(L(\theta,a)=|\theta-a|\), then any posterior median of \(\theta\) is a Bayes estimate of \(\theta\).

To minimize:

\[\frac{\partial}{\partial a}\int|\theta-a|p(\theta|y)d\theta=-\int \frac{|\theta-a|}{\theta-a}p(\theta|y)d\theta =0\]

Splitting the integral into the two cases \(\theta > a\) and \(\theta < a\) gives

\[-\int_{a}^\infty p(\theta|y)d\theta + \int_{-\infty}^a p(\theta|y)d\theta=0\] \[\int_a^{\infty}p(\theta|y)d\theta=\int_{-\infty}^a p(\theta|y)d\theta, \] so \(a\) is a posterior median.

As for the second derivative, differentiating the first derivative \(\int_{-\infty}^a p(\theta|y)d\theta-\int_a^\infty p(\theta|y)d\theta\) once more with respect to \(a\) gives

\[\frac{\partial^2}{\partial a^2}E[L(a)|y]=2p(\theta=a|y)\ge 0,\]

so the expected loss is convex and any posterior median is a Bayes estimate.
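A quick Monte Carlo sketch of the same fact (the lognormal draws are purely illustrative, not part of the exercise): among candidate estimates, the one minimizing the mean absolute loss is essentially the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)   # stand-in posterior draws

# Mean absolute loss for each candidate estimate a.
a_grid = np.linspace(0.5, 2.0, 1501)
mean_abs_loss = np.array([np.mean(np.abs(draws - a)) for a in a_grid])

best_a = a_grid[np.argmin(mean_abs_loss)]
print(best_a, np.median(draws))   # both should be close to the true median exp(0) = 1
```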

  • If \(k_0\) and \(k_1\) are nonnegative numbers, not both zero, and

\[L(\theta,a)= \begin{cases} k_0(\theta-a) & \text{if } \theta \ge a\\ k_1(a-\theta) & \text{if } \theta < a \end{cases}\] then any \(\frac{k_0}{k_0+k_1}\) quantile of the posterior distribution \(p(\theta|y)\) is a Bayes estimate of \(\theta\).

Minimize:

\[\frac{\partial E[L(a)|y]}{\partial a}=\frac{\partial}{\partial a}\left[\int_a^{+\infty}k_0(\theta-a)p(\theta|y)d\theta + \int_{-\infty}^ak_1(a-\theta)p(\theta|y)d\theta\right]\] The boundary terms from differentiating the limits of integration vanish because the integrand is zero at \(\theta=a\), so setting the derivative to zero gives \[\int_a^{+\infty}k_0(-1)p(\theta|y)d\theta + \int_{-\infty}^ak_1(1)p(\theta|y)d\theta =0.\]

Considering that \(1-\int_a^\infty p(\theta|y)d\theta=\int_{-\infty}^a p(\theta|y)d\theta\),

\[k_0\int_a^\infty p(\theta|y)d\theta = k_1 \int_{-\infty}^a p(\theta|y)d\theta\] \[k_0 \int_a^\infty p(\theta|y)d\theta = k_1-k_1 \int_a^\infty p(\theta|y)d\theta\]

\[k_0 \int_a^\infty p(\theta|y)d\theta+k_1\int_a^\infty p(\theta|y)d\theta = k_1\] \[\int_a^\infty p(\theta|y)d\theta = \frac{k_1}{k_0+k_1},\] which is equivalent to \(\int_{-\infty}^a p(\theta|y)d\theta = \frac{k_0}{k_0+k_1}\), so \(a\) is a \(\frac{k_0}{k_0+k_1}\) quantile of the posterior distribution.

Now a remark on the second derivative: differentiating \(-k_0\int_a^\infty p(\theta|y)d\theta + k_1\int_{-\infty}^a p(\theta|y)d\theta\) with respect to \(a\) gives

\[\frac{\partial^2 E[L(a)|y]}{\partial a^2}=(k_0+k_1)p(\theta=a|y)\ge 0,\]

so the expected loss is convex and the quantile found above is indeed a minimizer.
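As with the previous losses, this can be verified numerically. A sketch (the gamma draws and the weights \(k_0=3\), \(k_1=1\) are illustrative choices, not part of the exercise) comparing the grid minimizer of the piecewise-linear loss with the \(k_0/(k_0+k_1)\) quantile:

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=50_000)   # stand-in posterior draws
k0, k1 = 3.0, 1.0                                      # illustrative loss weights

def expected_loss(a):
    """Monte Carlo estimate of E[L(theta, a)] for the piecewise-linear loss."""
    return np.mean(np.where(draws >= a, k0 * (draws - a), k1 * (a - draws)))

a_grid = np.linspace(0.0, 10.0, 2001)
best_a = a_grid[np.argmin([expected_loss(a) for a in a_grid])]

# The minimizer should approximate the k0/(k0+k1) = 0.75 quantile of the draws.
print(best_a, np.quantile(draws, k0 / (k0 + k1)))
```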