Part 1

Experiment Group: 26 planes inject silver iodide into the clouds.

Control Group: another 26 planes do not inject silver iodide into the clouds.

Response: Rainfall Volume.

Random variables

Rainfall volume for the control group: \(X_i\in (0,\infty)\), \(i=1,\ldots, 26\).

Rainfall volume for the experiment group: \(Y_i \in (0,\infty)\), \(i=1,\ldots, 26\).

Preliminary data analysis

From the histograms, we see a right-skewed distribution of rainfall volumes within each group. The support of rainfall volume can be taken to be \((0,\infty)\), since all observations exhibit positive rainfall volumes; it is therefore reasonable to assume that rainfall volume follows a Gamma distribution. Lastly, the quartiles of rainfall volume for the seeded group are higher than those of the unseeded group, so it is reasonable to assume different mean parameters for the Gamma distributions of the two groups.

d <- read.csv("clouddata.csv")
par(mfrow = c(1,2))
hist(d[,1], main = "UnseededRain");hist(d[,2], main = "SeededRain")

d <- read.csv("clouddata.csv")
boxplot(d, main = "Boxplot of rainfall volume of the two groups")

Mean-value parameterization

Recall the pdf of \(\Gamma (\alpha, \beta)\):

\[ \begin{aligned} f(y|\alpha, \beta) & = {\displaystyle {\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}y^{\alpha -1}e^{-\beta y}}\\ & = \exp(\alpha \log (\beta) - \log (\Gamma(\alpha)) + (\alpha-1) \log(y) -\beta y)\\ & = \exp(-B(\theta_1, \theta_2) + \theta_1 T_1(y) + \theta_2 T_2(y) + c(y)) \end{aligned} \] where

\[ \begin{aligned} \theta_1 &= -\beta, \, T_1(y) = y; \quad \theta_2 = \alpha-1,\, T_2(y) = \log(y)\\ B(\theta_1, \theta_2) &= -\alpha\log(\beta) + \log(\Gamma(\alpha))= -(1+\theta_2)\log(-\theta_1) + \log(\Gamma(\theta_2+1))\\ E(Y) &= \mu = \partial B(\theta_1, \theta_2)/\partial \theta_1 = - (1+\theta_2)/\theta_1 = \alpha/\beta \\ Var(Y) &= \sigma^2 = \partial^2 B(\theta_1,\theta_2)/\partial \theta_1^2 = (1+\theta_2)/\theta_1^2 = \alpha/\beta^2 \\ & \Rightarrow \beta = \alpha/\mu, \quad \sigma^2 =\mu^2/\alpha \end{aligned} \]
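As a quick sanity check of these identities, we can simulate from a Gamma distribution with arbitrary (hypothetical) \(\alpha\) and \(\beta\) and compare the empirical moments with \(\mu = \alpha/\beta\) and \(\sigma^2 = \mu^2/\alpha\):

# Sanity check of the mean-value parameterization by simulation
# (alpha and beta here are arbitrary illustrative values)
set.seed(1)
alpha <- 2; beta <- 0.5
mu <- alpha/beta
ysim <- rgamma(1e5, shape = alpha, rate = beta)
c(mean(ysim), mu)           # empirical vs. theoretical mean
c(var(ysim), mu^2/alpha)    # empirical vs. theoretical variance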

Full model

\[X_i|\alpha_C, \beta_C \stackrel{IID}{\sim} \Gamma(\alpha_C, \beta_C),\, i=1, \ldots, 26\] \[Y_i|\alpha_E, \beta_E \stackrel{IID}{\sim} \Gamma(\alpha_E, \beta_E),\, i=1, \ldots, 26\] where C stands for control group, E stands for experiment group.

The log likelihood of the seeded group is (the unseeded group is handled in the same manner) \[ \begin{aligned} \ell(\alpha_E, \beta_E) & = \sum_{i=1}^{n} \log (f(y_i|\alpha_E, \beta_E)) \\ & = n \alpha_E\log(\beta_E)- n \log(\Gamma(\alpha_E)) + (\alpha_E-1) \sum_{i=1}^{n}\log(y_i) - \beta_E \sum_{i=1}^{n}y_i \\ \implies \ell(\alpha_E, \mu_E) & = n \alpha_E\log(\alpha_E/\mu_E)- n \log(\Gamma(\alpha_E)) + (\alpha_E-1) \sum_{i=1}^{n}\log(y_i) - \alpha_E/\mu_E \sum_{i=1}^{n}y_i \\ \partial \ell(\alpha_E, \beta_E) / \partial \alpha_E &=n\log(\beta_E) - n\Gamma'(\alpha_E)/\Gamma(\alpha_E) + \sum_{i=1}^{n}\log(y_i) \\ \implies \partial \ell(\alpha_E, \mu_E) / \partial \alpha_E &= n\log(\alpha_E/\mu_E) + n -n\Gamma'(\alpha_E)/\Gamma(\alpha_E) + \sum_{i=1}^{n}\log(y_i) -n\bar{y}/\mu_E \\ \partial \ell(\alpha_E, \beta_E) / \partial \beta_E &=n\alpha_E/\beta_E - \sum_{i=1}^{n}y_i = n\mu_E - n\bar{y} \\ \implies \partial \ell(\alpha_E, \mu_E) / \partial \mu_E &= - n\alpha_E/\mu_E + n\bar{y} \alpha_E/\mu_E^2 = -n \alpha_E/\mu_E^2(\mu_E-\bar{y}) \end{aligned} \]

It is easy to check that the reparameterization \((\alpha_E, \mu_E)\) does not affect the MLE: by the invariance of maximum likelihood estimation, the MLE under one parameterization is simply the transformation of the MLE under the other.
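As a numerical illustration of this invariance, here is a sketch that maximizes the seeded-group log likelihood under both parameterizations with optim (on the log scale, to keep the parameters positive); the resulting estimates satisfy \(\hat\beta = \hat\alpha/\hat\mu\):

# MLE invariance check (sketch): fit the seeded group in (alpha, beta)
# and in (alpha, mu), both on the log scale
y <- read.csv("clouddata.csv")[,2]
nll.ab <- function(p) -sum(dgamma(y, shape = exp(p[1]), rate = exp(p[2]), log = TRUE))
nll.am <- function(p) -sum(dgamma(y, shape = exp(p[1]), rate = exp(p[1] - p[2]), log = TRUE))
fit.ab <- optim(c(0, -6), nll.ab)   # (log-alpha, log-beta)
fit.am <- optim(c(0, 6), nll.am)    # (log-alpha, log-mu)
exp(fit.ab$par)                                            # (alpha-hat, beta-hat)
c(exp(fit.am$par[1]), exp(fit.am$par[1] - fit.am$par[2]))  # the same, via (alpha, mu)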

In order to find the maximum likelihood estimators of \(\alpha_{E}\) and \(\beta_{E}\), set \(\partial \ell(\alpha_{E}, \beta_{E})/\partial \alpha_{E}\) and \(\partial \ell(\alpha_{E}, \beta_{E}) / \partial \beta_{E}\) to zero. Solving the system, we have \(\hat{\mu}_{E, MLE} = \bar{y}\), and \[ n\log(\alpha_E/\bar{y}) -n\Gamma'(\alpha_E)/\Gamma(\alpha_E) + \sum_{i=1}^{n}\log(y_i) = 0 \]

Next, we solve for \(\alpha_E\) numerically; here the function uniroot is applied. We can also find the MOM estimator of \(\alpha_E\), \(\hat{\alpha}_{E,MOM} = \bar{y}^2/ s^2(y)\), since under the mean-value parameterization \(Var(Y) = \mu^2/\alpha\).

x <- d[,1]
y <- d[,2]
# MLE estimator of mu
mu.MLE.E <- mean(y)
mu.MLE.C <- mean(x)
# MOM estimator of alpha
a.MOM.E <- mean(y)^2 / (sd(y)^2)
a.MOM.C <- mean(x)^2 / (sd(x)^2)
# Derivative of log likelihood W.R.T alpha
der_a <- function(a, y){
  n <- length(y)
  n * log(a/ mean(y)) - n * digamma(a) +sum(log(y))
}
# Plot der_a against alpha
par(mfrow = c(1,2))
a <- seq(.1,1,length.out = 20)
plot(a , der_a(a,y) , type = "l", lwd = 2); abline(c(0,0), lty = 2, col = "red", lwd = 2)
plot(a , der_a(a,x) , type = "l", lwd = 2); abline(c(0,0), lty = 2, col = "red", lwd = 2)

# Find the MLE of alpha
a.MLE.E <- uniroot(function(i){der_a(i,y)}, c(0.5,1), tol = 10^-6)$root
a.MLE.C <- uniroot(function(i){der_a(i,x)}, c(0.5,1), tol = 10^-6)$root
# MLE of seeded group
c(mu.MLE.E, a.MLE.E)
#> [1] 441.9730769   0.6395972
# MLE of unseeded group
c(mu.MLE.C, a.MLE.C)
#> [1] 164.5730769   0.5604931

Therefore we have (441.97, 0.64) as our MLE estimates of \((\mu, \alpha)\) for the experiment group and, correspondingly, (164.57, 0.56) for the control group. While the estimated means differ substantially between the two groups, the shape parameters are reasonably close. We may therefore be interested in a reduced model in which the two groups share the same shape parameter \(\alpha\) of the Gamma distribution. To assess the adequacy of the shared-shape-parameter assumption, we can perform a Wald test.

Wald Test on shape parameter

\(H_0:\, \alpha_E = \alpha_C\). Under Wald inference, we write the restriction as \(R_1(\boldsymbol\theta) = b(\boldsymbol\theta) = \alpha_E-\alpha_C\), where \(\boldsymbol\theta = (\mu_E, \mu_C, \alpha_E, \alpha_C )\), with gradient \(C_{k,1} = \partial R_1(\boldsymbol\theta)/\partial\theta_k = (0,0,1,-1)^T\).

Let \(\ell_{2n}(\boldsymbol\theta|X,Y)\) denote the log likelihood of both groups. Let \(U_{2n}(\boldsymbol\theta |X,Y)\) denote the score functions. Specifically,

\[ \ell_{2n}(\boldsymbol\theta) = \ell(\mu_E, \alpha_E) +\ell(\mu_C, \alpha_C) \]

\[ \begin{aligned} I_{2n}(\boldsymbol\theta) = & -E(\partial^2\ell_{2n}/\partial \theta_j\partial\theta_k)\\ = & - E\begin{bmatrix} \partial^2\ell_{2n}/\partial \mu_E^2 &\cdots & \partial^2\ell_{2n}/\partial\mu_E\partial \alpha_C\\ \vdots & \ddots & \vdots \\ \partial^2\ell_{2n}/\partial\alpha_C \partial \mu_E &\cdots &\partial^2\ell_{2n}/\partial \alpha_C^2 \end{bmatrix} \end{aligned} \] Cross-derivatives between the two groups vanish identically, and the within-group cross-derivative \(\partial^2\ell_n/\partial\mu_E\partial\alpha_E = -n(\mu_E-\bar y)/\mu_E^2\) has expectation zero, so the expected information is diagonal: \[ \begin{aligned} I_{2n}(\boldsymbol\theta) = & \begin{bmatrix} n\alpha_E/\mu_E^2 & 0 & 0 & 0\\ 0 & n\alpha_C/\mu_C^2 & 0 &0 \\ 0 & 0 & n\psi'(\alpha_E) - n/\alpha_E & 0 \\ 0 & 0 & 0 & n\psi'(\alpha_C) - n/\alpha_C \\ \end{bmatrix} \\ I^{-1}_{2n}(\boldsymbol\theta) = & \,\text{Diag}\left(\mu_E^2/n\alpha_E,\; \mu_C^2/n\alpha_C,\; \left[n\psi'(\alpha_E) - n/\alpha_E\right]^{-1},\; \left[n\psi'(\alpha_C) - n/\alpha_C\right]^{-1}\right) \end{aligned} \] where \(\psi'(\alpha) = \frac{d^2}{d\alpha^2}\log\Gamma(\alpha) = (\Gamma''(\alpha)\Gamma(\alpha)- \Gamma'(\alpha)^2)/\Gamma(\alpha)^2\) is the trigamma function (trigamma in R). The detailed derivation is as follows: \[ \begin{aligned} \partial \ell(\alpha_E, \mu_E) / \partial \alpha_E &= n\log(\alpha_E/\mu_E) + n -n\Gamma'(\alpha_E)/\Gamma(\alpha_E) + \sum_{i=1}^{n}\log(y_i) -n\bar{y}/\mu_E \\ \partial \ell(\alpha_E, \mu_E) / \partial \mu_E &= - n\alpha_E/\mu_E + n\bar{y} \alpha_E/\mu_E^2 = -n \alpha_E/\mu_E^2(\mu_E-\bar{y})\\ \partial^2 \ell(\alpha_E, \mu_E) / \partial \alpha_E^2 &= n/\alpha_E -n\psi'(\alpha_E) \\ \partial^2 \ell(\alpha_E, \mu_E) / \partial \mu_E^2 &= n\alpha_E/\mu_E^2 -2n\bar{y} \alpha_E/\mu_E^3 = n \alpha_E/\mu_E^3(\mu_E-2\bar{y})\\ E[ \partial^2 \ell(\alpha_E, \mu_E) / \partial \alpha_E^2 ] &= n/\alpha_E -n\psi'(\alpha_E) \\ E[\partial^2 \ell(\alpha_E, \mu_E) / \partial \mu_E^2] &= - n \alpha_E/\mu_E^2 \end{aligned} \]
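As a quick numerical check of the \((\alpha_E,\alpha_E)\) entry (a sketch using a central finite difference of the analytic score der_a defined in the code above):

# Finite-difference check of d^2 l / d alpha^2 = n/alpha - n * trigamma(alpha),
# evaluated at the seeded-group MLE
n <- length(y)
h <- 1e-5
d2.num <- (der_a(a.MLE.E + h, y) - der_a(a.MLE.E - h, y)) / (2 * h)
d2.ana <- n / a.MLE.E - n * trigamma(a.MLE.E)
c(d2.num, d2.ana)  # the two values should agree to several digits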

Recall \[ W_n = b^T(\hat\theta_n) \left [C(\hat\theta_n)I_n^{-1} (\hat\theta_n) C^T(\hat\theta_n) \right]^{-1} b(\hat\theta_n) \stackrel{\mathcal{L}}{\rightarrow} \chi_r^2 \] In our case \[ \begin{aligned} C^T I_{2n}^{-1}(\hat\theta_{2n})C = & (0,0,1,-1)\, I_{2n}^{-1}(\hat\theta_{2n})\,(0,0,1,-1)^T \\ = & \left[n\psi'(\hat\alpha_E) - n/\hat\alpha_E\right]^{-1}+ \left[n\psi'(\hat\alpha_C)- n/\hat\alpha_C\right]^{-1} \\ W_{2n} =& (\hat\alpha_E-\hat\alpha_C)^2 \left[ C^T I_{2n}^{-1}(\hat\theta_{2n})C \right]^{-1} \end{aligned} \]

n <- length(y)
# Wald statistic: squared difference divided by the sum of the asymptotic variances
Wn <- (a.MLE.E-a.MLE.C)^2 * ( (-n/a.MLE.E + n * trigamma(a.MLE.E))^(-1)+  (-n/a.MLE.C +n *trigamma(a.MLE.C))^(-1))^(-1)

We have already calculated the MLE \(\hat{\boldsymbol{\theta}}_{2n} =\) (441.97, 164.57, 0.64, 0.56), which gives \(W_n =\) 0.1592871. Under \(H_0:\, \alpha_E = \alpha_C\), \(W_n\) is asymptotically \(\chi^2_1\), so we compare it with the critical value \(\chi^{2}( 1,0.95) =\) 3.8415. Since \(W_n < 3.8415\), we do not reject the null hypothesis.
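Equivalently, we can report the upper-tail p-value of the observed statistic; it is about 0.69, consistent with not rejecting \(H_0\).

# p-value of the Wald test under the chi^2_1 limit
pchisq(Wn, df = 1, lower.tail = FALSE)  # approximately 0.69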

Reduced model: same shape parameter

\[X_i|\alpha, \beta_C \stackrel{IID}{\sim} \Gamma(\alpha, \beta_C),\, i=1, \ldots, 26\] \[Y_i|\alpha, \beta_E \stackrel{IID}{\sim} \Gamma(\alpha, \beta_E),\, i=1, \ldots, 26\]

Note \(\alpha\) can now be estimated with information from both groups. Specifically, we want to estimate \((\alpha, \beta_C, \beta_E)\). Consider the log likelihood function for the two groups together. \[ \begin{aligned} \ell_{2n} (\alpha, \beta_C, \beta_E) =&\sum_{i=1}^{n} \log (f(y_i|\alpha, \beta_E)) +\sum_{i=1}^{n} \log (f(x_i|\alpha, \beta_C)) \\ = & n \alpha (\log(\beta_E)+ \log(\beta_C))- \beta_E \sum_{i=1}^{n}y_i -\beta_C \sum_{i=1}^{n}x_i \\ & - 2n \log(\Gamma(\alpha)) + (\alpha-1) \sum_{i=1}^{n}(\log(y_i)+\log(x_i)) \end{aligned} \]

Setting \(\partial \ell_{2n} /\partial \beta_E = \partial \ell_{2n} /\partial \beta_C = 0\) produces the same MLEs of \(\mu_i = \alpha/\beta_i\) as in the full model, i.e. \(\hat{\mu}_{E,MLE} = \bar{y}, \, \hat{\mu}_{C,MLE} = \bar{x}\). Then we solve for the MLE of \(\alpha\) from

\[ \begin{aligned} \partial \ell_{2n} / \partial \alpha = & n (\log(\beta_E) + \log(\beta_C)) - 2n\Gamma'(\alpha)/\Gamma(\alpha) + \sum_{i=1}^n (\log(y_i) + \log(x_i))=0 \end{aligned} \]

x <- d[,1]
y <- d[,2]
# MLE estimator of mu
mu.MLE.E <- mean(y)
mu.MLE.C <- mean(x)
# Derivative of log likelihood W.R.T alpha
der_a_xy <- function(a, x , y){
  n <- length(y)
  n * (log(a/ mean(y))+log(a/ mean(x))) - 2  * n * digamma(a) + sum(log(y) + log(x))
}
# Plot der_a against alpha
a <- seq(.1,1,length.out = 20)
plot(a , der_a_xy(a,x,y) , type = "l", lwd = 2); abline(c(0,0), lty = 2, col = "red", lwd = 2)


# Find the MLE of alpha
a.MLE <- uniroot(function(i){der_a_xy(i,x , y)}, c(0.5,1), tol = 10^-6)$root

# MLE 
c(a.MLE, mu.MLE.C,mu.MLE.E)
#> [1]   0.597193 164.573077 441.973077

We have \(\hat{\alpha}_{MLE}=\) 0.5972, \(\hat\mu_{C,MLE}=\) 164.5731, \(\hat\mu_{E,MLE}=\) 441.9731.
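If the rate parameters themselves are needed, the invariance of the MLE gives \(\hat\beta = \hat\alpha/\hat\mu\):

# Rate-parameter MLEs implied by invariance: beta-hat = alpha-hat / mu-hat
b.MLE.C <- a.MLE / mu.MLE.C
b.MLE.E <- a.MLE / mu.MLE.E
c(b.MLE.C, b.MLE.E)  # approximately 0.00363 and 0.00135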

Likelihood Ratio Confidence Interval

Let F denote the full model and R the reduced model, with \(p := \dim (\boldsymbol\Theta_F)\) and \(r := \dim (\boldsymbol\Theta_R)\). Let \(r_\mu := \mu_E/\mu_C\); the new parameter set for the full model is \(\boldsymbol\theta_F = (r_\mu, \mu_C, \alpha_E, \alpha_C)\), and for the reduced model \(\boldsymbol\theta_R = ( r_\mu, \mu_C, \alpha)\). \[ \hat{\boldsymbol{\theta}}_{F,n} = \arg\sup _{\boldsymbol\theta\in \boldsymbol\Theta_F} \ell_n( \boldsymbol\theta) , \quad \hat{\boldsymbol{\theta}}_{R, n} = \arg\sup _{\boldsymbol\theta\in \boldsymbol\Theta_R} \ell_n( \boldsymbol\theta) \]

\[ T_n \equiv -2 (\ell_n(\hat{\boldsymbol{\theta}}_{R,n}) -\ell_n(\hat{\boldsymbol{\theta}}_{F,n}) )\stackrel{D}{\rightarrow} \chi^2_{p-r} \]

Therefore the CI for \(r_\mu\) is obtained by inverting the test: \[ \left\{ r_\mu : -2 \left(\ell_n(\hat{\boldsymbol{\theta}}_{R,n}) -\ell_n(\hat{\boldsymbol{\theta}}_{F,n}) \right)\leq\chi^2_{1,.95} \right\} = \left\{ r_\mu : -2 \left(\sup_{\mu_C, \alpha}\ell_n(r_\mu, \mu_C, \alpha) -\sup_{r_\mu,\mu_C,\alpha_E, \alpha_C}\ell_n(\boldsymbol\theta_F) \right)\leq\chi^2_{1,.95} \right\} \]

\[ \begin{aligned} \ell(\mu_E, \mu_C, \alpha_E, \alpha_C) & = n \alpha_E\log(\alpha_E/\mu_E)- n \log(\Gamma(\alpha_E)) + (\alpha_E-1) \sum_{i=1}^{n}\log(y_i) - \alpha_E/\mu_E \sum_{i=1}^{n}y_i \\ & +n \alpha_C\log(\alpha_C/\mu_C)- n \log(\Gamma(\alpha_C)) + (\alpha_C-1) \sum_{i=1}^{n}\log(x_i) - \alpha_C/\mu_C \sum_{i=1}^{n}x_i \\ \text {Since }\mu_E = \mu_C r_\mu\,, \\ \implies \ell(r_\mu, \mu_C, \alpha_E, \alpha_C) & =n \alpha_E\log(\alpha_E/\mu_C r_\mu)- n \log(\Gamma(\alpha_E)) + (\alpha_E-1) \sum_{i=1}^{n}\log(y_i) - \alpha_E/\mu_C r_\mu \sum_{i=1}^{n}y_i \\ & +n \alpha_C\log(\alpha_C/\mu_C)- n \log(\Gamma(\alpha_C)) + (\alpha_C-1) \sum_{i=1}^{n}\log(x_i) - \alpha_C/\mu_C \sum_{i=1}^{n}x_i \\ \text{Imposing } \alpha_E = \alpha_C = \alpha\,, \\ \implies \ell (r_\mu, \mu_C, \alpha) & = 2n \alpha\log(\alpha/\mu_C ) - n\alpha\log(r_\mu)- 2n \log(\Gamma(\alpha)) + (\alpha-1) \sum_{i=1}^{n}(\log(y_i)+\log(x_i)) - \alpha/\mu_C (1/r_\mu \sum_{i=1}^{n}y_i +\sum_{i=1}^{n}x_i) \\ \text{Given } r_\mu,& \\ \partial \ell(\mu_C, \alpha|r_\mu) / \partial \mu_C & = -2n\alpha/\mu_C + \alpha/\mu^2_C (1/r_\mu \sum_{i=1}^{n}y_i +\sum_{i=1}^{n}x_i) = 0\\ \implies \mu_C &= (1/r_\mu \sum_{i=1}^{n}y_i +\sum_{i=1}^{n}x_i)/2n\\ \partial \ell(\mu_C, \alpha|r_\mu) / \partial \alpha & = 2n \log(\alpha/\mu_C) + 2n -n\log(r_\mu)-2n\Gamma'(\alpha)/\Gamma(\alpha) + \sum_{i=1}^n (\log(y_i) + \log(x_i)) - 1/\mu_C (1/r_\mu \sum_{i=1}^{n}y_i +\sum_{i=1}^{n}x_i) = 0 \end{aligned} \]

We first compute the maximized log likelihood under the full model; for each fixed \(r_\mu\) we then profile out \((\mu_C, \alpha)\) and compare.

# Gamma log likelihood in the (mu, alpha) parameterization
loglike <- function(mu, a, x){
  nx <- length(x)
  nx * a * log(a/mu) - nx * log(gamma(a)) + (a-1) * sum(log(x)) - a/mu * sum(x)
}
loglike_full <- loglike(mu.MLE.E,a.MLE.E,y) + loglike(mu.MLE.C,a.MLE.C,x)
# Maximized log likelihood under the reduced (shared-alpha) model
loglike_reduce <- loglike(mu.MLE.E,a.MLE,y) + loglike(mu.MLE.C,a.MLE,x)
# The critical benchmark of log likelihood
critical <- qchisq(.95,1)/(-2) + loglike_full; critical
#> [1] -339.2104
critical1 <- qchisq(.95,1)/(-2) + loglike_reduce  # benchmark under the reduced model; not used below
# Derivative of the profile log likelihood w.r.t. alpha, for fixed r.mu
# (n is the per-group sample size defined above)
der_a_rmu <- function( r.mu,a, mu.C,x,y){
  2 *n * log(a/mu.C) + 2*n- n* log(r.mu) - 2*n* digamma(a) + sum( log(x) + log(y)) - (1/r.mu* sum(y) + sum(x))/mu.C
}

# Given r_mu, find mle of mu_C and alpha
mle_rmu <- function(r.mu,x,y){
  mu.C <- (1/r.mu * sum(y) + sum(x)) / (2*n )
  a <- uniroot(function(alpha){der_a_rmu( r.mu, alpha, mu.C,x,y)},c(0.00001,10))$root
  return(list("mu"=mu.C, "a" = a))
}

# Profile log likelihood at r.mu (with mu_C and alpha profiled out),
# minus the critical benchmark; its roots are the CI endpoints
profile_loglike_diff <- function(r.mu,x,y){
    mle <- mle_rmu (r.mu,x,y)
    mu.C  <- mle$mu
    a <- mle$a
    loglike(mu.C *r.mu, a, y)+loglike(mu.C,a,x)-critical
}

find.r <- function(r) {profile_loglike_diff(r,x,y)}
r.mu = seq(0.1,6,length.out = 100) ;  plot(r.mu,sapply(r.mu, find.r), type = "l") ; abline (c(0,0), lty = 2, lwd=2, col ="red"); 

lower <- uniroot(find.r,c(1,2))$root
upper <-  uniroot(find.r,c(5,6))$root

The confidence interval for \(r_\mu\) is (1.328, 5.431).
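As a sanity check, the plug-in estimate \(\hat r_\mu = \bar{y}/\bar{x}\) falls inside this interval:

# Point estimate of the mean ratio; it lies between the CI endpoints
mean(y) / mean(x)   # approximately 2.69
c(lower, upper)     # approximately 1.328 and 5.431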

Part 2

1. Inverse Gamma Distribution

(a)

\[ \begin{aligned} f_W(w|\alpha, \beta) & = \left |\frac{d\,w^{-1}}{d\,w} \right|f_Y(w^{-1}|\alpha, \beta)\\ & = \frac{1}{w^2} {\displaystyle {\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}w^{-\alpha +1}e^{-\beta / w}}\\ & = {\displaystyle {\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}w^{-\alpha -1}e^{-\beta / w}}\\ & = \exp(\alpha \log (\beta) - \log (\Gamma(\alpha)) + (-\alpha-1) \log(w) -\beta /w)\\ & = \exp(-B(\theta_1, \theta_2) + \theta_1 T_1(w) + \theta_2 T_2(w) + c(w)) \end{aligned} \]

(b)

\[ \begin{aligned} \theta_1 &= -\beta, \, T_1(w) = 1/w; \quad \theta_2 = -\alpha-1,\, T_2(w) = \log(w)\\ B(\theta_1, \theta_2) &= -\alpha\log(\beta) + \log(\Gamma(\alpha))= (1+\theta_2)\log(-\theta_1) + \log(\Gamma(-\theta_2-1))\\ E(1/W) &= \partial B(\theta_1, \theta_2)/\partial \theta_1 = (1+\theta_2)/\theta_1 = \alpha/\beta \\ E(\log(W)) &= \partial B(\theta_1, \theta_2)/\partial \theta_2 = \log(-\theta_1) -\Gamma'(-\theta_2-1) / \Gamma(-\theta_2-1) = \log(\beta) - \Gamma'(\alpha)/ \Gamma(\alpha) \end{aligned} \]

(c)

\[ M_{T}(u)= E(e^{u^T T(W)})=\frac{\exp \{B(\theta+u)\}}{\exp \{B(\theta)\}} \]

\[ \begin{aligned} E(W) = & E(e^{\log(W)})\\ = & M_{T}( (0,1)^T ) \\ = & \frac{\exp(B(\theta_1, \theta_2+1) )}{\exp(B(\theta_1, \theta_2))}\\ = & \frac{\beta^{1-\alpha }\, \Gamma(\alpha-1)}{\beta^{-\alpha} \, \Gamma(\alpha)}\\ = & \beta/ (\alpha-1) \end{aligned} \] for \(\alpha > 1\).
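The moment identities in (b) and (c) can be checked by Monte Carlo, simulating \(W = 1/Y\) with \(Y\sim\Gamma(\alpha,\beta)\) for hypothetical \(\alpha > 1\) and \(\beta\):

# Monte Carlo check of the inverse gamma moments (illustrative alpha, beta)
set.seed(1)
alpha <- 3; beta <- 2
w <- 1 / rgamma(1e6, shape = alpha, rate = beta)
c(mean(1/w), alpha/beta)                      # E(1/W) = alpha/beta
c(mean(log(w)), log(beta) - digamma(alpha))   # E(log W) = log(beta) - psi(alpha)
c(mean(w), beta/(alpha - 1))                  # E(W) = beta/(alpha-1), alpha > 1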

(d)

We can perform a change of parameters \(\mu = \beta/ (\alpha-1)\), \(\phi = \alpha-1\), i.e. \(\beta = \mu\phi\), \(\alpha = \phi+1\). Plugging these into the density function, we have \[ \begin{aligned} f_W(w|\alpha, \beta) & = {\displaystyle {\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}w^{-\alpha -1}e^{-\beta / w}}\\ & = {\displaystyle {\frac {(\mu\phi) ^{\phi+1 }}{\Gamma (\phi+1 )}}w^{-\phi-2}e^{-\mu\phi/ w}}\\ & = \exp((\phi+1) \log(\mu\phi) - \log(\Gamma(\phi+1)) -(\phi+2)\log(w) -\mu\phi/w) \end{aligned} \]
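As a quick check of this \((\mu,\phi)\) form (a sketch with hypothetical \(\mu = 2\), \(\phi = 1.5\)), it agrees numerically with the change-of-variables density from (a):

# The (mu, phi) density matches (1/w^2) * f_Y(1/w | alpha, beta)
mu <- 2; phi <- 1.5                # hypothetical values
alpha <- phi + 1; beta <- mu * phi
w <- seq(0.5, 5, by = 0.5)
f1 <- exp((phi + 1) * log(mu * phi) - lgamma(phi + 1) - (phi + 2) * log(w) - mu * phi / w)
f2 <- (1 / w^2) * dgamma(1 / w, shape = alpha, rate = beta)
max(abs(f1 - f2))                  # effectively zero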

(e)

We are not able to coerce the inverse gamma distribution into the form of an exponential dispersion family, since the sufficient statistics are \(1/w\) and \(\log(w)\) rather than \(w\) itself.

2. Dirichlet distribution

(a)

\[ \begin{aligned} f_Y(y_1,y_2, y_3) = & \frac{\Gamma(\sum_{i=1}^3\alpha_i)}{\Pi_{i=1}^3\Gamma(\alpha_i)} y_1^{\alpha_1-1}y_2^{\alpha_2-1}y_3^{\alpha_3-1}\\ = & \exp \left( \log(\Gamma(\textstyle\sum_{i=1}^3\alpha_i)) - \sum_{i=1}^3 \log(\Gamma(\alpha_i)) + (\alpha_1-1) \log(y_1)+ (\alpha_2-1) \log(y_2)+ (\alpha_3-1) \log(y_3)\right)\\ = & \exp(-B(\theta) + \sum_{i=1}^3\theta_i T_i(y) + c(y)) \end{aligned} \]

where \[ \begin{aligned} & \theta_i = \alpha_i-1;\quad T_i(y) = \log(y_i) \quad i = 1,2,3\\ & B(\theta) = \sum_{i=1}^3 \log(\Gamma(\theta_i+1)) - \log(\Gamma(\textstyle\sum_{i=1}^3\theta_i+3)) = \sum_{i=1}^3 \log(\Gamma(\alpha_i)) - \log(\Gamma(\textstyle\sum_{i=1}^3\alpha_i))\\ & c(y) = 0\\ & E(T_i(Y)) = \partial B/ \partial \theta_i = \Gamma'(\theta_i+1) /\Gamma(\theta_i+1) - \Gamma'(\textstyle\sum_{j=1}^3\theta_j+3)/\Gamma(\textstyle\sum_{j=1}^3\theta_j+3) = \Gamma'(\alpha_i)/\Gamma(\alpha_i) - \Gamma'(\textstyle\sum_{j}\alpha_j)/\Gamma(\textstyle\sum_{j}\alpha_j) \end{aligned} \] The sufficient statistic is \((\log(Y_1),\log(Y_2),\log(Y_3))\). The canonical parameters are shown above.
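As a Monte Carlo check of \(E(\log Y_i) = \Gamma'(\alpha_i)/\Gamma(\alpha_i) - \Gamma'(\sum_j\alpha_j)/\Gamma(\sum_j\alpha_j)\) (a sketch with hypothetical \(\alpha\)), we can simulate Dirichlet draws as normalized independent Gammas:

# Dirichlet draws via normalized Gammas; check E(log Y_i) = digamma(alpha_i) - digamma(sum(alpha))
set.seed(1)
alpha <- c(2, 3, 4)                # hypothetical parameters
m <- 1e5
g <- cbind(rgamma(m, alpha[1]), rgamma(m, alpha[2]), rgamma(m, alpha[3]))
ysim <- g / rowSums(g)             # each row is one Dirichlet draw
rbind(colMeans(log(ysim)),
      digamma(alpha) - digamma(sum(alpha)))  # the two rows should agree closely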

(b) Mean-value parameterization

Reparameterize using \(\mu_i = \alpha_i / (\alpha_1 + \alpha_2 + \alpha_3) = \alpha_i/(\delta -1),\, i =1,2,3\), where \(\delta = \alpha_1 + \alpha_2+ \alpha_3+1\). Equivalently \(\alpha_i = \mu_i (\delta-1)\); since \(\mu_1+\mu_2+\mu_3 = 1\), the free parameters are \((\mu_1, \mu_2, \delta)\), with \(\alpha_i = \mu_i (\delta -1),\, i = 1,2\), and \(\alpha_3 = (\delta-1)(1- \mu_1 -\mu_2)\). Plug the \(\alpha_i\) in terms of \((\mu_1, \mu_2, \delta)\) into the following equation. \[ \begin{aligned} f_Y(y_1,y_2, y_3|\mu_1, \mu_2, \delta) = & \exp \left( \log(\Gamma(\textstyle\sum_{i=1}^3\alpha_i)) - \sum_{i=1}^3 \log(\Gamma(\alpha_i)) + (\alpha_1-1) \log(y_1)+ (\alpha_2-1) \log(y_2)+ (\alpha_3-1) \log(y_3)\right) \end{aligned} \]

(c) Random variables

Let \(Y_i = (Y_{i1}, Y_{i2}, Y_{i3}), \, i =1 ,\ldots, n\), denote the \(i^{th}\) observation, where \(Y_{i1}\) and \(Y_{i2}\) are the proportions of sand and silt and \(Y_{i3}\) is the remaining proportion. Sample space: \(\{ (y_1, y_2, y_3) \in [0,1]^3 : y_1 + y_2 + y_3 = 1 \}\)

(d)

Let \(X_i\) be the elevation at location \(i\), with \(X_i \in [0, \infty)\). The range of \(X_i\) does not match the support of the Dirichlet distribution \(\{ (y_1,y_2,y_3) \in [0,1]^3 : y_1 + y_2 + y_3 = 1 \}\), so consider the following log-ratio model \[ \begin{aligned} \log (Y_{i1}/Y_{i3}) &= g_1(X_i) + \varepsilon_{i1} \\ \log (Y_{i2}/Y_{i3}) &= g_2(X_i) + \varepsilon_{i2} \\ \end{aligned} \] where \(g_1\) and \(g_2\) are regression functions, which may be linear or nonlinear. Taking a linear model as an example, let \(g_1(x) = \beta_0+ \beta_1 x\) and \(g_2(x) = \beta'_0+ \beta'_1 x\); then \[ \begin{aligned} \hat y_{i1}&=\frac{e^ {\beta_0 + \beta_1x_i}}{1 + e^ {\beta_0 + \beta_1x_i}+e^ {\beta'_0 + \beta'_1x_i}} \\ \hat y_{i2}&=\frac{e^ {\beta'_0 + \beta'_1x_i}}{1 + e^ {\beta_0 + \beta_1x_i}+e^ {\beta'_0 + \beta'_1x_i}} \\ \hat y_{i3}&=\frac{1}{1 + e^ {\beta_0 + \beta_1x_i}+e^ {\beta'_0 + \beta'_1x_i}} \end{aligned} \]
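A minimal sketch of fitting the linear log-ratio model by least squares, with simulated data (the elevations x, the coefficients, and the noise level are all hypothetical):

# Simulate compositions from the log-ratio model and recover the coefficients
set.seed(1)
m <- 100
x <- runif(m, 0, 10)                         # hypothetical elevations
eta1 <- 0.5 + 0.2 * x + rnorm(m, sd = 0.3)   # log(y1/y3)
eta2 <- -0.3 + 0.1 * x + rnorm(m, sd = 0.3)  # log(y2/y3)
y3 <- 1 / (1 + exp(eta1) + exp(eta2))
y1 <- exp(eta1) * y3
y2 <- exp(eta2) * y3
coef(lm(log(y1/y3) ~ x))   # approximately (0.5, 0.2)
coef(lm(log(y2/y3) ~ x))   # approximately (-0.3, 0.1)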