The following code builds a data frame of beta prior parameters \(a, b\) and the posterior probability \(Pr(\theta > .5 | \sum Y_i = 57)\) for different combinations of \(\theta_0\) and \(n_0\).
# prior values of theta0 and n0
theta0 <- seq(.1, .9, .1)
ltheta0 <- length(theta0)
n0 <- c(1, 2, 8, 16, 32)
ln0 <- length(n0)
# every combination of theta0 and n0
n0 <- rep(n0, ltheta0)
theta0 <- rep(theta0, each = ln0)
data <- data.frame(theta0 = theta0, n0 = n0)
# beta prior parameters implied by theta0 and n0
data$a <- data$theta0 * data$n0
data$b <- (1 - data$theta0) * data$n0
# observed data: 57 successes in 100 trials
y <- 57
n <- 100
# probability that the posterior theta is greater than .5
data$ptheta.5 <- pbeta(.5, data$a + y, data$b + n - y, lower.tail = FALSE)
data
## theta0 n0 a b ptheta.5
## 1 0.1 1 0.1 0.9 0.9067174
## 2 0.1 2 0.2 1.8 0.8914671
## 3 0.1 8 0.8 7.2 0.7686623
## 4 0.1 16 1.6 14.4 0.5445167
## 5 0.1 32 3.2 28.8 0.1554088
## 6 0.2 1 0.2 0.8 0.9100204
## 7 0.2 2 0.4 1.6 0.8987201
## 8 0.2 8 1.6 6.4 0.8130640
## 9 0.2 16 3.2 12.8 0.6591158
## 10 0.2 32 6.4 25.6 0.3248808
## 11 0.3 1 0.3 0.7 0.9132361
## 12 0.3 2 0.6 1.4 0.9056150
## 13 0.3 8 2.4 5.6 0.8517895
## 14 0.3 16 4.8 11.2 0.7606645
## 15 0.3 32 9.6 22.4 0.5417239
## 16 0.4 1 0.4 0.6 0.9163656
## 17 0.4 2 0.8 1.2 0.9121591
## 18 0.4 8 3.2 4.8 0.8847656
## 19 0.4 16 6.4 9.6 0.8430584
## 20 0.4 32 12.8 19.2 0.7465826
## 21 0.5 1 0.5 0.5 0.9194100
## 22 0.5 2 1.0 1.0 0.9183604
## 23 0.5 8 4.0 4.0 0.9121799
## 24 0.5 16 8.0 8.0 0.9042520
## 25 0.5 32 16.0 16.0 0.8894420
## 26 0.6 1 0.6 0.4 0.9223703
## 27 0.6 2 1.2 0.8 0.9242272
## 28 0.6 8 4.8 3.2 0.9344278
## 29 0.6 16 9.6 6.4 0.9458344
## 30 0.6 32 19.2 12.8 0.9628561
## 31 0.7 1 0.7 0.3 0.9252477
## 32 0.7 2 1.4 0.6 0.9297689
## 33 0.7 8 5.6 2.4 0.9520516
## 34 0.7 16 11.2 4.8 0.9716716
## 35 0.7 32 22.4 9.6 0.9905703
## 36 0.8 1 0.8 0.2 0.9280434
## 37 0.8 2 1.6 0.4 0.9349949
## 38 0.8 8 6.4 1.6 0.9656774
## 39 0.8 16 12.8 3.2 0.9863403
## 40 0.8 32 25.6 6.4 0.9982198
## 41 0.9 1 0.9 0.1 0.9307587
## 42 0.9 2 1.8 0.2 0.9399155
## 43 0.9 8 7.2 0.8 0.9759580
## 44 0.9 16 14.4 1.6 0.9939428
## 45 0.9 32 28.8 3.2 0.9997538
For the contour plot, I placed the prior values \(\theta_0\) on the x-axis, the prior number of observations \(n_0\) on the y-axis, and the posterior probability \(P(\theta > .5 | \sum Y_i = 57)\) on the z-axis (the contour values).
library(plotly)
fig <- plot_ly(
  x = data$theta0,
  y = data$n0,
  z = data$ptheta.5,
  type = "contour",
  contours = list(
    start = 0,
    end = 1,
    size = .05
  )
)
fig
As the contour plot shows, for a fixed \(n_0\), the higher the prior \(\theta_0\), the higher \(P(\theta > .5 | \sum Y_i = 57)\). For a fixed low value of \(\theta_0\) (\(\theta_0 < .5\)), the greater \(n_0\), the lower \(P(\theta > .5 | \sum Y_i = 57)\). But at high values of \(\theta_0\) (\(\theta_0 > .5\)), the greater \(n_0\), the higher \(P(\theta > .5 | \sum Y_i = 57)\).
If someone were to use this plot with a fixed beta prior distribution, they would first find the point \((x, y)\) on the graph where x is their prior \(\theta_0\) and y is their prior number of observations \(n_0\). Given this point, the z value gives \(P(\theta > .5 | \sum Y_i = 57)\). If this value is large enough (greater than some pre-specified \(\alpha\)), they should believe \(\theta > .5\); otherwise they should not.
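As a concrete sketch of this lookup (the prior \(\theta_0 = .3\), \(n_0 = 16\) is a hypothetical user's choice, corresponding to row 14 of the table above), the z value can be computed directly:
# hypothetical user prior: theta0 = .3, n0 = 16 (row 14 of the table above)
theta0.user <- .3
n0.user <- 16
a.user <- theta0.user * n0.user        # 4.8
b.user <- (1 - theta0.user) * n0.user  # 11.2
# z value: posterior probability that theta > .5
pbeta(.5, a.user + 57, b.user + 100 - 57, lower.tail = FALSE)  # ~0.761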
Since the sampling distribution for each group is Poisson, a gamma prior is conjugate: the posterior distribution for each of \(\theta_A, \theta_B\) is \(gamma(a + \sum Y_i,\ b + n)\), as the short derivation below shows.
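For completeness, with a \(gamma(a, b)\) prior and \(n\) Poisson observations, Bayes' rule gives
\(\large p(\theta | y_1 ... y_n) \propto \theta^{a-1} e^{-b\theta} \times \theta^{\sum y_i} e^{-n\theta} = \theta^{a + \sum y_i - 1} e^{-(b+n)\theta},\)
which is the kernel of a \(gamma(a + \sum y_i, b + n)\) density. So the posterior parameters for \(\theta_A\) are the following.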
# finding the posterior distribution for theta_A
y.a <- c(12, 9, 12, 14, 13, 13, 15, 8, 15, 6)
# prior parameters
a <- 120
b <- 10
sum.ya <- sum(y.a)
na <- length(y.a)
# posterior parameters
post.a1 <- sum.ya + a
post.a2 <- na + b
c(post.a1, post.a2)
## [1] 237 20
And \(\theta_A | Y_{A1}, ..., Y_{An} \sim gamma(237, 20)\). The following output gives the posterior mean, variance, and 95% quantile-based interval for \(\theta_A\).
mean.a <- post.a1/post.a2
mean.a
## [1] 11.85
var.a <- post.a1/(post.a2^2)
var.a
## [1] 0.5925
lower.a <- qgamma(.025, post.a1, post.a2)
upper.a <- qgamma(.975, post.a1, post.a2)
c(lower.a, upper.a)
## [1] 10.38924 13.40545
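As a sanity check, these closed-form summaries can be approximated by Monte Carlo samples from the posterior (a sketch; the sample size of 100,000 and the seed are arbitrary):
set.seed(1)  # arbitrary seed for reproducibility
theta.a.mc <- rgamma(100000, post.a1, post.a2)
# should be close to 11.85, 0.5925, and (10.39, 13.41) respectively
mean(theta.a.mc)
var(theta.a.mc)
quantile(theta.a.mc, c(.025, .975))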
Following a similar process for \(\theta_B\):
# finding the posterior distribution for theta_B
y.b <- c(11, 11, 10, 9, 9, 8, 7, 10, 6, 8, 8, 9, 7)
#prior parameters
a <- 12
b <- 1
sum.yb <- sum(y.b)
nb <- length(y.b)
#posterior parameters
post.b1 <- a + sum.yb
post.b2 <- b + nb
c(post.b1, post.b2)
## [1] 125 14
So the posterior distribution \(\theta_B | Y_{B1}...Y_{Bn} \sim gamma(125, 14)\). Again, the posterior summary statistics for \(\theta_B\) are provided as output in the following order: mean, variance, 95% quantile-based interval.
mean.b <- post.b1/post.b2
mean.b
## [1] 8.928571
var.b <- post.b1/(post.b2^2)
var.b
## [1] 0.6377551
lower.b <- qgamma(.025, post.b1, post.b2)
upper.b <- qgamma(.975, post.b1, post.b2)
c(lower.b, upper.b)
## [1] 7.432064 10.560308
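With both posteriors in hand, one illustrative extension (a Monte Carlo sketch, not required by the prompt) is to estimate the posterior probability that type B mice have a lower tumor rate than type A mice by sampling from the two posteriors:
set.seed(1)  # arbitrary seed for reproducibility
theta.a.mc <- rgamma(100000, post.a1, post.a2)  # gamma(237, 20)
theta.b.mc <- rgamma(100000, post.b1, post.b2)  # gamma(125, 14)
# Monte Carlo estimate of Pr(theta_B < theta_A | data)
mean(theta.b.mc < theta.a.mc)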
The following code computes the posterior expected value of \(\theta_B\) under the modified prior \(gamma(12 \times n_0,\ n_0)\) for \(n_0 = 1, ..., 50\). The values are then graphed alongside the 95% quantile-based posterior interval for \(\theta_A\).
n0 <- 1:50
# modified prior parameters as a function of n0
a <- 12 * n0
b <- n0
post.b1 <- a + sum.yb
post.b2 <- b + nb
post.exp <- post.b1/post.b2
plot(n0, post.exp, type = "l", xlab = "n0 values", ylab = "posterior expectation",
     lwd = 2, ylim = c(9, 13.5), main = "Posterior Expectation of theta_B")
abline(h = c(lower.a, upper.a), col = "blue", lwd = 2)
legend(x = c(20, 51), y = c(12, 13), inset = .2, lty = c(1, 1), lwd = 2,
       col = c("blue", "black"),
       legend = c("95% quantile interval for theta_A", "posterior expectation for theta_B"))
The code below finds the smallest prior \(n_0\) for which \(E(\theta_B | Y_{B1} ... Y_{Bn})\) falls inside the 95% quantile-based posterior interval for \(\theta_A\).
# report the first n0 whose posterior expectation exceeds the interval's lower bound
for(i in seq_along(post.exp)){
  if(post.exp[i] > lower.a){
    print(n0[i])
    break
  }
}
## [1] 14
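As a check, this agrees with solving for \(n_0\) directly: the posterior expectation is \(E(\theta_B | Y_{B1} ... Y_{Bn}) = \frac{12 n_0 + 113}{n_0 + 13}\), and requiring it to exceed the interval's lower bound gives
\(\large \frac{12 n_0 + 113}{n_0 + 13} > 10.389 \iff n_0 > \frac{13(10.389) - 113}{12 - 10.389} \approx 13.7,\)
so the smallest integer solution is \(n_0 = 14\).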
So the prior belief we would need for \(\theta_B\) to be close to \(\theta_A\) is \(n_0 \geq 14\). The following plot visualizes this, with a red line marking the minimum \(n_0\) necessary for the posterior expectation of \(\theta_B\) to fall inside \(\theta_A\)'s interval.
plot(n0, post.exp, type = "l", xlab = "n0 values", ylab = "posterior expectation",
     lwd = 2, ylim = c(9, 13.5), main = "Posterior Expectation of theta_B")
abline(h = c(lower.a, upper.a), col = "blue", lwd = 2)
legend(x = c(25, 51), y = c(12, 13), inset = .2, lty = c(1, 1), lwd = 2,
       col = c("blue", "black"),
       legend = c("95% quantile interval for theta_A", "posterior expectation for theta_B"))
# red line at the smallest n0 inside the interval
abline(v = 14, col = "red", lwd = 2)
The prompt states explicitly that type B mouse tumor rates are unknown, but that type B mice are related to type A mice. This relation implies that some of our prior information about type B mice comes from our knowledge of type A mice. Therefore the expression \(p(\theta_A, \theta_B) = p(\theta_A) \times p(\theta_B)\) does not make sense, since the two group means are not independent of one another. Rather, the information we have about \(\theta_A\) is considered when forming the prior for \(\theta_B\). We could instead write \(p(\theta_A, \theta_B) = p(\theta_A) \times p(\theta_B | \theta_A)\).
The Galenshore distribution is itself a conjugate class of prior densities for \(\theta\). Its density is implemented and plotted below for several \((a, \theta)\) values.
library(ggplot2)
# Galenshore(a, theta) density; gamma(a) = factorial(a - 1) for integer a,
# and is defined for all a > 0
galenshore <- function(a, theta, y){
  (2 / gamma(a)) * theta^(2 * a) * y^(2 * a - 1) * exp(-theta^2 * y^2)
}
y <- seq(0, 5, length.out = 100)
one <- data.frame(density = galenshore(1, 1, y), type = "galenshore(1,1)")
two <- data.frame(density = galenshore(2, 1, y), type = "galenshore(2,1)")
three <- data.frame(density = galenshore(1, 2, y), type = "galenshore(1,2)")
four <- data.frame(density = galenshore(4, 2, y), type = "galenshore(4,2)")
densities <- rbind(one, two, three, four)
yvalues <- rep(y, 4)
data <- data.frame(yvalues = yvalues, densities)
g <- ggplot(data, aes(x = yvalues, y = density, group = type, color = type)) +
  geom_line() + xlab("y")
g
Finding the posterior distribution of \(\theta | Y_1 ... Y_n\) using a Galenshore prior:
\(\large P(\theta | Y_1 ... Y_n) = \frac{P(\theta)P(Y_1...Y_n | \theta)}{\int P(\theta)P(Y_1...Y_n | \theta) d \theta}\)
\(\large P(Y_1... Y_n | \theta) = c(Y_1 ... Y_n, a) \theta ^ {2na} e ^ {-\theta ^2 \sum y_i ^ 2}\) and
\(\large P(\theta) = d(z,b) \theta ^ {2b-1} e ^ {-\theta ^ 2 z ^ 2}\)
\(\Large P(\theta | Y_1 ... Y_n) = \frac{c(Y_1 ... Y_n, a)\, \theta^{2na} e^{-\theta^2 \sum y_i^2}\, d(z,b)\, \theta^{2b-1} e^{-\theta^2 z^2}}{c(Y_1 ... Y_n, a)\, d(z,b) \int \theta^{2na} e^{-\theta^2 \sum y_i^2}\, \theta^{2b-1} e^{-\theta^2 z^2}\, d\theta}\)
= \(\huge \frac{\theta^{2(na+b)-1} e^{-\theta^2 (\sum y_i^2 + z^2)}}{\int \theta^{2(na+b)-1} e^{-\theta^2 (\sqrt{\sum y_i^2 + z^2})^2}\, d\theta}\) = \(\huge \frac{\theta^{2(na+b)-1} e^{-\theta^2 (\sqrt{\sum y_i^2 + z^2})^2}}{\frac{\Gamma(na+b)}{2(\sqrt{\sum y_i^2 + z^2})^{2(na+b)}}}\)
So \(\large p(\theta | Y_1 ... Y_n) = \frac{2}{\Gamma(na+b)} (\sqrt{\sum y_i^2 + z^2})^{2(na+b)}\, e^{-\theta^2 (\sqrt{\sum y_i^2 + z^2})^2}\, \theta^{2(na+b)-1}\)
And we have \(\theta | Y_1 ... Y_n \sim Galenshore(na+b,\ \sqrt{\sum y_i^2 + z^2})\).
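As a quick numeric sanity check (a sketch with hypothetical values \(a = 2\), \(b = 3\), \(z = 1\), and made-up observations), the posterior density from this derivation should integrate to 1:
# hypothetical sampling parameter a and prior Galenshore(b, z)
a <- 2; b <- 3; z <- 1
y.obs <- c(0.8, 1.2, 0.5)  # made-up observations
n <- length(y.obs)
# posterior parameters from the derivation above
post.shape <- n * a + b
post.theta <- sqrt(sum(y.obs^2) + z^2)
# the density should integrate to 1 over (0, Inf)
integrate(function(t) galenshore(post.shape, post.theta, t), 0, Inf)$value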
\(\huge \frac{P(\theta_x | Y_1 ... Y_n)}{P(\theta_q | Y_1 ... Y_n)} = \frac{\frac{2}{\Gamma(na+b)} (\sqrt{\sum y_i^2 + z^2})^{2(na+b)}\, e^{-\theta_x^2 (\sum y_i^2 + z^2)}\, \theta_x^{2(na+b)-1}}{\frac{2}{\Gamma(na+b)} (\sqrt{\sum y_i^2 + z^2})^{2(na+b)}\, e^{-\theta_q^2 (\sum y_i^2 + z^2)}\, \theta_q^{2(na+b)-1}}\) = \(\Large e^{-(\sum y_i^2 + z^2)(\theta_x^2 - \theta_q^2)} \left(\frac{\theta_x}{\theta_q}\right)^{2(na+b)-1}\), and a sufficient statistic is \(\sum y_i^2\).
Since for the Galenshore distribution \(\large E[Y] = \frac{\Gamma(a + 1/2)}{\theta\, \Gamma(a)}\), we have \(\large E[\theta | Y_1 ... Y_n] = \frac{\Gamma(na + b + 1/2)}{\sqrt{\sum y_i^2 + z^2}\; \Gamma(na+b)}\).
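Continuing the hypothetical check above (same made-up values), this closed form can be compared against direct numerical integration:
# closed-form posterior mean vs. numerical integration
mean.formula <- gamma(post.shape + .5) / (post.theta * gamma(post.shape))
mean.numeric <- integrate(function(t) t * galenshore(post.shape, post.theta, t), 0, Inf)$value
c(mean.formula, mean.numeric)  # the two should agree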