Problem 1. If the prior mean and prior variance (or standard deviation) are known, we can plug them into the formulas above and solve for a and b. Suppose Sophie, the editor of the student newspaper, is going to survey students to determine the level of support for the current president of the students’ association. She needs to determine her prior distribution for p, the proportion of students who support the president. She decides that her prior mean is 0.5 and her prior standard deviation is 0.15.

a. Show algebraically that a = b ≈ 5.06.

\(E(Y) = \frac {a}{(a+b)} = 0.5\)

\(a = 0.5(a+b)\)

\(a-0.5a = 0.5b\)

\(a = b\)

Then substitute a = b into V(Y):

\(V(Y) = \frac {ab}{(a+b)^2(a+b+1)}\)

\((0.15)^2 = \frac {ab}{(a+b)^2(a+b+1)}\)

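Substituting b = a and simplifying,

\(0.0225 = \frac {a^2}{(2a)^2(2a+1)} = \frac {1}{4(2a+1)}\)

\(4(2a+1) = \frac {1}{0.0225} = \frac {400}{9}\)

\(8a + 4 = \frac {400}{9}\)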

\(8a = \frac {364}{9}\)

\(a = (\frac {364}{9})(\frac {1}{8}) \approx 5.06\)

Since a = b, we have a ≈ 5.06 and b ≈ 5.06.
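The algebra can be cross-checked numerically. The sketch below assumes the general moment-matching identities a + b = m(1 - m)/s² - 1 and a = m(a + b), where m is the prior mean and s the prior standard deviation. The helper name `beta_from_moments` is hypothetical and not part of the original solution.

```r
# Hypothetical helper (not from the original solution): convert a prior mean
# and standard deviation into Beta(a, b) shape parameters by moment matching.
beta_from_moments <- function(m, s) {
  # The variance of a Beta distribution can never reach m * (1 - m)
  stopifnot(m > 0, m < 1, s > 0, s^2 < m * (1 - m))
  k <- m * (1 - m) / s^2 - 1   # k = a + b
  c(a = m * k, b = (1 - m) * k)
}

shapes <- beta_from_moments(0.5, 0.15)
print(round(shapes, 2))
```

With m = 0.5 and s = 0.15, both shape parameters come out to 91/18, which rounds to 5.06.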

b. Out of the 68 students that she polls, y=21 support the current president. Determine the posterior distribution using the Beta(5.06,5.06) prior.

a = 5.06

b = 5.06

s = 21 (students who support the president)

f = 68 - 21 = 47 (students who do not)

\(a' = s+a\)

\(a' = 21+5.06\)

\(a' = 26.06\)

\(b' = f+b\)

\(b' = 47+5.06\)

\(b' = 52.06\)

Thus, the posterior is Beta(26.06, 52.06)
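The update rule used here follows from Beta-binomial conjugacy. As a brief sketch, writing the prior density as \(g(p)\) and the posterior as \(g(p \mid y)\) (notation assumed here, not taken from the original):

\(g(p) \propto p^{a-1}(1-p)^{b-1}\)

\(P(y \mid p) \propto p^{s}(1-p)^{f}\)

\(g(p \mid y) \propto p^{a+s-1}(1-p)^{b+f-1}\)

The last line is the kernel of a Beta(a+s, b+f) density, which is why the number of successes is added to a and the number of failures to b.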

c. Plot both the prior and posterior distributions.

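# Grid of candidate values for p (the grid size only controls plot smoothness)
p <- seq(0, 1, length.out = 200)
a <- 5.06   # prior shape parameters
b <- 5.06
s <- 21     # supporters in the sample
f <- 47     # non-supporters in the sample

prior <- dbeta(p, a, b)
post <- dbeta(p, a + s, b + f)
# Posterior drawn first because it has the taller peak and sets the y-axis range
plot(p, post, type = "l", ylab = "Density", lty = 2, lwd = 3)
lines(p, prior, lty = 3, lwd = 3)
# Legend line types match the curves: prior dotted (3), posterior dashed (2)
legend(.7, 4, c("Prior", "Posterior"),
       lty = c(3, 2), lwd = c(3, 3))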

d. Construct and interpret a 90% credible interval for p.

ProbBayes::beta_interval(0.9,c(26.06, 52.06))

Given the observed data and the Beta(5.06, 5.06) prior, there is a 90% posterior probability that p, the true proportion of students who support the president, lies between 0.250 and 0.422.
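The interval can also be cross-checked with base R. This sketch assumes the reported interval is equal-tailed (5% of posterior probability below the lower endpoint and 5% above the upper one); the variable names are illustrative only.

```r
# Equal-tailed 90% credible interval for the Beta(26.06, 52.06) posterior,
# computed with base R's Beta quantile function.
shape_post <- c(26.06, 52.06)
level <- 0.90
half_alpha <- (1 - level) / 2
ci <- qbeta(c(half_alpha, 1 - half_alpha), shape_post[1], shape_post[2])
print(round(ci, 3))

# Sanity check: the posterior mass between the two endpoints equals the level.
coverage <- diff(pbeta(ci, shape_post[1], shape_post[2]))
print(coverage)
```

If the endpoints differ noticeably from 0.250 and 0.422, the equal-tailed assumption is the first thing to revisit.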

Problem 2. Suppose Sophie, the editor of the student newspaper, surveyed 68 students to determine the level of support for the current president of the students’ association. Out of the 68 students that she polls, y=21 support the current president. She needs to determine her prior distribution for p, the proportion of students who support the president. Suppose it is known that the 25th and 75th percentiles of a Beta(a,b) prior are 0.393 and 0.607, respectively.

a. Find a and b.

library(ProbBayes)
beta.select(list(x=0.393,p=0.25),
            list(x=0.607,p=0.75))
## [1] 5.1 5.1

a = 5.1, b = 5.1
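As a quick check on the fitted prior, the quartiles of Beta(5.1, 5.1) can be computed with base R and compared with the stated percentiles. This is only a sanity check on the rounded shape parameters, so small discrepancies are expected.

```r
# Quartiles of the Beta(5.1, 5.1) prior returned by beta.select
q <- qbeta(c(0.25, 0.75), 5.1, 5.1)
print(round(q, 3))

# Difference from the stated 25th and 75th percentiles (0.393 and 0.607)
print(round(q - c(0.393, 0.607), 3))
```

Because a = b, the prior is symmetric about 0.5, so the two quartiles should sum to 1.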

b. Find and plot the posterior distribution.

a = 5.1

b = 5.1

s = 21

f = 47

\(a' = s+a\)

\(a' = 21+5.1\)

\(a' = 26.1\)

\(b' = f+b\)

\(b' = 47+5.1\)

\(b' = 52.1\)

Thus, the posterior is Beta(26.1, 52.1)

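# Grid of candidate values for p (the grid size only controls plot smoothness)
p <- seq(0, 1, length.out = 200)
a <- 5.1   # prior shape parameters from beta.select
b <- 5.1
s <- 21    # supporters in the sample
f <- 47    # non-supporters in the sample

post <- dbeta(p, a + s, b + f)
plot(p, post, type = "l", ylab = "Density", lty = 2, lwd = 3)
# Single legend entry whose line type matches the plotted posterior curve
legend(.7, 4, "Posterior", lty = 2, lwd = 3)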

c. Sophie claims that at least 85% of the students support the current president. Should we reject her claim? Support your answer.

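# Posterior probability that p is at least 0.85, shown as a shaded area
ProbBayes::beta_area(lo = 0.85, hi = 1.0,
          shape_par = c(26.1, 52.1))

# Same probability computed exactly from the Beta CDF
1-pbeta(0.85,26.1, 52.1)
## [1] 0
# Monte Carlo estimate from 1000 posterior draws
rval <- rbeta(1000, 26.1, 52.1)
prop <- sum(rval >= 0.85) / 1000
print(prop)
## [1] 0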

All three approaches give \(P(p \ge 0.85 \mid y) \approx 0\). The posterior places essentially no probability on support of 85% or more, so we reject Sophie's claim.

d. Using the posterior distribution, how many students are expected to support the current president if the survey is given to a sample of 100 students?

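# Beta-binomial predictive probabilities for 0, ..., 100 supporters out of 100
pred.prob.dist <- pbetap(c(26.1, 52.1),100,0:100)
# Highest-probability set of outcomes covering at least 0.85
discint(cbind(0:100,pred.prob.dist), 0.85)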
## $prob
## [1] 0.8623549
## 
## $set
##  [1] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43

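Thus, \(P(23 \le \tilde{Y} \le 43) \approx 0.86\): there is about an 86% chance that between 23 and 43 of the 100 sampled students will support the current president. The expected number of supporters is \(100 \cdot \frac{a'}{a'+b'} = 100 \cdot \frac{26.1}{78.2} \approx 33\).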
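The predictive calculation can be reproduced without any package, assuming the predictive distribution is the standard beta-binomial. The helper name `dbetabinom_sketch` is hypothetical, and the sketch is meant only as a cross-check of the numbers above.

```r
# Beta-binomial predictive pmf for the number of supporters in a new sample
# of n students, given a Beta(a_post, b_post) posterior. Base R only;
# dbetabinom_sketch is a hypothetical helper name, not a package function.
dbetabinom_sketch <- function(y, n, a_post, b_post) {
  # Work on the log scale to avoid overflow in the binomial coefficient
  exp(lchoose(n, y) + lbeta(a_post + y, b_post + n - y) - lbeta(a_post, b_post))
}

n <- 100
y <- 0:n
pmf <- dbetabinom_sketch(y, n, 26.1, 52.1)

pred_mean <- sum(y * pmf)                   # expected number of supporters
prob_23_43 <- sum(pmf[y >= 23 & y <= 43])   # predictive P(23 <= Y <= 43)
print(round(c(mean = pred_mean, prob = prob_23_43), 3))
```

The predictive mean should agree with 100 · 26.1/78.2, and the interval probability with the value reported by `discint`.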

e. Does the posterior distribution describe the prediction in (d) well? Explain.

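# Simulate the posterior predictive distribution: draw p from the posterior,
# then draw the number of supporters in a new sample of 100 given each p
pred_p_sim <- rbeta(1000, 26.1, 52.1)
pred_y_sim <- rbinom(1000,100, pred_p_sim)
hist(pred_y_sim,xlab="Simulated Y", main = " ")
# Dashed red line marks the mean of the simulated counts
abline(v = mean(pred_y_sim), col = "red", lwd = 3, lty = 2)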

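The observed support rate, 21/68 ≈ 0.31, corresponds to about 31 supporters out of 100, which falls near the middle of the simulated distribution. Note also that the Bayesian point estimate of p is the mean of Beta(a', b'), \(\frac{26.1}{78.2} \approx 0.334\), so about 33 supporters are expected in a sample of 100, in line with the center of the histogram. This suggests that the posterior distribution describes the prediction in (d) well.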