a. The estimate of p is \(\hat{p} = \frac{47}{80} = 0.5875\).

This estimate means that more than half of the sample, approximately 59%, is lactose intolerant.

b. Calculate the posterior distribution of p. How does this Bayesian estimate compare with the answer in (a)?

prior = c(0.1,0.2,0.44,0.26)
p = c(0.4,0.5,0.6,0.7)
likelihood = p^47*(1-p)^33
PL = prior*likelihood
posterior = PL/sum(PL)
Bayesbox1 = as.data.frame(cbind(p,prior,likelihood, PL, posterior))
knitr::kable(Bayesbox1)
p prior likelihood PL posterior
0.4 0.10 0 0 0.0006491
0.5 0.20 0 0 0.1135404
0.6 0.44 0 0 0.8337986
0.7 0.26 0 0 0.0520119
posteriormean = sum(p*posterior)
posteriormean
## [1] 0.5937173
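posteriorvar = sum((p - posteriormean)^2*posterior)
signif(posteriorvar, 3)
## [1] 0.00164

Using the posterior mean as the Bayesian estimate gives \(p \approx 0.5937173\), with posterior variance \(\approx 0.00164\) (posterior standard deviation \(\approx 0.041\)). The variance is computed as \(\sum_i (p_i - E[p])^2 \, P(p_i \mid y)\); calling var(posterior) would instead return the sample variance of the four posterior probabilities, which does not measure the spread of \(p\). The Bayesian estimate exceeds the answer in (a) by only 0.0062173, so the two estimates are nearly equal, though not identical. The small difference arises because \(p\) is restricted to four candidate values, and most of the posterior mass (about 0.83) falls on \(p = 0.6\), the candidate closest to 0.5875.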
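The likelihood and PL columns in the table for part (b) appear as 0. This is most likely a display artifact: the values are tiny but nonzero, and they are rounded away when the table is rendered. A minimal sketch of one way to keep them visible is to format those columns in scientific notation before tabulating. The name `Bayesbox1_sci` is introduced here for illustration only, and `print()` stands in for `knitr::kable()` so the snippet needs no packages.

```r
# Values copied from part (b).
prior <- c(0.1, 0.2, 0.44, 0.26)
p <- c(0.4, 0.5, 0.6, 0.7)
likelihood <- p^47 * (1 - p)^33
PL <- prior * likelihood
posterior <- PL / sum(PL)

# Convert the tiny columns to text in scientific notation so that
# rounding at display time cannot collapse them to 0.
# Bayesbox1_sci is a hypothetical name, not part of the original solution.
Bayesbox1_sci <- data.frame(
  p = p,
  prior = prior,
  likelihood = format(likelihood, scientific = TRUE, digits = 3),
  PL = format(PL, scientific = TRUE, digits = 3),
  posterior = round(posterior, 7)
)
print(Bayesbox1_sci)
```

The posterior column is unchanged by this formatting; only the presentation of the intermediate columns differs.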

c. If Fatima had instead collected a sample of 800 adults, of whom 470 had lactose intolerance, how does that change the posterior distribution?

likelihood2 = p^(470)*(1-p)^(330)
PL2 = prior*likelihood2
posterior2 = PL2/sum(PL2)
Bayesbox2 = as.data.frame(cbind(p,prior,likelihood2, PL2, posterior2))
knitr::kable(Bayesbox2)
p prior likelihood2 PL2 posterior2
0.4 0.10 0 0 0.0000000
0.5 0.20 0 0 0.0000026
0.6 0.44 0 0 0.9999974
0.7 0.26 0 0 0.0000000
posteriormean2 = sum(p*posterior2)
posteriormean2
## [1] 0.5999997
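posteriorvar2 = sum((p - posteriormean2)^2*posterior2)
signif(posteriorvar2, 3)
## [1] 2.65e-08

With the larger sample, the posterior distribution concentrates almost entirely on \(p = 0.6\): its posterior probability is about 0.9999974, compared with about 0.834 in (b), even though its prior probability was only 0.44. The posterior mean, \(\approx 0.5999997\), is close to the estimate in (b) but is now essentially equal to 0.6, the candidate value nearest the sample proportion \(470/800 = 0.5875\). The posterior variance, computed as \(\sum_i (p_i - E[p])^2 \, P(p_i \mid y)\), is about \(2.65 \times 10^{-8}\), so almost no uncertainty remains among the four candidate values. (Calling var(posterior2) would return the sample variance of the four posterior probabilities, which is not the spread of \(p\).) In short, ten times as much data with the same sample proportion makes the likelihood dominate the prior.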
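With \(n = 800\) the likelihood values are extremely small, and for much larger samples a product such as `p^y * (1 - p)^(n - y)` would eventually underflow to 0 in double precision. A common workaround, sketched here as an assumption rather than as part of the original solution, is to work on the log scale and subtract the maximum before exponentiating. The names `log_PL`, `w`, `posterior2_log`, and `posterior2_direct` are introduced only for this illustration.

```r
prior <- c(0.1, 0.2, 0.44, 0.26)
p <- c(0.4, 0.5, 0.6, 0.7)
y <- 470  # adults with lactose intolerance
n <- 800  # sample size

# Log of prior * likelihood (the binomial coefficient cancels on normalising).
log_PL <- log(prior) + y * log(p) + (n - y) * log(1 - p)

# Subtract the maximum before exponentiating so the largest term is exp(0) = 1,
# then normalise so the posterior sums to 1.
w <- exp(log_PL - max(log_PL))
posterior2_log <- w / sum(w)

# Direct calculation, which still works here because n = 800 does not underflow.
posterior2_direct <- prior * p^y * (1 - p)^(n - y)
posterior2_direct <- posterior2_direct / sum(posterior2_direct)

# The two routes should agree up to floating-point tolerance.
print(all.equal(posterior2_log, posterior2_direct))
```

Both routes give the same posterior for this problem; the log-scale version simply remains usable when the sample is large enough that the direct product is no longer representable.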

d. Use the posterior distribution in (b) as prior probabilities.

Prior = posterior
likelihood3 = p^(47)*(1-p)^(33)
PL3 = Prior*likelihood3
Posterior3 = PL3/sum(PL3)
Bayesbox3 = as.data.frame(cbind(p,Prior,likelihood3, PL3, Posterior3))
knitr::kable(Bayesbox3)
p Prior likelihood3 PL3 Posterior3
0.4 0.0006491 0 0 0.0000025
0.5 0.1135404 0 0 0.0389490
0.6 0.8337986 0 0 0.9547613
0.7 0.0520119 0 0 0.0062872
posteriormean3 = sum(p*Posterior3)
posteriormean3
## [1] 0.5967333
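posteriorvar3 = sum((p - posteriormean3)^2*Posterior3)
signif(posteriorvar3, 3)
## [1] 0.000442

The posterior distribution is proportional to the product of the prior distribution and the likelihood, so both factors influence the result. Here both favor \(p = 0.6\): the prior (the posterior from (b)) already places probability of about 0.834 on it, and the likelihood is also largest there among the four candidate values. As a result, its posterior probability rises to about 0.955. The posterior variance, computed as \(\sum_i (p_i - E[p])^2 \, P(p_i \mid y)\), is about 0.000442, so the distribution is tightly concentrated around its mean. (Calling var(Posterior3) would return the sample variance of the four posterior probabilities, which is not the spread of \(p\).) The Bayes estimate \(p \approx 0.5967333\) is consistent with the sample point estimate \(\hat{p} = 0.5875\) obtained from \(Y = 47\) adults with lactose intolerance in a sample of \(n = 80\), although it is pulled slightly toward 0.6 because \(p\) is restricted to the four candidate values.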
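Because part (d) applies the same likelihood to the posterior from (b), the two-step update should match a single update of the original prior with the pooled counts, 94 of 160. The sketch below checks this numerically. `update_posterior` is a hypothetical helper written for this illustration, not a function from the original solution or from any package.

```r
prior <- c(0.1, 0.2, 0.44, 0.26)
p <- c(0.4, 0.5, 0.6, 0.7)

# Hypothetical helper: one discrete Bayes update for y successes in n trials.
update_posterior <- function(prior, p, y, n) {
  PL <- prior * p^y * (1 - p)^(n - y)
  PL / sum(PL)
}

# Two-step update: 47 of 80, then the same data again with the
# first posterior used as the new prior (as in part (d)).
step1 <- update_posterior(prior, p, 47, 80)
step2 <- update_posterior(step1, p, 47, 80)

# One-step update of the original prior with the pooled data: 94 of 160.
pooled <- update_posterior(prior, p, 94, 160)

# The two posteriors should agree up to floating-point tolerance.
print(all.equal(step2, pooled))
```

This illustrates why sequential updating is order-free for independent data: multiplying by the likelihood twice is the same as multiplying once by the likelihood of the combined sample.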