a.p-hat= 47/80 = 0.5875 is the point estimate for p. The estimate of the true proportion or percentage of adults who are lactose intolerant from the population is 58.75%.
b.p-hat= 0.6 is the bayesian estimate for p.
Variance = 0.1536057
There is 0.0125 difference between the point estimate in (a) and Bayesian estimate in (b).
library(gt)
library(dplyr)
p <- seq(0.4, 0.7, by= 0.1)
prior <- c(0.1,0.2,0.44,0.26)
likelihood <- p^{47}*(1-p)^{33}
PL <- prior*likelihood
posterior <- PL/sum(PL)
posteriordist <- as.data.frame(cbind(p,prior,likelihood,posterior))
posteriordist %>%
gt() %>%
cols_align(align="center", columns = everything())
| p | prior | likelihood | posterior |
|---|---|---|---|
| 0.4 | 0.10 | 9.458251e-27 | 0.0006491288 |
| 0.5 | 0.20 | 8.271806e-25 | 0.1135403807 |
| 0.6 | 0.44 | 2.761140e-24 | 0.8337986003 |
| 0.7 | 0.26 | 2.914804e-25 | 0.0520118903 |
var(posteriordist$posterior)
## [1] 0.1536057
p <- seq(0.4, 0.7, by= 0.1)
prior <- c(0.1,0.2,0.44,0.26)
likelihood <- p^{470}*(1-p)^{330}
PL <- prior*likelihood
posterior <- PL/sum(PL)
posteriordist <- as.data.frame(cbind(p,prior,likelihood,posterior))
posteriordist %>%
gt() %>%
cols_align(align="center", columns = everything())
| p | prior | likelihood | posterior |
|---|---|---|---|
| 0.4 | 0.10 | 5.729392e-261 | 5.055567e-26 |
| 0.5 | 0.20 | 1.499697e-241 | 2.646640e-06 |
| 0.6 | 0.44 | 2.575638e-236 | 9.999974e-01 |
| 0.7 | 0.26 | 4.426830e-246 | 1.015611e-10 |
var(posteriordist$posterior)
## [1] 0.2499982
The above calculations show that sample size matters as well as the prior probabilities as it affects the posterior distribution. The bigger the sample size, the distribution will be more concentrated to the p with the biggest prior probability(p=0.6) given the same sample proportion.Problem C and D also shows that prior probabilties will greatly affect the posterior distribution; the bigger probability given in prior, the bigger its corresponding posterior probabilty compared to other p which has smaller prior probability.
p <- seq(0.4, 0.7, by= 0.1)
prior <- c(0.1,0.2,0.44,0.26)
likelihood <- p^{47}*(1-p)^{33}
PL <- prior*likelihood
posterior1 <- PL/sum(PL)
prior1 <- posterior1
likelihood <- p^{470}*(1-p)^{330}
PL <- prior1*likelihood
posterior <- PL/sum(PL)
posteriordist <- as.data.frame(cbind(p,prior1,likelihood,posterior))
posteriordist %>%
gt() %>%
cols_align(align="center", columns = everything())
| p | prior1 | likelihood | posterior |
|---|---|---|---|
| 0.4 | 0.0006491288 | 5.729392e-261 | 1.731781e-28 |
| 0.5 | 0.1135403807 | 1.499697e-241 | 7.928800e-07 |
| 0.6 | 0.8337986003 | 2.575638e-236 | 9.999992e-01 |
| 0.7 | 0.0520118903 | 4.426830e-246 | 1.072134e-11 |
mean = sum(p*posterior)
mean
## [1] 0.5999999
var(posteriordist$posterior)
## [1] 0.2499995