Posterior distribution

p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- rep( 1 , 1000 )
likelihood <- dbinom( 6 , size=9 , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
set.seed(100)
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )

Use the values in samples to answer the questions that follow.

3E1. How much posterior probability lies below p = 0.2?

sum(samples < 0.2) / 1e4
## [1] 4e-04

3E2. How much posterior probability lies above p = 0.8?

mean(samples > 0.8)
## [1] 0.1116

3E3. How much posterior probability lies between p = 0.2 and p = 0.8?

mean(samples > 0.2 & samples < 0.8)
## [1] 0.888

3E4. 20% of the posterior probability lies below which value of p?

quantile(samples, 0.20)
##       20% 
## 0.5185185
# 20% of the posterior probability lies below p= 0.52.

3E5. 20% of the posterior probability lies above which value of p?

quantile(samples, 1 - 0.20)
##       80% 
## 0.7557558
# 20% of the posterior probability lies above p= 0.76.

3E6. Which values of p contain the narrowest interval equal to 66% of the posterior probability?

local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.r-project.org"
       options(repos=r)})
install.packages(c('coda','mvtnorm'))
## package 'coda' successfully unpacked and MD5 sums checked
## package 'mvtnorm' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\wxx19\AppData\Local\Temp\RtmpcRvIOv\downloaded_packages
options(repos=c(getOption('repos'),rethinking='http://xcelab.net/R'))
install.packages('rethinking',type='source')
library(rethinking)
## Loading required package: rstan
## Loading required package: StanHeaders
## Loading required package: ggplot2
## rstan (Version 2.19.3, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For improved execution time, we recommend calling
## Sys.setenv(LOCAL_CPPFLAGS = '-march=corei7 -mtune=corei7')
## although this causes Stan to throw an error on a few processors.
## Loading required package: parallel
## Loading required package: dagitty
## rethinking (Version 2.01)
## 
## Attaching package: 'rethinking'
## The following object is masked from 'package:stats':
## 
##     rstudent
HPDI(samples, prob = 0.66)
##     |0.66     0.66| 
## 0.5085085 0.7737738
##The narrowest interval equal to 66% of the posterior probability is within [0.51, 0.77].

3E7. Which values of p contain 66% of the posterior probability, assuming equal posterior probability both below and above the interval?

PI(samples, prob = 0.66)
##       17%       83% 
## 0.5025025 0.7697698
##The interval of [0.50, 0.77] contains 66% of the posterior probability with equal probability mass both below and above.

3M1. Suppose the globe tossing data had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.

p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- rep( 1 , 1000 )
likelihood_M <- dbinom( 8 , size=15 , prob=p_grid )
posterior_M <- likelihood_M * prior
posterior_M <- posterior_M / sum(posterior_M)
plot(x = p_grid, y = posterior_M, type = "l")

3M2. Draw 10,000 samples from the grid approximation from above. Then use the samples to calculate the 90% HPDI for p.

samples_M <- sample(p_grid, prob = posterior_M, size = 10000, replace = TRUE)
HPDI(samples_M, prob = .90)
##      |0.9      0.9| 
## 0.3293293 0.7167167
#The 90% HPDI is within [0.33, 0.72].

3M3. Construct a posterior predictive check for this model and data. This means simulate the distribution of samples, averaging over the posterior uncertainty in p. What is the probability of observing 8 water in 15 tosses?

w <- rbinom(10000, size = 15, prob = samples_M)
simplehist(w)

mean(w == 8)
## [1] 0.1444
#The probability of observing 8 water in 15 tosses is equal to 14.4%. Accordiing to the histogram, this is the most likely outcome.

3M4. Using the posterior distribution constructed from the new (8/15) data, now calculate the probability of observing 6 water in 9 tosses.

w_2 <- rbinom(10000, size = 9, prob = samples_M)
simplehist(w_2)

mean(w_2 == 6)
## [1] 0.1751
#The probability of observing of 6 water in 9 tosses is 17.5%.

3M5. Start over at 3M1, but now use a prior that is zero below p = 0.5 and a constant above p = 0.5. This corresponds to prior information that a majority of the Earth’s surface is water. Repeat each problem above and compare the inferences. What difference does the better prior make? If it helps, compare inferences (using both priors) to the true value p = 0.7.

p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior_5 <- ifelse(p_grid < .5, 0, 1)
likelihood_M <- dbinom( 8 , size=15 , prob=p_grid )
posterior_5 <- likelihood_M * prior_5
posterior_5 <- posterior_5 / sum(posterior_5)
plot(x = p_grid, y = posterior_5, type = "l")

samples_5 <- sample(p_grid, prob = posterior_5, size = 10000, replace = TRUE)
HPDI(samples_5, prob = .90)
##      |0.9      0.9| 
## 0.5005005 0.7117117
#The 90% HPDI is within [0.50, 0.71].
w_3 <- rbinom(10000, size = 15, prob = samples_5)
simplehist(w_3)

mean(w_3 == 8)
## [1] 0.1589
#The probability of observing 8 water in 15 tosses is equal to 15.9%. 
w_4 <- rbinom(10000, size = 9, prob = samples_5)
simplehist(w_4)

mean(w_4 == 6)
## [1] 0.2415
#The probability of observing of 6 water in 9 tosses is 24.2%.

3M6. Suppose you want to estimate the Earth’s proportion of water very precisely. Specifically, you want the 99% percentile interval of the posterior distribution of p to be only 0.05 wide. This means the distance between the upper and lower bound of the interval should be 0.05. How many times will you have to toss the globe to do this?

compute_pi_width <- function(N, true_p) {
  likelihood_6 <- dbinom( round(N*true_p), size=N, prob=p_grid )
  posterior_6 <- likelihood_6 * prior
  posterior_6 <- posterior_6 / sum(posterior_6)
  samples_6 <- sample(p_grid, prob=posterior_6, size=1e4, replace=TRUE )
  interval <- PI(samples_6, prob=0.99)
  names(interval) <- NULL
  diff( interval )
}

true_p <- 0.7

N <- 2000
compute_pi_width(N, true_p)
## [1] 0.05405405
# We will have to toss 2000 times.