01-problem-set.knit

-Joseph Bakita: -800990986:

Question1

Your friend just became interested in Bayesian statistics. In one paragraph or less (nocode),explain the following to them: * Why/when is Bayesian statistics useful? * What are the similarities in Bayesian and frequentist statistics?

Bayesian statistics is most useful when there exists intuition or understanding of the data prior to running the model where classical or machine models would not be able to discover. These probabilistic methods shed light on distributions that can be used to extrapolate in more detail the uncertainty behind observations and variables than other methods. They do share concepts of distributions and frequencies commonplace in frequentism, alongside ideas of Gaussian approximations that are similar at a base level.

Question 2

Suppose the globe tossing data (Chapter 2) had turned out to be 4 water and 11 land. Construct the posterior distribution, using grid approximation. Use the same flat prior as in the book. Plot the posterior. Use 1000 grid approximations and set the set as set.seed(100).

set.seed(100)

p2 <- seq( from=0 , to=1 , length.out=1000) #flat prior

q2 = dbinom(4 , size=15 , prob=p2) #posterior (chance of 4 waters given N=15 and p=p2[i])

q2b = q2 / sum(q2) #q2 scaled such that q2b is a pdf

plot(p2, q2b, type="l", ylab = "Likelihood")

Behold! A graph showing the most likely values of p given that we observe 4 waters (and 11 lands) given 15 tosses in our simulation. Choose value \(0\leq p \leq 1\) (at the precise value to 3 decimal points) and q2b[1000*p] will give you the probability. For example, the likelihood of p=.3:

q2b[300]

## [1] 0.00350726

or that \(p\ge .7\):

sum(q2b[700:1000])

## [1] 0.0002733507

showing that it is more than 10x as likely that p=.300 precisely than for p \(\ge .7\).

The distribution appears to be approximately Gaussian (Normal) with a right skew (the left side of the crest is significantly steeper than the right). I will go into more detail on the choice (or, more accurately, lack of choice) of prior in the next question.

Question 3

Now suppose the data are 4 water and 2 land. Compute the posterior again, but this time use a prior that is zero below p = 0.5 and a constant above p = 0.5. This corresponds to prior information that a majority of the Earth’s surface is water. Plot the new posterior. Use 1000 grid approximations and set the set as set.seed(100).

p3 <- ifelse( p2 < 0.5 , 0 , 1) #prior that vets all probabilities < .5

q3 = dbinom( 4 , size=6 , prob=p2)

q3a = p3*q3  #prior multiplied by distribution to produce posterior

q3b = q3a / sum(q3a)

plot(p2, q3b, type="l", ylab = "Likelihood")

The graph works identically to the graph in Question 3. The parameters for the binomial coefficients (in this case 4 water with N=6) and, more interestingly, a new prior. In the previous question, a flat prior was used, symbolizing the assumption of no factors. In contrast, now their is a prior that multiplies all water/land ratios < 1 by 0, and 1 otherwise. This represents a knowledge or hypothesis that our globe has more water than land. Now, our model reflects our assumptions and scales our likelihood indicators accordingly. To illustrate, the blue graph is the model, just with the prior replaced with a flat prior.

plot(p2, q3b, type="l", ylab = "Likelihood")
lines(p2,q3/sum(q3), col="blue")

Question 4

For the posterior distribution from 3, compute 89% percentile and HPDI intervals. Compare the widths of these intervals. Which is wider? Why? If you had only the information in the interval, what might you misunderstand about the shape of the posterior distribution?

library(rethinking)

## Loading required package: parallel

## rethinking (Version 2.13.2)

## 
## Attaching package: 'rethinking'

## The following object is masked from 'package:stats':
## 
##     rstudent

s4=sample(p2 , size=1e4 , replace=TRUE , prob=q3b)
PI(s4, prob=.89)

##        5%       94% 
## 0.5245245 0.8798799

HPDI(s4, prob=.89)

##     |0.89     0.89| 
## 0.5005005 0.8388388

Before the percentile and HPDI intervals are even calculate, it is inferred the percentile interval cannot be thinner than the HPDI interval. The HPDI interval is defined as the densest observations for a given bracket, compared to the percentile interval, whose width is derived from the count of observations from the mean. Accordingly, the percentile interval [.527,.881] has apx. width .354 compared to the HPDI interval [.501,.842]’s apx. width of .341. By themselves, they provide a very small slice of information. It provides no information on the actual shape of the distribution or the fact that the HPDI lower bound lies on the decision boundary.

Optional (not graded)

Suppose there is bias in sampling so that Land is more likely than Water to be recorded. Specifically, assume that 1-in-5 (20%) of Water samples are accidentally recorded instead as “Land”. First, write a generative simulation of this sampling process. Assuming the true proportion of Water is 0.70, what proportion does your simulation tend to produce instead? Second, using a simulated sample of 20 tosses, compute the unbiased posterior distribution of the true proportion of water.

p5 = seq(from=0 , to=1 , length.out=20) #shared prior

q5 = dbinom(14, size=25, prob=p5)
q5a =  q5 / sum(q5) #green posterior

q5b = dbinom(14, size = 20, prob = p5)
q5c = q5b/sum(q5b) #red posterior

plot(p5, q5a, type="l", ylab = "Likelihood", col="green")
lines(p5, q5c, ylab = "Likelihood", col="red")

s5a=sample(p5 , size=20 , replace=TRUE , prob=q5a)
s5b=sample(p5 , size=20 , replace=TRUE , prob=q5c)

The biased model in green is constructed a lot like the previous models from the assignment, except with a small calculation done prior. With the knowledge that \(1-p(W) = P(L) = P(L|True) + P(L|False)\), \(P(L|False) = .25P(W)\), and \(P(L|True) = .3\), \(P(W)\) can be derived as .56 or \(\frac{14}{25}\). The green graph is the probability of 14 waters given \(N=25\), while the red graph gives the unbiased which is conveniently identical except that it is measuring the probability of 14 waters given \(N=20\). The two plots almost look like they are staring into a mirror. Should we force a normal approximation, red would have mean and variance of

mean(s5a , adj=0.01 )

## [1] 0.5657895

var(s5a)

## [1] 0.008128007

and green would have

mean(s5b , adj=0.01 )

## [1] 0.6921053

var(s5b)

## [1] 0.005911941