Conditional probability: For two events \(A\) and \(B\), \[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(AB)}{P(B)}.\] From the definition it follows that \[ P(AB) = P(B) P(A \mid B) = P(A) P(B \mid A),\]
and if \(\bar{A}\) is the complementary event to \(A\) (\(P(A \cup \bar{A}) = 1\)), then \[ P(A \mid B) = \frac{P(A) P(B \mid A)}{P(B)} = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(\bar{A}) P(B \mid \bar{A})}.\]
Fact: Bayes' theorem. If \(H_1, \dots, H_k\) are such that \(P(H_1 \cup \dots \cup H_k) = 1\) and \(H_i \cap H_j = \emptyset\) for \(i \ne j\), then \(P(H_i \mid B) = \frac{P(H_i) P(B \mid H_i)}{\sum_{j=1}^k P(H_j) P(B \mid H_j)}\).
\[ P(\theta \mid Data) = \frac{P(Data\mid \theta)P(\theta)}{P(Data)} \]
i.e.,
\[ \text{Posterior Distribution} \ = \ \frac{\text{Likelihood} \ \times \ \text{Prior}}{P(Data)}\]
Steps of Bayesian statistics:

1. Specify the model \(M_{\theta}\).
2. Define the prior \(F_{\theta}(x)\).
3. From the \(Data\) and the model \(M_{\theta}\) find the likelihood.
4. Find the denominator by integrating over all \(\theta\).
5. Find the posterior \(F_{\theta}\).
Both terms can only be calculated after the model generating the data is specified. For example, in the mammography experiment the model is binomial: the data are generated by a binomial distribution with parameter \(\theta = p\) (the probability of disease).
According to the CDC, the probability of breast cancer among women aged 50-60 is 2.28%.
Screening by mammography has sensitivity (i.e. the probability of detecting the disease correctly) of 75%-94%. The specificity of the test (i.e. the probability of a correct “no disease” conclusion) is 83%-98%. Assume the most optimistic parameters of the test (94% sensitivity and 98% specificity). What is the probability that a randomly selected woman with a positive test result has the disease?
\(\Longrightarrow\) Need to calculate: \(P(\text{Disease} \mid \text{Test Positive})\) or \(P(\theta_1 \mid \tau_1) = \frac{P(\tau_1 \mid \theta_1)P(\theta_1)}{P(\tau_1 \mid \theta_1)P(\theta_1) + P(\tau_1 \mid \theta_0)P(\theta_0)}\)
#Prior p(theta1)
p.theta1 <- .0228
#Sensitivity: p(tau1|theta1)
p.tau1.theta1 <- .94
#Specificity: p(tau0|theta0)
p.tau0.theta0 <- .98
#Complementary probabilities
p.theta0 <- 1-p.theta1
p.tau1.theta0 <- 1-p.tau0.theta0
#Bayes
(p.theta1.tau1 <- p.tau1.theta1*p.theta1/(p.tau1.theta1*p.theta1+p.tau1.theta0*p.theta0))
## [1] 0.5230379
Assume that a randomly selected woman tested positive and was then retested, with a negative result on the second test.
After both tests, what is the probability of disease?
\(\Longrightarrow\) Need to calculate \(P(\text{Disease} \mid \text{Test Negative})\), where the prior \(P(\text{Disease})\) is now \(P(\text{Disease} \mid \text{Test Positive})\) obtained from the first test. That is, \(P(\theta_1^{\prime} \mid \tau_0) = \frac{P(\tau_0 \mid \theta_1) P(\theta_1^{\prime})}{P(\tau_0 \mid \theta_1) P(\theta_1^{\prime}) + P(\tau_0 \mid \theta_0)\left(1 - P(\theta_1^{\prime})\right)}\), where \(P(\theta_1^{\prime}) = P(\theta_1 \mid \tau_1)\).
# Set the prior to the new probability of having the disease:
pDisease <- p.theta1.tau1
# Bayes rule for second, negative test:
(p.theta1.tau0 <- ( (1.0-p.tau1.theta1) * pDisease /
  ( (1.0-p.tau1.theta1) * pDisease + (1.0-p.tau1.theta0) * (1.0-pDisease) ) ))
## [1] 0.06291489
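As a side check (my addition, not in the original notes; it assumes the two test results are conditionally independent given the disease status), the order of the two updates does not matter: updating first on the negative test and then on the positive one gives the same final probability.
# Negative test first, starting from the original prior:
p.neg.first <- (1-p.tau1.theta1)*p.theta1 /
  ( (1-p.tau1.theta1)*p.theta1 + (1-p.tau1.theta0)*p.theta0 )
# Then update on the positive test:
(p.both <- p.tau1.theta1*p.neg.first /
  ( p.tau1.theta1*p.neg.first + p.tau1.theta0*(1-p.neg.first) ))
# Same value as p.theta1.tau0 above (about 0.0629).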
library(faraway)
data(babyfood)
babyfood
Select rows 1 and 3: bottle-fed and breast-fed boys.
(data <- babyfood[c(1,3),c(1:2,4)])
The simplest analysis, as we learned, is the two-sample proportion test.
It means:
77 cases of disease out of 458 bottle-fed baby boys;
47 cases of disease out of 494 breast-fed baby boys.
Comparing the proportions and testing \(H_0: p_1=p_2\)
prop.test(as.matrix(data[,1:2]))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: as.matrix(data[, 1:2])
## X-squared = 10.539, df = 1, p-value = 0.001169
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.02795236 0.11800878
## sample estimates:
## prop 1 prop 2
## 0.1681223 0.0951417
The null hypothesis is rejected.
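For clarity (a quick check I added, not part of the original notes), the sample proportions reported by `prop.test` can be reproduced directly from the counts.
# Sample proportions of disease in the two groups:
c(bottle = 77/458, breast = 47/494)   # about 0.168 and 0.095, as in prop 1 and prop 2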
Then we use logistic regression to find out how much the odds of having the disease change between bottle-fed and breast-fed babies.
mdl <- glm(cbind(disease,nondisease) ~ food, family=binomial, data)
summary(mdl)
##
## Call:
## glm(formula = cbind(disease, nondisease) ~ food, family = binomial,
## data = data)
##
## Deviance Residuals:
## [1] 0 0
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.5990 0.1249 -12.797 < 2e-16 ***
## foodBreast -0.6534 0.1978 -3.303 0.000955 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.1235e+01 on 1 degrees of freedom
## Residual deviance: 1.1680e-13 on 0 degrees of freedom
## AIC: 15.591
##
## Number of Fisher Scoring iterations: 3
Exponentiate the coefficients to express the change as odds ratios.
exp(mdl$coefficients)
## (Intercept) foodBreast
## 0.2020997 0.5202650
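As a quick verification (my addition, not in the original notes), the exponentiated coefficients can be checked directly against the counts selected above: the exponentiated intercept is the odds of disease for bottle-fed boys (the reference level), and `foodBreast` is the ratio of breast-fed to bottle-fed odds.
# Direct check against the counts (77/381 bottle-fed, 47/447 breast-fed):
odds.bottle <- 77/381            # odds of disease, bottle-fed boys
odds.breast <- 47/447            # odds of disease, breast-fed boys
odds.bottle                      # about 0.202, matches exp(Intercept)
odds.breast/odds.bottle          # about 0.520, matches exp(foodBreast)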
Look at the confidence intervals to see the significance and the precision of our conclusions.
library(MASS)
exp(confint(mdl))
## 2.5 % 97.5 %
## (Intercept) 0.1570976 0.2565573
## foodBreast 0.3511082 0.7636948
First, we apply Bayes' theorem to address the same question, although not yet within the Bayesian approach.
Create joint distribution of two binary variables: (“disease”,“nondisease”) and (“bottle”,“breast”).
joint.dist<-data
joint.dist[1:2,1:2] <- joint.dist[1:2,1:2]/sum(joint.dist[1:2,1:2])
joint.dist
sum(joint.dist[1:2,1:2])
## [1] 1
Find the marginal probabilities for the disease and for the type of feeding.
(p.breast<-sum(joint.dist[2,-3]))
## [1] 0.5189076
(p.disease<-sum(joint.dist[,1]))
## [1] 0.1302521
Find the conditional probability of breastfeeding, given that the baby got the disease:
\[ P(\text{breast} \mid \text{disease}) = \frac{P(\text{breast}, \text{disease})}{P(\text{disease})} \]
(p.breast.disease<-joint.dist[2,1]/p.disease)
## [1] 0.3790323
Finally, use Bayes' theorem.
(p.disease.breast<-p.breast.disease*p.disease/p.breast)
## [1] 0.0951417
Could we calculate the same probability directly, without using Bayes' theorem?
(p.disease.breast<-joint.dist[2,1]/p.breast)
## [1] 0.0951417
Why would one prefer using Bayes' theorem rather than calculating the probability directly? Because it follows the logical flow of updating beliefs, so it feels more natural and is easier to understand.
This is still not the Bayesian approach: although we applied the ‘basic’ Bayes' theorem, we used frequentist probabilities to calculate the results.
A true Bayesian approach to the baby food problem would be logistic regression with the food type as a predictor and a prior distribution for the parameters.
We will look at this model later.
Now we use the true Bayesian approach (B-approach) to estimate the probability parameter of a binomial distribution.
The data come from observation of 952 baby boys, of which 124 got the disease: a sequence of 124 ones and 828 zeros.
(Data<-c(rep(0,828),rep(1,124)))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [704] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [741] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [778] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [815] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Probability of success is \(p\).
In the FNP-approach its estimate is \(\hat{p}\).
In the B-approach it is a random variable with prior distribution density \(f_p(x)\).
Then the B-approach means using Bayes' theorem to find the posterior \(f_p(p \mid Data)\).
We will practice this approach in the following section.
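As a preview (a minimal sketch I added, assuming a uniform Beta(1,1) prior; the sections below use a grid approximation instead of this closed form), the conjugate beta-binomial update gives the posterior for \(p\) directly from the 124 successes in 952 observations.
# With a Beta(1,1) (uniform) prior and 124 successes out of 952 trials,
# the conjugate posterior is Beta(1+124, 1+828).
curve(dbeta(x, 1+124, 1+828), from=0, to=0.3,
      xlab="p", ylab="posterior density", main="Beta(125, 829) posterior")
abline(v=124/952, lty=2)   # frequentist estimate p-hat, for comparison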
The following example uses script BernGrid.R from the book by John K. Kruschke.
Source the scripts DBDA2E-utilities.R and BernGrid.R available on the book site.
source("DBDA2E-utilities.R")
##
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************
source("BernGrid.R")
Excerpt from Graph 5.1 on page 111 in [K].
- `pTheta` is the prior distribution of triangular shape.
- `Data` is the observed number of successes \(\gamma\). Start with one flip in which \(\gamma = 1\).
- From `pTheta` and `Data` calculate the `likelihood`.
- From the `likelihood` and the prior calculate the posterior distribution `post`.
- Then run `BernGrid()` and compare the results.

Set \(\theta\), the prior and the data.
(Theta = seq(0 ,1 ,length=5 )) # Sparse teeth for Theta.
## [1] 0.00 0.25 0.50 0.75 1.00
pTheta = pmin(Theta ,1-Theta ) # Triangular shape for pTheta.
(pTheta = pTheta/sum(pTheta)) # Make pTheta sum to 1.0
## [1] 0.00 0.25 0.50 0.25 0.00
Data = c(rep(0,0),rep(1,1)) # Single flip with 1 head
Function `pmin()` returns parallel minimum, i.e. minimum of 2 vectors `Theta` and `1-Theta` by position.
Likelihood: \(p(\gamma \mid \theta)=\theta^{\gamma} (1-\theta)^{(1-\gamma)}\)
Prior: \(p(\theta) \propto \min(\theta, 1-\theta)\) (the triangular prior defined above)
Evidence: \(p(\gamma) = \sum_{\theta^{*}} p(\gamma \mid \theta^{*})\, p(\theta^{*})\)
So, Posterior: \(p(\theta \mid \gamma) = \frac{p(\gamma \mid \theta)\, p(\theta)}{\sum_{\theta^{*}} p(\gamma \mid \theta^{*})\, p(\theta^{*})}\)
likelihood=Theta^Data * (1-Theta)^(1-Data)
post=pTheta*likelihood
post=post/sum(post)
Compare the results, make the plot.
likelihood
## [1] 0.00 0.25 0.50 0.75 1.00
post
## [1] 0.000 0.125 0.500 0.375 0.000
(posterior = BernGrid(Theta, pTheta, Data, plotType="Bars",
showCentTend="None" , showHDI=FALSE , showpD=FALSE ))
## [1] 0.000 0.125 0.500 0.375 0.000
Increase the number of values taken by \(\theta\) to 11.
Theta = seq( 0 , 1 , length=11 ) # Sparse teeth for Theta.
pTheta = pmin( Theta , 1-Theta ) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,0),rep(1,1)) # Single flip with 1 head
posterior = BernGrid(Theta, pTheta, Data, plotType="Bars",
showCentTend="None", showHDI=FALSE, showpD=FALSE )
Which of the functions has more influence on the posterior: prior or likelihood?
- Currently, it looks like the prior has more influence on the posterior: with only a single observation, the likelihood carries little information.
Make a prior with triangular shape.
Make the number of Bernoulli trials equal to 4, of which one outcome is a head (\(\gamma = 1\)) and three outcomes are tails.
Calculate likelihood and posterior yourself, compare with BernGrid().
Theta = seq(0, 1, length=1001) # fine grid for Theta.
pTheta = pmin(Theta, 1-Theta) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,3),rep(1,1)) # 25% heads, N=4
likelihood=Theta^sum(Data) * (1-Theta)^(4-sum(Data))
post=pTheta*likelihood
post=post/sum(post)
plot(Theta,pTheta)
plot(Theta,likelihood)
plot(Theta,post)
posterior = BernGrid(Theta, pTheta, Data, plotType="Bars",
showCentTend="Mode" , showHDI=TRUE, showpD=FALSE)
Increase the number of Bernoulli trials to 40. Keep the proportion of heads to tails in the data as 3/1.
Theta = seq( 0 , 1 , length=1001 ) # fine grid for Theta.
pTheta = pmin( Theta , 1-Theta ) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,30),rep(1,10)) # 25% heads, N=40
posterior = BernGrid( Theta, pTheta , Data , plotType="Bars" ,
showCentTend="Mode" , showHDI=TRUE , showpD=FALSE )
Increase the number of values of \(\theta\) to 1001. Make prior uniform on [0,1].
Theta = seq( 0 , 1 , length=1001 ) # fine grid for Theta.
pTheta = rep(1,length(Theta)) # Uniform (horizontal) shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,0),rep(1,1)) # Single flip with 1 head
likelihood=Theta^Data * (1-Theta)^(1-Data)
post=pTheta*likelihood
post=post/sum(post)
plot(Theta,pTheta, pch=20, lwd=.25)
plot(Theta, likelihood, lwd=4)
plot(Theta, post, lwd=4)
Which of the functions has more influence on the posterior: prior or likelihood?
posterior = BernGrid( Theta, pTheta , Data , plotType="Bars" ,
showCentTend="None" , showHDI=FALSE , showpD=FALSE )
Smear the prior to an almost flat shape by taking pTheta to the power of 0.1.
Theta = seq(0, 1, length=1001) # fine grid for Theta.
pTheta = pmin(Theta , 1-Theta) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
pTheta = pTheta^0.1 # Flatten pTheta !
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,3),rep(1,1)) # 25% heads, N=4
posterior = BernGrid(Theta, pTheta, Data, plotType="Bars",
showCentTend="Mode", showHDI=TRUE,
showpD=FALSE )
Which of the functions has more influence on the posterior: prior or likelihood?
Flatten the prior, keeping the same data as in the N = 40 example above.
Theta = seq(0, 1, length=1001) # fine grid for Theta.
pTheta = pmin(Theta, 1-Theta) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
pTheta = pTheta^0.1 # Flatten pTheta !
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,30),rep(1,10)) # 25% heads, N=40
posterior = BernGrid(Theta, pTheta, Data, plotType="Bars",
showCentTend="Mode", showHDI=TRUE, showpD=FALSE)
Make the prior concentrated at 0.5 by taking it to the power of 10.
Theta = seq( 0 , 1 , length=1001 ) # fine grid for Theta.
pTheta = pmin( Theta , 1-Theta ) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
pTheta = pTheta^10 # Sharpen pTheta !
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,3),rep(1,1)) # 25% heads, N=4
posterior = BernGrid( Theta, pTheta , Data , plotType="Bars" ,
showCentTend="Mode" , showHDI=TRUE , showpD=FALSE )
Which of the functions has more influence on the posterior: prior or likelihood?
In this case, the answer is the prior.
With the same number and ratio of observations, make the prior concentrated around 0.5.
Theta = seq( 0 , 1 , length=1001 ) # fine grid for Theta.
pTheta = pmin( Theta , 1-Theta ) # Triangular shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
pTheta = pTheta^10 # Sharpen pTheta !
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,30),rep(1,10)) # 25% heads, N=40
posterior = BernGrid( Theta, pTheta , Data , plotType="Bars" ,
showCentTend="Mode" , showHDI=TRUE , showpD=FALSE )
Make a prior distribution giving probability 0.5 to each of the extreme values of \(\theta\) (near 0 and near 1) and probability 0 to all other values of \(\theta\).
Theta = seq(0 ,1 , length=1001) # fine grid for Theta.
pTheta = rep(0, length(Theta)) # Only extremes are possible!
pTheta[2] = 1 # Only extremes are possible!
pTheta[length(pTheta)-1] = 1
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,0),rep(1,1)) # Single flip with 1 head
posterior = BernGrid(Theta, pTheta, Data, plotType="Bars",
showCentTend="None", showHDI=FALSE, showpD=FALSE)
The compromise between the prior and the likelihood results in the posterior concentrating at the observed value (the maximum-likelihood estimate).
Make the prior bimodal; the data are generated by 27 Bernoulli trials with 13 tails and 14 heads.
Theta = seq( 0 , 1 , length=1000 ) # fine grid for Theta.
# Two triangular peaks on a small non-zero floor:
pTheta = c( rep(1,200),seq(1,100,length=50),seq(100,1,length=50),rep(1,200) ,
rep(1,200),seq(1,100,length=50),seq(100,1,length=50),rep(1,200) )
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
Data = c(rep(0,13),rep(1,14))
posterior = BernGrid( Theta, pTheta , Data , plotType="Bars" ,
showCentTend="None" , showHDI=FALSE , showpD=FALSE )
Bayesian inference is intuitively rational: With a strongly informed prior that uses a lot of previous data to put high credibility over a narrow range of parameter values, it takes a lot of novel contrary data to budge beliefs away from the prior. But with a weakly informed prior that spreads credibility over a wide range of parameter values, it takes relatively little data to shift the peak of the posterior distribution toward the data (although the posterior will be relatively wide and uncertain).
(The paragraph above is a quotation, not my words.) But in real applications a sharp prior has been informed by substantial prior knowledge, which we would be loath to move away from without numerous contrary data.
Look again at the example in the section on the binomial model with a uniform prior over 1001 values of \(\theta\).
Follow the steps of obtaining the posterior distribution by Monte Carlo.
Define likelihood function for binomial distribution.
likeli <- function(par, data){
sdata <- sum(data)
ldata <- length(data)
return(par^sdata*(1-par)^(ldata-sdata))
}
Define values of parameter \(\theta\) and prior distribution.
Theta = seq(.00001, 1-.00001, length=1001) # fine grid for theta
pTheta = rep(1,length(Theta)) # Uniform (horizontal) shape for pTheta.
pTheta = pTheta/sum(pTheta) # Make pTheta sum to 1.0
plot(Theta,pTheta)
Create data of length 5 generated by the model with parameter 0.84.
set.seed(15)
(Data<-rbinom(5,size=1,prob=.84))
## [1] 1 1 0 1 1
Create a sample of \(\theta\) values generated from the prior distribution.
set.seed(15)
priorInd <- sample(1:length(Theta),500,replace = T)
priorSample <- cbind(Theta=Theta[priorInd],Prob=pTheta[priorInd])
priorSample <- rbind(priorSample,
c(head(Theta,1),head(pTheta,1)),
c(tail(Theta,1),tail(pTheta,1)))
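The notes stop here with a sample drawn from the prior. One possible way to complete the Monte Carlo computation (a sketch I added, using sampling-importance-resampling, which may differ from the method intended in the original workshop) is to weight each sampled \(\theta\) by the likelihood of the observed Data and resample proportionally to those weights.
# Weight each prior draw of theta by the likelihood of Data, then resample
# proportionally to the weights (sampling-importance-resampling):
w <- likeli(priorSample[,"Theta"], Data)
w <- w/sum(w)
set.seed(15)
postInd <- sample(1:nrow(priorSample), 5000, replace=TRUE, prob=w)
postSample <- priorSample[postInd, "Theta"]
hist(postSample, breaks=50, xlab="theta", main="Approximate posterior sample")
abline(v=mean(Data), lty=2)   # sample proportion of successes, for comparison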
We will learn from an example in Section 6.4.1 of [K].
Load the definition of the BernBeta function.
source("BernBeta.R")
A coin known to be regular gets 20 flips, of which 17, or 85%, are heads.
Despite this, there is a strong prior belief that the coin is fair, with a concentration equivalent to 500 observations. What is your posterior belief about the probability of heads?
Express prior belief in terms of parameters of beta distribution.
The prior is very strong and is characterized by mode \(\omega = 0.5\) and concentration \(\kappa = 500\).
Use the formulas given on my website to convert the mode and concentration into the shape parameters of the beta distribution: \(a = \omega(\kappa - 2) + 1\), \(b = (1-\omega)(\kappa - 2) + 1\).
# Specify the prior:
t = 0.5 # Specify the prior MODE.
n = 500 # Specify the effective prior sample size.
a = t*(n-2) + 1 # Convert to beta shape parameter a.
b = (1-t)*(n-2) + 1 # Convert to beta shape parameter b.
Form the prior shapes parameters vector.
(Prior = c(a,b)) # Specify Prior as vector with the two shape parameters.
## [1] 250 250
Define the data of size 20 and number of heads 17. Note that Data should be a sequence of zeros and ones.
# Specify the data:
N = 20 # The total number of flips.
z = 17 # The number of heads.
(Data = c(rep(0,N-z),rep(1,z))) # Convert N and z into vector of 0's and 1's.
## [1] 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Use the same function for calculating the binomial likelihood as in the previous section.
likeli <- function(par,data){
sdata <- sum(data)
ldata <- length(data)
return(par^sdata*(1-par)^(ldata-sdata))
}
Define the grid of values for \(\theta\).
Define the prior as beta distribution with the shape parameters in Prior. Plot prior.
Theta <- seq(0, 1, length=1001) # Make grid of Theta.
pTheta <- dbeta(Theta, shape1=Prior[1], shape2=Prior[2]) # Beta prior distribution.
plot(Theta, dbeta(Theta, shape1=Prior[1], shape2=Prior[2]), ylab="Prior")
Calculate likelihood for Data, given values in Theta.
likelihood<-likeli(Theta,Data)
plot(Theta,likelihood)
Calculate the posterior distribution for \(\theta\) using grid approximation. We normalize the product of prior and likelihood so that it sums to 1 over the grid; multiplying by 1000 (the inverse of the grid spacing) rescales it to a density.
post <- pTheta*likelihood
post <- post/sum(post)*1000
Then, plot the posterior.
plot(Theta, post)
Find the mode of the posterior.
(mode<-Theta[which.max(post)])
## [1] 0.514
Calculate posterior using the analytical formula.
postBeta <- dbeta(Theta, shape1 = z+Prior[1], shape2 = N-z+Prior[2])
plot(Theta, postBeta)
(mode<-Theta[which.max(postBeta)])
## [1] 0.514
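As a sanity check (my addition, not in the original notes), the mode of a \(Beta(a, b)\) distribution with \(a, b > 1\) is \((a-1)/(a+b-2)\), which agrees with the grid result.
# Analytical mode of the posterior Beta(z + a, N - z + b):
a.post <- z + Prior[1]               # 17 + 250 = 267
b.post <- N - z + Prior[2]           # 3 + 250 = 253
(a.post - 1)/(a.post + b.post - 2)   # about 0.5135, close to the grid mode 0.514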
We can run the script from [K]:
posterior <- BernBeta(priorBetaAB=Prior, Data=Data, plotType="Bars",
showCentTend="Mode", showHDI=TRUE, showpD=FALSE)
Conclusion.
Very strong prior knowledge, expressed by a mode of 0.5 and a concentration of 500, could not be overcome by a sample of 20 coin flips, even though the estimated frequency from these data suggests a probability of 85% for heads.
A professional basketball player makes 20 free throws.
We know that professionals tend to make 75% of free throws; the majority make 50% or more, but very few make more than 90%.
This knowledge is reflected in a beta distribution with mode \(\omega = 0.75\) and concentration \(\kappa = 25\).
Such distribution has 95% highest density interval (HDI) approximately between 0.5 and 0.9.
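To see where these numbers come from (a small check I added; the mode/concentration conversion is the same one used for the coin example above, and the equal-tailed interval below is only a rough stand-in for the HDI):
# Convert mode and concentration to beta shape parameters:
omega <- 0.75; kappa <- 25
a.player <- omega*(kappa-2) + 1        # 18.25
b.player <- (1-omega)*(kappa-2) + 1    # 6.75
qbeta(c(0.025, 0.975), a.player, b.player)  # compare with the stated HDI of roughly [0.5, 0.9]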
Think how you would define concentration to satisfy the requirement that 95% HDI equals [0.5,0.9].
\(\Rightarrow\) What is your posterior belief this time?
Follow the same steps as in Situation 1.