Why Bayesian Inference ?

“What we know is not much. What we do not know is immense” - Marquis Simon de Laplace

What is Bayesian Inference?

Bayesian inference techniques specify how one should update one’s beliefs upon observing data.
Bayesian inference deals with how does new information (data/test/observation) changes the odds of something happening (belief/disease)?
Statement usually of the form if Observation then Belief
- OBSERVATION = SOMETHING WE KNOW
- BELIEF = SOMETHING WE DO NOT KNOW

Prior & Posterior

Prior
- involves Belief only (Hypothesis: P(H))
- Before observation
Posterior
- Involves both belief and observation (Evidence: P(H|E))
- After Observation

Baye’s Theorem Formal Definition

P(H|E)= P(E|H) * P(H) / P(E)

H is the hypothesis (I have food poisoning)
P(H) is the prior probability (the probability of having food poisoning)
E is the evidence (I ate shellfish)
P(H|E) is the posterior probability (the probability of having food poisoning if I ate shellfish)
P(E|H) is the likelihood of observing E if H is true (probability of having eaten shellfish if I have food poisoning)
P(E) is the model evidence: the probability of E under all possible hypothesis (what is the probability of having eaten shellfish)

Let’s apply Bayesian Inference on an Example

Finding the thief

Copies of the ClassXYZ Final Exam have been stolen from Prof. M’s office. She wants to find the culprit. Luckily, she asked the students to fill a survey which turns out to be pretty useful, as she can use it to narrow down the suspects.

Let’s load the the survey data that we will be working on You can learn more about the dataset here: http://www.amstat.org/publications/jse/v20n3/mclaren/shoesize.xls

#Load the dataset
data<-read.csv("~/Downloads/Datasets/classXYZstudents.csv")
head(data)

##   ID Gender Size Height                   Name
## 1  1      F  5.5     60   Nguyen-Huynh, Bianca
## 2  2      F  6.0     60   Baumgartner, Freddie
## 3  3      F  7.0     60 Gomez-Gonzales, Kendra
## 4  4      F  8.0     60        Naranjo, Alexis
## 5  5      F  8.0     60   al-Farooq, Mahfoodha
## 6  6      F  9.0     60        Martinez, Sable

Clues

Like any other mystery, here too we are given a set of clues.

There is a footprint on the floor. It is size 11.
The TA heard a voice and is 80% sure that the thief is female
The exams were on a high shelf. Someone shorter than 6 feet would have had a tough time reaching that shelf.

Assume that means that someone taller than 6 feet can reach the shelf. Someone between 5’8 and 6 feet has a 75% probability of finding a way to reach it (heels?), someone shorter than 5’8 cannot.

Prior: What is the probability that the thief is female before looking at any evidence

What is the probability that if you were to randomly select a student from class, the gender of that student would be female?

priors<-table(data$Gender)
priors

## 
##   F   M 
## 187 221

prior = priors['F']/(priors['F']+priors['M'])
prior

##         F 
## 0.4583333

Looking at this we can say that: Without any other information, there is a 45.83% chance that the thief is female P(H) = 0.4583333

Clue 1. The footprint on the floor is size 11.

Here we have our first evidence, E Let us calculate P(E|H)

p_e1<-table(data[data$Size==11,]$Gender)
p_e1

## 
##  F  M 
##  6 58

6 Female and 58 Male in the class have size 11

posterior1 = p_e1['F']/(p_e1['F']+p_e1['M'])
posterior1

##       F 
## 0.09375

Clue 2. The TA is 80% sure that the thief is female

Posterior, given the TA testimony The TA is 80% sure P(E|H) is 0.8 The TA will say female for (P(E)):

p_e2<-(priors['F']*0.8 + priors['M']*0.2)/(priors['M']+priors['F'])
p_e2

##     F 
## 0.475

In this case, (P(E)=0.475 is the probability that the TA will say that the thief is female. It happens in two ways: The thief is female and the TA was correct (0.8P(female)) The thief is males and the TA made a mistake ((1-0.8)P(male))

So the probability that the thief is female given the TA’s info is? P(H|E) = P(H)*P(E|H)/P(E)

posterior2<-prior*0.8/p_e2
posterior2

##         F 
## 0.7719298

Clue 3. The exams were on a high shelf. Someone shorter than 6 feet would have had a tough time reaching that shelf.

The probability that a female can reach the shelf P(E|H) is the probability the a female is taller than 6‘ (72 inches), plus 75% times the probability that a female is between 5‘8 and 6‘ (68 and 72 inches)

We need to calculate 2 things here, first what are the chances for any student to reach the shelf and what are the chances for a female student to reach the shelf given the constraints in the clue

data$height_distribution<-as.factor(cut(data$Height,breaks = c(0,67,71,max(data$Height)), labels = c("lt_68","lte_72",'gte_72')))
head(data)

##   ID Gender Size Height                   Name height_distribution
## 1  1      F  5.5     60   Nguyen-Huynh, Bianca               lt_68
## 2  2      F  6.0     60   Baumgartner, Freddie               lt_68
## 3  3      F  7.0     60 Gomez-Gonzales, Kendra               lt_68
## 4  4      F  8.0     60        Naranjo, Alexis               lt_68
## 5  5      F  8.0     60   al-Farooq, Mahfoodha               lt_68
## 6  6      F  9.0     60        Martinez, Sable               lt_68

update2<-table(data$Gender,data$height_distribution)
update2

##    
##     lt_68 lte_72 gte_72
##   F   149     32      6
##   M    20    103     98

Things are easier after getting this table

p_f3<-(update2['F','lte_72']*0.75+ update2['F','gte_72']*1+update2['F','lt_68']*0)/priors['F']
p_f3

##         F 
## 0.1604278

We have calculated, the probability that a female can reach the shelf P(E|H) is %p_f%

Now let’s look at the probability of any student can reach the shelf

p_e3<-nrow(data[(data$Height>=72),]) /nrow(data) + nrow(data[ (data$Height>=68)&(data$Height<72),]) /nrow(data)*0.75
p_e3

## [1] 0.5012255

So the probability that the thief is female given the shelf reachability constrint info is? P(H|E) = P(H)*P(E|H)/P(E)

posterior3<-(prior*p_f3)/(p_e3)
posterior3

##         F 
## 0.1466993

Therefore, our final posterior probability turns out to be 0.1466993

Combining the clues

“Today’s posterior is tomorrow’s prior”

After Clue 1, P(female|footprint=11) = 0.09375

So if we use this as our prior for clue 2, the new posterior is:

updated_posterior2<-posterior1*0.8/p_e2
updated_posterior2

##         F 
## 0.1578947

Therefore, we have P(female|TA testimony&footprint=11) = 0.1578947

Using this as our prior for clue 3, the new posterior is:

updated_posterior3<-updated_posterior2*p_f3/p_e3
updated_posterior3

##          F 
## 0.05053755

Therefore, we have P(female|TA testimony&footprint=11& high shelf) = 0.05053755 = 5%

Recitation 08 - Bayesian Inference

Janish Parikh

10/26/2021