Introduction

This is an intro to Bayesian data analysis, which I and probably many others find a more intuitive and realistic method for data analysis than more classical methods. Bayes’ rule itself is reasonably simple (even for math-allergic people):

\[ P(A|B)=\frac{P(B|A)P(A)}{P(B)} =\frac{P(B|A)P(A)}{P(B|A)P(A)+P(B|\lnot A)P(\lnot A)} \]

Now let’s unpack this soup of symbols. Notation:

First, P(A|B) means the probability of event A given that B has happened: in this example, the probability that you have a disease given that your medical test was positive. P(B|A) stands for the probability of a positive test (B) given that you have the disease (A). P(A) stands for the overall probability of having the disease, and P(B) stands for the overall probability of a positive test. P(B|¬A) stands for the probability of a positive test given that you don’t have the disease (the false positive rate), and P(¬A) stands for the probability that you don’t have the disease.

Example. Let’s assume that there is a rare disease which only 0.1% of the population has. If you have the disease, then the medical test will correctly classify you as infected 99% of the time (this is called sensitivity). 5% of the people who don’t have the disease will be falsely classified as infected (the false positive rate). If your test is positive, what is the probability that you have the disease?

Solution: \[ P(A|B)=\frac{0.001 \cdot 0.99}{0.001 \cdot 0.99+0.05 \cdot (1-0.001)} \approx 0.019 \]
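To double-check the arithmetic, here is a minimal R sketch (the variable names are my own) that plugs these numbers into Bayes’ rule:

prior = 0.001        #P(A): prevalence of the disease
sensitivity = 0.99   #P(B|A): probability of a positive test if infected
fpr = 0.05           #P(B|notA): false positive rate
#Bayes' rule: posterior = likelihood * prior / evidence
posterior = (sensitivity * prior) / (sensitivity * prior + fpr * (1 - prior))
posterior #about 0.019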

This is a classical example of how intuition might let us down. Without calculating, you probably thought the probability was high, but it is actually only about 1.9%. It is much more probable that the test gave a false positive result. If you want to get to know Bayes’ rule better, just Google it and you’ll find tons of material.

Explanation

The first reason Bayes’ rule is used so much is that it follows the logic we use every day. We start with some prior belief about an event (how likely is it that I am infected? About 0.1%). When we gather new evidence (take the medical test), we update that belief and calculate the posterior probability. This is what the components of Bayes’ rule mean:
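\[ \underbrace{P(A|B)}_{\text{posterior}} = \frac{\overbrace{P(B|A)}^{\text{likelihood}} \; \overbrace{P(A)}^{\text{prior}}}{\underbrace{P(B)}_{\text{evidence}}} \]

The evidence P(B) is the overall probability of the observed data under all hypotheses; it only normalizes the posterior so that the probabilities sum to 1.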

A more graphical example

Coin

Let’s work through an example with a coin. We have prior knowledge that euro coins are fair (their average value over the long run is 0.5, if 0 = tails and 1 = heads). Graphically, this prior looks like this:

Theta = seq( 0 , 1 , length=100)  #grid of possible long-run average coin values
pTheta = pmin( Theta , 1-Theta) #triangle-shaped prior, peaked at 0.5, because we believe the coin is fair
pTheta = pTheta/sum(pTheta) #normalize the prior probabilities (so they sum to 1)
Results=data.frame(Theta, pTheta) #turn it into a data frame
#plot it
library(ggplot2)
ggplot(Results, aes(x=Theta, y=pTheta))+
  geom_bar(stat="identity", fill="lightblue")+
  theme_minimal()+
  ylab("Probability")

From the plot we see that most of our prior probability mass says that coins are usually fair: if we toss one, then over the long run the probability of getting heads (1) or tails (0) is 50% (0.5). Other values are less probable.

The next step is to toss the coin once and see what the result is:

Data = c(rep(0,0),rep(1,1)) #one toss that came up heads (zero tails, one head)
z = sum( Data ) #number of heads (1's) in Data
N = length( Data ) #number of flips
# Compute the Bernoulli likelihood at each value of Theta:
Results$pDataGivenTheta = Results$Theta^z * (1-Results$Theta)^(N-z)
#plot results
ggplot(Results, aes(x=Theta, y=pDataGivenTheta))+
  geom_bar(stat="identity", fill="lightblue")+
  theme_minimal()+
  ylab("Likelihood")

We got heads (1) on this flip, so we know the coin has at least one head side. Because we haven’t seen any tails yet, the likelihood puts more weight on head-biased values. But because we have made just one toss, we can’t really conclude much: maybe this coin has two heads, maybe not. What this plot shows is the likelihood described before.
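In formula form, the Bernoulli likelihood computed in the code above is:

\[ P(D|\theta) = \theta^{z}(1-\theta)^{N-z} \]

where z is the number of heads and N is the total number of flips. For our single toss z = N = 1, so the likelihood is just \(\theta\) itself, which is why the plot rises as a straight line.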

Now let’s see what we can conclude from the prior and the likelihood together:

# Compute the evidence and the posterior via Bayes' rule:
Results$pData = sum( Results$pDataGivenTheta * Results$pTheta )
Results$pThetaGivenData = Results$pDataGivenTheta * Results$pTheta / Results$pData
#plot it
ggplot(Results, aes(x=Theta, y=pThetaGivenData))+
  geom_bar(stat="identity", fill="lightblue")+
  theme_minimal()+
  ylab("Probability")

The posterior probability still suggests that 0.5 is the most probable value. But because we got one head, we now think the coin might be slightly head-biased. We are not very sure, though (one flip of a coin is really not much to base an accurate conclusion on).
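To read the most probable value straight off the grid (a small addition of mine, not part of the original workflow):

# Posterior mode: the Theta value with the highest posterior probability
Results$Theta[ which.max(Results$pThetaGivenData) ]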

Let’s say we flip a coin 100 times:

set.seed(42) #fix the random seed so the simulated tosses are reproducible
Data = sample(0:1, 100, replace=TRUE) #toss a fair coin 100 times
z = sum( Data ) #number of heads (1's) in Data
N = length( Data ) #number of flips
# Compute the Bernoulli likelihood at each value of Theta:
Results$pDataGivenTheta = Results$Theta^z * (1-Results$Theta)^(N-z)
#plot results
ggplot(Results, aes(x=Theta, y=pDataGivenTheta))+
  geom_bar(stat="identity", fill="lightblue")+
  theme_minimal()+
  ylab("Likelihood")

Now we have a much bigger sample. As we can see, the likelihood is now concentrated: we are pretty sure the long-run mean of the coin tosses is around 0.5. There is some random noise, so it might not be exactly 0.5.
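You can compare this with the raw sample proportion of heads, which is exactly where the likelihood peaks:

# Proportion of heads in the 100 simulated tosses
z / N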

Our posterior probability is now:

# Compute the evidence and the posterior via Bayes' rule:
Results$pData = sum( Results$pDataGivenTheta * Results$pTheta )
Results$pThetaGivenData = Results$pDataGivenTheta * Results$pTheta / Results$pData
#plot it
ggplot(Results, aes(x=Theta, y=pThetaGivenData))+
  geom_bar(stat="identity", fill="lightblue")+
  theme_minimal()+
  ylab("Probability")

Now we are more confident that the mean value (the coin’s “bias”) is around 0.5. As you can see, 0.5 may not be the single most probable value, but this time it is because we still have a relatively small sample and random noise produces some fluctuation. It is better to report an interval where the true value most probably lies:

# 95% credible interval: Theta values where the cumulative posterior crosses 2.5% and 97.5%
# (avoid names min/max, which shadow base R functions)
ci_low  = Results$Theta[ sum(cumsum(Results$pThetaGivenData) < 0.025) + 1 ]
ci_high = Results$Theta[ sum(cumsum(Results$pThetaGivenData) < 0.975) + 1 ]

ggplot(Results, aes(x=Theta, y=pThetaGivenData))+
  geom_bar(stat="identity", fill="lightblue")+
  theme_minimal()+
  geom_vline(xintercept = ci_low)+
  geom_vline(xintercept = ci_high)+
  ylab("Probability")

The region between the vertical black lines is where, with 95% probability, the true theta value (the coin’s long-run mean) lies. This is called a credible interval.
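As a quick extra check (my own addition), you can verify that the posterior is properly normalized and summarize it with its mean:

sum( Results$pThetaGivenData ) #should be 1 (the posterior sums to 1)
sum( Results$Theta * Results$pThetaGivenData ) #posterior mean estimate of theta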

Conclusion

I hope you got some insight into how the Bayesian approach to data analysis works. This is just the beginning, so to get more practical value you should find more materials and study the Bayesian approach more deeply.