I. Learning Bayesian inference

Let’s learn the difference between the frequentist and the Bayesian approaches to reasoning, following the article below.

A gentle Introduction to Bayesian Inference

References

Bayes’ rule with a simple and practical example

Probability concepts explained: Maximum likelihood estimation

II. A short story

Frequentist Frank, Stubborn Stu, and Bayesian Betty visited a tent, where they met Clair Voyant, who claimed to be a fortune teller.

They took turns drawing a card and asked Clair to name its color: red or black. She answered correctly three times in a row.

Stu’s explanation

I don’t care about the experiment. From my experience, it’s highly unlikely that she has psychic powers. If I had to quantify my belief, I would say that there is a 0.1% chance that she has these kinds of powers.

Stu’s prior belief is mathematically expressed in the following equation.

\[ p(\theta = \textrm{fortune teller}) = 0.001 \]

\[ p(\theta = \textrm{not a fortune teller}) = 0.999 \]

Frank’s explanation

Data is everything. She got 3 out of 3 right, so I say she clearly has psychic powers.

Frank reached his conclusion with a maximum likelihood approach, choosing the value of 𝜃 that maximizes the likelihood:

\[ p(\textrm{data} \mid \theta) \]

The likelihood for each of the two possible values of 𝜃:

Given that she is a fortune teller,

\[ P(3 \textrm{ right }|\textrm{ fortune teller})=1 \]

Given that she is an ordinary human,

\[ P(3 \textrm{ right }| \textrm{ not a fortune teller}) = \left(\frac{1}{2}\right)^{3} = \frac{1}{8} \]

Frank thinks she is a fortune teller because the probability of her getting three successive right answers is higher for a fortune teller than for an ordinary person.
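Frank’s reasoning can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and variable names are made up here, and it assumes, as in the story, that a real fortune teller is always right while an ordinary person guesses each color with probability 1/2.

```python
# Likelihood of the data (3 correct answers out of 3) under each hypothesis.
# Assumptions: a fortune teller is always right; an ordinary person
# guesses each card correctly with probability 1/2.
n_correct = 3

likelihood = {
    "fortune teller": 1.0 ** n_correct,
    "not a fortune teller": 0.5 ** n_correct,
}

# Maximum likelihood: pick the hypothesis under which the data is most probable.
theta_mle = max(likelihood, key=likelihood.get)

print(likelihood)
print("Frank's choice:", theta_mle)
```

Note that no prior belief appears anywhere in this calculation; only the data decides.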

Betty’s explanation

Both explanations are extreme in their own ways. Stubborn Stu disregards the collected data, which is clearly stupid. However, Frank pays too much attention to the data, as if nothing else exists in this world. Maybe Clair was just lucky, especially with only 3 trials.

Betty places the three positions on a spectrum and looks for an explanation in the middle:

  1. prior knowledge is everything,

  2. the truth is somewhere in between,

  3. observed data is everything.

I start off with a prior belief. Then, I look at the observed data and let each data point change my mind a little bit. The more data I observe, the further I can drift from my initial belief. This procedure results in my posterior belief.

\[ p(\theta \mid \textrm{data}) = \frac{p(\textrm{data} \mid \theta) \cdot p(\theta)}{p(\textrm{data})} \]
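To see how each data point shifts the belief a little, here is a minimal sketch that applies Bayes’ rule once per correct answer. It is not taken from the article: the function name `update` and the variable names are hypothetical, and it assumes Betty starts from Stu’s prior of 0.001 and uses the same likelihoods as Frank (1 for a fortune teller, 1/2 for an ordinary person).

```python
def update(prior, p_right_if_psychic=1.0, p_right_if_ordinary=0.5):
    """Apply Bayes' rule once, after observing a single correct answer."""
    numerator = p_right_if_psychic * prior
    # p(data): total probability of a correct answer under both hypotheses.
    evidence = numerator + p_right_if_ordinary * (1.0 - prior)
    return numerator / evidence


# Assumed prior p(theta = fortune teller), borrowed from Stu.
belief = 0.001
history = [belief]

# Three correct answers in a row, processed one at a time.
for _ in range(3):
    belief = update(belief)
    history.append(belief)

for n, b in enumerate(history):
    print(f"after {n} correct answers: {b:.4f}")
```

Under these assumptions the posterior ends up at roughly 0.8%: about eight times the prior, but still far from Frank’s certainty.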

To be continued.