Parsimony and Likelihood (cont'd)

M. Drew LaMar
February 20, 2019

“If all the statisticians in the world were laid head to toe, they wouldn't be able to reach a conclusion.”
- Anonymous

Likelihood Theory

Definition: The likelihood of the model parameters \( \theta \), given the model \( g \) and the data \( x \), is denoted by: \[ \mathcal{L}(\theta | x, g) \]

Note: Likelihood theory describes how to find the parameter values of a model that make the observed data most probable.

Likelihood Theory: Example

Suppose you observe 10 coin flips and you see the following result:

H H H H H H T T T H

Discuss: What’s the most likely value for the probability of heads on an individual coin flip?

Let \( p \) denote the probability of getting heads.

This follows what is known as a binomial model, with the probability of getting 7 heads out of 10 given by:

\[ \mathrm{Prob}(7 \ \textrm{heads}) = \left(\begin{array}{c}10 \\ 7\end{array}\right)p^{7}(1-p)^{3} \]
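To get a feel for this formula, it can help to evaluate it for a few candidate values of \( p \). The following is a minimal sketch in Python; the function name `binomial_prob` and the candidate values are illustrative choices, not part of the lecture.

```python
from math import comb


def binomial_prob(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of 7 heads in 10 flips for a few (arbitrarily chosen) values of p
for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: Prob(7 heads) = {binomial_prob(7, 10, p):.4f}")
```

Of these four candidates, \( p = 0.7 \) gives the largest probability of the observed outcome.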

Likelihood Theory: Example

In this example \[ \mathcal{L}(\theta | x, g) = \left(\begin{array}{c}10 \\ 7\end{array}\right)p^{7}(1-p)^{3} \] we have

The model \( g \) is the binomial model
The data \( x \) is the number of coin flips (10) and number of heads (7). In other words, it's what we observed.
The unknown parameter \( \theta \) is \( p \)

Likelihood Theory: Example

\[ \mathcal{L}(p | 10, 7; \mathrm{binomial}) = \left(\begin{array}{c}10 \\ 7\end{array}\right)p^{7}(1-p)^{3} \]

[Figure: plot of the likelihood function \( \mathcal{L}(p | 10, 7; \mathrm{binomial}) \)]

Likelihood Theory: Example

The most likely parameter given the model and data is where the likelihood function is maximized.
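One simple way to locate this maximum numerically is a grid search: evaluate the likelihood at many values of \( p \) and keep the one with the largest value. The sketch below assumes a grid spacing of 0.001 and uses the hypothetical names `likelihood` and `p_hat`; it is an illustration, not a general-purpose optimizer.

```python
from math import comb


def likelihood(p: float, n: int = 10, k: int = 7) -> float:
    """Binomial likelihood of p given k heads in n flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Candidate values p = 0.000, 0.001, ..., 1.000 (spacing is an arbitrary choice)
grid = [i / 1000 for i in range(1001)]

# Grid point with the largest likelihood
p_hat = max(grid, key=likelihood)
print(f"Most likely p on the grid: {p_hat}")
```

A finer grid gives a more precise answer at the cost of more evaluations.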

It's usually easier to work with sums than with products, so we maximize the log-likelihood function instead (the logarithm is an increasing function, so both are maximized at the same parameter value):

\[ \log(\mathcal{L}(\theta | x, g)) \]

which in our case becomes

\[ \log\left(\begin{array}{c}10 \\ 7\end{array}\right) + 7\log p +3\log (1-p) \]
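As a worked step, the maximum of this expression can be found with calculus. The first term does not depend on \( p \), so differentiating with respect to \( p \) and setting the result to zero gives (writing \( \hat{p} \) for the maximizing value, a notation introduced here for convenience):

\[ \frac{d}{dp}\log\mathcal{L} = \frac{7}{p} - \frac{3}{1-p} = 0 \quad \Longrightarrow \quad 7(1-p) = 3p \quad \Longrightarrow \quad \hat{p} = \frac{7}{10} \]

The second derivative, \( -\frac{7}{p^{2}} - \frac{3}{(1-p)^{2}} \), is negative for all \( 0 < p < 1 \), so this critical point is a maximum.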

Likelihood Theory: Example

Log-likelihood function:

[Figure: plot of the log-likelihood function]
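The log-likelihood can also be evaluated numerically. The sketch below is one possible version, with the hypothetical names `log_likelihood` and `p_hat` and an assumed grid spacing of 0.001. The endpoints \( p = 0 \) and \( p = 1 \) are left out because the logarithm is undefined there.

```python
from math import comb, log


def log_likelihood(p: float, n: int = 10, k: int = 7) -> float:
    """Binomial log-likelihood of p given k heads in n flips (requires 0 < p < 1)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)


# Candidate values p = 0.001, 0.002, ..., 0.999 (endpoints excluded)
grid = [i / 1000 for i in range(1, 1000)]

# Grid point with the largest log-likelihood
p_hat = max(grid, key=log_likelihood)
print(f"p maximizing the log-likelihood on the grid: {p_hat}")
```

Since the logarithm preserves ordering, this should pick out the same value of \( p \) as maximizing the likelihood itself.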