11/5/2020

What is a Maximum Likelihood Estimate?

The Maximum Likelihood Estimate (MLE) is a method for estimating the parameters of a probability distribution by maximizing the likelihood function.

  • The likelihood function is the density function regarded as a function of \(\theta\) for the observed data \(x\): \[ \textbf{L}(\theta | x) = \textbf{f}(x|\theta) \]
  • The maximum likelihood estimator (MLE) is then: \[ \hat \theta (x) = \underset{\theta}{\text{arg max}} \, \textbf{L}(\theta | x) \]
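This maximization can also be done numerically. Below is a minimal sketch in R, assuming a small hypothetical vector of exponential observations; it uses optimize() to locate \(\hat\theta\) and compares it to the known closed form for the exponential rate:

# Minimal sketch: numerical MLE in R (hypothetical exponential data)
x <- c(1.2, 0.7, 2.3, 1.8, 0.9)
loglik <- function(theta) sum(dexp(x, rate = theta, log = TRUE))
mle <- optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
mle           # numerical estimate
1 / mean(x)   # closed-form MLE for the exponential rate, for comparison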

Gamma Density Function

  • We observe \(n\) independent data points \(X=[x_1, x_2, \dots, x_n]\) drawn from a Gamma distribution, and we restrict \(\theta\) to the parameter pair \((\alpha,\beta)\) (shape and scale). In general, the mean is \(\alpha\beta\), and the mode is \((\alpha-1)\beta\) for \(\alpha \ge 1\) (see the sketch below).
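As a quick sanity check on those moment formulas, here is a minimal R sketch with assumed values \(\alpha = 2\) and \(\beta = 3\), comparing simulated draws against \(\alpha\beta\):

# Sketch: empirical check of the Gamma mean and mode formulas (assumed parameters)
set.seed(1)
alpha <- 2; beta <- 3
samples <- rgamma(1e5, shape = alpha, scale = beta)
mean(samples)        # should be close to alpha * beta = 6
(alpha - 1) * beta   # theoretical mode = 3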

Log-Likelihood Density

  • In practice, however, it is much easier to maximize the logarithm of the likelihood, \(\ln \textbf{L}(\theta|x)\), as shown below.
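Because the logarithm is strictly increasing, \(\ln \textbf{L}\) attains its maximum at the same \(\hat\theta\), and the product over independent observations becomes a sum:

\[ \ln \textbf{L}(\theta|x) = \ln \prod^n_{i=1} f(x_i|\theta) = \sum^n_{i=1} \ln f(x_i|\theta) \]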

Estimating the population parameter (\(\hat \theta_{MLE}\)) for the Gamma distribution

\[f(x;k,\theta) = \begin{cases}{1\over \Gamma(k)\,\theta^k} \, x^{k-1}e^{-{x \over \theta}} &, x > 0 \\ 0 &, \text{elsewhere}\end{cases}\]

\[ P(x_i|\theta) = \frac{1}{\Gamma(k)\,\theta^k}\, x_i^{k-1} e^{-\frac{x_i}{\theta}} \\ \textbf{L}(\theta|x) = \prod^n_{i=1}{P(x_i|\theta)} \Rightarrow \ell(\theta|x) = \sum^n_{i=1}{\ln(P(x_i|\theta))} \\ = \sum^n_{i=1}\left(-\ln\Gamma(k) - k\ln(\theta)+(k-1)\ln(x_i)-\frac{x_i}{\theta}\right) \\ \frac{\partial \ell}{\partial \theta} = \sum^n_{i=1}\left(-\frac{k}{\theta}+\frac{x_i}{\theta^2}\right)=0 \\ \dots \text{continued on next slide} \]

Estimation of \(\hat \theta_{MLE}\) continued

\[ \Rightarrow -\frac{nk}{\theta}+\frac{\sum^n_{i=1}x_i}{\theta^2} = 0 \Rightarrow \hat\theta = \frac{1}{nk} \sum^n_{i=1}x_i = \frac{\bar x}{k} \] Therefore, \(\hat \theta_{MLE} = \frac{\bar x}{k}\): the likelihood is maximized when \(\theta\) equals the sample mean \(\bar x\) divided by the shape parameter \(k\).
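As a quick check of this closed form, the sketch below simulates Gamma data with assumed values \(k = 2\) and \(\theta = 5\) and recovers the scale from \(\bar x / k\):

# Sketch: verifying theta_hat = mean(x) / k on simulated Gamma data (assumed k, theta)
set.seed(42)
k <- 2; theta_true <- 5
x <- rgamma(1e4, shape = k, scale = theta_true)
mean(x) / k   # closed-form MLE for the scale; should be close to 5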

We can compare this to the MLE of a Binomial example.

MLE Binomial example

\[ \textbf{L}(\theta|x)= {k\choose x}\theta^x(1-\theta)^{k-x} \\ \ell(\theta|x) = \ln\textbf{L}(\theta|x) = \ln{k\choose x} + x\ln(\theta) + (k-x)\ln(1-\theta) \\ \frac{\partial \ell}{\partial \theta} = \frac{x}{\theta}-\frac{k-x}{1-\theta} = 0 \Rightarrow \hat \theta=\frac{x}{k} \]

So, while the MLE for the Gamma distribution is \(\hat \theta_{MLE} = \frac{\bar x}{k}\), the MLE for the Binomial is \(\hat \theta_{MLE} = \frac{x}{k}\).
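The Binomial result can be checked the same way. This sketch assumes \(x = 37\) successes in \(k = 100\) trials (hypothetical counts) and maximizes the log-likelihood numerically:

# Sketch: numerical check of the Binomial MLE (assumed counts)
x <- 37; k <- 100
loglik <- function(theta) dbinom(x, size = k, prob = theta, log = TRUE)
optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum  # close to x/k = 0.37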

Colonies per year [Interactive Plotly]

To find the total number of colonies per year across all US states, we can aggregate the data with dplyr and plot the results.

R code for Colonies per year

library(dplyr)    # data wrangling
library(ggplot2)  # static plotting
library(plotly)   # interactive plots

# Read the honey production data and total the colony counts by year
honey <- read.csv('honeyproduction.csv')
colony_tot <- sum(honey$numcol)
colony <- honey %>%
  group_by(year) %>%
  summarise(numcol_tot = sum(numcol))

# Line plot of total colonies per year, made interactive with ggplotly()
x <- ggplot(colony, aes(x = year, y = numcol_tot)) +
  geom_line() +
  theme_classic() +
  ggtitle("Number of Colonies per Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Years") +
  ylab("Number of Colonies")
plt <- ggplotly(x)
plt

Mean estimate for yearly bee colonies

Below is the maximum likelihood estimate of the average number of bee colonies per year, based on our dataset.

Calculating the mean estimate

Based on our data set, this R code compares the MLE of the average number of colonies per year from a sample, \(\hat \theta_{MLE}=\frac{\bar x_s}{k}\), against the population mean, \(\mu=\frac{\bar x_a}{n}\), where \(\bar x_s\) is the sum of the sampled yearly totals, \(\bar x_a\) is the sum over all years, \(k\) is the number of years sampled, and \(n\) is the total number of years observed.

# Draw a reproducible sample of 10 yearly totals and compare the sample-based
# estimate with the population mean over all 15 years in the data
honey <- read.csv('honeyproduction.csv')
colony_tot <- sum(honey$numcol)
colony <- honey %>%
  group_by(year) %>%
  summarise(numcol_tot = sum(numcol))
set.seed(5555)   # set the seed before sampling for reproducibility
bee_sample <- sum(sample_n(colony, 10, replace = FALSE)$numcol_tot)
smu <- bee_sample / 10   # MLE: sum of sampled totals / number of samples
mu <- colony_tot / 15    # population mean over the 15 years observed

Results of the mean estimate

From the previous code we find that:

## [1] "The estimated number of average colonies: 2521805.5"
## [1] "The actual average number of colonies per year: 2515866.66666667"

Conclusion

  • We can use the Maximum Likelihood Estimate to estimate the parameter \(\theta\) of a population.
  • The estimator \(\hat \theta_n(x)\) is consistent.
    • As the number of observations increases, the distribution of the MLE becomes more concentrated around the true parameter value (sketched after this list).
  • If the likelihood function is differentiable, the derivative test for determining maxima can be applied.
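To illustrate the consistency point above, a minimal sketch (simulated Gamma data with assumed \(k = 2\) and \(\theta = 5\)) shows the estimate settling around the true scale as \(n\) grows:

# Sketch: MLE consistency, the estimate tightens as n grows (assumed parameters)
set.seed(7)
k <- 2; theta_true <- 5
for (n in c(10, 100, 10000)) {
  x <- rgamma(n, shape = k, scale = theta_true)
  cat("n =", n, " theta_hat =", mean(x) / k, "\n")
}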