How to find the MLE if \(\Omega \subset \mathbf{R}^1:\)
Solve \(\frac{dL(\theta)}{d\theta}=0\) and check that \(\frac{d^2L(\theta)}{d\theta^2} < 0\) at the solution (the calculus method for finding a maximum)
If \(L(\theta)\) is monotone in \(\theta\) and \(\Omega = [a,b]\) for some interval, then the MLE is a boundary point of \(\Omega\): \(a\) if \(L\) is decreasing, \(b\) if it is increasing.
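As a quick illustration of the monotone case, consider \(n\) Bernoulli trials that are all successes: \(L(\theta)=\theta^n\) is increasing on \(\Omega=[0,1]\), so the MLE is the right endpoint \(b=1\). A minimal sketch in R (with \(n=10\) chosen arbitrarily):
# Sketch: a monotone likelihood on Omega = [0, 1] (all n Bernoulli trials succeed)
# L(theta) = theta^n is increasing, so the MLE sits at the boundary point b = 1
n <- 10
theta <- seq(0, 1, 0.01) # grid over Omega = [0, 1]
plot(theta, theta^n, type = 'l',
     main = 'Monotone likelihood: MLE at the boundary', ylab = 'Likelihood(theta)')
abline(v = 1) # vertical line at the boundary MLE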
Examples:
For the Poisson distribution, the likelihood of an i.i.d. sample \(x_1,\cdots,x_n\) is
\[\tilde{L}(\theta) = \tilde{L}(\theta ;X_1,\cdots ,X_n) = f_{X_1,\cdots ,X_n}(x_1,\cdots ,x_n|\theta) = \prod_{i=1}^nf(x_i|\theta) = \prod_{i=1}^{n}\frac{e^{-\theta}\theta^{x_i}}{x_i!}=\frac{e^{-n\theta}\theta^{\sum_{i=1}^{n}x_i}}{\prod_{i=1}^nx_i!}\]
We will first plot this likelihood function for sample data \(X_1,\cdots ,X_{100}\stackrel{i.i.d.}{\sim} \text{Poisson}(\theta=5)\).
set.seed(999)
poisson_likelihood <- function(theta, x, n) { # function for Poisson likelihood
  return((exp(-n * theta) * theta^(sum(x))) / prod(factorial(x)))
}
sample_data <- rpois(100,5) # generate 100 sample points from a Poisson distribution
# with lambda = 5
theta <- seq(0.01, 9.99, 0.01) # vector of possible thetas
plot(theta, poisson_likelihood(theta = theta, x = sample_data, n = 100), # plot
main = 'Poisson Likelihood Function with true theta = 5', ylab = 'Likelihood(theta)')
abline(v = 5) # vertical line at theta = 5
It is apparent that the computed likelihood values blow up: with \(\sum_{i=1}^{100}x_i\approx 500\), the factor \(\theta^{\sum_ix_i}\) overflows double precision for moderate \(\theta\), so the plotted values shoot toward \(\infty\) and it is difficult to tell where the MLE \(\hat{\theta}\) lies at this scale.
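One way to sidestep the overflow is to compute the log-likelihood, subtract its maximum, and exponentiate; this plots the rescaled likelihood \(\tilde{L}(\theta)/\max_\theta \tilde{L}(\theta)\), which always lies in \([0,1]\). A minimal sketch (using base R's lfactorial for \(\log(x_i!)\); theta and sample_data are as above):
# Sketch: overflow-free plot of the rescaled likelihood L(theta) / max L(theta)
log_lik <- -100 * theta + sum(sample_data) * log(theta) - sum(lfactorial(sample_data))
plot(theta, exp(log_lik - max(log_lik)), type = 'l',
     main = 'Rescaled Poisson likelihood with true theta = 5',
     ylab = 'Likelihood(theta) / max Likelihood')
abline(v = 5) # vertical line at theta = 5
More commonly one simply works with the log-likelihood directly, which we turn to next.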
\[L(\theta)=-n\theta + \sum_{i=1}^nx_i\log\theta-\sum_{i=1}^n\log(x_i!)\]
poisson_log_likelihood <- function(theta, x, n) { # function for Poisson log-likelihood
  # lfactorial(x) computes log(x!) directly, avoiding overflow in factorial(x)
  return(-n * theta + sum(x) * log(theta) - sum(lfactorial(x)))
}
plot(theta, poisson_log_likelihood(theta = theta, x = sample_data, n = 100), type='l',
main = 'Poisson log-likelihood function with true theta = 5',
ylab = 'log-likelihood of theta')
abline(v = 5) # vertical line at theta = 5
abline(h = max(poisson_log_likelihood(theta = theta, # horizontal line at max of data
x = sample_data, n = 100)))
Because the log function is strictly increasing, the log-likelihood is maximized at the same \(\theta\) as the likelihood, and on this scale it is easy to see where \(\hat{\theta}\) lies. This \(\hat{\theta}\) is the parameter value under which the sample data have the highest pdf/pmf.
\[\frac{dL(\theta)}{d\theta}=-n+\frac{1}{\theta}\sum_{i=1}^nx_i\] \[\frac{dL(\theta)}{d\theta}=0\Rightarrow\hat{\theta}=\frac{\sum_{i=1}^nx_i}{n}=\bar{x} \text{ (sample mean)}\]
We can also compute the MLE directly from the closed form obtained by setting the derivative to 0: it is simply the sample mean of the data. Recall that when we take the derivative of the log-likelihood and set it to 0, we are finding the value of \(\theta\) that maximizes the likelihood \(\tilde{L}(\theta ; X_1,\cdots ,X_n)\), viewed as a function of \(\theta\) with the sample held fixed.
mean(sample_data)
## [1] 4.87
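As a sanity check, we can also maximize the log-likelihood numerically with base R's optimize() (a sketch; the search interval is just our plotting range) and confirm it agrees with the sample mean:
# Sketch: confirm the closed-form MLE with a one-dimensional numerical optimizer
optimize(poisson_log_likelihood, interval = c(0.01, 9.99), maximum = TRUE,
         x = sample_data, n = 100)$maximum # should be very close to mean(sample_data)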
If we increased the sample size \(n\), the estimate would typically fall closer to the true parameter. This is the \(\textit{consistency of the MLE}\): under mild assumptions the MLE \(\textit{converges in probability}\) to the true parameter \(\theta\).
\[\hat{\theta} \rightarrow_p \theta \text{ as } n\rightarrow\infty\]
mean(rpois(1000000,5))
## [1] 5.001632
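To see the convergence more gradually, a sketch that recomputes the MLE \(\bar{x}\) on increasingly large Poisson(5) samples (the sample sizes are arbitrary):
# Sketch: the MLE (sample mean) drifts toward the true theta = 5 as n grows
for (n in c(10, 100, 1000, 10000, 100000)) {
  cat('n =', n, ' MLE =', mean(rpois(n, 5)), '\n')
}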
\[\text{Note that } \frac{d^2L(\theta)}{d\theta^2}=-\frac{1}{\theta^2}\sum_{i=1}^nx_i<0 \text{ since each }x_i\geq0\text{ and at least one }x_i>0.\]
The above confirms, via the second-derivative test, that the point we have found is indeed a maximum and not a minimum (Source: Lecture notes).
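To make the sign condition concrete, a one-line check on the sample_data generated above:
# Sketch: second derivative of the log-likelihood evaluated at theta-hat
theta_hat <- mean(sample_data)
-sum(sample_data) / theta_hat^2 # negative, so theta-hat is indeed a maximum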
For the Bernoulli distribution the likelihood is \[\tilde{L}(\theta) = \prod_{i=1}^n\theta^{x_i}(1-\theta)^{1-x_i}=\theta^{\sum_{i=1}^nx_i}(1-\theta)^{\sum_{i=1}^n(1-x_i)}=\theta^{\sum_{i=1}^nx_i}(1-\theta)^{n-\sum_{i=1}^nx_i}\] As imaginary sample data, suppose a city has a gate through which outsiders enter, and over some period 500 people entered, of whom 380 were male. A 'success' here is the event that a person entering the city is male. (Reference: https://www.coursera.org/lecture/bayesian-statistics/plotting-the-likelihood-in-r-6Ztvq)
bernoulli_likelihood <- function(theta, x, n) { # Bernoulli likelihood; x = number of successes
  return(theta^x * (1 - theta)^(n - x))
}
theta <- seq(0.01, 0.99, 0.01) # vector of possible thetas
plot(theta, bernoulli_likelihood(theta = theta, x = 380, n = 500), # plot the function
main = 'Bernoulli Likelihood Function', ylab = 'Likelihood(theta)')
abline(v = (380/500)) # vertical line using sample mean
From the plot above it is clear that the sample mean again serves as the estimate \(\hat{\theta}\) of the parameter under which the data have the highest probability. For the Bernoulli distribution each observation is '1' for success and '0' for failure, so 380 successes sum to 380, and the sample mean formula gives \(\frac{1}{n}\sum_{i=1}^nx_i = 380/500=0.76.\) Let us repeat the exercise with the log-likelihood below. \[L(\theta) = \sum_{i=1}^nx_i\log\theta + \left(n-\sum_{i=1}^nx_i\right)\log(1-\theta)\]
bernoulli_log_likelihood <- function(theta, x, n) { # Bernoulli log-likelihood; x = number of successes
  return(x * log(theta) + (n - x) * log(1 - theta))
}
theta <- seq(0.01, 0.99, 0.01) # vector of possible thetas
plot(theta, bernoulli_log_likelihood(theta = theta, x = 380, n = 500), # plot the function
main = 'Bernoulli log-likelihood Function', ylab = 'log-likelihood of theta', type = 'l')
abline(v = (380/500)) # vertical line using sample mean
abline(h = max(bernoulli_log_likelihood(theta = theta, # horizontal line at max of data
x = 380, n = 500)))
\[\frac{dL(\theta)}{d\theta} = \frac{1}{\theta}\sum_{i=1}^nx_i - \frac{1}{1-\theta}\left(n-\sum_{i=1}^nx_i\right)=0\Rightarrow(1-\theta)\sum_{i=1}^nx_i=\theta\left(n-\sum_{i=1}^nx_i\right)\Rightarrow\hat{\theta}=\frac{1}{n}\sum_{i=1}^nx_i=\bar{x}\]
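As with the Poisson example, a quick numerical check with optimize() (a sketch over our plotting interval) confirms that the log-likelihood peaks at \(\bar{x}=380/500=0.76\):
# Sketch: numerically maximize the Bernoulli log-likelihood; expect ~0.76
optimize(bernoulli_log_likelihood, interval = c(0.01, 0.99), maximum = TRUE,
         x = 380, n = 500)$maximum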