Kushal Kharel
9/11/2022
How do we infer population parameters from sample data? One way is to use the maximum likelihood estimation (MLE) method.
Consider a set of observations \(y_1, y_2, ..., y_n\). The likelihood function \(L\) is the joint probability density of obtaining the data that were actually observed.
It is treated as a function of the unknown parameters, with the observed data held fixed.
The maximum likelihood estimators are the parameter values for which the observed data are most likely. In other words, they are the values that maximize the likelihood function.
Suppose we have an i.i.d. sample that we assume was generated from a parametric density function or probability mass function (pmf) indexed by a parameter \(\theta\).
Because the observations are i.i.d., the likelihood function is simply the product of the density or pmf evaluated at each data point \(y_i\).
We then take the log of that function: the product becomes a sum, which is easier to differentiate and numerically more stable, and because the log is strictly increasing, the maximizer is unchanged.
There may be infinitely many possible parameter values, and we want the one that best fits the data. To find it, we maximize the log-likelihood function.
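The construction described above can be written compactly as follows. The symbol \(f\) for the density or pmf, \(\ell\) for the log-likelihood, and \(\hat{\theta}\) for the estimator are notational assumptions introduced here, not taken from the text.

\[
L(\theta) = \prod_{i=1}^{n} f(y_i; \theta), \qquad
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta), \qquad
\hat{\theta} = \arg\max_{\theta} \ell(\theta)
\]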
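library(dplyr)
library(plotly)
library(knitr)  # provides kable(), used later to print the results table
# Simulate N observations from the model Y = beta * X + Z
N = 1000
beta = 5     # true slope
sigma_2 = 5  # true noise variance
X = rnorm(N, 0, 5)              # regressor with standard deviation 5
Z = rnorm(N, 0, sqrt(sigma_2))  # noise; rnorm() expects a standard deviation
Y = beta*X + Z
DT = data.table::data.table(X, Y, Z)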
head(DT)
##             X          Y          Z
## 1: -5.1031125 -22.608491  2.9070710
## 2:  2.3169936  11.783020  0.1980519
## 3:  3.6623491  19.698005  1.3862593
## 4:  5.4599034  28.181253  0.8817356
## 5: -1.3758629  -8.999021 -2.1197066
## 6:  0.4845064   4.092999  1.6704669
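For the simulated model, where \(Y_i = \beta X_i + Z_i\) with normally distributed noise of variance \(\sigma^2\) (written `sigma_2` in the code), the log-likelihood takes the form below. The indexing by \(i\) and the symbol \(\ell\) are notational assumptions added for this sketch.

\[
\ell(\beta, \sigma^2) = -\frac{N}{2} \log\left(2 \pi \sigma^2\right) - \frac{1}{2 \sigma^2} \sum_{i=1}^{N} \left(Y_i - \beta X_i\right)^2
\]

Maximizing this expression over \(\beta\) and \(\sigma^2\) gives the maximum likelihood estimates.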
Kernel density estimation is a non-parametric method for estimating the probability density function of a random variable, using kernels as weights.
It provides a way to see the shape of the data.
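The call that produces `MLE_Estimates` is not shown in the text. The following is a minimal sketch of how such an object could be obtained with `optim()`, assuming the objective is the negative log-likelihood of the normal model and the parameter vector is ordered as (`beta`, `sigma_2`). The function name `neg_log_lik`, the seed, and the starting values are hypothetical choices made for illustration; the data are re-simulated so the block runs on its own.

```r
# Re-simulate data with the same design as in the text.
# The seed is an assumption, added only for reproducibility.
set.seed(1)
N <- 1000
X <- rnorm(N, 0, 5)
Y <- 5 * X + rnorm(N, 0, sqrt(5))

# Negative log-likelihood of Y given X; theta = c(beta, sigma_2).
# neg_log_lik is a hypothetical helper name, not from the original text.
neg_log_lik <- function(theta, x, y) {
  beta <- theta[1]
  sigma_2 <- theta[2]
  if (sigma_2 <= 0) return(Inf)  # a variance must be positive
  -sum(dnorm(y, mean = beta * x, sd = sqrt(sigma_2), log = TRUE))
}

# optim() minimizes by default (Nelder-Mead), so we pass the negative
# log-likelihood; hessian = TRUE returns the Hessian at the optimum.
MLE_Estimates <- optim(par = c(1, 1), fn = neg_log_lik,
                       x = X, y = Y, hessian = TRUE)

MLE_Estimates$par  # estimates of beta and sigma_2
```

Because the objective is a negative log-likelihood, the inverse of the returned Hessian approximates the covariance matrix of the estimates, which is what makes the standard error computation on the fitted object meaningful.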
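# Point estimates from the fitted optimizer result
MLE_par = MLE_Estimates$par
# Standard errors from the inverse Hessian (assumes the objective was the negative log-likelihood)
MLE_SE = sqrt(diag(solve(MLE_Estimates$hessian)))
MLE = data.table::data.table(param = c("beta", "sigma_2"),
                             estimates = MLE_par,
                             sd = MLE_SE)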
kable(MLE)

| param | estimates | sd |
|---|---|---|
| beta | 4.993428 | 0.0146031 |
| sigma_2 | 5.156092 | 0.2305873 |
Suppose we observe 10 successes in 25 trials. Here we are not asking for the probability of 10 successes under a known parameter. Instead, we ask which parameter value makes the observed 10 successes most likely.
Let us see the plot explained above. Suppose that in a sequence of 25 coin flips we observe 10 successes (heads).
What is the maximum likelihood estimate of \(\theta\)? Assume a binomial model.
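One way to answer this numerically is sketched below. The original call is not shown in the text, so the names `successes`, `trials`, `neg_log_lik`, and `fit`, the starting value, and the choice of `method = "Brent"` are all assumptions made for illustration; the printed output that follows in the text may come from a different `optim()` configuration.

```r
# Observed data: 10 successes in 25 trials
successes <- 10
trials <- 25

# Negative log-likelihood of the binomial model as a function of theta.
# neg_log_lik is a hypothetical helper name.
neg_log_lik <- function(theta) {
  -dbinom(successes, size = trials, prob = theta, log = TRUE)
}

# One-dimensional minimization over (0, 1).
# method = "Brent" requires finite lower and upper bounds.
fit <- optim(par = 0.5, fn = neg_log_lik, method = "Brent",
             lower = 1e-6, upper = 1 - 1e-6)

fit$par    # estimate of theta
fit$value  # negative log-likelihood at the estimate
```

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood, which is why `optim()`, a minimizer by default, can be used directly.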
## $par
## [1] 0.4
##
## $value
## [1] 1.82537
##
## $counts
## function gradient
## 28 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
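The numerical result can be checked against the closed-form solution. Under the binomial model with 10 successes in 25 trials, and writing \(\ell\) for the log-likelihood and \(\hat{\theta}\) for the estimate (notation assumed here):

\[
L(\theta) = \binom{25}{10} \theta^{10} (1 - \theta)^{15}, \qquad
\ell(\theta) = \log \binom{25}{10} + 10 \log \theta + 15 \log(1 - \theta)
\]

\[
\frac{d\ell}{d\theta} = \frac{10}{\theta} - \frac{15}{1 - \theta} = 0
\quad \Longrightarrow \quad
\hat{\theta} = \frac{10}{25} = 0.4
\]

This agrees with `$par` in the output. Evaluating \(-\ell(0.4)\) gives approximately 1.82537, which is consistent with `$value` being the negative log-likelihood at the optimum.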