Maximum likelihood estimation (MLE) is a method for determining the values of a model's parameters. The parameter values are chosen to maximise the likelihood that the process described by the model produced the data that were actually observed. MLEs and likelihood functions generally have very desirable large-sample properties:
they become unbiased, minimum-variance estimators as the sample size increases
they are approximately normally distributed, with approximate sampling variances that can be calculated and used to generate confidence bounds
likelihood functions can be used to test hypotheses about models and parameters
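The last property can be illustrated with a likelihood-ratio test. The following is a minimal sketch on simulated data, not part of the original analysis; the variable names and the simulated coefficients are assumptions chosen for illustration. It compares a restricted model (slope fixed at zero) with an unrestricted one using base R only.

```r
# Likelihood-ratio test on simulated data (all names and values are illustrative)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

fit0 <- lm(y ~ 1)   # restricted model: slope fixed at zero
fit1 <- lm(y ~ x)   # unrestricted model: slope estimated

# logLik() returns the maximised Gaussian log-likelihood of each fit
lr_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))

# Under the null hypothesis the statistic is approximately chi-squared with 1 df
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(lr_stat = lr_stat, p_value = p_value))
```

A small p-value suggests that the restriction (no effect of x) is not supported by the data.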
Ordinary least squares (OLS) refers to conventional linear regression models for a continuous response variable given continuous and/or categorical predictors.
Sigma (the standard deviation) measures the variation between values in a data set. The lower the standard deviation, the closer the data points tend to be to the mean, and vice versa.
The standard error of the mean estimates the variability between sample means that you would obtain if you took multiple samples from the same population. Lower values of the standard error of the mean indicate more precise estimates of the population mean. Usually, a larger standard deviation will result in a larger standard error of the mean and a less precise estimate. A larger sample size will result in a smaller standard error of the mean and a more precise estimate.
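The relationship between the two quantities is that the standard error of the mean equals the standard deviation divided by the square root of the sample size. The sketch below uses simulated data; the helper `sem()` and the chosen sample sizes are illustrative assumptions, not part of the original text.

```r
# Standard error of the mean: sd / sqrt(n), shown on simulated samples
set.seed(42)
pop_sd <- 2

# Hypothetical helper: standard error of the mean of a numeric vector
sem <- function(v) sd(v) / sqrt(length(v))

small <- rnorm(25,   mean = 10, sd = pop_sd)
large <- rnorm(2500, mean = 10, sd = pop_sd)

sem_small <- sem(small)   # few observations: less precise estimate of the mean
sem_large <- sem(large)   # many observations: more precise estimate of the mean

# Empirical check: spread of the means of many repeated samples of size 25
means_25 <- replicate(2000, mean(rnorm(25, mean = 10, sd = pop_sd)))
empirical_se <- sd(means_25)
theoretical_se <- pop_sd / sqrt(25)
print(c(sem_small = sem_small, sem_large = sem_large,
        empirical = empirical_se, theoretical = theoretical_se))
```

The empirical spread of the repeated sample means should be close to the theoretical value, and the larger sample should give the smaller standard error.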
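library(maxLik)  # maximum likelihood estimation
library(Zelig)   # provides the turnout data set
# Normal log-likelihood; x is the observed data vector and must already exist in the workspace
logLikFun <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}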
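The log-likelihood function above refers to a data vector `x` that is not defined in the text. As a minimal sketch, assuming simulated data for `x` and swapping in base R's `optim()` in place of `maxLik()` so that the block has no package dependencies, the function can be maximised and checked against the closed-form normal MLEs:

```r
# Maximise the normal log-likelihood on simulated data (x is an assumption)
set.seed(123)
x <- rnorm(500, mean = 5, sd = 2)

# Same log-likelihood as in the text, restated so this block runs on its own
logLikFun <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# optim() minimises, so pass the negative log-likelihood;
# the lower bound keeps sigma strictly positive
fit <- optim(par = c(mu = 1, sigma = 1),
             fn = function(p) -logLikFun(p),
             method = "L-BFGS-B",
             lower = c(-Inf, 1e-6),
             hessian = TRUE)
mle <- fit$par

# Closed-form MLEs for a normal sample: the mean and the divide-by-n sd
closed_form <- c(mu = mean(x), sigma = sqrt(mean((x - mean(x))^2)))

# Approximate standard errors from the inverse Hessian of the negative log-likelihood
se <- sqrt(diag(solve(fit$hessian)))
print(rbind(mle = mle, closed_form = closed_form, se = se))
```

The numerical and closed-form estimates should agree closely, and the standard error of `mu` should be near `sigma / sqrt(n)`.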
We use MLE to find the best-fitting model for data assumed to be normally distributed, i.e. Y∼N(μ,σ). In the example below, the mean μ is modelled as a linear function of education: the intercept is displayed as “beta1” and the slope is displayed as “beta2”. The estimated slope means that income is expected to increase by 0.37613 units for each one-unit increase in education. The standard errors in the model are small relative to the estimates, indicating that the parameters are estimated precisely.
data(turnout)
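ols.lf <- function(param) {
  sigma <- param[1]                # first parameter: residual standard deviation
  beta <- param[-1]                # remaining parameters: intercept and slope
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # design matrix with an intercept column
  mu <- x %*% beta                 # mean of income is linear in education
  sum(dnorm(y, mu, sigma, log = TRUE))
}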
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
coef(mle_ols)
## sigma beta1 beta2
## 2.5261325 -0.6520675 0.3761334
stdEr(mle_ols)
## sigma beta1 beta2
## 0.03989085 0.20827251 0.01663390
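For a normal linear model, the MLE of the coefficients coincides with the OLS estimates, while the MLE of sigma divides the residual sum of squares by n rather than by the residual degrees of freedom. The following sketch illustrates this on simulated data; the variables `educ` and `income` and their generating values are hypothetical and are not the turnout data.

```r
# Compare OLS output with the Gaussian maximum likelihood quantities
set.seed(2024)
n <- 1000
educ <- runif(n, 0, 20)                             # hypothetical predictor
income <- -0.5 + 0.4 * educ + rnorm(n, sd = 2.5)    # hypothetical response

ols <- lm(income ~ educ)
beta_ols <- coef(ols)               # also the ML estimates of intercept and slope

rss <- sum(residuals(ols)^2)
sigma_mle <- sqrt(rss / n)          # ML estimate divides by n
sigma_ols <- sqrt(rss / (n - 2))    # lm() divides by the residual degrees of freedom

# Gaussian log-likelihood evaluated at the OLS coefficients and sigma_mle
ll <- sum(dnorm(income, mean = fitted(ols), sd = sigma_mle, log = TRUE))
print(c(sigma_mle = sigma_mle, sigma_ols = sigma_ols,
        ll = ll, logLik_lm = as.numeric(logLik(ols))))
```

With a large sample the two sigma estimates differ only slightly, which is why a likelihood-based fit and `lm()` report nearly the same residual spread.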
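In the example below, the mean of income is held constant (mu) and the standard deviation sigma is instead modelled as a linear function of education: the intercept is displayed as “theta1” and the slope is displayed as “theta2”. The estimated slope means that the standard deviation of income, not its mean, is expected to increase by about 0.11 for each one-unit increase in education.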
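ols.lf2 <- function(param) {
  mu <- param[1]                   # first parameter: constant mean of income
  theta <- param[-1]               # remaining parameters: intercept and slope of sigma
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # design matrix with an intercept column
  sigma <- x %*% theta             # standard deviation is linear in education
  sum(dnorm(y, mu, sigma, log = TRUE))
}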
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
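A model in which sigma depends on a predictor can be checked on data where the true relationship is known. The sketch below simulates such data and recovers the parameters with base R's `optim()` (Nelder-Mead), which is a swapped-in optimiser rather than the `maxLik()` call used above. The simulated values, the start values and the penalty for non-positive sigma are all assumptions made for illustration.

```r
# Normal model with constant mean and standard deviation linear in x
set.seed(7)
n <- 2000
x <- runif(n, 0, 10)
y <- rnorm(n, mean = 3.5, sd = 1.5 + 0.1 * x)   # spread grows with x

negLogLik <- function(param) {
  mu <- param[1]
  sigma <- param[2] + param[3] * x
  if (any(sigma <= 0)) return(1e10)   # penalise invalid standard deviations
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Start from the homoskedastic fit: sample mean, sample sd, zero slope
start <- c(mu = mean(y), theta1 = sd(y), theta2 = 0)
fit <- optim(par = start, fn = negLogLik, control = list(maxit = 2000))
est <- fit$par
print(est)
```

The estimate of `theta2` should be positive and close to the simulated slope, reflecting that the spread of y, not its mean, changes with x.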
If the variable age is introduced into the above examples, the estimates are bound to change. We would not expect a strong relationship between age and income, because income is determined less by how old a person is than by how much experience and education they have; we cannot say that income increases or decreases with each additional year of age. The example below, in which age is added as a predictor, supports this: the estimated coefficient on age is negative (-0.003183) but very small and not statistically significant (p = 0.348), while the coefficient on education is essentially unchanged (0.371013).
summary(lm(income ~ educate+age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16