Maximum Likelihood Estimation (MLE) and Generalized Linear Model (GLM)

What is MLE?

Maximum likelihood estimation (MLE) is a method for determining the values of a model's parameters. The parameter values are chosen so that they maximise the likelihood that the process described by the model produced the data that were actually observed. MLEs and likelihood functions generally have very desirable large-sample properties:

  • they become asymptotically unbiased, minimum-variance estimators as the sample size increases

  • they have approximately normal sampling distributions, with approximate variances (obtained from the inverse of the information matrix) that can be calculated and used to construct confidence bounds

  • likelihood functions can be used to test hypotheses about models and parameters
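These properties can be checked numerically. The sketch below is a minimal base-R illustration, assuming a simulated normal sample (n = 500, mean 5, sd 2) and a hypothetical helper `negloglik`; `optim()` is used here in place of `maxLik()` so that only base R is needed. It compares the numerical MLEs with their closed forms, derives approximate standard errors from the Hessian, and runs a likelihood ratio test.

```r
# Illustrative simulated data (not from the document)
set.seed(1)
x <- rnorm(500, mean = 5, sd = 2)

# Negative log-likelihood, because optim() minimises by default
negloglik <- function(param, x) {
  -sum(dnorm(x, mean = param[1], sd = param[2], log = TRUE))
}

# Keep sigma strictly positive with a lower bound
fit <- optim(par = c(mu = 0, sigma = 1), fn = negloglik, x = x,
             method = "L-BFGS-B", lower = c(-Inf, 1e-6), hessian = TRUE)

# Closed-form MLEs for comparison (divisor n, not n - 1)
mu_hat <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))

# Approximate standard errors: square roots of the diagonal of the
# inverse Hessian of the negative log-likelihood (observed information)
se <- sqrt(diag(solve(fit$hessian)))

# Likelihood ratio test of H0: mu = 5 (sigma re-estimated under H0)
sigma0 <- sqrt(mean((x - 5)^2))
lr_stat <- 2 * (-fit$value - sum(dnorm(x, 5, sigma0, log = TRUE)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(rbind(numeric = fit$par, closed_form = c(mu_hat, sigma_hat)))
print(se)        # compare with sigma_hat / sqrt(500) and sigma_hat / sqrt(1000)
print(p_value)
```

The standard errors printed should be close to the large-sample formulas σ̂/√n for mu and σ̂/√(2n) for sigma.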

What is GLM?

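A generalized linear model (GLM) extends conventional linear regression, which models a continuous response variable given continuous and/or categorical predictors, to response variables from other distributions in the exponential family, such as the binomial or Poisson. The mean of the response is related to a linear combination of the predictors through a link function; ordinary linear regression is the special case with a normally distributed response and the identity link.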
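A minimal sketch of this in base R, using simulated data (the variables `educ`, `y_normal` and `y_count` are hypothetical): a Gaussian GLM with the identity link reproduces the `lm()` coefficients, while a Poisson GLM with the log link models a count response.

```r
# Illustrative simulated data
set.seed(42)
n <- 200
educ <- runif(n, 8, 20)
y_normal <- 1 + 0.3 * educ + rnorm(n)
y_count <- rpois(n, lambda = exp(-1 + 0.1 * educ))

# Gaussian family, identity link: same fit as ordinary linear regression
fit_glm <- glm(y_normal ~ educ, family = gaussian(link = "identity"))
fit_lm <- lm(y_normal ~ educ)
print(all.equal(coef(fit_glm), coef(fit_lm)))

# Poisson family, log link: log(E[y_count]) is linear in educ
fit_pois <- glm(y_count ~ educ, family = poisson(link = "log"))
print(coef(fit_pois))
```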

Sigma and Standard Error

Sigma (σ) is the standard deviation, not the standard error: it measures how much the values in a data set vary around their mean. The lower the standard deviation, the closer the data points tend to be to the mean, and vice versa.
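As a small worked example with illustrative values, the standard deviation can be computed directly from the squared deviations. Note that `sd()` divides by n - 1, whereas the maximum likelihood estimate of sigma (as in the examples below) divides by n.

```r
# Illustrative values only; mean(x) is 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
dev2 <- (x - mean(x))^2               # squared deviations from the mean

s    <- sqrt(sum(dev2) / (n - 1))     # sample standard deviation, as sd() computes
s_ml <- sqrt(sum(dev2) / n)           # divisor-n version used by the normal MLE

# Same mean, but the values sit closer to it, so the SD is smaller
y <- c(4, 5, 5, 5, 5, 5, 5, 6)
print(c(s = s, sd_x = sd(x), s_ml = s_ml, sd_y = sd(y)))
```

For these values the divisor-n version is exactly 2 and `sd(x)` is about 2.14, while the tightly clustered `y` has a much smaller standard deviation.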

The standard error of the mean estimates the variability between the sample means that you would obtain if you took multiple samples from the same population; it is estimated as s/√n, where s is the sample standard deviation and n is the sample size. Lower values of the standard error of the mean indicate more precise estimates of the population mean. Usually, a larger standard deviation will result in a larger standard error of the mean and a less precise estimate. A larger sample size will result in a smaller standard error of the mean and a more precise estimate.
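The relationship SE = σ/√n can be checked by simulation. A minimal sketch, assuming normal data with σ = 2 and the illustrative sample sizes 10, 100 and 1000: for each size it draws 5000 samples and takes the standard deviation of their means.

```r
set.seed(123)
sigma <- 2
sizes <- c(10, 100, 1000)

# Standard deviation of 5000 simulated sample means, for each sample size
sim_se <- sapply(sizes, function(n) {
  sd(replicate(5000, mean(rnorm(n, mean = 0, sd = sigma))))
})
theory_se <- sigma / sqrt(sizes)

print(rbind(sample_size = sizes, simulated = sim_se, theoretical = theory_se))
```

The simulated values should track σ/√n and shrink as the sample size grows.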

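library(maxLik)   # maxLik(): numerical maximum likelihood estimation
library(Zelig)    # provides the turnout data set used below

# Log-likelihood of an i.i.d. normal sample x, with param = c(mu, sigma)
logLikFun <- function(param, x) {
    mu <- param[1]
    sigma <- param[2]
    sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}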
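To see what maximising a normal log-likelihood means, this minimal sketch evaluates one over a grid of candidate means while holding sigma at its closed-form MLE. The simulated sample and the helper name `loglik_normal` are illustrative assumptions. A numerical optimiser such as `maxLik()` performs the same search without a grid.

```r
# Hypothetical simulated sample, used only for illustration
set.seed(7)
x <- rnorm(100, mean = 3, sd = 1.5)

# Normal log-likelihood for param = c(mu, sigma)
loglik_normal <- function(param, x) {
  sum(dnorm(x, mean = param[1], sd = param[2], log = TRUE))
}

# Hold sigma at its closed-form MLE and scan candidate values of mu
sigma_hat <- sqrt(mean((x - mean(x))^2))
mu_grid <- seq(2, 4, by = 0.001)
ll <- sapply(mu_grid, function(m) loglik_normal(c(m, sigma_hat), x))

print(c(grid_max = mu_grid[which.max(ll)], sample_mean = mean(x)))
```

The grid maximum falls on the grid point nearest the sample mean, which is the closed-form MLE of mu.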

Example 4.14

We use MLE to find the best-fitting model for data that are assumed to be normally distributed, i.e. Y ∼ N(μ, σ), where the mean depends linearly on education: μ = β1 + β2 · educate. In the output below, the intercept is displayed as “beta1” and the slope as “beta2”. We can therefore say that income is expected to increase by about 0.376 for each one-unit increase in education, and sigma (2.526) is the estimated standard deviation of income around this line. The standard errors are small relative to the estimates themselves, which indicates that the coefficients are estimated precisely.
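Written out, the quantity that `ols.lf` returns is the normal log-likelihood of the linear model, where φ(·; m, s) denotes the normal density with mean m and standard deviation s:

```latex
\ell(\beta_1, \beta_2, \sigma)
  = \sum_{i=1}^{n} \log \phi\!\left(y_i;\ \beta_1 + \beta_2\,\mathrm{educate}_i,\ \sigma\right)
  = -\frac{n}{2}\log\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 - \beta_2\,\mathrm{educate}_i\right)^2
```

For any fixed σ, maximising this over β1 and β2 is the same as minimising the residual sum of squares, so the MLEs of the coefficients coincide with the ordinary least squares estimates, and the MLE of σ² is the residual sum of squares divided by n.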

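data(turnout)

# Log-likelihood of the normal linear model income ~ N(beta1 + beta2 * educate, sigma)
ols.lf <- function(param) {
  beta <- param[-1]                      # c(beta1 = intercept, beta2 = slope)
  sigma <- param[1]                      # standard deviation of income around the line
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)         # design matrix with a column of ones
  mu <- x %*% beta
  sum(dnorm(y, mu, sigma, log = TRUE))
}

mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)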
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256 
## 3  free parameters
## Estimates:
##       Estimate Std. error t value Pr(> t)    
## sigma  2.52613    0.03989  63.326 < 2e-16 ***
## beta1 -0.65207    0.20827  -3.131 0.00174 ** 
## beta2  0.37613    0.01663  22.612 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
coef(mle_ols)
##      sigma      beta1      beta2 
##  2.5261325 -0.6520675  0.3761334
stdEr(mle_ols)
##      sigma      beta1      beta2 
## 0.03989085 0.20827251 0.01663390
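The equivalence between these MLEs and least squares can be checked on simulated data. This is a minimal sketch, assuming a hypothetical stand-in for the turnout data (not the real data) and using base R's `optim()` in place of `maxLik()`: the coefficient estimates match `lm()`, and the MLE of sigma equals the root mean squared residual, which is slightly smaller than the residual standard error reported by `lm()` (that one divides by n - 2).

```r
# Simulated stand-in for the turnout data (hypothetical values)
set.seed(2024)
n <- 300
educate <- sample(0:20, n, replace = TRUE)
income <- -0.5 + 0.4 * educate + rnorm(n, sd = 2.5)
X <- cbind(1, educate)

# Negative normal linear-model log-likelihood, param = c(sigma, beta1, beta2)
negll <- function(param) {
  sigma <- param[1]
  beta <- param[-1]
  -sum(dnorm(income, mean = X %*% beta, sd = sigma, log = TRUE))
}

fit <- optim(c(sigma = 1, beta1 = 0, beta2 = 0), negll, method = "L-BFGS-B",
             lower = c(1e-6, -Inf, -Inf), control = list(factr = 1e4))

ols <- lm(income ~ educate)
print(rbind(mle = fit$par,
            ols = c(sqrt(mean(residuals(ols)^2)), coef(ols))))
```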

Example 4.18

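This example fits a different model: the mean of income is a single constant, μ, while the standard deviation depends linearly on education, σ = θ1 + θ2 · educate. In the output below, the intercept of this standard-deviation equation is displayed as “theta1” and the slope as “theta2”. Because theta2 ≈ 0.11, the spread (standard deviation) of income is expected to increase by about 0.11 for each one-unit increase in education; the estimated mean income (mu ≈ 3.52) does not depend on education in this model.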
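The log-likelihood that `ols.lf2` computes can be written as follows; it is only defined when every σ_i is positive:

```latex
\ell(\mu, \theta_1, \theta_2)
  = \sum_{i=1}^{n}\left[-\log\sigma_i - \tfrac{1}{2}\log(2\pi)
    - \frac{(y_i - \mu)^2}{2\sigma_i^2}\right],
  \qquad \sigma_i = \theta_1 + \theta_2\,\mathrm{educate}_i
```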

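# Constant mean mu; standard deviation modelled as sigma_i = theta1 + theta2 * educate_i
ols.lf2 <- function(param) {
  mu <- param[1]
  theta <- param[-1]                     # c(theta1 = intercept, theta2 = slope)
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- x %*% theta                   # observation-specific standard deviations
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)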
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964 
## 3  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## mu     3.516764   0.070320   50.01  <2e-16 ***
## theta1 1.461011   0.106745   13.69  <2e-16 ***
## theta2 0.109081   0.009185   11.88  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
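Whether this kind of model recovers a spread that grows with education can be checked on simulated data. A minimal sketch, assuming hypothetical data with a constant mean of 3 and a standard deviation of 1 + 0.2 · educate; base R's `optim()` (Nelder-Mead) stands in for `maxLik()`, and the helper `negll2` returns `Inf` when any σ_i would be non-positive, which Nelder-Mead tolerates.

```r
# Simulated data with constant mean and education-dependent spread (hypothetical)
set.seed(99)
n <- 2000
educate <- sample(0:20, n, replace = TRUE)
income <- rnorm(n, mean = 3, sd = 1 + 0.2 * educate)
X <- cbind(1, educate)

# Negative log-likelihood, param = c(mu, theta1, theta2)
negll2 <- function(param) {
  sigma <- as.vector(X %*% param[-1])
  if (any(sigma <= 0)) return(Inf)       # sigma_i must be positive
  -sum(dnorm(income, mean = param[1], sd = sigma, log = TRUE))
}

# Start from a constant-variance fit: mean(income), sd(income), slope 0
start <- c(mu = mean(income), theta1 = sd(income), theta2 = 0)
fit2 <- optim(start, negll2, control = list(maxit = 5000, reltol = 1e-10))
print(fit2$par)
```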

What will happen if another independent variable, such as age, is introduced? What will be the relationship between age and income?

If age is introduced into either of the above models, the estimates of the mean and sigma are likely to change, because the model now conditions on an additional variable. We should not, however, expect a strong relationship between age and income: income is arguably determined less by how old a person is than by how much education and experience he or she has, so we cannot say that income rises (or falls) every year as one grows older. The regression below, with age added, supports this: the estimated coefficient on age is slightly negative (-0.003) but not statistically significant (p = 0.348), while the coefficient on education (0.371) is almost unchanged from the MLE of beta2 in Example 4.14 (0.376).

summary(lm(income ~ educate+age, data = turnout))
## 
## Call:
## lm(formula = income ~ educate + age, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2128 -1.7471 -0.4217  1.3042 11.1256 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.446084   0.303955  -1.468    0.142    
## educate      0.371013   0.017641  21.031   <2e-16 ***
## age         -0.003183   0.003394  -0.938    0.348    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.2006 
## F-statistic: 251.8 on 2 and 1997 DF,  p-value: < 2.2e-16
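The same question can be framed as a likelihood ratio test between nested models. A minimal sketch, assuming hypothetical simulated data in which age truly has no effect on income: twice the gain in maximised log-likelihood from adding age is compared with a chi-squared distribution on 1 degree of freedom. For a single added regressor in a normal linear model, this statistic is an exact function of the regressor's t value, LR = n · log(1 + t² / (n − k)), where k is the number of coefficients in the larger model.

```r
# Simulated stand-in data in which age has no effect (hypothetical values)
set.seed(11)
n <- 2000
educate <- sample(0:20, n, replace = TRUE)
age <- sample(18:90, n, replace = TRUE)
income <- -0.5 + 0.37 * educate + rnorm(n, sd = 2.5)

reduced <- lm(income ~ educate)
full <- lm(income ~ educate + age)

# Likelihood ratio statistic from the maximised (ML) log-likelihoods
lr <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)

# Same statistic recovered from the t value of age (k = 3 coefficients)
t_age <- summary(full)$coefficients["age", "t value"]
print(c(lr = lr, via_t = n * log(1 + t_age^2 / (n - 3)), p_lr = p_lr))
```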