Using obsevations of income and education in the turnout dataset from the Zelig package, we are going to perform Max Likelihood Estimation to generate the parameters for a normal distribution under which our observed values have the maximum likelihood of occuring.


library(Zelig)
library(maxLik)
library(dplyr)
library(knitr)
library(texreg)
library(ggplot2)
data(turnout)

Income x Education

Is there a relatinoship between educaiton & income?


Let’s explore by plotting the average income for each level of educational attainment.

turnout%>%
  group_by(educate)%>%
  summarize(mean_income=mean(income))%>%
  ggplot()+
  geom_col(aes(x=educate, y=mean_income, fill=mean_income))+
  theme(legend.position="none")

Writing the likelihood function in R.

  • sigma tells us the standard deviation of income
  • beta1 is the intercept.
  • beta 2 is the slope for education
  • beta 3 is the slope for age

It appears that education is a statistically singificant predictor of income (p<.001), such that the model estimates a .37 unit increase in income for every additional unit of education. The standard deviation is estimated to remain constant at a value of 2.52.

Age is not statistically significant as a predictor of income in this model.

ols.lf <- function(param) {
  beta <- param[-1] #Regression Coefficients
  sigma <- param[1] #Standard Deviation
  y <- as.vector(turnout$income) #DV
  x <- cbind(1, turnout$educate, turnout$age) #IV
  mu <- x%*%beta #multiply matrices
  sum(dnorm(y, mu, sigma, log = TRUE)) #normal distribution(vector of observations, mean, sd)
  }    
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3=1))
summary(mle_ols)
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 16 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -4690.815 
4  free parameters
Estimates:
       Estimate Std. error t value Pr(> t)    
sigma  2.525576   0.039919  63.268  <2e-16 ***
beta1 -0.446047   0.300583  -1.484   0.138    
beta2  0.371011   0.017493  21.209  <2e-16 ***
beta3 -0.003184   0.003373  -0.944   0.345    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------


The values from our maxLik output should be identical to the output from a linear model function in R.

#Checking against linear regression
htmlreg(lm(income~educate+age, data=turnout))
Statistical models
Model 1
(Intercept) -0.45
(0.30)
educate 0.37***
(0.02)
age -0.00
(0.00)
R2 0.20
Adj. R2 0.20
Num. obs. 2000
RMSE 2.53
p < 0.001, p < 0.01, p < 0.05



What does this regression look like?

visreg::visreg(lm(income~educate, data=turnout))




Income Dispersion x Education

Is there more income inequality among people with higher education?


Let’s explore by plotting the standard deviation for income for each level of educational attainment.

turnout%>%
  group_by(educate)%>%
  summarize(income_sd=sd(income))%>%
  ggplot()+
  geom_point(aes(x=educate,y=income_sd))

We can modify the variables that we define in our Likelihood function to engineer our max likelihood estimation procedure to estimate the standard deviation of income rather than the mean value of income.

Writing the likelihood function in R.

  • mu tells us the mean income
  • theta1 is the intercept.
  • theta2 is the slope of education
  • theta3 is the slope of age

It appears that education is a statistically singificant predictor of income dispersion (p<.001), such that while holding the mean constant at 3.55, the model estimates a .13 unit increase in the deviation of income for every additional unit of education.

Age is also a statistically singificant predictor of income dispersion according to our model, such that while holding the mean constant at 3.55, the model estimates a .02 unit increase in the deviation of income for every 1 unit increase in age.

ols.lf2 <- function(param) {
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income) #DV
  x <- cbind(1, turnout$educate, turnout$age) #IV
  sigma <- x%*%theta   #multiply matrices
  sum(dnorm(y, mu, sigma, log = TRUE)) #normal distribution(vector of observations, mean, sd)
  }
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1), method="BFGS")
summary(mle_ols2)
--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 150 iterations
Return code 0: successful convergence 
Log-Likelihood: -4843.15 
4  free parameters
Estimates:
       Estimate Std. error t value  Pr(> t)    
mu     3.555011   0.069193  51.378  < 2e-16 ***
theta1 0.362114   0.204550   1.770   0.0767 .  
theta2 0.133349   0.010756  12.398  < 2e-16 ***
theta3 0.017507   0.002852   6.139 8.32e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
LS0tCnRpdGxlOiAibWF4TGlrOiBBbiBFeGVyY2lzZSBpbiBNYXggTGlrZWxpaG9vZCBFc3RpbWF0aW9uIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgoKCjxici8+Cjxici8+Cgo8aHIgd2lkdGg9IjkwJSI+CgpVc2luZyBvYnNldmF0aW9ucyBvZiBpbmNvbWUgYW5kIGVkdWNhdGlvbiBpbiB0aGUgKip0dXJub3V0KiogZGF0YXNldCBmcm9tIHRoZSAqKlplbGlnKiogcGFja2FnZSwgd2UgYXJlIGdvaW5nIHRvIHBlcmZvcm0gTWF4IExpa2VsaWhvb2QgRXN0aW1hdGlvbiB0byBnZW5lcmF0ZSB0aGUgcGFyYW1ldGVycyBmb3IgYSBub3JtYWwgZGlzdHJpYnV0aW9uIHVuZGVyIHdoaWNoIG91ciBvYnNlcnZlZCB2YWx1ZXMgaGF2ZSB0aGUgbWF4aW11bSBsaWtlbGlob29kIG9mIG9jY3VyaW5nLgoKPGhyIHdpZHRoPSI5MCUiPgoKYGBge3IsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9CmxpYnJhcnkoWmVsaWcpCmxpYnJhcnkobWF4TGlrKQpsaWJyYXJ5KGRwbHlyKQpsaWJyYXJ5KGtuaXRyKQpsaWJyYXJ5KHRleHJlZykKbGlicmFyeShnZ3Bsb3QyKQpkYXRhKHR1cm5vdXQpCmBgYAoKPGhyIHdpZHRoPSI5MCUiPgoKIyNJbmNvbWUgeCBFZHVjYXRpb24KCklzIHRoZXJlIGEgcmVsYXRpbm9zaGlwIGJldHdlZW4gZWR1Y2FpdG9uICYgaW5jb21lPwoKPGhyIHdpZHRoPSI5MCUiPgoKTGV0J3MgZXhwbG9yZSBieSBwbG90dGluZyB0aGUgYXZlcmFnZSBpbmNvbWUgZm9yIGVhY2ggbGV2ZWwgb2YgZWR1Y2F0aW9uYWwgYXR0YWlubWVudC4KCmBgYHtyfQp0dXJub3V0JT4lCiAgZ3JvdXBfYnkoZWR1Y2F0ZSklPiUKICBzdW1tYXJpemUobWVhbl9pbmNvbWU9bWVhbihpbmNvbWUpKSU+JQogIGdncGxvdCgpKwogIGdlb21fY29sKGFlcyh4PWVkdWNhdGUsIHk9bWVhbl9pbmNvbWUsIGZpbGw9bWVhbl9pbmNvbWUpKSsKICB0aGVtZShsZWdlbmQucG9zaXRpb249Im5vbmUiKQpgYGAKCgojIyNXcml0aW5nIHRoZSBsaWtlbGlob29kIGZ1bmN0aW9uIGluIFIuCgoqICoqc2lnbWEqKiB0ZWxscyB1cyB0aGUgc3RhbmRhcmQgZGV2aWF0aW9uIG9mIGluY29tZQoqICoqYmV0YTEqKiBpcyB0aGUgaW50ZXJjZXB0LgoqICoqYmV0YSAyKiogaXMgdGhlIHNsb3BlIGZvciBlZHVjYXRpb24KKiAqKmJldGEgMyoqIGlzIHRoZSBzbG9wZSBmb3IgYWdlCgpJdCBhcHBlYXJzIHRoYXQgZWR1Y2F0aW9uIGlzIGEgc3RhdGlzdGljYWxseSBzaW5naWZpY2FudCBwcmVkaWN0b3Igb2YgaW5jb21lIChwPC4wMDEpLCBzdWNoIHRoYXQgdGhlIG1vZGVsIGVzdGltYXRlcyBhIC4zNyB1bml0IGluY3JlYXNlIGluIGluY29tZSBmb3IgZXZlcnkgYWRkaXRpb25hbCB1bml0IG9mIGVkdWNhdGlvbi4gVGhlIHN0YW5kYXJkIGRldmlhdGlvbiBpcyBlc3RpbWF0ZWQgdG8gcmVtYWluIGNvbnN0YW50IGF0IGEgdmFsdWUgb2YgMi41Mi4KCkFnZSBpcyBub3Qgc3RhdGlzdGljYWxseSBzaWduaWZpY2FudCBhcyBhIHByZWRpY3RvciBvZiBpbmNvbWUgaW4gdGhpcyBtb2RlbC4KCmBgYHtyLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQpvbHMubGYgPC0gZnVuY3Rpb24ocGFyYW0pIHsKICBiZXRhIDwtIHBhcmFtWy0xXSAjUmVncmVzc2lvbiBDb2VmZmljaWVudHMKICBzaWdtYSA8LSBwYXJhbVsxXSAjU3RhbmRhcmQgRGV2aWF0aW9uCiAgeSA8LSBhcy52ZWN0b3IodHVybm91dCRpbmNvbWUpICNEVgogIHggPC0gY2JpbmQoMSwgdHVybm91dCRlZHVjYXRlLCB0dXJub3V0JGFnZSkgI0lWCiAgbXUgPC0geCUqJWJldGEgI211bHRpcGx5IG1hdHJpY2VzCiAgc3VtKGRub3JtKHksIG11LCBzaWdtYSwgbG9nID0gVFJVRSkpICNub3JtYWwgZGlzdHJpYnV0aW9uKHZlY3RvciBvZiBvYnNlcnZhdGlvbnMsIG1lYW4sIHNkKQogIH0gICAgCgoKbWxlX29scyA8LSBtYXhMaWsobG9nTGlrID0gb2xzLmxmLCBzdGFydCA9IGMoc2lnbWEgPSAxLCBiZXRhMSA9IDEsIGJldGEyID0gMSwgYmV0YTM9MSkpCnN1bW1hcnkobWxlX29scykKYGBgCgo8YnIvPgoKCiMjIyNUaGUgdmFsdWVzIGZyb20gb3VyIG1heExpayBvdXRwdXQgc2hvdWxkIGJlIGlkZW50aWNhbCB0byB0aGUgb3V0cHV0IGZyb20gYSBsaW5lYXIgbW9kZWwgZnVuY3Rpb24gaW4gUi4KCmBgYHtyIHJlc3VsdHM9J2FzaXMnfQojQ2hlY2tpbmcgYWdhaW5zdCBsaW5lYXIgcmVncmVzc2lvbgpodG1scmVnKGxtKGluY29tZX5lZHVjYXRlK2FnZSwgZGF0YT10dXJub3V0KSkKYGBgCgo8YnIvPgoKPGhyIHdpZHRoPSI5MCUiPgoKIyMjI1doYXQgZG9lcyB0aGlzIHJlZ3Jlc3Npb24gbG9vayBsaWtlPwpgYGB7cn0KdmlzcmVnOjp2aXNyZWcobG0oaW5jb21lfmVkdWNhdGUsIGRhdGE9dHVybm91dCkpCmBgYAoKCjxici8+Cjxici8+Cgo8aHIgd2lkdGg9IjkwJSI+CgojI0luY29tZSBEaXNwZXJzaW9uIHggRWR1Y2F0aW9uCgpJcyB0aGVyZSBtb3JlIGluY29tZSBpbmVxdWFsaXR5IGFtb25nIHBlb3BsZSB3aXRoIGhpZ2hlciBlZHVjYXRpb24/Cgo8aHIgd2lkdGg9IjkwJSI+CgpMZXQncyBleHBsb3JlIGJ5IHBsb3R0aW5nIHRoZSBzdGFuZGFyZCBkZXZpYXRpb24gZm9yIGluY29tZSBmb3IgZWFjaCBsZXZlbCBvZiBlZHVjYXRpb25hbCBhdHRhaW5tZW50LgoKCmBgYHtyLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQp0dXJub3V0JT4lCiAgZ3JvdXBfYnkoZWR1Y2F0ZSklPiUKICBzdW1tYXJpemUoaW5jb21lX3NkPXNkKGluY29tZSkpJT4lCiAgZ2dwbG90KCkrCiAgZ2VvbV9wb2ludChhZXMoeD1lZHVjYXRlLHk9aW5jb21lX3NkKSkKYGBgCgoKCldlIGNhbiBtb2RpZnkgdGhlIHZhcmlhYmxlcyB0aGF0IHdlIGRlZmluZSBpbiBvdXIgTGlrZWxpaG9vZCBmdW5jdGlvbiB0byBlbmdpbmVlciBvdXIgbWF4IGxpa2VsaWhvb2QgZXN0aW1hdGlvbiBwcm9jZWR1cmUgdG8gZXN0aW1hdGUgdGhlICpzdGFuZGFyZCBkZXZpYXRpb24qIG9mIGluY29tZSByYXRoZXIgdGhhbiB0aGUgbWVhbiB2YWx1ZSBvZiBpbmNvbWUuIAoKCiMjI1dyaXRpbmcgdGhlIGxpa2VsaWhvb2QgZnVuY3Rpb24gaW4gUi4KCiogKiptdSoqIHRlbGxzIHVzIHRoZSBtZWFuIGluY29tZQoqICoqdGhldGExKiogaXMgdGhlIGludGVyY2VwdC4KKiAqKnRoZXRhMioqIGlzIHRoZSBzbG9wZSBvZiBlZHVjYXRpb24KKiAqKnRoZXRhMyoqIGlzIHRoZSBzbG9wZSBvZiBhZ2UKCgpJdCBhcHBlYXJzIHRoYXQgZWR1Y2F0aW9uIGlzIGEgc3RhdGlzdGljYWxseSBzaW5naWZpY2FudCBwcmVkaWN0b3Igb2YgaW5jb21lIGRpc3BlcnNpb24gKHA8LjAwMSksIHN1Y2ggdGhhdCB3aGlsZSBob2xkaW5nIHRoZSBtZWFuIGNvbnN0YW50IGF0IDMuNTUsIHRoZSBtb2RlbCBlc3RpbWF0ZXMgYSAuMTMgdW5pdCBpbmNyZWFzZSBpbiB0aGUgZGV2aWF0aW9uIG9mIGluY29tZSBmb3IgZXZlcnkgYWRkaXRpb25hbCB1bml0IG9mIGVkdWNhdGlvbi4gCgoKQWdlIGlzIGFsc28gYSBzdGF0aXN0aWNhbGx5IHNpbmdpZmljYW50IHByZWRpY3RvciBvZiBpbmNvbWUgZGlzcGVyc2lvbiBhY2NvcmRpbmcgdG8gb3VyIG1vZGVsLCBzdWNoIHRoYXQgd2hpbGUgaG9sZGluZyB0aGUgbWVhbiBjb25zdGFudCBhdCAzLjU1LCB0aGUgbW9kZWwgZXN0aW1hdGVzIGEgLjAyIHVuaXQgaW5jcmVhc2UgaW4gdGhlIGRldmlhdGlvbiBvZiBpbmNvbWUgZm9yIGV2ZXJ5IDEgdW5pdCBpbmNyZWFzZSBpbiBhZ2UuCgpgYGB7ciwgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0KCgpvbHMubGYyIDwtIGZ1bmN0aW9uKHBhcmFtKSB7CiAgbXUgPC0gcGFyYW1bMV0KICB0aGV0YSA8LSBwYXJhbVstMV0KICB5IDwtIGFzLnZlY3Rvcih0dXJub3V0JGluY29tZSkgI0RWCiAgeCA8LSBjYmluZCgxLCB0dXJub3V0JGVkdWNhdGUsIHR1cm5vdXQkYWdlKSAjSVYKICBzaWdtYSA8LSB4JSoldGhldGEgICAjbXVsdGlwbHkgbWF0cmljZXMKICBzdW0oZG5vcm0oeSwgbXUsIHNpZ21hLCBsb2cgPSBUUlVFKSkgI25vcm1hbCBkaXN0cmlidXRpb24odmVjdG9yIG9mIG9ic2VydmF0aW9ucywgbWVhbiwgc2QpCiAgfQoKCm1sZV9vbHMyIDwtIG1heExpayhsb2dMaWsgPSBvbHMubGYyLCBzdGFydCA9IGMobXUgPSAxLCB0aGV0YTEgPSAxLCB0aGV0YTIgPSAxLCB0aGV0YTMgPSAxKSwgbWV0aG9kPSJCRkdTIikKCgpzdW1tYXJ5KG1sZV9vbHMyKQpgYGAKCgo=