knitr::opts_chunk$set(echo = TRUE)
pacman::p_load(maxLik, Zelig)
data(turnout)
head(turnout)
## race age educate income vote
## 1 white 60 14 3.3458 1
## 2 white 51 10 1.8561 0
## 3 white 24 12 0.6304 0
## 4 white 38 8 3.4183 1
## 5 white 25 12 2.7852 1
## 6 white 67 12 2.3866 1
#The data mix discrete variables (e.g., race) with continuous ones (e.g., income)
#We model the relationship between income and education
#First, write the normal log-likelihood function for the linear model
ols.lf <- function(param) {
  sigma <- param[1]              #first element: error standard deviation
  beta <- param[-1]              #remaining elements: regression coefficients
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate) #design matrix with an intercept column
  mu <- x %*% beta               #conditional mean of income
  sum(dnorm(y, mu, sigma, log = TRUE))
}
#Maximizing log-likelihood using maxLik
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
#Using the OLS lm function to replicate the same results
summary(lm(income ~ educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2028 -1.7363 -0.4273 1.3150 11.0632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207 0.21016 -3.103 0.00194 **
## educate 0.37613 0.01677 22.422 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
For maximum likelihood estimation, we specify the model's distribution, and the estimator then chooses the parameter values that maximize the likelihood (equivalently, the log-likelihood) function. In essence, we want the parameter values that maximize the likelihood of observing the data we actually measured. In the problem above, we model the conditional mean of income as a linear function of education, so the estimator finds the intercept, slope, and error standard deviation under which the observed incomes are most likely.
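Concretely, ols.lf sums log-densities, so the quantity being maximized is the normal log-likelihood

$$\ell(\beta,\sigma)=-\frac{n}{2}\log\left(2\pi\sigma^{2}\right)-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(y_{i}-x_{i}'\beta\right)^{2},$$

whose maximizing beta coincides with the OLS solution, which is why maxLik and lm agree above.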
For instance, the output returns beta1 and beta2. Beta1 is the intercept and can be read as "ceteris paribus, with zero years of education, average income is -0.65 units (in whatever units income was measured)". Beta2, our slope, can be read as "ceteris paribus, an increase in education of one year is associated with an average increase in income of about 0.376 units". Our sigma, 2.526, is the error standard deviation, which matches lm's residual standard error of 2.527 almost exactly; the tiny gap arises because the MLE divides the residual sum of squares by n while lm divides by n - 2.
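That near-agreement is easy to verify by hand; a hedged check, not part of the original chunks:

fit <- lm(income ~ educate, data = turnout)
sqrt(sum(residuals(fit)^2) / nrow(turnout)) #MLE-style sigma-hat: divide by n
sigma(fit)                                  #lm's residual standard error: divide by n - 2

Next, we flip the roles of the parameters: hold the mean fixed and let sigma vary with education.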
ols.lf2 <- function(param) {
  mu <- param[1]                 #constant mean of income
  theta <- param[-1]             #intercept and slope for the standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- x %*% theta           #sd of income now varies with education
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Here, instead of modelling the mean of income, we hold the mean constant and find the maximum-likelihood parameters for changes in the standard deviation. This lets us interpret the data differently and examine a different relationship between the variables. So now, rather than a residual standard error, the constant returned is the mean of income, mu, at about 3.52. Theta1, the new intercept, can be read as "ceteris paribus, with 0 years of education the standard deviation of income is about 1.46". The new slope, theta2, can be read as "ceteris paribus, an increase in education of one year is associated with an increase in the standard deviation of income of about 0.109". Instead of using the mean parameters to understand average changes, we can use the standard-deviation parameters to see that more education is associated with greater dispersion, and hence inequality, in income through changes in sigma.
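One practical caveat (an assumption on my part, not in the original code): a linear model for sigma can step into negative territory during optimisation, where dnorm is undefined. A minimal sketch of a common workaround is to model log(sigma) instead, so the standard deviation stays positive; note the thetas are then on the log scale.

#Variant: model log(sigma) so that sigma = exp(x %*% theta) > 0
ols.lf2.log <- function(param) {
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- exp(x %*% theta)      #exp() keeps the standard deviation strictly positive
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2_log <- maxLik(logLik = ols.lf2.log, start = c(mu = 1, theta1 = 0, theta2 = 0))
#summary(mle_ols2_log) #output omitted here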
ols.lf.age <- function(param) {
  sigma <- param[1]              #error standard deviation
  beta <- param[-1]              #intercept plus slopes for educate and age
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age) #age added to the design matrix
  mu <- x %*% beta
  sum(dnorm(y, mu, sigma, log = TRUE))
}
#Maximizing log-likelihood using maxLik
mle_ols_age <- maxLik(logLik = ols.lf.age, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))
summary(mle_ols_age)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525576 0.039919 63.268 <2e-16 ***
## beta1 -0.446047 0.300583 -1.484 0.138
## beta2 0.371011 0.017493 21.209 <2e-16 ***
## beta3 -0.003184 0.003373 -0.944 0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
#Comparing to OLS
summary(lm(income ~ educate + age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
Adding age as a predictor alongside education changes the parameter estimates. Our intercept is now -0.446, the slope for education is still about 0.37, and the age coefficient is -0.003, which would be read as "ceteris paribus, an increase in age of one year decreases income on average by 0.003 units". That effect is negligible, and in fact the age predictor is not statistically significant (nor is the intercept any longer), while educate remains highly significant!
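Because the two mean models are nested, we can also compare them with a likelihood-ratio test. A minimal sketch, relying on the logLik method that maxLik provides for its fitted objects:

#LR test: does adding age improve the fit enough to justify one extra parameter?
lr <- as.numeric(2 * (logLik(mle_ols_age) - logLik(mle_ols)))
pchisq(lr, df = 1, lower.tail = FALSE) #p-value on 1 degree of freedom

From the log-likelihoods reported above, the statistic is 2 x (4691.256 - 4690.815), roughly 0.88, giving a p-value of about 0.35, consistent with the insignificant z-test on beta3.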
ols.lf2.age <- function(param) {
  mu <- param[1]                 #constant mean of income
  theta <- param[-1]             #coefficients for the standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  sigma <- x %*% theta           #sd of income varies with education and age
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2_age <- maxLik(logLik = ols.lf2.age, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1))
summary(mle_ols2_age)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 1.0033 0.3368 2.979 0.00289 **
## theta1 0.9810 NA NA NA
## theta2 0.7662 NA NA NA
## theta3 0.1531 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Here we take a look at changes in the standard deviation with age included as a predictor. Note, however, that the optimiser did not converge cleanly: return code 3, the NA standard errors, and a log-likelihood (-7542) far below that of the simpler sigma model (-4862) all indicate the search stalled after 3 iterations, so these estimates should be read with caution. Taken at face value, the intercept is 0.98, the new slope for education is 0.766, and the slope for age, 0.153, would be interpreted as "ceteris paribus, an increase in age of one year increases the standard deviation of income by 0.153 units". Since both coefficients are positive, this hints that income dispersion rises with both age and education, but a converged fit is needed before trusting that conclusion.
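One way to pursue a converged fit, as maxLik's own message suggests, is a more robust optimisation method. A hedged sketch using BFGS, with starting values borrowed from the converged two-parameter sigma model above (mu, theta1, theta2 from mle_ols2, and theta3 started at 0):

#Retry with BFGS and informed starting values
mle_ols2_age_bfgs <- maxLik(logLik = ols.lf2.age, method = "BFGS",
                            start = c(mu = 3.5, theta1 = 1.5, theta2 = 0.1, theta3 = 0))
summary(mle_ols2_age_bfgs) #output omitted; check the return code before interpreting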