THE RELATIONSHIP BETWEEN INCOME AND EDUCATION:

Using a log-likelihood function for the normal distribution:

logLikFun <- function(param) {
    mu <- param[1]      # mean of the normal distribution
    sigma <- param[2]   # standard deviation
    # sum of log-densities of the data vector x under N(mu, sigma)
    sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
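
As a minimal sketch (not part of the original analysis), this generic function can be maximized with base R's optim() once x holds a data vector; the simulated data and starting values below are illustrative assumptions only:

x <- rnorm(200, mean = 5, sd = 2)            # assumed example data, not the turnout data
fit <- optim(par = c(mu = 0, sigma = 1),     # arbitrary starting values
             fn = logLikFun,
             control = list(fnscale = -1))   # fnscale = -1 turns optim into a maximiser
fit$par                                      # MLEs of mu and sigma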

The turnout data set, which ships with the Zelig package:

library(Zelig)
## Loading required package: survival
data("turnout")
head(turnout)
##    race age educate income vote
## 1 white  60      14 3.3458    1
## 2 white  51      10 1.8561    0
## 3 white  24      12 0.6304    0
## 4 white  38       8 3.4183    1
## 5 white  25      12 2.7852    1
## 6 white  67      12 2.3866    1
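
As a quick descriptive look at the two variables of interest (not in the original analysis), base R summaries could be used:

summary(turnout$income)     # distribution of income
summary(turnout$educate)    # distribution of years of education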

Linear regression using MLE for the mean, using the log-likelihood function:

ols.lf <- function(param) {
  beta <- param[-1]                  # regression coefficients (intercept and slope)
  sigma <- param[1]                  # residual standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)     # design matrix: intercept column plus education
  mu <- x %*% beta                   # linear predictor for the mean
  sum(dnorm(y, mu, sigma, log = TRUE))
}
A quick check of how the parameter vector is split: param[-1] returns everything but the first element (the betas), while param[1] returns the first element (sigma).

param <- c(1, 2, 3, 4, 5)
param[-1]
## [1] 2 3 4 5
param[1]
## [1] 1

Maximizing the log likelihood function:

library(maxLik)
## Loading required package: miscTools
## 
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
## 
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256 
## 3  free parameters
## Estimates:
##       Estimate Std. error t value Pr(> t)    
## sigma  2.52613    0.03989  63.326 < 2e-16 ***
## beta1 -0.65207    0.20827  -3.131 0.00174 ** 
## beta2  0.37613    0.01663  22.612 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Checking the Results Through a Linear Model:

summary(lm(income~educate, data = turnout))
## 
## Call:
## lm(formula = income ~ educate, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2028 -1.7363 -0.4273  1.3150 11.0632 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.65207    0.21016  -3.103  0.00194 ** 
## educate      0.37613    0.01677  22.422  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared:  0.201,  Adjusted R-squared:  0.2006 
## F-statistic: 502.8 on 1 and 1998 DF,  p-value: < 2.2e-16

INTERPRETATION:

In the above analysis I maximize the likelihood with respect to the mean (Slide 4.15). There are 3 free parameters. The value of sigma is 2.52613, which is the residual standard error of the model; beta1 = -0.65207 is the intercept, indicating that average income is -0.65207 at zero years of education; and beta2 = 0.37613 is the slope, indicating that a one-year increase in education is associated, on average, with an increase in income of 0.37613 units.
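
As a small illustration of this interpretation (using the estimates above; not part of the original analysis), the fitted mean income at, say, 12 years of education can be computed from the maxLik fit:

coef(mle_ols)                                          # sigma, beta1 (intercept), beta2 (slope)
# predicted mean income for a person with 12 years of education (illustrative)
coef(mle_ols)["beta1"] + coef(mle_ols)["beta2"] * 12   # roughly -0.652 + 0.376 * 12 = 3.86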

Does Education Influence Income Inequality?

Linear regression using MLE for the standard deviation, using the log-likelihood function:

ols.lf2 <- function(param) {
  mu <- param[1]                      # constant mean of income
  theta <- param[-1]                  # coefficients for the standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)      # design matrix: intercept column plus education
  sigma <- x %*% theta                # linear predictor for the standard deviation
  sum(dnorm(y, mu, sigma, log = TRUE))
}

Maximizing the log likelihood function:

library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964 
## 3  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## mu     3.516764   0.070320   50.01  <2e-16 ***
## theta1 1.461011   0.106745   13.69  <2e-16 ***
## theta2 0.109081   0.009185   11.88  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

INTERPRETATION:

In the above analysis I now look at the maximum likelihood estimates (MLE) for a model in which the standard deviation changes with education (Slide 4.19). This is a different way of analysing the relationship between the same variables. Instead of a residual standard error I get a constant mean, mu = 3.51676. Theta1 is 1.46101, the intercept of the standard deviation equation (the standard deviation of income at zero years of education), and theta2 is 0.10908, the slope, indicating that a one-year increase in education is associated with an increase of 0.10908 in the standard deviation of income. This suggests that an increase in education is associated with greater inequality in income.
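
As an illustrative follow-up (using the estimates above; not part of the original analysis), the implied standard deviation of income at a few education levels can be read straight off the fitted equation sigma = theta1 + theta2 * educate:

educ <- c(4, 8, 12, 16)                                      # assumed example education levels
coef(mle_ols2)["theta1"] + coef(mle_ols2)["theta2"] * educ   # e.g. 1.461 + 0.109 * 12 = 2.77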

THE RELATIONSHIP BETWEEN INCOME, EDUCATION AND AGE:

Adding another predictor, age, and comparing the results:

Linear regression using MLE for the mean, using the log-likelihood function:

ols.lf3 <- function(param) {
  beta <- param[-1]                            # regression coefficients
  sigma <- param[1]                            # residual standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # design matrix: intercept, education, age
  mu <- x %*% beta                             # linear predictor for the mean
  sum(dnorm(y, mu, sigma, log = TRUE))
}

Maximizing the log likelihood function:

library(maxLik)
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3=1))

summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## sigma  2.525576   0.039919  63.268  <2e-16 ***
## beta1 -0.446047   0.300583  -1.484   0.138    
## beta2  0.371011   0.017493  21.209  <2e-16 ***
## beta3 -0.003184   0.003373  -0.944   0.345    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Checking the Results Through a Linear Model:

summary(lm(income~educate+age, data = turnout))
## 
## Call:
## lm(formula = income ~ educate + age, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2128 -1.7471 -0.4217  1.3042 11.1256 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.446084   0.303955  -1.468    0.142    
## educate      0.371013   0.017641  21.031   <2e-16 ***
## age         -0.003183   0.003394  -0.938    0.348    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.2006 
## F-statistic: 251.8 on 2 and 1997 DF,  p-value: < 2.2e-16

INTERPRETATION:

In the above analysis I add age as a predictor alongside education to explain changes in income. First, I maximize the likelihood with respect to the mean. The value of sigma is 2.525576, which is the residual standard error of the model and is essentially the same as in the previous analysis. Beta1 = -0.446047 is the intercept, indicating average income at zero years of education and zero age; it differs slightly from the education-only model (-0.65207). Beta2 is 0.371011, the slope for education (almost the same as with education alone), indicating that a one-year increase in education is associated, on average, with an increase in income of 0.371011 units. Beta3 is -0.003184, the coefficient on age, indicating that a one-year increase in age is associated, on average, with a decrease in income of 0.003184 units. A decrease in income with increasing age does not make much sense to me, and in the results this predictor is not statistically significant, while education is.
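
One way to check this more formally (a sketch, not part of the original analysis) is a likelihood-ratio comparison of the education-only mean model against the education-plus-age model, using the log-likelihoods reported above (-4691.256 and -4690.815):

library(maxLik)
LR <- 2 * (logLik(mle_ols3) - logLik(mle_ols))      # about 2 * 0.44 = 0.88
pchisq(as.numeric(LR), df = 1, lower.tail = FALSE)  # roughly 0.35, consistent with age being non-significant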

Does Age Influence Income Inequality?

Linear regression using MLE for the standard deviation, using the log-likelihood function:

ols.lf4 <- function(param) {
  mu <- param[1]                               # constant mean of income
  theta <- param[-1]                           # coefficients for the standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # design matrix: intercept, education, age
  sigma <- x %*% theta                         # linear predictor for the standard deviation
  sum(dnorm(y, mu, sigma, log = TRUE))
}

Maximizing the log likelihood function:

library(maxLik)
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1,theta3=1))
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?  
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)   
## mu       1.0033     0.3368   2.979 0.00289 **
## theta1   0.9810         NA      NA      NA   
## theta2   0.7662         NA      NA      NA   
## theta3   0.1531         NA      NA      NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

INTERPRETATION:

In the above analysis I again look at the maximum likelihood estimates (MLE) for a model in which the standard deviation changes with the predictors. Mu = 1.0033 is the constant mean. Theta1 is 0.9810, the intercept of the standard deviation equation; theta2 is 0.7662, the new slope for education, indicating that a one-year increase in education is associated with an increase of 0.7662 in the standard deviation of income; and theta3 is 0.1531, the slope for age, indicating that a one-year increase in age is associated with an increase of 0.1531 in the standard deviation of income. Taken at face value, this suggests that both age and education are associated with greater inequality in income. However, the optimiser stopped with return code 3 after only 3 iterations and the standard errors of the theta parameters are NA, so these estimates have not converged properly and should be interpreted with caution.
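
As the optimiser's own message suggests, one possible follow-up (a sketch only; results not shown) is to retry the maximisation with a more robust method such as BFGS, which maxLik supports via its method argument:

library(maxLik)
mle_ols4_bfgs <- maxLik(logLik = ols.lf4, method = "BFGS",
                        start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1))
summary(mle_ols4_bfgs)   # check the return code and standard errors before interpreting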