Setting up the Data:

library(Zelig)
## Loading required package: survival
data("turnout")
head(turnout)
##    race age educate income vote
## 1 white  60      14 3.3458    1
## 2 white  51      10 1.8561    0
## 3 white  24      12 0.6304    0
## 4 white  38       8 3.4183    1
## 5 white  25      12 2.7852    1
## 6 white  67      12 2.3866    1

Writing the Log Likelihood Function:

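ols.lf <- function(param) {
  # param = c(sigma, beta1, beta2): sigma is the SD of income around the line,
  # beta1 the intercept and beta2 the slope on education
  beta <- param[-1]
  sigma <- param[1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)        # column of 1s for the intercept
  mu <- x %*% beta                      # mean income for each respondent
  sum(dnorm(y, mu, sigma, log = TRUE))  # total log-likelihood
}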
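Written out, the value `ols.lf` returns is the normal log-likelihood below. Here φ(·; m, s) is the normal density with mean m and standard deviation s (what `dnorm(..., log = TRUE)` evaluates on the log scale), β0 and β1 are `beta1` and `beta2` in the code, and x_i is years of education:

```latex
\ell(\beta_0, \beta_1, \sigma)
  = \sum_{i=1}^{n} \log \phi\left(y_i;\ \beta_0 + \beta_1 x_i,\ \sigma\right)
  = -\frac{n}{2}\log\left(2\pi\sigma^{2}\right)
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^{2}
```

For any fixed σ, only the sum of squared residuals depends on β0 and β1, so maximizing ℓ over them is the same problem as ordinary least squares.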

Slides 4.15 & 4.19 Similarities:

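Both slides use maximum likelihood estimation (MLE) to model the relationship between education and income, with education as the independent variable and income as the dependent variable. Both assume income is normally distributed; they differ in which parameter of that distribution depends on education. In 4.15 the mean of income, μ, is a linear function of education while the standard deviation, σ, is held constant. In 4.19 the mean μ is held constant while the standard deviation is a linear function of education, which is how that slide examines education's relationship with income inequality.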
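Side by side, with the second argument of N(·, ·) being the standard deviation as in `dnorm()`, the two models are:

```latex
\begin{aligned}
\text{Slide 4.15:}\quad & Y_i \sim N\left(\beta_0 + \beta_1 X_i,\ \sigma\right) \\
\text{Slide 4.19:}\quad & Y_i \sim N\left(\mu,\ \theta_0 + \theta_1 X_i\right)
\end{aligned}
```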

Code for the Relationship Between Income and Education (4.15):

library(maxLik)
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256 
## 3  free parameters
## Estimates:
##       Estimate Std. error t value Pr(> t)    
## sigma  2.52613    0.03989  63.326 < 2e-16 ***
## beta1 -0.65207    0.20827  -3.131 0.00174 ** 
## beta2  0.37613    0.01663  22.612 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Checking the results:

summary(lm(income ~ educate, data = turnout))
## 
## Call:
## lm(formula = income ~ educate, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2028 -1.7363 -0.4273  1.3150 11.0632 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.65207    0.21016  -3.103  0.00194 ** 
## educate      0.37613    0.01677  22.422  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared:  0.201,  Adjusted R-squared:  0.2006 
## F-statistic: 502.8 on 1 and 1998 DF,  p-value: < 2.2e-16
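The two fits agree on the coefficients, but `sigma` (2.52613) is slightly below lm's residual standard error (2.527). A minimal sketch of why, assuming simulated data in place of `turnout` and base R `optim()` in place of `maxLik()`: the ML estimate of σ divides the residual sum of squares by n, while `lm()` divides by n - 2. The names `loglik`, `grad`, `fit_ml` and `fit_ls` are illustrative only.

```r
# Sketch on simulated data (not the turnout data): the normal MLE reproduces
# the lm() coefficients, while its sigma divides the residual sum of squares
# by n instead of n - 2.
set.seed(42)
n <- 500
x <- rnorm(n, mean = 12, sd = 3)           # stand-in for years of education
y <- -0.6 + 0.4 * x + rnorm(n, sd = 2.5)   # stand-in for income
X <- cbind(1, x)                           # intercept column plus x

# Log-likelihood in terms of s = log(sigma), so sigma always stays positive
loglik <- function(p) {
  sigma <- exp(p[1])
  r <- y - X %*% p[-1]
  sum(dnorm(r, 0, sigma, log = TRUE))
}

# Analytic gradient with respect to (s, intercept, slope)
grad <- function(p) {
  sigma <- exp(p[1])
  r <- as.vector(y - X %*% p[-1])
  c(-n + sum(r^2) / sigma^2, as.vector(crossprod(X, r)) / sigma^2)
}

fit_ml <- optim(c(log(sd(y)), mean(y), 0), loglik, grad, method = "BFGS",
                control = list(fnscale = -1, reltol = 1e-12, maxit = 500))
fit_ls <- lm(y ~ x)

beta_ml  <- fit_ml$par[-1]
sigma_ml <- exp(fit_ml$par[1])
sigma_ls <- summary(fit_ls)$sigma

print(rbind(ML = beta_ml, OLS = unname(coef(fit_ls))))
print(c(sigma_ml = sigma_ml, sigma_lm = sigma_ls,
        sigma_lm_rescaled = sigma_ls * sqrt((n - 2) / n)))
```

Rescaling lm's residual standard error by sqrt((n - 2)/n) recovers the ML sigma; with n = 2000 in the turnout data that factor is about 0.9995, consistent with the small gap between 2.527 and 2.526.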

Slide 4.15: Relationship between Income and Education:

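To determine the relationship between education and income we used the model Y ∼ N(β0 + β1X, σ), where Y is income, X is years of education, and σ is the standard deviation of income around the regression line. In the code, the three parameters are sigma, beta1, and beta2, where beta1 corresponds to β0 and beta2 to β1. The output shows sigma = 2.526 (the residual standard deviation), beta1 = -0.652 (the intercept: predicted mean income at zero years of education), and beta2 = 0.376 (the slope on education). The coefficients match the lm() estimates to the precision shown.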

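The intercept (beta1) implies a predicted mean income of -0.652 units for someone with 0 years of education. Because a negative income is not meaningful, the intercept mainly anchors the fitted line. The slope (beta2) is more informative: each additional year of education is associated with a 0.376-unit increase in predicted mean income (t = 22.6, p < 2e-16).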
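To make the coefficients concrete, this small sketch plugs the printed 4.15 estimates into the fitted line at a few education levels. The levels 0, 12 and 16 are chosen only for illustration.

```r
# Predicted mean income from the slide 4.15 estimates printed above
b0 <- -0.65207          # 'beta1' in the code: intercept
b1 <- 0.37613           # 'beta2' in the code: change per year of education
educ <- c(0, 12, 16)    # illustrative years of education
pred <- b0 + b1 * educ
print(data.frame(educate = educ, predicted_income = pred))
```

At 12 and 16 years the fitted means are about 3.861 and 5.366 units, so four extra years of education correspond to roughly 1.505 more units of predicted income.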

This indicates a positive association between education and income: on average, respondents with more years of education report higher incomes.

Writing the Log Likelihood Function (4.19):

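ols.lf2 <- function(param) {
  # param = c(mu, theta1, theta2): mu is the constant mean of income,
  # theta1 and theta2 the intercept and education slope of its SD
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- x %*% theta   # SD of income for each respondent (must stay > 0)
  sum(dnorm(y, mu, sigma, log = TRUE))
}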
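Nothing in `ols.lf2` stops θ0 + θ1X from becoming zero or negative while the optimizer searches, in which case `dnorm()` returns `NaN` with a warning. A minimal sketch of one guard, assuming simulated data in place of `turnout`, a hypothetical function name `sd_lf`, and base R `optim()` instead of `maxLik()`: return `-Inf` for any parameter value that implies a non-positive standard deviation, and maximize with Nelder-Mead, which tolerates non-finite values at trial points.

```r
# Sketch on simulated data: a 4.19-style model (constant mean, SD linear in x)
# whose log-likelihood returns -Inf when any implied SD is not positive.
set.seed(7)
n <- 2000
x <- sample(0:20, n, replace = TRUE)       # stand-in for education
y <- 3.5 + rnorm(n, sd = 1.5 + 0.1 * x)    # constant mean, SD grows with x

sd_lf <- function(param, y, x) {
  mu <- param[1]
  sigma <- as.vector(cbind(1, x) %*% param[-1])   # theta1 + theta2 * x
  if (any(sigma <= 0)) return(-Inf)               # outside the parameter space
  sum(dnorm(y, mu, sigma, log = TRUE))
}

# Nelder-Mead (optim's default method) with fnscale = -1 to maximize
fit <- optim(c(mu = 1, theta1 = 1, theta2 = 0), sd_lf, y = y, x = x,
             control = list(fnscale = -1, maxit = 5000, reltol = 1e-10))
print(fit$par)
```

Returning `-Inf` keeps θ1 and θ2 on the same scale as in the slide; modeling log σ instead would also keep σ positive but would change what the θ coefficients mean.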

Code for education’s influence on income inequality (4.19):

library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964 
## 3  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## mu     3.516764   0.070320   50.01  <2e-16 ***
## theta1 1.461011   0.106745   13.69  <2e-16 ***
## theta2 0.109081   0.009185   11.88  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Slide 4.19: Does Education Influence Income Inequality?

To model the relationship between income and education we used Y ∼ N(β0 + β1X, σ). To examine whether education is related to income inequality, we instead use Y ∼ N(μ, θ0 + θ1X), which holds the mean of income constant and lets its standard deviation vary linearly with education. In the code, the three parameters are mu, theta1, and theta2, where theta1 corresponds to θ0 and theta2 to θ1. The output shows mu = 3.517 (the constant mean income), theta1 = 1.461 (the standard deviation of income at education = 0), and theta2 = 0.109 (the slope of the standard deviation on education).

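Each additional year of education is associated with a 0.109-unit increase in the standard deviation of income (theta2; t = 11.88, p < 2e-16), so incomes are more spread out among more-educated respondents. This is evidence that education is associated with income inequality. Because the model holds mean income fixed, even though slide 4.15 shows that mean income rises with education, the estimate is best read as descriptive rather than as a causal effect.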
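This sketch plugs the printed 4.19 estimates into θ0 + θ1X to show the implied spread of income at a few education levels (0, 12 and 16 are illustrative choices).

```r
# Implied standard deviation of income from the slide 4.19 estimates above
th0 <- 1.461011         # 'theta1' in the code: SD at educate = 0
th1 <- 0.109081         # 'theta2' in the code: change in SD per year of education
educ <- c(0, 12, 16)    # illustrative years of education
implied_sd <- th0 + th1 * educ
print(data.frame(educate = educ, implied_sd = implied_sd))
```

Under this model, the standard deviation of income at 16 years of education (about 3.21) is a little over twice that at 0 years (about 1.46).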

Adding another variable, Age:

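ols.lf3 <- function(param) {
  # param = c(sigma, beta1, beta2, beta3): residual SD, intercept,
  # education slope, and age slope
  beta <- param[-1]
  sigma <- param[1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  mu <- x %*% beta
  sum(dnorm(y, mu, sigma, log = TRUE))
}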
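`ols.lf3` repeats `ols.lf` with one more column in `x`. A minimal sketch of a way to avoid rewriting the function for each new variable, using a hypothetical helper `make_normal_lf()` built on base R's `model.frame()` and `model.matrix()`, and simulated data with the same column names as `turnout`:

```r
# Hypothetical helper (not part of the original code): build an ols.lf-style
# log-likelihood from a formula, so adding age only changes the formula.
make_normal_lf <- function(formula, data) {
  mf <- model.frame(formula, data)
  y <- model.response(mf)
  X <- model.matrix(formula, mf)   # column of 1s plus one column per predictor
  function(param) {
    sigma <- param[1]              # same parameter order as ols.lf3
    mu <- X %*% param[-1]
    sum(dnorm(y, mu, sigma, log = TRUE))
  }
}

# Simulated stand-in for turnout, using the same column names
set.seed(3)
sim <- data.frame(educate = sample(0:19, 200, replace = TRUE),
                  age = sample(18:90, 200, replace = TRUE))
sim$income <- -0.4 + 0.37 * sim$educate + rnorm(200, sd = 2.5)

lf <- make_normal_lf(income ~ educate + age, sim)
p <- c(sigma = 2.5, beta1 = -0.4, beta2 = 0.37, beta3 = 0)
print(lf(p))
```

With the real data, `make_normal_lf(income ~ educate + age, turnout)` could be passed as the `logLik` argument of `maxLik()` in place of `ols.lf3`.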

Adding Age to the Code on 4.15:

library(maxLik)
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## sigma  2.525576   0.039919  63.268  <2e-16 ***
## beta1 -0.446047   0.300583  -1.484   0.138    
## beta2  0.371011   0.017493  21.209  <2e-16 ***
## beta3 -0.003184   0.003373  -0.944   0.345    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Checking the results:

summary(lm(income ~ educate+age, data = turnout))
## 
## Call:
## lm(formula = income ~ educate + age, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2128 -1.7471 -0.4217  1.3042 11.1256 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.446084   0.303955  -1.468    0.142    
## educate      0.371013   0.017641  21.031   <2e-16 ***
## age         -0.003183   0.003394  -0.938    0.348    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.2006 
## F-statistic: 251.8 on 2 and 1997 DF,  p-value: < 2.2e-16

Results: Age added to 4.15

For this part, age was added as a second independent variable to see whether it is related to income once education is taken into account. In the code, four parameters are specified: sigma, beta1, beta2, and beta3. The output shows sigma = 2.526 (the residual standard deviation), beta1 = -0.446 (the intercept: predicted mean income at zero education and age zero), beta2 = 0.371 (the slope on education), and beta3 = -0.003 (the slope on age).

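Holding education constant, each additional year of age is associated with a 0.003-unit decrease in predicted mean income. This estimate is small and not statistically significant (t = -0.944, p = 0.345), while the education slope barely changes (from 0.376 to 0.371) and remains highly significant. Adding age therefore adds little to the model.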
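Because the 4.15 model is the age model with beta3 fixed at 0, a likelihood-ratio test is another way to check the age coefficient. This sketch plugs in the two log-likelihoods printed by `maxLik()` and compares twice their difference with a chi-squared distribution with one degree of freedom (one restriction).

```r
# Likelihood-ratio test of adding age, using the log-likelihoods printed above
ll_restricted <- -4691.256   # slide 4.15 model (education only)
ll_full       <- -4690.815   # education + age
lr_stat <- 2 * (ll_full - ll_restricted)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(LR = lr_stat, p = p_value))
```

The statistic is 0.882 with p ≈ 0.35, in line with the Wald test on beta3 in the summary table.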

Adding Age to the Code on 4.19:

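ols.lf4 <- function(param) {
  # param = c(mu, theta1, theta2, theta3): constant mean of income, then the
  # intercept, education slope, and age slope of its SD
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  sigma <- x %*% theta   # SD of income for each respondent (must stay > 0)
  sum(dnorm(y, mu, sigma, log = TRUE))
}
library(maxLik)
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1))
summary(mle_ols4)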
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?  
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)   
## mu       1.0033     0.3368   2.979 0.00289 **
## theta1   0.9810         NA      NA      NA   
## theta2   0.7662         NA      NA      NA   
## theta3   0.1531         NA      NA      NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Results: Age added to 4.19

For this part, age was added as another independent variable to see whether it is related to income inequality. The code specifies four parameters: mu, theta1, theta2, and theta3. The printed estimates are mu = 1.003 (constant mean income), theta1 = 0.981 (standard deviation at education = 0 and age = 0), theta2 = 0.766 (education slope), and theta3 = 0.153 (age slope).

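However, these estimates should not be interpreted. The optimizer stopped after 3 iterations with return code 3 ("Last step could not find a value above the current"), the standard errors for all three theta parameters are NA, and mu and theta1 barely moved from their starting values of 1. The log-likelihood of -7542.444 is also far below the -4861.964 of the slide 4.19 model; since that model is this one with theta3 fixed at 0, a true maximum here could not be lower than -4861.964. The search therefore did not reach the maximum likelihood estimates, and the model would need to be re-estimated (for example with better starting values or a more robust optimization method, as the output suggests) before drawing any conclusion about whether age influences income inequality.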
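One remedy is to start the larger model from the smaller model's estimates with the new slope set to zero, so the search begins at a point at least as good as the nested fit. A minimal sketch under stated assumptions: simulated data in place of `turnout`, base R `optim()` with Nelder-Mead rather than `maxLik()`, and a hypothetical `sd_lf()` that returns `-Inf` when any implied standard deviation is not positive.

```r
# Sketch on simulated data (NOT turnout): fit the nested SD model first, then
# start the larger model from those estimates with the age slope set to 0.
set.seed(11)
n <- 2000
d <- data.frame(educate = sample(0:19, n, replace = TRUE),
                age = sample(18:90, n, replace = TRUE))
d$income <- 3.5 + rnorm(n, sd = 1.5 + 0.1 * d$educate)  # age has no true effect

sd_lf <- function(param, y, X) {
  mu <- param[1]
  sigma <- as.vector(X %*% param[-1])
  if (any(sigma <= 0)) return(-Inf)   # reject sigma <= 0 instead of NaN
  sum(dnorm(y, mu, sigma, log = TRUE))
}
ctrl <- list(fnscale = -1, maxit = 10000, reltol = 1e-12)

X2 <- cbind(1, d$educate)
fit2 <- optim(c(mean(d$income), sd(d$income), 0), sd_lf,
              y = d$income, X = X2, control = ctrl)

X3 <- cbind(1, d$educate, d$age)
fit3 <- optim(c(fit2$par, 0), sd_lf, y = d$income, X = X3, control = ctrl)

# The larger model contains the smaller one (age slope = 0), so a correct
# maximum can never have a lower log-likelihood.
print(c(nested = fit2$value, with_age = fit3$value))
```

With the real data, the analogous step would be passing `c(coef(mle_ols2), theta3 = 0)` as `start` to `maxLik()`; its return code and standard errors should still be checked before interpreting the estimates.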