What is the relationship between age, education, and income?

Slide 4.15 - The Relationship Between Education and Income (Mean)

MODEL 1

In the results below, beta1 (y-intercept) = -0.65207 and beta2 (slope) = 0.37613. The beta1 value is the predicted mean income when education = 0: -0.65207. The beta2 value shows that for every additional year of education, mean income increases by 0.37613 units. Sigma = 2.52613 is the estimated standard deviation of the residuals. This result suggests there is a positive relationship between education and income: those with more education have a higher mean income.

Zelig package

library(Zelig)

Turnout data

data(turnout)
head(turnout)

Log likelihood function

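ols.lf <- function(param) {
  # param = c(sigma, beta1, beta2): error SD first, then the coefficients
  beta <- param[-1]
  sigma <- param[1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)        # intercept column plus education
  mu <- x %*% beta                      # mean income, linear in education
  sum(dnorm(y, mu, sigma, log = TRUE))  # normal log-likelihood
}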
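
For reference, the quantity returned by ols.lf can be written out explicitly. The notation is introduced here for illustration only: y_i is income, x_i is years of education, n is the number of observations, and phi is the normal density. The function sums normal log-densities with mean beta1 + beta2 * x_i and standard deviation sigma:

```latex
\ell(\sigma, \beta_1, \beta_2)
  = \sum_{i=1}^{n} \log \phi\!\left(y_i \mid \beta_1 + \beta_2 x_i,\ \sigma\right)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 - \beta_2 x_i\right)^2
```

For any fixed sigma, the last term is the residual sum of squares multiplied by a negative constant, so the beta values that maximize the log likelihood are the least-squares estimates. This is why the results can be checked against lm().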

Maximize the log likelihood function

library(maxLik)
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256 
## 3  free parameters
## Estimates:
##       Estimate Std. error t value Pr(> t)    
## sigma  2.52613    0.03989  63.326 < 2e-16 ***
## beta1 -0.65207    0.20827  -3.131 0.00174 ** 
## beta2  0.37613    0.01663  22.612 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Checking Results

summary(lm(income ~ educate, data = turnout))
## 
## Call:
## lm(formula = income ~ educate, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2028 -1.7363 -0.4273  1.3150 11.0632 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.65207    0.21016  -3.103  0.00194 ** 
## educate      0.37613    0.01677  22.422  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared:  0.201,  Adjusted R-squared:  0.2006 
## F-statistic: 502.8 on 1 and 1998 DF,  p-value: < 2.2e-16
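
The sigma reported by maxLik (2.52613) is slightly smaller than the residual standard error reported by lm (2.527). A likely explanation is that maximum likelihood divides the residual sum of squares by n, while lm divides by the residual degrees of freedom (n - 2 here). The sketch below illustrates the difference on simulated stand-in data, not the turnout data. The variable names and simulation parameters are assumptions chosen for illustration.

```r
# Simulated stand-in data (NOT the turnout data); all names are illustrative
set.seed(1)
n <- 2000
educate_sim <- runif(n, 0, 19)
income_sim <- -0.65 + 0.38 * educate_sim + rnorm(n, sd = 2.5)

fit <- lm(income_sim ~ educate_sim)
rss <- sum(resid(fit)^2)          # residual sum of squares

sigma_mle <- sqrt(rss / n)        # maximum likelihood: divide by n
sigma_rse <- sqrt(rss / (n - 2))  # lm's residual standard error: divide by n - 2

c(mle = sigma_mle, rse = sigma_rse, lm_reported = summary(fit)$sigma)
```

With 2000 observations the two divisors are nearly equal, so the estimates differ only in the third or fourth decimal place.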

Slide 4.19 - The Relationship Between Education and Income (Standard Deviation)

MODEL 2

In the results below, theta1 (y-intercept) = 1.461011 and theta2 (slope) = 0.109081. Mu = 3.516764 is the estimated mean income. The theta1 value is the standard deviation of income when education = 0: 1.461011. The theta2 value shows that for every additional year of education, the standard deviation of income increases by 0.109081 units. This result suggests there is a positive relationship between education and income inequality: those with more education have more variation in income.

The likelihood function

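ols.lf2 <- function(param) {
  # param = c(mu, theta1, theta2): constant mean first, then the SD coefficients
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)        # intercept column plus education
  sigma <- x %*% theta                  # SD of income, linear in education
  sum(dnorm(y, mu, sigma, log = TRUE))  # normal log-likelihood
}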

The MLE results

library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964 
## 3  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## mu     3.516764   0.070320   50.01  <2e-16 ***
## theta1 1.461011   0.106745   13.69  <2e-16 ***
## theta2 0.109081   0.009185   11.88  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
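
To make the theta estimates concrete, the fitted standard deviation at a given education level is theta1 + theta2 * educate. The sketch below plugs in the estimates from the output above. The education levels (8, 12, and 16 years) are chosen here for illustration only.

```r
# Estimates copied from summary(mle_ols2) above
theta1 <- 1.461011
theta2 <- 0.109081

# Illustrative education levels (an assumption, not part of the original analysis)
educ_years <- c(8, 12, 16)

# Fitted standard deviation of income: sigma = theta1 + theta2 * educate
sd_income <- theta1 + theta2 * educ_years
data.frame(educate = educ_years, sd_income = sd_income)
```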

The Relationship Between Age, Education, and Income (Mean)

MODEL 3

Pre-analysis: I hypothesize that age and income will positively correlate with education. As people get older, they get more education and earn more. However, I expect age to negatively correlate with income after a certain age (perhaps 65).

Post-analysis: In the results below, beta1 (y-intercept) = -0.446047, beta2 (education slope) = 0.371011, and beta3 (age slope) = -0.003184. The beta1 value is the predicted mean income when education = 0 and age = 0: -0.446047. The beta2 value shows that for every additional year of education, holding age constant, mean income increases by 0.371011 units. The beta3 value shows that for every additional year of age, holding education constant, mean income decreases by 0.003184 units. Sigma = 2.525576 is the estimated standard deviation of the residuals. This result suggests a positive relationship between education and income and a negative relationship between age and income: those with more education have a higher mean income, and older people have a lower mean income. However, with a p-value of 0.345, the relationship between age and mean income is not statistically significant.

Log likelihood function

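ols.lf3 <- function(param) {
  # param = c(sigma, beta1, beta2, beta3): error SD first, then the coefficients
  beta <- param[-1]
  sigma <- param[1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # intercept, education, age
  mu <- x %*% beta                             # mean income, linear in both
  sum(dnorm(y, mu, sigma, log = TRUE))         # normal log-likelihood
}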

Maximize the log likelihood function

mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))

summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## sigma  2.525576   0.039919  63.268  <2e-16 ***
## beta1 -0.446047   0.300583  -1.484   0.138    
## beta2  0.371011   0.017493  21.209  <2e-16 ***
## beta3 -0.003184   0.003373  -0.944   0.345    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

Checking Results

summary(lm(income ~ educate + age, data = turnout))
## 
## Call:
## lm(formula = income ~ educate + age, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2128 -1.7471 -0.4217  1.3042 11.1256 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.446084   0.303955  -1.468    0.142    
## educate      0.371013   0.017641  21.031   <2e-16 ***
## age         -0.003183   0.003394  -0.938    0.348    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.2006 
## F-statistic: 251.8 on 2 and 1997 DF,  p-value: < 2.2e-16

The Relationship Between Age, Education, and Income (Standard Deviation)

MODEL 4

Pre-analysis: I hypothesize that age and income will positively correlate with education. As people get older, they get more education and earn more. However, I expect age to negatively correlate with income after a certain age (perhaps 65).

Post-analysis: In the results below, theta1 (y-intercept) = 0.362114, theta2 (education slope) = 0.133349, and theta3 (age slope) = 0.017507. In this analysis, Mu = 3.555011 is the estimated mean income. The theta1 value is the standard deviation of income when education = 0 and age = 0: 0.362114. The theta2 value shows that for every additional year of education, holding age constant, the standard deviation of income increases by 0.133349 units. The theta3 value shows that for every additional year of age, holding education constant, the standard deviation of income increases by 0.017507 units. This result suggests a positive relationship between education and income variation and a positive relationship between age and income variation: those with more education have a higher standard deviation of income, and older people have a higher standard deviation of income. The p-values for theta2 and theta3 show that both slopes are statistically significant; the intercept theta1 (p = 0.0767) is significant only at the 0.1 level.

The likelihood function

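ols.lf4 <- function(param) {
  # param = c(mu, theta1, theta2, theta3): constant mean first, then the SD coefficients
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # intercept, education, age
  sigma <- x %*% theta                         # SD of income, linear in both
  sum(dnorm(y, mu, sigma, log = TRUE))         # normal log-likelihood
}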

The MLE Results

mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1), method = "bfgs")
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## BFGS maximization, 150 iterations
## Return code 0: successful convergence 
## Log-Likelihood: -4843.15 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value  Pr(> t)    
## mu     3.555011   0.069193  51.378  < 2e-16 ***
## theta1 0.362114   0.204550   1.770   0.0767 .  
## theta2 0.133349   0.010756  12.398  < 2e-16 ***
## theta3 0.017507   0.002852   6.139 8.32e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
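
One caveat with modeling sigma as a linear function is that nothing stops the optimizer from trying parameter values that make sigma zero or negative, where the normal density is undefined. A common alternative, which is not what the code above does, is to model log(sigma) linearly so that sigma = exp(x %*% gamma) is always positive. The sketch below shows the idea with base R's optim on simulated stand-in data. The data, the gamma parameter names, and the starting values are all assumptions for illustration.

```r
# Simulated stand-in data (NOT the turnout data); all names are illustrative
set.seed(42)
n <- 1000
x1 <- runif(n, 0, 19)
X <- cbind(1, x1)
y <- rnorm(n, mean = 3.5, sd = exp(0.4 + 0.05 * x1))

# Negative log-likelihood with a log link on sigma (optim minimizes by default)
negll <- function(param) {
  mu <- param[1]
  gamma <- param[-1]
  sigma <- exp(drop(X %*% gamma))  # always positive by construction
  -sum(dnorm(y, mu, sigma, log = TRUE))
}

start <- c(mu = 1, gamma1 = 0, gamma2 = 0)
fit <- optim(start, negll, method = "BFGS")

fit$par                                     # mu and the log-scale SD coefficients
sigma_hat <- exp(drop(X %*% fit$par[-1]))   # fitted SD for each observation
range(sigma_hat)
```

Under this parameterization the coefficients are on the log scale, so each gamma slope describes an approximate proportional change in the standard deviation rather than a change in units of income.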

Visualization of Age and Income

The following plot shows that income increases with age at first, then starts to decrease at around age 50. I hypothesized that income would increase with age and then decrease after age 65, an assumption based on the retirement age. Perhaps people are retiring earlier than expected.

library(ggplot2)
ggplot(turnout, aes(x = age, y = income)) +
  geom_point() +
  geom_smooth()
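
A pattern in which income rises and then falls with age cannot be captured by a single linear age term. One way to examine it, offered as a suggestion rather than part of the original analysis, is to add a squared age term and compute the age at which fitted income peaks. The sketch below uses simulated stand-in data with a built-in peak at age 50. The data and all variable names are assumptions for illustration.

```r
# Simulated stand-in data (NOT the turnout data); all names are illustrative
set.seed(123)
n <- 2000
age_sim <- runif(n, 17, 90)
# Simulated income peaks at age 50, since 0.2 / (2 * 0.002) = 50
income_sim <- 1 + 0.2 * age_sim - 0.002 * age_sim^2 + rnorm(n, sd = 2)

# Quadratic regression: income = b0 + b1 * age + b2 * age^2
fit_quad <- lm(income_sim ~ age_sim + I(age_sim^2))
b <- coef(fit_quad)

# The turning point of a fitted quadratic is -b1 / (2 * b2)
peak_age <- unname(-b["age_sim"] / (2 * b["I(age_sim^2)"]))
peak_age
```

With the turnout data, the analogous model would be lm(income ~ educate + age + I(age^2), data = turnout). A negative coefficient on the squared term would be consistent with income declining at older ages.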