Using Maximum Likelihood Estimation to Examine the Factors That Influence One's Income

Introduction

In this analysis, we examine two sets of factors that affect an individual's income. For each set we fit two models with Maximum Likelihood Estimation (MLE): one in which the predictors enter the mean of income, and one in which they enter the standard deviation. The first set uses education alone. The second set uses both education and age.

Our Analysis

In this step, I used the data function to load the "turnout" dataset, which ships with the Zelig package. I then used the head and tail commands to view the first six and the last six rows and confirm that the data loaded correctly (that output is not shown here).

library(Zelig)
library(maxLik)
data("turnout")

This function sets up the MLE. It returns the normal log-likelihood of income when the mean of income is a linear function of education, so we can see the effect of education on income.

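ols.lf <- function(param) {
  # param = (sigma, intercept, slope on education)
  sigma <- param[1]
  beta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)        # design matrix with an intercept column
  mu <- x %*% beta                      # mean of income depends on education
  sum(dnorm(y, mu, sigma, log = TRUE))  # normal log-likelihood
}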

mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
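
For reference, the quantity that ols.lf returns can be written out as follows. This is a sketch under the assumption that incomes are independent and normally distributed around a linear function of education. The symbols y_i (income), x_i (years of education), and n (number of respondents) are notation introduced here and do not appear in the code.

```latex
\ell(\beta_1, \beta_2, \sigma)
  = \sum_{i=1}^{n} \log f\!\left(y_i \mid \beta_1 + \beta_2 x_i,\ \sigma\right)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^{2}\right)
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left(y_i - \beta_1 - \beta_2 x_i\right)^{2}
```

For any fixed sigma, the only term that depends on beta1 and beta2 is the sum of squared residuals, so maximizing the log-likelihood over the betas is the same as minimizing that sum.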

Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a model from a dataset. In other words, MLE looks for the parameter values under which the observed data are most probable, and those values are then taken to describe the population the data came from. In the problem below, we choose the parameters that maximize the likelihood of the observed incomes, with the mean of income modeled as a linear function of education. In the output, beta1 is the intercept. It suggests that with zero years of education, mean income would be about -0.65 units, which should be read as an extrapolation rather than a realistic income. Beta2 is the slope. It suggests that one additional year of schooling is associated with an increase of about 0.38 units in mean income. The cell labeled LM shows the same coefficients. This agreement is expected rather than a coincidence: under normally distributed errors, the MLE of the regression coefficients coincides with the ordinary least squares estimates. The standard errors differ slightly because the MLE of sigma divides the residual sum of squares by n, while lm divides by n - 2.

summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256 
## 3  free parameters
## Estimates:
##       Estimate Std. error t value Pr(> t)    
## sigma  2.52613    0.03989  63.326 < 2e-16 ***
## beta1 -0.65207    0.20827  -3.131 0.00174 ** 
## beta2  0.37613    0.01663  22.612 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

LM

summary(lm(income ~ educate, data = turnout))
## 
## Call:
## lm(formula = income ~ educate, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2028 -1.7363 -0.4273  1.3150 11.0632 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.65207    0.21016  -3.103  0.00194 ** 
## educate      0.37613    0.01677  22.422  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared:  0.201,  Adjusted R-squared:  0.2006 
## F-statistic: 502.8 on 1 and 1998 DF,  p-value: < 2.2e-16
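
The agreement between the two outputs above can be reproduced on made-up data. The sketch below is not the author's code: it uses simulated values instead of the turnout dataset, base R's optim instead of maxLik, and a log parameterization of sigma so that the optimizer never tries a negative standard deviation. All variable names and the coefficients used to simulate the data are assumptions chosen for illustration.

```r
# Simulated data (hypothetical values, not the turnout dataset)
set.seed(42)
n <- 500
educ <- runif(n, 0, 19)
income <- -0.5 + 0.4 * educ + rnorm(n, sd = 2.5)

# Negative log-likelihood of a normal linear model.
# param = (intercept, slope, log(sigma)); exp() keeps sigma positive.
negll <- function(param) {
  mu <- param[1] + param[2] * educ
  sigma <- exp(param[3])
  -sum(dnorm(income, mean = mu, sd = sigma, log = TRUE))
}

# Start from the sample mean and sample standard deviation of income
start <- c(mean(income), 0, log(sd(income)))
fit_mle <- optim(start, negll, method = "BFGS",
                 control = list(reltol = 1e-12, maxit = 500))
fit_lm <- lm(income ~ educ)

beta_mle <- fit_mle$par[1:2]
sigma_mle <- exp(fit_mle$par[3])
sigma_ols <- summary(fit_lm)$sigma  # residual standard error, divides by n - 2

# Coefficients from the two approaches, side by side
print(rbind(mle = beta_mle, ols = unname(coef(fit_lm))))

# The MLE of sigma matches the OLS value once rescaled from n - 2 to n
print(c(sigma_mle = sigma_mle,
        rescaled_ols = sigma_ols * sqrt((n - 2) / n)))
```

The same rescaling links the two outputs above: 2.527 multiplied by sqrt(1998 / 2000) is approximately 2.526, which is the sigma reported by maxLik.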

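In this section, we examine the effect of education on income through the standard deviation. The mean of income is held at a single value, mu, and the standard deviation is modeled as a linear function of education.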

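ols.lf2 <- function(param) {
  # param = (mu, intercept for sigma, slope of sigma on education)
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)        # design matrix with an intercept column
  sigma <- x %*% theta                  # standard deviation depends on education
  sum(dnorm(y, mu, sigma, log = TRUE))  # normal log-likelihood
}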

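In the output below, theta2 is about 0.11 and statistically significant. This suggests that the spread of income widens with education: each additional year of schooling is associated with an increase of about 0.11 units in the standard deviation of income. Because the mean is fixed at a single value (mu, about 3.52), this model by itself says nothing about whether more educated people earn more on average. That question is answered by the mean model in the previous section.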

mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964 
## 3  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## mu     3.516764   0.070320   50.01  <2e-16 ***
## theta1 1.461011   0.106745   13.69  <2e-16 ***
## theta2 0.109081   0.009185   11.88  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
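
To make the theta estimates concrete, the short calculation below plugs them into the linear formula for sigma. The three education levels are illustrative values chosen here. Only the two coefficients come from the output above.

```r
# Estimates copied from the summary(mle_ols2) output
theta1 <- 1.461011  # intercept of the standard deviation
theta2 <- 0.109081  # change in the standard deviation per year of education

# Illustrative education levels (chosen for this example)
educ_years <- c(0, 12, 16)

# Implied standard deviation of income: sigma = theta1 + theta2 * education
implied_sd <- theta1 + theta2 * educ_years
print(data.frame(educ_years, implied_sd))
```

Under this model, the implied standard deviation of income goes from roughly 1.46 units at zero years of education to roughly 3.21 units at sixteen years.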

In this section, we look at the effects of age and education on income through the mean.

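ols.lf3 <- function(param) {
  # param = (sigma, intercept, slope on education, slope on age)
  sigma <- param[1]
  beta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # intercept, education, age
  mu <- x %*% beta                             # mean depends on both predictors
  sum(dnorm(y, mu, sigma, log = TRUE))         # normal log-likelihood
}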

For this analysis, we added age as a second predictor alongside education to investigate their effects on income. As in the previous analysis, the slope for education is about 0.37 units of income per year of schooling. The slope for age, however, is about -0.003 with a p-value of 0.345, so age is not a statistically significant predictor of income in this model. The intercept, about -0.45, is no longer significantly different from zero either. To double-check our findings, see the cell titled LM2, where lm gives essentially the same estimates.

mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)    
## sigma  2.525576   0.039919  63.268  <2e-16 ***
## beta1 -0.446047   0.300583  -1.484   0.138    
## beta2  0.371011   0.017493  21.209  <2e-16 ***
## beta3 -0.003184   0.003373  -0.944   0.345    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------

LM2

summary(lm(income ~ educate + age, data = turnout))
## 
## Call:
## lm(formula = income ~ educate + age, data = turnout)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2128 -1.7471 -0.4217  1.3042 11.1256 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.446084   0.303955  -1.468    0.142    
## educate      0.371013   0.017641  21.031   <2e-16 ***
## age         -0.003183   0.003394  -0.938    0.348    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.2006 
## F-statistic: 251.8 on 2 and 1997 DF,  p-value: < 2.2e-16
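
Another way to check whether age adds anything is a likelihood ratio test between the education-only mean model and the education-plus-age mean model. The sketch below uses only the two log-likelihood values printed in the maxLik outputs. The variable names are introduced here, and the test assumes the usual chi-squared approximation with one degree of freedom for the one extra parameter.

```r
# Log-likelihoods copied from the maxLik summaries
ll_restricted <- -4691.256  # mean depends on education only (mle_ols)
ll_full <- -4690.815        # mean depends on education and age (mle_ols3)

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (ll_full - ll_restricted)

# Upper-tail chi-squared probability, 1 degree of freedom (one added parameter)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_value = p_value))
```

The statistic is 0.882, far too small to reject the education-only model at conventional levels, which agrees with the non-significant t value for age.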

In this section, we look at the effects of education and age on income in terms of the standard deviation.

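ols.lf4 <- function(param) {
  # param = (mu, intercept for sigma, slope on education, slope on age)
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # intercept, education, age
  sigma <- x %*% theta                         # standard deviation depends on both
  sum(dnorm(y, mu, sigma, log = TRUE))         # normal log-likelihood
}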

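In this section of our analysis, we wanted to investigate changes in the standard deviation. The reported intercept is 0.98, and the reported slopes for education and age are 0.766 and 0.153 respectively. Taken at face value, positive slopes would mean that the standard deviation of income rises with education and age. These numbers should not be interpreted, however. The optimizer stopped after three iterations with return code 3, and no standard errors could be computed for the theta parameters. The log-likelihood of -7542.444 is also far below the -4861.964 reached by the education-only standard deviation model. That model is a special case of this one (theta3 = 0), so at a true maximum this model could not fit worse. The fit has not converged and would need to be re-estimated, for example with different starting values or a more robust optimization method, as the warning suggests.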

mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1))
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?  
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444 
## 4  free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)   
## mu       1.0033     0.3368   2.979 0.00289 **
## theta1   0.9810         NA      NA      NA   
## theta2   0.7662         NA      NA      NA   
## theta3   0.1531         NA      NA      NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
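
One possible reason for a failure like this is that a linear formula for sigma can turn negative while the optimizer searches, which makes the normal density undefined. The sketch below shows one common workaround, modeling log(sigma) rather than sigma as a linear function of the predictors. It is not the author's method: it runs on simulated data rather than the turnout dataset, it uses base R's optim rather than maxLik, and every name and coefficient in it is an assumption made for illustration.

```r
# Simulated data (hypothetical values, not the turnout dataset)
set.seed(1)
n <- 1000
educ <- runif(n, 0, 19)
age <- runif(n, 18, 90)

# True standard deviation follows a log link (made-up coefficients)
true_sd <- exp(0.2 + 0.05 * educ + 0.005 * age)
income <- 3.5 + rnorm(n, sd = true_sd)

x <- cbind(1, educ, age)

# Negative log-likelihood: constant mean, log(sigma) linear in education and age
negll <- function(param) {
  mu <- param[1]
  sigma <- as.vector(exp(x %*% param[-1]))  # exp() guarantees sigma > 0
  -sum(dnorm(income, mean = mu, sd = sigma, log = TRUE))
}

# Start at the sample mean and a constant standard deviation
start <- c(mean(income), log(sd(income)), 0, 0)
fit <- optim(start, negll, method = "BFGS", control = list(maxit = 1000))

print(fit$convergence)  # 0 means optim reported convergence
print(fit$par)          # (mu, theta1, theta2, theta3) on the log-sigma scale
```

With this parameterization the theta coefficients describe proportional rather than absolute changes in the standard deviation, so they are not directly comparable to the thetas in the models above. Within maxLik itself, the warning's advice can also be followed by passing a different method argument, such as "BFGS" or "NM", to the maxLik call.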