In this analysis, we will examine two sets of factors that affect an individual's income, using Maximum Likelihood Estimation (MLE) throughout. In the first set, we study education's effect on income, modeling first the mean and then the standard deviation of income. In the second set, again modeling the mean and then the standard deviation, we examine the joint effects of education and age on income.
In this step, I loaded the required packages and used the data() function to load the built-in dataset turnout, which ships with Zelig. I then used the head() and tail() functions to view the first six and last six rows and confirm that the data loaded correctly.
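library(Zelig)   # provides the turnout dataset
library(maxLik)  # provides maxLik() for numerical maximum likelihood
data("turnout")
head(turnout)    # first six rows
tail(turnout)    # last six rows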
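This function sets up the log-likelihood for the MLE. It treats income as normally distributed with a constant standard deviation (sigma) and a mean that is a linear function of education (beta1 + beta2 × educate), so we can estimate education's effect on one's average income.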
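ols.lf <- function(param) {
  beta <- param[-1]                     # intercept and education slope for the mean
  sigma <- param[1]                     # constant standard deviation of income
  y <- as.vector(turnout$income)        # outcome: income
  x <- cbind(1, turnout$educate)        # design matrix: constant + years of education
  mu <- x %*% beta                      # mean income for each respondent
  sum(dnorm(y, mu, sigma, log = TRUE))  # log-likelihood to be maximized
}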
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
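Written out, ols.lf evaluates the normal log-likelihood below, where $y_i$ is the income and $x_i$ the years of education of respondent $i$ (this notation is added here for illustration):

$$
\ell(\beta_1, \beta_2, \sigma) = \sum_{i=1}^{n} \log \phi\left(y_i;\ \beta_1 + \beta_2 x_i,\ \sigma\right)
= -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 - \beta_2 x_i\right)^2
$$

For any fixed $\sigma$, only the final sum depends on $\beta_1$ and $\beta_2$, so maximizing $\ell$ over the coefficients is the same as minimizing the residual sum of squares (RSS); this is why the MLE coefficients coincide with the lm() estimates in this section. Setting $\partial \ell / \partial \sigma = -n/\sigma + \mathrm{RSS}/\sigma^3 = 0$ gives $\hat\sigma^2 = \mathrm{RSS}/n$, whereas lm() reports $\sqrt{\mathrm{RSS}/(n-k)}$ as its residual standard error.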
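As stated in the previous paragraphs, Maximum Likelihood Estimation (MLE) is a statistical method for estimating a model's parameters: it selects the parameter values under which the observed data are most probable, that is, the values that maximize the likelihood function. In the model below, income is assumed to be normally distributed with a mean that depends linearly on education, and we want the maximum likelihood estimates of that mean's intercept and slope. In the table, beta1 is the intercept: it suggests that with zero years of education, mean income would be about -0.65 units, a value that is not substantively meaningful on its own. beta2 is the slope: each additional year of schooling is associated with an increase of about 0.38 units in mean income. The lm() output below reports the same coefficients. This agreement is expected rather than coincidental: with normally distributed errors and a constant standard deviation, the maximum likelihood estimates of the regression coefficients are identical to the ordinary least squares estimates, so the match confirms that the optimizer found the correct maximum, though it does not by itself prove that the model is correct. The small differences in sigma (2.526 versus a residual standard error of 2.527) and in the standard errors arise in part because the ML estimator of the variance divides the residual sum of squares by n rather than by the residual degrees of freedom.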
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
summary(lm(income ~ educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2028 -1.7363 -0.4273 1.3150 11.0632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207 0.21016 -3.103 0.00194 **
## educate 0.37613 0.01677 22.422 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
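The agreement between the MLE and lm() results can be reproduced without Zelig or maxLik. The sketch below is an illustration under stated assumptions: it uses simulated data (hypothetical variables educ and y, with parameter values chosen only to resemble the table above, not the turnout data) and base R's optim() in place of maxLik, and it optimizes log(sigma) so that the standard deviation stays positive.

```r
# Sketch only: SIMULATED data, base R optim() standing in for maxLik.
set.seed(42)
n <- 2000
educ <- sample(8:20, n, replace = TRUE)        # hypothetical years of education
y <- -0.6 + 0.38 * educ + rnorm(n, sd = 2.5)   # hypothetical income

X <- cbind(1, educ)                            # constant + education

# Negative log-likelihood of the mean model; p[3] is log(sigma)
negll <- function(p) {
  beta  <- p[1:2]
  sigma <- exp(p[3])
  -sum(dnorm(y, mean = X %*% beta, sd = sigma, log = TRUE))
}

# Start at the sample mean with zero slope and the sample sd
start <- c(mean(y), 0, log(sd(y)))
fit <- optim(start, negll, method = "BFGS",
             control = list(maxit = 1000, reltol = 1e-12))
beta_mle  <- fit$par[1:2]
sigma_mle <- exp(fit$par[3])

ols <- lm(y ~ educ)
beta_ols  <- unname(coef(ols))
sigma_ols <- summary(ols)$sigma

# Coefficients agree up to optimizer tolerance
print(cbind(mle = beta_mle, ols = beta_ols))
# sigma differs from lm()'s residual standard error by sqrt((n - k) / n)
print(c(sigma_mle = sigma_mle, rescaled_ols = sigma_ols * sqrt((n - 2) / n)))
```

Putting sigma on the log scale is a design choice for this sketch; the functions in this analysis estimate sigma directly, which works as long as the optimizer never proposes a non-positive value.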
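In this section, we examine education's effect on the standard deviation of income rather than on its mean: the mean is held constant (mu), and the standard deviation is modeled as a linear function of education (theta1 + theta2 × educate).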
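ols.lf2 <- function(param) {
  mu <- param[1]                        # constant mean income
  theta <- param[-1]                    # intercept and education slope for the sd
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- x %*% theta                  # sd of income for each respondent (must be > 0)
  sum(dnorm(y, mu, sigma, log = TRUE))
}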
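According to the estimates below, education is related to the spread of income. The slope theta2 is about 0.11 and highly significant, so each additional year of education is associated with an increase of roughly 0.11 units in the standard deviation of income: incomes are more dispersed among more educated respondents. The intercept theta1 (about 1.46) is the implied standard deviation at zero years of education. Because this model holds mean income fixed (mu is about 3.52), it does not by itself show that more educated people earn more; that conclusion comes from the mean model above, where the education slope is positive.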
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
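To illustrate what this model can recover, here is a minimal sketch on simulated data, assuming a constant mean and a standard deviation that grows linearly with education (hypothetical variables educ and y, not the turnout data), with base R's optim() in place of maxLik. The likelihood returns Inf whenever an implied standard deviation is not positive, and the search starts from a constant spread equal to sd(y) with a zero slope.

```r
# Sketch only: SIMULATED data for y_i ~ Normal(mu, theta1 + theta2 * educ_i)
set.seed(1)
n <- 2000
educ <- sample(8:20, n, replace = TRUE)      # hypothetical years of education
sigma_true <- 1.5 + 0.1 * educ               # spread grows with education
y <- 3.5 + rnorm(n, sd = sigma_true)         # mean is constant by construction

X <- cbind(1, educ)

# Negative log-likelihood: p[1] is mu, p[-1] are the sd coefficients
negll_sd <- function(p) {
  sigma <- drop(X %*% p[-1])
  if (any(sigma <= 0)) return(Inf)           # a standard deviation must be positive
  -sum(dnorm(y, mean = p[1], sd = sigma, log = TRUE))
}

# Data-driven start: constant spread equal to sd(y), zero slope
start <- c(mean(y), sd(y), 0)
fit <- optim(start, negll_sd, method = "BFGS", control = list(maxit = 1000))
names(fit$par) <- c("mu", "theta1", "theta2")
fit$par
```

A positive estimated theta2 here, as in the table above, means the spread of income widens with education even though the mean does not change.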
In this section, we return to modeling the mean of income, now as a function of both education and age, with a constant standard deviation.
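ols.lf3 <- function(param) {
  beta <- param[-1]                     # intercept, education slope, age slope
  sigma <- param[1]                     # constant standard deviation of income
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)  # constant + education + age
  mu <- x %*% beta                      # mean income for each respondent
  sum(dnorm(y, mu, sigma, log = TRUE))
}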
For this analysis, we added age as a second predictor alongside education to investigate their joint effects on mean income. As in the previous analysis, the slope for education (beta2) is about 0.37 (0.371). The slope for age (beta3) is about -0.003, meaning that each additional year of age is associated with a very slight decrease in mean income, holding education constant. Its p-value (0.345) shows that this effect is not statistically significant, so these data give no evidence that age affects income once education is taken into account. The intercept (beta1) is also no longer significant (p = 0.138). To double-check these findings, compare them with the lm() output below, which reports the same estimates.
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525576 0.039919 63.268 <2e-16 ***
## beta1 -0.446047 0.300583 -1.484 0.138
## beta2 0.371011 0.017493 21.209 <2e-16 ***
## beta3 -0.003184 0.003373 -0.944 0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
summary(lm(income ~ educate + age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
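Because the education-only mean model is nested in the education-and-age model (it is the special case beta3 = 0), the two maxLik log-likelihoods can also be compared with a likelihood-ratio test. A short sketch of that arithmetic, using only the printed log-likelihoods and base R's pchisq():

```r
# Log-likelihoods copied from the two maxLik summaries above
ll_educ     <- -4691.256   # mean model with education only
ll_educ_age <- -4690.815   # mean model with education and age

# Likelihood-ratio statistic: twice the gain in log-likelihood from adding age
lr <- 2 * (ll_educ_age - ll_educ)

# One extra parameter (beta3), so compare with a chi-squared distribution on 1 df
p_value <- pchisq(lr, df = 1, lower.tail = FALSE)
c(LR = lr, p = p_value)
```

The statistic is 2 × 0.441 = 0.882, and the resulting p-value (about 0.35) is close to the p-values for age in both tables, leading to the same conclusion.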
In this section, we examine education's and age's effects on the standard deviation of income, again holding the mean constant.
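ols.lf4 <- function(param) {
  mu <- param[1]                        # constant mean income
  theta <- param[-1]                    # intercept, education slope, age slope for the sd
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  sigma <- x %*% theta                  # sd of income for each respondent (must be > 0)
  sum(dnorm(y, mu, sigma, log = TRUE))
}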
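In this section of our analysis, we investigate how education and age affect the standard deviation of income. Taken at face value, the output below gives 0.98 for the intercept (theta1), 0.766 for the education slope (theta2), and 0.153 for the age slope (theta3), with mu at 1.00. However, these numbers should not be interpreted. The optimizer stopped after 3 iterations with return code 3 ("Last step could not find a value above the current"), and the standard errors for all three theta parameters are NA. The log-likelihood (-7542.44) is also far below that of the education-only standard-deviation model (-4861.96), even though this model contains that one as a special case (theta3 = 0), so the reported values cannot be the maximum. A likely cause is the starting values: with every parameter starting at 1, the initial standard deviation 1 + educate + age is far larger than the actual spread of income, and a linear predictor for sigma can also turn negative, where dnorm() is undefined. Re-estimating from better starting values (for example, the previous standard-deviation estimates with theta3 = 0) or with a more robust optimization method, as the output itself suggests, would be needed before concluding whether age and education increase the variation in income.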
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1))
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 1.0033 0.3368 2.979 0.00289 **
## theta1 0.9810 NA NA NA
## theta2 0.7662 NA NA NA
## theta3 0.1531 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
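One way to act on the convergence problem, sketched here under stated assumptions: fit the same linear standard-deviation model on simulated data (hypothetical variables educ and age, not the turnout data) with base R's optim() in place of maxLik, return Inf whenever an implied standard deviation is not positive, and start from a constant spread equal to sd(y) with zero slopes instead of all ones. This is an illustration of the starting-value idea, not a re-estimation of the turnout results.

```r
# Sketch only: SIMULATED data for y_i ~ Normal(mu, theta1 + theta2*educ_i + theta3*age_i)
set.seed(2)
n <- 2000
educ <- sample(8:20, n, replace = TRUE)    # hypothetical years of education
age  <- sample(18:85, n, replace = TRUE)   # hypothetical age in years
y <- 3.5 + rnorm(n, sd = 1 + 0.1 * educ + 0.01 * age)

X <- cbind(1, educ, age)

# Negative log-likelihood: p[1] is mu, p[-1] are the sd coefficients
negll_sd2 <- function(p) {
  sigma <- drop(X %*% p[-1])
  if (any(sigma <= 0)) return(Inf)         # reject non-positive standard deviations
  -sum(dnorm(y, mean = p[1], sd = sigma, log = TRUE))
}

start_ones <- c(1, 1, 1, 1)                # all-ones start, as in the call above
start_data <- c(mean(y), sd(y), 0, 0)      # constant spread, zero slopes

# Implied sd at the all-ones start is 1 + educ + age, far above the spread of y
summary(drop(X %*% start_ones[-1]))

fit <- optim(start_data, negll_sd2, method = "BFGS", control = list(maxit = 1000))
names(fit$par) <- c("mu", "theta1", "theta2", "theta3")
fit$convergence                            # 0 means optim reports convergence
fit$par
```

For the model above, the analogous step would be to pass the previous standard-deviation estimates plus theta3 = 0 through maxLik()'s start argument; whether that converges on the turnout data is not verified here.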