For this purpose, I first installed two packages: maxLik (maximum likelihood estimation) and Zelig (a unified interface for statistical modeling).
The “turnout” data set, which ships with the Zelig package, contains variables such as voters’ age, income, and education, among others.
Using the MLE method, I fit one of the linear regression models (slide 4.15) with education as the independent variable and income as the dependent variable. The question to be asked: is there a relationship between years of education and the average income of voters? beta1 (the y-intercept, -0.652) is the expected income of voters with zero years of education (x = 0). beta2 (the slope, 0.376) is the amount by which the mean income, mu, increases with each additional year of education. The slope is positive and statistically significant, so it is reasonable to state that education and the mean income of voters are positively related: the more years of education, the higher the mean income.
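For reference, the model behind these estimates can be written out explicitly. This is the standard Gaussian linear model, in my own notation (the slide may use different symbols): y_i is income, x_i is years of education, and n is the number of voters.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i, \sigma^2), \qquad \mu_i = \beta_1 + \beta_2 x_i, \\
\log L(\beta_1, \beta_2, \sigma)
  &= -\frac{n}{2}\log\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 - \beta_2 x_i\right)^2 .
\end{aligned}
```

The R function below evaluates this same quantity by summing `dnorm(..., log = TRUE)` over all observations.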
# Loading libraries
library(Zelig)
library(maxLik)
# Loading data
data(turnout)
head(turnout)
# Using Log-likelihood function to find the relationships between education and mean income (mu)
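ols.lf <- function(param) {
  sigma <- param[1]                # first parameter: standard deviation of income
  beta <- param[-1]                # remaining parameters: intercept and slope
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # column of ones for the intercept
  mu <- x %*% beta                 # mean income as a linear function of education
  sum(dnorm(y, mu, sigma, log = TRUE))
}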
# Maximizing the log-likelihood function
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03995 63.236 < 2e-16 ***
## beta1 -0.65207 0.21253 -3.068 0.00215 **
## beta2 0.37613 0.01695 22.192 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Checking the results with the lm function (simpler R code for the same model):
summary(lm(income ~ educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2028 -1.7363 -0.4273 1.3150 11.0632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207 0.21016 -3.103 0.00194 **
## educate 0.37613 0.01677 22.422 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
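The coefficients agree with the MLE results, but sigma (2.52613) is slightly smaller than the residual standard error reported by lm (2.527). One likely reason is that the MLE of sigma divides the residual sum of squares by n, while lm divides by the residual degrees of freedom (n - 2 here). The sketch below illustrates this on simulated data; the data and all variable names are made up for the illustration and are not part of the turnout analysis.

```r
# Simulated data, loosely shaped like the education/income example (values are invented)
set.seed(1)
n <- 2000
x <- runif(n, 0, 19)
y <- -0.65 + 0.38 * x + rnorm(n, sd = 2.5)

fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)        # residual sum of squares

sigma_lm  <- sqrt(rss / (n - 2))    # what lm reports as "Residual standard error"
sigma_mle <- sqrt(rss / n)          # maximum likelihood estimate of sigma

c(lm = sigma_lm, summary_lm = summary(fit)$sigma, mle = sigma_mle)
```

With n = 2000 the two estimates differ only by the factor sqrt(1998/2000), which is why the difference shows up in the third decimal place.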
Is there a relationship between years of education and income inequality? Does more education lead to higher income inequality among voters? On slide 4.19, the standard deviation of income is used as the dependent variable to answer this question: sigma = theta1 + theta2 * x, where x is the number of years of education. theta1 (1.461) is the y-intercept, that is, the standard deviation of income at zero years of education (x = 0), and theta2 (0.109) is the slope. Each additional year of education increases the standard deviation of income by 0.109, and this estimate is statistically significant. Thus, more years of education are associated with higher income inequality among voters.
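ols.lf2 <- function(param) {
  mu <- param[1]                   # first parameter: constant mean income
  theta <- param[-1]               # remaining parameters: intercept and slope for sigma
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # column of ones for the intercept
  sigma <- x %*% theta             # standard deviation as a linear function of education
  sum(dnorm(y, mu, sigma, log = TRUE))
}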
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070298 50.03 <2e-16 ***
## theta1 1.461010 0.107600 13.58 <2e-16 ***
## theta2 0.109081 0.009243 11.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
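One caveat about this specification: because sigma is a linear function of education, nothing prevents the optimizer from trying parameter values that make sigma zero or negative, in which case dnorm returns NaN. A common alternative, which is not the model used on the slide, is to put sigma on a log scale so that it is always positive. Below is a minimal sketch of that idea using base R's optim on simulated data; the data, the function name negloglik, and the true parameter values are all invented for the illustration.

```r
# Simulated data: the spread of y grows with x (all values are made up)
set.seed(42)
n <- 2000
x <- runif(n, 0, 16)
y <- rnorm(n, mean = 3.5, sd = exp(0.4 + 0.05 * x))

# Negative log-likelihood with sigma = exp(theta1 + theta2 * x), which is always > 0
negloglik <- function(param) {
  mu <- param[1]
  sigma <- exp(param[2] + param[3] * x)
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# optim minimizes, so we pass the negative log-likelihood
start <- c(mu = mean(y), theta1 = log(sd(y)), theta2 = 0)
fit <- optim(start, negloglik, method = "BFGS")
fit$par
```

Under this parameterization theta2 is read on the log scale: each additional unit of x multiplies sigma by exp(theta2) instead of adding a fixed amount to it.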
How well do independent variables such as age and education predict the mean income of voters? Does income increase with age and education? Presumably, it does. The estimates are beta1 = -0.446 (the intercept: mean income at zero years of education and age zero), beta2 = 0.371 (education), and beta3 = -0.003 (age). With each additional year of age, holding education constant, the mean income decreases by 0.003. Thus, the relationship between age and mean income is negative, but it is not statistically significant (p = 0.346). Age is therefore not a reliable predictor of mean income here, possibly because of factors the model does not account for; retirement is most likely one of them.
Thus, years of education are a better predictor of mean income than age.
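ols.lf3 <- function(param) {
  sigma <- param[1]                             # first parameter: standard deviation of income
  beta <- param[-1]                             # intercept, education slope, age slope
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)   # column of ones for the intercept
  mu <- x %*% beta                              # mean income as a function of education and age
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))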
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525575 0.039921 63.265 <2e-16 ***
## beta1 -0.446066 0.301804 -1.478 0.139
## beta2 0.371012 0.017563 21.124 <2e-16 ***
## beta3 -0.003184 0.003377 -0.943 0.346
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Checking the results:
summary(lm(income ~ educate + age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
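Another way to check whether age adds anything is a likelihood ratio test, since the education-only model is nested in the education-plus-age model. The sketch below uses only the two log-likelihoods reported by maxLik above (-4691.256 and -4690.815); the variable names are mine, and the chi-squared reference distribution with one degree of freedom is the usual large-sample approximation.

```r
# Log-likelihoods copied from the maxLik summaries above
ll_educ     <- -4691.256   # mean income ~ education (3 free parameters)
ll_educ_age <- -4690.815   # mean income ~ education + age (4 free parameters)

# Likelihood ratio statistic and its approximate p-value (1 extra parameter)
lr_stat <- 2 * (ll_educ_age - ll_educ)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(LR = lr_stat, p = p_value)
```

The statistic is about 0.88, giving a p-value of roughly 0.35, in line with the p-value of the age coefficient in both summaries.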
How do age and education, as independent variables, influence income inequality? Can we state that with each additional year of age and education, voters experience greater disparity in their incomes? The results of the fitted model show a positive relationship between both variables and the standard deviation of income: theta1 is 0.362 (intercept), theta2 is 0.133 (education), and theta3 is 0.018 (age). Both slopes are statistically significant. Thus, we can predict that income inequality among voters increases with age as well as with education.
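ols.lf4 <- function(param) {
  mu <- param[1]                                # first parameter: constant mean income
  theta <- param[-1]                            # intercept, education slope, age slope for sigma
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)   # column of ones for the intercept
  sigma <- x %*% theta                          # standard deviation as a function of education and age
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1), method = "bfgs")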
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## BFGS maximization, 150 iterations
## Return code 0: successful convergence
## Log-Likelihood: -4843.15
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.555011 0.068543 51.866 < 2e-16 ***
## theta1 0.362114 0.197185 1.836 0.0663 .
## theta2 0.133349 0.010491 12.711 < 2e-16 ***
## theta3 0.017507 0.002788 6.280 3.39e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
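Since all four models are fitted to the same income data, their log-likelihoods can be put side by side. One rough way to do this, which is my own addition and not from the slides, is AIC = 2k - 2 logLik, where k is the number of free parameters; lower values indicate a better trade-off between fit and complexity. The numbers below are copied from the four maxLik summaries, and the model labels are just shorthand.

```r
# Log-likelihoods and parameter counts copied from the maxLik summaries above
models <- data.frame(
  model  = c("mean ~ educate", "sd ~ educate",
             "mean ~ educate + age", "sd ~ educate + age"),
  logLik = c(-4691.256, -4861.964, -4690.815, -4843.15),
  k      = c(3, 3, 4, 4)
)

# AIC = 2k - 2 * logLik; smaller is better
models$AIC <- 2 * models$k - 2 * models$logLik
models[order(models$AIC), ]
```

By this measure the two models for the mean fit the data much better than the two models for the standard deviation, and adding age to the mean model does not improve on education alone.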