This assignment highlights the use of Maximum Likelihood Estimation (MLE) and the Generalized Linear Model (GLM). The analysis uses the turnout data set from the Zelig package. The question we hope to explore using MLE and the GLM is: Does education influence income inequality?
The variables used were: income (the dependent variable) and educate (the independent variable).
We want to see whether there is any relationship between education and income inequality (e.g., whether it is positive or negative), and we hope to find this out using MLE and the GLM.
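library(maxLik)  # load before the first maxLik() call
# Assumes turnout is loaded (Zelig) and ols.lf (not shown) is the constant-sigma normal log-likelihood
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)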
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 12 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -4691.256
3 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
sigma 2.52613 0.03989 63.326 < 2e-16 ***
beta1 -0.65207 0.20827 -3.131 0.00174 **
beta2 0.37613 0.01663 22.612 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
INTERPRETATION #1 (Slide 4.15)
For this first analysis, there are 3 parameters: sigma (the constant standard deviation of income), beta1 (the intercept) and beta2 (the coefficient on education).
The output above shows that for every 1-unit increase in education, mean income increases by 0.37613 units. The relationship between education and income is therefore positive, and it is statistically significant (p < 2e-16, well below 0.05).
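Because `ols.lf` itself is not shown, the following is a hypothetical sketch of what it presumably computes: the normal log-likelihood with a constant `sigma` and mean `beta1 + beta2 * educate`, the mirror image of `ols.lf2`. It runs on simulated data (the sample size, slope and noise level are invented, not the turnout values), and it swaps base R `optim()` in for `maxLik()` so it needs no extra packages. The gradient helper `ols.gr` and the objects `fit`, `ols` and `sigma_hat` are illustrative names. For this model the MLE of the betas should coincide with the OLS coefficients from `lm()`, which is why beta2 can be read as a slope on mean income.

```r
# Hypothetical sketch: ols.lf is not shown in the section, so this version
# assumes the constant-variance normal model with mean beta1 + beta2 * educate.
# Simulated data stand in for the turnout data set.
set.seed(42)
n <- 2000
educate <- sample(0:20, n, replace = TRUE)              # simulated education levels
income  <- -0.65 + 0.38 * educate + rnorm(n, sd = 2.5)  # simulated income, constant SD

# Log-likelihood: param = c(sigma, beta1, beta2)
ols.lf <- function(param) {
  sigma <- param[1]
  beta  <- param[-1]
  x  <- cbind(1, educate)
  mu <- x %*% beta                       # mean income depends on education
  sum(dnorm(income, mu, sigma, log = TRUE))
}

# Analytic gradient of the same log-likelihood (helps the optimizer converge)
ols.gr <- function(param) {
  sigma <- param[1]
  beta  <- param[-1]
  x <- cbind(1, educate)
  r <- as.vector(income - x %*% beta)    # residuals
  c(-n / sigma + sum(r^2) / sigma^3,     # d logL / d sigma
    crossprod(x, r) / sigma^2)           # d logL / d beta1, d logL / d beta2
}

# optim() minimizes by default; fnscale = -1 makes it maximize.
# The lower bound keeps sigma positive during the search.
fit <- optim(c(sigma = 1, beta1 = 1, beta2 = 1), ols.lf, ols.gr,
             method = "L-BFGS-B", lower = c(1e-6, -Inf, -Inf),
             control = list(fnscale = -1, maxit = 500))

# Closed-form check: the MLE of the betas is the OLS fit, and the MLE of sigma
# is sqrt(RSS / n) (no degrees-of-freedom correction).
ols <- lm(income ~ educate)
sigma_hat <- sqrt(mean(residuals(ols)^2))
round(rbind(optim = fit$par, closed_form = c(sigma_hat, coef(ols))), 3)
```

The two printed rows should agree closely; evaluating `ols.lf` at the closed-form estimates also reproduces `logLik(ols)`.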
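# ols.lf2: heteroskedastic normal model with a constant mean mu and a
# standard deviation that is linear in education
ols.lf2 <- function(param) {
  mu <- param[1]                    # constant mean of income
  theta <- param[-1]                # theta1 (intercept) and theta2 (education) of the SD equation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- x %*% theta              # SD = theta1 + theta2 * educate; must stay positive for dnorm()
  sum(dnorm(y, mu, sigma, log = TRUE))
}
library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)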
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 9 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -4861.964
3 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
mu 3.516764 0.070320 50.01 <2e-16 ***
theta1 1.461011 0.106745 13.69 <2e-16 ***
theta2 0.109081 0.009185 11.88 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
INTERPRETATION #2 (Slide 4.19)
For this second analysis, there are 3 parameters: mu (the constant mean of income), theta1 (the intercept of the standard-deviation equation) and theta2 (the coefficient on education in that equation).
The output above shows that for every 1-unit increase in education, the standard deviation of income increases by 0.109081 units; that is, income becomes more spread out (more unequal) as education rises. This positive relationship is statistically significant (p < 2e-16, well below 0.05).
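To make the heteroskedasticity interpretation concrete, the model-2 estimates reported above can be plugged into the standard-deviation equation sigma = theta1 + theta2 * educate, while the mean stays at mu for everyone. The education levels 8, 12 and 16 are hypothetical values chosen for illustration, not values taken from the analysis.

```r
# Estimates copied from the summary(mle_ols2) output above
mu     <- 3.516764
theta1 <- 1.461011
theta2 <- 0.109081

# Hypothetical education levels, chosen only for illustration
educate <- c(8, 12, 16)

# Implied standard deviation of income at each level; the mean is mu for all of them
sd_income <- theta1 + theta2 * educate
data.frame(educate, mean = mu, sd = round(sd_income, 4))

# A one-unit increase in education raises the SD by exactly theta2
step <- diff(theta1 + theta2 * c(12, 13))
step
```

Under this model, people at every education level share the same expected income, but the spread around it widens by theta2 per unit of education.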
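# ols.lf3 (not shown) presumably extends ols.lf with age in the mean: beta1 + beta2 * educate + beta3 * age
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))
summary(mle_ols3)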
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 16 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -4690.815
4 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
sigma 2.525576 0.039919 63.268 <2e-16 ***
beta1 -0.446047 0.300583 -1.484 0.138
beta2 0.371011 0.017493 21.209 <2e-16 ***
beta3 -0.003184 0.003373 -0.944 0.345
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
In the above analysis, a second INDEPENDENT variable, age, was added to see whether income inequality varies with it and, if so, in which direction.
The new question being asked is: Does age influence income inequality?
The variables used were: income (the dependent variable), and educate and age (the independent variables).
INTERPRETATION #3
For this third analysis, there are 4 parameters: sigma (the constant standard deviation of income), beta1 (the intercept), beta2 (the coefficient on education) and beta3 (the coefficient on age).
After analyzing the data, it was found that for every 1-unit increase in age, mean income changes by -0.003184 (a slight decrease), so the association between the two variables is negative. However, this coefficient is NOT statistically significant, as its p-value (0.345) is greater than 0.05.
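The t value and p-value reported for beta3 can be reproduced by hand from its estimate and standard error, which shows why it is not significant: the estimate lies less than one standard error from zero. This sketch assumes the p-value comes from a normal (Wald) approximation, as is usual for maximum-likelihood output; the names `est`, `se`, `z` and `p` are illustrative.

```r
# Estimate and standard error of beta3 (age), copied from summary(mle_ols3)
est <- -0.003184
se  <-  0.003373

z <- est / se                 # Wald statistic, printed as "t value" in the table
p <- 2 * pnorm(-abs(z))       # two-sided p-value under a normal approximation
round(c(z = z, p = p), 3)     # -0.944 and 0.345, matching the table
```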
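# ols.lf4: as ols.lf2, with age added to the standard-deviation equation
ols.lf4 <- function(param) {
  mu <- param[1]                    # constant mean of income
  theta <- param[-1]                # theta1 (intercept), theta2 (education), theta3 (age)
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  sigma <- x %*% theta              # SD = theta1 + theta2 * educate + theta3 * age
  sum(dnorm(y, mu, sigma, log = TRUE))
}
library(maxLik)
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1),
                   method = "BFGS")
summary(mle_ols4)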
--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 150 iterations
Return code 0: successful convergence
Log-Likelihood: -4843.15
4 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
mu 3.555011 0.069193 51.378 < 2e-16 ***
theta1 0.362114 0.204550 1.770 0.0767 .
theta2 0.133349 0.010756 12.398 < 2e-16 ***
theta3 0.017507 0.002852 6.139 8.32e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
INTERPRETATION #4
For this fourth analysis, there are 4 parameters: mu (the constant mean of income), theta1 (the intercept of the standard-deviation equation), theta2 (the coefficient on education) and theta3 (the coefficient on age).
The output above shows that for every 1-unit increase in age, the standard deviation of income increases by 0.017507 units, i.e. a positive relationship between age and the spread of income. This effect is statistically significant (p = 8.32e-10, well below 0.05).
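Model 4 adds only theta3 to model 2, and (assuming ols.lf3 adds age to the mean equation, as its parameter names suggest) model 3 adds only beta3 to model 1. Each pair is therefore nested, and the log-likelihoods printed above give a likelihood-ratio test of the age term that complements the Wald p-values in the tables. The helper name `lr_test` is hypothetical; the sketch uses only the reported log-likelihood values.

```r
# Likelihood-ratio test for nested models: LR = 2 * (logLik_full - logLik_restricted),
# compared with a chi-squared distribution whose df is the number of added parameters.
lr_test <- function(ll_restricted, ll_full, df = 1) {
  lr <- 2 * (ll_full - ll_restricted)
  c(LR = lr, p = pchisq(lr, df = df, lower.tail = FALSE))
}

# Log-likelihoods copied from the maxLik summaries above
r1 <- lr_test(-4691.256, -4690.815)  # model 1 vs model 3: age in the mean equation
r2 <- lr_test(-4861.964, -4843.150)  # model 2 vs model 4: age in the SD equation
r1
r2
```

The first comparison gives LR = 0.882 (p of roughly 0.35) and the second LR = 37.628 (p below 1e-9), the same conclusions as the Wald p-values for beta3 and theta3.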