library(Zelig)
## Loading required package: survival
data("turnout")
head(turnout)
##    race age educate income vote
## 1 white  60      14 3.3458    1
## 2 white  51      10 1.8561    0
## 3 white  24      12 0.6304    0
## 4 white  38       8 3.4183    1
## 5 white  25      12 2.7852    1
## 6 white  67      12 2.3866    1
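# Log-likelihood for income ~ N(beta0 + beta1 * educate, sigma), constant sigma
ols.lf <- function(param) {
  sigma <- param[1]                # residual standard deviation
  beta <- param[-1]                # intercept and education slope
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # design matrix with an intercept column
  mu <- x %*% beta                 # conditional mean of income
  sum(dnorm(y, mu, sigma, log = TRUE))
}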
Both slides use maximum likelihood estimation (MLE) to model the relationship between education and income, with education as the independent variable and income as the dependent variable. The function used in 4.15 treats the standard deviation, σ, as a single constant and lets the mean of income, μ, depend on education. The function in 4.19 treats μ as a single constant and lets σ depend on education, so each function examines a different aspect of income and its relationship with education.
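For reference, the two log-likelihoods that the code maximizes can be written out as follows. This is a sketch in assumed notation: x_i is years of education and y_i is income for respondent i, n is the number of respondents, and the β and θ symbols follow the model statements Y∼N(β0+β1X,σ) and Y∼N(μ,θ0+θ1X).

```latex
\begin{align}
\ell(\beta_0, \beta_1, \sigma)
  &= -n \log \sigma - \frac{n}{2} \log(2\pi)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\
\ell(\mu, \theta_0, \theta_1)
  &= -\sum_{i=1}^{n} \log\left( \theta_0 + \theta_1 x_i \right) - \frac{n}{2} \log(2\pi)
     - \sum_{i=1}^{n} \frac{\left( y_i - \mu \right)^2}{2 \left( \theta_0 + \theta_1 x_i \right)^2}
\end{align}
```

The first expression is what `sum(dnorm(y, mu, sigma, log = TRUE))` evaluates when the mean varies with education. The second is the same sum when the standard deviation varies instead.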
library(maxLik)
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
##       Estimate Std. error t value Pr(> t)
## sigma  2.52613    0.03989  63.326 < 2e-16 ***
## beta1 -0.65207    0.20827  -3.131 0.00174 **
## beta2  0.37613    0.01663  22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
summary(lm(income ~ educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.2028 -1.7363 -0.4273  1.3150 11.0632
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207    0.21016  -3.103  0.00194 **
## educate      0.37613    0.01677  22.422  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
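To determine the relationship between education and income we used the model Y∼N(β0+β1X,σ). In the code, we have specified three parameters: sigma, beta1, and beta2, where beta1 corresponds to β0 and beta2 to β1. The output shows sigma = 2.526 (the residual standard deviation of income, not a standard error), beta1 = -0.652 (the intercept, i.e. the expected income when education is 0), and beta2 = 0.376 (the slope). The lm() fit returns the same intercept and slope, which is expected because the MLE of the coefficients under normal errors coincides with OLS.
The output shows that those with 0 years of education have a predicted average income of -0.652 units (beta1), and that each additional year of education is associated with an increase of 0.376 units in average income (beta2).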
This shows that as the level of education increases, income also increases.
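The MLE output reports sigma = 2.526 while lm() reports a residual standard error of 2.527. The small gap is consistent with the two estimators dividing the residual sum of squares by n and by n - p respectively. The sketch below checks that identity on simulated stand-in data (an assumption, not the turnout data), then applies the correction factor to the rounded value reported by lm().

```r
# Simulated stand-in data (assumption: NOT the turnout data)
set.seed(42)
n <- 500
x <- rnorm(n, mean = 12, sd = 3)
y <- -0.65 + 0.38 * x + rnorm(n, sd = 2.5)

fit <- lm(y ~ x)
p <- length(coef(fit))                  # number of mean parameters (2)

sigma_ols <- summary(fit)$sigma         # sqrt(RSS / (n - p)), as printed by lm
sigma_mle <- sqrt(mean(resid(fit)^2))   # sqrt(RSS / n), the normal-model MLE

# The two estimates differ only by the degrees-of-freedom factor
identity_holds <- isTRUE(all.equal(sigma_mle, sigma_ols * sqrt((n - p) / n)))
identity_holds

# Same factor applied to the rounded lm value reported above (n = 2000, p = 2)
2.527 * sqrt(1998 / 2000)
```

The last line gives a value close to the sigma reported by maxLik, up to rounding of the printed output.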
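# Log-likelihood for income ~ N(mu, theta0 + theta1 * educate), constant mu
ols.lf2 <- function(param) {
  mu <- param[1]                   # constant mean of income
  theta <- param[-1]               # intercept and education slope for sigma
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # design matrix with an intercept column
  sigma <- x %*% theta             # standard deviation varies with education
  sum(dnorm(y, mu, sigma, log = TRUE))
}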
library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)
## mu     3.516764   0.070320   50.01  <2e-16 ***
## theta1 1.461011   0.106745   13.69  <2e-16 ***
## theta2 0.109081   0.009185   11.88  <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
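To determine the relationship between income and education we used the model Y∼N(β0+β1X,σ). Now, to examine whether education is related to income inequality, we use the model Y∼N(μ,θ0+θ1X), in which the standard deviation of income depends on education. In the code, we have specified three parameters: mu, theta1, and theta2, where theta1 corresponds to θ0 and theta2 to θ1. The output shows mu = 3.517 (the estimated mean of income), theta1 = 1.461 (the intercept, i.e. the standard deviation of income when education = 0), and theta2 = 0.109 (the slope).
This means that for each additional year of education, the standard deviation of income increases by 0.109 units (theta2). The estimate is positive and statistically significant (t = 11.88), which suggests that income is more dispersed, and so income inequality is greater, among the more educated. Note that this model holds the mean of income constant, and its log-likelihood (-4861.964) is lower than that of the model in which the mean depends on education (-4691.256).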
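One caveat with modeling sigma as a linear function is that the linear predictor can become negative for some parameter values, in which case dnorm() returns NaN. A common alternative is a log link, sigma = exp(gamma0 + gamma1 * x), which keeps sigma positive. The sketch below illustrates that variant under stated assumptions: it uses simulated stand-in data rather than turnout, base R's optim() rather than maxLik(), and the hypothetical parameter names gamma0 and gamma1.

```r
# Simulated stand-in data (assumption: NOT the turnout data).
# The spread of y grows with x; the mean is constant.
set.seed(1)
n <- 2000
x <- runif(n, min = 0, max = 19)
y <- rnorm(n, mean = 3.5, sd = exp(0.3 + 0.05 * x))

# Negative log-likelihood with a log link for the standard deviation
negll <- function(param) {
  mu <- param[1]
  sigma <- exp(param[2] + param[3] * x)   # always positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Start from the sample mean and a constant-variance model
start <- c(mu = mean(y), gamma0 = log(sd(y)), gamma1 = 0)
fit <- optim(start, negll, method = "BFGS", hessian = TRUE)

est <- fit$par
se <- sqrt(diag(solve(fit$hessian)))      # from the inverse observed information
round(cbind(Estimate = est, `Std. error` = se), 3)
```

With a log link, gamma1 is read as an approximate proportional change in the standard deviation per unit of x, not as a change in raw units, so it is not directly comparable to theta2.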
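# Log-likelihood for income ~ N(beta0 + beta1 * educate + beta2 * age, sigma)
ols.lf3 <- function(param) {
  sigma <- param[1]                # residual standard deviation
  beta <- param[-1]                # intercept, education slope, age slope
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  mu <- x %*% beta                 # conditional mean of income
  sum(dnorm(y, mu, sigma, log = TRUE))
}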
library(maxLik)
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3=1))
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)
## sigma  2.525576   0.039919  63.268  <2e-16 ***
## beta1 -0.446047   0.300583  -1.484   0.138
## beta2  0.371011   0.017493  21.209  <2e-16 ***
## beta3 -0.003184   0.003373  -0.944   0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
summary(lm(income ~ educate+age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.2128 -1.7471 -0.4217  1.3042 11.1256
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084   0.303955  -1.468    0.142
## educate      0.371013   0.017641  21.031   <2e-16 ***
## age         -0.003183   0.003394  -0.938    0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
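For this part the variable age was added as a second independent variable to determine its relationship with income alongside education. In the code, four parameters are specified: sigma, beta1, beta2, and beta3. The output shows sigma = 2.526 (the residual standard deviation), beta1 = -0.446 (the intercept, i.e. the expected income when education and age are both 0), beta2 = 0.371 (the education slope), and beta3 = -0.003 (the age slope).
The output shows that as age increases by one year, average income decreases by 0.003 units, holding education constant. However, this coefficient is very small and not statistically significant (p = 0.345), so the data give no clear evidence that age is related to income once education is accounted for. The education slope is essentially unchanged from the model without age.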
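Because the model without age is the special case beta3 = 0 of the model with age, the two reported log-likelihoods can also be compared with a likelihood ratio test. The sketch below uses only the log-likelihood values and parameter counts printed in the maxLik summaries above. The variable names are illustrative.

```r
# Log-likelihoods copied from the maxLik summaries above
ll_restricted <- -4691.256     # mean depends on educate only (3 free parameters)
ll_unrestricted <- -4690.815   # mean depends on educate and age (4 free parameters)

# Likelihood ratio statistic and its chi-squared reference distribution
lr_stat <- 2 * (ll_unrestricted - ll_restricted)
df <- 4 - 3
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

c(LR = lr_stat, df = df, p = p_value)
```

The resulting p-value is well above 0.05, in line with the t-test on beta3: adding age does not significantly improve the fit of the mean model.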
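# Log-likelihood for income ~ N(mu, theta0 + theta1 * educate + theta2 * age)
ols.lf4 <- function(param) {
  mu <- param[1]                   # constant mean of income
  theta <- param[-1]               # intercept, education slope, age slope for sigma
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  sigma <- x %*% theta             # standard deviation varies with education and age
  sum(dnorm(y, mu, sigma, log = TRUE))
}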
library(maxLik)
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3=1))
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444
## 4 free parameters
## Estimates:
##        Estimate Std. error t value Pr(> t)
## mu       1.0033     0.3368   2.979 0.00289 **
## theta1   0.9810         NA      NA      NA
## theta2   0.7662         NA      NA      NA
## theta3   0.1531         NA      NA      NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
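For this part the variable age was added as a second independent variable to examine whether it influences income inequality. The code specifies four parameters: mu, theta1, theta2, and theta3. The output reports mu = 1.003 (the mean of income), theta1 = 0.981 (the intercept, i.e. the standard deviation of income when education = 0 and age = 0), theta2 = 0.766 (the education slope), and theta3 = 0.153 (the age slope).
However, these estimates should not be interpreted. The optimizer stopped after 3 iterations with return code 3, the standard errors of all three theta parameters are NA, and mu and theta1 are still close to their starting value of 1. The log-likelihood of -7542.444 is also far below the -4861.964 obtained without age, which cannot happen at a true maximum because the model without age is the special case theta3 = 0. Taken at face value, theta3 would mean that each additional year of age raises the standard deviation of income by 0.153 units, but from this fit we cannot determine whether age influences income inequality. The model needs to be re-estimated, for example with different starting values or a more robust optimization method, as the output message suggests.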
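The summary above ends with return code 3 and NA standard errors, so it is worth checking what standard deviations the parameter values imply. One plausible contributor (an assumption, not a diagnosis) is that the starting values c(1, 1, 1) make sigma = 1 + educate + age, which is far larger than the residual spread of about 2.5 seen in the earlier fits. The sketch below evaluates sigma for the six respondents printed by head(turnout), at the starting values, at the values where the optimizer stopped, and at a hypothetical restart point taken from the education-only variance model with theta3 = 0.

```r
# First six rows of turnout, copied from the head(turnout) output above
d <- data.frame(educate = c(14, 10, 12, 8, 12, 12),
                age = c(60, 51, 24, 38, 25, 67))
x <- cbind(1, d$educate, d$age)

theta_start <- c(1, 1, 1)                  # starting values used for mle_ols4
theta_stop <- c(0.9810, 0.7662, 0.1531)    # values reported when the optimizer stopped
theta_restart <- c(1.461, 0.109, 0)        # hypothetical restart: education-only fit

# Implied standard deviation of income for each respondent
sigma_tab <- cbind(start = drop(x %*% theta_start),
                   stop = drop(x %*% theta_stop),
                   restart = drop(x %*% theta_restart))
round(sigma_tab, 2)
```

If the restart column looks more plausible, passing those values as `start` to maxLik(), or choosing a different `method`, would be a reasonable next attempt. Whether either converges has not been verified here.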