knitr::opts_chunk$set(echo = TRUE)
pacman::p_load(maxLik, Zelig)
data(turnout)
head(turnout)
## race age educate income vote
## 1 white 60 14 3.3458 1
## 2 white 51 10 1.8561 0
## 3 white 24 12 0.6304 0
## 4 white 38 8 3.4183 1
## 5 white 25 12 2.7852 1
## 6 white 67 12 2.3866 1
#The data mix discrete variables (e.g., race) with continuous ones (e.g., income)
#We model the relationship between income and education
#First, write the normal log-likelihood function for the linear model
ols.lf <- function(param) {
  sigma <- param[1]              #first element: error standard deviation
  beta <- param[-1]              #remaining elements: regression coefficients
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate) #design matrix with an intercept column
  mu <- x %*% beta               #conditional mean of income
  sum(dnorm(y, mu, sigma, log = TRUE))
}
#Maximizing log-likelihood using maxLik
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
#Using the OLS lm function to replicate the same results
summary(lm(income ~ educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2028 -1.7363 -0.4273 1.3150 11.0632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207 0.21016 -3.103 0.00194 **
## educate 0.37613 0.01677 22.422 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
For maximum likelihood estimation, we specify the model's distribution, and the estimator then chooses the parameter values that maximize the likelihood (equivalently, the log-likelihood) function. In essence, we want the parameter values that maximize the likelihood of observing the data we actually measured. In the problem above, we model the conditional mean of income as a linear function of education, so the estimator finds the intercept, slope, and error standard deviation under which the observed incomes are most likely.
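Concretely, ols.lf sums log-densities, so the quantity being maximized is the normal log-likelihood

$$\ell(\beta,\sigma)=-\frac{n}{2}\log\left(2\pi\sigma^{2}\right)-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(y_{i}-x_{i}'\beta\right)^{2},$$

whose maximizing beta coincides with the OLS solution, which is why maxLik and lm agree above.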
For instance, the output returns beta1 and beta2. Beta1 is the intercept and can be read as "ceteris paribus, with zero years of education, average income is -0.65 units (in whatever units income was measured)". Beta2, our slope, can be read as "ceteris paribus, an increase in education of one year is associated with an average increase in income of about 0.376 units". Our sigma, 2.526, is the error standard deviation, which matches lm's residual standard error of 2.527 almost exactly; the tiny gap arises because the MLE divides the residual sum of squares by n while lm divides by n - 2.
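That near-agreement is easy to verify by hand; a hedged check, not part of the original chunks:

fit <- lm(income ~ educate, data = turnout)
sqrt(sum(residuals(fit)^2) / nrow(turnout)) #MLE-style sigma-hat: divide by n
sigma(fit)                                  #lm's residual standard error: divide by n - 2

Next, we flip the roles of the parameters: hold the mean fixed and let sigma vary with education.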
ols.lf2 <- function(param) {
  mu <- param[1]                 #constant mean of income
  theta <- param[-1]             #intercept and slope for the standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- x %*% theta           #sd of income now varies with education
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Here, instead of modelling the mean of income, we hold the mean constant and find the maximum-likelihood parameters for changes in the standard deviation. This lets us interpret the data differently and examine a different relationship between the variables. So now, rather than a residual standard error, the constant returned is the mean of income, mu, at about 3.52. Theta1, the new intercept, can be read as "ceteris paribus, with 0 years of education the standard deviation of income is about 1.46". The new slope, theta2, can be read as "ceteris paribus, an increase in education of one year is associated with an increase in the standard deviation of income of about 0.109". Instead of using the mean parameters to understand average changes, we can use the standard-deviation parameters to see that more education is associated with greater dispersion, and hence inequality, in income through changes in sigma.
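One practical caveat (an assumption on my part, not in the original code): a linear model for sigma can step into negative territory during optimisation, where dnorm is undefined. A minimal sketch of a common workaround is to model log(sigma) instead, so the standard deviation stays positive; note the thetas are then on the log scale.

#Variant: model log(sigma) so that sigma = exp(x %*% theta) > 0
ols.lf2.log <- function(param) {
  mu <- param[1]
  theta <- param[-1]
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)
  sigma <- exp(x %*% theta)      #exp() keeps the standard deviation strictly positive
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2_log <- maxLik(logLik = ols.lf2.log, start = c(mu = 1, theta1 = 0, theta2 = 0))
#summary(mle_ols2_log) #output omitted here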
ols.lf.age <- function(param) {
  sigma <- param[1]              #error standard deviation
  beta <- param[-1]              #intercept plus slopes for educate and age
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age) #age added to the design matrix
  mu <- x %*% beta
  sum(dnorm(y, mu, sigma, log = TRUE))
}
#Maximizing log-likelihood using maxLik
mle_ols_age <- maxLik(logLik = ols.lf.age, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))
summary(mle_ols_age)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525576 0.039919 63.268 <2e-16 ***
## beta1 -0.446047 0.300583 -1.484 0.138
## beta2 0.371011 0.017493 21.209 <2e-16 ***
## beta3 -0.003184 0.003373 -0.944 0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
#Comparing to OLS
summary(lm(income ~ educate + age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
Adding age as a predictor alongside education changes the parameter estimates. Our intercept is now -0.446, the slope for education is still about 0.37, and the age coefficient is -0.003, which would be read as "ceteris paribus, an increase in age of one year decreases income on average by 0.003 units". That effect is negligible, and in fact the age predictor is not statistically significant (nor is the intercept any longer), while educate remains highly significant!
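Because the two mean models are nested, we can also compare them with a likelihood-ratio test. A minimal sketch, relying on the logLik method that maxLik provides for its fitted objects:

#LR test: does adding age improve the fit enough to justify one extra parameter?
lr <- as.numeric(2 * (logLik(mle_ols_age) - logLik(mle_ols)))
pchisq(lr, df = 1, lower.tail = FALSE) #p-value on 1 degree of freedom

From the log-likelihoods reported above, the statistic is 2 x (4691.256 - 4690.815), roughly 0.88, giving a p-value of about 0.35, consistent with the insignificant z-test on beta3.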
ols.lf2.age <- function(param) {
  mu <- param[1]                 #constant mean of income
  theta <- param[-1]             #coefficients for the standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)
  sigma <- x %*% theta           #sd of income varies with education and age
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols2_age <- maxLik(logLik = ols.lf2.age, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1))
summary(mle_ols2_age)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 1.0033 0.3368 2.979 0.00289 **
## theta1 0.9810 NA NA NA
## theta2 0.7662 NA NA NA
## theta3 0.1531 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Here we take a look at changes in the standard deviation with age included as a predictor. Note, however, that the optimiser did not converge cleanly: return code 3, the NA standard errors, and a log-likelihood (-7542) far below that of the simpler sigma model (-4862) all indicate the search stalled after 3 iterations, so these estimates should be read with caution. Taken at face value, the intercept is 0.98, the new slope for education is 0.766, and the slope for age, 0.153, would be interpreted as "ceteris paribus, an increase in age of one year increases the standard deviation of income by 0.153 units". Since both coefficients are positive, this hints that income dispersion rises with both age and education, but a converged fit is needed before trusting that conclusion.
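One way to pursue a converged fit, as maxLik's own message suggests, is a more robust optimisation method. A hedged sketch using BFGS, with starting values borrowed from the converged two-parameter sigma model above (mu, theta1, theta2 from mle_ols2, and theta3 started at 0):

#Retry with BFGS and informed starting values
mle_ols2_age_bfgs <- maxLik(logLik = ols.lf2.age, method = "BFGS",
                            start = c(mu = 3.5, theta1 = 1.5, theta2 = 0.1, theta3 = 0))
summary(mle_ols2_age_bfgs) #output omitted; check the return code before interpreting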