Introduction
Maximum likelihood estimation (MLE) is a method for estimating population parameters (for example, the mean and variance of a normal distribution) from sample data by choosing the parameter values under which the observed data are most likely. When using MLE we first need to determine whether the assumed probability distribution is discrete or continuous. A discrete outcome (e.g. race, class, gender) is described by a probability mass function (PMF), and a continuous outcome (e.g. income, height, temperature) is described by a probability density function (PDF).
The following analyses use the “turnout” data from the Zelig package. Since the dependent variable is income (continuous), a PDF is used, here the normal density. The two models below both maximize a normal log-likelihood, but they estimate different things: the first lets the mean of income depend on education, and the second lets the standard deviation of income depend on education. Both are useful for understanding what is going on with the data.
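For reference, the quantity being maximized can be written out explicitly. The notation below is an assumption made for this write-up (it is not taken from the slides): y_i is income for respondent i, and mu_i and sigma_i are the mean and standard deviation assigned to that respondent by the model.

```latex
\ell(\mu, \sigma)
  = \sum_{i=1}^{n} \log f(y_i \mid \mu_i, \sigma_i)
  = -\frac{n}{2}\log(2\pi)
    - \sum_{i=1}^{n} \log \sigma_i
    - \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{2\sigma_i^2}
```

In a model for the mean, mu_i is a linear function of the predictors and sigma_i is one constant; in a model for the standard deviation, mu_i is one constant and sigma_i is a linear function of the predictors.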
library(Zelig)
## Loading required package: survival
library(maxLik)
## Loading required package: miscTools
##
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
##
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:Zelig':
##
## stat
data(turnout)
head(turnout)
## race age educate income vote
## 1 white 60 14 3.3458 1
## 2 white 51 10 1.8561 0
## 3 white 24 12 0.6304 0
## 4 white 38 8 3.4183 1
## 5 white 25 12 2.7852 1
## 6 white 67 12 2.3866 1
turnout %>%
  group_by(educate) %>%
  summarize(mean_income = mean(income)) %>%
  ggplot() +
  geom_col(aes(x = educate, y = mean_income, fill = mean_income)) +
  theme(legend.position = "none")
Interpretation: The bar graph shows that as education of the participants in the sample data increases, their mean income increases.
# Normal log-likelihood of a sample x; param = c(mu, sigma)
logLikFun <- function(param, x) {
  mu <- param[1]
  sigma <- param[2]
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
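As a rough illustration of what a function like logLikFun is for, the sketch below maximizes a normal log-likelihood on simulated data and compares the result with the closed-form answers. Everything here is an assumption made for the example: the data are simulated (not the turnout data), base R's optim() is used instead of maxLik, and the names sim_x, neg_loglik and fit are made up.

```r
# Simulated sample (an assumption for illustration; not the turnout data)
set.seed(1)
sim_x <- rnorm(500, mean = 3, sd = 2)

# Negative normal log-likelihood. The second parameter is log(sigma),
# so sigma stays positive throughout the search.
neg_loglik <- function(param, x) {
  mu <- param[1]
  sigma <- exp(param[2])
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# optim() minimizes, which is why the log-likelihood is negated
fit <- optim(par = c(0, 0), fn = neg_loglik, x = sim_x, method = "BFGS")

mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# Closed-form MLEs for comparison: the sample mean and the
# standard deviation with divisor n (not n - 1)
mu_closed <- mean(sim_x)
sigma_closed <- sqrt(mean((sim_x - mu_closed)^2))
```

The numerical and closed-form estimates should agree closely, which is a useful sanity check before moving to models where no closed form is available.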
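ols.lf <- function(param) {
  beta <- param[-1]               # regression coefficients (intercept, slope)
  sigma <- param[1]               # standard deviation of income around the mean
  y <- as.vector(turnout$income)  # dependent variable
  x <- cbind(1, turnout$educate)  # intercept column and education
  mu <- x %*% beta                # mean of income for each respondent
  sum(dnorm(y, mu, sigma, log = TRUE))  # normal log-likelihood
}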
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Interpretation: The above analysis (slide 4.15) maximizes the log-likelihood of a model for the effect of education on income. Here, education is the independent variable and income is the dependent variable. The three parameters in this analysis are sigma, beta1 and beta2.
Sigma, the standard deviation of income around the fitted line, is estimated at 2.53. Beta1 is the y-intercept: the predicted income for a respondent with zero units of education is -0.65. A negative income is not meaningful, so the intercept is best read as an extrapolation rather than a realistic prediction. Beta2 is the slope and shows the effect of education on income. Its value is 0.38, which means that for every one-unit increase in education, a person’s income in this dataset is expected to increase by 0.38 units. Education and income are therefore positively associated. The estimates of both beta1 and beta2 are statistically significant at an alpha value of 0.05.
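For a normal model with a linear mean, the maximum likelihood estimates of the coefficients coincide with ordinary least squares, and the MLE of sigma is the residual standard deviation with divisor n. The sketch below checks this on simulated data; the data and the names educ, inc and ols_fit are assumptions for illustration, not the turnout variables.

```r
# Simulated education and income (an assumption; not the turnout data)
set.seed(42)
n <- 200
educ <- sample(0:19, n, replace = TRUE)
inc <- -0.5 + 0.4 * educ + rnorm(n, sd = 2.5)

# OLS fit: its coefficients are also the MLEs of the betas
ols_fit <- lm(inc ~ educ)
beta_ols <- unname(coef(ols_fit))

# MLE of sigma divides the residual sum of squares by n,
# while summary.lm() reports the version with divisor n - 2
sigma_mle <- sqrt(sum(resid(ols_fit)^2) / n)
sigma_ols <- summary(ols_fit)$sigma

# Log-likelihood at the MLE, computed by hand and via logLik()
loglik_closed <- sum(dnorm(inc, mean = fitted(ols_fit), sd = sigma_mle, log = TRUE))
loglik_r <- as.numeric(logLik(ols_fit))
```

If the same comparison were run on the turnout data, lm(income ~ educate) would be expected to reproduce the beta estimates above, with a slightly larger reported sigma because of the different divisor.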
ols.lf2 <- function(param){
mu <- param[1]
theta <- param[-1]
y <- as.vector(turnout$income)
x <- cbind(1, turnout$educate)
sigma <- x%*%theta
sum(dnorm(y, mu, sigma, log = TRUE))
}
In appearance, this function is similar to the previous one, and the independent and dependent variables are the same. However, the roles of the parameters are swapped: the mean is now a single constant, mu, and the standard deviation, sigma, is a linear function of education with coefficients theta. Here, we are not modeling how the mean changes, but how the standard deviation changes across the levels of education.
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Interpretation: This analysis (slide 4.19) shows the effect education has on the variability of income. The three parameters in this analysis are mu, theta1 and theta2. Here, the standard deviation is split into two parts: an intercept (theta1) and a slope (theta2). The mean income (mu) is 3.52 units. Theta1 is the y-intercept: the standard deviation of income for respondents with zero units of education is estimated at 1.46. Theta2 represents the effect education has on the standard deviation of income. Its value is 0.11, which means that a one-unit increase in education is associated with a 0.11-unit increase in the standard deviation of income. The estimates of both theta1 and theta2 are statistically significant at an alpha value of 0.05.
Adding a second independent variable can improve the model, but only if the new variable explains variation in income that education does not already explain. If another variable, age, is added to these two models, each gains one more parameter: another beta in the first model and another theta in the second.
Prediction
My prediction is that age and income will be positively correlated. Also, I think the education coefficient will decrease, since the age coefficient would now explain a good portion of income. However, age and education are interrelated; therefore, the significance of adding the variable age to the previous equation is uncertain.
Log-likelihood function (Income, Education and Age)
ols.lf <- function(param) {
beta <- param[-1] #Regression Coefficients
sigma <- param[1] #Standard Deviation
y <- as.vector(turnout$income) #DV
x <- cbind(1, turnout$educate, turnout$age) #IV
mu <- x%*%beta #multiply matrices
sum(dnorm(y, mu, sigma, log = TRUE)) #normal distribution(vector of observations, mean, sd)
}
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3=1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525576 0.039919 63.268 <2e-16 ***
## beta1 -0.446047 0.300583 -1.484 0.138
## beta2 0.371011 0.017493 21.209 <2e-16 ***
## beta3 -0.003184 0.003373 -0.944 0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Here again, sigma is the standard deviation of income, beta1 is the intercept, beta2 is the slope for education and beta3 is the slope for age.
Interpretation: The results indicate that the age coefficient is negative, contrary to my prediction: holding education constant, income is expected to decrease slightly as age increases. However, this finding is not statistically significant (p-value of 0.345), so we cannot conclude that age has any effect on mean income in this model.
The education coefficient decreased only slightly, from 0.376 to 0.371, meaning that for every additional unit of education there is still a 0.37-unit increase in expected income. The estimated standard deviation remains essentially the same as before, at 2.53. Education is still statistically significant in predicting income at an alpha value of 0.05.
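One way to check whether adding age improved the fit is a likelihood ratio test, which compares the maximized log-likelihoods of the model with and without age. The sketch below uses only the two log-likelihood values reported in the summaries above; the variable names are made up for the example, and the test assumes the usual chi-squared approximation with one degree of freedom (one added parameter).

```r
# Maximized log-likelihoods copied from the two summaries above
ll_without_age <- -4691.256   # income ~ education
ll_with_age <- -4690.815      # income ~ education + age

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (ll_with_age - ll_without_age)

# One extra parameter (beta3), so one degree of freedom
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
```

The gain in log-likelihood is less than half a unit, so the p-value is far above 0.05, which agrees with the non-significant t value for beta3 in the table.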
Likelihood function (Income, Age, Education)
ols.lf2 <- function(param) {
mu <- param[1]
theta <- param[-1]
y <- as.vector(turnout$income) #DV
x <- cbind(1, turnout$educate, turnout$age) #IV
sigma <- x%*%theta #multiply matrices
sum(dnorm(y, mu, sigma, log = TRUE)) #normal distribution(vector of observations, mean, sd)
}
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1), method="BFGS")
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## BFGS maximization, 150 iterations
## Return code 0: successful convergence
## Log-Likelihood: -4843.15
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.555011 0.069193 51.378 < 2e-16 ***
## theta1 0.362114 0.204550 1.770 0.0767 .
## theta2 0.133349 0.010756 12.398 < 2e-16 ***
## theta3 0.017507 0.002852 6.139 8.32e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Interpretation: The results show the effect education and age have on the variability of income. The mean income (mu) is 3.56 units, slightly higher than before. Theta1 is the intercept: the standard deviation of income is predicted to be 0.36 units for a respondent with zero units of education and an age of zero. This estimate is not statistically significant at an alpha value of 0.05 (p-value of 0.077). Theta2 represents the effect education has on the standard deviation of income. Its value increased from 0.11 to 0.13, which means that in the current model a one-unit increase in education is associated with a 0.13-unit increase in the standard deviation of income, holding age constant. Theta3 is 0.018, so a one-unit increase in age is associated with a 0.018-unit increase in the standard deviation of income, holding education constant. The estimates of both theta2 and theta3 are statistically significant at an alpha value of 0.05. Therefore, we can conclude that higher age and education are associated with greater variability, and in that sense greater inequality, in income among the participants in our sample data.
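A practical caveat with modeling sigma as a linear function is that the optimizer can wander into parameter values where sigma is zero or negative, where the normal density is undefined. A common alternative, sketched below, is to model log(sigma) as linear so that sigma is always positive. This is not the model fitted above: the data are simulated, base R's optim() stands in for maxLik, and the names neg_loglik_logsd, gamma and est are made up for the example.

```r
# Simulated data (an assumption): constant mean, log(sd) linear in education
set.seed(7)
n <- 1000
educ <- sample(0:19, n, replace = TRUE)
y <- rnorm(n, mean = 3.5, sd = exp(0.3 + 0.05 * educ))

# Negative log-likelihood with a log link for the standard deviation
neg_loglik_logsd <- function(param, y, x) {
  mu <- param[1]                    # constant mean
  gamma <- param[-1]                # coefficients on the log(sd) scale
  sigma <- exp(drop(x %*% gamma))   # always positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

X <- cbind(1, educ)

# Start from the sample mean and the log of the sample sd
fit <- optim(par = c(mean(y), log(sd(y)), 0), fn = neg_loglik_logsd,
             y = y, x = X, method = "BFGS")
est <- setNames(fit$par, c("mu", "gamma1", "gamma2"))
```

With this parameterization the slope is read on the log scale: a one-unit increase in education multiplies the standard deviation by exp(gamma2), rather than adding a fixed amount to it.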