Using Log-likelihood function:
logLikFun <- function(param) {
mu <- param[1]
sigma <- param[2]
sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
The turnout data, as part of the Zelig package:
library(Zelig)
## Loading required package: survival
data("turnout")
head(turnout)
## race age educate income vote
## 1 white 60 14 3.3458 1
## 2 white 51 10 1.8561 0
## 3 white 24 12 0.6304 0
## 4 white 38 8 3.4183 1
## 5 white 25 12 2.7852 1
## 6 white 67 12 2.3866 1
Linear regression using MLE for Mean-Using log likelihood function:
ols.lf <- function(param) {
beta <- param[-1]
sigma <- param[1]
y <- as.vector(turnout$income)
x <- cbind(1, turnout$educate)
mu <- x%*%beta
sum(dnorm(y, mu, sigma, log = TRUE))}
param<-c(1,2,3,4,5)
param[-1]
## [1] 2 3 4 5
param[1]
## [1] 1
Maximizing the log likelihood function:
library(maxLik)
## Loading required package: miscTools
##
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
##
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Checking The Results Through Lenear Model:
summary(lm(income~educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2028 -1.7363 -0.4273 1.3150 11.0632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207 0.21016 -3.103 0.00194 **
## educate 0.37613 0.01677 22.422 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
In the above analysis I am looking at maximizing the likelihood estimate of the mean (Slide 4.15). I have 3 free parameters and value of sigma is 2.52613 which is standard error of model, have beta1= -0.65207 is the intercept indicating average income is -0.65207 with zero number of education and beta2 is 0.37613 which is the slope and indicates that an increase in education by 1 year, on average will increase income by 0.37613 units.
Linear regression using MLE For Standard Deveation: Log likelihood function:
ols.lf2 <- function(param) {
mu <- param[1]
theta <- param[-1]
y <- as.vector(turnout$income)
x <- cbind(1, turnout$educate)
sigma <- x%*%theta
sum(dnorm(y, mu, sigma, log=TRUE))}
Maximizing the log likelihood function:
library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
In the above analysis now I am looking for Max Liklihood (MLE) of unit for change in Standard Deviation (Slide 4.19). This is a different way of analysis of above variables and relationship between them. Instead of standard error I get mean error, mu=3.51676. Our theta 1 is 1.46101 which is the intercept with zero number of education and theta2 is 0.10908 which is the slpoe indicates that an increase in education by 1 year there will be an increase in Standard Deviation of income by 0.10908. It shows that an increase in education will cause ineqaulity in income.
Adding another coefficient “AGE”" and Comparing the results:
Linear regression using MLE for Mean Average: log likelihood function:
ols.lf3 <- function(param) {
beta <- param[-1]
sigma <- param[1]
y <- as.vector(turnout$income)
x <- cbind(1, turnout$educate, turnout$age)
mu <- x%*%beta
sum(dnorm(y, mu, sigma, log = TRUE))}
param<-c(1,2,3,4,5)
param[-1]
## [1] 2 3 4 5
param[1]
## [1] 1
Maximizing the log likelihood function:
library(maxLik)
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3=1))
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525576 0.039919 63.268 <2e-16 ***
## beta1 -0.446047 0.300583 -1.484 0.138
## beta2 0.371011 0.017493 21.209 <2e-16 ***
## beta3 -0.003184 0.003373 -0.944 0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
Checking the Results Through Linear Model
summary(lm(income~educate+age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
In the above analysis I have added age as a coefficient along with education to analyse changes in income. First, I am looking at maximizing the likelihood estimate of the mean. The value of sigma is 2.525576 which is standard error of model and is same as compared to above analysis, have beta1= -0.446047 is the intercept indicating average income and slight difference when used education parameter i.e -0.65207 , beta2 is 0.371011 which is the slope (almost same result when used with single parameter education) and indicates that an increase in education by 1 year, on average will increase income by 0.371011 units and beta3 is -0.003184, which is an age parameter, indicates that an increase in age by 1 year, on average will decrease income by -0.003184 units. Decrease in income with increase in an age seems to be of no sence to me and in the result this predictor is not statistically significant but education is.
Linear regression using MLE for Standard Deviation- Using log likelihood function:
ols.lf4 <- function(param) {
mu <- param[1]
theta <- param[-1]
y <- as.vector(turnout$income)
x <- cbind(1, turnout$educate, turnout$age)
sigma <- x%*%theta
sum(dnorm(y, mu, sigma, log=TRUE))}
Maximizing the log likelihood function:
library(maxLik)
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1,theta3=1))
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 3: Last step could not find a value above the current.
## Boundary of parameter space?
## Consider switching to a more robust optimisation method temporarily.
## Log-Likelihood: -7542.444
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 1.0033 0.3368 2.979 0.00289 **
## theta1 0.9810 NA NA NA
## theta2 0.7662 NA NA NA
## theta3 0.1531 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
In the above analysis now I am looking for Max Liklihood (MLE) of unit for changes in Standard Deviation. Mu=1.0033, Our theta1 is 0.9810 which is the intercept, theta2 is 0.7662 which is a new slpoe of education indicates that an increase in education by 1 year there will be an increase in Standard Deviation of income by 0.7662 and theta3 is 0.1531, the slop for age, indicates that an increase in age by 1 year there will be an increase in Standard Deviation of income by 0.1531. It shows that an increase in age and education will cause ineqyaulity in income.