In the results below, beta1 (the y-intercept) = -0.65207 and beta2 (the slope) = 0.37613. The beta1 value is the predicted mean income when education (x) = 0, which is -0.65207. The beta2 value shows that for every additional year of education, mean income increases by 0.37613 units. Sigma = 2.52613 is the estimated residual standard deviation. These results indicate a positive association between education and income: those with more education have a higher mean income. The lm() fit further below gives the same coefficient estimates.
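For reference, the quantity that the ols.lf function below maximizes can be written out explicitly. This is the standard normal log-likelihood; the symbols n, x_i (years of education), and y_i (income) are notation introduced here and do not appear in the code.

```latex
\ell(\beta_1, \beta_2, \sigma)
  = \sum_{i=1}^{n} \log \phi\!\left(y_i;\ \beta_1 + \beta_2 x_i,\ \sigma\right)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 - \beta_2 x_i\right)^2
```

For any fixed sigma, maximizing this over beta1 and beta2 is the same as minimizing the sum of squared residuals, which is why the maximum likelihood coefficients agree with least squares.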
library(Zelig)
data(turnout)
head(turnout)
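# Log-likelihood for a normal linear model of income on education, constant sigma
ols.lf <- function(param) {
  beta <- param[-1]                # regression coefficients (beta1, beta2)
  sigma <- param[1]                # residual standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # design matrix with an intercept column
  mu <- x %*% beta                 # conditional mean of income
  sum(dnorm(y, mu, sigma, log = TRUE))
}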
library(maxLik)
mle_ols <- maxLik(logLik = ols.lf, start = c(sigma = 1, beta1 = 1, beta2 = 1))
summary(mle_ols)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 12 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4691.256
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.52613 0.03989 63.326 < 2e-16 ***
## beta1 -0.65207 0.20827 -3.131 0.00174 **
## beta2 0.37613 0.01663 22.612 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
summary(lm(income ~ educate, data = turnout))
##
## Call:
## lm(formula = income ~ educate, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2028 -1.7363 -0.4273 1.3150 11.0632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.65207 0.21016 -3.103 0.00194 **
## educate 0.37613 0.01677 22.422 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1998 degrees of freedom
## Multiple R-squared: 0.201, Adjusted R-squared: 0.2006
## F-statistic: 502.8 on 1 and 1998 DF, p-value: < 2.2e-16
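The two sigma estimates differ slightly (2.52613 from maxLik, 2.527 from lm()). A likely reason is the divisor: the maximum likelihood estimate divides the residual sum of squares by n, while lm() divides by n minus the number of coefficients. The sketch below illustrates this on simulated data, so the variable names and values are hypothetical and are not taken from turnout.

```r
# Simulated stand-in for the turnout data (hypothetical values)
set.seed(1)
n <- 2000
educate <- runif(n, 0, 19)
income <- -0.65 + 0.38 * educate + rnorm(n, sd = 2.5)

fit <- lm(income ~ educate)
rss <- sum(residuals(fit)^2)

sigma_mle <- sqrt(rss / n)        # maximum likelihood estimate: divide by n
sigma_lm <- sqrt(rss / (n - 2))   # lm() residual standard error: divide by n - 2

# Normal log-likelihood at the fitted mean, as a function of sigma only
loglik <- function(sigma) sum(dnorm(income, fitted(fit), sigma, log = TRUE))

c(sigma_mle = sigma_mle, sigma_lm = sigma_lm, summary_sigma = summary(fit)$sigma)
c(loglik_mle = loglik(sigma_mle), loglik_lm = loglik(sigma_lm))
```

With n = 2000 the two divisors are nearly identical, so the gap only shows up around the third decimal place, as in the output above.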
In the results below, theta1 (the y-intercept) = 1.461011 and theta2 (the slope) = 0.109081. Mu = 3.516764 is the estimated mean income, which this model holds constant across respondents. The theta1 value is the standard deviation of income when education (x) = 0, which is 1.461011. The theta2 value shows that for every additional year of education, the standard deviation of income increases by 0.109081 units. These results indicate a positive association between education and income inequality: those with more education have more variable incomes.
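# Log-likelihood with a constant mean and a standard deviation linear in education
ols.lf2 <- function(param) {
  mu <- param[1]                   # common mean of income
  theta <- param[-1]               # coefficients for sigma (theta1, theta2)
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate)   # design matrix with an intercept column
  sigma <- x %*% theta             # respondent-specific standard deviation
  sum(dnorm(y, mu, sigma, log = TRUE))
}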
library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1))
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 9 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4861.964
## 3 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.516764 0.070320 50.01 <2e-16 ***
## theta1 1.461011 0.106745 13.69 <2e-16 ***
## theta2 0.109081 0.009185 11.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
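One caveat with this specification is that sigma = x %*% theta is not constrained to be positive, and dnorm() returns NaN for a negative standard deviation. A common alternative, not used in this analysis, is to model log(sigma) as linear so that sigma is positive by construction. The following is a minimal sketch on simulated data, using base R's optim() rather than maxLik; all names and values in it are hypothetical.

```r
# Simulated data whose spread grows with education (hypothetical values)
set.seed(2)
n <- 2000
educate <- runif(n, 0, 19)
income <- rnorm(n, mean = 3.5, sd = exp(0.3 + 0.04 * educate))

# Negative log-likelihood with a log link: sigma = exp(x %*% theta) is always > 0
negll_loglink <- function(param) {
  mu <- param[1]
  theta <- param[-1]
  x <- cbind(1, educate)
  sigma <- as.vector(exp(x %*% theta))
  -sum(dnorm(income, mu, sigma, log = TRUE))
}

start <- c(mu = 1, theta1 = 0, theta2 = 0)
fit <- optim(start, negll_loglink, method = "BFGS")

fit$par   # theta1 and theta2 are on the log-sigma scale
sigma_at_12 <- unname(exp(fit$par["theta1"] + fit$par["theta2"] * 12))
sigma_at_12   # implied standard deviation at 12 years of education
```

Under this parameterization theta2 is read as an approximate proportional change in sigma per year of education rather than a change in units, so its value is not directly comparable to the 0.109081 reported above.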
Pre-analysis: I hypothesize that age and income will both correlate positively with education. As people get older, they get more education and earn more. However, I expect age to correlate negatively with income after a certain age (perhaps 65).
Post-analysis: In the results below, beta1 (the y-intercept) = -0.446047, beta2 (the education slope) = 0.371011, and beta3 (the age slope) = -0.003184. The beta1 value is the predicted mean income when education and age are both 0, which is -0.446047. The beta2 value shows that for every additional year of education, mean income increases by 0.371011 units, holding age constant. The beta3 value shows that for every additional year of age, mean income decreases by 0.003184 units, holding education constant. Sigma = 2.525576 is the estimated residual standard deviation. These results suggest a positive association between education and income and a negative association between age and income: those with more education have a higher mean income, and older respondents have a slightly lower mean income. However, with a p-value of 0.345, the relationship between age and mean income is not statistically significant.
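# Log-likelihood for a normal linear model of income on education and age
ols.lf3 <- function(param) {
  beta <- param[-1]                             # coefficients (beta1, beta2, beta3)
  sigma <- param[1]                             # residual standard deviation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)   # intercept, education, age
  mu <- x %*% beta                              # conditional mean of income
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols3 <- maxLik(logLik = ols.lf3, start = c(sigma = 1, beta1 = 1, beta2 = 1, beta3 = 1))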
summary(mle_ols3)
## --------------------------------------------
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 16 iterations
## Return code 2: successive function values within tolerance limit
## Log-Likelihood: -4690.815
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## sigma 2.525576 0.039919 63.268 <2e-16 ***
## beta1 -0.446047 0.300583 -1.484 0.138
## beta2 0.371011 0.017493 21.209 <2e-16 ***
## beta3 -0.003184 0.003373 -0.944 0.345
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
summary(lm(income ~ educate + age, data = turnout))
##
## Call:
## lm(formula = income ~ educate + age, data = turnout)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2128 -1.7471 -0.4217 1.3042 11.1256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.446084 0.303955 -1.468 0.142
## educate 0.371013 0.017641 21.031 <2e-16 ***
## age -0.003183 0.003394 -0.938 0.348
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.527 on 1997 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.2006
## F-statistic: 251.8 on 2 and 1997 DF, p-value: < 2.2e-16
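Because the education-only model is nested in the education-plus-age model, the two reported log-likelihoods can also be compared with a likelihood ratio test. This is a supplementary check rather than part of the original analysis; it uses only the Log-Likelihood values printed in the two maxLik summaries above.

```r
# Log-likelihoods copied from the maxLik summaries above
ll_educ <- -4691.256       # income on educate (3 free parameters)
ll_educ_age <- -4690.815   # income on educate and age (4 free parameters)

lr_stat <- 2 * (ll_educ_age - ll_educ)                   # likelihood ratio statistic
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)   # one extra parameter

lr_stat   # 0.882
p_value   # about 0.35
```

The p-value is close to the 0.345 reported for beta3. That is expected: with a single added parameter, the likelihood ratio test and the Wald test in the summary table usually agree closely.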
Pre-analysis: I hypothesize that age and income will both correlate positively with education. As people get older, they get more education and earn more. However, I expect age to correlate negatively with income after a certain age (perhaps 65).
Post-analysis: In the results below, theta1 (the y-intercept) = 0.362114, theta2 (the education slope) = 0.133349, and theta3 (the age slope) = 0.017507. In this analysis, Mu = 3.555011 is the estimated mean income. The theta1 value is the standard deviation of income when education and age are both 0, which is 0.362114. The theta2 value shows that for every additional year of education, the standard deviation of income increases by 0.133349 units, holding age constant. The theta3 value shows that for every additional year of age, the standard deviation of income increases by 0.017507 units, holding education constant. These results suggest a positive association between education and income variability and a positive association between age and income variability: those with more education, and those who are older, have a higher standard deviation of income. The p-values for theta2 and theta3 show that both slopes are statistically significant; the intercept theta1 (p = 0.0767) is not significant at the 0.05 level.
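# Log-likelihood with a constant mean and a standard deviation linear in education and age
ols.lf4 <- function(param) {
  mu <- param[1]                                # common mean of income
  theta <- param[-1]                            # coefficients for sigma (theta1, theta2, theta3)
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)   # intercept, education, age
  sigma <- x %*% theta                          # respondent-specific standard deviation
  sum(dnorm(y, mu, sigma, log = TRUE))
}
mle_ols4 <- maxLik(logLik = ols.lf4, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1), method = "bfgs")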
summary(mle_ols4)
## --------------------------------------------
## Maximum Likelihood estimation
## BFGS maximization, 150 iterations
## Return code 0: successful convergence
## Log-Likelihood: -4843.15
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.555011 0.069193 51.378 < 2e-16 ***
## theta1 0.362114 0.204550 1.770 0.0767 .
## theta2 0.133349 0.010756 12.398 < 2e-16 ***
## theta3 0.017507 0.002852 6.139 8.32e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
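To make the coefficients concrete, the fitted standard deviation can be evaluated for a particular respondent profile. The helper name predict_sigma and the example profiles (12 or 16 years of education, age 40) are illustrative choices, not part of the original analysis; the coefficients are copied from the summary above.

```r
# Estimates copied from the maxLik summary above
theta <- c(theta1 = 0.362114, theta2 = 0.133349, theta3 = 0.017507)

# Hypothetical helper: fitted standard deviation of income for given education and age
predict_sigma <- function(educate, age) {
  unname(theta["theta1"] + theta["theta2"] * educate + theta["theta3"] * age)
}

predict_sigma(educate = 12, age = 40)   # 2.662582
predict_sigma(educate = 16, age = 40)   # four more years of education, same age
```

The difference between the two calls is four times theta2, which is the "holding age constant" reading of the education slope.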
The following plot shows that income increases with age at first but starts to decrease when age reaches around 50. I hypothesized that income would increase with age and decrease only after age 65, an assumption based on the retirement age. Perhaps people are retiring earlier than expected.
library(ggplot2)
# Scatter plot of income against age with a smoothed trend line
ggplot(turnout, aes(x = age, y = income)) +
  geom_point() +
  geom_smooth()
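The turning point seen in the plot could be checked more formally by adding a squared age term to the mean model, which lets the age effect change sign. The sketch below uses simulated data with a built-in peak at age 50, so every number in it is hypothetical; on the real data the analogous call would be lm(income ~ educate + age + I(age^2), data = turnout).

```r
# Simulated data with an inverted-U age profile peaking at age 50 (hypothetical values)
set.seed(3)
n <- 2000
age <- runif(n, 18, 90)
income <- 1 + 0.2 * age - 0.002 * age^2 + rnorm(n, sd = 0.5)

fit_quad <- lm(income ~ age + I(age^2))
b <- coef(fit_quad)

# Vertex of the fitted parabola: the age at which predicted income is highest
peak_age <- unname(-b["age"] / (2 * b["I(age^2)"]))
peak_age
```

A negative coefficient on the squared term together with a peak inside the observed age range would support the inverted-U pattern; a linear age term alone, as in the models above, cannot capture it.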