Linear Model
Loading the packages
library(maxLik)
library(Zelig)
library(ggplot2)
library(coefplot)
library(ggthemes)
library(dplyr)
# Opening the "turnout" data set from the 'Zelig' package
library(Zelig)
data(turnout)
head(turnout)
##    race age educate income vote
## 1 white  60      14 3.3458    1
## 2 white  51      10 1.8561    0
## 3 white  24      12 0.6304    0
## 4 white  38       8 3.4183    1
## 5 white  25      12 2.7852    1
## 6 white  67      12 2.3866    1
Running preliminary analytics of the “turnout” data set to explore and understand the data
Plotting a graph that shows the relationship between education and income
library(ggplot2)
ggplot(turnout, aes(x=educate, y=income)) + geom_point(color="orange") + labs(x="Educate", y="Income", title="Relationship between Education and Income")

Plotting a graph that shows the relationship between age and income
ggplot(turnout, aes(x=age, y=income)) + geom_point(color="orange") + geom_smooth(aes(x = age, y = income)) + labs(x="Age", y="Income", title="Relationship between Age and Income")

Plotting a scatterplot matrix showing the relationships among all three variables
# Keep columns 2 to 4: age, educate and income
turnout1 <- turnout[, 2:4]
plot(turnout1, pch=16, col="grey", main="Matrix Scatterplot of Income, Education and Age")

# Average income by level of education, with bars shaded by average age
turnout %>%
  group_by(educate) %>%
  summarize(av.income = mean(income), av.age = mean(age)) %>%
  ggplot() +
  geom_col(aes(x = educate, y = av.income, fill = av.age)) +
  theme_calc()

Summary of the turnout data
summary(turnout)
##      race           age          educate          income
##  others: 292   Min.   :17.0   Min.   : 0.00   Min.   : 0.000
##  white :1708   1st Qu.:31.0   1st Qu.:10.00   1st Qu.: 1.744
##                Median :42.0   Median :12.00   Median : 3.351
##                Mean   :45.3   Mean   :12.07   Mean   : 3.887
##                3rd Qu.:59.0   3rd Qu.:14.00   3rd Qu.: 5.233
##                Max.   :95.0   Max.   :19.00   Max.   :14.925
##       vote
##  Min.   :0.000
##  1st Qu.:0.000
##  Median :1.000
##  Mean   :0.746
##  3rd Qu.:1.000
##  Max.   :1.000
# Simple linear regression of income on education
lm(turnout$income ~ turnout$educate)
##
## Call:
## lm(formula = turnout$income ~ turnout$educate)
##
## Coefficients:
##     (Intercept)  turnout$educate
##         -0.6521           0.3761
# Simple linear regression of income on age
lm(turnout$income ~ turnout$age)
##
## Call:
## lm(formula = turnout$income ~ turnout$age)
##
## Coefficients:
## (Intercept)  turnout$age
##     5.03117     -0.02527
Implementing the log-likelihood function, with age and education as the independent variables that explain the standard deviation of income
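ols.lf2 <- function(param) {
  mu <- param[1]       # mean of income, common to all individuals
  theta <- param[-1]   # coefficients of the standard deviation equation
  y <- as.vector(turnout$income)
  x <- cbind(1, turnout$educate, turnout$age)   # intercept, education, age
  sigma <- x %*% theta   # standard deviation implied for each individual
  sum(dnorm(y, mu, sigma, log = TRUE))   # normal log-likelihood
}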
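The function above relies on dnorm(..., log = TRUE) to evaluate the normal log-density with an observation-specific standard deviation. As a minimal check of what that sum computes, the sketch below compares it with the closed-form expression -0.5 * log(2 * pi) - log(sigma) - (y - mu)^2 / (2 * sigma^2) on simulated data. The simulated columns and the parameter values are assumptions made for illustration only; they are not taken from the turnout data.

```r
# Simulated stand-ins for the turnout columns (hypothetical values)
set.seed(1)
n <- 200
educate <- runif(n, 0, 19)
age <- runif(n, 17, 95)
x <- cbind(1, educate, age)

# Hypothetical parameters: constant mean, sd linear in education and age
mu <- 3.5
theta <- c(0.4, 0.1, 0.02)
sigma <- as.vector(x %*% theta)   # positive here because theta and x are non-negative
y <- rnorm(n, mean = mu, sd = sigma)

# Log-likelihood as computed in ols.lf2 ...
ll_dnorm <- sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
# ... and the same quantity written out term by term
ll_closed <- sum(-0.5 * log(2 * pi) - log(sigma) - (y - mu)^2 / (2 * sigma^2))

all.equal(ll_dnorm, ll_closed)
```

Note that nothing constrains x %*% theta to be positive during optimisation; dnorm returns NaN for a negative standard deviation, so the starting values matter.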
The Maximum Likelihood Estimation Result
?maxLik
# Opens the help page for maxLik(). Check the "method" options there to choose the most robust optimisation method.
library(maxLik)
mle_ols2 <- maxLik(logLik = ols.lf2, start = c(mu = 1, theta1 = 1, theta2 = 1, theta3= 1), method="BFGS")
summary(mle_ols2)
## --------------------------------------------
## Maximum Likelihood estimation
## BFGS maximization, 150 iterations
## Return code 0: successful convergence
## Log-Likelihood: -4843.15
## 4 free parameters
## Estimates:
## Estimate Std. error t value Pr(> t)
## mu 3.555011 0.069193 51.378 < 2e-16 ***
## theta1 0.362114 0.204550 1.770 0.0767 .
## theta2 0.133349 0.010756 12.398 < 2e-16 ***
## theta3 0.017507 0.002852 6.139 8.32e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------
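The same estimation idea can be tried without the turnout data or the maxLik package. The sketch below is an illustration only: it simulates data from hypothetical parameter values, swaps in base R's optim() with the Nelder-Mead method in place of maxLik() with BFGS, and adds a guard that returns -Inf whenever an implied standard deviation is not positive. The names loglik, start and fit are made up for this example.

```r
# Simulated data with a constant mean and an sd that is linear in two covariates
set.seed(42)
n <- 2000
educate <- runif(n, 0, 19)   # hypothetical stand-in for turnout$educate
age <- runif(n, 17, 95)      # hypothetical stand-in for turnout$age
x <- cbind(1, educate, age)
true_par <- c(mu = 3.5, theta1 = 0.4, theta2 = 0.1, theta3 = 0.02)
y <- rnorm(n, mean = true_par[["mu"]], sd = as.vector(x %*% true_par[-1]))

# Same log-likelihood as ols.lf2, plus a positivity guard on sigma
loglik <- function(param) {
  sigma <- as.vector(x %*% param[-1])
  if (any(sigma <= 0)) return(-Inf)   # a standard deviation must be positive
  sum(dnorm(y, mean = param[1], sd = sigma, log = TRUE))
}

# optim() minimises by default; fnscale = -1 turns it into a maximiser
start <- c(mu = 1, theta1 = 1, theta2 = 1, theta3 = 1)
fit <- optim(par = start, fn = loglik, method = "Nelder-Mead",
             control = list(fnscale = -1, maxit = 5000))

fit$convergence      # 0 means the optimiser reported convergence
round(fit$par, 3)    # estimates, to compare with true_par
fit$value            # maximised log-likelihood
```

Nelder-Mead is used here because it tolerates the -Inf returned by the guard; a gradient-based method such as BFGS would need a different way of keeping sigma positive.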
confint(mle_ols2)
## 2.5 % 97.5 %
## mu 3.41939485 3.69062797
## theta1 -0.03879737 0.76302499
## theta2 0.11226813 0.15442979
## theta3 0.01191727 0.02309622
# confint() computes confidence intervals for the estimates. Each row is a two-tailed 95% interval, running from the 2.5% point (lower limit) to the 97.5% point (upper limit) of the relevant distribution.
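The limits printed above are consistent with normal-based (Wald) intervals, that is, estimate plus or minus qnorm(0.975) times the standard error. The sketch below recomputes them by hand from the rounded estimates and standard errors in the summary output; the object names est, se and wald_ci are made up for this example, and small differences in the last digits come from that rounding.

```r
# Estimates and standard errors copied from the summary(mle_ols2) output
est <- c(mu = 3.555011, theta1 = 0.362114, theta2 = 0.133349, theta3 = 0.017507)
se  <- c(mu = 0.069193, theta1 = 0.204550, theta2 = 0.010756, theta3 = 0.002852)

# Two-tailed 95% interval: 2.5% of the probability in each tail
z <- qnorm(0.975)
wald_ci <- cbind("2.5 %" = est - z * se, "97.5 %" = est + z * se)

round(wald_ci, 4)
```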
Plotting a graph showing the estimated coefficients and their confidence intervals
library(coefplot)
coefplot(mle_ols2)

Interpreting the results
The output shows the maximum likelihood estimates for a model built on three variables from the turnout data. It presents the impact of age and education on the standard deviation of income:
mu = average income for the population
theta1 = the intercept of the standard deviation equation
theta2 = impact of education on the sd of income
theta3 = impact of age on the sd of income
The estimates show that both age and education are positively related to the spread of income. For each additional unit of age the standard deviation of income increases by about 0.018, and for each additional unit of education it increases by about 0.133. This means that education has a stronger per-unit impact on the diversity of income than age does (0.133 > 0.018): as education increases, the differences among people’s incomes also increase.
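The output also shows that mu, theta2 and theta3 are statistically significant, with p-values smaller than 0.001. The intercept theta1 is not significant at the 5% level (p = 0.0767), and its 95% confidence interval includes zero.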
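To make the size of these effects concrete, the sketch below plugs the reported estimates into the standard deviation equation, sigma = theta1 + theta2 * educate + theta3 * age. The helper implied_sd is made up for this example; the first call uses the median education (12) and median age (42) from the data summary, and the second raises education by four units while holding age fixed.

```r
# Point estimates copied from the summary(mle_ols2) output
theta <- c(theta1 = 0.362114, theta2 = 0.133349, theta3 = 0.017507)

# Implied standard deviation of income for a given education and age
implied_sd <- function(educate, age) {
  theta[["theta1"]] + theta[["theta2"]] * educate + theta[["theta3"]] * age
}

sd_median <- implied_sd(educate = 12, age = 42)   # median education and age
sd_more_edu <- implied_sd(educate = 16, age = 42) # four more units of education

sd_median
sd_more_edu
sd_more_edu - sd_median   # equals 4 * theta2
```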
The End