In this statistical analysis I investigate the relationship between voter turnout and education using a sample from the 2000 Current Population Survey from the Census Bureau. The states represented are South Carolina and Arkansas. The sample contains 7 variables: state, year, vote, income, education, age, and female (1 = female, 0 = male). It is interesting to look at which variables may affect voter turnout.
Looking at data like this can help increase awareness of the importance of voting. Whether it is voting for a city council member or for the next President, many people have mixed thoughts on whether they should vote. Some people also do not vote because they are not aware of the issues at stake or the solutions being proposed.
Analyzing this type of data can help us understand which variables matter and can suggest other variables to incorporate in future studies.
A data frame containing 7 variables (“state”, “year”, “vote”, “income”, “education”, “age”, “female”) and 1500 observations.
state: a factor variable with levels equal to “AR” (Arkansas) and “SC” (South Carolina)
year: an integer vector
vote: an integer vector taking on values “1” (Voted) and “0” (Did Not Vote)
income: an integer vector ranging from “4” (Less than $5000) to “17” (Greater than $75000) denoting family income.
education: an integer vector ranging from “1” (Less than High School Education) to “4” (More than a College Education).
age: an integer vector ranging from “18” to “85”
female: an integer vector taking on values “1” (Female) and “0” (Male)
The analysis uses the Zelig package to fit a logit model and simulate quantities of interest from it. This helps us visualize the predicted probability of voting at different levels of educational attainment. I am using a logit model because the dependent variable is binary (voted or did not vote).
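For reference, the logit model described above can be written as follows. It uses the variable names from the data description; the symbols $\pi_i$, $\eta_i$, and $\beta$ are notation introduced here for illustration, and $\text{SC}_i$ is an indicator for South Carolina, matching the `stateSC` term in the model output.

```latex
\[
\Pr(\text{vote}_i = 1) = \pi_i = \frac{1}{1 + \exp(-\eta_i)},
\qquad
\eta_i = \beta_0 + \beta_1\,\text{education}_i + \beta_2\,\text{income}_i
       + \beta_3\,\text{age}_i + \beta_4\,\text{female}_i + \beta_5\,\text{SC}_i
\]
```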
## vote
##    0    1
##  217 1283
## education
##   1   2   3   4
## 216 486 403 395
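As a quick check on the two tables above, the counts can be turned into shares. The sketch below re-enters the printed counts by hand (it does not load the `voteincome` data), and the object names are my own.

```r
# Counts copied from the frequency tables above
vote_counts <- c("0" = 217, "1" = 1283)
education_counts <- c("1" = 216, "2" = 486, "3" = 403, "4" = 395)

# Both tables should account for all 1500 observations
stopifnot(sum(vote_counts) == 1500, sum(education_counts) == 1500)

# Share of respondents in each category
vote_share <- vote_counts / sum(vote_counts)
education_share <- education_counts / sum(education_counts)

print(round(vote_share, 3))
print(round(education_share, 3))
```

Roughly 86% of respondents report voting, so the outcome is quite unbalanced, and education level 2 is the most common category.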
## Model:
##
## Call:
## z5$zelig(formula = vote ~ education + income + age + female +
## state, data = voteincome)
##
## Deviance Residuals:
##     Min      1Q  Median      3Q     Max
## -2.4569  0.3855  0.4806  0.5941  1.0363
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.009550   0.382580  -2.639 0.008320
## education    0.225355   0.090210   2.498 0.012485
## income       0.089629   0.021860   4.100 4.13e-05
## age          0.016321   0.004331   3.768 0.000164
## female       0.304063   0.151263   2.010 0.044415
## stateSC      0.311943   0.154000   2.026 0.042805
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1240.0 on 1499 degrees of freedom
## Residual deviance: 1181.6 on 1494 degrees of freedom
## AIC: 1193.6
##
## Number of Fisher Scoring iterations: 5
##
## Next step: Use 'setx' method
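Logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below re-enters the estimates and standard errors from the summary above by hand and uses a normal-approximation (Wald) interval. The object names are illustrative, and a profile-likelihood interval computed from the fitted model itself would differ slightly.

```r
# Estimates and standard errors copied from the coefficient table above
est <- c(education = 0.225355, income = 0.089629, age = 0.016321,
         female = 0.304063, stateSC = 0.311943)
se  <- c(education = 0.090210, income = 0.021860, age = 0.004331,
         female = 0.151263, stateSC = 0.154000)

# Odds ratios with approximate 95% Wald intervals
z <- qnorm(0.975)
odds_ratios <- data.frame(
  odds_ratio = exp(est),
  lower_95   = exp(est - z * se),
  upper_95   = exp(est + z * se)
)
print(round(odds_ratios, 3))

# The z values in the summary are the estimate divided by its standard error
z_values <- est / se
print(round(z_values, 3))
```

Under these assumptions, each one-step increase in education multiplies the odds of voting by about 1.25, holding the other covariates fixed.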
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.8337687 0.02288345 0.8351569 0.7831587 0.8717908
## pv
## 0 1
## [1,] 0.166 0.834
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.9080929 0.01371846 0.9091453 0.8799439 0.9323122
## pv
## 0 1
## [1,] 0.086 0.914
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 0.07432418 0.03013401 0.07438663 0.01825795 0.1373818
## V1
## Min. :-0.007198
## 1st Qu.: 0.053523
## Median : 0.074387
## Mean : 0.074324
## 3rd Qu.: 0.092579
## Max. : 0.172506
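The output does not say which covariate values `x` and `x1` hold. One reading, which is my assumption rather than something stated in the source, is that `x` sets education to 1 and `x1` sets it to 4, with the other covariates left at their defaults. The sketch below checks whether that reading is arithmetically consistent with the printed numbers by shifting the first expected value by three steps of the education coefficient on the log-odds scale.

```r
# Values copied from the output above
ev_x    <- 0.8337687   # simulated mean expected value for scenario x
ev_x1   <- 0.9080929   # simulated mean expected value for scenario x1
fd_mean <- 0.07432418  # simulated mean first difference
b_educ  <- 0.225355    # education coefficient from the model summary

# ASSUMPTION: x1 differs from x only by a 3-unit increase in education (1 -> 4)
educ_shift <- 3

# Move to the log-odds scale, add the shift, and map back to a probability
implied_ev_x1 <- plogis(qlogis(ev_x) + educ_shift * b_educ)
implied_fd    <- implied_ev_x1 - ev_x

print(round(c(implied_ev_x1 = implied_ev_x1, reported_ev_x1 = ev_x1), 4))
print(round(c(implied_fd = implied_fd, reported_fd = fd_mean), 4))
```

The implied and reported values agree to about three decimal places, which is consistent with the assumed education contrast but does not confirm it. Small gaps are expected because the reported figures are averages over simulation draws.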
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.9077762 0.0133055 0.9091724 0.8780127 0.9316126
## pv
## 0 1
## [1,] 0.076 0.924
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.8330955 0.02360585 0.8342377 0.7842494 0.8759815
## pv
## 0 1
## [1,] 0.16 0.84
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -0.07468075 0.03018175 -0.07397202 -0.1354132 -0.0175725
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.8332623 0.02310582 0.8352496 0.7846537 0.8735021
## pv
## 0 1
## [1,] 0.164 0.836
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.9073734 0.01353348 0.9085606 0.8778982 0.9323346
## pv
## 0 1
## [1,] 0.081 0.919
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 0.07411117 0.03017012 0.07212039 0.01528098 0.1372173
According to the data, most respondents voted (1283 of 1500). Education ranges from 1 to 4, with level 2 the most common (486 respondents), so the sample is not concentrated at either the lowest or the highest education level. The model estimates a positive, statistically significant relationship between education and voting (coefficient 0.225, p ≈ 0.012), and the simulated first differences in the probability of voting are about 0.07 in magnitude. These results, however, are not as strong as I expected. It is important to note that this dataset covers a single year (2000), only 2 states, and 1500 observations, so it cannot support detailed or general conclusions. We would probably want a sample spanning at least 5 years to gain more insight into the relationship between voter turnout and education.