library(Zelig)
library(dplyr)
library(radiant.data)
library(tidyr)
library(car)
library(magrittr)
data(Vocab)
vocabulary2 <- ifelse(Vocab$vocabulary >= 6, 1, 0)
head(vocabulary2)
[1] 0 1 1 1 0 1
This data set is for number of vocab words correct in a ten word test. The purpose of this analysis is to determine whether or n ot variables such as sex and education have a significant impact on the outcome. Voca is going to be my dependent binomial variable (0/1) and age and education will the by indepdent variables.
z.out1 <- zlogit$new()
z.out1$zelig(vocabulary2 ~ sex + education, data = Vocab)
summary(z.out1)
Model:
Call:
z.out1$zelig(formula = vocabulary2 ~ sex + education, data = Vocab)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5052 -1.1865 0.5718 1.0277 2.8260
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.733516 0.080908 -46.145 < 0.0000000000000002
sexMale -0.241018 0.030998 -7.775 0.00000000000000753
education 0.341363 0.006415 53.217 < 0.0000000000000002
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 28944 on 21637 degrees of freedom
Residual deviance: 25016 on 21635 degrees of freedom
AIC: 25022
Number of Fisher Scoring iterations: 4
Next step: Use 'setx' method
An estimate looks at a certain model and and generates a random draw.
z.outfare <- zlogit$new()
z.outfare$zelig(vocabulary2 ~ sex + education, data = Vocab)
z.outfare$setrange(education = min(Vocab$education):max(Vocab$education))
z.outfare$sim()
z.outfare$graph()
This plot shows that as education increases, number of vocab words correct also increases.
z.outsex <- zlogit$new()
z.outsex$zelig(vocabulary2 ~ sex + education, data = Vocab)
z.outsex$setx(sex = "Male")
z.outsex$setx1(sex = "Female")
z.outsex$sim()
summary(z.outsex)
sim x :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.5968797 0.005582665 0.5970494 0.5860084 0.6076786
pv
0 1
[1,] 0.411 0.589
sim x1 :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.6534015 0.004700812 0.6535669 0.6441086 0.6625176
pv
0 1
[1,] 0.344 0.656
fd
mean sd 50% 2.5% 97.5%
[1,] 0.05652174 0.007118653 0.05669325 0.0433241 0.07029301
fd <- z.outsex$get_qi(xvalue="x1", qi="fd")
summary(fd)
V1
Min. :0.02913
1st Qu.:0.05192
Median :0.05669
Mean :0.05652
3rd Qu.:0.06134
Max. :0.07948
plot(z.outsex)
z.outsc <- zlogit$new()
z.outsc$zelig(vocabulary2 ~ sex + education, data = Vocab)
z.outsc$setx(sex = "Male", vocabulary2 = "1")
z.outsc$setx1(sex = "Female", vocabulary2 ="1")
z.outsc$sim()
summary(z.outsc)
sim x :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.5968891 0.005440126 0.5970913 0.5859864 0.6076572
pv
0 1
[1,] 0.417 0.583
sim x1 :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.6534346 0.004705283 0.653322 0.6445684 0.662526
pv
0 1
[1,] 0.362 0.638
fd
mean sd 50% 2.5% 97.5%
[1,] 0.05654552 0.006882038 0.05652564 0.04327127 0.07050685
This model shows on average women on average pass to vocab test at a higher percentage then men.
z.outsc2 <- zlogit$new()
z.outsc2$zelig(vocabulary2 ~ sex + education, data = Vocab)
z.outsc2$setx(sex = "Male", vocabulary2 = "0")
z.outsc2$setx1(sex = "Female", vocabulary2 ="0")
z.outsc2$sim()
summary(z.outsc2)
sim x :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.5969255 0.00589936 0.597106 0.585213 0.6085026
pv
0 1
[1,] 0.406 0.594
sim x1 :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.6531433 0.004731518 0.6532818 0.6439204 0.6619257
pv
0 1
[1,] 0.375 0.625
fd
mean sd 50% 2.5% 97.5%
[1,] 0.05621782 0.0074061 0.05620223 0.0414632 0.07069902
plot(z.outsc)
plot(z.outsc2)
d1 <- z.outsc$get_qi(xvalue="x1", qi="fd")
d2 <- z.outsc2$get_qi(xvalue="x1", qi="fd")
dfd <- as.data.frame(cbind(d1, d2))
head(dfd)
library(dplyr)
tidd %>%
group_by(vocabulary2) %>%
summarise(mean = mean(simv), sd = sd(simv))
library(ggplot2)
ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~vocabulary2)
In concluion from this information, we see that both sex and education have an effect on number of vacbulary words correct in a ten word test. As education increases, number of words correct also increases. Women are also more likely to pass/get more number of words correct then their male counterparts.