Using the same dataset from last week, Affairs, I want to see which variables would influence the action of an affair happening. In this week’s homework I want to focus on the education level and the gender differences. Focusing on the effects and the certainty levels, it uses the variables in a clearer way in showing their relatinships with the dependent variable, affairs.
Importing the data in and using all the desired libraries. I had to change the variables to the proper format. I attempted to use yearsmarried by creating three levels for it but I ended up going with the education variable instead.
library(Zelig)
library(dplyr)
library(radiant.data)
library(tidyr)
library(ggplot2)
affairs<- read.csv("C:/Users/Jessica/Desktop/712/affairs.csv")
head(affairs)
## X affairs gender age yearsmarried children religiousness education
## 1 4 0 male 37 10.00 no 3 18
## 2 5 0 female 27 4.00 no 4 14
## 3 11 0 female 32 15.00 yes 1 12
## 4 16 0 male 57 15.00 yes 5 18
## 5 23 0 male 22 0.75 no 2 17
## 6 29 0 female 32 1.50 no 2 17
## occupation rating
## 1 7 4
## 2 6 4
## 3 1 4
## 4 6 5
## 5 6 3
## 6 5 5
summary(affairs)
## X affairs gender age
## Min. : 4 Min. : 0.000 female:315 Min. :17.50
## 1st Qu.: 528 1st Qu.: 0.000 male :286 1st Qu.:27.00
## Median :1009 Median : 0.000 Median :32.00
## Mean :1060 Mean : 1.456 Mean :32.49
## 3rd Qu.:1453 3rd Qu.: 0.000 3rd Qu.:37.00
## Max. :9029 Max. :12.000 Max. :57.00
## yearsmarried children religiousness education
## Min. : 0.125 no :171 Min. :1.000 Min. : 9.00
## 1st Qu.: 4.000 yes:430 1st Qu.:2.000 1st Qu.:14.00
## Median : 7.000 Median :3.000 Median :16.00
## Mean : 8.178 Mean :3.116 Mean :16.17
## 3rd Qu.:15.000 3rd Qu.:4.000 3rd Qu.:18.00
## Max. :15.000 Max. :5.000 Max. :20.00
## occupation rating
## Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000
## Median :5.000 Median :4.000
## Mean :4.195 Mean :3.932
## 3rd Qu.:6.000 3rd Qu.:5.000
## Max. :7.000 Max. :5.000
affairs<-affairs%>%
mutate(female = ifelse(gender == "female", 1, 0),
children = ifelse(children == "yes", 1, 0),
affairs = ifelse(affairs >= 1, 1, 0),
marstat = ifelse(yearsmarried <= 2, "Newlywed",
ifelse(yearsmarried > 2 & yearsmarried <= 10, "Decade",
ifelse(yearsmarried > 10, "Long Term", NA))))
z.aff <- zelig(affairs ~ gender + children*yearsmarried + education + age, model = "logit", data = affairs, cite = F)
summary(z.aff)
## Model:
##
## Call:
## z5$zelig(formula = affairs ~ gender + children * yearsmarried +
## education + age, data = affairs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3263 -0.8200 -0.6969 -0.3393 2.1679
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.591461 0.796518 -1.998 0.04571
## gendermale 0.273001 0.218400 1.250 0.21130
## children 1.084991 0.401171 2.705 0.00684
## yearsmarried 0.199251 0.055364 3.599 0.00032
## education 0.004572 0.042928 0.106 0.91519
## age -0.038329 0.017294 -2.216 0.02667
## children:yearsmarried -0.129732 0.055319 -2.345 0.01902
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 675.38 on 600 degrees of freedom
## Residual deviance: 648.48 on 594 degrees of freedom
## AIC: 662.48
##
## Number of Fisher Scoring iterations: 4
##
## Next step: Use 'setx' method
Here I looked into the effects of age. Using the full age range as part of the counterfactual situation.
Based on the plot, there is a negative correlation between age and affairs. There is a higher certainty of committing an affair when you are in the late teens to twentys vs lower certainty to have an affair when you are between the ages of 30 and 40. It is also interesting becasue the certaity levels increase again after 45 years and above.
a.range = min(affairs$age):max(affairs$age)
x <- setx(z.aff, age = a.range)
s <- sim(z.aff, x = x)
ci.plot(s)
Here I looked into the effects of number of years married. I want to create another set of conterfactual situation to evaluate the affect of number of years married on the chance of an affair. There is a positive correlation between number of years married and affairs. The certainty level is actually quite even throughout the plot except for 0 years married which would make sense since one is not married then. The lowest level of certainty to have an affair is between 8- 10 years of marriage.
y.range = min(affairs$yearsmarried):max(affairs$yearsmarried)
x <- setx(z.aff, yearsmarried = y.range)
s <- sim(z.aff, x = x)
ci.plot(s)
Then the gender differences. Here we see that the mean is -0.054 which means on average at the 95.5% confidence interval, women have a lower chance than males to have an affair, but it can go as low as -0.14 or as high as 0.025.
x <- setx(z.aff, gender = "male")
x1 <- setx(z.aff, gender = "female")
s <- sim(z.aff, x = x, x1 = x1)
summary(s)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.3043913 0.03314172 0.3031681 0.2410212 0.368317
## pv
## 0 1
## [1,] 0.691 0.309
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.2499708 0.03091525 0.2501207 0.1921855 0.3116079
## pv
## 0 1
## [1,] 0.742 0.258
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -0.05442048 0.04229373 -0.05561783 -0.1360261 0.02760703
fd <- s$get_qi(xvalue="x1", qi="fd")
summary(fd)
## V1
## Min. :-0.18403
## 1st Qu.:-0.08303
## Median :-0.05562
## Mean :-0.05442
## 3rd Qu.:-0.02611
## Max. : 0.07566
plot(s)
This specific dataset uses the education variable as each year of education. So here I calculated 12 years as high school diploma, 16 years for a bacherlors degree, and 18 years for a masters degree. Here I made sure that each gender in the simulation refers to the same level of education.
c1x <- setx(z.aff, gender = "male", education == "12")
c1x1 <- setx(z.aff, gender = "female", education == "12")
c1s <- sim(z.aff, x = c1x, x1 = c1x1)
c2x <- setx(z.aff, gender = "male", education == "16")
c2x1 <- setx(z.aff, gender = "female",education == "16")
c2s <- sim(z.aff, x = c2x, x1 = c2x1)
c3x <- setx(z.aff, gender = "male", education == "18")
c3x1 <- setx(z.aff, gender = "female", education == "18")
c3s <- sim(z.aff, x = c3x, x1 = c3x1)
plot(c1s)
plot(c2s)
plot(c3s)
Here we can see that the is the highest gender difference in the third level which was the masters level of education at -0.0525.
d1 <- c1s$get_qi(xvalue="x1", qi="fd")
d2 <- c2s$get_qi(xvalue="x1", qi="fd")
d3 <- c3s$get_qi(xvalue="x1", qi="fd")
dfd <- as.data.frame(cbind(d1, d2, d3))
head(dfd)
## V1 V2 V3
## 1 -0.08384855 -0.086284415 -0.02486833
## 2 -0.04538848 -0.006212256 -0.04275374
## 3 -0.05076381 -0.040372970 -0.09629340
## 4 -0.03713391 0.018430428 -0.10187886
## 5 -0.06304403 -0.111951129 -0.01939424
## 6 -0.05868805 -0.069173127 -0.02542595
tidd <- dfd %>%
gather(class, simv, 1:3)
head(tidd)
## class simv
## 1 V1 -0.08384855
## 2 V1 -0.04538848
## 3 V1 -0.05076381
## 4 V1 -0.03713391
## 5 V1 -0.06304403
## 6 V1 -0.05868805
tidd %>%
group_by(class) %>%
summarise(mean = mean(simv), sd = sd(simv))
## # A tibble: 3 x 3
## class mean sd
## <chr> <dbl> <dbl>
## 1 V1 -0.0537 0.0423
## 2 V2 -0.0545 0.0440
## 3 V3 -0.0545 0.0424
ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~class)
Using the age and yearsmarried variables to see the effects on affairs. It was very interesting to see that at the 8 years to 10 years married level, the certainty of affairs are the lowest compared to the other years. Another interesting finding was that the 30 years old to the 40 years old had a low certainty of having an affair. For the last part, the 18 years of education had the highest gender difference in committing an affair compared to the other two levels.