Week Seven Part II

Using the same dataset from last week, Affairs, I want to see which variables would influence the action of an affair happening. In this week’s homework I want to focus on the education level and the gender differences. Focusing on the effects and the certainty levels, it uses the variables in a clearer way in showing their relatinships with the dependent variable, affairs.

Interaction Model

Importing the data in and using all the desired libraries. I had to change the variables to the proper format. I attempted to use yearsmarried by creating three levels for it but I ended up going with the education variable instead.

library(Zelig)
library(dplyr)
library(radiant.data)
library(tidyr)
library(ggplot2)


affairs<- read.csv("C:/Users/Jessica/Desktop/712/affairs.csv")
head(affairs)

##    X affairs gender age yearsmarried children religiousness education
## 1  4       0   male  37        10.00       no             3        18
## 2  5       0 female  27         4.00       no             4        14
## 3 11       0 female  32        15.00      yes             1        12
## 4 16       0   male  57        15.00      yes             5        18
## 5 23       0   male  22         0.75       no             2        17
## 6 29       0 female  32         1.50       no             2        17
##   occupation rating
## 1          7      4
## 2          6      4
## 3          1      4
## 4          6      5
## 5          6      3
## 6          5      5

summary(affairs)

##        X           affairs          gender         age       
##  Min.   :   4   Min.   : 0.000   female:315   Min.   :17.50  
##  1st Qu.: 528   1st Qu.: 0.000   male  :286   1st Qu.:27.00  
##  Median :1009   Median : 0.000                Median :32.00  
##  Mean   :1060   Mean   : 1.456                Mean   :32.49  
##  3rd Qu.:1453   3rd Qu.: 0.000                3rd Qu.:37.00  
##  Max.   :9029   Max.   :12.000                Max.   :57.00  
##   yearsmarried    children  religiousness     education    
##  Min.   : 0.125   no :171   Min.   :1.000   Min.   : 9.00  
##  1st Qu.: 4.000   yes:430   1st Qu.:2.000   1st Qu.:14.00  
##  Median : 7.000             Median :3.000   Median :16.00  
##  Mean   : 8.178             Mean   :3.116   Mean   :16.17  
##  3rd Qu.:15.000             3rd Qu.:4.000   3rd Qu.:18.00  
##  Max.   :15.000             Max.   :5.000   Max.   :20.00  
##    occupation        rating     
##  Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:3.000  
##  Median :5.000   Median :4.000  
##  Mean   :4.195   Mean   :3.932  
##  3rd Qu.:6.000   3rd Qu.:5.000  
##  Max.   :7.000   Max.   :5.000

affairs<-affairs%>%
  mutate(female = ifelse(gender == "female", 1, 0),
         children = ifelse(children == "yes", 1, 0),
         affairs = ifelse(affairs >= 1, 1, 0),
         marstat = ifelse(yearsmarried <= 2, "Newlywed",
                               ifelse(yearsmarried > 2 & yearsmarried <= 10, "Decade",
                                      ifelse(yearsmarried > 10, "Long Term", NA))))

z.aff <- zelig(affairs ~  gender + children*yearsmarried + education + age, model = "logit", data = affairs, cite = F)
summary(z.aff)

## Model: 
## 
## Call:
## z5$zelig(formula = affairs ~ gender + children * yearsmarried + 
##     education + age, data = affairs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3263  -0.8200  -0.6969  -0.3393   2.1679  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)
## (Intercept)           -1.591461   0.796518  -1.998  0.04571
## gendermale             0.273001   0.218400   1.250  0.21130
## children               1.084991   0.401171   2.705  0.00684
## yearsmarried           0.199251   0.055364   3.599  0.00032
## education              0.004572   0.042928   0.106  0.91519
## age                   -0.038329   0.017294  -2.216  0.02667
## children:yearsmarried -0.129732   0.055319  -2.345  0.01902
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 675.38  on 600  degrees of freedom
## Residual deviance: 648.48  on 594  degrees of freedom
## AIC: 662.48
## 
## Number of Fisher Scoring iterations: 4
## 
## Next step: Use 'setx' method

Age Effect

Here I looked into the effects of age. Using the full age range as part of the counterfactual situation.

Based on the plot, there is a negative correlation between age and affairs. There is a higher certainty of committing an affair when you are in the late teens to twentys vs lower certainty to have an affair when you are between the ages of 30 and 40. It is also interesting becasue the certaity levels increase again after 45 years and above.

a.range = min(affairs$age):max(affairs$age)
x <- setx(z.aff, age = a.range)
s <- sim(z.aff, x = x)
ci.plot(s)

Years Married Effect

Here I looked into the effects of number of years married. I want to create another set of conterfactual situation to evaluate the affect of number of years married on the chance of an affair. There is a positive correlation between number of years married and affairs. The certainty level is actually quite even throughout the plot except for 0 years married which would make sense since one is not married then. The lowest level of certainty to have an affair is between 8- 10 years of marriage.

y.range = min(affairs$yearsmarried):max(affairs$yearsmarried)
x <- setx(z.aff, yearsmarried = y.range)
s <- sim(z.aff, x = x)
ci.plot(s)

Gender Difference

Then the gender differences. Here we see that the mean is -0.054 which means on average at the 95.5% confidence interval, women have a lower chance than males to have an affair, but it can go as low as -0.14 or as high as 0.025.

x <- setx(z.aff, gender = "male")
x1 <- setx(z.aff, gender = "female")
s <- sim(z.aff, x = x, x1 = x1)
summary(s)

## 
##  sim x :
##  -----
## ev
##           mean         sd       50%      2.5%    97.5%
## [1,] 0.3043913 0.03314172 0.3031681 0.2410212 0.368317
## pv
##          0     1
## [1,] 0.691 0.309
## 
##  sim x1 :
##  -----
## ev
##           mean         sd       50%      2.5%     97.5%
## [1,] 0.2499708 0.03091525 0.2501207 0.1921855 0.3116079
## pv
##          0     1
## [1,] 0.742 0.258
## fd
##             mean         sd         50%       2.5%      97.5%
## [1,] -0.05442048 0.04229373 -0.05561783 -0.1360261 0.02760703

First Difference

fd <- s$get_qi(xvalue="x1", qi="fd")
summary(fd)

##        V1          
##  Min.   :-0.18403  
##  1st Qu.:-0.08303  
##  Median :-0.05562  
##  Mean   :-0.05442  
##  3rd Qu.:-0.02611  
##  Max.   : 0.07566

plot(s)

Test The Variation in Education

This specific dataset uses the education variable as each year of education. So here I calculated 12 years as high school diploma, 16 years for a bacherlors degree, and 18 years for a masters degree. Here I made sure that each gender in the simulation refers to the same level of education.

c1x <- setx(z.aff, gender = "male",  education == "12")
c1x1 <- setx(z.aff, gender = "female", education == "12")
c1s <- sim(z.aff, x = c1x, x1 = c1x1)

c2x <- setx(z.aff, gender = "male", education == "16")
c2x1 <- setx(z.aff, gender = "female",education == "16")
c2s <- sim(z.aff, x = c2x, x1 = c2x1)

c3x <- setx(z.aff, gender = "male", education == "18")
c3x1 <- setx(z.aff, gender = "female", education == "18")
c3s <- sim(z.aff, x = c3x, x1 = c3x1)

plot(c1s)

plot(c2s)

plot(c3s)

Putting Them In One Place

Here we can see that the is the highest gender difference in the third level which was the masters level of education at -0.0525.

d1 <- c1s$get_qi(xvalue="x1", qi="fd")
d2 <- c2s$get_qi(xvalue="x1", qi="fd")
d3 <- c3s$get_qi(xvalue="x1", qi="fd")

dfd <- as.data.frame(cbind(d1, d2, d3))
head(dfd)

##            V1           V2          V3
## 1 -0.08384855 -0.086284415 -0.02486833
## 2 -0.04538848 -0.006212256 -0.04275374
## 3 -0.05076381 -0.040372970 -0.09629340
## 4 -0.03713391  0.018430428 -0.10187886
## 5 -0.06304403 -0.111951129 -0.01939424
## 6 -0.05868805 -0.069173127 -0.02542595

tidd <- dfd %>% 
  gather(class, simv, 1:3)
head(tidd)

##   class        simv
## 1    V1 -0.08384855
## 2    V1 -0.04538848
## 3    V1 -0.05076381
## 4    V1 -0.03713391
## 5    V1 -0.06304403
## 6    V1 -0.05868805

tidd %>% 
  group_by(class) %>% 
  summarise(mean = mean(simv), sd = sd(simv))

## # A tibble: 3 x 3
##   class    mean     sd
##   <chr>   <dbl>  <dbl>
## 1 V1    -0.0537 0.0423
## 2 V2    -0.0545 0.0440
## 3 V3    -0.0545 0.0424

ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~class)

Using the age and yearsmarried variables to see the effects on affairs. It was very interesting to see that at the 8 years to 10 years married level, the certainty of affairs are the lowest compared to the other years. Another interesting finding was that the 30 years old to the 40 years old had a low certainty of having an affair. For the last part, the 18 years of education had the highest gender difference in committing an affair compared to the other two levels.

Homework 6 Part II

Jessica Yin

March 24, 2019