I will be redoing the classwork from slides 5.2 to 5.18 as Zelig 5 below:
library(radiant.data)
library(Zelig)
data(titanic)
titanic <- titanic %>%
mutate(surv = as.integer(survived)) %>%
mutate(survival = sjmisc::rec(surv, rec = "2=0; 1=1")) %>%
select(pclass, survived, survival, everything())
head(titanic)
z5 <- zlogit$new()
z5$zelig(survival ~ age + sex*pclass+fare, data = titanic)
summary(z5)
Model:
Call:
z5$zelig(formula = survival ~ age + sex * pclass + fare, data = titanic)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.0634 -0.6641 -0.4943 0.4336 2.4941
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.8978215 0.6131092 7.988 1.37e-15
age -0.0387245 0.0068044 -5.691 1.26e-08
sexmale -3.8996177 0.5015659 -7.775 7.55e-15
pclass2nd -1.5923247 0.6024844 -2.643 0.00822
pclass3rd -4.1382715 0.5601819 -7.387 1.50e-13
fare -0.0009058 0.0020509 -0.442 0.65874
sexmale:pclass2nd -0.0600076 0.6372949 -0.094 0.92498
sexmale:pclass3rd 2.5019110 0.5479901 4.566 4.98e-06
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1409.99 on 1042 degrees of freedom
Residual deviance: 931.45 on 1035 degrees of freedom
AIC: 947.45
Number of Fisher Scoring iterations: 5
Next step: Use 'setx' method
As age goes up (from infant to elder), the chances of surviving the Titanic goes down, where the median survival rate is around .35 for infants. There is a negative correlation. However, the uncertainty to survive for infants/young people is much higher than certainty of survival when you’re 80 (a higher gap is more uncertainty).
z5 <- zlogit$new()
z5$zelig(survival ~ age + sex*pclass+fare, data = titanic)
a.range = min(titanic$age):max(titanic$age)
z5$setrange(age=a.range)
z5$sim()
z5$graph()
As fare goes up, chances of surviving the Titanic pretty much stays the same. The expected value of survival remains pretty constant with respect to fare. Fare price did not have much too much influence on one’s chances of surviving the Titanic.
z5 <- zlogit$new()
z5$zelig(survival ~ age + sex*pclass+fare, data = titanic)
f.range = min(titanic$fare):max(titanic$fare)
z5$setrange(fare=f.range)
z5$sim()
z5$graph()
The first difference shows that females’ expected value is about .26 higher than males, on average. Females can have an expected value of about .39 and males can have an expected value of about .13. This shows that females have a better chance of surviving the Titanic than males.
z5 <- zlogit$new()
z5$zelig(survival ~ age + sex*pclass+fare, data = titanic)
z5$setx(sex = "male")
z5$setx1(sex = "female")
z5$sim()
z5$graph()
fd <- z5$get_qi(xvalue="x1", qi="fd")
summary(fd)
V1
Min. :0.09055
1st Qu.:0.22784
Median :0.25590
Mean :0.25601
3rd Qu.:0.28558
Max. :0.42242
1st class: The difference in expected value of survival is about .52 higher for females than males. 2nd class: The difference in expected value of survival is about .77 higher for females than males. The male survival rate decreased by a lot (from around .45 in 1st class to around .13 in 2nd class). 3rd class: 3rd class had the lowest survival rate out of all the classes. The difference in expected value of survival is about .25 higher for females than males.
#First class
z5fc <- zlogit$new()
z5fc$zelig(survival ~ age +sex*pclass + fare, data=titanic)
z5fc$setx(sex="male", pclass="1st")
z5fc$setx1(sex="female", pclass="1st")
z5fc$sim()
z5fc$graph()
#Second class
z5sc <- zlogit$new()
z5sc$zelig(survival ~ age +sex*pclass + fare, data=titanic)
z5sc$setx(sex="male", pclass="2nd")
z5sc$setx1(sex="female", pclass="2nd")
z5sc$sim()
z5sc$graph()
#Third class
z5tc <- zlogit$new()
z5tc$zelig(survival ~ age +sex*pclass + fare, data=titanic)
z5tc$setx(sex="male", pclass="3rd")
z5tc$setx1(sex="female", pclass="3rd")
z5tc$sim()
z5tc$graph()
V1=Gender difference survival rate in 1st class V2=Gender difference survival rate in 2nd class V3=Gender difference survival rate in 3rd class
d1 <- z5fc$get_qi(xvalue="x1", qi="fd")
d2 <- z5sc$get_qi(xvalue="x1", qi="fd")
d3 <- z5tc$get_qi(xvalue="x1", qi="fd")
dfd <- as.data.frame(cbind(d1, d2, d3))
head(dfd)
library(tidyr)
tidd <- dfd %>%
gather(class, simv, 1:3)
head(tidd)
Among 1st class, women have .52 chance higher than men to survive. 3rd class (V3) is .25(lowest), 2nd class (V2) is largest difference (about .75).
library(dplyr)
tidd %>%
group_by(class) %>%
summarise(mean = mean(simv), sd = sd(simv))
Visualization of the gender first differences by class. As expected, the smallest first difference expected value between females and males is in 3rd class and the largest is in 2nd class.
library(ggplot2)
ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~class)
I found the dataset I will be using from kaggle about mental health in the workplace https://www.kaggle.com/osmi/mental-health-in-tech-survey. Taking care of one’s mental health is becoming socially normalized. I want to look at variables such as knowledge of family history of mental illness(family_history), if the mental condition would interfere with work(work_interfere), gender, age, working remotely (remote_work) and healthcare benefits from a job, to see if they are factors for those who sought out mental health treatments(treatment - binary dependent vairable).
library(texreg)
library(tidyr)
library(list)
library(corpcor)
library(tidyverse)
library(knitr)
library (Zelig)
library(readr)
library(readr)
library(dplyr)
library(sjmisc)
survey <- read_csv("C:/Users/abbys/Downloads/survey.csv")
Parsed with column specification:
cols(
.default = col_character(),
Timestamp = [34mcol_datetime(format = "")[39m,
Age = [32mcol_double()[39m
)
See spec(...) for full column specifications.
head(survey)
new_survey<-survey %>%
filter(!is.na(benefits),
!is.na(treatment),
!is.na(work_interfere),
!is.na(family_history),
!is.na(Age),
!is.na(remote_work)
)%>%
mutate(treatment=rec(treatment,rec="No=0;Yes=1"),work_interfere=rec(work_interfere,rec="Never,Rarely=No;Sometimes,Often=Yes"),Gender=rec(Gender,rec="M,Male,male,m,Male-ish,Cis Male,Mal,Make,Guy (-ish) ^_^,male leaning androgynous,msle,Mail,Malr,Cis Man,ostensibly male, unsure what that really means=0;Female,female,Trans-female,Cis Female,F,f,queer/she/they,woman,cis-female/femme,Trans woman,Female (trans),Female (cis),Woman,femail=1"))%>%
filter(!is.na(Gender),
Age<100, Age>0)%>%
mutate(work_interfere=factor(work_interfere),family_history = factor(family_history), benefits = factor(benefits), Gender=factor(Gender))
head(new_survey)
dim(new_survey)
[1] 969 27
The best model is model 2, since model 2 has the lowest aic and bic.
glm1=glm(treatment~benefits+Gender,family="binomial",data=new_survey)
glm2=glm(treatment~family_history+Gender+work_interfere+benefits+Age,family="binomial",data=new_survey)
glm3=glm(treatment~family_history*Gender+work_interfere+benefits+Age+remote_work,family="binomial",data=new_survey)
texreg::screenreg(list(glm1,glm2,glm3))
================================================================
Model 1 Model 2 Model 3
----------------------------------------------------------------
(Intercept) -0.15 -2.26 *** -2.28 ***
(0.12) (0.39) (0.39)
benefitsNo 0.43 * 0.42 * 0.43 *
(0.17) (0.20) (0.20)
benefitsYes 0.99 *** 1.04 *** 1.05 ***
(0.17) (0.19) (0.20)
Gender1 0.95 *** 0.81 *** 1.01 ***
(0.19) (0.21) (0.28)
family_historyYes 1.19 *** 1.27 ***
(0.16) (0.18)
work_interfereYes 1.66 *** 1.65 ***
(0.16) (0.16)
Age 0.02 * 0.02 *
(0.01) (0.01)
remote_workYes 0.05
(0.17)
family_historyYes:Gender1 -0.47
(0.42)
----------------------------------------------------------------
AIC 1212.30 1005.30 1007.99
BIC 1231.81 1039.43 1051.88
Log Likelihood -602.15 -495.65 -495.00
Deviance 1204.30 991.30 989.99
Num. obs. 969 969 969
================================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
To clarify, here are some brief explanations of the variables I chose: family_history-Do you have a family history of mental illness? (yes/no) treatment-Have you sought treatment for a mental health condition? (yes=1/no=0) work_interfere-If you have a mental health condition, do you feel that it interferes with your work? (yes/no) benefits-Does your employer provide mental health benefits?(yes/no/don’t know)
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
summary(z.mental_health)
Model:
Call:
z5$zelig(formula = treatment ~ family_history + Gender + work_interfere +
benefits + Age, data = new_survey)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4653 -0.7623 0.4436 0.7586 1.9641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.26273 0.38933 -5.812 6.18e-09
family_historyYes 1.18541 0.16474 7.196 6.21e-13
Gender1 0.80972 0.21248 3.811 0.000138
work_interfereYes 1.66258 0.16045 10.362 < 2e-16
benefitsNo 0.41689 0.19546 2.133 0.032932
benefitsYes 1.03698 0.19455 5.330 9.82e-08
Age 0.02231 0.01069 2.088 0.036795
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1276.5 on 968 degrees of freedom
Residual deviance: 991.3 on 962 degrees of freedom
AIC: 1005.3
Number of Fisher Scoring iterations: 4
Next step: Use 'setx' method
Those with a family history of mental illness have about a .15 higher chance of having sought out mental health treatment than those who do not have a family history of mental illness, on average.
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
z.mental_health$setx(family_history="No")
z.mental_health$setx1(family_history="Yes")
z.mental_health$sim()
fd1 <- z.mental_health$get_qi(xvalue="x1", qi="fd")
summary(fd1)
V1
Min. :0.07629
1st Qu.:0.13266
Median :0.15052
Mean :0.15168
3rd Qu.:0.17077
Max. :0.24441
z.mental_health$sim()
z.mental_health$graph()
Those who believe that their mental illness interferes with their work have about a .38 higher chance of having sought out mental health treatment than those who do not believe their mental illness interferes with their work, on average.
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
z.mental_health$setx(work_interfere="No")
z.mental_health$setx1(work_interfere="Yes")
z.mental_health$sim()
fd2 <- z.mental_health$get_qi(xvalue="x1", qi="fd")
summary(fd2)
V1
Min. :0.2894
1st Qu.:0.3609
Median :0.3846
Mean :0.3831
3rd Qu.:0.4055
Max. :0.5044
z.mental_health$sim()
z.mental_health$graph()
Females(1) have about a .11 higher chance of having sought out mental health treatment than males(0), on average.
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
z.mental_health$setx(Gender=0)
z.mental_health$setx1(Gender=1)
z.mental_health$sim()
fd3 <- z.mental_health$get_qi(xvalue="x1", qi="fd")
summary(fd3)
V1
Min. :-0.008041
1st Qu.: 0.095241
Median : 0.115085
Mean : 0.114757
3rd Qu.: 0.134881
Max. : 0.213362
z.mental_health$sim()
z.mental_health$graph()
Those who have mental healthcare coverage through their employer have about a .13 higher chance than those with no healthcare coverage through their employer for having sought out mental health treatment.
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
z.mental_health$setx(benefits="No")
z.mental_health$setx1(benefits="Yes")
z.mental_health$sim()
fd4 <- z.mental_health$get_qi(xvalue="x1", qi="fd")
summary(fd4)
V1
Min. :-0.002983
1st Qu.: 0.100418
Median : 0.128722
Mean : 0.128130
3rd Qu.: 0.154615
Max. : 0.278853
z.mental_health$sim()
z.mental_health$graph()
Those who do not have healthcare coverage have about a .10 higher chance than those who don’t know if they have mental healthcare coverage through their employer for having sought out mental health treatment.
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
z.mental_health$setx(benefits="Don't know")
z.mental_health$setx1(benefits="No")
z.mental_health$sim()
fd5 <- z.mental_health$get_qi(xvalue="x1", qi="fd")
summary(fd5)
V1
Min. :-0.04287
1st Qu.: 0.06867
Median : 0.09935
Mean : 0.10032
3rd Qu.: 0.12936
Max. : 0.26553
z.mental_health$sim()
z.mental_health$graph()
The expected value of having sought out mental health treatment increases as age increases.
z.mental_health <- zelig(treatment ~ family_history+Gender+work_interfere+benefits+Age, model = "logit", data = new_survey, cite = FALSE)
z.age_range <- min(new_survey$Age):max(new_survey$Age)
z.mental_health$setrange(Age=z.age_range)
z.mental_health$sim()
z.mental_health$graph()
To conclude, I found the following from my analysis: Family history - People are more likely to seek out mental health treatment if there’s a family history of mental illness than if there’s no family history of mental illness. Work Interference - People are more likely to seek out mental health treatment if the mental health condition interferes with their work than if it doesn’t interfere with their work. Gender - Females are more likely to seek out mental health treatment than males. Benefits: Yes Vs No - People are more likely to seek out mental health treatment if they have mental health coverage through their employer than if they don’t have coverage. Benefits: No Vs Don’t know - People are more likely to seek out mental health treatment if they do not have mental health coverage through their employer than if they do not know if they have mental health coverage through their employer. This susprised me the most but it makes sense, since if someone knows they are not covered for mental health they are probably looking for treatment. If someone does not know if they have mental health coverage, they are probably not going to seek treatment. Age - Older people are more likely to seek out mental health treatment than younger people.