The dataset used for this analysis is the National Health Interview Survey (NHIS) from 1997-2016. The analysis examines the relationship between serious mental illness, the dependent variable, and gender, the independent variable. Mental illness is measured with the Kessler 6 scale (K6), developed by Ronald C. Kessler. My hypothesis is that women are more likely to have serious mental illness, and that other indicators, such as a high BMI, are also associated with it. I believe there may be a relationship with age as well; since I did not include age in my last analysis, I will test its effect on serious mental illness here. I will use a logit model and then use simulation to better estimate how the probability of serious mental illness varies across sex, BMI, and other variables.
library(readr)
library(dplyr)
library(texreg)
library(ggplot2)
library(Zelig)
load("/Users/Deepakie/Documents/Queens College/SOC712/Data/NHIS_v3.rdata")
head(NHIS_v3)
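# Keep the variables of interest, recode them, and drop incomplete cases.
NHIS <- NHIS_v3 %>%
  select(sex, racenew, bmi_7, health, asad, aeffort, ahopeless, aworthless, anervous, arestless, age) %>%
  mutate(
    # 1 = male, 0 = female
    sex = ifelse(sex == 1, "1", "0"),
    # BMI categories 1-6 are kept; any other code becomes NA
    BMI = ifelse(bmi_7 == 1, "1",
          ifelse(bmi_7 == 2, "2",
          ifelse(bmi_7 == 3, "3",
          ifelse(bmi_7 == 4, "4",
          ifelse(bmi_7 == 5, "5",
          ifelse(bmi_7 == 6, "6", NA)))))),
    # 1 = white, 2 = black, 3 = all other races.
    # Note: racenew >= 61 is caught by the >= 30 test first, so the final
    # branch is never reached and those codes are also coded "3".
    race = ifelse(racenew == 10, "1",
           ifelse(racenew == 20, "2",
           ifelse(racenew >= 30, "3",
           ifelse(racenew >= 61, NA, NA)))),
    # K6 items: codes above 4 are treated as missing
    ahopeless = ifelse(ahopeless > 4, NA, ahopeless),
    asad = ifelse(asad > 4, NA, asad),
    aworthless = ifelse(aworthless > 4, NA, aworthless),
    aeffort = ifelse(aeffort > 4, NA, aeffort),
    arestless = ifelse(arestless > 4, NA, arestless),
    anervous = ifelse(anervous > 4, NA, anervous),
    # K6 total of 13 or more is coded 1 (serious mental illness), otherwise 0
    Seriousmentalillness = ifelse(ahopeless + asad + aworthless + aeffort + arestless + anervous >= 13, 1, 0)
  ) %>%
  select(-asad, -aeffort, -ahopeless, -aworthless, -anervous, -arestless, -bmi_7, -racenew) %>%
  filter(!is.na(Seriousmentalillness), !is.na(sex), !is.na(BMI), !is.na(race), !is.na(age))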
Because we want to investigate the effect of sex and other variables on serious mental distress, we use the following variables: sex, racenew, bmi_7, asad, aeffort, ahopeless, aworthless, anervous, arestless and age. I recoded the variables and cleaned them of missing data. BMI has six categories, numbered 1-6: underweight (1), normal (2), overweight (3), Obese30s (4), Obese40s (5) and Obese50s (6). A BMI of 30 or above is considered obese, and the three obese categories split that range: Obese30s is a BMI of 30-39, Obese40s is a BMI of 40-49 and Obese50s is a BMI of 50 and above. Race is recoded into three categories: white (1), African American/black (2) and others (3), which includes all other races.
The dependent variable, Seriousmentalillness, is built from the six questions that constitute the K6 scale of psychological distress. The K6 score is the sum of the six items, each conventionally scored from 0 to 4; codes above 4 are treated as missing. A total score of 13 or above is treated as serious mental distress or illness for the sake of this analysis, and a total below 13 as low or no mental illness. Those who have serious mental illness are coded 1 and those who do not are coded 0. Age is a metric variable, so we only remove its missing values. Sex is recoded so that 1 = male and 0 = female.
Serious Mental Illness will be referred to as SMI throughout this assignment.
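The K6 scoring rule described above can be illustrated on a few made-up respondents. This is only a sketch: the toy values and the use of 9 as an "unknown" code are assumptions for illustration, not NHIS records.

```r
# Hypothetical toy data: six K6 items for four respondents.
# Values 0-4 are valid answers; 9 stands in for an "unknown"-style code.
k6_items <- data.frame(
  asad       = c(4, 1, 0, 3),
  aeffort    = c(3, 0, 0, 2),
  ahopeless  = c(2, 1, 0, 9),
  aworthless = c(2, 0, 0, 2),
  anervous   = c(1, 2, 0, 2),
  arestless  = c(1, 1, 0, 2)
)

# Treat any code above 4 as missing, as in the recoding step.
k6_items[k6_items > 4] <- NA

# Sum the six items; one missing item makes the whole total missing.
k6_total <- rowSums(k6_items)

# Flag serious mental illness at the cutoff of 13 or more.
smi_flag <- ifelse(k6_total >= 13, 1, 0)

print(data.frame(k6_total, smi_flag))
```

Because a missing item makes the total missing, respondents with any unknown answer drop out when rows with a missing Seriousmentalillness value are filtered.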
head(NHIS)
NHIS$sex <- as.factor(NHIS$sex)
NHIS$BMI <- as.factor(NHIS$BMI)
NHIS$race <- as.factor(NHIS$race)
Here I convert the variables sex, BMI and race into factors so that I can run the models and continue with the simulation.
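Converting to factors also fixes the reference category that each coefficient is compared against. A small sketch with a made-up vector (the values are hypothetical) shows that the first level becomes the baseline and how `relevel()` could change it:

```r
# Hypothetical toy vector of BMI category codes stored as character.
bmi_codes <- c("2", "3", "1", "2", "4", "6", "5", "2")

# as.factor() sorts the levels, so "1" (underweight) comes first
# and becomes the reference category in a regression.
bmi_factor <- as.factor(bmi_codes)
print(levels(bmi_factor))

# relevel() makes a different category the baseline, here "2" (normal weight).
bmi_normal_ref <- relevel(bmi_factor, ref = "2")
print(levels(bmi_normal_ref))
```

The models in this analysis keep the default, so BMI coefficients are comparisons with the underweight group.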
SMI <- zelig(Seriousmentalillness ~ sex, model = "logit", data = NHIS, cite = F)
summary(SMI)
Model:
Call:
z5$zelig(formula = Seriousmentalillness ~ sex, data = NHIS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.2922 -0.2922 -0.2922 -0.2363 2.6804
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.132645 0.009351 -334.99 <2e-16
sex1 -0.431699 0.015742 -27.42 <2e-16
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158642 on 518494 degrees of freedom
Residual deviance: 157864 on 518493 degrees of freedom
AIC: 157868
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
This is a simple logit model with one binary dependent variable and one binary independent variable. The results above show that men are less likely than women to have serious mental distress: the log odds of SMI are .43 lower for males than for females. In other words, females have a higher chance of having SMI. The next model is a logit model of SMI on sex and the other independent variables.
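Log odds are hard to read directly, so it can help to convert the two coefficients printed above into an odds ratio and into predicted probabilities. The sketch below only re-uses the printed estimates; the variable names are made up for illustration.

```r
# Estimates copied from the summary(SMI) output above.
intercept <- -3.132645   # log odds of SMI for females (sex = 0)
b_sex1    <- -0.431699   # change in log odds for males (sex = 1)

# Odds ratio for males relative to females.
odds_ratio_male <- exp(b_sex1)

# Predicted probabilities via the inverse logit.
p_female <- plogis(intercept)
p_male   <- plogis(intercept + b_sex1)

print(round(c(odds_ratio_male = odds_ratio_male,
              p_female = p_female,
              p_male = p_male), 4))
```

The odds ratio comes out at roughly 0.65, meaning the odds of SMI for males are about two thirds of the odds for females, and the predicted probabilities are roughly .042 for females and .028 for males.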
SMI2 <- zelig(Seriousmentalillness ~ sex + race + age + BMI, model = "logit", data = NHIS, cite = F)
summary(SMI2)
Model:
Call:
z5$zelig(formula = Seriousmentalillness ~ sex + race + age +
BMI, data = NHIS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.4828 -0.2708 -0.2606 -0.2187 2.7607
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.6803579 0.0480740 -55.755 < 2e-16
sex1 -0.4027082 0.0160377 -25.110 < 2e-16
race2 0.0198130 0.0209744 0.945 0.345
race3 0.1091577 0.0276593 3.947 7.93e-05
age -0.0003396 0.0004244 -0.800 0.424
BMI2 -0.6764645 0.0453889 -14.904 < 2e-16
BMI3 -0.6250893 0.0457832 -13.653 < 2e-16
BMI4 -0.1904996 0.0456995 -4.169 3.07e-05
BMI5 0.2837190 0.0520457 5.451 5.00e-08
BMI6 0.4867664 0.0813947 5.980 2.23e-09
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158642 on 518494 degrees of freedom
Residual deviance: 156299 on 518485 degrees of freedom
AIC: 156319
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
This model gives us the effect of each independent variable on the log odds of SMI, holding the others constant. For example, those with a normal-weight BMI (BMI2) have .68 lower log odds of SMI than the reference category, which here is underweight. Those in BMI4 (Obese30s) have .19 lower log odds, while those with higher BMIs, Obese40s (BMI5) and Obese50s (BMI6), have .28 and .49 higher log odds of SMI, respectively, compared to underweight. The intercept of -2.68 is the log odds of SMI for the reference group, white underweight females at age 0, which corresponds to a probability of about .064. Age and the black category of race (race2) are not statistically significant. I think BMI has a bigger impact on serious mental illness than the other variables, as the obese population tends to have anxiety and depression, so in the next model I add an interaction between sex and BMI.
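The z values and the significance of each term follow directly from the estimates and standard errors in the table. As a rough sketch, the code below recomputes the Wald z statistic and a 95% confidence interval for a few of the printed coefficients; the data frame and its column names are made up for illustration.

```r
# Estimates and standard errors copied from the summary(SMI2) output above.
coefs <- data.frame(
  term      = c("sex1", "race3", "age", "BMI6"),
  estimate  = c(-0.4027082, 0.1091577, -0.0003396, 0.4867664),
  std_error = c(0.0160377, 0.0276593, 0.0004244, 0.0813947)
)

# Wald z statistic: estimate divided by its standard error.
coefs$z <- coefs$estimate / coefs$std_error

# 95% confidence interval on the log-odds scale.
crit <- qnorm(0.975)
coefs$lower <- coefs$estimate - crit * coefs$std_error
coefs$upper <- coefs$estimate + crit * coefs$std_error

# The same interval on the odds-ratio scale.
coefs$or       <- exp(coefs$estimate)
coefs$or_lower <- exp(coefs$lower)
coefs$or_upper <- exp(coefs$upper)

print(coefs, digits = 3)
```

An interval for the log odds that spans zero, as it does for age, is another way of seeing that the term is not statistically significant.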
SMI3 <- zelig(as.integer(Seriousmentalillness) ~ sex*BMI + race + age, model = "logit", data = NHIS, cite = F)
x.out <- setx(SMI3)
s.out <- sim(SMI3, x = x.out)
summary(SMI3)
Model:
Call:
z5$zelig(formula = as.integer(Seriousmentalillness) ~ sex * BMI +
race + age, data = NHIS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.4737 -0.2752 -0.2489 -0.2349 2.7713
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8140129 0.0541513 -51.966 < 2e-16
sex1 0.2770712 0.0994706 2.785 0.005345
BMI2 -0.6342714 0.0527892 -12.015 < 2e-16
BMI3 -0.4231245 0.0533603 -7.930 2.20e-15
BMI4 0.0117360 0.0529918 0.221 0.824727
BMI5 0.4459071 0.0598983 7.444 9.74e-14
BMI6 0.5867632 0.0940862 6.236 4.48e-10
race2 0.0028593 0.0210165 0.136 0.891780
race3 0.1027281 0.0276696 3.713 0.000205
age -0.0003619 0.0004239 -0.854 0.393309
sex1:BMI2 -0.3834914 0.1034529 -3.707 0.000210
sex1:BMI3 -0.8274547 0.1034694 -7.997 1.27e-15
sex1:BMI4 -0.8596493 0.1037962 -8.282 < 2e-16
sex1:BMI5 -0.7753786 0.1205743 -6.431 1.27e-10
sex1:BMI6 -0.5134850 0.1868926 -2.747 0.006005
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158642 on 518494 degrees of freedom
Residual deviance: 156084 on 518480 degrees of freedom
AIC: 156114
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
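With the interaction in the model, the BMI main effects describe females (the reference sex), and the sex1:BMI terms describe how much each BMI effect differs for males. All five interaction terms are negative, so relative to underweight, every higher BMI category is associated with lower log odds of SMI for males than it is for females. For example, for females BMI4 (Obese30s) changes the log odds of SMI by only .01 compared to underweight, while for males the interaction term lowers this by .86, giving a combined effect of about -.85. Likewise, females with BMIs of 50 and above (BMI6) have .59 higher log odds of SMI than underweight females, while for males the interaction term of -.51 reduces this to about .07. Note that the sex1 coefficient of .28 now applies only to the underweight category, where males have higher log odds of SMI than females.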
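Reading an interaction model means adding coefficients together. The sketch below does that arithmetic with the estimates printed in the summary(SMI3) output; the vector names are made up for illustration.

```r
# Estimates copied from the summary(SMI3) output above.
sex1 <- 0.2770712
bmi_main <- c(BMI2 = -0.6342714, BMI3 = -0.4231245, BMI4 = 0.0117360,
              BMI5 = 0.4459071, BMI6 = 0.5867632)
sex_bmi  <- c(BMI2 = -0.3834914, BMI3 = -0.8274547, BMI4 = -0.8596493,
              BMI5 = -0.7753786, BMI6 = -0.5134850)

# Effect of each BMI category relative to underweight, by sex (log odds).
bmi_effect_female <- bmi_main
bmi_effect_male   <- bmi_main + sex_bmi

# Male minus female difference in log odds within each BMI category.
# For underweight (BMI1) it is just the sex1 coefficient.
male_vs_female <- c(BMI1 = sex1, sex1 + sex_bmi)

print(round(rbind(female = bmi_effect_female, male = bmi_effect_male), 3))
print(round(male_vs_female, 3))
```

The last vector shows that males have higher log odds than females only in the underweight category; in every other BMI category the male minus female difference is negative.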
x.male <- setx(SMI3, sex="1")
x.female <- setx(SMI3, sex="0")
s.sex<- sim(SMI3, x = x.male, x1=x.female)
summary(s.sex)
sim x :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.02732781 0.0006086279 0.02731625 0.02616431 0.02854238
pv
0 1
[1,] 0.968 0.032
sim x1 :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.03031923 0.0005039535 0.03031015 0.02938105 0.03129355
pv
0 1
[1,] 0.962 0.038
fd
mean sd 50% 2.5% 97.5%
[1,] 0.002991421 0.0007555062 0.003015316 0.001530812 0.004355582
Here we use Zelig to simulate the effect of sex on SMI. Males have on average a .027 probability of having serious mental illness, compared with an average of .030 for women. Though the difference is small, it is worth analyzing: the first difference (females minus males) is about .003, and its 95% interval, roughly .0015 to .0044, does not include zero. Let's plot it and look at the expected values!
plot(s.sex)
We can see that the expected probability of serious mental illness, as measured by the K6 score, is about .027 for males and about .030 for females. So the expected difference between females and males is about .003, in line with the simulation summary above.
fd <- s.sex$get_qi(xvalue="x1", qi="fd")
summary(fd)
V1
Min. :0.0007528
1st Qu.:0.0024577
Median :0.0030153
Mean :0.0029914
3rd Qu.:0.0035235
Max. :0.0057074
This supports our finding in the simulation above and is a simple way to extract just the difference between the sexes in regard to SMI. The average difference between the probability of females having SMI and the probability of males having SMI is approximately .003.
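To make the idea behind simulated expected values and first differences more concrete, here is a rough base R sketch of the general technique: draw coefficient vectors from their estimated sampling distribution, turn each draw into a probability, and take differences. It is not Zelig's actual implementation, and it runs on synthetic data (all names and numbers below are assumptions), since the NHIS file is not bundled with this write-up.

```r
set.seed(712)

# Hypothetical synthetic data standing in for the survey: a binary outcome
# that is somewhat less common when male == 1.
n    <- 5000
male <- rbinom(n, 1, 0.5)
y    <- rbinom(n, 1, plogis(-3 - 0.4 * male))

fit <- glm(y ~ male, family = binomial)

# Draw 1000 coefficient vectors from a multivariate normal centred on the
# estimates, using the Cholesky factor of the estimated covariance matrix.
n_sims   <- 1000
beta_hat <- coef(fit)
chol_v   <- chol(vcov(fit))  # upper triangular: t(chol_v) %*% chol_v equals vcov(fit)
z        <- matrix(rnorm(n_sims * length(beta_hat)), nrow = n_sims)
draws    <- sweep(z %*% chol_v, 2, beta_hat, "+")

# Expected probability for each group under every draw.
ev_male   <- plogis(draws[, 1] + draws[, 2])
ev_female <- plogis(draws[, 1])

# First difference: females minus males, one value per draw.
fd <- ev_female - ev_male

print(c(mean = mean(fd), quantile(fd, c(0.025, 0.975))))
```

The mean and the 2.5% and 97.5% quantiles of `fd` play the same role as the `fd` row in the Zelig summary.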
NHIS$BMI <- as.numeric(NHIS$BMI)
bmi.range = min(NHIS$BMI):max(NHIS$BMI)
SMI3$setrange(BMI = bmi.range)
SMI3$sim()
ci.plot(SMI3)
Here we first changed BMI to be numeric and then ran a plot to summarize the effect of BMI on serious mental illness. The plot shows that the probability of the dependent variable, SMI, differs by BMI level. Those with a normal BMI have the lowest probability, about .03. The underweight group is not the lowest: the negative BMI2 coefficient in the model and the simulation that follows (about .056 for underweight) both place it above normal weight. From normal weight upward, the probability of SMI increases with BMI, which supports my hypothesis that those with higher BMI and obesity levels are more likely to have serious mental illness. Those with BMIs of 30 and above (the Obese30s, Obese40s and Obese50s categories) have a probability of SMI above .05.
normal <- setx(SMI3, BMI = 2)
under <- setx1(SMI3, BMI = 1)
simm <- sim(SMI3, x = normal, x1 = under)
summary(simm)
sim x :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.03032639 0.0005174617 0.03031422 0.02934644 0.03138235
pv
0 1
[1,] 0.973 0.027
sim x1 :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.05580779 0.002500219 0.05582356 0.05110141 0.06092238
pv
0 1
[1,] 0.952 0.048
fd
mean sd 50% 2.5% 97.5%
[1,] 0.0254814 0.002536839 0.02550023 0.02081156 0.03064638
As a reminder, underweight BMI was numerically coded as 1 and normal BMI as 2. The simulation above shows how the probability of serious mental illness varies across these two BMI levels. People with a normal BMI have a .030 probability of having SMI, whereas those with an underweight BMI have a probability of .056. Let's plot it and see what we see.
plot(simm)
This gives an expected probability of SMI of about .030 for those with a normal BMI and about .056 for those with an underweight BMI. The difference in probability between the underweight and normal BMI groups is about .025.
normal.b <- setx(SMI3, BMI = 2)
obese.b <- setx1(SMI3, BMI = 5)
simm2 <- sim(SMI3, x = normal.b, x1 = obese.b)
summary(simm2)
sim x :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.03032791 0.0005063713 0.03033059 0.02935057 0.03127879
pv
0 1
[1,] 0.966 0.034
sim x1 :
-----
ev
mean sd 50% 2.5% 97.5%
[1,] 0.08433305 0.00258455 0.08439448 0.07898066 0.08931924
pv
0 1
[1,] 0.927 0.073
fd
mean sd 50% 2.5% 97.5%
[1,] 0.05400514 0.002609322 0.05408292 0.04879475 0.05915046
This simulation compares those with a normal BMI to those in BMI category 5, Obese40s (a BMI of 40-49). As mentioned above, the probability of SMI for the normal BMI group is about .03. The average probability for the Obese40s group is .084. In other words, the probability of SMI is about .05 higher for those with a BMI of 40-49 than for those with a normal BMI. The plot shows these results.
plot(simm2)
The difference in probability between those with a normal BMI and those in the Obese40s category (a BMI of 40-49) is about .05.
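The simulated probabilities for the BMI contrasts can be cross-checked against the SMI3 coefficients. The sketch below starts from the simulated probability for the normal BMI group and shifts it on the log-odds scale. It assumes that setx held sex at its reference level (female), so only the BMI main effects apply; the helper `shift_probability` is a made-up name for illustration.

```r
# Simulated mean probability for normal BMI, copied from summary(simm) above.
p_normal <- 0.03032639

# BMI main effects (relative to underweight) from the summary(SMI3) output.
b_bmi2 <- -0.6342714   # normal weight
b_bmi5 <-  0.4459071   # Obese40s

# Hypothetical helper: move a probability by `delta` on the log-odds scale.
shift_probability <- function(p, delta) plogis(qlogis(p) + delta)

# Underweight is the reference, so moving from normal to underweight
# removes the BMI2 effect.
p_under <- shift_probability(p_normal, -b_bmi2)

# Moving from normal to Obese40s swaps the BMI2 effect for the BMI5 effect.
p_obese40 <- shift_probability(p_normal, b_bmi5 - b_bmi2)

print(round(c(underweight = p_under, obese40s = p_obese40), 4))
```

Both values land close to the simulated means reported above (about .056 for underweight and .084 for Obese40s), which is consistent with the simulations having been run for females.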