The dataset used for this analysis is the National Health Interview Survey (NHIS), 1997-2016. The analysis examines the relationship between mental illness, the dependent variable, and gender, the independent variable. Mental illness is measured with the Kessler score, developed by Ronald C. Kessler and known as the Kessler 6 scale (K6). My hypothesis is that women are more likely to have mental illness, as are people with other risk indicators such as a high BMI. I also suspect a correlation with age; because I did not include age in my last homework, I test its effect on serious mental illness here.
library(readr)
library(dplyr)
library(texreg)
library(ggplot2)
library(Zelig)
load("/Users/Deepakie/Documents/Queens College/SOC712/Data/NHIS_v3.rdata")
head(NHIS_v3)
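# Keep the analysis variables and recode them
NHIS <- NHIS_v3 %>%
  select(sex, racenew, bmi_7, health, asad, aeffort, ahopeless, aworthless,
         anervous, arestless, age) %>%
  mutate(
    # Sex: 1 = male, 0 = female (assumes NHIS codes 1 = male, 2 = female);
    # any other code becomes NA so the filter below can drop it
    sex = ifelse(sex == 1, "1", ifelse(sex == 2, "0", NA)),
    # BMI: categories 1-6 kept as character codes, anything else is missing
    BMI = ifelse(bmi_7 %in% 1:6, as.character(bmi_7), NA),
    # Race: 1 = white (10), 2 = black (20), 3 = other (30-60);
    # codes 61 and above are treated as missing (the original nested
    # ifelse never reached its >= 61 branch because >= 30 was tested first)
    race = case_when(racenew == 10 ~ "1",
                     racenew == 20 ~ "2",
                     racenew >= 30 & racenew < 61 ~ "3",
                     TRUE ~ NA_character_),
    # K6 items: valid responses are 0-4, higher codes are missing
    ahopeless  = ifelse(ahopeless > 4, NA, ahopeless),
    asad       = ifelse(asad > 4, NA, asad),
    aworthless = ifelse(aworthless > 4, NA, aworthless),
    aeffort    = ifelse(aeffort > 4, NA, aeffort),
    arestless  = ifelse(arestless > 4, NA, arestless),
    anervous   = ifelse(anervous > 4, NA, anervous),
    # Serious mental illness: K6 total of 13 or more (NA if any item is missing)
    Seriousmentalillness = ifelse(ahopeless + asad + aworthless + aeffort +
                                    arestless + anervous >= 13, 1, 0)
  ) %>%
  # Drop the raw items and source codes, then remove incomplete cases
  select(-asad, -aeffort, -ahopeless, -aworthless, -anervous, -arestless,
         -bmi_7, -racenew) %>%
  filter(!is.na(Seriousmentalillness), !is.na(sex), !is.na(BMI),
         !is.na(race), !is.na(age))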
Because we want to investigate the effect of sex and other variables on serious mental distress, we use the following variables: sex, racenew, bmi_7, asad, aeffort, ahopeless, aworthless, anervous, arestless, and age. I recoded these variables and removed missing data. For BMI, there are 6 categories: underweight, normal, overweight, Obese30s, Obese40s, and Obese50s, numbered 1-6, respectively. A BMI of 30 or above is considered obese, so there are three obese categories: Obese30s (BMI 30-39), Obese40s (BMI 40-49), and Obese50s (BMI 50 and above). Race is recoded into 3 categories: 1 = white, 2 = black, and 3 = other races. The dependent variable, Seriousmentalillness, is built from the six K6 questions, which together form a scale measuring psychological distress. Each item is scored from 0 to 4, so the K6 total ranges from 0 to 24. Respondents with a total of 13 or above are classified as having serious mental distress (serious mental illness, for the sake of this analysis) and are coded 1; those scoring below 13 have low or no distress and are coded 0. Age is a continuous variable, so we only drop its missing values. Sex is recoded as 1 = male and 0 = female.
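The K6 scoring rule can be checked on a small toy data frame. The responses below are made up for illustration, and the sketch assumes, as the recode above does, that each item is valid from 0 to 4 and that any code above 4 marks a missing answer.

```r
# Toy K6 responses (made-up values); each item assumed valid from 0 to 4,
# with codes above 4 treated as missing, mirroring ifelse(x > 4, NA, x)
k6 <- data.frame(
  asad       = c(0, 4, 2, 9),
  aeffort    = c(1, 3, 2, 0),
  ahopeless  = c(0, 2, 3, 1),
  aworthless = c(0, 2, 2, 1),
  anervous   = c(2, 1, 2, 1),
  arestless  = c(1, 2, 2, 0)
)

items <- c("asad", "aeffort", "ahopeless", "aworthless", "anervous", "arestless")

# Recode out-of-range codes (e.g. refused / don't know) to NA
k6[items] <- lapply(k6[items], function(x) ifelse(x > 4, NA, x))

# Total K6 score; rowSums() returns NA when any item is missing
k6$k6_total <- rowSums(k6[items])

# Binary indicator: 1 if the total is 13 or more, 0 otherwise
k6$smi <- ifelse(k6$k6_total >= 13, 1, 0)
k6
```

The third respondent scores exactly 13 and is therefore counted as having serious mental illness; the fourth has a missing item, so both the total and the indicator are NA and that row would be removed by the filter.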
head(NHIS)
SMI <- zelig(Seriousmentalillness ~ sex, model = "logit", data = NHIS, cite = F)
summary(SMI)
Model:
Call:
z5$zelig(formula = Seriousmentalillness ~ sex, data = NHIS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.2922 -0.2922 -0.2922 -0.2363 2.6804
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.132645 0.009351 -334.99 <2e-16
sex1 -0.431699 0.015742 -27.42 <2e-16
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158642 on 518494 degrees of freedom
Residual deviance: 157864 on 518493 degrees of freedom
AIC: 157868
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
This is a simple logit model with one binary dependent variable and one binary independent variable. The results show that men are less likely than women to have serious mental distress: the log odds of SMI are 0.43 lower for males than for females, which corresponds to an odds ratio of exp(-0.43), about 0.65, i.e. roughly 35% lower odds. In other words, females have a higher chance of having SMI. The next model adds race, age, and BMI as additional independent variables.
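Log odds are hard to read directly, so the two coefficients printed by summary(SMI) can be turned into an odds ratio and into predicted probabilities with the inverse logit. This is a minimal sketch that simply copies the estimates from the output above; it does not refit the model.

```r
# Coefficients copied from summary(SMI)
b0    <- -3.132645  # intercept: log odds of SMI for females (sex = "0")
b_sex <- -0.431699  # sex1: difference in log odds, males vs. females

# Odds ratio for males relative to females
or_male <- exp(b_sex)

# Predicted probabilities via the inverse logit, plogis()
p_female <- plogis(b0)
p_male   <- plogis(b0 + b_sex)

round(c(odds_ratio = or_male, p_female = p_female, p_male = p_male), 4)
```

The odds ratio is about 0.65, and the predicted probability of SMI is roughly 4.2% for women versus 2.8% for men.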
SMI2 <- zelig(Seriousmentalillness ~ sex + race + age + BMI, model = "logit", data = NHIS, cite = F)
summary(SMI2)
Model:
Call:
z5$zelig(formula = Seriousmentalillness ~ sex + race + age +
BMI, data = NHIS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.4828 -0.2708 -0.2606 -0.2187 2.7607
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.6803579 0.0480740 -55.755 < 2e-16
sex1 -0.4027082 0.0160377 -25.110 < 2e-16
race2 0.0198130 0.0209744 0.945 0.345
race3 0.1091577 0.0276593 3.947 7.93e-05
age -0.0003396 0.0004244 -0.800 0.424
BMI2 -0.6764645 0.0453889 -14.904 < 2e-16
BMI3 -0.6250893 0.0457832 -13.653 < 2e-16
BMI4 -0.1904996 0.0456995 -4.169 3.07e-05
BMI5 0.2837190 0.0520457 5.451 5.00e-08
BMI6 0.4867664 0.0813947 5.980 2.23e-09
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158642 on 518494 degrees of freedom
Residual deviance: 156299 on 518485 degrees of freedom
AIC: 156319
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
This model estimates the effect of each independent variable on SMI while holding the others constant. For example, respondents of normal weight (BMI2) have 0.68 lower log odds of SMI than the reference category, underweight (BMI1). Those in the Obese30s category (BMI4) have 0.19 lower log odds, while those in the Obese40s (BMI5) and Obese50s (BMI6) categories have 0.28 and 0.49 higher log odds, respectively, than underweight respondents. Holding the other variables constant, males still have 0.40 lower log odds of SMI than females. Age has no significant effect (p = 0.424), and neither does race category 2 compared with white respondents (p = 0.345), while race category 3 has slightly higher log odds (0.11). The intercept, -2.68, is the log odds of SMI for the reference group (white, underweight women) at age 0; it is a baseline level rather than a comparison, and age 0 lies outside the range of the data. I think BMI has a bigger impact on serious mental illness than the other variables, as obese populations tend to experience anxiety and depression, so in the next model I run an interaction between sex and BMI.
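The BMI coefficients and the intercept of SMI2 can be put on a more intuitive scale by converting them to odds ratios and to predicted probabilities for one illustrative profile. This sketch copies the estimates from summary(SMI2); the profile (a white woman aged 40) is an arbitrary, hypothetical choice, so the sex and race terms are zero.

```r
# BMI coefficients from summary(SMI2); reference category is BMI1 (underweight)
b_bmi <- c(BMI2 = -0.6764645, BMI3 = -0.6250893, BMI4 = -0.1904996,
           BMI5 =  0.2837190, BMI6 =  0.4867664)

# Odds ratios relative to underweight
round(exp(b_bmi), 2)

# Linear predictor for a hypothetical 40-year-old white woman in each BMI
# category (age 40 chosen only for illustration)
b0    <- -2.6803579
b_age <- -0.0003396
lp    <- b0 + b_age * 40 + c(BMI1 = 0, b_bmi)

# Predicted probability of SMI in each BMI category
round(plogis(lp), 4)
```

Choosing a concrete age avoids interpreting the intercept at age 0, which is outside the observed data.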
SMI3 <- zelig(Seriousmentalillness ~ sex*BMI + race + age, model = "logit", data = NHIS, cite = F)
summary(SMI3)
Model:
Call:
z5$zelig(formula = Seriousmentalillness ~ sex * BMI + race +
age, data = NHIS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.4737 -0.2752 -0.2489 -0.2349 2.7713
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8140129 0.0541513 -51.966 < 2e-16
sex1 0.2770712 0.0994706 2.785 0.005345
BMI2 -0.6342714 0.0527892 -12.015 < 2e-16
BMI3 -0.4231245 0.0533603 -7.930 2.20e-15
BMI4 0.0117360 0.0529918 0.221 0.824727
BMI5 0.4459071 0.0598983 7.444 9.74e-14
BMI6 0.5867632 0.0940862 6.236 4.48e-10
race2 0.0028593 0.0210165 0.136 0.891780
race3 0.1027281 0.0276696 3.713 0.000205
age -0.0003619 0.0004239 -0.854 0.393309
sex1:BMI2 -0.3834914 0.1034529 -3.707 0.000210
sex1:BMI3 -0.8274547 0.1034694 -7.997 1.27e-15
sex1:BMI4 -0.8596493 0.1037962 -8.282 < 2e-16
sex1:BMI5 -0.7753786 0.1205743 -6.431 1.27e-10
sex1:BMI6 -0.5134850 0.1868926 -2.747 0.006005
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158642 on 518494 degrees of freedom
Residual deviance: 156084 on 518480 degrees of freedom
AIC: 156114
Number of Fisher Scoring iterations: 6
Next step: Use 'setx' method
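The interaction terms show that the gap between men and women depends on BMI. Among underweight respondents (the reference BMI), males have 0.28 higher log odds of SMI than females (sex1 = 0.277). In every other BMI category the interaction term is negative and larger in magnitude than 0.277, so males have lower log odds than females there; in the Obese30s category (BMI4), for example, the male-female difference is 0.277 - 0.860 = -0.58. With the interaction in the model, the BMI main effects describe women only: women in the Obese30s category have essentially the same log odds as underweight women (0.01, not significant), whereas for men the Obese30s effect is 0.012 - 0.860 = -0.85. Likewise, women with BMIs of 50 and above (BMI6) have 0.59 higher log odds of SMI than underweight women, while for men in that category the interaction of -0.51 brings the effect down to 0.587 - 0.513 = 0.07.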
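Because the model includes sex*BMI, the effect for a given group is the sum of a main effect and an interaction term. The sums quoted above can be computed for every BMI category at once; this sketch only adds up the coefficients copied from summary(SMI3) (reference groups: female and underweight).

```r
# Coefficients from summary(SMI3); reference = female (sex "0"), underweight (BMI1)
b_sex <- 0.2770712
b_bmi <- c(BMI1 = 0, BMI2 = -0.6342714, BMI3 = -0.4231245, BMI4 = 0.0117360,
           BMI5 = 0.4459071, BMI6 = 0.5867632)
b_int <- c(BMI1 = 0, BMI2 = -0.3834914, BMI3 = -0.8274547, BMI4 = -0.8596493,
           BMI5 = -0.7753786, BMI6 = -0.5134850)

# Male-minus-female difference in log odds within each BMI category
sex_gap <- b_sex + b_int

# BMI effect (vs. underweight of the same sex), separately for women and men
bmi_effect <- data.frame(female = b_bmi, male = b_bmi + b_int,
                         row.names = names(b_bmi))

round(sex_gap, 3)
round(bmi_effect, 3)
```

Only the underweight category has a positive male-female gap; in all other BMI categories men have lower log odds of SMI than women.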
x.out <- setx(SMI3)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
x.male <- setx(SMI3, sex="1")
Error: Wrong factor
SMI3$setrange(sex = min(NHIS$sex, na.rm = T):max(NHIS$sex, na.rm = T))
Error: Wrong factor
bmi.range = min(NHIS$BMI):max(NHIS$BMI)
x <- setx(SMI3, BMI = bmi.range)
Error: Wrong factor
bmi.range = min(NHIS$BMI):max(NHIS$BMI)
SMI3$setrange(BMI = bmi.range)
Error: Wrong factor
a.range = min(NHIS$age):max(NHIS$age)
x <- setx(SMI3, age = a.range)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
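All of the setx() attempts above fail. One likely cause, which I have not verified against Zelig's internals, is that sex, BMI, and race are stored as character strings rather than factors, and that expressions such as min(NHIS$BMI):max(NHIS$BMI) coerce the codes to the numbers 1:6, which no longer match the character levels used in the fit. As a workaround, the sketch below swaps in base R glm() and predict() and stores sex and BMI as factors with explicit levels. The data are hypothetical and simulated, standing in for NHIS, and the simulated coefficients are arbitrary.

```r
set.seed(712)
n <- 20000

# Hypothetical simulated data standing in for NHIS; sex and BMI are factors
# with explicit levels rather than character strings
sim <- data.frame(
  sex = factor(sample(c("0", "1"), n, replace = TRUE), levels = c("0", "1")),
  BMI = factor(sample(as.character(1:6), n, replace = TRUE),
               levels = as.character(1:6)),
  age = sample(18:85, n, replace = TRUE)
)

# Outcome generated from an arbitrary logit model (illustration only)
lp <- -3 + 0.3 * (sim$sex == "1") + 0.1 * as.integer(sim$BMI) - 0.005 * sim$age
sim$smi <- rbinom(n, 1, plogis(lp))

# Same specification as SMI3, minus race
fit <- glm(smi ~ sex * BMI + age, family = binomial, data = sim)

# Prediction grid: every sex x BMI combination, with age held at its mean;
# the factor levels match those used when fitting
grid <- expand.grid(sex = levels(sim$sex), BMI = levels(sim$BMI))
grid$age <- mean(sim$age)
grid$p_smi <- predict(fit, newdata = grid, type = "response")
grid
```

The same idea might carry over to the Zelig workflow: converting sex, BMI, and race with factor() before calling zelig(), and passing setx() level labels such as "1" instead of numeric ranges, may resolve the errors, but I have not tested this.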