The BRFSS survey tracks individuals who commit behaviors which place them at higher risk of HIV: unprotected sex, drug use and so forth. My theory is that individuals with poor mental health will be more likely to commit high-risk activities which could lead to HIV infection. The outcome is binary: “1” if someone has committed a behavior which is high-risk for HIV in the past year, “0” if they have not, “NA” for missing data.
#bad mental health days
brfss16m<-brfss16m %>% mutate(bad_mental_days=case_when(.$menthlth %in% c(1:30)~.$menthlth,
.$menthlth==88~0,
.$menthlth %in% c(77,99)~NA_real_))
# HIV risk
brfss16m<-brfss16m %>% mutate(hiv_risk=case_when(.$hivrisk4==1~1,
.$hivrisk4==2~0,
.$hivrisk4 %in% c(7,9)~NA_real_))
Does exposure to risky behavior increase across MMSA’s?
anova(glm(hiv_risk ~ mmsa, brfss16m, family=binomial), test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: hiv_risk
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 228761 77812
## mmsa 1 35.454 228760 77776 2.611e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It does indeed appear that propensity for HIV-spreading risky behaviors does change across geographic areas.
Here, I am using code from Dr. Spark’s example to construct higher-level predictors for the BRFSS from ACS data:
#higher-level predictors
library(acs)
## Loading required package: stringr
## Loading required package: XML
##
## Attaching package: 'acs'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:base':
##
## apply
mykey<-api.key.install(key="")
#Get 2010 ACS median household incomes for tracts in Texas
msaacs<-geo.make(msa="*")
#ACS tables B17 is poverty, B19 is Gini, B25001 is housing vacancy, B25035 is median year built
acsecon<-acs.fetch(key=mykey, endyear=2010, span=5, geography=msaacs, variable = c("B19083_001","B17001_001","B17001_002" , "B25002_001","B25002_003", "B25035_001"))
msaecon<-data.frame(gini=acsecon@estimate[, "B19083_001"],
ppoverty=acsecon@estimate[, "B17001_002"]/acsecon@estimate[, "B17001_001"],
giniz=scale(acsecon@estimate[, "B19083_001"]),
pvacant=acsecon@estimate[,"B25002_003"]/acsecon@estimate[, "B25002_001"],
ppovertyz=scale(acsecon@estimate[, "B17001_002"]/acsecon@estimate[, "B17001_001"]),
pvacantz=scale(acsecon@estimate[,"B25002_003"]/acsecon@estimate[, "B25002_001"]),
medhouse=acsecon@estimate[, "B25035_001" ],
medhousez=scale(acsecon@estimate[, "B25035_001" ]))
msaecon$ids<-paste(acsecon@geography$metropolitanstatisticalareamicropolitanstatisticalarea)
head(msaecon)
## gini ppoverty giniz
## Adjuntas, PR Micro Area 0.525 0.5925656 2.718785
## Aguadilla-Isabela-San Sebastián, PR Metro Area 0.529 0.5577311 2.847562
## Coamo, PR Micro Area 0.532 0.5633984 2.944145
## Fajardo, PR Metro Area 0.483 0.4309070 1.366623
## Guayama, PR Metro Area 0.518 0.4980518 2.493425
## Jayuya, PR Micro Area 0.502 0.5446137 1.978315
## pvacant ppovertyz
## Adjuntas, PR Micro Area 0.1996029 6.271934
## Aguadilla-Isabela-San Sebastián, PR Metro Area 0.2005956 5.762039
## Coamo, PR Micro Area 0.1500353 5.844994
## Fajardo, PR Metro Area 0.3143172 3.905632
## Guayama, PR Metro Area 0.1951184 4.888474
## Jayuya, PR Micro Area 0.1562109 5.570031
## pvacantz medhouse
## Adjuntas, PR Micro Area 0.9053869 1978
## Aguadilla-Isabela-San Sebastián, PR Metro Area 0.9199516 1979
## Coamo, PR Micro Area 0.1781186 1979
## Fajardo, PR Metro Area 2.5885036 1977
## Guayama, PR Metro Area 0.8395882 1978
## Jayuya, PR Micro Area 0.2687281 1978
## medhousez ids
## Adjuntas, PR Micro Area 0.4923898 10260
## Aguadilla-Isabela-San Sebastián, PR Metro Area 0.5982027 10380
## Coamo, PR Micro Area 0.5982027 17620
## Fajardo, PR Metro Area 0.3865770 21940
## Guayama, PR Metro Area 0.4923898 25020
## Jayuya, PR Micro Area 0.4923898 27580
Now that we have higher-level predictors (Census MSA’s), we can construct GLM models with the data:
#joining the ACS and BRFSS data:
joindata<-merge(brfss16m, msaecon, by.x="mmsa",by.y="ids", all.x=T)
# bad mental health days scaled
joindata$bad_mental_daysz<-as.numeric(scale(joindata$bad_mental_days, center=T, scale=T))
Here I am creating two GLM models, one without higher-level predictors (fit.mix.bin) and one with two higher-level predictors (ppovertyz and giniz – fit.mix.bin2):
joindata$mmsa_fac<-as.factor(joindata$mmsa)
join_data<-joindata %>% filter(complete.cases(ppovertyz))
fit.mix.bin<-glmer(hiv_risk~male+bad_mental_daysz+educ+race_eth+(1|mmsa_fac), family=binomial, join_data,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
fit.mix.bin2<-glmer(hiv_risk~male+bad_mental_daysz+educ+race_eth+ppovertyz+giniz+(1|mmsa_fac), family=binomial, join_data,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(fit.mix.bin)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## hiv_risk ~ male + bad_mental_daysz + educ + race_eth + (1 | mmsa_fac)
## Data: join_data
## Control:
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
##
## AIC BIC logLik deviance df.resid
## 54310.3 54431.2 -27143.1 54286.3 176103
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.7123 -0.2108 -0.1769 -0.1381 10.3922
##
## Random effects:
## Groups Name Variance Std.Dev.
## mmsa_fac (Intercept) 0.03297 0.1816
## Number of obs: 176115, groups: mmsa_fac, 123
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.838458 0.035518 -108.07 < 2e-16 ***
## male 0.764232 0.025680 29.76 < 2e-16 ***
## bad_mental_daysz 0.340030 0.009069 37.50 < 2e-16 ***
## educ0Prim -0.499613 0.094712 -5.28 1.33e-07 ***
## educ1somehs 0.076023 0.056535 1.34 0.17872
## educ3somecol 0.103274 0.032579 3.17 0.00152 **
## educ4colgrad -0.204533 0.033049 -6.19 6.06e-10 ***
## race_ethhispanic 0.685038 0.042995 15.93 < 2e-16 ***
## race_ethnh black 0.690156 0.039465 17.49 < 2e-16 ***
## race_ethnh multirace 0.721393 0.074520 9.68 < 2e-16 ***
## race_ethnh other 0.275873 0.068583 4.02 5.76e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) male bd_mn_ edc0Pr edc1sm edc3sm edc4cl rc_thh rc_thb
## male -0.439
## bd_mntl_dys -0.176 0.098
## educ0Prim -0.133 -0.007 -0.023
## educ1somehs -0.253 0.002 -0.054 0.124
## educ3somecl -0.509 0.025 0.013 0.178 0.304
## educ4colgrd -0.512 -0.005 0.094 0.167 0.289 0.535
## rac_thhspnc -0.193 -0.003 0.013 -0.165 -0.089 0.036 0.083
## rc_thnhblck -0.218 0.035 0.005 -0.006 -0.046 0.021 0.076 0.162
## rc_thnhmltr -0.079 -0.013 -0.051 -0.004 -0.016 -0.015 0.019 0.079 0.088
## rc_thnhothr -0.076 -0.026 -0.008 -0.017 -0.016 0.003 -0.013 0.092 0.090
## rc_thm
## male
## bd_mntl_dys
## educ0Prim
## educ1somehs
## educ3somecl
## educ4colgrd
## rac_thhspnc
## rc_thnhblck
## rc_thnhmltr
## rc_thnhothr 0.047
summary(fit.mix.bin2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## hiv_risk ~ male + bad_mental_daysz + educ + race_eth + ppovertyz +
## giniz + (1 | mmsa_fac)
## Data: join_data
## Control:
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
##
## AIC BIC logLik deviance df.resid
## 54308.6 54449.7 -27140.3 54280.6 176101
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.7110 -0.2108 -0.1768 -0.1381 10.3112
##
## Random effects:
## Groups Name Variance Std.Dev.
## mmsa_fac (Intercept) 0.03024 0.1739
## Number of obs: 176115, groups: mmsa_fac, 123
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.902632 0.044541 -87.62 < 2e-16 ***
## male 0.764396 0.025675 29.77 < 2e-16 ***
## bad_mental_daysz 0.340264 0.009071 37.51 < 2e-16 ***
## educ0Prim -0.499759 0.094841 -5.27 1.37e-07 ***
## educ1somehs 0.075525 0.056371 1.34 0.18031
## educ3somecol 0.103390 0.032590 3.17 0.00151 **
## educ4colgrad -0.205827 0.033058 -6.23 4.78e-10 ***
## race_ethhispanic 0.692512 0.043456 15.94 < 2e-16 ***
## race_ethnh black 0.684216 0.039661 17.25 < 2e-16 ***
## race_ethnh multirace 0.721155 0.074898 9.63 < 2e-16 ***
## race_ethnh other 0.274017 0.068485 4.00 6.30e-05 ***
## ppovertyz -0.115872 0.049473 -2.34 0.01917 *
## giniz 0.089369 0.043072 2.07 0.03800 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
stargazer(fit.mix.bin,fit.mix.bin2,type="html")
| Dependent variable: | ||
| hiv_risk | ||
| (1) | (2) | |
| male | 0.764*** | 0.764*** |
| (0.026) | (0.026) | |
| bad_mental_daysz | 0.340*** | 0.340*** |
| (0.009) | (0.009) | |
| educ0Prim | -0.500*** | -0.500*** |
| (0.095) | (0.095) | |
| educ1somehs | 0.076 | 0.076 |
| (0.057) | (0.056) | |
| educ3somecol | 0.103*** | 0.103*** |
| (0.033) | (0.033) | |
| educ4colgrad | -0.205*** | -0.206*** |
| (0.033) | (0.033) | |
| race_ethhispanic | 0.685*** | 0.693*** |
| (0.043) | (0.043) | |
| race_ethnh black | 0.690*** | 0.684*** |
| (0.039) | (0.040) | |
| race_ethnh multirace | 0.721*** | 0.721*** |
| (0.075) | (0.075) | |
| race_ethnh other | 0.276*** | 0.274*** |
| (0.069) | (0.068) | |
| ppovertyz | -0.116** | |
| (0.049) | ||
| giniz | 0.089** | |
| (0.043) | ||
| Constant | -3.838*** | -3.903*** |
| (0.036) | (0.045) | |
| Observations | 176,115 | 176,115 |
| Log Likelihood | -27,143.140 | -27,140.310 |
| Akaike Inf. Crit. | 54,310.290 | 54,308.610 |
| Bayesian Inf. Crit. | 54,431.230 | 54,449.720 |
| Note: | p<0.1; p<0.05; p<0.01 | |
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
lrtest(fit.mix.bin,fit.mix.bin2)
## Likelihood ratio test
##
## Model 1: hiv_risk ~ male + bad_mental_daysz + educ + race_eth + (1 | mmsa_fac)
## Model 2: hiv_risk ~ male + bad_mental_daysz + educ + race_eth + ppovertyz +
## giniz + (1 | mmsa_fac)
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 12 -27143
## 2 14 -27140 2 5.6713 0.05868 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As we can see from the table above as well as the likelihood ratio test, the model with group level predictors is only vaguely different from the model without. Indeed, they cannot be distinguished in the AIC test either.
It does appear that there is a relationship between scaled bad mental health days and self-reported behaviors which put the respondent at risk of HIV infection.