Naslund Homework 8

Homework 8: MLM Models 3

Hypothesis: Bad mental health will be associated with increased HIV risk

The BRFSS survey tracks individuals who commit behaviors which place them at higher risk of HIV: unprotected sex, drug use and so forth. My theory is that individuals with poor mental health will be more likely to commit high-risk activities which could lead to HIV infection. The outcome is binary: “1” if someone has committed a behavior which is high-risk for HIV in the past year, “0” if they have not, “NA” for missing data.

Research question: does an increase of bad mental health result in increased HIV risk?

#bad mental health days

brfss16m<-brfss16m %>% mutate(bad_mental_days=case_when(.$menthlth %in% c(1:30)~.$menthlth,
                                                        .$menthlth==88~0,
                                                        .$menthlth %in% c(77,99)~NA_real_))

# HIV risk
brfss16m<-brfss16m %>% mutate(hiv_risk=case_when(.$hivrisk4==1~1,
                                                 .$hivrisk4==2~0,
                                                 .$hivrisk4 %in% c(7,9)~NA_real_))

Change across groups

Does exposure to risky behavior increase across MMSA’s?

anova(glm(hiv_risk ~ mmsa, brfss16m, family=binomial), test="Chisq")

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: hiv_risk
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                228761      77812              
## mmsa  1   35.454    228760      77776 2.611e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It does indeed appear that propensity for HIV-spreading risky behaviors does change across geographic areas.

MLM with higher-level predictors

Here, I am using code from Dr. Spark’s example to construct higher-level predictors for the BRFSS from ACS data:

#higher-level predictors
library(acs)

## Loading required package: stringr

## Loading required package: XML

## 
## Attaching package: 'acs'

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:base':
## 
##     apply

mykey<-api.key.install(key="")
#Get 2010 ACS median household incomes for tracts in Texas
msaacs<-geo.make(msa="*")

#ACS tables B17 is poverty, B19 is Gini, B25001 is housing vacancy, B25035 is median year built 
acsecon<-acs.fetch(key=mykey, endyear=2010, span=5, geography=msaacs, variable = c("B19083_001","B17001_001","B17001_002" , "B25002_001","B25002_003", "B25035_001"))

msaecon<-data.frame(gini=acsecon@estimate[, "B19083_001"], 
ppoverty=acsecon@estimate[, "B17001_002"]/acsecon@estimate[, "B17001_001"],
giniz=scale(acsecon@estimate[, "B19083_001"]),
pvacant=acsecon@estimate[,"B25002_003"]/acsecon@estimate[, "B25002_001"],
ppovertyz=scale(acsecon@estimate[, "B17001_002"]/acsecon@estimate[, "B17001_001"]), 
pvacantz=scale(acsecon@estimate[,"B25002_003"]/acsecon@estimate[, "B25002_001"]), 
medhouse=acsecon@estimate[, "B25035_001" ],
medhousez=scale(acsecon@estimate[, "B25035_001" ]))

msaecon$ids<-paste(acsecon@geography$metropolitanstatisticalareamicropolitanstatisticalarea)
head(msaecon)

##                                                  gini  ppoverty    giniz
## Adjuntas, PR Micro Area                         0.525 0.5925656 2.718785
## Aguadilla-Isabela-San SebastiÃ¡n, PR Metro Area 0.529 0.5577311 2.847562
## Coamo, PR Micro Area                            0.532 0.5633984 2.944145
## Fajardo, PR Metro Area                          0.483 0.4309070 1.366623
## Guayama, PR Metro Area                          0.518 0.4980518 2.493425
## Jayuya, PR Micro Area                           0.502 0.5446137 1.978315
##                                                   pvacant ppovertyz
## Adjuntas, PR Micro Area                         0.1996029  6.271934
## Aguadilla-Isabela-San SebastiÃ¡n, PR Metro Area 0.2005956  5.762039
## Coamo, PR Micro Area                            0.1500353  5.844994
## Fajardo, PR Metro Area                          0.3143172  3.905632
## Guayama, PR Metro Area                          0.1951184  4.888474
## Jayuya, PR Micro Area                           0.1562109  5.570031
##                                                  pvacantz medhouse
## Adjuntas, PR Micro Area                         0.9053869     1978
## Aguadilla-Isabela-San SebastiÃ¡n, PR Metro Area 0.9199516     1979
## Coamo, PR Micro Area                            0.1781186     1979
## Fajardo, PR Metro Area                          2.5885036     1977
## Guayama, PR Metro Area                          0.8395882     1978
## Jayuya, PR Micro Area                           0.2687281     1978
##                                                 medhousez   ids
## Adjuntas, PR Micro Area                         0.4923898 10260
## Aguadilla-Isabela-San SebastiÃ¡n, PR Metro Area 0.5982027 10380
## Coamo, PR Micro Area                            0.5982027 17620
## Fajardo, PR Metro Area                          0.3865770 21940
## Guayama, PR Metro Area                          0.4923898 25020
## Jayuya, PR Micro Area                           0.4923898 27580

Now that we have higher-level predictors (Census MSA’s), we can construct GLM models with the data:

#joining the ACS and BRFSS data:

joindata<-merge(brfss16m, msaecon, by.x="mmsa",by.y="ids", all.x=T)

# bad mental health days scaled

joindata$bad_mental_daysz<-as.numeric(scale(joindata$bad_mental_days, center=T, scale=T))

Here I am creating two GLM models, one without higher-level predictors (fit.mix.bin) and one with two higher-level predictors (ppovertyz and giniz – fit.mix.bin2):

joindata$mmsa_fac<-as.factor(joindata$mmsa)
join_data<-joindata %>% filter(complete.cases(ppovertyz))
fit.mix.bin<-glmer(hiv_risk~male+bad_mental_daysz+educ+race_eth+(1|mmsa_fac), family=binomial, join_data,
                   control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
fit.mix.bin2<-glmer(hiv_risk~male+bad_mental_daysz+educ+race_eth+ppovertyz+giniz+(1|mmsa_fac), family=binomial, join_data,
                   control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))

summary(fit.mix.bin)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## hiv_risk ~ male + bad_mental_daysz + educ + race_eth + (1 | mmsa_fac)
##    Data: join_data
## Control: 
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##  54310.3  54431.2 -27143.1  54286.3   176103 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7123 -0.2108 -0.1769 -0.1381 10.3922 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mmsa_fac (Intercept) 0.03297  0.1816  
## Number of obs: 176115, groups:  mmsa_fac, 123
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -3.838458   0.035518 -108.07  < 2e-16 ***
## male                  0.764232   0.025680   29.76  < 2e-16 ***
## bad_mental_daysz      0.340030   0.009069   37.50  < 2e-16 ***
## educ0Prim            -0.499613   0.094712   -5.28 1.33e-07 ***
## educ1somehs           0.076023   0.056535    1.34  0.17872    
## educ3somecol          0.103274   0.032579    3.17  0.00152 ** 
## educ4colgrad         -0.204533   0.033049   -6.19 6.06e-10 ***
## race_ethhispanic      0.685038   0.042995   15.93  < 2e-16 ***
## race_ethnh black      0.690156   0.039465   17.49  < 2e-16 ***
## race_ethnh multirace  0.721393   0.074520    9.68  < 2e-16 ***
## race_ethnh other      0.275873   0.068583    4.02 5.76e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) male   bd_mn_ edc0Pr edc1sm edc3sm edc4cl rc_thh rc_thb
## male        -0.439                                                        
## bd_mntl_dys -0.176  0.098                                                 
## educ0Prim   -0.133 -0.007 -0.023                                          
## educ1somehs -0.253  0.002 -0.054  0.124                                   
## educ3somecl -0.509  0.025  0.013  0.178  0.304                            
## educ4colgrd -0.512 -0.005  0.094  0.167  0.289  0.535                     
## rac_thhspnc -0.193 -0.003  0.013 -0.165 -0.089  0.036  0.083              
## rc_thnhblck -0.218  0.035  0.005 -0.006 -0.046  0.021  0.076  0.162       
## rc_thnhmltr -0.079 -0.013 -0.051 -0.004 -0.016 -0.015  0.019  0.079  0.088
## rc_thnhothr -0.076 -0.026 -0.008 -0.017 -0.016  0.003 -0.013  0.092  0.090
##             rc_thm
## male              
## bd_mntl_dys       
## educ0Prim         
## educ1somehs       
## educ3somecl       
## educ4colgrd       
## rac_thhspnc       
## rc_thnhblck       
## rc_thnhmltr       
## rc_thnhothr  0.047

summary(fit.mix.bin2)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## hiv_risk ~ male + bad_mental_daysz + educ + race_eth + ppovertyz +  
##     giniz + (1 | mmsa_fac)
##    Data: join_data
## Control: 
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##  54308.6  54449.7 -27140.3  54280.6   176101 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7110 -0.2108 -0.1768 -0.1381 10.3112 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  mmsa_fac (Intercept) 0.03024  0.1739  
## Number of obs: 176115, groups:  mmsa_fac, 123
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -3.902632   0.044541  -87.62  < 2e-16 ***
## male                  0.764396   0.025675   29.77  < 2e-16 ***
## bad_mental_daysz      0.340264   0.009071   37.51  < 2e-16 ***
## educ0Prim            -0.499759   0.094841   -5.27 1.37e-07 ***
## educ1somehs           0.075525   0.056371    1.34  0.18031    
## educ3somecol          0.103390   0.032590    3.17  0.00151 ** 
## educ4colgrad         -0.205827   0.033058   -6.23 4.78e-10 ***
## race_ethhispanic      0.692512   0.043456   15.94  < 2e-16 ***
## race_ethnh black      0.684216   0.039661   17.25  < 2e-16 ***
## race_ethnh multirace  0.721155   0.074898    9.63  < 2e-16 ***
## race_ethnh other      0.274017   0.068485    4.00 6.30e-05 ***
## ppovertyz            -0.115872   0.049473   -2.34  0.01917 *  
## giniz                 0.089369   0.043072    2.07  0.03800 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it

stargazer(fit.mix.bin,fit.mix.bin2,type="html")


	Dependent variable:

	hiv_risk
	(1)	(2)

male	0.764^***	0.764^***
	(0.026)	(0.026)

bad_mental_daysz	0.340^***	0.340^***
	(0.009)	(0.009)

educ0Prim	-0.500^***	-0.500^***
	(0.095)	(0.095)

educ1somehs	0.076	0.076
	(0.057)	(0.056)

educ3somecol	0.103^***	0.103^***
	(0.033)	(0.033)

educ4colgrad	-0.205^***	-0.206^***
	(0.033)	(0.033)

race_ethhispanic	0.685^***	0.693^***
	(0.043)	(0.043)

race_ethnh black	0.690^***	0.684^***
	(0.039)	(0.040)

race_ethnh multirace	0.721^***	0.721^***
	(0.075)	(0.075)

race_ethnh other	0.276^***	0.274^***
	(0.069)	(0.068)

ppovertyz		-0.116^**
		(0.049)

giniz		0.089^**
		(0.043)

Constant	-3.838^***	-3.903^***
	(0.036)	(0.045)


Observations	176,115	176,115
Log Likelihood	-27,143.140	-27,140.310
Akaike Inf. Crit.	54,310.290	54,308.610
Bayesian Inf. Crit.	54,431.230	54,449.720

Note:	p<0.1; p<0.05; p<0.01

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

lrtest(fit.mix.bin,fit.mix.bin2)

## Likelihood ratio test
## 
## Model 1: hiv_risk ~ male + bad_mental_daysz + educ + race_eth + (1 | mmsa_fac)
## Model 2: hiv_risk ~ male + bad_mental_daysz + educ + race_eth + ppovertyz + 
##     giniz + (1 | mmsa_fac)
##   #Df LogLik Df  Chisq Pr(>Chisq)  
## 1  12 -27143                       
## 2  14 -27140  2 5.6713    0.05868 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As we can see from the table above as well as the likelihood ratio test, the model with group level predictors is only vaguely different from the model without. Indeed, they cannot be distinguished in the AIC test either.

Conclusion

It does appear that there is a relationship between scaled bad mental health days and self-reported behaviors which put the respondent at risk of HIV infection.

Naslund Homework 8 - DEM 7283