phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.
gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.
See section 2.2.1 Mental Health in https://www.sciencedirect.com/science/article/pii/S0277953620307760
Independent variables
From Household Pulse Survey
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
score: State score on the COVID-19 Housing Policy Scorecard. Value is on a scale from 0 to 5, with the highest score being 4.63.
Covariates
(from Household Pulse Survey)
income: Total household income (before taxes).
Less than $25,000
$25,000 - $34,999
$35,000 - $49,999
$50,000 - $74,999
$75,000 - $99,999
$100,000 - $149,999
$150,000 - $199,999
$200,000 and above
Question seen but category not selected
Missing / Did not report
age: Derived from tbirth_year (year of birth)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these, 88, 99.
race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
thhld_numper: Total number of people in household
child: Dummy variable derived from ‘thhld_numkid’
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
Other variables for cleaning
week: week of interview (review how to use this)
anxious, worry, interest, and down. Used to derive phq4. Frequency of having bad feeling over previous 2 weeks. 1 not at all, 2 is several days, 3 is more than half the days, 4 is nearly every day, 88 is missing, 99 is category not selected.
tenure: 3 is rented. Used to select renters.
est_st: FIPS code for state, used to merge
state: Name of state
abbv: Two letter abbreviation of state
Reading in the data
Importing the data for Household Pulse Survey phase 3.2 to 3.4.
Make dichotomized variables for depression and anxiety.
The HH Pulse variables for “anxious”, “worry”, “interest”, and “down” are scored from 1 to 4, but the clinical scale is from 0 to 3. Therefore, the variables must be adjusted to the 0 to 3 scale by subtracting 1.
First, eliminate NA’s and subtract 1 from each scale of anxious, worry, interest, and down.
Next, make variable “depression” consisting of the sum of “interest” and “down” and make a variable “anxiety” consisting of the sum of “anxious” and “worry.”
Code
dat <- dat %>%mutate(depression = interest + down) %>%mutate(anxiety = anxious + worry)tabyl(dat$depression)
Next, dichotomize “depression” and “anxiety”. ‘phq2’ is the dichotomized depression variable. ‘gad2’ is the dichotomized anxiety variable.
Code
dat <- dat %>%mutate(phq2 =case_when(depression %in%c(0:2) ~0, depression %in%c(3:6) ~1, ) )tabyl(dat$phq2)
dat$phq2 n percent
0 112835 0.7106105
1 45951 0.2893895
Code
dat <- dat %>%mutate(gad2 =case_when(anxiety %in%c(0:2) ~0, anxiety %in%c(3:6) ~1, ) )tabyl(dat$gad2)
dat$gad2 n percent
0 101890 0.6416813
1 56896 0.3583187
Code
nrow(dat)
[1] 158786
Clean and recode ‘rentcur’
Then I will check the distribution of ‘rentcur’ and remove observations where ‘rentcur’ is missing. In the original dataset, 1 is “caught up on rent” and 2 is “Not caught up on rent.”
Then I will create a dummy variable for ‘rentcur’ so that the value of ‘1’ means ‘caught up on rent’ and the value of ‘0’ is ‘not caught up on rent’.
dat$rentcur n percent
0 18104 0.1143384
1 140233 0.8856616
Code
nrow(dat)
[1] 158337
Clean and recode ‘evict’
Checking distribution of evict.
evict: Eviction in next two months. 1 is very likely, 2 is somewhat likeley, 3 is not very likely, 4 is not likely at all, 88 and 99. Only asked if rentcur = 2.
This needs to be recoded so that 0 is current on rent (not at risk of eviction due to late on rent), 1 is not likely at all, 2 is not very likely, 3 is somewhat likely, and 4 is very likely.
Code ‘evict’ = 0 when rentcur = 1. Change other codes as well.
Code
dat <- dat %>%mutate(evict =case_when(rentcur ==1~0, evict ==4~1, evict ==3~2, evict ==2~3, evict ==1~4 ) )tabyl(dat$evict)
dat$evict n percent valid_percent
0 140233 0.885661595 0.88882199
1 4850 0.030630870 0.03074017
2 5241 0.033100286 0.03321840
3 4961 0.031331906 0.03144371
4 2489 0.015719636 0.01577573
NA 563 0.003555707 NA
Filer out NAs.
Code
dat <- dat %>%filter(evict %in%c(0:4))tabyl(dat$evict)
Creating a new variable ‘age’ by subtracting birth year from the year the data were collected. (How can I fix this since the data were collected in 2 years?)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’
dat$children n percent
0 106423 0.7077833
1 43938 0.2922167
Clean eeduc
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”.
The key thing for the model is to seee how much variation is there across states.
Scatterplot - aggregating values by state and graphing it. Calculate percent on phq2 and gad2. First do the mean on the raw HH Pulse mental health score. For each state you have the score on the eviction policy. State eviction score is the unit of analysis on X axis score. On the Y axis you have mean mental health score by state or percent with bad mental health by state.
Here I create a new dataframe that takes the data in dat and groups them by state. Then I find the mean mental health number for each state. I create a dataframe that contains the state and the mean mental health score of renters who are late on rent in each state.
ggplot(statedat, aes(x=score, y=mean_mh, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Mean Mental Health of All Renters by State",x ="State COVID Housing Policy Score",y ="Mean Mental Health")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Pearson correlation test between State COVID-19 Score and the mean mental health among all renters.
Code
test <-cor.test(statedat$score, statedat$mean_mh, method ="spearman")
Warning in cor.test.default(statedat$score, statedat$mean_mh, method =
"spearman"): Cannot compute exact p-value with ties
Code
test
Spearman's rank correlation rho
data: statedat$score and statedat$mean_mh
S = 29127, p-value = 0.02299
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.3179432
library(ggpubr)library(ggrepel)statebmh$evict <-as.factor(statebmh$evict)ggplot(data=statebmh, aes(x = mscore, y = anx_pc, color=evict)) +geom_point() +stat_smooth(method ="lm", size =1) +stat_regline_equation(aes(label = ..rr.label..)) +geom_text_repel(aes(anx_pc, mscore, label = abbv, color=evict), size =6)+guides(color=F)+theme_bw(base_size =18) +labs(title="Marriageable Male Effect for States Differing in Conservatism", subtitle="US-Born Men Ages 18 to 27", caption="Source: 2012 ACS 5-Year File") +xlab(label="# of Employed Men Per 100 Men in the Unmarried Population Ages 18 to 27") +ylab(label="Standardized Early Marriage Ratio") +ylim(0, 1)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
Warning: The dot-dot notation (`..rr.label..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(rr.label)` instead.
Warning: ggrepel: 80 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Percent of Tenants with Depression by State
Code
ggplot(statedat, aes(x=score, y=dep_pc, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Percent of Tenants with Depression",x ="State COVID Housing Policy Score",y ="Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Percent of Tenants with Anxiety by State
Code
ggplot(statedat, aes(x=score, y=anx_pc, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Percent of Tenants with Anxiety",x ="State COVID Housing Policy Score",y ="Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Pearson correlation test between State COVID-19 Score and the percent with depression among all renters.
Warning in cor.test.default(statedat$score, statedat$dep_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test2
Spearman's rank correlation rho
data: statedat$score and statedat$dep_pc
S = 3922021, p-value = 2.832e-12
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.419212
Pearson correlation test between State COVID-19 Score and the percent with anxiety among all renters.
Warning in cor.test.default(statedat$score, statedat$anx_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test3
Spearman's rank correlation rho
data: statedat$score and statedat$anx_pc
S = 2993794, p-value = 0.1847
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.08332638
Plot of risk of eviction and depression.
Code
ggplot(dat, aes(evict, depression)) +geom_boxplot(aes(group = evict)) +geom_smooth(method=lm) +labs(title ="Eviction Risk and Depression",x ="Risk of Eviction",y ="Depression")
`geom_smooth()` using formula = 'y ~ x'
Plot of risk of eviction and anxiety
Code
ggplot(dat, aes(evict, anxiety)) +geom_boxplot(aes(group = evict)) +geom_smooth(method=lm) +labs(title ="Eviction Risk and Anxiety",x ="Risk of Eviction",y ="Anxiety")
Warning in cor.test.default(dat$evict, dat$depression, method = "spearman"):
Cannot compute exact p-value with ties
Code
test4
Spearman's rank correlation rho
data: dat$evict and dat$depression
S = 4.7599e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.1598694
Warning in cor.test.default(dat$evict, dat$anxiety, method = "spearman"):
Cannot compute exact p-value with ties
Code
test5
Spearman's rank correlation rho
data: dat$evict and dat$anxiety
S = 4.7278e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.165534
Modeling
Plan for modeling.
Population: All renters in the HH Pulse Survey phases 3.2 to 3.4.
Dependent variables
phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.
gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.
Independent variables
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
race_eth: categorical: hispanic, nh_white, nh_black, nh_asian, other
dichotomized: hispanic, nh_white, nh_black, nh_asian, other
single_adult: dichotomized 1 = singe adult or 0 = mutiple adults in household
thhld_numper: integer: number of people in household from 1 to 10.
children: dichotomized: 1 = children in household, 0 = no children in household
eeduc: 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
fix scatterplot showing state scores but grouped by eviction risk (have to think about this). I need to find the percent of eviction risk by state. The variable is dichotomized, so I can find the mean for each state. But what does that variable mean?
run models!
mean eviction risk by score DV, then add score, then add evict risk
Code
# Running Generalized linear mixed model (GLMM). Checking variance of depression by state.m1 <-glmer(phq2 ~ (1| abbv), data = dat, family = binomial)summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
180666.9 180686.7 -90331.4 180662.9 150359
Scaled residuals:
Min 1Q Median 3Q Max
-0.7735 -0.6475 -0.6060 1.4627 1.8419
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.0224 0.1497
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.86299 0.02191 -39.38 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Outcome variable is depression and focus variable is state housing score.m2 <-glmer(phq2 ~ score + (1| abbv), data = dat, family = binomial)summary(m2)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ score + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
180659.3 180689.1 -90326.7 180653.3 150358
Scaled residuals:
Min 1Q Median 3Q Max
-0.7730 -0.6462 -0.6052 1.4615 1.8468
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01827 0.1352
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.77732 0.03283 -23.679 <2e-16 ***
score -0.04654 0.01426 -3.265 0.0011 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
score -0.795
Code
# Outcome variable is depression and focus variables are state housing score and risk of eviction.m3 <-glmer(phq2 ~ score +factor(evict) + (1| abbv), data = dat, family = binomial)summary(m3)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ score + factor(evict) + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
176544.5 176613.9 -88265.3 176530.5 150354
Scaled residuals:
Min 1Q Median 3Q Max
-1.7420 -0.6229 -0.5766 1.1417 1.9222
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01744 0.1321
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.90834 0.03254 -27.915 < 2e-16 ***
score -0.03946 0.01406 -2.807 0.00500 **
factor(evict)1 0.08671 0.03363 2.579 0.00992 **
factor(evict)2 0.83823 0.02894 28.966 < 2e-16 ***
factor(evict)3 1.30674 0.02987 43.743 < 2e-16 ***
factor(evict)4 1.76857 0.04475 39.522 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) score fct()1 fct()2 fct()3
score -0.796
factr(vct)1 -0.030 -0.006
factr(vct)2 -0.040 -0.001 0.044
factr(vct)3 -0.045 0.006 0.040 0.045
factr(vct)4 -0.029 0.003 0.029 0.030 0.026
Code
# Checking variance of anxiety by state.m4 <-glmer(gad2 ~ (1| abbv), data = dat, family = binomial)summary(m4)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: gad2 ~ (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
195998.7 196018.6 -97997.4 195994.7 150359
Scaled residuals:
Min 1Q Median 3Q Max
-0.8830 -0.7606 -0.7016 1.2813 1.5215
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01958 0.1399
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.55000 0.02049 -26.84 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Outcome variable is anxiety and focus variable is state housing score.m5 <-glmer(gad2 ~ score + (1| abbv), data = dat, family = binomial)summary(m5)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: gad2 ~ score + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
195995.6 196025.3 -97994.8 195989.6 150358
Scaled residuals:
Min 1Q Median 3Q Max
-0.8846 -0.7611 -0.7018 1.2835 1.5226
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.0175 0.1323
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.48963 0.03232 -15.151 <2e-16 ***
score -0.03273 0.01398 -2.342 0.0192 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
score -0.798
Code
# Outcome variable is anxiety and focus variables are state housing score and risk of eviction.m6 <-glmer(gad2 ~ score +factor(evict) + (1| abbv), data = dat, family = binomial)summary(m6)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: gad2 ~ score + factor(evict) + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
191564.0 191633.4 -95775.0 191550.0 150354
Scaled residuals:
Min 1Q Median 3Q Max
-2.0263 -0.7299 -0.6706 1.3243 1.6227
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01732 0.1316
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.61512 0.03218 -19.115 < 2e-16 ***
score -0.02572 0.01390 -1.850 0.06434 .
factor(evict)1 0.08285 0.03156 2.625 0.00867 **
factor(evict)2 0.86104 0.02862 30.087 < 2e-16 ***
factor(evict)3 1.39889 0.03136 44.612 < 2e-16 ***
factor(evict)4 1.80915 0.04747 38.108 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) score fct()1 fct()2 fct()3
score -0.796
factr(vct)1 -0.028 -0.006
factr(vct)2 -0.037 0.000 0.038
factr(vct)3 -0.039 0.006 0.032 0.038
factr(vct)4 -0.028 0.007 0.024 0.029 0.032
Code
# Outcome variable is depression and input variables are all variables.m7 <-glmer(phq2 ~ score +factor(evict) +factor(income) + age +factor(genid_describe) +factor(race_eth) + single_adult + thhld_numper + children +factor(eeduc) + (1| abbv), data = dat, family = binomial)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0281776 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Correlation matrix not shown by default, as p = 31 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
optimizer (Nelder_Mead) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.0281776 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Code
# Outcome variable is anxiety and input variables are all variables.m8 <-glmer(gad2 ~ score +factor(evict) +factor(income) + age +factor(genid_describe) +factor(race_eth) + single_adult + thhld_numper + children +factor(eeduc) + (1| abbv), data = dat, family = binomial)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?