phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.
gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.
See section 2.2.1 Mental Health in https://www.sciencedirect.com/science/article/pii/S0277953620307760
Independent variables
From Household Pulse Survey
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
score: State score on the COVID-19 Housing Policy Scorecard. Value is on a scale from 0 to 5, with the highest score being 4.63.
Covariates
(from Household Pulse Survey)
income: Total household income (before taxes).
Less than $25,000
$25,000 - $34,999
$35,000 - $49,999
$50,000 - $74,999
$75,000 - $99,999
$100,000 - $149,999
$150,000 - $199,999
$200,000 and above
Question seen but category not selected
Missing / Did not report
age: Derived from tbirth_year (year of birth)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these, 88, 99.
race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
thhld_numper: Total number of people in household
child: Dummy variable derived from ‘thhld_numkid’
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
Other variables for cleaning
week: week of interview (review how to use this)
anxious, worry, interest, and down. Used to derive phq4. Frequency of having bad feeling over previous 2 weeks. 1 not at all, 2 is several days, 3 is more than half the days, 4 is nearly every day, 88 is missing, 99 is category not selected.
tenure: 3 is rented. Used to select renters.
est_st: FIPS code for state, used to merge
state: Name of state
abbv: Two letter abbreviation of state
Reading in the data
Importing the data for Household Pulse Survey phase 3.2 to 3.4.
Make dichotomized variables for depression and anxiety.
The HH Pulse variables for “anxious”, “worry”, “interest”, and “down” are scored from 1 to 4, but the clinical scale is from 0 to 3. Therefore, the variables must be adjusted to the 0 to 3 scale by subtracting 1.
First, eliminate NA’s and subtract 1 from each scale of anxious, worry, interest, and down.
Next, make variable “depression” consisting of the sum of “interest” and “down” and make a variable “anxiety” consisting of the sum of “anxious” and “worry.”
Code
dat <- dat %>%mutate(depression = interest + down) %>%mutate(anxiety = anxious + worry)tabyl(dat$depression)
Next, dichotomize “depression” and “anxiety”. ‘phq2’ is the dichotomized depression variable. ‘gad2’ is the dichotomized anxiety variable.
Code
dat <- dat %>%mutate(phq2 =case_when(depression %in%c(0:2) ~0, depression %in%c(3:6) ~1, ) )tabyl(dat$phq2)
dat$phq2 n percent
0 112835 0.7106105
1 45951 0.2893895
Code
dat <- dat %>%mutate(gad2 =case_when(anxiety %in%c(0:2) ~0, anxiety %in%c(3:6) ~1, ) )tabyl(dat$gad2)
dat$gad2 n percent
0 101890 0.6416813
1 56896 0.3583187
Code
nrow(dat)
[1] 158786
Clean and recode ‘rentcur’
Then I will check the distribution of ‘rentcur’ and remove observations where ‘rentcur’ is missing. In the original dataset, 1 is “caught up on rent” and 2 is “Not caught up on rent.”
Then I will create a dummy variable for ‘rentcur’ so that the value of ‘1’ means ‘caught up on rent’ and the value of ‘0’ is ‘not caught up on rent’.
dat$rentcur n percent
0 18104 0.1143384
1 140233 0.8856616
Code
nrow(dat)
[1] 158337
Clean and recode ‘evict’
Checking distribution of evict.
evict: Eviction in next two months. 1 is very likely, 2 is somewhat likeley, 3 is not very likely, 4 is not likely at all, 88 and 99. Only asked if rentcur = 2.
This needs to be recoded so that 0 is current on rent (not at risk of eviction due to late on rent), 1 is not likely at all, 2 is not very likely, 3 is somewhat likely, and 4 is very likely.
Code ‘evict’ = 0 when rentcur = 1. Change other codes as well.
Code
dat <- dat %>%mutate(evict =case_when(rentcur ==1~0, evict ==4~1, evict ==3~2, evict ==2~3, evict ==1~4 ) )tabyl(dat$evict)
dat$evict n percent valid_percent
0 140233 0.885661595 0.88882199
1 4850 0.030630870 0.03074017
2 5241 0.033100286 0.03321840
3 4961 0.031331906 0.03144371
4 2489 0.015719636 0.01577573
NA 563 0.003555707 NA
Filer out NAs.
Code
dat <- dat %>%filter(evict %in%c(0:4))tabyl(dat$evict)
Creating a new variable ‘age’ by subtracting birth year from the year the data were collected. (How can I fix this since the data were collected in 2 years?)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’
dat$children n percent
0 106423 0.7077833
1 43938 0.2922167
Clean eeduc
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”.
The key thing for the model is to seee how much variation is there across states.
Scatterplot - aggregating values by state and graphing it. Calculate percent on phq2 and gad2. First do the mean on the raw HH Pulse mental health score. For each state you have the score on the eviction policy. State eviction score is the unit of analysis on X axis score. On the Y axis you have mean mental health score by state or percent with bad mental health by state.
Here I create a new dataframe that takes the data in dat and groups them by state. Then I find the mean mental health number for each state. I create a dataframe that contains the state and the mean mental health score of renters who are late on rent in each state.
ggplot(statedat, aes(x=score, y=mean_mh, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Mean Mental Health of All Renters by State",x ="State COVID Housing Policy Score",y ="Mean Mental Health")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Pearson correlation test between State COVID-19 Score and the mean mental health among all renters.
Code
test <-cor.test(statedat$score, statedat$mean_mh, method ="spearman")
Warning in cor.test.default(statedat$score, statedat$mean_mh, method =
"spearman"): Cannot compute exact p-value with ties
Code
test
Spearman's rank correlation rho
data: statedat$score and statedat$mean_mh
S = 29127, p-value = 0.02299
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.3179432
ggplot(statedat, aes(x=score, y=dep_pc, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Percent of Tenants with Depression",x ="State COVID Housing Policy Score",y ="Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Percent of Tenants with Anxiety by State
Code
ggplot(statedat, aes(x=score, y=anx_pc, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Percent of Tenants with Anxiety",x ="State COVID Housing Policy Score",y ="Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Pearson correlation test between State COVID-19 Score and the percent with depression among all renters.
Warning in cor.test.default(statedat$score, statedat$dep_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test2
Spearman's rank correlation rho
data: statedat$score and statedat$dep_pc
S = 31365, p-value = 0.002199
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.419212
Pearson correlation test between State COVID-19 Score and the percent with anxiety among all renters.
Warning in cor.test.default(statedat$score, statedat$anx_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test3
Spearman's rank correlation rho
data: statedat$score and statedat$anx_pc
S = 29660, p-value = 0.01401
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.3420937
Plot of risk of eviction and depression.
Code
ggplot(dat, aes(evict, depression)) +geom_boxplot(aes(group = evict)) +geom_smooth(method=lm) +labs(title ="Eviction Risk and Depression",x ="Risk of Eviction",y ="Depression")
`geom_smooth()` using formula = 'y ~ x'
Plot of risk of eviction and anxiety
Code
ggplot(dat, aes(evict, anxiety)) +geom_boxplot(aes(group = evict)) +geom_smooth(method=lm) +labs(title ="Eviction Risk and Anxiety",x ="Risk of Eviction",y ="Anxiety")
Warning in cor.test.default(dat$evict, dat$depression, method = "spearman"):
Cannot compute exact p-value with ties
Code
test4
Spearman's rank correlation rho
data: dat$evict and dat$depression
S = 4.7599e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.1598694
Warning in cor.test.default(dat$evict, dat$anxiety, method = "spearman"):
Cannot compute exact p-value with ties
Code
test5
Spearman's rank correlation rho
data: dat$evict and dat$anxiety
S = 4.7278e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.165534
Modelling
Plan for modelling.
Population: All renters in the HH Pulse Survey phases 3.2 to 3.4.
Dependent variables
phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.
gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.
Independent variables
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
single_adult: dichotomized singe adult or mutiple adults in household
thhld_numper: number of people in household
children: dichotomized presence of children in household
eeduc: dichotomized educational attainment
Modeling
To do:
fix scatterplot showing state scores but grouped by eviction risk (have to think about this). I need to find the percent of evcition risk by state. The variable is dichotomized, so I can find the mean for each state. But what does that variable mean?