mnhlth: An index from 0 to 12 formed by summing ‘anxious’, ‘worry’, ‘interest’, and ‘down’. The raw items are coded 1 to 4, so each must be shifted to 0 to 3 before summing for the index to run from 0 to 12.
phq4: A categorical variable for bad mental health derived from ‘anxious’, ‘worry’, ‘interest’, and ‘down’. Value can be None, Mild, Moderate, or Severe. Alternatively, I could code this as a dummy variable.
badmh: A dummy variable for bad mental health. Bad mental health (6 to 12) is 1, good mental health (0 to 5) is 0. The threshold for this can be adjusted.
Independent variables
From Household Pulse Survey
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
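evict: Likelihood of eviction in the next two months. 1 is very likely, 2 is somewhat likely, 3 is not very likely, 4 is not likely at all; 88 is missing and 99 is category not selected. Only asked if rentcur = 2 in the original coding (not caught up on rent).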
State Policy Variable
From COVID-19 Housing Policy Scorecard
score: State score on the COVID-19 Housing Policy Scorecard. Value is on a scale from 0 to 4.63.
Covariates
(from Household Pulse Survey)
income: Total household income (before taxes).
Less than $25,000
$25,000 - $34,999
$35,000 - $49,999
$50,000 - $74,999
$75,000 - $99,999
$100,000 - $149,999
$150,000 - $199,999
$200,000 and above
Question seen but category not selected
Missing / Did not report
age: Derived from tbirth_year (year of birth)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these; 88 is missing and 99 is question seen but category not selected.
race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
thhld_numper: Total number of people in household
children: Dummy variable for children in the household, derived from ‘thhld_numkid’.
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associate’s degree, 6 is bachelor’s degree, 7 is graduate degree.
Other variables for cleaning
week: week of interview (review how to use this)
anxious, worry, interest, and down: Frequency of each bad feeling over the previous 2 weeks. Used to derive mnhlth and phq4. 1 is not at all, 2 is several days, 3 is more than half the days, 4 is nearly every day, 88 is missing, 99 is category not selected.
tenure: 3 is rented. Used to select renters.
est_st: FIPS code for state, used to merge
state: Name of state
abbv: Two letter abbreviation of state
Reading in the data
Importing the data for Household Pulse Survey phase 3.2 to 3.4.
dat$phq4 n percent
1 none 64542 0.4064716
2 mild 44274 0.2788281
3 moderate 23233 0.1463164
4 severe 26737 0.1683839
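The code that produced this tabulation is not shown, so the following is only a sketch of how ‘mnhlth’ and ‘phq4’ could be derived from the four items. The toy data frame and the cut points (0 to 2 none, 3 to 5 mild, 6 to 8 moderate, 9 to 12 severe, the usual PHQ-4 bands) are assumptions; they are at least consistent with the tables here, since the moderate and severe shares add up to the share with badmh = 1.

```r
library(dplyr)

# Toy stand-in for the four Pulse items (1-4; 88 and 99 are non-response codes)
toy <- data.frame(
  anxious  = c(1, 4, 2, 3, 88),
  worry    = c(1, 4, 2, 3, 1),
  interest = c(1, 4, 3, 2, 2),
  down     = c(2, 4, 2, 2, 99)
)

toy <- toy %>%
  # Set 88/99 to NA, then shift 1-4 down to 0-3 so the sum runs 0-12
  mutate(across(c(anxious, worry, interest, down),
                ~ if_else(.x %in% c(88, 99), NA_real_, .x - 1))) %>%
  mutate(
    mnhlth = anxious + worry + interest + down,
    # Assumed PHQ-4 bands: 0-2, 3-5, 6-8, 9-12
    phq4 = cut(mnhlth,
               breaks = c(-Inf, 2, 5, 8, 12),
               labels = c("none", "mild", "moderate", "severe"))
  )

toy[, c("mnhlth", "phq4")]
```

A respondent with any non-response item gets NA for both ‘mnhlth’ and ‘phq4’ under this approach.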
Make a dummy variable for bad mental health. Bad mental health (6 to 12) is 1, good mental health (0 to 5) is 0. The threshold for this can be adjusted.
Code
dat <- dat %>%
  mutate(badmh = case_when(
    mnhlth <= 5 ~ 0,
    mnhlth >= 6 ~ 1
  ))
tabyl(dat$badmh)
dat$badmh n percent
0 108816 0.6852997
1 49970 0.3147003
Clean and recode ‘rentcur’
First I check the distribution of ‘rentcur’ and remove observations where ‘rentcur’ is missing. In the original dataset, 1 is “Caught up on rent” and 2 is “Not caught up on rent.”
Then I recode ‘rentcur’ as a dummy variable so that 1 means “caught up on rent” and 0 means “not caught up on rent”.
dat$rentcur n percent
0 18104 0.1143384
1 140233 0.8856616
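The recode described above can be sketched as follows. The toy data and the assumption that non-response in ‘rentcur’ is coded 88/99 (as for the mental health items) are mine; the notebook's actual code is not shown.

```r
library(dplyr)

# Toy stand-in: 1 = caught up, 2 = not caught up, 88/99 = assumed non-response
toy <- data.frame(rentcur = c(1, 2, 1, 88, 99, 2))

toy <- toy %>%
  # case_when() returns NA for anything not matched, so 88/99 become NA
  mutate(rentcur = case_when(
    rentcur == 1 ~ 1,
    rentcur == 2 ~ 0
  )) %>%
  filter(!is.na(rentcur))

table(toy$rentcur)
```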
Clean and recode ‘evict’
Checking distribution of evict.
evict: Likelihood of eviction in the next two months. 1 is very likely, 2 is somewhat likely, 3 is not very likely, 4 is not likely at all; 88 and 99 are non-response codes.
The “risk of eviction” question is asked only of respondents who are not caught up on rent (rentcur = 2 in the original coding). Should I recode this?
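One possible recode, offered only as a sketch: set the non-response codes to NA, reverse the scale so that higher values mean higher perceived risk, and keep a labelled ordered factor for tables and plots. The names ‘evict_risk’ and ‘evict_f’ are hypothetical and do not appear elsewhere in this notebook.

```r
# Toy stand-in for the raw item (1 = very likely ... 4 = not likely at all)
toy <- data.frame(evict = c(1, 4, 2, 88, 3, 99))

# Non-response codes become NA
toy$evict[toy$evict %in% c(88, 99)] <- NA

# Reverse-code so that 4 = very likely and 1 = not likely at all
toy$evict_risk <- 5 - toy$evict

# Labelled ordered factor in the original direction
toy$evict_f <- factor(toy$evict, levels = 1:4,
                      labels = c("very likely", "somewhat likely",
                                 "not very likely", "not likely at all"),
                      ordered = TRUE)

toy
```

Reversing the scale would flip the sign of any correlation with ‘mnhlth’, so whichever direction is chosen should be stated next to the results.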
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’
dat$children n percent
0 106465 0.707639
1 43986 0.292361
Clean eeduc
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associate’s degree, 6 is bachelor’s degree, 7 is graduate degree.
For educational attainment, collapse the seven levels into four categories: “less than high school”, “high school”, “some college”, “bachelor’s or higher”.
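A sketch of the four-category collapse. The variable name ‘educ4’ is hypothetical, and placing associate’s degree (5) under “some college” is an assumption; the notebook does not say where it belongs.

```r
library(dplyr)

toy <- data.frame(eeduc = 1:7)

toy <- toy %>%
  mutate(
    # Conditions are evaluated in order, so each level hits the first match
    educ4 = case_when(
      eeduc <= 2 ~ "less than high school",
      eeduc == 3 ~ "high school",
      eeduc <= 5 ~ "some college",          # assumption: includes associate's
      eeduc >= 6 ~ "bachelor's or higher"
    ),
    educ4 = factor(educ4, levels = c("less than high school", "high school",
                                     "some college", "bachelor's or higher"))
  )

table(toy$educ4)
```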
Checking count of observations in each state. Is this a sufficient number in each state for a multilevel model? North Dakota has 114. Vermont has 103. This is with HH Pulse phases 3.2 and 3.3.
Code
tabyl(evictdat, state)
state n percent
Alabama 232 0.013918886
Alaska 258 0.015478762
Arizona 288 0.017278618
Arkansas 262 0.015718743
California 1723 0.103371730
Colorado 259 0.015538757
Connecticut 335 0.020098392
Delaware 169 0.010139189
District of Columbia 295 0.017698584
Florida 637 0.038216943
Georgia 450 0.026997840
Hawaii 206 0.012359011
Idaho 174 0.010439165
Illinois 412 0.024718023
Indiana 257 0.015418766
Iowa 200 0.011999040
Kansas 236 0.014158867
Kentucky 212 0.012718982
Louisiana 277 0.016618671
Maine 143 0.008579314
Maryland 456 0.027357811
Massachusetts 497 0.029817615
Michigan 399 0.023938085
Minnesota 219 0.013138949
Mississippi 203 0.012179026
Missouri 259 0.015538757
Montana 143 0.008579314
Nebraska 224 0.013438925
Nevada 319 0.019138469
New Hampshire 196 0.011759059
New Jersey 346 0.020758339
New Mexico 302 0.018118551
New York 629 0.037736981
North Carolina 297 0.017818575
North Dakota 114 0.006839453
Ohio 247 0.014818814
Oklahoma 305 0.018298536
Oregon 459 0.027537797
Pennsylvania 448 0.026877850
Rhode Island 226 0.013558915
South Carolina 229 0.013738901
South Dakota 126 0.007559395
Tennessee 298 0.017878570
Texas 955 0.057295416
Utah 202 0.012119030
Vermont 103 0.006179506
Virginia 361 0.021658267
Washington 567 0.034017279
West Virginia 176 0.010559155
Wisconsin 220 0.013198944
Wyoming 118 0.007079434
Let’s see how ‘evict’ is distributed across states. Do I need to add more HH Pulse phases?
The key thing for the model is to see how much variation there is across states.
Scatterplot: aggregate values by state and plot them. For each state, compute the mean of the raw HH Pulse mental health score and the percent with bad mental health. The state is the unit of analysis: the X axis is the state's eviction policy score, and the Y axis is either the mean mental health score or the percent with bad mental health in that state.
Here I create a new dataframe that takes the data in evictdat (sample of all renters who are late on rent) and groups them by state. Then I find the mean mental health number for each state. I create a dataframe that contains the state (full name, I should change to abbreviations way up top) and the mean mental health score of renters who are late on rent in each state.
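The aggregation described here can be sketched as below. The toy data, the ‘policy’ table, and joining on ‘state’ are assumptions for illustration (the variable list says the real merge uses the FIPS code ‘est_st’); the column names ‘mean_mh’, ‘mean_bmh’, ‘score’, and ‘abbv’ follow the plots in this notebook.

```r
library(dplyr)

# Toy renters-behind-on-rent sample
evict_toy <- data.frame(
  state  = c("A", "A", "B", "B", "B"),
  mnhlth = c(2, 8, 6, 10, 11),
  badmh  = c(0, 1, 1, 1, 1)
)

# Toy state policy table
policy <- data.frame(state = c("A", "B"),
                     abbv  = c("AA", "BB"),
                     score = c(3.5, 0.5))

state_toy <- evict_toy %>%
  group_by(state) %>%
  summarise(
    n        = n(),
    mean_mh  = mean(mnhlth, na.rm = TRUE),  # mean 0-12 index
    mean_bmh = mean(badmh, na.rm = TRUE)    # share with bad mental health
  ) %>%
  left_join(policy, by = "state")

state_toy
```

Because ‘badmh’ is 0/1, its mean is a proportion; multiply by 100 if the axis should read as a percent.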
ggplot(statedat, aes(x = score, y = mean_mh, label = abbv)) +
  geom_point() +
  geom_text(hjust = -0.3, vjust = 0.5) +
  geom_smooth(method = lm) +
  labs(title = "Mental Health of Tenants Behind on Rent by State",
       x = "State COVID Housing Policy Score",
       y = "Mean Mental Health")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
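The warning arises because ‘label’ is set in the global aes(), so it is also passed to geom_smooth(), which has no use for it. A minimal sketch of one way to avoid it, using made-up data, is to map ‘label’ only inside geom_text():

```r
library(ggplot2)

# Made-up state-level data for illustration only
toy <- data.frame(
  score   = c(0.5, 1.2, 2.0, 3.1, 4.0),
  mean_mh = c(7.1, 6.8, 6.9, 6.2, 6.0),
  abbv    = c("AA", "BB", "CC", "DD", "EE")
)

p <- ggplot(toy, aes(x = score, y = mean_mh)) +
  geom_point() +
  # label is mapped only where it is used
  geom_text(aes(label = abbv), hjust = -0.3, vjust = 0.5) +
  geom_smooth(method = "lm")

print(p)
```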
Spearman rank correlation test between State COVID-19 Score and the mean mental health among renters who are late on rent.
Code
test <- cor.test(statedat$score, statedat$mean_mh, method = "spearman")
Warning in cor.test.default(statedat$score, statedat$mean_mh, method =
"spearman"): Cannot compute exact p-value with ties
Code
test
Spearman's rank correlation rho
data: statedat$score and statedat$mean_mh
S = 31628, p-value = 0.001586
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.4311286
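The “Cannot compute exact p-value with ties” warning means cor.test() fell back to an asymptotic p-value because some values are tied. Requesting the approximation explicitly with exact = FALSE avoids the warning, as this sketch with made-up vectors illustrates:

```r
# Made-up vectors containing ties
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(8, 7, 7, 5, 6, 3, 2, 1)

# exact = FALSE asks for the asymptotic p-value up front, so no warning is raised
res <- cor.test(x, y, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value
```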
Here I am using the percent with bad mental health instead of the mean mental health score.
ggplot(statedat, aes(x = score, y = mean_bmh, label = abbv)) +
  geom_point() +
  geom_text(hjust = -0.3, vjust = 0.5) +
  geom_smooth(method = lm) +
  labs(title = "Percent of Tenants Behind on Rent with Bad Mental Health",
       x = "State COVID Housing Policy Score",
       y = "Percent with Bad Mental Health")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Spearman rank correlation test between State COVID-19 Score and the percent with bad mental health among renters who are late on rent.
Warning in cor.test.default(statedat$score, statedat$mean_bmh, method =
"spearman"): Cannot compute exact p-value with ties
Code
test2
Spearman's rank correlation rho
data: statedat$score and statedat$mean_bmh
S = 30442, p-value = 0.006319
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.3774811
Plot of risk of eviction against mental health scores.
Code
ggplot(evictdat, aes(evict, mnhlth)) +
  geom_boxplot(aes(group = evict)) +
  geom_smooth(method = lm) +
  labs(title = "Eviction Risk and Mental Health For Tenants Behind on Rent",
       x = "Risk of Eviction",
       y = "Mental Health Scores")
Warning in cor.test.default(evictdat$evict, evictdat$mnhlth, method =
"spearman"): Cannot compute exact p-value with ties
Code
test3
Spearman's rank correlation rho
data: evictdat$evict and evictdat$mnhlth
S = 1.0442e+12, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.3529651
Modelling
Plan for modelling.
Independent variable:
- evict: Likelihood of eviction in the next two months. 1 is very likely, 2 is somewhat likely, 3 is not very likely, 4 is not likely at all; 88 and 99 are non-response codes. Only asked if rentcur = 2 in the original coding (not caught up on rent).
Dependent variables:
- mnhlth: An index from 0 to 12 consisting of the sum of ‘anxious’, ‘worry’, ‘interest’, and ‘down’.
- badmh: A dummy variable for bad mental health. Bad mental health (6 to 12) is 1, good mental health (0 to 5) is 0.
Covariates
income: Total household income (before taxes).
Less than $25,000
$25,000 - $34,999
$35,000 - $49,999
$50,000 - $74,999
$75,000 - $99,999
$100,000 - $149,999
$150,000 - $199,999
$200,000 and above
Question seen but category not selected
Missing / Did not report
age: Derived from tbirth_year (year of birth)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these; 88 is missing and 99 is question seen but category not selected.
race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
thhld_numper: Total number of people in household
children: Dummy variable derived from ‘thhld_numkid’
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associate’s degree, 6 is bachelor’s degree, 7 is graduate degree.
When the outcome variable is categorical but ordered, an ordered logit model is appropriate. (https://libguides.princeton.edu/R-logit)
Code
# Ordered logit model of mental health with risk of eviction.
library(MASS)  # provides polr()
evictdat$mnhlth <- as.factor(evictdat$mnhlth)
evictdat$evict <- as.factor(evictdat$evict)
m1 <- polr(mnhlth ~ evict, data = evictdat, Hess = TRUE)
summary(m1)

# Playing with covariates. Do these need to be numerical like age or binomial,
# rather than categorical like gender or race/ethnicity?
m2 <- polr(mnhlth ~ evict + age, data = evictdat, Hess = TRUE)
summary(m2)
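On the question in the comment above: covariates do not have to be numeric or binary. R's formula interface expands a factor into one dummy per non-reference level, so categorical covariates can go straight into polr(). The sketch below uses simulated data only; the variable names and the four-level outcome are stand-ins, not the notebook's data.

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 600

# Simulated covariates: one factor with 4 levels, one numeric, one 2-level factor
toy <- data.frame(
  evict  = factor(sample(1:4, n, replace = TRUE)),
  age    = round(runif(n, 18, 80)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Simulated latent distress score, cut at its quartiles into an ordered outcome
latent <- -0.5 * as.integer(toy$evict) + 0.01 * toy$age + rlogis(n)
toy$mh <- cut(latent,
              breaks = quantile(latent, c(0, 0.25, 0.5, 0.75, 1)),
              labels = c("none", "mild", "moderate", "severe"),
              include.lowest = TRUE, ordered_result = TRUE)

m <- polr(mh ~ evict + age + gender, data = toy, Hess = TRUE)

# One coefficient per non-reference factor level, plus one for age
names(coef(m))
```

Each factor coefficient is read relative to that factor's reference level (the first level, here evict = 1 and gender = female); relevel() changes the reference if a different comparison is wanted.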