phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.
gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.
See section 2.2.1 Mental Health in https://www.sciencedirect.com/science/article/pii/S0277953620307760
Independent variables
From Household Pulse Survey
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
score: State score on the COVID-19 Housing Policy Scorecard. Value is on a scale from 0 to 5, with the highest score being 4.63.
Covariates
(from Household Pulse Survey)
income: Total household income (before taxes).
Less than $25,000
$25,000 - $34,999
$35,000 - $49,999
$50,000 - $74,999
$75,000 - $99,999
$100,000 - $149,999
$150,000 - $199,999
$200,000 and above
Question seen but category not selected
Missing / Did not report
age: Derived from tbirth_year (year of birth)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these, 88, 99.
race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
thhld_numper: Total number of people in household
child: Dummy variable derived from ‘thhld_numkid’
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
Other variables for cleaning
week: week of interview (review how to use this)
anxious, worry, interest, and down. Used to derive phq4. Frequency of having bad feeling over previous 2 weeks. 1 not at all, 2 is several days, 3 is more than half the days, 4 is nearly every day, 88 is missing, 99 is category not selected.
tenure: 3 is rented. Used to select renters.
est_st: FIPS code for state, used to merge
state: Name of state
abbv: Two letter abbreviation of state
Reading in the data
Importing the data for Household Pulse Survey phase 3.2 to 3.4.
Make dichotomized variables for depression and anxiety.
The HH Pulse variables for “anxious”, “worry”, “interest”, and “down” are scored from 1 to 4, but the clinical scale is from 0 to 3. Therefore, the variables must be adjusted to the 0 to 3 scale by subtracting 1.
First, eliminate NA’s and subtract 1 from each scale of anxious, worry, interest, and down.
Next, make variable “depression” consisting of the sum of “interest” and “down” and make a variable “anxiety” consisting of the sum of “anxious” and “worry.”
Code
dat <- dat %>%mutate(depression = interest + down) %>%mutate(anxiety = anxious + worry)tabyl(dat$depression)
Next, dichotomize “depression” and “anxiety”. ‘phq2’ is the dichotomized depression variable. ‘gad2’ is the dichotomized anxiety variable.
Code
dat <- dat %>%mutate(phq2 =case_when(depression %in%c(0:2) ~0, depression %in%c(3:6) ~1, ) )tabyl(dat$phq2)
dat$phq2 n percent
0 112835 0.7106105
1 45951 0.2893895
Code
dat <- dat %>%mutate(gad2 =case_when(anxiety %in%c(0:2) ~0, anxiety %in%c(3:6) ~1, ) )tabyl(dat$gad2)
dat$gad2 n percent
0 101890 0.6416813
1 56896 0.3583187
Code
nrow(dat)
[1] 158786
Clean and recode ‘rentcur’
Then I will check the distribution of ‘rentcur’ and remove observations where ‘rentcur’ is missing. In the original dataset, 1 is “caught up on rent” and 2 is “Not caught up on rent.”
Then I will create a dummy variable for ‘rentcur’ so that the value of ‘1’ means ‘caught up on rent’ and the value of ‘0’ is ‘not caught up on rent’.
dat$rentcur n percent
0 18104 0.1143384
1 140233 0.8856616
Code
nrow(dat)
[1] 158337
Clean and recode ‘evict’
Checking distribution of evict.
evict: Eviction in next two months. 1 is very likely, 2 is somewhat likeley, 3 is not very likely, 4 is not likely at all, 88 and 99. Only asked if rentcur = 2.
This needs to be recoded so that 0 is current on rent (not at risk of eviction due to late on rent), 1 is not likely at all, 2 is not very likely, 3 is somewhat likely, and 4 is very likely.
Code ‘evict’ = 0 when rentcur = 1. Change other codes as well.
Code
dat <- dat %>%mutate(evict =case_when(rentcur ==1~0, evict ==4~1, evict ==3~2, evict ==2~3, evict ==1~4 ) )tabyl(dat$evict)
dat$evict n percent valid_percent
0 140233 0.885661595 0.88882199
1 4850 0.030630870 0.03074017
2 5241 0.033100286 0.03321840
3 4961 0.031331906 0.03144371
4 2489 0.015719636 0.01577573
NA 563 0.003555707 NA
Filer out NAs.
Code
dat <- dat %>%filter(evict %in%c(0:4))tabyl(dat$evict)
Creating a new variable ‘age’ by subtracting birth year from the year the data were collected. (How can I fix this since the data were collected in 2 years?)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’
dat$children n percent
0 106423 0.7077833
1 43938 0.2922167
Clean eeduc
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”.
The key thing for the model is to seee how much variation is there across states.
Scatterplot - aggregating values by state and graphing it. Calculate percent on phq2 and gad2. First do the mean on the raw HH Pulse mental health score. For each state you have the score on the eviction policy. State eviction score is the unit of analysis on X axis score. On the Y axis you have mean mental health score by state or percent with bad mental health by state.
Here I create a new dataframe that takes the data in dat and groups them by state. Then I find the mean mental health number for each state. I create a dataframe that contains the state and the mean mental health score of renters who are late on rent in each state.
ggplot(statedat, aes(x=score, y=mean_mh, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Mean Mental Health of All Renters by State",x ="State COVID Housing Policy Score",y ="Mean Mental Health")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Pearson correlation test between State COVID-19 Score and the mean mental health among all renters.
Code
test <-cor.test(statedat$score, statedat$mean_mh, method ="spearman")
Warning in cor.test.default(statedat$score, statedat$mean_mh, method =
"spearman"): Cannot compute exact p-value with ties
Code
test
Spearman's rank correlation rho
data: statedat$score and statedat$mean_mh
S = 29127, p-value = 0.02299
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.3179432
library(ggpubr)library(ggrepel)statebmh$evict <-as.factor(statebmh$evict)ggplot(data=statebmh, aes(x = mscore, y = anx_pc, color=evict)) +geom_point() +stat_smooth(method ="lm", size =1) +stat_regline_equation(aes(label = ..rr.label..)) +geom_text_repel(aes(anx_pc, mscore, label = abbv, color=evict), size =6)+guides(color=F)+theme_bw(base_size =18) +labs(title="Marriageable Male Effect for States Differing in Conservatism", subtitle="US-Born Men Ages 18 to 27", caption="Source: 2012 ACS 5-Year File") +xlab(label="# of Employed Men Per 100 Men in the Unmarried Population Ages 18 to 27") +ylab(label="Standardized Early Marriage Ratio") +ylim(0, 1)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
Warning: The dot-dot notation (`..rr.label..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(rr.label)` instead.
Warning: ggrepel: 80 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Percent of Tenants with Depression by State
Code
ggplot(statedat, aes(x=score, y=dep_pc, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Percent of Tenants with Depression",x ="State COVID Housing Policy Score",y ="Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Percent of Tenants with Anxiety by State
Code
ggplot(statedat, aes(x=score, y=anx_pc, label=abbv)) +geom_point() +geom_text(hjust=-0.3, vjust=0.5) +geom_smooth(method=lm) +labs(title ="Percent of Tenants with Anxiety",x ="State COVID Housing Policy Score",y ="Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Pearson correlation test between State COVID-19 Score and the percent with depression among all renters.
Warning in cor.test.default(statedat$score, statedat$dep_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test2
Spearman's rank correlation rho
data: statedat$score and statedat$dep_pc
S = 3922021, p-value = 2.832e-12
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.419212
Pearson correlation test between State COVID-19 Score and the percent with anxiety among all renters.
Warning in cor.test.default(statedat$score, statedat$anx_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test3
Spearman's rank correlation rho
data: statedat$score and statedat$anx_pc
S = 2993794, p-value = 0.1847
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.08332638
Plot of risk of eviction and depression.
Code
ggplot(dat, aes(evict, depression)) +geom_boxplot(aes(group = evict)) +geom_smooth(method=lm) +labs(title ="Eviction Risk and Depression",x ="Risk of Eviction",y ="Depression")
`geom_smooth()` using formula = 'y ~ x'
Plot of risk of eviction and anxiety
Code
ggplot(dat, aes(evict, anxiety)) +geom_boxplot(aes(group = evict)) +geom_smooth(method=lm) +labs(title ="Eviction Risk and Anxiety",x ="Risk of Eviction",y ="Anxiety")
Warning in cor.test.default(dat$evict, dat$depression, method = "spearman"):
Cannot compute exact p-value with ties
Code
test4
Spearman's rank correlation rho
data: dat$evict and dat$depression
S = 4.7599e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.1598694
Warning in cor.test.default(dat$evict, dat$anxiety, method = "spearman"):
Cannot compute exact p-value with ties
Code
test5
Spearman's rank correlation rho
data: dat$evict and dat$anxiety
S = 4.7278e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.165534
Modeling
Plan for modeling.
Population: All renters in the HH Pulse Survey phases 3.2 to 3.4.
Dependent variables
phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.
gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.
Independent variables
rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
race_eth: categorical: hispanic, nh_white, nh_black, nh_asian, other
dichotomized: hispanic, nh_white, nh_black, nh_asian, other
single_adult: dichotomized 1 = singe adult or 0 = mutiple adults in household
thhld_numper: integer: number of people in household from 1 to 10.
children: dichotomized: 1 = children in household, 0 = no children in household
eeduc: 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.
fix scatterplot showing state scores but grouped by eviction risk (have to think about this). I need to find the percent of eviction risk by state. The variable is dichotomized, so I can find the mean for each state. But what does that variable mean?
run models!
mean eviction risk by score DV, then add score, then add evict risk
DV is depression, checking variation by state.
Code
# Running Generalized linear mixed model (GLMM). Checking variance of depression by state.m1 <-glmer(phq2 ~ (1| abbv), data = dat, family = binomial)summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
180666.9 180686.7 -90331.4 180662.9 150359
Scaled residuals:
Min 1Q Median 3Q Max
-0.7735 -0.6475 -0.6060 1.4627 1.8419
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.0224 0.1497
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.86299 0.02191 -39.38 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
DV is depression, IV is state score
Code
# Outcome variable is depression and focus variable is state housing score.m2 <-glmer(phq2 ~ score + (1| abbv), data = dat, family = binomial)summary(m2)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ score + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
180659.3 180689.1 -90326.7 180653.3 150358
Scaled residuals:
Min 1Q Median 3Q Max
-0.7730 -0.6462 -0.6052 1.4615 1.8468
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01827 0.1352
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.77732 0.03283 -23.679 <2e-16 ***
score -0.04654 0.01426 -3.265 0.0011 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
score -0.795
DV is depression, IV is state score and eviction risk.
Code
# Outcome variable is depression and focus variables are state housing score and risk of eviction.m3 <-glmer(phq2 ~ score +factor(evict) + (1| abbv), data = dat, family = binomial)summary(m3)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ score + factor(evict) + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
176544.5 176613.9 -88265.3 176530.5 150354
Scaled residuals:
Min 1Q Median 3Q Max
-1.7420 -0.6229 -0.5766 1.1417 1.9222
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01744 0.1321
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.90834 0.03254 -27.915 < 2e-16 ***
score -0.03946 0.01406 -2.807 0.00500 **
factor(evict)1 0.08671 0.03363 2.579 0.00992 **
factor(evict)2 0.83823 0.02894 28.966 < 2e-16 ***
factor(evict)3 1.30674 0.02987 43.743 < 2e-16 ***
factor(evict)4 1.76857 0.04475 39.522 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) score fct()1 fct()2 fct()3
score -0.796
factr(vct)1 -0.030 -0.006
factr(vct)2 -0.040 -0.001 0.044
factr(vct)3 -0.045 0.006 0.040 0.045
factr(vct)4 -0.029 0.003 0.029 0.030 0.026
DV is anxiety, checking variance of anxiety by state.
Code
# Checking variance of anxiety by state.m4 <-glmer(gad2 ~ (1| abbv), data = dat, family = binomial)summary(m4)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: gad2 ~ (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
195998.7 196018.6 -97997.4 195994.7 150359
Scaled residuals:
Min 1Q Median 3Q Max
-0.8830 -0.7606 -0.7016 1.2813 1.5215
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01958 0.1399
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.55000 0.02049 -26.84 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
DV is anxiety, IV is state score.
Code
# Outcome variable is anxiety and focus variable is state housing score.m5 <-glmer(gad2 ~ score + (1| abbv), data = dat, family = binomial)summary(m5)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: gad2 ~ score + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
195995.6 196025.3 -97994.8 195989.6 150358
Scaled residuals:
Min 1Q Median 3Q Max
-0.8846 -0.7611 -0.7018 1.2835 1.5226
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.0175 0.1323
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.48963 0.03232 -15.151 <2e-16 ***
score -0.03273 0.01398 -2.342 0.0192 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
score -0.798
DV is anxiety, IV is state score and eviction risk.
Code
# Outcome variable is anxiety and focus variables are state housing score and risk of eviction.m6 <-glmer(gad2 ~ score +factor(evict) + (1| abbv), data = dat, family = binomial)summary(m6)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: gad2 ~ score + factor(evict) + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
191564.0 191633.4 -95775.0 191550.0 150354
Scaled residuals:
Min 1Q Median 3Q Max
-2.0263 -0.7299 -0.6706 1.3243 1.6227
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01732 0.1316
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.61512 0.03218 -19.115 < 2e-16 ***
score -0.02572 0.01390 -1.850 0.06434 .
factor(evict)1 0.08285 0.03156 2.625 0.00867 **
factor(evict)2 0.86104 0.02862 30.087 < 2e-16 ***
factor(evict)3 1.39889 0.03136 44.612 < 2e-16 ***
factor(evict)4 1.80915 0.04747 38.108 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) score fct()1 fct()2 fct()3
score -0.796
factr(vct)1 -0.028 -0.006
factr(vct)2 -0.037 0.000 0.038
factr(vct)3 -0.039 0.006 0.032 0.038
factr(vct)4 -0.028 0.007 0.024 0.029 0.032
Code
# Outcome variable is anxiety and focus variables are state housing score and risk of eviction.m7 <-glmer(phq2 ~ score +factor(evict) + (1| abbv), data = dat, family = binomial)summary(m7)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: phq2 ~ score + factor(evict) + (1 | abbv)
Data: dat
AIC BIC logLik deviance df.resid
176544.5 176613.9 -88265.3 176530.5 150354
Scaled residuals:
Min 1Q Median 3Q Max
-1.7420 -0.6229 -0.5766 1.1417 1.9222
Random effects:
Groups Name Variance Std.Dev.
abbv (Intercept) 0.01744 0.1321
Number of obs: 150361, groups: abbv, 51
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.90834 0.03254 -27.915 < 2e-16 ***
score -0.03946 0.01406 -2.807 0.00500 **
factor(evict)1 0.08671 0.03363 2.579 0.00992 **
factor(evict)2 0.83823 0.02894 28.966 < 2e-16 ***
factor(evict)3 1.30674 0.02987 43.743 < 2e-16 ***
factor(evict)4 1.76857 0.04475 39.522 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) score fct()1 fct()2 fct()3
score -0.796
factr(vct)1 -0.030 -0.006
factr(vct)2 -0.040 -0.001 0.044
factr(vct)3 -0.045 0.006 0.040 0.045
factr(vct)4 -0.029 0.003 0.029 0.030 0.026
Commenting these blocks out for now because they take a long time to run. Outcome variable is depression and input variables are all variables.
m7 <- glmer(phq2 ~ score + factor(evict) + factor(income) + age + factor(genid_describe) + factor(race_eth) + single_adult + thhld_numper + children + factor(eeduc) + (1 | abbv), data = dat, family = binomial)
summary(m7)
Outcome variable is anxiety and input variables are all variables.
m8 <- glmer(gad2 ~ score + factor(evict) + factor(income) + age + factor(genid_describe) + factor(race_eth) + single_adult + thhld_numper + children + factor(eeduc) + (1 | abbv), data = dat, family = binomial)
summary(m8)
Eviction risk as the DV.
Proportion maps
Could you determine with your file the percent of respondents in each state who say they are very or somewhat likely to leave their place in the next two months due to eviction (among all respondents)? Question - should this be among homeowners and renters? The “evict” variable only applies to renters. There is a separate question for homeowners if they will be foreclosed on in the next two months.
Code
# Determining the percent of respondents in each state who say they are very or somewhat likely to leave their place in the next two months due to eviction (among all respondents).alldata <- alldata %>%mutate(highrisk =case_when(evict %in%c(3:4) ~1,TRUE~0))# Create a dataframe called state_highrisk to create a map.all_state_highrisk <- alldata %>%group_by(abbr = abbv) %>%summarize(highrisk =100*mean(highrisk))all_state_highrisk
# A tibble: 51 × 2
abbr highrisk
<chr> <dbl>
1 AK 1.18
2 AL 1.24
3 AR 1.40
4 AZ 0.779
5 CA 1.74
6 CO 0.746
7 CT 1.50
8 DC 2.08
9 DE 1.22
10 FL 1.32
# ℹ 41 more rows
Code
# Plot a US map# Get centroidslibrary(usmapdata)library(usmap)
Attaching package: 'usmap'
The following object is masked from 'package:usmapdata':
us_map
Code
centroid_labels <- usmapdata::centroid_labels("states")# Join data to centroidsdata_labels <-merge(centroid_labels, all_state_highrisk, by ="abbr")map1dat <- data_labels %>% dplyr::select(fips, highrisk)plot_usmap(data = map1dat, values ="highrisk")+labs(title="Percent of all respondents in each state who are very or somewhat likely to be evicted in next two months.", subtitle="Highrisk", caption ="Source: Household Pulse, COVID 19 Housing Policy Score") +theme(panel.background =element_rect(colour ="black"))+scale_fill_continuous(low ="white", high ="darkred", name ="Value on Index",label = scales::comma) +theme(legend.position ="right") +guides(fill ="none") +geom_text(data = data_labels, ggplot2::aes(x = x, y = y,label = scales::number(highrisk, scale =1, accuracy = .1)), color ="black")
Code
# Determining the percent of renters in each state who say they are very or somewhat likely to leave their place in the next two months due to eviction (among all respondents).# Create a dichotomous variable called "highrisk" for renters who are very or somewhat likely to be evicted in the next 2 months.dat <- dat %>%mutate(highrisk =case_when(evict %in%c(3:4) ~1,TRUE~0))# Create a dataframe called state_highrisk to create a map.state_highrisk <- dat %>%group_by(abbr = abbv) %>%summarize(highrisk =100*mean(highrisk))state_highrisk
# A tibble: 51 × 2
abbr highrisk
<chr> <dbl>
1 AK 4.91
2 AL 7.72
3 AR 7.56
4 AZ 3.73
5 CA 4.36
6 CO 3.03
7 CT 5.04
8 DC 2.69
9 DE 5.25
10 FL 5.37
# ℹ 41 more rows
Code
# Join data to centroidsdata_labels <-merge(centroid_labels, state_highrisk, by ="abbr")map1dat <- data_labels %>% dplyr::select(fips, highrisk)plot_usmap(data = map1dat, values ="highrisk")+labs(title="Percent of renters in each state who are very or somewhat likely to be evicted in next two months.", subtitle="Highrisk", caption ="Source: Household Pulse, COVID 19 Housing Policy Score") +theme(panel.background =element_rect(colour ="black"))+scale_fill_continuous(low ="white", high ="darkred", name ="Value on Index",label = scales::comma) +theme(legend.position ="right") +guides(fill ="none") +geom_text(data = data_labels, ggplot2::aes(x = x, y = y,label = scales::number(highrisk, scale =1, accuracy = .1)), color ="black")
In addition, determine these percentages:
The percent of respondents in each state who are renting.
Code
# I'll have to use alldata for this one.# Category 3 is rented, category 4 is occupied without payment of rent. What to do about category 4?tabyl(alldata$tenure)
# A tibble: 51 × 2
abbr Renters
<chr> <dbl>
1 AK 18.5
2 AL 15.5
3 AR 17.6
4 AZ 18.6
5 CA 27.4
6 CO 18.7
7 CT 17.9
8 DC 35.2
9 DE 13.6
10 FL 19.1
# ℹ 41 more rows
Code
# Join data to centroidsdata_labels <-merge(centroid_labels, state_renting, by ="abbr")map1dat <- data_labels %>% dplyr::select(fips, Renters)plot_usmap(data = map1dat, values ="Renters")+labs(title="Percent of respondents who are renting.", subtitle="Renting", caption ="Source: Household Pulse, COVID 19 Housing Policy Score") +theme(panel.background =element_rect(colour ="black"))+scale_fill_continuous(low ="white", high ="darkred", name ="Value on Index",label = scales::comma) +theme(legend.position ="right") +guides(fill ="none") +geom_text(data = data_labels, ggplot2::aes(x = x, y = y,label = scales::number(Renters, scale =1, accuracy = .1)), color ="black")
The percent of respondents in each state who are not caught upon rent among those who are renting.
Code
# For this, I am only using category 3, renters.tabyl(dat$rentcur)
dat$rentcur n percent
0 16668 0.1108532
1 133693 0.8891468
Code
late_renters <- dat %>%group_by(abbr = abbv) %>%summarize(late_renters =100*(1-mean(rentcur)))late_renters
# A tibble: 51 × 2
abbr late_renters
<chr> <dbl>
1 AK 11.2
2 AL 15.6
3 AR 15.6
4 AZ 7.89
5 CA 10.8
6 CO 7.07
7 CT 13.8
8 DC 8.62
9 DE 14.1
10 FL 12.3
# ℹ 41 more rows
Code
# Join data to centroidsdata_labels <-merge(centroid_labels, late_renters, by ="abbr")map1dat <- data_labels %>% dplyr::select(fips, late_renters)plot_usmap(data = map1dat, values ="late_renters")+labs(title="Percent of renters who are late on rent", subtitle="Late Renters", caption ="Source: Household Pulse, COVID 19 Housing Policy Score") +theme(panel.background =element_rect(colour ="black"))+scale_fill_continuous(low ="white", high ="darkred", name ="Value on Index",label = scales::comma) +theme(legend.position ="right") +guides(fill ="none") +geom_text(data = data_labels, ggplot2::aes(x = x, y = y,label = scales::number(late_renters, scale =1, accuracy = .1)), color ="black")
The percent of respondents in each state who are very or somewhat likely to leave their place in the next two months due to eviction among those who are not caught up on rent.
Code
# Limit the data to those who are not caught up on rent.not_rentcur <- dat %>% dplyr::select(abbv, rentcur, evict) %>%filter(rentcur ==0) %>%mutate(highrisk =case_when(evict %in%c(3:4) ~1,TRUE~0))not_rentcur <- not_rentcur %>%group_by(abbr = abbv) %>%summarize(highrisk =100*mean(highrisk))not_rentcur
# A tibble: 51 × 2
abbr highrisk
<chr> <dbl>
1 AK 43.8
2 AL 49.6
3 AR 48.5
4 AZ 47.2
5 CA 40.6
6 CO 42.9
7 CT 36.4
8 DC 31.2
9 DE 37.3
10 FL 43.8
# ℹ 41 more rows
Code
# Join data to centroidsdata_labels <-merge(centroid_labels, not_rentcur, by ="abbr")map1dat <- data_labels %>% dplyr::select(fips, highrisk)plot_usmap(data = map1dat, values ="highrisk")+labs(title="Perent of late renters who are very or somewhat likely to be evicted", subtitle="High risk late renters", caption ="Source: Household Pulse, COVID 19 Housing Policy Score") +theme(panel.background =element_rect(colour ="black"))+scale_fill_continuous(low ="white", high ="darkred", name ="Value on Index",label = scales::comma) +theme(legend.position ="right") +guides(fill ="none") +geom_text(data = data_labels, ggplot2::aes(x = x, y = y,label = scales::number(highrisk, scale =1, accuracy = .1)), color ="black")