Code
<- read.csv("./data/pulse2022_puf_46.csv")
hps46
<- read.csv("./data/pulse2022_puf_47.csv")
hps47
<- read.csv("./data/pulse2022_puf_48.csv") hps48
Variable descriptions and descriptive results. Here you will describe the coding of your outcome and independent variables. You will produce and discuss summary statistics (tables) and data visualizations (figures) for your outcome and independent variables. You will also need to discuss how your sample size decreases with successive restrictions. This part of the project will alert me to potential coding issues that you need to fix prior to running models. Please send me a draft of this part (along with your revisions to the first part of the project) by April 7th or April 14th (depending on when we discuss your project proposals).
In “Evicted: Poverty and Profit in the American City,” the author Matthew Desmond observed that in some households, there isn’t enough to eat because the “rent eats first.” Housing is shelter, a basic human need. As documented by Desmond, housing is so important it can take priority over other basic needs like food. Therefore, it is reasonable to presume that anxiety and depression is elevated in low-income households that pay a high proportion of income on rent, as well as households that have fallen behind on rent. This research project will explore the relationship between rental cost burden and tenant mental health.
The project will focus on renters (rather than homeowners) for multiple reasons. Prior research shows that low-income families are more likely to rent than to own a home. Renters exhibit greater cost burden than homeowners. Furthermore, renters tend to be at greater risk of eviction than homeowners. The sample consists of respondents to the U.S. Census Bureau Household Pulse Survey Phase 3.3 who rent their dwelling.
What is the relationship between the proportion of income spent on rent and rates of depression and anxiety in the Unites States? Does income level interact with proportion of income spent on rent relative to rates of depression and anxiety?
What is the relationship between falling behind on rent and mental health?
Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
Renters who are late on rent payments exhibit higher rates of depression and anxiety. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
Household Pulse Survey Phase 3.3, December 1, 2021 – February 7, 2022 (https://www.census.gov/programs-surveys/household-pulse-survey.html)
The Law Atlas Project, State Eviction Laws, January 1, 2021 (https://lawatlas.org/datasets/state-eviction-laws)
inclvl: numerical variable derived from INCOME
trentamt: Monthly rent from TRENTAMT
rentpct: rent as a percent of income: Variable derived from INCOME and TRENTAMT
evict: Eviction in next two months
TBIRTH_YEAR: Year of birth to determine age
GENID_DESCRIBE: Current gender identity
Race/Ethnicity: Categorical variable derived from RHISPANIC and RRACE
MS: Marital status
THHLD_NUMPER: Total number of people in household
Children in household: Dichotomous variable derived from THHLD_NUMKID
EEDUC: Educational attainment
Importing the data for Household Pulse Survey phase 3.5. I had to change to phase 3.5 (instead of 3.3) because earlier phases do not include amount of rent.
<- read.csv("./data/pulse2022_puf_46.csv")
hps46
<- read.csv("./data/pulse2022_puf_47.csv")
hps47
<- read.csv("./data/pulse2022_puf_48.csv") hps48
Combining into one dataframe.
.5 <- rbind(hps46, hps47, hps48)
hps3
names(hps3.5) <- tolower(names(hps3.5))
nrow(hps3.5)
[1] 167931
Importing the data for the Law Atlas Project State Eviction Laws.
<- read_xlsx("./data/LSCEvictionLaws_StateTerritory_Data.xlsx") lawdat
New names:
• `` -> `...60`
nrow(lawdat)
[1] 53
Importing the data for OPENICPSR Eviction Moratoria data. Setting this aside for now.
# evictdat <- read_xlsx("./data/2023.02.01 Moratoria Supportive + Measures Datasets.xlsx")
# For the timeperiod of this data, only California and Massachusetts have an eviction moratorium in place.
Joining hps3.5 with lawdat.
<- merge(x=hps3.5, y=lawdat, by='est_st', all.x=TRUE)
alldata
nrow(alldata)
[1] 167931
Selecting variables from alldata for a working dataframe. I could select more variables from lawdat or I could create an index.
<- alldata %>%
dat select(week, income, tenure, trentamt, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, est_st, Jurisdictions, hweight, pweight, grace_period)
nrow(dat)
[1] 167931
Distribution of tenure. Category “3” is renters.
tabyl(dat$tenure)
dat$tenure n percent
-99 1231 0.007330392
-88 27107 0.161417487
1 39562 0.235584853
2 64143 0.381960448
3 34032 0.202654662
4 1856 0.011052158
Filter for only renters.
<- dat %>%
dat filter(.$tenure == 3)
nrow(dat)
[1] 34032
Distribution of income.
tabyl(dat$income)
dat$income n percent
-99 468 0.01375176
-88 1568 0.04607428
1 7488 0.22002821
2 4635 0.13619535
3 4740 0.13928068
4 5750 0.16895863
5 3420 0.10049365
6 3236 0.09508698
7 1360 0.03996239
8 1367 0.04016808
Filter out rows with missing income data.
<- dat %>%
dat filter(.$income %in% c(1:8))
tabyl(dat$income)
dat$income n percent
1 7488 0.23402925
2 4635 0.14486186
3 4740 0.14814352
4 5750 0.17970996
5 3420 0.10688836
6 3236 0.10113764
7 1360 0.04250531
8 1367 0.04272409
Checking sample size.
nrow(dat)
[1] 31996
Recode income with new variable ‘inclvl’.
<- dat %>%
dat mutate(inclvl = case_when(.$income == '1' ~ 12500,
$income == '2' ~ 30000,
.$income == '3' ~ 42500,
.$income == '4' ~ 62500,
.$income == '5' ~ 87500,
.$income == '6' ~ 125000,
.$income == '7' ~ 175000,
.$income == '8' ~ 200000,
.
)
)
tabyl(dat$inclvl)
dat$inclvl n percent
12500 7488 0.23402925
30000 4635 0.14486186
42500 4740 0.14814352
62500 5750 0.17970996
87500 3420 0.10688836
125000 3236 0.10113764
175000 1360 0.04250531
200000 1367 0.04272409
Checking sample size.
nrow(dat)
[1] 31996
Histogram of income levels.
hist(dat$inclvl)
Summary statistics of rent amount.
summary(dat$trentamt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-99 725 1200 1297 1775 3500
How many people reported paying zero rent?
length(which(dat$trentamt == 0))
[1] 138
Removing negative values for rent, but leaving zeros. The data dictionary does not show an NA variable. It should be coded as a positive value from 0 to 99999.
<- dat %>%
dat filter(.$trentamt >= 0)
Checking sample size.
nrow(dat)
[1] 30074
Updated summary statistics of rent amount with negative amounts removed.
summary(dat$trentamt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 800 1216 1386 1800 3500
Boxplot of rent amount.
boxplot(dat$trentamt)
Histogram of rent amount.
hist(dat$trentamt)
Making a new variable for rent as a percent of income.
$rentpct <- dat$trentamt/(dat$inclvl/12) dat
Summary statistics of rent as a percent of income.
summary(dat$rentpct)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.1920 0.2832 0.4135 0.4800 3.3600
Boxplot of rent as a percent of income.
boxplot(dat$rentpct)
Histogram of rent as a percent of income.
hist(dat$rentpct)
Checking sample size.
nrow(dat)
[1] 30074
Checking distribution of rentcur.
tabyl(dat$rentcur)
dat$rentcur n percent
-99 39 0.001296801
1 26815 0.891633970
2 3220 0.107069229
Removing missing values for rentcur.
<- dat %>%
dat filter(.$rentcur %in% c(1:2))
Checking distribution of rentcur.
tabyl(dat$rentcur)
dat$rentcur n percent
1 26815 0.8927917
2 3220 0.1072083
Dichotomize rentcur. “Caught up on rent” = 1 and “Not caught up on rent” = 0
$rentcur <- car::recode(dat$rentcur, "1=1; else=0")
dattabyl(dat$rentcur)
dat$rentcur n percent
0 3220 0.1072083
1 26815 0.8927917
Checking sample size.
nrow(dat)
[1] 30035
Checking distribution of evict.
tabyl(dat$evict)
dat$evict n percent
-99 12 0.0003995339
-88 26815 0.8927917430
1 519 0.0172798402
2 852 0.0283669053
3 920 0.0306309306
4 917 0.0305310471
89.3% of respondents did not answer this. It probably would be better to use the rentcur variable instead of evict.
First, I add the 4 mental health categories together to create a mental health index.
<- dat %>%
dat mutate(mnhlth = anxious + worry + interest + down)
tabyl(dat$mnhlth)
dat$mnhlth n percent
-396 7 2.330614e-04
-295 5 1.664724e-04
-294 1 3.329449e-05
-293 2 6.658898e-05
-196 6 1.997669e-04
-195 2 6.658898e-05
-194 3 9.988347e-05
-96 32 1.065424e-03
-95 15 4.994173e-04
-94 12 3.995339e-04
-93 18 5.993008e-04
-92 6 1.997669e-04
-91 7 2.330614e-04
-90 9 2.996504e-04
-89 1 3.329449e-05
-88 4 1.331780e-04
-87 10 3.329449e-04
4 6607 2.199767e-01
5 2351 7.827535e-02
6 2882 9.595472e-02
7 2611 8.693191e-02
8 4007 1.334110e-01
9 1869 6.222740e-02
10 1613 5.370401e-02
11 1255 4.178458e-02
12 1572 5.233894e-02
13 1042 3.469286e-02
14 1013 3.372732e-02
15 850 2.830032e-02
16 2223 7.401365e-02
Filter for results from 4 to 16.
<- dat %>%
dat filter(.$mnhlth %in% c(4:16))
tabyl(dat$mnhlth)
dat$mnhlth n percent
4 6607 0.22100686
5 2351 0.07864191
6 2882 0.09640408
7 2611 0.08733902
8 4007 0.13403579
9 1869 0.06251882
10 1613 0.05395551
11 1255 0.04198026
12 1572 0.05258404
13 1042 0.03485533
14 1013 0.03388527
15 850 0.02843285
16 2223 0.07436026
Convert to PHQ4 scale by subtracting 4 from each score.
$mnhlth <- dat$mnhlth - 4
dat
tabyl(dat$mnhlth)
dat$mnhlth n percent
0 6607 0.22100686
1 2351 0.07864191
2 2882 0.09640408
3 2611 0.08733902
4 4007 0.13403579
5 1869 0.06251882
6 1613 0.05395551
7 1255 0.04198026
8 1572 0.05258404
9 1042 0.03485533
10 1013 0.03388527
11 850 0.02843285
12 2223 0.07436026
Histogram of mental health index for all respondents.
hist(dat$mnhlth)
Coding the mental health index to PHQ-4. There are four psychological distress levels: None, Mild, Moderate, and Severe. These could be dichotomized.
<- dat %>%
dat mutate(phq4 = case_when(.$mnhlth %in% c(0:2) ~"1 none",
$mnhlth %in% c(3:5) ~"2 mild",
.$mnhlth %in% c(6:8) ~"3 moderate",
.$mnhlth %in% c(9:12) ~"4 severe",
.
)
)
tabyl(dat$phq4)
dat$phq4 n percent
1 none 11840 0.3960529
2 mild 8487 0.2838936
3 moderate 4440 0.1485198
4 severe 5128 0.1715337
Make a dummy variable for bad mental health. Bad mental (10 to 16) is 1, good mental health (4 to 9) is 0. The threshold for this can be adjusted.
<- dat %>%
dat mutate(badmnhlth = case_when(.$mnhlth <=5 ~ 0,
$mnhlth >=6 ~ 1,
.
)
)
tabyl(dat$badmnhlth)
dat$badmnhlth n percent
0 20327 0.6799465
1 9568 0.3200535
A graph of income level and rent alone. Would this work better as a series of boxplots?
plot(dat$inclvl, dat$trentamt)
Visualization of bad mental health by rent alone.
cdplot(as.factor(badmnhlth) ~ trentamt, data=dat)
Visualization of bad mental health by rent alone, faceted by income levels.
ggplot(dat, aes(trentamt, after_stat(count), fill = forcats::fct_relevel(as.factor(badmnhlth)))) +
geom_density(position = "fill") +
labs(fill = "Bad mental health") +
facet_wrap(dat$inclvl)
A graph of income level and rent as a percentage of income.
plot(dat$inclvl, dat$rentpct)
Visualization of bad mental health by rent as a percent of income.
cdplot(as.factor(badmnhlth) ~ rentpct, data=dat)
Visualization of bad mental health by rent as a percent of income, faceted by income levels.
ggplot(dat, aes(rentpct, after_stat(count), fill = forcats::fct_relevel(as.factor(badmnhlth)))) +
geom_density(position = "fill") +
labs(fill = "Bad mental health") +
facet_wrap(dat$inclvl)
Table of bad mental health by rentcur.
tabyl(dat, mnhlth, rentcur)
mnhlth 0 1
0 391 6216
1 129 2222
2 195 2687
3 197 2414
4 392 3615
5 202 1667
6 189 1424
7 173 1082
8 255 1317
9 177 865
10 184 829
11 168 682
12 544 1679
Figure out some kind of visualization for this. Stacked bars by rentcur? There are sort of too many mental health levels. It would help if there were just 4 levels (which mirrors PHQ-4). I need a distribution of mental health whether a person is current or not. So maybe I just need a basic summary table with distribution.
Checking sample size.
nrow(dat)
[1] 29895
Checking the distribution of grace_period.
tabyl(dat$grace_period)
dat$grace_period n percent
0 2471 0.082655963
1 220 0.007359090
3 8217 0.274862017
5 3771 0.126141495
6 253 0.008462954
7 3317 0.110955009
10 5480 0.183308246
12 511 0.017093159
14 4730 0.158220438
20 308 0.010302726
30 617 0.020638903
How many observations do we have by state?
tabyl(dat$Jurisdictions)
dat$Jurisdictions n percent
Alabama 320 0.010704131
Alaska 337 0.011272788
Arizona 771 0.025790266
Arkansas 317 0.010603780
California 3139 0.105000836
Colorado 746 0.024954006
Connecticut 511 0.017093159
Delaware 220 0.007359090
District of Columbia 617 0.020638903
Florida 982 0.032848302
Georgia 713 0.023850142
Hawaii 465 0.015554441
Idaho 424 0.014182974
Illinois 648 0.021675866
Indiana 459 0.015353738
Iowa 386 0.012911858
Kansas 537 0.017962870
Kentucky 390 0.013045660
Louisiana 321 0.010737582
Maine 251 0.008396053
Maryland 559 0.018698779
Massachusetts 973 0.032547249
Michigan 594 0.019869543
Minnesota 534 0.017862519
Mississippi 220 0.007359090
Missouri 512 0.017126610
Montana 287 0.009600268
Nebraska 418 0.013982271
Nevada 551 0.018431176
New Hampshire 456 0.015253387
New Jersey 484 0.016189998
New Mexico 482 0.016123098
New York 819 0.027395886
North Carolina 508 0.016992808
North Dakota 278 0.009299214
Ohio 491 0.016424151
Oklahoma 462 0.015454089
Oregon 988 0.033049005
Pennsylvania 733 0.024519150
Rhode Island 308 0.010302726
South Carolina 355 0.011874895
South Dakota 253 0.008462954
Tennessee 505 0.016892457
Texas 1534 0.051312929
Utah 683 0.022846630
Vermont 304 0.010168925
Virginia 721 0.024117745
Washington 1408 0.047098177
West Virginia 181 0.006054524
Wisconsin 529 0.017695267
Wyoming 211 0.007058036
Let’s make dummy variables for income. The dummy variables are inclvl1 to inclvl8.
tabyl(dat$inclvl)
dat$inclvl n percent
12500 6844 0.22893460
30000 4295 0.14366951
42500 4442 0.14858672
62500 5413 0.18106707
87500 3235 0.10821208
125000 3066 0.10255896
175000 1295 0.04331828
200000 1305 0.04365278
$inclvl1<- car::recode(dat$inclvl, "12500=1; else=0")
dattabyl(dat$inclvl1)
dat$inclvl1 n percent
0 23051 0.7710654
1 6844 0.2289346
$inclvl2<- car::recode(dat$inclvl, "30000=1; else=0")
dattabyl(dat$inclvl2)
dat$inclvl2 n percent
0 25600 0.8563305
1 4295 0.1436695
$inclvl3<- car::recode(dat$inclvl, "42500=1; else=0")
dattabyl(dat$inclvl3)
dat$inclvl3 n percent
0 25453 0.8514133
1 4442 0.1485867
$inclvl4<- car::recode(dat$inclvl, "62500=1; else=0")
dattabyl(dat$inclvl4)
dat$inclvl4 n percent
0 24482 0.8189329
1 5413 0.1810671
$inclvl5<- car::recode(dat$inclvl, "87500=1; else=0")
dattabyl(dat$inclvl5)
dat$inclvl5 n percent
0 26660 0.8917879
1 3235 0.1082121
$inclvl6<- car::recode(dat$inclvl, "125000=1; else=0")
dattabyl(dat$inclvl6)
dat$inclvl6 n percent
0 26829 0.897441
1 3066 0.102559
$inclvl7<- car::recode(dat$inclvl, "175000=1; else=0")
dattabyl(dat$inclvl7)
dat$inclvl7 n percent
0 28600 0.95668172
1 1295 0.04331828
$inclvl8<- car::recode(dat$inclvl, "200000=1; else=0")
dattabyl(dat$inclvl8)
dat$inclvl8 n percent
0 28590 0.95634722
1 1305 0.04365278
Let’s visualize birth year with a histogram.
hist(dat$tbirth_year)
Let’s use birth year to create a new variable for age.
$age <- 2022-dat$tbirth_year
dat
tabyl(dat$age)
dat$age n percent
18 21 0.0007024586
19 48 0.0016056197
20 77 0.0025756816
21 163 0.0054524168
22 218 0.0072921893
23 370 0.0123766516
24 489 0.0163572504
25 616 0.0206054524
26 708 0.0236828901
27 736 0.0246195016
28 768 0.0256899147
29 780 0.0260913196
30 803 0.0268606790
31 803 0.0268606790
32 817 0.0273289848
33 752 0.0251547081
34 783 0.0261916708
35 725 0.0242515471
36 720 0.0240842950
37 712 0.0238166918
38 709 0.0237163405
39 744 0.0248871049
40 651 0.0217762168
41 644 0.0215420639
42 614 0.0205385516
43 636 0.0212744606
44 610 0.0204047500
45 545 0.0182304733
46 510 0.0170597090
47 497 0.0166248537
48 500 0.0167252049
49 481 0.0160896471
50 506 0.0169259073
51 535 0.0178959692
52 562 0.0187991303
53 512 0.0171266098
54 498 0.0166583041
55 410 0.0137146680
56 453 0.0151530356
57 435 0.0145509282
58 468 0.0156547918
59 417 0.0139488209
60 467 0.0156213414
61 459 0.0153537381
62 459 0.0153537381
63 449 0.0150192340
64 399 0.0133467135
65 413 0.0138150192
66 401 0.0134136143
67 367 0.0122763004
68 393 0.0131460110
69 357 0.0119417963
70 335 0.0112058873
71 269 0.0089981602
72 304 0.0101689246
73 265 0.0088643586
74 237 0.0079277471
75 235 0.0078608463
76 185 0.0061883258
77 126 0.0042147516
78 106 0.0035457434
79 117 0.0039136979
80 88 0.0029436361
81 90 0.0030105369
82 55 0.0018397725
83 54 0.0018063221
84 44 0.0014718180
85 32 0.0010704131
86 30 0.0010035123
87 22 0.0007359090
88 91 0.0030439873
hist(dat$age)
Clean up gender identity. Check the codes in the data dictionary. I need to do dummy variables. What do I do about -99? How many categories should I do?
tabyl(dat$genid_describe)
dat$genid_describe n percent
-99 139 0.004649607
1 10680 0.357250376
2 18280 0.611473491
3 271 0.009065061
4 525 0.017561465
$male <- car::recode(dat$genid_describe, "1=1; else=0") #gen1 = male
dattabyl(dat$male)
dat$male n percent
0 19215 0.6427496
1 10680 0.3572504
$female <- car::recode(dat$genid_describe, "2=1; else=0") #gen2 = female
dattabyl(dat$female)
dat$female n percent
0 11615 0.3885265
1 18280 0.6114735
$transgender <- car::recode(dat$genid_describe, "3=1; else=0") #gen3 = transgender
dattabyl(dat$transgender)
dat$transgender n percent
0 29624 0.990934939
1 271 0.009065061
$gen_none <- car::recode(dat$genid_describe, "4=1; else=0") #gen4 = None of these
dattabyl(dat$gen_none)
dat$gen_none n percent
0 29370 0.98243853
1 525 0.01756147
$gen_notsel <- car::recode(dat$genid_describe, "-99=1; else=0") #gen5 = not selected
dattabyl(dat$gen_notsel)
dat$gen_notsel n percent
0 29756 0.995350393
1 139 0.004649607
Combine race/ethnicity into a race_eth variable.
tabyl(dat$rrace)
dat$rrace n percent
1 22450 0.75096170
2 3715 0.12426827
3 1559 0.05214919
4 2171 0.07262084
tabyl(dat$rhispanic)
dat$rhispanic n percent
1 26360 0.8817528
2 3535 0.1182472
<- dat %>%
dat mutate(race_eth = case_when(.$rhispanic == 1 & .$rrace == 1 ~"nh_white",
$rhispanic == 1 & .$rrace == 2 ~"nh_black",
.$rhispanic == 1 & .$rrace == 3 ~"nh_asian",
.$rhispanic == 1 & .$rrace == 4 ~"other",
.$rhispanic == 2 & .$rrace %in% c(1:4) ~ "hispanic"
.
)
)
tabyl(dat$race_eth)
dat$race_eth n percent
hispanic 3535 0.11824720
nh_asian 1480 0.04950661
nh_black 3500 0.11707643
nh_white 19707 0.65920723
other 1673 0.05596254
Dichotomizing race and ethnicity.
$hispanic <- car::recode(dat$race_eth, "'hispanic'=1; else=0")
dattabyl(dat$hispanic)
dat$hispanic n percent
0 26360 0.8817528
1 3535 0.1182472
$nh_white <- car::recode(dat$race_eth, "'nh_white'=1; else=0")
dattabyl(dat$nh_white)
dat$nh_white n percent
0 10188 0.3407928
1 19707 0.6592072
$nh_black <- car::recode(dat$race_eth, "'nh_black'=1; else=0")
dattabyl(dat$nh_black)
dat$nh_black n percent
0 26395 0.8829236
1 3500 0.1170764
$nh_asian <- car::recode(dat$race_eth, "'nh_asian'=1; else=0")
dattabyl(dat$nh_asian)
dat$nh_asian n percent
0 28415 0.95049339
1 1480 0.04950661
$other <- car::recode(dat$race_eth, "'other'=1; else=0")
dattabyl(dat$other)
dat$other n percent
0 28222 0.94403746
1 1673 0.05596254
For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”. This could be dichotomized.
tabyl(dat$eeduc)
dat$eeduc n percent
1 291 0.009734069
2 628 0.021006857
3 4308 0.144104365
4 7311 0.244555946
5 3149 0.105335340
6 8107 0.271182472
7 6101 0.204080950
Dichotomizing educational attainment.
$lessthanhs <- car::recode(dat$eeduc, "1=1; 2=1; else=0")
dattabyl(dat$lessthanhs)
dat$lessthanhs n percent
0 28976 0.96925907
1 919 0.03074093
$highschool <- car::recode(dat$eeduc, "3=1; else=0")
dattabyl(dat$highschool)
dat$highschool n percent
0 25587 0.8558956
1 4308 0.1441044
$somecollege <- car::recode(dat$eeduc, "4=1; 5=1; else=0")
dattabyl(dat$somecollege)
dat$somecollege n percent
0 19435 0.6501087
1 10460 0.3498913
$bachelors <- car::recode(dat$eeduc, "6=1; 7=1; else=0")
dattabyl(dat$bachelors)
dat$bachelors n percent
0 15687 0.5247366
1 14208 0.4752634
Distribution of number of people in household. I’m leaving this as an integer for now.
tabyl(dat$thhld_numper)
dat$thhld_numper n percent
1 9703 0.324569326
2 10030 0.335507610
3 4470 0.149523332
4 3097 0.103595919
5 1423 0.047599933
6 632 0.021140659
7 283 0.009466466
8 109 0.003646095
9 47 0.001572169
10 101 0.003378491
Dichotomize presence of children.
tabyl(dat$thhld_numkid)
dat$thhld_numkid n percent
0 21396 0.715704967
1 4234 0.141629035
2 2565 0.085800301
3 1054 0.035256732
4 393 0.013146011
5 253 0.008462954
Dichotomizing presence of children. Children in household is coded as “1”, no children is “0”.
$children <- car::recode(dat$thhld_numkid, "0=0; else=1")
dattabyl(dat$children)
dat$children n percent
0 21396 0.715705
1 8499 0.284295
Dichotomizing number of adults in household. Single adult is coded as “1”, multiple adults is coded as “0”.
tabyl(dat$thhld_numadlt)
dat$thhld_numadlt n percent
1 11834 0.3958521492
2 13271 0.4439203880
3 3111 0.1040642248
4 1101 0.0368289012
5 382 0.0127780565
6 110 0.0036795451
7 33 0.0011038635
8 17 0.0005686570
9 19 0.0006355578
10 17 0.0005686570
$single_adult <- car::recode(dat$thhld_numadlt, "1=1; else=0")
dattabyl(dat$single_adult)
dat$single_adult n percent
0 18061 0.6041479
1 11834 0.3958521
Let’s review my hypotheses:
Comments: I’m going to adjust this one to the lowest 4 income brackets, because the median household income in 2020 was $67,521. The median income is encompassed in the 4th income bracket, so this will capture all households from the lowest incomes to $74,999. Furthermore, instead of using rent divided by income, I will just use rent. This is because income is divided into wide categories and it is not possible to get a precise measurement of rent as a proportion of income. My options for DVs are mental health on 3 scales: 1) a categorical variable from 0 to 12, 2) PHQ4 which is a categorical variable of 4 measurements, or 3) a dichotomous variable for bad mental health, split down the middle (or it can be divided another way, such as the most severe level).
Statistical method: Population: limited to bottom 4 categories of income. IV: Rent amount an integer. DV: Mental health but which variable?
Statistical method: Limit to lowest 4 income brackets. IV: Grace period by state. DV: Mental health but which variable?
Statistical method: IV: Current on rent, which is dichotomous. DV: Mental health, but which variable?
Statistical method: Population: only renters who are late on rent. IV: Grace period by state. DV: Mental health, but which variable.
tabyl(dat$income)
dat$income n percent
1 6844 0.22893460
2 4295 0.14366951
3 4442 0.14858672
4 5413 0.18106707
5 3235 0.10821208
6 3066 0.10255896
7 1295 0.04331828
8 1305 0.04365278
nrow(dat)
[1] 29895
<- dat %>%
dat2 filter(income %in% c(1:4))
tabyl(dat2$income)
dat2$income n percent
1 6844 0.3259979
2 4295 0.2045823
3 4442 0.2115843
4 5413 0.2578356
nrow(dat2)
[1] 20994
#Logit models of bad mental health - adapt for my project
#brfss21malesm$X_AGE80SQ<-brfss21malesm$X_AGE80*brfss21malesm$X_AGE80
#model <- glm(badmentalhealth ~ X_AGE80, family = "binomial", data = brfss21malesm)
#summary(model)
Next:
Notes:
Income is by household.
I made multiple variables for mental health. There is a mental health variable with levels from 4 to 16. (Do I need to adjust this to 0 to 12?) PHQ4 is a four level screening variable. Bad mental health is a dichotomized good/bad mental health variable.
age
gender identity
race/ethnicity
number of people in household
presence of children in household
number of adults in household
Next steps:
Clean up and visualize grace period and eviction risk. Rentcur has many more responses than evict. Check to see how many responses to evict in each state.
Start modelling. Where do I start and what steps do I take?