Stats II Project Part 3

Author

Brian Surratt

Published

April 25, 2023

Variable descriptions and descriptive results. Here you will describe the coding of your outcome and independent variables. You will produce and discuss summary statistics (tables) and data visualizations (figures) for your outcome and independent variables. You will also need to discuss how your sample size decreases with successive restrictions. This part of the project will alert me to potential coding issues that you need to fix prior to running models. Please send me a draft of this part (along with your revisions to the first part of the project) by April 7th or April 14th (depending on when we discuss your project proposals).

Rental Burden and Mental Health

In “Evicted: Poverty and Profit in the American City,” the author Matthew Desmond observed that in some households, there isn’t enough to eat because the “rent eats first.” Housing is shelter, a basic human need. As documented by Desmond, housing is so important it can take priority over other basic needs like food. Therefore, it is reasonable to presume that anxiety and depression is elevated in low-income households that pay a high proportion of income on rent, as well as households that have fallen behind on rent. This research project will explore the relationship between rental cost burden and tenant mental health.

The project will focus on renters (rather than homeowners) for multiple reasons. Prior research shows that low-income families are more likely to rent than to own a home. Renters exhibit greater cost burden than homeowners. Furthermore, renters tend to be at greater risk of eviction than homeowners. The sample consists of respondents to the U.S. Census Bureau Household Pulse Survey Phase 3.3 who rent their dwelling.

Research questions:

  1. What is the relationship between the proportion of income spent on rent and rates of depression and anxiety in the Unites States? Does income level interact with proportion of income spent on rent relative to rates of depression and anxiety?

    1. Do state landlord-tenant laws affect these relationships
  2. What is the relationship between falling behind on rent and mental health?

    1. Do state landlord-tenant laws affect these relationships?

Hypotheses

  1. Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

  2. Renters who are late on rent payments exhibit higher rates of depression and anxiety. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

Data sources

Independent variables (from Household Pulse Survey)

  • inclvl: numerical variable derived from INCOME

  • trentamt: Monthly rent from TRENTAMT

  • rentpct: rent as a percent of income: Variable derived from INCOME and TRENTAMT

  • evict: Eviction in next two months

Dependent variables (from Household Pulse Survey)

  • badmnhlth: a dummy variable for bad mental health derived from ANXIOUS, WORRY, INTEREST, and DOWN

Control variables (from Household Pulse Survey)

  • TBIRTH_YEAR: Year of birth to determine age

  • GENID_DESCRIBE: Current gender identity

  • Race/Ethnicity: Categorical variable derived from RHISPANIC and RRACE

  • MS: Marital status

  • THHLD_NUMPER: Total number of people in household

  • Children in household: Dichotomous variable derived from THHLD_NUMKID

  • EEDUC: Educational attainment

State Policy Variables

  • grace_period: Grace period for paying rent prior to eviction, a numerical variable derived from Question 14 on State Eviction Laws database

Reading in the data

Importing the data for Household Pulse Survey phase 3.5. I had to change to phase 3.5 (instead of 3.3) because earlier phases do not include amount of rent.

Code
hps46 <- read.csv("./data/pulse2022_puf_46.csv")

hps47 <- read.csv("./data/pulse2022_puf_47.csv")

hps48 <- read.csv("./data/pulse2022_puf_48.csv")

Combining into one dataframe.

Code
hps3.5 <- rbind(hps46, hps47, hps48)

names(hps3.5) <- tolower(names(hps3.5))

nrow(hps3.5)
[1] 167931

Importing the data for the Law Atlas Project State Eviction Laws.

Code
lawdat <- read_xlsx("./data/LSCEvictionLaws_StateTerritory_Data.xlsx")
New names:
• `` -> `...60`
Code
nrow(lawdat)
[1] 53

Importing the data for OPENICPSR Eviction Moratoria data. Setting this aside for now.

Code
# evictdat <- read_xlsx("./data/2023.02.01 Moratoria Supportive + Measures Datasets.xlsx")

# For the timeperiod of this data, only California and Massachusetts have an eviction moratorium in place.

Joining hps3.5 with lawdat.

Code
alldata <- merge(x=hps3.5, y=lawdat, by='est_st', all.x=TRUE)

nrow(alldata)
[1] 167931

Selecting variables from alldata for a working dataframe. I could select more variables from lawdat or I could create an index.

Code
dat <- alldata %>% 
  select(week, income, tenure, trentamt, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, est_st, Jurisdictions, hweight, pweight, grace_period)

nrow(dat)
[1] 167931

Cleaning and exploring the data

Distribution of tenure. Category “3” is renters.

Code
tabyl(dat$tenure)
 dat$tenure     n     percent
        -99  1231 0.007330392
        -88 27107 0.161417487
          1 39562 0.235584853
          2 64143 0.381960448
          3 34032 0.202654662
          4  1856 0.011052158

Filter for only renters.

Code
dat <- dat %>%
  filter(.$tenure == 3)

nrow(dat)
[1] 34032

Distribution of income.

Code
tabyl(dat$income)
 dat$income    n    percent
        -99  468 0.01375176
        -88 1568 0.04607428
          1 7488 0.22002821
          2 4635 0.13619535
          3 4740 0.13928068
          4 5750 0.16895863
          5 3420 0.10049365
          6 3236 0.09508698
          7 1360 0.03996239
          8 1367 0.04016808

Filter out rows with missing income data.

Code
dat <- dat %>% 
  filter(.$income %in% c(1:8))

tabyl(dat$income)
 dat$income    n    percent
          1 7488 0.23402925
          2 4635 0.14486186
          3 4740 0.14814352
          4 5750 0.17970996
          5 3420 0.10688836
          6 3236 0.10113764
          7 1360 0.04250531
          8 1367 0.04272409

Checking sample size.

Code
nrow(dat)
[1] 31996

Recode income with new variable ‘inclvl’.

Code
dat <- dat %>%
  mutate(inclvl = case_when(.$income == '1' ~ 12500,
                            .$income == '2' ~ 30000,
                            .$income == '3' ~ 42500,
                            .$income == '4' ~ 62500,
                            .$income == '5' ~ 87500,
                            .$income == '6' ~ 125000,
                            .$income == '7' ~ 175000,
                            .$income == '8' ~ 200000,
                              )
        )

tabyl(dat$inclvl)
 dat$inclvl    n    percent
      12500 7488 0.23402925
      30000 4635 0.14486186
      42500 4740 0.14814352
      62500 5750 0.17970996
      87500 3420 0.10688836
     125000 3236 0.10113764
     175000 1360 0.04250531
     200000 1367 0.04272409

Checking sample size.

Code
nrow(dat)
[1] 31996

Histogram of income levels.

Code
hist(dat$inclvl)

Summary statistics of rent amount.

Code
summary(dat$trentamt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    -99     725    1200    1297    1775    3500 

How many people reported paying zero rent?

Code
length(which(dat$trentamt == 0))
[1] 138

Removing negative values for rent, but leaving zeros. The data dictionary does not show an NA variable. It should be coded as a positive value from 0 to 99999.

Code
dat <- dat %>% 
  filter(.$trentamt >= 0)

Checking sample size.

Code
nrow(dat)
[1] 30074

Updated summary statistics of rent amount with negative amounts removed.

Code
summary(dat$trentamt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0     800    1216    1386    1800    3500 

Boxplot of rent amount.

Code
boxplot(dat$trentamt)

Histogram of rent amount.

Code
hist(dat$trentamt)

Making a new variable for rent as a percent of income.

Code
dat$rentpct <- dat$trentamt/(dat$inclvl/12)

Summary statistics of rent as a percent of income.

Code
summary(dat$rentpct)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1920  0.2832  0.4135  0.4800  3.3600 

Boxplot of rent as a percent of income.

Code
boxplot(dat$rentpct)

Histogram of rent as a percent of income.

Code
hist(dat$rentpct)

Checking sample size.

Code
nrow(dat)
[1] 30074

Checking distribution of rentcur.

Code
tabyl(dat$rentcur)
 dat$rentcur     n     percent
         -99    39 0.001296801
           1 26815 0.891633970
           2  3220 0.107069229

Removing missing values for rentcur.

Code
dat <- dat %>% 
  filter(.$rentcur %in% c(1:2))

Checking distribution of rentcur.

Code
tabyl(dat$rentcur)
 dat$rentcur     n   percent
           1 26815 0.8927917
           2  3220 0.1072083

Dichotomize rentcur. “Caught up on rent” = 1 and “Not caught up on rent” = 0

Code
dat$rentcur <- car::recode(dat$rentcur, "1=1; else=0")
tabyl(dat$rentcur)
 dat$rentcur     n   percent
           0  3220 0.1072083
           1 26815 0.8927917

Checking sample size.

Code
nrow(dat)
[1] 30035

Checking distribution of evict.

Code
tabyl(dat$evict)
 dat$evict     n      percent
       -99    12 0.0003995339
       -88 26815 0.8927917430
         1   519 0.0172798402
         2   852 0.0283669053
         3   920 0.0306309306
         4   917 0.0305310471

89.3% of respondents did not answer this. It probably would be better to use the rentcur variable instead of evict.


Mental health.

First, I add the 4 mental health categories together to create a mental health index.

Code
dat <- dat %>% 
  mutate(mnhlth = anxious + worry + interest + down)

tabyl(dat$mnhlth)
 dat$mnhlth    n      percent
       -396    7 2.330614e-04
       -295    5 1.664724e-04
       -294    1 3.329449e-05
       -293    2 6.658898e-05
       -196    6 1.997669e-04
       -195    2 6.658898e-05
       -194    3 9.988347e-05
        -96   32 1.065424e-03
        -95   15 4.994173e-04
        -94   12 3.995339e-04
        -93   18 5.993008e-04
        -92    6 1.997669e-04
        -91    7 2.330614e-04
        -90    9 2.996504e-04
        -89    1 3.329449e-05
        -88    4 1.331780e-04
        -87   10 3.329449e-04
          4 6607 2.199767e-01
          5 2351 7.827535e-02
          6 2882 9.595472e-02
          7 2611 8.693191e-02
          8 4007 1.334110e-01
          9 1869 6.222740e-02
         10 1613 5.370401e-02
         11 1255 4.178458e-02
         12 1572 5.233894e-02
         13 1042 3.469286e-02
         14 1013 3.372732e-02
         15  850 2.830032e-02
         16 2223 7.401365e-02

Filter for results from 4 to 16.

Code
dat <- dat %>% 
  filter(.$mnhlth %in% c(4:16))

tabyl(dat$mnhlth)
 dat$mnhlth    n    percent
          4 6607 0.22100686
          5 2351 0.07864191
          6 2882 0.09640408
          7 2611 0.08733902
          8 4007 0.13403579
          9 1869 0.06251882
         10 1613 0.05395551
         11 1255 0.04198026
         12 1572 0.05258404
         13 1042 0.03485533
         14 1013 0.03388527
         15  850 0.02843285
         16 2223 0.07436026

Convert to PHQ4 scale by subtracting 4 from each score.

Code
dat$mnhlth <- dat$mnhlth - 4

tabyl(dat$mnhlth)
 dat$mnhlth    n    percent
          0 6607 0.22100686
          1 2351 0.07864191
          2 2882 0.09640408
          3 2611 0.08733902
          4 4007 0.13403579
          5 1869 0.06251882
          6 1613 0.05395551
          7 1255 0.04198026
          8 1572 0.05258404
          9 1042 0.03485533
         10 1013 0.03388527
         11  850 0.02843285
         12 2223 0.07436026

Histogram of mental health index for all respondents.

Code
hist(dat$mnhlth)

Coding the mental health index to PHQ-4. There are four psychological distress levels: None, Mild, Moderate, and Severe. These could be dichotomized.

Code
dat <- dat %>%
  mutate(phq4 = case_when(.$mnhlth %in% c(0:2) ~"1 none",
                          .$mnhlth %in% c(3:5) ~"2 mild",
                          .$mnhlth %in% c(6:8) ~"3 moderate",
                          .$mnhlth %in% c(9:12) ~"4 severe",
                          )
         )

tabyl(dat$phq4)
   dat$phq4     n   percent
     1 none 11840 0.3960529
     2 mild  8487 0.2838936
 3 moderate  4440 0.1485198
   4 severe  5128 0.1715337

Make a dummy variable for bad mental health. Bad mental (10 to 16) is 1, good mental health (4 to 9) is 0. The threshold for this can be adjusted.

Code
dat <- dat %>%
  mutate(badmnhlth = case_when(.$mnhlth <=5 ~ 0,
                               .$mnhlth >=6 ~ 1,
                               )
         )

tabyl(dat$badmnhlth)
 dat$badmnhlth     n   percent
             0 20327 0.6799465
             1  9568 0.3200535

Visualizations

A graph of income level and rent alone. Would this work better as a series of boxplots?

Code
plot(dat$inclvl, dat$trentamt)

Visualization of bad mental health by rent alone.

Code
cdplot(as.factor(badmnhlth) ~ trentamt, data=dat)

Visualization of bad mental health by rent alone, faceted by income levels.

Code
ggplot(dat, aes(trentamt, after_stat(count), fill = forcats::fct_relevel(as.factor(badmnhlth)))) +
  geom_density(position = "fill") +
  labs(fill = "Bad mental health") +
  facet_wrap(dat$inclvl)

A graph of income level and rent as a percentage of income.

Code
plot(dat$inclvl, dat$rentpct)

Visualization of bad mental health by rent as a percent of income.

Code
cdplot(as.factor(badmnhlth) ~ rentpct, data=dat)

Visualization of bad mental health by rent as a percent of income, faceted by income levels.

Code
ggplot(dat, aes(rentpct, after_stat(count), fill = forcats::fct_relevel(as.factor(badmnhlth)))) +
  geom_density(position = "fill") +
  labs(fill = "Bad mental health") +
  facet_wrap(dat$inclvl)

Table of bad mental health by rentcur.

Code
tabyl(dat, mnhlth, rentcur)
 mnhlth   0    1
      0 391 6216
      1 129 2222
      2 195 2687
      3 197 2414
      4 392 3615
      5 202 1667
      6 189 1424
      7 173 1082
      8 255 1317
      9 177  865
     10 184  829
     11 168  682
     12 544 1679

Figure out some kind of visualization for this. Stacked bars by rentcur? There are sort of too many mental health levels. It would help if there were just 4 levels (which mirrors PHQ-4). I need a distribution of mental health whether a person is current or not. So maybe I just need a basic summary table with distribution.

Checking sample size.

Code
nrow(dat)
[1] 29895

Checking the distribution of grace_period.

Code
tabyl(dat$grace_period)
 dat$grace_period    n     percent
                0 2471 0.082655963
                1  220 0.007359090
                3 8217 0.274862017
                5 3771 0.126141495
                6  253 0.008462954
                7 3317 0.110955009
               10 5480 0.183308246
               12  511 0.017093159
               14 4730 0.158220438
               20  308 0.010302726
               30  617 0.020638903

How many observations do we have by state?

Code
tabyl(dat$Jurisdictions)
    dat$Jurisdictions    n     percent
              Alabama  320 0.010704131
               Alaska  337 0.011272788
              Arizona  771 0.025790266
             Arkansas  317 0.010603780
           California 3139 0.105000836
             Colorado  746 0.024954006
          Connecticut  511 0.017093159
             Delaware  220 0.007359090
 District of Columbia  617 0.020638903
              Florida  982 0.032848302
              Georgia  713 0.023850142
               Hawaii  465 0.015554441
                Idaho  424 0.014182974
             Illinois  648 0.021675866
              Indiana  459 0.015353738
                 Iowa  386 0.012911858
               Kansas  537 0.017962870
             Kentucky  390 0.013045660
            Louisiana  321 0.010737582
                Maine  251 0.008396053
             Maryland  559 0.018698779
        Massachusetts  973 0.032547249
             Michigan  594 0.019869543
            Minnesota  534 0.017862519
          Mississippi  220 0.007359090
             Missouri  512 0.017126610
              Montana  287 0.009600268
             Nebraska  418 0.013982271
               Nevada  551 0.018431176
        New Hampshire  456 0.015253387
           New Jersey  484 0.016189998
           New Mexico  482 0.016123098
             New York  819 0.027395886
       North Carolina  508 0.016992808
         North Dakota  278 0.009299214
                 Ohio  491 0.016424151
             Oklahoma  462 0.015454089
               Oregon  988 0.033049005
         Pennsylvania  733 0.024519150
         Rhode Island  308 0.010302726
       South Carolina  355 0.011874895
         South Dakota  253 0.008462954
            Tennessee  505 0.016892457
                Texas 1534 0.051312929
                 Utah  683 0.022846630
              Vermont  304 0.010168925
             Virginia  721 0.024117745
           Washington 1408 0.047098177
        West Virginia  181 0.006054524
            Wisconsin  529 0.017695267
              Wyoming  211 0.007058036

Cleaning and dichotomizing income

Let’s make dummy variables for income. The dummy variables are inclvl1 to inclvl8.

Code
tabyl(dat$inclvl)
 dat$inclvl    n    percent
      12500 6844 0.22893460
      30000 4295 0.14366951
      42500 4442 0.14858672
      62500 5413 0.18106707
      87500 3235 0.10821208
     125000 3066 0.10255896
     175000 1295 0.04331828
     200000 1305 0.04365278
Code
dat$inclvl1<- car::recode(dat$inclvl, "12500=1; else=0")
tabyl(dat$inclvl1)
 dat$inclvl1     n   percent
           0 23051 0.7710654
           1  6844 0.2289346
Code
dat$inclvl2<- car::recode(dat$inclvl, "30000=1; else=0")
tabyl(dat$inclvl2)
 dat$inclvl2     n   percent
           0 25600 0.8563305
           1  4295 0.1436695
Code
dat$inclvl3<- car::recode(dat$inclvl, "42500=1; else=0")
tabyl(dat$inclvl3)
 dat$inclvl3     n   percent
           0 25453 0.8514133
           1  4442 0.1485867
Code
dat$inclvl4<- car::recode(dat$inclvl, "62500=1; else=0")
tabyl(dat$inclvl4)
 dat$inclvl4     n   percent
           0 24482 0.8189329
           1  5413 0.1810671
Code
dat$inclvl5<- car::recode(dat$inclvl, "87500=1; else=0")
tabyl(dat$inclvl5)
 dat$inclvl5     n   percent
           0 26660 0.8917879
           1  3235 0.1082121
Code
dat$inclvl6<- car::recode(dat$inclvl, "125000=1; else=0")
tabyl(dat$inclvl6)
 dat$inclvl6     n  percent
           0 26829 0.897441
           1  3066 0.102559
Code
dat$inclvl7<- car::recode(dat$inclvl, "175000=1; else=0")
tabyl(dat$inclvl7)
 dat$inclvl7     n    percent
           0 28600 0.95668172
           1  1295 0.04331828
Code
dat$inclvl8<- car::recode(dat$inclvl, "200000=1; else=0")
tabyl(dat$inclvl8)
 dat$inclvl8     n    percent
           0 28590 0.95634722
           1  1305 0.04365278

Cleaning and recoding age

Let’s visualize birth year with a histogram.

Code
hist(dat$tbirth_year)

Let’s use birth year to create a new variable for age.

Code
dat$age <- 2022-dat$tbirth_year

tabyl(dat$age)
 dat$age   n      percent
      18  21 0.0007024586
      19  48 0.0016056197
      20  77 0.0025756816
      21 163 0.0054524168
      22 218 0.0072921893
      23 370 0.0123766516
      24 489 0.0163572504
      25 616 0.0206054524
      26 708 0.0236828901
      27 736 0.0246195016
      28 768 0.0256899147
      29 780 0.0260913196
      30 803 0.0268606790
      31 803 0.0268606790
      32 817 0.0273289848
      33 752 0.0251547081
      34 783 0.0261916708
      35 725 0.0242515471
      36 720 0.0240842950
      37 712 0.0238166918
      38 709 0.0237163405
      39 744 0.0248871049
      40 651 0.0217762168
      41 644 0.0215420639
      42 614 0.0205385516
      43 636 0.0212744606
      44 610 0.0204047500
      45 545 0.0182304733
      46 510 0.0170597090
      47 497 0.0166248537
      48 500 0.0167252049
      49 481 0.0160896471
      50 506 0.0169259073
      51 535 0.0178959692
      52 562 0.0187991303
      53 512 0.0171266098
      54 498 0.0166583041
      55 410 0.0137146680
      56 453 0.0151530356
      57 435 0.0145509282
      58 468 0.0156547918
      59 417 0.0139488209
      60 467 0.0156213414
      61 459 0.0153537381
      62 459 0.0153537381
      63 449 0.0150192340
      64 399 0.0133467135
      65 413 0.0138150192
      66 401 0.0134136143
      67 367 0.0122763004
      68 393 0.0131460110
      69 357 0.0119417963
      70 335 0.0112058873
      71 269 0.0089981602
      72 304 0.0101689246
      73 265 0.0088643586
      74 237 0.0079277471
      75 235 0.0078608463
      76 185 0.0061883258
      77 126 0.0042147516
      78 106 0.0035457434
      79 117 0.0039136979
      80  88 0.0029436361
      81  90 0.0030105369
      82  55 0.0018397725
      83  54 0.0018063221
      84  44 0.0014718180
      85  32 0.0010704131
      86  30 0.0010035123
      87  22 0.0007359090
      88  91 0.0030439873
Code
hist(dat$age)

Recoding and dichotomizing gender identity.

Clean up gender identity. Check the codes in the data dictionary. I need to do dummy variables. What do I do about -99? How many categories should I do?

Code
tabyl(dat$genid_describe)
 dat$genid_describe     n     percent
                -99   139 0.004649607
                  1 10680 0.357250376
                  2 18280 0.611473491
                  3   271 0.009065061
                  4   525 0.017561465
Code
dat$male <- car::recode(dat$genid_describe, "1=1; else=0") #gen1 = male
tabyl(dat$male)
 dat$male     n   percent
        0 19215 0.6427496
        1 10680 0.3572504
Code
dat$female <- car::recode(dat$genid_describe, "2=1; else=0") #gen2 = female
tabyl(dat$female)
 dat$female     n   percent
          0 11615 0.3885265
          1 18280 0.6114735
Code
dat$transgender <- car::recode(dat$genid_describe, "3=1; else=0") #gen3 = transgender
tabyl(dat$transgender)
 dat$transgender     n     percent
               0 29624 0.990934939
               1   271 0.009065061
Code
dat$gen_none <- car::recode(dat$genid_describe, "4=1; else=0") #gen4 = None of these
tabyl(dat$gen_none)
 dat$gen_none     n    percent
            0 29370 0.98243853
            1   525 0.01756147
Code
dat$gen_notsel <- car::recode(dat$genid_describe, "-99=1; else=0") #gen5 = not selected
tabyl(dat$gen_notsel)
 dat$gen_notsel     n     percent
              0 29756 0.995350393
              1   139 0.004649607

Recoding and dichotomizing race and ethnicity.

Combine race/ethnicity into a race_eth variable.

Code
tabyl(dat$rrace)
 dat$rrace     n    percent
         1 22450 0.75096170
         2  3715 0.12426827
         3  1559 0.05214919
         4  2171 0.07262084
Code
tabyl(dat$rhispanic)
 dat$rhispanic     n   percent
             1 26360 0.8817528
             2  3535 0.1182472
Code
dat <- dat %>%
  mutate(race_eth = case_when(.$rhispanic == 1 & .$rrace == 1 ~"nh_white",
                              .$rhispanic == 1 & .$rrace == 2 ~"nh_black",
                              .$rhispanic == 1 & .$rrace == 3 ~"nh_asian",
                              .$rhispanic == 1 & .$rrace == 4 ~"other",
                              .$rhispanic == 2 & .$rrace %in% c(1:4) ~ "hispanic"
                              )
        )

tabyl(dat$race_eth)
 dat$race_eth     n    percent
     hispanic  3535 0.11824720
     nh_asian  1480 0.04950661
     nh_black  3500 0.11707643
     nh_white 19707 0.65920723
        other  1673 0.05596254

Dichotomizing race and ethnicity.

Code
dat$hispanic <- car::recode(dat$race_eth, "'hispanic'=1; else=0")
tabyl(dat$hispanic)
 dat$hispanic     n   percent
            0 26360 0.8817528
            1  3535 0.1182472
Code
dat$nh_white <- car::recode(dat$race_eth, "'nh_white'=1; else=0")
tabyl(dat$nh_white)
 dat$nh_white     n   percent
            0 10188 0.3407928
            1 19707 0.6592072
Code
dat$nh_black <- car::recode(dat$race_eth, "'nh_black'=1; else=0")
tabyl(dat$nh_black)
 dat$nh_black     n   percent
            0 26395 0.8829236
            1  3500 0.1170764
Code
dat$nh_asian <- car::recode(dat$race_eth, "'nh_asian'=1; else=0")
tabyl(dat$nh_asian)
 dat$nh_asian     n    percent
            0 28415 0.95049339
            1  1480 0.04950661
Code
dat$other <- car::recode(dat$race_eth, "'other'=1; else=0")
tabyl(dat$other)
 dat$other     n    percent
         0 28222 0.94403746
         1  1673 0.05596254

Cleaning and recoding educational attainment.

For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”. This could be dichotomized.

Code
tabyl(dat$eeduc)
 dat$eeduc    n     percent
         1  291 0.009734069
         2  628 0.021006857
         3 4308 0.144104365
         4 7311 0.244555946
         5 3149 0.105335340
         6 8107 0.271182472
         7 6101 0.204080950

Dichotomizing educational attainment.

Code
dat$lessthanhs <- car::recode(dat$eeduc, "1=1; 2=1; else=0")
tabyl(dat$lessthanhs)
 dat$lessthanhs     n    percent
              0 28976 0.96925907
              1   919 0.03074093
Code
dat$highschool <- car::recode(dat$eeduc, "3=1; else=0")
tabyl(dat$highschool)
 dat$highschool     n   percent
              0 25587 0.8558956
              1  4308 0.1441044
Code
dat$somecollege <- car::recode(dat$eeduc, "4=1; 5=1; else=0")
tabyl(dat$somecollege)
 dat$somecollege     n   percent
               0 19435 0.6501087
               1 10460 0.3498913
Code
dat$bachelors <- car::recode(dat$eeduc, "6=1; 7=1; else=0")
tabyl(dat$bachelors)
 dat$bachelors     n   percent
             0 15687 0.5247366
             1 14208 0.4752634

Number of people in the household.

Distribution of number of people in household. I’m leaving this as an integer for now.

Code
tabyl(dat$thhld_numper)
 dat$thhld_numper     n     percent
                1  9703 0.324569326
                2 10030 0.335507610
                3  4470 0.149523332
                4  3097 0.103595919
                5  1423 0.047599933
                6   632 0.021140659
                7   283 0.009466466
                8   109 0.003646095
                9    47 0.001572169
               10   101 0.003378491

Visualizing and dichotomizing presence of children

Dichotomize presence of children.

Code
tabyl(dat$thhld_numkid)
 dat$thhld_numkid     n     percent
                0 21396 0.715704967
                1  4234 0.141629035
                2  2565 0.085800301
                3  1054 0.035256732
                4   393 0.013146011
                5   253 0.008462954

Dichotomizing presence of children. Children in household is coded as “1”, no children is “0”.

Code
dat$children <- car::recode(dat$thhld_numkid, "0=0; else=1")
tabyl(dat$children)
 dat$children     n  percent
            0 21396 0.715705
            1  8499 0.284295

Visualizing and dichotomizing households with one adult versus multiple adults.

Dichotomizing number of adults in household. Single adult is coded as “1”, multiple adults is coded as “0”.

Code
tabyl(dat$thhld_numadlt)
 dat$thhld_numadlt     n      percent
                 1 11834 0.3958521492
                 2 13271 0.4439203880
                 3  3111 0.1040642248
                 4  1101 0.0368289012
                 5   382 0.0127780565
                 6   110 0.0036795451
                 7    33 0.0011038635
                 8    17 0.0005686570
                 9    19 0.0006355578
                10    17 0.0005686570
Code
dat$single_adult <- car::recode(dat$thhld_numadlt, "1=1; else=0")
tabyl(dat$single_adult)
 dat$single_adult     n   percent
                0 18061 0.6041479
                1 11834 0.3958521

Modelling

Let’s review my hypotheses:

  1. Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety.

Comments: I’m going to adjust this one to the lowest 4 income brackets, because the median household income in 2020 was $67,521. The median income is encompassed in the 4th income bracket, so this will capture all households from the lowest incomes to $74,999. Furthermore, instead of using rent divided by income, I will just use rent. This is because income is divided into wide categories and it is not possible to get a precise measurement of rent as a proportion of income. My options for DVs are mental health on 3 scales: 1) a categorical variable from 0 to 12, 2) PHQ4 which is a categorical variable of 4 measurements, or 3) a dichotomous variable for bad mental health, split down the middle (or it can be divided another way, such as the most severe level).

Statistical method: Population: limited to bottom 4 categories of income. IV: Rent amount an integer. DV: Mental health but which variable?

  1. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

Statistical method: Limit to lowest 4 income brackets. IV: Grace period by state. DV: Mental health but which variable?

  1. Renters who are late on rent payments exhibit higher rates of depression and anxiety.

Statistical method: IV: Current on rent, which is dichotomous. DV: Mental health, but which variable?

  1. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

Statistical method: Population: only renters who are late on rent. IV: Grace period by state. DV: Mental health, but which variable.

Limit database to population in lower 4 income brackets

Code
tabyl(dat$income)
 dat$income    n    percent
          1 6844 0.22893460
          2 4295 0.14366951
          3 4442 0.14858672
          4 5413 0.18106707
          5 3235 0.10821208
          6 3066 0.10255896
          7 1295 0.04331828
          8 1305 0.04365278
Code
nrow(dat)
[1] 29895
Code
dat2 <- dat %>% 
  filter(income %in% c(1:4))

tabyl(dat2$income)
 dat2$income    n   percent
           1 6844 0.3259979
           2 4295 0.2045823
           3 4442 0.2115843
           4 5413 0.2578356
Code
nrow(dat2)
[1] 20994

Testing Hypothesis 1 - rent and bad mental health

Code
#Logit models of bad mental health - adapt for my project

#brfss21malesm$X_AGE80SQ<-brfss21malesm$X_AGE80*brfss21malesm$X_AGE80

#model <- glm(badmentalhealth ~ X_AGE80, family = "binomial", data = brfss21malesm)
#summary(model)

Next:

  • Check on grace period and risk of eviction. How many respondents do we have in each state for evict?

Notes:

  • Income is by household.

  • I made multiple variables for mental health. There is a mental health variable with levels from 4 to 16. (Do I need to adjust this to 0 to 12?) PHQ4 is a four level screening variable. Bad mental health is a dichotomized good/bad mental health variable.

  • age

  • gender identity

  • race/ethnicity

  • number of people in household

  • presence of children in household

  • number of adults in household

Next steps:

  • Clean up and visualize grace period and eviction risk. Rentcur has many more responses than evict. Check to see how many responses to evict in each state.

  • Start modelling. Where do I start and what steps do I take?