Surratt Eviction Analysis for SDA

Author

Brian Surratt

Published

September 4, 2023

Data Sources

Household Pulse Survey Phases 3.2 to 3.4 (weeks 34 to 45) from July 21, 2021 – May 9, 2022 (https://www.census.gov/programs-surveys/household-pulse-survey.html)
COVID-19 Housing Policy Scorecard (https://evictionlab.org/covid-policy-scorecard/)

Measures

Dependent variables

From Household Pulse Survey

phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.

gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.

See section 2.2.1 Mental Health in https://www.sciencedirect.com/science/article/pii/S0277953620307760

Independent variables

From Household Pulse Survey

rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
evict0, evict1, evict2, evict3, evict4 - dichotomized eviction risk

State Policy Variable

From COVID-19 Housing Policy Scoreboard

score: State score on the COVID-19 Housing Policy Scorecard. Value is on a scale from 0 to 5, with the highest score being 4.63.

Covariates

(from Household Pulse Survey)

income: Total household income (before taxes).
1. Less than $25,000
1. $25,000 - $34,999
1. $35,000 - $49,999
1. $50,000 - $74,999
1. $75,000 - $99,999
1. $100,000 - $149,999
1. $150,000 - $199,999
1. $200,000 and above
1. Question seen but category not selected
1. Missing / Did not report
age: Derived from tbirth_year (year of birth)
genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these, 88, 99.
race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
thhld_numper: Total number of people in household
child: Dummy variable derived from ‘thhld_numkid’
eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.

Other variables for cleaning

week: week of interview (review how to use this)
anxious, worry, interest, and down. Used to derive phq4. Frequency of having bad feeling over previous 2 weeks. 1 not at all, 2 is several days, 3 is more than half the days, 4 is nearly every day, 88 is missing, 99 is category not selected.
tenure: 3 is rented. Used to select renters.
est_st: FIPS code for state, used to merge
state: Name of state
abbv: Two letter abbreviation of state

Reading in the data

Importing the data for Household Pulse Survey phase 3.2 to 3.4.

Code

hps34 <- read.csv("./data/HPS_Week34_PUF_CSV/pulse2021_puf_34.csv")

hps35 <- read.csv("./data/HPS_Week35_PUF_CSV/pulse2021_puf_35.csv")

hps36 <- read.csv("./data/HPS_Week36_PUF_CSV/pulse2021_puf_36.csv")

hps37 <- read.csv("./data/HPS_Week37_PUF_CSV/pulse2021_puf_37.csv")

hps38 <- read.csv("./data/HPS_Week38_PUF_CSV/pulse2021_puf_38.csv")

hps39 <- read.csv("./data/HPS_Week39_PUF_CSV/pulse2021_puf_39.csv")

hps40 <- read.csv("./data/HPS_Week40_PUF_CSV/pulse2021_puf_40.csv")

hps41 <- read.csv("./data/HPS_Week41_PUF_CSV/pulse2022_puf_41.csv")

hps42 <- read.csv("./data/HPS_Week42_PUF_CSV/pulse2022_puf_42.csv")

hps43 <- read.csv("./data/HPS_Week43_PUF_CSV/pulse2022_puf_43.csv")

hps44 <- read.csv("./data/HPS_Week44_PUF_CSV/pulse2022_puf_44.csv")

hps45 <- read.csv("./data/HPS_Week45_PUF_CSV/pulse2022_puf_45.csv")

Filtering variables.

Code

names(hps34) <- tolower(names(hps34))

hps34 <- hps34 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps35) <- tolower(names(hps35))

hps35 <- hps35 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps36) <- tolower(names(hps36))

hps36 <- hps36 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps37) <- tolower(names(hps37))

hps37 <- hps37 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps38) <- tolower(names(hps38))

hps38 <- hps38 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps39) <- tolower(names(hps39))

hps39 <- hps39 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps40) <- tolower(names(hps40))

hps40 <- hps40 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps41) <- tolower(names(hps41))

hps41 <- hps41 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps42) <- tolower(names(hps42))

hps42 <- hps42 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps43) <- tolower(names(hps43))

hps43 <- hps43 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps44) <- tolower(names(hps44))

hps44 <- hps44 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps45) <- tolower(names(hps45))

hps45 <- hps45 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

Combining into one dataframe.

Code

hps <- rbind(hps34, hps35, hps36, hps37, hps38, hps39, hps40, hps41, hps42, hps43, hps44, hps45)

nrow(hps)

[1] 803905

Code

statedat <- read.csv("./data/state_policy.csv")

nrow(statedat)

[1] 51

Joining hps with state.

Code

alldata <- merge(x=hps, y=statedat, by='est_st', all.x=TRUE)

nrow(alldata)

[1] 803905

Selecting variables from alldata for a working dataframe.

Code

dat <- alldata %>% 
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st, state, abbv, score)

nrow(dat)

[1] 803905

Cleaning and exploring the data

Filter for only renters

First I will filter for renters. This will exclude homeowners.

Check the distribution of tenure. Category “3” is renters.

Code

tabyl(dat$tenure)

 dat$tenure      n     percent
        -99   6498 0.008083045
        -88 117839 0.146583241
          1 194842 0.242369434
          2 316637 0.393873654
          3 159649 0.198591873
          4   8440 0.010498753

Filter for only renters.

Code

dat <- dat %>%
  filter(.$tenure == 3)

nrow(dat)

[1] 159649

Clean and recode mental health variables

Make dichotomized variables for depression and anxiety.

The HH Pulse variables for “anxious”, “worry”, “interest”, and “down” are scored from 1 to 4, but the clinical scale is from 0 to 3. Therefore, the variables must be adjusted to the 0 to 3 scale by subtracting 1.

First, eliminate NA’s and subtract 1 from each scale of anxious, worry, interest, and down.

Code

tabyl(dat$anxious)

 dat$anxious     n     percent
         -99   248 0.001553408
           1 50699 0.317565409
           2 53981 0.338123007
           3 21680 0.135797907
           4 33041 0.206960269

Code

dat <- dat %>%
  filter(anxious %in% c(1:4)) %>% 
  mutate(anxious = anxious -1)

tabyl(dat$anxious)

 dat$anxious     n   percent
           0 50699 0.3180595
           1 53981 0.3386491
           2 21680 0.1360092
           3 33041 0.2072823

Code

nrow(dat)

[1] 159401

Code

tabyl(dat$worry)

 dat$worry     n     percent
       -99   230 0.001442902
         1 62608 0.392770434
         2 52143 0.327118400
         3 19014 0.119284070
         4 25406 0.159384195

Code

dat <- dat %>%
  filter(worry %in% c(1:4)) %>% 
  mutate(worry = worry -1)

tabyl(dat$worry)

 dat$worry     n   percent
         0 62608 0.3933380
         1 52143 0.3275911
         2 19014 0.1194564
         3 25406 0.1596145

Code

nrow(dat)

[1] 159171

Code

tabyl(dat$interest)

 dat$interest     n     percent
          -99   245 0.001539225
            1 67827 0.426126619
            2 51063 0.320805926
            3 20052 0.125977722
            4 19984 0.125550509

Code

dat <- dat %>%
  filter(interest %in% c(1:4)) %>% 
  mutate(interest = interest -1)

tabyl(dat$interest)

 dat$interest     n   percent
            0 67827 0.4267835
            1 51063 0.3213005
            2 20052 0.1261719
            3 19984 0.1257441

Code

nrow(dat)

[1] 158926

Code

tabyl(dat$down)

 dat$down     n      percent
      -99   140 0.0008809131
        1 66802 0.4203339919
        2 52857 0.3325887520
        3 17800 0.1120018122
        4 21327 0.1341945308

Code

dat <- dat %>%
  filter(down %in% c(1:4)) %>% 
  mutate(down = down -1)

tabyl(dat$down)

 dat$down     n   percent
        0 66802 0.4207046
        1 52857 0.3328820
        2 17800 0.1121006
        3 21327 0.1343128

Code

nrow(dat)

[1] 158786

Next, make variable “depression” consisting of the sum of “interest” and “down” and make a variable “anxiety” consisting of the sum of “anxious” and “worry.”

Code

dat <- dat %>% 
  mutate(depression = interest + down) %>%
  mutate(anxiety = anxious + worry)

tabyl(dat$depression)

 dat$depression     n    percent
              0 55979 0.35254368
              1 19857 0.12505511
              2 36999 0.23301173
              3 11297 0.07114607
              4 12797 0.08059275
              5  6625 0.04172282
              6 15232 0.09592785

Code

tabyl(dat$anxiety)

 dat$anxiety     n    percent
           0 45985 0.28960362
           1 18420 0.11600519
           2 37485 0.23607245
           3 12093 0.07615911
           4 14257 0.08978751
           5  8065 0.05079163
           6 22481 0.14158049

Code

hist(dat$depression)

Code

hist(dat$anxiety)

Next, dichotomize “depression” and “anxiety”. ‘phq2’ is the dichotomized depression variable. ‘gad2’ is the dichotomized anxiety variable.

Code

dat <- dat %>%
  mutate(phq2 = case_when(depression %in% c(0:2) ~ 0,
                          depression %in% c(3:6) ~1,
                          )
         )

tabyl(dat$phq2)

 dat$phq2      n   percent
        0 112835 0.7106105
        1  45951 0.2893895

Code

dat <- dat %>%
  mutate(gad2 = case_when(anxiety %in% c(0:2) ~ 0,
                          anxiety %in% c(3:6) ~1,
                          )
         )

tabyl(dat$gad2)

 dat$gad2      n   percent
        0 101890 0.6416813
        1  56896 0.3583187

Code

nrow(dat)

[1] 158786

Clean and recode ‘rentcur’

Then I will check the distribution of ‘rentcur’ and remove observations where ‘rentcur’ is missing. In the original dataset, 1 is “caught up on rent” and 2 is “Not caught up on rent.”

Then I will create a dummy variable for ‘rentcur’ so that the value of ‘1’ means ‘caught up on rent’ and the value of ‘0’ is ‘not caught up on rent’.

Checking distribution of rentcur.

Code

tabyl(dat$rentcur)

 dat$rentcur      n      percent
         -99    300 0.0018893353
         -88    149 0.0009383699
           1 140233 0.8831572053
           2  18104 0.1140150895

Removing missing values for rentcur.

Code

dat <- dat %>% 
  filter(.$rentcur %in% c(1:2))

Checking distribution of rentcur.

Code

tabyl(dat$rentcur)

 dat$rentcur      n   percent
           1 140233 0.8856616
           2  18104 0.1143384

Code

nrow(dat)

[1] 158337

Dichotomize rentcur. “Caught up on rent” = 1 and “Not caught up on rent” = 0

Code

dat$rentcur <- car::recode(dat$rentcur, "1=1; else=0")
tabyl(dat$rentcur)

 dat$rentcur      n   percent
           0  18104 0.1143384
           1 140233 0.8856616

Code

nrow(dat)

[1] 158337

Clean and recode ‘evict’

Checking distribution of evict.

evict: Eviction in next two months. 1 is very likely, 2 is somewhat likeley, 3 is not very likely, 4 is not likely at all, 88 and 99. Only asked if rentcur = 2.

Code

tabyl(dat$evict)

 dat$evict      n      percent
       -99    133 0.0008399805
       -88 140663 0.8883773218
         1   2489 0.0157196360
         2   4961 0.0313319060
         3   5241 0.0331002861
         4   4850 0.0306308696

This needs to be recoded so that 0 is current on rent (not at risk of eviction due to late on rent), 1 is not likely at all, 2 is not very likely, 3 is somewhat likely, and 4 is very likely.

Code ‘evict’ = 0 when rentcur = 1. Change other codes as well.

Code

dat <- dat %>%
  mutate(evict = case_when(rentcur == 1 ~ 0,
                           evict == 4 ~ 1,
                           evict == 3 ~ 2,
                           evict == 2 ~ 3,
                           evict == 1 ~ 4
                           )
         )

tabyl(dat$evict)

 dat$evict      n     percent valid_percent
         0 140233 0.885661595    0.88882199
         1   4850 0.030630870    0.03074017
         2   5241 0.033100286    0.03321840
         3   4961 0.031331906    0.03144371
         4   2489 0.015719636    0.01577573
        NA    563 0.003555707            NA

Filer out NAs.

Code

dat <- dat %>%
  filter(evict %in% c(0:4))

tabyl(dat$evict)

 dat$evict      n    percent
         0 140233 0.88882199
         1   4850 0.03074017
         2   5241 0.03321840
         3   4961 0.03144371
         4   2489 0.01577573

Code

nrow(dat)

[1] 157774

Dichotomizing eviction risk

Code

dat$evict0 <- car::recode(dat$evict, "0=1; else=0")
tabyl(dat$evict0)

 dat$evict0      n  percent
          0  17541 0.111178
          1 140233 0.888822

Code

dat$evict1 <- car::recode(dat$evict, "1=1; else=0")
tabyl(dat$evict1)

 dat$evict1      n    percent
          0 152924 0.96925983
          1   4850 0.03074017

Code

dat$evict2 <- car::recode(dat$evict, "2=1; else=0")
tabyl(dat$evict2)

 dat$evict2      n   percent
          0 152533 0.9667816
          1   5241 0.0332184

Code

dat$evict3 <- car::recode(dat$evict, "3=1; else=0")
tabyl(dat$evict3)

 dat$evict3      n    percent
          0 152813 0.96855629
          1   4961 0.03144371

Code

dat$evict4 <- car::recode(dat$evict, "4=1; else=0")
tabyl(dat$evict4)

 dat$evict4      n    percent
          0 155285 0.98422427
          1   2489 0.01577573

Check distribution of state policy variable

Code

tabyl(dat, score)

 score     n     percent
  0.00 14527 0.092074740
  0.23  3823 0.024230862
  0.30  1403 0.008892466
  0.38  4164 0.026392181
  0.53  2433 0.015420792
  0.60  2575 0.016320813
  0.63  1319 0.008360059
  0.75  1103 0.006991012
  0.78  3850 0.024401993
  0.83  5070 0.032134572
  1.03  2780 0.017620140
  1.15  2308 0.014628519
  1.23  2426 0.015376425
  1.28 11747 0.074454600
  1.33 16796 0.106456070
  1.43  2621 0.016612370
  1.58  7060 0.044747550
  1.84  2751 0.017436333
  1.98  1855 0.011757324
  2.08  2851 0.018070151
  2.29  1754 0.011117168
  2.30  1614 0.010229822
  2.48  1521 0.009640372
  2.65  1798 0.011396048
  2.73  4952 0.031386667
  2.78  1925 0.012200996
  3.00  2438 0.015452483
  3.08  3199 0.020275838
  3.13  3651 0.023140695
  3.15  2153 0.013646101
  3.25  4568 0.028952806
  3.33  3168 0.020079354
  3.38  3817 0.024192833
  3.73  2462 0.015604599
  3.75  6988 0.044291201
  3.78  2568 0.016276446
  3.80  2658 0.016846882
  3.88  1283 0.008131885
  4.03  4965 0.031469063
  4.30  3251 0.020605423
  4.63  3579 0.022684346

Clean up covariates

Clean income

income: Total household income (before taxes).
1. Less than $25,000
1. $25,000 - $34,999
1. $35,000 - $49,999
1. $50,000 - $74,999
1. $75,000 - $99,999
1. $100,000 - $149,999
1. $150,000 - $199,999
1. $200,000 and above
1. Question seen but category not selected
1. Missing / Did not report

Check distribution of income.

Code

tabyl(dat$income)

 dat$income     n    percent
        -99  1863 0.01180803
        -88  5550 0.03517690
          1 37181 0.23565987
          2 22896 0.14511897
          3 22670 0.14368654
          4 26638 0.16883644
          5 15727 0.09968056
          6 14206 0.09004018
          7  5614 0.03558254
          8  5429 0.03440998

Filter out rows with missing income data.

Code

dat <- dat %>% 
  filter(.$income %in% c(1:8))

tabyl(dat$income)

 dat$income     n    percent
          1 37181 0.24727822
          2 22896 0.15227353
          3 22670 0.15077048
          4 26638 0.17716030
          5 15727 0.10459494
          6 14206 0.09447929
          7  5614 0.03733681
          8  5429 0.03610644

Code

nrow(dat)

[1] 150361

Histogram of income levels.

Code

hist(dat$income)

Clean age

age: Derived from tbirth_year (year of birth)

Creating a new variable ‘age’ by subtracting birth year from the year the data were collected. (How can I fix this since the data were collected in 2 years?)

Code

dat$age <- 2021-dat$tbirth_year

tabyl(dat$age)

 dat$age    n      percent
      17   40 0.0002660264
      18  240 0.0015961586
      19  418 0.0027799762
      20  737 0.0049015370
      21 1068 0.0071029057
      22 1541 0.0102486682
      23 1970 0.0131018017
      24 2594 0.0172518140
      25 3059 0.0203443712
      26 3268 0.0217343593
      27 3481 0.0231509500
      28 3702 0.0246207461
      29 3764 0.0250330870
      30 3880 0.0258045637
      31 3767 0.0250530390
      32 3715 0.0247072047
      33 3676 0.0244478289
      34 3502 0.0232906139
      35 3327 0.0221267483
      36 3493 0.0232307580
      37 3401 0.0226188972
      38 3377 0.0224592813
      39 3456 0.0229846835
      40 3378 0.0224659320
      41 3318 0.0220668923
      42 3165 0.0210493413
      43 2944 0.0195795452
      44 2971 0.0197591131
      45 2770 0.0184223303
      46 2691 0.0178969281
      47 2606 0.0173316219
      48 2579 0.0171520541
      49 2621 0.0174313818
      50 2675 0.0177905175
      51 2786 0.0185287408
      52 2544 0.0169192809
      53 2401 0.0159682364
      54 2292 0.0152433144
      55 2387 0.0158751272
      56 2410 0.0160280924
      57 2404 0.0159881884
      58 2455 0.0163273721
      59 2457 0.0163406734
      60 2355 0.0156623061
      61 2403 0.0159815378
      62 2316 0.0154029303
      63 2218 0.0147511655
      64 2314 0.0153896290
      65 2125 0.0141326541
      66 2166 0.0144053312
      67 2083 0.0138533263
      68 1928 0.0128224739
      69 1761 0.0117118136
      70 1578 0.0104947427
      71 1464 0.0097365673
      72 1336 0.0088852828
      73 1286 0.0085527497
      74 1255 0.0083465792
      75  988 0.0065708528
      76  785 0.0052207687
      77  694 0.0046155586
      78  620 0.0041234097
      79  554 0.0036844661
      80  485 0.0032255705
      81  377 0.0025072991
      82  342 0.0022745260
      83  298 0.0019818969
      84  232 0.0015429533
      85  218 0.0014498440
      86  160 0.0010641057
      87  383 0.0025472031
      88  307 0.0020417528

Code

hist(dat$age)

Checking sample size.

Code

nrow(dat)

[1] 150361

Clean genid_describe

genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’

Checking the distribution of gender identity.

Code

tabyl(dat$genid_describe)

 dat$genid_describe     n     percent
                -99   803 0.005340481
                  1 54773 0.364276641
                  2 91561 0.608941148
                  3  1029 0.006843530
                  4  2195 0.014598200

Dichotomizing genid_describe.

Code

dat$male <- car::recode(dat$genid_describe, "1=1; else=0") #gen1 = male
tabyl(dat$male)

 dat$male     n   percent
        0 95588 0.6357234
        1 54773 0.3642766

Code

dat$female <- car::recode(dat$genid_describe, "2=1; else=0") #gen2 = female
tabyl(dat$female)

 dat$female     n   percent
          0 58800 0.3910589
          1 91561 0.6089411

Code

dat$transgender <- car::recode(dat$genid_describe, "3=1; else=0") #gen3 = transgender
tabyl(dat$transgender)

 dat$transgender      n    percent
               0 149332 0.99315647
               1   1029 0.00684353

Code

dat$gen_none <- car::recode(dat$genid_describe, "4=1; else=0") #gen4 = None of these
tabyl(dat$gen_none)

 dat$gen_none      n   percent
            0 148166 0.9854018
            1   2195 0.0145982

Code

dat$gen_notsel <- car::recode(dat$genid_describe, "-99=1; else=0") #gen5 = not selected
tabyl(dat$gen_notsel)

 dat$gen_notsel      n     percent
              0 149558 0.994659519
              1    803 0.005340481

Checking sample size.

Code

nrow(dat)

[1] 150361

Clean race_eth

race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’.

Combine race/ethnicity into a race_eth variable.

Code

tabyl(dat$rrace)

 dat$rrace      n    percent
         1 113187 0.75276834
         2  19087 0.12694116
         3   7974 0.05303237
         4  10113 0.06725813

Code

tabyl(dat$rhispanic)

 dat$rhispanic      n   percent
             1 131125 0.8720679
             2  19236 0.1279321

Code

dat <- dat %>%
  mutate(race_eth = case_when(.$rhispanic == 1 & .$rrace == 1 ~"nh_white",
                              .$rhispanic == 1 & .$rrace == 2 ~"nh_black",
                              .$rhispanic == 1 & .$rrace == 3 ~"nh_asian",
                              .$rhispanic == 1 & .$rrace == 4 ~"other",
                              .$rhispanic == 2 & .$rrace %in% c(1:4) ~ "hispanic"
                              )
        )

tabyl(dat$race_eth)

 dat$race_eth     n    percent
     hispanic 19236 0.12793211
     nh_asian  7446 0.04952082
     nh_black 17861 0.11878745
     nh_white 98063 0.65218374
        other  7755 0.05157587

Dichotomizing race and ethnicity.

Code

dat$hispanic <- car::recode(dat$race_eth, "'hispanic'=1; else=0")
tabyl(dat$hispanic)

 dat$hispanic      n   percent
            0 131125 0.8720679
            1  19236 0.1279321

Code

dat$nh_white <- car::recode(dat$race_eth, "'nh_white'=1; else=0")
tabyl(dat$nh_white)

 dat$nh_white     n   percent
            0 52298 0.3478163
            1 98063 0.6521837

Code

dat$nh_black <- car::recode(dat$race_eth, "'nh_black'=1; else=0")
tabyl(dat$nh_black)

 dat$nh_black      n   percent
            0 132500 0.8812125
            1  17861 0.1187875

Code

dat$nh_asian <- car::recode(dat$race_eth, "'nh_asian'=1; else=0")
tabyl(dat$nh_asian)

 dat$nh_asian      n    percent
            0 142915 0.95047918
            1   7446 0.04952082

Code

dat$other <- car::recode(dat$race_eth, "'other'=1; else=0")
tabyl(dat$other)

 dat$other      n    percent
         0 142606 0.94842413
         1   7755 0.05157587

Checking sample size.

Code

nrow(dat)

[1] 150361

Clean single_adult

single_adult: Dummy variable for single adult derived from ‘thhld_numadlt’.

Dichotomizing number of adults in household. Single adult is coded as “1”, multiple adults is coded as “0”.

Code

tabyl(dat$thhld_numadlt)

 dat$thhld_numadlt     n      percent
                 1 60088 0.3996249027
                 2 65608 0.4363365500
                 3 15704 0.1044419763
                 4  5988 0.0398241565
                 5  1889 0.0125630981
                 6   580 0.0038573832
                 7   206 0.0013700361
                 8    84 0.0005586555
                 9   110 0.0007315727
                10   104 0.0006916687

Code

dat$single_adult <- car::recode(dat$thhld_numadlt, "1=1; else=0")
tabyl(dat$single_adult)

 dat$single_adult     n   percent
                0 90273 0.6003751
                1 60088 0.3996249

Checking sample size.

Code

nrow(dat)

[1] 150361

Clean thhld_numper

thhld_numper: Total number of people in household

Distribution of number of people in household. I’m leaving this as an integer.

Code

tabyl(dat$thhld_numper)

 dat$thhld_numper     n     percent
                1 48881 0.325090948
                2 49287 0.327791116
                3 22746 0.151275929
                4 15835 0.105313213
                5  7643 0.050831000
                6  3340 0.022213207
                7  1301 0.008652510
                8   650 0.004322929
                9   236 0.001569556
               10   442 0.002939592

Checking sample size.

Code

nrow(dat)

[1] 150361

Clean child

child: Dummy variable derived from ‘thhld_numkid’

Check distribution of ‘thhld_numkid’.

Code

tabyl(dat$thhld_numkid)

 dat$thhld_numkid      n     percent
                0 106423 0.707783268
                1  21924 0.145809086
                2  13416 0.089225265
                3   5532 0.036791455
                4   1980 0.013168308
                5   1086 0.007222618

Dichotomizing presence of children. Children in household is coded as “1”, no children is “0”.

Code

dat$children <- car::recode(dat$thhld_numkid, "0=0; else=1")
tabyl(dat$children)

 dat$children      n   percent
            0 106423 0.7077833
            1  43938 0.2922167

Clean eeduc

eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.

For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”.

Code

tabyl(dat$eeduc)

 dat$eeduc     n    percent
         1  1527 0.01015556
         2  3418 0.02273196
         3 21486 0.14289610
         4 37721 0.25086957
         5 16022 0.10655689
         6 40541 0.26962444
         7 29646 0.19716549

Dichotomizing educational attainment.

Code

dat$lessthanhs <- car::recode(dat$eeduc, "1=1; 2=1; else=0")
tabyl(dat$lessthanhs)

 dat$lessthanhs      n    percent
              0 145416 0.96711248
              1   4945 0.03288752

Code

dat$highschool <- car::recode(dat$eeduc, "3=1; else=0")
tabyl(dat$highschool)

 dat$highschool      n   percent
              0 128875 0.8571039
              1  21486 0.1428961

Code

dat$somecollege <- car::recode(dat$eeduc, "4=1; 5=1; else=0")
tabyl(dat$somecollege)

 dat$somecollege     n   percent
               0 96618 0.6425735
               1 53743 0.3574265

Code

dat$bachelors <- car::recode(dat$eeduc, "6=1; 7=1; else=0")
tabyl(dat$bachelors)

 dat$bachelors     n   percent
             0 80174 0.5332101
             1 70187 0.4667899

Checking sample size.

Code

nrow(dat)

[1] 150361

Visualizations

Table of phq2 by rentcur.

Code

tabyl(dat, rentcur, phq2)

 rentcur     0     1
       0  8767  7901
       1 97939 35754

Table of gad2 by rentcur.

Code

tabyl(dat, rentcur, gad2)

 rentcur     0     1
       0  7395  9273
       1 88855 44838

Table of phq2 by evict.

Code

tabyl(dat, evict, phq2)

 evict     0     1
     0 97939 35754
     1  3272  1274
     2  2726  2291
     3  2016  2729
     4   753  1607

Table of gad2 by evict.

Code

tabyl(dat, evict, gad2)

 evict     0     1
     0 88855 44838
     1  2959  1587
     2  2295  2722
     3  1560  3185
     4   581  1779

Checking count of observations in each state.

Code

tabyl(dat, state)

                state     n     percent
              Alabama  1489 0.009902834
               Alaska  2302 0.015309821
              Arizona  3650 0.024274912
             Arkansas  1679 0.011166459
           California 16018 0.106530284
             Colorado  3662 0.024354720
          Connecticut  2423 0.016114551
             Delaware  1199 0.007974142
 District of Columbia  3422 0.022758561
              Florida  5191 0.034523580
              Georgia  3199 0.021275464
               Hawaii  2038 0.013554047
                Idaho  2370 0.015762066
             Illinois  3072 0.020430830
              Indiana  2181 0.014505091
                 Iowa  1839 0.012230565
               Kansas  2499 0.016620001
             Kentucky  1712 0.011385931
            Louisiana  1560 0.010375031
                Maine  1523 0.010128956
             Maryland  3131 0.020823219
        Massachusetts  4696 0.031231503
             Michigan  3010 0.020018489
            Minnesota  2544 0.016919281
          Mississippi  1023 0.006803626
             Missouri  2318 0.015416232
              Montana  1689 0.011232966
             Nebraska  2455 0.016327372
               Nevada  3081 0.020490686
        New Hampshire  2353 0.015649005
           New Jersey  2684 0.017850373
           New Mexico  2670 0.017757264
             New York  4395 0.029229654
       North Carolina  2614 0.017384827
         North Dakota  1318 0.008765571
                 Ohio  2244 0.014924083
             Oklahoma  2054 0.013660457
               Oregon  4750 0.031590639
         Pennsylvania  3484 0.023170902
         Rhode Island  1745 0.011605403
       South Carolina  1524 0.010135607
         South Dakota  1278 0.008499544
            Tennessee  2317 0.015409581
                Texas  8053 0.053557771
                 Utah  3340 0.022213207
              Vermont  1446 0.009616855
             Virginia  3676 0.024447829
           Washington  6744 0.044852056
        West Virginia  1246 0.008286723
            Wisconsin  2347 0.015609101
              Wyoming  1104 0.007342329

Lets see how ‘evict’ is distributed across states.

Code

dat %>%
  tabyl(., state, evict)

                state     0   1   2   3   4
              Alabama  1257  48  69  67  48
               Alaska  2044  64  81  79  34
              Arizona  3362  61  91  96  40
             Arkansas  1417  50  85  80  47
           California 14295 547 477 460 239
             Colorado  3403  58  90  76  35
          Connecticut  2088 106 107  81  41
             Delaware  1030  47  59  43  20
 District of Columbia  3127 111  92  69  23
              Florida  4554 184 174 169 110
              Georgia  2749  83 119 137 111
               Hawaii  1832  68  69  55  14
                Idaho  2196  36  66  54  18
             Illinois  2660 116 125 112  59
              Indiana  1924  71  91  65  30
                 Iowa  1639  51  76  57  16
               Kansas  2263  81  62  59  34
             Kentucky  1500  43  61  71  37
            Louisiana  1283  52  68 101  56
                Maine  1380  38  60  35  10
             Maryland  2675 134 133 118  71
        Massachusetts  4199 188 144 112  53
             Michigan  2611 114 125 109  51
            Minnesota  2325  72  67  55  25
          Mississippi   820  40  57  72  34
             Missouri  2059  51 102  64  42
              Montana  1546  43  44  39  17
             Nebraska  2231  61  65  70  28
               Nevada  2762  79  74 113  53
        New Hampshire  2157  60  68  48  20
           New Jersey  2338 113 113  81  39
           New Mexico  2368  78  85  98  41
             New York  3766 211 209 138  71
       North Carolina  2317  74  88  88  47
         North Dakota  1204  25  29  40  20
                 Ohio  1997  50  77  78  42
             Oklahoma  1749  60  90 105  50
               Oregon  4291  96 137 157  69
         Pennsylvania  3036 135 130 116  67
         Rhode Island  1519  70  66  68  22
       South Carolina  1295  54  59  82  34
         South Dakota  1152  35  47  31  13
            Tennessee  2019  65  93  93  47
                Texas  7098 241 247 310 157
                 Utah  3138  61  55  52  34
              Vermont  1343  31  41  19  12
             Virginia  3315 125  98  98  40
           Washington  6177 147 188 154  78
        West Virginia  1070  36  51  67  22
            Wisconsin  2127  56  71  67  26
              Wyoming   986  26  42  37  13

Variation across states

The key thing for the model is to seee how much variation is there across states.

Scatterplot - aggregating values by state and graphing it. Calculate percent on phq2 and gad2. First do the mean on the raw HH Pulse mental health score. For each state you have the score on the eviction policy. State eviction score is the unit of analysis on X axis score. On the Y axis you have mean mental health score by state or percent with bad mental health by state.

Here I create a new dataframe that takes the data in dat and groups them by state. Then I find the mean mental health number for each state. I create a dataframe that contains the state and the mean mental health score of renters who are late on rent in each state.

Code

statemh <- dat %>%
  group_by(abbv) %>%
  summarize(mean_mh = mean(anxious+worry+interest+down), mh_n=n())

statedat <- merge(x=statemh, y=statedat, by='abbv', all.x=TRUE)

nrow(statedat)

[1] 51

Code

ggplot(statedat, aes(x=score, y=mean_mh, label=abbv)) +
    geom_point() +
    geom_text(hjust=-0.3, vjust=0.5) +
    geom_smooth(method=lm) +
    labs(title = "Mean Mental Health of All Renters by State",
          x = "State COVID Housing Policy Score",
          y = "Mean Mental Health")

`geom_smooth()` using formula = 'y ~ x'

Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Pearson correlation test between State COVID-19 Score and the mean mental health among all renters.

Code

test <- cor.test(statedat$score, statedat$mean_mh, method = "spearman")

Warning in cor.test.default(statedat$score, statedat$mean_mh, method =
"spearman"): Cannot compute exact p-value with ties

Code

test


    Spearman's rank correlation rho

data:  statedat$score and statedat$mean_mh
S = 29127, p-value = 0.02299
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho 
-0.3179432

Here I am finding the percent with depression.

Code

statebmh <- dat %>%
  group_by(abbv) %>%
  summarize(dep_pc = mean(phq2), dmh_n=n())

statedat <- merge(x=statebmh, y=statedat, by='abbv', all.x=TRUE)

nrow(statedat)

[1] 51

Here I am finding the percent with anxiety

Code

statebmh <- dat %>%
  group_by(abbv) %>%
  summarize(anx_pc = mean(gad2), amh_n=n())

statedat <- merge(x=statebmh, y=statedat, by='abbv', all.x=TRUE)

nrow(statedat)

[1] 51

Percent of Tenants with Depression by State

Code

ggplot(statedat, aes(x=score, y=dep_pc, label=abbv)) +
    geom_point() +
    geom_text(hjust=-0.3, vjust=0.5) +
    geom_smooth(method=lm) +
    labs(title = "Percent of Tenants with Depression",
          x = "State COVID Housing Policy Score",
          y = "Percent with Depression")

`geom_smooth()` using formula = 'y ~ x'

Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Percent of Tenants with Anxiety by State

Code

ggplot(statedat, aes(x=score, y=anx_pc, label=abbv)) +
    geom_point() +
    geom_text(hjust=-0.3, vjust=0.5) +
    geom_smooth(method=lm) +
    labs(title = "Percent of Tenants with Anxiety",
          x = "State COVID Housing Policy Score",
          y = "Percent with Depression")

`geom_smooth()` using formula = 'y ~ x'

Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Pearson correlation test between State COVID-19 Score and the percent with depression among all renters.

Code

test2 <- cor.test(statedat$score, statedat$dep_pc, method = "spearman")

Warning in cor.test.default(statedat$score, statedat$dep_pc, method =
"spearman"): Cannot compute exact p-value with ties

Code

test2


    Spearman's rank correlation rho

data:  statedat$score and statedat$dep_pc
S = 31365, p-value = 0.002199
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
-0.419212

Pearson correlation test between State COVID-19 Score and the percent with anxiety among all renters.

Code

test3 <- cor.test(statedat$score, statedat$anx_pc, method = "spearman")

Warning in cor.test.default(statedat$score, statedat$anx_pc, method =
"spearman"): Cannot compute exact p-value with ties

Code

test3


    Spearman's rank correlation rho

data:  statedat$score and statedat$anx_pc
S = 29660, p-value = 0.01401
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho 
-0.3420937

Plot of risk of eviction and depression.

Code

ggplot(dat, aes(evict, depression)) + 
  geom_boxplot(aes(group = evict)) +
  geom_smooth(method=lm) +
    labs(title = "Eviction Risk and Depression",
          x = "Risk of Eviction",
          y = "Depression")

`geom_smooth()` using formula = 'y ~ x'

Plot of risk of eviction and anxiety

Code

ggplot(dat, aes(evict, anxiety)) + 
  geom_boxplot(aes(group = evict)) +
  geom_smooth(method=lm) +
    labs(title = "Eviction Risk and Anxiety",
          x = "Risk of Eviction",
          y = "Anxiety")

`geom_smooth()` using formula = 'y ~ x'

Code

#dat$phq2 <- as.numeric(dat$phq2)

#dat$evict  <- as.numeric(dat$evict)

test4 <- cor.test(dat$evict, dat$depression, method = "spearman")

Warning in cor.test.default(dat$evict, dat$depression, method = "spearman"):
Cannot compute exact p-value with ties

Code

test4


    Spearman's rank correlation rho

data:  dat$evict and dat$depression
S = 4.7599e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.1598694

Code

test5 <- cor.test(dat$evict, dat$anxiety, method = "spearman")

Warning in cor.test.default(dat$evict, dat$anxiety, method = "spearman"):
Cannot compute exact p-value with ties

Code

test5


    Spearman's rank correlation rho

data:  dat$evict and dat$anxiety
S = 4.7278e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
     rho 
0.165534

Modelling

Plan for modelling.

Population: All renters in the HH Pulse Survey phases 3.2 to 3.4.

Dependent variables

phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.

gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.

Independent variables

rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.
evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.
evict0, evict1, evict2, evict3, evict4 - dichotomized eviction risk

Covariates

income: categorical income variable
age: numerical age in years
genid_describe: dichotomized gender ID
race_eth: dichotomized race/ethnicity
single_adult: dichotomized singe adult or mutiple adults in household
thhld_numper: number of people in household
children: dichotomized presence of children in household
eeduc: dichotomized educational attainment

Modeling

To do:

fix scatterplot showing state scores but grouped by eviction risk (have to think about this). I need to find the percent of evcition risk by state. The variable is dichotomized, so I can find the mean for each state. But what does that variable mean?
run models!