Surratt Eviction Risk SDA 9/18/23

Author

Brian Surratt

Published

September 19, 2023

Data Sources

Measures

Dependent variables

From Household Pulse Survey

phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.

gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.

See section 2.2.1 Mental Health in https://www.sciencedirect.com/science/article/pii/S0277953620307760

Independent variables

From Household Pulse Survey

  • rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.

  • evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.

  • evict0, evict1, evict2, evict3, evict4 - dichotomized eviction risk

State Policy Variable

From COVID-19 Housing Policy Scoreboard

  • score: State score on the COVID-19 Housing Policy Scorecard. Value is on a scale from 0 to 5, with the highest score being 4.63.

Covariates

(from Household Pulse Survey)

  • income: Total household income (before taxes).
    1. Less than $25,000
    1. $25,000 - $34,999
    1. $35,000 - $49,999
    1. $50,000 - $74,999
    1. $75,000 - $99,999
    1. $100,000 - $149,999
    1. $150,000 - $199,999
    1. $200,000 and above
    1. Question seen but category not selected
    1. Missing / Did not report
  • age: Derived from tbirth_year (year of birth)
  • genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these, 88, 99.
  • race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’. (Describe coding)
  • single_adult: Dummy variable for single adult in household vs. multiple adults. Derived from ‘thhld_numadlt’.
  • thhld_numper: Total number of people in household
  • child: Dummy variable derived from ‘thhld_numkid’
  • eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.

Other variables for cleaning

  • week: week of interview (review how to use this)
  • anxious, worry, interest, and down. Used to derive phq4. Frequency of having bad feeling over previous 2 weeks. 1 not at all, 2 is several days, 3 is more than half the days, 4 is nearly every day, 88 is missing, 99 is category not selected.
  • tenure: 3 is rented. Used to select renters.
  • est_st: FIPS code for state, used to merge
  • state: Name of state
  • abbv: Two letter abbreviation of state

Reading in the data

Importing the data for Household Pulse Survey phase 3.2 to 3.4.

Code
hps34 <- read.csv("./data/HPS_Week34_PUF_CSV/pulse2021_puf_34.csv")

hps35 <- read.csv("./data/HPS_Week35_PUF_CSV/pulse2021_puf_35.csv")

hps36 <- read.csv("./data/HPS_Week36_PUF_CSV/pulse2021_puf_36.csv")

hps37 <- read.csv("./data/HPS_Week37_PUF_CSV/pulse2021_puf_37.csv")

hps38 <- read.csv("./data/HPS_Week38_PUF_CSV/pulse2021_puf_38.csv")

hps39 <- read.csv("./data/HPS_Week39_PUF_CSV/pulse2021_puf_39.csv")

hps40 <- read.csv("./data/HPS_Week40_PUF_CSV/pulse2021_puf_40.csv")

hps41 <- read.csv("./data/HPS_Week41_PUF_CSV/pulse2022_puf_41.csv")

hps42 <- read.csv("./data/HPS_Week42_PUF_CSV/pulse2022_puf_42.csv")

hps43 <- read.csv("./data/HPS_Week43_PUF_CSV/pulse2022_puf_43.csv")

hps44 <- read.csv("./data/HPS_Week44_PUF_CSV/pulse2022_puf_44.csv")

hps45 <- read.csv("./data/HPS_Week45_PUF_CSV/pulse2022_puf_45.csv")

Filtering variables.

Code
names(hps34) <- tolower(names(hps34))

hps34 <- hps34 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps35) <- tolower(names(hps35))

hps35 <- hps35 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps36) <- tolower(names(hps36))

hps36 <- hps36 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps37) <- tolower(names(hps37))

hps37 <- hps37 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps38) <- tolower(names(hps38))

hps38 <- hps38 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps39) <- tolower(names(hps39))

hps39 <- hps39 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps40) <- tolower(names(hps40))

hps40 <- hps40 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps41) <- tolower(names(hps41))

hps41 <- hps41 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps42) <- tolower(names(hps42))

hps42 <- hps42 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps43) <- tolower(names(hps43))

hps43 <- hps43 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps44) <- tolower(names(hps44))

hps44 <- hps44 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

names(hps45) <- tolower(names(hps45))

hps45 <- hps45 %>%
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st)

Combining into one dataframe.

Code
hps <- rbind(hps34, hps35, hps36, hps37, hps38, hps39, hps40, hps41, hps42, hps43, hps44, hps45)

nrow(hps)
[1] 803905
Code
statedat <- read.csv("./data/state_policy.csv")

nrow(statedat)
[1] 51

Joining hps with state.

Code
alldata <- merge(x=hps, y=statedat, by='est_st', all.x=TRUE)

nrow(alldata)
[1] 803905

Selecting variables from alldata for a working dataframe.

Code
dat <- alldata %>% 
  dplyr::select(week, income, tenure, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, hweight, pweight, est_st, state, abbv, score)

nrow(dat)
[1] 803905

Cleaning and exploring the data

Filter for only renters

First I will filter for renters. This will exclude homeowners.

Check the distribution of tenure. Category “3” is renters.

Code
tabyl(dat$tenure)
 dat$tenure      n     percent
        -99   6498 0.008083045
        -88 117839 0.146583241
          1 194842 0.242369434
          2 316637 0.393873654
          3 159649 0.198591873
          4   8440 0.010498753

Filter for only renters.

Code
dat <- dat %>%
  filter(.$tenure == 3)

nrow(dat)
[1] 159649

Clean and recode mental health variables

Make dichotomized variables for depression and anxiety.

The HH Pulse variables for “anxious”, “worry”, “interest”, and “down” are scored from 1 to 4, but the clinical scale is from 0 to 3. Therefore, the variables must be adjusted to the 0 to 3 scale by subtracting 1.

First, eliminate NA’s and subtract 1 from each scale of anxious, worry, interest, and down.

Code
tabyl(dat$anxious)
 dat$anxious     n     percent
         -99   248 0.001553408
           1 50699 0.317565409
           2 53981 0.338123007
           3 21680 0.135797907
           4 33041 0.206960269
Code
dat <- dat %>%
  filter(anxious %in% c(1:4)) %>% 
  mutate(anxious = anxious - 1)

tabyl(dat$anxious)
 dat$anxious     n   percent
           0 50699 0.3180595
           1 53981 0.3386491
           2 21680 0.1360092
           3 33041 0.2072823
Code
nrow(dat)
[1] 159401
Code
tabyl(dat$worry)
 dat$worry     n     percent
       -99   230 0.001442902
         1 62608 0.392770434
         2 52143 0.327118400
         3 19014 0.119284070
         4 25406 0.159384195
Code
dat <- dat %>%
  filter(worry %in% c(1:4)) %>% 
  mutate(worry = worry -1)

tabyl(dat$worry)
 dat$worry     n   percent
         0 62608 0.3933380
         1 52143 0.3275911
         2 19014 0.1194564
         3 25406 0.1596145
Code
nrow(dat)
[1] 159171
Code
tabyl(dat$interest)
 dat$interest     n     percent
          -99   245 0.001539225
            1 67827 0.426126619
            2 51063 0.320805926
            3 20052 0.125977722
            4 19984 0.125550509
Code
dat <- dat %>%
  filter(interest %in% c(1:4)) %>% 
  mutate(interest = interest -1)

tabyl(dat$interest)
 dat$interest     n   percent
            0 67827 0.4267835
            1 51063 0.3213005
            2 20052 0.1261719
            3 19984 0.1257441
Code
nrow(dat)
[1] 158926
Code
tabyl(dat$down)
 dat$down     n      percent
      -99   140 0.0008809131
        1 66802 0.4203339919
        2 52857 0.3325887520
        3 17800 0.1120018122
        4 21327 0.1341945308
Code
dat <- dat %>%
  filter(down %in% c(1:4)) %>% 
  mutate(down = down -1)

tabyl(dat$down)
 dat$down     n   percent
        0 66802 0.4207046
        1 52857 0.3328820
        2 17800 0.1121006
        3 21327 0.1343128
Code
nrow(dat)
[1] 158786

Next, make variable “depression” consisting of the sum of “interest” and “down” and make a variable “anxiety” consisting of the sum of “anxious” and “worry.”

Code
dat <- dat %>% 
  mutate(depression = interest + down) %>%
  mutate(anxiety = anxious + worry)

tabyl(dat$depression)
 dat$depression     n    percent
              0 55979 0.35254368
              1 19857 0.12505511
              2 36999 0.23301173
              3 11297 0.07114607
              4 12797 0.08059275
              5  6625 0.04172282
              6 15232 0.09592785
Code
tabyl(dat$anxiety)
 dat$anxiety     n    percent
           0 45985 0.28960362
           1 18420 0.11600519
           2 37485 0.23607245
           3 12093 0.07615911
           4 14257 0.08978751
           5  8065 0.05079163
           6 22481 0.14158049
Code
hist(dat$depression)

Code
hist(dat$anxiety)

Next, dichotomize “depression” and “anxiety”. ‘phq2’ is the dichotomized depression variable. ‘gad2’ is the dichotomized anxiety variable.

Code
dat <- dat %>%
  mutate(phq2 = case_when(depression %in% c(0:2) ~ 0,
                          depression %in% c(3:6) ~1,
                          )
         )

tabyl(dat$phq2)
 dat$phq2      n   percent
        0 112835 0.7106105
        1  45951 0.2893895
Code
dat <- dat %>%
  mutate(gad2 = case_when(anxiety %in% c(0:2) ~ 0,
                          anxiety %in% c(3:6) ~1,
                          )
         )

tabyl(dat$gad2)
 dat$gad2      n   percent
        0 101890 0.6416813
        1  56896 0.3583187
Code
nrow(dat)
[1] 158786

Clean and recode ‘rentcur’

Then I will check the distribution of ‘rentcur’ and remove observations where ‘rentcur’ is missing. In the original dataset, 1 is “caught up on rent” and 2 is “Not caught up on rent.”

Then I will create a dummy variable for ‘rentcur’ so that the value of ‘1’ means ‘caught up on rent’ and the value of ‘0’ is ‘not caught up on rent’.

Checking distribution of rentcur.

Code
tabyl(dat$rentcur)
 dat$rentcur      n      percent
         -99    300 0.0018893353
         -88    149 0.0009383699
           1 140233 0.8831572053
           2  18104 0.1140150895

Removing missing values for rentcur.

Code
dat <- dat %>% 
  filter(.$rentcur %in% c(1:2))

Checking distribution of rentcur.

Code
tabyl(dat$rentcur)
 dat$rentcur      n   percent
           1 140233 0.8856616
           2  18104 0.1143384
Code
nrow(dat)
[1] 158337

Dichotomize rentcur. “Caught up on rent” = 1 and “Not caught up on rent” = 0

Code
dat$rentcur <- car::recode(dat$rentcur, "1=1; else=0")
tabyl(dat$rentcur)
 dat$rentcur      n   percent
           0  18104 0.1143384
           1 140233 0.8856616
Code
nrow(dat)
[1] 158337

Clean and recode ‘evict’

Checking distribution of evict.

  • evict: Eviction in next two months. 1 is very likely, 2 is somewhat likeley, 3 is not very likely, 4 is not likely at all, 88 and 99. Only asked if rentcur = 2.
Code
tabyl(dat$evict)
 dat$evict      n      percent
       -99    133 0.0008399805
       -88 140663 0.8883773218
         1   2489 0.0157196360
         2   4961 0.0313319060
         3   5241 0.0331002861
         4   4850 0.0306308696

This needs to be recoded so that 0 is current on rent (not at risk of eviction due to late on rent), 1 is not likely at all, 2 is not very likely, 3 is somewhat likely, and 4 is very likely.

Code ‘evict’ = 0 when rentcur = 1. Change other codes as well.

Code
dat <- dat %>%
  mutate(evict = case_when(rentcur == 1 ~ 0,
                           evict == 4 ~ 1,
                           evict == 3 ~ 2,
                           evict == 2 ~ 3,
                           evict == 1 ~ 4
                           )
         )

tabyl(dat$evict)
 dat$evict      n     percent valid_percent
         0 140233 0.885661595    0.88882199
         1   4850 0.030630870    0.03074017
         2   5241 0.033100286    0.03321840
         3   4961 0.031331906    0.03144371
         4   2489 0.015719636    0.01577573
        NA    563 0.003555707            NA

Filer out NAs.

Code
dat <- dat %>%
  filter(evict %in% c(0:4))

tabyl(dat$evict)
 dat$evict      n    percent
         0 140233 0.88882199
         1   4850 0.03074017
         2   5241 0.03321840
         3   4961 0.03144371
         4   2489 0.01577573
Code
nrow(dat)
[1] 157774

Dichotomizing eviction risk

Code
dat$evict0 <- car::recode(dat$evict, "0=1; else=0")
tabyl(dat$evict0)
 dat$evict0      n  percent
          0  17541 0.111178
          1 140233 0.888822
Code
dat$evict1 <- car::recode(dat$evict, "1=1; else=0")
tabyl(dat$evict1)
 dat$evict1      n    percent
          0 152924 0.96925983
          1   4850 0.03074017
Code
dat$evict2 <- car::recode(dat$evict, "2=1; else=0")
tabyl(dat$evict2)
 dat$evict2      n   percent
          0 152533 0.9667816
          1   5241 0.0332184
Code
dat$evict3 <- car::recode(dat$evict, "3=1; else=0")
tabyl(dat$evict3)
 dat$evict3      n    percent
          0 152813 0.96855629
          1   4961 0.03144371
Code
dat$evict4 <- car::recode(dat$evict, "4=1; else=0")
tabyl(dat$evict4)
 dat$evict4      n    percent
          0 155285 0.98422427
          1   2489 0.01577573

Check distribution of state policy variable

Code
tabyl(dat, score)
 score     n     percent
  0.00 14527 0.092074740
  0.23  3823 0.024230862
  0.30  1403 0.008892466
  0.38  4164 0.026392181
  0.53  2433 0.015420792
  0.60  2575 0.016320813
  0.63  1319 0.008360059
  0.75  1103 0.006991012
  0.78  3850 0.024401993
  0.83  5070 0.032134572
  1.03  2780 0.017620140
  1.15  2308 0.014628519
  1.23  2426 0.015376425
  1.28 11747 0.074454600
  1.33 16796 0.106456070
  1.43  2621 0.016612370
  1.58  7060 0.044747550
  1.84  2751 0.017436333
  1.98  1855 0.011757324
  2.08  2851 0.018070151
  2.29  1754 0.011117168
  2.30  1614 0.010229822
  2.48  1521 0.009640372
  2.65  1798 0.011396048
  2.73  4952 0.031386667
  2.78  1925 0.012200996
  3.00  2438 0.015452483
  3.08  3199 0.020275838
  3.13  3651 0.023140695
  3.15  2153 0.013646101
  3.25  4568 0.028952806
  3.33  3168 0.020079354
  3.38  3817 0.024192833
  3.73  2462 0.015604599
  3.75  6988 0.044291201
  3.78  2568 0.016276446
  3.80  2658 0.016846882
  3.88  1283 0.008131885
  4.03  4965 0.031469063
  4.30  3251 0.020605423
  4.63  3579 0.022684346

Clean up covariates

Clean income

  • income: Total household income (before taxes).
    1. Less than $25,000
    1. $25,000 - $34,999
    1. $35,000 - $49,999
    1. $50,000 - $74,999
    1. $75,000 - $99,999
    1. $100,000 - $149,999
    1. $150,000 - $199,999
    1. $200,000 and above
    1. Question seen but category not selected
    1. Missing / Did not report

Check distribution of income.

Code
tabyl(dat$income)
 dat$income     n    percent
        -99  1863 0.01180803
        -88  5550 0.03517690
          1 37181 0.23565987
          2 22896 0.14511897
          3 22670 0.14368654
          4 26638 0.16883644
          5 15727 0.09968056
          6 14206 0.09004018
          7  5614 0.03558254
          8  5429 0.03440998

Filter out rows with missing income data.

Code
dat <- dat %>% 
  filter(.$income %in% c(1:8))

tabyl(dat$income)
 dat$income     n    percent
          1 37181 0.24727822
          2 22896 0.15227353
          3 22670 0.15077048
          4 26638 0.17716030
          5 15727 0.10459494
          6 14206 0.09447929
          7  5614 0.03733681
          8  5429 0.03610644
Code
nrow(dat)
[1] 150361

Histogram of income levels.

Code
hist(dat$income)

Clean age

  • age: Derived from tbirth_year (year of birth)

Creating a new variable ‘age’ by subtracting birth year from the year the data were collected. (How can I fix this since the data were collected in 2 years?)

Code
dat$age <- 2021-dat$tbirth_year

tabyl(dat$age)
 dat$age    n      percent
      17   40 0.0002660264
      18  240 0.0015961586
      19  418 0.0027799762
      20  737 0.0049015370
      21 1068 0.0071029057
      22 1541 0.0102486682
      23 1970 0.0131018017
      24 2594 0.0172518140
      25 3059 0.0203443712
      26 3268 0.0217343593
      27 3481 0.0231509500
      28 3702 0.0246207461
      29 3764 0.0250330870
      30 3880 0.0258045637
      31 3767 0.0250530390
      32 3715 0.0247072047
      33 3676 0.0244478289
      34 3502 0.0232906139
      35 3327 0.0221267483
      36 3493 0.0232307580
      37 3401 0.0226188972
      38 3377 0.0224592813
      39 3456 0.0229846835
      40 3378 0.0224659320
      41 3318 0.0220668923
      42 3165 0.0210493413
      43 2944 0.0195795452
      44 2971 0.0197591131
      45 2770 0.0184223303
      46 2691 0.0178969281
      47 2606 0.0173316219
      48 2579 0.0171520541
      49 2621 0.0174313818
      50 2675 0.0177905175
      51 2786 0.0185287408
      52 2544 0.0169192809
      53 2401 0.0159682364
      54 2292 0.0152433144
      55 2387 0.0158751272
      56 2410 0.0160280924
      57 2404 0.0159881884
      58 2455 0.0163273721
      59 2457 0.0163406734
      60 2355 0.0156623061
      61 2403 0.0159815378
      62 2316 0.0154029303
      63 2218 0.0147511655
      64 2314 0.0153896290
      65 2125 0.0141326541
      66 2166 0.0144053312
      67 2083 0.0138533263
      68 1928 0.0128224739
      69 1761 0.0117118136
      70 1578 0.0104947427
      71 1464 0.0097365673
      72 1336 0.0088852828
      73 1286 0.0085527497
      74 1255 0.0083465792
      75  988 0.0065708528
      76  785 0.0052207687
      77  694 0.0046155586
      78  620 0.0041234097
      79  554 0.0036844661
      80  485 0.0032255705
      81  377 0.0025072991
      82  342 0.0022745260
      83  298 0.0019818969
      84  232 0.0015429533
      85  218 0.0014498440
      86  160 0.0010641057
      87  383 0.0025472031
      88  307 0.0020417528
Code
hist(dat$age)

Checking sample size.

Code
nrow(dat)
[1] 150361

Clean genid_describe

  • genid_describe: Current gender identity. 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’

Checking the distribution of gender identity.

Code
tabyl(dat$genid_describe)
 dat$genid_describe     n     percent
                -99   803 0.005340481
                  1 54773 0.364276641
                  2 91561 0.608941148
                  3  1029 0.006843530
                  4  2195 0.014598200

Dichotomizing genid_describe.

Code
dat$male <- car::recode(dat$genid_describe, "1=1; else=0") #gen1 = male
tabyl(dat$male)
 dat$male     n   percent
        0 95588 0.6357234
        1 54773 0.3642766
Code
dat$female <- car::recode(dat$genid_describe, "2=1; else=0") #gen2 = female
tabyl(dat$female)
 dat$female     n   percent
          0 58800 0.3910589
          1 91561 0.6089411
Code
dat$transgender <- car::recode(dat$genid_describe, "3=1; else=0") #gen3 = transgender
tabyl(dat$transgender)
 dat$transgender      n    percent
               0 149332 0.99315647
               1   1029 0.00684353
Code
dat$gen_none <- car::recode(dat$genid_describe, "4=1; else=0") #gen4 = None of these
tabyl(dat$gen_none)
 dat$gen_none      n   percent
            0 148166 0.9854018
            1   2195 0.0145982
Code
dat$gen_notsel <- car::recode(dat$genid_describe, "-99=1; else=0") #gen5 = not selected
tabyl(dat$gen_notsel)
 dat$gen_notsel      n     percent
              0 149558 0.994659519
              1    803 0.005340481

Checking sample size.

Code
nrow(dat)
[1] 150361

Clean race_eth

  • race_eth: Categorical variable derived from ‘rhispanic’ and ‘rrace’.

Combine race/ethnicity into a race_eth variable.

Code
tabyl(dat$rrace)
 dat$rrace      n    percent
         1 113187 0.75276834
         2  19087 0.12694116
         3   7974 0.05303237
         4  10113 0.06725813
Code
tabyl(dat$rhispanic)
 dat$rhispanic      n   percent
             1 131125 0.8720679
             2  19236 0.1279321
Code
dat <- dat %>%
  mutate(race_eth = case_when(.$rhispanic == 1 & .$rrace == 1 ~"nh_white",
                              .$rhispanic == 1 & .$rrace == 2 ~"nh_black",
                              .$rhispanic == 1 & .$rrace == 3 ~"nh_asian",
                              .$rhispanic == 1 & .$rrace == 4 ~"other",
                              .$rhispanic == 2 & .$rrace %in% c(1:4) ~ "hispanic"
                              )
        )

tabyl(dat$race_eth)
 dat$race_eth     n    percent
     hispanic 19236 0.12793211
     nh_asian  7446 0.04952082
     nh_black 17861 0.11878745
     nh_white 98063 0.65218374
        other  7755 0.05157587

Dichotomizing race and ethnicity.

Code
dat$hispanic <- car::recode(dat$race_eth, "'hispanic'=1; else=0")
tabyl(dat$hispanic)
 dat$hispanic      n   percent
            0 131125 0.8720679
            1  19236 0.1279321
Code
dat$nh_white <- car::recode(dat$race_eth, "'nh_white'=1; else=0")
tabyl(dat$nh_white)
 dat$nh_white     n   percent
            0 52298 0.3478163
            1 98063 0.6521837
Code
dat$nh_black <- car::recode(dat$race_eth, "'nh_black'=1; else=0")
tabyl(dat$nh_black)
 dat$nh_black      n   percent
            0 132500 0.8812125
            1  17861 0.1187875
Code
dat$nh_asian <- car::recode(dat$race_eth, "'nh_asian'=1; else=0")
tabyl(dat$nh_asian)
 dat$nh_asian      n    percent
            0 142915 0.95047918
            1   7446 0.04952082
Code
dat$other <- car::recode(dat$race_eth, "'other'=1; else=0")
tabyl(dat$other)
 dat$other      n    percent
         0 142606 0.94842413
         1   7755 0.05157587

Checking sample size.

Code
nrow(dat)
[1] 150361

Clean single_adult

  • single_adult: Dummy variable for single adult derived from ‘thhld_numadlt’.

Dichotomizing number of adults in household. Single adult is coded as “1”, multiple adults is coded as “0”.

Code
tabyl(dat$thhld_numadlt)
 dat$thhld_numadlt     n      percent
                 1 60088 0.3996249027
                 2 65608 0.4363365500
                 3 15704 0.1044419763
                 4  5988 0.0398241565
                 5  1889 0.0125630981
                 6   580 0.0038573832
                 7   206 0.0013700361
                 8    84 0.0005586555
                 9   110 0.0007315727
                10   104 0.0006916687
Code
dat$single_adult <- car::recode(dat$thhld_numadlt, "1=1; else=0")
tabyl(dat$single_adult)
 dat$single_adult     n   percent
                0 90273 0.6003751
                1 60088 0.3996249

Checking sample size.

Code
nrow(dat)
[1] 150361

Clean thhld_numper

  • thhld_numper: Total number of people in household

Distribution of number of people in household. I’m leaving this as an integer.

Code
tabyl(dat$thhld_numper)
 dat$thhld_numper     n     percent
                1 48881 0.325090948
                2 49287 0.327791116
                3 22746 0.151275929
                4 15835 0.105313213
                5  7643 0.050831000
                6  3340 0.022213207
                7  1301 0.008652510
                8   650 0.004322929
                9   236 0.001569556
               10   442 0.002939592

Checking sample size.

Code
nrow(dat)
[1] 150361

Clean child

  • child: Dummy variable derived from ‘thhld_numkid’

Check distribution of ‘thhld_numkid’.

Code
tabyl(dat$thhld_numkid)
 dat$thhld_numkid      n     percent
                0 106423 0.707783268
                1  21924 0.145809086
                2  13416 0.089225265
                3   5532 0.036791455
                4   1980 0.013168308
                5   1086 0.007222618

Dichotomizing presence of children. Children in household is coded as “1”, no children is “0”.

Code
dat$children <- car::recode(dat$thhld_numkid, "0=0; else=1")
tabyl(dat$children)
 dat$children      n   percent
            0 106423 0.7077833
            1  43938 0.2922167

Clean eeduc

  • eeduc: Educational attainment. 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.

For educational attainment, dichotomize into “less than high school”, “high school”, “some college”, “bachelor’s or higher”.

Code
tabyl(dat$eeduc)
 dat$eeduc     n    percent
         1  1527 0.01015556
         2  3418 0.02273196
         3 21486 0.14289610
         4 37721 0.25086957
         5 16022 0.10655689
         6 40541 0.26962444
         7 29646 0.19716549

Dichotomizing educational attainment.

Code
dat$lessthanhs <- car::recode(dat$eeduc, "1=1; 2=1; else=0")
tabyl(dat$lessthanhs)
 dat$lessthanhs      n    percent
              0 145416 0.96711248
              1   4945 0.03288752
Code
dat$highschool <- car::recode(dat$eeduc, "3=1; else=0")
tabyl(dat$highschool)
 dat$highschool      n   percent
              0 128875 0.8571039
              1  21486 0.1428961
Code
dat$somecollege <- car::recode(dat$eeduc, "4=1; 5=1; else=0")
tabyl(dat$somecollege)
 dat$somecollege     n   percent
               0 96618 0.6425735
               1 53743 0.3574265
Code
dat$bachelors <- car::recode(dat$eeduc, "6=1; 7=1; else=0")
tabyl(dat$bachelors)
 dat$bachelors     n   percent
             0 80174 0.5332101
             1 70187 0.4667899

Checking sample size.

Code
nrow(dat)
[1] 150361

Visualizations

Table of phq2 by rentcur.

Code
tabyl(dat, rentcur, phq2)
 rentcur     0     1
       0  8767  7901
       1 97939 35754

Table of gad2 by rentcur.

Code
tabyl(dat, rentcur, gad2)
 rentcur     0     1
       0  7395  9273
       1 88855 44838

Table of phq2 by evict.

Code
tabyl(dat, evict, phq2)
 evict     0     1
     0 97939 35754
     1  3272  1274
     2  2726  2291
     3  2016  2729
     4   753  1607

Table of gad2 by evict.

Code
tabyl(dat, evict, gad2)
 evict     0     1
     0 88855 44838
     1  2959  1587
     2  2295  2722
     3  1560  3185
     4   581  1779

Checking count of observations in each state.

Code
tabyl(dat, state)
                state     n     percent
              Alabama  1489 0.009902834
               Alaska  2302 0.015309821
              Arizona  3650 0.024274912
             Arkansas  1679 0.011166459
           California 16018 0.106530284
             Colorado  3662 0.024354720
          Connecticut  2423 0.016114551
             Delaware  1199 0.007974142
 District of Columbia  3422 0.022758561
              Florida  5191 0.034523580
              Georgia  3199 0.021275464
               Hawaii  2038 0.013554047
                Idaho  2370 0.015762066
             Illinois  3072 0.020430830
              Indiana  2181 0.014505091
                 Iowa  1839 0.012230565
               Kansas  2499 0.016620001
             Kentucky  1712 0.011385931
            Louisiana  1560 0.010375031
                Maine  1523 0.010128956
             Maryland  3131 0.020823219
        Massachusetts  4696 0.031231503
             Michigan  3010 0.020018489
            Minnesota  2544 0.016919281
          Mississippi  1023 0.006803626
             Missouri  2318 0.015416232
              Montana  1689 0.011232966
             Nebraska  2455 0.016327372
               Nevada  3081 0.020490686
        New Hampshire  2353 0.015649005
           New Jersey  2684 0.017850373
           New Mexico  2670 0.017757264
             New York  4395 0.029229654
       North Carolina  2614 0.017384827
         North Dakota  1318 0.008765571
                 Ohio  2244 0.014924083
             Oklahoma  2054 0.013660457
               Oregon  4750 0.031590639
         Pennsylvania  3484 0.023170902
         Rhode Island  1745 0.011605403
       South Carolina  1524 0.010135607
         South Dakota  1278 0.008499544
            Tennessee  2317 0.015409581
                Texas  8053 0.053557771
                 Utah  3340 0.022213207
              Vermont  1446 0.009616855
             Virginia  3676 0.024447829
           Washington  6744 0.044852056
        West Virginia  1246 0.008286723
            Wisconsin  2347 0.015609101
              Wyoming  1104 0.007342329

Lets see how ‘evict’ is distributed across states.

Code
dat %>%
  tabyl(., state, evict)
                state     0   1   2   3   4
              Alabama  1257  48  69  67  48
               Alaska  2044  64  81  79  34
              Arizona  3362  61  91  96  40
             Arkansas  1417  50  85  80  47
           California 14295 547 477 460 239
             Colorado  3403  58  90  76  35
          Connecticut  2088 106 107  81  41
             Delaware  1030  47  59  43  20
 District of Columbia  3127 111  92  69  23
              Florida  4554 184 174 169 110
              Georgia  2749  83 119 137 111
               Hawaii  1832  68  69  55  14
                Idaho  2196  36  66  54  18
             Illinois  2660 116 125 112  59
              Indiana  1924  71  91  65  30
                 Iowa  1639  51  76  57  16
               Kansas  2263  81  62  59  34
             Kentucky  1500  43  61  71  37
            Louisiana  1283  52  68 101  56
                Maine  1380  38  60  35  10
             Maryland  2675 134 133 118  71
        Massachusetts  4199 188 144 112  53
             Michigan  2611 114 125 109  51
            Minnesota  2325  72  67  55  25
          Mississippi   820  40  57  72  34
             Missouri  2059  51 102  64  42
              Montana  1546  43  44  39  17
             Nebraska  2231  61  65  70  28
               Nevada  2762  79  74 113  53
        New Hampshire  2157  60  68  48  20
           New Jersey  2338 113 113  81  39
           New Mexico  2368  78  85  98  41
             New York  3766 211 209 138  71
       North Carolina  2317  74  88  88  47
         North Dakota  1204  25  29  40  20
                 Ohio  1997  50  77  78  42
             Oklahoma  1749  60  90 105  50
               Oregon  4291  96 137 157  69
         Pennsylvania  3036 135 130 116  67
         Rhode Island  1519  70  66  68  22
       South Carolina  1295  54  59  82  34
         South Dakota  1152  35  47  31  13
            Tennessee  2019  65  93  93  47
                Texas  7098 241 247 310 157
                 Utah  3138  61  55  52  34
              Vermont  1343  31  41  19  12
             Virginia  3315 125  98  98  40
           Washington  6177 147 188 154  78
        West Virginia  1070  36  51  67  22
            Wisconsin  2127  56  71  67  26
              Wyoming   986  26  42  37  13

Variation across states

The key thing for the model is to seee how much variation is there across states.

Scatterplot - aggregating values by state and graphing it. Calculate percent on phq2 and gad2. First do the mean on the raw HH Pulse mental health score. For each state you have the score on the eviction policy. State eviction score is the unit of analysis on X axis score. On the Y axis you have mean mental health score by state or percent with bad mental health by state.

Here I create a new dataframe that takes the data in dat and groups them by state. Then I find the mean mental health number for each state. I create a dataframe that contains the state and the mean mental health score of renters who are late on rent in each state.

Code
statemh <- dat %>%
  group_by(abbv) %>%
  summarize(mean_mh = mean(anxious+worry+interest+down), mh_n=n())

statedat <- merge(x=statemh, y=statedat, by='abbv', all.x=TRUE)

nrow(statedat)
[1] 51
Code
ggplot(statedat, aes(x=score, y=mean_mh, label=abbv)) +
    geom_point() +
    geom_text(hjust=-0.3, vjust=0.5) +
    geom_smooth(method=lm) +
    labs(title = "Mean Mental Health of All Renters by State",
          x = "State COVID Housing Policy Score",
          y = "Mean Mental Health")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Pearson correlation test between State COVID-19 Score and the mean mental health among all renters.

Code
test <- cor.test(statedat$score, statedat$mean_mh, method = "spearman")
Warning in cor.test.default(statedat$score, statedat$mean_mh, method =
"spearman"): Cannot compute exact p-value with ties
Code
test

    Spearman's rank correlation rho

data:  statedat$score and statedat$mean_mh
S = 29127, p-value = 0.02299
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho 
-0.3179432 

Here I am finding the percent with depression.

Code
statebmh <- dat %>%
  group_by(abbv) %>%
  summarize(dep_pc = mean(phq2), dmh_n=n())

statedat <- merge(x=statebmh, y=statedat, by='abbv', all.x=TRUE)

nrow(statedat)
[1] 51

Here I am finding the percent with anxiety

Code
statebmh <- dat %>%
  group_by(abbv, evict) %>%
  summarize(anx_pc = mean(gad2), mscore=mean(score), amh_n=n())
`summarise()` has grouped output by 'abbv'. You can override using the
`.groups` argument.
Code
statedat <- merge(x=statebmh, y=statedat, by='abbv', all.x=TRUE)

nrow(statedat)
[1] 255
Code
library(ggpubr)
library(ggrepel)

statebmh$evict <- as.factor(statebmh$evict)

ggplot(data=statebmh, aes(x = mscore, y = anx_pc, color=evict)) + 
  geom_point() +
  stat_smooth(method = "lm", size = 1) +
  stat_regline_equation(aes(label = ..rr.label..)) +
  geom_text_repel(aes(anx_pc, mscore, label = abbv, color=evict), size = 6)+guides(color=F)+
  theme_bw(base_size = 18) +
  labs(title="Marriageable Male Effect for States Differing in Conservatism", 
       subtitle="US-Born Men Ages 18 to 27", 
       caption="Source: 2012 ACS 5-Year File") +  
 xlab(label="# of Employed Men Per 100 Men in the Unmarried Population Ages 18 to 27") +
  ylab(label="Standardized Early Marriage Ratio") +
  ylim(0, 1)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
Warning: The dot-dot notation (`..rr.label..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(rr.label)` instead.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 165 rows containing missing values (`geom_text_repel()`).
Warning: ggrepel: 80 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Percent of Tenants with Depression by State

Code
ggplot(statedat, aes(x=score, y=dep_pc, label=abbv)) +
    geom_point() +
    geom_text(hjust=-0.3, vjust=0.5) +
    geom_smooth(method=lm) +
    labs(title = "Percent of Tenants with Depression",
          x = "State COVID Housing Policy Score",
          y = "Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Percent of Tenants with Anxiety by State

Code
ggplot(statedat, aes(x=score, y=anx_pc, label=abbv)) +
    geom_point() +
    geom_text(hjust=-0.3, vjust=0.5) +
    geom_smooth(method=lm) +
    labs(title = "Percent of Tenants with Anxiety",
          x = "State COVID Housing Policy Score",
          y = "Percent with Depression")
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Pearson correlation test between State COVID-19 Score and the percent with depression among all renters.

Code
test2 <- cor.test(statedat$score, statedat$dep_pc, method = "spearman")
Warning in cor.test.default(statedat$score, statedat$dep_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test2

    Spearman's rank correlation rho

data:  statedat$score and statedat$dep_pc
S = 3922021, p-value = 2.832e-12
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
-0.419212 

Pearson correlation test between State COVID-19 Score and the percent with anxiety among all renters.

Code
test3 <- cor.test(statedat$score, statedat$anx_pc, method = "spearman")
Warning in cor.test.default(statedat$score, statedat$anx_pc, method =
"spearman"): Cannot compute exact p-value with ties
Code
test3

    Spearman's rank correlation rho

data:  statedat$score and statedat$anx_pc
S = 2993794, p-value = 0.1847
alternative hypothesis: true rho is not equal to 0
sample estimates:
        rho 
-0.08332638 

Plot of risk of eviction and depression.

Code
ggplot(dat, aes(evict, depression)) + 
  geom_boxplot(aes(group = evict)) +
  geom_smooth(method=lm) +
    labs(title = "Eviction Risk and Depression",
          x = "Risk of Eviction",
          y = "Depression")
`geom_smooth()` using formula = 'y ~ x'

Plot of risk of eviction and anxiety

Code
ggplot(dat, aes(evict, anxiety)) + 
  geom_boxplot(aes(group = evict)) +
  geom_smooth(method=lm) +
    labs(title = "Eviction Risk and Anxiety",
          x = "Risk of Eviction",
          y = "Anxiety")
`geom_smooth()` using formula = 'y ~ x'

Code
ggplot(dat, aes(score, evict)) + 
  geom_boxplot(aes(group = evict)) +
  geom_smooth(method=lm) 
`geom_smooth()` using formula = 'y ~ x'

Code
    #labs(title = "Eviction Risk and Anxiety",
          #x = "Risk of Eviction",
          #y = "Anxiety")
Code
#dat$phq2 <- as.numeric(dat$phq2)

#dat$evict  <- as.numeric(dat$evict)

test4 <- cor.test(dat$evict, dat$depression, method = "spearman")
Warning in cor.test.default(dat$evict, dat$depression, method = "spearman"):
Cannot compute exact p-value with ties
Code
test4

    Spearman's rank correlation rho

data:  dat$evict and dat$depression
S = 4.7599e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.1598694 
Code
test5 <- cor.test(dat$evict, dat$anxiety, method = "spearman")
Warning in cor.test.default(dat$evict, dat$anxiety, method = "spearman"):
Cannot compute exact p-value with ties
Code
test5

    Spearman's rank correlation rho

data:  dat$evict and dat$anxiety
S = 4.7278e+14, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
     rho 
0.165534 

Modeling

Plan for modeling.

Population: All renters in the HH Pulse Survey phases 3.2 to 3.4.

Dependent variables

phq2 = Dummy variable for depression. Sum ‘interest’ and ‘down’. Score of 3 or higher indicates major depressive disorder. Dichotomized.

gad2 = Dummy variable for anxiety. Sum ‘anxious’ and ‘worry’. Score of 3 or higher indicative of generalized anxiety disorder. Dichotomized.

Independent variables

  • rentcur: Caught up on rent. Dichotomized so that “Caught up on rent” = 1 and “Not caught up on rent” = 0. NAs removed.

  • evict: Risk of eviction in next two months. 0 is caught up on rent, 1 is not likely at all, 2 is not very likely, 3 is somewhat likeley, 4 is very likely. NAs removed.

  • dichotomized: evict0, evict1, evict2, evict3, evict4

Covariates

  • income: categorical income variable

    1. Less than $25,000
    1. $25,000 - $34,999
    1. $35,000 - $49,999
    1. $50,000 - $74,999
    1. $75,000 - $99,999
    1. $100,000 - $149,999
    1. $150,000 - $199,999
    1. $200,000 and above
  • age: integer: numerical age in years

  • genid_describe: categorical: 1 is Male, 2 is Female, 3 is Transgender, 4 is none of these. 99 is ‘question seen but category not selected.’

  • dichotomized: male, female, transgender, gen_none, gen_notsel

  • race_eth: categorical: hispanic, nh_white, nh_black, nh_asian, other

  • dichotomized: hispanic, nh_white, nh_black, nh_asian, other

  • single_adult: dichotomized 1 = singe adult or 0 = mutiple adults in household

  • thhld_numper: integer: number of people in household from 1 to 10.

  • children: dichotomized: 1 = children in household, 0 = no children in household

  • eeduc: 1 is less than high school, 2 is some high school, 3 is high school graduate, 4 is some college but no degree, 5 is associates degree, 6 is bachelor’s degree, 7 is graduate degree.

  • dichotomized: lessthanhs, highschool, somecollege, bachelors

glmer models

DV is depression, IV is evict risk and state score.

Code
m <- glmer(phq2 ~ factor(evict) + score +
    (1 | abbv), data = dat, family = binomial)

summary(m)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: phq2 ~ factor(evict) + score + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
176544.5 176613.9 -88265.3 176530.5   150354 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7420 -0.6229 -0.5766  1.1417  1.9222 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01744  0.1321  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -0.90834    0.03246 -27.987  < 2e-16 ***
factor(evict)1  0.08671    0.03345   2.592  0.00955 ** 
factor(evict)2  0.83823    0.02892  28.984  < 2e-16 ***
factor(evict)3  1.30674    0.02994  43.646  < 2e-16 ***
factor(evict)4  1.76858    0.04401  40.189  < 2e-16 ***
score          -0.03946    0.01402  -2.815  0.00488 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) fct()1 fct()2 fct()3 fct()4
factr(vct)1 -0.034                            
factr(vct)2 -0.044  0.039                     
factr(vct)3 -0.042  0.038  0.048              
factr(vct)4 -0.033  0.024  0.031  0.027       
score       -0.795 -0.002  0.003  0.003  0.008

DV is depression, IV is evict risk and state score and evict*score

Code
m <- glmer(phq2 ~ factor(evict) + score + factor(evict)*score +
    (1 | abbv), data = dat, family = binomial)

summary(m)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: phq2 ~ factor(evict) + score + factor(evict) * score + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
176550.0 176659.1 -88264.0 176528.0   150350 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7289 -0.6229 -0.5761  1.1439  1.9240 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01744  0.132   
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -0.906403   0.032385 -27.988  < 2e-16 ***
factor(evict)1        0.095360   0.058734   1.624  0.10446    
factor(evict)2        0.855964   0.050042  17.105  < 2e-16 ***
factor(evict)3        1.245644   0.050496  24.668  < 2e-16 ***
factor(evict)4        1.742843   0.070256  24.807  < 2e-16 ***
score                -0.040447   0.014065  -2.876  0.00403 ** 
factor(evict)1:score -0.004267   0.024745  -0.172  0.86308    
factor(evict)2:score -0.009214   0.021464  -0.429  0.66770    
factor(evict)3:score  0.033156   0.022457   1.476  0.13984    
factor(evict)4:score  0.014389   0.032409   0.444  0.65706    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) fct()1 fct()2 fct()3 fct()4 score  fc()1: fc()2: fc()3:
factr(vct)1 -0.053                                                        
factr(vct)2 -0.069  0.042                                                 
factr(vct)3 -0.072  0.045  0.050                                          
factr(vct)4 -0.040  0.036  0.040  0.035                                   
score       -0.796  0.041  0.055  0.058  0.030                            
fctr(vct)1:  0.042 -0.825 -0.035 -0.038 -0.031 -0.053                     
fctr(vct)2:  0.055 -0.034 -0.816 -0.040 -0.031 -0.067  0.042              
fctr(vct)3:  0.057 -0.036 -0.039 -0.806 -0.028 -0.068  0.043  0.046       
fctr(vct)4:  0.030 -0.026 -0.029 -0.027 -0.782 -0.037  0.033  0.032  0.032

Polr model, DV is evict risk, IV is and state score.

Code
m <- polr(factor(evict) ~ score,
             data = dat, Hess = T)

summary(m)
Call:
polr(formula = factor(evict) ~ score, data = dat, Hess = T)

Coefficients:
         Value Std. Error t value
score -0.03719   0.006134  -6.063

Intercepts:
    Value    Std. Error t value 
0|1   2.0098   0.0143   140.1055
1|2   2.3617   0.0151   156.4628
2|3   2.9317   0.0169   173.5659
3|4   4.0667   0.0238   170.7363

Residual Deviance: 149713.36 
AIC: 149723.36 
Code
m.coef <- data.frame(coef(summary(m)))
m.coef$pval = round((pnorm(abs(m.coef$t.value), lower.tail = FALSE) * 2),2)
m.coef
            Value  Std..Error    t.value pval
score -0.03718867 0.006134154  -6.062559    0
0|1    2.00978871 0.014344823 140.105504    0
1|2    2.36173926 0.015094575 156.462789    0
2|3    2.93174154 0.016891234 173.565860    0
3|4    4.06665554 0.023818335 170.736349    0
Code
cbind(Estimate=round(coef(m),4),
      OR=round(exp(coef(m)),4))
      Estimate     OR
score  -0.0372 0.9635

To do:

  • fix scatterplot showing state scores but grouped by eviction risk (have to think about this). I need to find the percent of eviction risk by state. The variable is dichotomized, so I can find the mean for each state. But what does that variable mean?

  • run models!

mean eviction risk by score DV, then add score, then add evict risk

DV is depression, checking variation by state.

Code
# Running Generalized linear mixed model (GLMM).  Checking variance of depression by state.

m1 <- glmer(phq2 ~ 
             (1 | abbv), data = dat, family = binomial)

summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: phq2 ~ (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
180666.9 180686.7 -90331.4 180662.9   150359 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.7735 -0.6475 -0.6060  1.4627  1.8419 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.0224   0.1497  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.86299    0.02191  -39.38   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

DV is depression, IV is state score

Code
# Outcome variable is depression and focus variable is state housing score.

m2 <- glmer(phq2 ~ score +
             (1 | abbv), data = dat, family = binomial)

summary(m2)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: phq2 ~ score + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
180659.3 180689.1 -90326.7 180653.3   150358 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.7730 -0.6462 -0.6052  1.4615  1.8468 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01827  0.1352  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.77732    0.03283 -23.679   <2e-16 ***
score       -0.04654    0.01426  -3.265   0.0011 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
score -0.795

DV is depression, IV is state score and eviction risk.

Code
# Outcome variable is depression and focus variables are state housing score and risk of eviction.

m3 <- glmer(phq2 ~ score + factor(evict) +
             (1 | abbv), data = dat, family = binomial)

summary(m3)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: phq2 ~ score + factor(evict) + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
176544.5 176613.9 -88265.3 176530.5   150354 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7420 -0.6229 -0.5766  1.1417  1.9222 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01744  0.1321  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -0.90834    0.03254 -27.915  < 2e-16 ***
score          -0.03946    0.01406  -2.807  0.00500 ** 
factor(evict)1  0.08671    0.03363   2.579  0.00992 ** 
factor(evict)2  0.83823    0.02894  28.966  < 2e-16 ***
factor(evict)3  1.30674    0.02987  43.743  < 2e-16 ***
factor(evict)4  1.76857    0.04475  39.522  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) score  fct()1 fct()2 fct()3
score       -0.796                            
factr(vct)1 -0.030 -0.006                     
factr(vct)2 -0.040 -0.001  0.044              
factr(vct)3 -0.045  0.006  0.040  0.045       
factr(vct)4 -0.029  0.003  0.029  0.030  0.026

DV is anxiety, checking variance of anxiety by state.

Code
# Checking variance of anxiety by state.

m4 <- glmer(gad2 ~ 
             (1 | abbv), data = dat, family = binomial)

summary(m4)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gad2 ~ (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
195998.7 196018.6 -97997.4 195994.7   150359 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8830 -0.7606 -0.7016  1.2813  1.5215 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01958  0.1399  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.55000    0.02049  -26.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

DV is anxiety, IV is state score.

Code
# Outcome variable is anxiety and focus variable is state housing score.

m5 <- glmer(gad2 ~ score +
             (1 | abbv), data = dat, family = binomial)

summary(m5)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gad2 ~ score + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
195995.6 196025.3 -97994.8 195989.6   150358 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8846 -0.7611 -0.7018  1.2835  1.5226 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.0175   0.1323  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.48963    0.03232 -15.151   <2e-16 ***
score       -0.03273    0.01398  -2.342   0.0192 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
score -0.798

DV is anxiety, IV is state score and eviction risk.

Code
# Outcome variable is anxiety and focus variables are state housing score and risk of eviction.

m6 <- glmer(gad2 ~ score + factor(evict) +
             (1 | abbv), data = dat, family = binomial)

summary(m6)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gad2 ~ score + factor(evict) + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
191564.0 191633.4 -95775.0 191550.0   150354 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.0263 -0.7299 -0.6706  1.3243  1.6227 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01732  0.1316  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -0.61512    0.03218 -19.115  < 2e-16 ***
score          -0.02572    0.01390  -1.850  0.06434 .  
factor(evict)1  0.08285    0.03156   2.625  0.00867 ** 
factor(evict)2  0.86104    0.02862  30.087  < 2e-16 ***
factor(evict)3  1.39889    0.03136  44.612  < 2e-16 ***
factor(evict)4  1.80915    0.04747  38.108  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) score  fct()1 fct()2 fct()3
score       -0.796                            
factr(vct)1 -0.028 -0.006                     
factr(vct)2 -0.037  0.000  0.038              
factr(vct)3 -0.039  0.006  0.032  0.038       
factr(vct)4 -0.028  0.007  0.024  0.029  0.032
Code
# Outcome variable is anxiety and focus variables are state housing score and risk of eviction.

m7 <- glmer(phq2 ~ score + factor(evict) +
             (1 | abbv), data = dat, family = binomial)

summary(m7)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: phq2 ~ score + factor(evict) + (1 | abbv)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
176544.5 176613.9 -88265.3 176530.5   150354 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7420 -0.6229 -0.5766  1.1417  1.9222 

Random effects:
 Groups Name        Variance Std.Dev.
 abbv   (Intercept) 0.01744  0.1321  
Number of obs: 150361, groups:  abbv, 51

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -0.90834    0.03254 -27.915  < 2e-16 ***
score          -0.03946    0.01406  -2.807  0.00500 ** 
factor(evict)1  0.08671    0.03363   2.579  0.00992 ** 
factor(evict)2  0.83823    0.02894  28.966  < 2e-16 ***
factor(evict)3  1.30674    0.02987  43.743  < 2e-16 ***
factor(evict)4  1.76857    0.04475  39.522  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) score  fct()1 fct()2 fct()3
score       -0.796                            
factr(vct)1 -0.030 -0.006                     
factr(vct)2 -0.040 -0.001  0.044              
factr(vct)3 -0.045  0.006  0.040  0.045       
factr(vct)4 -0.029  0.003  0.029  0.030  0.026

Commenting these blocks out for now because they take a long time to run. Outcome variable is depression and input variables are all variables.

m7 <- glmer(phq2 ~ score + factor(evict) + factor(income) + age + factor(genid_describe) + factor(race_eth) + single_adult + thhld_numper + children + factor(eeduc) + (1 | abbv), data = dat, family = binomial)

summary(m7)

Outcome variable is anxiety and input variables are all variables.

m8 <- glmer(gad2 ~ score + factor(evict) + factor(income) + age + factor(genid_describe) + factor(race_eth) + single_adult + thhld_numper + children + factor(eeduc) + (1 | abbv), data = dat, family = binomial)

summary(m8)

Eviction risk as the DV.

Proportion maps

Could you determine with your file the percent of respondents in each state who say they are very or somewhat likely to leave their place in the next two months due to eviction (among all respondents)? Question - should this be among homeowners and renters? The “evict” variable only applies to renters. There is a separate question for homeowners if they will be foreclosed on in the next two months.

Code
# Determining the percent of respondents in each state who say they are very or somewhat likely to leave their place in the next two months due to eviction (among all respondents).

alldata <- alldata %>% 
  mutate(highrisk = case_when(evict %in% c(3:4) ~ 1,
                              TRUE ~ 0))

# Create a dataframe called state_highrisk to create a map.

all_state_highrisk <- alldata %>%   
  group_by(abbr = abbv) %>% 
  summarize(highrisk = 100*mean(highrisk))

all_state_highrisk
# A tibble: 51 × 2
   abbr  highrisk
   <chr>    <dbl>
 1 AK       1.18 
 2 AL       1.24 
 3 AR       1.40 
 4 AZ       0.779
 5 CA       1.74 
 6 CO       0.746
 7 CT       1.50 
 8 DC       2.08 
 9 DE       1.22 
10 FL       1.32 
# ℹ 41 more rows
Code
# Plot a US map
# Get centroids
library(usmapdata)
library(usmap)

Attaching package: 'usmap'
The following object is masked from 'package:usmapdata':

    us_map
Code
centroid_labels <- usmapdata::centroid_labels("states")


# Join data to centroids
data_labels <- merge(centroid_labels, all_state_highrisk, by = "abbr")

map1dat <- data_labels %>%
  dplyr::select(fips, highrisk)

plot_usmap(data = map1dat, values = "highrisk")+
labs(title="Percent of all respondents in each state who are very or somewhat likely to be evicted in next two months.", 
       subtitle="Highrisk", 
        caption = "Source: Household Pulse, COVID 19 Housing Policy Score") +
    theme(panel.background = element_rect(colour = "black"))+
    scale_fill_continuous(low = "white", high ="darkred", 
                          name = "Value on Index",label = scales::comma) + 
    theme(legend.position = "right") +
    guides(fill = "none") +
  geom_text(data = data_labels, ggplot2::aes(
    x = x, y = y,
    label = scales::number(highrisk, scale = 1, accuracy = .1)), color = "black")

Code
# Determining the percent of renters in each state who say they are very or somewhat likely to leave their place in the next two months due to eviction (among all respondents).

# Create a dichotomous variable called "highrisk" for renters who are very or somewhat likely to be evicted in the next 2 months.

dat <- dat %>% 
  mutate(highrisk = case_when(evict %in% c(3:4) ~ 1,
                              TRUE ~ 0))

# Create a dataframe called state_highrisk to create a map.

state_highrisk <- dat %>%   
  group_by(abbr = abbv) %>% 
  summarize(highrisk = 100*mean(highrisk))

state_highrisk
# A tibble: 51 × 2
   abbr  highrisk
   <chr>    <dbl>
 1 AK        4.91
 2 AL        7.72
 3 AR        7.56
 4 AZ        3.73
 5 CA        4.36
 6 CO        3.03
 7 CT        5.04
 8 DC        2.69
 9 DE        5.25
10 FL        5.37
# ℹ 41 more rows
Code
# Join data to centroids
data_labels <- merge(centroid_labels, state_highrisk, by = "abbr")

map1dat <- data_labels %>%
  dplyr::select(fips, highrisk)

plot_usmap(data = map1dat, values = "highrisk")+
labs(title="Percent of renters in each state who are very or somewhat likely to be evicted in next two months.", 
       subtitle="Highrisk", 
        caption = "Source: Household Pulse, COVID 19 Housing Policy Score") +
    theme(panel.background = element_rect(colour = "black"))+
    scale_fill_continuous(low = "white", high ="darkred", 
                          name = "Value on Index",label = scales::comma) + 
    theme(legend.position = "right") +
    guides(fill = "none") +
  geom_text(data = data_labels, ggplot2::aes(
    x = x, y = y,
    label = scales::number(highrisk, scale = 1, accuracy = .1)), color = "black")

In addition, determine these percentages:

The percent of respondents in each state who are renting.

Code
# I'll have to use alldata for this one.

# Category 3 is rented, category 4 is occupied without payment of rent.  What to do about category 4?

tabyl(alldata$tenure)
 alldata$tenure      n     percent
            -99   6498 0.008083045
            -88 117839 0.146583241
              1 194842 0.242369434
              2 316637 0.393873654
              3 159649 0.198591873
              4   8440 0.010498753
Code
alldata <- alldata %>%
  mutate(renting = case_when(tenure == 3 ~ 1,
                             TRUE ~ 0))

state_renting <- alldata %>% 
  group_by(abbr = abbv) %>% 
  summarize(Renters = 100*mean(renting))

state_renting
# A tibble: 51 × 2
   abbr  Renters
   <chr>   <dbl>
 1 AK       18.5
 2 AL       15.5
 3 AR       17.6
 4 AZ       18.6
 5 CA       27.4
 6 CO       18.7
 7 CT       17.9
 8 DC       35.2
 9 DE       13.6
10 FL       19.1
# ℹ 41 more rows
Code
# Join data to centroids
data_labels <- merge(centroid_labels, state_renting, by = "abbr")

map1dat <- data_labels %>%
  dplyr::select(fips, Renters)

plot_usmap(data = map1dat, values = "Renters")+
labs(title="Percent of respondents who are renting.", 
       subtitle="Renting", 
        caption = "Source: Household Pulse, COVID 19 Housing Policy Score") +
    theme(panel.background = element_rect(colour = "black"))+
    scale_fill_continuous(low = "white", high ="darkred", 
                          name = "Value on Index",label = scales::comma) + 
    theme(legend.position = "right") +
    guides(fill = "none") +
  geom_text(data = data_labels, ggplot2::aes(
    x = x, y = y,
    label = scales::number(Renters, scale = 1, accuracy = .1)), color = "black")

The percent of respondents in each state who are not caught upon rent among those who are renting.

Code
# For this, I am only using category 3, renters.

tabyl(dat$rentcur)
 dat$rentcur      n   percent
           0  16668 0.1108532
           1 133693 0.8891468
Code
late_renters <- dat %>% 
  group_by(abbr = abbv) %>% 
  summarize(late_renters = 100*(1-mean(rentcur)))

late_renters
# A tibble: 51 × 2
   abbr  late_renters
   <chr>        <dbl>
 1 AK           11.2 
 2 AL           15.6 
 3 AR           15.6 
 4 AZ            7.89
 5 CA           10.8 
 6 CO            7.07
 7 CT           13.8 
 8 DC            8.62
 9 DE           14.1 
10 FL           12.3 
# ℹ 41 more rows
Code
# Join data to centroids
data_labels <- merge(centroid_labels, late_renters, by = "abbr")

map1dat <- data_labels %>%
  dplyr::select(fips, late_renters)

plot_usmap(data = map1dat, values = "late_renters")+
labs(title="Percent of renters who are late on rent", 
       subtitle="Late Renters", 
        caption = "Source: Household Pulse, COVID 19 Housing Policy Score") +
    theme(panel.background = element_rect(colour = "black"))+
    scale_fill_continuous(low = "white", high ="darkred", 
                          name = "Value on Index",label = scales::comma) + 
    theme(legend.position = "right") +
    guides(fill = "none") +
  geom_text(data = data_labels, ggplot2::aes(
    x = x, y = y,
    label = scales::number(late_renters, scale = 1, accuracy = .1)), color = "black")

The percent of respondents in each state who are very or somewhat likely to leave their place in the next two months due to eviction among those who are not caught up on rent.

Code
# Limit the data to those who are not caught up on rent.

not_rentcur <- dat %>% 
  dplyr::select(abbv, rentcur, evict) %>% 
  filter(rentcur == 0) %>%
  mutate(highrisk = case_when(evict %in% c(3:4) ~ 1,
                               TRUE ~ 0))

not_rentcur <- not_rentcur %>% 
  group_by(abbr = abbv) %>% 
  summarize(highrisk = 100*mean(highrisk))

not_rentcur
# A tibble: 51 × 2
   abbr  highrisk
   <chr>    <dbl>
 1 AK        43.8
 2 AL        49.6
 3 AR        48.5
 4 AZ        47.2
 5 CA        40.6
 6 CO        42.9
 7 CT        36.4
 8 DC        31.2
 9 DE        37.3
10 FL        43.8
# ℹ 41 more rows
Code
# Join data to centroids
data_labels <- merge(centroid_labels, not_rentcur, by = "abbr")

map1dat <- data_labels %>%
  dplyr::select(fips, highrisk)

plot_usmap(data = map1dat, values = "highrisk")+
labs(title="Perent of late renters who are very or somewhat likely to be evicted", 
       subtitle="High risk late renters", 
        caption = "Source: Household Pulse, COVID 19 Housing Policy Score") +
    theme(panel.background = element_rect(colour = "black"))+
    scale_fill_continuous(low = "white", high ="darkred", 
                          name = "Value on Index",label = scales::comma) + 
    theme(legend.position = "right") +
    guides(fill = "none") +
  geom_text(data = data_labels, ggplot2::aes(
    x = x, y = y,
    label = scales::number(highrisk, scale = 1, accuracy = .1)), color = "black")