Stats II Project

Author

Brian Surratt

Published

September 19, 2023

Variable descriptions and descriptive results

Here you will describe the coding of your outcome and independent variables. You will produce and discuss summary statistics (tables) and data visualizations (figures) for your outcome and independent variables. You will also need to discuss how your sample size decreases with successive restrictions. This part of the project will alert me to potential coding issues that you need to fix prior to running models. Please send me a draft of this part (along with your revisions to the first part of the project) by April 7th or April 14th (depending on when we discuss your project proposals).

Rental Burden and Mental Health

In “Evicted: Poverty and Profit in the American City,” the author Matthew Desmond observed that in some households, there isn’t enough to eat because the “rent eats first.” Housing is shelter, a basic human need. As documented by Desmond, housing is so important it can take priority over other basic needs like food. Therefore, it is reasonable to presume that anxiety and depression are elevated in low-income households that spend a high proportion of income on rent, as well as in households that have fallen behind on rent. This research project will explore the relationship between rental cost burden and tenant mental health.

The project will focus on renters (rather than homeowners) for multiple reasons. Prior research shows that low-income families are more likely to rent than to own a home. Renters exhibit greater cost burden than homeowners. Furthermore, renters tend to be at greater risk of eviction than homeowners. The sample consists of respondents to the U.S. Census Bureau Household Pulse Survey Phase 3.5 who rent their dwelling.

Research questions:

  1. What is the relationship between the proportion of income spent on rent and rates of depression and anxiety in the United States? Does income level interact with the proportion of income spent on rent in predicting rates of depression and anxiety?

    1. Do state landlord-tenant laws affect these relationships?
  2. What is the relationship between falling behind on rent and mental health?

    1. Do state landlord-tenant laws affect these relationships?

Hypotheses

  1. Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

  2. Renters who are late on rent payments exhibit higher rates of depression and anxiety. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

Data sources

Independent variables (from Household Pulse Survey)

  • inclvl: numerical variable derived from INCOME

  • trentamt: Monthly rent from TRENTAMT

  • rentpct: rent as a percent of income: Variable derived from INCOME and TRENTAMT

  • evict: Eviction in next two months

Dependent variables (from Household Pulse Survey)

  • badmnhlth: a dummy variable for bad mental health derived from ANXIOUS, WORRY, INTEREST, and DOWN

Control variables (from Household Pulse Survey)

  • TBIRTH_YEAR: Year of birth to determine age

  • GENID_DESCRIBE: Current gender identity

  • Race/Ethnicity: Categorical variable derived from RHISPANIC and RRACE

  • MS: Marital status

  • THHLD_NUMPER: Total number of people in household

  • Children in household: Dichotomous variable derived from THHLD_NUMKID

  • EEDUC: Educational attainment

State Policy Variables

  • grace_period: Grace period for paying rent prior to eviction, a numerical variable derived from Question 14 on State Eviction Laws database

Reading in the data

Importing the data for Household Pulse Survey phase 3.5. I had to change to phase 3.5 (instead of 3.3) because earlier phases do not include amount of rent.

Code
hps46 <- read.csv("./data/pulse2022_puf_46.csv")

hps47 <- read.csv("./data/pulse2022_puf_47.csv")

hps48 <- read.csv("./data/pulse2022_puf_48.csv")

Combining into one dataframe.

Code
hps3.5 <- rbind(hps46, hps47, hps48)

names(hps3.5) <- tolower(names(hps3.5))

nrow(hps3.5)
[1] 167931

Importing the data for the Law Atlas Project State Eviction Laws.

Code
lawdat <- read_xlsx("./data/LSCEvictionLaws_StateTerritory_Data.xlsx")
New names:
• `` -> `...60`
Code
nrow(lawdat)
[1] 53

Importing the data for OPENICPSR Eviction Moratoria data. Setting this aside for now.

Code
# evictdat <- read_xlsx("./data/2023.02.01 Moratoria Supportive + Measures Datasets.xlsx")

# For the time period of this data, only California and Massachusetts have an eviction moratorium in place.

Joining hps3.5 with lawdat.

Code
alldata <- merge(x=hps3.5, y=lawdat, by='est_st', all.x=TRUE)

nrow(alldata)
[1] 167931
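The unchanged row count is what a left join on a unique state key should produce. As a minimal sketch of how that could be checked explicitly, the toy data frames below stand in for hps3.5 and lawdat; every value in them is invented, and only the key name `est_st` and the variable name `grace_period` come from this document.

```r
# Toy stand-ins for hps3.5 and lawdat; values are made up for illustration.
survey <- data.frame(est_st = c(1, 1, 2, 4, 6), id = 1:5)
laws   <- data.frame(est_st = c(1, 2, 4), grace_period = c(7, 7, 5))

# A left join only preserves the row count if the key is unique in `laws`.
stopifnot(!any(duplicated(laws$est_st)))

merged <- merge(x = survey, y = laws, by = "est_st", all.x = TRUE)

# Same number of rows as the survey data, so no respondent was duplicated.
nrow(merged) == nrow(survey)

# Respondents whose state had no match carry NA in the law variables.
unmatched <- sum(is.na(merged$grace_period))
unmatched
```

If `unmatched` were nonzero in the real data, those respondents would drop out of any model that uses the state policy variables.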

Selecting variables from alldata for a working dataframe. I could select more variables from lawdat or I could create an index.

Code
dat <- alldata %>% 
  dplyr::select(week, income, tenure, trentamt, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, est_st, Jurisdictions, hweight, pweight, grace_period)

nrow(dat)
[1] 167931

Cleaning and exploring the data

Distribution of tenure. Category “3” is renters.

Code
tabyl(dat$tenure)
 dat$tenure     n     percent
        -99  1231 0.007330392
        -88 27107 0.161417487
          1 39562 0.235584853
          2 64143 0.381960448
          3 34032 0.202654662
          4  1856 0.011052158

Filter for only renters.

Code
dat <- dat %>%
  filter(.$tenure == 3)

nrow(dat)
[1] 34032

Distribution of income.

Code
tabyl(dat$income)
 dat$income    n    percent
        -99  468 0.01375176
        -88 1568 0.04607428
          1 7488 0.22002821
          2 4635 0.13619535
          3 4740 0.13928068
          4 5750 0.16895863
          5 3420 0.10049365
          6 3236 0.09508698
          7 1360 0.03996239
          8 1367 0.04016808

Filter out rows with missing income data.

Code
dat <- dat %>% 
  filter(.$income %in% c(1:8))

tabyl(dat$income)
 dat$income    n    percent
          1 7488 0.23402925
          2 4635 0.14486186
          3 4740 0.14814352
          4 5750 0.17970996
          5 3420 0.10688836
          6 3236 0.10113764
          7 1360 0.04250531
          8 1367 0.04272409

Checking sample size.

Code
nrow(dat)
[1] 31996

Recode income with new variable ‘inclvl’.

Code
dat <- dat %>%
  mutate(inclvl = case_when(.$income == '1' ~ 12500,
                            .$income == '2' ~ 30000,
                            .$income == '3' ~ 42500,
                            .$income == '4' ~ 62500,
                            .$income == '5' ~ 87500,
                            .$income == '6' ~ 125000,
                            .$income == '7' ~ 175000,
                            .$income == '8' ~ 200000,
                              )
        )

tabyl(dat$inclvl)
 dat$inclvl    n    percent
      12500 7488 0.23402925
      30000 4635 0.14486186
      42500 4740 0.14814352
      62500 5750 0.17970996
      87500 3420 0.10688836
     125000 3236 0.10113764
     175000 1360 0.04250531
     200000 1367 0.04272409
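The same recode can be written as a lookup vector, which keeps the bracket values in one place. This is a sketch rather than a replacement for the code above: the midpoints are copied from the case_when() call, the income codes are hypothetical, and it assumes the top bracket is open-ended so that 200000 is a placeholder rather than a true midpoint.

```r
# Bracket values copied from the case_when() call above; 200000 is assumed
# to stand in for an open-ended top bracket rather than being a midpoint.
midpoints <- c(12500, 30000, 42500, 62500, 87500, 125000, 175000, 200000)

# Hypothetical income codes, including the missing-data codes -99 and -88.
income <- c(1, 4, 8, -99, 2, -88)

# Only codes 1 to 8 index the lookup; everything else stays NA.
valid  <- income %in% 1:8
inclvl <- rep(NA_real_, length(income))
inclvl[valid] <- midpoints[income[valid]]
inclvl
```

Leaving invalid codes as NA, instead of filtering first, makes the recode safe to run at any point in the pipeline.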

Checking sample size.

Code
nrow(dat)
[1] 31996

Histogram of income levels.

Code
hist(dat$inclvl)

Summary statistics of rent amount.

Code
summary(dat$trentamt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    -99     725    1200    1297    1775    3500 

How many people reported paying zero rent?

Code
length(which(dat$trentamt == 0))
[1] 138

Removing negative values for rent, but leaving zeros. The data dictionary does not list a missing-data code for this variable; valid responses should be non-negative values from 0 to 99999.

Code
dat <- dat %>% 
  filter(.$trentamt >= 0)

Checking sample size.

Code
nrow(dat)
[1] 30074
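An alternative to dropping rows is to recode the negative codes to NA, so the same respondents remain available for analyses that do not use rent. The sketch below uses hypothetical rent values and a hypothetical helper named `recode_missing`; it assumes -99 and -88 are the only missing-data codes, as in the tenure and income tabulations above.

```r
# Hypothetical rent values; -99 and -88 are assumed to be the only
# missing-data codes, as in the tenure and income tabulations above.
trentamt <- c(1200, -99, 0, 800, -88, 3500)

# Replace sentinel codes with NA rather than dropping the rows.
recode_missing <- function(x, codes = c(-99, -88)) {
  x[x %in% codes] <- NA
  x
}

rent_clean <- recode_missing(trentamt)
summary(rent_clean)             # reports the NA count alongside the quantiles
mean(rent_clean, na.rm = TRUE)  # zeros are kept, sentinels are ignored
```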

Updated summary statistics of rent amount with negative amounts removed.

Code
summary(dat$trentamt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0     800    1216    1386    1800    3500 

Boxplot of rent amount.

Code
boxplot(dat$trentamt)

Histogram of rent amount.

Code
hist(dat$trentamt)

Making a new variable for rent as a percent of income.

Code
#I'm adding 0.01 because some rent amounts are zero.

dat$rentpct <- (dat$trentamt + 0.01)/(dat$inclvl/12)

Summary statistics of rent as a percent of income.

Code
summary(dat$rentpct)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.000001 0.192001 0.283201 0.413501 0.480010 3.360010 
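Note that these values are proportions, so the median of 0.283 means about 28% of income. Because rent is the numerator, a rent of zero divides cleanly and the 0.01 offset is not required for the ratio itself. The sketch below, with hypothetical renters, computes the ratio without the offset and adds an indicator for the 30% threshold named in Hypothesis 1; the name `burdened` is an assumption, not a variable in the data.

```r
# Hypothetical renters: monthly rent and annual income (bracket value).
trentamt <- c(0, 725, 1200, 3500)
inclvl   <- c(12500, 30000, 42500, 12500)

# Monthly rent divided by monthly income; zero rent simply gives 0.
rentpct <- trentamt / (inclvl / 12)

# Indicator for the 30% threshold named in Hypothesis 1 (hypothetical name).
burdened <- as.integer(rentpct > 0.30)

data.frame(trentamt, inclvl, rentpct = round(rentpct, 3), burdened)
```

The last row reproduces the maximum seen above: a rent of 3500 against the lowest income value gives a ratio of 3.36.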

Boxplot of rent as a percent of income.

Code
boxplot(dat$rentpct)

Histogram of rent as a percent of income.

Code
hist(dat$rentpct)

Checking sample size.

Code
nrow(dat)
[1] 30074

Checking distribution of rentcur.

Code
tabyl(dat$rentcur)
 dat$rentcur     n     percent
         -99    39 0.001296801
           1 26815 0.891633970
           2  3220 0.107069229

Removing missing values for rentcur.

Code
dat <- dat %>% 
  filter(.$rentcur %in% c(1:2))

Checking distribution of rentcur.

Code
tabyl(dat$rentcur)
 dat$rentcur     n   percent
           1 26815 0.8927917
           2  3220 0.1072083

Dichotomize rentcur. “Caught up on rent” = 1 and “Not caught up on rent” = 0

Code
dat$rentcur <- car::recode(dat$rentcur, "1=1; else=0")
tabyl(dat$rentcur)
 dat$rentcur     n   percent
           0  3220 0.1072083
           1 26815 0.8927917

Checking sample size.

Code
nrow(dat)
[1] 30035

Checking distribution of evict.

Code
tabyl(dat$evict)
 dat$evict     n      percent
       -99    12 0.0003995339
       -88 26815 0.8927917430
         1   519 0.0172798402
         2   852 0.0283669053
         3   920 0.0306309306
         4   917 0.0305310471

This question is asked if RENTCUR = 2, so it is only asked of people who are not caught up on rent. 1) Very likely 2) Somewhat likely 3) Not very likely 4) Not likely at all

Set this aside for now.


Mental health.

First, I add the four mental health items (anxious, worry, interest, and down) together to create a mental health index.

Code
dat <- dat %>% 
  mutate(mnhlth = anxious + worry + interest + down)

tabyl(dat$mnhlth)
 dat$mnhlth    n      percent
       -396    7 2.330614e-04
       -295    5 1.664724e-04
       -294    1 3.329449e-05
       -293    2 6.658898e-05
       -196    6 1.997669e-04
       -195    2 6.658898e-05
       -194    3 9.988347e-05
        -96   32 1.065424e-03
        -95   15 4.994173e-04
        -94   12 3.995339e-04
        -93   18 5.993008e-04
        -92    6 1.997669e-04
        -91    7 2.330614e-04
        -90    9 2.996504e-04
        -89    1 3.329449e-05
        -88    4 1.331780e-04
        -87   10 3.329449e-04
          4 6607 2.199767e-01
          5 2351 7.827535e-02
          6 2882 9.595472e-02
          7 2611 8.693191e-02
          8 4007 1.334110e-01
          9 1869 6.222740e-02
         10 1613 5.370401e-02
         11 1255 4.178458e-02
         12 1572 5.233894e-02
         13 1042 3.469286e-02
         14 1013 3.372732e-02
         15  850 2.830032e-02
         16 2223 7.401365e-02

Filter for results from 4 to 16.

Code
dat <- dat %>% 
  filter(.$mnhlth %in% c(4:16))

tabyl(dat$mnhlth)
 dat$mnhlth    n    percent
          4 6607 0.22100686
          5 2351 0.07864191
          6 2882 0.09640408
          7 2611 0.08733902
          8 4007 0.13403579
          9 1869 0.06251882
         10 1613 0.05395551
         11 1255 0.04198026
         12 1572 0.05258404
         13 1042 0.03485533
         14 1013 0.03388527
         15  850 0.02843285
         16 2223 0.07436026

Convert to the PHQ-4 scale (0 to 12) by subtracting 4 from each score.

Code
dat$mnhlth <- dat$mnhlth - 4

tabyl(dat$mnhlth)
 dat$mnhlth    n    percent
          0 6607 0.22100686
          1 2351 0.07864191
          2 2882 0.09640408
          3 2611 0.08733902
          4 4007 0.13403579
          5 1869 0.06251882
          6 1613 0.05395551
          7 1255 0.04198026
          8 1572 0.05258404
          9 1042 0.03485533
         10 1013 0.03388527
         11  850 0.02843285
         12 2223 0.07436026

Histogram of mental health index for all respondents.

Code
hist(dat$mnhlth)

Coding the mental health index to a variable “phq4”. There are four psychological distress levels: None, Mild, Moderate, and Severe. These could be dichotomized.

Code
dat <- dat %>%
  mutate(phq4 = case_when(.$mnhlth %in% c(0:2) ~"1 none",
                          .$mnhlth %in% c(3:5) ~"2 mild",
                          .$mnhlth %in% c(6:8) ~"3 moderate",
                          .$mnhlth %in% c(9:12) ~"4 severe",
                          )
         )

tabyl(dat$phq4)
   dat$phq4     n   percent
     1 none 11840 0.3960529
     2 mild  8487 0.2838936
 3 moderate  4440 0.1485198
   4 severe  5128 0.1715337

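Make a dummy variable for bad mental health. On the rescaled 0 to 12 index, bad mental health (6 to 12, equivalent to 10 to 16 on the raw sum) is 1 and good mental health (0 to 5, equivalent to 4 to 9 on the raw sum) is 0. The threshold for this can be adjusted.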

Code
dat <- dat %>%
  mutate(badmnhlth = case_when(.$mnhlth <=5 ~ 0,
                               .$mnhlth >=6 ~ 1,
                               )
         )

tabyl(dat$badmnhlth)
 dat$badmnhlth     n   percent
             0 20327 0.6799465
             1  9568 0.3200535
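The scoring steps above can be gathered into one function. This is a sketch under stated assumptions: the helper name `phq4_total` and the responses are hypothetical, and any item outside 1 to 4 is treated as missing so that sentinel codes yield NA rather than a negative sum.

```r
# Score four items coded 1-4 into a 0-12 total; any item outside 1:4
# (for example -99 or -88) makes the total NA instead of a negative sum.
phq4_total <- function(anxious, worry, interest, down) {
  items <- cbind(anxious, worry, interest, down)
  items[!(items %in% 1:4)] <- NA
  rowSums(items - 1)
}

# Hypothetical responses for four people.
total <- phq4_total(anxious  = c(1, 2, 4, -99),
                    worry    = c(1, 3, 4, 2),
                    interest = c(1, 2, 4, 1),
                    down     = c(1, 2, 4, 1))

# Same cut points as the case_when() calls above.
phq4 <- cut(total, breaks = c(-1, 2, 5, 8, 12),
            labels = c("1 none", "2 mild", "3 moderate", "4 severe"))
badmnhlth <- as.integer(total >= 6)

data.frame(total, phq4, badmnhlth)
```

With a cutoff of 6, the dummy is the union of the moderate and severe categories, which matches the counts above: 4440 + 5128 = 9568.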

Visualizations

A graph of income level and rent alone. Would this work better as a series of boxplots?

Code
plot(dat$inclvl, dat$trentamt)
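Because inclvl takes only eight distinct values, a scatterplot stacks the points into vertical strips, so grouped boxplots are one way to answer the question above. The sketch below uses simulated data in place of dat; the relationship between rent and income in it is invented purely to have something to draw.

```r
# Simulated stand-in for dat: rent loosely increasing with income level.
set.seed(1)
inclvl   <- sample(c(12500, 30000, 42500, 62500), size = 200, replace = TRUE)
trentamt <- pmin(3500, pmax(0, rnorm(200, mean = 600 + inclvl / 50, sd = 300)))

# One box per income level; the formula interface groups by the factor.
bp <- boxplot(trentamt ~ factor(inclvl),
              xlab = "Income level (bracket value)",
              ylab = "Monthly rent")

# bp$n holds the number of observations behind each box.
bp$n
```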

Visualization of bad mental health by rent alone.

Code
cdplot(as.factor(badmnhlth) ~ trentamt, data=dat)

Visualization of bad mental health by rent alone, faceted by income levels.

Code
ggplot(dat, aes(trentamt, after_stat(count), fill = forcats::fct_relevel(as.factor(badmnhlth)))) +
  geom_density(position = "fill") +
  labs(fill = "Bad mental health") +
  facet_wrap(~ inclvl)

A graph of income level and rent as a percentage of income.

Code
plot(dat$inclvl, dat$rentpct)

Visualization of bad mental health by rent as a percent of income.

Code
cdplot(as.factor(badmnhlth) ~ rentpct, data=dat)

Visualization of bad mental health by rent as a percent of income, faceted by income levels.

Code
ggplot(dat, aes(rentpct, after_stat(count), fill = forcats::fct_relevel(as.factor(badmnhlth)))) +
  geom_density(position = "fill") +
  labs(fill = "Bad mental health") +
  facet_wrap(~ inclvl)

Visualization of bad mental health by rent for all income levels.

Code
cdplot(as.factor(badmnhlth) ~ trentamt, data=dat)

Visualization of bad mental health by rent for the two lowest income levels. Making a dataframe “temp” with just the two lowest income levels.

Code
temp <- dat %>%
  filter(income %in% c(1:2))

tabyl(temp$income)
 temp$income    n   percent
           1 6844 0.6144178
           2 4295 0.3855822

Visualization of bad mental health by rent for two lowest income levels.

Code
cdplot(as.factor(badmnhlth) ~ trentamt, data=temp)

Visualization of bad mental health by rent as a percent of income for two lowest income levels.

Code
cdplot(as.factor(badmnhlth) ~ rentpct, data=temp)

Table of the mental health index (mnhlth) by rentcur.

Code
tabyl(dat, mnhlth, rentcur)
 mnhlth   0    1
      0 391 6216
      1 129 2222
      2 195 2687
      3 197 2414
      4 392 3615
      5 202 1667
      6 189 1424
      7 173 1082
      8 255 1317
      9 177  865
     10 184  829
     11 168  682
     12 544 1679

Table of phq4 by rentcur.

Code
tabyl(dat, phq4, rentcur)
       phq4    0     1
     1 none  715 11125
     2 mild  791  7696
 3 moderate  617  3823
   4 severe 1073  4055
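Raw counts are hard to compare because far more renters are caught up than behind. A minimal sketch of converting the table to column proportions follows, using the counts printed above; the row and column labels are my own, and the chi-squared test is unweighted, so it is only a rough check and not a design-based result.

```r
# Counts copied from the phq4-by-rentcur table above.
counts <- matrix(c( 715, 11125,
                    791,  7696,
                    617,  3823,
                   1073,  4055),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(phq4 = c("none", "mild", "moderate", "severe"),
                                 rentcur = c("behind", "caught_up")))

# Column proportions: distribution of distress within each rent status.
col_props <- prop.table(counts, margin = 2)
round(col_props, 3)

# Share with moderate or severe distress in each group.
bad_share <- colSums(col_props[c("moderate", "severe"), ])
round(bad_share, 3)

# Chi-squared test of independence (unweighted, so only a rough check).
chisq.test(counts)
```

In these unweighted counts, roughly 53% of renters who are behind fall in the moderate or severe categories, compared with roughly 30% of renters who are caught up.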

Checking sample size.

Code
nrow(dat)
[1] 29895
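The successive restrictions can be summarized in a single attrition table. The sketch below hard-codes the sample sizes reported by the nrow() checks in this document; the step labels are my own wording.

```r
# Sample sizes copied from the nrow() checks in this document.
attrition <- data.frame(
  step = c("All respondents (weeks 46-48)",
           "Renters only (tenure == 3)",
           "Valid income (1-8)",
           "Non-negative rent amount",
           "Valid rentcur (1-2)",
           "Complete mental health items"),
  n = c(167931, 34032, 31996, 30074, 30035, 29895)
)

# Rows lost at each step and share of the original sample remaining.
attrition$dropped       <- c(NA, -diff(attrition$n))
attrition$pct_remaining <- round(100 * attrition$n / attrition$n[1], 1)
attrition
```

In a live pipeline the `n` column could be filled by storing nrow(dat) after each filter instead of typing the numbers in.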

Checking the distribution of grace_period.

Code
tabyl(dat$grace_period)
 dat$grace_period    n     percent
                0 2471 0.082655963
                1  220 0.007359090
                3 8217 0.274862017
                5 3771 0.126141495
                6  253 0.008462954
                7 3317 0.110955009
               10 5480 0.183308246
               12  511 0.017093159
               14 4730 0.158220438
               20  308 0.010302726
               30  617 0.020638903

Histogram of grace period.

Code
hist(dat$grace_period)

Grace period jurisdictions.

Code
tabyl(dat, Jurisdictions, grace_period)
        Jurisdictions   0   1    3   5   6   7   10  12   14  20  30
              Alabama   0   0    0   0   0 320    0   0    0   0   0
               Alaska   0   0    0   0   0 337    0   0    0   0   0
              Arizona   0   0    0 771   0   0    0   0    0   0   0
             Arkansas   0   0  317   0   0   0    0   0    0   0   0
           California   0   0 3139   0   0   0    0   0    0   0   0
             Colorado   0   0    0   0   0   0  746   0    0   0   0
          Connecticut   0   0    0   0   0   0    0 511    0   0   0
             Delaware   0   0    0 220   0   0    0   0    0   0   0
 District of Columbia   0   0    0   0   0   0    0   0    0   0 617
              Florida   0   0  982   0   0   0    0   0    0   0   0
              Georgia 713   0    0   0   0   0    0   0    0   0   0
               Hawaii   0   0    0 465   0   0    0   0    0   0   0
                Idaho   0   0  424   0   0   0    0   0    0   0   0
             Illinois   0   0    0 648   0   0    0   0    0   0   0
              Indiana   0   0    0   0   0   0  459   0    0   0   0
                 Iowa   0   0  386   0   0   0    0   0    0   0   0
               Kansas   0   0  537   0   0   0    0   0    0   0   0
             Kentucky   0   0    0   0   0 390    0   0    0   0   0
            Louisiana   0   0    0 321   0   0    0   0    0   0   0
                Maine   0   0    0   0   0 251    0   0    0   0   0
             Maryland 559   0    0   0   0   0    0   0    0   0   0
        Massachusetts   0   0    0   0   0   0    0   0  973   0   0
             Michigan   0   0    0   0   0 594    0   0    0   0   0
            Minnesota 534   0    0   0   0   0    0   0    0   0   0
          Mississippi   0 220    0   0   0   0    0   0    0   0   0
             Missouri   0   0    0   0   0   0  512   0    0   0   0
              Montana   0   0  287   0   0   0    0   0    0   0   0
             Nebraska   0   0    0   0   0 418    0   0    0   0   0
               Nevada   0   0    0   0   0 551    0   0    0   0   0
        New Hampshire   0   0    0   0   0 456    0   0    0   0   0
           New Jersey 484   0    0   0   0   0    0   0    0   0   0
           New Mexico   0   0  482   0   0   0    0   0    0   0   0
             New York   0   0    0   0   0   0    0   0  819   0   0
       North Carolina   0   0    0   0   0   0  508   0    0   0   0
         North Dakota   0   0  278   0   0   0    0   0    0   0   0
                 Ohio   0   0  491   0   0   0    0   0    0   0   0
             Oklahoma   0   0    0 462   0   0    0   0    0   0   0
               Oregon   0   0    0   0   0   0  988   0    0   0   0
         Pennsylvania   0   0    0   0   0   0  733   0    0   0   0
         Rhode Island   0   0    0   0   0   0    0   0    0 308   0
       South Carolina   0   0    0 355   0   0    0   0    0   0   0
         South Dakota   0   0    0   0 253   0    0   0    0   0   0
            Tennessee   0   0    0   0   0   0    0   0  505   0   0
                Texas   0   0    0   0   0   0 1534   0    0   0   0
                 Utah   0   0  683   0   0   0    0   0    0   0   0
              Vermont   0   0    0   0   0   0    0   0  304   0   0
             Virginia   0   0    0   0   0   0    0   0  721   0   0
           Washington   0   0    0   0   0   0    0   0 1408   0   0
        West Virginia 181   0    0   0   0   0    0   0    0   0   0
            Wisconsin   0   0    0 529   0   0    0   0    0   0   0
              Wyoming   0   0  211   0   0   0    0   0    0   0   0

How many observations do we have by state?

Code
tabyl(dat$Jurisdictions)
    dat$Jurisdictions    n     percent
              Alabama  320 0.010704131
               Alaska  337 0.011272788
              Arizona  771 0.025790266
             Arkansas  317 0.010603780
           California 3139 0.105000836
             Colorado  746 0.024954006
          Connecticut  511 0.017093159
             Delaware  220 0.007359090
 District of Columbia  617 0.020638903
              Florida  982 0.032848302
              Georgia  713 0.023850142
               Hawaii  465 0.015554441
                Idaho  424 0.014182974
             Illinois  648 0.021675866
              Indiana  459 0.015353738
                 Iowa  386 0.012911858
               Kansas  537 0.017962870
             Kentucky  390 0.013045660
            Louisiana  321 0.010737582
                Maine  251 0.008396053
             Maryland  559 0.018698779
        Massachusetts  973 0.032547249
             Michigan  594 0.019869543
            Minnesota  534 0.017862519
          Mississippi  220 0.007359090
             Missouri  512 0.017126610
              Montana  287 0.009600268
             Nebraska  418 0.013982271
               Nevada  551 0.018431176
        New Hampshire  456 0.015253387
           New Jersey  484 0.016189998
           New Mexico  482 0.016123098
             New York  819 0.027395886
       North Carolina  508 0.016992808
         North Dakota  278 0.009299214
                 Ohio  491 0.016424151
             Oklahoma  462 0.015454089
               Oregon  988 0.033049005
         Pennsylvania  733 0.024519150
         Rhode Island  308 0.010302726
       South Carolina  355 0.011874895
         South Dakota  253 0.008462954
            Tennessee  505 0.016892457
                Texas 1534 0.051312929
                 Utah  683 0.022846630
              Vermont  304 0.010168925
             Virginia  721 0.024117745
           Washington 1408 0.047098177
        West Virginia  181 0.006054524
            Wisconsin  529 0.017695267
              Wyoming  211 0.007058036

Cleaning and dichotomizing income

Let’s make dummy variables for income. The dummy variables are inclvl1 to inclvl8.

Code
tabyl(dat$inclvl)
 dat$inclvl    n    percent
      12500 6844 0.22893460
      30000 4295 0.14366951
      42500 4442 0.14858672
      62500 5413 0.18106707
      87500 3235 0.10821208
     125000 3066 0.10255896
     175000 1295 0.04331828
     200000 1305 0.04365278
Code
dat$inclvl1<- car::recode(dat$inclvl, "12500=1; else=0")
tabyl(dat$inclvl1)
 dat$inclvl1     n   percent
           0 23051 0.7710654
           1  6844 0.2289346
Code
dat$inclvl2<- car::recode(dat$inclvl, "30000=1; else=0")
tabyl(dat$inclvl2)
 dat$inclvl2     n   percent
           0 25600 0.8563305
           1  4295 0.1436695
Code
dat$inclvl3<- car::recode(dat$inclvl, "42500=1; else=0")
tabyl(dat$inclvl3)
 dat$inclvl3     n   percent
           0 25453 0.8514133
           1  4442 0.1485867
Code
dat$inclvl4<- car::recode(dat$inclvl, "62500=1; else=0")
tabyl(dat$inclvl4)
 dat$inclvl4     n   percent
           0 24482 0.8189329
           1  5413 0.1810671
Code
dat$inclvl5<- car::recode(dat$inclvl, "87500=1; else=0")
tabyl(dat$inclvl5)
 dat$inclvl5     n   percent
           0 26660 0.8917879
           1  3235 0.1082121
Code
dat$inclvl6<- car::recode(dat$inclvl, "125000=1; else=0")
tabyl(dat$inclvl6)
 dat$inclvl6     n  percent
           0 26829 0.897441
           1  3066 0.102559
Code
dat$inclvl7<- car::recode(dat$inclvl, "175000=1; else=0")
tabyl(dat$inclvl7)
 dat$inclvl7     n    percent
           0 28600 0.95668172
           1  1295 0.04331828
Code
dat$inclvl8<- car::recode(dat$inclvl, "200000=1; else=0")
tabyl(dat$inclvl8)
 dat$inclvl8     n    percent
           0 28590 0.95634722
           1  1305 0.04365278
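The eight recode calls can also be produced in one step by treating income level as a factor. This is a sketch with hypothetical values, and the `inc_` column prefix is an assumption rather than the naming used above.

```r
# Hypothetical income levels for six respondents.
inclvl <- c(12500, 30000, 12500, 87500, 62500, 30000)

# Treat the bracket values as categories, then expand to 0/1 columns.
# Dropping the intercept (- 1) keeps one column per level present.
inc_factor <- factor(inclvl)
dummies <- model.matrix(~ inc_factor - 1)
colnames(dummies) <- paste0("inc_", levels(inc_factor))
dummies

# Each row has exactly one 1, so the columns sum to a constant; a
# regression with an intercept needs one level left out as the reference.
rowSums(dummies)
```

In practice, passing factor(inclvl) directly to a model formula lets R build these columns and drop the reference level automatically.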

Cleaning and recoding age

Let’s visualize birth year with a histogram.

Code
hist(dat$tbirth_year)

Let’s use birth year to create a new variable for age.

Code
dat$age <- 2022-dat$tbirth_year

tabyl(dat$age)
 dat$age   n      percent
      18  21 0.0007024586
      19  48 0.0016056197
      20  77 0.0025756816
      21 163 0.0054524168
      22 218 0.0072921893
      23 370 0.0123766516
      24 489 0.0163572504
      25 616 0.0206054524
      26 708 0.0236828901
      27 736 0.0246195016
      28 768 0.0256899147
      29 780 0.0260913196
      30 803 0.0268606790
      31 803 0.0268606790
      32 817 0.0273289848
      33 752 0.0251547081
      34 783 0.0261916708
      35 725 0.0242515471
      36 720 0.0240842950
      37 712 0.0238166918
      38 709 0.0237163405
      39 744 0.0248871049
      40 651 0.0217762168
      41 644 0.0215420639
      42 614 0.0205385516
      43 636 0.0212744606
      44 610 0.0204047500
      45 545 0.0182304733
      46 510 0.0170597090
      47 497 0.0166248537
      48 500 0.0167252049
      49 481 0.0160896471
      50 506 0.0169259073
      51 535 0.0178959692
      52 562 0.0187991303
      53 512 0.0171266098
      54 498 0.0166583041
      55 410 0.0137146680
      56 453 0.0151530356
      57 435 0.0145509282
      58 468 0.0156547918
      59 417 0.0139488209
      60 467 0.0156213414
      61 459 0.0153537381
      62 459 0.0153537381
      63 449 0.0150192340
      64 399 0.0133467135
      65 413 0.0138150192
      66 401 0.0134136143
      67 367 0.0122763004
      68 393 0.0131460110
      69 357 0.0119417963
      70 335 0.0112058873
      71 269 0.0089981602
      72 304 0.0101689246
      73 265 0.0088643586
      74 237 0.0079277471
      75 235 0.0078608463
      76 185 0.0061883258
      77 126 0.0042147516
      78 106 0.0035457434
      79 117 0.0039136979
      80  88 0.0029436361
      81  90 0.0030105369
      82  55 0.0018397725
      83  54 0.0018063221
      84  44 0.0014718180
      85  32 0.0010704131
      86  30 0.0010035123
      87  22 0.0007359090
      88  91 0.0030439873
Code
hist(dat$age)

Recoding and dichotomizing gender identity.

Clean up gender identity. Check the codes in the data dictionary. I need to do dummy variables. What do I do about -99? How many categories should I do?

Code
tabyl(dat$genid_describe)
 dat$genid_describe     n     percent
                -99   139 0.004649607
                  1 10680 0.357250376
                  2 18280 0.611473491
                  3   271 0.009065061
                  4   525 0.017561465
Code
dat$male <- car::recode(dat$genid_describe, "1=1; else=0") #gen1 = male
tabyl(dat$male)
 dat$male     n   percent
        0 19215 0.6427496
        1 10680 0.3572504
Code
dat$female <- car::recode(dat$genid_describe, "2=1; else=0") #gen2 = female
tabyl(dat$female)
 dat$female     n   percent
          0 11615 0.3885265
          1 18280 0.6114735
Code
dat$transgender <- car::recode(dat$genid_describe, "3=1; else=0") #gen3 = transgender
tabyl(dat$transgender)
 dat$transgender     n     percent
               0 29624 0.990934939
               1   271 0.009065061
Code
dat$gen_none <- car::recode(dat$genid_describe, "4=1; else=0") #gen4 = None of these
tabyl(dat$gen_none)
 dat$gen_none     n    percent
            0 29370 0.98243853
            1   525 0.01756147
Code
dat$gen_notsel <- car::recode(dat$genid_describe, "-99=1; else=0") #gen5 = not selected
tabyl(dat$gen_notsel)
 dat$gen_notsel     n     percent
              0 29756 0.995350393
              1   139 0.004649607

Recoding and dichotomizing race and ethnicity.

Combine race/ethnicity into a race_eth variable.

Code
tabyl(dat$rrace)
 dat$rrace     n    percent
         1 22450 0.75096170
         2  3715 0.12426827
         3  1559 0.05214919
         4  2171 0.07262084
Code
tabyl(dat$rhispanic)
 dat$rhispanic     n   percent
             1 26360 0.8817528
             2  3535 0.1182472
Code
dat <- dat %>%
  mutate(race_eth = case_when(.$rhispanic == 1 & .$rrace == 1 ~"nh_white",
                              .$rhispanic == 1 & .$rrace == 2 ~"nh_black",
                              .$rhispanic == 1 & .$rrace == 3 ~"nh_asian",
                              .$rhispanic == 1 & .$rrace == 4 ~"other",
                              .$rhispanic == 2 & .$rrace %in% c(1:4) ~ "hispanic"
                              )
        )

tabyl(dat$race_eth)
 dat$race_eth     n    percent
     hispanic  3535 0.11824720
     nh_asian  1480 0.04950661
     nh_black  3500 0.11707643
     nh_white 19707 0.65920723
        other  1673 0.05596254

Dichotomizing race and ethnicity.

Code
dat$hispanic <- car::recode(dat$race_eth, "'hispanic'=1; else=0")
tabyl(dat$hispanic)
 dat$hispanic     n   percent
            0 26360 0.8817528
            1  3535 0.1182472
Code
dat$nh_white <- car::recode(dat$race_eth, "'nh_white'=1; else=0")
tabyl(dat$nh_white)
 dat$nh_white     n   percent
            0 10188 0.3407928
            1 19707 0.6592072
Code
dat$nh_black <- car::recode(dat$race_eth, "'nh_black'=1; else=0")
tabyl(dat$nh_black)
 dat$nh_black     n   percent
            0 26395 0.8829236
            1  3500 0.1170764
Code
dat$nh_asian <- car::recode(dat$race_eth, "'nh_asian'=1; else=0")
tabyl(dat$nh_asian)
 dat$nh_asian     n    percent
            0 28415 0.95049339
            1  1480 0.04950661
Code
dat$other <- car::recode(dat$race_eth, "'other'=1; else=0")
tabyl(dat$other)
 dat$other     n    percent
         0 28222 0.94403746
         1  1673 0.05596254

Cleaning and recoding educational attainment.

For educational attainment, collapse the seven categories into four: “less than high school”, “high school”, “some college”, and “bachelor’s or higher”. Each category gets its own dummy variable. This could be dichotomized further.

Code
tabyl(dat$eeduc)
 dat$eeduc    n     percent
         1  291 0.009734069
         2  628 0.021006857
         3 4308 0.144104365
         4 7311 0.244555946
         5 3149 0.105335340
         6 8107 0.271182472
         7 6101 0.204080950

Dichotomizing educational attainment.

Code
dat$lessthanhs <- car::recode(dat$eeduc, "1=1; 2=1; else=0")
tabyl(dat$lessthanhs)
 dat$lessthanhs     n    percent
              0 28976 0.96925907
              1   919 0.03074093
Code
dat$highschool <- car::recode(dat$eeduc, "3=1; else=0")
tabyl(dat$highschool)
 dat$highschool     n   percent
              0 25587 0.8558956
              1  4308 0.1441044
Code
dat$somecollege <- car::recode(dat$eeduc, "4=1; 5=1; else=0")
tabyl(dat$somecollege)
 dat$somecollege     n   percent
               0 19435 0.6501087
               1 10460 0.3498913
Code
dat$bachelors <- car::recode(dat$eeduc, "6=1; 7=1; else=0")
tabyl(dat$bachelors)
 dat$bachelors     n   percent
             0 15687 0.5247366
             1 14208 0.4752634
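As a cross-check on the four dummies, the same grouping can be expressed as one ordered factor built from a lookup vector. The eeduc codes below are hypothetical; the groupings mirror the car::recode() calls above.

```r
# Collapse the seven eeduc codes into four labels with a lookup vector;
# the groupings mirror the car::recode() calls above.
educ_labels <- c("lessthanhs", "lessthanhs", "highschool",
                 "somecollege", "somecollege", "bachelors", "bachelors")

# Hypothetical eeduc codes.
eeduc <- c(1, 3, 4, 5, 6, 7, 2)

educ <- factor(educ_labels[eeduc],
               levels = c("lessthanhs", "highschool", "somecollege", "bachelors"))
table(educ)
```

A single factor also guarantees that the categories are mutually exclusive, which separate recode calls do not enforce.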

Number of people in the household.

Distribution of number of people in household. I’m leaving this as an integer for now.

Code
tabyl(dat$thhld_numper)
 dat$thhld_numper     n     percent
                1  9703 0.324569326
                2 10030 0.335507610
                3  4470 0.149523332
                4  3097 0.103595919
                5  1423 0.047599933
                6   632 0.021140659
                7   283 0.009466466
                8   109 0.003646095
                9    47 0.001572169
               10   101 0.003378491

Visualizing and dichotomizing presence of children

Dichotomize presence of children.

Code
tabyl(dat$thhld_numkid)
 dat$thhld_numkid     n     percent
                0 21396 0.715704967
                1  4234 0.141629035
                2  2565 0.085800301
                3  1054 0.035256732
                4   393 0.013146011
                5   253 0.008462954

Dichotomizing presence of children. Children in household is coded as “1”, no children is “0”.

Code
dat$children <- car::recode(dat$thhld_numkid, "0=0; else=1")
tabyl(dat$children)
 dat$children     n  percent
            0 21396 0.715705
            1  8499 0.284295

Visualizing and dichotomizing households with one adult versus multiple adults.

Dichotomizing number of adults in household. Single adult is coded as “1”, multiple adults is coded as “0”.

Code
tabyl(dat$thhld_numadlt)
 dat$thhld_numadlt     n      percent
                 1 11834 0.3958521492
                 2 13271 0.4439203880
                 3  3111 0.1040642248
                 4  1101 0.0368289012
                 5   382 0.0127780565
                 6   110 0.0036795451
                 7    33 0.0011038635
                 8    17 0.0005686570
                 9    19 0.0006355578
                10    17 0.0005686570
Code
dat$single_adult <- car::recode(dat$thhld_numadlt, "1=1; else=0")
tabyl(dat$single_adult)
 dat$single_adult     n   percent
                0 18061 0.6041479
                1 11834 0.3958521

Modeling

Let’s review my hypotheses:

  1. Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety.

Comments: I’m going to keep the lowest 2 income brackets; reference Schuetz for justification of this. My options for the DV are mental health on 3 scales: 1) a categorical variable from 0 to 12, 2) PHQ4, which is a categorical variable of 4 measurements, or 3) a dichotomous variable for bad mental health, split down the middle (or divided another way, such as at the most severe level).

Statistical method: Population: limited to the bottom 2 income categories. IV: Rent amount, an integer. DV: Mental health, but which variable?

  2. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

Statistical method: Limit to lowest 2 income brackets. IV: Grace period by state. DV: Mental health, but which variable?

  3. Renters who are late on rent payments exhibit higher rates of depression and anxiety.

Statistical method: IV: Current on rent, which is dichotomous. DV: Mental health, but which variable?

  4. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

Statistical method: Population: only renters who are late on rent. IV: Grace period by state. DV: Mental health, but which variable?

Hypothesis 1:

DV: Bad mental health

Bad mental health (10 to 16) is 1, good mental health (4 to 9) is 0.
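The cutoff described above can be sketched in a few lines of base R. The vector `score` is a hypothetical stand-in for the 4 to 16 mental health score; the actual source variable is not shown in this section, so treat the name as an assumption.

```r
# Hypothetical scores spanning the 4 to 16 range (illustration only, not survey data)
score <- c(4, 9, 10, 16)

# 10 to 16 -> 1 (bad mental health), 4 to 9 -> 0 (good mental health)
badmnhlth <- ifelse(score >= 10, 1, 0)

# Cross-tabulate to confirm that every score lands in the intended category
table(score, badmnhlth)
```

A cross-tab like this is a quick way to confirm that the boundary values (9 and 10) fall on the intended side of the split.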

IV: Rent as a percent of the midpoint of the income level.

Rentpct is a continuous variable defined as rent as a percentage of the midpoint of the income level.
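One way this definition might be computed is sketched below. Everything specific here is an assumption for illustration: the bracket midpoints, the column name `trentamt` for monthly rent, and the choice to annualize rent before dividing by an annual income midpoint. The project's actual construction of rentpct is not shown in this section.

```r
# Assumed bracket midpoints in dollars per year (hypothetical values)
income_midpoint <- c("1" = 12500, "2" = 30000)

# Toy rows: income bracket code and monthly rent (not survey data)
toy <- data.frame(income = c(1, 1, 2), trentamt = c(500, 900, 1200))

# Annualized rent divided by the bracket midpoint, expressed as a proportion
toy$rentpct <- unname(12 * toy$trentamt / income_midpoint[as.character(toy$income)])

toy
```

If rentpct is stored as a proportion like this, a value of 0.30 corresponds to the 30% threshold named in hypothesis 1.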

Control variables:

  • age: This is a continuous variable. Not dichotomized.
  • thhld_numper: Number of people in household. Continuous variable. Not dichotomized.
  • children: This is a dichotomous variable. No children = 0, children = 1
  • single_adult: This is a dichotomous variable. More than 1 adult = 0, Single adult = 1
  • race_eth is categorical. The dichotomized variables are ‘hispanic’, ‘nh_white’, ‘nh_black’, ‘nh_asian’, and ‘other’.
  • eeduc is educational attainment, a categorical variable coded as a number. The dichotomized variables are ‘lessthanhs’, ‘highschool’, ‘somecollege’, and ‘bachelors’.

Limit database to population in lowest 2 income brackets

Code
tabyl(dat$income)
 dat$income    n    percent
          1 6844 0.22893460
          2 4295 0.14366951
          3 4442 0.14858672
          4 5413 0.18106707
          5 3235 0.10821208
          6 3066 0.10255896
          7 1295 0.04331828
          8 1305 0.04365278
Code
nrow(dat)
[1] 29895
Code
dat2 <- dat %>% 
  filter(income %in% c(1:2))

tabyl(dat2$income)
 dat2$income    n   percent
           1 6844 0.6144178
           2 4295 0.3855822
Code
nrow(dat2)
[1] 11139
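The effect of this restriction on sample size can be stated as a share of the starting sample. The sketch below only reuses the two row counts printed above; the variable names are illustrative.

```r
n_all <- 29895  # nrow(dat), before the income restriction
n_low <- 11139  # nrow(dat2), income brackets 1 and 2 only

# Share of the starting sample retained after the restriction
share_kept <- n_low / n_all
round(100 * share_kept, 1)
```

Roughly 37% of the starting sample remains, which matches the combined share of income brackets 1 and 2 in the tabyl output.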

Preliminary modeling results

This is where you will describe the justification for the models that you are estimating and present the results from them using tables and figures. In discussing your results, you not only must refer to statistical significance, but also the magnitude of effects. Also, be sure to address model fit based on different specifications of your independent variables or inclusion of different sets of variables.

Testing Hypothesis 1

  1. Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety.

DV: Bad mental health. Bad mental health (10 to 16) is 1, good mental health (4 to 9) is 0.

IV: Rentpct is a continuous variable defined as rent as a percentage of the midpoint of the income level.

Control variables:

  • age: This is a continuous variable. Not dichotomized.
  • thhld_numper: Number of people in household. Continuous variable. Not dichotomized.
  • children: This is a dichotomous variable. No children = 0, children = 1
  • single_adult: This is a dichotomous variable. More than 1 adult = 0, Single adult = 1
  • gender_id is categorical. The dichotomized variables are ‘male’, ‘female’, ‘transgender’, ‘gen_none’, and ‘gen_notsel’.
  • race_eth is categorical. The dichotomized variables are ‘hispanic’, ‘nh_white’, ‘nh_black’, ‘nh_asian’, and ‘other’.
  • eeduc is educational attainment, a categorical variable coded as a number. The dichotomized variables are ‘lessthanhs’, ‘highschool’, ‘somecollege’, and ‘bachelors’.

Visualization of bad mental health by rent percent for two lowest income levels.

Code
cdplot(as.factor(badmnhlth) ~ rentpct, data=dat2)

Code
#Logit models of bad mental health with rent percent

model <- glm(badmnhlth ~ rentpct, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.53481    0.03272 -16.345  < 2e-16 ***
rentpct      0.18485    0.03801   4.863 1.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14966  on 11137  degrees of freedom
AIC: 14970

Number of Fisher Scoring iterations: 4
Code
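# Exponentiate the coefficients to get odds ratios.
# Caution: overwriting the coefficients and then calling summary() leaves the
# standard errors on the log-odds scale, so only the Estimate column below is
# meaningful; the z values and p-values in this output are not valid.

modelExp <- model
modelExp$coefficients <- exp(modelExp$coefficients)
summary(modelExp)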

Call:
glm(formula = badmnhlth ~ rentpct, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.58578    0.03272   17.90   <2e-16 ***
rentpct      1.20304    0.03801   31.65   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14966  on 11137  degrees of freedom
AIC: 14970

Number of Fisher Scoring iterations: 4
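A safer way to report odds ratios is to exponentiate the estimates and their Wald confidence limits directly. The sketch below is self-contained and uses the estimates and standard errors from the first model's summary; with the fitted object in memory, `exp(cbind(OR = coef(model), confint.default(model)))` should give the same kind of table.

```r
# Log-odds estimates and standard errors copied from summary(model) above
est <- c("(Intercept)" = -0.53481, rentpct = 0.18485)
se  <- c("(Intercept)" = 0.03272, rentpct = 0.03801)

# 95% Wald interval on the log-odds scale, then exponentiate
z <- qnorm(0.975)
or_table <- cbind(OR    = exp(est),
                  lower = exp(est - z * se),
                  upper = exp(est + z * se))

round(or_table, 3)
```

The rentpct odds ratio of about 1.20 means that a one-unit increase in rentpct is associated with roughly 20% higher odds of bad mental health, and the interval excludes 1.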

Creating a variable for rent percent squared.

Code
dat2$rentpctSQ <- dat2$rentpct*dat2$rentpct

Visualization of bad mental health by rent percent squared for two lowest income levels.

Code
cdplot(as.factor(badmnhlth) ~ rentpctSQ, data=dat2)

Code
#Logit models of bad mental health with rentpct and rentpctSQ

model <- glm(badmnhlth ~ rentpct + rentpctSQ, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + rentpctSQ, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.64589    0.04843 -13.336  < 2e-16 ***
rentpct      0.47934    0.10174   4.712 2.46e-06 ***
rentpctSQ   -0.12616    0.04054  -3.112  0.00186 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14956  on 11136  degrees of freedom
AIC: 14962

Number of Fisher Scoring iterations: 4
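Because the squared term is negative, the fitted log-odds rise with rentpct and then turn down. A rough sketch of where that peak falls, using only the two coefficients printed above (the variable names are illustrative):

```r
b_rentpct   <- 0.47934   # coefficient on rentpct
b_rentpctSQ <- -0.12616  # coefficient on rentpctSQ

# Peak of b1 * x + b2 * x^2 is at x = -b1 / (2 * b2)
turning_point <- -b_rentpct / (2 * b_rentpctSQ)
round(turning_point, 2)
```

The peak is at a rentpct of about 1.9, so over most of the plausible range the relationship is increasing but flattening. Whether many observations lie beyond that point would need to be checked against the distribution of rentpct.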

Let’s add age to the model.

Code
model <- glm(badmnhlth ~ rentpct + age, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + age, family = "binomial", 
    data = dat2)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.366286   0.067388   5.436 5.46e-08 ***
rentpct      0.144519   0.038523   3.752 0.000176 ***
age         -0.018122   0.001195 -15.162  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14731  on 11136  degrees of freedom
AIC: 14737

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentpct",
        regrid = "response")
 rentpct  prob     SE  df asymp.LCL asymp.UCL
   0.686 0.397 0.0047 Inf     0.388     0.406

Confidence level used: 0.95 

I’m not sure how to interpret marginal effects for a continuous variable like rentpct. What emmeans reports here is a predicted probability rather than a marginal effect: at the mean rent percent (0.686) and the mean age, a person has a 39.7% predicted probability of bad mental health (95% CI: 38.8% to 40.6%).
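For a continuous predictor in a logit model, the marginal effect at the mean can be approximated as the coefficient times p(1 - p), where p is the predicted probability at the mean. The sketch below plugs in the rentpct coefficient and the emmeans probability reported above; it is a back-of-the-envelope calculation, not output from a marginal effects package.

```r
b_rentpct <- 0.144519  # rentpct coefficient from the model with age
p_mean    <- 0.397     # predicted probability at the means, from emmeans

# Slope of the logistic curve at the mean: dp/dx = b * p * (1 - p)
mem_rentpct <- b_rentpct * p_mean * (1 - p_mean)
round(mem_rentpct, 4)
```

This gives about 0.035, so near the mean a 0.10 increase in rentpct is associated with an increase of roughly 0.35 percentage points in the probability of bad mental health.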

Let’s add number of people in the household.

Code
model <- glm(badmnhlth ~ rentpct + age + thhld_numper, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + age + thhld_numper, family = "binomial", 
    data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.391931   0.079607   4.923 8.51e-07 ***
rentpct       0.146236   0.038630   3.786 0.000153 ***
age          -0.018304   0.001233 -14.849  < 2e-16 ***
thhld_numper -0.007656   0.012648  -0.605 0.544961    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14730  on 11135  degrees of freedom
AIC: 14738

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentpct",
        regrid = "response")
 rentpct  prob     SE  df asymp.LCL asymp.UCL
   0.686 0.397 0.0047 Inf     0.388     0.406

Confidence level used: 0.95 

This shows the same result as the prior emmeans call. At the mean rent percent (0.686), mean age, and mean number of people in the household, a person has a 39.7% predicted probability of bad mental health (95% CI: 38.8% to 40.6%).

Let’s add presence of children in the household.

Code
model <- glm(badmnhlth ~ rentpct + age + thhld_numper + children, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + age + thhld_numper + children, 
    family = "binomial", data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.392143   0.079568   4.928 8.29e-07 ***
rentpct       0.142838   0.038689   3.692 0.000223 ***
age          -0.018521   0.001239 -14.951  < 2e-16 ***
thhld_numper  0.010579   0.016386   0.646 0.518522    
children     -0.098339   0.056450  -1.742 0.081500 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14727  on 11134  degrees of freedom
AIC: 14737

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentpct",
        regrid = "response")
 rentpct  prob      SE  df asymp.LCL asymp.UCL
   0.686 0.393 0.00533 Inf     0.382     0.403

Results are averaged over the levels of: children 
Confidence level used: 0.95 

At the mean rent percent (0.686), mean age, and mean number of people in the household, averaged over the two levels of children, a person has a 39.3% predicted probability of bad mental health (95% CI: 38.2% to 40.3%).

Let’s add single vs. multiple adults in the household.

Code
model <- glm(badmnhlth ~ rentpct + age + thhld_numper + children + single_adult, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + age + thhld_numper + children + 
    single_adult, family = "binomial", data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.535043   0.091187   5.868 4.42e-09 ***
rentpct       0.138281   0.038733   3.570 0.000357 ***
age          -0.018032   0.001248 -14.444  < 2e-16 ***
thhld_numper -0.032243   0.021157  -1.524 0.127506    
children     -0.031048   0.060174  -0.516 0.605872    
single_adult -0.169164   0.052113  -3.246 0.001170 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14717  on 11133  degrees of freedom
AIC: 14729

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentpct",
        regrid = "response")
 rentpct  prob      SE  df asymp.LCL asymp.UCL
   0.686 0.396 0.00546 Inf     0.385     0.406

Results are averaged over the levels of: children, single_adult 
Confidence level used: 0.95 

At the mean rent percent (0.686), mean age, and mean number of people in the household, averaged over the levels of children and single_adult, a person has a 39.6% predicted probability of bad mental health (95% CI: 38.5% to 40.6%).

Let’s look at Gender using male as the reference group.

Code
model <- glm(badmnhlth ~ rentpct + female + transgender + gen_none + gen_notsel, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + female + transgender + gen_none + 
    gen_notsel, family = "binomial", data = dat2)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.745826   0.046977 -15.876  < 2e-16 ***
rentpct      0.186073   0.038224   4.868 1.13e-06 ***
female       0.253796   0.044679   5.680 1.34e-08 ***
transgender  1.318868   0.196828   6.701 2.08e-11 ***
gen_none     0.835824   0.132018   6.331 2.43e-10 ***
gen_notsel   0.008629   0.242967   0.036    0.972    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14870  on 11133  degrees of freedom
AIC: 14882

Number of Fisher Scoring iterations: 4

Let’s just look at transgender.

Code
model <- glm(badmnhlth ~ rentpct + transgender, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + transgender, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.54429    0.03282 -16.586  < 2e-16 ***
rentpct      0.18062    0.03810   4.741 2.13e-06 ***
transgender  1.12151    0.19408   5.779 7.53e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14930  on 11136  degrees of freedom
AIC: 14936

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "transgender",
        regrid = "response")
 transgender  prob      SE  df asymp.LCL asymp.UCL
           0 0.396 0.00467 Inf     0.387     0.406
           1 0.668 0.04280 Inf     0.585     0.752

Confidence level used: 0.95 

At the mean rent percent, a transgender person has a 66.8% predicted probability of bad mental health (95% CI: 58.5% to 75.2%). A non-transgender person has a 39.6% predicted probability of bad mental health (95% CI: 38.7% to 40.6%).

Now let’s look at Race and Ethnicity using nh_white as the reference group.

Code
model <- glm(badmnhlth ~ rentpct + hispanic + nh_black + nh_asian + other, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + hispanic + nh_black + nh_asian + 
    other, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.46343    0.03559 -13.021  < 2e-16 ***
rentpct      0.20361    0.03847   5.292 1.21e-07 ***
hispanic    -0.15617    0.05777  -2.703  0.00687 ** 
nh_black    -0.36654    0.05564  -6.588 4.46e-11 ***
nh_asian    -0.63859    0.13078  -4.883 1.05e-06 ***
other        0.16699    0.07693   2.171  0.02996 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14888  on 11133  degrees of freedom
AIC: 14900

Number of Fisher Scoring iterations: 4

Now let’s just look at nh_black.

Code
model <- glm(badmnhlth ~ rentpct + nh_black, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + nh_black, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.47791    0.03393 -14.086  < 2e-16 ***
rentpct      0.17923    0.03809   4.706 2.53e-06 ***
nh_black    -0.33572    0.05407  -6.210 5.31e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14927  on 11136  degrees of freedom
AIC: 14933

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "nh_black",
        regrid = "response")
 nh_black  prob     SE  df asymp.LCL asymp.UCL
        0 0.412 0.0051 Inf     0.402     0.422
        1 0.334 0.0111 Inf     0.312     0.356

Confidence level used: 0.95 

At the mean rent percent, a non-Hispanic Black person has a 33.4% predicted probability of bad mental health (95% CI: 31.2% to 35.6%). Those who are not non-Hispanic Black have a 41.2% predicted probability of bad mental health (95% CI: 40.2% to 42.2%).

Now let’s look at Educational Attainment using highschool as the reference group.

Code
model <- glm(badmnhlth ~ rentpct + lessthanhs + somecollege + bachelors, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + lessthanhs + somecollege + 
    bachelors, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.55768    0.04686 -11.901  < 2e-16 ***
rentpct      0.21582    0.03846   5.612 2.00e-08 ***
lessthanhs  -0.01751    0.08564  -0.204  0.83799    
somecollege  0.14527    0.04943   2.939  0.00329 ** 
bachelors   -0.24652    0.05647  -4.366 1.27e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14901  on 11134  degrees of freedom
AIC: 14911

Number of Fisher Scoring iterations: 4

Now let’s just look at some college.

Code
model <- glm(badmnhlth ~ rentpct + somecollege, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + somecollege, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.65782    0.03781 -17.400  < 2e-16 ***
rentpct      0.19503    0.03812   5.116 3.11e-07 ***
somecollege  0.25933    0.03901   6.647 2.99e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14922  on 11136  degrees of freedom
AIC: 14928

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "somecollege",
        regrid = "response")
 somecollege  prob      SE  df asymp.LCL asymp.UCL
           0 0.372 0.00613 Inf      0.36     0.384
           1 0.434 0.00708 Inf      0.42     0.448

Confidence level used: 0.95 

At the mean rent percent, a person with some college has a 43.4% predicted probability of bad mental health (95% CI: 42.0% to 44.8%). Those who are not in the some college category have a 37.2% predicted probability of bad mental health (95% CI: 36.0% to 38.4%).

Model with age, single adult, female, nh_black, some college.

Code
model <- glm(badmnhlth ~ rentpct + age + single_adult + female + nh_black + somecollege, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentpct + age + single_adult + female + 
    nh_black + somecollege, family = "binomial", data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.225063   0.076306   2.949 0.003183 ** 
rentpct       0.145303   0.038928   3.733 0.000189 ***
age          -0.017915   0.001229 -14.575  < 2e-16 ***
single_adult -0.100095   0.040417  -2.477 0.013265 *  
female        0.160023   0.042922   3.728 0.000193 ***
nh_black     -0.364349   0.054871  -6.640 3.13e-11 ***
somecollege   0.283910   0.039649   7.161 8.03e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14617  on 11132  degrees of freedom
AIC: 14631

Number of Fisher Scoring iterations: 4
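To address model fit, this model can be compared with the rentpct-only model through a likelihood-ratio test on the change in residual deviance. The sketch below uses the rounded deviances and degrees of freedom printed in the two summaries, so the statistic is approximate; with both fitted objects saved under separate names, `anova(small_model, full_model, test = "Chisq")` would do the same comparison (those object names are hypothetical).

```r
# Residual deviance and df from the rentpct-only model
dev_small <- 14966; df_small <- 11137

# Residual deviance and df from the model with age, single_adult, female,
# nh_black, and somecollege added
dev_full <- 14617; df_full <- 11132

lr_stat <- dev_small - dev_full   # drop in deviance
lr_df   <- df_small - df_full     # number of added parameters
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value)
```

A deviance drop of 349 on 5 degrees of freedom, together with the fall in AIC from 14970 to 14631, indicates that the added covariates improve fit well beyond what chance would produce.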

Let’s look at the marginal effects of some college in this model.

Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "somecollege",
        regrid = "response")
 somecollege  prob      SE  df asymp.LCL asymp.UCL
           0 0.333 0.00733 Inf     0.319     0.348
           1 0.398 0.00840 Inf     0.382     0.415

Results are averaged over the levels of: single_adult, female, nh_black 
Confidence level used: 0.95 

At the mean rent percent and age, averaged over the levels of single adult, female, and non-Hispanic Black, a person with some college has a 39.8% predicted probability of bad mental health (95% CI: 38.2% to 41.5%). Those who are not in the some college category have a 33.3% predicted probability of bad mental health (95% CI: 31.9% to 34.8%).

Testing Hypothesis 2

  2. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

DV: Bad mental health. Bad mental health (10 to 16) is 1, good mental health (4 to 9) is 0.

IV: grace_period: Grace period for paying rent prior to eviction. Continuous variable from 0 days to 30 days.

I need to visualize this relationship.

Code
# Null (intercept-only) linear mixed model with a random intercept for each grace_period value
model <- lmer(badmnhlth ~ 1 + (1 | grace_period), data=dat2)

summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: badmnhlth ~ 1 + (1 | grace_period)
   Data: dat2

REML criterion at convergence: 15718.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8314 -0.8140 -0.8064  1.2183  1.2555 

Random effects:
 Groups       Name        Variance  Std.Dev.
 grace_period (Intercept) 0.0001464 0.0121  
 Residual                 0.2398336 0.4897  
Number of obs: 11139, groups:  grace_period, 11

Fixed effects:
            Estimate Std. Error t value
(Intercept) 0.395075   0.006592   59.94
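The variance components above can be summarized as an intraclass correlation (ICC), the share of total variance that lies between grace_period groups. The sketch below reuses the two variances from the output; because the model is linear while badmnhlth is binary, the result is best read as a rough guide.

```r
var_group    <- 0.0001464  # grace_period (Intercept) variance
var_residual <- 0.2398336  # residual variance

# ICC: between-group variance as a share of total variance
icc <- var_group / (var_group + var_residual)
round(icc, 5)
```

An ICC of about 0.0006 means that well under 1% of the variation in bad mental health is between grace_period groups in this null model.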

Testing Hypothesis 3

  3. Renters who are late on rent payments exhibit higher rates of depression and anxiety.

DV: Bad mental health. Bad mental health (10 to 16) is 1, good mental health (4 to 9) is 0.

IV: rentcur: “Caught up on rent” = 1 and “Not caught up on rent” = 0

How do I do a visualization of bad mental health by caught up on rent for two lowest income levels?
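One option for two dichotomous variables is a bar plot of the proportion with bad mental health in each rent status group. The sketch below runs on simulated data so that it is self-contained; the data frame `toy`, the sample size, and the simulation probabilities are all made up for illustration and are not project results.

```r
set.seed(42)

# Simulated stand-in for dat2 (hypothetical values, not survey data)
n <- 500
toy <- data.frame(rentcur = rbinom(n, 1, 0.8))
toy$badmnhlth <- rbinom(n, 1, ifelse(toy$rentcur == 1, 0.37, 0.56))

# Proportion with bad mental health within each level of rentcur (0, then 1)
prop_bad <- tapply(toy$badmnhlth, toy$rentcur, mean)

barplot(prop_bad,
        names.arg = c("Not caught up", "Caught up"),
        ylim = c(0, 1),
        ylab = "Proportion with bad mental health")
```

Swapping `toy` for `dat2` should give the corresponding plot for the two lowest income levels. Base R's `spineplot(as.factor(badmnhlth) ~ as.factor(rentcur), data = dat2)` is a categorical counterpart to the cdplot used for hypothesis 1.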

Code
#Logit models of bad mental health by caught up on rent.

model <- glm(badmnhlth ~ rentcur, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.24645    0.04584   5.377 7.58e-08 ***
rentcur     -0.79746    0.05069 -15.733  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14740  on 11137  degrees of freedom
AIC: 14744

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentcur",
        regrid = "response")
 rentcur  prob      SE  df asymp.LCL asymp.UCL
       0 0.561 0.01129 Inf     0.539     0.583
       1 0.366 0.00502 Inf     0.356     0.375

Confidence level used: 0.95 

A person caught up on rent has a 36.6% predicted probability of bad mental health (95% CI: 35.6% to 37.5%). A person not caught up on rent has a 56.1% predicted probability of bad mental health (95% CI: 53.9% to 58.3%).
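These probabilities follow directly from the logit coefficients, which is a useful check on the emmeans output. The sketch below applies the inverse logit (`plogis`) to the intercept and rentcur estimates printed in the model summary.

```r
b_intercept <- 0.24645   # log-odds of bad mental health when rentcur = 0
b_rentcur   <- -0.79746  # change in log-odds when rentcur = 1

# Inverse logit converts log-odds to probabilities
p_not_caught_up <- plogis(b_intercept)
p_caught_up     <- plogis(b_intercept + b_rentcur)

round(c(not_caught_up = p_not_caught_up, caught_up = p_caught_up), 3)
```

The difference between the two, about 19.5 percentage points, is one way to express the magnitude of the association in addition to its statistical significance.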

Let’s add age to the model.

Code
model <- glm(badmnhlth ~ rentcur + age, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + age, family = "binomial", 
    data = dat2)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.052415   0.072432   14.53   <2e-16 ***
rentcur     -0.753579   0.051103  -14.75   <2e-16 ***
age         -0.017487   0.001207  -14.49   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14525  on 11136  degrees of freedom
AIC: 14531

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentcur",
        regrid = "response")
 rentcur  prob      SE  df asymp.LCL asymp.UCL
       0 0.550 0.01144 Inf     0.528     0.572
       1 0.365 0.00507 Inf     0.355     0.375

Confidence level used: 0.95 

At the average age, a person caught up on rent has a 36.5% predicted probability of bad mental health (95% CI: 35.5% to 37.5%). A person not caught up on rent has a 55.0% predicted probability of bad mental health (95% CI: 52.8% to 57.2%).

Let’s add number of people in the household.

Code
model <- glm(badmnhlth ~ rentcur + age + thhld_numper, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + age + thhld_numper, family = "binomial", 
    data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.188651   0.089200  13.326  < 2e-16 ***
rentcur      -0.774896   0.051785 -14.964  < 2e-16 ***
age          -0.018288   0.001246 -14.678  < 2e-16 ***
thhld_numper -0.034081   0.012967  -2.628  0.00858 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14519  on 11135  degrees of freedom
AIC: 14527

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentcur",
        regrid = "response")
 rentcur  prob      SE  df asymp.LCL asymp.UCL
       0 0.554 0.01155 Inf     0.532     0.577
       1 0.364 0.00508 Inf     0.354     0.374

Confidence level used: 0.95 

At the average age and number of people in the household, a person caught up on rent has a 36.4% predicted probability of bad mental health (95% CI: 35.4% to 37.4%). A person not caught up on rent has a 55.4% predicted probability of bad mental health (95% CI: 53.2% to 57.7%).

Let’s add presence of children in the household.

Code
model <- glm(badmnhlth ~ rentcur + age + thhld_numper + children, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + age + thhld_numper + children, 
    family = "binomial", data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.192740   0.089108   13.38  < 2e-16 ***
rentcur      -0.784138   0.051899  -15.11  < 2e-16 ***
age          -0.018633   0.001252  -14.89  < 2e-16 ***
thhld_numper -0.004159   0.016606   -0.25  0.80222    
children     -0.163626   0.057212   -2.86  0.00424 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14510  on 11134  degrees of freedom
AIC: 14520

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentcur",
        regrid = "response")
 rentcur  prob      SE  df asymp.LCL asymp.UCL
       0 0.548 0.01175 Inf     0.525     0.571
       1 0.357 0.00565 Inf     0.346     0.368

Results are averaged over the levels of: children 
Confidence level used: 0.95 

At the average age and number of people in the household, averaged over the two levels of children, a person caught up on rent has a 35.7% predicted probability of bad mental health (95% CI: 34.6% to 36.8%). A person not caught up on rent has a 54.8% predicted probability of bad mental health (95% CI: 52.5% to 57.1%).

Let’s add single vs. multiple adults in the household.

Code
model <- glm(badmnhlth ~ rentcur + age + thhld_numper + children + single_adult, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + age + thhld_numper + children + 
    single_adult, family = "binomial", data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.352089   0.099949  13.528  < 2e-16 ***
rentcur      -0.787093   0.051938 -15.155  < 2e-16 ***
age          -0.018075   0.001262 -14.328  < 2e-16 ***
thhld_numper -0.052657   0.021534  -2.445 0.014475 *  
children     -0.087468   0.061028  -1.433 0.151790    
single_adult -0.189336   0.052698  -3.593 0.000327 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14497  on 11133  degrees of freedom
AIC: 14509

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "rentcur",
        regrid = "response")
 rentcur  prob      SE  df asymp.LCL asymp.UCL
       0 0.552 0.01178 Inf     0.529     0.575
       1 0.360 0.00579 Inf     0.349     0.371

Results are averaged over the levels of: children, single_adult 
Confidence level used: 0.95 

At the average age and number of people in the household, averaged over the levels of children and single_adult, a person caught up on rent has a 36.0% predicted probability of bad mental health (95% CI: 34.9% to 37.1%). A person not caught up on rent has a 55.2% predicted probability of bad mental health (95% CI: 52.9% to 57.5%).

Let’s look at Gender using male as the reference group.

Code
model <- glm(badmnhlth ~ rentcur + female + transgender + gen_none + gen_notsel, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + female + transgender + gen_none + 
    gen_notsel, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.05104    0.05693   0.896    0.370    
rentcur     -0.79919    0.05089 -15.703  < 2e-16 ***
female       0.23402    0.04511   5.188 2.13e-07 ***
transgender  1.38124    0.19782   6.982 2.91e-12 ***
gen_none     0.82451    0.13335   6.183 6.29e-10 ***
gen_notsel  -0.01681    0.24580  -0.068    0.945    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14645  on 11133  degrees of freedom
AIC: 14657

Number of Fisher Scoring iterations: 4

Let’s look at only transgender.

Code
model <- glm(badmnhlth ~ rentcur + transgender, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + transgender, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.23862    0.04589   5.200 1.99e-07 ***
rentcur     -0.80423    0.05076 -15.843  < 2e-16 ***
transgender  1.19825    0.19506   6.143 8.10e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14699  on 11136  degrees of freedom
AIC: 14705

Number of Fisher Scoring iterations: 4
Code
# MEMs: marginal effects at the mean

emmeans(model, specs = "transgender",
        regrid = "response")
 transgender  prob     SE  df asymp.LCL asymp.UCL
           0 0.461 0.0062 Inf     0.449     0.473
           1 0.731 0.0372 Inf     0.658     0.803

Results are averaged over the levels of: rentcur 
Confidence level used: 0.95 

Averaging over the levels of current on rent, a transgender person has a 73.1% predicted probability of bad mental health (95% CI: 65.8% to 80.3%). A non-transgender person has a 46.1% predicted probability (95% CI: 44.9% to 47.3%).
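To see where these probabilities come from, the calculation can be sketched by hand from the coefficients of the rentcur + transgender model. This assumes that emmeans converts each prediction to the probability scale and then averages the two levels of rentcur with equal weight; the helper name `pred_prob` is made up for this illustration.

```r
# Coefficients copied from the rentcur + transgender model summary above
b0        <- 0.23862
b_rentcur <- -0.80423
b_trans   <- 1.19825

# Predicted probability for one combination of predictors (inverse logit)
pred_prob <- function(rentcur, transgender) {
  plogis(b0 + b_rentcur * rentcur + b_trans * transgender)
}

# Average the probability-scale predictions over rentcur = 0 and rentcur = 1
p_trans    <- mean(pred_prob(rentcur = c(0, 1), transgender = 1))
p_nontrans <- mean(pred_prob(rentcur = c(0, 1), transgender = 0))

round(c(non_transgender = p_nontrans, transgender = p_trans), 3)
```

Under that assumption the two values should agree with the emmeans table up to rounding. Note that this equal-weight average is not the same as evaluating the model at the sample mean of rentcur.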

Now let’s look at Race and Ethnicity using nh_white as the reference group.

Code
model <- glm(badmnhlth ~ rentcur + hispanic + nh_black + nh_asian + other, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + hispanic + nh_black + nh_asian + 
    other, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.44362    0.05198   8.534  < 2e-16 ***
rentcur     -0.88649    0.05210 -17.015  < 2e-16 ***
hispanic    -0.21214    0.05857  -3.622 0.000293 ***
nh_black    -0.52525    0.05756  -9.126  < 2e-16 ***
nh_asian    -0.70253    0.13251  -5.302 1.15e-07 ***
other        0.08733    0.07824   1.116 0.264381    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14621  on 11133  degrees of freedom
AIC: 14633

Number of Fisher Scoring iterations: 4
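One way to gauge the fit of this model is to compare the residual deviance with the null deviance. The sketch below uses only the numbers printed in the summary above; the likelihood-ratio test and the deviance-based pseudo R-squared are additions for illustration and were not part of the original analysis.

```r
# Deviances and degrees of freedom copied from the summary() output above
null_dev  <- 14990
null_df   <- 11138
resid_dev <- 14621
resid_df  <- 11133

# Likelihood-ratio test of the full model against the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Deviance-based pseudo R-squared (matches McFadden's for 0/1 outcomes)
pseudo_r2 <- 1 - resid_dev / null_dev

c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value, pseudo_r2 = pseudo_r2)
```

A very small p-value says the predictors jointly improve on the intercept-only model, while a pseudo R-squared of a few percent says most of the variation in bad mental health remains unexplained.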

Now let’s just look at Non-Hispanic Asian.

Code
model <- glm(badmnhlth ~ rentcur + nh_asian, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + nh_asian, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.26781    0.04615   5.804 6.49e-09 ***
rentcur     -0.80558    0.05080 -15.858  < 2e-16 ***
nh_asian    -0.58433    0.13106  -4.459 8.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14718  on 11136  degrees of freedom
AIC: 14724

Number of Fisher Scoring iterations: 4
Code
# Predicted probabilities (estimated marginal means) on the response scale

emmeans(model, specs = "nh_asian",
        regrid = "response")
 nh_asian  prob      SE  df asymp.LCL asymp.UCL
        0 0.468 0.00624 Inf     0.455     0.480
        1 0.334 0.02794 Inf     0.279     0.388

Results are averaged over the levels of: rentcur 
Confidence level used: 0.95 

Averaging over the levels of current on rent, a non-Hispanic Asian person has a 33.4% predicted probability of bad mental health (95% CI: 27.9% to 38.8%). A person who is not non-Hispanic Asian has a 46.8% predicted probability (95% CI: 45.5% to 48.0%).

Now let’s look at Educational Attainment using highschool as the reference group.

Code
model <- glm(badmnhlth ~ rentcur + lessthanhs + somecollege + bachelors, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + lessthanhs + somecollege + 
    bachelors, family = "binomial", data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.19602    0.05712   3.432 0.000600 ***
rentcur     -0.78027    0.05097 -15.307  < 2e-16 ***
lessthanhs  -0.03395    0.08668  -0.392 0.695259    
somecollege  0.16888    0.04995   3.381 0.000723 ***
bachelors   -0.14594    0.05671  -2.574 0.010065 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14696  on 11134  degrees of freedom
AIC: 14706

Number of Fisher Scoring iterations: 4

Now let’s just look at some college.

Code
model <- glm(badmnhlth ~ rentcur + somecollege, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + somecollege, family = "binomial", 
    data = dat2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.13445    0.04942   2.721  0.00651 ** 
rentcur     -0.79146    0.05078 -15.587  < 2e-16 ***
somecollege  0.23956    0.03938   6.083 1.18e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14703  on 11136  degrees of freedom
AIC: 14709

Number of Fisher Scoring iterations: 4
Code
# Predicted probabilities (estimated marginal means) on the response scale

emmeans(model, specs = "somecollege",
        regrid = "response")
 somecollege  prob      SE  df asymp.LCL asymp.UCL
           0 0.437 0.00754 Inf     0.423     0.452
           1 0.495 0.00799 Inf     0.479     0.510

Results are averaged over the levels of: rentcur 
Confidence level used: 0.95 

Averaging over the levels of current on rent, a person with some college has a 49.5% predicted probability of bad mental health (95% CI: 47.9% to 51.0%). A person in any other education category has a 43.7% predicted probability (95% CI: 42.3% to 45.2%).

Model with age, single adult, female, nh_black, some college.

Code
model <- glm(badmnhlth ~ rentcur + age + single_adult + female + nh_black + somecollege, family = "binomial", data = dat2)

summary(model)

Call:
glm(formula = badmnhlth ~ rentcur + age + single_adult + female + 
    nh_black + somecollege, family = "binomial", data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.979844   0.081049  12.090  < 2e-16 ***
rentcur      -0.803790   0.052276 -15.376  < 2e-16 ***
age          -0.017340   0.001243 -13.955  < 2e-16 ***
single_adult -0.076584   0.040772  -1.878 0.060333 .  
female        0.147595   0.043276   3.411 0.000648 ***
nh_black     -0.490591   0.056454  -8.690  < 2e-16 ***
somecollege   0.268255   0.040038   6.700 2.08e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 14990  on 11138  degrees of freedom
Residual deviance: 14391  on 11132  degrees of freedom
AIC: 14405

Number of Fisher Scoring iterations: 4
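Because all of these models are fit to the same 11,139 observations, their AIC values can be compared, with lower values indicating a better trade-off between fit and complexity. The sketch below simply collects the AIC values printed in this section; the model labels are made up for this illustration.

```r
# AIC values copied from the summary() outputs in this section
aic <- c(gender = 14657, race_ethnicity = 14633,
         education = 14706, combined = 14405)

# Difference from the best (lowest) AIC
delta <- aic - min(aic)

aic_table <- data.frame(model = names(aic), AIC = aic,
                        delta_AIC = delta, row.names = NULL)
aic_table <- aic_table[order(aic_table$AIC), ]

print(aic_table, row.names = FALSE)
```

If the fitted objects were kept under separate names, `AIC(m1, m2, m3, m4)` would produce a similar table without copying numbers by hand.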

Let’s look at the marginal effects of some college in this model.

Code
# Predicted probabilities (estimated marginal means) on the response scale

emmeans(model, specs = "somecollege",
        regrid = "response")
 somecollege  prob      SE  df asymp.LCL asymp.UCL
           0 0.389 0.00848 Inf     0.372     0.406
           1 0.451 0.00913 Inf     0.433     0.469

Results are averaged over the levels of: rentcur, single_adult, female, nh_black 
Confidence level used: 0.95 

Holding age at its mean and averaging over the levels of current on rent, single adult, female, and non-Hispanic Black, a person with some college has a 45.1% predicted probability of bad mental health (95% CI: 43.3% to 46.9%). Those who are not in the some college category have a 38.9% predicted probability (95% CI: 37.2% to 40.6%).

Testing Hypothesis 4

  1. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
Code
# adding a fixed level-one predictor

model <- lmer(badmnhlth ~ 1 + rentcur + (1 | grace_period), data=dat2)

summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: badmnhlth ~ 1 + rentcur + (1 | grace_period)
   Data: dat2

REML criterion at convergence: 15464.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1860 -0.7719 -0.7454  1.2874  1.3734 

Random effects:
 Groups       Name        Variance  Std.Dev.
 grace_period (Intercept) 0.0004136 0.02034 
 Residual                 0.2342210 0.48396 
Number of obs: 11139, groups:  grace_period, 11

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.55250    0.01325   41.71
rentcur     -0.19703    0.01212  -16.26

Correlation of Fixed Effects:
        (Intr)
rentcur -0.750

Final project discussion

Here you will describe the overall results of your project, including whether your results support your hypotheses. Be sure to discuss the limitations of your project and how you could refine the analyses in the future. For instance, are there any omitted variables that could be driving the associations between your independent variables and outcome? You could also discuss what you would do differently if you had the opportunity to redo the project.

Hypothesis 1:

Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety.

The results support the hypothesis. In the logit model with bad mental health as the outcome variable and rent as a percent of income as the input variable, the coefficient on ‘rentpc’ is 0.185: a one percentage point increase in ‘rentpc’ is associated with a 0.185 increase in the log-odds of bad mental health, and the association is statistically significant. Exponentiating the coefficient gives the odds ratio, exp(0.185) ≈ 1.20.

Adding in various correlates to the model indicated that age and single vs. multiple adults in the household are significant, but number of people and presence of children are not.

I evaluated the marginal effects of various models and found, for example, the following:

At the mean rent percent, a transgender person has a 66.8% predicted probability of bad mental health. A non-transgender person has a 39.6% predicted probability.

At the mean rent percent, a non-Hispanic Black person has a 33.4% predicted probability of bad mental health. Those who are not non-Hispanic Black have a 41.2% predicted probability.

At the mean rent percent, a person with some college has a 43.4% predicted probability of bad mental health. Those who are not in the some college category have a 37.2% predicted probability.

At the mean rent percent, age, single adult, female, and non-Hispanic Black, a person with some college has a 39.8% predicted probability of bad mental health. Those who are not in the some college category have a 33.3% predicted probability.

Hypothesis 2:

Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

I ran the lmer() function, but I do not understand how to interpret the results of this model.

Hypothesis 3:

Renters who are late on rent payments exhibit higher rates of depression and anxiety.

The results support the hypothesis. In the logit model with bad mental health as the outcome variable and current on rent as the input variable, the coefficient on ‘rentcur’ is -0.797: being current on rent (‘rentcur’ = 1) is associated with a 0.797 decrease in the log-odds of bad mental health, and the association is statistically significant. Exponentiating the coefficient gives the odds ratio, exp(-0.797) ≈ 0.45.

I evaluated the marginal effects of various models and found, for example, the following:

A person caught up on rent has a 36.6% predicted probability of bad mental health. A person not caught up on rent has a 56.1% predicted probability.

At the mean age, number of people in the household, presence of children, and single vs. multiple adults, a person caught up on rent has a 36.0% predicted probability of bad mental health. A person not caught up on rent has a 55.2% predicted probability.

Averaging over the levels of current on rent, a transgender person has a 73.1% predicted probability of bad mental health (95% CI: 65.8% to 80.3%). A non-transgender person has a 46.1% predicted probability (95% CI: 44.9% to 47.3%).

Averaging over the levels of current on rent, a non-Hispanic Asian person has a 33.4% predicted probability of bad mental health (95% CI: 27.9% to 38.8%). A person who is not non-Hispanic Asian has a 46.8% predicted probability (95% CI: 45.5% to 48.0%).

Averaging over the levels of current on rent, a person with some college has a 49.5% predicted probability of bad mental health (95% CI: 47.9% to 51.0%). A person in any other education category has a 43.7% predicted probability (95% CI: 42.3% to 45.2%).

Hypothesis 4:

Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.

I ran the lmer() function, but I do not understand how to interpret the results of this model.

Limitations and refinement

One limitation of this project is that income is a categorical variable. It was converted to a value at the median of each range, so the value of rent as a percentage of income is an approximation.

Instead of bad mental health, it may be useful to evaluate educational outcomes. One of the largest expenditures for local governments is on public schools. Housing precarity (high cost burden, risk of eviction, and homelessness) in households with children may be related to children’s educational outcomes. An analysis of this relationship may be useful to local governments regarding both housing assistance and spending on public schools.

Questions/issues:

  • I realize I need to have a literature review in support of statistical research.

  • How should I evaluate whether to remove outliers for rent amount and income (both of which affect my IV, rent as a percent of income)?

  • It is difficult to manage a large number of continuous, categorical, and dichotomized variables. I got “lost” in the models. I wasn’t sure how to set up the model to “control” for variables or how to use the dichotomized models to support the hypotheses.

  • For a logit model, do we exponentiate the coefficients? Does this provide the odds ratios? Or do we interpret the exponentiated coefficients another way?

  • I realize I need to work on visualizing the data.

  • How do you evaluate model fit?

  • Can you apply a marginal effects function to both continuous and dichotomized variables?

  • If you have a lot of variables in your model, how do you decide what to check for marginal effects? Instead of doing this one variable at a time, is there a way to do many at a time and format the results into a table?

  • I know I need to include more interpretations, model fit tests, visuals, etc.

  • How do you interpret the results of lmer()?
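On the question of checking several variables at once, one possible approach is to loop over the binary predictors and collect the predicted probabilities in a table. The sketch below uses base R only and simulated data: the variable names mirror this report, but the values, the simulated effect sizes, and the helper `avg_pred` are all hypothetical. It averages predictions over the observed values of the other variables, which is a different convention from the equal-weight averaging that emmeans reports.

```r
set.seed(1)
n <- 2000

# Hypothetical data: names mirror the report, values are simulated
sim <- data.frame(
  rentcur     = rbinom(n, 1, 0.8),
  female      = rbinom(n, 1, 0.5),
  somecollege = rbinom(n, 1, 0.4),
  age         = round(runif(n, 18, 80))
)
lp <- 1 - 0.8 * sim$rentcur + 0.15 * sim$female +
  0.25 * sim$somecollege - 0.017 * sim$age
sim$badmnhlth <- rbinom(n, 1, plogis(lp))

fit <- glm(badmnhlth ~ rentcur + female + somecollege + age,
           family = binomial, data = sim)

# Average predicted probability with one binary predictor set to 0, then 1,
# leaving every other variable at its observed value
avg_pred <- function(fit, data, var) {
  p <- sapply(c(0, 1), function(v) {
    d <- data
    d[[var]] <- v
    mean(predict(fit, newdata = d, type = "response"))
  })
  data.frame(variable = var, p_at_0 = p[1], p_at_1 = p[2],
             difference = p[2] - p[1])
}

binary_vars <- c("rentcur", "female", "somecollege")
result <- do.call(rbind, lapply(binary_vars, avg_pred, fit = fit, data = sim))

print(result, digits = 3)
```

The same loop could be pointed at the real `dat2` and any fitted glm. The `difference` column is the average change in predicted probability when the predictor moves from 0 to 1.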
