Here you will describe the coding of your outcome and independent variables. You will produce and discuss summary statistics (tables) and data visualizations (figures) for your outcome and independent variables. You will also need to discuss how your sample size decreases with successive restrictions. This part of the project will alert me to potential coding issues that you need to fix prior to running models. Please send me a draft of this part (along with your revisions to the first part of the project) by April 7th or April 14th (depending on when we discuss your project proposals).
In “Evicted: Poverty and Profit in the American City,” Matthew Desmond observes that in some households there isn’t enough to eat because the “rent eats first.” Housing is shelter, a basic human need, and, as Desmond documents, it can take priority over other basic needs like food. It is therefore reasonable to expect that anxiety and depression are elevated in low-income households that spend a high proportion of income on rent, as well as in households that have fallen behind on rent. This research project explores the relationship between rental cost burden and tenant mental health.
The project focuses on renters (rather than homeowners) for multiple reasons. Prior research shows that low-income families are more likely to rent than to own a home, that renters are more cost burdened than homeowners, and that renters are at greater risk of eviction. The sample consists of respondents to the U.S. Census Bureau Household Pulse Survey Phase 3.5 who rent their dwelling.
What is the relationship between the proportion of income spent on rent and rates of depression and anxiety in the United States? Does income level moderate the relationship between the proportion of income spent on rent and rates of depression and anxiety?
What is the relationship between falling behind on rent and mental health?
Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety. Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
Renters who are late on rent payments exhibit higher rates of depression and anxiety. Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
Household Pulse Survey Phase 3.5, weeks 46 to 48, 2022 (https://www.census.gov/programs-surveys/household-pulse-survey.html)
The Law Atlas Project, State Eviction Laws, January 1, 2021 (https://lawatlas.org/datasets/state-eviction-laws)
inclvl: numerical variable derived from INCOME
trentamt: Monthly rent from TRENTAMT
rentpct: rent as a percent of income: Variable derived from INCOME and TRENTAMT
evict: Likelihood of eviction in the next two months, from EVICT
TBIRTH_YEAR: Year of birth to determine age
GENID_DESCRIBE: Current gender identity
Race/Ethnicity: Categorical variable derived from RHISPANIC and RRACE
MS: Marital status
THHLD_NUMPER: Total number of people in household
Children in household: Dichotomous variable derived from THHLD_NUMKID
EEDUC: Educational attainment
Importing the data for Household Pulse Survey Phase 3.5 (weeks 46 to 48). I switched to Phase 3.5 (instead of 3.3) because earlier phases do not include the amount of rent.
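# Packages used below (car is called as car:: and only needs to be installed)
library(dplyr)    # %>%, select(), filter(), mutate(), case_when()
library(janitor)  # tabyl()
library(readxl)   # read_xlsx()
library(ggplot2)  # ggplot()
library(emmeans)  # emmeans()
hps46 <- read.csv("./data/pulse2022_puf_46.csv")
hps47 <- read.csv("./data/pulse2022_puf_47.csv")
hps48 <- read.csv("./data/pulse2022_puf_48.csv")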
Combining into one dataframe.
hps3.5 <- rbind(hps46, hps47, hps48)
names(hps3.5) <- tolower(names(hps3.5))
nrow(hps3.5)
[1] 167931
Importing the data for the Law Atlas Project State Eviction Laws.
lawdat <- read_xlsx("./data/LSCEvictionLaws_StateTerritory_Data.xlsx")
New names:
• `` -> `...60`
nrow(lawdat)
[1] 53
Importing the openICPSR Eviction Moratoria data. Setting this aside for now.
# evictdat <- read_xlsx("./data/2023.02.01 Moratoria Supportive + Measures Datasets.xlsx")
# For the time period of this data, only California and Massachusetts have an eviction moratorium in place.
Joining hps3.5 with lawdat.
alldata <- merge(x = hps3.5, y = lawdat, by = 'est_st', all.x = TRUE)
nrow(alldata)
[1] 167931
Selecting variables from alldata for a working dataframe. I could select more variables from lawdat or I could create an index.
dat <- alldata %>%
  dplyr::select(week, income, tenure, trentamt, evict, rentcur, anxious, worry, interest, down, tbirth_year, genid_describe, rhispanic, rrace, ms, thhld_numadlt, thhld_numper, thhld_numkid, eeduc, est_st, Jurisdictions, hweight, pweight, grace_period)
nrow(dat)
[1] 167931
Distribution of tenure. Category “3” is renters.
tabyl(dat$tenure)
dat$tenure n percent
-99 1231 0.007330392
-88 27107 0.161417487
1 39562 0.235584853
2 64143 0.381960448
3 34032 0.202654662
4 1856 0.011052158
Filter for only renters.
dat <- dat %>%
  filter(tenure == 3)
nrow(dat)
[1] 34032
Distribution of income.
tabyl(dat$income)
dat$income n percent
-99 468 0.01375176
-88 1568 0.04607428
1 7488 0.22002821
2 4635 0.13619535
3 4740 0.13928068
4 5750 0.16895863
5 3420 0.10049365
6 3236 0.09508698
7 1360 0.03996239
8 1367 0.04016808
Filter out rows with missing income data.
dat <- dat %>%
  filter(income %in% 1:8)
tabyl(dat$income)
dat$income n percent
1 7488 0.23402925
2 4635 0.14486186
3 4740 0.14814352
4 5750 0.17970996
5 3420 0.10688836
6 3236 0.10113764
7 1360 0.04250531
8 1367 0.04272409
Checking sample size.
nrow(dat)
[1] 31996
Recode income with new variable ‘inclvl’.
# Midpoint of each income bracket; bracket 8 ($200,000 and above) is coded at its lower bound
dat <- dat %>%
  mutate(inclvl = case_when(income == 1 ~ 12500,
                            income == 2 ~ 30000,
                            income == 3 ~ 42500,
                            income == 4 ~ 62500,
                            income == 5 ~ 87500,
                            income == 6 ~ 125000,
                            income == 7 ~ 175000,
                            income == 8 ~ 200000))
tabyl(dat$inclvl)
dat$inclvl n percent
12500 7488 0.23402925
30000 4635 0.14486186
42500 4740 0.14814352
62500 5750 0.17970996
87500 3420 0.10688836
125000 3236 0.10113764
175000 1360 0.04250531
200000 1367 0.04272409
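The `case_when()` recode above can also be written as a lookup: because the income codes run from 1 to 8 in bracket order, they can index directly into a vector of bracket midpoints. A minimal sketch, assuming `income` holds only the valid codes 1 to 8 (as it does after the filter above); `midpoints` and the toy `income` vector are illustrative names, not part of the analysis.

```r
# Midpoint of each income bracket, in code order (1 to 8).
# Bracket 8 ($200,000 and above) is open-ended, so it is coded at its lower bound.
midpoints <- c(12500, 30000, 42500, 62500, 87500, 125000, 175000, 200000)

# Toy income codes standing in for dat$income
income <- c(1, 2, 8, 4, 1)

# Each code selects its bracket midpoint
inclvl <- midpoints[income]
print(inclvl)
```

With the real data this would be `dat$inclvl <- midpoints[dat$income]`; the lookup keeps the midpoints in one place if they need to be revised.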
Checking sample size.
nrow(dat)
[1] 31996
Histogram of income levels.
hist(dat$inclvl)
Summary statistics of rent amount.
summary(dat$trentamt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-99 725 1200 1297 1775 3500
How many people reported paying zero rent?
length(which(dat$trentamt == 0))
[1] 138
Removing negative values for rent (the survey's missing-value codes) but keeping zeros. The data dictionary does not list a separate NA code; valid responses should be values from 0 to 99999.
dat <- dat %>%
  filter(trentamt >= 0)
Checking sample size.
nrow(dat)
[1] 30074
Updated summary statistics of rent amount with negative amounts removed.
summary(dat$trentamt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 800 1216 1386 1800 3500
Boxplot of rent amount.
boxplot(dat$trentamt)
Histogram of rent amount.
hist(dat$trentamt)
Making a new variable for rent as a percent of income.
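# rentpct is monthly rent divided by monthly income (bracket midpoint / 12),
# stored as a proportion (0.30 = 30%). The 0.01 offset keeps zero rents slightly
# above zero; without it a zero rent would simply give rentpct = 0.
dat$rentpct <- (dat$trentamt + 0.01)/(dat$inclvl/12)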
Summary statistics of rent as a percent of income.
summary(dat$rentpct)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000001 0.192001 0.283201 0.413501 0.480010 3.360010
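Because the income brackets are annual, the calculation divides the bracket midpoint by 12, so `rentpct` is a proportion rather than a percent: 0.30 means rent equals 30% of monthly income. The sketch below uses made-up rents to show the arithmetic and how the >30% cost-burden threshold from the hypotheses could be flagged. `costburden` is a hypothetical variable name, not one created in this analysis.

```r
# Made-up example values
trentamt <- c(0, 700, 1200, 1500)         # monthly rent in dollars
inclvl   <- c(30000, 30000, 30000, 12500) # annual income (bracket midpoint)

# Rent as a proportion of monthly income, with the same 0.01 offset used above
rentpct <- (trentamt + 0.01) / (inclvl / 12)

# Hypothetical flag for rent above 30% of income
costburden <- as.integer(rentpct > 0.30)

print(round(rentpct, 3))
print(costburden)
```

Here a $700 rent on a $30,000 income is 700 / 2,500 = 0.28 (not burdened), while a $1,500 rent in the lowest bracket is about 1.44, i.e., rent exceeds the bracket-midpoint monthly income. Note that with the 0.01 offset a rent of exactly 30% would land just above 0.30.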
Boxplot of rent as a percent of income.
boxplot(dat$rentpct)
Histogram of rent as a percent of income.
hist(dat$rentpct)
Checking sample size.
nrow(dat)
[1] 30074
Checking distribution of rentcur.
tabyl(dat$rentcur)
dat$rentcur n percent
-99 39 0.001296801
1 26815 0.891633970
2 3220 0.107069229
Removing missing values for rentcur.
dat <- dat %>%
  filter(rentcur %in% 1:2)
Checking distribution of rentcur.
tabyl(dat$rentcur)
dat$rentcur n percent
1 26815 0.8927917
2 3220 0.1072083
Dichotomize rentcur. “Caught up on rent” = 1 and “Not caught up on rent” = 0
dat$rentcur <- car::recode(dat$rentcur, "1=1; else=0")
tabyl(dat$rentcur)
dat$rentcur n percent
0 3220 0.1072083
1 26815 0.8927917
Checking sample size.
nrow(dat)
[1] 30035
Checking distribution of evict.
tabyl(dat$evict)
dat$evict n percent
-99 12 0.0003995339
-88 26815 0.8927917430
1 519 0.0172798402
2 852 0.0283669053
3 920 0.0306309306
4 917 0.0305310471
This question is asked only if RENTCUR = 2 (not caught up on rent), which is why the 26,815 renters who are caught up are coded -88. Response codes: 1) Very likely 2) Somewhat likely 3) Not very likely 4) Not likely at all
Set this aside for now.
First, I add the four mental health items (anxious, worry, interest, and down, each coded 1 to 4) together to create a mental health index.
dat <- dat %>%
  mutate(mnhlth = anxious + worry + interest + down)
tabyl(dat$mnhlth)
dat$mnhlth n percent
-396 7 2.330614e-04
-295 5 1.664724e-04
-294 1 3.329449e-05
-293 2 6.658898e-05
-196 6 1.997669e-04
-195 2 6.658898e-05
-194 3 9.988347e-05
-96 32 1.065424e-03
-95 15 4.994173e-04
-94 12 3.995339e-04
-93 18 5.993008e-04
-92 6 1.997669e-04
-91 7 2.330614e-04
-90 9 2.996504e-04
-89 1 3.329449e-05
-88 4 1.331780e-04
-87 10 3.329449e-04
4 6607 2.199767e-01
5 2351 7.827535e-02
6 2882 9.595472e-02
7 2611 8.693191e-02
8 4007 1.334110e-01
9 1869 6.222740e-02
10 1613 5.370401e-02
11 1255 4.178458e-02
12 1572 5.233894e-02
13 1042 3.469286e-02
14 1013 3.372732e-02
15 850 2.830032e-02
16 2223 7.401365e-02
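Keep only sums from 4 to 16. Because every valid item is coded 1 to 4, any sum outside this range includes at least one missing (-99) response.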
dat <- dat %>%
  filter(mnhlth %in% 4:16)
tabyl(dat$mnhlth)
dat$mnhlth n percent
4 6607 0.22100686
5 2351 0.07864191
6 2882 0.09640408
7 2611 0.08733902
8 4007 0.13403579
9 1869 0.06251882
10 1613 0.05395551
11 1255 0.04198026
12 1572 0.05258404
13 1042 0.03485533
14 1013 0.03388527
15 850 0.02843285
16 2223 0.07436026
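Summing the items before filtering works because a single missing code pulls the sum below zero (with one -99 the sum is at most -99 + 3 × 4 = -87), so keeping sums from 4 to 16 is equivalent to keeping respondents who gave a valid answer (1 to 4) on all four items. A small check with made-up responses:

```r
# Made-up responses to the four items (anxious, worry, interest, down);
# -99 marks a question that was seen but not answered
items <- rbind(
  c(1, 1, 1, 1),     # sum 4, all valid
  c(4, 4, 4, 4),     # sum 16, all valid
  c(-99, 4, 4, 4),   # one missing item
  c(2, -99, 3, -99)  # two missing items
)

sums <- rowSums(items)

# Filter used above: keep sums from 4 to 16
keep_by_sum <- sums %in% 4:16

# Direct check: every item is a valid code from 1 to 4
keep_by_items <- apply(items >= 1 & items <= 4, 1, all)

print(sums)
print(identical(keep_by_sum, keep_by_items))
```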
Convert to the PHQ-4 scale (0 to 12) by subtracting 4 from each score, so that each item contributes 0 to 3.
dat$mnhlth <- dat$mnhlth - 4
tabyl(dat$mnhlth)
dat$mnhlth n percent
0 6607 0.22100686
1 2351 0.07864191
2 2882 0.09640408
3 2611 0.08733902
4 4007 0.13403579
5 1869 0.06251882
6 1613 0.05395551
7 1255 0.04198026
8 1572 0.05258404
9 1042 0.03485533
10 1013 0.03388527
11 850 0.02843285
12 2223 0.07436026
Histogram of mental health index for all respondents.
hist(dat$mnhlth)
Coding the mental health index to a variable “phq4”. There are four psychological distress levels: None, Mild, Moderate, and Severe. These could be dichotomized.
dat <- dat %>%
  mutate(phq4 = case_when(mnhlth %in% 0:2 ~ "1 none",
                          mnhlth %in% 3:5 ~ "2 mild",
                          mnhlth %in% 6:8 ~ "3 moderate",
                          mnhlth %in% 9:12 ~ "4 severe"))
tabyl(dat$phq4)
dat$phq4 n percent
1 none 11840 0.3960529
2 mild 8487 0.2838936
3 moderate 4440 0.1485198
4 severe 5128 0.1715337
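The same four levels can be produced with base R's `cut()`, which bins a numeric score into labeled, right-closed intervals; collapsing the top two levels gives the bad mental health dummy used next. A sketch using the cutoffs in this document (0 to 2, 3 to 5, 6 to 8, 9 to 12) on made-up scores:

```r
# PHQ-4 scores (0 to 12) for a few made-up respondents
score <- c(0, 2, 3, 5, 6, 8, 9, 12)

# Intervals are right-closed: (-Inf, 2], (2, 5], (5, 8], (8, 12]
phq4 <- cut(score,
            breaks = c(-Inf, 2, 5, 8, 12),
            labels = c("none", "mild", "moderate", "severe"))

# Bad mental health = moderate or severe (score of 6 or more)
badmnhlth <- as.integer(phq4 %in% c("moderate", "severe"))

print(table(phq4))
print(badmnhlth)
```

`cut()` returns an ordered-by-level factor, which keeps the levels in severity order in tables and plots without the "1 none", "2 mild" prefixes.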
Make a dummy variable for bad mental health. Bad mental health (PHQ-4 score 6 to 12, i.e., a raw sum of 10 to 16: the moderate and severe levels) is 1; good mental health (PHQ-4 score 0 to 5, raw sum 4 to 9) is 0. The threshold can be adjusted.
dat <- dat %>%
  mutate(badmnhlth = case_when(mnhlth <= 5 ~ 0,
                               mnhlth >= 6 ~ 1))
tabyl(dat$badmnhlth)
dat$badmnhlth n percent
0 20327 0.6799465
1 9568 0.3200535
A graph of income level and rent alone. Would this work better as a series of boxplots?
plot(dat$inclvl, dat$trentamt)
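On the question above: because `inclvl` takes only eight values, the scatterplot stacks points in vertical strips, and a series of boxplots summarizes each strip more legibly. A sketch on simulated data; the `toy` data frame and its uniform rents are made up, and with the real data `toy` would be replaced by `dat`.

```r
set.seed(1)

# Simulated stand-in for dat: 8 income midpoints, 50 renters each
levels_inc <- c(12500, 30000, 42500, 62500, 87500, 125000, 175000, 200000)
toy <- data.frame(inclvl = rep(levels_inc, each = 50))
toy$trentamt <- round(runif(nrow(toy), 300, 3500))

# One box per income level; the formula groups trentamt by inclvl
b <- boxplot(trentamt ~ inclvl, data = toy,
             xlab = "Income level (bracket midpoint)",
             ylab = "Monthly rent ($)")
```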
Visualization of bad mental health by rent alone.
cdplot(as.factor(badmnhlth) ~ trentamt, data=dat)
Visualization of bad mental health by rent alone, faceted by income levels.
ggplot(dat, aes(trentamt, after_stat(count), fill = factor(badmnhlth))) +
  geom_density(position = "fill") +
  labs(fill = "Bad mental health") +
  facet_wrap(~ inclvl)   # one panel per income level (column inclvl in dat)
A graph of income level and rent as a percentage of income.
plot(dat$inclvl, dat$rentpct)
Visualization of bad mental health by rent as a percent of income.
cdplot(as.factor(badmnhlth) ~ rentpct, data=dat)
Visualization of bad mental health by rent as a percent of income, faceted by income levels.
ggplot(dat, aes(rentpct, after_stat(count), fill = factor(badmnhlth))) +
  geom_density(position = "fill") +
  labs(fill = "Bad mental health") +
  facet_wrap(~ inclvl)   # one panel per income level (column inclvl in dat)
Visualization of bad mental health by rent for all income levels.
cdplot(as.factor(badmnhlth) ~ trentamt, data=dat)
Visualization of bad mental health by rent for the two lowest income levels. Making a data frame “temp” with just the two lowest income levels.
temp <- dat %>%
  filter(income %in% 1:2)
tabyl(temp$income)
temp$income n percent
1 6844 0.6144178
2 4295 0.3855822
Visualization of bad mental health by rent for two lowest income levels.
cdplot(as.factor(badmnhlth) ~ trentamt, data=temp)
Visualization of bad mental health by rent as a percent of income for two lowest income levels.
cdplot(as.factor(badmnhlth) ~ rentpct, data=temp)
Table of the mental health index (PHQ-4 score, 0 to 12) by rentcur (0 = not caught up on rent, 1 = caught up).
tabyl(dat, mnhlth, rentcur)
mnhlth 0 1
0 391 6216
1 129 2222
2 195 2687
3 197 2414
4 392 3615
5 202 1667
6 189 1424
7 173 1082
8 255 1317
9 177 865
10 184 829
11 168 682
12 544 1679
Table of phq4 by rentcur.
tabyl(dat, phq4, rentcur)
phq4 0 1
1 none 715 11125
2 mild 791 7696
3 moderate 617 3823
4 severe 1073 4055
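The two rent-status groups in the table above differ greatly in size (3,196 not caught up vs. 26,699 caught up), so raw counts are hard to compare; column percentages put them on the same footing. A sketch that re-enters the printed counts into a base R matrix; applied to the data frame itself, `janitor::adorn_percentages("col")` on the `tabyl()` result does the same job.

```r
# Counts from the phq4-by-rentcur table above
# (columns: 0 = not caught up on rent, 1 = caught up)
counts <- matrix(c(715,  11125,
                   791,   7696,
                   617,   3823,
                   1073,  4055),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(phq4 = c("none", "mild", "moderate", "severe"),
                                 rentcur = c("0", "1")))

# Column percentages: distribution of distress within each rent status
col_pct <- prop.table(counts, margin = 2)
print(round(100 * col_pct, 1))

# Share with bad mental health (moderate or severe) by rent status
bad_share <- colSums(col_pct[c("moderate", "severe"), ])
print(round(100 * bad_share, 1))
```

On these counts, about 52.9% of renters who are not caught up report moderate or severe distress, compared with about 29.5% of those who are caught up; this is descriptive and unadjusted for income or other controls.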
Checking sample size.
nrow(dat)
[1] 29895
Checking the distribution of grace_period.
tabyl(dat$grace_period)
dat$grace_period n percent
0 2471 0.082655963
1 220 0.007359090
3 8217 0.274862017
5 3771 0.126141495
6 253 0.008462954
7 3317 0.110955009
10 5480 0.183308246
12 511 0.017093159
14 4730 0.158220438
20 308 0.010302726
30 617 0.020638903
Histogram of grace period.
hist(dat$grace_period)
Grace period by jurisdiction.
tabyl(dat, Jurisdictions, grace_period)
Jurisdictions 0 1 3 5 6 7 10 12 14 20 30
Alabama 0 0 0 0 0 320 0 0 0 0 0
Alaska 0 0 0 0 0 337 0 0 0 0 0
Arizona 0 0 0 771 0 0 0 0 0 0 0
Arkansas 0 0 317 0 0 0 0 0 0 0 0
California 0 0 3139 0 0 0 0 0 0 0 0
Colorado 0 0 0 0 0 0 746 0 0 0 0
Connecticut 0 0 0 0 0 0 0 511 0 0 0
Delaware 0 0 0 220 0 0 0 0 0 0 0
District of Columbia 0 0 0 0 0 0 0 0 0 0 617
Florida 0 0 982 0 0 0 0 0 0 0 0
Georgia 713 0 0 0 0 0 0 0 0 0 0
Hawaii 0 0 0 465 0 0 0 0 0 0 0
Idaho 0 0 424 0 0 0 0 0 0 0 0
Illinois 0 0 0 648 0 0 0 0 0 0 0
Indiana 0 0 0 0 0 0 459 0 0 0 0
Iowa 0 0 386 0 0 0 0 0 0 0 0
Kansas 0 0 537 0 0 0 0 0 0 0 0
Kentucky 0 0 0 0 0 390 0 0 0 0 0
Louisiana 0 0 0 321 0 0 0 0 0 0 0
Maine 0 0 0 0 0 251 0 0 0 0 0
Maryland 559 0 0 0 0 0 0 0 0 0 0
Massachusetts 0 0 0 0 0 0 0 0 973 0 0
Michigan 0 0 0 0 0 594 0 0 0 0 0
Minnesota 534 0 0 0 0 0 0 0 0 0 0
Mississippi 0 220 0 0 0 0 0 0 0 0 0
Missouri 0 0 0 0 0 0 512 0 0 0 0
Montana 0 0 287 0 0 0 0 0 0 0 0
Nebraska 0 0 0 0 0 418 0 0 0 0 0
Nevada 0 0 0 0 0 551 0 0 0 0 0
New Hampshire 0 0 0 0 0 456 0 0 0 0 0
New Jersey 484 0 0 0 0 0 0 0 0 0 0
New Mexico 0 0 482 0 0 0 0 0 0 0 0
New York 0 0 0 0 0 0 0 0 819 0 0
North Carolina 0 0 0 0 0 0 508 0 0 0 0
North Dakota 0 0 278 0 0 0 0 0 0 0 0
Ohio 0 0 491 0 0 0 0 0 0 0 0
Oklahoma 0 0 0 462 0 0 0 0 0 0 0
Oregon 0 0 0 0 0 0 988 0 0 0 0
Pennsylvania 0 0 0 0 0 0 733 0 0 0 0
Rhode Island 0 0 0 0 0 0 0 0 0 308 0
South Carolina 0 0 0 355 0 0 0 0 0 0 0
South Dakota 0 0 0 0 253 0 0 0 0 0 0
Tennessee 0 0 0 0 0 0 0 0 505 0 0
Texas 0 0 0 0 0 0 1534 0 0 0 0
Utah 0 0 683 0 0 0 0 0 0 0 0
Vermont 0 0 0 0 0 0 0 0 304 0 0
Virginia 0 0 0 0 0 0 0 0 721 0 0
Washington 0 0 0 0 0 0 0 0 1408 0 0
West Virginia 181 0 0 0 0 0 0 0 0 0 0
Wisconsin 0 0 0 529 0 0 0 0 0 0 0
Wyoming 0 0 211 0 0 0 0 0 0 0 0
How many observations do we have by state?
tabyl(dat$Jurisdictions)
dat$Jurisdictions n percent
Alabama 320 0.010704131
Alaska 337 0.011272788
Arizona 771 0.025790266
Arkansas 317 0.010603780
California 3139 0.105000836
Colorado 746 0.024954006
Connecticut 511 0.017093159
Delaware 220 0.007359090
District of Columbia 617 0.020638903
Florida 982 0.032848302
Georgia 713 0.023850142
Hawaii 465 0.015554441
Idaho 424 0.014182974
Illinois 648 0.021675866
Indiana 459 0.015353738
Iowa 386 0.012911858
Kansas 537 0.017962870
Kentucky 390 0.013045660
Louisiana 321 0.010737582
Maine 251 0.008396053
Maryland 559 0.018698779
Massachusetts 973 0.032547249
Michigan 594 0.019869543
Minnesota 534 0.017862519
Mississippi 220 0.007359090
Missouri 512 0.017126610
Montana 287 0.009600268
Nebraska 418 0.013982271
Nevada 551 0.018431176
New Hampshire 456 0.015253387
New Jersey 484 0.016189998
New Mexico 482 0.016123098
New York 819 0.027395886
North Carolina 508 0.016992808
North Dakota 278 0.009299214
Ohio 491 0.016424151
Oklahoma 462 0.015454089
Oregon 988 0.033049005
Pennsylvania 733 0.024519150
Rhode Island 308 0.010302726
South Carolina 355 0.011874895
South Dakota 253 0.008462954
Tennessee 505 0.016892457
Texas 1534 0.051312929
Utah 683 0.022846630
Vermont 304 0.010168925
Virginia 721 0.024117745
Washington 1408 0.047098177
West Virginia 181 0.006054524
Wisconsin 529 0.017695267
Wyoming 211 0.007058036
Let’s make dummy variables for income. The dummy variables are inclvl1 to inclvl8.
tabyl(dat$inclvl)
dat$inclvl n percent
12500 6844 0.22893460
30000 4295 0.14366951
42500 4442 0.14858672
62500 5413 0.18106707
87500 3235 0.10821208
125000 3066 0.10255896
175000 1295 0.04331828
200000 1305 0.04365278
dat$inclvl1 <- car::recode(dat$inclvl, "12500=1; else=0")
tabyl(dat$inclvl1)
dat$inclvl1 n percent
0 23051 0.7710654
1 6844 0.2289346
dat$inclvl2 <- car::recode(dat$inclvl, "30000=1; else=0")
tabyl(dat$inclvl2)
dat$inclvl2 n percent
0 25600 0.8563305
1 4295 0.1436695
dat$inclvl3 <- car::recode(dat$inclvl, "42500=1; else=0")
tabyl(dat$inclvl3)
dat$inclvl3 n percent
0 25453 0.8514133
1 4442 0.1485867
dat$inclvl4 <- car::recode(dat$inclvl, "62500=1; else=0")
tabyl(dat$inclvl4)
dat$inclvl4 n percent
0 24482 0.8189329
1 5413 0.1810671
dat$inclvl5 <- car::recode(dat$inclvl, "87500=1; else=0")
tabyl(dat$inclvl5)
dat$inclvl5 n percent
0 26660 0.8917879
1 3235 0.1082121
dat$inclvl6 <- car::recode(dat$inclvl, "125000=1; else=0")
tabyl(dat$inclvl6)
dat$inclvl6 n percent
0 26829 0.897441
1 3066 0.102559
dat$inclvl7 <- car::recode(dat$inclvl, "175000=1; else=0")
tabyl(dat$inclvl7)
dat$inclvl7 n percent
0 28600 0.95668172
1 1295 0.04331828
dat$inclvl8 <- car::recode(dat$inclvl, "200000=1; else=0")
tabyl(dat$inclvl8)
dat$inclvl8 n percent
0 28590 0.95634722
1 1305 0.04365278
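The eight recode calls above follow one pattern, so they can be generated in a loop. A sketch, assuming each income code k (1 to 8) corresponds to one `inclvl` midpoint, so testing `income == k` is equivalent to testing the midpoint; the `toy` data frame is made up.

```r
# Toy income codes standing in for dat$income (valid codes 1 to 8)
toy <- data.frame(income = c(1, 2, 2, 5, 8, 3))

# Create inclvl1 ... inclvl8: 1 if the respondent is in bracket k, else 0
for (k in 1:8) {
  toy[[paste0("inclvl", k)]] <- as.integer(toy$income == k)
}

print(toy)
```

Every row has exactly one dummy equal to 1, so a regression with an intercept can include at most seven of the eight; one bracket has to be left out as the reference category.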
Let’s visualize birth year with a histogram.
hist(dat$tbirth_year)
Let’s use birth year to create a new variable for age.
# Approximate age in 2022; one year too high for respondents whose birthday had not yet passed
dat$age <- 2022 - dat$tbirth_year
tabyl(dat$age)
dat$age n percent
18 21 0.0007024586
19 48 0.0016056197
20 77 0.0025756816
21 163 0.0054524168
22 218 0.0072921893
23 370 0.0123766516
24 489 0.0163572504
25 616 0.0206054524
26 708 0.0236828901
27 736 0.0246195016
28 768 0.0256899147
29 780 0.0260913196
30 803 0.0268606790
31 803 0.0268606790
32 817 0.0273289848
33 752 0.0251547081
34 783 0.0261916708
35 725 0.0242515471
36 720 0.0240842950
37 712 0.0238166918
38 709 0.0237163405
39 744 0.0248871049
40 651 0.0217762168
41 644 0.0215420639
42 614 0.0205385516
43 636 0.0212744606
44 610 0.0204047500
45 545 0.0182304733
46 510 0.0170597090
47 497 0.0166248537
48 500 0.0167252049
49 481 0.0160896471
50 506 0.0169259073
51 535 0.0178959692
52 562 0.0187991303
53 512 0.0171266098
54 498 0.0166583041
55 410 0.0137146680
56 453 0.0151530356
57 435 0.0145509282
58 468 0.0156547918
59 417 0.0139488209
60 467 0.0156213414
61 459 0.0153537381
62 459 0.0153537381
63 449 0.0150192340
64 399 0.0133467135
65 413 0.0138150192
66 401 0.0134136143
67 367 0.0122763004
68 393 0.0131460110
69 357 0.0119417963
70 335 0.0112058873
71 269 0.0089981602
72 304 0.0101689246
73 265 0.0088643586
74 237 0.0079277471
75 235 0.0078608463
76 185 0.0061883258
77 126 0.0042147516
78 106 0.0035457434
79 117 0.0039136979
80 88 0.0029436361
81 90 0.0030105369
82 55 0.0018397725
83 54 0.0018063221
84 44 0.0014718180
85 32 0.0010704131
86 30 0.0010035123
87 22 0.0007359090
88 91 0.0030439873
hist(dat$age)
Clean up gender identity. Check the codes in the data dictionary. I need to do dummy variables. What do I do about -99? How many categories should I do?
tabyl(dat$genid_describe)
dat$genid_describe n percent
-99 139 0.004649607
1 10680 0.357250376
2 18280 0.611473491
3 271 0.009065061
4 525 0.017561465
dat$male <- car::recode(dat$genid_describe, "1=1; else=0") #gen1 = male
tabyl(dat$male)
dat$male n percent
0 19215 0.6427496
1 10680 0.3572504
dat$female <- car::recode(dat$genid_describe, "2=1; else=0") #gen2 = female
tabyl(dat$female)
dat$female n percent
0 11615 0.3885265
1 18280 0.6114735
dat$transgender <- car::recode(dat$genid_describe, "3=1; else=0") #gen3 = transgender
tabyl(dat$transgender)
dat$transgender n percent
0 29624 0.990934939
1 271 0.009065061
dat$gen_none <- car::recode(dat$genid_describe, "4=1; else=0") #gen4 = None of these
tabyl(dat$gen_none)
dat$gen_none n percent
0 29370 0.98243853
1 525 0.01756147
dat$gen_notsel <- car::recode(dat$genid_describe, "-99=1; else=0") #gen5 = not selected
tabyl(dat$gen_notsel)
dat$gen_notsel n percent
0 29756 0.995350393
1 139 0.004649607
Combine race/ethnicity into a race_eth variable.
tabyl(dat$rrace)
dat$rrace n percent
1 22450 0.75096170
2 3715 0.12426827
3 1559 0.05214919
4 2171 0.07262084
tabyl(dat$rhispanic)
dat$rhispanic n percent
1 26360 0.8817528
2 3535 0.1182472
# rhispanic: 1 = not Hispanic, 2 = Hispanic
dat <- dat %>%
  mutate(race_eth = case_when(rhispanic == 1 & rrace == 1 ~ "nh_white",
                              rhispanic == 1 & rrace == 2 ~ "nh_black",
                              rhispanic == 1 & rrace == 3 ~ "nh_asian",
                              rhispanic == 1 & rrace == 4 ~ "other",
                              rhispanic == 2 & rrace %in% 1:4 ~ "hispanic"))
tabyl(dat$race_eth)
dat$race_eth n percent
hispanic 3535 0.11824720
nh_asian 1480 0.04950661
nh_black 3500 0.11707643
nh_white 19707 0.65920723
other 1673 0.05596254
Dichotomizing race and ethnicity.
dat$hispanic <- car::recode(dat$race_eth, "'hispanic'=1; else=0")
tabyl(dat$hispanic)
dat$hispanic n percent
0 26360 0.8817528
1 3535 0.1182472
dat$nh_white <- car::recode(dat$race_eth, "'nh_white'=1; else=0")
tabyl(dat$nh_white)
dat$nh_white n percent
0 10188 0.3407928
1 19707 0.6592072
dat$nh_black <- car::recode(dat$race_eth, "'nh_black'=1; else=0")
tabyl(dat$nh_black)
dat$nh_black n percent
0 26395 0.8829236
1 3500 0.1170764
dat$nh_asian <- car::recode(dat$race_eth, "'nh_asian'=1; else=0")
tabyl(dat$nh_asian)
dat$nh_asian n percent
0 28415 0.95049339
1 1480 0.04950661
dat$other <- car::recode(dat$race_eth, "'other'=1; else=0")
tabyl(dat$other)
dat$other n percent
0 28222 0.94403746
1 1673 0.05596254
For educational attainment, collapse EEDUC into four categories: “less than high school”, “high school”, “some college”, and “bachelor’s or higher”. These could be collapsed further into a single dichotomy.
tabyl(dat$eeduc)
dat$eeduc n percent
1 291 0.009734069
2 628 0.021006857
3 4308 0.144104365
4 7311 0.244555946
5 3149 0.105335340
6 8107 0.271182472
7 6101 0.204080950
Dichotomizing educational attainment.
dat$lessthanhs <- car::recode(dat$eeduc, "1=1; 2=1; else=0")
tabyl(dat$lessthanhs)
dat$lessthanhs n percent
0 28976 0.96925907
1 919 0.03074093
dat$highschool <- car::recode(dat$eeduc, "3=1; else=0")
tabyl(dat$highschool)
dat$highschool n percent
0 25587 0.8558956
1 4308 0.1441044
dat$somecollege <- car::recode(dat$eeduc, "4=1; 5=1; else=0")
tabyl(dat$somecollege)
dat$somecollege n percent
0 19435 0.6501087
1 10460 0.3498913
dat$bachelors <- car::recode(dat$eeduc, "6=1; 7=1; else=0")
tabyl(dat$bachelors)
dat$bachelors n percent
0 15687 0.5247366
1 14208 0.4752634
Distribution of number of people in household. I’m leaving this as an integer for now.
tabyl(dat$thhld_numper)
dat$thhld_numper n percent
1 9703 0.324569326
2 10030 0.335507610
3 4470 0.149523332
4 3097 0.103595919
5 1423 0.047599933
6 632 0.021140659
7 283 0.009466466
8 109 0.003646095
9 47 0.001572169
10 101 0.003378491
Distribution of number of children in household.
tabyl(dat$thhld_numkid)
dat$thhld_numkid n percent
0 21396 0.715704967
1 4234 0.141629035
2 2565 0.085800301
3 1054 0.035256732
4 393 0.013146011
5 253 0.008462954
Dichotomizing presence of children. Children in household is coded as “1”, no children is “0”.
dat$children <- car::recode(dat$thhld_numkid, "0=0; else=1")
tabyl(dat$children)
dat$children n percent
0 21396 0.715705
1 8499 0.284295
Dichotomizing number of adults in household. Single adult is coded as “1”, multiple adults is coded as “0”.
tabyl(dat$thhld_numadlt)
dat$thhld_numadlt n percent
1 11834 0.3958521492
2 13271 0.4439203880
3 3111 0.1040642248
4 1101 0.0368289012
5 382 0.0127780565
6 110 0.0036795451
7 33 0.0011038635
8 17 0.0005686570
9 19 0.0006355578
10 17 0.0005686570
dat$single_adult <- car::recode(dat$thhld_numadlt, "1=1; else=0")
tabyl(dat$single_adult)
dat$single_adult n percent
0 18061 0.6041479
1 11834 0.3958521
Let’s review my hypotheses:
Comments: I’m going to keep the lowest two income brackets (see Schuetz for the justification). My options for the DV are three codings of mental health: 1) the PHQ-4 score, an ordinal variable from 0 to 12; 2) the four PHQ-4 distress levels (none, mild, moderate, severe); or 3) a dichotomous variable for bad mental health, split down the middle (it could be split another way, such as severe vs. all other levels).
Statistical method: Population: limited to the bottom two income categories. IV: Rent amount, an integer. DV: Mental health, but which variable?
Statistical method: Limit to the lowest two income brackets. IV: Grace period by state. DV: Mental health, but which variable?
Statistical method: IV: Current on rent, which is dichotomous. DV: Mental health, but which variable?
Statistical method: Population: only renters who are late on rent. IV: Grace period by state. DV: Mental health, but which variable?
Hypothesis 1:
DV: Bad mental health
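Bad mental health (raw sum 10 to 16, i.e., PHQ-4 score 6 to 12) is 1; good mental health (raw sum 4 to 9, PHQ-4 score 0 to 5) is 0.
IV: Rent as a percent of monthly income, based on the midpoint of the income bracket.
Rentpct is a continuous variable defined as monthly rent divided by monthly income (the income bracket midpoint divided by 12), stored as a proportion.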
IV:
Control variables:
- age: Continuous variable. Not dichotomized.
- thhld_numper: Number of people in household. Continuous variable. Not dichotomized.
- children: Dichotomous variable. No children = 0, children = 1.
- single_adult: Dichotomous variable. More than one adult = 0, single adult = 1.
- race_eth: Categorical variable. The dummy variables are ‘hispanic’, ‘nh_white’, ‘nh_black’, ‘nh_asian’, and ‘other’.
- eeduc: Educational attainment, a categorical variable coded as a number. The dummy variables are ‘lessthanhs’, ‘highschool’, ‘somecollege’, and ‘bachelors’.
tabyl(dat$income)
dat$income n percent
1 6844 0.22893460
2 4295 0.14366951
3 4442 0.14858672
4 5413 0.18106707
5 3235 0.10821208
6 3066 0.10255896
7 1295 0.04331828
8 1305 0.04365278
nrow(dat)
[1] 29895
dat2 <- dat %>%
  filter(income %in% 1:2)
tabyl(dat2$income)
dat2$income n percent
1 6844 0.6144178
2 4295 0.3855822
nrow(dat2)
[1] 11139
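The instructions ask how the sample shrinks with each restriction. Collecting the `nrow()` results printed above into one table makes this easy to report. A sketch that simply re-enters those counts (no restriction after the mental health filter removes rows until the income subsample is drawn):

```r
# Sample sizes printed by nrow() after each step above
steps <- c("All respondents, weeks 46 to 48",
           "Renters (tenure == 3)",
           "Valid income (codes 1 to 8)",
           "Valid rent amount (trentamt >= 0)",
           "Valid rentcur (1 or 2)",
           "Valid mental health items (sum 4 to 16)",
           "Lowest two income brackets")
n <- c(167931, 34032, 31996, 30074, 30035, 29895, 11139)

# Rows dropped at each step (none for the first row)
flow <- data.frame(step = steps,
                   n = n,
                   dropped = c(NA, -diff(n)))
print(flow, row.names = FALSE)
```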
This is where you will describe the justification for the models that you are estimating and present the results from them using tables and figures. In discussing your results, you not only must refer to statistical significance, but also the magnitude of effects. Also, be sure to address model fit based on different specifications of your independent variables or inclusion of different sets of variables.
DV: Bad mental health. Bad mental health (raw sum 10 to 16, i.e., PHQ-4 score 6 to 12) is 1; good mental health (raw sum 4 to 9, PHQ-4 score 0 to 5) is 0.
IV: Rentpct is a continuous variable defined as rent as a percentage of the midpoint of the income level.
Control variables:
Visualization of bad mental health by rent percent for two lowest income levels.
cdplot(as.factor(badmnhlth) ~ rentpct, data=dat2)
#Logit models of bad mental health with rent percent
model <- glm(badmnhlth ~ rentpct, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.53481 0.03272 -16.345 < 2e-16 ***
rentpct 0.18485 0.03801 4.863 1.16e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14966 on 11137 degrees of freedom
AIC: 14970
Number of Fisher Scoring iterations: 4
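# Exponentiate the coefficients to get odds ratios.
# Note: summary() still uses the log-odds standard errors, so the z values and
# p-values it prints for modelExp are not valid; read only the Estimate column.
modelExp <- model
modelExp$coefficients <- exp(modelExp$coefficients)
summary(modelExp)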
Call:
glm(formula = badmnhlth ~ rentpct, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.58578 0.03272 17.90 <2e-16 ***
rentpct 1.20304 0.03801 31.65 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14966 on 11137 degrees of freedom
AIC: 14970
Number of Fisher Scoring iterations: 4
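Because `summary()` divides the exponentiated estimates by standard errors that are still on the log-odds scale, only the Estimate column of the second summary (the odds ratios) is meaningful. A common approach is to exponentiate the estimate and the endpoints of its Wald confidence interval. The sketch below copies the estimate and standard error from the first summary; with the fitted model, `exp(cbind(OR = coef(model), confint.default(model)))` should give the same numbers for every coefficient.

```r
# Log-odds estimate and standard error for rentpct (first logit model above)
b  <- 0.18485
se <- 0.03801

# Odds ratio and 95% Wald confidence interval
z     <- qnorm(0.975)
or    <- exp(b)
or_ci <- exp(b + c(-1, 1) * z * se)

# A 1-unit change in rentpct means rent rising by 100% of monthly income,
# so a 0.10 change (10 percentage points) is easier to interpret
or_10pt <- exp(0.10 * b)

print(round(c(OR = or, lower = or_ci[1], upper = or_ci[2]), 3))
print(round(or_10pt, 4))
```

In this bivariate model, the odds ratio is about 1.20 (95% CI roughly 1.12 to 1.30), so each 0.10 increase in rentpct is associated with about 1.9% higher odds of bad mental health.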
Creating a variable for rent percent squared.
dat2$rentpctSQ <- dat2$rentpct * dat2$rentpct
Visualization of bad mental health by rent percent squared for two lowest income levels.
cdplot(as.factor(badmnhlth) ~ rentpctSQ, data=dat2)
#Logit models of bad mental health with rentpctSQ
model <- glm(badmnhlth ~ rentpct + rentpctSQ, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + rentpctSQ, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.64589 0.04843 -13.336 < 2e-16 ***
rentpct 0.47934 0.10174 4.712 2.46e-06 ***
rentpctSQ -0.12616 0.04054 -3.112 0.00186 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14956 on 11136 degrees of freedom
AIC: 14962
Number of Fisher Scoring iterations: 4
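With a squared term, the effect of `rentpct` on the log-odds is no longer constant: the fitted curve $\beta_1 x + \beta_2 x^2$ rises while $x$ is small and peaks at $x = -\beta_1 / (2\beta_2)$ when $\beta_2 < 0$. The sketch plugs in the estimates above and also compares the linear and quadratic models with a likelihood ratio test built from the printed residual deviances. The deviances are rounded, so the p-value is approximate; `anova(linear_model, quadratic_model, test = "Chisq")` on the stored fits (hypothetical object names, since both fits are assigned to `model` above) would give the exact value.

```r
# Estimates from the quadratic logit model above
b1 <- 0.47934   # rentpct
b2 <- -0.12616  # rentpctSQ

# Vertex of b1*x + b2*x^2: the rentpct value where predicted log-odds peak
turning_point <- -b1 / (2 * b2)
print(round(turning_point, 2))

# Likelihood ratio test, quadratic vs. linear model (one extra parameter),
# using the rounded residual deviances printed above
lr_stat <- 14966 - 14956
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(round(p_value, 4))
```

The peak near 1.90 (rent almost twice the bracket-midpoint monthly income) lies well above the subsample mean of 0.686 reported by emmeans later in this section, so over typical values the predicted risk still rises with rentpct, only more slowly at higher values. AIC also favors the quadratic specification (14962 vs. 14970).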
Let’s add age to the model.
model <- glm(badmnhlth ~ rentpct + age, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + age, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.366286 0.067388 5.436 5.46e-08 ***
rentpct 0.144519 0.038523 3.752 0.000176 ***
age -0.018122 0.001195 -15.162 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14731 on 11136 degrees of freedom
AIC: 14737
Number of Fisher Scoring iterations: 4
# EMMs: predicted probability at the mean rentpct (estimated marginal mean)
emmeans(model, specs = "rentpct",
regrid = "response")
rentpct prob SE df asymp.LCL asymp.UCL
0.686 0.397 0.0047 Inf 0.388 0.406
Confidence level used: 0.95
I’m not sure how to interpret marginal effects for a continuous variable like rentpct. Strictly, emmeans() here reports a predicted probability rather than a marginal effect: at the mean rent percent (0.686) and the mean age, the predicted probability of bad mental health is 39.7% (95% CI: 38.8% to 40.6%).
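For a continuous predictor in a logit model, the marginal effect is the slope of the predicted probability, $\partial p / \partial x = \beta \, p (1 - p)$, here evaluated at the means (the marginal effect at the means). A worked sketch using the coefficient on rentpct and the emmeans probability printed above; `emmeans::emtrends()` can estimate the same kind of slope with a standard error directly from the fitted model.

```r
# Coefficient on rentpct from the model with age, and the predicted
# probability at the means reported by emmeans
beta <- 0.144519
p    <- 0.397

# Marginal effect at the means for a logit: dp/dx = beta * p * (1 - p)
me <- beta * p * (1 - p)

# Approximate effect of a 0.10 (10 percentage point) increase in rentpct
me_10pt <- 0.10 * me

print(round(me, 4))
print(round(me_10pt, 5))
```

At the means, a 10 percentage point increase in rent relative to income is associated with roughly a 0.35 percentage point higher probability of bad mental health.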
Let’s add number of people in the household.
model <- glm(badmnhlth ~ rentpct + age + thhld_numper, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + age + thhld_numper, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.391931 0.079607 4.923 8.51e-07 ***
rentpct 0.146236 0.038630 3.786 0.000153 ***
age -0.018304 0.001233 -14.849 < 2e-16 ***
thhld_numper -0.007656 0.012648 -0.605 0.544961
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14730 on 11135 degrees of freedom
AIC: 14738
Number of Fisher Scoring iterations: 4
# EMMs: predicted probability at the mean rentpct (estimated marginal mean)
emmeans(model, specs = "rentpct",
regrid = "response")
rentpct prob SE df asymp.LCL asymp.UCL
0.686 0.397 0.0047 Inf 0.388 0.406
Confidence level used: 0.95
This gives the same result as the previous emmeans call. At the mean rent percent, mean age, and mean number of people in the household, the predicted probability of bad mental health is 39.7% (95% CI: 38.8% to 40.6%).
Let’s add presence of children in the household.
model <- glm(badmnhlth ~ rentpct + age + thhld_numper + children, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + age + thhld_numper + children,
family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.392143 0.079568 4.928 8.29e-07 ***
rentpct 0.142838 0.038689 3.692 0.000223 ***
age -0.018521 0.001239 -14.951 < 2e-16 ***
thhld_numper 0.010579 0.016386 0.646 0.518522
children -0.098339 0.056450 -1.742 0.081500 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14727 on 11134 degrees of freedom
AIC: 14737
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentpct",
regrid = "response")
rentpct prob SE df asymp.LCL asymp.UCL
0.686 0.393 0.00533 Inf 0.382 0.403
Results are averaged over the levels of: children
Confidence level used: 0.95
Holding age and household size at their means and averaging over households with and without children, a person at the mean rent percent has a predicted 39.3% probability of bad mental health (95% CI: 38.2% to 40.3%).
Let’s add single vs. multiple adults in the household.
model <- glm(badmnhlth ~ rentpct + age + thhld_numper + children + single_adult, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + age + thhld_numper + children +
single_adult, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.535043 0.091187 5.868 4.42e-09 ***
rentpct 0.138281 0.038733 3.570 0.000357 ***
age -0.018032 0.001248 -14.444 < 2e-16 ***
thhld_numper -0.032243 0.021157 -1.524 0.127506
children -0.031048 0.060174 -0.516 0.605872
single_adult -0.169164 0.052113 -3.246 0.001170 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14717 on 11133 degrees of freedom
AIC: 14729
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentpct",
regrid = "response")
rentpct prob SE df asymp.LCL asymp.UCL
0.686 0.396 0.00546 Inf 0.385 0.406
Results are averaged over the levels of: children, single_adult
Confidence level used: 0.95
Holding age and household size at their means and averaging over the levels of children and single_adult, a person at the mean rent percent has a predicted 39.6% probability of bad mental health (95% CI: 38.5% to 40.6%).
Let’s look at Gender using male as the reference group.
model <- glm(badmnhlth ~ rentpct + female + transgender + gen_none + gen_notsel, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + female + transgender + gen_none +
gen_notsel, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.745826 0.046977 -15.876 < 2e-16 ***
rentpct 0.186073 0.038224 4.868 1.13e-06 ***
female 0.253796 0.044679 5.680 1.34e-08 ***
transgender 1.318868 0.196828 6.701 2.08e-11 ***
gen_none 0.835824 0.132018 6.331 2.43e-10 ***
gen_notsel 0.008629 0.242967 0.036 0.972
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14870 on 11133 degrees of freedom
AIC: 14882
Number of Fisher Scoring iterations: 4
Let’s just look at transgender.
model <- glm(badmnhlth ~ rentpct + transgender, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + transgender, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.54429 0.03282 -16.586 < 2e-16 ***
rentpct 0.18062 0.03810 4.741 2.13e-06 ***
transgender 1.12151 0.19408 5.779 7.53e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14930 on 11136 degrees of freedom
AIC: 14936
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "transgender",
regrid = "response")
transgender prob SE df asymp.LCL asymp.UCL
0 0.396 0.00467 Inf 0.387 0.406
1 0.668 0.04280 Inf 0.585 0.752
Confidence level used: 0.95
At the mean rent percent, a transgender person has a predicted 66.8% probability of bad mental health (95% CI: 58.5% to 75.2%), compared with 39.6% for a non-transgender person (95% CI: 38.7% to 40.6%).
Now let’s look at Race and Ethnicity using nh_white as the reference group.
model <- glm(badmnhlth ~ rentpct + hispanic + nh_black + nh_asian + other, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + hispanic + nh_black + nh_asian +
other, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.46343 0.03559 -13.021 < 2e-16 ***
rentpct 0.20361 0.03847 5.292 1.21e-07 ***
hispanic -0.15617 0.05777 -2.703 0.00687 **
nh_black -0.36654 0.05564 -6.588 4.46e-11 ***
nh_asian -0.63859 0.13078 -4.883 1.05e-06 ***
other 0.16699 0.07693 2.171 0.02996 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14888 on 11133 degrees of freedom
AIC: 14900
Number of Fisher Scoring iterations: 4
Now let’s just look at nh_black.
model <- glm(badmnhlth ~ rentpct + nh_black, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + nh_black, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.47791 0.03393 -14.086 < 2e-16 ***
rentpct 0.17923 0.03809 4.706 2.53e-06 ***
nh_black -0.33572 0.05407 -6.210 5.31e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14927 on 11136 degrees of freedom
AIC: 14933
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "nh_black",
regrid = "response")
nh_black prob SE df asymp.LCL asymp.UCL
0 0.412 0.0051 Inf 0.402 0.422
1 0.334 0.0111 Inf 0.312 0.356
Confidence level used: 0.95
At the mean rent percent, a non-Hispanic Black person has a predicted 33.4% probability of bad mental health (95% CI: 31.2% to 35.6%), compared with 41.2% for those who are not non-Hispanic Black (95% CI: 40.2% to 42.2%).
Now let’s look at Educational Attainment using highschool as the reference group.
model <- glm(badmnhlth ~ rentpct + lessthanhs + somecollege + bachelors, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + lessthanhs + somecollege +
bachelors, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.55768 0.04686 -11.901 < 2e-16 ***
rentpct 0.21582 0.03846 5.612 2.00e-08 ***
lessthanhs -0.01751 0.08564 -0.204 0.83799
somecollege 0.14527 0.04943 2.939 0.00329 **
bachelors -0.24652 0.05647 -4.366 1.27e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14901 on 11134 degrees of freedom
AIC: 14911
Number of Fisher Scoring iterations: 4
Now let’s just look at some college.
model <- glm(badmnhlth ~ rentpct + somecollege, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + somecollege, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.65782 0.03781 -17.400 < 2e-16 ***
rentpct 0.19503 0.03812 5.116 3.11e-07 ***
somecollege 0.25933 0.03901 6.647 2.99e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14922 on 11136 degrees of freedom
AIC: 14928
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "somecollege",
regrid = "response")
somecollege prob SE df asymp.LCL asymp.UCL
0 0.372 0.00613 Inf 0.36 0.384
1 0.434 0.00708 Inf 0.42 0.448
Confidence level used: 0.95
At the mean rent percent, a person with some college has a predicted 43.4% probability of bad mental health (95% CI: 42.0% to 44.8%), compared with 37.2% for those not in the some college category (95% CI: 36.0% to 38.4%).
Model with age, single adult, female, nh_black, some college.
model <- glm(badmnhlth ~ rentpct + age + single_adult + female + nh_black + somecollege, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentpct + age + single_adult + female +
nh_black + somecollege, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.225063 0.076306 2.949 0.003183 **
rentpct 0.145303 0.038928 3.733 0.000189 ***
age -0.017915 0.001229 -14.575 < 2e-16 ***
single_adult -0.100095 0.040417 -2.477 0.013265 *
female 0.160023 0.042922 3.728 0.000193 ***
nh_black -0.364349 0.054871 -6.640 3.13e-11 ***
somecollege 0.283910 0.039649 7.161 8.03e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14617 on 11132 degrees of freedom
AIC: 14631
Number of Fisher Scoring iterations: 4
Let’s look at the marginal effects of some college in this model.
# MEMs: marginal effects at the mean
emmeans(model, specs = "somecollege",
regrid = "response")
somecollege prob SE df asymp.LCL asymp.UCL
0 0.333 0.00733 Inf 0.319 0.348
1 0.398 0.00840 Inf 0.382 0.415
Results are averaged over the levels of: single_adult, female, nh_black
Confidence level used: 0.95
Holding rent percent and age at their means and averaging over the levels of single_adult, female, and nh_black, a person with some college has a predicted 39.8% probability of bad mental health (95% CI: 38.2% to 41.5%), compared with 33.3% for those not in the some college category (95% CI: 31.9% to 34.8%).
DV: Bad mental health. Bad mental health (10 to 16) is 1, good mental health (4 to 9) is 0.
IV: grace_period: Grace period for paying rent prior to eviction. Continuous variable from 0 days to 30 days.
I need to visualize this relationship.
model <- lmer(badmnhlth ~ 1 + (1 | grace_period), data = dat2)
summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: badmnhlth ~ 1 + (1 | grace_period)
Data: dat2
REML criterion at convergence: 15718.6
Scaled residuals:
Min 1Q Median 3Q Max
-0.8314 -0.8140 -0.8064 1.2183 1.2555
Random effects:
Groups Name Variance Std.Dev.
grace_period (Intercept) 0.0001464 0.0121
Residual 0.2398336 0.4897
Number of obs: 11139, groups: grace_period, 11
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.395075 0.006592 59.94
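One common way to read the random-intercept variance is the intraclass correlation (ICC): the share of total variance that lies between grace-period groups. Below is a minimal sketch that uses the variance components printed above. Like lmer() here, it treats the 0/1 outcome as continuous.

```r
# Variance components copied from the lmer() output above
var_grace    <- 0.0001464   # between grace_period groups (random intercept)
var_residual <- 0.2398336   # within groups (residual)

# ICC = between-group variance / total variance
icc <- var_grace / (var_grace + var_residual)
round(icc, 5)
```

An ICC of about 0.0006 would suggest that well under 1% of the variation in bad mental health lies between grace-period groups. In this null model, the fixed intercept (0.395) is close to the overall share of respondents with bad mental health.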
DV: Bad mental health. Bad mental health (10 to 16) is 1, good mental health (4 to 9) is 0.
IV: rentcur “Caught up on rent” = 1 and “Not caught up on rent” = 0
How do I visualize bad mental health by whether the renter is caught up on rent, for the two lowest income levels?
#Logit models of bad mental health by caught up on rent.
model <- glm(badmnhlth ~ rentcur, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.24645 0.04584 5.377 7.58e-08 ***
rentcur -0.79746 0.05069 -15.733 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14740 on 11137 degrees of freedom
AIC: 14744
Number of Fisher Scoring iterations: 4
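This model has a single 0/1 predictor, so the probabilities emmeans reports next can be recovered directly from the coefficients with the inverse logit, plogis(). Here is a small check using the estimates above:

```r
# Coefficients copied from glm(badmnhlth ~ rentcur, family = "binomial")
b0 <- 0.24645    # intercept: log-odds when rentcur = 0 (not caught up)
b1 <- -0.79746   # change in log-odds when rentcur = 1 (caught up)

# Inverse logit converts log-odds to probabilities
p_not_caught_up <- plogis(b0)
p_caught_up     <- plogis(b0 + b1)

round(c(not_caught_up = p_not_caught_up, caught_up = p_caught_up), 3)
```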
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentcur",
regrid = "response")
rentcur prob SE df asymp.LCL asymp.UCL
0 0.561 0.01129 Inf 0.539 0.583
1 0.366 0.00502 Inf 0.356 0.375
Confidence level used: 0.95
A person caught up on rent has a predicted 36.6% probability of bad mental health (95% CI: 35.6% to 37.5%), compared with 56.1% for a person not caught up on rent (95% CI: 53.9% to 58.3%).
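One way to start on the visualization question above is a bar chart of these predicted probabilities with their 95% confidence intervals. Below is a base-R sketch using the numbers printed above. Extending it to the two lowest income brackets would mean fitting the model by bracket (or with an income interaction) and adding those estimates as further bars, which is not shown here.

```r
# Predicted probabilities and 95% CIs copied from the emmeans output above
pp <- data.frame(
  status = c("Not caught up", "Caught up"),
  prob   = c(0.561, 0.366),
  lcl    = c(0.539, 0.356),
  ucl    = c(0.583, 0.375)
)

# Bar chart of predicted probabilities; barplot() returns the bar midpoints
mids <- barplot(pp$prob, names.arg = pp$status, ylim = c(0, 0.7),
                ylab = "Predicted probability of bad mental health",
                col = "grey80")

# Add the 95% confidence intervals as error bars
arrows(x0 = mids, y0 = pp$lcl, x1 = mids, y1 = pp$ucl,
       angle = 90, code = 3, length = 0.1)
```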
Let’s add age to the model.
model <- glm(badmnhlth ~ rentcur + age, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + age, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.052415 0.072432 14.53 <2e-16 ***
rentcur -0.753579 0.051103 -14.75 <2e-16 ***
age -0.017487 0.001207 -14.49 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14525 on 11136 degrees of freedom
AIC: 14531
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentcur",
regrid = "response")
rentcur prob SE df asymp.LCL asymp.UCL
0 0.550 0.01144 Inf 0.528 0.572
1 0.365 0.00507 Inf 0.355 0.375
Confidence level used: 0.95
At the mean age, a person caught up on rent has a predicted 36.5% probability of bad mental health (95% CI: 35.5% to 37.5%), compared with 55.0% for a person not caught up on rent (95% CI: 52.8% to 57.2%).
Let’s add number of people in the household.
model <- glm(badmnhlth ~ rentcur + age + thhld_numper, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + age + thhld_numper, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.188651 0.089200 13.326 < 2e-16 ***
rentcur -0.774896 0.051785 -14.964 < 2e-16 ***
age -0.018288 0.001246 -14.678 < 2e-16 ***
thhld_numper -0.034081 0.012967 -2.628 0.00858 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14519 on 11135 degrees of freedom
AIC: 14527
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentcur",
regrid = "response")
rentcur prob SE df asymp.LCL asymp.UCL
0 0.554 0.01155 Inf 0.532 0.577
1 0.364 0.00508 Inf 0.354 0.374
Confidence level used: 0.95
At the mean age and household size, a person caught up on rent has a predicted 36.4% probability of bad mental health (95% CI: 35.4% to 37.4%), compared with 55.4% for a person not caught up on rent (95% CI: 53.2% to 57.7%).
Let’s add presence of children in the household.
model <- glm(badmnhlth ~ rentcur + age + thhld_numper + children, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + age + thhld_numper + children,
family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.192740 0.089108 13.38 < 2e-16 ***
rentcur -0.784138 0.051899 -15.11 < 2e-16 ***
age -0.018633 0.001252 -14.89 < 2e-16 ***
thhld_numper -0.004159 0.016606 -0.25 0.80222
children -0.163626 0.057212 -2.86 0.00424 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14510 on 11134 degrees of freedom
AIC: 14520
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentcur",
regrid = "response")
rentcur prob SE df asymp.LCL asymp.UCL
0 0.548 0.01175 Inf 0.525 0.571
1 0.357 0.00565 Inf 0.346 0.368
Results are averaged over the levels of: children
Confidence level used: 0.95
Holding age and household size at their means and averaging over households with and without children, a person caught up on rent has a predicted 35.7% probability of bad mental health (95% CI: 34.6% to 36.8%), compared with 54.8% for a person not caught up on rent (95% CI: 52.5% to 57.1%).
Let’s add single vs. multiple adults in the household.
model <- glm(badmnhlth ~ rentcur + age + thhld_numper + children + single_adult, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + age + thhld_numper + children +
single_adult, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.352089 0.099949 13.528 < 2e-16 ***
rentcur -0.787093 0.051938 -15.155 < 2e-16 ***
age -0.018075 0.001262 -14.328 < 2e-16 ***
thhld_numper -0.052657 0.021534 -2.445 0.014475 *
children -0.087468 0.061028 -1.433 0.151790
single_adult -0.189336 0.052698 -3.593 0.000327 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14497 on 11133 degrees of freedom
AIC: 14509
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "rentcur",
regrid = "response")
rentcur prob SE df asymp.LCL asymp.UCL
0 0.552 0.01178 Inf 0.529 0.575
1 0.360 0.00579 Inf 0.349 0.371
Results are averaged over the levels of: children, single_adult
Confidence level used: 0.95
Holding age and household size at their means and averaging over the levels of children and single_adult, a person caught up on rent has a predicted 36.0% probability of bad mental health (95% CI: 34.9% to 37.1%), compared with 55.2% for a person not caught up on rent (95% CI: 52.9% to 57.5%).
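On model fit: each model in this sequence adds terms to the previous one and uses the same 11,139 observations. Nested models like these can be compared with likelihood-ratio tests on the residual deviances, alongside AIC (lower is better). The sketch below uses the rounded deviances printed above. With the fitted objects in hand, anova(smaller_model, larger_model, test = "Chisq") gives the exact version.

```r
# Residual deviance, residual df, and AIC copied from the rentcur model sequence above
fits <- data.frame(
  model    = c("rentcur", "+ age", "+ thhld_numper", "+ children", "+ single_adult"),
  deviance = c(14740, 14525, 14519, 14510, 14497),
  df_resid = c(11137, 11136, 11135, 11134, 11133),
  aic      = c(14744, 14531, 14527, 14520, 14509)
)
null_deviance <- 14990

# Likelihood-ratio test of each model against the one before it
lr_stat <- -diff(fits$deviance)   # drop in deviance
lr_df   <- -diff(fits$df_resid)   # number of added parameters
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden pseudo-R-squared = 1 - residual deviance / null deviance
fits$pseudo_r2 <- round(1 - fits$deviance / null_deviance, 4)
fits$lr_p <- c(NA, signif(lr_p, 3))
fits
```

With these rounded deviances, each added term improves fit at the 5% level and AIC falls at every step. The pseudo-R-squared, however, stays small (about 0.03).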
Let’s look at Gender using male as the reference group.
model <- glm(badmnhlth ~ rentcur + female + transgender + gen_none + gen_notsel, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + female + transgender + gen_none +
gen_notsel, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.05104 0.05693 0.896 0.370
rentcur -0.79919 0.05089 -15.703 < 2e-16 ***
female 0.23402 0.04511 5.188 2.13e-07 ***
transgender 1.38124 0.19782 6.982 2.91e-12 ***
gen_none 0.82451 0.13335 6.183 6.29e-10 ***
gen_notsel -0.01681 0.24580 -0.068 0.945
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14645 on 11133 degrees of freedom
AIC: 14657
Number of Fisher Scoring iterations: 4
Let’s look at only transgender.
model <- glm(badmnhlth ~ rentcur + transgender, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + transgender, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.23862 0.04589 5.200 1.99e-07 ***
rentcur -0.80423 0.05076 -15.843 < 2e-16 ***
transgender 1.19825 0.19506 6.143 8.10e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14699 on 11136 degrees of freedom
AIC: 14705
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "transgender",
regrid = "response")
transgender prob SE df asymp.LCL asymp.UCL
0 0.461 0.0062 Inf 0.449 0.473
1 0.731 0.0372 Inf 0.658 0.803
Results are averaged over the levels of: rentcur
Confidence level used: 0.95
Averaging over the caught-up and not-caught-up groups, a transgender person has a predicted 73.1% probability of bad mental health (95% CI: 65.8% to 80.3%), compared with 46.1% for a non-transgender person (95% CI: 44.9% to 47.3%).
Now let’s look at Race and Ethnicity using nh_white as the reference group.
model <- glm(badmnhlth ~ rentcur + hispanic + nh_black + nh_asian + other, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + hispanic + nh_black + nh_asian +
other, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.44362 0.05198 8.534 < 2e-16 ***
rentcur -0.88649 0.05210 -17.015 < 2e-16 ***
hispanic -0.21214 0.05857 -3.622 0.000293 ***
nh_black -0.52525 0.05756 -9.126 < 2e-16 ***
nh_asian -0.70253 0.13251 -5.302 1.15e-07 ***
other 0.08733 0.07824 1.116 0.264381
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14621 on 11133 degrees of freedom
AIC: 14633
Number of Fisher Scoring iterations: 4
Now let’s just look at Non-Hispanic Asian.
model <- glm(badmnhlth ~ rentcur + nh_asian, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + nh_asian, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.26781 0.04615 5.804 6.49e-09 ***
rentcur -0.80558 0.05080 -15.858 < 2e-16 ***
nh_asian -0.58433 0.13106 -4.459 8.25e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14718 on 11136 degrees of freedom
AIC: 14724
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "nh_asian",
regrid = "response")
nh_asian prob SE df asymp.LCL asymp.UCL
0 0.468 0.00624 Inf 0.455 0.480
1 0.334 0.02794 Inf 0.279 0.388
Results are averaged over the levels of: rentcur
Confidence level used: 0.95
Averaging over the caught-up and not-caught-up groups, a non-Hispanic Asian person has a predicted 33.4% probability of bad mental health (95% CI: 27.9% to 38.8%), compared with 46.8% for a person who is not non-Hispanic Asian (95% CI: 45.5% to 48.0%).
Now let’s look at Educational Attainment using highschool as the reference group.
model <- glm(badmnhlth ~ rentcur + lessthanhs + somecollege + bachelors, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + lessthanhs + somecollege +
bachelors, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.19602 0.05712 3.432 0.000600 ***
rentcur -0.78027 0.05097 -15.307 < 2e-16 ***
lessthanhs -0.03395 0.08668 -0.392 0.695259
somecollege 0.16888 0.04995 3.381 0.000723 ***
bachelors -0.14594 0.05671 -2.574 0.010065 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14696 on 11134 degrees of freedom
AIC: 14706
Number of Fisher Scoring iterations: 4
Now let’s just look at some college.
model <- glm(badmnhlth ~ rentcur + somecollege, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + somecollege, family = "binomial",
data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.13445 0.04942 2.721 0.00651 **
rentcur -0.79146 0.05078 -15.587 < 2e-16 ***
somecollege 0.23956 0.03938 6.083 1.18e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14703 on 11136 degrees of freedom
AIC: 14709
Number of Fisher Scoring iterations: 4
# MEMs: marginal effects at the mean
emmeans(model, specs = "somecollege",
regrid = "response")
somecollege prob SE df asymp.LCL asymp.UCL
0 0.437 0.00754 Inf 0.423 0.452
1 0.495 0.00799 Inf 0.479 0.510
Results are averaged over the levels of: rentcur
Confidence level used: 0.95
Averaging over the caught-up and not-caught-up groups, a person with some college has a predicted 49.5% probability of bad mental health (95% CI: 47.9% to 51.0%), compared with 43.7% for a person in any other education category (95% CI: 42.3% to 45.2%).
Model with age, single adult, female, nh_black, some college.
model <- glm(badmnhlth ~ rentcur + age + single_adult + female + nh_black + somecollege, family = "binomial", data = dat2)
summary(model)
Call:
glm(formula = badmnhlth ~ rentcur + age + single_adult + female +
nh_black + somecollege, family = "binomial", data = dat2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.979844 0.081049 12.090 < 2e-16 ***
rentcur -0.803790 0.052276 -15.376 < 2e-16 ***
age -0.017340 0.001243 -13.955 < 2e-16 ***
single_adult -0.076584 0.040772 -1.878 0.060333 .
female 0.147595 0.043276 3.411 0.000648 ***
nh_black -0.490591 0.056454 -8.690 < 2e-16 ***
somecollege 0.268255 0.040038 6.700 2.08e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14990 on 11138 degrees of freedom
Residual deviance: 14391 on 11132 degrees of freedom
AIC: 14405
Number of Fisher Scoring iterations: 4
Let’s look at the marginal effects of some college in this model.
# MEMs: marginal effects at the mean
emmeans(model, specs = "somecollege",
regrid = "response")
somecollege prob SE df asymp.LCL asymp.UCL
0 0.389 0.00848 Inf 0.372 0.406
1 0.451 0.00913 Inf 0.433 0.469
Results are averaged over the levels of: rentcur, single_adult, female, nh_black
Confidence level used: 0.95
Holding age at its mean and averaging over the levels of rentcur, single_adult, female, and nh_black, a person with some college has a predicted 45.1% probability of bad mental health (95% CI: 43.3% to 46.9%), compared with 38.9% for those not in the some college category (95% CI: 37.2% to 40.6%).
# adding a fixed level-one predictor
model <- lmer(badmnhlth ~ 1 + rentcur + (1 | grace_period), data = dat2)
summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: badmnhlth ~ 1 + rentcur + (1 | grace_period)
Data: dat2
REML criterion at convergence: 15464.9
Scaled residuals:
Min 1Q Median 3Q Max
-1.1860 -0.7719 -0.7454 1.2874 1.3734
Random effects:
Groups Name Variance Std.Dev.
grace_period (Intercept) 0.0004136 0.02034
Residual 0.2342210 0.48396
Number of obs: 11139, groups: grace_period, 11
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.55250 0.01325 41.71
rentcur -0.19703 0.01212 -16.26
Correlation of Fixed Effects:
(Intr)
rentcur -0.750
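Here lmer() fits a linear model to the 0/1 outcome, so its fixed effects can be read on the probability scale. In effect it is a linear probability model with random intercepts for grace_period. Below is a sketch using the fixed effects printed above:

```r
# Fixed effects copied from lmer(badmnhlth ~ 1 + rentcur + (1 | grace_period))
intercept <- 0.55250   # expected probability when rentcur = 0 (not caught up)
slope     <- -0.19703  # change in probability when rentcur = 1 (caught up)

p_not_caught_up <- intercept
p_caught_up     <- intercept + slope

c(not_caught_up = p_not_caught_up, caught_up = p_caught_up)
```

These values (about 55% and 36%) are similar to the 56.1% and 36.6% from the single-level logit model. The grace_period random-intercept variance (0.0004) remains small next to the residual variance (0.234).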
Here you will describe the overall results of your project, including whether your results support your hypotheses. Be sure to discuss the limitations of your project and how you could refine the analyses in the future. For instance, are there any omitted variables that could be driving the associations between your independent variables and outcome? You could also discuss what you would do differently if you had the opportunity to redo the project.
Renters in low-income households (in the lowest 2 income brackets) who pay a high proportion of income for rent (>30%) exhibit higher rates of depression and anxiety.
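The results support the hypothesis. In the logit model with bad mental health as the outcome and rent as a percent of income (rentpct) as the predictor, the coefficient on rentpct is positive (0.185) and statistically significant. Because rentpct appears to be stored as a proportion (its mean is 0.686), this coefficient is the change in the log-odds of bad mental health for a one-unit increase. That is, rent rising by the equivalent of 100% of income, not one percentage point. (Do I need to exponentiate the coefficient to find the odds ratio?)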
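On the odds-ratio question: exponentiating a logit coefficient gives an odds ratio. Below is a sketch using the 0.185 coefficient quoted above. It assumes, as the mean of 0.686 suggests, that rentpct is a proportion, so one unit equals 100 percentage points of income.

```r
b_rentpct <- 0.185   # logit coefficient on rentpct quoted above

# Odds ratio for a one-unit change (rent share rising by 100% of income)
or_per_unit <- exp(b_rentpct)

# Assumption: rentpct is a proportion, so a 10-percentage-point change is 0.10 units
or_per_10pts <- exp(b_rentpct * 0.10)

round(c(per_unit = or_per_unit, per_10pts = or_per_10pts), 3)
```

Read this way, each 10-point increase in the rent share multiplies the odds of bad mental health by about 1.02 (roughly 2% higher odds), holding constant any other terms in the model.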
Adding covariates to the model showed that age and single vs. multiple adults in the household are statistically significant, but number of people in the household and presence of children are not.
I evaluated predicted probabilities from several models and found, for example, the following:
At the mean rent percent, a transgender person has a predicted 66.8% probability of bad mental health (95% CI: 58.5% to 75.2%), compared with 39.6% for a non-transgender person (95% CI: 38.7% to 40.6%).
At the mean rent percent, a non-Hispanic Black person has a predicted 33.4% probability of bad mental health (95% CI: 31.2% to 35.6%), compared with 41.2% for those who are not non-Hispanic Black (95% CI: 40.2% to 42.2%).
At the mean rent percent, a person with some college has a predicted 43.4% probability of bad mental health (95% CI: 42.0% to 44.8%), compared with 37.2% for those not in the some college category (95% CI: 36.0% to 38.4%).
Holding rent percent and age at their means and averaging over the levels of single_adult, female, and nh_black, a person with some college has a predicted 39.8% probability of bad mental health (95% CI: 38.2% to 41.5%), compared with 33.3% for those not in the some college category (95% CI: 31.9% to 34.8%).
Among similarly situated renters, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
I ran the lmer() function, but I do not understand how to interpret the results of this model.
Renters who are late on rent payments exhibit higher rates of depression and anxiety.
The results support the hypothesis. In the logit model with bad mental health as the outcome and current on rent (rentcur) as the predictor, being caught up on rent (rentcur = 1 vs. 0) is associated with a 0.797 decrease in the log-odds of bad mental health, and the association is statistically significant. (Do I need to exponentiate the coefficient to find the odds ratio?)
I evaluated predicted probabilities from several models and found, for example, the following:
A person caught up on rent has a predicted 36.6% probability of bad mental health (95% CI: 35.6% to 37.5%), compared with 56.1% for a person not caught up on rent (95% CI: 53.9% to 58.3%).
Holding age and household size at their means and averaging over the levels of children and single_adult, a person caught up on rent has a predicted 36.0% probability of bad mental health (95% CI: 34.9% to 37.1%), compared with 55.2% for a person not caught up on rent (95% CI: 52.9% to 57.5%).
Averaging over the caught-up and not-caught-up groups, a transgender person has a predicted 73.1% probability of bad mental health (95% CI: 65.8% to 80.3%), compared with 46.1% for a non-transgender person (95% CI: 44.9% to 47.3%).
Averaging over the caught-up and not-caught-up groups, a non-Hispanic Asian person has a predicted 33.4% probability of bad mental health (95% CI: 27.9% to 38.8%), compared with 46.8% for a person who is not non-Hispanic Asian (95% CI: 45.5% to 48.0%).
Averaging over the caught-up and not-caught-up groups, a person with some college has a predicted 49.5% probability of bad mental health (95% CI: 47.9% to 51.0%), compared with 43.7% for a person in any other education category (95% CI: 42.3% to 45.2%).
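For a 0/1 predictor like rentcur, the odds ratio is simply exp(coefficient). A Wald 95% interval comes from exponentiating the coefficient plus or minus 1.96 standard errors. Below is a sketch using the bivariate rentcur estimates printed earlier:

```r
b  <- -0.79746   # logit coefficient on rentcur (bivariate model)
se <-  0.05069   # its standard error

or <- exp(b)
# Wald 95% CI: exponentiate the coefficient +/- 1.96 standard errors
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

round(c(or = or, lower = ci[1], upper = ci[2]), 3)
```

By this reading, being caught up on rent is associated with roughly 55% lower odds of bad mental health (odds ratio about 0.45). With a fitted model object, exp(cbind(OR = coef(model), confint.default(model))) gives the same Wald intervals for every term at once.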
Among renters who are late on rent, rates of depression and anxiety are lower in states with pro-tenant policies and higher in states with pro-landlord policies.
I ran the lmer() function, but I do not understand how to interpret the results of this model.
One limitation of this project is that income is a categorical variable. It was converted to a value at the median of each range, so the value of rent as a percentage of income is an approximation.
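To show how much the midpoint approximation can matter, the sketch below computes rent as a share of income for one hypothetical renter. The renter pays $1,000 a month and falls in a hypothetical $25,000 to $34,999 bracket, and the share is evaluated at the bottom, midpoint, and top of that bracket. These figures are made up for illustration and are not Household Pulse values.

```r
# Hypothetical example: monthly rent and one income bracket (not survey data)
monthly_rent <- 1000
bracket_low  <- 25000
bracket_high <- 34999

annual_rent <- 12 * monthly_rent
incomes <- c(low = bracket_low,
             midpoint = (bracket_low + bracket_high) / 2,
             high = bracket_high)

# Rent as a proportion of annual income at each point in the bracket
rent_share <- annual_rent / incomes
round(rent_share, 3)
```

The same renter could be recorded anywhere from about 34% to 48% of income. This measurement error in rentpct grows as brackets get wider.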
Instead of bad mental health, it may be useful to evaluate educational outcomes. One of the largest expenditures for local governments is on public schools. Housing precarity (high cost burden, risk of eviction, and homelessness) in households with children may be related to children’s educational outcomes. An analysis of this relationship may be useful to local governments regarding both housing assistance and spending on public schools.
Questions/issues:
I realize I need to have a literature review in support of statistical research.
How should I decide whether to remove outliers in rent amount and income (both of which affect my IV, rent as a percent of income)?
It is difficult to manage a large number of continuous, categorical, and dichotomized variables. I got “lost” in the models. I wasn’t sure how to set up the model to “control” for variables or how to use the dichotomized models to support the hypotheses.
For a logit model, do we exponentiate the coefficients? Does this provide the odds ratios? Or do we interpret the exponentiated coefficients another way?
I realize I need to work on visualizing the data.
How do you evaluate model fit?
Can you apply a marginal effects function to both continuous and dichotomized variables?
If you have a lot of variables in your model, how do you decide what to check for marginal effects? Instead of doing this one variable at a time, is there a way to do many at a time and format the results into a table?
I know I need to include more interpretations, model fit tests, visuals, etc.
How do you interpret the results of lmer()?
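On building a table of predicted probabilities for several variables at once, one approach is to loop over the 0/1 predictors. Set each one to 0 and then 1 with the other predictors held at their means, and stack the results with rbind(). The sketch below runs on simulated data with made-up coefficients (not the survey data), and the helper pred_at_means() is hypothetical. As its output above shows, emmeans averages over the levels of two-valued predictors rather than fixing them at their means, so its numbers will differ slightly.

```r
set.seed(42)

# Simulated stand-in for dat2 (made-up data, for illustration only)
n <- 2000
sim <- data.frame(
  rentcur     = rbinom(n, 1, 0.8),
  age         = round(runif(n, 18, 80)),
  somecollege = rbinom(n, 1, 0.3),
  nh_black    = rbinom(n, 1, 0.2)
)
lp <- 1 - 0.8 * sim$rentcur - 0.017 * sim$age +
  0.25 * sim$somecollege - 0.4 * sim$nh_black
sim$badmnhlth <- rbinom(n, 1, plogis(lp))

fit <- glm(badmnhlth ~ rentcur + age + somecollege + nh_black,
           family = binomial, data = sim)

# Hypothetical helper: predicted probability with `var` set to 0 and 1,
# all other predictors held at their sample means
pred_at_means <- function(fit, var, data) {
  covars <- setdiff(all.vars(formula(fit))[-1], var)
  nd <- as.data.frame(lapply(data[covars], function(x) rep(mean(x), 2)))
  nd[[var]] <- c(0, 1)
  p <- unname(predict(fit, newdata = nd, type = "response"))
  data.frame(variable = var, p0 = p[1], p1 = p[2], diff = p[2] - p[1])
}

# One row per 0/1 predictor, stacked into a single table
tab <- do.call(rbind, lapply(c("rentcur", "somecollege", "nh_black"),
                             function(v) pred_at_means(fit, v, sim)))
tab[, -1] <- round(tab[, -1], 3)
tab
```

The same loop works with any glm object. Swapping in dat2 and the real model would give one row per predictor, which could then be saved with write.csv().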