healthb will be my dichotomous outcome variable: 1 for fair/poor health, 0 for good or better.
For predictors I’ll be using:
race: the four standard groupings (nhw, nhb, hisp, other)
edu: education in three categories: lths (less than high school), hs (high school), and college (college or more)
mar: marital status (y for married, n for not married)
inc: income in three categories: lt $25k, $25k to $75k, and gt $75k
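For reference, here is a minimal sketch of how the outcome recode might look, assuming the raw self-rated health item is a five-category variable named genhlth (1 = excellent through 5 = poor); that name and scale are assumptions, not necessarily what hw3 contains:
# Hypothetical sketch: 'genhlth' is an assumed name for a 1-5 self-rated health item;
# swap in the actual column used to build hw3. healthb is coded as a factor so that
# svymean() reports the proportion in each category.
hw3 <- hw3 %>%
  mutate(healthb = factor(ifelse(genhlth %in% c(4, 5), 1, 0)))  # 1 = fair/poor, 0 = good or better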
Here are all of the two-way interactions among the chosen variables:
hw3 <- hw3 %>%
mutate(
int_raceedu = interaction(race, edu),
int_raceinc = interaction(race, inc),
int_racemar = interaction(race, mar),
int_edumar = interaction(edu, mar),
int_eduinc = interaction(edu, inc),
int_marinc = interaction(mar, inc)
)
#str(hw3)
tab_raceedu <- svyby(formula = ~healthb, by= ~hw3$int_raceedu, design = des, FUN = svymean, na.rm=T)
#summary(tab_raceedu)
tab_raceinc <- svyby(formula = ~healthb, by= ~hw3$int_raceinc, design = des, FUN = svymean, na.rm=T )
tab_racemar <- svyby(formula = ~healthb, by= ~hw3$int_racemar, design = des, FUN = svymean, na.rm=T )
tab_edumar <- svyby(formula = ~healthb, by= ~hw3$int_edumar, design = des, FUN = svymean, na.rm=T )
tab_eduinc <- svyby(formula = ~healthb, by= ~hw3$int_eduinc, design = des, FUN = svymean, na.rm=T )
tab_marinc <- svyby(formula = ~healthb, by= ~hw3$int_marinc, design = des, FUN = svymean, na.rm=T )
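Since the interaction columns were added to hw3 after the survey design object was created, the by= formulas above pull them from hw3 directly. An equivalent approach (a sketch, reusing the svydesign() call that appears in the model output below) is to rebuild the design from the updated data frame so the new factors can be referenced by name:
# Rebuild the design so the interaction factors live inside it
des <- svydesign(ids = ~1, weights = ~wt, data = hw3)
tab_raceedu <- svyby(formula = ~healthb, by = ~int_raceedu, design = des, FUN = svymean, na.rm = T)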
Here are the weighted proportions in fair/poor health (healthb1) versus good or better health (healthb0) within each two-way interaction:
print(tab_raceedu)
## hw3$int_raceedu healthb0 healthb1 se.healthb0 se.healthb1
## nhw.hs nhw.hs 0.8392704 0.16072962 0.01942879 0.01942879
## nhb.hs nhb.hs 0.8355380 0.16446201 0.02761881 0.02761881
## hisp.hs hisp.hs 0.8756070 0.12439304 0.01041168 0.01041168
## other.hs other.hs 0.9627573 0.03724266 0.01064177 0.01064177
## nhw.lths nhw.lths 0.7670398 0.23296016 0.02731888 0.02731888
## nhb.lths nhb.lths 0.7304123 0.26958770 0.05024703 0.05024703
## hisp.lths hisp.lths 0.7610689 0.23893109 0.02515227 0.02515227
## other.lths other.lths 0.8579312 0.14206879 0.09042507 0.09042507
## nhw.college nhw.college 0.6620700 0.33792998 0.02933333 0.02933333
## nhb.college nhb.college 0.1653317 0.83466832 0.06294037 0.06294037
## hisp.college hisp.college 0.4952952 0.50470482 0.07574091 0.07574091
## other.college other.college 0.6542786 0.34572137 0.25044559 0.25044559
print(tab_raceinc)
## hw3$int_raceinc healthb0 healthb1 se.healthb0
## nhw.$25k to $75k nhw.$25k to $75k 0.8250697 0.17493031 0.02141843
## nhb.$25k to $75k nhb.$25k to $75k 0.8285003 0.17149969 0.03862119
## hisp.$25k to $75k hisp.$25k to $75k 0.8237947 0.17620528 0.01814508
## other.$25k to $75k other.$25k to $75k 0.9057941 0.09420592 0.05358624
## nhw.gt $75k nhw.gt $75k 0.8983927 0.10160734 0.02732769
## nhb.gt $75k nhb.gt $75k 0.9478421 0.05215792 0.02937425
## hisp.gt $75k hisp.gt $75k 0.9089065 0.09109353 0.01244464
## other.gt $75k other.gt $75k 0.9761002 0.02389980 0.01162920
## nhw.lt $25k nhw.lt $25k 0.6606190 0.33938102 0.02386725
## nhb.lt $25k nhb.lt $25k 0.5476192 0.45238085 0.05279069
## hisp.lt $25k hisp.lt $25k 0.6513170 0.34868302 0.03126566
## other.lt $25k other.lt $25k 0.8452794 0.15472060 0.09131812
## se.healthb1
## nhw.$25k to $75k 0.02141843
## nhb.$25k to $75k 0.03862119
## hisp.$25k to $75k 0.01814508
## other.$25k to $75k 0.05358624
## nhw.gt $75k 0.02732769
## nhb.gt $75k 0.02937425
## hisp.gt $75k 0.01244464
## other.gt $75k 0.01162920
## nhw.lt $25k 0.02386725
## nhb.lt $25k 0.05279069
## hisp.lt $25k 0.03126566
## other.lt $25k 0.09131812
print(tab_racemar)
## hw3$int_racemar healthb0 healthb1 se.healthb0 se.healthb1
## nhw.n nhw.n 0.7415125 0.25848750 0.02163885 0.02163885
## nhb.n nhb.n 0.7097678 0.29023220 0.03725652 0.03725652
## hisp.n hisp.n 0.7962975 0.20370245 0.01741462 0.01741462
## other.n other.n 0.9633759 0.03662414 0.01125318 0.01125318
## nhw.y nhw.y 0.7624572 0.23754282 0.02105674 0.02105674
## nhb.y nhb.y 0.8335686 0.16643139 0.03770840 0.03770840
## hisp.y hisp.y 0.8553331 0.14466693 0.01326465 0.01326465
## other.y other.y 0.8858774 0.11412255 0.05147940 0.05147940
print(tab_eduinc)
## hw3$int_eduinc healthb0 healthb1 se.healthb0
## hs.$25k to $75k hs.$25k to $75k 0.8692611 0.13073894 0.013632071
## lths.$25k to $75k lths.$25k to $75k 0.8242474 0.17575260 0.024250413
## college.$25k to $75k college.$25k to $75k 0.6665457 0.33345434 0.053511748
## hs.gt $75k hs.gt $75k 0.9285984 0.07140161 0.009551788
## lths.gt $75k lths.gt $75k 0.8916388 0.10836119 0.028146042
## college.gt $75k college.gt $75k 0.4585234 0.54147663 0.156258593
## hs.lt $25k hs.lt $25k 0.7202350 0.27976496 0.026443039
## lths.lt $25k lths.lt $25k 0.6454609 0.35453911 0.029776438
## college.lt $25k college.lt $25k 0.5998525 0.40014754 0.033345586
## se.healthb1
## hs.$25k to $75k 0.013632071
## lths.$25k to $75k 0.024250413
## college.$25k to $75k 0.053511748
## hs.gt $75k 0.009551788
## lths.gt $75k 0.028146042
## college.gt $75k 0.156258593
## hs.lt $25k 0.026443039
## lths.lt $25k 0.029776438
## college.lt $25k 0.033345586
print(tab_edumar)
## hw3$int_edumar healthb0 healthb1 se.healthb0 se.healthb1
## hs.n hs.n 0.8292242 0.17077579 0.014545894 0.014545894
## lths.n lths.n 0.7559399 0.24406008 0.022999963 0.022999963
## college.n college.n 0.6166494 0.38335065 0.039467225 0.039467225
## hs.y hs.y 0.9000689 0.09993109 0.009302129 0.009302129
## lths.y lths.y 0.7749300 0.22507001 0.025622266 0.025622266
## college.y college.y 0.6124883 0.38751173 0.039782717 0.039782717
print(tab_marinc)
## hw3$int_marinc healthb0 healthb1 se.healthb0 se.healthb1
## n.$25k to $75k n.$25k to $75k 0.8420128 0.15798721 0.01755181 0.01755181
## y.$25k to $75k y.$25k to $75k 0.8188233 0.18117672 0.01863789 0.01863789
## n.gt $75k n.gt $75k 0.9308591 0.06914091 0.01651442 0.01651442
## y.gt $75k y.gt $75k 0.9119315 0.08806853 0.01179253 0.01179253
## n.lt $25k n.lt $25k 0.6575462 0.34245379 0.02116671 0.02116671
## y.lt $25k y.lt $25k 0.6426850 0.35731503 0.03194328 0.03194328
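Whether the cell-to-cell differences in any one of these tables are statistically distinguishable could be checked with a design-based chi-square test; a sketch (it assumes the design has been rebuilt from hw3, as sketched above, so the interaction factor is available by name):
# Rao-Scott corrected chi-square test of health status by the race-education interaction
svychisq(~healthb + int_raceedu, design = des)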
We’ll examine four different models:
Model 1 will examine how race affects one’s health status.
Model 2 will add the effect of being married.
Model 3 will add the effect of education.
Model 4 will add the effect of a person’s income.
mod1 <- svyglm(healthb~race,
design = des,
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod2 <- svyglm(healthb~race+mar,
design = des,
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod3 <- svyglm(healthb~race+mar+edu,
design = des,
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod4 <- svyglm(healthb~race+mar+edu+inc,
design = des,
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
summary(mod1)
##
## Call:
## svyglm(formula = healthb ~ race, design = des, family = "binomial",
## data = hw3)
##
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.11060 0.08104 -13.704 < 2e-16 ***
## racenhb -0.01608 0.17143 -0.094 0.92526
## racehisp -0.48405 0.11077 -4.370 1.26e-05 ***
## raceother -1.31065 0.42337 -3.096 0.00197 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1.000099)
##
## Number of Fisher Scoring iterations: 4
summary(mod2)
##
## Call:
## svyglm(formula = healthb ~ race + mar, design = des, family = "binomial",
## data = hw3)
##
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.97468 0.09360 -10.413 < 2e-16 ***
## racenhb -0.05566 0.17012 -0.327 0.74353
## racehisp -0.46334 0.11147 -4.157 3.25e-05 ***
## raceother -1.29507 0.42719 -3.032 0.00244 **
## mary -0.27461 0.10472 -2.622 0.00875 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1.002463)
##
## Number of Fisher Scoring iterations: 4
regTermTest(mod2, test.terms = ~mar, method = "Wald", df = NULL)
## Wald test for mar
## in svyglm(formula = healthb ~ race + mar, design = des, family = "binomial",
## data = hw3)
## F = 6.876526 on 1 and 10066 df: p= 0.0087467
The results of the F test suggest that the second model fits the data better than Model 1.
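As a cross-check, nested svyglm fits can also be compared with the survey package’s anova method (a sketch, not part of the original output):
# Working likelihood-ratio test comparing the nested models, as an alternative to the Wald test above
anova(mod1, mod2, method = "LRT")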
summary(mod3)
##
## Call:
## svyglm(formula = healthb ~ race + mar + edu, design = des, family = "binomial",
## data = hw3)
##
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.8373 0.1487 -12.358 < 2e-16 ***
## racenhb 0.4424 0.1796 2.464 0.0138 *
## racehisp 0.1022 0.1352 0.756 0.4496
## raceother -0.7882 0.4254 -1.853 0.0639 .
## mary -0.2602 0.1077 -2.416 0.0157 *
## edulths 0.6937 0.1262 5.499 3.92e-08 ***
## educollege 1.4949 0.1664 8.985 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 0.9950085)
##
## Number of Fisher Scoring iterations: 4
#For some reason R Markdown would not knit with this line active, so I commented it
#out but included the results:
#regTermTest(mod3, test.terms = ~inc, method = "Wald", df = NULL)
#results: Wald test for inc
#in svyglm(formula = healthb ~ edu + mar + inc + race + race * edu +
# race * inc + race * mar + edu * mar + edu * inc, design = des,
# family = "binomial", data = hw3)
#F = 10.77327 on 2 and 10041 df: p= 2.1195e-05
The results of that F test suggest that this model likely fits the data better than the previous one.
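The knit failure above is most likely because inc is not a term in mod3, so regTermTest() has nothing to test there; the Wald test matching the term actually added in Model 3 would be (a sketch, not run in the original):
# Wald test for the education term that Model 3 adds
regTermTest(mod3, test.terms = ~edu, method = "Wald", df = NULL)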
summary(mod4)
##
## Call:
## svyglm(formula = healthb ~ race + mar + edu + inc, design = des,
## family = "binomial", data = hw3)
##
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.11600 0.16431 -12.878 < 2e-16 ***
## racenhb 0.51210 0.18190 2.815 0.004881 **
## racehisp 0.35510 0.13885 2.557 0.010561 *
## raceother -0.66787 0.41277 -1.618 0.105685
## mary 0.07281 0.11498 0.633 0.526578
## edulths 0.44697 0.12922 3.459 0.000544 ***
## educollege 0.99026 0.17976 5.509 3.70e-08 ***
## incgt $75k -0.69517 0.16230 -4.283 1.86e-05 ***
## inclt $25k 0.79733 0.13037 6.116 9.97e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 0.9930785)
##
## Number of Fisher Scoring iterations: 4
regTermTest(mod4, test.terms = ~inc, method = "Wald", df = NULL)
## Wald test for inc
## in svyglm(formula = healthb ~ race + mar + edu + inc, design = des,
## family = "binomial", data = hw3)
## F = 38.89009 on 2 and 10062 df: p= < 2.22e-16
The F test suggests that this model fits the data the best.
Looking at all of the models together, we see the following:
stargazer(mod1, mod2, mod3, mod4,
type = "html",
style = "demography",
ci = T,
column.sep.width = "10pt")
Dependent variable: healthb

| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| racenhb | -0.016 | -0.056 | 0.442* | 0.512** |
| | (-0.352, 0.320) | (-0.389, 0.278) | (0.091, 0.794) | (0.156, 0.869) |
| racehisp | -0.484*** | -0.463*** | 0.102 | 0.355* |
| | (-0.701, -0.267) | (-0.682, -0.245) | (-0.163, 0.367) | (0.083, 0.627) |
| raceother | -1.311** | -1.295** | -0.788 | -0.668 |
| | (-2.140, -0.481) | (-2.132, -0.458) | (-1.622, 0.046) | (-1.477, 0.141) |
| mary | | -0.275** | -0.260* | 0.073 |
| | | (-0.480, -0.069) | (-0.471, -0.049) | (-0.153, 0.298) |
| edulths | | | 0.694*** | 0.447*** |
| | | | (0.446, 0.941) | (0.194, 0.700) |
| educollege | | | 1.495*** | 0.990*** |
| | | | (1.169, 1.821) | (0.638, 1.343) |
| incgt $75k | | | | -0.695*** |
| | | | | (-1.013, -0.377) |
| inclt $25k | | | | 0.797*** |
| | | | | (0.542, 1.053) |
| Constant | -1.111*** | -0.975*** | -1.837*** | -2.116*** |
| | (-1.269, -0.952) | (-1.158, -0.791) | (-2.129, -1.546) | (-2.438, -1.794) |
| N | 10,071 | 10,071 | 10,071 | 10,071 |
| Log Likelihood | -4,750.214 | -4,737.609 | -4,524.720 | -4,360.229 |
| AIC | 9,508.427 | 9,485.218 | 9,063.440 | 8,738.457 |

*p < .05; **p < .01; ***p < .001
Here we can see that, in Model 1, the effect of being of Hispanic descent is protective compared to the non-Hispanic white population.
In Model 2, when the effect of marriage is accounted for, we can see that being of Hispanic descent is slightly less protective.
In Model 3, when education is accounted for, we observe that being of Hispanic descent is no longer protective with regard to being in poor health.
In Model 4, when income is accounted for, we can see that being of Hispanic descent is now associated with higher odds of fair or poor health than being non-Hispanic white.
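To make the direction of these effects concrete, the Model 4 coefficients can be exponentiated into odds ratios with design-based confidence intervals (a quick post-hoc sketch, not part of the original output):
# Odds ratios and 95% CIs for Model 4; values above 1 indicate higher odds of fair/poor health
exp(cbind(OR = coef(mod4), confint(mod4)))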
Looking at the nesting of our models, we can see that adding education to the model fully accounts for the effect of Hispanic descent on a person’s self-reported health status, but once we add income, being of Hispanic descent becomes harmful with regard to self-reported health. This bears further investigation and is evidence of stratification within the models.
mod1_strat1 = svyglm(healthb ~ race*inc + edu + mar, design = des,
family = binomial) #race*inc
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
regTermTest(mod1_strat1, test.terms = ~race:inc, method = "Wald", df = NULL)
## Wald test for race:inc
## in svyglm(formula = healthb ~ race * inc + edu + mar, design = des,
## family = binomial)
## F = 1.110701 on 6 and 10056 df: p= 0.35311
Unfortunately, we see that our interaction term is not significant, so we may conclude that the effects of income are fairly constant across race groups.
Because education seemed to have the largest effect across our four models, we will next examine education’s effect by race:
mod1_strat2 = svyglm(healthb ~ race*edu + inc + mar, design = des,
family = binomial) #race*edu
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
regTermTest(mod1_strat2, test.terms = ~race:edu, method = "Wald", df = NULL)
## Wald test for race:edu
## in svyglm(formula = healthb ~ race * edu + inc + mar, design = des,
## family = binomial)
## F = 3.732227 on 6 and 10056 df: p= 0.0010362
Here we see that the interaction term in this model is significant, so we may conclude, at the .05 level of significance, that the effects of education are not constant by race.
Now we will stratify the model.
fit.unrestricted <- svyglm(healthb ~ race + edu + inc + mar, design = des,
family = binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_white <- svyglm(healthb~(edu + inc + mar),
design = subset(des, race=="nhw"),
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_black <- svyglm(healthb~(edu + inc + mar),
design = subset(des, race=="nhb"),
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_hisp <- svyglm(healthb~(edu + inc + mar),
design = subset(des, race=="hisp"),
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_other <- svyglm(healthb~(edu + inc + mar),
design = subset(des, race=="other"),
family = "binomial",
data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
stargazer(mod_white, mod_black, mod_hisp, mod_other,
type = "html",
covariate.labels = c("High School", "College", "$75+", "LT $25k", "Married", "Constant"),
column.labels = c("NH White", "NH Black", "Hispanic", "Other"),
ci = T)
Dependent variable: healthb

| | NH White | NH Black | Hispanic | Other |
|---|---|---|---|---|
| | (1) | (2) | (3) | (4) |
| High School | 0.190 | 0.227 | 0.567*** | 1.223 |
| | (-0.228, 0.608) | (-0.510, 0.963) | (0.231, 0.903) | (-0.634, 3.080) |
| College | 0.546** | 2.541*** | 1.426*** | 1.911* |
| | (0.096, 0.996) | (1.451, 3.631) | (0.706, 2.146) | (-0.146, 3.968) |
| $75+ | -0.465 | -1.296* | -0.676*** | -1.411 |
| | (-1.133, 0.203) | (-2.650, 0.058) | (-1.058, -0.294) | (-3.158, 0.336) |
| LT $25k | 0.758*** | 1.172*** | 0.761*** | 0.365 |
| | (0.365, 1.151) | (0.462, 1.882) | (0.379, 1.143) | (-1.593, 2.323) |
| Married | -0.009 | 0.295 | 0.120 | 1.649** |
| | (-0.348, 0.331) | (-0.455, 1.044) | (-0.202, 0.442) | (0.341, 2.958) |
| Constant | -1.776*** | -1.874*** | -1.854*** | -3.960*** |
| | (-2.165, -1.386) | (-2.590, -1.157) | (-2.195, -1.514) | (-5.210, -2.709) |
| Observations | 2,664 | 839 | 6,093 | 475 |
| Log Likelihood | -1,355.792 | -346.636 | -2,442.896 | -88.261 |
| Akaike Inf. Crit. | 2,723.585 | 705.273 | 4,897.791 | 188.522 |

Note: *p<0.1; **p<0.05; ***p<0.01
Quite frankly, I’m a little perplexed by the results of the stratification. I’m pretty sure I’ve got the labels correct and the categories correct. I’m going to interpret based on the results and forgo any knowledge I may have on the matter.
Based on the stratification, we can see that the estimated effects of education worsen for all race groups relative to a person who did not finish high school. This effect is significant for Hispanics who complete high school and for every group at the college level.
High income appears protective for both NH Blacks and Hispanics, while earning less than $25k is harmful to every group except the other category, compared with those who make between $25k and $75k.
The effect of getting married appears harmful to those in the other race category and does not appear to affect the other groups.
Overall, we can see that there is stratification by race across these categories.
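One way to check that conclusion more formally would be a single pooled model with every race interaction and a joint Wald test of those interaction terms (a sketch, not part of the original analysis):
# Pool all race interactions in one model and test them jointly
mod_allint <- svyglm(healthb ~ race * (edu + inc + mar), design = des, family = binomial)
regTermTest(mod_allint, test.terms = ~race:edu + race:inc + race:mar, method = "Wald", df = NULL)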