3. Construct a Dicjotomous outcome variable.

Healthb will be my dichotomous outcome variable. 1 for fair/poor health, 0 for good or better.

4. Construct relevant predictor/control variables for your outcome based on literature from your area of interest.

For predictors I’ll be using:
race: 4 standard groupings
edu: educationlths, hs , college plus
mar: marriage status 1 for yes, 0 for no
inc: income based on three categories, < $25k, between $25k and <$75k, >$75k

5. following the outline in class example, examine the interactions between your prepredictor variables.

Here is all of the two way interactions amongst the chosen variables:

hw3 <- hw3 %>% 
  mutate(
    int_raceedu = interaction(race, edu),
    int_raceinc = interaction(race, inc),
    int_racemar = interaction(race, mar),
    int_edumar = interaction(edu, mar),
    int_eduinc = interaction(edu, inc),
    int_marinc = interaction(mar, inc)
  )
#str(hw3)
tab_raceedu <- svyby(formula = ~healthb, by= ~hw3$int_raceedu, design = des, FUN = svymean, na.rm=T, data = hw3)
#summary(tab_raceedu)
tab_raceinc <- svyby(formula = ~healthb, by= ~hw3$int_raceinc, design = des, FUN = svymean, na.rm=T )
tab_racemar <- svyby(formula = ~healthb, by= ~hw3$int_racemar, design = des, FUN = svymean, na.rm=T )
tab_edumar <- svyby(formula = ~healthb, by= ~hw3$int_edumar, design = des, FUN = svymean, na.rm=T )
tab_eduinc <- svyby(formula = ~healthb, by= ~hw3$int_eduinc, design = des, FUN = svymean, na.rm=T )
tab_marinc <- svyby(formula = ~healthb, by= ~hw3$int_marinc, design = des, FUN = svymean, na.rm=T )

Here we can see the various interactions:

Race and Education:
print(tab_raceedu)
##               hw3$int_raceedu  healthb0   healthb1 se.healthb0 se.healthb1
## nhw.hs                 nhw.hs 0.8392704 0.16072962  0.01942879  0.01942879
## nhb.hs                 nhb.hs 0.8355380 0.16446201  0.02761881  0.02761881
## hisp.hs               hisp.hs 0.8756070 0.12439304  0.01041168  0.01041168
## other.hs             other.hs 0.9627573 0.03724266  0.01064177  0.01064177
## nhw.lths             nhw.lths 0.7670398 0.23296016  0.02731888  0.02731888
## nhb.lths             nhb.lths 0.7304123 0.26958770  0.05024703  0.05024703
## hisp.lths           hisp.lths 0.7610689 0.23893109  0.02515227  0.02515227
## other.lths         other.lths 0.8579312 0.14206879  0.09042507  0.09042507
## nhw.college       nhw.college 0.6620700 0.33792998  0.02933333  0.02933333
## nhb.college       nhb.college 0.1653317 0.83466832  0.06294037  0.06294037
## hisp.college     hisp.college 0.4952952 0.50470482  0.07574091  0.07574091
## other.college   other.college 0.6542786 0.34572137  0.25044559  0.25044559
Race and Income:
print(tab_raceinc)
##                       hw3$int_raceinc  healthb0   healthb1 se.healthb0
## nhw.$25k to $75k     nhw.$25k to $75k 0.8250697 0.17493031  0.02141843
## nhb.$25k to $75k     nhb.$25k to $75k 0.8285003 0.17149969  0.03862119
## hisp.$25k to $75k   hisp.$25k to $75k 0.8237947 0.17620528  0.01814508
## other.$25k to $75k other.$25k to $75k 0.9057941 0.09420592  0.05358624
## nhw.gt $75k               nhw.gt $75k 0.8983927 0.10160734  0.02732769
## nhb.gt $75k               nhb.gt $75k 0.9478421 0.05215792  0.02937425
## hisp.gt $75k             hisp.gt $75k 0.9089065 0.09109353  0.01244464
## other.gt $75k           other.gt $75k 0.9761002 0.02389980  0.01162920
## nhw.lt $25k               nhw.lt $25k 0.6606190 0.33938102  0.02386725
## nhb.lt $25k               nhb.lt $25k 0.5476192 0.45238085  0.05279069
## hisp.lt $25k             hisp.lt $25k 0.6513170 0.34868302  0.03126566
## other.lt $25k           other.lt $25k 0.8452794 0.15472060  0.09131812
##                    se.healthb1
## nhw.$25k to $75k    0.02141843
## nhb.$25k to $75k    0.03862119
## hisp.$25k to $75k   0.01814508
## other.$25k to $75k  0.05358624
## nhw.gt $75k         0.02732769
## nhb.gt $75k         0.02937425
## hisp.gt $75k        0.01244464
## other.gt $75k       0.01162920
## nhw.lt $25k         0.02386725
## nhb.lt $25k         0.05279069
## hisp.lt $25k        0.03126566
## other.lt $25k       0.09131812
Race and Marital Status:
print(tab_racemar)
##         hw3$int_racemar  healthb0   healthb1 se.healthb0 se.healthb1
## nhw.n             nhw.n 0.7415125 0.25848750  0.02163885  0.02163885
## nhb.n             nhb.n 0.7097678 0.29023220  0.03725652  0.03725652
## hisp.n           hisp.n 0.7962975 0.20370245  0.01741462  0.01741462
## other.n         other.n 0.9633759 0.03662414  0.01125318  0.01125318
## nhw.y             nhw.y 0.7624572 0.23754282  0.02105674  0.02105674
## nhb.y             nhb.y 0.8335686 0.16643139  0.03770840  0.03770840
## hisp.y           hisp.y 0.8553331 0.14466693  0.01326465  0.01326465
## other.y         other.y 0.8858774 0.11412255  0.05147940  0.05147940
Education and Income:
print(tab_eduinc)
##                            hw3$int_eduinc  healthb0   healthb1 se.healthb0
## hs.$25k to $75k           hs.$25k to $75k 0.8692611 0.13073894 0.013632071
## lths.$25k to $75k       lths.$25k to $75k 0.8242474 0.17575260 0.024250413
## college.$25k to $75k college.$25k to $75k 0.6665457 0.33345434 0.053511748
## hs.gt $75k                     hs.gt $75k 0.9285984 0.07140161 0.009551788
## lths.gt $75k                 lths.gt $75k 0.8916388 0.10836119 0.028146042
## college.gt $75k           college.gt $75k 0.4585234 0.54147663 0.156258593
## hs.lt $25k                     hs.lt $25k 0.7202350 0.27976496 0.026443039
## lths.lt $25k                 lths.lt $25k 0.6454609 0.35453911 0.029776438
## college.lt $25k           college.lt $25k 0.5998525 0.40014754 0.033345586
##                      se.healthb1
## hs.$25k to $75k      0.013632071
## lths.$25k to $75k    0.024250413
## college.$25k to $75k 0.053511748
## hs.gt $75k           0.009551788
## lths.gt $75k         0.028146042
## college.gt $75k      0.156258593
## hs.lt $25k           0.026443039
## lths.lt $25k         0.029776438
## college.lt $25k      0.033345586
Education and Marital Status:
print(tab_edumar)
##           hw3$int_edumar  healthb0   healthb1 se.healthb0 se.healthb1
## hs.n                hs.n 0.8292242 0.17077579 0.014545894 0.014545894
## lths.n            lths.n 0.7559399 0.24406008 0.022999963 0.022999963
## college.n      college.n 0.6166494 0.38335065 0.039467225 0.039467225
## hs.y                hs.y 0.9000689 0.09993109 0.009302129 0.009302129
## lths.y            lths.y 0.7749300 0.22507001 0.025622266 0.025622266
## college.y      college.y 0.6124883 0.38751173 0.039782717 0.039782717
Marital Status and Income:
print(tab_marinc)
##                hw3$int_marinc  healthb0   healthb1 se.healthb0 se.healthb1
## n.$25k to $75k n.$25k to $75k 0.8420128 0.15798721  0.01755181  0.01755181
## y.$25k to $75k y.$25k to $75k 0.8188233 0.18117672  0.01863789  0.01863789
## n.gt $75k           n.gt $75k 0.9308591 0.06914091  0.01651442  0.01651442
## y.gt $75k           y.gt $75k 0.9119315 0.08806853  0.01179253  0.01179253
## n.lt $25k           n.lt $25k 0.6575462 0.34245379  0.02116671  0.02116671
## y.lt $25k           y.lt $25k 0.6426850 0.35731503  0.03194328  0.03194328
5a) examine your model for potential stratification by one or more of your predictors.

We’ll examine 4 different models:
Model 1 will examine how being of a certain race effects one’s health status
Model 2 will add the effects of being married
Model 3 will add the effect of education Model 4 will add the effect of a person’s income

mod1 <- svyglm(healthb~race,
               design = des,
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod2 <- svyglm(healthb~race+mar,
               design = des,
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod3 <- svyglm(healthb~race+mar+edu,
               design = des,
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod4 <- svyglm(healthb~race+mar+edu+inc,
               design = des,
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!

Model 1

summary(mod1)
## 
## Call:
## svyglm(formula = healthb ~ race, design = des, family = "binomial", 
##     data = hw3)
## 
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.11060    0.08104 -13.704  < 2e-16 ***
## racenhb     -0.01608    0.17143  -0.094  0.92526    
## racehisp    -0.48405    0.11077  -4.370 1.26e-05 ***
## raceother   -1.31065    0.42337  -3.096  0.00197 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1.000099)
## 
## Number of Fisher Scoring iterations: 4

Model 2

summary(mod2)
## 
## Call:
## svyglm(formula = healthb ~ race + mar, design = des, family = "binomial", 
##     data = hw3)
## 
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.97468    0.09360 -10.413  < 2e-16 ***
## racenhb     -0.05566    0.17012  -0.327  0.74353    
## racehisp    -0.46334    0.11147  -4.157 3.25e-05 ***
## raceother   -1.29507    0.42719  -3.032  0.00244 ** 
## mary        -0.27461    0.10472  -2.622  0.00875 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1.002463)
## 
## Number of Fisher Scoring iterations: 4
regTermTest(mod2, test.terms =  ~mar, method = "Wald", df = NULL)
## Wald test for mar
##  in svyglm(formula = healthb ~ race + mar, design = des, family = "binomial", 
##     data = hw3)
## F =  6.876526  on  1  and  10066  df: p= 0.0087467

The results of the F test suggest that the second model fits the data better

Model 3

summary(mod3)
## 
## Call:
## svyglm(formula = healthb ~ race + mar + edu, design = des, family = "binomial", 
##     data = hw3)
## 
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.8373     0.1487 -12.358  < 2e-16 ***
## racenhb       0.4424     0.1796   2.464   0.0138 *  
## racehisp      0.1022     0.1352   0.756   0.4496    
## raceother    -0.7882     0.4254  -1.853   0.0639 .  
## mary         -0.2602     0.1077  -2.416   0.0157 *  
## edulths       0.6937     0.1262   5.499 3.92e-08 ***
## educollege    1.4949     0.1664   8.985  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 0.9950085)
## 
## Number of Fisher Scoring iterations: 4
#for some reason Rmarkdown would not knit with this line active, so I # it out
#but included the results
#regTermTest(mod3, test.terms =  ~inc, method = "Wald", df = NULL)
#results: Wald test for inc
 #in svyglm(formula = healthb ~ edu + mar + inc + race + race * edu + 
 #   race * inc + race * mar + edu * mar + edu * inc, design = des, 
  #  family = "binomial", data = hw3)
#F =  10.77327  on  2  and  10041  df: p= 2.1195e-05 

The results of the F test suggest that this model most likely fits the data the best.

Model 4
summary(mod4)
## 
## Call:
## svyglm(formula = healthb ~ race + mar + edu + inc, design = des, 
##     family = "binomial", data = hw3)
## 
## Survey design:
## svydesign(ids = ~1, weights = ~wt, data = hw3)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.11600    0.16431 -12.878  < 2e-16 ***
## racenhb      0.51210    0.18190   2.815 0.004881 ** 
## racehisp     0.35510    0.13885   2.557 0.010561 *  
## raceother   -0.66787    0.41277  -1.618 0.105685    
## mary         0.07281    0.11498   0.633 0.526578    
## edulths      0.44697    0.12922   3.459 0.000544 ***
## educollege   0.99026    0.17976   5.509 3.70e-08 ***
## incgt $75k  -0.69517    0.16230  -4.283 1.86e-05 ***
## inclt $25k   0.79733    0.13037   6.116 9.97e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 0.9930785)
## 
## Number of Fisher Scoring iterations: 4
regTermTest(mod4, test.terms =  ~inc, method = "Wald", df = NULL)
## Wald test for inc
##  in svyglm(formula = healthb ~ race + mar + edu + inc, design = des, 
##     family = "binomial", data = hw3)
## F =  38.89009  on  2  and  10062  df: p= < 2.22e-16

The F test suggests that this model fits the data the best.

Looking at all of the models together and we see the following:

stargazer(mod1, mod2, mod3, mod4,
          type = "html",
          style = "demography",
          ci = T,
          column.sep.width = "10pt")
healthb
Model 1 Model 2 Model 3 Model 4
racenhb -0.016 -0.056 0.442* 0.512**
(-0.352, 0.320) (-0.389, 0.278) (0.091, 0.794) (0.156, 0.869)
racehisp -0.484*** -0.463*** 0.102 0.355*
(-0.701, -0.267) (-0.682, -0.245) (-0.163, 0.367) (0.083, 0.627)
raceother -1.311** -1.295** -0.788 -0.668
(-2.140, -0.481) (-2.132, -0.458) (-1.622, 0.046) (-1.477, 0.141)
mary -0.275** -0.260* 0.073
(-0.480, -0.069) (-0.471, -0.049) (-0.153, 0.298)
edulths 0.694*** 0.447***
(0.446, 0.941) (0.194, 0.700)
educollege 1.495*** 0.990***
(1.169, 1.821) (0.638, 1.343)
75k -0.695***
(-1.013, -0.377)
25k 0.797***
(0.542, 1.053)
Constant -1.111*** -0.975*** -1.837*** -2.116***
(-1.269, -0.952) (-1.158, -0.791) (-2.129, -1.546) (-2.438, -1.794)
N 10,071 10,071 10,071 10,071
Log Likelihood -4,750.214 -4,737.609 -4,524.720 -4,360.229
AIC 9,508.427 9,485.218 9,063.440 8,738.457
p < .05; p < .01; p < .001

Here we can see that the effect of being of hispanic descent is protective, as compared to non-Hispanic white population.

In model 2, when the effect of marriage is accounted for, we can see that being of hispanic descent is slightly less protective.

In model 3, when education is accounted for, we observe that being of Hispanic descent is no longer protective in regards to being in poor health.

In model 4, when income is accounted for, we can see that being of hispanic descent is now more likely to be of poor or fair health than good health.

5b) What evidence do you use to justify your stratification?

Looking at the nesting of our models, we can see that when adding education to the model, we have fully accounted for the being of Hispanic descent on a person’s self reported health status. but when we add income to the model, we see that being of hispanic descent is now harmful in regards to self reported health. This bears further investigation and is eveidence of there being stratification within the models.

6) Stratify your model by one or more of your predictors

mod1_strat1 = svyglm(healthb ~ race*inc + edu + mar, design = des, 
family = binomial) #race*inc
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
regTermTest(mod1_strat1, test.terms = ~race:inc, method = "Wald", df = NULL)
## Wald test for race:inc
##  in svyglm(formula = healthb ~ race * inc + edu + mar, design = des, 
##     family = binomial)
## F =  1.110701  on  6  and  10056  df: p= 0.35311

Unfortunately, we see that our interaction term is not significant, so we may conclude that the effects of income are fairly constant by race.

Because Education seemed to have the largest effect of our 4 models, we will next examine educations effect by race:

mod1_strat2 = svyglm(healthb ~ race*edu + inc + mar, design = des, 
family = binomial) #race*inc
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
regTermTest(mod1_strat2, test.terms = ~race:edu, method = "Wald", df = NULL)
## Wald test for race:edu
##  in svyglm(formula = healthb ~ race * edu + inc + mar, design = des, 
##     family = binomial)
## F =  3.732227  on  6  and  10056  df: p= 0.0010362

Here we see that the interaction term in this model is significant, so we may conclude, at the .05 level of significance, that the effects of education are not constant by race.

Now we will stratify the model.

fit.unrestricted <- svyglm(healthb ~ race + edu + inc + mar, design = des, 
family = binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_white <- mod1 <- svyglm(healthb~(edu + inc + mar),
               design = subset(des, race=="nhw"),
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_black <- mod1 <- svyglm(healthb~(edu + inc + mar),
               design = subset(des, race=="nhb"),
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_hisp <- mod1 <- svyglm(healthb~(edu + inc + mar),
               design = subset(des, race=="hisp"),
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
mod_other <- mod1 <- svyglm(healthb~(edu + inc + mar),
               design = subset(des, race=="other"),
               family = "binomial",
               data = hw3)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
stargazer(mod_white, mod_black, mod_hisp, mod_other,
          type = "html",
          covariate.labels = c("High School", "College", "$75+", "LT $25k", "Married", "Constant"),
          column.labels = c("NH White", "NH Black", "Hispanic", "Other"),
          ci = T)
Dependent variable:
healthb
NH White NH Black Hispanic Other
(1) (2) (3) (4)
High School 0.190 0.227 0.567*** 1.223
(-0.228, 0.608) (-0.510, 0.963) (0.231, 0.903) (-0.634, 3.080)
College 0.546** 2.541*** 1.426*** 1.911*
(0.096, 0.996) (1.451, 3.631) (0.706, 2.146) (-0.146, 3.968)
75+ -0.465 -1.296* -0.676*** -1.411
(-1.133, 0.203) (-2.650, 0.058) (-1.058, -0.294) (-3.158, 0.336)
LT 25k 0.758*** 1.172*** 0.761*** 0.365
(0.365, 1.151) (0.462, 1.882) (0.379, 1.143) (-1.593, 2.323)
Married -0.009 0.295 0.120 1.649**
(-0.348, 0.331) (-0.455, 1.044) (-0.202, 0.442) (0.341, 2.958)
Constant -1.776*** -1.874*** -1.854*** -3.960***
(-2.165, -1.386) (-2.590, -1.157) (-2.195, -1.514) (-5.210, -2.709)
Observations 2,664 839 6,093 475
Log Likelihood -1,355.792 -346.636 -2,442.896 -88.261
Akaike Inf. Crit. 2,723.585 705.273 4,897.791 188.522
Note: p<0.1; p<0.05; p<0.01

6b) Is there evidence for homogeneity of regression effects across the strata? how do you know? Explain/Discuss the implications

Quite frankly, I’m a little perplexed by the results of the stratification. I’m pretty sure I’ve got the labels correct and the categories correct. I’m going to interpret based on the results and forgo any knowledge I may have on the matter.

Based on the stratification, we can see that the effects of education get worse for all races when compared to a person who does not finish high school. This effect is significant for Hispanics who complete high school and all races who have some college.

High income appears to be a protective benefit for both NH Black and Hispanics, while earning less than $25k is harmful to all races but the other category when compared to those who make between $25k to $75k.

The effect of getting married appears harmful to those in the other category of race, and does not effect the other races.

Overall we can see that there is stratification by race amongst these categories.