For the following task, I am interested in looking at whether mortality as a binary outcome is affected by educational attainment and race/ethnicity which will be treated as categorical variables.

Research Questions: I am interested in answering the following couple of questoins: 1. Does mortality vary with increased educational attainment? 2. Does mortality vary across racial and ethinic population?

The reason, I am investigating these two questions is because, education is a major moderator for mortality as an outcome. People are exposed to education from an early age and education determines various later life attainments which can ensure the conditions required for a healty life, which in turn can affect the mortality outcome. Similarly, mortality has long been affected by the socioeconomic status of the population from the various racial and ethnic groups. The distal factors are critical in determining the broader socioeconomic context that affect mortality outcome. Race and ethnicity is one of those factors that determine the life expectancy at birth, and that varies across the life course. Hence I am interested to look at the variation in mortality outcome by race and ethnicity.

To address these two questions, I am going to use NHIS-LMF 1999-2009 dataset. However, it is a large dataset which is difficult to manage in regular computing capacity. Hence, I will be using a subset of around 20,000 samples from the total population.

Recoding Variables:

Mortality, as the main outcome variable, will be recoded as a binary outcome. That means, whether someone is dead or not. Education, as one of the moderators, will be categorized into three groups: High School or Less, Some College, and College. Race/Ethnicity, as another moderator, will be categorized into four groups: NHWhite, NHBlack, Hispanic, and NHother. Age is recoded to break into the following age groups: 18-24, 24-35, 35-45, 45-65, 65+.

Loading Libraries

library(readr)
library(car)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.1. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(survival)
library(cmprsk)
library(questionr)
library(haven)
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:Matrix':
## 
##     expand

Reading data

mydata <- read_dta("C://Users/munta/Google Drive/NHIS/nhis_00005.dta")
names(mydata)
##   [1] "year"         "serial"       "quarter"      "strata"      
##   [5] "psu"          "nhishid"      "hhweight"     "region"      
##   [9] "livingqtr"    "nhispid"      "hhx"          "fmx"         
##  [13] "px"           "perweight"    "sampweight"   "fweight"     
##  [17] "astatflg"     "cstatflg"     "pernum"       "supp2wt"     
##  [21] "age"          "sex"          "marstat"      "marst"       
##  [25] "marstcohab"   "cohabmarst"   "cohabevmar"   "birthmo"     
##  [29] "birthyr"      "racea"        "hispeth"      "racesr"      
##  [33] "yrsinus"      "hispyn"       "usborn"       "citizen"     
##  [37] "racenew"      "racebr"       "raceimpute"   "educrec2"    
##  [41] "educrec1"     "higrade2"     "higrade1"     "educ"        
##  [45] "empstat"      "occ1995"      "occ"          "ind1995"     
##  [49] "ind"          "hourswrk"     "monthwrk"     "numemps"     
##  [53] "secondjob"    "paidhour"     "paidsick"     "usualft"     
##  [57] "empstatwkyr"  "pooryn"       "incfam97on2"  "earnimp1"    
##  [61] "earnimp2"     "earnimp3"     "earnimp4"     "earnimp5"    
##  [65] "health"       "weight"       "bedayr2"      "bmicalc"     
##  [69] "bmi"          "bedayr"       "alc1yr"       "alclife"     
##  [73] "alc5upyr"     "alcamt"       "alcstat1"     "alcstat2"    
##  [77] "alcanyno"     "alcanytp"     "alcdaysmo"    "alcdayswk"   
##  [81] "alcdaysyr"    "alc5upno"     "alc5uptp"     "smokev"      
##  [85] "smokagereg"   "cigdaymo"     "cigsday"      "cigsday1"    
##  [89] "cigsday2"     "smokestatus1" "smokestatus2" "smokfreqnow" 
##  [93] "mortelig"     "mortstat"     "mortdodq"     "mortdody"    
##  [97] "mortucod"     "mortucodld"   "mortwt"       "mortdiab"    
## [101] "morthypr"     "mortcms"      "mortndi"      "mortssa"     
## [105] "mortwtsa"

Taking a Subset of the Population and Recoding Variables

#Taking a subset of the total observations
sub<-subset(mydata, mydata$mortelig==1&is.na(mydata$racenew)==F)

samps<-sample(1:length(sub$psu), size = 20000, replace = F)
sub<-sub[samps,] #Using a subset of 20000 samples

#Age
sub$d.age<-ifelse(sub$mortstat==1, sub$mortdody-(sub$year-sub$age) ,
                  ifelse(sub$mortstat==2, 2006-(sub$year-sub$age), NA))
#Mortality Status
sub$d.event<-ifelse(sub$mortstat==1,1,0)
sub$timetodeath<-ifelse(sub$mortstat ==1, sub$mortdody-sub$year  , 2006 - sub$year )
sub$d5yr<-ifelse(sub$timetodeath<=5&sub$mortstat==1, 1,0)

#Marital Status
sub$married<-recode(sub$marstat, recodes="00=NA; 10:13='married'; 20:40='sep'; 50='nm'; 99=NA" ,as.factor.result=T )
sub$sep<-ifelse(sub$married=='sep',1,0)
sub$nm<-ifelse(sub$married=='nm',1,0)

sub$male<-ifelse(sub$sex==1,1,0)

sub$mwt<-sub$mortwt/mean(sub$mortwt, na.rm=T)

#Age cut into intervals
sub$agere<-cut(sub$age, breaks=c(15,18,24,35,45,65,85))

#Education 
sub$moredu<-recode(sub$educrec2, recodes="00=NA; 10:42='hs or less'; 50:53='some coll'; 54:60='coll'; else=NA", as.factor.result=T)

sub$hs<-ifelse(sub$moredu=='hs or less',1,0)
sub$scol1<-ifelse(sub$moredu=='some coll',1,0)

#Race/Ethnicity
sub$race<-recode(sub$racenew, recodes ="10='wht'; 20 ='blk'; 30:61='other'; 97:99=NA", as.factor.result=T)
sub$black<-ifelse(sub$race=='blk',1,0)
sub$oth<-ifelse(sub$race=='other',1,0)

sub$hisp<-recode(sub$hispyn, recodes="1=0; 2=1; else=NA")

sub$race_eth[sub$hisp == 0 & sub$race=="wht"]<-"NHWhite"
## Warning: Unknown or uninitialised column: 'race_eth'.
sub$race_eth[sub$hisp == 0 & sub$race=="blk"]<-"NHBlack"
sub$race_eth[sub$hisp == 0 & sub$race=="other"]<-"NHother"
sub$race_eth[sub$hisp == 1 ]<-"Hispanic"
sub$race_eth[is.na(sub$hisp) ==T | is.na(sub$race)==T]<-NA

Analysis

Descriptive analysis of the outcome variable

##Education 
#Raw frequencies
table(sub$mortstat, sub$moredu) 
##    
##     coll hs or less some coll
##   1  197       1003       281
##   2 4231       8809      5132
#column percentages
prop.table(table(sub$mortstat, sub$moredu), margin=2)
##    
##           coll hs or less  some coll
##   1 0.04448961 0.10222177 0.05191206
##   2 0.95551039 0.89777823 0.94808794
#basic chi square test of independence
chisq.test(table(sub$mortstat, sub$moredu))
## 
##  Pearson's Chi-squared test
## 
## data:  table(sub$mortstat, sub$moredu)
## X-squared = 204.88, df = 2, p-value < 2.2e-16
##Race/Ethnicity 
#Raw frequencies
table(sub$mortstat, sub$race_eth) 
##    
##     Hispanic NHBlack NHother NHWhite
##   1      187     229      43    1054
##   2     3895    2502    1068   11018
#column percentages
prop.table(table(sub$mortstat, sub$race_eth), margin=2)
##    
##       Hispanic    NHBlack    NHother    NHWhite
##   1 0.04581088 0.08385207 0.03870387 0.08730948
##   2 0.95418912 0.91614793 0.96129613 0.91269052
#basic chi square test of independence
chisq.test(table(sub$mortstat, sub$race_eth))
## 
##  Pearson's Chi-squared test
## 
## data:  table(sub$mortstat, sub$race_eth)
## X-squared = 99.741, df = 3, p-value < 2.2e-16

The descriptive results do not provide much reliable information because we assume random sampling. However, the dataset NHIS-LMF being used here does not assume random sampling during survery.

Creating a Survey Design Object

I try to identify the survey design, including ids = PSU identifers, strata=strata identifiers, weights=case weights, data= the data frame where these variables are located. Respondents with NON-MISSING case weights are included in the survey only.

options(survey.lonely.psu = "adjust")
des<-svydesign(ids=~psu, strata=~strata, weights=~mortwt, data = sub[sub$mortwt>0,], nest=T)

Simple weighted analysis

Re-doing the analysis from above using only survey weights:

Education

#For Education
cat<-wtd.table(sub$mortstat, sub$moredu, weights = sub$mortwt)
prop.table(wtd.table(sub$mortstat, sub$moredu, weights = sub$mortwt), margin=2)
##         coll hs or less  some coll
## 1 0.04413312 0.10280604 0.04808980
## 2 0.95586688 0.89719396 0.95191020
#compare that with the original
prop.table(table(sub$mortstat, sub$moredu), margin=2)
##    
##           coll hs or less  some coll
##   1 0.04448961 0.10222177 0.05191206
##   2 0.95551039 0.89777823 0.94808794

There is no notable difference between the weighted and unweighted output.

Race/Ethnicity

#For Race/Ethnicity
cat<-wtd.table(sub$mortstat, sub$race_eth, weights = sub$mortwt)
prop.table(wtd.table(sub$mortstat, sub$race_eth, weights = sub$mortwt), margin=2)
##     Hispanic    NHBlack    NHother    NHWhite
## 1 0.03860241 0.07635770 0.03723757 0.08153937
## 2 0.96139759 0.92364230 0.96276243 0.91846063
#compare that with the original
prop.table(table(sub$mortstat, sub$race_eth), margin=2)
##    
##       Hispanic    NHBlack    NHother    NHWhite
##   1 0.04581088 0.08385207 0.03870387 0.08730948
##   2 0.95418912 0.91614793 0.96129613 0.91269052

There are slight differences, notably that the mortality status of assumed alive is higher in the sample than the population.

We can also look at the the standard errors of these percentages. This can be found for a proportion by: \(s.e. (p)={\sqrt {p(1-p)} \over {n}}\)

So we need to get n and p:

n<-table(is.na(sub$mortstat)==F)
n
## 
##  TRUE 
## 20000
#Education
p<-prop.table(wtd.table(sub$mortstat, sub$moredu, weights = sub$mortwt), margin=2)
se<-sqrt((p*(1-p))/n[1])

data.frame(proportion=p, se=se)
##   proportion.Var1 proportion.Var2 proportion.Freq se.Var1    se.Var2
## 1               1            coll      0.04413312       1       coll
## 2               2            coll      0.95586688       2       coll
## 3               1      hs or less      0.10280604       1 hs or less
## 4               2      hs or less      0.89719396       2 hs or less
## 5               1       some coll      0.04808980       1  some coll
## 6               2       some coll      0.95191020       2  some coll
##       se.Freq
## 1 0.001452332
## 2 0.001452332
## 3 0.002147521
## 4 0.002147521
## 5 0.001512897
## 6 0.001512897
#Race/Ethnicity
p<-prop.table(wtd.table(sub$mortstat, sub$race_eth, weights = sub$mortwt), margin=2)
se<-sqrt((p*(1-p))/n[1]) #assumes random sampling

data.frame(proportion=p, se=se)
##   proportion.Var1 proportion.Var2 proportion.Freq se.Var1  se.Var2
## 1               1        Hispanic      0.03860241       1 Hispanic
## 2               2        Hispanic      0.96139759       2 Hispanic
## 3               1         NHBlack      0.07635770       1  NHBlack
## 4               2         NHBlack      0.92364230       2  NHBlack
## 5               1         NHother      0.03723757       1  NHother
## 6               2         NHother      0.96276243       2  NHother
## 7               1         NHWhite      0.08153937       1  NHWhite
## 8               2         NHWhite      0.91846063       2  NHWhite
##       se.Freq
## 1 0.001362209
## 2 0.001362209
## 3 0.001877860
## 4 0.001877860
## 5 0.001338860
## 6 0.001338860
## 7 0.001935080
## 8 0.001935080

The outputs show very small standard errors, because it has a huge sample size. We can use survey statistical methods to get the right standard error for a statistic.

Proper survey design analysis

#Considering the full sample design + weights
cat<-svytable(~mortstat+moredu, design = des)
prop.table(svytable(~mortstat+moredu, design = des), margin = 2)
##         moredu
## mortstat       coll hs or less  some coll
##        1 0.04413312 0.10280604 0.04808980
##        2 0.95586688 0.89719396 0.95191020
cat<-svytable(~mortstat+race_eth, design = des)
prop.table(svytable(~mortstat+race_eth, design = des), margin = 2)
##         race_eth
## mortstat   Hispanic    NHBlack    NHother    NHWhite
##        1 0.03860241 0.07635770 0.03723757 0.08153937
##        2 0.96139759 0.92364230 0.96276243 0.91846063

The output provide us the same %’s as the weighted table above.

Using svyby() function to calculate statistics by groups:

sv.table<-svyby(formula = ~mortstat, by = ~moredu, design = des, FUN = svymean, na.rm=T)
sv.table
##                moredu mortstat          se
## coll             coll 1.955867 0.003300489
## hs or less hs or less 1.897194 0.003417530
## some coll   some coll 1.951910 0.003110086
sv.table<-svyby(formula = ~mortstat, by = ~race_eth, design = des, FUN = svymean, na.rm=T)
sv.table
##          race_eth mortstat          se
## Hispanic Hispanic 1.961398 0.003241441
## NHBlack   NHBlack 1.923642 0.005245946
## NHother   NHother 1.962762 0.006003384
## NHWhite   NHWhite 1.918461 0.002683321

And we see the similar point estimates of our mortality outcome as in the simple weighted table; the standard errors have now been adjusted, which are slightly larger than the previous ones, for the survey design as well, so they can be accepted as correct.

Regression Analysis

For Education

An OLS model for our Mortality outcome using education and age as predictors it fitted below:

fit1<-glm(I(mortstat==1)~moredu+agere, data=sub, family = binomial)
summary(fit1)
## 
## Call:
## glm(formula = I(mortstat == 1) ~ moredu + agere, family = binomial, 
##     data = sub)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9597  -0.3439  -0.2122  -0.1226   3.4242  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -6.65390    1.00476  -6.622 3.53e-11 ***
## moreduhs or less  0.79432    0.08504   9.341  < 2e-16 ***
## moredusome coll   0.40227    0.10083   3.989 6.62e-05 ***
## agere(18,24]      0.97279    1.03939   0.936 0.349308    
## agere(24,35]      1.58029    1.01313   1.560 0.118806    
## agere(35,45]      2.46962    1.00642   2.454 0.014133 *  
## agere(45,65]      3.45329    1.00281   3.444 0.000574 ***
## agere(65,85]      5.32329    1.00218   5.312 1.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10505.8  on 19652  degrees of freedom
## Residual deviance:  8033.5  on 19645  degrees of freedom
##   (347 observations deleted due to missingness)
## AIC: 8049.5
## 
## Number of Fisher Scoring iterations: 8

The output shows that, mortality outcome is affected by college education. That means, higher educational attainment increases the likelihood of living longer life. High school or less education has the opposite affect on the mortality outcome compared to the college educated population. As for age, with increased age, likelihood of mortality increases significantly.

Then, we incorporate case weights:

fit2<-glm(I(mortstat==1)~moredu+agere, data=sub, weights = mortwt/mean(sub$mortwt, na.rm=T), family = binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
summary(fit2)
## 
## Call:
## glm(formula = I(mortstat == 1) ~ moredu + agere, family = binomial, 
##     data = sub, weights = mortwt/mean(sub$mortwt, na.rm = T))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1010  -0.3274  -0.1972  -0.1140   5.1210  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -7.26335    1.36042  -5.339 9.34e-08 ***
## moreduhs or less  0.75475    0.08288   9.107  < 2e-16 ***
## moredusome coll   0.29433    0.09933   2.963  0.00305 ** 
## agere(18,24]      1.55134    1.38896   1.117  0.26403    
## agere(24,35]      2.14165    1.36833   1.565  0.11755    
## agere(35,45]      3.09846    1.36204   2.275  0.02291 *  
## agere(45,65]      4.03499    1.35921   2.969  0.00299 ** 
## agere(65,85]      5.92559    1.35866   4.361 1.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10277  on 19551  degrees of freedom
## Residual deviance:  7842  on 19544  degrees of freedom
##   (347 observations deleted due to missingness)
## AIC: 7874
## 
## Number of Fisher Scoring iterations: 8

With the inclusion of case weights, we do not see any notable difference, except the degree of significance increases.

Now we incorporate design effects as well:

fit3<-svyglm(I(mortstat==1)~moredu+agere, des, family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
summary(fit3)
## 
## Call:
## svyglm(formula = I(mortstat == 1) ~ moredu + agere, des, family = binomial)
## 
## Survey design:
## svydesign(ids = ~psu, strata = ~strata, weights = ~mortwt, data = sub[sub$mortwt > 
##     0, ], nest = T)
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -7.26335    1.00287  -7.243 1.29e-12 ***
## moreduhs or less  0.75475    0.08825   8.552  < 2e-16 ***
## moredusome coll   0.29433    0.10821   2.720  0.00671 ** 
## agere(18,24]      1.55134    1.05172   1.475  0.14070    
## agere(24,35]      2.14165    1.00091   2.140  0.03276 *  
## agere(35,45]      3.09846    1.00971   3.069  0.00224 ** 
## agere(45,65]      4.03499    0.99838   4.042 5.96e-05 ***
## agere(65,85]      5.92559    0.99374   5.963 4.12e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 0.9570914)
## 
## Number of Fisher Scoring iterations: 8

The results for the education levels do not vary much compared to either of the other two analysis. The t-statistic is also similar to the previous ones.

For Race/Ethnicity

fit4<-glm(I(mortstat==1)~race_eth+agere, data=sub, family=binomial)
summary(fit4)
## 
## Call:
## glm(formula = I(mortstat == 1) ~ race_eth + agere, family = binomial, 
##     data = sub)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9595  -0.3664  -0.2241  -0.1415   3.4553  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -5.96706    1.00337  -5.947 2.73e-09 ***
## race_ethNHBlack  0.25579    0.11090   2.307  0.02108 *  
## race_ethNHother -0.40767    0.18289  -2.229  0.02581 *  
## race_ethNHWhite  0.03778    0.08877   0.426  0.67043    
## agere(18,24]     0.83339    1.03921   0.802  0.42259    
## agere(24,35]     1.36781    1.01295   1.350  0.17691    
## agere(35,45]     2.25773    1.00621   2.244  0.02485 *  
## agere(45,65]     3.26180    1.00266   3.253  0.00114 ** 
## agere(65,85]     5.17453    1.00221   5.163 2.43e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10719.9  on 19995  degrees of freedom
## Residual deviance:  8299.5  on 19987  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 8317.5
## 
## Number of Fisher Scoring iterations: 8

For the, non-Hispanic White population, changes of being alive as a morality outcome is significant, compared to the rest of the race and ethnic population.

Then we incorporate case weights:

fit5<-glm(I(mortstat==1)~race_eth+agere, data=sub, weights = mortwt/mean(sub$mortwt, na.rm=T), family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
summary(fit5)
## 
## Call:
## glm(formula = I(mortstat == 1) ~ race_eth + agere, family = binomial, 
##     data = sub, weights = mortwt/mean(sub$mortwt, na.rm = T))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1992  -0.3448  -0.2055  -0.1177   5.2245  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -6.7508     1.3618  -4.957 7.14e-07 ***
## race_ethNHBlack   0.4518     0.1408   3.209  0.00133 ** 
## race_ethNHother  -0.1858     0.2016  -0.921  0.35683    
## race_ethNHWhite   0.1564     0.1170   1.336  0.18148    
## agere(18,24]      1.3827     1.3888   0.996  0.31942    
## agere(24,35]      1.9103     1.3681   1.396  0.16264    
## agere(35,45]      2.8709     1.3618   2.108  0.03503 *  
## agere(45,65]      3.8322     1.3590   2.820  0.00481 ** 
## agere(65,85]      5.7794     1.3586   4.254 2.10e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10472.8  on 19893  degrees of freedom
## Residual deviance:  8088.4  on 19885  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 8121.3
## 
## Number of Fisher Scoring iterations: 8

We do not see much difference, except in the degree of significance.

Now we incorporate design effects as well:

fit6<-svyglm(I(mortstat==1)~race_eth+agere,des, family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
summary(fit6)
## 
## Call:
## svyglm(formula = I(mortstat == 1) ~ race_eth + agere, des, family = binomial)
## 
## Survey design:
## svydesign(ids = ~psu, strata = ~strata, weights = ~mortwt, data = sub[sub$mortwt > 
##     0, ], nest = T)
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -6.7508     0.9969  -6.772 2.91e-11 ***
## race_ethNHBlack   0.4518     0.1255   3.600 0.000343 ***
## race_ethNHother  -0.1858     0.2082  -0.892 0.372718    
## race_ethNHWhite   0.1564     0.1020   1.533 0.125888    
## agere(18,24]      1.3827     1.0504   1.316 0.188503    
## agere(24,35]      1.9103     0.9991   1.912 0.056322 .  
## agere(35,45]      2.8709     1.0093   2.844 0.004593 ** 
## agere(45,65]      3.8322     0.9981   3.840 0.000136 ***
## agere(65,85]      5.7794     0.9947   5.810 9.90e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1.00078)
## 
## Number of Fisher Scoring iterations: 8

With design effect, we do not see much variation in the output; Also, the t-statistic is similar to the previous two models.

Showing the results of the three models

For Education and Mortality

stargazer(fit1, fit2, fit3, style="demography", type="html",
          column.labels = c("OLS", "Weights", "Survey"),
          title = "Regression models for Mortality Status by Education using survey data - NHIS2009", 
          covariate.labels=c("College", "High School or Less", "Some College", "Age 18-24","Age 24-35" ,"Age 35-45", "Age 45-65", "Age 65+"), 
          keep.stat="n", model.names=F, align=T, ci=T)
Regression models for Mortality Status by Education using survey data - NHIS2009
I(mortstat == 1)
OLS Weights Survey
Model 1 Model 2 Model 3
College 0.794*** 0.755*** 0.755***
(0.628, 0.961) (0.592, 0.917) (0.582, 0.928)
High School or Less 0.402*** 0.294** 0.294**
(0.205, 0.600) (0.100, 0.489) (0.082, 0.506)
Some College 0.973 1.551 1.551
(-1.064, 3.010) (-1.171, 4.274) (-0.510, 3.613)
Age 18-24 1.580 2.142 2.142*
(-0.405, 3.566) (-0.540, 4.824) (0.180, 4.103)
Age 24-35 2.470* 3.098* 3.098**
(0.497, 4.442) (0.429, 5.768) (1.119, 5.077)
Age 35-45 3.453*** 4.035** 4.035***
(1.488, 5.419) (1.371, 6.699) (2.078, 5.992)
Age 45-65 5.323*** 5.926*** 5.926***
(3.359, 7.288) (3.263, 8.589) (3.978, 7.873)
Age 65+ -6.654*** -7.263*** -7.263***
(-8.623, -4.685) (-9.930, -4.597) (-9.229, -5.298)
N 19,653 19,653 19,552
p < .05; p < .01; p < .001

For Race/Ethnicity and Mortality

stargazer(fit4, fit5, fit6, style="demography", type="html",
          column.labels = c("OLS", "Weights", "Survey"),
          title = "Regression models for Mortality Status by Race/Ethnicty using survey data - NHIS2009", 
          covariate.labels=c("Hispanic", "NHBlack", "NHother", "NHWhite", "Age 18-24","Age 24-35" ,"Age 35-45", "Age 45-65", "Age 65+"), 
          keep.stat="n", model.names=F, align=T, ci=T)
Regression models for Mortality Status by Race/Ethnicty using survey data - NHIS2009
I(mortstat == 1)
OLS Weights Survey
Model 1 Model 2 Model 3
Hispanic 0.256* 0.452** 0.452***
(0.038, 0.473) (0.176, 0.728) (0.206, 0.698)
NHBlack -0.408* -0.186 -0.186
(-0.766, -0.049) (-0.581, 0.209) (-0.594, 0.222)
NHother 0.038 0.156 0.156
(-0.136, 0.212) (-0.073, 0.386) (-0.044, 0.356)
NHWhite 0.833 1.383 1.383
(-1.203, 2.870) (-1.339, 4.105) (-0.676, 3.441)
Age 18-24 1.368 1.910 1.910
(-0.618, 3.353) (-0.771, 4.592) (-0.048, 3.868)
Age 24-35 2.258* 2.871* 2.871**
(0.286, 4.230) (0.202, 5.540) (0.893, 4.849)
Age 35-45 3.262** 3.832** 3.832***
(1.297, 5.227) (1.169, 6.496) (1.876, 5.788)
Age 45-65 5.175*** 5.779*** 5.779***
(3.210, 7.139) (3.117, 8.442) (3.830, 7.729)
Age 65+ -5.967*** -6.751*** -6.751***
(-7.934, -4.000) (-9.420, -4.082) (-8.705, -4.797)
N 19,996 19,996 19,894
p < .05; p < .01; p < .001

The output above shows the same \(\beta\)’s between the survey design model. The weights give us unbiased coefficients.

Hence, our overall interpretation of the effects do not change, positive effects of educational attainment and negative effects of age. Also,that the non-Hispanic White population has a better likelihood of being alive compared to the other racial and ethnic population.

Plot

The following plots show us how different the coefficients are from one another,

For Education and Mortality

plot(coef(fit1)[-1], ylab="Beta parameters",ylim=c(-2, 4), xlab=NULL,axes=T,xaxt="n",main=expression(paste(beta, " Parameters from Survey Regression and non survey regression models")))
axis(side=1, at=1:8, labels=F)
text(x=1:8, y=-2.5,  srt = 45, pos = 1, xpd = TRUE,labels = c( "College", "High School or Less", "Some College", "18_24", "24_35", "35_45", "45_65", "65+"))
#add the coefficients for the unweighted model
points(coef(fit3)[-1], col=2, pch=4, cex=1.5)
abline(h=0, lty=2)
legend("topleft", legend=c("Non-survey model", "Survey Model"), col=c(1,2), pch=c(1,4))

For Race/Ethnicity and Mortality

plot(coef(fit4)[-1], ylab="Beta parameters",ylim=c(-2, 4), xlab=NULL,axes=T,xaxt="n", main=expression(paste(beta, " Parameters from Survey Regression and non survey regression models")))
axis(side=1, at=1:8, labels=F)
text(x=1:8, y=-2.5,  srt = 45, pos = 1, xpd = TRUE,labels = c( "Hispanic", "NHBlack", "NHother", "NHWhite", "18_24", "24_35", "35_45", "45_65", "65+"))
#add the coefficients for the unweighted model
points(coef(fit6)[-1], col=2, pch=4, cex=1.5)
abline(h=0, lty=2)
legend("topleft", legend=c("Non-survey model", "Survey Model"), col=c(1,2), pch=c(1,4))

Which shows us that the betas are similar but have some differences between the two models. Also the plot show the trajectory of what we have already seen in the output of the above models.