Homework 3

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

1 + 1

[1] 2

You can add options to executable code like this

[1] 4

The echo: false option disables the printing of code (only output is displayed).

##load libraries
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.1
✔ readr   2.1.2     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(haven)
library(survival)
library(car)

Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

library(survey)

Loading required package: grid
Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack


Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart

library(muhaz)
library(eha)

##load data



nhis9 <- read_stata("C:/Users/okabe/OneDrive/Pictures/Stats 2/nhis_00010.dta.gz")
nhis9<-zap_labels(nhis9)

#recode variables

#poor

nhis9$poor<-Recode(nhis9$pooryn, recodes ="7:9=NA; 1=0;2=1",as.factor=T)

##Alcohol

nhis9$alcohol<-Recode(nhis9$alclife, recodes ="1=1;2=0;7:9=NA;0=NA")

##smoking

nhis9$smoke<-Recode(nhis9$smokev, recodes ="7:9=NA;1=1;2=0; 0=NA")

##usborn
nhis9$born<-Recode(nhis9$usborn, recodes ="96:99=NA; 20=0;10:12=1;else=NA",as.factor=T)

##educ
nhis9$education<-Recode(nhis9$educ, recodes="102='0Prim'; 100:116='1somehs'; 200:202='2hsgrad'; 300:301='3somecol'; 400='4colgrad';500:503='5masteranddoc'; else= NA", as.factor=T)

##Recode
##citizen
nhis9$mycitizen<-Recode(nhis9$citizen, recodes =" 1=1;2=0;else=NA",as.factor=T)

##Marital status

nhis9$marital<-Recode(nhis9$marstat, recodes="10='married'; 30='divorced'; 20='widowed'; 40='separated'; 50='nm'; else=NA", as.factor=T)
nhis9$marital<-relevel(nhis9$marital, ref='married')

#Age cut into intervals 


nhis9$agec<-cut(nhis9$age, breaks = c( 30, 40, 50, 60, 70, 80, 100), include.lowest = T)

#race/ethnicity
nhis9$race<-Recode(nhis9$racesr, recodes="100='white'; 200='black'; else='other'", as.factor=T)
nhis9$race<-relevel(nhis9$race, ref = "white")


##Hispanic
nhis9$ethnicity<-Recode(nhis9$hispeth, recodes="10='not hispanic'; 20:70='hispanic'; else=NA", as.factor=T)

##disease
#nhis2$disease<- Recode(nhis2$mortucodld, recodes= "01= 0; 02=1; 03=2; 04=3;05=4; 06=5; 07=6; 08= 7; 09=8; 10=9; else=NA")

#nhis2$disease<- Recode(nhis2$mortucodld, recodes= "01= 'Diseases of heart'; 02='Malignant neoplasms'; 03='Chronic lower respiratory diseases'; 04='Accidents';05='Cerebrovascular disease'; 06='Alzheimers disease'; 07='Diabetes mellitus'; 08= 'Influenza and pneumonia'; 09='Nephritis'; 10='All other causes'; 96= NA; else=NA",as.factor=T )


##sex
nhis9$male<-as.factor(ifelse(nhis9$sex==1, "Male", "Female"))
nhis9$male<-relevel(nhis9$male, ref='Male')

##Earnings
#nhis9$inc<-Recode(brfss_17$incomg, recodes = "9= NA;1='1_lt15k'; 2='2_15-25k';3='3_25-35k';4='4_35-50k';5='5_50kplus'", as.factor = T)

##employment status
nhis9$employed<- Recode(nhis9$empstat, recodes= " 100:122='unemployed'; 200:217='Employed';220='notinlaborforce'; else=NA", as.factor=T)

Submit either a link to an Rpub that has your homework published, or as a emailed Word document Please answer the following questions, use appropriate figures and statistical output to answer the questions.

Parametric models 1) Carry out the following analysis:

Define your outcome as in HW 1. Also consider what covariates are hypothesized to affect the outcome variable. Define these and construct a parametric model for your outcome.

The outcome that I will looking at is mortality. To see if citizenship status and SES influence mortality outcomes of immigrants

Fit the parametric model of your choosing to the data.

#aft distribution for hazard
#complete.cases(nhis9$mycitizen, nhis9$education, nhis9$mortstat,nhis9$mortdody,nhis9$poor, nhis9$male)

nhis9 <- nhis9 %>%
  filter(mortelig==1)%>%
          mutate(death_age = ifelse( mortstat ==1, 
                                     mortdody - (year - age), 
                                     2018 - (year - age)), 
                 d.event = ifelse(mortstat == 1, 1, 0))%>%
  filter(complete.cases(mycitizen,education,poor,male))

 library(survival)

        age_fit <- survfit(Surv(death_age, d.event)~ 1, 
                           data = nhis9)

        library(ggsurvfit)

        age_fit %>%
          ggsurvfit() +
          add_confidence_interval(type = "ribbon") +
          add_quantile()

##using WEIBULL

fit.4<-phreg(Surv(death_age, d.event)~mycitizen+education+poor+male,
                   data=nhis9,
             
dist="weibull")

summary(fit.4)

Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
mycitizen                                                   0.0001 
               0      0.881     0         1 (reference)
               1      0.119    -0.291     0.747     0.077
education                                                   0.0000 
           0Prim      0.008     0         1 (reference)
         1somehs      0.193     0.487     1.628     0.174
         2hsgrad      0.307     0.543     1.720     0.175
        3somecol      0.199     0.580     1.785     0.177
        4colgrad      0.186     0.210     1.234     0.179
   5masteranddoc      0.107     0.035     1.036     0.183
poor                                                        0.0000 
               0      0.852     0         1 (reference)
               1      0.148     0.394     1.483     0.042
male                                                        0.0000 
            Male      0.477     0         1 (reference)
          Female      0.523    -0.401     0.670     0.031

Events                    4381 
Total time at risk        2638001 
Max. log. likelihood      -22702 
LR test statistic         355.90 
Degrees of freedom        8 
Overall p-value           0

plot(fit.4)

fit.gom<-phreg(Surv(death_age, d.event)~mycitizen+education+poor+male,
                   data=nhis9,
             
dist="gompertz")

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

summary(fit.gom)

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
mycitizen                                                   0.0001 
               0      0.881     0         1 (reference)
               1      0.119    -0.284     0.753     0.077
education                                                   0.0000 
           0Prim      0.008     0         1 (reference)
         1somehs      0.193     0.505     1.656     0.174
         2hsgrad      0.307     0.578     1.782     0.175
        3somecol      0.199     0.624     1.867     0.177
        4colgrad      0.186     0.252     1.286     0.179
   5masteranddoc      0.107     0.098     1.103     0.183
poor                                                        0.0000 
               0      0.852     0         1 (reference)
               1      0.148     0.383     1.467     0.042
male                                                        0.0000 
            Male      0.477     0         1 (reference)
          Female      0.523    -0.421     0.656     0.031

Events                    4381 
Total time at risk        2638001 
Max. log. likelihood      -22373 
LR test statistic         356.11 
Degrees of freedom        8 
Overall p-value           0

plot(fit.gom)

Did you choose an AFT or PH model and why?

I choose PH because the data fit and ran best.

Justify what parametric distribution you choose

The PH model provides various function such as Gompertz and Weibull that provide descriptives and graphics to show how the data fits as well as show mortality outcomes based on predictors used for this homework which include(sex, citizenship status and education). Additionally, it helps measure the probability of failing at any given time. In this case, the probability of experiencing mortality based on citizenship status and SES.

Carry out model fit diagnostics for the model

AIC(fit.4)

[1] 45423.61

AIC(fit.gom)

[1] 44766.79

Gompertz fits best compared to the rest of the models based on its low AIC.

Include all main effects in the model
Test for an interaction between at least two of the predictors

The predictors used are citizenship status, SES (education,poverty status) and sex

Interpret your results and write them up

Weibull Interpretation:

The results above show that comparing to citizens, non citizens are less likely to experience mortality, similarly,this is similar to females when they are compared to men. However, when looking a the poverty threshold, individuals living below poverty are more likely to experience mortality. Based on education status, although all education groups have an increased risk of experiencing mortality, the probability decreases the high the education status an individual has. With all the predictor variables showing statistical significance

The Weibull graph shows an increased risk in mortality based on time.

Gompertz Interpretation:

The Gompertz interpretation is similar to the Weibull. Where we see that non citizens and females are less likely to experience mortality. Similarly, higher education reduces the risk of experiencing mortality.

g.Provide tabular and graphical output to support your conclusions

emp<-coxreg(Surv(death_age, d.event)~mycitizen+education+poor+male,
            data=nhis9)

check.dist(sp=emp,pp=fit.gom, main = " Empirical vs.Gompertz")

##check Weibull

check.dist(sp=emp,pp=fit.4, main = " Empirical vs.Weibull")

While looking at the two graphs, we see that Gompertz fits the data better compared to Weibull