Parametric Models - Homework 3

Author

Jules Gonzalez

Data

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(haven)
library(survival)
library(car)
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some
library(survey)
Loading required package: grid
Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack


Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart
library(muhaz)
library(eha)
library(ipumsr)

ddi <- read_ipums_ddi("nhis_00006.xml")
data_nhis<- read_ipums_micro(ddi)
Use of data from IPUMS NHIS is subject to conditions including that users
should cite the data appropriately. Use command `ipums_conditions()` for more
details.
data_nhis<-haven::zap_labels(data_nhis)

nams<-names(data_nhis)
head(nams,n=20)
 [1] "YEAR"       "SERIAL"     "STRATA"     "PSU"        "NHISHID"   
 [6] "HHWEIGHT"   "PERNUM"     "NHISPID"    "HHX"        "FMX"       
[11] "PX"         "PERWEIGHT"  "SAMPWEIGHT" "FWEIGHT"    "ASTATFLG"  
[16] "CSTATFLG"   "RACENEW"    "HISPETH"    "CITIZEN"    "OWNERSHIP" 
newnames<-tolower(gsub(pattern = "_",replacement =  "",x =  nams))
names(data_nhis)<-newnames

Recode:

library(dplyr)
#rent or own 
data_nhis$home<-Recode(data_nhis$ownership, recodes="10:12='own'; 20='rent'; else=NA", as.factor=T)

#us citizen
data_nhis$us<-Recode(data_nhis$citizen, recodes="1='nous'; 2='yesus'; else=NA", as.factor=T)
summary(data_nhis$us)
 nous yesus  NA's 
 8626 83928   832 
data_nhis <- data_nhis %>%
  filter(mortelig == 1, mortdody<9999, is.na(us)==F, is.na(home)==F)

1) Carry out the following analysis:

Define your outcome as in HW 1.

  • The outcome is mortality based on home ownership and citizenship.

  • The grouping variables are Housing/Home ownership (own or rent) and Citizenship (U.S. citizen or not U.S. citizen). The variables are categorical.

Also consider what covariates are hypothesized to affect the outcome variable.

The covariates hypothesized are home ownership and citizenship.

Hypothesis: Mortality for homeowners, both renting and owning, will decrease based on U.S. citizenship.

Construct a parametric model for your outcome.

data_nhis <- data_nhis %>%
  mutate(death_time = ifelse( mortelig ==1, 
                             mortdody - year , 
                             2002 - year ), 
         d5.event = ifelse(mortelig == 1 & death_time <= 5, 1, 0))

library(survival)

time_fit <- survfit(Surv(death_time, d5.event) ~ 1, 
                  conf.type = "log",
                   data = data_nhis)

library(ggsurvfit)

time_fit %>%
  ggsurvfit()+
  xlim(-1, 6) +
  add_censor_mark() +
  add_confidence_interval()+
  add_quantile(y_value = .975)
Warning: Removed 11 row(s) containing missing values (geom_path).

Fit the parametric model of your choosing to the data.

  1. Did you choose an AFT or PH model and why?

    Proportional Hazards Model (PH) because I want to see if the effect of the covariates increase or decrease the hazard by a proportionate amount of time.

  2. Justify what parametric distribution you choose

    Weibull because it is a flexible distribution.

  3. Carry out model fit diagnostics for the model

    #using WEIBULL
    day<- 1/365
    fit.wei<-phreg(Surv(death_time+day, d5.event)~us+home,
                       data=data_nhis,
    
    dist="weibull")
    
    summary(fit.wei)
    Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
    us                                                          0.1786 
                nous      0.044     0         1 (reference)
               yesus      0.956     0.126     1.134     0.095
    home                                                        0.2411 
                 own      0.784     0         1 (reference)
                rent      0.216     0.052     1.054     0.044
    
    Events                    2994 
    Total time at risk        100560 
    Max. log. likelihood      -12763 
    LR test statistic         2.74 
    Degrees of freedom        2 
    Overall p-value           0.253488
  4. Include all main effects in the model

    plot(fit.wei)

  5. Test for an interaction between at least two of the predictors

    survdiff(Surv(death_time, d5.event)~us+home, data=data_nhis)
    Call:
    survdiff(formula = Surv(death_time, d5.event) ~ us + home, data = data_nhis)
    
                           N Observed Expected (O-E)^2/E (O-E)^2/V
    us=nous, home=own    213       61     58.7    0.0937     0.101
    us=nous, home=rent   244       56     69.3    2.5412     2.755
    us=yesus, home=own  8279     2260   2281.1    0.1958     0.871
    us=yesus, home=rent 2141      617    584.9    1.7568     2.312
    
     Chisq= 4.9  on 3 degrees of freedom, p= 0.2 
  6. Interpret your results and write them up

    Results: In both the weibull and survdiff fits, the p. value is over .05 indicating there is no significant interactions between home ownership, citizenship status and mortality. There are no interactions between the predictor variables (home ownershp and citizenship) and the outcome (mortality)

  7. Provide tabular and graphical output to support your conclusions

fit.gom<-phreg(Surv(death_time, d5.event)~us+home,
                   data=data_nhis,
             
dist="gompertz")
summary(fit.gom)
Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
us                                                          0.1180 
            nous      0.044     0         1 (reference)
           yesus      0.956     0.146     1.157     0.095
home                                                        0.1516 
             own      0.784     0         1 (reference)
            rent      0.216     0.064     1.066     0.044

Events                    2994 
Total time at risk        100530 
Max. log. likelihood      -13513 
LR test statistic         3.87 
Degrees of freedom        2 
Overall p-value           0.144361
plot(fit.gom)