Dem 7223 - Homework 2

Author

Jules Gonzalez

Data: IPUMS NHIS

#libraries
library(haven)
library(survival)
library(car)
Loading required package: carData
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some()   masks car::some()
library(ipumsr)



ddi <- read_ipums_ddi("nhis_00004.xml")
data_nhis<- read_ipums_micro(ddi)
Use of data from IPUMS NHIS is subject to conditions including that users
should cite the data appropriately. Use command `ipums_conditions()` for more
details.
data_nhis<-haven::zap_labels(data_nhis)

nams<-names(data_nhis)
head(nams,n=20)
 [1] "YEAR"       "SERIAL"     "STRATA"     "PSU"        "NHISHID"   
 [6] "HHWEIGHT"   "PERNUM"     "NHISPID"    "HHX"        "FMX"       
[11] "PX"         "PERWEIGHT"  "SAMPWEIGHT" "FWEIGHT"    "ASTATFLG"  
[16] "CSTATFLG"   "OWNERSHIP"  "MORTELIG"   "MORTSTAT"   "MORTDODY"  
newnames<-tolower(gsub(pattern = "_",replacement =  "",x =  nams))
names(data_nhis)<-newnames

Recode

data_nhis <- data_nhis %>%
  filter(mortelig == 1, mortdody<9999)


# car::Recode() is called explicitly because dplyr::recode() masks car::recode()
data_nhis$home <- car::Recode(data_nhis$ownership, recodes = "10:12='own'; 20='rent'; else=NA", as.factor = TRUE)


summary(data_nhis$home)
 own rent NA's 
8506 2388  363 
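
The same mapping can be sketched in base R without car. The ownership vector below is a small hypothetical example, not the NHIS data; only the rule (codes 10 to 12 become own, 20 becomes rent, everything else is missing) comes from the Recode() call above.

```r
# Hypothetical ownership codes, for illustration only
ownership <- c(10, 11, 12, 20, 30, 97, NA)

# 10:12 -> "own", 20 -> "rent", anything else (including NA) -> NA
home <- factor(
  ifelse(ownership %in% 10:12, "own",
         ifelse(ownership %in% 20, "rent", NA_character_)),
  levels = c("own", "rent")
)

summary(home)
```

Using %in% rather than == keeps missing codes from propagating into the test, so they fall through to NA like any other unlisted code.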

Comparing Survival Times Between Groups

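  1. Define a duration or time variable

    Duration/Time Variable

    Time variable - years from the survey year (year) to the year of death (mortdody), stored as death_time. The 2002 - year branch in the code below would apply only to records with mortelig != 1, which were filtered out earlier.

    Event - Death within five years of the survey

  2. Define a censoring indicator:
    d5.event, equal to 1 if the respondent died within five years of the survey and 0 (censored) otherwise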

  3. Estimate the survival function for your outcome and plot it

    Survival Function

    data_nhis <- data_nhis %>%
      mutate(death_time = ifelse( mortelig ==1, 
                                 mortdody - year , 
                                 2002 - year ), 
             d5.event = ifelse(mortelig == 1 & death_time <= 5, 1, 0))
    
    library(survival)
    
    time_fit <- survfit(Surv(death_time, d5.event) ~ 1, 
                      conf.type = "log",
                       data = data_nhis)
    
    library(ggsurvfit)
    
    time_fit %>%
      ggsurvfit()+
      xlim(-1, 6) +
      add_censor_mark() +
      add_confidence_interval()+
      add_quantile(y_value = .975)
    Warning: Removed 11 row(s) containing missing values (geom_path).

  4. Carry out the following analysis:

Kaplan-Meier Survival Analysis Outcome

summary(time_fit)
Call: survfit(formula = Surv(death_time, d5.event) ~ 1, data = data_nhis, 
    conf.type = "log")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0  11257     240    0.979 0.00136        0.976        0.981
    1  11017     519    0.933 0.00236        0.928        0.937
    2  10498     553    0.883 0.00302        0.878        0.889
    3   9945     624    0.828 0.00356        0.821        0.835
    4   9321     578    0.777 0.00393        0.769        0.784
    5   8743     602    0.723 0.00422        0.715        0.732
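
As a check on the table above, the survival column can be reproduced by hand. The Kaplan-Meier estimate at each time is the running product of (1 - n.event / n.risk). This is a minimal sketch that copies the counts from the printed summary rather than reading the NHIS file; the object names n_risk, n_event, and km are chosen here for illustration.

```r
# Counts copied from the summary(time_fit) output above
n_risk  <- c(11257, 11017, 10498, 9945, 9321, 8743)
n_event <- c(240, 519, 553, 624, 578, 602)

# Kaplan-Meier estimate: cumulative product of the conditional
# probabilities of surviving each interval
km <- cumprod(1 - n_event / n_risk)

data.frame(time = 0:5, survival = round(km, 3))
```

Because each n.risk equals the previous n.risk minus the previous n.event, nobody is censored before year 5, so the estimate at each time is simply the share of the original 11,257 respondents still alive.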

Define a grouping variable (dichotomous/categorical)

The grouping variable is home ownership (home), a dichotomous categorical variable with levels own and rent.

Research hypothesis about the survival patterns for the levels of the variable

Hypothesis: Individuals who rent their home will have a lower survival probability than individuals who own their home.

Comparison of Kaplan-Meier survival across grouping variables, interpret results.

The x-axis is time in years since the survey; the y-axis is the estimated survival probability.

For the full sample, the survival probability is 0.933 at one year (95% CI 0.928 to 0.937). At three years it drops to 0.828 (95% CI 0.821 to 0.835).

Plot the survival function for the analysis for each level of the group variable.

survdiff(Surv(year, d5.event)~home, data=data_nhis)
Call:
survdiff(formula = Surv(year, d5.event) ~ home, data = data_nhis)

n=10894, 363 observations deleted due to missingness.

             N Observed Expected (O-E)^2/E (O-E)^2/V
home=own  8506     2321     2339     0.143     0.898
home=rent 2388      675      657     0.508     0.898

 Chisq= 0.9  on 1 degrees of freedom, p= 0.3 
library(ggsurvfit)

newfit <- survfit(Surv(death_time, d5.event) ~ home, 
                  conf.type = "log",
                   data = data_nhis)

summary(newfit)
Call: survfit(formula = Surv(death_time, d5.event) ~ home, data = data_nhis, 
    conf.type = "log")

363 observations deleted due to missingness 
                home=own 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   8506     192    0.977 0.00161        0.974        0.981
    1   8314     390    0.932 0.00274        0.926        0.937
    2   7924     398    0.885 0.00346        0.878        0.892
    3   7526     460    0.831 0.00407        0.823        0.839
    4   7066     426    0.781 0.00449        0.772        0.789
    5   6640     455    0.727 0.00483        0.718        0.737

                home=rent 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   2388      39    0.984 0.00259        0.979        0.989
    1   2349     114    0.936 0.00501        0.926        0.946
    2   2235     136    0.879 0.00667        0.866        0.892
    3   2099     141    0.820 0.00786        0.805        0.835
    4   1958     121    0.769 0.00862        0.753        0.786
    5   1837     124    0.717 0.00921        0.700        0.736
newfit %>%
  ggsurvfit()+
  xlim(-1, 6) +
  add_censor_mark() +
  add_confidence_interval()+
  add_quantile(y_value = .975)
Warning: Removed 22 row(s) containing missing values (geom_path).
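
The survdiff() call earlier in this section passes year as the time variable, while the Kaplan-Meier fits use death_time. One way to run the log-rank test on the death_time scale without the original file is to rebuild person-level records from the counts printed in summary(newfit). This is a sketch under stated assumptions, not the author's analysis: toy and make_group() are hypothetical names, and everyone without an event by year 5 is given an arbitrary censoring time of 6, which does not change the log-rank statistic because no events occur after year 5.

```r
library(survival)

# Event counts at times 0..5, copied from summary(newfit) above
events_own  <- c(192, 390, 398, 460, 426, 455)
events_rent <- c(39, 114, 136, 141, 121, 124)

# Hypothetical helper: expand counts into one row per person
make_group <- function(label, events, n_total) {
  n_cens <- n_total - sum(events)  # people with no event by year 5
  data.frame(
    home       = label,
    death_time = c(rep(0:5, times = events), rep(6, n_cens)),  # 6 is a placeholder
    d5.event   = c(rep(1, sum(events)), rep(0, n_cens))
  )
}

toy <- rbind(
  make_group("own",  events_own,  8506),
  make_group("rent", events_rent, 2388)
)

# Log-rank test on the same time scale as the Kaplan-Meier fits
lr <- survdiff(Surv(death_time, d5.event) ~ home, data = toy)
lr
```

The rebuilt data match the printed group sizes and event totals, so the result should be close to what the original data would give with death_time in place of year.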

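Results indicate only a small difference in survival probability between individuals who own a home and those who rent. At years 0 and 1 renters have slightly higher estimated survival (0.984 vs. 0.977 and 0.936 vs. 0.932); from year 2 onward owners are slightly higher, and at five years the estimates are 0.727 for owners and 0.717 for renters. The 95% confidence intervals overlap at every time point, and the log-rank test (chi-square = 0.9 on 1 degree of freedom, p = 0.3) does not show a significant difference, so the hypothesis is not supported. Note that the survdiff() call uses year rather than death_time as the time variable, so the test should be rerun on death_time to match the Kaplan-Meier fits.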

Source:

Lynn A. Blewett, Julia A. Rivera Drew, Miriam L. King, Kari C.W. Williams, Natalie Del Ponte and Pat Convey. IPUMS Health Surveys: National Health Interview Survey, Version 7.1 [dataset]. Minneapolis, MN: IPUMS, 2021. 
https://doi.org/10.18128/D070.V7.1