Event History Analysis Homework 2

Author

Cristina Martinez, MPH

library(dplyr)  # provides %>%, mutate(), if_else(), filter(), select()

df1 <- read.csv('cat_data.csv')
df2 <- read.csv('raw_data.csv')
df1b <- df1 %>%
  # filter(CV_HIGHEST_DEGREE_EVER_EDT_2015 %in% 
  #          c("Associate/Junior college (AA)",
  #            "Bachelor's degree (BA, BS)",
  #            "Master's degree (MA, MS)",
  #            "PhD",
  #            "Professional degree (DDS, JD, MD)"
  #            )
  #        ) %>%
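  mutate(
    # 1 = debt greater than $17k, 0 = otherwise
    debt_cat = if_else(CVC_ASSETS_DEBTS_25_XRND > 17000, 1, 0),
    # Age in 2005 and age in 2014
    age1 = 2005 - KEY_BDATE_Y_1997,
    age2 = 2014 - KEY_BDATE_Y_1997,
    # 1 = biological child in household at that time point
    child05 = if_else(CV_BIO_CHILD_HH_2005 > 0, 1, 0),
    child14 = if_else(CV_BIO_CHILD_HH_2015 > 0, 1, 0),
    # 0 = no child at either time point, 1 = otherwise
    childtran = if_else(child05 == 0 & child14 == 0, 0, 1)
  ) %>%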
  # Filter out those with children at first time point
  filter(child05 == 0) %>%
  filter(complete.cases(.)) %>%
  # Select only variables needed
  select(KEY_SEX_1997, KEY_RACE_ETHNICITY_1997, CV_HIGHEST_DEGREE_EVER_EDT_2015, debt_cat:childtran)
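
To make the recoding above concrete, here is a small sketch on made-up rows. The data frame `toy`, its column names, and its values are hypothetical, and base R stands in for the dplyr pipeline. Respondents with a child in 2005 are dropped, so `childtran` is 1 only when a child appears at the later time point and 0 (censored) otherwise.

```r
# Hypothetical rows, not taken from the homework data
toy <- data.frame(
  birth_year = c(1980, 1981, 1982, 1983),
  child_2005 = c(0, 0, 1, 0),   # number of biological children in household, 2005
  child_late = c(0, 2, 1, 1)    # number at the later time point
)

# Same recoding logic as the mutate() call above
toy$child05   <- ifelse(toy$child_2005 > 0, 1, 0)
toy$child14   <- ifelse(toy$child_late > 0, 1, 0)
toy$childtran <- ifelse(toy$child05 == 0 & toy$child14 == 0, 0, 1)
toy$age2      <- 2014 - toy$birth_year

# Keep only respondents with no child at the first time point
at_risk <- toy[toy$child05 == 0, ]
at_risk[, c("age2", "childtran")]
```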
  1. Define your event variable

    • Presence of biological child in household
  2. Define a duration or time variable

    • 2005 to 2014; the models below use the respondent's age in 2014 (age2) as the time variable
  3. Define a censoring indicator

    • Censored if respondent does not report having a biological child in the household within the time period
  4. Estimate the survival function for your outcome and plot it

    • By age: proportion of women at a given age who haven’t had a birth

    library(survival)
    
    child_fit <- survfit(Surv(age2, childtran)~ 1,  data = df1b)
    
    library(ggsurvfit)
    
    child_fit %>% 
      ggsurvfit() +
      add_confidence_interval(type = "ribbon") +
      add_quantile() 

    summary(child_fit)
    Call: survfit(formula = Surv(age2, childtran) ~ 1, data = df1b)
    
     time n.risk n.event survival std.err lower 95% CI upper 95% CI
       30    261      19    0.927  0.0161        0.896        0.959
       31    229      24    0.830  0.0237        0.785        0.878
       32    182      28    0.702  0.0299        0.646        0.763
       33    123      29    0.537  0.0353        0.472        0.611
       34     60      25    0.313  0.0399        0.244        0.402
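
As a check on the table above, the Kaplan-Meier estimate at each age is the running product of (1 - n.event / n.risk). The sketch below recomputes the survival column from the printed n.risk and n.event values. The vector names are illustrative and not part of the original analysis.

```r
# Values copied from the summary(child_fit) output above
time    <- c(30, 31, 32, 33, 34)
n_risk  <- c(261, 229, 182, 123, 60)
n_event <- c(19, 24, 28, 29, 25)

# Product-limit estimator: S(t) = product over t_i <= t of (1 - d_i / n_i)
km_surv <- cumprod(1 - n_event / n_risk)

data.frame(time, n_risk, n_event, survival = round(km_surv, 3))
```

The rounded values agree with the survival column reported by summary(child_fit).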

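The survival curve shows that the estimated median age at which a biological child is present in the household at the second time point is 34: the survival estimate is 0.537 at age 33 and first falls below 0.5 at age 34 (0.313).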
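
The median survival time is the first age at which the estimated survival falls to 0.5 or below. A minimal sketch using the survival column printed in the summary output (variable names are illustrative):

```r
# Survival estimates copied from the summary(child_fit) output
time     <- c(30, 31, 32, 33, 34)
survival <- c(0.927, 0.830, 0.702, 0.537, 0.313)

# Smallest age at which the survival estimate is at or below 0.5
median_age <- min(time[survival <= 0.5])
median_age
```

With the fitted object available, print(child_fit) should report the same median along with its confidence limits.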

  5. Carry out the following analysis:

    • Kaplan-Meier survival analysis of the outcome

    • Define a grouping variable, this can be dichotomous or categorical.

      • Debt less than or equal to $17k (debt_cat = 0) or Debt greater than $17k (debt_cat = 1)
    • Do you have a research hypothesis about the survival patterns for the levels of the categorical variable? State it.

      • Those with debt totaling more than $17k will be less likely to have a child in the home in the second time period.
    • Comparison of Kaplan-Meier survival across grouping variables in your data. Interpret your results.

    • Plot the survival function for the analysis for each level of the group variable

kpfit <- survfit(Surv(age2, childtran) ~ debt_cat, data = df1b)

summary(kpfit)
Call: survfit(formula = Surv(age2, childtran) ~ debt_cat, data = df1b)

                debt_cat=0 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   30    227      17    0.925  0.0175        0.891        0.960
   31    197      20    0.831  0.0254        0.783        0.882
   32    155      23    0.708  0.0321        0.648        0.774
   33    105      24    0.546  0.0381        0.476        0.626
   34     50      18    0.349  0.0444        0.272        0.448

                debt_cat=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   30     34       2    0.941  0.0404       0.8653        1.000
   31     32       4    0.824  0.0654       0.7049        0.962
   32     27       5    0.671  0.0814       0.5290        0.851
   33     18       5    0.485  0.0921       0.3340        0.703
   34     10       7    0.145  0.0755       0.0526        0.402
## Compare difference across groups

survdiff(Surv(age2, childtran) ~ debt_cat, data = df1b)
Call:
survdiff(formula = Surv(age2, childtran) ~ debt_cat, data = df1b)

             N Observed Expected (O-E)^2/E (O-E)^2/V
debt_cat=0 227      102    106.6     0.199       1.7
debt_cat=1  34       23     18.4     1.154       1.7

 Chisq= 1.7  on 1 degrees of freedom, p= 0.2 
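
The columns of the survdiff() table can be reproduced by hand from the printed counts, which helps when reading the direction of the difference. The sketch below uses the rounded observed and expected values from the output and the reported chi-square of 1.7. Because the inputs are rounded, the results agree with the table only to about two decimal places. The vector names are illustrative.

```r
# Observed and expected event counts copied from the survdiff() output
observed <- c(debt0 = 102, debt1 = 23)
expected <- c(debt0 = 106.6, debt1 = 18.4)

# Per-group contribution (O - E)^2 / E, as in the fourth column of the table
contrib <- (observed - expected)^2 / expected
round(contrib, 2)

# A positive O - E means more events than expected if the groups did not differ
observed - expected

# Upper-tail p-value for the reported chi-square statistic on 1 degree of freedom
p_value <- pchisq(1.7, df = 1, lower.tail = FALSE)
round(p_value, 2)
```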

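Those with higher debt had more events than expected (23 observed vs. 18.4 expected) and a lower survival estimate at age 34 (0.145 vs. 0.349), which runs opposite to the hypothesis. However, the log-rank test shows that this difference is not significant (chi-square = 1.7, df = 1, p = 0.2).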

## Plot survival function

kpfit %>%
  ggsurvfit() +
  # conf.int and title are not ggsurvfit() arguments; add them as layers instead
  add_confidence_interval() +
  ggplot2::labs(title = "Survival for transition to child in household")