DEM 7223 - Event History Analysis

Author

Albert Assogba

Published

September 15, 2022

load libraries

library(haven)
library(survival)
library(car)
Loading required package: carData
library(muhaz)
library(tidyverse)
── Attaching packages
───────────────────────────────────────
tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.7      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some()   masks car::some()
library(ggsurvfit)

Homework 2, I will perform a survival analysis on Ghana DHS 2014 data to measure the event of a child dying before age 1 as its outcome variable.

load package

tad<-read_dta("GHKR72DT/GHKR72FL.DTA")
tad<-zap_label(tad)

(1) Define your event variable

Event - Infant Mortality. That is the event of a child dying before age 1 or alive, and the age at death if the child is dead in months.

(2) Define a duration or time variable

The duration or time variable is when the child is alive.

(3) Define a censoring indicator

The censoring indicator here is if the child dead at age 1, or the censoring time (12 months).

Estimate the survival function for your outcome and plot it

tad$death.age<-ifelse(tad$b5==1,
                      ((((tad$v008)))- (((tad$b3))))
                      ,tad$b7)

censoring indicator for death by age 1, in months (12 months)

tad$d.event<-ifelse(is.na(tad$b7) ==T| tad$b7 > 12,0,1)
tad$d.eventfac<-factor(tad$d.event)
levels(tad$d.eventfac)<-c("Alive at 1", "Death by 1")
table(tad$d.eventfac)

Alive at 1 Death by 1 
      5626        258 
fit <- survfit(Surv(death.age, d.event) ~ 1, data = tad)
summary(fit)
Call: survfit(formula = Surv(death.age, d.event) ~ 1, data = tad)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   5884     167    0.972 0.00216        0.967        0.976
    1   5676      15    0.969 0.00226        0.965        0.973
    2   5568      10    0.967 0.00232        0.963        0.972
    3   5445       7    0.966 0.00236        0.961        0.971
    4   5322      10    0.964 0.00243        0.960        0.969
    5   5203       5    0.963 0.00246        0.959        0.968
    6   5057       7    0.962 0.00251        0.957        0.967
    7   4936       7    0.961 0.00256        0.956        0.966
    8   4821       3    0.960 0.00258        0.955        0.965
    9   4729       6    0.959 0.00262        0.954        0.964
   10   4631       3    0.958 0.00265        0.953        0.963
   11   4550       4    0.957 0.00268        0.952        0.963
   12   4436      14    0.954 0.00279        0.949        0.960
fit %>% 
  ggsurvfit() +
  labs(title = "Infant mortality survival plot",
       subtitle = "By household SES",
       x = "Years", y = "survival Probability") +
  xlim(c(0, 12)) +
  ylim(c(.95, 1))

Comparing Two Groups

Define a grouping variable, this can be dichotomous or categorical

I have selected a categorical variable socioeconomic status(SES), to test the survival function of children in high or low SES households. I recoded “1” if the child’s household socioeconomic status (SES) is low during the interview and “0” if the child’s household SES is high during the same period. I hypothesize that children from low SES households are more likely to experience infant mortality within the first 12 months than children from high SES households.

tad$highses<-Recode(tad$v190,
                    recodes = "1:3 = 0; 4:5 = 1; else = NA")
group_fit <- survfit(Surv(death.age, d.event) ~ highses, data=tad)
group_fit %>% 
  ggsurvfit() +
  labs(title = "Infant mortality survival plot",
       subtitle = "By household SES",
       x = "Years", y = "survival Probability") +
  xlim(c(0, 12)) +
  ylim(c(.95, 1))

summary(fit)
Call: survfit(formula = Surv(death.age, d.event) ~ 1, data = tad)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   5884     167    0.972 0.00216        0.967        0.976
    1   5676      15    0.969 0.00226        0.965        0.973
    2   5568      10    0.967 0.00232        0.963        0.972
    3   5445       7    0.966 0.00236        0.961        0.971
    4   5322      10    0.964 0.00243        0.960        0.969
    5   5203       5    0.963 0.00246        0.959        0.968
    6   5057       7    0.962 0.00251        0.957        0.967
    7   4936       7    0.961 0.00256        0.956        0.966
    8   4821       3    0.960 0.00258        0.955        0.965
    9   4729       6    0.959 0.00262        0.954        0.964
   10   4631       3    0.958 0.00265        0.953        0.963
   11   4550       4    0.957 0.00268        0.952        0.963
   12   4436      14    0.954 0.00279        0.949        0.960

comparison of Kaplan-Meier survival across grouping variable and interpret it

survdiff(Surv(death.age, d.event) ~ highses, data = tad)
Call:
survdiff(formula = Surv(death.age, d.event) ~ highses, data = tad)

             N Observed Expected (O-E)^2/E (O-E)^2/V
highses=0 4273      189    187.3    0.0149    0.0554
highses=1 1611       69     70.7    0.0394    0.0554

 Chisq= 0.1  on 1 degrees of freedom, p= 0.8 

It can be seen that there are no differences in the survival status of children based on their household SES.