The objective of this exercise is to execute the procedures for estimating time-to-event (TTE) functions in event history analysis with a focus on infant mortality experience. The TTE, also referred to as survival time, measures how long an individual or a cohort had survived before experiencing an event \(y\) (in this case, infant death) between two observed time points \(x-x+n\) (in this case, between age 0 and 12th month). The distribution of observed time-to-event is mainly described using three inter-related functions which one can be derived from the others; namely, the survival function (\(S(t)\)), probability density function (\(f(t)\)), and hazard function (\(h(t)\)).
The pooled birth-recode data sets of 2003, 2008, 2013 and 2018 Nigeria Demographic and Health Surveys were employed:
hwdata <- read.dta("C:/A/02/01F20/DEM223 Event History Analysis/hwdata.dta")
hwdata <- hwdata %>% select(strata, psu, iwt, d_age, status_12, status_12b, bcat01) %>%
filter(complete.cases(hwdata))
hwdata$caseid <- c(1:nrow(hwdata))
hwdata$bcat01 <- factor(hwdata$bcat01, levels = c("Group 01", "Group 02", "Group 03",
"Group 04", "Group 05"), labels = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"))
hwdata$pwt = c(hwdata$iwt/1000000)
dhsdes = svydesign(id = ~psu, strata = ~strata, weights = ~pwt, data = hwdata)The event variable or censoring indicator is whether a child born in the last five years prior to the surveys died during infancy (uncensored) or had survived the period (censored).
The comparative indicator or main grouping variable is a child’s demographic risk attribute at birth (DRAB); a composite predictor derived from maternal age at birth, birth order and birth interval. The DRAB predictor is classified as follow:
The resultant composite indicator coded as Grp01 (0) vs Grp02 (1), Grp03 (2), Grp04 (3), and Grp05 (4). The analysis compared the pattern of infant mortality risks among these groups hypothesizing that the the mortality risks will be indifferent across the DRAB dimensions.
** It is worthy to note that the DRAB categorization as presented here is for illustration purpose and subject to further refinement. Meanwhile, the categories was so constructed as the my interest is to investigate the effect of stage of childbearing age, especially the ages at the end of childbearing age spectrum, on early-life mortality experience. This is mainly conceived to capture and measure the impact of the mostly avoidable demographic risk associated with extreme maternal age-related fertility behaviors and compare the risk with those characterized by middle maternal age. Also, there were too few cases in birth order 1, and birth interval outside 24-59 range for early-age and advanced-age mothers.
I estimated the TTE functions for the pooled kids population and specific categories of infants by birth circumstances. In addition, I presented graphical illustrations of the functions and compared their distributions for the aggregate and classes of infants.
First, I examined the structure of the variables and survival status distribution by birth classification:
Rows: 61,213
Columns: 9
$ strata <chr> "2003-north west - urban", "2003-north west - urban", "2...
$ psu <chr> "2003-183", "2003-183", "2003-183", "2003-183", "2003-18...
$ iwt <int> 474592, 474592, 474592, 474592, 474592, 474592, 474592, ...
$ d_age <dbl> 14, 17, 19, 49, 27, 26, 1, 27, 8, 14, 32, 12, 10, 3, 46,...
$ status_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
$ status_12b <fct> Alive, Alive, Alive, Alive, Alive, Alive, Alive, Alive, ...
$ bcat01 <fct> Grp02, Grp02, Grp01, Grp05, Grp04, Grp04, Grp03, Grp05, ...
$ caseid <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
$ pwt <dbl> 0.47459, 0.47459, 0.47459, 0.47459, 0.47459, 0.47459, 0....
status_12b
bcat01 Alive Dead
Grp01 15849.54 505.85
Grp02 5868.99 282.28
Grp03 6559.21 447.78
Grp04 18799.31 867.31
Grp05 11480.22 727.71
status_12b
bcat01 Alive Dead
Grp01 96.9 3.1
Grp02 95.4 4.6
Grp03 93.6 6.4
Grp04 95.6 4.4
Grp05 94.0 6.0
Then, I examined the mortality pattern for the first 20 children in the data set:
[1] 14+ 17+ 19+ 49+ 27+ 26+ 1+ 27+ 8+ 14+ 32+ 12+ 10 3+ 46+ 0+ 26+ 14+ 14+
[20] 26+
The pooled survival function of infant mortality:
inf.surv <- survfit(Surv(d_age, status_12) ~ 1, data = hwdata, conf.type = "none")
plot(inf.surv, ylim = c(0.94, 0.98), xlim = c(0, 12), main = "Survival function of infant mortality, NDHS 2003-2018",
lwd = 1.5, col = 2)Basic life table distribution of survival:
# Store the survivorship function as an object for subsequent use:
surv12 <- inf.surv
summary(inf.surv)Call: survfit(formula = Surv(d_age, status_12) ~ 1, data = hwdata,
conf.type = "none")
time n.risk n.event survival std.err
0 61213 1531 0.975 0.000631
1 58844 145 0.973 0.000660
2 57010 113 0.971 0.000684
3 55366 123 0.969 0.000709
4 53635 88 0.967 0.000728
5 51864 94 0.965 0.000749
6 50075 91 0.963 0.000770
7 48253 125 0.961 0.000799
8 46444 101 0.959 0.000824
9 44691 98 0.957 0.000849
10 43048 86 0.955 0.000872
11 41489 61 0.953 0.000889
12 40041 211 0.948 0.000949
Pooled infant mortality rate:
[1] 47
The pooled hazard function of infant mortality:
inf.haz <- kphaz.fit(time = hwdata$d_age, status = hwdata$status_12, method = "product-limit")
kphaz.plot(inf.haz, main = "Hazard function of infant mortality, NDHS 2003-2018",
lwd = 1.5, col = 2)The basic life table distribution of hazards:
The pooled cumulative hazard function of infant mortality:
plot(cumsum(inf.haz$haz) ~ inf.haz$time, main = "Cumulative hazard function of infant mortality, NDHS 2003-2018",
ylab = "H(t)", xlab = "Time in months", type = "s", xlim = c(0, 12), lwd = 1.5,
col = 2)The pooled probability density function (PDF):
ft <- -diff(inf.surv$surv)
plot(ft, xlim = c(0.5, 12), main = "Probability density function of infant mortality, NDHS 2003-2018",
type = "s", ylab = "f(t)", xlab = "Time in Months", lwd = 1.5, col = 2)The pooled cumulative distribution function (CDF):
Ft <- cumsum(ft)
plot(Ft, xlim = c(0.5, 12), main = "Cumulative density function of infant mortality, NDHS 2003-2018",
type = "s", ylab = "f(t)", xlab = "Time in Months", lwd = 1.5, col = 2)Comparative survival function estimates by birth categories:
Call: survfit(formula = Surv(d_age, status_12) ~ bcat01, data = hwdata)
bcat01=Grp01
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 16317 273 0.983 0.00100 0.981 0.985
1 15777 22 0.982 0.00104 0.980 0.984
2 15261 23 0.980 0.00109 0.978 0.983
3 14778 24 0.979 0.00113 0.977 0.981
4 14284 18 0.978 0.00117 0.975 0.980
5 13769 17 0.976 0.00120 0.974 0.979
6 13252 14 0.975 0.00123 0.973 0.978
7 12749 34 0.973 0.00131 0.970 0.975
8 12227 22 0.971 0.00136 0.968 0.974
9 11756 15 0.970 0.00139 0.967 0.972
10 11263 16 0.968 0.00143 0.966 0.971
11 10811 9 0.968 0.00146 0.965 0.970
12 10411 37 0.964 0.00156 0.961 0.967
bcat01=Grp02
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 6056 166 0.973 0.00210 0.968 0.977
1 5794 11 0.971 0.00217 0.967 0.975
2 5590 7 0.970 0.00221 0.965 0.974
3 5393 12 0.967 0.00229 0.963 0.972
4 5170 11 0.965 0.00237 0.961 0.970
5 4951 8 0.964 0.00243 0.959 0.969
6 4756 7 0.962 0.00249 0.957 0.967
7 4548 7 0.961 0.00254 0.956 0.966
8 4320 10 0.959 0.00263 0.953 0.964
9 4109 11 0.956 0.00274 0.951 0.961
10 3912 11 0.953 0.00285 0.948 0.959
11 3728 2 0.953 0.00287 0.947 0.959
12 3541 7 0.951 0.00295 0.945 0.957
bcat01=Grp03
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 6914 254 0.963 0.00226 0.959 0.968
1 6589 20 0.960 0.00235 0.956 0.965
2 6405 15 0.958 0.00241 0.953 0.963
3 6265 18 0.955 0.00249 0.950 0.960
4 6092 11 0.954 0.00254 0.949 0.959
5 5886 15 0.951 0.00261 0.946 0.956
6 5718 14 0.949 0.00268 0.944 0.954
7 5548 12 0.947 0.00274 0.941 0.952
8 5378 18 0.944 0.00283 0.938 0.949
9 5172 13 0.941 0.00290 0.936 0.947
10 4996 11 0.939 0.00296 0.933 0.945
11 4821 12 0.937 0.00303 0.931 0.943
12 4652 25 0.932 0.00317 0.926 0.938
bcat01=Grp04
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 19554 436 0.978 0.00106 0.976 0.980
1 18871 44 0.975 0.00111 0.973 0.978
2 18311 30 0.974 0.00114 0.972 0.976
3 17818 39 0.972 0.00119 0.969 0.974
4 17296 30 0.970 0.00123 0.968 0.972
5 16810 38 0.968 0.00128 0.965 0.970
6 16234 28 0.966 0.00131 0.964 0.969
7 15663 49 0.963 0.00138 0.960 0.966
8 15090 30 0.961 0.00142 0.958 0.964
9 14533 32 0.959 0.00146 0.956 0.962
10 14041 32 0.957 0.00151 0.954 0.960
11 13566 24 0.955 0.00155 0.952 0.958
12 13106 79 0.949 0.00167 0.946 0.953
bcat01=Grp05
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 12372 402 0.968 0.00159 0.964 0.971
1 11813 48 0.964 0.00169 0.960 0.967
2 11443 38 0.960 0.00176 0.957 0.964
3 11112 30 0.958 0.00182 0.954 0.961
4 10793 18 0.956 0.00185 0.953 0.960
5 10448 16 0.955 0.00188 0.951 0.958
6 10115 28 0.952 0.00194 0.948 0.956
7 9745 23 0.950 0.00200 0.946 0.954
8 9429 21 0.948 0.00204 0.944 0.952
9 9121 27 0.945 0.00211 0.941 0.949
10 8836 16 0.943 0.00215 0.939 0.947
11 8563 14 0.942 0.00218 0.937 0.946
12 8331 63 0.935 0.00234 0.930 0.939
ANOVA test of Grpdifference in infant deaths:
Call:
survdiff(formula = Surv(d_age, status_12) ~ bcat01, data = hwdata)
N Observed Expected (O-E)^2/E (O-E)^2/V
bcat01=Grp01 16317 524 761 73.908 102.115
bcat01=Grp02 6056 270 277 0.179 0.201
bcat01=Grp03 6914 438 326 38.622 44.215
bcat01=Grp04 19554 891 922 1.050 1.570
bcat01=Grp05 12372 744 581 45.837 58.331
Chisq= 162 on 4 degrees of freedom, p= <2e-16
ggsurvplot(inf.km, xlim = c(0, 12), conf.int = T, title = "Survival function for infant mortality by DRAB, NDHS 2003-2018",
ylim = c(0.92, 1))Comparative hazard function estimates by birth categories:
Call:
coxph(formula = Surv(d_age, status_12) ~ bcat01, data = hwdata,
weights = pwt)
n= 61213, number of events= 2867
coef exp(coef) se(coef) robust se z Pr(>|z|)
bcat01Grp02 0.4176 1.5184 0.0743 0.0873 4.78 0.000001739 ***
bcat01Grp03 0.7226 2.0598 0.0649 0.0741 9.75 < 2e-16 ***
bcat01Grp04 0.3467 1.4143 0.0559 0.0642 5.40 0.000000066 ***
bcat01Grp05 0.6555 1.9261 0.0579 0.0647 10.13 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
bcat01Grp02 1.52 0.659 1.28 1.80
bcat01Grp03 2.06 0.485 1.78 2.38
bcat01Grp04 1.41 0.707 1.25 1.60
bcat01Grp05 1.93 0.519 1.70 2.19
Concordance= 0.572 (se = 0.006 )
Likelihood ratio test= 181 on 4 df, p=<2e-16
Wald test = 137 on 4 df, p=<2e-16
Score (logrank) test = 180 on 4 df, p=<2e-16, Robust = 143 p=<2e-16
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(inf.km, data = hwdata, risk.table = F, fun = "cumhaz", conf.int = T, title = "CHF of infant mortality by DRAB, NDHS 2003-2018",
xlim = c(0.2, 12), ylim = c(0, 0.08))The analysis revealed that pooled infant mortality rate for the entire period is 47 per 1000 live births. However, the numerical and graphical results of the pooled time-to-event distribution of risks indicate that the risk of dying during infancy is highest at age 12 month; suggesting the possibility of digit preference in reporting age at death of infants. Considering the inter-Grpdifferentials, the results showed that the infant mortality risks were generally higher for children in other comparative DRAB categories relative to their peers in the reference group. Meanwhile, the results specifically showed that the chances of surviving in infancy is lowest for children whose births were characterized by early maternal age at birth, followed closely by those in advanced maternal age risk set, compared with other categories.
This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks
The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/
R package https://www.R-project.org/ was used for data wrangling and analysis.
Created on September 21, 2020