For this exercise, kids recode data from the Nigeria Demographic and Health Surveys 2003, 2008, 2013 and 2018 were employed. The event variable is whether a child born in the last five years prior to the survey died during infancy. Hence, the duration of survival is age at death between birth and 12 months of age; while the censoring indicator is whether or not infant death occured. The comparative indicator or main grouping variable is a child’s demographic risk attribute at birth (DRAB) characterized by maternal age at birth, birth order and birth interval. The DRAB were classified as follow:
To illustrate the mortality risk differential by the DRAB predictor, we would estimate basic functions of survival time- the survival, hazard, cumulative hazards functions- and conduct the Kaplan-Meier survival analysis. The analysis is aimed at testing the hypothesis of indifferent infant mortality risks across DRAB dimensions; with particular focus on dimensions characterized by early and advanced maternal age.
hwdata <- read.dta("C:/A/02/01F20/DEM223 Event History Analysis/data.dta")
# summary(hwdata)
freq(hwdata$d_age, round.digits = 1, cumul = T, report.nas = T)Frequencies
hwdata$d_age
Type: Numeric
Freq % Valid % Valid Cum. % Total % Total Cum.
----------- ------- --------- -------------- --------- --------------
0 2459 3.9 3.9 3.9 3.9
1 1874 3.0 6.9 3.0 6.9
2 1686 2.7 9.6 2.7 9.6
3 1783 2.8 12.4 2.8 12.4
4 1825 2.9 15.3 2.9 15.3
5 1826 2.9 18.2 2.9 18.2
6 1862 3.0 21.2 3.0 21.2
7 1856 3.0 24.1 3.0 24.1
8 1783 2.8 27.0 2.8 27.0
9 1677 2.7 29.6 2.7 29.6
10 1598 2.5 32.2 2.5 32.2
11 1468 2.3 34.5 2.3 34.5
12 1988 3.2 37.7 3.2 37.7
13 1926 3.1 40.8 3.1 40.8
14 1785 2.8 43.6 2.8 43.6
15 1640 2.6 46.2 2.6 46.2
16 1575 2.5 48.7 2.5 48.7
17 1475 2.3 51.1 2.3 51.1
18 1497 2.4 53.4 2.4 53.4
19 1323 2.1 55.5 2.1 55.5
20 1239 2.0 57.5 2.0 57.5
21 1157 1.8 59.4 1.8 59.4
22 1036 1.6 61.0 1.6 61.0
23 977 1.6 62.6 1.6 62.6
24 1748 2.8 65.3 2.8 65.3
25 1514 2.4 67.7 2.4 67.7
26 1310 2.1 69.8 2.1 69.8
27 1114 1.8 71.6 1.8 71.6
28 1067 1.7 73.3 1.7 73.3
29 986 1.6 74.9 1.6 74.9
30 941 1.5 76.4 1.5 76.4
31 854 1.4 77.7 1.4 77.7
32 773 1.2 79.0 1.2 79.0
33 716 1.1 80.1 1.1 80.1
34 635 1.0 81.1 1.0 81.1
35 598 1.0 82.1 1.0 82.1
36 774 1.2 83.3 1.2 83.3
37 783 1.2 84.5 1.2 84.5
38 759 1.2 85.7 1.2 85.7
39 690 1.1 86.8 1.1 86.8
40 607 1.0 87.8 1.0 87.8
41 541 0.9 88.7 0.9 88.7
42 514 0.8 89.5 0.8 89.5
43 463 0.7 90.2 0.7 90.2
44 423 0.7 90.9 0.7 90.9
45 423 0.7 91.6 0.7 91.6
46 394 0.6 92.2 0.6 92.2
47 350 0.6 92.8 0.6 92.8
48 421 0.7 93.4 0.7 93.4
49 436 0.7 94.1 0.7 94.1
50 443 0.7 94.8 0.7 94.8
51 431 0.7 95.5 0.7 95.5
52 354 0.6 96.1 0.6 96.1
53 338 0.5 96.6 0.5 96.6
54 323 0.5 97.1 0.5 97.1
55 348 0.6 97.7 0.6 97.7
56 325 0.5 98.2 0.5 98.2
57 259 0.4 98.6 0.4 98.6
58 287 0.5 99.1 0.5 99.1
59 264 0.4 99.5 0.4 99.5
60 283 0.5 99.9 0.5 99.9
61 43 0.1 100.0 0.1 100.0
<NA> 0 0.0 100.0
Total 62847 100.0 100.0 100.0 100.0
Frequencies
hwdata$d_infancy
Type: Numeric
Freq % Valid % Valid Cum. % Total % Total Cum.
----------- ------- --------- -------------- --------- --------------
0 59840 95.2 95.2 95.2 95.2
1 3007 4.8 100.0 4.8 100.0
<NA> 0 0.0 100.0
Total 62847 100.0 100.0 100.0 100.0
Frequencies
hwdata$d_infancy2
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
--------------------- ------- --------- -------------- --------- --------------
Alive at 12 mos 59840 95.2 95.2 95.2 95.2
Dead by 12 mos 3007 4.8 100.0 4.8 100.0
<NA> 0 0.0 100.0
Total 62847 100.0 100.0 100.0 100.0
Frequencies
hwdata$riskcat01
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
----------- ------- --------- -------------- --------- --------------
0 16678 26.5 26.5 26.5 26.5
1 6201 9.9 36.4 9.9 36.4
2 7093 11.3 47.7 11.3 47.7
3 20086 32.0 79.7 32.0 79.7
4 12787 20.3 100.0 20.3 100.0
<NA> 2 0.0 100.0
Total 62847 100.0 100.0 100.0 100.0
options(survey.lonely.psu = "adjust")
surveydes = svydesign(id = ~psu, strata = ~strata, weights = ~pwt, data = hwdata,
nest = T)First, let’s examine the mortality pattern for the first 20 children in the data set:
[1] 14+ 17+ 19+ 49+ 27+ 26+ 1+ 27+ 8+ 14+ 32+ 12+ 10 3+ 46+ 0+ 26+ 14+ 25+
[20] 14+
The pooled survival function of infant mortality
inf.surv <- survfit(Surv(d_age, d_infancy) ~ 1, data = hwdata, conf.type = "none")
plot(inf.surv, ylim = c(0.9, 1), xlim = c(0, 12), main = "Survival function of infant mortality, NDHS 2003-2018")The underlying life table distribution of survival
# Store the survivorship function as an object for subsequent use:
surv5 <- inf.surv
summary(inf.surv)Call: survfit(formula = Surv(d_age, d_infancy) ~ 1, data = hwdata,
conf.type = "none")
time n.risk n.event survival std.err
0 62847 1610 0.974 0.000630
1 60388 155 0.972 0.000660
2 58514 117 0.970 0.000683
3 56828 127 0.968 0.000708
4 55045 96 0.966 0.000727
5 53220 98 0.964 0.000748
6 51394 94 0.963 0.000768
7 49532 130 0.960 0.000797
8 47676 103 0.958 0.000821
9 45893 104 0.956 0.000847
10 44216 90 0.954 0.000869
11 42618 61 0.952 0.000886
12 41150 222 0.947 0.000946
Pooled infant mortality rate (NDHS 2003, 2008, 2013 and 2018)
[1] 47.542
The pooled hazard function of infant mortality
inf.haz <- kphaz.fit(time = hwdata$d_age, status = hwdata$d_infancy, method = "product-limit")
kphaz.plot(inf.haz, main = "Hazard function of infant mortality, NDHS 2003-2018")The underlying life table distribution of hazards
The pooled cumulative hazard function of infant mortality
plot(cumsum(inf.haz$haz) ~ inf.haz$time, main = "Cumulative hazard function of infant mortality, NDHS 2003-2018",
ylab = "H(t)", xlab = "Time in months", type = "l", xlim = c(0, 12), lwd = 1.5,
col = 2)The pooled probability density function (PDF)
ft <- -diff(inf.surv$surv)
plot(ft, xlim = c(0.5, 12), main = "Probability density function of infant mortality, NDHS 2003-2018",
type = "s", ylab = "f(t)", xlab = "Time in Months")The pooled cumulative distribution function (CDF)
Ft <- cumsum(ft)
plot(Ft, xlim = c(0.5, 12), main = "Cumulative density function of infant mortality, NDHS 2003-2018",
type = "s", ylab = "f(t)", xlab = "Time in Months")Call: survfit(formula = Surv(d_age, d_infancy) ~ riskcat01, data = hwdata)
2 observations deleted due to missingness
riskcat01=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 16678 285 0.983 0.00100 0.981 0.985
1 16124 24 0.981 0.00105 0.979 0.983
2 15593 23 0.980 0.00109 0.978 0.982
3 15098 24 0.978 0.00113 0.976 0.981
4 14595 18 0.977 0.00116 0.975 0.980
5 14065 17 0.976 0.00120 0.974 0.978
6 13541 15 0.975 0.00123 0.973 0.977
7 13027 35 0.972 0.00130 0.970 0.975
8 12493 23 0.971 0.00135 0.968 0.973
9 12009 15 0.969 0.00139 0.967 0.972
10 11505 17 0.968 0.00143 0.965 0.971
11 11038 9 0.967 0.00145 0.964 0.970
12 10633 38 0.964 0.00155 0.961 0.967
riskcat01=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 6201 173 0.972 0.00209 0.968 0.976
1 5929 11 0.970 0.00216 0.966 0.975
2 5723 7 0.969 0.00220 0.965 0.973
3 5523 12 0.967 0.00228 0.963 0.971
4 5293 12 0.965 0.00236 0.960 0.969
5 5068 9 0.963 0.00242 0.958 0.968
6 4870 7 0.962 0.00248 0.957 0.967
7 4658 8 0.960 0.00254 0.955 0.965
8 4425 10 0.958 0.00262 0.953 0.963
9 4210 12 0.955 0.00273 0.950 0.961
10 4008 11 0.953 0.00284 0.947 0.958
11 3822 2 0.952 0.00286 0.946 0.958
12 3633 7 0.950 0.00294 0.944 0.956
riskcat01=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 7093 264 0.963 0.00225 0.958 0.967
1 6757 21 0.960 0.00233 0.955 0.964
2 6570 15 0.958 0.00240 0.953 0.962
3 6423 18 0.955 0.00247 0.950 0.960
4 6246 11 0.953 0.00252 0.948 0.958
5 6034 15 0.951 0.00259 0.946 0.956
6 5859 14 0.949 0.00265 0.943 0.954
7 5688 14 0.946 0.00272 0.941 0.952
8 5511 18 0.943 0.00280 0.938 0.949
9 5301 14 0.941 0.00287 0.935 0.946
10 5122 11 0.939 0.00293 0.933 0.944
11 4945 12 0.936 0.00300 0.931 0.942
12 4774 27 0.931 0.00315 0.925 0.937
riskcat01=3
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 20086 454 0.977 0.00105 0.975 0.979
1 19383 49 0.975 0.00110 0.973 0.977
2 18809 32 0.973 0.00114 0.971 0.976
3 18305 40 0.971 0.00119 0.969 0.973
4 17768 33 0.969 0.00122 0.967 0.972
5 17268 39 0.967 0.00127 0.965 0.970
6 16679 28 0.966 0.00131 0.963 0.968
7 16095 49 0.963 0.00137 0.960 0.965
8 15507 31 0.961 0.00141 0.958 0.963
9 14944 34 0.958 0.00145 0.956 0.961
10 14445 35 0.956 0.00150 0.953 0.959
11 13958 24 0.955 0.00154 0.952 0.958
12 13490 83 0.949 0.00166 0.945 0.952
riskcat01=4
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 12787 433 0.966 0.00160 0.963 0.969
1 12194 50 0.962 0.00169 0.959 0.965
2 11818 40 0.959 0.00176 0.955 0.962
3 11478 33 0.956 0.00182 0.953 0.960
4 11142 22 0.954 0.00186 0.951 0.958
5 10784 18 0.953 0.00189 0.949 0.956
6 10444 30 0.950 0.00195 0.946 0.954
7 10063 24 0.948 0.00200 0.944 0.952
8 9739 21 0.946 0.00205 0.942 0.950
9 9428 29 0.943 0.00211 0.939 0.947
10 9135 16 0.941 0.00215 0.937 0.945
11 8854 14 0.940 0.00218 0.935 0.944
12 8619 67 0.932 0.00234 0.928 0.937
Call:
survdiff(formula = Surv(d_age, d_infancy) ~ riskcat01, data = hwdata)
n=62845, 2 observations deleted due to missingness.
N Observed Expected (O-E)^2/E (O-E)^2/V
riskcat01=0 16678 543 794 79.565 109.79
riskcat01=1 6201 281 290 0.267 0.30
riskcat01=2 7093 454 341 37.242 42.65
riskcat01=3 20086 931 968 1.384 2.07
riskcat01=4 12787 797 613 55.271 70.48
Chisq= 176 on 4 degrees of freedom, p= <2e-16
ggsurvplot(inf.km, xlim = c(0, 12), conf.int = T, title = "Survival Function for infant mortality by DRAB, NDHS 2003-2018",
ylim = c(0.9, 1))Call:
coxph(formula = Surv(d_age, d_infancy) ~ riskcat01, data = hwdata,
weights = pwt)
n= 62845, number of events= 3006
(2 observations deleted due to missingness)
coef exp(coef) se(coef) robust se z Pr(>|z|)
riskcat011 0.4241 1.5282 0.0727 0.0854 4.96 0.000000690 ***
riskcat012 0.7230 2.0607 0.0637 0.0729 9.92 < 2e-16 ***
riskcat013 0.3503 1.4195 0.0549 0.0629 5.57 0.000000025 ***
riskcat014 0.6684 1.9511 0.0566 0.0631 10.59 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
riskcat011 1.53 0.654 1.29 1.81
riskcat012 2.06 0.485 1.79 2.38
riskcat013 1.42 0.704 1.25 1.61
riskcat014 1.95 0.513 1.72 2.21
Concordance= 0.572 (se = 0.006 )
Likelihood ratio test= 193 on 4 df, p=<2e-16
Wald test = 146 on 4 df, p=<2e-16
Score (logrank) test = 191 on 4 df, p=<2e-16, Robust = 153 p=<2e-16
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(inf.km, data = hwdata, risk.table = F, fun = "cumhaz", conf.int = T, title = "CHF of infant mortality by DRAB, NDHS 2003-2018",
xlim = c(0.5, 12), ylim = c(0, 0.08))The results showed that unadjusted risks of dying during infancy is generally greater for children in other comparative DRAB categories relative to their peers in the reference group. Meanwhile, the results specifically showed that the chances of surviving in infancy is lowest for children whose births were characterized by early maternal age at birth, followed closely by those in advanced maternal age risk set, compared with other categories.
This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks
The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/
R package was used in the preparation and analysis of data: