Purpose of the exercise

For this exercise, kids recode data from the Nigeria Demographic and Health Surveys 2003, 2008, 2013 and 2018 were employed. The event variable is whether a child born in the last five years prior to the survey died during infancy. Hence, the duration of survival is age at death between birth and 12 months of age; while the censoring indicator is whether or not infant death occured. The comparative indicator or main grouping variable is a child’s demographic risk attribute at birth (DRAB) characterized by maternal age at birth, birth order and birth interval. The DRAB were classified as follow:

To illustrate the mortality risk differential by the DRAB predictor, we would estimate basic functions of survival time- the survival, hazard, cumulative hazards functions- and conduct the Kaplan-Meier survival analysis. The analysis is aimed at testing the hypothesis of indifferent infant mortality risks across DRAB dimensions; with particular focus on dimensions characterized by early and advanced maternal age.


Read data file

hwdata <- read.dta("C:/A/02/01F20/DEM223 Event History Analysis/data.dta")
# summary(hwdata)

freq(hwdata$d_age, round.digits = 1, cumul = T, report.nas = T)
Frequencies  
hwdata$d_age  
Type: Numeric  

               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------- --------- -------------- --------- --------------
          0    2459       3.9            3.9       3.9            3.9
          1    1874       3.0            6.9       3.0            6.9
          2    1686       2.7            9.6       2.7            9.6
          3    1783       2.8           12.4       2.8           12.4
          4    1825       2.9           15.3       2.9           15.3
          5    1826       2.9           18.2       2.9           18.2
          6    1862       3.0           21.2       3.0           21.2
          7    1856       3.0           24.1       3.0           24.1
          8    1783       2.8           27.0       2.8           27.0
          9    1677       2.7           29.6       2.7           29.6
         10    1598       2.5           32.2       2.5           32.2
         11    1468       2.3           34.5       2.3           34.5
         12    1988       3.2           37.7       3.2           37.7
         13    1926       3.1           40.8       3.1           40.8
         14    1785       2.8           43.6       2.8           43.6
         15    1640       2.6           46.2       2.6           46.2
         16    1575       2.5           48.7       2.5           48.7
         17    1475       2.3           51.1       2.3           51.1
         18    1497       2.4           53.4       2.4           53.4
         19    1323       2.1           55.5       2.1           55.5
         20    1239       2.0           57.5       2.0           57.5
         21    1157       1.8           59.4       1.8           59.4
         22    1036       1.6           61.0       1.6           61.0
         23     977       1.6           62.6       1.6           62.6
         24    1748       2.8           65.3       2.8           65.3
         25    1514       2.4           67.7       2.4           67.7
         26    1310       2.1           69.8       2.1           69.8
         27    1114       1.8           71.6       1.8           71.6
         28    1067       1.7           73.3       1.7           73.3
         29     986       1.6           74.9       1.6           74.9
         30     941       1.5           76.4       1.5           76.4
         31     854       1.4           77.7       1.4           77.7
         32     773       1.2           79.0       1.2           79.0
         33     716       1.1           80.1       1.1           80.1
         34     635       1.0           81.1       1.0           81.1
         35     598       1.0           82.1       1.0           82.1
         36     774       1.2           83.3       1.2           83.3
         37     783       1.2           84.5       1.2           84.5
         38     759       1.2           85.7       1.2           85.7
         39     690       1.1           86.8       1.1           86.8
         40     607       1.0           87.8       1.0           87.8
         41     541       0.9           88.7       0.9           88.7
         42     514       0.8           89.5       0.8           89.5
         43     463       0.7           90.2       0.7           90.2
         44     423       0.7           90.9       0.7           90.9
         45     423       0.7           91.6       0.7           91.6
         46     394       0.6           92.2       0.6           92.2
         47     350       0.6           92.8       0.6           92.8
         48     421       0.7           93.4       0.7           93.4
         49     436       0.7           94.1       0.7           94.1
         50     443       0.7           94.8       0.7           94.8
         51     431       0.7           95.5       0.7           95.5
         52     354       0.6           96.1       0.6           96.1
         53     338       0.5           96.6       0.5           96.6
         54     323       0.5           97.1       0.5           97.1
         55     348       0.6           97.7       0.6           97.7
         56     325       0.5           98.2       0.5           98.2
         57     259       0.4           98.6       0.4           98.6
         58     287       0.5           99.1       0.5           99.1
         59     264       0.4           99.5       0.4           99.5
         60     283       0.5           99.9       0.5           99.9
         61      43       0.1          100.0       0.1          100.0
       <NA>       0                                0.0          100.0
      Total   62847     100.0          100.0     100.0          100.0
freq(hwdata$d_infancy, round.digits = 1, cumul = T, report.nas = T)
Frequencies  
hwdata$d_infancy  
Type: Numeric  

               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------- --------- -------------- --------- --------------
          0   59840      95.2           95.2      95.2           95.2
          1    3007       4.8          100.0       4.8          100.0
       <NA>       0                                0.0          100.0
      Total   62847     100.0          100.0     100.0          100.0
freq(hwdata$d_infancy2, round.digits = 1, cumul = T, report.nas = T)
Frequencies  
hwdata$d_infancy2  
Type: Factor  

                         Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
--------------------- ------- --------- -------------- --------- --------------
      Alive at 12 mos   59840      95.2           95.2      95.2           95.2
       Dead by 12 mos    3007       4.8          100.0       4.8          100.0
                 <NA>       0                                0.0          100.0
                Total   62847     100.0          100.0     100.0          100.0
freq(hwdata$riskcat01, round.digits = 1, cumul = T, report.nas = T)
Frequencies  
hwdata$riskcat01  
Type: Factor  

               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------- --------- -------------- --------- --------------
          0   16678      26.5           26.5      26.5           26.5
          1    6201       9.9           36.4       9.9           36.4
          2    7093      11.3           47.7      11.3           47.7
          3   20086      32.0           79.7      32.0           79.7
          4   12787      20.3          100.0      20.3          100.0
       <NA>       2                                0.0          100.0
      Total   62847     100.0          100.0     100.0          100.0

Complex survey design parameters

options(survey.lonely.psu = "adjust")
surveydes = svydesign(id = ~psu, strata = ~strata, weights = ~pwt, data = hwdata, 
    nest = T)

Analysis

First, let’s examine the mortality pattern for the first 20 children in the data set:

head(hwdata[, c("d_age", "d_infancy")], n = 20)
head(Surv(hwdata$d_age, hwdata$d_infancy), n = 20)
 [1] 14+ 17+ 19+ 49+ 27+ 26+  1+ 27+  8+ 14+ 32+ 12+ 10   3+ 46+  0+ 26+ 14+ 25+
[20] 14+

The pooled survival function of infant mortality

inf.surv <- survfit(Surv(d_age, d_infancy) ~ 1, data = hwdata, conf.type = "none")
plot(inf.surv, ylim = c(0.9, 1), xlim = c(0, 12), main = "Survival function of infant mortality, NDHS 2003-2018")

The underlying life table distribution of survival

# Store the survivorship function as an object for subsequent use:
surv5 <- inf.surv
summary(inf.surv)
Call: survfit(formula = Surv(d_age, d_infancy) ~ 1, data = hwdata, 
    conf.type = "none")

 time n.risk n.event survival  std.err
    0  62847    1610    0.974 0.000630
    1  60388     155    0.972 0.000660
    2  58514     117    0.970 0.000683
    3  56828     127    0.968 0.000708
    4  55045      96    0.966 0.000727
    5  53220      98    0.964 0.000748
    6  51394      94    0.963 0.000768
    7  49532     130    0.960 0.000797
    8  47676     103    0.958 0.000821
    9  45893     104    0.956 0.000847
   10  44216      90    0.954 0.000869
   11  42618      61    0.952 0.000886
   12  41150     222    0.947 0.000946

Pooled infant mortality rate (NDHS 2003, 2008, 2013 and 2018)

1000 * (1 - summary(inf.surv)$surv[12])
[1] 47.542

The pooled hazard function of infant mortality

inf.haz <- kphaz.fit(time = hwdata$d_age, status = hwdata$d_infancy, method = "product-limit")
kphaz.plot(inf.haz, main = "Hazard function of infant mortality, NDHS 2003-2018")

The underlying life table distribution of hazards

data.frame(inf.haz)

The pooled cumulative hazard function of infant mortality

plot(cumsum(inf.haz$haz) ~ inf.haz$time, main = "Cumulative hazard function of infant mortality, NDHS 2003-2018", 
    ylab = "H(t)", xlab = "Time in months", type = "l", xlim = c(0, 12), lwd = 1.5, 
    col = 2)

The pooled probability density function (PDF)

ft <- -diff(inf.surv$surv)
plot(ft, xlim = c(0.5, 12), main = "Probability density function of infant mortality, NDHS 2003-2018", 
    type = "s", ylab = "f(t)", xlab = "Time in Months")

The pooled cumulative distribution function (CDF)

Ft <- cumsum(ft)
plot(Ft, xlim = c(0.5, 12), main = "Cumulative density function of infant mortality, NDHS 2003-2018", 
    type = "s", ylab = "f(t)", xlab = "Time in Months")

inf.km <- survfit(Surv(d_age, d_infancy) ~ riskcat01, data = hwdata)
summary(inf.km)
Call: survfit(formula = Surv(d_age, d_infancy) ~ riskcat01, data = hwdata)

2 observations deleted due to missingness 
                riskcat01=0 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0  16678     285    0.983 0.00100        0.981        0.985
    1  16124      24    0.981 0.00105        0.979        0.983
    2  15593      23    0.980 0.00109        0.978        0.982
    3  15098      24    0.978 0.00113        0.976        0.981
    4  14595      18    0.977 0.00116        0.975        0.980
    5  14065      17    0.976 0.00120        0.974        0.978
    6  13541      15    0.975 0.00123        0.973        0.977
    7  13027      35    0.972 0.00130        0.970        0.975
    8  12493      23    0.971 0.00135        0.968        0.973
    9  12009      15    0.969 0.00139        0.967        0.972
   10  11505      17    0.968 0.00143        0.965        0.971
   11  11038       9    0.967 0.00145        0.964        0.970
   12  10633      38    0.964 0.00155        0.961        0.967

                riskcat01=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   6201     173    0.972 0.00209        0.968        0.976
    1   5929      11    0.970 0.00216        0.966        0.975
    2   5723       7    0.969 0.00220        0.965        0.973
    3   5523      12    0.967 0.00228        0.963        0.971
    4   5293      12    0.965 0.00236        0.960        0.969
    5   5068       9    0.963 0.00242        0.958        0.968
    6   4870       7    0.962 0.00248        0.957        0.967
    7   4658       8    0.960 0.00254        0.955        0.965
    8   4425      10    0.958 0.00262        0.953        0.963
    9   4210      12    0.955 0.00273        0.950        0.961
   10   4008      11    0.953 0.00284        0.947        0.958
   11   3822       2    0.952 0.00286        0.946        0.958
   12   3633       7    0.950 0.00294        0.944        0.956

                riskcat01=2 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   7093     264    0.963 0.00225        0.958        0.967
    1   6757      21    0.960 0.00233        0.955        0.964
    2   6570      15    0.958 0.00240        0.953        0.962
    3   6423      18    0.955 0.00247        0.950        0.960
    4   6246      11    0.953 0.00252        0.948        0.958
    5   6034      15    0.951 0.00259        0.946        0.956
    6   5859      14    0.949 0.00265        0.943        0.954
    7   5688      14    0.946 0.00272        0.941        0.952
    8   5511      18    0.943 0.00280        0.938        0.949
    9   5301      14    0.941 0.00287        0.935        0.946
   10   5122      11    0.939 0.00293        0.933        0.944
   11   4945      12    0.936 0.00300        0.931        0.942
   12   4774      27    0.931 0.00315        0.925        0.937

                riskcat01=3 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0  20086     454    0.977 0.00105        0.975        0.979
    1  19383      49    0.975 0.00110        0.973        0.977
    2  18809      32    0.973 0.00114        0.971        0.976
    3  18305      40    0.971 0.00119        0.969        0.973
    4  17768      33    0.969 0.00122        0.967        0.972
    5  17268      39    0.967 0.00127        0.965        0.970
    6  16679      28    0.966 0.00131        0.963        0.968
    7  16095      49    0.963 0.00137        0.960        0.965
    8  15507      31    0.961 0.00141        0.958        0.963
    9  14944      34    0.958 0.00145        0.956        0.961
   10  14445      35    0.956 0.00150        0.953        0.959
   11  13958      24    0.955 0.00154        0.952        0.958
   12  13490      83    0.949 0.00166        0.945        0.952

                riskcat01=4 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0  12787     433    0.966 0.00160        0.963        0.969
    1  12194      50    0.962 0.00169        0.959        0.965
    2  11818      40    0.959 0.00176        0.955        0.962
    3  11478      33    0.956 0.00182        0.953        0.960
    4  11142      22    0.954 0.00186        0.951        0.958
    5  10784      18    0.953 0.00189        0.949        0.956
    6  10444      30    0.950 0.00195        0.946        0.954
    7  10063      24    0.948 0.00200        0.944        0.952
    8   9739      21    0.946 0.00205        0.942        0.950
    9   9428      29    0.943 0.00211        0.939        0.947
   10   9135      16    0.941 0.00215        0.937        0.945
   11   8854      14    0.940 0.00218        0.935        0.944
   12   8619      67    0.932 0.00234        0.928        0.937
survdiff(Surv(d_age, d_infancy) ~ riskcat01, data = hwdata)
Call:
survdiff(formula = Surv(d_age, d_infancy) ~ riskcat01, data = hwdata)

n=62845, 2 observations deleted due to missingness.

                N Observed Expected (O-E)^2/E (O-E)^2/V
riskcat01=0 16678      543      794    79.565    109.79
riskcat01=1  6201      281      290     0.267      0.30
riskcat01=2  7093      454      341    37.242     42.65
riskcat01=3 20086      931      968     1.384      2.07
riskcat01=4 12787      797      613    55.271     70.48

 Chisq= 176  on 4 degrees of freedom, p= <2e-16 
ggsurvplot(inf.km, xlim = c(0, 12), conf.int = T, title = "Survival Function for infant mortality by DRAB, NDHS 2003-2018", 
    ylim = c(0.9, 1))

fit1 <- coxph(Surv(d_age, d_infancy) ~ riskcat01, data = hwdata, weights = pwt)

summary(fit1)
Call:
coxph(formula = Surv(d_age, d_infancy) ~ riskcat01, data = hwdata, 
    weights = pwt)

  n= 62845, number of events= 3006 
   (2 observations deleted due to missingness)

             coef exp(coef) se(coef) robust se     z    Pr(>|z|)    
riskcat011 0.4241    1.5282   0.0727    0.0854  4.96 0.000000690 ***
riskcat012 0.7230    2.0607   0.0637    0.0729  9.92     < 2e-16 ***
riskcat013 0.3503    1.4195   0.0549    0.0629  5.57 0.000000025 ***
riskcat014 0.6684    1.9511   0.0566    0.0631 10.59     < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           exp(coef) exp(-coef) lower .95 upper .95
riskcat011      1.53      0.654      1.29      1.81
riskcat012      2.06      0.485      1.79      2.38
riskcat013      1.42      0.704      1.25      1.61
riskcat014      1.95      0.513      1.72      2.21

Concordance= 0.572  (se = 0.006 )
Likelihood ratio test= 193  on 4 df,   p=<2e-16
Wald test            = 146  on 4 df,   p=<2e-16
Score (logrank) test = 191  on 4 df,   p=<2e-16,   Robust = 153  p=<2e-16

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(inf.km, data = hwdata, risk.table = F, fun = "cumhaz", conf.int = T, title = "CHF of infant mortality by DRAB, NDHS 2003-2018", 
    xlim = c(0.5, 12), ylim = c(0, 0.08))

Interpretations

The results showed that unadjusted risks of dying during infancy is generally greater for children in other comparative DRAB categories relative to their peers in the reference group. Meanwhile, the results specifically showed that the chances of surviving in infancy is lowest for children whose births were characterized by early maternal age at birth, followed closely by those in advanced maternal age risk set, compared with other categories.

Acknowledgements

  1. This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks

  2. The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/

  3. R package was used in the preparation and analysis of data: