The objective of this exercise is to fit a semi-parametric proportional hazards (PH) model for under-five mortality in Nigeria. The semi-parametric modeling strategy is used in circumstance where it is difficult to specify an appropriate parametric model for the outcome of interest. The Cox’s PH technique developed by Professor David R. Cox in his 1972 classic, open access article titled Regression Models and Life-Tables provides soft-landing in addressing such methodical challenge. This exercise builds on the previous ones on procedures for estimating Time-to-Event Functions and Parametric Proportional Hazard Model.
The pooled birth-recode data sets of 2003, 2008, 2013 and 2018 Nigeria Demographic and Health Surveys (NDHS) were used:
hwdata <- read.dta("C:/A/02/01F20/DEM223 Event History Analysis/hwdata.dta")
hwdata <- hwdata %>% select(strata, psu, iwt, d_age, status_60, status_60b, status_12,
status_12b, bcat01, sex, bsiz, unionstat, education, wealth, place, region2,
dhsyearf) %>% filter(complete.cases(hwdata))
hwdata$bcat01 <- factor(hwdata$bcat01, levels = c("Group 01", "Group 02", "Group 03",
"Group 04", "Group 05"), labels = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"))
hwdata$caseid <- c(1:nrow(hwdata))
hwdata$pwt = c(hwdata$iwt/1000000)
dhsdes <- svydesign(id = ~psu, strata = ~strata, data = hwdata, weights = ~pwt)
The event variable is whether the last live birth had by a woman in the 59 months preceding the survey is alive (censored) or has died (uncensored). Hence, the observed time-to-event variable is the duration of survival between birth and death within the 5 years in retrospect; while the censoring indicator is the occurrence of under-five death.
The main grouping variable is birth category. This is operationalized as child’s demographic risk attribute at birth (DRAB); a composite variable derived from maternal age at birth, birth order and birth interval characteristics. The aggregate DRAB variable was measured and categorized as follow:
The resultant composite was coded and compared as Grp01 (0) vs Grp02 (1), Grp03 (2), Grp04 (3), and Grp05 (4).
** It is worthy to note that the DRAB categorization as presented here is for illustration purpose and subject to further refinement. Meanwhile, the categories was so constructed as the my interest is to investigate the effect of stage of childbearing age, especially the ages at the end of childbearing age spectrum, on early-life mortality experience. This is mainly conceived to capture and measure the impact of the mostly avoidable demographic risk associated with extreme maternal age-related fertility behaviors and compare the risk with those characterized by middle maternal age. Also, there were too few cases in birth order 1, and birth interval outside 24-59 range for early-age and advanced-age mothers.
For the purpose of more informative analysis, I controlled for a list of selected correlates which I hypothesized to influence the outcome event, on one hand, and the relationship between main grouping variable and the outcome, on the other hand. The variables are listed below:
First, I examined the structure of the data and the weighted distributions of survival status by each variable thereof. Also, I examined the characteristics of a few dead children in the data set. I conducted these assessments to have basic understanding of the internal process underlying possible patterns that could result from Cox’s PH model estimates.
Rows: 61,213
Columns: 19
$ strata <chr> "2003-north west - urban", "2003-north west - urban", "2...
$ psu <chr> "2003-183", "2003-183", "2003-183", "2003-183", "2003-18...
$ iwt <int> 474592, 474592, 474592, 474592, 474592, 474592, 474592, ...
$ d_age <dbl> 14, 17, 19, 49, 27, 26, 1, 27, 8, 14, 32, 12, 10, 3, 46,...
$ status_60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
$ status_60b <fct> Alive, Alive, Alive, Alive, Alive, Alive, Alive, Alive, ...
$ status_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
$ status_12b <fct> Alive, Alive, Alive, Alive, Alive, Alive, Alive, Alive, ...
$ bcat01 <fct> Grp02, Grp02, Grp01, Grp05, Grp04, Grp04, Grp03, Grp05, ...
$ sex <fct> Male, Male, Female, Male, Female, Female, Male, Male, Fe...
$ bsiz <fct> Large, Large, Large, Large, Small, Large, Large, Small, ...
$ unionstat <fct> InUnion, InUnion, InUnion, NotInUnion, NotInUnion, NotIn...
$ education <fct> None, None, CompSec/High, None, None, Primary, None, Non...
$ wealth <fct> High, High, High, High, High, High, Low, Low, High, High...
$ place <fct> Urban, Urban, Urban, Urban, Urban, Urban, Urban, Urban, ...
$ region2 <fct> North, North, North, North, North, North, North, North, ...
$ dhsyearf <fct> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 20...
$ caseid <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
$ pwt <dbl> 0.47459, 0.47459, 0.47459, 0.47459, 0.47459, 0.47459, 0....
Survival status distribution by birth classification:
status_60b
bcat01 Alive Dead
Grp01 95.9 4.1
Grp02 94.8 5.2
Grp03 91.8 8.2
Grp04 94.0 6.0
Grp05 92.3 7.7
Survival status distribution by sex:
status_60b
sex Alive Dead
Female 94.5 5.5
Male 93.5 6.5
Survival status distribution by birth size:
status_60b
bsiz Alive Dead
Large 94.4 5.6
Small 91.9 8.1
Survival status distribution by union status:
status_60b
unionstat Alive Dead
InUnion 94.1 5.9
NotInUnion 91.9 8.1
Survival status distribution by level of education:
status_60b
education Alive Dead
None 92.9 7.1
Primary 93.6 6.4
IncompSec 94.8 5.2
CompSec/High 96.1 3.9
Survival status distribution by wealth status:
status_60b
wealth Alive Dead
Low 93.1 6.9
High 95.7 4.3
Survival status distribution by place of residence:
status_60b
place Alive Dead
Rural 93.3 6.7
Urban 95.3 4.7
Survival status distribution by region of residence:
status_60b
region2 Alive Dead
North 93.4 6.6
South 95.2 4.8
Survival status distribution by dhs year:
status_60b
dhsyearf Alive Dead
2018 94.2 5.8
2013 94.5 5.5
2008 93.6 6.4
2003 91.6 8.4
First ten censored observations:
head(hwdata[hwdata$status_60 == 1, c("bcat01", "d_age", "sex", "bsiz", "unionstat",
"education", "wealth", "place", "region2")], n = 10)
Last ten censored observations:
tail(hwdata[hwdata$status_60 == 1, c("bcat01", "d_age", "sex", "bsiz", "unionstat",
"education", "wealth", "place", "region2")], n = 10)
Kaplan-Meier estimate of baseline hazard of under-five mortality:
fit.haz.km <- kphaz.fit(time = hwdata$d_age, status = hwdata$status_60, method = "product-limit")
km0 <- data.frame(fit.haz.km)
km1 <- data.frame(km0[km0$time %in% c(0.5:9.5), c("time", "haz")])
km2 <- data.frame(km0[km0$time %in% c(10.5:19.5), c("time", "haz")])
km3 <- data.frame(km0[km0$time >= 20.5, c("time", "haz")])
km <- cbind(km1, km2, km3)
km
fit.a <- svycoxph(Surv(d_age, status_60) ~ bcat01 + sex + bsiz + unionstat + education +
wealth + place + region2 + dhsyearf, design = dhsdes)
summary(fit.a)
Stratified 1 - level Cluster Sampling design (with replacement)
With (3532) clusters.
svydesign(id = ~psu, strata = ~strata, data = hwdata, weights = ~pwt)
Call:
svycoxph(formula = Surv(d_age, status_60) ~ bcat01 + sex + bsiz +
unionstat + education + wealth + place + region2 + dhsyearf,
design = dhsdes)
n= 61213, number of events= 3697
coef exp(coef) se(coef) robust se z Pr(>|z|)
bcat01Grp02 0.40821 1.50412 0.06933 0.08038 5.08 3.8e-07 ***
bcat01Grp03 0.52543 1.69118 0.05770 0.06584 7.98 1.5e-15 ***
bcat01Grp04 0.28281 1.32686 0.04858 0.05757 4.91 9.0e-07 ***
bcat01Grp05 0.53021 1.69928 0.05087 0.05567 9.52 < 2e-16 ***
sexMale 0.17295 1.18881 0.03317 0.03852 4.49 7.1e-06 ***
bsizSmall 0.36249 1.43691 0.04157 0.04898 7.40 1.3e-13 ***
unionstatNotInUnion 0.26228 1.29989 0.06466 0.07427 3.53 0.00041 ***
educationPrimary -0.00843 0.99160 0.04669 0.05104 -0.17 0.86878
educationIncompSec -0.14342 0.86639 0.06446 0.06959 -2.06 0.03931 *
educationCompSec/High -0.30311 0.73852 0.06261 0.07424 -4.08 4.5e-05 ***
wealthHigh -0.19988 0.81883 0.05062 0.05820 -3.43 0.00059 ***
placeUrban -0.10484 0.90047 0.04434 0.05378 -1.95 0.05123 .
region2South -0.11261 0.89350 0.04591 0.05360 -2.10 0.03567 *
dhsyearf2013 -0.07175 0.93076 0.04192 0.04994 -1.44 0.15081
dhsyearf2008 0.11030 1.11661 0.04237 0.04886 2.26 0.02398 *
dhsyearf2003 0.35162 1.42136 0.06374 0.08937 3.93 8.3e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
bcat01Grp02 1.504 0.665 1.285 1.761
bcat01Grp03 1.691 0.591 1.486 1.924
bcat01Grp04 1.327 0.754 1.185 1.485
bcat01Grp05 1.699 0.588 1.524 1.895
sexMale 1.189 0.841 1.102 1.282
bsizSmall 1.437 0.696 1.305 1.582
unionstatNotInUnion 1.300 0.769 1.124 1.504
educationPrimary 0.992 1.008 0.897 1.096
educationIncompSec 0.866 1.154 0.756 0.993
educationCompSec/High 0.739 1.354 0.639 0.854
wealthHigh 0.819 1.221 0.731 0.918
placeUrban 0.900 1.111 0.810 1.001
region2South 0.894 1.119 0.804 0.992
dhsyearf2013 0.931 1.074 0.844 1.026
dhsyearf2008 1.117 0.896 1.015 1.229
dhsyearf2003 1.421 0.704 1.193 1.693
Concordance= 0.607 (se = 0.006 )
Likelihood ratio test= NA on 16 df, p=NA
Wald test = 378 on 16 df, p=<2e-16
Score (logrank) test = NA on 16 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
fit.b <- svycoxph(Surv(d_age, status_60) ~ bcat01 + sex + bsiz + unionstat + education +
wealth + place + region2 + dhsyearf + (wealth * place), design = dhsdes)
summary(fit.b)
Stratified 1 - level Cluster Sampling design (with replacement)
With (3532) clusters.
svydesign(id = ~psu, strata = ~strata, data = hwdata, weights = ~pwt)
Call:
svycoxph(formula = Surv(d_age, status_60) ~ bcat01 + sex + bsiz +
unionstat + education + wealth + place + region2 + dhsyearf +
(wealth * place), design = dhsdes)
n= 61213, number of events= 3697
coef exp(coef) se(coef) robust se z Pr(>|z|)
bcat01Grp02 0.4082 1.5041 0.0693 0.0804 5.08 3.8e-07 ***
bcat01Grp03 0.5257 1.6916 0.0577 0.0658 7.99 1.4e-15 ***
bcat01Grp04 0.2819 1.3256 0.0486 0.0576 4.89 9.8e-07 ***
bcat01Grp05 0.5287 1.6967 0.0509 0.0557 9.50 < 2e-16 ***
sexMale 0.1727 1.1885 0.0332 0.0385 4.48 7.3e-06 ***
bsizSmall 0.3627 1.4372 0.0416 0.0490 7.40 1.3e-13 ***
unionstatNotInUnion 0.2585 1.2950 0.0647 0.0742 3.48 0.00049 ***
educationPrimary -0.0129 0.9872 0.0467 0.0509 -0.25 0.79989
educationIncompSec -0.1470 0.8633 0.0645 0.0696 -2.11 0.03474 *
educationCompSec/High -0.2995 0.7412 0.0625 0.0739 -4.06 5.0e-05 ***
wealthHigh -0.1053 0.9001 0.0640 0.0720 -1.46 0.14367
placeUrban -0.0238 0.9765 0.0555 0.0652 -0.37 0.71507
region2South -0.1189 0.8879 0.0460 0.0539 -2.21 0.02734 *
dhsyearf2013 -0.0688 0.9335 0.0419 0.0499 -1.38 0.16854
dhsyearf2008 0.1118 1.1183 0.0424 0.0489 2.29 0.02209 *
dhsyearf2003 0.3526 1.4228 0.0637 0.0895 3.94 8.2e-05 ***
wealthHigh:placeUrban -0.2037 0.8157 0.0878 0.1010 -2.02 0.04372 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
bcat01Grp02 1.504 0.665 1.285 1.761
bcat01Grp03 1.692 0.591 1.487 1.924
bcat01Grp04 1.326 0.754 1.184 1.484
bcat01Grp05 1.697 0.589 1.521 1.892
sexMale 1.189 0.841 1.102 1.282
bsizSmall 1.437 0.696 1.306 1.582
unionstatNotInUnion 1.295 0.772 1.120 1.498
educationPrimary 0.987 1.013 0.893 1.091
educationIncompSec 0.863 1.158 0.753 0.990
educationCompSec/High 0.741 1.349 0.641 0.857
wealthHigh 0.900 1.111 0.782 1.036
placeUrban 0.976 1.024 0.859 1.110
region2South 0.888 1.126 0.799 0.987
dhsyearf2013 0.934 1.071 0.846 1.030
dhsyearf2008 1.118 0.894 1.016 1.231
dhsyearf2003 1.423 0.703 1.194 1.696
wealthHigh:placeUrban 0.816 1.226 0.669 0.994
Concordance= 0.608 (se = 0.006 )
Likelihood ratio test= NA on 17 df, p=NA
Wald test = 374 on 17 df, p=<2e-16
Score (logrank) test = NA on 17 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
Comparison of refitted model’s AICs:
fit.aAIC <- AIC(fit.a)
fit.aAIC <- fit.aAIC[2]
fit.bAIC <- AIC(fit.b)
fit.bAIC <- fit.bAIC[2]
coxAIC <- cbind(fit.aAIC, fit.bAIC)
coxAIC
fit.aAIC fit.bAIC
AIC 78934 78936
I specified a general Cox’s PH model using DRAB, wealth status and place of residence covariates, generated new data based on those variables to construct a hypothesized under-five children risk profiles, then apply the model and new data to visualize the patterns of survival, hazard, and cumulative hazard of the risk profiles:
dat <- expand.grid(bcat01 = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"), wealth = c("Low",
"High"), place = c("Rural", "Urban"))
dat <- data.frame(dat)
head(dat, 10)
sfit1 <- survfit(fit.c)
sfit2 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Rural"))
sfit3 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Urban"))
sfit4 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Rural"))
sfit5 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Urban"))
H1 <- -log(sfit1$surv)
H2 <- -log(sfit2$surv)
H3 <- -log(sfit3$surv)
H4 <- -log(sfit4$surv)
H5 <- -log(sfit5$surv)
times <- sfit1$time
h1 <- loess(diff(c(0, H1)) ~ times, degree = 1, span = 0.25)
h2 <- loess(diff(c(0, H2)) ~ times, degree = 1, span = 0.25)
h3 <- loess(diff(c(0, H3)) ~ times, degree = 1, span = 0.25)
h4 <- loess(diff(c(0, H4)) ~ times, degree = 1, span = 0.25)
h5 <- loess(diff(c(0, H5)) ~ times, degree = 1, span = 0.25)
Survival function plot of the hypothesized risk profiles:
plot(survfit(fit.c, conf.int = F), ylab = "S(t)", xlab = "Months", ylim = c(0.85,
1), xlim = c(0, 60))
title(main = "", "Fig. 1 Survivorship profiles of under-five mortality risks, NDHS 2003-2018")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Rural"),
conf.int = F), col = "red")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Urban"),
conf.int = F), col = "green")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Rural"),
conf.int = F), col = "purple")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Urban"),
conf.int = F), col = "blue")
legend("bottomleft", legend = c("covariates means", "grp03, low, rural", "grp03, low, urban",
"grp03, high, rural", "grp03, high, urban"), lty = 1, col = c(1, "red", "green",
"purple", "blue"), cex = 0.8)
Hazard function plot of the hypothesized risk profiles:
plot(predict(h1) ~ times, type = "l", ylab = "h(t), smoothed", xlab = "Months", ylim = c(0.0001,
0.015))
title(main = "", "Fig. 2 Hazard profiles of under-five mortality risks, NDHS 2003-2018")
lines(predict(h2) ~ times, type = "l", col = "red")
lines(predict(h3) ~ times, type = "l", col = "green")
lines(predict(h4) ~ times, type = "l", col = "purple")
lines(predict(h5) ~ times, type = "l", col = "blue")
legend("topright", legend = c("covariates means", "grp03, low, rural", "grp03, low, urban",
"grp03, high, rural", "grp03, high, urban"), lty = 1, col = c(1, "red", "green",
"purple", "blue"), cex = 0.8)
Cumulative hazard function plot of the hypothesized risk profiles:
plot(H1 ~ times, type = "l", ylab = "H(t), smoothed", xlab = "Months", ylim = c(0.005,
0.15))
title(main = "", "Fig. 3 Cumulative hazard profiles of under-five mortality risks, NDHS 2003-2018")
lines(H2 ~ times, col = "red", type = "l")
lines(H3 ~ times, col = "green", type = "l")
lines(H4 ~ times, col = "purple", type = "l")
lines(H5 ~ times, col = "blue", type = "l")
legend("topleft", legend = c("covariates means", "grp03, low, rural", "grp03, low, urban",
"grp03, high, rural", "grp03, high, urban"), lty = 1, col = c(1, "red", "green",
"purple", "blue"), cex = 0.8)
The results showed substantially significant disparities in risk of dying before age age 5 by DRAB dimensions. Compared to those in reference Grp01, the adjusted hazard of dying is equally greatest among children in Grp03 and Grp05 (AHR: +69%, p<.001), followed by those in Grp02 (AHR: +50%, p<.001), and Grp04 (AHR: +30%, p<.001). This implies that not only does childbearing at extreme maternal ages confer significant mortality risks on children, but also that first births even when had during safe maternal age period of 20-34 years still face enormous risks of not surviving to age 5 in Nigeria. In addition, the significantly higher mortality risk of Grp04 children suggests that having babies outside safe birth interval (24-59 months) could undermine the protective effect of childbearing within safe maternal age range (20-34 years).
It is further revealed that children who are male, small-sized and born to mothers not in union might be more vulnerable to die before their fifth birthday. In contrast, attainment of secondary education, high wealth status and residence in southern region may have significantly positive effect on survival chances. Also, relative to 2018 cohort, risks of dying were substantially higher among 2003 and 2008 cohorts.
Notably, the results indicate that characteristic dependency of wealth status and place of residence might play greater role in under-five survival. This is evident in the model with the interaction term where children of high wealth status mothers who reside in urban settings had about 18% (p<.05) reduced risk of dying before 5. This dependency was further strengthened by graphical representations of survival, hazard and cumulative hazard functions. Meanwhile, the results of the AICs showed that the main effect model without interaction term (AIC: 78934) is more parsimonious than the model with interaction term (AIC: 78936).
This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his compendium of easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks.
The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/.
R package https://www.R-project.org/ was used for data wrangling and analysis.
Created on October 04, 2020