e03: Cox Proportional Hazards Model

0.1 Objective

The objective of this exercise is to fit a semi-parametric proportional hazards (PH) model for under-five mortality in Nigeria. The semi-parametric modeling strategy is used in circumstance where it is difficult to specify an appropriate parametric model for the outcome of interest. The Cox’s PH technique developed by Professor David R. Cox in his 1972 classic, open access article titled Regression Models and Life-Tables provides soft-landing in addressing such methodical challenge. This exercise builds on the previous ones on procedures for estimating Time-to-Event Functions and Parametric Proportional Hazard Model.

0.2 Methods

0.2.1 Data

The pooled birth-recode data sets of 2003, 2008, 2013 and 2018 Nigeria Demographic and Health Surveys (NDHS) were used:

hwdata <- read.dta("C:/A/02/01F20/DEM223 Event History Analysis/hwdata.dta")

hwdata <- hwdata %>% select(strata, psu, iwt, d_age, status_60, status_60b, status_12, 
    status_12b, bcat01, sex, bsiz, unionstat, education, wealth, place, region2, 
    dhsyearf) %>% filter(complete.cases(hwdata))

hwdata$bcat01 <- factor(hwdata$bcat01, levels = c("Group 01", "Group 02", "Group 03", 
    "Group 04", "Group 05"), labels = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"))

hwdata$caseid <- c(1:nrow(hwdata))
hwdata$pwt = c(hwdata$iwt/1000000)
dhsdes <- svydesign(id = ~psu, strata = ~strata, data = hwdata, weights = ~pwt)

0.2.2 Outcome

The event variable is whether the last live birth had by a woman in the 59 months preceding the survey is alive (censored) or has died (uncensored). Hence, the observed time-to-event variable is the duration of survival between birth and death within the 5 years in retrospect; while the censoring indicator is the occurrence of under-five death.

0.2.3 Predictor

The main grouping variable is birth category. This is operationalized as child’s demographic risk attribute at birth (DRAB); a composite variable derived from maternal age at birth, birth order and birth interval characteristics. The aggregate DRAB variable was measured and categorized as follow:

Grp01: Middle-maternal age (20-34), combined with birth order 2-4, birth interval 24-59 months
Grp02: Middle-maternal age (20-34), combined with birth order 1
Grp03: Early-maternal age (<20), combined with any birth order, and any birth interval**
Grp04: Middle-maternal age (20-34), combined with birth order 5+, and birth interval 24-59 months
Grp05: Advanced-maternal age (35+), combined with any birth order, and any birth interval**

The resultant composite was coded and compared as Grp01 (0) vs Grp02 (1), Grp03 (2), Grp04 (3), and Grp05 (4).

** It is worthy to note that the DRAB categorization as presented here is for illustration purpose and subject to further refinement. Meanwhile, the categories was so constructed as the my interest is to investigate the effect of stage of childbearing age, especially the ages at the end of childbearing age spectrum, on early-life mortality experience. This is mainly conceived to capture and measure the impact of the mostly avoidable demographic risk associated with extreme maternal age-related fertility behaviors and compare the risk with those characterized by middle maternal age. Also, there were too few cases in birth order 1, and birth interval outside 24-59 range for early-age and advanced-age mothers.

0.2.4 Covariates

For the purpose of more informative analysis, I controlled for a list of selected correlates which I hypothesized to influence the outcome event, on one hand, and the relationship between main grouping variable and the outcome, on the other hand. The variables are listed below:

Child specific:
- Sex: Female (0) vs Male (1)
- Birth size: Large (0) vs Small (1)
Mother specific:
- Union status: In union (0) vs Not in union (1)
- Level of education: None (0) vs Primary (1), Incomplete secondary (2), and Complete secondary/Higher (3)
- Wealth status: Low (0) vs (High)
Context specific:
- Place of residence: Rural (0) vs Urban (1)
- Region of residence: North (0) vs South (1)

0.3 Results

0.3.1 Data exploration

First, I examined the structure of the data and the weighted distributions of survival status by each variable thereof. Also, I examined the characteristics of a few dead children in the data set. I conducted these assessments to have basic understanding of the internal process underlying possible patterns that could result from Cox’s PH model estimates.

0.3.1.1 Data structure

glimpse(hwdata)

Rows: 61,213
Columns: 19
$ strata     <chr> "2003-north west - urban", "2003-north west - urban", "2...
$ psu        <chr> "2003-183", "2003-183", "2003-183", "2003-183", "2003-18...
$ iwt        <int> 474592, 474592, 474592, 474592, 474592, 474592, 474592, ...
$ d_age      <dbl> 14, 17, 19, 49, 27, 26, 1, 27, 8, 14, 32, 12, 10, 3, 46,...
$ status_60  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
$ status_60b <fct> Alive, Alive, Alive, Alive, Alive, Alive, Alive, Alive, ...
$ status_12  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
$ status_12b <fct> Alive, Alive, Alive, Alive, Alive, Alive, Alive, Alive, ...
$ bcat01     <fct> Grp02, Grp02, Grp01, Grp05, Grp04, Grp04, Grp03, Grp05, ...
$ sex        <fct> Male, Male, Female, Male, Female, Female, Male, Male, Fe...
$ bsiz       <fct> Large, Large, Large, Large, Small, Large, Large, Small, ...
$ unionstat  <fct> InUnion, InUnion, InUnion, NotInUnion, NotInUnion, NotIn...
$ education  <fct> None, None, CompSec/High, None, None, Primary, None, Non...
$ wealth     <fct> High, High, High, High, High, High, Low, Low, High, High...
$ place      <fct> Urban, Urban, Urban, Urban, Urban, Urban, Urban, Urban, ...
$ region2    <fct> North, North, North, North, North, North, North, North, ...
$ dhsyearf   <fct> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 20...
$ caseid     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
$ pwt        <dbl> 0.47459, 0.47459, 0.47459, 0.47459, 0.47459, 0.47459, 0....

0.3.1.2 Distributions of survival status

Survival status distribution by birth classification:

round((prop.table(svytable(~bcat01 + status_60b, design = dhsdes), 1) * 100), 1)

       status_60b
bcat01  Alive Dead
  Grp01  95.9  4.1
  Grp02  94.8  5.2
  Grp03  91.8  8.2
  Grp04  94.0  6.0
  Grp05  92.3  7.7

Survival status distribution by sex:

round((prop.table(svytable(~sex + status_60b, design = dhsdes), 1) * 100), 1)

        status_60b
sex      Alive Dead
  Female  94.5  5.5
  Male    93.5  6.5

Survival status distribution by birth size:

round((prop.table(svytable(~bsiz + status_60b, design = dhsdes), 1) * 100), 1)

       status_60b
bsiz    Alive Dead
  Large  94.4  5.6
  Small  91.9  8.1

Survival status distribution by union status:

round((prop.table(svytable(~unionstat + status_60b, design = dhsdes), 1) * 100), 
    1)

            status_60b
unionstat    Alive Dead
  InUnion     94.1  5.9
  NotInUnion  91.9  8.1

Survival status distribution by level of education:

round((prop.table(svytable(~education + status_60b, design = dhsdes), 1) * 100), 
    1)

              status_60b
education      Alive Dead
  None          92.9  7.1
  Primary       93.6  6.4
  IncompSec     94.8  5.2
  CompSec/High  96.1  3.9

Survival status distribution by wealth status:

round((prop.table(svytable(~wealth + status_60b, design = dhsdes), 1) * 100), 1)

      status_60b
wealth Alive Dead
  Low   93.1  6.9
  High  95.7  4.3

Survival status distribution by place of residence:

round((prop.table(svytable(~place + status_60b, design = dhsdes), 1) * 100), 1)

       status_60b
place   Alive Dead
  Rural  93.3  6.7
  Urban  95.3  4.7

Survival status distribution by region of residence:

round((prop.table(svytable(~region2 + status_60b, design = dhsdes), 1) * 100), 1)

       status_60b
region2 Alive Dead
  North  93.4  6.6
  South  95.2  4.8

Survival status distribution by dhs year:

round((prop.table(svytable(~dhsyearf + status_60b, design = dhsdes), 1) * 100), 1)

        status_60b
dhsyearf Alive Dead
    2018  94.2  5.8
    2013  94.5  5.5
    2008  93.6  6.4
    2003  91.6  8.4

0.3.1.3 Charateristics of dead children

First ten censored observations:

head(hwdata[hwdata$status_60 == 1, c("bcat01", "d_age", "sex", "bsiz", "unionstat", 
    "education", "wealth", "place", "region2")], n = 10)

Last ten censored observations:

tail(hwdata[hwdata$status_60 == 1, c("bcat01", "d_age", "sex", "bsiz", "unionstat", 
    "education", "wealth", "place", "region2")], n = 10)

0.3.2 Model specification

0.3.2.1 Kaplan-Meier hazard estimate

Kaplan-Meier estimate of baseline hazard of under-five mortality:

fit.haz.km <- kphaz.fit(time = hwdata$d_age, status = hwdata$status_60, method = "product-limit")
km0 <- data.frame(fit.haz.km)

km1 <- data.frame(km0[km0$time %in% c(0.5:9.5), c("time", "haz")])
km2 <- data.frame(km0[km0$time %in% c(10.5:19.5), c("time", "haz")])
km3 <- data.frame(km0[km0$time >= 20.5, c("time", "haz")])
km <- cbind(km1, km2, km3)
km

0.3.2.2 Cox’s PH model specification

0.3.2.2.1 Adjusted hazard ratios of model main effects

fit.a <- svycoxph(Surv(d_age, status_60) ~ bcat01 + sex + bsiz + unionstat + education + 
    wealth + place + region2 + dhsyearf, design = dhsdes)
summary(fit.a)

Stratified 1 - level Cluster Sampling design (with replacement)
With (3532) clusters.
svydesign(id = ~psu, strata = ~strata, data = hwdata, weights = ~pwt)

Call:
svycoxph(formula = Surv(d_age, status_60) ~ bcat01 + sex + bsiz + 
    unionstat + education + wealth + place + region2 + dhsyearf, 
    design = dhsdes)

  n= 61213, number of events= 3697 

                          coef exp(coef) se(coef) robust se     z Pr(>|z|)    
bcat01Grp02            0.40821   1.50412  0.06933   0.08038  5.08  3.8e-07 ***
bcat01Grp03            0.52543   1.69118  0.05770   0.06584  7.98  1.5e-15 ***
bcat01Grp04            0.28281   1.32686  0.04858   0.05757  4.91  9.0e-07 ***
bcat01Grp05            0.53021   1.69928  0.05087   0.05567  9.52  < 2e-16 ***
sexMale                0.17295   1.18881  0.03317   0.03852  4.49  7.1e-06 ***
bsizSmall              0.36249   1.43691  0.04157   0.04898  7.40  1.3e-13 ***
unionstatNotInUnion    0.26228   1.29989  0.06466   0.07427  3.53  0.00041 ***
educationPrimary      -0.00843   0.99160  0.04669   0.05104 -0.17  0.86878    
educationIncompSec    -0.14342   0.86639  0.06446   0.06959 -2.06  0.03931 *  
educationCompSec/High -0.30311   0.73852  0.06261   0.07424 -4.08  4.5e-05 ***
wealthHigh            -0.19988   0.81883  0.05062   0.05820 -3.43  0.00059 ***
placeUrban            -0.10484   0.90047  0.04434   0.05378 -1.95  0.05123 .  
region2South          -0.11261   0.89350  0.04591   0.05360 -2.10  0.03567 *  
dhsyearf2013          -0.07175   0.93076  0.04192   0.04994 -1.44  0.15081    
dhsyearf2008           0.11030   1.11661  0.04237   0.04886  2.26  0.02398 *  
dhsyearf2003           0.35162   1.42136  0.06374   0.08937  3.93  8.3e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                      exp(coef) exp(-coef) lower .95 upper .95
bcat01Grp02               1.504      0.665     1.285     1.761
bcat01Grp03               1.691      0.591     1.486     1.924
bcat01Grp04               1.327      0.754     1.185     1.485
bcat01Grp05               1.699      0.588     1.524     1.895
sexMale                   1.189      0.841     1.102     1.282
bsizSmall                 1.437      0.696     1.305     1.582
unionstatNotInUnion       1.300      0.769     1.124     1.504
educationPrimary          0.992      1.008     0.897     1.096
educationIncompSec        0.866      1.154     0.756     0.993
educationCompSec/High     0.739      1.354     0.639     0.854
wealthHigh                0.819      1.221     0.731     0.918
placeUrban                0.900      1.111     0.810     1.001
region2South              0.894      1.119     0.804     0.992
dhsyearf2013              0.931      1.074     0.844     1.026
dhsyearf2008              1.117      0.896     1.015     1.229
dhsyearf2003              1.421      0.704     1.193     1.693

Concordance= 0.607  (se = 0.006 )
Likelihood ratio test= NA  on 16 df,   p=NA
Wald test            = 378  on 16 df,   p=<2e-16
Score (logrank) test = NA  on 16 df,   p=NA

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

0.3.2.2.2 Adjusted hazard ratios of model main effects and interaction term

fit.b <- svycoxph(Surv(d_age, status_60) ~ bcat01 + sex + bsiz + unionstat + education + 
    wealth + place + region2 + dhsyearf + (wealth * place), design = dhsdes)
summary(fit.b)

Stratified 1 - level Cluster Sampling design (with replacement)
With (3532) clusters.
svydesign(id = ~psu, strata = ~strata, data = hwdata, weights = ~pwt)

Call:
svycoxph(formula = Surv(d_age, status_60) ~ bcat01 + sex + bsiz + 
    unionstat + education + wealth + place + region2 + dhsyearf + 
    (wealth * place), design = dhsdes)

  n= 61213, number of events= 3697 

                         coef exp(coef) se(coef) robust se     z Pr(>|z|)    
bcat01Grp02            0.4082    1.5041   0.0693    0.0804  5.08  3.8e-07 ***
bcat01Grp03            0.5257    1.6916   0.0577    0.0658  7.99  1.4e-15 ***
bcat01Grp04            0.2819    1.3256   0.0486    0.0576  4.89  9.8e-07 ***
bcat01Grp05            0.5287    1.6967   0.0509    0.0557  9.50  < 2e-16 ***
sexMale                0.1727    1.1885   0.0332    0.0385  4.48  7.3e-06 ***
bsizSmall              0.3627    1.4372   0.0416    0.0490  7.40  1.3e-13 ***
unionstatNotInUnion    0.2585    1.2950   0.0647    0.0742  3.48  0.00049 ***
educationPrimary      -0.0129    0.9872   0.0467    0.0509 -0.25  0.79989    
educationIncompSec    -0.1470    0.8633   0.0645    0.0696 -2.11  0.03474 *  
educationCompSec/High -0.2995    0.7412   0.0625    0.0739 -4.06  5.0e-05 ***
wealthHigh            -0.1053    0.9001   0.0640    0.0720 -1.46  0.14367    
placeUrban            -0.0238    0.9765   0.0555    0.0652 -0.37  0.71507    
region2South          -0.1189    0.8879   0.0460    0.0539 -2.21  0.02734 *  
dhsyearf2013          -0.0688    0.9335   0.0419    0.0499 -1.38  0.16854    
dhsyearf2008           0.1118    1.1183   0.0424    0.0489  2.29  0.02209 *  
dhsyearf2003           0.3526    1.4228   0.0637    0.0895  3.94  8.2e-05 ***
wealthHigh:placeUrban -0.2037    0.8157   0.0878    0.1010 -2.02  0.04372 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                      exp(coef) exp(-coef) lower .95 upper .95
bcat01Grp02               1.504      0.665     1.285     1.761
bcat01Grp03               1.692      0.591     1.487     1.924
bcat01Grp04               1.326      0.754     1.184     1.484
bcat01Grp05               1.697      0.589     1.521     1.892
sexMale                   1.189      0.841     1.102     1.282
bsizSmall                 1.437      0.696     1.306     1.582
unionstatNotInUnion       1.295      0.772     1.120     1.498
educationPrimary          0.987      1.013     0.893     1.091
educationIncompSec        0.863      1.158     0.753     0.990
educationCompSec/High     0.741      1.349     0.641     0.857
wealthHigh                0.900      1.111     0.782     1.036
placeUrban                0.976      1.024     0.859     1.110
region2South              0.888      1.126     0.799     0.987
dhsyearf2013              0.934      1.071     0.846     1.030
dhsyearf2008              1.118      0.894     1.016     1.231
dhsyearf2003              1.423      0.703     1.194     1.696
wealthHigh:placeUrban     0.816      1.226     0.669     0.994

Concordance= 0.608  (se = 0.006 )
Likelihood ratio test= NA  on 17 df,   p=NA
Wald test            = 374  on 17 df,   p=<2e-16
Score (logrank) test = NA  on 17 df,   p=NA

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Comparison of refitted model’s AICs:

fit.aAIC <- AIC(fit.a)
fit.aAIC <- fit.aAIC[2]
fit.bAIC <- AIC(fit.b)
fit.bAIC <- fit.bAIC[2]
coxAIC <- cbind(fit.aAIC, fit.bAIC)
coxAIC

    fit.aAIC fit.bAIC
AIC    78934    78936

0.3.2.3 Risk profiles modeling and visualization

I specified a general Cox’s PH model using DRAB, wealth status and place of residence covariates, generated new data based on those variables to construct a hypothesized under-five children risk profiles, then apply the model and new data to visualize the patterns of survival, hazard, and cumulative hazard of the risk profiles:

fit.c <- svycoxph(Surv(d_age, status_60) ~ bcat01 + wealth + place, design = dhsdes)

dat <- expand.grid(bcat01 = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"), wealth = c("Low", 
    "High"), place = c("Rural", "Urban"))
dat <- data.frame(dat)
head(dat, 10)

tail(dat, 10)

sfit1 <- survfit(fit.c)
sfit2 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Rural"))
sfit3 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Urban"))
sfit4 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Rural"))
sfit5 <- survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Urban"))

H1 <- -log(sfit1$surv)
H2 <- -log(sfit2$surv)
H3 <- -log(sfit3$surv)
H4 <- -log(sfit4$surv)
H5 <- -log(sfit5$surv)

times <- sfit1$time

h1 <- loess(diff(c(0, H1)) ~ times, degree = 1, span = 0.25)
h2 <- loess(diff(c(0, H2)) ~ times, degree = 1, span = 0.25)
h3 <- loess(diff(c(0, H3)) ~ times, degree = 1, span = 0.25)
h4 <- loess(diff(c(0, H4)) ~ times, degree = 1, span = 0.25)
h5 <- loess(diff(c(0, H5)) ~ times, degree = 1, span = 0.25)

Survival function plot of the hypothesized risk profiles:

plot(survfit(fit.c, conf.int = F), ylab = "S(t)", xlab = "Months", ylim = c(0.85, 
    1), xlim = c(0, 60))
title(main = "", "Fig. 1 Survivorship profiles of under-five mortality risks, NDHS 2003-2018")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Rural"), 
    conf.int = F), col = "red")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "Low", place = "Urban"), 
    conf.int = F), col = "green")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Rural"), 
    conf.int = F), col = "purple")
lines(survfit(fit.c, newdata = data.frame(bcat01 = "Grp03", wealth = "High", place = "Urban"), 
    conf.int = F), col = "blue")
legend("bottomleft", legend = c("covariates means", "grp03, low, rural", "grp03, low, urban", 
    "grp03, high, rural", "grp03, high, urban"), lty = 1, col = c(1, "red", "green", 
    "purple", "blue"), cex = 0.8)

Hazard function plot of the hypothesized risk profiles:

plot(predict(h1) ~ times, type = "l", ylab = "h(t), smoothed", xlab = "Months", ylim = c(0.0001, 
    0.015))
title(main = "", "Fig. 2 Hazard profiles of under-five mortality risks, NDHS 2003-2018")
lines(predict(h2) ~ times, type = "l", col = "red")
lines(predict(h3) ~ times, type = "l", col = "green")
lines(predict(h4) ~ times, type = "l", col = "purple")
lines(predict(h5) ~ times, type = "l", col = "blue")
legend("topright", legend = c("covariates means", "grp03, low, rural", "grp03, low, urban", 
    "grp03, high, rural", "grp03, high, urban"), lty = 1, col = c(1, "red", "green", 
    "purple", "blue"), cex = 0.8)

Cumulative hazard function plot of the hypothesized risk profiles:

plot(H1 ~ times, type = "l", ylab = "H(t), smoothed", xlab = "Months", ylim = c(0.005, 
    0.15))
title(main = "", "Fig. 3 Cumulative hazard profiles of under-five mortality risks, NDHS 2003-2018")
lines(H2 ~ times, col = "red", type = "l")
lines(H3 ~ times, col = "green", type = "l")
lines(H4 ~ times, col = "purple", type = "l")
lines(H5 ~ times, col = "blue", type = "l")
legend("topleft", legend = c("covariates means", "grp03, low, rural", "grp03, low, urban", 
    "grp03, high, rural", "grp03, high, urban"), lty = 1, col = c(1, "red", "green", 
    "purple", "blue"), cex = 0.8)

0.4 Interpretations

The results showed substantially significant disparities in risk of dying before age age 5 by DRAB dimensions. Compared to those in reference Grp01, the adjusted hazard of dying is equally greatest among children in Grp03 and Grp05 (AHR: +69%, p<.001), followed by those in Grp02 (AHR: +50%, p<.001), and Grp04 (AHR: +30%, p<.001). This implies that not only does childbearing at extreme maternal ages confer significant mortality risks on children, but also that first births even when had during safe maternal age period of 20-34 years still face enormous risks of not surviving to age 5 in Nigeria. In addition, the significantly higher mortality risk of Grp04 children suggests that having babies outside safe birth interval (24-59 months) could undermine the protective effect of childbearing within safe maternal age range (20-34 years).

It is further revealed that children who are male, small-sized and born to mothers not in union might be more vulnerable to die before their fifth birthday. In contrast, attainment of secondary education, high wealth status and residence in southern region may have significantly positive effect on survival chances. Also, relative to 2018 cohort, risks of dying were substantially higher among 2003 and 2008 cohorts.

Notably, the results indicate that characteristic dependency of wealth status and place of residence might play greater role in under-five survival. This is evident in the model with the interaction term where children of high wealth status mothers who reside in urban settings had about 18% (p<.05) reduced risk of dying before 5. This dependency was further strengthened by graphical representations of survival, hazard and cumulative hazard functions. Meanwhile, the results of the AICs showed that the main effect model without interaction term (AIC: 78934) is more parsimonious than the model with interaction term (AIC: 78936).

0.5 Acknowledgements

This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his compendium of easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks.
The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/.
R package https://www.R-project.org/ was used for data wrangling and analysis.

Created on October 04, 2020