e04: Discrete-Time Hazards Model

0.1 Objective

The objective of this lab is to fit a Discrete-Time Hazards (DTH) model for under-five mortality in Nigeria. This exercise builds on the previous ones on procedures for estimating Time-to-Event Functions, Parametric Proportional Hazard Model, and Cox Proportional Hazards Model.

0.2 Methods

0.2.1 Data

The pooled birth-recode data sets of 2003, 2008, 2013 and 2018 Nigeria Demographic and Health Surveys (NDHS) were used:

hwdata <- read_rds("../hwdata.Rda")

hwdata <- hwdata %>% select(strata, psu, iwt, d_age, status_60, status_60b, status_12, 
    status_12b, bcat01, sex, bsiz, unionstat, education, wealth, place, region2, 
    dhsyearf) %>% filter(complete.cases(hwdata))

hwdata$bcat01 <- factor(hwdata$bcat01, levels = c("Group 01", "Group 02", "Group 03", 
    "Group 04", "Group 05"), labels = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"))
hwdata$wealth <- factor(hwdata$wealth, levels = c("Low", "High"), labels = c("LowSES", 
    "HighSES"))

hwdata$caseid <- c(1:nrow(hwdata))

hwdata$psu = paste(hwdata$dhsyearf, hwdata$psu., sep = "-")
hwdata$strata = paste(hwdata$dhsyearf, hwdata$strata., sep = "-")
hwdata$pwt = c(hwdata$iwt/1000000)

0.2.2 Outcome

The event variable is whether the last live birth had by a woman in the 59 months preceding the survey is alive (censored) or has died (uncensored). Hence, the observed time-to-event variable is the duration of survival between birth and death within the 5 years in retrospect; while the censoring indicator is the occurrence of under-five death.

0.2.3 Predictor

The main grouping variable is birth category. This is operationalized as child’s demographic risk attribute at birth (DRAB); a composite variable derived from maternal age at birth, birth order and birth interval characteristics. The aggregate DRAB variable was measured and categorized as follow:

Grp01: Middle-maternal age (20-34), combined with birth order 2-4, birth interval 24-59 months
Grp02: Middle-maternal age (20-34), combined with birth order 1
Grp03: Early-maternal age (<20), combined with any birth order, and any birth interval**
Grp04: Middle-maternal age (20-34), combined with birth order 5+, and birth interval 24-59 months
Grp05: Advanced-maternal age (35+), combined with any birth order, and any birth interval**

The resultant composite was coded and compared as Grp01 (0) vs Grp02 (1), Grp03 (2), Grp04 (3), and Grp05 (4).

** It is worthy to note that the DRAB categorization as presented here is for illustration purpose and subject to further refinement. Meanwhile, the categories was so constructed as the my interest is to investigate the effect of stage of childbearing age, especially the ages at the end of childbearing age spectrum, on early-life mortality experience. This is mainly conceived to capture and measure the impact of the mostly avoidable demographic risk associated with extreme maternal age-related fertility behaviors and compare the risk with those characterized by middle maternal age. Also, there were too few cases in birth order 1, and birth interval outside 24-59 range for early-age and advanced-age mothers.

0.2.4 Covariates

I purposely selected few covariables which may influence the a child’s chances of survival, on one hand, and the relationship between main independent variable and the outcome, on the other hand. The potential variables are grouped into three categories:

Child specific:
- Sex: Female (0) vs Male (1)
- Birth size: Large (0) vs Small (1)
Mother specific:
- Union status: In union (0) vs Not in union (1)
- Level of education: None (0) vs Primary (1), Incomplete secondary (2), and Complete secondary/Higher (3)
- Wealth status: Low (0) vs (High)
Context specific:
- Place of residence: Rural (0) vs Urban (1)
- Region of residence: North (0) vs South (1)

0.3 Data restructuring

0.3.1 Creating person-period data

As earlier stated, the focus of this exercise is to examine hazards of underfive mortality using discrete time approach, in which time/duration is discretely measured rather than been observed continuously, as applicable in previous exercises referenced above. By this method, I transformed the dataset from its original case-duration format to a person-period format using 12-month cut-points and the survSplit() embedded in survival library.

ppdata <- survSplit(Surv(d_age, status_60) ~ ., data = hwdata[hwdata$d_age > 0, ], 
    cut = c(0, 12, 24, 36, 48, 60), episode = "yeardead")
ppdata$year <- ppdata$yeardead - 1
freq(ppdata$year, cumul = F)

Frequencies  
ppdata$year  
Type: Numeric  

                Freq   % Valid   % Total
----------- -------- --------- ---------
          1    58844     44.64     44.64
          2    38103     28.90     28.90
          3    21102     16.01     16.01
          4    10012      7.59      7.59
          5     3771      2.86      2.86
       <NA>        0                0.00
      Total   131832    100.00    100.00

ppdata <- ppdata[order(ppdata$caseid, ppdata$yeardead), ]
head(ppdata[, c("caseid", "year", "status_60", "d_age", "bcat01", "sex", "education", 
    "wealth", "place")], n = 30)

0.4 Results

0.4.1 Descriptive analysis

ppdata %>% group_by(year) %>% summarise(propdeadyr = mean(status_60, na.rm = T)) %>% 
    ggplot(aes(x = year, y = propdeadyr)) + geom_line() + theme(legend.title = element_text(colour = "red", 
    size = 9, face = "bold")) + theme(legend.position = "right") + ggtitle(label = "Hazard of underfive mortality by year")

ppdata %>% group_by(year, bcat01) %>% summarise(propdeadyr = mean(status_60, na.rm = T)) %>% 
    ggplot(aes(x = year, y = propdeadyr)) + geom_line(aes(group = factor(bcat01), 
    color = factor(bcat01))) + theme(legend.title = element_text(colour = "red", 
    size = 9, face = "bold")) + theme(legend.position = "right") + ggtitle(label = "Hazard of underfive mortality by year and birth category")

0.4.2 Mortality pattern by charateristics

Also, I examined the survey-weighted pattern of survival status by the characteristics of children who had died to understand the pattern of risks that might inform the DTH model estimates. Meanwhile, the new person-period data was used to specify the survey design parameter.

dhsdes <- svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)

# Survival status distribution by birth classification:
a = round((prop.table(svytable(~bcat01 + status_60b, design = dhsdes), 1) * 100), 
    1)

# Survival status distribution by sex:
b = round((prop.table(svytable(~sex + status_60b, design = dhsdes), 1) * 100), 1)

# Survival status distribution by level of education:
c = round((prop.table(svytable(~education + status_60b, design = dhsdes), 1) * 100), 
    1)

# Survival status distribution by wealth status:
d = round((prop.table(svytable(~wealth + status_60b, design = dhsdes), 1) * 100), 
    1)

# Survival status distribution by place of residence:
e = round((prop.table(svytable(~place + status_60b, design = dhsdes), 1) * 100), 
    1)

tab1 = data.frame(rbind(a, b, c, d, e))
tab1

First ten uncensored observations:

head(hwdata[hwdata$status_60 == 1, c("d_age", "bcat01", "sex", "education", "wealth", 
    "place")], n = 10)

Last ten uncensored observations:

tail(hwdata[hwdata$status_60 == 1, c("d_age", "bcat01", "sex", "education", "wealth", 
    "place")], n = 10)

0.4.3 Fitting the DTH model

0.4.3.1 The base model

The base DTH model takes account of time variable only, removing the intercept in the model fit to obtain hazard estimate for each time period:

# I do -1 so that no intercept is fit in the model, and we get a hazard estimate
# for each time period
mod0 <- svyglm(status_60 ~ as.factor(year) - 1, design = dhsdes, family = "binomial")
summary(mod0)


Call:
svyglm(formula = status_60 ~ as.factor(year) - 1, design = dhsdes, 
    family = "binomial")

Survey design:
svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)

Coefficients:
                 Estimate Std. Error
as.factor(year)1    -3.80        Inf
as.factor(year)2    -3.87        Inf
as.factor(year)3    -5.74        Inf
as.factor(year)4    -7.11        Inf
as.factor(year)5    -8.97        Inf

(Dispersion parameter for binomial family taken to be 1)

Number of Fisher Scoring iterations: 10

The results from the base DTH model suggest a progressively decreasing hazards of dying among underfive as the observation years increase.

0.4.3.2 Base DTH model hazard function plot

I plotted the hazard function of the base DTH on the probability scale:

haz <- 1/(1 + exp(-coef(mod0)))
time <- seq(1, 5, 1)
plot(haz ~ time, type = "l", ylab = "h(t)")
title(main = "Base DTH function for underfive mortality")

The plot confirms the significantly decreasing pattern observed based on the model estimates.

0.4.3.3 The multivariable model

I examine the multivariate proportional hazards model estimates with which incorporated other selected correlates:

mod1 <- svyglm(status_60 ~ as.factor(year) + bcat01 + sex + education + wealth + 
    place - 1, design = dhsdes, family = binomial(link = "cloglog"))
summary(mod1)


Call:
svyglm(formula = status_60 ~ as.factor(year) + bcat01 + sex + 
    education + wealth + place - 1, design = dhsdes, family = binomial(link = "cloglog"))

Survey design:
svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)

Coefficients:
                      Estimate Std. Error
as.factor(year)1        -3.830        Inf
as.factor(year)2        -3.904        Inf
as.factor(year)3        -5.763        Inf
as.factor(year)4        -7.118        Inf
as.factor(year)5        -8.955        Inf
bcat01Grp02              0.181        Inf
bcat01Grp03              0.408        Inf
bcat01Grp04              0.301        Inf
bcat01Grp05              0.479        Inf
sexMale                  0.108        Inf
educationPrimary        -0.142        Inf
educationIncompSec      -0.233        Inf
educationCompSec/High   -0.659        Inf
wealthHighSES           -0.362        Inf
placeUrban              -0.178        Inf

(Dispersion parameter for binomial family taken to be 0.95803)

Number of Fisher Scoring iterations: 11

Aside the consistent reduction in hazards of dying across the defined time periods, the hazards varied significantly by birth classification, sex of the child, mother’s education attainment, wealth status and place of residence. For instance, the risks of deaths were highest among children in Grp03 and Grp05 with 50.38% and 61.45% significantly higher hazards of underfive mortality follwed by those in Grp04 compared with their Grp01 peers. In contrast, children born to highly educated, high ses and urban women had approximately -48.26%, -48.26%, and -48.26% lower mortality risks relative to those born to lowly educated, low ses and rural women, respectively.

0.4.3.4 Plot depicting predicted hazards

I fitted a predictive DTH model with single independent variable and examined the within-between pattern of proportional mortality hazards:

mod2 <- svyglm(status_60 ~ as.factor(year) + wealth - 1, design = dhsdes, family = binomial(link = "cloglog"))
summary(mod2)


Call:
svyglm(formula = status_60 ~ as.factor(year) + wealth - 1, design = dhsdes, 
    family = binomial(link = "cloglog"))

Survey design:
svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)

Coefficients:
                 Estimate Std. Error
as.factor(year)1   -3.595        Inf
as.factor(year)2   -3.662        Inf
as.factor(year)3   -5.513        Inf
as.factor(year)4   -6.861        Inf
as.factor(year)5   -8.692        Inf
wealthHighSES      -0.784        Inf

(Dispersion parameter for binomial family taken to be 0.98259)

Number of Fisher Scoring iterations: 11

datapred <- expand.grid(year = 1:5, wealth = c(0, 1))

datapred$pred <- as.numeric(predict(mod2, newdata = datapred, type = "response"))

datapred %>% ggplot(aes(x = year, y = pred, color = factor(wealth), group = factor(wealth))) + 
    geom_line() + ggtitle(label = "Hazard of underfive mortality by defined duration and birth category")

0.5 Acknowledgements

This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his compendium of easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks.
The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/.
R package https://www.R-project.org/ was used for data wrangling and analysis.

Created on October 18, 2020