The objective of this lab is to fit a Discrete-Time Hazards (DTH) model for under-five mortality in Nigeria. This exercise builds on the previous ones on procedures for estimating Time-to-Event Functions, Parametric Proportional Hazard Model, and Cox Proportional Hazards Model.
The pooled birth-recode data sets of 2003, 2008, 2013 and 2018 Nigeria Demographic and Health Surveys (NDHS) were used:
hwdata <- read_rds("../hwdata.Rda")
hwdata <- hwdata %>% select(strata, psu, iwt, d_age, status_60, status_60b, status_12,
status_12b, bcat01, sex, bsiz, unionstat, education, wealth, place, region2,
dhsyearf) %>% filter(complete.cases(hwdata))
hwdata$bcat01 <- factor(hwdata$bcat01, levels = c("Group 01", "Group 02", "Group 03",
"Group 04", "Group 05"), labels = c("Grp01", "Grp02", "Grp03", "Grp04", "Grp05"))
hwdata$wealth <- factor(hwdata$wealth, levels = c("Low", "High"), labels = c("LowSES",
"HighSES"))
hwdata$caseid <- c(1:nrow(hwdata))
hwdata$psu = paste(hwdata$dhsyearf, hwdata$psu., sep = "-")
hwdata$strata = paste(hwdata$dhsyearf, hwdata$strata., sep = "-")
hwdata$pwt = c(hwdata$iwt/1000000)The event variable is whether the last live birth had by a woman in the 59 months preceding the survey is alive (censored) or has died (uncensored). Hence, the observed time-to-event variable is the duration of survival between birth and death within the 5 years in retrospect; while the censoring indicator is the occurrence of under-five death.
The main grouping variable is birth category. This is operationalized as child’s demographic risk attribute at birth (DRAB); a composite variable derived from maternal age at birth, birth order and birth interval characteristics. The aggregate DRAB variable was measured and categorized as follow:
The resultant composite was coded and compared as Grp01 (0) vs Grp02 (1), Grp03 (2), Grp04 (3), and Grp05 (4).
** It is worthy to note that the DRAB categorization as presented here is for illustration purpose and subject to further refinement. Meanwhile, the categories was so constructed as the my interest is to investigate the effect of stage of childbearing age, especially the ages at the end of childbearing age spectrum, on early-life mortality experience. This is mainly conceived to capture and measure the impact of the mostly avoidable demographic risk associated with extreme maternal age-related fertility behaviors and compare the risk with those characterized by middle maternal age. Also, there were too few cases in birth order 1, and birth interval outside 24-59 range for early-age and advanced-age mothers.
I purposely selected few covariables which may influence the a child’s chances of survival, on one hand, and the relationship between main independent variable and the outcome, on the other hand. The potential variables are grouped into three categories:
As earlier stated, the focus of this exercise is to examine hazards of underfive mortality using discrete time approach, in which time/duration is discretely measured rather than been observed continuously, as applicable in previous exercises referenced above. By this method, I transformed the dataset from its original case-duration format to a person-period format using 12-month cut-points and the survSplit() embedded in survival library.
ppdata <- survSplit(Surv(d_age, status_60) ~ ., data = hwdata[hwdata$d_age > 0, ],
cut = c(0, 12, 24, 36, 48, 60), episode = "yeardead")
ppdata$year <- ppdata$yeardead - 1
freq(ppdata$year, cumul = F)Frequencies
ppdata$year
Type: Numeric
Freq % Valid % Total
----------- -------- --------- ---------
1 58844 44.64 44.64
2 38103 28.90 28.90
3 21102 16.01 16.01
4 10012 7.59 7.59
5 3771 2.86 2.86
<NA> 0 0.00
Total 131832 100.00 100.00
ppdata <- ppdata[order(ppdata$caseid, ppdata$yeardead), ]
head(ppdata[, c("caseid", "year", "status_60", "d_age", "bcat01", "sex", "education",
"wealth", "place")], n = 30)ppdata %>% group_by(year) %>% summarise(propdeadyr = mean(status_60, na.rm = T)) %>%
ggplot(aes(x = year, y = propdeadyr)) + geom_line() + theme(legend.title = element_text(colour = "red",
size = 9, face = "bold")) + theme(legend.position = "right") + ggtitle(label = "Hazard of underfive mortality by year")ppdata %>% group_by(year, bcat01) %>% summarise(propdeadyr = mean(status_60, na.rm = T)) %>%
ggplot(aes(x = year, y = propdeadyr)) + geom_line(aes(group = factor(bcat01),
color = factor(bcat01))) + theme(legend.title = element_text(colour = "red",
size = 9, face = "bold")) + theme(legend.position = "right") + ggtitle(label = "Hazard of underfive mortality by year and birth category")Also, I examined the survey-weighted pattern of survival status by the characteristics of children who had died to understand the pattern of risks that might inform the DTH model estimates. Meanwhile, the new person-period data was used to specify the survey design parameter.
dhsdes <- svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)
# Survival status distribution by birth classification:
a = round((prop.table(svytable(~bcat01 + status_60b, design = dhsdes), 1) * 100),
1)
# Survival status distribution by sex:
b = round((prop.table(svytable(~sex + status_60b, design = dhsdes), 1) * 100), 1)
# Survival status distribution by level of education:
c = round((prop.table(svytable(~education + status_60b, design = dhsdes), 1) * 100),
1)
# Survival status distribution by wealth status:
d = round((prop.table(svytable(~wealth + status_60b, design = dhsdes), 1) * 100),
1)
# Survival status distribution by place of residence:
e = round((prop.table(svytable(~place + status_60b, design = dhsdes), 1) * 100),
1)
tab1 = data.frame(rbind(a, b, c, d, e))
tab1First ten uncensored observations:
head(hwdata[hwdata$status_60 == 1, c("d_age", "bcat01", "sex", "education", "wealth",
"place")], n = 10)Last ten uncensored observations:
tail(hwdata[hwdata$status_60 == 1, c("d_age", "bcat01", "sex", "education", "wealth",
"place")], n = 10)The base DTH model takes account of time variable only, removing the intercept in the model fit to obtain hazard estimate for each time period:
# I do -1 so that no intercept is fit in the model, and we get a hazard estimate
# for each time period
mod0 <- svyglm(status_60 ~ as.factor(year) - 1, design = dhsdes, family = "binomial")
summary(mod0)
Call:
svyglm(formula = status_60 ~ as.factor(year) - 1, design = dhsdes,
family = "binomial")
Survey design:
svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)
Coefficients:
Estimate Std. Error
as.factor(year)1 -3.80 Inf
as.factor(year)2 -3.87 Inf
as.factor(year)3 -5.74 Inf
as.factor(year)4 -7.11 Inf
as.factor(year)5 -8.97 Inf
(Dispersion parameter for binomial family taken to be 1)
Number of Fisher Scoring iterations: 10
The results from the base DTH model suggest a progressively decreasing hazards of dying among underfive as the observation years increase.
I plotted the hazard function of the base DTH on the probability scale:
haz <- 1/(1 + exp(-coef(mod0)))
time <- seq(1, 5, 1)
plot(haz ~ time, type = "l", ylab = "h(t)")
title(main = "Base DTH function for underfive mortality") The plot confirms the significantly decreasing pattern observed based on the model estimates.
I examine the multivariate proportional hazards model estimates with which incorporated other selected correlates:
mod1 <- svyglm(status_60 ~ as.factor(year) + bcat01 + sex + education + wealth +
place - 1, design = dhsdes, family = binomial(link = "cloglog"))
summary(mod1)
Call:
svyglm(formula = status_60 ~ as.factor(year) + bcat01 + sex +
education + wealth + place - 1, design = dhsdes, family = binomial(link = "cloglog"))
Survey design:
svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)
Coefficients:
Estimate Std. Error
as.factor(year)1 -3.830 Inf
as.factor(year)2 -3.904 Inf
as.factor(year)3 -5.763 Inf
as.factor(year)4 -7.118 Inf
as.factor(year)5 -8.955 Inf
bcat01Grp02 0.181 Inf
bcat01Grp03 0.408 Inf
bcat01Grp04 0.301 Inf
bcat01Grp05 0.479 Inf
sexMale 0.108 Inf
educationPrimary -0.142 Inf
educationIncompSec -0.233 Inf
educationCompSec/High -0.659 Inf
wealthHighSES -0.362 Inf
placeUrban -0.178 Inf
(Dispersion parameter for binomial family taken to be 0.95803)
Number of Fisher Scoring iterations: 11
Aside the consistent reduction in hazards of dying across the defined time periods, the hazards varied significantly by birth classification, sex of the child, mother’s education attainment, wealth status and place of residence. For instance, the risks of deaths were highest among children in Grp03 and Grp05 with 50.38% and 61.45% significantly higher hazards of underfive mortality follwed by those in Grp04 compared with their Grp01 peers. In contrast, children born to highly educated, high ses and urban women had approximately -48.26%, -48.26%, and -48.26% lower mortality risks relative to those born to lowly educated, low ses and rural women, respectively.
I fitted a predictive DTH model with single independent variable and examined the within-between pattern of proportional mortality hazards:
mod2 <- svyglm(status_60 ~ as.factor(year) + wealth - 1, design = dhsdes, family = binomial(link = "cloglog"))
summary(mod2)
Call:
svyglm(formula = status_60 ~ as.factor(year) + wealth - 1, design = dhsdes,
family = binomial(link = "cloglog"))
Survey design:
svydesign(id = ~psu, strata = ~strata, data = ppdata, weights = ~pwt)
Coefficients:
Estimate Std. Error
as.factor(year)1 -3.595 Inf
as.factor(year)2 -3.662 Inf
as.factor(year)3 -5.513 Inf
as.factor(year)4 -6.861 Inf
as.factor(year)5 -8.692 Inf
wealthHighSES -0.784 Inf
(Dispersion parameter for binomial family taken to be 0.98259)
Number of Fisher Scoring iterations: 11
datapred <- expand.grid(year = 1:5, wealth = c(0, 1))
datapred$pred <- as.numeric(predict(mod2, newdata = datapred, type = "response"))
datapred %>% ggplot(aes(x = year, y = pred, color = factor(wealth), group = factor(wealth))) +
geom_line() + ggtitle(label = "Hazard of underfive mortality by defined duration and birth category")This exercise was inspired by Dr. Corey Sparks’ lab session on “DEM 7223: Event History Analysis” Fall 2020 coursework, and his compendium of easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks.
The Demographic and Health Survey data can be downloaded with permission from the DHS Program at https://www.dhsprogram.com/data/.
R package https://www.R-project.org/ was used for data wrangling and analysis.
Created on October 18, 2020