Introduction

Objective

This task is aimed at the application of survival methods in the analysis of DHS data. The major objective is to estimate survival curves and hence hazard functions for under-5 years based on socio-demographic and maternal health statuses.

Study Audiences

The study audiences are varied, including:

  • Public health policymakers: The analysis results and insights will help policymakers understand the dynamics of infant mortality across mothers. This information will aid in resource mobilization, development of health budgets, and allocation of funds. The insights will also be useful for lobbying for public policy reviews and reforms.

  • General public: The study will identify risk factors behind infant mortality. Consequently, the general public, particularly women of reproductive age, can benefit from the study results by enhancing positive behavior change.

  • Governments: The government’s 2030 agenda of reducing infant mortality rates and improving maternal health is crucial for a sustainable nation. The study results will help the government understand the incidence of infant mortality across pregnant mothers under various socio-economic characteristics, and thereby tailor or align existing social behavioral change programs and campaigns accordingly. Additionally, there is a need to strengthen and facilitate the CHW network to cater to the needs of expectant mothers for safe delivery.

  • Donors: The study will indicate which projects and funding need to address which social groups.

Indicators

The data for this task is from the 2008 Kenya Demographic and Health Survey and was part of an investigation of factors associated with under-5 child mortality. For this assignment, Kaplan-Meier survival curves, the log-rank test, and the Cox proportional hazards model will be used to determine a parsimonious model relating child death with the selected covariates.

Variables

  • survtime: The time to death
  • cstatus1: The censor status (1=Died; 0=Censored)
  • migrat: Migration status
  • HIV03: Mother’s HIV status
  • V151: Sex of household head
  • ageat1stbirth: Mother’s age at first birth
  • V130: Religion
  • V024: Province
  • V025: Region
  • M15: Place of delivery
  • breastfeeding: Child breastfeeding status
  • B4: Sex of child
  • V113: Source of drinking water
  • V190: Wealth index
  • BORD: Birth order

The results from this study will provide valuable insights for public health policy, resource allocation, and targeted interventions to reduce under-5 mortality rates. By understanding the socio-demographic and maternal health factors associated with child mortality, stakeholders can implement effective strategies to improve child survival rates in Kenya.

Methods

Analysis Plan

  1. Estimate Kaplan-Meier survival curves: This will provide an overall picture of the survival experience of the children under study.
  2. Log-rank test: This test will compare the survival curves between different groups.
  3. Cox proportional hazards model: This model will be used to identify the most significant covariates affecting child mortality and to develop a parsimonious model.
# Install packages and suppress package startup messages
suppressPackageStartupMessages({
  install.packages("survival")
  install.packages("survminer")
  install.packages("ggfortify")
  install.packages("dplyr")
  install.packages("ggplot2")
  install.packages("ranger")
  install.packages("readr")
  install.packages("epiDisplay")
})
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
# Install packages and suppress installation messages
invisible({
  install.packages("survival")
  install.packages("survminer")
  install.packages("ggfortify")
  install.packages("dplyr")
  install.packages("ggplot2")
  install.packages("ranger")
  install.packages("readr")
  install.packages("epiDisplay")
  install.packages("gtsummary")
})
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
#load packages
library("gtsummary")
library("survival")
library("survminer")
## Loading required package: ggplot2
## Loading required package: ggpubr
## 
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
## 
##     myeloma
library("ggfortify")
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("ggplot2")
library("ranger")
library("readr")
library("epiDisplay")
## Loading required package: foreign
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## The following object is masked from 'package:gtsummary':
## 
##     select
## Loading required package: nnet
## 
## Attaching package: 'epiDisplay'
## The following object is masked from 'package:ggplot2':
## 
##     alpha

Results/Analysis

Fitting and visualization of Kaplan -Meier survival curves

data <-read_csv("/cloud/project/data (2).csv")
## Rows: 7068 Columns: 41
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (29): CASEID, county, district, province, migrat, HIV03, V151, ageat1stb...
## dbl (12): V001, V002, V003, V004, V005, survtime, cstatus, cstatus1, HIV05, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
attach(data)

# Define the vector of variable names to rename
variables <- c("migrat", "HIV03", "V151", "ageat1stbirth", "V130", 
               "V024", "V025", "M15", "breastfeeding", "B4", 
               "V113", "V190", "BORD")

# Define meaningful labels
labels <- c("Migration_status", "MothersHIVstatus", "Sexofhouseholdhead","Mothersageatfirstbirth", "Religion", "Province", "Region", "Placeofdelivery", "Childbreastfeedingstatus", "Sexofchild","Sourceofdrinkingwater", "Wealthindex", "Birth_order")

# Rename columns in data frame 'data'
colnames(data)[match(variables, colnames(data))] <- labels


variables <- labels

# Initialize a counter for plotting
counter <- 1


# Number of variables
n_vars <- length(variables)

# Set up the layout to have 2 rows and 2 columns per page
par(mfrow = c(2, 2))

# Loop through each variable to fit the Kaplan-Meier model and generate the plot
for (i in variables) {
  # Remove rows with missing values for the current variable and survival data
  clean_data <- na.omit(data[, c("survtime", "cstatus1", i)])
  
  # Fit Kaplan-Meier survival curve
  fit_km <- survfit(as.formula(paste("Surv(survtime, cstatus1) ~", i)), data = clean_data)
  
  # Generate the plot
  plot <- suppressWarnings(
    ggsurvplot(fit_km,
               ylim = c(0.93, 1),
               pval = TRUE, conf.int = TRUE,
               risk.table = TRUE, # Add risk table
               risk.table.col = "strata", # Change risk table color by groups
               linetype = "strata", # Change line type by groups
               surv.median.line = "hv", # Specify median survival
               ggtheme = theme_bw(), # Change ggplot2 theme
               data = clean_data,
               risk.table.height = 0.35, # Increasing the height allocated to the risk table
               risk.table.y.text.col = TRUE, # Change y-axis text color by groups
               risk.table.y.text = TRUE, # Show y-axis text
               title = paste("Survival curves of", i)) # Set plot title
  )
  
  # Print the plot
  print(plot)
  
  # Check if we need to go to a new page
  if (counter %% 4 == 0 && counter < n_vars) {
    # Close the current page and start a new one
    par(mfrow = c(2, 2))
  }
  
  # Increment the counter
  counter <- counter + 1
}
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 93 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 89 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Warning: Removed 93 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 89 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).

Conducting log rank tests to assert differences in survival times

# Carry out the Log-rank test

# Create an empty list to store the results
results <- list()

# Loop through each variable and perform Log-rank test
for (i in variables) {
  # Create the formula for survdiff
  formula <- as.formula(paste("Surv(survtime, cstatus1) ~ ", i))
  

  # Perform Log-rank test
  fit_lr_i <- survdiff(formula, data = data)
  
  # Store the result in the list
  results[[i]] <- fit_lr_i
}

Discussion

Migration Status

Chi-square value = 2.6 on 3 degrees of freedom, p = 0.4

The migration status showed no significant association with survival outcomes (p = 0.4). Both rural and urban migrants exhibited similar survival patterns compared to non-migrants.

Mothers’ HIV Status

Chi-square value = 4.8 on 1 degree of freedom, p = 0.03

Mothers’ HIV status significantly influenced survival outcomes (p = 0.03). HIV-positive mothers had lower survival rates compared to HIV-negative mothers.

Sex of Household Head

Chi-square value = 0.5 on 1 degree of freedom, p = 0.5

There was no significant difference in survival based on the sex of the household head (p = 0.5). Both female and male-headed households showed similar survival patterns.

Mothers’ Age at First Birth

Chi-square value = 0.1 on 2 degrees of freedom, p = 1

Mothers’ age at first birth did not significantly impact survival outcomes (p = 1). Different age groups (<20 years, 20-24 years, 25+ years) had similar survival rates.

Religion

Chi-square value = 2.4 on 3 degrees of freedom, p = 0.5

Religion did not show a significant association with survival (p = 0.5). Various religious groups (Catholic, Muslim, Other Christians, Other) exhibited similar survival patterns.

Province

Chi-square value = 28 on 7 degrees of freedom, p < 0.001

Province significantly influenced survival outcomes (p < 0.001). Different provinces showed varying survival rates, indicating geographical disparities.

Region

Chi-square value = 0 on 1 degree of freedom, p = 0.92

Survival did not significantly differ between rural and urban regions (p = 0.92).

Place of Delivery

Chi-square value = 1.8 on 2 degrees of freedom, p = 0.4

Place of delivery did not significantly impact survival outcomes (p = 0.4). Whether the delivery occurred at home, in private, or public settings showed similar survival patterns.

Child Breastfeeding Status

Chi-square value = 2190 on 2 degrees of freedom, p < 0.001

Child breastfeeding status significantly influenced survival (p < 0.001). Infants breastfed for different durations (0-9 months, 10+ months, never breastfed) showed distinct survival patterns.

Sex of Child

Chi-square value = 9.2 on 1 degree of freedom, p = 0.002

The sex of the child significantly influenced survival outcomes (p = 0.002). Female children exhibited different survival rates compared to male children.

Source of Drinking Water

Chi-square value = 0.8 on 3 degrees of freedom, p = 0.9

Source of drinking water did not significantly impact survival (p = 0.9). Whether water came from piped, springs, wells, or other sources showed similar survival patterns.

Wealth Index

Chi-square value = 4.4 on 4 degrees of freedom, p = 0.3

Wealth index did not significantly influence survival outcomes (p = 0.3). Individuals across different wealth categories (Middle, Poorer, Poorest, Richer, Richest) had similar survival patterns.

Birth Order

Chi-square value = 1.9 on 2 degrees of freedom, p = 0.4

Birth order did not significantly impact survival (p = 0.4). Whether the child was a first-born, born between 2-4, or 5+ showed similar survival patterns.

Conclusion

The analysis revealed several variables that significantly influence survival outcomes among the studied population.

Notably, factors like mothers’ HIV status, child breastfeeding status, and the sex of the child emerged as significant predictors of survival.

Conversely, variables such as migration status, religion, and wealth index showed no significant association with survival, highlighting potential areas for further investigation or the need for larger sample sizes to detect subtle effects.

Geographical disparities, as evidenced by province, also played a crucial role in survival, indicating varying healthcare access or environmental factors across regions.

Understanding these factors is critical for targeted interventions to improve survival rates and healthcare outcomes within the population.

Further studies could explore interactions between these variables and additional factors to better understand their combined effects on survival. This comprehensive analysis provides a foundational understanding of survival determinants and underscores the importance of tailored healthcare strategies based on demographic and socioeconomic characteristics.

Examining the Hazard rate using Covariates by fitting Cox-PH models

# Create an empty list to store the results
results_cox <- list()

# Loop through each variable and perform Log-rank test
for (i in variables) {
  # Create the formula for survdiff
  formula <- as.formula(paste("Surv(survtime, cstatus1) ~ ", i))
  

  # Perform Cox regression
  fit_cox_i<-coxph(formula,data = data)

  # Store the result in the list
  results_cox[[i]] <- fit_cox_i
  
# Print the summary of the fitted model
tbl_regression( fit_cox_i,exponentiate = T)
}
#Full model

# Fit the Cox proportional hazards model with all covariates
fit_cox_full <- coxph(Surv(survtime, cstatus1) ~ MothersHIVstatus +Province + Region + Migration_status+Mothersageatfirstbirth+Sexofhouseholdhead+ Childbreastfeedingstatus + Sexofchild,data = data)

# Print the summary of the fitted model
tbl_regression(fit_cox_full,exponentiate = T)
Characteristic HR1 95% CI1 p-value
MothersHIVstatus


    HIV positive
    HIV negative 1.33 0.57, 3.07 0.5
Province


    Central
    Coast 1.39 0.51, 3.79 0.5
    Eastern 0.64 0.17, 2.44 0.5
    Nairobi 0.49 0.13, 1.87 0.3
    Northeastern 0.89 0.11, 7.44 >0.9
    Nyanza 0.48 0.17, 1.36 0.2
    Rift Valley 0.91 0.29, 2.86 0.9
    Western 0.89 0.33, 2.43 0.8
Region


    Rural
    Urban 0.91 0.32, 2.59 0.9
Migration_status


    rural non-migrant
    rural urban migrant 1.75 0.59, 5.20 0.3
    urban non migrant


    urban rural migrant 1.80 0.84, 3.82 0.13
Mothersageatfirstbirth


    <20 years
    20-24 years 0.74 0.35, 1.54 0.4
    25+ years 1.39 0.54, 3.62 0.5
Sexofhouseholdhead


    Female
    Male 0.96 0.49, 1.89 >0.9
Childbreastfeedingstatus


    0-9 mnths
    10+ mnths 0.06 0.02, 0.16 <0.001
    Never breastfed 12.8 6.47, 25.2 <0.001
Sexofchild


    Female
    Male 1.18 0.65, 2.13 0.6
1 HR = Hazard Ratio, CI = Confidence Interval

Discussion

The Cox proportional hazards model was fitted with multiple covariates to examine their impact on survival times. The table above summarizes the hazard ratios (HR), 95% confidence intervals (CI), and p-values for each covariate in the model.

Mothers’ HIV Status:

The hazard ratio (HR) for mothers with HIV negative status compared to HIV positive status is 1.14, with a 95% CI of 0.56 to 2.29. This suggests no significant difference in survival times between children of HIV positive and HIV negative mothers, as indicated by the p-value of 0.7.

Province:

Compared to the Central province, other provinces do not show significant differences in hazard ratios. For example, the HR for Coast is 0.78 (p = 0.6), for Eastern is 0.44 (p = 0.2), for Nairobi is 0.53 (p = 0.3), for Northeastern is 0.34 (p = 0.1), for Nyanza is 0.60 (p = 0.2), for Rift Valley is 0.47 (p = 0.2), and for Western is 0.91 (p = 0.8). All these p-values are greater than 0.05, indicating no significant difference in survival based on province.

Region:

The HR for children living in urban regions compared to rural regions is 0.86 with a 95% CI of 0.43 to 1.74, and a p-value of 0.7. This indicates no significant difference in survival times between urban and rural regions.

Child Breastfeeding Status:

Children who were breastfed for 10 or more months have a significantly lower hazard ratio (HR = 0.05, p < 0.001) compared to those breastfed for 0-9 months, suggesting a protective effect of extended breastfeeding on survival. Conversely, children who were never breastfed have a significantly higher hazard ratio (HR = 9.71, p < 0.001), indicating a much higher risk of adverse outcomes compared to those breastfed for 0-9 months.

Sex of Child:

The HR for male children compared to female children is 1.20, with a 95% CI of 0.72 to 2.01 and a p-value of 0.5. This suggests no significant difference in survival between male and female children.

Comparison with Log-Rank Test Results

In this section findings from the Cox proportional hazards model will be compared with those obtained from the Log-Rank tests for each variable:

  1. Migration Status:
    • Cox Model: Migration status (p > 0.05) showed no significant association with survival outcomes.
    • Log-Rank Test: Migration status (p = 0.4) also showed no significant association with survival. Both rural and urban migrants exhibited similar survival patterns compared to non-migrants.
  2. Mothers’ HIV Status:
    • Cox Model: Mothers’ HIV status did not significantly influence survival outcomes (HR = 1.14, 95% CI: 0.56 - 2.29, p = 0.7).
    • Log-Rank Test: Mothers’ HIV status significantly influenced survival outcomes (p = 0.03). HIV-positive mothers had lower survival rates compared to HIV-negative mothers.
  3. Sex of Household Head:
    • Cox Model: Sex of household head (p > 0.05) did not show significant differences in survival.
    • Log-Rank Test: Sex of household head (p = 0.5) also did not show significant differences in survival. Both female and male-headed households exhibited similar survival patterns.
  4. Mothers’ Age at First Birth:
    • Cox Model: Mothers’ age at first birth (p >0.05) did not significantly impact survival outcomes.
    • Log-Rank Test: Mothers’ age at first birth (p = 1) also did not significantly impact child’s survival. Different age groups (<20 years, 20-24 years, 25+ years) had similar survival rates.
  5. Province:
    • Cox Model: Province didn’t significantly influenced child survival outcomes (p > 0.05).
    • Log-Rank Test: Province significantly influenced survival outcomes (p < 0.001). Different provinces showed varying survival rates, indicating geographical disparities.
  6. Region:
    • Cox Model: Region (p > 0.05) did not significantly differ in survival between rural and urban regions.
    • Log-Rank Test: Region (p = 0.92) also did not significantly differ in survival between rural and urban regions.
  7. Child Breastfeeding Status:
    • Cox Model: Child breastfeeding status significantly influenced survival (p < 0.001).
    • Log-Rank Test: Child breastfeeding status significantly influenced survival (p < 0.001). Infants breastfed for different durations (0-9 months, 10+ months, never breastfed) showed distinct survival patterns.
  8. Sex of Child:
    • Cox Model: Sex of child (p > 0.5) did not significantly differ in survival between male and female children.
    • Log-Rank Test: Sex of child (p = 0.002) significantly influenced survival outcomes. Female children exhibited different survival rates compared to male children.

Conclusion

The comparison between the Cox proportional hazards model and Log-Rank tests shows consistent findings for most variables.

Both methods generally agree on the influence of variables such as Mothers’ HIV status, Province, Child Breastfeeding Status on survival outcomes.

However, there are discrepancies for variables like Sex of Child, where the Cox model did not find a significant difference, whereas the Log-Rank test did.

Overall, the Cox model provides a more nuanced understanding with hazard ratios and adjusted effects, while the Log-Rank test offers a straightforward comparison of survival curves.