This task is aimed at the application of survival methods in the analysis of DHS data. The major objective is to estimate survival curves and hence hazard functions for under-5 years based on socio-demographic and maternal health statuses.
The study audiences are varied, including:
Public health policymakers: The analysis results and insights will help policymakers understand the dynamics of infant mortality across mothers. This information will aid in resource mobilization, development of health budgets, and allocation of funds. The insights will also be useful for lobbying for public policy reviews and reforms.
General public: The study will identify risk factors behind infant mortality. Consequently, the general public, particularly women of reproductive age, can benefit from the study results by enhancing positive behavior change.
Governments: The government’s 2030 agenda of reducing infant mortality rates and improving maternal health is crucial for a sustainable nation. The study results will help the government understand the incidence of infant mortality across pregnant mothers under various socio-economic characteristics, and thereby tailor or align existing social behavioral change programs and campaigns accordingly. Additionally, there is a need to strengthen and facilitate the CHW network to cater to the needs of expectant mothers for safe delivery.
Donors: The study will indicate which projects and funding need to address which social groups.
The data for this task is from the 2008 Kenya Demographic and Health Survey and was part of an investigation of factors associated with under-5 child mortality. For this assignment, Kaplan-Meier survival curves, the log-rank test, and the Cox proportional hazards model will be used to determine a parsimonious model relating child death with the selected covariates.
The results from this study will provide valuable insights for public health policy, resource allocation, and targeted interventions to reduce under-5 mortality rates. By understanding the socio-demographic and maternal health factors associated with child mortality, stakeholders can implement effective strategies to improve child survival rates in Kenya.
# Install packages and suppress package startup messages
suppressPackageStartupMessages({
install.packages("survival")
install.packages("survminer")
install.packages("ggfortify")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("ranger")
install.packages("readr")
install.packages("epiDisplay")
})
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
# Install packages and suppress installation messages
invisible({
install.packages("survival")
install.packages("survminer")
install.packages("ggfortify")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("ranger")
install.packages("readr")
install.packages("epiDisplay")
install.packages("gtsummary")
})
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
#load packages
library("gtsummary")
library("survival")
library("survminer")
## Loading required package: ggplot2
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
library("ggfortify")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("ggplot2")
library("ranger")
library("readr")
library("epiDisplay")
## Loading required package: foreign
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## The following object is masked from 'package:gtsummary':
##
## select
## Loading required package: nnet
##
## Attaching package: 'epiDisplay'
## The following object is masked from 'package:ggplot2':
##
## alpha
data <-read_csv("/cloud/project/data (2).csv")
## Rows: 7068 Columns: 41
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (29): CASEID, county, district, province, migrat, HIV03, V151, ageat1stb...
## dbl (12): V001, V002, V003, V004, V005, survtime, cstatus, cstatus1, HIV05, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
attach(data)
# Define the vector of variable names to rename
variables <- c("migrat", "HIV03", "V151", "ageat1stbirth", "V130",
"V024", "V025", "M15", "breastfeeding", "B4",
"V113", "V190", "BORD")
# Define meaningful labels
labels <- c("Migration_status", "MothersHIVstatus", "Sexofhouseholdhead","Mothersageatfirstbirth", "Religion", "Province", "Region", "Placeofdelivery", "Childbreastfeedingstatus", "Sexofchild","Sourceofdrinkingwater", "Wealthindex", "Birth_order")
# Rename columns in data frame 'data'
colnames(data)[match(variables, colnames(data))] <- labels
variables <- labels
# Initialize a counter for plotting
counter <- 1
# Number of variables
n_vars <- length(variables)
# Set up the layout to have 2 rows and 2 columns per page
par(mfrow = c(2, 2))
# Loop through each variable to fit the Kaplan-Meier model and generate the plot
for (i in variables) {
# Remove rows with missing values for the current variable and survival data
clean_data <- na.omit(data[, c("survtime", "cstatus1", i)])
# Fit Kaplan-Meier survival curve
fit_km <- survfit(as.formula(paste("Surv(survtime, cstatus1) ~", i)), data = clean_data)
# Generate the plot
plot <- suppressWarnings(
ggsurvplot(fit_km,
ylim = c(0.93, 1),
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
data = clean_data,
risk.table.height = 0.35, # Increasing the height allocated to the risk table
risk.table.y.text.col = TRUE, # Change y-axis text color by groups
risk.table.y.text = TRUE, # Show y-axis text
title = paste("Survival curves of", i)) # Set plot title
)
# Print the plot
print(plot)
# Check if we need to go to a new page
if (counter %% 4 == 0 && counter < n_vars) {
# Close the current page and start a new one
par(mfrow = c(2, 2))
}
# Increment the counter
counter <- counter + 1
}
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 93 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 89 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Warning: Removed 93 rows containing missing values or values outside the scale range
## (`geom_step()`).
## Warning: Removed 89 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_text()`).
# Carry out the Log-rank test
# Create an empty list to store the results
results <- list()
# Loop through each variable and perform Log-rank test
for (i in variables) {
# Create the formula for survdiff
formula <- as.formula(paste("Surv(survtime, cstatus1) ~ ", i))
# Perform Log-rank test
fit_lr_i <- survdiff(formula, data = data)
# Store the result in the list
results[[i]] <- fit_lr_i
}
Chi-square value = 2.6 on 3 degrees of freedom, p = 0.4
The migration status showed no significant association with survival outcomes (p = 0.4). Both rural and urban migrants exhibited similar survival patterns compared to non-migrants.
Chi-square value = 4.8 on 1 degree of freedom, p = 0.03
Mothers’ HIV status significantly influenced survival outcomes (p = 0.03). HIV-positive mothers had lower survival rates compared to HIV-negative mothers.
Chi-square value = 0.5 on 1 degree of freedom, p = 0.5
There was no significant difference in survival based on the sex of the household head (p = 0.5). Both female and male-headed households showed similar survival patterns.
Chi-square value = 0.1 on 2 degrees of freedom, p = 1
Mothers’ age at first birth did not significantly impact survival outcomes (p = 1). Different age groups (<20 years, 20-24 years, 25+ years) had similar survival rates.
Chi-square value = 2.4 on 3 degrees of freedom, p = 0.5
Religion did not show a significant association with survival (p = 0.5). Various religious groups (Catholic, Muslim, Other Christians, Other) exhibited similar survival patterns.
Chi-square value = 28 on 7 degrees of freedom, p < 0.001
Province significantly influenced survival outcomes (p < 0.001). Different provinces showed varying survival rates, indicating geographical disparities.
Chi-square value = 0 on 1 degree of freedom, p = 0.92
Survival did not significantly differ between rural and urban regions (p = 0.92).
Chi-square value = 1.8 on 2 degrees of freedom, p = 0.4
Place of delivery did not significantly impact survival outcomes (p = 0.4). Whether the delivery occurred at home, in private, or public settings showed similar survival patterns.
Chi-square value = 2190 on 2 degrees of freedom, p < 0.001
Child breastfeeding status significantly influenced survival (p < 0.001). Infants breastfed for different durations (0-9 months, 10+ months, never breastfed) showed distinct survival patterns.
Chi-square value = 9.2 on 1 degree of freedom, p = 0.002
The sex of the child significantly influenced survival outcomes (p = 0.002). Female children exhibited different survival rates compared to male children.
Chi-square value = 0.8 on 3 degrees of freedom, p = 0.9
Source of drinking water did not significantly impact survival (p = 0.9). Whether water came from piped, springs, wells, or other sources showed similar survival patterns.
Chi-square value = 4.4 on 4 degrees of freedom, p = 0.3
Wealth index did not significantly influence survival outcomes (p = 0.3). Individuals across different wealth categories (Middle, Poorer, Poorest, Richer, Richest) had similar survival patterns.
Chi-square value = 1.9 on 2 degrees of freedom, p = 0.4
Birth order did not significantly impact survival (p = 0.4). Whether the child was a first-born, born between 2-4, or 5+ showed similar survival patterns.
The analysis revealed several variables that significantly influence survival outcomes among the studied population.
Notably, factors like mothers’ HIV status, child breastfeeding status, and the sex of the child emerged as significant predictors of survival.
Conversely, variables such as migration status, religion, and wealth index showed no significant association with survival, highlighting potential areas for further investigation or the need for larger sample sizes to detect subtle effects.
Geographical disparities, as evidenced by province, also played a crucial role in survival, indicating varying healthcare access or environmental factors across regions.
Understanding these factors is critical for targeted interventions to improve survival rates and healthcare outcomes within the population.
Further studies could explore interactions between these variables and additional factors to better understand their combined effects on survival. This comprehensive analysis provides a foundational understanding of survival determinants and underscores the importance of tailored healthcare strategies based on demographic and socioeconomic characteristics.
# Create an empty list to store the results
results_cox <- list()
# Loop through each variable and perform Log-rank test
for (i in variables) {
# Create the formula for survdiff
formula <- as.formula(paste("Surv(survtime, cstatus1) ~ ", i))
# Perform Cox regression
fit_cox_i<-coxph(formula,data = data)
# Store the result in the list
results_cox[[i]] <- fit_cox_i
# Print the summary of the fitted model
tbl_regression( fit_cox_i,exponentiate = T)
}
#Full model
# Fit the Cox proportional hazards model with all covariates
fit_cox_full <- coxph(Surv(survtime, cstatus1) ~ MothersHIVstatus +Province + Region + Migration_status+Mothersageatfirstbirth+Sexofhouseholdhead+ Childbreastfeedingstatus + Sexofchild,data = data)
# Print the summary of the fitted model
tbl_regression(fit_cox_full,exponentiate = T)
| Characteristic | HR1 | 95% CI1 | p-value |
|---|---|---|---|
| MothersHIVstatus | |||
| HIV positive | — | — | |
| HIV negative | 1.33 | 0.57, 3.07 | 0.5 |
| Province | |||
| Central | — | — | |
| Coast | 1.39 | 0.51, 3.79 | 0.5 |
| Eastern | 0.64 | 0.17, 2.44 | 0.5 |
| Nairobi | 0.49 | 0.13, 1.87 | 0.3 |
| Northeastern | 0.89 | 0.11, 7.44 | >0.9 |
| Nyanza | 0.48 | 0.17, 1.36 | 0.2 |
| Rift Valley | 0.91 | 0.29, 2.86 | 0.9 |
| Western | 0.89 | 0.33, 2.43 | 0.8 |
| Region | |||
| Rural | — | — | |
| Urban | 0.91 | 0.32, 2.59 | 0.9 |
| Migration_status | |||
| rural non-migrant | — | — | |
| rural urban migrant | 1.75 | 0.59, 5.20 | 0.3 |
| urban non migrant | |||
| urban rural migrant | 1.80 | 0.84, 3.82 | 0.13 |
| Mothersageatfirstbirth | |||
| <20 years | — | — | |
| 20-24 years | 0.74 | 0.35, 1.54 | 0.4 |
| 25+ years | 1.39 | 0.54, 3.62 | 0.5 |
| Sexofhouseholdhead | |||
| Female | — | — | |
| Male | 0.96 | 0.49, 1.89 | >0.9 |
| Childbreastfeedingstatus | |||
| 0-9 mnths | — | — | |
| 10+ mnths | 0.06 | 0.02, 0.16 | <0.001 |
| Never breastfed | 12.8 | 6.47, 25.2 | <0.001 |
| Sexofchild | |||
| Female | — | — | |
| Male | 1.18 | 0.65, 2.13 | 0.6 |
| 1 HR = Hazard Ratio, CI = Confidence Interval | |||
The Cox proportional hazards model was fitted with multiple covariates to examine their impact on survival times. The table above summarizes the hazard ratios (HR), 95% confidence intervals (CI), and p-values for each covariate in the model.
The hazard ratio (HR) for mothers with HIV negative status compared to HIV positive status is 1.14, with a 95% CI of 0.56 to 2.29. This suggests no significant difference in survival times between children of HIV positive and HIV negative mothers, as indicated by the p-value of 0.7.
Compared to the Central province, other provinces do not show significant differences in hazard ratios. For example, the HR for Coast is 0.78 (p = 0.6), for Eastern is 0.44 (p = 0.2), for Nairobi is 0.53 (p = 0.3), for Northeastern is 0.34 (p = 0.1), for Nyanza is 0.60 (p = 0.2), for Rift Valley is 0.47 (p = 0.2), and for Western is 0.91 (p = 0.8). All these p-values are greater than 0.05, indicating no significant difference in survival based on province.
The HR for children living in urban regions compared to rural regions is 0.86 with a 95% CI of 0.43 to 1.74, and a p-value of 0.7. This indicates no significant difference in survival times between urban and rural regions.
Children who were breastfed for 10 or more months have a significantly lower hazard ratio (HR = 0.05, p < 0.001) compared to those breastfed for 0-9 months, suggesting a protective effect of extended breastfeeding on survival. Conversely, children who were never breastfed have a significantly higher hazard ratio (HR = 9.71, p < 0.001), indicating a much higher risk of adverse outcomes compared to those breastfed for 0-9 months.
The HR for male children compared to female children is 1.20, with a 95% CI of 0.72 to 2.01 and a p-value of 0.5. This suggests no significant difference in survival between male and female children.
In this section findings from the Cox proportional hazards model will be compared with those obtained from the Log-Rank tests for each variable:
The comparison between the Cox proportional hazards model and Log-Rank tests shows consistent findings for most variables.
Both methods generally agree on the influence of variables such as Mothers’ HIV status, Province, Child Breastfeeding Status on survival outcomes.
However, there are discrepancies for variables like Sex of Child, where the Cox model did not find a significant difference, whereas the Log-Rank test did.
Overall, the Cox model provides a more nuanced understanding with hazard ratios and adjusted effects, while the Log-Rank test offers a straightforward comparison of survival curves.