The topic I chose is maternal health mortality. The source of this dataset is the 2022 Kenya Demographic and Health Survey (KDHS) which was created to make available data on the population and health situation across Kenya’s various locations. The goal of this survey was to ensure there is available data on socio-demographics and health indicators to support planning, policy formulation and evaluation of various health projects and programmes.
The variables that I will use for this project are maternal death, education, county, age, skilled attendant, distance facility, place of delivery, urbanization, wealth quintile, marital status, post natal 48hrs visits, and antenatal care visits.
For this project, I would like to explore which variables predict maternal death the most by conducting a multiple logistic regression then explore how different counties in Kenya perform on one of the most significant variables which I chose education. Finally, I also created a forest plot to explore the estimates across all the variables.
I chose this topic because I have worked on maternal health in the past for a project implemented in Burkina Faso and enjoyed it a lot. I am also a mother and I was curious to see what affects maternal health especially in East Africa where I am from.
Loading libraries
I started by loading the necessary libraries and the dataset.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 6000 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): county, education, place_delivery, skilled_attendant, marital_stat...
dbl (15): age, parity, anc_visits, c_section, postnatal48h, age_first_birth,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I cleaned the dataset by using mutate to name maternal_death, death and give it a number for when a woman experiences death and when she does not. Then I used select to remove variables that I will not work with for this project. Finally, I ensured that I remove all the nas from the dataset.
Since the data for maternal death was not a number and many other variables as well, I have used a multiple logistic regression instead of a multiple linear regression model.
full_model <-glm(data=maternal2, death~ age+parity+factor(education)+anc_visits+factor(place_delivery)+factor(c_section)+factor(skilled_attendant)+postnatal48h+factor(marital_status)+wealth_quintile+distance_facility+factor(urbanization), family =binomial())summary(full_model)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
factor(place_delivery) + factor(c_section) + factor(skilled_attendant) +
postnatal48h + factor(marital_status) + wealth_quintile +
distance_facility + factor(urbanization), family = binomial(),
data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.941967 0.331548 -11.890 < 2e-16 ***
age 0.036950 0.007464 4.950 7.41e-07 ***
parity 0.208829 0.031431 6.644 3.05e-11 ***
factor(education)None 1.353375 0.187112 7.233 4.73e-13 ***
factor(education)Primary 0.890106 0.163881 5.431 5.59e-08 ***
factor(education)Secondary 0.468377 0.170997 2.739 0.006161 **
anc_visits -0.152563 0.025943 -5.881 4.08e-09 ***
factor(place_delivery)Private -0.361366 0.150998 -2.393 0.016703 *
factor(place_delivery)Public -0.508643 0.133007 -3.824 0.000131 ***
factor(c_section)1 0.086451 0.131777 0.656 0.511796
factor(skilled_attendant)Skilled -0.338012 0.125864 -2.686 0.007241 **
postnatal48h -0.062283 0.094909 -0.656 0.511669
factor(marital_status)Not in union -0.002214 0.095982 -0.023 0.981593
wealth_quintile 0.066019 0.034674 1.904 0.056912 .
distance_facility 0.014392 0.003377 4.261 2.03e-05 ***
factor(urbanization)Urban 0.058919 0.095404 0.618 0.536852
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3666.9 on 5984 degrees of freedom
AIC: 3698.9
Number of Fisher Scoring iterations: 5
Once I ran the multiple logistic regression, I was able to see the variables that are the most significant and those that I can eliminate to see whether the AIC would go down. I will eliminate one variable at a time and run it again.
Backward elimination of the regression model
The first one I eliminated is marital status and the AIC went down from 3698.9 to 3696.9.
model2 <-glm(data=maternal2, death~ age+parity+factor(education)+anc_visits+factor(place_delivery)+factor(c_section)+factor(skilled_attendant)+postnatal48h+wealth_quintile+distance_facility+factor(urbanization), family =binomial())summary(model2)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
factor(place_delivery) + factor(c_section) + factor(skilled_attendant) +
postnatal48h + wealth_quintile + distance_facility + factor(urbanization),
family = binomial(), data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.942865 0.329265 -11.975 < 2e-16 ***
age 0.036952 0.007464 4.951 7.39e-07 ***
parity 0.208835 0.031430 6.645 3.04e-11 ***
factor(education)None 1.353484 0.187053 7.236 4.63e-13 ***
factor(education)Primary 0.890211 0.163818 5.434 5.51e-08 ***
factor(education)Secondary 0.468468 0.170952 2.740 0.006137 **
anc_visits -0.152574 0.025939 -5.882 4.05e-09 ***
factor(place_delivery)Private -0.361388 0.150993 -2.393 0.016692 *
factor(place_delivery)Public -0.508633 0.133005 -3.824 0.000131 ***
factor(c_section)1 0.086508 0.131753 0.657 0.511445
factor(skilled_attendant)Skilled -0.337991 0.125859 -2.685 0.007243 **
postnatal48h -0.062264 0.094906 -0.656 0.511785
wealth_quintile 0.066034 0.034668 1.905 0.056812 .
distance_facility 0.014392 0.003377 4.262 2.03e-05 ***
factor(urbanization)Urban 0.058912 0.095403 0.618 0.536903
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3666.9 on 5985 degrees of freedom
AIC: 3696.9
Number of Fisher Scoring iterations: 5
The second one I eliminated is postnatal 48h and the AIC went from 3696.9 to 3695.3.
model3 <-glm(data=maternal2, death~ age+parity+factor(education)+anc_visits+place_delivery+factor(c_section)+factor(skilled_attendant)+wealth_quintile+distance_facility+factor(urbanization), family =binomial())summary(model3)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
place_delivery + factor(c_section) + factor(skilled_attendant) +
wealth_quintile + distance_facility + factor(urbanization),
family = binomial(), data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.987797 0.322154 -12.379 < 2e-16 ***
age 0.037027 0.007463 4.961 7.00e-07 ***
parity 0.208395 0.031417 6.633 3.29e-11 ***
factor(education)None 1.352341 0.187037 7.230 4.82e-13 ***
factor(education)Primary 0.889791 0.163817 5.432 5.58e-08 ***
factor(education)Secondary 0.467320 0.170943 2.734 0.006261 **
anc_visits -0.152555 0.025943 -5.880 4.09e-09 ***
place_deliveryPrivate -0.358804 0.150996 -2.376 0.017489 *
place_deliveryPublic -0.506366 0.133022 -3.807 0.000141 ***
factor(c_section)1 0.084471 0.131735 0.641 0.521380
factor(skilled_attendant)Skilled -0.338128 0.125930 -2.685 0.007252 **
wealth_quintile 0.065960 0.034665 1.903 0.057066 .
distance_facility 0.014417 0.003376 4.270 1.95e-05 ***
factor(urbanization)Urban 0.058540 0.095396 0.614 0.539445
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3667.3 on 5986 degrees of freedom
AIC: 3695.3
Number of Fisher Scoring iterations: 5
The third one I eliminated was urbanization and the AIC went down from 3695.3 to 3693.7.
model4 <-glm(data=maternal2, death~ age+parity+factor(education)+anc_visits+place_delivery+factor(c_section)+factor(skilled_attendant)+wealth_quintile+distance_facility, family =binomial())summary(model4)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
place_delivery + factor(c_section) + factor(skilled_attendant) +
wealth_quintile + distance_facility, family = binomial(),
data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.964819 0.319817 -12.397 < 2e-16 ***
age 0.036874 0.007458 4.945 7.63e-07 ***
parity 0.208118 0.031409 6.626 3.45e-11 ***
factor(education)None 1.351876 0.187025 7.228 4.89e-13 ***
factor(education)Primary 0.889120 0.163798 5.428 5.69e-08 ***
factor(education)Secondary 0.465916 0.170910 2.726 0.006409 **
anc_visits -0.152675 0.025946 -5.884 4.00e-09 ***
place_deliveryPrivate -0.359298 0.150942 -2.380 0.017295 *
place_deliveryPublic -0.505295 0.132964 -3.800 0.000145 ***
factor(c_section)1 0.084642 0.131724 0.643 0.520503
factor(skilled_attendant)Skilled -0.337355 0.125875 -2.680 0.007360 **
wealth_quintile 0.066312 0.034653 1.914 0.055673 .
distance_facility 0.014385 0.003375 4.262 2.03e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3667.7 on 5987 degrees of freedom
AIC: 3693.7
Number of Fisher Scoring iterations: 5
The fourth one I eliminated was c-section and the AIC went from 3693.7 to 3692.1.
model5 <-glm(data=maternal2, death~ age+parity+factor(education)+anc_visits+factor(place_delivery)+factor(skilled_attendant)+wealth_quintile+distance_facility, family =binomial())summary(model5)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
factor(place_delivery) + factor(skilled_attendant) + wealth_quintile +
distance_facility, family = binomial(), data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.953483 0.319351 -12.380 < 2e-16 ***
age 0.036824 0.007454 4.940 7.81e-07 ***
parity 0.207915 0.031412 6.619 3.61e-11 ***
factor(education)None 1.349897 0.186981 7.219 5.22e-13 ***
factor(education)Primary 0.888834 0.163786 5.427 5.74e-08 ***
factor(education)Secondary 0.464399 0.170883 2.718 0.006575 **
anc_visits -0.152730 0.025940 -5.888 3.91e-09 ***
factor(place_delivery)Private -0.348764 0.150021 -2.325 0.020084 *
factor(place_delivery)Public -0.506923 0.132934 -3.813 0.000137 ***
factor(skilled_attendant)Skilled -0.336513 0.125868 -2.674 0.007506 **
wealth_quintile 0.066745 0.034640 1.927 0.054003 .
distance_facility 0.014365 0.003375 4.256 2.08e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3668.1 on 5988 degrees of freedom
AIC: 3692.1
Number of Fisher Scoring iterations: 5
The last one I eliminated was wealth quintile but the AIC did not go down. One way to understand this is probably because the p value is borderline.
model6 <-glm(data=maternal2, death~ age+parity+factor(education)+anc_visits+factor(place_delivery)+factor(skilled_attendant)+distance_facility, family =binomial())summary(model6)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
factor(place_delivery) + factor(skilled_attendant) + distance_facility,
family = binomial(), data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.766814 0.303567 -12.409 < 2e-16 ***
age 0.036895 0.007457 4.948 7.51e-07 ***
parity 0.206968 0.031395 6.592 4.33e-11 ***
factor(education)None 1.349789 0.186909 7.222 5.14e-13 ***
factor(education)Primary 0.881439 0.163683 5.385 7.24e-08 ***
factor(education)Secondary 0.458493 0.170798 2.684 0.007265 **
anc_visits -0.151682 0.025920 -5.852 4.86e-09 ***
factor(place_delivery)Private -0.343444 0.149970 -2.290 0.022016 *
factor(place_delivery)Public -0.500418 0.132890 -3.766 0.000166 ***
factor(skilled_attendant)Skilled -0.338038 0.125869 -2.686 0.007239 **
distance_facility 0.014338 0.003376 4.247 2.17e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3671.8 on 5989 degrees of freedom
AIC: 3693.8
Number of Fisher Scoring iterations: 5
For the first visualization, I have focused on looking at whether the number of deaths was decreasing or increasing as women get more education. The plot is based on the 10 poorest counties in Kenya and the levels of education range from None, to Primary, to Secondary and Higher. Plotly was added to create interactivity for the plot.
p1 <-ggplot(maternal3, aes(x= county, y = death, fill =as.factor(education))) +geom_col() +coord_flip() +facet_wrap(~education) +scale_color_brewer(palette ="Set1") +#scale_fill_manual(values = c("#edf8b1", "#7fcdbb","#2c7fb8","#feb24c")) +labs(title ="Is death increasing or decreasing as maternal education increasing?",x ="Top Poorest Counties in Kenya", y ="Total Number of Deaths",caption ="Source: 2022 Kenya Demographic and Health Survey") +theme_minimal() +theme(panel.grid.major =element_blank(), panel.grid.minor =element_blank())p1
ggplotly(p1)
According to this plot, the number of maternal deaths is lower when women have higher education compared to no education. The number of maternal deaths is also lower when women have secondary education compared to primary education. It was interesting to observe that the number of deaths was highesr for women with primary and secondary education compared to no education.
Second Visualization: Variable Importance for logistic regression and Forest Plot
Using the epirhandbook as guidance, I have conducted a variable importance analysis for the logistic regression and created a forest plot. I focused on the final model of my multiple logistic regression where the AIC was the lowest to select a few variables.
mv_reg <-glm(death ~ age + parity +factor(education) + anc_visits +factor(place_delivery) +factor(skilled_attendant), family ="binomial", data = maternal2)summary(mv_reg)
Call:
glm(formula = death ~ age + parity + factor(education) + anc_visits +
factor(place_delivery) + factor(skilled_attendant), family = "binomial",
data = maternal2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.342290 0.284004 -11.768 < 2e-16 ***
age 0.036927 0.007427 4.972 6.64e-07 ***
parity 0.204698 0.031259 6.548 5.82e-11 ***
factor(education)None 1.336481 0.186467 7.167 7.64e-13 ***
factor(education)Primary 0.869571 0.163330 5.324 1.01e-07 ***
factor(education)Secondary 0.445541 0.170450 2.614 0.008951 **
anc_visits -0.151605 0.025881 -5.858 4.69e-09 ***
factor(place_delivery)Private -0.331647 0.149619 -2.217 0.026649 *
factor(place_delivery)Public -0.498293 0.132679 -3.756 0.000173 ***
factor(skilled_attendant)Skilled -0.343170 0.125655 -2.731 0.006313 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3927.3 on 5999 degrees of freedom
Residual deviance: 3690.0 on 5990 degrees of freedom
AIC: 3710
Number of Fisher Scoring iterations: 5
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `across(where(is.numeric), round, digits = 2)`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.
# Previously
across(a:b, mean, na.rm = TRUE)
# Now
across(a:b, \(x) mean(x, na.rm = TRUE))
Forest Plot
## remove the intercept term from your multivariable resultsp2 <- mv_tab_base |># remove "intercept" row from plotfilter(term !="(Intercept)") |>## plot with variable on the y axis and estimate (OR) on the x axisggplot(aes(x = estimate, y = term, color =as.factor(term))) +## show the estimate as a pointgeom_point() +scale_color_brewer(palette ="Dark2") +## add in an error bar for the confidence intervalsgeom_errorbar(aes(xmin = conf.low, xmax = conf.high)) +## show where OR = 1 is for reference as a dashed linegeom_vline(xintercept =1, linetype ="dashed") +labs(title ="Coefficient Estimates of Predictor Variables", x ="Estimate",y ="Variables",caption ="Source: 2022 Kenya Demographic and Health Survey",title.legend ="Variables") p2
Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Dark2 is 8
Returning the palette you asked for with that many colors
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Dark2 is 8
Returning the palette you asked for with that many colors
The plot shows whether the chances of deaths based on the various variables selected. For instance, if a woman has secondary education, chances of dying are decreased compared to if a woman has no education where the chances are high for dying. Some of the variables which decrease the chances of deaths include antenatal visits, skilled attendant, and private place of delivery.
Background research about Maternal Mortality in Kenya
The article “The hidden toll of maternal mortality in Kenya” summarizes the key points that were discussed during a Reproductive, Maternal, Newborn, Child, and Adolescent Health and Nutrition (RMNCAH+N) high-level Policy Dialogue and CSO Roundtable that took place in Nairobi, Kenya in 2025. The goal of the convening was to accelerate reforms, strengthen accountability and mobilize political will so every woman, child and adolescent can thrive (Lubaale et al,2025).
The article mentions how Kenya is facing high levels of maternal mortality with 355 deaths for every 100,000 live births which means that there are about 6,000 preventable deaths every year and about 16 women dying each day. The leading causes of death range from postpartum hemorrhage, then obstructed labor and eclampsia. According to the article, some of the prevention measures that could be implemented are universal access to family planning, quality antenatal and intrapartum care, skilled birth attendance, and emergency obstetric and newborn care (EmONC). Other critical prevention measures include innovations such as heat-stable carbetocin for bleeding management, E-MOTIVE approach, point-of-care ultrasound, CPAP, safe blood transfusions and appropriate supplies and equipment in addition to trained staff.
There are also issues of inequalities that need to be addressed. According to the article, one-third of pregnant women do not attend antenatal visits and only half of women with no education reach four visits while eight in ten women with higher education levels reach the four visits. The article explains how there was a shift in reforms which could be the reason for lower visits where maternity services used to be free and now require out-of-pockets payments that some women cannot afford.
Although Kenya’s Constitution highlights the right of every person to the highest attainable standard of health, including reproductive health and the right to life, there appears to be an unmet need for family planning because unintended and early pregnancies are very high. Women that face early and unintended pregnancies also suffer from lack of support, drop of of school and stigma which lead to poverty and poor health.
Finally, the authors advise the government to take action in adopting and implementing the Maternal Newborn and Child Health Bill 2023 to make possible the access to equitable, quality services in law and enhance the coordinated efforts of the national and county governments. Government stakeholders will need to ensure decisive leadership, bold investment and collective resolve in order for women, girls and children to live and thrive.
Sources: Lubaale, M., & Mushega, L. (2025, September 30). The hidden toll of maternal mortality in Kenya. World Health Organization. https://pmnch.who.int/news-and-events/news/item/30-09-2025-the-hidden-toll-of-maternal-mortality-in-kenya#:~:text=In%20Kenya%2C%20PPH%20is%20the%20leading%20cause%20of,newborn%20asphyxia%2C%20a%20leading%20cause%20of%20neonatal%20mortality.
Others sources include: Tutors and Professor guidance Class notes Standout Submissions Google search
Interesting patterns or surprises from the visualization
The surprises that arised from the first visualization were that secondary education and primary education had higher levels of deaths compared to no education. What I wished I could have included was to look at different years overtime and see whether the adoption and implementation of certain health reforms in Kenya have enabled changes with maternal deaths.
The surprises that I noticed in my second visualization are the fact that education was showing less chances of deaths for primary and secondary compared to no education whereas the first graph had shown the opposite. What I wished I would have done was to add more variables found in my research such as unwanted pregnancies, government leadership and investment in maternal health, affordability of maternity services and access to family planning programmes.