Maternal Mortality Ratio

The definition of maternal mortality is:

“A death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of pregnancy, from any cause related or aggravated by the pregnancy or its management, but not from accidental or incidental causes.”

The definition of maternal mortality ratio is:

“The number of maternal deaths during a given time period per 100,000 live births during the same time period.”

In this notebook we will create a long-term time-series of maternal mortality ratio. To do this we will combine two existing datasets:

GapMinder (1751-2008)
World Health Organization (2000-2017)

flowchart LR
  A["GapMinder (1751-2008)"]
  B["WHO (2000-2017)"]
  A --> C{"OWID (1751-2017)"}
  B --> C{"OWID (1751-2017)"}

We make the assumption that their methods are comparable and that the data can be combined without transformation.

In years where there is an overlap in both time-series we use the data from the WHO.

Downloading the GapMinder data

Firstly we download the data from GapMinder and save it locally. Then we read in the GapMinder data, clean up the column names and take a look at the data.

Code

if (!file.exists('data/input/happiness_report_historical.xls')){
  download.file('https://happiness-report.s3.amazonaws.com/2021/DataPanelWHR2021C2.xls', 'data/input/happiness_report_historical.xls')
}



if (!file.exists('data/input/happiness_report_2022.xls')){
  download.file('https://happiness-report.s3.amazonaws.com/2022/Appendix_2_Data_for_Figure_2.1.xls', 'data/input/happiness_report_2022.xls')
}

Code

dir.create("data/input", recursive = TRUE, showWarnings = FALSE)
download.file('https://www.gapminder.org/documentation/documentation/gapdata010.xls', paste0('data/input/',date,'_gapminder_data.xls'))

df <- read_xls(paste0('data/input/',date,'_gapminder_data.xls'), col_names = TRUE)
df <- df[-(1:16),]
df <- janitor::clean_names(df)

head(df)

Cleaning the GapMinder data

Important

There are some issues with the GapMinder data - we will manually clean them here.

The first three rows in the ‘year’ column seem to be wrong as it seems as if each row covers 110 years, but in another source they cover only 10 years.
I will replace them manually here with the middle of each decade, as is used in the other data from the same source.
There are cases where the year for Finland has been entered incorrectly three times, there are two 1772s, 1775s & 1967s, the second of each should be 1872, 1875 and 1957 respectively.
There are also two errors for New Zealand, and one error for both Sweden and the US.
Sri Lanka’s maternal mortality rate seems to drop to zero in 1990, here we set it to NA instead.

Code

df$year[1:3] <- c(1875, 1885,1895)
df['year'][df$year == '1772' & df$country == 'Finland' & df$maternal_deaths == 487,] <- '1872'
df['year'][df$year == '1775' & df$country == 'Finland' & df$maternal_deaths == 629,] <- '1875'
df['year'][df$year == '1967' & df$country == 'Finland' & df$maternal_deaths == 77,] <- '1957'
df['year'][df$year == '1967' & df$country == 'Sweden' & df$maternal_deaths == 39,] <- '1957'
df['year'][df$year == '1967' & df$country == 'United States' & df$maternal_deaths == 1766.28,] <- '1957'
df['year'][df$year =='1989-02'& df$country == 'New Zealand',] <- '1889-02'
df <- df[-which(df$year == '1950' & df$country == 'New Zealand' & df$mmr == 90),]

df['mmr'][df$country == "Sri Lanka" & df$year == 1990,] <- NA

Finding the mid-point of each year range

For some rows a range of years is given for a particular maternal mortality rate. We want to find the mid-point of that range. We use this function to find the mid-value of years given.

Code

mid_year <- function(year) {
  if(grepl("-", year)){
    year_split <-  unlist(strsplit(year, split = "-"))
    year_begin <- year_split[1]
    year_end_len <- nchar(year_split[2])
    year_end_pref <- 4 - year_end_len
    
    if (year_end_pref > 0) {
      year_pref <- substr(year_begin, 1, year_end_pref)
      year_end <- as.numeric(paste(c(year_pref, year_split[2]),collapse = ""))
      year_out <- round(mean(c(as.numeric(year_begin), year_end)))
    }else{
      year_out <- round(mean(as.numeric(c(year_begin,year_split[2]))))
    }
  }else{
    year_out <- year
  }
  
  return(year_out)
}
year_out_all <- unlist(lapply(df$year, mid_year))
df$mid_year <- as.numeric(year_out_all)

Let’s check the mid-year estimates have worked as expected. There are three cases where the mid-year estimate is not the mid-point of the range of years. Let’s fix these manually. We will keep only the columns we want to take forward.

Code

df %>% 
select(year, mid_year) %>% 
filter(grepl("-",year)) %>% 
mutate(year_begin = substr(year, 1, 4)) %>% 
filter(year_begin >= mid_year)

df['mid_year'][df$year == "1897-01",] <- 1899
df['mid_year'][df$year == "1889-02",] <- 1895
df['mid_year'][df$year == "1889-03",] <- 1896


df <- df %>% 
  select(entity = country, year = mid_year, maternal_mortality_rate = mmr)

Reading in the WHO data

Read in data from WDI - in future iterations we will read this in from ETL so it is auto updated. The WDI sources this variable from the WHO so we will henceforth refer to it as the WHO data.

Code

download.file('https://api.worldbank.org/v2/en/indicator/SH.STA.MMRT?downloadformat=csv', paste0('data/input/',date,'_wdi_maternal_mortality.zip'))

unzip(paste0('data/input/',date,'_wdi_maternal_mortality.zip'), exdir = "data/input")
des_file <- unzip(paste0('data/input/',date,'_wdi_maternal_mortality.zip'), list=TRUE)
file_name <- des_file$Name[startsWith( des_file$Name,"API_SH.STA.MMRT_DS2")]

who <- clean_names(read.csv(paste0('data/input/', file_name), skip = 4))

Cleaning the WHO data

Pivot and clean the WHO data to long format. Check the GapMinder countries are in the WHO dataset

Code

who_clean <- who %>% 
  select(country_name, starts_with("X")) %>% 
  select(-x) %>% 
  pivot_longer(starts_with("x")) %>% 
  mutate(year = as.numeric(gsub("x", "", name))) %>% 
  select(entity = country_name, year, maternal_mortality_rate = value)

head(who_clean)

all(unique(df$entity) %in% who_clean$entity)

Merging the WHO data with the GapMinder data

Combine the two datasets and add a source column so we can keep track of where each datapoint comes from.

Code

df$source <- 'GapMinder'
who_clean$source <- 'WHO_GHO'
df_all <- rbind(df, who_clean)

df_all$maternal_mortality_rate <- as.numeric(df_all$maternal_mortality_rate)

head(df_all)

Dealing with duplicate datapoints

Dealing with duplicate years - there are two identical values for NZ in 1950, we can drop one with distinct(). We remove NAs where there is a non-NA value in either Gapminder or WHO. If there are values for both GapMinder and the WHO data we have a preference for the WHO data.

Code

df_check <-df_all %>% 
  group_by(entity, year) %>% 
  add_count() %>% 
  ungroup()


df_sel <- df_check[-which(df_check$n >1 & is.na(df_check$maternal_mortality_rate)),]

df_sel <-df_sel %>% 
  group_by(entity, year) %>% 
  add_count() %>% 
  ungroup() %>% 
  distinct()

df_sel <- df_sel[-which(df_sel$nn >1 & df_sel$source == 'GapMinder'),]

df_check <- df_sel %>% 
  group_by(entity, year) %>% 
  add_count()

sum(df_check$nnn > 1) == 0

df_sel <- df_sel %>% select(-c(n, nn))

Standardising the country names

Standardising the country names for use on OWID.

Code

stand_countries <- read.csv("data/input/countries_to_standardise_country_standardized.csv")

df_fin <- df_sel %>% left_join(.,stand_countries, by = c("entity" = "Country")) %>% 
  mutate(entity = Our.World.In.Data.Name) %>% 
  select(-c(Our.World.In.Data.Name, source)) 

df_fin$maternal_mortality_rate[is.na(df_fin$maternal_mortality_rate)] <- ""

Cleaning geographic entities

We don’t want to include all of the available regions in the data as it is difficult for us to define them clearly, so we drop out a selection here. We also plot the data for Sweden to check the time-series looks as expected.

Code

entities_to_drop <- c("Africa Eastern and Southern", "Africa Western and Central", "Arab World", "Early-demographic dividend", "East Asia & Pacific (excluding high income)", "East Asia & Pacific (IDA & IBRD countries)", "Euro area", "Europe & Central Asia (excluding high income)", "Europe & Central Asia (IDA & IBRD countries)", "Fragile and conflict affected situations", "Heavily indebted poor countries (HIPC)", "IBRD only", "IDA & IBRD total", "IDA blend", "IDA only", "IDA total", "Late-demographic dividend", "Latin America & Caribbean (excluding high income)", "Latin America & the Caribbean (IDA & IBRD countries)", "Least developed countries: UN classification", "Middle East & North Africa (excluding high income)", 
                      "Middle East & North Africa (IDA & IBRD countries)", "Not classified",  "Other small states", "Pacific island small states", "Post-demographic dividend", "Pre-demographic dividend", "Small states", "South Asia (IDA & IBRD)", "Sub-Saharan Africa (excluding high income)", "Sub-Saharan Africa (IDA & IBRD countries)")

df_fin <- df_fin %>% 
 filter(!entity %in% entities_to_drop)


fig <- plot_ly(data = df_fin %>% filter(entity== "Sweden"), x = ~year,
text = ~paste("MMR: ", round(as.numeric(maternal_mortality_rate),2), '<br>Year:', year)) %>% add_trace(y = ~as.numeric(maternal_mortality_rate), name = 'MMR',mode = 'lines') %>% layout(xaxis = list(title = 'Year'), 
         yaxis = list(title = 'Maternal Mortality Ratio (per 100,000 live births)'))

fig

Calculating the proportional change since 2000

We can also calculate the proportional change since 2000. This is useful for understanding the change in the data over time and we can use this data in a slope chart.

Code

df_clean <- df_fin %>% 
  filter(year %in% c(2000, 2017)) %>% 
  filter(complete.cases(.)) %>%
  mutate(maternal_mortality_rate = as.numeric(maternal_mortality_rate)) %>% 
  group_by(entity) %>% 
    add_count() %>% 
  filter(n == 2)
  
df_wide <- df_clean %>% 
  pivot_wider(., names_from = 'year',names_glue = "{.value}_{year}", values_from = 'maternal_mortality_rate') %>% 
  mutate(change = ((maternal_mortality_rate_2017 - maternal_mortality_rate_2000)/maternal_mortality_rate_2000)*100, baseline = 0) %>% 
  select(entity, change, baseline) %>% 
  pivot_longer(cols = c("change", "baseline")) %>% 
  mutate(name = gsub("change", 2017, name), name = gsub("baseline", 2000, name)) %>% 
  mutate(year = as.numeric(name)) %>% 
  select(entity, year, relative_change_mmr = value)


slope_ggplot <-  df_wide %>%
highlight_key(., ~entity)  %>%
ggplot( aes(x = year, y = round(relative_change_mmr,2), group = entity, text = entity))+
  labs(x = "Year", y = "Relative change since 2000 (%)", title = "Relative change in maternal mortality ratio since 2000 (%)")+
  geom_line(color = "#888888" )+
  theme_bw()+
    theme(legend.position="none",
    panel.background = element_rect(fill = "#ffffff"),
    axis.ticks = element_blank(),
    axis.text = element_text(color = "#888888")) 

s <- attrs_selected(
  line = list(color = "#0000FF",
  width = 5)
)

ggplotly(tooltip = "text", slope_ggplot)%>% 
config(displayModeBar = FALSE) %>%
highlight(on = "plotly_hover",
opacityDim = 0.7,
color = "blue", 
selected = s) %>%
layout(hoverlabel = list(bgcolor = "blue", font = list(color = "FFFFFF", size = 14)))

Lastly, we combine the proportional change data with the full time-series of maternal mortality rate and write out the data to a csv file.

Code

df_out <- df_wide %>% right_join(., df_fin, by = c("entity","year"))
df_out$relative_change_mmr[is.na(df_out$relative_change_mmr)] <- ""
write.csv(df_out, "data/output/maternal_mortality_rate_upload.csv", row.names = FALSE)

--- title: "Maternal Mortality Ratio" execute: warning: false message: false eval: false format: html: fontsize: 0.9em code-tools: true code-fold: true code-copy: hover toc: true toc-depth: 2 self-contained: true smooth-scroll: true anchor-sections: true comments: hypothesis: true --- The definition of maternal mortality is: > "A death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of pregnancy, from any cause related or aggravated by the pregnancy or its management, but not from accidental or incidental causes." The definition of maternal mortality ratio is: > "The number of maternal deaths during a given time period per 100,000 live births during the same time period." In this notebook we will create a long-term time-series of maternal mortality ratio. To do this we will combine two existing datasets: * GapMinder (1751-2008) * World Health Organization (2000-2017) ```{mermaid} flowchart LR A["GapMinder (1751-2008)"] B["WHO (2000-2017)"] A --> C{"OWID (1751-2017)"} B --> C{"OWID (1751-2017)"} ``` We make the assumption that their methods are comparable and that the data can be combined without transformation. In years where there is an overlap in both time-series we use the data from the WHO. ## Downloading the GapMinder data ```{r} #| echo: false library(readxl) library(dplyr) library(janitor) library(tidyr) library(ggplot2) library(plotly) library(htmlwidgets) library(zoo) date <- Sys.Date() ``` Firstly we download the data from GapMinder and save it locally. Then we read in the GapMinder data, clean up the column names and take a look at the data. ```{r} if (!file.exists('data/input/happiness_report_historical.xls')){ download.file('https://happiness-report.s3.amazonaws.com/2021/DataPanelWHR2021C2.xls', 'data/input/happiness_report_historical.xls') } if (!file.exists('data/input/happiness_report_2022.xls')){ download.file('https://happiness-report.s3.amazonaws.com/2022/Appendix_2_Data_for_Figure_2.1.xls', 'data/input/happiness_report_2022.xls') } ``` ```{r} dir.create("data/input", recursive = TRUE, showWarnings = FALSE) download.file('https://www.gapminder.org/documentation/documentation/gapdata010.xls', paste0('data/input/',date,'_gapminder_data.xls')) df <- read_xls(paste0('data/input/',date,'_gapminder_data.xls'), col_names = TRUE) df <- df[-(1:16),] df <- janitor::clean_names(df) head(df) ``` ## Cleaning the GapMinder data :::{.callout-important} There are some issues with the GapMinder data - we will manually clean them here. ::: * The first three rows in the 'year' column seem to be wrong as it seems as if each row covers 110 years, but in [another source](https://docs.google.com/spreadsheets/u/0/d/14ZtQy9kd0pMRKWg_zKsTg3qKHoGtflj-Ekal9gIPZ4A/pub?gid=1#) they cover only 10 years. * I will replace them manually here with the middle of each decade, as is used in the other data from the same source. * There are cases where the year for Finland has been entered incorrectly three times, there are two 1772s, 1775s & 1967s, the second of each should be 1872, 1875 and 1957 respectively. * There are also two errors for New Zealand, and one error for both Sweden and the US. * Sri Lanka's maternal mortality rate seems to drop to zero in 1990, here we set it to NA instead. ```{r} df$year[1:3] <- c(1875, 1885,1895) df['year'][df$year == '1772' & df$country == 'Finland' & df$maternal_deaths == 487,] <- '1872' df['year'][df$year == '1775' & df$country == 'Finland' & df$maternal_deaths == 629,] <- '1875' df['year'][df$year == '1967' & df$country == 'Finland' & df$maternal_deaths == 77,] <- '1957' df['year'][df$year == '1967' & df$country == 'Sweden' & df$maternal_deaths == 39,] <- '1957' df['year'][df$year == '1967' & df$country == 'United States' & df$maternal_deaths == 1766.28,] <- '1957' df['year'][df$year =='1989-02'& df$country == 'New Zealand',] <- '1889-02' df <- df[-which(df$year == '1950' & df$country == 'New Zealand' & df$mmr == 90),] df['mmr'][df$country == "Sri Lanka" & df$year == 1990,] <- NA ``` ## Finding the mid-point of each year range For some rows a range of years is given for a particular maternal mortality rate. We want to find the mid-point of that range. We use this function to find the mid-value of years given. ```{r} mid_year <- function(year) { if(grepl("-", year)){ year_split <- unlist(strsplit(year, split = "-")) year_begin <- year_split[1] year_end_len <- nchar(year_split[2]) year_end_pref <- 4 - year_end_len if (year_end_pref > 0) { year_pref <- substr(year_begin, 1, year_end_pref) year_end <- as.numeric(paste(c(year_pref, year_split[2]),collapse = "")) year_out <- round(mean(c(as.numeric(year_begin), year_end))) }else{ year_out <- round(mean(as.numeric(c(year_begin,year_split[2])))) } }else{ year_out <- year } return(year_out) } year_out_all <- unlist(lapply(df$year, mid_year)) df$mid_year <- as.numeric(year_out_all) ``` Let's check the mid-year estimates have worked as expected. There are three cases where the mid-year estimate is not the mid-point of the range of years. Let's fix these manually. We will keep only the columns we want to take forward. ```{r} df %>% select(year, mid_year) %>% filter(grepl("-",year)) %>% mutate(year_begin = substr(year, 1, 4)) %>% filter(year_begin >= mid_year) df['mid_year'][df$year == "1897-01",] <- 1899 df['mid_year'][df$year == "1889-02",] <- 1895 df['mid_year'][df$year == "1889-03",] <- 1896 df <- df %>% select(entity = country, year = mid_year, maternal_mortality_rate = mmr) ``` ## Reading in the WHO data Read in data from WDI - in future iterations we will read this in from ETL so it is auto updated. The WDI sources this variable from the WHO so we will henceforth refer to it as the WHO data. ```{r} download.file('https://api.worldbank.org/v2/en/indicator/SH.STA.MMRT?downloadformat=csv', paste0('data/input/',date,'_wdi_maternal_mortality.zip')) unzip(paste0('data/input/',date,'_wdi_maternal_mortality.zip'), exdir = "data/input") des_file <- unzip(paste0('data/input/',date,'_wdi_maternal_mortality.zip'), list=TRUE) file_name <- des_file$Name[startsWith( des_file$Name,"API_SH.STA.MMRT_DS2")] who <- clean_names(read.csv(paste0('data/input/', file_name), skip = 4)) ``` ## Cleaning the WHO data Pivot and clean the WHO data to long format. Check the GapMinder countries are in the WHO dataset ```{r} who_clean <- who %>% select(country_name, starts_with("X")) %>% select(-x) %>% pivot_longer(starts_with("x")) %>% mutate(year = as.numeric(gsub("x", "", name))) %>% select(entity = country_name, year, maternal_mortality_rate = value) head(who_clean) all(unique(df$entity) %in% who_clean$entity) ``` ## Merging the WHO data with the GapMinder data Combine the two datasets and add a source column so we can keep track of where each datapoint comes from. ```{r} df$source <- 'GapMinder' who_clean$source <- 'WHO_GHO' df_all <- rbind(df, who_clean) df_all$maternal_mortality_rate <- as.numeric(df_all$maternal_mortality_rate) head(df_all) ``` ## Dealing with duplicate datapoints Dealing with duplicate years - there are two identical values for NZ in 1950, we can drop one with distinct(). We remove NAs where there is a non-NA value in either Gapminder or WHO. If there are values for both GapMinder and the WHO data we have a preference for the WHO data. ```{r} #| output: false df_check <-df_all %>% group_by(entity, year) %>% add_count() %>% ungroup() df_sel <- df_check[-which(df_check$n >1 & is.na(df_check$maternal_mortality_rate)),] df_sel <-df_sel %>% group_by(entity, year) %>% add_count() %>% ungroup() %>% distinct() df_sel <- df_sel[-which(df_sel$nn >1 & df_sel$source == 'GapMinder'),] df_check <- df_sel %>% group_by(entity, year) %>% add_count() sum(df_check$nnn > 1) == 0 df_sel <- df_sel %>% select(-c(n, nn)) ``` ## Standardising the country names Standardising the country names for use on OWID. ```{r} stand_countries <- read.csv("data/input/countries_to_standardise_country_standardized.csv") df_fin <- df_sel %>% left_join(.,stand_countries, by = c("entity" = "Country")) %>% mutate(entity = Our.World.In.Data.Name) %>% select(-c(Our.World.In.Data.Name, source)) df_fin$maternal_mortality_rate[is.na(df_fin$maternal_mortality_rate)] <- "" ``` ## Cleaning geographic entities We don't want to include all of the available regions in the data as it is difficult for us to define them clearly, so we drop out a selection here. We also plot the data for Sweden to check the time-series looks as expected. ```{r} entities_to_drop <- c("Africa Eastern and Southern", "Africa Western and Central", "Arab World", "Early-demographic dividend", "East Asia & Pacific (excluding high income)", "East Asia & Pacific (IDA & IBRD countries)", "Euro area", "Europe & Central Asia (excluding high income)", "Europe & Central Asia (IDA & IBRD countries)", "Fragile and conflict affected situations", "Heavily indebted poor countries (HIPC)", "IBRD only", "IDA & IBRD total", "IDA blend", "IDA only", "IDA total", "Late-demographic dividend", "Latin America & Caribbean (excluding high income)", "Latin America & the Caribbean (IDA & IBRD countries)", "Least developed countries: UN classification", "Middle East & North Africa (excluding high income)", "Middle East & North Africa (IDA & IBRD countries)", "Not classified", "Other small states", "Pacific island small states", "Post-demographic dividend", "Pre-demographic dividend", "Small states", "South Asia (IDA & IBRD)", "Sub-Saharan Africa (excluding high income)", "Sub-Saharan Africa (IDA & IBRD countries)") df_fin <- df_fin %>% filter(!entity %in% entities_to_drop) fig <- plot_ly(data = df_fin %>% filter(entity== "Sweden"), x = ~year, text = ~paste("MMR: ", round(as.numeric(maternal_mortality_rate),2), '<br>Year:', year)) %>% add_trace(y = ~as.numeric(maternal_mortality_rate), name = 'MMR',mode = 'lines') %>% layout(xaxis = list(title = 'Year'), yaxis = list(title = 'Maternal Mortality Ratio (per 100,000 live births)')) fig ``` ## Calculating the proportional change since 2000 We can also calculate the proportional change since 2000. This is useful for understanding the change in the data over time and we can use this data in a slope chart. ```{r} df_clean <- df_fin %>% filter(year %in% c(2000, 2017)) %>% filter(complete.cases(.)) %>% mutate(maternal_mortality_rate = as.numeric(maternal_mortality_rate)) %>% group_by(entity) %>% add_count() %>% filter(n == 2) df_wide <- df_clean %>% pivot_wider(., names_from = 'year',names_glue = "{.value}_{year}", values_from = 'maternal_mortality_rate') %>% mutate(change = ((maternal_mortality_rate_2017 - maternal_mortality_rate_2000)/maternal_mortality_rate_2000)*100, baseline = 0) %>% select(entity, change, baseline) %>% pivot_longer(cols = c("change", "baseline")) %>% mutate(name = gsub("change", 2017, name), name = gsub("baseline", 2000, name)) %>% mutate(year = as.numeric(name)) %>% select(entity, year, relative_change_mmr = value) slope_ggplot <- df_wide %>% highlight_key(., ~entity) %>% ggplot( aes(x = year, y = round(relative_change_mmr,2), group = entity, text = entity))+ labs(x = "Year", y = "Relative change since 2000 (%)", title = "Relative change in maternal mortality ratio since 2000 (%)")+ geom_line(color = "#888888" )+ theme_bw()+ theme(legend.position="none", panel.background = element_rect(fill = "#ffffff"), axis.ticks = element_blank(), axis.text = element_text(color = "#888888")) s <- attrs_selected( line = list(color = "#0000FF", width = 5) ) ggplotly(tooltip = "text", slope_ggplot)%>% config(displayModeBar = FALSE) %>% highlight(on = "plotly_hover", opacityDim = 0.7, color = "blue", selected = s) %>% layout(hoverlabel = list(bgcolor = "blue", font = list(color = "FFFFFF", size = 14))) ``` Lastly, we combine the proportional change data with the full time-series of maternal mortality rate and write out the data to a csv file. ```{r} df_out <- df_wide %>% right_join(., df_fin, by = c("entity","year")) df_out$relative_change_mmr[is.na(df_out$relative_change_mmr)] <- "" write.csv(df_out, "data/output/maternal_mortality_rate_upload.csv", row.names = FALSE) ``` ```{=html} <iframe src="https://ourworldindata.org/grapher/maternal-mortality" style="width: 100%; height: 600px; border: 0px none;"></iframe> ```