knitr::opts_chunk$set(echo=TRUE, message=FALSE, warning=FALSE, include=TRUE, error=FALSE)
library(tidyverse)
library(RMySQL)      # Older MysQL library (note: I could NOT get the RMariaDB lib to work)
library(formattable) # Pretty print tables
library(xlsx)
library(rvest)
library(stargazer)

Source files: https://github.com/djlofland/DATA607_F2019/tree/master/Project2

Assignment

The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. Your task is to:

  1. Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:
  • Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below.
  • Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!]
  • Perform the analysis requested in the discussion item.
  • Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.
  1. Please include in your homework submission, for each of the three chosen datasets:
  • The URL to the .Rmd file in your GitHub repository, and
  • The URL for your rpubs.com web page.

Project

In the late 1990’s I worked for an NGO that focused on training doctors, nurses and midwives on women’s reproductive health issues and health during pregancy. The NGO developed and tested training programs in very rural areas around the world to teach health care providers how to provide care for women with no easy access to primary health care facilities or doctors.

In our Weekly Discussion forum, Euclid Zhang shared a link to UNICEF data and while browsing their datasets, I stumbled on a few datasets which really spoke to me and the incredible work I once contributed toward. The datasets provide data and trends in both maternal health and outcomes along with indicators by country. For Project 2, rather than pick 3 disparate datasets to analyze, I’m going to instead follow this theme of maternal outcome around the world.

For an understanding of the problem space, see Maternal mortality.

Some key facts they provide: - Every day in 2017, approximately 810 women died from preventable causes related to pregnancy and childbirth. - Between 2000 and 2017, the maternal mortality ratio (MMR, number of maternal deaths per 100,000 live births) dropped by about 38% worldwide. - 94% of all maternal deaths occur in low and lower middle-income countries. - Young adolescents (ages 10-14) face a higher risk of complications and death as a result of pregnancy than other women. - Skilled care before, during and after childbirth can save the lives of women and newborns.

The primary datasets I’ll explore are:

Each of these datasets required Tidy processing to clean up. I also join the data where possible by country and year to tease out additional patterns not available within any one dataset.

To facilitate joining tables using different Country label conventions, I scraped a lookup table with Country names and ISO3 codes which I join to my various datasets.

These datasets explore different years, but have broad overlap between 2000-2015 (every 5 years). My analysis will be somewhat limited by the available data. If I were doing a deeper dive, I would look for more complete data and/or limit my questions to those countries that have a richer set of data. My guess is that better data may indicate more attention to tackling problems with maternal mortality and as a consequence, better data might come from countries with better reporting and more on-the-ground efforts.

Load Data

Load Country Codes

During research I found that different datasets included ISO2 (Alpha 2), ISO3 (Alpha 3) and/or descriptive country names. Just in case it’s needed, I wanted a lookup table I can use to join different datasets using different country name conventions.

url             <- "https://www.iban.com/country-codes"
cache_fn        <- 'country_codes.csv'

isCacheFound    <- FALSE
local_cache_fn  <- paste('data/cache/', cache_fn, sep = '')

# Check if we have a local copy of the data available to load
if(file.exists(local_cache_fn)) {
  msg <- paste('Found cached copy: ', local_cache_fn, sep='')
  isCacheFound <- TRUE
}

# Cached ata file found
if (isCacheFound) {
  # Load CSV Files
  data_df <- read_csv(local_cache_fn, col_names = TRUE)
  isDataLoaded <- TRUE
  msg <- 'Cached CSV data loaded.'

} else {
  # Load Raw Data
  data_df <- url %>%
    xml2::read_html() %>%
    html_nodes(xpath='//*[@id="myTable"]') %>%
    html_table()
  
  data_df <- data_df[[1]]
  
  # Cache the processed as CSV for future
  write.csv(data_df, local_cache_fn, row.names=FALSE, na="")
}

iso3_codes <- data_df
isTidy_iso3_codes <- FALSE

print(msg)
## [1] "Cached CSV data loaded."
knitr::kable(iso3_codes[1:10, ], booktabs = T, caption = "ISO2 and ISO3 Country Codes.  Raw data, first 10 rows shown.  (Source: http://iban.com/country-codes)")
ISO2 and ISO3 Country Codes. Raw data, first 10 rows shown. (Source: http://iban.com/country-codes)
Country Alpha-2 code Alpha-3 code Numeric
Afghanistan AF AFG 4
Åland Islands AX ALA 248
Albania AL ALB 8
Algeria DZ DZA 12
American Samoa AS ASM 16
Andorra AD AND 20
Angola AO AGO 24
Anguilla AI AIA 660
Antarctica AQ ATA 10
Antigua and Barbuda AG ATG 28

Tidy Country Codes

# If rerunning our RMarkdown, only tidy if necessary
if (isTidy_iso3_codes == FALSE) {
  
  # Convert year columns into a variable
  iso3_codes <- iso3_codes %>% 
    mutate(ISO3 = `Alpha-3 code`, ISO2 = `Alpha-2 code`) %>%
    select(Country, ISO3, ISO2)
  
  isTidy_iso3_codes <- TRUE # if this cell ir rerun, don't try to re-tidy
  
} else {
  print('Already processed, nothing to do.')
}
knitr::kable(iso3_codes[1:10, ], booktabs = T, caption = "ISO2 and ISO3 Country Codes.  Tidy data, first 10 rows shown.")
ISO2 and ISO3 Country Codes. Tidy data, first 10 rows shown.
Country ISO3 ISO2
Afghanistan AFG AF
Åland Islands ALA AX
Albania ALB AL
Algeria DZA DZ
American Samoa ASM AS
Andorra AND AD
Angola AGO AO
Anguilla AIA AI
Antarctica ATA AQ
Antigua and Barbuda ATG AG

Load Population Data

Changes in country population might change how we interpret maternal mortality rate. While the rate should theoretically be independent from population (by definition MMR is maternal deaths per 100,000 live births), if there is a large increase or decrease in population, this might impact conditions around pregnency (nutrition, access to health care, etc) and/or conditions at time of birth (e.g. access to health care). Since MMR is “deaths per live briths”, I’ll end up needing population data to help calculate the number of live births.

url        <- "https://en.wikipedia.org/wiki/List_of_countries_by_past_and_future_population"
cache_fn   <- 'population.csv'

isCacheFound <- FALSE
local_cache_fn    <- paste('data/cache/', cache_fn, sep = '')

# Check if we have a local copy of the data available to load
if(file.exists(local_cache_fn)) {
  msg <- paste('Found cached copy: ', local_cache_fn, sep='')
  isCacheFound <- TRUE
}

# Cached ata file found
if (isCacheFound) {
  # Load CSV Files
  data_df <- read_csv(local_cache_fn, col_names = TRUE)
  isDataLoaded <- TRUE
  msg <- 'Cached CSV data loaded.'

} else {
  # Load Raw Data
  data_df <- url %>%
    xml2::read_html() %>%
    html_nodes(xpath='//*[@id="mw-content-text"]/div/table[2]') %>%
    html_table()
  
  data_df <- data_df[[1]]
  
  # Cache the processed as CSV for future
  write.csv(data_df, local_cache_fn, row.names=FALSE, na="")
}

population_df <- data_df
isTidy_population <- FALSE

print(msg)
## [1] "Cached CSV data loaded."
knitr::kable(population_df[1:10, ], booktabs = T, caption = "Population Data.  Raw data, first 10 rows shown.  (Source: https://en.wikipedia.org/wiki/List_of_countries_by_past_and_future_population)")
Population Data. Raw data, first 10 rows shown. (Source: https://en.wikipedia.org/wiki/List_of_countries_by_past_and_future_population)
Country (or dependent territory) 1985 % 1990 %_1 1995 %_2 2000 %_3 2005 %_4 2010 %_5 2015 %_6
Afghanistan 13120 -2.70 13569 0.67 19446 7.46 22462 2.93 26335 3.23 29121 2.03 32565 2.26
Albania 2957 2.05 3245 1.88 3159 -0.54 3159 0.00 3025 -0.86 2987 -0.25 3030 0.28
Algeria 22009 3.19 25191 2.74 28322 2.37 30639 1.58 32918 1.45 35950 1.78 39543 1.92
American Samoa 39 3.57 48 4.09 54 2.69 58 1.39 57 -0.28 56 -0.53 55 -0.41
Andorra 45 5.84 53 3.41 64 3.70 66 0.58 77 3.18 85 2.12 86 0.25
Angola 8390 3.09 9486 2.48 11000 3.01 12683 2.89 14770 3.09 17043 2.90 19626 2.86
Anguilla 7 1.35 9 3.84 10 3.20 12 3.02 14 2.80 15 2.40 17 2.14
Antigua and Barbuda 65 -1.24 65 -0.07 69 1.38 76 1.86 82 1.53 87 1.31 93 1.28
Argentina 30672 1.57 33036 1.50 35274 1.32 37336 1.14 39182 0.97 41344 1.08 43432 0.99
Armenia 3465 2.02 3530 0.37 3131 -2.37 3101 -0.19 3085 -0.10 3072 -0.08 3057 -0.10

Tidy Population Data

# If rerunning our RMarkdown, only tidy if necessary
if (isTidy_population == FALSE) {
  
  # Convert year columns into a variable
  population_df <- population_df %>% 
    mutate(Country = `Country (or dependent territory)`) %>%
    select(Country, `2000`, `2005`, `2010`, `2015`) %>%
    gather(Year, Population, -Country)

  # Fix datatypes
  population_df$Year <- as.numeric(population_df$Year)

  # Filter years  
  population_df <- population_df %>% filter(Year >= 2000 & Year <= 2015) %>% arrange(Country, Year)
  population_df$Population <- population_df$Population* 1000
  population_df <- population_df %>% left_join(iso3_codes)
  
  isTidy_population <- TRUE # if this cell ir rerun, don't try to re-tidy
  
} else {
  print('Already processed, nothing to do.')
}
knitr::kable(population_df[1:10, ], booktabs = T, caption = "Population Data.  Tidy date, first 10 rows shown.")
Population Data. Tidy date, first 10 rows shown.
Country Year Population ISO3 ISO2
Afghanistan 2000 22462000 AFG AF
Afghanistan 2005 26335000 AFG AF
Afghanistan 2010 29121000 AFG AF
Afghanistan 2015 32565000 AFG AF
Albania 2000 3159000 ALB AL
Albania 2005 3025000 ALB AL
Albania 2010 2987000 ALB AL
Albania 2015 3030000 ALB AL
Algeria 2000 30639000 DZA DZ
Algeria 2005 32918000 DZA DZ

Load Birthrate Data

While there is variation between countries, the overall birthrate worldwide has declined. To understand changes in maternal mortality, we will need to estimate how many births are occuring during years of interest in each country. It’s possible that changes in birth rates could have an impact on MMR.

# We first want to check if the data has already been loaded and muched  as a local cached .csv file.  If the csv file is available, use that.  If 
# it's not, then we will download and/or load the raw data, munge and save as a local cache .csv.

# Source Data File
source_url        <- 'http://api.worldbank.org/v2/en/indicator/SP.DYN.CBRT.IN?downloadformat=excel'
raw_fn            <- 'API_SP.DYN.CBRT.IN_DS2_en_excel_v2_248743.xlsx'
cache_fn          <- 'births.csv'

# Which WorkSheet to load and there are some extra rows above & below our table of interest. 
data_header_rows  <- 3  # rows at top to skip (note, blank rows are automatically skipped)
data_tail_rows    <- 0  # rows at bottom to skip
data_header       <- FALSE
data_sheet        <- 1      # optional for Excel Workbooks
data_column_names <- c('Country', 'ISO3', 'Indicator', 'Indicator_Code', as.character(seq(1960, 2017, 1)))

# Some variables to help with flow control
isCacheFound      <- FALSE
isRawFound        <- FALSE
isDataLoaded      <- FALSE

local_raw_fn      <- paste('data/raw/', raw_fn, sep = '')
local_cache_fn    <- paste('data/cache/', cache_fn, sep = '')

# Check if we have a local copy of the data available to load
if(file.exists(local_cache_fn)) {
  msg <- paste('Found cached copy: ', local_cache_fn, sep='')
  isCacheFound <- TRUE
  
} else if(file.exists(local_raw_fn)) {
  msg <- paste('Found raw copy: ', local_raw_fn, ' (May need processing)', sep = '')
  isRawFound <- TRUE
  
} else {
  # Attempt to download the dataset
  download.file(source_url, local_raw_fn, method="auto")
  
  if(file.size(local_raw_fn) > 0) {
    msg <- paste('Downloaded file from: ', source_url, sep='')
    isRawFound <- TRUE
  } else {
    msg <- paste("File not found and couldn't be downloaded. Check file name and/or source.")
  }
}

print(msg)
## [1] "Found cached copy: data/cache/births.csv"
# Cached data file found
if (isCacheFound) {
  # Load CSV Files
  data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  isDataLoaded <- TRUE
  msg <- 'Cached CSV data loaded.'

} else if (isRawFound) {
  # Load Raw File - Make sure it's excel since read.xlsx will break with other file formats
  if (str_detect(local_raw_fn, '.*\\.xlsx?$')) {
    data_df <- read.xlsx(local_raw_fn, sheetIndex = data_sheet, header=data_header)
    isDataLoaded <- TRUE
    msg <- 'Raw data loaded.'
    
    ##### DATA PROCESSING start #####
    
    # Are there any extraneous row at to top to remove?
    if (data_header_rows > 0) {
      data_df <- data_df[(data_header_rows + 1):nrow(data_df),]
      data_header_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Did the raw file have any extraneous rows at the bottom to remove?
    if (data_tail_rows > 0) {
      last_row <- nrow(data_df)-data_tail_rows
      data_df <- data_df[0:last_row,]
      data_tail_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Remove any empty columns in the raw data
    data_df <- data_df %>% select_if(function(x) {!all(is.na(x))})
    
    # Set the Column names
    if (length(data_column_names) > 0) {
      names(data_df) <- data_column_names
      data_column_names <- list()
    }
    
    # Convert to a tibble
    data_df <- as_tibble(data_df)
    
    ##### DATA PROCESSING done #####
    
    # Cache the processed as CSV for future
    write.csv(data_df, local_cache_fn, row.names=FALSE, na="")
    msg <- 'Raw data loaded, processed and saved to cache.'
    
    data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  }
} else {
  # Bummer - no load data file was found to load
  msg <- 'Data NOT loaded.'
}

birthrate_df   <- data_df
isTidy_birthrate <- FALSE

print(msg)
## [1] "Cached CSV data loaded."
knitr::kable(birthrate_df[1:10, ], booktabs = T, caption = "Birthrate Data.  Raw data, first 10 rows shown.  Births per 1000 people.  (Source: http://api.worldbank.org/v2/en/indicator/SP.DYN.CBRT.IN?downloadformat=csv)")
Birthrate Data. Raw data, first 10 rows shown. Births per 1000 people. (Source: http://api.worldbank.org/v2/en/indicator/SP.DYN.CBRT.IN?downloadformat=csv)
Country ISO3 Indicator Indicator_Code 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Aruba ABW Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 35.67900 34.52900 33.3200 32.05000 30.73700 29.4130 28.12100 26.90800 25.81700 24.87200 24.09900 23.50500 23.06800 22.76000 22.56100 22.45200 22.41400 22.4240 22.45400 22.47800 22.47200 22.42400 22.32900 22.18700 21.98900 21.72600 21.39700 21.00800 20.5700 20.08900 19.57100 19.02100 18.44600 17.85900 17.27000 16.69100 16.13200 15.59800 15.09000 14.61500 14.17300 13.76200 13.37500 13.01000 12.66700 12.34800 12.05300 11.78800 11.556 11.361 11.21400 11.12300 11.0900 11.11100 11.17900 11.28100 11.4040 11.53200
Afghanistan AFG Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 51.27900 51.37300 51.4570 51.53000 51.58900 51.6310 51.65200 51.65000 51.62300 51.57400 51.50200 51.41100 51.30300 51.18400 51.05800 50.93000 50.80300 50.6780 50.55500 50.43600 50.32100 50.21000 50.09800 49.98400 49.86500 49.73500 49.58600 49.41800 49.2360 49.04800 48.88000 48.76300 48.70900 48.71700 48.77000 48.83500 48.87000 48.83300 48.68800 48.41900 48.02100 47.50500 46.90100 46.23100 45.50700 44.72300 43.87000 42.94400 41.949 40.903 39.82900 38.75000 37.6900 36.67000 35.70600 34.80900 33.9810 33.21100
Angola AGO Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 49.08000 48.77900 48.5470 48.43000 48.45000 48.6220 48.93600 49.34300 49.78700 50.23100 50.61900 50.90300 51.06200 51.09400 51.00500 50.82500 50.60000 50.3860 50.22600 50.13900 50.13400 50.20700 50.32200 50.44900 50.56900 50.66300 50.71200 50.71100 50.6570 50.54700 50.38300 50.16800 49.91900 49.65200 49.37800 49.11300 48.87000 48.65200 48.46000 48.29300 48.15000 48.02700 47.91100 47.78600 47.63900 47.45300 47.21500 46.92000 46.563 46.143 45.65600 45.10200 44.4930 43.84700 43.18200 42.52000 41.8820 41.28100
Albania ALB Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 40.92400 40.36800 39.6270 38.72300 37.69500 36.5990 35.49600 34.43500 33.45800 32.59000 31.83700 31.18300 30.58700 30.01900 29.47300 28.94900 28.45500 28.0040 27.60600 27.26200 26.98100 26.77200 26.62700 26.52800 26.45200 26.36700 26.24100 26.04700 25.7620 25.37200 24.86700 24.24500 23.52900 22.74200 21.90200 21.02000 20.10600 19.17300 18.23800 17.32100 16.43600 15.59000 14.79000 14.04800 13.38100 12.82100 12.39800 12.11800 11.973 11.945 12.00100 12.10000 12.1970 12.25700 12.25900 12.19700 12.0800 11.93400
Andorra AND Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11.90000 11.00000 11.6000 12.50000 11.90000 11.90000 12.10000 11.40000 10.90000 11.00000 10.90000 11.20000 11.90000 12.60000 11.30000 11.80000 11.20000 10.30000 10.90000 10.70000 10.60000 10.10000 10.400 9.900 9.80000 NA 9.5000 NA NA NA 8.8000 NA
Arab World ARB Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 47.79008 47.55839 47.3276 47.09162 46.84421 46.5771 46.28291 45.96055 45.61137 45.23716 44.84362 44.44035 44.03865 43.64783 43.27485 42.92493 42.60063 42.2929 41.98992 41.68051 41.34983 40.98353 40.57024 40.10007 39.56784 38.96898 38.30287 37.58055 36.8177 36.02979 35.31807 34.52188 33.83458 33.04784 32.28643 31.50175 30.83459 30.21717 29.64974 29.14368 28.70911 28.34776 28.05892 27.83078 27.65783 27.53771 27.47728 27.46073 27.463 27.496 27.51344 27.48487 27.3893 27.21144 26.94078 26.57699 26.1348 25.64802
United Arab Emirates ARE Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 47.26600 46.70500 46.0930 45.42600 44.69400 43.8560 42.85700 41.67800 40.32900 38.84300 37.27600 35.69500 34.17900 32.80400 31.62500 30.70300 30.08200 29.7350 29.61100 29.66600 29.83900 30.05900 30.25100 30.34300 30.28200 30.02200 29.53700 28.85900 28.0260 27.05900 25.98400 24.82800 23.63100 22.43700 21.28200 20.21000 19.25300 18.41000 17.67500 17.04300 16.49400 15.99600 15.51400 15.02400 14.51800 14.00100 13.49000 13.01000 12.581 12.208 11.88800 11.61600 11.3770 11.15900 10.95800 10.77200 10.6030 10.45600
Argentina ARG Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 23.77500 23.56900 23.3720 23.19100 23.03300 22.9090 22.83000 22.80400 22.83400 22.92400 23.09000 23.35000 23.69500 24.09800 24.52400 24.91200 25.19100 25.3200 25.27700 25.06600 24.71800 24.28800 23.84600 23.45100 23.12800 22.88900 22.72700 22.60600 22.4930 22.37200 22.22600 22.03600 21.80200 21.52600 21.21400 20.88100 20.54500 20.22900 19.94500 19.70200 19.49900 19.32800 19.17500 19.02600 18.87900 18.73500 18.59600 18.46900 18.355 18.252 18.15400 18.05500 17.9490 17.83100 17.69800 17.54800 17.3830 17.20500
Armenia ARM Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN 38.50900 37.21700 35.5610 33.62900 31.54000 29.4470 27.49600 25.78600 24.38900 23.35500 22.69400 22.36600 22.26200 22.28300 22.38200 22.53000 22.72100 22.9610 23.24300 23.53800 23.82300 24.08200 24.29700 24.44600 24.50200 24.43100 24.20700 23.82600 23.2920 22.60900 21.76800 20.76400 19.62700 18.41000 17.17200 15.99100 14.94700 14.09200 13.45700 13.06200 12.91000 12.98000 13.20800 13.52400 13.88400 14.24400 14.57200 14.85500 15.080 15.233 15.30400 15.29700 15.2320 15.12500 14.97900 14.79500 14.5680 14.29800
American Samoa ASM Birth rate, crude (per 1,000 people) SP.DYN.CBRT.IN NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 21.60000 NA NA NA 18.70000 NA NA NA 17.50000 NA NA NA

Tidy Birthrate Data

# If rerunning our RMarkdown, only tidy if necessary
if (isTidy_birthrate == FALSE) {
  
  # Convert year columns into a variable
  birthrate_df <- birthrate_df %>% 
    select(Country, `2000`, `2005`, `2010`, `2015`) %>%
    gather(Year, BirthRate, -Country) %>% 
    arrange(Country, Year) %>% 
    inner_join(iso3_codes)

  # Fix datatypes
  birthrate_df$Year <- as.numeric(birthrate_df$Year)

  isTidy_birthrate <- TRUE # if this cell ir rerun, don't try to re-tidy
  
} else {
  print('Already processed, nothing to do.')
}
knitr::kable(birthrate_df[1:10, ], booktabs = T, caption = "Birthrate Data.  Tidy data, first 10 rows shown.  Births per 1000 people.")
Birthrate Data. Tidy data, first 10 rows shown. Births per 1000 people.
Country Year BirthRate ISO3 ISO2
Afghanistan 2000 48.021 AFG AF
Afghanistan 2005 44.723 AFG AF
Afghanistan 2010 39.829 AFG AF
Afghanistan 2015 34.809 AFG AF
Albania 2000 16.436 ALB AL
Albania 2005 12.821 ALB AL
Albania 2010 12.001 ALB AL
Albania 2015 12.197 ALB AL
Algeria 2000 19.554 DZA DZ
Algeria 2005 20.774 DZA DZ

Load Maternal Mortality Data

THis is the primary dataset which everything else is linked with. Here we see the maternal mortality rate provided as deaths per 100,000 live births broken out by country and year.

# We first want to check if the data has already been loaded and muched  as a local cached .csv file.  If the csv file is available, use that.  If 
# it's not, then we will download and/or load the raw data, munge and save as a local cache .csv.

# Source Data File
source_url        <- 'https://data.unicef.org/wp-content/uploads/2015/11/MMR-trend-estimates-2000-2017_MMEIG-2.xlsx'
raw_fn            <- 'MMR-trend-estimates-2000-2017_MMEIG-2.xlsx'
cache_fn          <- 'mmr_data.csv'

# Which WorkSheet to load and there are some extra rows above & below our table of interest. 
data_header_rows  <- 4  # rows at top to skip (note, blank rows are automatically skipped)
data_tail_rows    <- 17  # rows at bottom to skip
data_header       <- FALSE
data_sheet        <- 1      # optional for Excel Workbooks
data_column_names <- c('ISO3','Country','2000','2005','2010','2015', '2017')

# Some variables to help with flow control
isCacheFound      <- FALSE
isRawFound        <- FALSE
isDataLoaded      <- FALSE

local_raw_fn      <- paste('data/raw/', raw_fn, sep = '')
local_cache_fn    <- paste('data/cache/', cache_fn, sep = '')

# Check if we have a local copy of the data available to load
if(file.exists(local_cache_fn)) {
  msg <- paste('Found cached copy: ', local_cache_fn, sep='')
  isCacheFound <- TRUE
  
} else if(file.exists(local_raw_fn)) {
  msg <- paste('Found raw copy: ', local_raw_fn, ' (May need processing)', sep = '')
  isRawFound <- TRUE
  
} else {
  # Attempt to download the dataset
  download.file(source_url, local_raw_fn, method="auto")
  
  if(file.size(local_raw_fn) > 0) {
    msg <- paste('Downloaded file from: ', source_url, sep='')
    isRawFound <- TRUE
  } else {
    msg <- paste("File not found and couldn't be downloaded. Check file name and/or source.")
  }
}

print(msg)
## [1] "Found cached copy: data/cache/mmr_data.csv"
# Cached data file found
if (isCacheFound) {
  # Load CSV Files
  data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  isDataLoaded <- TRUE
  msg <- 'Cached CSV data loaded.'

} else if (isRawFound) {
  # Load Raw File - Make sure it's excel since read.xlsx will break with other file formats
  if (str_detect(local_raw_fn, '.*\\.xlsx?$')) {
    data_df <- read.xlsx(local_raw_fn, sheetIndex = data_sheet, header=data_header)
    isDataLoaded <- TRUE
    msg <- 'Raw data loaded.'
    
    ##### DATA PROCESSING start #####
    
    # Are there any extraneous row at to top to remove?
    if (data_header_rows > 0) {
      data_df <- data_df[(data_header_rows + 1):nrow(data_df),]
      data_header_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Did the raw file have any extraneous rows at the bottom to remove?
    if (data_tail_rows > 0) {
      last_row <- nrow(data_df)-data_tail_rows
      data_df <- data_df[0:last_row,]
      data_tail_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Remove any empty columns in the raw data
    data_df <- data_df %>% select_if(function(x) {!all(is.na(x))})
    
    # Set the Column names
    if (length(data_column_names) > 0) {
      names(data_df) <- data_column_names
      data_column_names <- list()
    }
    
    # Convert to a tibble
    data_df <- as_tibble(data_df)
    
    ##### DATA PROCESSING done #####
    
    # Cache the processed as CSV for future
    write.csv(data_df, local_cache_fn, row.names=FALSE, na="")
    msg <- 'Raw data loaded, processed and saved to cache.'
    
    data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  }
} else {
  # Bummer - no load data file was found to load
  msg <- 'Data NOT loaded.'
}

mmr_df <- data_df
isTidy_mmr   <- FALSE

print(msg)
## [1] "Cached CSV data loaded."
knitr::kable(mmr_df[1:10, ], booktabs = T, caption = "Maternal Mortality Data.  Count of maternal deaths per 100,000 live births.  Raw data, first 10 rows shown.  (Source: https://data.unicef.org/resources/dataset/maternal-health-data/)")
Maternal Mortality Data. Count of maternal deaths per 100,000 live births. Raw data, first 10 rows shown. (Source: https://data.unicef.org/resources/dataset/maternal-health-data/)
ISO3 Country 2000 2005 2010 2015 2017
AFG Afghanistan 1450 1140 954 701 638
ALB Albania 23 22 21 15 15
DZA Algeria 161 127 115 114 112
AGO Angola 827 519 326 251 241
ATG Antigua and Barbuda 44 40 44 43 42
ARG Argentina 66 59 51 41 39
ARM Armenia 43 35 32 28 26
AUS Australia 7 5 5 6 6
AUT Austria 6 6 5 5 5
AZE Azerbaijan 47 42 31 27 26

Tidy MMR Dataset

# If rerunning our RMarkdown, only tidy if necessary
if (isTidy_mmr == FALSE) {
  # Convert year columns into a variable
  mmr_df <- mmr_df %>% 
    gather(Year, MMR, -ISO3, -Country)

  # Fix datatypes
  mmr_df$Year <- as.numeric(mmr_df$Year)

  # Filter years  
  mmr_df <- mmr_df %>% filter(Year <= 2015) %>% arrange(Country, Year)

  isTidy_mmr <- TRUE # if this cell ir rerun, don't try to re-tidy
  
} else {
  print('Already processed, nothing to do.')
}
knitr::kable(mmr_df[1:10, ], booktabs = T, caption = "Maternal Mortality Data.  Count of maternal deaths per 100,000 live births.  Tidy data, first 10 rows shown. Years included: 2000, 2005, 2010, 2015.")
Maternal Mortality Data. Count of maternal deaths per 100,000 live births. Tidy data, first 10 rows shown. Years included: 2000, 2005, 2010, 2015.
ISO3 Country Year MMR
AFG Afghanistan 2000 1450
AFG Afghanistan 2005 1140
AFG Afghanistan 2010 954
AFG Afghanistan 2015 701
ALB Albania 2000 23
ALB Albania 2005 22
ALB Albania 2010 21
ALB Albania 2015 15
DZA Algeria 2000 161
DZA Algeria 2005 127

Load Skilled Attendant Data

UNICEF provides country level data describing a number of different indicators related to pregnancy and births. The downloaded raw dataset includes a number of different indicator tabs, “Skilled Attendant at Birth” (SAB) being only one of many. I’m curious how the presence of a trainined attendant might correlate with the more central question of maternal mortality during birth. UNICEF don’t report for all countries, only a subset, so that will limit my analysis to specific countries.

# We first want to check if the data has already been loaded and muched  as a local cached .csv file.  If the csv file is available, use that.  If 
# it's not, then we will download and/or load the raw data, munge and save as a local cache .csv.

# Source Data File
source_url        <- 'https://data.unicef.org/wp-content/uploads/2018/07/maternal_health_adolescents_indicators_April-2016_250d599.xlsx'
raw_fn            <- 'maternal_health_adolescents_indicators_April-2016_250d599.xlsx'
cache_fn          <- 'sab_indicators.csv'

# Which WorkSheet to load and there are some extra rows above & below our table of interest. 
data_header_rows  <- 8  # rows at top to skip (note, blank rows are automatically skipped)
data_tail_rows    <- 9  # rows at bottom to skip
data_header       <- FALSE
data_sheet        <- 5      # optional for Excel Workbooks
data_column_names <- c('ISO3','Country','Year','Total','Age_1517','Age_1819', 'Age_lt20', 'Age_gt20', 'Age_2034', 'Age_3549', 'Source', 'SourceYear')

# Some variables to help with flow control
isCacheFound      <- FALSE
isRawFound        <- FALSE
isDataLoaded      <- FALSE

local_raw_fn      <- paste('data/raw/', raw_fn, sep = '')
local_cache_fn    <- paste('data/cache/', cache_fn, sep = '')

# Check if we have a local copy of the data available to load
if(file.exists(local_cache_fn)) {
  msg <- paste('Found cached copy: ', local_cache_fn, sep='')
  isCacheFound <- TRUE
  
} else if(file.exists(local_raw_fn)) {
  msg <- paste('Found raw copy: ', local_raw_fn, ' (May need processing)', sep = '')
  isRawFound <- TRUE
  
} else {
  # Attempt to download the dataset
  download.file(source_url, local_raw_fn, method="auto")
  
  if(file.size(local_raw_fn) > 0) {
    msg <- paste('Downloaded file from: ', source_url, sep='')
    isRawFound <- TRUE
  } else {
    msg <- paste("File not found and couldn't be downloaded. Check file name and/or source.")
  }
}

print(msg)
## [1] "Found cached copy: data/cache/sab_indicators.csv"
# Cached data file found
if (isCacheFound) {
  # Load CSV Files
  data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  isDataLoaded <- TRUE
  msg <- 'Cached CSV data loaded.'

} else if (isRawFound) {
  # Load Raw File - Make sure it's excel since read.xlsx will break with other file formats
  if (str_detect(local_raw_fn, '.*\\.xlsx?$')) {
    data_df <- read.xlsx(local_raw_fn, sheetIndex = data_sheet, header=data_header)
    isDataLoaded <- TRUE
    msg <- 'Raw data loaded.'
    
    ##### DATA PROCESSING start #####
    
    # Are there any extraneous row at to top to remove?
    if (data_header_rows > 0) {
      data_df <- data_df[(data_header_rows + 1):nrow(data_df),]
      data_header_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Did the raw file have any extraneous rows at the bottom to remove?
    if (data_tail_rows > 0) {
      last_row <- nrow(data_df)-data_tail_rows
      data_df <- data_df[0:last_row,]
      data_tail_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Remove any empty columns in the raw data
    data_df <- data_df %>% select_if(function(x) {!all(is.na(x))})
    
    # Set the Column names
    if (length(data_column_names) > 0) {
      names(data_df) <- data_column_names
      data_column_names <- list()
    }
    
    # Convert to a tibble
    data_df <- as_tibble(data_df)
    
    ##### DATA PROCESSING done #####
    
    # Cache the processed as CSV for future
    write.csv(data_df, local_cache_fn, row.names=FALSE, na="")
    msg <- 'Raw data loaded, processed and saved to cache.'
    
    data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  }
} else {
  # Bummer - no load data file was found to load
  msg <- 'Data NOT loaded.'
}

sab_df   <- data_df
isTidy_sab <- FALSE

print(msg)
## [1] "Cached CSV data loaded."
knitr::kable(sab_df[1:10, ], booktabs = T, caption = "Skilled Attendant at Birth.  Raw data, first 10 rows shown.  (Source: https://data.unicef.org/wp-content/uploads/2018/07/maternal_health_adolescents_indicators_April-2016_250d599.xlsx)")
Skilled Attendant at Birth. Raw data, first 10 rows shown. (Source: https://data.unicef.org/wp-content/uploads/2018/07/maternal_health_adolescents_indicators_April-2016_250d599.xlsx)
ISO3 Country Year Total Age_1517 Age_1819 Age_lt20 Age_gt20 Age_2034 Age_3549 Source SourceYear
AFG Afghanistan 2010 39.00000 34 35 34 37 37 37 MICS 2010
ALB Albania 2000 99.12912 100 100 100 99.0734476224367 98.9604208782254 100 MICS 2000
ALB Albania 2005 99.75600 100 100 100 99.7308564178849 99.7006924128754 100 MICS 2005
ALB Albania 2008 99.43894 93.9805367626799 100 98.198223919787 99.5253293235263 99.5189609784249 99.6050081410941 DHS 2008
ARM Armenia 2000 97.17686 96.1092640141004 97.3167730215472 97.112296301636 97.1883744086192 97.4217674005781 93.8763280483474 DHS 2000
ARM Armenia 2005 98.58195 100 100 100 98.4219872151632 98.3336271503564 100 DHS 2005
ARM Armenia 2010 100.00000 100 100 100 100 100 100 DHS 2010
AZE Azerbaijan 2000 87.46867 100 91.6666666666667 94 86.5497076023392 87.3817034700316 76 MICS 2000
AZE Azerbaijan 2006 89.58261 96.3219109387315 96.007394399452 96.0907370879405 88.9887563923704 89.1024951119957 87.6490910432445 DHS 2006
BGD Bangladesh 2004 15.53558 14.659160649885 17.0460969662523 15.7557293822027 15.4272774340049 15.9158543553316 9.65134903482117 DHS 2004

Tidy SAB Dataset

isTidy_sab   <- FALSE

# If rerunning our RMarkdown, only tidy if necessary
if (isTidy_sab == FALSE) {
  # While this is a wide dataset and it's trivial to gather() the age data, for my questions, I really don't need age.  Since my MMR, Population and NHA data do not provide Age breakouts, I will only focus on the Total percentage.
  sab_df <- sab_df %>% select(ISO3, Country, Year, Total, Source)
  
  # Fix datatypes
  sab_df$Year <- as.numeric(sab_df$Year)
  sab_df$Total <- as.numeric(sab_df$Total)
  
  # Since there are two different sources of data, we might have a case where they report for the same year.  I'm going to group_by() and average the totals in the case where we have multiple data points.
  sab_df <- sab_df %>% group_by(ISO3, Year) %>% summarize(SAB_Pct = mean(Total))
  sab_df <- sab_df %>% filter(Year == 2000 | Year == 2005 | Year == 2010 | Year == 2015)
  
  isTidy_sab <- TRUE # if this cell ir rerun, don't try to re-tidy

} else {
  print('Already processed, nothing to do.')
}
knitr::kable(sab_df[1:10, ], booktabs = T, caption = "Skilled Attendant at Birth.  Tidy data, first 10 rows shown.")
Skilled Attendant at Birth. Tidy data, first 10 rows shown.
ISO3 Year SAB_Pct
AFG 2010 39.00000
ALB 2000 99.12912
ALB 2005 99.75600
ARM 2000 97.17686
ARM 2005 98.58195
ARM 2010 100.00000
AZE 2000 87.46867
BDI 2000 25.17007
BDI 2005 33.62995
BDI 2010 67.18792

Load Health Expenditures Data

WHO provides on online tool to export data from WHO member countries on a variety of meansures. After exploring options, I honed in on Health Expenditures. I am curious if spending on Health has any noticable correlation with maternal mortality outcomes. I manually downloaded the NHA_indicators.xlsx from their online tool.

reference: http://apps.who.int/nha/database/Home/Index/en source: http://apps.who.int/nha/database/ViewData/Indicators/en

# reference: http://apps.who.int/nha/database/Home/Index/en
# source: http://apps.who.int/nha/database/ViewData/Indicators/en

# We first want to check if the data has already been loaded and muched  as a local cached .csv file.  If the csv file is available, use that.  If 
# it's not, then we will download and/or load the raw data, munge and save as a local cache .csv.

# Source Data File
#source_url        <- 'https://data.unicef.org/wp-content/uploads/2018/07/maternal_health_adolescents_indicators_April-2016_250d599.xlsx'
raw_fn            <- 'NHA_indicators.xlsx'
cache_fn          <- 'nha_indicators.csv'

# Which WorkSheet to load and there are some extra rows above & below our table of interest. 
data_header_rows  <- 2  # rows at top to skip (note, blank rows are automatically skipped)
data_tail_rows    <- 0  # rows at bottom to skip
data_header       <- FALSE
data_sheet        <- 1      # optional for Excel Workbooks
data_column_names <- c('Country','Indicator','Note','Year_2000','Year_2005','Year_2010', 'Year_2015')

# Some variables to help with flow control
isCacheFound      <- FALSE
isRawFound        <- FALSE
isDataLoaded      <- FALSE

local_raw_fn      <- paste('data/raw/', raw_fn, sep = '')
local_cache_fn    <- paste('data/cache/', cache_fn, sep = '')

# Check if we have a local copy of the data available to load
if(file.exists(local_cache_fn)) {
  msg <- paste('Found cached copy: ', local_cache_fn, sep='')
  isCacheFound <- TRUE
  
} else if(file.exists(local_raw_fn)) {
  msg <- paste('Found raw copy: ', local_raw_fn, ' (May need processing)', sep = '')
  isRawFound <- TRUE
  
} else {
  # Since data came from an online tool, we don't have a specific file we can download
  msg <- paste("File not found and couldn't be downloaded. Check file name and/or source.")
}

print(msg)
## [1] "Found cached copy: data/cache/nha_indicators.csv"
# Cached data file found
if (isCacheFound) {
  # Load CSV Files
  data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  isDataLoaded <- TRUE
  msg <- 'Cached CSV data loaded.'

} else if (isRawFound) {
  # Load Raw File - Make sure it's excel since read.xlsx will break with other file formats
  if (str_detect(local_raw_fn, '.*\\.xlsx?$')) {
    data_df <- read.xlsx(local_raw_fn, sheetIndex = data_sheet, header=data_header)
    isDataLoaded <- TRUE
    msg <- 'Raw data loaded.'
    
    ##### DATA PROCESSING start #####
    
    # Are there any extraneous row at to top to remove?
    if (data_header_rows > 0) {
      data_df <- data_df[(data_header_rows + 1):nrow(data_df),]
      data_header_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Did the raw file have any extraneous rows at the bottom to remove?
    if (data_tail_rows > 0) {
      last_row <- nrow(data_df)-data_tail_rows
      data_df <- data_df[0:last_row,]
      data_tail_rows <- 0
      rownames(data_df) <- NULL
    }
    
    # Remove any empty columns in the raw data
    data_df <- data_df %>% select_if(function(x) {!all(is.na(x))})
    
    # Set the Column names
    if (length(data_column_names) > 0) {
      names(data_df) <- data_column_names
      data_column_names <- list()
    }
    
    # Convert to a tibble
    data_df <- as_tibble(data_df)
    
    ##### DATA PROCESSING done #####
    
    # Cache the processed as CSV for future
    write.csv(data_df, local_cache_fn, row.names=FALSE, na="")
    msg <- 'Raw data loaded, processed and saved to cache.'
    
    data_df <- read_csv(paste('data/cache/', cache_fn, sep = ''), col_names = TRUE)
  }
} else {
  # Bummer - no load data file was found to load
  msg <- 'Data NOT loaded.'
}

nha_df <- data_df
isTidy_nha   <- FALSE

print(msg)
## [1] "Cached CSV data loaded."
knitr::kable(nha_df[1:10, ], booktabs = T, caption = "WHO Health Expenditures Data.  Raw data, first 10 rows shown.  (Source: http://apps.who.int/nha/database/ViewData/Indicators/en)")
WHO Health Expenditures Data. Raw data, first 10 rows shown. (Source: http://apps.who.int/nha/database/ViewData/Indicators/en)
Country Indicator Note Year_2000 Year_2005 Year_2010 Year2015
Algeria Current Health Expenditure (CHE) per Capita in PPP NA 2.823850e+02 354.95766 645.28783 1020.54641
Algeria Gross Domestic Product in million current PPP 2.523851e+05 365236.38917 455452.36606 582720.55376
Algeria Purchasing Power Parity (NCU per Int$) NA 1.633813e+01 20.70440 26.32891 28.66228
Algeria Population (in thousands) NA 3.118366e+04 33288.43700 36117.63700 39871.52800
Algeria Female Population 15-49 NA 1.206122e+04 12739.57200 13204.20100 14043.06800
Angola Current Health Expenditure (CHE) per Capita in PPP NA 6.492660e+01 131.33186 158.17301 195.75735
Angola Gross Domestic Product in million current PPP 4.572640e+04 68754.61066 135067.31639 184842.95978
Angola Purchasing Power Parity (NCU per Int$) NA 2.004662e+00 35.79142 56.15439 66.65537
Angola Population (in thousands) NA 1.644092e+04 19552.54200 23369.13100 27859.30500
Angola Female Population 15-49 NA 6.121979e+03 7243.34500 8637.37000 10368.29500

Tidy NHA Dataset

# If rerunning our RMarkdown, only tidy if necessary
if (isTidy_nha == FALSE) {
  # Remove to extra Note column we won't need
  nha_df <- nha_df %>% select(-Note)
  
  # Convert year columns into a variable
  nha_df <- nha_df %>%
    gather(Year, Amount, -Country, -Indicator) %>% 
    mutate(Year = stringr::str_replace(Year, "Year_", "")) %>% 
    drop_na()

  # Fix datatypes
  nha_df$Year <- as.numeric(nha_df$Year)
  nha_df$Amount <- as.numeric(nha_df$Amount)
  
  # Break out CHE into a separate column
  exp_data <- nha_df %>% filter(Indicator == 'Current Health Expenditure (CHE) per Capita in PPP') %>% 
    mutate(CHE = Amount) %>%
    select(-Indicator, -Amount) %>%
    arrange(Country, Year) %>% 
    drop_na()
  
  # Break out GDP into a separate column
  gdp_data <- nha_df %>% filter(Indicator == 'Gross Domestic Product') %>% 
    mutate(GDP = Amount) %>%
    select(-Indicator, -Amount) %>%    
    arrange(Country, Year) %>% 
    drop_na()
  
  # Join the columns back into a table based on Country & Year
  nha_df <- gdp_data %>% 
    left_join(exp_data) %>%
    left_join(iso3_codes)

  isTidy_nha <- TRUE # if this cell ir rerun, don't try to re-tidy
  
} else {
  print('Already processed, nothing to do.')
}
knitr::kable(nha_df[1:10, ], booktabs = T, caption = "WHO Reported Health Expernditures & GDP by Country & Year.  Tidy data, first 10 rows shown. Note CHE are adjusted with Purchase Price Parity (PPP) so dollars have equivalent spending power per country.  This is also sometimes refered to as the 'Big Mac Index'")
WHO Reported Health Expernditures & GDP by Country & Year. Tidy data, first 10 rows shown. Note CHE are adjusted with Purchase Price Parity (PPP) so dollars have equivalent spending power per country. This is also sometimes refered to as the ‘Big Mac Index’
Country Year GDP CHE ISO3 ISO2
Afghanistan 2005 24851.345 98.61208 AFG AF
Afghanistan 2010 44457.152 132.27128 AFG AF
Albania 2000 11925.958 258.28327 ALB AL
Albania 2005 17663.350 363.84904 ALB AL
Albania 2010 28073.799 478.31474 ALB AL
Algeria 2000 252385.066 282.38500 DZA DZ
Algeria 2005 365236.389 354.95766 DZA DZ
Algeria 2010 455452.366 645.28783 DZA DZ
Andorra 2000 2132.781 3049.00618 AND AD
Andorra 2005 3444.711 3741.40393 AND AD

Analysis

Note: PPP refers to “Price Point Parity”. This country-by-country adjustment reflects different purchasing power in different geos and attempts to normalize spending for better comparison across geos. We know intuititely that cost of living and items vary greatly around the world and $1 in the US doesn’t NOT have the same purchasing power as $1 in India. PPP normalizes so that spending can be compared across countries.

Join Tables

Note that we are combining data from a number of sources and years. As such there wil be a number of NA’s where data is missing in various columns. Here I also calculate the actual number of maternal deaths based on populations, birth rates and MMR rates.

# Use the current population by year * birthrate by year to get actual births
births_df <- birthrate_df %>%
  inner_join(population_df) %>%
  mutate(Births = round(Population / 1000 * BirthRate)) %>% 
  select(ISO3, Country, Year, Population, BirthRate, Births)
births_df <- births_df %>% left_join(iso3_codes)

# now that we have births, we can multiply by MMR (deaths per 100,000 births)
mmr_df <- mmr_df %>% 
  left_join(births_df) %>%
  mutate(Deaths = round(Births / 10000 * MMR))
  
# join in Skilled Attendant at Birth (SAB)
mmr_df <- mmr_df %>%
  left_join(sab_df) %>%
  left_join(nha_df)
knitr::kable(mmr_df[1:10, ], booktabs = T, caption = "Maternal Mortaility table with supporting columns.")
Maternal Mortaility table with supporting columns.
ISO3 Country Year MMR Population BirthRate Births ISO2 Deaths SAB_Pct GDP CHE
AFG Afghanistan 2000 1450 22462000 48.021 1078648 AF 156404 NA NA NA
AFG Afghanistan 2005 1140 26335000 44.723 1177780 AF 134267 NA 24851.34 98.61208
AFG Afghanistan 2010 954 29121000 39.829 1159860 AF 110651 39.00000 44457.15 132.27128
AFG Afghanistan 2015 701 32565000 34.809 1133555 AF 79462 NA NA NA
ALB Albania 2000 23 3159000 16.436 51921 AL 119 99.12912 11925.96 258.28327
ALB Albania 2005 22 3025000 12.821 38784 AL 85 99.75600 17663.35 363.84904
ALB Albania 2010 21 2987000 12.001 35847 AL 75 NA 28073.80 478.31474
ALB Albania 2015 15 3030000 12.197 36957 AL 55 NA NA NA
DZA Algeria 2000 161 30639000 19.554 599115 DZ 9646 NA 252385.07 282.38500
DZA Algeria 2005 127 32918000 20.774 683839 DZ 8685 NA 365236.39 354.95766

MMR from 2000 to 2015

We generally see a decline in mortality rates from 2000 to 2015 data.

data <- mmr_df %>% filter(Year==2000 | Year==2015) %>% select(MMR, Year)
data$Year <- as.factor(data$Year)

ggplot(data=data, aes(x=MMR, color=Year)) +
  geom_histogram(fill='white', alpha=0.2, position="identity") +
  scale_color_discrete(name="Year", labels=c(2000, 2015))

Changes in Mortality by Country

Let’s see which countries have the most reduction in maternal deaths.

# Group data by country and calculate the delta in deaths over the years
tbl <- mmr_df %>% 
  group_by(Country) %>% 
  summarize(delta = round(first(Deaths) - last(Deaths))) %>%
  arrange(desc(delta)) %>%
  drop_na()

# barplot
ggplot(tbl %>% top_n(10, delta), aes(x=reorder(Country, delta), y=delta, fill=Country)) +
  ggtitle("Top 10 Countries with Reductions in Deaths") +
  theme(plot.title = element_text(hjust = 0.5)) + 
  geom_bar(position='dodge', stat="identity") + 
  geom_text(aes(reorder(Country, delta), y=delta, label = round(delta)), size = 2, hjust = -0.5, vjust = 0.5 ) +
  scale_y_discrete(labels = comma, breaks=c(1000, 100000, 200000, 600000)) +
  ylim(c(0,700000)) +
  xlab('Country') + 
  ylab('Deaths') +
  theme(legend.position = "none") + 
  coord_flip()

… and those with increases in maternal deaths. Nigeria is clearly experiencing a crisis.

# barplot
ggplot(tbl %>% top_n(-10, delta), aes(x=reorder(Country, delta), y=delta, fill=Country)) +
  ggtitle("Bottom 10 Countries with Increases in Deaths") +
  theme(plot.title = element_text(hjust = 0.5)) + 
  geom_bar(position='dodge', stat="identity") + 
  geom_text(aes(reorder(Country, delta), y=delta, label = round(delta)), size = 2, hjust = 1.5, vjust = 0.5 ) +
  scale_y_discrete(labels = comma, breaks=c(-30000, -20000, -1000)) +
  ylim(c(-25000,0)) +
  xlab('Country') + 
  ylab('Deaths') +
  theme(legend.position = "none") +
  coord_flip()

# Group data by country and calculate the delta in deaths over the years
mmr_df <- mmr_df %>% 
  group_by(Country) %>% 
  mutate(delta = Deaths - lag(Deaths, order_by=Year, default=first(Deaths)))

Health Expenditures vs GDP

We see that generally, countries spend less than $4000 (PPP) adjusted dollars per person on health expenditures and as GDP increases, there is an increase in CHE. We some clear outliers with very high GDP countries spending very little and vice versa, countries with relatively low GDP spending reporting quite high CHE. While this is an interesting pattern, I suspect that CHE is not entriely reported nor being spent on the poorest population that might be more impacted by MMR. Unfortunaely, I don’t have the dataset to fully explore those questions.

ggplot(mmr_df, aes(GDP, CHE)) +
  geom_point() +
  labs(title='Country Health Expenditure (PPP) vs Country GDP (PPP)') +
  xlab('GDP (Millions, PPP)') +
  ylab('CHE (Dollars, PPP)')

MMR vs CHE

We do see a pattern where higher reported health expenditures correlate with lower maternal mortality and inversely, those countries (and years) with lower CHE see higher MMR.

ggplot(mmr_df, aes(CHE, MMR)) +
  geom_point() +
  labs(title='Maternal Mortality Rate (MMR) vs Country Health Expenditure (PPP)') +
  xlab('CHE (Dollars, PPP)') +
  ylab('MMR (per 100k live births)')

Maternal Deaths vs CHE

The death count increases most dramatically with lower health expenditures and population impacts.

ggplot(mmr_df, aes(CHE, Deaths)) +
  geom_point() +
  labs(title='Maternal Deaths vs Country Health Expenditures (PPP)') +
  xlab('CHE (Dollars, PPP)') +
  ylab('Maternal Deaths')

MMR vs GDP

Generally the Maternal Mortality rate is higher with counties having lower GDP.

ggplot(mmr_df, aes(GDP, MMR)) +
  geom_point() +
  labs(title='Maternal Mortality Rate (MMR) vs Country GDP (Millions, PPP)') +
  xlab('GDP (Millions, PPP)') +
  ylab('MMR (per 100k live births)')

Skilled Attendants and MMR

While we don’t have many datapoints, there does appear to be a trend with lowered materal mortality when there is a higher percentage of births managed by a health professional. That said, a Skilled Attendant at birth may only be able to do so much … there may be affects earlier in pregnancy leading to mortality.

ggplot(mmr_df %>% drop_na(), aes(SAB_Pct, MMR)) +
  geom_point() +
  labs(title='Maternal Mortality Rate vs Skilled Attendants at Birth (SAB)', 
       caption='note, we only have 98 datapoints with both SAB and MMR')+
  xlab('Skilled Attendant at Birth (%)') +
  ylab('MMR (per 100k live births)')

Skilled Attendants and Deaths

As with mortality rate, we generally see fewer reported deaths as the percent of births are handled by a trained health professional.

ggplot(mmr_df %>% drop_na(), aes(SAB_Pct, Deaths)) +
  geom_point() + 
  scale_y_continuous(labels = comma) +
  ylim(0, 300000) +
  labs(title='Maternal Deaths vs Skilled Attendants at Birth (SAB)') +
  xlab('Skilled Attendant at Birth (%)') +
  ylab('Maternal Deaths')

Summary

I started this project with a single core dataset with MMR by country and year. As I tried to answer questions, I found I needed additional data which lead to chasing down populations, birthrates and country codes. Only then did I seek out the Skilled Attendants and Health Spending data to look for patterns. Worldwide, the reported maternal mortality rate has dropped by ~40% which broadly shows up in my histograms. While the overall mortality rate is decreasing, some countries are greatly improving (e.g. India) while others have more deaths (e.g. Nigeria). Looking at health spend, GDP and presence of Skilled Attendants during birth, the later had the most notable correlation, but data was scarce. The NGO I worked with was targeting training of health workers, and from this quick survey of data, that is likely a fruitful approach to the problem.