What is maternal mortality?

Maternal mortality refers to the women and girls who die each year from pregnancy-related complications. It will be interesting to dissect which states experience the highest numbers of deaths compared to others.

## What is in the dataset?

In this dataset, the MMR is the ratio of maternal deaths per 100,000 births. The data was collected in 2018. The variables in this dataset are State, MMR, Prenatal, Csection, Underserved, Uninsured, and Population_18. Prenatal is the percentage of women receiving delayed or no prenatal care. Csection is the percentage of births ending in cesarean section. Underserved is the percentage of women living in medically underserved areas. Uninsured is the percentage of women who did not have insurance, and Population_18 is the population of the state in 2018.
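As a small illustration of how a ratio per 100,000 births is formed, the sketch below computes one from a count of deaths and a count of births. The helper name `mmr_per_100k` and the counts are hypothetical and are not taken from the dataset.

```r
# Hypothetical helper: deaths per 100,000 births
mmr_per_100k <- function(deaths, births) {
  stopifnot(all(births > 0))
  deaths / births * 100000
}

# Made-up counts, for illustration only
example_mmr <- mmr_per_100k(deaths = 12, births = 150000)
example_mmr
```

Going the other way, multiplying a ratio by births / 100,000 would recover an approximate death count.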

#load libraries into R
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(readr)
#Set working directory
setwd("C:/Users/Haley/Desktop")
#Read the CSV file and display the column names
mydata <- read_csv("maternalmortality.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   State = col_character(),
##   MMR = col_double(),
##   Prenatal = col_double(),
##   Csection = col_double(),
##   Underserved = col_double(),
##   Uninsured = col_double(),
##   Population_18 = col_double()
## )
names(mydata)
## [1] "State"         "MMR"           "Prenatal"      "Csection"     
## [5] "Underserved"   "Uninsured"     "Population_18"
#change all names of columns to lower case
names(mydata) <- tolower(names(mydata))
names(mydata)
## [1] "state"         "mmr"           "prenatal"      "csection"     
## [5] "underserved"   "uninsured"     "population_18"
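#Basic barplot with the states on the x-axis and an estimated number of maternal deaths on the y-axis
#MMR is per 100,000 births; the dataset has population rather than births, so this is only a rough proxy
mydata %>%
  mutate(deaths = (mmr / 100000) * population_18) %>%
  ggplot(aes(state, deaths)) +
  geom_bar(stat = "identity")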

## Outlier

Although this graph is a bit difficult to read, one bar is a clear outlier. The outlier has the highest estimated number of deaths, which is most likely a direct result of its very large population. That state, with a population of nearly 40 million, is California. This is extremely interesting, and I hope to dig deeper to see whether the states surrounding California share a similar rate of maternal mortality.
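Part of what makes a chart like this hard to read is that the states are drawn in alphabetical order. One possible remedy, sketched here on made-up values, is to order the states by the plotted value with `reorder()` and flip the axes with `coord_flip()`. The data frame `toy` and its placeholder state codes are hypothetical and do not come from maternalmortality.csv.

```r
library(ggplot2)

# Made-up values for illustration; not taken from maternalmortality.csv
toy <- data.frame(
  state  = c("AA", "BB", "CC", "DD"),
  deaths = c(5, 40, 12, 20)
)

# Order the states by the plotted value so the outlier ends up at one end
toy$state <- reorder(toy$state, toy$deaths)

# Flip the axes so the state labels stay legible
ordered_plot <- ggplot(toy, aes(state, deaths)) +
  geom_col() +
  coord_flip()
ordered_plot
```

The same two steps could be applied to `mydata` once the `deaths` column exists.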

#Start by filtering the regions and organizing the states data
west <- mydata %>% filter(state %in% c("CO", "WA", "NV", "WY", "MT", "ID", "OR", "UT", "CA", "AK", "HI"))

northeast <- mydata %>% filter(state %in% c("NY", "VT", "NH", "ME", "MA", "RI", "CT", "PA", "NJ", "DE", "MD"))

southeast <- mydata %>% filter(state %in% c("WV", "VA", "NC", "KY", "TN", "SC", "GA", "AL", "MS", "AR", "LA", "FL"))

midwest <- mydata %>% filter(state %in% c("SD", "ND", "OH", "IN", "MI", "MO", "WI", "MN", "IA", "KS", "NE"))

southwest <- mydata %>% filter(state %in% c("TX", "NM", "AZ", "OK"))
#a stray space inside a state code (as happened with AZ, and with " NC") makes the match fail silently
#combine all regions into one data frame (c() would only concatenate the columns into a list)
regions <- bind_rows(southwest, midwest, southeast, northeast, west)
#Create a vector of the state codes in each region, to be matched with %in%
west_names <- west$state %>% unique()
northeast_names <- northeast$state %>% unique()
southeast_names <- southeast$state %>% unique()
midwest_names <- midwest$state %>% unique()
southwest_names <- southwest$state %>%  unique()

Create the new column Region

#Create the column region in mydata
mydata$region <- NA
#Assign a region to every row by checking which vector contains its state
for (i in seq_len(nrow(mydata))){
  if (mydata$state[i] %in% west_names){
  mydata$region[i] <- "west"
  }
  else if(mydata$state[i] %in% northeast_names){
    mydata$region[i] <- "northeast"
  }
  else if(mydata$state[i] %in% southeast_names){
    mydata$region[i] <- "southeast"
  }
  else if(mydata$state[i] %in% midwest_names){
    mydata$region[i] <- "midwest"
  }
  else {
    #fallback: any state not matched by the vectors above is labelled southwest
    mydata$region[i] <- "southwest"
  }
}
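Because the loop above labels every state that does not match one of the first four vectors as southwest, a state missing from the vectors is misclassified without any warning. A minimal alternative sketch, assuming `dplyr::case_when()`, leaves unmatched states as `NA` so they can be listed and fixed. The vectors `west_demo` and `southwest_demo` and the tibble `demo` are hypothetical, deliberately incomplete stand-ins.

```r
library(dplyr)

# Hypothetical, deliberately incomplete lookup vectors
west_demo      <- c("CA", "OR")
southwest_demo <- c("TX", "AZ")

demo <- tibble(state = c("CA", "TX", "IL", "AZ"))

demo <- demo %>%
  mutate(region = case_when(
    state %in% west_demo      ~ "west",
    state %in% southwest_demo ~ "southwest",
    TRUE                      ~ NA_character_  # unmatched states stay NA
  ))

# States that no region vector covers; ideally this is empty
unassigned <- demo %>% filter(is.na(region)) %>% pull(state)
unassigned
```

Checking `unassigned` before summarizing makes gaps in the region vectors visible instead of folding them into a default group.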

Now the column region holds our desired mapping. Next, we need to group by it.

I will use the unweighted mean MMR of the states in each region.

#group the regions and use the mean mortality rate to display the impacts in each region
gdf <- mydata %>% group_by(region) %>% summarize(rate = mean(mmr))
#display the calculated rates in the new dataframe
gdf %>% head()
## # A tibble: 5 x 2
##   region     rate
##   <chr>     <dbl>
## 1 midwest    8.1 
## 2 northeast  8.83
## 3 southeast 12.7 
## 4 southwest 14.3 
## 5 west       8.55
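The rates above are plain means, so every state counts equally regardless of size. As a hedged sketch of one alternative, the example below compares `mean()` with `weighted.mean()` on made-up values. Weighting by `population_18` is an assumption made for illustration (births would be the more natural weight, but the dataset does not include them), and the tibble `toy` is hypothetical.

```r
library(dplyr)

# Made-up values for illustration; not taken from the dataset
toy <- tibble(
  region        = c("a", "a", "b", "b"),
  mmr           = c(10, 20, 5, 15),
  population_18 = c(1e6, 3e6, 2e6, 2e6)
)

toy_rates <- toy %>%
  group_by(region) %>%
  summarize(
    rate          = mean(mmr),                         # every state counts equally
    weighted_rate = weighted.mean(mmr, population_18)  # larger states count more
  )
toy_rates
```

When the states in a region differ a lot in size, the two columns can diverge noticeably, which is worth keeping in mind when comparing regions.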

Here is the bar chart!

plot1 <- gdf %>%
  ggplot(aes(region, rate, fill = region)) +
  geom_bar(stat = "identity") +
  labs(title = "Region vs Mortality Rate", subtitle = "Births in year 2018") +
  xlab("Regions") +
  ylab("Rate of Mortality per 100,000 births") +
  theme_gray() +
  scale_fill_manual(values = c("#D16103", "#C3D7A4", "#52854C", "#4E84C4", "#293352"))
plot1

ggplotly(plot1)

Summary

What is most interesting about this visual is that the rates are still an issue. Mean rates of roughly 8 to 14 deaths per 100,000 births mean that women are still dying every year from pregnancy-related causes. Another important aspect of this data is that even though California had the highest estimated number of deaths of any state, the west is not among the regions with the highest rates. Instead, the region with the highest mean rate of maternal mortality is the southwest. What I would like to carry further with this analysis is seeing whether factors such as the share of women living in underserved areas or without insurance help explain why the southwest has the highest mean maternal mortality rate.