I started by loading the packages I’ll need to inspect my data and create simple plots for further analysis.
library(ggplot2)
library(rio)
## Warning: package 'rio' was built under R version 4.0.5
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## v purrr 0.3.4
## Warning: package 'tibble' was built under R version 4.0.4
## Warning: package 'tidyr' was built under R version 4.0.4
## Warning: package 'dplyr' was built under R version 4.0.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
The first thing I did was load the maternal mortality data set and inspect it using the str command. This way, I’m able to see how each variable is stored (character, numeric, integer, etc.). I can also open the entire data set in a separate viewer window.
setwd("C:/Users/12403/Documents/DATA 110/R_Datasets")
maternal_mortality <- import("maternalmortality.csv")
str(maternal_mortality)
## 'data.frame': 51 obs. of 7 variables:
## $ State : chr "AL" "AK" "AZ" "AR" ...
## $ MMR : num 9.6 5 7.2 14.6 11.3 11 5.1 13.6 13.1 20.5 ...
## $ Prenatal : num 16.3 19.8 23.5 18.9 13 20.5 11.9 14.4 16.1 15.8 ...
## $ Csection : num 33.8 22.6 26.2 34.8 32.1 25.8 34.6 32.1 37.2 32 ...
## $ Underserved : int 55 50 51 34 49 42 50 50 51 41 ...
## $ Uninsured : num 18.1 19.8 22.3 23.3 20.9 18 12.1 12.6 23.6 19.7 ...
## $ Population_18: int 4887871 737438 7171646 3013825 39557045 5695564 3572665 967171 21299325 10519475 ...
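The str() output can be complemented with an explicit check for missing values and column types. The following is a minimal sketch, assuming a small stand-in data frame (the hypothetical `mm_sample`) built from the first four rows shown above rather than the full CSV:

```r
# Illustrative subset: the first four rows shown in the str() output above.
mm_sample <- data.frame(
  State = c("AL", "AK", "AZ", "AR"),
  MMR = c(9.6, 5, 7.2, 14.6),
  Prenatal = c(16.3, 19.8, 23.5, 18.9),
  Underserved = c(55L, 50L, 51L, 34L),
  stringsAsFactors = FALSE
)

# Count missing values per column; all zeros means there is nothing to remove.
na_counts <- colSums(is.na(mm_sample))
print(na_counts)

# List the storage class of each column to confirm it matches expectations.
col_types <- sapply(mm_sample, class)
print(col_types)
```

Running the same two calls on the full `maternal_mortality` data frame would be the way to confirm that no cleaning is needed.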
Luckily, this data set is already very clean; there are no NA values I have to remove, and all the variables are stored with the types I want. However, for ease of viewing I will use the dplyr command ‘arrange’ to sort the data by MMR (the number of maternal deaths during a given time period per 100,000 live births) in descending order.
maternal_mortality <- maternal_mortality %>%
arrange(desc(MMR)) # arrange command to see MMR in descending order
head(maternal_mortality)
## State MMR Prenatal Csection Underserved Uninsured Population_18
## 1 DC 34.9 23.2 32.6 50 11.5 702455
## 2 GA 20.5 15.8 32.0 41 19.7 10519475
## 3 NM 16.9 30.9 23.3 61 25.6 2095428
## 4 MD 16.5 16.6 33.1 40 15.1 6042718
## 5 NY 16.0 15.0 33.7 40 15.1 19542209
## 6 LA 15.9 15.5 35.9 51 25.9 4659978
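Because MMR is a ratio per 100,000 live births, it helps to see the arithmetic behind it. The sketch below uses a hypothetical helper, `mmr()`, and made-up death and birth counts (none of them come from the data set) to show how the ratio is computed and why it is more volatile when the number of births is small:

```r
# Hypothetical helper: maternal mortality ratio per 100,000 live births.
mmr <- function(deaths, live_births) {
  deaths / live_births * 100000
}

# Made-up counts, for illustration only (not taken from the data set).
small_area <- mmr(deaths = 3, live_births = 10000)
large_area <- mmr(deaths = 30, live_births = 200000)

# Effect of a single additional death on each area's ratio.
small_shift <- mmr(4, 10000) - small_area
large_shift <- mmr(31, 200000) - large_area

# An MMR of 20.5 per 100,000 expressed as a percentage of live births.
mmr_as_percent <- 20.5 / 100000 * 100

cat("Small area MMR:", small_area, " shift per extra death:", small_shift, "\n")
cat("Large area MMR:", large_area, " shift per extra death:", large_shift, "\n")
cat("20.5 per 100,000 as a percent:", mmr_as_percent, "\n")
```

The last line is a reminder that an MMR value is a count per 100,000 live births, not a percentage.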
I created a boxplot to see the median percent of underserved women across the states as well as the minimum and maximum percentages. Based on the visualization, I can see that the data range from a minimum of about 28% to a maximum of about 60%. The median percentage of underserved women is around 45%, which is alarmingly high.
boxplot(maternal_mortality$Underserved, main="Underserved", sub=paste("Outlier values:", paste(boxplot.stats(maternal_mortality$Underserved)$out, collapse=", "))) # Collapse any outlier values into a single subtitle string
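The numbers a boxplot draws can also be read off directly. The center line of a boxplot is the median, which can differ from the mean. A minimal sketch, assuming only the first ten Underserved values shown in the str() output (not the full column of 51):

```r
# First ten Underserved values shown in the str() output (not the full column).
underserved <- c(55, 50, 51, 34, 49, 42, 50, 50, 51, 41)

stats <- boxplot.stats(underserved)
# stats$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker.
box_median <- stats$stats[3]
box_mean <- mean(underserved)

# Collapse outliers into one string, or report "none" when there are no outliers.
outlier_label <- if (length(stats$out) > 0) {
  paste(stats$out, collapse = ", ")
} else {
  "none"
}

cat("Median:", box_median, " Mean:", box_mean, " Outliers:", outlier_label, "\n")
```

On the full column the values will differ; the point is only that `boxplot.stats()` exposes the same summary the plot displays.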
Next, I created a scatter plot with a linear regression line for the variables MMR and Prenatal, which would show me whether there was a relationship between maternal deaths and delayed or missing prenatal care. The plot suggests a positive relationship between lack of prenatal care and maternal deaths, which is something I’d like to explore further in my following visualizations.
Something that stands out to me in this visualization is the outlier. After a quick look at my data, I know that the outlier represents DC. Because MMR is expressed per 100,000 live births, it is already adjusted for the size of each state, so population size alone does not explain the high value. One plausible explanation is that DC is a single city with a small population (702,455 in this data set) and relatively few births, so a small number of deaths can move its ratio much more than it would in a larger state.
library(ggplot2)
ggplot(maternal_mortality, aes(x=MMR, y=Prenatal)) +
geom_point(shape=1) +
geom_smooth(method=lm) # Added a linear regression line
## `geom_smooth()` using formula 'y ~ x'
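The direction and strength of the relationship in the scatter plot can be quantified rather than judged by eye. The sketch below fits the same kind of model that `geom_smooth(method=lm)` draws and computes a Pearson correlation. It assumes only the first ten MMR and Prenatal values shown in the str() output, so the numbers it prints are illustrative and will not match the full data set:

```r
# First ten MMR and Prenatal values shown in the str() output (not the full data).
mmr_vals <- c(9.6, 5, 7.2, 14.6, 11.3, 11, 5.1, 13.6, 13.1, 20.5)
prenatal_vals <- c(16.3, 19.8, 23.5, 18.9, 13, 20.5, 11.9, 14.4, 16.1, 15.8)

# Same form as the plotted line: Prenatal modeled as a linear function of MMR.
fit <- lm(prenatal_vals ~ mmr_vals)
slope <- coef(fit)[["mmr_vals"]]

# Pearson correlation: the sign gives direction, the magnitude gives strength.
r <- cor(mmr_vals, prenatal_vals)

cat("Slope:", round(slope, 3), " Correlation:", round(r, 3), "\n")
```

Calling `summary(fit)` on a model fitted to the full data would also report whether the slope is statistically distinguishable from zero.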
I also created a simple histogram to see the distribution of MMR. MMR is the key variable I’m interested in for this dataset, so I wanted to have a brief look at it with this plot. Based on my results, the distribution looks roughly bell-shaped, with DC as a clear exception.
ggplot(maternal_mortality, aes(x=MMR)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
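The `stat_bin()` message above appears because no bin width was given, so ggplot2 falls back to 30 bins. A minimal sketch of setting the width explicitly follows. The value `binwidth = 2` is an assumption chosen for illustration, and the data are only the first ten MMR values from the str() output:

```r
library(ggplot2)

# First ten MMR values shown in the str() output (not the full column).
mmr_df <- data.frame(MMR = c(9.6, 5, 7.2, 14.6, 11.3, 11, 5.1, 13.6, 13.1, 20.5))

# binwidth = 2 is an assumed value: each bar spans 2 deaths per 100,000 live births.
# boundary = 0 aligns the bin edges to even numbers.
hist_plot <- ggplot(mmr_df, aes(x = MMR)) +
  geom_histogram(binwidth = 2, boundary = 0, color = "white")

print(hist_plot)
```

With only 51 observations, fewer and wider bins usually make the overall shape easier to judge than the default of 30.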
I was also interested in seeing the relationship between lack of prenatal care and the percent of underserved women in each state. Based on this scatterplot, I can clearly see a positive relationship between the two variables.
ggplot(maternal_mortality, aes(x=Prenatal, y=Underserved)) +
geom_point()
Finally, I wanted to see how the Underserved, Uninsured, Prenatal, and MMR variables were connected, if they were related at all. This plot suggests that, save for a few outliers, states with higher percentages of uninsured and underserved women tend to have a higher MMR and a higher percentage of women with delayed or no prenatal care.
ggplot(maternal_mortality,aes(x=Underserved,y=Uninsured))+
geom_point(aes(size=Prenatal,col=MMR))
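A correlation matrix is a compact way to check how these four variables move together. The sketch below assumes only the six rows printed by head() earlier, which are the six highest-MMR entries and therefore not representative of all 51; it shows the technique rather than a result:

```r
# The six rows printed by head() above (highest MMR only, so not representative).
top6 <- data.frame(
  MMR = c(34.9, 20.5, 16.9, 16.5, 16.0, 15.9),
  Prenatal = c(23.2, 15.8, 30.9, 16.6, 15.0, 15.5),
  Underserved = c(50, 41, 61, 40, 40, 51),
  Uninsured = c(11.5, 19.7, 25.6, 15.1, 15.1, 25.9)
)

# Pairwise Pearson correlations, rounded for readability.
cor_matrix <- round(cor(top6), 2)
print(cor_matrix)
```

Applied to the full data frame, the same call would be `cor(maternal_mortality[, c("MMR", "Prenatal", "Underserved", "Uninsured")])`.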
After exploring the data through both simple plots and my statistical analysis, I am definitely interested in how strongly MMR is related to a lack of resources (whether that be lack of insurance, lack of prenatal care, etc.), and how significant those factors are when it comes to maternal mortality rates.
mmr_prenatal <- maternal_mortality %>%
select(State, MMR, Prenatal, Underserved, Uninsured) # Narrowed down the variables I was interested in with the select dplyr command
head(mmr_prenatal)
## State MMR Prenatal Underserved Uninsured
## 1 DC 34.9 23.2 50 11.5
## 2 GA 20.5 15.8 41 19.7
## 3 NM 16.9 30.9 61 25.6
## 4 MD 16.5 16.6 40 15.1
## 5 NY 16.0 15.0 40 15.1
## 6 LA 15.9 15.5 51 25.9
# Called in the packages I would use to create my visualizations
library(ggplot2)
library(hrbrthemes)
## Warning: package 'hrbrthemes' was built under R version 4.0.5
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(plotly)
## Warning: package 'plotly' was built under R version 4.0.5
##
## Attaching package: 'plotly'
## The following object is masked from 'package:rio':
##
## export
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(viridis)
## Loading required package: viridisLite
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.0.5
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.0.5
p <- ggplot(mmr_prenatal, aes(x=MMR, y=Prenatal)) + # Set my x and y values
geom_point(aes(text = paste("State:", State), size=Underserved, col=Uninsured)) + # Added the code to make the bubble chart and added the information for 'State' so it would show on the plotly.
geom_smooth(method=lm, col="orange", se=FALSE) + # Added a regression line
theme_economist() + # Changed the ggplot theme
xlab("Maternal Mortality Rate per 100,000 Live Births") +
ylab("% of Women with Delayed/No Prenatal Care") + # Created labels for my x and y axes
ggtitle("Lack of Medical Care/Resources Affects Maternal Mortality Rates") + # Added my title
scale_size(range = c(0.5, 8), name="% Underserved") + # Changed the label for my legend
scale_color_viridis(option="magma", discrete=FALSE, guide="none") # Changed the color palette and hid its legend
## Warning: Ignoring unknown aesthetics: text
p + theme(legend.position = "bottom") # Changed the position of my legend
## `geom_smooth()` using formula 'y ~ x'
ggplotly(p) # Used the plotly package to add interactivity to my visualization
## `geom_smooth()` using formula 'y ~ x'
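The "Ignoring unknown aesthetics: text" warning above is expected: ggplot2 does not use `text`, but plotly reads it when building the hover label. A minimal sketch, assuming a three-row stand-in data frame (the hypothetical `demo`) taken from the head() output, shows how the `tooltip` argument of `ggplotly()` can restrict the hover label to that custom text:

```r
library(ggplot2)
library(plotly)

# Three rows taken from the head() output above.
demo <- data.frame(
  State = c("DC", "GA", "NM"),
  MMR = c(34.9, 20.5, 16.9),
  Prenatal = c(23.2, 15.8, 30.9)
)

# ggplot2 warns about the unknown `text` aesthetic, but plotly picks it up later.
demo_plot <- ggplot(demo, aes(x = MMR, y = Prenatal)) +
  geom_point(aes(text = paste("State:", State)))

# tooltip = "text" limits the hover label to the custom text defined above.
interactive_plot <- ggplotly(demo_plot, tooltip = "text")
```

Passing `tooltip = c("text", "x", "y")` instead would keep the axis values in the hover label alongside the state.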
Data 110 - Project 2
For this project, I wanted to explore maternal mortality in the United States and the factors that contribute to this struggle that many women face in their lives. I decided to use data provided by the Centers for Disease Control and Prevention (CDC), which included variables such as the number of maternal deaths per 100,000 live births, the percentage of women receiving delayed or no prenatal care, the percentage of women living in a medically underserved area, and several others. Almost all of the variables in this dataset (6 of 7) are numerical, with the only categorical variable being the state. This dataset was complete and clean; I did not have to work with any missing (NA) values or alter any of the values provided in the dataset. For the sake of making the data easier to follow, the only alteration I made was using the dplyr ‘arrange’ command to sort the rows by MMR from highest to lowest. Maternal mortality is a topic that I am familiar with personally; I have seen firsthand what mothers face during unstable pregnancies and the far-reaching effects of those circumstances. With the advancements in medicine and prenatal care, I wanted to see what factors are preventing all pregnant women from receiving the medical care and attention they need. This dataset, with its information on underserved and uninsured women, allowed me to explore these factors.
Maternal mortality in the United States is still a disappointingly large problem. According to information collected by the World Health Organization in 2017, the U.S. was one of two countries in the world to have a notable increase in its maternal mortality rate (Declercq and Zephyrin). Multiple factors can lead to the death of women during or after pregnancy. These include physical complications such as severe bleeding, heart muscle disease, and infection, as well as issues with mental health, such as postpartum depression. However, aside from health factors, race and economic standing play concerningly large roles in maternal mortality rates. Women in predominantly medically underserved areas, most notably pregnant Black women, have the highest rates of maternal mortality in the US. Lack of proper medical care, therapy, and access to prenatal care has led to a disproportionate number of women of color suffering from pregnancy complications that may lead to their death, either during pregnancy or within the first 42 days after giving birth (Declercq and Zephyrin).
Considering these factors, I wanted to create a visualization that would show how being uninsured, underserved, and/or lacking in prenatal care is associated with higher maternal mortality rates. Taking state data, I created a scatter plot and added a linear regression line to show the positive relationship between lack of access to resources and maternal mortality. As I had expected, the plot suggests that women who are not given equal medical attention and care are at higher risk of losing their lives during their pregnancy or soon after giving birth. States with lower percentages of underserved and uninsured women, such as Connecticut, Ohio, and Rhode Island, had a much smaller MMR than a state like Georgia, which had an alarming 41% of women living in underserved areas and a staggering MMR of 20.5 deaths per 100,000 live births.
I wish I could have looked at racial and economic data alongside the MMR in this dataset. Considering that socioeconomic standing is a significant indicator of the probability of maternal mortality, I would have liked to incorporate that data into my visualization.
Source:
Declercq, Eugene, and Laurie Zephyrin. “Maternal Mortality in the United States: A Primer.” Commonwealth Fund, 16 Dec. 2020, www.commonwealthfund.org/publications/issue-brief-report/2020/dec/maternal-mortality-united-states-primer.