GeoSpatial Project

Author

Ella Kucera

Introduction

For my geospatial project, after several setbacks, I decided to look at infant mortality rates in the United States and compare them with those of economically similar countries. Within the United States, I wanted to look more closely at individual states, the causes of these untimely deaths, and the race and ethnicity of mothers, and to see whether infant mortality rates align with maternal mortality rates. My hypothesis is that southern states will have higher rates of infant and maternal mortality than northern, midwestern, and West Coast states. I also expect the United States to have a higher mortality rate than comparable Asian and European countries because of lifestyle differences and an uneven distribution of healthcare resources.

Getting Started

My first step was to load the packages needed for this project.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(sp)
library(readxl)
library(tigris)
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
library(s2)

My first data set was the 2021 infant mortality rate for each state in the United States, which I found on USAFacts: https://usafacts.org/articles/what-is-the-us-infant-mortality-rate/
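The file loaded below reports `Births`, `Deaths`, and `Death Rate Per 1,000` for each state. Assuming the rate is simply deaths per 1,000 births, it can be recomputed from the other two columns as a quick consistency check. This is a minimal base-R sketch: the two states and all of the numbers are made up, and the 0.05 tolerance assumes the reported rate is rounded to one decimal place.

```r
# Hypothetical stand-in for data-OgA4c.csv: states and numbers are made up
toy_rates <- data.frame(
  State  = c("State A", "State B"),
  Births = c(50000, 20000),
  Deaths = c(300, 170),
  `Death Rate Per 1,000` = c(6.0, 8.6),
  check.names = FALSE
)

# Infant mortality rate = deaths / births * 1,000
recomputed <- toy_rates$Deaths / toy_rates$Births * 1000

# Flag rows whose reported rate differs from the recomputed one by more than
# one-decimal rounding could explain (assumed tolerance of 0.05)
mismatch <- abs(recomputed - toy_rates$`Death Rate Per 1,000`) > 0.05
toy_rates$State[mismatch]
```

With these toy numbers, State B is flagged because 170 / 20,000 × 1,000 = 8.5, not the reported 8.6.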

infant_mortality_rates <- read_csv("data-OgA4c.csv")
Rows: 51 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): State
dbl (1): Death Rate Per 1,000
num (2): Births, Deaths

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
us_states <- states(cb=TRUE, resolution = '20m')
Retrieving data for the year 2021

colnames(us_states)[6] <- "State"

us_states |> 
  merge(infant_mortality_rates) -> merged_mortality

as.numeric(merged_mortality$`Death Rate Per 1,000`) -> merged_mortality$`Death Rate Per 1,000`

# Rates are continuous, so use a numeric (sequential) palette instead of colorFactor()
mortality_pal <- colorNumeric('Reds', domain = merged_mortality$`Death Rate Per 1,000`)

merged_mortality |> 
  st_transform('+proj=longlat +datum=WGS84') -> merged_mortality

# Center the view on the United States rather than the South Pacific
leaflet() |> 
  addTiles() |> 
  setView(lng = -105, lat = 45, zoom = 3) |> 
  addPolygons(
    data = merged_mortality, 
    weight = 1,
    fillColor = mortality_pal(merged_mortality$`Death Rate Per 1,000`),
    fillOpacity = 1,
    label = paste0(merged_mortality$State, ": ", merged_mortality$`Death Rate Per 1,000`)
  ) |> 
  addLegend(
    pal = mortality_pal,
    values = merged_mortality$`Death Rate Per 1,000`,
    title = "Infant deaths per 1,000 births"
  )

According to the data and the map above, Missouri is the state with the highest infant mortality rate. Darker red states have higher infant mortality rates than lighter red states. As predicted, the states with the highest infant mortality rates are in the southern region of the United States. Interestingly, Alaska also ranks considerably high, possibly because of other variables that we'll get into later in the project.
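One way to check the "southern states" prediction numerically is to attach a region to each state and average the rates by region. The sketch below uses the `state.name` and `state.region` vectors that ship with base R, which group the states into four regions (Northeast, South, North Central, West). They are a stand-in for whatever regional grouping the project intends, and they cover only the 50 states, so the District of Columbia drops out of the merge. The rates are made-up numbers, not the USAFacts values.

```r
# Region lookup built from base R's datasets package (50 states, no DC)
regions <- data.frame(State = state.name, Region = state.region)

# Hypothetical rates per 1,000 births for six states (made-up numbers)
toy_rates <- data.frame(
  State = c("Mississippi", "Alabama", "California", "Oregon", "New York", "Ohio"),
  rate  = c(8.0, 7.0, 4.0, 4.5, 4.5, 6.5)
)

# Inner join on the shared State column, then average within each region
with_region <- merge(toy_rates, regions, by = "State")
region_means <- aggregate(rate ~ Region, data = with_region, FUN = mean)

# Highest regional average first
region_means[order(region_means$rate, decreasing = TRUE), ]
```

With these toy numbers, the South averages 7.5 and the West 4.25.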

Moving away from the map, I wanted to see what causes these untimely deaths. I predicted that sudden infant death syndrome (SIDS) would be the leading cause.

cause_of_deaths <- read_csv("data-3zeqx.csv")
Rows: 15 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): 15 Leading causes of infant mortality
dbl (2): Deaths, Mortality rate per 1,000 live births

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Use bare column names inside aes(); reorder() sorts the bars by number of deaths
ggplot(cause_of_deaths, aes(x = Deaths, y = reorder(`15 Leading causes of infant mortality`, Deaths))) +
  geom_col() +
  labs(title = "Leading Causes of Infant Death", x = "Number of Deaths", y = "Cause of Death") +
  theme_minimal()

Interestingly, birth defects and preterm births both rank above SIDS as causes of infant mortality.
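The ranking the plot shows can also be computed directly. This base-R sketch applies `rank()` to negated death counts so that the cause with the most deaths gets rank 1, then looks up where SIDS falls. The cause labels other than SIDS, and all of the counts, are placeholders rather than values from data-3zeqx.csv.

```r
# Placeholder causes and counts (made up for illustration)
toy_causes <- data.frame(
  cause  = c("SIDS", "Cause A", "Cause C", "Cause B"),
  Deaths = c(1400, 4000, 1200, 3500)
)

# Rank 1 = most deaths; tied causes share the lowest rank
toy_causes$rank <- rank(-toy_causes$Deaths, ties.method = "min")

# Position of SIDS in the ranking
sids_rank <- toy_causes$rank[toy_causes$cause == "SIDS"]
sids_rank
```

Here SIDS comes third, behind the two causes with more deaths.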

Moving on from the infant data, I wanted to see whether there was any correlation between infant mortality and maternal mortality. The first data set I looked at was infant mortality by maternal race.

maternal_ethnicity_2017 <- read_csv("data-uTuOe.csv")
Rows: 7 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Maternal race
dbl (1): Infant mortality rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# geom_col() already plots the values as given, so no stat argument is needed
ggplot(maternal_ethnicity_2017, aes(x = `Maternal race`, y = `Infant mortality rate`)) +
  geom_col(fill = "lightgreen") +
  labs(title = "Infant Mortality Rate by Maternal Race", x = "Maternal Race", y = "Infant Mortality Rate") +
  theme_minimal() + coord_flip()

According to the graph above, infants born to American Indian or Alaska Native, Black, and Native Hawaiian or Pacific Islander mothers have higher infant mortality rates than infants born to White, Hispanic, and Asian mothers.
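A common way to put a number on "more likely" is a rate ratio: each group's rate divided by a reference group's rate. In this sketch the reference is simply the group with the lowest rate. The group labels and rates are made up; substituting the `Maternal race` and `Infant mortality rate` columns from data-uTuOe.csv would apply the same calculation to the real figures.

```r
# Made-up rates per 1,000 live births for three placeholder groups
toy_race <- data.frame(
  group = c("Group A", "Group B", "Group C"),
  rate  = c(3.6, 10.8, 4.5)
)

# Rate ratio relative to the lowest-rate group (that group gets 1)
toy_race$ratio_to_lowest <- toy_race$rate / min(toy_race$rate)
toy_race
```

In this toy table, Group B's rate is three times the lowest group's rate.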

Moving away from race and ethnicity, I wanted to see whether certain maternal age groups experience more infant mortality than others.

mothersage <- read_csv("data-Njrce.csv")
Rows: 7 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Age of mother
dbl (1): Infant mortality rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# geom_col() already plots the values as given, so no stat argument is needed
ggplot(mothersage, aes(x = `Age of mother`, y = `Infant mortality rate`)) +
  geom_col(fill = "pink") +
  labs(title = "Infant Mortality Rate by Age of Mother", x = "Age of Mother", y = "Infant Mortality Rate (per 1,000 births)") +
  theme_minimal() + coord_flip()

According to the graph above, mothers under the age of 20, mothers aged 20-24, and mothers aged 40-64 are the most likely to experience infant mortality. This could reflect a wide range of factors, such as lifestyle, higher-risk pregnancies at older ages, and a lack of the resources needed for a healthy pregnancy.
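To say which age groups are "most likely" to experience infant mortality, it helps to compare each group with the overall rate. The overall rate is a births-weighted average of the group rates, not their plain mean, so it needs the number of births in each group. The age file has no births column, so the counts below, like the rates, are hypothetical.

```r
# Hypothetical age groups, rates per 1,000 births, and births (all made up)
toy_age <- data.frame(
  age_group = c("<20", "20-24", "25-29", "30-34"),
  rate      = c(9, 7, 5, 4.5),
  births    = c(100, 300, 400, 200)
)

# Overall rate = total deaths / total births * 1,000, i.e. a births-weighted mean
overall <- weighted.mean(toy_age$rate, toy_age$births)

# Groups whose rate is above the overall rate
toy_age$age_group[toy_age$rate > overall]
```

Here the weighted overall rate is 5.9, lower than the unweighted mean of 6.375, and the `<20` and `20-24` groups sit above it.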

Next, I looked at maternal mortality within the United States.

us_maternal_mortality <- read_csv("data-E0rJP.csv")
Rows: 22 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): X.1, Maternal mortality rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# The X.1 column holds the year
ggplot(us_maternal_mortality, aes(x = `X.1`, y = `Maternal mortality rate`)) +
  geom_col(fill = "skyblue") +
  labs(title = "US Maternal Mortality Rate by Year", x = "Year", y = "Maternal Mortality Rate") +
  theme_minimal()

The chart above suggests that maternal mortality rates in the United States have increased over the past 20 years, reaching their highest point in 2021, with the lowest rate in 2003. This is a surprising result, since one would expect maternal mortality to fall given modern medicine and the wide range of medical resources available in the United States.
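The "increased over 20 years" reading can be summarized by the years of the minimum and maximum and by the slope of a least-squares line fitted with `lm()`. The slope is only a rough average change per year and says nothing about why the rate moved. The years and rates below are made up; with the real data, the year column is `X.1` and the rate column is `Maternal mortality rate`.

```r
# Made-up yearly rates standing in for data-E0rJP.csv
toy_mm <- data.frame(
  year = 2000:2005,
  rate = c(9, 8, 7, 10, 12, 14)
)

# Years with the lowest and highest rates
lowest_year  <- toy_mm$year[which.min(toy_mm$rate)]
highest_year <- toy_mm$year[which.max(toy_mm$rate)]

# Least-squares linear trend; the slope is the average change per year
fit <- lm(rate ~ year, data = toy_mm)
slope <- unname(coef(fit)["year"])

c(lowest = lowest_year, highest = highest_year, slope = slope)
```

With these numbers, the lowest year is 2002, the highest is 2005, and the slope is about 1.14 per year.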

Now that we have seen that the United States is experiencing its highest maternal mortality rates of the past 20 years, let's see how the United States compares with similar countries around the world. These countries are comparable in terms of economy, healthcare systems, and population.
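The country rates are attached to the map polygons with `merge()`, which by default keeps only rows whose names match exactly in both tables. A country spelled differently in the CSV and in the polygon data therefore disappears from the map without any error. A quick `setdiff()` shows which names would be dropped. Both name vectors below are hypothetical; in particular, the polygon spelling "United States of America" is an assumption, so `unique(world_sf$Country)` is the place to check the actual names.

```r
# Hypothetical names: the CSV spelling vs. the polygon spelling
csv_names     <- c("United States", "Canada", "Japan")
polygon_names <- c("United States of America", "Canada", "Japan", "France")

# CSV names with no matching polygon (merge() would silently drop these)
unmatched <- setdiff(csv_names, polygon_names)
unmatched

# One fix: recode the CSV spelling to match the polygons before merging
csv_names[csv_names == "United States"] <- "United States of America"
remaining <- setdiff(csv_names, polygon_names)
```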

similar_countries <- read_csv("data-gSSXz.csv")
Rows: 13 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Country
dbl (1): 2017

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
s2_data_tbl_countries -> world

world_sf <- st_as_sf(world)

colnames(world_sf)[1] <- "Country"

world_sf |> 
  merge(similar_countries) -> world_merged
# Rename the 2017 column by name rather than by position, since merge()
# moves the join column to the front and can shift the other columns
names(world_merged)[names(world_merged) == "2017"] <- "Rate"

world_pal <- colorNumeric("Blues", domain = world_merged$Rate)

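leaflet() |> 
  addTiles() |> 
  setView(lng = 0, lat = 30, zoom = 1) |> 
  addPolygons(
    data = world_merged, 
    weight = 1,
    fillColor = world_pal(world_merged$Rate),
    fillOpacity = 1,
    label = paste0(world_merged$Country, ": ", world_merged$Rate)
  ) |> 
  addLegend(pal = world_pal, values = world_merged$Rate, title = "Infant mortality rate (2017)")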

The map above shows that, among these countries, Canada has the second-highest infant mortality rate after the United States, while Japan, Sweden, Austria, Australia, Germany, Switzerland, Belgium, the Netherlands, France, and the United Kingdom have lower rates.

After Canada, the United Kingdom has the next-highest infant mortality rate. The three lowest rates belong to Japan, Sweden, and Austria.
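Rankings like "second-highest" and "three lowest" can be read off directly by sorting the table. In this sketch the country labels and rates are placeholders; with the real data, the columns would be `Country` and `2017` from data-gSSXz.csv.

```r
# Placeholder countries and rates (made up)
toy_countries <- data.frame(
  Country = c("Country A", "Country B", "Country C", "Country D", "Country E"),
  Rate    = c(5.8, 2.0, 4.5, 2.4, 3.9)
)

# Sort from highest to lowest rate
ranked <- toy_countries[order(toy_countries$Rate, decreasing = TRUE), ]

second_highest <- ranked$Country[2]
three_lowest   <- tail(ranked$Country, 3)

second_highest
three_lowest
```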

# Reuse similar_countries (already read from data-gSSXz.csv) instead of reading the file again;
# reorder() sorts the bars from lowest to highest rate
ggplot(similar_countries, aes(x = reorder(Country, `2017`), y = `2017`)) +
  geom_col(fill = "brown") +
  labs(title = "Infant Mortality in the US and Similar Countries", x = "Country", y = "2017 Infant Mortality Rate") +
  theme_minimal() + coord_flip()

The final graph above gives a clearer picture of how these countries rank by infant mortality rate.