---
title: "Infant Mortality"
author: "Hadiyah Sumter"
date: "2025-10-14"
output: html_document
---

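# Load the dataset; both objects are read from a file named infant_mortality.csv
# (one from the working directory, one from ~/Downloads)
infant_data <- read.csv("infant_mortality.csv")
infant_mortality <- read.csv("~/Downloads/infant_mortality.csv")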
head(infant_data)
##   Year Materal.Race.or.Ethnicity Infant.Mortality.Rate Neonatal.Mortality.Rate
## 1 2007        Black Non-Hispanic                   9.8                     6.0
## 2 2013            Other Hispanic                   4.3                     2.6
## 3 2013        Black Non-Hispanic                   8.3                     5.5
## 4 2008        White Non-Hispanic                   3.3                     2.1
## 5 2009        Black Non-Hispanic                   9.5                     5.8
## 6 2010        Black Non-Hispanic                   8.6                     5.6
##   Postneonatal.Mortality.Rate Infant.Deaths Neonatal.Infant.Deaths
## 1                         3.8           287                    177
## 2                         1.7           120                     72
## 3                         2.9           201                    132
## 4                         1.1           125                     82
## 5                         3.7           259                    158
## 6                         3.1           230                    148
##   Postneonatal.Infant.Deaths Number.of.Live.Births
## 1                        110                 29268
## 2                         48                 27621
## 3                         69                 24108
## 4                         43                 38383
## 5                        101                 27405
## 6                         82                 26635
setwd("~/Desktop/DATA-101")

Setup

Load libraries:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Introduction:

Which racial or ethnic group has had the highest infant mortality rate over the years, and does this pattern suggest evidence of systemic racism in the healthcare system?

The dataset used for this project was obtained from the Centers for Disease Control and Prevention (CDC) and published through Data.gov https://catalog.data.gov/dataset/infant-mortality. It provides national and state level data on infant mortality in the United States. The dataset includes information such as the year, race or ethnicity, geographic area, and infant mortality rate (measured as the number of infant deaths per 1,000 live births). The purpose of this analysis is to identify which racial or ethnic group has the highest infant mortality rate in the United States. The five variables listed below are important because they reflect differences in health outcomes among racial and ethnic groups. Understanding these differences can help improve public health awareness and guide future policy decisions.

For this analysis, I will focus on the following key variables:

- Year - to track changes in mortality over time.
- Maternal Race or Ethnicity - to compare racial and ethnic disparities.
- Infant Deaths - to identify which group has the highest number of deaths.
- Number of Live Births - to provide context for comparing mortality rates across groups.
- Infant Mortality Rate - to identify which group has the highest rate and observe trends.
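The rate definition given in the introduction (infant deaths per 1,000 live births) can be checked against the first rows printed by head(). The sketch below assumes that the published rate is rounded to one decimal place; the deaths and births values are copied from the first three rows of that output.

```r
# Infant deaths and live births from the first three rows of the head() output
deaths <- c(287, 120, 201)
births <- c(29268, 27621, 24108)

# Infant mortality rate = infant deaths per 1,000 live births,
# rounded to one decimal (the rounding is an assumption)
rate <- round(deaths / births * 1000, 1)
print(rate)
```

The three results agree with the Infant.Mortality.Rate values shown for those rows (9.8, 4.3, and 8.3).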

Data Analysis:

This section focuses on cleaning, summarizing, and visualizing the infant mortality dataset to answer the research question. I will first clean the dataset by removing missing or unnecessary data. Then, I will use R functions such as select(), filter(), summarize(), head(), and group_by() to explore trends across racial and ethnic groups. Finally, I will create a visualization to show how infant mortality rates vary by race and year. The analysis will help identify which groups experience the highest rates of infant mortality and how these patterns have changed over time.

Before analyzing the data, I checked for missing values (NAs) and found 15 missing entries for Infant Mortality Rate and 13 missing entries for Infant Deaths. During the data cleaning process, I also noticed that some race and ethnicity categories were labeled inconsistently, such as “Black Non-Hispanic”, “Black NH”, and “Non-Hispanic Black”. To make the data consistent, the recoding step maps all of these to a single category, “Black”.

After cleaning the data, I grouped the dataset by Maternal Race or Ethnicity and calculated the average Infant Mortality Rate for each group. The results showed that the Black Non-Hispanic group had the highest average infant mortality rate compared to other racial and ethnic groups. This finding highlights ongoing health disparities and emphasizes the need for continued efforts to address inequities in maternal and infant health.

# Check for missing values
colSums(is.na(infant_data))
##                        Year   Materal.Race.or.Ethnicity 
##                           0                           0 
##       Infant.Mortality.Rate     Neonatal.Mortality.Rate 
##                          15                          15 
## Postneonatal.Mortality.Rate               Infant.Deaths 
##                          17                          13 
##      Neonatal.Infant.Deaths  Postneonatal.Infant.Deaths 
##                          13                          13 
##       Number.of.Live.Births 
##                           0
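Because several columns contain NAs, any average taken over them needs explicit handling of missing values. The sketch below shows the behavior of mean() in base R. The three rates come from the head() output, and the NA entries are added purely for illustration.

```r
# Three rates from the head() output plus one illustrative missing value
rates <- c(9.8, 4.3, NA, 3.3)

# Without na.rm, a single NA makes the whole mean NA
mean_with_na <- mean(rates)

# With na.rm = TRUE, the NA is dropped before averaging
mean_without_na <- mean(rates, na.rm = TRUE)

# A group whose values are all missing has nothing left to average,
# so the result is NaN rather than a number
all_missing <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
print(all_missing)
```

This is why a group with no recorded rates can show up as NaN in a grouped summary even when na.rm = TRUE is used.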
# Preview the first rows of the dataset
head(infant_data)
##   Year Materal.Race.or.Ethnicity Infant.Mortality.Rate Neonatal.Mortality.Rate
## 1 2007        Black Non-Hispanic                   9.8                     6.0
## 2 2013            Other Hispanic                   4.3                     2.6
## 3 2013        Black Non-Hispanic                   8.3                     5.5
## 4 2008        White Non-Hispanic                   3.3                     2.1
## 5 2009        Black Non-Hispanic                   9.5                     5.8
## 6 2010        Black Non-Hispanic                   8.6                     5.6
##   Postneonatal.Mortality.Rate Infant.Deaths Neonatal.Infant.Deaths
## 1                         3.8           287                    177
## 2                         1.7           120                     72
## 3                         2.9           201                    132
## 4                         1.1           125                     82
## 5                         3.7           259                    158
## 6                         3.1           230                    148
##   Postneonatal.Infant.Deaths Number.of.Live.Births
## 1                        110                 29268
## 2                         48                 27621
## 3                         69                 24108
## 4                         43                 38383
## 5                        101                 27405
## 6                         82                 26635
str(infant_data)
## 'data.frame':    88 obs. of  9 variables:
##  $ Year                       : int  2007 2013 2013 2008 2009 2010 2010 2011 2008 2007 ...
##  $ Materal.Race.or.Ethnicity  : chr  "Black Non-Hispanic" "Other Hispanic" "Black Non-Hispanic" "White Non-Hispanic" ...
##  $ Infant.Mortality.Rate      : num  9.8 4.3 8.3 3.3 9.5 8.6 2.8 8.1 NA NA ...
##  $ Neonatal.Mortality.Rate    : num  6 2.6 5.5 2.1 5.8 5.6 2 5.3 NA NA ...
##  $ Postneonatal.Mortality.Rate: num  3.8 1.7 2.9 1.1 3.7 3.1 0.8 2.9 NA NA ...
##  $ Infant.Deaths              : int  287 120 201 125 259 230 104 210 NA NA ...
##  $ Neonatal.Infant.Deaths     : int  177 72 132 82 158 148 75 136 NA NA ...
##  $ Postneonatal.Infant.Deaths : int  110 48 69 43 101 82 29 74 NA NA ...
##  $ Number.of.Live.Births      : int  29268 27621 24108 38383 27405 26635 37780 25825 2548 230 ...
# Recode inconsistent race/ethnicity labels into consistent groups
infant_data_clean <- infant_mortality |>
  mutate(Materal.Race.or.Ethnicity = case_when(
    Materal.Race.or.Ethnicity %in% c("Black Non-Hispanic", "Black NH", "Non-Hispanic Black") ~ "Black",
    Materal.Race.or.Ethnicity %in% c("White Non-Hispanic", "Non-Hispanic White", "White NH") ~ "White",
    Materal.Race.or.Ethnicity %in% c("Hispanic or Latino", "Latino") ~ "Hispanic",
    TRUE ~ Materal.Race.or.Ethnicity
  ))
colnames(infant_mortality)
## [1] "Year"                        "Materal.Race.or.Ethnicity"  
## [3] "Infant.Mortality.Rate"       "Neonatal.Mortality.Rate"    
## [5] "Postneonatal.Mortality.Rate" "Infant.Deaths"              
## [7] "Neonatal.Infant.Deaths"      "Postneonatal.Infant.Deaths" 
## [9] "Number.of.Live.Births"
infant_data <- infant_mortality |>
  select("Year", "Materal.Race.or.Ethnicity", "Infant.Mortality.Rate", "Infant.Deaths")
race_summary <- infant_data |>
  group_by(Materal.Race.or.Ethnicity) |>
  summarize(
    avg_rate = mean(Infant.Mortality.Rate, na.rm = TRUE)
  )

print(race_summary)
## # A tibble: 14 × 2
##    Materal.Race.or.Ethnicity  avg_rate
##    <chr>                         <dbl>
##  1 API                            2.8 
##  2 Asian and Pacific Islander     2.96
##  3 Black NH                       6.9 
##  4 Black Non-Hispanic             8.81
##  5 Non-Hispanic Black             8.32
##  6 Non-Hispanic White             2.42
##  7 Other Hispanic                 4.29
##  8 Other and Unknown              0   
##  9 Other/Two or More              3   
## 10 Puerto Rican                   5.9 
## 11 Total                          4   
## 12 Unknown                      NaN   
## 13 White NH                       2.2 
## 14 White Non-Hispanic             3.1
highest_rate <- race_summary |>
  filter(avg_rate == max(avg_rate, na.rm = TRUE))

print(highest_rate)
## # A tibble: 1 × 2
##   Materal.Race.or.Ethnicity avg_rate
##   <chr>                        <dbl>
## 1 Black Non-Hispanic            8.81
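The summary above still lists label variants such as “Black NH” and “Non-Hispanic Black” as separate groups. The sketch below shows one way the recoded labels could be carried into the grouping step so that each group appears once. It runs on a hypothetical toy data frame: the rates are made up, and toy, race_group, and toy_summary are illustrative names, so the numbers are not results from the dataset.

```r
library(dplyr)

# Hypothetical toy rows; the rates are illustrative, not taken from the dataset
toy <- data.frame(
  Materal.Race.or.Ethnicity = c("Black Non-Hispanic", "Black NH", "Non-Hispanic Black",
                                "White Non-Hispanic", "White NH", "Other Hispanic"),
  Infant.Mortality.Rate = c(9.0, 7.0, 8.0, 3.0, 2.0, 4.0)
)

# Recode label variants into one group each, keeping other labels unchanged
toy_clean <- toy |>
  mutate(race_group = case_when(
    Materal.Race.or.Ethnicity %in% c("Black Non-Hispanic", "Black NH", "Non-Hispanic Black") ~ "Black",
    Materal.Race.or.Ethnicity %in% c("White Non-Hispanic", "Non-Hispanic White", "White NH") ~ "White",
    TRUE ~ Materal.Race.or.Ethnicity
  ))

# Summarize on the recoded column so the variants are averaged together
toy_summary <- toy_clean |>
  group_by(race_group) |>
  summarize(avg_rate = mean(Infant.Mortality.Rate, na.rm = TRUE)) |>
  arrange(desc(avg_rate))

print(toy_summary)
```

Applied to the real data, the same pattern would start from infant_data_clean rather than the toy frame. Note that averaging rows this way weights every row equally, regardless of the number of live births behind it.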
ggplot(infant_data, aes(x = Materal.Race.or.Ethnicity, y = Infant.Mortality.Rate)) +
  geom_boxplot(aes(fill= Materal.Race.or.Ethnicity)) +
  labs(title = "Infant Mortality Rates by Maternal Race/Ethnicity",
       x = "Maternal Race or Ethnicity",
       y = "Infant Mortality Rate (per 1,000 live births)") +
  coord_flip()
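The box plot compares groups but pools all years together. To look at change over time, one option is a line chart of the yearly average for each group. The sketch below is a minimal version of that idea on a hypothetical toy data frame: the group names, the values, and the names yearly and trend_plot are all made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two made-up groups observed over three years
toy <- data.frame(
  Year = c(2007, 2008, 2009, 2007, 2008, 2009),
  race_group = rep(c("Group A", "Group B"), each = 3),
  Infant.Mortality.Rate = c(9.8, 9.6, 9.5, 3.4, 3.3, 3.2)
)

# Average rate per year and group
yearly <- toy |>
  group_by(Year, race_group) |>
  summarize(avg_rate = mean(Infant.Mortality.Rate, na.rm = TRUE), .groups = "drop")

# One line per group, with points marking each year
trend_plot <- ggplot(yearly, aes(x = Year, y = avg_rate, color = race_group)) +
  geom_line() +
  geom_point() +
  labs(title = "Infant Mortality Rate by Year (toy data)",
       x = "Year",
       y = "Infant Mortality Rate (per 1,000 live births)",
       color = "Group")

print(trend_plot)
```

With the real data, Year and the recoded race/ethnicity column would take the place of the toy columns.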

Conclusion:

The analysis revealed clear disparities in infant mortality rates across racial and ethnic groups in the United States. Infants born to Black non-Hispanic mothers had the highest average infant mortality rate (8.81 deaths per 1,000 live births under the “Black Non-Hispanic” label), while the White non-Hispanic labels (averages between 2.2 and 3.1) and the Other Hispanic group (4.29) showed lower rates. These results highlight persistent inequalities in maternal and infant health outcomes, which may reflect broader systemic issues such as unequal access to quality healthcare, socioeconomic barriers, and the effects of institutional racism within the healthcare system.

The findings emphasize the need for targeted public health interventions and policy changes that address these disparities at their root causes. Future research could explore more detailed variables such as geographic location, socioeconomic status, prenatal care access, and hospital quality to better understand the multifaceted nature of these inequalities. Expanding the dataset to include more recent years and linking it with maternal health indicators could provide deeper insights into progress made toward health equity and where further action is needed.

References:

Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). (n.d.). Infant Mortality Data. Retrieved from https://catalog.data.gov/dataset/infant-mortality

CDC. (2024). Infant Mortality. U.S. Department of Health and Human Services. Retrieved from https://www.cdc.gov/maternal-infant-health/infant-mortality/index.html