Project 1

Author

Marie-Anne Kemajou

The dataset I will visualize comes from the CDC and describes infant deaths and how frequently they occurred between 2007 and 2016. It also includes several related variables. Maternal race/ethnicity is the race/ethnicity of the infant's mother. The infant mortality rate, neonatal mortality rate, and postneonatal mortality rate are each expressed as deaths per 1,000 live births. Neonatal deaths occur within the first month of life, and postneonatal deaths occur after the first month and up to one year of age. I plan to explore not only how infant deaths changed from 2007 to 2016, but also how infant mortality differs by maternal race/ethnicity. I also want to explore how the number of live births compares to the number of infant deaths.

Reference 1: https://www.cdc.gov
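As a small worked example of the "deaths per 1,000 live births" definition in the introduction, the sketch below computes a rate from a death count and a birth count. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the CDC dataset.

```r
# Hypothetical counts for illustration only (not taken from the CDC data)
infant_deaths <- 50
live_births <- 10000

# Rate per 1,000 live births = deaths / live births * 1000
infant_mortality_rate <- infant_deaths / live_births * 1000
infant_mortality_rate
```

With these made-up counts the rate works out to 5 deaths per 1,000 live births. The same formula would apply to neonatal or postneonatal deaths by swapping in the matching death count.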

library(readr)
infantmortality <- read_csv("/Users/marieannekemajou/Documents/Data 110/infantmortality.csv")

Rows: 60 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Maternal Race or Ethnicity
dbl (8): Year, Infant Mortality Rate, Neonatal Mortality Rate, Postneonatal ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

I added this code to read my CSV file into this Quarto document.
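The message printed by read_csv suggests setting `show_col_types = FALSE` to quiet the column specification. The sketch below shows that option on a tiny inline table so it can run without the original file; the inline data and the name `example_data` are assumptions made for illustration only.

```r
library(readr)

# Hypothetical inline data standing in for infantmortality.csv
csv_text <- "Year,Maternal Race or Ethnicity,Infant Deaths\n2007,Group A,10\n2008,Group A,9\n"

# I() tells read_csv to treat the string as literal data rather than a file path;
# show_col_types = FALSE suppresses the column specification message
example_data <- read_csv(I(csv_text), show_col_types = FALSE)
example_data
```

Column names that contain spaces, such as `Infant Deaths`, are kept as they are and need backticks when used in later code.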

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ purrr     1.0.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(gganimate)

I loaded tidyverse so that I could use ggplot2 to create my visualization. I also loaded gganimate so that I could animate it.

infantmortality3 <- infantmortality %>%
  mutate(`Maternal Race or Ethnicity` = case_when(
    `Maternal Race or Ethnicity` %in% c("Non-Hispanic Black", "Black Non-Hispanic") ~ "Black (Non-Hispanic)",
    `Maternal Race or Ethnicity` %in% c("Non-Hispanic White", "White Non-Hispanic") ~ "White (Non-Hispanic)",
    TRUE ~ `Maternal Race or Ethnicity`
  ))

Reference 2: https://app.datacamp.com

Here I used mutate() and case_when(), both from dplyr, to create a new dataset with combined categories as part of cleaning my data. The non-Hispanic Black and non-Hispanic White groups each appeared under two different labels, so I merged each pair into a single category. This made my visualization look cleaner and easier to interpret.
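To show what this recoding step does on a small scale, here is a minimal sketch on a toy table. The column name `group` and the four rows are hypothetical; only the label strings mirror the ones used in the code above.

```r
library(dplyr)

# Hypothetical toy data with the same kind of duplicate labels
toy <- tibble(group = c("Non-Hispanic Black", "Black Non-Hispanic",
                        "Non-Hispanic White", "Puerto Rican"))

recoded <- toy %>%
  mutate(group = case_when(
    group %in% c("Non-Hispanic Black", "Black Non-Hispanic") ~ "Black (Non-Hispanic)",
    group %in% c("Non-Hispanic White", "White Non-Hispanic") ~ "White (Non-Hispanic)",
    TRUE ~ group  # every other label is left unchanged
  ))

# Count rows per category to confirm the labels were merged
count(recoded, group)
```

Counting the categories before and after the recode is a quick way to check that each pair of labels collapsed into one.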

infantmortality2 <- infantmortality3 %>%
  filter(!is.na(`Infant Deaths`) & !is.na(`Number of Live Births`))

Here I used dplyr's filter() to remove rows with NA values in the two columns I needed for the time-lapse GIF.
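The sketch below illustrates this filtering step on a toy table and, as an alternative that I am adding here and that is not part of the original analysis, shows tidyr's drop_na() doing the same job. The toy values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with missing values in the two columns of interest
toy <- tibble(
  Year = c(2007, 2008, 2009),
  `Infant Deaths` = c(10, NA, 8),
  `Number of Live Births` = c(1000, 1100, NA)
)

# Approach used in this document: keep rows where both columns are present
kept_filter <- toy %>%
  filter(!is.na(`Infant Deaths`) & !is.na(`Number of Live Births`))

# Alternative: drop_na() removes rows with NA in the listed columns
kept_drop_na <- toy %>%
  drop_na(`Infant Deaths`, `Number of Live Births`)

nrow(kept_filter)
nrow(kept_drop_na)
```

Both approaches keep every column and remove only the rows where either of the two named columns is missing.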

ggplot(infantmortality2, aes(x = `Maternal Race or Ethnicity`, 
                                  y = `Infant Deaths`, 
                                  size = `Number of Live Births`, 
                                  color = `Maternal Race or Ethnicity`)) +
  geom_point(alpha = 0.9) +
  labs(title = 'Infant Deaths by Race/Ethnicity - Year: {frame_time}', 
       x = 'Maternal Race or Ethnicity',
       y = 'Infant Deaths') +
  scale_size(range = c(6, 14)) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  
  transition_time(Year) +
  ease_aes('cubic-in-out')

Reference 2 (again): https://app.datacamp.com

Reference 3: “How to create a time lapse bubble chart in rstudio” (Google AI)

Reference 4: https://www.youtube.com/watch?v=z9J78rxhcrQ

I used ggplot here to create the bubble chart. I used geom_point to draw the bubbles and scale_size to limit the range of bubble sizes. I also rotated the x-axis labels with axis.text.x so that they are readable, and I used transition_time to animate the chart through the years. Finally, I used ease_aes to set how the bubbles move from one year to the next.

I cleaned this dataset by removing rows with NA values in the columns I needed for my bubble chart. I created another dataset (infantmortality2) that keeps only the rows where those two columns are not missing, using !is.na. I also cleaned it by combining labels that referred to the same category: non-Hispanic White and non-Hispanic Black each appeared under two labels, so I merged each pair.

The visualization that I created represents infant mortality by race/ethnicity, so you can see how the bubble for each group moves through the years 2007-2016. The purpose is to compare and contrast the patterns for each racial/ethnic group. I also wanted it to be apparent whether any differences between groups changed over the years and whether the overall number of infant deaths decreased.

Puerto Rican mothers have a fairly small number of live births, yet their infant deaths appear comparable to those of the Asian and Pacific Islander group, which has more live births. So relative to live births, infant deaths appear higher for the Puerto Rican group. I also find it interesting that the White non-Hispanic category has the most live births but not the most infant deaths, which is what I would expect if deaths were simply proportional to births. The Black non-Hispanic category has the most infant deaths while having fewer live births than “other Hispanic” and “white non-Hispanic”.

There were a few things I wanted to figure out but could not. I wanted to change the color scheme from the default, but every time I tried to pick my colors manually using scale_color_manual, it completely changed my visual and the bubbles overlapped. I could not figure out how to fix that while still having a visual that makes sense.
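As a possible starting point for the color question, the sketch below passes a named vector to scale_color_manual on a static toy plot. I cannot tell from this document what caused the overlap in the earlier attempts, so this is only one way the call can be written, not a confirmed fix. The group names, counts, and hex colors are all hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for one year of the real dataset
toy <- data.frame(
  group = c("Group A", "Group B", "Group C"),
  deaths = c(10, 20, 15),
  births = c(1000, 3000, 2000)
)

# Names must match the values of the color variable exactly
group_colors <- c("Group A" = "#1b9e77",
                  "Group B" = "#d95f02",
                  "Group C" = "#7570b3")

p <- ggplot(toy, aes(x = group, y = deaths, size = births, color = group)) +
  geom_point(alpha = 0.9) +
  scale_size(range = c(6, 14)) +
  scale_color_manual(values = group_colors) +  # added on top of the other layers
  theme_light()
p
```

In this sketch the color scale is added alongside the existing layers, and the x, y, and size mappings are left untouched, so only the bubble colors change.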
I also wanted to add a pause feature, but after realizing that combining similar categories allowed all of my bubbles to show, I don’t think it is as crucial, although it would have been nice.
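One option that might partly cover the pause idea is the `end_pause` argument of gganimate's animate(), which repeats the last frame before the animation loops. It is not an interactive pause button. The sketch below uses hypothetical toy data and also rounds `frame_time` in the title, on the assumption that whole years are the desired label; the frame counts are arbitrary choices.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: two groups over three years
toy <- data.frame(
  Year = rep(2007:2009, each = 2),
  group = rep(c("Group A", "Group B"), times = 3),
  deaths = c(10, 20, 9, 19, 8, 18)
)

anim <- ggplot(toy, aes(x = group, y = deaths, color = group)) +
  geom_point(size = 8) +
  labs(title = "Year: {round(frame_time)}") +  # show whole years in the title
  transition_time(Year) +
  ease_aes("cubic-in-out")

# Render only in an interactive session so that sourcing this sketch
# as a script does not start a render
if (interactive()) {
  # end_pause = 10 holds the final frame for 10 extra frames before looping
  gif <- animate(anim, nframes = 30, fps = 5, end_pause = 10)
}
```

Inside a Quarto chunk the animate() call would be made directly, without the interactive() guard, and there is a matching `start_pause` argument for holding the first frame.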

All References

Reference 1: https://www.cdc.gov

Reference 2: https://app.datacamp.com

Reference 3: “How to create a time lapse bubble chart in rstudio” (Google AI)

Reference 4: https://www.youtube.com/watch?v=z9J78rxhcrQ