For this project, we chose a visualization on notable Leap Day (February 29) births and deaths, sourced from Wikipedia. The project consists of three parts:
ggplot2.The original graph depicts notable Leap Day births (blue bars above the x-axis) and deaths (red bars below the x-axis) spanning from the 400s CE to the present.
library(tidyverse)
library(here)
library(showtext)
library(ggimage)
library(fontawesome)
library(ggtext)
library(scales)
library(see)
library(patchwork)
library(colorspace)
library(glue)
tuesdata <- tidytuesdayR::tt_load('2024-02-27')
births <- tuesdata$births
deaths <- tuesdata$deaths
rm(tuesdata)
# Theme
theme_set(theme_minimal(base_size = 15))
theme_update(
plot.title.position = "plot",
axis.line.x = element_line(linewidth = 0.2, colour = "gray50"),
axis.line.y = element_line(linewidth = 0.2, colour = "gray50"),
panel.grid = element_blank()
)
# Data Summary --------------------------------------------------------
births_count <- births %>%
count(year_birth) %>%
pivot_longer(!n, names_to = "event", values_to = "year") %>%
select(year, event, n)
deaths_count <- deaths %>%
count(year_death) %>%
summarise(year_death, n = n * -1) %>%
pivot_longer(!n, names_to = "event", values_to = "year") %>%
select(year, event, n)
df_final <- full_join(births_count, deaths_count, by = NULL)
# Data Visualization --------------------------------------------------
# Basic colors
birth_color <- "blue"
death_color <- "red"
line_color <- "gray"
line_color_R <- "blue"
bg_plot <- "white"
plot <- df_final %>%
ggplot(aes(year, n, fill = event)) +
geom_col(alpha = 0.9) +
# Horizontal axis line
annotate(geom = "segment",
x = min(df_final$year) - 100,
xend = max(df_final$year) + 100,
y = 0, yend = 0,
color = line_color) +
annotate(geom = "segment",
x = 2000, xend = 2000,
y = Inf, yend = 0,
color = line_color_R) +
coord_cartesian(expand = FALSE, clip = "off") +
scale_x_continuous(breaks = unique(df_final$year)) +
scale_fill_manual(name = "Event",
values = c(birth_color, death_color),
labels = c("Births", "Deaths")) +
scale_color_manual(values = c(birth_color, death_color)) +
labs(title = "Leap Day Births and Deaths") +
theme(axis.text.x = element_blank(),
axis.line.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.line.x = element_blank(),
axis.title.x = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
plot.caption = element_text(family = "sans",
size = 6,
face = "bold",
color = "gray50",
margin = margin(1, 0, 0, 0, "cm")),
plot.margin = margin(0.5, 1, 1, 0.5, "cm"))
print(plot)
In our first attempt at redesigning the original plot, we created a visualization that combined a scatter plot and a line chart. The scatter points were displayed in two colors (red and cyan) to represent different datasets or categories, while a blue line was added to suggest a potential trend or model fit. However, this version faced significant readability challenges.
One of the key issues was the use of time as the x-axis. The timeline spanned a broad range but lacked clear labels, making it difficult to follow the progression of data over time. Additionally, the scatter points overlapped heavily in certain areas, which further obscured individual data values. The varying point sizes added unnecessary complexity without a clear explanation of their purpose.
The inclusion of the blue line also created confusion, as it was neither labeled nor described, leaving its role in the plot ambiguous. Furthermore, the axes lacked proper labels, and the legend provided minimal information, failing to adequately explain the categories or variables represented.
These issues, coupled with the visual clutter caused by overlapping points and the combination of two plot types, made the graph difficult to interpret. As a result, this initial redesign did not effectively communicate the data or its insights.
births_per_year <- table(births$year_birth)
deaths_per_year <- table(deaths$year_death)
# create data frames that have the variables categorized as numbers
births_df <- data.frame(year = as.numeric(names(births_per_year)), births = as.numeric(births_per_year))
deaths_df <- data.frame(year = as.numeric(names(deaths_per_year)), deaths = as.numeric(deaths_per_year))
# combine them, ensuring the years line up and fixing the issue of the number of years being different by including all the possible years
leap_year_counts <- merge(births_df, deaths_df, by = "year", all = TRUE)
# add a new variable to the datafram: ratio
leap_year_counts$births_deaths_ratio <- leap_year_counts$births / leap_year_counts$deaths
# Create the plot with explicit smoothing formula
my_plot <- ggplot(
leap_year_counts |> filter(!is.na(births_deaths_ratio))
) +
# births points
geom_point(aes(x = year, y = births, color = "births", size = births),
alpha = 0.5) +
# deaths points
geom_point(aes(x = year, y = deaths, color = "deaths", size = deaths),
alpha = 0.5) +
# remove legend
theme(legend.position = "none") +
# add trend line for the births/deaths ratio with explicit formula
geom_smooth(aes(x = year, y = births_deaths_ratio),
formula = y ~ x, method = "loess", se = FALSE) +
# direct labeling for the ratio at the maximum year where ratio exists
geom_text(
data = leap_year_counts |> filter(!is.na(births_deaths_ratio)) |>
filter(year == max(year)),
aes(x = year, y = births_deaths_ratio),
label = "Births to Deaths Ratio",
vjust = -2, hjust = -0.05, size = 2.5, color = "blue"
) +
# labels and titles
labs(
title = "Leap Day Births and Deaths",
subtitle = "Births and deaths are plotted by year, with a trend line for their ratio",
x = "Year",
y = "Count",
color = "Event Type",
size = "Magnitude"
) +
# avoid plot trimming
coord_cartesian(clip = "off") +
# adjust theme
theme(
plot.title = element_text(size = 22, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 16, hjust = 0.5),
axis.title.x = element_text(size = 16, face = "bold"),
axis.title.y = element_text(size = 16, face = "bold"),
plot.margin = margin(0.1, 0.9, 0.1, 0.1, "in")
)
print(my_plot)
The final refined plot addresses the shortcomings of previous attempts by thoughtfully applying multiple layers of the grammar of graphics. The data layer is focused on Leap Day births and deaths from the 20th and 21st centuries, with a clear acknowledgment of the bias toward recent data.
The plot’s aesthetics incorporate distinct colors and a clean half-violin-half-dotplot design. This approach effectively displays both the distribution density and individual data points, eliminating visual clutter. Geometrically, births and deaths are separated into mirrored panels, ensuring clarity and facilitating easy comparisons.
Additionally, the minimalist theme, clear annotations, and a linear time scale significantly enhance readability. These design choices make the visualization accessible, visually appealing, and faithful to the underlying data, successfully addressing the issues of previous versions.
df <- bind_rows(
births |>
rename(
year = year_birth,
event = person
) |>
mutate(type = "Births") |>
select(year, event, type),
deaths |>
rename(
year = year_death,
event = person
) |>
mutate(type = "Deaths") |>
select(year, event, type)
) |>
mutate(
type = fct(type, levels = c("Births", "Deaths"))
)
# create a new dataframe for years after 1900 to use in graph
filtered_df <- df[df$year >= 1900, ]
# make new variables in `filtered_df` for birth and death counts each year
filtered_df <- filtered_df |>
group_by(year) |>
mutate(births_per_year = sum(type == "Births"),
deaths_per_year = sum(type == "Deaths"))
# Load fonts
font_add_google("Rampart One",
family = "title_font"
) # Font for titles
font_add_google("Yanone Kaffeesatz",
family = "caption_font"
) # Font for the caption
font_add_google("Oregano",
family = "body_font"
) # Font for plot text
showtext_auto()
# Choosing one Palette for the Visualization
mypal <- c(
"#A1D6E2", # Light Blue for Births
"#4B4B4B" # Dark Grey for Deaths
)
# Define colors
bg_col <- "white"
text_col <- mypal[2]
text_hil <- mypal[1]
# Define Text Size
ts <- unit(20, units = "cm") # Text Size
# Add text to plot--------------------------------------------------------------
plot_title <- "Leap Day"
subtitle_text <- "Leap Day (February 29) births and deaths from the 20th and 21st centuries, sourced from Wikipedia. The data shows a bias toward more recent events, visualized below using a half-violin, half-dot plot. Each dot represents an individual birth or death."
plot_subtitle <- str_wrap(subtitle_text, width = 90)
# create a new dataframe for years after 1900 to use in graph
filtered_df <- df[df$year >= 1900, ]
# make new variables in `filtered_df` for birth and death counts each year
filtered_df <- filtered_df |>
group_by(year) |>
mutate(births_per_year = sum(type == "Births"),
deaths_per_year = sum(type == "Deaths"))
# position for label
birth_labels <- filtered_df |>
filter(type == "Births") |>
group_by(year) |>
summarise(
label_b = 0.99 - 0.07 * max(births_per_year),
.groups = "drop"
)
death_labels <- filtered_df |>
filter(type == "Deaths") |>
group_by(year) |>
summarise(
label_d = 1.94 - 0.07 * max(deaths_per_year),
.groups = "drop"
)
filtered_df <- filtered_df |>
left_join(
birth_labels |> select(year, label_b),
by = "year"
) |>
left_join(
death_labels |> select(year, label_d),
by = "year"
)
testg <- filtered_df |>
ggplot(aes(
y = year,
x = type,
fill = type
)) +
geom_point() +
geom_violindot(
size_dots = 8,
trim = FALSE,
scale = "area",
show.legend = FALSE,
binwidth = 0.5,
width = 1
) +
geom_text(
data = filtered_df |> filter(type == "Births" & !is.na(label_b)),
aes(x = label_b, y = year, label = births_per_year),
hjust = 2, size = 30, color = "#A1D6E2"
) +
geom_text(
data = filtered_df |> filter(type == "Deaths" & !is.na(label_d)),
aes(x = label_d, y = year, label = deaths_per_year),
hjust = 0, size = 30, color = "#4B4B4B"
) +
scale_y_continuous(
limits = c(1880, 2020),
breaks = seq(1900, 2020, 20),
expand = expansion(c(0, 0.02))
) +
scale_x_discrete(
expand = expansion(0),
position = "top"
) +
scale_fill_manual(values = mypal) +
labs(
title = plot_title,
subtitle = plot_subtitle,
x = NULL, y = NULL
) +
coord_cartesian(clip = "off") +
theme_minimal() +
theme(
plot.margin = margin(2, 2, 2, 2, unit = "cm"),
legend.position = "none",
axis.line.x = element_blank(),
axis.line.y = element_line(
linewidth = 1,
linetype = 2
),
axis.ticks.y = element_blank(),
axis.text.x = element_text(
size = 100,
colour = "black",
face = "bold",
hjust = 0.5,
vjust = 0.7
),
axis.text.y = element_text(
size = 5 * ts,
colour = text_col,
hjust = 0.5,
family = "caption_font",
margin = margin(0, 0.2, 0, 1, unit = "cm")
),
plot.title = element_text(
size = 24 * ts,
colour = text_hil,
margin = margin(2, 0, 1, 0, unit = "cm"),
family = "title_font",
hjust = 0.5
),
plot.subtitle = element_text(
size = 5.7 * ts,
colour = text_col,
margin = margin(1, 0, 1, 0, unit = "cm"),
family = "body_font",
lineheight = 0.3,
hjust = 0.5
),
plot.caption = element_textbox(
size = 4 * ts,
colour = text_hil,
margin = margin(1, 0, 1, 0, unit = "cm"),
family = "caption_font",
hjust = 0.5
),
panel.grid = element_blank(),
plot.title.position = "plot"
)
ggsave("tidy_leap_day.png", plot = testg, width = 40, height = 50, units = "cm", bg = "white")