Overview

For this project, we chose a visualization of notable Leap Day (February 29) births and deaths, sourced from Wikipedia. The project consists of three parts:

  1. Replication: Reproducing the original graph as closely as possible using ggplot2.
  2. Redesign: Applying principles of data visualization to improve clarity, aesthetics, and accessibility.
  3. Written Summary: Evaluating the original graph and explaining the rationale behind our design choices.

Original Graph

The original graph depicts notable Leap Day births (blue bars above the x-axis) and deaths (red bars below the x-axis) spanning from the 400s CE to the present.
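The mirrored layout of the original (births stacked above the axis, deaths hanging below it) can be sketched in ggplot2 as a diverging bar chart. This is a minimal sketch, not the project's replication code. It assumes a hypothetical data frame `events` with one row per notable person and two columns, `year` and `type` (`"birth"` or `"death"`). The helper names `count_leap_day_events()` and `plot_leap_day_bars()` are also illustrative. Death counts are negated so that `geom_col()` draws them below zero, and the per-person name labels of the original are omitted.

```r
library(ggplot2)

# Hypothetical input (column names are assumptions, not from the project):
#   year : year of the notable Leap Day birth or death
#   type : "birth" or "death"

# Count events per year and type; deaths are negated so they plot below zero.
count_leap_day_events <- function(events) {
  counts <- aggregate(
    list(n = rep(1L, nrow(events))),
    by = list(year = events$year, type = events$type),
    FUN = sum
  )
  counts$value <- ifelse(counts$type == "death", -counts$n, counts$n)
  counts
}

# Births as blue bars above the x-axis, deaths as red bars below it.
plot_leap_day_bars <- function(events) {
  ggplot(count_leap_day_events(events), aes(x = year, y = value, fill = type)) +
    geom_col() +
    geom_hline(yintercept = 0) +
    scale_fill_manual(values = c(birth = "blue", death = "red")) +
    labs(x = "Year", y = "Count (deaths below zero)", fill = NULL)
}
```

Calling `plot_leap_day_bars(events)` on the scraped data would return the ggplot object; keeping the counting step in its own function makes it easy to check the totals separately from the drawing.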

Key Issues Identified:

  • Overcrowded Labels: Recent centuries are cluttered with overlapping labels.
  • Ambiguous Scale: No clear y-axis or gridlines for reference.
  • Distracting Typography: Oversized, bold headings draw attention away from the data.
  • Irrelevant Elements: Side notes unrelated to the plot clutter the figure.
  • Small Key Text: Totals for “births” and “deaths” are difficult to read.

Changes Made

In our first attempt at redesigning the original plot, we combined a scatter plot and a line chart. The scatter points were drawn in two colors (red and cyan) to represent different datasets or categories, and a blue line was added to suggest a possible trend or model fit. However, this version had significant readability problems.

One key issue was the time axis: the x-axis spanned a broad range of years but lacked clear labels, making it difficult to follow how the data changed over time. In addition, the scatter points overlapped heavily in some areas, obscuring individual values. Point sizes also varied without any stated meaning, which added complexity without adding information.

The blue line also caused confusion: it was neither labeled nor described, so its role in the plot was ambiguous. Furthermore, the axes lacked proper labels, and the legend gave too little information to explain the categories or variables shown.

Together with the clutter from overlapping points and the mix of two plot types, these issues made the graph hard to interpret. As a result, this initial redesign did not communicate the data or its insights effectively.

Final Refined Plot

The final refined plot addresses the shortcomings of the earlier attempts by deliberately applying several layers of the grammar of graphics. The data layer is restricted to Leap Day births and deaths from the 20th and 21st centuries, the period in which the original graph was most crowded, with a clear acknowledgment that the data are biased toward recent events.

The plot uses distinct colors and a clean half-violin, half-dotplot design. This design shows both the density of events over time and the individual data points while greatly reducing visual clutter. Geometrically, births and deaths are separated into mirrored panels, which keeps them distinct and makes them easy to compare.

Finally, the minimalist theme, clear annotations, and linear time scale improve readability. Together, these choices make the visualization accessible and visually appealing while remaining faithful to the underlying data, addressing the issues of the previous versions.
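The layers described above can be approximated in base ggplot2 as follows. This is a hedged sketch, not the project's final code. It assumes the same hypothetical `events` data frame with `year` and `type` columns, and the helper names `filter_recent()` and `plot_leap_day_refined()`, the 1900 cutoff parameter, the bin width, and the colors are illustrative choices. Along a horizontal time axis, a scaled density curve above the baseline plays the role of the half-violin, and `geom_dotplot(stackdir = "down")` stacks individual events below it as the half-dotplot. Births and deaths go in separate panels via `facet_wrap()`; the exact mirroring of the project's panels is not reproduced. Extension packages such as gghalves provide dedicated half-violin geoms if a closer match is needed.

```r
library(ggplot2)

# Hypothetical input (column names are assumptions, not from the project):
#   year : year of the notable Leap Day birth or death
#   type : "birth" or "death"

# Data layer: keep only the 20th and 21st centuries (year 1900 onward).
filter_recent <- function(events, from = 1900) {
  events[events$year >= from, , drop = FALSE]
}

plot_leap_day_refined <- function(events) {
  recent <- filter_recent(events)
  ggplot(recent, aes(x = year, fill = type)) +
    # "Half-violin": density scaled to a maximum of 1, drawn above y = 0
    geom_density(aes(y = after_stat(scaled)), alpha = 0.5, colour = NA) +
    # "Half-dotplot": one dot per event, stacked downward from y = 0
    geom_dotplot(binwidth = 2, stackdir = "down", dotsize = 0.8) +
    # Separate panels for births and deaths
    facet_wrap(~ type, ncol = 1) +
    # Illustrative colors; the panel titles make a legend redundant
    scale_fill_manual(values = c(birth = "#1b9e77", death = "#d95f02"),
                      guide = "none") +
    # The y-axis carries no readable quantity here, so hide it
    scale_y_continuous(NULL, breaks = NULL) +
    expand_limits(y = -1) +
    # Default continuous x scale keeps time linear
    labs(x = "Year", title = "Notable Leap Day births and deaths since 1900") +
    theme_minimal()
}
```

Keeping the filter in its own function makes the data-layer choice explicit and easy to change if older centuries are brought back in.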