1 Original Plot

library(fivethirtyeight)
library(ggplot2)  # provides ggplot() and the geoms/themes used in Sections 2 and 3
pulitzer <- fivethirtyeight::pulitzer

Nate Silver of FiveThirtyEight compiled data on 50 U.S. newspapers, available as the pulitzer dataset in the fivethirtyeight package, and wrote an article on what the data reveal about the newspapers’ development over two decades. The original graph is a scatterplot of the relationship between the number of Pulitzer winners and finalists at U.S. newspapers from 1990 to 2014 and the corresponding change in print and digital circulation from 2004 to 2013. The latter variable is shown as percent change on the y-axis, with positive numbers signifying an increase and negative numbers a decrease. The graph strikes a visual balance: the black x-axis runs through the middle of the panel, dividing the space so that the reader can easily place points above or below it. Gridlines other than the axes are muted to light gray in the background, which reduces visual strain while still providing spatial references for locating the points.

Notably, the graph also presents a linear regression line (in dark gray) showing a modest positive correlation between the variables, which seems to undercut the author’s argument that Pulitzer Prizes do not actually help newspapers retain their readers. Another shortcoming is that the points near the y-axis between y = -50 and y = -20 overlap heavily, so it is difficult to discern how many data points there are. Overall, the minimal theme and striking red dots succinctly capture the story of the data, which is complemented by an informative title, subtitle, and x-axis label.

2 Replicated Plot

subtitle_text <- strwrap("Change in print and digital circulation (2004-2013) vs. Pulitzer winners and finalists (1990-2014) among the top 50 newspapers", width = 70)
subtitle_text <- paste(subtitle_text, collapse = "\n")

ggplot(data = pulitzer, aes(x = num_finals1990_2014, y = pctchg_circ)) +
  scale_x_continuous(limits = c(-5, 130), breaks = seq(0, 125, 25)) +
  scale_y_continuous(limits = c(-110, 110), breaks = seq(-100, 100, 25), 
                     labels = function(x) {paste0(ifelse(x > 0, "+", ""), x, ifelse(x == 100, "%", ""))}) +
  geom_hline(yintercept = 0, size = 0.3, color = "black") +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        panel.grid.major = element_line(color = "gray80", size = 0.3),
        panel.grid.minor = element_blank(),
        axis.text = element_text(family = "Roboto", size = 10, color = "black"),
        axis.title.x = element_text(margin = margin(t = 7), family = "Arial", size = 9, color = "gray20"),
        plot.title = element_text(margin = margin(b = 3, l = -37), hjust = 0, family = "Lexend", size = 14, face = "bold", color = "gray15"),
        plot.subtitle = element_text(margin = margin(b = 10, l = -37), hjust = 0, family = "Arial", size = 11, color = "gray20"),
        plot.background = element_rect(fill = "gray95", color = NA),
        plot.margin = margin(t = 10, r = 25, b = 10, l = -2)) +
  geom_vline(xintercept = 0, size = 0.3, color = "black") +
  geom_smooth(method = "lm", se = FALSE, size = 1, color = "darkgray") +
  geom_point(shape = 21, size = 2.5, color = "white", fill = "red") +
  labs(x = "Pulitzer winners and finalists", y = "", title = "Pulitzer Prizes Don't Lure Readers", subtitle = subtitle_text) +
  # Label four newspapers; every other row gets an empty string
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 117, "The New York Times", "")), vjust = 0.5, hjust = 1.07, family = "Arial", size = 3, color = "gray20") +
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 100, "The Washington Post", "")), vjust = 0.5, hjust = -0.08, family = "Arial", size = 3, color = "gray20") +
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 85, "Los Angeles Times", "")), vjust = 0.5, hjust = 1.08, family = "Arial", size = 3, color = "gray20") +
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 5 & circ2004 == 340007, "Rocky Mountain News", "")), vjust = -0.4, hjust = -0.32, family = "Arial", size = 3, color = "gray20") +
  coord_fixed(ratio = 0.4, clip = "off") +
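  # annotate() draws the arrow once; geom_curve() with constant aes() values
  # would redraw the same curve for every row of the data
  annotate("curve",
    x = 17, y = -93, xend = 5, yend = -93,
    arrow = arrow(length = unit(0.2, "cm"), type = "closed", angle = 40),
    curvature = 0.4,
    size = 0.3, color = "black")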
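
The y-axis labelling function passed to scale_y_continuous() in the plot code can be checked on its own. The sketch below wraps the same expression in a function; the name pct_label is hypothetical and used only for illustration. It adds a "+" to positive values and a "%" only to the top break (100).

```r
# Same expression as the `labels =` argument of scale_y_continuous();
# `pct_label` is an illustrative name, not part of the original code.
pct_label <- function(x) {
  paste0(ifelse(x > 0, "+", ""), x, ifelse(x == 100, "%", ""))
}

pct_label(c(-100, -25, 0, 25, 100))
# "-100" "-25" "0" "+25" "+100%"
```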

3 Improved Plot

subtitle_text <- strwrap("Change in print and digital circulation (2004-2013) vs. Pulitzer winners and finalists (1990-2014) among the top 50 newspapers", width = 70)
subtitle_text <- paste(subtitle_text, collapse = "\n")

ggplot(data = pulitzer, aes(x = num_finals1990_2014, y = pctchg_circ)) +
  scale_x_continuous(limits = c(-5, 130), breaks = seq(0, 125, 25)) +
  scale_y_continuous(limits = c(-110, 110), breaks = seq(-100, 100, 25), 
                     labels = function(x) {paste0(ifelse(x > 0, "+", ""), x, ifelse(x == 100, "%", ""))}) +
  geom_hline(yintercept = 0, size = 0.5, linetype = "dashed", color = "black") +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        panel.grid.major = element_line(color = "gray80", size = 0.3),
        panel.grid.minor = element_blank(),
        axis.text = element_text(family = "Roboto", size = 10, color = "black"),
        axis.title.x = element_text(margin = margin(t = 7), family = "Arial", size = 9, color = "gray20"),
        axis.title.y = element_text(family = "Arial", size = 9, color = "gray20"),
        plot.title = element_text(margin = margin(b = 3, l = -37), hjust = 0, family = "Lexend", size = 14, face = "bold", color = "gray15"),
        plot.subtitle = element_text(margin = margin(b = 10, l = -37), hjust = 0, family = "Arial", size = 11, color = "gray20"),
        plot.background = element_rect(fill = "gray95", color = NA),
        plot.margin = margin(t = 10, r = 145, b = 10, l = 8),
        legend.text = element_text(family = "Roboto", size = 10),
        legend.position = c(1, 0.9),
        legend.justification = c(0, 0.5)) +
  geom_vline(xintercept = 0, size = 0.3, color = "black") +
  geom_point(aes(fill = pctchg_circ < 0), shape = 21, size = 2, color = "white", alpha = 0.5) +
  scale_fill_manual(
    values = c("FALSE" = "red", "TRUE" = "blue"),   
    labels = c("Increase in Circulation", "Decline in Circulation"),                      
    name = ""                
  ) +
  labs(x = "Number of Pulitzer winners and finalists", y = "Change in print and digital circulation", title = "Pulitzer Prizes Don't Lure Readers", subtitle = subtitle_text) +
  # Label four newspapers; the Los Angeles Times label now sits directly above its point
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 117, "The New York Times", "")), vjust = 0.5, hjust = 1.07, family = "Arial", size = 3, color = "gray20") +
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 100, "The Washington Post", "")), vjust = 0.5, hjust = -0.08, family = "Arial", size = 3, color = "gray20") +
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 85, "Los Angeles Times", "")), vjust = -1.2, hjust = 0.5, family = "Arial", size = 3, color = "gray20") +
  geom_text(
    aes(label = ifelse(num_finals1990_2014 == 5 & circ2004 == 340007, "Rocky Mountain News", "")), vjust = -0.4, hjust = -0.28, family = "Arial", size = 3, color = "gray20") +
  coord_fixed(ratio = 0.4, clip = "off") +
  annotate("curve",
    x = 17, y = -93, xend = 5, yend = -93,
    arrow = arrow(length = unit(0.2, "cm"), type = "closed", angle = 40),  
    curvature = 0.4,                          
    size = 0.3, color = "black")

To address the major drawbacks of the original graph, I alter the aesthetics of the points and add clarifying details. Depending on their position relative to the x-axis, I color-code the points red or blue to offer an immediate contrast between newspapers whose circulation increased between 2004 and 2013 and those whose circulation declined. (I pick red and blue because they are color-blindness friendly.) A legend to the right of the plot explains what the colors mean. The colors also emphasize the differing numbers of points on each side of the x-axis, demonstrating that newspapers whose print and digital circulation increased over the decade are in the minority. Most newspapers suffered a blow to circulation regardless of their number of Pulitzer winners and finalists. Drawing the x-axis as a dashed line provides another visual aid that helps the reader differentiate between positive and negative changes in circulation.
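
The claim that increases are in the minority can be checked with a simple count. The following is a minimal sketch: the helper name count_by_direction is hypothetical, and the toy vector is made up for illustration and is not taken from the pulitzer data. It uses the same grouping as the plot (pctchg_circ < 0), so a change of exactly zero falls into the "increase" group.

```r
# Count observations by direction of change, mirroring the plot's
# `pctchg_circ < 0` grouping. `count_by_direction` is an illustrative name.
count_by_direction <- function(pct_change) {
  declined <- pct_change < 0
  c(increase_or_flat = sum(!declined, na.rm = TRUE),
    decline          = sum(declined, na.rm = TRUE))
}

# Toy values for illustration only (NOT the real circulation figures)
toy_change <- c(-40, -25, -10, 5, 60)
count_by_direction(toy_change)

# With the real data (requires the fivethirtyeight package):
# count_by_direction(fivethirtyeight::pulitzer$pctchg_circ)
```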

On the original graph, many points overlap near the y-axis in the region -50 < y < -20. To reduce cognitive load and better reflect the clustering, I decrease the size of all points and make them semi-transparent (alpha = 0.5), so that areas where several points overlap appear darker. Moreover, it is hard to tell which nearby point the “Los Angeles Times” label refers to, so I move the label directly above its data point. I also add a y-axis label to make the plot easier to read.
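
To see why semi-transparent points darken where they overlap, consider an idealized case: n identical points stacked exactly on top of each other, each drawn with the same alpha, assuming the graphics device uses standard "over" compositing. The function name stacked_opacity is hypothetical, and the sketch ignores the white point borders used in the plot.

```r
# Combined opacity of n fully overlapping, same-colored points, each drawn
# with opacity `alpha` (idealized "over" compositing; an assumption here).
stacked_opacity <- function(alpha, n) {
  1 - (1 - alpha)^n
}

stacked_opacity(alpha = 0.5, n = 1:4)
# 0.5, 0.75, 0.875, 0.9375
```

Under this assumption, a single point at alpha = 0.5 is half opaque, while a cluster of three or four overlapping points is close to fully opaque, which is what makes dense regions stand out.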

As mentioned previously, the original linear regression line conveys a misleading message that runs counter to the argument the author wants to make. Therefore, I simply remove the line to minimize the undue influence that the “The New York Times” point exerts on the visual perception of the overall trend in the newspaper industry.
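
One way to gauge how much a single point drives a fitted line is to compare the least-squares slope with and without that point. The sketch below is illustrative only: the helper slope_without is a hypothetical name, and the toy data frame is made up (it is not the pulitzer data) so that one far-right point creates a slope that otherwise would not exist.

```r
# Compare the OLS slope of y on x using all rows versus dropping one row.
# `slope_without` is an illustrative helper, not part of the original code.
slope_without <- function(df, drop_row) {
  full    <- coef(lm(y ~ x, data = df))[["x"]]
  reduced <- coef(lm(y ~ x, data = df[-drop_row, ]))[["x"]]
  c(all_points = full, one_dropped = reduced)
}

# Toy data for illustration only (NOT the real values): five points with
# no trend, plus one extreme point far to the right.
toy <- data.frame(x = c(1, 2, 3, 4, 5, 50),
                  y = c(-30, -20, -30, -20, -30, 60))
slope_without(toy, drop_row = 6)

# With the real data, the point labelled "The New York Times" is the row
# where num_finals1990_2014 == 117 (requires the fivethirtyeight package):
# p <- fivethirtyeight::pulitzer
# slope_without(data.frame(x = p$num_finals1990_2014, y = p$pctchg_circ),
#               drop_row = which(p$num_finals1990_2014 == 117))
```

In the toy data the slope is clearly positive with all six points and essentially zero once the extreme point is dropped; whether the real data behave similarly would need to be checked by running the commented call.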

Portfolio Project 1 Write-up

Vivian Du
