Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Weiss (2020).


Objective

The original data visualisation was aimed at the general public and compared the number of COVID-19 deaths with the number of non-COVID-19 deaths. Its purpose was to highlight that COVID-19 is one of the leading causes of death in the United States.

The visualisation chosen had the following three main issues:

  • There were no numbers in the visualisation to show the number of cases for each bar. There were also no grid lines, which makes it hard to read the number of cases from the bar height.
  • The visualisation plotted the bars side by side. Stacking them would be more informative, because the height of each stacked bar shows the total number of cases for that age group. Showing the percentage of COVID and non-COVID deaths within each age group would also tell the audience the proportion of each.
  • The visualisation does not include data for COVID-19 comorbidity deaths, which are deaths attributed to the presence of other diseases (e.g., hypertension, diabetes) together with COVID-19. As the visualisation uses only the data for deaths with a COVID-19 diagnosis alone, adding the comorbidity deaths would inform the audience that there are also deaths in which COVID-19 co-occurs with other diseases.
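The percentage labels proposed in the second issue can be illustrated with a short base R sketch. The plotting code reads a `ratio` column from the csv file, but how that column was computed is not shown, so the calculation below is an assumption: each cause's share of all deaths within its age group. The data frame `toy` and its counts are invented for illustration and are not the NCHS figures.

```r
# Toy counts, invented purely for illustration (not the NCHS figures)
toy <- data.frame(
  age_group    = rep(c("45-54", "55-64"), each = 3),
  death_causes = rep(c("Non-COVID Deaths",
                       "COVID-19 Comorbidities Deaths",
                       "COVID-19 Deaths"), times = 2),
  cases        = c(900, 80, 20, 1500, 400, 100)
)

# Total deaths within each age group, repeated on every row of that group
group_total <- ave(toy$cases, toy$age_group, FUN = sum)

# Assumed definition of "ratio": share of each cause within its age group,
# expressed as a percentage and rounded to one decimal place
toy$ratio <- round(100 * toy$cases / group_total, 1)

print(toy)
```

With this definition the percentages within each age group sum to 100, which is what a reader would expect from labels placed on the segments of one stacked bar.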

Reference

Code

The following code was used to fix the issues identified in the original.

# Loading relevant packages
library(readr)
library(ggplot2)
library(ggrepel)

# Data cleaning and wrangling was done to combine two datasets (National Center for Health Statistics, 2020a, 2020b) and create the current csv file
covid <- read_csv("data.csv")

# Storing the colorblind-friendly palette that will be used for the plot
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Setting the level order of the factor variable "death_causes"
# (the labels are identical to the levels, so only levels need to be given)
covid$death_causes <- factor(covid$death_causes,
                             levels = c("Non-COVID Deaths", "COVID-19 Comorbidities Deaths", "COVID-19 Deaths"))

# Creating the plot
p1 <- ggplot(data = covid, aes(fill = death_causes, y = cases, x = age_group)) +
  geom_col(position = "stack") +  # geom_col() is equivalent to geom_bar(stat = "identity")
  scale_fill_manual(values = cbPalette) + 
  labs(title = "COVID and Comorbidities vs Non-COVID Deaths", 
       subtitle = "Feb 1 - Sep 6, 2020\nUnited States",
       y = "Number of Cases", 
       x = "Age Groups") +
  scale_y_continuous(labels = scales::comma) + 
  geom_text_repel(data = covid, 
                  aes(x = age_group, y = cases, label = paste0(ratio,"%")),
                  size = 3, 
                  vjust = 0.7, 
                  position = position_stack()) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, lineheight = 1.2),
        legend.position = "bottom", 
        legend.direction = "horizontal", 
        legend.title = element_blank())
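The `factor()` call in the code above fixes the order of the three death categories. A minimal base R sketch of why that matters is given here: without explicit `levels`, R sorts the categories alphabetically, and ggplot2 follows the factor level order when it arranges the legend entries and the stacked segments. The vector `causes` is a hypothetical stand-in for the `death_causes` column.

```r
# Hypothetical stand-in for the death_causes column
causes <- c("Non-COVID Deaths", "COVID-19 Deaths",
            "COVID-19 Comorbidities Deaths", "Non-COVID Deaths")

# Default behaviour: levels are the sorted unique values
default_order <- factor(causes)

# Explicit levels, in the same order as the plotting code uses
explicit_order <- factor(causes,
                         levels = c("Non-COVID Deaths",
                                    "COVID-19 Comorbidities Deaths",
                                    "COVID-19 Deaths"))

print(levels(default_order))
print(levels(explicit_order))
```

The values themselves are unchanged in both cases; only the level order, and therefore the order used for the legend and the stack, differs.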

Data Reference

Reconstruction

The following plot fixes the main issues in the original.