Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Causes of Death. [online] OurWorldInData.org


Objective

The main objective of the original visualization is to demonstrate a disconnect between the actual causes of death and what is reported in the news media. With the news media increasingly flirting with “click-bait” articles, less glamorous causes of death are not given equal attention.

The visualization compares the actual causes of death in the USA in 2016 against Google search trends for those causes and against mentions of them in The New York Times and The Guardian.

Given the above, the target audience for this visualization is an individual located in the US who wants to understand the major causes of death in the US. The illustration would also be useful for anybody wanting to scrutinize the news media’s tendency to sensationalize single, mostly negative events while ignoring larger, more positive trends that are less newsworthy.

The chosen visualization had the following three main issues:

  • Visual Bombardment

    The visualization shows 13 different causes of death for each of the four categories, with a different colour used for each cause. A complementary colour scale was employed to represent quantitative data. The reconstruction uses a more appropriate two-colour diverging palette instead.

  • Using Area to Depict Quantity

    Area has been used to represent the percentage of deaths due to a particular cause, but area is not the most accurate encoding for comparing quantitative variables. In the reconstructed visualization, length has been used instead of area for greater accuracy and clarity.

  • Visual Mess of Annotations

    Annotations have been used extensively for each of the 13 different causes of death in all four categories. This has left some of the text largely illegible. The reconstructed visualization addresses this issue by showing the causes of death only once, on the y-axis, thereby removing redundant annotations and increasing readability.
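
The second issue can be illustrated with a little arithmetic. The sketch below is not part of the original analysis; the two shares are made-up values, and the square is just one hypothetical way of drawing a value as an area. It compares how a fourfold difference looks when encoded as bar length and when encoded as the area of a square.

```r
# Illustrative values only (not taken from the dataset): two shares, in percent.
share <- c(small = 5, large = 20)

# Length encoding: the bar length is proportional to the value itself.
length_ratio <- share[["large"]] / share[["small"]]

# Area encoding (assumption: each value is drawn as a square whose area
# equals the value). The visible side length grows with the square root.
side <- sqrt(share)
side_ratio <- side[["large"]] / side[["small"]]

cat("Ratio of bar lengths:  ", length_ratio, "\n")
cat("Ratio of square sides: ", side_ratio, "\n")
```

With length, the larger bar is four times as long as the smaller one. With area, the larger square is only about twice as wide, which makes differences between causes look smaller than they are.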

Reference

Code

The following code was used to fix the issues identified in the original.

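## Required packages

library(readr)      # read_csv()
library(dplyr)      # filter() and the %>% pipe
library(ggplot2)    # plotting
library(tidyr)      # gather()
library(data.table) # setnames()
library(extrafont)  # font support for the plot theme

# Importing datasets (the first line of each file is skipped)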

rawcdc <- read_csv("tp_cdc_n.csv", skip = 1)
rawgoogle <- read_csv("tp_google_trends_n.csv", skip = 1)
rawguardian <- read_csv("tp_guardian_n.csv", skip = 1)
rawnyt <- read_csv("tp_nyt_n.csv", skip = 1)

## Data Preprocessing
# Filtering only the data corresponding to year 2016

cdc <- rawcdc %>% filter(Words == 2016)
guardian <- rawguardian %>% filter(Words == 2016)
nyt <- rawnyt %>% filter(Words == 2016)
google <- rawgoogle %>% filter(Words == 2016)


# Changing col order and renaming cols of google to match others

google <- google[,c(1,2,3,4,5,6,7,8,11,9,10,12,13,14)] 
setnames(google, old = c("alzheimer's","cancer","car accidents","diabetes",
                         "heart disease","homicide","kidney disease",
                         "respiratory disease","overdose","pneumonia","stroke",
                         "suicide","terrorism"),
                 new = c("Alzheimer's Disease","Cancer","Car Accidents","Diabetes",
                         "Heart Disease","Homicide","Kidney",
                         "Lower Respiratory Disease","Overdose","Pneumonia & Influenza","Stroke",
                         "Suicide","Terrorism"))

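# Sorting datasets based on cdc's values
# Note: sorted_order is computed but not used again below;
# the column order applied to each data frame is hard-coded.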

dfcdc <- data.frame(cdct = t(cdc))
sorted_order <- sort(dfcdc$cdct)
cdc <- cdc[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)] 
google <- google[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)]
nyt <- nyt[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)] 
guardian <- guardian[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)]

# Binding all 4 data frames into one

df_all <- rbind(cdc,google,nyt,guardian)
df_all <- cbind(df_all[1], Category=c('cdc','google','nyt','guardian'), df_all[,2:14])

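# Converting each row to percentages of its row total

df_all$total <- apply(df_all[,3:15], 1, sum)
pcts <- lapply(df_all[,3:15], function(x) {
  (x / df_all$total * 100)
})
# data.frame() makes the column names syntactic, e.g. "Heart Disease" becomes Heart.Disease
pct_df <- data.frame(pcts)
pct_df <- cbind(pct_df[0], Category=c('cdc','google','nyt','guardian'), pct_df[,1:13])
# Reshaping from wide to long: one row per Category and cause of death
df_long <- pct_df %>% gather(Terrorism:Heart.Disease, key = "Variable", value = "Value")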

# Clean up factor labels for visualisation

df_long$Category <- factor(df_long$Category,
                           levels = c('cdc','google','nyt','guardian'),
                           labels = c("Actual Causes of deaths",
                                      "Google Searches",
                                      "Media Coverage: New York Times",
                                      "Media Coverage: The Guardian"))
df_long$Variable <- factor(df_long$Variable,
                              levels = c("Terrorism",
                                         "Homicide",
                                         "Suicide",
                                         "Pneumonia...Influenza",
                                         "Kidney",
                                         "Overdose",
                                         "Diabetes",
                                         "Stroke",
                                         "Alzheimer.s.Disease",
                                         "Lower.Respiratory.Disease",
                                         "Car.Accidents",
                                         "Cancer",
                                         "Heart.Disease"),
                              labels = c("Terrorism",
                                         "Homicide",
                                         "Suicide",
                                         "Pneumonia & Influenza",
                                         "Kidney",
                                         "Overdose",
                                         "Diabetes",
                                         "Stroke",
                                         "Alzheimer's Disease",
                                         "Lower Respiratory Disease",
                                         "Road Accidents",
                                         "Cancer",
                                         "Heart Disease"))


# Reconstruction

p1 <- ggplot(data = df_long, aes(x = Variable, y = Value, fill = Variable)) + 
      geom_bar(stat = "identity") + 
      scale_y_continuous(breaks = c(0, 10, 20, 30), labels = c("0%", "10%", "20%", "30%")) + 
      coord_flip() + 
      facet_grid(.~Category, labeller = label_wrap_gen(width=18), shrink = FALSE) 

# Setting font and colour theme

p1 <- p1  + theme(legend.position = "none", #legend box
                panel.background = element_rect(fill = "#f5f5f5"), #Background of the plot
                plot.title = element_text(family = "Segoe UI", face = "bold", size = rel(2)), #Title of the plot
                axis.title.x = element_blank(), #x-axis title label
                axis.title.y = element_blank(), #y-axis title label
                axis.text.y = element_text(family = "Segoe UI"), #y-axis labels
                text = element_text(family="Segoe UI")
                )

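# Bar colour palette: diverging from red to blue, one colour per cause of death,
# in the order of the factor levels of Variable
# (named bar_palette to avoid masking the base R function palette())

bar_palette <- c("#8F232C",
                 "#CB4B53",
                 "#EC9A9A",
                 "#DDC2C0",
                 "#EFCAC4",
                 "#CDEAE5",
                 "#ABCFD3",
                 "#89B3C1",
                 "#457B9D",
                 "#31587A",
                 "#274769",
                 "#1D3557",
                 "#324766"
                 )
p1 <- p1  + scale_fill_manual(values = bar_palette)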

# Adding plot details

p1 <- p1 + labs(title = "Causes of Death in the US - 2016",
              subtitle  = "What Americans die from, what they search on Google, and what the media reports on",
              caption = "Source: Causes of Death - Does the news reflect what we die from? - https://ourworldindata.org/causes-of-death")
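
The script above reorders the columns with hand-typed index vectors, which appear intended to sort the causes by the CDC values. As a hedged alternative, the same two steps (ordering the columns by one row and converting rows to percentages) could be sketched in base R as follows. The data frame `toy` and its columns `cause_a`, `cause_b` and `cause_c` are hypothetical stand-ins with made-up numbers, not the real data.

```r
# Hypothetical stand-in for the combined data: one row per source,
# one column per cause. All numbers are made up for illustration.
toy <- data.frame(
  Category = c("cdc", "google"),
  cause_a  = c(3, 2),
  cause_b  = c(1, 1),
  cause_c  = c(6, 1)
)

cause_cols <- setdiff(names(toy), "Category")

# Order the cause columns by the values in the "cdc" row (ascending),
# instead of typing the column indices by hand.
cdc_values <- unlist(toy[toy$Category == "cdc", cause_cols])
ordered_cols <- cause_cols[order(cdc_values)]
toy <- toy[, c("Category", ordered_cols)]

# Convert each row to percentages of its own total.
pct <- toy
pct[ordered_cols] <- toy[ordered_cols] / rowSums(toy[ordered_cols]) * 100

print(pct)
```

Deriving the order from the data means the columns stay correctly sorted if the input files change, at the cost of a few extra lines.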

Data Reference

Reconstruction

The following plot fixes the main issues in the original.