Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The main objective of the original visualization is to demonstrate a disconnect between the actual causes of death and what is reported in the news media. With the news media increasingly flirting with “click-bait” articles, less glamorous causes of death do not receive equal attention.
The visualization compares the actual causes of death in the USA in 2016 against Google search trends for causes of death, mentions of causes of death in The New York Times and mentions of causes of death in The Guardian newspaper.
Given the above, the target audience for this visualization is an individual located in the US who wants to understand the major causes of death in the US. This illustration would also be useful for anybody wanting to scrutinize the news media’s tendency to sensationalize single, mostly negative events while ignoring larger, more positive trends that are less newsworthy.
The visualisation chosen had the following three main issues:
Visual Bombardment
The visualization shows 13 different causes of death for each of the four categories, with a different colour used for each cause. A complementary colour scale was employed to represent quantitative data. This has been reconstructed with a more appropriate bi-colour diverging colour palette.
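As a rough sketch of how a two-colour diverging palette of this kind could be generated rather than hand-picked, base R's colorRampPalette can interpolate between two end colours through a neutral midpoint. The end colours below are taken from the palette in the code for this reconstruction; the midpoint colour and the use of interpolation are assumptions for illustration, not the method actually used.

```r
# Sketch only: interpolate 13 colours from dark red to dark blue
# through a light neutral midpoint (the midpoint is an assumed value).
n_causes <- 13
make_palette <- grDevices::colorRampPalette(c("#8F232C", "#F5F5F5", "#1D3557"))
sketch_palette <- make_palette(n_causes)
print(sketch_palette)
```

The resulting vector could be passed to scale_fill_manual(values = ...) in the same way as a hand-picked palette.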
Using Area to Depict Quantity
Area has been used to represent the percentage of deaths due to a particular cause, which is not the most accurate encoding for comparing quantitative variables. In the reconstructed visualization, length has been used instead of area for greater accuracy and clarity.
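To illustrate why length tends to be read more accurately than area, the short calculation below compares the two encodings for a pair of hypothetical percentages. It assumes square marks for simplicity, which may not match the shapes in the original chart.

```r
# Hypothetical shares: the second cause is twice as common as the first.
shares <- c(cause_a = 10, cause_b = 20)

# Length encoding: bar lengths are proportional to the values.
length_ratio <- shares[["cause_b"]] / shares[["cause_a"]]

# Area encoding with square marks: side lengths grow with the square root.
side_ratio <- sqrt(shares[["cause_b"]] / shares[["cause_a"]])

cat("Bar length ratio:", length_ratio, "\n")
cat("Square side ratio:", round(side_ratio, 2), "\n")
```

With length, the larger value is drawn twice as long. With square areas, each side is only about 1.41 times longer, so the difference is easier to underestimate.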
Visual Mess of Annotations
Annotations have been extensively used for each of the 13 different causes of death in all four categories. This has resulted in some text becoming largely incomprehensible. The reconstructed visualization addresses this issue by showing the causes of death only once on the y-axis, thereby removing redundant annotations and increasing readability.
Reference
The following code was used to fix the issues identified in the original.
## Required packages
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(data.table)
library(extrafont)
#Importing dataset
rawcdc <- read_csv("tp_cdc_n.csv", skip = 1)
rawgoogle <- read_csv("tp_google_trends_n.csv", skip = 1)
rawguardian <- read_csv("tp_guardian_n.csv", skip = 1)
rawnyt <- read_csv("tp_nyt_n.csv", skip = 1)
## Data Preprocessing
# Filtering only the data corresponding to year 2016
cdc <- rawcdc %>% filter(Words == 2016)
guardian <- rawguardian %>% filter(Words == 2016)
nyt <- rawnyt %>% filter(Words == 2016)
google <- rawgoogle %>% filter(Words == 2016)
# Changing col order and renaming cols of google to match others
google <- google[,c(1,2,3,4,5,6,7,8,11,9,10,12,13,14)]
setnames(google, old = c("alzheimer's","cancer","car accidents","diabetes",
"heart disease","homicide","kidney disease",
"respiratory disease","overdose","pneumonia","stroke",
"suicide","terrorism"),
new = c("Alzheimer's Disease","Cancer","Car Accidents","Diabetes",
"Heart Disease","Homicide","Kidney",
"Lower Respiratory Disease","Overdose","Pneumonia & Influenza","Stroke",
"Suicide","Terrorism"))
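# Sorting datasets based on cdc's values
# sorted_order is for inspection only; the column order used below is hard-coded from it
dfcdc <- data.frame(cdct = t(cdc[, -1])) # drop the year column before sorting
sorted_order <- sort(dfcdc$cdct)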
cdc <- cdc[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)]
google <- google[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)]
nyt <- nyt[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)]
guardian <- guardian[,c(1,14,7,13,11,8,10,5,12,2,9,4,3,6)]
# Binding all 4 data frames into one
df_all <- rbind(cdc,google,nyt,guardian)
df_all <- cbind(df_all[1], Category=c('cdc','google','nyt','guardian'), df_all[,2:14])
# Finding % of deaths
df_all$total = apply(df_all[,3:15], 1, sum)
pcts = lapply(df_all[,3:15], function(x) {
(x / df_all$total * 100)
})
pct_df = data.frame(pcts)
pct_df <- cbind(pct_df[0], Category=c('cdc','google','nyt','guardian'), pct_df[,1:13])
df_long <- pct_df %>% gather(Terrorism:Heart.Disease, key = "Variable", value = "Value")
# Clean up factor labels for visualisation
df_long$Category <- factor(df_long$Category,
levels = c('cdc','google','nyt','guardian'),
labels = c("Actual Causes of deaths",
"Google Searches",
"Media Coverage: New York Times",
"Media Coverage: The Guardian"))
df_long$Variable <- factor(df_long$Variable,
levels = c("Terrorism",
"Homicide",
"Suicide",
"Pneumonia...Influenza",
"Kidney",
"Overdose",
"Diabetes",
"Stroke",
"Alzheimer.s.Disease",
"Lower.Respiratory.Disease",
"Car.Accidents",
"Cancer",
"Heart.Disease"),
labels = c("Terrorism",
"Homicide",
"Suicide",
"Pneumonia & Influenza",
"Kidney",
"Overdose",
"Diabetes",
"Stroke",
"Alzheimer's Disease",
"Lower Respiratory Disease",
"Road Accidents",
"Cancer",
"Heart Disease"))
# Reconstruction
p1 <- ggplot(data = df_long, aes(x = Variable, y = Value, fill = Variable)) +
geom_bar(stat = "identity") +
scale_y_continuous(breaks = c(0,10,20,30), labels = c("0%","10%","20%","30%")) +
coord_flip() +
facet_grid(.~Category, labeller = label_wrap_gen(width=18), shrink = FALSE)
# Setting font and colour theme
p1 <- p1 + theme(legend.position = "none", #legend box
panel.background = element_rect(fill = "#f5f5f5"), #Background of the plot
plot.title = element_text(family = "Segoe UI", face = "bold", size = rel(2)), #Title of the plot
axis.title.x = element_blank(), #x-axis title label
axis.title.y = element_blank(), #y-axis title label
axis.text.y = element_text(family = "Segoe UI"), #y-axis labels
text = element_text(family="Segoe UI")
)
# Bar colour palette
palette <- c("#8F232C",
"#CB4B53",
"#EC9A9A",
"#DDC2C0",
"#EFCAC4",
"#CDEAE5",
"#ABCFD3",
"#89B3C1",
"#457B9D",
"#31587A",
"#274769",
"#1D3557",
"#324766"
)
p1 <- p1 + scale_fill_manual(values = palette)
# Adding plot details
p1 <- p1 + labs(title = "Causes of Death in the US - 2016",
subtitle = "What Americans die from, what they search on Google, and what the media reports on",
caption = "Source: Causes of Death - Does the news reflect what we die from? - https://ourworldindata.org/causes-of-death")
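The percentage step in the code above can also be written without apply and lapply. The following is a minimal sketch on hypothetical counts (the values and the reduced set of causes are made up for illustration), with a check that each source's shares sum to 100.

```r
# Hypothetical counts for two sources and three causes (illustrative values only)
counts <- data.frame(
  Category  = c("cdc", "google"),
  Cancer    = c(50, 30),
  Stroke    = c(30, 30),
  Terrorism = c(20, 40)
)

# Columns holding the per-cause values
value_cols <- setdiff(names(counts), "Category")

# Divide every row by its own total and convert to percentages
pct <- counts
pct[value_cols] <- counts[value_cols] / rowSums(counts[value_cols]) * 100

# Sanity check: the shares within each source should sum to 100
stopifnot(all(abs(rowSums(pct[value_cols]) - 100) < 1e-8))
print(pct)
```

A check of this kind is a cheap way to catch a wrong column range before the data is reshaped for plotting.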
Data Reference
The following plot fixes the main issues in the original.