Click the Original, Code and Reconstruction tabs to learn more about the issues with the original data visualisation and how these issues were fixed in the reconstructed data visualisation.

Original

Source: Ritchie, H. (2022)

Objective

The “global deaths from disasters over more than a century” data visualisation aims to show how the number of people who have died from natural disasters has changed over time and which natural disasters have caused the most deaths. Without this historical context, we cannot appreciate the frequency or severity of the natural disasters that happen today. With it, we can make more informed decisions about problems that affect our future, such as climate change and food shortages.

Target Audience

Ritchie (2022) targets this data visualisation towards those who seek to learn more about the impact that natural disasters have on the planet. This might include, but is not limited to, students, researchers, scientists, individuals and governments who use this information to support their work.

Major Issues

There are multiple issues present in this data visualisation, but I have highlighted the three most significant issues below.

  • Area and Size as Quantity

Size and area are difficult for the human eye to compare, especially when the differences between the values they represent are small. In this data visualisation, each data point is a bubble and the size of the bubble reflects the number of people who died from a natural disaster in a given year. Because there are so many data points, many bubbles overlap and do not align with the years marked on the x-axis. For example, a flood killed 3.7 million people in 1931, but this data point appears to extend across more than 10 years.
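To illustrate why bubbles are hard to compare, the sketch below assumes that a bubble's area is proportional to the value it encodes, so its radius grows only with the square root of that value. The death counts are hypothetical round numbers chosen for illustration and are not taken from the data set.

```r
# Hypothetical death counts for illustration only (not from the data set)
deaths <- c(small = 10000, medium = 100000, large = 1000000)

# Assumption: area is proportional to the value, so radius scales with sqrt(value)
relative_radius <- sqrt(deaths / min(deaths))

# Radius of each bubble relative to the smallest bubble
round(relative_radius, 2)
```

Under this assumption, a value that is 100 times larger is drawn with a radius that is only 10 times larger, which makes large differences look smaller than they are.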

  • Ethical Principles - Accuracy

Going back to the “garbage in, garbage out” principle, it is important to prepare data appropriately so that our data visualisations demonstrate trends accurately. This data set includes many missing values, but the data visualisation does not make this evident. For example, the data points that represent wildfires suggest that the number of deaths has been low but constant over the past 120 years. However, this variable is missing values for 55 years. This data visualisation would be less misleading if the missing values were handled appropriately and the number of deaths was plotted against the y-axis, because it would remove the need to make assumptions about these data points.
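One way to make missing values visible before plotting is to count them for each variable. The sketch below uses a small toy data frame with hypothetical values and column names that I have chosen for illustration. It is written in base R so that it runs without extra packages.

```r
# Toy data with hypothetical values; NA marks a year with no recorded value
toy <- data.frame(
  Year = 1900:1905,
  Flood = c(1200, NA, 300, 4500, NA, 80),
  Wildfire = c(NA, NA, 15, NA, NA, 2)
)

# Count the years with a missing value in each death column
missing_years <- colSums(is.na(toy[, c("Flood", "Wildfire")]))
missing_years
```

A count like this could be reported alongside the plot so that the audience knows how much of each series is based on recorded values.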

  • Ethical Principles - Transparency

The target audience does not see how the data was collected and prepared, so they can only use previous experiences and expectations to interpret the data visualisation in front of them. Therefore, the choices we make to represent the information in a data visualisation must be completely transparent. For example, we might expect that the “All Disasters” variable is the sum of deaths from all natural disasters included in the data visualisation, but the data set shows that six natural disasters are missing from the data visualisation. So, we cannot use the information in this data visualisation to understand how the values stored in the “All Disasters” variable were calculated or to confirm which natural disasters have caused the most deaths.
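A simple transparency check is to compare a reported total against the sum of the categories that are actually shown. The sketch below does this on a toy data frame. The column names and values are hypothetical, and the third row is deliberately constructed so that the total includes deaths from a category that is not listed.

```r
# Toy data with hypothetical values (not from the real data set)
toy <- data.frame(
  Year = c(1900, 1901, 1902),
  Drought = c(100, 0, 50),
  Flood = c(20, 30, 0),
  Storm = c(5, 0, 10),
  Total = c(125, 30, 70)
)

# Sum the categories that are shown and compare with the reported total
component_sum <- unname(rowSums(toy[, c("Drought", "Flood", "Storm")]))
toy$Difference <- toy$Total - component_sum

# Rows where the total is not explained by the listed categories
toy[toy$Difference != 0, c("Year", "Total", "Difference")]
```

Any non-zero difference points to categories that contribute to the total but are not displayed.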

Code

The following code was used to fix the issues that we identified in the original data visualisation.

library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(RColorBrewer)
library(scales)

# import data set
nd <- read_csv("natural-disasters.csv")

# subset data set to only show relevant variables and observations
nd_subset <- nd %>%
  select(starts_with("Number of deaths from"), Year, Entity) %>%
  filter(Entity == "World")

# rename variables
nd_subset <- nd_subset %>% rename(
  Drought = `Number of deaths from drought`,
  Earthquake = `Number of deaths from earthquakes`,
  Total = `Number of deaths from disasters`,
  `Volcanic Activity` = `Number of deaths from volcanic activity`,
  Flood = `Number of deaths from floods`,
  `Mass Movement` = `Number of deaths from mass movements`,
  Storm = `Number of deaths from storms`,
  Landslide = `Number of deaths from landslides`,
  Fog = `Number of deaths from fog`,
  Wildfire = `Number of deaths from wildfires`,
  `Extreme Temperature` = `Number of deaths from extreme temperatures`,
  `Glacial Lake Outburst` = `Number of deaths from glacial lake outbursts`)

# the author confirms that where missing values exist, no deaths from natural disasters have been recorded, so missing values should be equal to zero
nd_subset[is.na(nd_subset)] <- 0

# remove variables that do not store any values
nd_subset <- nd_subset %>% select(-c(Fog, `Glacial Lake Outburst`))

# create new variable that combines all natural disasters except Drought and Flood
nd_subset <- nd_subset %>% mutate(`Other *` = `Volcanic Activity` +
                                    Earthquake +
                                    `Mass Movement` +
                                    Storm +
                                    Landslide +
                                    Wildfire +
                                    `Extreme Temperature`)

# transform data set to create Natural Disasters variable and Deaths variable
nd_subset2 <- nd_subset %>%
  pivot_longer(cols = c("Drought", "Flood", "Total", "Other *"),
               names_to = "Natural Disasters",
               values_to = "Deaths")

# divide the Deaths variable by 1,000,000 to simplify y axis values
nd_subset2$Deaths <- nd_subset2$Deaths/1000000

# display colour-blind-friendly qualitative palettes to choose from
display.brewer.all(n = 4, type = "qual", colorblindFriendly = TRUE)

# create time series plot
new_dv <- nd_subset2 %>% ggplot(aes(x = Year, y = Deaths, colour = `Natural Disasters`)) +
  geom_line() +
  scale_colour_brewer(palette = "Dark2") +
  facet_grid(`Natural Disasters` ~ ., scales = "fixed") +
  labs(title = "Global Deaths from Natural Disasters (1900 - 2020)",
       subtitle = paste0("Floods and droughts are the deadliest natural disasters, but there has been a significant decline in\n",
                         "deaths from natural disasters over time."),
       x = "Year",
       y = "Deaths (millions)",
       caption = paste0("Source: Our World in Data (2020) - https://ourworldindata.org/explorers/natural-disasters\n",
                        "*Other natural disasters include extreme temperatures, wildfires, landslides, storms, mass movements, earthquakes and volcanic activity")) +
  scale_x_continuous(breaks = seq(1900,2020,10)) +
  scale_y_continuous(breaks = seq(0,4,1)) +
  theme(legend.position = "none",
        panel.background = element_blank(),
        panel.grid.major.x = element_line(colour="grey95"),
        strip.background = element_rect(fill = "grey90"),
        plot.title = element_text(size = 12, face = "bold"),
        plot.subtitle = element_text(size = 10),
        plot.caption = element_text(size = 8, face = "italic"),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10))
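
The data preparation steps above (treating missing values as zero, reshaping from wide to long format and expressing deaths in millions) can be sketched on toy data. This is a minimal illustration with hypothetical values, and it uses base R's stack() in place of pivot_longer() so that it runs without extra packages.

```r
# Toy wide-format data with hypothetical values (not from the real data set)
wide <- data.frame(
  Year = c(1900, 1901),
  Drought = c(1250000, NA),
  Flood = c(NA, 3700000),
  Storm = c(6000, 2000)
)

# Treat missing values as zero recorded deaths, as the code above does
wide[is.na(wide)] <- 0

# Stack the disaster columns into a single Deaths column (long format)
disasters <- c("Drought", "Flood", "Storm")
stacked <- stack(wide[, disasters])
long <- data.frame(
  Year = rep(wide$Year, times = length(disasters)),
  `Natural Disasters` = as.character(stacked$ind),
  Deaths = stacked$values / 1e6,  # express deaths in millions
  check.names = FALSE
)
long
```

Each row of the long table holds one year and one type of natural disaster, which is the shape that a faceted line plot needs.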

Reconstruction

The following plot fixes the issues that we identified in the original data visualisation.