Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of this data visualisation is to show tree cover loss in different regions of the world and the drivers behind that loss. The target audience is environmentally conscious readers, who will most likely use the visualisation as supporting research for understanding the environmental impact of tree cover loss. Knowing the main drivers helps the audience decide which driver to focus on to minimise further tree loss in a region of their choosing.
The visualisation chosen had the following three main issues:
The first issue is the use of doughnut charts, which is a form of deception. While visually appealing, the doughnuts do not give accurate values for each driver because not all sections are labelled with a value. Furthermore, the total tree cover loss in million hectares (Mha) is labelled beneath each region name with no mention of what it refers to. For example, the Latin America doughnut labels its two largest sections with their driver and percentage, while the remaining drivers are not labelled at all. The audience may only be interested in the main drivers in each region, but even if the chart is not strictly misleading, it leaves the audience short of information and makes it difficult to compare drivers across regions.
The second issue is the use of colour. The colours are vibrant and eye-catching overall but are not colour-blind safe, especially for people with red-green colour blindness, the most common type (Baglin 2020). For example, in the South-East Asia doughnut the wildfire and urbanisation shares are so small that they are barely visible, which places forestry (green) and deforestation (red) almost next to each other. Someone with red-green colour blindness may not be able to tell the two apart and may be left ill-informed, especially as not all sections of the doughnut are labelled.
The third issue is ignoring convention, another form of deception. The visualisation appears to prioritise visual appeal over being informative. It uses a map coordinate system and plots each doughnut where its region would be, sized according to the region's total Mha, with vibrant colours for each driver. This leaves little room for driver labels on the doughnuts; adding them would bombard the viewer with words, cover up the background map and reduce the visual appeal. With this focus, the visualisation loses most of its ability to deliver information and leaves the audience with little to take away other than the main drivers of tree cover loss within each region. This may be its objective, but an audience that wants to know about the other drivers cannot find that information without looking up the original data.
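The colour concern in the second issue can be explored with a small sketch. As an assumption for illustration only (this is not the palette used in the original or in the reconstruction), the block below assigns each driver a colour from the Okabe-Ito palette, which is designed to stay distinguishable under common forms of colour blindness and is available in base R (4.0.0 or later) through `palette.colors()`. The object name `driver_colours` is hypothetical.

```r
# Driver names, spelled as they appear as column names in the tidy step.
drivers <- c("Deforestation", "Forestry", "Shifting_Agriculture",
             "Urbanisation", "Wildfire")

# Okabe-Ito is a colour-blind-friendly qualitative palette in base R (>= 4.0.0).
# Request six colours and drop the first one (black) so every driver gets a hue.
okabe_ito <- unname(palette.colors(n = 6, palette = "Okabe-Ito"))[-1]

# Named vector: one colour per driver (hypothetical helper object).
driver_colours <- setNames(okabe_ito, drivers)
print(driver_colours)
```

A named vector like this could be passed to `scale_fill_manual(values = driver_colours)` in ggplot2, so that each driver keeps the same colour wherever it appears.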
Reference
The following code was used to fix the issues identified in the original.
# import data
library(rvest)
drivers <- read_html("https://science.sciencemag.org/content/361/6407/1108/tab-figures-data")
length(html_nodes(drivers, "table"))
## [1] 1
drivers_data <- html_table(html_nodes(drivers, "table")[[1]])
head(drivers_data)
## X1 X2
## 1 Map-based estimates
## 2 Hansen et al. (3)
## 3 Region Tree coverloss (Mha,2001–2015)
## 4 North America 70
## 5 Latin America 78
## 6 Europe 15
## X3
## 1 Map-based estimates
## 2 Hansen et al. (3)
## 3 Tree cover loss(% of global total,2001–2015)
## 4 21%
## 5 25%
## 6 5%
## X4
## 1 Map-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Deforestation
## 4 1%
## 5 56%
## 6 None
## X5
## 1 Map-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Shiftingagriculture
## 4 <1%
## 5 31%
## 6 <1%
## X6
## 1 Map-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Forestry
## 4 56%
## 5 13%
## 6 99%
## X7
## 1 Map-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Wildfire
## 4 40%
## 5 1%
## 6 1%
## X8
## 1 Map-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Urbanization
## 4 2%
## 5 <1%
## 6 None
## X9
## 1 Sample-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Deforestation
## 4 2 ± 1%
## 5 64 ± 8%
## 6 None
## X10
## 1 Sample-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Shiftingagriculture
## 4 1 ± 1%
## 5 24 ± 7%
## 6 <1 ± <1%
## X11
## 1 Sample-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Forestry
## 4 48 ± 11%
## 5 9 ± 3%
## 6 95 ± 5%
## X12
## 1 Sample-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Wildfire
## 4 48 ± 11%
## 5 <1 ± <1%
## 6 5 ± 5%
## X13
## 1 Sample-based estimates
## 2 Current study: Driver of tree cover loss
## 3 Urbanization
## 4 1 ± 1%
## 5 <1 ± <1%
## 6 None
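# tidy: keep rows 4-11 (the data rows below the three header rows) and
# drop columns 4-8 (map-based driver estimates), keeping the sample-based ones
drivers_data <- drivers_data[4:11, -(4:8)]
colnames(drivers_data) <- c("Region", "Tree_Cover_loss(MHA)",
                            "Tree_Cover_Loss(%)", "Deforestation",
                            "Shifting_Agriculture", "Forestry", "Wildfire",
                            "Urbanisation")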
str(drivers_data)
## 'data.frame': 8 obs. of 8 variables:
## $ Region : chr "North America" "Latin America" "Europe" "Africa" ...
## $ Tree_Cover_loss(MHA): chr "70" "78" "15" "39" ...
## $ Tree_Cover_Loss(%) : chr "21%" "25%" "5%" "13%" ...
## $ Deforestation : chr "2 ± 1%" "64 ± 8%" "None" "2 ± 1%" ...
## $ Shifting_Agriculture: chr "1 ± 1%" "24 ± 7%" "<1 ± <1%" "93 ± 3%" ...
## $ Forestry : chr "48 ± 11%" "9 ± 3%" "95 ± 5%" "4 ± 2%" ...
## $ Wildfire : chr "48 ± 11%" "<1 ± <1%" "5 ± 5%" "<1 ± <1%" ...
## $ Urbanisation : chr "1 ± 1%" "<1 ± <1%" "None" "1 ± 2%" ...
# convert the character columns to numeric
drivers_data$`Tree_Cover_loss(MHA)` <- as.numeric(drivers_data$`Tree_Cover_loss(MHA)`)
drivers_data$`Tree_Cover_Loss(%)` <- c(21, 25, 5, 13, 20, 13, 3, 100)
# driver columns: central sample-based estimates entered by hand,
# with "<1" recorded as 0.5 and "None" recorded as 0
drivers_data$Deforestation <- c(2, 64, 0, 2, 2, 61, 8, 27)
drivers_data$Shifting_Agriculture <- c(1, 24, 0.5, 93, 1, 20, 10, 24)
drivers_data$Forestry <- c(48, 9, 95, 4, 38, 14, 19, 26)
drivers_data$Wildfire <- c(48, 0.5, 5, 0.5, 59, 2, 62, 23)
drivers_data$Urbanisation <- c(1, 0.5, 0, 1, 0.5, 0.5, 1, 1)
library(tidyr)
library(magrittr)
# reshape to long format: one row per region and driver
drivers_data <- drivers_data %>%
  gather(`Deforestation`, `Shifting_Agriculture`, `Forestry`, `Wildfire`,
         `Urbanisation`, key = "Drivers", value = "Loss(%)")
drivers_data$Drivers <- as.factor(drivers_data$Drivers)
# ggplot: bar chart with one panel per driver and regions on the x-axis
library(ggplot2)
drivers_bar <- ggplot(drivers_data, aes(x = Region, y = `Loss(%)`, fill = Drivers))
driver.bar <- drivers_bar + geom_bar(stat = "identity") +
  labs(title = "Tree Cover Loss by Driver (2001 - 2015)",
       caption = "Source: Curtis et al., 2018") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right",
        panel.background = element_rect(fill = "white",
                                        colour = "grey93"),
        panel.grid = element_line(colour = "grey93")) +
  # colours and labels follow the alphabetical order of the Drivers factor levels
  scale_fill_manual(values = c('#fdae61', '#fce747', '#abd9e9', '#275be8', '#d7181c'),
                    labels = c("Deforestation", "Forestry", "Shifting Agriculture",
                               "Urbanisation", "Wildfire"),
                    name = "Driver") +
  facet_wrap(~Drivers, ncol = 2)
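The driver percentages in the tidy step were typed in by hand from strings such as `48 ± 11%`, `<1 ± <1%` and `None`. As a hedged alternative, the sketch below parses such strings in base R, following the same conventions as the hand-entered values (central estimate only, `<1` recorded as 0.5, `None` recorded as 0). The function name `parse_estimate` is hypothetical and is not part of the original code.

```r
# Hypothetical helper: turn strings like "48 ± 11%" into numeric central estimates.
parse_estimate <- function(x) {
  # Drop the uncertainty part (from the plus-minus sign onward);
  # "\u00b1" is the plus-minus sign, written as an escape to keep the script ASCII.
  centre <- sub("\u00b1.*$", "", x)
  # Drop a trailing percent sign and any surrounding whitespace.
  centre <- trimws(sub("%\\s*$", "", trimws(centre)))
  # Plain numbers convert directly; "<1" and "None" become NA at this point.
  out <- suppressWarnings(as.numeric(centre))
  # Conventions taken from the hand-entered values: "<1" -> 0.5, "None" -> 0.
  out[grepl("^<", centre)] <- 0.5
  out[centre == "None"] <- 0
  out
}

# Example strings in the formats found in the scraped table.
raw <- c("48 \u00b1 11%", "<1 \u00b1 <1%", "None", "21%")
print(parse_estimate(raw))
```

It could be applied column by column in place of the hand-typed vectors, for example `drivers_data$Forestry <- parse_estimate(drivers_data$Forestry)`, before the data are reshaped to long format.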
Data Reference
The following plot fixes the main issues in the original.