Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Fragapane (2021).


Objective

The objective of this data visualisation is to educate the audience on the top 30 countries (plus Italy) with the most endemic species across several taxonomic groups. It appeals to a wide audience, but it may be of particular interest to people who are interested in animals.

The visualisation chosen had the following three main issues:

  • Colour issues. The visualisation uses a separate colour for each taxonomic group. The colours appear to come from a palette that runs from a deep red to a light green. This palette would be problematic for people with deuteranopia (a form of red-green colour blindness).
  • Perceptual issues. Whilst aesthetically pleasing, the wave-like shapes used to indicate the number of endemic species in each taxonomic group make it difficult to compare countries accurately. For example, it is extremely difficult to tell whether Mexico or China has more endemic amphibians. The audience must refer to the number above each shape, which is itself small and difficult to read.
  • Data source issues. The visualisation does not cite the original data set; it cites only a website that uses the original data set. This can mislead the audience.

Reference

Code

The IUCN Red List is the original source of the data; the references can be found at the bottom of this tab. The two data sets are only available in PDF format, so they are converted to Excel format with Adobe’s online conversion tool (these two Excel files have been submitted through Canvas). Some brief cleaning is performed in Excel (unmerging cells, aligning columns and rows) to make the data easier to work with in R (these two cleaned Excel files have also been submitted through Canvas). Note that the original visualisation appears to use IUCN Red List data from 2020, whereas the reconstruction uses the 2021 data, so there will be some minor differences in the figures.

#Packages
library(knitr)
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)

Data sets are now imported into R and checked.

Verterbrae <- read_csv("IUCN Red List - Verterbrae (Cleaned).csv")
Inverterbrae <- read_csv("IUCN Red List - Inverterbrae (Cleaned).csv")
head(Verterbrae)
## # A tibble: 6 x 17
##   Country        Mammals Birds Reptiles Amphibians Groupers `Herrings, Anchovie~
##   <chr>            <dbl> <dbl>    <dbl>      <dbl>    <dbl>                <dbl>
## 1 Algeria              1     1        3          1        0                    0
## 2 Egypt                5     0        3          1        0                    0
## 3 Libya                2     0        0          0        0                    0
## 4 Morocco              4     0       13          2        0                    0
## 5 Tunisia              1     0        1          0        0                    0
## 6 Western Sahara       0     0        1          0        0                    0
## # ... with 10 more variables: Seahorses & Pipefishes  <dbl>, Sturgeons <dbl>,
## #   Wrasses & Parrotfishes  <dbl>, Sharks & Rays <dbl>,
## #   Crocodiles & Alligators <dbl>, ...13 <lgl>, ...14 <lgl>, ...15 <lgl>,
## #   ...16 <lgl>, ...17 <lgl>
head(Inverterbrae)
## # A tibble: 6 x 11
##   Country        `FW Crabs` `FW Crayfish` Lobsters Abalones `Cone Snails`
##   <chr>               <dbl>         <dbl>    <dbl>    <dbl>         <dbl>
## 1 Algeria                 0             0        0        0             0
## 2 Egypt                   0             0        0        0             1
## 3 Libya                   0             0        0        0             0
## 4 Morocco                 0             0        0        0             0
## 5 Tunisia                 0             0        0        0             0
## 6 Western Sahara          0             0        0        0             0
## # ... with 5 more variables: Dragonflies & Damselflies <dbl>,
## #   Reef-forming Corals <dbl>, ...9 <lgl>, ...10 <lgl>, ...11 <lgl>
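
The `...13 <lgl>` style columns in the output above appear to be empty columns left over from the PDF conversion. As a minimal sketch only (it is not part of the reconstruction, and dropping columns would shift the column positions used in the subsetting step), all-`NA` columns could be removed like this. The toy data frame `raw` is a hypothetical stand-in for the imported table.

```r
# Toy stand-in for an imported table with one empty trailing column.
raw <- data.frame(
  Country = c("Algeria", "Egypt"),
  Mammals = c(1, 5),
  `...13` = c(NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the columns that contain at least one non-missing value.
has_data <- colSums(!is.na(raw)) > 0
cleaned <- raw[, has_data, drop = FALSE]

print(names(cleaned))
```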

Data sets are joined. Only the countries and taxonomic groups that are in the original visualisation will remain after subsetting.

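# Left join: keep every country in the vertebrate table and attach the invertebrate columns.
Data <- merge(Verterbrae, Inverterbrae, by = "Country", all.x = TRUE)
# Rows (countries) and columns (taxonomic groups) are picked by position,
# so these indices only hold for this exact version of the cleaned files.
Data <- Data[c(106, 32, 14, 135, 47, 146, 50, 178, 175, 177, 239, 105, 243, 66, 213, 137, 222, 115, 11, 209, 161, 244, 27, 53, 43, 223, 220, 207, 41, 55, 113), c(1, 5, 3, 2, 18, 11, 24)]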
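
Positional indices like those above break silently if the source table gains or loses a row or column. One possible alternative, shown here only as a sketch, is to select by name. The toy table `merged`, its made-up counts, and the vectors `keep_countries` and `keep_columns` are all hypothetical; they are not the values used in the reconstruction.

```r
# Toy stand-in for the merged table; the counts are made up for illustration.
merged <- data.frame(
  Country    = c("Algeria", "Brazil", "Egypt", "Indonesia"),
  Mammals    = c(1, 2, 3, 4),
  Birds      = c(5, 6, 7, 8),
  Amphibians = c(9, 10, 11, 12),
  Groupers   = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical lists of what to keep, in display order.
keep_countries <- c("Indonesia", "Brazil")
keep_columns   <- c("Country", "Amphibians", "Birds", "Mammals")

# match() returns row positions in the order of keep_countries,
# so the result follows the requested order rather than the table order.
rows <- match(keep_countries, merged$Country)
stopifnot(!anyNA(rows))  # fail early if a country name is misspelled or missing

subset_by_name <- merged[rows, keep_columns]
print(subset_by_name)
```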

Column names and country names are changed to be consistent with the original visualisation.

Renamed <- Data %>% rename ("amphibian (A)" = "Amphibians",
                 "bird (B)" = "Birds",
                 "mammal (M)" = "Mammals",
                 "freshwater crab (Cr)" = "FW Crabs",
                 "shark and ray (S)" = "Sharks & Rays",
                 "reef-forming coral (C)" = "Reef-forming Corals")

Renamed[Renamed == "United States of America"] <- "United States"
Renamed[Renamed == "Venezuela, Bolivarian Republic of"] <- "Venezuela"
Renamed[Renamed == "Tanzania, United Republic of"] <- "Tanzania"
Renamed[Renamed == "Viet Nam"] <- "Vietnam"
Renamed[Renamed == "Bolivia, Plurinational State of"] <- "Bolivia"
Renamed[Renamed == "Congo, The Democratic Republic of the"] <- "Dem. Rep. of Congo"
Renamed[Renamed == "Taiwan, Province of China"] <- "Taiwan"
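
The assignments above compare every cell of the data frame against the old name. A more targeted variant, sketched here under the assumption that only the `Country` column needs changing, keeps the old and new names together in one named lookup vector. The vector `countries` is a hypothetical example; the two mappings are taken from the code above.

```r
# Example country names, two of which need shortening.
countries <- c("Viet Nam", "Brazil", "Taiwan, Province of China")

# Lookup vector: names are the IUCN spellings, values are the display names.
short_names <- c(
  "Viet Nam" = "Vietnam",
  "Taiwan, Province of China" = "Taiwan"
)

# Replace only the entries that appear in the lookup; leave the rest untouched.
hit <- countries %in% names(short_names)
countries[hit] <- unname(short_names[countries[hit]])

print(countries)
```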

A total column is created to be consistent with the original visualisation. Columns are then reordered so the new ‘total’ column is second in the dataframe.

FinalData <- mutate(Renamed,"Total" = `amphibian (A)`+ `bird (B)` + `mammal (M)` + `freshwater crab (Cr)` + `shark and ray (S)` + `reef-forming coral (C)`)
head(FinalData)
##        Country amphibian (A) bird (B) mammal (M) freshwater crab (Cr)
## 106  Indonesia           218      525        298                   71
## 32      Brazil           564      259        206                   13
## 14   Australia           209      359        259                    7
## 135 Madagascar           312      119        214                   17
## 47       China           252       70         93                  214
## 146     Mexico           290      123        153                   55
##     shark and ray (S) reef-forming coral (C) Total
## 106                19                      4  1135
## 32                 21                      7  1070
## 14                153                      5   992
## 135                 6                      3   671
## 47                  5                      0   634
## 146                 7                      2   630
FinalData <- FinalData[, c("Country", "Total", "amphibian (A)", "bird (B)", "mammal (M)", "freshwater crab (Cr)", "shark and ray (S)", "reef-forming coral (C)")]
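
As a cross-check on the arithmetic, the same total can be computed with `rowSums()` instead of naming each column. This is only a sketch: the data frame `species` is a hypothetical two-row copy of the Indonesia and Brazil rows printed above.

```r
# Two rows copied from the head(FinalData) output above.
species <- data.frame(
  Country = c("Indonesia", "Brazil"),
  `amphibian (A)` = c(218, 564),
  `bird (B)` = c(525, 259),
  `mammal (M)` = c(298, 206),
  `freshwater crab (Cr)` = c(71, 13),
  `shark and ray (S)` = c(19, 21),
  `reef-forming coral (C)` = c(4, 7),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except Country is a taxonomic group to be summed.
group_cols <- setdiff(names(species), "Country")
species$Total <- rowSums(species[, group_cols])

# Move Total to the second position, as in the reconstruction.
species <- species[, c("Country", "Total", group_cols)]
print(species[, c("Country", "Total")])
```

The totals agree with the `Total` column printed above (1135 for Indonesia and 1070 for Brazil).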

The data is transformed into a ‘long’ format.

DataLong <- gather(FinalData, key = "Variable", value = "Value", 'Total':'reef-forming coral (C)')
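
`gather()` still works, but the tidyr documentation describes it as superseded by `pivot_longer()`. A minimal sketch of the equivalent call is shown below on a hypothetical two-row table `wide` (values copied from the output above). Note that `pivot_longer()` returns the rows grouped by country rather than by variable, which does not matter for the plot.

```r
library(tidyr)

# Small wide table: one row per country, one column per measure.
wide <- data.frame(
  Country = c("Indonesia", "Brazil"),
  Total = c(1135, 1070),
  `bird (B)` = c(525, 259),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Stack every column except Country into Variable/Value pairs.
long <- pivot_longer(wide,
                     cols = -Country,
                     names_to = "Variable",
                     values_to = "Value")
print(long)
```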

The variables now sit in a single column, ‘Variable’, which is converted to a factor so that the facets appear in the intended order.

DataLong$Variable <- factor(DataLong$Variable,
levels = c("Total", "amphibian (A)", "bird (B)", "mammal (M)", "freshwater crab (Cr)", "shark and ray (S)", "reef-forming coral (C)"))
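
The explicit `levels` argument matters because a factor created without it sorts its levels, and the resulting order depends on the locale. A small sketch with a hypothetical vector `groups` shows the levels following the order given:

```r
# Group labels in arbitrary order, with a repeat.
groups <- c("bird (B)", "Total", "amphibian (A)", "Total")

# Explicit levels fix the order used for facets and legends.
wanted <- c("Total", "amphibian (A)", "bird (B)")
groups_fct <- factor(groups, levels = wanted)

print(levels(groups_fct))
print(as.integer(groups_fct))  # position of each label within `wanted`
```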

The data has now been preprocessed and is ready to be visualised as a faceted bar chart. The base plot, with its title and source caption, is built first.

Viz <- ggplot(data = DataLong, 
            aes(x = reorder(Country, Value), y = Value, fill=Country)) +
  geom_bar(stat = "identity") + coord_flip() + 
  facet_grid(.~Variable, scales = "free") + 
  labs(title = "Top 30 Countries by number of endemic species, broken down by taxonomic group - 2021",
       caption = "Source: IUCN Red List. (2021). *Table 8a: Total endemic and threatened endemic species in each country (totals by taxonomic group): VERTEBRATES*
                  https://nc.iucnredlist.org/redlist/content/attachment_files/2021-3_RL_Stats_Table_8a_v2.pdf
IUCN Red List. (2021). *Table 8b: Total endemic and threatened endemic species in each country (totals by taxonomic group): INVERTEBRATES* 
                  https://nc.iucnredlist.org/redlist/content/attachment_files/2021-3_RL_Stats_Table_8b_v2.pdf")
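
A note on the bar order: `reorder(Country, Value)` ranks countries by the mean of `Value`, since `mean` is the default summary function. Each country has seven rows in the long table (Total plus six groups), and Total is the sum of the six groups, so that mean equals 2 × Total / 7 and the countries end up ranked by Total. The sketch below checks this on the three countries printed in the `head(FinalData)` output; the data frame `long_demo` is a hypothetical stand-in for `DataLong`.

```r
# Seven values per country: Total first, then the six taxonomic groups.
long_demo <- data.frame(
  Country = rep(c("Brazil", "Australia", "Indonesia"), each = 7),
  Value = c(1070, 564, 259, 206, 13, 21, 7,
            992, 209, 359, 259, 7, 153, 5,
            1135, 218, 525, 298, 71, 19, 4),
  stringsAsFactors = FALSE
)

# reorder() uses FUN = mean by default and sorts the levels in ascending order.
country_fct <- reorder(long_demo$Country, long_demo$Value)
print(levels(country_fct))

# The per-country mean is exactly 2 * Total / 7.
means <- tapply(long_demo$Value, long_demo$Country, mean)
print(means)
```

With `coord_flip()`, the last level is drawn at the top, so the country with the largest total appears first.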

Colours for each country’s bar are defined. The colours are based on the ‘average’ colour of the country’s flag using the online tool: https://matkl.github.io/average-color/. A background colour is also defined.

Col <- c("#a1c6e5",
         "#323274",
         "#917523",
         "#2f9a37",
         "#99762f",
         "#373871",
         "#df2c0e",
         "#b1742c",
         "#986a8f",
         "#4973b2",
         "#bb8630",
         "#a8ad64",
         "#ff7f7f",
         "#9a937d",
         "#f2ced6",
         "#a99376",
         "#ae6274",
         "#94736e",
         "#2b2e70",
         "#6e1916",
         "#e65a69",
         "#935a78",
         "#2d666f",
         "#545a5a",
         "#b26a2e",
         "#c22840",
         "#287c53",
         "#96647a",
         "#ba7b8e",
         "#9c5538",
         "#dd2f15")

background <- "#E0E0E0"

Colours are applied and the title of the chart is made to be bold.

Viz <- Viz + 
  scale_fill_manual(values = Col) +
  theme(plot.background = element_rect(fill = background),
        panel.background = element_rect(fill = background),
        title = element_text(face = "bold"))
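
Because `Col` is unnamed, `scale_fill_manual()` assigns the colours in the order of the levels of `Country`, so the vector has to be kept in exactly that order. A named vector makes the pairing explicit. The sketch below is illustrative only: the pairing of hex codes with countries is an assumption, not taken from the reconstruction.

```r
# Hypothetical pairing of countries with flag-average colours.
countries <- c("Australia", "Brazil")
hex_codes <- c("#323274", "#2f9a37")

# Names carry the country, values carry the colour.
named_cols <- setNames(hex_codes, countries)

# Look-ups by name no longer depend on the order of the vector.
print(named_cols[["Brazil"]])
print(named_cols[c("Brazil", "Australia")])
```

A vector built this way can be passed as `scale_fill_manual(values = named_cols)`, in which case ggplot2 matches the names against the values of the fill variable.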

The legend, axis titles, and x-axis text and ticks are removed.

Viz <- Viz + theme(legend.position="none",
                   axis.title.x=element_blank(),
                   axis.title.y=element_blank(),
                   axis.text.x=element_blank(),
                   axis.ticks.x=element_blank())

Labels are added to each bar to show the count in its respective facet.

# The bars are already drawn by the geom_bar() layer in the base plot,
# so only the text labels are added here.
Viz <- Viz +
  geom_text(aes(label = Value), position = position_dodge(width = 0.9), vjust = 0.25, hjust = .95, size = 3)

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

print(Viz)