Objective
The objective of this data visualisation is to educate the audience on the top 30 countries (plus Italy) with the most endemic species across several taxonomic groups. The visualisation appeals to a wide audience, although it may be of particular interest to those interested in animals.
The visualisation chosen had the following three main issues:
Reference
The IUCN Red List is the original source of the data. The references can be found at the bottom of this tab. The two data sets are only available in PDF format, so they are converted to Excel format through Adobe’s online conversion tool (these two Excel files have been submitted through Canvas). Some brief cleaning is performed in Excel (unmerging cells, aligning columns and rows) to make the data easier to work with in R (these two cleaned Excel files have also been submitted through Canvas). Note that the original visualisation appears to use IUCN Red List data from 2020, whereas the reconstruction uses the 2021 data, so there will be some minor differences in the figures.
#Packages
library(knitr)
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
Data sets are now imported into R and checked.
Verterbrae <- read_csv("IUCN Red List - Verterbrae (Cleaned).csv")
Inverterbrae <- read_csv("IUCN Red List - Inverterbrae (Cleaned).csv")
head(Verterbrae)
## # A tibble: 6 x 17
## Country Mammals Birds Reptiles Amphibians Groupers `Herrings, Anchovie~
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Algeria 1 1 3 1 0 0
## 2 Egypt 5 0 3 1 0 0
## 3 Libya 2 0 0 0 0 0
## 4 Morocco 4 0 13 2 0 0
## 5 Tunisia 1 0 1 0 0 0
## 6 Western Sahara 0 0 1 0 0 0
## # ... with 10 more variables: Seahorses & Pipefishes <dbl>, Sturgeons <dbl>,
## # Wrasses & Parrotfishes <dbl>, Sharks & Rays <dbl>,
## # Crocodiles & Alligators <dbl>, ...13 <lgl>, ...14 <lgl>, ...15 <lgl>,
## # ...16 <lgl>, ...17 <lgl>
head(Inverterbrae)
## # A tibble: 6 x 11
## Country `FW Crabs` `FW Crayfish` Lobsters Abalones `Cone Snails`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Algeria 0 0 0 0 0
## 2 Egypt 0 0 0 0 1
## 3 Libya 0 0 0 0 0
## 4 Morocco 0 0 0 0 0
## 5 Tunisia 0 0 0 0 0
## 6 Western Sahara 0 0 0 0 0
## # ... with 5 more variables: Dragonflies & Damselflies <dbl>,
## # Reef-forming Corals <dbl>, ...9 <lgl>, ...10 <lgl>, ...11 <lgl>
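The printed tibbles show unnamed logical columns such as ...13 and ...9, which are presumably empty columns left over from the PDF conversion. A minimal sketch of dropping such columns is given here, using a toy data frame (toy and toy_clean are hypothetical names, not objects from this report). Note that removing columns would shift the positional column indices used in the subsetting step, so this is only an illustration and not a drop-in change.

```r
# Toy stand-in for an imported table; `...3` mimics an empty converted column.
toy <- data.frame(
  Country = c("Algeria", "Egypt"),
  Mammals = c(1, 5),
  `...3`  = c(NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the columns that hold at least one non-missing value.
keep <- colSums(!is.na(toy)) > 0
toy_clean <- toy[, keep, drop = FALSE]

names(toy_clean)
```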
The two data sets are joined by country, keeping every row of the vertebrate table. Only the countries and taxonomic groups that are in the original visualisation remain after subsetting.
Data <- merge(Verterbrae, Inverterbrae, by = "Country", all.x = TRUE)
Data <- Data[c(106, 32, 14, 135, 47, 146, 50, 178, 175, 177, 239, 105, 243, 66, 213, 137, 222, 115, 11, 209, 161, 244, 27, 53, 43, 223, 220, 207, 41, 55, 113), c(1, 5, 3, 2, 18, 11, 24)]
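The subsetting above relies on row and column positions, which depend on the sorted order of the merged table (merge() sorts on the key by default). As a hedged alternative, the same kind of selection can be written with names, which is easier to read and survives a change in row order. The sketch below uses a small toy table; the figures are copied from the outputs shown in this report, and the column Extra, along with the names wanted_countries and wanted_columns, is hypothetical.

```r
# Toy table; `Extra` is a hypothetical column that should be dropped.
toy <- data.frame(
  Country    = c("Algeria", "Brazil", "Egypt", "Indonesia"),
  Mammals    = c(1, 206, 5, 298),
  Birds      = c(1, 259, 0, 525),
  Amphibians = c(1, 564, 1, 218),
  Extra      = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Countries and columns to keep, in the order they should appear.
wanted_countries <- c("Indonesia", "Brazil")
wanted_columns   <- c("Country", "Amphibians", "Birds", "Mammals")

# match() returns the row position of each wanted country, preserving the requested order.
subset_toy <- toy[match(wanted_countries, toy$Country), wanted_columns]
subset_toy
```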
Column names and country names are changed to be consistent with the original visualisation.
Renamed <- Data %>% rename("amphibian (A)" = "Amphibians",
                           "bird (B)" = "Birds",
                           "mammal (M)" = "Mammals",
                           "freshwater crab (Cr)" = "FW Crabs",
                           "shark and ray (S)" = "Sharks & Rays",
                           "reef-forming coral (C)" = "Reef-forming Corals")
Renamed[Renamed == "United States of America"] <- "United States"
Renamed[Renamed == "Venezuela, Bolivarian Republic of"] <- "Venezuela"
Renamed[Renamed == "Tanzania, United Republic of"] <- "Tanzania"
Renamed[Renamed == "Viet Nam"] <- "Vietnam"
Renamed[Renamed == "Bolivia, Plurinational State of"] <- "Bolivia"
Renamed[Renamed == "Congo, The Democratic Republic of the"] <- "Dem. Rep. of Congo"
Renamed[Renamed == "Taiwan, Province of China"] <- "Taiwan"
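The replacements above compare every cell of the data frame against the old name. A possible alternative, sketched here under the assumption that only the Country column needs changing, is a named lookup vector applied to that column alone. The names countries and lookup are hypothetical, and only a few of the country names are shown.

```r
# A few country names as they appear in the IUCN tables.
countries <- c("Viet Nam", "Brazil", "Tanzania, United Republic of")

# Lookup: names are the IUCN spellings, values are the display names.
lookup <- c(
  "Viet Nam"                     = "Vietnam",
  "Tanzania, United Republic of" = "Tanzania"
)

# Replace only the entries that have a match in the lookup; leave the rest untouched.
hit <- countries %in% names(lookup)
countries[hit] <- unname(lookup[countries[hit]])

countries
```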
A total column is created to be consistent with the original visualisation. Columns are then reordered so the new ‘total’ column is second in the dataframe.
FinalData <- mutate(Renamed,"Total" = `amphibian (A)`+ `bird (B)` + `mammal (M)` + `freshwater crab (Cr)` + `shark and ray (S)` + `reef-forming coral (C)`)
head(FinalData)
## Country amphibian (A) bird (B) mammal (M) freshwater crab (Cr)
## 106 Indonesia 218 525 298 71
## 32 Brazil 564 259 206 13
## 14 Australia 209 359 259 7
## 135 Madagascar 312 119 214 17
## 47 China 252 70 93 214
## 146 Mexico 290 123 153 55
## shark and ray (S) reef-forming coral (C) Total
## 106 19 4 1135
## 32 21 7 1070
## 14 153 5 992
## 135 6 3 671
## 47 5 0 634
## 146 7 2 630
FinalData <- FinalData[, c("Country", "Total", "amphibian (A)", "bird (B)", "mammal (M)", "freshwater crab (Cr)", "shark and ray (S)", "reef-forming coral (C)")]
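Adding the columns with + returns NA for any country that has a missing value in one of the groups, which could happen after the join if a country were absent from the invertebrate table. A hedged sketch of a sum that skips missing values is shown below with toy figures. Whether a missing count should be treated as zero is a judgement call, and the names toy and group_cols are hypothetical.

```r
# Toy data with one missing count; the values are illustrative only.
toy <- data.frame(
  Country      = c("A", "B"),
  `bird (B)`   = c(10, 4),
  `mammal (M)` = c(5, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

group_cols <- c("bird (B)", "mammal (M)")

# rowSums() with na.rm = TRUE ignores missing values instead of propagating NA.
toy$Total <- unname(rowSums(toy[, group_cols], na.rm = TRUE))

toy$Total
```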
The data is transformed into a ‘long’ format.
DataLong <- gather(FinalData, key = "Variable", value = "Value", 'Total':'reef-forming coral (C)')
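In current versions of tidyr, gather() is superseded and pivot_longer() is the recommended function for new code. A minimal sketch of the equivalent reshaping follows, using a toy table with figures taken from the output above (toy and toy_long are hypothetical names). One difference to be aware of: pivot_longer() orders the result row by row, whereas gather() stacks one column after another.

```r
library(tidyr)

# Toy wide table: one row per country, one column per measure.
toy <- data.frame(
  Country    = c("Indonesia", "Brazil"),
  Total      = c(1135, 1070),
  `bird (B)` = c(525, 259),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except Country becomes a Variable/Value pair.
toy_long <- pivot_longer(
  toy,
  cols      = -Country,
  names_to  = "Variable",
  values_to = "Value"
)

toy_long
```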
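The variables now sit in a single column, ‘Variable’. This column is converted to a factor with explicit levels, so that the facets appear in the listed order rather than in sorted order.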
DataLong$Variable <- factor(DataLong$Variable,
levels = c("Total", "amphibian (A)", "bird (B)", "mammal (M)", "freshwater crab (Cr)", "shark and ray (S)", "reef-forming coral (C)"))
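The effect of setting the levels explicitly can be seen on a small character vector. This is only a sketch with hypothetical object names (vars, default_factor, ordered_factor): without levels, factor() sorts the distinct values, and with levels, the given order is kept.

```r
vars <- c("bird (B)", "Total", "amphibian (A)", "Total")

# Without explicit levels the order is whatever sort() gives in the current locale.
default_factor <- factor(vars)

# With explicit levels the order is exactly the one supplied.
ordered_factor <- factor(vars, levels = c("Total", "amphibian (A)", "bird (B)"))

levels(default_factor)
levels(ordered_factor)
```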
The data has now been preprocessed and is ready to be visualised as a faceted bar chart. The base plot is built first, with flipped coordinates, one facet per variable, a title and a source caption.
Viz <- ggplot(data = DataLong,
              aes(x = reorder(Country, Value), y = Value, fill = Country)) +
  geom_bar(stat = "identity") + coord_flip() +
  facet_grid(. ~ Variable, scales = "free") +
  labs(title = "Top 30 Countries by number of endemic species, broken down by taxonomic group - 2021",
caption = "Source: IUCN Red List. (2021). *Table 8a: Total endemic and threatened endemic species in each country (totals by taxonomic group): VERTEBRATES*
https://nc.iucnredlist.org/redlist/content/attachment_files/2021-3_RL_Stats_Table_8a_v2.pdf
IUCN Red List. (2021). *Table 8b: Total endemic and threatened endemic species in each country (totals by taxonomic group): INVERTEBRATES*
https://nc.iucnredlist.org/redlist/content/attachment_files/2021-3_RL_Stats_Table_8b_v2.pdf")
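In the call above, reorder(Country, Value) uses the default summary of reorder(), which is the mean of Value for each country across all facets. Because the Total rows are part of the long data, that mean should follow the total as long as no values are missing. A sketch of making the ordering by total explicit is given below. It uses toy data with figures taken from the output shown earlier, and the names toy_long, totals and country_order are hypothetical.

```r
# Toy long data: Total and bird counts for four countries.
toy_long <- data.frame(
  Country  = rep(c("Australia", "Brazil", "China", "Indonesia"), times = 2),
  Variable = rep(c("Total", "bird (B)"), each = 4),
  Value    = c(992, 1070, 634, 1135, 359, 259, 70, 525),
  stringsAsFactors = FALSE
)

# Order the countries by their Total only, smallest first.
totals <- toy_long[toy_long$Variable == "Total", ]
country_order <- totals$Country[order(totals$Value)]

# With coord_flip(), the last level is drawn at the top of the chart.
toy_long$Country <- factor(toy_long$Country, levels = country_order)

levels(toy_long$Country)
```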
Colours for each country’s bar are defined. The colours are based on the ‘average’ colour of the country’s flag using the online tool: https://matkl.github.io/average-color/. A background colour is also defined.
Col <- c("#a1c6e5",
"#323274",
"#917523",
"#2f9a37",
"#99762f",
"#373871",
"#df2c0e",
"#b1742c",
"#986a8f",
"#4973b2",
"#bb8630",
"#a8ad64",
"#ff7f7f",
"#9a937d",
"#f2ced6",
"#a99376",
"#ae6274",
"#94736e",
"#2b2e70",
"#6e1916",
"#e65a69",
"#935a78",
"#2d666f",
"#545a5a",
"#b26a2e",
"#c22840",
"#287c53",
"#96647a",
"#ba7b8e",
"#9c5538",
"#dd2f15")
background <- "#E0E0E0"
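An unnamed colour vector is matched to the countries in the order of the fill scale, which is usually alphabetical for character data. If the pairing of colour and country matters, a named vector makes it explicit, since scale_fill_manual() matches named values by name. The sketch below shows the idea on two countries. The pairing of hex codes with countries is hypothetical, and toy, country_cols and p are illustrative names.

```r
library(ggplot2)

toy <- data.frame(
  Country = c("Brazil", "Indonesia"),
  Value   = c(1070, 1135),
  stringsAsFactors = FALSE
)

# Named vector: each colour is tied to a country by name, not by position.
# The colour assignments here are assumptions made for illustration.
country_cols <- c("Indonesia" = "#dd2f15", "Brazil" = "#2f9a37")

p <- ggplot(toy, aes(x = Country, y = Value, fill = Country)) +
  geom_col() +
  scale_fill_manual(values = country_cols)
```

Calling print(p) would draw the two bars with the colours assigned by name, regardless of the order in which the vector is written.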
Colours are applied and the title of the chart is made bold.
Viz <- Viz +
  scale_fill_manual(values = Col) +
  theme(plot.background = element_rect(fill = background),
        panel.background = element_rect(fill = background),
        title = element_text(face = "bold"))
The legend, both axis titles, and the value-axis text and ticks are removed.
Viz <- Viz + theme(legend.position = "none",
                   axis.title.x = element_blank(),
                   axis.title.y = element_blank(),
                   axis.text.x = element_blank(),
                   axis.ticks.x = element_blank())
Labels are added to each bar to show the count in its respective facet.
# The bars were already drawn when Viz was first built, so only the text layer is added here.
Viz <- Viz +
  geom_text(aes(label = Value), position = position_dodge(width = 0.9), vjust = 0.25, hjust = .95, size = 3)
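Data Reference
IUCN Red List. (2021). Table 8a: Total endemic and threatened endemic species in each country (totals by taxonomic group): VERTEBRATES. https://nc.iucnredlist.org/redlist/content/attachment_files/2021-3_RL_Stats_Table_8a_v2.pdf
IUCN Red List. (2021). Table 8b: Total endemic and threatened endemic species in each country (totals by taxonomic group): INVERTEBRATES. https://nc.iucnredlist.org/redlist/content/attachment_files/2021-3_RL_Stats_Table_8b_v2.pdf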
The following plot fixes the main issues in the original.
print(Viz)