Objective
The original data visualisation uses a “cylinder and pipe” layout to show the source of North Carolina’s freshwater (surface water or groundwater) and the purposes for which that water was used in 2010.
The broader aim of the source article is to assess the stress on existing supplies and to evaluate and improve possible water-supply management options.
The visualisation chosen had the following three main issues:
The original diagram does not clearly show how water from each source is distributed across the eight use categories. From the diagram alone, a reader cannot tell which source supplied the most or the least water to each category.
The diagram depicts the flow of water from its source hierarchically: pipes lead out of the surface-water and groundwater cylinders to the top row and then flow on to the bottom row. This is misleading, because it suggests that water passes from one row of categories to the next, which misrepresents how the water is actually distributed.
The original image uses green and blue, a combination that can be hard to tell apart for some colour-blind readers and is best avoided.
Reference
United States Geological Survey (USGS). (2010). Water Use in North Carolina, 2010. Retrieved September 17, 2019, from USGS.gov: https://www.usgs.gov/centers/sa-water/science/water-usenc-2010?qt-science_center_objects=0#qt-science_center_objects
The following code was used to fix the issues identified in the original.
library(ggplot2)

# Freshwater use in North Carolina, 2010, in million gallons per day,
# by source (surface water or groundwater) and by use category
df2 <- data.frame(Source_water = rep(c("Surface", "Groundwater"), each = 8),
                  cat = rep(c("Public Supply", "Domestic", "Irrigation", "Livestock",
                              "Aquaculture", "Industrial", "Mining", "Thermal"), 2),
                  len = c(766, 0.9, 279, 15, 1450, 188, 4.9, 7660,
                          194, 231, 88, 57, 12, 84, 28, 0.99))
head(df2)
## Source_water cat len
## 1 Surface Public Supply 766.0
## 2 Surface Domestic 0.9
## 3 Surface Irrigation 279.0
## 4 Surface Livestock 15.0
## 5 Surface Aquaculture 1450.0
## 6 Surface Industrial 188.0
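head() prints only the first six rows. As a small sketch (not part of the original analysis), the same sixteen values can be laid out as a category-by-source table with base R's xtabs(), which puts both sources side by side for every use and makes the per-source totals easy to compute. The object names wide and source_totals are illustrative only.

```r
# Rebuild the data frame used above (million gallons per day, 2010)
df2 <- data.frame(Source_water = rep(c("Surface", "Groundwater"), each = 8),
                  cat = rep(c("Public Supply", "Domestic", "Irrigation", "Livestock",
                              "Aquaculture", "Industrial", "Mining", "Thermal"), 2),
                  len = c(766, 0.9, 279, 15, 1450, 188, 4.9, 7660,
                          194, 231, 88, 57, 12, 84, 28, 0.99))

# Category-by-source table: one row per use, one column per source
# (xtabs sums len for each cat/Source_water combination)
wide <- xtabs(len ~ cat + Source_water, data = df2)
print(wide)

# Total use supplied by each source
source_totals <- colSums(wide)
print(source_totals)
```

The table shows at a glance that surface water dominates overall, largely because of thermoelectric use.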
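# Dodged bar chart: one bar per source within each use category.
# y is the natural log of the flow; values below 1 (0.9 and 0.99) give
# negative logs, so those bars point downward. Labels show the raw values.
plt <- ggplot(data = df2, aes(x = cat, y = log(len), fill = Source_water)) +
  geom_col(position = position_dodge()) +
  geom_text(aes(label = len), vjust = 1.6, color = "black",
            position = position_dodge(0.9), size = 3.5) +
  labs(title = "Source and Use of Freshwater in North Carolina, 2010",
       x = "Category",
       y = "Natural log of million gallons per day")
plt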
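The chart plots log(len) rather than the raw flows. A quick check of the values, sketched below under the assumption that the goal is simply to inspect the transform, shows why: the flows span roughly four orders of magnitude (about 3.9 in base 10), so on a linear axis most categories would be invisible next to surface-water thermoelectric use. It also confirms which two values fall below 1 and therefore produce negative bars. The names span and y are illustrative only.

```r
# Flows in million gallons per day, in the same order as df2$len
len <- c(766, 0.9, 279, 15, 1450, 188, 4.9, 7660,
         194, 231, 88, 57, 12, 84, 28, 0.99)

# Spread of the data in orders of magnitude (base-10 log of max/min)
span <- log10(max(len) / min(len))
print(round(span, 2))

# Natural logs, as used for the y aesthetic; flows below 1 become negative
y <- log(len)
print(len[y < 0])
```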
Code Reference
The code for the bar chart was adapted from: http://www.sthda.com/english/wiki/ggplot2-barplots-quick-start-guide-r-software-and-data-visualization
The following plot fixes the main issues in the original.
The reconstruction, a grouped bar chart, is one of the most intuitive ways to compare how water from each source is distributed across the use categories, and differences in supply are easy to spot. Groundwater supplied the most water for domestic use and the least for thermoelectric power, whereas surface water supplied the most for thermoelectric power and the least for domestic use. Because the flows range from 0.9 to 7,660 million gallons per day, the y-axis uses a log scale so that the smaller categories remain visible.
The bar chart shows how North Carolina’s water is distributed across the eight categories without suggesting that water flows from one row to the next.
To avoid the colour-blindness problem, the bars are coloured red and turquoise (ggplot2’s default fill colours for a two-level factor) instead of green and blue.
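The maximum and minimum claims above can be checked directly from the data. The sketch below, which is an illustration rather than part of the original analysis, splits the data by source and picks the category with the largest and smallest flow using which.max() and which.min(). The names by_source and extremes are illustrative only.

```r
# Same data as the plot (million gallons per day, 2010)
df2 <- data.frame(Source_water = rep(c("Surface", "Groundwater"), each = 8),
                  cat = rep(c("Public Supply", "Domestic", "Irrigation", "Livestock",
                              "Aquaculture", "Industrial", "Mining", "Thermal"), 2),
                  len = c(766, 0.9, 279, 15, 1450, 188, 4.9, 7660,
                          194, 231, 88, 57, 12, 84, 28, 0.99),
                  stringsAsFactors = FALSE)

# For each source, find the use category with the largest and smallest flow
by_source <- split(df2, df2$Source_water)
extremes <- t(sapply(by_source, function(d) {
  c(max_use = d$cat[which.max(d$len)],
    min_use = d$cat[which.min(d$len)])
}))
print(extremes)
```

The result has one row per source (Groundwater, Surface) and agrees with the reading of the chart.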
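The plotting code never sets colours, so the red and turquoise come from ggplot2’s default hue palette, which is provided by the scales package. A minimal sketch, assuming you want the colours to be explicit so they do not change if the palette defaults ever do, is to read the defaults with scales::hue_pal() and pass them to scale_fill_manual(). The name fill_scale is illustrative only; it could be added to the existing plot with plt + fill_scale.

```r
library(ggplot2)
library(scales)

# Fill colours ggplot2 assigns by default to a two-level factor
default_fills <- hue_pal()(2)
print(default_fills)

# Levels sort alphabetically, so Groundwater gets the first colour (red)
# and Surface the second (turquoise); this pins that mapping explicitly
fill_scale <- scale_fill_manual(values = c(Groundwater = default_fills[1],
                                           Surface = default_fills[2]))
```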