Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The visualisation that I have selected to critique was taken from the Observatory of Economic Complexity (OEC), an online data visualization and distribution platform focused on the geography and dynamics of economic activities (OEC 2022).
Specifically, I have selected a visualisation that charts the evolution of market concentration of exports of tomato ketchup and other tomato sauces from around the globe and aims to demonstrate trends over the past 10 years (OEC 2022).
Based on information from Similarweb, a website and audience insights company, the main audience for the OEC website is visitors aged between 25 and 34 who are referred from news and government websites (Similarweb 2022).
For the visualisation I have selected, however, I believe the audience would be business analysts looking for market trends, tomato producers looking for growth markets, and business futurists looking to predict what the next 10 years might hold for the value of exports in tomato ketchup and other tomato sauces.
Visualisation Critique
There are three key issues with the visualisation.
The author has chosen a stacked area graph to display the data, a chart type with several disadvantages. The biggest is that the reader tends to focus on the uppermost values, which can lead to a poor interpretation of the patterns on display. This is because each area sits on top of the preceding set of values, using it as a baseline, and so directly mirrors its trend (VizWiz 2012).
Furthermore, it is incredibly difficult to judge how the size of each country or continent changes over time, and therefore to pick out distinct patterns.
Lastly, a stacked area graph with so many variables restricts the audience to those who can best understand the complexity of the data being presented.
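The baseline effect described above can be illustrated with a minimal R sketch. The figures are made up for illustration and are not taken from the OEC data; the names lower, upper and upper_edge are hypothetical. The sketch stacks a completely flat series on top of a rising one and compares the series' own values with the edge a reader would actually see.

```r
# Hypothetical trade values (USD millions); not taken from the OEC data.
years <- 2016:2020
lower <- c(10, 20, 30, 40, 50)   # bottom series: rising steadily
upper <- c(15, 15, 15, 15, 15)   # series stacked on top: completely flat

# In a stacked area graph the top edge of the upper band is drawn at the
# running total, not at the series' own values.
upper_edge <- lower + upper

stacked <- data.frame(Year = years, Lower = lower, Upper = upper,
                      UpperEdge = upper_edge)
print(stacked)

# Year-on-year change in the actual values of the upper series
print(diff(upper))
# Year-on-year change in the edge the reader sees for the upper series
print(diff(upper_edge))
```

Although the upper series never changes, its drawn edge climbs in step with the series beneath it, which is why the uppermost areas of a stacked area graph are easy to misread.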
At first glance, it is apparent that the visualisation shows the value of exports in tomato ketchup and other tomato sauces by country, grouped by continent, with the United States leading exports globally. However, the labels use inconsistent font sizes and appear for only three countries, making it impossible to compare values with other countries.
The legend used in the visualisation is just 7 coloured boxes with no labels or other context, making it nearly impossible to determine what the legend represents. Given that the United States is one colour and both the Netherlands and Italy are a different colour, we can assume that the legend represents continents, but for the countries at the top of the visualisation we cannot ascertain which continents those values belong to.
Solution
To remedy the issues that I have identified above, I have decided to create an individual line graph that aggregates the trade value by continent, instead of looking at the data by individual countries. This will condense the amount of visual information down to a more manageable level, while still demonstrating the global value of tomato ketchup and tomato sauce exports.
Please refer to the Code tab to see the code that I wrote to create the new line graph.
Reference
The below code was created to remedy the issues that were identified in the original data visualisation.
As mentioned in the Original tab, I have decided to create an individual line graph that aggregates the trade value by continent, instead of looking at the data by individual countries.
To achieve this, I first imported the data using the read.csv() function and then used the aggregate() function to create a new data frame that sums the Trade.Value values for each Continent and Year. I then used ggplot2 to create a line graph with Year on the x-axis and the aggregated trade value (TradeValue) on the y-axis, drawing one line per Continent. Using scale_x_continuous(), scale_y_continuous(), labs() and theme(), I adjusted the overall design of the graph so that the final visualisation is easy to read.
# Load the 'ggplot2' and 'scales' packages
library(ggplot2)
library(scales)
# Import the csv file into a data frame
df <- read.csv('Value-of-Exports-in-Ketchup.csv')
# Sum the trade value for each continent and year
new_df <- aggregate(df$Trade.Value, by = list(df$Continent, df$Year), FUN = sum)
# Name the columns of the aggregated data frame
colnames(new_df) <- c("Continent", "Year", "TradeValue")
# Create the line graph with ggplot2
# Note: 'linewidth' requires ggplot2 >= 3.4.0; on older versions use 'size' instead
df_plot <- ggplot(new_df, aes(x = Year, y = TradeValue, group = Continent)) +
  geom_line(aes(color = Continent), linewidth = 1) +
  geom_point(aes(color = Continent), size = 2) +
  # Show y-axis values in millions
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "Year",
       y = "Value of Global Exports ($USD)",
       title = "Value of Exports in Tomato Ketchup and other Tomato Sauces",
       subtitle = "by Continent from 2010-2020",
       caption = "Data Sourced from: The Observatory of Economic Complexity (2020)") +
  theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(size = 10, hjust = 0.5),
        plot.caption = element_text(size = 9, hjust = 0.5))
# Display the graph
df_plot
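To make the aggregation step in the code above easier to follow, here is a minimal sketch on a handful of made-up rows. The data frame toy, its Country column and all of its values are hypothetical; only the column names Continent, Year and Trade.Value follow the code above. The sketch uses the formula interface of aggregate(), which is an alternative to the by = list(...) call and keeps the original column names, so no renaming step is needed.

```r
# Hypothetical rows using the same column names as the csv above;
# the values are made up for illustration.
toy <- data.frame(
  Country     = c("Italy", "Netherlands", "United States", "Italy", "United States"),
  Continent   = c("Europe", "Europe", "North America", "Europe", "North America"),
  Year        = c(2019, 2019, 2019, 2020, 2020),
  Trade.Value = c(100, 50, 200, 120, 210)
)

# Formula interface: sum Trade.Value within each Continent/Year pair.
toy_totals <- aggregate(Trade.Value ~ Continent + Year, data = toy, FUN = sum)
print(toy_totals)
```

Each Continent and Year pair collapses to a single row, so the two European rows for 2019 are summed into one total. This is what lets the line graph draw one line per continent rather than one per country.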
Data Reference
In this final section, you will be able to see the newly generated data visualisation, which fixes the issues with the original data visualisation listed in the Original tab.