In this assignment, I take a data visualization with obvious issues and reconstruct it to produce a version with those issues removed.
Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective:
The visualization aims to show viewers the highest-valued brands of 2020 and their industries, and to let them compare one brand’s value against the others, although it does this poorly. The target audience ranges from casual web surfers who are interested in their favorite brands to financial advisers and investors who want to know a brand’s value before investing in it.
The visualization chosen had the following three main issues:
Insufficient Information: An obvious and immediately noticeable issue is that most brands are identified only by their logos, with no explicit brand name. In the Asian region, for example, quite a few companies are denoted by Chinese characters. A well-versed financial analyst might recognize a brand from its logo alone, but it is reasonable to say that the majority of the audience wouldn’t be able to name all the brands, particularly the lesser-known and non-English ones.
Imprecision: Using differently sized bubbles to convey brand values looks good but lacks precision. Europe is a good example: many of its bubbles appear equal in size even though the brands have different values. This imprecise representation can leave the audience frustrated, as they cannot read off the exact value of the brand they’re interested in.
Color Bombardment: Using colors to denote categories is essential, but here about 16 different colors represent the different brand industries. The North American part of the visualization, for example, contains almost every color in the legend. Such a bright and wide palette causes confusion, increases the time the audience needs to work out which industry a brand belongs to, and adds to the visual bombardment imposed on the viewer. In addition, the hundred bubbles seem to be scattered all over the place, leaving the audience with no definite starting point.
Reference
The following code was used to fix the issues identified in the original.
Loading up the essential packages:
library(readxl)
library(forcats)
library(extrafont)
library(ggthemes)
library(ggplot2)
loadfonts(device = "win")
Reading the data into the dataframe brands:
setwd("C:/Users/david/Desktop/Data Viz R")
brands <- read_excel("Assignment2.xlsx")
head(brands)
## # A tibble: 6 x 5
##   Name              Country     Region      `2020 Brand Value (in millio~ Sector
##   <chr>             <chr>       <chr>                               <dbl> <chr>
## 1 Amazon(220 Billi~ United Sta~ North Amer~                        220791 Retail
## 2 Google            United Sta~ North Amer~                        159722 Tech
## 3 Apple             United Sta~ North Amer~                        140524 Tech
## 4 Microsoft         United Sta~ North Amer~                        117072 Tech
## 5 Samsung Group     South Korea Asia                                94494 Tech
## 6 ICBC              China       Asia                                80791 Banki~
Looks good! We create a variable Rank, which stores each brand’s ranking by value, and reorder the columns so that Rank comes first.
brands$Rank = rank(-brands$`2020 Brand Value (in millions USD)`, ties.method = "first")
brands <- subset(brands, select=c(Rank,Name:Sector))
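To see what the ranking step does, here is a minimal sketch on a made-up vector (the numbers are illustrative and not taken from the data set). Negating the values makes the largest value rank 1, and ties.method = "first" breaks ties by order of appearance, so every brand gets a distinct rank.

```r
# Toy brand values in millions USD (hypothetical numbers, including one tie)
value <- c(500, 900, 500, 100)

# Negate so the largest value receives rank 1;
# with ties.method = "first", the earlier of two tied elements ranks higher
brand_rank <- rank(-value, ties.method = "first")
print(brand_rank)
```

The two tied values of 500 receive ranks 2 and 3 rather than a shared rank of 2.5, which keeps the rank labels whole numbers.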
Creating a new variable by converting the brand value from millions to billions and rounding to two decimal places for better readability.
brands$`Brand Value (in Billions USD)` <- round(brands$`2020 Brand Value (in millions USD)`/1000,2)
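As a quick check of the conversion, the sketch below applies the same arithmetic to the first three values shown in the head(brands) output; the variable names are only for illustration.

```r
# First three brand values from the head(brands) output, in millions USD
value_millions <- c(220791, 159722, 140524)

# Divide by 1000 to get billions, then round to two decimal places
value_billions <- round(value_millions / 1000, 2)
print(value_billions)
```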
Checking the number of brands in each sector:
table(brands$Sector)
##
##        Aerospace & Defence                Automobiles
##                          1                         10
##                    Banking        Commercial Services
##                         11                          7
## Engineering & Construction              Food & Drinks
##                          5                          7
##                  Insurance                  Logistics
##                          4                          2
##                      Media       Mining, Iron & Steel
##                          7                          2
##                  Oil & Gas                     Others
##                          7                          6
##                Real Estate                     Retail
##                          1                         10
##                       Tech                   Telecoms
##                         10                         10
Since we’d like to have fewer categories, similar industries are collapsed together, and sectors with few associated brands are collapsed into Others.
brands$Sector_collapsed <- fct_collapse(brands$Sector,
  `Engineering & Commercial Services` = c("Engineering & Construction",
                                          "Commercial Services"),
  Automobiles = "Automobiles",
  `Food & Drinks` = "Food & Drinks",
  Retail = "Retail",
  `Banking & Media` = c("Banking", "Media"),
  `Oil & Gas` = "Oil & Gas",
  `Tech & Telecoms` = c("Tech", "Telecoms"),
  Others = c("Others", "Aerospace & Defence", "Real Estate",
             "Mining, Iron & Steel",
             "Insurance", "Logistics")
)
table(brands$Sector_collapsed)
##
##                            Others                       Automobiles
##                                16                                10
##                   Banking & Media Engineering & Commercial Services
##                                18                                12
##                     Food & Drinks                         Oil & Gas
##                                 7                                 7
##                            Retail                   Tech & Telecoms
##                                10                                20
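The collapsing step can be tried in isolation. The sketch below runs fct_collapse() on a small hypothetical sector column (the rows are made up; only the sector names come from the data). Levels that are not mentioned, such as Retail here, are kept unchanged, and the total number of rows stays the same.

```r
library(forcats)

# Toy sector column (hypothetical rows, using sector names from the data)
sector <- factor(c("Tech", "Telecoms", "Tech", "Banking",
                   "Media", "Retail", "Logistics"))

# Merge related levels; levels not listed here are left as they are
collapsed <- fct_collapse(sector,
  `Tech & Telecoms` = c("Tech", "Telecoms"),
  `Banking & Media` = c("Banking", "Media"),
  Others            = "Logistics"
)

counts <- table(collapsed)
print(counts)

# Collapsing only merges levels, so the total row count is unchanged
stopifnot(sum(counts) == length(sector))
```

Comparing the totals before and after, as the last line does, is a cheap way to confirm that no brand was dropped while regrouping.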
Looks a lot better now. Next, we set the level order of the Sector_collapsed and Region variables and assign appropriate labels.
brands$Sector_collapsed <- factor(brands$Sector_collapsed,
  levels = c("Retail",
             "Tech & Telecoms",
             "Engineering & Commercial Services",
             "Automobiles",
             "Oil & Gas",
             "Food & Drinks",
             "Banking & Media",
             "Others"),
  labels = c("Retail",
             "Technology & Telecommunication",
             "Engineering & Commercial Services",
             "Automobiles",
             "Oil & Gas",
             "Food & Drinks",
             "Banking & Media",
             "Others*")
)
brands$Region <- factor(brands$Region,
  levels = c("North America",
             "Asia",
             "Europe",
             "Middle East"),
  labels = c("North America",
             "Asia",
             "Europe",
             "Middle East")
)
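The roles of levels and labels in factor() are easy to mix up, so here is a minimal base R sketch on a hypothetical three-element vector. levels fixes the order of the categories, and labels renames them, matched to levels by position. Any value missing from levels would become NA, so the levels vector needs to list every category.

```r
# Toy vector using two of the collapsed sector names
sector <- c("Others", "Retail", "Others")

sector_f <- factor(sector,
  levels = c("Retail", "Others"),   # order of the categories
  labels = c("Retail", "Others*")   # display names, matched to levels by position
)

print(levels(sector_f))
print(as.character(sector_f))

# A value that is not listed in levels is turned into NA
unlisted <- factor(c("Retail", "Banking"), levels = "Retail")
print(is.na(unlisted))
```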
Now that we have our dataframe ready, we move onto the important part - plotting! We take it step by step.
# Base layer: horizontal bars, ordered by brand value, filled by region
b1 <- ggplot(data=brands, aes(x=reorder(Name, `Brand Value (in Billions USD)`),
                              y=`Brand Value (in Billions USD)`, fill=Region)) +
  geom_col() + coord_flip()

# One panel per sector; rank printed inside each bar, value printed past its end
b2 <- b1 + facet_wrap(vars(Sector_collapsed), scales="free_y") +
  geom_text(aes(label=Rank, y=16), size=18.3, hjust="right", family="Georgia", color="white") +
  geom_text(aes(label=`Brand Value (in Billions USD)`), hjust=-0.35, size=18.2, family="Georgia") +
  theme(strip.text.x=element_text(size=60, angle=0, family="Georgia"))

# Axis text, no grid lines, light blue panel background
# (ggplot2 3.4.0 and later prefer linewidth= over size= in element_rect())
b3 <- b2 + theme(axis.text.x=element_text(size=49, angle=0, face="bold"),
                 axis.text.y=element_text(size=62, face="bold"),
                 panel.grid.major=element_blank(), panel.grid.minor=element_blank(),
                 panel.background=element_rect(fill="#C9EBFF", colour="#C9EBFF",
                                               size=0.5, linetype="solid"))

# Four region colors instead of one color per industry; legend styling and placement
b4 <- b3 + scale_fill_manual(values=c("#820778", "#021F92", "#A10404", "#010101")) +
  labs(fill="Region") +
  theme(legend.title=element_text(size=75, family="Georgia"),
        legend.text=element_text(size=65, family="Georgia"),
        legend.background=element_rect(fill="grey90"),
        axis.title.x=element_text(vjust=0, hjust=0),
        axis.title.y=element_blank()) +
  theme(legend.position=c(1, 0), legend.justification=c(1.5, -0.3),
        legend.key.size=unit(5, "cm"), legend.key.width=unit(5, "cm"),
        legend.margin=margin(4, 4, 4, 4, "cm"),
        legend.title.align=0.5)

# Title, subtitle and caption (the caption string spans two lines on purpose)
b5 <- b4 + labs(title="The Top 100 Most Valuable Brands in 2020",
                subtitle="Companies Ranked by Brand Valuation Across Industries (ranks shown inside each bar)",
                caption="Source: Brand Finance - https://brandirectory.com/rankings/global/2020/table
*Others includes the Aerospace, Real Estate, Mining, Iron and Steel, Insurance and Logistics Industries")

# Global font and final text sizes
b6 <- b5 + theme(text=element_text(family="Georgia"),
                 title=element_text(size=80, face="bold"),
                 plot.subtitle=element_text(size=70, face="plain"),
                 plot.caption=element_text(size=60, face="plain"),
                 axis.title.x=element_text(size=62, face="plain"))
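The core of the plot (ordered bars, flipped coordinates, one panel per sector) can be tried on a small scale. The sketch below uses a hypothetical four-row data frame whose column names and values are made up for illustration, and leaves out all the theme settings. It also shows how reorder() sorts the names by value, which is what puts the most valuable brand at the top of each panel once the axes are flipped.

```r
library(ggplot2)

# Toy data frame (hypothetical brands and values, not from the data set)
toy <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Value  = c(30, 80, 50, 10),
  Region = c("Asia", "Europe", "Asia", "Europe"),
  Sector = c("Retail", "Retail", "Tech", "Tech")
)

# reorder() sorts the factor levels of Name by Value, in ascending order
name_order <- levels(reorder(toy$Name, toy$Value))
print(name_order)

p <- ggplot(toy, aes(x = reorder(Name, Value), y = Value, fill = Region)) +
  geom_col() +                                     # bar height taken directly from Value
  coord_flip() +                                   # horizontal bars
  facet_wrap(vars(Sector), scales = "free_y") +    # one panel per sector, as in the code above
  geom_text(aes(label = Value), hjust = -0.35) +   # value printed just past the bar end
  labs(x = NULL, y = "Value")
```

Printing p draws the chart. Because the first factor level sits at the bottom of the flipped axis, the ascending order from reorder() places the largest value at the top.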
Data Reference
A clear and precise horizontal bar chart, faceted by industry, was created using ggplot. Each brand’s bar shows its overall rank among the 100 brands, its region of origin (as the fill color), and its brand value in billions (USD).
The following plot fixes the main issues in the original.