Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Larry Kim (2019).


Part I – Choose your data visualisation
* The above data visualisation has multiple issues, e.g. perception, deception and colour issues.
* The data source of this visualisation can be found in an HTML table on this web page: https://www.wordstream.com/articles/most-expensive-keywords
* To my knowledge, and after searching Google for the title (“Top 20 Most Expensive Keywords”) of the visualisation, no critique of it has already been published.

Part II – Deconstruct
Objective

The objective of the original data visualisation is to show the top 20 most expensive keywords (highest cost per click) in Google AdWords. The target audience could be any organisation that provides online advertising, any person or organisation that wants to do online marketing through a search engine or a similar platform, or a general audience interested in online advertising.

The visualisation chosen had the following three main issues:

  • The pie chart uses 20 different colours. Although the smaller slices are annotated, pie charts are normally recommended to have 5-7 colours (based on short-term memory; MacDonald 1999) and no more than 1 colours (Buts 2012).
  • The percentages and the slice areas do not match: the chart labels #1 as 24% and #2 as 12.8%, which is 36.8% in total, yet the two slices take up more than 50% of the pie chart area. The percentages are also not really relevant to the audience, because the focus of the visualisation is the cost of the keywords rather than their frequency.
  • The colour choices are also problematic: #1 is red and the adjacent #2 is green, which can be an issue for an audience with red/green colour blindness. The colour associations are also not ideal and do not reflect the keywords (e.g. red is used for keyword #7, Donate).
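
The mismatch described in the second issue can be checked with a little arithmetic. The sketch below uses only the two percentages quoted above (24% and 12.8%); the variable names are illustrative. It works out the angle the two slices should span if the pie were drawn to scale.

```r
# Percentages quoted on the original chart for keywords #1 and #2
share <- c(first = 24.0, second = 12.8)

# Combined share of the pie, in percent
combined_share <- sum(share)

# Angle (in degrees) the two slices should span in a correctly scaled pie
combined_angle <- combined_share / 100 * 360

# A half pie is 180 degrees, so the two slices should stay well below that
combined_angle < 180
```

If the two slices visibly cover more than half of the circle, as they do in the original, the drawn areas cannot correspond to the labelled percentages.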

Reference

Code

The following code was used to fix the issues identified in the original.

library(rvest)
library(ggplot2)
library(tidyverse)

# Read data from the web page into variable "keywordsdata"
webtables <- read_html("https://www.wordstream.com/articles/most-expensive-keywords")

keywordsdata <- html_table(html_nodes(webtables,"table")[[1]])

# Rename a variable of the dataset to make it easier for following data processing
plotdata <- keywordsdata %>% rename(CPC = "Cost per Click (CPC)")

# Remove the "$" sign and change the data format to numeric
plotdata$CPC <- as.numeric(sub("\\$","", plotdata$CPC))

# Factorise the "Keyword" variable of the dataset and order the levels of the factor
plotdata$Keyword <- plotdata$Keyword %>% factor(levels = plotdata$Keyword[order(-plotdata$CPC)])

# Plot a bar chart with Keyword sorted by value and in blue color
p1 <- ggplot(plotdata,aes(x=Keyword,y=CPC))+
  geom_bar(stat = "identity", fill = "dodgerblue4")+
  theme(axis.text.x=element_text(angle = 45,hjust=1))+
  geom_text(aes(label=CPC),vjust=-0.5,size=3)+
  labs(title = "The 20 Most Expensive Keywords in Google Ads", x="Keywords", y="Costs per Click (CPC)")
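
The cleaning and ordering steps above can be tried without web access on a small stand-in table. This is only a sketch: the keywords and prices in `toy` are hypothetical values made up for illustration, not the WordStream figures, and it uses base R only.

```r
# Hypothetical rows for illustration only; not the real WordStream data
toy <- data.frame(
  Keyword = c("alpha", "beta", "gamma"),
  CPC     = c("$12.50", "$40.00", "$7.25"),
  stringsAsFactors = FALSE
)

# Remove the "$" sign and convert the column to numeric
toy$CPC <- as.numeric(sub("\\$", "", toy$CPC))

# Order the factor levels by descending CPC, so that a bar chart
# draws the most expensive keyword first
toy$Keyword <- factor(toy$Keyword, levels = toy$Keyword[order(-toy$CPC)])

levels(toy$Keyword)
```

Setting the factor levels explicitly matters because ggplot2 orders a discrete axis by factor level; without it, the bars would appear in alphabetical order rather than by cost.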

Data Reference

Reconstruction

The following plot fixes the main issues in the original.