Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Part I – Choose your data visualisation
* The data visualisation above has multiple issues, including perception, deception and colour issues.
* The data source for this visualisation is an HTML table on this web page: https://www.wordstream.com/articles/most-expensive-keywords
* To my knowledge, no critique of this visualisation has been published; a Google search for its title (“Top 20 Most Expensive Keywords”) did not find any.
Part II – Deconstruct
Objective
The objective of the original data visualisation is to show the top 20 most expensive keywords (highest cost per click) in Google AdWords. The target audience could be any organisation that provides online advertising, any person or organisation that wants to market online via a search engine or similar platform, or a general audience interested in online advertising.
The visualisation chosen had the following three main issues:
Reference
Kim, L. 2019. “How Does Google Make Money? The Most Expensive Keywords in AdWords.” WordStream. Retrieved August 29, 2019. https://www.wordstream.com/blog/ws/2011/07/18/most-expensive-keywords-google-adwords
Buts, A. 2012. “2. Visual Perception: Optimizing Information Visualization regarding the human visual system.” https://www.medien.ifi.lmu.de/lehre/ws1112/iv/folien/IV-W11-02-Perception.pdf.
MacDonald, L. W. 1999. “Using color effectively in computer graphics.” IEEE Computer Graphics and Applications 19 (4): 20–35. doi:10.1109/38.773961.
The following code was used to fix the issues identified in the original.
library(rvest)
library(ggplot2)
library(tidyverse)
# Read the data from the web page into the variable "keywordsdata"
webtables <- read_html("https://www.wordstream.com/articles/most-expensive-keywords")
keywordsdata <- html_table(html_nodes(webtables,"table")[[1]])
# Rename a variable of the dataset to make it easier for following data processing
plotdata <- keywordsdata %>% rename(CPC = "Cost per Click (CPC)")
# Remove the "$" sign and change the data format to numeric
plotdata$CPC <- as.numeric(sub("\\$","", plotdata$CPC))
# Factorise the "Keyword" variable of the dataset and order the levels of the factor
plotdata$Keyword <- plotdata$Keyword %>% factor(levels = plotdata$Keyword[order(-plotdata$CPC)])
# Plot a bar chart with keywords sorted by CPC (descending), in a single blue colour
p1 <- ggplot(plotdata, aes(x = Keyword, y = CPC)) +
  geom_col(fill = "dodgerblue4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_text(aes(label = CPC), vjust = -0.5, size = 3) +
  labs(title = "The 20 Most Expensive Keywords in Google Ads", x = "Keywords", y = "Cost per Click (CPC)")
# Display the plot
p1
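The two cleaning steps in the code above (stripping the "$" sign and ordering the factor levels by CPC) can be tried offline on a small data frame. This is only a minimal sketch in base R: `sample_data` and its three rows are hypothetical values made up for illustration, not rows from the WordStream table.

```r
# Hypothetical sample rows; the values are illustrative only,
# not taken from the WordStream table
sample_data <- data.frame(
  Keyword = c("alpha", "beta", "gamma"),
  CPC = c("$12.50", "$48.00", "$30.25"),
  stringsAsFactors = FALSE
)

# Remove the "$" sign and convert the column to numeric
sample_data$CPC <- as.numeric(sub("$", "", sample_data$CPC, fixed = TRUE))

# Order the factor levels by descending CPC so that a bar chart
# draws the most expensive keyword first
sample_data$Keyword <- factor(
  sample_data$Keyword,
  levels = sample_data$Keyword[order(-sample_data$CPC)]
)

levels(sample_data$Keyword)
# "beta" "gamma" "alpha"
```

Because ggplot2 draws a discrete axis in factor-level order, setting the levels this way is what makes the bars appear sorted by value rather than alphabetically.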
Data Reference
The following plot fixes the main issues in the original.