Objective
The original data visualisation was produced by Alphametic (2020), a search marketing agency. It is intended to show the market share, or market penetration, of the most widely used search engines, to assist with Internet marketing and to allow companies to target and design for particular engines.
The visualisation has a number of issues, of which the following three are the main ones:
Reference
To make the reconstructed visualisation more effective, it was decided to show the number of users of each search engine in each country. This gives a more useful picture of market size, while the visualisation itself still shows the market segmentation.
To best represent Alphametic's original intention, an updated version of the market share data was sourced from the same provider as the original data, Statcounter (2022). As with the original source, it contained no figures for market size, so these were sourced from emarketer (2022). This dataset provides the total number of internet users for each country.
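As a rough illustration of how a share percentage and a total user count can be combined into a user count per engine, here is a minimal sketch. It assumes the wrangling multiplied each engine's share by the country's total internet users, which the report does not state explicitly. The figures and the names `share_pct`, `total_users` and `users_by_engine` are hypothetical and are not taken from Statcounter or emarketer.

```r
# Hypothetical figures for a single country, for illustration only.
share_pct   <- c(Google = 90, bing = 6, Other = 4)  # market share in percent
total_users <- 50                                   # internet users, in millions

# Assumed calculation: users per engine = share (as a fraction) * total users.
users_by_engine <- share_pct / 100 * total_users
users_by_engine
```

Because the shares sum to 100 percent, the per-engine values sum back to the country total, which is what lets a stacked bar show market size and segmentation at once.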
Data wrangling was performed as a separate task to produce a CSV file that contains:
With the data wrangling complete, the following methodology was used to fix the issues identified in the original visualisation. The code is shown as separate chunks, each with a clear explanation, rather than as commented code. The modular design of ggplot2 means that each component can be altered without affecting the other code chunks.
Load the required packages in order to import, manipulate and visualise the data.
library(ggplot2)
library(readr)
library(tidyr)
Load the data from the CSV file. A forward slash is used in the path so that it works on any operating system.
df <- read_csv("data/users_2022_sum.csv")
Convert the countries to a factor whose levels are ordered by total users, so that the countries are plotted in order from the most to the fewest users.
df$Country <- factor(df$Country,
                     levels = df$Country[order(df$X_sum, decreasing = FALSE)])
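The effect of the reordering can be checked on a small example. This is only a sketch: the data frame `toy` and its values are hypothetical, and only the column names mirror those used above.

```r
# Hypothetical toy data; only the column names match the real data set.
toy <- data.frame(
  Country = c("A", "B", "C"),
  X_sum   = c(50, 900, 300),
  stringsAsFactors = FALSE
)

# Levels are sorted by ascending total, so the largest country is the last level.
toy$Country <- factor(toy$Country,
                      levels = toy$Country[order(toy$X_sum, decreasing = FALSE)])

levels(toy$Country)
```

ggplot2 draws a discrete axis in level order, so once the coordinates are flipped the last level, here the country with the most users, appears at the top of the chart.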
Put the data in a tidy format to work with ggplot2. We will now just have one column for the search engine and one for the total number of users.
df_long <- pivot_longer(df, cols = 2:9, names_to = "engine", values_to = "users")
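To make the reshaping concrete, the sketch below applies the same `pivot_longer()` call to a small table. The data frame `wide` and its values are hypothetical, and the engine columns are selected by name here rather than by position as in the real code.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per search engine.
wide <- data.frame(
  Country = c("A", "B"),
  Google  = c(40, 10),
  Baidu   = c(0, 60),
  stringsAsFactors = FALSE
)

# Each engine column becomes a set of rows: one row per country and engine.
long <- pivot_longer(wide, cols = c(Google, Baidu),
                     names_to = "engine", values_to = "users")
long
```

The result has one row per country and engine pair, with the columns `Country`, `engine` and `users`, which is the shape ggplot2 needs to map `engine` to the fill colour.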
Create the new plot using ggplot2. We will initially set this up so that country is on the x axis and the number of users is on the y axis. The Spectral palette from ColorBrewer is applied to the fill scale, as it provides a clear visual distinction between the search engines.
#theme_set(theme_grey())
user_plot <- ggplot(df_long, aes(Country, users)) + scale_fill_brewer(palette = "Spectral")
A stacked bar chart is used. The engine variable is mapped to the fill and the legend, but is first converted to a factor to control the order in which the search engines are displayed.
user_plot <- user_plot + geom_bar(aes(fill = factor(engine,
                                                    levels = c("Google",
                                                               "bing",
                                                               "Baidu",
                                                               "Yahoo!",
                                                               "Haosou",
                                                               "Mail.ru",
                                                               "Shenma",
                                                               "YANDEX"))),
                                  stat = "identity",
                                  width = 0.8,
                                  col = "black",
                                  position = "stack")
The display theme is adjusted so that the axis text and title stand out.
user_plot <- user_plot + theme(axis.text.x = element_text(angle = 0, face = "bold", vjust = 0.6),
                               axis.text.y = element_text(angle = 0, face = "bold", vjust = 0.6),
                               plot.title = element_text(hjust = 0.5, face = "bold"))
The visualisation is clearly labelled.
user_plot <- user_plot + labs(title = "Search Engine Market Segmentation in 2022",
                              fill = "Search Engine",
                              x = "Country",
                              y = "Total Users in Millions",
                              subtitle = "Search Engine Usage by Country",
                              caption = "Sources: Statcounter (2022), emarketer (2022)")
Flip the coordinates so that the visualisation is easier to read, with the country labels shown horizontally on the vertical axis.
user_plot <- user_plot + coord_flip()
Data Reference
Statcounter (2022). Search Engine Market Share Worldwide. Statcounter-Search-Country-desktop-mobile-tablet-console-2022-06.csv. Retrieved July, 2022, from Statcounter website: https://gs.statcounter.com/search-engine-market-share#monthly-201902-202002-bar
emarketer, Insider Intelligence (2022). Internet Users.csv. Retrieved July, 2022, from emarketer website: https://forecasts-na1.emarketer.com/5a32abf7e0cb1d0dd489d23c/5a32abede0cb1d0dd489d23b
Alphametic (2020). Global Search Engine Market Share in the Top 15 GDP Nations. Retrieved July, 2022, from Alphametic website: https://alphametic.com/global-search-engine-market-share
The following plot fixes the main issues in the original.
Both the largest markets and the search engine with the largest share of each market can be seen easily. The colour scheme makes it easy to differentiate between the engines.
From the new visualisation it is clear that Google has the largest number of users overall and is completely dominant everywhere except China and Russia, where Baidu and YANDEX respectively have large market shares. Surprisingly, bing has a significant number of users in China, with the visualisation showing around 129 million users.