Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Alphametic (2020).


Objective

The original data visualisation was produced by Alphametic (2020), a search marketing agency. It is intended to show the market share of the most widely used search engines in each country, to assist with Internet marketing and to allow companies to target and design for particular engines.

There are a number of issues with the visualisation:

  • A map has been used to show market share as a percentage. This can be misleading: on the map used, Australia has a similar surface area to China, yet far fewer users.
  • The colour scheme is all blue and white. Eight shades of blue are required to identify the search engines.
  • The legend is distracting and difficult to read.
  • It is hard to match the legend to the country.
  • The source quoted is from 2019 but the text states that it is from 2020.
  • The search engine logos add no value.
  • There are logos for search engines in the country legends that do not appear in the main legend, DuckDuckGo for instance.
  • While the visualisation shows market share, this is not linked to country size. This is important for Internet marketers as a bigger market means potentially more revenue.

Of these, the following three are the main issues:

  • It is not possible for a marketer to understand the size of the market from this visualisation and so it fails in its primary goal.
  • The map adds no real value and is a distraction.
  • The colour scheme makes it nearly impossible to identify market share. This also means that the visualisation does not achieve its goals.

Reference

Code

To make the reconstructed visualisation more effective, it was decided to show the number of users of each search engine in each country. This gives a more useful picture of market size, while the visualisation itself still shows the market segmentation.

To best represent the original intention of Alphametic, an updated version of the market share data was sourced from the same location as the original data, Statcounter (2022). As with the original source, there were no figures for market size, so these were sourced from emarketer (2022). This dataset provides the total number of internet users for each country.

Data wrangling was performed as a separate task to produce a csv file that contains:

  • The number of users of each search engine in each country, for the 15 countries and 8 search engines identified by Alphametic (2020). This was calculated by multiplying the percentage market share from Statcounter (2022) by the total number of users from emarketer (2022).
  • The sum of the total number of users by country to be used in ordering the visualisation.
  • The total users in millions.
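The calculation described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the column names (`share_pct`, `total_users`, `users_millions`) are hypothetical, the sketch works in long format rather than the wide layout of the final csv file, and every figure is invented rather than taken from Statcounter or emarketer.

```r
# Hypothetical inputs: market share in percent and total internet users.
# All numbers are invented for illustration only.
share <- data.frame(
  Country   = c("A", "A", "B", "B"),
  engine    = c("Google", "bing", "Google", "bing"),
  share_pct = c(90, 10, 60, 40),
  stringsAsFactors = FALSE
)
totals <- data.frame(
  Country     = c("A", "B"),
  total_users = c(200e6, 50e6),
  stringsAsFactors = FALSE
)

# Join the two sources on country, then convert percentages to user counts
# expressed in millions.
users <- merge(share, totals, by = "Country")
users$users_millions <- users$share_pct / 100 * users$total_users / 1e6

# Per-country sum, which is what the ordering of the bars relies on.
country_sum <- aggregate(users_millions ~ Country, data = users, FUN = sum)
```

Because the shares for a country add up to 100 percent in this toy data, each per-country sum equals that country's total number of users in millions.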

With the data wrangling complete, the following methodology was used to fix the issues identified in the original visualisation. The code is shown as separate chunks, each with an explanation rather than comments. The modular design of ggplot2 means that each component can be altered without affecting the other code chunks.

Load the required packages in order to import, manipulate and visualise the data.

library(ggplot2)
library(readr)
library(tidyr)

Load the data from the .CSV.

df <- read_csv("data/users_2022_sum.csv")

Convert the countries to a factor whose levels are ordered by total users. This is done so that the countries are plotted in order of their number of users rather than alphabetically.

df$Country <- factor(df$Country,
                     # levels run from the smallest to the largest total
                     levels = df$Country[order(df$X_sum, decreasing = FALSE)])
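To see what this reordering does, here is a small sketch on invented data. The data frame `toy` and its values are hypothetical; only the `factor()` and `order()` pattern mirrors the chunk above.

```r
# Toy data: three made-up countries with made-up totals.
toy <- data.frame(
  Country = c("A", "B", "C"),
  X_sum   = c(50, 200, 120),
  stringsAsFactors = FALSE
)

# Same pattern as above: levels sorted by ascending total.
toy$Country <- factor(toy$Country,
                      levels = toy$Country[order(toy$X_sum, decreasing = FALSE)])

# Levels now run from the smallest total to the largest.
levels(toy$Country)
```

ggplot2 draws the first factor level nearest the origin, so once the coordinates are flipped the country with the largest total ends up at the top of the chart.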

Put the data in a tidy format to work with ggplot2. We will now just have one column for the search engine and one for the total number of users.

df_long <- pivot_longer(df, cols = 2:9, names_to = "engine", values_to = "users")
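The reshaping step can be illustrated on a small invented table. The data frame `wide` and its values are hypothetical, and the engine columns are selected by name here rather than by position, which is a variation on the chunk above rather than what it does.

```r
library(tidyr)

# Toy wide table: one row per country, one column per search engine.
wide <- data.frame(
  Country = c("A", "B"),
  Google  = c(180, 30),
  bing    = c(20, 20),
  X_sum   = c(200, 50)
)

# One row per country and engine; the columns that are not pivoted
# (Country and X_sum) are repeated on every row.
long <- pivot_longer(wide,
                     cols = c(Google, bing),
                     names_to = "engine",
                     values_to = "users")
```

Selecting columns by name is a little more robust than `cols = 2:9` if the column order of the csv file ever changes, at the cost of a longer call.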

Create the new plot using ggplot2. We will initially set this up so that country is on the x axis and the number of users is on the y axis. The ColorBrewer Spectral palette is applied to the fill scale, as it provides a clear visual distinction between the eight search engines.

#theme_set(theme_grey())
user_plot <- ggplot(df_long, aes(Country,users)) + scale_fill_brewer(palette ="Spectral")

A stacked bar chart is selected. The engine variable is used for the fill and the legend, but is first converted to a factor to give some control over the order in which the search engines are displayed.

user_plot<-user_plot + geom_bar(aes(fill=factor(engine, 
                             levels=c("Google", 
                                      "bing",
                                      "Baidu",
                                      "Yahoo!",
                                      "Haosou",
                                      "Mail.ru",
                                      "Shenma",
                                      "YANDEX"))),
             stat="identity", 
             width = 0.8,
             col="black",
             position="stack")
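One thing to watch with an explicit `levels` vector is that any engine name in the data that does not match a level exactly becomes `NA`. A minimal sketch of a check is shown below; `engines_in_data` is a hypothetical stand-in for `unique(df_long$engine)`, with a deliberately mismatched capitalisation.

```r
# The levels used for the fill in the chunk above.
engine_levels <- c("Google", "bing", "Baidu", "Yahoo!",
                   "Haosou", "Mail.ru", "Shenma", "YANDEX")

# Hypothetical engine names; "Bing" differs from the level "bing".
engines_in_data <- c("Google", "Bing", "Baidu")

# Names present in the data but missing from the levels.
unmatched <- setdiff(engines_in_data, engine_levels)

# Unmatched names are silently turned into NA by factor().
f <- factor(engines_in_data, levels = engine_levels)
```

An empty `unmatched` vector means every engine will get its own colour and legend entry.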

The display theme is adjusted so that the axis text and title stand out.

user_plot<-user_plot + theme(axis.text.x = element_text(angle=0,face="bold", vjust=0.6),
                             axis.text.y = element_text(angle=0,face="bold", vjust=0.6),
                             plot.title = element_text(hjust=0.5, face="bold")
                             )

The visualisation is clearly labelled.

user_plot <- user_plot + labs(title = "Search Engine Market Segmentation in 2022",
       fill = "Search Engine",
       x = "Country",
       y = "Total Users in Millions",
       subtitle = "Search Engine Usage by Country",
       caption = "Sources: Statcounter (2022), emarketer (2022)"
       )

Flip the coordinates so that the bars run horizontally and the visualisation is easier to read, with the country labels listed along the vertical axis.

user_plot<-user_plot +coord_flip() 

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

It is easy to see both which markets are the largest and which search engine has the largest share of each. The colour scheme makes it easy to differentiate between the engines.

From the new visualisation it is clear that Google has the largest number of users overall and is completely dominant everywhere except China and Russia, where Baidu and YANDEX respectively have large market shares. Surprisingly, bing has a significant number of users in China, with the visualisation showing around 129 million users.