Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Howmuch.net (2019).


As populations, networks and businesses have grown, the amount of stored data has increased dramatically. Cloud computing promises to assist society with managing and re-deriving the stored data. Recently, cloud computing has been changing the economy and transforming the work environment of millions of people and many private companies. The objective of the original data visualisation is to show the top 25 private cloud companies in the world in 2019 and to relate each company's rank to its number of employees. The target audience is the general public with an interest in cloud computing and cloud computing companies.

The visualisation chosen had the following three main issues:

  • Visual bombardment: The visualisation uses too many images. Company logos and numerical values are all displayed inside a cloud shape, and some of the logos and names are unclear because all of the information is squeezed into that shape. This confuses the audience and obscures the purpose of the visualisation. In addition, the bar at the bottom of the graph, which indicates each company's rank, is an unnecessary visual element that can distract the audience.

  • Misleading visualisation: According to the dataset, 'UiPath' has both the highest funding amount and the highest number of employees. However, the visualisation places 'Stripe' at rank 1, which is misleading. Also, one of the companies has no numerical value displayed, yet it still appears in the top 25 ranking.

  • Poor choice of colour scale: The visualisation uses a colour gradient to show the number of employees in each company. However, the colours are not easily distinguishable, so it is hard to differentiate between the employee ranges.
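
The ranking claim in the second issue can be checked against the figures listed in the Code section. The following is a minimal sketch, not part of the original analysis: it copies four rows from the dataset into a hypothetical data frame called `top` and ranks them by funding.

```r
# Four rows copied from the dataset shown in the Code section
top <- data.frame(
  Company   = c("Stripe", "Snowflake", "UiPath", "Tanium"),
  Funding   = c(785, 920, 1100, 800),    # funding in $M
  Employees = c(2000, 1400, 3200, 1200)
)

# Rank 1 = largest funding
top$FundingRank <- rank(-top$Funding)
top[order(top$FundingRank), ]

# Company with the highest funding and with the most employees
top$Company[which.max(top$Funding)]
top$Company[which.max(top$Employees)]
```

Among these four rows, Stripe has the lowest funding, which is why placing it first is hard to justify on funding alone.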

Reference

  • Howmuch.net. (2019). Retrieved April 25, 2022, from https://howmuch.net/articles/best-private-cloud-companies-2019

Code

The following code was used to fix the issues identified in the original.

# Load the required libraries

library(tidyr)
library(dplyr)
library(ggplot2)
# Importing dataset

Data <- read.csv("top25company.csv")
head(Data)
##     Company                        Industry Funding....M.
## 1    Stripe              Payment processing           785
## 2 Snowflake            Cloud data warehouse           920
## 3    UiPath      Robotic process automation          1100
## 4 HashiCorp Cloud infrastructure automation           174
## 5   Datadog   Data monitoring and analytics           148
## 6   Procore         Construction management           250
##                Headquarters Employees                CEO
## 1 San Francisco, California      2000   Patrick Collison
## 2     San Mateo, California      1400     Frank Slootman
## 3        New York, New York      3200       Daniel Dines
## 4 San Francisco, California       600     David McJannet
## 5        New York, New York      1200      Olivier Pomel
## 6   Carpinteria, California      1800 Tooey Courtemanche
# Renaming the funding column, which read.csv imported as Funding....M.
Data <- rename(Data, Funding = "Funding....M.")
head(Data)
##     Company                        Industry Funding              Headquarters
## 1    Stripe              Payment processing     785 San Francisco, California
## 2 Snowflake            Cloud data warehouse     920     San Mateo, California
## 3    UiPath      Robotic process automation    1100        New York, New York
## 4 HashiCorp Cloud infrastructure automation     174 San Francisco, California
## 5   Datadog   Data monitoring and analytics     148        New York, New York
## 6   Procore         Construction management     250   Carpinteria, California
##   Employees                CEO
## 1      2000   Patrick Collison
## 2      1400     Frank Slootman
## 3      3200       Daniel Dines
## 4       600     David McJannet
## 5      1200      Olivier Pomel
## 6      1800 Tooey Courtemanche
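
The column name `Funding....M.` arises because `read.csv()`, with its default `check.names = TRUE`, passes headers through `make.names()`, which replaces characters that are not valid in R names with dots. The sketch below assumes the raw CSV header was `Funding ($ M)`. The actual header in top25company.csv is not shown in the source, so `raw_header` and `csv_text` are hypothetical.

```r
# Hypothetical raw header; the real one in top25company.csv is not shown
raw_header <- "Funding ($ M)"

# make.names() is what read.csv(check.names = TRUE) applies to headers:
# each space, bracket and dollar sign becomes a dot
make.names(raw_header)

# Reading with check.names = FALSE keeps the header unchanged
csv_text <- "Company,Funding ($ M)\nStripe,785"
names(read.csv(text = csv_text, check.names = FALSE))
```

Either approach works. Renaming after import, as done above, keeps the column easy to reference without backticks.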
# Selecting the required columns from the dataset
# Arranging the rows by Funding in descending order

df <- Data %>% select(Company, Funding, Employees)
topcompanydf <- df %>% arrange(desc(Funding))
topcompanydf
##         Company Funding Employees
## 1        UiPath    1100      3200
## 2     Snowflake     920      1400
## 3        Tanium     800      1200
## 4        Stripe     785      2000
## 5        Rubrik     553      1600
## 6         Gusto     516      1000
## 7    Databricks     499       800
## 8         Toast     496      2000
## 9   TripActions     480       800
## 10   Cloudflare     404      1000
## 11     InVision     350       840
## 12      Illumio     333       325
## 13 ServiceTitan     326       702
## 14        Plaid     310       390
## 15      Segment     284       440
## 16  Squarespace     279      1000
## 17      Procore     250      1800
## 18     Intercom     241       600
## 19    Confluent     206       500
## 20    Darktrace     177      1000
## 21    HashiCorp     174       600
## 22      Vlocity     163       800
## 23        nCino     150       750
## 24      Datadog     148      1200
## 25        Canva     140       650
# Creating data frame

# Employees is recoded by hand into four bands based on the counts above
companydf <- data.frame(
  Company = c("UiPath", "Snowflake", "Tanium", "Stripe", "Rubrik", "Gusto",
              "Databricks", "Toast", "TripActions", "Cloudflare", "InVision",
              "Illumio", "ServiceTitan", "Plaid", "Segment", "Squarespace",
              "Procore", "Intercom", "Confluent", "Darktrace", "HashiCorp",
              "Vlocity", "nCino", "Datadog", "Canva"),
  Funding = c(1100, 920, 800, 785, 553, 516, 499, 496, 480, 404, 350, 333, 326,
              310, 284, 279, 250, 241, 206, 177, 174, 163, 150, 148, 140),
  Employees = c("3k and More", "1k-1.99k", "1k-1.99k", "2k-2.99k", "1k-1.99k",
                "1k-1.99k", "Less than 1k", "2k-2.99k", "Less than 1k",
                "1k-1.99k", "Less than 1k", "Less than 1k", "Less than 1k",
                "Less than 1k", "Less than 1k", "1k-1.99k", "1k-1.99k",
                "Less than 1k", "Less than 1k", "1k-1.99k", "Less than 1k",
                "Less than 1k", "Less than 1k", "1k-1.99k", "Less than 1k")
)
# Converting to factor

companydf$Company <- as.factor(companydf$Company)
companydf$Employees <- as.factor(companydf$Employees)
companydf
##         Company Funding    Employees
## 1        UiPath    1100  3k and More
## 2     Snowflake     920     1k-1.99k
## 3        Tanium     800     1k-1.99k
## 4        Stripe     785     2k-2.99k
## 5        Rubrik     553     1k-1.99k
## 6         Gusto     516     1k-1.99k
## 7    Databricks     499 Less than 1k
## 8         Toast     496     2k-2.99k
## 9   TripActions     480 Less than 1k
## 10   Cloudflare     404     1k-1.99k
## 11     InVision     350 Less than 1k
## 12      Illumio     333 Less than 1k
## 13 ServiceTitan     326 Less than 1k
## 14        Plaid     310 Less than 1k
## 15      Segment     284 Less than 1k
## 16  Squarespace     279     1k-1.99k
## 17      Procore     250     1k-1.99k
## 18     Intercom     241 Less than 1k
## 19    Confluent     206 Less than 1k
## 20    Darktrace     177     1k-1.99k
## 21    HashiCorp     174 Less than 1k
## 22      Vlocity     163 Less than 1k
## 23        nCino     150 Less than 1k
## 24      Datadog     148     1k-1.99k
## 25        Canva     140 Less than 1k
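
The employee bands above were entered by hand. As an alternative sketch (not the method used in this report), `cut()` can derive the same bands from the numeric employee counts, assuming half-open intervals such as [1000, 2000). Supplying the labels in order also gives the legend a natural low-to-high order, whereas `as.factor()` sorts the levels alphabetically. The vector `employees` holds six counts copied from the dataset.

```r
# Six employee counts copied from the dataset
employees <- c(UiPath = 3200, Snowflake = 1400, Stripe = 2000,
               Gusto = 1000, Databricks = 800, Illumio = 325)

bands <- cut(
  employees,
  breaks = c(0, 1000, 2000, 3000, Inf),
  labels = c("Less than 1k", "1k-1.99k", "2k-2.99k", "3k and More"),
  right  = FALSE   # intervals are [lower, upper), so 1000 falls in "1k-1.99k"
)

data.frame(Company = names(employees), Employees = employees, Band = bands)

# Levels follow the order of `labels`, not alphabetical order
levels(bands)
```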
# Drawing the data visualisation using ggplot

plot <- ggplot(companydf, aes(x = reorder(Company, Funding), y = Funding, fill = Employees)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7, color = "black") +
  coord_flip(ylim = c(130, 1150)) +
  labs(title = "The World's top 25 private cloud companies in 2019",
       caption = "Source: https://howmuch.net/articles/best-private-cloud-companies-2019",
       subtitle = "By Funding & Number of Employees",
       x = "Company name", y = "Funding Value ($M)") +
  # geom_text() has no fill aesthetic, so the label colour is set with colour
  geom_text(aes(label = Funding), hjust = -0.5, size = 2, colour = "black", family = "Times New Roman")
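
To address the colour-scale issue more directly, one option is to give each employee band its own clearly distinct fill instead of relying on the ggplot2 defaults. The sketch below is an assumption rather than part of the original code: it uses four colours from the Okabe-Ito colour-blind-friendly palette and a four-row stand-in data frame (`demo`) in place of companydf.

```r
library(ggplot2)

# Stand-in for companydf: one company per employee band (values from the dataset)
demo <- data.frame(
  Company   = c("UiPath", "Stripe", "Snowflake", "Databricks"),
  Funding   = c(1100, 785, 920, 499),
  Employees = factor(
    c("3k and More", "2k-2.99k", "1k-1.99k", "Less than 1k"),
    levels = c("Less than 1k", "1k-1.99k", "2k-2.99k", "3k and More")
  )
)

# Assumed palette: four Okabe-Ito colours, named by band
band_colours <- c(
  "Less than 1k" = "#56B4E9",
  "1k-1.99k"     = "#009E73",
  "2k-2.99k"     = "#E69F00",
  "3k and More"  = "#D55E00"
)

p <- ggplot(demo, aes(x = reorder(Company, Funding), y = Funding, fill = Employees)) +
  geom_col(width = 0.7, colour = "black") +   # same as geom_bar(stat = "identity")
  coord_flip() +
  scale_fill_manual(values = band_colours) +
  labs(x = "Company name", y = "Funding Value ($M)")

p
```

Naming the palette entries by band ties each colour to a specific level, so the mapping stays stable even if a band is missing from the data.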

Data Reference

Reconstruction

The following plot fixes the main issues in the original.