ORIGINAL VISUALISATION


Source: https://howmuch.net/articles/timeline-retail-sales-growth-US


OBJECTIVE

The main objective of the original data visualisation is to examine the growth of retail sales in the US over the last two decades. It also depicts the growth of the e-commerce sector.

The retail industry has grown gradually over time. It has been set back several times by market recessions, such as the Great Recession of 2007-2009, which had a long-lasting impact on the industry; this is clearly evident in the statistics shown in the data visualisation. E-commerce, however, has grown steadily since 2000: even during the Great Recession, it grew from 2.9% to 3.4%.

The growth in e-commerce shows how consumers have shifted towards online products and services over the years. The COVID-19 pandemic in 2020 was another setback for almost all industries, including retail, but e-commerce grew so strongly that many companies saw the opportunity and invested billions of dollars in it.

The e-commerce industry is expected to grow at much faster rates in future years.

The growth of retail sales and e-commerce is what the data visualisation communicates to its audience.

The target audience of the data visualisation is the general public, such as students, teachers and professors, who are not directly connected to the retail industry. The data visualisation gives a rough idea of the growth in retail sales in the US, showing sales figures and growth percentages over the years.

The target audience would have been retail industry professionals had the data visualisation considered factors such as market behaviour, political factors, the internet and others that directly or indirectly led to the growth of the retail industry.

The visualisation chosen had the following three main issues:

One of the major issues of the data visualisation is the use of a sequential pie chart.

The pie chart shows the sales figures as bars, with e-commerce sales stacked on top of total sales. Text labels over the bars give clear information about total retail sales and growth percentages, but the data visualisation fails to hold the audience’s attention.

This is because the audience cannot easily observe the growth in retail sales. The purpose of a data visualisation is to convey information visually, but in the original the growth can only be read from the numbers, not seen in the bars.

The reader has to compare the bars’ lengths and sizes by eye to understand the difference between the figures for any two or more time periods.

The angles of all the bars are also quite similar; only the bar lengths differ. Because of this, it is very difficult to distinguish the sales figures of one year from another just by examining the bar lengths.

There are two main issues with the use of colour: the combination of colours used is not colourblind safe, and single-hue colours are used for the two sales categories.

With 8% of males and 0.4% of females in the world having some form of colour-blindness, it is very important to design the data visualisation with this in mind.

Red-green colour-blindness is the most common form, so it is essential not to use this colour combination in data visualisations.

Highly saturated colours also put extra strain on the audience’s eyes, so avoiding high saturation is recommended for good data visualisations.

The original data visualisation fails on both points. It uses variations of red and blue, and the colours of some bars are saturated, though not very highly. The data visualisation is therefore not colourblind safe, and it puts extra strain on the audience’s eyes that could have been avoided with softer colour choices.

The other issue is the use of sequential single-hue colours, which are generally used for numerical variables, where a darker colour means a high value and a lighter colour a low value. Here, however, the type of sales is a categorical variable with two levels: total sales and e-commerce sales. Two distinct colours can therefore be used to differentiate the two kinds of sales, which is more meaningful and easier to comprehend.
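
As a rough, partial check on a two-colour choice, the lightness difference between the fills can be computed in base R. The sketch below uses the WCAG relative-luminance formula on the two hex codes that appear in the CODE section; the helper names relative_luminance and contrast_ratio are made up for illustration. A lightness difference helps the two categories stay distinguishable when hue is hard to perceive, but it is not a simulation of colour-vision deficiency and does not by itself prove a palette is colourblind safe.

```r
# Relative luminance of a colour, following the WCAG 2.x definition.
# (Helper name is illustrative, not part of any package.)
relative_luminance <- function(colour) {
  rgb <- col2rgb(colour)[, 1] / 255            # sRGB channels scaled to 0..1
  linear <- ifelse(rgb <= 0.03928,
                   rgb / 12.92,
                   ((rgb + 0.055) / 1.055)^2.4) # undo the sRGB gamma curve
  sum(c(0.2126, 0.7152, 0.0722) * linear)
}

# Contrast ratio between two colours: 1 (identical lightness) to 21 (black vs white).
contrast_ratio <- function(a, b) {
  l <- sort(c(relative_luminance(a), relative_luminance(b)), decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

fills <- c("#1f78b4", "#b2df8a")  # the two fill colours used in the reconstruction
ratio <- contrast_ratio(fills[1], fills[2])
print(ratio)
```

A ratio well above 1 indicates that the two fills differ in lightness as well as hue.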

There are two points where the data visualisation fails to catch the viewers’ eyes:
- there is no numerical labelling for the e-commerce sales, and
- the labels for the years and the other numeric values and percentages are not suitably placed.

E-commerce sales are an important part of the objective of the data visualisation, yet it fails to provide the figures for e-commerce sales.

The percentages and the e-commerce bars alone are not enough to give clear and accurate information about e-commerce sales.

The years and other values are placed around a circle, which makes it uncomfortable for the audience to read the exact numbers. Readers have to tilt their heads to read them, which makes the data visualisation unsuitable.
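
To illustrate why the absolute figures matter alongside the percentages, the sketch below derives both an e-commerce share and a year-over-year growth rate from the same two columns. The data frame and its values are hypothetical placeholders, not the Census Bureau figures, and the column names are assumptions. Which of the two kinds of percentage the original chart shows is not stated here; without the underlying e-commerce figures a reader cannot check either.

```r
# Hypothetical placeholder values in million dollars, NOT the real figures.
sales <- data.frame(
  year      = c(2018, 2019, 2020),
  total     = c(1000, 1100, 1200),
  ecommerce = c(100, 121, 180)
)

# E-commerce share of total retail sales, in percent.
sales$ecommerce_share <- round(100 * sales$ecommerce / sales$total, 1)

# Year-over-year growth of e-commerce sales, in percent (NA for the first year).
sales$ecommerce_growth <- c(
  NA,
  round(100 * diff(sales$ecommerce) / head(sales$ecommerce, -1), 1)
)

print(sales)
```

With the absolute values labelled on the chart, a reader can reproduce either percentage directly.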

REFERENCE

Data Visualisation and Dataset Source:

Irena (March 2021). Charting Over 20 Years of Retail Sales and E-Commerce in the U.S. Retail Industry (2020). https://howmuch.net/articles/timeline-retail-sales-growth-US

Original Data Source:

Retail Indicators Branch, U.S. Census Bureau. Charting Over 20 Years of Retail Sales and E-Commerce in the U.S. Retail Industry. Retrieved November 19, 2020, from US Census Bureau website: https://www.census.gov/retail/index.html

CODE

The following code was used to fix the issues identified in the original.

library(tidyr)
library(dplyr)
library(ggplot2)

data <- read.csv("retail-sales.csv")  # import the dataset into a data frame
data$year <- as.factor(data$year)  # convert year to a factor and store the result
data$year  # print the factor to check its levels
##  [1] 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006
## [16] 2005 2004 2003 2002 2001 2000
## 21 Levels: 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 ... 2020
sales_data <- data %>% select(1:3)#subsetting the dataset

colnames(sales_data) <- c("Year","Total Retail Sales","E-Commerce Sales")#changing variable names

new_sales <- gather(sales_data,"Sales","value(in million dollars)",2:3)#tidying the dataset

new_sales$Sales <- as.factor(new_sales$Sales)#factorising the sales variable

p1 <- ggplot(new_sales,aes(fill=Sales,y=`value(in million dollars)`,x=Year)) + geom_bar(position = "dodge",stat="identity")#creating bar chart

p2 <- p1 + scale_fill_manual(values =c('#1f78b4','#b2df8a'))#adding colors to the bars

p3 <- p2+geom_text(aes(label=`value(in million dollars)`),size=3,vjust=-0.5,position = position_dodge(width = 0.9))+theme_classic()#labelling values on the bars; explicit dodge width so labels line up with the dodged bars

p4 <- p3+labs(title="Timeline of Retail Sales Growth in the US",subtitle = "Total Retail Sales by Year & With E-Commerce Growth Broken Out",caption = "Source:Retail Indicators Branch, U.S. Census Bureau")#adding text labels on the data plot

p5 <- p4+theme(plot.title = element_text(color="#d95f02",size=20,face="bold"),plot.subtitle = element_text(color="#a9a9a9",size=10,face="bold"),plot.caption = element_text(face="bold",hjust=0))#styling the title, subtitle and caption

p6 <- p5+theme(axis.title.x = element_text(color = "#000000",size=12,face="bold"),axis.title.y = element_text(color = "#000000",size=12,face="bold"))
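
The tidying step above uses gather(), which tidyr now marks as superseded in favour of pivot_longer(). The sketch below shows an equivalent reshape, assuming tidyr 1.0.0 or later is installed. The small data frame is a hypothetical stand-in for sales_data with placeholder values, not the real figures. The output rows are ordered year by year rather than column by column, which does not affect the bar chart.

```r
library(tidyr)

# Hypothetical stand-in for sales_data; values are placeholders, NOT real figures.
sales_data <- data.frame(
  Year = factor(c(2018, 2019, 2020)),
  `Total Retail Sales` = c(100, 110, 120),
  `E-Commerce Sales` = c(10, 12, 18),
  check.names = FALSE  # keep the column names with spaces and hyphens as written
)

# Reshape the two sales columns into one key column and one value column.
new_sales <- pivot_longer(
  sales_data,
  cols = c(`Total Retail Sales`, `E-Commerce Sales`),
  names_to = "Sales",
  values_to = "value(in million dollars)"
)

# Treat the sales type as a categorical variable, as in the code above.
new_sales$Sales <- as.factor(new_sales$Sales)
print(new_sales)
```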

RECONSTRUCTION

The following plot fixes the main issues in the original.