
Original


Source: BHP Economic Contribution Report 2019.


Objective

The data visualisation aims to summarise BHP Group’s total economic contribution in financial year 2019 (FY2019). It breaks the contribution down by the payments and investments made (in US dollars) to different sectors, grouped by BHP’s major countries of operation: Australia, Chile, the USA, the UK, Canada and the rest of the world. It is intended to give a comparative view of these contributions as part of the BHP Economic Contribution Report FY2019.

Target audience: company stakeholders, suppliers, employees, potential investors and shareholders.

The visualisation chosen had the following three main issues:

  • The doughnut charts encode sector proportions as segment angles (arc lengths) and country totals as chart size (area). This makes visual comparison across sectors difficult, especially for similar or small proportions, because angle and area are judged much less accurately than position.

  • The color scale used to differentiate between segments does not follow the convention for nominal variables (clearly distinct hues for unordered categories). Instead it relies on similar shades of blue and orange, which causes visual confusion.

  • The doughnut charts use size, without any scale, to depict each country’s total contribution. Because the total is a quantitative variable, encoding it by unscaled size reduces the accuracy of the visualisation. The charts are also not ordered on any basis (such as total contribution), which makes comparisons between countries even harder.
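The short R calculation below illustrates why unscaled size is a weak encoding for totals. It assumes, hypothetically, that each doughnut’s area is drawn proportional to its country total, and the two totals are made-up values rather than figures from the report. Under that assumption, doubling the total enlarges the ring’s diameter by only about 1.41 times, so a reader who compares ring widths sees a much smaller difference than the data contain.

```r
# Hypothetical totals (US$ billions) for two countries; illustrative only
total_a <- 20
total_b <- 10

# If area is proportional to the total, the radius grows with sqrt(total):
# area = pi * r^2  =>  r = sqrt(area / pi)
radius <- function(total) sqrt(total / pi)

data_ratio     <- total_a / total_b                   # difference in the data
diameter_ratio <- radius(total_a) / radius(total_b)   # difference in ring width

cat(sprintf("Data ratio: %.2f, diameter ratio: %.2f\n", data_ratio, diameter_ratio))
# prints: Data ratio: 2.00, diameter ratio: 1.41
```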

Reference

Code

The following code was used to fix the issues identified in the original.

#Preprocessing--------------------

library(dplyr) #for mutate() and the %>% pipe
library(tidyr) #to access gather() function

#import data (path relative to the working directory; adjust to where economic.csv is stored)
economic <- read.csv("economic.csv", header=TRUE, stringsAsFactors = FALSE)

str(economic) #check structure

#set column names
colnames(economic) <- c("Country","Pay_gov","Pay_sup","Pay_emp","Pay_share","Soc_inv") 

#add a column for country totals
economic <- economic %>% mutate(Total = rowSums(economic[-1]))

#factorise 'Country' in descending order of 'Total'
economic$Country <- factor(economic$Country,
                           levels = economic$Country[order(-economic$Total)] )
 
str(economic) #check structure

#create proportion dataframe
d1 <- cbind(economic[1], prop.table(as.matrix(economic[2:6]),1))

d1 <- gather(d1, Sector, Proportion, 2:6) #restructure data to long format

#factorise 'Sector' and define labels
#(factor() pairs levels with labels by position, so both vectors must be in the same order)
d1$Sector <- factor(d1$Sector,
                    levels = c("Pay_emp","Pay_gov","Pay_share","Pay_sup","Soc_inv"),
                    labels = c("Payments to employees","Payments to governments",
                               "Payments to shareholders, lenders and investors",
                               "Payments to suppliers","Social investment"))
str(d1)
d1

#create totals dataframe
d2 <- economic[c(1,7)]
d2$Total <- d2$Total/1000 #convert 'Total' from US$ millions to US$ billions
str(d2)
d2

#---------------------------------


#Plotting Visualization----------- 

library(ggplot2) #to create graphics
library(scales) #to format tick mark labels as percents

#plot for percentage
plot1 <- ggplot(d1, aes(x= Sector, y =Proportion, fill=Sector)) +
  geom_col() +
  geom_text(aes(label = scales::percent(Proportion, accuracy=0.1)), 
            vjust = -.5 , size = 3, fontface="bold" ) +
  labs(fill="Sectors : ", 
       title = "BHP's Global Economic Contribution in FY2019: US$46.2 billion",
       subtitle = "Percentages of contribution in different sectors") +
  facet_grid(~Country, scales="free_y") +
  theme_light() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_discrete(breaks = NULL) +
  theme(axis.title = element_blank(),
        legend.position = "top",
        legend.title = element_text(face="bold",size=10),
        legend.text = element_text(face ="italic",size=10),
        legend.spacing.x = unit(0.3,'cm'),
        plot.title = element_text(color="royalblue4", face="bold", size= 15),
        plot.subtitle = element_text(hjust = 0.5, color="grey30", 
                                     face="bold", size=12),
        strip.text = element_text(size=12)) +
  scale_fill_manual(values=c('#E69F00','#56B4E9','#F0E442','#CC79A7','#009E73')) +
  guides(fill = guide_legend(nrow=2))


#plot for totals
plot2 <- ggplot(d2, aes(x= Country, y =Total)) +
  geom_col(fill="dodgerblue4") + 
  labs(subtitle = "Country-specific contribution in billions of US$",
       caption = "Source: BHP Economic Contribution Report 2019\nLink: https://www.bhp.com/-/media/documents/investors/annual-reports/2019/bhpeconomiccontributionreport2019.pdf") +
  facet_grid(~Country, scales="free") + 
  theme_gray() + 
  scale_x_discrete(breaks = NULL) +
  geom_text(aes(label = paste(round(Total,1), "b")), 
            vjust = -0.5 , size = 4, fontface = "bold" ) + 
  coord_cartesian(ylim = c(0, 31)) +
  theme(axis.title = element_blank(),
        plot.subtitle = element_text(hjust = 0.5, color="grey20", face="bold", size=12),
        plot.caption = element_text(face ="bold.italic", size = 9, color = "darkorange2", hjust=0),
        strip.text = element_text(size=12))

#---------------------------------
  
#Juxtaposition-------------------

library(cowplot) #to align multiple ggplots together

final <- plot_grid(plot1, plot2, ncol=1, align="v", rel_heights = c(1.4,1))
final #display the combined plot
#--------------------------------
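The preprocessing depends on two base-R behaviours that are easy to get wrong: `prop.table(x, 1)` divides each row by its row sum, and `factor(levels = , labels = )` pairs levels with labels by position, not by name. The sketch below checks both on a hypothetical two-country table (the numbers are invented for illustration, not taken from the report) and reshapes it, using only base R, into the same long layout that `gather()` produces.

```r
# Hypothetical payments (US$ millions) for two countries; illustrative only.
# matrix() fills column by column, so each pair below is one sector.
pay <- matrix(c(300, 100,   # Pay_gov
                500, 200,   # Pay_sup
                150,  50,   # Pay_emp
                 40,  40,   # Pay_share
                 10,  10),  # Soc_inv
              nrow = 2,
              dimnames = list(c("Country A", "Country B"),
                              c("Pay_gov", "Pay_sup", "Pay_emp", "Pay_share", "Soc_inv")))

# Row-wise proportions: each country's sectors sum to 1
prop <- prop.table(pay, 1)
stopifnot(isTRUE(all.equal(unname(rowSums(prop)), c(1, 1))))

# Long format, stacking one sector column after another (gather()'s row order)
long <- data.frame(Country    = rep(rownames(prop), times = ncol(prop)),
                   Sector     = rep(colnames(prop), each = nrow(prop)),
                   Proportion = as.vector(prop),
                   stringsAsFactors = FALSE)

# Levels and labels listed in matching order, so each code gets the right name
sector_codes  <- c("Pay_emp", "Pay_gov", "Pay_share", "Pay_sup", "Soc_inv")
sector_labels <- c("Payments to employees", "Payments to governments",
                   "Payments to shareholders, lenders and investors",
                   "Payments to suppliers", "Social investment")
long$Sector <- factor(long$Sector, levels = sector_codes, labels = sector_labels)
print(long)

# Pairing the same labels with a differently ordered levels vector
# silently renames Pay_gov as "Payments to employees"
wrong <- factor("Pay_gov",
                levels = c("Pay_gov", "Pay_sup", "Pay_emp", "Pay_share", "Soc_inv"),
                labels = sector_labels)
as.character(wrong)
```

Building the long table by hand is only a dependency-free way to show the reshaping; in the actual script, `gather()` (or its successor in tidyr, `pivot_longer()`) does the same job.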

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

Tasks Performed

  • The doughnut charts have been replaced with column (bar) charts that show each investment category, faceted by country. Each bar shows the category’s percentage share of that country’s total contribution. This makes the categories easier to compare because bar height uses position, which is perceived much more accurately than area or angle. Smaller proportions in particular are now clearly visible, and the percentage label on each bar makes the data easier to read.

  • In the original, the color palette was not suitable for distinguishing categories, because its colors (shades of blue and orange) were too close to each other. The reconstruction uses a colorblind-safe discrete palette designed specifically for nominal/categorical variables. This makes each investment sector easy to tell apart, including for readers with color-vision deficiency.


Source: Okabe and Ito, 2008.


  • Instead of using size to depict the total contribution in each country, a simple bar chart has been added as a separate plot below. The bars are labelled with their values (in US$ billions) and arranged in descending order of contribution to make comparison easier. Readers can now use position to compare contributions across countries, which was difficult in the original.
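The color and ordering choices above can be checked in a few lines of base R. This is a sketch under stated assumptions: the Okabe-Ito hex codes are typed in from Okabe and Ito (2008) rather than read from a package (R 4.0.0 and later also provide them via `palette.colors(palette = "Okabe-Ito")`), and the country names and totals are hypothetical placeholders. It confirms that the five fills passed to `scale_fill_manual()` come from that palette, and that `reorder()` yields the same descending level order as the `order(-Total)` idiom in the preprocessing code.

```r
# Okabe-Ito colorblind-safe palette (Okabe and Ito, 2008), hard-coded here
okabe_ito <- c(black = "#000000", orange = "#E69F00", skyblue = "#56B4E9",
               bluishgreen = "#009E73", yellow = "#F0E442", blue = "#0072B2",
               vermillion = "#D55E00", reddishpurple = "#CC79A7")

# Fill colors used for the five sectors in plot1: all distinct, all Okabe-Ito
sector_fill <- c("#E69F00", "#56B4E9", "#F0E442", "#CC79A7", "#009E73")
stopifnot(all(sector_fill %in% okabe_ito), !anyDuplicated(sector_fill))

# Hypothetical country totals (US$ billions); placeholders, not report figures
totals <- data.frame(Country = c("Country A", "Country B", "Country C", "Country D"),
                     Total   = c(3, 20, 10, 1),
                     stringsAsFactors = FALSE)

# Idiom from the preprocessing code: levels sorted by descending total
by_order <- factor(totals$Country, levels = totals$Country[order(-totals$Total)])

# Equivalent with reorder(): levels sorted by the negated total
by_reorder <- reorder(factor(totals$Country), -totals$Total)

stopifnot(identical(levels(by_order), levels(by_reorder)))
levels(by_order)  # "Country B" "Country C" "Country A" "Country D"
```

Because ggplot2 draws a discrete axis (and facets) in factor-level order, setting the levels once in preprocessing is enough to keep both plots sorted consistently.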

Image Reference

  • Okabe, M., & Ito, K. (2008). Color Universal Design (CUD): How to make figures and presentations that are friendly to Colorblind people. J*Fly: Data Depository for Drosophila Researchers. Retrieved April 21, 2020, from https://jfly.uni-koeln.de/color/