Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Penn Sustainability (2021).


Objective

The purpose of this visualisation was to highlight the share of greenhouse gas emissions generated by a small number of energy companies, as described in the text of the tweet that this graph came from:

While it’s essential to take individual action against climate change, a 2017 report found that 100 energy companies have been responsible for 71% of all industrial emissions. Learn more about the lack of corporate accountability for GHG emissions here:.

The target audience was most likely people who are concerned about climate change and who are interested in learning more about its causes.

The visualisation used had the following three main issues:

  • Including 100 companies in a pie graph made the graph close to unreadable, particularly for the smaller emitters in the list. Only some of these 100 companies were labelled, and the selection appeared random at times, although critics could equally regard it as biased. There is also no reason why these 100 companies should make up a complete whole (as implied by a pie graph), because the choice of ‘100’ companies was nominal.
  • The data was inconsistent and misleading. It placed the coal sectors of some countries (China, Russia and others) in the same category as data from individual energy companies. The use of a pie chart also meant that the relative size of the segments implied a greater percentage output than was actually the case. For example, the graph stated that the Saudi Arabian Oil Company had 6.4% of global emissions, when the true number (as per the source data) was 4.5% (the stated 6.4% figure was 4.5% divided by 71%).
  • An excessive number of colours in the pie graph led to a kaleidoscope effect, particularly near the top left and centre of the graph.
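The rescaling described in the second point can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures quoted above, so it gives roughly 6.3% rather than exactly the 6.4% printed on the pie chart; the small gap presumably comes from rounding in the quoted inputs (an assumption, not something the source data confirms here).

```r
# Rounded figures quoted in the text, so the result is approximate
share_of_global <- 4.5  # Saudi Arabian Oil Company, % of global industrial emissions
top100_total    <- 71   # combined share of the 100 producers, %

# A pie chart forces its slices to sum to 100%, which rescales every share
share_in_pie <- share_of_global / top100_total * 100

round(share_in_pie, 1)
```

Any producer's share is inflated by the same factor (100 / 71), which is why every segment of the pie overstates the corresponding figure in the source data.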

Reference

Code

The following code was used to fix the issues identified in the original.

library(dplyr)
library(ggplot2)
library(devtools)
library(magrittr)

Import and factorize data

Companies <- read.csv(file = 'Assessment2data.csv')  # import data
Companies$Coal <- Companies$Coal %>% as.factor()  # factorize a variable referring to coal companies
Companies$Company <- Companies$Company %>% as.factor() # Factorize companies
Companies %<>% rename("Percentage_emission" = "Cumulative.1988.2015.Scope.1.3.of.global.industrial.GHG...",
                      "Source" = "Coal")
Companies$Producer <- Companies$Producer %>%
  factor(levels = Companies$Producer[order(Companies$Percentage_emission)])  # order companies by emissions level
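The re-levelling step above is what controls the order of the bars later on. As a minimal sketch on hypothetical toy data (the producers `A`, `B` and `C` and their values are invented, not taken from Assessment2data.csv), the same idiom looks like this:

```r
# Hypothetical toy data, not taken from Assessment2data.csv
toy <- data.frame(
  Producer = c("A", "B", "C"),
  Percentage_emission = c(1.2, 4.5, 2.0)
)

# Re-level Producer so the levels run from the smallest to the largest emitter
toy$Producer <- factor(
  toy$Producer,
  levels = toy$Producer[order(toy$Percentage_emission)]
)

levels(toy$Producer)  # "A" "C" "B"
```

ggplot2 draws discrete axis categories in level order, so after `coord_flip()` the first level sits at the bottom and the largest emitter ends up at the top of the chart.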

Limit results to companies

Companies %<>% filter(Company == "Yes")

Create colour code to distinguish coal and oil companies

Companies$Colour  <- ifelse(Companies$Source == "Oil", "Orange","Coral4")

Now comes perhaps the most difficult decision. We have ruled out a pie chart, and a bar chart is a suitable replacement. But 100 companies is too many, as it would make the chart too busy.

So what to do? The purpose of this graphic is to show that a relatively small number of companies have a disproportionately large effect on global emissions, and to do so with an attractive graphic that draws attention and carries a strong message. 100 is just a nominal number. So does it have to be 100 companies?

I looked at alternatives to pie charts for large numbers of values, including bar graphs and even treemaps (available via the treemapify package). But all ended up looking overly ‘noisy’.

By a process of trial and error, I determined that the emissions of the top 12 companies sum to 25% - which is a clear message that reinforces how a small number of companies make an outsized contribution.
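The trial and error can also be done with a running total. The following is a sketch on hypothetical values, assuming the shares are already sorted from largest to smallest; the numbers and the 10% threshold are illustrative only and are not the source data.

```r
# Hypothetical shares (%), sorted from the largest to the smallest emitter
emissions <- c(4.5, 3.9, 2.0, 1.9, 1.5, 1.3, 0.9, 0.8)

# Running total of the top-n shares for n = 1, 2, ...
running_total <- cumsum(emissions)

# Smallest n whose combined share reaches a chosen threshold (here 10%)
threshold <- 10
n_needed <- which(running_total >= threshold)[1]

n_needed  # 3
```

Applied to the real `Percentage_emission` column, the same running total would show at a glance which cut-off gives a round, memorable figure.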

So I created a subset of these top 12 companies, and a graph title that included the total emission value:

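Top12Companies <- Companies %>%
  arrange(desc(Percentage_emission)) %>%  # sort so the result does not depend on the row order of the csv
  slice(1:12)  # keep the 12 largest emitters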
total12 <- sum(Top12Companies$Percentage_emission)
snippet10 <- paste("Just these 12 companies alone were responsible for \n",total12,"% of global industrial greenhouse gases \nbetween 1988 and 2015 \n")
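One detail of building the title this way is that `paste()` separates its arguments with a space, so the number and the percent sign end up apart. A possible alternative, shown here only as a sketch with hypothetical stand-in values, is `sprintf()`:

```r
# Hypothetical values standing in for the real subset
n_companies <- 12L
total <- 25

# paste() separates its arguments with a space, which yields "25 %"
with_paste <- paste("responsible for", total, "% of global emissions")

# sprintf() gives full control over spacing; "%%" prints a literal percent sign
with_sprintf <- sprintf(
  "Just these %d companies alone were responsible for\n%g%% of global industrial greenhouse gases",
  n_companies, total
)

cat(with_sprintf)
```

Either form works as a `title` in `labs()`; the choice only affects spacing in the rendered text.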

I created a horizontal bar graph because it makes the names of the companies easy to read.

p12 <- ggplot(data = Top12Companies,aes(x = Producer, y = Percentage_emission))
p12 <- p12 + geom_bar(stat = "identity", aes(fill = Colour)) +
  scale_fill_identity() +
  coord_flip()+
  labs(title = snippet10, x = " ", y = "\n Cumulative 1988-2015 Scope 1 + 3 of \n global industrial greenhouse gases, %", caption = " ") +
  geom_text(
    aes(label = Percentage_emission, y = Percentage_emission + 0.15),
    position = position_dodge(1),
    vjust = 0.3, size = 3, colour = "grey37") +
  theme(axis.title.x = element_text(size = 10, colour = "grey40", face = "bold"),
        plot.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(colour = "Grey40", face = "bold"),
        plot.subtitle = element_text(colour = "Grey40", size = 9, hjust = 0)) +
  annotate("text", x = 3, y = 3.5, label = "All companies are oil producers, \n apart from Coal India",
           size = 3, colour = "grey27") +
  annotate("text", x = 1, y = 3.5, label = " Data: CDP Carbon Majors Report 2017",
           size = 3, colour = "grey27")

The resulting graph, while containing less information than the original graph, is easy to understand, and also ethical in that the reported data matches the source data.

Data Reference

Data was copied from Appendix 1 in Griffin (2017) and converted into a csv file. In addition to the data supplied in this table, I determined whether each producer listed was primarily a producer of coal or oil, and whether it was a company or a sector, e.g. China (Coal). These additional observations were added to the data before it was imported. Note that Coal India is a company, which I confirmed.

Reconstruction

The following plot fixes the main issues in the original.