Original


Primary Energy Consumption by Source and Sector, 2011 (p.37)
Source: U.S. Energy Information Administration / Annual Energy Review (2011).


Objective

This data visualisation is designed to deliver a series of key informative metrics to stakeholders of the United States Energy Information Administration (hereafter EIA). Although over 10 years old, the details of energy production and consumption remain relevant to any energy producer or industry body. The key audience is therefore those members of the industry who organise energy production, use, or flow.

As an addendum and caveat to this project, later Annual Energy Review summaries (2012, 2016, etc.) all use a similarly shaped visualisation. This means that, although this is an exemplar of a less than successful visualisation, it has become an industry standard for the EIA and offers the industry body a useful summary shape for its reports.

Additionally, the data herein is complex. Several energy sector bodies produce reports, which are compiled and summarised into datasets, which are in turn compiled to produce this single data visualisation. The single plot is therefore built from a vast series of compiled datasets (summaries of summaries) and attempts to answer several key questions simultaneously.

This data visualisation appears on page 37 of a larger report titled “Annual Energy Review 2011”:

https://www.eia.gov/totalenergy/data/annual/archive/038411.pdf

Other examples of the same visualisation are available through the “Annual Energy Review archives” dropdown box on the right-hand side of this landing page:

https://www.eia.gov/totalenergy/data/annual/index.php
(choose “Energy consumption by sector”)

NOTE: this format of data visualisation appears in the Annual Energy Reviews (AER), but not in the Monthly Energy Reviews (MER).

General Summary of the Source Document

The source document, Annual Energy Review 2011, is a compilation of the year’s data. It includes an “Energy Flow” diagram (p. 3), which is the lead visualisation for the “Energy Overview” section. This diagram, like “Primary Energy Consumption by Source and Sector, 2011 (p.37)”, is similarly problematic. This suggests that the data compiled in this report is highly multifaceted, which makes a summary visualisation difficult to produce and forces the designer to sacrifice visual congruence in order to meet the needs of the report.

Description of Original: Primary Energy Consumption by Source and Sector, 2011 (p.37)

Returning to the data visualisation for this report: “Primary Energy Consumption by Source and Sector, 2011 (p.37)” appears as the lead visualisation for the “Energy Consumption by Sector” section of the report and summarises the section that follows. Subsequent pages visualise the detailed data more concisely using histograms, pie charts, column charts, and tables. This data visualisation is therefore an attempt to condense myriad sources of information into a single visual representation, which is inherently difficult: it must summarise, answer several questions simultaneously, and meet the needs of various audience types at the same time.

Whole image split into 3 parts for explanation

In order to assess the whole image, it must first be separated into its 3 component parts: Source, Sector, and Throughput. Each section is then assessed using both Evergreen and Emery’s (2016) checklist and the multidimensional QDV (Question, Data, Visual) Trifecta Check-up.

The 3 sections are described below:



Plot 1
Source: U.S. Energy Information Administration / Annual Energy Review (2011), edited.


The Source section (plot 1) quantifies the amount of energy produced by each source: Petroleum, Natural Gas, Coal, Renewable Energy, and Nuclear Electric Power.

Evergreen and Emery (2016) Checklist

(total: 26/36 = 72.2%) < 85% threshold

Black and White example

(Figure: black and white reprint of the original)

Trifecta Check-up

Q. Single question, “what is the percentage build up of the sources of power for the grid?”

D. The data clearly displays the percentage build up of the sources of power for the grid.

V. The chart shows the percentage build up of the sources of power for the grid.

Q, D, and V are in sync for plot 1.



Plot 2
Source: U.S. Energy Information Administration / Annual Energy Review (2011), edited.


The Sector section (plot 2) quantifies the amount of energy used by various industry bodies and sectors: Transportation, Industrial, Residential & Commercial, and Electric Power.

The numbers in each section equate to the percentage of the whole generated power grid volume consumed by that sector. Transportation, for example, uses about 27.0 of the whole volume of the grid, which is displayed as 28% because of rounding errors accumulated across the various sources (as noted in the subtext). Each sector therefore carries a complicated set of enumerations, because each stacked box has 2 related, but different, numbers.
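The rounding effect mentioned in the subtext can be illustrated with a minimal R sketch. The three equal parts below are hypothetical and are not EIA figures; the point is only that percentages rounded one at a time need not add up to 100.

```r
# Hypothetical values (not EIA data): three equal parts of a whole
values <- c(a = 1, b = 1, c = 1)

# Exact percentage of the total for each part (33.33... each)
exact_pct <- values / sum(values) * 100

# Each percentage rounded independently, as in a published chart (33 each)
rounded_pct <- round(exact_pct)

sum(exact_pct)    # 100, up to floating-point error
sum(rounded_pct)  # 99, not 100
```

The same mechanism can push a displayed total above 100 when several shares happen to round up.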

The Sector section is colour coded to distinguish between the sectors. Transportation, for example, is coloured yellow, and the boxes become progressively darker further down the stack. The colour palette used (yellow to brown) implies that the lower boxes are potentially ‘dirtier’ than those above. This colour gradient follows the sectors rather than the size of the numbers in the boxes, suggesting, for example, that Transportation is less dirty than Industrial.

The final box, “Electric Power”, is actually the largest in terms of percentage (40%) and also the darkest brown. The stacked boxes are therefore not clearly ordered by size or by any other evident criterion.
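One way to remove this ambiguity is to order the stacked boxes by value. A minimal R sketch, assuming a named vector of sector shares: the Transportation (28%) and Electric Power (40%) values follow the text above, while the Industrial and Residential & Commercial values are placeholders for illustration only.

```r
# Sector shares in percent. Transportation and Electric Power follow the text;
# Industrial and Residential & Commercial are placeholder values.
sector_share <- c(
  "Transportation"           = 28,
  "Industrial"               = 20,  # placeholder
  "Residential & Commercial" = 12,  # placeholder
  "Electric Power"           = 40
)

# Order sectors from largest to smallest share
sector_order <- names(sort(sector_share, decreasing = TRUE))
print(sector_order)

# Used as factor levels, this order would control the stacking in a plot
sector_factor <- factor(names(sector_share), levels = sector_order)
```

Ordering by size gives the reader a reason for the position of each box, so the darkest colour and the largest share would no longer appear to conflict.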

Evergreen and Emery (2016) Checklist

(total: 20/36 = 55.6%) < 85% threshold

Trifecta Check-up

Q. Single question, “what is the percentage use of the sectors of power for the grid?”

D. The data clearly displays the percentage use of the sectors of power for the grid.

V. The chart shows sector use, but the sectors are not placed in any clear order.

V offers poor visuals for this section.



Plot 3
Source: U.S. Energy Information Administration / Annual Energy Review (2011), edited.


The Throughput section (plot 3) attempts to show a complicated set of information in a single section. It attempts to show the percentage of production distributed to each sector while also showing the use of each sector.

Visually, although the data is all enumerated (at the beginning and end of each flow line), the many intersecting lines and numbers make this section difficult to parse at first glance. Each single line in the Throughput has 2 different quantities associated with it: the output percentage and the sector use percentage.

In terms of colour, each flow line is matched to the colour of its source. This provides a clear visual cue linking the source to the throughput. However, these colours are hard to distinguish in a black and white rendering of the image.
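The two quantities on each flow line can be read as two different normalisations of one source-by-sector flow table: dividing by the source total gives the share of a source sent to a sector, while dividing by the sector total gives the share of a sector supplied by that source. A minimal R sketch with a hypothetical 2 × 2 table (placeholder values, not EIA data):

```r
# Hypothetical flows (placeholder units) from two sources to two sectors
flows <- matrix(
  c(2, 2,   # column 1: Sector X receives 2 from Source A and 2 from Source B
    1, 3),  # column 2: Sector Y receives 1 from Source A and 3 from Source B
  nrow = 2,
  dimnames = list(Source = c("A", "B"), Sector = c("X", "Y"))
)

# Share of each source's output sent to each sector (rows sum to 1)
output_share <- prop.table(flows, margin = 1)

# Share of each sector's use supplied by each source (columns sum to 1)
use_share <- prop.table(flows, margin = 2)

# The same flow line (A to X) carries two different percentages
round(100 * output_share["A", "X"], 1)  # 66.7
round(100 * use_share["A", "X"], 1)     # 50
```

Because the two numbers divide the same flow by different totals, they will generally differ, which is part of why each line is hard to read at a glance.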

Evergreen and Emery (2016) Checklist

(total: 15/36 = 41.7%) < 85% threshold

Trifecta Check-up

Q. Multiple questions: “what is the source percentage?” and “what is the sector use percentage?”

D. The data is confusing as it does not cohere along each line (the first number does not match the second number).

V. The chart shows source percentages and sector use percentages.

Q and D are problematic: there is no single question, and the data does not cohere.
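The percentages reported for the three plots follow directly from the checklist totals. A short R check reproduces them against the 85% threshold; the scores are those stated above, and the variable names are illustrative.

```r
# Evergreen and Emery (2016) checklist totals reported above (points out of 36)
scores    <- c(plot1 = 26, plot2 = 20, plot3 = 15)
max_score <- 36
threshold <- 85  # pass mark in percent, as used in this report

# Score as a percentage of the maximum, rounded to one decimal place
pct    <- round(scores / max_score * 100, 1)
passes <- pct >= threshold

print(pct)     # plot1 72.2, plot2 55.6, plot3 41.7
print(passes)  # all FALSE
```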

Summary of all sections

Key issues with the whole visualisation

The visualisation has attempted to pursue a complex summary. There are three distinct sections to the diagram, attempting a faceted data visualisation. When combined into the single visualisation, the data from several questions are compounded into a confusing picture and the reader must take time to parse several sections.

The Annual Energy visualisation has the following three main issues:

  1. Attempts to answer multiple questions with a single data visualisation, confusing the reader depending on which section they attend to,
  2. Does not justify the relationship between the Sector percentages, offering no clear reason for the organisation of the stacked column, and
  3. In attempting to meet the needs of a vast audience (beneficence), sacrifices clarity for data accuracy.

By attempting to answer a larger set of summary questions at the same time, the visualisation compounds several issues of colour, labelling, and organisation. The output visualisation is a clever use of two pillars (Source and Sector) with a flow diagram in the middle. In this way, it encapsulates several summaries in one picture: Source percentages, output percentages towards the various Sectors, the flow of energy between sectors, input percentages from the various Sources, and final Sector percentages. Because it sits within a larger context (a summary report for the EIA), it is followed by each of these smaller summaries in much clearer visualisations. Combined in a single visualisation, however, their independent problems are compounded.

Reconstruction

The following plots aim to fix the issues with the main plot. Each of the three sections of the original relies on a summary of a vast amount of data, and when sourced, that data proved problematic to parse and combine for this assignment. So, firstly, a contingency table was recreated using the key data from the original visual. The new data visualisations answer the question, “what is the energy use by sector?” By focusing on this question (rather than a vast summary), the main points of the summary are captured in a simpler set of visualisations. The reconstructions offered are a Mosaic plot, a Stacked Barchart, and a set of Coordinated Polar plots. Any one of these could answer a single summary question in place of the original.

Original data

Another issue was found with this particular data visualisation. The original data retrieved from the EIA includes (but is not limited to) the following data frames.

Because of the complex nature of these charts, and of drawing on multiple datasets, the original data was NOT used to construct the final plots.

Rather, a data frame was constructed from the data in the chart itself. See below, “Recreation of data frame”.

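# load the relevant libraries
library(vcd)     # vcd provides mosaic(); the mosaicplot() call below is base R graphics
library(dplyr)
library(tidyr)   # pivot_longer() for reshaping to long format
library(ggplot2) # used for the stacked bar chart and coordinated polar plots below
# gridExtra must also be installed; it is called as gridExtra::grid.arrange() below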

Recreation of data frame

The key figures were sourced from the original “Primary Energy Consumption by Source and Sector, 2011 (p.37)”.

With this contingency table in place, the following plots were created. Note: each of the following charts is described in terms of the graphical perception model offered by Cleveland and McGill (1985).

mosaicplot(energy_matrix, main = "Energy Use by Sector and Source", color = terrain.colors(5),
           xlab = "Sector", ylab = "Source", cex.axis = 0.8, las = 1)

Note: several attempts to rename the data tags were unsuccessful.
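mosaicplot() takes its tile labels and default axis titles from the dimnames of the table it is given, so one likely way to replace the 1.1 to 1.4 tags with sector names is to build the matrix with named row and column dimnames. The sketch below assumes Energy_use_sector_source has a Sector column plus one numeric column per source, as the pivot_longer() calls later in this section imply. All values are placeholders, not the figures read from the original chart.

```r
# Placeholder values only: replace with the percentages read from the original chart
Energy_use_sector_source <- data.frame(
  Sector = c("Transportation", "Industrial", "Residential & Commercial", "Electric Power"),
  Petroleum                = c(1, 1, 1, 1),
  `Natural Gas`            = c(1, 1, 1, 1),
  Coal                     = c(1, 1, 1, 1),
  `Renewable Energy`       = c(1, 1, 1, 1),
  `Nuclear Electric Power` = c(1, 1, 1, 1),
  check.names = FALSE  # keep the spaces in the source names
)

# Numeric part of the data frame as a matrix, one row per sector
energy_matrix <- as.matrix(Energy_use_sector_source[, -1])

# Named dimnames: mosaicplot() uses these for tile labels and default axis titles
dimnames(energy_matrix) <- list(
  Sector = Energy_use_sector_source$Sector,
  Source = colnames(Energy_use_sector_source)[-1]
)
```

With this matrix, the mosaicplot() call above should label each column with its sector name rather than a numeric tag.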

This Mosaic plot uses a column for each sector: 1.1 = Transport, 1.2 = Industrial, 1.3 = Residential and Commercial, 1.4 = Electric Power. The make-up of each sector can be seen more clearly than in the original, using ‘area’ and ‘colour hue’ as distinguishing markers.

A mosaic plot has several benefits over the original design. It maintains visual comparison accuracy (Cleveland and McGill, 1985) by using area which, although the “least accurate of the elementary tasks” (ibid, p. 829), does not rely on volume, density, colour hue, or saturation as the original does.

The colour differences in the sections (1.1, 1.2, 1.3, 1.4) are maintained in black and white printing, and the same colour represents the same source of power in each section. Although colour sits at the bottom of Cleveland and McGill’s (1985) “Aspect Judged” ranking (p. 229), it offers some comparative scope for the audience.

This mosaic only summarises one section: the Sector use by Source. It seems that this mosaic is more clearly aligned with the original visualisation’s title, “Primary Energy Consumption by Source and Sector”. It also offers some clarity for each production sector, something missing in the original visualisation.

# reshape the data to long format
Energy_use_sector_source_long <- pivot_longer(Energy_use_sector_source, -Sector, names_to = "Source", values_to = "Percent")

# create the stacked bar chart
ggplot(Energy_use_sector_source_long, aes(x = Sector, y = Percent, fill = Source)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#3355DD", "#66A61E")) +
  labs(title = "Energy use by sector and source", x = "Sector", y = "Percent", fill = "Source") +
  theme(legend.position = "bottom")

This stacked barchart produces a similar visualisation to the mosaic plot above. Like the mosaic plot, it relies on area and colour hue; neither is the strongest cue for graphical perception, but combining them means the chart does not rely on any single cue.

The stacked barchart answers a single summary question, “What is the source make-up of various sectors?”. Each EIA group can be clearly identified by colour (the colours could be adjusted to an industry standard for each section). This particular choice of colours does NOT work well in black and white reprints, however.
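The claim about black and white reprints can be checked numerically by converting each fill colour to an approximate grey level. A minimal sketch in base R, assuming the common Rec. 601 luma weights (0.299, 0.587, 0.114) as the greyscale conversion; an actual printer may convert colours differently.

```r
# Approximate grey level (0-255) of each colour using Rec. 601 luma weights
grey_level <- function(cols) {
  rgb <- grDevices::col2rgb(cols)        # 3 x n matrix: red, green, blue rows
  colSums(rgb * c(0.299, 0.587, 0.114))  # weight the R, G and B rows, then sum
}

# Palette used in the stacked bar chart above
bar_cols   <- c("#1B9E77", "#D95F02", "#7570B3", "#3355DD", "#66A61E")
levels_bar <- grey_level(bar_cols)

# Smallest difference between any two colours once printed in grey
min_gap <- min(diff(sort(levels_bar)))

round(levels_bar, 1)  # 114.4 120.9 121.1  90.3 131.4
round(min_gap, 2)     # 0.26
```

In this palette the orange (#D95F02) and purple (#7570B3) fills land within about a quarter of a grey step of each other, so they would be nearly indistinguishable in print.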

# create a list to store the plots
plots <- list()

# loop through each sector
for (i in 1:nrow(Energy_use_sector_source)) {
  
  # subset the data for the current sector
  data <- Energy_use_sector_source[i,]
  
  # convert the data to long format
  data_long <- pivot_longer(data, -Sector, names_to = "Source", values_to = "Percent")
  
  # create the polar plot
  p <- ggplot(data_long, aes(x = Source, y = Percent, fill = Source)) +
    geom_bar(stat = "identity", width = 1) +
    coord_polar() +
    scale_fill_manual(values = c("#f2a900", "#d7191c", "#abdda4", "#2b83ba", "#FDAEB4")) +
    labs(title = paste("Energy use by", data$Sector), x = "", y = "") +
    theme(legend.position = "none")
  
  # add the plot to the list
  plots[[i]] <- p
}

# display the plots
gridExtra::grid.arrange(grobs = plots, nrow = 2, ncol = 2)

Although pie charts present many “perceptual challenges” (see the Univariate and Bivariate Solutions section, Week 3 material, MATH2404, RMIT 2023), these coordinated polar plots are different.

These coordinated polar plots offer clear perceptual ordering through:

  1. a common position on an identical but non-aligned scale (rank 2 in Cleveland and McGill’s 1985 “Aspect Judged” ranking, p. 229), and
  2. visual perception using angle and volume combined (ranks 2 and 6 in the same ranking, p. 229).

However, because they are positioned as separate visualisations, they are not aligned to a common scale (rank 1 in Cleveland and McGill’s 1985 “Aspect Judged” ranking, p. 229).
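One possible way to address the common-scale limitation is to draw the four polar plots as facets of a single ggplot, since facet_wrap() shares the radial (y) scale across panels by default. A minimal sketch, assuming long-format data with Sector, Source and Percent columns as produced by pivot_longer() above; the values here are placeholders, not EIA figures.

```r
library(ggplot2)

# Placeholder long-format data (not EIA figures): Sector, Source, Percent
sources <- c("Petroleum", "Natural Gas", "Coal", "Renewable Energy", "Nuclear Electric Power")
sectors <- c("Transportation", "Industrial", "Residential & Commercial", "Electric Power")
energy_long <- expand.grid(Source = sources, Sector = sectors, stringsAsFactors = FALSE)
energy_long$Percent <- seq_len(nrow(energy_long))  # placeholder values

# One polar panel per sector; all panels share the same radial scale by default
p <- ggplot(energy_long, aes(x = Source, y = Percent, fill = Source)) +
  geom_col(width = 1) +
  coord_polar() +
  facet_wrap(~ Sector, nrow = 2) +
  labs(title = "Energy use by sector and source", x = NULL, y = NULL) +
  theme(legend.position = "bottom")
```

Printing p draws the four panels side by side on one radial scale, which keeps the polar form while allowing direct comparison between sectors.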

Although the coordinated polar plots meet more of the perceptual cues noted in the literature, they are separate plots. It is felt that either the mosaic or the stacked barchart would meet the needs of the EIA more substantially as a summary of summaries for its Annual Review reports.

References

  • Cleveland, W & McGill, R (1985) Graphical Perception and Graphical Methods for Analyzing Scientific Data, Science, 229(4716), 828-833, DOI: 10.1126/science.229.4716.828.

  • Computer Science Colby (n.d.) Good and Bad Examples, Colby Computer Science, accessed 28/03/2023, https://cs.colby.edu/courses/S14/cs251/goodbad.php

  • Gelman, A & Nolan, D (2017) Teaching Statistics: A Bag of Tricks, 2nd edn, Oxford University Press, New York.

  • U.S. Energy Information Administration (2012) Total Energy: Annual Energy Review 2011, Independent Statistics and Analysis: U.S. Energy Information Administration, accessed 28/03/2023, https://www.eia.gov/totalenergy/data/annual/pdf/sec2_3.pdf

Data Reference