Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: EIA. (2019a).


Objective

The data visualisation was retrieved from the U.S. Energy Information Administration (EIA). Its target audience is the general public, private analysts and policy makers who monitor the current status and trends of energy supply and consumption. The objective of the visualisation is to show the breakdown of what people pay for in each gallon of gasoline and diesel (EIA, 2019b).

The visualisation chosen had the following three main issues:

  • The visualisation is deceptive: the areas do not reflect the actual percentages, and the relative areas are falsely portrayed between the two fuels.

  • Even though the price per gallon is shown, the visualisation has no clear axes or scale, making it hard to compare the two fuel types.

  • The visualisation uses an unconventional form (a fuel pump) that can confuse the audience, and the distance between the two charts makes comparisons difficult.
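To make the comparison issue concrete, the sketch below converts percentage shares into dollars per gallon in base R. The prices and shares are made-up placeholder values, not EIA figures, and the names `breakdown` and `dollars` are hypothetical.

```r
# Placeholder values for illustration only (NOT EIA data)
breakdown <- data.frame(
  Type       = c("Gas", "Gas", "Diesel", "Diesel"),
  Component  = c("Taxes", "Crude Oil", "Taxes", "Crude Oil"),
  Price      = c(2.50, 2.50, 3.00, 3.00),  # retail price, dollars per gallon
  Percentage = c(20, 50, 20, 50)           # share of the retail price
)

# Dollar amount per gallon attributable to each component
breakdown$dollars <- breakdown$Price * breakdown$Percentage / 100

print(breakdown)
```

With these placeholder numbers, the same 20% share corresponds to a different dollar amount for each fuel, which is why the retail prices need to be readable alongside a common percentage scale.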

References

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(dplyr)
library(tidyr)
library(rvest)
library(stringr)

#Scrape data from web table
gas_url <- "https://www.eia.gov/petroleum/gasdiesel/gaspump_hist.php"
diesel_url <- "https://www.eia.gov/petroleum/gasdiesel/dieselpump_hist.php"
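#Select the <table> elements on each page and parse the first one
#(the second argument of read_html() is an encoding, not a selector)
gasdata <- html_table(html_nodes(read_html(gas_url), "table"))[[1]]
dieseldata <- html_table(html_nodes(read_html(diesel_url), "table"))[[1]]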

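#Use the first row as the column names
colnames(dieseldata) <- as.character(unlist(dieseldata[1,]))
colnames(gasdata) <- as.character(unlist(gasdata[1,]))

#Drop the row that held the column names
dieseldata <- dieseldata[-1,]
gasdata <- gasdata[-1,]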

#Reconstruct data
july_gas <- gasdata %>% filter(`Mon-yr` == "Jul-19")
july_gas["Type"] <- c("Gas")
july_diesel <- dieseldata %>% filter(`Mon-yr` == "Jul-19")
july_diesel["Type"] <- c("Diesel")

#Reorder columns
july_gas <- july_gas[,c(7,1,2,3,4,5,6)]
july_diesel <- july_diesel[,c(7,1,2,3,4,5,6)]

#Drop date column
july_gas <- july_gas[,c(-2)]
july_diesel <- july_diesel[,c(-2)]

#Convert to long format
july_gas <- july_gas %>% gather(c(3:6), key = "Component", value = "Percentage")
july_diesel <- july_diesel %>% gather(c(3:6), key = "Component", value = "Percentage")

#Ensure values are correct data type
july_diesel$Percentage <- as.numeric(july_diesel$Percentage)
july_diesel$`Retail Price(Dollars per gallon)` <- as.numeric(july_diesel$`Retail Price(Dollars per gallon)`)
july_gas$Percentage <- as.numeric(july_gas$Percentage)
july_gas$`Retail Price(Dollars per gallon)` <- as.numeric(july_gas$`Retail Price(Dollars per gallon)`)

#Combine datasets
data <- rbind(july_gas, july_diesel)

#Create proportion column
data["prop"] <- data$Percentage/100

#Order the components by descending diesel percentage
#(only levels are set; mismatched labels would rename the categories)
data$Component <- factor(data$Component, levels = july_diesel$Component[order(july_diesel$Percentage, decreasing = TRUE)])

#Add a label column for percentages
data["percent_label"] <- paste0(as.character(data$Percentage),"%")

#Reconstruct visualisation
#Extract the retail prices as single numbers for the legend labels
gasretail <- (data %>% filter(Type == "Gas"))[[1, "Retail Price(Dollars per gallon)"]]

dieselretail <- (data %>% filter(Type == "Diesel"))[[1, "Retail Price(Dollars per gallon)"]]

p <- ggplot(data, aes(x = Component, y = prop, fill = Type)) + theme_minimal() +
    geom_bar(stat = "identity", position = "dodge", width = 0.7) +
    coord_cartesian(ylim = c(0,0.6)) +
    labs(title = "How much we pay for in a gallon of:",
         caption = "Source: https://www.eia.gov/petroleum/gasdiesel/"
    ) + 
    geom_text(position = position_dodge(width = 0.7), aes(label=percent_label), vjust = -0.5, size = 3.5) +
    scale_fill_manual(labels = c(str_interp("Diesel: $${round(dieselretail,2)}"), str_interp("Gas: $${round(gasretail,2)}")), values = c("gold1", "red3")) +
    scale_y_continuous(labels = scales::percent) +
    #Single x scale: strip the "(percentage)" suffix, then wrap long labels
    #(a second scale_x_discrete() call would replace the first)
    scale_x_discrete(labels = function(x) str_wrap(str_remove(x, fixed("(percentage)")), width = 25)) +
    theme(
        legend.position = c(0,1), legend.justification = c(0,1),
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(),
        plot.title = element_text(size = 18),
        legend.background = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.caption = element_text(colour = "grey")
        ) + 
    guides(fill = guide_legend(title = "", ncol = 1))
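
The factor-ordering step can be checked without scraping anything. The following is a minimal base R sketch on a toy data frame; the percentages are placeholders rather than EIA values, and the names `toy`, `diesel` and `ordered_levels` are hypothetical.

```r
# Toy long-format data; percentages are placeholders, not EIA values
toy <- data.frame(
  Type       = rep(c("Gas", "Diesel"), each = 3),
  Component  = rep(c("Refining", "Taxes", "Crude Oil"), times = 2),
  Percentage = c(20, 25, 55, 15, 30, 55),
  stringsAsFactors = FALSE
)

# Order the component names by the diesel percentages, largest first
diesel <- toy[toy$Type == "Diesel", ]
ordered_levels <- diesel$Component[order(diesel$Percentage, decreasing = TRUE)]

# Pass only `levels`: supplying `labels` in a different order would rename
# the categories and attach the wrong name to each bar
toy$Component <- factor(toy$Component, levels = ordered_levels)

print(levels(toy$Component))
```

Setting only `levels` changes the plotting order while leaving each row's component name untouched, which is the behaviour the bar chart relies on.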

Data References

Reconstruction

The following plot fixes the main issues in the original.