Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The data visualisation comes from the U.S. Energy Information Administration (EIA). Its target audience is the general public, private analysts and policy makers who monitor the current status and trends of energy supply and consumption. The objective of the visualisation is to show the breakdown of what people pay for in each gallon of gasoline and diesel (EIA, 2019b).
The visualisation chosen had the following three main issues:
The visualisation is deceptive: the areas of the segments do not reflect the actual percentages, and the relative areas are falsely portrayed between the two fuels.
Even though the price per gallon is shown, the visualisation does not have clear axes and scaling, making it hard to compare the two different fuel types.
The visualisation uses an unconventional chart form (a fuel pump) that can confuse the audience. The distance between the two charts also makes it hard for the audience to make comparisons.
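The comparison problem can be made concrete with a little arithmetic. The sketch below uses made-up prices and shares (the names `price` and `taxes_share` are illustrative, and none of the numbers are EIA figures) to show that the same percentage share corresponds to different dollar amounts when the two fuels have different retail prices. That is why a shared, clearly labelled scale matters.

```r
# Hypothetical prices and shares for illustration only; not EIA figures.
price <- c(Gas = 2.50, Diesel = 3.00)        # dollars per gallon
taxes_share <- c(Gas = 0.20, Diesel = 0.20)  # share of the retail price

# The same 20% share corresponds to different dollar amounts per gallon.
taxes_dollars <- price * taxes_share
print(taxes_dollars)
```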
References
EIA. (2019a). Gasoline and Diesel Fuel Update. Retrieved September 15, 2019, from Energy Information Administration website: https://www.eia.gov/petroleum/gasdiesel/?fbclid=IwAR1ojecmgcn2GGMO87CTvTplOtwnSaBKF50_Mupfjxn0lTm8j5kCbUKmFRQ
EIA. (2019b). Information quality guidelines. Retrieved September 15, 2019, from Energy Information Administration website: https://www.eia.gov/about/information_quality_guidelines.php
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(dplyr)
library(tidyr)
library(rvest)
library(stringr)
#Scrape data from web table
gas_url <- "https://www.eia.gov/petroleum/gasdiesel/gaspump_hist.php"
diesel_url <- "https://www.eia.gov/petroleum/gasdiesel/dieselpump_hist.php"
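#Select the table nodes first; the second argument of read_html() is an encoding, not a selector
gasdata <- html_table(html_nodes(read_html(gas_url), "table"))[[1]]
dieseldata <- html_table(html_nodes(read_html(diesel_url), "table"))[[1]]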
#Use the first row as column names, then drop it
colnames(dieseldata) <- as.character(unlist(dieseldata[1,]))
colnames(gasdata) <- as.character(unlist(gasdata[1,]))
dieseldata <- dieseldata[-1,]
gasdata <- gasdata[-1,]
#Reconstruct data
july_gas <- gasdata %>% filter(`Mon-yr` == "Jul-19")
july_gas["Type"] <- c("Gas")
july_diesel <- dieseldata %>% filter(`Mon-yr` == "Jul-19")
july_diesel["Type"] <- c("Diesel")
#Reorder columns
july_gas <- july_gas[,c(7,1,2,3,4,5,6)]
july_diesel <- july_diesel[,c(7,1,2,3,4,5,6)]
#Drop date column
july_gas <- july_gas[,c(-2)]
july_diesel <- july_diesel[,c(-2)]
#Convert to long format
july_gas <- july_gas %>% gather(c(3:6), key = "Component", value = "Percentage")
july_diesel <- july_diesel %>% gather(c(3:6), key = "Component", value = "Percentage")
#Ensure values are correct data type
july_diesel$Percentage <- as.numeric(july_diesel$Percentage)
july_diesel$`Retail Price(Dollars per gallon)` <- as.numeric(july_diesel$`Retail Price(Dollars per gallon)`)
july_gas$Percentage <- as.numeric(july_gas$Percentage)
july_gas$`Retail Price(Dollars per gallon)` <- as.numeric(july_gas$`Retail Price(Dollars per gallon)`)
#Combine datasets
data <- rbind(july_gas, july_diesel)
#Create proportion column
data["prop"] <- data$Percentage/100
#Factor according to descending order of diesel values
data$Component <- factor(data$Component, levels = july_diesel$Component[order(july_diesel$Percentage, decreasing = TRUE)])
#Add a label column for percentages
data["percent_label"] <- paste0(as.character(data$Percentage),"%")
#Reconstruct visualisation
#Extract the retail prices as plain numbers for the legend labels
gasretail <- (data %>% filter(Type == "Gas"))[[1, "Retail Price(Dollars per gallon)"]]
dieselretail <- (data %>% filter(Type == "Diesel"))[[1, "Retail Price(Dollars per gallon)"]]
#Dodged bars on a shared percentage axis so the two fuels can be compared directly
p <- ggplot(data, aes(x = Component, y = prop, fill = Type)) + theme_minimal() +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +
  coord_cartesian(ylim = c(0,0.6)) +
  labs(title = "What we pay for in a gallon of:",
       caption = "Source: https://www.eia.gov/petroleum/gasdiesel/"
  ) +
  geom_text(position = position_dodge(width = 0.7), aes(label=percent_label), vjust = -0.5, size = 3.5) +
  #Legend labels carry the retail price and complete the title
  scale_fill_manual(labels = c(str_interp("Diesel: $${round(dieselretail,2)}"), str_interp("Gas: $${round(gasretail,2)}")), values = c("gold1", "red3")) +
  scale_y_continuous(labels = scales::percent) +
  #Shorten the component names on the x-axis
  scale_x_discrete(labels = c("Refining(percentage)" = "Refining", "Distribution & Marketing(percentage)" = "Distribution & Marketing", "Taxes(percentage)" = "Taxes", "Crude Oil(percentage)" = "Crude Oil")) +
  theme(
    legend.position = c(0,1), legend.justification = c(0,1),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.title = element_text(size = 18),
    legend.background = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    plot.caption = element_text(colour = "grey")
  ) +
  guides(fill = guide_legend(title = "", ncol = 1))
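The reshaping step in the script uses `gather()`, which tidyr now marks as superseded in favour of `pivot_longer()`. As a minimal sketch, the same wide-to-long conversion could be written as below. The `toy` data frame and its values are hypothetical stand-ins for the scraped table (they are not EIA figures), and the final check simply confirms that each fuel's components add up to 100%.

```r
library(tidyr)

# Hypothetical values for illustration only; these are not EIA figures.
toy <- data.frame(
  Type = c("Gas", "Diesel"),
  `Retail Price(Dollars per gallon)` = c(2.50, 3.00),
  `Refining(percentage)` = c(20, 15),
  `Distribution & Marketing(percentage)` = c(15, 20),
  `Taxes(percentage)` = c(15, 20),
  `Crude Oil(percentage)` = c(50, 45),
  check.names = FALSE
)

# Columns 3 to 6 hold the component percentages, as in the script above.
toy_long <- pivot_longer(
  toy,
  cols = 3:6,
  names_to = "Component",
  values_to = "Percentage"
)

# Each fuel's components should account for the whole retail price.
totals <- tapply(toy_long$Percentage, toy_long$Type, sum)
stopifnot(all(totals == 100))
```

Running a check like this on the real data before plotting would catch a column that failed to convert to numeric or a row that was filtered incorrectly.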
Data References
EIA. (2019c). Diesel Fuel Pump Components History. Retrieved September 15, 2019, from Energy Information Administration website: https://www.eia.gov/petroleum/gasdiesel/dieselpump_hist.php
EIA. (2019d). Gasoline Fuel Pump Components History. Retrieved September 15, 2019, from Energy Information Administration website: https://www.eia.gov/petroleum/gasdiesel/gaspump_hist.php
The following plot fixes the main issues in the original.