Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: ACMA Research and Analysis Section (2015).


Objective

  • The original visualization shows how the claimed $1.75 trillion of debt is distributed across US state governments, together with each state's assets and liabilities. Assets and liabilities are drawn as concentric circles in different colours, each annotated with the amount it represents. Assets are solid blue circles. Liabilities are red circles in five levels of saturation (baby pink to dark red): the area encodes the liabilities and the saturation encodes the debt ratio. The debt ratio is the ratio of liabilities to assets, expressed as a percentage.

  • The visualization is published on www.howmuch.net, a specialist organization that creates commercial data analyses and visualizations, as part of a demonstrative portfolio project. Its primary audience is therefore potential clients and other interested site visitors. The visualization is also available for republication in the media, where its audience can range from economists, financiers and investors to general voters, news readers and government officials.
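The debt ratio described above can be written as a one-line calculation. The sketch below is only an illustration: `debt_ratio_pct` is a hypothetical helper name, and the input values are made up rather than taken from the data set.

```r
# Hypothetical helper: debt ratio as liabilities divided by assets, in percent.
debt_ratio_pct <- function(liabilities, assets) {
  100 * liabilities / assets
}

# Illustrative values in billion $, not taken from the data set.
low  <- debt_ratio_pct(liabilities = 30, assets = 40)  # liabilities below assets
high <- debt_ratio_pct(liabilities = 90, assets = 30)  # liabilities three times assets

print(c(low = low, high = high))
```

A ratio above 100% means the state's liabilities exceed its assets.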

The visualisation chosen had the following three main issues:

  • The visualization overlays circles on each state's geographic location and uses their area to encode the state's values. Area is a poor encoding for quantities: readers cannot judge the scale of the values accurately, and the shapes do not support comparisons or analytical queries. For example, finding the five states with the largest or smallest assets is a painfully slow process.

  • The visualization shows the debt ratio with five saturation levels of red, so in effect five colors. The first four segments are each 25 percentage points wide (0-25, 25-50, 50-75 and 75-100), while the last segment covers everything over 100 with no upper limit. This is not only imprecise but also potentially deceptive. For example, Hawaii, Rhode Island, Illinois and New Jersey all fall into the same color segment. However, Hawaii and Rhode Island have a debt ratio of around 100%, while New Jersey and Illinois have a debt ratio of more than 300%. Putting them all in one category hides a threefold difference.

  • The last major problem is the way the authors laid out the annotations. There are at least 150 annotations on the plot. The authors use four colors (black, white, red and blue), and the color is not tied to the type of data being annotated: in some places white numbers are liabilities, and in others they are assets. The location and size of the annotations are not consistent either. As a result, the plot looks cluttered and overloaded with text, to the point where it is hard to match an annotation to its state, e.g. for AR, TN and KY.
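The binning problem in the second issue can be reproduced in a few lines. This is a minimal sketch: the class boundaries follow the legend described above, but the state names and ratios are illustrative placeholders, not the published figures.

```r
# Illustrative debt ratios in percent; placeholders, not the published figures.
ratios <- c(state_a = 20, state_b = 105, state_c = 310)

# Five classes as in the original legend: four 25-point bins plus an
# open-ended top bin for everything above 100.
bins <- cut(ratios,
            breaks = c(0, 25, 50, 75, 100, Inf),
            labels = c("0-25", "25-50", "50-75", "75-100", "100+"),
            include.lowest = TRUE)

# state_b and state_c land in the same class although their ratios
# differ by a factor of about three.
print(data.frame(ratio = ratios, bin = bins))
```

Plotting the ratio on a continuous axis, as the reconstruction does, avoids this loss of information.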

Reference

Code

The following code was used to fix the issues identified in the original.

# Required package imports
library("rvest")
library("usmap")
library(ggplot2)
library(dplyr)
library(tidyr)
library(gtable)
library(stringr)
library(grid)
library(plotly)
library("cowplot")

# Prepare to scrape table
url <- "https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQFBxscTcs04Jg5oVYY0XPdiIAVxJnJZgeKyGvOROeViQ55sJJXOK8RjCwCSOoybFmCgyBYwX69yXMS/pubhtml/sheet?headers=false&gid=0"

# Scrape table (read_html() replaces html(), which was removed from rvest)
data <- url %>%
  read_html() %>%
  html_nodes(xpath='/html/body/div/div/div/table') %>%
  html_table()

# Extract table and preprocess
df <- data[[1]][3:52,3:6]

featureNames <- c('State', 'Liabilities', 'Assets', 'Debt_Ratio_Percentage')

colnames(df) <- featureNames
df$Debt_Ratio_Percentage <- as.numeric(df$Debt_Ratio_Percentage)
df$Assets <- as.numeric(df$Assets)
df$Liabilities <- as.numeric(df$Liabilities)

df$State <- tolower(df$State)

# For the butterfly plot: states ordered by liabilities, reshaped to long format
df1 <- transform(df, variable=reorder(State, Liabilities) ) 
long <- gather(df1, metric, a, Liabilities:Assets , factor_key=TRUE)

# For the debt ratio plot: states ordered by debt ratio
df2 <- transform(df1, variable=reorder(State, Debt_Ratio_Percentage) ) 
df2 <- df2[,c("variable","Debt_Ratio_Percentage")]


# Plot 1, debt ratio lollipop plot
plot1 <- ggplot(df2, aes(x=variable, y=Debt_Ratio_Percentage)) +
  geom_segment( aes(xend=variable, yend=0)) +
  geom_point( size=3, color="lightsalmon1") +
  xlab("") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1,  size = 14))

# Plot 2, giant butterfly plot prep
fontsize = 4.5

# General theme
theme = theme(axis.text.y = element_blank(), axis.title.y = element_blank(), plot.title = element_text(size = 10, hjust = 0.5))

# Subplot 1, right side plot
assetsPlot <- ggplot(data = subset(long, metric =='Assets'), aes(x=variable)) +
  geom_bar(aes(y = a), stat = "identity", fill = "mediumaquamarine") +
  scale_y_continuous('', limits = c(0, 350), expand = c(0,0)) + 
  labs(x = NULL) +  ggtitle("Assets in billion $") + coord_flip() + theme + theme(plot.margin= unit(c(1, 0, 0, 0), "lines"))

# Fetch Grob for aligning on the side
assetsPlotGrob <- ggplotGrob(assetsPlot)


# Subplot 2, left side plot 
liabilitiesPlot <- ggplot(data = subset(long, metric == 'Liabilities'), aes(x=variable)) +
  geom_bar(aes(y = a), stat = "identity", fill = "darksalmon") +
  scale_y_continuous('', trans = 'reverse', limits = c(350, 0), expand = c(0,0)) +  labs(x = NULL) +
  ggtitle("Liabilities in billion $") + coord_flip() + theme +  theme(plot.margin= unit(c(1, 0, 0, 1), "lines"))

# Fetch Grob for aligning on the side
liabilitiesPlotGrob <- ggplotGrob(liabilitiesPlot)

# Tweak the axis to avoid empty space in the middle
rn <- which(liabilitiesPlotGrob$layout$name == "axis-l")
axis.grob <- liabilitiesPlotGrob$grobs[[rn]]
axisl <- axis.grob$children[[2]]
yaxis = axisl$grobs[[2]] 
yaxis$x = yaxis$x - unit(1, "npc") + unit(2.75, "pt")
panelPos = liabilitiesPlotGrob$layout[grepl("panel", liabilitiesPlotGrob$layout$name), c('t','l')]
liabilitiesPlotGrob <- gtable_add_cols(liabilitiesPlotGrob, liabilitiesPlotGrob$widths[3], panelPos$l)
liabilitiesPlotGrob <-  gtable_add_grob(liabilitiesPlotGrob, yaxis, t = panelPos$t, l = panelPos$l+1)
liabilitiesPlotGrob = liabilitiesPlotGrob[, -c(2,3)] 


# Middle plot, state names basically
middlePlot <- ggplot(data = subset(long, metric == 'Assets'), aes(x=variable)) +
  geom_bar(stat = "identity", aes(y = 0)) + geom_text(aes(y = 0,  label = variable), size = fontsize) +
  ggtitle("States") +  coord_flip() + theme_bw() + theme +  theme(panel.border = element_rect(colour = NA))

# Fetch grob and set titles for all the plots
middlePlotGrob <- ggplotGrob(middlePlot)
Title = middlePlotGrob$grobs[[which(middlePlotGrob$layout$name == "title")]]
middlePlotGrob = middlePlotGrob$grobs[[which(middlePlotGrob$layout$name == "panel")]]

# Combine left and right plots, make sure to leave some space in the middle for state names
gt = cbind(liabilitiesPlotGrob, assetsPlotGrob, size = "first")
maxlab = long$variable[which(str_length(long$variable) == max(str_length(long$variable)))]
gt = gtable_add_cols(gt, sum(unit(1, "grobwidth", textGrob(maxlab, gp = gpar(fontsize = fontsize*72.27/25.4))), unit(5, "mm")), 
                     pos = length(liabilitiesPlotGrob$widths))

# Add state names in the middle to properly align entire plot
gt = gtable_add_grob(gt, middlePlotGrob, t = panelPos$t, l = length(liabilitiesPlotGrob$widths) + 1)

# Set the title
titlePos = liabilitiesPlotGrob$layout$l[which(liabilitiesPlotGrob$layout$name == "title")]
gt = gtable_add_grob(gt, Title, t = titlePos, l = length(liabilitiesPlotGrob$widths) + 1)




# Export plots in high res for ease of display
png("myplot1.png", units="in", width=12, height=8, res = 300)
print(plot1)
dev.off()

png("myplot.png", units="in", width=12, height=9, res = 300)
# grid.draw() draws directly on the device and returns NULL, so print() is not needed
grid.draw(gt)
dev.off()
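The script above selects the table by fixed row and column offsets (`data[[1]][3:52,3:6]`) and converts text to numbers with `as.numeric()`, which turns unparseable strings into `NA`. A small check after the preprocessing step could catch a changed sheet layout early. The sketch below is an assumption-laden illustration: `validate_states` is a hypothetical helper that is not part of the original script, and the toy data frame only mimics the column names used above.

```r
# Hypothetical helper, not part of the original script: returns a character
# vector describing problems found in the preprocessed table (empty if none).
validate_states <- function(df, expected_rows = 50) {
  numeric_cols <- c("Liabilities", "Assets", "Debt_Ratio_Percentage")
  problems <- character(0)

  if (nrow(df) != expected_rows) {
    problems <- c(problems,
                  sprintf("expected %d rows, found %d", expected_rows, nrow(df)))
  }
  if (anyDuplicated(df$State) > 0) {
    problems <- c(problems, "duplicated state names")
  }
  for (col in numeric_cols) {
    if (anyNA(df[[col]])) {
      problems <- c(problems, sprintf("NA values in %s", col))
    }
  }
  problems
}

# Toy data with one duplicated name and one failed numeric conversion.
toy <- data.frame(
  State = c("alpha", "beta", "beta"),
  Liabilities = c(1, NA, 3),
  Assets = c(2, 4, 6),
  Debt_Ratio_Percentage = c(50, 25, 50)
)

problems <- validate_states(toy, expected_rows = 3)
print(problems)
```

In the real script the call would be `validate_states(df)` right after the `tolower()` step, assuming the sheet still lists exactly 50 states.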

Data Reference

Reconstruction

The following plots fix the main issues in the original.
US State-wise Debt Ratio in %


US State-wise Liabilities and Assets