Original


Source: Amoros (2017).


Objective

As stated in the original visualisation itself (Amoros, 2017), the main objective is to show which U.S. states have the most and the least debt. The visualisation also educates the reader about the extent of debt held by U.S. state governments. All 50 states are shown so that readers can compare their own state's financial position with the others. The states are drawn according to their geographic location to reveal hotspots and to make neighbouring states easy to compare. Total assets and liabilities are shown to indicate whether a high (or low) debt ratio is mainly caused by large (or small) liabilities, small (or large) assets, or a combination of both.

The graph was made by and taken from howmuch.net, a website that creates visualisations and articles about finance and economics. Hence, the audience is members of the general public who are interested in finance or the economy. Since the visualisation is specific to the U.S., most of the audience are likely people from the U.S. who would like to see how their state government ranks against the other states, or people who have an interest in the U.S. economy.

The visualisation chosen had the following three main issues:

  • Using the area of a circle drawn beneath or on top of another circle to portray quantity (the amount of assets and liabilities). Area is not read as accurately as, for example, position. Moreover, because one circle is drawn on top of the other, the bottom circle can appear to have less area than it actually does. For example, for California, most of the red circle is covered by the blue one.
  • The visualisation shows incorrect numbers (or at least numbers for the wrong year). The original visualisation was published in 2017 while the source was published in 2018, so it is likely that the source was updated after the visualisation was published. The numbers in the visualisation differ from those in the source. For example, for California's total assets, the visualisation shows $249.4B (Amoros, 2017) while the source shows $287.79B (DePietro, 2018). The source obtained its data from each state's 2017 Comprehensive Annual Financial Report (CAFR), so I checked the 2017 CAFRs of a few states: Alaska (Department of Administration, 2018), California (Yee, 2018), New York (DiNapoli, 2017), and Texas (Hegar, 2018). From these, it seems that the source has the correct numbers and the visualisation does not. The numbers in the visualisation appear to be the 2016 sums of liabilities and deferred inflows and of assets and deferred outflows, even though the title of the visualisation says 2017.
  • The debt ratio is depicted with colors, and the liability circles are filled with these colors. As stated above, I think the main objective of the visualisation is to rank the states by debt ratio, so I would argue that debt ratio is the main variable here. However, it is difficult to identify the state with the highest or lowest debt ratio because the value is binned into 5 shades of red. Moreover, the liability circles are drawn in these different shades. This is not ideal because comparing the areas of circles with different fill and background colors can create an illusion that makes a circle appear smaller or larger.
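The second issue turns on how the debt ratio is put together, so a small worked example may help. The sketch below applies the formula used in the reconstruction's subtitle, debt ratio = (total liabilities + deferred inflows) / (total assets + deferred outflows), to made-up figures. All four input numbers and the helper name `debt_ratio` are assumptions for illustration only; they are not taken from any state's CAFR.

```r
# Hypothetical helper: debt ratio in percent, with deferred items included.
debt_ratio <- function(liabilities, deferred_inflows, assets, deferred_outflows) {
  100 * (liabilities + deferred_inflows) / (assets + deferred_outflows)
}

# Made-up figures in USD billion (NOT real state data).
total_liabilities <- 150
deferred_inflows  <- 15
total_assets      <- 200
deferred_outflows <- 20

ratio <- debt_ratio(total_liabilities, deferred_inflows,
                    total_assets, deferred_outflows)
print(ratio)
```

With these inputs the numerator is 165 and the denominator is 220. A chart that labels those sums as "liabilities" and "assets" would therefore show larger amounts than the CAFR's plain totals of 150 and 200.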

Reference

Code

The following code was used to fix the issues identified in the original.

library(rvest)
library(tidyverse)


#################################################################
##                         Scrape Data                         ##
#################################################################

url <- 'https://www.gobankingrates.com/making-money/states-least-amount-debt/'
page <- read_html(url)

# Select nodes that contain all of the following 3 classes.
items <- page %>% html_nodes(".col-xs-12.col-sm-offset-1.col-sm-10")

# Read the data from <h2> and <li> HTML tags.
# Each item yields one row per <li>; the state name from <h2> is recycled.
df <- items %>%
      map_df(~{
                 tibble(
                        State = html_node(.x, "h2") %>% html_text(trim = TRUE),
                        Data = html_nodes(.x, "ul > li") %>% html_text(trim = TRUE)
                 )
      })


##################################################################
##                          Preprocess                          ##
##################################################################

# Remove non-breaking spaces.
df$Data <- df$Data %>% gsub(pattern = "\u00A0", replacement = "", fixed = TRUE)

# Filter rows. Separate variable names and value.
df <- df %>%
      filter(grepl("Total assets:|Total liabilities:|Debt ratio:", Data)) %>%
      separate(Data, into = c("Variable", "Value"), sep = ":")

# State column: remove number.
df$State <- gsub("[0-9]*\\. ", "", df$State)

# Value column:
# - Remove "percent", "$", and "billion".
# - Convert to numeric.
# - Round to two decimal places.
df$Value <- df$Value %>%
            gsub(pattern = "percent", replacement = "") %>%
            gsub(pattern = "billion", replacement = "") %>%
            gsub(pattern = "\\$", replacement = "") %>%
            as.numeric() %>%
            round(2)

# Add a new column to label a state's rank on each variable.
# Rank 1 is the largest value; ungroup afterwards so later steps see a plain tibble.
df <- df %>%
      group_by(Variable) %>%
      mutate(Rank = rank(-Value, ties.method = "min")) %>%
      ungroup()


##################################################################
##                             Plot                             ##
##################################################################

# Add an asterisk to Alabama because they use numbers from an older report.
df$State[df$State == "Alabama"] <- "*Alabama"

# Get the order of state by debt ratio.
stateOrder <- df %>% filter(Variable == "Debt ratio") %>% arrange(Value) %>% .$State

# Order the variables in the order we want them to appear in the plot.
variableLevels <- c("Debt ratio", "Total liabilities", "Total assets")
variableLabels <- c("Debt ratio (percent)", 
                    "Total liabilities (USD billion)",
                    "Total assets (USD billion)")
df$Variable <- factor(df$Variable,
                      levels = variableLevels,
                      labels = variableLabels)

# Dummy data frame containing 2 rows: an asset and a liability.
# The asset row will have the value of the largest liability.
# The liability row will have the value of the largest asset.
# This dummy data makes the axis limits of assets and liabilities the same,
# so that the bars of assets and liabilities are comparable.
dummy <- data.frame(State = c("California", "California"), # Pick any state.
                    Variable = factor(c("Total assets", "Total liabilities"),
                                      levels = variableLevels,
                                      labels = variableLabels),
                    Value = c(max(df %>% filter(Variable == "Total liabilities (USD billion)") %>% .$Value),
                              max(df %>% filter(Variable == "Total assets (USD billion)") %>% .$Value)))

# Assets and liabilities share a color (same unit and same scale),
# but debt ratio has a different color.
pal <- c("#E7298A", "#29E786", "#29E786")

# Create the plot.
p <- ggplot(data = df,
            # Order the states by debt ratio.
            aes(x = factor(State, levels = stateOrder), y = Value, fill = Variable)) +
     geom_bar(stat = "identity", show.legend = FALSE) + coord_flip() +
     # Invisible layer that gives the asset and liability panels the same axis range.
     geom_blank(data = dummy) +
     facet_grid(.~Variable, scales = "free") +
     # Add rank.
     geom_text(aes(label = Rank, y = 0), hjust = 1, size = 2.9) +
     scale_fill_manual(values = pal) +
     labs(title = "Debt of U.S. States Ranked - 2017",
          subtitle = "debt ratio = (total liabilities + deferred inflows) / (total assets + deferred outflows)",
          # Build the caption line by line to avoid stray indentation in the output.
          caption = paste("Source: Andrew DePietro (2018) - https://www.gobankingrates.com/making-money/states-least-amount-debt/",
                          "The numbers next to the bars indicate each state's rank for that variable.",
                          "*Alabama data are from fiscal year 2016.",
                          sep = "\n")) +
     theme(plot.title = element_text(face = "bold"),
           axis.title.x = element_blank(),
           axis.title.y = element_blank())

# Display the plot.
p
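
Because the scraped page can change over time, it may be worth checking the long-format table before plotting. The following is a minimal sketch in base R, not part of the original code: the helper name `check_debt_data` and the toy data frame are hypothetical, and the checks assume only the `State`, `Variable` and `Value` columns created above.

```r
# Hypothetical helper: basic sanity checks on the long-format table.
check_debt_data <- function(data, n_states = 50) {
  counts <- table(data$State)
  list(
    n_states_ok     = length(counts) == n_states,  # expected number of states
    three_per_state = all(counts == 3),            # ratio, liabilities, assets
    no_missing      = !anyNA(data$Value)           # as.numeric() parsed every value
  )
}

# Toy data with two made-up states, used only to demonstrate the helper.
toy <- data.frame(
  State    = rep(c("A", "B"), each = 3),
  Variable = rep(c("Debt ratio", "Total liabilities", "Total assets"), times = 2),
  Value    = c(50, 10, 20, 25, 5, 20),
  stringsAsFactors = FALSE
)

result <- check_debt_data(toy, n_states = 2)
print(result)
```

On the real data the call would be `check_debt_data(df)`. A `FALSE` in `no_missing` would suggest that some value still contained text that `as.numeric()` could not parse.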

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

Since the states are ordered by debt ratio, the reconstructed visualisation loses the spatial arrangement of the states. But since I believe the main objective of the original visualisation is to rank the states by debt ratio, I think losing the spatial arrangement is an acceptable trade-off, especially as a map of U.S. states can easily be found on the internet.