Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: ScarletteOScare (2022).


Objective

The objective of this visualization is to show the income tax percentage for each state in the United States.

Target Audience

The target audience is residents of the United States who pay income tax there, as well as people who are considering moving to the US.

The visualization chosen had the following three main issues:

  • Visualization layout - the layout does not make it easy to identify which states have the highest and lowest income tax percentages. Readers may need a long time to work out which color band a state belongs to, because each state's color has to be matched against the color scale one by one.

  • Color scale - a sequential color scale is better suited to ordinal data, whereas the income tax percentage is quantitative data. Readers get only a rough idea of which range a state belongs to, because some of the colors are too similar to one another.

  • Data integrity - the legend does not explain what the yellow color means, and the scale labels do not state what the values represent or what unit they are in (e.g. percentage or US dollars). This could confuse or mislead readers, who might take away wrong information from the visualization.

Reference

Code

The following code was used to fix the issues identified in the original.

library("rjson")
library("dplyr")
library("ggplot2")
library("shadowtext")

#read and convert data to dataframe
myData <- fromJSON(file="newdata.json")
data <- as.data.frame(myData)

#select and sort needed data
new <- select(data,c("State","IncomeTax"))
colnames(new) <- c("state","IncomeTax")

new$state<-factor(new$state,
                  levels = new$state[order(new$IncomeTax,decreasing = FALSE)])

# horizontal bar chart: bar length and fill colour both encode the tax rate
p2 <- ggplot(data = new, aes(x = IncomeTax, y = state, fill = IncomeTax))

# geom_col() draws one bar per row, using the value itself as the bar length
p2 <- p2 + geom_col() +
  theme(
    axis.text = element_text(size = 13),
    axis.title = element_text(size = 25),
    plot.title = element_text(size = 30),
    panel.grid.major.x = element_line(color = "gray10", size = 0.5, linetype = 1),
    panel.grid.minor.x = element_line(color = "gray20", size = 0.5, linetype = 2)) +
  # the mapped aesthetic is fill, so the gradient has to be a fill scale
  scale_fill_gradient(low = "blue", high = "red") +
  # print the value next to each bar
  geom_shadowtext(aes(label = paste(IncomeTax, "%")), color = "white", size = 5, fontface = "bold", vjust = 0.4, nudge_x = .5) +
  labs(
    title = "Income Tax for Each State in US 2022",
    y = "States",
    x = "Income Tax %") +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 15), breaks = seq(0, 15, 2))

# display the plot
p2
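The sorting step in the code above relies on setting the factor levels of the state column in order of tax rate, since ggplot2 draws a discrete axis in level order. A minimal sketch of that idea on a toy data frame follows; the state names "A", "B", "C" and the rates are placeholders made up for illustration, not values from the dataset.

```r
# Toy data: placeholder names and made-up rates, NOT the real dataset
toy <- data.frame(state = c("A", "B", "C"),
                  IncomeTax = c(5.1, 0.5, 9.9))

# order() returns the row indices that sort IncomeTax ascending;
# using the states in that order as factor levels fixes the axis order
toy$state <- factor(toy$state,
                    levels = toy$state[order(toy$IncomeTax, decreasing = FALSE)])

# Levels now run from the lowest rate to the highest
print(levels(toy$state))
```

With ascending levels on the y axis, the first level is drawn at the bottom, so the state with the highest rate ends up at the top of the chart.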

Note that the source data is in JSON format; I modified the data structure so that it could be loaded with the rjson library.

I have uploaded the Python preprocessing code to GitHub: https://github.com/LukeHii97/Visualization-Assignment-2--Preporcessing-json-data/tree/main
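The exact structure of newdata.json is not shown here, so the following is only a sketch of one layout that converts cleanly: a JSON object with one array per column. The column names match those selected in the code above, but the JSON text and its values are hypothetical placeholders.

```r
library("rjson")

# Hypothetical column-oriented JSON (assumed layout, placeholder values)
json_text <- '{"State": ["A", "B", "C"], "IncomeTax": [5.1, 0.5, 9.9]}'

# fromJSON() returns a named list with one vector per JSON array
parsed <- fromJSON(json_str = json_text)

# A named list of equal-length vectors converts directly to a data frame
df <- as.data.frame(parsed)
str(df)
```

A layout with one JSON object per state would instead parse to a list of records and would need an extra step to bind the records into rows, which may be why restructuring the file was the simpler route.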

Data Reference

Reconstruction

The following plot fixes the main issues in the original.