Objective
In 2019, economic inequality in the U.S. reached its highest level in more than half a century, and the World Economic Forum recently warned that income inequality could worsen as a result of the coronavirus crisis. This visualization provides a snapshot of economic inequality in the U.S. prior to the outbreak by showing the distribution of wage income in 2018.
The visualization draws upon 2018 wage data from the Social Security Administration. The circle graph represents 100% of the total wages earned in the U.S. Each slice of the circle represents the percentage of Americans whose net compensation fell within a certain interval, such as $0-$4,999 or $5,000-$9,999. The larger the slice of the circle, the higher the percentage of Americans within that net compensation range. In addition, each slice of the circle is color-coded. The shades of pink indicate lower wages, while the shades of blue and green indicate higher wages.
The visualisation chosen had the following three main issues:
The irregular, unordered shapes (pentagons, hexagons, etc.) used to divide the circle make the chart hard to read accurately. Even if the area of each shape is proportional to its percentage, that proportion cannot be judged visually, so the distribution looks deceptive. For example, the visual gives no indication that the intervals ‘$20K - $24.9K’ and ‘$15K - $19.9K’ have the same percentage, 6.52.
Because all of the data is crowded into a single pie chart, it is not possible to compare any two intervals. The values for some intervals, such as ‘$175K - $199.9K’ and ‘$200K - $249.9K’, are hard to read.
The colours used in the visualisation are poorly chosen. The intervals ‘$250K - $1M’ and ‘> $1M’ sit next to each other, and the colours used to differentiate them are not easily distinguishable.
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(readr)
library(forcats)
wage_inequality <- read.csv("C:/Users/SHRIYA SALIL BAGWE/Desktop/Sem 1/Data visual/Assignment 2.csv")
str(wage_inequality)
## 'data.frame': 59 obs. of 5 variables:
## $ Interval : Factor w/ 59 levels "$1,000K - $1,499.9K",..: 58 49 6 17 30 32 36 38 42 44 ...
## $ Number : Factor w/ 59 levels "1,01,57,201",..: 18 6 5 4 3 2 1 58 57 53 ...
## $ Percentage : num 12.47 7.75 6.89 6.52 6.52 ...
## $ Aggregate.amount: Factor w/ 59 levels "1,05,39,43,46,162",..: 41 58 7 12 20 24 32 34 35 36 ...
## $ Average.amount : Factor w/ 59 levels "1,02,433","1,07,444",..: 26 52 23 25 31 33 37 39 43 45 ...
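# Note: Number was read as a factor, so as.numeric() returns the integer
# level codes (see the second str() output), not the earner counts.
wage_inequality$Number <- as.numeric(wage_inequality$Number)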
wage_inequality$Interval <- as.character(wage_inequality$Interval)
str(wage_inequality)
## 'data.frame': 59 obs. of 5 variables:
## $ Interval : chr "Less than $5K" "$5K - $9.9K" "$10K - $14.9K" "$15K - $19.9K" ...
## $ Number : num 18 6 5 4 3 2 1 58 57 53 ...
## $ Percentage : num 12.47 7.75 6.89 6.52 6.52 ...
## $ Aggregate.amount: Factor w/ 59 levels "1,05,39,43,46,162",..: 41 58 7 12 20 24 32 34 35 36 ...
## $ Average.amount : Factor w/ 59 levels "1,02,433","1,07,444",..: 26 52 23 25 31 33 37 39 43 45 ...
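The second str() output shows Number as 18 6 5 4 ..., which are the factor's level codes rather than earner counts. The following is a minimal sketch of one way to recover the values, assuming the commas in the CSV are only grouping separators. `parse_count` is a hypothetical helper name, not part of the original script, and the toy factor simply reuses two strings visible in the str() output above.

```r
# Hypothetical helper (not in the original script): convert count strings
# with grouping commas to numbers, whether stored as factor or character.
parse_count <- function(x) {
  # as.character() recovers the labels; gsub() strips the commas
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

# Toy factor built from two level strings shown in the str() output
counts <- factor(c("1,01,57,201", "1,02,433"))

as.numeric(counts)   # level codes, not the values
parse_count(counts)  # numeric values with the commas removed
```

Under the same assumption, applying `parse_count()` to the Number column in place of the bare `as.numeric()` call would give the bar chart real counts to plot.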
plot <- ggplot(data = wage_inequality, aes(reorder(Interval, Number), Number))
plot1 <- plot + geom_bar(stat = "identity", fill = "#f68099") +
  coord_flip() +
  theme_bw() +
  labs(x = "Compensation Interval", y = "Number of Earners",
       title = "How much Americans Make in Wages",
       subtitle = "Compensation Intervals & Number of Earners") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 20, hjust = 0.5, colour = "black"),
        plot.subtitle = element_text(size = 20, hjust = 0.5, colour = "black"),
        text = element_text(size = 20))
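The chart above sorts the bars by count with `reorder()`. As a possible alternative, the sketch below keeps the intervals in the order the rows appear in the data, which is assumed here to be ascending wage order. The toy data frame reuses the first four interval labels and percentages from the str() output; it is an illustration, not the author's method.

```r
# Toy data: first four intervals and percentages from the str() output
toy <- data.frame(
  Interval = c("Less than $5K", "$5K - $9.9K", "$10K - $14.9K", "$15K - $19.9K"),
  Percentage = c(12.47, 7.75, 6.89, 6.52),
  stringsAsFactors = FALSE
)

# Set the factor levels to row order (assumed ascending by wage) so that
# a discrete axis follows this order rather than alphabetical order.
toy$Interval <- factor(toy$Interval, levels = unique(toy$Interval))

levels(toy$Interval)
```

Mapping this factor directly in `aes()` without `reorder()` would draw the bars in wage order. Whether that reads better than sorting by count depends on what comparison the chart is meant to support.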
Data Reference
The following plot fixes the main issues in the original.