Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The visualization chosen for this assignment was published in 2019 by HowMuch.net, a financial literacy website, in an article written by Raul Amoros. It compares the annual salary needed to buy a home with the median house price in each of the 50 largest US metro areas for that year. Note that the figures in the article and visualization assume the prospective buyer makes a 20% down payment. The target audience is people interested in the United States real estate market in general, and potential home buyers in particular. The objective of the visual is to show the salary needed to afford a median-priced home in each metro area.
The visualization has the following three main issues:
Visualization is overcrowded: Although the author used a geographical map to pinpoint each city’s exact location, this caused labels to cluster in certain parts of the map and made the visualization less readable. The article also discusses the results by region (such as the Midwest and the East Coast), and these regions are hard to identify on a map crowded with text.
Inconsistency in fonts: Although it can be deduced that the size of each city name is related to the annual salary needed to buy a house, the inconsistent sizing creates more confusion than clarity. The text sizes do not establish a clear hierarchy. Some city names are in bold, which distracts the audience and makes comparison more difficult.
The use of legend colours and 3D visual cues: Median house prices are presented in different shades of green, which can be hard to distinguish (e.g. the range $250-$499K looks quite similar to $500-$999K). In addition, the annual salary legend uses 2D triangles, while on the map salaries are represented by 3D ‘cone’ shapes, making the two difficult to compare. Having two legends also requires the audience to switch constantly between the chart and the legends.
Reference
Salary Needed To Buy a House in Largest U.S. Metros. (2019). HowMuch.net. Retrieved November 17, 2022, from https://cdn.howmuch.net/articles/salary-needed-to-buy-a-house-in-largest-us-metros-f6cf-af2f.jpg
Amoros, R. (n.d.). Visualizing how much is needed to buy a house in the 50 largest metros in U.S. HowMuch. Retrieved November 17, 2022, from https://howmuch.net/articles/salary-needed-to-buy-a-house-in-largest-us-metros
The following code was used to fix the issues identified in the original.
library("ggplot2")
library("readxl")
library("here")
# Show the project root (the assignment 2 folder) that here() resolves paths
# against; this call does not change the working directory
here::here()
## [1] "C:/Users/User/Documents/Uni/Data Visualisation and Communication - MATH2404/A2"
# Using the "here" package to reference the subfolder and reading in the
# dataframe
df <- read_excel(here("data", "data.xlsx"))
# Remove the first row, which is the national average
df <- df[-1, ]
# Removing columns not required for the visualization
df <- df[,-c(3,4,6,7)]
# Assigning the region based on state into a new column; states outside
# these lists get a missing value (NA)
west      <- c("WA", "OR", "CA", "NV", "UT", "CO", "WY", "MT", "ID")
southwest <- c("AZ", "NM", "TX", "OK")
midwest   <- c("ND", "SD", "NE", "KS", "MO", "IL", "IN", "OH", "MI",
               "WI", "MN", "IA")
northeast <- c("PA", "NY", "ME", "VT", "NH", "MA", "RI", "CT", "NJ")
southeast <- c("AR", "LA", "MS", "AL", "FL", "GA", "SC", "NC", "VA",
               "DC", "DE", "MD", "WV", "KY", "TN")
df$Region <- ifelse(df$State %in% west, "West",
             ifelse(df$State %in% southwest, "Southwest",
             ifelse(df$State %in% midwest, "Midwest",
             ifelse(df$State %in% northeast, "Northeast",
             ifelse(df$State %in% southeast, "Southeast",
                    NA_character_)))))
# Converting columns to appropriate data types
df$Region <- as.factor(df$Region)
df$`Metro Area` <- as.factor(df$`Metro Area`)
# Generating the scatter plot and selecting the x and y axes as well as the
# colour legend. Both axes have a natural log transformation applied because
# the values vary widely
p1 <- ggplot(data = df, aes(x = log(`Salary Needed`),
                            y = log(`Median Home Price`),
                            color = Region))
p1 <- p1 + geom_point() +
  scale_color_manual(values = c("West" = "#E69F00",
                                "Southwest" = "#56B4E9",
                                "Midwest" = "#999999",
                                "Northeast" = "#0072B2",
                                "Southeast" = "#CC79A7")) +
  labs(title = "Salary Required for Median House price",
       subtitle = "Top 50 Largest U.S Metros grouped by Region for 2019",
       x = "Salary Needed (log)",
       y = "Median House Price (log)",
       color = "Region") +
  theme_bw()
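The code above log-transforms both variables inside aes(), so the axis ticks show log values rather than dollars. As a possible variation (a minimal sketch, not the method used for the reconstruction), the raw values could be plotted on log-scaled axes so that the ticks stay in dollars. The data frame toy and its columns salary and price are hypothetical, made-up values for illustration only and are not the HowMuch.net figures. Note that scale_x_log10() uses base 10 rather than the natural log used above, which changes the tick positions but not the relative layout of the points.

```r
library(ggplot2)

# Made-up illustrative values, NOT the HowMuch.net data
toy <- data.frame(
  salary = c(40000, 60000, 90000, 250000),
  price  = c(150000, 250000, 400000, 1200000),
  Region = c("Midwest", "Southeast", "Northeast", "West")
)

# Plot the raw dollar values and let the scales apply the log transformation,
# so the axis labels remain readable dollar amounts
p <- ggplot(toy, aes(x = salary, y = price, color = Region)) +
  geom_point() +
  scale_x_log10(labels = scales::label_dollar()) +
  scale_y_log10(labels = scales::label_dollar()) +
  labs(x = "Salary Needed (log scale)",
       y = "Median House Price (log scale)",
       color = "Region") +
  theme_bw()
p
```

Whether this reads better than labelled log values depends on the audience; dollar-labelled ticks avoid asking readers to interpret logarithms.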
Data Reference
Salary Needed To Buy a House in Largest U.S. Metros. (2019). HowMuch.net. Retrieved November 17, 2022, from https://cdn.howmuch.net/articles/salary-needed-to-buy-a-house-in-largest-us-metros-f6cf-af2f.jpg
Amoros, R. (n.d.). Visualizing how much is needed to buy a house in the 50 largest metros in U.S. HowMuch. Retrieved November 17, 2022, from https://howmuch.net/articles/salary-needed-to-buy-a-house-in-largest-us-metros
Amoros, R. (n.d.). Do you earn enough to afford a house in the largest U.S. metros? HowMuch. Retrieved November 17, 2022, from https://howmuch.net/sources/salary-needed-to-buy-a-house-in-largest-us-metros
National Geographic Society. (n.d.). United States regions. Retrieved November 18, 2022, from https://education.nationalgeographic.org/resource/united-states-regions
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Wickham, H., & Bryan, J. (2022). readxl: Read Excel files (R package version 1.4.1). https://CRAN.R-project.org/package=readxl
Müller, K. (2020). here: A simpler way to find your files (R package version 1.0.1). https://CRAN.R-project.org/package=here
The following plot fixes the main issues in the original.