Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

Original data visualization

Objective

The visualisation chosen had the following three main issues:

  • Failure to answer a practical question
  • Ethical issues such as perceived bias
  • Deceptive methods

  • The objective of the original data visualisation:

The objective of the map plot is to show population density across California and the population gap between counties, and to make the data easy to access and visually striking.

First, the author splits the 58 counties into three groups, each representing one third of the total population. The likely intent is to tell the audience which counties fall into the “Tier 1”, “Tier 2”, and “Tier 3” groups by population, and how large the gap between the groups is. The author therefore probably wants to convey the population gap between counties.

Second, the author uses a color scale rather than numbers to represent each county’s population. This can be seen as a way to improve the accessibility of the data, because people can distinguish colors immediately but need more time to process numbers. The audience can quickly get a rough understanding of population density across California, so another objective is likely to improve accessibility.

Finally, answering the “So what?” question, the purpose of this plot is probably to give people a broad picture of which counties are more prosperous, because counties with very large populations normally contain a megacity or a major central business district (CBD).

  • The targeted audience:

The plot gives few hints about its target audience. Migrants and people who want to build a career or make money in California are the most likely to search for information like this, because they want to know which county is the best place to develop their career or settle down (probably a big city). Moreover, the original data visualization comes from a US social community website, so the market it serves is the American public. The majority of the audience is therefore likely to be migrants, people who want to build a career or make money in California, and Americans in general.

  • Issue 1: Failure to answer a practical question

The author’s objective is to show population density across the state. However, telling the audience which group of counties holds one third of the population does not meet this objective.

Example:

There are 50 white counties, 6 pink counties, and 2 red counties. The audience can only tell that red counties have high populations, pink counties medium populations, and white counties low populations, because the 50 white counties, the 6 pink counties, and the 2 red counties each hold the same share of the total population. The audience can therefore see the population gap between the three groups and how the groups rank, but not the population of any single county. For example, the audience cannot tell what the population of Los Angeles is.
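
The arithmetic behind this point can be sketched in a few lines of R. The county counts come from the original map, and the one-third share per group is the map's own premise. The result is only a group average, which is all the map lets a reader infer.

```r
# Counties per color group, as counted on the original map.
n_counties <- c(white = 50, pink = 6, red = 2)

# Each group holds one third of the state population by construction.
group_share <- 1 / 3

# Average share of the state population per county within each group.
avg_share <- group_share / n_counties

# Expressed in percent: white 0.67, pink 5.56, red 16.67
round(100 * avg_share, 2)
```

An average of about 17% per red county says nothing about how that third is divided between the two red counties.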

Negative impact of the design:

This plot cannot tell the audience the population of any individual county.

  • Issue 2: Ethical issues such as perceived bias

As noted, the 50 white counties, the 6 pink counties, and the 2 red counties each hold the same total population. The audience might assume that counties with the same color have nearly the same population, but this is not true, so the map can mislead them.

Example:

Los Angeles and San Diego are both red counties, but Los Angeles has a population of around 10 million while San Diego has only around 3 million, so Los Angeles has roughly three times the population of San Diego. Similarly, Contra Costa and Alpine are both white counties, but Contra Costa has around 1.1 million people while Alpine has only around 1 thousand. There is also a huge population gap among the white counties.
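
As a rough check of the gaps described above, the following sketch computes the ratio of the largest to the smallest population within each color. The figures are the rounded values quoted in the text, not exact census counts, and the variable names are illustrative only.

```r
# Approximate populations quoted in the text (rounded, not exact counts).
pop <- c(
  "Los Angeles"  = 10e6,
  "San Diego"    = 3e6,
  "Contra Costa" = 1.1e6,
  "Alpine"       = 1e3
)

# Color each of these counties receives on the original map.
color <- c("red", "red", "white", "white")

# Ratio of the largest to the smallest population within each color.
spread <- tapply(pop, color, function(x) max(x) / min(x))
spread
```

With these rounded inputs the ratio is about 3.3 for the red pair and 1100 for the white pair, even though each pair shares a single color.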

Negative impact of the design:

This design can lead the audience to misunderstand each county’s population and the population gaps between counties.

  • Issue 3: Deceptive methods

The author splits the 58 counties into three groups and shows that each group represents one third of the total population. However, many different combinations of counties add up to one third of the total population, so the grouping is arbitrary. It does not help the audience identify which counties have larger populations, nor does it give the exact population of each county.

Example:

From the audience’s perspective, Los Angeles and San Diego are both red counties whose combined population is one third of the total. The audience could form at least two hypotheses: 1. Los Angeles is a very large county that accounts for almost the whole third by itself, while San Diego is a very small county that contributes almost nothing to the “red county group”. 2. Los Angeles and San Diego are both large counties that contribute equally to the “red county group”, meaning the two counties have nearly the same population. The design therefore confuses the audience.
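
The two hypotheses can be made concrete with a small sketch. All numbers below are hypothetical and chosen only to keep the arithmetic simple. They are not California figures, and `county_a` and `county_b` are placeholder names.

```r
# Hypothetical state total, chosen only so the arithmetic is easy to follow.
total <- 30
red_group_total <- total / 3   # the only quantity the original map encodes

# Two hypothetical ways the red group's population could be split.
split_uneven <- c(county_a = 9.9, county_b = 0.1)  # hypothesis 1
split_even   <- c(county_a = 5,   county_b = 5)    # hypothesis 2

# Both splits produce the same group total, so the map cannot tell them apart.
same_uneven <- isTRUE(all.equal(sum(split_uneven), red_group_total))
same_even   <- isTRUE(all.equal(sum(split_even), red_group_total))
c(same_uneven, same_even)
```

Both checks return `TRUE`, which is why a reader has no way to choose between the two hypotheses from the map alone.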

Negative impact of the design:

This design does not work for showing either the population of each county or the population gap between counties. It also confuses people, because it does not clearly show which county has the larger population and leaves the audience to guess.

Code

The following code was used to fix the issues identified in the original.

library(sf)
library(leaflet)
library(htmlwidgets)
library(htmltools)

California.shp <- st_read("/Users/macbook/Documents/RMIT-Master of Analytics/Data visualization and communication/CA_Counties/CA_Counties_TIGER2016.shp") %>% 
  sf::st_transform('+proj=longlat +datum=WGS84')
## Reading layer `CA_Counties_TIGER2016' from data source `/Users/macbook/Documents/RMIT-Master of Analytics/Data visualization and communication/CA_Counties/CA_Counties_TIGER2016.shp' using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 17 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -13857270 ymin: 3832931 xmax: -12705030 ymax: 5162404
## projected CRS:  WGS 84 / Pseudo-Mercator
p1 <- leaflet(California.shp) %>% 
  setView(lng = -119.4179, lat = 36.7783, zoom = 6)

p1 %>% addPolygons()
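California.Numbers <- readxl::read_excel("/Users/macbook/Documents/RMIT-Master of Analytics/Data visualization and communication/Assignment 2 dd.xlsx")

# Join the population table to the county polygons by county name.
# California.shp is an sf object, so merge() dispatches to the sf method.
merge.California <- merge(California.shp, California.Numbers, by = "NAME")

# Build the label text from the merged object so each population stays
# aligned with its county name (merge() may reorder rows).
Population_2019_1 <- formatC(merge.California$Population_2019,
                             format = "d", big.mark = ",")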

bins <- quantile(
  California.Numbers$Population_2019,
  probs = seq(0,1,.125), names = FALSE, na.rm = TRUE)
bins
## [1]     1129.00    19149.50    48048.25    98015.00   187029.00   355122.25
## [7]   709276.00  1502241.50 10039107.00
pal <- colorBin(
  "YlOrBr",
  domain = California.Numbers$Population_2019, 
  bins = bins
)

p2 <- leaflet(merge.California) %>% 
  setView(lng = -119.4179, lat = 36.7783, zoom = 6)
p2 %>% addPolygons(
  fillColor = ~pal(Population_2019),
  weight = 2,
  opacity = 1,
  color = "black",
  dashArray = "3",
  fillOpacity = 0.7)
labels <- sprintf(
  "<strong>%s</strong><br/>%s  Population",
  merge.California$NAME, 
  Population_2019_1
) %>% lapply(htmltools::HTML)

title <- tags$div(
  HTML('Population of each county in California - 2019')
)

# Final map: choropleth with hover highlight, labels, legend and title.
p3 <- p2 %>% addPolygons(
  fillColor = ~pal(Population_2019),
  weight = 1,
  opacity = 1,        # opacity values must lie between 0 and 1
  color = "black",
  dashArray = "3",
  fillOpacity = 1,
  highlightOptions = highlightOptions(
    weight = 5,
    color = "#666",
    dashArray = "",
    fillOpacity = 0.7,
    bringToFront = TRUE),
  label = labels,
  labelOptions = labelOptions(
    style = list("font-weight" = "normal", padding = "3px 8px"),
    textsize = "15px",
    direction = "auto")) %>% 
  addLegend(pal = pal, 
            values = ~Population_2019,
            opacity = 0.7, title = "Population",
            position = "bottomleft") %>% 
  addControl(title, position = "topright")
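
The quantile bins used above split the 58 counties into eight classes of roughly equal size. The sketch below shows how a population maps to a class, using the bin edges copied from the printed `quantile()` output. It assumes left-closed intervals with the last interval closed on the right, which is believed to match the default behavior of `colorBin()` (`right = FALSE`). `findInterval()` is used here only as a base R stand-in for that lookup.

```r
# Bin edges copied from the quantile() output printed above.
bins <- c(1129.00, 19149.50, 48048.25, 98015.00, 187029.00,
          355122.25, 709276.00, 1502241.50, 10039107.00)

# The lowest and highest bin edges are themselves county populations.
pop <- c(smallest = 1129, largest = 10039107)

# Index of the class each value falls into (1 = lightest, 8 = darkest).
bin_index <- findInterval(pop, bins, rightmost.closed = TRUE)
bin_index
```

The smallest value lands in class 1 and the largest in class 8, so the extremes of the data sit at opposite ends of the color scale.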

Data Reference

The link that contains the original data set: https://www2.census.gov/programs-surveys/popest/tables/2010-2019/counties/totals/co-est2019-annres-06.xlsx

Reconstruction

The following plot fixes the main issues in the original.