Original


Source: r/DataIsUgly Metro Detroit Coronavirus Cases (2020).


Objective

The objective of the original data visualisation was to show and compare the number of Covid-19 cases in Wayne County and Metropolitan Detroit. The visualisation was taken from Reddit’s r/dataisugly. The target audience would be people residing in the state of Michigan in the United States, as Wayne County and Detroit are situated next to each other and are considered a metropolitan area of Michigan. The visualisation was likely used to inform locals about the current state of coronavirus cases in their area.

The chosen visualisation had the following three main issues:

  • The first problem is that the visualisation uses a pie chart. Pie charts are known to lack visual accuracy, because readers judge angles and areas less precisely than lengths. For example, while it is possible to see that the towns of Wayne County together have a higher percentage of cases than Detroit as a whole, it is difficult to visually compare the proportions of cases in the individual locations of Wayne County. People living in Hamtramck, for instance, will find it difficult to compare their proportion with Melvindale’s, as neither town has a clearly visible “slice” in the pie chart.

  • The second problem concerns the colours and labels on the visualisation. There are so many colours and labels on the pie chart that they bombard the viewer. Even the lines that connect each “slice” of the pie chart to its label are difficult to follow. Additionally, some labels, such as Plymouth Township and Northville Township, are covered or overlapped by the connection lines to their “slices”. Furthermore, the chosen colour combinations could make it difficult for people with colour blindness to follow along and understand the visualisation.

  • The third problem is that the pie chart is incomplete: only 30 out of 42 locations within Wayne County are listed. This can be deceptive, as it misrepresents the data. For example, people living in Redford Township, Wayne County, do not see their location listed, which can lead to the assumption that there are no coronavirus cases in their location. On further inspection this is incorrect: the township has cases, but they are not represented in the visualisation.
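The completeness check behind the third issue can be sketched in a few lines of R. This is a minimal illustration only: the two vectors below are hypothetical and contain just the locations named in the list above, not the full 42-location dataset or the 30 labels on the original chart.

```r
# Hypothetical, abbreviated vectors: only locations named in the text above
allLocations <- c("Hamtramck", "Melvindale", "Plymouth Township",
                  "Northville Township", "Redford Township")
chartedLocations <- c("Hamtramck", "Melvindale", "Plymouth Township",
                      "Northville Township")

# Locations present in the data but absent from the chart
missingLocations <- setdiff(allLocations, chartedLocations)
print(missingLocations)
```

With the real data, the first vector would come from the LOCATION column of the dataset and the second from the labels on the chart.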

Reference

Code

As this visualisation was posted on r/dataisugly, no source was attached to it. However, the visualisation is labelled and states that its data came from the Wayne County Public Health Division and the Detroit Health Department. The data used in this reconstruction was therefore sourced directly from the Wayne County Public Health Division to ensure its reliability and legitimacy. Likewise, the data for Metropolitan Detroit was taken directly from the Detroit Health Department.

The following code was used to set up the datasets that are used to fix the issues in the original visualisation.

library(dplyr)

# Wayne County Data
wayneCounty <- read.csv("WayneCountyCovid-19.csv") # Importing the dataset
wayneCases <- select(wayneCounty, -3) # Removing the unneeded third column
names(wayneCases)[1] <- "LOCATION" # Renaming the first column by position, as its imported name is garbled by the file's byte-order mark
wayneCases$CASES <- as.double(wayneCases$CASES) # Changing cases to double datatype
wayneData <- wayneCases %>%
  mutate(PERCENTAGE = CASES / 30876 * 100, # Adding a percentage column (30876 is the total used as the denominator)
         ALIGNMENT = "Wayne County")       # Adding an 'alignment' column to help with the plotting
head(wayneData)
##           LOCATION CASES PERCENTAGE    ALIGNMENT
## 1         Dearborn  2561   8.294468 Wayne County
## 2          Livonia  1348   4.365850 Wayne County
## 3         Westland  1216   3.938334 Wayne County
## 4  Canton Township  1082   3.504340 Wayne County
## 5 Dearborn Heights  1068   3.458997 Wayne County
## 6           Taylor  1011   3.274388 Wayne County
# Metropolitan Detroit Data
metroDetroit <- read.csv("MetroDetroitCovid-19.csv") # Importing the dataset (cases per ZIP code)
detroit <- data.frame(LOCATION = factor("Detroit"),                        # A single row for the whole of Detroit, stored as a factor
                      CASES = as.double(sum(metroDetroit$CASES)))          # Total number of cases across all ZIP codes
detroitData <- detroit %>%
  mutate(PERCENTAGE = CASES / 30876 * 100, # Adding a percentage column (same denominator as Wayne County)
         ALIGNMENT = "Metro Detroit")      # Adding an 'alignment' column to help with the plotting
head(detroitData)
##   LOCATION CASES PERCENTAGE     ALIGNMENT
## 1  Detroit 13224   42.82938 Metro Detroit
# Combining the datasets
wayneDetroit <- rbind(wayneData, detroitData) # Combining the two datasets
tail(wayneDetroit)
##                LOCATION CASES PERCENTAGE     ALIGNMENT
## 38     Sumpter Township    73  0.2364296  Wayne County
## 39 Northville (City of)    57  0.1846094  Wayne County
## 40 Grosse Pointe Shores    51  0.1651768  Wayne County
## 41            Gibraltar    51  0.1651768  Wayne County
## 42             Rockwood    31  0.1004016  Wayne County
## 43              Detroit 13224 42.8293820 Metro Detroit
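
As a quick sanity check on the PERCENTAGE column, the printed values can be recomputed by hand. The sketch below uses only rows copied from the output above and the hard-coded denominator 30876; the names `check`, `totalCases` and `wayneShare` are introduced here for illustration. The last step assumes that 30876 is the combined total for Wayne County and Detroit, which the code above does not state explicitly.

```r
# Rows copied from the printed output above (not the full 43-row dataset)
check <- data.frame(
  LOCATION = c("Dearborn", "Livonia", "Detroit"),
  CASES    = c(2561, 1348, 13224),
  PRINTED  = c(8.294468, 4.365850, 42.82938)
)

totalCases <- 30876 # Denominator hard-coded in the set-up code

# Recompute each percentage and compare it with the printed value
check$RECOMPUTED <- check$CASES / totalCases * 100
check$MATCHES <- abs(check$RECOMPUTED - check$PRINTED) < 1e-4 # Allowing for print rounding
print(check)

# Assumption: if 30876 is the combined total, the Wayne County
# locations together account for everything that is not Detroit
wayneShare <- (totalCases - 13224) / totalCases * 100
print(wayneShare)
```

If any row of MATCHES were FALSE, the denominator or the case counts would need a second look before plotting.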

Data Reference

Reconstruction

The following plot fixes the main issues in the original. It is a bar plot that depicts the percentage of coronavirus cases in Metropolitan Michigan. The left bar represents Metro Detroit, while the stacked bar on the right represents the whole of Wayne County. The segments of the stacked bar are arranged in the same order as the legend, so the plot is easier to follow.
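
A plot of this shape could be sketched with ggplot2 roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the code behind the reconstruction: it uses only four rows copied from the printed output above, and the names `sampleData` and `p` as well as the axis labels are made up for the example.

```r
library(ggplot2)

# Four rows copied from the printed output (the full data has 43 rows)
sampleData <- data.frame(
  LOCATION  = c("Dearborn", "Livonia", "Westland", "Detroit"),
  CASES     = c(2561, 1348, 1216, 13224),
  ALIGNMENT = c("Wayne County", "Wayne County", "Wayne County", "Metro Detroit")
)
sampleData$PERCENTAGE <- sampleData$CASES / 30876 * 100

# Order locations from largest to smallest share so that the
# stacking order of the segments and the legend order agree
sampleData$LOCATION <- reorder(sampleData$LOCATION, -sampleData$PERCENTAGE)

# One bar per ALIGNMENT value; Wayne County locations stack in one bar
p <- ggplot(sampleData, aes(x = ALIGNMENT, y = PERCENTAGE, fill = LOCATION)) +
  geom_col() +
  labs(x = NULL, y = "Percentage of cases (%)", fill = "Location")
print(p)
```

Because the x axis is ordered alphabetically by default, "Metro Detroit" lands on the left and "Wayne County" on the right, matching the layout described above.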