Objective
The objective of this data visualisation is to emphasise the point that killers are getting away with murder because of disparities in US police homicide arrest rates. The visualisation attempts to establish how the race of the victim could influence whether US police make an arrest. For each city, the graphic shows the percentage of homicides closed by arrest in two categories: cases where the victim is ‘White’ and cases where the victim is non-white, termed ‘Minority’. The Washington Post is a national newspaper with a paywall, so its target audience is its paid subscribers, both domestic and international. Because it is a well-known newspaper, a data visualisation on homicide arrests would be of interest to American taxpayers, politicians, bureaucrats, human rights activists and race relations experts.
The Washington Post is the primary source of the data. The Post collated data on 55,000 criminal homicides in 55 of the largest American cities and mapped each homicide by location within each city. Prior to publication, it validated its analysis with local police departments, which gives reasonable confidence in the integrity of the data. “The Post considered a homicide to be closed by arrest when police reported that to be the case” (The Washington Post, 2018).
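To make the original graphic's two-category measure concrete, the sketch below computes an arrest rate per city for White and Minority victims. It is a minimal base R illustration on a small hypothetical data frame (`homicides`), not the Post's code or data. It assumes columns named `city`, `victim_race` and `disposition`, treats every `victim_race` value other than White as Minority (including Unknown, which may not match the Post's handling), and counts a homicide as solved only when `disposition` is "Closed by arrest".

```r
# Hypothetical toy records standing in for homicide-data.csv (illustration only).
homicides <- data.frame(
  city        = c("A", "A", "A", "A", "B", "B", "B", "B"),
  victim_race = c("White", "Black", "Hispanic", "White",
                  "White", "Black", "Black", "Asian"),
  disposition = c("Closed by arrest", "Open/No arrest", "Closed by arrest",
                  "Closed by arrest", "Open/No arrest", "Closed by arrest",
                  "Closed without arrest", "Open/No arrest"),
  stringsAsFactors = FALSE
)

# Collapse race into the two groups used by the original graphic.
# Assumption: every value other than "White" is counted as "Minority".
homicides$group <- ifelse(homicides$victim_race == "White", "White", "Minority")

# 1 if police reported the case closed by arrest, otherwise 0.
homicides$arrested <- as.integer(homicides$disposition == "Closed by arrest")

# The mean of the 0/1 flag within each city and group is the share arrested.
arrest_rate <- aggregate(arrested ~ city + group, data = homicides, FUN = mean)
arrest_rate$rate_pct <- round(100 * arrest_rate$arrested, 1)
arrest_rate
```

Because the rate divides arrests by all homicides in a group, cases without an arrest still count in the denominator, which is how a percentage of arrests is formed.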
The visualisation chosen had the following three main issues:
Issue 1 - Perceptual or colour issues: The commentary accompanying the visualisation states that the arrest rate is higher in homicides with a white victim than in those with a minority victim. The visualisation gives the impression that the arrest rate for white victims is not just often higher but substantially higher; whether this impression is a perceptual issue needs to be investigated. The visualisation also has a connectedness issue: many objects are clustered together and joined by lines, which suggests a relationship between them where there might be none. It also has a continuity issue: arrest rates in different cities could be perceived as similar when they may be dissimilar. Representing the data with a box plot leaves less room for ambiguity in perception.
Issue 2 - Failure to answer a practical question: The question The Washington Post is trying to address is whether police arrests are correlated with the victim’s racial group. The visualisation puts the data into just two categories: the victim is either white or a minority, with all non-white racial groups combined into a single ‘Minority’ category. This does not fully address the question the visualisation is trying to answer. An answer closer to reality would look at how police arrests are spread across the individual racial groups. This is achieved by using R to categorise the data into all the reported racial categories: White, Hispanic, Black, Asian, Other and Unknown. In addition, before looking at how arrests are spread across the fifty-five cities, a good start for a reconstructed visualisation is to count the number of arrests across the country, grouped by racial category. Another failure in The Washington Post’s visualisation is that it includes cases where no arrest was made. A more accurate representation would select only the cases where an arrest was made and then see which racial group the victims belong to. R code is used to filter the data and create a dataset containing only the cases closed by arrest.
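The national count described above (filter to cases closed by arrest, then tally by each reported race) can be sketched in base R as follows. The `homicides` data frame is a hypothetical set of toy records, not the Post's data; only the column names `victim_race` and `disposition` and the six race labels are taken from the text and the output shown in this report.

```r
# Hypothetical toy records; the real input would be read from homicide-data.csv.
homicides <- data.frame(
  victim_race = c("Black", "Black", "White", "Hispanic", "Black",
                  "Asian", "Unknown", "White", "Other", "Hispanic"),
  disposition = c("Closed by arrest", "Closed by arrest", "Closed by arrest",
                  "Open/No arrest", "Closed without arrest",
                  "Closed by arrest", "Closed by arrest", "Open/No arrest",
                  "Closed by arrest", "Closed by arrest"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the cases that were closed by arrest.
arrests <- subset(homicides, disposition == "Closed by arrest")

# Step 2: count arrests for every reported race, not just White vs Minority.
races <- c("White", "Hispanic", "Black", "Asian", "Other", "Unknown")
by_race <- table(factor(arrests$victim_race, levels = races))

# Largest counts first.
by_race_sorted <- sort(by_race, decreasing = TRUE)
by_race_sorted
```

Converting `victim_race` to a factor with all six levels keeps any category with zero arrests in the table, so no group silently drops out of the comparison.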
Issue 3 - Deceptive methods: Analysis of the dataset from the original source places this visualisation in the data deception category: “A graphical depiction of information, designed with or without an intent to deceive, that may create a belief about the message and/or its components, which varies from the actual message” (Pandey et al., 2015, p. 1471). The actual message from the dataset is that, among cases closed by arrest, the number of arrests across the country is much higher for black victims than for victims of any other race. This contrasts sharply with the message conveyed by the original visualisation, which is that homicides with white victims are more likely to end in an arrest. R code is used first to filter the dataset to cases closed by arrest and then to regroup the data by city and victim’s race. For the reconstruction, these counts are shown as a box plot, which shows clearly that the number of arrests is highest when the victim is black.
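The filter, regroup and box-plot steps can be sketched as below. This is an illustrative base R version using made-up records for three hypothetical cities (A, B and C); the code in this report loads ggplot2, where `geom_boxplot()` would take the place of the base `boxplot()` call used here to keep the example dependency-free.

```r
# Hypothetical toy records for three made-up cities (illustration only).
homicides <- data.frame(
  city        = rep(c("A", "B", "C"), each = 4),
  victim_race = c("Black", "Black", "White", "Hispanic",
                  "Black", "White", "White", "Black",
                  "Hispanic", "Black", "Black", "Black"),
  disposition = c("Closed by arrest", "Closed by arrest", "Closed by arrest", "Open/No arrest",
                  "Closed by arrest", "Closed by arrest", "Open/No arrest", "Closed without arrest",
                  "Closed by arrest", "Closed by arrest", "Closed by arrest", "Closed by arrest"),
  stringsAsFactors = FALSE
)

# Step 1: keep only cases closed by arrest.
arrests <- subset(homicides, disposition == "Closed by arrest")

# Step 2: number of arrests per city and victim race
# (aggregate() returns only combinations that actually occur).
arrests$n <- 1
counts <- aggregate(n ~ city + victim_race, data = arrests, FUN = sum)

# Step 3: one box per race; each value in a box is one city's arrest count.
# boxplot() draws the plot and invisibly returns the summary statistics.
bp <- boxplot(n ~ victim_race, data = counts,
              xlab = "Victim race", ylab = "Arrests per city",
              main = "Homicides closed by arrest")
```

Like `dplyr::count()` on character columns, `aggregate()` omits city and race combinations with no arrests, so such a city contributes no point to that race's box rather than a zero.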
Reference
The following code was used to fix the issues identified in the original.
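library(ggplot2)  # plotting
library(dplyr)    # provides %>%, filter(), group_by() and count()
homicide <- read.csv("homicide-data.csv")
# Keep only homicides that police reported as closed by arrest
df <- homicide %>% filter(disposition == "Closed by arrest")
# Count arrests for each city and victim race, largest counts first
dfp <- df %>% group_by(city, victim_race) %>% count(disposition, sort = TRUE)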
dfp
## # A tibble: 238 x 4
## # Groups: city, victim_race [238]
## city victim_race disposition n
## <chr> <chr> <chr> <int>
## 1 Philadelphia Black Closed by arrest 1274
## 2 Chicago Black Closed by arrest 1077
## 3 Detroit Black Closed by arrest 905
## 4 Memphis Black Closed by arrest 883
## 5 Baltimore Black Closed by arrest 876
## 6 Dallas Unknown Closed by arrest 813
## 7 Houston Black Closed by arrest 769
## 8 Kansas City Unknown Closed by arrest 704
## 9 Washington Black Closed by arrest 678
## 10 St. Louis Black Closed by arrest 671
## # ... with 228 more rows
Data Reference
The following plot fixes the main issues in the original.