1. Overview

Police brutality has been in the spotlight in recent months. The death of George Floyd was the final straw, and Americans from all walks of life united to protest against police brutality. This has led to calls to defund the police, whereby funds currently devoted to police budgets are diverted to non-policing uses, such as improving community support at the grassroots level. However, opponents of the movement claim that police brutality is perpetrated by a few bad actors and is not endemic to police departments across the United States.

1.1 Purpose

The purpose of this visualisation is to use the dataset from Mapping Police Violence (https://mappingpoliceviolence.org/), a research collaborative collecting data on police killings in the US, to uncover some of the more notable trends in police killings. The dataset can offer some insight into whether police brutality is confined to certain police departments, or is a systemic, widespread problem that needs to be addressed urgently.

1.2 Data & Design Challenges

The dataset is split into four sheets, each aggregated at a different level. Two of the sheets need additional cleaning before they are ready for visualisation, so Python is used to pre-process them. The Pandas library is used because it makes it easy to aggregate data at different levels, for example with groupby operations. The cleaned datasets are then exported from Python and loaded into R for visualisation.

With regard to design challenges, the US has too many states to show at once, so the state-level visualisations show only the top 10 states for each police-brutality metric, for the sake of brevity and to avoid cluttering the visualisation. An interactive plot would circumvent this problem, as viewers could select which states they wish to compare, and could be implemented in future. Additionally, it is important to present trends in police brutality objectively yet in a clearly understandable manner, as the target viewer may not have advanced statistical knowledge. With this audience in mind, the visualisations aim to make such trends clear to a lay viewer, without overly complicated charts or the need for in-depth statistical knowledge.

1.3 Sketch of Proposed Design

2. Step-by-step description

2.1 Install and load R packages

  • tidyverse - Helps with cleaning, processing, modelling and visualising data, and includes the ggplot2 and dplyr packages.
  • readxl - Helps to retrieve data from .xlsx files.
  • plyr - Helps with data manipulation.
  • ggthemes - Provides additional themes for ggplot2.
  • forcats - Provides help with handling factors, including changing the order.
  • zoo - Helps with preparing time series variables.
  • plotly - Helps with making high-quality visualisations.
  • extrafont - Used for loading fonts for use in visualisation. Please ensure you have the Verdana font and use the loadfonts() command.
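  • reshape2 - Helps to transform data between wide and long formats.

# plyr is listed before tidyverse so that dplyr's verbs are not masked by plyr's
packages = c("plyr", "tidyverse", "readxl", "ggthemes", "forcats", 'zoo', 'plotly', 'extrafont', 'reshape2')

# Install any package that is missing, then attach it
for(p in packages){
  if(!require(p, character.only=TRUE)){
    install.packages(p)
  }
  library(p, character.only=TRUE)
}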

2.1.1 Loading fonts

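In order to use the Verdana font, please ensure it is installed on your system and then run the next line. If the font has never been registered with extrafont, font_import() needs to be run once beforehand. You can use the command fonts() to check whether it has been loaded into R.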

loadfonts()

2.2 Chart #1 - Pyramid Plot of Police Killings by Gender

The pyramid plot looks at the top 10 states by count of police killings, broken down by gender.

2.2.1 Loading the data

The data for this chart has been cleaned using Pandas and aggregated at the state level, with the counts of each victim by gender.

data <- read.csv('Victims by State & Gender.csv')

2.2.2 Visualising Chart #1

The breaks and labels are pre-defined outside ggplot and then later fed in as arguments to scale_y_continuous. As the dataset comprises many states, only the top 10 states are shown here. The factors are also reordered in descending order, according to how high the male victim count is per state.

# Breaks span both sides of zero; labels show absolute values so no minus signs appear on the axis
brks <- seq(-1200, 1200, 200)
labels <- as.character(abs(brks))

chart_one <- ggplot(subset(data, State %in% c('CA', 'TX', 'FL', 'AZ', 'GA', 'CO', 'WA', 'OK', 'OH', 'NC')), aes(x = fct_relevel(State, levels=c('CA', 'TX', 'FL', 'AZ', 'GA', 'CO', 'WA', 'OK', 'OH', 'NC')), y = Victims, fill=Gender)) +
  geom_bar(stat='identity', width=0.6)+
  scale_y_continuous(breaks=brks, labels=labels) +
  coord_flip() +
  labs(title='1. Top 10 States with highest victim count by Gender (Jan 2013 - Jun 2020)',
       subtitle='The pyramid chart shows the gender disparity of police killings across the top 10 states with the highest overall victim count. \n \nSource: Mapping Police Violence') + 
  theme_economist()+
  theme(plot.title = element_text(vjust=1.5),
        plot.subtitle = element_text(hjust=0),
        axis.ticks.y = element_blank(),
        axis.title.y = element_text(vjust=4, face='bold'),
        axis.title.x = element_text(vjust=-4, face='bold'),
        text = element_text(family='Verdana'),
        panel.grid.major = element_blank(),
        legend.title = element_text(face='bold'),
        legend.key = element_rect(fill='whitesmoke'),
        legend.position=c(0.7, 0.5)
  )+
  scale_fill_manual(values = c("red","steelblue")) +
  xlab('State') + ylab('Count of Victims by Gender')

chart_one
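
A pyramid plot drawn with geom_bar needs one group's counts to be negative so that its bars extend to the other side of zero, which is why the breaks above run from -1200 to 1200 while the labels are all positive. The following is a minimal sketch of that preparation step on made-up numbers. It assumes that the male counts are the ones negated, which the source does not state, and the toy data frame is purely hypothetical.

```r
# Hypothetical toy data: the counts are invented for illustration only
toy <- data.frame(
  State   = c("CA", "CA", "TX", "TX"),
  Gender  = c("Female", "Male", "Female", "Male"),
  Victims = c(50, 1000, 30, 600)
)

# Assumption: male counts are mirrored to the negative side of the axis
toy$Victims <- ifelse(toy$Gender == "Male", -toy$Victims, toy$Victims)

# Axis breaks span both sides of zero; labels use absolute values
brks   <- seq(-1200, 1200, 200)
labels <- as.character(abs(brks))
```

If the exported CSV already stores one gender as negative counts, this step is unnecessary.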

2.3 Chart #2 - Calendar Heatmap of Police Killings

The calendar heatmap shows the days on which police killings took place and, for those days, the number of killings across the United States.

2.3.1 Loading the data

Most of the data wrangling for this chart was done using pandas, to create additional derived fields from the date fields.

df <- read.csv("calendar_heatmapv4.csv")

2.3.2 Data Manipulation

The Date field is parsed into a date format and stored in the weekday field. The as.yearmon function from zoo is then used to create a yearmonth variable, which is converted into a factor. Monthweek, the week of the month, is derived from the week of the year by subtracting the first week number observed in each month. The data frame is then reduced to the columns needed for the visualisation. The ordered_month and ordered_week variables fix the order of the months and weeks in the plot.

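# Parse the date strings into Date objects
df$weekday <- as.Date(df$Date)

# Create a year-month variable and convert it into a factor
df$yearmonth <- as.yearmon(df$Date)
df$yearmonthf <- factor(df$yearmonth)

# Create Month Week: re-base the week of the year so that each month starts at week 1
df <- ddply(df,.(yearmonthf), transform, monthweek=1+Week-min(Week))

# Keep only the columns needed for the visualisation
df <- df[, c("Year", "yearmonthf", "Monthf", "Week", "monthweek", "Weekdayf", "Victims")]

# Fix the display order of months and weeks
df$ordered_month <- factor(df$Monthf, levels=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
df$ordered_week <- factor(df$monthweek, levels=c('1','2','3','4','5'))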
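
To make the monthweek step more concrete, here is a small sketch on made-up week numbers. It uses base R's ave() as a stand-in for the ddply call, and the toy data frame is hypothetical; only the formula 1 + Week - min(Week) is taken from the code above.

```r
# Hypothetical toy data: week-of-year numbers for a few days in two months
toy <- data.frame(
  yearmonthf = c("Jan 2013", "Jan 2013", "Jan 2013", "Feb 2013", "Feb 2013"),
  Week       = c(1, 2, 5, 5, 6)
)

# Week of month = week of year, re-based so that the first week
# observed in each month becomes week 1
toy$monthweek <- 1 + toy$Week - ave(toy$Week, toy$yearmonthf, FUN = min)

toy$monthweek
# 1 2 5 1 2
```

Note that the minimum is taken over the rows that are present, so if a month's first week has no rows at all, the remaining weeks of that month would be shifted down by one.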

2.3.3 Visualising Chart #2

The calendar plot is visualised, and the various levels are ordered accordingly. The facet grid makes it possible to see the breakdown of police killings for every single day from Jan 2013 to Jun 2020 clearly. A light red to dark red gradient is selected to encode the number of police killings, with legend breaks at equal intervals of 2. Days with no police killings are shown as empty, uncoloured squares.
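
The plotting code for this chart is not included here, so the following is only a sketch of how such a calendar heatmap could be assembled with geom_tile and facet_grid. The column names follow the data frame prepared above, but the toy values, the axis layout, the weekday labels and the exact colours are assumptions rather than the original settings.

```r
library(ggplot2)

# Hypothetical stand-in for the prepared data frame; victim counts are random
set.seed(1)
toy <- expand.grid(
  Weekdayf      = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
  ordered_week  = 1:5,
  ordered_month = month.abb,
  Year          = 2013:2014,
  stringsAsFactors = FALSE
)
toy$Victims       <- sample(1:10, nrow(toy), replace = TRUE)
toy$ordered_month <- factor(toy$ordered_month, levels = month.abb)
toy$ordered_week  <- factor(toy$ordered_week, levels = 1:5)
# Reverse the weekday order so that Monday is drawn at the top of each panel
toy$Weekdayf <- factor(toy$Weekdayf,
                       levels = rev(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")))

chart_two <- ggplot(toy, aes(x = ordered_week, y = Weekdayf, fill = Victims)) +
  geom_tile(colour = "white") +              # one tile per day
  facet_grid(Year ~ ordered_month) +         # rows = years, columns = months
  scale_fill_gradient(low = "mistyrose", high = "darkred",
                      breaks = seq(2, 10, 2)) +
  labs(x = "Week of month", y = NULL, fill = "Victims")

chart_two
```

Because geom_tile only draws a tile where a row exists, days that are absent from the data stay empty, which matches the behaviour described above.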

2.4 Chart #3 - Grouped bar chart of Police Killings by State and Racial group

The grouped bar chart gives a quick overview of the disparity between the rate at which police kill black people and the rate for all racial groups in America.

2.4.1 Loading the data

For this chart, the original dataset’s sheet is loaded in.

data_state <- read_excel('MPVDatasetDownload.xlsx', sheet='2013-2019 Killings by State')

2.4.2 Data Manipulation

Using the reshape2 library, the data is reshaped, such that for each state, there are two rates - one for black people and another rate for all people. This step is necessary for the grouped bar chart.

df_m <- melt(data_state[,c('State', 'Rate (Black People)', 'Rate (All People)')], id.vars = 1)
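
As an illustration of what the reshaping does, the sketch below applies melt to a two-row toy table. The rates are invented and the toy data frame is hypothetical; only the column names and the melt call mirror the step above.

```r
library(reshape2)

# Hypothetical toy data: the rates are invented for illustration only
toy_wide <- data.frame(
  State = c("Utah", "Alaska"),
  `Rate (Black People)` = c(10, 8),
  `Rate (All People)`   = c(4, 6),
  check.names = FALSE   # keep the spaces and brackets in the column names
)

# One row per state and rate type: columns State, variable, value
toy_long <- melt(toy_wide, id.vars = "State")
toy_long
```

Each state now appears once per rate, with the rate type in the variable column, which is the layout that position='dodge' needs to draw two bars per state.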

2.4.3 Visualising Chart #3

For this visualisation, the top 10 states with the highest combined kill rates are shown. Therefore, a subset of the top 10 states is used, and then reordered in descending order.

chart_three <- ggplot(subset(df_m, State %in% c('Utah', 'Alaska', 'Oklahoma', 'Oregon', 'Arizona', 'New Mexico', 'West Virginia', 'Colorado', 'Nevada', 'Missouri')), aes(x = fct_relevel(State, levels = rev(c('Utah', 'Alaska', 'Oklahoma', 'West Virginia', 'Oregon', 'Arizona', 'Colorado', 'Missouri', 'Nevada', 'New Mexico'))), y=value)) +
  geom_bar(aes(fill=variable), stat='identity', position='dodge') + 
  coord_flip() + 
  theme_economist()+
  labs(x="State", 
       y ="% Kill Rate of Racial Group by Police", 
       fill = '% Kill Rate',
       title = '3. Comparison of Police Killings by State and Race (Standardised by Population in each State)',
       subtitle="The grouped bar chart shows the disparity of police killings towards Black People, versus people that belong to other ethnicities from Jan 2013 to Dec 2019. Only the \ntop 10 states in terms of the highest combined kill rate are shown.\n\nSource: Mapping Police Violence") +
  scale_fill_manual(values = c("#008BBC","#76c0c1")) + #76c0c1 #6E8E84
  scale_y_continuous(labels = function(value) paste0(value, '%')) +
  theme(plot.title = element_text(vjust=1.5),
        plot.subtitle = element_text(hjust=0, size=14),
        axis.ticks.y = element_blank(),
        axis.title.y = element_text(vjust=4, face='bold'),
        axis.title.x = element_text(vjust=-3, face='bold'),
        text = element_text(family='Verdana'),
        panel.grid.major = element_blank())

chart_three  

2.5 Chart #4 - Lollipop chart of average homicide rate

The lollipop chart is a good alternative to the bar chart as it uses less ink to convey the same data. This lollipop chart compares the average police homicide rate for black men across city police departments against the benchmark of the 2018 National Murder Rate.

2.5.1 Loading the data

The data is located in another sheet of the dataset and is loaded as is.

data_pd <- read_excel('MPVDatasetDownload.xlsx', sheet='Police Killings of Black Men')

2.5.2 Data Manipulation

The top 10 city police departments with the highest average homicide rates between Jan 2013 and Dec 2019 are shown. Additional variables are added to allow ggplot2 to differentiate the colour of the benchmark from the other data points.

top10 <- data_pd %>%
  select(City, `Average Police Homicide Rate for Black Men (per 100,000) (2013-19)`) %>%
  # Sort from highest to lowest rate and keep the first 10 rows
  arrange(desc(`Average Police Homicide Rate for Black Men (per 100,000) (2013-19)`)) %>%
  head(10) %>%
  # Re-sort in ascending order and lock that order into the factor levels used for plotting
  arrange(`Average Police Homicide Rate for Black Men (per 100,000) (2013-19)`) %>%
  mutate(City = factor(City, levels= .$City))

# Flag the benchmark row so that it can be coloured differently from the police departments
top10 <- top10 %>%
  mutate(toHighlight = ifelse(City == 'National Murder Rate (2018)', "no", 'yes'),
         point_colour = ifelse(City == 'National Murder Rate (2018)', "red", 'blue'))

2.5.3 Visualising Chart #4

The lollipop chart is created using a combination of the geom_segment and geom_point functions. Additionally, the colours are scaled according to the variables that were set up in the previous section.
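
The plotting code for this chart is not included here, so the following is only a sketch of the geom_segment plus geom_point combination on made-up values. The toy data frame, the shortened rate column and the city names are hypothetical; point_colour and the benchmark label follow the previous section, and scale_colour_identity is one possible way to use the stored colour names directly.

```r
library(ggplot2)

# Hypothetical toy data: the rates are invented, and `rate` stands in for the
# long rate column name used in the real data
toy <- data.frame(
  City = c("City A", "City B", "National Murder Rate (2018)", "City C"),
  rate = c(3, 5, 6, 9)
)
# Lock the ascending order into the factor levels, as in the data manipulation step
toy$City <- factor(toy$City, levels = toy$City)
toy$point_colour <- ifelse(toy$City == "National Murder Rate (2018)", "red", "blue")

chart_four <- ggplot(toy, aes(x = City, y = rate)) +
  # The stick: a thin line from zero up to each value
  geom_segment(aes(xend = City, y = 0, yend = rate), colour = "grey60") +
  # The head: a point coloured by the pre-computed colour name
  geom_point(aes(colour = point_colour), size = 4) +
  scale_colour_identity() +
  coord_flip() +
  labs(x = "City", y = "Rate per 100,000")

chart_four
```

With the axes flipped, the ascending factor order places the highest rate at the top of the chart, and the benchmark stands out as the only red point.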

3. Final Visualisation

The final visualisation is shown below.

4. Insights

  • The pyramid plot makes it evident that police brutality disproportionately affects males rather than females in the top 10 states with the highest victim count. This is consistent with previous studies which found that gun violence is disproportionately a male problem (Dasgatir, 2017); according to the non-profit Kaiser Family Foundation (2018), men also accounted for 86% of gun deaths in the United States.

  • The calendar heatmap shows that, over the past 6.5 years, police killings have taken place almost every day. On the worst days, there were as many as 10 killings in a single day. This may hint that the police are overly inclined to use lethal force at the first sign of danger, instead of first attempting to disarm possible perpetrators with non-lethal weapons such as tasers (Picheta & Pettersson, 2020).

  • Looking at the grouped bar chart, the rate of police killings of black people is disproportionately higher than the rate for all racial groups. This trend is consistent across all the states in the visualisation, suggesting a systemic problem of police brutality towards African Americans in particular. This is corroborated by a research paper that found the odds ratio of police brutality towards blacks is 2.769 (Fryer JR, 2017).

  • The lollipop chart shows eight city police departments with an average homicide rate above the benchmark of the 2018 National Murder Rate. The fact that homicide rates for numerous city police departments exceed this benchmark underscores the need to re-evaluate police training in these departments. The high rate of homicides by police is also addressed in a research paper (Barber et al., 2016), which mentions that approximately 100,000 people are treated for non-lethal injuries inflicted by law enforcement officers.

5. References