R Markdown

The following lines of code import necessary libraries to execute this code:

Preparation / Imports

install.packages("devtools")
install.packages("flexdashboard")
install.packages("lubridate")
install.packages("dplyr")
install.packages("ggplot2")
devtools::install_github("lionel-/ggstance")
install.packages("ggalt")
library(devtools)
library(flexdashboard)
library(lubridate)
library(dplyr)
library(ggplot2)
library(ggstance)
library(ggalt)

Introduction

For this lab, we took a look at the use of deadly force by police officers in the United States. This analysis is based on the work of Julia Silge’s “data flex-dashboard”.

The data we imported is a compilation of every fatal shooting in the United States by a police officer in the line of duty from a database organized by the Washington Post since Jan. 1, 2015.

Since 2015, The Washington Post has recorded dozens of details about each killing:

In 2016, The Washington post went to gather additional information about each fatal shooting that occurred during this year. There were reports, websites, and social media that have been constantly monitoring independent databases about fatal police shootings. More than a dozen additional details were collected about the officers and victims in each individual shooting. As of right now, there is a continuous live web tracking service that records the number of Fatal Police Shootings in 2016.

In this lab assignment, we want to look at the data set to make an in-depth analysis on the different demographic, date, location, and circumstances of death within Fatal Police Shootings Data.

Analysis

We imported the data set obtained from the Washington Post police shootings database. Then, we clean up the code by changing dates, genders, races, and manners of death to an understandable format.

fatalshootings <- read.csv("./fatal-police-shootings-data.csv", stringsAsFactors = FALSE)
fatalshootings$date <- ymd(fatalshootings$date)
fatalshootings[,c(4:5,7:8,9:14)] <- lapply(fatalshootings[,c(4:5,7:8,9:14)], as.factor)
levels(fatalshootings$gender) <- c("Female", "Male")
levels(fatalshootings$race) <- c("Unknown", "Asian", "Black", "Hispanic", "Unknown", "Other", "White")
fatalshootings$race <- factor(fatalshootings$race, levels(fatalshootings$race)[c(5,2,1,4,3,6)])
levels(fatalshootings$flee) <- c("Unknown", "Car", "Foot", "Not fleeing", "Other")
fatalshootings$flee <- factor(fatalshootings$flee, levels(fatalshootings$flee)[c(1,5,3,2,4)])
levels(fatalshootings$manner_of_death) <- c("Beaten", "Shot", "Shot/Tasered")
fatalshootings$manner_of_death <- factor(fatalshootings$manner_of_death, levels(fatalshootings$manner_of_death)[c(1,3,2)])

Once we have our clean data, we are ready to start creating interesting visualizations using ggplot.


Demographic Analysis


Deaths by Race/Ethnicity

ggplot(data = fatalshootings, aes(y = race)) + 
        geom_barh(aes(fill = ..count..)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

This is a simple graph yet you can still scrutinize a immense amount information from it. Instead of the typical vertical graph the original creator Julia decided to make the graph horizontal. Based on the number of Fatal Shootings she also implemented a lighter-to-darker gradient transition. Overall, she makes good use of whitespace and direction. The same goes with ink to data ratio. The most important feature is that one can clearly tell which ethinicity suffers the most from fatal police shootings. Our only question is what categorizes the 'unknown'.

Deaths by Gender

ggplot(data = fatalshootings, aes(y = gender)) + 
        geom_barh(aes(fill = ..count..)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

Clearly, male victims fall into the greater number of fatal police shootings. Further investigation is needed since the male ratio deaths are overwhelmingly higher than the females. Possible questions could be why are males usually approached with more violence and deaths compared to females?

Deaths by Age

ggplot(data = fatalshootings, aes(x = age)) + 
        geom_histogram(aes(fill = ..count..), bins = 20) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(x = "Age at death", y = "Number of deaths")

The histogram forms a skewed-right distribution. The average number of deaths seem to be the greatest around the age of 30. We are also glad that children are not usually involved with fatal shootings. Overall, the creator applies good data visualization by using consistent themes and having a clear number of bins for her ggplots.

Deaths based on mental illness

ggplot(data = fatalshootings, aes(y = signs_of_mental_illness)) + 
        geom_barh(aes(fill = ..count..)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

# I would add the percentages or proportions onto the graph
summary(fatalshootings$signs_of_mental_illness)
## False  True 
##  1278   420
(420/1698)
## [1] 0.2473498
A significant proportion of death is based on mental illness. Could there be some bias in the data collection? What considers to be a mental illness? Is the medical report verified before or after the fatal shooting incident?

Date of Shootings


Overall

ggplot(data = fatalshootings, aes(x = date)) + 
        geom_histogram(aes(fill = ..count..), bins = 25) +
        theme_minimal() +
        theme(legend.position = "none") +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(x = NULL, y = "Number of deaths")

Deaths by Month

ggplot(data = fatalshootings, aes(x = month(date, label = TRUE))) + 
        geom_bar(aes(fill = ..count..)) +
        theme_minimal() +
        theme(legend.position = "none") +
        scale_y_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(x = NULL, y = "Number of deaths")

Deaths by Day of Week

ggplot(data = fatalshootings, aes(x = wday(date, label = TRUE))) + 
        geom_bar(aes(fill = ..count..)) +
        theme_minimal() +
        theme(legend.position = "none") +
        scale_y_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(x = NULL, y = "Number of deaths")

Essentially, the dates of fatal shootings are the same graph, but with different labels and count for each date. When creating the flexboard, it's good that the creator uses a consistent theme while continuing the trend of lighter-to-darker gradient. I also think it was a good call to minimize it to the appropriate number of bins. These features help so that the individual can draw complicated information quickly.

Top 15 States For Fatal Police Shootings


Location Breakdown


Deaths by State

stateinfo <- fatalshootings %>% group_by(state) %>% summarise(n = n()) %>% 
        arrange(desc(n)) %>% top_n(15) %>% 
        mutate(state = factor(state, levels = rev(unique(state))))
ggplot(stateinfo, aes(x = n, y = state)) +
        geom_barh(stat="identity", aes(fill = n)) +
        #geom_stateface(aes(y=state, x=7, label=as.character(state)), colour="white", size=8) +
        geom_text(aes(x = 17, y = state, label=as.character(state)), color="white", size=4) +
        labs(y = NULL, x = "Number of deaths") +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        theme_minimal(base_size = 13) +
        theme(axis.text.y=element_blank()) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0))

This graph is a little bit more complicated than the rest. Julie focuses on seperating the top 15 states when finding the unique number of variables. She creates a state column and then maps it to the barplot. She creates the size and identifies the top 15 States for total death. Originally there were state icons, but we couldn't find the 'geom_stateface' asset. 

Was The Person Killed Armed


Was the victim armed?

armedinfo <- fatalshootings %>% group_by(armed) %>% summarise(n = n()) %>% 
        arrange(desc(n)) %>% top_n(10) %>% 
        mutate(armed = factor(armed, levels = rev(unique(armed))))

ggplot(data = armedinfo, aes(x = n, y = armed)) + 
        geom_barh(stat="identity", aes(fill = n)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

There were a great number of unusually recored weapons. It makes sense that the data describes most fatal shootings would occur when victim holding out a gun. However, it does not make sense that there is still a moderately high ratio of deaths for victims who are completely unarmed. Also when wrangling with the data Julie subsetted the data to count the top 10 number of armed weaponry.

Circumstances of Death


Was the officer wearing a body camera?

ggplot(data = fatalshootings, aes(y = body_camera)) + 
        geom_barh(aes(fill = ..count..)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

There should be more officers wearing a body camera. Somehow I believe there could have been faulty reports or censorship within the data collection. I can imagine officers or reporters hidding the fact when approached if asked whether there was a body camera that happen to record the entire event of the officer shooting a victim to death.

Was the victim fleeing?

ggplot(data = fatalshootings, aes(y = flee)) + 
        geom_barh(aes(fill = ..count..)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

Great visualization that shows the amount of victims fleeing or not fleeing during the scene. Again, how do you 'categorize' unknown column if we also have a column called 'other'.

How was the person killed?

ggplot(data = fatalshootings, aes(y = manner_of_death)) + 
        geom_barh(aes(fill = ..count..)) +
        theme_minimal(base_size = 13) +
        theme(legend.position = "none") +
        scale_x_continuous(expand=c(0,0)) +
        scale_fill_gradient(low = "royalblue3", high = "navyblue") +
        labs(y = NULL, x = "Number of deaths")

We did not expect that most victims were beaten to death rather than shot to death. After all, the data is labeled 'Fatal Police Shootings.' 

Follow Up


To enhance the article, we added more detail to the ‘location’ graph to have the data represent death per 1 million residents. The population data was retrieved from the United States Census Bureau here https://www.census.gov/popest/data/datasets.html. We had to import the data, then merge data to perform the calculation, and finally plot the new variable, ‘dPerCapita’. As you can see, California proportionally does not have the most deaths, but rather New Mexico. Also, Wyoming and Arkansas, who were not in the original visualization are second and third in deaths per million.

Death per Capita Breakdown

stateinfo <- fatalshootings %>% group_by(state) %>% summarise(n = n()) %>% 
  arrange(desc(n)) %>% 
  mutate(state = factor(state, levels = rev(unique(state))))

census <- read.csv('./NST-EST2015-alldata.csv', stringsAsFactors = FALSE)
census <- transform(census, state = state.abb[match(census$NAME,state.name)])
stateinfo <- merge(stateinfo,census,by="state")
stateinfo$dPerCap <- stateinfo$n / (stateinfo$POPESTIMATE2015 / 1000000)
stateinfo <- stateinfo %>% arrange(desc(dPerCap)) %>% top_n(15) 

ggplot(stateinfo, aes(x = dPerCap, y = state)) +
  geom_barh(stat="identity", aes(fill = dPerCap)) +
  geom_text(aes(x = 20, y = state, label=as.character(state)), color="white", size=4) +
  labs(y = NULL, x = "Number of deaths per 1M people") +
  scale_fill_gradient(low = "royalblue3", high = "navyblue") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none") +
  scale_x_continuous(expand=c(0,0))


Conclusion