The article I chose for this assignment is “Where Police Have Killed Americans In 2015” by Ben Casselman (https://fivethirtyeight.com/features/where-police-have-killed-americans-in-2015/).
The article covers the release of The Guardian’s interactive database of Americans killed by police in 2015. The data was compiled from a combination of media coverage, reader submissions, and open-source information, and The Guardian then verified each incident through its own reporting.
In this section, I load the data, keep the columns I need, and clean them up.
library(readr)
# retrieve the csv file from GitHub
urlfile <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/police-killings/police_killings.csv"
policekillings <- read_csv(url(urlfile), show_col_types = FALSE)
# subset the data into a smaller data frame
policekillings <- subset(policekillings, select=c("name", "age", "gender", "raceethnicity", "state", "h_income", "pov"))
# remove the rows that have "unknown" for age
policekillings <- policekillings[policekillings$age != "Unknown", ]
# remove the rows that have "-" for poverty
policekillings <- policekillings[policekillings$pov != "-", ]
# change columns from characters to numeric
policekillings$pov <- as.numeric(policekillings$pov)
policekillings$age <- as.integer(policekillings$age)
# rename the columns
colnames(policekillings) <- c("Name", "Age", "Gender", "Race", "State", "HouseholdIncome", "PovertyRate")
policekillings <- data.frame(policekillings)
# show a glimpse of the data frame
head(policekillings)
## Name Age Gender Race State HouseholdIncome
## 1 A'donte Washington 16 Male Black AL 51367
## 2 Aaron Rutledge 27 Male White LA 27972
## 3 Aaron Siler 26 Male White WI 45365
## 4 Aaron Valdez 25 Male Hispanic/Latino CA 48295
## 5 Adam Jovicic 29 Male White OH 68785
## 6 Adam Reinhart 29 Male White AZ 20833
## PovertyRate
## 1 14.1
## 2 28.8
## 3 14.6
## 4 11.7
## 5 1.9
## 6 58.0
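The filtering above removes the placeholder strings after import. As an alternative, here is a minimal sketch that declares the placeholders as missing values at read time, so the columns parse as numbers without a separate conversion step. It uses base R's `read.csv` with `na.strings` on a tiny inline sample. The "Example Person" row is made up for illustration, and the other two rows are taken from the `head()` output above. readr's `read_csv` has a comparable `na` argument.

```r
# Tiny inline sample: two rows from the head() output plus one made-up row
# ("Example Person") that carries the "Unknown" and "-" placeholders
sample_csv <- "name,age,pov
Aaron Rutledge,27,28.8
Example Person,Unknown,-
Aaron Siler,26,14.6"

# na.strings turns the placeholders into NA, so age and pov parse as numbers
sample_df <- read.csv(text = sample_csv, na.strings = c("NA", "Unknown", "-"))

# Keep only the rows where both fields are present
clean_df <- sample_df[complete.cases(sample_df[, c("age", "pov")]), ]

str(clean_df)
```

One possible advantage of this approach is that rows with a genuinely missing value are handled the same way as rows with a placeholder, whereas a comparison such as `age != "Unknown"` returns `NA` for a missing value.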
In this next section, I use the above subset to determine the breakdown of killings for each poverty rate range, grouped by state. This is visualized in the table below.
library(gt)
library(dplyr)
## show a table with each state's count of killings for each poverty level range
# Define the breakpoints for poverty rate categories
breaks <- seq(0, 100, by = 10)
# Label each category
custom_labels <- c(
  "0-10%", "10-20%", "20-30%", "30-40%", "40-50%",
  "50-60%", "60-70%", "70-80%", "80-90%", "90-100%"
)
# Create a new column with poverty rate categories
# (bins are right-closed; include.lowest = TRUE keeps a rate of exactly 0 in "0-10%")
policekillings <- policekillings %>%
  mutate(pov_category = cut(PovertyRate, breaks = breaks, labels = custom_labels, include.lowest = TRUE))
# Group the data by state and poverty rate category, calculate counts
# (.groups = "drop_last" keeps the State grouping, which gt uses for its row groups)
summary_data <- policekillings %>%
  group_by(State, pov_category) %>%
  summarise(count = n(), .groups = "drop_last")
# Create a gt table from the summarized data
policekillings_tbl <- gt(summary_data)
# Customize the table headers
policekillings_tbl <- policekillings_tbl |>
  tab_header(
    title = md("**Killings by Poverty Rate in Each State**")
  ) |>
  cols_label(
    State = "State",
    pov_category = md("**Poverty Rate Range**"),
    count = md("**Killings Count**")
  )
# Display the table
policekillings_tbl
Killings by Poverty Rate in Each State

| Poverty Rate Range | Killings Count |
|---|---|
| AK | |
| 10-20% | 1 |
| 20-30% | 1 |
| AL | |
| 0-10% | 3 |
| 10-20% | 2 |
| 20-30% | 2 |
| 30-40% | 1 |
| AR | |
| 10-20% | 2 |
| 20-30% | 1 |
| 30-40% | 1 |
| AZ | |
| 0-10% | 4 |
| 10-20% | 9 |
| 20-30% | 5 |
| 30-40% | 4 |
| 40-50% | 1 |
| 50-60% | 2 |
| CA | |
| 0-10% | 18 |
| 10-20% | 22 |
| 20-30% | 19 |
| 30-40% | 9 |
| 40-50% | 2 |
| 50-60% | 3 |
| 70-80% | 1 |
| CO | |
| 0-10% | 3 |
| 10-20% | 3 |
| 20-30% | 5 |
| 40-50% | 1 |
| CT | |
| 0-10% | 1 |
| DC | |
| 10-20% | 1 |
| DE | |
| 0-10% | 1 |
| 10-20% | 1 |
| FL | |
| 0-10% | 5 |
| 10-20% | 13 |
| 20-30% | 2 |
| 30-40% | 5 |
| 40-50% | 2 |
| 50-60% | 2 |
| GA | |
| 0-10% | 5 |
| 10-20% | 6 |
| 20-30% | 2 |
| 30-40% | 1 |
| 40-50% | 2 |
| HI | |
| 0-10% | 2 |
| 10-20% | 1 |
| 20-30% | 1 |
| IA | |
| 20-30% | 1 |
| 30-40% | 1 |
| ID | |
| 0-10% | 2 |
| 20-30% | 1 |
| 40-50% | 1 |
| IL | |
| 10-20% | 4 |
| 20-30% | 3 |
| 30-40% | 4 |
| IN | |
| 0-10% | 1 |
| 10-20% | 2 |
| 20-30% | 4 |
| 30-40% | 1 |
| KS | |
| 0-10% | 2 |
| 10-20% | 1 |
| 20-30% | 1 |
| 30-40% | 1 |
| 40-50% | 1 |
| KY | |
| 0-10% | 1 |
| 10-20% | 3 |
| 30-40% | 2 |
| 40-50% | 1 |
| LA | |
| 10-20% | 2 |
| 20-30% | 2 |
| 30-40% | 2 |
| 40-50% | 2 |
| 50-60% | 2 |
| MA | |
| 0-10% | 3 |
| 10-20% | 1 |
| 40-50% | 1 |
| MD | |
| 0-10% | 3 |
| 10-20% | 3 |
| 20-30% | 1 |
| 30-40% | 1 |
| 40-50% | 1 |
| 50-60% | 1 |
| ME | |
| 10-20% | 1 |
| MI | |
| 0-10% | 4 |
| 10-20% | 2 |
| 30-40% | 1 |
| 40-50% | 1 |
| 50-60% | 1 |
| MN | |
| 0-10% | 2 |
| 10-20% | 2 |
| 20-30% | 1 |
| 40-50% | 1 |
| MO | |
| 10-20% | 3 |
| 20-30% | 2 |
| 30-40% | 3 |
| 50-60% | 1 |
| 60-70% | 1 |
| MS | |
| 10-20% | 3 |
| 20-30% | 1 |
| 30-40% | 1 |
| 40-50% | 1 |
| MT | |
| 0-10% | 1 |
| 20-30% | 1 |
| NC | |
| 0-10% | 1 |
| 10-20% | 2 |
| 20-30% | 4 |
| 30-40% | 2 |
| 60-70% | 1 |
| NE | |
| 0-10% | 1 |
| 10-20% | 4 |
| 30-40% | 1 |
| NH | |
| 0-10% | 1 |
| NJ | |
| 0-10% | 5 |
| 10-20% | 5 |
| 30-40% | 1 |
| NM | |
| 0-10% | 2 |
| 10-20% | 2 |
| 20-30% | 1 |
| NV | |
| 10-20% | 2 |
| 20-30% | 1 |
| NY | |
| 0-10% | 5 |
| 10-20% | 3 |
| 20-30% | 1 |
| 30-40% | 1 |
| 40-50% | 3 |
| OH | |
| 0-10% | 2 |
| 10-20% | 3 |
| 20-30% | 1 |
| 30-40% | 2 |
| 40-50% | 1 |
| OK | |
| 0-10% | 4 |
| 10-20% | 3 |
| 20-30% | 8 |
| 30-40% | 6 |
| 50-60% | 1 |
| OR | |
| 0-10% | 2 |
| 10-20% | 1 |
| 20-30% | 3 |
| 30-40% | 2 |
| PA | |
| 0-10% | 2 |
| 10-20% | 3 |
| 30-40% | 2 |
| SC | |
| 0-10% | 2 |
| 10-20% | 4 |
| 20-30% | 2 |
| 40-50% | 1 |
| TN | |
| 0-10% | 1 |
| 10-20% | 3 |
| 30-40% | 2 |
| TX | |
| 0-10% | 10 |
| 10-20% | 12 |
| 20-30% | 10 |
| 30-40% | 7 |
| 40-50% | 3 |
| 50-60% | 1 |
| UT | |
| 0-10% | 2 |
| 10-20% | 2 |
| 20-30% | 1 |
| VA | |
| 0-10% | 2 |
| 10-20% | 3 |
| 20-30% | 3 |
| 30-40% | 1 |
| WA | |
| 10-20% | 4 |
| 20-30% | 5 |
| 30-40% | 2 |
| WI | |
| 10-20% | 4 |
| 20-30% | 1 |
| WV | |
| 10-20% | 2 |
| WY | |
| 0-10% | 1 |
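Labels such as "10-20%" do not say which bin a rate of exactly 10 falls into. The following is a minimal sketch of how base R's `cut()` treats bin boundaries, using the same breakpoints as the table. The four test values are illustrative edge cases, not rows from the dataset.

```r
# Same breakpoints as the table; labels are built from the breakpoints
breaks <- seq(0, 100, by = 10)
custom_labels <- paste0(head(breaks, -1), "-", tail(breaks, -1), "%")

# Illustrative edge-case values (not taken from the dataset)
x <- c(0, 10, 10.5, 100)

# Default behaviour: intervals are right-closed, (0,10], (10,20], ...,
# and the lowest break is excluded, so a value of exactly 0 becomes NA
default_bins <- cut(x, breaks = breaks, labels = custom_labels)

# include.lowest = TRUE closes the first interval on the left: [0,10]
closed_bins <- cut(x, breaks = breaks, labels = custom_labels, include.lowest = TRUE)

print(data.frame(x, default_bins, closed_bins))
```

With right-closed intervals, a poverty rate of exactly 10 is counted in "0-10%" rather than "10-20%". Passing `right = FALSE` would flip that convention, so the choice is worth stating alongside the table.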
The article is very short and contains only a data table showing a small subset of the data. To extend the work in the article, I would add a few graphs to help readers visualize the data.
First, I would provide a graph showing the distribution of killings by poverty rate.
library(ggplot2)
# Create histogram for distribution of killings based on poverty rate
ggplot() +
  geom_histogram(data = policekillings, aes(x = PovertyRate), fill = "lightblue", color = "darkblue", binwidth = 5, alpha = 0.5) +
  labs(
    title = "Distribution of Killings Based on Poverty Rate",
    x = "Poverty Rate (%)",
    y = "Frequency"
  ) +
  scale_x_continuous(breaks = seq(0, 100, by = 10))
As the graph above shows, killings are concentrated in areas where the poverty rate is between 5% and 25%. This is interesting because the article states: “One thing that’s clear from the data: Police killings tend to take place in neighborhoods that are poorer and blacker than the U.S. as a whole.” (*)
In the article, the author bases this statement on the household income data. Let’s see whether household income shows a different distribution.
Next, I would provide a graph showing the distribution of killings by household income.
# Create histogram for distribution based on household income
ggplot() +
  geom_histogram(data = policekillings, aes(x = HouseholdIncome), fill = "lightgreen", color = "darkgreen", binwidth = 10000, alpha = 0.5) +
  labs(
    title = "Distribution of Killings Based on Household Income",
    x = "Household Income ($)",
    y = "Frequency"
  ) +
  scale_x_continuous(breaks = seq(0, 140000, by = 15000))
As the graph above shows, killings are concentrated in areas where household income is lower (between $15,000 and $60,000). Viewed this way, you could say the author's statement (*) is correct.
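Ranges read off a histogram can be backed up with a number. The sketch below computes the share of observations inside a range, plus the median, for both variables. To stay self-contained it uses only the six rows shown in the `head()` output earlier, which are far too few to be representative. `share_between` is a hypothetical helper written for this example, not part of any package.

```r
# The six rows from the head() output earlier in this report; they only make
# the sketch runnable and say nothing about the full dataset
sample_rows <- data.frame(
  PovertyRate = c(14.1, 28.8, 14.6, 11.7, 1.9, 58.0),
  HouseholdIncome = c(51367, 27972, 45365, 48295, 68785, 20833)
)

# Hypothetical helper: fraction of values inside the closed range [lo, hi]
share_between <- function(x, lo, hi) {
  mean(x >= lo & x <= hi, na.rm = TRUE)
}

# Ranges taken from the reading of the two histograms
pov_share <- share_between(sample_rows$PovertyRate, 5, 25)
income_share <- share_between(sample_rows$HouseholdIncome, 15000, 60000)

pov_median <- median(sample_rows$PovertyRate)
income_median <- median(sample_rows$HouseholdIncome)

print(c(pov_share = pov_share, income_share = income_share))
print(c(pov_median = pov_median, income_median = income_median))
```

Running the same helper on the full `policekillings` data frame would give the shares behind the two histograms. Checking the statement (*) would also require a national benchmark for poverty rate and household income, which this report does not include.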
FiveThirtyEight. Where Police Have Killed Americans in 2015. https://fivethirtyeight.com/features/where-police-have-killed-americans-in-2015/
FiveThirtyEight. Police Killings Data. https://github.com/fivethirtyeight/data/blob/master/police-killings