The article, “Where Police Have Killed Americans In 2015” (https://fivethirtyeight.com/features/where-police-have-killed-americans-in-2015/), describes and analyzes a database of Americans killed by police since the start of 2015.
The data were provided by FiveThirtyEight (https://github.com/fivethirtyeight/data/tree/master/police-killings). They used the Guardian’s database and the Census as their sources.
data <- read.csv("https://raw.githubusercontent.com/ex-pr/Data607-Week-2/main/police_killings.csv", header=TRUE, sep=",")
The dataset has 467 rows and 34 columns. Below, I look at its structure and check which columns have missing values, since those may need to be removed.
dim(data)
## [1] 467 34
summary(data)
## name age gender raceethnicity
## Length:467 Length:467 Length:467 Length:467
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## month day year streetaddress
## Length:467 Min. : 1.00 Min. :2015 Length:467
## Class :character 1st Qu.: 8.00 1st Qu.:2015 Class :character
## Mode :character Median :16.00 Median :2015 Mode :character
## Mean :15.83 Mean :2015
## 3rd Qu.:23.00 3rd Qu.:2015
## Max. :31.00 Max. :2015
##
## city state latitude longitude
## Length:467 Length:467 Min. :19.92 Min. :-159.64
## Class :character Class :character 1st Qu.:33.34 1st Qu.:-111.95
## Mode :character Mode :character Median :35.77 Median : -94.76
## Mean :36.40 Mean : -96.97
## 3rd Qu.:39.94 3rd Qu.: -82.96
## Max. :61.22 Max. : -68.10
##
## state_fp county_fp tract_ce geo_id
## Min. : 1.00 Min. : 1.00 Min. : 100 Min. :1.003e+09
## 1st Qu.: 8.00 1st Qu.: 29.00 1st Qu.: 5202 1st Qu.:8.022e+09
## Median :24.00 Median : 63.00 Median : 40200 Median :2.403e+10
## Mean :25.34 Mean : 91.58 Mean :236937 Mean :2.543e+10
## 3rd Qu.:40.00 3rd Qu.:111.00 3rd Qu.:378450 3rd Qu.:4.011e+10
## Max. :56.00 Max. :740.00 Max. :980000 Max. :5.601e+10
##
## county_id namelsad lawenforcementagency cause
## Min. : 1003 Length:467 Length:467 Length:467
## 1st Qu.: 8022 Class :character Class :character Class :character
## Median :24033 Mode :character Mode :character Mode :character
## Mean :25434
## 3rd Qu.:40112
## Max. :56005
##
## armed pop share_white share_black
## Length:467 Min. : 0 Length:467 Length:467
## Class :character 1st Qu.: 3358 Class :character Class :character
## Mode :character Median : 4447 Mode :character Mode :character
## Mean : 4784
## 3rd Qu.: 5816
## Max. :26826
##
## share_hispanic p_income h_income county_income
## Length:467 Length:467 Min. : 10290 Min. : 22545
## Class :character Class :character 1st Qu.: 32625 1st Qu.: 43804
## Mode :character Mode :character Median : 42759 Median : 50856
## Mean : 46627 Mean : 52527
## 3rd Qu.: 56190 3rd Qu.: 56832
## Max. :142500 Max. :110292
## NA's :2
## comp_income county_bucket nat_bucket pov
## Min. :0.1840 Min. :1.000 Min. :1.000 Length:467
## 1st Qu.:0.6454 1st Qu.:1.000 1st Qu.:1.000 Class :character
## Median :0.8696 Median :2.000 Median :2.000 Mode :character
## Mean :0.8959 Mean :2.498 Mean :2.497
## 3rd Qu.:1.0815 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :2.8652 Max. :5.000 Max. :5.000
## NA's :2 NA's :27 NA's :2
## urate college
## Min. :0.01133 Min. :0.01355
## 1st Qu.:0.06859 1st Qu.:0.10617
## Median :0.10518 Median :0.16954
## Mean :0.11740 Mean :0.22022
## 3rd Qu.:0.14083 3rd Qu.:0.28454
## Max. :0.50761 Max. :0.82807
## NA's :2 NA's :2
head(data, n=3)
## name age gender raceethnicity month day year
## 1 A'donte Washington 16 Male Black February 23 2015
## 2 Aaron Rutledge 27 Male White April 2 2015
## 3 Aaron Siler 26 Male White March 14 2015
## streetaddress city state latitude longitude state_fp county_fp
## 1 Clearview Ln Millbrook AL 32.52958 -86.36283 1 51
## 2 300 block Iris Park Dr Pineville LA 31.32174 -92.43486 22 79
## 3 22nd Ave and 56th St Kenosha WI 42.58356 -87.83571 55 59
## tract_ce geo_id county_id namelsad
## 1 30902 1051030902 1051 Census Tract 309.02
## 2 11700 22079011700 22079 Census Tract 117
## 3 1200 55059001200 55059 Census Tract 12
## lawenforcementagency cause armed pop share_white share_black
## 1 Millbrook Police Department Gunshot No 3779 60.5 30.5
## 2 Rapides Parish Sheriff's Office Gunshot No 2769 53.8 36.2
## 3 Kenosha Police Department Gunshot No 4079 73.8 7.7
## share_hispanic p_income h_income county_income comp_income county_bucket
## 1 5.6 28375 51367 54766 0.9379359 3
## 2 0.5 14678 27972 40930 0.6834107 2
## 3 16.8 25286 45365 54930 0.8258693 2
## nat_bucket pov urate college
## 1 3 14.1 0.09768638 0.1685095
## 2 1 28.8 0.06572379 0.1114024
## 3 3 14.6 0.16629314 0.1473123
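The missing-value check mentioned before the output can be sketched as follows. This is a minimal sketch on a small made-up data frame (`toy`) standing in for `data`. The `"-"` placeholder is an assumption about why columns such as `share_white` and `pov` were read as character; it should be verified against the actual file.

```r
# Toy stand-in for `data` (values are made up for illustration).
toy <- data.frame(
  h_income    = c(51367, NA, 45365, 27972),
  share_white = c("60.5", "-", "73.8", "53.8"),  # "-" is an assumed placeholder
  urate       = c(0.098, 0.066, NA, 0.166),
  stringsAsFactors = FALSE
)

# Explicit NA values per column, as reported by summary().
na_counts <- colSums(is.na(toy))

# Character columns can hide missing values as text, so try a numeric
# conversion and count the entries that fail to parse.
hidden_counts <- sapply(toy, function(col) {
  if (!is.character(col)) return(0L)
  sum(is.na(suppressWarnings(as.numeric(col))) & !is.na(col))
})

print(na_counts)
print(hidden_counts)
```

Running the same two steps on `data` would show both the `NA's` that `summary()` already reports and any text placeholders it cannot see.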
I decided to build a profile of the person most likely to be killed by the police.
Choosing only columns that look important:
subset <- data[,c('name','age','gender','raceethnicity','city','state','cause','armed','share_white','share_black','share_hispanic','h_income','county_income','pov','urate')]
Renaming the dataset columns:
names(subset)[names(subset) == "raceethnicity"] <- "ethnicity"
names(subset)[names(subset) == "share_white"] <- "%white"
names(subset)[names(subset) == "share_black"] <- "%black"
names(subset)[names(subset) == "share_hispanic"] <- "%hispanic"
names(subset)[names(subset) == "h_income"] <- "house_income"
names(subset)[names(subset) == "pov"] <- "pov_rate"
names(subset)[names(subset) == "urate"] <- "unempl_rate"
Field descriptions:
name - Name of deceased
age - Age of deceased
gender - Gender of deceased
raceethnicity - Race/ethnicity of deceased
city - City where incident occurred
state - State where incident occurred
cause - Cause of death
armed - How/whether deceased was armed
share_white - Share of pop that is non-Hispanic white
share_black - Share of pop that is black (alone, not in combination)
share_hispanic - Share of pop that is Hispanic/Latino (any race)
h_income - Tract-level median household income
county_income - County-level median household income
pov - Tract-level poverty rate (official)
urate - Tract-level unemployment rate
Gender of the victims
library(ggplot2)  # required for all plots below
ggplot(subset) +
  aes(x = gender) +
  geom_bar(fill = 'lightgreen') +
  labs(x = "Gender of deceased", y = "Number of deaths", title = "Deaths by gender") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
Race/ethnicity of the victims
ggplot(subset) +
  aes(x = ethnicity) +
  geom_bar(fill = 'lightblue') +
  labs(x = "Race/ethnicity of deceased", y = "Number of deaths", title = "Deaths by ethnicity") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
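The bar charts show counts only. To read off exact numbers and shares for a column such as gender or ethnicity, a frequency table can help. The sketch below uses a made-up vector (`ethnicity`) in place of `subset$ethnicity`, so the values are illustrative only.

```r
# Toy stand-in for subset$ethnicity (values are made up for illustration).
ethnicity <- c("White", "Black", "White", "Hispanic/Latino", "White", "Black")

# Counts per category, largest first.
counts <- sort(table(ethnicity), decreasing = TRUE)

# The same table expressed as percentages of all rows.
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```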
Whether the deceased was armed and, if so, with what weapon
ggplot(subset) +
  aes(x = armed) +
  geom_bar(fill = 'pink') +
  labs(x = "Armed status / weapon", y = "Number of deaths", title = "Armed status of the deceased") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
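One way to look at armed status in more detail is to cross-tabulate it with gender. This is a minimal sketch on a made-up data frame (`toy`) standing in for `subset`; the category labels are illustrative and may not match the labels in the real `armed` column.

```r
# Toy stand-in for `subset` (values are made up for illustration).
toy <- data.frame(
  gender = c("Male", "Male", "Female", "Male", "Female", "Male"),
  armed  = c("Firearm", "No", "No", "Knife", "Firearm", "Firearm"),
  stringsAsFactors = FALSE
)

# Rows: gender; columns: armed status.
tab <- table(toy$gender, toy$armed)

# Row-wise proportions: within each gender, the share in each armed category.
row_shares <- prop.table(tab, margin = 1)

print(tab)
print(round(row_shares, 2))
```

With very few women in the real data, the row for women would rest on a small number of cases and should be read with caution.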
Median household income of the census tract where each death occurred
ggplot(subset) +
  aes(x = house_income) +
  geom_histogram(bins = 30L, fill = 'blue') +
  labs(x = "Median household income", y = "Number of deaths", title = "Tract-level median household income") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
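The histogram shows tract income on its own. A possible next step is to compare each tract with its county, which is what the `comp_income` column in the original data appears to hold (tract income divided by county income). The sketch below uses a made-up data frame (`toy`) with the two renamed income columns.

```r
# Toy stand-in for `subset` (values are made up for illustration).
toy <- data.frame(
  house_income  = c(51367, 27972, 45365, NA, 90000),
  county_income = c(54766, 40930, 54930, 50000, 60000)
)

# Ratio of tract income to county income; below 1 means the tract is
# poorer than its county.
ratio <- toy$house_income / toy$county_income

# Share of rows with a tract below its county median, ignoring missing values.
share_below <- mean(ratio < 1, na.rm = TRUE)

print(round(ratio, 2))
print(share_below)
```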
The article bases its analysis only on the race and income of the neighborhood and does not consider the gender or age of the victims. It is worth noting that most of the people killed by police are men. Do men commit more crime? Are men more often armed? Or do they attract more police attention, which raises the chance of being killed? These questions remain open, and the article could have explored them. It would also be interesting to analyze the age distribution of the victims. Because the article's subject is closely connected to crime levels, its results could help to better understand the nature of crime and how to prevent it.
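The age analysis suggested above could start along these lines. This is a minimal sketch on a made-up vector (`age_raw`) standing in for `subset$age`. The `summary()` output shows `age` as character, so the sketch assumes it contains at least one non-numeric label; `"Unknown"` here is an assumed example, and the age bands are an arbitrary choice.

```r
# Toy stand-in for subset$age (values are made up for illustration).
age_raw <- c("16", "27", "26", "Unknown", "41", "35")  # "Unknown" is an assumed label

# Non-numeric entries become NA; the conversion warning is expected.
age <- suppressWarnings(as.numeric(age_raw))

# Basic distribution, ignoring entries that could not be parsed.
print(summary(age))
median_age <- median(age, na.rm = TRUE)

# Counts per age band (right-closed intervals: 0-17, 18-29, 30-44, ...).
age_band <- cut(age, breaks = c(0, 17, 29, 44, 64, Inf),
                labels = c("<18", "18-29", "30-44", "45-64", "65+"))
print(table(age_band, useNA = "ifany"))
```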