1. Introduction

The FiveThirtyEight article “Where Police Have Killed Americans In 2015” (https://fivethirtyeight.com/features/where-police-have-killed-americans-in-2015/) describes and analyzes a database of Americans killed by police since the start of 2015.
The data were published by FiveThirtyEight (https://github.com/fivethirtyeight/data/tree/master/police-killings), which compiled them from the Guardian’s database and Census data.

2. Data

2.1 Load data and summary of the data

library(ggplot2)  # used for the charts in section 2.3
data <- read.csv("https://raw.githubusercontent.com/ex-pr/Data607-Week-2/main/police_killings.csv", header = TRUE, sep = ",")

The dataset has 467 rows and 34 columns. Below, I look at its contents and check which columns have missing values (possibly to remove them).

dim(data)
## [1] 467  34
summary(data)
##      name               age               gender          raceethnicity     
##  Length:467         Length:467         Length:467         Length:467        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     month                day             year      streetaddress     
##  Length:467         Min.   : 1.00   Min.   :2015   Length:467        
##  Class :character   1st Qu.: 8.00   1st Qu.:2015   Class :character  
##  Mode  :character   Median :16.00   Median :2015   Mode  :character  
##                     Mean   :15.83   Mean   :2015                     
##                     3rd Qu.:23.00   3rd Qu.:2015                     
##                     Max.   :31.00   Max.   :2015                     
##                                                                      
##      city              state              latitude       longitude      
##  Length:467         Length:467         Min.   :19.92   Min.   :-159.64  
##  Class :character   Class :character   1st Qu.:33.34   1st Qu.:-111.95  
##  Mode  :character   Mode  :character   Median :35.77   Median : -94.76  
##                                        Mean   :36.40   Mean   : -96.97  
##                                        3rd Qu.:39.94   3rd Qu.: -82.96  
##                                        Max.   :61.22   Max.   : -68.10  
##                                                                         
##     state_fp       county_fp         tract_ce          geo_id         
##  Min.   : 1.00   Min.   :  1.00   Min.   :   100   Min.   :1.003e+09  
##  1st Qu.: 8.00   1st Qu.: 29.00   1st Qu.:  5202   1st Qu.:8.022e+09  
##  Median :24.00   Median : 63.00   Median : 40200   Median :2.403e+10  
##  Mean   :25.34   Mean   : 91.58   Mean   :236937   Mean   :2.543e+10  
##  3rd Qu.:40.00   3rd Qu.:111.00   3rd Qu.:378450   3rd Qu.:4.011e+10  
##  Max.   :56.00   Max.   :740.00   Max.   :980000   Max.   :5.601e+10  
##                                                                       
##    county_id       namelsad         lawenforcementagency    cause          
##  Min.   : 1003   Length:467         Length:467           Length:467        
##  1st Qu.: 8022   Class :character   Class :character     Class :character  
##  Median :24033   Mode  :character   Mode  :character     Mode  :character  
##  Mean   :25434                                                             
##  3rd Qu.:40112                                                             
##  Max.   :56005                                                             
##                                                                            
##     armed                pop        share_white        share_black       
##  Length:467         Min.   :    0   Length:467         Length:467        
##  Class :character   1st Qu.: 3358   Class :character   Class :character  
##  Mode  :character   Median : 4447   Mode  :character   Mode  :character  
##                     Mean   : 4784                                        
##                     3rd Qu.: 5816                                        
##                     Max.   :26826                                        
##                                                                          
##  share_hispanic       p_income            h_income      county_income   
##  Length:467         Length:467         Min.   : 10290   Min.   : 22545  
##  Class :character   Class :character   1st Qu.: 32625   1st Qu.: 43804  
##  Mode  :character   Mode  :character   Median : 42759   Median : 50856  
##                                        Mean   : 46627   Mean   : 52527  
##                                        3rd Qu.: 56190   3rd Qu.: 56832  
##                                        Max.   :142500   Max.   :110292  
##                                        NA's   :2                        
##   comp_income     county_bucket     nat_bucket        pov           
##  Min.   :0.1840   Min.   :1.000   Min.   :1.000   Length:467        
##  1st Qu.:0.6454   1st Qu.:1.000   1st Qu.:1.000   Class :character  
##  Median :0.8696   Median :2.000   Median :2.000   Mode  :character  
##  Mean   :0.8959   Mean   :2.498   Mean   :2.497                     
##  3rd Qu.:1.0815   3rd Qu.:4.000   3rd Qu.:3.000                     
##  Max.   :2.8652   Max.   :5.000   Max.   :5.000                     
##  NA's   :2        NA's   :27      NA's   :2                         
##      urate            college       
##  Min.   :0.01133   Min.   :0.01355  
##  1st Qu.:0.06859   1st Qu.:0.10617  
##  Median :0.10518   Median :0.16954  
##  Mean   :0.11740   Mean   :0.22022  
##  3rd Qu.:0.14083   3rd Qu.:0.28454  
##  Max.   :0.50761   Max.   :0.82807  
##  NA's   :2         NA's   :2
head(data, n=3)
##                 name age gender raceethnicity    month day year
## 1 A'donte Washington  16   Male         Black February  23 2015
## 2     Aaron Rutledge  27   Male         White    April   2 2015
## 3        Aaron Siler  26   Male         White    March  14 2015
##            streetaddress      city state latitude longitude state_fp county_fp
## 1           Clearview Ln Millbrook    AL 32.52958 -86.36283        1        51
## 2 300 block Iris Park Dr Pineville    LA 31.32174 -92.43486       22        79
## 3   22nd Ave and 56th St   Kenosha    WI 42.58356 -87.83571       55        59
##   tract_ce      geo_id county_id            namelsad
## 1    30902  1051030902      1051 Census Tract 309.02
## 2    11700 22079011700     22079    Census Tract 117
## 3     1200 55059001200     55059     Census Tract 12
##              lawenforcementagency   cause armed  pop share_white share_black
## 1     Millbrook Police Department Gunshot    No 3779        60.5        30.5
## 2 Rapides Parish Sheriff's Office Gunshot    No 2769        53.8        36.2
## 3       Kenosha Police Department Gunshot    No 4079        73.8         7.7
##   share_hispanic p_income h_income county_income comp_income county_bucket
## 1            5.6    28375    51367         54766   0.9379359             3
## 2            0.5    14678    27972         40930   0.6834107             2
## 3           16.8    25286    45365         54930   0.8258693             2
##   nat_bucket  pov      urate   college
## 1          3 14.1 0.09768638 0.1685095
## 2          1 28.8 0.06572379 0.1114024
## 3          3 14.6 0.16629314 0.1473123
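The summary above reports NA counts only for numeric columns. Several columns that clearly hold numbers in the `head()` output (age, share_white, share_black, share_hispanic, p_income, pov) were read as character, presumably because they contain non-numeric placeholders for missing values, so their gaps are hidden. The sketch below converts such columns to numeric and then counts missing values per column. The helper names `to_numeric_if_possible` and `count_missing` are hypothetical, and the placeholder strings ("-", "Unknown", "") are an assumption; inspect the raw values (e.g. with `unique(data$age)`) before relying on them. It is demonstrated on a small toy data frame rather than the real file.

```r
# Hypothetical helper: convert character columns to numeric when every
# non-placeholder value parses as a number.
# ASSUMPTION: "-", "Unknown" and "" mark missing values; verify on the raw data.
to_numeric_if_possible <- function(df, placeholders = c("-", "Unknown", "")) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.character(x)) {
      x[x %in% placeholders] <- NA
      num <- suppressWarnings(as.numeric(x))
      # Convert only if parsing introduced no new NAs (i.e. the column is numeric)
      if (all(is.na(x) == is.na(num))) {
        df[[col]] <- num
      }
    }
  }
  df
}

# Count missing values per column, keeping only columns with at least one NA
count_missing <- function(df) {
  n_na <- colSums(is.na(df))
  n_na[n_na > 0]
}

# Toy data shaped like a few columns of police_killings.csv
toy <- data.frame(
  name = c("A", "B", "C"),
  age = c("16", "Unknown", "26"),
  share_white = c("60.5", "-", "73.8"),
  stringsAsFactors = FALSE
)
toy_clean <- to_numeric_if_possible(toy)
print(count_missing(toy_clean))
```

On the real data this would be `data <- to_numeric_if_possible(data)` followed by `count_missing(data)`. Text columns such as `name` stay character because most of their values do not parse as numbers.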

2.2 Creating a subset

I wanted to build a profile of the person most likely to be killed by the police.
To do so, I keep only the columns that look relevant:

subset <- data[,c('name','age','gender','raceethnicity','city','state','cause','armed','share_white','share_black','share_hispanic','h_income','county_income','pov','urate')]

Renaming some of the columns to shorter or clearer names:

names(subset)[names(subset) == "raceethnicity"] <- "ethnicity"
names(subset)[names(subset) == "share_white"] <- "%white"
names(subset)[names(subset) == "share_black"] <- "%black"
names(subset)[names(subset) == "share_hispanic"] <- "%hispanic"
names(subset)[names(subset) == "h_income"] <- "house_income"
names(subset)[names(subset) == "pov"] <- "pov_rate"
names(subset)[names(subset) == "urate"] <- "unempl_rate"
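The seven renaming lines above can also be expressed as a single lookup vector, which keeps the old-to-new mapping in one place. This is a minimal sketch; `rename_map` and `rename_columns` are hypothetical names, and it is shown on a toy data frame.

```r
# Lookup vector: element names are the old column names, values are the new ones
rename_map <- c(
  raceethnicity  = "ethnicity",
  share_white    = "%white",
  share_black    = "%black",
  share_hispanic = "%hispanic",
  h_income       = "house_income",
  pov            = "pov_rate",
  urate          = "unempl_rate"
)

# Rename only the columns that appear in the map; other columns are unchanged
rename_columns <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

toy <- data.frame(raceethnicity = "Black", city = "Kenosha", pov = "14.6")
toy <- rename_columns(toy, rename_map)
print(names(toy))
```

Applied to the real data this would be `subset <- rename_columns(subset, rename_map)`. Names such as `%white` are not syntactic R names, so they have to be quoted with backticks when accessed, e.g. subset$`%white`.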

Field descriptions (original column names, with the new names in parentheses where a column was renamed):
name - Name of deceased
age - Age of deceased
gender - Gender of deceased
raceethnicity (ethnicity) - Race/ethnicity of deceased
city - City where incident occurred
state - State where incident occurred
cause - Cause of death
armed - How/whether deceased was armed
share_white (%white) - Share of population that is non-Hispanic white, in percent
share_black (%black) - Share of population that is black (alone, not in combination), in percent
share_hispanic (%hispanic) - Share of population that is Hispanic/Latino (any race), in percent
h_income (house_income) - Tract-level median household income
county_income - County-level median household income
pov (pov_rate) - Tract-level poverty rate (official), in percent
urate (unempl_rate) - Tract-level unemployment rate, as a fraction

2.3 Exploratory data analysis

Gender of the victims

ggplot(subset) +
  aes(x = gender) +
  geom_bar(fill = "lightgreen") +
  labs(x = "Gender of deceased", y = "Number of deaths", title = "Killings by gender") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
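A bar chart shows which category is largest, but exact counts and percentages are easier to quote in the conclusions. The sketch below adds a small frequency table; `share_table` is a hypothetical helper, and the toy vector stands in for `subset$gender`.

```r
# Hypothetical helper: counts and percentages for a categorical vector,
# sorted from most to least frequent (NA is kept as its own category if present)
share_table <- function(x) {
  counts <- sort(table(x, useNA = "ifany"), decreasing = TRUE)
  data.frame(
    category = names(counts),
    n = as.integer(counts),
    percent = round(100 * as.numeric(counts) / sum(counts), 1),
    stringsAsFactors = FALSE
  )
}

toy_gender <- c("Male", "Male", "Male", "Female")
print(share_table(toy_gender))
```

The same helper works for the other categorical columns, e.g. `share_table(subset$ethnicity)` or `share_table(subset$armed)`.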


Race/ethnicity of the victims

ggplot(subset) +
  aes(x = ethnicity) +
  geom_bar(fill = "lightblue") +
  labs(x = "Race/ethnicity of deceased", y = "Number of deaths", title = "Killings by race/ethnicity") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))


Whether the deceased was armed and, if so, with what weapon

ggplot(subset) +
  aes(x = armed) +
  geom_bar(fill = "pink") +
  labs(x = "How/whether the deceased was armed", y = "Number of deaths", title = "Whether the deceased was armed") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
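With several weapon categories, the bars are easier to compare when they are sorted by frequency and drawn horizontally so long labels do not overlap. This is one possible variant of the chart above, sketched on toy data; `order_by_frequency` is a hypothetical helper, while `coord_flip()` is a standard ggplot2 function.

```r
library(ggplot2)

# Hypothetical helper: order factor levels from most to least frequent
order_by_frequency <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

toy_armed <- data.frame(
  armed = c("No", "Firearm", "Firearm", "Knife", "Firearm", "No"),
  stringsAsFactors = FALSE
)
toy_armed$armed_sorted <- order_by_frequency(toy_armed$armed)

p <- ggplot(toy_armed) +
  aes(x = armed_sorted) +
  geom_bar(fill = "pink") +
  coord_flip() +  # horizontal bars keep long category labels readable
  labs(x = "How/whether the deceased was armed", y = "Number of deaths",
       title = "Whether the deceased was armed") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
print(p)
```

On the real data the only change is `aes(x = order_by_frequency(armed))` in the original chart plus `coord_flip()`.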


Median household income of the census tracts where the killings occurred

ggplot(subset) +
  aes(x = house_income) +
  geom_histogram(bins = 30L, fill = "blue", na.rm = TRUE) +  # 2 tracts have no income value
  labs(x = "Tract median household income", y = "Number of deaths", title = "Income of the tracts where killings occurred") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
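The histogram shows absolute tract income, but a tract can be poor relative to its county even when its income looks average nationally. The ratio of tract to county median household income captures this; for the first row of `head(data)`, 51367 / 54766 ≈ 0.938, which matches that row's comp_income value. The sketch below summarizes the ratio; `income_ratio_summary` is a hypothetical helper, and the toy inputs reuse the three incomes from `head(data)` plus one made-up missing value.

```r
# Hypothetical helper: summarize tract income relative to county income.
# A ratio below 1 means the tract is poorer than its county.
income_ratio_summary <- function(house_income, county_income) {
  ratio <- house_income / county_income
  c(
    median_ratio = median(ratio, na.rm = TRUE),
    share_below_county = mean(ratio < 1, na.rm = TRUE),
    n_missing = sum(is.na(ratio))
  )
}

toy_house <- c(51367, 27972, 45365, NA)   # NA is a made-up missing value
toy_county <- c(54766, 40930, 54930, 50000)
print(round(income_ratio_summary(toy_house, toy_county), 3))
```

On the real data: `income_ratio_summary(subset$house_income, subset$county_income)`.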

3. Conclusions

The article analyzes only the racial makeup and income of the neighborhoods where the killings happened; it does not look at the victims' gender or age. Yet it is important to know that most of the people killed by the police are men. This raises several questions. Do men commit more crimes? Are men more often armed? Or do they draw more police attention, which puts them at greater risk of being killed, even accidentally? The article could investigate these questions further. It would also be interesting to analyze the typical age of the victims.

Because the article's research is closely connected to crime levels, its results could help us better understand the nature of crime and how to prevent it.
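Two of the follow-up questions above (the typical age of the victims and whether men are more often armed) can be explored directly in this dataset. The sketch below is a hypothetical starting point on toy vectors: `age_profile` is an illustrative helper, and treating non-numeric ages such as "Unknown" as missing is an assumption.

```r
# Hypothetical helper: summarize age, treating non-numeric entries as missing.
# ASSUMPTION: values like "Unknown" should be dropped rather than imputed.
age_profile <- function(age) {
  age_num <- suppressWarnings(as.numeric(age))
  list(
    n_missing = sum(is.na(age_num)),
    median = median(age_num, na.rm = TRUE),
    # Counts per decade: [10,20), [20,30), ...
    by_decade = table(cut(age_num, breaks = seq(0, 100, by = 10), right = FALSE))
  )
}

# Toy vectors standing in for subset$age, subset$gender and subset$armed
toy_age <- c("16", "27", "26", "Unknown", "45")
toy_gender <- c("Male", "Male", "Female", "Male", "Male")
toy_armed <- c("No", "Firearm", "No", "Firearm", "Knife")

print(age_profile(toy_age))

# Cross-tabulate gender by armed status; row proportions show, within each
# gender, what share of victims fell into each armed category
tab <- table(gender = toy_gender, armed = toy_armed)
row_shares <- prop.table(tab, margin = 1)
print(round(row_shares, 2))
```

A cross-tabulation only describes the victims in the data; it cannot by itself answer why men are killed more often.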