Fatal Police Shootings

We will be walking plotting the data of the fatal police shootings using different r packages. We will create different visualizations and summarize our findings. Below is a quick view of the data, note the names of each column and the summary of each variable, followed by the entire data table.

a)

coul1 <- brewer.pal(4, "Set1") #create color palettes
coul2 <- brewer.pal(4, "Set2") 
coul3 <- brewer.pal(4, "Set3") 

names(Shoot)
##  [1] "id"                      "name"                   
##  [3] "date"                    "manner_of_death"        
##  [5] "armed"                   "age"                    
##  [7] "gender"                  "race"                   
##  [9] "city"                    "state"                  
## [11] "signs_of_mental_illness" "threat_level"           
## [13] "flee"                    "body_camera"            
## [15] "longitude"               "latitude"               
## [17] "is_geocoding_exact"
summary(Shoot)
##        id           name               date           manner_of_death   
##  Min.   :   3   Length:8002        Length:8002        Length:8002       
##  1st Qu.:2240   Class :character   Class :character   Class :character  
##  Median :4446   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :4415                                                           
##  3rd Qu.:6580                                                           
##  Max.   :8696                                                           
##                                                                         
##     armed                age           gender              race          
##  Length:8002        Min.   : 2.00   Length:8002        Length:8002       
##  Class :character   1st Qu.:27.00   Class :character   Class :character  
##  Mode  :character   Median :35.00   Mode  :character   Mode  :character  
##                     Mean   :37.21                                        
##                     3rd Qu.:45.00                                        
##                     Max.   :92.00                                        
##                     NA's   :503                                          
##      city              state           signs_of_mental_illness
##  Length:8002        Length:8002        Length:8002            
##  Class :character   Class :character   Class :character       
##  Mode  :character   Mode  :character   Mode  :character       
##                                                               
##                                                               
##                                                               
##                                                               
##  threat_level           flee           body_camera          longitude      
##  Length:8002        Length:8002        Length:8002        Min.   :-160.01  
##  Class :character   Class :character   Class :character   1st Qu.:-112.03  
##  Mode  :character   Mode  :character   Mode  :character   Median : -94.31  
##                                                           Mean   : -97.04  
##                                                           3rd Qu.: -83.15  
##                                                           Max.   : -67.87  
##                                                           NA's   :840      
##     latitude     is_geocoding_exact
##  Min.   :19.50   Length:8002       
##  1st Qu.:33.48   Class :character  
##  Median :36.10   Mode  :character  
##  Mean   :36.68                     
##  3rd Qu.:40.03                     
##  Max.   :71.30                     
##  NA's   :840
datatable(Shoot)

Map of Each Person Killed by Police

b)

#Filter for only 2020 data

 library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:reshape2':
## 
##     dcast, melt
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
 dt<-setDT(Shoot)[date %between% c('2020-01-01', '2020-12-31')]
head(dt)
##      id                           name       date  manner_of_death      armed
## 1: 5344              Derrick A. Elseth 2020-01-01             shot        gun
## 2: 5347    Teddy James Maverick Varner 2020-01-01             shot        gun
## 3: 5403 Gerardo Antonio Conchas-Bustas 2020-01-01             shot      knife
## 4: 5342             Gabriel Strickland 2020-01-01 shot and Tasered toy weapon
## 5: 5339           Jeffery Dale Millsap 2020-01-02             shot        gun
## 6: 5348                  Stanley Hayes 2020-01-02             shot        gun
##    age gender race            city state signs_of_mental_illness threat_level
## 1:  24      M    W Richmond County    VA                   False        other
## 2:  29      M    W   Central Point    OR                    True       attack
## 3:  20      M    H          Denver    CO                   False       attack
## 4:  25      M    W    Grass Valley    CA                    True       attack
## 5:  26      M    W            Holt    MO                   False       attack
## 6:  69      M    W       Hillsboro    OR                    True       attack
##           flee body_camera longitude latitude is_geocoding_exact
## 1: Not fleeing       False   -76.875   38.100               True
## 2: Not fleeing        True  -122.993   42.418               True
## 3: Not fleeing        True  -105.030   39.700               True
## 4: Not fleeing       False  -121.061   39.219               True
## 5:         Car       False   -94.345   39.404               True
## 6: Not fleeing       False  -122.870   45.521               True
leaflet(data = NULL) %>%
  addTiles() %>%  # use the default base map which is OpenStreetMap tiles
  addMarkers(lng = dt$longitude,    # Longitudes
             lat = dt$latitude,    # Latitudes
             popup=dt$name) #name of individual

The Distribution of People Killed by Race

c)

Shoot[Shoot ==""]<- "Unknown"  #replace all blanks with Unknown

p <- ggplot(Shoot, aes(x = race, fill = race)) +
  geom_bar()+
  labs(x = "Race", 
       y = "Frequency", 
       title = "People Killed by Police Based on Race",
       caption = "Voronyak 2023") + 
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

p 

Based on the data given we can see that W, which stands for White. This should make sense because the majority of the US population is white (roughly 75%). Next in line in B for black and they are about half of what W is, but note they are only about 13.6% of the population. This means the proportion for black Americans being killed is higher than white Americans. Also, note that H for Hispanic is also very high, but they only about 3% of the population. Thus the proportion of Hispanic people being killed by police appears to be the highest. ~Note these percents were found by a basic Google search

The Distribution of People Killed by Gender

d)

p <- ggplot(Shoot, aes(x = gender, fill = gender)) +
  geom_bar()+
  labs(x = "Gender", 
       y = "Frequency", 
       title = "People Killed by Police Based on Gender",
       caption = "Voronyak 2023") + 
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

p 

It appears the majority of fatalities are males by a large margin.

The Distribution of People Killed by Threat Level

e)

p <- ggplot(Shoot, aes(x = threat_level, fill = threat_level)) +
  geom_bar()+
  labs(x = "Threat Level", 
       y = "Frequency", 
       title = "People Killed by Police Based on Their Threat Level",
       caption = "Voronyak 2023") + 
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

p 

Most of the killings occurred when the police officer was attacked. The “other” category is quite large and very nondescript, thus no good conclusions.

The Distribution of People Killed Based on How they Fled

f)

p <- ggplot(Shoot, aes(x = flee, fill = flee)) +
  geom_bar()+
  labs(x = "Flee", 
       y = "Frequency", 
       title = "People Killed by Police Based on How they Fled",
       caption = "Voronyak 2023") + 
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

p 

p <- ggplot(Shoot, aes(x = flee, fill = threat_level)) +
  geom_bar()+
  labs(x = "Flee", 
       y = "Frequency", 
       title = "People Killed by Police Based on How they Fled",
       caption = "Voronyak 2023") + 
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

p 

The majority of people whom were killed did not flee. Looking back the threat level, this would make sense because they were attacking. Also, most of the people who did flee were using a vehicle.

The Distribution of People Killed Based on Age

g)

ggplot(Shoot, aes(x = age)) +
  geom_density(color = "red")+
  labs(title = "Fatalities Based on Age", x="Age", y= "Density") 

ggplot(Shoot, aes(x = age)) +
  geom_histogram(bins = 10)+
  geom_histogram(bins=10,fill=1:10)

It appears that most of the fatalities were in the range of 20 to 40 years of age. It appears that if you are a younger adult you may either pose as more threatening or you are more likely to break the law?

The Distribution of People Killed Based on How They Were Armed

h)

t = table(Shoot$armed)
pt = prop.table(t)
sort <- pt %>%                
  as.data.frame() %>% 
  arrange(desc(Freq))
sort$filtered = ifelse(sort$Freq > 0.05, sort$Var1, "other")
sort[1,3] <- "gun"; sort[2,3] = "knife"; sort[3,3] = "unarmed"
#sort
ggplot(sort, aes(x = "", y = Freq, fill = filtered)) +
  geom_col() +
  coord_polar(theta = "y")+
  labs(x ="", y = "", fill = "Weapon")

The main three weapons that were used were a gun, a knife, and then unarmed. I consolidated all the of the minor weapons into an “other” category, otherwise there were 100+ different categories.

The Top 10 cities that have the most victims

i)

  t = table(Shoot$city)
sort2 <- t %>%                
  as.data.frame() %>% 
  arrange(desc(Freq))
top10 = sort2[1:10,]
ggplot(top10, aes(x = Var1, y = Freq)) + 
  geom_bar(stat = "identity", color = "black", fill = "grey") +
  labs(title = "Frequency by City\n", x = "\nCity", y = "Frequency\n") +
  theme_classic()

The top cities for fatalities by police are Los Angeles, Phoenix, and Houston. LA makes sense because it is a very large city, You would think that New York would be up there.