We will be walking plotting the data of the fatal police shootings using different r packages. We will create different visualizations and summarize our findings. Below is a quick view of the data, note the names of each column and the summary of each variable, followed by the entire data table.
coul1 <- brewer.pal(4, "Set1") #create color palettes
coul2 <- brewer.pal(4, "Set2")
coul3 <- brewer.pal(4, "Set3")
names(Shoot)
## [1] "id" "name"
## [3] "date" "manner_of_death"
## [5] "armed" "age"
## [7] "gender" "race"
## [9] "city" "state"
## [11] "signs_of_mental_illness" "threat_level"
## [13] "flee" "body_camera"
## [15] "longitude" "latitude"
## [17] "is_geocoding_exact"
summary(Shoot)
## id name date manner_of_death
## Min. : 3 Length:8002 Length:8002 Length:8002
## 1st Qu.:2240 Class :character Class :character Class :character
## Median :4446 Mode :character Mode :character Mode :character
## Mean :4415
## 3rd Qu.:6580
## Max. :8696
##
## armed age gender race
## Length:8002 Min. : 2.00 Length:8002 Length:8002
## Class :character 1st Qu.:27.00 Class :character Class :character
## Mode :character Median :35.00 Mode :character Mode :character
## Mean :37.21
## 3rd Qu.:45.00
## Max. :92.00
## NA's :503
## city state signs_of_mental_illness
## Length:8002 Length:8002 Length:8002
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## threat_level flee body_camera longitude
## Length:8002 Length:8002 Length:8002 Min. :-160.01
## Class :character Class :character Class :character 1st Qu.:-112.03
## Mode :character Mode :character Mode :character Median : -94.31
## Mean : -97.04
## 3rd Qu.: -83.15
## Max. : -67.87
## NA's :840
## latitude is_geocoding_exact
## Min. :19.50 Length:8002
## 1st Qu.:33.48 Class :character
## Median :36.10 Mode :character
## Mean :36.68
## 3rd Qu.:40.03
## Max. :71.30
## NA's :840
datatable(Shoot)
#Filter for only 2020 data
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:reshape2':
##
## dcast, melt
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
dt<-setDT(Shoot)[date %between% c('2020-01-01', '2020-12-31')]
head(dt)
## id name date manner_of_death armed
## 1: 5344 Derrick A. Elseth 2020-01-01 shot gun
## 2: 5347 Teddy James Maverick Varner 2020-01-01 shot gun
## 3: 5403 Gerardo Antonio Conchas-Bustas 2020-01-01 shot knife
## 4: 5342 Gabriel Strickland 2020-01-01 shot and Tasered toy weapon
## 5: 5339 Jeffery Dale Millsap 2020-01-02 shot gun
## 6: 5348 Stanley Hayes 2020-01-02 shot gun
## age gender race city state signs_of_mental_illness threat_level
## 1: 24 M W Richmond County VA False other
## 2: 29 M W Central Point OR True attack
## 3: 20 M H Denver CO False attack
## 4: 25 M W Grass Valley CA True attack
## 5: 26 M W Holt MO False attack
## 6: 69 M W Hillsboro OR True attack
## flee body_camera longitude latitude is_geocoding_exact
## 1: Not fleeing False -76.875 38.100 True
## 2: Not fleeing True -122.993 42.418 True
## 3: Not fleeing True -105.030 39.700 True
## 4: Not fleeing False -121.061 39.219 True
## 5: Car False -94.345 39.404 True
## 6: Not fleeing False -122.870 45.521 True
leaflet(data = NULL) %>%
addTiles() %>% # use the default base map which is OpenStreetMap tiles
addMarkers(lng = dt$longitude, # Longitudes
lat = dt$latitude, # Latitudes
popup=dt$name) #name of individual
Shoot[Shoot ==""]<- "Unknown" #replace all blanks with Unknown
p <- ggplot(Shoot, aes(x = race, fill = race)) +
geom_bar()+
labs(x = "Race",
y = "Frequency",
title = "People Killed by Police Based on Race",
caption = "Voronyak 2023") +
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
p
Based on the data given we can see that W, which stands for White. This
should make sense because the majority of the US population is white
(roughly 75%). Next in line in B for black and they are about half of
what W is, but note they are only about 13.6% of the population. This
means the proportion for black Americans being killed is higher than
white Americans. Also, note that H for Hispanic is also very high, but
they only about 3% of the population. Thus the proportion of Hispanic
people being killed by police appears to be the highest. ~Note
these percents were found by a basic Google search
p <- ggplot(Shoot, aes(x = gender, fill = gender)) +
geom_bar()+
labs(x = "Gender",
y = "Frequency",
title = "People Killed by Police Based on Gender",
caption = "Voronyak 2023") +
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
p
It appears the majority of fatalities are males by a large margin.
p <- ggplot(Shoot, aes(x = threat_level, fill = threat_level)) +
geom_bar()+
labs(x = "Threat Level",
y = "Frequency",
title = "People Killed by Police Based on Their Threat Level",
caption = "Voronyak 2023") +
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
p
Most of the killings occurred when the police officer was attacked. The “other” category is quite large and very nondescript, thus no good conclusions.
p <- ggplot(Shoot, aes(x = flee, fill = flee)) +
geom_bar()+
labs(x = "Flee",
y = "Frequency",
title = "People Killed by Police Based on How they Fled",
caption = "Voronyak 2023") +
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
p
p <- ggplot(Shoot, aes(x = flee, fill = threat_level)) +
geom_bar()+
labs(x = "Flee",
y = "Frequency",
title = "People Killed by Police Based on How they Fled",
caption = "Voronyak 2023") +
theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
p
The majority of people whom were killed did not flee. Looking back the threat level, this would make sense because they were attacking. Also, most of the people who did flee were using a vehicle.
ggplot(Shoot, aes(x = age)) +
geom_density(color = "red")+
labs(title = "Fatalities Based on Age", x="Age", y= "Density")
ggplot(Shoot, aes(x = age)) +
geom_histogram(bins = 10)+
geom_histogram(bins=10,fill=1:10)
It appears that most of the fatalities were in the range of 20 to 40 years of age. It appears that if you are a younger adult you may either pose as more threatening or you are more likely to break the law?
t = table(Shoot$armed)
pt = prop.table(t)
sort <- pt %>%
as.data.frame() %>%
arrange(desc(Freq))
sort$filtered = ifelse(sort$Freq > 0.05, sort$Var1, "other")
sort[1,3] <- "gun"; sort[2,3] = "knife"; sort[3,3] = "unarmed"
#sort
ggplot(sort, aes(x = "", y = Freq, fill = filtered)) +
geom_col() +
coord_polar(theta = "y")+
labs(x ="", y = "", fill = "Weapon")
The main three weapons that were used were a gun, a knife, and then unarmed. I consolidated all the of the minor weapons into an “other” category, otherwise there were 100+ different categories.
t = table(Shoot$city)
sort2 <- t %>%
as.data.frame() %>%
arrange(desc(Freq))
top10 = sort2[1:10,]
ggplot(top10, aes(x = Var1, y = Freq)) +
geom_bar(stat = "identity", color = "black", fill = "grey") +
labs(title = "Frequency by City\n", x = "\nCity", y = "Frequency\n") +
theme_classic()
The top cities for fatalities by police are Los Angeles, Phoenix, and Houston. LA makes sense because it is a very large city, You would think that New York would be up there.