Importing Data

library(readr)
data <- read_csv("fatal_encounters.csv", show_col_types = FALSE)
## New names:
## • `` -> `...33`
## • `` -> `...34`
## Warning: One or more parsing issues, see `problems()` for details
View(data)
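The parsing warning above points to `problems()`, which lists the cells readr could not convert. A minimal sketch of how that looks, using a small inline CSV invented for illustration (the column names `id` and `age` are assumptions, not columns of fatal_encounters.csv):

```r
library(readr)

# Inline CSV invented for illustration; the "x" cannot be parsed as an integer
toy <- read_csv(I("id,age\n1,28\n2,x\n"), col_types = "ii")

# One row per cell that failed to parse (row, column, expected, actual)
issues <- problems(toy)
print(issues)
```

Running `problems(data)` on the real import should show in the same way which rows triggered the warning, so they can be checked before plotting.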

Visualization 1 - Visualizing Age of Victims

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(tmap)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.1.8     ✔ stringr 1.4.1
## ✔ tidyr   1.2.0     ✔ forcats 0.5.2
## ✔ purrr   0.3.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Count fatalities for each recorded age
df1 <- data %>%
  group_by(Age) %>%
  summarise(total_count = n(), .groups = "drop")

# Assign each age to a 5-year bin
df2 <- df1 %>%
  mutate(Age_Range = cut(Age, breaks = seq(0, 100, by = 5)))

# Bar chart of counts by age bin, with fill mapped to age
age_plot <- ggplot(df2, aes(x = Age_Range, y = total_count, fill = Age)) +
  geom_col(position = position_dodge()) +
  labs(title = "Age Range of Police Fatalities in the U.S. (1999-2021)",
       x = "Age Range", y = "Number of Fatalities") +
  theme_minimal() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))

age_plot
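The code above counts fatalities per individual age and only then assigns each age to a bin. An alternative, sketched here on toy ages invented for illustration, is to bin first and count once per bin, so that each bar is the total for its 5-year range. The choice of `right = FALSE` (so 25 falls in `[25,30)`) is an assumption, not something the original code specifies.

```r
library(dplyr)

# Toy ages, invented for illustration (not taken from the dataset)
toy <- tibble(Age = c(3, 17, 21, 22, 24, 25, 38, NA))

binned <- toy %>%
  filter(!is.na(Age)) %>%                                   # drop unknown ages
  mutate(Age_Range = cut(Age, breaks = seq(0, 100, by = 5),
                         right = FALSE)) %>%                # bins like [20,25)
  count(Age_Range, name = "total_count")                    # one row per bin

print(binned)
```

A table like `binned` can be passed to `geom_col()` directly, without a fill aesthetic or dodging.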

Visualization 2 - Visualizing the Top 5 Case Statuses

colnames(data)[29] <- "Conclusion"
data_coords <- data[which(!is.na(data$Latitude) & !is.na(data$Longitude)),]
data_sf <- st_as_sf(data_coords, coords = c("Longitude", "Latitude"), crs = 4326)


# Count cases per status and keep the five most common
df3 <- data_sf %>%
  group_by(Conclusion) %>%
  summarise(total_count = n(), .groups = "drop") %>%
  arrange(desc(total_count))
df4 <- df3[1:5, ]

# Pie chart: one stacked bar wrapped into polar coordinates
ggplot(df4, aes(x = "", y = total_count, fill = Conclusion)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  xlab(NULL) +
  guides(fill = guide_legend(title = "Status")) +
  labs(title = "Status of Cases from 1999 - 2021", y = "Total Count") +
  scale_fill_brewer(palette = "PiYG")
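The top-five selection can also be written with `count()` and `slice_max()`, which makes it easy to add each category's share of the selected cases. This is a minimal sketch on toy statuses invented for illustration; the status labels and the `share` column are assumptions, not values from the dataset.

```r
library(dplyr)

# Toy case statuses, invented for illustration (not values from the dataset)
toy <- tibble(Conclusion = c("Pending", "Pending", "Pending",
                             "Justified", "Justified",
                             "Unreported", "Criminal"))

top2 <- toy %>%
  count(Conclusion, name = "total_count") %>%          # cases per status
  slice_max(total_count, n = 2, with_ties = FALSE) %>% # keep the 2 largest
  mutate(share = total_count / sum(total_count))       # share within the kept rows

print(top2)
```

Because `data_sf` is an sf object, `summarise()` on it also combines the geometries of each group. If the geometry is not needed for the chart, counting on the plain `data` tibble may be quicker.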

Visualization 3 - Visualizing Race of Victims

data_coords <- data[which(!is.na(data$Latitude) & !is.na(data$Longitude)),]
data_sf <- st_as_sf(data_coords, coords = c("Longitude", "Latitude"), crs = 4326)
data_race<- data_sf %>% subset(Race!="Race unspecified" )

data_race$Race[data_race$Race == 'european-American/White'] <- 'European-American/White'
data_race$Race[data_race$Race == 'European-American/European-American/White'] <- 'European-American/White'
data_race<-data_race[!(data_race$Race=="Christopher Anthony Alexander" | data_race$Race=="African-American/Black African-American/Black Not imputed"),]

st_sf(data_sf)
## Simple feature collection with 31496 features and 34 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -165.5919 ymin: 19.03468 xmax: -67.26603 ymax: 71.30125
## Geodetic CRS:  WGS 84
## # A tibble: 31,496 × 35
##    `Unique ID` Name     Age Gender Race  Race …¹ Imput…² URL o…³ Date …⁴ Locat…⁵
##  *       <dbl> <chr>  <dbl> <chr>  <chr> <chr>   <chr>   <chr>   <chr>   <chr>  
##  1       31495 Ashle…    28 Female Afri… Africa… Not im… https:… 12/31/… South …
##  2       31496 Name …    NA Female Race… <NA>    <NA>    <NA>    12/31/… 1500 2…
##  3       31497 Name …    NA Male   Race… <NA>    <NA>    <NA>    12/31/… 1500 2…
##  4       31491 Johnn…    36 Male   Race… <NA>    <NA>    <NA>    12/30/… Martin…
##  5       31492 Denni…    44 Male   Euro… <NA>    <NA>    <NA>    12/30/… 435 E …
##  6       31493 Ny'Da…    21 Male   Race… <NA>    <NA>    <NA>    12/30/… State …
##  7       31494 Timot…    50 Male   Euro… Europe… Not im… https:… 12/30/… Sykes …
##  8       31409 Name …    NA Male   Hisp… Hispan… Not im… <NA>    12/29/… Carneg…
##  9       31410 Name …    NA Female Hisp… Hispan… Not im… <NA>    12/29/… Carneg…
## 10       31465 Chris…    49 <NA>   Afri… Africa… Not im… https:… 12/29/… 1521 B…
## # … with 31,486 more rows, 25 more variables: `Location of death (city)` <chr>,
## #   State <chr>, `Location of death (zip code)` <dbl>,
## #   `Location of death (county)` <chr>, `Full Address` <chr>,
## #   `Agency or agencies involved` <chr>, `Highest level of force` <chr>,
## #   `UID Temporary` <dbl>, `Name Temporary` <chr>, `Armed/Unarmed` <chr>,
## #   `Alleged weapon` <chr>, `Aggressive physical movement` <chr>,
## #   `Fleeing/Not fleeing` <chr>, `Description Temp` <chr>, `URL Temp` <chr>, …
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(data_race)+ tm_dots(col = "Race")
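The label clean-up before the map repairs each misspelled value with its own assignment. The same idea can be sketched with a single lookup table, shown here on a toy vector that reuses the label strings from the code above; the `fixes` name is hypothetical.

```r
# Toy vector reusing the inconsistent labels seen in the clean-up code above
race <- c("european-American/White",
          "European-American/European-American/White",
          "Hispanic/Latino",
          "Race unspecified")

# Hypothetical lookup: bad label -> corrected label
fixes <- c(
  "european-American/White"                   = "European-American/White",
  "European-American/European-American/White" = "European-American/White"
)

# Replace known bad labels, keep every other value unchanged
cleaned <- ifelse(race %in% names(fixes), fixes[race], race)

# Drop records without a usable race label
cleaned <- cleaned[cleaned != "Race unspecified"]
print(unique(cleaned))
```

Keeping the corrections in one named vector means a newly discovered misspelling only needs one extra entry.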

Visualization 4 - Fatalities Over Time

colnames(data)[9] <- "date"

# Dates are stored as month/day/year (e.g. 12/31/...), so name the parts in that order
df5 <- data %>%
  separate(date, c("month", "day", "year"), "/")

df6<-df5 %>% group_by(year) %>% summarise(total_count=n(),.groups = 'drop')

ggplot(data = df6, aes(x = year, y = total_count, group = 1)) +
  geom_line(color = "#AA336A") +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, col = "purple") +
  labs(title = "Number of Fatalities per Year (1999-2021)",
       x = "Year", y = "Number of Fatalities") +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))
## `geom_smooth()` using formula 'y ~ x'
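Splitting the date string leaves `year` as text, which is why the plot uses a discrete x axis. A minimal base R sketch of the alternative, parsing the strings as dates and extracting a numeric year, is shown below. The toy dates are invented for illustration, and the `%m/%d/%Y` format (four-digit years) is an assumption about the file.

```r
# Toy dates invented for illustration; the format is assumed to be month/day/year
dates <- c("12/31/2021", "01/15/2020", "07/04/2020")

parsed <- as.Date(dates, format = "%m/%d/%Y")
year <- as.integer(format(parsed, "%Y"))   # numeric year, e.g. 2020

# Count fatalities per year
counts <- aggregate(n ~ year, data = data.frame(year = year, n = 1L), FUN = sum)
print(counts)
```

With a numeric `year`, the trend line from `geom_smooth(method = "lm")` is fitted against actual year values, and the x axis would use a continuous scale rather than `scale_x_discrete()`.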

I wanted to give a general overview of the fatalities from 1999 to 2021, starting with the age of victims. The highest number of fatalities occurs in the 20-25 age range. However, it is still very concerning how high the 15-20 age range is, at about 750. After the 20-25 age range there is a decline; however, fatalities stay above 500 until the 45-50 age range. Losing so many people in the prime of their lives is devastating. Visualizing the age of victims puts into perspective that this could happen to anyone.

Following this, I wanted to find the status of these cases. The majority of them are either unreported or still pending investigation. The third largest outcome is that the fatality was justified, and the fourth largest is that the death was marked as a suicide. The last and smallest outcome is that the fatality was charged as a crime. This pie chart shows that the probability of a police killing being attributed to the officer is very low, and that it is far more likely that the death is completely overlooked or incorrectly categorized as a suicide.

The third visualization is a map depicting the race of victims across the United States. Despite the fact that African American and Hispanic individuals make up roughly 30 percent of the population, they are very prominent on the map. The map also has stories within itself. For example, in Texas there is a higher number of Hispanic/Latino fatalities. This may be because Texas borders Mexico, and it is possible that many of these deaths are caused by ICE or other agencies targeting that specific group.

Lastly, I think it is important to acknowledge that this problem is only getting worse, and the final plot is the perfect way to do so. Over the period shown, the number of fatalities has increased almost consistently. In visualization two we found that the majority of these cases are still pending investigation or completely overlooked. With the majority of the victims being so young, and many being within minority communities, it is important to use data like this to implement policy measures that protect people from police violence.