Data 110-FinalProject_Police Fatalaties

The Data

In 2015, The Washington Post began logging every fatal shooting by an on-duty police officer in the United States. Since then, The Post has recorded more than 5,400 such shootings. The Post's data relies primarily on news accounts, social media postings and police reports. This notebook uses the most current dataset, taken directly from The Post. The data is tabular and consists mainly of categorical attributes.

Since 2015, The Washington Post has recorded dozens of details about each killing:

id: unique id of the person
name: first, middle, and last name of the person
date: the date of the fatal shooting
manner_of_death: whether the person was shot, or shot and Tasered
armed: whether the person killed was armed
age: the person's age
gender: Male or Female
race: the race of the deceased
city: the city where the shooting took place
state: the state where the shooting took place
threat_level: the circumstances of the shooting
signs_of_mental_illness: whether the victim was experiencing a mental-health crisis
flee: whether the person was fleeing (Foot, Car, Not Fleeing, Other)
body_camera: whether the shooting was recorded by a body camera

The data comes from the GitHub repository maintained by The Washington Post: https://github.com/washingtonpost/data-police-shootings

Article: https://www.washingtonpost.com/graphics/investigations/police-shootings-database/

Loading libraries

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(plotly)

## Loading required package: ggplot2

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(ggplot2)
library(ggthemes)
library(scales)
library(knitr)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ tibble  3.0.6     ✓ purrr   0.3.4
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x lubridate::as.difftime() masks base::as.difftime()
## x readr::col_factor()      masks scales::col_factor()
## x lubridate::date()        masks base::date()
## x purrr::discard()         masks scales::discard()
## x plotly::filter()         masks dplyr::filter(), stats::filter()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()

library(highcharter)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(viridisLite)
library(xts)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

library(viridis)

## 
## Attaching package: 'viridis'

## The following object is masked from 'package:scales':
## 
##     viridis_pal

library(rvest)

## 
## Attaching package: 'rvest'

## The following object is masked from 'package:readr':
## 
##     guess_encoding

library(ggforce)
library(ggmap)

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.

## Please cite ggmap if you use it! See citation("ggmap") for details.

## 
## Attaching package: 'ggmap'

## The following object is masked from 'package:plotly':
## 
##     wind

library(usmap)
library(waffle)
library(RColorBrewer)
library(lubridate)

Reading the CSV

setwd("~/Desktop/Pankti _ Data Science")
Police_Shooting_data <- read_csv("~/Desktop/Pankti _ Data Science/fatal-police-shootings-data 2015-2021.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   ID = col_double(),
##   Name = col_character(),
##   Date = col_character(),
##   Manner_of_Death = col_character(),
##   Armed = col_character(),
##   Age = col_double(),
##   Gender = col_character(),
##   Race = col_character(),
##   City = col_character(),
##   `State ID` = col_character(),
##   Signs_of_Mental_Illness = col_logical(),
##   Threat_Level = col_character(),
##   Flee = col_character(),
##   Body_Camera = col_logical(),
##   Longitude = col_double(),
##   Latitude = col_double(),
##   Is_geocoding_exact = col_logical(),
##   State = col_character()
## )

Manipulating the data

# Parse the date strings (month/day/two-digit year) into POSIXct date-times
Police_Shooting_data$Date <- as.POSIXct(Police_Shooting_data$Date, format = "%m/%d/%y")
Police_Shooting_data$year <- year(Police_Shooting_data$Date) # Year of the shooting
Police_Shooting_data$day <- as.factor(wday(Police_Shooting_data$Date)) # Day of the week (1 = Sunday by default) as a factor
Police_Shooting_data$month <- factor(month(Police_Shooting_data$Date)) # Month as a factor
# Collapse Armed into two groups; "undetermined" and "unknown weapon" are counted as "unarmed", and a missing Armed stays NA
Police_Shooting_data$armed_logic <- ifelse(Police_Shooting_data$Armed == "unarmed" | Police_Shooting_data$Armed == "undetermined" | Police_Shooting_data$Armed == "unknown weapon", "unarmed", "armed")
Police_Shooting_data$armed_logic <- as.factor(Police_Shooting_data$armed_logic) # Converting to factor
Police_Shooting_data$Race[Police_Shooting_data$Race == "W"] <- "White" #Generating the Race Desc
Police_Shooting_data$Race[Police_Shooting_data$Race == "B"] <- "Black"
Police_Shooting_data$Race[Police_Shooting_data$Race == "H"] <- "Hispanic/Latino"
Police_Shooting_data$Race[Police_Shooting_data$Race == "O"] <- "Other"
Police_Shooting_data$Race[Police_Shooting_data$Race == "A"] <- "Asian/Pacific Islander"
Police_Shooting_data$Race[Police_Shooting_data$Race == "N"] <- "Native American"
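
The six assignments above can also be written as a single lookup. The block below is a minimal sketch on a made-up toy vector, not the notebook's data; the names race_codes, race_labels and race_desc are hypothetical and only the code-to-description mapping is taken from the lines above.

```r
# Hypothetical toy vector standing in for Police_Shooting_data$Race
race_codes <- c("W", "B", "H", NA, "A", "N", "O", "W")

# Named lookup vector: race code -> description (same mapping as above)
race_labels <- c(
  W = "White",
  B = "Black",
  H = "Hispanic/Latino",
  O = "Other",
  A = "Asian/Pacific Islander",
  N = "Native American"
)

# Index the lookup by code; a missing code stays NA
race_desc <- unname(race_labels[race_codes])
print(race_desc)
```

A lookup vector keeps the whole mapping in one place, and a missing race stays NA instead of being silently relabelled.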

Replacing the missing values

# Now let's replace the missing values with "Unrecorded" in our dataset.
# read_csv() keeps these columns as character vectors and reads empty fields as NA.

Police_Shooting_data$Gender[is.na(Police_Shooting_data$Gender)] <- "Unrecorded"
Police_Shooting_data$Flee[is.na(Police_Shooting_data$Flee)] <- "Unrecorded"
Police_Shooting_data$Armed[is.na(Police_Shooting_data$Armed)] <- "Unrecorded"


# Filling the missing ages with the mean age

table(is.na(Police_Shooting_data$Age)) # --> 280 missing values

## 
## FALSE  TRUE 
##  5961   280

# The imputed values go into a separate column, age, so Age keeps its NAs for the plots below
Police_Shooting_data$age <- ifelse(is.na(Police_Shooting_data$Age), round(mean(Police_Shooting_data$Age, na.rm = TRUE)), Police_Shooting_data$Age)

Now let's start exploring the dataset for the analysis:

First, I plot a time series to give an overall idea of the number of shootings over the years. I like this tool because it can show the cumulative values as well as the values for individual years, depending on the date range selected.

Plotting Time series for police shooting

Police_Shooting_data$Date <- as.Date(Police_Shooting_data$Date) # Date is already POSIXct, so no format string is needed

Police_Shooting_data %>% 
  group_by(Date) %>% 
  summarise(Total = n())-> by_Date 

tseries <- xts(by_Date$Total, order.by=as.POSIXct(by_Date$Date))

Time series plot of deaths from police shootings (using highcharter)

hchart(tseries, name = "Police Shootings") %>%
  hc_add_theme(hc_theme_538()) %>%
  hc_credits(enabled = TRUE, text = "Data Source : Washington Post", style = list(fontSize = "13px")) %>%
  hc_title(text = "Time series plot: Deaths by police shooting in the USA, 2015-2021") %>%
  hc_legend(enabled = TRUE)

I created this time series to show how the number of shootings varied between 2015 and 2021.

In 2015, the most shootings happened in July and December.
In 2016, the most shootings happened in January and December (strangely, the start and the end of the year).
In 2017, the most shootings happened in January, February, July and December.
In 2018, the most shootings happened in January, February, April and July (the numbers fall after July).
In 2019, the most shootings happened in January and June.
In 2020, the most shootings happened in May and July (completely different months).
In 2021, the most shootings happened in February.

Outcome - Across the years, January is the month that most often has the highest number of shootings.
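
A claim like this can be checked by counting shootings per month within each year and keeping the month with the largest count. The block below is a sketch on made-up toy dates, not the notebook's data; dates, monthly and peak are hypothetical names, and on the real data the dates vector would be Police_Shooting_data$Date.

```r
# Hypothetical toy dates standing in for Police_Shooting_data$Date
dates <- as.Date(c(
  "2015-01-03", "2015-01-20", "2015-07-04",
  "2016-01-11", "2016-01-15", "2016-01-31",
  "2016-12-25", "2016-12-30"
))

# Count shootings for every year/month combination
monthly <- as.data.frame(
  table(year = format(dates, "%Y"), month = format(dates, "%m")),
  stringsAsFactors = FALSE
)

# For each year, keep the row of the month with the largest count
peak <- do.call(
  rbind,
  lapply(split(monthly, monthly$year), function(d) d[which.max(d$Freq), ])
)
print(peak)
```

On these toy dates, January is the peak month in both years. Note that which.max() returns only the first month when two months tie.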

Now let's check the rate of casualties based on race - Percentage of casualties by race

RaceCasualties<-as.data.frame(table(Police_Shooting_data$Race)) %>%arrange(Freq)
names(RaceCasualties)<-c('Race','Freq')

RaceCasualties <- RaceCasualties %>% mutate(Race=factor(Race,levels=rev(Race))) 

options(repr.plot.width = 14, repr.plot.height = 8)

rc <- ggplot(RaceCasualties, aes(x = Race, y = Freq / sum(Freq), fill = Race)) +
  geom_bar(stat = "identity", color = "black") +
  scale_y_continuous(labels = percent) +
  labs(y = "Percentage of casualties", x = "Race", title = "Percentage of casualties by race", caption = "Data from Washington Post") +
  geom_text(aes(label = percent(Freq / sum(Freq))), vjust = -.5) +
  theme_bw() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 0.5, vjust = 0.5),
        axis.text = element_text(face = "bold"),
        plot.title = element_text(size = 18, face = "bold"),
        axis.title = element_text(face = "bold", size = 14)) +
  scale_fill_brewer(palette = "Accent")
rc

fig<-ggplotly(rc)
fig

The statistics show that more than 50% of the casualties were White. The second group is Black victims, with more than 25% of the casualties. Here we need to keep in mind that White people make up 60% of the US population and Black people 13.4%. I gathered these population figures from https://www.census.gov/quickfacts/fact/table/US/PST045219. The third group is Hispanic/Latino, which makes up only 18% of the population.

Now let's check the age of the victims
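
To make the comparison with population shares concrete, each group's share of casualties can be divided by its share of the population. The block below is only an illustration: it uses the rounded lower bounds quoted in the text (50% and 25%) rather than exact values from the data, and casualty_share, population_share and representation are hypothetical names.

```r
# Shares of casualties, rounded down from the "50%+" and "25%+" quoted in the text
casualty_share <- c(White = 0.50, Black = 0.25)

# Shares of the US population quoted in the text (Census QuickFacts)
population_share <- c(White = 0.60, Black = 0.134)

# A value above 1 means the group is over-represented among the casualties
representation <- casualty_share / population_share
print(round(representation, 2))
# White: 0.83, Black: 1.87
```

Under these rounded inputs, White victims are under-represented relative to their population share and Black victims are over-represented by a factor of a little under two.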

Graph - Total Victims Killed at The Age of

us_shootings_age_group = Police_Shooting_data %>% 
 group_by(Age) %>% 
tally(name = "Total_Count")


age <- ggplot(data = us_shootings_age_group, aes(x = Age, y = Total_Count)) +
  geom_line(color = "darkred") +
  geom_point(color = "black") +
  geom_area(fill = "skyblue", alpha = 0.5) +
  labs(title = "Total Victims Killed at The Age of : ",
       y = "Count of Shootings", x = "Age of Victim", caption = "Data from Washington Post") +
  theme_bw() + # apply the base theme first so the settings below are not overwritten
  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
        plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
        axis.title = element_text(face = "bold", size = 14),
        axis.text = element_text(face = "bold", size = 10),
        legend.title = element_text(face = "bold", size = 14),
        legend.text = element_text(face = "bold", size = 10))
age

## Warning: Removed 1 rows containing missing values (position_stack).

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

Graph - Total Victims Killed at The Age of (Hover for values)

fig<-ggplotly(age)

## Warning: Removed 1 rows containing missing values (position_stack).

fig

The highest number of victims were in the 25-35 age group.

Graph - Total Victims Killed at The Age of? in relation to Race

density1 <- ggplot(data = Police_Shooting_data, aes(x = Age, fill = Race)) +
  geom_density(position = 'stack') +
  theme_bw() +
  labs(y = 'Density', x = "Age", title = "Age and race in relation to the casualties", caption = "Data from Washington Post") + # the x axis shows Age, not Race
  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
        plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
        axis.title = element_text(face = "bold", size = 14),
        axis.text = element_text(face = "bold", size = 10),
        legend.title = element_text(face = "bold", size = 14),
        legend.text = element_text(face = "bold", size = 10)) +
  scale_fill_brewer(palette = "Accent")
density1

## Warning: Removed 280 rows containing non-finite values (stat_density).

Graph - Total Victims Killed at The Age of? in relation to Race (Hover for values)

fig<-ggplotly(density1)

## Warning: Removed 280 rows containing non-finite values (stat_density).

fig

Casualties are concentrated in the 25-35 age range for every race except White.

Map of the total number of shootings per state, to find the state with the most shootings

#all_states <- map_data("state")
#all_states <- merge(all_states,Mapper, by = "region")

#PS_States<-as.data.frame(table(Police_Shooting_data$State))
#names(PS_States)<-c('state','occurences')


#plot_usmap(data=PS_States,values='occurences',color='red')+scale_fill_continuous(low="white",high="red",name="Police Shootings (2015-2021)",labels=comma) +theme(legend.position = 'right')

I tried to create a state map showing the number of shootings per state, but the code did not work, so I commented it out.
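
One possible way to get such a map working is sketched below. It is an assumption-laden sketch, not the notebook's tested code: it uses a made-up toy vector of state abbreviations (on the real data this would be the State ID column), it assumes that plot_usmap() finds the states through a column named state, and ps_states and state_map are hypothetical names. The plotting step only runs if the usmap and ggplot2 packages are installed.

```r
# Hypothetical toy vector standing in for Police_Shooting_data$`State ID`
state_ids <- c("CA", "CA", "TX", "AZ", "CA", "TX")

# Count shootings per state; the state column must be called "state" for plot_usmap()
ps_states <- as.data.frame(table(state_ids), stringsAsFactors = FALSE)
names(ps_states) <- c("state", "occurrences")
print(ps_states)

# Draw the map only when the plotting packages are available
if (requireNamespace("usmap", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  state_map <- usmap::plot_usmap(data = ps_states, values = "occurrences", color = "red") +
    ggplot2::scale_fill_continuous(low = "white", high = "red",
                                   name = "Police Shootings (2015-2021)") +
    ggplot2::theme(legend.position = "right")
}
```

Compared with the commented-out chunk, this sketch drops the merge with Mapper, an object that is never defined in the notebook.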

Graph of Top 20 cities with highest shooting rate

Police_Shooting_data$CityState<-paste(Police_Shooting_data$City,Police_Shooting_data$State,sep='-')
Shootings_per_Cities<-as.data.frame(table(Police_Shooting_data$CityState)) %>%arrange(-Freq)%>% head(20)

Shootings_per_Cities$City<-sapply(str_split(Shootings_per_Cities$Var1,'-'), `[[`, 1)
Shootings_per_Cities$State<-sapply(str_split(Shootings_per_Cities$Var1,'-'), `[[`, 2)
Shootings_per_Cities$Var1<-NULL
Shootings_per_Cities<-Shootings_per_Cities %>%mutate(City=factor(City,levels=City))

City <- ggplot(data=Shootings_per_Cities,aes(x=City,y=Freq,fill=State))+geom_bar(stat='identity',color="black")+theme_bw()+theme(axis.text.x=element_text(angle=45,hjust=1))+labs(title="Top 20 cities with highest shooting rate",caption="Data from Washington Post")+  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
                 plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
                 axis.title = element_text(face = "bold", size = 14),
                 axis.text = element_text(face = "bold", size = 10),
                 axis.text.x=element_text(angle=45,hjust=0.5,vjust=0.5),legend.position = "none")


City
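
One caveat about the city-state key built above: pasting City and State with '-' and splitting on '-' again breaks for city names that themselves contain a hyphen. The block below is a sketch on a made-up toy data frame showing the problem and one way around it, counting on the two columns directly; toy, pieces and per_city are hypothetical names.

```r
# Hypothetical toy rows standing in for the City and State columns
toy <- data.frame(
  City  = c("Winston-Salem", "Winston-Salem", "Los Angeles"),
  State = c("North Carolina", "North Carolina", "California")
)

# Splitting the pasted key on "-" cuts the hyphenated city name apart
pieces <- strsplit(paste("Winston-Salem", "North Carolina", sep = "-"), "-")[[1]]
print(pieces[2]) # "Salem", not the state

# Counting on both columns avoids pasting and splitting altogether
per_city <- aggregate(n ~ City + State, data = transform(toy, n = 1L), FUN = sum)
per_city <- per_city[order(-per_city$n), ]
print(per_city)
```

With this approach, City and State stay in separate columns, so no string splitting is needed before plotting.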

Graph of Top 20 cities with highest shooting rate (hover for values)

fig<-ggplotly(City)
fig

The metro area population of Atlanta in 2021 is 5,911,000 and that of Los Angeles is 12,459,000. This suggests that the size of a city, along with its economy and joblessness, may contribute to these events. A lack of jobs can be very frustrating, especially for young people who want to start families and a self-sufficient life.

Graph of Breakdown of top 20 cities By Presence of Body Camera on Duty

bodycam <- Police_Shooting_data %>% group_by(City) %>% summarise(n=n()) %>% arrange(desc(n))
names <- bodycam$City[1:20]

p3<- Police_Shooting_data %>% group_by(Body_Camera,City) %>% summarise(n=n()) %>% filter(City %in% names)%>%
  ggplot(aes(x=City, y=n, fill=Body_Camera)) +
geom_bar(stat="identity", position="dodge",color="black") + labs(title="Breakdown of top 20 cities  By Presence of Body Camera on Duty ",caption="Data from Washington Post",y="Freq",x="City")+theme_bw()+  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
                 plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
                 axis.title = element_text(face = "bold", size = 14),
                 axis.text = element_text(face = "bold", size = 10),
                 legend.title = element_text(face = "bold", size = 14),
                 legend.text = element_text(face = "bold", size = 10),axis.text.x=element_text(angle=45,hjust=0.5,vjust=0.5))+ scale_fill_brewer( palette = "Accent")

## `summarise()` has grouped output by 'Body_Camera'. You can override using the `.groups` argument.

p3

Graph of Breakdown of top 20 cities By Presence of Body Camera on Duty (hover for values)

fig<-ggplotly(p3)
 fig

The most shootings happened in LA with body cameras not turned on, followed by Phoenix and then Houston, also with body cameras not turned on.

Graph of Breakdown of top 20 cities By Gender

gender <- Police_Shooting_data %>% group_by(City) %>% dplyr::summarise(n=n()) %>% arrange(desc(n))
names <- gender$City[1:20]

G<-Police_Shooting_data %>% group_by(Gender,City) %>% summarise(n=n()) %>% filter(City %in% names)%>%

ggplot(aes(x=City, y=n, fill=Gender)) +
geom_bar(stat="identity", position="dodge",color="black")+theme(plot.title=element_text(size=18),axis.text.x = element_text(angle=90, vjust=1))+ labs(title="Breakdown of top 20 cities  By Gender of Victim ",caption="Data from Washington Post",y="Freq",x="City")+theme_bw()+ scale_fill_brewer( palette = "Accent")+ theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
                 plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
                 axis.title = element_text(face = "bold", size = 14),
                 axis.text = element_text(face = "bold", size = 10),
                 legend.title = element_text(face = "bold", size = 14),
                 legend.text = element_text(face = "bold", size = 10),axis.text.x=element_text(angle=45,hjust=0.5,vjust=0.5))

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

Graph of Breakdown of top 20 cities By Gender (hover for values)

fig<-ggplotly(G)
fig

The most shootings happened in LA, followed by Phoenix and then Houston; in all three, the majority of victims were male.

Let's check whether the victims were armed and what kind of weapon they had. This will help us see whether only armed people were shot or unarmed people as well.

by_armed_gender <- Police_Shooting_data %>%
                  group_by(Armed, Gender) %>% filter(Armed!="NA") %>%
                        summarise(Total = n()) %>%
                        arrange(desc(Total)) %>% 
                                        head(20)

## `summarise()` has grouped output by 'Armed'. You can override using the `.groups` argument.

p2 <- ggplot(by_armed_gender, aes(reorder(Armed, Total),Total, fill = Gender)) + 
            geom_bar( stat = "identity",color="black") +
                  ggtitle("Type of weapon victim had in reference to Gender") + theme_bw()  + labs(x="Weapons")+scale_fill_brewer( palette = "Accent")+  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
                 plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
                 axis.title = element_text(face = "bold", size = 14),
                 axis.text = element_text(face = "bold", size = 10),
                 legend.title = element_text(face = "bold", size = 14),
                 legend.text = element_text(face = "bold", size = 10),axis.text.x=element_text(angle=90,hjust=0.5,vjust=0.5))
p2

Let's check whether the victims were armed and what kind of weapon they had. This will help us see whether only armed people were shot or unarmed people as well (hover for values).

fig<-ggplotly(p2)
fig

The chart shows that the vast majority of casualties were allegedly armed with guns, followed by people with knives. However, there is also a group of unarmed victims, which suggests that the police were at times too aggressive. The majority of the victims were male.

Let's find the top weapons carried by victims, based on gender and age

shoot_gender_armed_age = Police_Shooting_data %>% 
  group_by(Gender, Armed, Age) %>% 
 tally(name = "Total_Count") %>% 
  arrange(desc(Age))%>% head(50)

ggplot(data = shoot_gender_armed_age, aes(x = reorder(Armed, Total_Count), y = Age, fill = Gender)) +
  geom_bar(stat = "identity", color = "black", position = "dodge", width = 0.8) +
  coord_flip() +
  labs(title = "Distribution of Armed Weapons Carried by Male & Female Victims At the Age Of:",
                x = "Armed Weapon of Victim", y = "Age of Victim", fill = "Gender of Victim:",
                subtitle = "Weapons are in Decreasing Order of Total Killings (Max.- Min.)") +
scale_fill_brewer( palette = "Accent") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
                 plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
                 axis.title = element_text(face = "bold", size = 14),
                 axis.text = element_text(face = "bold", size = 10),
                 legend.title = element_text(face = "bold", size = 14),
                 legend.text = element_text(face = "bold", size = 10))

Most males above the age of 25 had a gun as their weapon, and so did females in the same age group. Knives come second and were carried by males only. Here we also see that the largest number of female victims of police shootings were unarmed.

Let's see the percentage of victims with signs of mental illness

MentalIllness<-data.frame(table(Police_Shooting_data$Signs_of_Mental_Illness))%>%arrange(Freq)%>%mutate(Var1=factor(Var1,levels=rev(Var1)),Cumulative=cumsum(Freq),Midpoint=Cumulative-Freq/2)
names(MentalIllness)[1]<-'Signs_of_Mental_Illness'


ggplot(data = MentalIllness, aes(x = "", y = Freq / sum(Freq), fill = Signs_of_Mental_Illness)) +
  geom_bar(stat = "identity", position = "stack", color = "black") +
  coord_polar("y") +
  scale_y_continuous(labels = percent) +
  ggtitle("Pie chart of casualties having signs of mental illness") +
  geom_text(aes(label = percent(Freq / sum(Freq))), position = position_stack(vjust = 0.5)) + # center each label in its slice
  theme_void() +
  scale_fill_brewer(palette = 'Accent', name = 'Signs of mental illness', labels = c('No', 'Yes')) +
  theme(plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0.40),
                 plot.subtitle = element_text(face = "italic", size = rel(1), hjust = 0.6),
                 axis.title = element_text(face = "bold", size = 14),
                 axis.text = element_text(face = "bold", size = 10),
                 legend.title = element_text(face = "bold", size = 14),
                 legend.text = element_text(face = "bold", size = 10))

According to the statistics, only 23% of the victims showed signs of mental illness, so it is not a crucial factor in explaining the increasing rate of shootings.

After looking at these variables, I am now interested to see whether names have any relationship to the casualties over the years

library(tm)

## Loading required package: NLP

## 
## Attaching package: 'NLP'

## The following object is masked from 'package:ggplot2':
## 
##     annotate

library(tidytext)
library(wordcloud)


Police_Shooting_data %>%
  unnest_tokens(word, Name) %>%
  filter(!word %in% stop_words$word) %>% filter(word!="tk") %>%
  count(word,sort = TRUE) %>%
  ungroup() %>%
  mutate(word = factor(word, levels = rev(unique(word)))) %>%
  head(50) %>%
  
  with(wordcloud(word, n, max.words = 100,colors=brewer.pal(3, "Dark2")))

The most common name is Michael, followed by James and David. There are some Hispanic names as well, like Garcia, Antonia and Hernandez. It is difficult to infer race from the names, with the exception of Hispanic names.

Final Graph based on the shooting over years per Race

us_shootings_race_year = Police_Shooting_data %>% 
  group_by(year, Race) %>% 
  tally(name = "Total_Count")


ggplot(data = us_shootings_race_year, aes(x = year, y = Total_Count, fill = Race)) +
  geom_col(color = "darkred", show.legend = FALSE, alpha = 0.9) + # counts are already tallied, so geom_col() draws them directly
  geom_line(color = "black", size = 1, linetype = 3) +
  labs(title = "Distribution of Victims of Different Race Killed per Year :",
       x = "Year of Shootout", y = "Count of Victims") +
  facet_wrap(Race ~ ., scales = "free", ncol = 1, strip.position = "right") +
  scale_x_continuous(breaks = 2015:2021) + # year is numeric, so use a continuous scale
  geom_text(aes(label = Total_Count, vjust = 1.2)) +
  theme_tufte() +
  theme(plot.title = element_text(face = "bold", size = 18),
        strip.text = element_text(face = "bold", size = 11),
        axis.title = element_text(face = "bold", size = 14),
        axis.text.y = element_text(face = "bold", size = 9),
        axis.text.x = element_text(face = "bold.italic", size = 10),
        panel.grid.major.y = element_line(colour = "lightgrey")) +
  scale_fill_brewer(palette = "Set3")

I created this histogram to compare deaths by race for the years 2015-2020. Although it shows that the highest number of victims were White, we need to keep in mind that White people are the largest population group in the US. Relative to population, the group shot most often over the years is Black people, who make up only 13% of the population. The third group is again Hispanic people, who make up 18% of the population. It is sad to see people of color facing so much hatred that it costs them their lives. I was expecting to see an increase in deaths in recent years, i.e. after 2018, but I was astounded to see that the number of deaths among Black people has remained flat over the years. I believe the results would not change if we went further back in the data.

I am an immigrant and a person of color in the USA. When I arrived, I saw the USA in its full glory, but when the BLM movement took place in 2019, it pushed me to dig deeper on my own and do some research. While I was selecting this dataset, a second event that moved me was the shooting of Asian women in spas in Atlanta (https://www.nytimes.com/live/2021/03/17/us/shooting-atlanta-acworth). In this analysis I saw that the number for White people is higher but, as explained, the likelihood is lower relative to the population. To support my analysis I found this article by BMJ (https://www.bmj.com/company/newsroom/fatal-police-shootings-of-unarmed-black-people-in-us-more-than-3-times-as-high-as-in-whites/), which states: “The rate of fatal police shootings of unarmed Black people in the US is more than 3 times as high as it is among White people, finds research published online in the Journal of Epidemiology & Community Health.” The researchers calculated the rate of death and years of life lost by race/ethnic group for all fatal police shootings per quarter, per million (pqpm), from 2015 to 2020. Average deaths per quarter were highest among Native Americans (1.74 pqpm), followed by Blacks (1.49 pqpm), Hispanics (0.74 pqpm), Whites (0.57 pqpm) and Asians (0.25 pqpm). These data run only to 2020, but there is a significant change in 2021 (April).
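
The pqpm figures quoted above can be turned into ratios relative to the White rate with one division. The block below is a small worked example using only the five numbers from the BMJ summary; pqpm and rate_ratio are hypothetical names. These rates cover all fatal police shootings, so they are not directly comparable to the "more than 3 times" figure, which refers to unarmed victims.

```r
# Average deaths per quarter, per million (pqpm), as quoted from the BMJ summary
pqpm <- c(
  "Native American" = 1.74,
  "Black"           = 1.49,
  "Hispanic"        = 0.74,
  "White"           = 0.57,
  "Asian"           = 0.25
)

# Rate of each group relative to the White rate
rate_ratio <- pqpm / pqpm[["White"]]
print(round(rate_ratio, 2))
# Native American 3.05, Black 2.61, Hispanic 1.30, White 1.00, Asian 0.44
```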

When I extended the analysis to cities, I found that LA has the highest number of fatal shootings. To support this finding I found the article https://xtown.la/2021/01/07/los-angels-homicide-rate/. It was published in January 2021 and reports: “LAPD Chief Michel Moore has stated that the rise in homicides is in part the result of frustration exacerbated by the pandemic. In November, in reaction to a spate of shootings, he noted that individuals may be more prone to violence after months of isolation and few job opportunities.” The article also shows the top 3 categories of weapons the alleged victims had at the time the shooting occurred, and gives insight into the youngest and oldest people who were killed: “The youngest victim was a 4-year-old Black girl who was killed in Gramercy Park on Aug. 11. The oldest victim was an 85-year-old Hispanic woman who was killed in Larchmont on May 31.” In addition, I found an article by CBS News that ranks 65 cities in the USA from most to least deadly (https://www.cbsnews.com/pictures/murder-map-deadliest-u-s-cities/).

This analysis did not give me an answer as to why shootings within certain communities/races have increased, other than hatred being the reason. I tried to find articles or papers explaining the factors, but I could not find any references.

This dataset is so rich that I feel I could still do more analysis; here I tried to do the research and show my findings through visualizations. I wanted to make a map showing the state with the highest number of shootings but could not get it to work, so I commented out that chunk because the code was not executing.

Data 110-FinalProject_Police Fatalaties_PD

Pankti Dalal

5/9/2021