Introduction to the Data and Data Cleaning

Import relevant libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
library(RColorBrewer)
library(streamgraph)
library(alluvial)

Load the data

setwd("~/Desktop/Python Datasets/hatecrimes")
fatal1 <- read.csv("fatal-police-shootings-data.csv",stringsAsFactors = TRUE)
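
Several columns in this file (for example race, armed and flee) contain empty strings, which `read.csv` keeps as a `""` factor level by default. As a minimal sketch, assuming the empty fields really mean "not recorded", the `na.strings` argument can turn them into `NA` at import time. The CSV text below is made up for illustration and only stands in for the real file.

```r
# Toy CSV text standing in for the real file (all values are made up).
csv_text <- "id,race,flee
1,W,Not fleeing
2,,Car
3,B,
4,H,Foot"

# na.strings = "" converts empty fields to NA instead of an empty factor level.
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE, na.strings = "")

levels(toy$race)      # the "" level is gone
sum(is.na(toy$race))  # count of missing race values
```

With this option, later calls to `count()` would report a row for `NA` rather than for `""`, so the tables further down would look slightly different.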

Analyze the structure

str(fatal1)
## 'data.frame':    5610 obs. of  17 variables:
##  $ id                     : int  3 4 5 8 9 11 13 15 16 17 ...
##  $ name                   : Factor w/ 5390 levels ""," Austin Wilburly  Reid",..: 5017 3323 2634 3599 3779 3096 3080 600 402 3313 ...
##  $ date                   : Factor w/ 1919 levels "2015-01-02","2015-01-03",..: 1 1 2 3 3 3 4 5 5 5 ...
##  $ manner_of_death        : Factor w/ 2 levels "shot","shot and Tasered": 1 1 2 1 1 1 1 1 1 1 ...
##  $ armed                  : Factor w/ 96 levels "","air conditioner",..: 39 39 88 87 64 39 39 39 88 87 ...
##  $ age                    : int  53 47 23 32 39 18 22 35 34 47 ...
##  $ gender                 : Factor w/ 3 levels "","F","M": 3 3 3 3 3 3 3 3 2 3 ...
##  $ race                   : Factor w/ 7 levels "","A","B","H",..: 2 7 4 7 4 7 4 7 7 3 ...
##  $ city                   : Factor w/ 2532 levels "300 block of State Line Road",..: 2087 37 2477 2018 718 924 386 92 301 1167 ...
##  $ state                  : Factor w/ 51 levels "AK","AL","AR",..: 48 38 17 5 6 37 4 17 13 39 ...
##  $ signs_of_mental_illness: Factor w/ 2 levels "False","True": 2 1 1 2 1 1 1 1 1 1 ...
##  $ threat_level           : Factor w/ 3 levels "attack","other",..: 1 1 2 1 1 1 1 1 2 1 ...
##  $ flee                   : Factor w/ 5 levels "","Car","Foot",..: 4 4 4 4 4 4 2 4 4 4 ...
##  $ body_camera            : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 2 1 ...
##  $ longitude              : num  -123.1 -122.9 -97.3 -122.4 -104.7 ...
##  $ latitude               : num  47.2 45.5 37.7 37.8 40.4 ...
##  $ is_geocoding_exact     : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
head(fatal1)
##   id               name       date  manner_of_death      armed age gender race
## 1  3         Tim Elliot 2015-01-02             shot        gun  53      M    A
## 2  4   Lewis Lee Lembke 2015-01-02             shot        gun  47      M    W
## 3  5 John Paul Quintero 2015-01-03 shot and Tasered    unarmed  23      M    H
## 4  8    Matthew Hoffman 2015-01-04             shot toy weapon  32      M    W
## 5  9  Michael Rodriguez 2015-01-04             shot   nail gun  39      M    H
## 6 11  Kenneth Joe Brown 2015-01-04             shot        gun  18      M    W
##            city state signs_of_mental_illness threat_level        flee
## 1       Shelton    WA                    True       attack Not fleeing
## 2         Aloha    OR                   False       attack Not fleeing
## 3       Wichita    KS                   False        other Not fleeing
## 4 San Francisco    CA                    True       attack Not fleeing
## 5         Evans    CO                   False       attack Not fleeing
## 6       Guthrie    OK                   False       attack Not fleeing
##   body_camera longitude latitude is_geocoding_exact
## 1       False  -123.122   47.247               True
## 2       False  -122.892   45.487               True
## 3       False   -97.281   37.695               True
## 4       False  -122.422   37.763               True
## 5       False  -104.692   40.384               True
## 6       False   -97.423   35.877               True

Clean the data

fatal1$name = as.character(fatal1$name)
fatal1$date = as.character(fatal1$date)

fatal_clean<- fatal1 %>% mutate(date = ymd(date))
head(fatal_clean)
##   id               name       date  manner_of_death      armed age gender race
## 1  3         Tim Elliot 2015-01-02             shot        gun  53      M    A
## 2  4   Lewis Lee Lembke 2015-01-02             shot        gun  47      M    W
## 3  5 John Paul Quintero 2015-01-03 shot and Tasered    unarmed  23      M    H
## 4  8    Matthew Hoffman 2015-01-04             shot toy weapon  32      M    W
## 5  9  Michael Rodriguez 2015-01-04             shot   nail gun  39      M    H
## 6 11  Kenneth Joe Brown 2015-01-04             shot        gun  18      M    W
##            city state signs_of_mental_illness threat_level        flee
## 1       Shelton    WA                    True       attack Not fleeing
## 2         Aloha    OR                   False       attack Not fleeing
## 3       Wichita    KS                   False        other Not fleeing
## 4 San Francisco    CA                    True       attack Not fleeing
## 5         Evans    CO                   False       attack Not fleeing
## 6       Guthrie    OK                   False       attack Not fleeing
##   body_camera longitude latitude is_geocoding_exact
## 1       False  -123.122   47.247               True
## 2       False  -122.892   45.487               True
## 3       False   -97.281   37.695               True
## 4       False  -122.422   37.763               True
## 5       False  -104.692   40.384               True
## 6       False   -97.423   35.877               True
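
The factor-to-character-to-date round trip above could probably be avoided by declaring column types at import. The following is a sketch only, using the `colClasses` argument of `read.csv` on a made-up two-row CSV. The names and dates in it are placeholders, not rows from the dataset.

```r
# Toy CSV text (placeholder values) with the same kinds of columns.
csv_text <- "id,name,date,race
3,Person One,2015-01-02,A
4,Person Two,2015-01-03,W"

# Named colClasses entries override stringsAsFactors for those columns only;
# the remaining string columns are still read as factors.
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE,
                colClasses = c(name = "character", date = "Date"))

str(toy)
```

Here `name` stays a character vector, `date` is parsed as a `Date`, and `race` is still a factor, so no separate conversion step is needed.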

Create variables for year and month

months = vector(mode = "numeric", length = nrow(fatal_clean))
years = vector(mode = "numeric", length = nrow(fatal_clean))

fatal_clean2 <- cbind(fatal_clean,months,years)

head(fatal_clean2)
##   id               name       date  manner_of_death      armed age gender race
## 1  3         Tim Elliot 2015-01-02             shot        gun  53      M    A
## 2  4   Lewis Lee Lembke 2015-01-02             shot        gun  47      M    W
## 3  5 John Paul Quintero 2015-01-03 shot and Tasered    unarmed  23      M    H
## 4  8    Matthew Hoffman 2015-01-04             shot toy weapon  32      M    W
## 5  9  Michael Rodriguez 2015-01-04             shot   nail gun  39      M    H
## 6 11  Kenneth Joe Brown 2015-01-04             shot        gun  18      M    W
##            city state signs_of_mental_illness threat_level        flee
## 1       Shelton    WA                    True       attack Not fleeing
## 2         Aloha    OR                   False       attack Not fleeing
## 3       Wichita    KS                   False        other Not fleeing
## 4 San Francisco    CA                    True       attack Not fleeing
## 5         Evans    CO                   False       attack Not fleeing
## 6       Guthrie    OK                   False       attack Not fleeing
##   body_camera longitude latitude is_geocoding_exact months years
## 1       False  -123.122   47.247               True      0     0
## 2       False  -122.892   45.487               True      0     0
## 3       False   -97.281   37.695               True      0     0
## 4       False  -122.422   37.763               True      0     0
## 5       False  -104.692   40.384               True      0     0
## 6       False   -97.423   35.877               True      0     0
fatal_clean2$months <- as.integer(fatal_clean2$months)
fatal_clean2$years <- as.integer(fatal_clean2$years)

head(fatal_clean2)
##   id               name       date  manner_of_death      armed age gender race
## 1  3         Tim Elliot 2015-01-02             shot        gun  53      M    A
## 2  4   Lewis Lee Lembke 2015-01-02             shot        gun  47      M    W
## 3  5 John Paul Quintero 2015-01-03 shot and Tasered    unarmed  23      M    H
## 4  8    Matthew Hoffman 2015-01-04             shot toy weapon  32      M    W
## 5  9  Michael Rodriguez 2015-01-04             shot   nail gun  39      M    H
## 6 11  Kenneth Joe Brown 2015-01-04             shot        gun  18      M    W
##            city state signs_of_mental_illness threat_level        flee
## 1       Shelton    WA                    True       attack Not fleeing
## 2         Aloha    OR                   False       attack Not fleeing
## 3       Wichita    KS                   False        other Not fleeing
## 4 San Francisco    CA                    True       attack Not fleeing
## 5         Evans    CO                   False       attack Not fleeing
## 6       Guthrie    OK                   False       attack Not fleeing
##   body_camera longitude latitude is_geocoding_exact months years
## 1       False  -123.122   47.247               True      0     0
## 2       False  -122.892   45.487               True      0     0
## 3       False   -97.281   37.695               True      0     0
## 4       False  -122.422   37.763               True      0     0
## 5       False  -104.692   40.384               True      0     0
## 6       False   -97.423   35.877               True      0     0
for (i in 1:nrow(fatal_clean2)){
   if (month(fatal_clean2[i,"date"]) == 1){
    fatal_clean2[i,"months"] = 1
  }else if (month(fatal_clean2[i,"date"]) == 2){
    fatal_clean2[i,"months"] = 2
  }else if (month(fatal_clean2[i,"date"]) == 3){
    fatal_clean2[i,"months"] = 3
  }else if (month(fatal_clean2[i,"date"]) == 4){
    fatal_clean2[i,"months"] = 4
  }else if (month(fatal_clean2[i,"date"]) == 5){
    fatal_clean2[i,"months"] = 5
  }else if (month(fatal_clean2[i,"date"]) == 6){
    fatal_clean2[i,"months"] = 6
  }else if (month(fatal_clean2[i,"date"]) == 7){
    fatal_clean2[i,"months"] = 7
  }else if (month(fatal_clean2[i,"date"]) == 8){
    fatal_clean2[i,"months"] = 8
  }else if (month(fatal_clean2[i,"date"]) == 9){
    fatal_clean2[i,"months"] = 9
  }else if (month(fatal_clean2[i,"date"]) == 10){
    fatal_clean2[i,"months"] = 10
  }else if (month(fatal_clean2[i,"date"]) == 11){
    fatal_clean2[i,"months"] = 11
  }else if (month(fatal_clean2[i,"date"]) == 12){
    fatal_clean2[i,"months"] = 12
  }
}

for (i in 1:nrow(fatal_clean2)){
   if(year(fatal_clean2[i,"date"]) == 2015){
    fatal_clean2[i,"years"] = 2015
  }else if (year(fatal_clean2[i,"date"]) == 2016){
    fatal_clean2[i,"years"] = 2016
  }else if (year(fatal_clean2[i,"date"]) == 2017){
    fatal_clean2[i,"years"] = 2017
  }else if (year(fatal_clean2[i,"date"]) == 2018){
    fatal_clean2[i,"years"] = 2018
  }else if (year(fatal_clean2[i,"date"]) == 2019){
    fatal_clean2[i,"years"] = 2019
  }else if(year(fatal_clean2[i,"date"]) == 2020){
    fatal_clean2[i,"years"] = 2020
  }
}

fatal_clean2$months <- as.integer(fatal_clean2$months)
fatal_clean2$years <- as.integer(fatal_clean2$years)
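
Because lubridate's `month()` and `year()` are vectorized, the two loops above can likely be replaced by a single `mutate()` call. This is a sketch on a toy data frame with made-up dates, not a drop-in replacement tested on the full dataset.

```r
library(dplyr)
library(lubridate)

# Toy dates standing in for fatal_clean$date (made-up values).
toy <- data.frame(date = ymd(c("2015-01-02", "2016-07-15", "2020-12-31")))

# month() and year() work on the whole column at once,
# so no row-by-row loop or if/else chain is needed.
toy2 <- toy %>%
  mutate(months = as.integer(month(date)),
         years  = as.integer(year(date)))

toy2
```

One difference from the loops: a date outside 2015 to 2020 would get its real year here, whereas the loop version leaves it at the initial value of 0.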

Data Visualizations

Alluvial and Streamgraphs

Total Number of People Killed per Year by Race

fatal2 <- fatal_clean2 %>% select(years,race) %>% group_by(years,race) %>% count(race)
fatal2
## # A tibble: 42 x 3
## # Groups:   years, race [42]
##    years race      n
##    <int> <fct> <int>
##  1  2015 ""       29
##  2  2015 "A"      14
##  3  2015 "B"     258
##  4  2015 "H"     172
##  5  2015 "N"       9
##  6  2015 "O"      14
##  7  2015 "W"     498
##  8  2016 ""       56
##  9  2016 "A"      15
## 10  2016 "B"     234
## # … with 32 more rows
# Streamgraph
streamgraph(fatal2, key="race", value="n", date="years") %>%
  sg_axis_x(1, "year", "%Y")
fatal3 <- fatal_clean2 %>% select(race,years) %>% group_by(race,years) %>% count(race) %>% filter(race != "") %>% arrange(years,race)
fatal3
## # A tibble: 36 x 3
## # Groups:   race, years [36]
##    race  years     n
##    <fct> <int> <int>
##  1 A      2015    14
##  2 B      2015   258
##  3 H      2015   172
##  4 N      2015     9
##  5 O      2015    14
##  6 W      2015   498
##  7 A      2016    15
##  8 B      2016   234
##  9 H      2016   160
## 10 N      2016    16
## # … with 26 more rows
# Alluvial graph
alluvial_ts(fatal3, wave = .3, ygap = 5, plotdir = 'centred', alpha=.9,
            grid = TRUE, grid.lwd = 5, xmargin = 0.1, lab.cex = .7, xlab = 'Year',
            ylab = 'Number of People Killed', border = NA, axis.cex = .7, leg.cex = .8,
            leg.col='white', 
            title = "Number of People Killed by Police\n By Year and Race (2015-2020)")

Heatmaps

Mental Illness and Race by Count and Proportion

fatal_clean2 %>% count(race,signs_of_mental_illness) %>% ggplot(mapping = aes(x = race, y=signs_of_mental_illness)) + geom_tile(mapping = aes(fill = n)) + xlab("Race") + ylab("Signs of Mental Illness") + ggtitle("Mental Illness and Race by Count") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

fatal4 <- fatal_clean2 %>% count(race,signs_of_mental_illness) %>% group_by(race) %>% mutate(prop = n/sum(n))
fatal4
## # A tibble: 14 x 4
## # Groups:   race [7]
##    race  signs_of_mental_illness     n  prop
##    <fct> <fct>                   <int> <dbl>
##  1 ""    False                     488 0.796
##  2 ""    True                      125 0.204
##  3 "A"   False                      70 0.753
##  4 "A"   True                       23 0.247
##  5 "B"   False                    1132 0.856
##  6 "B"   True                      190 0.144
##  7 "H"   False                     764 0.827
##  8 "H"   True                      160 0.173
##  9 "N"   False                      64 0.821
## 10 "N"   True                       14 0.179
## 11 "O"   False                      37 0.787
## 12 "O"   True                       10 0.213
## 13 "W"   False                    1801 0.711
## 14 "W"   True                      732 0.289
fatal4 %>% ggplot(mapping = aes(x = race, y=signs_of_mental_illness)) + geom_tile(mapping = aes(fill = prop))+ xlab("Race") + ylab("Signs of Mental Illness") + ggtitle("Mental Illness and Race by Proportion") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()
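
The proportion column used in these heatmaps comes from the `count()`, `group_by()`, `mutate(prop = n/sum(n))` pattern. As a small illustration on made-up data (the group and flag values are hypothetical), the proportions sum to 1 within each group:

```r
library(dplyr)

# Made-up rows: one per incident, with a group label and a True/False flag.
toy <- data.frame(
  group = c("g1", "g1", "g1", "g1", "g2", "g2"),
  flag  = c("True", "False", "False", "False", "True", "False")
)

props <- toy %>%
  count(group, flag) %>%          # rows per (group, flag) cell
  group_by(group) %>%
  mutate(prop = n / sum(n)) %>%   # share of each flag value within its group
  ungroup()

props
```

Grouping by the other variable instead would give the share of each group within a flag value, which answers a different question.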

Threat Level and Race by Count and Proportion

fatal_clean2 %>% count(race,threat_level) %>% ggplot(mapping = aes(x = race, y=threat_level)) + geom_tile(mapping = aes(fill = n))+ xlab("Race") + ylab("Threat Level") + ggtitle("Threat Level and Race by Count") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

fatal5 <- fatal_clean2 %>% count(race,threat_level) %>% group_by(race) %>% mutate(prop = n/sum(n))
fatal5
## # A tibble: 20 x 4
## # Groups:   race [7]
##    race  threat_level     n   prop
##    <fct> <fct>        <int>  <dbl>
##  1 ""    attack         407 0.664 
##  2 ""    other          170 0.277 
##  3 ""    undetermined    36 0.0587
##  4 "A"   attack          51 0.548 
##  5 "A"   other           41 0.441 
##  6 "A"   undetermined     1 0.0108
##  7 "B"   attack         887 0.671 
##  8 "B"   other          374 0.283 
##  9 "B"   undetermined    61 0.0461
## 10 "H"   attack         535 0.579 
## 11 "H"   other          340 0.368 
## 12 "H"   undetermined    49 0.0530
## 13 "N"   attack          44 0.564 
## 14 "N"   other           30 0.385 
## 15 "N"   undetermined     4 0.0513
## 16 "O"   attack          31 0.660 
## 17 "O"   other           16 0.340 
## 18 "W"   attack        1677 0.662 
## 19 "W"   other          759 0.300 
## 20 "W"   undetermined    97 0.0383
fatal5 %>% ggplot(mapping = aes(x = race, y=threat_level)) + geom_tile(mapping = aes(fill = prop))+ xlab("Race") + ylab("Threat Level") + ggtitle("Threat Level and Race by Proportion") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

Type of Weapon and Race by Count and Proportion

fatal_clean3 <- fatal_clean2 %>% count(armed) %>% filter(n>10)
fatal_clean3
##              armed    n
## 1                   212
## 2               ax   24
## 3     baseball bat   18
## 4       box cutter   12
## 5              gun 3183
## 6      gun and car   11
## 7    gun and knife   18
## 8  gun and vehicle   11
## 9           hammer   16
## 10         hatchet   11
## 11           knife  828
## 12         machete   47
## 13      metal pipe   14
## 14     screwdriver   13
## 15    sharp object   13
## 16           sword   23
## 17           Taser   26
## 18      toy weapon  193
## 19         unarmed  356
## 20    undetermined  172
## 21  unknown weapon   79
## 22         vehicle  161
armed_types <- fatal_clean3$armed
armed_types
##  [1]                 ax              baseball bat    box cutter     
##  [5] gun             gun and car     gun and knife   gun and vehicle
##  [9] hammer          hatchet         knife           machete        
## [13] metal pipe      screwdriver     sharp object    sword          
## [17] Taser           toy weapon      unarmed         undetermined   
## [21] unknown weapon  vehicle        
## 96 Levels:  air conditioner air pistol Airsoft pistol ax ... wrench
fatal_clean2 %>% filter(armed %in% armed_types) %>%  count(race,armed) %>% ggplot(mapping = aes(x = race, y=armed)) + geom_tile(mapping = aes(fill = n))+ xlab("Race") + ylab("Type of Weapon") + ggtitle("Type of Weapon and Race by Count") + theme(plot.title = element_text(hjust = 0.5))
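
The two-step approach above (build `armed_types`, then filter with `%in%`) could also be written as one pipeline with dplyr's `add_count()`. This is a sketch on made-up rows. The threshold `min_n` and the helper column name `armed_n` are hypothetical names chosen for the example.

```r
library(dplyr)

# Made-up rows: weapon type and race code per incident.
toy <- data.frame(
  armed = c(rep("gun", 5), rep("knife", 3), "wrench"),
  race  = c("W", "B", "W", "H", "B", "W", "W", "B", "H")
)

min_n <- 2  # hypothetical threshold; the analysis above uses n > 10

common <- toy %>%
  add_count(armed, name = "armed_n") %>%  # total rows per weapon type
  filter(armed_n > min_n) %>%             # keep only frequent weapon types
  select(-armed_n)

common %>% count(race, armed)
```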

fatal6 <- fatal_clean2 %>% count(race,armed) %>% group_by(race) %>% filter(armed %in% armed_types) %>%  mutate(prop = n/sum(n))
fatal6
## # A tibble: 115 x 4
## # Groups:   race [7]
##    race  armed               n    prop
##    <fct> <fct>           <int>   <dbl>
##  1 ""    ""                  9 0.0151 
##  2 ""    "ax"                3 0.00504
##  3 ""    "baseball bat"      2 0.00336
##  4 ""    "box cutter"        1 0.00168
##  5 ""    "gun"             363 0.610  
##  6 ""    "gun and car"       2 0.00336
##  7 ""    "gun and knife"     3 0.00504
##  8 ""    "hammer"            2 0.00336
##  9 ""    "knife"           101 0.170  
## 10 ""    "machete"           6 0.0101 
## # … with 105 more rows
fatal6 %>% ggplot(mapping = aes(x = race, y=armed)) + geom_tile(mapping = aes(fill = prop))+ xlab("Race") + ylab("Type of Weapon") + ggtitle("Type of Weapon and Race by Proportion") + theme(plot.title = element_text(hjust = 0.5))
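
Note that `fatal6` filters to the common weapon types before computing `prop`, so its denominators exclude the rare categories, whereas a pipeline that computes `prop` first and filters afterwards divides by the full group total. The toy example below (made-up counts, hypothetical race code "X") shows how the order changes the result:

```r
library(dplyr)

# Made-up counts for a single group.
counts <- data.frame(
  race  = c("X", "X", "X"),
  armed = c("gun", "knife", "wrench"),
  n     = c(6, 3, 1)
)
keep <- c("gun", "knife")

# Filter first: the denominator excludes the dropped category (6 + 3 = 9).
filter_first <- counts %>%
  group_by(race) %>%
  filter(armed %in% keep) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Mutate first: the denominator is the full group total (6 + 3 + 1 = 10).
mutate_first <- counts %>%
  group_by(race) %>%
  mutate(prop = n / sum(n)) %>%
  filter(armed %in% keep) %>%
  ungroup()

filter_first$prop  # 6/9 and 3/9
mutate_first$prop  # 0.6 and 0.3
```

Either order can be reasonable, but the two should not be compared directly without keeping this difference in mind.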

Race and Method of Fleeing by Count and Proportion

fatal_clean2 %>% count(race,flee) %>% ggplot(mapping = aes(x = race, y=flee)) + geom_tile(mapping = aes(fill = n))+ xlab("Race") + ylab("Method of Fleeing") + ggtitle("Race and Fleeing by Count") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

fatal7 <- fatal_clean2 %>% count(race,flee) %>% group_by(race) %>% mutate(prop = n/sum(n))
fatal7
## # A tibble: 34 x 4
## # Groups:   race [7]
##    race  flee              n   prop
##    <fct> <fct>         <int>  <dbl>
##  1 ""    ""               52 0.0848
##  2 ""    "Car"            90 0.147 
##  3 ""    "Foot"           59 0.0962
##  4 ""    "Not fleeing"   392 0.639 
##  5 ""    "Other"          20 0.0326
##  6 "A"   ""                5 0.0538
##  7 "A"   "Car"             6 0.0645
##  8 "A"   "Foot"           11 0.118 
##  9 "A"   "Not fleeing"    71 0.763 
## 10 "B"   ""               53 0.0401
## # … with 24 more rows
fatal7 %>% ggplot(mapping = aes(x = race, y=flee)) + geom_tile(mapping = aes(fill = prop))+ xlab("Race") + ylab("Method of Fleeing") + ggtitle("Race and Method of Fleeing by Proportion") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

Threat Level and Type of Weapon by Count and Proportion

fatal_clean2 %>% filter(armed %in% armed_types) %>% count(armed,threat_level) %>% ggplot(mapping = aes(x = armed, y=threat_level)) + geom_tile(mapping = aes(fill = n)) + xlab("Type of Weapon") + ylab("Threat Level") + ggtitle("Threat Level and Type of Weapon by Count") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

fatal8 <- fatal_clean2 %>% count(armed,threat_level) %>% group_by(threat_level) %>% mutate(prop = n/sum(n)) %>% filter(armed %in% armed_types)
fatal8
## # A tibble: 53 x 4
## # Groups:   threat_level [3]
##    armed          threat_level     n    prop
##    <fct>          <fct>        <int>   <dbl>
##  1 ""             attack         100 0.0275 
##  2 ""             other          102 0.0590 
##  3 ""             undetermined    10 0.0403 
##  4 "ax"           attack           6 0.00165
##  5 "ax"           other           18 0.0104 
##  6 "baseball bat" attack           9 0.00248
##  7 "baseball bat" other            8 0.00462
##  8 "baseball bat" undetermined     1 0.00403
##  9 "box cutter"   attack           7 0.00193
## 10 "box cutter"   other            5 0.00289
## # … with 43 more rows
fatal8 %>% ggplot(mapping = aes(x = armed, y=threat_level)) + geom_tile(mapping = aes(fill = prop))+ xlab("Type of Weapon") + ylab("Threat Level") + ggtitle("Threat Level and Type of Weapon by Proportion") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

Threat Level and Signs of Mental Illness by Count and Proportion

fatal_clean2 %>% count(signs_of_mental_illness,threat_level) %>% ggplot(mapping = aes(x = signs_of_mental_illness, y=threat_level)) + geom_tile(mapping = aes(fill = n))+ xlab("Signs of Mental Illness") + ylab("Threat Level") + ggtitle("Threat Level and Signs of Mental Illness by Count") + theme(plot.title = element_text(hjust = 0.5)) 

fatal9 <- fatal_clean2 %>% count(signs_of_mental_illness,threat_level) %>% group_by(threat_level) %>% mutate(prop = n/sum(n))
fatal9
## # A tibble: 6 x 4
## # Groups:   threat_level [3]
##   signs_of_mental_illness threat_level     n  prop
##   <fct>                   <fct>        <int> <dbl>
## 1 False                   attack        2858 0.787
## 2 False                   other         1286 0.743
## 3 False                   undetermined   212 0.855
## 4 True                    attack         774 0.213
## 5 True                    other          444 0.257
## 6 True                    undetermined    36 0.145
fatal9 %>% ggplot(mapping = aes(x = signs_of_mental_illness, y=threat_level)) + geom_tile(mapping = aes(fill = prop))+ xlab("Signs of Mental Illness") + ylab("Threat Level") + ggtitle("Threat Level and Signs of Mental Illness by Proportion") + theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

6a. The source of my data was the Washington Post police fatalities dataset. I imported the dataset so that string variables were treated as factors, since most of the string variables appeared to be categorical. There were 17 variables in total, most of which were categorical; the exceptions were id, name, date, age, longitude, and latitude. I cleaned the dataset by first converting the name variable from a factor to a character variable. I also converted the date variable to a character variable so that I could then convert it into a date variable with the lubridate package. Although I ultimately did not use the date variable directly, I did create two new variables, months and years, which hold the month and year of each observation so that I could group and analyze observations by year or month rather than solely by day. I used for loops to accomplish this, although I could also have used the case_when function within mutate. Another cleaning operation consisted of removing categories of the armed variable with 10 or fewer observations, because the armed variable has a large number of possible values (96 levels).

6b. The alluvial and streamgraph visualizations at the beginning of my analysis show that the number of people killed by police in each racial group stayed fairly constant from 2015 to 2020. It was not particularly surprising that white people made up the largest share of police fatalities over this period (2,533 of 5,610, about 45%), given that they constitute the majority of the US population. It was surprising, however, that black people made up the next largest share, considering that they make up a much smaller percentage of the US population. The same can be said of Hispanics, albeit to a lesser degree, since they were killed less often than black people and make up a larger proportion of the US population. Given the large number of categorical variables, it made sense to look for associations between pairs of categorical variables, so I made frequent use of heatmaps. For every race, most individuals did not show signs of mental illness (between roughly 71% and 86%), with the lowest percentages for white people (71%) and Asian people (75%). It is also interesting that a higher proportion of black and white people had a recorded threat level of attack (about 67% and 66%) than Asians, Hispanics, and Native Americans (about 55% to 58%). The proportion of individuals armed with guns was high for every race but Asians, with whites seeming to have the highest proportion. Asians did, however, have the highest proportion of individuals armed with knives, which came as a surprise. Another interesting observation was that the proportion of people who were not fleeing was at or above 60% for every race. It would have been very interesting to look at age and race or gender and race. My hypothesis is that the number of black people killed by police is roughly the same across age and gender, whereas middle-aged white men and Asian men are killed more often than young men or women. I'd like to investigate this conjecture in the future. It would also have been very interesting to analyze the details of the police officers themselves (i.e., the gender, age, race, and number of officers for any given fatality) to see whether, for example, black police officers are just as likely to kill black people as white police officers are.