library(ggplot2)
library(kableExtra)
library(dplyr)
library(forcats)
library(here)
library(leaflet)

load(file = here("data-outputs", "CleanData.rda"))

CA2015 <- fe_clean %>%
  filter(st == "CA" & year > 2014 )

# Select homicides
# Remove cases recorded as suicides
## Note: cases 25804-5 (the Tad Norman case in Lake City) are not victims
## of police violence, since they were killed by Norman; no filter for
## these cases is applied below

homicides <- CA2015 %>% 
  filter(circumstances != "Suicide") %>%
  arrange(date)

# for geocoded LDs
with(homicides, write.csv(cbind(feID,latitude,longitude),
                          here::here("data-outputs",
                                     "CAgeocodes2015.csv")))

all.cases <- dim(CA2015)[1]
last.hom.index <- dim(homicides)[1]
last.date <- max(homicides$date)

last.name <- ifelse(homicides$lname[last.hom.index] == "Unknown", 
                    "(Name not released)",        
                  paste(homicides$fname[last.hom.index],
                        homicides$lname[last.hom.index]))

last.age <- homicides$age[last.hom.index]
last.agency <- homicides$agency[last.hom.index]
last.cod <- homicides$cod[last.hom.index]
tot.by.yr <- table(homicides$year)
tot.this.yr <- tot.by.yr[[length(tot.by.yr)]]
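# Ordinal suffix: 11-13 take "th"; otherwise the last digit decides (21st, 22nd, 23rd)
num.suffix <- ifelse(tot.this.yr %% 100 %in% 11:13, "th",
                     ifelse(tot.this.yr %% 10 == 1, "st",
                            ifelse(tot.this.yr %% 10 == 2, "nd",
                                   ifelse(tot.this.yr %% 10 == 3, "rd", "th"))))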

# mean() averages a single vector, so the yearly totals must be wrapped in c()
ca.homicides <- mean(c(1861, 1930, 1829, 1739, 1679))

Introduction

This report tracks the number of persons killed by law enforcement officers in California since January 1, 2015.

MOST RECENT DATA UPDATE

  • Total homicides by law enforcement since Jan 1, 2015: 1334

  • Last reported case: (Name not released), 29 years old, on 2021-02-13 by Fontana Police Department

    This is the 29th person killed in 2021. The cause of death is reported as Gunshot.

Last year, 246 people were killed by law enforcement officers in California. The average number killed each year since 2015 is 218. Roughly 1 of every 10 homicides in CA is committed by a law enforcement officer. (The state’s homicide data can be found here).
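The "roughly 1 of every 10" figure can be checked with a short calculation. The sketch below assumes that the five yearly totals hard-coded in the setup code are statewide homicide counts for five recent years (the report does not say which years), and it uses the yearly average of 218 quoted above. The variable names are illustrative only.

```r
# Yearly statewide homicide totals as hard-coded in the setup code
# (assumption: five recent years; the years are not stated in the report)
state.homicides <- c(1861, 1930, 1829, 1739, 1679)

# mean() needs one vector; mean(1861, 1930, ...) would use only the first value
avg.state <- mean(state.homicides)

# Average number of people killed by law enforcement per year (from the text)
avg.police <- 218

# Share of all homicides attributed to law enforcement officers
share <- avg.police / avg.state
round(100 * share, 1)
```

Under these assumptions the share comes out at about 12%, which is consistent with the "roughly 1 of every 10" statement.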


Where the data come from

The data in this report are updated at least once each week, from the Fatal Encounters Project (https://fatalencounters.org/). Fatal Encounters includes all deaths during encounters with police (by contrast, the Washington Post data only includes fatal shootings by police).

This report is restricted to the cases that can be classified as homicides by police. It excludes cases identified in the Fatal Encounters dataset as suicides, and it includes deaths that occur during a “hot pursuit”.
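One detail worth checking when excluding suicides: in R, comparing a missing value with `!=` yields `NA`, and row filters drop rows where the condition is `NA`. The sketch below uses a small hypothetical `circumstances` vector (not the real data) to contrast the plain comparison with a version that keeps cases whose circumstances are missing. Whether the Fatal Encounters field actually contains missing values is an assumption to verify against the data.

```r
# Hypothetical example values, not taken from the Fatal Encounters data
circumstances <- c("Deadly force", "Suicide", NA, "Pursuit")

# Plain comparison: the NA entry evaluates to NA, not TRUE
keep.plain <- circumstances != "Suicide"

# NA-aware comparison: keep a case unless it is explicitly coded as a suicide
keep.na.safe <- is.na(circumstances) | circumstances != "Suicide"

sum(keep.plain, na.rm = TRUE)  # cases kept when NA rows are dropped
sum(keep.na.safe)              # cases kept when NA rows are retained
```

If missing values are present, the two versions return different case counts, so the choice affects every total in the report.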


What is a homicide?

The deadly force incidents in this report are homicides. A homicide is simply defined as the killing of one person by another. In the context of this report it refers to any encounter with law enforcement officers that results in a fatality. Homicides normally result in a criminal investigation or inquest, but the word does not imply a crime has been committed.

  • The word homicide means only that the death was caused in some way by the officer.

  • It does not mean that the officer’s actions that led to the death were justified, or that they were unjustified.

There are many different types of homicides. In the U.S., these types and definitions vary across states, but there are some general similarities. The definitions below are taken from a useful online summary found here, based on California State laws.

Homicide
Homicide is the killing of one person by another. This is a broad term that includes both legal and illegal killings. For example, a soldier may kill another soldier in battle, but that is not a crime. The situation in which the killing happened determines whether it is a crime.

  • Murder is the illegal and intentional killing of another person. Under California Penal Code Section 187, for example, murder is defined as one person killing another person with malice aforethought. Malice is defined as the knowledge and intention or desire to do evil. Malice aforethought is found when one person kills another person with the intention to do so.

    In California, for example, a defendant may be charged with first-degree murder, second-degree murder, or capital murder.

    • First-degree murder is the most serious and includes capital murder – first-degree murder with “special circumstances” that make the crime even more egregious. These cases can be punishable by life in prison without the possibility of parole, or death.

    • Second-degree murder is murder without premeditation, but with intent that is typically rooted in pre-existing circumstances. The penalty for second-degree murder may be up to 15 years to life in prison in California.

    • Felony murder is a subset of first-degree murder and is charged when a person is killed during the commission of a felony, such as a robbery or rape.

  • Manslaughter is the illegal killing of another person without premeditation, and in some cases without the intent to kill. These cases are treated as less severe crimes than murder. Manslaughter can also be categorized as voluntary or involuntary.

    • Voluntary manslaughter occurs when a person kills another without premeditation, typically in the heat of passion. The provocation must be such that a reasonable person under the same circumstances would have acted the same way. Penalties for voluntary manslaughter include up to 11 years in prison in California.

    • Involuntary manslaughter is when a person is killed by actions that involve a wanton disregard for life by another. Involuntary manslaughter is committed without premeditation and without the true intent to kill, but the death of another person still occurs as a result. Penalties for involuntary manslaughter include up to four years in prison in California.

    • Vehicular manslaughter occurs when a person dies in a car accident due to another driver’s gross negligence or even simple negligence, in certain circumstances.

Interactive Map

Click the circled numbers to view the map pointers for each person killed by police.

  • Hovering over the pointer brings up the name of the person killed and agency of the officer who killed them;
  • Clicking the pointer will bring up a url to a news article on the case (if available).
map1 <- leaflet(data = homicides, width = "100%") %>% 
  addTiles() %>%
  addMarkers( ~ longitude,
              ~ latitude,
              popup = ~ url_click,
              label = ~ as.character(paste(name, "by", agency,
                                           "on", date)),
              clusterOptions = markerClusterOptions())
map1
agencynum <- homicides %>%
  group_by(agency) %>%
  count() %>%
  rename(agency_num = n)

# attach the per-agency count to each case; join explicitly on agency
homicides <- left_join(homicides, agencynum, by = "agency")

map2 <- leaflet(data = homicides, width = "100%") %>% 
  addTiles() %>%
  addCircles(~longitude, ~latitude, 
             weight = 1, radius = ~sqrt(agency_num) * 5000, 
             label = ~ as.character(agency),
             popup = ~agency_num)

map2
map3 <- leaflet(data = homicides, width = "100%") %>% 
  addTiles() %>%
  addMarkers( ~ longitude,
              ~ latitude,
              popup = ~ url_click,
              label = ~ as.character(name))
map3

Breakdowns

By race

Table

The race of the victim is missing in 19% of the original data. Fatal Encounters employs an algorithm to try to impute these cases. Over half of the missing values are successfully imputed. These imputations are included in the analysis here.

tab <- homicides %>%
  mutate(raceImp = recode(raceImp,
    "API" = "Asian/Pacific Islander",
    "BAA" = "Black/African American",
    "HL" = "Hispanic/Latinx",
    "NA" = "Native American/Indigenous",
    "WEA" = "White/European American",
    "ME" = "Mexican")
  ) %>%
  group_by(raceImp) %>%
  summarize(Number = n(),
            Percent = round(100*Number/nrow(homicides), 1)
  ) %>%
  bind_rows(data.frame(raceImp="Total", 
                       Number = sum(.$Number), 
                       Percent = sum(.$Percent))) %>%
  rename(Race = raceImp) 

tab %>%
  kable(caption = "Breakdown by Race") %>%
  kable_styling(bootstrap_options = c("striped","hover")) %>%
  row_spec(row=dim(tab)[1], bold = T) %>%
  add_footnote(label = "Percents may not sum to 100 due to rounding",
               notation = "symbol")
Breakdown by Race

Race                          Number   Percent
Asian/Pacific Islander            57       4.3
Black/African American           209      15.7
Hispanic/Latinx                  513      38.5
Mexican                            6       0.4
Native American/Indigenous         8       0.6
White/European American          394      29.5
Unknown                          147      11.0
Total                           1334     100.0

* Percents may not sum to 100 due to rounding

Plots

tempDF <- homicides %>%
  mutate(raceImp = recode(raceImp,
    "API" = "Asian",
    "BAA" = "Black",
    "HL" = "Latinx",
    "NA" = "Native",
    "WEA" = "White",
    "ME" = "Mexican")
  ) %>%
  count(raceImp) %>%
  mutate(perc = n / nrow(homicides)) 

tempDF %>%
  ggplot(aes(x=raceImp, 
             y = perc, 
             label = n)) +
  # label = round(100*perc, 1))) +
  geom_bar(stat="identity", fill="blue", alpha=.5) +
  geom_text(aes(y = perc), size = 3, nudge_y = .025) +
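  labs(title = "Fatalities by Race",
       caption = "California since 2015; y-axis=pct, bar label=count") +
  xlab("Reported Race") +
  ylab("Percent of Total") +
  # perc is a proportion (0-1); show the axis ticks as percentages
  scale_y_continuous(labels = scales::percent)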

temp2DF <- homicides %>%
  mutate(raceb = case_when(raceImp == "WEA" ~ "White",
                           raceImp == "Unknown" ~ "Unknown",
                           TRUE ~ "BIPOC"),
         raceb = fct_relevel(raceb, "Unknown", 
                             after = Inf)) %>%
  count(raceb) %>%
  mutate(perc = n / nrow(homicides)) 

temp2DF %>%
  ggplot(aes(x=raceb, 
             y = perc, 
             label = n)) +
  geom_text(aes(y = perc), size = 3, nudge_y = .025) +
  geom_bar(stat="identity", fill="blue", alpha=.5) +
  labs(title = "Fatalities by Race",
       caption = "California since 2015; y-axis=pct, bar label=count") +
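  xlab("Reported Race") +
  ylab("Percent of Total") +
  # perc is a proportion (0-1); show the axis ticks as percentages
  scale_y_continuous(labels = scales::percent)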

Discussion

Racial disparities in the risk of being killed by police are one of the most important factors driving the public demand for police accountability and reform. For that reason it is important to understand how these numbers can, and should not, be used.

There are several things to keep in mind when interpreting the breakdown of cases by the race of the person killed in this report.

  1. Many case reports are missing data on race

These cases are denoted “Unknown” in the tables and plots in this report.

For the Fatal Encounters dataset, about 19% of the cases reported here do not have information that explicitly identifies the race of the person killed. The Fatal Encounters team uses an “imputation” model to try to predict race for these cases. A brief description of the methodology is online here. They are able to impute just over half of the missing cases with reasonable confidence, and we include these imputations in the breakdowns reported here. After imputations, about 11% of cases are still missing race.

  2. We are reporting the counts, not per capita rates

Breaking the total count down by race, the largest single group of persons killed by police, among those whose race is reported, is identified as Hispanic/Latino (38%), and 59% are Black, Indigenous and persons of color.

The extent of the racial disparities in these homicides can only be assessed after controlling for the size of the CA state population by race. Hispanic/Latino residents are also the single largest ethnic group in CA (source: Statista; 39% in 2020), and about 64% of the population is BIPOC. On a per capita basis, the rate of persons killed by police is therefore somewhat higher for BIPOC, but with 11% of the persons killed being of unknown race, we can’t be sure.

There is, however, strong evidence for disparities for Black/African Americans. They comprise only 5% of the CA population, but 16% of the persons killed by police: they are over 3 times more likely to be killed by police on a per capita basis.
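The "over 3 times" figure follows from dividing the share of persons killed by the share of the population. The sketch below uses the counts from the race table in this report and the 5% population share stated in the text. It treats that share as exact and ignores the cases of unknown race, so it is a rough check rather than a formal per capita rate.

```r
# Figures quoted in this report
killed.black    <- 209   # Black/African American persons killed (race table)
killed.total    <- 1334  # all homicides by law enforcement since 2015
pop.share.black <- 0.05  # share of the CA population, as stated in the text

# Share of persons killed who are Black/African American
share.killed <- killed.black / killed.total

# Ratio of the share killed to the population share
disparity.ratio <- share.killed / pop.share.black

round(100 * share.killed, 1)
round(disparity.ratio, 1)
```

A ratio above 1 means the group is over-represented among persons killed relative to its share of the population.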

So why don’t we report the per capita rates instead?

Because, in addition to the unknown cases, there are other important sources of uncertainty in the race and ethnic classification of the data in Fatal Encounters:

  • Racial self-identification in official state population counts includes about 5% of people who report two or more races when asked. This multiple-race classification does not exist in the Fatal Encounters data on persons killed by police, and this complicates the detailed calculation of per capita rates by race.

  • Hispanic/Latinx is an ethnicity classification that crosses several racial groups, primarily White, Black and Native American. These cases are identified in Fatal Encounters as a distinct category in the race variable, rather than as a separate ethnicity classification. This also complicates the calculation of detailed per capita rates by race.

  3. Finally, it is worth noting that the race assigned to the victims in the Fatal Encounters data does not represent what the officer perceived that person’s race to be, so we can’t answer the question of intention, or explicit/implicit bias, with certainty.

By cause of death

Plot

homicides %>%
  mutate(codb = case_when(cod == "Gunshot" ~ "Gunshot",
                          TRUE ~ "Other")
         ) %>%
  count(codb) %>%
  mutate(perc = n / nrow(homicides)) %>%
  ggplot(aes(x=codb, 
             y = perc, 
             label = n)) +
  geom_text(aes(y = perc), size = 3, nudge_y = .025) +
  geom_bar(stat="identity", fill="blue", alpha=.5) +
  labs(title = "Fatalities by Cause of Death",
       caption = "California since 2015; y-axis=pct, bar label=count") +
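  xlab("Reported Weapon Used by Police") +
  ylab("Percent of Total") +
  # perc is a proportion (0-1); show the axis ticks as percentages
  scale_y_continuous(labels = scales::percent)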

Table

tab <- homicides %>%
  group_by(cod) %>%
  summarize(Number = n(),
            Percent = round(100*Number/nrow(homicides), 1)
  ) %>%
  arrange(desc(Number)) %>%
  bind_rows(data.frame(cod ="Total", 
                       Number = sum(.$Number), 
                       Percent = sum(.$Percent))) 

While gunshot is by far the most common cause of death, it is worth noting that the next most common – 19.4% – is vehicle-related, typically “hot pursuits”.

tab %>%
  kable(caption = "Breakdown by Cause of Death",
        col.names = c("Cause of Death", "Number", "Percent")) %>%
  kable_styling(bootstrap_options = c("striped","hover")) %>%
  row_spec(row=dim(tab)[1], bold = T)  %>%
  add_footnote(label = "Percents may not sum to 100 due to rounding",
               notation = "symbol")
Breakdown by Cause of Death

Cause of Death                       Number   Percent
Gunshot                                 973      72.9
Vehicle                                 259      19.4
Tasered                                  41       3.1
Asphyxiated/Restrained                   22       1.6
Medical emergency                        15       1.1
Drowned                                   9       0.7
Beaten/Bludgeoned with instrument         6       0.4
Fell from a height                        5       0.4
Unknown                                   2       0.1
Chemical agent/Pepper spray               1       0.1
Stabbed                                   1       0.1
Total                                  1334      99.9

* Percents may not sum to 100 due to rounding
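The Total row shows 99.9 rather than 100 because the table code sums percents that have already been rounded. The sketch below reproduces this in base R from the counts in the table above; the variable names are illustrative.

```r
# Counts by cause of death, copied from the table above
counts <- c(Gunshot = 973, Vehicle = 259, Tasered = 41,
            `Asphyxiated/Restrained` = 22, `Medical emergency` = 15,
            Drowned = 9, `Beaten/Bludgeoned with instrument` = 6,
            `Fell from a height` = 5, Unknown = 2,
            `Chemical agent/Pepper spray` = 1, Stabbed = 1)

# Round each percent to one decimal place, as the table code does
pct.rounded <- round(100 * counts / sum(counts), 1)

sum(counts)       # total number of cases
sum(pct.rounded)  # sum of the rounded percents, which need not equal 100
```

Computing the total percent from the total count instead of summing the rounded values would always give 100, at the cost of a Total row that no longer equals the sum of the rows shown.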

By County

Plot

homicides %>%
  group_by(county) %>%
  summarize(n = n(),
            perc = n / nrow(homicides)) %>%
  ggplot(aes(reorder(county, perc),
             y = perc, 
             label = n)) +
  geom_text(aes(y = perc), size = 3, nudge_y = .005) +
  geom_bar(stat="identity", fill="blue", alpha=.5) +
  labs(title = "County",
       caption = "California since 2015; x-axis=pct, bar label=count") +
  xlab("") +
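  ylab("Percent of Total") +
  # perc is a proportion (0-1); show the axis ticks as percentages
  scale_y_continuous(labels = scales::percent) +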
  coord_flip()

Table

homicides %>%
  group_by(county) %>%
  summarize(Number = n(),
            Percent = round(100*Number/nrow(homicides), 1)
  ) %>%
  DT::datatable(rownames = F,
                caption = "Breakdown by County")

Agency/PD involved

# tab <- homicides %>%
#   group_by(agency) %>%
#   summarize(Number = n(),
#             Percent = round(100*Number/nrow(homicides), 1)
#             ) %>%
#   arrange(desc(Number)) %>%
#   bind_rows(data.frame(agency ="Total", 
#                        Number = sum(.$Number), 
#                        Percent = sum(.$Percent))) 
# 
# tab %>%
#   kable(caption = "Breakdown by Agency/PD of Involved Officer",
#         col.names = c("Agency/PD", "Number", "Percent")) %>%
#   kable_styling(bootstrap_options = c("striped","hover")) %>%
#   row_spec(row=dim(tab)[1], bold = T) %>%
#   add_footnote(label = "Percents may not sum to 100 due to rounding",
#                notation = "symbol")

homicides %>%
  group_by(agency) %>%
  summarize(Number = n(),
            Percent = round(100*Number/nrow(homicides), 1)
  ) %>%
  DT::datatable(rownames = F,
                caption = "Breakdown by Agency/PD of Involved Officer")

Online information availability

This information comes from Fatal Encounters. It takes the form of a single url to a news article that is available online.

There may be multiple news articles available online for a case, and they may report conflicting details of the event, as well as conflicting perspectives on whether the homicide was justified. So the link provided here should be used as a starting place for research, not as a definitive description of the event.

The clickable urls are available in this report in the Interactive Map and Say their names sections.

tab <- homicides %>%
  mutate(url_info = case_when(url_info == "" ~ "No",
                               is.na(url_info) ~ "No",
                               TRUE ~ "Yes")) %>%
  group_by(url_info) %>%
  summarize(Number = n(),
            Percent = round(100*Number/nrow(homicides), 1)
  ) %>%
  bind_rows(data.frame(url_info ="Total", 
                       Number = sum(.$Number), 
                       Percent = sum(.$Percent))) 

tab %>%
  kable(caption = "URL for news article in Fatal Encounters",
        col.names = c("Availability", "Number", "Percent")) %>%
  kable_styling(bootstrap_options = c("striped","hover")) %>%
  row_spec(row=dim(tab)[1], bold = T) %>%
  add_footnote(label = "Percents may not sum to 100 due to rounding",
               notation = "symbol")
URL for news article in Fatal Encounters

Availability   Number   Percent
Yes              1334       100
Total            1334       100

* Percents may not sum to 100 due to rounding

By date

By Year

Note that 2021 is not complete; it is year to date as of 2021-02-21. There is also a lag in reporting of up to two weeks as information becomes available for a case.

p <- homicides %>%
  mutate(cod = ifelse(cod == "Gunshot", "shot", "other"),
         year = as.character(year)) %>%
  group_by(year, cod) %>%
  summarize(n = n()) %>%
  mutate(percent = round(100*n / sum(n), 1),
         year = ifelse(year == "2021", "2021 to date", year)) %>%
  ggplot(aes(x = year,
             y = n,
             text = percent,
             fill = cod)) +
  #geom_text(aes(y = n), size = 3, nudge_y = .025) +
  geom_bar(stat="identity", alpha=.5) +
  labs(title = "Year",
       caption = "California since 2015") +
  xlab("Year") +
  ylab("Number") +
  labs(fill = "Cause of\nDeath")

plotly::ggplotly(p, tooltip = "text")

Cumulative Totals by Month

The lines show the cumulative total fatalities by month as the year progresses, for each year. For the current year, we only plot months that have ended, to get the full monthly count.

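# we will only plot after current month has finished
curr_mo <- lubridate::month(Sys.Date())
# first day of the current month; using the full date avoids hard-coding the year
curr_mo_start <- lubridate::floor_date(Sys.Date(), "month")

homicides %>%
  filter(date < curr_mo_start) %>% 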
  group_by(year, month) %>%
  summarize(count = n()) %>%
  mutate(cumulative = cumsum(count)) %>%
  ggplot(aes(x = month, 
             y = cumulative, 
             color = factor(year),
             group = year)) +
  geom_line(size = 1.5, alpha = 0.5) +
  geom_point(size=1, alpha = 0.5) +
  labs(title = "Cumulative fatalities by Month and Year",
       caption = "California since 2015",
       color = "Year") +
  xlab("Month") +
  ylab("Number")+ 
  scale_color_brewer(palette="YlOrRd")
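The cumulative lines are built by counting cases per month and then taking a running sum within each year. The sketch below shows the same idea in base R on a small hypothetical data frame; the counts are made up for illustration and are not from Fatal Encounters.

```r
# Hypothetical monthly counts for two years (illustrative values only)
monthly <- data.frame(
  year  = c(2019, 2019, 2019, 2020, 2020, 2020),
  month = c(1, 2, 3, 1, 2, 3),
  count = c(20, 15, 25, 18, 22, 10)
)

# Sort so that months accumulate in calendar order within each year
monthly <- monthly[order(monthly$year, monthly$month), ]

# ave() applies cumsum() separately within each year
monthly$cumulative <- ave(monthly$count, monthly$year, FUN = cumsum)
monthly
```

The running sum restarts at the first month of each year, which is why each line in the plot begins again from its own January total.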

By Month

The bars represent monthly averages, across all years. For the current year, we only plot months that have ended, to get the full monthly count.

homicides %>%
  # keep only completed months; using the full date avoids hard-coding the year
  filter(date < lubridate::floor_date(Sys.Date(), "month")) %>% 
  group_by(month, year) %>%
  summarize(count = n()) %>%
  group_by(month) %>%
  summarize(avg.per.mo = mean(count)) %>%
  ggplot(aes(x = month, 
             y = avg.per.mo)) +
  geom_bar(stat="identity", fill = "blue", alpha = 0.5) +
  labs(title = "Average Fatalities by Month",
       caption = "California since 2015") +
  xlab("Month") +
  ylab("Average Number")
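The monthly averages are computed in two steps: count the cases in each month of each year, then average those counts across years for each calendar month. A minimal base R sketch on hypothetical counts (illustrative values only) is shown here.

```r
# Hypothetical monthly counts for three years (illustrative values only)
monthly <- data.frame(
  year  = rep(2018:2020, each = 2),
  month = rep(1:2, times = 3),
  count = c(20, 14, 18, 16, 22, 12)
)

# Average the yearly counts within each calendar month
avg.per.mo <- tapply(monthly$count, monthly$month, mean)
avg.per.mo
```

One caveat of this approach: a month with no cases in a given year produces no row in the grouped data, so it is left out of the average rather than counted as zero.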

Say their names

Name known

Of the 1334 persons killed by police, the names of 1231 victims are known at this time.

tempDF <- homicides %>%
  select(name, date, age=age, county, agency, url_click) %>%
  # shorten agency names; "University of California" must be replaced before
  # the shorter "California" pattern, or it would never match
  mutate(agency = gsub("University of California", "UC", agency),
         agency = gsub("California", "CA", agency),
         agency = gsub("County Sheriff's Office", "CSO", agency),
         agency = gsub("Police Department", "PD", agency),
         agency = gsub("Highway Patrol", "HP", agency),
         agency = gsub("Department of Fish and Wildlife", "Dept Fish & Wildlife", agency),
         agency = gsub("U.S. Bureau of Investigation", "US FBI", agency),
         agency = gsub("U.S. Customs and Border Protection", "US Customs & BP", agency),
         agency = gsub("U.S. Department of Homeland Security", "US Dept HS", agency),
         agency = gsub("U.S. Federal Bureau of Investigation", "US FBI", agency),
         agency = gsub("U.S. Drug Enforcement Administration", "US DEA", agency),
         agency = gsub("U.S. Immigration and Customs Enforcement", "US ICE", agency),
         agency = gsub("Marshals Service", "Marshals", agency))

tempDF %>%
  filter(name != "Unknown") %>%
  arrange(desc(date)) %>%
  DT::datatable(rownames = F,
                caption = paste("The Names We Know:  as of", scrape.date),
                filter = 'top',
                escape = FALSE)

Name Unknown

The names of the remaining 103 victims are not known at this time.

tempDF %>%
  filter(name == "Unknown") %>%
  arrange(desc(date)) %>%
  DT::datatable(rownames = F,
                caption = paste("The Names We Don't Know:  as of", scrape.date),
                filter = 'top',
                escape = FALSE)