In this report, we analyzed the gender, age, and ethnicity characteristics of residents killed by police from 2003-2021, as well as their geographic distribution using the Fatal Encounter Dataset collected by Brian Burghart of USC. Below is the data processing procedure and graphical representation.
library(tidyverse)
library(sf)
library(tmap)
library(leaflet)
library(here)
library(tidycensus)
library(sf)
library(tmap)
library(jsonlite)
library(tidyverse)
library(httr)
library(jsonlite)
library(knitr)
library(tigris)
library(places)
library(grid)
library(lubridate)
data <- read.csv("FATAL ENCOUNTERS DOT ORG SPREADSHEET (See Read me tab) - Form Responses.csv")
head(data)
## Unique.ID Name Age Gender Race
## 1 31495 Ashley McClendon 28 Female African-American/Black
## 2 31496 Name withheld by police Female Race unspecified
## 3 31497 Name withheld by police Male Race unspecified
## 4 31491 Johnny C. Martin Jr. 36 Male Race unspecified
## 5 31492 Dennis McHugh 44 Male European-American/White
## 6 31493 Ny'Darius McKinney 21 Male Race unspecified
## Race.with.imputations Imputation.probability
## 1 African-American/Black Not imputed
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## URL.of.image..PLS.NO.HOTLINKS.
## 1 https://fatalencounters.org/wp-content/uploads/2022/01/Ashley-McClendon.jpg
## 2
## 3
## 4
## 5
## 6
## Date.of.injury.resulting.in.death..month.day.year.
## 1 12/31/2021
## 2 12/31/2021
## 3 12/31/2021
## 4 12/30/2021
## 5 12/30/2021
## 6 12/30/2021
## Location.of.injury..address. Location.of.death..city. State
## 1 South Pearl Street and Tory Road Pageland SC
## 2 1500 21st Street Meridian MS
## 3 1500 21st Street Meridian MS
## 4 Martinez Lane Nicholls GA
## 5 435 E 4th Street Beaumont CA
## 6 State Rd S-29-296 & Bethel Rd Lancaster SC
## Location.of.death..zip.code. Location.of.death..county.
## 1 29728 Chesterfield
## 2 39301 Lauderdale
## 3 39301 Lauderdale
## 4 31554 Coffee
## 5 92223 Riverside
## 6 29720 Lancaster
## Full.Address Latitude
## 1 South Pearl Street and Tory Road Pageland SC 29728 Chesterfield 34.7452955
## 2 1500 21st Street Meridian MS 39301 Lauderdale 32.3793294
## 3 1500 21st Street Meridian MS 39301 Lauderdale 32.3793294
## 4 Martinez Lane Nicholls GA 31554 Coffee 31.5307934
## 5 400 E 4th Street Beaumont CA 92223 Riverside 33.9261462
## 6 State Rd S-29-296 & Bethel Rd Lancaster SC 29720 Lancaster 34.6608217
## Longitude
## 1 -80.39306
## 2 -88.69397
## 3 -88.69397
## 4 -82.63782
## 5 -116.97715
## 6 -80.83714
## Agency.or.agencies.involved
## 1 Pageland Police Department
## 2 Meridian Police Department
## 3 Meridian Police Department
## 4 Coffee County Sheriff's Office
## 5 Riverside County Sheriff's Department, Beaumont Police Department, Banning Police Department
## 6 South Carolina Law Enforcement Division
## Highest.level.of.force UID.Temporary Name.Temporary Armed.Unarmed
## 1 Vehicle NA
## 2 Gunshot NA
## 3 Gunshot NA
## 4 Gunshot NA
## 5 Gunshot NA
## 6 Vehicle NA
## Alleged.weapon Aggressive.physical.movement Fleeing.Not.fleeing
## 1
## 2
## 3
## 4
## 5
## 6
## Description.Temp URL.Temp
## 1
## 2
## 3
## 4
## 5
## 6
## Brief.description
## 1 Ashley McClendon's boyfriend, 33-year-old Marcus Allen Davis, allegedly was driving a 1996 Ford Coupe back to Pageland when an officer reportedly saw the car run a stop sign before midnight on Dec. 31. A traffic stop was attempted, but Davis refused to stop. As the car fled down South Pearl Street near Tory Road it left the road and struck a tree, killing passenger McClendon.
## 2 Police responded to a man causing a disturbance who was covered in blood. The man had a gun in each hand at a home. Once officers arrived, they were met with gunfire. Officers responded back with gunfire. A man and woman were killed.
## 3 Police responded to a man causing a disturbance who was covered in blood. The man had a gun in each hand at a home. Once officers arrived, they were met with gunfire. Officers responded back with gunfire. A man and woman were killed.
## 4 Johnny C. Martin, Jr. arrived at a gas station at 7:10 p.m. While at the gas station, Martin allegedly carjacked a woman, shooting at her while stealing her car. The Ware County Sheriff's Office and the Georgia State Patrol found the stolen car driven by Martin and pursued. Martin eventually went off the road into a field. When officers approached the car, they found Martin with a fatal gunshot to the head and a gun in his hand.
## 5 Deputies responded to a domestic violence call. When deputies arrived, the man was gone. The man reportedly had a felony warrant for a violation of an assault with a deadly weapon, domestic violence, kidnapping, vandalism, and a domestic violence restraining order violation. Deputies learned the suspect was in the city of Beaumont. A helicopter found the suspect's car. When officers located the man, he tried to flee by ramming his car into other cars. An officer and deputy shot and killed him.
## 6 About 5:35 p.m., Joseph Jemar Hinson was allegedly driving a car when police tried to pull him over, and he fled. Police pursued him until he ran off the left side of the roadway, struck a fence and wrecked, killing back seat passenger Ny'Darius McKinney. Hinson was charged with failure to stop for a blue light resulting in death.
## Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS
## 1 Criminal
## 2 Pending investigation
## 3 Pending investigation
## 4 Suicide
## 5 Pending investigation
## 6 Criminal
## Intended.use.of.force..Developing.
## 1 Pursuit
## 2 Deadly force
## 3 Deadly force
## 4 Suicide
## 5 Deadly force
## 6 Pursuit
## Supporting.document.link
## 1 https://www.wsoctv.com/news/1-person-dead-after-attempting-escape-police-troopers-say/QXA244QPUZGJ5GAGRADGDWBAEU/
## 2 https://www.wtok.com/2022/01/01/officer-involved-shooting/
## 3 https://www.wtok.com/2022/01/01/officer-involved-shooting/
## 4 https://gbi.georgia.gov/press-releases/2021-12-31/gbi-perry-and-douglas-offices-investigating-related-shootings
## 5 https://kesq.com/news/2021/12/31/officer-involved-shooting-unfolds-in-beaumont/
## 6 https://www.thelancasternews.com/content/21-year-old-man-killed-when-car-fleeing-police-crashes
## Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS X X.1
## 1 No NA NA
## 2 No NA NA
## 3 No NA NA
## 4 No NA NA
## 5 No NA NA
## 6 No NA NA
## Unique.ID.formula Unique.identifier..redundant.
## 1 NA 31495
## 2 NA 31496
## 3 NA 31497
## 4 NA 31491
## 5 NA 31492
## 6 NA 31493
allstate <- suppressMessages(
get_acs(geography = "state", # or "block group", "county", "state" etc.
variables = c(hhincome = 'B19019_001',
race.tot = "B02001_001",
race.white = "B02001_002",
race.black = "B02001_003"
),
year = 2021,
survey = "acs5", # American Community Survey 5-year estimate
geometry = TRUE, # returns sf objects
output = "wide") # wide vs. long
)
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 3%
|
|=== | 4%
|
|=== | 5%
|
|===== | 6%
|
|===== | 7%
|
|======= | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 13%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 16%
|
|============ | 17%
|
|============= | 18%
|
|============= | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 22%
|
|================ | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 27%
|
|==================== | 28%
|
|===================== | 29%
|
|===================== | 30%
|
|====================== | 32%
|
|======================= | 33%
|
|======================== | 35%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 40%
|
|============================= | 42%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================== | 49%
|
|==================================== | 51%
|
|===================================== | 53%
|
|====================================== | 54%
|
|======================================= | 56%
|
|======================================== | 57%
|
|========================================= | 58%
|
|========================================== | 60%
|
|=========================================== | 62%
|
|============================================ | 63%
|
|============================================= | 64%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 69%
|
|================================================= | 70%
|
|================================================== | 71%
|
|=================================================== | 72%
|
|==================================================== | 74%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 79%
|
|=========================================================== | 84%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 87%
|
|============================================================== | 88%
|
|=============================================================== | 90%
|
|================================================================ | 91%
|
|================================================================= | 92%
|
|================================================================== | 94%
|
|=================================================================== | 96%
|
|==================================================================== | 97%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 100%
allstate <- allstate %>%
select(GEOID,
name = NAME,
hhincome = hhincomeE,
race.tot = race.totE,
race.white = race.whiteE,
race.black = race.blackE
)
# Delete Duplicated Rows
data_unique <- data %>%
distinct(Unique.ID, .keep_all=T)
#to see if there is NA in key variables(Age, Gender, Race, Date,State, Lati and Logi)
data_unique[data_unique == ""] <- NA
data_unique %>%
map_dbl(., function(x) sum(is.na(x)))
## Unique.ID
## 1
## Name
## 0
## Age
## 1221
## Gender
## 144
## Race
## 1
## Race.with.imputations
## 868
## Imputation.probability
## 884
## URL.of.image..PLS.NO.HOTLINKS.
## 16773
## Date.of.injury.resulting.in.death..month.day.year.
## 0
## Location.of.injury..address.
## 556
## Location.of.death..city.
## 36
## State
## 1
## Location.of.death..zip.code.
## 182
## Location.of.death..county.
## 15
## Full.Address
## 1
## Latitude
## 1
## Longitude
## 1
## Agency.or.agencies.involved
## 78
## Highest.level.of.force
## 4
## UID.Temporary
## 25969
## Name.Temporary
## 25969
## Armed.Unarmed
## 14419
## Alleged.weapon
## 14421
## Aggressive.physical.movement
## 14418
## Fleeing.Not.fleeing
## 14419
## Description.Temp
## 27431
## URL.Temp
## 28281
## Brief.description
## 2
## Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS
## 3
## Intended.use.of.force..Developing.
## 3
## Supporting.document.link
## 2
## Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS
## 62
## X
## 31498
## X.1
## 31497
## Unique.ID.formula
## 31496
## Unique.identifier..redundant.
## 1
#select data
data_select<-data_unique %>%
select(Unique.ID,
Name,
Age,
Gender,
Race,
Date=Date.of.injury.resulting.in.death..month.day.year.,
State,
City=Location.of.death..city.,
Latitude,
Longitude,
Cause=Highest.level.of.force
)
#Drop NA
data_cleaned <- na.omit(data_select)
nrow(data_cleaned)
## [1] 30234
#Count the event happened per year
data$Year<-year(as.Date(data$Date.of.injury.resulting.in.death..month.day.year., format = "%m/%d/%Y"))
DeathByYear<-data%>%
group_by(Year) %>%
count()
#visualization
# Create a bar plot
ggplot(DeathByYear, aes(x = Year, y = n)) +
geom_bar(stat = "identity", fill = "darkgray") +
geom_text(aes(label = n), vjust = -0.5, color = "black", size = 2) +
labs(title = "Death by Year", x = "Year", y = "Number of Deaths") +
theme_minimal()
# Calculate the counts and arrange them in descending order
DeathByRace <- data_cleaned %>%
group_by(Race) %>%
count() %>%
filter(Race != "Race unspecified") %>%
arrange(desc(n)) %>%
head(6)
# Create a bar plot with y-axis labels in double lines without changing x and y aesthetics
ggplot(DeathByRace, aes(x = n, y = fct_reorder(Race, n))) +
geom_bar(stat = "identity", fill = "darkblue") +
geom_text(aes(label = n), hjust = -0.2, color = "black", size = 2) +
labs(title = "Top 6 Race Group for the Deceased from 2003-2021", x = "Number of Deaths", y = NULL) +
theme_minimal() +
theme(axis.text.y = element_text(size = 10),
axis.title.y = element_text(size = 12, margin = margin(b = 10))) +
scale_y_discrete(labels = function(y) str_wrap(y, width = 8),expand = c(0.3, 0))
library(RColorBrewer)
colors <- brewer.pal(n = 3, name = "Dark2")
data_cleaned <- data_cleaned %>%
mutate(Age = as.numeric(Age))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion
# Define age bins and create AgeGroup column
data_cleaned <- data_cleaned %>%
mutate(AgeGroup = cut(Age, breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 100), labels = c("0-10", "11-20", "21-30", "31-40", "41-50", "51-60", "61-70", "Above 70"), right = FALSE))
# Group by AgeGroup and Gender, then count
DeathByAgeGroupAndGender <- data_cleaned %>%
group_by(AgeGroup, Gender) %>%
summarise(Count = n(), .groups = 'drop')
# Create a bar plot with fixed bar width and custom colors
ggplot(DeathByAgeGroupAndGender, aes(x = AgeGroup, y = Count, fill = Gender)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
labs(title = "Counts of Deaths by Age Group and Gender 2003-2021",
x = "Age Group", y = "Count", fill = "Gender") +
scale_fill_manual(values = colors) + # Apply custom color palette
theme_minimal()
# Calculate the counts and arrange them in descending order
DeathByCause <- data_cleaned %>%
group_by(Cause) %>%
count() %>%
arrange(desc(n)) %>%
head(10)
# Create a bar plot with adjusted margins
ggplot(DeathByCause, aes(x = n, y = fct_reorder(Cause, n))) +
geom_bar(stat = "identity", fill = "darkblue") +
geom_text(aes(label = n), hjust = -0.1, color = "black", size = 2) +
labs(title = "Top 10 Cause for the Death from 2003-2021", x = "Number of Deaths", y = NULL) + #
theme_minimal() +
theme(axis.text.y = element_text(size = 9),
axis.title.y = element_text(size = 12, margin = margin(b = 10)),
plot.margin = margin(0.5, 0, 0.5, 0.5, "cm")) +
scale_y_discrete(expand = c(0.1, 0))
load("StateCodeTable.RData")
# Group by State
DeathByState <- data_cleaned %>%
group_by(State) %>%
count()
# Merge table
DeathByState1 <- merge(DeathByState, StateCodeTable, by.x = "State", by.y = "STUSAB", all = FALSE)
Death_All_State <- merge(DeathByState1, allstate, by.x = "STATE_NAME", by.y = "name",all = FALSE)
# Calculate Race Percent
Death_All_State$PctWhite<-Death_All_State$race.white/Death_All_State$race.tot
Death_All_State$PctBlack<-Death_All_State$race.black/Death_All_State$race.tot
# histogram
ggplot(Death_All_State) +
geom_histogram(mapping = aes(x = n),
bins = 60,
color="black") #<< color of the outline
sf_Death_All_State <- st_as_sf(Death_All_State)
sf_Death_All_State_crs <- st_transform(sf_Death_All_State, crs=4326)
tm_shape(sf_Death_All_State_crs) + tm_polygons("n")
The analysis of police interactions resulting in death in the United States from 2003 to 2021 reveals several significant trends.
Over the years, there has been a consistent increase in the number of people killed during interactions with the police, indicating a concerning trend.
The majority of individuals killed during these interactions are of European-American/white ethnicity, followed by African-American/Black individuals, with Hispanic/Latino individuals ranking third in terms of fatalities.
The age group most affected by these fatalities falls within the range of 21 to 50 years, with a significant imbalance between male and female victims, with males experiencing a substantially higher mortality rate.
The leading causes of death during police interactions are gunshot wounds and incidents involving vehicles, significantly outnumbering other causes such as taser use.
While most U.S. states report fewer than 1500 deaths during this period, three states stand out as outliers. Notably, California has the highest number of deaths, exceeding 4500, prompting further examination of the association between the number of people killed during police interactions and various census variables in this state.
In this section, our primary focus centers on individuals who lost their lives during police interactions within the state of California. We have compiled the death by each tract, and conducted analysis to explore potential associations between the number of death and several key variables, including the degree of gentrification (as measured by "median household expense" and "median housing tax"), median household income, and the percentage of white residents.
CA_County <- suppressMessages(
get_acs(geography = "tract", # or "block group", "county", "state" etc.
state = "CA",
variables = c(hhincome = 'B19019_001',
race.tot = "B02001_001",
race.white = "B02001_002",
race.black = "B02001_003",
# Community Environment
med_housexp = "B25104_001",
med_realestate_taxes = "B25103_001"
),
year = 2020,
survey = "acs5", # American Community Survey 5-year estimate
geometry = TRUE, # returns sf objects
output = "wide") # wide vs. long
)
##
|
| | 0%
|
|= | 1%
|
|== | 2%
|
|== | 3%
|
|=== | 4%
|
|==== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 12%
|
|========= | 13%
|
|========== | 14%
|
|=========== | 16%
|
|============ | 17%
|
|============= | 18%
|
|============= | 19%
|
|============== | 20%
|
|=============== | 22%
|
|================ | 23%
|
|================= | 24%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 27%
|
|==================== | 28%
|
|==================== | 29%
|
|====================== | 32%
|
|======================== | 34%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 37%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 40%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 43%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 47%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 50%
|
|==================================== | 51%
|
|==================================== | 52%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 57%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 60%
|
|=========================================== | 61%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 70%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 79%
|
|======================================================== | 80%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 83%
|
|=========================================================== | 84%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 87%
|
|============================================================== | 88%
|
|=============================================================== | 90%
|
|================================================================= | 92%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 96%
|
|======================================================================| 100%
CA_tract <- CA_County %>%
select(GEOID,
hhincome = hhincomeE,
race.tot = race.totE,
race.white = race.whiteE,
race.black = race.blackE,
Med_HHExp = med_housexpE,
med_RETaxes = med_realestate_taxesE
)
CA_tract$PctWhite=CA_tract$race.white/CA_tract$race.tot
CA_tract$PctBlack=CA_tract$race.black/CA_tract$race.tot
# Convert "Latitude" and "Longitude" columns to numeric
data_cleaned$Latitude<-as.numeric(data_cleaned$Latitude)
## Warning: NAs introduced by coercion
data_cleaned$Longitude<-as.numeric(data_cleaned$Longitude)
# choose all cases in CA
Death_CA <- data_cleaned[data_cleaned$State == "CA", ]
# make the death data in CA to sf
sf_Death_CA <- Death_CA %>%
st_as_sf(coords=c("Longitude", "Latitude"), crs = 4326)
# Delete rows fall outside of the boundary of CA
CA <- places('CA')
## Retrieving data for the year 2021
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 17%
|
|============= | 18%
|
|============= | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================= | 24%
|
|================= | 25%
|
|=================== | 27%
|
|====================== | 31%
|
|======================= | 32%
|
|======================= | 33%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 40%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 43%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 47%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|====================================== | 54%
|
|======================================= | 56%
|
|========================================= | 58%
|
|============================================ | 63%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 90%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 97%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
CA_Geom <- st_transform(CA, crs=4326)
Death_in_CA <- st_intersection(sf_Death_CA, CA_Geom)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
tm_shape(Death_in_CA) + tm_dots(col="red")
## Transform their CRS into the same
CA_tract_Geom <- st_transform(CA_tract, crs=4326)
Death_in_tract <- st_join(CA_tract_Geom, Death_in_CA, join = st_intersects)
# Now count the Death Cases by tract
Death_count_tract <- count(as_tibble(Death_in_tract), GEOID.x)
# Join tract geometry with the number of Death in tract
test <- st_join(CA_tract_Geom, Death_in_tract %>% mutate(count = 1))
out <- test %>%
group_by(GEOID.x) %>%
summarise(count = sum(count, na.rm = T))
## Join the counts of Death to the Tract data
Death_in_tract_join <- CA_tract_Geom %>%
left_join(out %>% st_set_geometry(NULL), by = c("GEOID"="GEOID.x"))
## check NA in ariables
#Turn the NA in 'count' to 0
Death_in_tract_join$Death<- ifelse(Death_in_tract_join$count>0, Death_in_tract_join$count, 0)
# Dropping the missing values
Death_dropna1Income <- Death_in_tract_join[!is.na(Death_in_tract_join$hhincome),]
Death_dropna2Race <- Death_dropna1Income[!is.na(Death_dropna1Income$PctWhite),]
Death_dropna3Exp <- Death_dropna2Race[!is.na(Death_dropna2Race$Med_HHExp),]
Death_dropna4Tax <- Death_dropna3Exp[!is.na(Death_dropna3Exp$med_RETaxes),]
#check the result
# To check if it is still a sf file
#Visualization in map
tm_shape(Death_dropna4Tax) + tm_polygons(col="Death")
# Define breaks and labels for creating categories
breaks <- c(0, 5000, 10000, 15000, 20000) # Define your own break points
labels <- c("Low", "Moderate", "High", "Very High") # Labels for categories
# Create a new column with categorized tax levels
Death_dropna4Tax$TaxLevel <- cut(Death_dropna4Tax$med_RETaxes, breaks = breaks, labels = labels)
# Create a scatter plot with custom aesthetics
ggplot(data = Death_dropna4Tax, aes(x = Med_HHExp, y = Death, color = TaxLevel)) +
geom_point(size = 1.8, shape = 16) + # Set point size and shape
labs(title = "Scatter Plot of Household Expenditure vs. Deaths by House Tax Level",
x = "Median Household Expenditure",
y = "Number of Deaths",
color = "House Tax Level") +
scale_color_manual(values = c("Low" = "gray50", "Moderate" = "lightgreen", "High" = "orange", "Very High" = "red")) + # Set custom colors
ggdark::dark_theme_gray()
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().
Death_dropna4Tax %>%
mutate(PctWhite_cut = cut(PctWhite, breaks = quantile(PctWhite, prob = c(0,0.5,0.75,1)), include.lowest=TRUE)) %>%
ggplot(data = ., aes(x = hhincome, y = count)) +
geom_point(mapping = aes(color = PctWhite_cut)) +
scale_color_manual(values = c("gray50", "orange", "red"), labels = c("0 - 50th", "50th- 75th", "75th - 100th")) +
labs(x = "Annual Household Income", y = "Death Cases", color = "%White Residents (discrete)", title = "Household Income vs. Death Count") +
ggdark::dark_theme_gray() +
# ------------------------------------------------------------------
# This line of code adds the correlation analysis result to the plot
ggpubr::stat_cor(method = "pearson", label.x = 160000, label.y = -1.5)
a <- ggplot(data = Death_dropna4Tax) +
geom_boxplot(aes(x = Death, y = TaxLevel),
fill = "white", color = "black") +
labs(title = "Boxplot of Deaths by Housing Tax Level",
x= "Number of Death")
# Create custom breaks and labels for PctWhite
custom_breaks <- quantile(Death_dropna4Tax$PctWhite, prob = c(0, 0.5, 0.75, 1), include.lowest = TRUE)
custom_labels <- c("0 - 50th", "50th - 75th", "75th - 100th")
# Create a new variable for y-axis labels
Death_dropna4Tax$YAxisLabels <- cut(Death_dropna4Tax$PctWhite, breaks = custom_breaks, labels = custom_labels)
# Create the boxplot with custom y-axis labels and exclude NA values
b <- ggplot(data = Death_dropna4Tax) +
geom_boxplot(aes(x = Death, y = YAxisLabels), fill = "white", color = "black", na.rm = TRUE) +
labs(title = "Boxplot of Deaths by White Resident Percentage (Quantile)",
x = "Number of Deaths",
y = "White Resident Percentage") +
scale_y_discrete(labels = custom_labels)
gridExtra::grid.arrange(a, b)
Death_dropna4Tax$PctDeath<-Death_dropna4Tax$Death/sum(Death_dropna4Tax$Death)
regress_deathVshhincome <- lm(Death_dropna4Tax$PctDeath~Death_dropna4Tax$hhincome)
summary(regress_deathVshhincome)
##
## Call:
## lm(formula = Death_dropna4Tax$PctDeath ~ Death_dropna4Tax$hhincome)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.881e-05 -3.324e-05 -1.551e-05 5.760e-06 1.061e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.312e-04 1.788e-06 73.37 <2e-16 ***
## Death_dropna4Tax$hhincome -1.894e-10 1.864e-11 -10.16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.141e-05 on 8710 degrees of freedom
## Multiple R-squared: 0.01172, Adjusted R-squared: 0.0116
## F-statistic: 103.3 on 1 and 8710 DF, p-value: < 2.2e-16
regress_deathVsPctWhite <- lm(Death_dropna4Tax$PctDeath~Death_dropna4Tax$PctWhite)
summary(regress_deathVsPctWhite)
##
## Call:
## lm(formula = Death_dropna4Tax$PctDeath ~ Death_dropna4Tax$PctWhite)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.026e-04 -3.294e-05 -1.619e-05 5.010e-06 1.060e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.332e-04 2.229e-06 59.745 <2e-16 ***
## Death_dropna4Tax$PctWhite -3.182e-05 3.626e-06 -8.776 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.151e-05 on 8710 degrees of freedom
## Multiple R-squared: 0.008766, Adjusted R-squared: 0.008652
## F-statistic: 77.02 on 1 and 8710 DF, p-value: < 2.2e-16
Both the intercept and the hhincome variable are highly significant becayse p-values < 2e-16).The negative coefficient for hhincome suggests that there is a negative relationship between hhincome and PctDeath. However, the effect size is extremely small.
Both the intercept and the PctWhite variable are highly significant beacuse p-values < 2e-16. The negative coefficient for PctWhite suggests that there is a negative relationship between PctWhite and PctDeath. However, the effect size is also very small.
regress_deathVsTax <- lm(Death_dropna4Tax$PctDeath~Death_dropna4Tax$med_RETaxes)
summary(regress_deathVsTax)
##
## Call:
## lm(formula = Death_dropna4Tax$PctDeath ~ Death_dropna4Tax$med_RETaxes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.678e-05 -3.429e-05 -1.542e-05 5.150e-06 1.070e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.250e-04 1.614e-06 77.465 < 2e-16 ***
## Death_dropna4Tax$med_RETaxes -2.346e-09 3.250e-10 -7.217 5.75e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.162e-05 on 8710 degrees of freedom
## Multiple R-squared: 0.005945, Adjusted R-squared: 0.005831
## F-statistic: 52.09 on 1 and 8710 DF, p-value: 5.746e-13
regress_deathVsHHexp <- lm(Death_dropna4Tax$PctDeath~Death_dropna4Tax$Med_HHExp)
summary(regress_deathVsHHexp)
##
## Call:
## lm(formula = Death_dropna4Tax$PctDeath ~ Death_dropna4Tax$Med_HHExp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.269e-05 -3.252e-05 -1.692e-05 4.280e-06 1.077e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.018e-04 2.213e-06 46.001 < 2e-16 ***
## Death_dropna4Tax$Med_HHExp 8.854e-09 1.418e-09 6.243 4.48e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.167e-05 on 8710 degrees of freedom
## Multiple R-squared: 0.004455, Adjusted R-squared: 0.004341
## F-statistic: 38.98 on 1 and 8710 DF, p-value: 4.482e-10
Model 3 (PctDeath ~ med_RETaxes):
Both the intercept and the med_RETaxes variable are highly significant (p-values < 0.001).The negative coefficient for med_RETaxes suggests a negative relationship between med_RETaxes and PctDeath, but the effect size is extremely small.
Model 4 (PctDeath ~ Med_HHExp):
Both the intercept and the Med_HHExp variable are highly significant (p-values < 0.001).The positive coefficient for Med_HHExp suggests a positive relationship between Med_HHExp and PctDeath, but again, the effect size is very small.
This analysis investigated the relationships between various factors and the percentage of deaths during police interactions in each tract (PctDeath) within California tracts, employing four distinct models.
Overall, the analysis indicates statistically significant but minimal associations between median household income, percentage of white residents, degree of gentrification(variables "med_RETaxes" and "Med_HHExp" serve as indicators) and PctDeath in California tracts.
While these findings provide insights, more complex factors likely contribute to police interactions and their outcomes in specific tracts. Further research is essential for a comprehensive understanding.
```