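library(dplyr)      # %>% and select()
library(skimr)      # skim()
library(ggplot2)    # plots
library(lubridate)  # year()
setwd("C:/Users/49765/Desktop/Urban Analytics/major1")
# Read CSV file
data <- read.csv("FATAL ENCOUNTERS DOT ORG SPREADSHEET (See Read me tab) - Form Responses.csv")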
head(data)
## Unique.ID Name Age Gender Race
## 1 31495 Ashley McClendon 28 Female African-American/Black
## 2 31496 Name withheld by police Female Race unspecified
## 3 31497 Name withheld by police Male Race unspecified
## 4 31491 Johnny C. Martin Jr. 36 Male Race unspecified
## 5 31492 Dennis McHugh 44 Male European-American/White
## 6 31493 Ny'Darius McKinney 21 Male Race unspecified
## Race.with.imputations Imputation.probability
## 1 African-American/Black Not imputed
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## URL.of.image..PLS.NO.HOTLINKS.
## 1 https://fatalencounters.org/wp-content/uploads/2022/01/Ashley-McClendon.jpg
## 2
## 3
## 4
## 5
## 6
## Date.of.injury.resulting.in.death..month.day.year.
## 1 12/31/2021
## 2 12/31/2021
## 3 12/31/2021
## 4 12/30/2021
## 5 12/30/2021
## 6 12/30/2021
## Location.of.injury..address. Location.of.death..city. State
## 1 South Pearl Street and Tory Road Pageland SC
## 2 1500 21st Street Meridian MS
## 3 1500 21st Street Meridian MS
## 4 Martinez Lane Nicholls GA
## 5 435 E 4th Street Beaumont CA
## 6 State Rd S-29-296 & Bethel Rd Lancaster SC
## Location.of.death..zip.code. Location.of.death..county.
## 1 29728 Chesterfield
## 2 39301 Lauderdale
## 3 39301 Lauderdale
## 4 31554 Coffee
## 5 92223 Riverside
## 6 29720 Lancaster
## Full.Address Latitude
## 1 South Pearl Street and Tory Road Pageland SC 29728 Chesterfield 34.7452955
## 2 1500 21st Street Meridian MS 39301 Lauderdale 32.3793294
## 3 1500 21st Street Meridian MS 39301 Lauderdale 32.3793294
## 4 Martinez Lane Nicholls GA 31554 Coffee 31.5307934
## 5 400 E 4th Street Beaumont CA 92223 Riverside 33.9261462
## 6 State Rd S-29-296 & Bethel Rd Lancaster SC 29720 Lancaster 34.6608217
## Longitude
## 1 -80.39306
## 2 -88.69397
## 3 -88.69397
## 4 -82.63782
## 5 -116.97715
## 6 -80.83714
## Agency.or.agencies.involved
## 1 Pageland Police Department
## 2 Meridian Police Department
## 3 Meridian Police Department
## 4 Coffee County Sheriff's Office
## 5 Riverside County Sheriff's Department, Beaumont Police Department, Banning Police Department
## 6 South Carolina Law Enforcement Division
## Highest.level.of.force UID.Temporary Name.Temporary Armed.Unarmed
## 1 Vehicle NA
## 2 Gunshot NA
## 3 Gunshot NA
## 4 Gunshot NA
## 5 Gunshot NA
## 6 Vehicle NA
## Alleged.weapon Aggressive.physical.movement Fleeing.Not.fleeing
## 1
## 2
## 3
## 4
## 5
## 6
## Description.Temp URL.Temp
## 1
## 2
## 3
## 4
## 5
## 6
## Brief.description
## 1 Ashley McClendon's boyfriend, 33-year-old Marcus Allen Davis, allegedly was driving a 1996 Ford Coupe back to Pageland when an officer reportedly saw the car run a stop sign before midnight on Dec. 31. A traffic stop was attempted, but Davis refused to stop. As the car fled down South Pearl Street near Tory Road it left the road and struck a tree, killing passenger McClendon.
## 2 Police responded to a man causing a disturbance who was covered in blood. The man had a gun in each hand at a home. Once officers arrived, they were met with gunfire. Officers responded back with gunfire. A man and woman were killed.
## 3 Police responded to a man causing a disturbance who was covered in blood. The man had a gun in each hand at a home. Once officers arrived, they were met with gunfire. Officers responded back with gunfire. A man and woman were killed.
## 4 Johnny C. Martin, Jr. arrived at a gas station at 7:10 p.m. While at the gas station, Martin allegedly carjacked a woman, shooting at her while stealing her car. The Ware County Sheriff's Office and the Georgia State Patrol found the stolen car driven by Martin and pursued. Martin eventually went off the road into a field. When officers approached the car, they found Martin with a fatal gunshot to the head and a gun in his hand.
## 5 Deputies responded to a domestic violence call. When deputies arrived, the man was gone. The man reportedly had a felony warrant for a violation of an assault with a deadly weapon, domestic violence, kidnapping, vandalism, and a domestic violence restraining order violation. Deputies learned the suspect was in the city of Beaumont. A helicopter found the suspect's car. When officers located the man, he tried to flee by ramming his car into other cars. An officer and deputy shot and killed him.
## 6 About 5:35 p.m., Joseph Jemar Hinson was allegedly driving a car when police tried to pull him over, and he fled. Police pursued him until he ran off the left side of the roadway, struck a fence and wrecked, killing back seat passenger Ny'Darius McKinney. Hinson was charged with failure to stop for a blue light resulting in death.
## Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS
## 1 Criminal
## 2 Pending investigation
## 3 Pending investigation
## 4 Suicide
## 5 Pending investigation
## 6 Criminal
## Intended.use.of.force..Developing.
## 1 Pursuit
## 2 Deadly force
## 3 Deadly force
## 4 Suicide
## 5 Deadly force
## 6 Pursuit
## Supporting.document.link
## 1 https://www.wsoctv.com/news/1-person-dead-after-attempting-escape-police-troopers-say/QXA244QPUZGJ5GAGRADGDWBAEU/
## 2 https://www.wtok.com/2022/01/01/officer-involved-shooting/
## 3 https://www.wtok.com/2022/01/01/officer-involved-shooting/
## 4 https://gbi.georgia.gov/press-releases/2021-12-31/gbi-perry-and-douglas-offices-investigating-related-shootings
## 5 https://kesq.com/news/2021/12/31/officer-involved-shooting-unfolds-in-beaumont/
## 6 https://www.thelancasternews.com/content/21-year-old-man-killed-when-car-fleeing-police-crashes
## Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS X X.1
## 1 No NA NA
## 2 No NA NA
## 3 No NA NA
## 4 No NA NA
## 5 No NA NA
## 6 No NA NA
## Unique.ID.formula Unique.identifier..redundant.
## 1 NA 31495
## 2 NA 31496
## 3 NA 31497
## 4 NA 31491
## 5 NA 31492
## 6 NA 31493
After perusing the data, I removed the following columns, which I will not use in the analysis.
# Delete specified multiple columns
cleaned_data <- data %>%
select(-c(
"Race.with.imputations",
"Imputation.probability",
"URL.of.image..PLS.NO.HOTLINKS.",
"UID.Temporary",
"Name.Temporary",
"Description.Temp",
"URL.Temp",
"Unique.ID.formula",
"Unique.identifier..redundant.",
"X",
"X.1",
"Location.of.death..zip.code.",
"Brief.description",
"Supporting.document.link",
"Agency.or.agencies.involved",
"Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS",
"Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS"
))
skim(cleaned_data)
| Name | cleaned_data |
| Number of rows | 31498 |
| Number of columns | 19 |
| _______________________ | |
| Column type frequency: | |
| character | 17 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Name | 0 | 1 | 4 | 82 | 0 | 29859 | 0 |
| Age | 0 | 1 | 0 | 5 | 1221 | 112 | 0 |
| Gender | 0 | 1 | 0 | 11 | 144 | 4 | 0 |
| Race | 0 | 1 | 0 | 57 | 1 | 12 | 0 |
| Date.of.injury.resulting.in.death..month.day.year. | 0 | 1 | 10 | 10 | 0 | 7736 | 0 |
| Location.of.injury..address. | 0 | 1 | 0 | 74 | 556 | 28893 | 0 |
| Location.of.death..city. | 0 | 1 | 0 | 30 | 36 | 6340 | 0 |
| State | 0 | 1 | 0 | 2 | 1 | 52 | 0 |
| Location.of.death..county. | 0 | 1 | 0 | 33 | 15 | 1536 | 0 |
| Full.Address | 0 | 1 | 0 | 103 | 1 | 29709 | 0 |
| Latitude | 0 | 1 | 0 | 17 | 1 | 29515 | 0 |
| Highest.level.of.force | 0 | 1 | 0 | 33 | 4 | 19 | 0 |
| Armed.Unarmed | 0 | 1 | 0 | 19 | 14419 | 10 | 0 |
| Alleged.weapon | 0 | 1 | 0 | 35 | 14421 | 269 | 0 |
| Aggressive.physical.movement | 0 | 1 | 0 | 42 | 14418 | 32 | 0 |
| Fleeing.Not.fleeing | 0 | 1 | 0 | 42 | 14419 | 26 | 0 |
| Intended.use.of.force..Developing. | 0 | 1 | 0 | 22 | 3 | 9 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Unique.ID | 1 | 1 | 15749.0 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
| Longitude | 1 | 1 | -95.4 | 16.30 | -165.59 | -111 | -90.56 | -82.57 | -67.27 | ▁▁▅▇▇ |
Going through the skim output, I found that many values are missing. Since each row represents a life lost, I didn’t want to simply drop the rows with missing values. Instead, I first filled all NA values and empty strings with "NoData".
# Replace all NA values with "NoData"
cleaned_data[is.na(cleaned_data)] <- "NoData"
# Replace all empty cells with "NoData"
cleaned_data[cleaned_data == ""] <- "NoData"
skim(cleaned_data)
| Name | cleaned_data |
| Number of rows | 31498 |
| Number of columns | 19 |
| _______________________ | |
| Column type frequency: | |
| character | 19 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Unique.ID | 0 | 1 | 1 | 6 | 0 | 31498 | 0 |
| Name | 0 | 1 | 4 | 82 | 0 | 29859 | 0 |
| Age | 0 | 1 | 1 | 6 | 0 | 112 | 0 |
| Gender | 0 | 1 | 4 | 11 | 0 | 4 | 0 |
| Race | 0 | 1 | 6 | 57 | 0 | 12 | 0 |
| Date.of.injury.resulting.in.death..month.day.year. | 0 | 1 | 10 | 10 | 0 | 7736 | 0 |
| Location.of.injury..address. | 0 | 1 | 3 | 74 | 0 | 28893 | 0 |
| Location.of.death..city. | 0 | 1 | 3 | 30 | 0 | 6340 | 0 |
| State | 0 | 1 | 2 | 6 | 0 | 52 | 0 |
| Location.of.death..county. | 0 | 1 | 3 | 33 | 0 | 1536 | 0 |
| Full.Address | 0 | 1 | 2 | 103 | 0 | 29709 | 0 |
| Latitude | 0 | 1 | 2 | 17 | 0 | 29515 | 0 |
| Longitude | 0 | 1 | 3 | 17 | 0 | 29515 | 0 |
| Highest.level.of.force | 0 | 1 | 5 | 33 | 0 | 19 | 0 |
| Armed.Unarmed | 0 | 1 | 4 | 19 | 0 | 10 | 0 |
| Alleged.weapon | 0 | 1 | 4 | 35 | 0 | 269 | 0 |
| Aggressive.physical.movement | 0 | 1 | 4 | 42 | 0 | 32 | 0 |
| Fleeing.Not.fleeing | 0 | 1 | 4 | 42 | 0 | 26 | 0 |
| Intended.use.of.force..Developing. | 0 | 1 | 2 | 22 | 0 | 9 | 0 |
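One side effect of the replacement is visible in the second skim output: Unique.ID and Longitude are now character columns, because assigning a string into a numeric column coerces the whole column. The following is a minimal sketch on a made-up data frame (`toy_df` and its values are hypothetical, not taken from the dataset) that shows the effect, together with one possible alternative that turns blanks into NA so that numeric columns keep their type.

```r
# Hypothetical toy data, not taken from the Fatal Encounters file
toy_df <- data.frame(
  id = c(1, 2, NA),
  gender = c("Male", "", "Female"),
  stringsAsFactors = FALSE
)

# The approach used above: fill NA and "" with a placeholder string
filled <- toy_df
filled[is.na(filled)] <- "NoData"
filled[filled == ""] <- "NoData"
print(class(filled$id))      # the numeric column has become character

# Alternative sketch: turn blanks into NA and keep the column types
kept <- toy_df
kept[kept == ""] <- NA
print(class(kept$id))        # still numeric
print(colSums(is.na(kept)))  # number of missing values per column
```

Whether the placeholder or NA is preferable depends on the later steps: a placeholder shows up as its own category in bar charts, while NA keeps numeric columns usable for arithmetic.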
However, one row had a missing value in the Unique.ID column. After inspecting that row, I found it carried no meaningful record, so I deleted it.
# Remove rows with "NoData" in Unique.ID
cleaned_data_1 <- cleaned_data[cleaned_data$Unique.ID != "NoData", ]
unique_race <- unique(cleaned_data_1$Race)
# Show unique 'race' values
print(unique_race)
## [1] "African-American/Black"
## [2] "Race unspecified"
## [3] "European-American/White"
## [4] "Hispanic/Latino"
## [5] "Christopher Anthony Alexander"
## [6] "Asian/Pacific Islander"
## [7] "Native American/Alaskan"
## [8] "European-American/European-American/White"
## [9] "Middle Eastern"
## [10] "African-American/Black African-American/Black Not imputed"
## [11] "european-American/White"
I also found the values for race to be inconsistent (a lowercase variant, duplicated labels, and a person's name in place of a category), so I consolidated them.
# Updating of the 'race' columns
cleaned_data_1$Race <- ifelse(cleaned_data_1$Race %in% c("european-American/White", "European-American/European-American/White"), "European-American/White", cleaned_data_1$Race)
cleaned_data_1$Race <- ifelse(cleaned_data_1$Race %in% c("African-American/Black African-American/Black Not imputed"), "African-American/Black", cleaned_data_1$Race)
cleaned_data_1$Race <- ifelse(cleaned_data_1$Race == "Christopher Anthony Alexander", "Race unspecified", cleaned_data_1$Race)
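The same consolidation can also be written with a lookup vector, which keeps all the corrections in one place. This is only a sketch on a small hypothetical sample of values (`race_raw`, `race_fix` and `race_clean` are illustrative names, not part of the analysis above); the bad values themselves are the ones printed by `unique()`.

```r
# Hypothetical sample of raw Race values, including the variants listed above
race_raw <- c("european-American/White",
              "European-American/European-American/White",
              "African-American/Black African-American/Black Not imputed",
              "Christopher Anthony Alexander",
              "Hispanic/Latino")

# Named vector: names are the bad values, elements are the replacements
race_fix <- c(
  "european-American/White" = "European-American/White",
  "European-American/European-American/White" = "European-American/White",
  "African-American/Black African-American/Black Not imputed" = "African-American/Black",
  "Christopher Anthony Alexander" = "Race unspecified"
)

# Replace only the values found in the lookup; leave the rest unchanged
needs_fix <- race_raw %in% names(race_fix)
race_clean <- race_raw
race_clean[needs_fix] <- unname(race_fix[race_raw[needs_fix]])
print(unique(race_clean))
```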
As can be seen from the bar chart, almost all of those who died were male. I believe that this is partly because men tend to have a physical advantage and are more likely to be perceived as a threat. Additionally, historical gender biases may have played a role in this disparity, as men have long faced legal disadvantages.
# Creation of gender distribution chart
ggplot(cleaned_data_1, aes(x = Gender)) +
geom_bar(fill = "skyblue") +
labs(title = "Gender Distribution", x = "Gender", y = "Number")
In the age distribution, the number of incidents increases dramatically from the age of 15 onwards and peaks at around 30 years of age. This may be influenced by a number of factors, including social, psychological and physiological aspects.
Risk-taking behavior: Young people are generally more inclined to take risks and explore, which may make them more likely to get into accidents or dangerous situations. This tendency is especially pronounced during the teenage years, so the increase in incidents from the age of 15 may be linked to it.
Driving and traffic behavior: Young drivers may lack driving experience, leading to a higher rate of traffic accidents. As they get older and gain experience, accident rates may decrease.
Physiological factors: Young people typically have better physical resilience and may recover more easily from minor accidents or injuries, which may make them more likely to report accidents.
Family and responsibilities: As individuals age, they usually take on more family and social responsibilities, which may make them more careful and cautious, reducing the risk of accidents.
# Replace "NoData" with NA
cleaned_data_1$Age[cleaned_data_1$Age == "NoData"] <- NA
# Convert age columns to numeric values
cleaned_data_1$Age <- as.numeric(cleaned_data_1$Age)
## Warning: NAs introduced by coercion
# Creation of age distribution chart
ggplot(cleaned_data_1, aes(x = Age)) +
geom_histogram(binwidth = 10, fill = "skyblue", color = "black") +
labs(title = "Age Distribution", x = "Age", y = "Number") +
theme_minimal()
## Warning: Removed 1222 rows containing non-finite values (`stat_bin()`).
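The coercion warning means that some Age entries are not plain numbers. A small sketch of how such entries could be listed before they are silently turned into NA is given below; the strings in `age_raw` are hypothetical examples, and the real column may contain other patterns.

```r
# Hypothetical Age strings; the real column may contain other patterns
age_raw <- c("28", "36", "NoData", "20s", "")

# suppressWarnings() hides the coercion warning that as.numeric() would print
age_num <- suppressWarnings(as.numeric(age_raw))

# Entries that were not blank or "NoData" but still failed to convert
failed <- age_raw[is.na(age_num) & !(age_raw %in% c("", "NoData"))]
print(failed)

# Total number of values that end up as NA
print(sum(is.na(age_num)))
```

Listing the failed entries makes it possible to decide case by case whether they can be recovered or should stay missing.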
In the racial distribution, I was struck that, contrary to the stereotype, the white population accounts for the highest number of incidents. However, a very large number of records also have an unspecified race. The relationship between race and these incidents will be analyzed later.
# Count the number of records for each race
race_counts <- table(cleaned_data_1$Race)
# Convert the counts to a data frame
race_data <- data.frame(Race = names(race_counts), Count = as.numeric(race_counts))
# Creation of race distribution chart
race_plot <- ggplot(data = race_data, aes(x = Race, y = Count, fill = Race)) +
geom_bar(stat = "identity") +
labs(title = "Race Distribution",
x = "Race",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(race_plot)
In the distribution of incidents over time, we can see that the number of incidents rises from year to year. This may indicate a problem with the management of the police system, which is not taking the matter seriously enough to act on the increasing number of police killings of civilians.
# Convert date columns to datetime format
cleaned_data_1$Date.of.injury.resulting.in.death..month.day.year. <- as.Date(cleaned_data_1$Date.of.injury.resulting.in.death..month.day.year., format = "%m/%d/%Y")
# Extract year
cleaned_data_1$Year <- year(cleaned_data_1$Date.of.injury.resulting.in.death..month.day.year.)
# Creation of year distribution chart
ggplot(cleaned_data_1, aes(x = Year)) +
geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
labs(title = "Year Distribution", x = "Year", y = "Number") +
theme_minimal()
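The date parsing and year extraction can be checked on a few values before plotting. The sketch below uses base R only (`format()` instead of lubridate's `year()`); the first two dates appear in the `head(data)` output, the third is a hypothetical value added for illustration.

```r
# First two dates are from head(data) above; the third is hypothetical
dates_raw <- c("12/31/2021", "12/30/2021", "01/15/2020")

# Parse month/day/year strings into Date objects
dates <- as.Date(dates_raw, format = "%m/%d/%Y")

# Extract the year as an integer and count records per year
years <- as.integer(format(dates, "%Y"))
print(table(years))

# Any string that does not match the format would become NA here
print(sum(is.na(dates)))
```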
Analyzing the highest level of force used in the incidents, we see that most of the deaths are caused by gunshots and a smaller share by vehicles, while the other causes are small by comparison.
# Count incidents by highest level of force
force_counts <- table(cleaned_data_1$Highest.level.of.force)
force_data <- data.frame(Level = names(force_counts), Count = as.vector(force_counts))
# One colour per force level, named so the legend can look them up
slice_cols <- setNames(rainbow(nrow(force_data)), force_data$Level)
# Draw the pie chart without slice labels
pie(force_data$Count, labels = rep("", nrow(force_data)),
col = slice_cols,
main = "Distribution of Highest Level of Force")
# Legend for "Gunshot" and "Vehicle" only, using the same colours as their slices
legend("topright", legend = c("Gunshot", "Vehicle"),
fill = unname(slice_cols[c("Gunshot", "Vehicle")]), title = "Legend")
The bar chart shows that in most incidents the intended use of force is recorded as deadly force.
# Create chart and rotate the x-axis labels
ggplot(cleaned_data_1, aes(x = Intended.use.of.force..Developing.)) +
geom_bar() +
labs(x = "Intended Use of Force (Developing)", y = "Count", title = "Distribution of Intended Use of Force (Developing)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
By analyzing the relationship between shootings and the intended use of force, we can conclude that almost all shootings are recorded as deadly force.
# Subset the rows whose highest level of force is "Gunshot"
subset_data <- subset(cleaned_data_1, Highest.level.of.force == "Gunshot")
# Count the rows recorded as "Deadly force" and the rows recorded as "No"
count_deadly_force <- sum(subset_data$Intended.use.of.force..Developing. == "Deadly force")
count_not_deadly_force <- sum(subset_data$Intended.use.of.force..Developing. == "No")
# Create a new data frame for plotting
count_data <- data.frame(Force_Type = c("Deadly force", "Not Deadly force"),
Count = c(count_deadly_force, count_not_deadly_force))
# Create the chart with ggplot2
library(ggplot2)
ggplot(count_data, aes(x = Force_Type, y = Count, fill = Force_Type)) +
geom_bar(stat = "identity") +
labs(title = "Distribution of Intended Use of Force in Gunshot Incidents",
x = "Force Type",
y = "Count") +
theme_minimal() +
theme(legend.position = "none") # Hide the legend
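A cross-tabulation is another way to look at how the highest level of force relates to the intended use of force, since it shows every category rather than two. The sketch below runs on a made-up data frame (`toy` and its rows are hypothetical); on the real data the two columns would be `Highest.level.of.force` and `Intended.use.of.force..Developing.`.

```r
# Hypothetical rows, for illustration only
toy <- data.frame(
  force  = c("Gunshot", "Gunshot", "Gunshot", "Vehicle", "Vehicle"),
  intent = c("Deadly force", "Deadly force", "Suicide", "Pursuit", "Pursuit"),
  stringsAsFactors = FALSE
)

# Counts for every combination of force level and intended use
tab <- table(toy$force, toy$intent)
print(tab)

# Share of each intended use within a force level (rows sum to 1)
row_share <- prop.table(tab, margin = 1)
print(round(row_share, 2))
```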
Since almost all shootings are recorded as deadly force, I can use the number of deadly-force incidents for each race to test my conjecture that Black people experience additional shootings. So, we count the deadly-force incidents by race. The resulting distribution is almost the same as the overall distribution of incidents by race.
If we exclude the records with unspecified race, Black people did not experience additional shootings. This differs from the “truth” I know from the news: the media usually give the impression that Black people are more aggressive, so police are more likely to shoot when confronting them. Several well-known cases of Black people being unjustifiably shot also led me to think that police are more likely to shoot Black people.
However, the data tell me that the numbers of police shootings involving Black people and white people are actually similar, which suggests that race is not a basis for the decision to shoot.
# Subset the rows whose intended use of force is "Deadly force"
deadly_force_data <- subset(cleaned_data_1, Intended.use.of.force..Developing. == "Deadly force")
# Count the records for each race
race_counts <- table(deadly_force_data$Race)
# Convert the result to a data frame
race_counts_df <- data.frame(Race = names(race_counts), Count = as.numeric(race_counts))
# Create the chart with ggplot2
library(ggplot2)
ggplot(race_counts_df, aes(x = Race, y = Count, fill = Race)) +
geom_bar(stat = "identity") +
labs(title = "Distribution of Races in Deadly Force Incidents",
x = "Race",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate the x-axis labels to prevent overlap
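The comparison between the deadly-force distribution and the overall distribution can also be made numerically, by computing for each race the share of its incidents that are recorded as deadly force. The following is a sketch on made-up rows (`toy`, the race labels "A" and "B", and the resulting numbers are hypothetical and say nothing about the real data).

```r
# Hypothetical rows, for illustration only
toy <- data.frame(
  Race   = c("A", "A", "A", "A", "B", "B"),
  Intent = c("Deadly force", "Deadly force", "Pursuit",
             "Deadly force", "Deadly force", "Pursuit"),
  stringsAsFactors = FALSE
)

# All incidents per race, and deadly-force incidents per race
total_by_race  <- table(toy$Race)
deadly_by_race <- table(toy$Race[toy$Intent == "Deadly force"])

# Share of each race's incidents that are recorded as deadly force
share <- setNames(
  as.numeric(deadly_by_race) /
    as.numeric(total_by_race[names(deadly_by_race)]),
  names(deadly_by_race)
)
print(share)
```

If the shares are close across races, the deadly-force distribution follows the overall distribution, which is the pattern described in the text.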