Prepare The Data

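library(dplyr)      # %>% and select()
library(skimr)      # skim()
library(ggplot2)    # bar charts and histograms
library(lubridate)  # year()
setwd("C:/Users/49765/Desktop/Urban Analytics/major1")

# Read CSV file
data <- read.csv("FATAL ENCOUNTERS DOT ORG SPREADSHEET (See Read me tab) - Form Responses.csv")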

head(data)
##   Unique.ID                    Name Age Gender                    Race
## 1     31495        Ashley McClendon  28 Female  African-American/Black
## 2     31496 Name withheld by police     Female        Race unspecified
## 3     31497 Name withheld by police       Male        Race unspecified
## 4     31491    Johnny C. Martin Jr.  36   Male        Race unspecified
## 5     31492           Dennis McHugh  44   Male European-American/White
## 6     31493      Ny'Darius McKinney  21   Male        Race unspecified
##    Race.with.imputations Imputation.probability
## 1 African-American/Black            Not imputed
## 2                   <NA>                   <NA>
## 3                   <NA>                   <NA>
## 4                   <NA>                   <NA>
## 5                   <NA>                   <NA>
## 6                   <NA>                   <NA>
##                                                URL.of.image..PLS.NO.HOTLINKS.
## 1 https://fatalencounters.org/wp-content/uploads/2022/01/Ashley-McClendon.jpg
## 2                                                                            
## 3                                                                            
## 4                                                                            
## 5                                                                            
## 6                                                                            
##   Date.of.injury.resulting.in.death..month.day.year.
## 1                                         12/31/2021
## 2                                         12/31/2021
## 3                                         12/31/2021
## 4                                         12/30/2021
## 5                                         12/30/2021
## 6                                         12/30/2021
##       Location.of.injury..address. Location.of.death..city. State
## 1 South Pearl Street and Tory Road                 Pageland    SC
## 2                 1500 21st Street                 Meridian    MS
## 3                 1500 21st Street                 Meridian    MS
## 4                    Martinez Lane                 Nicholls    GA
## 5                 435 E 4th Street                 Beaumont    CA
## 6    State Rd S-29-296 & Bethel Rd                Lancaster    SC
##   Location.of.death..zip.code. Location.of.death..county.
## 1                        29728               Chesterfield
## 2                        39301                 Lauderdale
## 3                        39301                 Lauderdale
## 4                        31554                     Coffee
## 5                        92223                  Riverside
## 6                        29720                  Lancaster
##                                                      Full.Address   Latitude
## 1 South Pearl Street and Tory Road Pageland SC 29728 Chesterfield 34.7452955
## 2                   1500 21st Street Meridian MS 39301 Lauderdale 32.3793294
## 3                   1500 21st Street Meridian MS 39301 Lauderdale 32.3793294
## 4                          Martinez Lane Nicholls GA 31554 Coffee 31.5307934
## 5                    400 E 4th Street Beaumont CA 92223 Riverside 33.9261462
## 6      State Rd S-29-296 & Bethel Rd Lancaster SC 29720 Lancaster 34.6608217
##    Longitude
## 1  -80.39306
## 2  -88.69397
## 3  -88.69397
## 4  -82.63782
## 5 -116.97715
## 6  -80.83714
##                                                                    Agency.or.agencies.involved
## 1                                                                   Pageland Police Department
## 2                                                                   Meridian Police Department
## 3                                                                   Meridian Police Department
## 4                                                               Coffee County Sheriff's Office
## 5 Riverside County Sheriff's Department, Beaumont Police Department, Banning Police Department
## 6                                                      South Carolina Law Enforcement Division
##   Highest.level.of.force UID.Temporary Name.Temporary Armed.Unarmed
## 1                Vehicle            NA                             
## 2                Gunshot            NA                             
## 3                Gunshot            NA                             
## 4                Gunshot            NA                             
## 5                Gunshot            NA                             
## 6                Vehicle            NA                             
##   Alleged.weapon Aggressive.physical.movement Fleeing.Not.fleeing
## 1                                                                
## 2                                                                
## 3                                                                
## 4                                                                
## 5                                                                
## 6                                                                
##   Description.Temp URL.Temp
## 1                          
## 2                          
## 3                          
## 4                          
## 5                          
## 6                          
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Brief.description
## 1                                                                                                                         Ashley McClendon's boyfriend, 33-year-old Marcus Allen Davis, allegedly was driving a 1996 Ford Coupe back to Pageland when an officer reportedly saw the car run a stop sign before midnight on Dec. 31. A traffic stop was attempted, but Davis refused to stop. As the car fled down South Pearl Street near Tory Road it left the road and struck a tree, killing passenger McClendon.
## 2                                                                                                                                                                                                                                                                          Police responded to a man causing a disturbance who was covered in blood. The man had a gun in each hand at a home. Once officers arrived, they were met with gunfire. Officers responded back with gunfire. A man and woman were killed.
## 3                                                                                                                                                                                                                                                                          Police responded to a man causing a disturbance who was covered in blood. The man had a gun in each hand at a home. Once officers arrived, they were met with gunfire. Officers responded back with gunfire. A man and woman were killed.
## 4                                                                   Johnny C. Martin, Jr. arrived at a gas station at 7:10 p.m. While at the gas station, Martin allegedly carjacked a woman, shooting at her while stealing her car. The Ware County Sheriff's Office and the Georgia State Patrol found the stolen car driven by Martin and pursued. Martin eventually went off the road into a field. When officers approached the car, they found Martin with a fatal gunshot to the head and a gun in his hand.
## 5 Deputies responded to a domestic violence call. When deputies arrived, the man was gone. The man reportedly had a felony warrant for a violation of an assault with a deadly weapon, domestic violence, kidnapping, vandalism, and a domestic violence restraining order violation. Deputies learned the suspect was in the city of Beaumont. A helicopter found the suspect's car. When officers located the man, he tried to flee by ramming his car into other cars. An officer and deputy shot and killed him.
## 6                                                                                                                                                                       About 5:35 p.m., Joseph Jemar Hinson was allegedly driving a car when police tried to pull him over, and he fled. Police pursued him until he ran off the left side of the roadway, struck a fence and wrecked, killing back seat passenger Ny'Darius McKinney. Hinson was charged with failure to stop for a blue light resulting in death.
##   Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS
## 1                                               Criminal
## 2                                  Pending investigation
## 3                                  Pending investigation
## 4                                                Suicide
## 5                                  Pending investigation
## 6                                               Criminal
##   Intended.use.of.force..Developing.
## 1                            Pursuit
## 2                       Deadly force
## 3                       Deadly force
## 4                            Suicide
## 5                       Deadly force
## 6                            Pursuit
##                                                                                            Supporting.document.link
## 1 https://www.wsoctv.com/news/1-person-dead-after-attempting-escape-police-troopers-say/QXA244QPUZGJ5GAGRADGDWBAEU/
## 2                                                        https://www.wtok.com/2022/01/01/officer-involved-shooting/
## 3                                                        https://www.wtok.com/2022/01/01/officer-involved-shooting/
## 4   https://gbi.georgia.gov/press-releases/2021-12-31/gbi-perry-and-douglas-offices-investigating-related-shootings
## 5                                   https://kesq.com/news/2021/12/31/officer-involved-shooting-unfolds-in-beaumont/
## 6                   https://www.thelancasternews.com/content/21-year-old-man-killed-when-car-fleeing-police-crashes
##   Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS  X X.1
## 1                                                              No NA  NA
## 2                                                              No NA  NA
## 3                                                              No NA  NA
## 4                                                              No NA  NA
## 5                                                              No NA  NA
## 6                                                              No NA  NA
##   Unique.ID.formula Unique.identifier..redundant.
## 1                NA                         31495
## 2                NA                         31496
## 3                NA                         31497
## 4                NA                         31491
## 5                NA                         31492
## 6                NA                         31493

Data Cleaning

Delete unused columns

After reviewing the data, I decided not to use the following columns, so I removed them.

# Drop the columns that will not be used in the analysis
cleaned_data <- data %>%
  select(-c(
    "Race.with.imputations",
    "Imputation.probability",
    "URL.of.image..PLS.NO.HOTLINKS.",
    "UID.Temporary",
    "Name.Temporary",
    "Description.Temp",
    "URL.Temp",
    "Unique.ID.formula",
    "Unique.identifier..redundant.",
    "X",
    "X.1",
    "Location.of.death..zip.code.",
    "Brief.description",
    "Supporting.document.link",
    "Agency.or.agencies.involved",
    "Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS",
    "Foreknowledge.of.mental.illness..INTERNAL.USE..NOT.FOR.ANALYSIS"
  ))
skim(cleaned_data)
Data summary
Name cleaned_data
Number of rows 31498
Number of columns 19
_______________________
Column type frequency:
character 17
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Name 0 1 4 82 0 29859 0
Age 0 1 0 5 1221 112 0
Gender 0 1 0 11 144 4 0
Race 0 1 0 57 1 12 0
Date.of.injury.resulting.in.death..month.day.year. 0 1 10 10 0 7736 0
Location.of.injury..address. 0 1 0 74 556 28893 0
Location.of.death..city. 0 1 0 30 36 6340 0
State 0 1 0 2 1 52 0
Location.of.death..county. 0 1 0 33 15 1536 0
Full.Address 0 1 0 103 1 29709 0
Latitude 0 1 0 17 1 29515 0
Highest.level.of.force 0 1 0 33 4 19 0
Armed.Unarmed 0 1 0 19 14419 10 0
Alleged.weapon 0 1 0 35 14421 269 0
Aggressive.physical.movement 0 1 0 42 14418 32 0
Fleeing.Not.fleeing 0 1 0 42 14419 26 0
Intended.use.of.force..Developing. 0 1 0 22 3 9 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Unique.ID 1 1 15749.0 9092.55 1.00 7875 15749.00 23623.00 31497.00 ▇▇▇▇▇
Longitude 1 1 -95.4 16.30 -165.59 -111 -90.56 -82.57 -67.27 ▁▁▅▇▇

The skim summary shows many empty values. For example, Armed.Unarmed, Alleged.weapon, Aggressive.physical.movement and Fleeing.Not.fleeing each have about 14,400 empty cells. Since each row represents a life lost, I did not want to simply remove the rows with missing values. Instead, I replaced every NA value and every empty cell with "NoData".

# Replace all NA values with "NoData"
cleaned_data[is.na(cleaned_data)] <- "NoData"

# Replace all empty cells with "NoData"
cleaned_data[cleaned_data == ""] <- "NoData"

skim(cleaned_data)
Data summary
Name cleaned_data
Number of rows 31498
Number of columns 19
_______________________
Column type frequency:
character 19
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Unique.ID 0 1 1 6 0 31498 0
Name 0 1 4 82 0 29859 0
Age 0 1 1 6 0 112 0
Gender 0 1 4 11 0 4 0
Race 0 1 6 57 0 12 0
Date.of.injury.resulting.in.death..month.day.year. 0 1 10 10 0 7736 0
Location.of.injury..address. 0 1 3 74 0 28893 0
Location.of.death..city. 0 1 3 30 0 6340 0
State 0 1 2 6 0 52 0
Location.of.death..county. 0 1 3 33 0 1536 0
Full.Address 0 1 2 103 0 29709 0
Latitude 0 1 2 17 0 29515 0
Longitude 0 1 3 17 0 29515 0
Highest.level.of.force 0 1 5 33 0 19 0
Armed.Unarmed 0 1 4 19 0 10 0
Alleged.weapon 0 1 4 35 0 269 0
Aggressive.physical.movement 0 1 4 42 0 32 0
Fleeing.Not.fleeing 0 1 4 42 0 26 0
Intended.use.of.force..Developing. 0 1 2 22 0 9 0
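To record how many cells the "NoData" fill changes in each column, the two counts that skim reports separately (n_missing and empty) can be combined into one number per column. The base-R sketch below is one possible way to do this. count_missing is a hypothetical helper, not part of skimr. It counts cells that are NA or an empty string, which are the two cases the replacement code targets.

```r
# Hypothetical helper: count the cells per column that the "NoData" fill would replace.
# A cell qualifies if it is NA or an empty string "".
count_missing <- function(df) {
  sapply(df, function(col) sum(is.na(col) | col %in% ""))
}
```

Running count_missing(cleaned_data) before the two replacement lines gives a named vector of per-column totals that can be reported alongside the skim output.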

However, one row had a missing value in the Unique.ID column. On inspection, it contained no meaningful information, so I deleted it.

# Remove rows with "NoData" in Unique.ID
cleaned_data_1 <- cleaned_data[cleaned_data$Unique.ID != "NoData", ]
unique_race <- unique(cleaned_data_1$Race)

# Show unique 'race' values
print(unique_race)
##  [1] "African-American/Black"                                   
##  [2] "Race unspecified"                                         
##  [3] "European-American/White"                                  
##  [4] "Hispanic/Latino"                                          
##  [5] "Christopher Anthony Alexander"                            
##  [6] "Asian/Pacific Islander"                                   
##  [7] "Native American/Alaskan"                                  
##  [8] "European-American/European-American/White"                
##  [9] "Middle Eastern"                                           
## [10] "African-American/Black African-American/Black Not imputed"
## [11] "european-American/White"

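The Race values were also inconsistent. The same category appears under several spellings (for example "european-American/White"), and one row contains a person's name instead of a race. I merged these into the standard categories.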

# Standardize the Race labels
cleaned_data_1$Race <- ifelse(cleaned_data_1$Race %in% c("european-American/White", "European-American/European-American/White"), "European-American/White", cleaned_data_1$Race)
cleaned_data_1$Race <- ifelse(cleaned_data_1$Race %in% c("African-American/Black African-American/Black Not imputed"), "African-American/Black", cleaned_data_1$Race)
cleaned_data_1$Race <- ifelse(cleaned_data_1$Race == "Christopher Anthony Alexander", "Race unspecified", cleaned_data_1$Race)
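The three ifelse() calls above could also be combined into a single named lookup vector, so that any new misspelling only needs to be added in one place. The sketch below makes the same replacements for the labels listed. race_fixes and recode_race are hypothetical names introduced for illustration.

```r
# Hypothetical lookup table: raw Race label -> standardized label
race_fixes <- c(
  "european-American/White" = "European-American/White",
  "European-American/European-American/White" = "European-American/White",
  "African-American/Black African-American/Black Not imputed" = "African-American/Black",
  "Christopher Anthony Alexander" = "Race unspecified"
)

# Replace labels found in the lookup table; leave every other label unchanged
recode_race <- function(race, fixes = race_fixes) {
  hit <- race %in% names(fixes)
  race[hit] <- unname(fixes[race[hit]])
  race
}
```

Usage: cleaned_data_1$Race <- recode_race(cleaned_data_1$Race).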

Storytelling

1

As the bar chart shows, almost all of those who died were male. I believe this is partly because men tend to be physically stronger and are therefore more likely to be perceived as a threat. Historical gender biases may also have contributed to this disparity, since men have long been treated more harshly by the legal system.

# Creation of gender distribution chart
ggplot(cleaned_data_1, aes(x = Gender)) +
  geom_bar(fill = "skyblue") +
  labs(title = "Gender Distribution", x = "Gender", y = "Number")
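A bar chart shows "almost all" only by eye. Percentages make the claim precise. Below is a minimal base-R sketch. category_shares is a hypothetical helper that returns each category's share of records, rounded to one decimal place, with the largest category first.

```r
# Percentage of records in each category, largest category first
category_shares <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  round(100 * counts / sum(counts), 1)
}
```

For example, category_shares(cleaned_data_1$Gender) shows the male share directly. It also shows how many records are labelled "NoData".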

2

In the age distribution, the number of deaths rises sharply from about age 15 and peaks at around 30. Several social, psychological and physiological factors may contribute to this pattern.

Risk-taking behavior: Young people are generally more inclined to take risks and explore, which may make them more likely to end up in accidents or dangerous situations. This tendency is especially pronounced during the teenage years, so the increase from age 15 may be linked to it.

Driving and traffic behavior: Young drivers may lack experience, which leads to more traffic accidents. As they get older and gain experience, accident rates may decrease.

Physiological factors: Young people typically have better physical resilience and may recover more easily from minor accidents or injuries, which may make them more likely to report accidents.

Family and responsibilities: As people age, they usually take on more family and social responsibilities. These may make them more careful and reduce their risk of accidents.

# Replace "NoData" with NA
cleaned_data_1$Age[cleaned_data_1$Age == "NoData"] <- NA

# Convert age columns to numeric values
cleaned_data_1$Age <- as.numeric(cleaned_data_1$Age)
## Warning: NAs introduced by coercion
# Creation of age distribution chart
ggplot(cleaned_data_1, aes(x = Age)) +
  geom_histogram(binwidth = 10, fill = "skyblue", color = "black") +
  labs(title = "Age Distribution", x = "Age", y = "Number") +
  theme_minimal()
## Warning: Removed 1222 rows containing non-finite values (`stat_bin()`).
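The claims about a rise from age 15 and a peak near 30 can be checked against counts instead of being read off the histogram. The sketch below is an assumption about how one might do this in base R. age_band_counts is a hypothetical helper that counts ages in bands of [0,10), [10,20), and so on. These bands start at 0, so their edges may not line up exactly with ggplot2's histogram bins. A smaller width, such as 5, gives finer resolution around age 15.

```r
# Count ages in fixed-width bands [0,10), [10,20), ...
# Non-numeric entries (e.g. "NoData") become NA and are not counted.
age_band_counts <- function(age, width = 10) {
  age <- suppressWarnings(as.numeric(age))
  stopifnot(any(!is.na(age)))
  upper <- (floor(max(age, na.rm = TRUE) / width) + 1) * width
  breaks <- seq(0, upper, by = width)
  table(cut(age, breaks = breaks, right = FALSE))
}
```

Usage: age_band_counts(cleaned_data_1$Age, width = 5).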

3

In the racial distribution, I was struck that, contrary to the stereotype I had in mind, White people account for the highest number of deaths. However, a very large number of people are also recorded with an unspecified race. The relationship between race and deaths is analyzed later.

# Count the records for each race
race_counts <- table(cleaned_data_1$Race)

# Convert the counts to a data frame for ggplot2
race_data <- data.frame(Race = names(race_counts), Count = as.numeric(race_counts))

# Creation of race distribution chart
race_plot <- ggplot(data = race_data, aes(x = Race, y = Count, fill = Race)) +
  geom_bar(stat = "identity") +
  labs(title = "Race Distribution",
       x = "Race",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(race_plot)
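Because so many records have an unspecified race, shares computed among records with a known race are easier to compare than raw counts. Here is a minimal base-R sketch. known_race_shares is a hypothetical helper. The labels it treats as unknown ("Race unspecified" and "NoData") are an assumption based on the cleaning steps above.

```r
# Percentage share of each race, excluding records whose race is not recorded.
# The labels treated as unknown are an assumption; adjust them to the data.
known_race_shares <- function(race, unknown = c("Race unspecified", "NoData")) {
  known <- race[!is.na(race) & !(race %in% unknown)]
  round(100 * prop.table(table(known)), 1)
}
```

Usage: known_race_shares(cleaned_data_1$Race).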

4

The distribution of deaths over time shows that the number of deaths per year is rising. This may point to a problem in how the police system is managed. Despite the growing number of police killings of civilians, it does not appear to take the matter seriously enough to act.

# Convert the date column to the Date class
cleaned_data_1$Date.of.injury.resulting.in.death..month.day.year. <- as.Date(cleaned_data_1$Date.of.injury.resulting.in.death..month.day.year., format = "%m/%d/%Y")

# Extract year
cleaned_data_1$Year <- year(cleaned_data_1$Date.of.injury.resulting.in.death..month.day.year.)

# Creation of year distribution chart
ggplot(cleaned_data_1, aes(x = Year)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Year Distribution", x = "Year", y = "Number") +
  theme_minimal()
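"Getting higher" can be quantified with a least-squares slope fitted to the yearly counts. The sketch below is a simplification that assumes a straight-line trend. yearly_trend is a hypothetical helper that takes the original "month/day/year" strings (for example, from data). Two caveats apply. A final year that is only partially covered would pull the slope down. Years with no records do not appear in the counts.

```r
# Count deaths per calendar year from "month/day/year" strings and fit a
# straight-line trend; the slope is the average change in deaths per year.
yearly_trend <- function(date_strings) {
  dates <- as.Date(date_strings, format = "%m/%d/%Y")
  years <- as.integer(format(dates, "%Y"))
  counts <- as.data.frame(table(year = years), stringsAsFactors = FALSE)
  counts$year <- as.integer(counts$year)
  fit <- lm(Freq ~ year, data = counts)
  list(counts = counts, slope_per_year = unname(coef(fit)["year"]))
}
```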

5

Looking at the highest level of force used in each incident, most deaths were caused by gunshots and a smaller share by vehicles. All other causes are minor by comparison.

# Count incidents by Highest.level.of.force
force_counts <- table(cleaned_data_1$Highest.level.of.force)
force_data <- data.frame(Level = names(force_counts), Count = as.vector(force_counts))

# One color per slice; slice labels are left blank because the small categories overlap
slice_colors <- rainbow(nrow(force_data))
pie(force_data$Count, labels = rep("", nrow(force_data)),
    col = slice_colors,
    main = "Distribution of Highest Level of Force")

# Legend for the two dominant categories only; look up each entry's actual
# slice color instead of generating new colors with rainbow(2)
shown <- force_data$Level %in% c("Gunshot", "Vehicle")
legend("topright", legend = force_data$Level[shown],
       fill = slice_colors[shown], title = "Legend")
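With only two dominant categories, another option is to lump the remaining levels into "Other". The result is a three-slice pie in which every slice can appear in the legend. Below is a minimal base-R sketch. lump_top is a hypothetical helper, and k = 2 is an assumption matching the two categories named above.

```r
# Keep the k most frequent categories and relabel everything else as "Other"
lump_top <- function(x, k = 2, other = "Other") {
  top <- head(names(sort(table(x), decreasing = TRUE)), k)
  ifelse(x %in% top, x, other)
}
```

For example, lumped <- table(lump_top(cleaned_data_1$Highest.level.of.force)) followed by pie(lumped) draws the three slices with their own labels.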

6

The bar chart shows that in most incidents the intended use of force is classified as "Deadly force".

# Create chart and rotate the x-axis labels
ggplot(cleaned_data_1, aes(x = Intended.use.of.force..Developing.)) +
  geom_bar() +
  labs(x = "Intended Use of Force (Developing)", y = "Count", title = "Distribution of Intended Use of Force (Developing)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7

Comparing gunshot deaths with the intended use of force shows that almost all shootings are classified as "Deadly force".

# Subset: keep the rows whose highest level of force is "Gunshot"
subset_data <- subset(cleaned_data_1, Highest.level.of.force == "Gunshot")

# Count shootings classified as "Deadly force" and shootings with any other intended use
count_deadly_force <- sum(subset_data$Intended.use.of.force..Developing. == "Deadly force")
count_not_deadly_force <- sum(subset_data$Intended.use.of.force..Developing. != "Deadly force")

# Build a small data frame for plotting
count_data <- data.frame(Force_Type = c("Deadly force", "Not Deadly force"),
                         Count = c(count_deadly_force, count_not_deadly_force))

# Create the chart with ggplot2
library(ggplot2)

ggplot(count_data, aes(x = Force_Type, y = Count, fill = Force_Type)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Intended Use of Force in Gunshot Incidents",
       x = "Force Type",
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")  # Hide the legend
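The two-bar chart compares "Deadly force" only against everything else combined. A full cross-tabulation with row percentages shows every intended-use category for each level of force, and the "Gunshot" row gives the deadly-force share directly. Here is a minimal base-R sketch. row_percentages is a hypothetical helper.

```r
# Cross-tabulate two categorical vectors; each row sums to 100 (percent)
row_percentages <- function(rows, cols) {
  tab <- table(rows, cols)
  round(100 * prop.table(tab, margin = 1), 1)
}
```

Usage: row_percentages(cleaned_data_1$Highest.level.of.force, cleaned_data_1$Intended.use.of.force..Developing.)["Gunshot", ].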

8

Since almost all shootings are classified as "Deadly force", I can use the number of deadly-force incidents for each race to test my conjecture that Black people are shot disproportionately often. The distribution of deadly-force incidents by race turns out to be very similar to the overall distribution of deaths by race.

If we set aside the records with unspecified race, Black people do not appear to experience additional shootings. This differs from the “truth” I had taken from the news. The media often give the impression that Black people are more aggressive, which would make police more likely to shoot when confronting them. In particular, several well-known cases of Black people being unjustifiably shot had led me to believe that police are more likely to shoot Black people.

However, the data show that police used deadly force in a similar number of incidents involving Black and White people. This suggests that race is not the basis for the decision to shoot.

# Subset: keep the rows whose intended use of force is "Deadly force"
deadly_force_data <- subset(cleaned_data_1, Intended.use.of.force..Developing. == "Deadly force")

# Count the deadly-force incidents for each race
race_counts <- table(deadly_force_data$Race)

# Convert the counts to a data frame
race_counts_df <- data.frame(Race = names(race_counts), Count = as.numeric(race_counts))

# Create the chart with ggplot2
library(ggplot2)

ggplot(race_counts_df, aes(x = Race, y = Count, fill = Race)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Races in Deadly Force Incidents",
       x = "Race",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels so they do not overlap
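The statement that the two distributions are "very similar" compares two charts by eye. Putting each race's share of all deaths next to its share of deadly-force incidents makes the comparison explicit. The base-R sketch below is one way to do this. share_comparison is a hypothetical helper. Both percentages are shares of deaths recorded in this data set, so they compare races with each other within the data, not against their share of the population.

```r
# Compare each race's share within a subset of records to its share overall.
share_comparison <- function(race, in_subset) {
  overall <- prop.table(table(race))
  in_sub <- race[which(in_subset)]  # which() drops NA flags
  subset_share <- prop.table(table(factor(in_sub, levels = names(overall))))
  data.frame(Race = names(overall),
             overall_pct = round(100 * as.vector(overall), 1),
             subset_pct = round(100 * as.vector(subset_share), 1),
             stringsAsFactors = FALSE)
}
```

Usage: share_comparison(cleaned_data_1$Race, cleaned_data_1$Intended.use.of.force..Developing. == "Deadly force").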