There was a lot of cleaning done to create my visualizations. The cleaning process began with Excel files, which were converted to CSV for easier handling. Since the variables I wanted were spread across files, they were merged, focusing on aligning them by state. Then, I removed any missing values and converted ages to numbers for a scatter plot. For a ridge plot showing age distribution, I grouped ages into categories like teens (ages 13 to 19). Similarly, for a horizontal bar graph, I replaced missing values and filtered the data to focus on specific columns. Next, for an area chart, I grouped data by year and type. Finally, for a stacked bar graph, I replaced missing values and filtered the data accordingly.
The website features a section detailing their data collection process. Here, they explain how their dataset is sourced from official police use of force data collection programs in states such as California, Texas, and Virginia. They supplement this with nationwide data from crowdsourced databases like The Gun Violence Archive and the Fatal Encounters database. Additionally, they conducted comprehensive research, scouring social media, obituaries, criminal records databases, police reports, and other sources to ascertain the race of 90 percent of all victims in their database.I chose this topic and dataset because police violence is a pressing social issue that crosses with systemic racism and injustice. It’s important to talk about these issues because they affect real people’s lives and have long-lasting consequences.
# Load required libraries
library(tidyverse)
library(highcharter)
library(lubridate)
library(leaflet)
library(ggridges)
library(ggthemes)
library(htmlwidgets)
library(plotly)
# Set working directory
setwd("/Users/rebeccambaho/Downloads/project 3")
# Load datasets
Policekillings <- read_csv("Policekillings.csv")
MergedDataset <- read_csv("MergedDataset.csv")
# Clean and wrangle data
cleaned_data <- MergedDataset %>%
filter(!is.na(`Victim's age`), !is.na(`# Black people killed`)) %>%
mutate(`Victim's age` = as.numeric(ifelse(is.na(`Victim's age`), NA, `Victim's age`))) %>%
filter(!is.na(`Victim's age`), `Victim's age` > 0)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Victim's age = as.numeric(ifelse(is.na(`Victim's age`), NA,
## `Victim's age`))`.
## Caused by warning:
## ! NAs introduced by coercion
# Scatter plot with theme_solarized
scatter_plot <- ggplot(cleaned_data, aes(x = `Victim's age`, y = `# Black people killed`)) +
geom_point(color = "darkorange") +
labs(title = "Police Killings Among Black People by Age",
x = "Victim's Age",
y = "Number of Black People Killed",
caption = "Source: Mapping police voilence") +
theme_solarized() +
scale_colour_solarized('brown')
scatter_plot
The scatterplot shows how the age of victims relates to the number of black people killed by police. Each dot represents one incident, with the victim’s age on the horizontal axis and the number of black people killed on the vertical axis. The graph results are quite interesting because they display a clear trend. It seems like there is a straight-line relationship between the variables.
# Linear regression
model <- lm(`# Black people killed` ~ `Victim's age`, data = cleaned_data)
summary(model)
##
## Call:
## lm(formula = `# Black people killed` ~ `Victim's age`, data = cleaned_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -141.09 -86.92 -28.23 130.07 167.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 145.01547 2.80079 51.777 < 2e-16 ***
## `Victim's age` -0.23120 0.07124 -3.245 0.00118 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 102.2 on 12151 degrees of freedom
## Multiple R-squared: 0.000866, Adjusted R-squared: 0.0007838
## F-statistic: 10.53 on 1 and 12151 DF, p-value: 0.001176
# Diagnostic plots
plot(model)
The linear regression analysis tells us that there’s a connection between the age of victims and the number of black people killed by police. Specifically, as victims get older, the number of black people killed tends to decrease. For every year increase in age, there’s about a quarter-person fewer being killed. However, this model isn’t great at explaining why police killings happen among black people, as it only explains a tiny bit of the differences we see in the data. While the relationship between age and killings appears real (because of the low p-value), age alone doesn’t tell us much about why police killings happen. There are likely other important factors that aren’t included in this analysis.
# Data Cleaning and Manipulation
MergedDataset_Clean <- MergedDataset %>%
filter(!is.na(`Victim's age`) & !is.na(`# Black people killed`)) %>%
select(`Victim's age`, `# Black people killed`)
# Print the cleaned dataset to check for missing values
print(MergedDataset_Clean)
## # A tibble: 12,704 × 2
## `Victim's age` `# Black people killed`
## <chr> <dbl>
## 1 45 4
## 2 23 4
## 3 27 4
## 4 35 4
## 5 28 4
## 6 21 4
## 7 27 4
## 8 40 4
## 9 28 4
## 10 23 4
## # ℹ 12,694 more rows
categorize_age <- function(age) {
if (is.na(age)) {
return("Unknown")
} else if (age < 1) {
return("Babies")
} else if (age >= 1 & age <= 12) {
return("Children")
} else if (age >= 13 & age <= 19) {
return("Teen")
} else if (age >= 20 & age <= 39) {
return("Adult")
} else if (age >= 40 & age <= 59) {
return("Middle Age")
} else if (age >= 60) {
return("Senior")
} else {
return("Unknown")
}
}
# Apply age categorization and remove Unknown category
MergedDataset_Clean <- MergedDataset_Clean %>%
mutate(Age_Group = sapply(`Victim's age`, categorize_age)) %>%
filter(!is.na(Age_Group))
# create the ridge plot
ggplot(MergedDataset_Clean, aes(x = `# Black people killed`, y = Age_Group, fill = Age_Group)) +
geom_density_ridges(alpha = 0.6) +
theme_stata() + # Add theme_stata
theme(legend.position = "none",
axis.title.y = element_text(hjust = 0.5),
axis.title.x = element_text(hjust = 0.5),
axis.text.y = element_text(angle = 0, hjust = 1),
plot.caption = element_text(size = 8, hjust = 1)) +
labs(title = "Distribution of Black People Killed by Age Group",
caption = "Source: Mapping Police Violence") +
scale_fill_brewer(palette = "YlGnBu")
## Picking joint bandwidth of 30.1
The graph illustrates the density distribution of black people killed across various age groups. It shows how the number of fatalities varies among different age categories. We can see that children seemed to be the highest.
This Tableau visualization depicts the number of police killings categorized by race and gender, presented as a horizontal double bar chart. Each racial group is depicted along the vertical axis, with separate bars for different sexes. The length of each bar corresponds to the count of police killings, facilitating a straightforward comparison between different racial groups and genders. From the graph, we gather several insights. Firstly, more men were affected than any other sex. Particularly, white men experienced the highest rate of victimization compared to any other racial group, which was surprising.
# Define main encounter types
main_encounter_types <- c("Violent Crime", "Non-Violent Offense", "Domestic Disturbance", "Person with a Weapon", "Traffic Stop", "Mental Health/Welfare Check", "Other Crimes Against People", "None/Unknown")
# Data Cleaning and Manipulation
Policekillings_clean <- Policekillings %>%
filter(!is.na(`Date of Incident (month/day/year)`),
!is.na(`Encounter Type`)) %>%
mutate(Date_of_Incident = lubridate::mdy(`Date of Incident (month/day/year)`),
Year = lubridate::year(Date_of_Incident),
Encounter_Type = stringr::str_extract(`Encounter Type`,
paste(main_encounter_types, collapse = "|"))) %>%
filter(!is.na(Encounter_Type)) %>%
count(Year, Encounter_Type, name = "Police_Killings_Count", .groups = "drop")
# Aggregate data by year and encounter type
Policekillings_aggregated <- Policekillings_clean %>%
group_by(Year, Encounter_Type) %>%
summarise(Police_Killings_Count = sum(Police_Killings_Count), .groups = "drop")
# Convert encounter type to factor to maintain order
Policekillings_aggregated$Encounter_Type <- factor(Policekillings_aggregated$Encounter_Type,
levels = main_encounter_types)
# Create the Highcharter chart with a source caption
hc <- highchart() %>%
hc_chart(type = "area", marginTop = 80) %>% # Increase marginTop to create space for the title
hc_title(text = "Police Killings Over Years by Encounter Type") %>%
hc_xAxis(categories = unique(Policekillings_aggregated$Year)) %>%
hc_yAxis(title = list(text = "Number of Police Killings")) %>%
hc_plotOptions(area = list(stacking = "normal", lineColor = "#ffffff",
lineWidth = 0.2)) %>%
hc_add_series(data = Policekillings_aggregated,
type = "area",
hcaes(x = Year, y = Police_Killings_Count,
group = Encounter_Type,
name = Encounter_Type),
showInLegend = TRUE,
shape = "square") %>%
hc_legend(align = "right", verticalAlign = "top", layout = "vertical", y = 50) %>% # Adjust legend position
hc_colors(c("rgb(141,211,199)", "rgb(255,255,179)", "rgb(190,186,218)",
"rgb(251,128,114)", "rgb(128,177,211)", "rgb(253,180,98)",
"rgb(179,222,105)", "rgb(252,205,229)")) %>%
hc_add_theme(hc_theme_ffx()) %>% # Add FFX theme
hc_credits(enabled = TRUE, text = "Source: Mapping police violence", position = list(align = "right", x = -10))
# Print the chart
hc
The graph illustrates the fluctuation in the number of police killings across various encounter types over the years. Each color represents a distinct encounter type, such as Violent Crime or Traffic Stop. By analyzing the chart, we gain valuable insights into trends in law enforcement incidents. It is evident that encounters involving violent crime were the most prevalent. Additionally, 2023 stands out as a peak year for encounters, although other years also exhibit notable peaks.
# Data Preprocessing
MergedDataset <- MergedDataset %>%
mutate(Race = ifelse(is.na(`Victim's race`), "Unknown", `Victim's race`),
Fleeing = ifelse(is.na(`Fleeing (Source: WaPo and Review of Cases Not Included in WaPo Database)`), "Unknown",
`Fleeing (Source: WaPo and Review of Cases Not Included in WaPo Database)`),
Fleeing = ifelse(Fleeing %in% c("Foot", "Car"), "Fleeing", Fleeing))
# Filter the dataset based on race categories and fleeing statuses
MergedDataset <- MergedDataset %>%
filter(Race %in% c("White", "Black", "Hispanic", "Asian", "Pacific Islander", "Unknown Race", "Native American"),
Fleeing %in% c("Not Fleeing", "Fleeing"))
# Calculate the counts for each combination
counts <- MergedDataset %>%
group_by(Race, Fleeing) %>%
summarise(count = n(), .groups = 'drop') # explicitly setting .groups to 'drop'
# Check the structure of the counts data frame
str(counts)
## tibble [12 × 3] (S3: tbl_df/tbl/data.frame)
## $ Race : chr [1:12] "Asian" "Asian" "Black" "Black" ...
## $ Fleeing: chr [1:12] "Fleeing" "Not Fleeing" "Fleeing" "Not Fleeing" ...
## $ count : int [1:12] 38 135 942 1420 598 1107 47 91 17 23 ...
# Pivot the data for highcharter
pivot_data <- counts %>%
pivot_wider(names_from = Fleeing,
values_from = count,
values_fill = 0)
# Create the stacked bar chart
plot1 <- highchart() |>
hc_chart(type = "bar") |>
hc_title(text = "Police Killing Victims: Armed Status by Race and Fleeing Type", style = list(color = "white")) |>
hc_xAxis(categories = pivot_data$Race, labels = list(style = list(color = "white"))) |>
hc_yAxis(title = list(text = "Count", style = list(color = "white")), labels = list(style = list(color = "white"))) |>
hc_legend(enabled = TRUE, itemStyle = list(color = "white")) |>
hc_plotOptions(series = list(stacking = "normal")) |>
hc_add_series(name = "Not Fleeing", data = pivot_data$`Not Fleeing`, color = "blue") |>
hc_add_series(name = "Fleeing", data = pivot_data$Fleeing, color = "aliceblue") |>
hc_add_theme(hc_theme_monokai()) |> # Apply the monokai theme
hc_tooltip(shared = TRUE, borderColor = "black", style = list(color = "white"), backgroundColor = "black") |>
hc_caption(text = "Source: Mapping police voilence", align = "right", style = list(color = "white"))
# Display the chart
plot1
This stacked bar graph shows the number of police killing victims based on their race and whether they were fleeing or not. The graph reveals that the number of victims who were not fleeing (depicted in blue) surpasses those who were fleeing (depicted in light blue)
# Calculate total number of victims per state
data <- read_csv("MergedDataset.csv")
state_victims <- aggregate(`Victim's name` ~ State, data, FUN = length)
# Create a color palette
palette <- colorNumeric(palette = "Reds", domain = state_victims$`Victim's name`)
leaflet(data) %>%
addTiles() %>%
addCircleMarkers(
~Longitude, ~Latitude,
radius = 5,
color = ~palette(state_victims$`Victim's name`[match(State, state_victims$State)]),
fillOpacity = 0.7,
weight = 1,
popup = ~paste("State: ", State, "<br>",
"Total Victims: ", state_victims$`Victim's name`[match(State, state_victims$State)], "<br>",
"Victim's name: ", `Victim's name`, "<br>",
"Victim's age: ", `Victim's age`, "<br>",
"Date of Incident: ", `Date of Incident (month/day/year)`, "<br>",
"Cause of death: ", `Cause of death`, "<br>",
"Media description: ", `Media description of the circumstances surrounding the death`)
) %>%
addLegend("bottomright", pal = palette, values = state_victims$`Victim's name`,
title = "Total Victims", opacity = 1) %>%
addControl(
position = "topleft",
html = "<h3>Map of Police Killings across America</h3>"
) %>%
addControl(
position = "bottomleft",
html = "<p>Source: Mapping police violence</p>"
)
## Warning in validateCoords(lng, lat, funcName): Data contains 158836 rows with
## either missing or invalid lat/lon values and will be ignored