I will first import my libraries, set my working directory, and load the dataset.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.5.3
# Machine-specific path: change this to the folder that holds the CSV
setwd("C:/Users/SwagD/Downloads/Data 110")
collisions <- read_csv("Police_Dispatches_for_Collisions_20260423.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 155484 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): Incident_ID, Start Time, End Time, Initial Type, Close Type, Addre...
## dbl (14): Crime Reports, Crash Reports, Priority, Zip, Longitude, Latitude, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
collisions # make sure it loaded properly
## # A tibble: 155,484 × 26
## Incident_ID `Crime Reports` `Crash Reports` `Start Time` `End Time` Priority
## <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 P2600114893 NA NA 04/20/2026 1… 04/20/202… 4
## 2 P2600114884 NA NA 04/20/2026 1… 04/20/202… 0
## 3 P2600114887 NA NA 04/20/2026 1… 04/20/202… 2
## 4 P2600114901 NA NA 04/20/2026 1… 04/20/202… 0
## 5 P2600114890 NA NA 04/20/2026 1… 04/20/202… 0
## 6 P2600114781 NA NA 04/20/2026 0… 04/20/202… 2
## 7 P2600114808 NA NA 04/20/2026 1… 04/20/202… 3
## 8 P2600114843 NA NA 04/20/2026 1… 04/20/202… 2
## 9 P2600114821 NA NA 04/20/2026 1… 04/20/202… 2
## 10 P2600114812 NA NA 04/20/2026 1… 04/20/202… 2
## # ℹ 155,474 more rows
## # ℹ 20 more variables: `Initial Type` <chr>, `Close Type` <chr>, Address <chr>,
## # City <chr>, State <chr>, Zip <dbl>, Longitude <dbl>, Latitude <dbl>,
## # `Police District Number` <dbl>, Beat <chr>, PRA <chr>,
## # `CallTime CallRoute` <dbl>, `Calltime Dispatch` <dbl>,
## # `Calltime Arrive` <dbl>, `Calltime Cleared` <dbl>,
## # `CallRoute Dispatch` <dbl>, `Dispatch Arrive` <dbl>, …
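The read_csv() warning above recommends calling problems(). Below is a minimal sketch of what that call reports. It uses a small hypothetical inline CSV instead of the real file, and the value "unknown" is made up to trigger a parsing failure. On the actual data, the equivalent call would be problems(collisions).

```r
library(readr)

# Hypothetical inline CSV standing in for the real file:
# the second `priority` value cannot be parsed as a number
toy <- read_csv(I("priority\n1\nunknown\n3\n"),
                col_types = cols(priority = col_double()))

# The unparseable cell is loaded as NA
toy$priority

# problems() lists every cell readr could not parse
# (row, column, expected type, actual text)
toy_problems <- problems(toy)
toy_problems
```

If the issues turn up only in columns the analysis never uses, the warning can reasonably be ignored.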
Now I will clean the dataset.
# Lowercase all column names
names(collisions) <- tolower(names(collisions))
# Replace spaces with underscores so columns can be referenced without backticks
names(collisions) <- gsub(" ", "_", names(collisions))
# Convert types, filter, and keep only the needed variables
collisions_clean <- collisions %>%
  mutate(
    city = as.factor(city),
    zip = as.factor(zip),
    priority = as.numeric(priority),
    dispatch_arrive = as.numeric(dispatch_arrive),
    arrive_cleared = as.numeric(arrive_cleared),
    latitude = as.numeric(latitude),
    longitude = as.numeric(longitude),
    disposition_desc = as.character(disposition_desc)
  ) %>%
  # City names are still uppercase at this point
  filter(city %in% c("SILVER SPRING", "ROCKVILLE", "GAITHERSBURG", "GERMANTOWN", "BETHESDA")) %>%
  # Keep only pedestrian-injury and hit-and-run-injury collisions
  filter(disposition_desc %in% c("COLOTH-INJRY-ROAD-PED", "COLOTH-INJRY-ROAD-HITRUN")) %>%
  drop_na(latitude, longitude, dispatch_arrive) %>%
  # Keep only coordinates in the general area of the county
  filter(latitude > 38, longitude < -76) %>%
  mutate(
    # Convert to title case; rebuilding the factor also drops unused city levels
    city = as.factor(str_to_title(as.character(city))),
    collision_type = case_when(
      disposition_desc == "COLOTH-INJRY-ROAD-PED" ~ "Pedestrian Injury",
      disposition_desc == "COLOTH-INJRY-ROAD-HITRUN" ~ "Hit-and-Run Injury"
    )
  ) %>%
  select(city, zip, collision_type, priority, dispatch_arrive, arrive_cleared, latitude, longitude)
nrow(collisions_clean)
## [1] 2500
summary(collisions_clean)
## city zip collision_type priority
## Bethesda : 231 20910 : 268 Length:2500 Min. :0.0000
## Gaithersburg : 399 20902 : 234 Class :character 1st Qu.:0.0000
## Germantown : 221 20852 : 194 Mode :character Median :0.0000
## Rockville : 474 20906 : 193 Mean :0.4812
## Silver Spring:1175 20850 : 183 3rd Qu.:0.0000
## (Other):1423 Max. :4.0000
## NA's : 5
## dispatch_arrive arrive_cleared latitude longitude
## Min. : 0.0 Min. : 91 Min. :38.94 Min. :-77.29
## 1st Qu.: 138.8 1st Qu.: 1747 1st Qu.:39.02 1st Qu.:-77.18
## Median : 228.0 Median : 2972 Median :39.06 Median :-77.09
## Mean : 343.5 Mean : 4138 Mean :39.07 Mean :-77.10
## 3rd Qu.: 382.2 3rd Qu.: 5028 3rd Qu.:39.11 3rd Qu.:-77.03
## Max. :5568.0 Max. :258781 Max. :39.29 Max. :-76.94
##
# Incident count by city and collision type
collisions_clean %>%
  count(city, collision_type)
## # A tibble: 10 × 3
## city collision_type n
## <fct> <chr> <int>
## 1 Bethesda Hit-and-Run Injury 64
## 2 Bethesda Pedestrian Injury 167
## 3 Gaithersburg Hit-and-Run Injury 158
## 4 Gaithersburg Pedestrian Injury 241
## 5 Germantown Hit-and-Run Injury 80
## 6 Germantown Pedestrian Injury 141
## 7 Rockville Hit-and-Run Injury 165
## 8 Rockville Pedestrian Injury 309
## 9 Silver Spring Hit-and-Run Injury 499
## 10 Silver Spring Pedestrian Injury 676
# Incidents by zip code
collisions_clean %>%
  count(zip, sort = TRUE) %>%
  head(10)
## # A tibble: 10 × 2
## zip n
## <fct> <int>
## 1 20910 268
## 2 20902 234
## 3 20852 194
## 4 20906 193
## 5 20850 183
## 6 20877 177
## 7 20904 164
## 8 20901 161
## 9 20874 142
## 10 20814 132
# Scatter plot of response time (dispatch to arrival) vs. time on scene
ggplot(collisions_clean, aes(x = dispatch_arrive, y = arrive_cleared)) +
  geom_point(alpha = 0.5, color = "blue") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Response Time vs Time on Scene",
       x = "Dispatch to Arrival in seconds",
       y = "Arrival to Cleared in seconds")
# Summarize incident count and average response time by city and collision type
city_summary <- collisions_clean %>%
  group_by(city, collision_type) %>%
  summarize(
    count = n(),
    avg_resp = round(mean(dispatch_arrive, na.rm = TRUE), 0)
  )
## `summarise()` has grouped output by 'city'. You can override using the
## `.groups` argument.
city_summary
## # A tibble: 10 × 4
## # Groups: city [5]
## city collision_type count avg_resp
## <fct> <chr> <int> <dbl>
## 1 Bethesda Hit-and-Run Injury 64 639
## 2 Bethesda Pedestrian Injury 167 333
## 3 Gaithersburg Hit-and-Run Injury 158 419
## 4 Gaithersburg Pedestrian Injury 241 296
## 5 Germantown Hit-and-Run Injury 80 398
## 6 Germantown Pedestrian Injury 141 334
## 7 Rockville Hit-and-Run Injury 165 521
## 8 Rockville Pedestrian Injury 309 310
## 9 Silver Spring Hit-and-Run Injury 499 364
## 10 Silver Spring Pedestrian Injury 676 269
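The summarise() message above says the result is still grouped by city. Below is a minimal sketch of the `.groups` argument that the message mentions, run on hypothetical toy data with the same column names. With `.groups = "drop"`, the result is an ungrouped tibble and the message goes away. The grouped city_summary still works for the bar chart, so this change is optional.

```r
library(dplyr)

# Hypothetical toy data using the same column names as collisions_clean
toy <- tibble(
  city = c("Bethesda", "Bethesda", "Rockville"),
  collision_type = c("Pedestrian Injury", "Hit-and-Run Injury", "Pedestrian Injury"),
  dispatch_arrive = c(300, 600, 310)
)

# .groups = "drop" returns an ungrouped tibble and silences the message
toy_summary <- toy %>%
  group_by(city, collision_type) %>%
  summarize(
    count = n(),
    avg_resp = round(mean(dispatch_arrive, na.rm = TRUE), 0),
    .groups = "drop"
  )
toy_summary
```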
# Plot collision counts by city, stacked by collision type
ggplot(city_summary, aes(x = reorder(city, -count, sum), y = count, fill = collision_type)) +
  geom_col(color = "black") +
  scale_fill_brewer(palette = "Set2") +
  # After reordering, x = 1 is the city with the most incidents (Silver Spring)
  annotate("text", x = 1, y = 115,
           label = "Highest pedestrian\ninjury count",
           size = 3.2, color = "black", fontface = "italic") +
  labs(
    title = "Pedestrian & Hit-and-Run Injury Collisions by City",
    x = "City",
    y = "Number of Incidents",
    fill = "Collision Type",
    caption = "Source: Montgomery County Open Data Portal: Police Dispatches for Collisions"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
# Sample 500 points so the map is less cluttered (the seed makes the sample reproducible)
set.seed(42)
collisions_plot <- collisions_clean %>%
  slice_sample(n = min(500, nrow(collisions_clean)))
# Map each collision type to a marker color
pal <- colorFactor(c("red", "orange"), domain = collisions_plot$collision_type)
leaflet(collisions_plot) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = 5,
    color = ~pal(collision_type),
    stroke = FALSE,
    fillOpacity = 0.8,
    popup = ~paste0(
      "<strong>", city, "</strong><br>",
      "Zip: ", zip, "<br>",
      "Type: ", collision_type, "<br>",
      "Priority: ", priority, "<br>",
      "Response Time: ", dispatch_arrive, " seconds<br>",
      "Time on Scene: ", arrive_cleared, " seconds"
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~collision_type,
    title = "Collision Type",
    opacity = 0.9
  )
I cleaned the dataset in several steps:

- I lowercased every column name and replaced its spaces with underscores.
- I converted the categorical variables (city and zip) into factors.
- I filtered the data to only pedestrian-injury and hit-and-run-injury collisions in five cities in the county. I filtered by city before converting the city names to title case so that the filter values matched the original uppercase names.
- I used drop_na() only on the variables I needed and removed rows with implausible coordinates.
- I selected only the columns I would use in the analysis.
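The cleaning paragraph above rests on two ideas. The first is turning column names into lowercase, underscore-separated names. The second is that a filter must use the same letter case as the values at that point in the pipeline. Below is a minimal sketch of both on a hypothetical two-column tibble. The rename_with() one-liner is an alternative to the two names() lines used earlier, not the approach used in this analysis.

```r
library(dplyr)
library(stringr)

# Hypothetical tibble using two of the raw column names
raw <- tibble(`Crash Reports` = 1, City = "SILVER SPRING")

# Both name-cleaning steps in one call: lowercase, then spaces -> underscores
cleaned <- raw %>%
  rename_with(~ gsub(" ", "_", tolower(.x)))
names(cleaned)  # "crash_reports" "city"

# The raw values are uppercase, so an uppercase filter list matches them
"SILVER SPRING" %in% c("SILVER SPRING", "ROCKVILLE")  # TRUE
# After str_to_title() the same value no longer matches that list
str_to_title("SILVER SPRING") %in% c("SILVER SPRING", "ROCKVILLE")  # FALSE
```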
The bar chart shows that Silver Spring has the most pedestrian and hit-and-run injury collisions of the five cities. This may reflect the larger population and heavier foot traffic of the area with Silver Spring addresses compared to the other four cities. Hit-and-run injuries appeared in every city, which suggests a countywide problem rather than one limited to a single city. The map adds detail for individual incidents: clicking any dot shows its city, zip code, collision type, priority, response time, and time on scene. The zip codes with the most collisions are 20910 (268) and 20902 (234). Both are in Silver Spring, which supports the finding that Silver Spring has the most collisions.
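Raw counts favor larger areas. One way to check whether hit-and-runs are a similar problem across the county is to compute them as a share of each city's total. Below is a minimal sketch that uses the counts copied from the count(city, collision_type) output earlier. By that arithmetic, the hit-and-run share ranges from about 28% (Bethesda, 64 of 231) to about 42% (Silver Spring, 499 of 1,175).

```r
library(dplyr)

# Counts copied from the count(city, collision_type) output above
city_counts <- tibble(
  city = rep(c("Bethesda", "Gaithersburg", "Germantown", "Rockville", "Silver Spring"), each = 2),
  collision_type = rep(c("Hit-and-Run Injury", "Pedestrian Injury"), times = 5),
  n = c(64, 167, 158, 241, 80, 141, 165, 309, 499, 676)
)

# Share of each city's collisions that were hit-and-run injuries
hit_run_share <- city_counts %>%
  group_by(city) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  filter(collision_type == "Hit-and-Run Injury") %>%
  select(city, n, share)
hit_run_share
```

On the full data, the same result comes from count(collisions_clean, city, collision_type) followed by the group_by() and mutate() steps.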
This analysis covers only one year and two collision types. A future version could look across multiple years and break incidents down by time of day to see whether collisions happen more often at night or during the day.
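Below is a minimal sketch of the time-of-day breakdown, using lubridate (loaded with the tidyverse). It makes several assumptions, none confirmed by the output above, where the printed start times are truncated:

- The cleaned `start_time` column uses a "MM/DD/YYYY hh:mm:ss AM/PM" format.
- The sample times below are hypothetical.
- The 6:00 to 17:59 "day" window is an arbitrary cutoff.

If the column turns out to use a 24-hour clock, the order string would be "mdy HMS" instead.

```r
library(dplyr)
library(lubridate)

# Hypothetical start times; ASSUMES a "MM/DD/YYYY hh:mm:ss AM/PM" format
toy <- tibble(start_time = c("04/20/2026 01:30:00 PM",
                             "04/20/2026 11:05:00 PM",
                             "04/20/2026 09:15:00 AM"))

toy_times <- toy %>%
  mutate(
    # I = 12-hour clock, p = AM/PM indicator (parsing may depend on the system locale)
    start = parse_date_time(start_time, orders = "mdy IMSp", tz = "America/New_York"),
    hour = hour(start),
    # ASSUMED cutoff: 6:00 to 17:59 counts as day, everything else as night
    period = if_else(hour >= 6 & hour < 18, "Day", "Night")
  )

# Number of incidents in each period
count(toy_times, period)
```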
Dataset: Montgomery County Open Data Portal: Police Dispatches for Collisions. https://data.montgomerycountymd.gov/Public-Safety/Police-Dispatches-for-Collisions/gfmu-f97a/about_data
AI assistance: I used Gemini to find scale_y_continuous(labels = scales::comma) and scale_x_continuous(labels = scales::comma). I needed them because the Arrival to Cleared in seconds axis was labeled in scientific notation (e.g., 2e+05). I didn't know how to fix it, so I asked Gemini, and these commands changed the labels to 100,000, 200,000, and so on.
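Below is a short sketch of what the scales::comma labeler does on its own. It receives the axis break values and returns them as text with thousands separators. label_comma() builds the same kind of formatter as a function, which is the form that scale_*_continuous(labels = ...) expects.

```r
library(scales)

# Format numbers with a thousands separator instead of scientific notation
comma(c(100000, 200000))  # "100,000" "200,000"

# label_comma() returns a labeling function; ggplot2 calls it on the axis breaks
label_comma()(c(100000, 200000))  # "100,000" "200,000"
```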