Data 110 Project 2

By: Ameer Adegun

#1 Introduction

The dataset I chose for this project was the Police Dispatches for Collisions dataset from the Montgomery County Open Data Portal. Each row is a police dispatch event for a traffic collision. I focused only on the most serious collision types, pedestrian injuries and hit-and-run injuries, and specifically on the ones that occurred in the five biggest cities in the county: Silver Spring, Rockville, Gaithersburg, Germantown, and Bethesda.

I chose this dataset because traffic is something that affects me every day as I commute to college, so I want to explore how safe the roads are. I also want to understand which areas serious collisions happen in most often and how quickly police respond to these incidents. The variables I used for this project are city (categorical), zip (categorical), collision_type (categorical), priority (categorical), dispatch_arrive (quantitative, seconds from dispatch to arrival), arrive_cleared (quantitative, seconds from arrival to the call being cleared), and latitude/longitude (used for mapping).

#2 Data Cleaning

I will first load my libraries, set my working directory, and load the dataset.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.5.3
setwd("C:/Users/SwagD/Downloads/Data 110")

collisions <- read_csv("Police_Dispatches_for_Collisions_20260423.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 155484 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): Incident_ID, Start Time, End Time, Initial Type, Close Type, Addre...
## dbl (14): Crime Reports, Crash Reports, Priority, Zip, Longitude, Latitude, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
collisions # make sure it loaded properly
## # A tibble: 155,484 × 26
##    Incident_ID `Crime Reports` `Crash Reports` `Start Time`  `End Time` Priority
##    <chr>                 <dbl>           <dbl> <chr>         <chr>         <dbl>
##  1 P2600114893              NA              NA 04/20/2026 1… 04/20/202…        4
##  2 P2600114884              NA              NA 04/20/2026 1… 04/20/202…        0
##  3 P2600114887              NA              NA 04/20/2026 1… 04/20/202…        2
##  4 P2600114901              NA              NA 04/20/2026 1… 04/20/202…        0
##  5 P2600114890              NA              NA 04/20/2026 1… 04/20/202…        0
##  6 P2600114781              NA              NA 04/20/2026 0… 04/20/202…        2
##  7 P2600114808              NA              NA 04/20/2026 1… 04/20/202…        3
##  8 P2600114843              NA              NA 04/20/2026 1… 04/20/202…        2
##  9 P2600114821              NA              NA 04/20/2026 1… 04/20/202…        2
## 10 P2600114812              NA              NA 04/20/2026 1… 04/20/202…        2
## # ℹ 155,474 more rows
## # ℹ 20 more variables: `Initial Type` <chr>, `Close Type` <chr>, Address <chr>,
## #   City <chr>, State <chr>, Zip <dbl>, Longitude <dbl>, Latitude <dbl>,
## #   `Police District Number` <dbl>, Beat <chr>, PRA <chr>,
## #   `CallTime CallRoute` <dbl>, `Calltime Dispatch` <dbl>,
## #   `Calltime Arrive` <dbl>, `Calltime Cleared` <dbl>,
## #   `CallRoute Dispatch` <dbl>, `Dispatch Arrive` <dbl>, …

Now I will clean the dataset.

# Make all column names lowercase
names(collisions) <- tolower(names(collisions))

# Replace spaces in column names with underscores so they can be used without backticks
names(collisions) <- gsub(" ", "_", names(collisions))

# Convert types, filter to the five cities and two collision types, and keep only needed variables
collisions_clean <- collisions %>%
  mutate(
    city             = as.factor(city),
    zip              = as.factor(zip),
    priority         = as.numeric(priority),
    dispatch_arrive  = as.numeric(dispatch_arrive),
    arrive_cleared   = as.numeric(arrive_cleared),
    latitude         = as.numeric(latitude),
    longitude        = as.numeric(longitude),
    disposition_desc = as.character(disposition_desc)
  ) %>%
  # City names are still uppercase at this point, so match against uppercase values
  filter(city %in% c("SILVER SPRING", "ROCKVILLE", "GAITHERSBURG", "GERMANTOWN", "BETHESDA")) %>%
  filter(disposition_desc %in% c("COLOTH-INJRY-ROAD-PED", "COLOTH-INJRY-ROAD-HITRUN")) %>%
  drop_na(latitude, longitude, dispatch_arrive) %>%
  # Remove obviously invalid coordinates using a rough bounding box around the county
  filter(latitude > 38, longitude < -76) %>%
  mutate(
    # Convert city names to title case only after filtering
    city = as.factor(str_to_title(as.character(city))),
    collision_type = case_when(
      disposition_desc == "COLOTH-INJRY-ROAD-PED"    ~ "Pedestrian Injury",
      disposition_desc == "COLOTH-INJRY-ROAD-HITRUN" ~ "Hit-and-Run Injury"
    )
  ) %>%
  select(city, zip, collision_type, priority, dispatch_arrive, arrive_cleared, latitude, longitude)

nrow(collisions_clean)
## [1] 2500
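Here is a small illustration of the two renaming steps. It applies the same `tolower()` and `gsub()` calls to a few of the raw column names printed above. It is only an illustration in base R and is not part of the analysis.

```r
# A few raw column names as they appear in the CSV header
raw_names <- c("Incident_ID", "Start Time", "Dispatch Arrive", "Police District Number")

# Same two steps as the cleaning code: lowercase, then spaces -> underscores
clean_names <- gsub(" ", "_", tolower(raw_names))

print(clean_names)
```

`gsub()` replaces every space, not just the first one. That is why multi-word names like `Police District Number` become a single identifier (`police_district_number`).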

#3 Exploring the Dataset

summary(collisions_clean)
##             city           zip       collision_type        priority     
##  Bethesda     : 231   20910  : 268   Length:2500        Min.   :0.0000  
##  Gaithersburg : 399   20902  : 234   Class :character   1st Qu.:0.0000  
##  Germantown   : 221   20852  : 194   Mode  :character   Median :0.0000  
##  Rockville    : 474   20906  : 193                      Mean   :0.4812  
##  Silver Spring:1175   20850  : 183                      3rd Qu.:0.0000  
##                       (Other):1423                      Max.   :4.0000  
##                       NA's   :   5                                      
##  dispatch_arrive  arrive_cleared      latitude       longitude     
##  Min.   :   0.0   Min.   :    91   Min.   :38.94   Min.   :-77.29  
##  1st Qu.: 138.8   1st Qu.:  1747   1st Qu.:39.02   1st Qu.:-77.18  
##  Median : 228.0   Median :  2972   Median :39.06   Median :-77.09  
##  Mean   : 343.5   Mean   :  4138   Mean   :39.07   Mean   :-77.10  
##  3rd Qu.: 382.2   3rd Qu.:  5028   3rd Qu.:39.11   3rd Qu.:-77.03  
##  Max.   :5568.0   Max.   :258781   Max.   :39.29   Max.   :-76.94  
## 
# Incident count by city and collision type
collisions_clean %>%
  count(city, collision_type)
## # A tibble: 10 × 3
##    city          collision_type         n
##    <fct>         <chr>              <int>
##  1 Bethesda      Hit-and-Run Injury    64
##  2 Bethesda      Pedestrian Injury    167
##  3 Gaithersburg  Hit-and-Run Injury   158
##  4 Gaithersburg  Pedestrian Injury    241
##  5 Germantown    Hit-and-Run Injury    80
##  6 Germantown    Pedestrian Injury    141
##  7 Rockville     Hit-and-Run Injury   165
##  8 Rockville     Pedestrian Injury    309
##  9 Silver Spring Hit-and-Run Injury   499
## 10 Silver Spring Pedestrian Injury    676
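The counts above can also be turned into within-city shares. Shares make it easier to compare cities of very different sizes. The sketch below rebuilds the table from the printed counts, which are typed in by hand, so it is only as accurate as that output. It then uses dplyr's `group_by()` and `mutate()` to compute each collision type's share of its city's total.

```r
library(dplyr)

# Counts copied by hand from the count(city, collision_type) output above
city_counts <- tibble(
  city = rep(c("Bethesda", "Gaithersburg", "Germantown", "Rockville", "Silver Spring"), each = 2),
  collision_type = rep(c("Hit-and-Run Injury", "Pedestrian Injury"), times = 5),
  n = c(64, 167, 158, 241, 80, 141, 165, 309, 499, 676)
)

# Share of each collision type within its own city
city_shares <- city_counts %>%
  group_by(city) %>%
  mutate(share = round(n / sum(n), 3)) %>%
  ungroup()

city_shares
```

For example, hit-and-run injuries make up about 42% of Silver Spring's incidents (499 of 1,175), compared with about 28% in Bethesda (64 of 231).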
# Incidents by zip code
collisions_clean %>%
  count(zip, sort = TRUE) %>%
  head(10)
## # A tibble: 10 × 2
##    zip       n
##    <fct> <int>
##  1 20910   268
##  2 20902   234
##  3 20852   194
##  4 20906   193
##  5 20850   183
##  6 20877   177
##  7 20904   164
##  8 20901   161
##  9 20874   142
## 10 20814   132
# Scatter plot of response time (dispatch to arrival) vs time on scene (arrival to cleared)
ggplot(collisions_clean, aes(x = dispatch_arrive, y = arrive_cleared)) +
  geom_point(alpha = 0.5, color = "blue") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Response Time vs Time on Scene",
       x = "Dispatch to Arrival in seconds",
       y = "Arrival to Cleared in seconds")

#4 Summary by City

# Number of incidents and average response time (seconds) for each city and collision type
city_summary <- collisions_clean %>%
  group_by(city, collision_type) %>%
  summarize(
    count    = n(),
    avg_resp = round(mean(dispatch_arrive, na.rm = TRUE), 0)
  )
## `summarise()` has grouped output by 'city'. You can override using the
## `.groups` argument.
city_summary
## # A tibble: 10 × 4
## # Groups:   city [5]
##    city          collision_type     count avg_resp
##    <fct>         <chr>              <int>    <dbl>
##  1 Bethesda      Hit-and-Run Injury    64      639
##  2 Bethesda      Pedestrian Injury    167      333
##  3 Gaithersburg  Hit-and-Run Injury   158      419
##  4 Gaithersburg  Pedestrian Injury    241      296
##  5 Germantown    Hit-and-Run Injury    80      398
##  6 Germantown    Pedestrian Injury    141      334
##  7 Rockville     Hit-and-Run Injury   165      521
##  8 Rockville     Pedestrian Injury    309      310
##  9 Silver Spring Hit-and-Run Injury   499      364
## 10 Silver Spring Pedestrian Injury    676      269
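The overall summary shows a mean `dispatch_arrive` of 343.5 seconds but a median of only 228 seconds. That gap suggests a few very slow responses pull the average up, so reporting a median alongside the mean may give a more robust picture per group. The sketch below shows the difference on a small made-up vector of hypothetical response times, not values from the dataset. It also passes `.groups = "drop"` to `summarize()`, which returns an ungrouped result and avoids the grouping message printed above.

```r
library(dplyr)

# Hypothetical response times in seconds; 2000 is a slow outlier in city "A"
toy <- tibble(
  city = c("A", "A", "A", "A", "B", "B"),
  dispatch_arrive = c(100, 200, 300, 2000, 150, 250)
)

toy_summary <- toy %>%
  group_by(city) %>%
  summarize(
    count    = n(),
    avg_resp = round(mean(dispatch_arrive, na.rm = TRUE), 0),
    med_resp = round(median(dispatch_arrive, na.rm = TRUE), 0),
    .groups  = "drop"  # return an ungrouped tibble
  )

toy_summary
```

In city "A" the single outlier pushes the mean to 650 seconds, while the median stays at 250.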

#5 Final Visualization

# Stacked bar chart of collision counts by city, filled by collision type, with cities ordered by total count
ggplot(city_summary, aes(x = reorder(city, -count, sum), y = count, fill = collision_type)) +
  geom_col(color = "black") +
  scale_fill_brewer(palette = "Set2") +
  annotate("text", x = 1, y = 115,
           label = "Highest pedestrian\ninjury count",
           size = 3.2, color = "black", fontface = "italic") +
  labs(
    title   = "Pedestrian & Hit-and-Run Injury Collisions by City",
    x       = "City",
    y       = "Number of Incidents",
    fill    = "Collision Type",
    caption = "Source: Montgomery County Open Data Portal: Police Dispatches for Collisions"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
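The x-axis uses `reorder(city, -count, sum)` from the stats package. This call sums `count` across both collision types for each city and sorts the levels by the negated totals, so the city with the most incidents comes first. That ordering is why `annotate()` at `x = 1` lands on Silver Spring. The snippet below checks the ordering in base R using the counts from `city_summary`.

```r
# Counts by city and collision type, copied from city_summary above
city  <- rep(c("Bethesda", "Gaithersburg", "Germantown", "Rockville", "Silver Spring"), each = 2)
count <- c(64, 167, 158, 241, 80, 141, 165, 309, 499, 676)

# Sum -count within each city, then order levels from smallest (most negative) to largest
ordered_city <- reorder(city, -count, sum)

levels(ordered_city)
```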

#6 Map Visualization

# Randomly sample up to 500 points so the map is less cluttered (seed set for reproducibility)
set.seed(42)
collisions_plot <- collisions_clean %>%
  slice_sample(n = min(500, nrow(collisions_clean)))

pal <- colorFactor(c("red", "orange"), domain = collisions_plot$collision_type)

leaflet(collisions_plot) %>%
  addTiles() %>%
  addCircleMarkers(
    lng         = ~longitude,
    lat         = ~latitude,
    radius      = 5,
    color       = ~pal(collision_type),
    stroke      = FALSE,
    fillOpacity = 0.8,
    popup = ~paste0(
      "<strong>", city, "</strong><br>",
      "Zip: ", zip, "<br>",
      "Type: ", collision_type, "<br>",
      "Priority: ", priority, "<br>",
      "Response Time: ", dispatch_arrive, " seconds<br>",
      "Time on Scene: ", arrive_cleared, " seconds"
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal      = pal,
    values   = ~collision_type,
    title    = "Collision Type",
    opacity  = 0.9
  )

#7 Conclusion

Process:

I cleaned the dataset by making every column name lowercase and replacing spaces with underscores, converting city and zip to factors, and filtering down to only pedestrian injury and hit-and-run injury collisions in the five cities. I filtered on city before converting the city names to title case so that the filter values still matched the uppercase names in the raw data. I then used drop_na() on only the variables I needed (latitude, longitude, and dispatch_arrive), removed rows with coordinates outside a rough box around the county, and selected only the columns I would use in the analysis.
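The order of operations matters because `%in%` is an exact, case-sensitive match. Here is a small sketch with hypothetical city values. Filtering against the uppercase names works on the raw values, but it matches nothing once the names have been converted with `str_to_title()`.

```r
library(stringr)

# Hypothetical raw city values, uppercase as in the original data
raw_city <- c("SILVER SPRING", "BETHESDA", "OLNEY")
keep <- c("SILVER SPRING", "ROCKVILLE", "GAITHERSBURG", "GERMANTOWN", "BETHESDA")

# Filter first, then convert to title case (the order used in the cleaning code)
filtered_first <- str_to_title(raw_city[raw_city %in% keep])

# Convert first, then filter: title-case values no longer match the uppercase list
converted_first <- str_to_title(raw_city)
filtered_after <- converted_first[converted_first %in% keep]

filtered_first
filtered_after
```

`filtered_first` keeps "Silver Spring" and "Bethesda", while `filtered_after` is empty.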

Visualization Insight:

The bar chart shows that Silver Spring has far more serious collisions than any of the other cities I looked at (1,175 of the 2,500 incidents). This may reflect its larger population and heavier foot traffic compared to the other four cities. Hit-and-run injuries appeared in every city, suggesting this is a countywide problem rather than a city-specific one. The map adds location-level detail: clicking any dot shows that incident's zip code, collision type, priority, response time, and time on scene. The zip codes with the most collisions are 20910 (268) and 20902 (234), both in Silver Spring, which backs up the finding that Silver Spring has the most collisions.

Limitations:

This analysis only covers one year and two collision types. A future version could look across multiple years and break down incidents by time of day to see whether collisions happen more often at night or during the day.
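One possible starting point for the time-of-day breakdown is to parse the `start_time` column with lubridate and extract the hour. This is only a sketch under several assumptions:

- The cleaned data does not keep `start_time`, so the example timestamps below are made up.
- It assumes the column uses a month/day/year format with a 24-hour clock. This should be checked against the raw file, since the printed output above is truncated.
- The 6:00 to 18:00 cutoff for "Day" is an arbitrary choice.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps; ASSUMES "MM/DD/YYYY HH:MM:SS" with a 24-hour clock
toy_times <- tibble(start_time = c("04/20/2026 08:15:00", "04/20/2026 14:05:00", "04/20/2026 22:40:00"))

toy_times <- toy_times %>%
  mutate(
    start_dt    = mdy_hms(start_time),
    start_hour  = hour(start_dt),
    # Arbitrary cutoff: 6:00 up to (but not including) 18:00 counts as "Day"
    time_of_day = if_else(start_hour >= 6 & start_hour < 18, "Day", "Night")
  )

count(toy_times, time_of_day)
```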

Sources/AI Usage:

  • Dataset: Police Dispatches for Collisions, Montgomery County Open Data Portal. https://data.montgomerycountymd.gov/Public-Safety/Police-Dispatches-for-Collisions/gfmu-f97a/about_data

  • AI assistance: I used Gemini to get scale_y_continuous(labels = scales::comma) and scale_x_continuous(labels = scales::comma). The Arrival to Cleared in seconds axis was showing values in scientific notation, such as 2e+05. I didn't know how to fix it, so I asked Gemini, and it gave me these commands, which display those values as 100,000 and 200,000.
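The same difference shows up outside of a plot. Base R prints large round numbers in scientific notation, while `scales::comma()` (the same labelling function passed to `labels =` in the scatter plot) returns strings with thousands separators.

```r
values <- c(100000, 200000)

# Default formatting switches to scientific notation for these values
format(values)

# scales::comma() formats them with thousands separators instead
scales::comma(values)
```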