Project 2

Author

Myriam O.

Project 2 Assignment

Introduction

My project is about police-dispatched incidents from Montgomery County Data. The dataset records police calls, including incident type, priority level, city, ZIP code, dispatch time, start and end time, address, location, police district, and disposition description. I chose this topic because I was curious about how incidents are distributed across cities and how dispatch time changes by priority and incident type.

I focused on both categorical and quantitative variables. The categorical variables are Close Type, City, Zip, Priority, and Disposition Desc. The main quantitative variable is Calltime Dispatch. I used Latitude and Longitude for the map, Calltime Dispatch to study response time, and Priority to compare incident urgency. To clean the data, I selected the most useful variables, removed bad values, and filtered the dataset to focus on what I wanted to show.

Load library

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(leaflet)

Read data

setwd("~/Downloads/First data 110 assignment_files")
police <- read_csv("Police_Dispatched_Incidents_2026 .csv")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 239695 Columns: 26
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): Incident_ID, Start Time, End Time, Initial Type, Close Type, Addre...
dbl (15): Crime Reports, Crash Reports, Priority, Zip, Longitude, Latitude, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Select most important variables

police_clean <- police |>
  select(`Close Type`, Priority, City, Zip, `Police District Number`,Latitude, 
         Longitude, `Calltime Dispatch`, `Disposition Desc`)

I used select to keep only the variables I needed. This helps reduce unnecessary variables and makes the dataset easier to work with for my visualization.
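As a small illustration of this step, here is a minimal sketch on made-up data. The `toy` tibble and its values are assumptions for the example, not rows from the real dataset. It shows that column names containing spaces must be wrapped in backticks inside `select()`.

```r
library(dplyr)

# Toy data standing in for the police dataset (values are invented)
toy <- tibble(
  `Close Type` = c("CHECK WELFARE", "TRAFFIC VIOLATION"),
  Priority     = c(2, 3),
  `Start Time` = c("01/01/2026 08:00", "01/01/2026 09:30"),
  City         = c("ROCKVILLE", "SILVER SPRING")
)

# Backticks are required for names with spaces; plain names need none
toy_small <- toy |>
  select(`Close Type`, Priority, City)

names(toy_small)
```

Only the three named columns remain, in the order they were listed.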

Top 7 incident types

top7_types <- police_clean |>
  count(`Close Type`) |>
  arrange(desc(n)) |>
  slice_head(n = 7)
top7_types

# A tibble: 7 × 2
  `Close Type`                          n
  <chr>                             <int>
1 TRAFFIC/TRANSPORTATION INCIDENT   18541
2 SUSPICIOUS CIRC, PERSONS, VEHICLE 17761
3 CHECK WELFARE                     13490
4 DISTURBANCE/NUISANCE              11506
5 TRESPASSING/UNWANTED              10464
6 TRAFFIC VIOLATION                  9837
7 DOMESTIC DISTURBANCE/VIOLENCE      9787

I decided to use the top 7 incident types because it helped me focus on the most common incidents. I used count to count the number of incidents in each type, arrange to sort them from highest to lowest, and slice_head to keep only the top 7.
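The same count, sort, and slice pattern can be tried on a tiny example. This is a sketch with invented incident labels (`A`, `B`, `C`), and it also shows that `count()` accepts `sort = TRUE`, which should give the same ordering as a separate `arrange(desc(n))` step.

```r
library(dplyr)

# Invented incident types: A appears 3 times, B twice, C once
toy <- tibble(`Close Type` = c("A", "B", "A", "C", "A", "B"))

# Long form, mirroring the steps used in the project
top2_long <- toy |>
  count(`Close Type`) |>
  arrange(desc(n)) |>
  slice_head(n = 2)

# Shorter form: sort = TRUE orders the counts from highest to lowest
top2_short <- toy |>
  count(`Close Type`, sort = TRUE) |>
  slice_head(n = 2)

top2_short
```

Either form works. The shorter one just saves a line when the only goal is a ranked table.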

Explore incident priority levels

ggplot(police_clean, aes(x = factor(Priority), fill = factor(Priority))) +
  geom_bar() +
  labs(title = "Distribution of Incident Priority Levels",
       x = "Priority",
       y = "Count",
       fill = "Priority Level",
       caption = "Source: Montgomery County Data") +
  theme_minimal()

I created this bar chart to explore the distribution of incident priority levels and compare how often each priority appears in the dataset.

Explore cities with the most incidents

top5_city <- police_clean |>
  count(City) |>
  arrange(desc(n)) |>
  slice_head(n = 5)

ggplot(top5_city, aes(x = reorder(City, n), y = n, fill = City)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Top 5 cities with the most Incidents",
       x = "City",
       y = "Number of Incidents",
       caption = "Source: Montgomery County Data") +
  theme_minimal() +
  theme(legend.position = "none")

I created this bar chart to see which cities had the most incidents, so that I could choose one or two cities to focus on for my final visualization.

Explore call dispatch times

ggplot(police_clean, aes(y = `Calltime Dispatch`)) +
  geom_boxplot() +
  labs(title = "Distribution of Call Dispatch Times",
       y = "Calltime Dispatch") +
  theme_bw()

Warning: Removed 14642 rows containing non-finite outside the scale range
(`stat_boxplot()`).

I created this boxplot to explore the distribution of call dispatch times and identify possible outliers. This helped me see that some dispatch times were very high, so I decided to remove extreme values and focus on a reasonable range.
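One common way to turn "very high" into a number is the boxplot's own rule: values above Q3 + 1.5 × IQR are drawn as outliers. The sketch below applies that rule to invented dispatch times. It is an alternative way to pick a cutoff, not the method used in this project, where the limit was chosen by looking at the plot.

```r
# Invented dispatch times; NA stands in for a missing value
dispatch <- c(12, 30, 45, 60, 75, 90, 110, 2500, NA)

# First and third quartiles, ignoring missing values
q <- quantile(dispatch, probs = c(0.25, 0.75), na.rm = TRUE)

# Upper fence used by a standard boxplot: Q3 + 1.5 * IQR
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

# Keep only non-missing values at or below the fence
kept <- dispatch[!is.na(dispatch) & dispatch <= upper_fence]

upper_fence
kept
```

On real data the fence can land far from a round number, so a fixed cutoff may still be easier to explain in a report.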

Remove bad values and filter dataset

police_clean2 <- police_clean |>
  filter(`Close Type` %in% top7_types$`Close Type`) |>
  filter(!is.na(`Calltime Dispatch`)) |>
  filter(`Calltime Dispatch` <= 200) |>
  filter(Longitude != 0, Latitude != 0) |>
  filter(City %in% c("ROCKVILLE", "SILVER SPRING")) |>
  filter(`Disposition Desc` == "FAMILYTROUBLE")

I used filter to remove bad values and focus the dataset on what I wanted to analyze. I removed missing dispatch times, dispatch times above 200, and rows whose latitude or longitude was 0, and I kept only the top 7 incident types. I also chose the cities Rockville and Silver Spring because they had the highest number of incidents. Finally, I kept only the rows whose disposition description was Family Trouble because that outcome interested me.
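To see how these conditions combine, here is a minimal sketch on five invented rows (the `toy` tibble is an assumption for illustration). Conditions separated by commas inside one `filter()` call are joined with AND, so it behaves like the chain of separate `filter()` calls above, and `nrow()` gives a quick check on how many observations are left.

```r
library(dplyr)

# Five invented rows, each failing a different condition except the first
toy <- tibble(
  City                = c("ROCKVILLE", "SILVER SPRING", "BETHESDA",
                          "ROCKVILLE", "ROCKVILLE"),
  `Calltime Dispatch` = c(40, NA, 55, 900, 120),
  Longitude           = c(-77.15, -77.03, -77.10, -77.15, 0),
  Latitude            = c(39.08, 39.00, 38.98, 39.08, 0)
)

toy_clean <- toy |>
  filter(
    !is.na(`Calltime Dispatch`),               # drop missing dispatch times
    `Calltime Dispatch` <= 200,                # drop extreme dispatch times
    Longitude != 0, Latitude != 0,             # drop zero coordinates
    City %in% c("ROCKVILLE", "SILVER SPRING")  # keep the two chosen cities
  )

nrow(toy_clean)
```

Running `nrow()` after each added condition is a simple way to see which filter removes the most rows.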

Plot 1: Heatmap

heat_data <- police_clean2 |>
  group_by(`Close Type`, Priority) |>
  summarize(avg_dispatch = mean(`Calltime Dispatch`), .groups = "drop") |>
  filter(`Close Type` %in% c(
    "DOMESTIC DISTURBANCE/VIOLENCE",
    "SUSPICIOUS CIRC, PERSONS, VEHICLE",
    "DISTURBANCE/NUISANCE"
  ))

p1 <- ggplot(heat_data,
  aes(x = factor(Priority),
      y = `Close Type`,
      fill = avg_dispatch,
      # Tooltip text shown by ggplotly()
      text = paste(
        "Priority:", Priority,
        "<br>Average Dispatch:", round(avg_dispatch, 1)
      ))) +
  geom_tile() +
  scale_fill_gradientn(colors = c("gold", "darkgreen", "purple")) +
  # "\n" breaks the title without embedding the source indentation
  labs(title = "Average Dispatch Time by\nPriority and Incident Type",
       x = "Priority",
       y = "Incident Type",
       fill = "Avg Dispatch") +
  theme_minimal()

ggplotly(p1, tooltip="text")

Final plot: Map

pal <- colorFactor(c("red", "gold"), domain = police_clean2$City)

leaflet(police_clean2) |>
  addProviderTiles(providers$Esri.WorldStreetMap) |>
  setView(lng = -77.10, lat = 39.08, zoom = 12) |>
  addCircleMarkers(
    lng = ~Longitude,
    lat = ~Latitude,
    radius = ~`Calltime Dispatch` /30,
    color = ~pal(City),
    fillOpacity = 0.75,
    popup = ~paste("<b>Incident:</b>", `Close Type`,
                   "<br><b>Priority:</b>", Priority,
                   "<br><b>City:</b>", City,
                   "<br><b>Dispatch time:</b>", `Calltime Dispatch`,
                   "<br><b>Zip code:</b>", Zip
                   )
    ) |>
  addLegend(position = "bottomright",
            pal = pal,
            values = ~City,
            title = "City")

Essay

I built the heatmap from the most frequently reported incident types because I wanted to compare how dispatch time changes across the incidents that appeared most often in the dataset. It shows the relationship between incident type, priority level, and average dispatch time. One interesting thing I found was that some incident types and priorities had higher dispatch times than others.
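Each heatmap tile is a group average. The sketch below reproduces that calculation on invented rows (the `toy` tibble and the labels `A` and `B` are assumptions). It also adds a hypothetical `n_calls` column, which is not in the project code, to show how many calls sit behind each average.

```r
library(dplyr)

# Invented calls: type, priority, and dispatch time
toy <- tibble(
  `Close Type`        = c("A", "A", "A", "B", "B"),
  Priority            = c(1, 1, 2, 1, 1),
  `Calltime Dispatch` = c(10, 30, 50, 20, 40)
)

# One row per (type, priority) pair, as in one heatmap tile
tile_means <- toy |>
  group_by(`Close Type`, Priority) |>
  summarize(
    avg_dispatch = mean(`Calltime Dispatch`),  # value mapped to tile color
    n_calls      = n(),                        # calls behind the average
    .groups      = "drop"
  )

tile_means
```

A tile backed by only one or two calls can look extreme by chance, so the count helps when comparing tiles.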

The map shows the location of the incidents. Color identifies the city, with one color for Rockville and another for Silver Spring. When we click a point, a popup shows the incident type, priority, city, dispatch time, and zip code. The size of each bubble depends on dispatch time: larger bubbles represent higher dispatch times and smaller bubbles represent lower dispatch times. I noticed on the map that some areas within the cities have more incidents than others, especially where points appear more clustered, which may show higher activity in those locations. I also noticed that some places have larger bubbles, showing that some incidents took longer to respond to than others.
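The bubble sizes follow from simple arithmetic: the map code divides dispatch time by 30, and circle marker radii are measured in pixels. The sketch below works through a few invented dispatch values. The `pmax()` floor of 2 pixels is an assumption added for illustration, not part of the project code.

```r
# Invented dispatch times, including the largest value kept by the filter (200)
dispatch <- c(15, 60, 200)

# Same scaling as the map: radius in pixels = dispatch time / 30
radius <- dispatch / 30

# Optional tweak (assumption): enforce a minimum radius so short calls stay visible
radius_floor <- pmax(radius, 2)

round(radius, 2)
round(radius_floor, 2)
```

With dispatch times capped at 200, the largest bubble has a radius under 7 pixels, which keeps the markers from covering each other too much.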

One thing I wish I could have done was include more cities, but since the dataset had to stay under 800 observations, it was difficult to use more than two cities. I also wish I could have compared more incident types, but I kept the analysis focused so the visualizations stayed clear, meaningful, and met the project requirements.