Crime in Montgomery County

Author

Daniel Bland

1. Introduction

Dataset topic

The dataset topic is Crime in Montgomery County, MD. The data is provided by the Police Department from Montgomery County and updated daily.

Variables

Overview of data headers documented in the appendix of the data source

Display Order	Column	Field Description
1	Incident ID	Police Incident Number
2	CR Number	Police Report Number
3	Dispatch Date / Time	The actual date and time a Officer was dispatched
4	Class	Four-digit code identifying the crime type of the incident
5	Class Description	Common name description of the incident class type
6	Police District Name	Name of District (Rockville, Wheaton etc.)
7	Block Address	Address in 100 block level
8	City	City
9	State	State
10	Zip Code	Zip code
11	Agency	Assigned Police Department
12	Place	Place description
13	Police Sector	Police Sector Name
14	Beat	Police patrol area subset within District
15	PRA	Police patrol area subset within Beat
16	Start Date / Time	Occurred from date/time
17	End Date / Time	Occurred to date/time
18	Latitude	Latitude
19	Longitude	Longitude
20	Police District Number	Major Police Boundary
21	Location	Location

What I plan to explore

I plan on exploring the change in crime over time because it has become a hot button issue in politics lately, and I want to explore the Montgomery County Dataset to see how the trends look like. People have speculated that it could be due to various factors, ranging from the effects of the pandemic to claims that it might be influenced by the political climate, such as the ‘defund the police’ campaigns following the events of the Great Reckoning. Both of these factors of course affect demoralization, crime rates, and staffing issues. This exploration is intended to identify any causation or even correlation, but just to see if there is a common trend that affects Montgomery County locally.

Source of dataset

Data Collected and Exported From: Open Montgomery
Data Provided By: Montgomery County, MD
Link to dataset at Open Montgomery:
https://data.montgomerycountymd.gov/Public-Safety/Crime/icn6-v9z3
Link to Direct Download:
https://data.montgomerycountymd.gov/api/views/icn6-v9z3/rows.csv?date=20231017&accessType=DOWNLOAD&bom=true&format=true

2. Load Libraries and Dataset

Load libraries

library(tidyverse)
library(ggplot2)
library(dplyr)
library(RColorBrewer)

Set working directory and load dataset

setwd("C:/Users/Administrator/Documents/Data110/Datasets") # set working directory
crimes <- read_csv("Crime_20231017.csv") # read csv and load dataset

3. Perform necessary cleaning

Clean headers and view data

names(crimes) <- tolower(names(crimes)) # make all headers lowercase
names(crimes) <- gsub(" ","",names(crimes)) # remove any spaces
names(crimes) <- gsub("/","_",names(crimes)) # remove slashes

Show headers and data

head(crimes) # show headers from dataset

# A tibble: 6 × 30
  incidentid offencecode  crnumber dispatchdate_time      start_date_time       
       <dbl> <chr>           <dbl> <chr>                  <chr>                 
1  201127168 3582         17014976 <NA>                   03/23/2017 11:33:00 PM
2  201095140 2303         16043118 <NA>                   08/24/2016 09:47:00 PM
3  201090710 5707         16037677 <NA>                   07/25/2016 05:31:00 PM
4  201101176 2303         16050746 <NA>                   10/03/2016 12:09:00 PM
5  201091832 2303         16038909 <NA>                   08/01/2016 01:45:00 PM
6  201296318 2204        200029321 07/28/2020 02:55:00 PM 07/28/2020 02:54:00 PM
# ℹ 25 more variables: end_date_time <chr>, nibrscode <chr>, victims <dbl>,
#   crimename1 <chr>, crimename2 <chr>, crimename3 <chr>,
#   policedistrictname <chr>, blockaddress <chr>, city <chr>, state <chr>,
#   zipcode <dbl>, agency <chr>, place <chr>, sector <chr>, beat <chr>,
#   pra <chr>, addressnumber <dbl>, streetprefix <chr>, streetname <chr>,
#   streetsuffix <chr>, streettype <chr>, latitude <dbl>, longitude <dbl>,
#   policedistrictnumber <chr>, location <chr>

Remove NA data on variables, convert date columns, and add delay

#convert varchar to datetime and add year
crimes <- crimes %>%
  mutate(start_date_time = as.POSIXct(start_date_time, format = "%m/%d/%Y %I:%M:%S %p"),
         year = as.integer(format(start_date_time, "%Y")))

# Print Summary
summary(crimes)

   incidentid        offencecode           crnumber         dispatchdate_time 
 Min.   :201087096   Length:277372      Min.   : 10046679   Length:277372     
 1st Qu.:201175261   Class :character   1st Qu.:180007692   Class :character  
 Median :201264067   Mode  :character   Median :190053156   Mode  :character  
 Mean   :201265588                      Mean   :178068289                     
 3rd Qu.:201355646                      3rd Qu.:210048564                     
 Max.   :201448819                      Max.   :230101499                     
                                                                              
 start_date_time                  end_date_time       nibrscode        
 Min.   :2016-07-01 00:00:00.00   Length:277372      Length:277372     
 1st Qu.:2018-02-06 15:15:00.00   Class :character   Class :character  
 Median :2019-10-27 03:33:00.00   Mode  :character   Mode  :character  
 Mean   :2019-12-25 21:27:11.48                                        
 3rd Qu.:2021-11-18 11:05:30.00                                        
 Max.   :2023-10-16 19:32:00.00                                        
 NA's   :5                                                             
    victims        crimename1         crimename2         crimename3       
 Min.   : 1.000   Length:277372      Length:277372      Length:277372     
 1st Qu.: 1.000   Class :character   Class :character   Class :character  
 Median : 1.000   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 1.028                                                           
 3rd Qu.: 1.000                                                           
 Max.   :22.000                                                           
                                                                          
 policedistrictname blockaddress           city              state          
 Length:277372      Length:277372      Length:277372      Length:277372     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
    zipcode         agency             place              sector         
 Min.   :    6   Length:277372      Length:277372      Length:277372     
 1st Qu.:20854   Class :character   Class :character   Class :character  
 Median :20878   Mode  :character   Mode  :character   Mode  :character  
 Mean   :20877                                                           
 3rd Qu.:20904                                                           
 Max.   :29878                                                           
 NA's   :2394                                                            
     beat               pra            addressnumber     streetprefix      
 Length:277372      Length:277372      Min.   :      0   Length:277372     
 Class :character   Class :character   1st Qu.:   1700   Class :character  
 Mode  :character   Mode  :character   Median :   8200   Mode  :character  
                                       Mean   :   8533                     
                                       3rd Qu.:  12400                     
                                       Max.   :2090600                     
                                       NA's   :26533                       
  streetname        streetsuffix        streettype           latitude    
 Length:277372      Length:277372      Length:277372      Min.   : 0.00  
 Class :character   Class :character   Class :character   1st Qu.:39.02  
 Mode  :character   Mode  :character   Mode  :character   Median :39.07  
                                                          Mean   :38.76  
                                                          3rd Qu.:39.14  
                                                          Max.   :39.34  
                                                                         
   longitude      policedistrictnumber   location              year     
 Min.   :-77.52   Length:277372        Length:277372      Min.   :2016  
 1st Qu.:-77.20   Class :character     Class :character   1st Qu.:2018  
 Median :-77.10   Mode  :character     Mode  :character   Median :2019  
 Mean   :-76.49                                           Mean   :2019  
 3rd Qu.:-77.03                                           3rd Qu.:2021  
 Max.   :  0.00                                           Max.   :2023  
                                                          NA's   :5

Remove NA from variables being used

# Clean NA data on columns I want to use
crimes_cleaned <- crimes |> 
  filter(!is.na(start_date_time) &
         !is.na(victims) &
         !is.na(crimename1) &
         !is.na(city))

4. Explore dataset

Show final headers and summary of dataset

head(crimes_cleaned) # show headers from dataset

# A tibble: 6 × 31
  incidentid offencecode  crnumber dispatchdate_time      start_date_time    
       <dbl> <chr>           <dbl> <chr>                  <dttm>             
1  201127168 3582         17014976 <NA>                   2017-03-23 23:33:00
2  201095140 2303         16043118 <NA>                   2016-08-24 21:47:00
3  201090710 5707         16037677 <NA>                   2016-07-25 17:31:00
4  201101176 2303         16050746 <NA>                   2016-10-03 12:09:00
5  201091832 2303         16038909 <NA>                   2016-08-01 13:45:00
6  201296318 2204        200029321 07/28/2020 02:55:00 PM 2020-07-28 14:54:00
# ℹ 26 more variables: end_date_time <chr>, nibrscode <chr>, victims <dbl>,
#   crimename1 <chr>, crimename2 <chr>, crimename3 <chr>,
#   policedistrictname <chr>, blockaddress <chr>, city <chr>, state <chr>,
#   zipcode <dbl>, agency <chr>, place <chr>, sector <chr>, beat <chr>,
#   pra <chr>, addressnumber <dbl>, streetprefix <chr>, streetname <chr>,
#   streetsuffix <chr>, streettype <chr>, latitude <dbl>, longitude <dbl>,
#   policedistrictnumber <chr>, location <chr>, year <int>

summary(crimes_cleaned) # show headers from dataset

   incidentid        offencecode           crnumber         dispatchdate_time 
 Min.   :201087096   Length:277354      Min.   : 10046679   Length:277354     
 1st Qu.:201175271   Class :character   1st Qu.:180007717   Class :character  
 Median :201264074   Mode  :character   Median :190053166   Mode  :character  
 Mean   :201265594                      Mean   :178069838                     
 3rd Qu.:201355648                      3rd Qu.:210048572                     
 Max.   :201448819                      Max.   :230101499                     
                                                                              
 start_date_time                 end_date_time       nibrscode        
 Min.   :2016-07-01 00:00:00.0   Length:277354      Length:277354     
 1st Qu.:2018-02-06 16:04:15.0   Class :character   Class :character  
 Median :2019-10-27 06:04:00.0   Mode  :character   Mode  :character  
 Mean   :2019-12-25 22:30:53.5                                        
 3rd Qu.:2021-11-18 11:55:45.0                                        
 Max.   :2023-10-16 19:32:00.0                                        
                                                                      
    victims        crimename1         crimename2         crimename3       
 Min.   : 1.000   Length:277354      Length:277354      Length:277354     
 1st Qu.: 1.000   Class :character   Class :character   Class :character  
 Median : 1.000   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 1.028                                                           
 3rd Qu.: 1.000                                                           
 Max.   :22.000                                                           
                                                                          
 policedistrictname blockaddress           city              state          
 Length:277354      Length:277354      Length:277354      Length:277354     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
    zipcode         agency             place              sector         
 Min.   :    6   Length:277354      Length:277354      Length:277354     
 1st Qu.:20854   Class :character   Class :character   Class :character  
 Median :20878   Mode  :character   Mode  :character   Mode  :character  
 Mean   :20877                                                           
 3rd Qu.:20904                                                           
 Max.   :29878                                                           
 NA's   :2394                                                            
     beat               pra            addressnumber     streetprefix      
 Length:277354      Length:277354      Min.   :      0   Length:277354     
 Class :character   Class :character   1st Qu.:   1700   Class :character  
 Mode  :character   Mode  :character   Median :   8200   Mode  :character  
                                       Mean   :   8533                     
                                       3rd Qu.:  12400                     
                                       Max.   :2090600                     
                                       NA's   :26520                       
  streetname        streetsuffix        streettype           latitude    
 Length:277354      Length:277354      Length:277354      Min.   : 0.00  
 Class :character   Class :character   Class :character   1st Qu.:39.02  
 Mode  :character   Mode  :character   Mode  :character   Median :39.07  
                                                          Mean   :38.76  
                                                          3rd Qu.:39.14  
                                                          Max.   :39.34  
                                                                         
   longitude      policedistrictnumber   location              year     
 Min.   :-77.52   Length:277354        Length:277354      Min.   :2016  
 1st Qu.:-77.20   Class :character     Class :character   1st Qu.:2018  
 Median :-77.10   Mode  :character     Mode  :character   Median :2019  
 Mean   :-76.49                                           Mean   :2019  
 3rd Qu.:-77.03                                           3rd Qu.:2021  
 Max.   :  0.00                                           Max.   :2023

Count distinct crime categories for three columns and min/max of date to get range

n_distinct(crimes_cleaned$crimename1) # Count distinct crimename1

[1] 3

n_distinct(crimes_cleaned$crimename2) # Count distinct crimename2

[1] 57

n_distinct(crimes_cleaned$crimename3) # Count distinct crimename3

[1] 267

n_distinct(crimes_cleaned$city) # Count distinct crimename3

[1] 247

min(crimes_cleaned$start_date_time) # min start date-time

[1] "2016-07-01 EDT"

max(crimes_cleaned$start_date_time) # max start date-time

[1] "2023-10-16 19:32:00 EDT"

Group by for the Visualization

Using the cleaned data, I grouped by year, city, and crime type and counted the number of victims. The data was filtered to reduce crowding. Three cities where chose along 270 in the immediate campus areas and 2016 & 2023 was omitted because there’s not enough data to complete the year.

cities <- crimes_cleaned |> # Group by for the Visualization
  filter(tolower(city) %in% tolower(c("Rockville", "Gaithersburg", "Germantown")), # Filter three cities
         year >= 2017, year <= 2022) |>  # Filter between 2017 and 2022 
  group_by(year, city, crimename1) |> # Group by year, city, and crime type
  summarize(victim_count = sum(victims)) |> # count victims
  arrange(desc(year), city, crimename1)

`summarise()` has grouped output by 'year', 'city'. You can override using the
`.groups` argument.

View new table

cities

# A tibble: 54 × 4
# Groups:   year, city [18]
    year city         crimename1             victim_count
   <int> <chr>        <chr>                         <dbl>
 1  2022 GAITHERSBURG Crime Against Person           1029
 2  2022 GAITHERSBURG Crime Against Property         3168
 3  2022 GAITHERSBURG Crime Against Society          1280
 4  2022 GERMANTOWN   Crime Against Person            787
 5  2022 GERMANTOWN   Crime Against Property         2272
 6  2022 GERMANTOWN   Crime Against Society           593
 7  2022 ROCKVILLE    Crime Against Person           1051
 8  2022 ROCKVILLE    Crime Against Property         3351
 9  2022 ROCKVILLE    Crime Against Society           502
10  2021 GAITHERSBURG Crime Against Person            958
# ℹ 44 more rows

5. Create Visualization

Visualization <- cities |>
  ggplot() +
  
  # Stacked bar chart with year on X axis, victim count on y axis, and crime type stacked
  geom_bar(aes(x = year, y = victim_count, fill = crimename1), 
           position = "stack", stat = "identity") + 
  
  # Use color palette
  scale_fill_manual(values = brewer.pal(n = 9, name = "Set2")) +  
  
  # Text labels
  labs(
    fill = "Crime Types",
    y = "Number of Victims",
    x = "Year",
    title = "Crime in 3 Major Cities in Montgomery County 2017-2022",
    caption = "Source: Montgomery County, MD Police Department"
  ) +
  
  # Add comma to y scale
  scale_y_continuous(labels = scales::comma, name = "Number of Victims") + 
  
  # Theme custumization
  theme_minimal() +
  theme(
    # title
    plot.title = element_text(size = 14, hjust = 0.5, margin = margin(b = 15)), 
    
    # x axis
    axis.text.x = element_text(size = 11, angle = 45, hjust = 1),
    axis.title.x = element_text(size = 12, face = "bold", vjust = 1),    
    
    # y axis
    axis.text.y = element_text(size = 11, angle = 45, hjust = 1),
    axis.title.y = element_text(size = 12, face = "bold", vjust = 1),
    
    # legend
    legend.title = element_text(size = 11,face = "bold",hjust=.5),
    legend.text = element_text(size = 10),
    
    # caption
    plot.caption = element_text(size = 10, hjust = 0.5),
    
    # panel / gridlines
    panel.grid.major = element_line(color = "lightgray", size = 0.2),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "white", color = "lightgray"),
    panel.border = element_blank()
  ) +
  
  # Seperate cities in bar chart and label
  facet_grid(. ~ city)

Warning in brewer.pal(n = 9, name = "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors

Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.

Visualization

6. Essay

a. Cleaning
I started cleaning the dataset by looking at the headers and summary. I chose the columns that I wanted to explore, which includes city, a date column, number of victims, and crime type. I fixed the headers and converted date that were in chr to a date type and looked at the NA. I removed NA from those columns and looked at the range of data of min/max and distinct counts to see if those are usable. I decided on the crime type that was a higher level, filtered out the years that were not complete, and chose three highly populated cities that are the most relevant to our area and the campuses, which are namely Rockville, Gaithersburg, and Germantown.

b. Interpretation of the Visualization
Total victim count is displayed over time by year on the x-axis and the victim count on the y-axis. The crime type is stacked on the bar chart for additional information to eyeball categories by color. Each city is separated so they can be easily compared independently, but can also be compared to other cities despite the data not being normalized by utilizing per capita and dividing by population size. The more important thing I was checking was if all three areas have similar trends and it looks like for each of the three cities crime was lowest in 2021, which makes sense. Most people were still staying inside and there was not that much opportunity for crime. Likewise, when everything opened, there could have been a race condition where crime was spiking, but departments were understaffed. Not verified of course, but it does seem to follow similarly of national trends.

c. Anything Not Work or Wish You Could Have Done
I thought it would be interested in doing a heatmap on a mapping layer. I know there are leaflet packages and others I could explore, but figured it was out of scope of this project. I feel back on previous lessons and working on theming and color palettes to improve the style of the visualization.