library(tidyverse)
library(ggplot2)
library(dplyr)
library(RColorBrewer)Crime in Montgomery County
1. Introduction
Dataset topic
The dataset topic is Crime in Montgomery County, MD. The data is provided by the Police Department from Montgomery County and updated daily.
Variables
Overview of data headers documented in the appendix of the data source
| Display Order | Column | Field Description |
|---|---|---|
| 1 | Incident ID | Police Incident Number |
| 2 | CR Number | Police Report Number |
| 3 | Dispatch Date / Time | The actual date and time a Officer was dispatched |
| 4 | Class | Four-digit code identifying the crime type of the incident |
| 5 | Class Description | Common name description of the incident class type |
| 6 | Police District Name | Name of District (Rockville, Wheaton etc.) |
| 7 | Block Address | Address in 100 block level |
| 8 | City | City |
| 9 | State | State |
| 10 | Zip Code | Zip code |
| 11 | Agency | Assigned Police Department |
| 12 | Place | Place description |
| 13 | Police Sector | Police Sector Name |
| 14 | Beat | Police patrol area subset within District |
| 15 | PRA | Police patrol area subset within Beat |
| 16 | Start Date / Time | Occurred from date/time |
| 17 | End Date / Time | Occurred to date/time |
| 18 | Latitude | Latitude |
| 19 | Longitude | Longitude |
| 20 | Police District Number | Major Police Boundary |
| 21 | Location | Location |
What I plan to explore
I plan on exploring the change in crime over time because it has become a hot button issue in politics lately, and I want to explore the Montgomery County Dataset to see how the trends look like. People have speculated that it could be due to various factors, ranging from the effects of the pandemic to claims that it might be influenced by the political climate, such as the ‘defund the police’ campaigns following the events of the Great Reckoning. Both of these factors of course affect demoralization, crime rates, and staffing issues. This exploration is intended to identify any causation or even correlation, but just to see if there is a common trend that affects Montgomery County locally.
Source of dataset
Data Collected and Exported From: Open Montgomery
Data Provided By: Montgomery County, MD
Link to dataset at Open Montgomery:
https://data.montgomerycountymd.gov/Public-Safety/Crime/icn6-v9z3Link to Direct Download:
https://data.montgomerycountymd.gov/api/views/icn6-v9z3/rows.csv?date=20231017&accessType=DOWNLOAD&bom=true&format=true
2. Load Libraries and Dataset
Load libraries
Set working directory and load dataset
setwd("C:/Users/Administrator/Documents/Data110/Datasets") # set working directory
crimes <- read_csv("Crime_20231017.csv") # read csv and load dataset3. Perform necessary cleaning
Clean headers and view data
names(crimes) <- tolower(names(crimes)) # make all headers lowercase
names(crimes) <- gsub(" ","",names(crimes)) # remove any spaces
names(crimes) <- gsub("/","_",names(crimes)) # remove slashesShow headers and data
head(crimes) # show headers from dataset# A tibble: 6 × 30
incidentid offencecode crnumber dispatchdate_time start_date_time
<dbl> <chr> <dbl> <chr> <chr>
1 201127168 3582 17014976 <NA> 03/23/2017 11:33:00 PM
2 201095140 2303 16043118 <NA> 08/24/2016 09:47:00 PM
3 201090710 5707 16037677 <NA> 07/25/2016 05:31:00 PM
4 201101176 2303 16050746 <NA> 10/03/2016 12:09:00 PM
5 201091832 2303 16038909 <NA> 08/01/2016 01:45:00 PM
6 201296318 2204 200029321 07/28/2020 02:55:00 PM 07/28/2020 02:54:00 PM
# ℹ 25 more variables: end_date_time <chr>, nibrscode <chr>, victims <dbl>,
# crimename1 <chr>, crimename2 <chr>, crimename3 <chr>,
# policedistrictname <chr>, blockaddress <chr>, city <chr>, state <chr>,
# zipcode <dbl>, agency <chr>, place <chr>, sector <chr>, beat <chr>,
# pra <chr>, addressnumber <dbl>, streetprefix <chr>, streetname <chr>,
# streetsuffix <chr>, streettype <chr>, latitude <dbl>, longitude <dbl>,
# policedistrictnumber <chr>, location <chr>
Remove NA data on variables, convert date columns, and add delay
#convert varchar to datetime and add year
crimes <- crimes %>%
mutate(start_date_time = as.POSIXct(start_date_time, format = "%m/%d/%Y %I:%M:%S %p"),
year = as.integer(format(start_date_time, "%Y")))
# Print Summary
summary(crimes) incidentid offencecode crnumber dispatchdate_time
Min. :201087096 Length:277372 Min. : 10046679 Length:277372
1st Qu.:201175261 Class :character 1st Qu.:180007692 Class :character
Median :201264067 Mode :character Median :190053156 Mode :character
Mean :201265588 Mean :178068289
3rd Qu.:201355646 3rd Qu.:210048564
Max. :201448819 Max. :230101499
start_date_time end_date_time nibrscode
Min. :2016-07-01 00:00:00.00 Length:277372 Length:277372
1st Qu.:2018-02-06 15:15:00.00 Class :character Class :character
Median :2019-10-27 03:33:00.00 Mode :character Mode :character
Mean :2019-12-25 21:27:11.48
3rd Qu.:2021-11-18 11:05:30.00
Max. :2023-10-16 19:32:00.00
NA's :5
victims crimename1 crimename2 crimename3
Min. : 1.000 Length:277372 Length:277372 Length:277372
1st Qu.: 1.000 Class :character Class :character Class :character
Median : 1.000 Mode :character Mode :character Mode :character
Mean : 1.028
3rd Qu.: 1.000
Max. :22.000
policedistrictname blockaddress city state
Length:277372 Length:277372 Length:277372 Length:277372
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
zipcode agency place sector
Min. : 6 Length:277372 Length:277372 Length:277372
1st Qu.:20854 Class :character Class :character Class :character
Median :20878 Mode :character Mode :character Mode :character
Mean :20877
3rd Qu.:20904
Max. :29878
NA's :2394
beat pra addressnumber streetprefix
Length:277372 Length:277372 Min. : 0 Length:277372
Class :character Class :character 1st Qu.: 1700 Class :character
Mode :character Mode :character Median : 8200 Mode :character
Mean : 8533
3rd Qu.: 12400
Max. :2090600
NA's :26533
streetname streetsuffix streettype latitude
Length:277372 Length:277372 Length:277372 Min. : 0.00
Class :character Class :character Class :character 1st Qu.:39.02
Mode :character Mode :character Mode :character Median :39.07
Mean :38.76
3rd Qu.:39.14
Max. :39.34
longitude policedistrictnumber location year
Min. :-77.52 Length:277372 Length:277372 Min. :2016
1st Qu.:-77.20 Class :character Class :character 1st Qu.:2018
Median :-77.10 Mode :character Mode :character Median :2019
Mean :-76.49 Mean :2019
3rd Qu.:-77.03 3rd Qu.:2021
Max. : 0.00 Max. :2023
NA's :5
Remove NA from variables being used
# Clean NA data on columns I want to use
crimes_cleaned <- crimes |>
filter(!is.na(start_date_time) &
!is.na(victims) &
!is.na(crimename1) &
!is.na(city))4. Explore dataset
Show final headers and summary of dataset
head(crimes_cleaned) # show headers from dataset# A tibble: 6 × 31
incidentid offencecode crnumber dispatchdate_time start_date_time
<dbl> <chr> <dbl> <chr> <dttm>
1 201127168 3582 17014976 <NA> 2017-03-23 23:33:00
2 201095140 2303 16043118 <NA> 2016-08-24 21:47:00
3 201090710 5707 16037677 <NA> 2016-07-25 17:31:00
4 201101176 2303 16050746 <NA> 2016-10-03 12:09:00
5 201091832 2303 16038909 <NA> 2016-08-01 13:45:00
6 201296318 2204 200029321 07/28/2020 02:55:00 PM 2020-07-28 14:54:00
# ℹ 26 more variables: end_date_time <chr>, nibrscode <chr>, victims <dbl>,
# crimename1 <chr>, crimename2 <chr>, crimename3 <chr>,
# policedistrictname <chr>, blockaddress <chr>, city <chr>, state <chr>,
# zipcode <dbl>, agency <chr>, place <chr>, sector <chr>, beat <chr>,
# pra <chr>, addressnumber <dbl>, streetprefix <chr>, streetname <chr>,
# streetsuffix <chr>, streettype <chr>, latitude <dbl>, longitude <dbl>,
# policedistrictnumber <chr>, location <chr>, year <int>
summary(crimes_cleaned) # show headers from dataset incidentid offencecode crnumber dispatchdate_time
Min. :201087096 Length:277354 Min. : 10046679 Length:277354
1st Qu.:201175271 Class :character 1st Qu.:180007717 Class :character
Median :201264074 Mode :character Median :190053166 Mode :character
Mean :201265594 Mean :178069838
3rd Qu.:201355648 3rd Qu.:210048572
Max. :201448819 Max. :230101499
start_date_time end_date_time nibrscode
Min. :2016-07-01 00:00:00.0 Length:277354 Length:277354
1st Qu.:2018-02-06 16:04:15.0 Class :character Class :character
Median :2019-10-27 06:04:00.0 Mode :character Mode :character
Mean :2019-12-25 22:30:53.5
3rd Qu.:2021-11-18 11:55:45.0
Max. :2023-10-16 19:32:00.0
victims crimename1 crimename2 crimename3
Min. : 1.000 Length:277354 Length:277354 Length:277354
1st Qu.: 1.000 Class :character Class :character Class :character
Median : 1.000 Mode :character Mode :character Mode :character
Mean : 1.028
3rd Qu.: 1.000
Max. :22.000
policedistrictname blockaddress city state
Length:277354 Length:277354 Length:277354 Length:277354
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
zipcode agency place sector
Min. : 6 Length:277354 Length:277354 Length:277354
1st Qu.:20854 Class :character Class :character Class :character
Median :20878 Mode :character Mode :character Mode :character
Mean :20877
3rd Qu.:20904
Max. :29878
NA's :2394
beat pra addressnumber streetprefix
Length:277354 Length:277354 Min. : 0 Length:277354
Class :character Class :character 1st Qu.: 1700 Class :character
Mode :character Mode :character Median : 8200 Mode :character
Mean : 8533
3rd Qu.: 12400
Max. :2090600
NA's :26520
streetname streetsuffix streettype latitude
Length:277354 Length:277354 Length:277354 Min. : 0.00
Class :character Class :character Class :character 1st Qu.:39.02
Mode :character Mode :character Mode :character Median :39.07
Mean :38.76
3rd Qu.:39.14
Max. :39.34
longitude policedistrictnumber location year
Min. :-77.52 Length:277354 Length:277354 Min. :2016
1st Qu.:-77.20 Class :character Class :character 1st Qu.:2018
Median :-77.10 Mode :character Mode :character Median :2019
Mean :-76.49 Mean :2019
3rd Qu.:-77.03 3rd Qu.:2021
Max. : 0.00 Max. :2023
Count distinct crime categories for three columns and min/max of date to get range
n_distinct(crimes_cleaned$crimename1) # Count distinct crimename1[1] 3
n_distinct(crimes_cleaned$crimename2) # Count distinct crimename2[1] 57
n_distinct(crimes_cleaned$crimename3) # Count distinct crimename3[1] 267
n_distinct(crimes_cleaned$city) # Count distinct crimename3[1] 247
min(crimes_cleaned$start_date_time) # min start date-time[1] "2016-07-01 EDT"
max(crimes_cleaned$start_date_time) # max start date-time[1] "2023-10-16 19:32:00 EDT"
Group by for the Visualization
Using the cleaned data, I grouped by year, city, and crime type and counted the number of victims. The data was filtered to reduce crowding. Three cities where chose along 270 in the immediate campus areas and 2016 & 2023 was omitted because there’s not enough data to complete the year.
cities <- crimes_cleaned |> # Group by for the Visualization
filter(tolower(city) %in% tolower(c("Rockville", "Gaithersburg", "Germantown")), # Filter three cities
year >= 2017, year <= 2022) |> # Filter between 2017 and 2022
group_by(year, city, crimename1) |> # Group by year, city, and crime type
summarize(victim_count = sum(victims)) |> # count victims
arrange(desc(year), city, crimename1)`summarise()` has grouped output by 'year', 'city'. You can override using the
`.groups` argument.
View new table
cities# A tibble: 54 × 4
# Groups: year, city [18]
year city crimename1 victim_count
<int> <chr> <chr> <dbl>
1 2022 GAITHERSBURG Crime Against Person 1029
2 2022 GAITHERSBURG Crime Against Property 3168
3 2022 GAITHERSBURG Crime Against Society 1280
4 2022 GERMANTOWN Crime Against Person 787
5 2022 GERMANTOWN Crime Against Property 2272
6 2022 GERMANTOWN Crime Against Society 593
7 2022 ROCKVILLE Crime Against Person 1051
8 2022 ROCKVILLE Crime Against Property 3351
9 2022 ROCKVILLE Crime Against Society 502
10 2021 GAITHERSBURG Crime Against Person 958
# ℹ 44 more rows
5. Create Visualization
Visualization <- cities |>
ggplot() +
# Stacked bar chart with year on X axis, victim count on y axis, and crime type stacked
geom_bar(aes(x = year, y = victim_count, fill = crimename1),
position = "stack", stat = "identity") +
# Use color palette
scale_fill_manual(values = brewer.pal(n = 9, name = "Set2")) +
# Text labels
labs(
fill = "Crime Types",
y = "Number of Victims",
x = "Year",
title = "Crime in 3 Major Cities in Montgomery County 2017-2022",
caption = "Source: Montgomery County, MD Police Department"
) +
# Add comma to y scale
scale_y_continuous(labels = scales::comma, name = "Number of Victims") +
# Theme custumization
theme_minimal() +
theme(
# title
plot.title = element_text(size = 14, hjust = 0.5, margin = margin(b = 15)),
# x axis
axis.text.x = element_text(size = 11, angle = 45, hjust = 1),
axis.title.x = element_text(size = 12, face = "bold", vjust = 1),
# y axis
axis.text.y = element_text(size = 11, angle = 45, hjust = 1),
axis.title.y = element_text(size = 12, face = "bold", vjust = 1),
# legend
legend.title = element_text(size = 11,face = "bold",hjust=.5),
legend.text = element_text(size = 10),
# caption
plot.caption = element_text(size = 10, hjust = 0.5),
# panel / gridlines
panel.grid.major = element_line(color = "lightgray", size = 0.2),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white", color = "lightgray"),
panel.border = element_blank()
) +
# Seperate cities in bar chart and label
facet_grid(. ~ city)Warning in brewer.pal(n = 9, name = "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
Visualization6. Essay
a. Cleaning
I started cleaning the dataset by looking at the headers and summary. I chose the columns that I wanted to explore, which includes city, a date column, number of victims, and crime type. I fixed the headers and converted date that were in chr to a date type and looked at the NA. I removed NA from those columns and looked at the range of data of min/max and distinct counts to see if those are usable. I decided on the crime type that was a higher level, filtered out the years that were not complete, and chose three highly populated cities that are the most relevant to our area and the campuses, which are namely Rockville, Gaithersburg, and Germantown.
b. Interpretation of the Visualization
Total victim count is displayed over time by year on the x-axis and the victim count on the y-axis. The crime type is stacked on the bar chart for additional information to eyeball categories by color. Each city is separated so they can be easily compared independently, but can also be compared to other cities despite the data not being normalized by utilizing per capita and dividing by population size. The more important thing I was checking was if all three areas have similar trends and it looks like for each of the three cities crime was lowest in 2021, which makes sense. Most people were still staying inside and there was not that much opportunity for crime. Likewise, when everything opened, there could have been a race condition where crime was spiking, but departments were understaffed. Not verified of course, but it does seem to follow similarly of national trends.
c. Anything Not Work or Wish You Could Have Done
I thought it would be interested in doing a heatmap on a mapping layer. I know there are leaflet packages and others I could explore, but figured it was out of scope of this project. I feel back on previous lessons and working on theming and color palettes to improve the style of the visualization.