library(tidyverse)
library(lubridate)
library(ggthemes)
library(RColorBrewer)
library(plotly)
library(leaflet)
Crimes
Introduction
This data set contains all founded crimes reported in Montgomery County after July 1, 2016. The reports come from the National Incident-Based Reporting System (NIBRS). Each row represents a single crime event and includes details such as the type of crime, whether it was against a person or property, the police district, the city, the exact location, and the date it happened. The data set has 466,813 observations and 30 variables.
I chose to work with the Montgomery County crime data set because I have recently become really interested in crime-related topics. Even in Project 1, I ended up choosing a data set about suicide attacks, so continuing with something related to crime felt very natural to me. I thought it would be interesting to explore which police district reports the highest number of crimes.
Variables
| Variable name | Meaning | Data type |
|---|---|---|
| Crime Name1 | Whether the crime was against property or a person | character |
| Crime Name2 | The specific crime | character |
| Police District | The police district where the crime happened | character |
| Latitude | Latitude of the crime location in Montgomery County | numerical (mapping element) |
| Longitude | Longitude of the crime location in Montgomery County | numerical (mapping element) |
| Date | The date of the crime, derived from `Dispatch Date / Time` | date (lubridate) |
Question
Which police district in Montgomery County reports the highest number of crimes?
Source
National Incident-Based Reporting System (NIBRS) of the Criminal Justice Information Services (CJIS) Division Uniform Crime Reporting (UCR)
Loading the libraries
Loading the data set
crimes <- read_csv("Crime.csv")
Just to look at the data type and first 6 rows
#str(crimes)
head(crimes)
# A tibble: 6 × 30
`Incident ID` `Offence Code` `CR Number` `Dispatch Date / Time`
<dbl> <dbl> <dbl> <chr>
1 201194204 2305 180031343 06/25/2018 07:03:12 PM
2 201192754 3562 180029527 06/15/2018 09:28:52 AM
3 201197266 5404 180035053 07/15/2018 10:14:12 PM
4 201197538 2399 180035451 07/17/2018 10:11:04 PM
5 201194239 1399 180031372 06/25/2018 10:14:39 PM
6 201195017 5707 180032324 06/30/2018 06:36:12 PM
# ℹ 26 more variables: Start_Date_Time <chr>, End_Date_Time <chr>,
# `NIBRS Code` <chr>, Victims <dbl>, `Crime Name1` <chr>,
# `Crime Name2` <chr>, `Crime Name3` <chr>, `Police District Name` <chr>,
# `Block Address` <chr>, City <chr>, State <chr>, `Zip Code` <dbl>,
# Agency <chr>, Place <chr>, Sector <chr>, Beat <chr>, PRA <chr>,
# `Address Number` <dbl>, `Street Prefix` <chr>, `Street Name` <chr>,
# `Street Suffix` <chr>, `Street Type` <chr>, Latitude <dbl>, …
Creating a date-only column (dropping the time)
crime2 <- crimes |>
mutate(date=as.Date(mdy_hms(`Dispatch Date / Time`)))
Cleaning the data set
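# Lowercase the names, then replace spaces with underscores
names(crime2) <- tolower(names(crime2))
names(crime2) <- gsub(" ","_",names(crime2))
# Replace "/" with a space; `Dispatch Date / Time` becomes `dispatch_date_ _time`
names(crime2) <- gsub("[/]"," ",names(crime2))
head(crime2)
# A tibble: 6 × 31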
incident_id offence_code cr_number `dispatch_date_ _time` start_date_time
<dbl> <dbl> <dbl> <chr> <chr>
1 201194204 2305 180031343 06/25/2018 07:03:12 PM 06/22/2018 06:00:00…
2 201192754 3562 180029527 06/15/2018 09:28:52 AM 06/15/2018 09:28:00…
3 201197266 5404 180035053 07/15/2018 10:14:12 PM 07/15/2018 10:14:00…
4 201197538 2399 180035451 07/17/2018 10:11:04 PM 07/17/2018 01:40:00…
5 201194239 1399 180031372 06/25/2018 10:14:39 PM 06/25/2018 10:14:00…
6 201195017 5707 180032324 06/30/2018 06:36:12 PM 06/30/2018 06:15:00…
# ℹ 26 more variables: end_date_time <chr>, nibrs_code <chr>, victims <dbl>,
# crime_name1 <chr>, crime_name2 <chr>, crime_name3 <chr>,
# police_district_name <chr>, block_address <chr>, city <chr>, state <chr>,
# zip_code <dbl>, agency <chr>, place <chr>, sector <chr>, beat <chr>,
# pra <chr>, address_number <dbl>, street_prefix <chr>, street_name <chr>,
# street_suffix <chr>, street_type <chr>, latitude <dbl>, longitude <dbl>,
# police_district_number <chr>, location <chr>, date <date>
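To see why the dispatch column ends up named `dispatch_date_ _time`, the three renaming steps can be traced on that one name. This is only a small base R sketch: the variable names `raw`, `step1`, `step2`, `step3` and `alt` are made up for illustration, and the last pattern is a hypothetical alternative (not what this project used) that would leave no space in the name.

```r
# Trace the three renaming steps on one original column name.
raw   <- "Dispatch Date / Time"
step1 <- tolower(raw)              # "dispatch date / time"
step2 <- gsub(" ", "_", step1)     # "dispatch_date_/_time"
step3 <- gsub("[/]", " ", step2)   # "dispatch_date_ _time"
print(step3)

# Hypothetical alternative: collapse every run of non-alphanumeric
# characters into one underscore, so the name contains no space.
alt <- gsub("[^a-z0-9]+", "_", tolower(raw))
print(alt)                         # "dispatch_date_time"
```

Because the name keeps a space, it has to be written with backticks whenever it is used later, as in the `select()` call further down.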
Checking NAs
colSums(is.na(crime2))
           incident_id           offence_code              cr_number
                     0                      5                      0
  dispatch_date_ _time        start_date_time          end_date_time
                 67648                      0                 257439
            nibrs_code                victims            crime_name1
                     0                      0                      0
           crime_name2            crime_name3   police_district_name
                     0                      0                   1409
         block_address                   city                  state
                 37282                     37                   9503
              zip_code                 agency                  place
                  3387                      0                      0
                sector                   beat                    pra
                     0                      0                      9
        address_number          street_prefix            street_name
                 37142                 447249                   1334
         street_suffix            street_type               latitude
                461805                   1720                      0
             longitude police_district_number               location
                     0                      0                      0
                  date
                 67648
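The `date` column and `dispatch_date_ _time` both show 67648 missing values, which is what I would expect since `date` is built from the dispatch column. The sketch below illustrates this with two sample values: the first dispatch string from the preview above and one `NA`. The vector names `dispatch` and `dates` are just for this example.

```r
library(lubridate)

# Two sample values: the first dispatch time shown earlier, and a missing one.
dispatch <- c("06/25/2018 07:03:12 PM", NA)

# Same conversion used for the date column: parse month/day/year plus
# time with mdy_hms(), then keep only the calendar date.
dates <- as.Date(mdy_hms(dispatch))
print(dates)

# A missing dispatch time stays missing after conversion,
# so both columns end up with the same number of NAs.
print(sum(is.na(dispatch)) == sum(is.na(dates)))
```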
Handling NAs
crime_clean <- crime2 |>
filter(!is.na(police_district_name),!is.na(city))
Removing unused columns
crimes_clean2 <- crime_clean |>
select(-`dispatch_date_ _time`,-end_date_time,-block_address,-street_prefix,-street_suffix,-street_type,-offence_code,-zip_code,-address_number,-street_name)
head(crimes_clean2)
# A tibble: 6 × 21
incident_id cr_number start_date_time nibrs_code victims crime_name1
<dbl> <dbl> <chr> <chr> <dbl> <chr>
1 201194204 180031343 06/22/2018 06:00:00 PM 23F 1 Crime Against…
2 201192754 180029527 06/15/2018 09:28:00 AM 35A 1 Crime Against…
3 201197266 180035053 07/15/2018 10:14:00 PM 90D 1 Crime Against…
4 201197538 180035451 07/17/2018 01:40:00 PM 23H 1 Crime Against…
5 201194239 180031372 06/25/2018 10:14:00 PM 13B 1 Crime Against…
6 201195017 180032324 06/30/2018 06:15:00 PM 90J 1 Crime Against…
# ℹ 15 more variables: crime_name2 <chr>, crime_name3 <chr>,
# police_district_name <chr>, city <chr>, state <chr>, agency <chr>,
# place <chr>, sector <chr>, beat <chr>, pra <chr>, latitude <dbl>,
# longitude <dbl>, police_district_number <chr>, location <chr>, date <date>
A summary table
crime_summary <- crimes_clean2 |>
group_by(police_district_name) |>
summarise(count = n()) |>
arrange(desc(count))
crime_summary
# A tibble: 8 × 2
police_district_name count
<chr> <int>
1 SILVER SPRING 98248
2 WHEATON 87108
3 MONTGOMERY VILLAGE 78989
4 BETHESDA 66611
5 ROCKVILLE 62966
6 GERMANTOWN 59052
7 TAKOMA PARK 13825
8 OTHER 14
Removing the OTHER category because it has only 14 crimes, which is tiny compared with the other districts
crime_summary1 <- crime_summary |>
filter(police_district_name != "OTHER")
crime_summary1
# A tibble: 7 × 2
police_district_name count
<chr> <int>
1 SILVER SPRING 98248
2 WHEATON 87108
3 MONTGOMERY VILLAGE 78989
4 BETHESDA 66611
5 ROCKVILLE 62966
6 GERMANTOWN 59052
7 TAKOMA PARK 13825
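To put the counts in perspective, each district's share of the total and the gap between the top two districts can be worked out from the table. This is a rough base R sketch that retypes the seven counts shown above into a small data frame; `crime_counts`, `share_pct` and `top_gap` are names I made up for the example.

```r
# Counts copied from the summary table above (OTHER already removed).
crime_counts <- data.frame(
  police_district_name = c("SILVER SPRING", "WHEATON", "MONTGOMERY VILLAGE",
                           "BETHESDA", "ROCKVILLE", "GERMANTOWN", "TAKOMA PARK"),
  count = c(98248, 87108, 78989, 66611, 62966, 59052, 13825)
)

# Percentage of the listed crimes that each district accounts for.
crime_counts$share_pct <- round(100 * crime_counts$count / sum(crime_counts$count), 1)

# Difference between the highest and second-highest district.
top_gap <- crime_counts$count[1] - crime_counts$count[2]

print(crime_counts)
print(top_gap)   # 11140
```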
Visualization
ggplot(crime_summary1,aes(x=(police_district_name),y=count,fill=police_district_name)) +
geom_col()+
labs(title="Crime count by Police District in Montgomery County",
x = "Police District",y="Number of Crimes",caption = "Source: National Incident-Based Reporting System(NIBRS),CJIS,UCR Program")+
theme_minimal(base_size = 12,base_family = "serif")+
scale_fill_brewer(palette = "Dark2")+
theme(plot.title = element_text(face = "bold",size = 14,hjust=0.5),
axis.text.x = element_text(angle = 45,hjust = 1),
plot.caption = element_text(hjust = 1),
legend.position = "none")
Bar chart with plotly
p <- ggplot(crime_summary1,
aes(x=police_district_name,
y=count,fill=police_district_name,
text=paste("District:",police_district_name,
"<br> Total Crimes:",count)))+
geom_col()+
labs(title="Crime count by Police District in Montgomery County",
x = "Police District",
y="Number of Crimes",
caption = "Source: National Incident-Based Reporting System(NIBRS),CJIS,UCR Program")+
theme_minimal(base_size = 12,base_family = "serif")+
scale_fill_brewer(palette = "Dark2")+
theme(plot.title = element_text(face = "bold",size = 14,hjust=0.5),
axis.text.x = element_text(angle = 45,hjust = 1),
plot.caption = element_text(hjust = 1),
legend.position = "none")
p <- ggplotly(p,tooltip = "text") ## I used this site to customize the tooltip (tooltip = "text"): https://r-graph-gallery.com/customize-plotly-tooltip.html
p
Filtering only Takoma Park
crime_takoma <- crimes_clean2 |>
filter(police_district_name == "TAKOMA PARK",
crime_name1 %in% c("Crime Against Property","Crime Against Person"),
!is.na(longitude),
!is.na(latitude),
latitude >38 & latitude <40,
longitude > -77 & longitude < -76) ## I filtered the latitude and longitude to focus only on the Takoma Park geographic area. Without this step, Leaflet shows the entire world map and the crime points do not appear within the initial zoom.
Getting the mean latitude and longitude
mean_long <- mean(crime_takoma$longitude,na.rm = TRUE)
mean_lat <- mean(crime_takoma$latitude,na.rm=TRUE)
mean_long
[1] -76.99196
mean_lat
[1] 38.98192
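The bounding-box filter matters because the map center is just the average of the coordinates, so a few out-of-area points can pull it far away. The sketch below uses hypothetical coordinates to show the effect. The `(0, 0)` row is an assumption about what a bad record might look like; the data set itself reports no NAs in latitude or longitude, and I did not confirm which values were causing the problem.

```r
# Hypothetical coordinates: two plausible points plus one placeholder at (0, 0).
pts <- data.frame(
  latitude  = c(38.98, 38.99, 0),
  longitude = c(-76.99, -76.98, 0)
)

# Keep only rows inside the same bounding box used in the filter above.
in_box  <- pts$latitude > 38 & pts$latitude < 40 &
           pts$longitude > -77 & pts$longitude < -76
pts_box <- pts[in_box, ]

# Center of the remaining points (the kind of value used to center the map).
center_box <- c(lng = mean(pts_box$longitude), lat = mean(pts_box$latitude))
print(center_box)

# Without the filter, the placeholder row drags the center far from the area.
center_all <- c(lng = mean(pts$longitude), lat = mean(pts$latitude))
print(center_all)
```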
Assigning two colors for crime against person and crime against property
pal <- colorFactor(
palette = c("#27408B", "#00FF7F"),
domain = c("Crime Against Property", "Crime Against Person")
)
## Source: https://r-charts.com/spatial/interactive-maps-leaflet/ (used because this code didn't work with levels)
Final Map
leaflet(crime_takoma) |>
setView(lng = mean_long,
lat = mean_lat,
zoom = 13.3) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircleMarkers(lng = ~longitude,
lat = ~latitude,
radius = 4,
stroke = FALSE,
fillColor = ~pal(crime_name1),
fillOpacity = 0.6,
popup = ~paste(
"<b>District: </b>",police_district_name,"<br>",
"Crime Type:",crime_name1,"<br>",
"Details:",crime_name2,"<br>",
"Date:",as.character(date) )) |>
addLegend(position = "bottomright",
pal = pal,
values = crime_takoma$crime_name1,
title = "Crime Type",
opacity = 1) ## https://r-charts.com/spatial/interactive-maps-leaflet/ for the legend
Essay
My bar chart shows the total number of crimes reported in each police district in Montgomery County. This helped me answer my main question about which district has the highest crime count. Once I plotted it, the result was clear: Silver Spring had the most crimes (98,248), about 11,000 more than Wheaton (87,108), the next highest district. I also removed the “OTHER” category because it only had 14 cases, which didn’t really contribute anything and just cluttered the plot.
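One optional tweak, which my chart does not use, is to sort the bars by count so the ranking is readable at a glance. By default ggplot2 orders a character x-axis alphabetically. A minimal sketch with three of the counts from the summary table, where `bar_data` and `p_sorted` are illustrative names:

```r
library(ggplot2)

# Three districts from the summary table, listed alphabetically on purpose.
bar_data <- data.frame(
  police_district_name = c("BETHESDA", "SILVER SPRING", "WHEATON"),
  count = c(66611, 98248, 87108)
)

# reorder() turns the names into a factor whose levels are sorted by -count,
# so the district with the highest count is drawn first.
bar_data$police_district_name <- reorder(bar_data$police_district_name,
                                         -bar_data$count)

p_sorted <- ggplot(bar_data, aes(x = police_district_name, y = count)) +
  geom_col() +
  labs(x = "Police District", y = "Number of Crimes")
p_sorted
```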
For the map, I decided to zoom in on just Takoma Park instead of the whole county because mapping every single crime point would have been way too messy. I used Leaflet to plot the exact locations of crimes and used two different colors to show whether each case was a crime against property or a crime against a person. I really liked how this helped me see the differences between the two types visually. I noticed that crimes were spread throughout the city.
At first, my map kept loading the entire world or showing no points at all, so I had to filter the latitude and longitude. After fixing that, the map finally made sense and looked clean.
I originally wanted to make the dot sizes represent how serious the crime was, but the dataset didn’t include a severity level. Even without this feature, I feel like the two-color map clearly shows the patterns I wanted to highlight.
Bibliography
https://r-graph-gallery.com/customize-plotly-tooltip.html
https://r-charts.com/spatial/interactive-maps-leaflet/