Crimes

Author

Thiloni Konara

Introduction

This data set covers all founded crimes reported in Montgomery County after July 1, 2016. The reports come from the National Incident-Based Reporting System (NIBRS). Each row in the data set represents a single crime event and includes details such as the type of crime, whether it was against a person or property, the police district, the city, the exact location, and the date it happened. The data set has 466,813 observations and 30 variables.

I chose to work with the Montgomery County crime data set because I have recently found myself really interested in crime-related topics. Even in Project 1, I ended up choosing a data set about suicide attacks, so continuing with something related to crime felt very natural to me. I thought it would be interesting to explore which police district reports the highest number of crimes.

Variables

| Variable name | Meaning | Data type |
|---|---|---|
| Crime Name1 | Whether the crime was against property or a person | character |
| Crime Name2 | The specific crime | character |
| Police District Name | The police district where the crime happened | character |
| Latitude | Latitude of the crime location in Montgomery County | numerical (mapping element) |
| Longitude | Longitude of the crime location in Montgomery County | numerical (mapping element) |
| Date | The date of the crime, extracted from Dispatch Date / Time | date (lubridate) |

Question

Which police district in Montgomery County reports the highest number of crimes?

Source

National Incident-Based Reporting System (NIBRS) of the Criminal Justice Information Services (CJIS) Division Uniform Crime Reporting (UCR)

Loading the libraries

library(tidyverse)
library(lubridate)
library(ggthemes)
library(RColorBrewer)
library(plotly)
library(leaflet)

Loading the data set

crimes <- read_csv("Crime.csv")

Looking at the data types and the first 6 rows

#str(crimes)
head(crimes)
# A tibble: 6 × 30
  `Incident ID` `Offence Code` `CR Number` `Dispatch Date / Time`
          <dbl>          <dbl>       <dbl> <chr>                 
1     201194204           2305   180031343 06/25/2018 07:03:12 PM
2     201192754           3562   180029527 06/15/2018 09:28:52 AM
3     201197266           5404   180035053 07/15/2018 10:14:12 PM
4     201197538           2399   180035451 07/17/2018 10:11:04 PM
5     201194239           1399   180031372 06/25/2018 10:14:39 PM
6     201195017           5707   180032324 06/30/2018 06:36:12 PM
# ℹ 26 more variables: Start_Date_Time <chr>, End_Date_Time <chr>,
#   `NIBRS Code` <chr>, Victims <dbl>, `Crime Name1` <chr>,
#   `Crime Name2` <chr>, `Crime Name3` <chr>, `Police District Name` <chr>,
#   `Block Address` <chr>, City <chr>, State <chr>, `Zip Code` <dbl>,
#   Agency <chr>, Place <chr>, Sector <chr>, Beat <chr>, PRA <chr>,
#   `Address Number` <dbl>, `Street Prefix` <chr>, `Street Name` <chr>,
#   `Street Suffix` <chr>, `Street Type` <chr>, Latitude <dbl>, …

Creating a column with only the date (dropping the time)

crime2 <- crimes |>
  mutate(date=as.Date(mdy_hms(`Dispatch Date / Time`)))
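As a small illustration of what this step does, the sketch below parses one dispatch timestamp copied from the first printed row and keeps only the calendar date. The names `stamp` and `only_date` are made up for the example.

```r
library(lubridate)

# One timestamp in the month/day/year hour:minute:second AM/PM layout
stamp <- "06/25/2018 07:03:12 PM"

# mdy_hms() parses the text into a date-time; as.Date() drops the time part
only_date <- as.Date(mdy_hms(stamp))
only_date
```

Rows with a missing dispatch time stay `NA` after this step, so the new `date` column inherits the same gaps as `Dispatch Date / Time`.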

Cleaning the data set

names(crime2) <- tolower(names(crime2))
names(crime2) <- gsub(" ","_",names(crime2))
names(crime2) <- gsub("[/]"," ",names(crime2))

head(crime2)
# A tibble: 6 × 31
  incident_id offence_code cr_number `dispatch_date_ _time` start_date_time     
        <dbl>        <dbl>     <dbl> <chr>                  <chr>               
1   201194204         2305 180031343 06/25/2018 07:03:12 PM 06/22/2018 06:00:00…
2   201192754         3562 180029527 06/15/2018 09:28:52 AM 06/15/2018 09:28:00…
3   201197266         5404 180035053 07/15/2018 10:14:12 PM 07/15/2018 10:14:00…
4   201197538         2399 180035451 07/17/2018 10:11:04 PM 07/17/2018 01:40:00…
5   201194239         1399 180031372 06/25/2018 10:14:39 PM 06/25/2018 10:14:00…
6   201195017         5707 180032324 06/30/2018 06:36:12 PM 06/30/2018 06:15:00…
# ℹ 26 more variables: end_date_time <chr>, nibrs_code <chr>, victims <dbl>,
#   crime_name1 <chr>, crime_name2 <chr>, crime_name3 <chr>,
#   police_district_name <chr>, block_address <chr>, city <chr>, state <chr>,
#   zip_code <dbl>, agency <chr>, place <chr>, sector <chr>, beat <chr>,
#   pra <chr>, address_number <dbl>, street_prefix <chr>, street_name <chr>,
#   street_suffix <chr>, street_type <chr>, latitude <dbl>, longitude <dbl>,
#   police_district_number <chr>, location <chr>, date <date>
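The printed names show `dispatch_date_ _time` with a space in the middle, because the slash was replaced by a space after the other spaces had already become underscores. A possible alternative, sketched here on two of the original column names rather than the full data set, replaces every run of non-alphanumeric characters in one pass. The vector names `raw_names` and `clean_names` are hypothetical.

```r
# Two of the original column names from the CSV header
raw_names <- c("Dispatch Date / Time", "Crime Name1")

# Lower-case first, then turn each run of spaces, slashes, etc. into one "_"
clean_names <- gsub("[^a-z0-9]+", "_", tolower(raw_names))
clean_names
```

With names cleaned this way the column would be called `dispatch_date_time`, so any later code that refers to `dispatch_date_ _time` would need the new name.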

Checking NAs

colSums(is.na(crime2))
           incident_id           offence_code              cr_number 
                     0                      5                      0 
  dispatch_date_ _time        start_date_time          end_date_time 
                 67648                      0                 257439 
            nibrs_code                victims            crime_name1 
                     0                      0                      0 
           crime_name2            crime_name3   police_district_name 
                     0                      0                   1409 
         block_address                   city                  state 
                 37282                     37                   9503 
              zip_code                 agency                  place 
                  3387                      0                      0 
                sector                   beat                    pra 
                     0                      0                      9 
        address_number          street_prefix            street_name 
                 37142                 447249                   1334 
         street_suffix            street_type               latitude 
                461805                   1720                      0 
             longitude police_district_number               location 
                     0                      0                      0 
                  date 
                 67648 
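Raw NA counts can be easier to judge as a share of rows. A minimal sketch on a made-up data frame (`toy` is not part of the crime data) shows the idea, using `colMeans()` on the same `is.na()` result:

```r
# Made-up data: column a has 2 of 4 values missing, b has 1, c has none
toy <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", NA, "z"),
                  c = 1:4)

# Share of missing values per column, largest first
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
na_share
```

Applied to `crime2`, the same call would make it clear that columns such as `street_suffix` (461805 of 466813 rows missing) are empty in almost every row.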

Handling NAs

crime_clean <- crime2 |>
  filter(!is.na(police_district_name),!is.na(city))

Removing unused columns

crimes_clean2 <- crime_clean |>
  select(-`dispatch_date_ _time`,-end_date_time,-block_address,-street_prefix,-street_suffix,-street_type,-offence_code,-zip_code,-address_number,-street_name)
head(crimes_clean2)
# A tibble: 6 × 21
  incident_id cr_number start_date_time        nibrs_code victims crime_name1   
        <dbl>     <dbl> <chr>                  <chr>        <dbl> <chr>         
1   201194204 180031343 06/22/2018 06:00:00 PM 23F              1 Crime Against…
2   201192754 180029527 06/15/2018 09:28:00 AM 35A              1 Crime Against…
3   201197266 180035053 07/15/2018 10:14:00 PM 90D              1 Crime Against…
4   201197538 180035451 07/17/2018 01:40:00 PM 23H              1 Crime Against…
5   201194239 180031372 06/25/2018 10:14:00 PM 13B              1 Crime Against…
6   201195017 180032324 06/30/2018 06:15:00 PM 90J              1 Crime Against…
# ℹ 15 more variables: crime_name2 <chr>, crime_name3 <chr>,
#   police_district_name <chr>, city <chr>, state <chr>, agency <chr>,
#   place <chr>, sector <chr>, beat <chr>, pra <chr>, latitude <dbl>,
#   longitude <dbl>, police_district_number <chr>, location <chr>, date <date>

A summary table

crime_summary <- crimes_clean2 |>
  group_by(police_district_name) |>
  summarise(count = n()) |>
  arrange(desc(count))
crime_summary
# A tibble: 8 × 2
  police_district_name count
  <chr>                <int>
1 SILVER SPRING        98248
2 WHEATON              87108
3 MONTGOMERY VILLAGE   78989
4 BETHESDA             66611
5 ROCKVILLE            62966
6 GERMANTOWN           59052
7 TAKOMA PARK          13825
8 OTHER                   14
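The same kind of table can also be built with dplyr's `count()`, which groups, tallies, and sorts in one call. The sketch below compares both forms on a made-up tibble; `toy`, `long_form`, and `short_form` are names chosen for the example.

```r
library(dplyr)

# Made-up districts, only to compare the two approaches
toy <- tibble(district = c("A", "B", "A", "C", "A", "B"))

# Long form, as in the summary table step
long_form <- toy |>
  group_by(district) |>
  summarise(count = n()) |>
  arrange(desc(count))

# Short form: count() with sort = TRUE and a custom column name
short_form <- toy |>
  count(district, sort = TRUE, name = "count")

short_form
```

Both versions should give the same rows in the same order, so the choice is mostly about readability.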

Removing the OTHER category because it has only 14 cases, which is very small compared to the other counts

crime_summary1 <- crime_summary |>
  filter(police_district_name != "OTHER")
crime_summary1
# A tibble: 7 × 2
  police_district_name count
  <chr>                <int>
1 SILVER SPRING        98248
2 WHEATON              87108
3 MONTGOMERY VILLAGE   78989
4 BETHESDA             66611
5 ROCKVILLE            62966
6 GERMANTOWN           59052
7 TAKOMA PARK          13825

Visualization

ggplot(crime_summary1,aes(x=police_district_name,y=count,fill=police_district_name)) +
  geom_col()+
  labs(title="Crime count by Police District in Montgomery County",
       x = "Police District",y="Number of Crimes",caption = "Source: National Incident-Based Reporting System(NIBRS),CJIS,UCR Program")+
  theme_minimal(base_size = 12,base_family = "serif")+
  scale_fill_brewer(palette = "Dark2")+
  theme(plot.title = element_text(face = "bold",size = 14,hjust=0.5),
        axis.text.x = element_text(angle = 45,hjust = 1),
        plot.caption = element_text(hjust = 1),
        legend.position = "none")
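Because the question is about which district ranks highest, ordering the bars by count instead of alphabetically may make the answer easier to read. This is an optional variation, sketched with the seven counts printed in the summary table; `district_counts` and `ordered_plot` are names chosen for the example.

```r
library(ggplot2)

# Counts copied from the printed summary table
district_counts <- data.frame(
  police_district_name = c("SILVER SPRING", "WHEATON", "MONTGOMERY VILLAGE",
                           "BETHESDA", "ROCKVILLE", "GERMANTOWN", "TAKOMA PARK"),
  count = c(98248, 87108, 78989, 66611, 62966, 59052, 13825)
)

# reorder() sorts the districts by -count, so the tallest bar comes first
ordered_plot <- ggplot(district_counts,
                       aes(x = reorder(police_district_name, -count),
                           y = count)) +
  geom_col() +
  labs(x = "Police District", y = "Number of Crimes")

ordered_plot
```

The theme, fill, and caption settings from the chart above could be added to this sketch unchanged.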

Bar chart with the plotly

p <- ggplot(crime_summary1,
            aes(x=police_district_name,
                y=count,fill=police_district_name,
                text=paste("District:",police_district_name,
                           "<br> Total Crimes:",count)))+
   geom_col()+
  labs(title="Crime count by Police District in Montgomery County",
       x = "Police District",
       y="Number of Crimes",
       caption = "Source: National Incident-Based Reporting System(NIBRS),CJIS,UCR Program")+
  theme_minimal(base_size = 12,base_family = "serif")+
  scale_fill_brewer(palette = "Dark2")+
  theme(plot.title = element_text(face = "bold",size = 14,hjust=0.5),
        axis.text.x = element_text(angle = 45,hjust = 1),
        plot.caption = element_text(hjust = 1),
        legend.position = "none")
p <- ggplotly(p,tooltip = "text") ## I used this site to customize the tooltip (tooltip = "text"): https://r-graph-gallery.com/customize-plotly-tooltip.html
p

Filtering only Takoma Park

crime_takoma <- crimes_clean2 |>
  filter(police_district_name == "TAKOMA PARK",
         crime_name1 %in% c("Crime Against Property","Crime Against Person"),
         !is.na(longitude),
         !is.na(latitude),
         latitude >38 & latitude <40,
         longitude > -77 & longitude < -76)
## I filtered the latitude and longitude to focus only on the Takoma Park geographic area. Without this step, Leaflet shows the entire world map and the crime points do not appear within the initial zoom.

Getting the latitude and longitude

mean_long <- mean(crime_takoma$longitude,na.rm = TRUE)
mean_lat <- mean(crime_takoma$latitude,na.rm=TRUE)

mean_long
[1] -76.99196
mean_lat
[1] 38.98192

Assigning two colors for crime against person and crime against property

pal <- colorFactor(
  palette = c("#27408B", "#00FF7F"),  
  domain  = c("Crime Against Property", "Crime Against Person")
)
## https://r-charts.com/spatial/interactive-maps-leaflet/ (I used domain here because the code didn't work with levels)
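One detail that may be worth checking: when `domain` is a plain character vector, `colorFactor()` appears to sort the values alphabetically before assigning colors, so the order written in `domain` may not decide which color each crime type gets. A hedged sketch of pinning the order with the `levels` argument instead (`pal_fixed` is a name made up for this example, and this is only one possible way to do it):

```r
library(leaflet)

# levels = ... fixes the order: the first level gets the first palette color
pal_fixed <- colorFactor(
  palette = c("#27408B", "#00FF7F"),
  domain  = NULL,
  levels  = c("Crime Against Property", "Crime Against Person")
)

# Look up the color assigned to each crime type
pal_fixed(c("Crime Against Property", "Crime Against Person"))
```

Printing the two colors like this is a quick way to confirm that the map and its legend show each crime type in the intended color.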

Final Map

leaflet(crime_takoma) |>
  setView(lng = mean_long,
          lat = mean_lat,
          zoom = 13.3) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(lng = ~longitude,
                   lat = ~latitude,
                   radius = 4,
                   stroke = FALSE,
                   fillColor = ~pal(crime_name1),
                   fillOpacity = 0.6,
                   popup = ~paste(
                     "<b>District: </b>",police_district_name,"<br>",
                     "Crime Type:",crime_name1,"<br>",
                     "Details:",crime_name2,"<br>",
                     "Date:",as.character(date))) |>
  addLegend(position = "bottomright",
            pal = pal,
            values = crime_takoma$crime_name1,
            title = "Crime Type",
            opacity = 1)
##https://r-charts.com/spatial/interactive-maps-leaflet/ for legend
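An alternative to centering the view on the mean coordinates is to let Leaflet choose the view from the range of the points with `fitBounds()`. A minimal sketch, assuming a small made-up set of points near the mean reported above (`toy_points` and `toy_map` are hypothetical, not the crime data):

```r
library(leaflet)

# Made-up coordinates for illustration only
toy_points <- data.frame(
  longitude = c(-76.995, -76.990, -76.985),
  latitude  = c(38.975, 38.982, 38.990)
)

# fitBounds() takes two opposite corners of the box that should be visible
toy_map <- leaflet(toy_points) |>
  addTiles() |>
  addCircleMarkers(lng = ~longitude, lat = ~latitude, radius = 4) |>
  fitBounds(lng1 = min(toy_points$longitude), lat1 = min(toy_points$latitude),
            lng2 = max(toy_points$longitude), lat2 = max(toy_points$latitude))

toy_map
```

This only helps once stray coordinates have been filtered out, since a single far-away point would stretch the box and zoom the map out again.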

Essay

My bar chart shows the total number of crimes reported in each police district in Montgomery County. This helped me answer my main question about which district has the highest crime count. Once I plotted it, the result was clear: Silver Spring had the most crimes with 98,248, followed by Wheaton with 87,108 and Montgomery Village with 78,989. I also removed the “OTHER” category because it only had 14 cases, which didn’t really contribute anything and just cluttered the plot.

For the map, I decided to zoom in on just Takoma Park instead of the whole county because mapping every single crime point would have been way too messy. I used Leaflet to plot the exact locations of crimes and used two different colors to show whether each case was a crime against property or a crime against a person. I really liked how this helped me see the differences between the two types visually. I noticed that crimes were spread throughout the city.

At first, my map kept loading the entire world or showing no points at all, so I had to filter the latitude and longitude. After fixing that, the map finally made sense and looked clean.

I originally wanted to make the dot sizes represent how serious the crime was, but the dataset didn’t include a severity level. Even without this feature, I feel like the two-color map clearly shows the patterns I wanted to highlight.

Bibliography

https://r-graph-gallery.com/customize-plotly-tooltip.html

https://r-charts.com/spatial/interactive-maps-leaflet/
