This study analyzed 34,213 crime incidents recorded in Washington, DC in 2023, focusing on their spatial and temporal characteristics. The analysis found that H3 hexagons, which are not shaped by political, cultural, or physical geographic factors, provide an unbiased way to summarize crime incidents and can aid in allocating crime-prevention resources. The study also found a 57%/43% split between crimes occurring during the day and at night, aligning with previous studies. Furthermore, the analysis grouped crime incidents by type and time of occurrence, offering insight into the nature of crimes committed at different times of day. These findings can help city officials and law enforcement agencies better understand crime patterns and allocate resources effectively.
In 2023 there were 34,213 recorded crime incidents in Washington, DC, making it one of the most crime-ridden years in the city’s history.1 As a result of the recent uptick in crime, Washingtonians have been expressing their displeasure with the status quo and urging the city to implement changes to rectify the situation.2 However, to eventually reduce crime, city officials need to know where and why most of the crimes are taking place. Knowing where and when crimes occur allows officials to modify their current approaches to handle the new types of crime occurring in the city.3
To help the city stay informed on current crime trends, this exploratory analysis seeks to provide insight into the spatial and temporal characteristics of the 2023 crime incidents. The analysis does this by answering the following research questions:
This project answers these research questions by using the R programming language to ingest, condition, analyze, and communicate findings. The analysis was conducted in RStudio, and the finalized results were packaged and presented in an R Markdown document. This document is hosted on RPubs, which allows readers to easily view the HTML document, transparently review the sources and methods used, and interact with the document’s interactive elements.
This project uses Washington, DC as its study area because the city has a very robust online data portal that makes a great deal of useful geographic information available to analysts. In addition, Washington, DC is notable because it has experienced more crime than normal in recent years, so there should be some interesting spatial and temporal patterns.
#Map of DC Wards
dcWardMap <- ggplot()+
#DC Wards
geom_sf(data = dc_wards, color = "black", fill = NA)+
# DC Waterbodies
geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
# Labels and theme
labs(title = "Wards") +
scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"),
labels = c("Parks", "Waterbodies"))+
theme_void()
#DC Neighborhood Map
dcNeighborhoodMap <- ggplot()+
#DC Neighborhoods
geom_sf(data = dc_neighborhoods, color = "black", fill = NA)+
# DC Waterbodies
geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
# Labels and theme
labs(title = "Neighborhoods") +
scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"),
labels = c("Parks", "Waterbodies"))+
theme_void()
#DC Block Group Map
dcBlockGroupMap <- ggplot()+
#DC Block Groups
geom_sf(data = dc_blockgroups, color = "black", fill = NA)+
# DC Waterbodies
geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
# Labels and theme
labs(title = "Block Groups") +
scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"),
labels = c("Parks", "Waterbodies"))+
theme_void()
#Map of DC Hexagons
dcHexagonMap <- ggplot()+
#DC Boundary
geom_sf(data = dc_boundary, color = "black", fill = NA, size = 10) +
#DC Hexagons
geom_sf(data = dc_hexagons, color = "black", fill = NA) +
# DC Waterbodies
geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
# Labels and theme
labs(title = "H3 Hexagons") +
scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"),
labels = c("Parks", "Waterbodies"))+
theme_void()
# Create the 2x2 grid of plots
grid.arrange(dcWardMap, dcNeighborhoodMap, dcBlockGroupMap, dcHexagonMap, ncol = 2, nrow = 2)
This project used a variety of data that were processed using R and multiple R libraries. Most of the DC-specific data were pulled directly from the DC Open Data API as GeoJSON files. The DC Open Data portal is funded by the DC Government and gives lawmakers, public servants, law enforcement, and citizens equal access to the same data.4 These GeoJSON data were chosen because of their streaming nature: by ingesting streamed GeoJSON for most of the data, the analysis ensured it was always using the most up-to-date information available.
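To make the ingest step concrete, a minimal sketch is shown below. The endpoint URL is a placeholder, not the exact link used in the project; each dataset’s GeoJSON query URL is listed under its API options on the DC Open Data portal.
#Sketch: streaming a GeoJSON layer from DC Open Data into an sf object ----
library(sf)
#Placeholder endpoint - replace with the GeoJSON query URL from the dataset's DC Open Data page
crime_geojson_url <- "https://opendata.dc.gov/...&f=geojson"
dc_crime <- st_read(crime_geojson_url, quiet = TRUE)
#Keeping only the fields this analysis needs reduces memory use
dc_crime <- dc_crime[, c("REPORT_DAT", "OFFENSE", "LONGITUDE", "LATITUDE")]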
In addition, other data were ingested through APIs directly from the R script. Data were sourced from three different R libraries: tidycensus, h3jsr, and suncalc. Tidycensus, made by Dr. Kyle Walker, interacts directly with the U.S. Census Bureau’s Census API to ingest data straight from the Census Bureau.5 H3jsr, created by Lauren O’Brien, provides R bindings to Uber’s H3 hexagonal indexing library.6 Suncalc, created by Benoit Thieurmel and Achraf Elmarhraoui, uses a series of formulas to calculate the sunrise and sunset times for a specific date and location.7
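For orientation, the sketch below shows roughly how each of these libraries can be called. The Census variable code, H3 resolution, example coordinates, and the hexagon-construction approach are illustrative assumptions rather than the project’s exact parameters.
#Sketch: illustrative calls to tidycensus, h3jsr and suncalc ----
library(tidycensus)
library(h3jsr)
library(suncalc)
#tidycensus: pull DC block group geometries (the variable code is an example - total population)
dc_blockgroups <- get_acs(geography = "block group", variables = "B01003_001",
                          state = "DC", year = 2022, geometry = TRUE)
#h3jsr: fill the DC boundary with resolution 9 H3 cells, then convert the cell
#indexes back into sf polygons (assumes a recent h3jsr release)
hex_ids <- polygon_to_cells(dc_boundary, res = 9, simple = TRUE)
dc_hexagons <- cell_to_polygon(unlist(hex_ids), simple = FALSE)
#suncalc: sunrise and sunset for one date and location in Washington, DC
getSunlightTimes(date = as.Date("2023-06-21"), lat = 38.9072, lon = -77.0369,
                 keep = c("sunrise", "sunset"), tz = "America/New_York")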
Below is a complete list of the data used in the project, along with some of the quirks of each dataset.
The first few steps in the analysis were importing the R libraries, importing data from the various data sources, and then conditioning the data. The conditioning step was important because it ensured that all the datasets would work together and that they did not carry ancillary fields that were not needed. This let the datasets take up less system memory and be processed more quickly and efficiently. For example, the original 2023 Crime Incidents table had 21 columns, but only four of them were necessary for this analysis. A representative set of library imports is sketched below.
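The libraries loaded at the start can be inferred from the functions used throughout this document; the exact list in the original script may differ slightly.
#Libraries used throughout this analysis (representative list) ----
library(sf)          # spatial data handling (st_read, st_intersects)
library(dplyr)       # data manipulation (mutate, group_by, top_n)
library(tidyr)       # unnest
library(purrr)       # map
library(stringr)     # str_wrap
library(ggplot2)     # static maps and charts
library(gridExtra)   # grid.arrange
library(classInt)    # Jenks natural breaks (classIntervals)
library(plotly)      # interactive pie chart
library(leaflet)     # interactive web map
library(suncalc)     # sunrise and sunset times
library(h3jsr)       # H3 hexagon grid
library(tidycensus)  # Census block group geometries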
To analyze the spatial component of the crime incidents, the project summarized the crime incident point data to the four geographies being compared: DC Wards, Neighborhoods, Census Block Groups, and H3 resolution 9 hexagons. This was done with the sf function st_intersects, which determines which input points (crime incidents) intersect each polygon in the specified geometry. Counting those intersections adds a new column to the polygon layer containing the number of crime incidents that fall inside each geometry. These data were then used to create four maps that compare how the different geometries summarize the data.
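As a minimal illustration of this pattern (the same call appears in the summarization code further below), counting crime points per ward looks like this:
#Sketch: st_intersects returns, for each ward, the indexes of the crime points
#that fall inside it; lengths() collapses that sparse list into a per-ward count
hits <- st_intersects(dc_wards, dc_crime)
dc_wards$incidents <- lengths(hits)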
To shift from the spatial to the temporal component of the analysis, the crime incident dataset needed to be conditioned further. Originally, the crime incidents’ report date field held the timestamp for each event in UNIX time, a format that counts the time elapsed since January 1, 1970 (stored here in milliseconds). That format could not be fed into the suncalc library directly, so it was converted. The result was three new fields: a combined date and time field in POSIXct format, a date field that keeps just the date of each incident as a Date, and a time column holding the time of each crime incident.
#Extract Dates from Crime Data----
#First 10 Pre-Converted Records in DC Crime
top10dates_notConverted <- head(dc_crime, 10)
# Function to convert timestamps to separate components
convert_timestamps <- function(timestamps) {
# Convert timestamps to numeric format
timestamps_numeric <- as.numeric(timestamps)
# Convert timestamps to POSIXct format
datetime <- as.POSIXct(timestamps_numeric / 1000, origin = "1970-01-01")
# Extract components
day <- format(datetime, "%d")
month <- format(datetime, "%m")
year <- format(datetime, "%Y")
hour <- format(datetime, "%H")
minute <- format(datetime, "%M")
second <- format(datetime, "%S")
# Create data frame with separate columns
df <- data.frame(
day = as.integer(day),
month = as.integer(month),
year = as.integer(year),
hour = as.integer(hour),
minute = as.integer(minute),
second = as.integer(second)
)
return(df)
}
# Converting the REPORT_DAT column to individual date-time components
dc_crime$REPORT_DATE<- convert_timestamps(unlist(dc_crime$REPORT_DAT))
# Combine the columns into a single DTG column
dc_crime$REPORT_TIMESTAMP <- with(dc_crime, paste(
REPORT_DATE$year, REPORT_DATE$month, REPORT_DATE$day,
REPORT_DATE$hour, REPORT_DATE$minute, REPORT_DATE$second,
sep = "-"
))
# Convert the DTG column to a POSIXct date-time object
dc_crime$REPORT_TIMESTAMP <- as.POSIXct(dc_crime$REPORT_TIMESTAMP, format = "%Y-%m-%d-%H-%M-%S")
#Making a new column that's just that date as a Date class
dc_crime$REPORT_DATE <- as.Date(dc_crime$REPORT_TIMESTAMP)
#Making a new column that holds just the time as a formatted character string
dc_crime$TIME <- format(dc_crime$REPORT_TIMESTAMP, format = "%H:%M:%S")
#Removing unnecessary columns
dc_crime <- dc_crime[c("REPORT_TIMESTAMP", "REPORT_DATE", "TIME", "OFFENSE", "LONGITUDE", "LATITUDE", "geometry")]
##First 10 Converted Records in DC Crime
top10dates_Converted <- head(dc_crime,10)
#Removing variables from memory
rm(top10dates_Converted, top10dates_notConverted)
Now that the crime incidents were formatted correctly, they were fed into suncalc’s getSunlightTimes function. This function takes each incident’s latitude, longitude, and date and returns the official sunrise and sunset times for that incident. Logic was added to switch the time zone used in the calculation based on the 2023 daylight saving time schedule, so the offset passed to getSunlightTimes alternates between Eastern Daylight Time (UTC−4) and Eastern Standard Time (UTC−5) depending on the time of year each event occurred. With sunrise and sunset times assigned to each incident, the crime data were properly formatted for creating maps and charts.
#Making a map to show all of the crime ----
# Plotting all crime
allCrimeMap <- ggplot() +
#DC Boundary
geom_sf(data = dc_boundary, color = "black", fill = NA) +
#DC Crime
geom_sf(data = dc_crime, aes(fill = "Crime Incidents"), color = "darkred", size = 1)+
# DC Waterbodies
geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
# Labels and theme
labs(title = "2023 Crime Incidents",
x = "Longitude", y = "Latitude")+
scale_fill_manual(name = NULL, values = c("Crime Incidents" = "darkred", "Waterbodies" = "blue", "Parks" = "darkgreen"),
labels = c("Crime Incidents", "Parks", "Waterbodies"))+
theme_void()
allCrimeMap
This map shows all 34,213 crime incidents plotted using ggplot. While all the incidents are plotted, very few insights can be extracted from the map because there is no way to visualize the clustering of crime points across the city. The map makes it appear that all of central Washington, DC has equal amounts of crime, since there is a continuous red splotch across the map; however, that is not the case. There are simply too many crime incidents for this visualization method to display accurately.
#Summarizing crime data by the different geographies----
#Wards
#Summarize the number of crime incidents per ward
dc_wards$incidents <- lengths(st_intersects(dc_wards, dc_crime))
#Finding the top Ward
topWard <- dc_wards %>%
top_n(1, incidents)
wardCrimeSummary <- ggplot() +
geom_sf(data = dc_wards, aes(fill = incidents), color = "black") +
scale_fill_distiller(palette = "Reds", direction = 1, name = "Incidents") +
geom_sf(data = topWard, fill = NA, color="black", lwd = 1.5)+
labs(title = "Crime Incidents by Ward",
caption = "Calling out the top Ward") +
theme_void()
#Neighborhoods
#Summarize the number of crime incidents by neighborhood
dc_neighborhoods$incidents <- lengths(st_intersects(dc_neighborhoods, dc_crime))
#Finding the top Neighborhoods
topNeighborhoods <- dc_neighborhoods %>%
top_n(5, incidents)
#Plotting the neighborhoods
neighborhoodCrimeSummary <- ggplot()+
geom_sf(data = dc_neighborhoods, aes(fill = incidents), color = "black")+
scale_fill_distiller(palette = "Reds", direction = 1, name = "Incidents")+
geom_sf(data = topNeighborhoods, fill = NA, color="black", lwd = 1.5)+
labs(title = "Crime Incidents by Neighborhood",
caption = "Calling out the top 5 Neighborhoods")+
theme_void()
#Block Groups
#Summarize the number of crime incidents by block group
dc_blockgroups$incidents <- lengths(st_intersects(dc_blockgroups, dc_crime))
# Finding the top block groups
topBG <- dc_blockgroups %>%
top_n(10, incidents)
# Calculate Jenks natural breaks
breaks <- classIntervals(dc_blockgroups$incidents, n = 5, style = "jenks")
# Discretize the variable based on Jenks breaks
dc_blockgroups$incidents_group <- cut(dc_blockgroups$incidents, breaks$brks)
# Plot with styled fill
bgCrimeSummary <- ggplot(data = dc_blockgroups, aes(fill = incidents_group)) +
geom_sf() +
scale_fill_brewer(palette = "Reds", name = "Incidents") + # Use a suitable color palette
geom_sf(data = topBG, fill = NA, color = "black", lwd = 1) +
labs(title = "Crime Incidents by Block Group",
caption = "Calling out the top 10 Block Groups")+
theme_void()
#Hexagons
#Summarize crime to each polygon
dc_hexagons$incidents = lengths(st_intersects(dc_hexagons, dc_crime))
#Removing hexagons if they have a value of 0
DC_Hexagons <- dc_hexagons[dc_hexagons$incidents != 0, ]
#Finding the top 10 hexagons
topHexagons <- DC_Hexagons %>%
top_n(10, incidents)
# Calculate Jenks natural breaks
breaks <- classIntervals(DC_Hexagons$incidents, n = 5, style = "jenks")
# Discretize the variable based on Jenks breaks
DC_Hexagons$incidents_group <- cut(DC_Hexagons$incidents, breaks$brks)
# Plot with styled fill
hexSummary <- ggplot(data = DC_Hexagons, aes(fill = incidents_group)) +
geom_sf() +
scale_fill_brewer(palette = "Reds", name = "Incidents") + # Use a suitable color palette
geom_sf(data = topHexagons, fill = NA, color="black", lwd = 0.6)+
theme_void() +
labs(title = "Crime Incidents by Hexagon",
caption = "Calling out the top 10 hexagons")
#Plotting the crimes summarized by geometry
# Create the 2x2 grid of plots
grid.arrange(wardCrimeSummary, neighborhoodCrimeSummary, bgCrimeSummary, hexSummary, ncol = 2, nrow = 2)
This map series explored how the four different summary geographies fared when summarizing the crime incidents. All of the maps relay information; which works best depends on the scale needed. The Ward and Neighborhood geographies vary spatially, but that can be attributed to how those geographies were created: both are cultural constructs that seek to outline where and how people live within the city. The Block Groups and H3 hexagons, on the other hand, approach the problem differently. Block Groups are a subdivision of Census Tracts that attempt to keep each Block Group’s population within a range of roughly 600 to 3,000 people.14 This removes most of the political boundaries found in the Ward and Neighborhood divisions. H3 hexagons take this a step further: each hexagon represents a fixed area and is not influenced by political, cultural, or physical geographic factors. As a result, H3 hexagons offer an unbiased look at how the same data aggregate within politically and culturally defined boundaries.15
The unbiased view that H3 hexagons provide is instrumental in summarizing crime incidents. Understanding where crime is happening at the ward, neighborhood, and block group levels is valuable for budgeting and police force distribution. However, those geographies have little influence on crime itself, so a non-biased form of data aggregation is needed.16 H3 provides a much finer resolution, which can help define where crime-fighting resources should be concentrated.
#Plotting the top results from each geography----
#Top Ward
topWard_Plot <- ggplot(topWard, aes(x = reorder(NAME, incidents), y = incidents)) +
geom_col(fill = "skyblue", color = "black") +
geom_text(aes(label = str_wrap(NAME, width = 10)), vjust = 10) + # Wrap x-axis labels and adjust position
geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") + # Display total values on top of bars
labs(title = "Top Ward by Incidents",
x = NULL, y = "Incidents") + # Remove x-axis label
theme_minimal() +
theme(axis.text.x = element_blank()) # Remove x-axis labels
#Top Neighborhood
topNeighborhood_Plot <- ggplot(topNeighborhoods, aes(x = reorder(NBH_NAMES, incidents), y = incidents)) +
geom_col(fill = "skyblue", color = "black") +
geom_text(aes(label = str_wrap(NBH_NAMES, width = 10)), vjust = 1.1) + # Wrap x-axis labels and adjust position
geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") + # Display total values on top of bars
labs(title = "Top 5 Neighborhoods by Incidents",
x = NULL, y = "Incidents") + # Remove x-axis label
theme_minimal() +
theme(axis.text.x = element_blank()) # Remove x-axis labels
# Top Block Groups
topBG_Plot <- ggplot(topBG, aes(x = reorder(GEOID, incidents), y = incidents)) +
geom_col(fill = "skyblue", color = "black") +
geom_text(aes(label = paste0("GEOID: ", str_wrap(GEOID, width = 10))), hjust = 1.25, angle=90) + # Wrap x-axis labels and adjust position
geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") + # Display total values on top of bars
labs(title = "Top 10 Block Groups by Incidents",
x = NULL, y = "Incidents") + # Remove x-axis label
theme_minimal() +
theme(axis.text.x = element_blank()) # Remove x-axis labels
#Top Hexagons
topHex_Plot <- ggplot(topHexagons, aes(x = reorder(hex_id, incidents), y = incidents)) +
geom_col(fill = "skyblue", color = "black") +
geom_text(aes(label = paste0("Hex ID: ", str_wrap(hex_id, width = 10))), hjust = 1.1, angle=90) + # Wrap x-axis labels and adjust position
geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") + # Display total values on top of bars
labs(title = "Top 10 H3 Hexagons by Incidents",
x = NULL, y = "Incidents") + # Remove x-axis label
theme_minimal() +
theme(axis.text.x = element_blank()) # Remove x-axis labels
# Create the 2x2 grid of plots
grid.arrange(topWard_Plot, topNeighborhood_Plot, topBG_Plot, topHex_Plot, ncol = 2, nrow = 2)
These plots call out the top, top-5, and top-10 areas within each summary geography, the same ones highlighted in the previous map series. They show that block groups and H3 hexagons have a very similar breakdown. The downside of both is that they need to be referenced against a map to see where each block group or hexagon is located. The real value of these graphs comes when they are viewed alongside the previous maps, allowing readers to see spatially where the top areas sit while also comparing their magnitude through the size and color used in the two visualizations.
#Finding Sunrise and Sunset Times ----
#Using suncalc to calculate whether a crime event occurred during the day or night.
#Start: 12 March 2023
#End: 5 November 2023
dc_crime <- dc_crime %>%
rowwise() %>%
mutate(
sunrise = list(if_else(
REPORT_DATE >= as.Date("2023-03-12") & REPORT_DATE < as.Date("2023-11-05"), #Accounting for Daylight Savings
getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunrise", tz = "UTC+4"),
getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunrise", tz = "UTC+5")
)),
sunset = list(if_else(
REPORT_DATE >= as.Date("2023-03-12") & REPORT_DATE < as.Date("2023-11-05"), #Accounting for Daylight Savings
getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunset", tz = "UTC+4"),
getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunset", tz = "UTC+5")
))
) %>%
ungroup()
# Un-nesting the nested df with the time data. Selects a specific column within the nested data.
dc_crime <- dc_crime %>%
mutate(
sunrise = map(sunrise, ~ select(.x, sunrise)),
sunset = map(sunset, ~ select(.x, sunset))
) %>%
unnest(cols = c(sunrise, sunset))
#Creating a new column and populating it depending on when the crime took place
dc_crime$time_of_day <- ifelse(dc_crime$REPORT_TIMESTAMP >= dc_crime$sunrise & dc_crime$REPORT_TIMESTAMP <= dc_crime$sunset, 'Day', 'Night')
#Creating a chart of the number of crimes that took place at night vs during the day. ----
#Creating a variable that keeps the count of Day vs Night values
crime_count <- table(dc_crime$time_of_day)
#Making that a df
crime_count_df <- as.data.frame(crime_count)
# Calculate percentage for each time of day
crime_count_df$percentage <- crime_count_df$Freq / sum(crime_count_df$Freq) * 100
# Create the pie chart of number of crimes by time of day
crime_count_pie <- plot_ly(crime_count_df, labels = ~Var1, values = ~Freq, type = 'pie',
textinfo = 'label+percent+value',
insidetextorientation = 'radial')
#Adding a title
crime_count_pie <- crime_count_pie %>% layout(title = list(text = "Crime by Time of Day", y = .98))
# Display the plot
crime_count_pie
This pie chart breaks the crime incidents down by whether they occurred during the day or at night. Analysis of the 34,213 incidents reveals a nearly 57%/43% split between crimes occurring during the day and crimes occurring at night. These results for Washington, DC line up with a 2017 aggregated crime analysis by The Sleep Judge, which used data from multiple cities and found a 55%/45% split between daytime and nighttime crime.17 One difference between their analysis and this study is that they used a hard time cutoff, 7:00 a.m. to 6:59 p.m. for all cities, to delineate daytime and nighttime. This study instead uses the longitude, latitude, and date of each crime incident to determine whether it occurred within the actual daytime or nighttime window for that location.
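To make that contrast concrete, the fixed-cutoff approach could be replicated on this dataset with a simple hour test, as sketched below; this block is illustrative and was not part of the original analysis.
#Sketch: classify incidents with the fixed 7:00am - 6:59pm cutoff used by The Sleep Judge,
#for comparison with the sunrise/sunset-based labels
hour_of_day <- as.integer(format(dc_crime$REPORT_TIMESTAMP, "%H"))
dc_crime$time_of_day_fixed <- ifelse(hour_of_day >= 7 & hour_of_day < 19, "Day", "Night")
#Cross-tabulate the two classifications
table(dc_crime$time_of_day, dc_crime$time_of_day_fixed)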
#Breaking out the types of crime by when they occurred ----
offense_summary <- dc_crime %>%
group_by(OFFENSE, time_of_day) %>%
summarize(count = n(), .groups = "drop")
#Creating a bar chart of the summarized data
offense_plot <- ggplot(offense_summary, aes(x = OFFENSE, y = count, fill = time_of_day)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
geom_text(aes(label=count), vjust=-0.3, size=3.5, position = position_dodge(0.9)) +
labs(title = "Offense Count by Time of Day",
x = element_blank(),
y = "Count",
fill = "Time of Day",
caption = "NOTE: All Homicides have a timestamp of 00:00:00") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
scale_fill_manual(values = c("Day" = "lightblue", "Night" = "black"))
offense_plot
This bar chart groups the daytime/nighttime data by the type of offense committed. The result calls out which crimes typically occur during the day and which occur at night. Common crimes like theft and motor vehicle theft occur most frequently during the day. However, there are over twice as many incidents of assault with a dangerous weapon and robbery at night as during the day. Sex abuse also occurs more frequently at night, but only at about a 20% increase over daytime incidents. Homicides are excluded from this comparison because all of their timestamps in the original data are recorded as 00:00:00. DC does not explain why, but a plausible reason is that a homicide is effectively an assault with a dangerous weapon in which the victim later dies, making it unclear which timestamp, the assault or the time of death, should record the event. Because homicides are difficult to time accurately, they appear to be assigned a default time value, which still allows their location and date information to be included.
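Because every homicide carries the placeholder 00:00:00 timestamp, the day/night comparison can drop them with a filter like the sketch below; the "HOMICIDE" label is an assumption about how the OFFENSE field is coded in the source data.
#Sketch: exclude homicides (all stamped 00:00:00) from the day/night offense breakdown
offense_summary_noHomicide <- dc_crime %>%
  filter(OFFENSE != "HOMICIDE") %>%   # assumed OFFENSE coding
  group_by(OFFENSE, time_of_day) %>%
  summarize(count = n(), .groups = "drop")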
#Mapping the crimes that occurred at night vs the ones that occurred during the day. ----
#Making new layers for day and night
dc_crime_day <- subset(dc_crime, time_of_day == "Day")
dc_crime_night <- subset(dc_crime, time_of_day == "Night")
#Mapping the crime data by time
crime_TOD_map <- leaflet() %>%
addProviderTiles(providers$CartoDB) %>%
#Day Crimes
addCircleMarkers(data = dc_crime_day,
lng = ~LONGITUDE, lat = ~LATITUDE,
color = "blue",
label = ~as.character(OFFENSE),
opacity = 1, fillOpacity = 1,
radius = 1,
group = "Day Crimes") %>%
#Night Crimes
addCircleMarkers(data = dc_crime_night,
lng = ~LONGITUDE, lat = ~LATITUDE,
color = "grey",
label = ~as.character(OFFENSE),
opacity = .35, fillOpacity = .25,
radius = 1,
group = "Night Crimes") %>%
addLayersControl(
overlayGroups = c("Day Crimes", "Night Crimes"),
options = layersControlOptions(collapsed = FALSE)
)
crime_TOD_map
This leaflet map simply displays the point locations of all daytime and nighttime crime incidents. At the citywide scale there aren’t many insights to extract from it, but the map does a good job of highlighting the types of crimes that occur at a micro scale. For example, it’s interesting to see that there are more nighttime crimes than daytime crimes in the Trinidad neighborhood.
#Hexbins by time of day ----
#Making a new Crimes Hexagon Layers
#By copying and removing columns it removes the need to recreate the hexagons
dayCrime_Hexagons <- subset(dc_hexagons, select = c("hex_id", "geometry"))
nightCrime_Hexagons <- subset(dc_hexagons, select = c("hex_id", "geometry"))
#Summarizing by day and night
dayCrime_Hexagons$incidents = lengths(st_intersects(dayCrime_Hexagons, dc_crime_day))
nightCrime_Hexagons$incidents = lengths(st_intersects(nightCrime_Hexagons, dc_crime_night))
#Removing rows with 0s
dayCrime_Hexagons <- dayCrime_Hexagons[dayCrime_Hexagons$incidents != 0, ]
nightCrime_Hexagons <- nightCrime_Hexagons[nightCrime_Hexagons$incidents != 0, ]
#Creating a ggplot that displays the center of activity for both day and night crimes.
#Day
#Finding the top 10 hexagons
topHexagons <- dayCrime_Hexagons %>%
top_n(10, incidents)
# Calculate Jenks natural breaks
breaks <- classIntervals(dayCrime_Hexagons$incidents, n = 5, style = "jenks")
# Discretize the variable based on Jenks breaks
dayCrime_Hexagons$incidents_group <- cut(dayCrime_Hexagons$incidents, breaks$brks)
# Plot with styled fill
day_hexSummary <- ggplot(data = dayCrime_Hexagons, aes(fill = incidents_group)) +
geom_sf() +
#DC Boundary
geom_sf(data = dc_boundary, color = "black", fill = NA, size = 10) +
# DC Waterbodies
geom_sf(data = dc_water, color = NA, fill = "lightblue", alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, color = NA, fill = "darkgreen", alpha = 0.5) +
scale_fill_brewer(palette = "Reds", name = "Incidents") + # Use a suitable color palette
geom_sf(data = topHexagons, fill = NA, color="black", lwd = 0.6)+
theme_void() +
labs(title = "Day Crimes by Hexagon",
caption = "Calling out the top 10 hexagons")
#Night
#Finding the top 10 hexagons
topHexagons <- nightCrime_Hexagons %>%
top_n(10, incidents)
# Calculate Jenks natural breaks
breaks <- classIntervals(nightCrime_Hexagons$incidents, n = 5, style = "jenks")
# Discretize the variable based on Jenks breaks
nightCrime_Hexagons$incidents_group <- cut(nightCrime_Hexagons$incidents, breaks$brks)
# Plot with styled fill
night_hexSummary <- ggplot(data = nightCrime_Hexagons, aes(fill = incidents_group)) +
geom_sf() +
#DC Boundary
geom_sf(data = dc_boundary, color = "black", fill = NA, size = 10) +
# DC Waterbodies
geom_sf(data = dc_water, color = NA, fill = "lightblue", alpha = 0.5) +
# DC Parks
geom_sf(data = dc_parks, color = NA, fill = "darkgreen", alpha = 0.5) +
scale_fill_brewer(palette = "Reds", name = "Incidents") + # Use a suitable color palette
geom_sf(data = topHexagons, fill = NA, color="black", lwd = 0.6)+
theme_void() +
labs(title = "Night Crimes by Hexagon",
caption = "Calling out the top 10 hexagons")
# Create the 2x1 grid of plots
grid.arrange(day_hexSummary, night_hexSummary, ncol = 2, nrow = 1)
This map series summarizes the analysis by showing the differences between the top 10 H3 hexagons during the daytime and the nighttime. Comparing the two maps, it becomes apparent that crime events consolidate in the Shaw neighborhood at night: five of the top 10 nighttime hexagons are neighbors there, showing a clustering of crime incidents. The daytime incidents show a small cluster near Shaw as well, but most of the top daytime hexagons are spread out across the city. This suggests that in the evening, crime may concentrate in certain areas such as the Shaw neighborhood.
In conclusion, this project provided commentary on which types of geographies are best for describing crime at a city scale. H3 hexagons are well suited to summarizing these data because they are not influenced by outside factors and they keep their own fixed spatial boundaries, which allows multiple years’ worth of H3 analyses to be merged and compared.
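For example, a prior year’s hexagon summary (a hypothetical dc_hexagons_2022 layer built the same way as dc_hexagons but summarized against a 2022 crime layer) could be joined to the 2023 summary by hex_id, as sketched below.
#Sketch: compare two years of crime counts on the same H3 grid (dc_hexagons_2022 is hypothetical)
hex_comparison <- dc_hexagons %>%
  st_drop_geometry() %>%
  select(hex_id, incidents_2023 = incidents) %>%
  left_join(dc_hexagons_2022 %>% st_drop_geometry() %>% select(hex_id, incidents_2022 = incidents),
            by = "hex_id") %>%
  mutate(change = incidents_2023 - incidents_2022)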
This study also found that there is a 57%/43% split between crime incidents occurring during the day and at night. Of those crimes, assault with a dangerous weapon and robbery were more than twice as likely to occur at night, and sex abuse incidents were about 20% more likely.
This project was a data science challenge. Formatting all the data so that they would mesh with each other was difficult, and at times it was frustrating to reformat the various time fields from UNIX time to POSIXct and then break the date and time out into separate columns.
To continue this project, I would compare the results from multiple cities to see whether DC’s breakdown of nighttime versus daytime crime is unique to DC or vastly different from the study group’s mean.
1. https://www.vox.com/cities-and-urbanism/24055029/washington-dc-crime-rate-homicides-republican-democrats
2. https://www.wsj.com/us-news/violent-crime-is-surging-in-d-c-this-year-we-just-stood-there-and-screamed-380f3c69
3. https://nij.ojp.gov/topics/articles/maps-how-mapping-helps-reduce-crime-and-improve-public-safety
4. https://opendata.dc.gov/
5. https://walker-data.com/tidycensus/
6. https://obrl-soil.github.io/h3jsr/
7. https://github.com/datastorm-open/suncalc
8. https://opendata.dc.gov/datasets/DCGIS::crime-incidents-in-2023/about
9. https://opendata.dc.gov/datasets/DCGIS::wards-from-2022/about
10. https://opendata.dc.gov/datasets/DCGIS::neighborhood-clusters/about
11. https://opendata.dc.gov/datasets/DCGIS::waterbodies-2021/about
12. https://opendata.dc.gov/datasets/DCGIS::parks-and-recreation-areas/about
13. https://opendata.dc.gov/datasets/DCGIS::washington-dc-boundary/about
14. https://en.wikipedia.org/wiki/Census_block_group
15. https://www.uber.com/blog/h3/
16. https://journals.sagepub.com/doi/10.1177/0011128716687756
17. https://www.thesleepjudge.com/crimes-that-happen-while-you-sleep/