Spatial and Temporal Analysis of 2023 Crime Incidents for Washington, DC

Research Question:

This exploratory analysis seeks to quantify the spatial and temporal trends in 2023 crime data for Washington, DC. In 2023 there were 34,213 recorded crimes in Washington, DC; making it one of the most crime filled years in the city’s history. The purpose of this analysis is to explore these data and extract some interesting statistics and or trends from the data.

To do this this analysis first looks at the spatial distribution of crime incidents throughout the city. Explores four different methods of summarizing point data to different geographies. The analysis summarized the Crime Incident data to Washington, DC Wards, Neighborhoods, U.S. Census Bureau Block Groups, and Resolution 9 H3 hexagons.

After comparing the differences between the different geographies the analysis shifts to explore the temporal components of the crime incidents. This aspect of the analysis compares the locations of crimes that occurred during daylight hours compared to those that occurred during night hours. Then the analysis compares the data against a different temporal component, seasonality. This compares the spatial patterns that occur during the Winter, Spring, Summer and Fall seasons.

Data:

This project uses a variety of data that are processed using R and multiple R libraries. The majority of the DC focused data were pulled directly from DC Open Data’s API as GeoJSON files. The DC Open Data portal is a data portal funded by the DC Government and allows for public servants, law makers and researches to all use the same data when making decision. The decision to use streaming GeoJSON data in this project compared to directly downloaded tablular data was to account for any of DC’s data management. IF DC decided to remove any data or update any data the newest data would automatically be pulled into this analysis. As a results this allows this analysis to use the most up-to-date data available.

This project also experiments with importing some data directly as a shapefile, dc_boundary, and others using R libraries. In total this projet brings in data from two R libraries: tidycensus and h3jsr. Tidycensus is a library made by Dr. Kyle Walker and it interacts directly with the U.S. Census Bureau’s Census API to pull data in directly from the Census Bureau (1). The second library, h3jsr, is a library created by Lauren O’Brien that allows users to pull Uber’s H3 hexagons directly into the R projects (2).

  1. https://walker-data.com/tidycensus/
  2. https://obrl-soil.github.io/h3jsr/
# Load required libraries
library(h3jsr)
library(sf)
library(dplyr)
library(tidyverse)
library(tidycensus)
library(leaflet)
library(gridExtra)
library(plotly)
library(classInt)
library(suncalc)
library(plotly)

Pulling Data Into R:

#Importing Data ----
dc_boundary <- st_read("C:\\PSU\\Geog588\\Week7\\Washington_DC_Boundary\\Washington_DC_Boundary.shp")
dc_crime <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/FEEDS/MPD/MapServer/5/query?outFields=*&where=1%3D1&f=geojson")
dc_wards <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/53/query?outFields=*&where=1%3D1&f=geojson")
dc_neighborhoods <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/17/query?outFields=*&where=1%3D1&f=geojson")
dc_water <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Environment_Water_WebMercator/MapServer/24/query?outFields=*&where=1%3D1&f=geojson")
dc_parks <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Recreation_WebMercator/MapServer/10/query?outFields=*&where=1%3D1&f=geojson")

#Census block groups
dc_blockgroups <- get_acs(
  geography = "block group",
  variables = c(
    Income = "B19013_001"),
  state = "DC",
  year = 2021,
  output = "wide",
  geometry = TRUE
) 

# Get H3 hexagons covering Washington DC
dc_h3 <- polygon_to_cells(dc_boundary, res = 9)
# Convert the H3 hexagons back to polygons
dc_hexagons <- cell_to_polygon(dc_h3)
# Convert to spatial data frame and make a column for hex ids
dc_hexagons <- st_as_sf(dc_hexagons, hexid=dc_h3)
#Changing the column names
names(dc_hexagons) <- c("hex_id", "geometry")
#resetting the geometry column
dc_hexagons <- st_set_geometry(dc_hexagons, "geometry")

Cleaning Data:

Now that the data are imported into R it is important to clean and condition it. This is important for two reasons. First, removing unnecessary columns reduces the amount of memory each datasets uses. As a result it increases the system efficiency so that R does not have to worry about using so many resources when analyzing the data. Secondly, reducing the number of columns makes the data easier to deal with and manipulate. E.g. the dc_wards dataset has 300+ columns that include information about ANC comissioners, trash pick up date and times and etc. Since those columns are not necessary for this analysis they are simply removed.

#Cleaning the Data ----
#Removing the income stats
dc_blockgroups <- dc_blockgroups[c("GEOID", "NAME", "geometry")]
#Setting the block group CRS to the same one as the Wards
dc_blockgroups <- st_transform(dc_blockgroups, crs = st_crs(dc_wards))
#Removing unnecessary ward fields
dc_wards <- dc_wards[c("NAME", "geometry")]
#Cleaning DC Boundary
dc_boundary <- dc_boundary[c("CITY_NAME", "AREAKM", "geometry")]
#Cleaning Crime Data
dc_crime <- dc_crime[c("REPORT_DAT","OFFENSE", "LATITUDE", "LONGITUDE","geometry")]
#Cleaning Neighborhoods
dc_neighborhoods <- dc_neighborhoods[c("NAME", "NBH_NAMES", "geometry")]
#Cleaning Water
dc_water <- dc_water[c("geometry")]
#Cleaning Parks
dc_parks <- dc_parks[c("geometry")]
#Removing dc_H#
rm(dc_h3)

Crime Incidents

In 2023 there were over 34,000 crime incidents recorded by the Metropolitan Police Department. Below is a simple overview of the raw data plotted on a boundary polygon of DC’s boundaries. In future steps these incidents will be summarized to each of the study geographies outlined in the next code block.

#Making a map to show all of the crime ----
# Plotting all crime
allCrimeMap <- ggplot() +
  #DC Boundary
  geom_sf(data = dc_boundary, color = "black", fill = NA) +
  #DC Crime
  geom_sf(data = dc_crime, aes(fill = "Crime Incidents"), color = "darkred", size = 1)+
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
  # Labels and theme
  labs(title = "2023 Crime Incidents",
       x = "Longitude", y = "Latitude")+
  scale_fill_manual(name = NULL, values = c("Crime Incidents" = "darkred", "Waterbodies" = "blue", "Parks" = "darkgreen"), 
                    labels = c("Crime Incidents", "Parks", "Waterbodies"))+
  theme_void()

allCrimeMap

#Removing the allCrimeMap from memory
rm(allCrimeMap)

Study Geographies

This analysis started with exploring which geographies would be most beneficial to display and compare crime data within Washington, DC. To do this, this analysis compares four different geographies: DC Wards, DC Neighborhoods, DC U.S. Census Block Groups, and H3 Hexagons. A overview of the four different geographies can be viewed in the following code block.

#Project Scales - Maps ----
#Map of DC Wards
dcWardMap <- ggplot()+
  #DC Wards
  geom_sf(data = dc_wards, color = "black", fill = NA)+
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
  # Labels and theme
  labs(title = "Wards") +
  scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"), 
                    labels = c("Parks", "Waterbodies"))+
  theme_void()

#DC Neighborhood Map
dcNeighborhoodMap <- ggplot()+
  #DC Wards
  geom_sf(data = dc_neighborhoods, color = "black", fill = NA)+
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
  # Labels and theme
  labs(title = "Neighborhoods") +
  scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"), 
                    labels = c("Parks", "Waterbodies"))+
  theme_void()

#DC Block Group Map
dcBlockGroupMap <- ggplot()+
  #DC Wards
  geom_sf(data = dc_blockgroups, color = "black", fill = NA)+
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
  # Labels and theme
  labs(title = "Block Groups") +
  scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"), 
                    labels = c("Parks", "Waterbodies"))+
  theme_void()

#Map of DC Hexagons
dcHexagonMap <- ggplot()+
  #DC Boundary
  geom_sf(data = dc_boundary, color = "black", fill = NA, size = 10) +
  #DC Hexagons
  geom_sf(data = dc_hexagons, color = "black", fill = NA) +
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "Waterbodies"), color = NA, alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "Parks"), color = NA, alpha = 0.5) +
  # Labels and theme
  labs(title = "H3 Hexagons") +
  scale_fill_manual(name = NULL, values = c("Waterbodies" = "blue", "Parks" = "darkgreen"), 
                    labels = c("Parks", "Waterbodies"))+
  theme_void()

# Create the 2x2 grid of plots
grid.arrange(dcWardMap, dcNeighborhoodMap, dcBlockGroupMap, dcHexagonMap, ncol = 2, nrow = 2)

#Removing the plots from computer memory

Summarizing Crime to Geographies

Following introducing the different geographies it is important to see how each of the geographies fair with summarizing the crime incident data. One thing to note is that both the U.S. Cenusus Block Group and H3 results are symbolized using Jenk natural breaks while Wards and Neighborhood are not using a classification scheme. This is because with Wards there aren’t enough Wards to justify a 5 class classification.

Comparing the results is interesting. The Wards and Neighborhood geographies vary spatially but that can be contributed to how those geographies were created. Both of those are cultural creations that seek to outline where and how people live within the city. Where are the Block Groups and H3 hexagons approach the problem a different way. Block Groups are a subsect of Census Tracts that attempt to keep the population within each Block Group to a population range between 600 - 3,000 people (3). This removes must of the political boundaries found in Ward and Neighborhood divisions. H3 hexagons take this to a new level where each hexagon represents a specific area and is not influenced by political, cultural and physical geographic factors. As a result H3 hexagons can offer an unbaised look at how data are aggergated within political and cultural defined boundaries (4).

  1. https://en.wikipedia.org/wiki/Census_block_group#:~:text=A%20Census%20Block%20Group%20is,a%20fraction%20of%20all%20households.
  2. https://www.uber.com/blog/h3/
#Summarizing crime data by the different geographies----
#Wards
#Summarize the number of crime incidents per ward
dc_wards$incidents <- lengths(st_intersects(dc_wards, dc_crime))

#Finding the top Ward
topWard <- dc_wards %>%
  top_n(1, incidents)

wardCrimeSummary <- ggplot() +
  geom_sf(data = dc_wards, aes(fill = incidents), color = "black") +
  scale_fill_distiller(palette = "Reds", direction = 1, name = "Incidents") +
  geom_sf(data = topWard, fill = NA, color="black", lwd = 1.5)+
  labs(title = "Crime Incidents by Ward",
       caption = "Calling out the top Ward") +
  theme_void()

#Neighborhoods
#Summarize the number of crime incidents by neighborhood
dc_neighborhoods$incidents <- lengths(st_intersects(dc_neighborhoods, dc_crime))

#Finding the top Neighborhoods
topNeighborhoods <- dc_neighborhoods %>%
  top_n(5, incidents)

#Plotting the neighborhoods 
neighborhoodCrimeSummary <- ggplot()+
  geom_sf(data = dc_neighborhoods, aes(fill = incidents), color = "black")+
  scale_fill_distiller(palette = "Reds", direction = 1, name = "Incidents")+
  geom_sf(data = topNeighborhoods, fill = NA, color="black", lwd = 1.5)+
  labs(title = "Crime Incidents by Neighborhood",
       caption = "Calling out the top 5 Neighborhoods")+
  theme_void()

#Block Groups
#Summarize the number of crime incidents by block group
dc_blockgroups$incidents <- lengths(st_intersects(dc_blockgroups, dc_crime))

# Finding the top block groups
topBG <- dc_blockgroups %>%
  top_n(10, incidents)

# Calculate Jenks natural breaks
breaks <- classIntervals(dc_blockgroups$incidents, n = 5, style = "jenks")

# Discretize the variable based on Jenks breaks
dc_blockgroups$incidents_group <- cut(dc_blockgroups$incidents, breaks$brks)

# Plot with styled fill
bgCrimeSummary <- ggplot(data = dc_blockgroups, aes(fill = incidents_group)) +
  geom_sf() +
  scale_fill_brewer(palette = "Reds", name = "Incidents") +  # Use a suitable color palette
  geom_sf(data = topBG, fill = NA, color = "black", lwd = 1) +
  labs(title = "Crime Incidents by Block Group",
       caption = "Calling out the top 10 Block Groups")+
  theme_void()


#Hexagons
#Summarize crime to each polygon
dc_hexagons$incidents = lengths(st_intersects(dc_hexagons, dc_crime))

#Removing hexagons if they have a value of 0
DC_Hexagons <- dc_hexagons[dc_hexagons$incidents != 0, ]


#Finding the top 10 hexagons
topHexagons <- DC_Hexagons %>%
  top_n(10, incidents)

# Calculate Jenks natural breaks
breaks <- classIntervals(DC_Hexagons$incidents, n = 5, style = "jenks")

# Discretize the variable based on Jenks breaks
DC_Hexagons$incidents_group <- cut(DC_Hexagons$incidents, breaks$brks)

# Plot with styled fill
hexSummary <- ggplot(data = DC_Hexagons, aes(fill = incidents_group)) +
  geom_sf() +
  scale_fill_brewer(palette = "Reds", name = "Incidents") +  # Use a suitable color palette
  geom_sf(data = topHexagons, fill = NA, color="black", lwd = 0.6)+
  theme_void() +
  labs(title = "Crime Incidents by Hexagon",
       caption = "Calling out the top 10 hexagons")

#Plotting the crimes summarized by geometry
# Create the 2x2 grid of plots
grid.arrange(wardCrimeSummary, neighborhoodCrimeSummary, bgCrimeSummary, hexSummary, ncol = 2, nrow = 2)

#Removing the plots from computer memory
rm(wardCrimeSummary, neighborhoodCrimeSummary, bgCrimeSummary, hexSummary, breaks, DC_Hexagons)

Comparison of the Top 1, 5 and 10 geographies within each geography.

This section will discuss the differences between geographies more and introduce four charts that compare the top units within each geography.

#Plotting the top results from each geography----
#Top Ward
topWard_Plot <- ggplot(topWard, aes(x = reorder(NAME, incidents), y = incidents)) +
  geom_col(fill = "skyblue", color = "black") +
  geom_text(aes(label = str_wrap(NAME, width = 10)), vjust = 10) +  # Wrap x-axis labels and adjust position
  geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") +  # Display total values on top of bars
  labs(title = "Top Ward by Incidents",
       x = NULL, y = "Incidents") +  # Remove x-axis label
  theme_minimal() +
  theme(axis.text.x = element_blank())  # Remove x-axis labels

#Top Neighborhood
topNeighborhood_Plot <- ggplot(topNeighborhoods, aes(x = reorder(NBH_NAMES, incidents), y = incidents)) +
  geom_col(fill = "skyblue", color = "black") +
  geom_text(aes(label = str_wrap(NBH_NAMES, width = 10)), vjust = 1.1) +  # Wrap x-axis labels and adjust position
  geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") +  # Display total values on top of bars
  labs(title = "Top 5 Neighborhoods by Incidents",
       x = NULL, y = "Incidents") +  # Remove x-axis label
  theme_minimal() +
  theme(axis.text.x = element_blank())  # Remove x-axis labels

# Top Block Groups
topBG_Plot <- ggplot(topBG, aes(x = reorder(GEOID, incidents), y = incidents)) +
  geom_col(fill = "skyblue", color = "black") +
  geom_text(aes(label = paste0("GEOID: ", str_wrap(GEOID, width = 10))), hjust = 1.25, angle=90) +  # Wrap x-axis labels and adjust position
  geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") +  # Display total values on top of bars
  labs(title = "Top 10 Block Groups by Incidents",
       x = NULL, y = "Incidents") +  # Remove x-axis label
  theme_minimal() +
  theme(axis.text.x = element_blank())  # Remove x-axis labels

#Top Hexagons
topHex_Plot <- ggplot(topHexagons, aes(x = reorder(hex_id, incidents), y = incidents)) +
  geom_col(fill = "skyblue", color = "black") +
  geom_text(aes(label = paste0("Hex ID: ", str_wrap(hex_id, width = 10))), hjust = 1.1, angle=90) +  # Wrap x-axis labels and adjust position
  geom_text(aes(label = incidents), vjust = -0.5, size = 3, color = "black") +  # Display total values on top of bars
  labs(title = "Top 10 H3 Hexagons by Incidents",
       x = NULL, y = "Incidents") +  # Remove x-axis label
  theme_minimal() +
  theme(axis.text.x = element_blank())  # Remove x-axis labels

# Create the 2x2 grid of plots
grid.arrange(topWard_Plot, topNeighborhood_Plot, topBG_Plot, topHex_Plot, ncol = 2, nrow = 2)

#Removing not needed variables from memory
rm(topHex_Plot, topWard_Plot, topNeighborhood_Plot, topBG_Plot, topHexagons, topBG, topNeighborhoods, topWard)

Exploring Temporally

Before the data can be explored temporally the DC Crime data needs to be cleaned and condition. Currently, the time stamps within the data are stored in the UNIX/ POSIX time format. This format counts the number of seconds that have passed since January 1, 1970 at 12:00:00 AM (5). This format is typically used by machines to easily keep track of time but is is not useful for our project. As a result this field needs to be converted to a typical human Date-Time-Group of DD-MM-YYY HH:MM:SS.

  1. https://en.wikipedia.org/wiki/Unix_time
#Extract Dates from Crime Data----
#First 10 Pre-Converted Records in DC Crime
top10dates_notConverted <- head(dc_crime, 10)
top10dates_notConverted
## Simple feature collection with 10 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -77.03541 ymin: 38.89459 xmax: -76.92429 ymax: 38.93125
## Geodetic CRS:  WGS 84
##      REPORT_DAT                    OFFENSE LATITUDE LONGITUDE
## 1  1.700615e+12               THEFT F/AUTO 38.91334 -77.01060
## 2  1.676351e+12                   HOMICIDE 38.90320 -76.98137
## 3  1.695701e+12                   HOMICIDE 38.90387 -76.92429
## 4  1.681517e+12                THEFT/OTHER 38.90565 -77.00256
## 5  1.675210e+12                THEFT/OTHER 38.91454 -77.01893
## 6  1.673929e+12 ASSAULT W/DANGEROUS WEAPON 38.93124 -77.03541
## 7  1.674075e+12                   BURGLARY 38.90723 -77.01086
## 8  1.674078e+12               THEFT F/AUTO 38.92696 -76.99618
## 9  1.674100e+12 ASSAULT W/DANGEROUS WEAPON 38.89458 -76.92923
## 10 1.674130e+12                THEFT/OTHER 38.93059 -77.03182
##                      geometry
## 1  POINT (-77.01061 38.91335)
## 2  POINT (-76.98137 38.90321)
## 3  POINT (-76.92429 38.90388)
## 4  POINT (-77.00256 38.90565)
## 5  POINT (-77.01894 38.91454)
## 6  POINT (-77.03541 38.93125)
## 7  POINT (-77.01086 38.90724)
## 8  POINT (-76.99619 38.92697)
## 9  POINT (-76.92924 38.89459)
## 10  POINT (-77.03182 38.9306)
# Function to convert timestamps to separate components
convert_timestamps <- function(timestamps) {
  # Convert timestamps to numeric format
  timestamps_numeric <- as.numeric(timestamps)
  
  # Convert timestamps to POSIXct format
  datetime <- as.POSIXct(timestamps_numeric / 1000, origin = "1970-01-01")
  
  # Extract components
  day <- format(datetime, "%d")
  month <- format(datetime, "%m")
  year <- format(datetime, "%Y")
  hour <- format(datetime, "%H")
  minute <- format(datetime, "%M")
  second <- format(datetime, "%S")
  
  # Create data frame with separate columns
  df <- data.frame(
    day = as.integer(day),
    month = as.integer(month),
    year = as.integer(year),
    hour = as.integer(hour),
    minute = as.integer(minute),
    second = as.integer(second)
  )
  
  return(df)
}

# Converting the REPORT_DAT column to Individual
dc_crime$REPORT_DATE<- convert_timestamps(unlist(dc_crime$REPORT_DAT))

# Combine the columns into a single DTG column
dc_crime$REPORT_TIMESTAMP <- with(dc_crime, paste(
  REPORT_DATE$year, REPORT_DATE$month, REPORT_DATE$day,
  REPORT_DATE$hour, REPORT_DATE$minute, REPORT_DATE$second,
  sep = "-"
))

# Convert the DTG column to a POSIXct date-time object
dc_crime$REPORT_TIMESTAMP <- as.POSIXct(dc_crime$REPORT_TIMESTAMP, format = "%Y-%m-%d-%H-%M-%S")

#Making a new column that's just that date as a Date class
dc_crime$REPORT_DATE <- as.Date(dc_crime$REPORT_TIMESTAMP)

#Making a new column that's just the tiem as a posix class
dc_crime$TIME <- format(dc_crime$REPORT_TIMESTAMP, format = "%H:%M:%S")

#Removing unncessary columns
dc_crime <- dc_crime[c("REPORT_TIMESTAMP", "REPORT_DATE", "TIME", "OFFENSE", "LONGITUDE", "LATITUDE", "geometry")]

##First 10 Converted Records in DC Crime
top10dates_Converted <- head(dc_crime,10)
top10dates_Converted
## Simple feature collection with 10 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -77.03541 ymin: 38.89459 xmax: -76.92429 ymax: 38.93125
## Geodetic CRS:  WGS 84
##       REPORT_TIMESTAMP REPORT_DATE     TIME                    OFFENSE
## 1  2023-11-21 20:03:22  2023-11-22 20:03:22               THEFT F/AUTO
## 2  2023-02-14 00:00:00  2023-02-14 00:00:00                   HOMICIDE
## 3  2023-09-26 00:00:00  2023-09-26 00:00:00                   HOMICIDE
## 4  2023-04-14 20:01:51  2023-04-15 20:01:51                THEFT/OTHER
## 5  2023-01-31 19:01:54  2023-02-01 19:01:54                THEFT/OTHER
## 6  2023-01-16 23:23:52  2023-01-17 23:23:52 ASSAULT W/DANGEROUS WEAPON
## 7  2023-01-18 15:55:25  2023-01-18 15:55:25                   BURGLARY
## 8  2023-01-18 16:34:19  2023-01-18 16:34:19               THEFT F/AUTO
## 9  2023-01-18 22:53:14  2023-01-19 22:53:14 ASSAULT W/DANGEROUS WEAPON
## 10 2023-01-19 07:13:07  2023-01-19 07:13:07                THEFT/OTHER
##    LONGITUDE LATITUDE                   geometry
## 1  -77.01060 38.91334 POINT (-77.01061 38.91335)
## 2  -76.98137 38.90320 POINT (-76.98137 38.90321)
## 3  -76.92429 38.90387 POINT (-76.92429 38.90388)
## 4  -77.00256 38.90565 POINT (-77.00256 38.90565)
## 5  -77.01893 38.91454 POINT (-77.01894 38.91454)
## 6  -77.03541 38.93124 POINT (-77.03541 38.93125)
## 7  -77.01086 38.90723 POINT (-77.01086 38.90724)
## 8  -76.99618 38.92696 POINT (-76.99619 38.92697)
## 9  -76.92923 38.89458 POINT (-76.92924 38.89459)
## 10 -77.03182 38.93059  POINT (-77.03182 38.9306)
#Removing variables from memory
rm(top10dates_Converted, top10dates_notConverted)

#Finding Sun Times

#Finding Sunrise and Sunset Times ----
#Using sunclac to calculate if a crime event occurred during the day or night.
#Start: 12 March 2023 
#End: 5 November 2023
dc_crime <- dc_crime %>%
  rowwise() %>%
  mutate(
    sunrise = list(if_else(
      REPORT_DATE >= as.Date("2023-03-12") & REPORT_DATE < as.Date("2023-11-05"), #Accounting for Daylight Savings
      getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunrise", tz = "UTC+4"),
      getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunrise", tz = "UTC+5")
    )),
    sunset = list(if_else(
      REPORT_DATE >= as.Date("2023-03-12") & REPORT_DATE < as.Date("2023-11-05"), #Accounting for Daylight Savings
      getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunset", tz = "UTC+4"),
      getSunlightTimes(date = REPORT_DATE, lon = LONGITUDE, lat = LATITUDE, keep = "sunset", tz = "UTC+5")
    ))
  ) %>%
  ungroup()

# Un-nesting the nested df with the time data. Selects a specific column within the nested data.
dc_crime <- dc_crime %>%
  mutate(
    sunrise = map(sunrise, ~ select(.x, sunrise)),
    sunset = map(sunset, ~ select(.x, sunset))
  ) %>%
  unnest(cols = c(sunrise, sunset))


#Creating a new column and populating it depending on when the crime took place
dc_crime$time_of_day <- ifelse(dc_crime$REPORT_TIMESTAMP >= dc_crime$sunrise & dc_crime$REPORT_TIMESTAMP <= dc_crime$sunset, 'Day', 'Night')




#Creating a chart of the number of crimes that took place at night vs during the day. ----
#Creating a variable that keeps the count of Day vs Night values
crime_count <- table(dc_crime$time_of_day)

#Making that a df
crime_count_df <- as.data.frame(crime_count)

# Calculate percentage for each time of day
crime_count_df$percentage <- crime_count_df$Freq / sum(crime_count_df$Freq) * 100

# Create the pie chart of number of crimes by time of day
crime_count_pie <- plot_ly(crime_count_df, labels = ~Var1, values = ~Freq, type = 'pie',
               textinfo = 'label+percent+value',
               insidetextorientation = 'radial')

#Adding a title
crime_count_pie <- crime_count_pie %>% layout(title = list(text = "Crime by Time of Day", y = .98))

# Display the plot
crime_count_pie
#Removing items from memory
rm(crime_count_pie, crime_count, crime_count_df)

#Break down of day vs night

#Breaking out the types of crime by when they occurred ----
offense_summary <- dc_crime %>%
  group_by(OFFENSE, time_of_day) %>%
  summarize(count = n(), .groups = "drop")

#Creating a bar chart of the summarized data
offense_plot <- ggplot(offense_summary, aes(x = OFFENSE, y = count, fill = time_of_day)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  geom_text(aes(label=count), vjust=-0.3, size=3.5, position = position_dodge(0.9)) +
  labs(title = "Offense Count by Time of Day",
       x = element_blank(),
       y = "Count",
       fill = "Time of Day",
       caption = "NOTE: All Homicides have a timestamp of 00:00:00") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  scale_fill_manual(values = c("Day" = "lightblue", "Night" = "black"))

offense_plot

#Removing items from memory
rm(offense_summary, offense_plot)

#Mapping the crimes day vs night - Interactive

#Mapping the crimes that occured at night vs the ones that occured during the day. ----
#Making new layers for day and night
dc_crime_day <- subset(dc_crime, time_of_day == "Day")
dc_crime_night <- subset(dc_crime, time_of_day == "Night")

#Mapping the crime data by time
crime_TOD_map <- leaflet() %>%
  addProviderTiles(providers$CartoDB) %>%
  #Day Crimes
  addCircleMarkers(data = dc_crime_day,
                   lng = ~LONGITUDE, lat = ~LATITUDE,
                   color = "blue",
                   label = ~as.character(OFFENSE),
                   opacity = 1, fillOpacity = 1,
                   radius = 1,
                   group = "Day Crimes") %>%
  #Night Crimes
  addCircleMarkers(data = dc_crime_night,
                   lng = ~LONGITUDE, lat = ~LATITUDE,
                   color = "grey",
                   label = ~as.character(OFFENSE),
                   opacity = .35, fillOpacity = .25,
                   radius = 1,
                   group = "Night Crimes") %>%
  addLayersControl(
    overlayGroups = c("Day Crimes", "Night Crimes"),
    options = layersControlOptions(collapsed = FALSE)
  )

crime_TOD_map
#Removing items from memory
rm(crime_TOD_map)

#Hexagons Day vs Night

#Hexbins by time of day ----
#Making a new Crimes Hexagon Layers
#By copying and removing columns it removes the need to recreate the hexagons
dayCrime_Hexagons <- subset(dc_hexagons, select = c("hex_id", "geometry"))
nightCrime_Hexagons <- subset(dc_hexagons, select = c("hex_id", "geometry"))

#Summarizing by day and night
dayCrime_Hexagons$incidents = lengths(st_intersects(dayCrime_Hexagons, dc_crime_day))
nightCrime_Hexagons$incidents = lengths(st_intersects(nightCrime_Hexagons, dc_crime_night))

#Removing rows with 0s
dayCrime_Hexagons <- dayCrime_Hexagons[dayCrime_Hexagons$incidents != 0, ]
nightCrime_Hexagons <- nightCrime_Hexagons[nightCrime_Hexagons$incidents != 0, ]


#Creating a ggplot that displays the center of activity for both day and night crimes.
#Day
#Finding the top 10 hexagons
topHexagons <- dayCrime_Hexagons %>%
  top_n(10, incidents)

# Calculate Jenks natural breaks
breaks <- classIntervals(dayCrime_Hexagons$incidents, n = 5, style = "jenks")

# Discretize the variable based on Jenks breaks
dayCrime_Hexagons$incidents_group <- cut(dayCrime_Hexagons$incidents, breaks$brks)

# Plot with styled fill
day_hexSummary <- ggplot(data = dayCrime_Hexagons, aes(fill = incidents_group)) +
  geom_sf() +
  #DC Boundary
  geom_sf(data = dc_boundary, color = "black", fill = NA, size = 10) +
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "waterbodies"), color = NA, fill = "lightblue",alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "parks"), color = NA,fill = "darkgreen", alpha = 0.5) +
  scale_fill_brewer(palette = "Reds", name = "Incidents") +  # Use a suitable color palette
  geom_sf(data = topHexagons, fill = NA, color="black", lwd = 0.6)+
  theme_void() +
  labs(title = "Day Crimes by Hexagon",
       caption = "Calling out the top 10 hexagons")

#Night
#Finding the top 10 hexagons
topHexagons <- nightCrime_Hexagons %>%
  top_n(10, incidents)

# Calculate Jenks natural breaks
breaks <- classIntervals(nightCrime_Hexagons$incidents, n = 5, style = "jenks")

# Discretize the variable based on Jenks breaks
nightCrime_Hexagons$incidents_group <- cut(nightCrime_Hexagons$incidents, breaks$brks)

# Plot with styled fill
night_hexSummary <- ggplot(data = nightCrime_Hexagons, aes(fill = incidents_group)) +
  geom_sf() +
  #DC Boundary
  geom_sf(data = dc_boundary, color = "black", fill = NA, size = 10) +
  # DC Waterbodies
  geom_sf(data = dc_water, aes(fill = "waterbodies"), color = NA, fill = "lightblue",alpha = 0.5) +
  # DC Parks
  geom_sf(data = dc_parks, aes(fill = "parks"), color = NA,fill = "darkgreen", alpha = 0.5) +
  scale_fill_brewer(palette = "Reds", name = "Incidents") +  # Use a suitable color palette
  geom_sf(data = topHexagons, fill = NA, color="black", lwd = 0.6)+
  theme_void() +
  labs(title = "Night Crimes by Hexagon",
       caption = "Calling out the top 10 hexagons")

# Create the 2x1 grid of plots
grid.arrange(day_hexSummary, night_hexSummary, ncol = 2, nrow = 1)

#Remove items from memory
rm(breaks, day_hexSummary, dayCrime_Hexagons, dc_crime_day, dc_crime_night, nightCrime_Hexagons, night_hexSummary, topHexagons)