Spatial Analysis of Car Crashes in Washington, D.C., 2009-2022

Abstract

Car crashes have become increasingly common and problematic throughout the United States, particularly for cyclists and pedestrians. This is particularly true in Washington, D.C., where car crashes reached an all-time high in 2022. While spatial analysis of car crashes has been conducted elsewhere, there is a need for further research on where car crashes occur within Washington, D.C. The main objective of this paper is to identify hotspots of crashes in the District of Columbia between 2009 and 2023. Other objectives are to identify where crashes resulting in pedestrian injury (pedestrian-involved crashes) and cyclist injury (cyclist-involved crashes) take place within the District of Columbia. Crash data maintained by the District Department of Transportation are used, in addition to TIGER shapefiles used for geocoding and visualization of data. Data preparation and analysis were performed using packages in R. Spatial analysis was performed in R using three types of spatial methods. These methods include kernel density estimation, local Moran’s I, and Getis-Ord Gi*. The findings of this paper suggest that car crashes are not randomly distributed, but rather are heavily concentrated in areas such as downtown D.C. and the areas surrounding the National Mall. Pedestrian-involved car crashes follow a similar pattern, but cyclist-involved car crashes have a different distribution that appears related to cyclist infrastructure.

Introduction

This term paper studied the distribution of car crashes within Washington, D.C. Pedestrian and cyclist-involved car crashes have increased in recent years in the United States (Bennett & Norris, 2023; Governors Highway Safety Association, 2021). In 2022, car crashes reached an all-time high in Washington, D.C. (Lazo, Jayaraman, & Moriarty, 2022). Car crashes have become increasingly problematic for pedestrians and cyclists, and car crash fatalities increased throughout the COVID-19 pandemic (Bendix, 2022). As of 2021, pedestrian fatalities resulting from car collisions were up 80 percent from their all-time low in 2009 (Insurance Institute for Highway Safety, n.d.).

Spatial factors have been studied in association with car crashes and have been found to be associated with car crashes (Dai, 2012; L. Li, Zhu, & Sui, 2007; Saha, Alluri, Gan, & Wu, 2018). Identifying relevant factors associated with car crashes and car crash hotspots in Washington, D.C. could inform local planning and crash prevention issues, such as Vision Zero (D.C. Department of Transportation, 2023). Identifying problematic or dangerous areas could also help benefit infrastructure improvement advocacy efforts by community groups, local civic organizations, and the general public.

The objective of this study is to identify the areas of Washington, D.C. where car crashes tend to occur. The project is attempting to answer the question “where do car crashes occur within Washington, D.C.?” Additional analysis relating specifically to both pedestrian-involved crashes and cyclist-involved crashes has also been performed. This project also attempts to answer the questions “where do pedestrian-involved crashes occur” and “where do cyclist-involved crashes occur” using the same methods.

Data

Crash Dataset

This paper uses a crash dataset that contains crashes within Washington D.C. up until May 2023 (maintained by the District Department of Transportation and available from D.C. Open Data: https://opendata.dc.gov/datasets/DCGIS::crashes-in-dc/explore). This dataset is a set of points within Washington, D.C. that represent crash locations. There are 288,171 observations in the original uncleaned data set, with 288,162 including coordinate data.

I included only data starting from 2009 in my analysis due to a lack of completeness in prior years – only a limited number of crashes are included in previous years. In addition, data between 2009 and 2022 is used because these years have a complete set of car crash data. Observations without coordinate data were removed, in addition to two observations outside of Washington, D.C. Duplicate incidents were also removed in the preparation for assigning observations to census tracts.

There is a “locationerror” column that flags crashes that might have incorrect coordinates, for example, being too far from the centerline of a road. Some of these error types could be indicative of a problem with a data point. After manually checking these errors by visualizing all points with a flag in the locationerror column, none of the points appeared obviously illegitimate or implausible. As a result, no additional observations were removed from the data set.

Each observation is identified by a unique ID (ccn). The crash dataset includes the date and time that the crash occurred, the date that a report was made, the longitude and latitude of the crash, the address, and the ward of the crash. Washington, D.C. is divided into four quadrants (Northeast, Northwest, Southeast, and Southwest) and eight wards (numbered one through five). Quadrants are used for addresses and were included in the dataset in the address field. Other details included in the dataset include street segment IDs, intersections, route IDs, distance from the intersection, and D.C.-specific markers, such as the MAR address and MAR ID.

One limitation of the crash dataset and crash details dataset is that they are missing geographic identifiers for census tracts – the dataset includes only longitude, latitude, and the eight wards of D.C. In order to add census tract data, the sf package was used to perform an intersection between the crash dataset and the census tract shapefile. I created subsets of the crash dataset for pedestrian-involved crashes and cyclist-involved crashes in order to identify hotspots for these crashes. I added derived variables, such as speeding-involved crashes, injury-involved crashes, and other useful variables as needed for analysis.

There is a variable in the main dataset that indicates that a crash involves speeding. There are details including injuries and fatalities within the crash dataset, in addition to a crash details table (available from D.C. Open Data: https://opendata.dc.gov/datasets/DCGIS::crash-details-table/explore) that can be joined to the car crash data table. The primary crash dataset contains details such as pedestrian and cyclist injuries and fatalities (which indicate pedestrian or cyclist involvement). The additional information available in both of these datasets is incomplete, however.

Additional Data

2021 TIGER/Line census tract shapefiles (U.S. Census Bureau, 2023a) and roads shapefiles (U.S. Census Bureau, 2023b) were acquired from the U.S. Census Bureau using the tigris package in R (Walker, 2023). These shapefiles were used for visualization and analysis of car crashes. In addition, quadrant (City of Washington, DC, 2022) and ward (City of Washington, DC, 2023) shapefiles from D.C. Open Data were used to visualize the study area in Figure 1. Area water shapefiles (U.S. Census Bureau, 2021) were used for visualization purposes.

The study area is visualized in Figure 1, with the quadrants, wards, and census tracts within Washington, D.C. visualized to provide context.

library(tidyverse)
library(sf)
library(tigris)
library(here)
library(gridExtra)

# Theme edits-----
theme_map <- function(...) {
  theme_minimal() +
    theme(
      text = element_text(family = "Lato", size = 11, color = "#22211d"),
      axis.line = element_blank(),
      axis.text.x = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      plot.title = element_text(size=13, face = "bold"),
      plot.subtitle = element_text(size=11),
      # panel.grid.minor = element_line(color = "#ebebe5", size = 0.2),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      plot.background = element_rect(fill = "#f5f5f2", color = NA), 
      panel.background = element_rect(fill = "#f5f5f2", color = NA), 
      legend.background = element_rect(fill = "#f5f5f2", color = NA),
      panel.border = element_blank(),
      plot.margin = margin(.25, .25, .25, .25, "cm"),
      # caption
      plot.caption = element_text(hjust=0, colour="grey50",
                                  margin = margin(t = 5, r=0, b = 5, l =0)),
      ...
    )
}

library(tigris)
library(sf)
library(tidyverse)

# Read in files-----
# Census tracts already loaded
dc_quadrants <- st_read("shapefiles/DC_Quadrants/DC_Quadrants.shp", quiet = TRUE)
dc_wards <- st_read("shapefiles/Wards_from_2022/Wards_from_2022.shp", quiet = TRUE)
# Load DC census tracts----
dc_census_tracts <- st_read("data/dc_census_tracts.shp", quiet = TRUE)
# Get water using tigris-----
dc_water <- area_water("DC", county = "001", progress_bar = FALSE)

# Map DC quadrant data-----
dc_quadrant_map <- dc_quadrants %>%
  ggplot() +
  geom_sf(fill = "white") +
  geom_sf_text(aes(label = QUADRANT)) +
  labs(
    title = "Quadrants in Washington D.C.",
    subtitle = ""
  ) +
  theme_map()

# Map DC Ward data-----
dc_ward_map <- dc_wards %>%
  ggplot() +
  geom_sf(fill = "white") +
  geom_sf_text(aes(label = WARD)) +
  labs(
    title = "Wards in Washington D.C.",
    subtitle = ""
  ) +
  theme_map()

# Map census tracts without data-----
dc_tract_map <- dc_census_tracts %>%
  ggplot() +
  geom_sf(fill = "white") +
  labs(
    title = "Census Tracts in Washington D.C.",
    subtitle = ""
  ) +
  theme_map()

# TODO add this into the rmarkdown
library(tigris)
library(sf)
library(tidyverse)

# Read in files-----
# Census tracts already loaded
dc_bike_lanes <- st_read("shapefiles/Bicycle_Lanes/Bicycle_Lanes.shp", quiet = TRUE)
dc_outline <- dc_quadrants %>% st_union()

# Map DC Bike lanes----
dc_bike_lane_map <- dc_bike_lanes %>%
  ggplot() +
  geom_sf(data = dc_outline) +
  geom_sf(data = dc_water, fill = "skyblue") +
  geom_sf(color = "black") +
  labs(
    title = "Bike Lanes in Washington D.C.",
    subtitle = ""
  ) +
  theme_map()

Figure 1

Quadrants, wards, and census tracts in the study area of Washington, D.C.

cowplot::plot_grid(dc_quadrant_map, dc_ward_map, dc_tract_map, nrow = 1)

Figure 2

An interactive leaflet map displaying the primary roads and overall road network in Washington, D.C.

library(leaflet)
library(tigris)
library(tidyverse)

# Load DC Roads from tigris-----
dc_roads <- roads("DC", "001", year = 2021, progress_bar = FALSE)

# Only main roads-----
rttyp_id <- c("I", "S", "U")

dc_main_roads <- dc_roads %>% filter(RTTYP %in% rttyp_id)

leaflet_road_map <- leaflet() %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Carto Dark Matter") %>% 
  addProviderTiles(providers$CartoDB.Positron, group = "Carto") %>% 
  addPolylines(data = dc_main_roads,
               weight = 1,
               color = "darkgreen",
               group = "Main Roads") %>%
  addPolylines(data = dc_roads,
               weight = .2,
               group = "All Roads") %>%
  setMaxBounds(-77.119759, 38.99511, -76.909395, 38.791645) %>%
  addLayersControl(
    baseGroups = c("Carto", "Carto Dark Matter"),
    overlayGroups = c("Main Roads",
                      "All Roads"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = FALSE)
  )

leaflet_road_map

Methodology

In order to prepare the data, I used packages including the janitor package and various tidyverse (Wickham et al., 2019) packages, including dplyr (Wickham, François, Henry, Müller, & Vaughan, 2023), readr (Wickham, Hester, & Bryan, 2023), and lubridate (Spinu, Grolemund, & Wickham, 2023). Using these packages, I added derived variables such as the year. I filtered out data points that should be excluded; these points include observations without coordinate data, data points outside of the District of Columbia, and any duplicate or erroneous observations identified in the dataset. The dataset was converted to a spatial format using the sf package (Pebesma, 2022) in order to perform analysis tasks. Census tracts were downloaded using the tigris package (Walker, 2023). Using the sf package (Pebesma, 2022), census tracts were added to each car crash observation in order to perform further analysis.

library(tidyverse)
library(janitor)
library(lubridate)


# Read in DC Car Crash data from DC Open Data
crashes <- read_csv("data/Crashes_in_DC.csv")
crashes <- crashes %>% clean_names()

# Remove data that is missing or outside of D.C.
crashes <- crashes %>% filter(!is.na(latitude) & !is.na(longitude))

# Filter dataset to remove observations outside of D.C.
crashes <- crashes %>% 
  filter(between(longitude,-77.119759,-76.909395),
         between(latitude,38.791645,38.99511))

# Add variables
# Add speeding variable derived from speeding_involved (var that sums speeding vehicles)
crashes <- crashes %>% 
  mutate(speeding = if_else(speeding_involved > 0, "Yes", "No"))


# Add variables for Census geocoding
crashes <- crashes %>% mutate(city = "Washington", state = "DC")

# Parse year, month, day, hour, DOW, week
crashes <- crashes  %>%
  mutate(year = as.integer(year(fromdate)),
         month = as.integer(month(fromdate)),
         day = day(fromdate),
         dow = wday(fromdate),
         week = week(fromdate))

# Add month name and month abbr
crashes <- crashes %>% 
  mutate(month_abbr = month(fromdate, label = TRUE),
         month_name = month(fromdate, label = TRUE, abbr = FALSE))

crashes <- crashes %>% mutate(date_abbr = paste(month, day, year, sep = "/"))
crashes <- crashes %>% mutate(date = paste0(month_name," " ,day,", " ,year))

# Add injuries
crashes <- crashes %>% 
  mutate(total_injuries = across(contains("injuries")) %>% rowSums)

# Add injury categories
crashes <- crashes %>% 
  mutate(injury_cat = case_when(
    total_injuries == 0 ~ "None",
    total_injuries == 1 ~ "One",
    total_injuries == 2 ~ "Two",
    total_injuries >= 3 ~ "Three or More"
  ))

# Add pedestrian injuries
crashes <- crashes %>% 
  mutate(pedestrian_injuries = across(contains("injuries_pedestrian")) %>% rowSums)

# Add cyclist injuries
crashes <- crashes %>% 
  mutate(cyclist_injuries = across(contains("injuries_bicyclist")) %>% rowSums)

# Add driver injuries
crashes <- crashes %>% 
  mutate(driver_injuries = across(contains("injuries_driver")) %>% rowSums)

# Add passenger injuries
crashes <- crashes %>% 
  mutate(passenger_injuries = across(contains("injuries_passenger")) %>% rowSums)


# Add active transit injuries
crashes <- crashes %>% 
  mutate(active_transit_injuries = pedestrian_injuries + cyclist_injuries) 

# Driver or passenger injuries
crashes <- crashes %>% 
  mutate(driver_or_pass_injuries = driver_injuries + passenger_injuries) 

# Create derived variables for each form of transit and injuries
crashes <- crashes %>% 
  mutate(injured_pedestrian = if_else(pedestrian_injuries > 0, 1, 0),
         injured_cyclist = if_else(cyclist_injuries > 0, 1, 0),
         injured_pass_or_driver = if_else(driver_injuries > 0 | passenger_injuries >0, 1, 0),
         injured_active_transit = if_else(pedestrian_injuries > 0 | cyclist_injuries >0, 1, 0))

active_transit_inj_involved_crashes <- crashes %>% filter(active_transit_injuries > 0)
driver_or_pass_inj_involved_crashes <- crashes %>% filter(driver_or_pass_injuries > 0)


# Create subset of speed-related crashes
crashes_speed <- crashes %>% filter(speeding == "Yes")
crashes_speed <- crashes_speed %>% 
  mutate(density = 0) %>%
  as.data.frame()

# crashes %>% write_csv("crashes_cleaned.csv")

# Clean geocode
library(tidyverse)
library(janitor)
library(sf)

# Load car crash dataset----
crashes <- read_csv("data/crashes_geocoded.csv")
crashes <- crashes %>% 
  filter(year > 2008 & year < 2023) %>%
  select(ccn, longitude, latitude) %>% distinct()

# Load census tracts-----
dc_tracts <- st_read("data/dc_census_tracts.shp")

# Crashes as sf object
crashes_sf <- crashes %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) 

# transform to match the crs of dc_tracts
crashes_sf <- crashes_sf %>% st_transform(., st_crs(dc_tracts))

# Perform intersection
# Note: this will take many hours to complete if using 2009-2022 data
crash_intersect <- st_intersection(dc_tracts, crashes_sf)

crash_tracts_sf <- crash_intersect %>%
  group_by(GEOID) %>%
  count() 

# Write crashes by tract to csv
write_csv(crash_tracts_sf, "data/crashes_by_censustract.csv")

# drop geometry for export
crashes_geocoded <- crash_intersect %>% st_drop_geometry()

# export geocoded crashes csv file
write_csv(crashes_geocoded, "data/crashes_geocoded.csv")

While there are likely differences over time, upon viewing the data at three different points in time, it is not clear that a spatiotemporal analysis is necessary for this particular paper. In 2009, crashes primarily took place in the central parts of Washington, D.C. and in Northwest D.C. Arterial roads appear to have more crashes than other roads. In 2015, crashes primarily took place in the central parts of Washington, D.C., with a high concentration of crashes taking place downtown and to the North of downtown D.C. There also appear to have been many crashes along highways and arterial roads. In 2022, crashes appeared to have similar patterns as in 2015. These patterns are displayed in Figure 2.

Figure 3

Car crashes in Washington, D.C. in 2009, 2015, and 2022 respectively. While the intensity of car crashes varies, the patterns are similar in all three years.

Kernel Density Estimation

Kernel density estimation is one method that is used to examine car crashes in this paper. Kernel density estimation provides an estimate of density and can identify hot spots outside of a set zone (e.g. a census tract) and provide an estimate over a continuous area (Yin, 2020). Different types of kernel density estimation have been used in the context of car crashes (Anderson, 2009; Bassani, Rossetti, & Catani, 2020). For kernel density estimation, the MASS (Modern Applied Statistics with S) package includes a two-dimensional kernel density estimation function (Ripley, 2022). I used the two-dimensional kernel density estimate described by Venables and Ripley (2002). The formula is as follows, with the kernel density estimate represented as a fraction of the sum of observations at one particular location divided by the sum of all observations (Venables & Ripley, 2002):

# Kernel Density Estimation Script
# Load first
library(tidyverse)
library(tigris)
library(ggspatial)
options(tigris_use_cache = TRUE)

# Load modified crashes file----
crashes_kde_tbl <- read_csv("data/crashes_cleaned.csv")

# filter years
crashes_kde_tbl <- crashes_kde_tbl %>% filter(year > 2008 & year < 2023)
crashes_kde_tbl <- crashes_kde_tbl %>% as.data.frame()

# Download shapefiles for maps-----
# Download geographies----
dc <- counties("11", year = 2021) # Tigris: DC----


# Colors-----


# Create a subset of pedestrian-involved crashes-----
crashes_kde_tbl_pedes <- crashes_kde_tbl %>% 
  filter(injured_pedestrian == 1) %>% # filter for injured pedestrians
  as.data.frame() # convert to DF for MASS package

# Create a subset of cyclist-involved crashes-----
crashes_kde_tbl_cycl <- crashes_kde_tbl %>% 
  filter(injured_cyclist == 1) %>% # filter for injure cyclists
  as.data.frame() # convert to DF for MASS package

# Kernel density estimation-----
# Overall crashes
crashes_kde_tbl$density <- fields::interp.surface(
  MASS::kde2d(crashes_kde_tbl$longitude, crashes_kde_tbl$latitude), 
  crashes_kde_tbl[,c("longitude", "latitude")])

# Pedestrian-involved
crashes_kde_tbl_pedes$density <- fields::interp.surface(
  MASS::kde2d(crashes_kde_tbl_pedes$longitude, crashes_kde_tbl_pedes$latitude), 
  crashes_kde_tbl_pedes[,c("longitude", "latitude")])

# Cyclist-involved
crashes_kde_tbl_cycl$density <- fields::interp.surface(
  MASS::kde2d(crashes_kde_tbl_cycl$longitude, crashes_kde_tbl_cycl$latitude), 
  crashes_kde_tbl_cycl[,c("longitude", "latitude")])

# Create visualizations----
# Overall crashes
crashes_overall_kde <- ggplot() +
  ggspatial::annotation_map_tile(zoom = 12, type = "cartolight") +
  geom_point(data = crashes_kde_tbl %>% dplyr::arrange(density), 
             aes(x = longitude, y = latitude, color = density), 
             size = 1 , alpha = .05, shape = 16) +
  coord_sf(crs = st_crs(dc), datum = NA) + 
  labs(title = 'Car Crashes in Washington, D.C., 2009-2022',
       color = 'Crash Density',
       caption = "© OpenStreetMap contributors © CARTO") +
  scale_color_gradient(low = "#0091ff", high = "#f0650e") +
  guides(fill = guide_colourbar(nbin=300, raster=FALSE, ticks = FALSE),
         color = guide_colourbar(nbin=300, raster=FALSE, ticks = FALSE)) +
  theme_map() 

#KDE_Overall_with_Attribution

# Pedestrian-involved
pedes_involved_kde <- ggplot() +
  ggspatial::annotation_map_tile(zoom = 12, type = "cartolight") +
  geom_point(data = crashes_kde_tbl_pedes %>% dplyr::arrange(density), 
             aes(x = longitude, y = latitude, color = density), 
             size = 2 , alpha = .1, shape = 16) +
  coord_sf(crs = st_crs(dc), datum = NA) + 
  labs(title = 'Pedestrian-Involved Car Crashes in Washington, D.C., 2009-2022',
       color = 'Crash Density',
       caption = "© OpenStreetMap contributors © CARTO") +
  scale_color_gradient(low = "#0091ff", high = "#f0650e") +
  guides(fill = guide_colourbar(nbin=300, raster=FALSE, ticks = FALSE),
         color = guide_colourbar(nbin=300, raster=FALSE, ticks = FALSE)) +
  theme_map() 

# Cyclist-involved
cyclist_involved_kde <- ggplot(crashes_kde_tbl_cycl) +
  ggspatial::annotation_map_tile(zoom = 12, type = "cartolight") +
  geom_point(data = crashes_kde_tbl_cycl %>% dplyr::arrange(density), 
             aes(x = longitude, y = latitude, color = density), 
             size = 2 , alpha = .1, shape = 16) +
  coord_sf(crs = st_crs(dc), datum = NA) + 
  labs(title = str_c('Cyclist-Involved Car Crashes in Washington, D.C., 2009-2022'),
       color = 'Crash Density',
       caption = "© OpenStreetMap contributors © CARTO") +
  scale_color_gradient(low = "#0091ff", high = "#f0650e") +
  guides(fill = guide_colourbar(nbin=300, raster=FALSE, ticks = FALSE),
         color = guide_colourbar(nbin=300, raster=FALSE, ticks = FALSE)) +
  theme_map() 

# TODO  - Should I visualize using interactive chart?

In addition to kernel density estimation, local indicators of spatial association were used to identify car crash clusters. This analysis has been performed at the census tract level.

Local Moran’s I

Local Moran’s I statistics were calculated using the rgeoda package (X. Li & Anselin, 2023). Local Moran’s I identifies clusters by measuring the similarity (or difference) of an observation (e.g. census tract) relative to its neighbors. Local Moran’s I is a statistic developed by Luc Anselin (Anselin, 1995) and is represented by the following formula (Anselin, 2020a):

This formula represents the denominator of the Local Moran’s I equation multiplied by the weighted lag, or the weighted sum of the nearest neighbors (Anselin, 2020a).

# Local Moran's I Script-----
# Load packages-----
library(tidyverse)
library(tidycensus)
library(rgeoda)
library(sf)
library(cowplot)

# Load geocoded data-----
crashes_gc <- read_csv("data/crash_tracts_overall.csv")
crashes_gc$GEOID <- as.character(crashes_gc$GEOID)

crashes_gc_cycl <- read_csv("data/cyclist_crashes_gc_count.csv")
crashes_gc_cycl$GEOID <- as.character(crashes_gc_cycl$GEOID)

crashes_gc_pedes <- read_csv("data/pedestrian_crashes_gc_count.csv")
crashes_gc_pedes$GEOID <- as.character(crashes_gc_pedes$GEOID)

# Load DC census tracts----
dc_census_tracts <- st_read("data/dc_census_tracts.shp", quiet = TRUE)

# Join each type of crashes to census tracts-----
crashes_gc_join <- dc_census_tracts %>% left_join(., crashes_gc, by = "GEOID")
# cyclist crashes
crashes_gc_join_cycl <- dc_census_tracts %>% left_join(., crashes_gc_cycl, by = "GEOID")
# coalesce to 0 due to census tracts with zero crashes
crashes_gc_join_cycl <- crashes_gc_join_cycl %>% mutate(n = coalesce(n, 0))
# pedestrian crashes
crashes_gc_join_pedes <- dc_census_tracts %>% left_join(., crashes_gc_pedes, by = "GEOID")


# Prepare for local Moran's I-----
# Queen weights - overall crashes
queenw <- queen_weights(crashes_gc_join)

# LISA object - overall crashes
crashes_lisa <- local_moran(queenw, crashes_gc_join["n"], significance_cutoff = 0.05)

# Colors, labels, clusters - overall crashes
lisa_c_col <- lisa_colors(crashes_lisa)
lisa_c_lab <- lisa_labels(crashes_lisa)
lisa_c_clust <- lisa_clusters(crashes_lisa)

# Create data frame to use in ggplot2
lisa_crash_df <- data.frame(test_id = 1:206)
lisa_crash_df$lisa_vals <- crashes_lisa$lisa_vals
lisa_crash_df$c_vals <- crashes_lisa$c_vals
lisa_crash_df$p_vals <- crashes_lisa$p_vals
lisa_crash_df$nn_vals <- crashes_lisa$nn_vals

# Create data frame for ggplot2 mapping - can be used with all df
moran_labels <- data.frame(c_vals = 0:6)
moran_labels$cluster_label <- lisa_c_lab
moran_labels$cluster_color <- lisa_c_col

moran_crashes_label_join <- lisa_crash_df %>% 
  left_join(., moran_labels, by = "c_vals")

moran_crashes_labeled <- crashes_gc_join %>% bind_cols(., moran_crashes_label_join)

# Cyclist Moran's prep-----
# Queen weights - cyclist crashes
queenw_cycl <- queen_weights(crashes_gc_join_cycl)

# LISA object - cyclist crashes
crashes_lisa_cycl <- local_moran(queenw_cycl, crashes_gc_join_cycl["n"], significance_cutoff = 0.05)

# Colors, labels, clusters - cyclist crashes
lisa_cycl_col <- lisa_colors(crashes_lisa_cycl)
lisa_cycl_lab <- lisa_labels(crashes_lisa_cycl)
lisa_cycl_clust <- lisa_clusters(crashes_lisa_cycl)

# Create data frame for ggplot2 mapping
lisa_crash_cycl_df <- data.frame(test_id = 1:206)
lisa_crash_cycl_df$lisa_vals <- crashes_lisa_cycl$lisa_vals
lisa_crash_cycl_df$c_vals <- crashes_lisa_cycl$c_vals
lisa_crash_cycl_df$p_vals <- crashes_lisa_cycl$p_vals
lisa_crash_cycl_df$nn_vals <- crashes_lisa_cycl$nn_vals

# Pedestrian Moran's prep-----
# Queen weights - pedestrian crashes
queenw_pedes <- queen_weights(crashes_gc_join_pedes)

# LISA object - pedestrian crashes
crashes_lisa_pedes <- local_moran(queenw_pedes, crashes_gc_join_pedes["n"], significance_cutoff = 0.05)

# Colors, labels, clusters - pedestrian crashes
lisa_pedes_col <- lisa_colors(crashes_lisa_pedes)
lisa_pedes_lab <- lisa_labels(crashes_lisa_pedes)
lisa_pedes_clust <- lisa_clusters(crashes_lisa_pedes)

# Create data frame for ggplot2 mapping
lisa_crash_pedes_df <- data.frame(test_id = 1:206)
lisa_crash_pedes_df$lisa_vals <- crashes_lisa_pedes$lisa_vals
lisa_crash_pedes_df$c_vals <- crashes_lisa_pedes$c_vals
lisa_crash_pedes_df$p_vals <- crashes_lisa_pedes$p_vals
lisa_crash_pedes_df$nn_vals <- crashes_lisa_pedes$nn_vals

# Local Moran map preparation -----
# Join labels to Moran's I data frames----
lisa_crash_join <- lisa_crash_df %>%
  left_join(., moran_labels, by = "c_vals")

lisa_crash_cycl_join <- lisa_crash_cycl_df %>%
  left_join(., moran_labels, by = "c_vals")

lisa_crash_pedes_join <- lisa_crash_pedes_df %>%
  left_join(., moran_labels, by = "c_vals")

# Join to original SF data-----

lisa_crash_join_sf <- crashes_gc_join %>% bind_cols(., lisa_crash_join)

lisa_cycl_crash_join_sf <- crashes_gc_join_cycl %>% bind_cols(., lisa_crash_cycl_join)

lisa_pedes_crash_join_sf <- crashes_gc_join_pedes %>% bind_cols(., lisa_crash_pedes_join)


cluster_colors <- c("Not significant" = "#eeeeee", 
                    "High-High" = "#FF0000", 
                    "Low-Low" = "#0000FF", 
                    "Low-High" = "#a7adf9", 
                    "High-Low" = "#f4ada8", 
                    "Undefined" = "#464646", 
                    "Isolated" = "#999999")


# Cast cluster label as factor----
lisa_crash_join_sf <- lisa_crash_join_sf %>% 
  mutate(cluster_label_fct = as.factor(cluster_label),
         cluster_label_fct = fct_reorder(cluster_label_fct, c_vals))

lisa_cycl_crash_join_sf <- lisa_cycl_crash_join_sf %>% 
  mutate(cluster_label_fct = as.factor(cluster_label),
         cluster_label_fct = fct_reorder(cluster_label_fct, c_vals))

lisa_pedes_crash_join_sf <- lisa_pedes_crash_join_sf %>% 
  mutate(cluster_label_fct = as.factor(cluster_label),
         cluster_label_fct = fct_reorder(cluster_label_fct, c_vals))

# Create Moran's maps using ggplot2-----

# Local Moran I Cluster Map

moran_map_overall <- lisa_crash_join_sf %>%
  ggplot() +
  geom_sf(aes(fill = cluster_label)) +
  scale_fill_manual(name = "Clusters", values = cluster_colors,
                    limits = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low",
                               "Undefined", "Isolated"),
                    labels = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low",
                               "Undefined", "Isolated")) +
  labs(
    fill = "Local Moran Map",
    title = "Car Crashes in Washington D.C.",
    subtitle = "Moran's I Cluster Map"
  ) +
  theme_map()

# Cyclist-involved crash map

moran_map_cycl <- lisa_cycl_crash_join_sf %>%
  ggplot() +
  geom_sf(aes(fill = cluster_label)) +
  scale_fill_manual(name = "Clusters", values = cluster_colors,
                    limits = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low",
                               "Undefined", "Isolated"),
                    labels = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low",
                               "Undefined", "Isolated")) +
  labs(
    fill = "Local Moran Map",
    title = "Cyclist-Involved Car Crashes in Washington D.C.",
    subtitle = "Moran's I Cluster Map"
  ) +
  theme_map()

moran_map_pedes <- lisa_pedes_crash_join_sf %>%
  ggplot() +
  geom_sf(aes(fill = cluster_label)) +
  scale_fill_manual(name = "Clusters", values = cluster_colors,
                    limits = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low",
                               "Undefined", "Isolated"),
                    labels = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low",
                               "Undefined", "Isolated")) +
  labs(
    title = "Pedestrian-Involved Car Crashes in Washington D.C.",
    subtitle = "Moran's I Cluster Map"
  ) +
  theme_map()



# Significance maps-----

p_value_colors <- c("Not significant" = "#eeeeee", 
                    "p <= 0.05" = "#84f576", 
                    "p <= 0.01" = "#53c53c", 
                    "p <= 0.001" = "#348124")

# Classify the p-values 
lisa_crash_join_sf <- lisa_crash_join_sf %>% 
  mutate(p_value_class = case_when(
      p_vals <= 0.001 ~ "p <= 0.001",
      p_vals <= 0.01 ~ "p <= 0.01", 
      p_vals <= 0.05 ~ "p <= 0.05",
      TRUE ~ "Not significant"
    ),
    p_value_class = factor(
      p_value_class,
      levels = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001")
    )
  ) 

lisa_cycl_crash_join_sf <- lisa_cycl_crash_join_sf %>% 
  mutate(p_value_class = case_when(
    p_vals <= 0.001 ~ "p <= 0.001",
    p_vals <= 0.01 ~ "p <= 0.01", 
    p_vals <= 0.05 ~ "p <= 0.05",
    TRUE ~ "Not significant"
  ),
  p_value_class = factor(
    p_value_class,
    levels = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001")
  )
  ) 

lisa_pedes_crash_join_sf <- lisa_pedes_crash_join_sf %>% 
  mutate(p_value_class = case_when(
    p_vals <= 0.001 ~ "p <= 0.001",
    p_vals <= 0.01 ~ "p <= 0.01", 
    p_vals <= 0.05 ~ "p <= 0.05",
    TRUE ~ "Not significant"
  ),
  p_value_class = factor(
    p_value_class,
    levels = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001")
  )
  ) 

moran_sig_map_overall <- lisa_crash_join_sf %>%
  ggplot() +
  geom_sf(aes(fill = p_value_class)) +
  scale_fill_manual(name = "P-Value", values = p_value_colors,
                    limits = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001"),
                    labels = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001")) +
  labs(
    fill = "P-value",
    title = " ",
    subtitle = "Significance Map"
  ) +
  theme_map()

moran_sig_map_cycl <- lisa_cycl_crash_join_sf %>%
  ggplot() +
  geom_sf(aes(fill = p_value_class)) +
  scale_fill_manual(name = "P-Value", values = p_value_colors,
                    limits = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001"),
                    labels = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001")) +
  labs(
    fill = "P-value",
    title = " ",
    subtitle = "Significance Map"
  ) +
  theme_map()

moran_sig_map_pedes <- lisa_pedes_crash_join_sf %>%
  ggplot() +
  geom_sf(aes(fill = p_value_class)) +
  scale_fill_manual(name = "P-Value", values = p_value_colors,
                    limits = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001"),
                    labels = c("Not significant", "p <= 0.05", "p <= 0.01", "p <= 0.001")) +
  labs(
    fill = "P-value",
    title = " ",
    subtitle = "Significance Map"
  ) +
  theme_map()


# rgeoda::local_moran(queenw, lisa_pedes_crash_join_sf["n"])

# 
# lisa_crash_join_sf %>%
#   ggplot() +
#   geom_point(aes(x = p_vals, y = lisa_vals))


# cowplot::plot_grid(moran_map_pedes, moran_sig_map_pedes, ncol = 2)

# cowplot::plot_grid(moran_map_pedes, moran_sig_map_pedes)

Getis-Ord Gi*

Getis-Ord Gi* statistics were calculated using the sfdep package. Getis-Ord Gi* identifies statistically significant hot spots and cold spots by looking at an observation and comparing it to its neighbors (Getis & Ord, 1992). Getis-Ord Gi* is represented by the following formula (Anselin, 2020b):

library(sf)
library(sfdep)
library(tidyverse)


# Create neighbors and weights, calculate lag----
crashes_nbs <- crashes_gc_join %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(nb),
         crashes_lag = st_lag(n, nb, wt))

crashes_cycl_nbs <- crashes_gc_join_cycl %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(nb),
         crashes_lag = st_lag(n, nb, wt))

crashes_pedes_nbs <- crashes_gc_join_pedes %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(nb),
         crashes_lag = st_lag(n, nb, wt))

# Create data frame for overall crashes----
crashes_nbs_gi <- crashes_nbs %>%
  mutate(gi = sfdep::local_g_perm(n, nb, wt, nsim = 999)) %>%
  unnest(cols = c(geometry, nb, wt, gi))

# Select only gi and p_folded_sim
crash_hot_spots <- crashes_nbs_gi %>%
  select(gi, p_folded_sim)

# Create data frame for cyclist crashes-----
crashes_cycl_nbs_gi <- crashes_cycl_nbs %>%
  mutate(gi = sfdep::local_g_perm(n, nb, wt, nsim = 999)) %>%
  unnest(cols = c(geometry, nb, wt, gi))

# Select only gi and p_folded_sim
cycl_crash_hot_spots <- crashes_cycl_nbs_gi %>%
  select(gi, p_folded_sim)

# Create data frame for pedestrian crashes-----
crashes_pedes_nbs_gi <- crashes_pedes_nbs %>%
  mutate(gi = sfdep::local_g_perm(n, nb, wt, nsim = 999)) %>%
  unnest(cols = c(geometry, nb, wt, gi))

pedes_crash_hot_spots <- crashes_pedes_nbs_gi %>%
  select(gi, p_folded_sim)

# Assign hot spot classification using gi and p_folded_sim values----
# Create function for case_when classification
classify_gi_case_when <- function(data) {
  data %>%
    mutate(
      classification = case_when(
        gi > 0 & p_folded_sim <= 0.01 ~ "Very hot",
        gi > 0 & p_folded_sim <= 0.05 ~ "Hot",
        gi > 0 & p_folded_sim <= 0.1 ~ "Somewhat hot",
        gi < 0 & p_folded_sim <= 0.01 ~ "Very cold",
        gi < 0 & p_folded_sim <= 0.05 ~ "Cold",
        gi < 0 & p_folded_sim <= 0.1 ~ "Somewhat cold",
        TRUE ~ "Insignificant"
      ),
      classification = factor(
        classification,
        levels = c("Very hot", "Hot", "Somewhat hot",
                   "Insignificant",
                   "Somewhat cold", "Cold", "Very cold")
      )
    )
}

crash_hot_spots <- classify_gi_case_when(crash_hot_spots)

cycl_crash_hot_spots <- classify_gi_case_when(cycl_crash_hot_spots)

pedes_crash_hot_spots <- classify_gi_case_when(pedes_crash_hot_spots)

# Visualize hot spots----
# Function for plots
getis_ord_crash_plot_fxn <- function(data, title) {
  ggplot(data, aes(fill = classification)) +
    geom_sf(color = "grey50", lwd = 0.15) +
    scale_fill_brewer(type = "div", palette = 5) +
    labs(
      fill = "Hot Spot Classification",
      title = {{title}},
      subtitle = "2009-2022"
    ) +
    theme_map() 
}


# Plot overall crashes
plot_getis_ord_overall <- getis_ord_crash_plot_fxn(crash_hot_spots, 
                                                "Car Crash Hot Spots in Washington, D.C.")

# Plot pedestrian-involved crashes
plot_getis_ord_pedes <- getis_ord_crash_plot_fxn(pedes_crash_hot_spots, 
                                                 "Pedestrian-Involved Car Crash Hot Spots in Washington, D.C.")


# Plot cyclist-involved crashes
plot_getis_ord_cycl <- getis_ord_crash_plot_fxn(cycl_crash_hot_spots, 
                                                 "Cyclist-Involved Car Crash Hot Spots in Washington, D.C.")

Visualization

The ggplot2 package (Wickham, Chang, et al., 2023) was used to create maps displaying both Local Moran I and Getis-Ord hotspots. The ggspatial package (Dunnington, 2023) was used to supplement ggplot2 in order to provide a basemap for kernel density maps. The leaflet package was used for additional interactive maps.

Figure 4 provides an overview of all of the steps involved in data wrangling and analysis.

Figure 4

Car crash analysis flowchart. Data preparation was performed prior to performing the different types of analysis involved in the car crash analysis.

Results

The results of the analysis suggest that crashes are not randomly distributed throughout roadways within the District of Columbia. There are statistically significant hotspots and clusters of car crashes within the District of Columbia. The census tracts surrounding the downtown area and the National Mall contain many car crashes. Each type of analysis will be described in depth below.

Kernel Density Estimation

The results of the kernel density estimation analysis show that car crashes are most dense in downtown D.C. and in the National Mall area. Other areas that are fairly dense include highways, particularly in Northeast and Southeast D.C. Car crashes are least dense in areas in upper Northwest D.C.

Figure 5

A Map of car crashes in Washington, D.C. 2009 – 2022 with kernel density displayed. Crashes are least prevalent in upper Northwest D.C. and most prevalent around the National Mall, downtown D.C., and in areas Northeast of downtown.

# KDE Overall
crashes_overall_kde

Pedestrian-involved crashes follow a similar pattern to overall crashes. Pedestrian-involved are most dense in downtown D.C., but the densest areas also continue upwards into parts of Northeast and Northwest D.C. Crashes are dense around Washington, D.C. roadways, but are most dense in more central areas. Car crashes that resulted in pedestrian injury are clustered in mixed-use residential neighborhoods, in nightlife corridors, and in areas highly frequented by tourists.

Figure 6

Crashes involving pedestrian injuries with kernel density displayed in Washington, D.C. between 2009 and 2022. Crashes involving pedestrian injuries are less likely to take place on highways or freeways than overall crashes. Crashes involving pedestrian injuries do frequently take place on other major or arterial roads in D.C., however. These crashes are most common in Northwest and Northwest D.C. and in areas with nightlife and tourism.

# KDE Pedestrian
pedes_involved_kde

Cyclist-involved crash clusters are similar to pedestrian-involved crash clusters, but they do appear to follow a slightly different pattern. Cyclist-involved crashes are sparser and more likely to take place in areas with developed cyclist infrastructure – particularly bike lanes and cyclepaths. Car crashes involving injuries to cyclists are less common in Southeast D.C., particularly east of the Anacostia River. In Figure 6, bike lanes within Washington, D.C. are visualized; most of these lanes are in central parts of Washington, D.C., with limited infrastructure in upper Northwest D.C. and Southeast D.C.

Figure 7

A map of bike lanes within Washington, D.C. The majority of bike lanes within D.C. are in central parts of Washington, D.C. There are few bike lanes within upper Northwest D.C. and Southeast D.C., particularly east of the Anacostia River.

library(tigris)
library(sf)
library(tidyverse)

# Read in files-----
# Census tracts already loaded
dc_bike_lanes <- st_read("shapefiles/Bicycle_Lanes/Bicycle_Lanes.shp", quiet = TRUE)
dc_outline <- dc_quadrants %>% st_union()

# Map DC Bike lanes----
dc_bike_lane_map <- dc_bike_lanes %>%
  ggplot() +
  geom_sf(data = dc_outline) +
  geom_sf(data = dc_water, fill = "skyblue") +
  geom_sf(color = "black") +
  labs(
    title = "Bike Lanes in Washington D.C.",
    subtitle = ""
  ) +
  theme_map()

# KDE Pedestrian
dc_bike_lane_map

Pedestrian-involved crashes and cyclist-involved crashes appear to be more geographically concentrated than overall car crashes, which is intuitive given that pedestrians and cyclists are less likely to have access to or be present on highways. Cyclist-involved crashes appear to be more concentrated than overall or pedestrian-involved crashes, which may be a result of infrastructure for cyclists. Few crashes involving cyclist injuries have occurred in upper Northwest D.C. or in Southeast D.C., particularly East of the Anacostia River.

Figure 8

A map of crashes involving cyclist injuries within Washington, D.C. between 2009 and 2022 with kernel density displayed. The majority of crashes involving cyclist injuries took place in the downtown area and adjacent areas primarily to the North and Northeast. These areas are also more likely to contain bike lanes and cyclepaths.

# KDE Cyclist
cyclist_involved_kde

Local Moran’s I

While kernel density estimation does provide a good visual indication of the density of car crashes, Local Indicators of Spatial Association (LISA) methods provide additional information about hot spots and cold spots for car crashes within Washington, D.C.

Car crashes are particularly high in central parts of D.C. There are large clusters of crashes near downtown D.C. and near the National Mall, in addition to some smaller clusters in Northeast D.C. Car crashes are particularly low in upper Northwest D.C. This is likely in part because Rock Creek Park, a large park in Northwest D.C. which can be seen in can be seen in the North Central portion of Figures 5 through 7, has limited vehicular traffic running through it.

Figure 9

Car crash clusters within Washington, D.C. and a corresponding significance map. Car crashes are particularly high near downtown D.C. and near the National Mall, in addition to Northeast D.C. Car crashes are particularly low in upper Northwest D.C.

# Local Moran's I Overall
cowplot::plot_grid(moran_map_overall, moran_sig_map_overall, ncol = 2)

Pedestrian-involved car crash clusters within D.C. are similar to overall car crash clusters. While there are fewer clusters than in the overall car crash map, the patterns are similar – car crashes that injure pedestrians are less likely to occur in certain neighborhoods within Northeast D.C. which often lack pedestrian infrastructure and may have fewer pedestrians.

Figure 10

A map of pedestrian-involved car crash clusters within Washington, D.C. and a corresponding significance map. Car crashes are particularly high near downtown D.C. and near the National Mall, in addition to small portions of Northeast D.C. Car crashes are particularly low in Northwest D.C.

cowplot::plot_grid(moran_map_pedes, moran_sig_map_pedes, ncol = 2)

Cyclist-involved car crash clusters differ more from the overall car crash map than pedestrian-involved car crash clusters. Cyclist-involved car crashes are most common in downtown D.C. and near the National Mall, but these clusters also extend upward into areas that are popular for nightlife and which feature more cyclist infrastructure. In addition, Southeast D.C. contains many areas where cyclist-involved car crashes are less common.

Figure 11

A map of cyclist-involved car crash clusters within Washington, D.C. and a corresponding significance map. Car crashes are particularly high near downtown D.C., north of downtown D.C., and near the National Mall. Car crashes are particularly low in upper Northwest D.C.

# Local Moran's I Cyclists
cowplot::plot_grid(moran_map_cycl, moran_sig_map_cycl, ncol = 2)

Getis-Ord Gi*

The Getis-Ord Gi* analysis results are similar to the results of the Local Moran’s I analysis. The advantage of the Getis-Ord Gi* analysis is that it provides differentiation within the groups of cold spots and hot spots.

Car crashes are less likely to take place in Northwest D.C., particularly in Ward 3 of D.C. The pattern of hot spots identified using the Getis-Ord Gi* method is similar to the pattern of clusters identified by the kernel density estimation and Local Moran’s I methods.

Figure 12

A map of crash hot spots and cold spots within Washington, D.C. between 2009 and 2022, identified using the Getis-Ord Gi* method. Crash hot spots are in the most central parts of Washington, D.C. and in Northeast D.C. These areas include the downtown and National Mall areas. Cold spots include upper Northwest D.C. areas.

# Getis-Ord Gi* Overall
plot_getis_ord_overall

Pedestrian-involved car crash hot spots are mostly confined to downtown D.C. and the National Mall area, though there are some somewhat hot and hot spots within Northeast D.C. as well.

Figure 13

A map of pedestrian-involved car crash hot spots and cold spots within Washington, D.C. between 2009 and 2022 identified using the Getis-Ord Gi* method. Crash hot spots are most concentrated in the most central parts of Washington, D.C. These areas include Capitol Hill and upper Northwest D.C. neighborhoods.

# Getis-Ord Gi* Pedestrian
plot_getis_ord_pedes

Cyclist-involved car crash hot spots are most concentrated in the most central parts of Washington, D.C. These areas include the downtown and National Mall areas. Hot and somewhat hot spots extend upward north of downtown D.C. Cold spots primarily include upper Northwest D.C. neighborhoods and Southeast D.C. neighborhoods east of the Anacostia River.

Figure 14

A map of cyclist-involved car crash hot spots and cold spots within Washington, D.C. between 2009 and 2022 identified using the Getis-Ord Gi* method. Crash hot spots are most concentrated in the most central parts of Washington, D.C., including the downtown and National Mall areas. Cold spots include upper Northwest D.C. neighborhoods and Southeast D.C. neighborhoods east of the Anacostia River.

# Getis-Ord Gi* Cyclists
plot_getis_ord_cycl

Throughout all maps and all three methods, car crashes are common in the downtown D.C. and National Mall areas and less concentrated in Northwest D.C. This is likely due to high traffic volume in downtown D.C. and in the National Mall area. Pedestrian and cyclist related crashes were more concentrated than overall crashes, which is likely a reflection of the available infrastructure for pedestrians and cyclists rather than an indication of safety or lack thereof.

In Figure 15, both Local Moran’s I and Getis-Ord Gi* statistics are displayed using leaflet. The clusters identified are similar, but Getis-Ord Gi* does not have outliers. Additionally, Getis-Ord Gi* has more levels of classification than Local Moran’s I.

library(leaflet)
library(leaflet.extras)
library(tidyverse)


# Map CSS and html-----
map_css <- 
  ".leaflet-popup-content-wrapper {
    color:#2c3e50;
    background:#fff;
    font-size:14px;
    font-family: Lato, sans-serif;
    }
  "

# Convert CSS to HTML to be used by HTMLtools package
map_html <- htmltools::tags$style(type = "text/css", map_css)


# Palettes-----
pal_moran <- colorFactor(c("#eeeeee","#FF0000", "#0000FF", "#a7adf9", "#f4ada8"), 
                         domain = c("Not significant", "High-High", "Low-Low", "Low-High", "High-Low"),
                         ordered = T)



pal_getis <- colorFactor(palette = c("#B2172C","#F08A62", "#FDDBC7", "#F7F7F7", 
                                    "#D2E5F0", "#67A9CF", "#2167AC"), 
                         domain = c("Very hot", "Hot", "Somewhat hot", "Insignficant", 
                                    "Somewhat cold", "Cold", "Very cold"),
                         na.color = "transparent",
                         ordered = T)


# Maps-----
# Moran's Map------
moran_map_leaflet <- leaflet() %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Carto Dark Matter") %>% 
  addProviderTiles(providers$CartoDB.Positron, group = "Carto") %>% 
  addPolygons(
    data = lisa_crash_join_sf,
    fillOpacity = .8,
    weight = .5,
    fillColor = ~pal_moran(lisa_crash_join_sf$cluster_label_fct),
    popup = paste0("<b>Cluster Type:</b> ", lisa_crash_join_sf$cluster_label_fct),
    opacity = 1,
    color =  "#7a7a7a",
    group = "Local Moran's I - Overall") %>% 
  addPolygons(
    data = lisa_pedes_crash_join_sf,
    fillOpacity = .8,
    weight = .5,
    fillColor = ~pal_moran(lisa_pedes_crash_join_sf$cluster_label_fct),
    popup = paste0("<b>Cluster Type:</b> ", lisa_pedes_crash_join_sf$cluster_label_fct),
    opacity = 1,
    color =  "#7a7a7a",
    group = "Local Moran's I - Pedestrian Injuries")  %>%
  addPolygons(
    data = lisa_cycl_crash_join_sf,
    fillOpacity = .8,
    weight = .5,
    fillColor = ~pal_moran(lisa_cycl_crash_join_sf$cluster_label_fct),
    popup = paste0("<b>Cluster Type:</b> ", lisa_cycl_crash_join_sf$cluster_label_fct),
    opacity = 1,
    color =  "#7a7a7a",
    group = "Local Moran's I - Cyclist Injuries")  %>%
  addLegend(data = lisa_cycl_crash_join_sf, 
            pal = pal_moran, 
            title = "Hot Spot Classification",
            values = ~lisa_cycl_crash_join_sf$cluster_label_fct, 
            opacity = 1) %>%
  setMaxBounds(-77.119759, 38.99511, -76.909395, 38.791645) %>%
  addLayersControl(
    baseGroups = c("Carto", "Carto Dark Matter"),
    overlayGroups = c("Local Moran's I - Overall",
                      "Local Moran's I - Pedestrian Injuries",
                      "Local Moran's I - Cyclist Injuries"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = TRUE)
  ) %>%
  hideGroup(c("Local Moran's I - Pedestrian Injuries",
            "Local Moran's I - Cyclist Injuries"))

# Getis-Ord Map-----

getis_ord_leaflet_map <- leaflet() %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Carto Dark Matter") %>% 
  addProviderTiles(providers$CartoDB.Positron, group = "Carto") %>% 
  addPolygons(
    data = crash_hot_spots,
    fillOpacity = ifelse(crash_hot_spots$classification == "Insignificant", .4, .6),
    weight = .5,
    popup = paste0("<b>Hot spot classification:</b> ", crash_hot_spots$classification),
    fillColor = ~pal_getis(crash_hot_spots$classification),
    opacity = 1,
    color = "#7a7a7a",
    group = "Getis-Ord - Overall") %>%
  addPolygons(
    data = pedes_crash_hot_spots,
    fillOpacity = ifelse(crash_hot_spots$classification == "Insignificant", .4, .6),
    weight = .5,
    popup = paste0("<b>Hot spot classification:</b> ", pedes_crash_hot_spots$classification),
    fillColor = ~pal_getis(pedes_crash_hot_spots$classification),
    opacity = 1,
    color = "#7a7a7a",
    group = "Getis-Ord - Pedestrian Injuries") %>%
  addPolygons(
    data = cycl_crash_hot_spots,
    fillOpacity = ifelse(crash_hot_spots$classification == "Insignificant", .4, .6),
    weight = .5,
    popup = paste0("<b>Hot spot classification:</b> ", cycl_crash_hot_spots$classification),
    fillColor = ~pal_getis(cycl_crash_hot_spots$classification),
    opacity = 1,
    color = "#7a7a7a",
    group = "Getis-Ord - Cyclist Injuries") %>%
  addLegend(data = cycl_crash_hot_spots, 
            pal = pal_getis, 
            title = "Hot Spot Classification",
            values = ~cycl_crash_hot_spots$classification, 
            opacity = 1) %>%
  setMaxBounds(-77.119759, 38.99511, -76.909395, 38.791645) %>%
  addLayersControl(
    baseGroups = c("Carto", "Carto Dark Matter"),
    overlayGroups = c("Getis-Ord - Overall",
                      "Getis-Ord - Pedestrian Injuries",
                      "Getis-Ord - Cyclist Injuries"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = TRUE)
  ) %>%
  hideGroup(c("Getis-Ord - Pedestrian Injuries",
              "Getis-Ord - Cyclist Injuries"))

Figure 15

Two leaflet maps of Local Moran’s I (left) and Getis-Ord Gi* (right) are displayed and synced. The different layers can be toggled using the layers control in the bottom left corner to further a comparison of the two methods. The clusters identified are similar, but Getis-Ord Gi* does not have outliers. Getis-Ord Gi* has more levels of classification than Local Moran’s I.

# Leaflet map
leafsync::sync(moran_map_leaflet, getis_ord_leaflet_map)

Conclusion

The results of the analysis in this paper suggest that car crashes are not randomly distributed throughout Washington, D.C. Crashes that resulted in injuries to pedestrians and cyclists are distributed in a different pattern than overall crashes. The pattern for cyclists differs the most from the overall crash distribution. While the reason for the distribution is unknown, this could be a result of the distribution of bike lanes and cyclepaths within the city.

In general, car crashes are clustered in downtown D.C. and near the National Mall, and are less common in upper Northwest D.C. Infrastructure appears to affect the car crashes that involve injuries to pedestrians and cyclists, as it is likely that pedestrians and cyclists are less likely to frequent areas without sufficient infrastructure.

Future Work

In the future, it will be useful to analyze the relationship between car crashes and income of the surrounding areas to determine whether car crashes in Washington, D.C. are associated with low-income census tracts or block groups. Additionally, in the future, a smaller unit of analysis such as block groups might prove useful for analysis to identify more localized hot spots and cold spots.

Further analysis might also include spatiotemporal analysis. Although changes over time do not appear to be major, the few apparent minor changes over time could indicate that a spatiotemporal analysis would result in important new insights. This type of analysis can include a spatiotemporal analysis that uses the KDE+ method (Bíl, Andrášik, & Janoška, 2013). If possible, this analysis might include potential reasons that decreases in crashes may have occurred, such as the presence of stop signs as used in the analysis by Bíl, et al. (2019) or traffic calming measures.

Reflection

There are many different ideas that I wanted to explore with this project - in terms of the content, methods, and R packages. I wanted to explore and compare different methods with the dataset of choice in the space allotted and time available. This project was very challenging, but it also felt rewarding and engaging when I was able to work through and solve different methodological or programming problems. I hope to continue to explore this topic and type of data analysis in future courses.

In particular, I would have liked to include time as a factor in my analysis. While it is not clear the patterns have changed over time, I do believe that some form of spatiotemporal analysis would have been useful to explore.

References

Anderson, T. K. (2009). Kernel density estimation and K-means clustering to profile road accident hotspots. Accident Analysis & Prevention, 41(3), 359–364. https://doi.org/10.1016/j.aap.2008.12.014

Anselin, L. (1995). Local Indicators of Spatial Association—LISA. Geographical Analysis, 27(2), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x

Anselin, L. (2020a, June 25). Local Spatial Autocorrelation. Retrieved from https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html

Anselin, L. (2020b, October). Local Spatial Autocorrelation: Other Local Spatial Autocorrelation Statistics. GeoDa Center. Retrieved from http://geodacenter.github.io/workbook/6b_local_adv/lab6b.html#getis-ord-statistics

Bassani, M., Rossetti, L., & Catani, L. (2020). Spatial analysis of road crashes involving vulnerable road users in support of road safety management strategies. Transportation Research Procedia, 45, 394–401. https://doi.org/10.1016/j.trpro.2020.03.031

Bendix, A. (2022, August). Deadly car crashes hit a record high in early 2022. {Pandemic}-fueled risky driving may be to blame. NBC News. Retrieved from https://www.nbcnews.com/health/health-news/fatal-car-crash-increase-risky-driving-rcna43969

Bennett, G., & Norris, C. (2023). Pedestrian deaths in U.S. reach highest level in 40 years. Retrieved April 12, 2023, from PBS NewsHour website: https://www.pbs.org/newshour/show/pedestrian-deaths-in-u-s-reach-highest-level-in-40-years

Bíl, M., Andrášik, R., & Janoška, Z. (2013). Identification of hazardous road locations of traffic accidents by means of kernel density estimation and cluster significance evaluation. Accident Analysis and Prevention, 55, 265–273. https://doi.org/10.1016/j.aap.2013.03.003

Bíl, M., Andrášik, R., & Sedoník, J. (2019). A detailed spatiotemporal analysis of traffic crash hotspots. Applied Geography, 107(August 2018), 82–90. https://doi.org/10.1016/j.apgeog.2019.04.008

Cheng, J., Karambelkar, B., & Xie, Y. (2022). leaflet: Create Interactive Web Maps with the JavaScript Leaflet Library. Retrieved from https://rstudio.github.io/leaflet/

City of Washington DC. (2022, January). DC Quadrants. D.C. Open Data. Retrieved from https://opendata.dc.gov/datasets/dc-quadrants/explore

City of Washington DC. (2023, February). Wards from 2022. D.C. Open Data. Retrieved from https://opendata.dc.gov/datasets/wards-from-2022/explore

D.C. Department of Transportation. (2023, July 19). Vision Zero DC. Retrieved from https://visionzero.dc.gov/pages/overview

Dai, D. (2012). Identifying clusters and risk factors of injuries in pedestrian-vehicle crashes in a GIS environment. Journal of Transport Geography, 24, 206–214. https://doi.org/10.1016/j.jtrangeo.2012.02.005

Dunnington, D. (2023). ggspatial: Spatial Data Framework for ggplot2. Retrieved from https://cran.r-project.org/package=ggspatial

Getis, A., & Ord, J. K. (1992). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, 24(3), 189–206. https://doi.org/https://doi.org/10.1111/j.1538-4632.1992.tb00261.x

Governors Highway Safety Association. (2021). Traffic Fatalities by State: 2020 Preliminary Data. 33. Retrieved from https://www.ghsa.org/sites/default/files/2021-03/Ped Spotlight 2021 FINAL 3.23.21.pdf

Insurance Institute for Highway Safety. (n.d.). Fatality Facts 2021: Pedestrians. Retrieved June 2, 2023, from https://www.iihs.org/topics/fatality-statistics/detail/pedestrians

Li, L., Zhu, L., & Sui, D. Z. (2007). A GIS-based Bayesian approach for analyzing spatial-temporal patterns of intra-city motor vehicle crashes. Journal of Transport Geography, 15(4), 274–285. https://doi.org/10.1016/j.jtrangeo.2006.08.005

Li, X., & Anselin, L. (2023). rgeoda: R Library for Spatial Data Analysis. Retrieved from https://cran.r-project.org/package=rgeoda

Pebesma, E. (2022). sf: Simple Features for R. Retrieved from https://cran.r-project.org/package=sf

Ripley, B. (2022). MASS: Support Functions and Datasets for Venables and Ripley’s MASS. Retrieved from http://www.stats.ox.ac.uk/pub/MASS4/

Saha, D., Alluri, P., Gan, A., & Wu, W. (2018). Spatial analysis of macro-level bicycle crashes using the class of conditional autoregressive models. Accident Analysis and Prevention, 118(February), 166–177. https://doi.org/10.1016/j.aap.2018.02.014

Spinu, V., Grolemund, G., & Wickham, H. (2023). lubridate: Make Dealing with Dates a Little Easier. Retrieved from https://cran.r-project.org/package=lubridate

U.S. Census Bureau. (2021). 2021 Area Water State-based Shapefiles TIGER/Line Shapefiles. Retrieved from https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11001/tl_rd22_11001_areawater.zip

U.S. Census Bureau. (2023a). 2021 TIGER/Line Census Tract Shapefiles (machinereadable data files). Retrieved from https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_tract.zip

U.S. Census Bureau. (2023b). 2021 TIGER/Line Roads Shapefiles (machinereadable data files). Retrieved from https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11001/tl_rd22_11001_roads.zip

Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (Fourth). Retrieved from https://www.stats.ox.ac.uk/pub/MASS4/

Walker, K. (2023). tigris: Load Census TIGER/Line Shapefiles. Retrieved from https://cran.r-project.org/package=tigris

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the {tidyverse}. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., … Dunnington, D. (2023). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. Retrieved from https://cran.r-project.org/package=ggplot2

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. Retrieved from https://cran.r-project.org/package=dplyr

Wickham, H., Hester, J., & Bryan, J. (2023). readr: Read Rectangular Text Data. Retrieved from https://cran.r-project.org/package=readr

Yin, P. (2020). Kernels and Density Estimation. Geographic Information Science & Technology Body of Knowledge, (Q1). https://doi.org/10.22224/gistbok/2020.1.12

Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0

Spatial Analysis of Car Crashes in Washington, D.C., 2009-2022

Simone Roy

2023-07-19

Abstract

Introduction

Data

Crash Dataset

Additional Data

Methodology

Kernel Density Estimation

Local Moran’s I

Getis-Ord Gi*

Visualization

Results

Kernel Density Estimation

Local Moran’s I

Getis-Ord Gi*

Conclusion

Future Work

Reflection

References